
Sunday 28 May 2017

When algorithms are racist

Ian Tucker in The Guardian





Joy Buolamwini is a graduate researcher at the MIT Media Lab and founder of the Algorithmic Justice League – an organisation that aims to challenge the biases in decision-making software. She grew up in Mississippi, gained a Rhodes scholarship, and she is also a Fulbright fellow, an Astronaut scholar and a Google Anita Borg scholar. Earlier this year she won a $50,000 scholarship funded by the makers of the film Hidden Figures for her work fighting coded discrimination.


A lot of your work concerns facial recognition technology. How did you become interested in that area?

When I was a computer science undergraduate I was working on social robotics – the robots use computer vision to detect the humans they socialise with. I discovered I had a hard time being detected by the robot compared to lighter-skinned people. At the time I thought this was a one-off thing and that people would fix this.

Later I was in Hong Kong for an entrepreneur event where I tried out another social robot and ran into similar problems. I asked about the code that they used and it turned out we’d used the same open-source code for face detection – this is where I started to get a sense that unconscious bias might feed into the technology that we create. But again I assumed people would fix this.

So I was very surprised to come to the Media Lab about half a decade later as a graduate student, and run into the same problem. I found wearing a white mask worked better than using my actual face. This is when I thought, you’ve known about this for some time, maybe it’s time to speak up.


How does this problem come about?


Within the facial recognition community you have benchmark data sets which are meant to show the performance of various algorithms so you can compare them. There is an assumption that if you do well on the benchmarks then you’re doing well overall. But we haven’t questioned the representativeness of the benchmarks, so if we do well on that benchmark we give ourselves a false notion of progress.

When we look at it now it seems very obvious, but with work in a research lab, I understand you do the “down the hall test” – you’re putting this together quickly, you have a deadline, I can see why these skews have come about. Collecting data, particularly diverse data, is not an easy thing.

Outside of the lab, isn’t it difficult to tell that you’re discriminated against by an algorithm?

Absolutely, you don’t even know it’s an option. We’re trying to identify bias, to point out cases where bias can occur so people can know what to look out for, but also develop tools where the creators of systems can check for a bias in their design.

Instead of getting a system that works well for 98% of people in this data set, we want to know how well it works for different demographic groups. Let’s say you’re using systems that have been trained on lighter faces but the people most impacted by the use of this system have darker faces, is it fair to use that system on this specific population?
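To make the point concrete, here is a minimal sketch of that kind of disaggregated evaluation: rather than reporting one aggregate score, accuracy is broken out per demographic group. The detection results and group labels below are invented for illustration; this is not Buolamwini’s own tooling.

```python
# Report accuracy separately for each demographic group instead of one overall
# score. The group labels and "results" below are hypothetical placeholders.
from collections import defaultdict

def accuracy_by_group(predictions, labels, groups):
    """Return overall accuracy and a per-group breakdown."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, group in zip(predictions, labels, groups):
        total[group] += 1
        if pred == label:
            correct[group] += 1
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group

# Hypothetical face-detection results: 1 = face detected, 0 = missed.
preds  = [1, 1, 1, 1, 1, 0, 0, 1]
truth  = [1, 1, 1, 1, 1, 1, 1, 1]
groups = ["lighter", "lighter", "lighter", "lighter",
          "darker", "darker", "darker", "darker"]

overall, per_group = accuracy_by_group(preds, truth, groups)
print(overall)    # 0.75 looks respectable in aggregate...
print(per_group)  # ...but hides {'lighter': 1.0, 'darker': 0.5}
```

An aggregate benchmark score of this kind can look healthy while concealing exactly the skew described above.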

Georgetown Law recently found that one in two adults in the US has their face in the facial recognition network. That network can be searched using algorithms that haven’t been audited for accuracy. I view this as another red flag for why it matters that we highlight bias and provide tools to identify and mitigate it.


Besides facial recognition what areas have an algorithm problem?


The rise of automation and the increased reliance on algorithms for high-stakes decisions such as whether someone gets insurance or not, your likelihood to default on a loan or somebody’s risk of recidivism means this is something that needs to be addressed. Even admissions decisions are increasingly automated – what school our children go to and what opportunities they have. We don’t have to bring the structural inequalities of the past into the future we create, but that’s only going to happen if we are intentional.


If these systems are based on old data isn’t the danger that they simply preserve the status quo?

Absolutely. A study on Google found that ads for executive level positions were more likely to be shown to men than women – if you’re trying to determine who the ideal candidate is and all you have is historical data to go on, you’re going to present an ideal candidate which is based on the values of the past. Our past dwells within our algorithms. We know our past is unequal but to create a more equal future we have to look at the characteristics that we are optimising for. Who is represented? Who isn’t represented?

Isn’t there a counter-argument to transparency and openness for algorithms? One, that they are commercially sensitive and two, that once in the open they can be manipulated or gamed by hackers?

I definitely understand companies want to keep their algorithms proprietary because that gives them a competitive advantage, and depending on the types of decisions that are being made and the country they are operating in, that can be protected.

When you’re dealing with deep neural networks that are not necessarily transparent in the first place, another way of being accountable is being transparent about the outcomes and about the bias it has been tested for. Others have been working on black box testing for automated decision-making systems. You can keep your secret sauce secret, but we need to know, given these inputs, whether there is any bias across gender, ethnicity in the decisions being made.
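As a rough illustration of that black-box idea – auditing outcomes without seeing the model – one can feed an opaque decision function matched inputs that differ only in a protected attribute and compare the rates of favourable decisions. The `decide` function and profiles below are hypothetical placeholders, not any real vendor’s system.

```python
# Black-box bias check: without access to the model's internals, send it matched
# applicant profiles and compare favourable-decision rates across groups.
def decide(profile):
    # Stand-in for an opaque, proprietary decision system (loan approval, etc.).
    return profile["income"] > 30000 and profile["postcode"] not in {"Z1"}

def approval_rate(profiles):
    decisions = [decide(p) for p in profiles]
    return sum(decisions) / len(decisions)

def audit(profiles_by_group):
    """Return per-group approval rates and the largest gap between groups."""
    rates = {group: approval_rate(ps) for group, ps in profiles_by_group.items()}
    gap = max(rates.values()) - min(rates.values())
    return rates, gap

# Hypothetical test profiles grouped by a protected attribute.
test_sets = {
    "group_a": [{"income": 40000, "postcode": "A1"}, {"income": 32000, "postcode": "B2"}],
    "group_b": [{"income": 40000, "postcode": "Z1"}, {"income": 32000, "postcode": "Z1"}],
}
rates, gap = audit(test_sets)
print(rates, gap)  # identical incomes, very different approval rates -> flag for review
```

The secret sauce stays secret; only the pattern of outputs for controlled inputs is examined.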


Thinking about yourself – growing up in Mississippi, a Rhodes Scholar, a Fulbright Fellow and now at MIT – do you wonder that if those admissions decisions had been taken by algorithms you might not have ended up where you are?

If we’re thinking likely probabilities in the tech world, black women are in the 1%. But when I look at the opportunities I have had, I am a particular type of person who would do well. I come from a household where I have two college-educated parents – my grandfather was a professor in the school of pharmacy in Ghana – so when you look at other people who have had the opportunity to become a Rhodes Scholar or do a Fulbright I very much fit those patterns. Yes, I’ve worked hard and I’ve had to overcome many obstacles but at the same time I’ve been positioned to do well by other metrics. So it depends on what you choose to focus on – looking from an identity perspective it’s a very different story.

In the introduction to Hidden Figures the author Margot Lee Shetterly talks about how growing up near Nasa’s Langley Research Center in the 1960s led her to believe that it was standard for African Americans to be engineers, mathematicians and scientists…

That it becomes your norm. The movie reminded me of how important representation is. We have a very narrow vision of what technology can enable right now because we have very low participation. I’m excited to see what people create when it’s no longer just the domain of the tech elite, what happens when we open this up, that’s what I want to be part of enabling.

Friday 16 December 2016

How Google's search algorithm spreads false information with a rightwing bias

Olivia Solon and Sam Levin in The Guardian


Google’s search algorithm appears to be systematically promoting information that is either false or slanted with an extreme rightwing bias on subjects as varied as climate change and homosexuality.


Following a recent investigation by the Observer, which uncovered that Google’s search engine prominently suggests neo-Nazi websites and antisemitic writing, the Guardian has uncovered a dozen additional examples of biased search results.

Google’s search algorithm and its autocomplete function prioritize websites that, for example, declare that climate change is a hoax, being gay is a sin, and the Sandy Hook mass shooting never happened.







The increased scrutiny on the algorithms of Google – which removed antisemitic and sexist autocomplete phrases after the recent Observer investigation – comes at a time of tense debate surrounding the role of fake news in building support for conservative political leaders, particularly US President-elect Donald Trump.

Facebook has faced significant backlash for its role in enabling widespread dissemination of misinformation, and data scientists and communication experts have argued that rightwing groups have found creative ways to manipulate social media trends and search algorithms.








The Guardian’s latest findings further suggest that Google’s searches are contributing to the problem.

In the past, when a journalist or academic has exposed one of these algorithmic hiccups, humans at Google have quietly made manual adjustments in a process that’s neither transparent nor accountable.

At the same time, politically motivated third parties including the ‘alt-right’, a far-right movement in the US, use a variety of techniques to trick the algorithm and push propaganda and misinformation higher up Google’s search rankings.

These insidious manipulations – both by Google and by third parties trying to game the system – impact how users of the search engine perceive the world, even influencing the way they vote. This has led some researchers to study Google’s role in the presidential election in the same way that they have scrutinized Facebook.






Robert Epstein from the American Institute for Behavioral Research and Technology has spent four years trying to reverse engineer Google’s search algorithms. He believes, based on systematic research, that Google has the power to rig elections through something he calls the search engine manipulation effect (SEME).

Epstein conducted five experiments in two countries and found that biased rankings in search results can shift the opinions of undecided voters. If Google tweaks its algorithm to show more positive search results for a candidate, the searcher may form a more positive opinion of that candidate.

In September 2016, Epstein released findings, published through Russian news agency Sputnik News, that indicated Google had suppressed negative autocomplete search results relating to Hillary Clinton.

“We know that if there’s a negative autocomplete suggestion in the list, it will draw somewhere between five and 15 times as many clicks as a neutral suggestion,” Epstein said. “If you omit negatives for one perspective, one hotel chain or one candidate, you have a heck of a lot of people who are going to see only positive things for whatever the perspective you are supporting.”






Even changing the order in which certain search terms appear in the autocompleted list can make a huge impact, with the first result drawing the most clicks, he said.

At the time, Google said the autocomplete algorithm was designed to omit disparaging or offensive terms associated with individuals’ names but that it wasn’t an “exact science”.

Then there’s the secret recipe of factors that feed into the algorithm Google uses to determine a web page’s importance – embedded with the biases of the humans who programmed it. These factors include how many and which other websites link to a page, how much traffic it receives, and how often a page is updated. People who are very active politically are typically the most partisan, which means that extremist views peddled actively on blogs and fringe media sites get elevated in the search ranking.
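The link-counting core of that recipe is the published PageRank idea, which can be sketched in a few lines. This is the textbook simplification, not Google’s production ranking, which – as noted above – also weighs signals such as traffic and how often a page is updated; the toy “web” below is invented to show how a densely self-linking cluster can lift its own importance.

```python
# Textbook power-iteration sketch of PageRank: a page's importance is the
# importance passed to it by the pages that link to it.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:            # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

# A toy web: a fringe cluster linking densely to itself pushes "fringe" up.
toy_web = {
    "mainstream": ["fringe"],
    "fringe": ["blog1", "blog2"],
    "blog1": ["fringe"],
    "blog2": ["fringe", "blog1"],
}
print(sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]))
```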

“These platforms are structured in such a way that they are allowing and enabling – consciously or unconsciously – more extreme views to dominate,” said Martin Moore from King’s College London’s Centre for the Study of Media, Communication and Power.

Appearing on the first page of Google search results can give websites with questionable editorial principles undue authority and traffic.

“These two manipulations can work together to have an enormous impact on people without their knowledge that they are being manipulated, and our research shows that very clearly,” Epstein said. “Virtually no one is aware of bias in search suggestions or rankings.”

This is compounded by Google’s personalization of search results, which means different users see different results based on their interests. “This gives companies like Google even more power to influence people’s opinions, attitudes, beliefs and behaviors,” he said.

Epstein wants Google to be more transparent about how and when it manually manipulates the algorithm.


“They are constantly making these adjustments. It’s absurd for them to say everything is automated,” he said. Manual removals from autocomplete include “are jews evil” and “are women evil”. Google has also altered its results so when someone searches for ways to kill themselves they are shown a suicide helpline.

Shortly after Epstein released his research indicating the suppression of negative autocomplete search results relating to Clinton, which he credits to close ties between the Clinton campaign and Google, the search engine appeared to pull back from such censorship, he said. This, he argued, allowed a flood of pro-Trump, anti-Clinton content (including fake news), some of which was created in retaliation, to bubble to the top.

“If I had to do it over again I would not have released those data. There is some indication that they had an impact that was detrimental to Hillary Clinton, which was never my intention.”

Rhea Drysdale, the CEO of digital marketing company Outspoken Media, did not see evidence of pro-Clinton editing by Google. However, she did note networks of partisan websites – disproportionately rightwing – using much better search engine optimization techniques to ensure their worldview ranked highly.

Meanwhile, tech-savvy rightwing groups organized online and developed creative ways to control and manipulate social media conversations through mass actions, said Shane Burley, a journalist and researcher who has studied the alt-right.





“What happens is they can essentially jam hashtags so densely using multiple accounts, they end up making it trending,” he said. “That’s a great way for them to dictate how something is going to be covered, what’s going to be discussed. That’s helped them reframe the discussion of immigration.”

Burley noted that “cuckservative” – meaning conservatives who have sold out – is a good example of a term that the alt-right has managed to popularize in an effective way. Similarly if you search for “feminism is...” in Google, it autocompletes to “feminism is cancer”, a popular rallying cry for Trump supporters.

“It has this effect of making certain words kind of like magic words in search algorithms.”

The same groups – including members of the popular alt-right Reddit forum The_Donald – used the techniques that reputation management firms and marketers use to push companies up Google’s search results, to ensure pro-Trump imagery and articles ranked highly.

“Extremists have been trying to play Google’s algorithm for years, with varying degrees of success,” said Brittan Heller, director of technology and society at the Anti-Defamation League. “The key has traditionally been connected to influencing the algorithm with a high volume of biased search terms.”

The problem has become particularly challenging for Google in a post-truth era, where white supremacist websites may have the same indicator of “trustworthiness” in the eyes of Google as other websites high in the page rank.

“What does Google do when the lies aren’t the outliers any more?” Heller said.

“Previously there was the assumption that everything on the internet had a glimmer of truth about it. With the phenomenon of fake news and media hacking, that may be changing.”

A Google spokeswoman said in a statement: “We’ve received a lot of questions about autocomplete, and we want to help people understand how it works: Autocomplete predictions are algorithmically generated based on users’ search activity and interests. Users search for such a wide range of material on the web – 15% of searches we see every day are new. Because of this, terms that appear in Autocomplete may be unexpected or unpleasant. We do our best to prevent offensive terms, like porn and hate speech, from appearing, but we don’t always get it right. Autocomplete isn’t an exact science and we’re always working to improve our algorithms.”

Sunday 4 December 2016

Are women evil? - Google, democracy and the truth about internet search

Carole Cadwalladr in The Observer


Here’s what you don’t want to do late on a Sunday night. You do not want to type seven letters into Google. That’s all I did. I typed: “a-r-e”. And then “j-e-w-s”. Since 2008, Google has attempted to predict what question you might be asking and offers you a choice. And this is what it did. It offered me a choice of potential questions it thought I might want to ask: “are jews a race?”, “are jews white?”, “are jews christians?”, and finally, “are jews evil?”

Are Jews evil? It’s not a question I’ve ever thought of asking. I hadn’t gone looking for it. But there it was. I press enter. A page of results appears. This was Google’s question. And this was Google’s answer: Jews are evil. Because there, on my screen, was the proof: an entire page of results, nine out of 10 of which “confirm” this. The top result, from a site called Listovative, has the headline: “Top 10 Major Reasons Why People Hate Jews.” I click on it: “Jews today have taken over marketing, militia, medicinal, technological, media, industrial, cinema challenges etc and continue to face the worlds [sic] envy through unexplained success stories given their inglorious past and vermin like repression all over Europe.”

Google is search. It’s the verb, to Google. It’s what we all do, all the time, whenever we want to know anything. We Google it. The site handles at least 63,000 searches a second, 5.5bn a day. Its mission as a company, the one-line overview that has informed the company since its foundation and is still the banner headline on its corporate website today, is to “organise the world’s information and make it universally accessible and useful”. It strives to give you the best, most relevant results. And in this instance the third-best, most relevant result to the search query “are Jews… ” is a link to an article from stormfront.org, a neo-Nazi website. The fifth is a YouTube video: “Why the Jews are Evil. Why we are against them.”

The sixth is from Yahoo Answers: “Why are Jews so evil?” The seventh result is: “Jews are demonic souls from a different world.” And the 10th is from jesus-is-saviour.com: “Judaism is Satanic!”

There’s one result in the 10 that offers a different point of view. It’s a link to a rather dense, scholarly book review from thetabletmag.com, a Jewish magazine, with the unfortunately misleading headline: “Why Literally Everybody In the World Hates Jews.”

I feel like I’ve fallen down a wormhole, entered some parallel universe where black is white, and good is bad. Though later, I think that perhaps what I’ve actually done is scraped the topsoil off the surface of 2016 and found one of the underground springs that has been quietly nurturing it. It’s been there all the time, of course. Just a few keystrokes away… on our laptops, our tablets, our phones. This isn’t a secret Nazi cell lurking in the shadows. It’s hiding in plain sight.


Are women… Google’s search results.

Stories about fake news on Facebook have dominated certain sections of the press for weeks following the American presidential election, but arguably this is even more powerful, more insidious. Frank Pasquale, professor of law at the University of Maryland, and one of the leading academic figures calling for tech companies to be more open and transparent, calls the results “very profound, very troubling”.

He came across a similar instance in 2006 when, “If you typed ‘Jew’ in Google, the first result was jewwatch.org. It was ‘look out for these awful Jews who are ruining your life’. And the Anti-Defamation League went after them and so they put an asterisk next to it which said: ‘These search results may be disturbing but this is an automated process.’ But what you’re showing – and I’m very glad you are documenting it and screenshotting it – is that despite the fact they have vastly researched this problem, it has gotten vastly worse.”

And ordering of search results does influence people, says Martin Moore, director of the Centre for the Study of Media, Communication and Power at King’s College, London, who has written at length on the impact of the big tech companies on our civic and political spheres. “There’s large-scale, statistically significant research into the impact of search results on political views. And the way in which you see the results and the types of results you see on the page necessarily has an impact on your perspective.” Fake news, he says, has simply “revealed a much bigger problem. These companies are so powerful and so committed to disruption. They thought they were disrupting politics but in a positive way. They hadn’t thought about the downsides. These tools offer remarkable empowerment, but there’s a dark side to it. It enables people to do very cynical, damaging things.”

Google is knowledge. It’s where you go to find things out. And evil Jews are just the start of it. There are also evil women. I didn’t go looking for them either. This is what I type: “a-r-e w-o-m-e-n”. And Google offers me just two choices, the first of which is: “Are women evil?” I press return. Yes, they are. Every one of the 10 results “confirms” that they are, including the top one, from a site called sheddingoftheego.com, which is boxed out and highlighted: “Every woman has some degree of prostitute in her. Every woman has a little evil in her… Women don’t love men, they love what they can do for them. It is within reason to say women feel attraction but they cannot love men.”

Next I type: “a-r-e m-u-s-l-i-m-s”. And Google suggests I should ask: “Are Muslims bad?” And here’s what I find out: yes, they are. That’s what the top result says and six of the others. Without typing anything else, simply putting the cursor in the search box, Google offers me two new searches and I go for the first, “Islam is bad for society”. In the next list of suggestions, I’m offered: “Islam must be destroyed.”

Jews are evil. Muslims need to be eradicated. And Hitler? Do you want to know about Hitler? Let’s Google it. “Was Hitler bad?” I type. And here’s Google’s top result: “10 Reasons Why Hitler Was One Of The Good Guys”. I click on the link: “He never wanted to kill any Jews”; “he cared about conditions for Jews in the work camps”; “he implemented social and cultural reform.” Eight out of the other 10 search results agree: Hitler really wasn’t that bad.

A few days later, I talk to Danny Sullivan, the founding editor of SearchEngineLand.com. He’s been recommended to me by several academics as one of the most knowledgeable experts on search. Am I just being naive, I ask him? Should I have known this was out there? “No, you’re not being naive,” he says. “This is awful. It’s horrible. It’s the equivalent of going into a library and asking a librarian about Judaism and being handed 10 books of hate. Google is doing a horrible, horrible job of delivering answers here. It can and should do better.”

He’s surprised too. “I thought they stopped offering autocomplete suggestions for religions in 2011.” And then he types “are women” into his own computer. “Good lord! That answer at the top. It’s a featured result. It’s called a ‘direct answer’. This is supposed to be indisputable. It’s Google’s highest endorsement.” That every woman has some degree of prostitute in her? “Yes. This is Google’s algorithm going terribly wrong.”

I contacted Google about its seemingly malfunctioning autocomplete suggestions and received the following response: “Our search results are a reflection of the content across the web. This means that sometimes unpleasant portrayals of sensitive subject matter online can affect what search results appear for a given query. These results don’t reflect Google’s own opinions or beliefs – as a company, we strongly value a diversity of perspectives, ideas and cultures.”

Google isn’t just a search engine, of course. Search was the foundation of the company but that was just the beginning. Alphabet, Google’s parent company, now has the greatest concentration of artificial intelligence experts in the world. It is expanding into healthcare, transportation, energy. It’s able to attract the world’s top computer scientists, physicists and engineers. It’s bought hundreds of start-ups, including Calico, whose stated mission is to “cure death”, and DeepMind, which aims to “solve intelligence”.


Google co-founders Larry Page and Sergey Brin in 2002. Photograph: Michael Grecco/Getty Images

And 20 years ago it didn’t even exist. When Tony Blair became prime minister, it wasn’t possible to Google him: the search engine had yet to be invented. The company was only founded in 1998 and Facebook didn’t appear until 2004. Google’s founders Sergey Brin and Larry Page are still only 43. Mark Zuckerberg of Facebook is 32. Everything they’ve done, the world they’ve remade, has been done in the blink of an eye.

But it seems the implications of the power and reach of these companies are only now seeping into the public consciousness. I ask Rebecca MacKinnon, director of the Ranking Digital Rights project at the New America Foundation, whether it was the recent furore over fake news that woke people up to the danger of ceding our rights as citizens to corporations. “It’s kind of weird right now,” she says, “because people are finally saying, ‘Gee, Facebook and Google really have a lot of power’ like it’s this big revelation. And it’s like, ‘D’oh.’”

MacKinnon has a particular expertise in how authoritarian governments adapt to the internet and bend it to their purposes. “China and Russia are a cautionary tale for us. I think what happens is that it goes back and forth. So during the Arab spring, it seemed like the good guys were further ahead. And now it seems like the bad guys are. Pro-democracy activists are using the internet more than ever but at the same time, the adversary has gotten so much more skilled.”

Last week Jonathan Albright, an assistant professor of communications at Elon University in North Carolina, published the first detailed research on how rightwing websites had spread their message. “I took a list of these fake news sites that was circulating, I had an initial list of 306 of them and I used a tool – like the one Google uses – to scrape them for links and then I mapped them. So I looked at where the links went – into YouTube and Facebook, and between each other, millions of them… and I just couldn’t believe what I was seeing.

“They have created a web that is bleeding through on to our web. This isn’t a conspiracy. There isn’t one person who’s created this. It’s a vast system of hundreds of different sites that are using all the same tricks that all websites use. They’re sending out thousands of links to other sites and together this has created a vast satellite system of rightwing news and propaganda that has completely surrounded the mainstream media system.”

He found 23,000 pages and 1.3m hyperlinks. “And Facebook is just the amplification device. When you look at it in 3D, it actually looks like a virus. And Facebook was just one of the hosts for the virus that helps it spread faster. You can see the New York Times in there and the Washington Post and then you can see how there’s a vast, vast network surrounding them. The best way of describing it is as an ecosystem. This really goes way beyond individual sites or individual stories. What this map shows is the distribution network and you can see that it’s surrounding and actually choking the mainstream news ecosystem.”

Like a cancer? “Like an organism that is growing and getting stronger all the time.”

Charlie Beckett, a professor in the school of media and communications at LSE, tells me: “We’ve been arguing for some time now that plurality of news media is good. Diversity is good. Critiquing the mainstream media is good. But now… it’s gone wildly out of control. What Jonathan Albright’s research has shown is that this isn’t a byproduct of the internet. And it’s not even being done for commercial reasons. It’s motivated by ideology, by people who are quite deliberately trying to destabilise the internet.”


A spatial map of the rightwing fake news ecosystem. Jonathan Albright, assistant professor of communications at Elon University, North Carolina, “scraped” 300 fake news sites (the dark shapes on this map) to reveal the 1.3m hyperlinks that connect them together and link them into the mainstream news ecosystem. Here, Albright shows it is a “vast satellite system of rightwing news and propaganda that has completely surrounded the mainstream media system”. Photograph: Jonathan Albright
Albright’s map also provides a clue to understanding the Google search results I found. What these rightwing news sites have done, he explains, is what most commercial websites try to do. They try to find the tricks that will move them up Google’s PageRank system. They try and “game” the algorithm. And what his map shows is how well they’re doing that.

That’s what my searches are showing too. That the right has colonised the digital space around these subjects – Muslims, women, Jews, the Holocaust, black people – far more effectively than the liberal left.

“It’s an information war,” says Albright. “That’s what I keep coming back to.”

But it’s where it goes from here that’s truly frightening. I ask him how it can be stopped. “I don’t know. I’m not sure it can be. It’s a network. It’s far more powerful than any one actor.”

So, it’s almost got a life of its own? “Yes, and it’s learning. Every day, it’s getting stronger.”

The more people who search for information about Jews, the more people will see links to hate sites, and the more they click on those links (very few people click on to the second page of results) the more traffic the sites will get, the more links they will accrue and the more authoritative they will appear. This is an entirely circular knowledge economy that has only one outcome: an amplification of the message. Jews are evil. Women are evil. Islam must be destroyed. Hitler was one of the good guys.
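A crude simulation makes the circularity visible: if the top-ranked result takes the lion’s share of clicks and clicks feed back into authority, a small initial lead compounds. The sites, click shares and scores below are invented purely to illustrate the dynamic.

```python
# Rich-get-richer feedback loop: whichever site ranks first attracts most of the
# clicks, the clicks feed back into its authority score, and the lead compounds.
scores = {"hate_site": 1.05, "neutral_site_1": 1.0, "neutral_site_2": 1.0}
CLICK_SHARE = [0.7, 0.2, 0.1]   # assumed share of clicks going to ranks 1, 2, 3

for day in range(30):
    ranking = sorted(scores, key=scores.get, reverse=True)
    for position, site in enumerate(ranking):
        daily_clicks = 1000 * CLICK_SHARE[position]
        scores[site] += daily_clicks * 0.001   # traffic and links feed back into authority

print(sorted(scores.items(), key=lambda kv: -kv[1]))
# The site with the tiny initial head start finishes far ahead of the others.
```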

And the constellation of websites that Albright found – a sort of shadow internet – has another function. More than just spreading rightwing ideology, they are being used to track and monitor and influence anyone who comes across their content. “I scraped the trackers on these sites and I was absolutely dumbfounded. Every time someone likes one of these posts on Facebook or visits one of these websites, the scripts are then following you around the web. And this enables data-mining and influencing companies like Cambridge Analytica to precisely target individuals, to follow them around the web, and to send them highly personalised political messages. This is a propaganda machine. It’s targeting people individually to recruit them to an idea. It’s a level of social engineering that I’ve never seen before. They’re capturing people and then keeping them on an emotional leash and never letting them go.”

Cambridge Analytica, an American-owned company based in London, was employed by both the Vote Leave campaign and the Trump campaign. Dominic Cummings, the campaign director of Vote Leave, has made few public announcements since the Brexit referendum but he did say this: “If you want to make big improvements in communication, my advice is – hire physicists.”

Steve Bannon, founder of Breitbart News and the newly appointed chief strategist to Trump, is on Cambridge Analytica’s board and it has emerged that the company is in talks to undertake political messaging work for the Trump administration. It claims to have built psychological profiles using 5,000 separate pieces of data on 220 million American voters. It knows their quirks and nuances and daily habits and can target them individually.

“They were using 40-50,000 different variants of ad every day that were continuously measuring responses and then adapting and evolving based on that response,” says Martin Moore of King’s College. Because they have so much data on individuals and they use such phenomenally powerful distribution networks, they allow campaigns to bypass a lot of existing laws.

“It’s all done completely opaquely and they can spend as much money as they like on particular locations because you can focus on a five-mile radius or even a single demographic. Fake news is important but it’s only one part of it. These companies have found a way of transgressing 150 years of legislation that we’ve developed to make elections fair and open.”
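The “measure responses, then adapt” loop Moore describes is, in generic terms, what bandit-style ad optimisation does: keep showing the variant that is responding best while occasionally trying the others. The sketch below is a textbook epsilon-greedy illustration with invented response rates, not a description of Cambridge Analytica’s actual system.

```python
import random

random.seed(0)

# Hypothetical per-variant response rates the optimiser does not know in advance.
TRUE_RESPONSE_RATE = {"ad_a": 0.02, "ad_b": 0.05, "ad_c": 0.011}

shown = {ad: 0 for ad in TRUE_RESPONSE_RATE}
clicked = {ad: 0 for ad in TRUE_RESPONSE_RATE}

def pick_variant(epsilon=0.1):
    """Mostly show the best-performing variant so far, sometimes explore."""
    if random.random() < epsilon or not any(shown.values()):
        return random.choice(list(TRUE_RESPONSE_RATE))
    return max(shown, key=lambda ad: clicked[ad] / shown[ad] if shown[ad] else 0.0)

for impression in range(10000):
    ad = pick_variant()
    shown[ad] += 1
    if random.random() < TRUE_RESPONSE_RATE[ad]:   # simulated user response
        clicked[ad] += 1

print(shown)    # impressions drift toward whichever variant measures best
```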

Did such micro-targeted propaganda – currently legal – swing the Brexit vote? We have no way of knowing. Did the same methods used by Cambridge Analytica help Trump to victory? Again, we have no way of knowing. This is all happening in complete darkness. We have no way of knowing how our personal data is being mined and used to influence us. We don’t realise that the Facebook page we are looking at, the Google page, the ads that we are seeing, the search results we are using, are all being personalised to us. We don’t see it because we have nothing to compare it to. And it is not being monitored or recorded. It is not being regulated. We are inside a machine and we simply have no way of seeing the controls. Most of the time, we don’t even realise that there are controls.







Rebecca MacKinnon says that most of us consider the internet to be like “the air that we breathe and the water that we drink”. It surrounds us. We use it. And we don’t question it. “But this is not a natural landscape. Programmers and executives and editors and designers, they make this landscape. They are human beings and they all make choices.”

But we don’t know what choices they are making. Neither Google nor Facebook makes its algorithms public. Why did my Google search return nine out of 10 search results that claim Jews are evil? We don’t know and we have no way of knowing. Their systems are what Frank Pasquale describes as “black boxes”. He calls Google and Facebook “a terrifying duopoly of power” and has been leading a growing movement of academics who are calling for “algorithmic accountability”. “We need to have regular audits of these systems,” he says. “We need people in these companies to be accountable. In the US, under the Digital Millennium Copyright Act, every company has to have a spokesman you can reach. And this is what needs to happen. They need to respond to complaints about hate speech, about bias.”

Is bias built into the system? Does it affect the kind of results that I was seeing? “There’s all sorts of bias about what counts as a legitimate source of information and how that’s weighted. There’s enormous commercial bias. And when you look at the personnel, they are young, white and perhaps Asian, but not black or Hispanic and they are overwhelmingly men. The worldview of young wealthy white men informs all these judgments.”

Later, I speak to Robert Epstein, a research psychologist at the American Institute for Behavioral Research and Technology, and the author of the study that Martin Moore told me about (and that Google has publicly criticised), showing how search-rank results affect voting patterns. On the other end of the phone, he repeats one of the searches I did. He types “do blacks…” into Google.

“Look at that. I haven’t even hit a button and it’s automatically populated the page with answers to the query: ‘Do blacks commit more crimes?’ And look, I could have been going to ask all sorts of questions. ‘Do blacks excel at sports’, or anything. And it’s only given me two choices and these aren’t simply search-based or the most searched terms right now. Google used to use that but now they use an algorithm that looks at other things. Now, let me look at Bing and Yahoo. I’m on Yahoo and I have 10 suggestions, not one of which is ‘Do black people commit more crime?’

“And people don’t question this. Google isn’t just offering a suggestion. This is a negative suggestion and we know that negative suggestions, depending on lots of things, can draw between five and 15 times more clicks. And this is all programmed. And it could be programmed differently.”

What Epstein’s work has shown is that the contents of a page of search results can influence people’s views and opinions. The type and order of search rankings was shown to influence voters in India in double-blind trials. There were similar results relating to the search suggestions you are offered.

“The general public are completely in the dark about very fundamental issues regarding online search and influence. We are talking about the most powerful mind-control machine ever invented in the history of the human race. And people don’t even notice it.”

Damien Tambini, an associate professor at the London School of Economics, who focuses on media regulation, says that we lack any sort of framework to deal with the potential impact of these companies on the democratic process. “We have structures that deal with powerful media corporations. We have competition laws. But these companies are not being held responsible. There are no powers to get Google or Facebook to disclose anything. There’s an editorial function to Google and Facebook but it’s being done by sophisticated algorithms. They say it’s machines not editors. But that’s simply a mechanised editorial function.”

And the companies, says John Naughton, the Observer columnist and a senior research fellow at Cambridge University, are terrified of acquiring editorial responsibilities they don’t want. “Though they can and regularly do tweak the results in all sorts of ways.”

Certainly the results about Google on Google don’t seem entirely neutral. Google “Is Google racist?” and the featured result – the Google answer boxed out at the top of the page – is quite clear: no. It is not.

But the enormity and complexity of having two global companies of a kind we have never seen before influencing so many areas of our lives is such, says Naughton, that “we don’t even have the mental apparatus to even know what the problems are”.

And this is especially true of the future. Google and Facebook are at the forefront of AI. They are going to own the future. And the rest of us can barely start to frame the sorts of questions we ought to be asking. “Politicians don’t think long term. And corporations don’t think long term because they’re focused on the next quarterly results and that’s what makes Google and Facebook interesting and different. They are absolutely thinking long term. They have the resources, the money, and the ambition to do whatever they want.

“They want to digitise every book in the world: they do it. They want to build a self-driving car: they do it. The fact that people are reading about these fake news stories and realising that this could have an effect on politics and elections, it’s like, ‘Which planet have you been living on?’ For Christ’s sake, this is obvious.”

“The internet is among the few things that humans have built that they don’t understand.” It is “the largest experiment involving anarchy in history. Hundreds of millions of people are, each minute, creating and consuming an untold amount of digital content in an online world that is not truly bound by terrestrial laws.” The internet as a lawless anarchic state? A massive human experiment with no checks and balances and untold potential consequences? What kind of digital doom-mongerer would say such a thing? Step forward, Eric Schmidt – Google’s chairman. They are the first lines of the book, The New Digital Age, that he wrote with Jared Cohen.

We don’t understand it. It is not bound by terrestrial laws. And it’s in the hands of two massive, all-powerful corporations. It’s their experiment, not ours. The technology that was supposed to set us free may well have helped Trump to power, or covertly helped swing votes for Brexit. It has created a vast network of propaganda that has encroached like a cancer across the entire internet. This is a technology that has enabled the likes of Cambridge Analytica to create political messages uniquely tailored to you. They understand your emotional responses and how to trigger them. They know your likes, dislikes, where you live, what you eat, what makes you laugh, what makes you cry.

And what next? Rebecca MacKinnon’s research has shown how authoritarian regimes reshape the internet for their own purposes. Is that what’s going to happen with Silicon Valley and Trump? As Martin Moore points out, the president-elect claimed that Apple chief executive Tim Cook called to congratulate him soon after his election victory. “And there will undoubtedly be pressure on them to collaborate,” says Moore.

Journalism is failing in the face of such change and is only going to fail further. New platforms have put a bomb under the financial model – advertising – resources are shrinking, traffic is increasingly dependent on them, and publishers have no access, no insight at all, into what these platforms are doing in their headquarters, their labs. And now they are moving beyond the digital world into the physical. The next frontiers are healthcare, transportation, energy. And just as Google is a near-monopoly for search, its ambition to own and control the physical infrastructure of our lives is what’s coming next. It already owns our data and with it our identity. What will it mean when it moves into all the other areas of our lives?


 Facebook founder Mark Zuckerberg: still only 32 years of age. Photograph: Mariana Bazo/Reuters

“At the moment, there’s a distance when you Google ‘Jews are’ and get ‘Jews are evil’,” says Julia Powles, a researcher at Cambridge on technology and law. “But when you move into the physical realm, and these concepts become part of the tools being deployed when you navigate around your city or influence how people are employed, I think that has really pernicious consequences.”

Powles is shortly to publish a paper looking at DeepMind’s relationship with the NHS. “A year ago, 2 million Londoners’ NHS health records were handed over to DeepMind. And there was complete silence from politicians, from regulators, from anyone in a position of power. This is a company without any healthcare experience being given unprecedented access into the NHS and it took seven months to even know that they had the data. And that took investigative journalism to find it out.”

The headline was that DeepMind was going to work with the NHS to develop an app that would provide early warning for sufferers of kidney disease. And it is, but DeepMind’s ambitions – “to solve intelligence” – go way beyond that. The entire history of 2 million NHS patients is, for artificial intelligence researchers, a treasure trove. And their entry into the NHS – providing useful services in exchange for our personal data – is another massive step in their power and influence in every part of our lives.

Because the stage beyond search is prediction. Google wants to know what you want before you know yourself. “That’s the next stage,” says Martin Moore. “We talk about the omniscience of these tech giants, but that omniscience takes a huge step forward again if they are able to predict. And that’s where they want to go. To predict diseases in health. It’s really, really problematic.”

For the nearly 20 years that Google has been in existence, our view of the company has been inflected by the youth and liberal outlook of its founders. Ditto Facebook, whose mission, Zuckerberg said, was not to be “a company. It was built to accomplish a social mission to make the world more open and connected.”

It would be interesting to know how he thinks that’s working out. Donald Trump is connecting through exactly the same technology platforms that supposedly helped fuel the Arab spring; connecting to racists and xenophobes. And Facebook and Google are amplifying and spreading that message. And us too – the mainstream media. Our outrage is just another node on Jonathan Albright’s data map.

“The more we argue with them, the more they know about us,” he says. “It all feeds into a circular system. What we’re seeing here is a new era of network propaganda.”

We are all points on that map. And our complicity, our credulity, being consumers not concerned citizens, is an essential part of that process. And what happens next is down to us. “I would say that everybody has been really naive and we need to reset ourselves to a much more cynical place and proceed on that basis,” is Rebecca MacKinnon’s advice. “There is no doubt that where we are now is a very bad place. But it’s we as a society who have jointly created this problem. And if we want to get to a better place, when it comes to having an information ecosystem that serves human rights and democracy instead of destroying it, we have to share responsibility for that.”

Are Jews evil? How do you want that question answered? This is our internet. Not Google’s. Not Facebook’s. Not the rightwing propagandists’. And we’re the only ones who can reclaim it.

Monday 2 May 2016

Do we want our children taught by humans or algorithms?

Zoe Williams in The Guardian


 
Parents ‘have been galvanised by the … sight of their children in distress’ over the tests. Photograph: Dominic Lipinski/PA



It is incredibly hard for a headteacher to shout “rubbish” in a crowded hall while an authority figure is speaking. It is like asking a lung specialist to smoke a cigarette. Yet that’s what happened when Nicky Morgan addressed the National Association of Head Teachers conference yesterday. They objected partly to her programme of turning all schools into academies by 2020 and partly to her luminously daft insistence that “testing”, “improving” and “educating” are interchangeable words. 

Her government “introduced the phonics check for six-year-olds, and 100,000 more young people are able to read better as a result,” she told the BBC when she first became education secretary, and she has been trotting out the same nonsense ever since. No amount of disagreement from professionals in the field dents her faith or alters her rhetoric. Indeed, since the Michael Gove era, teachers have been treated as recalcitrant by definition, motivated by sloth, their years of experience reframed not as wisdom but as burnout. When they object to a policy, that merely proves what a sorely needed challenge it poses to their cushy lives. When they shout “rubbish” in a conference hall, it is yet more evidence of what a dangerous bunch of trots they are.

On Tuesday, parents enter the fray, with a school boycott organised by Let Our Kids Be Kids, to protest against “unnecessary testing and a curriculum that limits enjoyment and real understanding”. Some have been galvanised by the bizarre and unnecessary sight of their children in distress, others by solidarity with the teachers – who inconveniently continue to command a great deal of respect among people who actually meet with them – and others who can’t join in the boycott because of minor administrative details such as having to go to work, but have signed the petition. It is the beginning of a new activism – muscular, cooperative and agile because it has to be.


The boycott is in protest against ‘unnecessary testing and a curriculum that limits enjoyment and real understanding’. Photograph: Barry Batchelor/PA

If the only problem is that it causes anxiety to a load of pampered under-10s, shouldn’t they just suck it up? Isn’t that the best way to learn what the world is like? The framing of this debate is precisely wrong. No serious educationalist thinks that the way to drive up standards among children is to make tests more frequent and more exacting. Nor does anybody of any expertise really believe that teachers need to be incentivised by results. It is an incredibly tough, demanding, indifferently remunerated job, which nobody would do except as a vocation. It is not for the profession or the parents to explain what the tests are doing to the kids; it is for the education secretary to explain what these tests are for. 

By coincidence, at the end of last week, Randi Weingarten, head of the American Federation of Teachers, was in London to hand in a petition to Pearson, the education company and provider of curriculums and test delivery. The petition protested against two perceived issues: concerns about over-testing in US schools and alleged profiteering in the global south. The trajectory in US education, from universal public provision with local accountability to mass outsourcing and centralised control, is strikingly similar to what has happened here. It begins with the creation of a failure narrative, “that both the Democrats and the Republicans bought into, which is, the sky is falling, the sky is falling, the sky is falling”, Weingarten told me. That creates the rationale for testing, since, without data, you can’t tell whether you’re improving. Those tests are consequential: the results can be used to fire teachers, close down schools, hold pupils back a year. All the most profound decisions in education can suddenly be made by an algorithm, with no human judgment necessary.

Simultaneously, says Weingarten, charter schools were introduced, originally – like academies – “as part of a bigger public school system where you could incubate ideas”, but very soon remodelled as a way to supplant rather than supplement the existing system. “And in between all of this, you started seeing the marketisation and the monetisation.” Until things can be counted, there isn’t much scope to create a market.

I was never fully convinced that academisation and hyper-testing were undertaken to create the market conditions for privatisation down the line; I thought it more plausible that the testing was merely a politician’s wheeze to create data out of humans that could then be stuffed into manifestos to persuade other humans that the policies were going in the right direction. Yet the parallels between the US and England are insistent – it has become impossible to ignore the idea that our government is mimicking theirs for a reason.

Whether all this is a prelude to privatisation or a PR stunt for a chaotic government doesn’t actually matter in the medium term: to put seven-year-olds under intolerable pressure for either of those ends would be equally abhorrent. In the long term, the mutation of schools into joyless exam factories won’t be halted by resistance alone; we also need to make a proper account of what education is for.

As Weingarten describes, “We have to help kids build relationships. We have to address their life skills, so they can negotiate the world. We have to help kids build resilience. We have to help kids learn how to problem-solve, how to think, how to engage. So tell me, how are any of these things tested on a standardised test?” That’s a test question for the tin-eared secretary of state herself.

Monday 9 March 2015

Invasion of the algorithms: The modern-day equations which can rule our lives

Rhodri Marsden in The Independent

“This is a miracle of modern technology,” says dating-agency proprietor Sid Bliss, played by Sid James, in the 1970 comedy film Carry On Loving. “All we do is feed the information into the computer here, and after a few minutes the lady suitable comes out there,” he continues, pointing to a slot.

There’s the predictable joke about the slot being too small, but Sid’s client is mightily impressed by this nascent display of computer power. He has faith in the process, and is willing to surrender meekly to whatever choices the machine makes. The payoff is that the computer is merely a facade; on the other side of the wall, Sid’s wife (played by Hattie Jacques) is processing the information using her own, very human methods, and bunging a vaguely suitable match back through the slot. The clients, however, don’t know this. They think it’s brilliant.

Technology has come a long way since Sid James delivered filthy laughs into a camera lens, but our capacity to be impressed by computer processes we know next to nothing about remains enormous. All that’s changed is the language: it’s now the word “algorithm” that makes us raise our eyebrows appreciatively and go “oooh”. It’s a guaranteed way of grabbing our attention: generate some findings, attribute them to an algorithm, and watch the media and the public lap them up.

“Apothic Red Wine creates a unique algorithm to reveal the ‘dark side’ of the nation’s personas,” read a typical press release that plopped into hundreds of email inboxes recently; Yahoo, the Daily Mirror, Daily Mail and others pounced upon it and uncritically passed on the findings. The level of scientific rigour behind Apothic’s study was anyone’s guess – but that didn’t matter because the study was powered by an algorithm, so it must be true.

The next time we’re about to be superficially impressed by the unveiling of a “special algorithm”, it’s worth remembering that our lives have been ruled by them since the year dot and we generate plenty ourselves every day. Named after the eminent Persian mathematician Muhammad ibn Musa Al-Khwarizmi, algorithms are merely sets of instructions for how to achieve something; your gran’s chocolate-cake recipe could fall just as much into the algorithm category as any computer program. And while they’re meant to define sequences of operations very precisely and solve problems very efficiently, they come with no guarantees. There are brilliant algorithms and there are appalling algorithms; they could easily be riddled with flawed reasoning and churn out results that raise as many questions as they claim to answer. 

This matters, of course, because we live in an information age. Data is terrifyingly plentiful; it’s piling up at an alarming rate and we have to outsource the handling of that data to algorithms if we want to avoid a descent into chaos. We trust sat-nav applications to pull together information such as length of road, time of day, weight of traffic, speed limits and road blocks to generate an estimate of our arrival time; but their accuracy is only as good as the algorithm. Our romantic lives are, hilariously, often dictated by online-dating algorithms that claim to generate a “percentage match” with other human beings.
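The sat-nav case shows how mundane the underlying arithmetic can be: an arrival estimate is essentially road length divided by an effective speed adjusted for traffic, summed over segments. A toy sketch, with invented figures and none of the real-world complexity of routing engines:

```python
# Toy arrival-time estimate built from the kinds of inputs mentioned above
# (segment length, speed limit, current traffic). Figures are invented.
def eta_minutes(segments):
    """segments: list of (length_km, speed_limit_kmh, traffic_factor, blocked)."""
    total_hours = 0.0
    for length_km, speed_limit, traffic_factor, blocked in segments:
        if blocked:
            return None            # route unusable; a real system would re-route
        effective_speed = speed_limit * traffic_factor   # 1.0 = free-flowing
        total_hours += length_km / effective_speed
    return total_hours * 60

route = [
    (5.0, 50, 0.6, False),    # urban stretch, heavy traffic
    (20.0, 100, 0.9, False),  # motorway, light traffic
    (2.0, 30, 1.0, False),    # final approach
]
print(round(eta_minutes(route), 1))  # about 27.3 minutes
```

The estimate is only as good as the traffic factors fed into it, which is the point: the output inherits whatever the inputs and assumptions get wrong.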

Our online purchases of everything from vacuum cleaners to music downloads are affected by algorithms. If you’re reading this piece online, an algorithm will have probably brought it to your attention. We’re marching into a future where our surroundings are increasingly shaped, in real time, by mathematics. Mentally, we’re having to adjust to this; we know that it’s not a human being at Netflix or Apple suggesting films for us to watch, but perhaps the algorithm does a better job. Google’s adverts can seem jarring – trying to flog us products that we have just searched for – precisely because algorithms tailor them to our interests far better than a human ever could.

With data being generated by everything from England’s one-day cricket team to your central heating system, the truth is that algorithms beat us hands down at extrapolating meaning.

“This has been shown to be the case on many occasions,” says data scientist Duncan Ross, “and that’s for obvious reasons. The sad reality is that humans are a basket of biases which we build up over our lives. Some of them are sensible; many of them aren’t. But by using data and learning from it, we can reduce those biases.” 


In the financial markets, where poor human judgement can lead to eye-watering losses, the vast majority of transactions are now outsourced to algorithms which can react within microseconds to the actions of, well, other algorithms. They’ve had a place in the markets ever since Thomas Peterffy made a killing in the 1980s by using them to detect mispriced stock options (a story told in fascinating detail in the book Automate This by Christopher Steiner), but today data science drives trade. Millions of dollars’ worth of stocks change hands, multiple times, before one trader can shout “sell!”.

We humans have to accept that algorithms can make us look comparatively useless (except when they cause phenomena like Wall Street’s “flash crash” of 2010, when the index lost 1,000 points in a day, before recovering). But that doesn’t necessarily feel like a good place to be.

The increasing amount of donkey work undertaken by algorithms represents a significant shift in responsibility, and by association a loss of control. Data is power, and when you start to consider all the ways in which our lives are affected by the processing of said data, it can feel like a dehumanising step. Edward Snowden revealed the existence of an algorithm to determine whether or not you were a US citizen; if you weren’t, you could be monitored without a warrant. But even aside from the plentiful security and privacy concerns, other stuff is slipping underneath our radar, such as the homogenisation of culture; for many years, companies working in the film and music industry have used algorithms to process scripts and compositions to determine whether they’re worth investing in. Creative ventures that don’t fit the pattern are less likely to come to fruition. The algorithms forged by data scientists, by speeding up processes and saving money, have a powerful, direct impact on all of us.

Little wonder that the Government is taking a slightly belated interest. Last year Vince Cable, the Business Secretary, announced £42m of funding for a new body, the Alan Turing Institute, which is intended to position the UK as a world leader in algorithm research.

The five universities selected to lead that institute (Cambridge, Edinburgh, Oxford, Warwick and UCL) were announced last month; they will lead the efforts to tame and use what’s often referred to as Big Data.

“So many disciplines are becoming dependent upon it, including engineering, science, commerce and medicine,” says Professor Philip Nelson, chief executive of the Engineering and Physical Sciences Research Council, the body co-ordinating the institute’s output. “It was felt very important that we put together a national capability to help in the analysis and interpretation of that data. The idea is to pull together the very best scientists to do the fundamental work in maths and data science to underpin all these activities.”

But is this an attempt to reassert control over a sector that’s wielding an increasing amount of power?

“Not at all,” says Nelson. “More than anything else, it’s about making computers more beneficial to society by using the data better.”

On the one hand we see algorithms used to do pointless work (“the most-depressing day of the year” simply does not exist); on the other we’re told to fear subjugation to our computer overlords. But it’s easy to forget the power of the algorithm to do good.

Duncan Ross is one of the founder directors of DataKind UK, a charity that helps other charities make the best use of the data at their disposal.

“We’re in this world of constrained resources,” he says, “and we can ill afford for charities to be doing things that are ineffective.”

From weekend “datathons” to longer-term, six-month projects, volunteers help charities to solve a range of problems.

“For example,” says Ross, “we did some recent work with Citizens Advice, who have a lot of data coming in from their bureaux.



“They’re keen to know what the next big issue is and how they can spot it quickly; during the payday-loans scandal they felt that they were pretty late to the game, because even though they were giving advice, they were slow to take corporate action. So we worked with them on algorithms that analyse the long-form text reports written by local teams in order to spot new issues more quickly.

“We’re not going to solve all the charities’ problems; they’re the experts working on the ground. What we can do is take their data and help them arrive at better decisions.”

Data sets can be investigated in unexpected ways to yield powerful results. For example, Google has developed a way of aggregating users’ search data to spot flu outbreaks.

“That flu algorithm [Google Flu Trends] picked up on people searching for flu remedies or symptoms,” says Ross, “and by itself it seemed to be performing about as well as the US Centers for Disease Control. If you take the output of that algorithm and use it as part of the decision-making process for doctors, then we really get somewhere.”
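The underlying idea is a statistical fit between search volumes and officially reported flu levels: estimate the weights once, then use this week's searches to estimate this week's flu activity. A minimal sketch with invented numbers follows; the real Google Flu Trends system used many query terms and far more careful modelling.

```python
# Minimal sketch of the Google Flu Trends idea: fit search-term volumes to
# officially reported flu rates, then use searches to estimate current levels.
# The data below is invented for illustration.
import numpy as np

# Weekly volumes (arbitrary units) for two flu-related search terms.
searches = np.array([
    [120,  80],
    [150, 110],
    [300, 260],
    [420, 390],
    [380, 330],
])
cdc_ili_rate = np.array([1.1, 1.4, 2.9, 4.1, 3.6])  # reported %ILI for the same weeks

# Least-squares fit with an intercept term.
X = np.column_stack([searches, np.ones(len(searches))])
coeffs, *_ = np.linalg.lstsq(X, cdc_ili_rate, rcond=None)

this_week = np.array([350, 310, 1.0])  # this week's search volumes plus intercept
print(f"Estimated flu activity this week: {this_week @ coeffs:.2f} %ILI")
```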

But Google, of course, is a private company with its own profit motives, and this provokes another algorithmic fear; that Big Data is being processed by algorithms that might not be working in our best interests. We have no way of knowing; we feel far removed from these processes that affect us day to day.

Ross argues that it’s perfectly normal for us to have little grasp of the work done by scientists.

“How much understanding is there of what they actually do at Cern?” he asks. “The answer is almost none. Sometimes, with things like the Higgs boson, you can turn it into a story where, with a huge amount of anecdote, you can just about make it exciting and interesting – but it’s still a challenge.

“As far as data is concerned, the cutting-edge stuff is a long way from where many organisations are; what they need to be doing is much, much more basic. But there are areas where there are clearly huge opportunities.”

That’s an understatement. As the so-called “internet of things” expands, billions of sensors will surround us, each of them a data point, each of them with algorithmic potential. The future requires us to place enormous trust in data scientists; just like the hopeful romantic in Carry On Loving, we’ll be keeping our fingers crossed that the results emerging from the slot are the ones we’re after.

We’ll also be keeping our fingers crossed that the processes going on out of sight, behind that wall, aren’t overseen by the algorithmic equivalent of Sid James and Hattie Jacques.


Here’s hoping.

Monday 10 February 2014

The algorithm method: how internet dating became everyone's route to a perfect love match

Six million Britons are looking for their perfect partner online before Valentine's Day on Friday, but their chance of success may depend on clever maths rather than charisma
In the summer of 2012, Chris McKinlay was finishing his maths dissertation at the University of California in Los Angeles. It meant a lot of late nights as he ran complex calculations through a powerful supercomputer in the early hours of the morning, when computing time was cheap. While his work hummed away, he whiled away the time on online dating sites, but he didn't have a lot of luck – until one night, when he noted a connection between the two activities.
One of his favourite sites, OkCupid, sorted people into matches using the answers to thousands of questions posed by other users on the site.
"One night it started to dawn on me the way that people answer questions on OkCupid generates a high dimensional dataset very similar to the one I was studying," says McKinlay, and it transformed his understanding of how the system worked. "It wasn't like I didn't like OkCupid before, it was fine, I just realised that there was an interesting problem there."
McKinlay started by creating fake profiles on OkCupid, and writing programs to answer questions that had also been answered by compatible users – the only way to see their answers, and thus work out how the system matched users. He managed to reduce some 20,000 other users to just seven groups, and figured he was closest to two of them. So he adjusted his real profile to match, and the messages started rolling in.
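The article doesn't say which clustering technique McKinlay used, so treat this as a stand-in: a sketch of the general idea, encoding users' answers as vectors and grouping them into a handful of clusters with k-means (the answer data here is randomly generated).

```python
# Sketch of grouping OkCupid-style users into a handful of clusters based on
# their question answers. McKinlay's actual method isn't detailed in the
# article; k-means over numerically encoded answers is used as a stand-in.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Pretend answer matrix: 20,000 users x 50 questions, answers encoded 0-4
# (real profiles involve thousands of possible questions, mostly unanswered).
answers = rng.integers(0, 5, size=(20_000, 50))

kmeans = KMeans(n_clusters=7, n_init=10, random_state=0).fit(answers)

# Which cluster would a given user's own answers fall into?
my_answers = rng.integers(0, 5, size=(1, 50))
print("Closest cluster:", kmeans.predict(my_answers)[0])
print("Cluster sizes:", np.bincount(kmeans.labels_))
```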
McKinlay's operation was possible because OkCupid, and so many other sites like it, are much more than just simple social networks, where people post profiles, talk to their friends, and pick up new ones through common interest. Instead, they seek to actively match up users using a range of techniques that have been developing for decades.
Every site now makes its own claims to "intelligent" or "smart" technologies underlying its service. But these algorithms weren't working well enough for McKinlay, so he wrote his own. He has since written a book, Optimal Cupid, about his technique, while last year Amy Webb, a technology CEO herself, published Data, a Love Story, documenting how she applied her working skills to the tricky business of finding a partner online.
Two people, both unsatisfied by the programmes on offer, wrote their own; but what about the rest of us, less fluent in code? Years of contested research, and moral and philosophical assumptions, have gone into creating today's internet dating sites and their matching algorithms, but are we being well served by them? The idea that technology can make difficult, even painful tasks – including looking for love – easier is a pervasive and seductive one, but are these sites' matchmaking powers overstated?


In the summer of 1965, a Harvard undergraduate named Jeff Tarr decided he was fed up with the university's limited social circle. As a maths student, Tarr had some experience of computers, and although he couldn't program them himself, he was sure they could be used to further his primary interest: meeting girls. With a friend he wrote up a personality quiz for fellow students about their "ideal date" and distributed it to colleges across Boston. Sample questions included: "Is extensive sexual activity [in] preparation for marriage, part of 'growing up?'" and "Do you believe in a God who answers prayer?" The responses flooded in, confirming Tarr's suspicion that there was great demand for such a service among the newly liberated student population. Operation Match was born.
In order to process the answers, Tarr had to rent a five-ton IBM 1401 computer for $100 an hour, and pay another classmate to program it with a special matching operation. Each questionnaire was transferred to a punch-card, fed into the machine, and out popped a list of six potential dates, complete with address, phone number and date of graduation, which was posted back to the applicant. Each of those six matches received the original applicant's details, along with five others, in their own list: the program only matched women with their ideal man if they fitted his ideal too.
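The detail worth noticing is the mutual constraint: a pairing only counted if each person fitted the other's stated ideal. A toy sketch of that rule, with invented questionnaire fields (the original 1965 program is long gone):

```python
# Toy sketch of Operation Match's mutual-match rule: A is only paired with B
# if B fits A's stated ideal AND A fits B's. The fields are invented; the
# original questionnaire and matching program are not reproduced here.
def fits_ideal(candidate, ideal):
    return (ideal["min_height"] <= candidate["height"] and
            candidate["religion"] in ideal["religions"] and
            abs(candidate["age"] - ideal["age"]) <= ideal["age_range"])

def mutual_match(a, b):
    return fits_ideal(b, a["ideal"]) and fits_ideal(a, b["ideal"])

alice = {"height": 165, "age": 20, "religion": "none",
         "ideal": {"min_height": 175, "religions": {"none", "protestant"},
                   "age": 21, "age_range": 3}}
bob = {"height": 180, "age": 22, "religion": "protestant",
       "ideal": {"min_height": 160, "religions": {"none"},
                 "age": 20, "age_range": 2}}

print(mutual_match(alice, bob))  # True only if each fits the other's ideal
```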
When Gene Shalit, a reporter from Look magazine, arrived to cover the emerging computer-dating scene in 1966, Operation Match claimed to have had 90,000 applications and taken $270,000 in revenue. Even at the birth of the computer revolution, the machine seemed to have an aura about it, something which made its matches more credible than a blind date or a friend's recommendation. Shalit quoted a freshman at Brown University who had dumped her boyfriend but started going out with him again when Operation Match sent her his number. "Maybe the computer knows something that I don't know," she said. Shalit imbued it with even more weight, calling it "The Great God Computer".
The computer-dating pioneers were happy to play up to the image of the omniscient machine – and were already wary of any potential stigma attached to their businesses. "Some romanticists complain that we're too commercial," Tarr told reporters. "But we're not trying to take the love out of love; we're just trying to make it more efficient. We supply everything but the spark." In turn, the perceived wisdom of the machine opened up new possibilities for competition in the nascent industry, as start-up services touted the innovative nature of their programs over others. Contact, Match's greatest rival, was founded by MIT graduate student David DeWan and ran on a Honeywell 200 computer, developed in response to IBM's 1401 and operating two to three times faster. DeWan made the additional claim that Contact's questions were more sophisticated than Match's nationwide efforts, because they were restricted to elite college students. In essence, it was the first niche computer-dating service.
Over the years since Tarr first started sending out his questionnaires, computer dating has evolved. Most importantly, it has become online dating. And with each of these developments – through the internet, home computing, broadband, smartphones, and location services – the turbulent business and the occasionally dubious science of computer-aided matching has evolved too. Online dating continues to hold up a mirror not only to the mores of society, which it both reflects and shapes, but to our attitudes to technology itself.
The American National Academy of Sciences reported in 2013 that more than a third of people who married in the US between 2005 and 2012 met their partner online, and half of those met on dating sites. The rest met through chatrooms, online games, and elsewhere. Preliminary studies also showed that people who met online were slightly less likely to divorce and claimed to be happier in their marriages. The latest figures from online analytics company Comscore show that the UK is not far behind, with 5.7 million people visiting dating sites every month, and 49 million across Europe as a whole, or 12% of the total population. Most telling for the evolution of online dating is that the biggest growth demographic in 2012 was the 55+ age range, accounting for 39% of visitors. When online dating moves not only beyond stigma, but beyond the so-called "digital divide" to embrace older web users, it might be said to have truly arrived.
It has taken a while to get there. Match.com, founded in 1993, was the first big player, is still the biggest worldwide, and epitomises the "online classifieds" model of internet dating. Match.com doesn't make any bold claims about who you will meet; it just promises there'll be loads of them. eHarmony, which followed in 2000, was different, promising to guide its users towards long-term relationships – not just dating, but marriage. It believed it could do this thanks to the research of its founder, Neil Clark Warren, a then 76-year-old psychologist and divinity lecturer from rural Iowa. His three years of research on 5,000 married couples laid the basis for a truly algorithmic approach to matching: the results of a 200-question survey of new members (the "core personality traits"), together with the communication patterns revealed as they used the site.
Whatever you may think of eHarmony's approach – and many contest whether it is scientifically possible to generalise from married people's experiences to the behaviour of single people – they are very serious about it. Since launch, they have surveyed another 50,000 couples worldwide, according to the current vice-president of matching, Steve Carter. When they launched in the UK, they partnered with Oxford University to research 1,000 British couples "to identify any cultural distinctions between the two markets that should be represented by the compatibility algorithms". And when challenged by lawsuits for refusing to match gay and lesbian people, assumed by many to be a result of Warren's conservative Christian views (his books were previously published in partnership with the conservative pressure group, Focus on the Family), they protested that it wasn't morality, but mathematics: they simply didn't have the data to back up the promise of long-term partnership for same-sex couples. As part of a settlement in one such lawsuit, eHarmony launched Compatible Partners in 2009.
Carter says: "The Compatible Partners system is now based on models developed using data collected from long-term same-sex couples."

With the rise of Facebook, Twitter and celebrity-driven online media have come more personalised and data-driven sites such as OkCupid, where Chris McKinlay started his operation. These services rely on the user supplying not only explicit information about what they are looking for, but a host of assumed and implicit information as well, based on their morals, values, and actions. What underlies them is a growing reliance not on stated preferences – for example, eHarmony's 200-question surveys result in a detailed profile entitled "The Book of You" – but on actual behaviour; not what people say, but what they do.
In 2007, Gavin Potter made headlines when he competed successfully in the Netflix Prize, a $1m competition run by the online movie giant to improve the recommendations its website offered members. Despite competition from teams composed of researchers from telecoms giants and top maths departments, Potter was consistently in the top 10 of the leaderboard. A retired management consultant with a degree in psychology, Potter believed he could predict more about viewers' tastes from past behaviour than from the contents of the movies they liked, and his maths worked. He was contacted by Nick Tsinonis, the founder of a small UK dating site called yesnomayb, who asked him to see if his approach, called collaborative filtering, would work on people as well as films.
Collaborative filtering works by collecting the preferences of many people and grouping them into sets of similar users. Because there's so much data, and so many people, what exactly these groups have in common isn't always clear to anyone but the algorithm, but it works.

The approach was so successful that Tsinonis and Potter created a new company, RecSys, which now supplies some 10 million recommendations a day to thousands of sites. RecSys adjusts its algorithm for the different requirements of each site – what Potter calls the "business rules" – so for a site such as Lovestruck.com, which is aimed at busy professionals, the business rules push the recommendations towards people with nearby offices who might want to nip out for a coffee, but the powerful underlying maths is Potter's. Likewise, while British firm Global Personals provides the infrastructure for some 12,000 niche sites around the world, letting anyone set up and run their own dating website aimed at anyone from redheads to petrolheads, all 30 million of their users are being matched by RecSys.

Potter says that while they started with dating, "the technology works for almost anything". RecSys is already powering the recommendations for art discovery site ArtFinder, the similar-articles search on research database Nature.com, and the backend to a number of photography websites. Of particular interest to the company is a recommendation system for mental health advice site Big White Wall. Because its users come to the site looking for emotional help, but may well be unsure what exactly they are looking for, RecSys might be able to unearth patterns of behaviour new to both patients and doctors, just as it reveals the unspoken and possibly even unconscious proclivities of daters.
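RecSys's production system is proprietary, but the basic shape of user-based collaborative filtering described above can be sketched in a few lines: compare users' preference vectors by cosine similarity, then suggest things their nearest "neighbours" liked. The ratings below are invented.

```python
# Minimal user-based collaborative filtering sketch. Users are compared by
# cosine similarity over their ratings, and unseen items liked by the most
# similar users are recommended. All data here is made up.
import numpy as np

users = ["ann", "ben", "cat", "dan"]
items = ["item_a", "item_b", "item_c", "item_d", "item_e"]
# Rows: users, columns: items; 0 means "not rated yet".
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 0, 0, 1],
    [1, 0, 5, 4, 0],
    [0, 1, 4, 5, 3],
], dtype=float)

def recommend(user_index, k=2):
    sims = []
    for other in range(len(users)):
        if other == user_index:
            continue
        a, b = ratings[user_index], ratings[other]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        sims.append((sim, other))
    neighbours = [other for _, other in sorted(sims, reverse=True)[:k]]
    # Score unseen items by the neighbours' average rating for them.
    scores = ratings[neighbours].mean(axis=0)
    unseen = ratings[user_index] == 0
    best = np.argmax(np.where(unseen, scores, -np.inf))
    return items[best]

print("Suggestion for ann:", recommend(0))
```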

Back in Harvard in 1966, Jeff Tarr dreamed of a future version of his Operation Match programme which would operate in real time and real space. He envisioned installing hundreds of typewriters all over campus, each one linked to a central "mother computer". Anyone typing their requirements into such a device would receive "in seconds" the name of a compatible match who was also free that night. Recently, Tarr's vision has started to become a reality with a new generation of dating services, driven by the smartphone.
Suddenly, we don't need the smart algorithms any more; we just want to know who is nearby. But even these new services sit atop a mountain of data; less like Facebook, and a lot more like Google.
Tinder, founded in Los Angeles in 2012, is the fastest-growing dating app on mobile phones but its founders don't like calling it that. According to co-founder and chief marketing officer Justin Mateen, Tinder is "not an online dating app, it's a social network and discovery tool".
He also believes that Tinder's core mechanic, where users swipe through Facebook snapshots of potential matches in the traditional "Hot or Not" format, is not simple, but more sophisticated: "It's the dynamic of the pursuer and the pursued, that's just how humans interact." Tinder, however, is much less interested in the science of matching up couples than its predecessors. When asked what they have learned about people from the data they have gathered, Mateen says the thing he is most looking forward to seeing is "the number of matches that a user needs over a period of time before they're addicted to the product" – a precursor of Tinder's expansion into other areas of ecommerce and business relationships.
Tinder's plans are the logical extension of the fact that the web has really turned out to be a universal dating medium, whatever it says on the surface. There are plenty of sites out there deploying the tactics and metrics of dating sites without actually using the D-word. Whether it's explicit – such as Tastebuds.fm, which matches up "concert buddies" based on their Spotify music tastes – or subtle, the lessons of dating research have been learned by every "social" site on the web. Nearly every Silicon Valley startup video features two photogenic young people being brought together, whatever the product, and the same matching algorithms are at work whether you're looking for love, a jobbing plumber, or a stock photograph.
Over at UCLA, Chris McKinlay's strategy seems to have paid off. After gathering his data and optimising his profile, he started receiving 10-12 unsolicited messages every day: an unheard-of figure online, where the preponderance of creeps tends to put most women on the defensive. He went on 87 dates, mostly just a coffee, which "were really wonderful for the most part". The women he met shared his interests, were "really intelligent, creative, funny" and there was almost always some attraction. But on the 88th date, something deeper clicked. A year later, he proposed.
Online dating has always been in part about the allure and convenience of the technology, but it has mostly been about just wanting to find "the one". The success of recommendation systems, which are just as applicable to products as people, says much about the ability of computers to predict the more fundamental attractions that would have got McKinlay there sooner – his algorithms improved his ability to get dates, but not so much the likelihood of them progressing further.
In the end, the development of online dating tells us more about our relationship with networked technology than with each other: from "the Great God Computer", to a profusion of data that threatens to overwhelm us, to the point where it is integrated, seamlessly and almost invisibly, with every aspect of our daily lives.