How Facebook could get you arrested

Smart technology and the sort of big data available to social networking sites are helping police target crime before it happens. But is this ethical?

- Evgeny Morozov
- The Observer, Saturday 9 March 2013 19.20 GMT

Companies such as Facebook have begun using algorithms and historical data to predict which of their users might commit crimes. Illustration: Noma Bar

The police have a very bright future ahead of them – and not just because they can now look up potential suspects on Google. As they embrace the latest technologies, their work is bound to become easier and more effective, raising thorny questions about privacy, civil liberties, and due process.

For one, policing is in a good position to profit from "big data". As the costs of recording devices keep falling, it's now possible to spot and react to crimes in real time. Consider a city like Oakland in California. Like many other American cities, today it is covered with hundreds of hidden microphones and sensors, part of a system known as ShotSpotter, which not only alerts the police to the sound of gunshots but also triangulates their location. On verifying that the noises are actual gunshots, a human operator then informs the police.

It's not hard to imagine ways to improve a system like ShotSpotter. Gunshot-detection systems are, in principle, reactive; they might help to thwart or quickly respond to crime, but they won't root it out. The decreasing costs of computing, considerable advances in sensor technology, and the ability to tap into vast online databases allow us to move from identifying crime as it happens – which is what ShotSpotter does now – to predicting it before it happens.

Instead of detecting gunshots, new and smarter systems can focus on detecting the sounds that have preceded gunshots in the past. This is where the techniques and ideologies of big data make another appearance, promising that a greater, deeper analysis of data about past crimes, combined with sophisticated algorithms, can predict – and prevent – future ones. This is a practice known as "predictive policing", and even though it's just a few years old, many tout it as a revolution in how police work is done. It's the epitome of solutionism; there is hardly a better example of how technology and big data can be put to work to solve the problem of crime by simply eliminating crime altogether. It all seems too easy and logical; who wouldn't want to prevent crime before it happens?

Police in America are particularly excited about what predictive policing – one of Time magazine's best inventions of 2011 – has to offer; Europeans are slowly catching up as well, with Britain in the lead. Take the Los Angeles Police Department (LAPD), which is using software called PredPol. The software analyses years of previously published statistics about property crimes such as burglary and automobile theft, breaks the patrol map into zones of 500ft by 500ft, calculates the historical distribution and frequency of actual crimes across them, and then tells officers which zones to police more vigorously.
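
The zone-ranking step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PredPol's actual method, which is proprietary: it assumes past incidents arrive as (x, y) coordinates in feet, uses an assumed cell side of 500ft, and simply ranks cells by how many past incidents fell inside them. The names rank_zones and CELL_FT are invented for the example.

```python
from collections import Counter

CELL_FT = 500  # assumed side length of one grid cell, in feet (hypothetical)


def rank_zones(incidents, top_n=3, cell_ft=CELL_FT):
    """Rank grid cells by how many past incidents fell inside them.

    incidents: iterable of (x_ft, y_ft) coordinates of past reported crimes.
    Returns up to top_n ((col, row), count) pairs, busiest first.
    """
    # Map every incident to the grid cell that contains it, then count per cell.
    counts = Counter(
        (int(x // cell_ft), int(y // cell_ft)) for x, y in incidents
    )
    # Sort by count descending, then by cell index so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    # Made-up coordinates, purely for demonstration.
    past = [(120, 80), (450, 300), (30, 499), (700, 100), (1800, 1800), (760, 20)]
    for cell, count in rank_zones(past, top_n=2):
        print(cell, count)
```

Because the ranking is driven entirely by reported incidents, a sketch like this can only ever send patrols back to where crime has already been recorded.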

It's much better – and potentially cheaper – to prevent a crime before it happens than to come late and investigate it. So while patrolling officers might not catch a criminal in action, their presence in the right place at the right time still helps to deter criminal activity. Occasionally, though, the police might indeed disrupt an ongoing crime. In June 2012 the Associated Press reported on an LAPD captain who wasn't so sure that sending officers into a grid zone on the edge of his coverage area – following PredPol's recommendation – was such a good idea. His officers, as the captain expected, found nothing; however, when they returned several nights later, they caught someone breaking a window. Score one for PredPol?

Trials of PredPol and similar software began too recently to speak of any conclusive results. Still, the intermediate results look quite impressive. In Los Angeles, five LAPD divisions that use it in patrolling territory populated by roughly 1.3m people have seen crime decline by 13%. The city of Santa Cruz, which now also uses PredPol, has seen its burglaries decline by nearly 30%. Similar uplifting statistics can be found in many other police departments across America.

Other powerful systems that are currently being built can also be easily reconfigured to suit more predictive demands. Consider the New York Police Department's latest innovation – the so-called Domain Awareness System – which syncs the city's 3,000 closed-circuit camera feeds with arrest records, 911 calls, licence plate recognition technology, and radiation detectors. It can monitor a situation in real time and draw on a lot of data to understand what's happening. The leap from here to predicting what might happen is not so great.

If PredPol's "prediction" sounds familiar, that's because its methods were inspired by those of prominent internet companies. Writing in The Police Chief magazine in 2009, a senior LAPD officer lauded Amazon's ability to "understand the unique groups in their customer base and to characterise their purchasing patterns", which allows the company "not only to anticipate but also to promote or otherwise shape future behaviour". Thus, just as Amazon's algorithms make it possible to predict what books you are likely to buy next, similar algorithms might tell the police how often – and where – certain crimes might happen again. Ever stolen a bicycle? Then you might also be interested in robbing a grocery store.

Here we run into the perennial problem of algorithms: their presumed objectivity and quite real lack of transparency. We can't examine Amazon's algorithms; they are completely opaque and have not been subject to outside scrutiny. Amazon claims, perhaps correctly, that secrecy allows it to stay competitive. But can the same logic be applied to policing? If no one can examine the algorithms – which is likely to be the case as predictive-policing software will be built by private companies – we won't know what biases and discriminatory practices are built into them. And algorithms increasingly dominate many other parts of our legal system; for example, they are also used to predict how likely a certain criminal, once on parole or probation, is to kill or be killed. Developed by a University of Pennsylvania professor, this algorithm has been tested in Baltimore, Philadelphia and Washington DC. Such probabilistic information can then influence sentencing recommendations and bail amounts, so it's hardly trivial.

Los Angeles police arrest a man. The force is using predictive software to direct its patrols. Photograph: Robert Nickelsberg/Getty Images

But how do we know that the algorithms used for prediction do not reflect the biases of their authors? For example, crime tends to happen in poor and racially diverse areas. Might algorithms – with their presumed objectivity – sanction even greater racial profiling? In most democratic regimes today, police need probable cause – some evidence and not just guesswork – to stop people in the street and search them. But armed with such software, can the police simply say that the algorithms told them to do it? And if so, how will the algorithms testify in court? Techno-utopians will probably overlook such questions and focus on the abstract benefits that algorithmic policing has to offer; techno-sceptics, who start with some basic knowledge of the problems, constraints and biases that already pervade modern policing, will likely be more critical.

Legal scholar Andrew Guthrie Ferguson has studied predictive policing in detail. Ferguson cautions against putting too much faith in the algorithms and succumbing to information reductionism. "Predictive algorithms are not magic boxes that divine future crime, but instead probability models of future events based on current environmental vulnerabilities," he notes.

But why do they work? Ferguson points out that there will be future crime not because there was past crime but because "the environmental vulnerability that encouraged the first crime is still unaddressed". When the police, having read their gloomy forecast about yet another planned car theft, see an individual carrying a screwdriver in one of the predicted zones, this might provide reasonable suspicion for a stop. But, as Ferguson notes, if the police arrested the gang responsible for prior crimes the day before, but the model does not yet reflect this information, then prediction should be irrelevant, and the police will need some other reasonable ground for stopping the individual. If they do make the stop, then they shouldn't be able to say in court, "The model told us to." This, however, may not be obvious to the person they have stopped, who has no familiarity with the software and its algorithms.

Then there's the problem of under-reported crimes. While most homicides are reported, many rapes and home break-ins are not. Even in the absence of such reports, local police still develop ways of knowing when something odd is happening in their neighbourhoods. Predictive policing, on the other hand, might replace such intuitive knowledge with a naive belief in the comprehensive power of statistics. If only data about reported crimes are used to predict future crimes and guide police work, some types of crime might be left unstudied – and thus unpursued.

What to do about the algorithms then? It is a rare thing to say these days but there is much to learn from the financial sector in this regard. For example, after a couple of disasters caused by algorithmic trading in August 2012, financial authorities in Hong Kong and Australia drafted proposals to establish regular independent audits of the design, development and modification of the computer systems used for algorithmic trading. Thus, just as financial auditors could attest to a company's balance sheet, algorithmic auditors could verify if its algorithms are in order.

As algorithms are further incorporated into our daily lives – from Google's Autocomplete to PredPol – it seems prudent to subject them to regular investigations by qualified and ideally public-spirited third parties. One advantage of the auditing solution is that it won't require the audited companies publicly to disclose their trade secrets, which has been the principal objection – voiced, of course, by software companies – to increasing the transparency of their algorithms.

The police are also finding powerful allies in Silicon Valley. Companies such as Facebook have begun using algorithms and historical data to predict which of their users might commit crimes using their services. Here is how it works: Facebook's own predictive systems can flag certain users as suspicious by studying certain behavioural cues: the user only writes messages to others under 18; most of the user's contacts are female; the user is typing keywords like "sex" or "date." Staffers can then examine each case and report users to the police as necessary. Facebook's concern with its own brand here is straightforward: no one should think that the platform is harbouring criminals.
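
To make the flag-then-review idea concrete, here is a minimal rule-based sketch in Python. Facebook's real system is not public, so everything here is an assumption: the UserActivity record, the cue names, the "more than half" test for contacts and the two-cue threshold are all invented for illustration. Only the three cues themselves and the keywords "sex" and "date" come from the description above.

```python
from dataclasses import dataclass, field

# Keywords named in the article; a real system would use far more than these.
KEYWORDS = {"sex", "date"}


@dataclass
class UserActivity:
    """Hypothetical summary of one user's behaviour (all fields are assumptions)."""
    recipient_ages: list = field(default_factory=list)   # ages of people messaged
    contact_genders: list = field(default_factory=list)  # e.g. "f" or "m"
    messages: list = field(default_factory=list)         # message texts


def cues_triggered(user):
    """Return the names of the behavioural cues this user trips."""
    cues = []
    # Cue 1: every recipient is under 18.
    if user.recipient_ages and all(age < 18 for age in user.recipient_ages):
        cues.append("only_messages_minors")
    # Cue 2: more than half of the contacts are female (threshold is invented).
    if user.contact_genders:
        share_f = user.contact_genders.count("f") / len(user.contact_genders)
        if share_f > 0.5:
            cues.append("mostly_female_contacts")
    # Cue 3: any message contains one of the keywords as a whole word.
    words = {w.strip(".,!?\"'").lower() for m in user.messages for w in m.split()}
    if words & KEYWORDS:
        cues.append("keyword_match")
    return cues


def needs_human_review(user, min_cues=2):
    """Flag for staff review when at least min_cues cues fire (threshold is invented)."""
    return len(cues_triggered(user)) >= min_cues
```

Even in this toy form, the output is only a queue for human reviewers, and where the thresholds are set decides how many innocent users end up in it.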

In 2011 Facebook began using PhotoDNA, a Microsoft service that allows it to scan every uploaded picture and compare it with child-porn images from the FBI's National Crime Information Centre. Since then it has expanded its analysis beyond pictures as well. In mid-2012 Reuters reported on how Facebook, armed with its predictive algorithms, apprehended a middle-aged man chatting about sex with a 13-year-old girl, arranging to meet her the day after. The police contacted the teen, took over her computer, and caught the man.

Facebook is at the cutting edge of algorithmic surveillance here: just like police departments that draw on earlier crime statistics, Facebook draws on archives of real chats that preceded real sex assaults. Curiously, Facebook justifies its use of algorithms by claiming that they tend to be less intrusive than humans. "We've never wanted to set up an environment where we have employees looking at private communications, so it's really important that we use technology that has a very low false-positive rate," Facebook's chief of security told Reuters.

It's difficult to question the application of such methods to catching sexual predators who prey on children (not to mention that Facebook may have little choice here, as current US child-protection laws require online platforms used by teens to be vigilant about predators). But should Facebook be allowed to predict any other crimes? After all, it can easily engage in many other kinds of similar police work: detecting potential drug dealers, identifying potential copyright violators (Facebook already prevents its users from sharing links to many file-sharing sites), and, especially in the wake of the 2011 riots in the UK, predicting the next generation of troublemakers. And as such data becomes available, the temptation to use it becomes almost irresistible.

That temptation was on full display following the rampage in a Colorado movie theatre in July 2012, when an isolated gunman went on a killing spree, murdering 12 people. A headline that appeared in the Wall Street Journal soon after the shooting says it all: "Can Data Mining Stop the Killing?" It won't take long for this question to be answered in the affirmative.

In many respects, internet companies are in a much better position to predict crime than police. Where the latter need a warrant to assess someone's private data, the likes of Facebook can look up their users' data whenever they want. From the perspective of police, it might actually be advantageous to have Facebook do all this dirty work, because Facebook's own investigations don't have to go through the court system.

While Facebook probably feels too financially secure to turn this into a business – it would rather play up its role as a good citizen – smaller companies might not resist the temptation to make a quick buck. In 2011 TomTom, a Dutch satellite-navigation company that has now licensed some of its almighty technology to Apple, found itself in the middle of a privacy scandal when it emerged that it had been selling GPS driving data collected from customers to the police. Privacy advocate Chris Soghoian has likewise documented the easy-to-use "pay-and-wiretap" interfaces that various internet and mobile companies have established for law enforcement agencies.

Publicly available information is up for grabs too. Thus, police are already studying social-networking sites for signs of unrest, often with the help of private companies. The title of a recent brochure from Accenture urges law enforcement agencies to "tap the power of social media to drive better policing outcomes". Plenty of companies are eager to help. ECM Universe, a start-up from Virginia, US, touts its system, called Rapid Content Analysis for Law Enforcement, which is described as "a social media surveillance solution providing real-time monitoring of Twitter, Facebook, Google groups, and many other communities where users express themselves freely".

"The solution," notes the ECM brochure, "employs text analytics to correlate threatening language to surveillance subjects, and alert investigators of warning signs." What kind of warning signs? A recent article in the Washington Post notes that ECM Universe helped authorities in Fort Lupton, Colorado, identify a man who was tweeting such menacing things as "kill people" and "burn [expletive] school". This seems straightforward enough but what if it was just "harm people" or "police suck"?

As companies like ECM Universe accumulate extensive archives of tweets and Facebook updates sent by actual criminals, they will also be able to predict the kinds of non-threatening verbal cues that tend to precede criminal acts. Thus, even tweeting that you don't like your yoghurt might bring police to your door, especially if someone who tweeted the same thing three years before ended up shooting someone in the face later in the day.

However, unlike Facebook, neither police nor outside companies see the whole picture of what users do on social media platforms: private communications and "silent" actions – clicking links and opening pages – are invisible to them. But Facebook, Twitter, Google and similar companies surely know all of this – so their predictive power is much greater than the police's. They can even rank users based on how likely they are to commit certain acts.

An apt illustration of how such a system can be abused comes from The Silicon Jungle, ostensibly a work of fiction written by a Google data-mining engineer and published by Princeton University Press – not usually a fiction publisher – in 2010. The novel is set in the data-mining operation of Ubatoo – a search engine that bears a striking resemblance to Google – where a summer intern develops Terrorist-o-Meter, a sort of universal score of terrorism aptitude that the company could assign to all its users. Those unhappy with their scores would, of course, get a chance to correct them – by submitting even more details about themselves. This might seem like a crazy idea but – in perhaps another allusion to Google – Ubatoo's corporate culture is so obsessed with innovation that its interns are allowed to roam free, so the project goes ahead.

To build Terrorist-o-Meter, the intern takes a list of "interesting" books that indicate a potential interest in subversive activities and looks up the names of the customers who have bought them from one of Ubatoo's online shops. Then he finds the websites that those customers frequent and uses the URLs to find even more people – and so on until he hits the magic number of 5,000. The intern soon finds himself pursued by both an al-Qaida-like terrorist group that wants those 5,000 names to boost its recruitment campaign, as well as various defence and intelligence agencies that can't wait to preemptively ship those 5,000 people to Guantánamo.

We don't know if Facebook has some kind of Paedophile-o-Meter. But, given the extensive user analysis it already does, it probably wouldn't be very hard to build one – and not just for scoring paedophiles. What about Drug-o-Meter? Or – Joseph McCarthy would love this – Communist-o-Meter? Given enough data and the right algorithms, all of us are bound to look suspicious. What happens, then, when Facebook turns us – before we have committed any crimes – over to the police? Will we, like characters in a Kafka novel, struggle to understand what our crime really is and spend the rest of our lives clearing our names? Will Facebook perhaps also offer us a way to pay a fee to have our reputations restored? What if its algorithms are wrong?

The promise of predictive policing might be real, but so are its dangers. The solutionist impulse needs to be restrained. Police need to subject their algorithms to external scrutiny and address their biases. Social networking sites need to establish clear standards for how much predictive self-policing they'll actually do and how far they will go in profiling their users and sharing this data with police. While Facebook might be more effective than police in predicting crime, it cannot be allowed to take on these policing functions without also adhering to the same rules and regulations that spell out what police can and cannot do in a democracy. We cannot circumvent legal procedures and subvert democratic norms in the name of efficiency alone.

This is an edited extract from To Save Everything, Click Here: Technology, Solutionism, and the Urge to Fix Problems that Don't Exist by Evgeny Morozov, published by Allen Lane.