Government Aims to Build a ‘Data Eye in the Sky’

By JOHN MARKOFF

Published: October 10, 2011

More than 60 years ago, in his “Foundation” series, the science fiction novelist Isaac Asimov invented a new science — psychohistory — that combined mathematics and psychology to predict the future.

Now social scientists are trying to mine the vast resources of the Internet — Web searches and Twitter messages, Facebook and blog posts, the digital location trails generated by billions of cellphones — to do the same thing.

The most optimistic researchers believe that these storehouses of “big data” will for the first time reveal sociological laws of human behavior — enabling them to predict political crises, revolutions and other forms of social and economic instability, just as physicists and chemists can predict natural phenomena.

“This is a significant step forward,” said Thomas Malone, the director of the Center for Collective Intelligence at the Massachusetts Institute of Technology. “We have vastly more detailed and richer kinds of data available as well as predictive algorithms to use, and that makes possible a kind of prediction that would have never been possible before.”

The government is showing interest in the idea. This summer a little-known intelligence agency began seeking ideas from academic social scientists and corporations for ways to automatically scan the Internet in 21 Latin American countries for “big data,” according to a research proposal being circulated by the agency. The three-year experiment, to begin in April, is being financed by the Intelligence Advanced Research Projects Activity, or Iarpa (pronounced eye-AR-puh), part of the office of the director of national intelligence.

The automated data collection system is to focus on patterns of communication, consumption and movement of populations. It will use publicly accessible data, including Web search queries, blog entries, Internet traffic flow, financial market indicators, traffic webcams and changes in Wikipedia entries.

It is intended to be an entirely automated system, a “data eye in the sky” without human intervention, according to the program proposal. The research would not be limited to political and economic events, but would also explore the ability to predict pandemics and other types of widespread contagion, something that has been pursued independently by civilian researchers and by companies like Google.

Some social scientists and advocates of privacy rights are deeply skeptical of the project, saying it evokes queasy memories of Total Information Awareness, a post-9/11 Pentagon program that proposed hunting for potential attackers by identifying patterns in vast collections of public and private data: telephone calling records, e-mail, travel data, visa and passport information, and credit card transactions.

“I have Total Information Awareness flashbacks when things like this happen,” said David Price, an anthropologist at St. Martin’s University in Lacey, Wash., who has written about cooperation between social scientists and intelligence agencies. “On the one hand it’s understandable for a nation-state to want to track things like the outbreak of a pandemic, but I have to wonder about the total automation of this and what productive will come of it.”

Iarpa officials declined to discuss the research program, saying they are prohibited from giving interviews until contract awards are made later this year.

A similar project by their military sister organization, the Defense Advanced Research Projects Agency, or Darpa, aims to automatically identify insurgent social networks in Afghanistan.

In its most recent budget proposal, the defense agency argues that its analysis can expose terrorist cells and other stateless groups by tracking their meetings, rehearsals and sharing of material and money transfers.

So far there have been only scattered examples of the potential of mining social media. Last year HP Labs researchers used Twitter data to accurately predict box office revenues of Hollywood movies. In August, the National Science Foundation approved funds for research in using social media like Twitter and Facebook to assess earthquake damage in real time.

The accessibility and computerization of huge databases has already begun to spur the development of new statistical techniques and new software to manage data sets with trillions of entries or more.

“Big data allows one to move beyond inference and statistical significance and move toward meaningful and accurate analyses,” said Norman Nie, a political scientist who was a pioneering developer of statistical tools for social scientists and who recently formed a new company, Revolution Analytics, to develop software for the analysis of immense data sets.