Big data: early years and foundational pieces
An early mention of the upcoming “Industrial Revolution of data” can be found in a blog post by Joe Hellerstein, a computer scientist at the University of California, Berkeley. It was published in November 2008, a few months after Wired had claimed that the ‘data deluge’ would signify “the end of theory” and make the ‘scientific method’ obsolete, as numbers would speak for themselves. Then, in 2009, a group of leading computer and social scientists published a commentary in Science describing a new academic field that explores data to reveal patterns of individual and group behaviour: computational social science. In early 2010, The Economist ran an article on the data deluge as part of a special report that stirred significant interest and remains highly informative today. The Wall Street Journal’s feature The really smart phone, published in 2011, and the New York Times’ opinion article The age of Big Data, published in 2012, both had a similar impact.
Recently, there has been an explosion in the number of publications about big data and international development, but three reports published within a few months of each other in 2011-2012 can be considered seminal pieces in the field: the McKinsey Global Institute’s Big data: the next frontier for innovation, competition and productivity, the World Economic Forum’s Big data, big impact: new possibilities for international development and UN Global Pulse’s Big data for development: challenges and opportunities. Other noteworthy contributions include Martin Hilbert’s literature review Big data for development: from information- to knowledge societies and the chapter ‘Big data for conflict prevention’ in a report by the International Peace Institute.
Several recent books that popularise the topic merit mention. One is Viktor Mayer-Schönberger and Kenneth Cukier’s Big Data: a revolution that will transform how we live, work and think. Others are Honest Signals: how they shape our world by Massachusetts Institute of Technology professor Alex ‘Sandy’ Pentland and his more recent Social physics; a collection of essays edited by Lisa Gitelman titled Raw data is an oxymoron; and, focusing on linguistic analysis, Uncharted by Erez Aiden and Jean-Baptiste Michel.
A few websites provide links to academic publications about big data or document research that makes use of big data. Notable examples include the Harvard Big Data for Social Good group’s publications page and the Massachusetts Institute of Technology’s Human Dynamics Lab webpage.
Impact on research, society and development
Harvard University professor Gary King, one of the contributors to the 2009 Science article, has written many seminal articles about big data, notably a commentary on the data-rich future of the social sciences in addition to technical papers. He has also talked about The social science data revolution, describing the opportunities and requirements of conducting social science research in the age of big data. On the specific issue of how big data affects social science research, a paper titled The data revolution and economic analysis by Liran Einav and Jonathan D. Levin is an engaging read, as is a recent post by Angus Whyte on the London School of Economics’ blog.
Over the past couple of years, thousands of media articles and editorials have covered big data’s impact on society. One of the most comprehensive is also one of the most recent: a Harvard Business Review feature published only a few weeks ago. The Guardian online has an excellent dedicated section (a ‘data store’) on big data. Many other informative articles can be found by running a keyword search for big data on the websites of major newspapers and magazines such as Forbes, The Wall Street Journal, the New York Times and, for French speakers, Le Monde. Specialised publications that offer more technical articles include The Technology Review and Wired.
Critical summaries of key resources and challenges for big data and development can be found in a 2013 post on the website of European think-tank Bruegel, and in a post on poverty monitoring on the Development Progress website facilitated by UK think-tank the Overseas Development Institute. Other good resources for an overview of the field include a Flipboard and a timeline curated by human rights activist Sanjana Hattotuwa.
The dark side: ethical problems of big data
Issues of individual privacy, ethics and human rights around the use of big data are getting increasing attention. A good summary of the main positions and contributions in the ‘privacy debate’ can be found in a recent post on the NGO Privacy International’s website, which also contains many other valuable articles on big data. Among the most prominent critics of big data are researchers Danah Boyd and Kate Crawford, who expressed scepticism in their 2011 essay Six provocations for big data and have continued to do so since, either independently or with other co-authors.
Useful summaries of the key challenges of using big data in crisis contexts can be found on the Human Rights Data Analysis Group’s website, notably in a blog post by Patrick Ball, and on the website of tech-training organisation TechChange. Various events are also dedicated to using data responsibly, including the Responsible Data Forum series.
Following the data — institutions and programmes
Responding to the promise of big data, several large foundations, such as the Knight Foundation, the Rockefeller Foundation and the Bill & Melinda Gates Foundation, have already shown interest in the field. Many of them are positioning this work under the larger umbrella of the data revolution agenda as part of the post-2015 framework of development goals; a good source of information on the topic is the post-2015 site. A topic attracting growing attention is the impact of big data on official statistics, with useful information provided on the websites of the UN Economic Commission for Europe and the UN Statistics Division.
Many other organisations provide great resources through their work and websites. UN Global Pulse, an innovation unit located in the Executive Office of the UN Secretary-General, has published two useful primers: one on mobile-phone network data for development, hosted on its regular blog, and another on big data for development. Another organisation with interesting resources is the Qatar Computing Research Institute. In academia, leading universities and programmes where valuable resources can be found include Harvard University’s Institute for Quantitative Social Science, the University of California, Berkeley’s D-Lab, Columbia University’s Institute for Data Sciences and Engineering and the Harvard School of Public Health’s Big Data for Social Good group, all in the United States, as well as the United Kingdom’s Oxford Internet Institute.
Several universities have started offering study programmes in data science — some are available online and many are listed here, while free offerings are available on Coursera. The University of Chicago offers a data science for social good fellowship. At present, there does not seem to be any course focusing specifically on development and big data or data science.
The website of the Data for Development (D4D) group — an informal consortium of institutions led by the mobile phone operator Orange — offers numerous resources related to a big data research competition it organised in 2012-13. GSMA, a global association of mobile phone operators and a D4D member, also provides resources on big data and development through its work on personal data. The World Economic Forum’s own work on personal data is also worth considering.
New institutes and partnerships have also recently been created, including the Data and Society Research Institute and the developing data-pop initiative.
Groups, networks and events
Anyone interested in big data and statistics can join groups including Stanford University’s Statistics for Social Good working group and Google’s Data Science for Social Good group. A few bloggers are especially active. One is Patrick Meier, whose iRevolution blog includes a whole series of posts on big data, especially applications related to humanitarian assistance and crises. Another is Jay Ulfelder, whose Dart-Throwing Chimp blog is especially useful on issues of forecasting.
A further source of information on big data is of course Twitter, notably the hashtags #bigdata (which yields close to one post per second), #bigdata4dev and simply #data4dev.
There is a plethora of events and forums on big data with direct relevance to development: NetMob conferences on the analysis of mobile phone data, conferences organised by the Strata software company, the International Conference of Crisis Mappers’ panel on big data, and TED talks on the topic. Other noteworthy videos include Sandy Pentland’s interview for the games-industry magazine The Edge, Kate Crawford’s presentation at the Strata 2013 conference and The Economist data editor Kenneth Cukier’s talk at the latest The Next Web conference. Readers and viewers with a bit of time and an interest in the technical aspects of big data can watch a presentation by Nathan Eagle and the Unconference on the Future of Statistics, or listen to a recent interview with one of its participants, Daniela Witten.
Emmanuel Letouzé is a PhD candidate at the University of California, Berkeley, United States, a fellow at the Harvard Humanitarian Initiative, a visiting scholar at the Massachusetts Institute of Technology’s Media Lab and a research associate at the Overseas Development Institute. He is also co-founder and director of data-pop. He can be contacted on Twitter @Data4Dev.
This article is part of the Spotlight on Data for development.