
An evening spent among big data enthusiasts from different parts of the world left me with some useful insights into the use of big data for development — but also raised some doubts in my mind about the challenges that come with it.
The event, a panel discussion entitled “What is the future of official statistics in the Big Data era?”, was held at the Royal Statistical Society, London, this week (19 January). On my way home I reflected on the big question it left unanswered: does ‘counting’ always mean ‘knowing’?
One of the panellists, invited by event organisers Data-pop Alliance, the Overseas Development Institute and the Royal Statistical Society, was Sandy Pentland, a computer scientist and data expert at the Massachusetts Institute of Technology, United States. He suggested that if we could find a way to track people, devices and behaviours that are currently out of reach, we would be able to improve people’s lives.
“There are millions of children that die every year because we don’t know [about them]. They die of ethnic violence or infectious diseases,” he said. “And it’s too slow and expensive to use traditional measurement techniques.”
His idea is that big data can help bridge gaps where national statistical offices fail to respond to the growing need for data in developing countries.
“Official statistics are the classical music of the data world,” said Kenneth Cukier, data editor at The Economist. “They are tidy, highly curated. Big data is like punk rock: it’s new, it’s untested and we struggle to make sense of it. But it’s also more timely and more granular.”
Cukier also pointed to the governance issues that have to be tackled to unlock the potential of big data where it is most needed, for example during humanitarian crises such as the current Ebola outbreak. The mobile call data released for the affected area were 18 months old. This is “useless”, he said, “and almost a hoax if it wasn’t for the goodwill behind the operation”.
In his words, this “policy bottleneck” needed senior political leadership.
Among the examples of cooperation between the private and public sectors, Haishan Fu, director of the Development Data Group at the World Bank, mentioned the partnership between Uber and the US city of Boston. The popular car service, which connects users with the nearest available taxi driver through a mobile app, has shared its anonymised data with administrators to help improve the city’s transport system. Fu believes this model could also be applied in the developing countries into which Uber is expanding.
It was nice to see challenges such as how to anonymise data effectively and balance the interests of the public against those of private firms discussed in a public event. But the conversation, though lively and opinionated, mostly overlooked the need to dispel misconceptions around the power of big data. For example, big data doesn’t necessarily depict reality more accurately than official statistics. It is just, as Cukier said, “more granular”.
Big data sources are often generated from biased samples (for instance, people who own smartphones and use Twitter), so they capture only part of the picture. Everything that happens offline risks being left out of the analysis and, when the data are used by the public sector, out of policymaking.
In a scenario where Boston relied only on Uber for its baseline data, cyclists and pedestrians would remain invisible. The same would go for people affected by the Ebola outbreak who did not own a mobile phone.
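To see how sharply a phone-only data source can skew a picture, here is a quick Python sketch. All the numbers are made up for illustration: it assumes a hypothetical population where 40 per cent own a smartphone and phone owners happen to score higher on some welfare indicator, so a dataset built only from phone activity overstates the population average.

```python
import random

random.seed(42)

# Hypothetical population of 1,000 people; assume 40% own a smartphone
# and that phone owners score higher on some welfare indicator.
population = []
for _ in range(1000):
    has_phone = random.random() < 0.4
    score = random.gauss(0.7 if has_phone else 0.3, 0.1)
    population.append((has_phone, score))

true_mean = sum(s for _, s in population) / len(population)

# A "big data" source built from phone activity only sees phone owners.
observed = [s for has_phone, s in population if has_phone]
observed_mean = sum(observed) / len(observed)

print(f"true population average: {true_mean:.2f}")
print(f"phone-only estimate:     {observed_mean:.2f}")
```

The phone-only estimate hovers around 0.7 while the true average sits near 0.46: everyone offline has simply vanished from the count, which is exactly the gap between Uber’s riders and Boston’s cyclists, or between phone owners and the rest of an Ebola-affected population.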
Computer scientist Nuria Oliver, scientific director at Spanish telecom firm Telefonica, picked up on a similar point, reminding the event’s audience that big data will not work without the support of a multidisciplinary team.
“If we want to tackle real, global problems, the data analysts have to cooperate with the people that know what happens in the real world. If you just look at the data you can always find random correlations that have nothing to do with the actual problem.”
And should the wrong conclusions be used to inform decision-making, the consequences would be disastrous, she warned.
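Oliver’s warning about chance correlations is easy to demonstrate. The sketch below, again with invented data, generates 200 completely unrelated random “indicators” and then hunts for the strongest correlation with the first one; search across enough variables and an impressive-looking relationship always turns up.

```python
import random

random.seed(0)

def corr(x, y):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx ** 0.5 * vy ** 0.5)

# 200 unrelated random "indicators", 20 observations each.
series = [[random.random() for _ in range(20)] for _ in range(200)]

# Compare the first indicator with every other one and keep the
# strongest correlation: by chance alone it will look substantial.
best = max(abs(corr(series[0], s)) for s in series[1:])
print(f"strongest correlation found by chance: {best:.2f}")
```

The data here contain no real relationships at all, yet the best match is strong enough to tempt anyone who, in Oliver’s words, “just looks at the data” without asking people who know what happens in the real world.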
It is a reminder that counting better can help us investigate the world better — but it is by no means the same as knowing it.