Q&A: Big Data to help society when it’s industrialised
- Work needed before Big Data can have an impact
- Better for companies to hold data than governments
- Phone call records can reveal population density
Big data is limited to pilots and case studies and cannot yet be industrialised, holding back efforts to use it for social good, says Emmanuel Letouze.
Letouze is director and co-founder of the Data-Pop Alliance, a global coalition of researchers, experts, practitioners and activists that promotes the use of big data in global development. Big data refers to the vast tracts of information that are being collected about us every day.
As part of the Bellagio Residency 2018 series, Letouze tells SciDev.Net about efforts to leverage big data ethically and at scale.
How has the field of big data for development evolved in the past few years?
I think that one of the words I used in 2014, when I wrote this piece for SciDev, is that the field was still in infancy – because it was two to three years old. Now it’s about eight years old, entering the age of reason, which is to say there has been quite significant maturation, but at the same time it feels like very early days. An example [is] the fact that we are still framing this as [an issue of] threats and opportunities.
There are works that are still very polarising. The book by Cathy O’Neil, Weapons of Math Destruction, is I think very polarising in saying that algorithms are biased, and bad. Because then [people] think about Facebook, about Cambridge Analytica. And it puts the spotlight on these aspects, forgetting all the possibilities of using big data – and data – to actually change the world.
What’s an example that shows the benefits for global development?
Here a big challenge is to define what we mean by a development application. There are thousands of apps that are applications of big data. Surely Google Maps helps us decide whether we're going to take the bus or whether we're going to take a car, and people do that all over the world.
Very often I think the discussion is about policy applications. We want to find case studies where it's a policymaker [saying] ‘let me look at big data and I'm going to make a decision’. There are very few such cases. When it comes to pollution, for instance, there are lots of algorithmic analyses of air quality that are made with sensors [to] predict where pollution is going to go up [or] down, which would’ve been pretty hard to make a couple of years ago.
There are definitely more SDG applications. I can give you examples of the work that Data-Pop is doing with the UNDP [UN Development Programme]. What we've developed is using public Facebook data to look at how people are talking about public services in their countries. And the pilot was in Botswana. That's clearly a forward-looking case where you use big data to monitor, and then hopefully improve [the] experience and outcomes of citizens.
Have these reached the point where you can pinpoint impact?
When it’s about public policy, I think no. It's not at the point where there’s a large enough critical mass of tools and case studies. But in a sense it’s the same for the internet – [and] the internet has been around for 20-plus years. So we're still really in the early days.
The reason is that there is no way right now to access big data at scale. So we're stuck in a low equilibrium where we do pilots, case studies, proofs of concept, little things here and there. But it cannot be industrialised, it cannot be systematised.
Are there positive moves in that direction?
I think a project like OPAL [an open algorithms project] is [meant] to make it possible for decision-making to leverage big data, for instance to estimate population density, and probably literacy and movement, almost in real time – starting with CDRs [call data records], in a way that is privacy preserving, where the data are not shared, they are not exposed. They stay on the servers of the private sector company. What is possible is only to extract predetermined indicators through open algorithms. So it's a question and answer mechanism. And around that we're building a governance system, where the algorithms are open so people can actually look at them.
I think this is probably the most potentially transformative project, because it builds on a decade of work [on the question of], how do you industrialise the use of big data, but in a way that is not spooky; that is not going to reinforce power dynamics and structures. It stays on the philosophy of openness and transparency.
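The question-and-answer mechanism described above can be illustrated with a minimal sketch. It is not OPAL's actual code or API: the record layout, the `DataHolder` class, the `subscribers_per_region` indicator and the suppression threshold `MIN_GROUP_SIZE` are all hypothetical names and values chosen for illustration. The point it tries to show is that raw records never leave the holder, and only predetermined, inspectable algorithms can be run against them.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout: (pseudonymised subscriber id, region of the cell tower)
CallRecord = Tuple[str, str]

# Assumed threshold below which a group is hidden; not a real OPAL parameter.
MIN_GROUP_SIZE = 3


def subscribers_per_region(records: List[CallRecord]) -> Dict[str, int]:
    """Open algorithm: count distinct subscribers per region, hiding small groups."""
    seen = defaultdict(set)
    for subscriber, region in records:
        seen[region].add(subscriber)
    return {
        region: len(subscribers)
        for region, subscribers in seen.items()
        if len(subscribers) >= MIN_GROUP_SIZE
    }


class DataHolder:
    """Stands in for the company's server: records stay private inside it."""

    def __init__(self, records: List[CallRecord]) -> None:
        self._records = list(records)
        # Only these predetermined indicators can ever be requested.
        self._algorithms: Dict[str, Callable[[List[CallRecord]], Dict[str, int]]] = {
            "subscribers_per_region": subscribers_per_region,
        }

    def ask(self, question: str) -> Dict[str, int]:
        """Answer a predetermined question with aggregates, never raw records."""
        if question not in self._algorithms:
            raise PermissionError(f"'{question}' is not an approved indicator")
        return self._algorithms[question](self._records)
```

A caller would write something like `holder.ask("subscribers_per_region")` and receive only aggregated counts; any question outside the approved list is refused.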
The fact that this data is held in private hands is part of the criticism of big data and power dynamics. What would you say to that?
I think it’s not part of the problem. It's a very good thing that these very sensitive data are held by the private sector. Because who else? Where should it be? The only alternative is that no data are collected and stored. It's possible that at some point societies could say, “OK, this is too risky.” And we just shut down the system. But if we say, “Well, there's so much value in this data set – we can better understand epidemics, crime” – there is no alternative that I can think of, at least in the foreseeable future, to private companies actually controlling, storing and protecting those data. They have [the] strongest incentive, for their reputation, for this data not to leak. The last thing we want is for governments to be in charge of those data. Can you imagine Trump, can you imagine Erdogan, can you imagine Putin, having – by law! – control [of these data]?
What’s the next step?
The big challenge is – yes, recognising those power dynamics – [to] incentivise, or force, or goad private companies to open up these data. But not expose [the data], not link them. All these data sets remain on the servers of these companies, and you can query them. They’re anonymised, or pseudonymised, aggregated, encrypted over time. There is a blockchain mechanism so you know who has asked which question, and it's logged.
You can think of private companies, like telecom companies, having to work with the data protection authority, universities, a local governance board that tells them, “It is your social, commercial, ethical responsibility to let societies leverage the data that we have – but it has to be open, let's be transparent.” Then there are [also] regulations for what they do themselves with the data, of course.
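The idea of logging who has asked which question can be sketched as a tamper-evident, hash-chained log. This is a simplified stand-in for the blockchain mechanism mentioned in the interview, not a distributed ledger and not the project's actual implementation; the function names, field names and the `GENESIS` value are assumptions made for illustration.

```python
import hashlib
import json
from typing import Dict, List

# Assumed placeholder "previous hash" for the first entry in the log.
GENESIS = "0" * 64


def _digest(body: Dict[str, str]) -> str:
    """SHA-256 over a canonical JSON encoding of one entry's contents."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_query(log: List[Dict[str, str]], who: str, question: str) -> None:
    """Record who asked which question, chained to the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"who": who, "question": question, "prev": prev}
    log.append({**body, "hash": _digest(body)})


def verify(log: List[Dict[str, str]]) -> bool:
    """Return True only if no entry has been altered, removed or reordered."""
    prev = GENESIS
    for entry in log:
        body = {"who": entry["who"], "question": entry["question"], "prev": entry["prev"]}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True
```

Because each entry's hash covers the previous entry's hash, quietly rewriting an earlier question changes every hash after it, so a governance board re-running `verify` would notice.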
This interview has been edited for brevity and clarity.