Bringing science and development together through news and analysis

Data-jitsu: Making sense out of today’s large-scale data
  • Data-jitsu: Making sense out of today’s large-scale data

Copyright: Natalie Heng

SciDev.Net at large

Our blog from on the road and behind the scenes at key science and development events

Location Map

09/06/15

Speed read

  • World science journalists forum tackles issues like large-scale data analysis

  • A useful tool is web-scraping – mining the internet for information using Google

  • Data literacy increasingly demands working with numbers and developing this skill

Shares

[SEOUL] At the World Conference of Science Journalists in Seoul, South Korea this week, a roomful of science communicators are trying to figure out how many US National Security Agency-funded research papers are available on Google Scholar.

“This is not public information,” John Bohannon, our workshop instructor informs us.

But sometimes even classified information leaves a paper trail. With a few advanced Google search options and a little bit of common sense, most of us eventually figure out that “MDA904” is a NSA grant code prefix, and that including Google quotation marks make all the difference.

“Journalists have a long tradition of disdain for numbers, but that’s changing.”

By Jonathan Stray

John Bohannon is a regular contributor to publications like Science and Wired who is known for investigative pieces that make use of large-scale data analysis. He is at the forefront of a movement that combines the literacy of a computer scientist and the mind of a reporter to make sense of untapped data in today's digital era.

Tricks like “web-scraping” — mining the internet for useful information — don’t have to involve complicated code, he says. All you need is a few Google search terms and an Excel spreadsheet.

Programs like iPython will capture source code for web pages, and let you feed uncategorized data into a program for conversion into tables for analysis such as police crime records which through very basic programming anyone can convert it into charts, tables and the like.

All this is part of a revolution some are calling “data-jitsu”.

Jonathan Stray, one of the facilitators at Bohannon's workshop, is also a freelance journalist and computer scientist who teaches “computational journalism” at Columbia University in the US. He thinks data literacy is the future of journalism — especially investigative journalism.

“Journalists have a long tradition of disdain for numbers, but that’s changing,” Stray says.

"Because our job isn’t really [just] writing. It’s analysis and communication, and getting information. That requires data literacy,” he notes. "In fact, I don’t think you can do investigative journalism without data work. Too many of the questions we want to ask are quantitative questions."

At any rate, interpreting and communicating data is a basic skill valuable nowadays to researchers, policymakers and business.

This article has been produced by SciDev.Net's South-East Asia & Pacific desk.

Republish
We encourage you to republish this article online and in print, it’s free under our creative commons attribution license, but please follow some simple guidelines:
  1. You have to credit our authors.
  2. You have to credit SciDev.Net — where possible include our logo with a link back to the original article.
  3. You can simply run the first few lines of the article and then add: “Read the full article on SciDev.Net” containing a link back to the original article.
  4. If you want to also take images published in this story you will need to confirm with the original source if you're licensed to use them.
  5. The easiest way to get the article on your site is to embed the code below.
For more information view our media page and republishing guidelines.