Data-jitsu: Making sense out of today’s large-scale data

Copyright: Natalie Heng

By: Natalie Heng

[SEOUL] At the World Conference of Science Journalists in Seoul, South Korea, this week, a roomful of science communicators is trying to figure out how many research papers funded by the US National Security Agency (NSA) are available on Google Scholar.

“This is not public information,” John Bohannon, our workshop instructor, informs us.

But sometimes even classified information leaves a paper trail. With a few advanced Google search options and a little bit of common sense, most of us eventually figure out that “MDA904” is an NSA grant code prefix, and that enclosing the search term in quotation marks makes all the difference.
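The quotation-mark trick can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the Google Scholar URL pattern is an assumption about how the search page accepts a query, not something shown at the workshop. Wrapping the term in double quotes asks the search engine for an exact match instead of loosely related results.

```python
from urllib.parse import urlencode


def scholar_query_url(term, exact=True):
    """Build a Google Scholar search URL (hypothetical helper).

    With exact=True the term is wrapped in double quotes, which
    requests an exact-phrase match.
    """
    query = f'"{term}"' if exact else term
    # Assumed URL pattern for a Google Scholar search.
    return "https://scholar.google.com/scholar?" + urlencode({"q": query})


# The quotes are percent-encoded as %22 in the resulting URL.
print(scholar_query_url("MDA904"))
print(scholar_query_url("MDA904", exact=False))
```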

John Bohannon is a regular contributor to publications such as Science and Wired, and is known for investigative pieces that make use of large-scale data analysis. He is at the forefront of a movement that combines the literacy of a computer scientist and the mind of a reporter to make sense of untapped data in today's digital era.

Tricks like “web-scraping” — mining the internet for useful information — don’t have to involve complicated code, he says. All you need is a few Google search terms and an Excel spreadsheet.

Tools such as IPython can capture the source code of web pages and let you feed uncategorized data, such as police crime records, into a program that converts it into tables for analysis. With very basic programming, anyone can then turn that data into charts, tables and the like.
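As a rough sketch of what such "very basic programming" might look like, the Python below pulls the rows out of an HTML table and tallies one column. It uses only the standard library. The sample page, its column names and its records are invented for illustration, and the class and function names are hypothetical; a real scraper would first download the page source rather than use an inline string.

```python
from collections import Counter
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample standing in for a scraped page of crime records.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Offence</th></tr>
  <tr><td>2015-06-01</td><td>Burglary</td></tr>
  <tr><td>2015-06-02</td><td>Theft</td></tr>
  <tr><td>2015-06-03</td><td>Burglary</td></tr>
</table>
"""


def parse_table(html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableParser()
    parser.feed(html)
    return parser.rows


def count_by_column(rows, column):
    """Count how often each value appears in the named column."""
    header, *records = rows
    index = header.index(column)
    return Counter(record[index] for record in records)


rows = parse_table(SAMPLE_HTML)
counts = count_by_column(rows, "Offence")
for offence, n in counts.most_common():
    print(f"{offence}: {n}")
```

Once the records are in rows like these, they can be pasted into a spreadsheet or passed to a charting tool.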

All this is part of a revolution some are calling “data-jitsu”.

Jonathan Stray, one of the facilitators at Bohannon's workshop, is also a freelance journalist and computer scientist who teaches “computational journalism” at Columbia University in the US. He thinks data literacy is the future of journalism — especially investigative journalism.

“Journalists have a long tradition of disdain for numbers, but that’s changing,” Stray says.

“Because our job isn’t really [just] writing. It’s analysis and communication, and getting information. That requires data literacy,” he notes. “In fact, I don’t think you can do investigative journalism without data work. Too many of the questions we want to ask are quantitative questions.”

At any rate, interpreting and communicating data is now a basic skill, valuable to researchers, policymakers and businesses alike.

This article has been produced by SciDev.Net's South-East Asia & Pacific desk.