Data journalism: How to find stories in numbers

Copyright: Pablo Rojas, Wellcome Images

By: Sandra Crucianelli

This article was originally published on SciDev.Net.

Colleagues often ask me what data journalism is. They wonder why it needs its own name: don't all journalists use data?

The term is shorthand for 'database journalism' or 'data-driven journalism', where journalists find stories, or angles for stories, within large volumes of data.

It overlaps with investigative journalism in requiring lots of research, sometimes against people's wishes. It can also overlap with data visualisation, as it requires close collaboration between journalists and digital specialists to find the best ways of presenting data.

So why get involved with spreadsheets and visualisation tools? At its most basic, adding data can give a story a new, factual dimension. But delving into datasets can also reveal new stories, or new aspects to them, that may not have otherwise surfaced.

Data journalism can also sometimes tell complicated stories more easily or clearly than relying on words alone — so it's particularly useful for science journalists.

It can seem daunting if you're trained in print or broadcast media. But I'll introduce you to some new skills, and show you some excellent digital tools, so you too can soon find your feet as a data journalist.

Where to begin

As in all journalism, ideas for stories can come from many sources. A statistic might not sound quite right, tempting you to look at the data behind it. Or you might have a question to answer: how has science funding changed in the UK, for example?

One way data journalism differs from other forms is that you may have no inkling of the story until well after you start investigating. That doesn't mean getting hold of any old data and expecting to find a story; rather, the story is what the data tells you. A presentation on The Guardian's Datablog gives an idea of the workflow in data journalism.

So how do you choose what to delve into? It's good to familiarise yourself with data types and sources in your 'beats' and when that data might be released, just as you would know conference or journal publication dates.

It's best to start small with your first data journalism projects, particularly while you get used to processing data and using the available tools. Your main challenge will probably be the time needed to process data. Peter Aldhous, the New Scientist San Francisco bureau chief, has produced a tutorial on how to approach science data journalism projects, and The Data Journalism Handbook also has tips on where to start.

Finding and accessing data

Data journalism experts say that journalists' roles are changing from hunting and gathering scarce information to processing information in 'an age of abundance'.

Data might be abundant, but some types of data are easier to get hold of than others. Governments are beginning to recognise the importance of releasing data — including research findings — but this varies from country to country, and even a government that believes in openness may lack adequate systems for making data accessible.

Some nations, such as Kenya, proactively make data available, while in others you'll have to ask — sometimes through systems such as India's Right to Information Act.

International bodies such as the World Bank release data, and projects such as Gapminder and Google Public Data Explorer collate data from various organisations. For science/health journalists, clinicaltrials.gov is a registry of clinical trial data. And environment or earth science reporters can access information from the US Geological Survey, for example.

You might even find some ready-packaged data at your disposal. Data Dredger, a collaboration between Internews and Kenya's open government data initiative, provides links to Kenyan health reports and has infographics on health topics that you can download and use in stories.

And the web is full of data; finding it just requires honing your search engine skills. Sometimes you can simply search for a term plus 'data', or use a specialised academic search engine such as Google Scholar or Scirus. 'Semantic' web resources such as Wolfram|Alpha, which search using additional data and not just the keywords within a page, are also useful.

Google's advanced search lets you narrow your results by domain extension, which helps you find academic or government data, and by file format, such as the Excel files in which you're most likely to find tables of figures or statistics. Tables and graphics are often uploaded as images, so your data hunt should include Flickr and Google Images.
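The domain and file-format filters correspond to the `site:` and `filetype:` operators that Google also accepts directly in the search box. A minimal Python sketch of composing such a query follows; the function name `build_query` and the search terms are illustrative assumptions, not part of any Google tool.

```python
from urllib.parse import urlencode


def build_query(terms, domain=None, filetype=None):
    """Compose a search string using the site: and filetype: operators."""
    parts = [terms]
    if domain:
        parts.append(f"site:{domain}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


# Hypothetical example: spreadsheets about malaria on Kenyan government sites.
query = build_query("malaria cases", domain="go.ke", filetype="xls")
search_url = "https://www.google.com/search?" + urlencode({"q": query})

print(query)
print(search_url)
```

Pasting the printed query into the search box should have the same effect as filling in the advanced search form by hand.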

You can even retrieve data that has been deleted from the web but was 'cached' or saved as a screenshot. Try the Internet Archive and its Wayback Machine to recover old files or broken URLs.
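If you have many broken links to check, the Internet Archive offers an availability endpoint that reports the closest archived copy of a page. The sketch below, in Python, builds the request and shows one way to read the reply; the helper names and the example address are assumptions made for illustration, and the lookup function needs a network connection, so it is defined but not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the lookup address; timestamp is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)


def closest_snapshot(page_url, timestamp=None):
    """Return the address of the closest archived copy, or None if there is none."""
    with urlopen(availability_url(page_url, timestamp)) as response:
        data = json.load(response)
    closest = data.get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest else None


# Hypothetical page address, used only to show the request that would be sent.
print(availability_url("example.com/report.xls", "20100101"))
```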

Social media can also be a data source. Tools such as SocialMention, 48ers, Twitterfall, Addictomatic, Boardreader and Whostalkin allow you to search by name, subject, time and geo-reference. An interesting example of social networks revealing news is the Eye on the Bailout project of ProPublica, an investigative journalism organisation, which has used social media mentions to alert journalists to new data on what has happened to the US 2008 bank bailout money.

Remember — it's good practice to link to, or state the sources of, your data.

Data handling

You've found the data, but can you use it? You'll need to import it into a spreadsheet, such as those in Excel or Google Drive, so download the data in 'comma-separated values' (CSV) format if possible.
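A CSV file is plain text, which is why almost any tool can open it. As a rough illustration, this Python sketch reads one using only the standard library; the column names and figures are invented for the example, and an in-memory string stands in for a downloaded file.

```python
import csv
import io

# Invented sample standing in for a downloaded file.
raw = io.StringIO(
    "country,year,research_spending_musd\n"
    "Country A,2010,120.5\n"
    "Country B,2010,980.0\n"
)

# Each row becomes a dictionary keyed by the header line.
rows = list(csv.DictReader(raw))

# CSV values always arrive as text, so convert the numeric columns explicitly.
for row in rows:
    row["year"] = int(row["year"])
    row["research_spending_musd"] = float(row["research_spending_musd"])

print(rows[0])
```

For a real file, `open("file.csv", newline="", encoding="utf-8")` would take the place of the `io.StringIO` object.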

You might have a table in a PDF file, or as a JPEG image file. Try a file converter like Zamzar to get these into spreadsheets. Optical character recognition software can also be a big help: a simple, free one is Free OCR. As a last resort you may have to input data manually, which is time-consuming and error-prone.

Wherever your data comes from, it probably needs 'cleaning' to make it useful. This can mean anything from reorganising the data and deleting what you don't need, to using tools such as OpenRefine (formerly Google Refine) to make the data more consistent; OpenRefine's video tutorials show what this cleaning can involve. Science journalists at least should have access to well-kept scientific data that needs less cleaning.
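To give a feel for what cleaning involves, here is a small Python sketch that tidies inconsistent labels and numbers. It is not how OpenRefine works internally; the rows, the column names and the decision to drop empty values are all assumptions made for the example.

```python
def clean_label(text):
    """Trim stray whitespace, collapse repeated spaces and normalise capitals."""
    return " ".join(text.split()).title()


# Invented rows showing typical inconsistencies.
raw_rows = [
    {"region": "  north east ", "cases": "1,204"},
    {"region": "North  East", "cases": "310"},
    {"region": "SOUTH", "cases": ""},
]

cleaned = []
for row in raw_rows:
    if not row["cases"].strip():
        # Dropping rows with no value is a judgement call; note it in your method.
        continue
    cleaned.append({
        "region": clean_label(row["region"]),
        "cases": int(row["cases"].replace(",", "")),  # remove thousands separators
    })

print(cleaned)
```

After cleaning, both spellings of the first region share one label, so their figures can be added together.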

You'll also need to start doing some basic processing. You might sort data from smallest to largest or by location. You might be looking for averages, or to join or compare two datasets.
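These basic operations look much the same in code as in a spreadsheet. The Python sketch below sorts, averages and joins two small datasets; the countries and figures are invented, and the choice of spending per person as the comparison is only an example.

```python
from statistics import mean

# Invented figures: research spending in millions of US dollars.
spending = {"Country A": 120.5, "Country B": 980.0, "Country C": 45.2}
# Invented figures: population in millions.
population_m = {"Country A": 40.0, "Country B": 1200.0, "Country C": 10.0}

# Sort from smallest to largest spending.
ranked = sorted(spending.items(), key=lambda item: item[1])

# Average spending across the countries.
average = mean(spending.values())

# Join the two datasets on country name to get dollars per person.
per_person = {
    name: spending[name] / population_m[name]
    for name in spending
    if name in population_m
}

print(ranked)
print(average)
print(per_person)
```

In this made-up data the country with the smallest total spends the most per person, which is the kind of shift in angle that joining datasets can reveal.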

Treat data as a 'source': ask it questions as your audience might. And ask it lots of questions — the answer might not be what you first think. For example, a spreadsheet of journal retractions might suggest rising fraud detection, but you also need to ask whether there are other interpretations.
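One alternative reading of the retractions example is that more papers are being published, so more are retracted. A quick check is to compare the rate rather than the raw count, as in this Python sketch; every number in it is invented purely to show the arithmetic.

```python
# Invented counts, for illustration only.
papers = {2005: 1_000_000, 2010: 1_500_000}
retractions = {2005: 100, 2010: 180}


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) * 100 / old


# Retractions per 100,000 papers published in each year.
rate = {year: retractions[year] * 100_000 / papers[year] for year in papers}

count_change = pct_change(retractions[2005], retractions[2010])
rate_change = pct_change(rate[2005], rate[2010])

print(rate)
print(count_change, rate_change)
```

With these made-up figures the raw count rises by 80 per cent but the rate by only 20 per cent, a much more modest story.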

Think carefully about your results: do they sound plausible? It's best to check and recheck calculations. Don't ruin your reputation with a basic error.

You can strengthen your conclusions or pinpoint new questions with simple statistical analyses. For example, you might spot that the number of catastrophic storms in your country has risen each year for 20 years. But is this a significant result or might it be chance natural variation? Tools such as R (from the R Project) and RStudio can help you judge that. You might also want to check your conclusions with experts or other experienced data journalists, particularly when you're starting out.
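One simple way to ask whether a trend could be chance is a permutation test: shuffle the yearly values many times and see how often a trend as steep as the real one appears. The sketch below uses Python rather than R to stay consistent with the other examples; the storm counts are invented, and the function names, number of trials and random seed are arbitrary choices.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Share of shuffles whose slope is at least as steep as the observed one."""
    rng = random.Random(seed)
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(slope(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials


years = list(range(2001, 2021))
# Invented yearly storm counts, for illustration only.
storms = [3, 2, 4, 3, 5, 4, 4, 6, 5, 5, 7, 6, 6, 8, 7, 9, 8, 8, 10, 9]

p = permutation_p_value(years, storms)
print(slope(years, storms), p)
```

A small share suggests the trend is unlikely to be chance alone, but it says nothing about the cause, so it is still worth putting the result to an expert.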

Presenting the data

Your presentation will depend on the story. There may be very little to present; you could have slaved to get a single but important figure to report in a conventional news piece — that your government has spent half what it promised on science, for example.

Or you might use data visualisation as an integral part of the story. An investigation from The Seattle Times in the United States combines a written feature with supporting graphs, maps and source documents. One of these elements is an interactive map; elements like this can be used within larger stories and projects, or can be self-contained, like a visualisation of the causes of death hosted by the UK newspaper The Guardian.

Online tools such as Tableau Public and Many Eyes can visualise data in various ways, while Google Fusion Tables, Geocommons and Indiemapper produce good maps using longitude/latitude data or more complex GIS data. Many of these tools also let you add an animation layer to show timescales, for example.
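Mapping tools generally need each location as a longitude and latitude pair. One widely used interchange format is GeoJSON, and the Python sketch below converts rows of coordinates into it. Whether a particular tool named above accepts GeoJSON is not confirmed here, and the site name and coordinates are invented.

```python
import json


def to_geojson(rows):
    """Turn rows with 'name', 'lat' and 'lon' keys into a GeoJSON FeatureCollection."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON orders coordinates as [longitude, latitude].
                "coordinates": [row["lon"], row["lat"]],
            },
            "properties": {"name": row["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical site, for illustration only.
sites = [{"name": "Clinic 1", "lat": -1.2921, "lon": 36.8219}]
collection = to_geojson(sites)

print(json.dumps(collection, indent=2))
```

Swapping latitude and longitude is a common mistake that puts points in the wrong place, so check a few locations by eye before publishing a map.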

Sometimes it's not just about presenting data, but letting your audience see what it means to them. A ProPublica project shows users whether their doctor receives drug company money, while a Texas Tribune effort shows you how US public money is spent.

Going further, a Guardian project asks readers to help analyse data on UK public spending. This kind of project, called a 'news app', requires collaboration between journalists and programmers to design and build applications that handle and analyse many variables within big databases or across many datasets.

I've been involved in a news app at Argentina's La Nación newspaper as part of my Knight International Journalism Fellowship. It uses national census information from 2001 and 2010, letting people explore how demographics have changed in their areas.

The website Information is Beautiful has examples of creative data visualisation, and shows how working with your publication's digital or graphics team can be productive.

You may need to persuade your editors to make time for data journalism. This gets easier once there are results to show, and a report I co-authored on integrating data journalism into newsrooms might also help.

It might seem like a big ask, but evidence suggests that data journalism is the journalism of the future. If you can invest the time, you'll not only get better stories but you'll better serve your audience and the public interest.

Animation about data journalism in Argentina: https://www.youtube.com/watch?v=S4OW9cp0D3k

Sandra Crucianelli is a Knight International Journalism Fellow. She is an investigative journalist and instructor, specialising in digital resources and data journalism. She is the founder and editor of Sololocal.info, an online magazine providing hyperlocal news from Bahía Blanca City, Argentina. See more: www.visualcv.com/sandracrucianelli