Back in 2006, two journalists from The Guardian wrote the article Give us back our crown jewels, in which they pleaded for the UK government to collect the best data possible and to make it freely available to stimulate innovation. The article made waves and is still today seen as some sort of open data manifesto.
Since then, an increasing number of journalists have tirelessly pushed governments to release more data sets from public sources. Looking back, we now have hundreds of open data portals worldwide. That’s the part of the story we know, but how relevant are these open data portals for muckraking journalists and curious citizens out there anyway?
My short research finds that open data portals are indeed important, but (yes, there is a but) they are still underutilised by journalists compared with incumbent open data offerings such as geo databases, stats portals, and weather data sites. There are good reasons for that.
At the European level, the EU’s INSPIRE directive in 2007 successfully managed to get all public administrations of the Union on board to share geo-spatial information on 34 themes. These range from elevation data to transport networks, and on to protected sites. Geo data, which has now been used widely by journalists for years, will certainly get a boost through INSPIRE, set to be fully implemented by 2021. To get an appetizer of what kind of journalism this programme might help to support, let’s look at the ones who are already using geo data.
In May 2016, the Berlin newspaper Morgenpost released an interactive story drawing on data from both NASA and the United States Geological Survey. It analysed satellite pictures in detail and revealed which cities in Germany are the greenest. The same newspaper also used geo data from a 3D model of the City of Berlin to tell a story about how the skyline of Berlin changed between 1989 and 2015.
ProPublica, which practises “Journalism in the Public Interest”, has long invested resources in data journalism. For a big story published in 2014 called Dollars for Doctors: How Industry Money Reaches Physicians, open data was in fact key. The muckrakers behind this high-impact series have posted a short explanation of the methodology they used to gather, clean, and interpret the data. In it, they state that the US government’s Open Payments system and Medicare’s Physician Compare data – both open data – were central sources from which to dig further.
News outlets also use statistics from stats portals of the United Nations or inter-governmental organisations (e.g., Eurostat, UN Human Development Index) extensively, in large part due to the fact that these services have been around for decades – as opposed to most open data portals. The same goes for weather data: turn on your TV set and zap through the channels. Sooner or later you are bound to land on a weather programme where visuals are generated on the basis of open weather data.
Now that we are clear on the fact that open data is a lifeline for journalism, is this true for open data portals as well? Why don’t we see more examples of journalists drawing data from these portals specifically?
One of the reasons, some gossipers might say, is that open data portals host data sets that are too scattered, of little consequence and thereby not useful – e.g. the amount of dog excrement between this street corner and that curb between 1988 and 1992. Now, although the critique of snore data is justified, there must be more to it. Others indeed say that the releases are not frequent enough. This does hold for certain portals, but not for the European Open Data Portal or Datos Abiertos de Panamá, just to name two.
Still, in some cases, open data portals can come in handy, as data journalism researcher Stefan Baack suggests: “I would describe open data portals as a supportive infrastructure, which helps data journalists work in a quicker and easier way. Depending on the context, even ‘boring data’ in open data portals can suddenly become relevant for a journalistic story, especially to local journalists.”
In an interesting 2015 post (which builds on his working paper Does open data need journalism), Jonathan Stoneman wrote that “there are obvious reasons why journalists do not share too much about the stories they are working on until they are published” and that “timeliness of Open Data is another factor which makes it a less attractive potential source”.
What open data portals need to do, it seems, is to look at ways in which they can create a habit. This can be done by setting a more regular rhythm of publication of data sets, by offering a more comprehensive and fine-grained pool of data sets, or both. Obviously, this is easier said than done, as it calls for a more streamlined data collection and harvesting cycle.
Until this long-term effort delivers, there might be short-term tricks to get journalists to subscribe to open data portals. Many data journalists and open data specialists at least hinted at one new feature that they would like to see sooner rather than later: real-time data streams. Will it be the killer feature? We’ll turn to that question in a further post of this series.
Frédéric Dubois is a journalist based in Berlin. He is the managing editor of Internet Policy Review, an open access journal on internet regulation published by the Humboldt Institute for Internet and Society. He writes about internet governance, new media, and open data. Frédéric is the co-editor of two books on media and journalism, as well as author and producer of award-winning interactive features. On Twitter: @fredericdubois