Who would have thought? When I wrote “You think open source is boring? Think again” in 2004, I never could have dreamt that it would still be relevant twelve years later. Open source is not only alive and kicking, it remains as inspiring as ever.
But what are the advantages of open source for open data? In a scholarly paper from 2005, Systems Librarian Edward M. Corrado found that the concept of open source can mean “lower costs, greater accessibility, and better prospects for long-term preservation of scholarly works”. That was 11 years ago, and in the library context. Does it have any relevance for open data today?
To answer this question, we need to go back to the basics of open source. The open source collaboration model - where coders share technical improvements with others under an open source license - is more than technology. It's a way of working together using open standards. In this post, we leave aside the content aspect (the open data itself) so as to compare only open source and proprietary open data infrastructures.
Open source “is generally available for free (or at a minimal cost)” and “it often has lower implementation and support costs,” when compared to proprietary software (Corrado, 2005). In the context of open data, that holds true. Many data management platforms - i.e. the portals that city halls or national governments use to make data public - are free to download and install. Initial licensing costs are thereby eliminated.
Great, isn't it? But what about the ongoing costs: development of features, implementation of extensions, technical support? With open source, the client is free to choose how to further develop the portal and/or combine it with other software. You can choose the extensions you want to include in your open data portal, and you can even decide which organisation to contract with for maintenance and support. In most - if not all - proprietary open data portals, the client is more limited and depends on the software provider for support; the client cannot just turn to another company. To make a long story short, even though costs are not necessarily lower with open source platforms at first, chances are that they will be over the long run, or at least that open source will provide an edge in flexibility.
More important still, when it comes to costs, is the lock-in factor. With open source, the client avoids vendor lock-in. Many customers stick to the portal they use - be it open source or not - as migrating data to another platform is most often cumbersome and costly. If a proprietary software customer decides to migrate to another platform altogether, she will see the bill increase quickly. Most contracts build in barriers to exit by limiting the interoperability of functionalities and design. In the case of open source, all files can be exported and reused on a new platform.
On the surface, at the content level, one might think access is guaranteed in both cases. Where it gets tricky, though, is when we dig deeper into the infrastructure layer, where the ethics, spirit and mentality of access play out. The developers in the open data community have a heart for improving open source software for the specific purpose of broadening access to open data. This can mean shorter innovation cycles and the development of functions that provide the user with more granular access to data sets. This is something that proprietary code developers might not have the “luxury” to develop, especially if there is no direct commercial benefit attached. What's more, knowing the open-data-open-source community, I can attest that users and clients of portals are in constant exchange with the developers over bettering the product along the way. For instance, “connectors” between portals are regularly being developed, while this is not necessarily the case with proprietary software. These connectors enable applications to connect to a portal via an API, for harvesting or other purposes. Also, a government can decide to go solo with open source by getting its hands dirty with the software. This was the case with the UK government, which decided to build capacity in-house so as to take over the management and development of the CKAN open source platform. It is the spirit of access, more than the theory of it, that plays out here. Let's settle on this: the case for open access matters.
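To make the idea of a connector more concrete, here is a minimal sketch in Python of an application harvesting the list of data sets from a CKAN portal through its Action API. It is an illustration under assumptions, not how any particular connector is built: the portal address `https://data.example.org` is hypothetical, the helper names are mine, and real harvesters do far more (paging, metadata mapping, error recovery).

```python
import json
from urllib.request import urlopen


def action_url(portal_url: str, action: str) -> str:
    """Build the URL of a CKAN Action API (version 3) call for a portal."""
    return portal_url.rstrip("/") + "/api/3/action/" + action


def parse_action_response(body: str) -> list:
    """Return the 'result' field of an Action API response, or raise on failure."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(f"portal reported an error: {payload.get('error')}")
    return payload["result"]


def harvest_dataset_names(portal_url: str) -> list:
    """Fetch the names of all public data sets from a portal (needs network access)."""
    with urlopen(action_url(portal_url, "package_list"), timeout=30) as response:
        return parse_action_response(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline demonstration with a made-up response body, so no portal is contacted.
    sample = '{"success": true, "result": ["air-quality", "city-budget"]}'
    print(action_url("https://data.example.org", "package_list"))
    print(parse_action_response(sample))
```

Calling `harvest_dataset_names` with the address of a real CKAN portal would return the names of its public data sets, which a second portal could then import one by one.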
What about “better prospects for long-term preservation” of data? Here, I would categorically advocate for using open source solutions to host open data. This is because they represent little or no risk that the technology will be diverted for other purposes along the way. Owners of proprietary open data projects are exposed to the risk of seeing their platform solution co-opted for other technological aims. Proprietary source code development means that choices in development are not community-driven, and proprietary portal vendors offer no guarantee that they will be around when times are tough financially. That raises a sustainability issue, and the issue doesn't go away. When a government gets serious about transparency, it wants an open data portal that will remain supported and keep evolving over the long term.
In short, while I don't find cost to be the decisive factor, I firmly believe that your best pick is an open-source-based data management system, for reasons of access and preservation.
Corrado, E. M. 2005. The Importance of Open Access, Open Source, and Open Standards for Libraries. Issues in Science and Technology Librarianship, Spring 2005. DOI:10.5062/F42F7KD8
Frédéric Dubois is a journalist based in Berlin. He is the managing editor of Internet Policy Review, an open access journal on internet regulation published by the Humboldt Institute for Internet and Society. He writes about internet governance, new media, & open data. Frédéric is the co-editor of two books on media & journalism, as well as author & producer of award-winning interactive features. On Twitter: @fredericdubois