OECD’s SIS-CC .Stat Suite goes Open Source with Datopian
May 12, 2020 by Audrey Lobo-Pulo
With a long history in statistical software, the SIS-CC, under the OECD secretariat, recently launched the .Stat Suite as open source. To meet the challenges of making this transition, they turned to Datopian for its foresight, experience and expertise. Our CEO, Paul Walsh, spoke with the OECD’s SIS-CC community manager, Jonathan Challener, about their open source journey with the .Stat Suite so far.
Paul: First of all, Jonathan, can you tell us a bit about your role at the OECD?
Jonathan: I am the community manager for a group of organisations that collaborate on and co-develop the .Stat Suite, an open source solution for official statistics. So I’m basically coordinating and managing the community framework: engagement with existing and new members, fundraising, and organising regular interactions with community members. I also ensure that the needs of community members are met by representing their views in our daily operations.
Paul: Great, and who’s in that community?
Jonathan: We have 15 members now, the most recent being the Federal Competitiveness and Statistics Authority in the UAE. The other 14 members range from regional organisations such as the Pacific Community, to international organisations such as UNICEF and the International Labour Organization, a central bank in the National Bank of Belgium, and a number of National Statistics Offices, including those of Australia, New Zealand, Italy and Tunisia, amongst others. We also have the UK Data Service, which provides UK academics with data collected from international organisations.
SIS-CC Members as of December 2019
We also work with partners. We have a close partnership with Eurostat, which has a mandate to produce tools for data collection through SDMX with its member states in Europe. We also work closely with PARIS21, hosted at the OECD, which does a lot of work with lower-income countries on developing the capacity of their national statistical systems; we contribute our technical expertise and open source solutions to that work.
Paul: Okay, great - so that technical expertise and open source solution is around .Stat Suite. Could you tell us a little bit about that? What is .Stat Suite?
Jonathan: .Stat Suite is the new evolution of the .Stat software, which was developed at the OECD back in 2003. We have a long history of working together with other organisations and also of building software for statistics. So, although it’s a new evolution, by no means is it new in the sense of what we’re doing - in developing software and the thinking behind it.
The .Stat Suite came about because technology changes over time, and so do users’ behaviours.
# The demand on data is ever increasing and the way that existing systems need to be integrated also demands a level of change in the way software is designed. - Jonathan Challener
So we took a ‘back to the drawing board’ approach to the architecture and built the .Stat Suite with SDMX at its core. SDMX is an international standard, and has also been an ISO standard (ISO 17369) since 2013, and we’re very much driven by it. The tools that we develop are a product of the standard: our approach is standards first, tools second.
The .Stat Suite is made up of three components: the data management part (.Stat Data Lifecycle Manager), which allows an organisation to manage all of its data structures, including metadata, internally in a single process and as a single source of truth; the .Stat Core, a service layer that handles security and so on; and the .Stat Data Explorer, the dissemination end, where users can visualise, explore and share data.
Paul: So it’s a very specific data platform targeting a specific but widely held need across various entities working with statistical data. How did you come to work with the Open Knowledge Foundation and, by extension, Datopian?
Jonathan: The Open Knowledge Foundation has been around for a long time doing a lot of good work, and has very long experience of working in open source environments. With the community’s decision at a strategic level to go open source in 2019, we needed to look at other similar communities out there or similar projects and try to tap into their expertise and experiences to ensure that what we were doing wouldn’t fail at the first hurdle.
We’ve been ‘open’ in the sense of a community model for a very long time, but not open source, which is a completely different model. CKAN, as an open source product, has been a very successful project over many years and is used by some of our community members, so we looked at it as a good benchmark to work towards. We wanted to tap into your knowledge and bring in your expertise to help us on our journey to a full open source model.
Paul: If we go back to the initial engagement, we came together and ran some initial workshops with the OECD team. More recently, I worked with you, several other internal stakeholders at the OECD and some external ones from the community to produce a detailed report of recommendations on how to go about open sourcing the code, from community engagement through to more technical issues such as how the code bases work…
Based on that, what’s changed? Has there been any movement internally? And has that helped you on your path towards open source with the .Stat Suite?
Jonathan: So yes, the short answer is that it definitely helped us. The initial engagement was, as you said, quite early in our thinking, and we may not have had all those things in mind. It focused a lot on the process: the key aspects we had to think about, like documentation and CI/CD, the continuous integration and continuous delivery aspect.
Exposing our team to these new concepts, and to the things we needed to develop and put in place, led to the engagement with Datopian to review what had been done, see where we were, and identify what more we needed to do to launch as open source in June 2019.
Community Workshop from Idea to Reality
Paul: So what’s next for the .Stat Suite, or for open source at the OECD more generally?
Jonathan: The .Stat Suite has been developed heavily since the review that you undertook, and we’ve now put in place a full devops operation. We have three different delivery mechanisms: .Stat Suite as Containers, .Stat Suite as Codebase, and .Stat Suite as a Service.
With that, we’ve adopted technologies such as Docker and Kubernetes, which allow for easy deployment and scalability. For deployment, we’ve built example docker-compose files that allow developers to deploy the suite in a matter of minutes rather than days! So with this approach you don’t necessarily need to know all the underlying technologies. Providing these examples, and so lowering the barrier to entry for getting started with the project, came from your advice.
We’ve also appointed a Developer Advocate, which was actually a suggestion from your report. The Developer Advocate comes from the Australian Bureau of Statistics, a community member that volunteered and is quite involved in the daily process. They attend the regular devops team meetings and review all the documentation, and they also field some of the technical questions from both community members and non-members. The idea is for them to also manage the forum space, which we plan to set up in the near future.
That was also a suggestion from your report. You can see we’ve actually used a lot of your advice, and we’re moving forward with it quite fast, but in a way that is constantly improving and aligned with our team.
More generally, with open source at the OECD, I am not sure how far other teams will go. But for our project, our members and our extended community, it’s definitely opening up new projects and allowing the suite to be leveraged by others in different contexts. One example is a training workshop in Bangkok with six National Statistics Offices, run by the United Nations Statistics Division with support from UNICEF and the OECD. The .Stat Suite was deployed in a multi-tenant environment for this training, and each participant had access to their own environment, which was set up quite quickly using the new approach and technology that we put in place.
Paul: Wow, that’s really amazing. And Jonathan, it sounds like things are really moving forward. It’s really great.
Jonathan: It is. We still have challenges, as you can imagine. Documentation is still a work in progress; it’s improved a lot since you reviewed it, but there are still things to do, for sure.
© Datopian (CC Attribution-ShareAlike, CC BY-SA)