Service providers: Datopian
Services: building and management of an open data portal based on CKAN; CKAN extensions; custom tool development; data strategy; data consultancy
Period: January 2020 – present
Summary of what we did for them:
– UI/UX design work, including content architecture work (such as re-organising content to different levels) and wireframe development
– Consultation and development around technical infrastructure and various data-related issues
– Development and deployment of a customised CKAN open data portal, with extensions on top of vanilla CKAN installed and supported
– Data explorer extension which gives the ability to filter effectively and preview large resources with ease and download filtered resources using BigQuery
– Functionality for zipped upload of resources
– Providing API usage data to allow analysis of data trends
– Developing tools for transparent audit trail and output log of all user activities, etc.
– Fully responsive design for desktop, mobile and tablet devices
The NHS Business Services Authority (NHSBSA) is an arm’s length body of the Department of Health and Social Care which provides a number of support services to the wider National Health Service. It manages over 35 billion pounds of NHS spend annually, delivering a range of national services to NHS organisations, NHS contractors, patients and the public. Its purpose is to be a catalyst for better health and its vision is to be the delivery partner of choice for the NHS.
With 35 billion pounds of NHS spend to manage and a broad range of services to deliver, from processing data and information that help the NHS run more efficiently to helping digitize more of its services, the NHSBSA was looking for a solution that would ensure its Open Data Portal fulfilled its purpose and vision. Before approaching us, they did extensive market research. Datopian came out as a leading provider that could help them shape their data journey.
Our CKAN product is widely recognized as being one of the best products to use to create an open data portal.
One of NHSBSA’s main priorities was, and is, to make data freely available to everyone and to publish as much data in the open as possible. Their requirements included:
- 500 GB of storage for Open Data.
- A minimum of 2,000 downloads a month, equivalent to 6 TB of download traffic out of the solution, in addition to API queries to the datastore for specific resources.
- A public, SQL-queryable API data endpoint for each resource, achieved by ingesting each flat CSV file into the CKAN datastore. The download / go to resource button on each resource should provide the CSV as a zip, to reduce the download size for users when the file is large.
- A suitable data explorer extension that lets external users filter and quickly preview large resources added to the portal, e.g. preview and return filtered data from resources of 4 GB up to approximately 40 GB with ease and little lag, with no limit on the number of rows that can be extracted and downloaded from this data preview / filter.
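To illustrate what a SQL-queryable endpoint per resource can look like from a user's side, here is a minimal sketch that builds a query URL for CKAN's standard `datastore_search_sql` action. It assumes the portal exposes that action unchanged; the portal address, resource ID and column name are hypothetical placeholders, not values from the NHSBSA portal.

```python
from urllib.parse import urlencode


def build_sql_query_url(portal_url: str, resource_id: str,
                        where: str = "", limit: int = 100) -> str:
    """Build a URL for CKAN's datastore_search_sql action.

    In the CKAN datastore each resource is a table named after its
    resource ID, so the ID is quoted as a SQL identifier.
    """
    sql = f'SELECT * FROM "{resource_id}"'
    if where:
        # The caller supplies the filter; this sketch does not sanitise it.
        sql += f" WHERE {where}"
    sql += f" LIMIT {int(limit)}"
    base = portal_url.rstrip("/")
    return f"{base}/api/3/action/datastore_search_sql?{urlencode({'sql': sql})}"


if __name__ == "__main__":
    # Hypothetical portal, resource ID and column, for illustration only.
    print(build_sql_query_url(
        "https://portal.example.org",
        "00000000-0000-0000-0000-000000000000",
        where='"YEAR_MONTH" = 202007',
        limit=10,
    ))
```

Fetching the resulting URL with any HTTP client returns JSON, which is what makes the same endpoint usable both from scripts and from a data explorer front end.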
Our team has clear, established methodologies for implementing open data projects. We also know that, to be a delivery partner of choice, NHSBSA needs to make sure that its services improve the customer experience, that it is easy to work with, and that it delivers real value that people can see.
Currently the data portal holds 222 resources in total (both public and private) across 43 datasets, amounting to around 1.3 TB of data.
We pulled some statistics: on a random day in July there were over 2,000 interactions (API calls, explorer interactions, downloads, etc.), amounting to around 150 to 200 GB of data per day. So we’re talking about large volumes of data.
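To put those figures next to the original requirement of 6 TB of download traffic a month, here is a rough back-of-the-envelope calculation. It assumes decimal units (1 TB = 1,000 GB), a 30-day month, and that the sampled day is typical, none of which is stated above.

```python
GB_PER_TB = 1000  # decimal units, an assumption
DAYS_PER_MONTH = 30  # an assumption


def average_gb_per_interaction(daily_gb: float, daily_interactions: int) -> float:
    """Average data volume moved by one interaction."""
    return daily_gb / daily_interactions


def monthly_tb(daily_gb: float, days: int = DAYS_PER_MONTH) -> float:
    """Monthly traffic if every day looked like the sampled day."""
    return daily_gb * days / GB_PER_TB


if __name__ == "__main__":
    for daily_gb in (150, 200):
        per_interaction_mb = average_gb_per_interaction(daily_gb, 2000) * 1000
        print(f"{daily_gb} GB/day: about {per_interaction_mb:.0f} MB per interaction, "
              f"{monthly_tb(daily_gb):.1f} TB per month")
```

Under these assumptions, 150 to 200 GB a day works out to roughly 4.5 to 6 TB a month, which is in the same range as the 6 TB monthly traffic requirement.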
BigQuery handles large files faster than PostgreSQL, which is why we used it to replace PostgreSQL. We use Aircan, which is Airflow-based, to load the data. We developed a data publishing UI that is a bit different from the classic one: it lets you upload data directly to the cloud, and once a file is in the file store it is picked up by Airflow and pushed to BigQuery. So staff at NHSBSA upload large data files of around 5 or 6 GB straight into the file store, which is very convenient and much quicker.
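As a rough illustration of the last step of that flow, the sketch below builds the request body for a BigQuery load job that reads a CSV from a storage bucket into a table. It is not Aircan's actual code: the field names follow the BigQuery REST API's load job configuration, while the project, dataset, table, bucket and file names are hypothetical, and schema autodetection is an assumption (a real pipeline may pass an explicit schema instead).

```python
def build_load_job(project: str, dataset: str, table: str,
                   bucket: str, object_name: str) -> dict:
    """Build a BigQuery REST API job body that loads one CSV from a bucket."""
    return {
        "configuration": {
            "load": {
                # The file the publisher uploaded to the file store.
                "sourceUris": [f"gs://{bucket}/{object_name}"],
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                "sourceFormat": "CSV",
                "skipLeadingRows": 1,  # skip the CSV header row
                "autodetect": True,  # assumption: let BigQuery infer the schema
                # Replace the table contents on every re-upload.
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    job = build_load_job("example-project", "open_data", "resource_table",
                         "example-bucket", "uploads/resource.csv")
    print(job["configuration"]["load"]["sourceUris"])
```

In an Airflow-based setup, a task would submit a body like this once the upload lands in the bucket, so the large file never has to pass through the portal's web server.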
The zipped-resource requirement was met with a script that creates a zipped version of each resource, which can be downloaded alongside the standard CSV. We also developed a data explorer extension that lets users filter and preview large resources with ease and download only the filtered rows instead of the whole file.
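The zipping step can be sketched in a few lines of standard-library Python. This is a minimal illustration under our own assumptions, not the script used on the portal: the function name and the choice to write the archive next to the CSV are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_resource(csv_path) -> Path:
    """Write a compressed copy of a CSV next to it and return the zip path."""
    csv_path = Path(csv_path)
    zip_path = csv_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the file name, not the full directory path.
        archive.write(csv_path, arcname=csv_path.name)
    return zip_path


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    sample = Path("resource.csv")
    sample.write_text("id,value\n1,a\n2,b\n")
    print(zip_resource(sample))
```

CSV files are text with many repeated values, so they usually compress well, which is what makes the zipped download worthwhile for multi-gigabyte resources.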
To mention a few of the services Datopian has provided:
- Customisation of the front end according to the requirements of the client.
- Content architecture work (such as re-organising content to different levels).
- Custom extensions on top of vanilla CKAN, such as a data explorer extension that lets users filter and preview large resources with ease and download filtered resources, using BigQuery and Google Cloud Storage buckets for data storage.
- Uploads through the datapub extension, which adds the ability to upload tabular data and edit the schema of the resource’s BigQuery table.
- Zipped resources: users can download a zipped version instead of the uncompressed file, which saves them from downloading large files.
- Egress details for each NHS storage bucket on Google Cloud Platform, provided via an API so that they are available for analysing data trends on the NHS open data platform.
- Tools for a transparent audit trail and output logs of user activities on the open data portal (ODP), stored in Google Cloud Storage through a sink for analysis of ODP usage trends.
In the coming years, NHSBSA will be looking to:
- Grow and develop knowledge about user needs including the ability for users to engage with NHSBSA and each other.
- Continue to publish more data from across the NHSBSA.
- Continue to collaborate with other parts of the NHS and outside the NHS.
- Improve the ability to search the datasets across the web.
- Continue to use the portal for Freedom of Information (FOI) requests to support the wider NHSBSA around operational efficiency and making more data open.
We create, maintain, and deploy data management technologies for government, enterprise, and the non-profit sector using CKAN, Frictionless Data, and other open-source software.