
From Socrata to CKAN: A Seamless Data Portal Migration


Key facts

Service providers:
Datopian

Client:
City of Santa Monica

Services:
UI/UX design; DevOps; Infrastructure; CKAN development; Migration from Socrata to CKAN

Period:
October 2022 – November 2022

Work we've done:
Data Santa Monica Gov

Brief summary of the project

The City of Santa Monica collaborated with Datopian to transition their data portal from the proprietary Socrata platform to the open-source CKAN platform. The move aimed to gain cost-efficiency, control, and flexibility while adhering to a 30-day deadline.

Problem

Santa Monica faced the issue of high subscription costs and underutilization of the features provided by the Socrata platform. Moreover, there was a pressing deadline as their existing Socrata contract was set to be terminated, necessitating a rapid migration of data.

Need

The City urgently required a more cost-effective, flexible, and customizable data portal solution. They required the new CKAN instance to be hosted on their own AWS account, fully migrated, and operational within 30 days. The platform needed to ensure seamless data migration, facilitate easy access to civic data, and allow for future enhancements and integrations.

Solution

Leveraging our specialized know-how in data management, Datopian effectively transitioned Santa Monica's data portal to a more cost-efficient CKAN-based solution hosted on AWS. Navigating tight deadlines and complex data migration tasks, we deployed a fully operational portal tailored to the city's specific needs while aligning with budget constraints. This cost-effective solution empowers Santa Monica with greater control over their data assets, thus reinforcing their commitment to data transparency and accessibility.

Main technologies & tools used

CKAN
Python
GraphQL
Hasura
PostgreSQL
AWS
GitHub Actions

Context

The City of Santa Monica wanted to reduce costs and improve the user experience of its data portal. It was not using all the features offered by Socrata, so it looked for an alternative that could provide similar benefits at a lower subscription cost.

To find a more cost-effective and flexible solution, the city decided to move to an open-source platform. After evaluating various options, it chose CKAN, which can be hosted on the city's own Amazon Web Services (AWS) account.

On top of this, the city was under a tight deadline: its Socrata instance was set to be terminated by the end of October, leaving only 30 days to migrate all the data and bring the new CKAN portal online. This put immense pressure on both the city and our team. Despite the challenges, the outcome was successful.

The city's decision to switch to CKAN proved to be the right one, providing them with the cost-effectiveness, flexibility, and control they needed, while also meeting their tight deadline.

The situation

Migrating from the proprietary Socrata system to the open-source CKAN presented several unique challenges. Tasked with the multifaceted assignment of migrating the City of Santa Monica's data portal, our responsibilities spanned from receiving authorization from the city's team to modify the AWS cloud architecture, to handling the intricate task of downloading and transforming all Socrata-based data to formats compatible with CKAN.

Furthermore, because Datopian was not contracted for dedicated Hosting and Support services, we extended our work to include specialized training. This gave the city's development team the skills to modify, update, and maintain the CKAN instance within a Kubernetes environment.

The criteria

The goal of the project for the city of Santa Monica was to migrate data, establish a CKAN instance, and allow for quick access and visualization of the data. The project was constrained by a tight deadline of just 30 days and a number of specific requirements, including:

  • Migrate all data to a database under the city's control
  • Create a Kubernetes cluster in the city's AWS account to host CKAN
  • Provide a means for users to filter data and download filtered results
  • Create a simple landing page for UI/UX
  • Modify names, such as changing "Organization" to "Topic" in the portal's UI
  • Provide documentation to guide maintenance and updates of the instance

The project presented an opportunity to demonstrate the importance of careful planning and attention to detail in ensuring a successful data migration and platform establishment.

The solution

Infrastructure

The first step in this process was to get our account credentials from the city's Information Services Department. Once we had them, our main focus was to build Terraform scripts to spin up the following infrastructure within AWS:

  • Kubernetes cluster using the EKS service
  • PostgreSQL database using the RDS service
  • Auxiliary services such as Blob Storage

With the infrastructure in place, we turned to deploying CKAN with the dx-helm-ckan Helm charts. The main obstacle was that we could not use the Google Container Registry, where most of our Docker images are stored, because the necessary credentials could not be provided to the city. We overcame this by storing most images in the AWS equivalent, Elastic Container Registry (ECR).
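As a rough illustration of what moving images from one registry to the other involves, the sketch below rewrites an image reference so that it points at an ECR registry instead of its original host. The function name, account ID, region, and image path are hypothetical; only the ECR hostname pattern (account ID, `dkr.ecr`, region, `amazonaws.com`) follows the standard AWS format. It is not taken from the project's scripts.

```python
def to_ecr_reference(image: str, account_id: str, region: str) -> str:
    """Rewrite an image reference so that it points at an ECR registry.

    The repository path and tag are kept; only the registry host changes.
    """
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    host, sep, remainder = image.partition("/")
    # Treat the first path segment as a registry host only if it looks like one.
    if sep and ("." in host or ":" in host):
        return f"{registry}/{remainder}"
    # No registry host present (e.g. "ckan/ckan:2.9"): prefix the whole reference.
    return f"{registry}/{image}"


# Hypothetical values, for illustration only.
print(to_ecr_reference("gcr.io/example-project/ckan:2.9", "123456789012", "us-west-2"))
```

In practice the rewritten reference would then be used as the image value in the Helm chart's configuration, after the image itself has been pushed to the new registry.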

After overcoming these challenges, we shifted our focus to the requirement to download filtered data. To accomplish this, we employed the Data-API, a separate standalone service that connects to a GraphQL server (in this case, Hasura) to retrieve data. This approach was necessary because some of the resources were very large (over 10 GB), and it ensured that the CKAN main thread was not blocked while processing requests.
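To make the idea concrete, here is a minimal sketch of how a standalone service might page through filtered rows from Hasura and emit them as CSV one page at a time, so that a large result never has to be held in memory. It is not the Data-API's actual code: the function names, the `permits` table, and its columns are hypothetical. The query shape relies on Hasura's generated `where`, `order_by`, `limit`, and `offset` arguments and its `<table>_bool_exp` filter type.

```python
import csv
import io
from typing import Any, Dict, Iterable, Iterator, List


def build_filtered_query(table: str, columns: List[str], where: Dict[str, Any],
                         order_column: str, limit: int, offset: int) -> Dict[str, Any]:
    """Build a GraphQL request body asking Hasura for one page of filtered rows."""
    query = (
        f"query Page($where: {table}_bool_exp!, $limit: Int!, $offset: Int!) {{\n"
        f"  {table}(where: $where, order_by: {{{order_column}: asc}}, "
        f"limit: $limit, offset: $offset) {{\n"
        f"    {' '.join(columns)}\n"
        f"  }}\n"
        "}"
    )
    variables = {"where": where, "limit": limit, "offset": offset}
    return {"query": query, "variables": variables}


def rows_to_csv_chunks(pages: Iterable[List[Dict[str, Any]]],
                       columns: List[str]) -> Iterator[str]:
    """Yield the CSV header, then one CSV chunk per page of rows."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")

    def flush() -> str:
        # Return what has been written so far and empty the buffer.
        text = buffer.getvalue()
        buffer.seek(0)
        buffer.truncate(0)
        return text

    writer.writeheader()
    yield flush()
    for page in pages:
        writer.writerows(page)
        yield flush()


# Hypothetical table and filter, for illustration only.
payload = build_filtered_query(
    "permits", ["id", "status"], {"status": {"_eq": "open"}}, "id", 1000, 0
)
print(payload["query"])
```

Each request body would be sent to the GraphQL endpoint with an increasing `offset`, and each returned page passed to `rows_to_csv_chunks`, whose output can be streamed straight to the client. Ordering by a stable column keeps the pages consistent between requests.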

Database migration

The Data Migration process was a crucial aspect of the project. With a massive 40 GB of data to transfer from Socrata to CKAN, our team faced a range of challenges.

One challenge was that Socrata does not differentiate between data and visualizations, making it difficult to determine which information was meant to be stored as raw data and which was used for mapping purposes. This resulted in maps being stored as data, leading to confusion and inefficiencies.

Further complexities arose in the data type mapping process, particularly when translating Socrata data types to PostgreSQL types. The task was notably challenging in the context of localization data. The workflow was further complicated by data corruption issues such as missing rows and incorrectly escaped special characters, which made an already time-consuming migration even harder.
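The type mapping step can be pictured as a lookup table from source types to PostgreSQL column types, used to generate the target table definition. The sketch below is illustrative only: the Socrata type names, the chosen PostgreSQL types, the helper names, and the example table are all assumptions, not the mapping used in the project.

```python
from typing import Dict, List, Tuple

# Illustrative mapping only: both the Socrata type names and the chosen
# PostgreSQL types are assumptions, not the project's actual mapping.
SOCRATA_TO_POSTGRES: Dict[str, str] = {
    "text": "TEXT",
    "number": "NUMERIC",
    "checkbox": "BOOLEAN",
    "calendar_date": "TIMESTAMP",
    "url": "TEXT",
    "point": "JSONB",
    "location": "JSONB",
}


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table: str, columns: List[Tuple[str, str]]) -> str:
    """Build a CREATE TABLE statement; unknown source types fall back to TEXT."""
    defs = [
        f"{quote_ident(col)} {SOCRATA_TO_POSTGRES.get(kind, 'TEXT')}"
        for col, kind in columns
    ]
    return f"CREATE TABLE {quote_ident(table)} ({', '.join(defs)});"


# Hypothetical dataset schema, for illustration only.
print(create_table_sql(
    "parking_citations",
    [("id", "number"), ("issued", "calendar_date"), ("where", "location")],
))
```

Falling back to TEXT for unrecognized types keeps the load from failing outright, at the cost of deferring type cleanup. Storing location values as JSONB is one simple option; a spatial extension such as PostGIS would be an alternative where spatial queries matter.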

Despite these difficulties, our team successfully transferred all of the data, preserving data integrity and laying a solid foundation for the city's future data endeavors.

CI/CD Pipeline

At Datopian, our team is well-versed in Google Cloud Build, but since it was unavailable for this project, we had to choose between its AWS equivalent (CodeBuild) and a different CI/CD service. After careful consideration, we chose GitHub Actions for two key reasons. First, our developers were already familiar with it and had used it successfully on a previous project. Second, the City of Santa Monica's development team was also more familiar and comfortable with GitHub Actions.

Frontend

The frontend requirements for the project were initially straightforward, requiring only a simple color change, which was easily achieved by adding custom CSS directly through the CKAN interface. Later, as the portal became live and operational, the City of Santa Monica requested a fully custom landing page, inspired by the City of Helsinki Data Portal. Our team was able to deliver on this request, providing a visually appealing and user-friendly landing page that met the City's needs.

Santa Monica portal frontend

The outcome

The project has been a resounding success, validating Datopian's expertise in moving organizations from proprietary data systems like Socrata to open-source platforms like CKAN. The City of Santa Monica's official data portal, data.santamonica.gov, now serves as a one-stop shop for citizens to access a wealth of information about their city, from public transportation schedules to demographic data. The portal's API also allows for easy integration with other services within the city's web ecosystem. This transformation not only improves the user experience but also reinforces the city's commitment to data transparency and accessibility.

What's next

The future trajectory for the City of Santa Monica's data portal is a focus on iterative improvements and enriched user experiences. Committed to exploring the full potential of CKAN's extensive capabilities, the Santa Monica team stands ready to elevate user engagement through innovative visualizations and advanced features. Their unwavering dedication to data transparency and open governance underscores the merit of our collaborative efforts.

At Datopian, our values of integrity, agility, technological acumen, and strong commitment to client success are what set us apart in the industry. The cornerstone of our business model is cultivating long-lasting partnerships that deliver tangible value. Our collaboration with Santa Monica is a testament to this, and we look forward to further opportunities to enhance their open data initiatives while continually supporting their journey towards an increasingly open and accessible data ecosystem.

We are the CKAN experts.

Datopian are the co-creators, co-stewards, and one of the main developers of CKAN. We design, develop and scale CKAN solutions for everyone from government to the Fortune 500. We also monitor client use cases for data to ensure that CKAN is responding to genuine challenges faced by real organizations.
