Understanding access permissioning with CKAN and Datopian DMS

Rufus Pollock

Introduction

CKAN and Datopian DMS (which is built on CKAN) together form an integrated solution for managing metadata and data. They provide non-specialist users with an easy-to-use interface, while deep extensibility supports customisation and integration. It is this balance of simplicity and extensibility that has seen CKAN widely adopted by governments, nonprofits and enterprises all over the world.

Background

CKAN was initially designed for use in government as an open data platform. Because of these origins in open data, there was at first little demand for functionality to manage sensitive data. That need arose only as private companies began to look to CKAN as a tool to manage their internal data, since protecting sensitive data is a priority for many enterprise clients.

CKAN’s extensibility makes it an attractive option for any organisation hunting for a data management system with customisable access permission layers.

We’ll review some of the common approaches to managing sensitive data below.

Protecting sensitive data using discoverable metadata

CKAN’s suitability as a solution for protecting sensitive data lies in its high-granularity metadata search. For those unfamiliar with metadata search, or indeed with CKAN, we explain both in ‘The basics’ section below.

The basics

In order to understand how CKAN protects private data through metadata search, we need to understand a) some key terms and b) some basics about CKAN.

What is the difference between data and metadata? 

Metadata is ‘a set of data that describes or gives information about other data’ (Oxford dictionary). To use the example of a photo on your phone: the photo is the data, and its metadata is information such as the date of capture, the photo size, the storage location etc.

What is meant by Personally Identifiable Information (PII)?

Personally Identifiable Information is data that can be used to identify a specific individual. The ICO defines personal data in the following terms: ‘if it is possible to identify an individual directly from the information you are processing, then that information might be personal data’. Examples of PII include a name, phone number or address. This data is classed as sensitive and is often protected by data protection laws. This is especially true of highly sensitive personal data, such as personal medical history or financial data. Only persons with access permission can view PII through CKAN.

The major value proposition of CKAN is its ability to capture metadata about data in a central registry. In other words, CKAN can store and manage metadata for data stored elsewhere. Publishers who administer data source systems simply push metadata into the centralized, CKAN-based catalog.

This allows CKAN users without access to certain sensitive datasets to still see that these datasets exist on the system; in short, metadata search protects private data without sacrificing the discoverability of this data.

Consequently, CKAN works particularly well for organisations that hold highly sensitive data in existing, access-controlled source systems. Users come to the centralized catalog to find out what data exists, including data beyond the administrative boundaries they generally work in. If a user does not have sufficient permissions in the source system, she can request access from the data owner through CKAN.
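As a rough illustration of what pushing metadata into the catalog can look like, the sketch below builds a request for the `package_create` action of CKAN's Action API. The portal address, API token, organisation name and dataset fields are all placeholders invented for this example. The resource URL only points at the source system, so no data is uploaded to CKAN.

```python
import json
import urllib.request

CKAN_URL = "https://catalog.example.org"  # hypothetical portal address
API_TOKEN = "replace-with-a-real-token"   # hypothetical credential


def build_metadata_record(name, title, owner_org, source_url, notes=""):
    """Describe a dataset whose data stays in an external source system."""
    return {
        "name": name,            # URL-friendly identifier of the dataset
        "title": title,
        "notes": notes,          # free-text description shown in search results
        "owner_org": owner_org,  # organisation that publishes the record
        # The resource holds a link to the source system, not the data itself.
        "resources": [{"name": title, "url": source_url}],
    }


def build_request(record):
    """Prepare (but do not send) a POST to the package_create action."""
    return urllib.request.Request(
        url=f"{CKAN_URL}/api/3/action/package_create",
        data=json.dumps(record).encode("utf-8"),
        headers={"Authorization": API_TOKEN, "Content-Type": "application/json"},
        method="POST",
    )


record = build_metadata_record(
    name="neurology-drug-trial",
    title="Neurology drug trial",
    owner_org="neurology",
    source_url="https://trials.example.org/neurology/drug-trial",
    notes="Trial metadata only; results are held in the source system.",
)
request = build_request(record)

# urllib.request.urlopen(request) would submit the record; it is left out
# here so that the sketch runs without contacting any server.
print(request.get_method(), request.full_url)
```

Access to the underlying data is still decided by the source system, which is why the record carries only a link.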

Example use case

The best way to appreciate the significance of CKAN’s metadata search for organisations handling private data is to explore a practical application. Imagine a drug developer in the cardiology department wants to use CKAN to research clinical trials of drugs similar to the one she is currently working on. She may be able to see on CKAN that, last year, scientists in the neurology department ran a comparable drug trial on women aged 18-25 over a one-year period. What she can’t see are the actual results themselves, as they contain PII. However, she believes that this data could really help her research, and so she approaches the neurology department to request access permission. Had she not been able to use metadata for data search and discovery through CKAN, she would have had no way of knowing this similar trial had even taken place, since the sensitive data would not have shown up on the system.
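The scenario above can be modelled in a few lines of plain Python. This is a simplified sketch rather than CKAN code: the `CatalogEntry` class, the user names and the URL are hypothetical. It only illustrates the principle that search runs over metadata that everyone can see, while the data itself is released only to permitted users.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    title: str
    description: str
    owner: str                 # who to ask for access
    source_url: str            # where the protected data actually lives
    allowed_users: set = field(default_factory=set)


def search(catalog, term):
    """Search metadata only, so every user can discover matching entries."""
    term = term.lower()
    return [
        entry for entry in catalog
        if term in entry.title.lower() or term in entry.description.lower()
    ]


def open_data(entry, user):
    """Release the data location to permitted users, else name the owner."""
    if user in entry.allowed_users:
        return {"granted": True, "location": entry.source_url}
    return {"granted": False, "contact": entry.owner}


catalog = [
    CatalogEntry(
        title="Neurology drug trial",
        description="Trial on women aged 18-25 over a one-year period",
        owner="neurology-data-owner",
        source_url="https://trials.example.org/neurology",
        allowed_users={"neuro_scientist"},
    )
]

hits = search(catalog, "drug trial")             # the trial is discoverable
result = open_data(hits[0], "cardio_developer")  # but the results are not
print(hits[0].title, result)
```

The cardiology researcher finds the trial and learns whom to contact, without ever seeing the results.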

Implementing access permissioning with CKAN: the different approaches

CKAN caters for different access permission requirements. There are two main approaches to implementing access permissions.

Integration with organizational permissions for access control

CKAN has a default authentication and authorization interface. Often, users may wish to extend this interface with other organisation-specific systems, such as SSO or Active Directory. This is a particularly attractive option for organisations with existing log-in systems that hold basic information about organisational structures (e.g. who can access what, who belongs to which department, who is part of the leadership team). CKAN allows you to design access controls that are driven by the permission layers within your existing systems.

The way in which these systems interact with CKAN is variable and can be customised based on individual business needs. Here’s an example: you might design CKAN to allow anyone from the finance department to access financial data. When a user tries to access data in CKAN, CKAN pulls information on this user from your existing system. If your existing system tells CKAN that the user is from the finance team, CKAN will grant them access to all financial datasets. You might choose to add another level of complexity, for example by stipulating that only someone from finance who is also part of the executive team can access certain financial data.
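The finance example can be sketched as a small rule function. This is an illustration under stated assumptions, not CKAN's implementation: the `DIRECTORY` dictionary stands in for a lookup against SSO or Active Directory, and the user names and the `executive_only` flag are invented. The return value mirrors the dictionary with a `success` key that CKAN auth functions return.

```python
# Stand-in for an SSO / Active Directory lookup (hypothetical data).
DIRECTORY = {
    "alice": {"department": "finance", "groups": {"executive"}},
    "bob": {"department": "finance", "groups": set()},
    "carol": {"department": "marketing", "groups": set()},
}


def can_read(username, dataset):
    """Decide access from what the existing system says about the user."""
    profile = DIRECTORY.get(username)
    if profile is None:
        return {"success": False, "msg": "Unknown user"}
    # First layer: the user must belong to the department that owns the data.
    if profile["department"] != dataset["department"]:
        return {"success": False, "msg": "Restricted to the owning department"}
    # Optional second layer: some datasets also require executive membership.
    if dataset.get("executive_only") and "executive" not in profile["groups"]:
        return {"success": False, "msg": "Restricted to the executive team"}
    return {"success": True}


budget = {"name": "annual-budget", "department": "finance"}
salaries = {"name": "salaries", "department": "finance", "executive_only": True}

print(can_read("bob", budget))
print(can_read("bob", salaries))
print(can_read("alice", salaries))
```

In a real deployment a rule like this would typically live in a CKAN extension, with the directory lookup replaced by a call to the organisation's identity system.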

Access permissions can be applied to metadata, data stored in CKAN, and/or data stored in an external source system registered in CKAN.

Custom access for data hosted in CKAN

Another approach is to customise all access controls for data and metadata within CKAN itself, regardless of any existing authentication and authorization systems. In this approach, CKAN effectively becomes the only access point through which users interact with data. Generally speaking, this approach is less common, as it involves implementing complex organisational access controls in a single data system, which is generally not advised. It may nonetheless be the best option where CKAN is the only data management solution within an organisation.

This approach is the default way in which many of our government clients use CKAN. This is because many governments use CKAN as an open data portal, meaning they have far fewer requirements for access permission layers than our enterprise clients.

CKAN and secure data sharing

Being able to create custom access controls within CKAN has the added advantage of facilitating secure data sharing. You could, for example, customise the access permissions for people who hold a URL to certain datasets. Data can be shared at the following levels:

  • Share within a department.
  • Share within the organisation.
  • Share with external collaborators.

The last of these levels demonstrates a real strength of CKAN. If your organisation is partnered with, collaborating with or linked to another organisation, you can choose to grant them access to certain datasets.
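The three sharing levels listed above can be expressed as a simple visibility check. This is a minimal sketch with hypothetical names (`Scope`, `Dataset`, `User`, `can_view`), not part of CKAN, and it assumes each dataset records its owning organisation and department plus an explicit list of partner organisations.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    DEPARTMENT = "department"      # share within a department
    ORGANISATION = "organisation"  # share within the organisation
    EXTERNAL = "external"          # share with external collaborators


@dataclass
class User:
    organisation: str
    department: str


@dataclass
class Dataset:
    organisation: str
    department: str
    scope: Scope
    partner_organisations: set = field(default_factory=set)


def can_view(dataset, user):
    """Apply the sharing level chosen for the dataset."""
    if user.organisation == dataset.organisation:
        if dataset.scope is Scope.DEPARTMENT:
            return user.department == dataset.department
        return True  # organisation-wide and external scopes include insiders
    # Outsiders only get in when the dataset is shared with their organisation.
    return (
        dataset.scope is Scope.EXTERNAL
        and user.organisation in dataset.partner_organisations
    )


trial = Dataset("pharma-co", "neurology", Scope.EXTERNAL, {"partner-lab"})
notes = Dataset("pharma-co", "neurology", Scope.DEPARTMENT)

print(can_view(trial, User("partner-lab", "research")))
print(can_view(notes, User("pharma-co", "cardiology")))
```

Keeping the partner list on the dataset means that external access is granted dataset by dataset rather than to a whole organisation at once.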

Executive summary

CKAN can be used to create varying approaches to protecting private or sensitive data. It can work alongside your existing organisational systems and access permission layers can be as complex or as simple as needed.

We are the CKAN experts.

Datopian are the co-creators, co-stewards, and one of the main developers of CKAN. We design, develop and scale CKAN solutions for everyone from government to the Fortune 500. We also monitor client use cases for data to ensure that CKAN is responding to genuine challenges faced by real organizations.
