Knowledge Base

Welcome to the Datopian Knowledge Base! Here we explain the theoretical foundation behind Datopian DMS, our unique framework for building reliable, flexible systems for organizing and integrating data, as well as the basics of good data management.

Background: data needs are evolving

The volume and variety of data being collected by organizations on a daily basis continues to increase. Even the simplest of organizations may have dozens of data assets, ranging from cloud spreadsheets to web analytics. Larger organizations, on the other end of the scale, can have very complex data arrangements, ranging from Hadoop clusters and data warehouses to CRM systems.

The surging interest in collecting and integrating data reflects a growing understanding that data is valuable. Whether your goal is to increase profits, campaign for policy reform, or develop new medicines, data is the key to your success. Most organizations know, or at least suspect, that they are sitting on valuable data. What they are less sure about is how to unlock this value.

The more data you collect, the more robust your storage needs to be and the more sophisticated your system for managing it. However, while use cases for data have grown in number and complexity, most systems and approaches to working with data have not kept pace. This has left many organizations facing the common data management problems explored below.

Some common data problems

With decades of experience monitoring data usage across governments, enterprises and nonprofits, we’ve seen all manner of organizations come up against the same problems time and time again when trying to work with data.

Before approaching Datopian, organizations almost always find themselves using one of five approaches to data management:

1. No real approach to managing data

Many organizations are overwhelmed by data. They often don’t know what data they have or could acquire, what they could do with it, or where to start.

Organizations commonly at this stage: start-ups and small businesses.

2. No real approach to using data

The logical next step after finding and organizing data is working out how to connect it to concrete business needs. Common blockers at this stage are whether an organization can do so cost-effectively and whether it actually has the data it needs.

Organizations commonly at this stage: start-ups and small businesses. 

3. Processing data manually

The majority of organizations that approach Datopian are not new to managing data, but their approach is rarely advanced or even adequate. In most cases, they process data manually, sharing information via email or ad-hoc solutions such as Dropbox and wrangling it in Microsoft Excel. Invariably, they find this approach slow, error-prone, costly and difficult to scale. Training staff and identifying the right tools are additional challenges.

Organizations commonly at this stage: SMEs, nonprofits, local public authorities.

4. Building semi-automated pipelines

Some of the more technologically advanced organizations turn to building semi-automated pipelines, either by themselves or with the help of hired data engineers. While more advanced than processing data manually, this approach is expensive, hard to audit and even harder to debug. It can also leave organizations dependent on ad-hoc solutions and key staff.

Organizations commonly at this stage: large enterprises, large NGOs and nonprofits, public authorities.

5. Investing in proprietary solutions

For many organizations processing large amounts of data, purchasing a data solution from a vendor can seem like the only option. After all, there are many solutions for ETL, BI and data governance on the market. However, a common denominator among proprietary solutions is that they are expensive, and falling prey to vendor lock-in can lead to even greater expense further down the line. Moreover, their one-size-fits-all approach does not suit the evolving needs of modern organizations. Above all, proprietary solutions are inflexible and demand that an organization overhaul its workflows and goals to fit the limitations of the new technology.

Organizations commonly at this stage: governments, multinational corporations, global NGOs.

The Data Management Journey

The crucial pitfall shared by all the above approaches is that they do not treat data management as a process or journey. If you have not taken the first step of collecting the right data, or worked out how to use this data in pursuit of concrete business goals, then building a system to analyse it is a wasted investment.

Good data management involves paying equal attention to all stages of the data management journey. Here’s a basic overview of the data journey in four steps:

The Data Management Maturity Model 

As the Data Management Journey is just an overarching framework, different organizations will naturally fall at different points on the spectrum within each stage of the journey. For example, an organization that has integrated data from across its internal systems is just as far along the Data Management Journey as an organization that is integrating external data using an API; they are simply integrating at different levels of maturity.

The Data Management Maturity Model is a framework Datopian has developed over time to help clients better understand where they are now and where they still have room to go.