August 23, 2019 by Adam Kariv
DataCity is a project aimed at creating a single repository of all municipal data in Israel. In my presentation at this year’s PyCon Israel, I’ll talk about the project and the Python toolset we’ve built to create and manage this large ETL operation.
Municipalities are probably the branch of government that affects us the most (think education, garbage collection, building permits, etc.).
They are also notoriously non-transparent, which makes it difficult for us citizens to make sure that the people in charge are making good use of our taxes and that our city is performing well in comparison to others.
Less than half of Israel’s municipalities publish essential information on their websites, such as phone numbers or the opening hours of city hall; moreover, about 7% of municipalities don’t even have a website.
And that is only the most basic level of their lack of transparency and openness.
Very few publish good-quality data methodically, and even good-quality data is not enough on its own, because there is no common standard for it.
At the beginning of 2019, we (at the Public Knowledge Workshop, also known as “Hasadna”) embarked on a project to make municipalities more transparent: DataCity. In this project we aim to create a single API endpoint for all municipalities’ data (normalized, standardized, verified and regularly updated).
There are a few problems along the way, though. First, municipalities don’t really want to be transparent. Second, their data is of low quality and highly non-uniform.
To solve the second point, we’re building a versatile framework for extracting data from various sources and formats, cleaning it, mapping it to a predefined schema, validating it with domain-specific rules, enriching it and finally publishing it in our data warehouse.
We’re doing all that in a reusable way, based on open source tools.
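The stages listed above (extract, clean, map to a schema, validate, enrich, publish) can be sketched as a chain of Python generators. This is a minimal, standard-library-only illustration under stated assumptions, not DataCity’s actual code and not the dataflows API: the schema, the column mapping, the validation rules, the population table and the sample rows (“Town A”, “Town B”) are all hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]

# Hypothetical target schema: canonical field name -> Python type.
SCHEMA = {"municipality": str, "year": int, "budget_ils": float}

# Hypothetical mapping from one source's raw headers to schema fields.
COLUMN_MAP = {"Municipality Name": "municipality", "YEAR": "year",
              "Total Budget (NIS)": "budget_ils"}

# Hypothetical reference data used for enrichment.
POPULATION = {"Town A": 50000}

# Made-up sample input in the "raw" format of a single source.
SAMPLE_CSV = """Municipality Name,YEAR,Total Budget (NIS)
 Town A ,2018,"1,000,000"
Town B,2018,-5
,,
"""


def extract(csv_text: str) -> Iterator[Row]:
    """Read raw rows from CSV text (a stand-in for a file, API or scraper)."""
    yield from csv.DictReader(io.StringIO(csv_text))


def clean(rows: Iterable[Row]) -> Iterator[Row]:
    """Strip whitespace from headers and values; drop fully empty rows."""
    for row in rows:
        cleaned = {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        if any(cleaned.values()):
            yield cleaned


def map_to_schema(rows: Iterable[Row]) -> Iterator[Row]:
    """Rename source columns to schema fields and cast values to schema types."""
    for row in rows:
        out: Row = {}
        for raw_name, field in COLUMN_MAP.items():
            value = str(row.get(raw_name, ""))
            ftype = SCHEMA[field]
            if ftype in (int, float):
                value = value.replace(",", "")  # drop thousands separators
            try:
                out[field] = ftype(value) if value else None
            except ValueError:
                out[field] = None  # unparseable; left for validation to reject
        yield out


def validate(rows: Iterable[Row], rejected: List[Tuple[Row, List[str]]]) -> Iterator[Row]:
    """Apply (hypothetical) domain rules; pass valid rows, record rejected ones."""
    for row in rows:
        problems = []
        if not row["municipality"]:
            problems.append("missing municipality")
        if row["year"] is None or not 1948 <= row["year"] <= 2100:
            problems.append("implausible year")
        if row["budget_ils"] is None or row["budget_ils"] < 0:
            problems.append("missing or negative budget")
        if problems:
            rejected.append((row, problems))
        else:
            yield row


def enrich(rows: Iterable[Row]) -> Iterator[Row]:
    """Add a derived field using external reference data."""
    for row in rows:
        population = POPULATION.get(row["municipality"])
        row["budget_per_capita"] = (
            round(row["budget_ils"] / population, 2) if population else None
        )
        yield row


def run_pipeline(csv_text: str) -> Tuple[List[Row], List[Tuple[Row, List[str]]]]:
    """Chain the stages lazily; list() stands in for loading into a warehouse."""
    rejected: List[Tuple[Row, List[str]]] = []
    rows = enrich(validate(map_to_schema(clean(extract(csv_text))), rejected))
    published = list(rows)
    return published, rejected


if __name__ == "__main__":
    published, rejected = run_pipeline(SAMPLE_CSV)
    print(published)
    print(rejected)
```

Because every stage consumes and yields rows lazily, a generic stage such as the validator can be reused across many sources, and only the per-source column mapping has to change. The dataflows library takes a similar compositional approach, building a `Flow` out of a sequence of processing steps, though with its own API.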
In my presentation I’ll talk in more detail about the software tools we used (e.g. the dataflows ETL library) as well as the reasons for the lack of transparency.
I’m a Senior Data Engineer at Datopian, an open data consultant and activist, and founder of the Public Knowledge Workshop (הסדנא לידע ציבורי).