Amazing uses of open data and the seven ways people try to kill them

Shaun McGirr
August 26, 2015

Assuming you can find the right stone, how will you get the blood (or open data) out? Source: flickr/dj2lip (CC-BY-2.0)

Why I want more open government data (we paid for it already!)

As you might already know, I’m a bit of a crusader for open government data. Maybe it comes from my background as a researcher, but I do know that my zeal strengthened as soon as I started paying serious income taxes again.
To me the logic is quite simple: our taxes fund government to do a number of jobs, during which government collects data, so haven’t we already paid for it? And beyond the risk to individuals’ privacy (for which the protections are well-established) what harm does releasing this data cause? Here are the objections I hear most often, and a proposal to resolve each one.
But before that, here are three outstanding NZ entries from GovHack 2015 to demonstrate the value of open data:

  • PowerSlide lets you play with the complex tradeoff between CO2 emissions, power generation, annual cost and investment. I like this one because it visualises a complex policy decision AND encourages me to interact.
  • Crime Sheep takes some pretty bland and depressing table-based crime statistics and asks “what kind of crimes do people like you commit?” Even more impressive is that the team of two learned all the necessary skills while making it over GovHack weekend!
  • What’s Next? developed a web app to guide school students towards careers that leverage their interests but also pay a dime. Give it a shot and see some clever visualisation of jobs and salary, qualification and debt data.

1. “We don’t have the resources to release or maintain it, and there isn’t enough demand for this data or that format to justify the expense”

This is the saddest objection to opening up more government data, as it is the most easily resolved. While I’m all for mandates and guidelines to agencies to encourage them to open more data, the reality is it’s a little bit complicated and often the job isn’t assigned to the right people. Furthermore, providing the data once creates an expectation by users that you’ll keep it up to date, creating future costs. And providing it in one useful format might lead users to ask for it in another useful format…OH NO PEOPLE ARE USING OUR DATA!
Resolution: In New Zealand the good people at the Open Government Information and Data Programme have put together a toolkit to help agencies assess and open data and are only too willing to provide additional guidance. Ultimately it’s up to agencies to give data enough strategic priority that they adequately resource its management, including publication and maintenance. Then agencies need to develop and implement a comprehensive data lifecycle process and assign that job to a team as something more than an afterthought. Easier said than done, but agencies that get ahead of this requirement will avoid pain from highly motivated ministers at a later date!

2. “We already put spreadsheets on our website”

Almost every agency publishes some data, and that’s good. Even better when this is in some kind of tabular format rather than stuck inside PDFs. There are three main problems with most “spreadsheets on the web” approaches. First of all, the data inside the spreadsheet is often poorly described, a criticism that can be levelled at most published data. Second, the need to make spreadsheets easily readable by humans leads to formatting decisions (eg merged cells, in-line category headings) that make consuming the data any other way a highly manual task. Third, the spreadsheet up on your website might not be one that anybody wants to use.
Resolution: Publish “clean CSVs” instead of spreadsheets. Again the Open Data NZ people have put up a handy explainer on the difference between spreadsheet and CSV, which concords nicely with the recommendations of Hadley Wickham’s “Tidy Data” framework. Also ask users of your data what they use and what else they would like to use. If you don’t know who your users are, work out a way to find out! Statistics NZ used Loomio recently!
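To make the difference concrete, here is a minimal sketch of turning a human-formatted sheet into a clean CSV. Everything in it is an assumption for illustration: the rows, the area and breed names, the counts, and the helper names `tidy` and `to_csv` are all made up, and it only handles the single problem of in-line category headings. It uses nothing beyond Python's standard library.

```python
import csv
import io

# Hypothetical rows as they might come out of a human-formatted spreadsheet.
# "Auckland" and "Wellington" are in-line category headings: they sit in the
# first column and the remaining cells are empty.
messy_rows = [
    ["Breed", "2014", "2015"],
    ["Auckland", "", ""],
    ["Labrador", "120", "135"],
    ["Poodle", "40", "38"],
    ["Wellington", "", ""],
    ["Labrador", "60", "71"],
]


def tidy(rows):
    """Return one record per (area, breed, year) observation."""
    header, *body = rows
    years = header[1:]
    area = None
    records = []
    for label, *values in body:
        if not any(values):
            # A row with no values is treated as a category heading.
            area = label
            continue
        for year, value in zip(years, values):
            records.append(
                {"area": area, "breed": label, "year": int(year), "count": int(value)}
            )
    return records


def to_csv(records):
    """Serialise the tidy records as CSV text with one header row."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["area", "breed", "year", "count"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


print(to_csv(tidy(messy_rows)))
```

The result has one header row, one observation per line, and every variable in its own column, which is the shape that both Excel users and programmers can consume without manual cleanup.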

3. “We’re planning to put up an API to that data”

Recently the open data crusade has become intertwined with calls for government to use APIs (application programming interfaces) to provide everything from census statistics to new ways for government and businesses to interact. While I’m all for this and see the overlapping goals, an API that allows applications to easily grab your data is only part of the puzzle. For a start, how many of us know how to call an API? Is that even possible from most people’s tool of choice (Excel)? I can’t put this any better than the City of Boston’s Chris Dwelley:

“Throwing data up to say you do open data is a disservice to everybody,” says Dwelley. “We want it to be easier for every day citizens to understand this data, relate to it and actually WANT to use it.” Source

Resolution: Start by understanding who currently uses your data, how they use it, and what more they want. Then ask who is missing from this picture, and talk to them. Only some will ask for an API; many will simply ask for more well-documented, frequently updated, clean CSV files. Opening data to only the data geeks isn’t enough.
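For a sense of the gap between "we have an API" and "I can open this in Excel", here is a rough sketch of the extra step an everyday user would need. The JSON body below is a made-up stand-in for an API response (no real endpoint, field names or figures are implied), and `json_to_csv` is a hypothetical helper, not part of any agency's tooling.

```python
import csv
import io
import json

# Hypothetical API response body, invented for illustration only.
response_body = """
{"results": [
  {"area": "Auckland", "breed": "Labrador", "registered": 135},
  {"area": "Wellington", "breed": "Labrador", "registered": 71}
]}
"""


def json_to_csv(body):
    """Flatten a list of flat JSON records into CSV text."""
    rows = json.loads(body)["results"]
    out = io.StringIO()
    # Column order follows the keys of the first record.
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(json_to_csv(response_body))
```

Trivial for a programmer, but a real barrier for anyone whose tool of choice is a spreadsheet, which is why publishing the CSV alongside the API matters.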

4. “It belongs to this agency so you can’t have it”

This is the most galling to me as it goes against the logic of “I already paid you to collect it”. What’s even more depressing is that with a well-written Official Information Request I’m sure I could get the data, but that’s time-consuming for everybody involved. Of course, some data is collected for a specific legislative purpose and can’t be used for other purposes. But the vast majority of data collected is not encumbered by such conditions.
Resolution: Change your thinking from “this belongs to me” to “this resource belongs to all of us and it’s my responsibility to make sure it is well used”.

5. “What will you do with it, and will there be a return on our investment?”

Late last year I sat through an enlightening presentation by a public official on the new data generated by their project. In the Q&A somebody asked “would you consider opening that up for reuse?” to which the answer was, and I kid you not: “collecting this dataset was expensive for ratepayers and I’m not sure opening it up would give them a good return on that investment”. Either the barrier was really one of the other reasons above (like resource constraints or low trust in the data) or somebody forgot whom they serve, while looking round for somebody to buy our data and stop us from using it. Or that reaction could be driven by fear that the data might show someone (maybe even its owner!) in a bad light, leading to the “what will you do with it” question, which is usually impossible to answer in advance.
Resolution: While every data custodian dreams of “collect once, sell many”, it doesn’t mean it will happen. If you fail to on-sell your data, and the exclusive right to use it, to a third party within a short period of time it is you who is killing off returns on investment ( anybody?). It’s better to recognise that those who collect data can’t envisage all its uses, and this innovation is best enabled by opening it for reuse. And yes, maybe releasing the data openly leads to some sunlight being shone where you would prefer it didn’t go, but journalists and OIA-armed citizens are going to find their smoking guns regardless. Here are even more fantastic case studies of reuse of open government data to change your mind.

6. “Someone (or even worse, foreigners) might make money out of this”

I used to be on the other side of this issue. If you’re from the free-as-in-beer software movement, you might not think any intellectual property should be sold for cash money. Now that I have a real job I’ve learned you need money to do really cool stuff. Furthermore, what is the point of paying government to collect all this data if smart people can’t turn it into economic growth?
Resolution: If you work in government your #1 job should be to serve your public by making their lives better. Opening up some data is a really easy way to unleash people’s innovative ideas to make their lives better. Get it?

7. “I don’t want you to know the breed of the dog I registered”

Lest you think I just made that up, I heard this one raised as a ‘word of caution’ by a prominent public official at a recent event celebrating…a hackathon dedicated to open government data (FACE PALM!). But this is just one of the more ridiculous examples of the “generalised privacy/security/confidentiality” concern often levelled against requests to open up data, which is well-founded but misguided.
First of all, open data advocates generally couldn’t care less about unit record data on individuals and their behaviour; it’s Facebook and the Spies who want that (and who are collecting it regardless). All we’d like to know, for example, is how many of each breed of dog were registered in a given area over a given time period.
Secondly, even data about individuals can be confidentialised. Statistics NZ, for example, do this all the time and have well-established procedures. They’d probably even help you implement them within your organisation if asked nicely!
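As a rough illustration of what "aggregates" means in practice, the sketch below counts registrations per area and breed and hides any count below a threshold. This is a simple small-count suppression rule chosen for illustration; it is not a description of Statistics NZ's actual procedures. The records, the threshold of 3 and the function name `aggregate` are all assumptions.

```python
from collections import Counter

# Hypothetical unit records: one (area, breed) pair per registered dog.
registrations = [
    ("Auckland", "Labrador"),
    ("Auckland", "Labrador"),
    ("Auckland", "Labrador"),
    ("Auckland", "Labrador"),
    ("Auckland", "Poodle"),
    ("Wellington", "Labrador"),
    ("Wellington", "Labrador"),
    ("Wellington", "Labrador"),
]

# Assumed threshold for illustration, not an official value.
SUPPRESS_BELOW = 3


def aggregate(records, threshold=SUPPRESS_BELOW):
    """Count records per (area, breed); replace small counts with None."""
    counts = Counter(records)
    return {key: (n if n >= threshold else None) for key, n in counts.items()}


for (area, breed), n in aggregate(registrations).items():
    print(area, breed, "suppressed" if n is None else n)
```

The published table carries no names, addresses or individual rows, and the single Auckland poodle is hidden rather than exposed as a count of one.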
Resolution: It would be easy to just exclaim “idiots!” and walk away muttering, but open data advocates need to do a better job here. First we need to make reasonable requests that recognise legitimate confidentiality and privacy concerns. Use the word “aggregates” and demonstrate how what you’re asking for can’t be used to identify individuals. Be flexible and prepared to compromise to get some data opened up for reuse under a license compatible with the NZGOAL framework and use your success to educate others! Stone by stone, we’ll get there.
Until next time, keep asking better questions
Shaun – @shaunmcgirr
