When it comes to data, “most people don’t know what you’re talking about”, says Martha Lane Fox, founder of Lastminute.com and now a board member of the Open Data Institute (ODI), speaking at last year’s Open Data Summit. “It feels like a closed community.”
This was the opening session of the summit, which takes place at the British Film Institute in London every November and is organised by the ODI. The guiding principle of ODI co-founder Tim Berners-Lee is that data – largely, but not exclusively, government data – can be “freely used, modified and shared by anyone for any purpose”.
The theme of the summit – The web of data – was a reference to the critical need to link open datasets to each other to make the data they contain more powerful. At one level, this amounts to what Berners-Lee says is the test of a good city – can a smartphone get you from A to B on public transport?
ODI co-founder Nigel Shadbolt describes the web of data as “a good idea, a foundation which has not been laid yet”.
There are examples of data being linked in useful ways. In several, but by no means all, cities in the UK and Europe, Citymapper draws on open datasets, including mapping data and public transport timetables, to show people where they are and what their options are for getting where they want to go.
To do this, the data should, first and foremost, be available and up to date. It should also be in machine-readable format. Bus timetables in PDF form are not much fun for human beings – and they are almost useless for navigation apps.
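As a minimal sketch of why format matters, assume a bus timetable is published as CSV with three hypothetical columns: stop, route and departure. The column names and values below are invented for illustration and do not come from any real publisher. With data in that shape, a few lines of Python can answer the kind of question a navigation app asks, which a PDF would leave to human eyes.

```python
import csv
import io

# Hypothetical timetable extract; the column names and values are
# illustrative and not taken from any real publisher.
TIMETABLE_CSV = """stop,route,departure
High Street,12,08:05
High Street,12,08:35
High Street,7,08:20
Station Road,12,08:15
"""

def next_departures(csv_text, stop, after):
    """Return sorted (departure, route) pairs from `stop` at or after `after` (HH:MM)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Zero-padded HH:MM strings sort in time order, so plain string
    # comparison is enough for this sketch.
    matches = [
        (row["departure"], row["route"])
        for row in rows
        if row["stop"] == stop and row["departure"] >= after
    ]
    return sorted(matches)

print(next_departures(TIMETABLE_CSV, "High Street", "08:10"))
```

The same question asked of a scanned or typeset PDF would first require extracting the table, a step that is error-prone and differs for every publisher.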
Citymapper is often cited as an open data success story, but such successes are comparatively rare. A counter-example was raised at the summit in a question concerning Threesixtygiving.org.
On its website, 360Giving says it “supports organisations to publish their grants data in an open, standardised way and helps people to understand and use the data to support decision making and learning across the charitable giving sector”. But a questioner from the floor pointed out that UK government data on grants is not currently open.
Why publishing data is not enough
So there is no shortage of open data – but is anyone using it? The UK government’s data portal, Data.gov.uk, currently shows 36,552 published datasets available, and just over 30,000 of those have an open government licence.
There are 6,444 more without a licence and a further 3,664 are listed as “unpublished”. Some 1,401 government departments, including local government and agencies, are listed as “publishers”. Two million datasets were downloaded in 2016, but 11,481 datasets – some 31% of the whole collection – were not.
The UK government sees publication as a measure of success in itself. In June 2015, the then environment secretary, Liz Truss, announced that the Department for Environment, Food and Rural Affairs (Defra) would be opening thousands of datasets to the public. She said at the time: “Defra has more broad, varied and rich data than any other government department.”
Truss said Defra would open 8,000 datasets by June 2016, calling it “the biggest data giveaway Britain has ever seen”. To explain the rationale, she said: “Tech City people, developers, entrepreneurs, scientists, investors and NGOs [non-governmental organisations] will have full and open access. This has the potential to bring billions of pounds to our economy.”
At the Open Data Summit, Defra permanent secretary Clare Moriarty said the department had opened 13,000 datasets, “some of them very large”. The data-fuelled future, she said, was one where “wine lovers will be able to sip English bubbly made from the sweetest grapes because growers have found the best soil and slopes, and canoeists will be able to check an app to see how fast their local river is flowing”.
Biggest data giveaway
Site analytics suggest that just 669 Defra datasets have been downloaded, with the most frequently downloaded being those covering staff pay and organograms, and financial transactions over £25,000. Together, these two datasets account for one in six of all downloads of Defra’s published data.
Figures showing how much data government departments have published are pored over by civil servants and politicians, keen to be seen as being on the open bandwagon. Then there are league tables. The Open Knowledge Foundation maintains an annual Global open data index, which consistently rates the UK in the top two or three for “readiness”.
The Global open data index looks at the 10 key datasets that should, in the foundation’s view, be open, up to date and machine-readable. These include items such as mapping, government statistics, budgets, legislation and transport timetables. The latest survey gives the UK a score of 76%, putting it in second place behind Taiwan.
The Global open data index 2014, published in 2015, contained results for 97 countries, which would have given a maximum of 970 open datasets in the key areas. The figure for datasets that existed, and were open, available, up to date and machine-readable was 106 – a shade under 11%.
In the edition published in 2016, more countries have been included – just under 160. The survey tracked the availability of 1,585 datasets – of which just 142, or 9%, are available, machine-readable and up to date.
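The percentages quoted for the index, and for Data.gov.uk earlier, can be checked with simple arithmetic. The short sketch below uses only the figures given in the text; the helper name `share` is an illustrative choice.

```python
def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Figures as quoted in the text.
index_2014 = share(106, 970)        # open datasets out of a possible 970
index_2016 = share(142, 1585)       # open datasets out of 1,585 tracked
not_downloaded = share(11481, 36552)  # Data.gov.uk datasets not downloaded in 2016

print(index_2014, index_2016, not_downloaded)
```

On these numbers the share of fully open key datasets fell by roughly two percentage points between the two editions, although the later edition covers many more countries, so the two figures are not a like-for-like comparison.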
Are these the right metrics? The Open Data Summit featured several talks and a panel discussion devoted to data infrastructure, but there was little or no agreement on what aspect of data infrastructure was being considered, and therefore what the precise question was. Were we talking about servers and connectivity, or data formats and readability?
The Global open data index does not track data that may be being accessed and/or linked to other data in the background. So, by concentrating on this more visible data infrastructure and the ability of someone to find a spreadsheet online, the global index is not really tracking open data as it is actually used.
Not the whole story
In other words, published spreadsheets, available at the click of a mouse, do not tell the whole story. The real work of linking data is done when data is accessed through an application programming interface (API) as part of a more complex app, and that is where the real tests of the openness and usability of data lie.
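To make the idea of linking concrete, here is a minimal sketch that assumes two hypothetical open datasets, one of bus stops and one of live departures, that share a stop identifier. Every field name and value is invented for illustration. A real app would fetch such records through each publisher's API rather than hold them in memory.

```python
# Two hypothetical datasets from different publishers, joined on a shared
# identifier ("stop_id"). All field names and values are illustrative.
stops = [
    {"stop_id": "S1", "name": "High Street", "lat": 51.501, "lon": -0.120},
    {"stop_id": "S2", "name": "Station Road", "lat": 51.507, "lon": -0.110},
]
departures = [
    {"stop_id": "S1", "route": "12", "due_min": 4},
    {"stop_id": "S2", "route": "7", "due_min": 9},
    {"stop_id": "S9", "route": "3", "due_min": 2},  # no matching stop record
]

def link(stops, departures):
    """Attach the stop name to each departure; drop departures with no match."""
    by_id = {stop["stop_id"]: stop for stop in stops}
    return [
        {**dep, "name": by_id[dep["stop_id"]]["name"]}
        for dep in departures
        if dep["stop_id"] in by_id
    ]

for row in link(stops, departures):
    print(row["name"], row["route"], row["due_min"])
```

The departure for stop `S9` is silently lost because nothing in the other dataset matches it, which is one reason consistent identifiers across publishers matter as much as publication itself.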
Here, too, there is much to be done. APIs that are available on many UK government data publishers’ web pages adhere to no single standard, and reflect considerable uncertainty about who the user might be and what they might want.
Some APIs limit what they will allow the user to do, while an enlightened few offer a full range of search and download options, expressed in a user-friendly style, which, in turn, give the user much freer rein.
Without a fuller understanding of what users want from datastore APIs, the web of data will be a long time coming.