Poor data quality hindering government open data programme

Poor data quality is hindering the UK government's flagship open data programme, intended to reform the public sector by making it more transparent and accountable.

Public bodies have published spending records every month since November 2010 under a coalition programme billed as encouraging voters to become "armchair auditors" and to make politicians more accountable.

But experts say the government's data releases have been of such poor quality that public scrutiny is all but impossible.

A Computer Weekly analysis of 50 spending data releases by the Cabinet Office since May 2010 has shown they were so marred by "dirty data" and inconsistent computer encoding that systematic scrutiny would require advanced computer programming skills.

Ian Makgill, whose company Spend Network has processed 42 million public spending records from 7,500 spreadsheets, said the data was troublesome. "There's a range of problems with this data," he said. "Central government has pulled the wool over ministers' eyes. They don't want the accountability."

He accused some public bodies of "wilful" defiance of the coalition government's data edict. Wigan Council withheld spending data unless ordered to release it under Freedom of Information law, while the Ministry of Justice refused to publish spending data until the information commissioner ordered it to in February. Makgill said public bodies feared unfair criticism from tabloid newspapers.

A source working on the open data programme at the Cabinet Office said public data releases had been dirty and inconsistent.

"I would agree the evidence is there to support that," said the source. "They talked about armchair auditors – there hasn't been a lot of that. You can look around and not find them. Some busybody can read through the PDFs, but to make some sense of the aggregated mass is almost impossible with the raw data you've got."

A spokeswoman for the government-backed Open Data Institute (ODI) said it plans to release a certification scheme in an attempt to improve the quality of public data. The ODI has been training civil servants and working with the Institute for Government to address the problem.

"[Civil servants] just don't have the skills," she said. "They don't understand the difference between good and bad data."

Harvey Lewis, head of data analytics at consulting firm Deloitte, said government data releases had been "patchy".

"There's some standardisation but it's not complete, so cross-referencing is difficult," he said. "It's of varying quality from different departments – that's a challenge."

Lewis attempted to analyse spending by all government departments in a single month, but had to go back eight months to find the records and had trouble matching them up. Some departments had missing data, while some tagged it differently.

Experts said the widespread use of Microsoft Excel spreadsheet software was a common source of problems. It usually outputs data in a proprietary format that is hard to use for anyone not running Microsoft software.

The Cabinet Office Open Standards Board addressed this problem in January by decreeing that all government data releases should use a non-proprietary character encoding called UTF-8. However, even the Cabinet Office itself continued to release spending data in Microsoft formats.

Computer Weekly understands the Cabinet Office is still intent on getting its UTF-8 edict to stick. But the open data programme could be held up by its policy of dropping proprietary formats, such as Microsoft's, altogether.

Computer Weekly was informed through unofficial channels that the Cabinet Office believed Microsoft Excel caused data problems even when it used the officially sanctioned UTF-8 format.

The Cabinet Office's own data releases suggested that data problems were caused more by its staff than Microsoft software. Computer Weekly's analysis of 50 Cabinet Office data releases between May 2010 and March 2014 encountered a high frequency of dirty and inconsistent data, only explicable as human error.

Though the Cabinet Office administers the open data strategy, its own spending data was marred by inconsistent dates, filenames and data fields, and by problems with the data itself.

Its data tallies of public spending were regularly formatted with commas that derailed simple attempts at data processing. It changed its filename convention 22 times. It omitted or added data fields in 20 releases. It changed the way it formatted dates halfway through 2011 and, after a blip in 2013, changed it back in January to what it had been in 2010.
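To illustrate why comma-formatted tallies and shifting date conventions derail simple processing, the following is a minimal Python sketch of the kind of clean-up an analyst might have to write. The function names and the list of date layouts are assumptions made for illustration; the Cabinet Office's actual formats are not specified here.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical date layouts, chosen for illustration only.
DATE_FORMATS = ("%d/%m/%Y", "%d-%b-%y", "%Y-%m-%d")


def parse_amount(text):
    """Convert a spending figure such as '1,234.50' to a Decimal."""
    # Strip thousands separators and a leading pound sign before parsing.
    cleaned = text.strip().replace(",", "").replace("£", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        raise ValueError(f"unrecognised amount: {text!r}") from None


def parse_date(text):
    """Try each known layout in turn and return the first date that parses."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


if __name__ == "__main__":
    print(parse_amount("1,234.50"))
    print(parse_date("31/01/2011"))
    print(parse_date("31-Jan-11"))
```

A naive conversion such as float("1,234.50") fails outright, which is why each release needs this sort of normalisation before figures can be added up or compared across months.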

HM Treasury tried to prevent such problems in 2010 by publishing instructions for public data releases. It assumed public bodies would release their data in a Microsoft format, but it gave specific instructions to avoid the data problems consistently found in Cabinet Office data.

Cabinet Office data was nevertheless contaminated in 18 releases with characters incomprehensible to software using UTF-8. It was ill-formatted twice in the three months following its UTF-8 edict. It did not once publish its data in UTF-8. Instead, it flipped erratically between other character encodings 25 times in 50 months. It used the old US ASCII standard 16 times; the disused international ISO-8859 standard, related to Microsoft Windows, 12 times; and a similar non-ISO encoding on 22 occasions.
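The encoding problem can be sketched in a few lines of Python. The example below is an illustration only, not the method used in Computer Weekly's analysis: it tries a series of encodings until one decodes a file's bytes without error. It assumes the "similar non-ISO encoding" is Windows-1252 (Python's "cp1252") and that the ISO-8859 variant is ISO-8859-1; the function name decode_release is hypothetical.

```python
def decode_release(raw):
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    # Strictest encodings first: pure ASCII, then UTF-8, then Windows-1252
    # (assumed here to be the "similar non-ISO encoding").
    for encoding in ("ascii", "utf-8", "cp1252"):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # ISO-8859-1 assigns a character to every byte, so this cannot fail.
    return raw.decode("iso-8859-1"), "iso-8859-1"


if __name__ == "__main__":
    # A pound sign stored as the single byte 0xA3 is not valid UTF-8.
    text, encoding = decode_release(b"\xa310")
    print(encoding, text)
    # Once decoded, the text can be re-saved consistently as UTF-8.
    utf8_bytes = text.encode("utf-8")
```

A clean decode does not prove the guess is right: many byte sequences are valid in both Windows-1252 and ISO-8859-1, so a fallback of this kind can only narrow down the encoding, which is part of why undeclared, shifting encodings make automated scrutiny difficult.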

A Cabinet Office spokesman said: "Transparency is at the heart of our agenda and we’ve made this government the most transparent in history. But we are aware there is always more to do. It is our firm objective that data is of high quality. We continue to work with users and welcome suggestions."

A Ministry of Justice spokesperson denied the claims of dirty data: "These claims are untrue. Ensuring our data is accurate and open to scrutiny is crucial to public confidence in our work. While this has led to some delays, we are now publishing reliable data on a regular basis."

A Wigan Council spokesman said: "We do not publish spend data as a formality because we have previously encountered cases of attempted fraud [with it]."

The data problems follow Computer Weekly's revelation in 2013 that the Conservative Party deleted its archive of political speeches after promising voters it would nurture political transparency. Spending transparency was the other half of its flagship policy of government reform.

It has also been revealed that the Conservative open data strategy was held up in London, where mayor Boris Johnson's plans for reform were blocked by government officers.
