“What century is that from?” is the natural reaction to getting a set of accounts from Companies House, the UK’s public register of company records.
If you get a chance to put this question to Companies House, you might be surprised to find the explanation as unsatisfying as the experience.
Computer Weekly put it to Companies House CEO Tim Moss in 2012, at a conference put on by the open data reformers then operating out of the Cabinet Office. They were trying to get public bodies to publish their records as data.
People had been labouring for years over the old-fashioned way Companies House published company records: as scanned images of paper documents that companies sent in the post, which it stuck in PDF files on its website. You could download company records from the public register. But since the documents were effectively photos of documents, you could get information from them only if you copied it out by hand. You couldn’t copy-and-paste a company’s tax record, for example, or directors’ salaries, offshore shareholders and so on. It was backwards steampunk: 19th Century technology delivered in a 21st Century wrapper.
“So what century is that from?”
Moss’s answer was data.
Companies House was preparing then to publish company accounts as data. It did so in 2014. But it still put “photo” PDFs on the public register.
You could get the data if you went round to the registrar’s tradesman’s entrance, as it were. Its bulk data downloads of company records were unsuitable for anyone but computer experts or users well endowed with time, computer skills and resources.
The general public tend to use the registrar’s front desk – the conventional companies registry, where you ask for details of a company and get a list of its records. That’s where you would expect people to go. When they do, they still get “photos” of documents.
Companies House has been absorbed instead by data, which it publishes as zip files containing tens of thousands of company records at a time. Its recent company data releases were published as daily downloads of up to 10,000 files amounting to 500MB at a time. It puts up to 200,000 files at a time in downloads for records more than a few months old, which it zips by the month-load. Its data serves the business information industry, not the general public.
Companies House claimed on 22 June it had reached a milestone in its modernization of company records when it let software developers get at its data through an application programming interface (API). Computer experts could use the API to build software applications that referred to its company records.
“This makes the UK register one of the most open in the world and the UK economy one of the most transparent,” Moss insisted in a statement in June.
And Mike Bracken, who was then director of the Cabinet Office modernization unit, insisted: “This is the model for registers of the future.”
But the general public still got photos of company records on its public-facing register.
So two weeks ago, CW put to Companies House press office the same question it put to Moss in 2012.
“Photocopied PDFs? What century are we in?”
We have given people data downloads, it said.
Do not try this at home though, people.
Let’s say you did though, because you wanted a company report you could use simply by cutting and pasting.
You would first have to find the Companies House data download that had the file you were looking for. That’s like finding a haystack before you know which one has the needle in it.
You might have to get every single daily and monthly download from the last twelve months – 80 zip files totalling approximately 90GB of data in upwards of 7m company records. It would take an inordinate amount of time just to download them, and then to unzip them.
Then you would need to find which file among 7m contains the report for the company you were looking for. But to search them you would have to convert them, from the iXBRL data format the registrar keeps them in, into an XML format that can be processed using standard computer programming tools…
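For a flavour of what that conversion involves, here is a minimal sketch in Python. It assumes the filing is well-formed XHTML carrying inline XBRL tags (`ix:nonFraction` for numbers, `ix:nonNumeric` for text). The sample document, the fact names and the `extract_facts` helper are all invented for illustration; real filings carry contexts, units, scaling and hidden sections that this ignores.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up iXBRL fragment: ordinary XHTML with ix: tags wrapped
# round the figures. The fact names (ex:...) are hypothetical.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2008/inlineXBRL">
  <body>
    <p>Company: <ix:nonNumeric name="ex:EntityName" contextRef="c1">Example Widgets Limited</ix:nonNumeric></p>
    <p>Turnover: <ix:nonFraction name="ex:Turnover" contextRef="c1" unitRef="GBP">1,234,567</ix:nonFraction></p>
  </body>
</html>"""


def extract_facts(ixbrl_text):
    """Return {fact name: displayed text} for every tagged fact found."""
    facts = {}
    root = ET.fromstring(ixbrl_text)
    for element in root.iter():
        # Tags arrive as "{namespace}localname". Matching on the local name
        # keeps the sketch independent of the ix namespace version.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in ("nonFraction", "nonNumeric"):
            facts[element.get("name")] = "".join(element.itertext()).strip()
    return facts


facts = extract_facts(SAMPLE)
print(facts)
```

Even this toy version only pulls out the text as displayed. Turning "1,234,567" into a number with the right sign, scale and reporting period is where the real work starts.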
You following? Good. Because you would need a computer expert to do this. You would first need skills to search the XML data. That would not be straightforward. And data derived from XBRL is, moreover, notoriously complex.
And that is not all. With so many files, in order to search them you would first have to index them, or to stuff them into a database system that would do the indexing for you, neither of which you could do unless you had an intimate understanding of the XBRL data’s underlying structure, or unless you paid for specialist software that did, and a specialist to do it.
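The simplest possible index, one that records only which file belongs to which company, might be sketched like this in Python with the SQLite database that ships with it. The filename pattern (an eight-character registered number set off by underscores) is an assumption, and the example filenames, table name and helper functions are invented. Indexing what is inside the files would still need the understanding of XBRL described above.

```python
import re
import sqlite3

# Assumption: the registered number sits in each filename as eight
# characters between underscores. Real filenames may differ.
NUMBER_IN_NAME = re.compile(r"_([A-Z0-9]{8})_")


def build_index(filenames, db_path=":memory:"):
    """Store (company number, filename) pairs in an indexed SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts (company_number TEXT, filename TEXT)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS by_number ON accounts (company_number)"
    )
    rows = []
    for name in filenames:
        match = NUMBER_IN_NAME.search(name)
        if match:  # skip files whose names carry no recognisable number
            rows.append((match.group(1), name))
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", rows)
    conn.commit()
    return conn


def find_accounts(conn, company_number):
    """Return every indexed filename for one registered number."""
    cursor = conn.execute(
        "SELECT filename FROM accounts WHERE company_number = ? ORDER BY filename",
        (company_number,),
    )
    return [row[0] for row in cursor]


# Invented filenames standing in for a month's download.
conn = build_index([
    "Example_0001_01234567_20141231.html",
    "Example_0002_07654321_20141231.html",
    "readme.txt",
])
print(find_accounts(conn, "01234567"))
```

Pointing `db_path` at a file rather than memory would let the index survive between sessions, which matters when building it means trawling through millions of filenames.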
So really, you aren’t going to do that. You might write some computer code to query the Companies House API. But as a member of the general public, you aren’t going to do that either, are you?
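For those who would, the code might begin something like the Python sketch below, which only builds the request and does not send it. The base URL, the `/company/{number}/filing-history` path and the use of HTTP Basic authentication with the API key as the username are assumptions that should be checked against Companies House’s own developer documentation. The `filing_history_request` helper and the placeholder key are invented for illustration.

```python
import base64
import urllib.request

# Assumption: base URL and path as understood at the time of writing.
# Check the official developer documentation before relying on them.
API_BASE = "https://api.company-information.service.gov.uk"


def filing_history_request(company_number, api_key):
    """Build (but do not send) a request for a company's filing history."""
    url = f"{API_BASE}/company/{company_number}/filing-history"
    # Assumption: Basic auth with the key as username and a blank password.
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


request = filing_history_request("02485577", "YOUR-API-KEY")
print(request.full_url)

# Sending it needs a real key and a network connection:
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```

You would first have to register for a key, then make sense of whatever comes back, which is rather the point.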
Let’s just say you found a clever way round this though.
You would need to know the registration number of the company you were looking for. And you would need to know on which day it had filed the document you were looking for. You could get this by looking it up in the public register, where its accounts are given as a photo PDF. Ignore the PDFs for now. Just get the details and come back here: we want the data.
Let’s say for example you were looking for the last published accounts for Methods Advisory Limited (formerly Methods Consulting Limited) – a firm operated by two of the architects of the government’s ‘transformation’ of the public sector, and which has profited them handsomely. You would see from the public register its registered number was 02485577 and that its last annual accounts were published there on 12 Feb 2015.
To get the data for Methods Advisory Limited you would need to get the Companies House data download for that day. It comes as a zip file of 115,937 files. The zip is so big it takes 10 minutes to download, almost three hours to unzip and contains so many files that it takes a whole minute just to open the folder (on a freelance journalist’s not-overly-creaky computer).
But if you got this far, you might have noticed that the data filenames contain within them the registered number for the company concerned. So to find the file you were looking for, you would need merely to use your local file browser’s search facility to find the file with a name containing ‘02485577’.
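Someone handy with Python could skip the unzipping altogether, since a zip file carries a list of its contents that can be read without extracting anything. The sketch below searches that list for a registered number. The `find_in_zip` helper and the filenames in the stand-in archive are invented for illustration, and it assumes only what is described above: that the number appears somewhere in each filename.

```python
import io
import zipfile


def find_in_zip(zip_source, company_number):
    """List archive members whose names contain the registered number."""
    with zipfile.ZipFile(zip_source) as archive:
        # namelist() reads the archive's directory only; nothing is extracted.
        return [name for name in archive.namelist() if company_number in name]


# Build a small stand-in archive in memory; the member names are invented.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("Example_0001_01234567_20141231.html", "<html/>")
    archive.writestr("Example_0002_07654321_20141231.html", "<html/>")
demo.seek(0)

print(find_in_zip(demo, "01234567"))
```

For a real download you would pass the path of the zip file instead of the in-memory stand-in. An empty list means the company has no file in that archive.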
There is a big problem though. Methods Advisory Limited’s accounts aren’t in the February 2015 download where they should be. There’s no choice but to go back to the public register’s front desk to check.
Only if you download the photo PDF will you learn that Methods directors actually filed their accounts on 30 January 2015. The registered date is merely the date Companies House loaded their accounts into its system.
So you would get the January download as well: 10 minutes to download, 3 hours to unzip and 133,228 files. It’s been more than six hours and 13GB. And you’ve only got two months of data.
Worse still, Methods Advisory Limited’s accounts aren’t in the January download either. They are simply not there. In fact, 40 per cent of companies haven’t filed their accounts with Companies House as data. Methods just happens to be one of them.
Then – if you copy it out by hand – you could relay the following information:
“On 1 May 2014, [Methods] demerged into 5 separate legal entities: Methods Advisory Limited, Methods Analytics Limited, Methods Digital Limited, Methods Enterprise Limited, and Methods Professional Services Limited.”
Many companies have such complex structures that the only way to find the records of the corporate entity a company uses primarily to account for its business is to actually open the accounts and look at them.
Fortunately, the public register already contains all the company records. Companies House has even indexed them. So you only need ask the registrar to give you a copy. You just go to the registrar’s website and do a search. It takes milliseconds. This is the purpose of Companies House – officially called its ‘public task’: to keep a reliable record of companies so members of the public don’t have to climb a mountain just to hold a single company to account for a single year.
Moss, the public registrar, was talking about this when Computer Weekly raised the question of electronic documents with him in 2012. Companies House did not have a misguided belief that it was only there to serve corporate interests. It had an official duty to serve the public interest, by giving the public the means to get company information about private companies.
“Our public task is very simple,” said Moss.
“It’s based on the Companies Act which is, incorporate the company, record all the events during its life, dissolve it, and make all the information available to the public.”
Yet the registrar is still publishing company records as photos.
What explanation did Moss have for a member of the public fed up with his photo PDFs in 2012?
“We have a big move now to iXBRL. We are now getting about 50 per cent of that data in and we are in the process – we have a board meeting this month on how we are actually making that available,” said Moss.
The registrar subsequently made it available to the general public, but only with the limitations described above.
Yet iXBRL, a data format HM Revenue and Customs developed to store company records, doubles as both data and a document people can actually cut-and-paste from in a web browser. So Companies House needed only to point people’s web browsers at the iXBRL files when they went to its website looking for company records.
It instead turned the iXBRL back into “photo” PDFs and put those on its website, going out of its way to limit public access to proper electronic company records.
Companies House press office said in response to this that it did plan to put iXBRL versions on its website, possibly within two months. But it was a low priority. It first declared a strategy to publish company records in such a format in 1995. It has since served the business information industry very well.