The government must improve its understanding of technology to prevent individuals' identities and personal information from being revealed in datasets, a privacy expert has warned.
The Cabinet Office commissioned a review last December on the privacy impact of the government's transparency agenda. In the report published today, Kieron O'Hara, a senior research fellow in electronics and computer science at the University of Southampton, warned that, as the subjects of datasets about government activity, individual citizens could be at risk of identification.
Discussion about identification has been driven largely by legal considerations, with the input of the technical community neglected, said O'Hara. "Legal definitions of privacy have tended to dominate the debate in the United Kingdom and elsewhere. However, these have proved inadequate to provide a clear framework for analysis of privacy issues, especially in the context of jigsaw identification using recently developed de-anonymisation techniques," he said.
O'Hara called for a greater awareness of the potential for identifying citizens. He said this could be achieved by including technologically trained experts in procedures for deciding whether to release particular datasets. There also needs to be a greater awareness of technical issues in the Information Commissioner's Office (ICO), he said.
O'Hara also recommended that data.gov.uk should include prominent reminders of the provisions of the Data Protection Act. He said it should clearly state that best practice discourages attempts to strip data of anonymity.
He also suggested the government be more transparent about the use of anonymisation techniques in datasets. "This will facilitate sensible and accurate debate about the risks and benefits of data releases," said O'Hara.
As part of its transparency drive, the government recently opened up public data from the NHS, schools, criminal courts and transport sectors.
More data is expected to be released at the end of this year, when the government is due to launch its Public Data Corporation (PDC), which will house the datasets. However, it remains unclear whether all data contained in the PDC will be made available.