Unstructured data compliance: Obstacles and solutions

Stored unstructured data could be a black hole full of unknown risk. We look at the key dangers to compliance in unstructured data and some ways of mitigating the risks.

Stephen Pritchard

Published: 24 Jun 2019

Data scientists estimate that 80% of the world’s electronic information is “unstructured”, or held as email, documents, video and photographs, or even free text.

“Unstructured data is data held outside data structures like tables and rows, without predictable content patterns, such as documents, emails, photos or free text,” says Jacob Isaksen, a digital forensics expert and founder and CEO of Avian, a consulting firm based in Copenhagen.

Or, as Mathieu Gorge, CEO of compliance specialist Vigitrust, puts it, unstructured data looks rather like an unbuilt Lego model.

“Once it’s built you end up with a toy, but it starts with chaos,” he says. “Each piece of information is a brick scattered across the network or even cloud providers.”

Often, this is because an organisation has no defined process in place to categorise or tag data. And, given the volumes of information most businesses now deal with, that might be impossible, at least for older records.

Businesses are moving towards a more structured – or semi-structured – approach through data classification and metadata, to make it easier to manage information and extract value from it. But it remains a work in progress.

“The contents are less predictable in unstructured data. GDPR-relevant information, for example, can reside almost anywhere,” says Isaksen.

Untidy data is a compliance risk

Most unstructured data is never used. According to industry analyst IDC, more than 90% of unstructured data is never examined. This means businesses are not making the most of what could be a valuable asset. But it also means the organisation is probably not compliant with data protection laws.

“There are all kinds of ways organisations can end up in technical breach of regulations with unstructured data,” says Neil Harris, head of technical services at law firm DWF. “Data retention is a key one: you are likely to have some data for longer than you should.”

This “data debt”, he suggests, is unlikely to attract regulatory penalties unless the data is lost or stolen. “If you don’t know what you have or where it is, you can’t protect it,” he warns.

The lack of data categorisation and classification is an ongoing challenge for commercial and public sector bodies, with too many organisations relying on individual employees to file or categorise information. At a basic level, this includes using email rules and applying data classification to files on SharePoint, cloud and local document servers.

But the sheer variety of file types, and the volumes of data, make manual processes inefficient or impractical.

As Harris points out, businesses in sectors such as insurance have been forced to use rather arbitrary measures, such as the age of a document, to select files for deletion. Other organisations are less proactive.
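An age-based rule of the kind Harris describes is simple to automate, which is part of its appeal and part of why it is arbitrary: it looks only at dates, not content. The sketch below is a hypothetical illustration, not any organisation's actual process. It assumes a file's last-modified time is a usable proxy for document age, and the seven-year retention period and the function name `find_deletion_candidates` are invented for the example; real retention periods depend on the record type and the applicable law. It only lists candidates and deletes nothing.

```python
import time
from pathlib import Path

# Hypothetical, illustrative retention period; real periods vary by record type and regulation.
RETENTION_DAYS = 7 * 365


def find_deletion_candidates(root, retention_days=RETENTION_DAYS, now=None):
    """List files under `root` whose last-modified time is older than the retention period.

    Dry run only: nothing is deleted. Modification time is used as a rough
    stand-in for document age, which is exactly what makes this rule arbitrary.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # 86,400 seconds per day
    candidates = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            candidates.append(path)
    return sorted(candidates)
```

Calling `find_deletion_candidates("/srv/shared")` (a hypothetical path) would return a list for human review. Keeping deletion as a separate, deliberate step is a sensible design choice, because a date alone says nothing about whether a file is still needed for legal or business reasons.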

“Unstructured data is largely governed in a decentralised way, often by each user,” says Avian’s Isaksen. “Many enterprises simply write off unstructured data as the responsibility of each employee, whether it is the mailbox owner, the SharePoint site owner, or the network folder owner. But, when a stash of documents containing sensitive data is leaked it very much becomes the enterprise's problem.”

Mining data, growing risks

And there is a further challenge around unstructured data. Regulators such as the UK’s Information Commissioner’s Office (ICO) say they will take a pragmatic view of technical breaches, such as keeping historic data longer than is justified.

But the picture is very different when that data is subject to active processing. And that processing – attempting to mine business value from the 90% of unstructured data that is untouched – is increasingly important to businesses.

“Unstructured data must be regarded as just as critical and sensitive as structured data,” says Matthias Reinwarth of KuppingerCole. “It often comes as composite objects with embedded documents of varying origin and source system. However, in most cases, these lack clear ownership or categorisation by criticality.”

This can force businesses to embark on time-consuming and often expensive data categorisation exercises, or face the risk that security measures such as data loss prevention and access control tools will miss sensitive documents. At the same time, organisations could be wasting resources protecting information that is less sensitive than they think.

KuppingerCole: Eight key steps towards governing unstructured data

KuppingerCole recommends the following steps to manage and control access to unstructured data:

1. Identify systems that store unstructured data on-premises and in the cloud.
2. Assess the risks associated with the data, such as compliance or security risks, for example where it includes sensitive data or other intellectual property.
3. Define specific requirements. Assign these to the existing toolset and identify gaps.
4. Analyse whether additional tools are needed, which ones, and how you can integrate them into your existing toolset.
5. Implement and integrate the identified solutions.
6. Analyse the data and entitlements in detail, and classify the data.
7. Define data ownership and involve data owners in IGA (Identity Governance & Administration) controls and processes.
8. Enforce consistent access management and governance processes across all types of data and systems.
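The step of analysing and classifying the data is where automated scanning usually replaces the manual filing the article describes. KuppingerCole does not prescribe a specific tool, so the following is only a minimal, hypothetical sketch of content-based classification. It assumes plain-text files and uses two illustrative detection rules: a regular expression for email addresses, and digit runs of 13 to 19 digits that pass the Luhn checksum used by payment card numbers. The labels `sensitive` and `unclassified` and all function names are invented for the example; a production scanner would need far broader, tested rules and support for formats such as Office documents and email archives.

```python
import re
from pathlib import Path

# Hypothetical detection rules for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Return True if a digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> set:
    """Return the kinds of sensitive data detected in a piece of text."""
    found = set()
    if EMAIL_RE.search(text):
        found.add("email")
    for match in CARD_CANDIDATE_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.add("card_number")
            break
    return found


def classify_files(root, suffixes=(".txt", ".csv", ".md")):
    """Map each matching text file under `root` to a hypothetical label."""
    labels = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(errors="ignore")
            labels[path] = "sensitive" if find_sensitive(text) else "unclassified"
    return labels
```

The Luhn check is there to cut false positives: many long numbers (order references, phone numbers) are not card numbers, and a scanner that flags everything would recreate the problem of protecting information that is less sensitive than it appears.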