Data scientists estimate that 80% of the world’s electronic information is “unstructured”: held as email, documents, video, photographs or free text.

“Unstructured data is data held outside data structures like tables and rows without predictable content patterns, such as documents, emails, photos or free text,” says Jacob Isaksen, a digital forensics expert and founder and CEO of Avian, a consulting firm based in Copenhagen.

Or, as Mathieu Gorge, CEO of compliance specialists Vigitrust, puts it, unstructured data looks rather like an unbuilt Lego model.

“Once it’s built you end up with a toy, but it starts with chaos,” he says. “Each piece of information is a brick scattered across the network or even cloud providers.”

Often, this is because an organisation has no defined process in place to categorise or tag data. And, given the volumes of information most businesses now deal with, that might be impossible, at least for older records.

Businesses are moving towards a more structured – or semi-structured – approach through data classification and metadata, to make it easier to manage information and extract value from it. But it remains a work in progress.

“The contents are less predictable in unstructured data. GDPR-relevant information, for example, can reside almost anywhere,” says Isaksen.

Untidy data is a compliance risk

Most unstructured data is never used. According to industry analysts IDC, more than 90% of unstructured data is never examined. This means businesses are not making the most of what could be a valuable asset. But it also means the organisation is probably not compliant with data protection laws.

“There are all kinds of ways organisations can end up in technical breach of regulations with unstructured data,” says Neil Harris, head of technical services at law firm DWF. “Data retention is a key one: you are likely to have some data for longer than you should.”

This “data debt”, he suggests, is unlikely to attract regulatory penalties, unless the data is lost or stolen. “If you don’t know what you have or where it is, you can’t protect it,” he warns.

The lack of data categorisation and classification is an ongoing challenge for commercial and public sector bodies, with too many organisations relying on individual employees to file or categorise information. At a low level this includes using email rules and applying data classification to files on SharePoint, cloud and local document servers.

But the sheer variety of file types, and the volumes of data, make manual processes inefficient or impractical. As Harris points out, businesses in sectors such as insurance have been forced to use rather arbitrary measures, such as the age of a document, to select files for deletion.

Other organisations are less proactive. “Unstructured data is largely governed in a decentralised way, often by each user,” says Avian’s Isaksen. “Many enterprises simply write off unstructured data as the responsibility of each employee, whether it is the mailbox owner, the SharePoint site owner, or the network folder owner. But, when a stash of documents containing sensitive data is leaked it very much becomes the enterprise’s problem.”
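
To make the age-based approach Harris describes concrete, the Python sketch below lists files whose last-modified time falls outside a retention window. It is an illustration only, not a method attributed to anyone quoted here: the function name `find_stale_files`, the use of file modification time as a stand-in for document age, and the retention period passed in are all assumptions, and the function only reports candidates rather than deleting anything.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_stale_files(root, max_age_days, now=None):
    """Return files under `root` last modified more than `max_age_days` ago.

    Hypothetical helper: modification time is only a rough proxy for a
    document's age, and says nothing about what the file contains.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    # Walk the whole tree; directories are skipped, only files are checked.
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            stale.append(path)
    return sorted(stale)
```

Calling `find_stale_files("/path/to/share", 7 * 365)` would return a sorted list of paths for review. As the article notes, age is an arbitrary criterion, so a list like this is better treated as a starting point for classification than as a deletion order.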