Einstein's secret answer to unstructured data

Software application developers today are struggling (we are constantly told) with the task of juggling unstructured (usually big) data.

Grafting unstructured data onto application and data management landscapes populated with other more structured (and even semi-structured) data makes the task even more troublesome.

One data store to bind them all

EMEA marketing director at Perforce Mark Warren suggests that the answer here will come (in part) through more automation (de-duping, mining content) and tooling that helps either avoid or reduce duplication.

“For example, centralised repositories which are shared by all and links or references get shipped around not the data. Actually the implementation could be decentralised or distributed but the appearance to the human being is a ‘one data store to bind them all’ as it were,” said Perforce’s Warren.
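One way to picture the "ship references, not data" idea is a content-addressed store, where identical content is kept once and everyone passes around a short reference to it. The sketch below is only an illustration under stated assumptions: the `ContentStore` class and its `put` and `get` methods are hypothetical names, the store lives in memory, and it is not a description of how Perforce or any other product works.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory store that keeps each distinct blob once."""

    def __init__(self):
        self._blobs = {}  # maps reference (SHA-256 hex digest) -> content

    def put(self, data: bytes) -> str:
        """Store the content if it is new and return its reference."""
        ref = hashlib.sha256(data).hexdigest()
        # setdefault leaves an existing entry alone, so duplicates cost nothing
        self._blobs.setdefault(ref, data)
        return ref

    def get(self, ref: str) -> bytes:
        """Resolve a reference back to the content it points at."""
        return self._blobs[ref]

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
ref_a = store.put(b"Q3 sales report")  # first copy is stored
ref_b = store.put(b"Q3 sales report")  # identical copy is de-duplicated
```

Both callers end up holding the same reference, so the attachment is stored once however many times it is "sent". A real system would also need persistence, access control and a way to retire content that nothing references any more.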

If email is the most visible example of unstructured data growth in most organisations, then shouldn’t developers address the applications they build that touch this massively popular communication channel?

Head of global product management at GFI Software Sergio Galindo reminds us that, according to data from the Radicati Group:

• 144.8bn emails are sent every day,

• 90bn of these are business-related messages, and

• on average, 60% of all email is spam.

Galindo comments as follows: “Many users treat their email inbox as a database and repository, as well as a transmission medium, for everything from links to videos to Word docs and spreadsheets. The thing is, email silos are not built for the way they are increasingly being used, and using them as ad-hoc unstructured databases may provide users with a short-term productivity fix, but in the longer term it is both a major hindrance to workflow and places company data at significant risk of loss, corruption and theft.”

According to GFI’s Galindo, unstructured data management needs a tough, disciplined approach that incorporates good policy and stringent enforcement, alongside training and technology adoption, to ensure that unstructured data sources like email are backed up, archived and retained to enable fast future access.

Strict policy enforcement is important, but we must be careful not to go too far, lest we lose the context for the unstructured data that we seek to find and pin down.

Tony Speakman, director at FileMaker, suggests that the key to handling unstructured data is in the old Einstein quote:

[Image: Albert Einstein in 1921, by F. Schmutzer]

“Make everything as simple as possible, never simpler.”

Speakman says that we cannot force people to create information to fit a database, or strip down what they create, or we’ll lose the meaning of the data.

“We need to make technology expand to fit around what people create. There will never be total order but we can herd data in a way that people can understand and therefore use effectively. One example is a FileMaker developer working with the Bodleian library to digitise the writings of Voltaire. This involved creating a system of search engine like tags to order transcripts and letters, making it all completely searchable for the end user, without the need for Voltaire to alter his behaviour,” said Speakman.
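The tagging approach Speakman describes can be pictured as an index layered over documents that are themselves left untouched. The following is a minimal sketch, not the Bodleian system or anything built in FileMaker: the `TagIndex` class, the document IDs and the tags are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical tag-to-document index; the documents are never modified."""

    def __init__(self):
        self._by_tag = defaultdict(set)  # maps tag -> set of document IDs

    def tag(self, doc_id: str, *tags: str) -> None:
        """Attach one or more tags to a document."""
        for t in tags:
            self._by_tag[t.lower()].add(doc_id)

    def search(self, *tags: str) -> list:
        """Return the sorted IDs of documents carrying every given tag."""
        sets = [self._by_tag.get(t.lower(), set()) for t in tags]
        return sorted(set.intersection(*sets)) if sets else []


index = TagIndex()
index.tag("letter-001", "letter", "1760", "Geneva")  # made-up sample records
index.tag("letter-002", "letter", "1762")
index.tag("essay-001", "essay", "1760")

letters_1760 = index.search("letter", "1760")
```

Here `letters_1760` holds only `"letter-001"`, the one document tagged with both terms. The structure lives entirely in the index, which is the point Speakman makes: the order is added around what people create rather than imposed on it.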