Information management – when metadata is king

Consider a document.  It makes no odds whether it is a Microsoft Word document, an Adobe PDF file, an Autodesk file or whatever.  Just what can you find out about it?

Well, every file has a digital fingerprint associated with it:  an operating system can look at more than just the file extension to identify just what type of file it really is.  Within the zeroes and ones of the binary content of the file on the disk is a ‘wrapper’, a set of details that describe what the file is.
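
As a rough illustration of looking past the file extension, the Python sketch below checks the first few bytes of a file against a handful of well-known signatures.  The signature table is a small illustrative subset, and the names sniff_type and sniff_file are made up for this example rather than taken from any operating system API.

```python
# Illustrative subset of well-known leading-byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # also the container for .docx and .xlsx
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc and .xls
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def sniff_type(header: bytes) -> str:
    """Return a type label based on the leading bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to compare against the signatures."""
    with open(path, "rb") as fh:
        return sniff_type(fh.read(16))
```

A file renamed from report.pdf to report.txt would still be reported as a PDF here, because the decision is based on the content and not on the name.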

Once the wrapper is understood, the contents of the file can be indexed, so that systems can search this index as well as the actual document contents. For example, on my Windows device, a search for the term ‘information management’ would pull up this document (and many other files) in Windows File Explorer.

However, although this has uses, there are some problems.  Much metadata is not immutable.  As an example, open a Microsoft Word document.  Click on the ‘File’ tab and then look at the right-hand pane marked ‘properties’.  You should see an author marked there.  However, if you click on the ‘properties’ marker itself, you can choose ‘advanced properties’ – here, you can change the author to anything you want.

Likewise, much of the metadata associated with the document can be changed.  Someone with very basic knowledge and the right tools can change the content of a document, along with its dates, and make it look, to all intents and purposes, as if it were the original document.  As such, should a conflict arise between the actual creator of the file and a recipient of the same file who has then changed it, it becomes a case of one person’s word against another’s.

However, if immutable metadata is used, then things change.  By storing the file with extra information where all modifications are logged, such changes can no longer be made without being detected.  By ensuring that the original author is logged and held against the document, along with all dates and times that the document has had an action taken against it (opened, edited, emailed, printed, whatever), full governance, risk and compliance (GRC) needs should be covered.
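
One way such a modification log could be made tamper-evident is to chain the entries together by hash, so that altering an earlier entry breaks every later one.  The sketch below is an assumption about how this might be done, not a description of any particular product; the class name AuditLog and its field names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash an entry body deterministically (keys sorted before serialising)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which each entry records the hash of the one before."""

    def __init__(self):
        self.entries = []

    def record(self, document_id, actor, action, timestamp):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {
            "document_id": document_id,
            "actor": actor,
            "action": action,  # e.g. opened, edited, emailed, printed
            "timestamp": timestamp,
            "prev_hash": prev_hash,
        }
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute every hash and check that the chain is unbroken."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

Changing the recorded author of the first entry after the fact would cause verify() to return False.  A real system would also need to protect the most recent hash, for example by storing it somewhere the editor of the log cannot reach, since otherwise the whole chain could simply be recomputed.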

Let’s just start with document classification.  By assigning a simple set of metadata tags, such as ‘Public’, ‘Commercial’ and ‘Private’ to documents, a lot of process flows can be made more intelligent.  A Public document can be left unencrypted and moved along a process flow with very little interruption.  It can also be passed through email systems without too much scrutiny, apart from a content check to ensure that certain types of data or alphanumeric strings aren’t found within the document for data loss prevention purposes.  A Private document may need to be encrypted, and can only be made available to certain named individuals or discrete roles within the organisation.  The credentials of the sender and receiver of such a Private document should also be checked before it can be sent as an attachment to an email.
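
A minimal sketch of how such tags might drive a process flow is shown below.  The handling of ‘Public’ and ‘Private’ follows the description above; the handling of ‘Commercial’ is not spelled out here, so the values chosen for it are purely an assumption, as are the names Policy, POLICIES and may_email.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    encrypt: bool                # must the document be encrypted?
    restricted_recipients: bool  # limited to named individuals or roles?
    dlp_scan: bool               # content check for data loss prevention


POLICIES = {
    "Public": Policy(encrypt=False, restricted_recipients=False, dlp_scan=True),
    # Assumed middle ground; the real rules would be set by the organisation.
    "Commercial": Policy(encrypt=True, restricted_recipients=False, dlp_scan=True),
    "Private": Policy(encrypt=True, restricted_recipients=True, dlp_scan=True),
}


def may_email(classification, sender, recipient, authorised_users):
    """Decide whether a document may be sent as an email attachment."""
    policy = POLICIES[classification]
    if not policy.restricted_recipients:
        return True
    # For restricted documents, check both ends of the exchange.
    return sender in authorised_users and recipient in authorised_users
```

With this in place, a Private document would only be allowed through when both the sender and the receiver appear in the list of authorised users for that document.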

Enterprise information management (EIM) systems make extensive use of metadata, as it enables so much more to be done.  It can do away with folder and file constraints, as pointers to the document are metadata in themselves and the documents can reside anywhere.  Rather than taking an old-style enterprise content management (ECM) approach of pushing files into a relational database as binary large objects (BLOBs), EIM content can stay where it is, using the EIM index and global namespace (the database of the pointers and all the metadata held on the files) to find the files themselves.

With EIM, when an individual searches for something, the system searches the metadata.  When they want to read or edit a document, the pointer shows the path to the file and enables access to it.
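
The search-then-resolve pattern can be sketched in a few lines of Python.  This is a toy in-memory index, not how any given EIM product is built; the class names and the storage locations used in the usage example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    doc_id: str
    location: str  # pointer to wherever the file actually lives
    metadata: dict = field(default_factory=dict)


class MetadataIndex:
    """Toy index: searches touch only metadata, never the files themselves."""

    def __init__(self):
        self._records = {}

    def register(self, record: Record) -> None:
        self._records[record.doc_id] = record

    def search(self, **criteria):
        """Return records whose metadata matches every given key and value."""
        return [
            r for r in self._records.values()
            if all(r.metadata.get(k) == v for k, v in criteria.items())
        ]

    def resolve(self, doc_id: str) -> str:
        """Follow the pointer to find where the document is stored."""
        return self._records[doc_id].location


index = MetadataIndex()
index.register(Record("c-17", "//fileserver01/contracts/c-17.docx",
                      {"author": "alice", "classification": "Private"}))
index.register(Record("b-02", "s3://example-bucket/brochures/b-02.pdf",
                      {"author": "bob", "classification": "Public"}))
```

Because the documents are found through the index and not through a folder path, the two files above can sit on entirely different storage systems and still be searched as one collection.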

This provides a much more flexible information management approach, and by copying the metadata store across multiple different locations, provides a level of high availability without the need for expense on dedicated systems using synchronised content databases.

A metadata-driven EIM system also improves security.  A cyclic redundancy check (CRC) can be carried out on each file as it is embraced by the system.  This creates a code derived from the content of the file.  A CRC value is not strictly unique, and where deliberate tampering is a concern a cryptographic hash such as SHA-256 fills the same role more robustly.  Should anyone change that file outside of the system, for example by using a hex editor at the hard drive storage level, the EIM system will know that this has happened, as the check will identify that something has changed.
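
A minimal sketch of this kind of integrity check, using only the Python standard library, is given below.  It stores a CRC-32 value as mentioned above and, as an addition of this example, a SHA-256 digest alongside it; the function names fingerprint and is_unchanged are made up for illustration.

```python
import hashlib
import zlib


def fingerprint(data: bytes) -> dict:
    """Compute content-derived codes to store when a file enters the system."""
    return {
        "crc32": zlib.crc32(data) & 0xFFFFFFFF,      # fast checksum
        "sha256": hashlib.sha256(data).hexdigest(),  # cryptographic digest
    }


def is_unchanged(data: bytes, stored: dict) -> bool:
    """Compare the current content against the stored fingerprint."""
    return fingerprint(data) == stored
```

The fingerprint would be taken once on ingestion and held in the metadata store; any later read of the file can then be compared against it before the content is trusted.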

All told, in the new world of highly open information sharing chains, immutable metadata is a need, not a nice-to-have.

Quocirca has authored a report on the subject, commissioned by M-Files, which can be downloaded here.