Feature

Good data management cuts costs and boosts compliance

An organisation is part of a value chain consisting of itself, its customers – and sometimes their customers – as well as its suppliers and their suppliers.

Other stakeholders in the flow of data include shareholders, standards and regulatory bodies such as ISO, the Financial Conduct Authority and the US Food and Drug Administration, and central government departments such as HM Revenue & Customs.


Ensuring data flows are efficient and transparent will keep an organisation’s costs under control while allowing it to demonstrate compliance with the data rules and laws that apply to it. In the past, the main approach has been to apply data security based around the application concerned or the database that lies beneath it. Such an approach has led to problems where processes cross multiple applications, requiring “glue” between the applications themselves, as well as the use of identity federation and single sign-on (SSO) systems. The evolution of big data is also making database-centric security a bit of a non-starter.

Security architecture

A different approach is needed for managing the data and information that an organisation deals with. What is really required is an approach based around a secure information architecture, rather than individual application or database security. Data on its own is just a collection of ones and zeroes. For an organisation to gain value from its data, it needs to create information from it – and, from that information, to infer the knowledge that allows decisions to be made. As the move from data to information to knowledge is made, there is also often a move towards intellectual property for the organisation – a process that adds lasting value to the business itself.

In a complex, multi-organisational value chain, just how can a suitable information architecture be put in place? Firstly, the data available has to be identified and brought together in a meaningful manner.

Grasping the basics

The problem here is that the value chain includes searches carried out on external data sources – including the internet. Avid followers of the Dilbert cartoon strip may conjure images of the Boss asking Dilbert to just run him off a copy of the internet. However, modern big data approaches can help in minimising the volumes of data being dealt with.

Figure 1 (below) shows a basic schematic of an approach to dealing with an organisation’s data needs. Existing applications continue to run as they are – if they sit above an existing SQL database. Other, less-structured data in the organisation, such as word-processed documents, spreadsheets, images and voice, can be stored in a NoSQL database with native JSON, BSON, YAML, XML or other formatting to improve searchability. NoSQL databases that fit into this category include MongoDB, CouchDB, Terrastore and RavenDB.

Where the data source in the organisation is more diffuse, using a filter such as MapReduce can help bring the amount of data under consideration to a reasonable level. This is an area where Hadoop excels, with a scalable architecture based on commodity hardware.
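As a minimal sketch of the NoSQL route described above, the following shows how the extracted text of a word-processed document might be stored as a JSON-style record in MongoDB using pymongo. The database, collection and field names are illustrative assumptions, not a prescribed schema.

# A minimal sketch, assuming a local MongoDB instance: store less-structured
# content as a JSON-style document so it can be indexed and searched.
# Database, collection and field names are illustrative only.
from datetime import datetime, timezone
from pymongo import MongoClient, TEXT

client = MongoClient("mongodb://localhost:27017")
docs = client["corporate_data"]["documents"]

record = {
    "source": "finance/q2_forecast.docx",            # hypothetical file
    "type": "word_processed_document",
    "extracted_text": "Q2 revenue forecast and assumptions...",
    "tags": ["finance", "forecast", "q2"],
    "ingested_at": datetime.now(timezone.utc),
}
docs.insert_one(record)

# A text index over the extracted content makes the store searchable.
docs.create_index([("extracted_text", TEXT)])
for hit in docs.find({"$text": {"$search": "forecast"}}):
    print(hit["source"])

The same pattern applies to the other NoSQL stores mentioned: the native document format keeps the content searchable without forcing it into a relational schema.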

The use of Hadoop as a consolidation step means that it is also useful for dealing with external data sources. As a stream of information comes through the firewall, it needs to be captured and filtered to remove any obviously useless data, then tagged by data type, with contextual metadata added through the use of a tool such as CommVault Simpana.
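The capture-and-tag step could look something like the sketch below. This is not the interface of CommVault Simpana or any other specific tool – it is a hypothetical illustration of discarding obviously useless records and attaching a data type and contextual metadata before the feed reaches the Hadoop cluster.

# Hypothetical capture-and-tag step for an incoming external feed: drop
# obviously useless records, then attach a data type and contextual metadata.
# A generic sketch only; field names are assumptions.
from datetime import datetime, timezone

def useful(record: dict) -> bool:
    # Discard empty bodies and known noise such as tracking pings.
    body = (record.get("body") or "").strip()
    return bool(body) and record.get("kind") != "tracking_ping"

def tag(record: dict, source: str) -> dict:
    record["metadata"] = {
        "data_type": record.get("kind", "unknown"),
        "source": source,                  # e.g. a supplier feed or web search
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }
    return record

def capture(stream, source: str):
    """Yield filtered, tagged records ready to be passed on for consolidation."""
    for record in stream:
        if useful(record):
            yield tag(record, source)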

The output from this can then be fed through to the Hadoop cluster, enabling a massive amount of data to be consolidated down to a more manageable volume. Quocirca generally does not recommend the use of Hadoop as a persistent data store, preferring that the data is stored in either SQL or NoSQL data stores. Therefore, the output from the MapReduce stage should then be targeted at one or other of these data store types.
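One way to run the consolidation stage is Hadoop Streaming, which allows the map and reduce steps to be plain scripts that read standard input and write standard output. The sketch below – two separate scripts – collapses a large feed of tagged records into per-source, per-type counts; the field names follow the hypothetical tagging step above, and the output would then be loaded into the chosen SQL or NoSQL store.

# mapper.py – emit one composite key per incoming JSON record
import json
import sys

for line in sys.stdin:
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        continue                                  # skip malformed input
    meta = record.get("metadata", {})
    key = f"{meta.get('source', 'unknown')}|{meta.get('data_type', 'unknown')}"
    print(f"{key}\t1")

# reducer.py – sum the counts for each key (Hadoop sorts mapper output by key)
import sys

current_key, total = None, 0
for line in sys.stdin:
    key, _, count = line.rstrip("\n").partition("\t")
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{total}")
        current_key, total = key, 0
    total += int(count)
if current_key is not None:
    print(f"{current_key}\t{total}")

Under Hadoop Streaming the two scripts would be supplied as the -mapper and -reducer arguments; the same logic can also be tested locally by piping the mapper output through sort and into the reducer.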

Having all the data in just two types of data store makes it easier to apply the search and analytics tools that will be required to truly move the data up to the information level.
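As an illustration of what that makes possible, the sketch below joins a structured query with a document search to build a single view of a customer – sqlite3 stands in for the SQL store and MongoDB for the NoSQL store, and the table, collection and field names are assumptions.

# Illustrative only: combine the relational and document stores to move from
# raw data towards usable information. Names and schemas are assumptions.
import sqlite3
from pymongo import MongoClient

sql = sqlite3.connect("orders.db")                  # stand-in SQL store
docs = MongoClient("mongodb://localhost:27017")["corporate_data"]["documents"]

customer_id = "C-1042"                              # hypothetical customer key

# Structured side: order history from the relational store.
orders = sql.execute(
    "SELECT order_id, total FROM orders WHERE customer_id = ?", (customer_id,)
).fetchall()

# Less-structured side: any documents tagged with the same customer.
correspondence = list(docs.find({"tags": customer_id}, {"source": 1, "_id": 0}))

print(f"{len(orders)} orders, {len(correspondence)} related documents")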

Market maturity

Many of the usual-suspect suppliers have been making progress in their approaches to handling an organisation’s information management problems. From a starting point of watching the big data bandwagon roll into town and creating messaging that attempted to shoehorn their existing portfolios into the market, these suppliers have now reached a good level of maturity.

IBM has brought its PureData systems into the market, and Teradata has linked its existing products into a big data approach based around its acquisition of Aster. EMC has spun out its Greenplum and Pivotal Labs acquisitions as a joint venture with VMware and GE into a new entity called Pivotal. Pivotal will not only provide a data management environment, but will also be creating a development platform, Pivotal One. SAP is also moving in this direction. Its Hana in-memory database has already morphed from being “just” a fast database. The Hana Cloud Development Edition is already a complete development platform for creating data-intensive apps that can deal with multiple types of data.

Good presentation can improve decisions

The last step for organisations revolves around how the information is taken and presented to the people involved so that the knowledge can be extracted and used to make decisions. Here, the battle has been raging for some time. The “old guard” business reporting suppliers such as Business Objects, Cognos and Hyperion were snapped up by larger players some time back. Newer players, such as Panopticon, Pentaho and QlikTech, came into the market and disrupted the ways in which analytics were applied to data. Many of these suppliers are developing or have already developed more advanced systems using combinations of Hadoop and in-memory databases to be able to approach organisations as full-service data management companies, rather than just business analytics suppliers. New players seem to be coming to market on a daily basis – and this is causing problems, as many will fail on their journey and others will be acquired as the larger companies push to create more complete data management solutions.

The key for any organisation is to make sure that what it puts in place will be flexible. It can be argued that a good decision can only be made when as much information as possible is used to support the knowledge behind it. This means bringing in data from as many sources as possible, in as many different formats as necessary. Formal data, stored and managed in SQL databases, is not going away, and any supplier that tries to say that everything can be done under a NoSQL database should be regarded with deep distrust. Likewise, any supplier that tries to persuade you that storing unstructured data as binary large objects (BLOBs) in a SQL database is the way forward should be treated with deep scepticism.

The solution is likely to involve a hybrid approach – and your choice as an organisation is whether to build it yourself, or to go to a supplier that has already integrated the various technology components required into a single solution.



This was first published in July 2013

 
