Time to tame data architecture complexity, but task is tough

Data architecture is close to a misnomer. Complex corporations are striated by applications and beset by politics. Service buses and data governance programmes offer remedies, but don't underestimate the task.

A building design carefully sets the layout of supporting walls, gas pipes, cabling and other architectural components, but technical architects have no such luxury when it comes to data architecture. Most companies rely on a hodgepodge of systems of varying ages, replacing some of them every few years and bolting on others as acquisitions and mergers occur.

Instead of a shiny new skyscraper, our technology framework is more a ramshackle old building, with an extension here and an outbuilding there -- a Gormenghast Castle of data architecture complexity.

One large corporation I worked with reckoned to have 600 major applications: Their global ERP system was just one of these, itself having numerous instances, all subtly different in their implementation. Connecting all these systems were interfaces of considerable complexity, with data running through them from system to system.

Getting the data to behave itself is a major undertaking on its own. Let’s ignore the documents and emails, the slide decks and spreadsheets, and focus on the corporate systems. To run a corporation effectively, it is self-evident that its business data should be consistent, controlled, accurate and timely; yet, in reality, it is far from that. A 2011 Information Difference survey found that only 18% of companies even attempt to measure the quality of their data across the corporation, and fewer still try to put a cost on data quality problems.

Quality is one issue, data consistency another: A survey we conducted in 2008 on master data management (MDM) found that the average large company had six systems generating different versions of customer master data and nine generating product master data. Some surveyed companies had more than a hundred such systems, competing with each other as potential sources of master data.

New MDM hub = one more data source
It is not easy to fix this problem. If, as an enterprise architect, you think, “OK, there will be just one source of customer data from now on,” then how exactly will you go about that? If you set up an MDM hub, all you have done is increase the number of sources by one. You need to actively switch off the ability of your existing applications to generate new customer data and instead do that in a new system, or at the very least hook up the applications to the new authorised master source and validate customer data against it. That will require technical changes to those applications and a business decision to modify the process of adding a new customer account.
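To make the "validate against the authorised master source" option concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Customer` record, the `CustomerMaster` class and its `create` and `validate` methods are hypothetical names, not the API of any MDM product, and a real hub would match on far more than a case-insensitive name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str
    country: str


class CustomerMaster:
    """Hypothetical stand-in for the authorised master source."""

    def __init__(self):
        self._records = {}

    def create(self, customer):
        # Only the master is allowed to mint new customer records.
        if customer.customer_id in self._records:
            raise ValueError(f"duplicate customer id: {customer.customer_id}")
        self._records[customer.customer_id] = customer

    def validate(self, customer_id, name):
        """Return a list of problems; an empty list means the copy agrees."""
        master = self._records.get(customer_id)
        if master is None:
            return [f"{customer_id}: not in master"]
        # Assumed matching rule: names agree if equal ignoring case and padding.
        if master.name.strip().lower() != name.strip().lower():
            return [f"{customer_id}: name {name!r} differs from master {master.name!r}"]
        return []


master = CustomerMaster()
master.create(Customer("C001", "Acme Ltd", "GB"))

# A legacy application checks its local copy before using it.
print(master.validate("C001", "ACME LTD"))      # agrees with the master
print(master.validate("C001", "Acme Limited"))  # name mismatch reported
print(master.validate("C999", "Globex"))        # unknown to the master
```

The point of the sketch is the division of labour: existing applications stop creating customers and only check their copies, which is exactly the process change the business has to agree to.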

Ignoring the technical aspects, you probably don’t have the authority to make that happen if you work in IT; that is to say, you cannot force people in sales and marketing to change the way they are doing things. The same issue will occur whatever the type of master data involved, whether it’s product, location, asset, supplier or financial data. The more important and sensitive the data is, the more status people attach to the ability to control it, and the harder they will fight to retain that control.

Assuming for a moment that you are granted the authority to make such a change, how would that happen technically? Your new customer master data hub needs to be connected to the current systems that use customer data, both those that generate new customer data and those that use it for other purposes, such as business reporting or compliance. That could be a lot of systems, so a lengthy set of interfaces will be needed. The same, remember, has to be done for product data, location data and so on.
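A rough way to see how quickly the interface list grows is to count one interface for every pairing of a system with a master data domain it touches. The inventory below is entirely made up for illustration (five systems, four domains); a real estate with hundreds of applications would be far larger.

```python
# Hypothetical inventory: which master data domains each system touches.
systems = {
    "erp": {"customer", "product", "supplier"},
    "crm": {"customer"},
    "billing": {"customer", "product"},
    "warehouse": {"product", "location"},
    "reporting": {"customer", "product", "location", "supplier"},
}


def interfaces_per_hub(inventory):
    """Count the interfaces each domain's hub needs, one per using system."""
    counts = {}
    for domains in inventory.values():
        for domain in domains:
            counts[domain] = counts.get(domain, 0) + 1
    return counts


counts = interfaces_per_hub(systems)
total = sum(counts.values())

for domain in sorted(counts):
    print(f"{domain}: {counts[domain]} interfaces")
print(f"total: {total} interfaces")
```

Even this toy estate needs 12 interfaces, and every one of them is a piece of integration work that has to be built, tested and maintained.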

Service bus as rare as the unicorn
An alternative approach is to build a service bus architecture, whereby applications are plugged in to a pipeline of data managed by an infrastructure tool, pulling data from the pipeline as they need it and supplying new data where authorised. This has the major advantage of removing the need for point-to-point interfaces: every application just gets plugged into the service bus, with the new master data hub providing data into that pipeline. This is a neater solution to the problem, but it obviously puts considerable operational demands on the service bus and also requires technical plumbing to all your applications, old and new.
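The idea can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions, not how any commercial service bus works: the `ServiceBus` class, its `authorise`, `subscribe` and `publish` methods, and the system names are all hypothetical. It shows only the two properties described above: consumers take data from the pipeline without knowing its source, and only authorised systems may supply new data.

```python
from collections import defaultdict


class ServiceBus:
    """Toy in-memory bus: topics, subscribers and authorised publishers."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks
        self._authorised = defaultdict(set)    # topic -> publisher names

    def authorise(self, topic, publisher):
        self._authorised[topic].add(publisher)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, publisher, message):
        # Refuse data from systems that are not the authorised source.
        if publisher not in self._authorised[topic]:
            raise PermissionError(f"{publisher} may not publish to {topic}")
        for callback in self._subscribers[topic]:
            callback(message)


bus = ServiceBus()
bus.authorise("customer", "mdm_hub")

# Two consuming applications plug in once each; neither knows about the hub.
billing_seen, reporting_seen = [], []
bus.subscribe("customer", billing_seen.append)
bus.subscribe("customer", reporting_seen.append)

bus.publish("customer", "mdm_hub", {"id": "C001", "name": "Acme Ltd"})

try:
    # The CRM system is no longer allowed to originate customer records.
    bus.publish("customer", "crm", {"id": "C002", "name": "Globex"})
except PermissionError as err:
    print(err)
```

Each application connects to the bus once rather than to every other system, which is where the saving over point-to-point interfaces comes from. The authorisation check is the part that still needs a business decision behind it.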

In reality, how many organisations have one of these implemented throughout the enterprise? Relatively few, despite the best marketing efforts of vendors. Initiatives like this also need to take into account external data: retailers need to be able to access supplier data, banks depend on market data from third parties such as Bloomberg and Reuters, and most companies use credit ratings from suppliers such as Dun & Bradstreet. The formats of such data are not within your remit to change.

In most companies, the political obstacles are even greater than the technical challenges. As previously mentioned, owning data gives people a measure of control inside organisations, and most are unwilling to give up such control without a fight. The dawning realisation of this has led to the emergence of data governance programmes, involving and usually led by business representatives. Data governance is first and foremost an organisation and a process, not a technology; its aim is to get business executives to assign stewardship of data among their staff and to take responsibility for the consistency and quality of that data. At least it tackles the politics head on, though data governance is still very much in its infancy. An August 2010 Information Difference study found that just 31% of companies had an active data governance programme, and a benchmarking survey that we conducted in December 2010 found that only 57% of such programmes were deemed successful by those that had them.

I cannot see any silver bullets. Once systems are operational, it is hard to justify replacing them, and the diversity of data in an enterprise is inextricably linked to budgets, control and power, things not easily wrested back once divested. Data governance programmes can start to tackle the underlying diversity problem, and some are succeeding, but no one should underestimate the sheer magnitude of the task facing technical and data architects looking to rein in data architecture complexity. Lebanese poet Mikhail Naimy wrote, “The more elaborate his labyrinths, the further from the sun his face,” and we have built for ourselves an elaborate labyrinth of technology in today’s large corporations.

Andy Hayler is co-founder and CEO of consultancy The Information Difference and a frequent speaker at conferences on master data management, data governance and data quality. He is also a restaurant critic and author (see www.andyhayler.com).
