The essential problem for all IT development, including the data integration process, is that business users know what they want to achieve with, say, customers, but developers only understand things in terms of esoteric constructs such as database tables.
The issue thus becomes one of translating business requirements into terminology that developers can understand, and then having them produce applications, data transformations and functions that meet the requirements of the business.
Unfortunately, the gap of understanding between the two parties means that this is rarely completely successful and is sometimes a downright disaster.
There has been a series of attempts to get over this problem, which could be dubbed the “specification mismatch problem” -- starting with rapid application development (RAD) and continuing today with methods such as agile programming.
However, none of these attempts at a resolution has been really successful, and I believe that this is because all of the efforts are focused too much on the developer and not enough on the business user.
The basic problem is this: business users think about customers, suppliers and products in a logical manner, but developers think about them from the perspective of their physical implementation.
What is needed is a way for business users to work at that logical level and developers to work at the physical level, with an automated function that maps the former into the latter.
Of course, this is what model-driven architectures do in the world of application development. But that approach has never been very popular, in part because the relevant tools are still targeted at developers rather than business users.
Benefits of data integration projects
However, taking a logical/physical approach to data integration has advantages when compared to general-purpose development. To begin with, data integration projects are typically less complex than developing entire applications. Secondly, for data integration you don’t need the conceptual layer that also forms part of a model-driven architecture.
On the other hand, a lot of data movement tasks such as migration, ETL and archiving have an extra dimension that doesn’t apply to generic application development. For instance, there are existing relationships between customers and their orders that have to be maintained during the data movement process. As it turns out, addressing this issue is the key to enabling business-IT alignment for data integration and overcoming the specification mismatch problem.
The point to remember is that business users understand these relationships: how they work, what constraints exist and so forth. Conversely, developers don’t understand them: they can see what has been instantiated in the database schema, and they may be able to infer possible relationships using data profiling tools, but only the business can confirm whether those inferences are correct or not. Therefore it makes sense to have business users, working at a logical level, identify and manage these relationships.
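To illustrate why inferred relationships still need business confirmation, here is a minimal sketch of the kind of inclusion check a data profiling tool might run. It is not any vendor's algorithm; the table names, column names and sample rows are invented for the example, and it assumes every table has at least one row.

```python
def column_values(rows, column):
    """Non-null values of one column, in row order."""
    return [row[column] for row in rows if row.get(column) is not None]


def candidate_relationships(tables):
    """Propose (child_table, child_col, parent_table, parent_col) tuples.

    A pair is proposed when the parent column is unique and every non-null
    child value also occurs in it. This is only an inference from the data.
    """
    candidates = []
    for parent_name, parent_rows in tables.items():
        for parent_col in parent_rows[0]:
            parent_vals = column_values(parent_rows, parent_col)
            if len(set(parent_vals)) != len(parent_vals):
                continue  # not unique, so it cannot act as a key
            for child_name, child_rows in tables.items():
                if child_name == parent_name:
                    continue
                for child_col in child_rows[0]:
                    child_vals = set(column_values(child_rows, child_col))
                    if child_vals and child_vals <= set(parent_vals):
                        candidates.append(
                            (child_name, child_col, parent_name, parent_col)
                        )
    return candidates


# Hypothetical physical tables with the sort of names developers see.
TABLES = {
    "CUST_MST": [
        {"CUST_NO": 1, "NM": "Acme"},
        {"CUST_NO": 2, "NM": "Bolt"},
    ],
    "ORD_HDR": [
        {"ORD_NO": 10, "CUST_REF": 1, "LINE_CNT": 2},
        {"ORD_NO": 11, "CUST_REF": 1, "LINE_CNT": 1},
        {"ORD_NO": 12, "CUST_REF": 2, "LINE_CNT": 2},
    ],
}

candidates = candidate_relationships(TABLES)

# Only someone who knows the business can say which proposals are real.
approved_by_business = {("ORD_HDR", "CUST_REF", "CUST_MST", "CUST_NO")}
confirmed = [c for c in candidates if c in approved_by_business]
```

On this sample data the check proposes the genuine link from orders to customers, but it also proposes that the order line count refers to a customer number, simply because the values happen to overlap. The data alone cannot tell the two apart.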
This doesn’t just mean using the right language. It is certainly necessary for users, of whatever stripe, to be able to work with terminology that they are familiar with, but it is not sufficient. Putting a lot of diagrams of database tables in front of business users and then labelling them with the proper data semantics may help -- but ultimately, it won’t solve the problem because you are trying to force users to think about customers or products in terms of their physical implementation and not at a level they are comfortable with.
Business users need to deal with what I would call a “business entity”. IBM calls them business objects (ironic, considering that IBM owns Cognos while SAP bought BusinessObjects). Essentially, a customer business entity is a real customer with delivery addresses, a payment and service history, outstanding orders, invoices and so on. It is not a bunch of tables with strange names.
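As a rough sketch of the idea, a business entity can be described once as a root table plus the tables that hang off it, and everything belonging to one customer can then be pulled out and moved as a single unit. The class names, table names and sample rows below are hypothetical, and the sketch assumes every related table refers directly to the customer key; real products will model this differently.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    table: str   # physical table holding the related rows
    column: str  # column in that table that refers back to the entity key


@dataclass
class BusinessEntity:
    name: str
    root_table: str
    key_column: str
    relations: list = field(default_factory=list)


def extract_entity(tables, entity, key):
    """Collect every row that belongs to one instance of the entity."""
    root_rows = [
        row for row in tables[entity.root_table]
        if row[entity.key_column] == key
    ]
    if not root_rows:
        raise KeyError(f"no {entity.name} with key {key!r}")
    unit = {entity.root_table: root_rows}
    for rel in entity.relations:
        unit[rel.table] = [
            row for row in tables[rel.table] if row[rel.column] == key
        ]
    return unit


# The logical view: a customer with addresses and orders.
CUSTOMER = BusinessEntity(
    name="customer",
    root_table="CUST_MST",
    key_column="CUST_NO",
    relations=[Relation("CUST_ADDR", "CUST_REF"), Relation("ORD_HDR", "CUST_REF")],
)

# The physical view: invented tables standing in for a real schema.
TABLES = {
    "CUST_MST": [{"CUST_NO": 1, "NM": "Acme"}, {"CUST_NO": 2, "NM": "Bolt"}],
    "CUST_ADDR": [
        {"ADDR_ID": 7, "CUST_REF": 1, "CITY": "Leeds"},
        {"ADDR_ID": 8, "CUST_REF": 2, "CITY": "Bath"},
    ],
    "ORD_HDR": [
        {"ORD_NO": 10, "CUST_REF": 1},
        {"ORD_NO": 11, "CUST_REF": 1},
        {"ORD_NO": 12, "CUST_REF": 2},
    ],
}

acme = extract_entity(TABLES, CUSTOMER, 1)
```

Here `acme` holds the customer row together with its address and both of its orders, so a migration or archive job that works on that unit cannot separate a customer from the records that depend on it.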
This sort of treatment of business entities is commonplace in the information archival market, strangely enough, but it is only just beginning to find its way into data integration tools and techniques per se.
Mixed support from data integration vendors
For some time, both IBM and SAP BusinessObjects have had products that let business users work on a data integration task and then provide for that work to be mapped to business requirements for the benefit of developers. IBM’s InfoSphere FastTrack software is an example of this. However, neither vendor currently offers business user capabilities that work at the business entity level for the purposes of data integration.
On the other hand, IBM does have such capabilities in its Optim archival product, and it is probable that these will be migrated into its data integration technology in due course. And since SAP recommends the use of business entities when migrating SAP applications from one system to another, it can only be a matter of time before we see the same from SAP BusinessObjects as part of its data integration platform.
Informatica, a data integration software market leader, does support the concept of business entities in its new Informatica 9 software. (Interestingly, it acquired the archiving vendor Applimation last year.) It also has a major focus on business-IT alignment in this release, making extensive use of role-based interfaces, which is another prerequisite to enabling collaboration in the data integration process.
All of that said, the companies that have been leading the march in this direction are British, notably Celona and X88. The former is a data migration specialist that specifically works at the level of business entities (which is particularly useful when doing incremental and zero-downtime migrations), while the latter is a data profiling vendor.
X88’s software is particularly interesting because it directly generates a developer specification from the work done by the business user. In addition, the specification is created as a PDF so that it can be used with any data integration system, whereas IBM’s and Informatica’s products, for example, will only support their own individual data integration architecture.
Business-IT alignment, ROI go together in data integration process
People tend to waffle a lot about collaboration, and it is often unclear what the return on investment (ROI) is. However, in data integration it should be clear-cut. We can quantify how often data integration projects overrun, and by how much in time or money. For example, a survey conducted by Bloor Research in 2007 found that more than 80% of data migration projects missed their mark in terms of time and/or money. If we can significantly reduce that overrun by eliminating specification mismatches -- which I believe we can -- then ROI should be easy to prove.
Of course, we are not quite there yet: apart from Informatica, the major data integration vendors have yet to implement all of the pieces. But this is certainly the direction in which the market is moving. As I see it, business-IT alignment represents the future of the data integration process. Whether it will percolate through to application development more generally is another question entirely.
Philip Howard is a research director focused on data management for Bloor Research. He tracks technologies and processes such as databases, data integration, data quality and master data management. Howard has worked as a Bloor analyst since 1992; he also writes frequently for IT publications and websites and is a regular speaker at conferences and other industry events.