When you start to design a data warehousing and business intelligence (BI) system, you need to make several fundamental...
decisions right at the outset. Get any of them wrong and you risk disaster: as we all know from any design process, the more fundamental the error, the more expensive it is to fix. (So, no pressure then!)
Of these fundamental design decisions, probably the most crucial is: ‘Inmon or Kimball?’
The Inmon vs Kimball debate is about how you structure the data in your data warehouse. The Kimball approach favours a dimensional structure, while the Inmon data warehouse methodology favours a relational structure. Both assume that the data warehouse will be underpinned by a relational database (Oracle, SQL Server, DB2, etc). But a Kimball data warehouse will use a relational OLAP implementation of a dimensional model – that is, a mixed set of dimension and fact tables.
Clearly, there is more to it than that. Both Bill Inmon and Ralph Kimball have written extensively on this subject, and neither makes bald statements without backing them up. Here I am compressing a huge collection of well-reasoned, carefully structured arguments into a very brief description, so inevitably, the detail is lost. If you want to know more about the arguments, the obvious place to go is back to the source material – which a quick search of Amazon will happily yield.
If you listen to a zealot on either side, you could be forgiven for assuming that the alternative is doomed to certain failure. This is demonstrably not the case. I have worked on both Kimball- and Inmon-style warehouses, and I can tell you that both approaches work. Nor am I alone in making that observation.
A couple of years ago, I gave a webcast on data structuring in BI systems and was able to poll the 100 or so participants, who were asked this question: “Data warehouses (as distinct from data marts) can be structured as relational (Bill Inmon) or dimensional (Ralph Kimball). Which do you favour?”
The answers were:
Use either/both as appropriate: 52%
The results were encouraging because more than half the respondents were open-minded enough to make the decision based on the particular circumstances they found, rather than on dogma. (And some of those who chose ‘relational’ or ‘dimensional’ might still be open-minded – they just might have happened to work on several warehouses where the best solution was one or the other.)
So how do you decide: Inmon or Kimball?
The Inmon vs Kimball choice should have nothing to do with dogma and everything to do with the characteristics of the two approaches. And simply calling those characteristics ‘pros and cons’ would be pejorative – if you choose a dimensional structure for a data warehouse, your choice will have a number of consequences that can only be classified as pros and cons in the context of the particular warehouse you’re building.
In practice, there is a whole collection of factors that you need to weigh up. To get a feel for the kind of issues you need to think about, let’s just take a look at one – data restructuring.
The data in your transactional systems is almost bound to be relational. And no matter which data warehouse model you choose, your data marts are almost certainly going to be dimensional. So it follows that at some point in your BI system, you’re going to have to change the structure of the data. If you go Kimball, you have to change the structure during the extract, transform and load (ETL) process. If you go Inmon, the change is put off until the data flows from the warehouse to the marts.
Now let’s think about data volumes for a minute. Almost all BI systems use ETL tools to move data from transactional systems once a day. They don’t move it all – they only move data that is new or that has been changed since the last extraction. So, a warehouse may contain, say, 10 years worth of data amounting to 3.5 TB, and each nightly extract may only be moving 1 GB of information.
Before data can be loaded into a Kimball data warehouse, you have to transform it into a dimensional structure. That will slow down the ETL process. With an Inmon data warehouse, on the other hand, the ETL process is faster and easier because you’re simply moving data from one relational structure to another.
But wait a minute: we haven’t finished yet. Either way, you’re eventually also going to have to move the data down into your data marts for business users to query and analyse. Now the tables (if you’ll pardon the pun) are turned. With Kimball, the amount of data restructuring required to populate the marts is minimal. That isn’t the case with Inmon, making data mart population slower.
Not only that, think about the volumes of data. During the ETL process, we were only talking about 1 GB per night. When populating data marts, you may, in many cases, be moving hundreds of gigabytes of data to several different marts during a single night.
Assessing the likely impact of choosing Inmon or Kimball
Once you understand the fundamentally different effects that the Kimball and Inmon approaches have on data flow and restructuring, you can look at a particular data warehouse and determine the likely impact of choosing one over another. For example, in a large warehouse (perhaps that of a telecoms company like Orange), data volume and transformation are likely to be crucial, while for an SME it may be entirely inconsequential.
This question of where you restructure data is not, of course, the only factor you need to consider, or even the most important factor – it is simply one of many. But the choice between a Kimball data warehouse structure and an Inmon one should be made on the basis of reason rather than dogma – and sadly, some people tend to give inflexible support to one methodology or the other.
It is perhaps worth remembering that while Ralph Kimball and Bill Inmon differ in their views, there is no evidence that they themselves take dogmatic positions. Indeed, the fact that Inmon wrote a foreword for the first edition of Kimball’s book The Data Warehouse Toolkit tells us that they probably take a balanced view. We’d all do well to take the same approach.
Dr. Mark Whitehorn specializes in the areas of data analysis, data modeling and business intelligence (BI). Based in the UK, he works as a consultant for a number of national and international companies and is a mentor with Solid Quality Mentors. In addition, he is a well-recognized commentator on the computer world, publishing articles, white papers and books. Whitehorn is also a senior lecturer in the School of Computing at the University of Dundee, where he teaches the master's course in BI. His academic interests include the application of BI to scientific research.