IBM believes in commoditised HPC for BI

BI needs power. Lots of power. Before multi-core processors and desktop BI arrived, companies doing serious BI used mainframes, mini-computers or small High Performance Computing (HPC) setups. The cost was not just the hardware: software vendors priced their licences according to the hardware configuration, which made this an expensive solution.

Moving BI to the desktop changed this. By taking advantage of unused local processing power and drastically reducing the cost of the tools, BI quickly found itself being used more widely.

In the last decade, however, we have seen a massive explosion in multi-core, multi-socket commodity servers, blade systems and motherboards capable of supporting 512GB of RAM. Alongside this have come virtualisation, fast Storage Area Networks (SANs) and huge storage arrays. So is now the time to consider moving BI back to the datacentre?

IBM thinks so and its reasons make a lot of sense.

While BI at the desktop has been a success, IBM points out that it has some serious shortcomings. One of these is "versions of the truth". The issue here is that users are often working from the same dataset taken at different points in time. If that data is not linked to the same core data, or has been derived from another user's dataset, then inconsistencies creep in. Anyone making business decisions from such data is not doing so with the best data available.

Another issue is data control. As data is pulled down by users, it is "lost" to the datacentre. While the original data remains, the copies are often stored locally, and as few organisations have a truly universal backup approach covering all their compute devices, this data disappears from view. In many cases it is still owned by the organisation, but there is the potential for data to be removed from the organisation, and that is a much more serious matter.

Bandwidth can also become a significant challenge. If the users and the data are co-located, this is just an issue for the LAN. But where data is remotely located, perhaps in a different country or even in the Cloud, there will be charges for moving such large amounts of data.

As BI evolves, we are seeing users wanting to go further than just working with a subset of corporate data. They want, and are being encouraged, to access data from other sources. A typical example is geodata and census information from government and other sources. This allows sales teams to build a very detailed picture of to whom and where products are being sold, and from this they can build much more effective sales plans.

The challenge here is that the data will often be in different formats and may not even have common fields or data types. This means that BI users need very advanced tools, a lot of knowledge and the processing power to make sense of all of this data.

Proponents of localised BI accept that bandwidth and data security are issues, but rightly point out that these are wider problems than just BI. They also point to the increasing capability of desktop tools and the power of local machines. So does IBM actually have a case?

The answer is yes.

The commodity server explosion has ensured that the datacentre has more resources than ever before. With virtualisation, those resources are flexible and can be applied as required. Commodity servers have also made HPC easier and cheaper to implement, and this is one of the reasons why IBM believes that it is time to bring HPC and BI back together again.

One of the key advantages of modern HPC is that it can exploit parallel programming, and for complex BI applications parallelism offers a significant gain in performance. While individual desktops can each run their own analysis of a dataset, an HPC array using parallel processing can outperform a large number of desktops.

More importantly, that HPC array is working from a single master dataset. This means that there is a single enterprise version of the truth as far as the data goes. The data is kept synchronised and is not lost to the datacentre. Data security and accuracy are enhanced, and it is much easier to meet your legal and compliance obligations.

As the HPC array will be located in the same place as the SAN, the network can be tuned to ensure that the data is delivered optimally. This means no delays due to external systems and, financially, there are no huge penalties for moving large volumes of data in and out of the Cloud.

This does not mean that there is no place for fine-tuning data locally, but it does reduce the amount of data moved and ensures that the vast bulk of the processing is done much more efficiently.

Integrating multiple types of data source is also something best done at the IT department level. This makes it easier for users to take advantage of data from different places and allows them to spend their time focusing on getting answers rather than manipulating data.

On principle, I am not a great fan of pulling everything back into the datacentre, but IBM does make a compelling case here. I would even go as far as saying that if you move the end-user BI tools onto virtual desktops or terminal services running in the same location as the HPC and BI software, you would gain even more advantages in terms of performance and security.