EMC picks Greenplum

At the beginning of July, EMC announced its intention to acquire data warehouse vendor Greenplum. On 29 July the deal was completed, and EMC announced that Greenplum would be the foundation of the new EMC Data Computing Product Division.

This is another step down EMC’s path to becoming an all-round player capable of competing with IBM, HP and Oracle. We now have four major players with large hardware and software divisions that are capable of providing most, if not all, of the components for a corporate datacentre.

At the same time EMC has jumped from being a storage company to a large-scale data warehousing vendor. This means it is going to be competing with the likes of Teradata, Oracle, HP, Microsoft, IBM and the newly formed SAP/Sybase.

All of this fits into a much wider strategy that encompasses VMware, the EMC Backup and Recovery Solution division and its ever-closer partnership with Cisco.

With VMware, EMC all but owns the virtualisation market, which in turn gives it the keys to build Cloud infrastructure. The acquisition of Data Domain and a small number of other companies to create its BRS division meant it could back up data into the Cloud as well as locally. Together, the two solve one major problem for companies that want to use the Cloud as a private/hybrid environment for Disaster Recovery and on-demand resources: how to synchronise data and switch control from local to Cloud.

Now, EMC sees an opportunity to take the largest consumer of storage and processing resources inside most large organisations into that same Cloud market. Business Intelligence (BI), data warehouses and data analytics are resource-hungry processes. They need a lot of storage, both for the raw data and for the refined data sets used by the people mining that data. On top of this, they consume a lot of processor power, memory and bandwidth.

EMC knows all of this; after all, it is the biggest supplier of storage systems. In the past it has been content to be the underlying hardware for Oracle, IBM, Microsoft, Teradata or any other database. But as databases become larger, there comes a point where they need to be tightly integrated with the hardware to get the maximum performance.

IBM has spent a lot of time with its very large customers integrating DB2 into its hardware platforms in order to extract maximum performance. One of the stated reasons for Oracle buying Sun was to get a hardware platform that it could tightly integrate with the database. The same is true of HP. It should come as no surprise, then, that EMC, as a hardware vendor, wants to do the same.

What makes this especially interesting is that the Greenplum database is a massively parallel processing (MPP) product. This means that it can take advantage of all the processor cores that you can throw at it. This is where VMware comes into play. It has all the required management tools to allocate resources on demand, including processor cores. Ally an MPP database to a resource management system and you can ramp capacity up and down on demand. This is where the competitors will need to respond quickly in order to counter the move.

We have already seen VMware tightly integrate hardware and software stacks with vSphere 4. It has a highly successful software solutions division where companies have ported their applications into VMs that can be used for evaluation and made live by the purchase of a key. On top of this, it has used the EMC engineering division to build a hardware certification programme.

It is not a huge jump, therefore, to see the Greenplum database being distributed as a virtual machine ready for deployment. A step beyond that comes the EMC Database Appliance to compete with IBM and Oracle. Move sideways and the whole of the VMware management suite gets its own underlying database platform.

Don’t forget that a lot of the current VMware team are ex-Microsoft, and when we first saw the abstraction layers for vSphere 4 it looked an awful lot like Windows Server but with a proper certification programme. So will we see a version of Greenplum acting as the repository for directory services, authentication and management?

Don’t bet against it, because it is a natural step to take. In fact, given the need for service management and integrated help desk solutions, it would not be unreasonable to think of people inside EMC and VMware already planning their own CMDB solutions and other applications on top of the Greenplum platform.

Think SAP + Sybase + NetApp. SAP and NetApp are already working closely together, along with a number of Cloud and hosting providers, to provision hosted SAP. The acquisition of Sybase will allow SAP to have its own tightly bound database and, given its NetApp relationship, it cannot have escaped the R&D team that tightly binding Sybase to the NetApp platform would make this a very effective solution.

For ISVs and anyone who wants to be a Cloud platform supplier, EMC has a solution here as well. Once EMC integrates Greenplum effectively onto its hardware and adds in the multi-tenant management elements from VMware, you have a Cloud BI platform that can be delivered as part of a public and even a private Cloud product.

How about EMC taking the Software as a Service approach and creating Database as a Service and BI as a Service based on the Greenplum platform? We’ve already seen software development and software testing go down this route, so taking the database and BI into the “as a Service” world is quite feasible.

These are just a few of the possible ways that we could see EMC go. The most important thing, however, is that this presages a whole new world for databases where they are no longer just an application but instead return to the core of the business as part of the hardware platform.