Cloudera v Hortonworks: Hadoop to complement or replace data warehouse?

As the market for enterprise Hadoop heats up, the battle lines between two suppliers – Cloudera and Hortonworks – become more clearly defined with every week that passes.

For these two suppliers, there is everything to fight for and much at stake in a battle that is being played out on several different fronts. For example, there is the issue of having the financial backing that a pre-IPO start-up needs to persuade enterprise customers that it is a safe bet. Cloudera seems to have won that tussle, with the late March 2014 announcement of $740m in financing from chipmaker Intel.

On the matter of customer acquisition, six-year-old Cloudera probably has a slight lead over three-year-old Hortonworks, but only just. Analysts estimate Cloudera’s base of paying subscribers at around 350, while Hortonworks’ CEO Rob Bearden says his company has acquired 250 customers over the past five quarters.

But the most significant point of disagreement between Cloudera and Hortonworks lies in their answers to a single question – and the one that, arguably, matters most to enterprise customers: should Hadoop complement or replace traditional enterprise data warehouse (EDW) investments?

At Hortonworks, vice-president of marketing David McJannet says it is the former: Hadoop is a valuable addition to existing analytic technologies. “A unique aspect of our approach is the fact that we’re not trying to compete with the data warehousing incumbents,” he says. “This is a pivotal philosophical difference we have with other folk in this market.

“There is one [supplier] in the market that will tell you, ‘Just throw out Teradata and stick it all in Hadoop’, but it’s just not realistic to do that.”

And who would offer that advice to customers? “Cloudera,” McJannet answers, flatly. “That’s their leading message: that the data warehouse is dead.”

At Cloudera, chief strategy officer Mike Olson tells a slightly different story – but his broad message is that, over time, many analytic workloads will move out of the EDW and into Hadoop.

He points out that Hortonworks has a close go-to-market partnership with data warehousing company Teradata, making it impossible for the company to present Hadoop as an EDW competitor. Even McJannet acknowledges that Hortonworks’ reseller agreements with Teradata, Microsoft, SAP and Hewlett-Packard mean that a combined 1,000 extra enterprise software sales executives are out there in the market, selling on Hortonworks’ behalf.

But Olson’s remarks reveal a strong inclination to position the EDW as a vulnerable “older technology” whose days are numbered.

“Everyone wants to know if Cloudera is after Teradata,” he says. “The fact of the matter is that the opportunity here is not to knock over old guys and steal their wallets. The opportunity is to monetise vast amounts of data using tools that weren’t previously available.”

That doesn’t mean there won’t be a place for data warehouses, or that products from EDW suppliers won’t evolve over time, he says. “But workloads that belong in high-end enterprise data warehousing systems today, won’t in the future – and even high-performance, interactive analytic workloads will run in Hadoop.”

Teradata advocates détente

At Teradata, meanwhile, executives put a rather different spin on the situation: they claim that the more big data that companies store – in Hadoop or elsewhere – the more they will see the value of a proven, high-performance data warehouse environment for storing more important data and running mission-critical analytic workloads.

Chief technology officer Stephen Brobst dismisses outright the idea that Hadoop may appeal to companies that find the cost of Teradata’s high-performance data warehouses unpalatable or simply prohibitive.

“That doesn’t make any sense at all to me,” he says. “The organisations that have the skills to deal with Hadoop are not smaller organisations – they’re large ones. And when we talk about ‘affordability’, you have to factor in total cost of ownership. Many companies want an analytical environment that just works – and that’s what Teradata delivers. Total cost of ownership, over time, is much higher for Hadoop.”

Instead, Teradata is as vigorous as Hortonworks in promoting a ‘co-existence’ view of EDW, Hadoop and other big data technologies. At its recent Teradata Universe conference in Prague, the company announced QueryGrid, a new technology that will allow users to write a SQL query in Teradata and have it automatically analysed in Hadoop or its own Teradata Aster system, an MPP [massively parallel processing] appliance.

The company’s official line is that Hadoop will be simply another layer in the ‘data management stack’ for many companies. But that’s not to say that, behind the scenes, executives aren’t rattled. On Teradata’s last earnings call, Hadoop was mentioned no fewer than 54 times, according to financial analysts. One-third of the company’s 50 largest customers in the Americas are now running Hadoop in production, with the other two-thirds “in various stages of evaluations”, according to Teradata CEO Mike Koehler.

Commenting on Teradata’s results in a research note, financial analyst Keith Bachman of BMO Capital Markets writes: “We believe the following: Hadoop is being deployed by leading-edge, large organisations; Hadoop will gain more traction over time, mostly with larger organisations that can acquire and/or develop Hadoop-based skills; and Hadoop will have a negative impact on [Teradata’s] revenue growth over the next five years, as more users deploy Hadoop.”

However, Bachman adds that, for many large, data-intensive organisations, Teradata is still an essential component of the data warehousing and business intelligence value chain, and that Hadoop hype is “probably” ahead of adoption.

What does Forrester Research analyst Mike Gualtieri make of these competing and conflicting views? Right now, he argues, it’s not reasonable to think in terms of ‘death or glory’ for either EDWs or Hadoop. 

“Enterprises pay for an enterprise data warehouse because they need the superior performance it provides, but they either end up putting a lot of cold data on a very expensive platform, or leaving potentially valuable data on the cutting-room floor,” he says. “Hadoop may offer low-cost storage for data, but it simply isn’t fast enough to replace EDW right now. It can’t hold a candle to Teradata, for example, in terms of performance, whatever the Hadoop [suppliers] say.”

But Gualtieri insists Hadoop is a threat to the EDW, even if it does have a long way to go, particularly in terms of hardware performance (this is where Cloudera’s funding from Intel could yield the biggest benefits for suppliers).

SQL on Hadoop

For enterprise customers, the area to watch is the development of tools that enable organisations to run SQL queries against Hadoop, says Gualtieri. Here, Cloudera offers Impala, while Hortonworks’ Stinger project, now complete, has seen engineers work to improve the performance of the Apache Hive tool.

“SQL on Hadoop is the hottest area of Hadoop innovation right now,” he adds. “Most organisations analyse only about 12% of the data they hold, so the race is on to get faster, interactive SQL working well on Hadoop.”

In fact, SQL on Hadoop may be the most important deciding factor in Hadoop’s progress from now on. Not only does it give suppliers a chance to differentiate by developing the fastest, easiest-to-use SQL tools for Hadoop, but it may also unlock a new range of use cases for enterprise customers yet to make a move on Hadoop. 

And in the process, it may also pose the biggest threat to the data warehouse by enabling those customers to move a significant share of data and query loads over to Hadoop and away from their venerable EDW.
