Open Data Platform: the answer to a question no one asked?

What should enterprise customers make of the Open Data Platform, an initiative by IT suppliers to establish common technologies for Hadoop?

What should enterprise customers make of the recent launch of the Open Data Platform (ODP), an initiative by IT industry leaders to establish a core set of common technologies for the Hadoop big data platform?

Not much, says Gartner analyst Nick Heudecker. “At this point, we are advising clients not to take ODP into account at all when evaluating their Hadoop options,” he says.

“ODP is the answer to a question that nobody’s asking.”

So far, so damning – but since members of the ODP promise their work will result in “less confusion and friction for enterprise customers”, it is worth examining their claims in more depth.

To recap: the ODP was announced in mid-February by a group of suppliers led by Hortonworks and Pivotal, but also including EMC, IBM, SAS, Teradata and VMware. They are advocating a standard version of the core Hadoop stack, upon which all suppliers – at least those that have paid for membership of their club – can agree.

These core components will be the Hadoop Distributed File System (HDFS), cluster resource-management technology YARN, and Hadoop management console Ambari. By establishing this ‘kernel’, ODP members say they will “take the guesswork” out of developing applications to run further up the Hadoop stack – a process they describe as too fragmented, too slow and, for ISVs (independent software vendors), plagued by duplicated effort.

That seems reasonable enough, but the fact is that ODP does not represent the entire Hadoop ecosystem. Most notable by its absence is Hortonworks’ arch-rival Cloudera. Then there is MapR, another Hadoop distributor that trails Hortonworks and Cloudera. Also missing is cloud provider Amazon Web Services (AWS), which gives the customers of its Elastic MapReduce (EMR) Hadoop-in-the-cloud service the option of using its own Hadoop distribution or MapR’s.

And therein lies the problem: the Open Data Platform just isn’t very ‘open’. Partnering with a number of industry big-hitters is certainly a smart way for Hortonworks to consolidate its position in the market. But whether the ODP represents a true industry ‘standard’ is more questionable – and it has already become the focus of industry bickering that leaves enterprise customers watching from the sidelines.

Profound differences

For his part, Herb Cunitz, president of Hortonworks, puts up a staunch defence of the ODP. “Much of the questioning or consternation or, you could even say, rhetoric that you’ve seen in the industry has come from others saying either ‘I wasn’t invited’ or ‘I don’t like this idea’. But these are 19 companies coming together saying we have agreed on what we believe the standard kernel should be in Hadoop. Others are free to say they have a different opinion.”

And executives at Cloudera have been quick to exercise that freedom. Almost as soon as the ODP was announced, Cloudera CSO [chief strategy officer] Mike Olson took to the company’s blog to publicly denounce the initiative.

ODP, he wrote, should stand for ‘Only Dollars Pay’, open only to companies with the resources to spend money on what is effectively a marketing exercise. “That is antithetical to the open-source model and the Apache way,” he said.

At the heart of this squabble lie profound, longstanding differences in the business models of Cloudera and Hortonworks. Both are key members of, and prolific contributors to, the Apache Software Foundation (ASF), the governance body that oversees the open-source development of Hadoop through the voluntary, collaborative efforts of individual software engineers, many of whom work at competing companies.

Both Cloudera and Hortonworks distribute the core Hadoop code developed by ASF – but that is where their paths diverge.

On the one hand, Hortonworks has always pursued a ‘pure’ open-source model: all the software it distributes, for free, is based on ASF-developed code. The company makes its money solely from the enterprise subscriptions it sells for supporting these products.

On the other hand, Cloudera makes its money by selling proprietary extensions to Hadoop – for example, its Cloudera Manager management suite.

This brings us neatly to the subject of Ambari. Its inclusion in the ODP kernel effectively makes membership a non-starter for Cloudera. Ambari is an open-source project, but its development is largely led and supported by Hortonworks. More importantly, it competes directly with the proprietary Cloudera Manager.

“Ambari is an open-source project with committers from several other companies, including Pivotal, IBM and Red Hat,” says Hortonworks' Cunitz. “Our philosophy has always been that open source is the right way to approach enterprise software development. Last year, Pivotal leaned into Ambari in favour of its own competing tool, because it, too, saw the value in embracing an open-source solution over a proprietary one.”

However, at the core of any Hadoop distribution is pretty much the same ASF-derived technology, requiring the same skills. So the issue of supplier lock-in is really not as serious as the ODP is making out, says Gartner's Heudecker. “We’re talking about technologies that are basically built on the same stack, so porting from one to another, you might have friction at the management console level, yes, but beyond that, there are a lot of similarities between suppliers.”

Shaky ground

But even if enterprise customers are swayed by the inclusion of a management console in the ODP Hadoop kernel, they are still looking to ISVs to build other analytic applications on top of this stack, in areas such as real-time event processing, graph processing, search and text indexing.

It is around this question of integration that the ODP’s message starts to look shakier.

This was a point picked up by Cloudera's Olson in his blog. “Pivotal and Hortonworks claim that the ODP is driven by an industry-wide longing for standardisation in the Apache Hadoop ecosystem,” he wrote. “I don’t believe them.”

Cloudera’s partner ecosystem includes 1,447 companies, Olson noted. “We are simply not hearing from them that they are confused about building applications on core Hadoop,” he observed.

This is an important point, because of the 17 companies currently listed on the ODP website as members, 10 are also Cloudera partners. In other words, even as they are building apps to run on the ODP kernel, many will simultaneously continue to build them for the Cloudera Hadoop distribution, too.

One example is Wandisco, the Sheffield-based, AIM-listed technology company that sells data replication technology for Hadoop. “The next version of our software will run as an application on the Cloudera platform, so, from the perspective of being a member of ODP and working with Cloudera, there will be no change,” says CEO David Richards.

So what does Wandisco get for its fee for gold membership of ODP, rumoured to be $100,000 for two years? Richards won’t comment on the terms of membership, but says signing up has had “a net-positive impact on our pipeline – it’s really helped us”.

And as for the row between Cloudera and Hortonworks? “We largely ignore that battle royale,” he says.

Similarly, when ODP gold member EMC launched its Federation Business Data Lake storage-and-big-data stack in late March, the storage giant was clear that this would support “customer choice of Hadoop distribution, including Cloudera and Hortonworks, along with any future Open Data Platform-based Hadoop distribution”.

Matt Brandwein, director of product marketing at Cloudera, says: “Customers just want to know that their ISVs are certified against a commercial platform they use, not just some common core decided upon by a particular group of suppliers with a vested stake in putting a drag chute [slowdown] on innovation.”

In other words, a lot of suppliers, ODP members included, are still hedging their bets. This again raises the question: what are they getting for their membership? In Pivotal’s case, it’s pretty clear: the ODP offers a graceful way to abandon, once and for all, its attempts to pursue a proprietary approach to big data in favour of a more open-source one, by aligning with open-source champion Hortonworks.

A platinum ODP membership like Pivotal’s, we’re told, costs $300,000 for two years. EMC, meanwhile, spun off its own Hadoop strategy with the launch of Pivotal in early 2013.

Elsewhere in the consortium, HP made a hefty $50m equity investment in Hortonworks last July. IBM has long supported open source, dating back to its mid-1990s battles with Microsoft, and by supporting ODP it has the opportunity – if it wants it – to invest more time and attention in building out the BigInsights add-on tools that currently run on its own distribution, so that they could potentially run on ODP and Cloudera’s distribution, too.

In other words, there are a lot of defensive strategies in play here.

What this boils down to for customers is, effectively, no change at all, as long as major ISVs continue to integrate with Hortonworks/ODP and Cloudera. Gartner's Heudecker says: “The idea that membership of ODP means that an ISV doesn’t need to certify against different distributions only works if everyone is in that club – and they’re not. Otherwise, I don’t see the value today.”
