The Hortonworks-Cloudera merger, announced yesterday, has riled competitor MapR and prompted analyst comment highlighting the threat that cloud providers pose to all the original Hadoop distributors. The analysts suggest the merger might bring some much-needed coherence for users, albeit with reduced choice.
The two giants of the Hadoop distribution world have fought each other like lion cubs, but now seem to have decided to join forces. Both have their roots in the open source Hadoop stack, which took shape at Yahoo in 2006 around the Hadoop Distributed File System (HDFS) and an implementation of MapReduce, a programming model that Google had described in a published paper.
Meanwhile, third rival MapR, which also has its roots in Hadoop distribution but decided to eschew open source HDFS for its own proprietary file system, MapR-FS, has reacted acerbically to the news that its twin rivals have opted to become one entity.
John Schroeder, CEO and chairman of the board, MapR, said: “Customers will not gain innovation benefits through this merger. The merger is about cost cutting. Cloudera and Hortonworks have several redundant competing technologies, for example, Ambari and Cloudera Manager or Sentry and Ranger.
“The merger announcement says these redundant technologies will be ‘unified’, meaning some will be discontinued [causing] customers undue switching cost pain. They have little technology that is accretive to their overall platform. They are claiming a next-generation data platform without the underlying technology. MapR has delivered on a next-generation platform based on nine years of hard engineering.
“We support a broader set of workloads from Hadoop, to Spark to AI/ML and already provide hybrid cloud and containerisation with Kubernetes. MapR customers are already implementing what the merged company says they are hoping to deliver.”
Big data under management
Of the to-be-merged companies, Cloudera was first to market in 2008, and Hortonworks followed in 2011. In a recent interview with Computer Weekly, Rob Bearden, CEO and co-founder of Hortonworks, said the company’s software had always been about the business value to be derived from bringing unstructured, big data under management and less about Hadoop, as such.
“Back in 2011, our intuition was that all the ‘new paradigm’ data sets – the mobile, the click stream, the sensor data – was all coming at enterprises very quickly, and in large volumes. Architecturally, that data would not go into relational environments.
“Also, it was data about [companies’] customers, products and suppliers that was pre-transaction or pre-event. Our hypothesis was if we could bring that data under management, and learn how to get value from it, we could transform business models to be less reactionary – post event, post transaction – and more proactive – pre-event, pre-transaction. And we thought that Hadoop had the best shot of being the platform that would do that.”
In another recent interview with Computer Weekly, Amy O’Connor, chief data and information officer at Cloudera, said that when she was a customer at Nokia, she had been impressed by the argument of Cloudera founders Amr Awadallah and Mike Olson that all companies, not just the likes of Yahoo or Google, should be able to transform their businesses with new ways of treating data.
In yesterday’s merger statement, Tom Reilly, chief executive officer at Cloudera, presented the two suppliers as complementary: “By bringing together Hortonworks’ investments in end-to-end data management with Cloudera’s investments in data warehousing and machine learning, we will deliver the industry’s first enterprise data cloud from the edge to AI.”
Matt Aslett, analyst at 451 Research, said of the proposed merger in a comment provided to Computer Weekly: “There shouldn’t be significant overlap in terms of customers. While many companies might have both Cloudera and Hortonworks distributions running tactical deployments, in terms of strategic adoption, most organisations have chosen one or the other, and there is a commitment from Cloudera that customers will be supported on current offerings for at least three years.
“While there is a common foundation of Apache Hadoop and associated open source projects, the two companies do have some differentiating functionality and Cloudera clearly sees opportunities to sell Hortonworks DataFlow (HDF) to Cloudera customers for streaming analytics and Cloudera Data Science Workbench to Hortonworks clients for machine learning and AI,” he said.
“There is also significant overlap in some areas, particularly data management, data governance and data security. In relation to overlapping products, Cloudera has said that the combined engineering teams will identify the best and merge them where appropriate. This is likely to be a lot easier said than done, and could be a major hurdle to the company realising its potential R&D cost savings if not managed effectively.
“If the leadership and engineering teams of the combined company are able to put aside their historically sometimes acrimonious differences to successfully rationalise the merged product portfolio, the result should be positive for customers overall. It will potentially involve a reduction in choice, it’s true, but given the proliferation of competing projects from Cloudera and Hortonworks in recent years, that may not be a bad thing.”
Doug Henschen, an analyst at Constellation Research, said in a comment provided to our sister site SearchDataManagement.com: “The move to the cloud by enterprises is sapping growth and revenue potential for Cloudera and Hortonworks such that both players can’t sustain strong and profitable growth. Amazon EMR and Spark services, and similar Azure and Google services, are seeing faster growth, and, together, are capturing the lion’s share of the big data platforms market.”