The final collapse of big data firm MapR into HPE could be read as a particular fate. But might it also be a sign of things to come for those suppliers that sprang up around the Hadoop family of big data storage technologies about a decade ago?
Hadoop was, and is, an open source distributed processing framework that manages data processing and storage for big data applications running in clustered systems on commodity servers.
It was created by Doug Cutting and Mike Cafarella, initially to support processing in the Nutch open source search engine. After Google published technical papers detailing its Google File System (GFS) and the MapReduce programming framework in 2003 and 2004, Cutting and Cafarella developed a Java-based MapReduce implementation and a file system modeled on Google's. This they called Hadoop, famously after Cutting’s son’s toy elephant.
MapR, the company, was commonly seen as the third horse – or should that be elephant? – in a race with Cloudera and Hortonworks. The latter two have merged, while MapR has, essentially, gone out of business.
It was announced on Monday 5 August that Hewlett Packard Enterprise (HPE) has acquired all MapR assets for an undisclosed sum. HPE has said it will use MapR’s technology for its own Intelligent Data Platform, which is a basis for artificial intelligence (AI) and machine learning applications.
“MapR’s file system technology enables HPE to offer a complete portfolio of products to drive artificial intelligence and analytics applications and strengthens our ability to help customers manage their data assets end to end, from edge to cloud,” said Antonio Neri, president and CEO of HPE.
MapR’s distinction, as against the other two main Hadoop distributors, was that it eschewed the Hadoop Distributed File System (HDFS) in favour of its own file system and, more broadly, held more defensible intellectual property in its armoury. It was not based solely on open source. Hortonworks was and is entirely open source; Cloudera is less so than the company it has merged with, but more so than MapR.
It seemed a plausible case, often ably articulated by Ted Dunning, chief application architect at MapR.
But for most of this year, MapR has teetered on the verge of collapse. In a letter to employees and a notice MapR filed on 13 May with California’s Employment Development Department, the company said it would shut down its headquarters in Santa Clara and terminate 122 employees there if necessary funding wasn’t obtained by 14 June.
It has now been submerged into HPE.
James Curtis, senior analyst, data, AI & analytics, 451 Research, says “it makes a lot of sense for MapR to land at HPE, particularly given that MapR has its data fabric offering based on its proprietary file system, making it a good match for a storage vendor. The MapR technology has been solid, so it comes down to HPE pulling off the right execution strategy to make this work.
“Customers are likely glad that MapR found a buyer first and foremost. Next there would be concern about the roadmap and future updates, which HPE has indicated they will continue to deliver”.
Moving beyond Hadoop
Meanwhile, the merged Cloudera/Hortonworks outfit has not experienced plain sailing in 2019. CEO Tom Reilly and Cloudera co-founder and chief strategy officer Mike Olson have left the company, with first quarter revenue of $187m only slightly up on the $182m combined revenues of Cloudera and Hortonworks for the same year-ago quarter, as noted by The Register in June. Reilly conceded, in a financial analyst call, that the merger had created uncertainty among both sets of customers.
Cutting, chief technology officer of Cloudera, and co-inventor of Hadoop, in an interview with Computer Weekly earlier this year, expressed confidence that the new Cloudera would ultimately find a path to growth, because it was not fatally wedded to any one data storage or database technology.
He said the company was a data technology company that had moved on from Hadoop. He also expressed no sympathy for open source-based database or data storage suppliers, which would be prone to cry “foul” when and if the public cloud providers – Amazon Web Services, Google and Microsoft – package up their capabilities into cloud services. MongoDB did this earlier in 2019.
“If something is freely licensed, and someone uses it without paying, that is things working out as designed. If you are angry about that, then that is a form of lunacy,” said Cutting.
As for Hadoop’s seeming eclipse, he said: “People saw the open source model was successful and built things around Hadoop to the degree that while it is not quite obsolete, it is getting there over time. MapReduce is mostly inferior to Spark, for example. HDFS is still a great file system, but as we see more of a shift to cloud, and spinning up clusters on demand, you might be building on S3 as your storage. And Yarn is needed much less as you are using public or private cloud, because you are no longer time sharing on the cluster, but bringing up dedicated clusters that are short-lived per application, so the need for a scheduler is lessened.
“With our model, we are happy to move and adopt new technologies, since the customers are not paying for the technologies themselves, and so there is no need for us, as a vendor, to keep them trapped in a licence.”
And while Cutting did not mention MapR by name, he said it was hard to see companies such as Mongo, Databricks, Elastic or Confluent apart from their particular technologies. “They are single technology companies – it is hard to imagine Confluent without Kafka, or Databricks without Spark, and so on. We are trying to provide a set of tools to solve a particular set of problems and those tools will evolve,” he said.
Has MapR, by staking its ground on having its own proprietary intellectual property, lost out, and will Cloudera, by being more aligned with open source, prove comparatively more successful? Or are all these big data technology firms that emerged to do Hadoop distribution a decade or so ago now entering a trough of disillusionment?
“Probably not”, says the 451’s Curtis. “A lot of it is how the market is shifting and settling. There’s still some demand for Hadoop and similar distributed data-processing frameworks, but the majority appears to be moving to the cloud, of course”.
Nevertheless, it is hard to resist the cliché: time will tell.