The newly minted Open Data Platform (ODP) initiative is “an affront to the Apache Software Foundation”, says Cloudera CEO Tom Reilly in an interview with Computer Weekly.
The ODP brings together a group of companies whose goal is to promote Hadoop, the open-source distributed computing framework used in big data.
The initiative involves Cloudera’s rivals, Hortonworks and Pivotal. Hortonworks, like Cloudera, is a Hadoop distributor. Pivotal is a spin-out company from EMC and VMware that provides custom applications for data analytics by way of cloud delivery. The ODP involves 13 other companies, including data warehousing specialist Teradata and statistical software company SAS.
Reilly confirms that neither his company nor fellow Hadoop distributor MapR has been invited to join the ODP. “When we found out about it, shortly before launch, we were asked to sign a non-disclosure agreement and found out there was a flat fee of $300,000,” he says.
“It runs counter to, and is an affront to, the ASF, where the best projects will rise to the top, whether they come from a company or an individual. Take [the parallel processing framework] Apache Spark as an example. That was invented by a college student at UC [University of California] Berkeley. It would never have seen the light of day under the ODP.”
Matei Zaharia, CTO of Databricks, has said the idea for Spark came from work at Berkeley's Algorithms, Machines and People Lab in 2009.
“The ODP is solely for the benefit of a few suppliers and is against the open-source community,” says Reilly.
He cites Yonik Seeley’s recent recruitment to Cloudera as an indication of how his company is regarded in the open-source community. Seeley is the creator of the Apache Solr search engine. “His joining us shows our commitment to Apache,” says Reilly.
Reilly says the initiative is not solving the perceived problem, described in a blog post by Raymie Stata, CEO of Altiscale, that the Hadoop ecosystem is a disparate set of components. Indeed, he sees no such problem. “Apache Bigtop addresses that,” he says.
“This initiative will simply lock customers into Hortonworks as a distributor. Yes, SAS and Teradata are signed up, but they are not giving [the ODP] overwhelming support.”
Reilly says the top strategic issue for Cloudera in 2015 is delivering high-impact business applications with big data. “We think that this year, big data will move from IT to business. In each vertical industry we are in, we are having boardroom discussions about real business value. For instance, combating fraud and money-laundering are boardroom issues at financial services companies.”
Cloudera, whose UK customers include BT and Compare the Market, recently announced a deal with MasterCard under which the credit card company will deliver PCI DSS compliance services to its customers.
“They have PCI-certified their deployment of our enterprise data hub,” says Reilly. “But they are now taking that to market through MasterCard Advisors. They are offering this as a secure data vault for credit card information,” he says.
Cloudera and Intel
Reilly has a background in information security, having served as president and CEO at security management company ArcSight, before and after its 2010 acquisition by HP, and says he has brought this security focus to Cloudera.
“We have worked with Intel to ensure encryption at the hardware level for when data lands in Apache Hadoop by calling an instruction in the x86 chip set,” he says.
The relationship with Intel, which invested $740m in Cloudera in 2014, is of “massive” significance, says Reilly, “even more important now that we are in it than when we were forming it”.
He adds: “Intel anticipates that, within two years, Hadoop will be the number one application driving servers and datacentres, surpassing ERP applications. [Intel] has 98% market share in datacentres, so it has clarity. It wants to design new chips for Hadoop analytic workloads.
“We are now working on the software to take advantage of specific instructions on those chips. The processing capability will go up dramatically. Intel is a very smart company. It selected us because we have the lead engineers working on compute-intensive use cases, such as the Impala SQL-on-Hadoop engine, and our contribution to the Spark project.
“For customers, this will mean they can do more workloads with fewer servers and so reduce total cost of ownership.”
Refute the 'myth'
Reilly is quick to refute the “myth” that Cloudera would like its “enterprise data hub” to replace traditional enterprise data warehousing. “No, we are complementary to the enterprise data warehouse, and we do have a partnership with Teradata,” he says.
He says corporate IT, in both the US and the UK, will find his company's technology attractive because there are technical differentiators. “For large enterprises with data security and governance concerns, and concerns over total cost of ownership, we have the systems management tools to address those.”
The UK is Cloudera's strongest market outside the US, he says, and it is particularly strong in financial services and telecoms, “where we have a great track record in reducing customer churn”.
Although Cloudera is not a public company, Reilly confirms that it surpassed $100m in revenue in 2014. “That's very unusual for an enterprise software company of our age – we are seven years old,” he says. “Two-thirds of that revenue comes from software licences, not from services, and we are showing 100% year-on-year growth.
“We have 525 customers. We added 250 in 2014 and increased our number of partners from 800 to 1,450. We are internally focused on customer renewal and expansion rate. For instance, we have 150 projects at one telco.”