
SAP embraces Spark on Hadoop with Vora

SAP announces Vora software to “deeply embrace” Hadoop as it evolves with Apache Spark

SAP is “deeply embracing Hadoop”, according to its global chief technology officer Quentin Clark, with the announcement of Vora, an in-memory query engine that runs against the Apache Spark framework.

Clark joined SAP in November 2014 from Microsoft, where he was the principal driver of SQL Server, as well as Power BI, BI in Microsoft Office, the data capabilities in Azure and Microsoft’s Hadoop-related offerings.

With his fresh eyes, how does Clark see SAP’s in-memory columnar database Hana, and its associated platform?

“We recognise we are on a mission for our customers to become leaders in digital and to re-imagine experiences in the workplace. SAP is in a great structural position to do that, but we can’t do it alone. We need to create a broad ecosystem, built around the Hana platform,” he said.

The supplier is looking to connect to big data Hadoop stack technologies more deeply than it has before, he added. “We’ve had smart data access [to unstructured information in Hadoop data stores], but meaning is only made on the new sets of information, when they are married to line-of-business information.”

What would he say to a line of argument from the Hadoop community that Hana is specialised and expensive? 

The creator of Hadoop, Doug Cutting, chief architect at Hadoop distributor Cloudera, said in a recent interview with Computer Weekly: “Hana is a specialised tool that can do some things that you can’t do with [Cloudera’s] enterprise data hub, but it is narrow and expensive. It may be that there are certain types of applications where it is the only way to get the job done, so you will pay that.”

Clark said Hana is a new category of database that does analytics and processing, and does it in zero response time. 

“Decades ago, the database industry separated transactional processing and analytics into two different databases because they had to. Hana is not niche, and any application that builds to it gets this advantage of bringing OLAP and OLTP together,” he said.

“That is not to dismiss Hadoop, which is appropriate for streaming in data, not for running transactions at the individual record level,” he added.

Brad Peters, chief product officer of cloud business intelligence supplier Birst, expressed support for Hana in an interview with Computer Weekly on another matter.

“There are more real enterprise deployments of Hana than there are of Hadoop at this point. Hadoop is a phenomenal storage and ETL tool, but it is terrible for interactive data exploration and analytics, which is what Hana is phenomenal for,” said Peters. Birst is an SAP partner, offering cloud analytics on Hana.


In a press statement, SAP’s Clark said: “To succeed in digital transformation, companies need a platform that enables real-time business, delivers business agility, is able to scale and provides contextual awareness in a hyper-connected world. With the introduction of SAP Hana Vora and the planned new capabilities in SAP Hana Cloud Platform, we aim to enable our customers to become leaders in the digital economy.”

Due to be released to customers in late September 2015, Vora is an in-memory query engine that, according to SAP, “leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on Hadoop”.

It is said to help extend in-memory computing to distributed data and “provides OLAP-like analytics with a business semantic understanding of data in and around the Hadoop ecosystem. Companies can enhance their decision-making with full understanding of their business activities in context with SAP Hana Vora”.

In support of Vora, Aziz Safa, vice-president and general manager at Intel IT Enterprise Applications and Application Strategy, said: “Mining large datasets for contextual information in Hadoop is a challenge. SAP Hana Vora will provide us with the capability to conduct OLAP processing directly on these large, rich datasets, all in-memory and stored in Hadoop. This will allow us to extract contextual information and then push those valuable insights back to our business.”
