DataWorks 17: Hortonworks promotes maturity of HDP upgrade

Hadoop distributor leads on an upgrade to its Hortonworks Data Platform at the DataWorks 2017 Summit in Munich

At this year’s DataWorks conference in Munich, Hortonworks announced version 2.6 of its data platform, promising customers that it will reduce the number of systems of record they need to run in production.

In a pre-conference briefing with Computer Weekly, the Hadoop distributor’s chief technical officer, Scott Gnau, said: “Customers are genuinely excited about this. It means fewer systems of record, and real-time classification of workloads against [our] HDP [Hortonworks Data Platform].”

The supplier said that HDP 2.6 was, in its view, “the industry’s only true secure, enterprise-ready open source Apache Hadoop distribution that addresses the complete needs of data at rest, powers real-time customer applications, and delivers robust analytics that accelerate decision-making and innovation”.

In a press statement, Gnau added: “HDP 2.6 showcases the advantages of the open source community. It introduces key new enterprise features and performance improvements that will benefit our customers immediately – no application rewrite required.”

The supplier said the upgrade also “introduces ACID [atomicity, consistency, isolation and durability] merge functionality to enable additional use cases for optimising existing Enterprise Data Warehouse investments without requiring all data to be reloaded”.

HDP 2.6 was also said to be available on IBM Power Systems. Hortonworks cited work it is doing with IBM, including joint support for the Open Data Platform Initiative, which was announced in 2015 with the stated aim of making the Hadoop technology stack more user-friendly for enterprises.

The statement quoted Srinivasan Sankar, data office lead at Massachusetts firm the Hanover Insurance Group, in support of HDP 2.6: “At the Hanover, our strategy is focused around modern, business-led analytics and driven by and for the business. Hortonworks Data Platform is an important part of that strategy, and we are looking forward to the new HDP 2.6 functionality. We are particularly excited to see the enhancements in Spark 2.1.”

Apache Spark is an open source parallel processing framework for running large-scale data analytics applications across clustered computers. It is often said to be a replacement for the MapReduce processing framework in the Hadoop family of technologies.

Tony Baer, principal analyst at Ovum, said of the upgrade: “There is a need to improve SQL performance and support, along with Spark adoption in Hadoop-related workloads. A key enhancement [here] is the addition of Upsert [updating a row in a database table if it already exists, and inserting it if it does not] support, which is essential for building confidence in data currency and making Hadoop BI-ready. Backing with LLAP [Live Long and Process functionality] and Spark 2.1 should produce the kind of service levels that BI users of [data summarisation, query and analysis tool] Hive expect.”
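
For readers unfamiliar with the term, the upsert behaviour Baer refers to can be illustrated with a minimal Python sketch. It is not how Hive or HDP implements ACID merge; the function name `merge_upsert` and the sample rows are hypothetical, and the sketch only shows the general idea of updating rows that match on a key and inserting those that do not, without reloading the rest of the table.

```python
def merge_upsert(target, updates, key="id"):
    """Return a merged copy of `target`: rows in `updates` that match on
    `key` overwrite the existing row's fields; unmatched rows are inserted.
    Illustrative only; names and data are hypothetical."""
    merged = {row[key]: dict(row) for row in target}  # copy, keyed by id
    for row in updates:
        if row[key] in merged:
            merged[row[key]].update(row)   # matched: update existing row
        else:
            merged[row[key]] = dict(row)   # not matched: insert new row
    return list(merged.values())


# Hypothetical warehouse table and a batch of incoming changes.
warehouse = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
changes = [{"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]

result = merge_upsert(warehouse, changes)
for row in result:
    print(row)
```

In this sketch, row 1 is left untouched, row 2 is updated and row 3 is added, which is the sense in which only the changed data has to be processed.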

Mark Mossel, director of enterprise data management at Geisinger Health System, a health insurance firm based in Pennsylvania, said: “We are using HDP to optimise our enterprise data warehouse, with the immediate goal of enriching clinical data with additional data, such as billing and claims records. Apache Hive is a key part of that optimisation programme. Over the past several months, we have been running a preview version of Hive with LLAP, and we have seen dramatic improvement to query performance.”
