Pentaho ignites Apache Spark orchestration

Orlando-based open source analytics company Pentaho is ‘in the process of being acquired’ by Hitachi Data Systems, but the brand appears strong enough to be retained 100% intact inside the new parent company.


In the meantime, Pentaho has continued to deliver on what it sees as the future of analytics.

This week the firm announced native integration of its Pentaho Data Integration (PDI) software with Apache Spark, enabling PDI to orchestrate Spark jobs.

NOTE: Apache Spark is an open source processing engine built around speed, ease of use, analytics and machine learning.

Pentaho hopes the integration will lower the skill-set requirements for incorporating Spark into big data projects.

Spark works alongside big data stores to process, blend and govern data, and it is still said to be an ‘emerging’ big data technology.

“For two years, we experimented with possible use cases based on our big data blueprints and sizing the enterprise market opportunity for Spark. Our customers now benefit from that work with simplified, real-time analytic capabilities,” said James Dixon, CTO at Pentaho.

“Our open-source heritage allows us to quickly evolve our capabilities keeping our customers’ big data technology options open, reducing risk and saving considerable development time while taking advantage of the latest innovations in popular big data stores.”

The Spark integration follows other Pentaho Labs efforts that led to support for YARN and the Adaptive Big Data Layer. Native YARN support alone has enabled enterprise customers such as RichRelevance, edo Interactive and MultiPlan to innovate and drive greater value from Hadoop.