Pentaho aims to alleviate big data pains

Having retained its original name, technology stack and overall identity for more than a year now since being acquired by Hitachi Data Systems, Pentaho is putting out the ‘data developer/analyst’ message and tuning up its own integration prowess in the process.

The data integration, visualisation and analytics company has turned the volume up with announcements including SQL on Spark, which aims to (attempt to) overcome big data complexity, skills shortages and integration challenges in complex enterprise environments.

Manual coding

Software application developers will still be presented with data enrichment from the firm’s platform and tools, but the suggestion here is that Pentaho now supports more of the big data technology ecosystem… and therefore less manual coding will be required.

Pentaho has also expanded its existing Spark integration: data analysts can now query and process Spark data via Pentaho Data Integration (PDI) using SQL on Spark.

According to the firm, “[Users can] coordinate, schedule, reuse and manage Spark applications in data pipelines more easily and flexibly – expanded PDI orchestration for Spark Streaming, Spark SQL and Spark machine learning (Spark MLlib and Spark ML) to support the growing number of developers who use multiple Spark libraries.”

There is also the option to integrate Spark apps into larger data-driven processes: PDI orchestration of Spark applications written in Python benefits developers who work in that language.

Hand-code? No thanks!

Pentaho’s metadata injection capability is here to onboard multiple data types faster: it allows data engineers to dynamically generate PDI transformations at runtime instead of hand-coding each data source, which the firm claims can reduce costs by 10x.

“Securing big data environments can be extremely difficult because the technologies that define authentication and access are continuously evolving. Pentaho expands its Hadoop data security integration to promote better big data governance, protecting clusters from intruders. These include enhanced Kerberos integration for secure multi-user authentication and Apache Sentry integration to enforce rules that control access to specific Hadoop data assets,” said the company, in a press statement.

Avro and Parquet

Pentaho now also supports the output of files in Avro and Parquet formats in PDI, both popular for storing data in Hadoop in big data onboarding use cases.

“Our latest enhancements reflect Pentaho’s continued mission to quickly make big data projects operational and deliver value by strengthening and supporting analytic data pipelines”, says Donna Prlich, senior vice president, product management, product marketing & solutions, at Pentaho.

The firm insists that enterprises can use its technology to focus on big data deployments without the complexity of data preparation, while taking advantage of new, high-potential technologies in the big data ecosystem such as Spark and Kafka.