Hitachi Group's Pentaho: 'metadata injection' kills big data complexity

Open source data analytics player Pentaho has upped its metadata injection feature set.

Metadata injection?

Yes, metadata injection. The diversity of data and the sheer number of different data sources out there give us a problem in terms of knowing what data means what, so metadata injection is a means of putting more "information about the information into the information", if you will.

With metadata injection, transformation logic (the T in ETL) is machine-generated rather than hand-coded by developers.
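The idea can be sketched in a few lines. This is a minimal illustration of the concept, not Pentaho's actual API: a single generic transformation template is reused across sources by injecting a field-mapping as metadata at run time, instead of hand-coding one transformation per source. All names here are hypothetical.

```python
def make_transform(metadata):
    """Build a transformation from injected metadata (a field mapping)."""
    def transform(record):
        # Rename and select fields exactly as the injected metadata directs;
        # the transformation logic itself is generated, not hand-written.
        return {target: record[source] for source, target in metadata.items()}
    return transform

# Metadata for two differently shaped sources, injected at run time.
crm_metadata = {"cust_name": "customer", "amt": "amount"}
web_metadata = {"user": "customer", "order_total": "amount"}

crm_transform = make_transform(crm_metadata)
web_transform = make_transform(web_metadata)

# Both sources land in the same analytics-ready shape.
print(crm_transform({"cust_name": "Acme Ltd", "amt": 100}))
print(web_transform({"user": "Acme Ltd", "order_total": 100}))
```

The point is that only the metadata differs between sources; the transformation code is written once.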

Pentaho suggests that the complexity of this process is what has been holding back banks (and other firms) from integrating and analysing a large and diverse set of data sources (especially unstructured data).

Data onboarding

The firm’s metadata injection feature set is meant to tackle the complexity of so-called “data onboarding”, i.e. the process through which we get data into databases for analysis and then, logically, into the big data analytics pipeline.

According to Pentaho, “Modern big data onboarding is more than just data loading or movement. It includes managing a changing array of data sources, capturing metadata, making processes repeatable at scale and ensuring control and governance. These challenges are compounded in big data environments like Hadoop.”

In Pentaho 6.1, data-centric developers and others now have a wider array of options for dynamically passing metadata to Pentaho Data Integration at run time to control complex transformation logic.

Data ingestion & preparation

Teams can now drive hundreds of data ingestion and preparation processes through a few transformations, accelerating time to delivery of governed, analytics-ready data sets.
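To see how a few transformations can drive hundreds of processes, it helps to picture the per-source details held as data rather than code. The sketch below is illustrative only (not the Pentaho API); the source descriptions and the `ingest` helper are invented for the example.

```python
# Each source is described purely by metadata; adding source number 101
# means adding one record here, not writing a new transformation.
sources = [
    {"name": "crm", "path": "crm.csv", "mapping": {"cust": "customer"}},
    {"name": "web", "path": "web.csv", "mapping": {"user": "customer"}},
    # ...hundreds more source descriptions, maintained as data
]

def ingest(source):
    # A real pipeline would read source["path"], apply source["mapping"]
    # via the shared template transformation, then load the result.
    # Here we just report which job the metadata would drive.
    return f"ingest {source['name']} using mapping {source['mapping']}"

jobs = [ingest(s) for s in sources]
for job in jobs:
    print(job)
```

One template transformation plus a table of metadata replaces a hand-coded pipeline per source, which is where the repetitive, risk-prone manual work is removed.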

NOTE: Typically, data onboarding is a highly repetitive, manual and risk-prone process that creates a bottleneck in the data pipeline.

In addition to the new features in 6.1, Pentaho has also introduced a new self-service data onboarding blueprint. This architected process is meant to allow business users to onboard a variety of data themselves — without IT assistance — streamlining the data ingestion process.

“In this latest release, Pentaho streamlines the hand-offs between the different stages of the analytic data pipeline, including onboarding, engineering, preparing, and analysing data,” claims Donna Prlich, senior vice president of product marketing and solutions, Pentaho, a Hitachi Group Company.

Pentaho says that 6.1 also adds enhancements to its data integration and analytics platform to help data pipelines accommodate greater volume, variety, and complexity of data.