Using Sqoop to ploop to Hadoop


Syncsort has enhanced its DMX-h Hadoop ETL software.

So what?

Extract, Transform, Load (ETL) refers to three separate functions combined into a single programming tool: data is extracted from source systems, transformed into the shape the target needs, and loaded into the target store.

Getting data from enterprise data warehouses and legacy systems (including mainframes) into Hadoop is clearly a key use case for big data ETL jobs today.


Syncsort is also addressing growing offload demand by supporting the Apache Sqoop project.

Apache Sqoop is a tool designed for transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
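To make that concrete, here is a minimal sketch of what a basic Sqoop table import looks like when driven from Python. The helper name `build_sqoop_import`, the JDBC URL, the table, the user and the HDFS path are all placeholders of ours rather than anything from Syncsort's announcement; the flags are the standard ones for Sqoop's `import` tool. The sketch only builds and prints the command, it does not run it.

```python
import shlex


def build_sqoop_import(jdbc_url, table, target_dir, username, mappers=4):
    """Return the argument list for a basic Sqoop table import.

    All values passed in are illustrative placeholders.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,            # JDBC URL of the source database
        "--username", username,
        "-P",                             # prompt for the password at run time
        "--table", table,                 # source table to copy
        "--target-dir", target_dir,       # HDFS directory for the output files
        "--num-mappers", str(mappers),    # number of parallel map tasks
    ]


# Hypothetical example values, not taken from the article.
cmd = build_sqoop_import(
    "jdbc:mysql://db.example.com/sales",
    "orders",
    "/data/offload/orders",
    "etl_user",
)
print(shlex.join(cmd))
```

On a machine with Sqoop installed, the list could be handed to `subprocess.run(cmd, check=True)`; passing the arguments as a list avoids shell quoting problems.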

"Many of our customers are looking to free up data warehouse capacity and reduce legacy system costs by offloading expensive data workloads and data to Apache Hadoop," said Lonne Jaffe, CEO of Syncsort.

The company also contributes to the Project SILQ Technology Preview.


SILQ is a "data warehouse offload technology" that analyses SQL scripts and provides a detailed, graphical visualisation of the entire data flow, along with best practices on how to develop the corresponding DMX-h jobs in Hadoop.

A final note: the company is also focused on Tableau integration. Syncsort's Hadoop ETL now allows users to create Tableau data extracts that blend data from a wide variety of sources, including data warehouses, mainframes and other legacy systems, facilitating advanced analytics and visualisation in Tableau.

NOTE: our story title here should really be: Using Sqoop to perform ETL data warehouse offload technology functions to Hadoop, but we wanted to coin ploop as a new shorthand, so go figure.

"In addition to the product enhancements, Syncsort continues to actively invest in the Apache Hadoop open source community including new open source initiatives that help simplify and accelerate offload, and enhance performance and efficiency of the workloads in Hadoop. Syncsort's new initiative extends Sqoop, a framework to move data between relational databases and Hadoop," said the company, in a prepared press statement.

Syncsort is contributing to the Sqoop project, as open source, the ability to move multiple mainframe data sets to Hadoop in parallel and store them in Sqoop-supported file formats.

The contribution also opens up the interface so that anyone can extend support to more complex mainframe data files. The upcoming release of DMX-h uses this interface, providing a plug-in to move all mainframe data formats, including binary sequential data with COBOL copybook metadata and VSAM, to Hadoop.
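As a rough illustration of how the mainframe side might look in practice, the sketch below builds one command per mainframe data set. It assumes a Sqoop release that ships the `import-mainframe` tool (which, as far as we know, is where this contribution surfaced, in releases after this post was written). The helper name, host, data set names, user and paths are hypothetical placeholders, and nothing here represents the DMX-h plug-in.

```python
import shlex


def build_mainframe_imports(host, datasets, base_dir, username, mappers=4):
    """Return one Sqoop import-mainframe argument list per data set.

    Assumes a Sqoop release that includes the import-mainframe tool.
    All names and paths are illustrative placeholders.
    """
    commands = []
    for dataset in datasets:
        commands.append([
            "sqoop", "import-mainframe",
            "--connect", host,                # mainframe host name
            "--dataset", dataset,             # data set to import
            "--username", username,
            "-P",                             # prompt for the password
            "--target-dir", f"{base_dir}/{dataset.lower()}",
            "--num-mappers", str(mappers),    # parallel map tasks
        ])
    return commands


# Hypothetical host and data set names.
jobs = build_mainframe_imports(
    "mainframe.example.com",
    ["SALES.ORDERS", "SALES.CUSTOMERS"],
    "/data/mainframe",
    "etl_user",
)
for job in jobs:
    print(shlex.join(job))
```

Each data set gets its own HDFS target directory so the imports do not overwrite one another.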

About this Entry

This page contains a single entry by Adrian Bridgwater published on June 5, 2014 8:21 AM.
