Hadoop 2

Apache Hadoop 2 is the second major release of the Hadoop framework for distributed data processing. Hadoop 2 adds support for running non-batch applications, which the new YARN resource manager makes possible, as well as features that improve system availability, such as NameNode high availability.


SQL-on-Hadoop

SQL-on-Hadoop is a class of analytical application tools that combine established SQL-style querying with newer Hadoop data framework elements.

Hadoop as a service (HaaS)

Hadoop as a service (HaaS), also known as Hadoop in the cloud, is a big data analytics framework that stores and analyzes data in the cloud using Hadoop.

Hadoop data lake

A Hadoop data lake is a data management platform comprising one or more Hadoop clusters.

Hadoop

Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications running in clustered systems.

Apache Hadoop YARN

Apache Hadoop YARN is the resource management and job scheduling technology in the open source Hadoop distributed processing framework.

Hadoop cluster

A Hadoop cluster is a special type of computational cluster designed specifically for storing and analyzing huge amounts of unstructured data in a distributed computing environment.

Hadoop Distributed File System (HDFS)

The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications.
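HDFS stores each file as a sequence of fixed-size blocks and keeps several copies of each block on different DataNodes. The minimal Python sketch below shows how a file could be cut into blocks and assigned replicas. It assumes the Hadoop 2.x defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication). The function name, the DataNode names and the round-robin placement are hypothetical simplifications: real HDFS places replicas with a rack-aware policy.

```python
from dataclasses import dataclass
from typing import List

# Assumed Hadoop 2.x defaults; both values are configurable in a real cluster.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB
REPLICATION = 3


@dataclass
class Block:
    index: int
    offset: int          # byte offset of the block within the file
    length: int          # the last block may be shorter than BLOCK_SIZE
    replicas: List[str]  # DataNodes holding a copy of this block


def split_into_blocks(file_size: int, datanodes: List[str],
                      block_size: int = BLOCK_SIZE,
                      replication: int = REPLICATION) -> List[Block]:
    """Split a file into fixed-size blocks and give each block
    `replication` distinct DataNodes.

    Placement is a hypothetical round-robin over `datanodes`,
    not HDFS's rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the replication factor")
    blocks: List[Block] = []
    offset, index = 0, 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        # Consecutive nodes (wrapping around) are always distinct
        # because replication <= len(datanodes).
        replicas = [datanodes[(index + r) % len(datanodes)]
                    for r in range(replication)]
        blocks.append(Block(index, offset, length, replicas))
        offset += length
        index += 1
    return blocks


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
    # A 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
    for b in split_into_blocks(300 * 1024 * 1024, nodes):
        print(b.index, b.length // (1024 * 1024), "MB", b.replicas)
```

Because every block lives on several nodes, losing a single DataNode does not lose data. The same block layout also lets processing jobs read different blocks of one file in parallel.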


MapReduce

MapReduce is a core component of the Apache Hadoop software framework.
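A MapReduce job runs in three stages. A map step turns input records into key/value pairs. A shuffle step groups the pairs by key. A reduce step combines each group into a result. The classic example is counting words. The sketch below is a single-process Python simulation of those stages, written only to illustrate the programming model. It does not use Hadoop's Java Mapper/Reducer API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group intermediate values by key, with keys in sorted order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on one machine."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "The lazy dog", "the fox"]
    print(word_count(docs))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

On a real cluster, many map tasks run in parallel on different input splits, and the shuffle moves each key's values across the network to the reduce task responsible for that key.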

Apache Parquet

Apache Parquet is a column-oriented storage format for Hadoop.
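A column-oriented format stores all values of one column together instead of storing each record's fields together. A query that needs only a few columns can therefore skip the rest, and runs of similar values compress well. The Python sketch below illustrates that idea only. It pivots row records into columns and applies a simple run-length encoding, one of the kinds of encoding Parquet uses. It does not reproduce Parquet's actual file layout (row groups, column chunks, pages, metadata). The function names and sample data are hypothetical, and the sketch assumes every row has the same fields.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def to_columns(rows: List[Row]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column.

    Assumes all rows share the schema (field names) of the first row.
    """
    if not rows:
        return {}
    names = list(rows[0])
    return {name: [row[name] for row in rows] for name in names}


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    rows = [
        {"country": "US", "city": "Austin", "sales": 10},
        {"country": "US", "city": "Boston", "sales": 7},
        {"country": "US", "city": "Denver", "sales": 3},
        {"country": "DE", "city": "Berlin", "sales": 5},
    ]
    cols = to_columns(rows)
    # Total sales needs only the "sales" column, not every field of every row.
    print(sum(cols["sales"]))                   # 25
    print(run_length_encode(cols["country"]))   # [('US', 3), ('DE', 1)]
```

In a row-oriented layout, the same total-sales query would have to read the country and city fields of every record as well.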

