98 Results for: hadoop

SAP HANA Vora

SAP HANA Vora is an in-memory computing engine designed to make big data from Hadoop more accessible and usable for enterprises. It takes data stored in Hadoop and integrates it with data from enterprise systems, ...

Apache HBase

Apache HBase is a column-oriented key/value data store built to run on top of the Hadoop Distributed File System (HDFS).
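HBase's data model is often described as a sparse, sorted map. A row key points to a set of "family:qualifier" columns, and each column holds a value. The sketch below models that idea with plain Python dictionaries. It is only a conceptual illustration and not the HBase client API. The class and method names (`ToyTable`, `put`, `get`, `scan`) are hypothetical.

```python
from typing import Dict, Iterator, Optional, Set, Tuple


class ToyTable:
    """Hypothetical in-memory model of a key/value table with column families.

    Each row key maps to a sparse dict of "family:qualifier" -> value,
    so a row only stores the columns it actually has.
    """

    def __init__(self, families: Set[str]):
        self.families = families
        self.rows: Dict[bytes, Dict[str, bytes]] = {}

    def put(self, row: bytes, family: str, qualifier: str, value: bytes) -> None:
        # Column families are fixed up front; qualifiers can be added freely.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {})[f"{family}:{qualifier}"] = value

    def get(self, row: bytes, family: str, qualifier: str) -> Optional[bytes]:
        # Missing rows or columns simply return None (sparse storage).
        return self.rows.get(row, {}).get(f"{family}:{qualifier}")

    def scan(self, start: bytes, stop: bytes) -> Iterator[Tuple[bytes, Dict[str, bytes]]]:
        # Yield rows in sorted row-key order: start inclusive, stop exclusive.
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, self.rows[key]


table = ToyTable({"info"})
table.put(b"user#001", "info", "name", b"Ada")
table.put(b"user#002", "info", "name", b"Linus")
print(table.get(b"user#001", "info", "name"))
print([key for key, _ in table.scan(b"user#000", b"user#002")])
```

Keeping rows sorted by key is what makes range scans over row-key prefixes cheap. This is why row-key design matters so much in HBase-style stores.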

Apache Hive

Apache Hive is an open source data warehouse system for querying and analyzing large data sets that are principally stored in Hadoop files.

Apache Pig

Apache Pig is an open-source technology that offers a high-level mechanism for parallel programming of MapReduce jobs to be executed on Hadoop clusters.
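Pig scripts are compiled into MapReduce jobs. The single-process word count below gives a rough picture of the map, shuffle and reduce phases that such a job runs. It is a sketch of the MapReduce model only, not Pig Latin or the Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word, lowercased.
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Combine all values for each key into a single total.
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)))


print(word_count(["big data", "Big clusters"]))
```

On a cluster, map and reduce tasks run in parallel across nodes. A high-level tool such as Pig exists so that users do not have to write these phases by hand.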

Apache Falcon

Apache Falcon is a data management tool for overseeing data pipelines in Hadoop clusters, with a goal of ensuring consistent and dependable performance on complex processing jobs.

SequenceFile

A SequenceFile is a flat, binary file type that serves as a container for data to be used in Hadoop distributed compute projects. SequenceFiles are used extensively with MapReduce.
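To make the idea of a "flat, binary container of key/value records" concrete, the sketch below writes and reads length-prefixed key/value pairs with Python's `struct` module. This is a simplified, hypothetical format for illustration only. A real SequenceFile also carries a header (a `SEQ` magic, key/value class names and compression settings) and periodic sync markers, and its record layout differs. In practice, SequenceFiles are written with Hadoop's own `SequenceFile.Writer`.

```python
import io
import struct
from typing import BinaryIO, Iterator, List, Tuple


def write_records(out: BinaryIO, records: List[Tuple[bytes, bytes]]) -> None:
    # Hypothetical record layout: 4-byte big-endian key length,
    # 4-byte big-endian value length, then the key bytes and value bytes.
    for key, value in records:
        out.write(struct.pack(">II", len(key), len(value)))
        out.write(key)
        out.write(value)


def read_records(inp: BinaryIO) -> Iterator[Tuple[bytes, bytes]]:
    # Read records sequentially until end of file.
    while True:
        header = inp.read(8)
        if not header:
            return  # clean end of file
        if len(header) < 8:
            raise ValueError("truncated record header")
        key_len, value_len = struct.unpack(">II", header)
        key = inp.read(key_len)
        value = inp.read(value_len)
        if len(key) != key_len or len(value) != value_len:
            raise ValueError("truncated record body")
        yield key, value


# Round-trip two records through an in-memory buffer.
buf = io.BytesIO()
write_records(buf, [(b"k1", b"v1"), (b"k2", b"hello")])
buf.seek(0)
print(list(read_records(buf)))
```

Because every record is self-delimiting, a reader can stream through the file without loading it whole. Hadoop's sync markers extend this idea so that a large file can be split and read in parallel.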

VMware vSphere Big Data Extensions (BDE)

VMware vSphere Big Data Extensions (BDE) is a virtual appliance that enables administrators to deploy and manage Hadoop clusters for big data analytics in the vSphere virtual infrastructure.

Avro (Apache Avro)

Apache Avro is a row-oriented object container storage format for Hadoop, as well as a remote procedure call (RPC) and data serialization framework.
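"Row-oriented" means that all fields of one record are stored together. Columnar formats instead store every value of one field together. The plain-Python sketch below contrasts the two layouts for the same records. It is not Avro's binary encoding and uses no Avro library. The helper names are hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def row_layout(records: List[Record], fields: List[str]) -> List[Any]:
    # Row-oriented: all fields of record 0, then all fields of record 1, ...
    return [rec[f] for rec in records for f in fields]


def column_layout(records: List[Record], fields: List[str]) -> List[Any]:
    # Column-oriented: every value of field 0, then every value of field 1, ...
    return [rec[f] for f in fields for rec in records]


records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
fields = ["id", "name"]
print(row_layout(records, fields))
print(column_layout(records, fields))
```

A row layout suits workloads that write or read whole records, such as serializing messages. A columnar layout suits analytic scans over a few fields. Avro also stores the writer's schema in the container file header so that readers can decode the records.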

enterprise data hub

An enterprise data hub, also referred to as a data lake, is a big data management model that uses Hadoop as its central data repository.

Google Cloud Dataproc

Google Cloud Dataproc is a managed service within the Google Cloud Platform for processing large datasets, such as those used in big data initiatives. Dataproc is built on open source platforms including Apache ...