A new open source big data framework

MapR and Mesosphere have announced Myriad, a new open source big data framework that allows Apache YARN jobs to run alongside other applications and services in enterprise and cloud datacentres.

What is Apache YARN?

Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology that belongs to the 'second generation' of Hadoop. YARN has also been described as a large-scale, enterprise-level distributed operating system for big data applications.

The initiative now backed by MapR and Mesosphere was started by a developer at eBay. It has since grown into a collaborative effort between multiple companies, and the Myriad project now unifies Apache YARN and Apache Mesos resource management.

NOTE: Apache Mesos is a distributed systems kernel that abstracts CPU, memory, storage and other compute resources, allowing developers to program against the datacentre as if it were a single pool of resources.

Mesosphere is the creator of the Mesosphere Datacenter Operating System (DCOS), which manages datacentre and cloud resources.

MapR Technologies, Inc. provides a highly ranked distribution of Apache Hadoop.

A single pool of resources

Myriad is available on GitHub. It is an open source project whose goal is to consolidate big data and other datacentre workloads into a single pool of resources, improving utilisation and operational efficiency.

The companies also plan to submit Myriad to the Apache Software Foundation as an Apache Incubator project in the first quarter of 2015.

Where Hadoop is hard work

According to the firms, Hadoop developers have until now been "forced to run" big data jobs on dedicated clusters. Those resources are therefore isolated from other applications and services in production, which the firms say typically results in poor server utilisation rates.

How Myriad works

Myriad combines Apache YARN and Apache Mesos, allowing big data workloads to run alongside other applications. These include long-running Web services, streaming applications (such as Storm), build systems, continuous integration tools (such as Jenkins), HPC jobs (such as MPI applications), Docker containers, and custom scripts and applications.

“Big data developers no longer have to choose between YARN and Mesos for managing clusters,” said Florian Leibert, CEO and co-founder of Mesosphere.

“Myriad allows you to run both, and to run all of your big data workloads and distributed applications and systems on a single pool of resources. Big data developers get the best of YARN’s power for Hadoop-driven workloads, and Mesos’ ability to run any other kind of workload, including non-Hadoop applications like Web applications and other long-running services.”

“Myriad enables businesses to tear down the walls between isolated clusters just as Hadoop enables businesses to tear down the walls between data silos,” said Jim Scott, director, enterprise strategy and architecture, MapR Technologies. “Developers can now focus on the data and applications which the business depends on, while IT operations can manage compute resources to maximise business agility and minimise operating expenses.”