Apache Twill: real abstraction is a decoupled algorithm

Cloud computing is a ‘decoupled’ thing.


To be clearer, the term ‘decoupling’ arises time and time again in relation to the cloud computing model of service-based processing and storage power.

Two senses of mobile

Decoupling is a good emotive term that transcends pre-cloud notions of mere networking to give us a new notion of a computing layer where applications and the resources they depend on can be set free for a more mobile (in the interchangeable sense AND in the smartphone sense) existence.

But this is superficial decoupling (actually it’s not, but we’re making a point here… so go with it for now). Deeper decoupling occurs when we start to look down into the substrate.

Deeper decoupling involves disconnecting individual management layers, computing platforms and processing engines from their core algorithmic kin.

Apache Twill is an abstraction layer that sits over Apache Hadoop YARN (the clustering and resource manager) that reduces the complexity of developing distributed applications — it does this by decoupling Hadoop itself from the MapReduce algorithm.

This action is designed to allow developers to focus more on their application logic.

Hadoop is then decoupled so that it can run with other processing engines, such as Spark.

It’s like threads

The Apache Twill project team explains that this technology allows programmers to use YARN’s distributed capabilities with a programming model that is similar to running ‘threads’ i.e. separated-out dependent streams of logic that can exist on their own.
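To make the ‘threads’ comparison a little more concrete, here is a minimal sketch in plain Java. It does not use Twill’s own API; the names Launcher, LocalLauncher, Greeter and runGreeters are hypothetical and exist only for illustration. The idea it tries to show is that the application logic is written as an ordinary Runnable, while the question of where it runs is hidden behind a small launcher interface. Here the launcher is a local thread pool, and the assumption is that a cluster-backed launcher could sit behind the same interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadStyleSketch {

    // Hypothetical abstraction (not part of Twill): the caller hands over a
    // Runnable and does not care where or how it is executed.
    interface Launcher {
        void launch(Runnable task, int instances) throws InterruptedException;
    }

    // Local implementation backed by a thread pool. Assumption: a launcher
    // that talks to a cluster could implement the same interface without
    // the application logic changing.
    static class LocalLauncher implements Launcher {
        @Override
        public void launch(Runnable task, int instances) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(instances);
            for (int i = 0; i < instances; i++) {
                pool.submit(task);
            }
            pool.shutdown();
            // Wait for every instance to finish before returning.
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        }
    }

    // The application logic: just a Runnable, with no knowledge of the launcher.
    static class Greeter implements Runnable {
        private final List<String> sink;

        Greeter(List<String> sink) {
            this.sink = sink;
        }

        @Override
        public void run() {
            sink.add("hello from " + Thread.currentThread().getName());
        }
    }

    // Runs the given number of Greeter instances and collects their messages.
    static List<String> runGreeters(Launcher launcher, int instances) throws InterruptedException {
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        launcher.launch(new Greeter(out), instances);
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> messages = runGreeters(new LocalLauncher(), 3);
        messages.forEach(System.out::println);
    }
}
```

Running main prints one greeting per instance. The point of the design is that Greeter never changes if a different Launcher is swapped in, which is the kind of separation the thread analogy is reaching for.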

While YARN is an extremely complex technology, Twill aims to make it easier to pick up programmatically.

According to the development team, Apache Twill dramatically simplifies and reduces development efforts, enabling you to quickly and easily develop and manage distributed applications through its simple abstraction layer on top of YARN.

It’s like this: distributed is good, decoupled distributed is really good, but abstracted decoupled distributed is even better.