Is Hadoop hard?

The problem with Hadoop, it is said, is that real-world deployment of this open source framework for big data is tough, complex and not exactly out-of-the-box simple.

Apache Hadoop describes itself as:

“The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.”


That was “simple” programming models, OK?


But – and it is one of those really big buts…

…on stage last year at the Red Hat Summit keynote in Boston, one of the plenary session speakers (let’s call him Bradley or Zack for the sake of argument, I forget who it was) asked how many of the open source focused developer/programmer attendees had witnessed full-scale Hadoop big data implementations in their business.

About 10 percent of hands, or fewer, went up.

“Well then,” said Zack or Bradley. “You must be the really smart guys, as this stuff ain’t easy,” he enthused ebulliently.

Some good clarity here was provided recently by Forbes when speaking to Mike Driscoll, CEO of Metamarkets, a firm that focusses on real-time digital marketing analytics.

Driscoll’s evaluation of Hadoop in 2012 is that the platform does indeed sport great data crunching power, but that it provides only “a fraction of” what might represent a complete application.

Driscoll is quoted explaining that Hadoop excels at batch processing of large-scale, unstructured data, such as web server logs.
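For readers wondering what batch processing of web server logs looks like under those “simple programming models”, here is a minimal sketch of the map, shuffle and reduce pattern in plain Python. It does not use Hadoop or any Hadoop API; the sample log lines, the function names and the choice to count HTTP status codes are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample lines in a typical web server log layout
# (invented for illustration, not taken from any real system).
SAMPLE_LOG = [
    '10.0.0.1 - - [28/Oct/2013:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [28/Oct/2013:10:00:02 +0000] "GET /missing HTTP/1.1" 404 512',
    '10.0.0.1 - - [28/Oct/2013:10:00:03 +0000] "POST /login HTTP/1.1" 200 87',
    'not a valid log line',
]


def map_line(line):
    """Map step: emit a (status_code, 1) pair for each parseable line."""
    parts = line.split('"')
    if len(parts) < 3:
        return  # unstructured data is messy, so skip lines we cannot parse
    fields = parts[2].split()
    if fields and fields[0].isdigit():
        yield fields[0], 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: collapse all values for one key into a total."""
    return key, sum(values)


def count_status_codes(lines):
    """Run the three steps in sequence on an in-memory batch of lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
```

Calling count_status_codes(SAMPLE_LOG) tallies the status codes found in the sample lines. On a real cluster the same three steps would be spread across many machines, and that distribution, not the logic itself, is where the difficulty tends to start.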

What is Hadoop really?

“[But] Hadoop is a foundational technology, but it is not a database, it is not an analytics environment, and it is not a visualisation tool. By itself, it is not a solution for helping businesses make better decisions,” he said.

Since 2012, things don’t appear to have gotten that much easier.

The Strata Conference + Hadoop World 2013 closes up this week in New York, so did this provide any clarity?

MapR Technologies used the event to announce native security authentication and authorisation with the MapR Distribution for Hadoop.

Could the included security functionality, by helping businesses meet stringent security requirements and regulations more easily, make Hadoop adoption easier?

NOTE: MapR’s security protects against user impersonation, rogue daemons and malicious remote procedure calls.

“Hadoop is an enterprise solution, making security a critical component,” said Ben Woo, principal analyst, Neuralytix.

“Very few Hadoop clusters today meet enterprise-grade security requirements. With MapR’s innovations businesses can meet stringent security requirements and regulations easily with security functionality that comes out-of-the-box with the MapR Distribution for Hadoop.”

Was he being paid to say that? We think not.

All operations on the MapR Distribution for Hadoop are secured natively, including file reads and writes, HBase operations and MapReduce job submissions. Intra-cluster node-to-node interactions, including remote procedure calls, are secured natively too.

Security is not the only step forward needed to make Hadoop more easily deployable, but it’s a good stepping stone and will be a big bridge for some.

This is of course not the first security option to exist for Hadoop, but it is another move towards reducing complexity.