Hadoop is not the only fruit

It is true: Hadoop is a key focal point for many of us when we talk about big data and, indeed, open source big data projects.


However, it’s important to think outside the Hadoop box for a number of reasons.

Outside the Hadoop loop

By its very nature Hadoop is open source, so many of its developers and other contributors will naturally revel in the openness of the entire open code surface and work on other projects as well. These ‘tangential’ projects (many of them substantial) are typically complementary to Hadoop.

Where projects in fact compete with Hadoop, that’s also a good thing as it keeps the overall drive for efficiency and functional excellence as sharp as it should be.

Why open source is so good

We might suggest that there is a core reason why open source is so well suited to big data. That is to say, if we accept that Hadoop is hard and that the actual implementation of big data analytics is still in its relative infancy, then we can see how the open customisability of open software structures could be better suited to big data projects as they now grow.


Looking outside the Hadooposphere, the Enterprise Apps Today website brings together a much-needed selection pack cum Obligatory List Article of some of the other open source big data tools out there.

Lumify is an open source data integration, analytics, and visualisation platform built to help you understand the world of data.

Lumify’s features include the ability to analyze relationships and automatically discover paths between entities. It can also overlay data as layers on a map for a geographical view of the data model.
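To give a feel for what "discovering paths between entities" can mean in practice, here is a minimal sketch in Python. It is not Lumify's code or API. It assumes the entities and their relationships have already been reduced to simple pairs, and it uses a plain breadth-first search to find the shortest chain linking two of them. The function name `find_path` and the sample data are hypothetical, chosen purely for illustration.

```python
from collections import deque


def find_path(edges, start, goal):
    """Return the shortest chain of entities linking start to goal, or None."""
    # Build an undirected adjacency map from (entity, entity) pairs.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    if start not in neighbours or goal not in neighbours:
        return None

    # Breadth-first search; `previous` doubles as the visited set.
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk back through `previous` to rebuild the path.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in sorted(neighbours[current]):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None


# Hypothetical sample relationships, invented for this example only.
links = [
    ("Alice", "Acme Ltd"),
    ("Acme Ltd", "Bob"),
    ("Bob", "London"),
    ("Carol", "Paris"),
]

print(find_path(links, "Alice", "London"))
# → ['Alice', 'Acme Ltd', 'Bob', 'London']
```

A real platform would work against a graph store with typed, directed relationships, but the underlying idea of walking outward from one entity until another is reached is the same.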

Talend Open Studio for Big Data provides simple graphical tools and wizards to generate native code that helps you leverage the full power of Hadoop.

HPCC Systems Big Data, as detailed at the above link, “is a platform for manipulating, transforming, querying and data warehousing your Big Data and is an alternative to Hadoop. It uses the Thor data refinery, Roxie data query/delivery engine and Enterprise Control Language (ECL) as an alternative to Apache Pig. (ECL is claimed to be 4.45 times faster than Pig on average.)”

You can read Paul Rubens’ piece at the above link for more clarification on the other tools available in this space.

Image credit: http://isifol.com/