Raymie Stata on Hadoop's first decade

Raymie Stata, the chief executive and co-founder of Altiscale, and a former chief technology officer at Yahoo, had some reflections on the first ten years of Hadoop when he spoke to me on the phone recently, while driving to work in Palo Alto.

Raymie was an early champion of Hadoop. Indeed, at Yahoo, he engaged the framework’s founders Doug Cutting and Mike Cafarella as consultants in 2004, and employed them in 2005. He knew Cutting wanted to re-implement his Nutch web crawler project using the MapReduce approach that had been developed at Google.

I asked Raymie if it struck him as odd that, though a decade old, Hadoop has not been more widely implemented in enterprises. He agreed that it was still early days for the business value of Hadoop, but made the interesting point that the future value of Hadoop is already here and is just unevenly distributed, invoking the famous line from William Gibson (he of the cyberpunk novel Neuromancer) about the future itself.

He argued that the techniques that have clustered, and are still clustering, around Hadoop draw heavily on the collective wisdom of the two decades or so of the history of the (web era) internet. Problems that the internet companies confronted some ten to twenty years ago are now facing large, conventional corporate organisations, he said, chiefly with respect to the instrumentation of their worlds. “The blending of cyberspace and the real world was faced up to by the internet companies some decades ago”, he said, “and other industries are facing it now”.

It is, he said, that “impact of big data in the instrumentation of your world outside your firewall” that constitutes the main significance of Hadoop, ten years on.

“It is all a means to an end, things like YARN and Spark coming along. What are people going to do with them? Take drug efficacy, for example. That differs tremendously from person to person, and being able to use genomic information to solve that problem is revolutionary”.

He gave other examples, such as supply chain optimisation going to a new level, revenue optimisation in hotels, and qualifying people for credit using their digital data exhaust. But it was the medicine example that he was most passionate about.