
Executive interview: Google's big data journey

A programming concept developed in 1958 inspired the seminal Google white paper that introduced the world to MapReduce

Cliff Saran, Managing Editor

Published: 19 May 2015 4:30

The map and reduce functions of Lisp, a programming language developed way back in 1958, were the inspiration behind the seminal Google white paper that in 2004 introduced the world to MapReduce, an early big data initiative.

With MapReduce, Google set out to address a problem it had identified in the way it processed internet search data. In essence, MapReduce splits a big data processing job into many small tasks that can run in parallel across clusters of low-cost commodity hardware, an approach that the open source Hadoop framework later implemented.
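
The map, shuffle and reduce steps can be illustrated with the classic word-count example. This is a minimal single-process sketch, not Google's implementation or the Hadoop API: the function names map_words, shuffle, reduce_counts and word_count are hypothetical, and the map and reduce calls run one after another on one machine rather than in parallel across a cluster.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_words(document: str) -> Iterator[tuple[str, int]]:
    """Map phase (hypothetical helper): emit a (word, 1) pair for every word."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: group every intermediate value under its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce phase: combine all the values emitted for one key."""
    return word, sum(counts)


def word_count(documents: list[str]) -> dict[str, int]:
    # On a cluster, each document (input split) would be mapped on a different
    # machine; in this sketch the map calls simply run sequentially.
    intermediate = chain.from_iterable(map_words(doc) for doc in documents)
    grouped = shuffle(intermediate)
    # Each key could likewise be reduced on a different machine.
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # → {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Because each map call depends only on its own input and each reduce call only on one key's values, the framework is free to spread both phases over as many cheap machines as are available.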

The search engine company has now extended its data processing strategy and recently introduced Cloud BigTable, a fully managed, scalable NoSQL database service.

Internet search, social media and the internet of things are some of the IT areas experiencing huge data growth.

Indeed, experts predict that traditional relational databases will be unable to cope with the tsunami of data that a truly digital society will generate.

In that MapReduce white paper over a decade ago, Jeffrey Dean and Sanjay Ghemawat from Google described how there was no common infrastructure on which heterogeneous jobs could be scheduled and processed. Everything had to be hand-written for specific environments and architectures.

Since the publication of that white paper, the internet search giant has moved on to version 3 of its big data vision, says Cory O’Connor, product manager for Cloud BigTable: “2002 to 2004 was the big bang of big data; this was when Google wrote its white papers on MapReduce.”

Storage costs

He adds: “Data is growing. The market data today will require 10 times the amount of computing and 10 times the amount of storage. At some point you cannot build bigger, you have to adopt the paradigm of commodity hardware and scaling horizontally. This was the premise behind NoSQL, which is able to scale out very effectively.”

Google fundamentally rethought the practice of building bigger machines to solve these problems. We only build using commodity machines and we assume systems will fail.

Cory O’Connor, product manager for Cloud BigTable

Given the cost of storing greater and greater amounts of data and the way it is deployed, he says, “it looks like it won’t be economical to maintain storage using traditional procurement”.

The question for large enterprises is whether investing in something like 1PB of enterprise storage delivers any more reliability than 1,000 commodity 1TB discs.
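
The arithmetic behind "assume systems will fail" can be sketched with a deliberately simplified model. The 2% annual failure rate, one-day repair window and replica counts below are hypothetical assumptions, not figures from Google or from this article, and the model treats disc failures as independent, which real data centres cannot guarantee. Under those assumptions a fleet of 1,000 commodity discs should expect failures every year, yet keeping several copies of each piece of data makes losing every copy within one repair window far less likely than losing any single disc.

```python
from __future__ import annotations

# Hypothetical inputs: none of these figures come from the article.
DISCS = 1_000               # 1,000 x 1TB commodity discs = 1PB raw capacity
ANNUAL_FAILURE_RATE = 0.02  # assumed 2% chance a given disc fails in a year
REPAIR_WINDOW_DAYS = 1.0    # assumed time to re-copy data off a failed disc


def expected_failures_per_year(discs: int, afr: float) -> float:
    """Average number of discs expected to fail each year."""
    return discs * afr


def prob_any_failure(discs: int, afr: float) -> float:
    """Chance that at least one disc fails in a year, assuming independence."""
    return 1.0 - (1.0 - afr) ** discs


def prob_window_failure(afr: float, window_days: float) -> float:
    """Chance that one disc fails inside a repair window of the given length."""
    return 1.0 - (1.0 - afr) ** (window_days / 365.0)


def prob_all_replicas_lost(afr: float, window_days: float, replicas: int) -> float:
    """Chance that every replica of one chunk fails within the same window,
    assuming each replica sits on a disc that fails independently."""
    return prob_window_failure(afr, window_days) ** replicas


if __name__ == "__main__":
    print(f"expected failures/year: "
          f"{expected_failures_per_year(DISCS, ANNUAL_FAILURE_RATE):.0f}")
    print(f"P(at least one failure in a year): "
          f"{prob_any_failure(DISCS, ANNUAL_FAILURE_RATE):.6f}")
    for r in (1, 2, 3):
        p = prob_all_replicas_lost(ANNUAL_FAILURE_RATE, REPAIR_WINDOW_DAYS, r)
        print(f"replicas={r}: P(all copies lost in one window) = {p:.2e}")
```

Correlated failures (a shared power supply or rack, for example) break the independence assumption, which is why replicas are normally placed on separate machines and racks rather than just separate discs.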

But Google, which started almost 20 years ago, arguably runs one of the world’s biggest databases, and it is all based on commodity storage. The technologies it uses internally are now available as external cloud services, such as BigQuery for scalable analytics, the Cloud Pub/Sub messaging service for streaming data pipelines, Cloud DataFlow for stream processing and now Cloud BigTable.

O’Connor says: “All scale, all are fully managed, and all are world class and version 3, from when the white papers were released.”

Lowering the technical barriers

With Cloud BigTable, Google is attempting to lower the education barrier by building in many of the services Hadoop users previously had to develop themselves. Going after existing Hadoop users is the low-hanging fruit.

What about enterprises running relational database applications?

O’Connor says: “BigTable is a big step for people who have not experienced NoSQL.” He expects such organisations will need to re-architect their applications, and advises them to start building new projects on Cloud BigTable.
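
Part of that re-architecting is moving from tables joined by SQL to the Bigtable data model, which Google's 2006 Bigtable paper describes as a sparse, sorted map indexed by row key, column key and timestamp. The sketch below is a toy in-memory model of that idea only; the class ToyBigtable, its methods and the customer/order row keys are hypothetical and are not the Cloud BigTable client API. It shows how a row-key design that prefixes each order with its customer ID lets a single range scan fetch one customer's orders, the kind of query a relational application would express as a filter or join.

```python
from __future__ import annotations

import bisect
from collections import defaultdict


class ToyBigtable:
    """Hypothetical in-memory model of a sparse, sorted map:
    (row key, "family:qualifier" column, timestamp) -> bytes."""

    def __init__(self) -> None:
        # row key -> column -> list of (timestamp, value), newest first
        self._rows: dict[str, dict[str, list[tuple[int, bytes]]]] = defaultdict(
            lambda: defaultdict(list)
        )
        self._sorted_keys: list[str] = []  # row keys in lexicographic order

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        """Write one versioned cell; older versions are kept, not overwritten."""
        if row not in self._rows:  # membership test does not create a row
            bisect.insort(self._sorted_keys, row)
        cells = self._rows[row][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)  # newest version first

    def get(self, row: str, column: str) -> bytes | None:
        """Return the latest version of a cell, or None if it does not exist."""
        columns = self._rows.get(row)  # .get avoids creating an empty row
        if columns is None:
            return None
        cells = columns.get(column)
        return cells[0][1] if cells else None

    def scan(self, start: str, end: str) -> list[str]:
        """Row keys in the half-open range [start, end), in sorted order."""
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, end)
        return self._sorted_keys[lo:hi]


if __name__ == "__main__":
    table = ToyBigtable()
    # Hypothetical row-key design: customer ID first keeps a customer's
    # orders adjacent in sort order, so one range scan finds them all.
    table.put("cust0001#order0001", "order:total", 1, b"19.99")
    table.put("cust0001#order0002", "order:total", 1, b"5.00")
    table.put("cust0002#order0001", "order:total", 1, b"42.00")
    table.put("cust0001#order0001", "order:total", 2, b"17.99")  # newer version
    print(table.scan("cust0001#", "cust0001$"))
    # → ['cust0001#order0001', 'cust0001#order0002']
    print(table.get("cust0001#order0001", "order:total"))
    # → b'17.99'
```

The scan works because "$" sorts immediately after "#", so every key beginning with "cust0001#" falls inside the range. In this model, choosing the row key is the main schema decision, which is why existing relational applications generally need redesigning rather than porting.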
