CW500: Everything you need to know about big data in 15 minutes

Harvey Lewis, research director for Deloitte Analytics, discusses the big trends in big data.

We are all now unquestionably living in a world of "big data". But what I find most striking when I look at this world is how relatively few organisations realise that, under the ongoing digital onslaught, the primarily point-to-point and analogue foundations underpinning traditional businesses are beginning to crumble.

Every year, at about this time, I take a look back at the previous year in data. In 2011, for instance, according to research firm IDC, we humans created or replicated about 1.8 zettabytes (ZB) of data. Just a few years ago, we hadn’t even invented the word zettabyte. To put this quantity of data into perspective, if you were to convert it into text and print it as books, the resulting stack of books would stretch from the Earth to Pluto and back 10 times.

Much of this data is created by individuals using social networks to connect with other people and organisations in new and exciting ways. Every day, for instance, we send approximately 300 billion e-mails and share one billion items on Facebook; every minute we post 170,000 tweets to Twitter, 3,000 photos to Flickr and 48 hours of video to YouTube. This year, for the first time, humans will be outnumbered by mobile devices. By 2020, IDC forecasts that the digital universe will have grown nearly 20-fold to contain approximately 35ZB of data.

Organisations are keen to tap into this data. The recent Wikibon report, for example, claimed that big data is a $5bn market today, and is expected to top $50bn by 2017 – a 58% compound annual growth rate.
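
For readers who want to check growth figures like these, the arithmetic behind a compound annual growth rate can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, and the five-year horizon (2012 to 2017) is an assumption, since the summary above does not state the base year.

```python
def project_value(start: float, annual_rate: float, years: int) -> float:
    """Value reached by compounding `start` at `annual_rate` for `years` years."""
    return start * (1 + annual_rate) ** years


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by growing from `start` to `end`."""
    return (end / start) ** (1 / years) - 1


# Assumption: "today" means 2012, so 2017 is five years away.
YEARS = 5

# $5bn compounding at 58% a year for five years comes to roughly $49bn.
projected_bn = project_value(5.0, 0.58, YEARS)

# Going the other way: the annual rate needed to turn $5bn into $50bn.
implied_rate = cagr(5.0, 50.0, YEARS)

print(f"Projected market size: ${projected_bn:.1f}bn")
print(f"Implied growth rate: {implied_rate:.1%}")
```

The two functions are inverses of each other, so either the end value or the growth rate can be recovered when the other is known.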

Making sense of big data

Without getting too Rumsfeldian, there are three key things to note.

The first is characterised by something I like to call the reverse Hadron Collider effect.

The Large Hadron Collider, or LHC, at Cern, near Geneva, is the world’s largest machine. It works by smashing smaller and smaller particles together to reveal the fundamental particles that will hopefully help scientists explain the physical universe and everything in it. Recent activity has centred on identifying the elusive Higgs boson. And, although the scientific returns are incalculable, when considered objectively, the LHC is a very costly, risky and time-consuming experiment.

In effect, we’re doing the same sort of thing with big data. With big data, of course, we already understand the fundamental particles – they are the bits and bytes of digital information; the ones and zeros that make up every data source. But now we’re mashing together bigger and bigger data sets in the hope that, through their collisions, data scientists will be able to explain the virtual universe and everything in it.

But this sort of experimental approach to big data strikes me as folly.

That is not to say we should be ignoring big data. Not at all. I’m saying that mastery of data is a necessary but insufficient condition for success. What matters first is your objective, followed by a sense of how you will use the insights you extract from data to enable you to meet it.

Imagine if, in 1962, instead of announcing, “We choose to go to the moon in this decade”, President John F. Kennedy had appealed to the American people with, “We choose to build the world’s most powerful liquid-fuelled rocket in this decade.” Big data and the technology for analysing it are just the means to an end – they may well be the vitally important fuel and rocket, but they are not the ultimate, imagination-capturing destination.

Let me give you an example. One of the organisations we spoke to as part of our research – a very large, multinational organisation – has just kicked off a big data programme. When we asked it why and what it hoped to achieve, it said it was doing it because everyone was talking about big data – so it felt it should be doing something too.

Organisations today are in a fierce race to make sense of big data. But in their haste, they have failed to recognise that different sources of data can have dramatically different meaning, value and quality depending on their context and the type of analysis applied. Here, Coase’s Law on modelling success applies: “If you torture the data sufficiently, it will confess to almost anything.”

Bluntly, the message for CIOs is that big data does not automatically deliver big value. CIOs must get closer to the business to understand the organisation’s strategic objectives and to play a role in achieving them.

Enrico Bombieri, a celebrated mathematician from the Institute for Advanced Study in Princeton, once said: “When things get too complicated, it sometimes makes sense to stop and wonder: Have I asked the right question?” In their rush to make sense of big data, sometimes organisations haven’t even thought to ask a question.

Bill Stensrud, the wealthy American investor and venture capitalist, goes further: “If you know what questions you’re asking of the data, you may be able to work with a 2% sample of the whole data set. If you don’t know what questions you’re asking, reducing it down to 2% means that you discard all the noise that could be important information.” 

To my mind, it makes sense to start asking questions before diving into big data. Start with the 2%, not the 100%.
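
As a rough illustration of what starting with the 2% might look like in practice, the sketch below draws a simple random sample from a data set. It is a minimal example under stated assumptions: `sample_fraction` is a hypothetical helper, and `records` is a stand-in list rather than real data. Whether simple random sampling suits a given question is itself something to decide first.

```python
import random


def sample_fraction(records, fraction=0.02, seed=0):
    """Return a simple random sample holding about `fraction` of `records`.

    A fixed seed makes the sample repeatable, so the same question can be
    re-asked of the same subset later.
    """
    rng = random.Random(seed)
    # At least one record for a non-empty data set, never more than exist.
    k = min(len(records), max(1, round(len(records) * fraction)))
    return rng.sample(records, k)


# Hypothetical stand-in for a much larger data set.
records = list(range(100_000))

subset = sample_fraction(records)
print(f"Working with {len(subset)} of {len(records)} records")
```

The sample is drawn without replacement, so no record appears in the subset twice.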

The personal nature of data

The second key point I’d like to raise from our research is the personalisation of the big data world. According to IDC, approximately 70% of all data is created by individuals – their photos, blog posts, music playlists, purchases, Facebook posts, tweets, Skype calls, online bank transactions, and so on. Ultimately, though, 85% of this data is managed by organisations. Of the remaining data, a large proportion is collected by organisations about individuals.

This means that most of the big data we’re talking about processing is not dry, meaningless organisation data – it’s data about people. 

Many of the organisations we talk to realise the importance of this – in two respects. 

First, that this kind of data potentially offers them new and detailed insight into their customers; insight that they can use to deliver better, more differentiated services and to improve the customer experience. One of the most talked-about examples of this is the use of social media analytics – analysis of information on social networking sites and tweets, for instance, to create a better understanding of customer behaviour. Second, that they can use the data they hold to empower their customers. Data gives people evidence and it gives them choice.

It is these qualities that are making the regulators also sit up and take notice. 

Take my detailed mobile usage data: why can’t I request that data from my current mobile provider and then present it to a different provider to get a better price? Taking that one step further, Which?, the consumer group, announced an initiative two weeks ago to use collective consumer purchasing power to negotiate better deals with energy providers.

People are starting to realise that all this data about them – data that in many senses belongs to them, even if it isn’t created by them – has value to organisations.

The personalisation of data is beginning to shift the balance of power away from the organisation and back to the individual.

Rising public awareness of the value of data

And this brings me neatly onto the third key point from our research – the point that has perhaps the most profound impact for big data: the great public awakening.

In the past, although individuals were creating lots of data, surveys have shown that they remained relatively unaware of its use. The more we talk about big data, however, and the more that citizens and customers see and feel the effects of big data on their lives, the more aware they will become. And once aware, they will have something to say about it. Our research shows that this growing public awareness is generating something akin to fear among many organisations.

Some, for instance, are worried about a backlash from the public if something should go wrong with the handling or processing of this data – and the rising tide of personal data breaches demonstrates that as the volume of data rises, so too does this risk. Some are worried about the unintended consequences – when data is used in ways that no one thought of when plans were first being made. Others are worried about the increasing granularity of insight into customer behaviour, and what this might lead to.

What if, for instance, using more granular data about individuals, we could predict who would be most likely to commit a crime? And what do we do about these people? It’s the Minority Report, for those of you who’ve read the book or seen the film starring Tom Cruise. Do we arrest these people before they’ve committed their crime? What about when we get our predictions wrong?

And if this sounds far-fetched, consider the data science competition website, which last year launched a competition, sponsored by the US Department of Health, where entrants are asked to use the department’s anonymised data to develop models to predict which patients are most likely to be admitted to hospital in the next 18 months. The federal government’s desire is to better forecast loads on hospital resources, but what impact will this scheme have on patient insurance, particularly for those predicted to be admitted?

We’re all concerned about protecting the interests of vulnerable people, but just how far does this concern go? Look at pricing optimisation – one of the mainstays of data analytics. We use more granular insight into customer behaviour, lifestyle and propensity to target offers we think are more likely to be taken up. But where do you draw the line? When does someone go from being not vulnerable to vulnerable in the context of targeted marketing? Is someone vulnerable just because they are more likely to take up an offer?

Balancing risk and reward

Responsibility and ethics are hugely important. With the great public awakening, organisations must quickly become accustomed to a world in which the boundaries of big data analysis are set not just by the relevant legislative and regulatory frameworks, but also by a set of greater responsibilities to the people represented by the data. In the future, it will no longer be enough to say that just because we have the big data, we can store it, analyse it, derive insight from it and exploit it. In the future, we will also have to ask: should we?

For many organisations, this will create an impossible dilemma. For others, though, this will help them to focus on the benefits and rewards they can offer to individuals through big data. It will shift the debate, which has for too long centred solely on the risks of data analysis, to a more open trade-off – where individuals will accept the risks on the understanding that they will get more utility from the data analysis. The greater the risk, the greater the reward.

So, in conclusion, big data is only going to get bigger and more complex. But we shouldn’t become fixated upon the bigness. We should think carefully about the questions we want to ask of big data and the destination we’re heading for. 

And we should definitely think about the people represented by the data, for it is their digital lives we hold in our hands. We can give power back to them through big data, and we can do so responsibly.

This article is an edited version of Harvey Lewis' presentation at the Computer Weekly 500 Club in February 2012.

Harvey Lewis is research director at Deloitte Analytics. He leads a major programme of research in collaboration with Nigel Shadbolt, professor of artificial intelligence and head of the web and internet science group at the University of Southampton, the government’s information advisor and director of the Open Data Institute.
