Analytics is not just about patterns in big data

This is a guest blog by Nick Clarke, Head of Analytics, Tessella

For most people, the term ‘data analytics’ conjures up images of Amazon, Facebook and Google: digitally born companies that gather huge data sets in order to better target products, adverts or services across a huge customer base.

When it comes to truly transformational analytics projects, this type of business, using this type of data, is the exception rather than the rule. However, because they are conspicuously successful and well known, they have become poster children for analytics. This is a blessing and a curse; they fly the flag for analytics as a disruptive creator of business value, but they distort the perception of what most analytics projects are about for most businesses.

Marketing-led companies like Amazon and Google mine huge data sets to deliver small changes across vast numbers of events – i.e. whether an individual buys a product or clicks a link. There are millions of such events, so a small percentage change delivers large returns. Furthermore, they have the luxury of trial and error; if something doesn’t work, they try something else until the recommendations pay off.

There are many equally transformational “data driven” business challenges that take an almost opposite approach: using large volumes of data to accurately predict a small number of high risk, high consequence events. Usually the goal is to prevent them from happening, e.g. a jet engine failure, or an oil drilling problem that shuts down production for days.

The world has many more projects taking this approach to data – design of clinical drug trials, predictive maintenance of equipment, city infrastructure planning, fleet management – than it has Googles and Facebooks. You just don’t necessarily notice them.

They cannot afford thousands of failed experiments. They need to get it right first time. Failure to predict becomes costly, and in some cases could even put lives at risk. These projects need a different approach in terms of attitude and skills.

Possible correlations vs. understanding cause and effect

The difference between these two types of analytics project is the difference between spotting possible correlations in a sea of numbers and actually understanding cause and effect.

If you are selling books – a high frequency, low value, low business risk activity – an analytics programme which adds 1% to sales is great. It’s not important why that 1% bought. If a 10% price reduction leads to 50% more sales, that is good enough; you don’t need to know whether it’s because people can’t resist a deal, or because the book was originally overpriced. Your goal was increased sales and you achieved it. And if you hadn’t, well, you live and learn.
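The arithmetic behind this kind of trade-off is simple enough to sketch. The figures below are the hypothetical ones from the example above, not real sales data:

```python
# Illustrative only: the 10% discount / 50% uplift figures are the
# hypothetical ones from the text, not real sales numbers.
base_price = 10.00    # original book price
base_sales = 1000     # units sold per month at the original price

discounted_price = base_price * 0.90   # 10% price reduction
uplifted_sales = base_sales * 1.50     # 50% more units sold

base_revenue = base_price * base_sales
new_revenue = discounted_price * uplifted_sales

# Revenue changes by a factor of 0.9 * 1.5 = 1.35, i.e. +35%,
# regardless of *why* customers responded to the discount.
print(new_revenue / base_revenue)  # 1.35
```

The point is that the calculation never needs a causal explanation: the ratio comes out the same whether buyers loved the deal or the book was simply overpriced.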

Conversely, maintaining a fleet of trains is a high value, high business risk activity. Knowing 1% of your rolling stock will develop a fault this month is not helpful; you need to know which 1% will be affected and what the fault will be. This is about using data to make very specific predictions.

This requires models to be developed which bake in an understanding of what the data means in different circumstances, not just that X might correlate with Y. If you are going to make a judgement call about whether to inspect an engine, leave it alone, or take it out of service entirely, based upon changes in oil condition and pressures, then you don’t want to base it on a punt that you’ve found a pattern in the data.
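To make the contrast concrete, here is a minimal sketch of a decision rule that encodes *why* the readings matter, not just that they correlate with failure. The sensor names and thresholds are entirely hypothetical, chosen for illustration:

```python
# A hypothetical maintenance decision rule. Sensor names and thresholds
# are invented for illustration; a real model would be developed and
# validated against domain knowledge and historical failure data.

def maintenance_action(oil_viscosity, oil_pressure, metal_particles_ppm):
    """Decide what to do with an engine based on oil condition.

    The rules bake in an understanding of mechanism: metal particles
    in the oil indicate active wear (a cause), so they outrank a
    mildly low pressure reading that might have a benign explanation.
    """
    if metal_particles_ppm > 50:
        # Debris from bearing wear: a failure mechanism is under way.
        return "remove from service"
    if oil_pressure < 2.0 and oil_viscosity < 9.0:
        # Low pressure *explained by* thinned oil: inspect before it worsens.
        return "inspect"
    return "leave alone"

print(maintenance_action(oil_viscosity=8.5, oil_pressure=1.8,
                         metal_particles_ppm=5))   # inspect
```

A pattern-finding system might flag the same engines, but it is the encoded cause-and-effect reasoning that justifies the cost of pulling one out of service.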

The rigour and techniques that go into the development and validation of such a model are very different. Context, meaning and interpretation become everything. Finding an interesting pattern in the data might be the beginning of a new model – but a lot of rigorous scientific method is then needed before it can become truly useful in this scenario.

Most businesses cannot act like Google

These differences in approach are not well understood by companies commissioning data projects, and so the differences in skills, approach and experience needed are often overlooked. This contributes to the unacceptably high failure rate of many data projects, which has dominated the business press of late. Too many assume that solutions to their problems can be found via automated pattern-finding technology, without understanding how to bridge the gulf between a pattern and the appropriate response that will improve their business.

Data analytics has never been one size fits all. Most projects worth doing have to be more handcrafted, requiring a deep understanding of what really links the data to the underlying business challenge. If you want the insight from your data to deliver right first time for your business, you need to understand the right approach to take.