Misinformation can be beautiful. Tim Harford, author of The Undercover Economist book series and presenter of Radio 4’s statistics programme More or Less, says 200 years of statistical science should not be cast aside for the blandishments of big data.
Harford, a senior columnist at the Financial Times, cautioned delegates at Teradata’s recent Universe conference in Prague against falling too easily for the “dazzle” of data visualisation. While infographics enthusiasts, such as David McCandless, posit that information is beautiful, Harford encourages his readers and listeners to dig beneath the surface.
His Prague talk took its inspiration from the zebra-like dazzle camouflage applied to Allied ships in the First World War to help them evade torpedo attack from German submarines. Data visualisation often functions like that, he contends.
Harford, the son of a long-time reader of Computer Weekly, advises IT leaders to step back and think hard about the purposes of the data collection and analysis projects they undertake.
“It’s slightly presumptuous of me to suggest an answer here,” he says. “But what I’d say is one should treat this kind of analysis as an experiment. What are you trying to learn? What are the costs if the project doesn’t achieve what you are hoping for? And what are the benefits if it does succeed?
“Some statistical projects have very low costs, and very high potential benefits, so you can afford misses. But if you are trying to spot random patterns, you will often find those will evaporate as soon as you start to exploit them.
“If you are a supermarket, you’ll want to ask questions like, ‘What’s the optimal pricing for a particular product?’ ‘Is it more effective to make sales a surprise or stick to a discounting strategy?’”
He also counsels companies against developing “elite teams of data geeks who are brilliant but who no one in management understands or takes seriously”.
“Managers need to be data savvy enough to deal with an in-house team of data scientists or indeed an outsource supplier,” says Harford.
And he urges attention to the dark side of the much-hyped concept of big data.
“My concern is that we have a cluster of wonderful new data visualisation tools, but that does not allow us to just ignore 200 years of statistical lessons,” he says. “I think some of the journalistic hype does that. I’m thinking of Chris Anderson’s famous Wired article – Big data and the end of theory – that argued that, in the petabyte age, theory is dead. Or New York Times reporter Charles Duhigg in his story about data analytics and teenage pregnancy at Target.”
Duhigg was the first to tell the much-cited story of a teenage girl who was sent promotions because predictive analytics had inferred she was pregnant. In that telling, US retailer Target knew the girl was pregnant before her outraged father did. Harford points out that this now famous story ignores false positives. How many other women were wrongly identified as pregnant? Who knows?
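A rough calculation shows why the missing false positives matter. The sketch below is illustrative only: the function name and all three input figures are assumptions chosen for the example, not Target's data or Harford's numbers.

```python
def share_of_flags_that_are_correct(base_rate, sensitivity, false_positive_rate):
    """Fraction of flagged customers who really have the condition (precision)."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs, invented for illustration:
#   2% of shoppers are pregnant,
#   the model catches 80% of them,
#   and it wrongly flags 5% of everyone else.
share_correct = share_of_flags_that_are_correct(
    base_rate=0.02, sensitivity=0.80, false_positive_rate=0.05
)
print(f"{share_correct:.1%} of flagged shoppers are actually pregnant")
```

Under these assumed figures, only about a quarter of the shoppers the model flags would actually be pregnant, even though the model sounds accurate when described by its hit rate alone.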
Statistical problems that still matter, he says, include sampling bias, the multiple comparison problem – where you are testing so many hypotheses that some of them seem vindicated by chance – transparency, and false positives, as in the Target example.
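The multiple comparison problem can be seen in a small simulation. This is a sketch under stated assumptions, not anything Harford presented: it tests 1,000 perfectly fair coins (a stand-in for 1,000 hypotheses with no real effect) and counts how many nonetheless look biased at roughly the 5% significance level. The constants and function name are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_HYPOTHESES = 1000  # assumed number of patterns being tested
N_FLIPS = 100        # assumed observations per pattern


def looks_significant(heads, n):
    """Two-sided test of a fair coin at about the 5% level (normal approximation)."""
    z = (heads - n / 2) / (n ** 0.5 / 2)
    return abs(z) > 1.96


# Every coin is fair, so every "significant" result is a fluke.
false_hits = 0
for _ in range(N_HYPOTHESES):
    heads = sum(random.random() < 0.5 for _ in range(N_FLIPS))
    if looks_significant(heads, N_FLIPS):
        false_hits += 1

print(f"{false_hits} of {N_HYPOTHESES} fair coins looked biased")

# Chance of at least one fluke across 20 independent tests at the 5% level.
p_any_fluke = 1 - 0.95 ** 20
print(f"Chance of at least one fluke in 20 tests: {p_any_fluke:.0%}")
```

Roughly one coin in twenty should pass the test by chance alone, and with just 20 independent tests the odds of at least one spurious "finding" are already close to two in three.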
“The strange thing about that story,” he says, “is data analytics does not have to be so omniscient to be useful. If you double or triple your hit rate with customers that’s valuable. Increasing it by 10 million is not plausible and not necessary to be effective.”
Harford was educated at the same Oxford college as David Cameron (“and, in my defence, Michael Palin”), and graduated, like the prime minister, in philosophy, politics and economics.
But, as an economist first and foremost, he bewails the lack of maths and science graduates in the Houses of Parliament. Drawing attention to Mark Henderson’s book, The Geek Manifesto, which totted up science, technology, engineering and mathematics (STEM) graduates in the Commons and found there to be only a couple, he says: “I’m not saying scientists should run the country, but if even one in five of parliamentarians had science degrees we would be better governed.”
“All data visualisation,” he says, “tells a story, and you need to ask yourself why you are being told that story. What are the motivations behind it?
“It is easy to become cynical about statistics, but they are the only way we can understand the modern world. And the reason why I am so passionate about debunking [their misuse] is that it is possible through statistical carelessness or deliberation to do real social damage.”