Millions of lives blighted by toxic Big Data

Big Data and Open Data are fashionable. Collecting and collating large volumes of data for someone (medical researchers, the security services or sometimes even the general public) to share, search or analyse will help us find the answer – whatever the problem. Yesterday at an excellent ISSA conference I listened to Eddie Schwartz (VP and CISO of RSA) talk about combating future cyber threats with Big Data transformation. He used the Wikipedia definition of Big Data and, after commending Nate Silver's book The Signal and the Noise, began with a few cautions, which I summarise and interpret below:

  • Big Data without an architectural strategy and roadmap is stupid
  • Big Data without intelligence (in all its meanings) is stupid
  • Big Data without analytics (including skilled people to use the tools) is stupid
  • Big Data without committed multi-year funding is stupid

I note that HMG plans to fund research into Big Data and there is a most timely e-Skills-SAS report on the skills that will be needed by a predicted 28,000 Big Data technical staff. Eddie, however, made the point that making sense of Big Data requires a cadre of highly skilled quants to focus the efforts of those involved. The recent banking crisis was, however, brought upon us by undisciplined reliance on quants, without understanding what they do or the meaning of what they had produced. The quants analysed what was happening. Some of their clients used their analyses to make fortunes with their own money. Others used them to sell complex gambling packages (alias derivative-based products under a variety of guises, from packages of toxic mortgages to “insurance” against interest rate rises) to naive customers, and cost the latter (and the banks, taxpayers and us) even more.

My core point is that “Big Data”, however it is used, is lethal in the hands of those who do not understand its provenance and the meaning behind the analysis. Thirty years ago (after the conclusion of the Water Industry Change Programme), ICL did not know what to do with me and I was “parked” for the best part of a year in what was left of the old English Electric Management Science Team. My job title was “Public Sector Financial Modelling Consultant”. I was the tyro acting as buffer between a handful of world-class statisticians (we did not use the word quant in those days) who had honed their skills in military operational research and intelligence. They politely bludgeoned into me [with examples from military catastrophes, product failures and marketing blunders] the message that I should not believe the analysis, however statistically reliable, until I had not only worked out the likely causative mechanism, but tested it.

This leads me to the message that much public sector “big data” is as vulnerable to systemic distortion as that which was used to set Libor. The reason is that the providers know it will be used to set targets and allocate resources, perhaps even to set their own pay and bonuses.

An obvious example is crime reporting, where the public no longer believe the figures because they know just how difficult it has become to report a crime and how little will be done if you do. Therefore the proportion reported has fallen away sharply, even in locations where reporting does not render you and your family liable to reprisals. The Cabinet Office may boast of the value of Crime Maps as an example of the imaginative use of public sector data, but I recently looked at the map for our local area. It was as the local estate agents would wish: almost crime free. Yet a busy, century-old bank branch was recently closed because staff would no longer work there, and my wife expects an escort if she has to go shopping at that end of the high street in the late afternoon.

I do hope that, in its forward plans to look at some of the issues around making effective use of Big Data, the Digital Policy Alliance will also build on the work of the EURIM Information Governance Group in looking at the issues of data quality, from the original round table organised in co-operation with the Audit Commission, “Uncovering the truth: using information to deliver more for less”, to the most recent report, produced in co-operation with CILIP and the Consultation Institute, “Improving the Evidence Base: the Quality of Information”.

I will blog separately on the opportunities that insecure Big Data gives to organised crime. But in the meantime I find the routine use of Big Data technology by those tracking our on-line behaviour (whether to “improve service”, target advertising, sell to others or …) even more chilling. I commend the TED Talk used by Gary Kovacs to launch the “Collusion” add-on to Firefox. When I showed it to my son he reminded me that “If it is free, YOU are the product”. Yesterday he sent me a link to an article on Silent Circle, the latest attempt to help us fight back against the surveillance society. I sent him a link to Scrambls, which enables you to encrypt your Facebook page so that tools like Graph Search cannot be used so easily by stalkers and fraudsters to track your conversations.

In conclusion – I support Big Data in much the same way as I support Nuclear Power. It can be a great force for good but …. 

Hence the headline.