If Big Data is the answer, what was the question?

I have just read an article in The Register that puts the current promotion of “Big Data” into perspective: Why it's not always that big, nor even that clever.

When I left ICL and joined the Wellcome Foundation as a Corporate Planner in the late 70s, my perspective on computing changed. It was not just that I was now with a user; until I went to Business School I had been with the in-house user operations of STC and ICL. It was that I ceased to believe that the world revolved around general-purpose programmable mainframes, operating systems and databases.

Wellcome had three broad types of processing engine: office systems (from word processing through sales and accounting to management information), production systems (from process control to planning and scheduling) and research systems. The production systems used considerably more computer power than the office systems to crunch considerably more data. But a single research programme might use more raw computer power to crunch more raw data than our UK office and production systems added together. One of my side tasks was to deter research scientists from wasting time trying to word-process reports on the margins of their DEC 10s when they or their secretaries could do this more efficiently using Rank Xerox or Wang machines, once we had worked out how to swing text files between DEC, IBM, Wang and Xerox mailboxes.

Dave Mandl points out that the world today is not all that different (the attitudes of Big Data and Cloud advocates are remarkably similar to those of mainframe salesmen in the 70s), save that players like Google are trying to crunch large volumes of unstructured personal information from a variety of sources to extract information of value to advertisers. Meanwhile research labs and City institutions are still looking for patterns in much bigger files of data and flows of transactions than most of the organisations whose staff attend events on “Big Data” will ever see. And the public sector is still failing to make effective use of data-matching and analysis techniques and tools that have been around for decades.

Some of the reasons for the lack of “real” progress were summarised by EURIM, now the Digital Policy Alliance, in a report on why data becomes a toxic liability unless you actively value and manage it as a strategic asset. Its single-page A4 “politicians'” crib sheet should be used by every Director whose CIO comes to them proposing a Big Data solution. EURIM subsequently produced a report on the need to address the skills required to improve the quality of the data, and on the understanding of the analytical disciplines (not just the tools) that is necessary if we are to refine gold from heterogeneous files of source data and not be overcome by the misleading fumes from vats of slurry. That source data ranges from outright lies, through data entered simply to obtain a service regardless of whether it was accurate, to data derived from actual transactions.