Is data science a science?

Imperial College, London has officially launched its Data Science Institute, announced last year. And the government has announced £42m of funding for the Alan Turing Institute, location to be decided.

Data Science is, then, officially in vogue. It is no longer just the pet name for data analytics at Silicon Valley companies, like Google, LinkedIn, Twitter, and the rest, but has been anointed as a ‘science’.

Imperial College is doing a great deal with data, for its science, already: from the crystallisation of biological molecules for x-ray crystallography, through the hunt for dark matter, to the development of an ovarian cancer database. And much else besides.

What will make the college’s Data Science Institute more than the sum of these parts? I asked this question of Professor David Gann, chairman of the research board at Imperial’s new institute. His response was: “Imperial College specialises in science, engineering and medicine, and also has a business school. In each of those areas we have large scale activities: largest medical school in Europe, largest engineering school in the world. And we are a top ten player in the university world globally.

“So you would expect us to be doing a lot with data. As for our developing something that is more than the sum of the parts, I would say we genuinely mean that there is a new science about how we understand data. We are going to take a slice through the [current] use of large data sets in incumbent fields of science, engineering, medicine, and business to create a new science that stands on its own two feet in terms of analytics, visualisation, and modelling. That will take us some time to get right: three to five years”.

The institute’s founding director, Professor Yike Guo, added: “Creating value out of data is key, too. Our approach at Imperial is already multi-disciplinary, with the individual fields of study as elements of a larger chemistry, which is data”.

I put the same question to Duncan Ross, director of data science at Teradata, at the vendor’s recent ‘Universe’ conference in Prague. Duncan made the traditional scientist’s joke that if you have to put the word ‘science’ at the end of a noun, then you don’t really have science. He then went on to say: “There is an element of taking a scientific approach to data which is worth striving for. But Bayes’ Theorem of 1763 is hardly new; it is just that we now have the computing technology to go with it”.

At the same event, Tim Harford, the ‘undercover economist’ who presents Radio 4’s More or Less programme, ventured this take on the data science concept: “It [the data science role] seems like a cool new combination of computer science and statistics. But there is no point in hiring an elite team of data geeks who are brilliant but who no one in management understands or takes seriously”.

There was a time when computer science was not considered to be a science, or at least not much of one. And, arguably, it is more about ‘technology’ and ‘engineering’ than it is about fundamental science. Can the same be said of ‘data science’? The easy thing to say is that it does not matter. Perhaps an interesting test would be how many IT professionals would want their children to graduate in Data Science in preference to Mathematics, Physics, or, indeed, History, Law or PPE?

Moreover, do we want scientists and managers who are data savvy or do we need a new breed of data scientist – part statistician, part computer programmer, part business analyst, part communications specialist? Again, it is easy to say: “we want both”, when investment choices will always have to be made.

As for the Alan Turing Institute, David Gann at Imperial told me: “As you can imagine, we would be interested, but the process is just starting. Other good universities would say the same”.

If any institution has a decent shot at forging a new discipline (shall we just call it that?) of data science, it is Imperial College, London. That said, King’s College, Cambridge and the University of Manchester might well have a word or two to say about the eventual location of the Alan Turing Institute.