This is a guest blog by Yves de Montcheuil, Vice President of Marketing at Talend.
When big data was still in its infancy – or rather, before the term was coined – a population of statisticians with advanced analytical expertise dominated the data research field. Sometimes called “quants” (short for “quantitative analysts”), these individuals had the skills to tackle a mountain of data and find the proverbial needle. Or rather, the path to that needle – so that such a path, once identified and handed over to skilled programmers, could be turned into a repeatable, operational algorithm.
Challenges facing quants were multiple. Gathering and accessing the data was the first one: often, the only data available was the data already known in advance to be useful. In order to test a theory, the quant would need to obtain access to unusual or unexpected sources, assuming these were available at all. Digging, drilling and sifting through all this data with powerful but intricate statistical languages was another issue. And then, of course, once a quant had found the gold nugget, operationalising the algorithms to repeat this finding would require another, very different set of skills. Not only would quants command sky-high compensation packages, but they also needed a full-scale support system, from databases and IT infrastructure, to downstream programmers for operationalisation.
The coming of age of big data has seen a reshuffling of the cards. Nowadays, many organisations collect and store all the data they produce, even if its usefulness is not immediately apparent. This is enabled by a dramatic plunge in the cost of storing and processing data – thanks to Hadoop, which decreases the cost per terabyte by a factor of fifty. Navigating, viewing and parsing data stored in Hadoop is made intuitive and fast by the combination of next-generation data visualisation tools and the advent of new so-called “data preparation” or “data wrangling” technologies – while still in their infancy, these provide an intuitive, Excel-like interface to sift through data. And the latest advances in Hadoop make the operationalisation of big data glimmer on the now-not-so-distant horizon.
These technology shifts have made it a lot simpler to harvest the value of data. Quants are being replaced by a new population: the data scientists. A few years ago, the joke was that “data scientist” was simply what a business analyst living in California was called. This is no longer true. Data scientists now live and work on Wall Street and in the City of London, in the car factories of Detroit and Munich, in the apparel districts of Madrid and Paris.
But simpler does not mean easy. True, the data scientist works without the complex support system that the quant required, and uses tools that have a much gentler learning curve. But the data scientist still needs to know what to look for. The data scientist is an expert in his industry and domain. He knows where to find the data, what it means, and how his organisation can optimise processes, reduce costs and increase customer value. And more importantly, the data scientist has a feel for data: structured, semi-structured or unstructured, with or without metadata, he thrives when handed a convoluted data set.
There are still very few data scientists out there. Few universities train them: whereas one can get a master’s degree in statistics in almost any country in the world, the few data science courses that exist are mostly delivered in California. And while big data technologies are becoming more and more pervasive, few people can claim years of experience and show proven returns on big data projects.
Today, as an industry, we are only scratching the surface of the potential of big data. Data scientists hold the keys to that potential. They are the new statisticians. They are the new quants.
About the author
Yves de Montcheuil
Yves de Montcheuil is the Vice President of Marketing at Talend, a provider of open source integration software. Yves holds a master’s degree in electrical engineering and computer science and has 20 years of experience in software product management, product marketing and corporate marketing. He is also a presenter, author, blogger and social media enthusiast, and can be followed on Twitter: @ydemontcheuil.