How to find a much sought-after data scientist

Every organisation seems to be hunting for a data scientist, but securing the right people with the right skills is a challenge

This article can also be found in the Premium Editorial Download: Computer Weekly: The nightmare driving test – making driverless cars safe

With many organisations putting data at the heart of their digitisation strategy, analyst firm Gartner predicts the market for data analytics is likely to exceed $100bn in the next few years, suggesting that organisations are driving adoption of initiatives where there is a strong need to invest in analytics.

Techniques for analytics can be relatively simple. “Finding something useful in your data is a starting point,” says TV presenter Hannah Fry, who is an associate professor in the mathematics of cities at UCL’s centre for advanced spatial analysis. “The tiniest clues can give us a prediction of what will happen a long time in the future,” she adds.

It is this ability to predict likely outcomes and derive hidden meaning from the analysis of existing datasets that is propelling data science into a modern-day snake oil.

Gartner’s analysis of earnings statements from public companies shows a strong correlation between the use of the word “data” and reported business growth. The more an organisation can exploit the data it creates to make informed business decisions, the greater its potential to succeed.

Skills challenge

To use Gartner’s definition, the role of a data scientist is to create a shared vision of how data is used in an organisation. Such people tend to have an academic background – perhaps a PhD in mathematics – combined with strong programming skills. They are hard to find and expensive to hire. Gartner estimates that the average tenure of data scientists is just over three years, and the average level of experience is just two years.

Technology, growth and strategy adviser Mark Ridley has held a number of chief technology officer (CTO) positions in blue-chip organisations. In his experience, data science is a nascent field. He says it first took form by breaking out from the traditional database administration role, then evolved with data analysis and, subsequently, big data analytics, which, he says, requires data engineering skills.

“I hired my first data scientist in 2012, direct from academia. I saw the role of data scientist as being someone who has a whole lot of data and treats it as an experiment to prove a hypothesis,” says Ridley.

According to Iain Brown, head of data science at SAS, while organisations have tended to hire highly skilled mathematicians, statisticians and computer scientists for their data science roles, this approach has not always been a success. “Highly skilled people may not understand the needs of the business. I believe data science should start with the business,” he says.

Brown adds that organisations sometimes treat data scientists as unicorns, but there needs to be a business context. Since there is a high cost associated with data science, it must deliver value. “Where data science has succeeded is when there is close alignment. Then you get a fast turnaround of projects,” he says.

It is also difficult for a data scientist to gain domain expertise in two years.

According to Simon Blunn, vice-president at DataRobot EMEA, organisations often end up not using the skills of their data scientists effectively. “They may have a data scientist, but linking with all the right stakeholders is very difficult to achieve,” he says.

This is one of the key problem areas DataRobot aims to address. The tool is designed to enable more people in business to collaborate on the data model.

Insurance firm Liverpool Victoria (LV) has been using DataRobot for about a year. LV has a team of data scientists who build machine learning models using open source software tools that require coding and an understanding of different programming languages. Pardeep Bassi, head of data science at LV, says DataRobot removes the need for coding, allowing business analysts to collaborate on building data models. “DataRobot democratises analytics. It stops data analytics being restricted to people who can code. This increases the number of people who can build data models,” he explains.

Bassi says LV builds its data team both by growing internal expertise and hiring external contractors. “We have a joint strategy to train internally and hire externally – [not looking for] people with all the skills needed, but people with the ability to learn,” he says. “We give employees opportunities to learn and develop their skills through a data science forum. This gets people interested. There is also a mentoring forum where key individuals who play with analytics tools in their own time are identified and supported.”

Centralising data science

LV’s data science team is a centralised function, whose role is to identify business areas where machine learning and advanced analytics can be applied. Tied to LV’s overall business strategy, it looks at how to enable business decisions to be made more accurately and more efficiently, such as examining how to improve the claims process. Problem areas are identified through workshops. “We will have a workshop with a business function and its domain experts, then we’ll prioritise outputs and look at how easy they are to implement,” says Bassi.

Similarly, venture building company Blenheim Chalcot (BC) has a centre of excellence for building out its data science expertise. BC uses Fospha, its data science-heavy business, as a training ground for the whole group.

Describing the approach BC takes, CEO Kate Newhouse says: “We have created a grow-your-own programme to invest in and foster brilliant data scientists to forge a career in this arena.” Newhouse says BC takes either graduates, including master’s and PhD graduates, or second jobbers, such as analysts and engineers, onto the programme and provides centralised training via Fospha, its centre of excellence for data science. “We partner with Imperial College London, but attract from a diverse range of academic institutions to try to build a diverse workforce and approach to complex problems,” she says.

In-house or external skills

While data science incurs high costs and requires a high level of expertise, Ridley says there are no defined outcomes. This raises the question of whether it is more cost-effective to hire a contractor or invest in a full-time data scientist. He advises organisations to frame their data science expectations. “If the expertise is needed for a short period of time or used irregularly, you cannot justify hiring a full-time employee,” he says.

In Ridley’s experience, many organisations rush to bring in talent and hire permanent data science teams before they understand what they really need. “At the experimental stage, how do you know you have the right data scientist for your business?” he asks. Taking a longer-term view may mean that organisations hire data science contractors for skunkworks initiatives to show what is possible, or when projects require specific expertise.

Given the difficulty organisations face in hiring data scientists, Gartner has found that the market for service providers to support organisations’ data analytics strategy is maturing. Jorgen Heizeberg, senior director analyst at Gartner, says: “Vendors are planning to do more for you, such as building data foundations to create value.”

The analyst firm has also seen a shift from traditional per-hour-based costings to asset-based consulting where the service providers offer services and software. Some consulting firms are also creating their own big data platforms and developer workbenches to provide to organisations. “The convergence of software and services is very disruptive,” adds Heizeberg.

Lack of understanding

Since data scientists are not cheap and hoarding data is expensive, a lack of understanding in organisations means the money invested in data projects appears to flow in the wrong direction, according to head of analytics Harvinder Atwal. “The C-suite doesn’t understand data,” he says. “They understand the need to hoard data and hire data scientists, but then they think magic happens.”

Rather than treat data science initiatives as projects, he recommends organisations run them as products and create a data science factory for developing new data products.

Branching out

This becomes more important as organisations reach beyond their own internal IT systems to cloud-based systems, and connect through application programming interfaces (APIs) to business partners’ systems. In this age of software as a service (SaaS), organisations may have employee data in WorkDay or another SaaS-based HR system, use SalesForce, Microsoft Dynamics, SAP or Oracle in the cloud, and connect across their supply chain with their business partners. Understanding the data flow, such as the customer’s journey, is more complex when that journey spans multiple external and cloud-based systems.

Ridley sees the emergence of an enterprise data architect role, analogous to an enterprise architect, who has the expertise and broad view of the whole data ecosystem, in terms of bringing all the disparate data sources together.

For SAS’s Brown, while a chief data officer may have a view of, and insight into, all data sources, consideration needs to be given both to the value that can be created and to how data can be brought to the surface so that data scientists can derive business value from it. “But,” says Brown, “unfortunately data scientists are being left to their own devices.”
