A recent Forrester report found that only 22% of companies see a return on investment (ROI) from data science. Given the tantalising opportunities that data science, artificial intelligence (AI) and advanced analytics promise business, why do most initiatives fail?
For Harvinder Atwal, head of data strategy and advanced analytics at price comparison site MoneySuperMarket.com, part of the problem is that the mantra is wrong. He believes many business leaders have little idea of how to create value with data.
“The C-suite doesn’t understand data,” he says. “They understand the need to hoard data and hire data scientists – but then they think magic happens.”
Atwal thinks there is also a misconception in what business sees as the role of data scientists. For instance, although strong control of finance is clearly important in business, no one expects everyone to be a finance specialist. But in Atwal’s experience, there is an expectation in businesses that data scientists can share their expertise across the company. “Data scientists are expected to teach the organisation how to use data,” he says.
Then there is the AI silver bullet. “Marketing thinks AI will solve all their problems,” he says. Because data scientists are not cheap and hoarding data is expensive, this lack of understanding means that the money invested in data projects appears to flow in the wrong direction, he points out.
Atwal previously worked at Dunnhumby as insight director for Tesco Clubcard. Describing how MoneySuperMarket.com uses analytics, he says: “Our mission is to use our data to help customers save money.”
He says MoneySuperMarket.com captures more data about its customers than an average website. The information gleaned includes where they live, what they drive and where they go on holiday. The site also knows when a customer’s insurance is up for renewal or when their utility is about to switch from a discounted to a standard tariff.
“We can save people £1,000 if they come to our site, but it requires machine learning for personalisation,” says Atwal.
MoneySuperMarket.com recommends products in much the same way that Amazon or Netflix recommends items. But people have very different attitudes about money: some are extremely cautious, while others may be more open to risk.
For MoneySuperMarket.com, this means that customers only see offers of products that are relevant to them and fit within their risk profile. The architectural changes and approach that the company has taken have allowed it to create 1,400 variant newsletters for its customers, which Atwal says has resulted in a “decent revenue uplift”.
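As an illustration only, the idea of showing customers just the offers that are relevant and within their risk profile can be sketched as a simple filter. The article does not describe how MoneySuperMarket.com implements this; the `Offer` and `Customer` types, the 1 to 5 risk scale and the `eligible_offers` function below are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    name: str
    category: str    # e.g. "car_insurance" (hypothetical category label)
    risk_level: int  # hypothetical scale: 1 = very cautious, 5 = high risk


@dataclass(frozen=True)
class Customer:
    customer_id: str
    interests: frozenset  # categories assumed to be relevant to this customer
    risk_tolerance: int   # same hypothetical 1-5 scale as Offer.risk_level


def eligible_offers(customer, offers):
    """Keep offers that are relevant and no riskier than the customer tolerates."""
    return [
        offer
        for offer in offers
        if offer.category in customer.interests
        and offer.risk_level <= customer.risk_tolerance
    ]


offers = [
    Offer("Fixed-rate saver", "savings", 1),
    Offer("Stocks and shares ISA", "savings", 4),
    Offer("Comprehensive cover", "car_insurance", 2),
]
cautious = Customer("c1", frozenset({"savings"}), risk_tolerance=2)
print([offer.name for offer in eligible_offers(cautious, offers)])
```

In a real system the risk tolerance and interests would presumably be predicted by a model rather than stored as fixed fields, which is where the machine learning Atwal mentions would come in.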
Broken IT processes
For Atwal, creating the actual model is by far the easiest part of machine learning, but 90% of the conversation appears to be about this. He says the real problem for data scientists is that they have to do everything themselves – find data, clean the data, find the software and install it.
“IT is stuck in a 20th century operating model,” he says. “People don’t have access to data warehouses.”
This poses a real challenge for data scientists, says Atwal. They have to request data access from IT, negotiate with IT for the required compute resources, then wait for these resources to be provisioned. They may then need to install a query language.
“As a data scientist, you just want to use data as quick as possible,” says Atwal. In his experience, the rigmarole experienced by data scientists when doing their job means that they often choose to bypass IT and test, build and deploy data models on their own laptops.
But he does not believe this is the right approach either. “Data scientists will spend a lot of time building the perfect model on a laptop,” he says, but while it is being tuned on a laptop, the model is not finding real uses for the business – so it remains isolated.
Atwal believes data scientists should be able to get feedback when the model is deployed for real, to enable them to enhance it or build new data models based on real customer data.
Simplify the data architecture
When Atwal joined MoneySuperMarket.com in 2012, the company was deploying SAS to provide a single customer view. “We made a decision to move over to AWS [Amazon Web Services], but had data stores scattered all over the business,” he says.
Although it was relatively easy to move the website, migrating the data warehouse was very complex, says Atwal. The company had a multi-cloud strategy, which meant it was not possible to use any services specific to AWS. Instead, he says, MoneySuperMarket.com had to manage and deploy an open source software stack.
“We began building a stack in AWS with storage and analytics layers, deployed this in production, then built data products,” he says. But this was not an easy approach because it required database administrators, DevOps teams and agile data science. “We didn’t have the expertise,” he adds.
When it was time to migrate from SAS, MoneySuperMarket.com took the opportunity to run a proof of concept on Google Cloud Platform (GCP), using Google’s serverless software components, including BigQuery, Kubernetes, Dataflow and TensorFlow.
This enabled the company to simplify its data architecture. Based on Google’s reference architecture, MoneySuperMarket.com was able to deploy serverless and software-as-a-service technology, which meant there was no infrastructure to manage, enabling the data science teams to concentrate on getting their work done on GCP, says Atwal.
Improving data science workflow
Atwal says most data scientists do not come from a software development background and do not understand software development best practices. To improve the return on the investment that companies make in data science, he believes new data models need to be created more quickly. This requires data scientists to use agile collaboration and to apply lean thinking to data analytics, while adhering to data regulations and governance.
Just as software developers have used DevOps to produce code rapidly through regular iterations, data science also needs automated testing. Data scientists need to ensure the data on which they base their model is correct, and that there is version control in place to ensure changes can be tracked, says Atwal.
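A minimal sketch of what an automated check on input data might look like is shown here. It is not taken from MoneySuperMarket.com; the `Quote` record, its fields and the validation rules are assumptions chosen for illustration. The point is that such checks are plain code, so they can be kept under version control and run automatically before a model is trained.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    # Hypothetical record layout, not the company's actual schema
    customer_id: str
    annual_premium: float
    renewal_month: int


def validate_quotes(rows):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    seen_ids = set()
    for index, row in enumerate(rows):
        if not row.customer_id:
            problems.append(f"row {index}: missing customer_id")
        elif row.customer_id in seen_ids:
            problems.append(f"row {index}: duplicate customer_id {row.customer_id}")
        else:
            seen_ids.add(row.customer_id)
        if row.annual_premium <= 0:
            problems.append(f"row {index}: non-positive annual_premium")
        if not 1 <= row.renewal_month <= 12:
            problems.append(f"row {index}: renewal_month out of range")
    return problems


rows = [Quote("c1", 420.0, 3), Quote("c1", -5.0, 13)]
for problem in validate_quotes(rows):
    print(problem)
```

A pipeline could refuse to train or deploy a model whenever `validate_quotes` returns a non-empty list, in the same way a failing unit test blocks a software release.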
These are among the main requirements in DataOps, which aims to speed up the process of building new data models that achieve measurable business results. Ideally, a data scientist would want to download a working environment and get to work straight away without having to configure everything in that environment. Containers give businesses a way to achieve this, says Atwal.
A number of so-called data science platforms are starting to emerge that support DataOps. Domino Data Lab is the one MoneySuperMarket.com has deployed. Atwal says it offers a way to provide self-service for its data scientists to work.
Atwal has spoken at a number of events about how MoneySuperMarket.com has rearchitected its data analytics. His presentation, which covers nine steps to transform data science and move organisations towards DataOps, draws on lean principles that Toyota used to optimise car manufacturing, and agile software development practices.
Ultimately, he says, data scientists need to be cognisant of business strategy. “Business has a hypothesis of what creates value,” he says. “Think about flow and how quickly you can get data into a product to get feedback from customers.”