CIO interview: Giles Pavey, chief data scientist, Dunnhumby

Analytics pioneer Dunnhumby is undertaking a large reskilling exercise and investing in new technologies to cater for increasingly sophisticated client requirements around data

Dunnhumby, the Tesco-owned analytics firm, has a range of products and services based on leading-edge technology to help its clients – mainly retailers – understand their customers better, enhance their offerings and, ultimately, sell more.

The company’s data science group is now looking to evolve its current set-up and train people to take client needs and technological advances into account across what it does currently, in the medium term and in the future, according to Dunnhumby’s chief data scientist, Giles Pavey. 

“For example, when we put in our price optimisation software, we had to do a special customisation for it – that’s the kind of work we do for ‘now’,” says Pavey. “The work we do for ‘next’ is really thinking about the product lifecycle of each of our products and the science within it. 

“We are thinking about how to improve the science, how to put in new features or improve the algorithms. And then the last piece of the jigsaw is that we do ‘future’ work – this is work where the team are investigating emerging areas of technology or algorithms.”

Pavey says the data company already uses technology such as machine learning in many of its products, but it needs to keep up with new technologies that are relevant to its data analytics activities, including deep learning. 

The company is using machine learning in its work on human text, applying natural language processing to understand what people are commenting on and draw insights from it.

“I think we first came across multi-layered neural networks in the late 1990s, but in those days, there really wasn’t enough data, and there definitely wasn’t enough computing power to be able to do anything other than very toy examples,” says Pavey. 

“But it was really only in the last 18 months or so that both the data and the computing power have become available to be able to really use deep learning.”

Agent-based models

One area where Dunnhumby is experimenting in terms of research and building new products for the future is what Pavey calls “agent-based models”. Here, rather than machine learning, the goal is to use simulation techniques to model markets such as the UK grocery market.

In this scenario, “agents” are simulations of individual customers within a massive computer model, each given characteristics so they act in a probabilistic manner. If, for example, there was an agent that represented an upmarket customer, they would have a high probability of going to a Waitrose store, compared with a more price-sensitive customer, who would have a higher probability of going to an Asda or Lidl store.

“It’s a little bit like if you imagine a Call of Duty-like computer game,” says Pavey. “The computer has to control all the characters and they have to react to both the environment and then also to each other and to the player. 

“So we have built a model of the UK grocery market, which has up to a million agents in it. Each of these agents is given characteristics and they live in a certain part of the country and we know the characteristics of that part of the country. 

“And then it’s a model to simulate whether, on any given day, do they turn left from their door and go to a Tesco, or right to go to an Asda, or turn left and go past the Tesco and go to an Aldi?”
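Pavey’s description of probabilistic agents can be sketched in a few lines of Python. This is purely illustrative – the traits and weightings below are assumptions invented for the example, not Dunnhumby’s actual model:

```python
import random

STORES = ["Waitrose", "Tesco", "Asda", "Aldi"]

def choice_weights(price_sensitivity):
    """Turn a 0..1 price-sensitivity trait into store-choice weights.

    An upmarket agent (low sensitivity) leans towards Waitrose, while a
    price-sensitive agent leans towards the discounters.
    """
    return [
        (1 - price_sensitivity) * 2.0,  # Waitrose
        1.0,                            # Tesco (baseline)
        price_sensitivity * 1.5,        # Asda
        price_sensitivity * 2.0,        # Aldi
    ]

def simulate_day(agents):
    """One simulated day: every agent picks a store probabilistically."""
    visits = dict.fromkeys(STORES, 0)
    for sensitivity in agents:
        store = random.choices(STORES, weights=choice_weights(sensitivity))[0]
        visits[store] += 1
    return visits

# A population of agents, each with a random price-sensitivity trait
random.seed(42)
agents = [random.random() for _ in range(100_000)]
print(simulate_day(agents))
```

A production model would, as Pavey describes, attach geography and many more characteristics to each agent, but the probabilistic store choice is the core mechanic.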

If the agent model is something for the future, how does that compare with Dunnhumby’s existing set-up to deliver similar results? According to Pavey, all the existing predictive techniques – and even, to some extent, machine learning – share a limitation: they can never really predict non-linear events.

“You would never see [the existing techniques] predicting either a sudden increase or a sudden decrease in something,” he says. “What the agent-based model does is allow you to investigate scenarios that have never occurred before.”

To illustrate how this could be brought to life, Pavey gives the example of a model that could be built around how people buy drinks, which could be used to investigate how a sugar tax would affect the market. 

If there was a big tax on sugar, that would change the price and a change in demand would occur, but some non-linear effects would also be likely. For example, it might become very unfashionable to drink sugary drinks, schools might ban them, or some supermarkets could either vastly reduce the drinks’ distribution or even stop selling them. 

“You wouldn’t be able to investigate those kind of outcomes from a classical model that has within it an assumption that the future has to be a combination of the past,” says Pavey. 

“Whereas the simulations from an agent-based model can change the environment, then you can just watch how things develop. It is also very good at investigating non-linear or non-observed behaviours.”
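The sugar-tax example can be sketched the same way: re-run a simulation with the environment changed and watch the outcome. Everything here – the agent traits, the demand curve, the “schools ban” shock – is an invented illustration of the approach, not Dunnhumby’s model:

```python
import random

def buys_sugary_drink(agent, price, schools_ban=False):
    """Does this agent buy a sugary drink today?

    The schools ban is a hard, non-linear shock: for affected agents,
    demand drops to zero regardless of any price-elasticity curve.
    """
    if schools_ban and agent["buys_at_school"]:
        return False  # the product simply isn't available to them
    # Simple probabilistic demand: a higher price combined with higher
    # price sensitivity means a lower chance of buying.
    p_buy = max(0.0, 1.0 - price * agent["price_sensitivity"])
    return random.random() < p_buy

def run_scenario(agents, price, schools_ban=False):
    """Total sugary-drink purchases across the population on one day."""
    return sum(buys_sugary_drink(a, price, schools_ban) for a in agents)

random.seed(7)
agents = [
    {"price_sensitivity": random.random(),
     "buys_at_school": random.random() < 0.2}
    for _ in range(50_000)
]

baseline = run_scenario(agents, price=1.0)
# Scenario: a sugar tax raises the price AND triggers a schools ban
taxed = run_scenario(agents, price=1.3, schools_ban=True)
print(baseline, taxed)
```

The classical alternative – fitting a curve to past sales data – could capture the price effect, but not the discontinuous ban, which is the kind of never-observed scenario Pavey is describing.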

New computing techniques 

As well as new ways to analyse data, Dunnhumby is looking into the future of the technology required to process it, researching new computing techniques such as non-CPU hardware, including field-programmable gate arrays (FPGAs), says Pavey.

“We are very interested in not only the power of the technique, but also how quickly it can run,” he says. “Obviously we want things to be able to run in real time often.”

There are some techniques that are very intensive to build but turn out to be very fast to run, he says. For example, a neural network can take a long time to build, but once it is built, it runs “incredibly fast”, he adds.

“This whole balance between the effectiveness of the model, build time and then the run time, it’s a real edge of discovery for us in terms of we’re always trying to push that boundary.”

Pavey says Dunnhumby is in the final stages of that particular body of work around enhancing the speed of data processing. The company also takes part in joint development projects with Oxford University, University College London and Imperial College in the UK, as well as the University of Chicago and UCLA in the US.

“The universities are a gateway for us to make sure we are working with world experts in keeping the best of the latest techniques,” he says. 

The technology estate 

Dunnhumby is a big user of open source software and most of its research work is done in either R or Python. It also uses C, Spark and Hadoop for many of its products.

Because it works with other companies’ data as well as Tesco’s, the company is moving from hosting data on its own servers towards cloud computing, which is crucial for scalability.

“Scalability is obviously important for having more storage, but also many of our techniques are very computationally intensive, so we will also use cloud computing to burst CPU,” says Pavey. 

“You just use a lot of CPU in a short period of time to build a difficult model that can then be implemented once we’ve developed it.”

Dunnhumby uses different models to link up with its customers. It sells software delivered the traditional way, as well as products under the software-as-a-service (SaaS) model. The company has a number of application programming interfaces (APIs) that link up to clients’ digital services platforms. 

The company also offers automated reporting and dashboards, and acts as a consultancy. 

“Sometimes people use the analytic software and then the output to a client is a set of recommendations in a presentation deck,” says Pavey. “So we do things both in real time and in a much more batch way.”

Reskilling the workforce

The analytics company currently has 30 people working in its central data science team, but is in the process of training up 300 staff to join the unit. The idea is to give data analysts the tools and skills they need to become data scientists. 

“It’s a large undertaking,” says Pavey. “And it’s not just a training thing – it is very much about allowing people to transition.”

The process will take place over the next 18 months and, according to Pavey, staff are really excited about the change.

“Probably the people we’ve hired who have come out of university in the last five years are far more familiar with the use of open source, so they are very excited about that,” he says. 

According to Pavey, the shift is also significant because the data science group is moving to more modern approaches, which open the door to more exciting work. The executive maintains that there will always be a place for reporting, but there will be a lot more focus on predictive work.

“Historically, we’ve been focused on reporting and now we’re really leading the way on prediction,” he says. “Likewise, the business has done very well by analysing structured data, but we’re now adding in insights from unstructured data.”

Moving away from legacy

You might imagine there are no challenges in advancing the capabilities of a company whose bread and butter is data analytics, but Pavey says the difficulties of moving away from legacy models and introducing change apply at Dunnhumby too.

“For all the advantages of all of these new things, there are also some negatives of some changes, for example open source. You rely on the community and yourselves to solve problems – there is no supplier help desk,” says Pavey. 

“We want to not only bring in the new skills and the new tools, but also just this new mindset of far more agility, both in the agile development sense and also in the sense of far more responsiveness and pivoting of projects, the need to progress quicker and be not afraid to kill projects that aren’t working,” he adds. 

Pavey says much future advancement will come from transferring techniques. “It’s a lot about transferring techniques to make them available in lots of different scenarios and to give people the skills,” he says. “A lot of our clients don’t know what is possible, they don’t know that they’re not aware of what’s possible with machine learning. 

“So it’s an exciting challenge for us to show them what is possible, show them how we can now explore an uncertain future. We can predict what’s going to happen as machine learning allows you to do things in real time, whereas before you had to wait for the weekend to run.”

Pavey uses the analogy of having to “change the engines while the plane is flying” to describe the trickiest challenge Dunnhumby will face in the next 18 months as it retrains its workforce and enhances its toolkit to provide better products to clients. 

“We need to update all these things while maintaining a great service to our clients,” he says. “And then educating both our own wider workforce and then our clients to the benefits that data science brings.”

Pavey says new analytics technologies have proved far more powerful than had been predicted, and tend to demand more from users. This also means organisations are seeing far greater upsides in having skilled staff. 

“But the downside of that is that the tools can be very hard to use for people who are maybe not the highest flyers, so attracting and retaining the top talent is definitely a challenge,” he says. 

Dunnhumby puts a lot of effort into staff retention, says Pavey, focusing not only on tangible aspects such as the work environment and pay, but also on providing stimulating continuous development. 

“We try to build and maintain a really active learning community, so people feel like they’re progressing themselves,” he says. “And we really strongly push the community side of things, so people can support each other, which is extremely important.”
