Date: 2019-05-28

As the focus in cyber security is shifting from threat prevention alone to detection and response, data science is playing an increasingly important role, according to Joshua Neil, principal data scientist lead for Windows Defender Advanced Threat Protection at Microsoft.

“I have watched the evolution of data-driven methods applied to cyber security from its early days, and I am excited to be part of that revolution,” he told Computer Weekly, adding that this approach is gaining momentum and efficacy as all the necessary underlying technologies become available.

Neil said the past few years have seen rapid progress away from heuristic approaches, which use Boolean logic to encode rigid rule sets that match threat data against well-known attack behaviour, and towards data-driven methods.

“But in those early days we were being defeated – because it was easy to move around these heuristics-based defences – but data science has introduced a more general approach that is much more difficult to avoid or innovate around the defences, because we are asking more general and behavioural questions.”
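To illustrate the contrast Neil draws, the following is a minimal sketch of a rigid Boolean rule and of how a trivial change slips past it. The event fields, the tool name `knownbad.exe` and the function `rigid_rule` are hypothetical, chosen only for illustration; they do not represent any real product's rule format.

```python
# Hypothetical event records: the field names are illustrative assumptions.
def rigid_rule(event: dict) -> bool:
    """Boolean heuristic: flag a known tool name launched with a known flag."""
    return event["process"] == "knownbad.exe" and "--dump" in event["args"]


# The attack exactly as the rule author saw it.
known_attack = {"process": "knownbad.exe", "args": ["--dump", "creds"]}

# Same behaviour, but the attacker has renamed the binary.
renamed_tool = {"process": "updater.exe", "args": ["--dump", "creds"]}

print(rigid_rule(known_attack))  # rule fires
print(rigid_rule(renamed_tool))  # rule is evaded by the rename
```

The rule matches only the exact pattern it encodes, which is why a behavioural question (what is this process doing, and is that normal here?) is harder to sidestep than a name match.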

Data-driven approaches to cyber security have come a long way, said Neil, since he started working in the field as a statistician at the US Los Alamos National Laboratory. There he led an investigation into using such approaches, focusing on the lateral movement of adversaries inside targeted enterprises.

This type of activity typically came to light only after attacks had taken place, and then only through intensive manual effort and expensive forensic investigations.

“In the early 2000s, we realised that first and foremost, we needed visibility into the enterprise, but only now is the industry making available the tools and technologies that enable the collection of the high quality data required to find malicious activity in automated ways,” said Neil.

Anomaly detection

Post-breach, anomaly detection is among the most successful applications of data science, said Neil. “Once attackers are inside the enterprise, they look like users. They are using valid credentials to access systems and data, and they are stealing that data using built in system tools, making it difficult to detect.”

Anomaly detection, he said, uses self-learning models designed by data scientists to understand “normal” behaviour inside the enterprise. “This can be very high-resolution. Every user’s credential behaviour is modelled and the behaviour between every communicating computer on a network is modelled, creating hundreds of millions of models for any given enterprise to identify anomalous – and potentially malicious – activity.

“Supervised machine learning pre-breach and anomaly detection or statistical methods post-breach are the two big areas where I think we have made the best contributions in terms of detection,” said Neil.
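The idea of modelling each user's credential behaviour can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the method Neil's team uses: it keeps one frequency model per user of the hosts that user logs in to, and scores a new login by how surprising it is under that model. The class `UserBaseline`, the function `fit`, the host names and the fixed host-universe size `n_hosts` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class UserBaseline:
    """Per-user model of which hosts a credential normally logs in to."""

    def __init__(self, n_hosts: int):
        self.n_hosts = n_hosts  # assumed size of the host universe
        self.counts = Counter()
        self.total = 0

    def update(self, host: str) -> None:
        """Record one observed login to `host`."""
        self.counts[host] += 1
        self.total += 1

    def surprise(self, host: str) -> float:
        """Return -log2 of the Laplace-smoothed login probability (bits)."""
        p = (self.counts[host] + 1) / (self.total + self.n_hosts)
        return -math.log2(p)


def fit(history, n_hosts):
    """Build one baseline per user from (user, host) login records."""
    models = defaultdict(lambda: UserBaseline(n_hosts))
    for user, host in history:
        models[user].update(host)
    return models


# Toy login history for a single user (fabricated for illustration only).
history = [("alice", "mail01")] * 50 + [("alice", "files02")] * 30
models = fit(history, n_hosts=100)

usual = models["alice"].surprise("mail01")
rare = models["alice"].surprise("dc-admin")
print(usual < rare)  # a never-seen host scores as more surprising
```

A login made with valid credentials still raises a high score if it goes to a host the user has never touched, which is the kind of signal a fixed rule set would miss. Scaling this to every user and every pair of communicating machines is what produces the very large model counts Neil describes.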