Smart fraud detection needs a fresh data approach

In a guest blogpost, Emil Eifrem of Neo Technology says graph databases detect sophisticated scams and fraud rings in real time

Before PayPal came along, online commerce was fraught with security problems. But how did PayPal and its like solve the problem?

It did so by building a real-time view of its entire payment network, made possible by a new way of visualising data and the complex networks of connections between users. More precisely, it used graph technology, the emerging alternative to SQL database technology. Years later, it’s time to take graphs a step further and embed them into fraud detection systems.

Unlike most other ways of looking at data, graph databases are designed to exploit relationships in data, which means they can uncover patterns that are difficult to detect using traditional representations such as tables. These technologies were first developed in-house by the big social web giants (Google, for instance, used graphs to exploit the connections between Web documents to rank search results, namely the ‘Google algorithm’), and although they took many engineer-hours to construct, they are now available to the wider market. Forrester, for instance, says over a quarter of enterprises will be using such databases by 2017, while Gartner believes that over 70% of leading companies will be piloting a graph database by 2018.

As a result, an increasing number of enterprises, from banks to ecommerce firms, are using graphs to solve a variety of complicated data problems in real time, including the speedy detection of fraudulent activity.

Varieties of online hoodwinking

There are various types of fraud – first-party, insurance, and e-commerce fraud, etc. But what they all have in common are layers of deceit. Traditional technologies, while still suitable for certain types of prevention, are simply not designed to detect these layers, which are only really visible by spotting patterns in data and relationships. Graph databases, in contrast, through connected analysis, provide a unique ability to uncover a variety of important fraud patterns, and in real time.

First-party fraud is a good example of how graph technology can make a difference, as the complexity of the relationships is what makes these schemes so damaging. Banks lose tens of billions of pounds annually to this form of deception; experts suggest as much as 20% of unsecured bad debt at leading US and European banks is due to this form of opportunistic crime.

However, it’s the network of relationships powering this that makes the fraud ring vulnerable to graph-based methods of detection. First-party fraud involves the fraudsters opening bank accounts and taking out loans, credit cards and overdrafts. They initially behave like legitimate customers until the moment they clean out all their accounts and disappear. Collections processes kick in, but by then these account thieves are long gone, repeating the process elsewhere.

A fraud ring like this usually involves two or more people sharing a subset of legitimate contact information to create a series of false identities. In the case of two individuals sharing only a phone number and address (two pieces of data), they can create four false identities with fake names, each with four to five accounts – a total of 18 accounts. Assuming an average of £4,000 in credit exposure per account, the bank’s loss could be £72,000, perhaps more. The potential loss from a ten-person fraud ring is no less than £1.5m, assuming 100 false identities and three financial instruments per identity, each with a £5,000 credit limit, and so on.
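
The arithmetic behind these figures can be checked with a short sketch. It rests on an assumption that is only one reading of the example: each false identity pairs one ring member's phone number with one member's address, so two people yield 2 x 2 = 4 combinations and ten people yield 100. The function names are illustrative and do not come from any library.

```python
from itertools import product

def synthetic_identities(people, fields=("phone", "address")):
    """Enumerate every mix of the ring members' details (assumed model)."""
    pools = [[f"{field}_of_{person}" for person in people] for field in fields]
    return list(product(*pools))

def credit_exposure(accounts, limit_per_account):
    """Total credit the bank stands to lose across all accounts."""
    return accounts * limit_per_account

two_person_ids = synthetic_identities(["A", "B"])
ten_person_ids = synthetic_identities([f"P{i}" for i in range(10)])

# Two-person ring: 18 accounts in total at £4,000 each (figures from the text).
small_ring_loss = credit_exposure(18, 4_000)
# Ten-person ring: three instruments per identity at £5,000 each.
large_ring_loss = credit_exposure(len(ten_person_ids) * 3, 5_000)

print(len(two_person_ids), small_ring_loss)   # 4 72000
print(len(ten_person_ids), large_ring_loss)   # 100 1500000
```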

To meet the challenge, Gartner has proposed a layered model for fraud prevention that starts with simple discrete methods and progresses to more elaborate types of analysis, specifically Entity Link Analysis, which leverages connected data. This is another way of saying ‘look at the relationship patterns’ – which is exactly the form of analysis graph databases excel at.

Discrete data is hard to work with

Banks’ standard instruments for dealing with fraud, such as monitoring for deviation from normal purchasing patterns, are all about discrete data, rather than looking at the bigger network of relationships. Discrete analysis picks up lone fraudsters well enough. But it can’t as easily detect the shared characteristics that typify fraud rings (collectives often working cross-border, even cross-continent). What’s more, such methods tend to issue false positives, which harm customer relationships.

The problem bedevils traditional relational database approaches. Because they can only really model data as a set of tables and columns, carrying out complex joins and self-joins as the dataset becomes more inter-related is messy and painful. Such queries are technically tricky to construct and expensive to run, and making them work in real time is problematic, with performance faltering as the total data set size increases.
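
To make the join problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and sample rows are invented for illustration and are far simpler than any real banking schema. Finding accounts that directly share a detail takes one self-join; following the link just one hop further already needs three.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account_detail (account TEXT, kind TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO account_detail VALUES (?, ?, ?)",
    [
        ("acc1", "phone", "555-0100"), ("acc1", "address", "1 High St"),
        ("acc2", "phone", "555-0100"), ("acc2", "address", "2 Low Rd"),
        ("acc3", "phone", "555-0199"), ("acc3", "address", "2 Low Rd"),
        ("acc4", "phone", "555-0142"), ("acc4", "address", "9 Far Ln"),
    ],
)

# One hop: pairs of accounts that share the same detail (one self-join).
direct_pairs = conn.execute(
    """
    SELECT DISTINCT a.account, b.account
    FROM account_detail a
    JOIN account_detail b
      ON a.kind = b.kind AND a.value = b.value AND a.account < b.account
    ORDER BY a.account, b.account
    """
).fetchall()

# Two hops: accounts linked through an intermediate account (three joins).
two_hop_pairs = conn.execute(
    """
    SELECT DISTINCT a.account, d.account
    FROM account_detail a
    JOIN account_detail b
      ON a.kind = b.kind AND a.value = b.value AND a.account <> b.account
    JOIN account_detail c
      ON c.account = b.account
    JOIN account_detail d
      ON d.kind = c.kind AND d.value = c.value AND d.account <> c.account
    WHERE a.account < d.account
    ORDER BY a.account, d.account
    """
).fetchall()

print(direct_pairs)   # [('acc1', 'acc2'), ('acc2', 'acc3')]
print(two_hop_pairs)  # [('acc1', 'acc3')]
```

Each further hop adds more joins of the same kind, which is where such queries become hard to write and costly to run.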

Graphs are your stepping stone to ‘in-flight’ fraud blocking

Graph databases, by contrast, are built to avoid these issues. Used with modern data query languages like Cypher, they offer simple semantics for detecting fraud rings and navigating the data connections in-memory and in real time. That makes spotting the connections between fraudsters and their activities far more straightforward, potentially before anything untoward takes place. And as business processes become faster and increasingly automated, the window we have to detect fraud is shrinking too.
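
The idea of connected analysis can be sketched in plain Python, without a graph database. This is an illustrative stand-in, not how Cypher or any particular product works: accounts and contact details become nodes, holding a detail becomes a link, and any connected group containing more than one account is flagged as a candidate ring. All data and names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical sample: each account and the contact details it was opened with.
holdings = {
    "acc1": {"phone:555-0100", "addr:1 High St"},
    "acc2": {"phone:555-0100", "addr:2 Low Rd"},
    "acc3": {"phone:555-0199", "addr:2 Low Rd"},
    "acc4": {"phone:555-0142", "addr:9 Far Ln"},
}

def candidate_rings(holdings):
    """Group accounts that are connected through any chain of shared details."""
    graph = defaultdict(set)
    for account, details in holdings.items():
        for detail in details:
            graph[account].add(detail)
            graph[detail].add(account)

    seen, rings = set(), []
    for start in sorted(holdings):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), set()
        while queue:  # breadth-first walk over accounts and details alike
            node = queue.popleft()
            if node in holdings:
                members.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if len(members) > 1:  # a lone account is not a ring
            rings.append(sorted(members))
    return rings

rings = candidate_rings(holdings)
print(rings)  # [['acc1', 'acc2', 'acc3']]
```

Here acc1 and acc3 share nothing directly, yet both are reached through acc2 without writing a separate query per hop. A shared detail is only a signal, of course: flatmates legitimately share an address, so flagged groups would still need review.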

That makes real-time, in-flight fraud blocking all the more important. Graph databases provide a unique ability to uncover a variety of important fraud patterns in real time, and are a major step towards doing just that. The verdict has to be: take a leaf out of the social web giants’ book and look at this data infrastructure alternative as a better way of working with complex data.