Graph databases are an 18th-century concept with a host of modern applications.
Used for tasks as diverse as dating sites and fraud detection, graph technology works by looking at relationships, not just data. But the idea behind it – or, at least, its theoretical basis – is attributed to Swiss-born mathematician Leonhard Euler, in 1735.
For almost 300 years, graph theory remained a mostly academic pursuit. But graphs have turned out to be an ingenious way of dealing with large volumes of data, and especially complex relationships between data.
In recent years, technologists have taken graph theory and created the graph database, a type of database where connections, as well as data, are first-class citizens.
By recording the links between data, as well as the data itself, graph-based systems can quickly mine information and identify trends, making them a powerful tool for real-time analytics, as well as for mapping social networks, supply chain patterns, or even crime waves.
As a graph database looks at connections and relationships – known as edges – it takes just minutes, or even seconds, to answer queries that might take days using a conventional database system.
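To make the idea of edges concrete, here is a minimal sketch in Python of a graph held in memory, with a "friends of friends" query of the kind a social network might run. It is an illustration only, not how any particular graph database is implemented; the `Graph` class, the `knows` label and the names are all assumptions made for the example.

```python
from collections import defaultdict


class Graph:
    """A tiny in-memory graph: nodes are names, edges are labelled links."""

    def __init__(self):
        # Maps each node to a set of (label, neighbour) pairs.
        self.edges = defaultdict(set)

    def add_edge(self, source, label, target):
        # Store the link in both directions so it can be followed either way.
        self.edges[source].add((label, target))
        self.edges[target].add((label, source))

    def neighbours(self, node, label):
        # Follow only the edges that carry the requested label.
        return {
            target
            for (edge_label, target) in self.edges[node]
            if edge_label == label
        }


def friends_of_friends(graph, person):
    """People two 'knows' hops away who are not already direct friends."""
    direct = graph.neighbours(person, "knows")
    second_hop = set()
    for friend in direct:
        second_hop |= graph.neighbours(friend, "knows")
    return second_hop - direct - {person}


# Hypothetical data: alice knows bob, bob knows carol, carol knows dave.
graph = Graph()
graph.add_edge("alice", "knows", "bob")
graph.add_edge("bob", "knows", "carol")
graph.add_edge("carol", "knows", "dave")

print(friends_of_friends(graph, "alice"))  # {'carol'}
```

The query never scans the whole dataset: it starts at one node and follows only the edges attached to it, which is the property that makes connection-heavy queries cheap.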
Graph databases in action
According to Alan Duncan, a research director covering data and analytics at Gartner, graph technology is finding its way into a wider range of applications, in both the public and private sectors.
“Law enforcement investigators are using it to look at patterns in crime, and it is used in predictive policing,” says Duncan. “In banking, a fraud manager needs to identify fraud networks, and someone running a telecoms network needs to optimise call routing.”
These, he says, are core applications for graph database technology, where there are “complex relationships that have an influence on what you’re interested in”.
But so far, the largest applications of graph databases have been based on proprietary technologies, rather than commercially available or open source graph databases.
Twitter, Facebook and LinkedIn are all understood to use graph technology to identify connections between their users, as well as to produce information of use to advertisers. IBM also uses graph technologies as part of its Watson natural language computer.
But, although IBM is starting to commercialise Watson's technology, and Google released its Cayley open source graph database in 2014, the graph technologies used by the social networks are largely closed to outsiders – at least for now.
This means that although almost everyone with an internet connection is likely to use at least some graph database technology on a daily basis, the tool’s uptake in business remains limited and, often, experimental.
Gartner, for example, believes that only 1% to 5% of the target market for graph databases is using the technology, and that many of the graph database projects being carried out by businesses are at the experimental, or proof of concept, stage.
In fact, companies are more likely to be using graph tools through a specialist application – such as a fraud detection package – bought by a line of business, or a broad-brush analytics tool with graph capability. But, as the data volumes stored by organisations grow, and business analysts set more store by the relationships between data, that could change.
Data-driven decision making at Gamesys
“We drive an awful lot of decisions based on data,” says Toby O’Rourke, head of player services platforms at online gaming site Gamesys. “We’re always looking at ways to ingest more data and learn more about what’s going on, on our sites, in terms of the players and how they are playing.”
When Gamesys decided to develop a social network element to its site, the company chose graph databases because of their performance, but also because they were easy to implement.
“We had to store this social network somewhere, and it seemed such a good fit for a graph-like storage system,” says O'Rourke.
“The problem mapped really well onto the underlying technology. The fact that we could build a domain model within our Java application and it mapped almost directly into the data store without layers and layers of abstraction sped us up massively.”
This, O’Rourke says, was a significant benefit in an industry where time to market is critical.
Making connections between datasets
This is a common path for companies introducing graph databases, says Emil Eifrem, CEO of Neo Technology, the company behind the Neo4j graph database used by Gamesys.
“As long as 10 to 15 years ago, we observed that web companies had data-based business models. But, although the data is valuable, so are the connections between the data. The value is in the connections between people, and that gave rise to Facebook,” he says.
“Then Google started to look at the links between websites and extracted that link graph and used it to rank sites. And we ran into tangible problems ourselves, in that it wasn’t easy to manage connections within the data. There was nothing really available.”
Eifrem claims that a Neo4j installation can be a thousand or even a million times quicker than a conventional relational database when it comes to looking at connections. And, he says, the idea of looking at connections between data is not as complex as it might at first seem.
“If you are building a system to manage an inventory of cars, you will build a system that stores cars, and their parts,” he says.
“You will have a database of windshields, wheels, steering wheels and so on. These are objects, but there are connections between them. It might be that these screws go here, or you can only model this part here with that one there. All the parts can be connected to other parts, but you can’t model that in a table-based database.”
Using graph technology allows manufacturers both to constrain choices of parts, for example for maintenance, and to open them up, for example so a dealer can give a potential buyer an up-to-date list of specification options for a new car.
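The parts example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Neo4j code: the part names and the "fits with" links are invented, and a real catalogue would carry far more detail.

```python
from itertools import combinations

# Hypothetical catalogue: each pair is an undirected "fits with" link.
FITS_WITH = [
    ("sport_wheel", "sport_brake"),
    ("sport_wheel", "standard_brake"),
    ("standard_wheel", "standard_brake"),
]


def build_index(pairs):
    """Turn the list of links into a part -> set-of-linked-parts lookup."""
    index = {}
    for a, b in pairs:
        index.setdefault(a, set()).add(b)
        index.setdefault(b, set()).add(a)
    return index


def compatible_parts(index, part):
    """Enable choices: list every part directly linked to the given one."""
    return sorted(index.get(part, set()))


def is_valid_selection(index, parts):
    """Constrain choices: every pair of chosen parts must be linked."""
    return all(b in index.get(a, set()) for a, b in combinations(parts, 2))


index = build_index(FITS_WITH)
print(compatible_parts(index, "sport_wheel"))
print(is_valid_selection(index, ["standard_wheel", "sport_brake"]))  # False
```

The same set of links serves both purposes Eifrem describes: following them outwards produces the list of options, while checking that they exist rules out combinations that do not fit.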
Graph databases meet the need for speed
Similar tools are being used by online retailers for recommendation engines, and by social media firms to suggest new links.
CIOs can, of course, build recommendation systems around relational databases, but the time it takes to run queries means they might have to be done in batches, which risks the data being out of date. Graph systems are much faster.
“In today’s world you want to base decisions on the most up-to-date data,” says Michal Bachman, managing director of GraphAware, a consulting firm. “Rather than pre-computing recommendations overnight, you want to serve real-time information to users, based on the latest information.”
The response time from a graph-based system can be milliseconds, he says.
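As a rough illustration of what computing a recommendation at request time involves, the Python sketch below walks from one customer to the products they bought, across to other customers who bought the same things, and on to what else those customers bought. The data and the `recommend` function are assumptions made for the example; a production recommendation engine would be considerably more sophisticated.

```python
from collections import Counter

# Hypothetical purchase links: customer -> set of products bought.
PURCHASES = {
    "ann": {"tent", "stove"},
    "ben": {"tent", "lantern"},
    "cat": {"tent", "lantern", "stove"},
    "dan": {"kettle"},
}


def recommend(purchases, customer):
    """Rank products bought by customers who share a purchase with this one."""
    own = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        # Skip the customer themselves and anyone with nothing in common.
        if other == customer or not (own & items):
            continue
        # Each product the customer lacks gets one vote per similar shopper.
        for item in items - own:
            scores[item] += 1
    return [item for item, _ in scores.most_common()]


print(recommend(PURCHASES, "ann"))  # ['lantern']
```

Because the result is worked out from the links as they stand when the request arrives, a purchase recorded a moment ago is reflected immediately, rather than waiting for an overnight batch.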
Graphs are logical and flexible
There is, though, another factor in favour of graph technology: ease of use. Perhaps surprisingly for a technology that can be complex to implement, part of the appeal of graphs is that the way they present the relationships between data points is relatively simple to understand.
“For people who are not technologists, you can draw circles and arrows on a board, and explain the business logic behind the graph system,” explains Bachman. “It is not limited to developers, or data scientists.”
Graph databases are also, he says, more flexible than traditional databases, as they can store a wider range of attributes than conventional systems and attach multiple attributes to one entry. A conventional database might struggle with describing an employee with two jobs, whereas graph databases handle this with ease.
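The employee with two jobs can be drawn as one node with two outgoing links, each carrying its own attributes. The Python sketch below shows that shape; the `Node` and `Edge` classes, the `HOLDS` link type and the sample values are assumptions for illustration rather than any product's data model. A relational schema can represent the same thing with a join table; the point here is only how directly the graph form matches the whiteboard drawing.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: Node
    kind: str
    target: Node
    # Attributes live on the link itself, not only on the things it joins.
    properties: dict = field(default_factory=dict)


# Hypothetical employee who holds two jobs at once.
employee = Node("Employee", {"name": "Sam"})
analyst = Node("Job", {"title": "Data analyst"})
trainer = Node("Job", {"title": "Trainer"})

edges = [
    Edge(employee, "HOLDS", analyst, {"days_per_week": 3}),
    Edge(employee, "HOLDS", trainer, {"days_per_week": 2}),
]


def jobs_of(person, edges):
    """Follow every HOLDS link that starts at the given person."""
    return [
        edge.target.properties["title"]
        for edge in edges
        if edge.source is person and edge.kind == "HOLDS"
    ]


print(jobs_of(employee, edges))  # ['Data analyst', 'Trainer']
```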
Despite these advantages, however, as yet relatively few businesses are investing directly in graph databases. Instead, they are more likely to buy special-purpose tools that use graphs at their core, but hide the technology from users.
This is the approach taken by a number of fraud detection systems and social media monitoring packages. But, as the technology gains more mainstream appeal, more companies are likely to build graph technology directly, or invest in analysis tools with graph capabilities built in.
“So far adoption of graph technology has not been as fast as that of some others,” concedes Bachman. “But I do believe the ecosystem around graph databases will come together to make it a mainstream data platform.”
Case study: Toy maker Schleich traces materials
As a toy maker, German manufacturer Schleich sets great store by product safety. But, with a global supply chain, materials traceability was a growing challenge for the company, says Andreas Weber, Schleich’s vice-president of operations. It was a challenge that seemed right for graph database technology.
Schleich makes a popular range of model animal figures, including farm animals and horses. “We need a clear picture of what’s being used in the pigments, plastics and resins,” says Weber. The company needs detailed and up-to-date information on everything used in each model sold, for regulatory and reputational reasons.
Often, companies rely on Excel spreadsheets for materials traceability. Schleich had already moved beyond that, to an SQL database, but, according to Weber, the system had become too complex for classic SQL. Instead, the decision was made to start from scratch and build a new system, known within the company as Spims.
The company started working with the Neo4j graph database as a proof of concept.
The graph database takes the material information stored in the Schleich enterprise resource planning (ERP) system and links it to data from suppliers. “That gives us a thread through the data to the pigment level, where we do tests in our laboratory on an annual basis,” he says. The database is linked to a dashboard that provides status lights for each product, which shows that everything on the bill of materials has been tested and approved.
“You can click on a model and drill down into the raw materials in Germany, Africa and China,” says Weber. “We can see that everything is alright, or that something may be wrong. It takes seconds to identify a model, and ask what is going on.”
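The drill-down Weber describes amounts to following "contains" links from a product down to its raw materials and checking each one against the test records. The Python sketch below shows the general idea only; it is not Schleich's system, and the item names, the `CONTAINS` links and the `APPROVED` set are invented for the example. It also assumes the bill of materials has no circular references.

```python
# Hypothetical bill of materials: item -> the items it directly contains.
CONTAINS = {
    "horse_figure": ["body_plastic", "brown_paint"],
    "brown_paint": ["pigment_a", "resin_b"],
    "body_plastic": [],
    "pigment_a": [],
    "resin_b": [],
}

# Hypothetical laboratory records: raw materials tested and approved.
APPROVED = {"body_plastic", "pigment_a"}


def raw_materials(item, contains):
    """Follow 'contains' links down to the materials with no sub-parts."""
    children = contains.get(item, [])
    if not children:
        return {item}
    leaves = set()
    for child in children:
        leaves |= raw_materials(child, contains)
    return leaves


def status(item, contains, approved):
    """Return a status light plus any raw materials still lacking approval."""
    missing = raw_materials(item, contains) - approved
    return ("green", set()) if not missing else ("red", missing)


print(status("horse_figure", CONTAINS, APPROVED))  # ('red', {'resin_b'})
```

In this toy example the figure shows red because one resin has no approval on record; adding it to the approved set would turn the light green.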
The traceability system speeds up Schleich’s time to market, as the bill of materials information is ready as soon as a new model’s tool is complete and the tool is ready to launch. But the benefits go beyond compliance.
“The big benefit is that everyone is working with original data,” says Weber. “Before, it was Excel and copying and pasting data, with all the issues that causes. Now everything is stored centrally, can be accessed via a browser and is updated alongside our ERP system.”