When Emil Eifrem, founder and CEO of Neo4j, was working for an enterprise content management startup in Sweden in the mid-2000s, he was struggling with the challenge of mapping relationships between files, folders and the people who owned all that content in a relational database.
On a flight to Mumbai, he picked up a napkin and drew what is known today as the property graph model, laying the foundation for Neo4j to become one of the biggest specialist graph database suppliers in the market.
About five years later in 2012, Yu Xu, a former Teradata programmer who was inspired by Google’s PageRank graph model that the search giant uses to rank search results, started TigerGraph to make graph databases easier to use and more scalable through a distributed model.
To be sure, building a new database platform from scratch is hard. Not only do graph databases have to support so-called Acid (atomicity, consistency, isolation and durability) transactions, but they have to scale across multiple machines and massive datasets.
“Acid transactions are really important for graph because if you write to two different nodes and the relationship between them, you had better be able to write that relationship in one transaction,” said Eifrem. “Otherwise, you have a dangling relationship, and you don’t want corrupted data.”
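The all-or-nothing write Eifrem describes can be illustrated with a minimal Python sketch. It assumes a toy in-memory graph; the `TinyGraph` and `Transaction` names are hypothetical and do not reflect Neo4j's or any other supplier's API. It shows only atomicity, not isolation or durability.

```python
import copy


class TinyGraph:
    """Toy in-memory property graph (hypothetical, not a vendor API)."""

    def __init__(self):
        self.nodes = {}          # node id -> properties
        self.relationships = []  # (start id, type, end id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_relationship(self, start, rel_type, end):
        # Refuse to create a relationship that points at a missing node.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("relationship endpoint does not exist")
        self.relationships.append((start, rel_type, end))


class Transaction:
    """All-or-nothing write: restore a snapshot if anything fails."""

    def __init__(self, graph):
        self.graph = graph

    def __enter__(self):
        # Keep a copy of the state so a failed write can be undone.
        self._snapshot = (
            copy.deepcopy(self.graph.nodes),
            list(self.graph.relationships),
        )
        return self.graph

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Roll back every write made inside the block.
            self.graph.nodes, self.graph.relationships = self._snapshot
        return False  # let the original error propagate


graph = TinyGraph()

# Two nodes and the relationship between them, written as one unit.
with Transaction(graph) as tx:
    tx.add_node("alice", kind="person")
    tx.add_node("acct-1", kind="account")
    tx.add_relationship("alice", "OWNS", "acct-1")

# A failing write leaves no half-written data behind.
try:
    with Transaction(graph) as tx:
        tx.add_node("bob", kind="person")
        tx.add_relationship("bob", "OWNS", "acct-2")  # acct-2 was never created
except KeyError:
    pass
```

In the second block the node for "bob" is rolled back along with the failed relationship, so the graph never holds a node whose relationship is missing or a relationship that points nowhere.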
These relationships could be between people, entities and things like bank accounts, making graph databases suited for applications such as anti-money laundering and fraud detection. This has opened the door to some of the world’s largest financial institutions for TigerGraph and Neo4j.
Retailers, too, are using graph databases to improve product recommendations and fulfilment rates through sophisticated supply chain analyses. According to Gartner, graph technologies will be used in 80% of data and analytics innovations by 2025, up from 10% in 2021, facilitating rapid decision-making across the enterprise.
Merv Adrian, vice-president analyst on Gartner’s data management team that tracks developments in operational database management systems (DBMS), Apache Hadoop, Spark, non-relational DBMS and adjacent technologies, said the healthcare industry has also been a keen adopter of graph databases.
“There is so much about medical technology and the pharmaceutical business that is about understanding correlations and being able to look at large populations and find factors that improve outcomes,” he said, adding that uncovering correlations about cancers, for example, could help with early genome therapy.
Another example is logistics, where companies manage hundreds and thousands of nodes and touchpoints across the global supply chain. “Understanding all the possible correlations gets complex very quickly and the ability to rapidly find least-cost alternative or the fastest alternative routes can make an enormous difference in business outcomes,” said Adrian.
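Finding a least-cost route of the kind Adrian mentions is a classic graph problem. The sketch below uses Dijkstra's algorithm, a standard technique chosen here for illustration rather than one named in the interview. The ports and costs are invented, and the `least_cost_route` function is hypothetical.

```python
import heapq


def least_cost_route(graph, source, target):
    """Dijkstra's algorithm over a dict of {node: {neighbour: cost}}."""
    queue = [(0, source, [source])]  # (cost so far, node, path taken)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue  # a cheaper route to this node was already processed
        settled.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None  # no route exists


# Invented example network; costs are arbitrary units.
network = {
    "Shenzhen": {"Singapore": 4, "Los Angeles": 15},
    "Singapore": {"Rotterdam": 12, "Los Angeles": 9},
    "Los Angeles": {"Rotterdam": 11},
    "Rotterdam": {},
}

result = least_cost_route(network, "Shenzhen", "Rotterdam")
```

Swapping the cost on each leg for transit time would return the fastest route instead of the cheapest one, using the same traversal.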
Besides financial services, healthcare and logistics, governments are also very interested in using graph technology to identify threats against their populations, he added. “It’s sort of the same problem as fraud, which in many ways is about finding bad actors. The same thing happens in the political environment as well.”
Despite the rise in adoption of graph databases, most people don’t begin with the assumption that they need a specialist graph database to do graph analytics, said Adrian.
“It’s usually only when they’ve had some experience and finding that the applications or use cases that they are pursuing are complex enough or used by enough people at the same time that they realise it’s time to get into a specialist technology,” he said.
“That’s because you can pretty much do large graph analyses all by yourself on data that isn’t in a graph database. But if there are five more people doing it at the same time, everything is going to grind to a halt.”
Adrian noted that the difference between a graph database and other databases, such as multi-model databases from Oracle and IBM, is that the former stores relationships, whereas the others do not, which means the relationships have to be built at runtime.
“That requires computation and takes time if it is complex,” he said. “And if several people are doing something like that at the same time, and they are doing different kinds of joins, you can see it’s almost a geometric explosion of complexity.
“With graph databases, the relationships are stored and managed, even if you’re doing multiple different analyses with different relationships.”
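The contrast Adrian draws can be sketched in a few lines of Python. This is a deliberate simplification with invented account data and hypothetical function names: real relational engines use indexes and query planners, and real graph databases do far more than hold a dictionary. The first function rebuilds the relationships with a self-join every time it is asked; the second walks relationships that were stored once.

```python
# Invented transfer records: (source account, destination account).
transfers = [
    ("acct-1", "acct-2"),
    ("acct-2", "acct-3"),
    ("acct-3", "acct-4"),
    ("acct-1", "acct-5"),
]


def two_hops_by_join(rows, start):
    """Relational style: self-join the table at query time."""
    return sorted({
        b_dst
        for a_src, a_dst in rows if a_src == start
        for b_src, b_dst in rows if b_src == a_dst
    })


def build_adjacency(rows):
    """Graph style: store each relationship once, keyed by its start node."""
    adjacency = {}
    for src, dst in rows:
        adjacency.setdefault(src, []).append(dst)
    return adjacency


def two_hops_by_traversal(adjacency, start):
    """Follow stored relationships instead of recomputing them."""
    return sorted({
        far
        for near in adjacency.get(start, [])
        for far in adjacency.get(near, [])
    })


adjacency = build_adjacency(transfers)
joined = two_hops_by_join(transfers, "acct-1")
traversed = two_hops_by_traversal(adjacency, "acct-1")
```

Both calls give the same answer, but the join scans the whole table for every hop, while the traversal only touches the neighbours of the accounts it visits.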
Still, the uptake of specialist graph databases is being moderated by the addition of graph capabilities to other popular multi-model databases in the market.
“When the likes of MongoDB, Microsoft and Oracle add graph capabilities into their products, people find that, at least initially, they can use those products for a period of time until they start to go for specialised graph databases,” said Adrian.
As a testament to the growth of specialist players, Neo4j surpassed $100m in revenue in 2021, putting it among the top 30 database suppliers, a list that includes multi-model database suppliers Oracle and IBM as well as cloud providers such as Amazon Web Services, Microsoft Azure and Google Cloud.
The global cloud providers may well be the next big players in the graph database space. Although cloud dominates new DBMS deployments overall, not all of the dominant suppliers are competing aggressively in the graph DBMS space, according to a new report by Gartner.
But that is expected to change as the market grows. Gartner expects the percentage of revenue attributable to cloud in the overall DBMS market to exceed 50% by 2023.
“As the incumbent cloud service providers begin to take market share, they will be a formidable barrier for small vendors that will have to both partner with and compete with them,” said Gartner.
“As with other specialty markets, the small vendors’ agility and concentration on specific requirements, such as enterprise knowledge graphs, domain solutions, or analytics and AI, will help them stay ahead.”
Xu, TigerGraph’s CEO, said the company started with “visionary customers” who were adept at using graph databases, but to scale the business, he said it was important to reduce any friction that stood in the way of wider adoption – akin to what data analytics tools such as Tableau did for SQL queries.
“We are innovating in the graph business intelligence and user interface space with a product called Query Builder to provide a visual approach to building your graph business logic,” he said. “People who don’t know any query language can ask questions easily through a browser and, in the case of fraud detection, receive alerts of potentially fraudulent transactions.”
TigerGraph is also working on emerging domain solutions like entity resolution to help organisations such as e-commerce companies to build “customer identity graphs”, said Xu, so as to understand the number and types of devices used by a customer to access the same service, among other insights. “This will help them to personalise and contextualise their services for their customers,” he said.
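One common way to build an identity graph of this kind is to treat each observed link between a device and a login as an edge, then group everything that is connected. The sketch below does that with a union-find structure. It is an illustration under assumed data, not a description of TigerGraph's entity resolution product, and the `resolve_identities` function is hypothetical.

```python
def resolve_identities(links):
    """Group identifiers that are connected by any chain of links."""
    parent = {}

    def find(item):
        # Locate the representative of the item's group, shortening paths as we go.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # merge the two groups

    groups = {}
    for item in parent:
        groups.setdefault(find(item), set()).add(item)
    return list(groups.values())


# Invented observations: (device, login seen on that device).
links = [
    ("phone-A", "alice@example.com"),
    ("laptop-B", "alice@example.com"),
    ("tablet-C", "bob@example.com"),
]

clusters = resolve_identities(links)
```

Here the phone and the laptop end up in the same cluster because they share a login, which is the kind of signal a retailer could use to count the devices behind one customer.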
Gartner’s Adrian said small graph database suppliers will also have to focus on integration with other systems and prove enterprise-level readiness.
That includes integrating into the workflows of an emerging group of graph technology users – data scientists who are behind the AI and machine learning models in the use cases for which graph databases are geared.
Neo4j, for example, has built what it calls Graph Data Science (GDS), a connected data analytics and machine learning platform that helps users to understand the connections in big data to answer critical questions and improve predictions.
GDS has become one of the fastest-growing segments of Neo4j’s business since its launch in early 2020 and presents a “massive opportunity” for the company, with technology leaders like Google having already shifted to graph-based machine learning to generate user insights, said Eifrem. “We believe that the way that Google goes, so does the enterprise.”
Another development that could expand the market for graph databases is GQL (Graph Query Language), a graph query language that will standardise graph database queries across different front-end tools and databases. The ISO (International Organization for Standardization) standard, which many suppliers, including the large enterprise DBMS vendors, are involved in, is expected to be ready by the end of 2023.
“That will likely stimulate a significant uptick in growth and make it easier for skills to be transferable and for programmes that have been written to be transferable from one product to another,” said Adrian.