As the world awaits the predicted death of the relational database, another suspect – known as the graph database – has entered the frame to potentially deliver the final, fatal blow.
But is the graph database really a worthy challenger, or just another way of dealing with data? Is it just another database (JADB), or is there something in this approach?
Graph databases are nothing new, but are – to date – not as popular as relational databases (such as IBM DB2 or Microsoft SQL Server) or their non-relational counterparts (MongoDB, Riak or Cassandra, for example).
To reach the same level of popularity, graph databases need a mature, non-relational platform to build on. Such platforms are now available, which is why interest in this approach is on the rise.
But where does the name ‘graph database’ come into it? A relational database is based on the concept of columns and rows, with searches carried out across indices and joins across tables. The relationship between the data points is essentially constructed at the time of query, and this can be expensive in resource terms.
A graph database looks at the relationships between the data points as the information is stored, and maintains metadata around how different data items are related to each other as it goes along.
It creates a ‘graph’ of relationships, and when a query is run against the data, the results can be pulled out far more efficiently than is possible in a relational or basic non-relational database.
Managing data in this way means the performance of a graph query depends mainly on how much of the graph it has to touch, rather than on the overall size of the database. Give a graph database a ‘pattern’ to search for, and it can ignore all data that it knows does not match and concentrate on the data that does.
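As a rough sketch of why that can be the case, and not a description of how any particular product works internally, imagine each node keeping a direct list of its own relationships. A query that starts from a known node then only visits what it can reach, however much unrelated data is stored alongside. The names used here (`Graph`, `add_edge`, `reachable`) are made up for illustration.

```python
from collections import defaultdict, deque


class Graph:
    """Toy graph: each node stores its outgoing relationships directly."""

    def __init__(self):
        # node -> list of (relationship label, neighbouring node)
        self.adjacency = defaultdict(list)

    def add_edge(self, source, label, target):
        self.adjacency[source].append((label, target))
        self.adjacency[target]  # make sure the target exists as a node too

    def reachable(self, start, label, max_hops):
        """Follow `label` edges from `start` for up to `max_hops` steps.

        Returns (nodes found, number of nodes visited along the way).
        """
        found, visited = set(), 0
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, hops = queue.popleft()
            visited += 1
            if hops == max_hops:
                continue
            for edge_label, neighbour in self.adjacency[node]:
                if edge_label == label and neighbour not in seen:
                    seen.add(neighbour)
                    found.add(neighbour)
                    queue.append((neighbour, hops + 1))
        return found, visited


graph = Graph()
graph.add_edge("house 1", "next to", "house 2")
graph.add_edge("house 2", "next to", "house 3")

# Add plenty of unrelated data; the query below never touches it.
for i in range(10_000):
    graph.add_edge(f"customer {i}", "ordered", f"product {i}")

found, visited = graph.reachable("house 1", "next to", max_hops=2)
print(sorted(found), visited)  # ['house 2', 'house 3'] 3
```

The query visits three nodes whether the graph holds three nodes or 20,000, because the unrelated customers and products are never reachable from the starting house.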
A graph database must be constructed around some basic concepts, and needs to understand the relationships between different data assets.
There is a set of hierarchical associations, to start with, where an item can have relationships with others, but can also have properties.
This uses the concepts of nodes (an asset of some sort), properties (information pertinent to the nodes) and edges (connections between nodes that define the relationship between them).
This is based around the resource description framework (RDF). In RDF, each statement about an entity is expressed as a ‘triple’ – a subject, a predicate and an object.
RDF builds on the way the web identifies things: the resources in a triple are named using uniform resource identifiers (URIs). The whole idea of the semantic web (a more standardised web where data is available globally for access and analysis) is based on RDF.
Take a house on a road, as an example. The house is a main node and has relationships with the other houses on the road, as well as the road itself and services such as electricity, gas and water.
It also has properties – it may be semi-detached with three bedrooms, two reception rooms, a kitchen, and so on.
That kitchen will also have its own properties, housing a washing machine, dishwasher, and so on. The kitchen has an edge between it and the washing machine that could define when the washing machine was purchased, when it was last serviced, or any other relationship that could be of interest.
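To make the house example concrete, here is a minimal sketch of nodes, properties and edges in Python. The class names, the relationship labels ("has room", "contains") and the dates are all invented for illustration; a real graph database would have its own storage format and query language.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An asset of some sort, with information pertinent to it."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A connection between two nodes; the label names the relationship."""
    source: str
    label: str
    target: str
    properties: dict = field(default_factory=dict)


nodes = {
    "house": Node("house", {"type": "semi-detached", "bedrooms": 3,
                            "reception_rooms": 2}),
    "kitchen": Node("kitchen"),
    "washing machine": Node("washing machine"),
    "dishwasher": Node("dishwasher"),
}

edges = [
    Edge("house", "has room", "kitchen"),
    # Edges can carry properties too. These dates are placeholders.
    Edge("kitchen", "contains", "washing machine",
         {"purchased": "2015-03-01", "last_serviced": "2016-01-15"}),
    Edge("kitchen", "contains", "dishwasher"),
]


def outgoing(source, label):
    """Edges with the given label that start at `source`."""
    return [e for e in edges if e.source == source and e.label == label]


for edge in outgoing("kitchen", "contains"):
    print(edge.target, edge.properties)
```

The point of the sketch is that the relationship between kitchen and washing machine is stored as data in its own right, with its own properties, instead of being reconstructed each time a question is asked.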
In this scenario, the kitchen would be an RDF subject, the predicate could be “which contains” and the object would be the washing machine.
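The same statement can be written down as a triple and queried by pattern. The sketch below is a simplification: real RDF names its resources with URIs, whereas plain strings are used here, and the `Triple` type, the `match` helper and the shortened predicate "contains" are assumptions made for the example.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # the RDF "object"; named obj to avoid shadowing Python's built-in


triples = [
    Triple("house", "has room", "kitchen"),
    Triple("kitchen", "contains", "washing machine"),
    Triple("kitchen", "contains", "dishwasher"),
]


def match(subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "What does the kitchen contain?"
print([t.obj for t in match(subject="kitchen", predicate="contains")])
# "Where is the washing machine?"
print([t.subject for t in match(predicate="contains", obj="washing machine")])
```

This toy version scans every triple on each query. A real triple store would index the triples so that a pattern can be answered without reading data that cannot match.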
Within a relational database, each of these relationships has to be found each time a query is run but, with a graph database, the relationships form a bigger part of the way data is stored.
Delete once and delete it all
One of the bits of ‘secret sauce’ in a graph database is the way it deals with the deletion of assets.
It does not just delete the asset itself, but all of its relationships as well. So, if this house is deleted, it will leave no broken links. In this case, where the rooms are modelled as dependent on the house, deleting the house will also delete the bedrooms, kitchen and reception rooms, because they cannot exist without the house.
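The behaviour can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the mechanism of any specific product: real graph databases differ in how deletion is requested and in whether dependent nodes are removed automatically. The `cascade_label` parameter is a hypothetical way of saying which relationship marks a node as dependent.

```python
class Graph:
    def __init__(self):
        self.nodes = set()
        self.edges = set()  # (source, label, target) tuples

    def add_edge(self, source, label, target):
        self.nodes.update({source, target})
        self.edges.add((source, label, target))

    def delete(self, node, cascade_label=None):
        """Delete `node` and every edge that touches it.

        If `cascade_label` is given, nodes reached from `node` over edges
        with that label are treated as dependent and deleted recursively.
        """
        dependents = [t for (s, label, t) in self.edges
                      if s == node and label == cascade_label]
        self.nodes.discard(node)
        # Drop every edge that starts or ends at the deleted node.
        self.edges = {(s, label, t) for (s, label, t) in self.edges
                      if s != node and t != node}
        for dependent in dependents:
            if dependent in self.nodes:
                self.delete(dependent, cascade_label)


graph = Graph()
graph.add_edge("house", "has room", "kitchen")
graph.add_edge("house", "has room", "bedroom 1")
graph.add_edge("kitchen", "contains", "washing machine")
graph.add_edge("house", "on", "road")
graph.add_edge("house", "next to", "house next door")

graph.delete("house", cascade_label="has room")
print(sorted(graph.nodes))  # ['house next door', 'road', 'washing machine']
print(graph.edges)          # set()
```

The rooms disappear with the house, while the road, the neighbouring house and the washing machine survive. None of them is left holding a link to something that no longer exists.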
This is why the JADB tag really does not apply to graph databases. Relational databases are good for pure data-driven systems where indices and SQL queries rule, while NoSQL databases are good for dealing with less-structured data, such as documents and object-based information.
But when it comes to carrying out analysis of mixed data – where the real need is to query the relationships between the various types – a graph database is the most efficient way to achieve this.
As in any emerging market, buyers face the usual problems: there are currently about 50 graph databases available, and it is difficult to predict which of them will survive through to maturity.
Under the Apache open-source licence, there are choices such as ArangoDB, Cayley, OhmDB, Orly and Titan. Under the various GPL licences are Blazegraph, Neo4j, OpenCog, and others.
In the commercial space, there is IBM’s System G Native Store, Oracle’s Spatial and Graph, and Teradata has Graph capabilities within Aster. FactNexus has GraphBase, Complexible has Stardog and there are plenty of other options available.
Business benefits of graph databases
From a business point of view, graph databases have much to offer. Geographical data can be handled more easily, and complex queries dealt with efficiently, which are useful characteristics to have in retail environments, for example.
Clothes can have properties of material type, size and colour – so queries of “show all cotton T-shirts in blue in size XXL” can quickly throw up results via a retail website. So, too, could “show all cotton T-shirts in blue in size XXL that have short sleeves with green trimming that can be delivered to my address tomorrow morning”.
In short, graph databases are not JADBs. They have a role to play in the mix of data stores that organisations are using, and the key is to understand where they stand to be superior to the other database options available.
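A hedged sketch of how such a query might work: each product is linked to the values that describe it, and a query simply intersects the sets of products linked to each requested value. The product names, the postcode "AB1" and the `link` and `find` helpers are all invented for the example.

```python
from collections import defaultdict

# (relationship, value) -> set of products linked to that value
index = defaultdict(set)


def link(product, **properties):
    """Connect a product to each value that describes it."""
    for relationship, value in properties.items():
        index[(relationship, value)].add(product)


link("shirt-1", material="cotton", colour="blue", size="XXL",
     sleeves="short", trim="green", delivers_tomorrow_to="AB1")
link("shirt-2", material="cotton", colour="blue", size="XXL",
     sleeves="long", trim="white", delivers_tomorrow_to="AB1")
link("shirt-3", material="polyester", colour="blue", size="XXL",
     sleeves="short", trim="green", delivers_tomorrow_to="AB1")


def find(**pattern):
    """Products linked to every value named in the pattern."""
    matches = [index.get(key, set()) for key in pattern.items()]
    return set.intersection(*matches) if matches else set()


# "Show all cotton T-shirts in blue in size XXL"
print(sorted(find(material="cotton", colour="blue", size="XXL")))

# The longer query just adds more relationships to the pattern.
print(sorted(find(material="cotton", colour="blue", size="XXL",
                  sleeves="short", trim="green",
                  delivers_tomorrow_to="AB1")))
```

Adding conditions narrows the result without changing the shape of the query, which is the property that makes the longer request no harder to express than the shorter one.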