Shipping and delivery specialists generate huge amounts of data every day as part of the services they provide, and they have to deal with that information in real time. France-based Chronopost, a parcel delivery company that is part of the DPDgroup and linked to France’s national postal service, is no exception.
The company, which has 3,500 employees and serves 230 countries worldwide, distributed 114,500,000 parcels in 2014.
Chronopost software engineer Alexander Dejanovski says the company generates and uses a great deal of granular data in the process of tracking and delivering parcels. The challenge is to process this data with sub-second response times.
“We were also looking for ways to improve resiliency, which is a common issue when using monolithic databases using external replication systems and custom software/network switches,” says Dejanovski. “On top of that, there was a desire to lower our software licence fees, which led to the search for open-source NoSQL solutions.”
Not an easy choice
Chronopost found that choosing a NoSQL database was no easy matter as there are hundreds of databases and not a single standard.
“Our database had to be a safe place to store data that wouldn’t exist elsewhere in our system. It had to be fast and scalable, it had to provide a high level of resiliency with native replication and it had to be simple to use and operate,” says Dejanovski.
After a thorough search, the company compiled a shortlist of three platforms: HBase, Cassandra and MongoDB. Following tests of the technology, Apache Cassandra emerged as the preferred option for Chronopost.
The right fit at the right time
“We happened to look at Cassandra at the right time I guess, when 1.2 was the stable release and 2.0 was available for testing,” says Dejanovski. “Previous versions were quite different and wouldn’t have fit our needs. Things obviously move very fast in the NoSQL area.”
Cassandra is used for managing data at scale, where information is created at fast rates by anywhere from hundreds to millions of users and their devices. This means Cassandra is suited to use cases such as internet of things (IoT) projects, telecoms messaging platforms, personalisation systems on e-commerce sites, recommendation engines such as the one used by Netflix, and fraud detection systems. Cassandra allows companies to capture and then analyse all that data as it is created.
“Cassandra was the perfect fit for us – peer-to-peer architecture, Hadoop/Spark connectivity with no effort, very easy to operate (a single process to run on each machine) and a strong schema. Furthermore, it provided Cassandra Query Language (CQL) that eases transitioning from an RDBMS [relational database management system] since it is almost a subset of SQL,” says Dejanovski.
Companies might opt for Cassandra rather than a Microsoft SQL Server or Oracle database because of the sheer scale of data being created. All of those transactions have value, and a company does not want to lose any of them. For example, with an IoT project, all the transactions or information produced by each device have to be captured. This is referred to as “time-series” data, and it provides a better picture of each customer as they carry on using a service or device.
“Cassandra’s best use case is time series. Being an express shipment company, we mostly deal with time series (the life of each parcel),” says Dejanovski. “We are a real-time company, giving constant feedback to our customers about their parcels. We deliver them fast so data must travel even faster throughout our systems to meet expectations.”
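To illustrate what “the life of each parcel” can look like as time-series data, here is a minimal sketch in Python. It is not Chronopost's schema or code: the table name parcel_events, its columns and the ParcelEventStore class are all hypothetical. The CQL string shows one common way to lay out such a table, with the parcel identifier as the partition key and the event time as the clustering column. The class is only an in-memory stand-in that mimics that layout so the idea can be run without a Cassandra cluster.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical CQL table definition, shown for reference only.
# parcel_id is the partition key; event_time is the clustering column,
# so each parcel's events are stored together, newest first.
SCHEMA_CQL = """
CREATE TABLE parcel_events (
    parcel_id  text,
    event_time timestamp,
    status     text,
    PRIMARY KEY ((parcel_id), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
"""


class ParcelEventStore:
    """In-memory stand-in for the table above (illustration only)."""

    def __init__(self):
        # One dict per partition: {event_time: status}.
        self._partitions = defaultdict(dict)

    def record(self, parcel_id, event_time, status):
        # Writing the same (parcel_id, event_time) twice overwrites the
        # earlier row, as an insert on an existing primary key would.
        self._partitions[parcel_id][event_time] = status

    def history(self, parcel_id):
        # Return all events for one parcel, newest first.
        rows = self._partitions.get(parcel_id, {})
        return sorted(rows.items(), reverse=True)

    def latest(self, parcel_id):
        # Return the most recent (event_time, status) pair, or None.
        rows = self.history(parcel_id)
        return rows[0] if rows else None


if __name__ == "__main__":
    store = ParcelEventStore()
    store.record("XY123", datetime(2015, 3, 2, 8, 15), "out for delivery")
    store.record("XY123", datetime(2015, 3, 1, 18, 40), "arrived at depot")
    store.record("XY123", datetime(2015, 3, 2, 11, 5), "delivered")
    for event_time, status in store.history("XY123"):
        print(event_time.isoformat(), status)
```

In this layout, answering “where is my parcel?” means reading the first row of a single partition, which is the kind of access pattern that suits a time-series workload.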
Cassandra helped to iron out some major challenges for Chronopost. For example, Dejanovski says it provided a Java Database Connectivity (JDBC) driver so the company could port its enterprise application integration (EAI) supervision application with limited effort.
“Same protocol, slightly different schema – and overall we didn’t have to rewrite the app, just change the few parts that were interacting with the database,” he adds.
An unexpected development was that Chronopost ended up with a much simpler data schema than the RDBMS version.
“We were blown away by Cassandra’s speed. By the end of the proof of concept our app was four to five times faster than its RDBMS version. Latencies dropped from 100-120ms [milliseconds] to 10-30ms per message, with very fast writes that allowed us to handle a load we couldn’t absorb before very easily – all of this running on inexpensive commodity hardware,” says Dejanovski.
Having a faster database allows Chronopost to build new services, says Dejanovski.
“Technically, we could have made them with RDBMS, but were limited by the power needed to fuel them, keeping in mind that vertical scalability is painful (downtimes, migrations) and expensive,” he says.
“Databases have been the bottleneck and the single point of failure of every IT system and we are glad that technologies like Cassandra are changing this.”
Dejanovski has other plans for Cassandra, noting that the delivery company already has a solid multi-datacentre Cassandra cluster in production and is currently building a search cluster and an analytical cluster, using Apache Spark on top of Cassandra.
“So far we’ve learned how to build real-time apps backed by Cassandra, and have many strong projects in development that use it. On the development side though, it forces us to leave the ‘bulk insert then process with stored procedures’ paradigm to something that better fits in the micro-service paradigm,” he adds.
Chronopost is now working with its operations team and database administration to benefit from the high resiliency offered by Cassandra and automate aspects such as backups, “which in a distributed world are much harder than in a monolithic one”, says Dejanovski.
A further challenge will be to switch from traditional business intelligence to big data analytics, bringing in developers who are mostly used to SQL and asking them to build Scala scripts for Spark.
“Cassandra will help us to build fast and resilient applications for our customers, and give us power to build new innovative services,” concludes Dejanovski.