Guide to NoSQL databases: How they can help users meet big data needs
Under the banner of big data marches a confusing mass of terminology and acronyms. NoSQL is among them, attracting growing interest from businesses struggling to cope with exploding volumes and new types of data.
E-commerce, social media and smart devices are all placing data demands on businesses that relational database systems cannot support at speed and reasonable cost. Although NoSQL technologies promise a solution, the term groups together a range of approaches to data management, each of which has its strengths and weaknesses depending on the problem corporate IT professionals are trying to solve.
New applications in the field of NoSQL are characterised by polyglot persistence, says Matthew Aslett, research director for data platforms and analytics at The 451 Group. This concept is credited to Martin Fowler, software researcher and independent consultant at ThoughtWorks, who designed a web application using NoSQL databases Riak, Neo4j, MongoDB and Cassandra, as well as an RDBMS, for distinct data sets.
Such a mish-mash of technologies may be anathema to IT managers who like to see standardisation, optimisation and clarity in support, but they may have to get used to it, says Aslett.
“You are using each technology for a very focused purpose,” he says. “If you look from enterprise IT, that’s not necessarily a good thing. You get an application with four or five databases under it, all of which are interdependent on each other, with different support relationships and only one person knows how it is stitched together.
“Organisations are now making strategic choices, looking at which [NoSQL] database is useful for which purpose, which ones the developers like to use, and trying to reduce that set. It is accepting the inevitable and planning for it rather than trying to hold back the tide.”
This is the case with experienced NoSQL users, such as the global distribution system Amadeus, which distributes flight inventory from 700 airlines to thousands of travel agents and internet booking engines worldwide. It uses both NoSQL database Couchbase and relational database Oracle to manage flight inventory and booking systems, while it is also piloting NoSQL MongoDB for document storage (see below).
Dietmar Fauser, vice-president of architecture and infrastructure at Amadeus R&D, says: “You need to understand your problem and pick the technology that fits it best.”
Water management company i2O uses Cassandra
Cassandra is a column-store NoSQL database that has benefited water management business i2O by prioritising availability over consistency of data. Data is stored together in columns, rather than the rows used in relational databases. The distributed nature of the database means creating very wide rows does not harm performance.
But the business also uses Elasticsearch as a NoSQL database to store irregular documents for IT auditing and continues to invest in relational database PostgreSQL.
i2O helps water companies and businesses with significant water needs to reduce leaks and over-supply. Intelligent hardware, such as valves and pressure meters, is linked to cloud-based learning algorithms to ensure its customers give the right volumes of water to consumers when it is needed, saving 200 million litres of water a day.
Software and IT director Mike Williams says the system needs to cope with large volumes of time series data from devices in the water network.
The company chose Cassandra because it was inherently able to manage very wide rows in a distributed database. “When we started experimenting with Cassandra, we found it was so much more scalable and effective on that type of data, even though we did not get our design perfect first time,” says Williams. “We can get a lot more in the column-oriented view with no performance hit in retrieving that volume of data.”
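The wide-row layout Williams describes can be pictured with a small sketch. It is an illustration only, not i2O's schema or Cassandra's API: the names `WideRowStore`, `append` and `read_range` are hypothetical, and an in-memory Python dictionary stands in for a distributed cluster. Each device's readings sit in one partition, kept sorted by timestamp, so a time-range read touches a single partition however long the row grows.

```python
import bisect
from collections import defaultdict


class WideRowStore:
    """Toy in-memory model of a wide-row layout for time series data.

    Hypothetical illustration: not Cassandra's API and not i2O's design.
    """

    def __init__(self):
        # partition key (device id) -> list of (timestamp, value), kept sorted
        self._partitions = defaultdict(list)

    def append(self, device_id, timestamp, value):
        # Insert in timestamp order, much as a clustering column would.
        bisect.insort(self._partitions[device_id], (timestamp, value))

    def read_range(self, device_id, start, end):
        # Return readings with start <= timestamp < end from one partition.
        row = self._partitions.get(device_id, [])
        lo = bisect.bisect_left(row, (start,))
        hi = bisect.bisect_left(row, (end,))
        return row[lo:hi]


store = WideRowStore()
store.append("meter-1", 30, 2.5)
store.append("meter-1", 10, 2.1)
store.append("meter-1", 20, 2.3)
print(store.read_range("meter-1", 10, 30))
```

Keeping all of a device's readings together is what makes a very wide row cheap to scan; in a real cluster the partition key would also decide which node holds the row.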
Because it was built to be distributed, Cassandra, which is backed by services firm DataStax, makes it easier to add new nodes than with other systems, says Williams. “It's a doddle. I worked with databases for a large part of my career and clustered relational databases were never as easy as this.”
Stored in cloud-based system
Temetra is another utility firm that collects gas and water meter data and stores it in a cloud-based system. It selected NoSQL database Riak, from Basho, for its “key value” qualities, which allow it to store large volumes of unstructured data.
Since 2010, the Irish company has expanded in the UK and has experienced a twenty-fold increase in the number of meters managed while the data collected has jumped from 300 pieces of data per meter per year to about 35,000 in cases where data is read every 15 minutes.
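The figure of about 35,000 follows directly from the 15-minute reading interval. A quick check, assuming a 365-day year and no missed readings (both assumptions, not figures from Temetra):

```python
MINUTES_PER_DAY = 24 * 60
READ_INTERVAL_MINUTES = 15
PREVIOUS_READINGS_PER_YEAR = 300  # earlier figure quoted in the article

readings_per_day = MINUTES_PER_DAY // READ_INTERVAL_MINUTES  # 96
readings_per_year = readings_per_day * 365                   # 35,040
growth_factor = readings_per_year / PREVIOUS_READINGS_PER_YEAR

print(readings_per_year, round(growth_factor))  # prints: 35040 117
```

So each such meter produces more than 100 times as much data as before, on top of the twenty-fold growth in the number of meters.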
The company had been using relational database PostgreSQL, but the growing data volume was hitting performance. Software engineer and Temetra founder Paul Barry opted to move to Riak after experimenting with Cassandra.
He says the move to NoSQL requires new thinking by IT management. “We are still learning a lot about it,” he says. “If you put three experienced SQL engineers in the room and ask them to map some data, they will probably come up with pretty much the same pattern. With NoSQL, it’s a blank sheet. Our first case was not perfect. It was reliable and it stored the data, but it was not the best way to accommodate all use cases. We have since changed the way we store data. You have to be a bit nimble with NoSQL to accommodate changing your mind.”
Temetra continues to use legacy SQL systems for transaction data and NoSQL search technology Solr, but Riak is underpinning rapid expansion, says Barry. “I sleep at night. We know we’re getting another million meters in six months and I know we can provision that.”
NoSQL databases may require IT managers to step outside their comfort zone. The technologies are varied and lack standard tools, skills and approaches to data mapping. But, used for the right problem, they are helping businesses improve data performance in markets where that brings a competitive advantage.
Case study: GDS gets to grips with global data growth
Amadeus is among the world’s largest travel global distribution systems, enabling consumers and travel agents to access flights from 700 airlines, as well as hotel rooms and parking spaces. The system was once used only by travel agents, but the birth of the online travel site means its flight database is accessed 3 or 4 million times per second.
Growing mobile travel search and booking could see this rise to 20 or 30 million times per second, says Dietmar Fauser, vice-president of architecture and infrastructure at Amadeus R&D.
The Amadeus R&D department moved away from mainframes to Unix (and later Linux) and relational databases from Oracle in the early 2000s. But with the extra demands of online traffic, it needed to find a new way to offer access to inventory data.
Amadeus split the read part of its database, which gives users views of flight availability, out of the Oracle database, duplicated it and sent it into a distributed in-memory environment based on open-source Memcached, a key-value NoSQL data cache. This produced a two-orders-of-magnitude performance improvement over the relational database, but it needed to be refined, so Amadeus opted for Couchbase, a similar NoSQL database, says Fauser.
The problem was that Memcached has no persistence, so if a node fails, the data it held cannot be recovered. Users either need to code persistence themselves, or use the code already available in the Couchbase database to provide it, says Fauser.
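The difference Fauser describes can be shown with a minimal sketch. It assumes nothing about how Memcached or Couchbase work internally: `VolatileCache` and `PersistentCache` are hypothetical classes, and a plain Python list stands in for durable storage that survives a node restart.

```python
class VolatileCache:
    """Memory-only key-value cache: contents vanish when the node fails."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def crash_and_restart(self):
        # Simulate a node failure: everything held in memory is lost.
        self._data = {}


class PersistentCache(VolatileCache):
    """Same cache, but every write is also appended to a durable log."""

    def __init__(self, log):
        super().__init__()
        self._log = log  # stand-in for disk; assumed to survive restarts

    def set(self, key, value):
        self._log.append((key, value))  # record the write before caching it
        super().set(key, value)

    def crash_and_restart(self):
        super().crash_and_restart()
        for key, value in self._log:  # replay the log to rebuild memory
            self._data[key] = value


durable_log = []
cache = PersistentCache(durable_log)
cache.set("LHR-JFK:seats", 12)
cache.crash_and_restart()
print(cache.get("LHR-JFK:seats"))  # still 12 after the simulated failure
```

A `VolatileCache` put through the same steps returns `None` after the restart, which is the gap users would otherwise have to close with their own code.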
NoSQL databases generally trade consistency for performance, accepting “eventual consistency” in place of the strict rules of relational databases. Amadeus continues to use Oracle to resolve transactions, and the NoSQL read data is brought into line with it in about a second.
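Eventual consistency between a transactional store and a separate read copy can be sketched as follows. This is a simplified model under stated assumptions, not Amadeus's architecture: `Primary` and `ReadReplica` are hypothetical names, and an explicit `sync` call stands in for the replication that the article says takes about a second.

```python
from collections import deque


class Primary:
    """Transactional store: every write is applied here first."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # changes not yet shipped to the read copy

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))


class ReadReplica:
    """Read-only copy that lags the primary until the next sync."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)

    def sync(self, primary):
        # Apply queued changes in order; afterwards the two stores agree.
        while primary.pending:
            key, value = primary.pending.popleft()
            self.data[key] = value


primary, replica = Primary(), ReadReplica()
primary.write("LHR-JFK:seats", 9)
replica.sync(primary)
primary.write("LHR-JFK:seats", 8)      # a booking is resolved on the primary
print(replica.read("LHR-JFK:seats"))   # 9: the read copy is briefly stale
replica.sync(primary)
print(replica.read("LHR-JFK:seats"))   # 8: the copies have converged
```

The read side answers immediately from its own copy and accepts a short window of stale data, while the final say on any booking stays with the transactional store.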
Meanwhile, Amadeus is also piloting MongoDB, a document-based NoSQL database, for an e-ticketing database and revenue accounting system. “We accept divergent technology if we believe it fits our needs,” says Fauser.