There is a lot of data out there, and much of it doesn’t need to be stored in heavyweight relational databases with complex query languages.
That is particularly true of modern cloud applications and big data services, leading to the rise of a new category of NoSQL data stores from a new breed of suppliers and open source projects.
You will find them used in a wide variety of application stacks: Cassandra, MongoDB, Hadoop, Riak, CouchDB – the list goes on.
But those new players are not the only NoSQL platforms out there.
You will find offerings from most database suppliers, including the two largest, Microsoft and Oracle. The two have taken different routes to NoSQL, one offering a NoSQL store that runs alongside its existing tools, and the other building new services that run as part of a set of hyperscale cloud offerings.
Microsoft’s approach to NoSQL is built around its Azure cloud service, with a focus on supporting “born-in-the-cloud applications”.
While the key/value-based Azure Tables has been around since the early days of the service, a larger-scale service, DocumentDB, has just been made available globally. If you want to run Microsoft’s NoSQL tools on-premise, you will need to run its Azure Stack, which will initially offer a local version of Microsoft’s Tables NoSQL key/value store.
NoSQL stores serve two key roles in Azure: as a source of operational data and as a source of analytical data. Analytical data is generally addressed using Hadoop via Microsoft’s HDInsight service (which also gives you access to Apache HBase column stores), with the option of installing and running your own Hadoop servers on Azure’s infrastructure as a service platform.
HDInsight is perhaps best thought of as Microsoft’s implementation of a commonly used big data NoSQL solution, and is comparable to other Hadoop implementations and services, although it does offer direct access to on-premise SQL Server installations.
Azure’s Table store was the original Azure database, a key/value store for cloud applications that runs as an Azure platform service. It can be used by any application on Azure and has a set of easy-to-use application programming interfaces (APIs), as well as being part of the Azure App Service software development kit. Like most key/value stores, Azure table storage is intended to help manage sessions in stateless cloud applications or in microservice instances.
Using tables is relatively simple. Data is stored in partitioned tables with row and partition keys. You can access any value stored in a table with a row and partition key, returning either XML or JSON ready for use in your application.
There is no schema to worry about, and each stored entity can hold different information. Access is quick, so the service is ideal for managing the front end of a web or mobile application. However, there is little or no support for complex queries or for transactions outside the scope of a single partition in a table.
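To make the partition key and row key model concrete, here is a minimal in-memory sketch. It does not use the Azure SDK; the `ToyTable` class, its methods and the sample keys are hypothetical names chosen purely for illustration. It shows only the two ideas described above: every entity is addressed by a partition key plus a row key, and entities in the same table need not share the same properties.

```python
import json
from typing import Any, Dict


class ToyTable:
    """In-memory stand-in for a partitioned key/value table (not the Azure SDK)."""

    def __init__(self) -> None:
        # partition key -> row key -> entity (a free-form dict of properties)
        self._partitions: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def upsert(self, partition_key: str, row_key: str, **properties: Any) -> None:
        """Insert or overwrite the entity stored under the two keys."""
        partition = self._partitions.setdefault(partition_key, {})
        partition[row_key] = dict(properties)

    def get(self, partition_key: str, row_key: str) -> Dict[str, Any]:
        """Point lookup: both keys are required to find an entity."""
        return self._partitions[partition_key][row_key]


table = ToyTable()
# No schema: the two entities below carry different properties.
table.upsert("sessions-eu", "user-42", cart=["book"], theme="dark")
table.upsert("sessions-eu", "user-43", last_page="/checkout")

session = table.get("sessions-eu", "user-42")
session_json = json.dumps(session, sort_keys=True)  # JSON ready for a front end
```

A real service adds persistence, replication and access control on top of this, but the addressing scheme is the part an application developer works with day to day.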
If you are building web or mobile apps that use Azure’s back-end services, you will want to use Tables. The service is cheap and fast, and can be configured to replicate data across two or more regions to help ensure your services continue to operate in the event of service failures. One thing to note is that there is no guaranteed compute capacity, so you may see performance varying from operation to operation. Still, all you’re paying for is the storage you use.
DocumentDB is a fascinating tool. Microsoft describes it as a “planet-scale database”. Unlike many NoSQL databases, it is designed to be distributed, taking advantage of the global nature of Azure to spread database content and services around the world. This approach brings many benefits, but also raises new issues around key distributed computing concepts such as concurrency and consistency.
If you are using DocumentDB at scale, working across several instances, you have four options for handling consistency between instances. Data will automatically replicate across servers and stores, but trade-offs need to be made to handle data consistency and performance.
The obvious option is strong consistency, which ensures that applications cannot access data until all the replicas are up to date. That means reads will be slow, but you are always guaranteed access to consistent data. The other option you will normally find in concurrent data stores is eventual consistency. Here you can read data at any time, but you might be reading data from an out-of-date instance that doesn’t have an updated replica – and there is no guarantee of when that replica will be updated.
Microsoft gives DocumentDB users two further options that sit between these extremes, both of which are proving much more popular than the traditional consistency models.
The first, known as bounded staleness, manages the order in which updates are made to a replica. An application accessing a DocumentDB will see updates in the order in which they have been made, but there will be a short period of time in which it might be reading older data.
Then there is session consistency, which allows an application to see its own changes as it makes them, but any data from another instance or another application working with the same DocumentDB store may be out of date or out of order.
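The difference between these models is easier to see in a toy simulation. The sketch below is not DocumentDB code and makes no claim about how the service is implemented; `ToyStore`, its methods and the idea of using a write version as a session token are assumptions made for illustration. It models one primary and one lagging replica, and contrasts an eventually consistent read with a session read that always reflects the caller's own writes.

```python
class ToyStore:
    """One primary plus one lagging replica; a teaching model only."""

    def __init__(self) -> None:
        self.primary = {}          # key -> value, always current
        self.replica = {}          # key -> value, updated only by replicate()
        self.version = 0           # number of writes accepted by the primary
        self.replica_version = 0   # number of writes applied to the replica
        self.pending = []          # writes waiting to reach the replica

    def write(self, key, value):
        self.version += 1
        self.primary[key] = value
        self.pending.append((key, value))
        return self.version        # hypothetical session token

    def replicate(self):
        # Apply pending writes in the order in which they were made.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()
        self.replica_version = self.version

    def read_primary(self, key):
        # Stand-in for a strongly consistent read: always current.
        return self.primary.get(key)

    def read_eventual(self, key):
        # May return stale data if the replica has not caught up.
        return self.replica.get(key)

    def read_session(self, key, token):
        # Use the replica only if it has seen this session's writes.
        if self.replica_version >= token:
            return self.replica.get(key)
        return self.read_primary(key)


store = ToyStore()
store.write("greeting", "hello")
store.replicate()

token = store.write("greeting", "hi")        # not yet replicated
stale = store.read_eventual("greeting")      # replica still has the old value
own = store.read_session("greeting", token)  # session sees its own write

store.replicate()
caught_up = store.read_eventual("greeting")
```

Bounded staleness could be modelled in the same way by letting reads use the replica only while `version - replica_version` stays under a chosen limit.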
More than just a document store
Microsoft’s product naming sometimes leaves a lot to be desired, and DocumentDB is a lot more than just a document store. It is designed to work with unstructured JSON data, dynamically creating schema and mappings between blocks of data as it is loaded.
For developers that want to transition from on-premise NoSQL to DocumentDB, Microsoft also offers MongoDB-compatible APIs. MongoDB is one of the more popular NoSQL platforms, so giving its developers access to a cloud NoSQL system with just a change of endpoints makes a lot of sense. It simplifies code changes considerably, while ensuring developers can work with familiar tools and queries and have a massively scalable service to handle scale-up and scale-out scenarios.
DocumentDB is currently cloud-only, but with Microsoft’s renewed focus on developers and development tooling, an on-premise release cannot be ruled out. Many Azure technologies, such as the microservice host Service Fabric, are getting on-premise releases outside of the Azure Stack, and it would be no surprise to see DocumentDB following suit.
Microsoft may not have an on-premise NoSQL product, but that is not the case with Oracle. Building on its purchase of Sleepycat Software in 2006, it has built its Oracle NoSQL database on top of Sleepycat’s Berkeley DB. That has led to a relatively complex freemium licensing model, with a community edition that uses an AGPL licence (a variant of the GPL that means the source code must be given to anyone accessing software over a network, not just running it on their own systems) and Oracle’s own commercial licence for the Enterprise Edition.
Under the hood, Oracle NoSQL is a key/value store that uses shards to handle high availability, with data replicated across each node in a shard. That helps with performance, as replicas are continuously kept up to date, with a master node handling writes and replicas handling reads. This is designed for stores that handle more reads than writes, and where data is not distributed outside a single datacentre. As you increase or reduce the number of shards, data is reallocated across the entire database without needing a restart.
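The shard layout described above can be sketched in a few lines. This is not Oracle NoSQL code or its API: `ToyCluster`, `Shard` and `shard_for` are hypothetical names, and routing by a simple hash modulo the shard count, along with copying writes to replicas synchronously, are simplifying assumptions. The sketch shows writes going to a shard's master, reads being served by its replicas, and data being reallocated when the number of shards changes.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard from a stable hash of the key (assumed routing scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class Shard:
    def __init__(self, replica_count: int = 2) -> None:
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next = 0  # round-robin cursor for reads

    def put(self, key, value):
        # The master takes the write, then every replica receives a copy.
        self.master[key] = value
        for replica in self.replicas:
            replica[key] = value

    def get(self, key):
        # Reads are spread across replicas, leaving the master free for writes.
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica.get(key)


class ToyCluster:
    def __init__(self, shard_count: int) -> None:
        self.shards = [Shard() for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))].put(key, value)

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def resize(self, shard_count: int) -> None:
        # Reallocate every entry across the new set of shards.
        items = [(k, v) for s in self.shards for k, v in s.master.items()]
        self.shards = [Shard() for _ in range(shard_count)]
        for key, value in items:
            self.put(key, value)


cluster = ToyCluster(shard_count=3)
for n in range(10):
    cluster.put(f"user-{n}", {"visits": n})

before = cluster.get("user-7")
cluster.resize(5)               # grow the cluster without losing data
after = cluster.get("user-7")
```

A production system would move only the affected data rather than rewriting everything, but the toy version is enough to show why a read-heavy workload suits this design.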
There is support for a range of data types, including JSON documents and key-value pairs, with a wide selection of drivers and APIs. Oracle also brings its relational database background to NoSQL, and you can use Oracle NoSQL to host tables and use ACID transactions to ensure reliable operations. The result is a useful NoSQL service that builds on the familiar Berkeley DB to deliver an enterprise-ready NoSQL store, which can also act as a feed into big data services.
Oracle NoSQL is not the company’s only NoSQL offering, although its other tools stretch the definition of NoSQL considerably. One option is a set of APIs that give NoSQL-like access to MySQL, letting you use a MySQL installation for both standard relational and key-value operations. This approach can make sense if you’re building a web or mobile application that needs key-value operations for users, while still offering more complex queries to back-end services.
If you’re building an application that needs to scale and doesn’t need the complex query capabilities of a relational database, then NoSQL is the way to go. With a mix of tools on the market focused on key/value stores, on time-series work, and on distributed cache services, it is worth exploring the options. Microsoft’s and Oracle’s offerings are designed to support very different scenarios, but together they are a validation of the entire category – and give you an option that fits in with existing licences and existing infrastructures.