In-memory databases - what they do and the storage they need

In-memory databases offer high-performance data processing, but how do you protect data in volatile DRAM, and what kind of storage is needed for longer-term retention?

Chris Evans

Published: 28 May 2014

As processor speeds increase, the need to reduce latency between the CPU and data becomes more pressing. The answer to that need has seen the rise of local flash storage and PCIe flash solutions.

But if that boost in performance isn’t enough, there is always the option to place data directly into system memory. This is the concept behind in-memory databases that put data in memory to achieve the fastest possible performance.

What is an in-memory database?

In-memory databases put the working set of data into system memory, either completely, in the case of solutions such as SAP Hana, or partially, based on the identification of tables that will benefit most from DRAM speed.

There is an obvious performance benefit in the reduced latency in-memory database solutions bring, even over heavily cached systems, which can only optimise database read requests.

But in-memory databases are subtler than that. This is because they provide an opportunity to optimise the way data is managed compared to traditional databases on disk-based media.

When all data is kept in memory, the need to deal with issues arising from the use of traditional spinning disks disappears. This means, for example, there is no need to maintain additional cache copies of data and manage synchronisation between them.

Data can also be compressed and decompressed in memory more easily, resulting in the opportunity to make space savings over the equivalent disk copy.
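As an illustration of the kind of space saving described above, here is a minimal sketch of dictionary encoding, one compression technique commonly associated with in-memory column stores. The article does not say which scheme any particular product uses, so the function names and the sample column are assumptions made for the example.

```python
from array import array


def dictionary_encode(values):
    """Store each distinct value once and represent rows as small integer codes."""
    dictionary = []          # distinct values, in order of first appearance
    index = {}               # value -> code lookup used while encoding
    codes = array("H")       # unsigned short codes; assumes at most 65,536 distinct values
    for value in values:
        code = index.get(value)
        if code is None:
            code = len(dictionary)
            index[value] = code
            dictionary.append(value)
        codes.append(code)
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


# Hypothetical column with many repeated values, as is typical of a "city" field.
column = ["London", "Paris", "London", "Berlin", "Paris", "London"] * 1000
dictionary, codes = dictionary_encode(column)

# Rough size comparison: characters in the raw column versus codes plus dictionary.
raw_bytes = sum(len(value) for value in column)
encoded_bytes = codes.itemsize * len(codes) + sum(len(value) for value in dictionary)
print(raw_bytes, encoded_bytes)
```

The saving depends entirely on how repetitive the data is. A column of mostly unique values would gain little from this approach.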

So, why not simply create a RAM disk in memory and move the database to this virtual volume to achieve similar results?

This could be done, but the internal algorithms of the database would still manage data as if it were on disk and so perform tasks such as pre-fetching, caching and lazy writes. And that would be less than optimal in terms of performance and use more processor time.

Instead, in-memory database solutions have logic specifically adapted to work with data in DRAM.

However, system memory is volatile, which means in-memory databases conform to only three of the four characteristics in the Acid model of databases: atomic, consistent, isolated and durable. Of these, durability cannot be served directly by in-memory database solutions, because data is lost when power is removed from the server.

Overcoming the shortcomings of volatile memory

But there are solutions to the problem. These include keeping additional copies of data in clustered and scale-out databases that allow systems to keep running by replicating updates to one or more standby systems.
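The replication approach can be sketched in a few lines. This is a simplified, single-process model, assuming synchronous replication and a manual failover step. The class and method names are hypothetical and do not correspond to any vendor's API.

```python
class Node:
    """One in-memory database instance holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class ReplicatedStore:
    """Applies each update to the primary and to every standby before returning."""

    def __init__(self, primary, standbys):
        self.primary = primary
        self.standbys = list(standbys)

    def put(self, key, value):
        # Synchronous replication: the update is only considered complete
        # once every standby holds it too.
        self.primary.apply(key, value)
        for standby in self.standbys:
            standby.apply(key, value)

    def get(self, key):
        return self.primary.data[key]

    def fail_over(self):
        # Promote the first standby; the failed primary's memory is gone.
        self.primary = self.standbys.pop(0)
        return self.primary


store = ReplicatedStore(Node("primary"), [Node("standby-1"), Node("standby-2")])
store.put("order-1001", "shipped")

# Simulate losing the primary server, then read from the promoted standby.
new_primary = store.fail_over()
print(new_primary.name, store.get("order-1001"))
```

A real cluster would also have to deal with network failures and with standbys that fall behind, which this sketch ignores.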

Some database systems also perform periodic commits-to-disk to maintain state to a point from which recovery can be performed in the case of a server crash. Here there is a trade-off between the time between commits (and subsequent recovery) and the overhead of the commit process on performance.
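To make that trade-off concrete, the following sketch writes the whole in-memory state to disk after every few updates. The `checkpoint_every` parameter, the JSON file format and the class itself are assumptions for illustration only. Real products use their own checkpoint mechanisms.

```python
import json
import os
import tempfile


class CheckpointedStore:
    """In-memory key-value store that commits its full state to disk periodically."""

    def __init__(self, path, checkpoint_every):
        self.path = path
        self.checkpoint_every = checkpoint_every
        self.data = {}
        self.writes_since_checkpoint = 0

    def put(self, key, value):
        self.data[key] = value
        self.writes_since_checkpoint += 1
        if self.writes_since_checkpoint >= self.checkpoint_every:
            self.checkpoint()

    def checkpoint(self):
        # Write to a temporary file first, then rename it over the old
        # checkpoint so a crash mid-write cannot corrupt the previous one.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)
        self.writes_since_checkpoint = 0

    @classmethod
    def recover(cls, path, checkpoint_every):
        store = cls(path, checkpoint_every)
        if os.path.exists(path):
            with open(path) as f:
                store.data = json.load(f)
        return store


path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
store = CheckpointedStore(path, checkpoint_every=3)
for i in range(5):
    store.put(f"k{i}", i)

# Simulate a server crash: the in-memory copy is lost and only the
# last checkpoint on disk can be recovered.
recovered = CheckpointedStore.recover(path, checkpoint_every=3)
print(recovered.data)
```

With a checkpoint every three updates, the last two of the five updates are lost in the simulated crash. A smaller interval loses less but spends more time writing to disk.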

Because in-memory databases are perceived as riskier than traditional OLTP databases, there has been a degree of caution about the types of applications they are used for. The result is that in-memory database technology has largely been avoided for general OLTP applications, and instead targeted at specific data types or analytics requirements (including batch reporting) where re-running transactions can easily be achieved.

This also makes sense from a budget perspective as DRAM is still more expensive than disk or even flash, which can provide the I/O performance required without compromising data durability.

Having said that, in-memory databases are set to move into the OLTP world as the acceptance and adoption of the technology continues, and businesses have started to use SAP Hana for OLTP workloads.

In addition, the release of Microsoft SQL Server 2014 promises to offer in-memory capability with the use of “memory optimised tables” that allow portions of a database to be placed into system memory.

Meanwhile, database giant Oracle has announced an in-memory option for its main database platform, which promises high levels of performance without application changes.

Storage for in-memory databases

Although the operation of in-memory databases occurs in system memory, there is a need for permanent storage media.

There are two main in-memory database storage requirements. The first is permanent media to store committed transactions, which maintains durability and allows recovery if a database needs to be reloaded into memory. The second is permanent storage to hold a copy or backup of the database in its entirety.
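The way these two requirements work together can be sketched as a commit log plus a full backup, with recovery loading the backup and then replaying the log. This is a minimal sketch under stated assumptions: the file names, the JSON format and the `DurableStore` class are hypothetical, and it does not handle a partially written final log record.

```python
import json
import os
import tempfile


class DurableStore:
    """In-memory store backed by a commit log and a full backup on permanent media."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "commit.log")
        self.backup_path = os.path.join(directory, "backup.json")
        self.data = {}
        self._recover()
        self.log = open(self.log_path, "a")

    def _recover(self):
        # Reload the last full backup, then replay commits logged after it.
        if os.path.exists(self.backup_path):
            with open(self.backup_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def commit(self, key, value):
        # The commit reaches permanent media before memory is updated.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def backup(self):
        # Write a complete copy of the database, then start a fresh log.
        tmp_path = self.backup_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.backup_path)
        self.log.close()
        self.log = open(self.log_path, "w")

    def close(self):
        self.log.close()


directory = tempfile.mkdtemp()
store = DurableStore(directory)
store.commit("a", 1)
store.commit("b", 2)
store.backup()
store.commit("a", 3)
store.close()  # stands in for the server losing power

reloaded = DurableStore(directory)
print(reloaded.data)
reloaded.close()
```

Every commit here waits on an `fsync` call, which is why the latency of the underlying storage media matters so much for commit processing.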

When processing commits, disk I/O is the biggest bottleneck and minimising I/O overhead is critical. This suggests the best possible storage media to use is flash. Moving flash closer to the processor reduces latency, so PCIe SSD or the recently released range of NVDIMM memory channel storage devices provide the lowest latency possible.

Memory channel storage puts flash on hardware that uses the DIMM form factor and plugs directly into the motherboard of the server, providing solid-state storage on the DRAM bus. This results in extremely low latency I/O but does require Bios changes and operating system (OS) drivers to allow the OS to identify the memory as non-volatile. Bios amendments are required to prevent the server failing the memory on POST boot-time checks.

IBM is the first server vendor to release NVDIMM technology in its new X6 products, using the brand name eXflash. Both X6 server and eXflash technology have been combined with IBM’s DB2 database to create an in-memory option called BLU Acceleration. IBM claims speed improvements of almost 100 times over previous deployments of DB2.

In-memory database performance can be improved by adding even a small amount of non-volatile local storage, so we can expect to see increased adoption of memory channel storage for databases as vendors adapt and optimise their products.

When it comes to reloading databases more quickly, flash is clearly a benefit too. Reading an entire database into memory from flash will always be much faster than from spinning disk.
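A rough calculation shows the scale of the difference. The database size and throughput figures below are illustrative assumptions, not measurements or vendor specifications, and the estimate ignores everything except sequential read speed.

```python
def reload_seconds(database_bytes, read_bytes_per_second):
    """Estimate the time to read a whole database from storage into memory."""
    return database_bytes / read_bytes_per_second


GB = 10**9
MB = 10**6

database_size = 500 * GB       # assumed database size
disk_throughput = 150 * MB     # assumed sequential read rate for a spinning disk
flash_throughput = 1500 * MB   # assumed sequential read rate for a flash device

disk_time = reload_seconds(database_size, disk_throughput)
flash_time = reload_seconds(database_size, flash_throughput)

print(f"disk:  {disk_time / 60:.1f} minutes")
print(f"flash: {flash_time / 60:.1f} minutes")
```

Under these assumptions the reload takes just under an hour from disk and under six minutes from flash. The real figures depend on the devices, the interface and how the database restores its data.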

The issue here, of course, is one of cost, with flash being significantly more expensive than disk and, in the case of in-memory database use, being accessed very infrequently. However, in clustered environments the investment in a shared flash-based solution may be a wise one.

In-memory databases promise great leaps in performance, but as we have seen, these solutions still need some traditional storage to operate, irrespective of where the main processing occurs.
