In-memory databases - what they do and the storage they need

As processor speeds increase, the need to reduce latency between the CPU and data becomes more pressing. That need has driven the rise of local flash storage and PCIe flash solutions.

But if that boost in performance isn’t enough, there is always the option to place data directly into system memory. This is the concept behind in-memory databases that put data in memory to achieve the fastest possible performance.

What is an in-memory database?

In-memory databases put the working set of data into system memory, either completely, in the case of solutions such as SAP Hana, or partially, based on the identification of tables that will benefit most from DRAM speed.

There is an obvious performance benefit in the reduced latency in-memory database solutions bring, even over heavily cached systems, which can only optimise database read requests.

But in-memory databases are subtler than that. This is because they provide an opportunity to optimise the way data is managed compared to traditional databases on disk-based media.

When all data is kept in memory, the need to deal with issues arising from the use of traditional spinning disks disappears. This means, for example, there is no need to maintain additional cache copies of data and manage synchronisation between them. 

Data can also be compressed and decompressed in memory more easily, resulting in the opportunity to make space savings over the equivalent disk copy.
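As an illustration of how data can be held in a more compact form in memory, here is a minimal Python sketch of dictionary encoding, a technique commonly associated with in-memory column stores. The article does not name a specific compression method, so the technique and the function names `dictionary_encode` and `dictionary_decode` are assumptions made for illustration only.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code.

    Returns (dictionary, codes), where dictionary[code] is the original value.
    """
    dictionary = []  # code -> original value
    index = {}       # original value -> code
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


# A column with many repeated values stores each distinct string only once.
column = ["London", "Paris", "London", "Berlin", "Paris", "London"]
dictionary, codes = dictionary_encode(column)
restored = dictionary_decode(dictionary, codes)
```

The saving grows with the amount of repetition in the column, because each repeated string is replaced by a small integer.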

So, why not simply create a RAM disk in memory and move the database to this virtual volume to achieve similar results?

This could be done, but the internal algorithms of the database would still manage data as if it were on disk and so perform tasks such as pre-fetching, caching and lazy writes. And that would be less than optimal in terms of performance and use more processor time.

Instead, in-memory database solutions have logic specifically adapted to work with data in DRAM.

However, system memory is volatile, which means in-memory databases conform to only three of the four ACID database properties: atomicity, consistency, isolation and durability. Of these, durability cannot directly be served by in-memory database solutions, because data is lost when power is removed from the server.

Overcoming the shortcomings of volatile memory

But there are solutions to the problem. These include keeping additional copies of data in clustered and scale-out databases that allow systems to keep running by replicating updates to one or more standby systems.

Some database systems also perform periodic commits-to-disk to maintain state to a point from which recovery can be performed in the case of a server crash. Here there is a trade-off between the time between commits (and subsequent recovery) and the overhead of the commit process on performance.
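The trade-off can be sketched with a toy key-value store that writes a checkpoint to disk after a fixed number of updates. This is a simplified illustration, not how any particular product works; the class `CheckpointedStore`, its `checkpoint_every` parameter and the JSON file format are all hypothetical.

```python
import json
import os
import tempfile


class CheckpointedStore:
    """Toy in-memory store that saves its state to disk every N updates."""

    def __init__(self, path, checkpoint_every):
        self.path = path
        self.checkpoint_every = checkpoint_every
        self.data = {}
        self.updates_since_checkpoint = 0

    def put(self, key, value):
        self.data[key] = value
        self.updates_since_checkpoint += 1
        if self.updates_since_checkpoint >= self.checkpoint_every:
            self.checkpoint()

    def checkpoint(self):
        # Write to a temporary file, then rename, so that a crash during
        # the write cannot leave a half-written checkpoint behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
        self.updates_since_checkpoint = 0

    @classmethod
    def recover(cls, path, checkpoint_every):
        """Rebuild the store from the last checkpoint, if one exists."""
        store = cls(path, checkpoint_every)
        if os.path.exists(path):
            with open(path) as f:
                store.data = json.load(f)
        return store


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "checkpoint.json")
        store = CheckpointedStore(path, checkpoint_every=3)
        for i in range(5):
            store.put(f"k{i}", i)
        # Simulated crash: the in-memory state is discarded and only the
        # checkpoint file survives.
        recovered = CheckpointedStore.recover(path, checkpoint_every=3)
        return recovered.data
```

In `demo()`, the checkpoint is taken after the third update, so the fourth and fifth updates are lost in the simulated crash. A smaller `checkpoint_every` narrows that window but means more frequent disk writes.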

Because of the perceived risk of in-memory databases compared with traditional OLTP databases, a degree of caution has been evident with regard to the types of applications they are used for. The result is that in-memory database technology has largely been avoided for general OLTP applications, and instead targeted at specific data types or analytics requirements (including batch reporting) where re-running transactions can easily be achieved.

This also makes sense from a budget perspective as DRAM is still more expensive than disk or even flash, which can provide the I/O performance required without compromising data durability.

Having said that, in-memory databases are set to move into the OLTP world as the acceptance and adoption of the technology continues, and businesses have started to use SAP Hana for OLTP workloads.

In addition, the release of Microsoft SQL Server 2014 promises to offer in-memory capability with the use of “memory optimised tables” that allow portions of a database to be placed into system memory.

Meanwhile, database giant Oracle has announced an in-memory option for its main database platform, which promises high levels of performance without application changes.

Storage for in-memory databases

Although the operation of in-memory databases occurs in system memory, there is a need for permanent storage media.

There are two main in-memory database storage requirements: permanent media to store committed transactions, which maintains durability and allows recovery if a database needs to be reloaded into memory; and permanent storage to hold a copy or backup of the database in its entirety.
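The two requirements can be pictured as a commit log plus a full backup. The Python sketch below is a deliberately simplified illustration under that assumption; the class `LoggedStore`, the file names and the JSON line format are hypothetical and do not reflect any vendor's implementation.

```python
import json
import os
import tempfile


class LoggedStore:
    """Toy in-memory store with a durable commit log and a full backup."""

    def __init__(self, directory):
        self.backup_path = os.path.join(directory, "backup.json")
        self.log_path = os.path.join(directory, "commit.log")
        self.data = {}
        self._reload()
        self.log = open(self.log_path, "a")

    def _reload(self):
        # Start from the full backup, then replay commits logged since.
        if os.path.exists(self.backup_path):
            with open(self.backup_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def commit(self, key, value):
        # Force the log record to permanent media before applying the
        # change in memory. This write is the I/O cost paid per commit.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def backup(self):
        # Write a complete copy of the database, then empty the log.
        # Replaying an old log over a newer backup is harmless here,
        # because each record simply sets a key to a value.
        tmp = self.backup_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.backup_path)
        self.log.close()
        self.log = open(self.log_path, "w")

    def close(self):
        self.log.close()


def demo():
    with tempfile.TemporaryDirectory() as directory:
        store = LoggedStore(directory)
        store.commit("a", 1)
        store.backup()
        store.commit("b", 2)
        store.commit("a", 3)
        store.close()  # stand-in for a crash: memory contents are lost
        recovered = LoggedStore(directory)
        result = dict(recovered.data)
        recovered.close()
        return result
```

Here `demo()` recovers every committed change, because each commit reaches the log before it is acknowledged. The price is a synchronous write per commit, which is why the latency of the media holding the log matters.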

When processing commits, disk I/O is the biggest bottleneck, so minimising I/O overhead is critical. This suggests the best storage media to use is flash. Moving flash closer to the processor reduces latency, so PCIe SSD or the recently released range of NVDIMM memory channel storage devices provide the lowest latency possible.

Memory channel storage puts flash on hardware that uses the DIMM form factor and plugs directly into the motherboard of the server, providing solid-state storage on the DRAM bus. This results in extremely low latency I/O but does require BIOS changes and operating system (OS) drivers to allow the OS to identify the memory as non-volatile. The BIOS amendments are required to prevent the server failing the memory during its power-on self-test (POST) at boot time.

IBM is the first server vendor to release NVDIMM technology in its new X6 products, using the brand name eXFlash. Both X6 server and eXFlash technology have been combined with IBM’s DB2 database to create an in-memory option called BLU Acceleration. IBM claims speed improvements of almost 100 times over previous deployments of DB2.

In-memory database performance can be improved with only a small amount of non-volatile local storage, so we can expect to see increased adoption of memory channel storage for databases as vendors adapt and optimise their products.

Flash is clearly a benefit for the requirement to reload databases more quickly too. Reading an entire database into memory from flash will generally be much faster than from spinning disk.
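A rough feel for the difference comes from dividing database size by sustained read throughput. The figures in the sketch below (a 512GB database, 150MB/s for disk and 1,500MB/s for flash) are illustrative assumptions, not measurements, and the function name `reload_seconds` is hypothetical.

```python
def reload_seconds(database_gb, read_throughput_mb_per_s):
    """Estimate the time to read a whole database sequentially from storage."""
    return database_gb * 1024 / read_throughput_mb_per_s


# Assumed, illustrative figures only.
DATABASE_GB = 512
DISK_MB_PER_S = 150
FLASH_MB_PER_S = 1500

disk_minutes = reload_seconds(DATABASE_GB, DISK_MB_PER_S) / 60
flash_minutes = reload_seconds(DATABASE_GB, FLASH_MB_PER_S) / 60
print(f"disk: {disk_minutes:.1f} min, flash: {flash_minutes:.1f} min")
```

Under these assumptions the reload takes roughly an hour from disk and under six minutes from flash. Real reload times also depend on how the database rebuilds its in-memory structures.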

The issue here, of course, is one of cost, with flash being significantly more expensive than disk and, in the case of in-memory database use, being accessed very infrequently. However, in clustered environments the investment in a shared flash-based solution may be a wise one.

In-memory databases promise great leaps in performance, but as we have seen, these solutions still need some traditional storage to operate, irrespective of where the main processing occurs.


This was first published in May 2014

 
