Red Hat adds erasure coding in Inktank Ceph Enterprise v1.2

Red Hat adds erasure coding aimed at backup and archive use cases to Inktank Ceph Enterprise version 1.2

Red Hat has upgraded its OpenStack storage software distribution Inktank Ceph Enterprise (ICE) to version 1.2. The release adds erasure coding data protection, aimed at cost-efficient backup and archive use, as well as tiered storage functionality.

Ceph is an open-source product that provides object storage, plus block and file access, to the OpenStack cloud environment.

ICE is Red Hat’s commercially supported distribution of Ceph, and currently supports only object storage and block access. Red Hat delivers file-based access via the parallel file system GlusterFS.

ICE – the result of Red Hat’s acquisition of Inktank in May 2014 – is targeted at organisations that want to build storage for cloud services, public or private, and for the provision of data in web-scale operations, said Ross Turk, director of product marketing, storage and big data business unit, Red Hat.

The addition of erasure coding for backup and archive use cases brings greater efficiency in the use of storage capacity than has previously been possible in ICE.

To date, all data in ICE has been protected by triple-mirrored replication. This provides a high level of resilience and rapid recovery from existing full copies, but at the cost of consuming three times the capacity of the data being stored.

Now, using the open-source Jerasure erasure coding library, data protection for backup and archive can be achieved with only around 50% extra capacity on top of the original copy.
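The capacity difference can be worked through with a little arithmetic. The sketch below is illustrative only: the article does not say which erasure coding profile ICE uses, so the split into k=4 data chunks and m=2 coding chunks is an assumption chosen because it yields the "around 50% extra" figure, and the 100TB of user data is a made-up number.

```python
def raw_capacity_replicated(usable_tb: float, copies: int = 3) -> float:
    """Raw capacity needed to hold usable_tb when every object is stored as full copies."""
    return usable_tb * copies


def raw_capacity_erasure_coded(usable_tb: float, k: int, m: int) -> float:
    """Raw capacity needed when each object is split into k data chunks plus m coding chunks."""
    return usable_tb * (k + m) / k


usable = 100.0  # TB of user data (illustrative figure, not from the article)

replicated = raw_capacity_replicated(usable)             # three full copies
erasure = raw_capacity_erasure_coded(usable, k=4, m=2)   # assumed 4+2 profile

print(f"3x replication:          {replicated:.0f} TB raw")
print(f"erasure coding (4 + 2):  {erasure:.0f} TB raw")
```

Under these assumptions the same 100TB needs 300TB of raw disk with triple replication but 150TB with erasure coding, which is where the saving for backup and archive pools comes from.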

But because lost data is rebuilt from a set of parity data rather than read from a full copy, recovery is slower. Red Hat describes the resulting level of resilience as “cost effective durability”.

Jerasure supports several erasure coding algorithms, including Reed-Solomon and its Cauchy variant.

Erasure coding is a method of data protection in which data is broken into fragments, expanded and encoded with a configurable number of redundant pieces, and stored across a set of different locations.

If data is lost or corrupted, it can be reconstructed using information about the data stored elsewhere. It works by creating a mathematical function to describe a set of numbers so they can be checked for accuracy and recovered if one is lost.
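The idea of a mathematical function describing a set of numbers can be shown with a toy example. The sketch below is not Jerasure's implementation and not how Ceph stores data: it treats a handful of byte values as points on a polynomial over the integers modulo 257, an assumption made purely to keep the arithmetic readable. The function names `interpolate`, `encode` and `decode` are hypothetical.

```python
P = 257  # smallest prime above 255, so every byte value fits in the field


def interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, m):
    """Return k data fragments plus m redundant fragments as (position, value) pairs."""
    k = len(data)
    points = list(enumerate(data))
    # Redundant values may be 256 and so not fit in a byte; acceptable for a toy.
    return points + [(x, interpolate(points, x)) for x in range(k, k + m)]


def decode(surviving, k):
    """Rebuild the k original values from any k surviving fragments."""
    points = surviving[:k]
    return [interpolate(points, x) for x in range(k)]


data = [72, 105, 33, 10]          # four "data" values
fragments = encode(data, m=2)     # six fragments in total
surviving = [f for f in fragments if f[0] not in (1, 3)]  # lose two of them
recovered = decode(surviving, k=4)
print(recovered == data)
```

Any four of the six fragments are enough to rebuild the original values, so the data survives the loss of two fragments while storing only 50% more than the original.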

Turk said: “Erasure coding will provide basic data durability in a Ceph cluster and we expect to see most use for infrequently accessed data with the lowest performance requirements. 

“It will allow customers to build backup and archiving pools with a fraction of the data overhead, not the 3x needed for production data.”

Meanwhile, the addition of automated storage tiering to ICE allows data to be moved between tiers of storage defined by performance and data protection method.

For example, an image uploaded to a web page and subsequently unused would reside in the least costly, slowest-to-access drives protected by erasure coding.

But if the image were repeatedly accessed some time later, it would be moved to better-performing drives protected by 3x mirroring.
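The image example can be sketched as a simple promotion rule. This is a hypothetical model, not Ceph's tiering mechanism: the class `TieredStore`, the tier labels, the threshold of three reads and the one-hour window are all assumptions made for illustration, as is the simplification that new objects start in the cold tier.

```python
from dataclasses import dataclass, field

COLD = "erasure-coded"   # hypothetical label: cheap, slow drives
HOT = "replicated-3x"    # hypothetical label: faster drives, 3x mirroring


@dataclass
class TieredStore:
    promote_after: int = 3    # assumed: reads within the window that trigger promotion
    window: float = 3600.0    # assumed: length of the window in seconds
    tier: dict = field(default_factory=dict)
    reads: dict = field(default_factory=dict)

    def write(self, name):
        """New objects start in the cold tier (a simplification)."""
        self.tier[name] = COLD
        self.reads[name] = []

    def read(self, name, now):
        """Record a read at time `now`; promote the object if it has become popular."""
        recent = [t for t in self.reads[name] if now - t < self.window]
        recent.append(now)
        self.reads[name] = recent
        if len(recent) >= self.promote_after:
            self.tier[name] = HOT
        return self.tier[name]

    def demote_idle(self, now):
        """Move objects with no reads inside the window back to the cold tier."""
        for name, times in self.reads.items():
            if not any(now - t < self.window for t in times):
                self.tier[name] = COLD


store = TieredStore()
store.write("image.jpg")
first = store.read("image.jpg", now=0.0)    # one read: stays cold
store.read("image.jpg", now=10.0)
third = store.read("image.jpg", now=20.0)   # third read in the window: promoted
store.demote_idle(now=10_000.0)             # long idle: back to cold
print(first, third, store.tier["image.jpg"])
```

A real system would also have to account for the cost of moving the data itself, which this sketch ignores.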
