Hyper-convergence + backup software = scale-out backup nodes

Hyper-converged infrastructure with built-in backup software brings scale-out capabilities to data protection, in the form of pre-configured appliance nodes that can be built into clusters

For many years, the typical backup deployment model was “build your own”. The customer was expected to design, source and implement the hardware, then install backup software on top. 

Then came backup appliances, which put all that in one box.

Now, the trend towards hyper-converged infrastructure has reached backup, and we are seeing the emergence of hyper-converged backup products.

The standard backup model

The design of backup software has changed little over the years. Most products have a central metadata server and scheduler, with scale-out handled by one or more media servers.

To deal with scale, each component would be built on a separate server to provide additional network throughput and back-end storage capacity.

As we moved into the virtualisation era, the backup process changed significantly.

Instead of streams of data from many different servers, backup software was able to use an application programming interface (API) provided by the hypervisor maker. This change provided two benefits.

First, it meant backup software worked with a stream of block-level changes for each virtual machine (VM). So there was no need for client software on each virtual server to track updates and differences, making the backup deployment process much easier.

Second, the consolidated stream of backup data removed the need for lots of load balancing across media servers, so the backup solution just had to manage one or more streams of data from the hypervisor.
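To make the idea of a block-level change stream concrete, here is a minimal Python sketch. It is not any hypervisor's actual API: the function names and the tiny block size are assumptions for illustration, and the list of changed blocks is simulated by comparing two disk images, whereas a real hypervisor would report it directly.

```python
BLOCK_SIZE = 4  # bytes per block; unrealistically small so the example is easy to follow


def changed_blocks(previous: bytes, current: bytes, block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: new_data} for every block that differs.

    A hypervisor would normally supply this list itself; here it is
    simulated by comparing two images of the same virtual disk.
    """
    changes = {}
    for offset in range(0, len(current), block_size):
        new = current[offset:offset + block_size]
        if previous[offset:offset + block_size] != new:
            changes[offset // block_size] = new
    return changes


def apply_changes(base: bytes, changes: dict[int, bytes], block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the current image from a base backup plus the changed blocks."""
    image = bytearray(base)
    for index, data in changes.items():
        start = index * block_size
        image[start:start + len(data)] = data
    return bytes(image)


monday = b"AAAABBBBCCCCDDDD"   # disk image at the last backup
tuesday = b"AAAAXXXXCCCCDDDY"  # same disk one day later

incremental = changed_blocks(monday, tuesday)  # only two of four blocks changed
restored = apply_changes(monday, incremental)  # identical to tuesday
```

Only the changed blocks travel to the backup platform, and no agent inside the virtual machine is needed to work out what they are.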

Moving to scale-out

The ability to process hypervisor-based data isn’t what defines hyper-converged backup solutions. Data from hypervisor-aware backup usually still ends up on disk or tape-based media.

The next step in the hyper-convergence process has been to collapse backup software and scale-out storage into a single product to create hyper-converged backup.

A hyper-converged backup solution consolidates backup storage and software into a scale-out architecture that encompasses all the features of a backup platform.

Solutions are deployed as a cluster of servers or nodes, across which the functions of metadata management, data storage and scheduling are implemented.


In common with many hyper-converged infrastructure offerings, hyper-converged backup solutions implement a distributed scale-out storage layer across the cluster of nodes/servers. This provides a landing zone for backup data that can be used for recovery or “instant” restores.

As hyper-converged backup products are essentially scale-out storage in their own right, many offer the ability to act as a data store to a hypervisor. This means backups can be presented as virtual machines, by connecting the hyper-converged backup platform to a hypervisor and powering up a backup image.

Instant restore is a feature that couldn’t be implemented without an internal file system or a process to create synthetic views of VMs from backups.

Managing server failure

The scale-out design of hyper-converged backup delivers two main benefits: resilience against server failure and simple expansion.

In traditional backup design, the metadata and scheduling services are based on a single node or implemented as a resilient cluster. Managing this cluster in the event of a server or site failure can be a complex task.

Most backup solutions use “monolithic” database software such as SQL Server or MySQL, which doesn’t scale easily across many nodes. Replicating the backup environment to remote locations often involves using SAN technologies to replicate the backup data.

In contrast, hyper-converged backup solutions devolve services such as metadata management across nodes, managing server failure more gracefully. Should a single node fail, other nodes in the cluster are able to take over and continue operations. With a distributed file system, this also means no loss of access to backup data.
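One way to picture how a cluster survives the loss of a node is to keep each metadata record on more than one node. The Python sketch below uses rendezvous (highest-random-weight) hashing to choose the owners of each record. This is an illustrative technique, not a description of how any particular supplier's product works, and the node names, job names and replica count are all assumptions.

```python
import hashlib

REPLICAS = 2  # assumed number of copies kept of each record


def score(node: str, key: str) -> int:
    """Deterministic per-(node, key) score used to rank nodes."""
    digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owners(key: str, nodes: list[str], replicas: int = REPLICAS) -> list[str]:
    """Pick the nodes that hold a key: the highest-scoring live nodes."""
    ranked = sorted(nodes, key=lambda node: score(node, key), reverse=True)
    return ranked[:replicas]


cluster = ["node-a", "node-b", "node-c", "node-d"]
jobs = [f"vm-{n}-backup" for n in range(100)]

before = {job: owners(job, cluster) for job in jobs}

# Simulate losing one node: it simply drops out of the live list.
survivors = [node for node in cluster if node != "node-b"]
after = {job: owners(job, survivors) for job in jobs}

# Every record still has at least one original owner that is alive.
still_reachable = all(
    any(node in survivors for node in before[job]) for job in jobs
)
```

Because each record lives on two distinct nodes, a single failure never removes both copies, and the surviving owner of every record keeps its role once the failed node drops out of the ranking.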

The distributed architecture also extends to scalability. As backup capacity needs increase, hyper-converged backup solutions can be expanded by adding extra nodes to the cluster in the same way hyper-converged infrastructure operates. Additional capacity can be added simply by racking a new server and adding it to the cluster.

The flexibility to add extra nodes means capacity can also be increased in accordance with demand. Suppliers offer a range of server sizes to meet multiple capacity increments (see “Hyper-converged backup suppliers” below).

Backup administrators don’t need to think about how to rebalance workload across the new infrastructure as they did in the past. Many solutions also provide the ability to geographically disperse nodes, creating a multisite configuration that would have been much more complex to manage with traditional backup.
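As a rough illustration of the rebalancing that the cluster takes off the administrator's hands, the Python sketch below plans how much data each existing node would hand to a newly added, empty node. The node names and capacities are invented, and real products use their own placement logic. This simplified plan only trims nodes that sit above the new cluster average, so it evens things out fully only when no existing node is already below that average.

```python
def plan_rebalance(used_tb: dict[str, float]) -> dict[str, float]:
    """Return the TB each existing node should move to one new, empty node.

    Any node holding more than the new cluster-wide average hands over
    its excess; nodes at or below the average are left alone.
    """
    total = sum(used_tb.values())
    target = total / (len(used_tb) + 1)  # average once the new node joins
    return {
        node: round(used - target, 2)
        for node, used in used_tb.items()
        if used > target
    }


# Hypothetical three-node cluster, in TB of backup data stored per node.
cluster = {"node-1": 30.0, "node-2": 24.0, "node-3": 18.0}

moves = plan_rebalance(cluster)           # what each node sends to the new node
new_node_load = sum(moves.values())       # what the new node ends up holding
```

In this example the new average is 18TB, so the first two nodes give up 12TB and 6TB respectively and the third is untouched.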

Hyper-converged backup licensing and support

For many customers, licensing will also be simplified. Hyper-converged backup solutions are usually licensed by capacity or node, making the cost of adding new capacity easy to quantify.

There’s no need to license individual features or components of the backup architecture.

Although not strictly a hyper-converged backup feature, backup appliances often offer a more flexible maintenance model, whereby the patching and upgrade process is handled by the supplier rather than backup administrators.

In hyper-converged backup solutions, downtime can be kept to a minimum or virtually eliminated, with rolling software upgrades across the nodes of a cluster.

Hyper-converged backup limitations

While offering great operational benefits, hyper-converged backup solutions aren’t a replacement for all requirements.


Their evolution from virtual backup means many have limited support for traditional workloads that run on physical servers. It’s never desirable to run multiple backup solutions, so hyper-converged backup will likely be more appropriate where it can entirely replace previous backup implementations.

Then there’s the question of lock-in. Backup deployments that manage high levels of data deduplication will have created lock-in for the customer. Moving backups to another platform would mean rehydrating the backup image and capturing it again on another system.

The lock-in issue can be expected with any backup solution, however, including those that use proprietary formats – not just with hyper-converged backup.
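To see why deduplication ties backups to the platform that created them, consider the minimal Python sketch below. The chunk size, function names and sample data are assumptions chosen for illustration, and real products use far more sophisticated chunking. The point is that a deduplicated backup is stored as a recipe of chunk fingerprints, which is meaningless without the chunk store it refers to.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small, for illustration only


def deduplicate(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Split data into chunks, keep each unique chunk once, return the recipe."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # stored only the first time it is seen
        recipe.append(fingerprint)
    return recipe


def rehydrate(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the full image by looking up every chunk in the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


chunk_store: dict[str, bytes] = {}
backup = b"AAAABBBBAAAABBBBAAAACCCC"  # 24 bytes with plenty of repetition

recipe = deduplicate(backup, chunk_store)
stored_bytes = sum(len(chunk) for chunk in chunk_store.values())  # half the original size
full_image = rehydrate(recipe, chunk_store)                       # back to all 24 bytes
```

Migrating to another platform means running the equivalent of `rehydrate` for every backup, expanding each one to its full size before the new system can ingest and deduplicate it again.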

Hyper-converged backup suppliers

We have seen a number of startups introduce products in the hyper-converged backup space. Offerings are typically hardware-based, including support for products from the major server suppliers.

Alternatively, there are software-only products that address branch office or public cloud environments.

Many suppliers allow their hardware and software offerings to be put together as an integrated distributed backup platform.


Cohesity offers a range of hardware nodes, either unbranded or in partnership with HPE and Cisco. Solutions scale from the entry-level C2105, with 6TB of HDD and 800GB of flash capacity per node, to the C2605, with 30TB of HDD and 1.6TB of SSD capacity.

Cohesity also provides DataPlatform Virtual Edition for smaller environments without dedicated hardware. There is also a cloud edition for managing public cloud instances or replicating data from on-premise infrastructure to public cloud storage.


Commvault recently announced a scale-out appliance solution called HyperScale. This encompasses the current Commvault Data Platform software with a scale-out file system based on GlusterFS.

Per-node storage capacities scale from 16TB to 40TB of disk capacity, with an additional 150GB of flash storage. HyperScale is also available as software or deployable on Cisco Unified Computing System (UCS) hardware.


Rubrik offers a range of hardware appliances that scale from the three-node r334 entry-level system to the high-end r3410 with four nodes. Storage capacities range from 36TB of hard disk drive (HDD) and 1.2TB of solid-state drive (SSD) in the r334 to 120TB of HDD and 1.6TB of SSD in the r3410.

For customers that prefer to use existing server partner technology, solutions are available that use HPE, Cisco and Lenovo hardware. 

Rubrik also offers a software appliance implementation that can be used in smaller branch offices and a public cloud implementation that runs in Amazon Web Services (AWS), Microsoft Azure and Google Cloud.
