Host-based replication vs. array-based replication for backup and disaster recovery

When creating remote replicas for business continuity, the decision to deploy a host- or a storage-based solution depends on the platform being replicated and the applications.

An organisation needs to replicate its data for business continuity and operational readiness. Traditionally, though, the movement of data from source to target has been classified as replication only if it crosses between storage arrays. When assessing the benefits of implementing replication at the host or storage layer, the movement of data within an array should also be considered.

Array-based replication

Array-based replication for business continuity can incorporate the creation of local copies of data within the same array as the source data, as well as the creation of remote copies in an array situated elsewhere. The largest storage vendors offer sophisticated technologies that provide this functionality, driven either from the arrays themselves -- via management hosts that control the direction and identity of replication -- or from within a virtualisation layer that operates in the SAN fabric or IP storage network.

Creating local and remote replicas of data gives an organisation a limited form of business continuity and enhances its operational readiness by providing copies that can be used for testing and development. Using storage technology, this is achieved at a volume level (often with volumes grouped) and with features such as quality of service (QoS), priority controls, data consistency, source-target identity swaps, duplex data flow, protected copies, data queuing/batching and full or pointer-based read/write.

Much of the same functionality can be achieved at the host layer by using a Logical Volume Manager (LVM) with plexes/mirrors, along with any inbuilt "snap" functionality that may exist in the host LVM software. However, the feature set tends to be limited and management more onerous, due to the need to operate within LVM and in close logical proximity to the source data.

Creating local replicas for the purpose of business continuity is easily achieved using LVM when only one to three copies of the data are required. When four or more copies are required, and the business demands time-specific restore points, it's advisable to create replicas using the storage software. This is because storage software is better able to create a large number of copies that can be synchronised and 'detached' at specific intervals, using advanced algorithms for incremental and full data copy. This, combined with intelligent space management techniques such as copy on read/write, makes the storage software offering compelling.
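
The pointer-based copies and copy-on-write space management mentioned above can be illustrated with a toy model. The following Python sketch is not any vendor's implementation; the `Volume` class, its method names and the four-block volume are all hypothetical, chosen only to show why several point-in-time copies can cost far less space than several full copies.

```python
class Volume:
    """Toy block volume with pointer-based, copy-on-write snapshots (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # current contents of the source volume
        self.snapshots = []          # each snapshot maps block index -> preserved old data

    def snapshot(self):
        # A new snapshot holds no data at first; every block is implicitly
        # a "pointer" back to the source volume.
        snap = {}
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Copy on write: before overwriting a block, preserve its old
        # contents in every snapshot that has not yet saved that block.
        for snap in self.snapshots:
            if index not in snap:
                snap[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, snap, index):
        # Use the preserved block if the source has changed since the
        # snapshot was taken; otherwise read the unchanged source block.
        return snap.get(index, self.blocks[index])


vol = Volume(["a", "b", "c", "d"])
nine_am = vol.snapshot()      # first restore point
vol.write(1, "B")
noon = vol.snapshot()         # second restore point
vol.write(2, "C")

print([vol.read_snapshot(nine_am, i) for i in range(4)])  # view as of the first snapshot
print([vol.read_snapshot(noon, i) for i in range(4)])     # view as of the second snapshot
print(len(nine_am), len(noon))                            # blocks actually stored per snapshot
```

In this model each restore point stores only the blocks overwritten since it was taken, which is the property that makes a large number of time-specific copies practical. Real arrays add consistency grouping, scheduling and incremental resynchronisation on top of this idea.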


Host-based replication

When creating remote replicas for business continuity, the decision to deploy a host- or a storage-based solution depends heavily on the platform being replicated and the business requirements for the application(s) in the event of a disaster. If the business demands no impact on operations in the event of a site disaster, host-based replication techniques provide the only feasible solution. But they require a more capable storage and software infrastructure, including extended fabrics and parallel clustering.

Hosts at each site would be connected to a cross-site SAN fabric and mount the same copy of data, typically mirrored between arrays at each site, whilst participating in a cluster that implements a concurrent-access, cached file system. Clustering and mounting data in any other fashion, such as active/passive at one site, would introduce a delay in resuming operations should the primary site fail.

In reality, few applications require "immediate failover protection" in the event of a disaster. In most cases, the infrastructure is incapable of providing the service without massive investment. In other instances, business or operational activities (inside or outside the IT division) are required in order to fail over the service; hence, immediate access to the technology is neither relevant nor needed.

In either scenario, the benefits of implementing a host-based cluster solution (whether active/active or active/passive) are negated by what the business actually requires and/or by what the relatively feature-rich storage technology can deliver on its own.

More often than not, the job of replicating data should be left to the hardware and software devices hosting that data.
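
The guidance in this article can be condensed into a small decision helper. This is only a rough sketch of the rules of thumb above, not a sizing tool: the function names and return labels are hypothetical, and treating four or more local copies as an array job even without time-specific restore points is a simplifying assumption.

```python
def remote_replication_layer(zero_impact_required: bool) -> str:
    """Suggest where to implement remote replication (hypothetical helper)."""
    # Only a cross-site, concurrent-access host cluster avoids any
    # interruption when a site is lost; otherwise leave the job to the array.
    return "host-based cluster" if zero_impact_required else "array-based"


def local_replication_layer(copies: int) -> str:
    """Suggest where to create local replicas (hypothetical helper)."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    # One to three copies are manageable with LVM mirrors/plexes.
    # Assumption: four or more copies always go to the storage software.
    return "host LVM" if copies <= 3 else "array-based"


print(remote_replication_layer(zero_impact_required=False))
print(local_replication_layer(copies=2))
print(local_replication_layer(copies=6))
```

In practice the inputs would come from a business impact analysis rather than a single flag, and cost would weigh heavily on the host-based cluster option.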

Atiek Arian is a senior consultant at GlassHouse Technologies (UK), a global provider of IT infrastructure services, with eight years' experience in IT systems, storage, disaster recovery and high availability. He has experience in architecture, implementation and operational support within complex enterprise environments. Prior to working at GlassHouse, Atiek was a senior SAN, storage and Unix specialist at Centrica Information Systems and a SAN and Unix engineer at Samsung Data Systems, focusing on disaster recovery and the design of highly available storage and systems environments running Oracle and SAP.
