
Storage technology explained: Replication vs snapshots and backup

Are replication and snapshots the same? Can you replace backup with replication or snapshots? We look into the key planks of data protection strategy, including in cloud storage

By Antony Adshead, Storage Editor

Published: 18 Jan 2024

Backups, snapshots and replication are key methods of data protection. We look at how and why they should form part of a comprehensive enterprise data protection strategy.

In this article, we’ll look at replication: the different ways it can be done, how it differs from snapshots, and its pros and cons. We’ll define and examine replication as it’s found in on-premises infrastructure, and also look at the use of replication in cloud storage, where customers may want to specify their requirements.

Are replication and snapshots the same?

Replication, as you’d imagine, produces a replica of a defined set of stored data. It can be a replica of a drive, volume or logical unit number (LUN), for example. What you get with replication is an exact copy. The variants of replication differ in the mechanism by which the replica is created, and in whether it arrives almost immediately or only eventually.

A snapshot is quite different from a replica, because some sort of rebuilding process has to occur before a snapshot becomes a usable copy. Replication pretty much creates a usable copy there and then, with some caveats, as we’ll see.

Snapshots are literally that: a saved point-in-time record of the state of a given dataset. In practice, “the snapshot” typically comprises many of those recorded copies of the drive or volume, plus any updates made to it. That also includes deleted blocks, which must be reincorporated to create an accurate copy from a specified previous point in time.
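
To make the rebuilding step concrete, here is a minimal sketch that assumes a copy-on-write design, which is one common way to implement snapshots but not the only one. The `Volume` class and its method names are hypothetical, chosen for illustration rather than taken from any product.

```python
class Volume:
    """Toy block volume with copy-on-write snapshots (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)   # block number -> current data
        self.snapshots = []          # one dict of preserved blocks per snapshot

    def take_snapshot(self):
        # A snapshot starts empty: nothing is copied until a block changes.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def _preserve(self, block_no):
        # Save the pre-change content in every snapshot that has not yet
        # recorded this block. None marks "block did not exist".
        for saved in self.snapshots:
            if block_no not in saved:
                saved[block_no] = self.blocks.get(block_no)

    def write(self, block_no, data):
        self._preserve(block_no)
        self.blocks[block_no] = data

    def delete(self, block_no):
        self._preserve(block_no)
        self.blocks.pop(block_no, None)

    def rebuild(self, snap_id):
        # Rebuild the point-in-time view: start from the live blocks, put
        # back the preserved originals (including deleted blocks) and drop
        # blocks that did not exist when the snapshot was taken.
        view = dict(self.blocks)
        for block_no, old in self.snapshots[snap_id].items():
            if old is None:
                view.pop(block_no, None)
            else:
                view[block_no] = old
        return view


vol = Volume({0: "a", 1: "b"})
snap = vol.take_snapshot()
vol.write(0, "A")      # update an existing block
vol.delete(1)          # delete a block
vol.write(2, "c")      # add a new block
point_in_time = vol.rebuild(snap)
```

In this sketch the snapshot holds only the blocks that changed after it was taken, so a usable copy exists only once `rebuild` has merged them with the live volume.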

Snapshots can be rebuilt and rolled back to pretty quickly. Meanwhile, replication creates replicas that exist as an alternative, usable copy of the source media.

The simplest example of replication imaginable is a one-off case when, for example, a developer needs a test database to work on. In such a case they can clone an existing production database and do what they want with it in the test environment. That illustrates what a replica looks like, but it won’t reflect any further changes to the source copy, and is also limited in that it’s one specific dataset.

At the other end of the continuum is synchronous replication. In this case, data is written to two or more storage instances as near to simultaneously as possible. That provides a second working copy that can be used for almost immediate failover. Think mission-critical systems where the margin for error or delay is close to non-existent.

Obviously, synchronous replication is costly and demands the best in terms of technical infrastructure and networking.

Can replication replace backup?

Replication cannot replace backup – the two things are quite different, and they should both be used as part of a data protection strategy.

Replication will often be an almost continuous process that creates a near real-time copy. That means it will also make a replica of corrupted or infected files. So, you need backups to provide a version of your data to roll back to.
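
The point can be illustrated with a toy model. This is a sketch under simple assumptions: datasets are plain dictionaries, and `replicate` and `back_up` are hypothetical helpers rather than calls from any real product.

```python
import copy

primary = {"report.doc": "good data"}
replica = {}
backups = []   # list of (label, full copy) pairs, oldest first


def replicate():
    # Replication mirrors the current state of the source, whatever it holds.
    replica.clear()
    replica.update(primary)


def back_up(label):
    # A backup keeps an independent, labelled version of the data.
    backups.append((label, copy.deepcopy(primary)))


replicate()
back_up("monday")

primary["report.doc"] = "CORRUPTED"   # corruption or malware hits the source
replicate()                           # the replica faithfully copies it

restored = dict(backups[0][1])        # only the backup offers a clean version
```

After the second `replicate()` call, the replica holds the corrupted file, while the backup taken earlier still holds a version to roll back to.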


The key here is cost and how quickly data needs to be accessed, as expressed by the recovery point objective (RPO), which is how much recent data you can afford to lose, and the recovery time objective (RTO), which is how long you can afford to wait for recovery. Replication is probably the most costly form of data protection, so it may be that only certain datasets are replicated while everything is backed up.

What is synchronous and asynchronous replication?

In synchronous replication, data is written to the second location as soon as it hits cache in the primary site. When it is received, the second site sends an acknowledgement to the primary site and the host where the change originated.

Synchronous replication is as close as you can get to writing multiple copies of data simultaneously.

Asynchronous replication acknowledges the host at the primary site when data is written. Then the write goes to the second site, and that is acknowledged back to the primary site. It therefore adds a stage in the process compared with synchronous replication.

Replication latency increases by about one millisecond for every 100 miles between sites. For the most critical systems, that puts a cap on physical distance, but the added delay may be fine for other use cases.
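
As a rough worked example, the rule of thumb above can be turned into a small calculation. The function names are hypothetical, and the sketch simply applies the one-millisecond-per-100-miles figure as stated. It does not model whether that figure is one-way or round-trip, nor switching and storage overheads, all of which vary on real links.

```python
MS_PER_100_MILES = 1.0  # the article's rule of thumb; real links vary


def added_latency_ms(distance_miles: float) -> float:
    """Estimate the latency added by the distance between two sites."""
    return distance_miles / 100.0 * MS_PER_100_MILES


def max_distance_miles(latency_budget_ms: float) -> float:
    """Estimate how far apart sites can be within a latency budget."""
    return latency_budget_ms / MS_PER_100_MILES * 100.0


latency_250_miles = added_latency_ms(250)   # sites 250 miles apart
reach_for_5_ms = max_distance_miles(5)      # application tolerates 5ms extra
```

By this estimate, an application that can tolerate five milliseconds of added latency could have its sites up to roughly 500 miles apart.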

Synchronous replication has more impact on application performance because it demands acknowledgement before the next input/output (I/O) operation can take place.

Asynchronous replication acknowledges locally so the next change can take place, with movement of data delayed.

The difference, of course, is that in asynchronous replication the two datasets will differ for a longer time than with synchronous.
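
The trade-off between the two modes can be sketched in a few lines. This is a simplified model rather than how any particular array works: the `ReplicatedStore` class is hypothetical, and the timing values (0.5ms for a local write, 2ms for the link round trip) are made-up numbers for illustration.

```python
from collections import deque


class ReplicatedStore:
    """Toy model of a primary site with one replication target."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = []
        self.secondary = []
        self.pending = deque()   # acknowledged locally, not yet shipped

    def write(self, data, local_ms=0.5, link_round_trip_ms=2.0):
        """Apply a write and return the latency the host sees, in ms."""
        self.primary.append(data)
        if self.synchronous:
            # The host waits until the second site has acknowledged.
            self.secondary.append(data)
            return local_ms + link_round_trip_ms
        # Asynchronous: acknowledge the host now, move the data later.
        self.pending.append(data)
        return local_ms

    def drain(self):
        # Background transfer brings the second site up to date.
        while self.pending:
            self.secondary.append(self.pending.popleft())

    def lag(self):
        # Number of writes the second site is missing.
        return len(self.primary) - len(self.secondary)


sync_store = ReplicatedStore(synchronous=True)
async_store = ReplicatedStore(synchronous=False)

sync_latency = sync_store.write("block-1")
async_latency = async_store.write("block-1")
async_lag_before = async_store.lag()
async_store.drain()
async_lag_after = async_store.lag()
```

In this model the synchronous write costs the host the full round trip but leaves no gap between sites, while the asynchronous write returns sooner and leaves the second site one write behind until the transfer completes.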

An enterprise data protection strategy would aim to use a combination of synchronous replication for the most critical applications or datasets, while less critical data goes via asynchronous. Snapshots could be in the mix too, with the whole thing underpinned by regular backups.
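
One way to picture such a tiered strategy is as a simple mapping from how much data loss a dataset can tolerate to the methods used to protect it. The sketch below is hypothetical: the `protection_plan` function and its cut-off values are assumptions for illustration, not recommendations, and real thresholds would depend on cost, distance and business requirements.

```python
def protection_plan(rpo_seconds: float) -> list[str]:
    """Pick protection methods from a tolerable data-loss window (RPO)."""
    plan = ["backup"]                  # backups underpin every tier
    if rpo_seconds < 1:                # hypothetical cut-off: near-zero loss
        plan.insert(0, "synchronous replication")
    elif rpo_seconds <= 900:           # hypothetical cut-off: up to 15 minutes
        plan.insert(0, "asynchronous replication")
    else:
        plan.insert(0, "snapshots")
    return plan


critical = protection_plan(0)          # e.g. a transactional database
standard = protection_plan(300)        # e.g. departmental file shares
archive = protection_plan(86400)       # e.g. rarely changing reference data
```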

What is cloud replication?

So far, we have dealt primarily with synchronous and asynchronous replication in on-premises storage arrays and servers.

But many replication options are available for cloud storage. The big three hyperscalers – Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP) – all offer replication services for customers that store data with them.


AWS offers live replication that also copies metadata. Its replication options include cross-region, same-region and bi-directional replication, as well as replication to different storage classes or to different owners. Replication can take place within 15 minutes of writes or in batch mode, that is, when required.

Microsoft Azure offers similar services, with built-in disaster recovery functionality as part of the service.

Google Cloud Platform has its Turbo Replication offer, which is also a within-15-minutes replication service.

Cloud replication services allow data to be stored in multiple remote locations, potentially very distant from each other for reasons of disaster recovery or to enhance availability.

Replication in the cloud is usually carried out via erasure coding, because most cloud storage is object storage and not suited to synchronous and asynchronous replication as described here for on-premises storage.

