Data backup vs archiving: What's the difference?

Learn the difference between data backups and data archiving in terms of file format, data retention and searchability, as well as the pros and cons of disk and tape for archive.

The key difference between backup and archiving is that data backups are designed for the rapid recovery of operational data, while data archiving stores data that's no longer in day-to-day use but must still be retained.

Data backups are intended to provide a quick means of recovering data that's in current or recent use in cases ranging from data corruption or accidental deletion to full disaster recovery (DR) scenarios. Speed of restoration is vitally important.

Data archiving is intended as a repository for data that needs to be stored for periods that may extend to decades. Speed of restoration from a data archive usually isn't as critical as from a data backup, but searchability is of vital importance.

Backup applications tend to keep data in a proprietary format, which can be a problem for long-term data retention. Many businesses will have been through a number of data backup software upgrades in the space of, say, a decade, and that may well mean that old backups become unreadable. For that reason, data archiving should be handled by an application that is specially designed for the job and that moves files to the archive in their native format.

Backup vs. archiving

Being able to search a data archive is vital for business and compliance reasons, especially when a formal legal search carries penalties for late submission of information.

With data backups, you'll often know what files and folders you want to find and where the media is. That's not going to be the case with data archives that have built up over several years, so you'll likely need to search by keyword. Data archiving software builds up metadata indexes on the stored data to allow for reasonably rapid searching.
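To illustrate the kind of metadata index described above, here is a minimal sketch of a keyword index over archived items. It is not how any particular archiving product works; the archive paths, the sample text and the function names build_index and search are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase keyword to the set of archive paths containing it."""
    index = defaultdict(set)
    for path, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index


def search(index, query):
    """Return the paths that contain every keyword in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical archived items: path -> text extracted from the file.
archive = {
    "2014/contracts/acme.txt": "Supply contract with Acme, signed 2014",
    "2016/hr/policy.txt": "Holiday policy, revised 2016",
    "2016/contracts/acme-renewal.txt": "Renewal of Acme supply contract",
}

index = build_index(archive)
hits = search(index, "acme contract")
print(sorted(hits))
```

Because the index is built once when data enters the archive, a keyword search only touches the index rather than reading every stored file, which is what makes searching years of accumulated data reasonably quick.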

But don't mix data backups and data archiving. Each technology has its own characteristics and needs in terms of retention and recovery objectives, and you'll only create more data storage issues if you combine the two.

Forms of data archiving media

Tape can be used for archives, but there are some caveats to bear in mind. Tapes need to be maintained and have a limited life, after which data must be transferred to new media. In addition, with the rapid succession of LTO generations and other tape formats, you'll need to ensure that you'll still be able to read a tape a decade from now. There are also numerous tales of data being impossible to recover from tape at all.

In addition, you must always have a way to search tapes, which involves careful manual or tool-based recording of where data is located on the tape.

Similar considerations of media life expectancy and searchability apply to optical forms of storage such as DVD.

Spinning disk can be used for archives, and it has the advantage of being a consolidated, accessible and indexable medium. The drawbacks are that disk requires power to keep it spinning and, as with other media, there's the risk of a disk subsystem coming to end of life and data needing to be transferred to new media.

A couple of ways to mitigate the cost of disk as an archive medium are data deduplication, which can substantially cut down on the size of data held, and MAID (massive array of idle disks), a type of disk subsystem that uses various methods of shutting down disk hardware to save power.
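As a rough illustration of how data deduplication cuts the size of data held, the sketch below stores each unique chunk once, keyed by its SHA-256 digest, and keeps an ordered list of digests to rebuild the original. Real products differ in how they split data into chunks and manage the store; the function names and sample data here are hypothetical.

```python
import hashlib


def deduplicate(chunks):
    """Store each unique chunk once and record the order needed to rebuild."""
    store = {}   # digest -> chunk bytes, held only once
    recipe = []  # ordered digests describing the original sequence
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original sequence of chunks from the store."""
    return [store[digest] for digest in recipe]


# Hypothetical data in which the same content appears three times.
chunks = [
    b"quarterly report",
    b"company logo bytes",
    b"quarterly report",
    b"quarterly report",
]

store, recipe = deduplicate(chunks)
original_bytes = sum(len(c) for c in chunks)
stored_bytes = sum(len(c) for c in store.values())
print(original_bytes, stored_bytes)
```

The saving depends entirely on how much of the data is repeated, so the ratio in this toy example says nothing about what a real archive would achieve.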

Some regulatory regimes require data archives to be kept on media that is write once, read many (WORM), so that once data is written it can be verified as unchanged. Disk, tape and removable optical media can all be WORM.
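One common way to check that a record is unchanged is to note a cryptographic digest when it is written and compare it on later reads. The sketch below shows that check only; it does not enforce WORM behaviour, which is the job of the media or storage system, and the record contents and function names are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a record as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, recorded_digest: str) -> bool:
    """Check that a record still matches the digest noted at write time."""
    return fingerprint(data) == recorded_digest


# Hypothetical record and the digest noted when it was archived.
record = b"Invoice 1001: total 250.00"
recorded_digest = fingerprint(record)

unchanged = verify(record, recorded_digest)
tampered = verify(b"Invoice 1001: total 950.00", recorded_digest)
print(unchanged, tampered)
```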
