Backup vs archive: Can they be merged?

As backup software suppliers build more archiving features into their products, we look at whether it’s possible to merge backup and archive systems

The distinction between backup and archive is often blurred but should be very clear.

Backups are made at least daily, leave the original data in place, and have the aim of protecting data against technology failure or human error over relatively short periods, such as weeks or months.

Archiving, on the other hand, is the retention of data for lengthy periods, usually years, sometimes decades, and moves the data from its primary location.

Greg Schulz, senior advisory analyst at StorageIO, explains: "Backup is for restoring a file, object, database, volume or system based on some recovery time objective and recovery point objective, whereas the archive is a picture of the data and its state at a point in time."

Schulz highlights key characteristics of archiving systems. These include: "Indexing and metadata management for search, replication, cloning, secure shred, Worm [write-once read-many], along with compliance or regulatory items."

In addition, archiving includes movement of data off production storage systems onto the archive medium, driven by retention policies. "Data mover tools may be tightly or loosely integrated with the destination or target devices and in some cases even have overlapping features," says Schulz.
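To make the idea of policy-driven data movement concrete, here is a minimal sketch of how a data mover might pick candidates for archiving. It assumes a simple age-based retention policy; the names `FileRecord`, `select_for_archive` and `min_age_days` are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical catalogue entry for a file on production storage."""
    path: str
    last_modified: datetime


def select_for_archive(files, now, min_age_days):
    """Return the records last modified before the policy cutoff."""
    cutoff = now - timedelta(days=min_age_days)
    return [f for f in files if f.last_modified < cutoff]


# Illustrative data only: one old file, one recent file.
now = datetime(2024, 1, 1)
files = [
    FileRecord("projects/2019/report.doc", datetime(2019, 6, 1)),
    FileRecord("projects/2023/plan.doc", datetime(2023, 12, 1)),
]

# Assumed policy: anything untouched for a year moves to the archive.
to_move = select_for_archive(files, now, min_age_days=365)
print([f.path for f in to_move])
```

A real policy would typically weigh more than age, such as data type, owner or regulatory class, but the principle of a rule selecting what leaves production storage is the same.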

The third component, which attracts less attention but is the most important, is how the data mover tools integrate with different applications. These applications need to be configured to use rules or policies to archive the data, or to present it to the data mover.

Another element of the distinction can be the medium. Media used for backup need to ingest vast quantities of data quickly during a limited time window. As a result, disk rather than tape has increasingly been used, both for the added performance it provides and for faster access to recently backed-up data.

Archives, on the other hand, have increasingly become tape-based. Tape offers the advantage of being cheap and robust over long periods of time, while its fairly slow speed of recovery is rarely a problem because retrievals are infrequent. This also allows time for the long process of indexing and creating metadata.

Backup vs archive: Are they converging?

But is the apparently clear distinction between backup and archiving changing? Are the two merging?

According to Jason Buffington, senior analyst and lead researcher at Enterprise Strategy Group (ESG), users are increasingly taking a converged approach to backup and archiving. In a recent survey, 83% of respondents said they used backup software for all or part of an archiving strategy, while 41% use backup software as their only means to archive data.

Maintaining separate archive and backup architectures containing duplicated data is expensive, not just in terms of hardware and software but also maintenance. This has led some backup vendors to add archiving features to their backup software products.

But Buffington warns that while the archiving features in backup software are good enough for many organisations, "You still have to be careful. Backup vendors do try to cash in on the remaining confusion between backup and archive."

In particular, he cites archiving features such as auditing, e-discovery, hold and compliance that require archiving software.

So, if using backup software to retain data for archiving purposes, IT teams need to consider whether the software in use is the right tool for the job.

That’s because simply lengthening the time for which data is retained as backups can lead to other problems, such as increased use of storage capacity, inadequate metadata to allow data recovery, and a lack of data medium management.

For example, if the organisation is faced with legal proceedings and an e-discovery request, the IT team will need to find items such as emails with specific keywords or files in a particular directory rather than needing to restore things to how they were yesterday.
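The difference between restoring and searching can be illustrated with a small sketch. It assumes the archive keeps a keyword index alongside the stored items, which is the kind of metadata a plain backup catalogue usually lacks. The message IDs, the sample text and the helpers `build_index` and `search` are all made up for illustration.

```python
from collections import defaultdict


def build_index(items):
    """Map each lower-cased word to the set of item IDs that contain it."""
    index = defaultdict(set)
    for item_id, text in items.items():
        for word in text.lower().split():
            # Strip common trailing/leading punctuation before indexing.
            index[word.strip(".,;:!?")].add(item_id)
    return index


def search(index, keyword):
    """Return the sorted IDs of items containing the keyword."""
    return sorted(index.get(keyword.lower(), set()))


# Illustrative archive contents only.
emails = {
    "msg-001": "Contract renewal terms attached.",
    "msg-002": "Lunch on Friday?",
    "msg-003": "Please review the contract before signing.",
}

index = build_index(emails)
hits = search(index, "contract")
print(hits)
```

Without such an index, answering the same request from backups would mean restoring each backup set in turn and scanning its contents.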

Another, if less-discussed, complication is what happens when archived data needs to be deleted, perhaps because it relates to a project containing intellectual property that the organisation no longer owns.

While in some cases the tasks of backup and archiving are being merged in software, the media used are also a point of convergence. ESG’s survey found tape use numbers for backup and archive were "nearly identical". In other words, tape is still being used for backup, despite being increasingly positioned by IT suppliers as mainly an archiving medium.

Key archiving features

For an organisation that uses backup software to archive data to tape and then store those tapes off-site, retrieving data involves a number of steps.

Tapes that contain the required data need to be identified, retrieved from off-site storage, then mounted and read, and possibly deleted. All these operations can be problematic, especially when reading tapes that may be several years old. Are they still readable or has the medium degraded? Are hardware and software still compatible? And how long does it take to find the data?

With those obstacles surmounted, a rich set of metadata is required to find the relevant information, especially if it was created over a considerable period of time, as a large number of files in multiple formats will need to be examined.

Archiving systems can help resolve many of these issues. Rich metadata enables identification of the correct tapes and ensures the required data is quickly retrieved, while tape libraries ensure tapes are regularly refreshed to avoid bit rot.

Few enterprise backup systems provide all these features, although some offer a subset of archiving functions, as the following two examples demonstrate.

Symantec NetBackup Enterprise Vault adds metadata in the form of searchable indices, along with tags that allow you to classify data not just by its format but also, for instance, by whether it includes credit card numbers or by its longevity requirement for compliance purposes.
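As a rough illustration of content-based classification in general, the sketch below tags a piece of text according to whether it appears to contain a payment card number. It is not how Enterprise Vault works internally; the `classify` function, the tag names and the retention value are assumptions, and the detection simply combines a digit pattern with the standard Luhn checksum.

```python
import re

# Candidate card numbers: unbroken runs of 13 to 19 digits.
CARD_PATTERN = re.compile(r"\b\d{13,19}\b")


def luhn_valid(number: str) -> bool:
    """Apply the Luhn checksum to a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str, retention_years: int) -> dict:
    """Return hypothetical archive tags for a piece of text."""
    has_card = any(luhn_valid(m) for m in CARD_PATTERN.findall(text))
    return {"contains_card_number": has_card, "retention_years": retention_years}


# 4111111111111111 is a widely used test number that passes the Luhn check.
tags = classify("Card 4111111111111111 charged", retention_years=7)
print(tags)
```

Tags of this kind can then drive both search and retention decisions, which is what separates them from the file-level catalogue a backup product keeps.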

Meanwhile, IBM Tivoli Storage Manager for Space Management offers a hierarchical storage option to allow infrequently used files to be offloaded onto cheaper media, using stubs as links to the archived data.
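The stub mechanism can be simulated in a few lines. The sketch below is a simplified model, not IBM's implementation: real hierarchical storage managers work at file system level, whereas here the stub is just a small JSON file, and the functions `migrate_with_stub` and `read_through_stub` are hypothetical.

```python
import json
import shutil
import tempfile
from pathlib import Path


def migrate_with_stub(file_path: Path, archive_dir: Path) -> Path:
    """Move a file to the archive tier and leave a small stub in its place."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / file_path.name
    shutil.move(str(file_path), str(target))
    # The stub records where the real content now lives.
    file_path.write_text(json.dumps({"stub": True, "archived_at": str(target)}))
    return target


def read_through_stub(file_path: Path) -> str:
    """Read a file, following the stub to the archive if there is one."""
    content = file_path.read_text()
    try:
        meta = json.loads(content)
    except json.JSONDecodeError:
        return content  # ordinary file, not a stub
    if isinstance(meta, dict) and meta.get("stub"):
        return Path(meta["archived_at"]).read_text()
    return content


# Demonstration in a temporary directory (paths are illustrative).
root = Path(tempfile.mkdtemp())
primary = root / "primary"
primary.mkdir()
report = primary / "old_report.txt"
report.write_text("quarterly figures")

migrate_with_stub(report, root / "archive")
recalled = read_through_stub(report)
print(recalled)
```

From the user's point of view the file still appears in its original location, while the bulk of the data sits on the cheaper tier.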

Backup vs archive options

So, although many companies use backup software for archiving purposes, larger organisations, perhaps with more awareness of issues such as compliance and almost certainly with considerably more data to retain, are more likely to opt for an archiving solution rather than bend their backup software to a task for which it is not ideally suited.

As Pierre Dorion, senior consultant with Long View Systems, points out, an archive is not a backup copy kept for a long time, and using backup software to produce large amounts of archives that may eventually need to be searched is a bad idea.

An alternative for smaller businesses, which are less likely to want to invest heavily in archiving systems, automated tape libraries and the maintenance associated with them, is to use an online archiving service.

Not only could this help avoid legal and other issues stemming from data retrieval difficulties further down the line, it should also make the rigorous procedures needed to guarantee data retrieval someone else's headache.
