Archive or backup?

How long does a backup object have to be retained before it is considered an archive? Or is it more than a simple matter of time? This tip presents information that will help you draw your own conclusion on a subject that is not so "cut and dried."

What you will learn from this tip: Archiving, backup and their differences.

At a very high level, backups and archives both consist of copies of data kept for a certain amount of time for future access. In essence, the two are very similar in nature except for their respective lifespans. Generally, most view a backup as a short-term retention copy of a file or record, kept in case the original is lost or damaged beyond repair. Conversely, an archive is typically viewed as the means to meet a requirement to retain a record for future reference.

Theoretically, you could take a backup copy of specific data, retain the copy for many months or even years, and you would have yourself an archive. In that particular context, the first question that comes to mind is: When does a backup copy become an archive? This is where things get a little muddy. While many IT practitioners associate the term archive with long-term retention, it is not just a question of time. The distinction has more to do with the way most traditional backup products handle data retention. Keeping track of which "backup jobs" to delete and which ones to keep can become a daunting data management task. Traditional backups are usually part of a sequence, typically a series of weekly full backups followed by daily incremental backups, all kept for a predetermined amount of time (e.g., 30 days). In order to keep a copy for a longer period than usual, an out-of-sequence copy must be created. That is, a copy that is not associated with the 30-day retention in our example.
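To make the idea of sequence-based retention concrete, here is a minimal Python sketch. It is not taken from any particular backup product: the `BackupJob` type, the `expired_jobs` function and the rule that a full backup and its incrementals expire together are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class BackupJob:
    """Hypothetical record of one backup run."""
    run_date: date
    kind: str  # "full" or "incremental"


def expired_jobs(jobs, today, retention_days=30):
    """Return the jobs that may be deleted under sequence-based retention.

    Assumption: a sequence is a full backup plus the incrementals that
    follow it, and the sequence expires as a unit once its newest member
    is older than the retention window.
    """
    # Group the jobs into sequences, each starting at a full backup.
    sequences = []
    for job in sorted(jobs, key=lambda j: j.run_date):
        if job.kind == "full" or not sequences:
            sequences.append([job])
        else:
            sequences[-1].append(job)

    cutoff = today - timedelta(days=retention_days)
    expired = []
    for sequence in sequences:
        # The whole sequence is kept until its last job passes the cutoff.
        if sequence[-1].run_date < cutoff:
            expired.extend(sequence)
    return expired
```

In this sketch, no job carries a retention setting of its own. Keeping one copy longer than the window means either changing the rule for every sequence or creating a copy that sits outside the sequences altogether.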

This is where the attributes of an archive start to take shape. We can think of an archive as an out-of-sequence copy: a copy that is not associated with other copies (e.g., fulls and incrementals) for retention purposes. Let's look at other attributes that should differentiate an archive from a backup object:

  • Archives should not be retained simply based on the number of existing copies. Each archive should be a unique object bearing a time stamp, descriptor and a retention parameter.
  • We typically back up data to protect it from being lost or altered and because it must remain readily available; it would therefore go against the rules to delete a file after backing it up. Conversely, data is often archived so it can be deleted from its original location because immediate access is no longer required.
  • Archived data can be extracted from its original context and catalogued or indexed for later retrieval. This is the case for CAS or email archiving products where a message or attachment is taken out of its usual structure and stored elsewhere.
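The attributes listed above can be sketched as a small data structure. This is an illustrative Python sketch only; `ArchiveObject`, its field names, the `find_by_descriptor` helper and the convention that `None` means "keep indefinitely" are hypothetical and do not come from any archiving product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class ArchiveObject:
    """Hypothetical archive entry that stands on its own."""
    descriptor: str                # human-readable description, used for lookup
    archived_on: date              # time stamp
    retention_days: Optional[int]  # None = keep indefinitely (assumption)

    def is_expired(self, today: date) -> bool:
        # Expiry depends only on this object's own retention parameter,
        # never on how many other copies exist.
        if self.retention_days is None:
            return False
        return today > self.archived_on + timedelta(days=self.retention_days)


def find_by_descriptor(archives, keyword):
    """Return archives whose descriptor mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [a for a in archives if needle in a.descriptor.lower()]
```

Unlike a backup that belongs to a sequence, each object here can be looked up and expired independently of every other copy.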

As a general rule, we can go back to the days of paper records and draw a parallel with today's backups and archives. Back then, records were typed or handwritten, and carbon copies or photocopies were used for backups. When a document lost some of its daily business relevance but still had to be retained, it was taken out of the filing cabinet, put into a cube-box and sent to some basement or warehouse to be kept as an archive. That said, this is pretty much where the similarities end. We don't have a problem reading a paper document that was archived 50 years ago; the same cannot be said about electronic archives.

In closing, and without trying to oversimplify things, if a record is copied for protection, we can probably call it a backup. If the same record is stored on some media with no particular concern for immediate access, it's probably safe to call it an archive.

About the author: Pierre Dorion is a certified business continuity professional for Mainland Information Systems Inc.
