In a career working in more data backup environments than I can remember, I have occasionally come across a documented operation referred to as data archiving. However, these operations are more often than not a simple data dump performed by an enterprise backup application directly to tape, which is then taken somewhere else until someone decides they need access to it. I'm regularly informed that this is the business's "archiving solution." This process is neither "archiving" nor any kind of "solution" for good data management.
Data archiving itself is still a relatively immature form of data management, but that's exactly what it is — data management. Though data backup applications are capable of an incredible number of tasks these days, backups are taken to satisfy the need for data protection, not for file archiving purposes.
You take a backup to protect an important document you've created so that if it becomes corrupted you can go back and retrieve a copy of it. Your work is protected because a copy of a previous version was made.
Archives are not data copies. Data archiving isn't going to help bring your database back to life in the event of a disaster. Where data backups can be (very) simply described as a copy operation, data archives are a move.
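The copy-versus-move distinction can be shown in a few lines of Python. This is only a minimal sketch using the standard library on ordinary files; the function names backup_file and archive_file and the directory layout are made up for illustration and do not reflect how any particular backup or archiving product works.

```python
import shutil
from pathlib import Path


def backup_file(src: Path, backup_dir: Path) -> Path:
    """Backup: copy the file. The original stays on primary storage."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / src.name
    shutil.copy2(src, dest)  # copy2 also tries to preserve timestamps
    return dest


def archive_file(src: Path, archive_dir: Path) -> Path:
    """Archive: move the file. Primary storage gets the space back."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / src.name
    shutil.move(str(src), str(dest))
    return dest
```

After backup_file there are two copies of the data and primary storage is no emptier; after archive_file there is still one copy, just somewhere cheaper.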
Businesses typically employ a data archiving solution to accomplish two main goals: complying with regulations and cutting storage costs. While we're obligated to comply with industry regulations, every IT organisation is focused on cutting storage costs.
Data archiving moves data of a certain age that is infrequently accessed off the primary storage pool and into one that costs less. This move may occur a number of times within a given near-line archiving solution until the data is, or nearly is, immutable and rarely accessed. It's then moved to the cheapest form of storage possible (usually tape) and shipped to an off-line location.
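The tiering cycle described above amounts to a policy that maps how long data has sat idle to the cheapest tier it qualifies for. The sketch below is one way to express that idea; the tier names and the day thresholds are hypothetical values chosen purely for illustration, not recommendations or the behaviour of any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    min_idle_days: int  # data idle at least this long qualifies


# Hypothetical tiers and thresholds, ordered coldest (cheapest) first.
TIERS = [
    Tier("offline-tape", 365 * 3),
    Tier("nearline", 180),
    Tier("primary", 0),
]


def choose_tier(idle_days: int) -> Tier:
    """Return the cheapest tier whose idle threshold the data meets."""
    for tier in TIERS:
        if idle_days >= tier.min_idle_days:
            return tier
    raise ValueError("idle_days must be non-negative")
```

Under these assumed thresholds, a file untouched for 10 days stays on primary storage, one idle for 200 days moves to near-line storage, and one idle for several years goes to off-line tape. A real policy would usually also weigh access frequency and retention rules, which this sketch leaves out.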
This cycle requires fewer purchases of primary storage capacity, and frees up resources on that primary storage to allow better performance. The former directly drives cost reduction in a strictly budgetary manner; the latter can dramatically slow the growth in access times to the primary storage pool, therefore saving time (time = ££). That performance benefit is especially important for those giant, high-profile databases and their search functions. A backup, by contrast, creates a copy of a portion of data, and having more of the same data lying around most definitely does not decrease the costs of the storage environment.
Administering a backup environment is essentially an officially permissible form of extortion. Application owners are forced to pay for protection services or face roving bands of data corruptors at their own risk. Data archiving is more like hiring well-organised movers to wrap your data in boxes with lots of tape, get it out of your way, and keep it in a secure storage locker close by until you need it again or can throw it out. Of course, both data backups and data archives provide many complicated and wonderful services.
In the end though, they're most definitely NOT interchangeable. That old "archiving solution" -- backing up old data to tape and then sending it to an off-site location indefinitely -- may work well enough for certain pools of structured data, but it's merely a long-term backup and should be known as such.
About the author: Brian Sakovitch, senior consultant at GlassHouse Technologies (UK), has followed a six-year path in backup technologies ranging from hands-on installation and implementation to design and theory. Three of those years have been with GlassHouse US, focusing on a number of predominantly backup-related engagements for companies of all shapes and sizes.