Storage administrators usually don't have trouble getting data
onto disc or tape. The real challenge is keeping that valuable
corporate data safe in the face of daily operations. Whether you're
dealing with a hard disc failure, chasing a misplaced tape or
recovering from nature's wrath, data loss is simply a fact of life.
But data loss isn't simply an inconvenience -- it can result in
costly business interruptions, and government regulations and
consumer expectations increasingly bring severe penalties for lost
data. IT professionals must take decisive steps to back up and
protect corporate data. This chapter covers the
areas of disc, tape and remote data backup, and explains the
essential ideas used in successful data backup strategies.
Tape backup
Tape is the quintessential data backup medium. Most tape
technology is well established and inexpensive, but it is also too
slow to serve as a primary storage platform. The appeal of "cheap,
plentiful and slow" storage has made tape a traditional complement
to disc storage systems.
Tape storage is a removable media technology, so a tape cartridge
can easily be moved between compatible drive mechanisms. However,
cartridges are designed for specific tape drive architectures and
are not interchangeable between formats. The "tape" is simply a
length of flexible plastic ribbon coated with magnetic media and
wrapped around a set of spindles. The spindles are mounted inside a
plastic cartridge enclosure that protects the tape media from
damage. Tape cartridges have a relatively short working life because
the tape media physically contacts the tape drive's read/write
heads. Tapes should generally be replaced after about 2,000 passes.
A tape drive is the electromechanical device that reads and
writes to the tape cartridge, and exchanges that data with the rest
of the computer. Drives typically use either
helical scan or linear tape head technology to access the tape.
Helical scan drives use a rotating head positioned at an angle,
reading and writing data as diagonal stripes along the tape's
width. Linear drives use a stationary head and move the tape past
it, writing data in tracks that run along the tape's length. There
are numerous tape formats in service today that use these two
approaches, including
advanced intelligent tape (AIT),
digital data storage (DDS),
digital linear tape (DLT),
Linear Tape-Open (LTO) and Travan. When choosing a tape drive,
consider capacity needs, performance, media cost and technological
longevity.
Even the latest tape technology cannot offer enough capacity to
back up an entire enterprise to a single tape. Rather than manually
spanning data backup jobs across multiple tape cartridges, tape
drives are often organized into groups, dubbed
tape libraries. Backup software then utilizes the various
drives and cartridges in the tape library to achieve a complete
backup. In many cases, a robotic arm or
autoloader is added to exchange tapes with each drive, allowing
a tape library to manage a huge number of tapes.
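The spanning arithmetic behind a tape library can be sketched in a
few lines of Python. This is only an illustration: the function
names, the 10,000 GB job size, the 400 GB cartridge capacity and the
four-drive library are all hypothetical values chosen for the
example, and real backup software also accounts for compression,
tape formatting overhead and scheduling.

```python
import math


def tapes_needed(backup_gb: float, tape_capacity_gb: float) -> int:
    """Number of cartridges required to hold one backup job."""
    if backup_gb <= 0:
        return 0
    return math.ceil(backup_gb / tape_capacity_gb)


def span_across_drives(tape_count: int, drive_count: int) -> list:
    """Spread cartridges as evenly as possible over the library's drives.

    Returns how many cartridges each drive will write.
    """
    base, extra = divmod(tape_count, drive_count)
    return [base + (1 if i < extra else 0) for i in range(drive_count)]


# Hypothetical job: 10,000 GB written to 400 GB cartridges on 4 drives.
cartridges = tapes_needed(10_000, 400)
per_drive = span_across_drives(cartridges, 4)
print(cartridges, per_drive)
```

With several drives working in parallel, each drive handles only a
fraction of the cartridges, which is why a library with an autoloader
can finish a job that would be impractical to span by hand.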
Backup software is a critical management tool that connects
backup hardware (the tape drives and libraries) with corporate data
servers, allowing administrators to decide when and where to back up
selected files, folders, drives, servers or even entire data
centers. Backup software also supports automation so backups can be
performed and verified during off hours without direct human
intervention. EMC NetWorker and Symantec Veritas Backup Exec 10d are
two well-known backup tools.
Disc backup
While hard discs are certainly the primary storage medium for
all types of computer systems, discs are increasingly being used
for data backup tasks. This is partly due to the falling costs of
high-volume storage devices, such as
SATA and
SAS drives, but also because backup needs are changing. Many
organizations work in a global 24/7 marketplace and cannot afford
to go offline for nightly tape backups. When trouble does strike, a
busy organization must restore its operations in a matter of hours
-- not days. Discs offer the speed and storage capacity, at a
reasonable cost, to make disc-based backup practical
[see Chapter one for more information about disc storage].
The simplest type of disc-based backup is disc-to-disc, which
copies the contents of one disc to another. If the first disc
fails, data can be retrieved from the other. This is sometimes
called mirroring and is an essential tenet of RAID. In some cases,
both disc and tape technologies are combined in a
disc-to-disc-to-tape platform, dubbed D2D2T. Primary disc
storage is first backed up to secondary discs -- lost data can be
quickly restored from the backup disc. Tape is then added on as a
form of long-term
archival storage. A benefit of D2D2T is that tapes can be
written from the secondary disc storage so the main storage system
is not taken offline in the tape writing process. The resulting
tapes can then be sent off site to protect the primary and
secondary disc storage systems against disaster.
Some companies with established investments in tape libraries
may have trouble justifying the shift to disc-based backup systems.
One way to ease the transition from tape to disc is a
virtual tape library (VTL). A VTL is simply a disc storage
system designed to mimic the behaviors of a tape library. By
emulating a tape system, a VTL can use disc speed to accelerate
backups and restorations while leveraging an organization's
existing backup
software, policies, infrastructure and in-house technical
expertise. Select a VTL that will most closely match your current
tape library system, capacity needs and backup software. For
example, Advanced Digital Information Corp.'s Pathlight VX can
offer up to 57.6 terabytes of capacity while emulating LTO-1 and
LTO-2 drives.
Remote backup
One problem for today's enterprise is the proliferation of
remote offices. Business data can be just as important on servers
in the Boise, Idaho, sales office as in the Seattle headquarters.
Unfortunately, remote offices typically do not have IT staff,
relying instead on non-IT workers to rotate backup tapes and
ship them to a data center. Several trends are appearing to address
this problem. A growing number of organizations are eliminating
tapes in favor of
WAN-based backups that transfer crucial information to the data
center across broadband WAN connections. LiveVault Corp.'s
InControl is one product intended for remote WAN backups. Rather
than creating physical tape backups and rotating them to an off-site
storage facility, some organizations also use the WAN to transfer
data directly to an off-site archive service, such as Iron Mountain Inc.
[see the SearchStorage.com article on remote office
backup].
Bandwidth is the main issue with any WAN-based backup scheme.
High-bandwidth links are expensive, so the focus with WAN backups
is on techniques like data deduplication (a.k.a. single-instance
storage or commonality factoring) and conventional compression,
which reduce the amount of data that must cross the link. Another
popular technique
is to avoid complete backups over WAN and just transfer the most
important business files between locations.
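As a rough illustration of how deduplication and compression cut WAN
traffic, the Python sketch below splits data into fixed-size chunks,
identifies each chunk by its SHA-256 hash, and sends only the chunks
the data center does not already hold, compressed with zlib. The
function names, the 4 KB chunk size and the in-memory "remote store"
are assumptions made for the example; commercial products typically
use more sophisticated (often variable-size) chunking and their own
wire protocols.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, in bytes


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Cut the data into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def plan_transfer(data: bytes, remote_index: set):
    """Decide what must cross the WAN.

    Returns (recipe, payload): recipe lists every chunk hash in order;
    payload holds compressed bytes only for chunks the remote site lacks.
    """
    recipe, payload = [], {}
    for piece in split_chunks(data):
        digest = hashlib.sha256(piece).hexdigest()
        recipe.append(digest)
        if digest not in remote_index and digest not in payload:
            payload[digest] = zlib.compress(piece)
    return recipe, payload


def apply_transfer(recipe: list, payload: dict, remote_store: dict) -> bytes:
    """At the data center: store new chunks, then rebuild the data."""
    for digest, blob in payload.items():
        remote_store[digest] = zlib.decompress(blob)
    return b"".join(remote_store[digest] for digest in recipe)


# Three chunks of data, two of them identical: only two chunks are sent.
store = {}
data = b"A" * 8192 + b"B" * 4096
recipe, payload = plan_transfer(data, set(store))
restored = apply_transfer(recipe, payload, store)
print(len(recipe), len(payload), restored == data)
```

Running the same backup a second time against the now-populated
store produces an empty payload, since every chunk is already present
at the data center and only the short list of hashes needs to be sent.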
Some organizations are eliminating the difficulties of remote IT
by consolidating it into a single data center. Remote access
then uses WAN links with application acceleration technologies,
like wide area file services (WAFS), to serve applications and
files to remote offices just as if the data were local. WAFS
generally involves appliances
installed at both ends of the WAN link, which cache needed files to
each remote office for quick access. Any changes to a file can then
be saved back to the data center as time and bandwidth allow
[see the SearchStorage.com article on WAFS].
Other backup concepts
Backups generally fall into three categories: full, incremental
and differential. A full backup is a complete copy of all files. A
full backup on a server with 528 GB of data will transfer all of
that data to the backup target (e.g., disc or tape). Full backups
take the longest to make, but they are easiest and fastest to
restore. An incremental backup captures only the changes made since
the last backup of any type. If you perform a full 200 GB backup on
a server Monday, and 2 GB of new data are added on Tuesday, an
incremental backup will only capture the new 2 GB. If another 1 GB
changes on Wednesday, only the new 1 GB is captured. Once a full
backup is performed, incremental backups can be very quick. However,
a restoration requires the last full backup first, followed by every
incremental backup made since then, applied in order. By comparison,
a differential backup captures all of the changes made since the
last full backup. For example, if 3 GB changes on Monday, 2 GB on
Tuesday and 7 GB on Wednesday following a full backup, the three
daily differential backups will capture 3 GB, 5 GB and 12 GB
respectively. Differential backups can take longer to make than
incremental backups, but are easier to restore. With a differential
backup, only the full backup and the last differential backup must
be restored.
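The difference between the two schemes can be worked through with a
short Python sketch using the 3 GB, 2 GB and 7 GB daily changes from
the example above. The function names and the backup labels ("full",
"incr-1", "diff-3") are hypothetical, and the sketch assumes each
day's changes touch different data, so that differential sizes simply
accumulate.

```python
def incremental_sizes(daily_changes_gb: list) -> list:
    """Each incremental holds only the changes since the previous backup."""
    return list(daily_changes_gb)


def differential_sizes(daily_changes_gb: list) -> list:
    """Each differential holds all changes since the last full backup."""
    sizes, running_total = [], 0
    for change in daily_changes_gb:
        running_total += change
        sizes.append(running_total)
    return sizes


def restore_set(scheme: str, day: int) -> list:
    """Backups needed, in order, to restore to the end of the given day."""
    if scheme == "incremental":
        # The full backup plus every incremental since, in succession.
        return ["full"] + [f"incr-{n}" for n in range(1, day + 1)]
    if scheme == "differential":
        # The full backup plus only the most recent differential.
        return ["full", f"diff-{day}"]
    raise ValueError(f"unknown scheme: {scheme}")


changes = [3, 2, 7]  # GB changed on Monday, Tuesday, Wednesday
print(incremental_sizes(changes))     # [3, 2, 7]
print(differential_sizes(changes))    # [3, 5, 12]
print(restore_set("incremental", 3))  # ['full', 'incr-1', 'incr-2', 'incr-3']
print(restore_set("differential", 3)) # ['full', 'diff-3']
```

The trade-off is visible in the output: incremental backups move less
data each day, but a Wednesday restore needs four backup sets instead
of two.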
Mirroring and
replication are essentially the same thing -- both create
copies of data -- but there are subtle differences in context.
Replication generally produces an offline copy of the data that
isn't necessarily intended for direct use, while mirroring creates
a data copy that can be used directly. For example, data is
frequently replicated to CD or DVD for long-term archival storage,
but data may be mirrored to disc for RAID.
Snapshot and
continuous data protection (CDP) technologies are also
appearing in disc-based backup systems. Snapshots capture the state
of a storage system at a given point in time, saving detailed
reference information about available data and its location,
similar to a detailed table of contents. When trouble strikes, data
can be restored based on the latest snapshot. Snapshots can be
taken as frequently as a storage administrator deems necessary. CDP
provides even more granular detail, recording each storage
transaction to a journal in real-time. If data loss occurs, the
storage system can be "wound back" to the last good transaction,
which could be minutes, even seconds, ago
[see the SearchStorage.com article on CDP].
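The journaling idea behind CDP can be sketched as follows. This is a
deliberately simplified, hypothetical model (the JournaledVolume
class, its method names and the numeric timestamps are all invented
for illustration): every write records the block's previous contents
in a journal, so the volume can be wound back to any earlier point.
A snapshot here is just a point-in-time copy of the block map; real
products implement both features far more efficiently.

```python
from dataclasses import dataclass, field


@dataclass
class JournaledVolume:
    """Toy block store that journals every write, CDP-style."""
    blocks: dict = field(default_factory=dict)   # block id -> current data
    journal: list = field(default_factory=list)  # (time, block id, old data)

    def write(self, timestamp: float, block_id: int, data: bytes) -> None:
        # Journal the previous contents (None if the block is new)
        # before overwriting, so the write can be undone later.
        self.journal.append((timestamp, block_id, self.blocks.get(block_id)))
        self.blocks[block_id] = data

    def snapshot(self) -> dict:
        """Point-in-time copy of the block map."""
        return dict(self.blocks)

    def rewind(self, timestamp: float) -> None:
        """Undo, newest first, every write made after the given time."""
        while self.journal and self.journal[-1][0] > timestamp:
            _, block_id, previous = self.journal.pop()
            if previous is None:
                self.blocks.pop(block_id, None)
            else:
                self.blocks[block_id] = previous


vol = JournaledVolume()
vol.write(1.0, 0, b"good data")
vol.write(2.0, 0, b"corrupted")  # bad transaction
vol.write(3.0, 1, b"more junk")  # another bad transaction
vol.rewind(1.0)                  # wind back to the last good write
print(vol.blocks)
```

A snapshot can only return the volume to the moment the snapshot was
taken, whereas the journal allows recovery to the point just before
any individual transaction.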
Security is increasingly important for all data backup
operations. Company data often includes confidential or personally
identifiable information that needs to be protected. When a tape is
lost or a network is hacked, sensitive information may fall into
the wrong hands. Backup systems are starting to use
encryption when saving files to tape or archival storage.
Encrypted data cannot be read without the corresponding keys, so
stolen data is of little use as long as the keys remain protected.