Much is made in the enterprise data storage industry about
the performance of disc systems over tape drives, but the managers
of one datacentre that has reached the far limits of capacity say
otherwise. Budget and performance demands forced them to build
access protocols and data management tools for disc systems from
scratch. High-end commercial tape drives, on
the other hand, have largely met their requirements as one of the
largest data-producing facilities in the world.
The facility is the Large Hadron Collider (LHC), owned by CERN,
the world's largest physics laboratory, in Switzerland. The
collider, which will be used for new, highly data-intensive
experiments beginning in May 2008, is a tube large enough in
diameter to drive a small car through. It accelerates particles
around a 10-mile-wide circle formed by the tube underground,
bringing them together at four set collision points in order to
smash them apart. Even further down, 12-story-high caverns full of
electronic detection equipment collect raw data on the
collisions.
With the new project being launched next year, scientists hope
they can use the collider to discover new subatomic particles,
which in turn could help to explain fundamental mysteries of the
universe.
"One particular theoretical particle that we're looking for is
called the Higgs boson particle," said Francois Grey, head of IT at
CERN. "It's the missing piece in a model known as the Standard
Model that provides a coherent picture of our universe."
When fully operational, these new experiments will produce 15
petabytes (PB) of raw data annually. During each collision, the
system produces high-resolution images in the hopes of capturing
evidence of the elusive particle. From that, the data is pared down
to about 1 PB of refined event summary data annually.
This data is stored on a massive farm of
network attached storage (Nas) servers, currently 500 in all,
though that number will increase to 800 by the end of this year.
The servers are based on Red Hat Enterprise Linux running XFS and
whitebox x86 hardware. Some older systems with smaller capacity are
included in the Nas farm, but each of the servers currently being
delivered to CERN comes with twenty 500 GB SATA discs, or 10
terabytes (TB) total storage. Currently, the farm has 3.5 PB of
capacity, and once more are added next year that number will nearly
double.
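The capacity figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal units (1 TB = 1,000 GB) and that every one of the 300 additional servers uses the twenty-disc configuration described, neither of which the article confirms. The function names are hypothetical.

```python
# Back-of-the-envelope check of the NAS farm figures quoted above.
# Assumptions (not from CERN): decimal units, and all new servers
# use the 20 x 500 GB configuration.

GB_PER_TB = 1000
TB_PER_PB = 1000


def server_capacity_tb(discs=20, disc_size_gb=500):
    """Raw capacity of one server in terabytes."""
    return discs * disc_size_gb / GB_PER_TB


def added_capacity_pb(new_servers, per_server_tb):
    """Raw capacity contributed by a batch of new servers, in petabytes."""
    return new_servers * per_server_tb / TB_PER_PB


per_server_tb = server_capacity_tb()                    # twenty 500 GB discs
added_pb = added_capacity_pb(800 - 500, per_server_tb)  # 300 extra servers
projected_pb = 3.5 + added_pb                           # current farm plus additions

print(f"Per server: {per_server_tb} TB")
print(f"Added: {added_pb} PB, projected total: {projected_pb} PB")
```

Under those assumptions the farm grows from 3.5 PB to 6.5 PB, which is consistent with the "nearly double" estimate.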
Data is placed on the Nas servers, granted access from clients,
migrated to tape libraries and managed using a home-grown program
called CASTOR, which stands for CERN Advanced STORage manager.
According to Tony Cass, the leader of the Fabric and
Infrastructure Operations group at the CERN facility, during
previous projects at the collider, data storage had been handled by
a mainframe and Unix systems, but CERN moved to open systems
several years ago when it could no longer afford the number of CPU
calculations required for the mainframe to support its data.
The organization looked into a number of commercially available
filesystems and file management products, including GPFS, Lustre
and Sun Microsystems' SAM-FS, which performs similar functions to
CASTOR, but never found one that could meet its scalability and
performance requirements. In addition to requiring hundreds of
gigabytes per second throughput, the system must also allow
concurrent access to the grid from researchers around the world,
including grid partner sites in other countries that are hosting
copies of some of the analysis data. The organization has also
found that standard filesystem protocols, like CIFS and NFS, aren't
up to the task, so it's using specialized community-developed
communication protocols specific to high-energy physics research
for client access to the system.
While the disc technologies and software being used will be
exotic to most enterprise storage managers, with the possible
exception of Google Inc. (Cass said that datacentres like those
run by Google, which use a farm of commodity PCs, are the closest
commercial comparison to the datacentre at CERN), the tape libraries
used for long-term storage should be familiar. CERN has 160 tape
drives backing its system in all: fifty 3592 and TS1120 drives
spread across two IBM 3584 libraries, another 50 T10000 drives
spread across two Sun/StorageTek SL8500 silos, and 60 older
Sun/StorageTek 9940s.
The proprietary tape drives go against the organization's
standards favoring inexpensive commodity products, Cass
admitted. But he said CERN had found that the proprietary drives
had performance advantages over LTO, which CERN evaluated first,
and that the ability to repurpose media would more than cover
the higher cost of the libraries.
It's a capability that has long been supported by IBM and
StorageTek, according to IDC tape analyst Robert Amatruda, but it
often gets "lost in the discussion" around tape. Proprietary media
cartridges can be reformatted at the density of a new generation.
For example, today's 500 GB cartridge can be reformatted to a 1 TB
cartridge if there is a capacity refresh for the format.
"It takes a long time to do, so many users don't do it,"
Amatruda said.
However, Cass said that at about $200,000 for a petabyte's worth
of cartridges and the anticipated capacity of 15 PB, the ability to
recycle the cartridges will save CERN millions per year.
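A rough sketch of the arithmetic behind that estimate follows. It uses the $200,000-per-petabyte and 15 PB figures Cass quoted and the 500 GB to 1 TB reformatting example Amatruda gave. How CERN actually accounts for the savings is not stated, so the doubling factor applied to a full year of media and the function names are assumptions made for illustration.

```python
# Illustrative estimate of media spending avoided by reformatting cartridges.
# Assumption (not from CERN): one density refresh doubles the capacity of
# cartridges that already hold a full year of raw data.

COST_PER_PB_USD = 200_000   # quoted cost of a petabyte's worth of cartridges
ANNUAL_RAW_DATA_PB = 15     # quoted annual raw data volume


def media_cost_usd(capacity_pb, cost_per_pb=COST_PER_PB_USD):
    """Cost of buying new cartridges for the given capacity."""
    return capacity_pb * cost_per_pb


def capacity_gained_pb(existing_pb, density_factor):
    """Extra capacity obtained by reformatting existing cartridges."""
    return existing_pb * (density_factor - 1)


density_factor = 1000 // 500    # 500 GB cartridge reformatted to 1 TB
gained_pb = capacity_gained_pb(ANNUAL_RAW_DATA_PB, density_factor)
avoided_usd = media_cost_usd(gained_pb)

print(f"New media for one year: ${media_cost_usd(ANNUAL_RAW_DATA_PB):,}")
print(f"Capacity gained by reformatting: {gained_pb} PB")
print(f"Purchase avoided: ${avoided_usd:,}")
```

On these assumptions, one year of new media costs $3 million, and reformatting a year's worth of cartridges avoids a purchase of the same size.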
Though the tapes, which are constantly being written,
reformatted, copied, transferred and read by the CASTOR system, have
not been a bottleneck, there are some enhancements that Cass said
he'd like to see to boost their performance further. For example,
due to the high mount rate inside CERN's silos, Cass said he'd like
to see the tape robots have the ability to "pre-fetch" cartridges
for the appropriate drives so that the drives can be fed faster.
Cass also said he has in no way dismissed the possibility of
going with a commercial disc system down the road. "If you look out
in a few years, other enterprises are coming up with similar
[capacity and performance] demands. Five years down the line, I
would expect commercial products to have caught up with us. After
all, five years ago you didn't have the transfer rates or
capacities you're seeing with these high-end tape drives now."