In the following two case studies we look at how to
effectively track and store vast amounts of data to enable both
long-term storage and instant retrieval.
At the heart of a planet-hunting research project based at
Leicester University lies a huge amount of data storage.
In January 2005, the university implemented a massive
integrated storage system based on tape and disc storage as a
repository for 100Tbytes of planetary observation data.
The IT project started when the Department of Physics and
Astronomy at the university went on the hunt for a system to store
image data captured by the Wide Angle Search for Planets (Wasp)
Consortium project, a collaborative venture involving a number of
universities in the UK.
Wasp identifies new planets by searching for slight dips in the
brightness of stars as a planet passes in front of them and
blocks some of their light. Wasp telescopes record tens of
thousands of stars every minute, and can send 8,000 images back to
the university every night.
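The dip-hunting idea can be illustrated with a toy example. The sketch below is not the Wasp consortium's software, which the article does not describe; it simply flags samples in a made-up brightness series that fall more than a chosen fraction below the median. The function name, the 1% threshold and the sample values are all assumptions for illustration.

```python
from statistics import median


def find_dips(brightness, threshold=0.01):
    """Return the indices of samples that fall more than `threshold`
    (as a fraction of the median) below the median brightness."""
    baseline = median(brightness)
    return [
        i for i, value in enumerate(brightness)
        if (baseline - value) / baseline > threshold
    ]


# Synthetic, invented light curve: a steady star with a 2% dip
# lasting three samples, as a transiting planet might cause.
light_curve = [1.00, 1.00, 1.001, 0.999, 0.98, 0.98, 0.98, 1.00, 1.00, 1.001]

dips = find_dips(light_curve)
print(dips)  # indices of the samples flagged as part of a dip
```

A real survey would also have to cope with noise, weather and stars that vary naturally, so a fixed threshold like this is only a starting point.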
Scientists analyse the data, searching for evidence of new
planets, and this creates yet more data. The images and data are
stored at the University of Leicester in a database recording
observations of tens of millions of different stars.
The observational data from Wasp is available to academics
worldwide, and over the initial five-year life of the project it is
estimated 100Tbytes of data will be collated.
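To give a feel for the scale, the figures quoted above can be turned into a rough daily rate. This is back-of-envelope arithmetic only: it assumes decimal terabytes, observations every night of the year, and that the whole 100Tbytes is image data arriving at the upper rate of 8,000 images a night, none of which the project itself states.

```python
TOTAL_BYTES = 100e12        # 100Tbytes, taken as decimal terabytes (assumption)
YEARS = 5                   # initial project life quoted in the article
IMAGES_PER_NIGHT = 8_000    # upper figure quoted in the article

nights = YEARS * 365                       # assumes observing every night
bytes_per_night = TOTAL_BYTES / nights
bytes_per_image = bytes_per_night / IMAGES_PER_NIGHT

print(f"{bytes_per_night / 1e9:.1f} Gbytes per night")
print(f"{bytes_per_image / 1e6:.2f} Mbytes per image")
```

Under those assumptions the average works out at roughly 55Gbytes a night, or just under 7Mbytes per image. Because the 100Tbytes also includes processed data, the true size of a raw image is likely to differ.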
The university was looking for an integrated system that came in
at a competitive price, and which could allow it to add capacity as
it was required.
Among the storage systems the university considered were large
hierarchical arrays of discs with traditional back-up tape
cabinets. However, few of the systems appealed, said Richard West,
research fellow at Leicester University, who heads the university's
work on the Wasp project.
Instead, West was attracted to a proposition from storage
provider SGI. The two organisations already had a good relationship
and the university had bought a significant amount of hardware from
SGI.
SGI's proposal was to build a multi-supplier storage system,
with SGI being a single point of contact and support for the
equipment. This, in addition to the £350,000 price tag, made the
system the most attractive for the university.
The system itself uses tape, disc and storage software from SGI,
Engenio and Adic. Adic's Scalar i2000 tape library supplies
140Tbytes of tape storage to hold the vast amounts of raw data that
the telescopes produce.
SGI's TP9300, based on technology from Engenio, offers an
additional 30Tbytes of disc space for fast access to the smaller,
processed data files. The university uses SGI's DMF data migration
software to retrieve data quickly from the Adic tape library -
almost as quickly as from the disc system.
The raw planetary observation data originates at telescopes on the
island of La Palma in the Canary Islands and in South Africa. It is
shipped by courier to the UK and processed at other universities,
including Keele and Queen's Belfast.
These facilities then send the data via the Janet academic
network to Leicester University, which applies some secondary
processing, using a cluster of Linux servers to organise and log
the data.
The system has proved very reliable, but the main challenge was
in managing the data that came in, and keeping track of where it
was stored.
"The system presents us with a file system that is 200Tbytes in
size, so if you do not manage what you are doing you are in
trouble," said West.
The university therefore developed a Linux-based database
application that runs on a low-cost AMD Opteron server to keep
track of the data. "An important part of our strategy is delivering
a cost-effective solution," said West.
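The article does not say how the university's tracking application is built, but the general idea of a catalogue that records where each file lives can be sketched in a few lines. The example below uses Python's built-in sqlite3 module with an in-memory database; the table layout, function names, file paths and sizes are all hypothetical.

```python
import sqlite3

# In-memory database for illustration; a real catalogue would live on disc.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE files (
        path        TEXT PRIMARY KEY,
        tier        TEXT NOT NULL CHECK (tier IN ('disc', 'tape')),
        size_bytes  INTEGER NOT NULL,
        observed_on TEXT NOT NULL
    )
""")


def register(path, tier, size_bytes, observed_on):
    """Record (or update) where a file is stored."""
    conn.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
        (path, tier, size_bytes, observed_on),
    )


def locate(path):
    """Return the storage tier holding `path`, or None if it is unknown."""
    row = conn.execute(
        "SELECT tier FROM files WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else None


def usage_by_tier():
    """Return the total bytes recorded against each tier."""
    return dict(conn.execute(
        "SELECT tier, SUM(size_bytes) FROM files GROUP BY tier"
    ))


# Invented entries: one raw image on tape, one processed file on disc.
register("raw/2005-01-10/img_0001.fits", "tape", 8_000_000, "2005-01-10")
register("processed/2005-01-10/lightcurves.dat", "disc", 500_000, "2005-01-10")

print(locate("raw/2005-01-10/img_0001.fits"))
print(usage_by_tier())
```

Keeping the location of every file in a small database means the question "where is this night's data?" can be answered without searching a 200Tbyte file system.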
One major benefit of the SGI storage system is that it offers
flexible licensing through what is termed capacity on demand -
which means that SGI sends additional storage cartridges and
cabinets as they are required.
In 20 months of using the system, Leicester University has
doubled its tape capacity, which it did not originally anticipate.
This flexibility was the reason it chose capacity on demand.
"University project funding often comes in dribs and drabs, and
that sort of system allows us to buy capacity when we can afford
it," said West.
"Our primary goal was to get as much storage as possible, and we
calculated this on the split between disc and tape, with disc being
a fraction for live data. Over time, we have added more tape but
not more disc. We found that our usage patterns have changed and we
are using tape for long-term storage," said West.
He said that although it is hard to measure a return on
investment for the system, he believed that the management software
was good and that the hardware was reliable. "It sits there and
ticks away," he said.
In terms of expanding the system beyond 200Tbytes, West said,
"We have enough storage for our projected data acquisitions over
the next year or two, and then we will see how the funding
goes."
How document management made the difference
Birmingham City Council is one of Europe’s largest councils,
employing 55,000 staff and providing services to more than one
million citizens. As a result of its size, the council’s data
storage requirements are vast.
The council needed to adopt a new storage platform which had
document management technology integrated into it so it could scan
and store documents.
The council had a number of libraries that were full of
documents, with more arriving each day. For example, it was
receiving 800 planning applications a day.
Eman Al-Hillawi, electronic document management system project
manager, said the problem hit home during an office move. “We had
to dedicate a full floor to our physical archive,” he said.
In addition, the council was predominantly using paper-based
processes, which were inefficient and time consuming. The council
needed to meet e-government objectives, cut costs, conserve space
and improve document control and access.
Al-Hillawi said, “We wanted to improve service to citizens. With
paper files, only one council employee at a time can hold the
master version of a document. That makes it difficult for other
staff members to have timely access to all of the latest materials
they need to make decisions.”
The council tested a number of different storage systems, using
professional contractors and in-house benchmark testers, eventually
choosing a unified storage system from NetApp.
This system met the various criteria of performance,
connectivity, scalability and disaster recovery, said Andrew Jones,
server manager at the council.
The architecture of the NetApp EDMS system was based on a NetApp
FAS920c, a mid-range storage filer unit that can scale up to
12Tbytes. This is used to consolidate the council’s primary data
for an EMC Documentum document management application, as well as
for storing user files.
The council deployed the system across its planning and urban
design service areas, scanning and indexing all its paper
documents. More than 150,000 documents have now been added to the
system, including planning applications, project documents,
drawings, photographs and specifications. The storage system also
supports Autocad design images, voice files and video
recordings.
The council linked its Oracle database into the storage system,
and 1,500 council employees now access Oracle-based Documentum and
Windows files stored on the NetApp system.