As a digital version of the oldest known English language
Valentine's Day message goes up on its Web site, James Rogers looks
at what other organisations can learn from the British Library's
project to digitise its most important documents.
A pioneering project by the British Library to digitally store
records could serve as a textbook example of sound data management,
analysts believe.
If your company is struggling to handle spiralling volumes of data
then spare a thought for the IT staff at the British Library, whose
collection of more than 150 million items covers every age and
place of written civilisation.
Although in its early stages, the project can be broken down into
three main areas:
- Digitisation
- Building a digital library store to help create a national
digital library
- Making records more accessible for users.
An undertaking as ambitious as the British Library's
wide-ranging digitisation scheme would make most IT managers
shudder, but it raises common themes that apply to other
major data-intensive projects.
For example, analysts have highlighted the importance of content
management in a project of this scale. Sue Clarke, senior research
analyst at Butler Group, explained, "At the back-end you will need
a good content management system that will be able to handle
digital content such as photos."
Content management systems use a database as a repository for
information and the actual choice of database is of paramount
importance, said Clarke.
Most content management systems use a relational database, although
Clarke believes that a system built on an object database could be a
better choice. She said, "An object database handles digital
content better and there is at least one content management
solution on the market that employs an object database for its
content, both analogue and digital."
All organisations face the threat of losing data through a
technical failure such as a computer crash, but the British
Library also has to consider technological obsolescence, where
files on old forms of storage are left stranded because the
machines needed to read them have been scrapped. For the British
Library, digitisation is key.
Neil Smith, of the British Library's e-strategy and programmes,
said, "The library intends to digitise significant and important
items in its collection to enhance access to its collections.
Digitisation can make resources available at the desktop to our
existing users and to non-users or those for whom physical access
is difficult."
When Margery Brews of Norfolk wrote a Valentine missive to her
fiancé John Paston in 1477, she could have had little idea of the
complex 21st-century technology that would go into saving it for
future generations to read on the Internet. Nor, indeed, could she
have realised that her note would form part of one of the UK's most
ambitious archiving projects.
This Valentine's message is just one small part of a huge
project. A British Library spokesman said, "Digitising collection
items such as this is part of the library's ongoing commitment to
make its collections more accessible to people, wherever they may
be."
Like most public sector bodies, the British Library is under
pressure to make as much information available to the public as
possible. Last year it announced its New Strategic Directions
document, which put the Internet firmly at the core of its strategy
and outlined its plan for the next five to seven years. Central to
this vision was
the desire to make its collections more accessible to people via
the Web.
With this in mind, the library is developing a digital library
store which forms an integral part of its plan to build a national
digital library. Launched in 2000 as part of a 10-year
multimillion-pound joint venture with IBM, this store will allow
the library to preserve and access electronic materials
indefinitely.
The materials to be stored here include existing digital formats
such as CD-ROMs and items published on the Web, as well as digital
copies of analogue objects. The library has already created
digitised copies of some of its greatest treasures, such as the
Lindisfarne Gospels and the notebooks of Leonardo da Vinci.
Of course, storing the data in this type of project is one thing,
but organisations also need to consider how they will protect their
data while ensuring availability. Clarke said, "If you want to
ensure that there is 24x7 availability then you will have to have
some disaster recovery facility in place."
Digital images also have a significant knock-on effect on a
company's data storage arrangements.
Analyst firm IDC recently highlighted the fact that storage is now
a strategic issue, so organisations need to consider their future
needs when getting involved in major storage projects. Clarke said,
"You have to also consider the scalability of your storage."
Digital images can take up a lot of space, she warned.
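As a rough illustration of why image storage needs planning, the
sketch below estimates the uncompressed size of a scanned image
collection. The image dimensions, colour depth and item count are
hypothetical figures chosen for the arithmetic, not numbers from the
British Library, and real collections would normally use compression.

```python
def uncompressed_image_bytes(width_px, height_px, bits_per_pixel=24):
    """Size of one uncompressed raster image in bytes."""
    return width_px * height_px * bits_per_pixel // 8


def collection_size_tb(image_count, width_px, height_px, bits_per_pixel=24):
    """Total uncompressed size in terabytes (1 TB = 10**12 bytes)."""
    per_image = uncompressed_image_bytes(width_px, height_px, bits_per_pixel)
    return image_count * per_image / 10**12


# Hypothetical inputs: one million page images at 4000 x 6000 pixels,
# stored as 24-bit colour.
per_image = uncompressed_image_bytes(4000, 6000)
print(per_image)                                   # 72000000 (72 MB)
print(collection_size_tb(1_000_000, 4000, 6000))   # 72.0
```

Under these assumed figures, a million images would need about 72 TB
before any backup or disaster recovery copies are counted, which is
why scalability has to be considered from the outset.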
In practical terms, the digital library store means that
electronically created and stored material will survive future
technology changes. This is a major issue in the research world.
The US has already admitted, for example, that a significant amount
of Nasa data on the early space programme was lost when the
technology it was stored on became obsolete.
The digital library store is being designed according to the Open
Archival Information System reference model, which relies heavily
on meta data, that is, data that describes each stored object.
Because every object is held together with this description, it can
be moved to new hardware or software platforms as technology
develops.
The solution devised by IBM involves constructing and storing
preservation meta data for every digital item in the collection. In
effect, the meta data preserves access to the object and can
provide a range of information on it.
The library spokesman said, "The preservation meta data records any
changes made to the original object and thus supports preservation
of long-term access, whilst retaining the history of the object in
digital form."
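The idea of meta data that records every change to an object can be
sketched in a few lines of Python. This is a minimal illustration
only: the class names, fields and sample values are hypothetical and
do not represent IBM's design or any published meta data schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib


@dataclass
class PreservationEvent:
    """One change applied to a digital object (hypothetical structure)."""
    action: str      # e.g. "ingest" or "migrate"
    detail: str
    sha256: str      # checksum of the object after the change
    timestamp: str


@dataclass
class PreservationRecord:
    """Illustrative preservation meta data for a single digital object."""
    object_id: str
    current_format: str
    events: list = field(default_factory=list)

    def record(self, action, detail, content, new_format=None):
        # Append to the history rather than overwriting it, so every
        # earlier state of the object remains documented.
        if new_format is not None:
            self.current_format = new_format
        self.events.append(PreservationEvent(
            action=action,
            detail=detail,
            sha256=hashlib.sha256(content).hexdigest(),
            timestamp=datetime.now(timezone.utc).isoformat(),
        ))


# Hypothetical object: ingested as TIFF, later migrated to JPEG 2000.
record = PreservationRecord(object_id="example-0001",
                            current_format="image/tiff")
record.record("ingest", "master image received", b"original bytes")
record.record("migrate", "converted to a newer format", b"migrated bytes",
              new_format="image/jp2")

for event in record.events:
    print(event.action, event.sha256[:12])
```

Because events are only ever appended, the record keeps the full
history of the object while still showing its current format.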
Preserving and storing digital materials while still allowing
long-term access is an ongoing dilemma for major research libraries
around the world. The British Library, for example, is currently
working with other organisations, such as the Dutch national
library and IBM, to work out the best long-term strategy.
The British Library also hopes to enable users to search remotely
for items when the digital library store is fully up and running.
The spokesman said, "When the store and supporting systems are
fully implemented, users of the British Library will be able to
retrieve objects held in the digital store at workstations in the
reading rooms or remotely."
Meanwhile organisations in both public and private sectors could
pick up some useful data management tips from the British Library's
digitisation project so far.
Early digitisation successes
A number of success stories have already emerged from the
digitisation scheme. For example, the library has launched
a Web site allowing academics to explore rare copies of the
Gutenberg Bible - the oldest surviving printed book in the Western
World. Johannes Gutenberg's Bible was printed in Mainz, Germany, in
around 1455 and only 48 copies (of which the British Library has
two) and a few fragments have survived. Using the Web site, users
can magnify images of the Bible's pages, allowing them to examine
details not visible to the naked eye on the original printed copies.
Another success is the "Turning the Pages" project, which gives
library users electronic access to medieval manuscripts.
Benefits of the digital library store
The store is intended to allow the library to:
- Add new electronic materials to the digital store, including
new formats as they come onstream
- Automatically construct meta data from the object itself -
effectively saving staff resources
- Achieve zero loss of data from the store and provide a secure
environment for the library's digital collection items
- Enable the migration or emulation of objects in the store in
order to preserve long-term access
- Exchange meta data with other library IT systems for management
information and budgetary purposes.
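Two of the aims above, constructing meta data from the object itself
and detecting any loss of data, can be illustrated with a short
Python sketch. The function names and the small signature table are
assumptions made for this example and say nothing about how the
library's store is actually implemented.

```python
import hashlib

# A few well-known file signatures ("magic numbers").
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"%PDF-": "application/pdf",
    b"II*\x00": "image/tiff",   # little-endian TIFF
    b"MM\x00*": "image/tiff",   # big-endian TIFF
}


def describe(content):
    """Derive basic meta data from the object's own bytes."""
    fmt = "application/octet-stream"  # fallback when no signature matches
    for signature, mime in SIGNATURES.items():
        if content.startswith(signature):
            fmt = mime
            break
    return {
        "format": fmt,
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def is_intact(content, metadata):
    """Fixity check: does the object still match its recorded checksum?"""
    return hashlib.sha256(content).hexdigest() == metadata["sha256"]


stored = b"%PDF-1.4 example object"   # hypothetical stored object
meta = describe(stored)
print(meta["format"], meta["size_bytes"])
print(is_intact(stored, meta))          # True: bytes are unchanged
print(is_intact(stored + b"x", meta))   # False: content has changed
```

Re-running the checksum comparison on a schedule is one common way
to spot silent corruption before the last good copy is lost.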