A pioneering project by the British Library to digitally store records could serve as a textbook example of sound data management, analysts believe.
If your company is struggling to handle spiralling volumes of data, then spare a thought for the IT staff at the British Library, whose collection of more than 150 million items covers every age and place of written civilisation.
Although in its early stages, the project can be broken down into three main areas:
- Building a digital library store to help create a national digital library
- Making records more accessible for users.
Undertaking something as ambitious as the British Library's wide-ranging digitisation scheme would make most IT managers shudder, but there are common themes that can be applied to other major data-intensive projects.
For example, analysts have highlighted the importance of content management in a project of this scale. Sue Clarke, senior research analyst at Butler Group, explained, "At the back-end you will need a good content management system that will be able to handle digital content such as photos."
Content management systems use a database as a repository for information and the actual choice of database is of paramount importance, said Clarke.
Most content management systems use a relational database although she believes that a system employing an object database could be a better choice. She said, "An object database handles digital content better and there is at least one content management solution on the market that employs an object database for its content, both analogue and digital."
All organisations face the threat of losing data through a technical failure such as a computer crashing, but the British Library also has to consider the issue of technological obsolescence. This is where files on old forms of storage are left stranded because the machines needed to read them have been ditched. For the British Library, digitisation is key.
Neil Smith, of the British Library's e-strategy and programmes, said, "The library intends to digitise significant and important items in its collection to enhance access to its collections. Digitisation can make resources available at the desktop to our existing users and to non-users or those for whom physical access is difficult."
When Margery Brews of Norfolk wrote a Valentine missive to her fiancé John Paston in 1477, she could have had little idea of the complex 21st-century technology that would go into saving it for future generations to read on the Internet. Nor, indeed, would she have realised that her note would form part of one of the UK's most ambitious archiving projects.
Essentially, this Valentine's message is just the tip of a huge project. A British Library spokesman said, "Digitising collection items such as this is part of the library's ongoing commitment to make its collections more accessible to people, wherever they may be."
Like most public sector bodies, the British Library is under pressure to make as much information available to the public as possible. Last year it announced its New Strategic Directions document, which put the Internet firmly at its core and outlined its plan for the next five to seven years. Central to this vision was the desire to make its collections more accessible to people via the Web.
With this in mind, the library is developing a digital library store which forms an integral part of its plan to build a national digital library. Launched in 2000 as part of a 10-year multimillion-pound joint venture with IBM, this store will allow the library to preserve and access electronic materials indefinitely.
The materials to be stored here include existing digital formats such as CD-Roms and items published on the Web, as well as digital copies of analogue objects. The library has already created digitised copies of some of its greatest treasures, such as the Lindisfarne Gospels and the notebooks of Leonardo da Vinci.
Of course, storing the data in this type of project is one thing, but organisations also need to consider how they will protect their data while ensuring availability. Clarke said, "If you want to ensure that there is 24x7 availability then you will have to have some disaster recovery facility in place."
Digital images also have a significant knock-on effect on a company's data storage arrangements.
Analyst firm IDC recently highlighted the fact that storage is now a strategic issue, so organisations need to consider their future needs when getting involved in major storage projects. Clarke said, "You have to also consider the scalability of your storage." Digital images can take up a lot of space, she warned.
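To put Clarke's warning in rough numbers, the short Python sketch below estimates the space taken by uncompressed scans. Every figure in it (the pixel dimensions, the 24-bit colour depth and the count of one million pages) is an illustrative assumption, not a British Library statistic.

```python
def uncompressed_image_bytes(width_px: int, height_px: int,
                             bytes_per_pixel: int = 3) -> int:
    """Size of one uncompressed raster image (24-bit colour by default)."""
    return width_px * height_px * bytes_per_pixel


def collection_terabytes(image_count: int, bytes_per_image: int) -> float:
    """Total storage in decimal terabytes (1 TB = 10**12 bytes)."""
    return image_count * bytes_per_image / 10**12


# Hypothetical figures: a 6000 x 4000 pixel master scan, one million pages.
per_image = uncompressed_image_bytes(6000, 4000)
total_tb = collection_terabytes(1_000_000, per_image)
print(f"{per_image / 10**6:.0f} MB per image, {total_tb:.0f} TB in total")
```

Under these assumptions each scan takes 72 MB and a million of them take 72 TB, before any backup or disaster recovery copies are counted. Compression would reduce the figure, but the sum shows why scalability has to be planned from the start.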
In practical terms, the digital library store means that electronically created and stored material will survive future technology changes. This is a major issue in the research world. The US has already admitted, for example, that a significant amount of Nasa data on the early space programme was lost when the technology it was stored on became obsolete.
The digital library store is being designed according to the Open Archival Information System reference model, which relies heavily on a description of data known as meta data. This means that digital objects can be moved to new hardware or software platforms as technology develops.
The solution devised by IBM involves constructing and storing preservation meta data for every digital item in the collection. In effect, the meta data preserves access to the object and can provide a range of information on it.
The library spokesman said, "The preservation meta data records any changes made to the original object and thus supports preservation of long-term access, whilst retaining the history of the object in digital form."
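As a rough illustration of what the spokesman describes, the Python sketch below keeps a running history of changes alongside a stored object. It is not the IBM design: the class names, fields and the example identifier are all hypothetical, chosen only to show how a record of each change can travel with the object.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PreservationEvent:
    """One change applied to a stored object (hypothetical structure)."""
    action: str      # e.g. "ingest" or "migrate"
    detail: str
    timestamp: str   # ISO 8601, UTC


@dataclass
class PreservationRecord:
    """Preservation meta data kept alongside one digital object."""
    object_id: str
    current_format: str
    history: list[PreservationEvent] = field(default_factory=list)

    def record(self, action: str, detail: str) -> None:
        """Append an event; earlier events are never overwritten."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.history.append(PreservationEvent(action, detail, stamp))

    def migrate(self, new_format: str) -> None:
        """Note a format migration without discarding earlier history."""
        self.record("migrate", f"{self.current_format} -> {new_format}")
        self.current_format = new_format


# Hypothetical object: a scan ingested as TIFF and later migrated.
record = PreservationRecord("example-0001", "TIFF")
record.record("ingest", "digitised copy added to store")
record.migrate("JPEG 2000")
for event in record.history:
    print(event.action, event.detail)
```

Because events are only ever appended, the record shows both the current format of the object and every step that led to it.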
Preserving and storing digital materials while still allowing long-term access is an ongoing dilemma for major research libraries around the world. The British Library, for example, is currently working with other organisations, such as the Dutch national library and IBM, to determine the best long-term strategy.
The British Library also hopes to enable users to search remotely for items when the digital library store is fully up and running. The spokesman said, "When the store and supporting systems are fully implemented, users of the British Library will be able to retrieve objects held in the digital store at workstations in the reading rooms or remotely."
Meanwhile, organisations in both public and private sectors could pick up some useful data management tips from the British Library's digitisation project so far.
Early digitisation successes
A number of success stories have already emerged from the digitisation scheme. For example, the library has launched a Web site allowing academics to explore rare copies of the Gutenberg Bible - the oldest surviving printed book in the Western World. Johannes Gutenberg's Bible was printed in Mainz, Germany, in around 1455, and only 48 copies (of which the British Library has two) and a few fragments have survived. Using the Web site, users can magnify images of the Bible's pages, allowing them to examine details not visible to the naked eye on the original printed copies. Another success is the "turning the pages" project, which gives library users electronic access to medieval manuscripts.
Benefits of the digital library store
The store is being designed to let the library:
- Add new electronic materials to the digital store, including new formats as they come onstream
- Automatically construct meta data from the object itself - effectively saving staff resources
- Achieve zero loss of data from the store and provide a secure environment for the library's digital collection items
- Enable the migration or emulation of objects in the store in order to preserve long-term access
- Exchange meta data with other library IT systems for management information and budgetary purposes.
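Two of the aims listed above, constructing meta data automatically from the object and achieving zero loss of data, can be illustrated with a small Python sketch. It is a minimal example under stated assumptions, not a description of the library's system: the function names, the meta data fields and the sample file name are hypothetical, and a SHA-256 checksum stands in for whatever integrity check the real store uses.

```python
import hashlib
import mimetypes


def build_meta_data(name: str, content: bytes) -> dict:
    """Derive basic meta data from the object itself, with no manual input."""
    guessed_type, _ = mimetypes.guess_type(name)
    return {
        "name": name,
        "size_bytes": len(content),
        "media_type": guessed_type or "application/octet-stream",
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def is_intact(content: bytes, meta_data: dict) -> bool:
    """Integrity check: does the content still match its recorded checksum?"""
    return hashlib.sha256(content).hexdigest() == meta_data["sha256"]


# Hypothetical object standing in for a scanned page.
original = b"example page image bytes"
meta = build_meta_data("page-001.tiff", original)
print(meta["media_type"], meta["size_bytes"])
print(is_intact(original, meta))         # unchanged content
print(is_intact(original + b"!", meta))  # content altered after ingest
```

Recomputing the checksum at intervals and comparing it with the stored value is one common way to detect silent corruption before the last good copy is lost.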