An ABC guide to data migration

To achieve an effective migration to a new system, first invest some time and effort in cleaning your data

Gigo - garbage in, garbage out - is one of the most ingrained principles of IT, but it can still be the last thing on anyone's mind in a major software implementation, writes Ross Bentley. Yet since most newly installed systems need to work with data that used to be processed by the system they replace, ensuring that this data is correctly migrated to the new system is an essential task.

It is seldom straightforward, warns Richard John, managing director of data migration specialist Alchemy. Imagine, he says, having to transfer a train-load of passengers and their luggage between trains on different gauges, with carriages of different dimensions, into exactly the same arrangement as before.

There are two home truths about data migration that have to be taken on board, says John: there is always a bigger mess than you thought, and your assumptions are always wrong.

Cracking the problem as painlessly as possible is a matter of ABC - analysis, building and cleaning, says John. Analysis should be done with a methodical tool rather than, as it usually is, by hand. Manual analysis is not only unreliable but slow and expensive, accounting for anything from a third to a half of the cost of the whole implementation.

"Data migration is the last great bastion of manual effort - but it is too big to do manually. There can be thousands of individual data elements in a legacy system, so people start making assumptions about it, which gives us dodgy foundations [for the new system]," he says.

Scoping, for example, can go wrong if it is based simply on the size of the database. "Far more important than the volume of data is the number of fields, each with its own code," says John. "For every single field, to analyse it, build the interface code, test it and implement it takes about seven hours."

"How clean is your data? That's the great unknown," warns John. Data quality, he says, is as important as the complexity of the data. Typically, about 5% of data may fail to conform to a business rule it is supposed to obey, often because of an undocumented, spur-of-the-moment programming fix to a one-off problem.
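
John's argument that analysis should be done by a tool working on the data itself, rather than on documentation, can be illustrated with a short Python sketch. It is a minimal illustration, not Alchemy's method or any product's: it profiles the fields actually present in some hypothetical legacy records, counts empty and distinct values per field, measures how many records break a hypothetical business rule, and applies John's rule of thumb of about seven hours per field to produce a first scoping figure. The record layout, field names, rule and function names are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable

# John's rule of thumb: about seven hours per field to analyse it,
# build the interface code, test it and implement it.
HOURS_PER_FIELD = 7


@dataclass
class FieldProfile:
    name: str
    nulls: int     # missing or empty values
    distinct: int  # distinct non-empty values


def profile_fields(records: list[dict]) -> list[FieldProfile]:
    """Profile every field found in the records themselves, not in the documentation."""
    names = sorted({key for rec in records for key in rec})
    profiles = []
    for name in names:
        values = [rec.get(name) for rec in records]
        present = [v for v in values if v not in (None, "")]
        profiles.append(FieldProfile(name, len(values) - len(present), len(set(present))))
    return profiles


def rule_violations(records: list[dict], rule: Callable[[dict], bool]) -> list[dict]:
    """Return the records that break a business rule."""
    return [rec for rec in records if not rule(rec)]


def estimate_hours(profiles: list[FieldProfile]) -> int:
    """Scope on the number of fields, not on the volume of data."""
    return len(profiles) * HOURS_PER_FIELD


# Hypothetical legacy records; C003 carries an unexpected status code.
legacy = [
    {"cust_id": "C001", "status": "A", "credit_limit": 500},
    {"cust_id": "C002", "status": "A", "credit_limit": 1200},
    {"cust_id": "C003", "status": "X", "credit_limit": -1},
    {"cust_id": "C004", "status": "C", "credit_limit": None},
]


def status_rule(rec: dict) -> bool:
    """Hypothetical business rule: status must be A (active) or C (closed)."""
    return rec.get("status") in {"A", "C"}


if __name__ == "__main__":
    profiles = profile_fields(legacy)
    for p in profiles:
        print(f"{p.name}: {p.nulls} empty, {p.distinct} distinct")
    bad = rule_violations(legacy, status_rule)
    print(f"{len(bad) / len(legacy):.0%} of records break the status rule")
    print(f"first scoping estimate: {estimate_hours(profiles)} hours")
```

On the sample, the sketch finds three fields, flags C003 as breaking the status rule and gives a first estimate of 21 hours. The same per-field arithmetic, applied to a legacy system with thousands of data elements, is why field count rather than database size is the figure to scope on.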

Moreover, the data may have been clean enough for the old system, but the new system could, for example, need to collate data from a customer point of view rather than by product line. This means establishing whether Bill Jones buying product A is the same person as Bill J Jones buying product B, or W Jones buying product C, before you can work out just how profitable Mr Jones is as a customer.

Traditionally, says John, analysis takes about 10% of the data migration effort, with the remainder split between coding (50%) and cleaning (40%). It is better to take longer over the initial analysis and get it right, he advises, than to have the code fall over in the build phase and have to go back and re-analyse.

Good analysis, with a tool let loose on the data itself to understand its structure and content, rather than on metadata or documentation, can cut the number of iterations of the build stage, in which the data is carried forward from one environment to another, to only two or three. Otherwise, 30 iterations is not uncommon, warns John.

There is no doubt, he says, that data migration is not regarded as the most fascinating aspect of a major new systems implementation project, yet if it is skimped, muffed or even rushed, the success of the new system can be dangerously compromised. Gigo will come back to bite. "If users discover they can't get invoices sent out because of rubbish data it's a bit of a shock," John warns.
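
A crude first pass at the Bill Jones problem can be sketched in Python: reduce each name to its surname plus first initial, after expanding a few common nicknames, and treat records that share that key as candidates for the same customer. The nickname table, the matching key and the function names are assumptions made for illustration, not a description of any particular cleansing tool. The key deliberately over-matches (a Wendy Jones would land in the same group), so candidates would normally be confirmed against other fields, such as address, before being merged.

```python
from collections import defaultdict

# Hypothetical nickname table for illustration; a real one would be far larger.
NICKNAMES = {"bill": "william", "will": "william", "bob": "robert", "liz": "elizabeth"}

Order = tuple[str, str]  # (customer name as keyed in, product)


def match_key(full_name: str) -> tuple[str, str]:
    """Reduce a name to (surname, first initial), expanding known nicknames."""
    parts = full_name.replace(".", " ").lower().split()
    if not parts:
        return ("", "")
    surname = parts[-1]
    if len(parts) == 1:
        return (surname, "")
    first = NICKNAMES.get(parts[0], parts[0])
    return (surname, first[0])


def candidate_customers(orders: list[Order]) -> dict[tuple[str, str], list[Order]]:
    """Group orders whose customer names share a match key."""
    groups = defaultdict(list)
    for name, product in orders:
        groups[match_key(name)].append((name, product))
    return dict(groups)


orders = [
    ("Bill Jones", "A"),
    ("Bill J Jones", "B"),
    ("W Jones", "C"),
    ("Mary Smith", "A"),
]

if __name__ == "__main__":
    for key, members in candidate_customers(orders).items():
        if len(members) > 1:
            print(f"possible single customer {key}: {members}")
```

On the sample, all three Jones orders share the key ("jones", "w") and are reported together, while Mary Smith stands alone. Because the key is so coarse, the output is best treated as a worklist for the cleaning phase rather than as an automatic merge.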

Ross Bentley

  • There is always a bigger mess than you thought

  • Your assumptions are always wrong.
