Legacy data management does not allow for easy or efficient data access. Once entered into such legacy systems, data cannot be viewed or reused by users across departments and business units in the organization. Data migration is also tedious, given the variety of outdated data repositories and the scores of legacy applications involved. A 360-degree view of this data can be achieved only by building a master data management (MDM) hub.
Follow this step-by-step guide to move legacy data into an MDM hub:
1) Set up the environment
In most cases this is the staging environment: a data collection area set up in parallel to the production environment. It stores the data unified from the various sources and is where the MDM hub activity commences.
Gather data from your legacy source systems that hold multiple copies of the same master information for customers, vendors, products, or inventory. Reconcile and clean the legacy data in the staging area to create the final copy of the MDM data before transitioning it to the production environment, using validation frameworks and deduplication algorithms.
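The staging-area cleansing described above can be sketched as follows. This is a minimal illustration, not a production framework: the field names, the validation rule, and the choice of natural key for deduplication are all assumptions.

```python
# Minimal sketch of staging-area cleansing: validate required fields and
# deduplicate legacy customer records on a normalized natural key.
# Field names and rules are illustrative assumptions, not a fixed schema.

def normalize(record):
    """Build a comparison key from name and postal code, case/space-insensitive."""
    return (record["name"].strip().lower(), record["postal_code"].strip())

def validate(record, required=("name", "postal_code")):
    """Reject records missing any required master data element."""
    return all(record.get(f, "").strip() for f in required)

def cleanse(legacy_records):
    """Keep the first valid copy of each distinct customer; drop the rest."""
    seen, golden = set(), []
    for rec in legacy_records:
        if not validate(rec):
            continue                     # route to a correction queue in practice
        key = normalize(rec)
        if key not in seen:
            seen.add(key)
            golden.append(rec)
    return golden

staged = cleanse([
    {"name": "Acme Corp", "postal_code": "400001"},
    {"name": " ACME CORP ", "postal_code": "400001"},  # duplicate from another system
    {"name": "", "postal_code": "400002"},             # fails validation
])
# staged holds a single Acme Corp record
```

A real validation framework would apply many more rules (formats, reference data, mandatory relationships), but the shape — validate, normalize, deduplicate — stays the same.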
2) Compose the registry
For the MDM hub registry, extract and clean data from the different instances of a master entity, for example, 'Customer'. Create a central registry with an enterprise ID for each customer and a cross-reference to the customer's code in each individual instance. Maintain the essential, regularly required data elements, such as name, address, and telephone number, in the registry.
Refresh this registry at regular intervals to keep it up to date until the data synchronization for the MDM hub is completed. This approach does not require any change to existing application systems and provides a quick win.
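The registry described above — one enterprise ID per customer, cross-referenced to each source system's local code — can be sketched like this. The class name, the ID sequence, and the use of a lowercased name as the matching key are illustrative assumptions.

```python
# Illustrative registry: one enterprise ID per customer, cross-referenced to
# each source system's local code. Structure and names are assumptions.
import itertools

class CustomerRegistry:
    def __init__(self):
        self._ids = itertools.count(1000)   # enterprise ID sequence
        self._by_key = {}                   # natural key -> enterprise ID
        self._xref = {}                     # enterprise ID -> {system: local_code}
        self._core = {}                     # enterprise ID -> essential elements

    def register(self, system, local_code, name, address, phone):
        key = name.strip().lower()
        eid = self._by_key.get(key)
        if eid is None:                     # first time this customer is seen
            eid = next(self._ids)
            self._by_key[key] = eid
            self._core[eid] = {"name": name, "address": address, "phone": phone}
            self._xref[eid] = {}
        self._xref[eid][system] = local_code  # cross-reference the local code
        return eid

    def lookup(self, eid):
        return self._core[eid], self._xref[eid]

reg = CustomerRegistry()
e1 = reg.register("ERP", "C-17", "Acme Corp", "1 Main St", "555-0100")
e2 = reg.register("CRM", "AC9", "Acme Corp", "1 Main St", "555-0100")
# e1 == e2: both local codes now map to the same enterprise ID
```

The periodic refresh the text mentions would simply re-run `register` over fresh extracts from each source.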
3) Synchronize and harmonize
Undertake data synchronization and harmonization after composing the registry. Develop an MDM hub application for each master, with a master maintenance guide to record and maintain data. Then update the data distributed to the different business applications that use the master data.
Next, give an MDM group access to the hub application and disable the master maintenance feature in the individual systems. If users do not find a master entry in their system, they raise a request to the MDM group. The group verifies that it is not a duplicate, follows an approval process, and creates the entry in the hub accordingly. This keeps the master data consistent across applications, and the central master data provides a single view of the master. Once deduplication is done, the MDM hub generates keys for commonly identified records so that filtering is optimized.
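The request-and-approval flow above can be sketched in a few lines. This is a hedged illustration: the class names, the queue, and the duplicate check on a lowercased name are all assumptions standing in for a real workflow tool.

```python
# Sketch of the MDM-group approval flow: users request a missing master entry,
# the group checks for duplicates, and only approved entries reach the hub.

class MdmHub:
    def __init__(self):
        self.masters = {}      # name key -> master record
        self.pending = []      # queued requests awaiting approval

    def request_entry(self, record):
        """A source-system user raises a request for a missing master."""
        self.pending.append(record)

    def approve_next(self):
        """MDM group: reject duplicates, create approved entries in the hub."""
        record = self.pending.pop(0)
        key = record["name"].strip().lower()
        if key in self.masters:
            return None                    # duplicate: no new master created
        self.masters[key] = record
        return record

hub = MdmHub()
hub.request_entry({"name": "Acme Corp"})
hub.request_entry({"name": "ACME corp"})   # same customer, different casing
first = hub.approve_next()
second = hub.approve_next()
# first is created; second is rejected as a duplicate
```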
4) Centralize the data
To do this, deactivate the master maintenance features and master data tables in the different business applications once the central master data has been created, along with Web services to access it. The MDM group maintains the MDM hub, and the individual applications consume the data through Web service calls. This is a long-term solution that requires significant investment and a well-crafted implementation strategy, since it involves changes to existing business applications.
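The centralized pattern — applications reading masters from the hub instead of local tables — can be illustrated with an in-process stand-in for the Web service. The service and application classes here are hypothetical; in production the `get` call would be an actual Web service request.

```python
# Sketch of the centralized pattern: business applications no longer keep
# master tables; they fetch masters from the hub through a service interface.
# An in-process facade stands in for the Web service, so these class and
# method names are illustrative assumptions.

class MasterDataService:
    """The only write path for master data after centralization."""
    def __init__(self):
        self._store = {}

    def create(self, eid, record):
        self._store[eid] = record        # maintained solely by the MDM group

    def get(self, eid):
        return self._store.get(eid)

class BillingApp:
    """A consuming application: reads masters, never writes them locally."""
    def __init__(self, mdm):
        self.mdm = mdm

    def invoice_header(self, eid):
        customer = self.mdm.get(eid)     # service call replaces a local table
        return f"Invoice for {customer['name']}"

mdm = MasterDataService()
mdm.create(1001, {"name": "Acme Corp"})
header = BillingApp(mdm).invoice_header(1001)
```

The design point is that `BillingApp` holds no master data of its own, which is exactly what deactivating the local master tables enforces.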
5) Group your master data
Run the synchronized and cleansed data through a fuzzy logic algorithm. During the process, the data is grouped into buckets: multiple records with similar-looking name or address fields, which fuzzy matching catches and flags for correction. The data steward studies and eliminates unwanted records, carries out corrections, and retains only one copy from each group of similar records.
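A minimal version of this bucketing can be built on `difflib`'s similarity ratio. The 0.85 threshold is an assumption a data steward would tune; real MDM tools use richer matching (phonetic codes, address standardization), but the grouping shape is the same.

```python
# Minimal fuzzy-grouping sketch: bucket records with similar-looking names
# for steward review. The 0.85 threshold is an illustrative assumption.
from difflib import SequenceMatcher

def similar(a, b, threshold=0.85):
    """True when two strings are close enough to be the same entity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def bucket(names):
    """Group names into buckets of near-duplicates."""
    buckets = []
    for name in names:
        for group in buckets:
            if similar(name, group[0]):   # compare against the bucket's seed
                group.append(name)
                break
        else:
            buckets.append([name])        # no match: start a new bucket
    return buckets

groups = bucket(["Acme Corp", "Acme Corp.", "ACME CORP", "Globex Inc"])
# the three Acme variants land in one bucket; Globex Inc stands alone
```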
6) Generate the surrogate key
A surrogate key is assigned to the master record set to help identify groups of master data: it is the key finally given to the single record created from an entire group of similar-looking records. The business rule for creating a surrogate key depends on how the business wants to define it. Build the MDM surrogate key with a stable, configurable algorithm, based on the type of master for which the key is being generated.
Consistently verify that no duplicate keys are produced, and automate the verification process to ensure the sanctity of the surrogate keys. Store a copy of the keys in both the target MDM hub repository and the source repository for reference, so that the right record is available when reports query the cleansed master records.
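One way to realize a stable, master-type-aware key with an automated duplicate check is sketched below. The key format (type prefix plus hash fragment) is an assumption; the business rule the text mentions would dictate the real format.

```python
# Sketch of surrogate key generation: a deterministic key derived from the
# master type and the golden record's natural key, with an automated check
# that no conflicting key is ever issued. The key format is an assumption.
import hashlib

class KeyFactory:
    def __init__(self):
        self.issued = {}                 # surrogate key -> natural key

    def surrogate_key(self, master_type, natural_key):
        digest = hashlib.sha1(f"{master_type}:{natural_key}".encode()).hexdigest()
        key = f"{master_type[:3].upper()}-{digest[:8]}"
        owner = self.issued.get(key)
        if owner is not None and owner != natural_key:
            raise ValueError(f"duplicate surrogate key {key}")  # sanctity check
        self.issued[key] = natural_key   # copy kept for later reference
        return key

factory = KeyFactory()
k1 = factory.surrogate_key("customer", "acme corp|400001")
k2 = factory.surrogate_key("customer", "acme corp|400001")  # same golden record
# k1 == k2: the algorithm is stable for the same master record
```

Because the key is derived deterministically, re-running the migration reproduces the same keys, which makes the source-side reference copy easy to keep in step.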
7) Build the messaging layer
The messaging layer is the feedback system that notifies local systems when a new record is created or updated at the MDM hub. For asynchronous operation, messaging software such as Java Message Service (JMS) can be used, with due configuration of the hub and the local systems.
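The publish/subscribe shape of this feedback loop can be sketched in process. This is not JMS — a real deployment would use a broker with asynchronous delivery — but it shows how each subscribed local system receives the hub's create/update events.

```python
# Toy messaging layer in the spirit of the JMS feedback loop: when the hub
# creates or updates a record, every subscribed local system is notified.
# A real broker delivers asynchronously; this in-process version only
# illustrates the publish/subscribe shape.

class MessageBus:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        for cb in self.subscribers:
            cb(event)                    # asynchronous in a real broker

received = []
bus = MessageBus()
bus.subscribe(lambda evt: received.append(("ERP", evt)))   # local system 1
bus.subscribe(lambda evt: received.append(("CRM", evt)))   # local system 2
bus.publish({"action": "created", "surrogate_key": "CUS-1a2b3c4d"})
# both local systems receive the creation notice
```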
8) Move out of the staging environment
a) In the registry approach, the target MDM hub prunes the grouped data to produce the final copy of the golden record (the clean, reconciled data from the hub), which is replicated back into the source systems using the messaging layer. Individual copies of the final record are put back into the sources, and no separate MDM copy exists. If your organization does not yet have an MDM hub and the enterprise is small, a registry approach will suffice.
b) The second option is to harmonize a copy of the final master record at both the source and the MDM hub using the messaging system. For a mid-sized organization it is advisable to build a harmonized MDM hub, since there are many touch points that a registry cannot handle. Another plus is that a reference repository always exists to look up in case of a process failure.
c) The centralization approach keeps the MDM golden record only in the target repository; for all practical purposes, the MDM hub is the single reference for all master data. Mature organizations are advised to start with data harmonization and then take the path of centralization. This matters because in diverse environments with many systems, harmonization cannot handle too many record entries, and mismatches can arise between source systems and the MDM hub. A data centralization approach ensures that no master is created at any source.
9) Post migration
Every time a new record is added to the MDM hub, notifications should be sent to the source departments announcing the new master record, and its details should be shared using a workflow model.
If a department wants to add new records to its source after the MDM hub implementation, a workflow should prohibit any new entry of master information at the source and force record creation at the MDM hub only. The workflow should also arrange for simultaneous updates to all sources corresponding to the entry created at the MDM hub, along with generation of the surrogate key.
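The post-migration workflow can be summarized in a short sketch: creation is blocked at the source, routed to the hub, keyed there, and propagated back to every source simultaneously. The class names and the key format are illustrative assumptions.

```python
# Sketch of the post-migration workflow: master creation is blocked at the
# source and happens only at the hub, which generates the surrogate key and
# pushes the entry to every source at once. Names are illustrative.

class Source:
    def __init__(self, name):
        self.name, self.masters = name, {}

    def create_master(self, record):
        # The workflow prohibits master creation at the source.
        raise PermissionError("create masters at the MDM hub only")

class Hub:
    def __init__(self, sources):
        self.sources, self.counter = sources, 0

    def create_master(self, record):
        self.counter += 1
        key = f"SK-{self.counter:05d}"   # surrogate key generated at the hub
        for src in self.sources:         # simultaneous update of all sources
            src.masters[key] = record
        return key

erp, crm = Source("ERP"), Source("CRM")
hub = Hub([erp, crm])
key = hub.create_master({"name": "Globex Inc"})
# the record now exists in both ERP and CRM under the same surrogate key
```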
About the author: Rajesh Parameswaran is a business intelligence consultant specializing in data mining and master data management (MDM). He is a Six Sigma Green Belt certified professional with 17+ years of IT experience. At L&T Infotech, he steers the MDM practice and has been instrumental in creating the CON-TXT Text Analytics Accelerator.
(As told to Sharon D'Souza)