The Complete Data Migration Methodology
Most software implementation efforts are conducted to satisfy one of the following initiatives:

1. Deploying a new On-Line Transaction Processing (OLTP) system
2. Deploying an On-Line Analytical Processing (OLAP) system
Each type of system may be replacing and/or enhancing functionality currently delivered by one or more legacy systems. This sort of systems evolution means that organizations are working to grow at or near the pace that the ever-changing world of technology dictates. Choosing a new technological direction is probably the easiest task in the entire effort. Complications arise when we attempt to bring together the information currently maintained by the legacy system(s) and transform it to fit into the new system. We refer to the building of this bridge between the old and new systems as data migration. Data migration is a common component of most systems development efforts.

One would think that any two systems that maintain the same sort of data must have been performing very similar tasks, and that information from one system should therefore map to the other with ease. This is hardly ever the reality of the situation. Legacy systems have historically proven to be far too lenient with respect to enforcing integrity at the atomic level of data. Fields that should be populated from a list of valid values, such as STATES, tend to require that a value be entered, but seldom validate the value entered by the user. Another common problem stems from the theoretical design differences between hierarchical legacy systems and relational systems. Two of the cornerstones of hierarchical systems, namely de-normalization and redundant storage, are strategies that make the relational purist cringe.

The most significant problem with data migration projects is that people do not truly understand the complexity of data transformation until they have undergone a number of arduous migration projects. Having made these points, it is obvious to this author that there is a desperate need for a sound methodological approach with which organizations can tackle migration projects. Although there is no way to avoid unpleasant surprises in data migrations, we can certainly be prepared to confront and resolve them.
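To make the integrity problem concrete, here is a minimal data-profiling sketch in Python. It assumes a hypothetical legacy extract exported to a CSV file (legacy_customers.csv) with a free-text STATE field; the file name, column name, and abbreviated list of valid codes are illustrative assumptions, not part of the methodology itself.

```python
import csv

# Abbreviated list of valid codes; a real profile would use the full set.
VALID_STATES = {"AL", "AK", "AZ", "AR", "CA", "CO", "CT"}

def profile_state_column(path: str, column: str = "STATE") -> dict[str, int]:
    """Count each value in the column that is not in the list of valid codes."""
    invalid_counts: dict[str, int] = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            value = (row.get(column) or "").strip().upper()
            if value not in VALID_STATES:
                invalid_counts[value] = invalid_counts.get(value, 0) + 1
    return invalid_counts

if __name__ == "__main__":
    for value, count in sorted(profile_state_column("legacy_customers.csv").items()):
        print(f"{value!r}: {count} rows")
```

Running a profile like this against every field that the legacy system "validates" by convention only is a cheap way to estimate how much cleansing the migration will require.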
Each of these phases will be defined in further detail in the following sections. Some may choose to argue about the necessity of dividing the migration process into such fine-grained subdivisions of the overall project phases. As you proceed through this paper, it will become apparent that each of these sub-divisions requires critical milestones to be achieved. These milestones mark strategic points along the project's timeline. Any successes or failures in a given phase will significantly impact the outcome of the entire project.

Pre-Strategy
Pre-Design

In the later stages of the Pre-Design phase, attribution of the data model will have been completed, and we will be ready to generate the physical database design. The attribution process will be fed by legacy data analysis, legacy report audits, and user feedback sessions. This is essentially the start of data mapping. Mapping must occur against physical data structures because entities do not contain the foreign key attributes that their corresponding tables will contain. The mapping of foreign key columns is extremely important, and it is easily overlooked when attempting to map from physical to logical structures.

Design

The Design phase is where the bulk of the actual mapping of legacy data elements to columns takes place. The physical data structures have been frozen, offering an ideal starting point for migration testing. Note that data migration is iterative; it does not happen in a single sitting. The mapping portion of a data migration project can be expected to span the Design phase through Implementation. The reason for this is quite simple: the most important resources for validating the migration are the users of the new system. Unfortunately, they will be unable to grasp the comprehensiveness of the migration until they view the data through the new applications. We have concluded from experience that developing the new reports prior to the new forms allows for more thorough validation of the migration earlier in the project's lifespan. For instance, if some sort of calculation was performed incorrectly by a migration script, the reports will reflect this. A form typically displays a single master record at a time, whereas reports display several records per page, making them a better means of displaying the results of migration testing.

A popular misconception about data mapping is that it can be performed against logical data models. Unfortunately, logical data models represent attributes that arise through relationships without defining those attributes in the child entity. This essentially means that you cannot map any of the connections between data structures while you are working with the logical design. Therefore, it is necessary to perform data mapping against the physical data model.

With the physical data structures in place, you can begin the mapping process. Mapping is generally conducted by a team of at least three people per core business area (i.e., Purchasing, Inventory, Accounts Receivable, etc.). Of these three people, the first should be a business analyst, generally an end user possessing intimate knowledge of the historical data to be migrated. The second team member is usually a systems analyst with knowledge of both the source and target systems. The third is a programmer/analyst who performs data research and develops migration routines based upon the mappings defined cooperatively by the business analyst and the systems analyst.
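As an illustration of what a mapping deliverable from this team might capture, the following Python sketch models individual rows of a mapping worksheet. The table, column, and rule names are hypothetical, and recording the mappings as code rather than in a spreadsheet or repository is purely an assumption made for the example.

```python
from dataclasses import dataclass

@dataclass
class ColumnMapping:
    source_table: str             # legacy (physical) structure
    source_column: str
    target_table: str             # new physical table, not the logical entity
    target_column: str
    transformation_rule: str      # plain-language rule agreed by the mapping team
    is_foreign_key: bool = False  # FK columns exist only in the physical design

# Hypothetical mappings for illustration only.
mappings = [
    ColumnMapping("CUSTMAST", "CUSTNO", "CUSTOMERS", "CUSTOMER_ID",
                  "Strip leading zeros; cast to number"),
    ColumnMapping("ORDHDR", "CUSTNO", "ORDERS", "CUSTOMER_ID",
                  "Resolve against the migrated CUSTOMERS.CUSTOMER_ID",
                  is_foreign_key=True),
]
```

Flagging foreign key columns explicitly, as shown above, is one way to keep them from being overlooked, since they appear only in the physical design.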
The truest test of data mapping is providing the populated target data structures to the users who assisted in the analysis and design of the core system. Invariably, the users will begin to identify scores of other historical data elements to be migrated that were not apparent to them during the Analysis and Design phases. The fact is that data mapping often does not make sense to most people until they can physically interact with the new, populated data structures. Frequently, this is where the majority of transformation and mapping requirements will be discovered. Most people simply do not realize they have missed something until it is not there anymore. For this reason, it is critical to unleash users on the populated target data structures as soon as possible. The data migration Test phases must be reached as soon as possible, so that migration testing occurs prior to the Design and Build phases of the core project. Otherwise, months of development effort can be lost as each additional migration requirement slowly but surely wreaks havoc on the data model, which, in turn, requires substantive modifications to the applications. The measures taken in the Test phases are executed as early as the Design phase. Testing is just as iterative as the migration project itself, in that every enhancement must pass the test plan.

Revise

The Revise phase is really a superset of the last four phases (Pre-Test, Test, Implement, and Maintenance), which are iterative and do not take place independently. This is the point in the process where cleanup is managed. All of the data model modifications, transformation rule adjustments, and script modifications are essentially combined to form the Revise phase. At this point, the question must be asked: are both the logical and physical data models being maintained? If so, you have now doubled the administrative workload for the keeper of the data models. In many projects, the intention is to maintain continuity between the logical and physical designs. However, because the overwhelming volume of work tends to exceed its perceived value, the logical model is inevitably abandoned, resulting in inconsistent system documentation. CASE tools can be used to maintain the link between the logical and physical models, though several reports will need to be developed in-house. For example, you will want reports that indicate discrepancies between entities/tables and attributes/columns (a minimal sketch of such a report appears below, following the Maintenance phase). These reports should indicate whether there is a mismatch between the number of entities versus tables and/or attributes versus columns, identify naming convention violations, and seek out data definition discrepancies. Choose a CASE tool that provides an API to the meta-data, because it will definitely be needed.

Maintenance

The Maintenance phase is where all of the mappings are validated and successfully implemented in a series of scripts that have been thoroughly tested. The Maintenance phase differs depending upon whether you are migrating to an OLTP or an OLAP system. If the migration is to an OLTP system, you are working within the "one and done" paradigm: your goal is to successfully migrate the legacy data into the new system, rendering the migration scripts obsolete once the migration has been accomplished.
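As a sketch of one of the in-house discrepancy reports described in the Revise phase, the following Python routine compares a logical model against a physical design. It assumes the definitions have already been extracted from the CASE tool's repository into simple dictionaries keyed by entity/table name; the dictionary representation and the one-to-one naming between entities and tables are assumptions made for illustration only.

```python
def reconcile(logical: dict[str, set[str]], physical: dict[str, set[str]]) -> list[str]:
    """Report entities/tables and attributes/columns that do not line up."""
    findings: list[str] = []
    for entity, attrs in logical.items():
        if entity not in physical:
            findings.append(f"Entity {entity} has no corresponding table")
            continue
        missing = attrs - physical[entity]
        extra = physical[entity] - attrs
        if missing:
            findings.append(f"{entity}: attributes with no column: {sorted(missing)}")
        if extra:
            findings.append(f"{entity}: columns with no attribute: {sorted(extra)}")
    for table in physical.keys() - logical.keys():
        findings.append(f"Table {table} has no corresponding entity")
    return findings

# Hypothetical extracts from the CASE repository.
logical_model = {"CUSTOMER": {"CUSTOMER_ID", "NAME", "STATE"}}
physical_model = {"CUSTOMER": {"CUSTOMER_ID", "NAME", "STATE", "CREATED_BY"},
                  "AUDIT_LOG": {"AUDIT_ID", "CHANGED_ON"}}
print("\n".join(reconcile(logical_model, physical_model)))
```

A production version of this report would also check naming conventions and data definitions, as noted above.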
Manual data migration

The manual approach to data migration is still a valid method. In fact, many small data migration projects have very little data to migrate and, as a result, do not require a major investment in a data migration tool or large amounts of effort by one or more programmers.

Decision factors in selecting a data transformation tool

Inevitably, a decision must be made about whether to perform the data migration manually or to purchase a data transformation tool. Though the cost of a transformation tool is indeed recognized up front, most project leaders simply cannot foresee the complexity of the data migration aspects of a systems development project. This is an observation that I have found to be true at several organizations around the country.
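Before committing to a transformation tool, it helps to see what the manual approach amounts to in practice. The following Python sketch moves one legacy table into its target structure through a generic DB-API connection (sqlite3 is used purely as a stand-in); the table names, column names, and transformation rules are hypothetical.

```python
import sqlite3

def migrate_customers(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Copy legacy customer rows into the new structure, applying simple rules."""
    rows = source.execute("SELECT CUSTNO, CUSTNAME, STATE FROM CUSTMAST")
    migrated = 0
    for custno, custname, state in rows:
        target.execute(
            "INSERT INTO CUSTOMERS (CUSTOMER_ID, NAME, STATE) VALUES (?, ?, ?)",
            (
                int(custno),                       # assumes the legacy key is a numeric string
                (custname or "").strip().title(),  # normalize free-text names
                (state or "").strip().upper(),     # normalize the un-validated STATE field
            ),
        )
        migrated += 1
    target.commit()
    return migrated
```

For a handful of tables this is perfectly manageable; the calculus changes when hundreds of structures, complex transformation rules, and repeated reloads are involved.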
Conclusion
Data migration is a necessary evil, but not an impossible one to conquer. The key is to prepare for it very early on and to monitor it carefully throughout the process. Project timelines tend to become more rigid as time passes, so it really makes sense to meet migration head-on. A devoted team, armed from project inception with a clearly defined project plan and with automated tools where applicable, is indeed the formula for success.