Data Migration Solution

Data migration is a perennial exercise in enterprise computing, but it needs an agile and proactive solution model to overcome its unique challenges.

Venus Informatics offers a proven solution to overcome the pitfalls of data migration projects. It is based on lessons learned from my experience and on data migration best practices.

Data is of significant value to the organization that owns it. Therefore, considerable care must be taken to migrate the legacy data into the new system accurately. In the previous blog, I described the following unique challenges of data migration projects:

  • Legacy data is an unfamiliar terrain
  • Limited development time
  • Delays in decision-making from the business
  • Restricted execution time
  • Poor data quality
  • Lack of rigorous testing
  • Effect on data warehouse and BI

Solution

The primary objectives of a data migration project are to extract the data from the legacy system, transform it into the desired target data model, and finally load it into the new information system. However, this requires an agile and proactive solution model to overcome the unique challenges of data migration. This blog summarizes the coherent data migration solution used by Venus Informatics, which mitigates the risks stemming from the above-stated problems.

Venus Informatics offers a proven solution to overcome the above challenges of data migration projects. Based on lessons learned from my experience and on the data migration best practices recommended by veterans and industry experts, it comprises the following key components:

  • Develop with production data
  • Migrate along logical data partitions
  • Measure migration quality
  • Regular quality reports
  • Agile and robust processing
  • Data cleansing
  • Incremental ETL
  • Living documentation

Develop with production data

The database models of the legacy and target systems are significantly different. The documentation (data dictionary) of the legacy system is not completely reliable, which often forces assumptions into the migration code. In such a scenario, developing a data migration system with a stagnant copy of production data may not cover all cases and outliers in the business data. On the go-live day, the data migration code should run without any issue to ensure the integrity and consistency of the migrated data. Leaving any particular case in the legacy data untested can break the migration code or prevent some legacy data from being migrated.

Developing and testing the data migration code, from the beginning, with a frequently refreshed (daily or weekly) clone of production data overcomes this issue by rapidly revealing the data semantics of the legacy system to data migration developers. It ensures that glitches are detected at a very early stage of development and testing. Developing with near real-time data highlights any missing knowledge about individual cases in the business data, prevents wrong assumptions, provides sufficient test data, and gives a good estimate of the execution time at go-live.
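
As a minimal illustration, the sketch below shows one way to automate such a refresh, assuming the legacy database is PostgreSQL and that the standard pg_dump/pg_restore tools are available; the connection details and database names are hypothetical.

    # Minimal refresh sketch: snapshot the production database and restore it into
    # the development copy used for migration testing. Assumes PostgreSQL with
    # pg_dump and pg_restore on the PATH; hosts and database names are hypothetical.
    import subprocess

    PROD_DSN = "host=legacy-prod dbname=erp_legacy user=readonly"  # hypothetical
    DEV_DB = "erp_legacy_dev"                                      # hypothetical

    def refresh_dev_copy(dump_file="/tmp/legacy_prod.dump"):
        # 1. Take a consistent snapshot of production (custom format, compressed).
        subprocess.run(["pg_dump", f"--dbname={PROD_DSN}", "-Fc", "-f", dump_file], check=True)
        # 2. Restore it into the development database, dropping existing objects first.
        subprocess.run(["pg_restore", "--clean", "--if-exists", "-d", DEV_DB, dump_file], check=True)

    if __name__ == "__main__":
        refresh_dev_copy()  # schedule daily or weekly, e.g. via cron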

Migrate along logical data partitions

Although the data models and capabilities of the legacy and target systems cannot be matched one-to-one, their domains always resemble each other. Therefore, the data migration code should first migrate the master data (more inserts, fewer updates), followed by the transactional data. For example, when migrating data from a legacy order system, the migration code should first migrate the customers and products (master data), followed by the historical order data.

Migrating data along logical partitions enables rapid and continuous, yet manageable, development of the data migration code. It reduces the overall testing effort while improving the quality of the tests, as each step of the migration can be tested individually. It enables early testing of migrated data, as soon as the data migration code for a logical data partition is ready. For example, the business system experts can start comparing legacy customer data with migrated customer data in the test target environment while the data migration team is working on migrating the products and orders data.
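
A minimal sketch of this ordering is shown below; the partition names follow the order-system example above, and the migrate_* functions are hypothetical placeholders for the real extract-transform-load steps.

    # Run the migration along logical data partitions in dependency order:
    # master data (customers, products) before transactional data (orders).

    def migrate_customers():
        print("migrating customers (master data)")

    def migrate_products():
        print("migrating products (master data)")

    def migrate_orders():
        print("migrating orders (transactional data)")

    # Each partition can be handed to business system experts for testing
    # as soon as its step completes.
    MIGRATION_STEPS = [
        ("customers", migrate_customers),
        ("products", migrate_products),
        ("orders", migrate_orders),
    ]

    for partition, step in MIGRATION_STEPS:
        step()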

Measure migration quality

The transformation part of the migration code may incorrectly modify the data, or a subset of it, which can either prevent the wrongly transformed data from being migrated or, in the worst case, load corrupted data. Data corrupted by the migration code may remain hidden from developers. Therefore, assessing the quality of the data migration code after each test execution is vital. That quality can be assessed by logging the necessary information at run time. For example, the data migration code should capture key indicators, such as the number of records extracted, transformed and loaded, for every target table. Any difference between the number of records extracted and loaded should trigger further investigation. The migration code should also keep a log of any failures in each stage of the ETL pipeline.
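
The sketch below illustrates one way to capture these run-time indicators, assuming a simple per-table statistics structure; the table name and counts are made up for the example.

    # Per-table quality log: record how many rows were extracted, transformed and
    # loaded, keep per-stage failure messages, and flag any gap for investigation.
    from dataclasses import dataclass, field

    @dataclass
    class TableStats:
        table: str
        extracted: int = 0
        transformed: int = 0
        loaded: int = 0
        failures: list = field(default_factory=list)  # per-stage failure messages

        def needs_investigation(self) -> bool:
            # any difference between extracted and loaded rows warrants a closer look
            return self.extracted != self.loaded

    stats = TableStats("customer", extracted=10_000, transformed=9_998, loaded=9_998)
    stats.failures.append("transform: 2 rows rejected, invalid date_of_birth")

    if stats.needs_investigation():
        print(f"{stats.table}: extracted {stats.extracted}, loaded {stats.loaded} -> investigate")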

Assessing the quality of the migration code from the beginning provides a quick feedback cycle, which in turn validates assumptions, reveals potential data clean-up in the legacy system, and ultimately mitigates the risk of the data migration. After only a few iterations, it builds trust in the data migration process and ensures the integrity and consistency of the migrated data.

Regular quality reports

No matter how expert a data migration developer is, she needs input from the business system analysts or subject matter experts managing the legacy system. Therefore, closely involving business system experts in the data migration effort is necessary. Their input, feedback, and anecdotes about the legacy system are critical in forming business rules and transformation logic and in validating assumptions. Providing business system experts, who mostly have little time to spare, with an aggregated view of the data migration results can help them highlight key issues and share valuable insights.

The data migration quality report should include key statistics about the overall status of the data migration, the number of successfully migrated records, and the failures due to poor data quality or technical errors. Regularly sending such a report to key business system experts helps them share their domain expertise productively with the data migration team and evaluate any potential risks associated with the data migration.
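
A minimal sketch of such an aggregated report is shown below; the table names and counts are illustrative only and would come from the run-time statistics captured during each test execution.

    # Render per-table extraction/load counts as a compact summary that can be
    # mailed or exported to business system experts after each test run.

    def quality_report(table_stats):
        header = f"{'Target table':<15}{'Extracted':>10}{'Loaded':>8}{'Failed':>8}"
        lines = [header]
        for table, extracted, loaded in table_stats:
            lines.append(f"{table:<15}{extracted:>10}{loaded:>8}{extracted - loaded:>8}")
        return "\n".join(lines)

    print(quality_report([
        ("customer", 10_000, 9_998),
        ("product", 2_500, 2_500),
        ("order", 150_000, 149_312),
    ]))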

Agile and robust processing

More often than not, the target system is still under active development while the data migration code is being written. Although the majority of the data model is defined, development of new features and major bug fixes in the target system can alter the structure of the target data model.

The data migration code should be flexible enough for continuous development and, at the same time, robust enough to protect its execution from unexpected failures by implementing exception handling that controls and logs all types of problematic data.
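
The sketch below shows one way to implement this, assuming the transformation works record by record; transform_record is a hypothetical placeholder for the real transformation logic.

    # Each record is transformed inside an exception handler so that one
    # problematic row is logged and skipped instead of aborting the whole run.
    import logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("migration")

    def transform_record(record):
        # placeholder for the real transformation into the target data model
        return {"customer_name": record["name"].strip().title()}

    def transform_all(records):
        good, bad = [], []
        for record in records:
            try:
                good.append(transform_record(record))
            except Exception as exc:  # control and log all types of problematic data
                log.warning("Skipping record %s: %s", record.get("id"), exc)
                bad.append((record, str(exc)))
        return good, bad

    # The second record lacks a 'name' value and is logged rather than halting the run.
    good, bad = transform_all([{"id": 1, "name": " acme ltd "}, {"id": 2}])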

Data cleansing

Stricter data quality checks in the target system prevent poor-quality data from being migrated. For example, the customer phone number field is a free-text field in the legacy system and therefore contains alphanumeric values in many cases, whereas the analogous field in the target system only allows digits, brackets and hyphens. Manual data cleansing is required for such instances. However, dirty data caused by repetitive data-entry glitches should be identified and fixed on the fly by blending data cleansing scripts into the data migration code.
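
As an illustration of such an on-the-fly cleansing rule, the sketch below strips everything except digits, brackets and hyphens from the legacy phone number field; the field semantics follow the example above.

    # Keep only the characters the target phone field accepts: digits, brackets, hyphens.
    import re

    DISALLOWED = re.compile(r"[^0-9()\-]")  # anything that is not a digit, bracket or hyphen

    def cleanse_phone(raw):
        return DISALLOWED.sub("", raw or "")

    print(cleanse_phone("(02) 1234-5678 ext. 9"))  # -> "(02)1234-56789"

Values that still fail the target system's rules after cleansing would be reported for manual attention rather than silently loaded.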

The highest possible data quality is the primary prerequisite for any data analytics task, and a data migration project is a golden opportunity to improve the overall data quality of the organization. Apart from the cleansing made mandatory by the stronger data quality constraints in the target system, it is recommended to define data quality rules for the other data elements within the scope of the data migration. Data that violates these pre-defined rules can then be cleansed or flagged during the migration.

Incremental ETL

Data migration execution time is restricted because it is not desirable to keep business processes offline for a long interval while the data migration is running. Therefore, an incremental approach should be considered, in which an initial heavy data migration load is performed well before the go-live. After the initial load, only the data that has changed is migrated before the target system goes live.
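
A minimal sketch of this incremental approach is shown below, assuming the legacy tables carry a last-modified timestamp that can drive the delta extraction; extract_rows and load_rows are hypothetical placeholders.

    # Initial heavy load well before go-live, then a short delta run at cut-over
    # that migrates only the rows changed since the previous successful run.
    from datetime import datetime

    def extract_rows(table, changed_since=None):
        # placeholder: SELECT * FROM <table> [WHERE last_modified > changed_since]
        return []

    def load_rows(table, rows):
        # placeholder: insert/update the rows in the target system
        pass

    def migrate(table, last_run=None):
        rows = extract_rows(table, changed_since=last_run)
        load_rows(table, rows)
        return datetime.now()  # watermark for the next delta run

    watermark = migrate("orders")                       # full initial load
    watermark = migrate("orders", last_run=watermark)   # delta: only changed rows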

Incremental ETL can drastically reduce the downtime of the legacy system. However, in some cases an incremental approach may not be feasible due to technical limitations of the legacy system. It is also not worth the effort for a simpler migration where the data in scope is limited, relatively small, and takes only a few minutes to migrate successfully.

Living documentation

Documentation is crucial to the ongoing success of any data migration project but is often overlooked, mostly due to lack of appetite and resource constraints. Accurate and complete data migration documentation can prove to be a very useful resource for the business, in particular the BI/reporting team. During a data migration project, it is common to find business system experts and super users anxious about missing any critical field in the migration, mostly fields that are used for managing exceptional cases in ways completely different from what is documented in the data dictionary. Such scenarios are common when the legacy system is timeworn. On the other hand, the BI/reporting team of an organization with existing reporting infrastructure is always curious about the new data model of the target system, because all current SQL procedures used to build the data warehouse and operational reports must be modified with the new field and table names immediately after the go-live.

Maintaining living documentation gives the entire organization visibility of the process and progress of the data migration project. It improves the response time of business system experts in validating field and lookup data mappings and in providing data migration developers with business rules for data quality and for migrating outliers. Initially, a living document can simply be a set of Excel files, one file per target table (or upload template), which can gradually move to a collaborative platform for wider visibility and easier amendment. Data migration documentation should include comprehensive information about the ETL data pipelines, such as the target field name, the mapped legacy field(s) and table name(s), the extract logic, the data quality rule, the business rules for handling special cases, and a Boolean flag denoting whether lookup data mapping is required. Such documentation, for every target field in a particular target table, also provides data migration developers with a code outline, which in turn optimizes development time.
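
As an illustration, the sketch below models one row of such documentation as a structured record; the column names mirror the list above, and the example values are invented.

    # One row of the living documentation, kept per target table; the same
    # columns could live in an Excel sheet or on a collaborative platform.
    from dataclasses import dataclass

    @dataclass
    class FieldMapping:
        target_field: str          # field in the target data model
        legacy_fields: list        # mapped legacy field(s), table.column
        extract_logic: str         # how the value is derived
        data_quality_rule: str     # rule the value must satisfy
        business_rule: str         # handling of special cases
        lookup_mapping: bool       # whether lookup data must be mapped

    phone = FieldMapping(
        target_field="customer.phone",
        legacy_fields=["CUST_MASTER.TEL_NO"],
        extract_logic="strip all characters except digits, brackets and hyphens",
        data_quality_rule="digits, brackets and hyphens only",
        business_rule="rows still containing letters go to manual cleansing",
        lookup_mapping=False,
    )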

Contact us for more

Often the data migration project, with its unique challenges and multiple stakeholders, is part of a bigger organization-wide change. Therefore, a holistic and coherent approach is required to deliver a reliable solution that is visible to all stakeholders. Venus Informatics follows the above-stated approach to deliver an end-to-end data migration service covering all stages of data migration projects; these include consulting, building the data migration code, evaluating data quality against agreed measures and fixing bad data, developing new ETL processes for the existing data warehouse, and post go-live support.

Krupesh Desai
Consulting Director
Venus Informatics Limited

Data Migration Challenges

Data migration is a perennial exercise in enterprise computing. With the introduction of a new information system (ERP/CRM) in the business, data existing in the legacy system must be moved into a new, improved target system. Data migration projects are short-lived but inevitable, because a new information system can be introduced into the business for a variety of reasons. Often it is the deterioration of an existing legacy system over time that leads to heavy maintenance and eventually minimal or no vendor support. Mergers, acquisitions, and demand for more functionality and improved usability are other common business scenarios requiring data migration.

Challenges

Data is of significant value to the organization that owns it. Therefore, considerable care must be taken to migrate the legacy data into the new system accurately. Several challenges occur in data migration projects. The following is my attempt to concisely summarize the key challenges from the business perspective.

  • Legacy data is an unfamiliar terrain
  • Limited development time
  • Delays in decision-making from the business
  • Restricted execution time
  • Poor data quality
  • Lack of rigorous testing
  • Effect on data warehouse and BI

Legacy data is an unfamiliar terrain

When migrating from a legacy system to a new one, it is unlikely that the original key developers are still on the team; the extreme case is zero vendor support. The data dictionary is your friend, and the legacy system may have been documented well in the beginning, but when a system has been operational for years you should expect modifications to the data model from its original state, documented if you are lucky and otherwise undocumented. Therefore, you cannot entirely rely on the available documentation.

Limited development time

The data migration project cannot commence until the data model of the target system is well-defined, articulated, and documented; yet the migration must still be completed before the go-live. In an agile development environment, the target data model may remain subject to change until the final release. On the other hand, in an incremental go-live, a drastic change in the data model may occur due to a major bug fix in the live system. An extreme but likely situation is limited testing time between the production-ready release date and the go-live, which can be as short as one or two weeks.

Delays in decision-making from the business

Data migration is always part of a larger enterprise-level change, which may alter business processes, reporting methods, and habitual data-entry practices. It would be over-optimistic to expect quick decisions from business experts in the midst of such change. Subject matter experts, super users of the legacy application, and indeed the database developers are typically struggling to complete their daily work while concurrently working with the new application vendor on requirements gathering and testing.

Restricted execution time

The business usually expects both the data migration development time and the execution time to be kept to a minimum. An unwanted delay in development can shorten the data migration testing hours and may even push back the production release deadline. A large data extract from the legacy system takes more time to transform into the data model of the target system, resulting in a longer execution time, and it is not feasible to keep business processes on hold for hours while the data migration is running.

Poor data quality

The target system may have stricter data quality constraints than the legacy system. For example, in the legacy system the country_name field of the customer address table is a free-text field prone to typos, whereas the analogous field in the target system must hold a valid, UN-recognized country name represented by a unique code. A data quality issue that the legacy system is not even aware of can cause a severe problem in the target system. In the above example, the data migration developer may have to map one country name written in ten different ways (due to typos) to the single matching country code in the new system. When the legacy data is of poor quality, data migration scripts can corrupt the data during transformation in ways that only become visible to end users.
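
As an illustration of the mapping effort involved, the sketch below collapses several hypothetical free-text spellings of one country into the single code expected by the target system.

    # Map known free-text variants (including typos) to one valid country code.
    COUNTRY_FIXES = {
        "united kingdom": "GB",
        "uk": "GB",
        "u.k.": "GB",
        "untied kingdom": "GB",  # common typo
        "great britain": "GB",
    }

    def map_country(raw):
        # None means "no match": the row is reported for manual cleansing
        return COUNTRY_FIXES.get((raw or "").strip().lower())

    print(map_country(" Untied Kingdom "))  # -> "GB"
    print(map_country("Egnland"))           # -> None, needs manual attention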

Lack of rigorous testing

Data migration testing is usually conducted, in a tight time-frame, after the legacy data has been migrated and is available in the target system. The volume of data within the migration scope is usually large, so it is not practically possible to manually compare each data element in the destination with its origin. One popular workaround is to limit testing to a good representative sample of records (when the QA lead appreciates statistics). However, for massive datasets even this sample can be large, demand a lot of human effort, and still not count as reliable.
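
As an illustration, the sketch below draws a random sample of legacy keys and compares selected fields between the two systems; get_legacy_record and get_target_record are hypothetical placeholders for queries against the legacy and target databases.

    # Sample-based verification: compare selected fields for a random sample of keys.
    import random

    def get_legacy_record(key):
        return {}  # placeholder: query the legacy system by primary key

    def get_target_record(key):
        return {}  # placeholder: query the target system by the mapped key

    def compare_sample(all_keys, fields, sample_size=500):
        mismatches = []
        for key in random.sample(list(all_keys), min(sample_size, len(all_keys))):
            legacy, target = get_legacy_record(key), get_target_record(key)
            for f in fields:
                if legacy.get(f) != target.get(f):
                    mismatches.append((key, f))
        return mismatches

    print(compare_sample(range(1, 10_001), ["name", "phone", "country_code"]))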

Effect on data warehouse and BI

Large organizations with an existing data warehouse and reporting infrastructure need to be more conscious of the data migration, as the new target system will feed the data warehouse ETL process and populate operational reports from the go-live moment. Immediately after the business is live with the new system, the data warehouse and reporting must be aware of the underlying data model. Business analysts receiving corrupted reports, or no reports at all, after the go-live can be a nightmare for the data warehouse and BI team.

Coherent data migration solution

Whatever the reason for the change, data migration is inevitable. The new information system cannot be operational with an empty database. The legacy data, generally large in volume, must first be loaded into the new system before its go-live moment. More often than not, the data models of the old and new systems are fundamentally different.

The primary objectives of a data migration project are to extract the data from the legacy system, transform it into the desired target data model, and finally load it into the new information system. However, this requires an agile and proactive solution model to overcome the unique challenges of data migration. The next blog in this series describes the coherent data migration solution, used by Venus Informatics, which mitigates the risks stemming from the above-stated challenges.

Krupesh Desai
Consulting Director
Venus Informatics Limited