Migrate To New Technology Notes
Big data is what drives most modern businesses, and big data never sleeps. That means data
integration and data migration need to be well-established, seamless processes — whether
data is migrating from inputs to a data lake, from one repository to another, from a data
warehouse to a data mart, or in or through the cloud. Without a competent data migration
plan, businesses can run over budget, end up with overwhelming data processes, or find that
their data operations are functioning below expectations.
Data migration is the process of moving data from one system to another. While this
might seem straightforward, it involves a change in storage and database or application.
In the context of the extract/transform/load (ETL) process, any data migration will involve at
least the transform and load steps. This means that extracted data needs to go through a series
of functions in preparation, after which it can be loaded into a target location.
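To make the transform-and-load idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The database files, table names, and columns are illustrative assumptions, not part of any particular system.

```python
import sqlite3

# A minimal, hypothetical transform-and-load step: move customer rows from a
# legacy table into a new schema, normalizing values along the way.
source = sqlite3.connect("source.db")   # assumed legacy database
target = sqlite3.connect("target.db")   # assumed new database

target.execute(
    "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, email TEXT, country TEXT)"
)

# Extract: read rows from the legacy schema.
rows = source.execute("SELECT id, email, country FROM legacy_customers").fetchall()

# Transform: normalize values before they reach the target system.
cleaned = [
    (row_id, email.strip().lower(), (country or "UNKNOWN").upper())
    for row_id, email, country in rows
]

# Load: write the prepared rows into the target schema.
target.executemany("INSERT INTO customers VALUES (?, ?, ?)", cleaned)
target.commit()
```

In a real migration each of these three steps is far more involved, but the extract, transform, and load shape stays the same.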
Organizations undertake data migrations for a number of reasons. They might need to
overhaul an entire system, upgrade databases, establish a new data warehouse, or merge new
data from an acquisition or other source. Data migration is also necessary when deploying
another system that sits alongside existing applications.
Regardless of the exact purpose for a data migration, the goal is generally to enhance
performance and competitiveness.
Less successful migrations can result in inaccurate data that contains redundancies and
unknowns. This can happen even when source data is fully usable and adequate. Further, any
issues that did exist in the source data can be amplified when it’s brought into a new, more
sophisticated system.
A complete data migration strategy prevents a subpar experience that ends up creating more
problems than it solves. Aside from missing deadlines and exceeding budgets, incomplete
plans can cause migration projects to fail altogether. In planning and strategizing the work,
teams need to give migrations their full attention, rather than making them subordinate to
another project with a large scope.
A strategic data migration plan should include consideration of these critical factors:
• Knowing the data — Before migration, source data needs to undergo a complete
audit. Unexpected issues can surface if this step is skipped. (A minimal
profiling sketch follows this list.)
• Cleanup — Once you identify any issues with your source data, they must be
resolved. This may require additional software tools and third-party resources
because of the scale of the work.
• Maintenance and protection — Data degrades over time, eventually becoming
unreliable. This means there must be controls in place to maintain
data quality.
• Governance — Tracking and reporting on data quality is important because it
enables a better understanding of data integrity. The processes and tools used to
produce this information should be highly usable and automate functions where
possible.
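As a rough illustration of the "knowing the data" audit, and of the kind of quality reporting that governance calls for, the sketch below profiles an assumed CSV export of the source system. The file name and key field are hypothetical.

```python
import csv
from collections import Counter

def profile(path, key_field):
    """Print a small data-quality report for a delimited file:
    row count, per-column missing-value rates, and duplicate keys."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        print(f"{path} is empty")
        return

    missing = Counter()
    keys = Counter()
    for row in rows:
        keys[row.get(key_field)] += 1
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] += 1

    total = len(rows)
    print(f"{total} rows in {path}")
    for column in rows[0]:
        print(f"  {column}: {missing[column] / total:.1%} missing")
    duplicates = sum(1 for n in keys.values() if n > 1)
    print(f"  keys with duplicates ({key_field}): {duplicates}")

# Hypothetical usage against an assumed export of the source system.
profile("source_extract.csv", key_field="customer_id")
```

A report like this, run regularly and stored over time, is one simple way to track data quality rather than relying on a one-off check.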
There is more than one way to build a data migration strategy. An organization’s specific
business needs and requirements will help establish what’s most appropriate. However, most
strategies fall into one of two categories: “big bang” or “trickle.”
"Big Bang" Migration
In a big bang data migration, the full transfer is completed within a limited window of time.
Live systems experience downtime while data goes through ETL processing and transitions
to the new database.
The draw of this method is, of course, that it all happens in one time-boxed event, requiring
relatively little time to complete. The pressure, though, can be intense, as the business
operates with one of its resources offline. This risks a compromised implementation.
If the big bang approach makes the most sense for your business, consider running through
the migration process before the actual event.
“Trickle” Migration
In a trickle migration, data is moved over in phases, typically with the old and new systems
running in parallel. Compared to the big bang approach, these implementations can be fairly
complex in design. However, the added complexity — if done right — usually reduces risk
rather than adding it.
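One common way to structure a trickle migration is to copy rows in small batches using a watermark column while both systems stay live. The sketch below assumes SQLite databases, an updated_at column, and a fixed batch size purely for illustration.

```python
import sqlite3
import time

# A sketch of a "trickle" migration: rather than one big cut-over, rows are
# copied in small batches while both systems remain in use. Table names, the
# "updated_at" watermark column, and the batch size are all illustrative.
source = sqlite3.connect("source.db")
target = sqlite3.connect("target.db")

watermark = 0      # highest updated_at value migrated so far
BATCH_SIZE = 500

while True:        # runs until the migration window is declared over
    batch = source.execute(
        "SELECT id, email, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at LIMIT ?",
        (watermark, BATCH_SIZE),
    ).fetchall()
    if not batch:
        time.sleep(60)              # nothing new yet; poll again later
        continue

    target.executemany(
        "INSERT OR REPLACE INTO customers (id, email, updated_at) VALUES (?, ?, ?)",
        batch,
    )
    target.commit()
    watermark = batch[-1][2]        # advance the watermark past this batch
```

The watermark is what keeps the two systems converging: each pass only picks up rows that changed since the previous pass, so the old system can stay authoritative until the cut-over.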
Regardless of which implementation method you follow, there are some best practices to
keep in mind:
• Back up the data before executing. In case something goes wrong during the
implementation, you can’t afford to lose data. Make sure there are backup
resources and that they’ve been tested before you proceed.
• Stick to the strategy. Too many data managers make a plan and then abandon it when
the process goes "too" smoothly or when things get out of hand. The migration
process can be complicated and even frustrating at times, so prepare for that
reality and then stick to the plan.
• Test, test, test. During the planning and design phases, and throughout
implementation and maintenance, test the data migration to make sure you will
eventually achieve the desired outcome. (A small validation sketch follows this list.)
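As an example of the kind of check worth automating, the sketch below compares row counts and an order-independent fingerprint between an assumed source table and its migrated counterpart. Table and column names are hypothetical.

```python
import hashlib
import sqlite3

def fingerprint(connection, query):
    """Order-independent fingerprint of a result set, used to compare the
    source and target sides of a migration."""
    digest = 0
    for row in connection.execute(query):
        digest ^= int.from_bytes(hashlib.sha256(repr(row).encode()).digest()[:8], "big")
    return digest

source = sqlite3.connect("source.db")
target = sqlite3.connect("target.db")

# Row counts should match exactly once a full migration run has completed.
src_count = source.execute("SELECT COUNT(*) FROM legacy_customers").fetchone()[0]
tgt_count = target.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
assert src_count == tgt_count, f"row count mismatch: {src_count} vs {tgt_count}"

# Fingerprints should also match, for columns that are copied over unchanged.
src_fp = fingerprint(source, "SELECT id, country FROM legacy_customers")
tgt_fp = fingerprint(target, "SELECT id, country FROM customers")
assert src_fp == tgt_fp, "content mismatch on id/country"
print("validation checks passed")
```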
Each strategy will vary in the specifics, based on the organization’s needs and goals, but
generally, a data migration plan should follow a common, recognizable pattern:
1. Explore and Assess the Source
Before migrating data, you must know (and understand) what you're migrating, as well as
how it fits within the target system. Understand how much data is being pulled over and what
that data looks like.
There may be data with lots of fields, some of which won’t need to be mapped to the target
system. There may also be missing data fields within a source that will need to pull from
another location to fill a gap. Ask yourself what needs to migrate over, what can be left
behind, and what might be missing.
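One lightweight way to capture those decisions is an explicit field map that records what carries over, what is renamed, what is left behind, and which target fields need defaults. The field names below are purely illustrative.

```python
# Hypothetical field map for the assessment step: it records which source
# fields carry over, what they are called in the target, and which target
# fields have no source and need a default or a secondary lookup.
FIELD_MAP = {
    "cust_id":    "customer_id",   # renamed in the target schema
    "email_addr": "email",
    "cntry":      "country",
    # "fax_number" is left behind: the target system has no use for it
}
DEFAULTS = {"loyalty_tier": "standard"}  # target-only field, filled from a default

def map_record(source_row):
    """Shape one source record into the target layout."""
    target_row = {tgt: source_row.get(src) for src, tgt in FIELD_MAP.items()}
    target_row.update(DEFAULTS)
    return target_row

print(map_record({"cust_id": 42, "email_addr": "a@example.com",
                  "cntry": "SE", "fax_number": "n/a"}))
```

Writing the map down in one place also makes the gaps visible: any target field that appears in neither the map nor the defaults still needs a source.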
Beyond meeting the requirements for data fields to be transferred, run an audit on the actual
data contained within. If there are poorly populated fields, a lot of incomplete data pieces,
inaccuracies, or other problems, you may reconsider whether you really need to go through
the process of migrating that data in the first place.
If an organization skips this source review step, and assumes an understanding of the data,
the result could be wasted time and money on migration. Worse, the organization could run
into a critical flaw in the data mapping that halts any progress in its tracks.
2. Define and Design the Migration
The design phase is where organizations define the type of migration to take on — big bang
or trickle. It also involves drawing out the technical architecture of the solution and
detailing the migration processes.
Considering the design, the data to be pulled over, and the target system, you can begin to
define timelines and any project concerns. By the end of this step, the whole project should
be documented.
During planning, it’s important to consider security plans for the data. Any data that needs to
be protected should have protection threaded throughout the plan.
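For example, if certain fields must not reach the target environment in clear text, the protection can be built into the transform step itself. The sketch below pseudonymizes assumed sensitive fields with a salted hash; the field names and the salt handling are illustrative only.

```python
import hashlib

# One way to thread data protection into the migration itself: pseudonymize
# sensitive fields before they are written to the target environment.
SENSITIVE_FIELDS = {"email", "phone"}
SALT = b"rotate-me"  # in practice this would come from a secrets manager

def protect(record):
    """Replace sensitive values with salted hashes so the target system
    never stores them in clear text."""
    out = dict(record)
    for field in SENSITIVE_FIELDS.intersection(record):
        out[field] = hashlib.sha256(SALT + str(record[field]).encode()).hexdigest()
    return out

print(protect({"customer_id": 42, "email": "a@example.com", "phone": "555-0100"}))
```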
3. Build the Migration Solution
4. Conduct a Live Test
The testing process isn't over once the code has been tested during the build phase. It's
important to test the data migration design with real data to ensure the accuracy of the
implementation and the completeness of the application.
5. Execute the Migration
After final testing, implementation can proceed, using the approach defined in the plan.
6. Audit
Once the implementation has gone live, set up a system to audit the data in order to ensure
the accuracy of the migration.
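A post-go-live audit can be as simple as periodically sampling migrated records and comparing them field by field against the retained source data, as in the hypothetical sketch below (which assumes the compared fields were copied over unchanged).

```python
import random
import sqlite3

# A post-go-live audit sketch: sample migrated keys and compare the live
# target values against the retained source data. Names are illustrative.
source = sqlite3.connect("source.db")
target = sqlite3.connect("target.db")

ids = [row[0] for row in source.execute("SELECT id FROM legacy_customers")]
sample = random.sample(ids, min(100, len(ids)))

mismatches = 0
for record_id in sample:
    src = source.execute(
        "SELECT email, country FROM legacy_customers WHERE id = ?", (record_id,)
    ).fetchone()
    tgt = target.execute(
        "SELECT email, country FROM customers WHERE id = ?", (record_id,)
    ).fetchone()
    if src != tgt:
        mismatches += 1
        print(f"mismatch for id {record_id}: {src} != {tgt}")

print(f"audited {len(sample)} records, {mismatches} mismatches")
```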
Data Migration Tools
Building out data migration tools from scratch, and coding them by hand, is challenging
incredibly time-consuming. Data tools that simplify migration are more efficient and cost-
effective. When you start your search for a software solution, look for these factors in a
vendor:
• Connectivity — Does the solution support the systems and software you currently
use?
• Scalability — What are the data limits for the software, and will data needs
exceed them in the foreseeable future?
• Security — Take time investigating a software platform’s security measures.
Your data is one of your most valuable resources, and it must remain
protected.
• Speed — How quickly can processing occur on the platform?
Migrating Data to the Cloud
Increasingly, organizations are migrating some or all of their data to the cloud in order to
increase their speed to market, improve scalability, and reduce the need for technical
resources.
In the past, data architects were tasked with deploying sizeable server farms on-premises to
keep data within the organization’s physical resources. Part of the reason for pushing ahead
with on-site servers had been a concern about security in the cloud. However, as major
platforms adopt security practices putting them on par with traditional IT security (and
necessarily in compliance with the GDPR), this barrier to migration has largely been
overcome.
The right cloud integration tools help customers accelerate cloud data migration projects with
a highly scalable and secure cloud integration platform-as-a-service (iPaaS). Talend’s suite
of open-source, cloud-native data integration tools enables drag-and-drop functionality to
simplify complex mapping, and our open-source foundations make our solution cost-effective
and efficient.