Migrate To The Cloud: The How and Why of Modernizing Your Data Warehouse
Strategy requirements
Conclusion
Learn more
About Snowflake
What the market says
According to Gartner, 92 percent of organizations are aware they have unmet data management demands in support of their analytics.¹ Seventy-five percent of data executives say they can’t deliver useful data analytics effectively to the enterprise due to inflexible computing solutions.² As a result, more than one-third of data professionals say their organizations are already using a cloud data warehouse.³ But this doesn’t mean they want a “cloud-washed” version of a legacy and inflexible on-premises data warehouse. Nearly all data professionals, 93 percent, see the unique benefits of a data warehouse built from the ground up for the cloud.⁴

Enterprises realize that legacy data warehouses can no longer deliver on their true purpose: to organize data, enable rapid analysis and make insights available to all business users who need them. That’s why they’re moving away from traditional data warehouse solutions toward cloud solutions.

With some upfront planning and consideration, migrating your data analytics to the cloud is a process that can lead to big payoffs for your business and technology demands. In this eBook, we’ll address your organization’s data analytics needs with a roadmap for migrating your data warehouse to the cloud.
1. Survey Analysis: New Data and New Analytics Are All Mythology Unless You Add Skills, Gartner.com, 9/18/17
2–4. Data Analytics: Beyond the Hype. A Survey of Data Professionals and Executives, Dimensional Research, 9/16
approaches, strategies and requirements
There are many reasons organizations choose to embrace cloud computing. But most organizations need a plan, something to grab onto and see what the future looks like. Not everyone is in the same place regarding their analytic capability or cloud maturity. Therefore, give careful consideration to how fast and how much legacy code you want to move from your on-premises environment to a public cloud infrastructure.

THE FOUR MOST COMMON MIGRATION SCENARIOS

The type of migration you embark on will significantly influence your migration strategy. Here are four potential paths many organizations take to migrate their data analytics and data warehouse to the cloud:

1. OLTP for operational reporting and analytics

This is extremely common. Many
organizations use OLTP (online transaction
processing) systems, such as SQL Server,
Oracle, or MySQL for basic reporting
and analytics. While this might work
as a short-term solution, the reporting
needs of the business compete with the
operational needs, overtaxing a fixed
resource and slowing performance for
both. A truly elastic cloud data warehouse
eliminates this problem, and the move itself is straightforward. As discussed later, take your existing transactional schema, which is usually in third normal form (3NF), and move it, as is, to the
cloud. This removes the reporting workload
from the existing system and houses the
data in a platform built for analytics. This
eliminates the performance bottlenecks,
and in some cases, gives your operational
data store new life.
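As a sketch of this lift-and-shift pattern, the snippet below copies a transactional schema verbatim from a source system to a target. It uses Python’s built-in sqlite3 as a stand-in for both the OLTP source and the cloud warehouse; the table names are hypothetical, and a real migration would use your warehouse vendor’s connector instead.

```python
import sqlite3

def lift_and_shift_schema(source_conn, target_conn):
    """Copy every table's DDL from the source OLTP system to the target,
    unchanged -- the "move it, as is" approach described above."""
    cur = source_conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    )
    for (ddl,) in cur.fetchall():
        target_conn.execute(ddl)  # replay the 3NF schema verbatim
    target_conn.commit()

# Stand-in source: a typical normalized transactional schema (names invented).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), total REAL)"
)

# Stand-in target: the cloud data warehouse connection.
target = sqlite3.connect(":memory:")
lift_and_shift_schema(source, target)
```

Once the schema exists on the target, the reporting workload can point at it while the OLTP system keeps serving transactions.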
Executing these steps, and in this order, is not a necessity. Depending on the scope of your migration, you may need more or fewer steps. The key is to design a framework and core elements of the plan that you can work from. Assess your internal skill sets, don’t be afraid to leverage the best practices outlined by strategic vendors, and consider partnering with migration experts.

STEP 1: DETERMINE THE SCOPE

Stating the obvious, no two migrations are the same, and rarely is the end state well understood. The goal is to create a plan that aligns with the goals of the business, provides capabilities in the shortest reasonable timeframe and sets you on the path for incremental improvement. Your end state could be getting a single workload into the cloud within one month, or it could be migrating your entire analytics platform by the end of the year. It’s reasonable to plan for a one-year ROI, which you can even accelerate under certain scenarios.

STEP 2: DOCUMENT THE “AS IS”

This isn’t the most glamorous part of a migration, but it’s likely one of the most critical. You’ll need to communicate both internally and externally, and up and down the reporting chains, regarding the current “as is” implementation. A short list of assets to migrate includes, but is not limited to:

1. All sources that populate the existing systems
2. All database objects (tables, views, users, etc.)
3. All transformations, with schedules for execution or triggering criteria
4. A diagram of the interaction of systems/tools

STEP 3: DETERMINE THE APPROACH AND ASSEMBLE THE IMPLEMENTATION TEAM

Multiple options exist here. We’ve outlined above the most common approaches, but there are combinations of these. You could choose to implement one method to get to initial capability and another as you approach full production. Creating high-level milestones at this step is a good way to segment when a capability will be available, and which requirements you’ll satisfy via release schedules.
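The asset categories from the “as is” documentation step can be captured in a simple, machine-readable inventory rather than scattered documents. The sketch below is one illustrative way to structure it in Python; all system, object and job names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class AsIsInventory:
    """One record per asset category to migrate (Step 2)."""
    sources: list = field(default_factory=list)          # systems that populate the warehouse
    db_objects: list = field(default_factory=list)       # tables, views, users, ...
    transformations: list = field(default_factory=list)  # ETL jobs with schedules/triggers
    diagrams: list = field(default_factory=list)         # system-interaction diagrams

    def summary(self):
        # Count of documented assets per category, for status reporting.
        return {name: len(items) for name, items in vars(self).items()}

inv = AsIsInventory()
inv.sources.append({"name": "erp_oracle", "kind": "OLTP"})            # hypothetical
inv.db_objects.append({"name": "sales.orders", "type": "table"})      # hypothetical
inv.transformations.append({"name": "nightly_load", "schedule": "0 2 * * *"})
inv.diagrams.append("warehouse_dataflow.vsdx")
```

A living inventory like this also doubles as a migration checklist: each entry can carry a status field as assets are converted and validated.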
[Figure: Data flow architecture of a modern cloud data warehouse — staging databases feed the warehouse via ETL, ELT or no-ETL paths, through S3/Azure staging and a native connector or ODBC/JDBC.]
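To make the ETL-versus-ELT distinction in the diagram concrete, here is a minimal sketch: the same raw feed loaded once with transformation done outside the warehouse (ETL), and once with raw data staged first and transformed by SQL inside it (ELT). Python’s sqlite3 stands in for the warehouse, and the sales data is invented for illustration.

```python
import sqlite3

raw_rows = [("2019-01-05", "149.99"), ("2019-01-06", "89.50")]  # hypothetical feed

def etl(conn, rows):
    # ETL: transform in an external process, then load the finished rows.
    conn.execute("CREATE TABLE sales_etl (day TEXT, amount REAL)")
    cleaned = [(day, float(amount)) for day, amount in rows]  # transform outside
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)

def elt(conn, rows):
    # ELT: load raw data into a staging table, then transform with SQL
    # inside the warehouse itself.
    conn.execute("CREATE TABLE staging (day TEXT, amount TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT day, CAST(amount AS REAL) AS amount FROM staging"
    )

wh = sqlite3.connect(":memory:")  # stand-in for the cloud warehouse
etl(wh, raw_rows)
elt(wh, raw_rows)
```

Both paths end with the same queryable table; the difference is where the transformation compute runs, which is exactly the trade-off an elastic warehouse lets you revisit.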
The initial load can be challenging based on data volumes and security requirements. Work closely with your security team, and the lines of business that own the data, to make sure you don’t have to go through a tokenization/obfuscation process before moving data into the cloud. Many organizations segment their data into “zones” or “layers” inside their data lake: raw, curated, aggregated and cleansed areas. Then make the decision which data sets are okay to move, taking into account regulatory and data privacy compliance standards such as PII, PCI and HIPAA. Regarding the volume of data, as networking gets better, this issue starts to go away, as you can move terabytes of data into the cloud.

TIP: Many organizations receive data externally from partners and vendors (Salesforce, etc.). It might be prudent to dump these data sets into a cloud object storage service to keep them from becoming classified as an on-premises asset. If data is coming over the Internet, it should be okay to secure it in the cloud.

2. Ongoing updates

Each source of data, the ETL logic and integration with the data lake strategy will dictate the methods used for updating data in your cloud enterprise data warehouse. This is a place where you could limit the amount of change in the existing process and revisit after the initial implementation.

3. Planning for warehouse usage and storage

Typically, most organizations execute a POC and go through an ROI exercise before executing a migration. At this phase, it’s usually a good idea to re-validate the usage plan and work the operational side of the equation with regard to how to monitor availability and how to govern usage of the system. Because some cloud data warehouses provide the ability to scale up and down, turn resources on and off, segment workloads and resources, and auto-scale both processing power and storage, the model changes from time-slicing a fixed resource (limiting your business users’ access) to allocating resources based on business need and value. You no longer have to do a big planning exercise to handle your largest workload and leave the system underutilized for the other 364 days of the year.

TIP: If you are integrating your migration with your data lake strategy, be aware of the transfer charges for moving data between regions or cloud providers.

STEP 6: CONVERT ASSETS

This step refers to defining the data warehouse/database assets you may need to convert. These include data definition language (DDL), role-based access control (RBAC) and data manipulation language (DML) used in scripts. The good news is that most relational databases leverage the ANSI SQL standard. Most of the changes will revolve around ensuring DATE and TIMESTAMP formats are converted correctly, and that the SQL functions used to access those are checked for compliance. (Not all vendors implement functions the same way.) Some cloud data warehouses simplify DDL by eliminating the need to partition and index, so your DDL becomes much cleaner (less verbose).

STEP 7: SET UP YOUR “TO BE” ENVIRONMENT AND TEST CONNECTIVITY / SECURITY

It should be no surprise that you’ll have to complete your networking, proxy and firewall configurations during your migration strategy. It usually helps to have a chart or two outlining which ports and URLs you will need to access. You will also want to work with your security group to download and install any drivers (ODBC, JDBC, etc.) or support software such as a command line interface (CLI), which … and to get instructions specific to migrating …

Enterprises now realize there are efficiency …