EDW-ETL Migration Approaches With Databricks

Technical Migration Approaches

SI Partners can customize these slides as per their Migration Frameworks,
Automation Tools, and Approaches; this is just guidance.

1
Technical Migration Approach

Architecture/Infrastructure
● Establish deployment Architecture
● Implement Security and Governance framework

Data Migration
● Map Data Structures and Layout
● Complete one-time load
● Implement incremental load approach
● Validate: compare your results with on-prem data and expected results

ETL and Pipelines
● Migrate data transformation and pipeline code, orchestration and jobs
● Speed up your migration using Automation tools

BI and Analytics
● Re-point reports and analytics for Business Analysts and Business Outcomes
● OLAP cube repointing
● Connect to reporting and analytics applications

Data Science/ML
● Establish connectivity to ML Tools
● Onboard Data Science teams

2
Strategies for Data Migration
One-time loads, catch-up loads, real-time vs. batch ingestion

1. Extract from databases via JDBC/ODBC connectors using spark.read.jdbc (parallel ingestion); see the sketch after this list
2. Extract to cloud storage using AWS DMS and use Databricks Auto Loader for streaming ingest
3. ISV Partners for real-time CDC ingestion (Arcion, Fivetran, Qlik, Rivery, StreamSets, ...)
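
A minimal sketch of options 1 and 2, assuming a Databricks notebook with access to the source database and a cloud storage landing path; the hostname, table names, secret scope, and paths below are placeholders, not values from this deck.

```python
# Option 1: Parallel JDBC extract from the legacy EDW.
# Requires the source system's JDBC driver to be installed on the cluster.
jdbc_url = "jdbc:teradata://legacy-edw.example.com/DATABASE=sales"  # hypothetical source

orders_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "sales.orders")
    .option("user", dbutils.secrets.get("edw", "user"))
    .option("password", dbutils.secrets.get("edw", "password"))
    # Split the read across executors for parallel ingestion.
    .option("partitionColumn", "order_id")
    .option("lowerBound", "1")
    .option("upperBound", "100000000")
    .option("numPartitions", "32")
    .load()
)
orders_df.write.format("delta").mode("overwrite").saveAsTable("bronze.orders")

# Option 2: Stream files landed by AWS DMS into Delta with Auto Loader.
(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", "s3://landing/_schemas/orders")   # placeholder path
    .load("s3://landing/dms/orders/")
    .writeStream
    .option("checkpointLocation", "s3://landing/_checkpoints/orders")
    .trigger(availableNow=True)
    .toTable("bronze.orders_cdc")
)
```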
Ingestion Partner - HVR and Fivetran

4
Ingestion Partner - Qlik Replicate

5
Data Ingestion Partner - Arcion
CDC data into Databricks Lakehouse
• No-maintenance streaming ELT for the cloud Data Lakehouse
• Centralized, real-time view of all analytics-ready data, from operational data stores to cloud analytics platforms
• Demo at https://www.arcion.io/partners/databricks

6
Strategies for ETL/Code Migration
Use of Automated tools or frameworks can reduce your timelines by over 50%!
Migration of Stored Procedures and/or ETL Mappings

• Use Databricks Notebooks or Delta Live Tables:
  • Delta Live Tables or Databricks Notebook-based ETL (see the sketch after this list)
  • Metadata-driven ingestion frameworks (e.g. DLT-Meta)
• Or use GUI-based ETL tool partners:
  • Matillion, Prophecy, dbt, Informatica Cloud (IDMC), Talend, Infoworks, and many more
• Auto code converters accelerate migrations!
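
A minimal sketch of a Delta Live Tables pipeline in Python, assuming a bronze table already loaded by one of the ingestion options above; table and column names are illustrative, not from this deck.

```python
import dlt
from pyspark.sql import functions as F

# Bronze: raw orders as landed by the ingestion layer (placeholder source table).
@dlt.table(comment="Raw orders ingested from the legacy EDW")
def orders_bronze():
    return spark.read.table("bronze.orders")

# Silver: cleaned and typed, with a basic data-quality expectation.
@dlt.table(comment="Cleaned orders")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def orders_silver():
    return (
        dlt.read("orders_bronze")
        .withColumn("order_date", F.to_date("order_date"))
        .dropDuplicates(["order_id"])
    )

# Gold: business-level aggregate that downstream reports can query.
@dlt.table(comment="Daily order totals")
def orders_daily_gold():
    return (
        dlt.read("orders_silver")
        .groupBy("order_date")
        .agg(F.sum("amount").alias("total_amount"), F.count("*").alias("order_count"))
    )
```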


Migration via our ISV Software Partner: BladeBridge

8
ETL Code Migration with BladeBridge Converter
Consulting System Integrators use BladeBridge tooling to convert your
legacy ETL, stored procedures, and EDW code to Databricks

10X faster than converting by hand: pattern-based converter configuration
achieves 80-95% automated conversion

9
Databricks Code Migration can help modernize your legacy
platforms in an Automated way!
Massive reduction in cost and time to get the project done!

ETL/ELT Platforms
● Informatica
● DataStage
● Talend
● Hive/Hadoop Platforms

EDW Systems & Stored Procedures
● Teradata
● DB2
● Netezza
● Greenplum
● SQL Server
● Oracle
● Snowflake
● Redshift
Running Databricks Migration Analyzer

1. Export metadata from the legacy systems
2. Install and point the Analyzer to the metadata location
3. Review all code patterns and job complexity
4. Databricks PS + SI Partner give you a full migration proposal

Databricks Analyzer Results
DataStage Analyzer Result Sheet, Teradata Analyzer Result Sheet, Informatica Analyzer Result Sheet

Databricks Automated Code Migration Process
Teradata Converter Samples (BTEQ - Import)

Teradata Converter Samples (BTEQ - Conditional Flow)
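
The converter samples above refer to screenshots not reproduced here. As a rough, hypothetical illustration only (not actual BladeBridge output), a BTEQ import with conditional flow and the kind of Databricks notebook code it might map to could look like this:

```python
# Hypothetical BTEQ fragment being converted:
#   .IMPORT VARTEXT ',' FILE = /data/orders.csv
#   .IF ERRORCODE <> 0 THEN .QUIT 8;
#   INSERT INTO sales.orders_stg (order_id, amount) VALUES (:col1, :col2);
#
# A possible PySpark equivalent (illustrative only):
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", StringType()),
])

try:
    staged = spark.read.csv("/data/orders.csv", schema=schema, sep=",")
    (
        staged.withColumn("amount", F.col("amount").cast("decimal(18,2)"))
        .write.mode("append")
        .saveAsTable("sales.orders_stg")
    )
except Exception as e:
    # Mirrors the BTEQ ".IF ERRORCODE <> 0 THEN .QUIT 8" conditional flow.
    dbutils.notebook.exit(f"Load failed: {e}")
```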


Migration via our code automation partner: LeapLogic

16
Achieving the Right Balance – A Modern Approach
LeapLogic: From Legacy to Databricks
Convert or re-architect legacy code, workflows, and analytics with Automated Tooling

* Under development, working with the Databricks product team

18
Automation Levels Possible
Sources (EDW)**
● Teradata: 80 – 95%
● Netezza: 80 – 95%
● Oracle: 60 – 90%
● SQL Server: 60 – 90%
● Vertica: 80 – 95%

Sources (ETL)**
● Ab Initio: 70 – 90%
● Informatica: 70 – 80%
● DataStage: 70 – 80%
● SAS: 50 – 80%

Targets
● Databricks Notebook
● Databricks Workflow
● Databricks Delta Lake
● Databricks SQL (Photon)
● BI Integrations
● Security and Governance integrations

**The automation percentages above are for the code-conversion activity and are estimates based on prior engagements; they will vary by workload.

***An assessment helps create an accurate inventory with complexity, along with target-specific auto-conversion percentages, which can be used for a firm estimate and project-plan creation.
ETL Migration via Prophecy
(GUI-based ETL software tool that works on top of Databricks)

20
ETL Code Migration with Prophecy Migrate
Excellent at migrating Ab Initio and other ETL mappings to
Prophecy/Databricks

Prophecy converts your ETL workflows
● from legacy ETL formats (legacy ETL product, proprietary engine)
● to standard Spark code (100% open-source code, Spark engine)

This is done with 80-90% automation, using source-to-source compilers
that Prophecy has developed.

We have converted workflows, data-matched them, and put them in
production for Fortune 500 Enterprises.
21
ETL Migration via Matillion
(GUI-based ETL software tool that works on top of Databricks)

22
Legacy ETL code migration to Matillion + Databricks

Legacy ETL (e.g. DataStage) to Cloud-Native Data Integration:

1. Export legacy mappings
2. Import into Matillion
3. Run on Databricks

23
Strategies for Report Migration/Modernization
Unleash Self-service Analytics with a Semantic Lakehouse
• As easy as re-pointing your reports to the DBSQL JDBC/ODBC drivers
  (Photon and our newest Cloud Fetch ODBC drivers); see the sketch after this list
• Key Integrations
  • Power BI Premium (Large Models/Composite Models)
  • Tableau Hyper Extracts
  • OLAP cube partners like MicroStrategy
  • Amazon QuickSight
  • AtScale: universal semantic layer (now in Databricks Partner Connect)
  • REST API
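
A minimal sketch of programmatic re-pointing using the databricks-sql-connector Python package, assuming a SQL Warehouse already exists; the hostname, HTTP path, token, and table name are placeholders. BI tools such as Power BI and Tableau would instead use their native Databricks connectors with the same three connection values.

```python
# pip install databricks-sql-connector
from databricks import sql

# Placeholder connection details for an existing Databricks SQL Warehouse.
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123def456",
    access_token="dapiXXXXXXXXXXXXXXXX",
) as conn:
    with conn.cursor() as cursor:
        # The same query a legacy report ran against the EDW, now served by DBSQL.
        cursor.execute(
            "SELECT order_date, SUM(amount) AS total_amount "
            "FROM gold.orders_daily GROUP BY order_date ORDER BY order_date"
        )
        for row in cursor.fetchall():
            print(row.order_date, row.total_amount)
```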
What about High Concurrency?
Can Databricks SQL Warehouses handle concurrency demands?

How would a Databricks SQL Warehouse scale under 10 parallel runs of the
TPC-DS 99-query power run, repeated twice?
Large Serverless SQL Warehouse, scaling from 1 to 10 clusters

2
Results
7 minutes to serve 1980 queries, $22 total cost

Serverless is $0.70 per DBU per hour, and the Large Warehouse scaled up
to 7 clusters at its peak. Running this same workload on the best cloud
data warehouse on the market, Snowflake, would probably cost around $37.

33 queries ran in 1 second or less!
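
A rough sketch, for illustration only, of how such a concurrency test could be driven from Python against a SQL Warehouse, assuming the databricks-sql-connector package and a local folder of the 99 TPC-DS query files; this is not the harness behind the numbers above, and all connection details are placeholders.

```python
import concurrent.futures
import pathlib
import time

from databricks import sql

QUERY_DIR = pathlib.Path("tpcds_queries")  # placeholder: folder holding 99 .sql files
CONN_ARGS = dict(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    http_path="/sql/1.0/warehouses/abc123def456",                  # placeholder
    access_token="dapiXXXXXXXXXXXXXXXX",                           # placeholder
)

def run_power_run(stream_id: int) -> float:
    """Run all 99 queries sequentially on one connection; return elapsed seconds."""
    start = time.time()
    with sql.connect(**CONN_ARGS) as conn, conn.cursor() as cursor:
        for query_file in sorted(QUERY_DIR.glob("*.sql")):
            cursor.execute(query_file.read_text())
            cursor.fetchall()
    return time.time() - start

# 10 parallel streams, repeated twice = 2 * 10 * 99 = 1980 queries.
for repetition in range(2):
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
        timings = list(pool.map(run_power_run, range(10)))
    print(f"Repetition {repetition + 1}: slowest stream took {max(timings):.1f}s")
```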


Semantic Lakehouse with PowerBI
Self-Service BI and Enterprise BI Deployment Patterns

Self-Service BI
• Business-led datasets created by business or data analysts
• Typically smaller in model size, ranging from 1GB to 10GB of compressed in-memory data
• Power BI datasets and reports shared with business units, departments, and groups using AAD security

Enterprise Semantic Layer (Enterprise BI)
• IT-led Enterprise Semantic Models (Composite Models)
• Can be 10GB to 400GB of compressed in-memory data
• Leverages Azure Analysis Services under the covers to scale to 1,000s of users
• Shared with the enterprise using AAD security

27
AtScale + Databricks: enables a Universal Semantic Lakehouse
The “Diamond” Layer - enables all your Enterprise Reporting on Databricks Lakehouse

[Architecture diagram: Bronze (raw ingestion and history), Silver (filtered, cleaned, augmented), and Gold (business-level aggregates) layers served through small, large, and system Databricks SQL Endpoints; the AtScale engine and aggregates sit on top, exposing DAX/XMLA, MDX, SQL (JDBC/ODBC), REST, and Python interfaces to BI tools, secured with Azure Active Directory.]
Report modernization to Databricks
Run Semantic Layer & Analytics directly on all your data - in one place!

• Use a DBSQL warehouse with Photon and our Cloud Fetch ODBC drivers
• Use Power BI or Power BI Premium Large Models
  • (AAS, but bigger and better: supports 400GB extracts vs. the old limit of 10GB)
• Integration with Tableau
• Integration with OLAP cube partners like MicroStrategy, AtScale, etc.

Reporting and Semantic layer ISV Partners:

29
Accelerating Your Migration
Databricks Ecosystem Partners and Tools

Your Migration Team is supported by:
• Cloud partners
• ISV data ingestion partners
• ISV migration automation tooling & ETL partners
• System Integrators (partial list), including Migration Brickbuilder Program SIs
• Databricks CS Packaged Services for Migration
Databricks Migration Assurance Package
A layer of Databricks PS Assurance for SI Partners or Customers to ensure success!

Databricks Migration Experts work as part of an SI or a customer team to
provide architecture, planning, design, and ongoing expert technical
guidance in an advisory capacity throughout the migration journey to
ensure success!

32
Next steps
Customized EDW Migration Success Plan with an Expert-led Assessment
1. Fill out the EDW Migration Discovery Questionnaire & run EDW/ETL Code Analyzers/Profilers
2. Finalize High Level Target State Architecture, ISV and Delivery Partner

3. Use-case, TCO/DBU estimation, and Business Value analysis: map use cases, prioritize them, and understand how $$ value is driven by the migration

4. Review Delivery Proposal by SI partners and start planning implementation

33
Thank you
Please visit databricks.com/migration to read more
Connect with us for your free EDW migration assessment

34
Databricks conversion tools demo video recordings

Data and AI Summit Talk 2022 - Demo of how to convert to Databricks from Legacy Tech

LeapLogic videos for converting Legacy Tech to Databricks

DataStage to Databricks/PySpark conversion Demo

Teradata to Databricks conversion demo

Informatica to Databricks conversion demo

DataStage & PowerCenter to Matillion Demo (if a GUI-based ETL tool is desired with Databricks)

Talend to PySpark/Databricks

T-SQL to Databricks conversion Demo video
