EDW-ETL Migration Approaches With Databricks
Technical Migration Approach
Architecture/Infrastructure
● Establish deployment architecture
● Implement security and governance framework

Data Migration
● Map data structures and layout
● Complete one-time load jobs
● Implement incremental load approach (see the sketch after this list)

ETL and Pipelines
● Migrate data transformation and pipeline code, orchestration, and jobs
● Speed up your migration using automation tools
● Validate: compare your results with on-prem data and expected results

BI and Analytics
● Re-point reports and analytics for Business Analysts and business outcomes
● OLAP cube repointing
● Connect to reporting and analytics applications

Data Science/ML
● Establish connectivity to ML tools
● Onboard Data Science teams
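As a minimal sketch of what an incremental load approach can look like on Databricks, the snippet below upserts a batch of extracted changes into a migrated Delta table with a MERGE; the table and key names are hypothetical placeholders, not details from this deck.

```python
# Hedged sketch: incremental upsert into a migrated Delta table.
# `spark` is provided automatically in Databricks notebooks; the table
# and key names below are hypothetical placeholders.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "edw.dim_customer")    # migrated target table
changes = spark.table("staging.dim_customer_changes")     # latest extracted batch

(target.alias("t")
    .merge(changes.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()       # update rows whose key already exists
    .whenNotMatchedInsertAll()    # insert rows that are new
    .execute())
```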
Strategies for Data Migration
One-time loads, catch-up loads, real-time vs. batch ingestion
1. Extract from databases via JDBC/ODBC connectors using spark.read.jdbc (parallel ingestion; see the first sketch after this list)
2. Extract to cloud storage using AWS DMS, then use Databricks Auto Loader for streaming ingest (see the second sketch after this list)
3. ISV partners for real-time CDC ingestion (Arcion, Fivetran, Qlik, Rivery, StreamSets, ...)
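For strategy 1, here is a minimal sketch of parallel JDBC extraction with spark.read.jdbc, assuming a hypothetical PostgreSQL source and a numeric key column; the URL, credentials, and bounds are placeholders, not values from this deck.

```python
# Hedged sketch: parallel JDBC ingestion. partitionColumn/lowerBound/
# upperBound/numPartitions make Spark issue 16 concurrent range queries
# instead of one serial read. All connection details are placeholders.
df = (spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://onprem-host:5432/edw")
    .option("dbtable", "sales.orders")
    .option("user", "etl_user")
    .option("password", dbutils.secrets.get("edw-scope", "jdbc-pw"))  # keep secrets out of code
    .option("partitionColumn", "order_id")   # numeric/date column to split on
    .option("lowerBound", "1")
    .option("upperBound", "100000000")
    .option("numPartitions", "16")
    .load())

df.write.mode("overwrite").saveAsTable("bronze.orders")  # one-time load target
```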
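For strategy 2, a sketch of Auto Loader picking up the files AWS DMS lands in cloud storage; the S3 paths, file format, and table name are assumptions for illustration.

```python
# Hedged sketch: Auto Loader (cloudFiles) incrementally ingests files
# that AWS DMS writes to S3. Paths and names are placeholders.
(spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")                      # DMS output format (assumed)
    .option("cloudFiles.schemaLocation", "s3://bucket/_schemas/orders")
    .load("s3://bucket/dms-output/sales/orders/")
    .writeStream
    .option("checkpointLocation", "s3://bucket/_checkpoints/orders")
    .trigger(availableNow=True)                                  # catch-up runs in batch style
    .toTable("bronze.orders_cdc"))
```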
Ingestion Partner - HVR and Fivetran
Ingestion Partner - Qlik Replicate
Data Ingestion Partner - Arcion
CDC data into Databricks Lakehouse
• No-maintenance streaming ELT for the cloud Data Lakehouse.
• Centralized view of all analytics-ready data in real time, from operational data stores to cloud analytics platforms
• Demo at https://www.arcion.io/partners/databricks.
Strategies for ETL/Code Migration
Using automated tools or frameworks can reduce your timeline by over 50%!
Migration of Stored Procedures and/or ETL Mappings
ETL Code Migration with BladeBridge Converter
Consulting system integrators use BladeBridge tooling to convert your legacy ETL and EDW code (stored procedures and ETL mappings) to Databricks
Databricks Code Migration can help modernize your legacy platforms in an automated way!
Massive reduction in cost and time to get the project done!
1. Export metadata from legacy systems
2. Install and point the Analyzer to the metadata location
3. Review all code patterns and job complexity
4. Databricks PS + SI partner give you a full migration proposal

Databricks Analyzer Results
DataStage Analyzer Result Sheet, Teradata Analyzer Result Sheet, Informatica Analyzer Result Sheet

Databricks Automated Code Migration Process
Teradata Converter Samples (BTEQ - Import)
Teradata Converter Samples (BTEQ - Conditional Flow)
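As an illustration of the kind of translation these samples show (not actual converter output), a BTEQ conditional flow such as `.IF ACTIVITYCOUNT = 0 THEN .GOTO NODATA` typically becomes plain Python control flow in a Databricks notebook; the tables below are hypothetical.

```python
# Illustrative only, not BladeBridge output: BTEQ's ACTIVITYCOUNT branch
# re-expressed as ordinary Python in a notebook. Table names are placeholders.
activity_count = spark.sql("""
    SELECT COUNT(*) AS c
    FROM staging.daily_orders
    WHERE load_date = current_date()
""").first()["c"]

if activity_count == 0:
    # BTEQ: .GOTO NODATA -> early exit in the notebook
    print("No new rows; skipping load step")
else:
    spark.sql("""
        INSERT INTO edw.fact_orders
        SELECT * FROM staging.daily_orders
        WHERE load_date = current_date()
    """)
```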
Migration via our code automation partner: LeapLogic
Achieving the Right Balance – A Modern Approach
LeapLogic: From Legacy to Databricks
Convert or re-architect legacy code, workflows, and analytics with automated tooling
Automation Levels Possible
Source**                         Target
EDW: Teradata     80 – 95%       Databricks Notebook
ETL: Ab Initio    70 – 90%       Databricks Notebook

** Automation percentages above are for the code conversion activity and are estimates based on prior engagements; they will vary by workload.
*** An assessment helps create an accurate inventory with complexity, along with a target-specific auto-conversion percentage, which can be used for a firm estimate and project plan creation.
ETL Migration via Prophecy
(a GUI-based ETL tool that works on top of Databricks)
ETL Code Migration with Prophecy Migrate
Excellent at migrating Ab Initio and other ETL mappings to Prophecy/Databricks
From legacy ETL formats to 100% open-source code
Legacy ETL code migration to Matillion + Databricks
DataStage jobs converted to Matillion, run on Databricks
Strategies for Report Migration/Modernization
Unleash Self-service Analytics with a Semantic Lakehouse
• As easy as repointing your reports to the DBSQL JDBC/ODBC drivers
(Photon and our newest Cloud Fetch ODBC drivers; a connection sketch follows this list)
• Key Integrations
• PowerBI Premium (Large Models/Composite Models)
• Tableau Hyper Extracts
• OLAP cube partners like MicroStrategy
• Amazon QuickSight
• AtScale: universal semantic layer
(now in Databricks Partner Connect)
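As a concrete sketch of "repointing", the open-source databricks-sql-connector can run a report's existing SQL against a DBSQL warehouse; the hostname, HTTP path, token, and query below are placeholders, not values from this deck.

```python
# Hedged sketch: querying a Databricks SQL warehouse from Python with the
# databricks-sql-connector (pip install databricks-sql-connector).
# All connection values below are placeholders for your workspace.
from databricks import sql

with sql.connect(
    server_hostname="your-workspace.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cursor:
        # The same SQL a BI report issued against the legacy EDW,
        # now served by DBSQL (Photon; large results travel via Cloud Fetch).
        cursor.execute("SELECT region, SUM(revenue) FROM sales.orders GROUP BY region")
        for row in cursor.fetchall():
            print(row)
```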
What about High Concurrency?
Can Databricks SQL Warehouses handle concurrency demands?
How does a Databricks SQL warehouse scale under 10 parallel runs of the TPC-DS 99-query power run, repeated twice?
Large Serverless SQL Warehouse, scaling from 1 to 10 clusters
Results: 7 minutes to serve 1,980 queries (10 concurrent streams × 99 queries × 2 runs), at $22 total cost
AtScale + Databricks: enables a Universal Semantic Lakehouse
The “Diamond” layer enables all your enterprise reporting on the Databricks Lakehouse
[Architecture diagram: Databricks SQL endpoints feed the AtScale engine, which serves BI clients over interactive DAX, XMLA, REST, Python, and SQL, with Azure Active Directory providing identity.]
Report modernization to Databricks
Run Semantic Layer & Analytics directly on all your data - in one place!
• Use a DBSQL warehouse with Photon and our Cloud Fetch ODBC drivers
• Use PowerBI or Power BI Premium Large Models
• (Like AAS, but bigger and better: supports 400 GB extracts vs. the old 10 GB limit)
• Integration with Tableau
• Integration with OLAP cube partners like MicroStrategy, AtScale, etc.
Accelerating Your Migration
● Databricks ecosystem partners and tools
● Cloud Migration Team
● System integrators (partial list) and Migration Brickbuilder Program SIs
● CS packaged services for migration
Databricks Migration Assurance Package
A layer of Databricks PS assurance for SI partners or customers to ensure success!
Next steps
Customized EDW Migration Success Plan with an Expert-led Assessment
1. Fill out the EDW Migration Discovery Questionnaire & run EDW/ETL Code Analyzers/Profilers
2. Finalize High Level Target State Architecture, ISV and Delivery Partner
Thank you
Please visit databricks.com/migration to read more
Connect with us for your free EDW migration assessment
Databricks conversion tools demo video recordings
Data and AI Summit Talk 2022 - Demo of how to convert to Databricks from Legacy Tech
DataStage & PowerCenter to Matillion demo (if a GUI-based ETL tool is desired with Databricks)
Talend to PySpark/Databricks