
Guide

Microsoft SQL Server to Databricks Migration Guide
Contents

Introduction
About this guide
Migration strategy
Overview of the migration process
Phase 1: Migration discovery and assessment
Phase 2: Architecture design and planning
  Monitoring and observability framework
  EDW architecture
Phase 3: Data warehouse migration
  Considerations for workspace creation
  Considerations for schema and data migration
  Recommended approach
  Phase 3.1: Schema migration
  Phase 3.2: Data migration
  Phase 3.3: Other database objects migration
    Stored procedures implementation in Databricks
    Implement slowly changing dimensions
  Phase 3.4: Data security migration
    Authentication
    Authorization
Phase 4: Code and ETL pipelines migration
  Orchestration migration
  Query migration and refactoring
  Code refactoring and optimization
  SQL Server to Databricks cutover phase
Phase 5: BI and analytics tools integration
  Microsoft Power BI integration
Phase 6: Migration validation
Need help migrating?
Introduction

Traditional enterprise data warehouse (EDW) appliances like Microsoft SQL Server come with significant limitations: high costs; a lack of support for unstructured data, built-in AI/ML or real-time streaming capabilities; and storage and compute scaling that is challenging and expensive. To work around this, enterprises run separate data marts, data warehouses, data lakes, ML platforms and streaming platforms, which creates silos because constant ETL processes are required to move data between platforms for different workloads, increasing complexity and slowing down insights.

Databricks’ Data Intelligence Platform introduces a paradigm shift by eliminating the need for separate data processing systems and constant data movement. Instead of copying data between warehouses, marts, lakes and ML platforms, Databricks brings different processing engines to a single copy of data in the cloud, enabling seamless data warehousing, AI, ML and GenAI use cases on a single platform and data asset. With its lakehouse architecture, Databricks provides governance, scalability and advanced analytics while decreasing costs and operational complexity. Migrating to Databricks ensures your organization is ready for the future with a unified, efficient and intelligent data platform.
ABOUT THIS GUIDE

This guide provides a detailed roadmap for migrating data warehouse workloads from SQL Server to the Databricks Data Intelligence Platform. It outlines key differences between the two systems, standard data and code migration patterns, and best practices to streamline the transition. Additionally, it compiles proven methodologies, tool options and insights gained from successful migrations. This migration guide covers theoretical concepts and practical applications and is a comprehensive resource for organizations looking to leverage Databricks for enhanced performance, scalability and advanced analytics.

MIGRATION STRATEGY

Successfully migrating from SQL Server to Databricks requires meticulous planning, strategic alignment and a precise target architecture to secure a favorable outcome. Following a proven, structured migration methodology is critical to achieving a seamless, effective migration to Databricks, enabling your organization to realize value and position itself for rapid future innovation.

OVERVIEW OF THE MIGRATION PROCESS

Proper planning is required to migrate data and ETL processes from legacy on-premises systems to cloud technologies. This migration involves transferring data and business logic from on-premises infrastructure to the cloud.
Despite the substantial differences between SQL Server and Databricks, there are surprising similarities that can facilitate the migration process:

• Despite its proprietary SQL dialect, Microsoft SQL Server’s T-SQL adheres mostly to ANSI SQL standards, providing compatibility with Databricks SQL syntax.

• Code migration can be accomplished through code refactoring, leveraging the shared ANSI SQL compliance between the two systems.

• Fundamental data warehouse concepts exhibit similarities across SQL Server and Databricks.

The migration process typically consists of the following technical implementation phases: migration discovery and assessment, architecture design and planning, data warehouse migration, stored procedures and ETL pipeline migration, BI and analytics tools integration, and migration validation.

Figure 1: Technical migration approach
Phase 1: Migration Discovery and Assessment

Conducting a migration assessment is crucial before migrating any data or workloads. This enables Databricks to:

• Gain insight into data ingress and egress, ETL patterns, data volume, orchestration tools and execution frequency.

• Understand the technologies involved in upstream and downstream integrations.

• Assess the business criticality and value of the existing systems.

• Evaluate the existing security framework and access control mechanisms.

• Gather pertinent information to provide a realistic estimation of the effort required for migration.

• Compare and calculate infrastructure costs.

• Identify any imminent deadlines, particularly regarding license renewal fees for the existing SQL Server setup.

• Document any functional or cross-functional dependencies in the migration plan.

Databricks recommends automation tools, such as the recently acquired BladeBridge Code Analyzer, to expedite the gathering of migration-related information during this phase.
Typically, these profiling tools examine SQL Server system usage via its system views and catalog tables, providing consumption and complexity insights and an inventory of objects and code migration complexity. They capture the types of workloads, long-running ETL queries and user access patterns. This level of analysis aids in pinpointing databases and pipelines that contribute to high operational costs and complexity, thereby supporting the prioritization process.

Our BladeBridge Code Analyzer not only classifies queries based on their complexity in “T-shirt sizes” (small, medium, large, extra-large, etc.) but also assesses the function compatibility of SQL scripts and stored procedures, which is vital in ensuring seamless migration.

Running the Databricks Migration Analyzer involves four steps:

1. Export metadata from legacy systems.
2. Install the Analyzer and point it to the metadata location.
3. Review all code patterns and job complexity.
4. Databricks Professional Services and an SI partner give you a full migration proposal.

Figure 2: Running the Databricks migration analyzer
Phase 2: Architecture Design and Planning

SQL Server and Databricks operate in markedly different ways. SQL Server requires careful selection of a suitable clustered index key with high cardinality to ensure proper data distribution and predictable performance. By contrast, the Databricks Data Intelligence Platform is a distributed system by design; data distribution depends on the configuration of a cluster and the nature of the data. For example, if data is loaded from a sample CSV file into a Databricks notebook using the spark.read.csv() function, the data is automatically distributed across the nodes in a cluster. By default, Spark splits the data into partitions and processes each partition with a separate task on an individual node, allowing for efficient parallel processing of the data.
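The same pattern can be expressed in Databricks SQL with the read_files table-valued function. The following is a minimal sketch, assuming a hypothetical cloud storage path and catalog/schema names:

-- Load a sample CSV from cloud storage into a managed Delta table (placeholder names)
CREATE OR REPLACE TABLE main.bronze.sample_csv AS
SELECT *
FROM read_files(
  'abfss://raw@mystorageaccount.dfs.core.windows.net/sample/',
  format => 'csv',
  header => true
);

Spark parallelizes both the file read and the Delta write across the cluster without any explicit distribution settings.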
The Databricks distributed design facilitates horizontal scaling, enabling data distribution and computation across multiple nodes in a cluster. This capability allows Databricks to process large datasets and handle high query volumes efficiently, surpassing the capabilities of a traditional database system like on-premises SQL Server.

It is essential to consider these distinctions when migrating from SQL Server to Databricks. By consciously mapping the similarities and differences between the two platforms, organizations can better understand how SQL Server concepts translate to Databricks capabilities.

MONITORING AND OBSERVABILITY FRAMEWORK

Plan for a comprehensive monitoring and observability framework that provides real-time insights into the performance, health and security of the cloud infrastructure and applications.
EDW ARCHITECTURE

In legacy EDW architectures, data from various systems is typically ingested via ETL tools or ingestion frameworks. After landing in the raw layer, the data progresses to the stage or central layer, where further cleansing and processing occur. Finally, it moves to the last layer, which contains the most complex business logic. Once the final layer is prepared, it is used for reporting purposes through third-party tools such as MicroStrategy or BusinessObjects.

Figure 3: SQL Server reference architecture of an enterprise data warehouse. In a typical high-level EDW architecture on SQL Server, operational databases, lookup data and files are extracted and loaded into a landing zone and staging area, transformed into the enterprise data warehouse, and served through data marts and OLAP cubes to dashboards and analysts.

It is imperative to comprehensively analyze the current, as-is architecture. This involves understanding upstream and downstream integrations and the respective tools and technologies.

Following this analysis, assess the potential for modernizing each stage of the target architecture. This entails evaluating how well an organization can transition from legacy systems to modern alternatives at each stage. Key decisions on data ingestion into cloud storage include evaluating where features like Databricks Auto Loader, Lakeflow Connect or Lakehouse Federation can be implemented. ETL modernization options and partners are then considered, and BI tool compatibility is assessed. The result is a target architecture and tooling roadmap that guides the migration process.
Below is an example of a data warehousing architecture on Databricks with various ISV partner integration options.

Figure 4: Modern data warehousing on Databricks

Following the architectural alignment, we will dive deeply into SQL Server’s current features.
SQL SERVER VS. DATABRICKS FEATURE MAPPING EXAMPLE

Objects/Workload | SQL Server | Databricks
Compute | SQL Server on-premises compute | Databricks managed clusters optimized for workload types with a runtime: all-purpose clusters for interactive/developer use, job clusters for scheduled pipelines, SQL warehouses for BI workloads
Storage | Physical HDD or SSD for on-premises deployment | Cloud storage (Amazon S3, Azure Blob Storage, Azure Data Lake Storage Gen2, Google Cloud Storage)
Tables | SQL Server tables | Delta tables in Unity Catalog
Format | SQL Server proprietary | Delta and Iceberg formats (open source)
User interface | SQL Server Management Studio (SSMS) | Databricks workspace, Databricks collaborative notebooks, Databricks SQL Query Editor, Databricks CLI, Visual Studio Code, Spark Connect, Databricks API, Terraform
Database objects | Tables, views, indexed (materialized) views, stored procedures, UDFs | Tables, views, materialized views, DLT, UDFs
Metadata catalog | Built-in system tables and catalog views under the sys schema | Unity Catalog
Data sharing | No native support in on-premises mode | Delta Sharing, Databricks Marketplace
Data ingestion | Bulk Insert/Bulk Load, SQL Server Integration Services (SSIS), OPENROWSET, linked servers | COPY INTO, CONVERT TO DELTA, Auto Loader, DataFrameReader, integrations via Partner Connect, Add data UI
Data types | T-SQL data types | Databricks SQL data types
Workload management | SQL Server Integration Services (SSIS), Resource Governor | Cluster configuration (policies), multi-cluster warehouses, intelligent workload management, intelligent autoscaling, adaptive routing
Security | RBAC, database roles, database object permissions, dynamic data masking (2016 and above), row-level security, column-level security | IAM, RBAC, database object permissions (Unity Catalog), dynamic data masking, row-level security, column-level security
Storage format | SQL Server proprietary | Delta (Parquet files with metadata) and Iceberg formats
Sorting | Unsupported | Z-ordering, liquid clustering
Programming language | SQL only | SQL, Python, R, Scala, Java
Data integration | External ETL tools like SSIS, PowerCenter | DLT, Databricks Workflows, external tools (dbt, Matillion, Prophecy, Informatica, Talend, etc.)
Orchestration | SQL Server Agent, SQL Server Integration Services (SSIS) | Databricks Workflows
Machine learning | Unsupported | Databricks ML (runtime with OSS ML packages, MLflow, Feature Store, AutoML)
Pricing unit | Per-core licensing | Databricks units (DBUs)

The table above compares key features between SQL Server and Databricks. Undertaking a thorough comparison is essential during this stage of the migration process. This systematic process ensures a comprehensive understanding of the required transformation, facilitating a smoother transition by identifying equivalent services, functionalities and potential gaps or challenges.

Typically, by the end of this phase, we have a good handle on the scope and complexity of the migration and can come up with a more accurate migration plan and cost estimate.
Phase 3: Data Warehouse Migration

CONSIDERATIONS FOR WORKSPACE CREATION

The end-to-end planning of a new Databricks environment for a customer is out of scope for this document. Please see the administration introductions (Azure Databricks Administration Guide | Databricks on AWS administration introduction | Databricks administration introduction on GCP) and the well-architected framework (Azure Databricks well-architected framework | Databricks on AWS well-architected data lakehouse | Introduction to the well-architected data lakehouse on Databricks on Google Cloud) for a much more in-depth guide to planning, deploying and configuring a customer Databricks environment. Key considerations for the target workspaces of a data warehouse migration include:

• Separation of environments: different workspaces are required for development, staging, production and other environments.

• Separation of business units: different workspaces can be designed for different departments, such as marketing, finance, risk management, etc. BU-separated workspaces can make chargeback billing models easier instead of relying on tags. Note: if these separate departments must share data and collaborate, features like Delta Sharing and Unity Catalog will help.

• Implementing modern data architectures: different workspaces are required to support modern data architectures, such as a data mesh architecture, to decentralize data ownership across different domains.

CONSIDERATIONS FOR SCHEMA AND DATA MIGRATION

Once the Databricks workspaces have been established, the initial migration phase involves migrating schema and data. This includes metadata, such as table data definition language (DDL) scripts, views and table data.
As organizations navigate the process of migrating data out of SQL Server, it’s crucial to consider several key decisions. These include:

• What is the target design for the tables being migrated?

• Should the destination retain the same hierarchy of catalogs, databases, schemas and tables?

• How can the data be separated into hot (recent) and cold (historical) datasets to optimize the migration?

• Is a cleanup or reorganization of the existing data footprint in SQL Server worthwhile? This step could significantly enhance the efficiency and effectiveness of the migration process, reducing potential issues, simplifying data management and optimizing resource use in the new environment.

There are also a few recommendations that can help to enable a smoother and less risky migration:

• Data modeling: as part of the migration, there might be a need to refactor or reproduce a similar data model in an automated and scalable fashion. Visual data modeling tools like Quest erwin or SqlDBM can be found in Databricks Partner Connect. These tools can help accelerate the development and deployment of the refactored data model in just a few clicks. Such a tool can reverse engineer a SQL Server data model (table structures and views) in a way that can be implemented in Databricks easily.

• When migrating DDLs, verifying the provenance of the data schema (e.g., the source data) is essential. Consider an instance where a SQL Server table uses a data type or expression that is proprietary to SQL Server (e.g., SQL Server-specific CAST and CONVERT styles). Such constructs might not have a precise equivalent in Databricks, so replacement data types or expressions are needed.

• Schema and data migration can proceed after carefully deliberating and finalizing these key decisions. It’s generally advisable to avoid introducing significant changes to the schema structure during migration.
Delta Lake’s schema evolution capability allows for flexible modification of the schema after data is placed in Delta, simplifying the initial migration process. This approach facilitates easier data comparison between SQL Server and Databricks during parallel runs.

Maintaining the existing schema during migration ensures consistency, easing the data verification process and fostering a more reliable and efficient migration.

RECOMMENDED APPROACH

It is important to note that not every data migration will follow the same pattern, as each migration is influenced by unique factors such as data volume, system complexity and organizational requirements. However, Databricks recommends adhering to the following general flow for an effective and efficient data migration process:

1. Migrate enterprise data warehouse (EDW) tables into the Delta Lake medallion data architecture:
   • the raw layer into Bronze
   • the stage/central or historical layer into Silver
   • the final or semantic layer into Gold

2. Migrate or build the data pipelines that populate the Bronze, Silver and Gold layers within the Delta Lake incrementally.

3. Backfill Bronze, Silver and Gold tables as needed.

For more details on modeling approaches and design patterns, refer to Data Warehouse Modeling Techniques.
PHASE 3.1: SCHEMA MIGRATION

Before offloading tables to Databricks, it’s essential to establish the schema of the tables within the Databricks environment. It is beneficial if an organization possesses data definition language (DDL) scripts from its existing system: with minor adjustments, mainly to accommodate the data types used in Databricks, these scripts can be utilized to create corresponding table schemas in Databricks. Reusing existing scripts can streamline the preparation process, thus fostering efficiency and maintaining schema consistency during the migration.

DDL scripts can also be extracted from SQL Server using the SQL Server Management Studio tool or the SQL Server Management Objects (SMO) API, or constructed dynamically using metadata available in system tables.

Once the DDLs are extracted, converting them to comply with Databricks is essential. Below are some example scenarios to be considered (a hedged conversion sketch follows the list):

1. Namespace mapping, i.e., schema/object in SQL Server maps to catalog/schema/object in Databricks Unity Catalog.

2. Databricks supports IDENTITY columns only on BIGINT columns.

3. Table options such as distributions and indexes are not applicable to Delta tables.
4. Caution must be exercised when converting primary and secondary indexes to partitions in Delta tables. Over-partitioning can lead to unnecessary overhead and small-file problems, ultimately compromising performance in the lakehouse architecture. Databricks generally recommends partitioning only tables larger than 1TB, and Z-order indexes and predictive optimization simplify the physical design in Databricks.

5. Additional Delta table properties can be specified via the TBLPROPERTIES clause, e.g., delta.targetFileSize, delta.tuneFileSizesForRewrites, delta.columnMapping.mode and others.
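To make these adjustments concrete, here is a hedged before/after sketch for a simple dimension table; the table and column names are hypothetical, and the Databricks DDL shows typical substitutions (Unity Catalog namespace, IDENTITY on BIGINT, TBLPROPERTIES instead of index and distribution options):

-- Illustrative T-SQL source DDL (shown as comments, hypothetical table)
-- CREATE TABLE dbo.DimCustomer (
--     CustomerKey   INT IDENTITY(1,1) NOT NULL,
--     CustomerName  NVARCHAR(200)     NOT NULL,
--     CreatedDate   DATETIME2         NOT NULL
-- );

-- A possible Databricks SQL equivalent in Unity Catalog (catalog.schema.table)
CREATE TABLE main.dw.dim_customer (
  customer_key  BIGINT GENERATED ALWAYS AS IDENTITY,  -- IDENTITY is supported on BIGINT only
  customer_name STRING NOT NULL,                      -- NVARCHAR maps to STRING
  created_date  TIMESTAMP NOT NULL                    -- DATETIME2 maps to TIMESTAMP
)
TBLPROPERTIES (
  'delta.targetFileSize' = '128mb',                   -- example file-size tuning property
  'delta.columnMapping.mode' = 'name'
);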
PHASE 3.2: DATA MIGRATION

Transferring legacy on-premises data to a cloud storage location for seamless consumption in Databricks can be a demanding task, but there are a few viable options:

1. Databricks Lakeflow Connect offers fully managed connectors for ingesting data from SaaS applications and databases into a Databricks lakehouse. For more information, refer to Ingest data from SQL Server.

2. Databricks Lakehouse Federation allows for federated queries across different data sources, including SQL Server (a hedged federation sketch follows this list).
3. Microsoft Azure Data Factory (ADF): another approach involves using ADF to extract data directly from SQL Server using a JDBC/ODBC connector and then store it in Blob storage.

4. ISV partners such as Qlik Replicate: Qlik can replicate data from SQL Server to Databricks Delta tables for both historical and CDC data.

5. Databricks’ JDBC connector: Databricks provides a JDBC (Java Database Connectivity) connector that facilitates direct reading from SQL Server databases.
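As one illustration of the Lakehouse Federation option, the following is a minimal sketch; the connection name, host, secret scope and database are hypothetical placeholders:

-- Create a Unity Catalog connection to the source SQL Server (placeholder values)
CREATE CONNECTION IF NOT EXISTS sqlserver_legacy TYPE sqlserver
OPTIONS (
  host 'legacy-dw.example.com',
  port '1433',
  user secret('migration-scope', 'sqlserver-user'),
  password secret('migration-scope', 'sqlserver-password')
);

-- Expose the SQL Server database as a foreign catalog for federated queries
CREATE FOREIGN CATALOG IF NOT EXISTS legacy_dw
USING CONNECTION sqlserver_legacy
OPTIONS (database 'EnterpriseDW');

-- Query (or copy) source tables directly from Databricks
CREATE TABLE main.bronze.customer_raw AS
SELECT * FROM legacy_dw.dbo.customer;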
PHASE 3.3: OTHER DATABASE OBJECTS MIGRATION

Other database objects, such as views, stored procedures, macros and functions, can also be easily migrated to Databricks via our automated code conversion processes. Please review this helpful cheat sheet packed with essential tips and tricks to help users get started with SQL programming on Databricks in no time. Some key pointers when converting SQL Server-specific objects:

• Views are typically used for data access control. In the context of the medallion architecture, they can be considered part of the Gold layer. Views are also used as intermediate data structures while transforming data and publishing business KPIs to final users.

• Stored procedures are typically used in data warehouse environments to leverage the ELT pattern. This methodology signifies that most data processing transactions are performed in the warehouse.

• Functions are generally utilized to execute additional transformations over scalar values. Inline table-valued functions, while not extensively used, hold their significance. When called upon, they can be conceptualized as parameterized views, offering a more dynamic approach to data processing. They can adapt to varying input parameters, providing greater flexibility in handling and transforming data.
Stored procedures implementation in Databricks

With the newly released Databricks SQL Scripting support, you can now easily deploy or convert powerful procedural logic within Databricks. Databricks SQL Scripting supports compound statement blocks (BEGIN ... END). Within these blocks, you can declare local variables and user-defined functions, use condition handlers for catching exceptions, and use flow-control statements such as FOR loops over query results, conditional logic such as IF and CASE, and statements to break out of loops such as LEAVE and ITERATE. These features make migrating T-SQL stored procedures to Databricks even easier.
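A minimal sketch of what a converted procedure body might look like in Databricks SQL Scripting is shown below; the table, column names and threshold are hypothetical:

-- Hedged SQL Scripting sketch (hypothetical tables and logic)
BEGIN
  DECLARE stale_count INT DEFAULT 0;

  -- Count rows that have not been refreshed in the last 7 days
  SET stale_count = (
    SELECT COUNT(*)
    FROM main.silver.customer
    WHERE last_updated < current_date() - INTERVAL 7 DAYS
  );

  IF stale_count > 0 THEN
    -- Flag stale rows so a downstream job can reprocess them
    UPDATE main.silver.customer
    SET needs_refresh = true
    WHERE last_updated < current_date() - INTERVAL 7 DAYS;
  END IF;
END;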

Implement Slowly Changing Dimensions

Slowly Changing Dimensions (SCDs) are crucial in data warehousing for managing historical changes to dimensional data over time, including updates, inserts or deletions, to maintain accurate historical data. However, migrating SCDs from SQL Server to Databricks poses challenges, such as ensuring data consistency and accuracy, especially with large data volumes. Additionally, mapping SCD logic from T-SQL to Databricks SQL requires careful consideration and testing to maintain functionality and performance. Overall, effectively managing and migrating SCDs from SQL Server to Databricks demands thorough planning, testing and optimization for a successful transition.

Here are a few resources that can assist in achieving a successful transition (a hedged MERGE-based sketch follows the list):

• How to implement SCDs when you have duplicates - Part 1
• How to implement SCDs when you have duplicates - Part 2: DLT
• APPLY CHANGES API: Simplify change data capture in DLT
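For teams implementing SCD Type 2 directly in SQL rather than via DLT’s APPLY CHANGES, the core idea is a MERGE that closes out changed rows, as in the hedged sketch below; the table and column names are hypothetical:

-- Hypothetical SCD Type 2 step: expire current rows whose attributes changed,
-- and insert rows for customers not yet in the dimension.
MERGE INTO main.gold.dim_customer AS target
USING main.silver.customer_updates AS source
  ON target.customer_id = source.customer_id
 AND target.is_current = true
WHEN MATCHED AND target.customer_name <> source.customer_name THEN
  UPDATE SET target.is_current = false,
             target.end_date   = current_date()
WHEN NOT MATCHED THEN
  INSERT (customer_id, customer_name, start_date, end_date, is_current)
  VALUES (source.customer_id, source.customer_name, current_date(), NULL, true);

-- A second pass (or a union in the MERGE source) is still needed to insert the new
-- versions of the rows expired above; the linked resources cover the full pattern.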
Tools Integration

PHASE 3.4: DATA SECURITY MIGRATION

When discussing security migration, we need to consider both authentication and authorization.

When planning the migration of SQL Server security objects, it is essential to understand the differences between the two platforms and correctly map SQL Server security capabilities to Databricks Data Intelligence Platform security capabilities.
Authentication

MS SQL Server supports Active Directory (AD) authentication and SQL Server-specific login and password authentication.

Azure AD can be connected with on-premises AD, so the users, groups and service principals registered in the Azure Databricks account can be used across Databricks workspaces, Unity Catalog and Databricks SQL.

SQL Server logins are used in SQL basic authentication and are defined using CREATE LOGIN statements. Azure Databricks SQL, on the other hand, supports only Azure AD authentication. This means that SQL logins in SQL Server must be converted to Azure AD principals for migration purposes or excluded from the migration scope entirely.

Note that when discussing MS SQL Server, it is natural to migrate to Azure Databricks. Therefore, authentication in Databricks on AWS is not within the scope of this guide.

Authorization

Though Databricks supports the Hive metastore, this document focuses on Unity Catalog only, as it is the future-proof approach for data governance in the Databricks Data Intelligence Platform.

From a security perspective, Unity Catalog shares many similarities with SQL Server. Both offer authorization models where permissions are assigned to objects, and both use ANSI-compliant SQL statements such as GRANT and REVOKE. However, it is crucial to understand the differences between the two to execute a successful migration.

First, SQL Server offers the concept of database roles: database principals that have members in addition to assigned permissions, so all members get all permissions assigned to their respective roles. At this time, Databricks doesn’t have a concept of database roles. The most suitable migration strategy is to leverage Unity Catalog account groups.
Similarly, Databricks doesn’t have the fixed database roles that are widely used in SQL Server (e.g., db_owner or db_securityadmin). We recommend revising the usage of such roles and moving to a more granular permissions model in Unity Catalog.

Both SQL Server and Databricks offer similar GRANT and REVOKE statement syntax. However, Databricks doesn’t have a DENY statement, so there is no explicit support for the denial of permission. If you use this security technique in your design, we suggest revising and redesigning it toward an approach based on explicit grants and, optionally, permission inheritance.

It is also essential to understand the differences between the permissions and privileges that can be assigned to different database objects. One of the most important differences is that SQL Server offers INSERT, UPDATE and DELETE permissions for tables, while Databricks offers only the MODIFY privilege, which essentially includes all three. A more detailed description of the permissions applicable to specific securable objects is available in the documentation linked below:

• Databricks
• SQL Server

We recommend following these standard practices in Databricks Unity Catalog to enable better permissions manageability and operational excellence:

• Assign permissions at the highest possible level (e.g., schema, catalog) to leverage permission inheritance.

• Assign permissions to groups rather than individual principals; using AAD groups to manage permissions rather than granting permissions to individuals is recommended.

Column-level security, row-level security and dynamic data masking are implemented via Databricks dynamic views, row filters and column masks (a hedged sketch follows).
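The following sketch shows how these Unity Catalog controls look in SQL; the catalog, group, function and table names are hypothetical placeholders:

-- Grant at catalog/schema level so permissions are inherited by contained objects
GRANT USE CATALOG ON CATALOG main TO `analytics-readers`;
GRANT USE SCHEMA, SELECT ON SCHEMA main.gold TO `analytics-readers`;
GRANT MODIFY ON TABLE main.gold.fact_sales TO `etl-engineers`;

-- Row-level security: admins see all rows, everyone else sees only EMEA rows
CREATE OR REPLACE FUNCTION main.gold.emea_filter(region STRING)
RETURNS BOOLEAN
RETURN IF(is_account_group_member('dw-admins'), true, region = 'EMEA');

ALTER TABLE main.gold.fact_sales
  SET ROW FILTER main.gold.emea_filter ON (region);

-- Column masking: hide email addresses from everyone outside a privileged group
CREATE OR REPLACE FUNCTION main.gold.mask_email(email STRING)
RETURNS STRING
RETURN CASE WHEN is_account_group_member('pii-readers') THEN email ELSE '***' END;

ALTER TABLE main.gold.dim_customer
  ALTER COLUMN email SET MASK main.gold.mask_email;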

Phase 4: Code and ETL Pipelines Migration

A comprehensive understanding of the pipelines, extending from data sources to the consumption layer and including governance aspects, is a crucial prerequisite for executing an effective workload migration. It’s not just about moving data; it’s about ensuring a smooth, efficient transition and maintaining data integrity. Data pipeline migrations are multifaceted operations, especially those transitioning from SQL Server to Databricks. They encompass several key components: orchestration, source/sink migration, and query migration and refactoring. These elements contribute to a successful and seamless data migration process.

Depending on the adopted data ingestion and transformation pattern (ETL vs. ELT), more migration effort may be needed for either SQL query migration or ETL pipeline redesign.

ORCHESTRATION MIGRATION

ETL orchestration can refer to orchestrating and scheduling end-to-end pipelines covering data ingestion, data integration and result generation, or to orchestrating DAGs of a specific workload type like data integration. In SQL Server, orchestration is typically done using SQL Server Agent or SSIS, and there are generally these options when migrating these workflows:

1. Use Databricks Workflows to orchestrate the migrated pipelines. Databricks Workflows supports various task types, such as Python scripts, notebooks, dbt transformations and SQL tasks. The customer needs to provide job sequences and schedules as a prerequisite for converting them into Databricks Workflows.
2. DLT provides a standard framework for building batch and streaming use cases. It also includes critical data engineering features such as automatic data testing, deep pipeline monitoring and recovery, as well as out-of-the-box functionality for Slowly Changing Dimension (SCD) Type 1 and Type 2 tables (a minimal DLT sketch in SQL follows this list).

3. It is also possible to use external tools like Apache Airflow. Considering how tightly coupled Databricks Workflows is with the Databricks Data Intelligence Platform, it is recommended that Databricks Workflows be used for better integration, simplicity and lineage.
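The sketch below outlines a DLT pipeline defined in SQL, assuming a hypothetical storage path and table names; a real pipeline would add more expectations, explicit schemas and additional layers:

-- Bronze: incrementally ingest raw CSV files from cloud storage (placeholder path)
CREATE OR REFRESH STREAMING TABLE bronze_orders
AS SELECT *
FROM STREAM read_files(
  'abfss://raw@mystorageaccount.dfs.core.windows.net/orders/',
  format => 'csv',
  header => true
);

-- Silver: cleanse and conform, with a basic data quality expectation
CREATE OR REFRESH STREAMING TABLE silver_orders (
  CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT
  CAST(order_id AS BIGINT)        AS order_id,
  CAST(order_ts AS TIMESTAMP)     AS order_ts,
  CAST(amount   AS DECIMAL(18,2)) AS amount
FROM STREAM(LIVE.bronze_orders);

-- Gold: aggregate for reporting
CREATE OR REFRESH MATERIALIZED VIEW gold_daily_sales
AS SELECT DATE(order_ts) AS order_date, SUM(amount) AS total_sales
FROM LIVE.silver_orders
GROUP BY DATE(order_ts);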
Figure 5: Databricks Workflows

Figure 6: DLT pipelines
QUERY MIGRATION AND REFACTORING

Migrating from T-SQL to Databricks SQL requires identifying and replacing any incompatible or proprietary T-SQL functions and syntax. Databricks has mature code converters and migration tooling to make this process smoother and highly automated.

Figure 7: Databricks migration tooling simplifies legacy EDW migrations


Databricks Code Converter (acquired from BladeBridge) offers automated tooling to modernize and convert Microsoft SQL Server code to Databricks.

• Automated conversion: the converter can automatically convert SQL workloads, significantly speeding up and de-risking migration projects.

• Broad support: it supports a wide range of legacy EDW and ETL platform syntax and can convert legacy code to Databricks.

• Broad adoption by services firms: most systems integrator partners have deep expertise and access to our converters.

• Cost- and time-effective: the converter reduces the cost and time required for a migration project by automating the process.

• Decreases complexity: the tool reduces the complexity of the migration process by providing a systematic approach to conversion.
Since SQL Server and Databricks both support ANSI SQL standards, many T-SQL queries can be automatically converted to Databricks syntax to accelerate the migration. The Databricks Code Converter supports schema conversion (tables and views), SQL queries (SELECT statements, expressions, functions, user-defined functions, etc.), stored procedures and data loading utilities such as COPY INTO. The conversion configuration is externalized, meaning users can extend conversion rules during migration projects to handle new code pattern sets and achieve a higher percentage of automation. A migration proposal with automated converter tooling can be created for your organization via Databricks Professional Services or our certified Migration Brickbuilder SI Partners. Databricks Code Converter tooling requires a Databricks Professional Services or Databricks SI Partner agreement.

Please review this short demonstration of the conversion tool.
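To make the kind of refactoring concrete, here is an illustrative, hand-written (not tool-generated) before/after of a simple query; the table and column names are hypothetical:

-- Original T-SQL (shown as comments)
-- SELECT TOP 10
--        CustomerID,
--        ISNULL(Region, 'Unknown')             AS Region,
--        CONVERT(VARCHAR(10), OrderDate, 120)  AS OrderDay,
--        DATEDIFF(day, OrderDate, GETDATE())   AS AgeDays
-- FROM   dbo.Orders
-- ORDER  BY OrderDate DESC;

-- A possible Databricks SQL equivalent
SELECT CustomerID,
       coalesce(Region, 'Unknown')                        AS Region,
       date_format(OrderDate, 'yyyy-MM-dd')               AS OrderDay,
       datediff(current_date(), CAST(OrderDate AS DATE))  AS AgeDays
FROM   main.sales.orders
ORDER  BY OrderDate DESC
LIMIT  10;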

Phase 5:
BI and Analytics
Tools Integration

Phase 6:
Migration Validation

Need Help
Migrating?

MICROSOFT
SQ L S ERVER TO
D ATA B R I C K S
M I G R AT I O N
GUIDE 26
Code Optimization

Many queries will likely need to be refactored and optimized during the migration process. Techniques like automatic liquid clustering and predictive optimization make performance tuning an almost automated process in Databricks. Predictive optimization uses techniques like the following (a hedged command sketch follows Figure 8):

1. Compaction, which optimizes file sizes.

2. Liquid clustering, which incrementally clusters incoming data, enabling optimal data layout and efficient data skipping.

3. Vacuuming, which reduces costs by deleting unneeded files from storage.

4. Automatic statistics updates, running ANALYZE TABLE ... COMPUTE STATISTICS on the required columns for best performance.

Figure 8: Automatic liquid clustering
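Below is a hedged sketch of the related commands on a hypothetical table; with predictive optimization enabled, Databricks can run these maintenance operations automatically, so the manual statements are shown only for illustration:

-- Enable automatic liquid clustering so Databricks selects clustering keys over time
ALTER TABLE main.gold.fact_sales CLUSTER BY AUTO;

-- Manual equivalents of what predictive optimization automates
OPTIMIZE main.gold.fact_sales;   -- compaction and clustering
VACUUM main.gold.fact_sales;     -- remove unneeded files (default retention)
ANALYZE TABLE main.gold.fact_sales
  COMPUTE STATISTICS FOR COLUMNS order_date, amount;  -- refresh column statistics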

SQL Server to Databricks Cutover Phase

During this phase, data workloads run concurrently in SQL Server and Databricks, presenting an opportunity for a comparative analysis of how workloads behave in Databricks versus SQL Server. This can help identify potential bottlenecks or shortcomings resulting from the code migration and refactoring phase.

To minimize expenses and disruption to the business during this transition, consider the following recommendations:

1. Develop a cutover schedule that prioritizes the most resource-intensive workloads.

2. Establish clear criteria for approving cutover to Databricks.

3. Define appropriate criteria for retiring workloads in SQL Server based on the approvals from step 2.

4. Implement an effective communication strategy and conduct downstream user enablement sessions to minimize business disruption.

5. Implement a round-the-clock hypercare phase to support critical business-related workloads during the transition.

With this phase complete, ETL workloads are fully migrated and operational in Databricks, and the final-layer data in SQL Server is synchronized with the Gold layer data in Databricks.
Phase 5: BI and Analytics Tools Integration

To further consolidate data platform infrastructure and maintain a single source of truth, organizations have adopted Databricks SQL to meet their data warehousing needs and support downstream applications and business intelligence dashboards.

Once ingestion and transformation pipelines are migrated to the Databricks Data Intelligence Platform, it is critical to ensure the business continuity of downstream applications and data consumers. The Databricks Data Intelligence Platform has validated large-scale BI integrations with many popular BI tools, including Tableau, Power BI, Qlik, ThoughtSpot, Sigma, Looker and more. For a given set of dashboards or reports to work, all the upstream tables and views must be migrated, along with the associated pipelines and dependencies, so they conform to the existing data models and semantic layers in those tools.

As described in the blog (see section 3.5, Repointing BI workloads), one common way to repoint BI workloads after data migration is to test sample reports, rename existing data source/table names, and point them to the new ones.

Typically, if the schema of the tables and views hasn’t changed post-migration, repointing is a straightforward exercise of switching databases in the BI dashboard tool. If the schema of the tables has changed, you will need to modify the tables/views in the lakehouse to match the expected schema of the report/dashboard and publish them as a new data source for the reports.
Many customers take this opportunity to optimize their BI models and semantic layers to align with business needs.

Figure 9: Future-state architecture

During report migration, you may encounter a scenario where expanding the permissions of BI tool access to cloud storage buckets becomes necessary to leverage the Databricks Cloud Fetch feature. This feature enables high-bandwidth data exchange and enhances the efficiency of data retrieval. For more details, refer to the blog How We Achieved High-bandwidth Connectivity With BI Tools.
MICROSOFT POWER BI INTEGRATION

Microsoft Power BI, a commonly seen downstream application in various customer environments, typically operates on top of SQL Server’s serving layer.

When migrating to the Databricks Data Intelligence Platform, Power BI datasets must also be migrated. As Microsoft Power BI provides a connector for Azure Databricks, migrating Power BI datasets is a matter of repointing them to Databricks SQL warehouses. This can be performed by changing the data sources in M code (a proprietary functional language primarily used for data transformation, allowing for the import, filtering, merging and shaping of data) using Power BI Desktop, SQL Server Management Studio or Tabular Editor.

When migrating Power BI datasets to Azure Databricks, standard migration techniques apply, including piloting or an MVP, using separate Power BI workspaces for testing, and validating data after switching datasets to Azure Databricks SQL.

For more information on implementing the semantic lakehouse with Azure Databricks and Power BI, please refer to the following blog posts:

• The Semantic Lakehouse With Azure Databricks and Power BI
• Power Up Your BI With Microsoft Power BI and Lakehouse in Azure Databricks: Part 1 — Essentials
• Power Up Your BI With Microsoft Power BI and Lakehouse in Azure Databricks: Part 2 — Tuning Power BI
• Power Up Your BI With Microsoft Power BI and Lakehouse in Azure Databricks: Part 3 — Tuning Azure Databricks SQL
Phase 6: Migration Validation

The primary validation method for a data pipeline is the resulting dataset itself. We recommend establishing an automated testing framework that can be applied to any pipeline. Typically, this involves using a testing framework with a script capable of automatically comparing values on both platforms.
Databricks recommends that you perform the following checks at a minimum (a sample comparison query follows the list):

• Check whether a table exists.

• Check the counts of rows and columns across the tables.

• Calculate various aggregates over columns and compare, for example:
  • SUM, MIN, MAX, AVG of numeric columns
  • MIN, MAX of string and date/time columns
  • COUNT(*), COUNT(NULL), COUNT(DISTINCT) for all columns
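A hedged sketch of the aggregate-comparison idea is shown below, computing the same fingerprint on the Databricks table and on the source table (here via a hypothetical federated catalog; the same query could be run directly in SQL Server) so a test script can diff the two rows; table and column names are placeholders:

-- Aggregate fingerprint of the migrated table in Databricks
SELECT 'databricks'                 AS source,
       COUNT(*)                     AS row_count,
       COUNT(DISTINCT customer_id)  AS distinct_customers,
       SUM(amount)                  AS total_amount,
       MIN(order_ts)                AS min_order_ts,
       MAX(order_ts)                AS max_order_ts
FROM main.gold.fact_sales

UNION ALL

-- Same fingerprint computed against the source system
SELECT 'sql_server',
       COUNT(*),
       COUNT(DISTINCT customer_id),
       SUM(amount),
       MIN(order_ts),
       MAX(order_ts)
FROM legacy_dw.dbo.fact_sales;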

Run the pipelines in parallel for a specific period (we find one week to be an acceptable baseline, but you may wish to extend this to ensure stability) and review the comparison results to ensure the data is ingested and transformed into the proper context.

It is advisable to initiate validation with your most critical tables, which often drive the results or calculations of tables in the Gold layer. This includes control tables, lookup tables and other essential datasets.
A robust data validation requires the following components:

• Snapshot(s): data to work with, including a pre- and post-version for each script (ideally) and job being migrated.

• Table comparison code: a standardized way to compare the resulting tables and determine whether a test is successful. The tables can be compared based on:
  • schema checks
  • row count checks
  • row-by-row checks

For more advanced table data and schema comparison, tools like DataComPy can be used.

Identifying the primary key combination with the customer is essential for count checks and row-by-row comparisons.
Need Help Migrating?

Regardless of size and complexity, the Databricks Professional Services team and an ecosystem of certified migration services partners and ISV partners offer different levels of support (advisory/assurance, staff augmentation, scoped implementation) to accelerate your migration and ensure successful implementation.

When engaging with our experts, you can expect:

Discovery and Profiling: our team starts by clearly understanding migration drivers and identifying challenges within the existing Microsoft SQL Server deployment. We conduct collaborative discussions with key stakeholders, leveraging automated profiling tools to analyze legacy workloads. This is used to determine drivers of business value and the total cost of ownership (TCO) savings achievable with Databricks.

Assessment: using automated tooling, we perform an analysis of existing code complexity and architecture. This assessment helps estimate migration effort and costs, refine migration scope and determine which parts of the legacy environment require modernization or can be retired.

Migration Strategy and Design: our architects will work with your team to finalize the target Databricks architecture, the detailed migration plan and the technical approaches for the migration phases outlined in this guide. We will help select appropriate migration patterns, tools and delivery partners and collaborate with our certified SI partners to develop a comprehensive Statement of Work (SOW).
Execute and Scale: we and our certified partners deliver on the comprehensive migration plan and then work with your team to facilitate knowledge sharing and collaboration and to scale successful practices across the organization. Our experts can help you set up a Databricks Center of Excellence (CoE) to capture and disseminate lessons learned and drive standardization and best practices as you expand to new use cases.

Contact your Databricks representative or use this form for more information. Our specialists can help you every step of the way!
About Databricks
Databricks is the data and AI company. More than 10,000
organizations worldwide — including Block, Comcast, Condé
Nast, Rivian, Shell and over 60% of the Fortune 500 — rely on the
Databricks Data Intelligence Platform to take control of their data and
put it to work with AI. Databricks is headquartered in San Francisco,
with offices around the globe, and was founded by the original
creators of Lakehouse, Apache Spark™, Delta Lake and MLflow.

To learn more, follow Databricks on LinkedIn, X and Facebook.

Start your free trial

© Databricks 2025. All rights reserved. Apache, Apache Spark, Spark and
the Spark logo are trademarks of the Apache Software Foundation. Privacy Notice | Terms of Use
