
Data warehousing documentation in Microsoft Fabric
Learn more about Fabric Data Warehouse

Data warehousing

OVERVIEW

Overview

Get started with Data Warehouse

TUTORIAL

End to End tutorials

Data warehouse tutorial

Create a Warehouse quickstart

Migrate Azure Synapse Analytics dedicated SQL pools to Fabric Warehouse

Get started with SQL analytics endpoint

GET STARTED

What is a Lakehouse?

Better together - the lakehouse and warehouse

Create a lakehouse with OneLake

Default Power BI semantic models

Load data into the Lakehouse

How to copy data using Copy activity in Data pipeline

How to move data into Lakehouse via Copy assistant


Security

HOW-TO GUIDE

Data warehousing security

Connectivity

Microsoft Entra ID authentication

Workspace roles

SQL granular permissions

Row-level security

Column-level security

Share your data

Ingest data

HOW-TO GUIDE

Ingest data guide

Ingest data using pipelines

Ingest data using T-SQL

Ingest data using Copy

CONCEPT

Mirroring

Database Mirroring in Fabric

Mirroring Azure SQL Database

Mirroring Azure SQL Managed Instance

Mirroring Azure Cosmos DB

Mirroring Databricks

Mirroring Snowflake

Open mirroring
Share and secure mirrored databases

Design and Develop

CONCEPT

Semantic models

Model data in the default Power BI semantic model

Define relationships in data models

Reports in the Power BI service

Query

CONCEPT

Query using the SQL Query editor

Query with T-SQL

Query using visual query editor

View data in the Data preview

Manage

HOW-TO GUIDE

Statistics

Query insights

Caching

Troubleshoot

Monitor

HOW-TO GUIDE
Capacity Metrics app

Monitor using DMVs

Warehouse capacity

ARCHITECTURE

Burstable capacity

Pause and resume

Smoothing and throttling

Workload Management

Data recovery

ARCHITECTURE

Restore in-place

Clone tables

Clone tables in the Fabric portal

Best practices

ARCHITECTURE

Warehouse performance

SQL analytics endpoint performance

Security

Ingest Data
What is data warehousing in Microsoft
Fabric?
Article • 08/22/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

Fabric Data Warehouse is a next-generation data warehousing solution within Microsoft Fabric.

The lake-centric warehouse is built on an enterprise-grade distributed processing engine that enables industry-leading performance at scale while minimizing the need for configuration and management. Living in the data lake and designed to natively support open data formats, Fabric Data Warehouse enables seamless collaboration between data engineers and business users without compromising security or governance.

The easy-to-use SaaS experience is also tightly integrated with Power BI for easy analysis and reporting, converging the world of data lakes and warehouses and greatly simplifying an organization's investment in its analytics estate.

Data warehouse customers benefit from:

Data stored in Delta-Parquet format enables ACID transactions and interoperability with other Fabric workloads, so you don't need multiple copies of data.
Cross-database queries can use multiple data sources for fast insights with zero data duplication.
Easily ingest, load, and transform data at scale through pipelines, dataflows, cross-database queries, or the COPY INTO command.
Autonomous workload management with an industry-leading distributed query processing engine means no knobs to turn to achieve best-in-class performance.
Scale near instantaneously to meet business demands, because storage and compute are separated.
Reduced time to insights with an easily consumable, always-connected semantic model that is integrated with Power BI in Direct Lake mode. Reports always have the most recent data for analysis and reporting.
Built for any skill level, from the citizen developer to the DBA or data engineer.

Data warehousing items


Fabric Warehouse is not a traditional enterprise data warehouse; it's a lake warehouse that supports two distinct warehousing items: the Fabric data warehouse and the SQL analytics endpoint. Both are purpose-built to meet customers' business needs while providing best-in-class performance, minimizing costs, and reducing administrative overhead.

Synapse Data Warehouse


In a Microsoft Fabric workspace, a Synapse Data Warehouse or Warehouse is labeled as
'Warehouse' in the Type column. When you need the full power and transactional
capabilities (DDL and DML query support) of a data warehouse, this is the fast and
simple solution for you.

The warehouse can be populated by any of the supported data ingestion methods, such as COPY INTO, pipelines, and dataflows, or by cross-database ingestion options such as CREATE TABLE AS SELECT (CTAS), INSERT...SELECT, or SELECT INTO.
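As a minimal sketch of two of these options (the staging table, target table, storage account, and file path below are hypothetical placeholders, not objects that exist in your workspace):

SQL

--COPY INTO: bulk load Parquet files from an Azure storage account into an existing table.
--The URL and table names are illustrative placeholders.
COPY INTO [dbo].[staging_sales]
FROM 'https://<yourstorageaccount>.blob.core.windows.net/<container>/sales/*.parquet'
WITH (FILE_TYPE = 'PARQUET');

--CTAS: materialize a transformed copy of the staged data as a new table.
CREATE TABLE [dbo].[sales_2024] AS
SELECT *
FROM [dbo].[staging_sales]
WHERE [InvoiceYear] = 2024;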

To get started with the Warehouse, see:

- Create a warehouse in Microsoft Fabric

Performance guidelines

SQL analytics endpoint of the Lakehouse


In a Microsoft Fabric workspace, each Lakehouse has an autogenerated "SQL analytics
endpoint" which can be used to transition from the "Lake" view of the Lakehouse (which
supports data engineering and Apache Spark) to the "SQL" view of the same Lakehouse
to create views, functions, stored procedures, and apply SQL security.

With the SQL analytics endpoint of the Lakehouse, T-SQL commands can define and
query data objects but not manipulate or modify the data. You can perform the
following actions in the SQL analytics endpoint:

Query the tables that reference data in your Delta Lake folders in the lake.
Create views, inline TVFs, and procedures to encapsulate your semantics and
business logic in T-SQL.
Manage permissions on the objects.
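A minimal sketch of those actions, assuming a Delta table named dbo.Trip already exists in the lakehouse and that AnalystUser is a placeholder database principal:

SQL

--Encapsulate business logic over a Delta table in a view.
CREATE VIEW [dbo].[vw_TripSummary]
AS
SELECT [MedallionID], COUNT(*) AS TotalTripCount
FROM [dbo].[Trip]
GROUP BY [MedallionID];
GO

--Manage permissions on the new object (AnalystUser is a placeholder principal).
GRANT SELECT ON OBJECT::[dbo].[vw_TripSummary] TO [AnalystUser];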

To get started with the SQL analytics endpoint, see:

Better together: the lakehouse and warehouse in Microsoft Fabric


SQL analytics endpoint performance considerations
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric

Warehouse or lakehouse
When deciding between a warehouse and a lakehouse, it's important to consider the specific needs and context of your data management and analytics requirements. Equally important, this is not a one-way decision!

You always have the opportunity to add one or the other at a later point should your business needs change, and regardless of where you start, both the warehouse and the lakehouse use the same powerful SQL engine for all T-SQL queries.

Here are some general guidelines to help you make the decision:

Choose a data warehouse when you need an enterprise-scale solution with open
standard format, no knobs performance, and minimal setup. Best suited for semi-
structured and structured data formats, the data warehouse is suitable for both
beginner and experienced data professionals, offering simple and intuitive
experiences.

Choose a lakehouse when you need a large repository of highly unstructured data from heterogeneous sources, want to leverage low-cost object storage, and want to use Spark as your primary development tool. Acting as a 'lightweight' data warehouse, the lakehouse also gives you the option to use the SQL analytics endpoint and T-SQL tools to deliver reporting and data intelligence scenarios in your lakehouse.

For more detailed decision guidance, see Microsoft Fabric decision guide: Choose
between Warehouse and Lakehouse.

Related content
Better together: the lakehouse and warehouse
Create a warehouse in Microsoft Fabric
Create a lakehouse in Microsoft Fabric
Introduction to Power BI datamarts
Create reports on data warehousing in Microsoft Fabric
Source control with Warehouse (preview)



Microsoft Fabric decision guide: choose
a data store
Article • 11/19/2024

Use this reference guide and the example scenarios to help you choose a data store for
your Microsoft Fabric workloads.

Data store properties


Use this information to compare Fabric data stores such as warehouse, lakehouse,
Eventhouse, SQL database, and Power BI datamart, based on data volume, type,
developer persona, skill set, operations, and other capabilities. These comparisons are
organized into the following two tables:

| | Lakehouse | Warehouse | Eventhouse |
| --- | --- | --- | --- |
| Data volume | Unlimited | Unlimited | Unlimited |
| Type of data | Unstructured, semi-structured, structured | Structured | Unstructured, semi-structured, structured |
| Primary developer persona | Data engineer, data scientist | Data warehouse developer, data architect, database developer | App developer, data scientist, data engineer |
| Primary dev skill | Spark (Scala, PySpark, Spark SQL, R) | SQL | No code, KQL, SQL |
| Data organized by | Folders and files, databases, and tables | Databases, schemas, and tables | Databases, schemas, and tables |
| Read operations | Spark, T-SQL | T-SQL, Spark* | KQL, T-SQL, Spark |
| Write operations | Spark (Scala, PySpark, Spark SQL, R) | T-SQL | KQL, Spark, connector ecosystem |
| Multi-table transactions | No | Yes | Yes, for multi-table ingestion |
| Primary development interface | Spark notebooks, Spark job definitions | SQL scripts | KQL Queryset, KQL Database |
| Security | RLS, CLS**, table level (T-SQL), none for Spark | Object level, RLS, CLS, DDL/DML, dynamic data masking | RLS |
| Access data via shortcuts | Yes | Yes | Yes |
| Can be a source for shortcuts | Yes (files and tables) | Yes (tables) | Yes |
| Query across items | Yes | Yes | Yes |
| Advanced analytics | Interface for large-scale data processing, built-in data parallelism, and fault tolerance | Interface for large-scale data processing, built-in data parallelism, and fault tolerance | Time Series native elements, full geo-spatial and query capabilities |
| Advanced formatting support | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format | Full indexing for free text and semi-structured data like JSON |
| Ingestion latency | Available instantly for querying | Available instantly for querying | Queued ingestion; streaming ingestion has a couple of seconds latency |
* Spark supports reading from tables using shortcuts; it doesn't yet support accessing views, stored procedures, functions, etc.

| | Fabric SQL database | Power BI Datamart |
| --- | --- | --- |
| Data volume | 4 TB | Up to 100 GB |
| Type of data | Structured, semi-structured, unstructured | Structured |
| Primary developer persona | AI developer, app developer, database developer, DB admin | Data scientist, data analyst |
| Primary dev skill | SQL | No code, SQL |
| Data organized by | Databases, schemas, tables | Database, tables, queries |
| Read operations | T-SQL | Spark, T-SQL |
| Write operations | T-SQL | Dataflows, T-SQL |
| Multi-table transactions | Yes, full ACID compliance | No |
| Primary development interface | SQL scripts | Power BI |
| Security | Object level, RLS, CLS, DDL/DML, dynamic data masking | Built-in RLS editor |
| Access data via shortcuts | Yes | No |
| Can be a source for shortcuts | Yes (tables) | No |
| Query across items | Yes | No |
| Advanced analytics | T-SQL analytical capabilities, data replicated to delta parquet in OneLake for analytics | Interface for data processing with automated performance tuning |
| Advanced formatting support | Table support for OLTP, JSON, vector, graph, XML, spatial, key-value | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format |
| Ingestion latency | Available instantly for querying | Available instantly for querying |

** Column-level security available on the Lakehouse through a SQL analytics endpoint, using T-SQL.

Scenarios
Review these scenarios for help with choosing a data store in Fabric.

Scenario 1
Susan, a professional developer, is new to Microsoft Fabric. They're ready to get started cleaning, modeling, and analyzing data but need to decide whether to build a data warehouse or a lakehouse. After review of the details in the previous table, the primary decision points are the available skill set and the need for multi-table transactions.

Susan has spent many years building data warehouses on relational database engines,
and is familiar with SQL syntax and functionality. Thinking about the larger team, the
primary consumers of this data are also skilled with SQL and SQL analytical tools. Susan
decides to use a Fabric warehouse, which allows the team to interact primarily with T-
SQL, while also allowing any Spark users in the organization to access the data.

Susan creates a new lakehouse and accesses the data warehouse capabilities with the lakehouse SQL analytics endpoint. Using the Fabric portal, Susan creates shortcuts to the external data tables and places them in the /Tables folder. Susan can now write T-SQL queries that reference shortcuts to query Delta Lake data in the lakehouse. The shortcuts automatically appear as tables in the SQL analytics endpoint and can be queried with T-SQL using three-part names.

Scenario 2
Rob, a data engineer, needs to store and model several terabytes of data in Fabric. The
team has a mix of PySpark and T-SQL skills. Most of the team running T-SQL queries are
consumers, and therefore don't need to write INSERT, UPDATE, or DELETE statements.
The remaining developers are comfortable working in notebooks, and because the data
is stored in Delta, they're able to interact with a similar SQL syntax.

Rob decides to use a lakehouse, which allows the data engineering team to use their
diverse skills against the data, while allowing the team members who are highly skilled
in T-SQL to consume the data.

Scenario 3
Ash, a citizen developer, is a Power BI developer. They're familiar with Excel, Power BI,
and Office. They need to build a data product for a business unit. They know they don't
quite have the skills to build a data warehouse or a lakehouse, and those seem like too
much for their needs and data volumes. They review the details in the previous table and see that the primary decision points are their own skills and their need for a self-service, no-code capability, and a data volume under 100 GB.

Ash works with business analysts familiar with Power BI and Microsoft Office, and knows
that they already have a Premium capacity subscription. As they think about their larger
team, they realize the primary consumers of this data are analysts, familiar with no-code
and SQL analytical tools. Ash decides to use a Power BI datamart, which allows the team to build the capability quickly, using a no-code experience. Queries can be executed via Power BI and T-SQL, while also allowing any Spark users in the organization to access the data as well.

Scenario 4
Daisy is a business analyst experienced in using Power BI to analyze supply chain bottlenecks for a large global retail chain. They need to build a scalable data solution
that can handle billions of rows of data and can be used to build dashboards and
reports that can be used to make business decisions. The data comes from plants,
suppliers, shippers, and other sources in various structured, semi-structured, and
unstructured formats.

Daisy decides to use an Eventhouse because of its scalability, quick response times,
advanced analytics capabilities including time series analysis, geospatial functions, and
fast direct query mode in Power BI. Queries can be executed using Power BI and KQL to
compare between current and previous periods, quickly identify emerging problems, or
provide geo-spatial analytics of land and maritime routes.

Scenario 5
Kirby is an application architect experienced in developing .NET applications for
operational data. They need a high concurrency database with full ACID transaction
compliance and strongly enforced foreign keys for relational integrity. Kirby wants the
benefit of automatic performance tuning to simplify day-to-day database management.

Kirby decides on a SQL database in Fabric, with the same SQL Database Engine as Azure
SQL Database. SQL databases in Fabric automatically scale to meet demand throughout
the business day. They have the full capability of transactional tables and the flexibility
of transaction isolation levels from serializable to read committed snapshot. SQL
database in Fabric automatically creates and drops nonclustered indexes based on
strong signals from execution plans observed over time.

In Kirby's scenario, data from the operational application must be joined with other data
in Fabric: in Spark, in a warehouse, and from real-time events in an Eventhouse. Every
Fabric database includes a SQL analytics endpoint, so data can be accessed in real time from Spark or with Power BI queries using Direct Lake mode. These reporting solutions
spare the primary operational database from the overhead of analytical workloads, and
avoid denormalization. Kirby also has existing operational data in other SQL databases,
and needs to import that data without transformation. To import existing operational
data without any data type conversion, Kirby designs data pipelines with Fabric Data
Factory to import data into the Fabric SQL database.

Related content
Create a lakehouse in Microsoft Fabric
Create a warehouse in Microsoft Fabric
Create an eventhouse
Create a SQL database in the Fabric portal
Power BI datamart



Microsoft Fabric decision guide: Choose
between Warehouse and Lakehouse
Article • 11/19/2024

Microsoft Fabric offers two enterprise-scale, open standard format workloads for data
storage: Warehouse and Lakehouse. This article compares the two platforms and the
decision points for each.

Criterion

No Code or Pro Code solutions: How do you want to develop?
Spark: Use Lakehouse
T-SQL: Use Warehouse

Warehousing needs: Do you need multi-table transactions?
Yes: Use Warehouse
No: Use Lakehouse

Data complexity: What type of data are you analyzing?
Don't know: Use Lakehouse
Unstructured and structured data: Use Lakehouse
Structured data only: Use Warehouse
Choose a candidate service
Perform a detailed evaluation of the service to confirm that it meets your needs.

The Warehouse item in Fabric Data Warehouse is an enterprise-scale data warehouse with open standard format.

No knobs performance with minimal set-up and deployment, no configuration of compute or storage needed.
Simple and intuitive warehouse experiences for both beginner and experienced data professionals (no/pro code).
Lake-centric warehouse stores data in OneLake in open Delta format with easy data recovery and management.
Fully integrated with all Fabric workloads.
Data loading and transforms at scale, with full multi-table transactional guarantees provided by the SQL engine.
Virtual warehouses with cross-database querying and a fully integrated semantic layer.
Enterprise-ready platform with end-to-end performance and usage visibility, with built-in governance and security.
Flexibility to build a data warehouse or data mesh based on organizational needs and your choice of no-code, low-code, or T-SQL for transformations.

The Lakehouse item in Fabric Data Engineering is a data architecture platform for storing, managing, and analyzing structured and unstructured data in a single location.

Store, manage, and analyze structured and unstructured data in a single location to gain insights and make decisions faster and more efficiently.
Flexible and scalable solution that allows organizations to handle large volumes of data of all types and sizes.
Easily ingest data from many different sources, which are converted into a unified Delta format.
Automatic table discovery and registration for a fully managed file-to-table experience for data engineers and data scientists.
Automatic SQL analytics endpoint and default dataset that allows T-SQL querying of Delta tables in the lake.

Both are included in Power BI Premium or Fabric capacities.

Compare different warehousing capabilities


This table compares the Warehouse to the SQL analytics endpoint of the Lakehouse.
Primary capabilities
Warehouse: ACID compliant, full data warehousing with transactions support in T-SQL.
SQL analytics endpoint: Read only, system-generated SQL analytics endpoint for the Lakehouse for T-SQL querying and serving. Supports analytics on the Lakehouse Delta tables and the Delta Lake folders referenced via shortcuts.

Developer profile
Warehouse: SQL developers or citizen developers.
SQL analytics endpoint: Data engineers or SQL developers.

Data loading
Warehouse: SQL, pipelines, dataflows.
SQL analytics endpoint: Spark, pipelines, dataflows, shortcuts.

Delta table support
Warehouse: Reads and writes Delta tables.
SQL analytics endpoint: Reads Delta tables.

Storage layer
Warehouse: Open data format - Delta.
SQL analytics endpoint: Open data format - Delta.

Recommended use case
Warehouse: Data warehousing for enterprise use; structured data analysis in T-SQL with tables, views, procedures, and functions, and advanced SQL support for BI.
SQL analytics endpoint: Data warehousing supporting departmental, business unit, or self-service use; exploring and querying Delta tables from the lakehouse; staging data and archival zone for analysis; medallion lakehouse architecture with zones for bronze, silver, and gold analysis; pairing with Warehouse for enterprise analytics use cases.

Development experience
Warehouse: Warehouse editor with full support for T-SQL data ingestion, modeling, development, and querying; UI experiences for data ingestion, modeling, and querying; read/write support for 1st and 3rd party tooling.
SQL analytics endpoint: Lakehouse SQL analytics endpoint with limited T-SQL support for views, table-valued functions, and SQL queries; UI experiences for modeling and querying; limited T-SQL support for 1st and 3rd party tooling.

T-SQL capabilities
Warehouse: Full DQL, DML, and DDL T-SQL support; full transaction support.
SQL analytics endpoint: Full DQL, no DML, limited DDL T-SQL support such as SQL views and TVFs.

Related content
Microsoft Fabric decision guide: choose a data store



Create a Warehouse in Microsoft Fabric
Article • 09/24/2024

Applies to: ✅ Warehouse in Microsoft Fabric

This article describes how to get started with Warehouse in Microsoft Fabric using the Microsoft Fabric portal, including creating and consuming the warehouse. You learn how to create your warehouse from scratch or from sample data, along with other helpful information to get you acquainted and proficient with the warehouse capabilities offered through the Microsoft Fabric portal.

Tip

You can proceed with either a new blank Warehouse or a new Warehouse with
sample data to continue this series of Get Started steps.

How to create a blank warehouse


In this section, we walk you through the three distinct ways available for creating a Warehouse from scratch in the Microsoft Fabric portal: using the Home hub, the Create hub, or the workspace list view.

Create a warehouse using the Home hub

The first hub in the navigation pane is the Home hub. You can start creating your
warehouse from the Home hub by selecting the Warehouse card under the New
section. An empty warehouse is created for you to start creating objects in the
warehouse. You can use either sample data to get a jump start or load your own test
data if you prefer.
Create a warehouse using the Create hub
Another option available to create your warehouse is through the Create hub, which is
the second hub in the navigation pane.

You can create your warehouse from the Create hub by selecting the Warehouse card
under the Data Warehousing section. When you select the card, an empty warehouse is
created for you to start creating objects in the warehouse or use a sample to get started
as previously mentioned.

Create a warehouse from the workspace list view


To create a warehouse, navigate to your workspace, select + New and then select
Warehouse to create a warehouse.

Ready for data

Once initialized, you can load data into your warehouse. For more information about
getting data into a warehouse, see Ingesting data.

How to create a warehouse with sample data


In this section, we walk you through creating a sample Warehouse from scratch.

1. The first hub in the navigation pane is the Home hub. You can start creating your
warehouse sample from the Home hub by selecting the Warehouse sample card
under the New section.
2. Provide the name for your sample warehouse and select Create.

3. The create action creates a new Warehouse and starts loading sample data into it. The data loading takes a few minutes to complete.

4. On completion of loading sample data, the warehouse opens with data loaded into
tables and views to query.

If you have an existing warehouse that's empty, the following steps show how to load sample data.

1. Once you have created your warehouse, you can load sample data into the warehouse from the Use sample database card on the home page of the warehouse.

2. The data loading takes a few minutes to complete.

3. On completion of loading sample data, the warehouse displays data loaded into
tables and views to query.

4. The following sample T-SQL scripts can be used on the sample data in your new
warehouse.

Note

Much of the functionality described in this section is also available to users via a TDS endpoint connection and tools such as SQL Server Management Studio (SSMS) or Azure Data Studio (for users who prefer to use T-SQL for the majority of their data processing needs). For more information, see Connectivity or Query a warehouse.

SQL

/*************************************************
Get number of trips performed by each medallion
**************************************************/
SELECT
    M.MedallionID,
    M.MedallionCode,
    COUNT(T.TripDistanceMiles) AS TotalTripCount
FROM dbo.Trip AS T
JOIN dbo.Medallion AS M
    ON T.MedallionID = M.MedallionID
GROUP BY
    M.MedallionID,
    M.MedallionCode;

/****************************************************
How many passengers are being picked up on each trip?
*****************************************************/
SELECT
    PassengerCount,
    COUNT(*) AS CountOfTrips
FROM dbo.Trip
WHERE PassengerCount > 0
GROUP BY PassengerCount
ORDER BY PassengerCount;

/*********************************************************************************
What is the distribution of trips by hour on working days (non-holiday weekdays)?
**********************************************************************************/
SELECT
    ti.HourlyBucket,
    COUNT(*) AS CountOfTrips
FROM dbo.Trip AS tr
INNER JOIN dbo.Date AS d
    ON tr.DateID = d.DateID
INNER JOIN dbo.Time AS ti
    ON tr.PickupTimeID = ti.TimeID
WHERE d.IsWeekday = 1
    AND d.IsHolidayUSA = 0
GROUP BY ti.HourlyBucket
ORDER BY ti.HourlyBucket;

Tip

You can proceed with either a blank Warehouse or a sample Warehouse to continue this series of Get Started steps.

Next step
Create tables in the Warehouse in Microsoft Fabric



Create tables in the Warehouse in
Microsoft Fabric
Article • 09/24/2024

Applies to: ✅ Warehouse in Microsoft Fabric

To get started, you must complete the following prerequisites:

Have access to a Warehouse within a Premium capacity workspace with contributor or higher permissions.
Choose your query tool. This tutorial features the SQL query editor in the Microsoft Fabric portal, but you can use any T-SQL querying tool.
Use the SQL query editor in the Microsoft Fabric portal.

For more information on connecting to your Warehouse in Microsoft Fabric, see Connectivity.

Create a new table in the SQL query editor with templates


1. In the warehouse editor ribbon, locate the SQL templates button.

2. Select Table, and an autogenerated CREATE TABLE script template appears in your
new SQL query window, as shown in the following image.

3. Modify the CREATE TABLE template to suit your new table.

4. Select Run to create the table.
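The generated template is a plain CREATE TABLE statement that you edit and run; a minimal sketch of what an edited template might look like (the table and column names here are hypothetical):

SQL

--Hypothetical example of an edited CREATE TABLE template.
CREATE TABLE [dbo].[product]
(
    [ProductKey] [int] NOT NULL,
    [ProductName] [varchar](256) NULL,
    [ListPrice] [decimal](18, 2) NULL
);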

To learn more about supported table creation in Warehouse in Microsoft Fabric, see
Tables in data warehousing in Microsoft Fabric and Data types in Microsoft Fabric.

Next step
Ingest data into your Warehouse using data pipelines



Ingest data into your Warehouse using
data pipelines
Article • 04/24/2024

Applies to: Warehouse in Microsoft Fabric

Data pipelines offer an alternative to using the COPY command through a graphical user
interface. A data pipeline is a logical grouping of activities that together perform a data
ingestion task. Pipelines allow you to manage extract, transform, and load (ETL) activities
instead of managing each one individually.

In this tutorial, you'll create a new pipeline that loads sample data into a Warehouse in
Microsoft Fabric.

Note

Some features from Azure Data Factory are not available in Microsoft Fabric, but
the concepts are interchangeable. You can learn more about Azure Data Factory
and Pipelines on Pipelines and activities in Azure Data Factory and Azure Synapse
Analytics. For a quickstart, visit Quickstart: Create your first pipeline to copy data.

Create a data pipeline


1. To create a new pipeline, navigate to your workspace, select the + New button, and select Data pipeline.
2. In the New pipeline dialog, provide a name for your new pipeline and select
Create.

3. You'll land in the pipeline canvas area, where you see three options to get started:
Add a pipeline activity, Copy data, and Choose a task to start.

Each of these options offers different alternatives to create a pipeline:

Add pipeline activity: this option launches the pipeline editor, where you can
create new pipelines from scratch by using pipeline activities.
Copy data: this option launches a step-by-step assistant that helps you select
a data source, a destination, and configure data load options such as the
column mappings. On completion, it creates a new pipeline activity with a
Copy Data task already configured for you.
Choose a task to start: this option launches a set of predefined templates to
help get you started with pipelines based on different scenarios.

Pick the Copy data option to launch the Copy assistant.

4. The first page of the Copy data assistant helps you pick your own data from
various data sources, or select from one of the provided samples to get started.
For this tutorial, we'll use the COVID-19 Data Lake sample. Select this option and
select Next.

5. In the next page, you can select a dataset, the source file format, and preview the
selected dataset. Select Bing COVID-19, the CSV format, and select Next.

6. The next page, Data destinations, allows you to configure the type of the
destination workspace. We'll load data into a warehouse in our workspace, so
select the Warehouse tab, and the Data Warehouse option. Select Next.


7. Now it's time to pick the warehouse to load data into. Select your desired
warehouse in the dropdown list and select Next.

8. The last step to configure the destination is to provide a name to the destination
table and configure the column mappings. Here you can choose to load the data
to a new table or to an existing one, provide a schema and table names, change
column names, remove columns, or change their mappings. You can accept the
defaults, or adjust the settings to your preference.

When you're done reviewing the options, select Next.

9. The next page gives you the option to use staging, or provide advanced options
for the data copy operation (which uses the T-SQL COPY command). Review the
options without changing them and select Next.

10. The last page in the assistant offers a summary of the copy activity. Select the
option Start data transfer immediately and select Save + Run.
11. You are directed to the pipeline canvas area, where a new Copy Data activity is
already configured for you. The pipeline starts to run automatically. You can
monitor the status of your pipeline in the Output pane:

12. After a few seconds, your pipeline finishes successfully. Navigating back to your
warehouse, you can select your table to preview the data and confirm that the
copy operation concluded.
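Besides previewing the table in the portal, you can confirm the load with a quick T-SQL check in the SQL query editor; a minimal sketch, assuming the destination table was created as dbo.bing_covid_19 (use whatever schema and table name you chose in step 8):

SQL

--Count the rows loaded by the pipeline; the table name is a placeholder for your chosen destination.
SELECT COUNT(*) AS LoadedRows
FROM [dbo].[bing_covid_19];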

For more on data ingestion into your Warehouse in Microsoft Fabric, visit:

Ingesting data into the Warehouse


Ingest data into your Warehouse using the COPY statement
Ingest data into your Warehouse using Transact-SQL

Next step
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric



Query the SQL analytics endpoint or
Warehouse in Microsoft Fabric
Article • 08/01/2024

Applies to: SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric

To get started with this tutorial, check the following prerequisites:

You should have access to a SQL analytics endpoint or Warehouse within a Premium capacity workspace with contributor or higher permissions.

Choose your querying tool.

Use the SQL query editor in the Microsoft Fabric portal.
Use the Visual query editor in the Microsoft Fabric portal.
Alternatively, you can use any of these tools to connect to your SQL analytics endpoint or Warehouse via a T-SQL connection string. For more information, see Connectivity.
Download SQL Server Management Studio (SSMS).
Download Azure Data Studio.

Note

Review the T-SQL surface area for SQL analytics endpoint or Warehouse in
Microsoft Fabric.

Run a new query in SQL query editor


1. Open a New SQL query window.

2. A new tab appears for you to write a SQL query.


3. Write a SQL query and run it.

Run a new query in Visual query editor


1. Open a New visual query window.


2. A new tab appears for you to create a visual query.

3. Drag and drop tables from the object Explorer to Visual query editor window to
create a query.

Write a cross-database query


You can write cross-database queries to warehouses and databases in the current active workspace in Microsoft Fabric.
There are several ways you can write cross-database or cross-warehouse queries within the same Microsoft Fabric workspace; in this section, we explore examples. You can join tables or views to run cross-warehouse queries within the current active workspace.

1. Add a SQL analytics endpoint or Warehouse from your current active workspace to the object Explorer using the + Warehouses action. When you select a SQL analytics endpoint or Warehouse from the dialog, it gets added into the object Explorer for referencing when writing a SQL query or creating a visual query.

2. You can reference the table from added databases using three-part naming. In the
following example, use the three-part name to refer to ContosoSalesTable in the
added database ContosoLakehouse .

SQL

SELECT *
FROM ContosoLakehouse.dbo.ContosoSalesTable AS Contoso
INNER JOIN Affiliation
ON Affiliation.AffiliationId = Contoso.RecordTypeID;

3. Using three-part naming to reference the databases/tables, you can join multiple
databases.

SQL

SELECT *
FROM ContosoLakehouse.dbo.ContosoSalesTable AS Contoso
INNER JOIN My_lakehouse.dbo.Affiliation
ON My_lakehouse.dbo.Affiliation.AffiliationId = Contoso.RecordTypeID;

4. For more efficient and longer queries, you can use aliases.

SQL

SELECT *
FROM ContosoLakehouse.dbo.ContosoSalesTable AS Contoso
INNER JOIN My_lakehouse.dbo.Affiliation as MyAffiliation
ON MyAffiliation.AffiliationId = Contoso.RecordTypeID;

5. Using three-part naming to reference the database and tables, you can insert data
from one database to another.

SQL

INSERT INTO ContosoWarehouse.dbo.Affiliation
SELECT *
FROM My_Lakehouse.dbo.Affiliation;

6. You can drag and drop tables from added databases to Visual query editor to
create a cross-database query.

Select Top 100 Rows from the Explorer


1. After opening your warehouse from the workspace, expand your database, schema
and tables folder in the object Explorer to see all tables listed.
2. Right-click on the table that you would like to query and select Select TOP 100
rows.

3. Once the script is automatically generated, select the Run button to run the script
and see the results.
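The autogenerated script is a standard SELECT TOP (100) statement, similar to this sketch (the table name depends on the table you right-clicked; dimension_customer is just an example):

SQL

--Example of the kind of script the editor generates; the table name is illustrative.
SELECT TOP (100) *
FROM [dbo].[dimension_customer];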

Note

At this time, there's limited T-SQL functionality. See T-SQL surface area for a list of
T-SQL commands that are currently not available.

Next step
Create reports on data warehousing in Microsoft Fabric


Create reports on data warehousing in
Microsoft Fabric
Article • 07/26/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

Microsoft Fabric lets you create reusable and default Power BI semantic models to
create reports in various ways in Power BI. This article describes the various ways you
can use your Warehouse or SQL analytics endpoint, and their default Power BI semantic
models, to create reports.

For example, you can establish a live connection to a shared semantic model in the
Power BI service and create many different reports from the same semantic model. You
can create a data model in Power BI Desktop and publish to the Power BI service. Then,
you and others can create multiple reports in separate .pbix files from that common
data model and save them to different workspaces.

Advanced users can build reports from a warehouse using a composite model or using
the SQL connection string.

Reports that use the Warehouse or SQL analytics endpoint can be created in either of
the following two tools:

Power BI service
Power BI Desktop

Note

Microsoft has renamed the Power BI dataset content type to semantic model. This
applies to Microsoft Fabric as well. For more information, see New name for Power
BI datasets.

Create reports using the Power BI service


Within Data Warehouse, using the ribbon and the main home tab, navigate to the New report button. This option provides a native, quick way to create a report built on top of the default Power BI semantic model.

If no tables have been added to the default Power BI semantic model, the dialog automatically adds tables first, prompting the user to confirm or manually select the tables included in the canonical default semantic model, ensuring there's always data.

With a default semantic model that has tables, the New report opens a browser tab to
the report editing canvas to a new report that is built on the semantic model. When you
save your new report you're prompted to choose a workspace, provided you have write
permissions for that workspace. If you don't have write permissions, or if you're a free
user and the semantic model resides in a Premium capacity workspace, the new report is
saved in your My workspace.

For more information on how to create reports using the Power BI service, see Create
reports in the Power BI service.

Create reports using Power BI Desktop


You can build reports from semantic models with Power BI Desktop using a Live
connection to the default semantic model. For information on how to make the
connection, see connect to semantic models from Power BI Desktop.

For a tutorial with Power BI Desktop, see Get started with Power BI Desktop. For
advanced situations where you want to add more data or change the storage mode, see
use composite models in Power BI Desktop.

If you're browsing for a specific SQL analytics endpoint or Warehouse in OneLake, you
can use the integrated OneLake data hub in Power BI Desktop to make a connection
and build reports:

1. Open Power BI Desktop and select Warehouse under the OneLake data hub
dropdown list in the ribbon.
2. Select the desired warehouse.

If you would like to create a live connection to the automatically defined data
model, select Connect.
If you would like to connect directly to the data source and define your own
data model, select the dropdown list arrow for the Connect button and select
Connect to SQL endpoint.

3. For authentication, select Organizational account.


4. Authenticate using Microsoft Entra ID (formerly Azure Active Directory) multifactor
authentication (MFA).
5. If you selected Connect to SQL endpoint, select the data items you want to
include or not include in your semantic model.

Alternatively, if you have the SQL connection string of your SQL analytics endpoint or
Warehouse and would like more advanced options, such as writing a SQL statement to
filter out specific data, connect to a warehouse in Power BI Desktop:

1. In the Fabric portal, right-click on the Warehouse or SQL analytics endpoint in your
workspace and select Copy SQL connection string. Or, navigate to the Warehouse
Settings in your workspace. Copy the SQL connection string.
2. Open Power BI Desktop and select SQL Server in the ribbon.
3. Paste the SQL connection string under Server.
4. In the Navigator dialog, select the databases and tables you would like to load.
5. If prompted for authentication, select Organizational account.
6. Authenticate using Microsoft Entra ID (formerly Azure Active Directory) multifactor
authentication (MFA). For more information, see Microsoft Entra authentication as
an alternative to SQL authentication in Microsoft Fabric.
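If you choose to supply a SQL statement in the SQL Server connector's advanced options, it can be any read-only T-SQL query against the warehouse; a minimal sketch with illustrative table and column names:

SQL

--Illustrative filtering query you might paste into the connector's SQL statement box.
SELECT [SaleKey], [CityKey], [TotalIncludingTax]
FROM [dbo].[fact_sale]
WHERE [Year] = 2024;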

Related content
Model data in the default Power BI semantic model in Microsoft Fabric
Create reports in the Power BI service in Microsoft Fabric and Power BI Desktop



Data warehouse tutorial introduction
Article • 08/05/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

Microsoft Fabric provides a one-stop shop for all the analytical needs for every
enterprise. It covers the complete spectrum of services including data movement, data
lake, data engineering, data integration and data science, real time analytics, and
business intelligence. With Microsoft Fabric, there's no need to stitch together different
services from multiple vendors. Instead, the customer enjoys an end-to-end, highly
integrated, single comprehensive product that is easy to understand, onboard, create
and operate. No other product on the market offers the breadth, depth, and level of
integration that Microsoft Fabric offers. Additionally, Microsoft Purview is included by
default in every tenant to meet compliance and governance needs.

Purpose of this tutorial


While many concepts in Microsoft Fabric might be familiar to data and analytics
professionals, it can be challenging to apply those concepts in a new environment. This
tutorial has been designed to walk step-by-step through an end-to-end scenario from
data acquisition to data consumption to build a basic understanding of the Microsoft
Fabric user experience, the various experiences and their integration points, and the
Microsoft Fabric professional and citizen developer experiences.

The tutorials aren't intended to be a reference architecture, an exhaustive list of features and functionality, or a recommendation of specific best practices.

Data warehouse end-to-end scenario


As prerequisites to this tutorial, complete the following steps:

1. Sign into your Power BI online account, or if you don't have an account yet, sign up
for a free trial.
2. Enable Microsoft Fabric in your tenant.

In this tutorial, you take on the role of a Warehouse developer at the fictional Wide
World Importers company and complete the following steps in the Microsoft Fabric
portal to build and implement an end-to-end data warehouse solution:

1. Create a Microsoft Fabric workspace.


2. Create a Warehouse.
3. Ingest data from source to the data warehouse dimensional model with a data
pipeline.
4. Create tables in your Warehouse.
5. Load data with T-SQL with the SQL query editor.
6. Clone a table using T-SQL with the SQL query editor.
7. Transform the data to create aggregated datasets using T-SQL.
8. Time travel using T-SQL to see data as it appeared.
9. Use the visual query editor to query the data warehouse.
10. Analyze data with a notebook.
11. Create and execute cross-warehouse queries with SQL query editor.
12. Create Power BI reports using DirectLake mode to analyze the data in place.
13. Build a report from the OneLake Data Hub.
14. Clean up resources by deleting the workspace and other items.

Data warehouse end-to-end architecture

Data sources - Microsoft Fabric makes it easy and quick to connect to Azure Data
Services, other cloud platforms, and on-premises data sources to ingest data from.

Ingestion - With 200+ native connectors as part of the Microsoft Fabric pipeline and
with drag and drop data transformation with dataflow, you can quickly build insights for
your organization. Shortcut is a new feature in Microsoft Fabric that provides a way to
connect to existing data without having to copy or move it. You can find more details
about the Shortcut feature later in this tutorial.
Transform and store - Microsoft Fabric standardizes on Delta Lake format, which means
all the engines of Microsoft Fabric can read and work on the same data stored in
OneLake - no need for data duplicity. This storage allows you to build a data warehouse
or data mesh based on your organizational need. For transformation, you can choose
either low-code or no-code experience with pipelines/dataflows or use T-SQL for a code
first experience.

Consume - Data from the warehouse can be consumed by Power BI, the industry
leading business intelligence tool, for reporting and visualization. Each warehouse
comes with a built-in TDS endpoint for easily connecting to and querying data from
other reporting tools, when needed. When a warehouse is created, a secondary item,
called a default semantic model, is generated at the same time with the same name. You
can use the default semantic model to start visualizing data with just a couple of steps.

Sample data
For sample data, we use the Wide World Importers (WWI) sample database. For our data
warehouse end-to-end scenario, we have generated sufficient data for a sneak peek into
the scale and performance capabilities of the Microsoft Fabric platform.

Wide World Importers (WWI) is a wholesale novelty goods importer and distributor
operating from the San Francisco Bay area. As a wholesaler, WWI's customers are mostly
companies who resell to individuals. WWI sells to retail customers across the United
States including specialty stores, supermarkets, computing stores, tourist attraction
shops, and some individuals. WWI also sells to other wholesalers via a network of agents
who promote the products on WWI's behalf. To learn more about their company profile and operation, see Wide World Importers sample databases for Microsoft SQL.

Typically, you would bring data from transactional systems (or line of business
applications) into a data lake or data warehouse staging area. However, for this tutorial,
we use the dimensional model provided by WWI as our initial data source. We use it as
the source to ingest the data into a data warehouse and transform it through T-SQL.

Data model
While the WWI dimensional model contains multiple fact tables, for this tutorial we
focus on the fact_sale table and its related dimensions only, as follows, to demonstrate
this end-to-end data warehouse scenario:
Next step
Tutorial: Create a Microsoft Fabric workspace



Tutorial: Create a Microsoft Fabric
workspace
Article • 11/19/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

Before you can create a warehouse, you need to create a workspace where you'll build
out the remainder of the tutorial.

Create a workspace
The workspace contains all the items needed for data warehousing, including: Data
Factory pipelines, the data warehouse, Power BI semantic models, operational
databases, and reports.

1. Sign in to Power BI .

2. Select Workspaces > New workspace.


3. Fill out the Create a workspace form as follows:
a. Name: Enter Data Warehouse Tutorial , and some characters for uniqueness.
b. Description: Optionally, enter a description for the workspace.
4. Expand the Advanced section.

5. Choose Fabric capacity or Trial in the License mode section.

6. Choose a premium capacity you have access to.

7. Select Apply. The workspace is created and opened.

Next step
Tutorial: Create a Microsoft Fabric data warehouse



Tutorial: Create a Warehouse in
Microsoft Fabric
Article • 08/05/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

Now that you have a workspace, you can create your first Warehouse in Microsoft
Fabric.

Create your first Warehouse


1. Select Workspaces in the navigation menu.

2. Search for the workspace you created in Tutorial: Create a Microsoft Fabric
workspace by typing in the search textbox at the top and selecting your workspace
to open it.
3. Select the + New button to display a full list of available items. From the list of objects to create, choose Warehouse to create a new Warehouse in Microsoft Fabric.

4. On the New warehouse dialog, enter WideWorldImporters as the name.

5. Select Create.

6. When provisioning is complete, the Build a warehouse landing page appears.


Next step
Tutorial: Ingest data into a Microsoft Fabric data warehouse



Tutorial: Ingest data into a Warehouse in
Microsoft Fabric
Article • 09/08/2024

Applies to: ✅ Warehouse in Microsoft Fabric

Now that you have created a Warehouse in Microsoft Fabric, you can ingest data into
that warehouse.

Ingest data
1. From the Build a warehouse landing page, select the Data Warehouse Tutorial
workspace in the navigation menu to return to the workspace item list.

2. Select New > More options to display a full list of available items.

3. In the Data Factory section, select Data pipeline.

4. On the New pipeline dialog, enter Load Customer Data as the name.
5. Select Create.

6. Select Pipeline activity.

7. Select Copy data from the Move & transform section.

8. If necessary, select the newly created Copy data activity from the design canvas
and follow the next steps to configure it.

9. On the General page, for Name, enter CD Load dimension_customer .


10. On the Source page, select the Connection dropdown. Select More to see all of
the data sources you can choose from, including data sources in your local
OneLake data hub.

11. Select New to create a new connection.

12. On the New connection page, select or type to select Azure Blobs from the list of
connection options.

13. Select Continue.

14. On the Connection settings page, configure the settings as follows:

a. In the Account name or URL, enter https://fabrictutorialdata.blob.core.windows.net/sampledata/.

b. In the Connection credentials section, select Create new connection in the dropdown list for the Connection.

c. The Connection name field is automatically populated, but for clarity, type in
Wide World Importers Public Sample .

d. Set the Authentication kind to Anonymous.

15. Select Connect.

16. Change the remaining settings on the Source page of the copy activity as follows, to reach the .parquet files in https://fabrictutorialdata.blob.core.windows.net/sampledata/WideWorldImportersDW/parquet/full/dimension_customer/*.parquet:

a. In the File path text boxes, provide:

i. Container: sampledata

ii. File path - Directory: WideWorldImportersDW/tables

iii. File path - File name: dimension_customer.parquet

b. In the File format drop-down, choose Parquet.

17. Select Preview data next to the File path setting to ensure there are no errors.

18. Select the Destination page of the Copy data activity. For Connection, select the
warehouse item WideWorldImporters from the list, or select More to search for
the warehouse.

19. Next to the Table option configuration setting, select the Auto create table radio
button.

20. The dropdown menu next to the Table configuration setting will automatically
change to two text boxes.

21. In the first box next to the Table setting, enter dbo .

22. In the second box next to the Table setting, enter dimension_customer .

23. From the ribbon, select Run.


24. Select Save and run from the dialog box. The pipeline to load the dimension_customer table will start.

25. Monitor the copy activity's progress on the Output page and wait for it to
complete.
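As an optional check once the copy activity completes, you can open a new SQL query against the WideWorldImporters warehouse and count the rows in the table the pipeline created; a minimal sketch:

SQL

--Confirm that the pipeline populated the auto-created dimension_customer table.
SELECT COUNT(*) AS CustomerRows
FROM [dbo].[dimension_customer];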

Next step
Tutorial: Create tables in a data warehouse



Tutorial: Create tables in a data
warehouse
Article • 08/05/2024

Applies to: Warehouse in Microsoft Fabric

Learn how to create tables in the data warehouse you created in a previous part of the
tutorial.

Create a table
1. Select Workspaces in the navigation menu.

2. Select the workspace created in Tutorial: Create a Microsoft Fabric workspace, such as Data Warehouse Tutorial.

3. From the item list, select WideWorldImporters with the type of Warehouse.

4. From the ribbon, select New SQL query. Under Blank, select New SQL query for a
new blank query window.

5. In the query editor, paste the following code.

SQL

/*
1. Drop the dimension_city table if it already exists.
2. Create the dimension_city table.
3. Drop the fact_sale table if it already exists.
4. Create the fact_sale table.
*/

--dimension_city
DROP TABLE IF EXISTS [dbo].[dimension_city];
CREATE TABLE [dbo].[dimension_city]
(
[CityKey] [int] NULL,
[WWICityID] [int] NULL,
[City] [varchar](8000) NULL,
[StateProvince] [varchar](8000) NULL,
[Country] [varchar](8000) NULL,
[Continent] [varchar](8000) NULL,
[SalesTerritory] [varchar](8000) NULL,
[Region] [varchar](8000) NULL,
[Subregion] [varchar](8000) NULL,
[Location] [varchar](8000) NULL,
[LatestRecordedPopulation] [bigint] NULL,
[ValidFrom] [datetime2](6) NULL,
[ValidTo] [datetime2](6) NULL,
[LineageKey] [int] NULL
);

--fact_sale
DROP TABLE IF EXISTS [dbo].[fact_sale];

CREATE TABLE [dbo].[fact_sale]
(
[SaleKey] [bigint] NULL,
[CityKey] [int] NULL,
[CustomerKey] [int] NULL,
[BillToCustomerKey] [int] NULL,
[StockItemKey] [int] NULL,
[InvoiceDateKey] [datetime2](6) NULL,
[DeliveryDateKey] [datetime2](6) NULL,
[SalespersonKey] [int] NULL,
[WWIInvoiceID] [int] NULL,
[Description] [varchar](8000) NULL,
[Package] [varchar](8000) NULL,
[Quantity] [int] NULL,
[UnitPrice] [decimal](18, 2) NULL,
[TaxRate] [decimal](18, 3) NULL,
[TotalExcludingTax] [decimal](29, 2) NULL,
[TaxAmount] [decimal](38, 6) NULL,
[Profit] [decimal](18, 2) NULL,
[TotalIncludingTax] [decimal](38, 6) NULL,
[TotalDryItems] [int] NULL,
[TotalChillerItems] [int] NULL,
[LineageKey] [int] NULL,
[Month] [int] NULL,
[Year] [int] NULL,
[Quarter] [int] NULL
);

6. Select Run to execute the query.


7. To save this query for reference later, right-click on the query tab, and select
Rename.

8. Type Create Tables to change the name of the query.

9. Press Enter on the keyboard or select anywhere outside the tab to save the
change.

10. Validate the table was created successfully by selecting the refresh icon button on
the ribbon.

11. In the Object explorer, verify that you can see the newly created Create Tables
query, fact_sale table, and dimension_city table.
Next step
Tutorial: Load data using T-SQL



Tutorial: Load data using T-SQL
Article • 07/17/2024

Applies to: Warehouse in Microsoft Fabric

Now that you know how to build a data warehouse, load a table, and generate a report,
it's time to extend the solution by exploring other methods for loading data.

Load data with COPY INTO


1. From the ribbon, select New SQL query.

2. In the query editor, paste the following code.

SQL

--Copy data from the public Azure storage account to the dbo.dimension_city table.
COPY INTO [dbo].[dimension_city]
FROM 'https://azuresynapsestorage.blob.core.windows.net/sampledata/WideWorldImportersDW/tables/dimension_city.parquet'
WITH (FILE_TYPE = 'PARQUET');

--Copy data from the public Azure storage account to the dbo.fact_sale table.
COPY INTO [dbo].[fact_sale]
FROM 'https://azuresynapsestorage.blob.core.windows.net/sampledata/WideWorldImportersDW/tables/fact_sale.parquet'
WITH (FILE_TYPE = 'PARQUET');

3. Select Run to execute the query. The query takes between one and four minutes to
execute.
4. After the query is completed, review the messages to see the rows affected, which indicate the number of rows that were loaded into the dimension_city and fact_sale tables respectively.

5. Load the data preview to validate the data loaded successfully by selecting on the
fact_sale table in the Explorer.

6. Rename the query for reference later. Right-click on SQL query 1 in the Explorer
and select Rename.
7. Type Load Tables to change the name of the query.

8. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
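
As an alternative to the data preview, a simple count query confirms that both tables now
contain rows (a minimal sketch; the exact counts depend on the sample files loaded above):

SQL

--Confirm both tables were populated by the COPY INTO statements.
SELECT 'dimension_city' AS TableName, COUNT(*) AS RowsLoaded FROM [dbo].[dimension_city]
UNION ALL
SELECT 'fact_sale', COUNT(*) FROM [dbo].[fact_sale];
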

Next step
Tutorial: Clone a table using T-SQL in Microsoft Fabric



Tutorial: Clone a table using T-SQL in
Microsoft Fabric
Article • 07/18/2024

Applies to: Warehouse in Microsoft Fabric

This tutorial guides you through creating a table clone in Warehouse in Microsoft Fabric,
using the CREATE TABLE AS CLONE OF T-SQL syntax.

You can use the CREATE TABLE AS CLONE OF T-SQL commands to create a table
clone at the current point-in-time or at a previous point-in-time.
You can also clone tables in the Fabric portal. For examples, see Tutorial: Clone
tables in the Fabric portal.
You can also query data in a warehouse as it appeared in the past, using the T-SQL
OPTION syntax. For more information, see Query data as it existed in the past.

Create a table clone within the same schema in a warehouse

1. In the Fabric portal, from the ribbon, select New SQL query.

2. To create a table clone as of current point in time, in the query editor, paste the
following code to create clones of the dbo.dimension_city and dbo.fact_sale
tables.

SQL

--Create a clone of the dbo.dimension_city table.
CREATE TABLE [dbo].[dimension_city1] AS CLONE OF [dbo].[dimension_city];

--Create a clone of the dbo.fact_sale table.
CREATE TABLE [dbo].[fact_sale1] AS CLONE OF [dbo].[fact_sale];

3. Select Run to execute the query. The query takes a few seconds to execute.

After the query is completed, the table clones dimension_city1 and fact_sale1
have been created.

4. Load the data preview to validate the data loaded successfully by selecting the
dimension_city1 table in the Explorer. (An optional T-SQL row-count check is shown
after these steps.)

5. To create a table clone as of a past point in time, use the AS CLONE OF ... AT T-
SQL syntax. The following sample creates clones of the dbo.dimension_city and
dbo.fact_sale tables from a past point in time. Enter the Coordinated Universal
Time (UTC) timestamp of the point at which the table should be cloned.

SQL

CREATE TABLE [dbo].[fact_sale2] AS CLONE OF [dbo].[fact_sale] AT '2024-04-29T23:51:48.923';

CREATE TABLE [dbo].[dimension_city2] AS CLONE OF [dbo].[dimension_city] AT '2024-04-29T23:51:48.923';

6. Select Run to execute the query. The query takes a few seconds to execute.


After the query is completed, the table clones dimension_city2 and fact_sale2
have been created, with data as it existed in the past point in time.

7. Load the data preview to validate the data loaded successfully by selecting the
fact_sale2 table in the Explorer.

8. Rename the query for reference later. Right-click on SQL query 2 in the Explorer
and select Rename.

9. Type Clone Table to change the name of the query.

10. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
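
Because a clone captures the source table's data as of the specified point in time (or the
current point in time), a quick row-count comparison is an easy optional validation (a
minimal sketch, not part of the original steps):

SQL

--Compare row counts between the source table and its current point-in-time clone.
SELECT 'fact_sale' AS TableName, COUNT(*) AS RowCnt FROM [dbo].[fact_sale]
UNION ALL
SELECT 'fact_sale1', COUNT(*) FROM [dbo].[fact_sale1];
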

Create a table clone across schemas within the same warehouse

1. From the ribbon, select New SQL query.

2. Create a new schema within the WideWorldImporters warehouse named dbo1 . Copy,
paste, and run the following T-SQL code which creates table clones as of current
point in time of dbo.dimension_city and dbo.fact_sale tables across schemas
within the same data warehouse.

SQL

--Create new schema within the warehouse named dbo1.
CREATE SCHEMA dbo1;

--Create a clone of dbo.fact_sale table in the dbo1 schema.
CREATE TABLE [dbo1].[fact_sale1] AS CLONE OF [dbo].[fact_sale];

--Create a clone of dbo.dimension_city table in the dbo1 schema.
CREATE TABLE [dbo1].[dimension_city1] AS CLONE OF [dbo].[dimension_city];

3. Select Run to execute the query. The query takes a few seconds to execute.

After the query is completed, clones dimension_city1 and fact_sale1 are created
in the dbo1 schema.

4. Load the data preview to validate the data loaded successfully by selecting the
dimension_city1 table under the dbo1 schema in the Explorer.

5. To create a table clone as of a previous point in time, in the query editor, paste the
following code to create clones of the dbo.dimension_city and dbo.fact_sale
tables in the dbo1 schema. Enter the Coordinated Universal Time (UTC) timestamp
of the point at which the table should be cloned.

SQL

--Create a clone of the dbo.dimension_city table in the dbo1 schema.
CREATE TABLE [dbo1].[dimension_city2] AS CLONE OF [dbo].[dimension_city] AT '2024-04-29T23:51:48.923';

--Create a clone of the dbo.fact_sale table in the dbo1 schema.
CREATE TABLE [dbo1].[fact_sale2] AS CLONE OF [dbo].[fact_sale] AT '2024-04-29T23:51:48.923';

6. Select Run to execute the query. The query takes a few seconds to execute.

After the query is completed, table clones fact_sale2 and dimension_city2 are
created in the dbo1 schema, with data as it existed in the past point in time.

7. Load the data preview to validate the data loaded successfully by selecting the
fact_sale2 table under the dbo1 schema in the Explorer.

8. Rename the query for reference later. Right-click on SQL query 3 in the Explorer
and select Rename.

9. Type Clone Table in another schema to change the name of the query.

10. Press Enter on the keyboard or select anywhere outside the tab to save the
change.

Next step
Tutorial: Transform data using a stored procedure

Related content
Clone table in Microsoft Fabric
Tutorial: Clone tables in the Fabric portal
CREATE TABLE AS CLONE OF


Tutorial: Transform data using a stored
procedure
Article • 05/21/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

Learn how to create and save a new stored procedure to transform data.

Transform data
1. From the Home tab of the ribbon, select New SQL query.

2. In the query editor, paste the following code to create the stored procedure
dbo.populate_aggregate_sale_by_city . This stored procedure will create and load

the dbo.aggregate_sale_by_date_city table in a later step.

SQL

--Drop the stored procedure if it already exists.


DROP PROCEDURE IF EXISTS [dbo].[populate_aggregate_sale_by_city]
GO

--Create the populate_aggregate_sale_by_city stored procedure.


CREATE PROCEDURE [dbo].[populate_aggregate_sale_by_city]
AS
BEGIN
--If the aggregate table already exists, drop it. Then create the
table.
DROP TABLE IF EXISTS [dbo].[aggregate_sale_by_date_city];
CREATE TABLE [dbo].[aggregate_sale_by_date_city]
(
[Date] [DATETIME2](6),
[City] [VARCHAR](8000),
[StateProvince] [VARCHAR](8000),
[SalesTerritory] [VARCHAR](8000),
[SumOfTotalExcludingTax] [DECIMAL](38,2),
[SumOfTaxAmount] [DECIMAL](38,6),
[SumOfTotalIncludingTax] [DECIMAL](38,6),
[SumOfProfit] [DECIMAL](38,2)
);

--Reload the aggregated dataset to the table.


INSERT INTO [dbo].[aggregate_sale_by_date_city]
SELECT
FS.[InvoiceDateKey] AS [Date],
DC.[City],
DC.[StateProvince],
DC.[SalesTerritory],
SUM(FS.[TotalExcludingTax]) AS [SumOfTotalExcludingTax],
SUM(FS.[TaxAmount]) AS [SumOfTaxAmount],
SUM(FS.[TotalIncludingTax]) AS [SumOfTotalIncludingTax],
SUM(FS.[Profit]) AS [SumOfProfit]
FROM [dbo].[fact_sale] AS FS
INNER JOIN [dbo].[dimension_city] AS DC
ON FS.[CityKey] = DC.[CityKey]
GROUP BY
FS.[InvoiceDateKey],
DC.[City],
DC.[StateProvince],
DC.[SalesTerritory]
ORDER BY
FS.[InvoiceDateKey],
DC.[StateProvince],
DC.[City];
END

3. To save this query for reference later, right-click on the query tab, and select
Rename.

4. Type Create Aggregate Procedure to change the name of the query.

5. Press Enter on the keyboard or select anywhere outside the tab to save the
change.

6. Select Run to execute the query.

7. Select the refresh button on the ribbon.

8. In the Object explorer, verify that you can see the newly created stored procedure
by expanding the StoredProcedures node under the dbo schema.
9. From the Home tab of the ribbon, select New SQL query.

10. In the query editor, paste the following code. This T-SQL executes
dbo.populate_aggregate_sale_by_city to create the

dbo.aggregate_sale_by_date_city table.

SQL

--Execute the stored procedure to create the aggregate table.


EXEC [dbo].[populate_aggregate_sale_by_city];

11. To save this query for reference later, right-click on the query tab, and select
Rename.

12. Type Run Create Aggregate Procedure to change the name of the query.

13. Press Enter on the keyboard or select anywhere outside the tab to save the
change.

14. Select Run to execute the query.

15. Select the refresh button on the ribbon. The query takes between two and three
minutes to execute.

16. In the Object explorer, load the data preview to validate the data loaded
successfully by selecting the aggregate_sale_by_date_city table in the Explorer.
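
As an additional check, you can query the new aggregate table directly (a minimal
sketch; the TOP clause and ordering are illustrative only):

SQL

--Inspect a few rows of the aggregate table created by the stored procedure.
SELECT TOP (10) *
FROM [dbo].[aggregate_sale_by_date_city]
ORDER BY [Date], [City];
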
Next step
Tutorial: Time travel using T-SQL at statement level



Tutorial: Time travel using T-SQL at
statement level
Article • 06/10/2024

In this article, learn how to time travel in your warehouse at the statement level using T-
SQL. This feature allows you to query data as it appeared in the past, within a retention
period.

7 Note

Currently, only the Coordinated Universal Time (UTC) time zone is used for time
travel.

Time travel
In this example, we'll update a row, and show how to easily query the previous value
using the FOR TIMESTAMP AS OF query hint.

1. From the Home tab of the ribbon, select New SQL query.

2. In the query editor, paste the following code to create the view Top10CustomersView .
Select Run to execute the query.

SQL

CREATE VIEW dbo.Top10CustomersView
AS
SELECT TOP (10)
FS.[CustomerKey],
DC.[Customer],
SUM(FS.TotalIncludingTax) AS TotalSalesAmount
FROM
[dbo].[dimension_customer] AS DC
INNER JOIN
[dbo].[fact_sale] AS FS ON DC.[CustomerKey] = FS.[CustomerKey]
GROUP BY
FS.[CustomerKey],
DC.[Customer]
ORDER BY
TotalSalesAmount DESC;

3. In the Explorer, verify that you can see the newly created view Top10CustomersView
by expanding the View node under dbo schema.

4. Create another new query, similar to Step 1. From the Home tab of the ribbon,
select New SQL query.
5. In the query editor, paste the following code. This updates the TotalIncludingTax
column value to 200000000 for the record which has the SaleKey value of
22632918 . Select Run to execute the query.

SQL

/*Update the TotalIncludingTax value of the record with SaleKey value of 22632918*/
UPDATE [dbo].[fact_sale]
SET TotalIncludingTax = 200000000
WHERE SaleKey = 22632918;

6. In the query editor, paste the following code. The CURRENT_TIMESTAMP T-SQL
function returns the current UTC timestamp as a datetime. Select Run to execute
the query.

SQL

SELECT CURRENT_TIMESTAMP;

7. Copy the timestamp value returned to your clipboard.

8. Paste the following code in the query editor and replace the timestamp value with
the current timestamp value obtained from the prior step. The timestamp syntax
format is YYYY-MM-DDTHH:MM:SS[.FFF] .

9. Remove the trailing zeroes, for example: 2024-04-24T20:59:06.097 . (A convenience query that returns the timestamp already in this format is shown after these steps.)

10. The following example returns the list of top ten customers by TotalIncludingTax ,
including the new value for SaleKey 22632918 . Select Run to execute the query.

SQL

/*View of Top10 Customers as of today after record updates*/


SELECT *
FROM [WideWorldImporters].[dbo].[Top10CustomersView]
OPTION (FOR TIMESTAMP AS OF '2024-04-24T20:59:06.097');

11. Paste the following code in the query editor and replace the timestamp value to a
time prior to executing the update script to update the TotalIncludingTax value.
This would return the list of top ten customers before the TotalIncludingTax was
updated for SaleKey 22632918. Select Run to execute the query.

SQL
/*View of Top10 Customers as of today before record updates*/
SELECT *
FROM [WideWorldImporters].[dbo].[Top10CustomersView]
OPTION (FOR TIMESTAMP AS OF '2024-04-24T20:49:06.097');

For more examples, visit How to: Query using time travel at the statement level.
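
As an alternative to manually reformatting the copied timestamp, the ISO 8601
conversion style 126 returns it directly in the YYYY-MM-DDTHH:MM:SS.FFF format that
FOR TIMESTAMP AS OF expects (a convenience sketch, not part of the original steps):

SQL

--Return the current UTC timestamp in ISO 8601 format (for example, 2024-04-24T20:59:06.097).
SELECT CONVERT(varchar(23), CURRENT_TIMESTAMP, 126) AS TimeTravelTimestamp;
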

Next step
Tutorial: Create a query with the visual query builder

Related content
Query data as it existed in the past
How to: Query using time travel
Query hints (Transact-SQL)



Tutorial: Create a query with the visual
query builder
Article • 04/25/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

Create and save a query with the visual query builder in the Microsoft Fabric portal.

Use the visual query builder


1. From the Home tab of the ribbon, select New visual query.

2. Drag the fact_sale table from the Explorer to the query design pane.

3. Limit the dataset size by selecting Reduce rows > Keep top rows from the
transformations ribbon.
4. In the Keep top rows dialog, enter 10000 .

5. Select OK.

6. Drag the dimension_city table from the explorer to the query design pane.

7. From the transformations ribbon, select the dropdown next to Combine and select
Merge queries as new.

8. On the Merge settings page:

a. In the Left table for merge dropdown list, choose dimension_city

b. In the Right table for merge dropdown list, choose fact_sale

c. Select the CityKey field in the dimension_city table by selecting on the column
name in the header row to indicate the join column.

d. Select the CityKey field in the fact_sale table by selecting on the column
name in the header row to indicate the join column.

e. In the Join kind diagram selection, choose Inner.


9. Select OK.

10. With the Merge step selected, select the Expand button next to fact_sale on the
header of the data grid then select the columns TaxAmount , Profit , and
TotalIncludingTax .
11. Select OK.

12. Select Transform > Group by from the transformations ribbon.

13. On the Group by settings page:

a. Change to Advanced.

b. Group by (if necessary, select Add grouping to add more group by columns):
i. Country
ii. StateProvince
iii. City

c. New column name (if necessary, select Add aggregation to add more
aggregate columns and operations):
i. SumOfTaxAmount
i. Choose Operation of Sum and Column of TaxAmount .
ii. SumOfProfit
i. Choose Operation of Sum and Column of Profit .
iii. SumOfTotalIncludingTax
i. Choose Operation of Sum and Column of TotalIncludingTax .

14. Select OK.

15. Right-click on Visual query 1 in the Explorer and select Rename.


16. Type Sales Summary to change the name of the query.

17. Press Enter on the keyboard or select anywhere outside the tab to save the
change.

Next step
Tutorial: Analyze data with a notebook



Tutorial: Analyze data with a notebook
Article • 11/19/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

In this tutorial, learn how you can analyze data using a T-SQL notebook or using a
notebook with a Lakehouse shortcut.

Option 1: Create a T-SQL notebook on the warehouse

To get started, create a T-SQL notebook in one of the following two ways:

1. Create a T-SQL notebook from the Microsoft Fabric Warehouse homepage. Navigate
to the Data Warehouse workload, and choose Notebook.

2. Select + Warehouses and add the WideWorldImporters warehouse. Select the
WideWorldImporters warehouse from the OneLake dialog box.

3. Create a T-SQL notebook from the warehouse editor. From your WideWorldImporters
warehouse, from the top navigation ribbon, select New SQL query and then New SQL
query in notebook.


4. Once the notebook is created, you can see WideWorldImporters warehouse is
loaded into the explorer, and the ribbon shows T-SQL as the default language.

5. Right-click to launch the More menu option on the dimension_city table. Select
SELECT TOP 100 to generate a quick SQL template to explore 100 rows from the
table. (A sample of the generated template is shown after these steps.)

6. Run the code cell and you can see messages and results.
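
The SELECT TOP 100 template generated in step 5 is similar to the following (the exact
three-part name and formatting can vary slightly):

SQL

--Template generated by the SELECT TOP 100 menu option.
SELECT TOP (100) *
FROM [WideWorldImporters].[dbo].[dimension_city];
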

Option 2: Create a lakehouse shortcut and analyze data with a notebook

First, we create a new lakehouse. To create a new lakehouse in your Microsoft Fabric
workspace:

1. Select the Data Warehouse Tutorial workspace in the navigation menu.

2. Select + New > Lakehouse.

3. In the Name field, enter ShortcutExercise , and select Create.

4. The new lakehouse loads and the Explorer view opens up, with the Get data in
your lakehouse menu. Under Load data in your lakehouse, select the New
shortcut button.

5. In the New shortcut window, select the button for Microsoft OneLake.

6. In the Select a data source type window, scroll through the list until you find the
Warehouse named WideWorldImporters you created previously. Select it, then
select Next.

7. In the OneLake object browser, expand Tables, expand the dbo schema, and then
select the checkbox for dimension_customer . Select Next. Select Create.

8. If you see a folder called Unidentified under Tables, select the Refresh icon in the
horizontal menu bar.
9. Select the dimension_customer in the Table list to preview the data. The lakehouse
is showing the data from the dimension_customer table from the Warehouse!

10. Next, create a new notebook to query the dimension_customer table. In the Home
ribbon, select the dropdown list for Open notebook and choose New notebook.

11. In the Explorer, select the Lakehouses source folder.

12. Select, then drag the dimension_customer from the Tables list into the open
notebook cell. You can see a PySpark query has been written for you to query all
the data from ShortcutExercise.dimension_customer . This notebook experience is
similar to Visual Studio Code Jupyter notebook experience. You can also open the
notebook in VS Code.

13. In the Home ribbon, select the Run all button. Once the query is completed, you
can see how easily PySpark can be used to query the Warehouse tables!

Next step
Tutorial: Create cross-warehouse queries with the SQL query editor



Tutorial: Create cross-warehouse queries
with the SQL query editor
Article • 07/18/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

In this tutorial, learn how you can easily create and execute T-SQL queries with
the SQL query editor across multiple warehouses, including joining together data from a
SQL analytics endpoint and a Warehouse in Microsoft Fabric.

Add multiple warehouses to the Explorer


1. Select the Data Warehouse Tutorial workspace in the navigation menu.

2. Select the WideWorldImporters warehouse item.

3. In the Explorer, select the + Warehouses button.


4. Select the SQL analytics endpoint of the lakehouse you created using shortcuts
previously, named ShortcutExercise . Both items are added to the query.

5. Your selected warehouses now show the same Explorer pane.

Execute a cross-warehouse query


In this example, you can see how easily you can run T-SQL queries across the
WideWorldImporters warehouse and ShortcutExercise SQL analytics endpoint. You can
write cross-database queries using three-part naming to reference the
database.schema.table , as in SQL Server.

1. From the ribbon, select New SQL query.

2. In the query editor, copy and paste the following T-SQL code.

SQL

SELECT Sales.StockItemKey,
Sales.Description,
SUM(CAST(Sales.Quantity AS int)) AS SoldQuantity,
c.Customer
FROM [dbo].[fact_sale] AS Sales,
[ShortcutExercise].[dbo].[dimension_customer] AS c
WHERE Sales.CustomerKey = c.CustomerKey
GROUP BY Sales.StockItemKey, Sales.Description, c.Customer;

3. Select the Run button to execute the query. After the query is completed, you will
see the results.

4. Rename the query for reference later. Right-click on SQL query 1 in the Explorer
and select Rename.

5. Type Cross-warehouse query to change the name of the query.


6. Press Enter on the keyboard or select anywhere outside the tab to save the
change.

Execute a cross-warehouse cross-workspace query

To query data from Warehouse A, which resides in a different workspace than your
Warehouse B, follow these steps:

1. Create a lakehouse in the same workspace as your Warehouse B.


2. In that lakehouse, create a shortcut pointing to the required databases or tables
from Warehouse A.
3. Through the previous cross-warehouse sample query, you can now query tables in
that lakehouse which are just a shortcut to Warehouse A. For example:

SQL

SELECT * FROM [lakehouse].[dbo].[table_shortcuted_from_warehouse_A]

7 Note

Cross-warehouse cross-workspace querying is currently limited to queries within
the same region.

Next step
Tutorial: Create Power BI reports



Tutorial: Create Power BI reports
Article • 07/18/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

Learn how to create and save several types of Power BI reports.

Create reports
1. Select the Model view.

2. From the fact_sale table, drag the CityKey field and drop it onto the CityKey
field in the dimension_city table to create a relationship.

3. On the Create Relationship settings:


a. Table 1 is populated with fact_sale and the column of CityKey .
b. Table 2 is populated with dimension_city and the column of CityKey .
c. Cardinality: select Many to one (*:1).
d. Cross filter direction: select Single.
e. Leave the box next to Make this relationship active checked.
f. Check the box next to Assume referential integrity.
4. Select Confirm.

5. From the Home tab of the ribbon, select New report.

6. Build a Column chart visual:

a. On the Data pane, expand fact_sale and check the box next to Profit. This
creates a column chart and adds the field to the Y-axis.

b. On the Data pane, expand dimension_city and check the box next to
SalesTerritory. This adds the field to the X-axis.

c. Reposition and resize the column chart to take up the top left quarter of the
canvas by dragging the anchor points on the corners of the visual.

7. Select anywhere on the blank canvas (or press the Esc key) so the column chart
visual is no longer selected.

8. Build a Maps visual:

a. On the Visualizations pane, select the Azure Map visual.

b. From the Data pane, drag StateProvince from the dimension_city table to the
Location bucket on the Visualizations pane.

c. From the Data pane, drag Profit from the fact_sale table to the Size bucket on
the Visualizations pane.
d. If necessary, reposition and resize the map to take up the bottom left quarter of
the canvas by dragging the anchor points on the corners of the visual.

9. Select anywhere on the blank canvas (or press the Esc key) so the map visual is no
longer selected.

10. Build a Table visual:

a. On the Visualizations pane, select the Table visual.

b. From the Data pane, check the box next to SalesTerritory on the
dimension_city table.

c. From the Data pane, check the box next to StateProvince on the
dimension_city table.

d. From the Data pane, check the box next to Profit on the fact_sale table.

e. From the Data pane, check the box next to TotalExcludingTax on the fact_sale
table.
f. Reposition and resize the table to take up the right half of the canvas by
dragging the anchor points on the corners of the visual.

11. From the ribbon, select File > Save.

12. Enter Sales Analysis as the name of your report.

13. Select Save.

Next step
Tutorial: Build a report from the OneLake data hub



Tutorial: Build a report from OneLake
Article • 11/19/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

Learn how to build a report with the data you ingested into your Warehouse in the last
step.

Build a report
1. Select OneLake in the navigation menu.

2. From the item list, select WideWorldImporters with the type of Semantic model
(default).

7 Note

Microsoft has renamed the Power BI dataset content type to semantic model.
This applies to Microsoft Fabric as well. For more information, see New name
for Power BI datasets.

3. In the Visualize this data section, select Create a report > Auto-create. A report is
generated from the dimension_customer table that was loaded in the previous
section.

4. A report similar to the following image is generated.


5. From the ribbon, select Save.

6. Enter Customer Quick Summary in the name box. In the Save your report dialogue,
select Save.

7. Your tutorial is complete!

Review Security for data warehousing in Microsoft Fabric.


Learn more about Workspace roles in Fabric data warehousing.
Consider Microsoft Purview, included by default in every tenant to meet
important compliance and governance needs.

Next step
Tutorial: Clean up tutorial resources



Tutorial: Clean up tutorial resources
Article • 07/18/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

You can delete individual reports, pipelines, warehouses, and other items or remove the
entire workspace. In this tutorial, you will clean up the workspace, individual reports,
pipelines, warehouses, and other items you created as part of the tutorial.

Delete a workspace
1. In the Fabric portal, select Data Warehouse Tutorial in the navigation pane to
return to the workspace item list.

2. In the menu of the workspace header, select Workspace settings.

3. Select Other > Delete this workspace.


4. Select Delete on the warning to remove the workspace and all its contents.

Next step
What is data warehousing in Microsoft Fabric?



Connectivity to data warehousing in
Microsoft Fabric
Article • 10/08/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

In Microsoft Fabric, a Lakehouse SQL analytics endpoint or Warehouse is accessible
through a Tabular Data Stream, or TDS endpoint, familiar to all modern web applications
that interact with a SQL Server TDS endpoint. This is referred to as the SQL Connection
String within the Microsoft Fabric user interface.

This article provides a how-to on connecting to your SQL analytics endpoint or
Warehouse.

To get started, you must complete the following prerequisites:

You need access to a SQL analytics endpoint or a Warehouse within a Premium
capacity workspace with contributor or higher permissions.

Authentication to warehouses in Fabric


In Microsoft Fabric, two types of authenticated users are supported through the SQL
connection string:

Microsoft Entra ID (formerly Azure Active Directory) user principals, or user
identities
Microsoft Entra ID (formerly Azure Active Directory) service principals

For more information, see Microsoft Entra authentication as an alternative to SQL
authentication in Microsoft Fabric.

The SQL connection string requires TCP port 1433 to be open. TCP 1433 is the standard
SQL Server port number. The SQL connection string also respects the Warehouse or
Lakehouse SQL analytics endpoint security model for data access. Data can be obtained
for all objects to which a user has access.

Allow Power BI service tags through firewall


To ensure proper access, you need to allow the Power BI service tags for firewall access.
For more information, see Power BI Service Tags. You cannot use the Fully Qualified
Domain Name (FQDN) of the TDS Endpoint alone. Allowing the Power BI service tags is
necessary for connectivity through the firewall.

Retrieve the SQL connection string


To retrieve the connection string, follow these steps:

1. Navigate to your workspace, select the Warehouse, and select the ... ellipses for
More options.

2. Select Copy SQL connection string to copy the connection string to your
clipboard.
Get started with SQL Server Management
Studio (SSMS)
The following steps detail how to start at the Microsoft Fabric workspace and connect a
warehouse to SQL Server Management Studio (SSMS).

1. When you open SSMS, the Connect to Server window appears. If already open,
you can connect manually by selecting Object Explorer > Connect > Database
Engine.

2. Once the Connect to Server window is open, paste the connection string copied
from the previous section of this article into the Server name box. Select Connect
and proceed with the appropriate credentials for authentication. Remember that
only Microsoft Entra multifactor authentication (MFA) is supported, via the option
Microsoft Entra MFA.
3. Once the connection is established, Object Explorer displays the connected
warehouse from the workspace and its respective tables and views, all of which are
ready to be queried.
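
Once connected, a quick read-only query is an easy way to confirm the endpoint
responds (a minimal example; any query against an object you have access to works):

SQL

--List the user tables available through this connection.
SELECT name AS TableName
FROM sys.tables
ORDER BY name;
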
When connecting via SSMS (or ADS), you see both a SQL analytics endpoint and
Warehouse listed as warehouses, and it's difficult to differentiate between the two item
types and their functionality. For this reason, we strongly encourage you to adopt a
naming convention that allows you to easily distinguish between the two item types
when you work in tools outside of the Microsoft Fabric portal experience. Only SSMS 19
or higher is supported.

Connect using Power BI


A Warehouse or Lakehouse SQL analytics endpoint is a fully supported and native data
source within Power BI, and there is no need to use the SQL Connection string. The Data
pane exposes all of the warehouses you have access to directly. This allows you to easily
find your warehouses by workspace, and:

1. Select the Warehouse.


2. Choose entities.
3. Load Data - choose a data connectivity mode: import or DirectQuery.

For more information, see Create reports in Microsoft Fabric.

Connect using OLE DB


We support connectivity to the Warehouse or SQL analytics endpoint using OLE DB.
Make sure you're running the latest Microsoft OLE DB Driver for SQL Server.

Connect using ODBC


Microsoft Fabric supports connectivity to the Warehouse or SQL analytics
endpoint using ODBC. Make sure you're running the latest ODBC Driver for SQL Server.
Use Microsoft Entra ID (formerly Azure Active Directory) authentication. Only ODBC 18
or higher versions are supported.

Connect using JDBC


Microsoft Fabric also supports connectivity to the Warehouse or SQL analytics
endpoint using a Java database connectivity (JDBC) driver.

When establishing connectivity via JDBC, check for the following dependencies:

1. Add artifacts. Choose Add Artifact and add the following four dependencies, then
select Download/Update to load all dependencies. For example:

XML

<dependency>
   <groupId>com.microsoft.azure</groupId>
   <artifactId>msal4j</artifactId>
   <version>1.13.3</version>
</dependency>

<dependency>
   <groupId>com.microsoft.sqlserver</groupId>
   <artifactId>mssql-jdbc_auth</artifactId>
   <version>11.2.1.x86</version>
</dependency>

<dependency>
   <groupId>com.microsoft.sqlserver</groupId>
   <artifactId>mssql-jdbc</artifactId>
   <version>12.1.0.jre11-preview</version>
</dependency>

<dependency>
   <groupId>com.microsoft.aad</groupId>
   <artifactId>adal</artifactId>
   <version>4.2.2</version>
</dependency>

2. Select Test connection, and Finish.

Connect using dbt


The dbt adapter is a data transformation framework that uses software engineering
best practices like testing and version control to reduce code, automate dependency
management, and ship more reliable data—all with SQL.

The dbt data platform-specific adapter plugins allow users to connect to the data store
of choice. To connect to Synapse Data Warehouse in Microsoft Fabric from dbt, use the
dbt-fabric adapter. Similarly, the Azure Synapse Analytics dedicated SQL pool

data source has its own adapter, dbt-synapse .

Both adapters support Microsoft Entra ID (formerly Azure Active Directory)
authentication and allow developers to use az cli authentication . However, SQL
authentication is not supported for dbt-fabric.

The dbt Fabric DW Adapter uses the pyodbc library to establish connectivity with the
Warehouse. The pyodbc library is an ODBC implementation in the Python language that
uses the Python Database API Specification v2.0 . The pyodbc library directly passes the
connection string to the database driver through SQLDriverConnect in the msodbc
connection structure to Microsoft Fabric using a TDS (Tabular Data Streaming) proxy
service.

For more information, see the Microsoft Fabric Synapse Data Warehouse dbt adapter
setup and Microsoft Fabric Synapse Data Warehouse dbt adapter configuration .

Connectivity by other means


Any non-Microsoft tool can also use the SQL connection string via ODBC or OLE DB
drivers to connect to a Microsoft Fabric Warehouse or SQL analytics endpoint,
using Microsoft Entra ID (formerly Azure Active Directory) authentication. For more
information and sample connection strings, see Microsoft Entra authentication as an
alternative to SQL authentication.

Custom applications
In Microsoft Fabric, a Warehouse and a Lakehouse SQL analytics endpoint provide a SQL
connection string. Data is accessible from a vast ecosystem of SQL tooling, provided
they can authenticate using Microsoft Entra ID (formerly Azure Active Directory). For
more information, see Connection libraries for Microsoft SQL Database. For more
information and sample connection strings, see Microsoft Entra authentication as an
alternative to SQL authentication.
Best practices
We recommend adding retries in your applications/ETL jobs to build resiliency. For more
information, see the following docs:

Retry pattern - Azure Architecture Center


Working with transient errors - Azure SQL Database
Step 4: Connect resiliently to SQL with ADO.NET - ADO.NET Provider for SQL
Server
Step 4: Connect resiliently to SQL with PHP - PHP drivers for SQL Server

Considerations and limitations


SQL Authentication is not supported.
Multiple Active Result Sets (MARS) is unsupported for Microsoft Fabric Warehouse.
MARS is disabled by default, however if MultipleActiveResultSets is included in
the connection string, it should be removed or set to false.
If you receive this error "Couldn't complete the operation because we reached a
system limit", it's due to the system token size reaching its limit. This issue can be
caused if the workspace has too many warehouses/SQL analytics endpoints, if the
user is part of too many Entra groups, or a combination of the two. We
recommend having 40 or fewer warehouses and SQL analytics endpoint per
workspace to prevent this error. If the issue persists, contact support.
If you receive error code 24804 with the message "Couldn't complete the
operation due to a system update. Close out this connection, sign in again, and
retry the operation" or error code 6005 with the message "SHUTDOWN is in
progress. Execution fail against sql server. Please contact SQL Server team if you
need further support.", it's due to temporary connection loss, likely because of a
system deployment or reconfiguration. To resolve this issue, sign in again and
retry. To learn how to build resiliency and retries in your application, see Best
Practices.
If you receive error code 24804 with the message "Couldn't complete the
operation due to a system update. Close out this connection, sign in again, and
retry the operation" or error code 6005 with the message "Execution fail against
sql server. Please contact SQL Server team if you need further support.", it's due to
temporary connection loss, likely because of a system deployment or
reconfiguration. To resolve this issue, sign in again and retry. To learn how to build
resiliency and retries in your application, see Best Practices.
If you receive the error code 18456: "Execution failed against SQL server, please
contact SQL server team if you need further support.", refer to Known issue - Data
warehouse connection or query execution fails.
Linked server connections from SQL Server are not supported.

Related content
Security for data warehousing in Microsoft Fabric
Microsoft Entra authentication as an alternative to SQL authentication in Microsoft
Fabric
Add Fabric URLs to your allowlist
Azure IP ranges and service tags for public clouds



Overview of Copilot for Data Warehouse
Article • 09/25/2024

Applies to: ✅ Warehouse in Microsoft Fabric

Microsoft Copilot for Synapse Data Warehouse is an AI assistant designed to streamline
your data warehousing tasks. Copilot integrates seamlessly with your Fabric warehouse,
providing intelligent insights to help you along each step of the way in your T-SQL
explorations.

Introduction to Copilot for Data Warehouse


Copilot for Data Warehouse utilizes table and view names, column names, primary key,
and foreign key metadata to generate T-SQL code. Copilot for Data Warehouse does
not use data in tables to generate T-SQL suggestions.

Key features of Copilot for Warehouse include:

Natural Language to SQL: Ask Copilot to generate SQL queries using simple
natural language questions.
Code completion: Enhance your coding efficiency with AI-powered code
completions.
Quick actions: Quickly fix and explain SQL queries with readily available actions.
Intelligent Insights: Receive smart suggestions and insights based on your
warehouse schema and metadata.

There are three ways to interact with Copilot in the Fabric Warehouse editor.

Chat Pane: Use the chat pane to ask questions to Copilot through natural
language. Copilot will respond with a generated SQL query or natural language
based on the question asked.
How to: Use the Copilot chat pane for Synapse Data Warehouse
Code completions: Start writing T-SQL in the SQL query editor and Copilot will
automatically generate a code suggestion to help complete your query. The Tab
key accepts the code suggestion, or keep typing to ignore the suggestion.
How to: Use Copilot code completion for Synapse Data Warehouse
Quick Actions: In the ribbon of the SQL query editor, the Fix and Explain options
are quick actions. Highlight a SQL query of your choice and select one of the quick
action buttons to perform the selected action on your query.
Explain: Copilot can provide natural language explanations of your SQL query
and warehouse schema in comments format.
Fix: Copilot can fix errors in your code as error messages arise. Error scenarios
can include incorrect/unsupported T-SQL code, wrong spellings, and more.
Copilot will also provide comments that explain the changes and suggest SQL
best practices.
How to: Use Copilot quick actions for Synapse Data Warehouse

Use Copilot effectively


Here are some tips for maximizing productivity with Copilot.

When crafting prompts, be sure to start with a clear and concise description of the
specific information you're looking for.
Natural language to SQL depends on expressive table and column names. If your
table and columns aren't expressive and descriptive, Copilot might not be able to
construct a meaningful query.
Use natural language that is applicable to your table and view names, column
names, primary keys, and foreign keys of your warehouse. This context helps
Copilot generate accurate queries. Specify what columns you wish to see,
aggregations, and any filtering criteria as explicitly as possible. Copilot should be
able to correct typos or understand context given your schema context.
Create relationships in the model view of the warehouse to increase the accuracy
of JOIN statements in your generated SQL queries.
When using code completions, leave a comment at the top of the query with -- to
help guide the Copilot with context about the query you are trying to write.
Avoid ambiguous or overly complex language in your prompts. Simplify the
question while maintaining its clarity. This editing ensures Copilot can effectively
translate it into a meaningful T-SQL query that retrieves the desired data from the
associated tables and views.
Currently, natural language to SQL supports English language to T-SQL.
The following example prompts are clear, specific, and tailored to the properties of
your schema and data warehouse, making it easier for Copilot to generate
accurate T-SQL queries:
Show me all properties that sold last year
Count all the products, group by each category

Show all agents who sell properties in California

Show agents who have listed more than two properties for sale
Show the rank of each agent by property sales and show name, total sales, and rank

Enable Copilot
Your administrator needs to enable the tenant switch before you start using
Copilot. For more information, see Copilot tenant settings.
Your F64 or P1 capacity needs to be in one of the regions listed in this article,
Fabric region availability.
If your tenant or capacity is outside the US or France, Copilot is disabled by default
unless your Fabric tenant admin enables the Data sent to Azure OpenAI can be
processed outside your tenant's geographic region, compliance boundary, or
national cloud instance tenant setting in the Fabric Admin portal.
Copilot in Microsoft Fabric isn't supported on trial SKUs. Only paid SKUs (F64 or
higher, or P1 or higher) are supported.
For more information, see Overview of Copilot in Fabric and Power BI.

What should I know to use Copilot responsibly?


Microsoft is committed to ensuring that our AI systems are guided by our AI
principles and Responsible AI Standard . These principles include empowering our
customers to use these systems effectively and in line with their intended uses. Our
approach to responsible AI is continually evolving to proactively address emerging
issues.

Copilot features in Fabric are built to meet the Responsible AI Standard, which means
that they're reviewed by multidisciplinary teams for potential harms, and then refined to
include mitigations for those harms.

For more information, see Privacy, security, and responsible use of Copilot for Data
Warehouse (preview).

Limitations of Copilot for Data Warehouse


Here are the current limitations of Copilot for Data Warehouse:

Copilot doesn't understand previous inputs and can't undo changes after a user
commits a change when authoring, either via user interface or the chat pane. For
example, you can't ask Copilot to "Undo my last 5 inputs." However, users can still
use the existing user interface options to delete unwanted changes or queries.
Copilot can't make changes to existing SQL queries. For example, if you ask Copilot
to edit a specific part of an existing query, it doesn't work.
Copilot might produce inaccurate results when the intent is to evaluate data.
Copilot only has access to the warehouse schema, none of the data inside.
Copilot responses can include inaccurate or low-quality content, so make sure to
review outputs before using them in your work.
People who are able to meaningfully evaluate the content's accuracy and
appropriateness should review the outputs.

Related content
Copilot tenant settings (preview)
How to: Use the Copilot chat pane for Synapse Data Warehouse
How to: Use Copilot quick actions for Synapse Data Warehouse
How to: Use Copilot code completion for Synapse Data Warehouse
Privacy, security, and responsible use of Copilot for Data Warehouse (preview)



How to: Use the Copilot chat pane for
Synapse Data Warehouse
Article • 08/01/2024

Applies to: Warehouse in Microsoft Fabric

Copilot for Data Warehouse includes a chat pane to interact with Copilot in natural
language. In this interface, you can ask Copilot questions specific to your data
warehouse or generally about data warehousing in Fabric. Depending on the question,
Copilot responds with a generated SQL query or a natural language response.

Since Copilot is schema aware and contextualized, you can generate queries tailored to
your Warehouse.

This integration means that Copilot can generate SQL queries for prompts like:

Show me all properties that sold last year

Which agents have listed more than two properties for sale?
Tell me the rank of each agent by property sales and show name, total sales, and rank

Key capabilities
The supported capabilities of interacting through chat include:

Natural Language to SQL: Generate T-SQL code and get suggestions of questions
to ask to accelerate your workflow.
Q&A: Ask Copilot questions about warehousing in Fabric and it responds in natural
language
Explanations: Copilot can provide a summary and natural language of
explanations of T-SQL code within the active query tab.
Fixing errors: Copilot can also fix errors in T-SQL code as they arise. Copilot shares
context with the active query tab and can provide helpful suggestions to
automatically fix SQL query errors.

Prerequisites
Your administrator needs to enable the tenant switch before you start using
Copilot. For more information, see Copilot tenant settings.
Your F64 or P1 capacity needs to be in one of the regions listed in this article,
Fabric region availability.
If your tenant or capacity is outside the US or France, Copilot is disabled by default
unless your Fabric tenant admin enables the Data sent to Azure OpenAI can be
processed outside your tenant's geographic region, compliance boundary, or
national cloud instance tenant setting in the Fabric Admin portal.
Copilot in Microsoft Fabric isn't supported on trial SKUs. Only paid SKUs (F64 or
higher, or P1 or higher) are supported.
For more information, see Overview of Copilot in Fabric and Power BI.

Get started
1. In the Data warehouse workload, open a warehouse, and open a new SQL query.

2. To open the Copilot chat pane, select the Copilot button in the ribbon.

3. The chat pane offers helpful starter prompts to get started and familiar with
Copilot. Select any option to ask Copilot a question. The Ask a question button
provides example questions that are tailored specifically to your warehouse.

4. You can also type a request of your choice in the chat box and Copilot responds
accordingly.
5. To find documentation related to your request, select the Help button.
More powerful use cases
You can ask Copilot questions about the warehouse normally and it should respond
accordingly. However, if you want to force Copilot to perform a specific skill, there are /
commands that you can use. These commands must be at the start of your chat
message.

Command Description

/generate-sql   Generate a SQL query from the prompt submitted to Copilot.
/explain        Generate an explanation for the query within the active query tab.
/fix            Generate a fix for the query within the active query tab. You can optionally add additional context to fix a specific part or aspect of the query.
/question       Generate a natural language response from the prompt submitted to Copilot.
/help           Get help for using Copilot. This links to documentation for Copilot and how to use it.

For /generate-sql , /question , and optionally /fix , include additional information
regarding your intent, as in the following examples (an illustration of the kind of query
Copilot might generate is shown after them):

/generate-sql select numbers 1 through 10

/question what types of security are supported in this warehouse?


/fix using CTAS instead of ALTER TABLE
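
As an illustration, against the WideWorldImporters warehouse used earlier in this
documentation, a prompt such as /generate-sql total profit by sales territory might
produce a query along these lines (illustrative only; Copilot's actual output depends on
your schema and prompt):

SQL

--Example of the kind of query Copilot might return for the prompt above.
SELECT DC.[SalesTerritory], SUM(FS.[Profit]) AS TotalProfit
FROM [dbo].[fact_sale] AS FS
INNER JOIN [dbo].[dimension_city] AS DC
    ON FS.[CityKey] = DC.[CityKey]
GROUP BY DC.[SalesTerritory]
ORDER BY TotalProfit DESC;
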

Related content
Microsoft Copilot for Synapse Data Warehouse
How to: Use Copilot code completion for Synapse Data Warehouse
How to: Use Copilot quick actions for Synapse Data Warehouse
Privacy, security, and responsible use of Copilot for Data Warehouse (preview)


How to: Use Copilot code completion
for Synapse Data Warehouse
Article • 09/30/2024

Applies to: ✅ Warehouse in Microsoft Fabric

Copilot for Data Warehouse provides intelligent autocomplete-style T-SQL code
suggestions to simplify your coding experience.

As you start writing T-SQL code or comments in the editor, Copilot for Data Warehouse
leverages your warehouse schema and query tab context to complement the existing
IntelliSense with inline code suggestions. The completions can come in varied lengths -
sometimes the completion of the current line, and sometimes a whole new block of
code. The code completions support all types of T-SQL queries: data definition language
(DDL), data query language (DQL), and data manipulation language (DML). You can
accept all or part of a suggestion or keep typing to ignore the suggestions. It can also
generate alternative suggestions for you to pick.

Prerequisites
Your administrator needs to enable the tenant switch before you start using
Copilot. For more information, see Copilot tenant settings.
Your F64 or P1 capacity needs to be in one of the regions listed in this article,
Fabric region availability.
If your tenant or capacity is outside the US or France, Copilot is disabled by default
unless your Fabric tenant admin enables the Data sent to Azure OpenAI can be
processed outside your tenant's geographic region, compliance boundary, or
national cloud instance tenant setting in the Fabric Admin portal.
Copilot in Microsoft Fabric isn't supported on trial SKUs. Only paid SKUs (F64 or
higher, or P1 or higher) are supported.
For more information, see Overview of Copilot in Fabric and Power BI.

How can code completions help you?


Code completion enhances your productivity and workflow in Copilot for Data
Warehouse by reducing the cognitive load of writing T-SQL code. It accelerates code
writing, prevents syntax errors and typos, and improves code quality. It provides helpful,
context-rich suggestions directly within the query editor. Whether you're new to or
experienced with SQL, code completion helps you save time and energy with writing
SQL code, and focus on designing, optimizing, and testing your warehouse.

Key capabilities
Auto-complete partially written queries: Copilot can provide context-aware SQL
code suggestions or completions for your partially written T-SQL query.
Generate suggestions from comments: You can guide Copilot using comments
that describe your code logic and purpose, using natural language. Leave the
comment (using -- ) at the beginning of the query and Copilot will generate the
corresponding query.

Get started
1. Verify the Show Copilot completions setting is enabled in your warehouse
settings.

You can also check the setting's status through the status bar at the bottom
of the query editor.

If not enabled, then in your warehouse Settings, select the Copilot pane and
enable the Show Copilot completions option.

2. Start writing your query in the SQL query editor within the warehouse. As you type,
Copilot will provide real-time code suggestions and completions of your query by
presenting a dimmed ghost text.

3. You can then accept the suggestion with the Tab key, or dismiss it. If you do not
want to accept an entire suggestion from Copilot, you can use the Ctrl+Right
keyboard shortcut to accept the next word of a suggestion.

4. Copilot can provide different suggestions for the same input. You can hover over
the suggestion to preview the other options.

5. To help Copilot understand the query you're writing, you can provide context
about what code you expect by leaving a comment with -- . For example, you
could specify which warehouse object, condition, or methods to use; a sample
comment and completion are shown after these steps. Copilot can even
autocomplete your comment to help you write clear and accurate comments
more efficiently.
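
For example, a comment like the one below gives Copilot enough context to suggest a
complete query. The completion shown here is illustrative only; the actual suggestion
depends on your warehouse schema and can differ:

SQL

--Return the ten cities with the highest total sales including tax.
SELECT TOP (10) DC.[City], SUM(FS.[TotalIncludingTax]) AS TotalSales
FROM [dbo].[fact_sale] AS FS
INNER JOIN [dbo].[dimension_city] AS DC
    ON FS.[CityKey] = DC.[CityKey]
GROUP BY DC.[City]
ORDER BY TotalSales DESC;
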

Related content
Microsoft Copilot for Synapse Data Warehouse
How to: Use the Copilot chat pane for Synapse Data Warehouse
How to: Use Copilot quick actions for Synapse Data Warehouse
Privacy, security, and responsible use of Copilot for Data Warehouse (preview)


How to: Use Copilot quick actions for
Fabric Data Warehouse
Article • 11/19/2024

Applies to: ✅ Warehouse in Microsoft Fabric

There are two AI-powered quick actions that are currently supported in Copilot for Data
Warehouse: Explain and Fix.

Quick actions can accelerate productivity by helping you write and understand queries
faster. These buttons are located at the top of the SQL query editor, near the Run
button.

The Explain quick action will leave a summary at the top of the query and in-line
code comments throughout the query to describe what the query is doing.

The Fix quick action will fix errors in your query syntax or logic. After running a SQL
query and being met with an error, you can fix your queries easily. Copilot will
automatically take the SQL error message into context when fixing your query.
Copilot will also leave a comment indicating where and how it has edited the T-
SQL code.

Copilot leverages information about your warehouse schema, query tab contents, and
execution results to give you relevant and useful feedback on your query.

Prerequisites
Your administrator needs to enable the tenant switch before you start using
Copilot. For more information, see Copilot tenant settings.
Your F64 or P1 capacity needs to be in one of the regions listed in this article,
Fabric region availability.
If your tenant or capacity is outside the US or France, Copilot is disabled by default
unless your Fabric tenant admin enables the Data sent to Azure OpenAI can be
processed outside your tenant's geographic region, compliance boundary, or
national cloud instance tenant setting in the Fabric Admin portal.
Copilot in Microsoft Fabric isn't supported on trial SKUs. Only paid SKUs (F64 or
higher, or P1 or higher) are supported.
For more information, see Overview of Copilot in Fabric and Power BI.

Get started
Whether you are a beginner or an expert in writing SQL queries, quick actions allow you
to understand and navigate the complexities of the SQL language to easily solve issues
independently.

Explain
To use Copilot to explain your queries, follow these steps:

1. Highlight the query that you want Copilot to explain. You can select the whole
query or just a part of it.

2. Select the Explain button in the toolbar. Copilot will analyze your query and
generate inline comments that explain what your code does. If applicable, Copilot
will leave a summary at the top of the query as well. The comments will appear
next to the relevant lines of code in your query editor.

3. Review the comments that Copilot generated. You can edit or delete them if you
want. You can also undo the changes if you don't like them, or make further edits.

Fix
To get Copilot's help with fixing an error in your query, follow these steps:

1. Write and run your query as usual. If there are any errors, you will see them in the
output pane.

2. Highlight the query that you want to fix. You can select the whole query or just a
part of it.

3. Select the Fix button in the toolbar. This button will only be enabled after you have
run your T-SQL query and it has returned an error.

4. Copilot will analyze your query and try to find the best way to fix it. It will also add
comments to explain what it fixed and why.

5. Review the changes that Copilot made and select Run to execute the fixed query.
You can also undo the changes if you don't like them, or make further edits.

Related content
Microsoft Copilot for Fabric Data Warehouse
How to: Use Copilot code completion for Fabric Data Warehouse
How to: Use the Copilot chat pane for Fabric Data Warehouse
Privacy, security, and responsible use of Copilot for Data Warehouse (preview)



Better together: the lakehouse and
warehouse
Article • 04/24/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

This article explains the data warehousing workload with the SQL analytics endpoint of
the Lakehouse, and scenarios for use of the Lakehouse in data warehousing.

What is a Lakehouse SQL analytics endpoint?


In Fabric, when you create a lakehouse, a SQL analytics endpoint is automatically created.

The SQL analytics endpoint enables you to query data in the Lakehouse using T-SQL
language and TDS protocol. Every Lakehouse has one SQL analytics endpoint, and each
workspace can have more than one Lakehouse. The number of SQL analytics endpoints
in a workspace matches the number of Lakehouse items.

The SQL analytics endpoint is automatically generated for every Lakehouse and
exposes Delta tables from the Lakehouse as SQL tables that can be queried using
the T-SQL language.
Every delta table from a Lakehouse is represented as one table. Data should be in
delta format.
The default Power BI semantic model is created for every SQL analytics endpoint
and it follows the naming convention of the Lakehouse objects.

There's no need to create a SQL analytics endpoint in Microsoft Fabric, and users can't
create one directly in a workspace. A SQL analytics endpoint is automatically created for
every Lakehouse, so to get one, create a lakehouse.

7 Note

Behind the scenes, the SQL analytics endpoint is using the same engine as the
Warehouse to serve high performance, low latency SQL queries.

Automatic Metadata Discovery


A seamless process reads the delta logs and the files folder, and ensures SQL
metadata for tables, such as statistics, is always up to date. There's no user action
needed, and no need to import, copy data, or set up infrastructure. For more
information, see Automatically generated schema in the SQL analytics endpoint.

Scenarios the Lakehouse enables for data warehousing

In Fabric, we offer one warehouse.

The Lakehouse, with its SQL analytics endpoint, powered by the Warehouse, can simplify
the traditional decision tree of batch, streaming, or lambda architecture patterns.
Together with a warehouse, the lakehouse enables many additive analytics scenarios.
This section explores how to use a Lakehouse together with a Warehouse for a best of
breed analytics strategy.

Analytics with your Fabric Lakehouse's gold layer


One of the well-known strategies for lake data organization is a medallion architecture
where the files are organized in raw (bronze), consolidated (silver), and refined (gold)
layers. A SQL analytics endpoint can be used to analyze data in the gold layer of
medallion architecture if the files are stored in Delta Lake format, even if they're stored
outside the Microsoft Fabric OneLake.

You can use OneLake shortcuts to reference gold folders in external Azure Data Lake
storage accounts that are managed by Synapse Spark or Azure Databricks engines.

Warehouses can also be added as subject area or domain oriented solutions for specific
subject matter that can have bespoke analytics requirements.

If you choose to keep your data in Fabric, it will always be open and accessible through
APIs, Delta format, and of course T-SQL.

Query as a service over your delta tables from Lakehouse and other items from OneLake data hub

There are use cases where an analyst, data scientist, or data engineer might need to
query data within a data lake. In Fabric, this end-to-end experience is completely
SaaSified.
OneLake is a single, unified, logical data lake for the whole organization. OneLake is
OneDrive for data. OneLake can contain multiple workspaces, for example, along your
organizational divisions. Every item in Fabric makes its data accessible via OneLake.

Data in a Microsoft Fabric Lakehouse is physically stored in OneLake with the following
folder structure:

The /Files folder contains raw and unconsolidated (bronze) files that should be
processed by data engineers before they're analyzed. The files might be in various
formats such as CSV, Parquet, different types of images, etc.
The /Tables folder contains refined and consolidated (gold) data that is ready for
business analysis. The consolidated data is in Delta Lake format.

A SQL analytics endpoint can read data in the /tables folder within OneLake. Analysis is
as simple as querying the SQL analytics endpoint of the Lakehouse. Together with the
Warehouse, you also get cross-database queries and the ability to seamlessly switch from
read-only queries to building additional business logic on top of your OneLake data
with Synapse Data Warehouse.

Data Engineering with Spark, and Serving with SQL


Data-driven enterprises need to keep their back-end and analytics systems in near real-
time sync with customer-facing applications. The impact of transactions must reflect
accurately through end-to-end processes, related applications, and online transaction
processing (OLTP) systems.

In Fabric, you can use Spark Streaming or Data Engineering to curate your data. You can
use the Lakehouse SQL analytics endpoint to validate data quality and for existing T-SQL
processes. This can be done in a medallion architecture or within multiple layers of your
Lakehouse, serving bronze, silver, gold, or staging, curated, and refined data. You can
customize the folders and tables created through Spark to meet your data engineering
and business requirements. When ready, a Warehouse can serve all of your downstream
business intelligence applications and other analytics use cases, without copying data,
using Views or refining data using CREATE TABLE AS SELECT (CTAS), stored procedures,
and other DML / DDL commands.

Integration with your Open Lakehouse's gold layer


A SQL analytics endpoint is not scoped to data analytics in just the Fabric Lakehouse. A
SQL analytics endpoint enables you to analyze lake data in any lakehouse, using
Synapse Spark, Azure Databricks, or any other lake-centric data engineering engine. The
data can be stored in Azure Data Lake Storage or Amazon S3.
This tight, bi-directional integration with the Fabric Lakehouse is always accessible
through any engine with open APIs, the Delta format, and of course T-SQL.

Data Virtualization of external data lakes with shortcuts


You can use OneLake shortcuts to reference gold folders in external Azure Data Lake
storage accounts that are managed by Synapse Spark or Azure Databricks engines, as
well as any delta table stored in Amazon S3.

Any folder referenced using a shortcut can be analyzed from a SQL analytics endpoint
and a SQL table is created for the referenced data. The SQL table can be used to expose
data in externally managed data lakes and enable analytics on them.

This shortcut acts as a virtual warehouse that can be leveraged from a warehouse for
additional downstream analytics requirements, or queried directly.

Use the following steps to analyze data in external data lake storage accounts:

1. Create a shortcut that references a folder in Azure Data Lake storage or Amazon S3
account. Once you enter connection details and credentials, a shortcut is shown in
the Lakehouse.
2. Switch to the SQL analytics endpoint of the Lakehouse and find a SQL table that
has a name that matches the shortcut name. This SQL table references the folder in
ADLS/S3 folder.
3. Query the SQL table that references data in ADLS/S3. The table can be used as any
other table in the SQL analytics endpoint. You can join tables that reference data in
different storage accounts.

7 Note

If the SQL table is not immediately shown in the SQL analytics endpoint, you might
need to wait a few minutes. The SQL table that references data in external storage
account is created with a delay.
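For example, assuming the shortcut is named ExternalSales and the Lakehouse also contains a
DimProduct table (both names are illustrative), the shortcut-backed SQL table could be queried
and joined from the SQL analytics endpoint with a sketch like the following:

SQL

SELECT
    p.ProductName,
    SUM(es.SalesAmount) AS TotalSales
FROM dbo.ExternalSales AS es
INNER JOIN dbo.DimProduct AS p
    ON es.ProductKey = p.ProductKey
GROUP BY p.ProductName;

The shortcut-backed table behaves like any other table in the SQL analytics endpoint, so joins
and aggregations require no special syntax.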

Analyze archived or historical data in a data lake


Data partitioning is a well-known data access optimization technique in data lakes.
Partitioned data sets are stored in hierarchical folder structures in the format
/year=<year>/month=<month>/day=<day>, where year, month, and day are the partitioning
columns. This allows you to store historical data logically separated in a format that
allows compute engines to read the data as needed with performant filtering, versus
reading the entire directory and all folders and files contained within.

Partitioned data enables faster access if the queries are filtering on the predicates that
compare predicate columns with a value.

A SQL analytics endpoint can easily read this type of data with no configuration
required. For example, you can use any application to archive data into a data lake,
including SQL Server 2022 or Azure SQL Managed Instance. After you partition data
and land it in a lake for archival purposes with external tables, a SQL analytics endpoint
can read partitioned Delta Lake tables as SQL tables and allow your organization to
analyze them. This reduces the total cost of ownership, reduces data duplication, and
lights up big data, AI, and other analytics scenarios.
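As a minimal sketch, assuming the archived data is exposed as a table named dbo.ArchivedOrders
partitioned by year, month, and day (illustrative names), a query that filters on the partition
columns only needs to read the matching partitions:

SQL

SELECT COUNT(*) AS OrderCount
FROM dbo.ArchivedOrders
WHERE [year] = 2023
  AND [month] = 6
  AND [day] = 15;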

Data virtualization of Fabric data with shortcuts


Within Fabric, workspaces allow you to segregate data based on complex business,
geographic, or regulatory requirements.

A SQL analytics endpoint enables you to leave the data in place and still analyze data in
the Warehouse or Lakehouse, even in other Microsoft Fabric workspaces, via a seamless
virtualization. Every Microsoft Fabric Lakehouse stores data in OneLake.

Shortcuts enable you to reference folders in any OneLake location.

Every Microsoft Fabric Warehouse stores table data in OneLake. If a table is append-
only, the table data is exposed as Delta Lake data in OneLake. Shortcuts enable you to
reference folders in any OneLake where the Warehouse tables are exposed.

Cross workspace sharing and querying


While workspaces allow you to segregate data based on complex business, geographic,
or regulatory requirements, sometimes you need to facilitate sharing across these lines
for specific analytics needs.

A Lakehouse SQL analytics endpoint can enable easy sharing of data between
departments and users, where a user can bring their own capacity and warehouse.
Workspaces organize departments, business units, or analytical domains. Using
shortcuts, users can find any Warehouse or Lakehouse's data. Users can instantly
perform their own customized analytics from the same shared data. In addition to
helping with departmental chargebacks and usage allocation, this is a zero-copy version of
the data as well.
The SQL analytics endpoint enables querying of any table and easy sharing. The added
controls of workspace roles and security roles can be further layered to meet
additional business requirements.

Use the following steps to enable cross-workspace data analytics:

1. Create a OneLake shortcut that references a table or a folder in a workspace that
you can access.
2. Choose a Lakehouse or Warehouse that contains a table or Delta Lake folder that
you want to analyze. Once you select a table/folder, a shortcut is shown in the
Lakehouse.
3. Switch to the SQL analytics endpoint of the Lakehouse and find the SQL table that
has a name that matches the shortcut name. This SQL table references the folder in
another workspace.
4. Query the SQL table that references data in another workspace. The table can be
used as any other table in the SQL analytics endpoint. You can join the tables that
reference data in different workspaces.

7 Note

If the SQL table is not immediately shown in the SQL analytics endpoint, you might
need to wait a few minutes. The SQL table that references data in another
workspace is created with a delay.

Analyze partitioned data


Data partitioning is a well-known data access optimization technique in data lakes.
Partitioned data sets are stored in hierarchical folder structures in the format
/year=<year>/month=<month>/day=<day>, where year, month, and day are the partitioning
columns. Partitioned data sets enable faster data access if the queries filter data
by comparing the partitioning columns with a value.

A SQL analytics endpoint can represent partitioned Delta Lake data sets as SQL tables
and enable you to analyze them.

Related content
What is a lakehouse?
Create a lakehouse with OneLake
Default Power BI semantic models
Load data into the lakehouse
How to copy data using Copy activity in Data pipeline
Tutorial: Move data into lakehouse via Copy assistant
Connectivity
SQL analytics endpoint of the lakehouse
Query the Warehouse



How to: Create a warehouse with case-
insensitive (CI) collation
Article • 11/19/2024

Applies to: ✅ Warehouse in Microsoft Fabric

All Fabric warehouses by default are configured with case-sensitive (CS) collation
Latin1_General_100_BIN2_UTF8. You can also create warehouses with case-insensitive
(CI) collation - Latin1_General_100_CI_AS_KS_WS_SC_UTF8.

Currently, the only method available for creating a case-insensitive data warehouse is via
REST API. This article provides a step-by-step guide on how to create a warehouse with
case-insensitive collation through the REST API. It also explains how to use Visual Studio
Code with the REST Client extension to facilitate the process.

) Important

Once a warehouse is created, the collation setting cannot be changed. Carefully
consider your needs before initiating the creation process.

Prerequisites
A Fabric workspace with an active capacity or trial capacity.
Download and install Visual Studio Code.
Install the REST Client - Visual Studio Marketplace .

API endpoint
To create a warehouse with REST API, use the API endpoint: POST
https://api.fabric.microsoft.com/v1/workspaces/<workspace-id>/items

Here's a sample JSON request body for creating a warehouse:

JSON

{
"type": "Warehouse",
"displayName": "CaseInsensitiveAPIDemo",
"description": "New warehouse with case-insensitive collation",
"creationPayload": {
"defaultCollation": "Latin1_General_100_CI_AS_KS_WS_SC_UTF8"
}
}

Use Visual Studio Code to invoke the REST API


You can easily create a new warehouse with case-insensitive collation using Visual Studio
Code (VS Code) and the REST Client extension. Follow these steps:

1. Create a new text file in VS Code with the .http extension.

2. Input the request details in the file body. Note that there should be a blank line
between the headers and the body, placed after the "Authorization" line.

JSON

POST
https://api.fabric.microsoft.com/v1/workspaces/<workspaceID>/items
HTTP/1.1
Content-Type: application/json
Authorization: Bearer <bearer token>

{
"type": "Warehouse",
"displayName": "<Warehouse name here>",
"description": "<Warehouse description here>",
"creationPayload": {
"defaultCollation": "Latin1_General_100_CI_AS_KS_WS_SC_UTF8"
}
}

3. Replace the placeholder values:

<workspaceID> : Find the workspace GUID in the URL after the /groups/

section, or by running SELECT @@SERVERNAME in an existing warehouse.


<bearer token> : Obtain this by following these steps:

a. Open your Microsoft Fabric workspace in a browser (Microsoft Edge or


Google Chrome).
b. Press F12 to open Developer Tools.
c. Select the Console tab. If necessary, select Expand Quick View to reveal
the console prompt > .
d. Type the command copy(powerBIAccessToken) and press Enter. Although the
console responds with undefined, the bearer token is copied to your
clipboard.
e. Paste it in place of <bearer token> .
<Warehouse name here> : Enter the desired warehouse name.
<Warehouse description here> : Enter the desired warehouse description.

4. Select the Send Request link displayed over your POST command in the VS Code
editor.

5. You should receive a response with the status code 202 Accepted, along with
additional details about your POST request.

6. Go to the newly created warehouse in the Fabric portal.

7. Execute the following T-SQL statement in the Query editor to confirm that the
collation for your warehouse aligns with what you specified in the JSON above:

SQL

SELECT name, collation_name FROM sys.databases;

Related content
Create a Warehouse in Microsoft Fabric
Tables in data warehousing in Microsoft Fabric
Data types in Microsoft Fabric



Create a sample Warehouse in Microsoft
Fabric
Article • 04/24/2024

Applies to: Warehouse in Microsoft Fabric

This article describes how to get started with sample Warehouse using the Microsoft
Fabric portal, including creation and consumption of the warehouse.

How to create a new warehouse with sample data

In this section, we walk you through creating a new Warehouse with sample data.

Create a warehouse sample using the Home hub


1. The first hub in the navigation pane is the Home hub. You can start creating your
warehouse sample from the Home hub by selecting the Warehouse sample card
under the New section.

2. Provide the name for your sample warehouse and select Create.
3. The create action creates a new Warehouse and starts loading sample data into it.
The data loading takes a few seconds to complete.

4. On completion of loading sample data, the warehouse opens with data loaded into
tables and views to query.

Load sample data into existing warehouse


1. Once you have created your warehouse, you can load sample data into the warehouse
from the Use sample database card.

2. The data loading takes a few seconds to complete.


3. On completion of loading sample data, the warehouse displays data loaded into
tables and views to query.

Sample scripts
Your new warehouse is ready to accept T-SQL queries. The following sample T-SQL
scripts can be used on the sample data in your new warehouse.

7 Note
It is important to note that much of the functionality described in this section is
also available to users via a TDS endpoint connection and tools such as SQL
Server Management Studio (SSMS) or Azure Data Studio (for users who prefer to
use T-SQL for the majority of their data processing needs). For more information,
see Connectivity or Query a warehouse.

SQL

/*************************************************
Get number of trips performed by each medallion
**************************************************/

SELECT
M.MedallionID
,M.MedallionCode
,COUNT(T.TripDistanceMiles) AS TotalTripCount
FROM
dbo.Trip AS T
JOIN
dbo.Medallion AS M
ON
T.MedallionID=M.MedallionID
GROUP BY
M.MedallionID
,M.MedallionCode

/****************************************************
How many passengers are being picked up on each trip?
*****************************************************/
SELECT
PassengerCount,
COUNT(*) AS CountOfTrips
FROM
dbo.Trip
WHERE
PassengerCount > 0
GROUP BY
PassengerCount
ORDER BY
PassengerCount

/*********************************************************************************
What is the distribution of trips by hour on working days (non-holiday weekdays)?
*********************************************************************************/
SELECT
ti.HourlyBucket,
COUNT(*) AS CountOfTrips
FROM dbo.Trip AS tr
INNER JOIN dbo.Date AS d
ON tr.DateID = d.DateID
INNER JOIN dbo.Time AS ti
ON tr.PickupTimeID = ti.TimeID
WHERE
d.IsWeekday = 1
AND d.IsHolidayUSA = 0
GROUP BY
ti.HourlyBucket
ORDER BY
ti.HourlyBucket

Related content
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
Warehouse settings and context menus



Synapse Data Warehouse in Microsoft
Fabric performance guidelines
Article • 09/24/2024

Applies to: ✅ Warehouse in Microsoft Fabric

These are guidelines to help you understand performance of your Warehouse in
Microsoft Fabric. In this article, you'll find guidance and important articles to focus on.
Warehouse in Microsoft Fabric is a SaaS platform where activities like workload
management, concurrency, and storage management are managed internally by the
platform. In addition to this internal performance management, you can still improve
your performance by developing performant queries against well-designed warehouses.

Cold run (cold cache) performance


Caching with local SSD and memory is automatic. The first 1-3 executions of a query
perform noticeably slower than subsequent executions. If you are experiencing cold run
performance issues, here are a couple of things you can do that can improve your cold
run performance:

If the first run's performance is crucial, try manually creating statistics. Review the
statistics article to better understand the role of statistics and for guidance on how
to create manual statistics to improve your query performance. However, if the first
run's performance is not critical, you can rely on automatic statistics that will be
generated in the first query and will continue to be leveraged in subsequent runs
(so long as underlying data does not change significantly).

If using Power BI, use Direct Lake mode where possible.

Metrics for monitoring performance


Currently, the Monitoring Hub does not include Warehouse. If you choose Data
Warehouse, you will not be able to access the Monitoring Hub from the navigation bar.

Fabric administrators will be able to access the Capacity Utilization and Metrics report
for up-to-date information tracking the utilization of capacity that includes Warehouse.
Use dynamic management views (DMVs) to
monitor query execution
You can use dynamic management views (DMVs) to monitor connection, session, and
request status in the Warehouse.

Statistics
The Warehouse uses a query engine to create an execution plan for a given SQL query.
When you submit a query, the query optimizer tries to enumerate all possible plans and
choose the most efficient candidate. To determine which plan would require the least
overhead, the engine needs to be able to evaluate the amount of work or rows that
might be processed by each operator. Then, based on each plan's cost, it chooses the
one with the least amount of estimated work. Statistics are objects that contain relevant
information about your data, to allow the query optimizer to estimate these costs.

You can also manually update statistics after each data load or data update to assure
that the best query plan can be built.

For more information about statistics and how you can augment the automatically created
statistics, see Statistics in Fabric data warehousing.
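As a minimal sketch, the following creates and then refreshes single-column statistics,
assuming a table dbo.DimCustomer with a CustomerKey column (illustrative names):

SQL

-- Create single-column statistics ahead of the first query run
CREATE STATISTICS DimCustomer_CustomerKey_stats
ON dbo.DimCustomer (CustomerKey) WITH FULLSCAN;

-- Refresh the statistics after a significant data load
UPDATE STATISTICS dbo.DimCustomer (DimCustomer_CustomerKey_stats) WITH FULLSCAN;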

Data ingestion guidelines


There are four options for data ingestion into a Warehouse:

COPY (Transact-SQL)
Data pipelines
Dataflows
Cross-warehouse ingestion

To help determine which option is best for you and to review some data ingestion best
practices, review Ingest data.
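As a hedged sketch, a COPY (Transact-SQL) statement that loads Parquet files from Azure storage
into a warehouse table could look like the following; the table name, storage URL, and credential
are placeholders you would replace with your own values:

SQL

COPY INTO dbo.Trip
FROM 'https://<storage-account>.blob.core.windows.net/<container>/<folder>/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET'
    -- , CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>')
);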

Group INSERT statements into batches (avoid trickle inserts)

A one-time load to a small table with an INSERT statement, such as shown in the
following example, might be the best approach depending on your needs. However, if
you need to load thousands or millions of rows throughout the day, singleton INSERTS
aren't optimal.

SQL

INSERT INTO MyLookup VALUES (1, 'Type 1')

For guidance on how to handle these trickle-load scenarios, see Best practices for
ingesting data.

Minimize transaction sizes


INSERT, UPDATE, and DELETE statements run in a transaction. When they fail, they must
be rolled back. To reduce the potential for a long rollback, minimize transaction sizes
whenever possible. Minimizing transaction sizes can be done by dividing INSERT,
UPDATE, and DELETE statements into parts. For example, if you have an INSERT that you
expect to take 1 hour, you can break up the INSERT into four parts. Each run will then be
shortened to 15 minutes.

Consider using CTAS (Transact-SQL) to write the data you want to keep in a table rather
than using DELETE. If a CTAS takes the same amount of time, it's safer to run since it has
minimal transaction logging and can be canceled quickly if needed.
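For example, rather than deleting old rows from a large table, a CTAS sketch like the following
writes only the rows you want to keep into a new table (illustrative table and column names);
you can then drop or rename the original table once the result is validated:

SQL

CREATE TABLE dbo.Orders_Current
AS
SELECT *
FROM dbo.Orders
WHERE OrderDate >= '2024-01-01';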

Collocate client applications and Microsoft Fabric

If you're using client applications, make sure you're using Microsoft Fabric in a region
that's close to your client computer. Client application examples include Power BI
Desktop, SQL Server Management Studio, and Azure Data Studio.

Utilize star schema data design


A star schema organizes data into fact tables and dimension tables. It facilitates
analytical processing by denormalizing the data from highly normalized OLTP systems,
ingesting transactional data, and enterprise master data into a common, cleansed, and
verified data structure that minimizes joins at query time, reduces the number of rows
read and facilitates aggregations and grouping processing.

For more Warehouse design guidance, see Tables in data warehousing.


Reduce query result set sizes
Reducing query result set sizes helps you avoid client-side issues caused by large query
results. The SQL Query editor results sets are limited to the first 10,000 rows to avoid
these issues in this browser-based UI. If you need to return more than 10,000 rows, use
SQL Server Management Studio (SSMS) or Azure Data Studio.

Choose the best data type for performance


When defining your tables, use the smallest data type that supports your data as doing
so will improve query performance. This recommendation is important for CHAR and
VARCHAR columns. If the longest value in a column is 25 characters, then define your
column as VARCHAR(25). Avoid defining all character columns with a large default
length.

Use integer-based data types if possible. SORT, JOIN, and GROUP BY operations
complete faster on integers than on character data.

For supported data types and more information, see data types.
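As an illustrative sketch, a right-sized table definition might look like the following, using
the smallest types that fit the data rather than large default lengths:

SQL

CREATE TABLE dbo.ProductCode
(
    ProductCodeKey INT         NOT NULL,
    ProductCode    VARCHAR(25) NOT NULL,  -- longest expected value is 25 characters
    IsActive       BIT         NOT NULL
);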

SQL analytics endpoint performance


For information and recommendations on performance of the SQL analytics endpoint,
see SQL analytics endpoint performance considerations.

Data Compaction
Data compaction consolidates smaller Parquet files into fewer, larger files, which
optimizes read operations. This process also helps in efficiently managing deleted rows
by eliminating them from immutable Parquet files. The data compaction process
involves re-writing tables or segments of tables into new Parquet files that are optimized
for performance. For more information, see Blog: Automatic Data Compaction for Fabric
Warehouse .

The data compaction process is seamlessly integrated into the warehouse. As queries
are executed, the system identifies tables that could benefit from compaction and
performs necessary evaluations. There is no manual way to trigger data compaction.

Related content
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
Limitations
Troubleshoot the Warehouse
Data types
T-SQL surface area
Tables in data warehouse
Caching in Fabric data warehousing



SQL analytics endpoint performance
considerations
Article • 11/04/2024

Applies to: ✅ SQL analytics endpoint in Microsoft Fabric

The SQL analytics endpoint enables you to query data in the lakehouse using T-SQL
language and TDS protocol. Every lakehouse has one SQL analytics endpoint. The
number of SQL analytics endpoints in a workspace matches the number of lakehouses
and mirrored databases provisioned in that one workspace.

A background process is responsible for scanning lakehouse for changes, and keeping
SQL analytics endpoint up-to-date for all the changes committed to lakehouses in a
workspace. The sync process is transparently managed by Microsoft Fabric platform.
When a change is detected in a lakehouse, a background process updates metadata and
the SQL analytics endpoint reflects the changes committed to lakehouse tables. Under
normal operating conditions, the lag between a lakehouse and SQL analytics endpoint is
less than one minute. The actual length of time can vary from a few seconds to minutes
depending on a number of factors that are discussed in this article.

Automatically generated schema in the SQL analytics endpoint of the Lakehouse

The SQL analytics endpoint manages the automatically generated tables so the
workspace users can't modify them. Users can enrich the database model by adding
their own SQL schemas, views, procedures, and other database objects.

For every Delta table in your Lakehouse, the SQL analytics endpoint automatically
generates a table in the appropriate schema. For autogenerated schema data types for
the SQL analytics endpoint, see Data types in Microsoft Fabric.

Tables in the SQL analytics endpoint are created with a minor delay. Once you create or
update Delta Lake table in the lake, the SQL analytics endpoint table that references the
Delta lake table will be created/refreshed automatically.

The amount of time it takes to refresh the table is related to how optimized the Delta
tables are. For more information, review Delta Lake table optimization and V-Order to
learn more about key scenarios, and an in-depth guide on how to efficiently maintain
Delta tables for maximum performance.
You can manually force a refresh of the automatic metadata scanning in the Fabric
portal. On the page for the SQL analytics endpoint, select the Refresh button in the
Explorer toolbar to refresh the schema. Go to Query your SQL analytics endpoint, and
look for the Refresh button.

Guidance
Automatic metadata discovery tracks changes committed to lakehouses, and is a
single instance per Fabric workspace. If you are observing increased latency for
changes to sync between lakehouses and SQL analytics endpoint, it could be due
to large number of lakehouses in one workspace. In such a scenario, consider
migrating each lakehouse to a separate workspace as this allows automatic
metadata discovery to scale.
Parquet files are immutable by design. When there's an update or a delete
operation, a Delta table will add new parquet files with the changeset, increasing
the number of files over time, depending on frequency of updates and deletes. If
there's no maintenance scheduled, eventually, this pattern creates a read overhead
and this impacts time it takes to sync changes to SQL analytics endpoint. To
address this, schedule regular lakehouse table maintenance operations.
In some scenarios, you might observe that changes committed to a lakehouse are
not visible in the associated SQL analytics endpoint. For example, you might have
created a new table in lakehouse, but it's not listed in the SQL analytics endpoint.
Or, you might have committed a large number of rows to a table in a lakehouse
but this data is not visible in SQL analytics endpoint. We recommend initiating an
on-demand metadata sync, triggered from the SQL query editor Refresh ribbon
option. This option forces an on-demand metadata sync, rather than waiting on
the background metadata sync to finish.
Not all Delta features are understood by the automatic sync process. For more
information on the functionality supported by each engine in Fabric, see Delta Lake
Interoperability.
If there is an extremely large volume of table changes during extract, transform,
and load (ETL) processing, an expected delay could occur until all the
changes are processed.

Partition size considerations


The choice of partition column for a delta table in a lakehouse also affects the time it
takes to sync changes to SQL analytics endpoint. The number and size of partitions of
the partition column are important for performance:

A column with high cardinality (mostly or entirely made of unique values) results in
a large number of partitions. A large number of partitions negatively impacts
performance of the metadata discovery scan for changes. If the cardinality of a
column is high, choose another column for partitioning.
The size of each partition can also affect performance. Our recommendation is to
use a column that would result in a partition of at least (or close to) 1 GB. We
recommend following best practices for delta tables maintenance; optimization.
For a python script to evaluate partitions, see Sample script for partition details.

A large volume of small-sized parquet files increases the time it takes to sync changes
between a lakehouse and its associated SQL analytics endpoint. You might end up with
large number of parquet files in a delta table for one or more reasons:

If you choose a partition for a delta table with high number of unique values, it's
partitioned by each unique value and might be over-partitioned. Choose a
partition column that doesn't have a high cardinality, and results in individual
partition size of at least 1 GB.
Batch and streaming data ingestion rates might also result in small files depending
on frequency and size of changes being written to a lakehouse. For example, there
might be small volume of changes coming through to the lakehouse and this
would result in small parquet files. To address this, we recommend implementing
regular lakehouse table maintenance.

Sample script for partition details


Use the following notebook to print a report detailing size and details of partitions
underpinning a delta table.

1. First, you must provide the ABFSS path for your delta table in the variable
delta_table_path .

You can get ABFSS path of a delta table from the Fabric portal Explorer.
Right-click on table name, then select COPY PATH from the list of options.

2. The script outputs all partitions for the delta table.


3. The script iterates through each partition to calculate the total size and number of
files.
4. The script outputs the details of partitions, files per partitions, and size per
partition in GB.
The complete script can be copied from the following code block:

Python

# Purpose: Print out details of partitions, files per partition, and size per partition in GB.
from notebookutils import mssparkutils

# Define ABFSS path for your delta table. You can get the ABFSS path of a delta table by
# right-clicking on the table name and selecting COPY PATH from the list of options.
delta_table_path = "abfss://<workspace id>@<onelake>.dfs.fabric.microsoft.com/<lakehouse id>/Tables/<tablename>"

# List all partitions for the given delta table
partitions = mssparkutils.fs.ls(delta_table_path)

# Initialize a dictionary to store partition details
partition_details = {}

# Iterate through each partition
for partition in partitions:
    if partition.isDir:
        partition_name = partition.name
        partition_path = partition.path
        files = mssparkutils.fs.ls(partition_path)

        # Calculate the total size of the partition
        total_size = sum(file.size for file in files if not file.isDir)

        # Count the number of files
        file_count = sum(1 for file in files if not file.isDir)

        # Write partition details
        partition_details[partition_name] = {
            "size_bytes": total_size,
            "file_count": file_count
        }

# Print the partition details
for partition_name, details in partition_details.items():
    print(f"{partition_name}, Size: {details['size_bytes']:.2f} bytes, Number of files: {details['file_count']}")

Related content
Better together: the lakehouse and warehouse
Synapse Data Warehouse in Microsoft Fabric performance guidelines
Limitations of the SQL analytics endpoint



Tables in data warehousing in Microsoft
Fabric
Article • 10/09/2024

Applies to: ✅ Warehouse in Microsoft Fabric

This article details key concepts for designing tables in Microsoft Fabric.

In tables, data is logically organized in a row-and-column format. Each row represents a


unique record, and each column represents a field in the record.

In Warehouse, tables are database objects that contain all the transactional data.

Determine table category


A star schema organizes data into fact tables and dimension tables. Some tables are
used for integration or staging data before moving to a fact or dimension table. As you
design a table, decide whether the table data belongs in a fact, dimension, or
integration table. This decision informs the appropriate table structure.

Fact tables contain quantitative data that are commonly generated in a


transactional system, and then loaded into the data warehouse. For example, a
retail business generates sales transactions every day, and then loads the data into
a data warehouse fact table for analysis.

Dimension tables contain attribute data that might change but usually changes
infrequently. For example, a customer's name and address are stored in a
dimension table and updated only when the customer's profile changes. To
minimize the size of a large fact table, the customer's name and address don't
need to be in every row of a fact table. Instead, the fact table and the dimension
table can share a customer ID. A query can join the two tables to associate a
customer's profile and transactions.

Integration tables provide a place for integrating or staging data. For example,
you can load data to a staging table, perform transformations on the data in
staging, and then insert the data into a production table.

A table stores data in OneLake as part of the Warehouse. The table and the data persist
whether or not a session is open.
Tables in the Warehouse
To show the organization of the tables, you could use fact , dim , or int as prefixes to
the table names. The following table shows some of the schema and table names for
WideWorldImportersDW sample data warehouse.

WideWorldImportersDW Source Table Name    Table Type    Data Warehouse Table Name
City                                      Dimension     wwi.DimCity
Order                                     Fact          wwi.FactOrder

Table names are case sensitive.


Table names can't contain / or \ or end with a . .

Create a table
For Warehouse, you can create a table as a new empty table. You can also create and
populate a table with the results of a select statement. The following are the T-SQL
commands for creating a table.

T-SQL Statement           Description
CREATE TABLE              Creates an empty table by defining all the table columns and options.
CREATE TABLE AS SELECT    Populates a new table with the results of a select statement. The table
                          columns and data types are based on the select statement results. To
                          import data, this statement can select from an external table.

This example creates a table with two columns:

SQL

CREATE TABLE MyTable (col1 int, col2 int );
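A CREATE TABLE AS SELECT (CTAS) sketch that creates and populates a summary table from the
table above; the aggregation shown is illustrative:

SQL

CREATE TABLE MyTableSummary
AS
SELECT col1, COUNT(*) AS RowsPerCol1
FROM MyTable
GROUP BY col1;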

Schema names
Warehouse supports the creation of custom schemas. Like in SQL Server, schemas are a
good way to group together objects that are used in a similar fashion. The following
code creates a user-defined schema called wwi .

Schema names are case sensitive.


Schema names can't contain / or \ or end with a . .

SQL

CREATE SCHEMA wwi;

Data types
Microsoft Fabric supports the most commonly used T-SQL data types.

For more about data types, see Data types in Microsoft Fabric.
When you create a table in Warehouse, review the data types reference in CREATE
TABLE (Transact-SQL).
For a guide to create a table in Warehouse, see Create tables.

Collation
Latin1_General_100_BIN2_UTF8 is the default collation for both tables and metadata.

You can create a warehouse with the case-insensitive (CI) collation


Latin1_General_100_CI_AS_KS_WS_SC_UTF8 . For more information, see How to: Create a

warehouse with case insensitive (CI) collation.

Supported collations in the API are:

Latin1_General_100_BIN2_UTF8 (default)
Latin1_General_100_CI_AS_KS_WS_SC_UTF8

Once the collation is set during database creation, all subsequent objects (tables,
columns, etc.) will inherit this default collation.

Statistics
The query optimizer uses column-level statistics when it creates the plan for executing a
query. To improve query performance, it's important to have statistics on individual
columns, especially columns used in query joins. Warehouse supports automatic
creation of statistics.

Statistical updating doesn't happen automatically. Update statistics after a significant


number of rows are added or changed. For instance, update statistics after a load. For
more information, see Statistics.

Primary key, foreign key, and unique key


For Warehouse, PRIMARY KEY and UNIQUE constraint are only supported when
NONCLUSTERED and NOT ENFORCED are both used.

FOREIGN KEY is only supported when NOT ENFORCED is used.

For syntax, check ALTER TABLE.


For more information, see Primary keys, foreign keys, and unique keys in
Warehouse in Microsoft Fabric.

Align source data with the data warehouse


Warehouse tables are populated by loading data from another data source. To achieve a
successful load, the number and data types of the columns in the source data must align
with the table definition in the data warehouse.

If data is coming from multiple data stores, you can port the data into the data
warehouse and store it in an integration table. Once data is in the integration table, you
can use the power of data warehouse to implement transformation operations. Once
the data is prepared, you can insert it into production tables.

Limitations
Warehouse supports many, but not all, of the table features offered by other databases.

The following list shows some of the table features that aren't currently supported.

1024 maximum columns per table


Computed columns
Indexed views
Partitioned tables
Sequence
Sparse columns
Surrogate keys on number sequences with Identity columns
Synonyms
Temporary tables
Triggers
Unique indexes
User-defined types

) Important

There are limitations with adding table constraints or columns when using Source
Control with Warehouse.

Related content
What is data warehousing in Microsoft Fabric?
What is data engineering in Microsoft Fabric?
Create a Warehouse
Query a warehouse
OneLake overview
Create tables in Warehouse
Transactions and modify tables



Data types in Microsoft Fabric
Article • 10/31/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

Tables in Microsoft Fabric support the most commonly used T-SQL data types.

For more information on table creation, see Tables.


For syntax, see CREATE TABLE

Data types in Warehouse


Warehouse supports a subset of T-SQL data types. Each offered data type is based on
the SQL Server data type of the same name. For more information, refer to the reference
article for each data type in the following table.

Category                             Supported data types
Exact numerics                       bit
                                     smallint
                                     int
                                     bigint
                                     decimal/numeric
Approximate numerics                 float
                                     real
Date and time                        date
                                     time*
                                     datetime2*
Fixed-length character strings       char
Variable length character strings    varchar***
Binary strings                       varbinary***
                                     uniqueidentifier**
* The precision for datetime2 and time is limited to 6 digits of precision on fractions of
seconds.

** The uniqueidentifier data type is a T-SQL data type without a matching data type in
Delta Parquet. As a result, it's stored as a binary type. Warehouse supports storing and
reading uniqueidentifier columns, but these values can't be read on the SQL analytics
endpoint. Reading uniqueidentifier values in the lakehouse displays a binary
representation of the original values. As a result, features such as cross-joins between
Warehouse and SQL analytics endpoint using a uniqueidentifier column don't work as
expected.

*** Support for varchar (max) and varbinary (max) is currently in preview.

For more information about the supported data types including their precisions, see
data types in CREATE TABLE reference.

Unsupported data types


For T-SQL data types that aren't currently supported, some alternatives are available.
Make sure you evaluate the use of these types, as precision and query behavior vary:

Unsupported data type         Alternatives available
money and smallmoney          Use decimal, however note that it can't store the monetary unit.
datetime and smalldatetime    Use datetime2.
datetimeoffset                Use datetime2, however you can use datetimeoffset for converting
                              data with CAST and the AT TIME ZONE (Transact-SQL) function. For an
                              example, see datetimeoffset.
nchar and nvarchar            Use char and varchar respectively, as there's no similar unicode
                              data type in Parquet. The char and varchar types in a UTF-8
                              collation might use more storage than nchar and nvarchar to store
                              unicode data. To understand the impact on your environment, see
                              Storage differences between UTF-8 and UTF-16.
text and ntext                Use varchar.
image                         Use varbinary.
tinyint                       Use smallint.
geography                     No equivalent.

Unsupported data types can still be used in T-SQL code for variables, or any in-memory
use in session. Creating tables or views that persist data on disk with any of these types
isn't allowed.

For a guide to create a table in Warehouse, see Create tables.

Autogenerated data types in the SQL analytics endpoint

The tables in SQL analytics endpoint are automatically created whenever a table is
created in the associated lakehouse. The column types in the SQL analytics endpoint
tables are derived from the source Delta types.

The rules for mapping original Delta types to the SQL types in SQL analytics endpoint
are shown in the following table:

Delta data type                    SQL data type (mapped)
LONG, BIGINT                       bigint
BOOLEAN, BOOL                      bit
INT, INTEGER                       int
TINYINT, BYTE, SMALLINT, SHORT     smallint
DOUBLE                             float
FLOAT, REAL                        real
DATE                               date
TIMESTAMP                          datetime2
CHAR(n)                            varchar(n) with Latin1_General_100_BIN2_UTF8 collation
STRING, VARCHAR(n)                 varchar(n) with Latin1_General_100_BIN2_UTF8 collation
STRING, VARCHAR(MAX)               varchar(MAX) with Latin1_General_100_BIN2_UTF8 collation
BINARY                             varbinary(n)
DECIMAL, DEC, NUMERIC              decimal(p,s)

Columns with types that aren't listed in the table aren't represented as table columns
in the SQL analytics endpoint.

Related content
T-SQL Surface Area in Microsoft Fabric



T-SQL surface area in Microsoft Fabric
Article • 11/19/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

This article covers the T-SQL language syntax capabilities of Microsoft Fabric, when
querying the SQL analytics endpoint or Warehouse.

These limitations apply only to Warehouse and SQL analytics endpoint items in Fabric
Synapse Data Warehouse. For limitations of SQL Database in Fabric, see Limitations in
SQL Database in Microsoft Fabric (Preview).

7 Note

For more information on upcoming feature development for Fabric Data


Warehouse, see the Fabric Data Warehouse release plan.

T-SQL surface area


Creating, altering, and dropping tables, and insert, update, and delete are only
supported in Warehouse in Microsoft Fabric, not in the SQL analytics endpoint of
the Lakehouse.
You can create your own T-SQL views, functions, and procedures on top of the
tables that reference your Delta Lake data in the SQL analytics endpoint of the
Lakehouse.
For more about CREATE/DROP TABLE support, see Tables.
Fabric Warehouse and SQL analytics endpoint both support standard, sequential,
and nested CTEs. While CTEs are generally available in Microsoft Fabric, nested
CTEs are currently a preview feature. For more information, see Nested Common
Table Expression (CTE) in Fabric data warehousing (Transact-SQL). A brief CTE
example follows this list.
For more about data types, see Data types.
TRUNCATE Table is supported in Warehouse in Microsoft Fabric.
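
As a brief sketch of standard and sequential CTEs, where the dbo.Orders table and its columns
are illustrative:

SQL

WITH RecentOrders AS
(
    SELECT O_OrderKey, O_TotalPrice
    FROM dbo.Orders
    WHERE O_OrderDate >= '2024-01-01'
),
LargeRecentOrders AS
(
    SELECT O_OrderKey, O_TotalPrice
    FROM RecentOrders
    WHERE O_TotalPrice > 1000
)
SELECT COUNT(*) AS LargeOrderCount
FROM LargeRecentOrders;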

Limitations
At this time, the following list of commands is NOT currently supported. Don't try to use
these commands. Even though they might appear to succeed, they could cause issues to
your warehouse.

ALTER TABLE ADD / ALTER / DROP COLUMN


Currently, only the following subset of ALTER TABLE operations in Warehouse in
Microsoft Fabric are supported:
ADD nullable columns of supported column data types.
ADD or DROP PRIMARY KEY, UNIQUE, and FOREIGN_KEY column constraints,
but only if the NOT ENFORCED option has been specified. All other ALTER
TABLE operations are blocked.
There are limitations with adding table constraints or columns when using
Source Control with Warehouse.
BULK LOAD

CREATE ROLE

CREATE USER

Hints
IDENTITY Columns
Manually created multi-column stats
Materialized views
MERGE
OPENROWSET

PREDICT

Queries targeting system and user tables


Recursive queries
Result Set Caching
Schema and table names can't contain / or \
SELECT - FOR XML
SET ROWCOUNT

SET TRANSACTION ISOLATION LEVEL


sp_showspaceused

Temporary tables
Triggers

Related content
Query insights in Fabric data warehousing
What is data warehousing in Microsoft Fabric?
Data types in Microsoft Fabric
Limitations in Microsoft Fabric



Primary keys, foreign keys, and unique
keys in Warehouse in Microsoft Fabric
Article • 09/24/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

Learn about table constraints in SQL analytics endpoint and Warehouse in Microsoft
Fabric, including the primary key, foreign keys, and unique keys.

) Important

To add or remove primary key, foreign key, or unique constraints, use ALTER TABLE.
These cannot be created inline within a CREATE TABLE statement.

Table constraints
SQL analytics endpoint and Warehouse in Microsoft Fabric support these table
constraints:

PRIMARY KEY is only supported when NONCLUSTERED and NOT ENFORCED are
both used.
FOREIGN KEY is only supported when NOT ENFORCED is used.
UNIQUE constraint is only supported when NONCLUSTERED and NOT ENFORCED
are both used.

For syntax, check ALTER TABLE.

SQL analytics endpoint and Warehouse don't support default constraints at this
time.
For more information on tables, see Tables in data warehousing in Microsoft Fabric.

) Important

There are limitations with adding table constraints or columns when using Source
Control with Warehouse.

Examples
Create a Microsoft Fabric Warehouse table with a primary key:

SQL

CREATE TABLE PrimaryKeyTable (c1 INT NOT NULL, c2 INT);

ALTER TABLE PrimaryKeyTable ADD CONSTRAINT PK_PrimaryKeyTable PRIMARY KEY NONCLUSTERED (c1) NOT ENFORCED;

Create a Microsoft Fabric Warehouse table with a unique constraint:

SQL

CREATE TABLE UniqueConstraintTable (c1 INT NOT NULL, c2 INT);

ALTER TABLE UniqueConstraintTable ADD CONSTRAINT UK_UniqueConstraintTablec1 UNIQUE NONCLUSTERED (c1) NOT ENFORCED;

Create a Microsoft Fabric Warehouse table with a foreign key:

SQL

CREATE TABLE ForeignKeyReferenceTable (c1 INT NOT NULL);

ALTER TABLE ForeignKeyReferenceTable ADD CONSTRAINT PK_ForeignKeyReferenceTable PRIMARY KEY NONCLUSTERED (c1) NOT ENFORCED;

CREATE TABLE ForeignKeyTable (c1 INT NOT NULL, c2 INT);

ALTER TABLE ForeignKeyTable ADD CONSTRAINT FK_ForeignKeyTablec1 FOREIGN KEY (c1) REFERENCES ForeignKeyReferenceTable (c1) NOT ENFORCED;

Related content
Design tables in Warehouse in Microsoft Fabric
Data types in Microsoft Fabric
What is data warehousing in Microsoft Fabric?
What is data engineering in Microsoft Fabric?
Warehouse in Microsoft Fabric
Create a Warehouse
Query a warehouse



Generate unique identifiers in a
warehouse table in Microsoft Fabric
Article • 11/19/2024

Applies to: ✅ Warehouse in Microsoft Fabric

It's a common requirement in data warehouses to assign a unique identifier to each row
of a table. In SQL Server-based environments this is typically done by creating an
identity column in a table, however currently this feature isn't supported in a warehouse
in Microsoft Fabric. Instead, you need to use a workaround technique. We present two
alternatives.

This article describes workaround techniques to generate unique identifiers in a


warehouse table.

Method 1
This method is most applicable when you need to create identity values, but the order
of the values isn't important (nonsequential values are acceptable).

Unique values are generated in the code that inserts data into the table.

1. To create unique data using this method, create a table that includes a column that
stores unique identifier values. The column data type should be set to bigint. You
should also define the column as NOT NULL to ensure that every row is assigned an
identifier.

The following T-SQL code sample creates an example table named


Orders_with_Identifier in the dbo schema, where the Row_ID column serves as a
unique key.

SQL

--Drop a table named 'Orders_with_Identifier' in schema 'dbo', if it exists
IF OBJECT_ID('[dbo].[Orders_with_Identifier]', 'U') IS NOT NULL
    DROP TABLE [dbo].[Orders_with_Identifier];
GO

CREATE TABLE [dbo].[Orders_with_Identifier] (
    [Row_ID] BIGINT NOT NULL,
    [O_OrderKey] BIGINT NULL,
    [O_CustomerKey] BIGINT NULL,
    [O_OrderStatus] VARCHAR(1) NULL,
    [O_TotalPrice] DECIMAL(15, 2) NULL,
    [O_OrderDate] DATE NULL,
    [O_OrderPriority] VARCHAR(15) NULL,
    [O_Clerk] VARCHAR(15) NULL,
    [O_ShipPriority] INT NULL,
    [O_Comment] VARCHAR(79) NULL
);

2. When you insert rows into the table, via T-SQL scripts or application code or
otherwise, generate unique data for Row_ID with the NEWID() function. This
function generates a unique value of type uniqueidentifier which can then be cast
and stored as a bigint.

The following code inserts rows into the dbo.Orders_with_Identifier table. The
values for the Row_ID column are computed by converting the values returned by
the newid() function. The function doesn't require an ORDER BY clause and
generates a new value for each record.

SQL

--Insert new rows with unique identifiers
INSERT INTO [dbo].[Orders_with_Identifier]
SELECT
    CONVERT(BIGINT, CONVERT(VARBINARY, CONCAT(NEWID(), GETDATE()))) AS [Row_ID],
    [src].[O_OrderKey],
    [src].[O_CustomerKey],
    [src].[O_OrderStatus],
    [src].[O_TotalPrice],
    [src].[O_OrderDate],
    [src].[O_OrderPriority],
    [src].[O_Clerk],
    [src].[O_ShipPriority],
    [src].[O_Comment]
FROM [dbo].[Orders] AS [src];

Method 2
This method is most applicable when you need to create sequential identity values but
should be used with caution on larger datasets as it can be slower than alternative
methods. Considerations should also be made for multiple processes inserting data
simultaneously as this could lead to duplicate values.

1. To create unique data using this method, create a table that includes a column that
stores unique identifier values. The column data type should be set to int or bigint,
depending on the volume of data you expect to store. You should also define the
column as NOT NULL to ensure that every row is assigned an identifier.

The following T-SQL code sample creates an example table named


Orders_with_Identifier in the dbo schema, where the Row_ID column serves as a

unique key.

SQL

--Drop a table named 'Orders_with_Identifier' in schema 'dbo', if it exists
IF OBJECT_ID('[dbo].[Orders_with_Identifier]', 'U') IS NOT NULL
    DROP TABLE [dbo].[Orders_with_Identifier];
GO

CREATE TABLE [dbo].[Orders_with_Identifier] (
    [Row_ID] BIGINT NOT NULL,
    [O_OrderKey] BIGINT NULL,
    [O_CustomerKey] BIGINT NULL,
    [O_OrderStatus] VARCHAR(1) NULL,
    [O_TotalPrice] DECIMAL(15, 2) NULL,
    [O_OrderDate] DATE NULL,
    [O_OrderPriority] VARCHAR(15) NULL,
    [O_Clerk] VARCHAR(15) NULL,
    [O_ShipPriority] INT NULL,
    [O_Comment] VARCHAR(79) NULL
);
GO

2. Before you insert rows into the table, you need to determine the last identifier
value stored in the table. You can do that by retrieving the maximum identifier
value. This value should be assigned to a variable so you can refer to it when you
insert table rows (in the next step).

The following code assigns the last identifier value to a variable named @MaxID .

SQL

--Assign the last identifier value to a variable
--If the table doesn't contain any rows, assign zero to the variable
DECLARE @MaxID AS BIGINT;

IF EXISTS(SELECT * FROM [dbo].[Orders_with_Identifier])
    SET @MaxID = (SELECT MAX([Row_ID]) FROM [dbo].[Orders_with_Identifier]);
ELSE
    SET @MaxID = 0;
3. When you insert rows into the table, unique and sequential numbers are computed
by adding the value of the @MaxID variable to the values returned by the
ROW_NUMBER function. This function is a window function that computes a
sequential row number starting with 1 .

The following T-SQL code—which is run in the same batch as the script in step 2—
inserts rows into the Orders_with_Identifier table. The values for the Row_ID
column are computed by adding the @MaxID variable to values returned by the
ROW_NUMBER function. The function must have an ORDER BY clause, which defines

the logical order of the rows within the result set. However, when set to SELECT
NULL, no logical order is imposed, meaning identifier values are arbitrarily assigned.

This ORDER BY clause results in a faster execution time.

SQL

--Insert new rows with unique identifiers
INSERT INTO [dbo].[Orders_with_Identifier]
SELECT
    @MaxID + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS [Row_ID],
    [src].[O_OrderKey],
    [src].[O_CustomerKey],
    [src].[O_OrderStatus],
    [src].[O_TotalPrice],
    [src].[O_OrderDate],
    [src].[O_OrderPriority],
    [src].[O_Clerk],
    [src].[O_ShipPriority],
    [src].[O_Comment]
FROM [dbo].[Orders] AS [src];

Related content
Design tables in Warehouse in Microsoft Fabric
Data types in Microsoft Fabric
ROW_NUMBER (Transact-SQL)
SELECT - OVER Clause (Transact-SQL)



Transactions in Warehouse tables in
Microsoft Fabric
Article • 04/24/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

Similar to their behavior in SQL Server, transactions allow you to control the commit or
rollback of read and write queries.

You can modify data that is stored in tables in a Warehouse using transactions to group
changes together.

For example, you could commit inserts to multiple tables, or none of the tables, if
an error arises. If you're changing details about a purchase order that affects three
tables, you can group those changes into a single transaction. That means when
those tables are queried, they either all have the changes or none of them do.
Transactions are a common practice for when you need to ensure your data is
consistent across multiple tables.
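The following T-SQL sketch illustrates this pattern with hypothetical purchase order tables; swap in your own table and column names. If any statement fails, issuing ROLLBACK TRANSACTION instead of COMMIT TRANSACTION discards all the changes.

SQL

--Group related changes to three hypothetical tables into one transaction
BEGIN TRANSACTION;

UPDATE [dbo].[PurchaseOrderHeader]
SET [OrderStatus] = 'Shipped'
WHERE [PurchaseOrderID] = 1001;

UPDATE [dbo].[PurchaseOrderDetail]
SET [ShippedQuantity] = [OrderedQuantity]
WHERE [PurchaseOrderID] = 1001;

INSERT INTO [dbo].[PurchaseOrderAudit] ([PurchaseOrderID], [AuditDate], [AuditAction])
VALUES (1001, CAST(GETDATE() AS DATE), 'Shipped');

COMMIT TRANSACTION;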

Transactional capabilities
The same transactional capabilities are supported in the SQL analytics endpoint in
Microsoft Fabric, but for read-only queries.

Transactions can also be used for sequential SELECT statements to ensure the tables
involved all have data from the same point in time. As an example, if a table has new
rows added by another transaction, the new rows don't affect the SELECT queries inside
an open transaction.

) Important

Only the snapshot isolation level is supported in Microsoft Fabric. If you use T-SQL
to change your isolation level, the change is ignored at Query Execution time and
snapshot isolation is applied.

Cross-database query transaction support


Warehouse in Microsoft Fabric supports transactions that span across databases that are
within the same workspace including reading from the SQL analytics endpoint of the
Lakehouse. Every Lakehouse has one read-only SQL analytics endpoint. Each workspace
can have more than one lakehouse.

DDL support within transactions


Warehouse in Microsoft Fabric supports DDL such as CREATE TABLE inside user-defined
transactions.

Locks for different types of statements


This table provides a list of what locks are used for different types of transactions, all
locks are at the table level:


Statement type Lock taken

SELECT Schema-Stability (Sch-S)

INSERT Intent Exclusive (IX)

DELETE Intent Exclusive (IX)

UPDATE Intent Exclusive (IX)

COPY INTO Intent Exclusive (IX)

DDL Schema-Modification (Sch-M)

These locks prevent conflicts such as a table's schema being changed while rows are
being updated in a transaction.

You can query locks currently held with the dynamic management view (DMV)
sys.dm_tran_locks.
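For example, the following query returns the session, resource, and lock mode for locks that are currently held or requested:

SQL

--List locks that are currently held or requested in this warehouse
SELECT
    request_session_id,
    resource_type,
    resource_associated_entity_id,
    request_mode,
    request_status
FROM sys.dm_tran_locks;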

Conflicts from two or more concurrent transactions that update one or more rows in a
table are evaluated at the end of the transaction. The first transaction to commit
completes successfully and the other transactions are rolled back with an error returned.
These conflicts are evaluated at the table level and not the individual parquet file level.

INSERT statements always create new parquet files, which means fewer conflicts with
other transactions except for DDL because the table's schema could be changing.

Transaction logging
Transaction logging in Warehouse in Microsoft Fabric is at the parquet file level because
parquet files are immutable (they can't be changed). A rollback results in pointing back
to the previous parquet files. The benefits of this change are that transaction logging
and rollbacks are faster.

Limitations
Distributed transactions are not supported.
Save points are not supported.
Named transactions are not supported.
Marked transactions are not supported.
ALTER TABLE is not supported within an explicit transaction.
At this time, there's limited T-SQL functionality in the warehouse. See T-SQL surface
area for a list of T-SQL commands that are currently not available.
If a transaction inserts data into an empty table and issues a SELECT before
rolling back, the automatically generated statistics can still reflect the uncommitted
data, causing inaccurate statistics. Inaccurate statistics can lead to unoptimized
query plans and execution times. If you roll back a transaction with SELECTs after a
large INSERT, update statistics for the columns mentioned in your SELECT (a sketch
follows this list).
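The following sketch shows one way to refresh statistics manually after such a rollback. The table, column, and statistics object names are hypothetical.

SQL

--Create a named statistics object on the column used in the SELECT (if it doesn't exist yet),
--then refresh it so the optimizer no longer reflects the rolled-back rows
CREATE STATISTICS [stats_Sales_CustomerKey]
ON [dbo].[Sales] ([CustomerKey])
WITH FULLSCAN;

UPDATE STATISTICS [dbo].[Sales] ([stats_Sales_CustomerKey])
WITH FULLSCAN;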

Related content
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
Tables in Warehouse



Warehouse settings and context menus
Article • 09/24/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

Settings are accessible from the context menu or from the Settings icon in the ribbon
when you open the item. There are some key differences in the actions you can take in
settings depending on if you're interacting with the SQL analytics endpoint or a data
warehouse.

Settings options
This section describes and explains the settings options available based on the item
you're working with and its description.

The following image shows the warehouse settings menu.

The following table is a list of settings available for each warehouse.


| Setting | Detail | Editable for |
| --- | --- | --- |
| Name | Lets users read/edit the name of the warehouse. | Warehouse |
| Description | Lets users add metadata details to provide descriptive information about a warehouse. | Warehouse |
| Owned by | Name of the user who owns the warehouse. | |
| Last modified by | Name of the user who most recently modified the warehouse. | |
| SQL connection string | The SQL connection string for the workspace. You can use the SQL connection string to create a connection to the warehouse using various tools, such as SSMS/Azure Data Studio. | |
| Sensitivity label | Classify your warehouse to protect it from unauthorized access. | |
| Endorsement | Endorse the warehouse and make it discoverable in your org. | |
| Default Power BI semantic model | Automatically add the warehouse's objects to the default semantic model. | |
| Restore points | Create restore points to restore the warehouse to a previous version. | |

The following table shows settings for the default Power BI semantic model.


| Setting | Details |
| --- | --- |
| Request access | Request access to the default Power BI semantic model. |
| Q&A | Use natural language to ask questions about your data. |
| Query caching | Turn caching of query results on or off to speed up reports by using previously saved query results. |
| Server settings | The XMLA connection string of the default semantic model. |
| Endorsement and discovery | Endorse the default semantic model independently from the warehouse and make it discoverable in your org. |

Context menus
Applies to: ✅ Warehouse in Microsoft Fabric

Warehouse offers an easy experience to create reports and access supported actions
using its context menus.

The following table describes the warehouse context menu options:


| Menu option | Option description |
| --- | --- |
| Open | Opens the warehouse to explore and analyze data. |
| Open with | Opens the warehouse in Azure Data Studio. |
| Share | Lets users share the warehouse to build content based on the underlying default Power BI semantic model, query data using SQL, or get access to the underlying data files. Shares the warehouse access (SQL connections only, and autogenerated semantic model) with other users in your organization. Users receive an email with links to access the detail page, where they can find the SQL connection string and can access the default semantic model to create reports based on it. |
| Explore this data (preview) | Create an exploration to quickly visualize and analyze your data. |
| Analyze in Excel | Uses the existing Analyze in Excel capability on the default Power BI semantic model. |
| New report | Build a report in DirectQuery mode. |
| Create paginated report | Build detailed, print-ready Power BI reports. |
| Favorite | Mark specific items to quickly access them from your favorites list. |
| Rename | Updates the warehouse with the new name. Does not apply to the SQL analytics endpoint of the Lakehouse. |
| Delete | Delete the warehouse from the workspace. A confirmation dialog notifies you of the impact of the delete action. If the Delete action is confirmed, then the warehouse and related downstream items are deleted. Does not apply to the SQL analytics endpoint of the Lakehouse. |
| Move to | Move specific items to a new folder. |
| Manage permissions | Enables users to add other recipients with specified permissions, similar to allowing the sharing of an underlying semantic model or allowing to build content with the data associated with the underlying semantic model. |
| Copy SQL connection string | Copy the SQL connection string associated with the items in a specific workspace. |
| Settings | Learn more about warehouse settings in the previous section. |
| Query activity | Access the query activity view to monitor your running and completed queries in this warehouse. |
| View workspace lineage | Shows the end-to-end lineage of all items in this workspace, from the data sources to the warehouse, the default Power BI semantic model, and other semantic models (if any) that were built on top of the warehouse, all the way to reports, dashboards, and apps. |
| View item lineage | Shows the end-to-end lineage of the specific warehouse selected, from the data sources to the warehouse, the default Power BI semantic model, and other semantic models (if any) that were built on top of the specific warehouse, all the way to reports, dashboards, and apps. |
| View details | Opens up the warehouse details in the Data hub. |

Related content
Warehouse in Microsoft Fabric
Model data in the default Power BI semantic model in Microsoft Fabric
Create reports in the Power BI service
Admin portal

Source control with Warehouse
(preview)
Article • 07/19/2024

This article explains how Git integration and deployment pipelines work for warehouses
in Microsoft Fabric. Learn how to set up a connection to your repository, manage your
warehouses, and deploy them across different environments. Source control for Fabric
Warehouse is currently a preview feature.

You can use both Git integration and Deployment pipelines for different scenarios:

Use Git and SQL database projects to manage incremental changes, team
collaboration, and commit history for individual database objects.
Use deployment pipelines to promote code changes to different pre-production
and production environments.

Git integration
Git integration in Microsoft Fabric enables developers to integrate their development
processes, tools, and best practices directly into the Fabric platform. It allows developers
who are developing in Fabric to:

Back up and version their work
Revert to previous stages as needed
Collaborate with others or work alone using Git branches
Apply the capabilities of familiar source control tools to manage Fabric items

For more information on the Git integration process, see:

Fabric Git integration


Basic concepts in Git integration
Get started with Git integration (preview)

Set up a connection to source control


From the Workspace settings page, you can easily set up a connection to your repo to
commit and sync changes.

1. To set up the connection, see Get started with Git integration. Follow instructions
to Connect to a Git repo to either Azure DevOps or GitHub as a Git provider.
2. Once connected, your items, including warehouses, appear in the Source

control panel.
3. After you successfully connect the warehouse instances to the Git repo, you see
the warehouse folder structure in the repo. You can now execute future operations,
like creating a pull request.

Database projects for a warehouse in Git


The following image is an example of the file structure of each warehouse item in the
repo:

When you commit the warehouse item to the Git repo, the warehouse is converted to a
source code format, as a SQL database project. A SQL project is a local representation of
SQL objects that comprise the schema for a single database, such as tables, stored
procedures, or functions. The folder structure of the database objects is organized by
Schema/Object Type. Each object in the warehouse is represented with a .sql file that
contains its data definition language (DDL) definition. Warehouse table data and SQL
security features are not included in the SQL database project.

Shared queries are also committed to the repo and inherit the name that they are saved
as.

Download the SQL database project of a warehouse in


Fabric
With the SQL Database Projects extension available inside of Azure Data Studio and
Visual Studio Code , you can manage a warehouse schema, and handle Warehouse
object changes like other SQL database projects.

To download a local copy of your warehouse's schema, select Download SQL database
project in the ribbon.


The local copy of the database project contains the definition of the warehouse
schema. The database project can be used to:

Recreate the warehouse schema in another warehouse.


Further develop the warehouse schema in client tools, like Azure Data Studio or
Visual Studio Code.

Publish SQL database project to a new warehouse


To publish the warehouse schema to a new warehouse:

1. Create a new warehouse in your Fabric workspace.


2. On the new warehouse launch page, under Build a warehouse, select SQL
database project.

3. Select the .zip file that was downloaded from the existing warehouse.
4. The warehouse schema is published to the new warehouse.

Deployment pipelines
You can also use deployment pipelines to deploy your warehouse code across different
environments, such as development, test, and production. Deployment pipelines don't
expose a database project.

Use the following steps to complete your warehouse deployment using the deployment
pipeline.

1. Create a new deployment pipeline or open an existing deployment pipeline. For


more information, see Get started with deployment pipelines.
2. Assign workspaces to different stages according to your deployment goals.
3. Select, view, and compare items including warehouses between different stages, as
shown in the following example.

4. Select Deploy to deploy your warehouses across the Development, Test, and
Production stages.

For more information about the Fabric deployment pipelines process, see Overview of
Fabric deployment pipelines.

Limitations in source control


SQL security features must be exported/migrated using a script-based approach.
Consider using a post-deployment script in a SQL database project, which you can
configure by opening the project with the SQL Database Projects extension
available inside of Azure Data Studio.

Limitations in Git integration


Currently, if you use ALTER TABLE to add a constraint or column in the database
project, the table will be dropped and recreated when deploying, resulting in data
loss. Consider the following workaround to preserve the table definition and data (a T-SQL sketch of these steps follows this list):
Create a new copy of the table in the warehouse, using CREATE TABLE and
INSERT , CREATE TABLE AS SELECT , or Clone table.

Modify the new table definition with new constraints or columns, as desired,
using ALTER TABLE .
Delete the old table.
Rename the new table to the name of the old table using sp_rename.
Modify the definition of the old table in the SQL database project in the exact
same way. The SQL database project of the warehouse in source control and the
live warehouse should now match.
Currently, do not create a Dataflow Gen2 with an output destination to the
warehouse. Committing and updating from Git would be blocked by a new item
named DataflowsStagingWarehouse that appears in the repository.
SQL analytics endpoint is not supported with Git integration.
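The following simplified T-SQL sketch illustrates the table-preservation workaround for a hypothetical dbo.Orders table; adjust the table names and the ALTER TABLE step to your own changes.

SQL

--1. Create a new copy of the table, including its data
CREATE TABLE [dbo].[Orders_New]
AS
SELECT * FROM [dbo].[Orders];

--2. Apply your new constraints or columns to [dbo].[Orders_New] with ALTER TABLE here

--3. Delete the old table
DROP TABLE [dbo].[Orders];

--4. Rename the new table to the original name
EXEC sp_rename 'dbo.Orders_New', 'Orders';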

Limitations for deployment pipelines

Currently, if you use ALTER TABLE to add a constraint or column in the database
project, the table will be dropped and recreated when deploying, resulting in data
loss.
Currently, do not create a Dataflow Gen2 with an output destination to the
warehouse. Deployment would be blocked by a new item named
DataflowsStagingWarehouse that appears in the deployment pipeline.

The SQL analytics endpoint is not supported in deployment pipelines.

Related content
Get started with Git integration (preview)
Basic concepts in Git integration
What is lifecycle management in Microsoft Fabric?
Tutorial: Set up dbt for Fabric Data Warehouse



Analyze Microsoft Fabric Warehouse
data with range bands
Article • 10/17/2024

Applies to: ✅ Warehouse in Microsoft Fabric

This article describes a query technique to summarize fact table data by using ranges of
fact or dimension table attributes. For example, you might need to determine sales
quantities by sale price. However, instead of grouping by each sale price, you want to
group by range bands of price, like:

$0.00 to $999.99
$1,000.00 to $4,999.99
and others…

 Tip

If you're inexperienced with dimensional modeling, consider reading the series of


articles on dimensional modeling as your first step to populating a data warehouse
with fact and dimension tables.

Step 1: Create a table to store range bands


First, you should create a table that stores one or more series of range bands.

SQL

CREATE TABLE [d_RangeBand]


(
[Series] VARCHAR(20) NOT NULL,
[RangeLabel] VARCHAR(50) NOT NULL,
[LowerBound] INT NOT NULL,
[UpperBound] INT NOT NULL
);

7 Note

Technically, this table isn't a dimension table. It's a helper table that organizes fact
or dimension data for analysis.
You should consider creating a composite primary key or unique constraint based on
the Series and RangeLabel columns to ensure that duplicate ranges within a series can't
be created. You should also verify that the lower and upper boundary values don't
overlap and that there aren't any gaps.
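For example, the following sketch adds an unenforced unique constraint; the constraint name is arbitrary, and because Fabric Warehouse doesn't enforce constraints, your load process still needs to check for duplicates.

SQL

--Add an unenforced unique constraint on the Series and RangeLabel columns
ALTER TABLE [d_RangeBand]
ADD CONSTRAINT [UC_d_RangeBand_Series_RangeLabel]
    UNIQUE NONCLUSTERED ([Series], [RangeLabel]) NOT ENFORCED;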

 Tip

You can add a RangeLabelSort column with an int data type if you need to control
the sort order of the range bands. This column will help you present the range
bands in a meaningful way, especially when the range label text values don't sort in
a logical order.

Step 2: Insert values into the range bands table


Next, you should insert one or more series of range bands into the range band table.

Here are some example range bands.


Series RangeLabel LowerBound UpperBound

Price $0.00 to $999.99 0 1,000

Price $1,000.00 to $4,999.99 1,000 5,000

Price $5,000.00 or above 5,000 9,999,999

Age 0 to 19 years 0 20

Age 20 to 39 years 20 40

Age 40 to 59 years 40 60

Age 60 or above 60 999
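For example, one way to load these values is with a multi-row INSERT statement.

SQL

--Insert the example range bands shown above
INSERT INTO [d_RangeBand] ([Series], [RangeLabel], [LowerBound], [UpperBound])
VALUES
    ('Price', '$0.00 to $999.99', 0, 1000),
    ('Price', '$1,000.00 to $4,999.99', 1000, 5000),
    ('Price', '$5,000.00 or above', 5000, 9999999),
    ('Age', '0 to 19 years', 0, 20),
    ('Age', '20 to 39 years', 20, 40),
    ('Age', '40 to 59 years', 40, 60),
    ('Age', '60 or above', 60, 999);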

Step 3: Query by range bands


Lastly, you run a query statement that uses the range band table.

The following example queries the f_Sales fact table by joining it with the d_RangeBand
table and sums the fact quantity values. It filters the d_RangeBand table by the Price
series, and groups by the range labels.
SQL

SELECT
[r].[RangeLabel],
SUM([s].[Quantity]) AS [Quantity]
FROM
[d_RangeBand] AS [r],
[f_Sales] AS [s]
WHERE
[r].[Series] = 'Price'
AND [s].[UnitPrice] >= [r].[LowerBound]
AND [s].[UnitPrice] < [r].[UpperBound]
GROUP BY
[r].[RangeLabel];

) Important

Pay close attention to the logical operators used to determine the matching range
band in the WHERE clause. In this example, the lower boundary value is inclusive and
the upper boundary is exclusive. That way, there won't be any overlap of ranges or
gaps between ranges. The appropriate operators will depend on the boundary
values you store in your range band table.

Related content
Dimensional modeling in Microsoft Fabric Warehouse
Design tables in Warehouse in Microsoft Fabric
Data types in Microsoft Fabric



Dimensional modeling in Microsoft
Fabric Warehouse
Article • 06/21/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

This article is the first in a series about dimensional modeling inside a warehouse. It
provides practical guidance for Warehouse in Microsoft Fabric, which is an experience
that supports many T-SQL capabilities, like creating tables and managing data in tables.
So, you're in complete control of creating your dimensional model tables and loading
them with data.

7 Note

In this article, the term data warehouse refers to an enterprise data warehouse,
which delivers comprehensive integration of critical data across the organization. In
contrast, the standalone term warehouse refers to a Fabric Warehouse, which is a
software as a service (SaaS) relational database offering that you can use to
implement a data warehouse. For clarity, in this article the latter is mentioned as
Fabric Warehouse.

 Tip

If you're inexperienced with dimensional modeling, consider that this series of


articles is your first step. It isn't intended to provide a complete discussion on
dimensional modeling design. For more information, refer directly to widely
adopted published content, like The Data Warehouse Toolkit: The Definitive Guide to
Dimensional Modeling (3rd edition, 2013) by Ralph Kimball, and others.

Star schema design


Star schema is a dimensional modeling design technique adopted by relational data
warehouses. It's a recommended design approach to take when creating a Fabric
Warehouse. A star schema comprises fact tables and dimension tables.

Dimension tables describe the entities relevant to your organization and analytics
requirements. Broadly, they represent the things that you model. Things could be
products, people, places, or any other concept, including date and time. For more
information and design best practices, see Dimension tables in this series.
Fact tables store measurements associated with observations or events. They can
store sales orders, stock balances, exchange rates, temperature readings, and
more. Fact tables contain dimension keys together with granular values that can be
aggregated. For more information and design best practices, see Fact tables in this
series.

A star schema design is optimized for analytic query workloads. For this reason, it's
considered a prerequisite for enterprise Power BI semantic models. Analytic queries are
concerned with filtering, grouping, sorting, and summarizing data. Fact data is
summarized within the context of filters and groupings of the related dimension tables.

It's called a star schema because a fact table forms the center of a star
while the related dimension tables form the points of the star.

A star schema often contains multiple fact tables, and therefore multiple stars.

A well-designed star schema delivers high performance (relational) queries because of


fewer table joins, and the higher likelihood of useful indexes. Also, a star schema often
requires low maintenance as the data warehouse design evolves. For example, adding a
new column to a dimension table to support analysis by a new attribute is a relatively
simple task to perform. As is adding new facts and dimensions as the scope of the data
warehouse evolves.
Periodically, perhaps daily, the tables in a dimensional model are updated and loaded by
an Extract, Transform, and Load (ETL) process. This process synchronizes its data with the
source systems, which store operational data. For more information, see Load tables in
this series.

Dimensional modeling for Power BI


For enterprise solutions, a dimensional model in a Fabric Warehouse is a recommended
prerequisite for creating a Power BI semantic model. Not only does the dimensional
model support the semantic model, but it's also a source of data for other experiences,
like machine learning models.

However, in specific circumstances it might not be the best approach. For example, self-
service analysts who need freedom and agility to act quickly, and without dependency
on IT, might create semantic models that connect directly to source data. In such cases,
the theory of dimensional modeling is still relevant. That theory helps analysts create
intuitive and efficient models, while avoiding the need to create and load a dimensional
model in a data warehouse. Instead, a quasi-dimensional model can be created by using
Power Query, which defines the logic to connect to, and transform, source data to create
and load the semantic model tables. For more information, see Understand star schema
and the importance for Power BI.

) Important

When you use Power Query to define a dimensional model in the semantic model,
you aren't able to manage historical change, which might be necessary to analyze
the past accurately. If that's a requirement, you should create a data warehouse and
allow periodic ETL processes to capture and appropriately store dimension
changes.

Planning for a data warehouse


You should approach the creation of a data warehouse and the design of a dimension
model as a serious and important undertaking. That's because the data warehouse is a
core component of your data platform. It should form a solid foundation that supports
analytics and reporting—and therefore decision making—for your entire organization.

To this end, your data warehouse should strive to store quality, conformed, and
historically accurate data as a single version of the truth. It should deliver understandable
and navigable data with fast performance, and enforce permissions so that the right
data can only ever be accessed by the right people. Strive to design your data
warehouse for resilience, allowing it to adapt to change as your requirements evolve.

The successful implementation of a data warehouse depends on good planning. For


information about strategic and tactical considerations, and action items that lead to the
successful adoption of Fabric and your data warehouse, see the Microsoft Fabric
adoption roadmap.

 Tip

We recommend that you build out your enterprise data warehouse iteratively. Start
with the most important subject areas first, and then over time, according to
priority and resources, extend the data warehouse with other subject areas.

Related content
In the next article in this series, learn about guidance and design best practices for
dimension tables.



Dimensional modeling in Microsoft
Fabric Warehouse: Dimension tables
Article • 06/21/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

7 Note

This article forms part of the Dimensional modeling series of articles. This series
focuses on guidance and design best practices related to dimensional modeling in
Microsoft Fabric Warehouse.

This article provides you with guidance and best practices for designing dimension
tables in a dimensional model. It provides practical guidance for Warehouse in Microsoft
Fabric, which is an experience that supports many T-SQL capabilities, like creating tables
and managing data in tables. So, you're in complete control of creating your
dimensional model tables and loading them with data.

7 Note

In this article, the term data warehouse refers to an enterprise data warehouse,
which delivers comprehensive integration of critical data across the organization. In
contrast, the standalone term warehouse refers to a Fabric Warehouse, which is a
software as a service (SaaS) relational database offering that you can use to
implement a data warehouse. For clarity, in this article the latter is mentioned as
Fabric Warehouse.

 Tip

If you're inexperienced with dimensional modeling, consider this series of articles


your first step. It isn't intended to provide a complete discussion on dimensional
modeling design. For more information, refer directly to widely adopted published
content, like The Data Warehouse Toolkit: The Definitive Guide to Dimensional
Modeling (3rd edition, 2013) by Ralph Kimball, and others.

In a dimensional model, a dimension table describes an entity relevant to your business


and analytics requirements. Broadly, dimension tables represent the things that you
model. Things could be products, people, places, or any other concept, including date
and time. To easily identify dimension tables, you typically prefix their names with d_ or
Dim_ .

Dimension table structure


To describe the structure of a dimension table, consider the following example of a
salesperson dimension table named d_Salesperson . This example applies good design
practices. Each of the groups of columns is described in the following sections.

SQL

CREATE TABLE d_Salesperson


(
--Surrogate key
Salesperson_SK INT NOT NULL,

--Natural key(s)
EmployeeID VARCHAR(20) NOT NULL,

--Dimension attributes
FirstName VARCHAR(20) NOT NULL,
<…>

--Foreign key(s) to other dimensions


SalesRegion_FK INT NOT NULL,
<…>

--Historical tracking attributes (SCD type 2)


RecChangeDate_FK INT NOT NULL,
RecValidFromKey INT NOT NULL,
RecValidToKey INT NOT NULL,
RecReason VARCHAR(15) NOT NULL,
RecIsCurrent BIT NOT NULL,

--Audit attributes
AuditMissing BIT NOT NULL,
AuditIsInferred BIT NOT NULL,
AuditCreatedDate DATE NOT NULL,
AuditCreatedBy VARCHAR(15) NOT NULL,
AuditLastModifiedDate DATE NOT NULL,
AuditLastModifiedBy VARCHAR(15) NOT NULL
);

Surrogate key
The sample dimension table has a surrogate key, which is named Salesperson_SK . A
surrogate key is a single-column unique identifier that's generated and stored in the
dimension table. It's a primary key column used to relate to other tables in the
dimensional model.

Surrogate keys strive to insulate the data warehouse from changes in source data. They
also deliver many other benefits, allowing you to:

Consolidate multiple data sources (avoiding clash of duplicate identifiers).


Consolidate multi-column natural keys into a more efficient, single-column key.
Track dimension history with a slowly changing dimension (SCD) type 2.
Limit fact table width for storage optimization (by selecting the smallest possible
integer data type).

A surrogate key column is a recommended practice, even when a natural key (described
next) seems an acceptable candidate. You should also avoid giving meaning to the key
values (except for date and time dimension keys, as described later).

Natural keys
The sample dimension table also has a natural key, which is named EmployeeID . A
natural key is the key stored in the source system. It allows relating the dimension data
to its source system, which is typically done by an Extract, Load, and Transform (ETL)
process to load the dimension table. Sometimes a natural key is called a business key,
and its values might be meaningful to business users.

Sometimes dimensions don't have a natural key. That could be the case for your date
dimension or lookup dimensions, or when you generate dimension data by normalizing
a flat file.

Dimension attributes
A sample dimension table also has dimension attributes, like the FirstName column.
Dimension attributes provide context to the numeric data stored in related fact tables.
They're typically text columns that are used in analytic queries to filter and group (slice
and dice), but not to be aggregated themselves. Some dimension tables contain few
attributes, while others contain many attributes (as many as it takes to support the
query requirements of the dimensional model).

 Tip

A good way to determine which dimensions and attributes you need is to find the
right people and ask the right questions. Specifically, stay alert for the mention of
the word by. For example, when someone says they need to analyze sales by
salesperson, by month, and by product category, they're telling you that they need
dimensions that have those attributes.

If you plan to create a Direct Lake semantic model, you should include all possible
columns required for filtering and grouping as dimension attributes. That's because
Direct Lake semantic models don't support calculated columns.

Foreign keys
The sample dimension table also has a foreign key, which is named SalesRegion_FK .
Foreign keys reference other dimension tables, and their presence in a dimension
table is a special case. It indicates that the table is related to another dimension table,
meaning that it might form part of a snowflake dimension or it's related to an outrigger
dimension.

Fabric Warehouse supports foreign key constraints but they can't be enforced.
Therefore, it's important that your ETL process tests for integrity between related tables
when data is loaded.

It's still a good idea to create foreign keys. One good reason to create unenforced
foreign keys is to allow modeling tools, like Power BI Desktop, to automatically detect
and create relationships between tables in the semantic model.
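For example, the following sketch adds an unenforced foreign key from the salesperson dimension to a sales region dimension. The d_SalesRegion table name and SalesRegion_SK column are assumptions based on the article's example.

SQL

--Add an unenforced foreign key (Fabric Warehouse doesn't enforce it at load time)
ALTER TABLE d_Salesperson
ADD CONSTRAINT FK_d_Salesperson_d_SalesRegion
    FOREIGN KEY (SalesRegion_FK)
    REFERENCES d_SalesRegion (SalesRegion_SK)
    NOT ENFORCED;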

Historical tracking attributes


The sample dimension table also has various historical tracking attributes. Historical
tracking attributes are optional based on your need to track specific changes as they
occur in the source system. They allow storing values to support the primary role of a
data warehouse, which is to describe the past accurately. Specifically, these attributes
store historical context as the ETL process loads new or changed data into the
dimension.

For more information, see Manage historical change later in this article.

Audit attributes
The sample dimension table also has various audit attributes. Audit attributes are
optional but recommended. They allow you to track when and how dimension records
were created or modified, and they can include diagnostic or troubleshooting
information raised during ETL processes. For example, you'll want to track who (or what
process) updated a row, and when. Audit attributes can also help diagnose a
challenging problem, like when an ETL process stops unexpectedly. They can also flag
dimension members as errors or inferred members.

Dimension table size


Often, the most useful and versatile dimensions in a dimensional model are big, wide
dimensions. They're big in terms of rows (in excess of millions) and wide in terms of the
number of dimension attributes (potentially hundreds). Size isn't so important (although
you should design and optimize for the smallest possible size). What matters is that the
dimension supports the required filtering, grouping, and accurate historical analysis of
fact data.

Big dimensions might be sourced from multiple source systems. In this case, dimension
processing needs to combine, merge, deduplicate, and standardize the data; and assign
surrogate keys.

By comparison, some dimensions are tiny. They might represent lookup tables that
contain only several records and attributes. Often these small dimensions store category
values related to transactions in fact tables, and they're implemented as dimensions with
surrogate keys to relate to the fact records.

 Tip

When you have many small dimensions, consider consolidating them into a junk
dimension.

Dimension design concepts


This section describes various dimension design concepts.

Denormalization vs. normalization


It's almost always the case that dimension tables should be denormalized. While
normalization is the term used to describe data that's stored in a way that reduces
repetitious data, denormalization is the term used to define where precomputed
redundant data exists. Redundant data exists typically due to the storage of hierarchies
(discussed later), meaning that hierarchies are flattened. For example, a product
dimension could store subcategory (and its related attributes) and category (and its
related attributes).
Because dimensions are generally small (when compared to fact tables), the cost of
storing redundant data is almost always outweighed by the improved query
performance and usability.

Snowflake dimensions
One exception to denormalization is to design a snowflake dimension. A snowflake
dimension is normalized, and it stores the dimension data across several related tables.

The following diagram depicts a snowflake dimension that comprises three related
dimension tables: Product , Subcategory , and Category .

Consider implementing a snowflake dimension when:

The dimension is extremely large and storage costs outweigh the need for high
query performance. (However, periodically reassess that this still remains the case.)
You need keys to relate the dimension to higher-grain facts. For example, the sales
fact table stores rows at product level, but the sales target fact table stores rows at
subcategory level.
You need to track historical changes at higher levels of granularity.
7 Note

Bear in mind that a hierarchy in a Power BI semantic model can only be based on
columns from a single semantic model table. Therefore, a snowflake dimension
should deliver a denormalized result by using a view that joins the snowflake tables
together.
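For example, a view like the following can flatten the snowflake into a single denormalized dimension for the semantic model. The column names are assumptions based on the diagram's Product, Subcategory, and Category tables.

SQL

--Flatten the snowflake dimension tables into one denormalized view
CREATE VIEW [dbo].[Product_Denormalized]
AS
SELECT
    [p].[Product_SK],
    [p].[ProductName],
    [s].[SubcategoryName],
    [c].[CategoryName]
FROM [dbo].[Product] AS [p]
INNER JOIN [dbo].[Subcategory] AS [s]
    ON [p].[Subcategory_FK] = [s].[Subcategory_SK]
INNER JOIN [dbo].[Category] AS [c]
    ON [s].[Category_FK] = [c].[Category_SK];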

Hierarchies
Commonly, dimension columns produce hierarchies. Hierarchies enable exploring data
at distinct levels of summarization. For example, the initial view of a matrix visual might
show yearly sales, and the report consumer can choose to drill down to reveal quarterly
and monthly sales.

There are three ways to store a hierarchy in a dimension. You can use:

Columns from a single, denormalized dimension.


A snowflake dimension, which comprises multiple related tables.
A parent-child (self-referencing) relationship in a dimension.

Hierarchies can be balanced or unbalanced. It's also important to understand that some
hierarchies are ragged.

Balanced hierarchies
Balanced hierarchies are the most common type of hierarchy. A balanced hierarchy has
the same number of levels. A common example of a balanced hierarchy is a calendar
hierarchy in a date dimension that comprises levels for year, quarter, month, and date.

The following diagram depicts a balanced hierarchy of sales regions. It comprises two
levels, which are sales region group and sales region.
Levels of a balanced hierarchy are either based on columns from a single, denormalized
dimension, or from tables that form a snowflake dimension. When based on a single,
denormalized dimension, the columns that represent the higher levels contain
redundant data.

For balanced hierarchies, facts always relate to a single level of the hierarchy, which is
typically the lowest level. That way, the facts can be aggregated (rolled up) to the
highest level of the hierarchy. Facts can relate to any level, which is determined by the
grain of the fact table. For example, the sales fact table might be stored at date level,
while the sales target fact table might be stored at quarter level.

Unbalanced hierarchies

Unbalanced hierarchies are a less common type of hierarchy. An unbalanced hierarchy


has levels based on a parent-child relationship. For this reason, the number of levels in
an unbalanced hierarchy is determined by the dimension rows, and not specific
dimension table columns.
A common example of an unbalanced hierarchy is an employee hierarchy where each
row in an employee dimension relates to a reporting manager row in the same table. In
this case, any employee can be a manager with reporting employees. Naturally, some
branches of the hierarchy will have more levels than others.

The following diagram depicts an unbalanced hierarchy. It comprises four levels, and
each member in the hierarchy is a salesperson. Notice that salespeople have a different
number of ancestors in the hierarchy according to who they report to.

Other common examples of unbalanced hierarchies include bill of materials, company


ownership models, and general ledger.

For unbalanced hierarchies, facts always relate to the dimension grain. For example,
sales facts relate to different salespeople, who have different reporting structures. The
dimension table would have a surrogate key (named Salesperson_SK ) and a
ReportsTo_Salesperson_FK foreign key column, which references the primary key

column. Each salesperson without anyone to manage isn't necessarily at the lowest level
of any branch of the hierarchy. When they're not at the lowest level, a salesperson might
sell products and have reporting salespeople who also sell products. So, the rollup of
fact data must consider the individual salesperson and all their descendants.

Querying parent-child hierarchies can be complex and slow, especially for large
dimensions. While the source system might store relationships as parent-child, we
recommend that you naturalize the hierarchy. In this instance, naturalize means to
transform and store the hierarchy levels in the dimension as columns.

 Tip

If you choose not to naturalize the hierarchy, you can still create a hierarchy based
on a parent-child relationship in a Power BI semantic model. However, this
approach isn't recommended for large dimensions. For more information, see
Understanding functions for parent-child hierarchies in DAX.

Ragged hierarchies
Sometimes a hierarchy is ragged because the parent of a member in the hierarchy exists
at a level that's not immediately above it. In these cases, missing level values repeat the
value of the parent.

Consider an example of a balanced geography hierarchy. A ragged hierarchy exists


when a country/region has no states or provinces. For example, New Zealand has
neither states nor provinces. So, when you insert the New Zealand row, you should also
store the country/region value in the StateProvince column.

The following diagram depicts a ragged hierarchy of geographical regions.


Manage historical change
When necessary, historical change can be managed by implementing a slowly changing
dimension (SCD). An SCD maintains historical context as new, or changed data, is loaded
into it.

Here are the most common SCD types.

Type 1: Overwrite the existing dimension member.


Type 2: Insert a new time-based versioned dimension member.
Type 3: Track limited history with attributes.

It's possible that a dimension could support both SCD type 1 and SCD type 2 changes.

SCD type 3 isn't commonly used, in part due to the fact that it's difficult to use in a
semantic model. Consider carefully whether an SCD type 2 approach would be a better
fit.
 Tip

If you anticipate a rapidly changing dimension, which is a dimension that has an


attribute that changes frequently, consider adding that attribute to the fact table
instead. If the attribute is numeric, like the product price, you can add it as a
measure in the fact table. If the attribute is a text value, you can create a dimension
based on all text values and add its dimension key to the fact table.

SCD type 1
SCD type 1 changes overwrite the existing dimension row because there's no need to
keep track of changes. This SCD type can also be used to correct errors. It's a common
type of SCD, and it should be used for most changing attributes, like customer name,
email address, and others.

The following diagram depicts the before and after state of a salesperson dimension
member where their phone number has changed.

This SCD type doesn't preserve historical perspective because the existing row is
updated. That means SCD type 1 changes can result in different higher-level
aggregations. For example, if a salesperson is assigned to a different sales region, an SCD
type 1 change would overwrite the dimension row. The rollup of salespeople historic
sales results to region would then produce a different outcome because it now uses the
new current sales region. It's as if that salesperson was always assigned to the new sales
region.

SCD type 2
SCD type 2 changes result in new rows that represent a time-based version of a
dimension member. There's always a current version row, and it reflects the state of the
dimension member in the source system. Historical tracking attributes in the dimension
table store values that allow identifying the current version (current flag is TRUE ) and its
validity time period. A surrogate key is required because there will be duplicate natural
keys when multiple versions are stored.

It's a common type of SCD, but it should be reserved for attributes that must preserve
historical perspective.

For example, if a salesperson is assigned to a different sales region, an SCD type 2


change involves an update operation and an insert operation.

1. The update operation overwrites the current version to set the historical tracking
attributes. Specifically, the end validity column is set to the ETL processing date (or
a suitable timestamp in the source system) and the current flag is set to FALSE .
2. The insert operation adds a new, current version, setting the start validity column
to the end validity column value (used to update the prior version) and the current
flag to TRUE .
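For illustration, the following simplified sketch performs both operations for one changed salesperson. It assumes ETL-supplied values (the employee ID, the new sales region key, the processing date key, and the next surrogate key value) and copies only the named columns; a real implementation would cover the dimension's full column list, including the audit attributes.

SQL

--Simplified SCD type 2 sketch: expire the current version, then insert a new one
DECLARE @EmployeeID VARCHAR(20) = 'E1234';   --natural key of the changed member
DECLARE @NewSalesRegion_FK INT = 7;          --new sales region key
DECLARE @ProcessDateKey INT = 20240601;      --ETL processing date (YYYYMMDD)
DECLARE @NewSalesperson_SK INT = 1001;       --next surrogate key value

--1. Expire the current version
UPDATE d_Salesperson
SET
    RecValidToKey = @ProcessDateKey,
    RecIsCurrent = 0
WHERE
    EmployeeID = @EmployeeID
    AND RecIsCurrent = 1;

--2. Insert the new current version, copying unchanged attributes from the expired version
INSERT INTO d_Salesperson
    (Salesperson_SK, EmployeeID, FirstName, SalesRegion_FK,
     RecChangeDate_FK, RecValidFromKey, RecValidToKey, RecReason, RecIsCurrent)
SELECT
    @NewSalesperson_SK, EmployeeID, FirstName, @NewSalesRegion_FK,
    @ProcessDateKey, @ProcessDateKey, 99990101, 'Region change', 1
FROM d_Salesperson
WHERE
    EmployeeID = @EmployeeID
    AND RecValidToKey = @ProcessDateKey;  --the version just expired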

It's important to understand that the granularity of related fact tables isn't at the
salesperson level, but rather the salesperson version level. The rollup of their historic
sales results to region will produce correct results but there will be two (or more)
salesperson member versions to analyze.

The following diagram depicts the before and after state of a salesperson dimension
member where their sales region has changed. Because the organization wants to
analyze salespeople effort by the region they're assigned to, it triggers an SCD type 2
change.
 Tip

When a dimension table supports SCD type 2 changes, you should include a label
attribute that describes the member and the version. Consider an example when
the salesperson Lynn Tsoflias from Adventure Works changes assignment from the
Australian sales region to the United Kingdom sales region. The label attribute for
the first version could read "Lynn Tsoflias (Australia)" and the label attribute for the
new, current version could read "Lynn Tsoflias (United Kingdom)." If helpful, you
might include the validity dates in the label too.

You should balance the need for historic accuracy versus usability and efficiency. Try
to avoid having too many SCD type 2 changes on a dimension table because it can
result in an overwhelming number of versions that might make it difficult for
analysts to comprehend.

Also, too many versions could indicate that a changing attribute might be better
stored in the fact table. Extending the earlier example, if sales region changes were
frequent, the sales region could be stored as a dimension key in the fact table
rather than implementing an SCD type 2.

Consider the following SCD type 2 historical tracking attributes.

SQL

CREATE TABLE d_Salesperson


(
<…>

--Historical tracking attributes (SCD type 2)


RecChangeDate_FK INT NOT NULL,
RecValidFromKey INT NOT NULL,
RecValidToKey INT NOT NULL,
RecReason VARCHAR(15) NOT NULL,
RecIsCurrent BIT NOT NULL,

<…>
);

Here are the purposes of the historical tracking attributes.

The RecChangeDate_FK column stores the date when the change came into effect. It
allows you to query when changes took place.
The RecValidFromKey and RecValidToKey columns store the effective dates of
validity for the row. Consider storing the earliest date found in the date dimension
for RecValidFromKey to represent the initial version, and storing 01/01/9999 for the
RecValidToKey of the current versions.

The RecReason column is optional. It allows documenting the reason why the
version was inserted. It could encode which attributes changed, or it could be a
code from the source system that states a particular business reason.
The RecIsCurrent column makes it possible to retrieve current versions only. It's
used when the ETL process looks up dimension keys when loading fact tables.

7 Note

Some source systems don't store historical changes, so it's important that the
dimension is processed regularly to detect changes and implement new versions.
That way, you can detect changes shortly after they occur, and their validity dates
will be accurate.

SCD type 3

SCD type 3 changes track limited history with attributes. This approach can be useful
when there's a need to record the last change, or a number of the latest changes.

This SCD type preserves limited historical perspective. It might be useful when only the
initial and current values should be stored. In this instance, interim changes wouldn't be
required.

For example, if a salesperson is assigned to a different sales region, an SCD type 3


change overwrites the dimension row. A column that specifically stores the previous
sales region is set as the previous sales region, and the new sales region is set as the
current sales region.

The following diagram depicts the before and after state of a salesperson dimension
member where their sales region has changed. Because the organization wants to
determine any previous sales region assignment, it triggers an SCD type 3 change.
Special dimension members
You might insert rows into a dimension that represent missing, unknown, N/A, or error
states. For example, you might use the following surrogate key values.


Key value Purpose

0 Missing (not available in the source system)

-1 Unknown (lookup failure during a fact table load)

-2 N/A (not applicable)

-3 Error

Calendar and time


Almost without exception, fact tables store measures at specific points in time. To
support analysis by date (and possibly time), there must be calendar (date and time)
dimensions.

It's uncommon that a source system would have calendar dimension data, so it must be
generated in the data warehouse. Typically, it's generated once, and if it's a calendar
dimension, it's extended with future dates when needed.

Date dimension
The date (or calendar) dimension is the most common dimension used for analysis. It
stores one row per date, and it supports the common requirement to filter or group by
specific periods of dates, like years, quarters, or months.

) Important

A date dimension shouldn't include a grain that extends to time of day. If time of
day analysis is required, you should have both a date dimension and a time
dimension (described next). Fact tables that store time of day facts should have two
foreign keys, one to each of these dimensions.

The natural key of the date dimension should use the date data type. The surrogate key
should store the date by using YYYYMMDD format and the int data type. This accepted
practice should be the only exception (alongside the time dimension) when the
surrogate key value has meaning and is human readable. Storing YYYYMMDD as an int
data type is not only efficient and sorted numerically, but it also conforms to the
unambiguous International Organization for Standardization (ISO) 8601 date format.

Here are some common attributes to include in a date dimension.

Year , Quarter , Month , Day


QuarterNumberInYear , MonthNumberInYear – which might be required to sort text

labels.
FiscalYear , FiscalQuarter – some corporate accounting schedules start mid-year,

so that the start/end of the calendar year and the fiscal year are different.
FiscalQuarterNumberInYear , FiscalMonthNumberInYear – which might be required
to sort text labels.
WeekOfYear – there are multiple ways to label the week of year, including an ISO

standard that has either 52 or 53 weeks.


IsHoliday , HolidayText – if your organization operates in multiple geographies,

you should maintain multiple sets of holiday lists that each geography observes as
a separate dimension or naturalized in multiple attributes in the date dimension.
Adding a HolidayText attribute could help identify holidays for reporting.
IsWeekday – similarly, in some geographies, the standard work week isn't Monday

to Friday. For example, the work week is Sunday to Thursday in many Middle
Eastern regions, while other regions employ a four-day or six-day work week.
LastDayOfMonth

RelativeYearOffset , RelativeQuarterOffset , RelativeMonthOffset ,


RelativeDayOffset – which might be required to support relative date filtering (for
example, previous month). Current periods use an offset of zero (0); previous
periods store offsets of -1, -2, -3…; future periods store offsets of 1, 2, 3….

As with any dimension, what's important is that it contains attributes that support the
known filtering, grouping, and hierarchy requirements. There might also be attributes
that store translations of labels into other languages.

When the dimension is used to relate to higher-grain facts, the fact table can use the
first date of the date period. For example, a sales target fact table that stores quarterly
salespeople targets would store the first date of the quarter in the date dimension. An
alternative approach is to create key columns in the date table. For example, a quarter
key could store the quarter key by using YYYYQ format and the smallint data type.

The dimension should be populated with the known range of dates used by all fact
tables. It should also include future dates when the data warehouse stores facts about
targets, budgets, or forecasts. As with other dimensions, you might include rows that
represent missing, unknown, N/A, or error situations.

 Tip

Search the internet for "date dimension generator" to find scripts and spreadsheets
that generate date data.

Typically, at the beginning of the next year, the ETL process should extend the date
dimension rows to a specific number of years ahead. When the dimension includes
relative offset attributes, the ETL process must be run daily to update offset attribute
values based on the current date (today).

Time dimension

Sometimes, facts need to be stored at a point in time (as in time of day). In this case,
create a time (or clock) dimension. It could have a grain of minutes (24 x 60 = 1,440
rows) or even seconds (24 x 60 x 60 = 86,400 rows). Other possible grains include half
hour or hour.

The natural key of a time dimension should use the time data type. The surrogate key
could use an appropriate format and store values that have meaning and are human
readable, for example, by using the HHMM or HHMMSS format.

Here are some common attributes to include in a time dimension.

Hour , HalfHour , QuarterHour , Minute


Time period labels (morning, afternoon, evening, night)
Work shift names
Peak or off-peak flags

Conformed dimensions
Some dimensions might be conformed dimensions. Conformed dimensions relate to
many fact tables, and so they're shared by multiple stars in a dimensional model. They
deliver consistency and can help you to reduce ongoing development and maintenance.

For example, it's typical that fact tables store at least one date dimension key (because
activity is almost always recorded by date and/or time). For that reason, a date
dimension is a common conformed dimension. You should therefore ensure that your
date dimension includes attributes relevant for the analysis of all fact tables.

The following diagram shows the Sales fact table and the Inventory fact table. Each
fact table relates to the Date dimension and Product dimension, which are conformed
dimensions.

As another example, your employee and users could be the same set of people. In this
case, it might make sense to combine the attributes of each entity to produce one
conformed dimension.

Role-playing dimensions
When a dimension is referenced multiple times in a fact table, it's known as a role-
playing dimension.

For example, when a sales fact table has order date, ship date, and delivery date
dimension keys, the date dimension relates in three ways. Each way represents a distinct
role, yet there's only one physical date dimension.

The following diagram depicts a Flight fact table. The Airport dimension is a role-
playing dimension because it's related twice to the fact table as the Departure Airport
dimension and the Arrival Airport dimension.

Junk dimensions
A junk dimension is useful when there are many independent dimensions, especially
when they comprise a few attributes (perhaps one), and when these attributes have low
cardinality (few values). The objective of a junk dimension is to consolidate many small
dimensions into a single dimension. This design approach can reduce the number of
dimensions, and decrease the number of fact table keys and thus fact table storage size.
They also help to reduce Data pane clutter because they present fewer tables to users.

A junk dimension table typically stores the Cartesian product of all dimension attribute
values, with a surrogate key attribute.

Good candidates include flags and indicators, order status, and customer demographic
states (gender, age group, and others).

The following diagram depicts a junk dimension named Sales Status that combines
order status values and delivery status values.

Degenerate dimensions
A degenerate dimension can occur when the dimension is at the same grain as the
related facts. A common example of a degenerate dimension is a sales order number
dimension that relates to a sales fact table. Typically, the sales order number is a single, non-
hierarchical attribute in the fact table. So, it's an accepted practice not to copy this data
to create a separate dimension table.

The following diagram depicts a Sales Order dimension that's a degenerate dimension
based on the SalesOrderNumber column in a sales fact table. This dimension is
implemented as a view that retrieves the distinct sales order number values.
 Tip

It's possible to create a view in a Fabric Warehouse that presents the degenerate
dimension as a dimension for querying purposes.

From a Power BI semantic modeling perspective, a degenerate dimension can be


created as a separate table by using Power Query. That way, the semantic model
conforms to the best practice that fields used to filter or group are sourced from
dimension tables, and fields used to summarize facts are sourced from fact tables.
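For example, a minimal sketch of such a view, based on the SalesOrderNumber column and the f_Sales fact table from the diagram:

SQL

--Expose distinct sales order numbers as a degenerate dimension
CREATE VIEW [dbo].[d_SalesOrder]
AS
SELECT DISTINCT
    [SalesOrderNumber]
FROM [dbo].[f_Sales];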

Outrigger dimensions
When a dimension table relates to other dimension tables, it's known as an outrigger
dimension. An outrigger dimension can help to conform and reuse definitions in the
dimensional model.

For example, you could create a geography dimension that stores geographic locations
for every postal code. That dimension could then be referenced by your customer
dimension and salesperson dimension, which would store the surrogate key of the
geography dimension. That way, customers and salespeople could then be analyzed by
using consistent geographic locations.

The following diagram depicts a Geography dimension that's an outrigger dimension. It doesn't relate directly to the Sales fact table. Instead, it's related indirectly via the Customer dimension and the Salesperson dimension.

Consider that the date dimension can be used as an outrigger dimension when other
dimension table attributes store dates. For example, the birth date in a customer
dimension could be stored by using the surrogate key of the date dimension table.

Multivalued dimensions
When a dimension attribute must store multiple values, you need to design a
multivalued dimension. You implement a multivalued dimension by creating a bridge
table (sometimes called a join table). A bridge table stores a many-to-many relationship
between entities.

For example, consider there's a salesperson dimension, and that each salesperson is
assigned to one or possibly more sales regions. In this case, it makes sense to create a
sales region dimension. That dimension stores each sales region only once. A separate
table, known as the bridge table, stores a row for each salesperson and sales region
relationship. Physically, there's a one-to-many relationship from the salesperson
dimension to the bridge table, and another one-to-many relationship from the sales
region dimension to the bridge table. Logically, there's a many-to-many relationship
between salespeople and sales regions.

In the following diagram, the Account dimension table relates to the Transaction fact
table. Because customers can have multiple accounts and accounts can have multiple
customers, the Customer dimension table is related via the Customer Account bridge
table.
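
A minimal sketch of the bridge table design follows. The table and column names are hypothetical; each row stores one customer-account relationship.

SQL

--Bridge table that resolves the many-to-many relationship
--between the Customer and Account dimensions.
CREATE TABLE b_CustomerAccount
(
    Customer_FK INT NOT NULL,
    Account_FK INT NOT NULL
);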

Related content
In the next article in this series, learn about guidance and design best practices for fact
tables.



Dimensional modeling in Microsoft
Fabric Warehouse: Fact tables
Article • 06/21/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

Note

This article forms part of the Dimensional modeling series of articles. This series
focuses on guidance and design best practices related to dimensional modeling in
Microsoft Fabric Warehouse.

This article provides you with guidance and best practices for designing fact tables in a
dimensional model. It provides practical guidance for Warehouse in Microsoft Fabric,
which is an experience that supports many T-SQL capabilities, like creating tables and
managing data in tables. So, you're in complete control of creating your dimensional
model tables and loading them with data.

Note

In this article, the term data warehouse refers to an enterprise data warehouse,
which delivers comprehensive integration of critical data across the organization. In
contrast, the standalone term warehouse refers to a Fabric Warehouse, which is a
software as a service (SaaS) relational database offering that you can use to
implement a data warehouse. For clarity, in this article the latter is referred to as
Fabric Warehouse.

Tip

If you're inexperienced with dimensional modeling, consider this series of articles your first step. It isn't intended to provide a complete discussion on dimensional
modeling design. For more information, refer directly to widely adopted published
content, like The Data Warehouse Toolkit: The Definitive Guide to Dimensional
Modeling (3rd edition, 2013) by Ralph Kimball, and others.

In a dimensional model, a fact table stores measurements associated with observations or events. It could store sales orders, stock balances, exchange rates, temperature readings, and more.

Fact tables include measures, which are typically numeric columns, like sales order
quantity. Analytic queries summarize measures (by using sum, count, average, and other
functions) within the context of dimension filters and groupings.

Fact tables also include dimension keys, which determine the dimensionality of the facts.
The dimension key values determine the granularity of the facts, which is the atomic
level by which facts are defined. For example, an order date dimension key in a sales fact
table sets the granularity of the facts at date level, while a target date dimension key in a
sales target fact table could set the granularity at quarter level.

Note

While it's possible to store facts at a higher granularity, it's not easy to split out
measure values to lower levels of granularity (if required). Sheer data volumes,
together with analytic requirements, might provide valid reason to store higher
granularity facts but at the expense of detailed analysis.

To easily identify fact tables, you typically prefix their names with f_ or Fact_ .

Fact table structure


To describe the structure of a fact table, consider the following example of a sales fact
table named f_Sales . This example applies good design practices. Each of the groups of
columns is described in the following sections.

SQL

CREATE TABLE f_Sales
(
--Dimension keys
OrderDate_Date_FK INT NOT NULL,
ShipDate_Date_FK INT NOT NULL,
Product_FK INT NOT NULL,
Salesperson_FK INT NOT NULL,
<…>

--Attributes
SalesOrderNo INT NOT NULL,
SalesOrderLineNo SMALLINT NOT NULL,

--Measures
Quantity INT NOT NULL,
<…>

--Audit attributes
AuditMissing BIT NOT NULL,
AuditCreatedDate DATE NOT NULL,
AuditCreatedBy VARCHAR(15) NOT NULL,
AuditLastModifiedDate DATE NOT NULL,
AuditLastModifiedBy VARCHAR(15) NOT NULL
);

Primary key
As shown in the example, the fact table doesn't have a primary key. That's
because it doesn't typically serve a useful purpose, and it would unnecessarily increase
the table storage size. A primary key is often implied by the set of dimension keys and
attributes.

Dimension keys
The sample fact table has various dimension keys, which determine the dimensionality of
the fact table. Dimension keys are references to the surrogate keys (or higher-level
attributes) in the related dimensions.

Note

It's an unusual fact table that doesn't include at least one date dimension key.

A fact table can reference a dimension multiple times. In this case, it's known as a role-
playing dimension. In this example, the fact table has the OrderDate_Date_FK and
ShipDate_Date_FK dimension keys. Each dimension key represents a distinct role, yet

there's only one physical date dimension.

It's a good practice to set each dimension key as NOT NULL . During the fact table load,
you can use special dimension members to represent missing, unknown, N/A, or error
states (if necessary).

Attributes
The sample fact table has two attributes. Attributes provide additional information and
set the granularity of fact data, but they're neither dimension keys nor dimension
attributes, nor measures. In this example, attribute columns store sales order
information. Other examples could include tracking numbers or ticket numbers. For
analysis purposes, an attribute could form a degenerate dimension.
Measures
The sample fact table also has measures, like the Quantity column. Measure columns
are typically numeric and commonly additive (meaning they can be summed, and
summarized by using other aggregations). For more information, see Measure types
later in this article.

Audit attributes
The sample fact table also has various audit attributes. Audit attributes are optional.
They allow you to track when and how fact records were created or modified, and they
can include diagnostic or troubleshooting information raised during Extract, Transform,
and Load (ETL) processes. For example, you'll want to track who (or what process)
updated a row, and when. Audit attributes can also help diagnose a challenging
problem, like when an ETL process stops unexpectedly.

Fact table size


Fact tables vary in size. Their size corresponds to the dimensionality, granularity, number
of measures, and amount of history. In comparison to dimension tables, fact tables are
narrower (fewer columns) but big or even immense in terms of rows (in excess of
billions).

Fact design concepts


This section describes various fact design concepts.

Fact table types


There are three types of fact tables:

Transaction fact tables


Periodic snapshot fact tables
Accumulating snapshot fact tables

Transaction fact tables


A transaction fact table stores business events or transactions. Each row stores facts in
terms of dimension keys and measures, and optionally other attributes. All the data is
fully known when inserted, and it never changes (except to correct errors).
Typically, transaction fact tables store facts at the lowest possible level of granularity,
and they contain measures that are additive across all dimensions. A sales fact table that
stores every sales order line is a good example of a transaction fact table.

Periodic snapshot fact tables


A periodic snapshot fact table stores measurements at a predefined time, or specific
intervals. It provides a summary of key metrics or performance indicators over time, and
so it's useful for trend analysis and monitoring change over time. Measures are always
semi-additive (described later).

An inventory fact table is a good example of a periodic snapshot table. It's loaded every
day with the end-of-day stock balance of every product.

Periodic snapshot tables can be used instead of a transaction fact table when recording
large volumes of transactions is expensive, and it doesn't support any useful analytic
requirement. For example, there might be millions of stock movements in a day (which
could be stored in a transaction fact table), but your analysis is only concerned with
trends of end-of-day stock levels.

Accumulating snapshot fact tables


An accumulating snapshot fact table stores measurements that accumulate across a
well-defined period or workflow. It often records the state of a business process at
distinct stages or milestones, which might take days, weeks, or even months to
complete.

A fact row is loaded soon after the first event in a process, and then the row is updated
in a predictable sequence every time a milestone event occurs. Updates continue until
the process completes.

Accumulating snapshot fact tables have multiple date dimension keys, each representing a milestone event. Some dimension keys might record an N/A state until the process
arrives at a certain milestone. Measures typically record durations. Durations between
milestones can provide valuable insight into a business workflow or assembly process.

Measure types
Measures are typically numeric, and commonly additive. However, some measures can't
always be added. These measures are categorized as either semi-additive or non-
additive.
Additive measures
An additive measure can be summed across any dimension. For example, order quantity
and sales revenue are additive measures (providing revenue is recorded for a single
currency).

Semi-additive measures

A semi-additive measure can be summed across certain dimensions only.

Here are some examples of semi-additive measures.

Any measure in a periodic snapshot fact table can't be summed across other time
periods. For example, you shouldn't sum the age of an inventory item sampled
nightly, but you could sum the age of all inventory items on a shelf, each night.
A stock balance measure in an inventory fact table can't be summed across other
products.
Sales revenue in a sales fact table that has a currency dimension key can't be
summed across currencies.

Non-additive measures

A non-additive measure can't be summed across any dimension. One example is a temperature reading, which by its nature doesn't make sense to add to other readings.

Other examples include rates, like unit prices, and ratios. However, it's considered a
better practice to store the values used to compute the ratio, which allows the ratio to
be calculated if needed. For example, a discount percentage of a sales fact could be
stored as a discount amount measure (to be divided by the sales revenue measure). Or,
the age of an inventory item on the shelf shouldn't be summed over time, but you
might observe a trend in the average age of inventory items.

While some measures can't be summed, they're still valid measures. They can be
aggregated by using count, distinct count, minimum, maximum, average, and others.
Also, non-additive measures can become additive when they're used in calculations. For
example, unit price multiplied by order quantity produces sales revenue, which is
additive.
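
For example, the following sketch assumes a hypothetical UnitPrice column alongside the Quantity measure in the f_Sales table; the non-additive unit price contributes to an additive sales revenue result.

SQL

--Unit price is non-additive, but unit price multiplied by quantity is additive.
SELECT
    Product_FK,
    SUM(Quantity * UnitPrice) AS SalesRevenue
FROM f_Sales
GROUP BY Product_FK;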

Factless fact tables


When a fact table doesn't contain any measure columns, it's called a factless fact table. A
factless fact table typically records events or occurrences, like students attending class.
From an analytics perspective, a measurement can be achieved by counting fact rows.
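
For example, a simple sketch that measures attendance by counting rows, assuming a hypothetical f_ClassAttendance factless fact table:

SQL

--Counting rows of a factless fact table produces the measurement.
SELECT
    Class_FK,
    COUNT(*) AS AttendanceCount
FROM f_ClassAttendance
GROUP BY Class_FK;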

Aggregate fact tables


An aggregate fact table represents a rollup of a base fact table to a lower dimensionality
and/or higher granularity. Its purpose is to accelerate query performance for commonly
queried dimensions.

Note

A Power BI semantic model can generate user-defined aggregations to achieve the same result, or use the data warehouse aggregate fact table by using DirectQuery
storage mode.

Related content
In the next article in this series, learn about guidance and design best practices for
loading dimensional model tables.



Dimensional modeling in Microsoft
Fabric Warehouse: Load tables
Article • 06/21/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

Note

This article forms part of the Dimensional modeling series of articles. This series
focuses on guidance and design best practices related to dimensional modeling in
Microsoft Fabric Warehouse.

This article provides you with guidance and best practices for loading dimension and
fact tables in a dimensional model. It provides practical guidance for Warehouse in
Microsoft Fabric, which is an experience that supports many T-SQL capabilities, like
creating tables and managing data in tables. So, you're in complete control of creating
your dimensional model tables and loading them with data.

Note

In this article, the term data warehouse refers to an enterprise data warehouse,
which delivers comprehensive integration of critical data across the organization. In
contrast, the standalone term warehouse refers to a Fabric Warehouse, which is a
software as a service (SaaS) relational database offering that you can use to
implement a data warehouse. For clarity, in this article the latter is referred to as
Fabric Warehouse.

Tip

If you're inexperienced with dimensional modeling, consider this series of articles your first step. It isn't intended to provide a complete discussion on dimensional
modeling design. For more information, refer directly to widely adopted published
content, like The Data Warehouse Toolkit: The Definitive Guide to Dimensional
Modeling (3rd edition, 2013) by Ralph Kimball, and others.

Load a dimensional model


Loading a dimensional model involves periodically running an Extract, Transform, and
Load (ETL) process. An ETL process orchestrates the running of other processes, which
are generally concerned with staging source data, synchronizing dimension data,
inserting rows into fact tables, and recording auditing data and errors.

For a Fabric Warehouse solution, you can use Data Factory to develop and run your ETL
process. The process can stage, transform, and load source data into your dimensional
model tables.

Specifically, you can:

Use data pipelines to build workflows to orchestrate the ETL process. Data
pipelines can execute SQL scripts, stored procedures, and more.
Use dataflows to develop low-code logic to ingest data from hundreds of data
sources. Dataflows support combining data from multiple sources, transforming
data, and then loading it to a destination, like a dimensional model table.
Dataflows are built by using the familiar Power Query experience that's available
today across many Microsoft products, including Microsoft Excel and Power BI
Desktop.

Note

ETL development can be complex, and managing it can be challenging. It's estimated that 60-80 percent of a data warehouse development effort is dedicated
to the ETL process.

Orchestration
The general workflow of an ETL process is to:

1. Optionally, load staging tables.


2. Process dimension tables.
3. Process fact tables.
4. Optionally, perform post-processing tasks, like triggering the refresh of dependent
Fabric content (like a semantic model).
Dimension tables should be processed first to ensure that they store all dimension
members, including those added to source systems since the last ETL process. When
there are dependencies between dimensions, as is the case with outrigger dimensions,
dimension tables should be processed in order of dependency. For example, a
geography dimension that's used by a customer dimension and a vendor dimension
should be processed before the other two dimensions.

Fact tables can be processed once all dimension tables are processed.

When all dimensional model tables are processed, you might trigger the refresh of
dependent semantic models. It's also a good idea to send a notification to relevant staff
to inform them of the outcome of the ETL process.

Stage data
Staging source data can help support data loading and transformation requirements. It
involves extracting source system data and loading it into staging tables, which you
create to support the ETL process. We recommend that you stage source data because it
can:

Minimize the impact on operational systems.


Be used to assist with, and optimize, ETL processing.
Provide the ability to restart the ETL process, without the need to reload data from
source systems.

Data in staging tables should never be made available to business users. It's only
relevant to the ETL process.

Note

When your data is stored in a Fabric Lakehouse, it might not be necessary to stage
its data in the data warehouse. If it implements a medallion architecture, you could
source its data from either the bronze, silver, or gold layer.

We recommend that you create a schema in the warehouse, possibly named staging .
Staging tables should resemble the source tables as closely as possible in terms of
column names and data types. The contents of each table should be removed at the
start of the ETL process. However, note that Fabric Warehouse tables can't be truncated.
Instead, you can drop and recreate each staging table before loading it with data.

You can also consider data virtualization alternatives as part of your staging strategy.
You can use:

Mirroring, which is a low-cost and low-latency turnkey solution that allows you to
create a replica of your data in OneLake. For more information, see Why use
Mirroring in Fabric?.
OneLake shortcuts, which point to other storage locations that could contain your
source data. Shortcuts can be used as tables in T-SQL queries.
PolyBase in SQL Server, which is a data virtualization feature for SQL Server.
PolyBase allows T-SQL queries to join data from external sources to relational
tables in an instance of SQL Server.
Data virtualization with Azure SQL Managed Instance, which allows you to execute
T-SQL queries on files storing data in common data formats in Azure Data Lake
Storage (ADLS) Gen2 or Azure Blob Storage, and combine it with locally stored
relational data by using joins.

Transform data
The structure of your source data might not resemble the destination structures of your
dimensional model tables. So, your ETL process needs to reshape the source data to
align with the structure of the dimensional model tables.

Also, the data warehouse must deliver cleansed and conformed data, so source data
might need to be transformed to ensure quality and consistency.

Note

The concept of garbage in, garbage out certainly applies to data warehousing—
therefore, avoid loading garbage (low quality) data into your dimensional model
tables.

Here are some transformations that your ETL process could perform.

Combine data: Data from different sources can be integrated (merged) based on
matching keys. For example, product data is stored across different systems (like
manufacturing and marketing), yet they all use a common stock-keeping unit
(SKU). Data can also be appended when it shares a common structure. For
example, sales data is stored in multiple systems. A union of the sales from each
system can produce a superset of all sales data.
Convert data types: Data types can be converted to those defined in the
dimensional model tables.
Calculations: Calculations can be done to produce values for the dimensional
model tables. For example, for an employee dimension table, you might
concatenate first and last names to produce the full name. As another example, for
your sales fact table, you might calculate gross sales revenue, which is the product
of unit price and quantity.
Detect and manage historical change: Change can be detected and appropriately
stored in dimension tables. For more information, see Manage historical change
later in this article.
Aggregate data: Aggregation can be used to reduce fact table dimensionality
and/or to raise the granularity of the facts. For example, the sales fact table doesn't
need to store sales order numbers. Therefore, an aggregated result that groups by
all dimension keys can be used to store the fact table data.

Load data
You can load tables in a Fabric Warehouse by using the following data ingestion options.
COPY INTO (T-SQL): This option is useful when the source data comprise Parquet
or CSV files stored in an external Azure storage account, like ADLS Gen2 or Azure
Blob Storage.
Data pipelines: In addition to orchestrating the ETL process, data pipelines can
include activities that run T-SQL statements, perform lookups, or copy data from a
data source to a destination.
Dataflows: As an alternative to data pipelines, dataflows provide a code-free
experience to transform and clean data.
Cross-warehouse ingestion: When data is stored in the same workspace, cross-
warehouse ingestion allows joining different warehouse or lakehouse tables. It
supports T-SQL commands like INSERT…SELECT , SELECT INTO , and CREATE TABLE AS
SELECT (CTAS). These commands are especially useful when you want to transform and load data from staging tables within the same workspace. They're also set-based operations, which are likely to be the most efficient and fastest way to load dimensional model tables.

Tip

For a complete explanation of these data ingestion options including best practices,
see Ingest data into the Warehouse.

Logging
ETL processes usually require dedicated monitoring and maintenance. For these reasons,
we recommend that you log the results of the ETL process to non-dimensional model
tables in your warehouse. You should generate a unique ID for each ETL process and use
it to log details about every operation.

Consider logging the following details (a sample log table sketch follows this list):

The ETL process:


A unique ID for each ETL execution
Start time and end time
Status (success or failure)
Any errors encountered
Each staging and dimensional model table:
Start time and end time
Status (success or failure)
Rows inserted, updated, and deleted
Final table row count
Any errors encountered
Other operations:
Start time and end time of semantic model refresh operations
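
As a sketch of what this could look like, the following table definition captures process-level entries. The schema, table, and column names are assumptions rather than a prescribed design.

SQL

--One possible ETL process log table.
CREATE TABLE etl.ProcessLog
(
    ETLProcessID   VARCHAR(36) NOT NULL,   --Unique ID generated for each ETL execution
    TableName      VARCHAR(128) NULL,      --NULL for process-level rows
    StartTime      DATETIME2(6) NOT NULL,
    EndTime        DATETIME2(6) NULL,
    Status         VARCHAR(20) NOT NULL,   --For example, Success or Failure
    RowsInserted   BIGINT NULL,
    RowsUpdated    BIGINT NULL,
    RowsDeleted    BIGINT NULL,
    FinalRowCount  BIGINT NULL,
    ErrorMessage   VARCHAR(4000) NULL
);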

Tip

You can create a semantic model that's dedicated to monitoring and analyzing your
ETL processes. Process durations can help you identify bottlenecks that might
benefit from review and optimization. Row counts can allow you to understand the
size of the incremental load each time the ETL runs, and also help to predict the
future size of the data warehouse (and when to scale up the Fabric capacity, if
appropriate).

Process dimension tables


Processing a dimension table involves synchronizing the data warehouse data with the
source systems. Source data is first transformed and prepared for loading into its
dimension table. This data is then matched with the existing dimension table data by
joining on the business keys. It's then possible to determine whether the source data
represents new or modified data. When the dimension table applies slowly changing
dimension (SCD) type 1, changes are made by updating the existing dimension table
rows. When the table applies SCD type 2 changes, the existing version is expired and a
new version is inserted.

The following diagram depicts the logic used to process a dimension table.

Consider the process of the Product dimension table.

When new products are added to the source system, rows are inserted into the
Product dimension table.

When products are modified, existing rows in the dimension table are either
updated or inserted.
When SCD type 1 applies, updates are made to the existing rows.
When SCD type 2 applies, updates are made to expire the current row versions,
and new rows that represent the current version are inserted.
When SCD type 3 applies, a process similar to SCD type 1 occurs, updating the
existing rows without inserting new rows.

Surrogate keys
We recommend that each dimension table has a surrogate key, which should use the
smallest possible integer data type. In SQL Server-based environments that's typically
done by creating an identity column, however this feature isn't supported in Fabric
Warehouse. Instead, you'll need to use a workaround technique that generates unique
identifiers.
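
One possible workaround is sketched below: it derives new surrogate key values from the current maximum key by using the ROW_NUMBER function. The staging and dimension table names and columns are hypothetical.

SQL

--Assign surrogate keys to new dimension members without an identity column.
INSERT INTO dbo.d_Product (Product_SK, ProductCode, ProductName)
SELECT
    m.MaxKey + ROW_NUMBER() OVER (ORDER BY s.ProductCode) AS Product_SK,
    s.ProductCode,
    s.ProductName
FROM staging.Product AS s
CROSS JOIN (SELECT COALESCE(MAX(Product_SK), 0) AS MaxKey FROM dbo.d_Product) AS m
LEFT JOIN dbo.d_Product AS d
    ON s.ProductCode = d.ProductCode
WHERE d.ProductCode IS NULL;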

Important

When a dimension table includes automatically generated surrogate keys, you should never perform a truncate and full reload of it. That's because it would
invalidate the data loaded into fact tables that use the dimension. Also, if the
dimension table supports SCD type 2 changes, it might not be possible to
regenerate the historical versions.

Manage historical change


When a dimension table must store historical change, you'll need to implement a slowly
changing dimension (SCD).

Note

If the dimension table row is an inferred member (inserted by a fact load process),
you should treat any changes as late arriving dimension details instead of an SCD
change. In this case, any changed attributes should be updated and the inferred
member flag column set to FALSE .

It's possible that a dimension could support SCD type 1 and/or SCD type 2 changes.

SCD type 1

When SCD type 1 changes are detected, use the following logic.

1. Update any changed attributes.


2. If the table includes last modified date and last modified by columns, set the current
date and process that made the modifications.
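
The following is a minimal T-SQL sketch of this logic. The staging.Product and dbo.d_Product tables, the ProductCode business key, and the compared column are assumptions.

SQL

--SCD type 1: overwrite changed attributes on the existing dimension rows.
UPDATE d
SET d.ProductName = s.ProductName,
    d.AuditLastModifiedDate = CAST(GETDATE() AS DATE),
    d.AuditLastModifiedBy = 'ETL'
FROM dbo.d_Product AS d
INNER JOIN staging.Product AS s
    ON d.ProductCode = s.ProductCode
WHERE d.ProductName <> s.ProductName;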

SCD type 2

When SCD type 2 changes are detected, use the following logic.
1. Expire the current version by setting the end date validity column to the ETL
processing date (or a suitable timestamp in the source system) and the current flag
to FALSE .
2. If the table includes last modified date and last modified by columns, set the current
date and process that made the modifications.
3. Insert new members that have the start date validity column set to the end date
validity column value (used to update the prior version) and has the current
version flag set to TRUE .
4. If the table includes created date and created by columns, set the current date and
process that made the insertions.
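
The following is a simplified T-SQL sketch of steps 1 and 3 for a single changed attribute. The table names, column names, and key-generation approach are assumptions.

SQL

DECLARE @ProcessDate DATE = CAST(GETDATE() AS DATE);

--Step 1: expire the current version of changed members.
UPDATE d
SET d.EndDate = @ProcessDate,
    d.IsCurrent = 0
FROM dbo.d_Customer AS d
INNER JOIN staging.Customer AS s
    ON d.CustomerCode = s.CustomerCode
WHERE d.IsCurrent = 1
    AND d.CustomerSegment <> s.CustomerSegment;

--Step 3: insert a new current version for the members expired above.
INSERT INTO dbo.d_Customer
    (Customer_SK, CustomerCode, CustomerSegment, StartDate, EndDate, IsCurrent)
SELECT
    m.MaxKey + ROW_NUMBER() OVER (ORDER BY s.CustomerCode),
    s.CustomerCode,
    s.CustomerSegment,
    @ProcessDate,
    NULL,
    1
FROM staging.Customer AS s
CROSS JOIN (SELECT COALESCE(MAX(Customer_SK), 0) AS MaxKey FROM dbo.d_Customer) AS m
INNER JOIN dbo.d_Customer AS d
    ON s.CustomerCode = d.CustomerCode
    AND d.EndDate = @ProcessDate
    AND d.IsCurrent = 0;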

SCD type 3
When SCD type 3 changes are detected, update the attributes by using similar logic to
processing SCD type 1.

Dimension member deletions


Take care if source data indicates that dimension members were deleted (either because
they're not retrieved from the source system, or they've been flagged as deleted). You
shouldn't synchronize deletions with the dimension table, unless dimension members
were created in error and there are no fact records related to them.

The appropriate way to handle source deletions is to record them as a soft delete. A soft
delete marks a dimension member as no longer active or valid. To support this case,
your dimension table should include a Boolean attribute with the bit data type, like
IsDeleted . Update this column for any deleted dimension members to TRUE (1). The

current, latest version of a dimension member might similarly be marked with a Boolean
(bit) value in the IsCurrent or IsActive columns. All reporting queries and Power BI
semantic models should filter out records that are soft deletes.
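
For example, here's a sketch that soft-deletes members missing from the latest source extract, assuming hypothetical staging.Product and dbo.d_Product tables:

SQL

--Soft delete: flag members that no longer appear in the source.
UPDATE d
SET d.IsDeleted = 1
FROM dbo.d_Product AS d
LEFT JOIN staging.Product AS s
    ON d.ProductCode = s.ProductCode
WHERE s.ProductCode IS NULL
    AND d.IsDeleted = 0;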

Date dimension
Calendar and time dimensions are special cases because they usually don't have source
data. Instead, they're generated by using fixed logic.

You should load the date dimension table at the beginning of every new year to extend
its rows to a specific number of years ahead. There might be other business data, for
example fiscal year data, holidays, and week numbers, that you need to update regularly.
When the date dimension table includes relative offset attributes, the ETL process must
be run daily to update offset attribute values based on the current date (today).
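
For example, here's a sketch of such a daily update, assuming a hypothetical dbo.d_Date table with a Date column and relative offset columns:

SQL

--Refresh relative offset attributes based on today's date.
UPDATE dbo.d_Date
SET DayOffset = DATEDIFF(DAY, CAST(GETDATE() AS DATE), [Date]),
    MonthOffset = DATEDIFF(MONTH, CAST(GETDATE() AS DATE), [Date]);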

We recommend that the logic to extend or update the date dimension table be written
in T-SQL and encapsulated in a stored procedure.

Process fact tables


Processing a fact table involves synchronizing the data warehouse data with the source
system facts. Source data is first transformed and prepared for loading into its fact table.
Then, for each dimension key, a lookup determines the surrogate key value to store in
the fact row. When a dimension supports SCD type 2, the surrogate key for the current
version of the dimension member should be retrieved.

Note

Usually the surrogate key can be computed for the date and time dimensions
because they should use YYYYMMDD or HHMM format. For more information, see
Calendar and time.

If a dimension key lookup fails, it could indicate an integrity issue with the source
system. In this case, the fact row must still get inserted into the fact table. A valid
dimension key must still be stored. One approach is to store a special dimension
member (like Unknown). This approach requires a later update to correctly assign the
true dimension key value, when known.
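
A sketch of such a lookup follows. It assumes an Unknown dimension member with a surrogate key of -1, plus hypothetical staging and dimension tables.

SQL

--Resolve dimension keys, falling back to the Unknown member (-1) when no match exists.
SELECT
    COALESCE(p.Product_SK, -1) AS Product_FK,
    s.SalesOrderNo,
    s.SalesOrderLineNo,
    s.Quantity
FROM staging.Sales AS s
LEFT JOIN dbo.d_Product AS p
    ON s.ProductCode = p.ProductCode;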

Important

Because Fabric Warehouse doesn't enforce foreign keys, it's critical that the ETL
process check for integrity when it loads data into fact tables.

Another approach, relevant when there's confidence that the natural key is valid, is to
insert a new dimension member and then store its surrogate key value. For more
information, see Inferred dimension members later in this section.

The following diagram depicts the logic used to process a fact table.

Whenever possible, a fact table should be loaded incrementally, meaning that new facts
are detected and inserted. An incremental load strategy is more scalable, and it reduces
the workload for both the source systems and the destination systems.

Important

Especially for a large fact table, it should be a last resort to truncate and reload a
fact table. That approach is expensive in terms of process time, compute resources,
and possible disruption to the source systems. It also involves complexity when the
fact table dimensions apply SCD type 2. That's because dimension key lookups will
need to be done within the validity period of the dimension member versions.

Hopefully, you can efficiently detect new facts by relying on source system identifiers or
timestamps. For example, when a source system reliably records sales orders that are in
sequence, you can store the latest sales order number retrieved (known as the high
watermark). The next process can use that sales order number to retrieve newly created
sales orders, and again, store the latest sales order number retrieved for use by the next
process. It might also be possible that a create date column could be used to reliably
detect new orders.
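
Here's a sketch of this high watermark pattern. It assumes a hypothetical etl.Watermark control table and a staging.Sales table, and the fact table column list is abbreviated for brevity.

SQL

--Retrieve the last sales order number that was loaded.
DECLARE @HighWatermark INT =
    (SELECT LastSalesOrderNo FROM etl.Watermark WHERE TableName = 'f_Sales');

--Insert only facts that arrived after the previous load.
INSERT INTO dbo.f_Sales (OrderDate_Date_FK, Product_FK, SalesOrderNo, SalesOrderLineNo, Quantity)
SELECT s.OrderDate_Date_FK, s.Product_FK, s.SalesOrderNo, s.SalesOrderLineNo, s.Quantity
FROM staging.Sales AS s
WHERE s.SalesOrderNo > @HighWatermark;

--Record the new high watermark for the next run.
UPDATE etl.Watermark
SET LastSalesOrderNo = (SELECT MAX(SalesOrderNo) FROM staging.Sales)
WHERE TableName = 'f_Sales';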

If you can't rely on the source system data to efficiently detect new facts, you might be
able to rely on a capability of the source system to perform an incremental load. For
example, SQL Server and Azure SQL Managed Instance have a feature called change
data capture (CDC), which can track changes to each row in a table. Also, SQL Server,
Azure SQL Managed Instance, and Azure SQL Database have a feature called change
tracking, which can identify rows that have changed. When enabled, it can help you to
efficiently detect new or changed data in any database table. You might also be able to
add triggers to relational tables that store keys of inserted, updated, or deleted table
records.

Lastly, you might be able to correlate source data to fact table by using attributes. For
example, the sales order number and sales order line number. However, for large fact
tables, it could be a very expensive operation to detect new, changed, or deleted facts. It
could also be problematic when the source system archives operational data.

Inferred dimension members


When a fact load process inserts a new dimension member, it's known as an inferred
member. For example, when a hotel guest checks in, they're asked to join the hotel chain
as a loyalty member. A membership number is issued immediately, but the details of the
guest might not follow until the paperwork is submitted by the guest (if ever).

All that's known about the dimension member is its natural key. The fact load process
needs to create a new dimension member by using Unknown attribute values.
Importantly, it must set the IsInferredMember audit attribute to TRUE . That way, when
the late arriving details are sourced, the dimension load process can make the necessary
updates to the dimension row. For more information, see Manage historical change in
this article.

Fact updates or deletions


You might be required to update or delete fact data. For example, when a sales order
gets canceled, or an order quantity is changed. As described earlier for loading fact
tables, you need to efficiently detect changes and perform appropriate modifications to
the fact data. In this example for the canceled order, the sales order status would
probably change from Open to Canceled. That change would require an update of the
fact data, and not the deletion of a row. For the quantity change, an update of the fact
row quantity measure would be necessary. This strategy of using soft deletes preserves
history. A soft delete marks a row as no longer active or valid, and all reporting queries
and Power BI semantic models should filter out records that are soft deletes.

When you anticipate fact updates or deletions, you should include attributes (like a sales
order number and its sales order line number) in the fact table to help identify the fact
rows to modify. Be sure to index these columns to support efficient modification
operations.

Lastly, if fact data was inserted by using a special dimension member (like Unknown),
you'll need to run a periodic process that retrieves current source data for such fact rows
and update dimension keys to valid values.

Related content
For more information about loading data into a Fabric Warehouse, see:

Ingest data into the Warehouse


Ingest data into your Warehouse using data pipelines
Ingest data into your Warehouse using the COPY statement
Ingest data into your Warehouse using Transact-SQL
Tutorial: Set up dbt for Fabric Warehouse



Ingest data into the Warehouse
Article • 08/02/2024

Applies to: Warehouse in Microsoft Fabric

Warehouse in Microsoft Fabric offers built-in data ingestion tools that allow users to
ingest data into warehouses at scale using code-free or code-rich experiences.

Data ingestion options


You can ingest data into a Warehouse using one of the following options:

COPY (Transact-SQL): the COPY statement offers flexible, high-throughput data ingestion from an external Azure storage account. You can use the COPY statement
as part of your existing ETL/ELT logic in Transact-SQL code.
Data pipelines: pipelines offer a code-free or low-code experience for data
ingestion. Using pipelines, you can orchestrate robust workflows for a full Extract,
Transform, Load (ETL) experience that includes activities to help prepare the
destination environment, run custom Transact-SQL statements, perform lookups,
or copy data from a source to a destination.
Dataflows: an alternative to pipelines, dataflows enable easy data preparation,
cleaning, and transformation using a code-free experience.
Cross-warehouse ingestion: data ingestion from workspace sources is also
possible. This scenario might be required when there's the need to create a new
table with a subset of a different table, or as a result of joining different tables in
the warehouse and in the lakehouse. For cross-warehouse ingestion, in addition to
the options mentioned, Transact-SQL features such as INSERT...SELECT, SELECT
INTO, or CREATE TABLE AS SELECT (CTAS) work cross-warehouse within the same
workspace.

Decide which data ingestion tool to use


To decide which data ingestion option to use, you can use the following criteria:

Use the COPY (Transact-SQL) statement for code-rich data ingestion operations,
for the highest data ingestion throughput possible, or when you need to add data
ingestion as part of a Transact-SQL logic. For syntax, see COPY INTO (Transact-
SQL).
Use data pipelines for code-free or low-code, robust data ingestion workflows that
run repeatedly, at a schedule, or that involves large volumes of data. For more
information, see Ingest data using Data pipelines.
Use dataflows for a code-free experience that allow custom transformations to
source data before it's ingested. These transformations include (but aren't limited
to) changing data types, adding or removing columns, or using functions to
produce calculated columns. For more information, see Dataflows.
Use cross-warehouse ingestion for code-rich experiences to create new tables
with source data within the same workspace. For more information, see Ingest data
using Transact-SQL and Write a cross-database query.

Note

The COPY statement in Warehouse supports only data sources on Azure storage
accounts, OneLake sources are currently not supported.

Supported data formats and sources


Data ingestion for Warehouse in Microsoft Fabric offers a vast number of data formats
and sources you can use. Each of the options outlined includes its own list of supported
data connector types and data formats.

For cross-warehouse ingestion, data sources must be within the same Microsoft Fabric
workspace. Queries can be performed using three-part naming for the source data.

As an example, suppose there are two warehouses named Inventory and Sales in a workspace. A query such as the following one creates a new table in the Inventory warehouse with the content of a table in the Sales warehouse, joined with a table in the Inventory warehouse:

SQL

CREATE TABLE Inventory.dbo.RegionalSalesOrders
AS
SELECT s.SalesOrders, i.ProductName
FROM Sales.dbo.SalesOrders s
JOIN Inventory.dbo.Products i
    ON s.ProductID = i.ProductID
WHERE s.Region = 'West region';

The COPY (Transact-SQL) statement currently supports the PARQUET and CSV file
formats. For data sources, currently Azure Data Lake Storage (ADLS) Gen2 and Azure
Blob Storage are supported.
Data pipelines and dataflows support a wide variety of data sources and data formats.
For more information, see Data pipelines and Dataflows.

Best practices
The COPY command feature in Warehouse in Microsoft Fabric uses a simple, flexible,
and fast interface for high-throughput data ingestion for SQL workloads. In the current
version, we support loading data from external storage accounts only.

You can also use T-SQL to create a new table and then insert into it, and then update and
delete rows of data. Data can be inserted from any database within the Microsoft Fabric
workspace using cross-database queries. If you want to ingest data from a Lakehouse to
a warehouse, you can do this with a cross database query. For example:

SQL

INSERT INTO MyWarehouseTable
SELECT * FROM MyLakehouse.dbo.MyLakehouseTable;

Avoid ingesting data using singleton INSERT statements, as this causes poor performance on queries and updates. If singleton INSERT statements were used for data ingestion consecutively, we recommend creating a new table by using CREATE TABLE AS SELECT (CTAS) or INSERT...SELECT patterns, dropping the original table, and then creating your table again from the table you created.
Dropping your existing table impacts your semantic model, including any
custom measures or customizations you might have made to the semantic
model.
When working with external data on files, we recommend that files are at least 4
MB in size.
For large compressed CSV files, consider splitting your file into multiple files.
Azure Data Lake Storage (ADLS) Gen2 offers better performance than Azure Blob
Storage (legacy). Consider using an ADLS Gen2 account whenever possible.
For pipelines that run frequently, consider isolating your Azure storage account
from other services that could access the same files at the same time.
Explicit transactions allow you to group multiple data changes together so that
they're only visible when reading one or more tables when the transaction is fully
committed. You also have the ability to roll back the transaction if any of the
changes fail.
If a SELECT is within a transaction, and was preceded by data insertions, the
automatically generated statistics can be inaccurate after a rollback. Inaccurate
statistics can lead to unoptimized query plans and execution times. If you roll back
a transaction with SELECTs after a large INSERT, update statistics for the columns
mentioned in your SELECT.

Note

Regardless of how you ingest data into warehouses, the parquet files produced by
the data ingestion task will be optimized using V-Order write optimization. V-Order
optimizes parquet files to enable lightning-fast reads under the Microsoft Fabric
compute engines such as Power BI, SQL, Spark and others. Warehouse queries in
general benefit from faster read times for queries with this optimization, still
ensuring the parquet files are 100% compliant with the open-source parquet specification.
Unlike in Fabric Data Engineering, V-Order is a global setting in Synapse Data
Warehouse that cannot be disabled. For more information on V-Order, see
Understand and manage V-Order for Warehouse.

Related content
Ingest data using Data pipelines
Ingest data using the COPY statement
Ingest data using Transact-SQL
Create your first dataflow to get and transform data
COPY (Transact-SQL)
CREATE TABLE AS SELECT (Transact-SQL)
INSERT (Transact-SQL)



Ingest data into your Warehouse using
data pipelines
Article • 04/24/2024

Applies to: Warehouse in Microsoft Fabric

Data pipelines offer an alternative to using the COPY command through a graphical user
interface. A data pipeline is a logical grouping of activities that together perform a data
ingestion task. Pipelines allow you to manage extract, transform, and load (ETL) activities
instead of managing each one individually.

In this tutorial, you'll create a new pipeline that loads sample data into a Warehouse in
Microsoft Fabric.

Note

Some features from Azure Data Factory are not available in Microsoft Fabric, but
the concepts are interchangeable. You can learn more about Azure Data Factory
and Pipelines on Pipelines and activities in Azure Data Factory and Azure Synapse
Analytics. For a quickstart, visit Quickstart: Create your first pipeline to copy data.

Create a data pipeline


1. To create a new pipeline navigate to your workspace, select the +New button, and
select Data pipeline.
2. In the New pipeline dialog, provide a name for your new pipeline and select
Create.

3. You'll land in the pipeline canvas area, where you see three options to get started:
Add a pipeline activity, Copy data, and Choose a task to start.

Each of these options offers different alternatives to create a pipeline:

Add pipeline activity: this option launches the pipeline editor, where you can
create new pipelines from scratch by using pipeline activities.
Copy data: this option launches a step-by-step assistant that helps you select
a data source, a destination, and configure data load options such as the
column mappings. On completion, it creates a new pipeline activity with a
Copy Data task already configured for you.
Choose a task to start: this option launches a set of predefined templates to
help get you started with pipelines based on different scenarios.

Pick the Copy data option to launch the Copy assistant.

4. The first page of the Copy data assistant helps you pick your own data from
various data sources, or select from one of the provided samples to get started.
For this tutorial, we'll use the COVID-19 Data Lake sample. Select this option and
select Next.

5. In the next page, you can select a dataset, the source file format, and preview the
selected dataset. Select Bing COVID-19, the CSV format, and select Next.

6. The next page, Data destinations, allows you to configure the type of the
destination workspace. We'll load data into a warehouse in our workspace, so
select the Warehouse tab, and the Data Warehouse option. Select Next.


7. Now it's time to pick the warehouse to load data into. Select your desired
warehouse in the dropdown list and select Next.

8. The last step to configure the destination is to provide a name to the destination
table and configure the column mappings. Here you can choose to load the data
to a new table or to an existing one, provide a schema and table names, change
column names, remove columns, or change their mappings. You can accept the
defaults, or adjust the settings to your preference.

When you're done reviewing the options, select Next.

9. The next page gives you the option to use staging, or provide advanced options
for the data copy operation (which uses the T-SQL COPY command). Review the
options without changing them and select Next.

10. The last page in the assistant offers a summary of the copy activity. Select the
option Start data transfer immediately and select Save + Run.
11. You are directed to the pipeline canvas area, where a new Copy Data activity is
already configured for you. The pipeline starts to run automatically. You can
monitor the status of your pipeline in the Output pane:

12. After a few seconds, your pipeline finishes successfully. Navigating back to your
warehouse, you can select your table to preview the data and confirm that the
copy operation concluded.

For more on data ingestion into your Warehouse in Microsoft Fabric, visit:

Ingesting data into the Warehouse


Ingest data into your Warehouse using the COPY statement
Ingest data into your Warehouse using Transact-SQL

Next step
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric



Ingest data into your Warehouse using
the COPY statement
Article • 04/24/2024

Applies to: Warehouse in Microsoft Fabric

The COPY statement is the primary way to ingest data into Warehouse tables. COPY
performs high-throughput data ingestion from an external Azure storage account,
with the flexibility to configure source file format options, a location to store rejected
rows, skipping header rows, and other options.

This tutorial shows data ingestion examples for a Warehouse table using the T-SQL
COPY statement. It uses the Bing COVID-19 sample data from the Azure Open Datasets.
For details about this data, including its schema and usage rights, see Bing COVID-19.

Note

To learn more about the T-SQL COPY statement including more examples and the
full syntax, see COPY (Transact-SQL).

Create a table
Before you use the COPY statement, the destination table needs to be created. To create
the destination table for this sample, use the following steps:

1. In your Microsoft Fabric workspace, find and open your warehouse.

2. Switch to the Home tab and select New SQL query.


3. To create the table used as the destination in this tutorial, run the following code:

SQL

CREATE TABLE [dbo].[bing_covid-19_data]
(
[id] [int] NULL,
[updated] [date] NULL,
[confirmed] [int] NULL,
[confirmed_change] [int] NULL,
[deaths] [int] NULL,
[deaths_change] [int] NULL,
[recovered] [int] NULL,
[recovered_change] [int] NULL,
[latitude] [float] NULL,
[longitude] [float] NULL,
[iso2] [varchar](8000) NULL,
[iso3] [varchar](8000) NULL,
[country_region] [varchar](8000) NULL,
[admin_region_1] [varchar](8000) NULL,
[iso_subdivision] [varchar](8000) NULL,
[admin_region_2] [varchar](8000) NULL,
[load_time] [datetime2](6) NULL
);

Ingest Parquet data using the COPY statement


In the first example, we load data using a Parquet source. Since this data is publicly
available and doesn't require authentication, you can easily copy this data by specifying
the source and the destination. No authentication details are needed. You'll only need to
specify the FILE_TYPE argument.
Use the following code to run the COPY statement with a Parquet source:

SQL

COPY INTO [dbo].[bing_covid-19_data]
FROM 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet'
WITH (
FILE_TYPE = 'PARQUET'
);

Ingest CSV data using the COPY statement and skipping a header row
It's common for comma-separated value (CSV) files to have a header row that provides
the column names representing the table in a CSV file. The COPY statement can copy
data from CSV files and skip one or more rows from the source file header.

If you ran the previous example to load data from Parquet, consider deleting all data
from your table:

SQL

DELETE FROM [dbo].[bing_covid-19_data];

To load data from a CSV file skipping a header row, use the following code:

SQL

COPY INTO [dbo].[bing_covid-19_data]
FROM 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.csv'
WITH (
FILE_TYPE = 'CSV',
FIRSTROW = 2
);

Check the results


The COPY statement completes by ingesting 4,766,736 rows into your new table. You
can confirm the operation ran successfully by running a query that returns the total
number of rows in your table:
SQL

SELECT COUNT(*) FROM [dbo].[bing_covid-19_data];

If you ran both examples without deleting the rows in between runs, you'll see the result
of this query with twice as many rows. While that works for data ingestion in this case,
consider deleting all rows and ingesting data only once if you're going to further
experiment with this data.

Related content
Ingest data using Data pipelines
Ingest data into your Warehouse using Transact-SQL
Ingesting data into the Warehouse



Ingest data into your Warehouse using
Transact-SQL
Article • 04/25/2024

Applies to: Warehouse in Microsoft Fabric

The Transact-SQL language offers options you can use to load data at scale from
existing tables in your lakehouse and warehouse into new tables in your warehouse.
These options are convenient if you need to create new versions of a table with
aggregated data, versions of tables with a subset of the rows, or to create a table as a
result of a complex query. Let's explore some examples.

Create a new table with the result of a query by using CREATE TABLE AS SELECT (CTAS)
The CREATE TABLE AS SELECT (CTAS) statement allows you to create a new table in
your warehouse from the output of a SELECT statement. It runs the ingestion operation
into the new table in parallel, making it highly efficient for data transformation and
creation of new tables in your workspace.

Note

The examples in this article use the Bing COVID-19 sample dataset. To load the
sample dataset, follow the steps in Ingest data into your Warehouse using the
COPY statement to create the sample data into your warehouse.

The first example illustrates how to create a new table named dbo.[bing_covid-19_data_2023] that is a copy of the existing dbo.[bing_covid-19_data] table, but filtered to data from the year 2023 only:

SQL

CREATE TABLE [dbo].[bing_covid-19_data_2023]
AS
SELECT *
FROM [dbo].[bing_covid-19_data]
WHERE DATEPART(YEAR,[updated]) = '2023';

You can also create a new table with new year , month , dayofmonth columns, with values
obtained from updated column in the source table. This can be useful if you're trying to
visualize infection data by year, or to see months when the most COVID-19 cases are
observed:

SQL

CREATE TABLE [dbo].[bing_covid-19_data_with_year_month_day]
AS
SELECT DATEPART(YEAR,[updated]) [year], DATEPART(MONTH,[updated]) [month],
DATEPART(DAY,[updated]) [dayofmonth], *
FROM [dbo].[bing_covid-19_data];

As another example, you can create a new table that summarizes the number of cases
observed in each month, regardless of the year, to evaluate how seasonality affects
spread in a given country/region. It uses the table created in the previous example with
the new month column as a source:

SQL

CREATE TABLE [dbo].[infections_by_month]
AS
SELECT [country_region],[month], SUM(CAST(confirmed as bigint))
[confirmed_sum]
FROM [dbo].[bing_covid-19_data_with_year_month_day]
GROUP BY [country_region],[month];

Based on this new table, we can see that the United States observed more confirmed
cases across all years in the month of January , followed by December and October .
April is the month with the lowest number of cases overall:

SQL

SELECT * FROM [dbo].[infections_by_month]
WHERE [country_region] = 'United States'
ORDER BY [confirmed_sum] DESC;

For more examples and syntax reference, see CREATE TABLE AS SELECT (Transact-SQL).

Ingest data into existing tables with T-SQL queries
The previous examples create new tables based on the result of a query. To replicate the
examples but on existing tables, the INSERT...SELECT pattern can be used. For example,
the following code ingests new data into an existing table:

SQL

INSERT INTO [dbo].[bing_covid-19_data_2023]
SELECT * FROM [dbo].[bing_covid-19_data]
WHERE [updated] > '2023-02-28';

The query criteria for the SELECT statement can be any valid query, as long as the
resulting query column types align with the columns on the destination table. If column
names are specified and include only a subset of the columns from the destination
table, all other columns are loaded as NULL . For more information, see Using INSERT
INTO...SELECT to Bulk Import data with minimal logging and parallelism.

Ingest data from tables on different warehouses and lakehouses
For both CREATE TABLE AS SELECT and INSERT...SELECT, the SELECT statement can also
reference tables on warehouses that are different from the warehouse where your
destination table is stored, by using cross-warehouse queries. This can be achieved by
using the three-part naming convention [warehouse_or_lakehouse_name.]
[schema_name.]table_name . For example, suppose you have the following workspace

assets:

A lakehouse named cases_lakehouse with the latest case data.


A warehouse named reference_warehouse with tables used for reference data.
A warehouse named research_warehouse where the destination table is created.

A new table can be created that uses three-part naming to combine data from tables on
these workspace assets:

SQL

CREATE TABLE [research_warehouse].[dbo].[cases_by_continent]
AS
SELECT cases.*
FROM [cases_lakehouse].[dbo].[bing_covid-19_data] cases
INNER JOIN [reference_warehouse].[dbo].[bing_covid-19_data] reference
ON cases.[iso3] = reference.[countrycode];

To learn more about cross-warehouse queries, see Write a cross-database SQL Query.

Related content
Ingesting data into the Warehouse
Ingest data using the COPY statement
Ingest data using Data pipelines
Write a cross-database SQL Query



Tutorial: Set up dbt for Fabric Data
Warehouse
Article • 11/19/2024

Applies to: ✅ Warehouse in Microsoft Fabric

This tutorial guides you through setting up dbt and deploying your first project to a
Fabric Warehouse.

Introduction
The dbt (Data Build Tool) open-source framework simplifies data transformation and
analytics engineering. It focuses on SQL-based transformations within the analytics
layer, treating SQL as code. dbt supports version control, modularization, testing, and
documentation.

The dbt adapter for Microsoft Fabric can be used to create dbt projects, which can then
be deployed to a Fabric Data Warehouse.

You can also change the target platform for the dbt project by simply changing the adapter. For example, a project built for an Azure Synapse dedicated SQL pool can be upgraded to a Fabric Data Warehouse in a few seconds.

Prerequisites for the dbt adapter for Microsoft Fabric
Follow this list to install and set up the dbt prerequisites:

1. Python version 3.7 (or higher) .

2. The Microsoft ODBC Driver for SQL Server.

3. Latest version of the dbt-fabric adapter from the PyPI (Python Package Index)
repository using pip install dbt-fabric .

PowerShell

pip install dbt-fabric

7 Note
By changing pip install dbt-fabric to pip install dbt-synapse and using
the following instructions, you can install the dbt adapter for Synapse
dedicated SQL pool .

4. Verify that dbt-fabric and its dependencies are installed by using the pip list command:

PowerShell

pip list

A long list of the packages and current versions should be returned from this
command.

5. If you don't already have one, create a Warehouse. You can use the trial capacity
for this exercise: sign up for the Microsoft Fabric free trial , create a workspace,
and then create a warehouse.

Get started with dbt-fabric adapter


This tutorial uses Visual Studio Code , but you can use the tool of your choice.

1. Clone the jaffle_shop demo dbt project onto your machine.

You can clone a repo with Visual Studio Code's built-in source control.
Or, for example, you can use the git clone command:

PowerShell

git clone https://github.com/dbt-labs/jaffle_shop.git

2. Open the jaffle_shop project folder in Visual Studio Code.


3. You can skip the sign-up if you have created a Warehouse already.

4. Create a profiles.yml file. Add the following configuration to profiles.yml . This file configures the connection to your warehouse in Microsoft Fabric using the dbt-fabric adapter.

yml

config:
  partial_parse: true
jaffle_shop:
  target: fabric-dev
  outputs:
    fabric-dev:
      authentication: CLI
      database: <put the database name here>
      driver: ODBC Driver 18 for SQL Server
      host: <enter your SQL analytics endpoint here>
      schema: dbo
      threads: 4
      type: fabric

7 Note

Change the type from fabric to synapse to switch the database adapter to
Azure Synapse Analytics, if desired. Any existing dbt project's data platform
can be updated by changing the database adapter. For more information, see
the dbt list of supported data platforms .

5. Authenticate yourself to Azure in the Visual Studio Code terminal.

Run az login in Visual Studio Code terminal if you're using Azure CLI
authentication.
For Service Principal or other Microsoft Entra ID (formerly Azure Active
Directory) authentication in Microsoft Fabric, refer to dbt (Data Build Tool)
setup and dbt Resource Configurations . For more information, see
Microsoft Entra authentication as an alternative to SQL authentication in
Microsoft Fabric.

6. Now you're ready to test the connectivity. To test the connectivity to your
warehouse, run dbt debug in the Visual Studio Code terminal.

PowerShell

dbt debug

If all checks pass, you can connect to your warehouse using the dbt-fabric adapter from the jaffle_shop dbt project.

7. Now, it's time to test if the adapter is working or not. First run dbt seed to insert
sample data into the warehouse.

8. Run dbt run to build the models defined in the demo dbt project.

PowerShell

dbt run

9. Run dbt test to validate the data against the tests defined in the project.

PowerShell

dbt test

You have now deployed a dbt project to Fabric Data Warehouse.

Move between different warehouses


It's simple to move a dbt project between different warehouses. A dbt project on any supported warehouse can be quickly migrated with this three-step process:

1. Install the new adapter. For more information and full installation instructions, see
dbt adapters .

2. Update the type property in the profiles.yml file.

3. Build the project.

dbt in Fabric Data Factory


When integrated with Apache Airflow, a popular workflow management system, dbt
becomes a powerful tool for orchestrating data transformations. Airflow's scheduling
and task management capabilities allow data teams to automate dbt runs. It ensures
regular data updates and maintains a consistent flow of high-quality data for analysis
and reporting. This combined approach, using dbt's transformation expertise with
Airflow's workflow management, delivers efficient and robust data pipelines, ultimately
leading to faster and more insightful data-driven decisions.

Apache Airflow is an open-source platform used to programmatically create, schedule, and monitor complex data workflows. It allows you to define a set of tasks, called operators, that can be combined into directed acyclic graphs (DAGs) to represent data pipelines.

For more information to operationalize dbt with your warehouse, see Transform data
using dbt with Data Factory in Microsoft Fabric.

Considerations
Important things to consider when using dbt-fabric adapter:

Review the current limitations in Microsoft Fabric data warehousing.

Fabric supports Microsoft Entra ID (formerly Azure Active Directory) authentication for user principals, user identities, and service principals. The recommended authentication mode for working interactively on the warehouse is CLI (command-line interface); use service principals for automation.

Review the T-SQL (Transact-SQL) commands not supported in Fabric Data


Warehouse.

Some otherwise unsupported T-SQL commands, such as ALTER TABLE ADD/ALTER/DROP COLUMN , MERGE , TRUNCATE , and sp_rename , are emulated by the dbt-fabric adapter using Create Table as Select (CTAS), DROP , and CREATE commands.

Review Unsupported data types to learn about the supported and unsupported
data types.

You can log issues on the dbt-fabric adapter on GitHub by visiting Issues ·
microsoft/dbt-fabric · GitHub .

Next step
Transform data using dbt with Data Factory in Microsoft Fabric

Related content
What is data warehousing in Microsoft Fabric?
Tutorial: Create a Warehouse in Microsoft Fabric
Tutorial: Transform data using a stored procedure
Source Control with Warehouse



Default Power BI semantic models in
Microsoft Fabric
Article • 08/06/2024

Applies to: SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric

In Microsoft Fabric, Power BI semantic models are a logical description of an analytical


domain, with metrics, business friendly terminology, and representation, to enable
deeper analysis. This semantic model is typically a star schema with facts that represent
a domain, and dimensions that allow you to analyze, or slice and dice the domain to drill
down, filter, and calculate different analyses. The default semantic model is created automatically for you; you choose which tables, relationships, and measures to add, and the business logic is inherited from the parent lakehouse or Warehouse. This jump-starts the downstream analytics experience for business intelligence and analysis with an item in Microsoft Fabric that is managed, optimized, and kept in sync with no user intervention.

Visualizations and analysis in Power BI reports can now be built in the web - or in just a
few steps in Power BI Desktop - saving users time, resources, and by default, providing a
seamless consumption experience for end-users. The default Power BI semantic model
follows the naming convention of the Lakehouse.

Power BI semantic models represent a source of data ready for reporting, visualization,
discovery, and consumption. Power BI semantic models provide:

The ability to expand warehousing constructs to include hierarchies, descriptions,


relationships. This allows deeper semantic understanding of a domain.
The ability to catalog, search, and find Power BI semantic model information in the
Data Hub.
The ability to set bespoke permissions for workload isolation and security.
The ability to create measures, standardized metrics for repeatable analysis.
The ability to create Power BI reports for visual analysis.
The ability to discover and consume data in Excel.
The ability for third party tools like Tableau to connect and analyze data.

For more on Power BI, see Power BI guidance.

7 Note
Microsoft has renamed the Power BI dataset content type to semantic model. This
applies to Microsoft Fabric as well. For more information, see New name for Power
BI datasets.

Direct Lake mode


Direct Lake mode is a groundbreaking new engine capability to analyze very large
datasets in Power BI. The technology is based on the idea of consuming parquet-
formatted files directly from a data lake, without having to query a Warehouse or SQL
analytics endpoint, and without having to import or duplicate data into a Power BI
semantic model. This native integration brings a unique mode of accessing the data
from the Warehouse or SQL analytics endpoint, called Direct Lake. Direct Lake overview
has further information about this storage mode for Power BI semantic models.

Direct Lake provides the most performant query and reporting experience. Direct Lake is
a fast path to consume the data from the data lake straight into the Power BI engine,
ready for analysis.

In traditional DirectQuery mode, the Power BI engine directly queries the data from
the source for each query execution, and the query performance depends on the
data retrieval speed. DirectQuery eliminates the need to copy data, ensuring that
any changes in the source are immediately reflected in query results.

In Import mode, the performance is better because the data is readily available in
memory, without having to query the data from the source for each query
execution. However, the Power BI engine must first copy the data into the memory,
at data refresh time. Any changes to the underlying data source are picked up
during the next data refresh.

Direct Lake mode eliminates the Import requirement by consuming the data files
directly into memory. Because there's no explicit import process, it's possible to
pick up any changes at the source as they occur. Direct Lake combines the
advantages of DirectQuery and Import mode while avoiding their disadvantages.
Direct Lake mode is the ideal choice for analyzing very large datasets and datasets
with frequent updates at the source. Direct Lake automatically falls back to DirectQuery, using the SQL analytics endpoint, when Direct Lake exceeds the limits for the SKU or uses features that aren't supported, allowing report users to continue uninterrupted.

Direct Lake mode is the storage mode for default Power BI semantic models and for new Power BI semantic models created in a Warehouse or SQL analytics endpoint. Using Power BI Desktop, you can also create Power BI semantic models in import or DirectQuery storage mode, with the Warehouse or SQL analytics endpoint as the data source.

Understand what's in the default Power BI semantic model
When you create a Warehouse or SQL analytics endpoint, a default Power BI semantic
model is created. The default semantic model is represented with the (default) suffix. You
can use Manage default semantic model to choose tables to add.

The default semantic model is queried via the SQL analytics endpoint and updated via
changes to the Lakehouse or Warehouse. You can also query the default semantic model
via cross-database queries from a Warehouse.

Sync the default Power BI semantic model


Previously we automatically added all tables and views in the Warehouse to the default
Power BI semantic model. Based on feedback, we have modified the default behavior to
not automatically add tables and views to the default Power BI semantic model. This
change will ensure the background sync will not get triggered. This will also disable
some actions like "New Measure", "Create Report", "Analyze in Excel".

If you want to change this default behavior, you can:

1. Manually enable the Sync the default Power BI semantic model setting for each
Warehouse or SQL analytics endpoint in the workspace. This will restart the
background sync that will incur some consumption costs.

2. Manually pick tables and views to be added to semantic model through Manage
default Power BI semantic model in the ribbon or info bar.

7 Note

In case you are not using the default Power BI semantic model for reporting
purposes, manually disable the Sync the default Power BI semantic model setting
to avoid adding objects automatically. The setting update will ensure that
background sync will not get triggered and save on Onelake consumption costs.

Manually update the default Power BI semantic model


Once there are objects in the default Power BI semantic model, there are two ways to
validate or visually inspect the tables:

1. Select the Manually update semantic model button in the ribbon.

2. Review the default layout for the default semantic model objects.

The default layout for BI-enabled tables persists in the user session and is generated
whenever a user navigates to the model view. Look for the Default semantic model
objects tab.

Access the default Power BI semantic model


To access default Power BI semantic models, go to your workspace, and find the
semantic model that matches the name of the desired Lakehouse. The default Power BI
semantic model follows the naming convention of the Lakehouse.

To load the semantic model, select the name of the semantic model.

Monitor the default Power BI semantic model


You can monitor and analyze activity on the semantic model with SQL Server Profiler by
connecting to the XMLA endpoint.

SQL Server Profiler installs with SQL Server Management Studio (SSMS), and allows
tracing and debugging of semantic model events. Although officially deprecated for SQL
Server, Profiler is still included in SSMS and remains supported for Analysis Services and
Power BI. Use with the Fabric default Power BI semantic model requires SQL Server
Profiler version 18.9 or higher. Users must specify the semantic model as the initial
catalog when connecting with the XMLA endpoint. To learn more, see SQL Server
Profiler for Analysis Services.

Script the default Power BI semantic model


You can script out the default Power BI semantic model from the XMLA endpoint with
SQL Server Management Studio (SSMS).

View the Tabular Model Scripting Language (TMSL) schema of the semantic model by
scripting it out via the Object Explorer in SSMS. To connect, use the semantic model's
connection string, which looks like powerbi://api.powerbi.com/v1.0/myorg/username . You
can find the connection string for your semantic model in the Settings, under Server
settings. From there, you can generate an XMLA script of the semantic model via
SSMS's Script context menu action. For more information, see Dataset connectivity with
the XMLA endpoint.

Scripting requires Power BI write permissions on the Power BI semantic model. With
read permissions, you can see the data but not the schema of the Power BI semantic
model.
Create a new Power BI semantic model in
Direct Lake storage mode
You can also create additional Power BI semantic models in Direct Lake mode based off
SQL analytics endpoint or Warehouse data. These new Power BI semantic models can
be edited in the workspace using Open data model and can be used with other features
such as write DAX queries and semantic model row-level security.

The New Power BI semantic model button creates a new blank semantic model
separate from the default semantic model.

To create a Power BI semantic model in Direct Lake mode, follow these steps:

1. Open the lakehouse and select New Power BI semantic model from the ribbon.

2. Alternatively, open a Warehouse or Lakehouse's SQL analytics endpoint, first select


the Reporting ribbon, then select New Power BI semantic model.

3. Enter a name for the new semantic model, select a workspace to save it in, and
pick the tables to include. Then select Confirm.

4. The new Power BI semantic model can be edited in the workspace, where you can
add relationships, measures, rename tables and columns, choose how values are
displayed in report visuals, and much more. If the model view does not show after
creation, check the pop-up blocker of your browser.

5. To edit the Power BI semantic model later, select Open data model from the
semantic model context menu or item details page to edit the semantic model
further.

Power BI reports can be created in the workspace by selecting New report from web
modeling, or in Power BI Desktop by live connecting to this new semantic model.

To learn more on how to edit data models in the Power BI service, see Edit Data Models.

Create a new Power BI semantic model in import or DirectQuery storage mode
Having your data in Microsoft Fabric means you can create Power BI semantic models in any storage mode: Direct Lake, import, or DirectQuery. You can create additional Power BI semantic models in import or DirectQuery mode based off SQL analytics endpoint or Warehouse data by using the SQL endpoint.
To create a Power BI semantic model in import or DirectQuery mode, follow these steps:

1. Open Power BI Desktop, sign in, and click on OneLake data hub.

2. Choose the SQL analytics endpoint of the lakehouse or warehouse.

3. Select the Connect button dropdown and choose Connect to SQL endpoint.

4. Select import or DirectQuery storage mode and the tables to add to the semantic
model.

From there you can create the Power BI semantic model and report to publish to the
workspace when ready.

To learn more about Power BI, see Power BI.

Limitations
Default Power BI semantic models follow the current limitations for semantic models in
Power BI. Learn more:

Azure Analysis Services resource and object limits


Data types in Power BI Desktop - Power BI

If the parquet, Apache Spark, or SQL data types can't be mapped to one of the Power BI Desktop data types, they are dropped as part of the sync process. This is in line with current Power BI behavior. For these columns, we recommend adding explicit type conversions in your ETL processes to convert them to a supported type. If the data types are needed upstream, you can optionally specify a view in SQL with the explicit type conversion desired. The view will be picked up by the sync, or the conversion can be added manually as previously indicated.
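For example, a minimal sketch of such a view; the table and column names are hypothetical, and the target types are only examples of conversions you might choose:

SQL

CREATE VIEW [dbo].[vw_sales_for_bi]
AS
SELECT
    [sale_id],
    CAST([sale_timestamp] AS datetime2(6)) AS [sale_timestamp],
    CAST([amount] AS decimal(18, 2)) AS [amount]
FROM [dbo].[sales];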

Default Power BI semantic models can only be edited in the SQL analytics endpoint
or warehouse.

Related content
Define relationships in data models for data warehousing in Microsoft Fabric
Model data in the default Power BI semantic model in Microsoft Fabric



Model data in the default Power BI
semantic model in Microsoft Fabric
Article • 09/24/2024

Applies to: ✅ SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric

The default Power BI semantic model inherits all relationships between entities defined
in the model view and infers them as Power BI semantic model relationships, when
objects are enabled for BI (Power BI Reports). Inheriting the warehouse's business logic
allows a warehouse developer or BI analyst to decrease the time to value towards
building a useful semantic model and metrics layer for analytical business intelligence
(BI) reports in Power BI, Excel, or external tools like Tableau that read the XMLA format.

While all constraints are translated to relationships, currently in Power BI, only one
relationship can be active at a time, whereas multiple primary and foreign key
constraints can be defined for warehouse entities and are shown visually in the diagram
lines. The active Power BI relationship is represented with a solid line and the rest is
represented with a dotted line. We recommend choosing the primary relationship as
active for BI reporting purposes.

Automatic translation of constraints to relationships in the default Power BI semantic


model is only applicable for tables in the Warehouse in Microsoft Fabric, not currently
supported in the SQL analytics endpoint.

7 Note

Microsoft has renamed the Power BI dataset content type to semantic model. This
applies to Microsoft Fabric as well. For more information, see New name for Power
BI datasets.

Data modeling properties


The following table provides a description of the properties available when using the
model view diagram and creating relationships:

FromObjectName: Table/View name "From" which the relationship is defined.

ToObjectName: Table/View name "To" which the relationship is defined.

TypeOfRelationship: Relationship cardinality. The possible values are: None, OneToOne, OneToMany, ManyToOne, and ManyToMany.

SecurityFilteringBehavior: Indicates how relationships influence filtering of data when evaluating row-level security expressions; a Power BI-specific semantic. The possible values are: OneDirection, BothDirections, and None.

IsActive: A Power BI-specific semantic; a boolean value that indicates whether the relationship is marked as Active or Inactive. This defines the default relationship behavior within the semantic model.

RelyOnReferentialIntegrity: A boolean value that indicates whether the relationship can rely on referential integrity or not.

CrossFilteringBehavior: Indicates how relationships influence filtering of data; Power BI specific. The possible values are: 1 - OneDirection, 2 - BothDirections, and 3 - Automatic.

Add or remove objects to the default Power BI semantic model
In Power BI, a semantic model is always required before any reports can be built, so the
default Power BI semantic model enables quick reporting capabilities on top of the
warehouse. Within the warehouse, a user can add warehouse objects - tables or views to
their default Power BI semantic model. They can also add other semantic modeling
properties, such as hierarchies and descriptions. These properties are then used to
create the Power BI semantic model's tables. Users can also remove objects from the
default Power BI semantic model.

1. Open a warehouse in your Fabric workspace.


2. Navigate to Model view by selecting the Model view icon.

To add objects such as tables or views to the default Power BI semantic model, you have
options:

Manually enable the Sync the default Power BI semantic model setting that will
automatically add objects to the semantic model. For more information, see Sync
the default Power BI semantic model.
Manually add objects to the semantic model.
The auto detect experience determines any tables or views and opportunistically adds
them.

The manually detect option in the ribbon allows fine grained control of which object(s),
such as tables and/or views, should be added to the default Power BI semantic model:

Select all
Filter for tables or views
Select specific objects

To remove objects, a user can use the manually select button in the ribbon and:

Un-select all
Filter for tables or views
Un-select specific objects

 Tip

We recommend reviewing the objects enabled for BI and ensuring they have the
correct logical relationships to ensure a smooth downstream reporting experience.

Hide elements from downstream reporting


You can hide elements at the table or column level of your warehouse from downstream
reporting by using the Model layout canvas options, as shown in the following image.

Related content
Model data in the default Power BI semantic model in Microsoft Fabric
Default Power BI semantic models in Microsoft Fabric
Create reports in the Power BI service in Microsoft Fabric and Power BI Desktop
Share your warehouse and manage permissions



Define relationships in data models for
data warehousing in Microsoft Fabric
Article • 09/24/2024

Applies to: ✅ SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric

A well-defined data model is instrumental in driving your analytics and reporting


workloads. In a Warehouse in Microsoft Fabric, you can easily build and change your
data model with a few simple steps in our visual editor. You need to have at least a small
sample of data loaded before you can explore these concepts further; tables can be
empty, but the schemas (their structures) need to be defined.

Warehouse modeling
Modeling the warehouse is possible by setting primary and foreign key constraints and
setting identity columns on the model layouts within the data warehouse user interface.
After you navigate the model layouts, you can do this in a visual entity relationship
diagram that allows a user to drag and drop tables to infer how the objects relate to one
another. Lines visually connecting the entities infer the type of physical relationships
that exist.

How to model data and define relationships


To model your data:

1. Open a warehouse in your Fabric workspace.


2. Navigate to Model layouts in the ribbon.


In the model layouts, users can model their warehouse and the canonical autogenerated
default Power BI semantic model. We recommend modeling your data warehouse using
traditional Kimball methodologies, using a star schema, wherever possible. There are
two types of modeling possible:

Warehouse modeling - the physical relationships expressed as primary and foreign


keys and constraints
Default Power BI semantic model modeling - the logical relationships expressed
between entities

Modeling automatically keeps these definitions in sync, enabling powerful warehouse


and semantic layer development simultaneously.

Define physical and logical relationships


1. To create a logical relationship between entities in a warehouse and the resulting
primary and foreign key constraints, select the Model layouts and select your
warehouse, then drag the column from one table to the column on the other table
to initiate the relationship. In the window that appears, configure the relationship
properties.
2. Select the Confirm button when your relationship is complete to save the
relationship information. The relationship set will effectively:
a. Set the physical relationships - primary and foreign key constraints in the
database
b. Set the logical relationships - primary and foreign key constraints in the default
Power BI semantic model

Edit relationships using different methods


Using drag and drop and the associated Edit relationships dialog is a more guided
experience for editing relationships in Power BI.

In contrast, editing relationships in the Properties pane is a streamlined approach to


editing relationships:

You only see the table names and columns from which you can choose, you aren't
presented with a data preview, and the relationship choices you make are only validated
when you select Apply changes. Using the Properties pane and its streamlined
approach reduces the number of queries generated when editing a relationship, which
can be important for big data scenarios, especially when using DirectQuery connections.
Editing relationships in the Properties pane also supports multi-select in the Model view diagram layouts: press and hold the Ctrl key and select more than one line to select multiple relationships. Common properties can be edited in the Properties pane, and Apply changes processes the changes in one transaction.

Single or multi-selected relationships can also be deleted by pressing Delete on your


keyboard. You can't undo the delete action, so a dialog prompts you to confirm deleting
the relationships.
Use model layouts
During the session, users can create multiple tabs in the model layouts to depict
multiple data warehouse schemas or further assist with database design.

Currently, the model layouts are only persisted in session. However the database
changes are persisted. Users can use the auto-layout whenever a new tab is created to
visually inspect the database design and understand the modeling.

Next step
Model data in the default Power BI semantic model in Microsoft Fabric



Create reports in the Power BI service in
Microsoft Fabric and Power BI Desktop
Article • 11/20/2024

Applies to: ✅ SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric

This article describes three different scenarios you can follow to create reports in the
Power BI service.

Create a report from the warehouse editor


From within Fabric Data Warehouse, using the ribbon and the main home tab, navigate
to the New report button. This option provides a native, quick way to create a report built on top of the default Power BI semantic model.

If no tables have been added to the default Power BI semantic model, the dialog first adds tables automatically, prompting the user to confirm or manually select the tables to include in the default semantic model, ensuring there's always data to report on.

With a default semantic model that has tables, the New report opens a browser tab to
the report editing canvas to a new report that is built on the semantic model. When you
save your new report you're prompted to choose a workspace, provided you have write
permissions for that workspace. If you don't have write permissions, or if you're a free
user and the semantic model resides in a Premium capacity workspace, the new report is
saved in your My workspace.

Use default Power BI semantic model within workspace
Using the default semantic model and action menu in the workspace: In the Microsoft
Fabric workspace, navigate to the default Power BI semantic model and select the More
menu (...) to create a report in the Power BI service.
Select Create report to open the report editing canvas to a new report on the semantic
model. When you save your new report, it's saved in the workspace that contains the
semantic model as long as you have write permissions on that workspace. If you don't
have write permissions, or if you're a free user and the semantic model resides in a
Premium capacity workspace, the new report is saved in your My workspace.

Use the OneLake catalog


Use the default Power BI semantic model and semantic model details page in the
OneLake catalog. In the workspace list, select the default semantic model's name to get
to the Semantic model details page, where you can find details about the semantic
model and see related reports. You can also create a report directly from this page. To
learn more about creating a report in this fashion, see Dataset details.

In the OneLake catalog, you see warehouses and their associated default semantic models. Select the warehouse to navigate to the warehouse details page. You can see
the warehouse metadata, supported actions, lineage and impact analysis, along with
related reports created from that warehouse. Default semantic models derived from a
warehouse behave the same as any semantic model.

To find the warehouse, you begin with the OneLake catalog. The following image shows
the OneLake catalog in the Power BI service:

1. Select a warehouse to view its warehouse details page.

2. Select the More menu (...) to display the options menu.

3. Select Open to open the warehouse.

Create reports in the Power BI Desktop


The OneLake catalog integration in Power BI Desktop lets you connect to the
Warehouse or SQL analytics endpoint of Lakehouse in easy steps.

1. Use Data hub menu in the ribbon to get list of all items.

2. Select the warehouse that you would like to connect to.

3. On the Connect button, select the dropdown, and select Connect to SQL
endpoint.

Related content
Connectivity
Create reports
Tutorial: Get started creating in the Power BI service



Security for data warehousing in
Microsoft Fabric
Article • 09/25/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

This article covers security topics for securing the SQL analytics endpoint of the
lakehouse and the Warehouse in Microsoft Fabric.

For information on Microsoft Fabric security, see Security in Microsoft Fabric.

For information on connecting to the SQL analytics endpoint and Warehouse, see
Connectivity.

Warehouse access model


Microsoft Fabric permissions and granular SQL permissions work together to govern
Warehouse access and the user permissions once connected.

Warehouse connectivity is dependent on being granted the Microsoft Fabric Read


permission, at a minimum, for the Warehouse.
Microsoft Fabric item permissions enable the ability to provide a user with SQL
permissions, without needing to grant those permissions within SQL.
Microsoft Fabric workspace roles provide Microsoft Fabric permissions for all
warehouses within a workspace.
Granular user permissions can be further managed via T-SQL.

Workspace roles
Workspace roles are used for development team collaboration within a workspace. Role
assignment determines the actions available to the user and applies to all items within
the workspace.

For an overview of Microsoft Fabric workspace roles, see Roles in workspaces.


For instructions on assigning workspace roles, see Give workspace access.

For details on the specific Warehouse capabilities provided through workspace roles, see
Workspace roles in Fabric data warehousing.

Item permissions
In contrast to workspace roles, which apply to all items within a workspace, item
permissions can be assigned directly to individual Warehouses. The user will receive the
assigned permission on that single warehouse. The primary purpose for these
permissions is to enable sharing for downstream consumption of the Warehouse.

For details on the specific permissions provided for warehouses, see Share your
warehouse and manage permissions.

Granular security
Workspace roles and item permissions provide an easy way to assign coarse permissions
to a user for the entire warehouse. However, in some cases, more granular permissions
are needed for a user. To achieve this, standard T-SQL constructs can be used to provide
specific permissions to users.

Microsoft Fabric data warehousing supports several data protection technologies that
administrators can use to protect sensitive data from unauthorized access. By securing
or obfuscating data from unauthorized users or roles, these security features can
provide data protection in both a Warehouse and SQL analytics endpoint without
application changes.

Object-level security controls access to specific database objects.


Column-level security prevents unauthorized viewing of columns in tables.
Row-level security prevents unauthorized viewing of rows in tables, using familiar
WHERE clause filter predicates.

Dynamic data masking prevents unauthorized viewing of sensitive data by using masks to prevent access to complete values, such as email addresses or numbers.

Object-level security

Object-level security is a security mechanism that controls access to specific database


objects, such as tables, views, or procedures, based on user privileges or roles. It ensures
that users or roles can only interact with and manipulate the objects they have been
granted permissions for, protecting the integrity and confidentiality of the database
schema and its associated resources.

For details on managing granular permissions in SQL, see SQL granular permissions.
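For illustration, a minimal sketch of object-level permissions using standard T-SQL; the object and principal names are hypothetical:

SQL

-- Allow a user to read one table and execute one stored procedure
GRANT SELECT ON OBJECT::[dbo].[DimCustomer] TO [alice@contoso.com];
GRANT EXECUTE ON OBJECT::[dbo].[usp_RefreshSales] TO [alice@contoso.com];

-- Explicitly block access to a sensitive table
DENY SELECT ON OBJECT::[dbo].[FactPayroll] TO [alice@contoso.com];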

Row-level security

Row-level security is a database security feature that restricts access to individual rows
or records within a database table based on specified criteria, such as user roles or
attributes. It ensures that users can only view or manipulate data that is explicitly
authorized for their access, enhancing data privacy and control.

For details on row-level security, see Row-level security in Fabric data warehousing.
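As a rough sketch, row-level security uses the familiar predicate-based T-SQL pattern of an inline table-valued function bound to a table by a security policy; the schema, table, and column names below are hypothetical:

SQL

-- Inline table-valued function used as the filter predicate
CREATE SCHEMA [Security];
GO
CREATE FUNCTION [Security].[fn_securitypredicate](@SalesRep AS varchar(60))
    RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS fn_securitypredicate_result
    WHERE @SalesRep = USER_NAME();
GO
-- Bind the predicate to the table as a filter
CREATE SECURITY POLICY [SalesFilter]
ADD FILTER PREDICATE [Security].[fn_securitypredicate]([SalesRep])
ON [dbo].[Sales]
WITH (STATE = ON);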

Column-level security
Column-level security is a database security measure that limits access to specific
columns or fields within a database table, allowing users to see and interact with only
the authorized columns while concealing sensitive or restricted information. It offers
fine-grained control over data access, safeguarding confidential data within a database.

For details on column-level security, see Column-level security in Fabric data


warehousing.
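For illustration, a minimal sketch that grants access to only a subset of columns; the table, column, and principal names are hypothetical:

SQL

-- The user can query only the listed columns of the table
GRANT SELECT ON [dbo].[Customers] ([CustomerID], [FirstName], [LastName]) TO [analyst@contoso.com];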

Dynamic data masking


Dynamic data masking helps prevent unauthorized viewing of sensitive data by enabling
administrators to specify how much sensitive data to reveal, with minimal effect on the
application layer. Dynamic data masking can be configured on designated database
fields to hide sensitive data in the result sets of queries. With dynamic data masking, the
data in the database isn't changed, so it can be used with existing applications since
masking rules are applied to query results. Many applications can mask sensitive data
without modifying existing queries.

For details on dynamic data masking, see Dynamic data masking in Fabric data
warehousing.
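For illustration, a minimal sketch that defines masks at table creation time and then selectively unmasks a principal; the names and mask functions shown are only examples:

SQL

CREATE TABLE [dbo].[Customers]
(
    [CustomerID]  int NOT NULL,
    [FirstName]   varchar(50) NULL,
    [Email]       varchar(256) NULL MASKED WITH (FUNCTION = 'email()'),
    [CreditCard]  varchar(20)  NULL MASKED WITH (FUNCTION = 'partial(0, "XXXX-XXXX-XXXX-", 4)')
);

-- Allow a specific principal to see unmasked values
GRANT UNMASK TO [analyst@contoso.com];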

Share a warehouse
Sharing is a convenient way to provide users read access to your Warehouse for
downstream consumption. Sharing allows downstream users in your organization to
consume a Warehouse using SQL, Spark, or Power BI. You can customize the level of
permissions that the shared recipient is granted to provide the appropriate level of
access.

For more information on sharing, see How to share your warehouse and manage
permissions.

Guidance on user access


When evaluating the permissions to assign to a user, consider the following guidance:

Only team members who are currently collaborating on the solution should be
assigned to Workspace roles (Admin, Member, Contributor), as this provides them
access to all Items within the workspace.
If they primarily require read only access, assign them to the Viewer role and grant
read access on specific objects through T-SQL. For more information, see Manage
SQL granular permissions.
If they are higher privileged users, assign them to Admin, Member, or Contributor
roles. The appropriate role is dependent on the other actions that they need to
perform.
Other users, who only need access to an individual warehouse or require access to
only specific SQL objects, should be given Fabric Item permissions and granted
access through SQL to the specific objects.
You can manage permissions on Microsoft Entra ID (formerly Azure Active
Directory) groups, as well, rather than adding each specific member. For more
information, see Microsoft Entra authentication as an alternative to SQL
authentication in Microsoft Fabric.

User audit logs


To track user activity in the warehouse and SQL analytics endpoint for meeting regulatory compliance and records management requirements, a set of audit activities is accessible via Microsoft Purview and PowerShell. You can use user audit logs to identify who is taking what action on your Fabric items.

For more information on how to access user audit logs, see Track user activities in
Microsoft Fabric and Operations list.

Related content
Connectivity
SQL granular permissions in Microsoft Fabric
How to share your warehouse and manage permissions
Microsoft Entra authentication as an alternative to SQL authentication in Microsoft
Fabric



Microsoft Entra authentication as an
alternative to SQL authentication
Article • 10/16/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

This article covers technical methods that users and customers can employ to transition
from SQL authentication to Microsoft Entra authentication within Microsoft Fabric.
Microsoft Entra authentication is an alternative to usernames and passwords via SQL
authentication for signing in to the SQL analytics endpoint of the lakehouse or the
Warehouse in Microsoft Fabric. Microsoft Entra authentication is advisable and vital for
creating a secure data platform.

This article focuses on Microsoft Entra authentication as an alternative to SQL


authentication in Microsoft Fabric items such as a Warehouse or Lakehouse SQL
analytics endpoint.

Benefits of Microsoft Entra authentication in Fabric
One of Microsoft Fabric's core principles is secure by design. Microsoft Entra is integral
to Microsoft Fabric's security by ensuring strong data protection, governance, and
compliance.

Microsoft Entra plays a crucial role in Microsoft Fabric's security for several reasons:

Authentication: Verify users and service principals using Microsoft Entra ID, which
grants access tokens for operations within Fabric.
Secure access: Connect securely to cloud apps from any device or network,
safeguarding requests made to Fabric.
Conditional access: Admins can set policies that assess user login context, control
access, or enforce extra verification steps.
Integration: Microsoft Entra ID seamlessly works with all Microsoft SaaS offerings,
including Fabric, allowing easy access across devices and networks.
Broad platform: Gain access to Microsoft Fabric with Microsoft Entra ID via any
method, whether through the Fabric portal, SQL connection string, REST API, or
XMLA endpoint.

Microsoft Entra adopts a complete Zero Trust policy, offering a superior alternative to
traditional SQL authentication limited to usernames and passwords. This approach:
Prevents user impersonation.
Enables fine-grained access control considering user identity, environment,
devices, etc.
Supports advanced security like Microsoft Entra multifactor authentication.

Fabric configuration
Microsoft Entra authentication for use with a Warehouse or Lakehouse SQL analytics
endpoint requires configuration in both Tenant and Workspace settings.

Tenant setting
A Fabric admin in your tenant must permit service principal name (SPN) access to Fabric APIs, which is necessary for an SPN to connect with SQL connection strings to Fabric warehouse or SQL analytics endpoint items.

This setting is located in the Developer settings section and is labeled Service principals
can use Fabric APIs. Make sure it is Enabled.
Workspace setting
A Fabric admin in your workspace must grant access for a user or SPN to access Fabric
items.

There are two means by which a User/SPN can be granted access:

Grant a user/SPN membership to a role: Any workspace role (Admin, Member,


Contributor, or Viewer) is sufficient to connect to warehouse or lakehouse items
with a SQL connection string.

1. In the Manage access option in the Workspace, assign the Contributor role.
For more information, see Service roles.

Assign a user/SPN to a specific item: Grant access to a specific Warehouse or SQL


analytics endpoint of a Lakehouse. A Fabric admin can choose from different
permission levels.

1. Navigate to the relevant Warehouse or SQL analytics endpoint item.


2. Select More options, then Manage Permissions. Select Add user.
3. Add the User/SPN on the Grant people access page.
4. Assign the necessary permissions to a User/SPN. Choose no Additional
permissions to grant connect permissions only.


You can alter the default permissions given to the User or SPN by the system. Use the T-
SQL GRANT and DENY commands to alter permissions as required, or ALTER ROLE to
add membership to roles.

Currently, SPNs don't have the same capability as user accounts for detailed permission configuration with GRANT / DENY .
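For example, a hedged sketch of adjusting a user's effective permissions with T-SQL; the principal and object names are illustrative:

SQL

-- Widen access at the schema level, then narrow it for one sensitive table
GRANT SELECT ON SCHEMA::[dbo] TO [alice@contoso.com];
DENY SELECT ON OBJECT::[dbo].[FactPayroll] TO [alice@contoso.com];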

Support for user identities and service principal names (SPNs)
Fabric natively supports authentication and authorization for Microsoft Entra users and
service principal names (SPN) in SQL connections to warehouse and SQL analytics
endpoint items.

User identities are the unique credentials for each user within an organization.
SPNs represent application objects within a tenant and act as the identity for
instances of applications, taking on the role of authenticating and authorizing
those applications.

Support for tabular data stream (TDS)


Fabric uses the Tabular Data Stream (TDS) protocol, the same as SQL Server, when you
connect with a connection string.

Fabric is compatible with any application or tool able to connect to a product with the
SQL Database Engine. Similar to a SQL Server instance connection, TDS operates on TCP
port 1433. For more information about Fabric SQL connectivity and finding the SQL
connection string, see Connectivity.

A sample SQL connection string looks like:


<guid_unique_your_item>.datawarehouse.fabric.microsoft.com .

Applications and client tools can set the Authentication connection property in the
connection string to choose a Microsoft Entra authentication mode. The following table
details the different Microsoft Entra authentication modes, including support for
Microsoft Entra multifactor authentication (MFA).

Microsoft Entra Interactive
Scenarios: Utilized by applications or tools in situations where user authentication can occur interactively, or when it is acceptable to have manual intervention for credential verification.
Comments: Activate MFA and Microsoft Entra Conditional Access policies to enforce organizational rules.

Microsoft Entra Service Principal
Scenarios: Used by apps for secure authentication without human intervention, most suited for application integration.
Comments: Advisable to enable Microsoft Entra Conditional Access policies.

Microsoft Entra Password
Scenarios: When applications can't use SPN-based authentication due to incompatibility, or require a generic username and password for many users, or if other methods are infeasible.
Comments: MFA must be off, and no conditional access policies can be set. We recommend validating with the customer's security team before opting for this solution.

Driver support for Microsoft Entra authentication
While most of the SQL drivers initially came with support for Microsoft Entra
authentication, recent updates have expanded compatibility to include SPN-based
authentication. This enhancement simplifies the shift to Microsoft Entra authentication
for various applications and tools through driver upgrades and adding support for
Microsoft Entra authentication.

However, sometimes it's necessary to adjust additional settings such as enabling certain
ports or firewalls to facilitate Microsoft Entra authentication on the host machine.

Applications and tools must upgrade drivers to versions that support Microsoft Entra
authentication and add an authentication mode keyword in their SQL connection string,
like ActiveDirectoryInteractive , ActiveDirectoryServicePrincipal , or
ActiveDirectoryPassword .

Fabric is compatible with Microsoft's native drivers, including OLE DB and Microsoft.Data.SqlClient , and with generic drivers such as ODBC and JDBC. The transition for applications to work with Fabric can be managed through reconfiguration to use Microsoft Entra ID-based authentication.

For more information, see Connectivity to data warehousing in Microsoft Fabric.

Microsoft OLE DB
The OLE DB Driver for SQL Server is a stand-alone data access API designed for OLE DB and first released with SQL Server 2005 (9.x). Since then, its features have expanded to include SPN-based authentication in version 18.5.0, adding to the existing authentication methods from earlier versions.


Microsoft Entra Interactive: Microsoft Entra interactive authentication
Microsoft Entra Service Principal: Microsoft Entra Service Principal authentication
Microsoft Entra Password: Microsoft Entra username and password authentication

For a C# code snippet using OLE DB with SPN-based authentication, see


System.Data.OLEDB.Connect.cs .

Microsoft ODBC Driver


The Microsoft ODBC Driver for SQL Server is a single dynamic-link library (DLL)
containing run-time support for applications using native-code APIs to connect to SQL
Server. It is recommended to use the most recent version for applications to integrate
with Fabric.

For more information on Microsoft Entra authentication with ODBC, see Using Microsoft
Entra ID with the ODBC Driver sample code.


Microsoft Entra Interactive:
DRIVER={ODBC Driver 18 for SQL Server};SERVER=<SQL Connection String>;DATABASE=<DB Name>;UID=<Client_ID@domain>;PWD=<Secret>;Authentication=ActiveDirectoryInteractive

Microsoft Entra Service Principal:
DRIVER={ODBC Driver 18 for SQL Server};SERVER=<SQL Connection String>;DATABASE=<DBName>;UID=<Client_ID@domain>;PWD=<Secret>;Authentication=ActiveDirectoryServicePrincipal

Microsoft Entra Password:
DRIVER={ODBC Driver 18 for SQL Server};SERVER=<SQL Connection String>;DATABASE=<DBName>;UID=<Client_ID@domain>;PWD=<Secret>;Authentication=ActiveDirectoryPassword

For a python code snippet using ODBC with SPN-based authentication, see pyodbc-dw-
connectivity.py .

Microsoft JDBC Driver


The Microsoft JDBC Driver for SQL Server is a Type 4 JDBC driver that provides database
connectivity through the standard JDBC application program interfaces (APIs) available
on the Java platform.

Starting from version 9.2, mssql-jdbc introduces support for


ActiveDirectoryInteractive and ActiveDirectoryServicePrincipal , with
ActiveDirectoryPassword being supported in versions 12.2 and above. This driver

requires additional jars as dependencies, which must be compatible with the version of
the mssql-driver used in your application. For more information, see Feature
dependencies of JDBC driver and Client setup requirement.


Microsoft Entra Interactive: Connect using ActiveDirectoryInteractive authentication mode
Microsoft Entra Service Principal: Connect using ActiveDirectoryServicePrincipal authentication mode
Microsoft Entra Password: Connect using ActiveDirectoryPassword authentication mode

For a java code snippet using JDBC with SPN-based authentication, see
fabrictoolbox/dw_connect.java and sample pom file pom.xml .

Microsoft.Data.SqlClient in .NET Core (C#)


The Microsoft.Data.SqlClient is a data provider for Microsoft SQL Server and Azure SQL
Database. It is a union of the two System.Data.SqlClient components that live
independently in .NET Framework and .NET Core, providing a set of classes for accessing
Microsoft SQL Server databases. Microsoft.Data.SqlClient is recommended for all new
and future development.


Microsoft Entra Interactive: Using interactive authentication
Microsoft Entra Service Principal: Using service principal authentication
Microsoft Entra Password: Using Password Authentication

Code snippets using SPNs:

Microsoft.Data.SqlClient.Connect.cs
System.Data.SqlClient.Connect.cs

Related content
Connectivity
Security for data warehousing in Microsoft Fabric

Workspace roles in Fabric data
warehousing
Article • 07/18/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

This article details the permissions that workspace roles provide in SQL analytics
endpoint and Warehouse. For instructions on assigning workspace roles, see Give
Workspace Access.

Workspace roles
Assigning users to the various workspace roles provides the following capabilities:


Admin: Grants the user CONTROL access for each Warehouse and SQL analytics endpoint within the workspace, providing them with full read/write permissions and the ability to manage granular user SQL permissions. Also allows the user to see workspace-scoped sessions, monitor connections and requests in DMVs via T-SQL, and KILL sessions.

Member: Grants the user CONTROL access for each Warehouse and SQL analytics endpoint within the workspace, providing them with full read/write permissions and the ability to manage granular user SQL permissions.

Contributor: Grants the user CONTROL access for each Warehouse and SQL analytics endpoint within the workspace, providing them with full read/write permissions and the ability to manage granular user SQL permissions.

Viewer: Grants the user CONNECT and ReadData permissions for each Warehouse and SQL analytics endpoint within the workspace. Viewers have SQL permissions to read data from tables/views using T-SQL. For more information, see Manage SQL granular permissions.

Related content
Security for data warehousing in Microsoft Fabric
SQL granular permissions
Row-level security in Fabric data warehousing
Column-level security in Fabric data warehousing



SQL granular permissions in Microsoft
Fabric
Article • 04/24/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

When the default permissions provided by assignment to workspace roles or granted


through item permissions are insufficient, standard SQL constructs are available for
more granular control.

For SQL analytics endpoint and Warehouse:

Object-level-security can be managed using GRANT, REVOKE, and DENY T-SQL


syntax.
Users can be assigned to SQL roles, both custom and built-in database roles.
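For example, a minimal sketch of granular control with these standard T-SQL constructs; the role, object, and principal names are illustrative:

SQL

-- Create a custom role, grant it object-level permissions, and add a member
CREATE ROLE [SalesReaders];
GRANT SELECT ON OBJECT::[dbo].[FactSale] TO [SalesReaders];
ALTER ROLE [SalesReaders] ADD MEMBER [alice@contoso.com];

-- Explicitly deny a specific user access to a sensitive view
DENY SELECT ON OBJECT::[dbo].[vw_Salaries] TO [alice@contoso.com];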

User granular permissions


In order for a user to connect to the database, the user must be assigned to a
Workspace role or assigned the item Read permission. Without Read permission at
a minimum, the connection fails.
If you'd like to set up a user's granular permissions before allowing them to
connect to the warehouse, permissions can first be set up within SQL. Then, they
can be given access by assigning them to a Workspace role or granting item
permissions.

Limitations
CREATE USER cannot be explicitly executed currently. When GRANT or DENY is
executed, the user is created automatically. The user will not be able to connect
until sufficient workspace level rights are given.

View my permissions
When a user connects to the SQL connection string, they can view the permissions
available to them using the sys.fn_my_permissions function.

User's database scoped permissions:

SQL
SELECT *
FROM sys.fn_my_permissions(NULL, 'Database');

User's schema scoped permissions:

SQL

SELECT *
FROM sys.fn_my_permissions('<schema-name>', 'Schema');

User's object-scoped permissions:

SQL

SELECT *
FROM sys.fn_my_permissions('<schema-name>.<object-name>', 'Object');

View permissions granted explicitly to users


When connected via the SQL connection string, a user with elevated permissions can
query granted permissions by using system views. This doesn't show the users or user
permissions that are given to users by being assigned to workspace roles or assigned
item permissions.

SQL

SELECT DISTINCT pr.principal_id, pr.name, pr.type_desc,


pr.authentication_type_desc, pe.state_desc, pe.permission_name
FROM sys.database_principals AS pr
INNER JOIN sys.database_permissions AS pe
ON pe.grantee_principal_id = pr.principal_id;

Data protection features


You can secure column filters and predicate-based row filters on tables in Warehouse or
SQL analytics endpoint to roles and users in Microsoft Fabric. You can also mask
sensitive data from non-admins using dynamic data masking.

Row-level security in Fabric data warehousing


Column-level security in Fabric data warehousing
Dynamic data masking in Fabric data warehousing
Related content
Security for data warehousing in Microsoft Fabric
GRANT, REVOKE, and DENY
How to share your warehouse and manage permissions



Share your data and manage
permissions
Article • 09/29/2024

Applies to: ✅ Warehouse and Mirrored Database in Microsoft Fabric

Sharing is a convenient way to provide users read access to your data for downstream
consumption. Sharing allows downstream users in your organization to consume a
Warehouse using T-SQL, Spark, or Power BI. You can customize the level of permissions
that the shared recipient is granted to provide the appropriate level of access.

7 Note

You must be an admin or member in your workspace to share an item in Microsoft


Fabric.

Get started
After identifying the Warehouse item you would like to share with another user in your
Fabric workspace, select the quick action in the row to Share.

The following animated gif reviews the steps to select a warehouse to share, select the
permissions to assign, and then finally Grant the permissions to another user.

Share a warehouse
1. You can share your Warehouse from the OneLake data hub or Warehouse item by
choosing Share from quick action, as highlighted in the following image.

2. You're prompted with options to select who you would like to share the
Warehouse with, what permissions to grant them, and whether they'll be notified
by email.

3. Fill out all required fields, then select Grant access.

4. When the shared recipient receives the email, they can select Open and navigate
to the Warehouse Data Hub page.

5. Depending on the level of access the shared recipient has been granted, the
shared recipient is now able to connect to the SQL analytics endpoint, query the
Warehouse, build reports, or read data through Spark.

Fabric security roles


Here's more detail about each of the permissions provided:

If no additional permissions are selected - The shared recipient by default receives "Read" permission, which only allows the recipient to connect to the SQL analytics endpoint, the equivalent of CONNECT permissions in SQL Server. The shared recipient won't be able to query any table or view, or execute any function or stored procedure, unless they're provided access to objects within the Warehouse using the T-SQL GRANT statement (see the example after this list).

Tip

ReadData (used by the warehouse for T-SQL permissions), ReadAll (used by OneLake and the SQL analytics endpoint), and Build (used by Power BI) are separate permissions that do not overlap.

"Read all data using SQL" is selected ("ReadData" permissions)- The shared
recipient can read all the objects within the Warehouse. ReadData is the equivalent
of db_datareader role in SQL Server. The shared recipient can read data from all
tables and views within the Warehouse. If you want to further restrict and provide
granular access to some objects within the Warehouse, you can do this using T-
SQL GRANT / REVOKE / DENY statements.
In the SQL analytics endpoint of the Lakehouse, "Read all SQL Endpoint data" is
equivalent to "Read all data using SQL".

"Read all data using Apache Spark" is selected ("ReadAll" permissions)- The
shared recipient has read access to the underlying parquet files in OneLake, which
can be consumed using Spark. ReadAll should be provided only if the shared
recipient wants complete access to your warehouse's files using the Spark engine.

"Build reports on the default dataset" checkbox is selected ("Build"


permissions)- The shared recipient can build reports on top of the default
semantic model that is connected to your Warehouse. Build should be provided if
the shared recipient wants Build permissions on the default semantic model, to
create Power BI reports on this data. The Build checkbox is selected by default, but
can be unchecked.
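
For example, to give a shared recipient access to just one object on top of the default Read permission, you could run a GRANT like the following sketch; the view name and recipient are hypothetical:

SQL

-- Hypothetical example: grant granular access to one view after sharing
-- the warehouse with only the default Read permission.
GRANT SELECT ON dbo.vw_PublicSales TO [recipient@contoso.com];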

ReadData permissions
With ReadData permissions, the shared recipient can open the Warehouse editor in
read-only mode and query the tables and views within the Warehouse. The shared
recipient can also choose to copy the SQL analytics endpoint provided and connect to a
client tool to run these queries.

ReadAll permissions
A shared recipient with ReadAll permissions can find the Azure Blob File System (ABFS)
path to the specific file in OneLake from the Properties pane in the Warehouse editor.
The shared recipient can then use this path within a Spark Notebook to read this data.

For example, in the following screenshot, a user with ReadAll permissions can query the
data in FactSale with a Spark query in a new notebook.

Build permissions
With Build permissions, the shared recipient can create reports on top of the default
semantic model that is connected to the Warehouse. The shared recipient can create
Power BI reports from the Data Hub or also do the same using Power BI Desktop.

Manage permissions
The Manage permissions page shows the list of users who have been given access by
either assigning to Workspace roles or item permissions.

If you're a member of the Admin or Member workspace roles, go to your workspace and select More options. Then, select Manage permissions.

For users who were provided workspace roles, you'll see the corresponding user,
workspace role, and permissions. Members of the Admin, Member, and Contributor
workspace roles have read/write access to items in this workspace. Viewers have
ReadData permissions and can query all tables and views within the Warehouse in that
workspace. Item permissions Read, ReadData, and ReadAll can be provided to users.

You can choose to add or remove permissions using Manage permissions:

Remove access removes all item permissions.
Remove ReadData removes the ReadData permissions.
Remove ReadAll removes ReadAll permissions.
Remove build removes Build permissions on the corresponding default semantic model.

Data protection features


Microsoft Fabric data warehousing supports several technologies that administrators
can use to protect sensitive data from unauthorized viewing. By securing or obfuscating
data from unauthorized users or roles, these security features can provide data
protection in both a Warehouse and SQL analytics endpoint without application
changes.

Column-level security prevents unauthorized viewing of columns in tables.


Row-level security prevents unauthorized viewing of rows in tables, using familiar
WHERE clause filter predicates.

Dynamic data masking prevents unauthorized viewing of sensitive data by using masks to prevent access to complete, sensitive data, such as email addresses or numbers.

Limitations
If you provide item permissions or remove users who previously had permissions,
permission propagation can take up to two hours. The new permissions are visible
in Manage permissions immediately. Sign in again to ensure that the permissions
are reflected in your SQL analytics endpoint.
Shared recipients are able to access the Warehouse using the owner's identity (delegated mode). Ensure that the owner of the Warehouse is not removed from the workspace.
Shared recipients only have access to the Warehouse they receive and not any
other items within the same workspace as the Warehouse. If you want to provide
permissions for other users in your team to collaborate on the Warehouse (read
and write access), add them as Workspace roles such as Member or Contributor.
Currently, when you share a Warehouse and choose Read all data using SQL, the
shared recipient can access the Warehouse editor in a read-only mode. These
shared recipients can create queries, but cannot currently save their queries.
Currently, sharing a Warehouse is only available through the user experience.
If you want to provide granular access to specific objects within the Warehouse,
share the Warehouse with no additional permissions, then provide granular access
to specific objects using T-SQL GRANT statement. For more information, see T-SQL
syntax for GRANT, REVOKE, and DENY.
If you see that the ReadAll permissions and ReadData permissions are disabled in
the sharing dialog, refresh the page.
Shared recipients do not have permission to reshare a Warehouse.
If a report built on top of the Warehouse is shared with another recipient, the
shared recipient needs more permissions to access the report. This depends on the
mode of access to the semantic model by Power BI:
If accessed through Direct query mode then ReadData permissions (or granular
SQL permissions to specific tables/views) need to be provided to the
Warehouse.
If accessed through Direct Lake mode, then ReadData permissions (or granular
permissions to specific tables/views) need to be provided to the Warehouse.
Direct Lake mode is the default connection type for semantic models that use a
Warehouse or SQL analytics endpoint as a data source. For more information,
see Direct Lake mode.
If accessed through Import mode then no additional permissions are needed.
Currently, sharing a warehouse directly with an SPN is not supported.

Related content
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
How to use Microsoft Fabric notebooks
Access Fabric OneLake shortcuts in an Apache Spark notebook
Navigate the Fabric Lakehouse explorer
GRANT (Transact-SQL)



Column-level security in Fabric data
warehousing
Article • 08/22/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

Column-level security simplifies the design and coding of security in your application,
allowing you to restrict column access to protect sensitive data. For example, ensuring
that specific users can access only certain columns of a table pertinent to their
department.

Column-level security at the data level


The access restriction logic is located in the database tier, not in any single application
tier. The database applies the access restrictions every time data access is attempted,
from any application or reporting platform including Power BI. This restriction makes
your security more reliable and robust by reducing the surface area of your overall
security system.

Column-level security only applies to queries on a Warehouse or SQL analytics endpoint


in Fabric. Power BI queries on a warehouse in Direct Lake mode will fall back to Direct
Query mode to abide by column-level security.

Restrict access to certain columns to certain


users
In addition, column-level security is simpler than designing additional views to filter out columns for imposing access restrictions on users.

Implement column-level security with the GRANT T-SQL statement. For simplicity of
management, assigning permissions to roles is preferred to using individuals.

Column-level security is applied to a shared warehouse or lakehouse accessed through a SQL analytics endpoint, because the underlying data source hasn't changed.

Only Microsoft Entra authentication is supported. For more information, see Microsoft
Entra authentication as an alternative to SQL authentication in Microsoft Fabric.

Examples
This example creates a table and limits the columns that [email protected] can see in the Customers table.

SQL

CREATE TABLE dbo.Customers
(CustomerID int,
FirstName varchar(100) NULL,
CreditCard char(16) NOT NULL,
LastName varchar(100) NOT NULL,
Phone varchar(12) NULL,
Email varchar(100) NULL);

We will allow Charlie to only access the columns related to the customer, but not the
sensitive CreditCard column:

SQL

GRANT SELECT ON Customers(CustomerID, FirstName, LastName, Phone, Email) TO [[email protected]];

Queries executed as [email protected] will fail if they include the CreditCard column:

SQL

SELECT * FROM Customers;

Output

Msg 230, Level 14, State 1, Line 12
The SELECT permission was denied on the column 'CreditCard' of the object 'Customers', database 'ContosoSales', schema 'dbo'.

Next step
Implement column-level security in Fabric Data Warehousing

Related content
Security for data warehousing in Microsoft Fabric
Share your warehouse and manage permissions
Row-level security in Fabric data warehousing
Dynamic data masking in Fabric data warehousing



Row-level security in Fabric data
warehousing
Article • 08/01/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

Row-level security (RLS) enables you to use group membership or execution context to
control access to rows in a database table. For example, you can ensure that workers
access only those data rows that are pertinent to their department. Another example is
to restrict customers' data access to only the data relevant to their company in a
multitenant architecture. The feature is similar to row-level security in SQL Server.

Row-level security at the data level


Row-level security simplifies the design and coding of security in your application. Row-
level security helps you implement restrictions on data row access.

The access restriction logic is in the database tier, not in any single application tier. The
database applies the access restrictions every time data access is attempted, from any
application or reporting platform including Power BI. This makes your security system
more reliable and robust by reducing the surface area of your security system. Row-level
security only applies to queries on a Warehouse or SQL analytics endpoint in Fabric.
Power BI queries on a warehouse in Direct Lake mode will fall back to Direct Query
mode to abide by row-level security.

Restrict access to certain rows to certain users


Implement RLS by using the CREATE SECURITY POLICY Transact-SQL statement, and
predicates created as inline table-valued functions.

Row-level security is applied to a shared warehouse or lakehouse, because the underlying data source hasn't changed.

Predicate-based row-level security


Row-level security in Fabric Synapse Data Warehouse supports predicate-based security.
Filter predicates silently filter the rows available to read operations.
Access to row-level data in a table is restricted by a security predicate defined as an
inline table-valued function. The function is then invoked and enforced by a security
policy. For filter predicates, the application is unaware of rows that are filtered from the
result set. If all rows are filtered, then a null set will be returned.

Filter predicates are applied while reading data from the base table. They affect all get
operations: SELECT , DELETE , and UPDATE . Each table must have its own row-level security
defined separately. Users who query tables without a row level security policy will view
unfiltered data.

Users can't select or delete rows that are filtered. The user can't update rows that are
filtered. But it's possible to update rows in such a way that they'll be filtered afterward.

Filter predicate and security policies have the following behavior:

You can define a predicate function that joins with another table and/or invokes a
function. If the security policy is created with SCHEMABINDING = ON (the default),
then the join or function is accessible from the query and works as expected
without any additional permission checks.

You can issue a query against a table that has a security predicate defined but
disabled. Any rows that are filtered or blocked aren't affected.

If a dbo user, a member of the db_owner role, or the table owner queries a table
that has a security policy defined and enabled, the rows are filtered or blocked as
defined by the security policy.

Attempts to alter the schema of a table bound by a schema bound security policy
will result in an error. However, columns not referenced by the predicate can be
altered.

Attempts to add a predicate on a table that already has one defined for the
specified operation results in an error. This will happen whether the predicate is
enabled or not.

Attempts to modify a function that is used as a predicate on a table within a schema-bound security policy will result in an error.

Defining multiple active security policies that contain non-overlapping predicates succeeds.

Filter predicates have the following behavior:

Define a security policy that filters the rows of a table. The application is unaware
of any rows that are filtered for SELECT , UPDATE , and DELETE operations. Including
situations where all the rows are filtered out. The application can INSERT rows,
even if they will be filtered during any other operation.

Permissions
Creating, altering, or dropping security policies requires the ALTER ANY SECURITY POLICY
permission. Creating or dropping a security policy requires ALTER permission on the
schema.

Additionally, the following permissions are required for each predicate that is added:

SELECT and REFERENCES permissions on the function being used as a predicate.

REFERENCES permission on the target table being bound to the policy.

REFERENCES permission on every column from the target table used as arguments.

Security policies apply to all users, including dbo users in the database. Dbo users can alter or drop security policies; however, their changes to security policies can be audited. If members of roles like Administrator, Member, or Contributor need to see all rows to troubleshoot or validate data, the security policy must be written to allow that.

If a security policy is created with SCHEMABINDING = OFF , then to query the target table,
users must have the SELECT or EXECUTE permission on the predicate function and any
additional tables, views, or functions used within the predicate function. If a security
policy is created with SCHEMABINDING = ON (the default), then these permission checks
are bypassed when users query the target table.
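
As a sketch of how these permissions might be granted, assuming a hypothetical custom role named SecurityAdmins that should manage policies and the schema holding the predicate functions:

SQL

-- Hypothetical example: allow a principal to manage row-level security
-- policies and the schema that holds the predicate functions.
GRANT ALTER ANY SECURITY POLICY TO [SecurityAdmins];
GRANT ALTER ON SCHEMA::Security TO [SecurityAdmins];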

Security considerations: side channel attacks


Consider and prepare for the following two scenarios.

Malicious security policy manager


It is important to observe that a malicious security policy manager, with sufficient
permissions to create a security policy on top of a sensitive column and having
permission to create or alter inline table-valued functions, can collude with another user
who has select permissions on a table to perform data exfiltration by maliciously
creating inline table-valued functions designed to use side channel attacks to infer data.
Such attacks would require collusion (or excessive permissions granted to a malicious
user) and would likely require several iterations of modifying the policy (requiring
permission to remove the predicate in order to break the schema binding), modifying
the inline table-valued functions, and repeatedly running select statements on the target
table. We recommend you limit permissions as necessary and monitor for any suspicious
activity. Activity such as constantly changing policies and inline table-valued functions
related to row-level security should be monitored.

Carefully crafted queries


It is possible to cause information leakage by using carefully crafted queries that use
errors to exfiltrate data. For example, SELECT 1/(SALARY-100000) FROM PAYROLL WHERE
NAME='John Doe'; would let a malicious user know that John Doe's salary is exactly

$100,000. Even though there is a security predicate in place to prevent a malicious user
from directly querying other people's salary, the user can determine when the query
returns a divide-by-zero exception.

Examples
We can demonstrate row-level security in a Warehouse and SQL analytics endpoint in Microsoft Fabric.

The following example creates sample tables that work with a Warehouse in Fabric; for a SQL analytics endpoint, use existing tables. In the SQL analytics endpoint, you cannot CREATE TABLE , but you can CREATE SCHEMA , CREATE FUNCTION , and CREATE SECURITY POLICY .

In this example, first create a schema sales and a table sales.Orders .

SQL

CREATE SCHEMA sales;


GO

-- Create a table to store sales data


CREATE TABLE sales.Orders (
SaleID INT,
SalesRep VARCHAR(100),
ProductName VARCHAR(50),
SaleAmount DECIMAL(10, 2),
SaleDate DATE
);

-- Insert sample data


INSERT INTO sales.Orders (SaleID, SalesRep, ProductName, SaleAmount,
SaleDate)
VALUES
(1, '[email protected]', 'Smartphone', 500.00, '2023-08-01'),
(2, '[email protected]', 'Laptop', 1000.00, '2023-08-02'),
(3, '[email protected]', 'Headphones', 120.00, '2023-08-03'),
(4, '[email protected]', 'Tablet', 800.00, '2023-08-04'),
(5, '[email protected]', 'Smartwatch', 300.00, '2023-08-05'),
(6, '[email protected]', 'Gaming Console', 400.00, '2023-08-06'),
(7, '[email protected]', 'TV', 700.00, '2023-08-07'),
(8, '[email protected]', 'Wireless Earbuds', 150.00, '2023-08-08'),
(9, '[email protected]', 'Fitness Tracker', 80.00, '2023-08-09'),
(10, '[email protected]', 'Camera', 600.00, '2023-08-10');

Create a Security schema, a function Security.tvf_securitypredicate , and a security


policy SalesFilter .

SQL

-- Creating schema for Security


CREATE SCHEMA Security;
GO

-- Creating a function for the SalesRep evaluation


CREATE FUNCTION Security.tvf_securitypredicate(@SalesRep AS nvarchar(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS tvf_securitypredicate_result
WHERE @SalesRep = USER_NAME() OR USER_NAME() = '[email protected]';
GO

-- Using the function to create a Security Policy


CREATE SECURITY POLICY SalesFilter
ADD FILTER PREDICATE Security.tvf_securitypredicate(SalesRep)
ON sales.Orders
WITH (STATE = ON);
GO
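
With the policy enabled, a quick sanity check is to query the table while signed in as different users: each sales rep should see only their own rows, while the manager account named in the predicate sees all rows. A minimal check:

SQL

-- Returns only the rows permitted by the filter predicate for the
-- currently signed-in user.
SELECT SaleID, SalesRep, ProductName, SaleAmount, SaleDate
FROM sales.Orders;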

To modify a row level security function, you must first drop the security policy. In the
following script, we drop the policy SalesFilter before issuing an ALTER FUNCTION
statement on Security.tvf_securitypredicate . Then, we recreate the policy
SalesFilter .

SQL

-- Drop policy so we can change the predicate function.


DROP SECURITY POLICY SalesFilter;
GO

-- Alter the function for the SalesRep evaluation


ALTER FUNCTION Security.tvf_securitypredicate(@SalesRep AS nvarchar(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS tvf_securitypredicate_result
WHERE @SalesRep = USER_NAME() OR USER_NAME() = '[email protected]';
GO

-- Re-create a Security Policy


CREATE SECURITY POLICY SalesFilter
ADD FILTER PREDICATE Security.tvf_securitypredicate(SalesRep)
ON sales.Orders
WITH (STATE = ON);
GO

Next step
Implement row-level security in Fabric Data Warehousing

Related content
Security for data warehousing in Microsoft Fabric
Share your warehouse and manage permissions
Column-level security in Fabric data warehousing
Dynamic data masking in Fabric data warehousing



Dynamic data masking in Fabric data
warehousing
Article • 04/24/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

Dynamic data masking limits sensitive data exposure by masking it to nonprivileged


users. It can be used to greatly simplify the design and coding of security in your
application.

Dynamic data masking helps prevent unauthorized viewing of sensitive data by enabling
administrators to specify how much sensitive data to reveal, with minimal effect on the
application layer. Dynamic data masking can be configured on designated database
fields to hide sensitive data in the result sets of queries. With dynamic data masking, the
data in the database isn't changed, so it can be used with existing applications since
masking rules are applied to query results. Many applications can mask sensitive data
without modifying existing queries.

A central data masking policy acts directly on sensitive fields in the database.
Designate privileged users or roles that do have access to the sensitive data.
Dynamic data masking features full masking and partial masking functions, and a
random mask for numeric data.
Simple Transact-SQL commands define and manage masks.

The purpose of dynamic data masking is to limit exposure of sensitive data, preventing
users who shouldn't have access to the data from viewing it. Dynamic data masking
doesn't aim to prevent database users from connecting directly to the database and
running exhaustive queries that expose pieces of the sensitive data.

Dynamic data masking is complementary to other Fabric security features like column-
level security and row-level security. It's highly recommended to use these data
protection features together in order to protect the sensitive data in the database.

Define a dynamic data mask


A masking rule can be defined on a column in a table, in order to obfuscate the data in
that column. Five types of masks are available.

Default
Description: Full masking according to the data types of the designated fields.
For string data types, use XXXX (or fewer) if the size of the field is fewer than 4 characters (char, nchar, varchar, nvarchar, text, ntext).
For numeric data types, use a zero value (bigint, bit, decimal, int, money, numeric, smallint, smallmoney, tinyint, float, real).
For date and time data types, use 1900-01-01 00:00:00.0000000 (date, datetime2, datetime, datetimeoffset, smalldatetime, time).
For binary data types, use a single byte of ASCII value 0 (binary, varbinary, image).
Example column definition syntax: Phone# varchar(12) MASKED WITH (FUNCTION = 'default()') NULL
Example of alter syntax: ALTER COLUMN Gender ADD MASKED WITH (FUNCTION = 'default()')

Email
Description: Masking method that exposes the first letter of an email address and the constant suffix ".com", in the form of an email address: [email protected].
Example column definition syntax: Email varchar(100) MASKED WITH (FUNCTION = 'email()') NULL
Example of alter syntax: ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()')

Random
Description: A random masking function for use on any numeric type to mask the original value with a random value within a specified range.
Example column definition syntax: Account_Number bigint MASKED WITH (FUNCTION = 'random([start range], [end range])')
Example of alter syntax: ALTER COLUMN [Month] ADD MASKED WITH (FUNCTION = 'random(1, 12)')

Custom String
Description: Masking method that exposes the first and last letters and adds a custom padding string in the middle: prefix,[padding],suffix. If the original value is too short to complete the entire mask, part of the prefix or suffix isn't exposed.
Example column definition syntax: FirstName varchar(100) MASKED WITH (FUNCTION = 'partial(prefix,[padding],suffix)') NULL
Example of alter syntax: ALTER COLUMN [Phone Number] ADD MASKED WITH (FUNCTION = 'partial(1,"XXXXXXX",0)')
This turns a phone number like 555.123.1234 into 5XXXXXXX.
Additional example: ALTER COLUMN [Phone Number] ADD MASKED WITH (FUNCTION = 'partial(5,"XXXXXXX",0)')
This turns a phone number like 555.123.1234 into 555.1XXXXXXX.

For more examples, see How to implement dynamic data masking in Synapse Data
Warehouse.

Permissions
Users without the Administrator, Member, or Contributor rights on the workspace, and
without elevated permissions on the Warehouse, will see masked data.

You don't need any special permission to create a table with a dynamic data mask, only
the standard CREATE TABLE and ALTER on schema permissions.

Adding, replacing, or removing the mask of a column requires the ALTER ANY MASK permission and ALTER permission on the table. It's appropriate to grant ALTER ANY MASK to a security officer.

Users with SELECT permission on a table can view the table data. Columns that are
defined as masked will display masked data. Grant the UNMASK permission to a user to
enable them to retrieve unmasked data from the columns for which masking is defined.

The CONTROL permission on the database includes both the ALTER ANY MASK and UNMASK permissions, which enable the user to view unmasked data.

Administrative users or roles such as Admin, Member, or Contributor have CONTROL permission on the database by design and can view unmasked data by default. Elevated permissions on the Warehouse include CONTROL permission.
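
For example, to let a reporting role retrieve unmasked values from a specific table, you can grant UNMASK on that table, as the how-to article later in this documentation demonstrates. A minimal sketch; the table and role names here are hypothetical:

SQL

-- Hypothetical example: allow members of a reporting role to see unmasked
-- data in one table, then remove that ability when it's no longer needed.
GRANT UNMASK ON dbo.Employees TO [ReportingRole];
REVOKE UNMASK ON dbo.Employees FROM [ReportingRole];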

Security consideration: bypassing masking


using inference or brute-force techniques
Dynamic data masking is designed to simplify application development by limiting data
exposure in a set of predefined queries used by the application. While Dynamic Data
Masking can also be useful to prevent accidental exposure of sensitive data when
accessing data directly, it's important to note that unprivileged users with query
permissions can apply techniques to gain access to the actual data.

As an example, consider a user that has sufficient privileges to run queries on the
Warehouse, and tries to 'guess' the underlying data and ultimately infer the actual
values. Assume that we have a mask defined on the [Employee].[Salary] column, and
this user connects directly to the database and starts guessing values, eventually
inferring the [Salary] value in the Employees table:

SQL

SELECT ID, Name, Salary FROM Employees
WHERE Salary > 99999 and Salary < 100001;

Results in:

ID     Name        Salary
62543  Jane Doe    0
91245  John Smith  0

This demonstrates that dynamic data masking shouldn't be used alone to fully secure
sensitive data from users with query access to the Warehouse or SQL analytics endpoint.
It's appropriate for preventing sensitive data exposure, but doesn't protect against
malicious intent to infer the underlying data.

It's important to properly manage object-level security with SQL granular permissions,
and to always follow the minimal required permissions principle.

Related content
Workspace roles in Fabric data warehousing
Column-level security in Fabric data warehousing
Row-level security in Fabric data warehousing
Security for data warehousing in Microsoft Fabric

Next step
How to implement dynamic data masking in Synapse Data Warehouse



Implement column-level security in
Fabric data warehousing
Article • 08/05/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

Column-level security (CLS) in Microsoft Fabric allows you to control access to columns
in a table based on specific grants on these tables. For more information, see Column-
level security in Fabric data warehousing.

This guide will walk you through the steps to implement column-level security in a
Warehouse or SQL analytics endpoint.

Prerequisites
Before you begin, make sure you have the following:

1. A Fabric workspace with an active capacity or trial capacity.


2. A Fabric Warehouse or SQL analytics endpoint on a Lakehouse.
3. Either the Administrator, Member, or Contributor rights on the workspace, or
elevated permissions on the Warehouse or SQL analytics endpoint.

1. Connect
1. Log in using an account with elevated access on the Warehouse or SQL analytics
endpoint. (Either Admin/Member/Contributor role on the workspace or Control
Permissions on the Warehouse or SQL analytics endpoint).
2. Open the Fabric workspace and navigate to the Warehouse or SQL analytics
endpoint where you want to apply column-level security.

2. Define column-level access for tables


1. Identify user or roles and the data tables you want to secure with column-level
security.

2. Implement column-level security with the GRANT T-SQL statement and a column
list. For simplicity of management, assigning permissions to roles is preferred to
using individuals.

SQL
-- Grant select to subset of columns of a table
GRANT SELECT ON YourSchema.YourTable
(Column1, Column2, Column3, Column4, Column5)
TO [SomeGroup];

3. Replace YourSchema with the name of your schema and YourTable with the name
of your target table.

4. Replace SomeGroup with the name of your User/Group.

5. Replace the comma-delimited columns list with the columns you want to give the
role access to.

6. Repeat these steps to grant specific column access for other tables if needed.

3. Test column-level access


1. Log in as a user who is a member of a role with an associated GRANT statement.
2. Query the database tables to verify that column-level security is working as
expected. Users should only see the columns they have access to, and should be
blocked from other columns. For example:

SQL

SELECT * FROM YourSchema.YourTable;

3. Similar results for the user will be filtered with other applications that use Microsoft
Entra authentication for database access. For more information, see Microsoft
Entra authentication as an alternative to SQL authentication in Microsoft Fabric.

4. Monitor and maintain column-level security


Regularly monitor and update your column-level security policies as your security
requirements evolve. Keep track of role assignments and ensure that users have the
appropriate access.

Related content
Column-level security for Fabric data warehousing
Row-level security in Fabric data warehousing


Implement row-level security in
Microsoft Fabric data warehousing
Article • 08/05/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

Row-level security (RLS) in Fabric Warehouse and SQL analytics endpoint allows you to
control access to rows in a database table based on user roles and predicates. For more
information, see Row-level security in Fabric data warehousing.

This guide will walk you through the steps to implement row-level security in Microsoft
Fabric Warehouse or SQL analytics endpoint.

Prerequisites
Before you begin, make sure you have the following:

1. A Fabric workspace with an active capacity or trial capacity.


2. A Fabric Warehouse or SQL analytics endpoint on a Lakehouse.
3. Either the Administrator, Member, or Contributor rights on the workspace, or
elevated permissions on the Warehouse or SQL analytics endpoint.

1. Connect
1. Log in using an account with elevated access on the Warehouse or SQL analytics
endpoint. (Either Admin/Member/Contributor role on the workspace or Control
Permissions on the Warehouse or SQL analytics endpoint).
2. Open the Fabric workspace and navigate to the Warehouse or SQL analytics
endpoint where you want to apply row-level security.

2. Define security policies


1. Determine the roles and predicates you want to use to control access to data.
Roles define who can access data, and predicates define the criteria for access.

2. Create security predicates. Security predicates are conditions that determine which
rows a user can access. You can create security predicates as inline table-valued
functions. This simple exercise assumes there is a column in your data table,
UserName_column , that contains the relevant username, populated by the system

function USER_NAME().

SQL

-- Creating schema for Security


CREATE SCHEMA Security;
GO

-- Creating a function for the SalesRep evaluation


CREATE FUNCTION Security.tvf_securitypredicate(@UserName AS
varchar(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS tvf_securitypredicate_result
WHERE @UserName = USER_NAME();
GO

-- Using the function to create a Security Policy


CREATE SECURITY POLICY YourSecurityPolicy
ADD FILTER PREDICATE Security.tvf_securitypredicate(UserName_column)
ON sampleschema.sampletable
WITH (STATE = ON);
GO

3. Replace YourSecurityPolicy with your policy name, tvf_securitypredicate with


the name of your predicate function, sampleschema with the name of your schema
and sampletable with the name of your target table.

4. Replace UserName_column with a column in your table that contains user names.

5. Replace WHERE @UserName = USER_NAME(); with a WHERE clause that matches the
desired predicate-based security filter. For example, this filters the data where the
UserName column, mapped to the @UserName parameter, matches the result of the

system function USER_NAME().

6. Repeat these steps to create security policies for other tables if needed.

3. Test row-level security


1. Log in to Fabric as a user who is a member of a role with an associated security
policy. Use the following query to verify the value that should be matched in the
table.

SQL
SELECT USER_NAME()

2. Query the database tables to verify that row-level security is working as expected.
Users should only see data that satisfies the security predicate defined in their role.
For example:

SQL

SELECT * FROM sampleschema.sampletable

3. Similar filtered results for the user will be filtered with other applications that use
Microsoft Entra authentication for database access. For more information, see
Microsoft Entra authentication as an alternative to SQL authentication in Microsoft
Fabric.

4. Monitor and maintain row-level security


Regularly monitor and update your row-level security policies as your security
requirements evolve. Keep track of role assignments and ensure that users have the
appropriate access.

Related content
Row-level security in Fabric data warehousing
Column-level security for Fabric data warehousing



How to implement dynamic data
masking in Synapse Data Warehouse
Article • 10/09/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

Dynamic data masking is a cutting-edge data protection technology that helps


organizations safeguard sensitive information within their databases. It allows you to
define masking rules for specific columns, ensuring that only authorized users see the
original data while concealing it for others. Dynamic data masking provides an
additional layer of security by dynamically altering the data presented to users, based on
their access permissions.

For more information, see Dynamic data masking in Fabric data warehousing.

Prerequisites
Before you begin, make sure you have the following:

1. A Microsoft Fabric workspace with an active capacity or trial capacity.


2. A Warehouse.
a. Dynamic data masking works on SQL analytics endpoint. You can add masks to
existing columns using ALTER TABLE ... ALTER COLUMN as demonstrated later in
this article.
b. This exercise uses a Warehouse.
3. To administer, a user with the Administrator, Member, or Contributor rights on the
workspace, or elevated permissions on the Warehouse.
a. In this tutorial, the "admin account".
4. To test, a user without the Administrator, Member, or Contributor rights on the
workspace, and without elevated permissions on the Warehouse.
a. In this tutorial, the "test user".

1. Connect
1. Open the Fabric workspace and navigate to the Warehouse you want to apply
dynamic data masking to.
2. Sign in using an account with elevated access on the Warehouse, either
Admin/Member/Contributor role on the workspace or Control Permissions on the
Warehouse.
2. Configure dynamic data masking
1. Sign into the Fabric portal with your admin account.

2. In the Fabric workspace, navigate to your Warehouse.

3. Select the New SQL query option, and under Blank, select New SQL query.

4. In your SQL script, define dynamic data masking rules using the MASKED WITH
FUNCTION clause. For example:

SQL

CREATE TABLE dbo.EmployeeData (
    EmployeeID INT
    ,FirstName VARCHAR(50) MASKED WITH (FUNCTION = 'partial(1,"-",2)') NULL
    ,LastName VARCHAR(50) MASKED WITH (FUNCTION = 'default()') NULL
    ,SSN CHAR(11) MASKED WITH (FUNCTION = 'partial(0,"XXX-XX-",4)') NULL
    ,email VARCHAR(256) NULL
);
GO
INSERT INTO dbo.EmployeeData
VALUES (1, 'TestFirstName', 'TestLastName', '123-45-6789', '[email protected]');
GO
INSERT INTO dbo.EmployeeData
VALUES (2, 'First_Name', 'Last_Name', '000-00-0000', '[email protected]');
GO

The FirstName column shows only the first and last two characters of the
string, with - in the middle.
The LastName column shows XXXX .
The SSN column shows XXX-XX- followed by the last four characters of the
string.

5. Select the Run button to execute the script.

6. Confirm the execution of the script.

7. The script will apply the specified dynamic data masking rules to the designated
columns in your table.

3. Test dynamic data masking


Once the dynamic data masking rules are applied, you can test the masking by querying
the table with a test user who does not have the Administrator, Member, or Contributor
rights on the workspace, or elevated permissions on the Warehouse.

1. Sign in to a tool like Azure Data Studio or SQL Server Management Studio as the
test user, for example [email protected].
2. As the test user, run a query against the table. The masked data is displayed
according to the rules you defined.

SQL

SELECT * FROM dbo.EmployeeData;

3. With your admin account, grant the UNMASK permission to the test user.

SQL

GRANT UNMASK ON dbo.EmployeeData TO [[email protected]];

4. As the test user, verify that a user signed in as [email protected] can see
unmasked data.

SQL

SELECT * FROM dbo.EmployeeData;

5. With your admin account, revoke the UNMASK permission from the test user.

SQL

REVOKE UNMASK ON dbo.EmployeeData TO [TestUser];

6. Verify that the test user cannot see unmasked data, only the masked data.

SQL

SELECT * FROM dbo.EmployeeData;

7. With your admin account, you can grant and revoke the UNMASK permission for a role:

SQL
GRANT UNMASK ON dbo.EmployeeData TO [TestRole];
REVOKE UNMASK ON dbo.EmployeeData TO [TestRole];

4. Manage and modify dynamic data masking rules


To manage or modify existing dynamic data masking rules, create a new SQL script.

1. You can add a mask to an existing column, using the MASKED WITH FUNCTION clause:

SQL

ALTER TABLE dbo.EmployeeData


ALTER COLUMN [email] ADD MASKED WITH (FUNCTION = 'email()');
GO

2. You can remove an existing mask from a column using the DROP MASKED clause:

SQL

ALTER TABLE dbo.EmployeeData


ALTER COLUMN [email] DROP MASKED;

5. Cleanup
1. To clean up this testing table:

SQL

DROP TABLE dbo.EmployeeData;

Related content
Dynamic data masking in Fabric data warehousing
Workspace roles in Fabric data warehousing
Column-level security in Fabric data warehousing
Row-level security in Fabric data warehousing
Security for data warehousing in Microsoft Fabric



Query using the visual query editor
Article • 09/20/2024

Applies to: ✅ SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric

This article describes how to use the visual query editor in the Microsoft Fabric portal to
quickly and efficiently write queries. You can use the visual query editor for a no-code
experience to create your queries.

You can also query the data in your warehouse with multiple tools with a SQL
connection string.
You can use the SQL query editor to write T-SQL queries from the Microsoft Fabric
portal.
You can quickly view data in the Data preview.

Visual query editor in the Fabric portal


The visual query editor provides an easy visual interface to write queries against the data
in your warehouse.

Once you've loaded data into your warehouse, you can use the visual query editor to
create queries to analyze your data. There are two ways to get to the visual query editor:

In the ribbon, create a new query using the New visual query button, as shown in the
following image.

To create a query, drag and drop tables from the Object explorer onto the canvas. To
drag a table, select and hold the table until you see it's picked up from the Object
explorer before dragging. Once you drag one or more tables onto the canvas, you can
use the visual experience to design your queries. The warehouse editor uses the Power
Query diagram view experience to enable you to easily query and analyze your data.
Learn more about Power Query diagram view.

As you work on your visual query, the queries are automatically saved every few seconds. A "saving indicator" appears in your query tab to indicate that your query is being saved. All workspace users can save their queries in the My queries folder. However, users in the Viewer role of the workspace or shared recipients of the warehouse are restricted from moving queries to the Shared queries folder.

The following animated gif shows the merging of two tables using a no-code visual
query editor.

The steps shown in the gif are:

1. First, the table DimCity is dragged from the Explorer into the blank new visual
query editor.
2. Then, the table FactSale is dragged from the Explorer into the visual query editor.
3. In the visual query editor, in the content menu of DimCity , the Merge queries as
new Power Query operator is used to join them on a common key.
4. In the new Merge page, the CityKey column in each table is selected to be the
common key. The Join kind is Inner.
5. The new Merge operator is added to the visual query editor.
6. When you see results, you can use Download Excel file to view results in Excel or
Visualize results to create report on results.

Save as view
You can save your query as a view on which data load is enabled using the Save as view button. Select a schema in which you have access to create views, provide the name of the view, and verify the SQL statement before confirming the view creation. When the view is successfully created, it appears in the Explorer.

View SQL
The View SQL feature allows you to see the SQL query based on the applied steps of
your visual query.

Select View query to see the resulting T-SQL, and Edit SQL script to edit the SQL query
in the query editor.

When writing queries that are joining two or more tables using the Merge queries
action, the query that has load enabled will be reflected in the SQL script. To specify
which table's query should be shown in the SQL script, select the context menu and then
Enable load. Expand the table's columns that got merged in the results to see the steps
reflected in the SQL script.

Save as table
You can use Save as table to save your query results into a table for the query with load enabled. Select the warehouse in which you would like to save results, select a schema in which you have access to create tables, and provide a table name. The results are loaded into the table using a CREATE TABLE AS SELECT statement. When the table is successfully created, it appears in the Explorer.
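
Behind the scenes, Save as table generates a CREATE TABLE AS SELECT (CTAS) statement. The following is only an illustrative sketch of that kind of statement; the target table name and query are hypothetical:

SQL

-- Illustrative CTAS statement similar to what Save as table produces.
CREATE TABLE dbo.SalesBySalesperson
AS
SELECT SalespersonKey, SUM(Profit) AS TotalProfit
FROM dbo.FactSale
GROUP BY SalespersonKey;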

Create a cross-warehouse query in visual query


editor
For more information on cross-warehouse querying, see Cross-warehouse querying.

To create a cross-warehouse query, drag and drop tables from added warehouses
and add merge activity. For example, in the following image example, store_sales
is added from sales warehouse and it's merged with item table from marketing
warehouse.

Limitations with visual query editor


In the visual query editor, you can only run DQL (Data Query Language) or read-
only SELECT statements. DDL or DML statements are not supported.
Only a subset of Power Query operations that support Query folding are currently
supported.
Visualize Results currently does not support SQL queries with an ORDER BY clause.
When viewing SQL script joining two or more tables, only the table with load
enabled selected will show the corresponding SQL script.
There are certain steps that the View SQL feature does not support; in these cases, a banner in the visual query editor states "The query is not supported as a warehouse view, since it cannot be fully translated to SQL". For more information, see Query folding indicators in Power Query.

Related content
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
Query using the SQL query editor
Query insights in Fabric data warehousing


Query using the SQL query editor
Article • 09/24/2024

Applies to: ✅ SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric

This article describes how to use the SQL query editor in the Microsoft Fabric portal to
quickly and efficiently write queries, and suggestions on how best to see the information
you need.

You can also query the data in your warehouse with multiple tools with a SQL
connection string.
You can build queries graphically with the Visual query editor.
You can quickly view data in the Data preview.

The SQL query editor provides support for IntelliSense, code completion, syntax
highlighting, client-side parsing, and validation. You can run Data Definition Language
(DDL), Data Manipulation Language (DML), and Data Control Language (DCL)
statements.

SQL query editor in the Fabric portal


The SQL query editor provides a text editor to write queries using T-SQL. To access the
built-in SQL query editor:

Create a new query using the New SQL query button in the ribbon.

If you select SQL templates dropdown list, you can easily create T-SQL objects with
code templates that populate in your SQL query window, as shown in the following
image.

As you work on your SQL query, the queries are automatically saved every few seconds.
A "saving" indicator appears in your query tab to indicate that your query is being
saved.

Multitask between tabs for data preview,


querying, and modeling
The data preview, querying, and modeling experience opens up as individual tabs that
you can multitask between in the editor. If you are writing a query, you can switch
between seeing a preview of the data and viewing the relationships between tables that
you're writing the query for. To view or close all tabs, select the icon to the right of the tabs.

View query results


Once you've written the T-SQL query, select Run to execute the query.

The Results preview is displayed in the Results section. If the number of rows returned is more than 10,000, the preview is limited to 10,000 rows. You can search for a string within the results grid to get filtered rows matching the search criteria. The Messages tab shows the SQL messages returned when the SQL query is run.

The status bar indicates the query status, the duration of the run, and the number of rows and columns returned in the results.

To enable the Save as view, Save as table, Open in Excel, Explore this data (preview), and Visualize results menus, highlight the SQL statement containing a SELECT statement in the SQL query editor.


Save as view
You can select the query and save it as a view using the Save as view button. Select a schema in which you have access to create views, provide the name of the view, and verify the SQL statement before confirming the view creation. When the view is successfully created, it appears in the Explorer.
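
Behind the scenes, Save as view generates a CREATE VIEW statement over your selected query. A sketch of that kind of statement, using hypothetical object names:

SQL

-- Illustrative CREATE VIEW statement similar to what Save as view produces.
CREATE VIEW dbo.vw_TopSalespeople
AS
SELECT emp.Employee, SUM(sale.Profit) AS TotalProfit
FROM dbo.DimEmployee AS emp
INNER JOIN dbo.FactSale AS sale
    ON emp.EmployeeKey = sale.SalespersonKey
GROUP BY emp.Employee;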

Save as table

You can use Save as table to save your query results into a table. Select the warehouse in which you would like to save results, select a schema in which you have access to create tables, and provide a table name. The results are loaded into the table using a CREATE TABLE AS SELECT statement. When the table is successfully created, it appears in the Explorer.
Open in Excel
The Open in Excel button opens the corresponding T-SQL Query to Excel and executes
the query, enabling you to work with the results in Microsoft Excel on your local
computer.

Follow these steps to work with the Excel file locally:

1. After you select the Continue button, locate the downloaded Excel file in your
Windows File Explorer, for example, in the Downloads folder of your browser.

2. To see the data, select the Enable Editing button in the Protected View ribbon
followed by the Enable Content button in the Security Warning ribbon. Once both
are enabled, you are presented with the following dialog to approve running the
query listed.

3. Select Run.

4. Authenticate your account with the Microsoft account option. Select Connect.

Once you have successfully signed in, you'll see the data presented in the spreadsheet.
Explore this data (preview)
Explore this data (preview) provides the capability to perform ad-hoc exploration of
your query results. With this feature, you can launch a side-by-side matrix and visual
view to better understand any trends or patterns behind your query results before
diving into building a full Power BI report. For more information, see Explore your data
in the Power BI service.

Visualize results

Visualize results allows you to create reports from your query results within the SQL
query editor.

Copy

The Copy dropdown allows you to copy the results and/or column names in the data grid. You can choose to copy the results with column names, copy the results only, or copy the column names only.

Multiple result sets


When you run multiple queries that return multiple result sets, you can select the results dropdown list to see individual results.

Cross-warehouse querying
For more information on cross-warehouse querying, see Cross-warehouse querying.

You can write a T-SQL query with three-part naming convention to refer to objects and
join them across warehouses, for example:

SQL

SELECT
emp.Employee
,SUM(Profit) AS TotalProfit
,SUM(Quantity) AS TotalQuantitySold
FROM
[SampleWarehouse].[dbo].[DimEmployee] as emp
JOIN
[WWI_Sample].[dbo].[FactSale] as sale
ON
emp.EmployeeKey = sale.SalespersonKey
WHERE
emp.IsSalesperson = 'TRUE'
GROUP BY
emp.Employee
ORDER BY
TotalProfit DESC;

Keyboard shortcuts
Keyboard shortcuts provide a quick way to navigate and allow users to work more
efficiently in SQL query editor. The table in this article lists all the shortcuts available in
SQL query editor in the Microsoft Fabric portal:


Function Shortcut

New SQL query Ctrl + Q

Close current tab Ctrl + Shift + F4

Run SQL script Ctrl + Enter, Shift +Enter

Cancel running SQL script Alt+Break

Search string Ctrl + F

Replace string Ctrl + H

Undo Ctrl + Z

Redo Ctrl + Y

Go one word left Ctrl + Left arrow key

Go one word right Ctrl + Right arrow key

Indent increase Tab

Indent decrease Shift + Tab

Comment Ctrl + K, Ctrl + C

Uncomment Ctrl + K, Ctrl + U

Move cursor up ↑

Move cursor down ↓

Select All Ctrl + A

Limitations
In SQL query editor, every time you run the query, it opens a separate session and
closes it at the end of the execution. This means if you set up session context for
multiple query runs, the context is not maintained for independent execution of
queries.

You can run Data Definition Language (DDL), Data Manipulation Language (DML),
and Data Control Language (DCL) statements, but there are limitations for
Transaction Control Language (TCL) statements. In the SQL query editor, when you
select the Run button, you're submitting an independent batch request to execute.
Each Run action in the SQL query editor is a batch request, and a session only
exists per batch. Each execution of code in the same query window runs in a
different batch and session.
For example, when independently executing transaction statements, session context is not retained. In the following screenshot, BEGIN TRAN was executed in the first request, but since the second request was executed in a different session, there is no transaction to commit, resulting in the failure of the commit/rollback operation. If the SQL batch submitted does not include a COMMIT TRAN , the changes applied after BEGIN TRAN will not commit (a workaround sketch is shown at the end of this Limitations section).

The SQL query editor does not support sp_set_session_context .

In the SQL query editor, the GO SQL command creates a new independent batch
in a new session.

When you are running a SQL query with USE, you need to submit the SQL query
with USE as one single request.

Visualize results currently does not support SQL queries with an ORDER BY clause.

T-SQL statements that use the T-SQL OPTION syntax are not currently supported in
the Explore this data or Visualize results options with DirectQuery mode. The
workaround is to create visualizations in Power BI Desktop using Import mode.

The following list summarizes behavior that does not match SQL Server Management Studio or Azure Data Studio:

Using SET Statements (Transact-SQL) to set properties for the session: supported in SSMS/ADS, but not supported in the SQL query editor in the Fabric portal.

Using sp_set_session_context (Transact-SQL) for multiple batch statement runs: supported in SSMS/ADS, but not supported in the SQL query editor in the Fabric portal.

Transactions (Transact-SQL), unless executed as a single batch request: supported in SSMS/ADS, but not supported in the SQL query editor in the Fabric portal.
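
As a workaround for the transaction limitation noted earlier, submit the entire transaction as a single batch (one Run action) so that BEGIN TRAN and COMMIT TRAN execute in the same session. A sketch, using a hypothetical staging table:

SQL

-- Hypothetical example: the whole transaction runs in one batch, so
-- BEGIN TRAN and COMMIT TRAN share the same session.
BEGIN TRAN;
DELETE FROM dbo.StagingSales WHERE SaleDate < '2023-01-01';
COMMIT TRAN;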

Related content
Query using the Visual Query editor
Tutorial: Create cross-warehouse queries with the SQL query editor

Next step
How-to: Query the Warehouse



View data in the Data preview in
Microsoft Fabric
Article • 11/19/2024

Applies to: ✅ SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric

The Data preview is one of the three switcher modes, along with the Query editor and Model view, within Fabric Data Warehouse. It provides an easy interface to view the data within your tables or views and preview sample data (the top 1,000 rows).

You can also query the data in your warehouse with multiple tools with a SQL
connection string.
You can use the SQL query editor to write T-SQL queries from the Microsoft Fabric
portal.
You can build queries graphically with the Visual query editor.

Get started
After creating a warehouse and ingesting data, select a specific table or view from the
Object explorer that you would like to display in the data grid of the Data preview page.

Search value – Type in a specific keyword in the search bar and rows with that
specific keyword will be filtered. In this example, "New Hampshire" is the keyword
and only rows containing this keyword are shown. To clear the search, select the X
inside the search bar.

Sort columns (alphabetically or numerically) – Hover over the column title to see
the More Options (...) button appear. Select it to see the "Sort Ascending" and
"Sort Descending" options.

Copy value – Select a specific cell in the data preview and press Ctrl + C
(Windows) or Cmd + C (Mac).

Considerations and limitations


Only the top 1,000 rows can be shown in the data grid of the Data preview.
The Data preview view changes depending on how the columns are sorted or if
there's a keyword that is searched.

Related content
Define relationships in data models for data warehousing
Model data in the default Power BI semantic model in Microsoft Fabric



Delta Lake logs in Warehouse in
Microsoft Fabric
Article • 08/02/2024

Applies to: Warehouse in Microsoft Fabric

Warehouse in Microsoft Fabric is built on open file formats. User tables are stored in the parquet file format, and Delta Lake logs are published for all user tables.

The Delta Lake logs open up direct access to the warehouse's user tables for any engine that can read Delta Lake tables. This access is limited to read-only to ensure that the user data maintains ACID transaction compliance. All inserts, updates, and deletes to the data in the tables must be executed through the Warehouse. Once a transaction is committed, a system background process is initiated to publish the updated Delta Lake log for the affected tables.

How to get OneLake path


The following steps detail how to get the OneLake path from a table in a warehouse:

1. Open Warehouse in your Microsoft Fabric workspace.

2. In the Object Explorer, you find more options (...) on a selected table in the Tables
folder. Select the Properties menu.

3. On selection, the Properties pane shows the following information:


a. Name
b. Format
c. Type
d. URL
e. Relative path
f. ABFS path

How to get Delta Lake logs path


You can locate Delta Lake logs via the following methods:

Delta Lake logs can be queried through shortcuts created in a lakehouse. You can view the files using a Microsoft Fabric Spark notebook or the Lakehouse explorer in Synapse Data Engineering in the Microsoft Fabric portal (a query sketch follows this list).

Delta Lake logs can be found via Azure Storage Explorer, through Spark
connections such as the Power BI Direct Lake mode, or using any other service that
can read delta tables.

Delta Lake logs can be found in the _delta_log folder of each table through the
OneLake Explorer in Windows, as shown in the following screenshot.
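As an illustration (a minimal sketch, not an official sample), a Spark SQL cell in a Microsoft Fabric notebook can read the published Delta data directly from the table's OneLake location. The path below is a hypothetical placeholder; replace it with the ABFS path copied from the table's Properties pane described earlier.

SQL

-- Spark SQL in a notebook cell: read the warehouse table's Delta data directly from OneLake.
-- Replace the placeholder with the ABFS path from the table's Properties pane.
SELECT *
FROM delta.`<ABFS path copied from the table's Properties pane>`
LIMIT 10;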

Pausing Delta Lake log publishing


Publishing of Delta Lake logs can be paused and resumed if needed. When publishing is
paused, Microsoft Fabric engines that read tables outside of the Warehouse see the data as it was before the pause. This ensures that reports remain stable and consistent,
reflecting data from all tables as they existed before any changes were made to the
tables. Once your data updates are complete, you can resume Delta Lake Log publishing
to make all recent data changes visible to other analytical engines. Another use case for
pausing Delta Lake log publishing is when users do not need interoperability with other
compute engines in Microsoft Fabric, as it can help save on compute costs.

The syntax to pause and resume Delta Lake log publishing is as follows:

SQL

ALTER DATABASE CURRENT SET DATA_LAKE_LOG_PUBLISHING = PAUSED | AUTO

Example: pause and resume Delta Lake log publishing


To pause Delta Lake log publishing, use the following code snippet:

SQL

ALTER DATABASE CURRENT SET DATA_LAKE_LOG_PUBLISHING = PAUSED

Queries to warehouse tables on the current warehouse from other Microsoft Fabric
engines (for example, queries from a Lakehouse) now show a version of the data as it
was before pausing Delta Lake log publishing. Warehouse queries still show the latest
version of data.

To resume Delta Lake log publishing, use the following code snippet:

SQL

ALTER DATABASE CURRENT SET DATA_LAKE_LOG_PUBLISHING = AUTO

When the state is changed back to AUTO, the Fabric Warehouse engine publishes logs
of all recent changes made to tables on the warehouse, allowing other analytical
engines in Microsoft Fabric to read the latest version of data.

Checking the status of Delta Lake log publishing


To check the current state of Delta Lake log publishing on all warehouses for the current
workspace, use the following code snippet:

SQL
SELECT [name], [DATA_LAKE_LOG_PUBLISHING_DESC] FROM sys.databases

Limitations
Table names can only be used by Spark and other systems if they contain only these characters: A-Z, a-z, 0-9, and underscores.
Column names that will be used by Spark and other systems cannot contain:
spaces
tabs
carriage returns
any of these characters: , ; { } ( ) = [ ]

Related content
Query the Warehouse
How to use Microsoft Fabric notebooks
OneLake overview
Accessing shortcuts
Navigate the Fabric Lakehouse explorer



Query data as it existed in the past
Article • 07/18/2024

Applies to: Warehouse in Microsoft Fabric

Warehouse in Microsoft Fabric offers the capability to query historical data as it existed
in the past. The ability to query data from a specific timestamp is known in the data
warehousing industry as time travel.

Time travel facilitates stable reporting by maintaining the consistency and accuracy
of data over time.
Time travel enables historical trend analysis by querying across various past points
in time, and helps anticipate future trends.
Time travel simplifies low-cost comparisons between previous versions of data.
Time travel aids in analyzing performance over time.
Time travel allows organizations to audit data changes over time, often required
for compliance purposes.
Time travel helps to reproduce the results from machine learning models.
Time travel can query tables as they existed at a specific point in time across
multiple warehouses in the same workspace.

What is time travel?


Time travel in a data warehouse is a low-cost and efficient capability to quickly query
prior versions of data.

Microsoft Fabric currently allows retrieval of past states of data in the following ways:

At the statement level with FOR TIMESTAMP AS OF


At the table level with CLONE TABLE

Time travel with the FOR TIMESTAMP AS OF T-SQL


command
Within a Warehouse item, tables can be queried using the OPTION FOR TIMESTAMP AS
OF T-SQL syntax to retrieve data at past points in time. The FOR TIMESTAMP AS OF clause
affects the entire statement, including all joined warehouse tables.

The results obtained from the time travel queries are inherently read-only. Write
operations such as INSERT, UPDATE, and DELETE cannot occur while utilizing the FOR
TIMESTAMP AS OF query hint.
Use the OPTION clause to specify the FOR TIMESTAMP AS OF query hint. Queries return
data exactly as it existed at the timestamp, specified as YYYY-MM-DDTHH:MM:SS[.fff] . For
example:

SQL

SELECT *
FROM [dbo].[dimension_customer] AS DC
OPTION (FOR TIMESTAMP AS OF '2024-03-13T19:39:35.28'); --March 13, 2024 at
7:39:35.28 PM UTC

Use the CONVERT syntax with style 126 to produce the necessary datetime format.
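For example, the following minimal sketch builds a style-126 literal with three fractional-second digits from the current UTC time, which can then be used in the OPTION clause:

SQL

--A minimal sketch: produce a style-126 timestamp literal from the current UTC time
DECLARE @pointInTime DATETIME = GETUTCDATE();
SELECT CONVERT(VARCHAR(33), @pointInTime, 126) AS TimeTravelTimestamp;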

The timestamp can be specified only once using the OPTION clause for queries, stored
procedures, views, etc. The OPTION applies to everything within the SELECT statement.

For samples, see How to: Query using time travel.

Data retention
In Microsoft Fabric, a warehouse automatically preserves and maintains various versions
of the data, up to a default retention period of thirty calendar days. This allows the
ability to query tables as of any prior point-in-time. All inserts, updates, and deletes
made to the data warehouse are retained. The retention automatically begins from the
moment the warehouse is created. Expired files are automatically deleted after the
retention threshold.

Currently, a SELECT statement with the FOR TIMESTAMP AS OF query hint returns the
latest version of table schema.
Any records that are deleted in a table are available to be queried as they existed
before deletion, if the deletion is within the retention period.
Any modifications made to the schema of a table, including but not limited to
adding or removing columns from the table, cannot be queried before the schema
change. Similarly, dropping and recreating a table with the same data removes its
history.

Time travel scenarios


Consider the ability to time travel to prior data in the following scenarios:

Stable reporting
Frequent execution of extract, transform, and load (ETL) jobs is essential to keep up with
the ever-changing data landscape. The ability to time travel supports this goal by
ensuring data integrity while providing the flexibility to generate reports based on the
query results that are returned as of a past point in time, such as the previous evening,
while background processing is ongoing.

ETL activities can run concurrently while the same table is queried as of a prior point-in-
time.

Historical trend and predictive analysis

Time travel simplifies the analysis of historical data, helping uncover valuable trends and
patterns through querying data across various past time frames. This facilitates
predictive analysis by enabling experimenting with historical datasets and training of
predictive models. It aids in anticipating future trends and helps making well-informed,
data-driven decisions.

Analysis and comparison


Time travel offers an efficient and cost-effective troubleshooting capability by providing
a historical lens for analysis and comparison, facilitating the identification of root cause.

Performance analysis

Time travel can help analyze the performance of warehouse queries overtime. This helps
identify the performance degradation trends based on which the queries can be
optimized.

Audit and compliance


Time travel streamlines auditing and compliance procedures by empowering auditors to
navigate through data history. This not only helps to remain compliant with regulations
but also helps enhance assurance and transparency.

Machine learning models


Time travel capabilities help in reproducing the results of machine learning models by
facilitating analysis of historical data and simulating real-world scenarios. This enhances
the overall reliability of the models so that accurate data driven decisions can be made.
Design considerations
Considerations for the OPTION FOR TIMESTAMP AS OF query hint:

The FOR TIMESTAMP AS OF query hint cannot be used to create the views as of any
prior point in time within the retention period. It can be used to query views as of
past point in time, within the retention period.
The FOR TIMESTAMP AS OF query hint can be used only once within a SELECT
statement.
The FOR TIMESTAMP AS OF query hint can be defined within the SELECT statement in
a stored procedure.

Permissions to time travel


Any user who has Admin, Member, Contributor, or Viewer workspace roles can query
the tables as of a past point-in-time. When users query tables, the restrictions imposed
by column-level security (CLS), row-level security (RLS), or dynamic data masking (DDM)
are automatically imposed.

Limitations
Supply at most three digits of fractional seconds in the timestamp. If you supply
more precision, you receive the error message An error occurred during
timestamp conversion. Please provide a timestamp in the format yyyy-MM-

ddTHH:mm:ss[.fff]. Msg 22440, Level 16, State 1, Code line 29 .

Currently, only the Coordinated Universal Time (UTC) time zone is used for time
travel.

Currently, the data retention for time travel queries is thirty calendar days.

FOR TIMESTAMP AS OF values in the OPTION clause must be deterministic. For an

example of parameterization, see Time travel in a stored procedure.

Time travel is not supported for the SQL analytics endpoint of the Lakehouse.

The OPTION FOR TIMESTAMP AS OF syntax can only be used in queries that begin
with SELECT statement. Queries such as INSERT INTO SELECT and CREATE TABLE AS
SELECT cannot be used along with the OPTION FOR TIMESTAMP AS OF . Consider

instead the ability to Clone a warehouse table at a point in time.


View definitions cannot contain the OPTION FOR TIMESTAMP AS OF syntax. The view
can be queried with the SELECT .. FROM <view> ... OPTION FOR TIMESTAMP AS OF
syntax. However, you cannot query past data from tables in a view from before the
view was created.

FOR TIMESTAMP AS OF syntax for time travel is not currently supported in Power BI

Desktop Direct query mode or the Explore this data option.

Next step
How to: Query using time travel

Related content
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
Query hints



How to: Query using time travel at the
statement level
Article • 10/30/2024

In Microsoft Fabric, the capability to time travel unlocks the ability to query the prior
versions of data without the need to generate multiple data copies, saving on storage
costs. This article describes how to query warehouse tables using time travel at the
statement level, using the T-SQL OPTION clause and the FOR TIMESTAMP AS OF syntax.
This feature is currently in preview.

Warehouse tables can be queried up to a retention period of thirty calendar days using
the OPTION clause, providing the date format yyyy-MM-ddTHH:mm:ss[.fff] .

The following examples can be executed in the SQL Query Editor, SQL Server
Management Studio (SSMS), Azure Data Studio, or any T-SQL query editor.

7 Note

Currently, only the Coordinated Universal Time (UTC) time zone is used for time
travel.

Time travel on a warehouse table


This example shows how to time travel on an individual table in warehouse.

The OPTION T-SQL clause specifies the point-in-time to return the data.

SQL

/* Time travel using a SELECT statement */


SELECT *
FROM [dbo].[dimension_customer]
OPTION (FOR TIMESTAMP AS OF '2024-05-02T20:44:13.700');

Time travel on multiple warehouse tables


The OPTION Clause is declared once per query, and the results of the query will reflect
the state of the data at the timestamp specified in the query for all tables.

SQL
SELECT Sales.StockItemKey,
Sales.Description,
CAST(Sales.Quantity AS int) AS SoldQuantity,
c.Customer
FROM [dbo].[fact_sale] AS Sales INNER JOIN [dbo].[dimension_customer] AS c
ON Sales.CustomerKey = c.CustomerKey
GROUP BY Sales.StockItemKey, Sales.Description, Sales.Quantity, c.Customer
ORDER BY Sales.StockItemKey
OPTION (FOR TIMESTAMP AS OF '2024-05-02T20:44:13.700');

Time travel in a stored procedure


A stored procedure is a set of SQL statements that is precompiled and stored so that it can be used repeatedly. The OPTION clause can be declared once in the stored
procedure, and the result set reflects the state of all tables at the timestamp specified.

The FOR TIMESTAMP AS OF clause cannot directly accept a variable, as values in this
OPTION clause must be deterministic. You can use sp_executesql to pass a strongly typed

datetime value to the stored procedure. This simple example passes a variable and
converts the datetime parameter to the necessary format with date style 126.

SQL

CREATE PROCEDURE [dbo].[sales_by_city] (@pointInTime DATETIME)
AS
BEGIN
    DECLARE @selectForTimestampStatement NVARCHAR(4000);
    DECLARE @pointInTimeLiteral VARCHAR(33);

    SET @pointInTimeLiteral = CONVERT(VARCHAR(33), @pointInTime, 126);

    SET @selectForTimestampStatement = '
        SELECT *
        FROM [dbo].[fact_sale]
        OPTION (FOR TIMESTAMP AS OF ''' + @pointInTimeLiteral + ''')';

    EXEC sp_executesql @selectForTimestampStatement;
END

Then, you can call the stored procedure and pass in a variable as a strongly typed
parameter. For example:

SQL

--Execute the stored procedure


DECLARE @pointInTime DATETIME;
SET @pointInTime = '2024-05-10T22:56:15.457';
EXEC dbo.sales_by_city @pointInTime;

Or, for example:

SQL

--Execute the stored procedure


DECLARE @pointInTime DATETIME;
SET @pointInTime = DATEADD(dd, -7, GETDATE())
EXEC dbo.sales_by_city @pointInTime;

Time travel in a view


Views represent a saved query that dynamically retrieves data from one or more tables
whenever the view is queried. The OPTION clause can be used to query the views so that
the results reflect the state of data at the timestamp specified in the query.

SQL

--Create View
CREATE VIEW Top10CustomersView
AS
SELECT TOP (10)
FS.[CustomerKey],
DC.[Customer],
SUM(FS.TotalIncludingTax) AS TotalSalesAmount
FROM
[dbo].[dimension_customer] AS DC
INNER JOIN
[dbo].[fact_sale] AS FS ON DC.[CustomerKey] = FS.[CustomerKey]
GROUP BY
FS.[CustomerKey],
DC.[Customer]
ORDER BY
TotalSalesAmount DESC;

/*View of Top10 Customers as of a point in time*/


SELECT *
FROM [Timetravel].[dbo].[Top10CustomersView]
OPTION (FOR TIMESTAMP AS OF '2024-05-01T21:55:27.513');

The historical data from tables in a view can only be queried for time travel
beginning from the time the view was created.
After a view is altered, time travel queries are only valid after it was altered.
If an underlying table of a view is altered without changing the view, time travel
queries on the view can return the data from before the table change, as expected.
When the underlying table of a view is dropped and recreated without modifying
the view, data for time travel queries is only available from the time after the table
was recreated.

Limitations
For more information on time travel at the statement level limitations with FOR
TIMESTAMP AS OF , see Time travel Limitations.

Related content
Query data as it existed in the past



T-SQL support in Microsoft Fabric
notebooks
Article • 09/24/2024

The T-SQL notebook feature in Microsoft Fabric lets you write and run T-SQL code
within a notebook. You can use T-SQL notebooks to manage complex queries and write
better markdown documentation. It also allows direct execution of T-SQL on a connected warehouse or SQL analytics endpoint. By adding a Data Warehouse or SQL analytics
endpoint to a notebook, T-SQL developers can run queries directly on the connected
endpoint. BI analysts can also perform cross-database queries to gather insights from
multiple warehouses and SQL analytics endpoints.

Most of the existing notebook functionalities are available for T-SQL notebooks. These
include charting query results, coauthoring notebooks, scheduling regular executions,
and triggering execution within Data Integration pipelines.

) Important

This feature is in preview.

In this article, you learn how to:

Create a T-SQL notebook


Add a Data Warehouse or SQL analytics endpoint to a notebook
Create and run T-SQL code in a notebook
Use the charting features to graphically represent query outcomes
Save the query as a view or a table
Run cross warehouse queries
Skip the execution of non-T-SQL code

Create a T-SQL notebook


To get started with this experience, you can create a T-SQL notebook in the following
two ways:

1. Create a T-SQL notebook from the Data Warehouse homepage: Navigate to the
data warehouse experience, and choose Notebook.
2. Create a T-SQL notebook from an existing warehouse editor: Navigate to an existing warehouse and, from the top navigation ribbon, select New SQL query and then New T-SQL query notebook.

Once the notebook is created, T-SQL is set as the default language. You can add data
warehouse or SQL analytics endpoints from the current workspace into your notebook.

Add a Data Warehouse or SQL analytics


endpoint into a notebook
To add a Data Warehouse or SQL analytics endpoint into a notebook, from the
notebook editor, select + Data sources button and select Warehouses. From the data-
hub panel, select the data warehouse or SQL analytics endpoint you want to connect to.
Set a primary warehouse
You can add multiple warehouses or SQL analytics endpoints to the notebook, with one of them set as the primary. The primary warehouse runs the T-SQL code. To set it, go to the object explorer, select ... next to the warehouse, and choose Set as primary.
For any T-SQL command that supports three-part naming, the primary warehouse is used as the default warehouse if no warehouse is specified.

Create and run T-SQL code in a notebook


To create and run T-SQL code in a notebook, add a new cell and set T-SQL as the cell
language.

You can autogenerate T-SQL code using the code template from the object explorer's
context menu. The following templates are available for T-SQL notebooks:

Select top 100


Create table
Create as select
Drop
Drop and create

You can run one T-SQL code cell by selecting the Run button in the cell toolbar or run
all cells by selecting the Run all button in the toolbar.

7 Note

Each code cell is executed in a separate session, so the variables defined in one cell
are not available in another cell.

A single code cell can contain multiple lines of code. You can select part of the code and run only the selection; each such execution also generates a new session. After the code is executed, expand the message panel to check the execution summary.

The Table tab lists the records from the returned result set. If the execution returns multiple result sets, you can switch from one to another via the dropdown menu.

Use the charting features to graphically represent query


outcomes
By selecting Inspect, you can see charts that represent the data quality and distribution of each column.

Save the query as a view or table


You can use the Save as table menu to save the results of the query into a table using a CTAS (CREATE TABLE AS SELECT) command. To use this menu, select the query text in the code cell and then select Save as table.
Similarly, you can create a view from your selected query text by using the Save as view menu in the cell command bar.
7 Note

Because the Save as table and Save as view menu are only available for the
selected query text, you need to select the query text before using these
menus.

CREATE VIEW does not support three-part naming, so the view is always created in the primary warehouse.
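As a rough illustration (the exact statement is generated by the menu; the object names here are hypothetical), Save as table wraps the selected query text in a CTAS statement similar to the following:

SQL

--Hypothetical illustration of the CTAS generated by the Save as table action
CREATE TABLE [dbo].[CustomerSnapshot]
AS
SELECT [CustomerKey], [Customer]
FROM [dbo].[dimension_customer];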

Cross warehouse query


You can run cross-warehouse queries by using three-part naming. The three-part name consists of the database name, schema name, and table name. The database name is the name of the warehouse or SQL analytics endpoint, the schema name is the name of the schema, and the table name is the name of the table.
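The following minimal sketch is illustrative only; SalesWarehouse and MarketingWarehouse are hypothetical warehouse or SQL analytics endpoint names added to the notebook, and the table names assume the sample schema used elsewhere in this documentation.

SQL

--Join a table in one warehouse with a table in another warehouse using three-part names
SELECT TOP (10)
    s.[StockItemKey],
    c.[Customer]
FROM [SalesWarehouse].[dbo].[fact_sale] AS s
INNER JOIN [MarketingWarehouse].[dbo].[dimension_customer] AS c
    ON s.[CustomerKey] = c.[CustomerKey];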

Skip the execution of non-T-SQL code


Within the same notebook, it's possible to create code cells that use different languages.
For instance, a PySpark code cell can precede a T-SQL code cell. In such cases, you can choose to skip running any non-T-SQL code in the T-SQL notebook. A dialog offering this choice appears when you run all the code cells with the Run all button in the toolbar.
Public preview limitations
The parameter cell isn't yet supported in T-SQL notebooks. Parameters passed from a pipeline or scheduler can't be used in a T-SQL notebook.
The Recent Run feature isn't yet supported in T-SQL notebooks. Use the current data warehouse monitoring feature to check the execution history of a T-SQL notebook. See the Monitor Data Warehouse article for more details.
The monitor URL inside the pipeline execution isn't yet supported in the T-SQL
notebook.
The snapshot feature isn't yet supported in the T-SQL notebook.

Related content
For more information about Fabric notebooks, see the following articles.

What is data warehousing in Microsoft Fabric


Questions? Try asking the Fabric Community .
Suggestions? Contribute ideas to improve Fabric .



Restore in-place of a warehouse in
Microsoft Fabric
Article • 07/17/2024

Applies to: Warehouse in Microsoft Fabric

Microsoft Fabric offers the capability to restore a warehouse to a prior point-in-time,


from a restore point.

Restore in-place can be used to restore the warehouse to a known good state in
the event of accidental corruption, minimizing downtime and data loss.
Restore in-place can be helpful to reset the warehouse to a known good state for
development and testing purposes.
Restore in-place helps to quickly roll back changes to a prior state after a failed database release or migration.

Restore in-place is an essential part of data recovery that allows restoration of the
warehouse to a prior known good state. A restore overwrites the existing warehouse,
using restore points from the existing warehouse.

You can also query data in a warehouse as it appeared in the past, using the T-SQL
OPTION syntax. For more information, see Query data as it existed in the past.

7 Note

The restore points and restore in place features are currently in preview.

What are restore points?


Restore points are recovery points of the warehouse created by copying only the
metadata, while referencing the data files in OneLake. The metadata is copied while the
underlying data of the warehouse stored as parquet files aren't copied. These restore
points can be used to recover the warehouse as of prior point in time.

To view all restore points for your warehouse, in the Fabric portal go to Settings ->
Restore points.

System-created restore points


The creation of the system-created restore points is a built-in feature in Warehouse.
However, the warehouse should be in an Active state for automatic system-created
restore point creation.

System-generated restore points are created throughout the day, and are available for
thirty days. System-generated restore points are created automatically every eight
hours. A system-created restore point might not be available immediately for a new
warehouse. If one is not yet available, create a user-defined restore point.

There can be up to 180 system-generated restore points at any given point in time.

Warehouse supports an eight-hour recovery point objective (RPO).

If the warehouse is paused, system-created restore points can't be created unless and
until the warehouse is resumed. You should create a user-defined restore point before
pausing the warehouse. Before a warehouse is dropped, a system-created restore point
isn't automatically created.

System-created restore points can't be deleted, as the restore points are used to
maintain service level agreements (SLAs) for recovery.

User-defined restore points


Warehouse enables the workspace administrators to manually create restore points
before and after large modifications made to the warehouse. This ensures that the
restore points are logically consistent, providing data protection and quick recovery
time in case of any workload interruptions or user errors.

Any number of user-defined restore points aligned with your specific business or
organizational recovery strategy can be created. User-defined restore points are
available for thirty calendar days and are automatically deleted on your behalf after the
expiry of the retention period.

For more information about creating and managing restore points, see Manage restore
points in the Fabric portal.

Restore point retention


Details for restore point retention periods:

Warehouse deletes both the system-created and user-defined restore point at the
expiry of the 30 calendar day retention period.
The age of a restore point is measured by the absolute calendar days from the
time the restore point is taken, including when the Microsoft Fabric capacity is
paused.
System-created and user-generated restore points can't be created when the
Microsoft Fabric capacity is paused. The creation of a restore point fails when the
fabric capacity gets paused while the restore point creation is in progress.
If a restore point is generated and then the capacity remains paused for more than
30 days before being resumed, the restore point remains in existence until a total
of 180 system-created restore points are reached.
At any point in time, Warehouse is guaranteed to be able to store up to 180
system-generated restore points as long as these restore points haven't reached
the thirty day retention period.
All the user-defined restore points that are created for the warehouse are
guaranteed to be stored until the default retention period of 30 calendar days.

Recovery point and restore costs

Storage billing
The creation of both system-created and user-defined restore points consume storage.
The storage cost of restore points in OneLake includes the data files stored in parquet
format. There are no storage charges incurred during the process of restore.

Compute billing
Compute charges are incurred during the creation and restore of restore points, and
consume the Microsoft Fabric capacity.

Restore in-place of a warehouse


Use the Fabric portal to restore a warehouse in-place.

When you restore, the current warehouse is replaced with the restored warehouse. The
name of the warehouse remains the same, and the old warehouse is overwritten. All
components, including objects in the Explorer, modeling, Query Insights, and semantic
models are restored as they existed when the restore point was created.

Each restore point references a UTC timestamp when the restore point was created.

If you encounter Error 5064 after requesting a restore, resubmit the restore request.
Security
Any member of the Admin, Member, or Contributor workspace roles can create,
delete, or rename the user-defined restore points.

Any user that has the workspace roles of a Workspace Administrator, Member,
Contributor, or Viewer can see the list of system-created and user-defined restore
points.

A data warehouse can be restored only by user that has workspace roles of a
Workspace Administrator, from a system-created or user-defined restore point.

Limitations
A recovery point can't be restored to create a new warehouse with a different
name, either within or across the Microsoft Fabric workspaces.
Restore points can't be retained beyond the default thirty calendar day retention
period. This retention period isn't currently configurable.

Next step
Restore in-place in the Fabric portal

Related content
Manage restore points in the Fabric portal
Clone table in Microsoft Fabric
Query data as it existed in the past
Microsoft Fabric disaster recovery guide



Restore in-place in the Fabric portal
Article • 07/18/2024

Applies to: Warehouse in Microsoft Fabric

Restore in-place is an essential part of data recovery that allows restoration of the
warehouse to a prior known good state. A restore overwrites the existing warehouse,
using restore points from the existing warehouse in Microsoft Fabric.

This tutorial guides you through creating restore points and performing a restore in-place in a warehouse, as well as renaming, managing, and viewing restore points.

Prerequisites
Review the workspace roles membership required for the following steps. For more
information, see Restore in place Security.
An existing user-defined or system-created restore point.
A system-created restore point might not be available immediately for a new
warehouse. If one is not yet available, create a user-defined restore point.

Restore the warehouse using the restore point


1. To restore a user-defined or system-created restore point, go to the context menu of the restore point and select Restore.

2. Review and confirm the dialog. Select the checkbox, followed by Restore.
3. A notification appears showing restore progress, followed by success notification.
Restore in-place is a metadata operation, so it can take a while depending on the
size of the metadata that is being restored.

) Important

When a restore in-place is initiated, users inside the warehouse are not
alerted that a restore is ongoing. Once the restore operation is completed,
users should refresh the Object Explorer.

4. Refresh your reports to reflect the restored state of the data.

Create user-defined restore point


Fabric automatically creates system-created restore points at least every eight hours.
Workspace administrators, members, and contributors can also manually create restore
points, for example, before and after large modifications made to the warehouse.

1. Go to Warehouse Settings -> Restore points.

2. Select Add a restore point.


3. Provide a Name and Description.

4. A notification appears on successful creation of restore point.

Rename restore point


1. To rename a user-defined or system-created restore point, go to the context menu of the restore point and select Rename.
2. Provide a new name and select Rename.

3. A notification appears on a successful rename.

Delete user-defined restore point


You can delete user-defined restore points, but system-created restore points cannot be
deleted.

For more information, see Restore point retention.

1. To delete a user-defined restore point, either go to the context menu of the restore point, or select Delete.

2. To confirm, select Delete.

3. A notification appears on successful deletion of restore point.

View system-created and user-defined restore


points
1. Go to Warehouse Settings -> Restore points to view all restore points.

2. A unique restore point is identified by Time(UTC) value. Sort this column to


identify the latest restore points.
3. If your warehouse has been restored, select Details for more information. The
Details of last restoration popup provides details on latest restoration, including
who performed the restore, when it was performed, and which restore point was
restored.

If the restore point is over 30 days old and was deleted, the details in the banner show as N/A.

Related content
Restore in-place of a warehouse in Microsoft Fabric
Microsoft Fabric terminology



Clone table in Microsoft Fabric
Article • 07/12/2024

Applies to: Warehouse in Microsoft Fabric

Microsoft Fabric offers the capability to create near-instantaneous zero-copy clones with
minimal storage costs.

Table clones facilitate development and testing processes by creating copies of


tables in lower environments.
Table clones provide consistent reporting and zero-copy duplication of data for
analytical workloads and machine learning modeling and testing.
Table clones provide the capability of data recovery in the event of a failed release
or data corruption by retaining the previous state of data.
Table clones help to create historical reports that reflect the state of data as it
existed as of a specific point-in-time in the past.
Table clones at a specific point in time can preserve the state of data at specific
business points in time.

You can use the CREATE TABLE AS CLONE OF T-SQL command to create a table clone.
For a tutorial, see Tutorial: Clone table using T-SQL or Tutorial: Clone tables in the Fabric
portal.
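As a minimal sketch (table names are hypothetical; see the CREATE TABLE AS CLONE OF reference for the full syntax), a clone of the current state and a clone as of a past point in time look like the following:

SQL

--Clone the current state of a table
CREATE TABLE [dbo].[dimension_customer_clone] AS CLONE OF [dbo].[dimension_customer];

--Clone the table as it existed at a past point in time (UTC)
CREATE TABLE [dbo].[dimension_customer_20240301] AS CLONE OF [dbo].[dimension_customer]
    AT '2024-03-01T10:00:00.000';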

You can also query data in a warehouse as it appeared in the past, using the T-SQL
OPTION syntax. For more information, see Query data as it existed in the past.

What is zero-copy clone?


A zero-copy clone creates a replica of the table by copying the metadata, while still
referencing the same data files in OneLake. The metadata is copied while the underlying
data of the table stored as parquet files is not copied. The creation of a clone is similar
to creating a table within a Warehouse in Microsoft Fabric.

Table clone in Synapse Data Warehouse

Creation of a table clone


Within a warehouse, a clone of a table can be created near-instantaneously using simple
T-SQL. A clone of a table can be created within or across schemas in a warehouse.
A clone of a table can be created based on either:

Current point-in-time: The clone is based on the present state of the table.

Previous point-in-time: The clone is based on a point-in-time up to thirty days in


the past. The table clone contains the data as it appeared at a desired past point in
time. In the industry, this feature is known as "time travel". The new table is created
with a timestamp based on UTC. For examples, see Clone table as of past point-in-
time or CREATE TABLE AS CLONE OF.

You can also clone a group of tables at once. This can be useful for cloning a group of
related tables at the same past point in time. For an example, see Clone multiple tables
at once.

You can also query data from tables as they existed in the past, using the Time travel
feature in Warehouse.

Data retention
Warehouse automatically preserves and maintains the data history for thirty calendar
days, allowing for clones to be made at a point in time. All inserts, updates, and deletes
made to the data warehouse are retained for thirty calendar days.

There is no limit on the number of clones created both within and across schemas.

Separate and independent


Upon creation, a table clone is an independent and separate copy of the data from its
source.

Any changes made through DML or DDL on the source of the clone table are not
reflected in the clone table.
Similarly, any changes made through DDL or DML on the table clone are not
reflected on the source of the clone table.

Permissions to create a table clone


The following permissions are required to create a table clone:

Users with Admin, Member, or Contributor workspace roles can clone the tables
within the workspace. The Viewer workspace role cannot create a clone.
SELECT permission on all the rows and columns of the source of the table clone is
required.
User must have CREATE TABLE permission in the schema where the table clone will
be created.

Deletion of a table clone


Due to its autonomous existence, both the original source and the clones can be
deleted without any constraints. Once a clone is created, it remains in existence until
deleted by the user.

Users with Admin, Member, or Contributor workspace roles can delete the table
clone within the workspace.
Users who have ALTER SCHEMA permissions on the schema in which the table
clone resides can delete the table clone.

Table clone inheritance


The objects described here are included in the table clone:

The clone table inherits object-level SQL security from the source table of the
clone. As the workspace roles provide read access by default, DENY permission can
be set on the table clone if desired.

The clone table inherits the row-level security (RLS) and dynamic data masking
from the source of the clone table.

The clone table inherits all attributes that exist at the source table, whether the
clone was created within the same schema or across different schemas in a
warehouse.

The clone table inherits the primary and unique key constraints defined in the
source table.

A read-only delta log is created for every table clone that is created within the
Warehouse. The data files stored as delta parquet files are read-only. This ensures
that the data stays always protected from corruption.

Table clone scenarios


Consider the ability to clone tables near instantaneously and with minimal storage costs
in the following beneficial scenarios:

Development and testing


Table clones allow developers and testers to experiment, validate, and refine the tables
without affecting the tables in production environment. The clone provides a safe and
isolated space to conduct development and testing activities of new features, ensuring
the integrity and stability of the production environment. Use a table clone to quickly
spin up a copy of production-like environment for troubleshooting, experimentation,
development and testing purposes.

Consistent reporting, data exploration, and machine


learning modeling
To keep up with the ever-changing data landscape, frequent execution of ETL jobs is
essential. Table clones support this goal by ensuring data integrity while providing the
flexibility to generate reports based on the cloned tables, while background processing
is ongoing. Additionally, table clones enable the reproducibility of earlier results for
machine learning models. They also facilitate valuable insights by enabling historical
data exploration and analysis.

Low-cost, near-instantaneous recovery


In the event of accidental data loss or corruption, existing table clones can be used to
recover the table to its previous state.

Data archiving
For auditing or compliance purposes, zero copy clones can be easily used to create
copies of data as it existed at a particular point in time in the past. Some data might
need to be archived for long-term retention or legal compliance. Cloning the table at
various historical points ensures that data is preserved in its original form.

Limitations
Table clones across warehouses in a workspace are not currently supported.
Table clones across workspaces are not currently supported.
Clone table is not supported on the SQL analytics endpoint of the Lakehouse.
Clone of a warehouse or schema is currently not supported.
Table clones as of a point in time earlier than the thirty-day retention period cannot be created.
Changes to the table schema prevent a clone from being created as of a point in time before the schema change.
Next step
Tutorial: Clone tables in the Fabric portal

Related content
Tutorial: Clone a table using T-SQL in Microsoft Fabric
CREATE TABLE AS CLONE OF
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
Query data as it existed in the past



Tutorial: Clone tables in the Fabric portal
Article • 09/20/2024

Applies to: ✅ Warehouse in Microsoft Fabric

A zero-copy clone creates a replica of the table by copying the metadata, while still
referencing the same data files in OneLake. This tutorial guides you through creating a
table clone in Warehouse in Microsoft Fabric, using the warehouse editor with a no-
code experience.

Clone table as of current state


When you select the table and then select more options (...), you get the Clone table menu.
This menu is also available via Table tools in the ribbon.

On the Clone table pane, the source table schema and name are already populated. With the table state set to Current, a clone of the source table is created as of its current state. You can choose the destination schema and edit the pre-populated destination table name. You can also see the generated T-SQL statement when you expand the SQL statement section. When you select the Clone button, a clone of the table is generated and you can see it in the Explorer.
Clone table as of past point-in-time
Similar to the current state, you can also choose a past state of the table within the last 30 days by selecting the date and time in UTC. This generates a clone of the table from a specific point in time, selectable in the Date and time of past state fields.
Clone multiple tables at once
You can also clone a group of tables at once. This can be useful for cloning a group of related tables at the same past point in time. By selecting the source tables, the current or past table state, and the destination schema, you can clone multiple tables easily and quickly.
With the Clone tables context menu on the Tables folder in the Explorer, you can select multiple tables for cloning.

The default naming pattern for cloned objects is source_table_name-Clone . The T-SQL
commands for the multiple CREATE TABLE AS CLONE OF statements are provided if
customization of the name is required.
Related content
Clone table in Microsoft Fabric
Tutorial: Clone table using T-SQL
CREATE TABLE AS CLONE OF



What is Mirroring in Fabric?
Article • 11/20/2024

Mirroring in Fabric is a low-cost and low-latency solution to bring data from various
systems together into a single analytics platform. You can continuously replicate your
existing data estate directly into Fabric's OneLake from a variety of Azure databases and
external data sources.

With the most up-to-date data in a queryable format in OneLake, you can now use all
the different services in Fabric, such as running analytics with Spark, executing
notebooks, data engineering, visualizing through Power BI Reports, and more.

Mirroring in Fabric allows users to enjoy a highly integrated, end-to-end, and easy-to-
use product that is designed to simplify your analytics needs. Built for openness and
collaboration between Microsoft and technology solutions that can read the open-source Delta Lake table format, Mirroring is a low-cost and low-latency turnkey solution
that allows you to create a replica of your data in OneLake which can be used for all
your analytical needs.

The Delta tables can then be used everywhere in Fabric, allowing users to accelerate their
journey into Fabric.

Why use Mirroring in Fabric?


Today many organizations have mission critical operational or analytical data sitting in
silos.

Accessing and working with this data today requires complex ETL (Extract Transform
Load) pipelines, business processes, and decision silos, creating:

Restricted and limited access to important, ever changing, data


Friction between people, process, and technology
Long wait times to create data pipelines and processes to critically important data
No freedom to use the tools you need to analyze and share insights comfortably
Lack of a proper foundation for folks to share and collaborate on data
No common, open data formats for all analytical scenarios - BI, AI, Integration,
Engineering, and even Apps

Mirroring in Fabric provides an easy experience to speed the time-to-value for insights
and decisions, and to break down data silos between technology solutions:
Near real time replication of data and metadata into a SaaS data lake, with built-in analytics for BI and AI

The Microsoft Fabric platform is built on a foundation of Software as a Service (SaaS),


which takes simplicity and integration to a whole new level. To learn more about
Microsoft Fabric, see What is Microsoft Fabric?

Mirroring creates three items in your Fabric workspace:

A mirrored database, which manages the replication of data and metadata into OneLake and conversion to Parquet, in an analytics-ready format. This enables downstream scenarios like data engineering, data science, and more.
A SQL analytics endpoint
A default semantic model

In addition to the SQL query editor, there's a broad ecosystem of tooling including SQL
Server Management Studio (SSMS), the mssql extension with Visual Studio Code, and
even GitHub Copilot.

Sharing enables ease of access control and management, to make sure you can control
access to sensitive information. Sharing also enables secure and democratized decision-
making across your organization.

Types of mirroring
Fabric offers three different approaches in bringing data into OneLake through
mirroring.

Database mirroring – Database mirroring in Microsoft Fabric allows replication of


entire databases and tables, allowing you to bring data from various systems
together into a single analytics platform.
Metadata mirroring – Metadata mirroring in Fabric synchronizes metadata (such
as catalog names, schemas, and tables) instead of physically moving the data. This
approach leverages shortcuts, ensuring the data remains in its source while still
being easily accessible within Fabric.
Open mirroring – Open mirroring in Fabric is designed to extend mirroring based
on open Delta Lake table format. This capability enables any developer to write
their application's change data directly into a mirrored database item in Microsoft
Fabric, based on the open mirroring approach and public APIs.

Currently, the following external databases are available:


Platform | Near real-time replication | Type of mirroring | End-to-end tutorial
Microsoft Fabric mirrored databases from Azure Cosmos DB (preview) | Yes | Database mirroring | Tutorial: Azure Cosmos DB
Microsoft Fabric mirrored databases from Azure Databricks (preview) | Yes | Metadata mirroring | Tutorial: Azure Databricks
Microsoft Fabric mirrored databases from Azure SQL Database | Yes | Database mirroring | Tutorial: Azure SQL Database
Microsoft Fabric mirrored databases from Azure SQL Managed Instance (preview) | Yes | Database mirroring | Tutorial: Azure SQL Managed Instance
Microsoft Fabric mirrored databases from Snowflake | Yes | Database mirroring | Tutorial: Snowflake
Open mirrored databases (preview) | Yes | Open mirroring | Tutorial: Open mirroring
Microsoft Fabric mirrored databases from Fabric SQL database (preview) | Yes | Database mirroring | Automatically configured

How does the near real time replication of


database mirroring work?
Mirroring is enabled by creating a secure connection to your operational data source.
You choose whether to replicate an entire database or individual tables and Mirroring
will automatically keep your data in sync. Once set up, data will continuously replicate
into the OneLake for analytics consumption.

The following are core tenets of Mirroring:

Enabling Mirroring in Fabric is simple and intuitive, without having the need to
create complex ETL pipelines, allocate other compute resources, and manage data
movement.

Mirroring in Fabric is a fully managed service, so you don't have to worry about
hosting, maintaining, or managing replication of the mirrored connection.

How does metadata mirroring work?


Mirroring not only enables data replication but can also be achieved through shortcuts
or metadata mirroring rather than full data replication, allowing data to be available
without physically moving or duplicating it. Mirroring in this context refers to replicating
only metadata—such as catalog names, schemas, and tables—rather than the actual
data itself. This approach enables Fabric to make data from different sources accessible
without duplicating it, simplifying data management and minimizing storage needs.

For example, when accessing data registered in Unity Catalog, Fabric mirrors only the
catalog structure from Azure Databricks, allowing the underlying data to be accessed
through shortcuts. This method ensures that any changes in the source data are
instantly reflected in Fabric without requiring data movement, maintaining real-time
synchronization and enhancing efficiency in accessing up-to-date information.

How does open mirroring work?


In addition to mirroring that enables data replication by creating a secure connection to your data source, you can also select an existing data provider or write your own application to land data into a mirrored database. Once you create an open mirrored database via the public API or via the Fabric portal, you can obtain a landing zone URL in OneLake, where you can land change data per the open mirroring specification.

Once data is in the landing zone in the proper format, replication starts running and manages the complexity of merging the changes, so that updates, inserts, and deletes are reflected in the delta tables. This method ensures that any data written into the landing zone is reflected immediately, keeping the data in Fabric up to date.

Sharing
Sharing enables ease of access control and management, while security controls like
Row-level security (RLS) and Object level security (OLS), and more make sure you can
control access to sensitive information. Sharing also enables secure and democratized
decision-making across your organization.

By sharing, users grant other users or a group of users access to a mirrored database
without giving access to the workspace and the rest of its items. When someone shares
a mirrored database, they also grant access to the SQL analytics endpoint and
associated default semantic model.

For more information, see Share your mirrored database and manage permissions.

Cross-database queries
With the data from your mirrored database stored in the OneLake, you can write cross-
database queries, joining data from mirrored databases, warehouses, and the SQL
analytics endpoints of Lakehouses in a single T-SQL query. For more information, see
Write a cross-database query.

For example, you can reference the table from mirrored databases and warehouses
using three-part naming. In the following example, use the three-part name to refer to
ContosoSalesTable in the warehouse ContosoWarehouse . From other databases or

warehouses, the first part of the standard SQL three-part naming convention is the
name of the mirrored database.

SQL

SELECT *
FROM ContosoWarehouse.dbo.ContosoSalesTable AS Contoso
INNER JOIN Affiliation
ON Affiliation.AffiliationId = Contoso.RecordTypeID;

Data Engineering with your mirrored database


data
Microsoft Fabric provides various data engineering capabilities to ensure that your data
is easily accessible, well-organized, and high-quality. From Fabric Data Engineering, you
can:

Create and manage your data as Spark using a lakehouse


Design pipelines to copy data into your lakehouse
Use Spark job definitions to submit batch/streaming job to Spark cluster
Use notebooks to write code for data ingestion, preparation, and transformation

Data Science with your mirrored database data


Microsoft Fabric offers Fabric Data Science to empower users to complete end-to-end
data science workflows for the purpose of data enrichment and business insights. You
can complete a wide range of activities across the entire data science process, all the
way from data exploration, preparation and cleansing to experimentation, modeling,
model scoring and serving of predictive insights to BI reports.

Microsoft Fabric users can access Data Science workloads. From there, they can discover
and access various relevant resources. For example, they can create machine learning
Experiments, Models and Notebooks. They can also import existing Notebooks on the
Data Science Home page.

SQL database in Fabric


You can also directly create and manage a SQL database in Microsoft Fabric (Preview)
inside the Fabric portal. Based on Azure SQL Database, SQL database in Fabric is
automatically mirrored for analytics purposes and allows you to easily create your
operational database in Fabric. SQL database is the home in Fabric for OLTP workloads,
and can integrate with Fabric's source control integration.

Related content
What is Microsoft Fabric?
Model data in the default Power BI semantic model in Microsoft Fabric
What is the SQL analytics endpoint for a lakehouse?
Direct Lake overview



Monitor Fabric mirrored database
replication
Article • 11/19/2024

Once mirroring is configured, visit the Monitor replication page to monitor the current
state of replication.

The Monitor replication pane shows you the current state of the source database
replication, with the corresponding statuses of the tables, total rows replicated, and last
refresh date/time as well.

Status
The following are the possible statuses for the replication:

Database level:
Running: Replication is currently running, bringing snapshot and change data into OneLake.
Running with warning: Replication is running, with transient errors.
Stopping/Stopped: Replication has stopped.
Error: Fatal error in replication that can't be recovered.

Table level:
Running: Data is replicating.
Running with warning: Warning of a nonfatal error with replication of the data from the table.
Stopping/Stopped: Replication has stopped.
Error: Fatal error in replication for that table.

Related content
Troubleshoot Fabric mirrored databases
What is Mirroring in Fabric?



Share your mirrored database and
manage permissions
Article • 11/21/2024

When you share a mirrored database, you grant other users or groups access to the
mirrored database without giving access to the workspace and the rest of its items.
Sharing a mirrored database also grants access to the SQL analytics endpoint and the
associated default semantic model.

7 Note

You must be an admin or member in your workspace to share an item in Microsoft


Fabric.

Share a mirrored database


To share a mirrored database, navigate to your workspace, and select Share next to the
mirrored database name.

You're prompted with options to select who you would like to share the mirrored
database with, what permissions to grant them, and whether they'll be notified by email.

By default, sharing a mirrored database grants users Read permission to the mirrored
database, the associated SQL analytics endpoint, and the default semantic model. In
addition to these default permissions, you can grant:

"Read all SQL analytics endpoint data": Grants the recipient the ReadData
permission for the SQL analytics endpoint, allowing the recipient to read all data
via the SQL analytics endpoint using Transact-SQL queries.

"Read all OneLake data": Grants the ReadAll permission to the recipient, allowing
them to access the mirrored data in OneLake, for example, by using Spark or
OneLake Explorer.

"Build reports on the default semantic model": Grants the recipient the Build
permission for the default semantic model, enabling users to create Power BI
reports on top of the semantic model.

"Read and write": Grants the recipient the Write permission for the mirrored
database, allowing them to edit the mirrored database configuration and
read/write data from/to the landing zone.

Manage permissions
To review the permissions granted to a mirrored database, its SQL analytics endpoint, or
its default semantic model, navigate to one of these items in the workspace and select
the Manage permissions quick action.

If you have the Share permission for a mirrored database, you can also use the Manage
permissions page to grant or revoke permissions. To view existing recipients, select the
context menu (...) at the end of each row to add or remove specific permission.

7 Note

When mirroring data from Azure SQL Database or Azure SQL Managed Instance, the source's
System Assigned Managed Identity needs to have "Read and write" permission to the
mirrored database. If you create the mirrored database from the Fabric portal, the
permission is granted automatically. If you use the API to create the mirrored database,
make sure you grant the permission by following the preceding instructions. You can
search for the recipient by specifying the name of your Azure SQL Database logical
server or Azure SQL Managed Instance.

Related content
What is Mirroring in Fabric?
What is the SQL analytics endpoint for a lakehouse?



Explore data in your mirrored database
using Microsoft Fabric
Article • 11/19/2024

Learn more about all the methods to query the data in your mirrored database within
Microsoft Fabric.

Use the SQL analytics endpoint


Microsoft Fabric provides a read-only T-SQL serving layer for replicated delta tables. This
SQL-based experience is called the SQL analytics endpoint. You can analyze data in delta
tables using a no code visual query editor or T-SQL to create views, functions, stored
procedures, and apply SQL security.
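
For illustration, here's a minimal sketch of a T-SQL view you might create in the SQL analytics endpoint; the table and column names (dbo.SalesOrders and its columns) are hypothetical placeholders, so substitute the names of your own mirrored tables.

SQL

-- Minimal sketch: a reporting view over a mirrored table (table and column names are hypothetical).
CREATE VIEW dbo.vw_OpenOrders
AS
SELECT OrderID, CustomerID, OrderDate, OrderTotal
FROM dbo.SalesOrders
WHERE OrderStatus = 'Open';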

To access the SQL analytics endpoint, select the corresponding item in the workspace
view or switch to the SQL analytics endpoint mode in the mirrored database explorer.
For more information, see What is the SQL analytics endpoint for a lakehouse?

Use Data view to preview data


The Data preview is one of three switcher modes, along with the Query editor and Model
view, within the SQL analytics endpoint. It provides an easy interface to view the data
within your tables or views and preview sample data (the top 1,000 rows).

For more information, see View data in the Data preview in Microsoft Fabric.

Use Visual Queries to analyze data


The Visual Query Editor is a feature in Microsoft Fabric that provides a no-code
experience to create T-SQL queries against data in your mirrored database item. You can
drag and drop tables onto the canvas, design queries visually, and use Power Query
diagram view.

For more information, see Query using the visual query editor.

Use SQL Queries to analyze data


The SQL Query Editor is a feature in Microsoft Fabric that provides a query editor to
create T-SQL queries against data in your mirrored database item. The SQL query editor
provides support for IntelliSense, code completion, syntax highlighting, client-side
parsing, and validation.

For more information, see Query using the SQL query editor.
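
For example, a simple aggregation you might run in the SQL query editor could look like the following sketch; the table and column names are hypothetical placeholders for your own mirrored tables.

SQL

-- Minimal sketch: aggregate rows from a mirrored table (table and column names are hypothetical).
SELECT CustomerID,
       COUNT(*) AS OrderCount,
       SUM(OrderTotal) AS TotalSales
FROM dbo.SalesOrders
GROUP BY CustomerID
ORDER BY TotalSales DESC;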

Use notebooks to explore your data with a Lakehouse shortcut
Notebooks are a powerful code item for you to develop Apache Spark jobs and machine
learning experiments on your data. You can use notebooks in the Fabric Lakehouse to
explore your mirrored tables. You can access your mirrored database from the
Lakehouse with Spark queries in notebooks. You first need to create a shortcut from
your mirrored tables into the Lakehouse, and then build notebooks with Spark queries in
your Lakehouse.
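
For example, once the shortcut exists, a Spark SQL cell in the notebook might look like the following sketch; the Lakehouse and table names are hypothetical placeholders for your own shortcut.

SQL

-- Minimal sketch: Spark SQL in a notebook cell against a shortcut to a mirrored table
-- (Lakehouse and table names are hypothetical).
SELECT *
FROM MyLakehouse.SalesOrders
LIMIT 1000;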

For a step-by-step guide, see Explore data in your mirrored database with notebooks.

For more information, see Create shortcuts in lakehouse and see Explore the data in
your lakehouse with a notebook.

Access delta files directly


You can access mirrored database table data in Delta format files. Connect to the
OneLake directly through the OneLake file explorer or Azure Storage Explorer.

For a step-by-step guide, see Explore data in your mirrored database directly in
OneLake.

Model your data and add business semantics


In Microsoft Fabric, a Power BI dataset is a semantic model with metrics: a logical
description of an analytical domain, with business-friendly terminology and
representation, to enable deeper analysis. This semantic model is typically a star schema
with facts that represent a domain. Dimensions allow you to analyze the domain to drill
down, filter, and calculate different analyses. With the semantic model, the dataset is
created automatically for you, with inherited business logic from the parent mirrored
database. Your downstream analytics experience for business intelligence and analysis
starts with an item in Microsoft Fabric that is managed, optimized, and kept in sync with
no user intervention.

The default Power BI dataset inherits all relationships between entities defined in the
model view and infers them as Power BI dataset relationships, when objects are enabled
for BI (Power BI reports). Inheriting the mirrored database's business logic allows a
warehouse developer or BI analyst to decrease the time to value toward building a
useful semantic model and metrics layer for analytical business intelligence (BI) reports
in Power BI, Excel, or external tools like Tableau that read the XMLA format. For more
information, see Data modeling in the default Power BI dataset.

A well-defined data model is instrumental in driving your analytics and reporting
workloads. In a SQL analytics endpoint in Microsoft Fabric, you can easily build and
change your data model with a few simple steps in the visual editor. Modeling the
mirrored database item is possible by setting primary and foreign key constraints and
setting identity columns on the model view within the SQL analytics endpoint page in
the Fabric portal. After you navigate to the model view, you can do this in a visual entity
relationship diagram. The diagram allows you to drag and drop tables to infer how the
objects relate to one another. Lines visually connecting the entities infer the type of
physical relationships that exist.
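
If you prefer to script the model instead, the sketch below shows non-enforced primary and foreign key definitions in warehouse-style T-SQL. This is only an illustration under the assumption that your SQL analytics endpoint accepts the ALTER TABLE ... ADD CONSTRAINT ... NOT ENFORCED syntax; the table, constraint, and column names are hypothetical placeholders.

SQL

-- Sketch only: non-enforced (informational) constraints; all object names are hypothetical.
-- Assumes the endpoint accepts warehouse-style NOT ENFORCED constraint syntax.
ALTER TABLE dbo.FactSales
ADD CONSTRAINT PK_FactSales PRIMARY KEY NONCLUSTERED (SalesKey) NOT ENFORCED;

ALTER TABLE dbo.FactSales
ADD CONSTRAINT FK_FactSales_DimCustomer FOREIGN KEY (CustomerKey)
    REFERENCES dbo.DimCustomer (CustomerKey) NOT ENFORCED;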

Create a report
Create a report directly from the semantic model (default) in three different ways:

SQL analytics endpoint editor in the ribbon


Data pane in the navigation bar
Semantic model (default) in the workspace

For more information, see Create reports in the Power BI service in Microsoft Fabric and
Power BI Desktop.

Related content
What is Mirroring in Fabric?
Model data in the default Power BI semantic model in Microsoft Fabric
What is the SQL analytics endpoint for a lakehouse?
Direct Lake overview

Explore data in your mirrored database
directly in OneLake
Article • 11/19/2024

You can access mirrored database table data in Delta format files. This tutorial provides
steps to connect to Azure Cosmos DB data directly with Azure Storage Explorer.

Prerequisites
Complete the tutorial to create a mirrored database from your source database.
Tutorial: Create a mirrored database from Azure Cosmos DB
Tutorial: Create a mirrored database from Azure Databricks
Tutorial: Create a mirrored database from Azure SQL Database
Tutorial: Create a mirrored database from Azure SQL Managed Instance
Tutorial: Create a mirrored database from Snowflake
Tutorial: Create an open mirrored database

Access OneLake files


1. Open the Mirrored Database item and navigate to the SQL analytics endpoint.
2. Select the ... dots next to any of the tables.
3. Select Properties. Select the copy button next to the URL.

4. Open the Azure Storage Explorer desktop application. If you don't have it,
download and install Azure Storage Explorer .
5. Connect to Azure Storage.
6. On the Select Resource page, select Azure Data Lake Storage (ADLS) Gen2 as the
resource.
7. Select Next.
8. On the Select Connection Method page, select Sign in using OAuth. If you aren't signed
in to the subscription, sign in with OAuth first, and then access the ADLS Gen2
resource.
9. Select Next.
10. On the Enter Connection Info page, provide a Display name.
11. Paste the SQL analytics endpoint URL into the box for Blob container or directory
URL.
12. Select Next.
13. You can access delta files directly from Azure Storage Explorer.

 Tip

More examples:

Integrate OneLake with Azure Databricks


Use OneLake file explorer to access Fabric data

Related content
Explore data in your mirrored database using Microsoft Fabric
Explore data in your mirrored database with notebooks
Connecting to Microsoft OneLake



Explore data in your mirrored database
with notebooks
Article • 11/20/2024

You can explore the data replicated from your mirrored database with Spark queries in
notebooks.

Notebooks are a powerful code item for you to develop Apache Spark jobs and machine
learning experiments on your data. You can use notebooks in the Fabric Lakehouse to
explore your mirrored tables.

Prerequisites
Complete the tutorial to create a mirrored database from your source database.
Tutorial: Configure Microsoft Fabric mirrored database for Azure Cosmos DB
(Preview)
Tutorial: Configure Microsoft Fabric mirrored databases from Azure Databricks
(Preview)
Tutorial: Configure Microsoft Fabric mirrored databases from Azure SQL
Database
Tutorial: Configure Microsoft Fabric mirrored databases from Azure SQL
Managed Instance (Preview)
Tutorial: Configure Microsoft Fabric mirrored databases from Snowflake

Create a shortcut
You first need to create a shortcut from your mirrored tables into the Lakehouse, and
then build notebooks with Spark queries in your Lakehouse.

1. In the Fabric portal, open Data Engineering.

2. If you don't have a Lakehouse created already, select Lakehouse and create a new
Lakehouse by giving it a name.

3. Select Get Data -> New shortcut.

4. Select Microsoft OneLake.

5. You can see all your mirrored databases in the Fabric workspace.

6. Select the mirrored database you want to add to your Lakehouse, as a shortcut.
7. Select desired tables from the mirrored database.

8. Select Next, then Create.

9. In the Explorer, you can now see selected table data in your Lakehouse.

 Tip

You can add other data in Lakehouse directly or bring shortcuts like S3, ADLS
Gen2. You can navigate to the SQL analytics endpoint of the Lakehouse and
join the data across all these sources with mirrored data seamlessly.

10. To explore this data in Spark, select the ... dots next to any table. Select New
notebook or Existing notebook to begin analysis.

11. The notebook opens automatically and loads the dataframe with a SELECT
... LIMIT 1000 Spark SQL query.
New notebooks can take up to two minutes to load completely. You can
avoid this delay by using an existing notebook with an active session.

Related content
Explore data in your mirrored database using Microsoft Fabric
Create shortcuts in lakehouse
Explore the data in your lakehouse with a notebook



Mirroring Azure SQL Database
Article • 11/19/2024

Mirroring in Fabric provides an easy experience to avoid complex ETL (Extract Transform
Load) and integrate your existing Azure SQL Database estate with the rest of your data
in Microsoft Fabric. You can continuously replicate your existing Azure SQL Databases
directly into Fabric's OneLake. Inside Fabric, you can unlock powerful business
intelligence, artificial intelligence, Data Engineering, Data Science, and data sharing
scenarios.

For a tutorial on configuring your Azure SQL Database for Mirroring in Fabric, see
Tutorial: Configure Microsoft Fabric mirrored databases from Azure SQL Database.

To learn more and watch demos of Mirroring Azure SQL Database in Fabric, watch the
following Data Exposed episode.
https://learn-video.azurefd.net/vod/player?show=data-exposed&ep=key-mirroring-to-azure-sql-database-in-fabric-benefits-data-exposed&locale=en-us&embedUrl=%2Ffabric%2Fdatabase%2Fmirrored-database%2Fazure-sql-database

Why use Mirroring in Fabric?


With Mirroring in Fabric, you don't need to piece together different services from
multiple vendors. Instead, you can enjoy a highly integrated, end-to-end, and easy-to-
use product that is designed to simplify your analytics needs, and built for openness and
collaboration between Microsoft, Azure SQL Database, and the 1000s of technology
solutions that can read the open-source Delta Lake table format.

What analytics experiences are built in?


Mirrored databases are an item in Fabric Data Warehousing distinct from the
Warehouse and SQL analytics endpoint.
Mirroring creates three items in your Fabric workspace:

The mirrored database item. Mirroring manages the replication of data into
OneLake and conversion to Parquet, in an analytics-ready format. This enables
downstream scenarios like data engineering, data science, and more.
A SQL analytics endpoint
A default semantic model

Each mirrored Azure SQL Database has an autogenerated SQL analytics endpoint that
provides a rich analytical experience on top of the Delta Tables created by the mirroring
process. Users have access to familiar T-SQL commands that can define and query data
objects but not manipulate the data from the SQL analytics endpoint, as it's a read-only
copy. You can perform the following actions in the SQL analytics endpoint:

Explore the tables that reference data in your Delta Lake tables from Azure SQL
Database.
Create no code queries and views and explore data visually without writing a line
of code.
Develop SQL views, inline TVFs (Table-valued Functions), and stored procedures to
encapsulate your semantics and business logic in T-SQL (see the example sketch after this list).
Manage permissions on the objects.
Query data in other Warehouses and Lakehouses in the same workspace.
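
For example, a minimal sketch of an inline table-valued function you might create in the SQL analytics endpoint is shown below; the table and column names are hypothetical placeholders for your own mirrored tables.

SQL

-- Minimal sketch: an inline TVF over a mirrored table (table and column names are hypothetical).
CREATE FUNCTION dbo.fn_OrdersSince (@StartDate date)
RETURNS TABLE
AS
RETURN
(
    SELECT OrderID, CustomerID, OrderDate, OrderTotal
    FROM dbo.SalesOrders
    WHERE OrderDate >= @StartDate
);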

In addition to the SQL query editor, there's a broad ecosystem of tooling that can query
the SQL analytics endpoint, including SQL Server Management Studio (SSMS), the mssql
extension with Visual Studio Code, and even GitHub Copilot.

Network requirements
Currently, Mirroring doesn't support Azure SQL Database logical servers behind an
Azure Virtual Network or private networking. If you have your Azure SQL Database
logical server behind a private network, you can't enable Azure SQL Database mirroring.

Currently, you must update your Azure SQL logical server firewall rules to Allow
public network access.
You must enable the Allow Azure services option to connect to your Azure SQL
Database logical server.

Active transactions, workloads, and replicator engine behaviors
Active transactions continue to hold the transaction log truncation until the
transaction commits and the mirrored Azure SQL Database catches up, or the
transaction aborts. Long-running transactions might result in the transaction log
filling up more than usual. The source database transaction log should be
monitored so that the transaction log does not fill; a sample monitoring query
follows this list. For more information, see Transaction log grows due to
long-running transactions and CDC.
Each user workload varies. During initial snapshot, there might be more resource
usage on the source database, for both CPU and IOPS (input/output operations
per second, to read the pages). Table updates/delete operations can lead to
increased log generation. Learn more on how to monitor resources for your Azure
SQL Database.
The replicator engine monitors each table for changes independently. If there are
no updates in a source table, the replicator engine starts to back off with an
exponentially increasing duration for that table, up to an hour. The same can occur
if there is a transient error, preventing data refresh. The replicator engine will
automatically resume regular polling after updated data is detected.
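
To monitor transaction log consumption on the source database, you can run a query such as the following sketch against the source Azure SQL Database; sys.dm_db_log_space_usage is a standard Azure SQL Database DMV, and any alerting threshold you apply to it is your own choice.

SQL

-- Sketch: check how full the source database transaction log is.
SELECT total_log_size_in_bytes / 1048576.0 AS total_log_size_mb,
       used_log_space_in_bytes / 1048576.0 AS used_log_space_mb,
       used_log_space_in_percent
FROM sys.dm_db_log_space_usage;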

Tier and purchasing model support


The source Azure SQL Database can be either a single database or a database in an
elastic pool.

All service tiers in the vCore purchasing model are supported.


For the DTU (Database Transaction Unit) purchasing model, databases created in
the Free, Basic, or Standard service tiers with fewer than 100 DTUs are not
supported.
Next step
Tutorial: Configure Microsoft Fabric mirrored databases from Azure SQL Database

Related content
How to: Secure data Microsoft Fabric mirrored databases from Azure SQL
Database
Limitations in Microsoft Fabric mirrored databases from Azure SQL Database
Monitor Fabric mirrored database replication
Troubleshoot Fabric mirrored databases from Azure SQL Database



Tutorial: Configure Microsoft Fabric
mirrored databases from Azure SQL
Database
Article • 11/19/2024

Mirroring in Fabric is an enterprise, cloud-based, zero-ETL, SaaS technology. In this
section, you learn how to create a mirrored Azure SQL Database, which creates a
read-only, continuously replicated copy of your Azure SQL Database data in OneLake.

Prerequisites
Create or use an existing Azure SQL Database.
The source Azure SQL Database can be either a single database or a database in
an elastic pool.
If you don't have an Azure SQL Database, create a new single database. Use the
Azure SQL Database free offer if you haven't already.
Review the tier and purchasing model requirements for Azure SQL Database.
During the current preview, we recommend using a copy of one of your existing
databases or any existing test or development database that you can recover
quickly from a backup. If you want to use a database from an existing backup,
see Restore a database from a backup in Azure SQL Database.
You need an existing capacity for Fabric. If you don't, start a Fabric trial.

The Fabric capacity needs to be active and running. A paused or deleted capacity
will affect Mirroring and no data will be replicated.
Enable the Fabric tenant setting Service principals can use Fabric APIs. To learn how
to enable tenant settings, see Fabric Tenant settings.
Networking requirements for Fabric to access your Azure SQL Database:
Currently, Mirroring doesn't support Azure SQL Database logical servers behind
an Azure Virtual Network or private networking. If you have your Azure SQL
logical server behind a private network, you can't enable Azure SQL Database
mirroring.
You need to update your Azure SQL logical server firewall rules to Allow public
network access, and enable the Allow Azure services option to connect to your
Azure SQL Database logical server.
Enable System Assigned Managed Identity (SAMI) of your
Azure SQL logical server
The System Assigned Managed Identity (SAMI) of your Azure SQL logical server must be
enabled, and must be the primary identity, to publish data to Fabric OneLake.

1. To configure or verify that the SAMI is enabled, go to your logical SQL Server in the
Azure portal. Under Security in the resource menu, select Identity.
2. Under System assigned managed identity, set Status to On.
3. The SAMI must be the primary identity. Verify the SAMI is the primary identity with
the following T-SQL query: SELECT * FROM sys.dm_server_managed_identities;

Database principal for Fabric


Next, you need to create a way for the Fabric service to connect to your Azure SQL
Database.

You can accomplish this with a login and mapped database user.

Use a login and mapped database user


1. Connect to your Azure SQL logical server using SQL Server Management Studio
(SSMS) or the mssql extension with Visual Studio Code. Connect to the master
database.

2. Create a server login and assign the appropriate permissions.

Create a SQL Authenticated login named fabric_login . You can choose any
name for this login. Provide your own strong password. Run the following T-
SQL script in the master database:

SQL

CREATE LOGIN fabric_login WITH PASSWORD = '<strong password>';


ALTER SERVER ROLE [##MS_ServerStateReader##] ADD MEMBER fabric_login;

Or, create a Microsoft Entra ID authenticated login from an existing account.


Run the following T-SQL script in the master database:

SQL

CREATE LOGIN [[email protected]] FROM EXTERNAL PROVIDER;


ALTER SERVER ROLE [##MS_ServerStateReader##] ADD MEMBER
[[email protected]];

3. Connect to the Azure SQL Database you plan to mirror to Microsoft Fabric, using
the Azure portal query editor, SQL Server Management Studio (SSMS), or the
mssql extension with Visual Studio Code.

4. Create a database user connected to the login:

SQL

CREATE USER fabric_user FOR LOGIN fabric_login;


GRANT CONTROL TO fabric_user;

Or,

SQL

CREATE USER [[email protected]] FOR LOGIN [[email protected]];


GRANT CONTROL TO [[email protected]];

Create a mirrored Azure SQL Database


1. Open the Fabric portal .
2. Use an existing workspace, or create a new workspace.
3. Navigate to the Create pane. Select the Create icon.
4. Scroll to the Data Warehouse section and then select Mirrored Azure SQL
Database. Enter the name of your Azure SQL Database to be mirrored, then select
Create.

Connect to your Azure SQL Database


To enable Mirroring, you need to connect to the Azure SQL logical server from
Fabric to initiate the connection between SQL Database and Fabric. The following steps
guide you through the process of creating the connection to your Azure SQL Database:

1. Under New sources, select Azure SQL Database. Or, select an existing Azure SQL
Database connection from the OneLake hub.
2. If you selected New connection, enter the connection details to the Azure SQL
Database.

Server: You can find the Server name by navigating to the Azure SQL
Database Overview page in the Azure portal. For example, server-
name.database.windows.net .

Database: Enter the name of your Azure SQL Database.


Connection: Create new connection.
Connection name: An automatic name is provided. You can change it.
Authentication kind:
Basic (SQL Authentication)
Organization account (Microsoft Entra ID)
Tenant ID (Azure Service Principal)
3. Select Connect.

Start mirroring process


1. The Configure mirroring screen allows you to mirror all data in the database, by
default.

Mirror all data means that any new tables created after Mirroring is started
will be mirrored.

Optionally, choose only certain objects to mirror. Disable the Mirror all data
option, then select individual tables from your database.

For this tutorial, we select the Mirror all data option.

2. Select Mirror database. Mirroring begins.

3. Wait for 2-5 minutes. Then, select Monitor replication to see the status.

4. After a few minutes, the status should change to Running, which means the tables
are being synchronized.

If you don't see the tables and the corresponding replication status, wait a few
seconds and then refresh the panel.

5. When the initial copying of the tables is finished, a date appears in the
Last refresh column.

6. Now that your data is up and running, there are various analytics scenarios
available across all of Fabric.

) Important

Any granular security established in the source database must be re-configured in the mirrored database in Microsoft Fabric.

Monitor Fabric Mirroring
Once mirroring is configured, you're directed to the Mirroring Status page. Here, you
can monitor the current state of replication.

For more information and details on the replication states, see Monitor Fabric mirrored
database replication.

) Important

If there are no updates in the source tables, the replicator engine will start to back
off with an exponentially increasing duration, up to an hour. The replicator engine
will automatically resume regular polling after updated data is detected.

Related content
Mirroring Azure SQL Database
What is Mirroring in Fabric?



Frequently asked questions for
Mirroring Azure SQL Database in
Microsoft Fabric
FAQ

This article answers frequently asked questions about Mirroring Azure SQL Database in
Microsoft Fabric.

Features and capabilities


What authentication to the Azure SQL Database
is allowed?
Currently, for authentication to the source Azure SQL Database, we support SQL
authentication with user name and password, Microsoft Entra ID, and Service Principal.

Is there a staging or landing zone for Azure SQL Database? If so, is it outside of OneLake?
A landing zone in OneLake stores both the snapshot and change data, to improve
performance when converting files into delta verti-parquet.

How long does the initial replication take?


It depends on the size of the data that is being brought in.

How long does it take to replicate inserts/updates/deletes?
Near real-time latency.

Do you support replication of views, transient, or external tables?

No. Currently, only replication of regular tables is supported.
How do I manage connections?
Select the settings cog, then select Manage connections and gateways. You can also
delete existing connections from this page.

Can Power BI reports on mirrored data use Direct Lake mode?

Yes, since tables are all v-ordered delta tables.

Self-help for Mirroring Azure SQL Database in Microsoft Fabric

How do I know Fabric is replicating data on my Azure SQL Database?
If you're experiencing mirroring problems, perform the following database-level checks
using Dynamic Management Views (DMVs) and stored procedures to validate
configuration. Contact support if more troubleshooting is required.

Execute the following query to check if the changes properly flow:

SELECT * FROM sys.dm_change_feed_log_scan_sessions

For troubleshooting steps, see Troubleshoot Fabric mirrored databases from Azure SQL
Database.

How to enable System assigned managed identity (SAMI) on SQL Server?
With a single step in the Azure portal, you can enable System Assigned Managed
Identity (SAMI) of your Azure SQL logical server.

What are the replication statuses?


See Monitor Fabric Mirror replication.
Can Azure SQL Database mirroring be accessed
through the Power BI Gateway or behind a
firewall?
Currently, access through the Power BI Gateway or behind a firewall is unsupported.

What steps does restarting Mirroring include?
The data from source tables will be reinitialized. Each time you stop and start, the entire
table is fetched again.

What happens if I remove a table from Mirroring?
The table is no longer replicated and its data is deleted from OneLake.

If I delete the Mirror, does it affect the source Azure SQL Database?
No, we just remove the streaming tables.

Can I mirror the same source database multiple times?
No, each Azure SQL Database can only be mirrored once. You just need a single copy of
the data in Fabric OneLake, which you can share with others.

Can I mirror only specific tables from my Azure SQL Database?
Yes, specific tables can be selected during Mirroring configuration.

What happens to Mirroring in the event of a planned or unplanned geo failover?

Mirroring is disabled in the event of a geo failover, whether planned or unplanned, as
there are potential data loss scenarios. If this occurs, create a new mirror and configure
it to point to the new logical SQL server and Azure SQL Database.

Security
Does data ever leave the customer's Fabric tenant?
No.

Is data staged outside of a customer environment?

No. Data isn't staged outside of the customer environment; it's staged in the customer's
OneLake.

Cost Management
What are the costs associated with Mirroring?
There is no compute cost for mirroring data from the source to Fabric OneLake. The
Mirroring storage cost is free up to a certain limit based on the purchased compute
capacity SKU you provision. Learn more from the Mirroring section in Microsoft Fabric -
Pricing .

What do we recommend a customer do to avoid or reduce Azure SQL Database costs?
See Plan and manage costs for Azure SQL Database. Consider using a dedicated, smaller
Azure SQL Database, based on requirements.

How are ingress fees handled?


Fabric doesn't charge for Ingress fees into OneLake for Mirroring.

How are egress fees handled?


If the Azure SQL Database is located in a different region from your Fabric capacity, data
egress will be charged. If in the same region, there is no data egress.

Licensing
What are licensing options for Fabric Mirroring?
A Power BI Premium, Fabric Capacity, or Trial Capacity is required. For more information
on licensing, see Microsoft Fabric licenses.

Stop or pause Fabric Mirroring


What are the results of stopping Mirroring?
Replication stops in the source database, but a copy of the tables is kept in OneLake.
Restarting the mirroring results in all data being replicated from the start.

How to stop/disable Mirroring from your Azure SQL Database?
If you're unable to Stop mirroring your Azure SQL Database from the Fabric portal, or
unable to delete your mirrored Azure SQL Database item from Fabric, execute the
following stored procedure on your Azure SQL Database: exec
sp_change_feed_disable_db;

What if I stop or pause my Fabric capacity?


The Fabric capacity needs to be active and running. A paused or deleted capacity will
impact Mirroring and no data will be replicated.

Related content
What is Mirroring in Fabric?
Azure SQL Database mirroring in Microsoft Fabric
Troubleshoot Fabric mirrored databases from Azure SQL Database.



How to: Secure data Microsoft Fabric
mirrored databases from Azure SQL
Database
Article • 11/19/2024

This guide helps you establish data security in your mirrored Azure SQL Database in
Microsoft Fabric.

Security requirements
1. The System Assigned Managed Identity (SAMI) of your Azure SQL logical server
needs to be enabled, and must be the primary identity. To configure, go to your
logical SQL Server in the Azure portal. Under Security in the resource menu, select
Identity. Under System assigned managed identity, set Status to On.

After enabling the SAMI, if the SAMI is disabled or removed, the mirroring of
Azure SQL Database to Fabric OneLake will fail.
After enabling the SAMI, if you add a user assigned managed identity (UAMI),
it will become the primary identity, replacing the SAMI as primary. This will
cause replication to fail. To resolve, remove the UAMI.

2. Fabric needs to connect to the Azure SQL database. For this purpose, create a
dedicated database user with limited permissions, to follow the principle of least
privilege. Create either a login with a strong password and connected user, or a
contained database user with a strong password. For a tutorial, see Tutorial:
Configure Microsoft Fabric mirrored databases from Azure SQL Database.

) Important

Any granular security established in the source database must be re-configured in the
mirrored database in Microsoft Fabric. For more information, see SQL granular
permissions in Microsoft Fabric.

Data protection features


You can secure column filters and predicate-based row filters on tables to roles and
users in Microsoft Fabric:
Row-level security in Fabric data warehousing
Column-level security in Fabric data warehousing

You can also mask sensitive data from non-admins using dynamic data masking:

Dynamic data masking in Fabric data warehousing
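
For illustration, a minimal sketch of these data protection features follows; the schema, table, column, function, and role names are hypothetical placeholders, and the predicate logic is only an example of the patterns described in the linked articles.

SQL

-- Sketch only: column-level security via a column-scoped GRANT (object and role names are hypothetical).
GRANT SELECT ON dbo.SalesOrders (OrderID, OrderDate, OrderTotal) TO [SalesAnalystRole];
GO

-- Sketch only: row-level security with a filter predicate (object names are hypothetical).
CREATE SCHEMA Security;
GO

CREATE FUNCTION Security.fn_SalesRegionPredicate (@Region varchar(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
    -- Example predicate: only return rows whose Region matches the querying user's name.
    SELECT 1 AS fn_result
    WHERE @Region = USER_NAME()
);
GO

CREATE SECURITY POLICY Security.SalesRegionFilter
ADD FILTER PREDICATE Security.fn_SalesRegionPredicate(Region)
ON dbo.SalesOrders
WITH (STATE = ON);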

Related content
What is Mirroring in Fabric?
SQL granular permissions in Microsoft Fabric



Troubleshoot Fabric mirrored databases
from Azure SQL Database
Article • 11/19/2024

This article covers troubleshooting steps for mirroring Azure SQL Database.

For troubleshooting the automatically configured mirroring for Fabric SQL database, see
Troubleshoot mirroring from Fabric SQL database (preview).

Changes to Fabric capacity or workspace



Cause: Fabric capacity paused/deleted
Result: Mirroring will stop.
Recommended resolution:
1. Resume or assign capacity from the Azure portal.
2. Go to the Fabric mirrored database item. From the toolbar, select Stop replication.
3. Start replication by selecting Mirror database for the mirrored item in the Fabric portal.

Cause: Fabric capacity resumed
Result: Mirroring will not be resumed.
Recommended resolution:
1. Go to the Fabric mirrored database item. From the toolbar, select Stop replication.
2. Start replication by selecting Mirror database for the mirrored item in the Fabric portal.

Cause: Workspace deleted
Result: Mirroring stops automatically.
Recommended resolution: If mirroring is still active on the Azure SQL Database, execute the following stored procedure on your Azure SQL Database: exec sp_change_feed_disable_db;

Cause: Fabric trial capacity expired
Result: Mirroring stops automatically.
Recommended resolution: See Fabric trial capacity expires.

Cause: Fabric capacity exceeded
Result: Mirroring will pause.
Recommended resolution: Wait until the overload state is over or update your capacity. Learn more from Actions you can take to recover from overload situations. Mirroring will continue once the capacity is recovered.

T-SQL queries for troubleshooting


If you're experiencing mirroring problems, perform the following database level checks
using Dynamic Management Views (DMVs) and stored procedures to validate
configuration.

1. Execute the following query to check if the changes properly flow:

SQL

SELECT * FROM sys.dm_change_feed_log_scan_sessions;

2. If the sys.dm_change_feed_log_scan_sessions DMV doesn't show any progress on
processing incremental changes, execute the following T-SQL query to check if
there are any problems reported:

SQL

SELECT * FROM sys.dm_change_feed_errors;

3. If there aren't any issues reported, execute the following stored procedure to
review the current configuration of the mirrored Azure SQL Database. Confirm it
was properly enabled.

SQL

EXEC sp_help_change_feed;

The key columns to look for here are the table_name and state . Any value besides
4 indicates a potential problem.

4. If replication is still not working, verify that the correct SAMI object has
permissions.
a. In the Fabric portal, select the "..." ellipses option on the mirrored database item.
b. Select the Manage Permissions option.
c. Confirm that the Azure SQL logical server name shows with Read, Write
permissions.
d. Ensure that AppId that shows up matches the ID of the SAMI of your Azure SQL
Database logical server.

5. Contact support if more troubleshooting is required.

Managed identity
The System Assigned Managed Identity (SAMI) of the Azure SQL logical server needs to
be enabled, and must be the primary identity. For more information, see Create an
Azure SQL Database server with a user-assigned managed identity.

After enablement, if the SAMI status is turned Off, or if the SAMI is disabled and then
enabled again, the mirroring of Azure SQL Database to Fabric OneLake will fail.

The SAMI must be the primary identity. Verify the SAMI is the primary identity with the
following: SELECT * FROM sys.dm_server_managed_identities;

User Assigned Managed Identity (UAMI) is not supported. If you add a UAMI, it becomes
the primary identity, replacing the SAMI as primary. This causes replication to fail. To
resolve:

Remove all UAMIs. Verify that the SAMI is enabled.

SPN permissions
Do not remove Azure SQL Database service principal name (SPN) contributor
permissions on Fabric mirrored database item.

If you accidentally remove the SPN permission, Mirroring Azure SQL Database will not
function as expected. No new data can be mirrored from the source database.

If you remove Azure SQL Database SPN permissions or permissions are not set up
correctly, use the following steps.

1. Add the SPN as a user by selecting the ... ellipses option on the mirrored
database item.
2. Select the Manage Permissions option.
3. Enter the name of the Azure SQL Database logical server name. Provide Read and
Write permissions.

Related content
Limitations of Microsoft Fabric Data Warehouse
Frequently asked questions for Mirroring Azure SQL Database in Microsoft Fabric

Limitations in Microsoft Fabric mirrored
databases from Azure SQL Database
Article • 11/19/2024

Current limitations in Microsoft Fabric mirrored databases from Azure SQL Database
are listed on this page. This page is subject to change.

For troubleshooting, see:

Troubleshoot Fabric mirrored databases


Troubleshoot Fabric mirrored databases from Azure SQL Database

Database level limitations


Fabric Mirroring for Azure SQL Database is only supported on a writable primary
database.
Azure SQL Database cannot be mirrored if the database has: enabled Change Data
Capture (CDC), Azure Synapse Link for SQL, or the database is already mirrored in
another Fabric workspace.
The maximum number of tables that can be mirrored into Fabric is 500 tables. Any
tables above the 500 limit currently cannot be replicated.
If you select Mirror all data when configuring Mirroring, the tables to be
mirrored over are the first 500 tables when all tables are sorted alphabetically
based on the schema name and then the table name. The remaining set of
tables at the bottom of the alphabetical list are not mirrored over.
If you unselect Mirror all data and select individual tables, you are prevented
from selecting more than 500 tables.

Permissions in the source database


Row-level security is supported, but permissions are currently not propagated to
the replicated data in Fabric OneLake.
Object-level permissions, for example granting permissions to certain columns, are
currently not propagated to the replicated data in Fabric OneLake.
Dynamic data masking settings are currently not propagated to the replicated data
in Fabric OneLake.
To successfully configure Mirroring for Azure SQL Database, the principal used to
connect to the source Azure SQL Database must be granted the permission ALTER
ANY EXTERNAL MIRROR, which is included in higher level permission like
CONTROL permission or the db_owner role.

Network and connectivity security


The source SQL server needs to enable Allow public network access and Allow
Azure services to connect.
The System Assigned Managed Identity (SAMI) of the Azure SQL logical server
needs to be enabled and must be the primary identity.
The Azure SQL Database service principal name (SPN) contributor permissions
should not be removed from the Fabric mirrored database item.
Mirroring across Microsoft Entra tenants is not supported where an Azure SQL
Database and the Fabric workspace are in separate tenants.
Microsoft Purview Information Protection/sensitivity labels defined in Azure SQL
Database are not cascaded and mirrored to Fabric OneLake.

Table level
A table that does not have a defined primary key cannot be mirrored.
A table using a primary key defined as nonclustered primary key cannot be
mirrored.
A table cannot be mirrored if the primary key is one of the data types: sql_variant,
timestamp/rowversion.
Delta lake supports only six digits of precision.
Columns of SQL type datetime2, with precision of 7 fractional second digits, do
not have a corresponding data type with same precision in Delta files in Fabric
OneLake. A precision loss happens if columns of this type are mirrored and
seventh decimal second digit will be trimmed.
A table cannot be mirrored if the primary key is one of these data types:
datetime2(7), datetimeoffset(7), time(7), where 7 is seven digits of precision.
The datetimeoffset(7) data type does not have a corresponding data type with
same precision in Delta files in Fabric OneLake. A precision loss (loss of time
zone and seventh time decimal) occurs if columns of this type are mirrored.
Clustered columnstore indexes are not currently supported.
If one or more columns in the table is of type Large Binary Object (LOB) with a size
> 1 MB, the column data is truncated to size of 1 MB in Fabric OneLake.
Source tables that have any of the following features in use cannot be mirrored.
Temporal history tables and ledger history tables
Always Encrypted
In-memory tables
Graph
External tables
The following table-level data definition language (DDL) operations aren't allowed
on SQL database source tables when enabled for mirroring.
Switch/Split/Merge partition
Alter primary key
When there is DDL change, a complete data snapshot is restarted for the changed
table, and data is reseeded.
Currently, a table cannot be mirrored if it has the json or vector data type.
Currently, you cannot ALTER a column to the vector or json data type when a
table is mirrored.

Column level
If the source table contains computed columns, these columns cannot be mirrored
to Fabric OneLake.
If the source table contains columns with one of these data types, these columns
cannot be mirrored to Fabric OneLake. The following data types are unsupported
for mirroring:
image
text/ntext
xml
rowversion/timestamp
sql_variant
User Defined Types (UDT)
geometry
geography
Column names for a SQL table cannot contain spaces nor the following characters:
, ; { } ( ) \n \t = .

Warehouse limitations
Source schema hierarchy is not replicated to the mirrored database. Instead,
source schema is flattened, and schema name is encoded into the mirrored
database table name.

Mirrored item limitations


User needs to be a member of the Admin/Member role for the workspace to
create SQL Database mirroring.
Stopping mirroring disables mirroring completely.
Starting mirroring reseeds all the tables, effectively starting from scratch.

SQL analytics endpoint limitations

The SQL analytics endpoint is the same as the Lakehouse SQL analytics endpoint. It
is the same read-only experience. See SQL analytics endpoint limitations.

Fabric regions that support Mirroring


The following are the Fabric regions that support Mirroring for Azure SQL Database:

Asia Pacific:
Australia East
Australia Southeast
Central India
East Asia
Japan East
Korea Central
Southeast Asia
South India

Europe
North Europe
West Europe
France Central
Germany West Central
Norway East
Sweden Central
Switzerland North
Switzerland West
UK South
UK West

Americas:
Brazil South
Canada Central
Canada East
Central US
East US
East US2
North Central US
West US
West US2

Middle East and Africa:


South Africa North
UAE North

Next step
Tutorial: Configure Microsoft Fabric mirrored databases from Azure SQL Database

Related content
Monitor Fabric mirrored database replication
Model data in the default Power BI semantic model in Microsoft Fabric



Mirroring Azure SQL Managed Instance
(Preview)
Article • 11/19/2024

Mirroring in Fabric provides an easy experience to avoid complex ETL (Extract Transform
Load) and integrate your existing Azure SQL Managed Instance estate with the rest of
your data in Microsoft Fabric. You can continuously replicate your existing SQL Managed
Instance databases directly into Fabric's OneLake. Inside Fabric, you can unlock powerful
business intelligence, artificial intelligence, Data Engineering, Data Science, and data
sharing scenarios.

For a tutorial on configuring your Azure SQL Managed Instance for Mirroring in Fabric,
see Tutorial: Configure Microsoft Fabric mirrored databases from Azure SQL Managed
Instance (Preview).

Why use Mirroring in Fabric?


With Mirroring in Fabric, you don't need to piece together different services from
multiple vendors. Instead, you can enjoy a highly integrated, end-to-end, and easy-to-
use product that is designed to simplify your analytics needs, and built for openness and
collaboration between Microsoft, Azure SQL Managed Instance, and the 1000s of
technology solutions that can read the open-source Delta Lake table format.

What analytics experiences are built in?


Mirrored databases are an item in the Fabric Data Warehouse distinct from the
Warehouse and SQL analytics endpoint.
Mirroring creates three items in your Fabric workspace:

The mirrored database item. Mirroring manages the replication of data into
OneLake and conversion to Parquet, in an analytics-ready format. This enables
downstream scenarios like data engineering, data science, and more.
A SQL analytics endpoint
A default semantic model

Each mirrored Azure SQL Managed Instance has an autogenerated SQL analytics
endpoint that provides a rich analytical experience on top of the Delta Tables created by
the mirroring process. Users have access to familiar T-SQL commands that can define
and query data objects but not manipulate the data from the SQL analytics endpoint, as
it's a read-only copy. You can perform the following actions in the SQL analytics
endpoint:

Explore the tables that reference data in your Delta Lake tables from Azure SQL
Managed Instance.
Create no code queries and views and explore data visually without writing a line
of code.
Develop SQL views, inline TVFs (Table-valued Functions), and stored procedures to
encapsulate your semantics and business logic in T-SQL (see the example sketch after this list).
Manage permissions on the objects.
Query data in other Warehouses and Lakehouses in the same workspace.
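
For example, a minimal sketch of a read-only stored procedure you might create in the SQL analytics endpoint is shown below; the table and column names are hypothetical placeholders for your own mirrored tables.

SQL

-- Minimal sketch: a read-only stored procedure over a mirrored table
-- (table and column names are hypothetical).
CREATE PROCEDURE dbo.usp_GetRecentOrders
    @Days int = 30
AS
BEGIN
    SELECT OrderID, CustomerID, OrderDate, OrderTotal
    FROM dbo.SalesOrders
    WHERE OrderDate >= DATEADD(day, -@Days, CAST(GETDATE() AS date));
END;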

In addition to the SQL query editor, there's a broad ecosystem of tooling that can query
the SQL analytics endpoint, including SQL Server Management Studio (SSMS), Azure
Data Studio, and even GitHub Copilot.

Network requirements
During the current preview, Fabric Mirroring for Azure SQL Managed Instance requires
you to use the Public Endpoint and to configure your SQL managed instance VNET to
allow traffic from and to Azure services. You can use Azure Cloud or Power BI service
tags to scope this configuration:

Currently, you must update your Azure SQL Managed Instance network security to
Enable public endpoints.
Currently, you must allow Public Endpoint traffic in the network security group
option to be able connect your Fabric workspace to your Azure SQL Managed
Instance.

Active transactions, workloads, and replicator engine behaviors
Active transactions continue to hold the transaction log truncation until the
transaction commits and the mirrored Azure SQL Managed Instance catches up, or
the transaction aborts. Long-running transactions might result in the transaction
log filling up more than usual. The source database transaction log should be
monitored so that the transaction log doesn't fill. For more information, see
Transaction log grows due to long-running transactions and CDC.
Each user workload varies. During initial snapshot, there might be more resource
usage on the source database, for both CPU and IOPS (input/output operations
per second, to read the pages). Table updates/delete operations can lead to
increased log generation. Learn more on how to monitor resources for your Azure
SQL Managed Instance.
The replicator engine monitors each table for changes independently. If there are
no updates in a source table, the replicator engine starts to back off with an
exponentially increasing duration for that table, up to an hour. The same can occur
if there's a transient error, preventing data refresh. The replicator engine will
automatically resume regular polling after updated data is detected.

Tier and purchasing model support


The source Azure SQL Managed Instance can be either a single SQL managed instance
or a SQL managed instance belonging to an instance pool.

All service tiers in the vCore purchasing model are supported.

Next step
Tutorial: Configure Microsoft Fabric mirrored databases from Azure SQL Managed
Instance (Preview)

Related content
How to: Secure data Microsoft Fabric mirrored databases from Azure SQL
Managed Instance (Preview)
Limitations in Microsoft Fabric mirrored databases from Azure SQL Managed
Instance (Preview)
Monitor Fabric mirrored Managed Instance database replication
Troubleshoot Fabric mirrored databases from Azure SQL Managed Instance
(Preview)



Tutorial: Configure Microsoft Fabric
mirrored databases from Azure SQL
Managed Instance (Preview)
Article • 11/19/2024

Mirroring in Fabric is an enterprise, cloud-based, zero-ETL, SaaS technology. In this
section, you learn how to create a mirrored Azure SQL Managed Instance database,
which represents a read-only, continuously replicated copy of the chosen database from
your Azure SQL Managed Instance in OneLake.

Prerequisites
Create or use an existing Azure SQL Managed Instance.
Update Policy for source Azure SQL Managed Instance needs to be configured
to "Always up to date"
The source Azure SQL Managed Instance can be either a single SQL managed
instance or a SQL managed instance belonging to an instance pool.
If you don't have an Azure SQL Managed Instance, you can create a new SQL
managed instance. You can use the Azure SQL Managed Instance free offer if
you like.
During the current preview, we recommend using a copy of one of your existing
databases or any existing test or development database that you can recover
quickly from a backup. If you want to use a database from an existing backup,
see Restore a database from a backup in Azure SQL Managed Instance.
You need an existing capacity for Fabric. If you don't, start a Fabric trial.
The Fabric capacity needs to be active and running. A paused or deleted
capacity impacts Mirroring and no data are replicated.
Enable the Fabric tenant setting Service principals can use Fabric APIs. To learn how
to enable tenant settings, see About tenant settings.
Networking requirements for Fabric to access your Azure SQL Managed Instance:
In the current preview, Mirroring requires that your Azure SQL Managed
Instance has a public endpoint which needs to be accessible from Azure Cloud
or Power BI service tags. For more information on how to securely run a public
endpoint for Azure SQL Managed Instance, see Use Azure SQL Managed Instance
securely with public endpoints.
Enable System Assigned Managed Identity (SAMI) of your
Azure SQL Managed Instance
The System Assigned Managed Identity (SAMI) of your Azure SQL Managed Instance
must be enabled, and must be the primary identity, to publish data to Fabric OneLake.

1. To configure or verify that the SAMI is enabled, go to your SQL Managed Instance
in the Azure portal. Under Security in the resource menu, select Identity.
2. Under System assigned managed identity, set Status to On.
3. The SAMI must be the primary identity. Verify the SAMI is the primary identity with
the following T-SQL query: SELECT * FROM sys.dm_server_managed_identities;

Database principal for Fabric


Next, you need to create a way for the Fabric service to connect to your Azure SQL
Managed Instance.

You can accomplish this with a login and mapped database user. Following the principle
of least privilege for security, you should only grant CONTROL DATABASE permission in
the database you intend to mirror.

Use a login and mapped database user


1. Connect to your Azure SQL Managed Instance using SQL Server Management
Studio (SSMS) or Azure Data Studio. Connect to the master database.

2. Create a server login and assign the appropriate permissions.

Create a SQL Authenticated login. You can choose any name for this login,
substitute it in the following script for <fabric_login> . Provide your own
strong password. Run the following T-SQL script in the master database:

SQL

CREATE LOGIN <fabric_login> WITH PASSWORD = '<strong password>';


ALTER SERVER ROLE [##MS_ServerStateReader##] ADD MEMBER <fabric_login>;

Or, create a Microsoft Entra ID authenticated login from an existing account.


Run the following T-SQL script in the master database:

SQL
CREATE LOGIN [[email protected]] FROM EXTERNAL PROVIDER;
ALTER SERVER ROLE [##MS_ServerStateReader##] ADD MEMBER
[[email protected]];

3. Switch your query scope to the database you want to mirror. Substitute the name
of your database for <mirroring_source_database> and run the following T-SQL:

SQL

USE [<mirroring_source_database>];

4. Create a database user connected to the login. Substitute the name of a new
database user for this purpose for <fabric_user> :

SQL

CREATE USER <fabric_user> FOR LOGIN <fabric_login>;


GRANT CONTROL TO <fabric_user>;

Or, for Microsoft Entra logins,

SQL

CREATE USER [[email protected]] FOR LOGIN [[email protected]];


GRANT CONTROL TO [[email protected]];

Create a mirrored Azure SQL Managed Instance database
1. Open the Fabric portal .
2. Use an existing workspace, or create a new workspace.
3. Navigate to the Create pane. Select the Create icon.
4. Scroll to the Data Warehouse section and then select Mirrored Azure SQL
Managed Instance (preview).

Connect to your Azure SQL Managed Instance


To enable Mirroring, you need to connect to the Azure SQL Managed Instance from
Fabric to initiate the connection between SQL Managed Instance and Fabric. The following
steps guide you through the process of creating the connection to your Azure SQL
Managed Instance:

1. Under New sources, select Azure SQL Managed Instance. Or, select an existing
Azure SQL Managed Instance connection from the OneLake catalog.
a. You can't use existing Azure SQL Managed Instance connections with type "SQL
Server" (generic connection type). Only connections with connection type "SQL
Managed Instance" are supported for mirroring of Azure SQL Managed Instance
data.
2. If you selected New connection, enter the connection details to the Azure SQL
Managed Instance. You need to connect to a specific database; you can't set up
mirroring for the entire SQL managed instance and all its databases.

Server: You can find the Server name by navigating to the Azure SQL
Managed Instance Networking page in the Azure portal (under Security
menu) and looking at the Public Endpoint field. For example,
<managed_instance_name>.public.<dns_zone>.database.windows.net,3342 .

Database: Enter the name of database you wish to mirror.


Connection: Create new connection.
Connection name: An automatic name is provided. You can change it to
facilitate finding this SQL managed instance database connection at a future
time, if needed.
Authentication kind:
Basic (SQL Authentication)
Organization account (Microsoft Entra ID)
Tenant ID (Azure Service Principal)

3. Select Connect.

Start mirroring process


1. The Configure mirroring screen allows you to mirror all data in the database, by
default.

Mirror all data means that any new tables created after Mirroring is started
will be mirrored.

Optionally, choose only certain objects to mirror. Disable the Mirror all data
option, then select individual tables from your database.

If tables can't be mirrored at all, they show an error icon and relevant
explanation text. Likewise, if tables can only be mirrored with limitations, a warning
icon is shown with relevant explanation text.

For this tutorial, we select the Mirror all data option.

2. On the next screen, give the destination item a name and select Create mirrored
database. Now wait a minute or two for Fabric to provision everything for you.

3. After 2-5 minutes, select Monitor replication to see the status.

4. After a few minutes, the status should change to Running, which means the tables
are being synchronized.

If you don't see the tables and the corresponding replication status, wait a few
seconds and then refresh the panel.

5. When the initial copying of the tables is finished, a date appears in the Last refresh
column.

6. Now that your data is up and running, there are various analytics scenarios
available across all of Fabric.

) Important

Any granular security established in the source database must be re-configured in the mirrored database in Microsoft Fabric.

Monitor Fabric Mirroring


Once mirroring is configured, you're directed to the Mirroring Status page. Here, you
can monitor the current state of replication.

These are the replication statuses:

For overall database level monitoring:

Running: Replication is currently running, bringing snapshot and change data into OneLake.
Running with warning: Replication is running, with transient errors.
Stopping/Stopped: Replication has stopped.
Error: Fatal error in replication that can't be recovered.

For table level monitoring:

Running: The data from the table is successfully being replicated into the warehouse.
Running with warning: Warning of a nonfatal error with replication of the data from the table.
Stopping/Stopped: Replication has stopped.
Error: Fatal error in replication for that table.

If the initial sync is completed, a Last completed timestamp is shown next to the table
name. This timestamp indicates the time when Fabric has last checked the table for
changes.

Also, note the Rows replicated column. It counts all the rows that have been replicated
for the table. Each time a row is replicated, it is counted again. This means that, for
example, inserting a row with primary key = 1 on the source increases the Rows
replicated count by one. If you update the row with the same primary key, it replicates
to Fabric again and the row count increases by one, even though it's the same row.
Fabric counts all replications that happened on the row, including inserts, deletes, and
updates.

The Monitor replication screen also reflects any errors and warnings with tables being
mirrored. If the table has unsupported column types or if the entire table is unsupported
(for example, in memory or columnstore indexes), a notification about the limitation is
shown on this screen. For more information and details on the replication states, see
Monitor Fabric mirrored database replication.

) Important

If there are no updates in the source tables, the replicator engine will start to back
off with an exponentially increasing duration, up to an hour. The replicator engine
will automatically resume regular polling after updated data is detected.

Related content
Mirroring Azure SQL Managed Instance (Preview)
What is Mirroring in Fabric?



Frequently asked questions for
Mirroring Azure SQL Managed
Instance in Microsoft Fabric
(Preview)
FAQ

This article answers frequently asked questions about Mirroring Azure SQL Managed
Instance in Microsoft Fabric.

Features and capabilities


Is there a staging or landing zone for Azure SQL
Managed Instance? If so, is it outside of
OneLake?
Yes, there is a landing zone, and it's inside OneLake. It stores both the snapshot and
change data, to improve performance when converting files into the Delta verti-parquet
format.

How long does the initial replication take?


It depends on the size of the data that is being brought in.

How long does it take to replicate inserts/updates/deletes?
Near real-time latency.

Do you support replicating views, transient, or external tables?
No. Currently, only regular tables are supported for replication.

Does mirroring work without exposing my Azure SQL Managed Instance to the internet?
In the current preview, your SQL managed instance needs to have a public endpoint, be
open for inbound access from Azure cloud, and it needs to be allowed to contact Azure
storage to be able to export data. Private endpoints are currently not supported.

How do I manage connections?


In Fabric, select the Settings button, then select Manage connection and gateways. You
can also delete existing connections from this page.

Can Power BI reports on mirrored data use Direct Lake mode?
Yes, since tables are all v-ordered delta tables.

Self-help for Mirroring Azure SQL Managed Instance in Microsoft Fabric
How do I know Fabric is replicating data on my
Azure SQL Managed Instance?
If you're experiencing mirroring problems, perform the following database level checks
using Dynamic Management Views (DMVs) and stored procedures to validate
configuration.

Execute the following query to check if the changes properly flow:

SELECT * FROM sys.dm_change_feed_log_scan_sessions

For troubleshooting steps, see Troubleshoot Fabric mirrored databases from Azure SQL
Managed Instance. Contact support if more troubleshooting is required.

How to enable System assigned managed identity (SAMI) on Azure SQL Managed
Instance?
With a single step in the Azure portal, you can enable the System Assigned Managed
Identity (SAMI) of your Azure SQL Managed Instance. For details, see Tutorial: Configure
Microsoft Fabric mirrored databases from Azure SQL Managed Instance (Preview).

What are the replication statuses?


See Monitor Fabric Mirror replication.

Can Azure SQL Managed Instance database mirroring be accessed through the Power BI
Gateway or VNET data gateway?
Currently, this is unsupported.

What happens if I remove a table from Mirroring?
The table is no longer replicated and its data is deleted from OneLake.

If I delete the mirrored database, does it affect the source Azure SQL Managed Instance?
No, we just remove the replicated tables from OneLake.

Can I mirror the same source database multiple times?
No, each database in a SQL managed instance can only be mirrored to Fabric once. You
just need a single copy of the data in Fabric OneLake, which you can share with others.

Can I mirror only specific tables from my Azure SQL Managed Instance?
Yes, specific tables can be selected during Mirroring configuration.

What happens to Mirroring in the event of a planned or unplanned geo failover?
Mirroring stops working in the event of a geo failover, whether planned or unplanned, as
there are potential data loss scenarios. If this occurs, disable mirroring completely, and
then create a new mirror and configure it to point to the new Azure SQL Managed
Instance.

Security
What authentication to the Azure SQL Managed
Instance is allowed?
Currently, for authentication to the source Azure SQL Managed Instance, we support
SQL authentication with user name and password and Microsoft Entra ID. Your SQL
managed instance should have read rights on your Microsoft Entra directory. For more
information, see Configure and manage Microsoft Entra authentication with Azure SQL.

Does data ever leave the customer's Fabric tenant?
No.

Is data staged outside of the customer's environment?
No. Data isn't staged outside of the customer's environment; it's staged in the
customer's OneLake.

Cost Management
What are the costs associated with Mirroring?
There's no compute cost for mirroring data from the source to Fabric OneLake. The
Mirroring storage cost is free up to a certain limit based on the purchased compute
capacity SKU you provision. Learn more from the Mirroring section in Microsoft Fabric
Pricing. Compute used by SQL, Power BI, or Spark to consume the mirrored data is
charged based on the Fabric capacity.

How are ingress fees handled?


Fabric doesn't charge for ingress fees into OneLake for Mirroring.
How are egress fees handled?
If the Azure SQL Managed Instance is located in a different region from your Fabric
capacity, data egress is charged. If in the same region, there is no data egress.

Licensing
What are licensing options for Fabric Mirroring?
A Power BI Premium, Fabric Capacity, or Trial Capacity is required. For more information
on licensing, see Microsoft Fabric licenses.

Stop or pause Fabric Mirroring


What are the results of stopping Mirroring?
Replication stops in the source database, but a copy of the tables is kept in OneLake.
Restarting the mirroring results in all data being replicated from the start.

What steps does restarting the Mirroring include?
The data from source tables will be reinitialized. Each time you stop and start, the entire
table is fetched again.

How to stop/disable Mirroring from your Azure SQL Managed Instance?
If you are unable to stop mirroring your SQL managed instance from the Fabric portal,
or unable to delete your mirrored database from the Fabric portal, you can execute the
stored procedure sp_change_feed_disable_db on your SQL managed instance, as shown
below.
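For example, run the following on the managed instance, in the context of the database
that was being mirrored (the database-level scope is an assumption based on the
procedure's name):

SQL

EXEC sp_change_feed_disable_db;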

What if I stop or pause my Fabric capacity?


The Fabric capacity needs to be active and running. A paused or deleted capacity will
impact Mirroring and no data will be replicated.
Related content
What is Mirroring in Fabric?
Mirroring Azure SQL Managed Instance in Microsoft Fabric
Troubleshoot Fabric mirrored databases from Azure SQL Managed Instance
(Preview)



How to: Secure data in Microsoft Fabric
mirrored databases from Azure SQL
Managed Instance (Preview)
Article • 11/19/2024

This guide helps you establish data security in your mirrored Azure SQL Managed
Instance database in Microsoft Fabric.

Security requirements
1. The System Assigned Managed Identity (SAMI) of your Azure SQL Managed
Instance needs to be enabled, and must be the primary identity. To configure or
verify that the SAMI is enabled, go to your SQL Managed Instance in the Azure
portal. Under Security in the resource menu, select Identity. Under System
assigned managed identity, set Status to On.

After enabling the SAMI, if the SAMI is disabled or removed, the mirroring of
Azure SQL Managed Instance to Fabric OneLake will fail.
After enabling the SAMI, if you add a user assigned managed identity (UAMI),
it will become the primary identity, replacing the SAMI as primary. This will
cause replication to fail. To resolve, remove the UAMI.

2. Fabric needs to connect to the Azure SQL Managed Instance. For this purpose,
create a dedicated database user with limited permissions, to follow the principle
of least privilege. For a tutorial, see Tutorial: Configure Microsoft Fabric mirrored
databases from Azure SQL Managed Instance (Preview).
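A minimal sketch of creating such a dedicated user, assuming SQL authentication and
hypothetical names (fabric_mirroring_login, fabric_mirroring_user); run the CREATE
LOGIN statement in master and the remaining statements in the database you plan to
mirror:

SQL

-- In master: create the login the Fabric connection will use.
CREATE LOGIN fabric_mirroring_login WITH PASSWORD = '<strong password>';
GO

-- In the database to be mirrored: map a user and grant database-level access only.
-- Mirroring requires CONTROL or db_owner on the database (see the limitations article).
CREATE USER fabric_mirroring_user FOR LOGIN fabric_mirroring_login;
ALTER ROLE db_owner ADD MEMBER fabric_mirroring_user;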

) Important

Any granular security established in the source database must be re-configured in
the mirrored database in Microsoft Fabric. For more information, see SQL granular
permissions in Microsoft Fabric.

Data protection features in Microsoft Fabric


You can restrict access with column filters and predicate-based row filters on tables for
specific roles and users in Microsoft Fabric:
Row-level security in Fabric data warehousing
Column-level security in Fabric data warehousing

You can also mask sensitive data from non-admins using dynamic data masking:

Dynamic data masking in Fabric data warehousing
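For illustration, a minimal row-level security sketch using standard T-SQL security policy
syntax; the schema, function, table (dbo.orders), and column (sales_rep) names are
hypothetical:

SQL

-- Hypothetical predicate: each user sees only rows where sales_rep matches
-- their database user name.
CREATE SCHEMA Security;
GO

CREATE FUNCTION Security.fn_filter_by_rep (@sales_rep AS VARCHAR(128))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS fn_result
    WHERE @sales_rep = USER_NAME();
GO

-- Bind the predicate to the table as a row filter.
CREATE SECURITY POLICY Security.SalesFilter
ADD FILTER PREDICATE Security.fn_filter_by_rep(sales_rep)
ON dbo.orders
WITH (STATE = ON);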

Related content
What is Mirroring in Fabric?
SQL granular permissions in Microsoft Fabric



Troubleshoot Fabric mirrored databases
from Azure SQL Managed Instance
(Preview)
Article • 11/19/2024

This article covers troubleshooting steps for mirroring Azure SQL Managed Instance.

Changes to Fabric capacity or workspace


Cause: Fabric capacity paused/deleted
Result: Mirroring stops
Recommended resolution:
1. Resume or assign capacity from the Azure portal.
2. Go to the Fabric mirrored database item. From the toolbar, select Stop replication.
3. Start replication by selecting Mirror database for the mirrored item in the Fabric portal.

Cause: Fabric capacity resumed
Result: Mirroring isn't resumed
Recommended resolution:
1. Go to the Fabric mirrored database item. From the toolbar, select Stop replication.
2. Start replication by selecting Mirror database for the mirrored item in the Fabric portal.

Cause: Workspace deleted
Result: Mirroring stops automatically
Recommended resolution: If mirroring is still active on the Azure SQL Managed Instance,
execute the following stored procedure on your Azure SQL Managed Instance:
exec sp_change_feed_disable_db; .

Cause: Fabric trial capacity expired
Result: Mirroring stops automatically
Recommended resolution: See Fabric trial capacity expires.

T-SQL queries for troubleshooting


If you're experiencing mirroring problems, perform the following database level checks
using Dynamic Management Views (DMVs) and stored procedures to validate
configuration.

1. Execute the following query to check if the changes properly flow:

SQL
SELECT * FROM sys.dm_change_feed_log_scan_sessions;

2. If the sys.dm_change_feed_log_scan_sessions DMV doesn't show any progress on
processing incremental changes, execute the following T-SQL query to check if
there are any problems reported:

SQL

SELECT * FROM sys.dm_change_feed_errors;

3. If there aren't any issues reported, execute the following stored procedure to
review the current configuration of the mirrored Azure SQL Managed Instance.
Confirm it was properly enabled.

SQL

EXEC sp_help_change_feed;

The key columns to look for here are table_name and state. Any state value besides 4
indicates a potential problem; tables shouldn't sit for too long in states other than 4.

4. If replication is still not working, verify that the correct SAMI object has
permissions (see SPN permissions).
a. In the Fabric portal, select the "..." ellipses option on the mirrored database item.
b. Select the Manage Permissions option.
c. Confirm that the Azure SQL Managed Instance name shows with Read, Write
permissions.
d. Ensure that AppId that shows up matches the ID of the SAMI of your Azure SQL
Managed Instance.

5. Contact support if troubleshooting is required.

Managed identity
The System Assigned Managed Identity (SAMI) of the Azure SQL Managed Instance
needs to be enabled, and must be the primary identity.

If the SAMI setting is turned Off, or if it is disabled and then enabled again, the mirroring
of Azure SQL Managed Instance to Fabric OneLake will fail. A re-enabled SAMI isn't the
same identity as before it was disabled. Therefore, you need to grant the new SAMI
permissions to access the Fabric workspace.

The SAMI must be the primary identity. Verify the SAMI is the primary identity with the
following SQL: SELECT * FROM sys.dm_server_managed_identities;

User Assigned Managed Identity (UAMI) isn't supported. If you add a UAMI, it becomes
the primary identity, replacing the SAMI as primary. This causes replication to fail. To
resolve:

Remove all UAMIs. Verify that the SAMI is enabled.

SPN permissions
Don't remove Azure SQL Managed Instance service principal name (SPN) contributor
permissions on Fabric mirrored database item.

If you accidentally remove the SPN permission, mirroring Azure SQL Managed Instance
won't function as expected. No new data can be mirrored from the source database.

If you remove Azure SQL Managed Instance SPN permissions or permissions aren't set
up correctly, use the following steps.

1. Add the SPN as a user by selecting the ... ellipses option on the mirrored
managed instance item.
2. Select the Manage Permissions option.
3. Enter the Azure SQL Managed Instance public endpoint. Provide Read and Write
permissions.

Related content
Limitations in Microsoft Fabric mirrored databases from Azure SQL Managed
Instance (Preview)
Frequently asked questions for Mirroring Azure SQL Managed Instance in
Microsoft Fabric (Preview)



Limitations in Microsoft Fabric mirrored
databases from Azure SQL Managed
Instance (Preview)
Article • 11/19/2024

Current limitations in Microsoft Fabric mirrored databases from Azure SQL Managed
Instance are listed on this page. This page is subject to change.

For troubleshooting, see:

Troubleshoot Fabric mirrored databases


Troubleshoot Fabric mirrored databases from Azure SQL Managed Instance
(Preview)

Feature availability
You can configure your Azure SQL Managed Instance for mirroring if it is deployed to
any Azure region except East US 2, West US 2, Central US, and West US. For a complete
list of region support, see Fabric regions that support Mirroring.

Database level limitations


Mirroring on Azure SQL Managed Instance is only available for instances that have
their Update Policy set to Always up to date. SQL Server 2022 version of SQL
Managed Instance doesn't support mirroring.
Geo Disaster Recovery setup isn't supported by Mirroring.
Fabric Mirroring for Azure SQL Managed Instance is only supported on a writable
primary database.
An Azure SQL Managed Instance database can't be mirrored if it has Change Data
Capture (CDC) enabled, uses transactional replication, or is already mirrored in
another Fabric workspace.
The maximum number of tables that can be mirrored into Fabric is 500 tables. Any
tables above the 500 limit currently can't be replicated.
If you select Mirror all data when configuring Mirroring, the tables to be
mirrored over are the first 500 tables when all tables are sorted alphabetically
based on the schema name and then the table name. The remaining set of
tables at the bottom of the alphabetical list aren't mirrored over.
If you unselect Mirror all data and select individual tables, you are prevented
from selecting more than 500 tables.
The database copy/move feature isn't supported on databases that are mirrored. If
you move or copy a database with mirroring enabled, the copy will report a
mirroring error state.
If your SQL managed instance database is set up to use the Azure SQL Managed
Instance Link feature, the readable replica can't be used as a source for Fabric
mirroring.
If your database is configured for mirroring and then renamed, the Monitor
Mirroring functionality will stop working. Renaming the database to the name it
had when mirroring was set up will resolve the issue.

Permissions in the source database


Row-level security is supported, but permissions are currently not propagated to
the replicated data in Fabric OneLake.
Object-level permissions, for example granting permissions to certain columns,
aren't currently propagated to the replicated data in Fabric OneLake.
Dynamic data masking settings aren't currently propagated from the source
database into Fabric OneLake.
To successfully configure Mirroring for Azure SQL Managed Instance, the principal
used to connect to the source SQL managed instance needs to be granted
CONTROL or db_owner permissions. It's recommended to grant this only on the
database being mirrored, not at the server level.
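For example, a minimal sketch of granting the required permission at the database level
only, for a hypothetical user fabric_mirroring_user:

SQL

-- Option 1: grant CONTROL on the mirrored database only (not the server).
GRANT CONTROL ON DATABASE::[<database to mirror>] TO fabric_mirroring_user;

-- Option 2: alternatively, add the user to db_owner in that database.
ALTER ROLE db_owner ADD MEMBER fabric_mirroring_user;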

Network and connectivity security


The source SQL managed instance needs to enable public endpoint and allow
Azure services to connect to it.
The System Assigned Managed Identity (SAMI) of the Azure SQL Managed
Instance needs to be enabled and must be the primary identity.
The Azure SQL Managed Instance service principal name (SPN) contributor
permissions shouldn't be removed from the Fabric mirrored database item.
User Assigned Managed Identity (UAMI) isn't supported.
Mirroring across Microsoft Entra tenants isn't supported where an Azure SQL
Managed Instance and the Fabric workspace are in separate tenants.
Microsoft Purview Information Protection/sensitivity labels defined in Azure SQL
Managed Instance aren't mirrored to Fabric OneLake.
Table level
A table that doesn't have a defined primary key can't be mirrored.
A table using a primary key defined as nonclustered primary key can't be
mirrored.
A table can't be mirrored if the primary key is one of the data types: sql_variant,
timestamp/rowversion
A table cannot be mirrored if the primary key is one of these data types:
datetime2(7), datetimeoffset(7), time(7), where 7 is seven digits of precision.
Delta lake supports only six digits of precision.
Columns of SQL type datetime2, with precision of 7 fractional second digits,
do not have a corresponding data type with same precision in Delta files in
Fabric OneLake. A precision loss happens if columns of this type are mirrored
and seventh decimal second digit will be trimmed.
The datetimeoffset(7) data type does not have a corresponding data type
with same precision in Delta files in Fabric OneLake. A precision loss (loss of
time zone and seventh time decimal) occurs if columns of this type are
mirrored.
Clustered columnstore indexes aren't currently supported.
If one or more columns in the table is of type Large Binary Object (LOB) with a size
> 1 MB, the column data is truncated to a size of 1 MB in Fabric OneLake. Configure
the max text repl size server configuration option to allow more than 65,536 bytes
if you want to allow large inserts (a configuration sketch follows after this list).
Source tables that have any of the following features in use can't be mirrored:
Temporal history tables and ledger history tables
Always Encrypted
In-memory tables
Graph
External tables
The following table-level data definition language (DDL) operations aren't allowed
on source tables when enabled for SQL Managed Instance mirroring to Microsoft
Fabric.
Switch/Split/Merge partition
Alter primary key
Truncate table
When there's DDL change, a complete data snapshot is restarted for the changed
table, and entire table data is reseeded into Fabric OneLake.
Currently, a table cannot be mirrored if it has the json data type.
Currently, you cannot ALTER a column to the json data type when a table is
mirrored.
Views and Materialized views aren't supported for mirroring.
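A minimal sketch of the max text repl size configuration mentioned above, using
standard SQL Server sp_configure syntax; whether this option is configurable on your
managed instance is an assumption you should verify:

SQL

-- -1 removes the limit on replicated LOB size.
EXEC sp_configure 'max text repl size (B)', -1;
RECONFIGURE;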

Column level
If the source table contains computed columns, these columns can't be mirrored to
Fabric OneLake.
If the source table contains columns with one of these data types, these columns
can't be mirrored to Fabric OneLake. The following data types are unsupported for
mirroring:
image
text/ntext
xml
json
rowversion/timestamp
sql_variant
User Defined Types (UDT)
geometry
geography
Column names for a SQL table can't contain spaces nor the following characters: ,
; { } ( ) \n \t = .

The following column level data definition language (DDL) operations aren't
supported on source tables when they're enabled for SQL Managed Instance
mirroring to Microsoft Fabric:
Alter column
Rename column ( sp_rename )

Mirrored item limitations


User needs to be a member of the Admin/Member role for the workspace to
create SQL Managed Instance mirroring.
Stopping mirroring disables mirroring completely.
Starting mirroring reseeds all the tables, effectively starting from scratch.
If Fabric capacity is stopped and then restarted, mirroring will stop working and
needs to be manually restarted. There won't be warnings/error messages
indicating that mirroring stopped working.

SQL analytics endpoint limitations


The SQL analytics endpoint is the same as the Lakehouse SQL analytics endpoint.
It's the same read-only experience. See SQL analytics endpoint limitations.
Source schema hierarchy isn't replicated to the mirrored database. Instead, source
schema is flattened, and schema name is encoded into the mirrored database table
name.

Fabric regions that support Mirroring


The following are the Fabric regions that support Mirroring for Azure SQL Managed
Instance:

Asia Pacific:
Australia East
Australia Southeast
Central India
East Asia
Japan East
Korea Central
Southeast Asia
South India

Europe
North Europe
West Europe
France Central
Germany West Central
Norway East
Sweden Central
Switzerland North
Switzerland West
UK South
UK West

Americas:
Brazil South
Canada Central
Canada East
East US 2
West US 2

Middle East and Africa:


South Africa North
UAE North
Next step
Tutorial: Configure Microsoft Fabric mirrored databases from Azure SQL Managed
Instance (Preview)

Related content
Monitor Fabric mirrored database replication
Model data in the default Power BI semantic model in Microsoft Fabric



Mirroring Azure Cosmos DB (Preview)
Article • 11/19/2024

Mirroring in Microsoft Fabric provides a seamless no-ETL experience to integrate your
existing Azure Cosmos DB data with the rest of your data in Microsoft Fabric. Your Azure
Cosmos DB data is continuously replicated directly into Fabric OneLake in near real-
time, without any performance impact on your transactional workloads or consuming
Request Units (RUs).

Data in OneLake is stored in the open-source delta format and automatically made
available to all analytical engines on Fabric.

You can use built-in Power BI capabilities to access data in OneLake in DirectLake mode.
With Copilot enhancements in Fabric, you can use the power of generative AI to get key
insights on your business data. In addition to Power BI, you can use T-SQL to run
complex aggregate queries or use Spark for data exploration. You can seamlessly access
the data in notebooks and use data science to build machine learning models.

) Important

Mirroring for Azure Cosmos DB is currently in preview. Production workloads aren't
supported during preview. Currently, only Azure Cosmos DB for NoSQL accounts
are supported.

Why use mirroring in Fabric?


With Mirroring in Fabric, you don't need to piece together different services from
multiple vendors. Instead, you can enjoy a highly integrated, end-to-end, and easy-to-
use product that is designed to simplify your analytics needs and built for openness.

If you're looking for BI reporting or analytics on your operational data in Azure Cosmos
DB, mirroring provides:

No-ETL, cost-effective near real-time access to your Azure Cosmos DB data
without affecting your request unit consumption
Ease of bringing data across various sources into Fabric OneLake
Delta table optimizations with v-order for lightning-fast reads
One-click integration with Power BI with Direct Lake and Copilot
Rich business insights by joining data across various sources
Richer app integration to access queries and views
OneLake data is stored in the open-source Delta Lake format, allowing you to use it with
various solutions within and outside of Microsoft. This data format helps make it easier
to build a single data estate for your analytical needs.

What analytics experiences are built in?


Mirrored databases are an item in Fabric Data Warehousing distinct from the
Warehouse and SQL analytics endpoint.

Every Mirrored Azure Cosmos DB database has three items you can interact with in your
Fabric workspace:

The mirrored database item. Mirroring manages the replication of data into
OneLake and conversion to Parquet, in an analytics-ready format. This enables
downstream scenarios like data engineering, data science, and more.
SQL analytics endpoint, which is automatically generated
Default semantic model, which is automatically generated

Mirrored database
The mirrored database shows the replication status and the controls to stop or start
replication in Fabric OneLake. You can also view your source database, in read-only
mode, using the Azure Cosmos DB data explorer. Using data explorer, you can view your
containers in your source Azure Cosmos DB database and query them. These operations
consume request units (RUs) from your Azure Cosmos DB account. Any changes to the
source database are reflected immediately in Fabric's source database view. Writing to
the source database isn't allowed from Fabric, as you can only view the data.
SQL analytics endpoint
Each mirrored database has an autogenerated SQL analytics endpoint that provides a
rich analytical experience on top of the OneLake's Delta tables created by the mirroring
process. You have access to familiar T-SQL commands that can define and query data
objects but not manipulate the data from the SQL analytics endpoint, as it's a read-only
copy.

You can perform the following actions in the SQL analytics endpoint:

Explore Delta Lake tables using T-SQL. Each table is mapped to a container from
your Azure Cosmos DB database.
Create no-code queries and views and explore them visually without writing a line
of code.
Join and query data in other mirrored databases, Warehouses, and Lakehouses in
the same workspace.
You can easily visualize and build BI reports based on SQL queries or views.

In addition to the SQL query editor, there's a broad ecosystem of tooling. These tools
include the mssql extension with Visual Studio Code, SQL Server Management Studio
(SSMS), and even GitHub Copilot. You can supercharge analysis and insights generation
from the tool of your choice.
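For example, a minimal sketch of the kind of T-SQL you might run against the SQL
analytics endpoint; the table name [dbo].[Orders] (mapped from a hypothetical Orders
container) and the column names are assumptions:

SQL

-- Query a table mapped from a source container.
SELECT TOP (100) *
FROM [dbo].[Orders];
GO

-- Defining objects such as views is allowed on the read-only endpoint.
CREATE VIEW dbo.vw_OrderCountsByStatus
AS
SELECT [order_status], COUNT(*) AS order_count
FROM [dbo].[Orders]
GROUP BY [order_status];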

Semantic model
The default semantic model is an automatically provisioned Power BI Semantic Model.
This feature enables business metrics to be created, shared, and reused. For more
information, see semantic models.

How does near real-time replication work?


When you enable mirroring on your Azure Cosmos DB database, insert, update, and
delete operations on your online transaction processing (OLTP) data continuously
replicate into Fabric OneLake for analytics consumption.

The continuous backup feature is a prerequisite for mirroring. You can enable either 7-
day or 30-day continuous backup on your Azure Cosmos DB account. If you are
enabling continuous backup specifically for mirroring, 7-day continuous backup is
recommended, as it is free of cost.

7 Note
Mirroring does not use Azure Cosmos DB's analytical store or change feed as a
change data capture source. You can continue to use these capabilities
independently, along with mirroring.

It could take a few minutes to replicate your Azure Cosmos DB Data into Fabric
OneLake. Depending on your data's initial snapshot or the frequency of
updates/deletes, replication could also take longer in some cases. Replication doesn't
affect the request units (RUs) you allocated for your transactional workloads.

What to expect from mirroring


There are a few considerations and supported scenarios you should consider before
mirroring.

Setup considerations
To mirror a database, it should already be provisioned in Azure. You must enable
continuous backup on the account as a prerequisite.

You can only mirror one database at a time; you choose which database to mirror.
You can mirror the same database multiple times within the same workspace. As a
best practice, a single copy of database can be reused across lakehouses,
warehouses, or other mirrored databases. You shouldn't need to set up multiple
mirrors to the same database.
You can also mirror the same database across different Fabric workspaces or
tenants.
Changes to Azure Cosmos DB containers, such as adding new containers and
deleting existing ones, are replicated seamlessly to Fabric. You can start mirroring
an empty database with no containers, for example, and mirroring seamlessly picks
up the containers added at a later point in time.

Support for nested data


Nested data is shown as a JSON string in SQL analytics endpoint tables. You can use
OPENJSON , CROSS APPLY , and OUTER APPLY in T-SQL queries or views to expand this data
selectively. If you're using Power Query, you can also apply the ToJson function to
expand this data.

7 Note
Fabric has a limitation for string columns of 8 KB in size. For more information, see
data warehouse limitations.
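For example, a minimal sketch that expands a nested address object; the table
([dbo].[OrdersDB_customers]), JSON string column ([address]), and property names are
hypothetical:

SQL

-- Expand a nested JSON object stored as a string into relational columns.
SELECT
    c.[id],
    addr.[city],
    addr.[zipCode]
FROM [dbo].[OrdersDB_customers] AS c
CROSS APPLY OPENJSON(c.[address])
WITH (
    [city]    VARCHAR(100) '$.city',
    [zipCode] VARCHAR(20)  '$.zipCode'
) AS addr;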

Handle schema changes


Mirroring automatically replicates properties across Azure Cosmos DB items, with
schema changes. Any new properties discovered in an item are shown as new columns,
and missing properties, if any, are represented as null in Fabric.

If you rename a property in an item, Fabric tables retain both the old and new columns.
The old column will show null and the new one will show the latest value, for any items
that are replicated after the renaming operation.

If you change the data type of a property in Azure Cosmos DB items, the changes are
supported for compatible data types that can be converted. If the data types aren't
compatible for conversion in Delta, they're represented as null values.

SQL analytics endpoint tables convert Delta data types to T-SQL data types.

Duplicate column names


Azure Cosmos DB supports case-insensitive column names, based on the JSON
standard. Mirroring supports these duplicate column names by adding _n to the
column name, where n would be a numeric value.

For example, if the Azure Cosmos DB item has addressName and AddressName as unique
properties, Fabric tables have corresponding addressName and AddressName_1 columns.
For more information, see replication limitations.

Security
Connections to your source database are based on account keys for your Azure Cosmos
DB accounts. If you rotate or regenerate the keys, you need to update the connections
to ensure replication works. For more information, see connections.

Account keys aren't directly visible to other Fabric users once the connection is set up.
You can limit who has access to the connections created in Fabric. Writes aren't
permitted to Azure Cosmos DB database either from the data explorer or analytics
endpoint in your mirrored database.
Mirroring doesn't currently support authentication using read-only account keys, single
sign-on (SSO) with Microsoft Entra ID and role-based access control, or managed
identities.

Once the data is replicated into Fabric OneLake, you need to secure access to this data.

Data protection features


Granular security can be configured in the mirrored database in Microsoft Fabric. For
more information, see granular permissions in Microsoft Fabric.

You can restrict access with column filters and predicate-based row filters on tables for
specific roles and users in Microsoft Fabric:

Row-level security in Fabric data warehousing


Column-level security in Fabric data warehousing

You can also mask sensitive data from non-admin users by using dynamic data masking:

Dynamic data masking in Fabric data warehousing

Network security
Currently, mirroring doesn't support private endpoints or customer managed keys
(CMK) on OneLake. Mirroring isn't supported for Azure Cosmos DB accounts with
network security configurations less permissive than all networks, using service
endpoints, using private endpoints, using IP addresses, or using any other settings that
could limit public network access to the account. Azure Cosmos DB accounts should be
open to all networks to work with mirroring.

Disaster recovery and replication latency


In Fabric, you can deploy content to data centers in regions other than the home region
of the Fabric tenant. For more information, see multi-geo support.

For an Azure Cosmos DB account with a primary write region and multiple read regions,
mirroring chooses the Azure Cosmos DB read region closest to the region where Fabric
capacity is configured. This selection helps provide low-latency replication for mirroring.

When you switch your Azure Cosmos DB account to a recovery region, mirroring
automatically selects the nearest Azure Cosmos DB region again.
7 Note

Mirroring does not support accounts with multiple write regions.

Your Cosmos DB data replicated to OneLake needs to be configured to handle
region-wide outages. For more information, see disaster recovery in OneLake.

Explore your data with mirroring


You can directly view and access mirrored data in OneLake. You can also seamlessly
access mirrored data without further data movement.

Learn more on how to access OneLake using ADLS Gen2 APIs or SDK, the OneLake File
explorer, and Azure Storage explorer.

You can connect to the SQL analytics endpoint from tools such as SQL Server
Management Studio (SSMS) or using drivers like Microsoft Open Database Connectivity
(ODBC) and Java Database Connectivity (JDBC). For more information, see SQL analytics
endpoint connectivity.

You can also access mirrored data with services such as:

Azure services like Azure Databricks, Azure HDInsight, or Azure Synapse Analytics
Fabric Lakehouse using shortcuts for data engineering and data science scenarios
Other mirrored databases or warehouses in the Fabric workspace

You can also build medallion architecture solutions, cleaning and transforming the data
that is landing into mirrored database as the bronze layer. For more information, see
medallion architecture support in Fabric.

Pricing
Mirroring is free of cost for compute used to replicate your Cosmos DB data into Fabric
OneLake. Storage in OneLake is free of cost based on certain conditions. For more
information, see OneLake pricing for mirroring. The compute usage for querying data
via SQL, Power BI, or Spark is still charged based on the Fabric capacity.

If you're using the data explorer in Fabric mirroring, you accrue typical costs based on
request unit (RU) usage to explore the containers and query the items in the source
Azure Cosmos DB database. The Azure Cosmos DB continuous backup feature is a
prerequisite to mirroring: Standard charges for continuous backup apply. There are no
additional charges for mirroring on continuous backup billing. For more information, see
Azure Cosmos DB pricing .

Next step
Tutorial: Configure Microsoft Fabric mirrored databases from Azure Cosmos DB
(Preview)

Related content
Limitations in Microsoft Fabric mirrored databases from Azure Cosmos DB
(Preview)
FAQ: Microsoft Fabric mirrored databases from Azure Cosmos DB



Tutorial: Configure Microsoft Fabric
mirrored database for Azure Cosmos DB
(Preview)
Article • 11/19/2024

In this tutorial, you configure a Fabric mirrored database from an existing Azure Cosmos
DB for NoSQL account.

Mirroring incrementally replicates Azure Cosmos DB data into Fabric OneLake in near
real-time, without affecting the performance of transactional workloads or consuming
Request Units (RUs). You can build Power BI reports directly on the data in OneLake,
using DirectLake mode. You can run ad hoc queries in SQL or Spark, build data models
using notebooks and use built-in Copilot and advanced AI capabilities in Fabric to
analyze the data.

) Important

Mirroring for Azure Cosmos DB is currently in preview. Production workloads aren't
supported during preview. Currently, only Azure Cosmos DB for NoSQL accounts
are supported.

Prerequisites
An existing Azure Cosmos DB for NoSQL account.
If you don't have an Azure subscription, Try Azure Cosmos DB for NoSQL free .
If you have an existing Azure subscription, create a new Azure Cosmos DB for
NoSQL account.
An existing Fabric capacity. If you don't have an existing capacity, start a Fabric
trial. Mirroring might not be available in some Fabric regions. For more
information, see supported regions.

 Tip

During the public preview, it's recommended to use a test or development copy of
your existing Azure Cosmos DB data that can be recovered quickly from a backup.
Configure your Azure Cosmos DB account
First, ensure that the source Azure Cosmos DB account is correctly configured to use
with Fabric mirroring.

1. Navigate to your Azure Cosmos DB account in the Azure portal .

2. Ensure that continuous backup is enabled. If not enabled, follow the guide at
migrate an existing Azure Cosmos DB account to continuous backup to enable
continuous backup. This feature might not be available in some scenarios. For
more information, see database and account limitations.

3. Ensure that the networking options are set to public network access for all
networks. If not, follow the guide at configure network access to an Azure Cosmos
DB account.

Create a mirrored database


Now, create a mirrored database that is the target of the replicated data. For more
information, see What to expect from mirroring.

1. Navigate to the Fabric portal home.

2. Open an existing workspace or create a new workspace.

3. In the navigation menu, select Create.

4. Select Create, locate the Data Warehouse section, and then select Mirrored Azure
Cosmos DB (Preview).

5. Provide a name for the mirrored database and then select Create.

Connect to the source database


Next, connect the source database to the mirrored database.

1. In the New connection section, select Azure Cosmos DB for NoSQL.

2. Provide credentials for the Azure Cosmos DB for NoSQL account including these
items:

Azure Cosmos DB endpoint: URL endpoint for the source account.
Connection name: Unique name for the connection.
Authentication kind: Select Account key.
Account Key: Read-write key for the source account.

3. Select Connect. Then, select a database to mirror.

7 Note

All containers in the database are mirrored.

Start mirroring process


1. Select Mirror database. Mirroring now begins.

2. Wait two to five minutes. Then, select Monitor replication to see the status of the
replication action.

3. After a few minutes, the status should change to Running, which indicates that the
containers are being synchronized.

 Tip
If you can't find the containers and the corresponding replication status, wait
a few seconds and then refresh the pane. In rare cases, you might receive
transient error messages. You can safely ignore them and continue to refresh.

4. When mirroring finishes the initial copying of the containers, a date appears in the
last refresh column. If data was successfully replicated, the total rows column
would contain the number of items replicated.

Monitor Fabric Mirroring


Now that your data is up and running, there are various analytics scenarios available
across all of Fabric.

1. Once Fabric Mirroring is configured, you're automatically navigated to the
Replication Status pane.

2. Here, monitor the current state of replication. For more information and details on
the replication states, see Monitor Fabric mirrored database replication.

Query the source database from Fabric


Use the Fabric portal to explore the data that already exists in your Azure Cosmos DB
account, querying your source Cosmos DB database.

1. Navigate to the mirrored database in the Fabric portal.

2. Select View, then Source database. This action opens the Azure Cosmos DB data
explorer with a read-only view of the source database.

3. Select a container, then open the context menu and select New SQL query.
4. Run any query. For example, use SELECT COUNT(1) FROM container to count the
number of items in the container.

7 Note

All the reads on source database are routed to Azure and will consume
Request Units (RUs) allocated on the account.

Analyze the target mirrored database


Now, use T-SQL to query your NoSQL data that is now stored in Fabric OneLake.

1. Navigate to the mirrored database in the Fabric portal.

2. Switch from Mirrored Azure Cosmos DB to SQL analytics endpoint.

3. Each container in the source database should be represented in the SQL analytics
endpoint as a warehouse table.

4. Select any table, open the context menu, then select New SQL Query, and finally
select Select Top 100.

5. The query executes and returns 100 records in the selected table.

6. Open the context menu for the same table and select New SQL Query. Write an
example query that uses aggregates like SUM , COUNT , MIN , or MAX . Join multiple
tables in the warehouse to execute the query across multiple containers.

7 Note

For example, this query would execute across multiple containers:

SQL
-- Table and column names are illustrative; substitute your own.
SELECT
    d.[product_category_name],
    t.[order_status],
    c.[customer_country],
    s.[seller_state],
    p.[payment_type],
    SUM(o.[price]) AS price,
    SUM(o.[freight_value]) AS freight_value
FROM
    [dbo].[OrdersDB_order_items] o
INNER JOIN
    [dbo].[OrdersDB_order_payments] p
    ON o.[order_id] = p.[order_id]
INNER JOIN
    [dbo].[OrdersDB_order_status] t
    ON o.[order_id] = t.[order_id]
INNER JOIN
    [dbo].[OrdersDB_customers] c
    ON t.[customer_id] = c.[customer_id]
INNER JOIN
    [dbo].[OrdersDB_productdirectory] d
    ON o.[product_id] = d.[product_id]
INNER JOIN
    [dbo].[OrdersDB_sellers] s
    ON o.[seller_id] = s.[seller_id]
GROUP BY
    d.[product_category_name],
    t.[order_status],
    c.[customer_country],
    s.[seller_state],
    p.[payment_type]

This example assumes the name of your table and columns. Use your own
table and columns when writing your SQL query.

7. Select the query and then select Save as view. Give the view a unique name. You
can access this view at any time from the Fabric portal.

8. Return back to the mirrored database in the Fabric portal.

9. Select New visual query. Use the query editor to build complex queries.

Build BI reports on the SQL queries or views


1. Select the query or view and then select Explore this data (preview). This action
explores the query in Power BI directly using Direct Lake on OneLake mirrored
data.
2. Edit the charts as needed and save the report.

 Tip

You can also optionally use Copilot or other enhancements to build dashboards
and reports without any further data movement.

More examples
Learn more about how to access and query mirrored Azure Cosmos DB data in Fabric:

How to: Query nested data in Microsoft Fabric mirrored databases from Azure
Cosmos DB (Preview)
How to: Access mirrored Azure Cosmos DB data in Lakehouse and notebooks from
Microsoft Fabric (Preview)
How to: Join mirrored Azure Cosmos DB data with other mirrored databases in
Microsoft Fabric (Preview)

Related content
Mirroring Azure Cosmos DB (Preview)
FAQ: Microsoft Fabric mirrored databases from Azure Cosmos DB


Frequently asked questions for
Microsoft Fabric mirrored
databases from Azure Cosmos DB
(Preview)
FAQ

This article answers frequently asked questions about Mirrored Azure Cosmos DB
database in Microsoft Fabric.

) Important

Mirroring for Azure Cosmos DB is currently in preview. Production workloads aren't
supported during preview. Currently, only Azure Cosmos DB for NoSQL accounts
are supported.

General questions
How is mirroring different from shortcuts in
relation to Azure Cosmos DB?
Mirroring replicates the source database into Fabric OneLake in open-source delta
format. You can run analytics on this data from anywhere in Fabric. Shortcuts don't
replicate the data into Fabric OneLake. Instead, shortcuts link to the source data without
data movement. Currently, Azure Cosmos DB is only available as a source for mirroring.

Does mirroring affect the performance of the source Azure Cosmos DB database?
No, mirroring doesn't affect the performance or cost of the source database. Mirroring
requires the continuous backup feature to be enabled on the source Azure Cosmos DB
account. Continuous backup enables replication without effect on transactional
workloads.
Is mirroring Azure Cosmos DB a functional
replacement for pipeline copy jobs in Fabric?
Mirroring is a low-latency replication of your data in Azure Cosmos DB. Unlike copy jobs,
mirroring creates a continuous and incremental copy of your Azure Cosmos DB data.
Mirroring doesn't affect your transactional workloads on the source database or
container.

In contrast, a copy job is a scheduled job, which can add end-to-end latency for
incremental jobs. Additionally, copy jobs require management to pick up
incremental changes, add compute costs in Fabric, and affect request unit
consumption on the source database in Azure Cosmos DB.

Copy jobs are useful for one-time copy jobs from Azure Cosmos DB, but mirroring is
ideal for tracking incremental changes.

Does trying the mirroring feature affect my Azure Cosmos DB account?
No, you can enable and disable mirroring without any effect to your source Azure
Cosmos DB account or data.

2 Warning

If you enable continuous backup on an Azure Cosmos DB account for mirroring into
Fabric, continuous backup can't be disabled. Similarly, you can't disable
analytical store for an Azure Cosmos DB account if continuous backup is enabled.

Pricing
What costs are associated with mirroring Azure
Cosmos DB?
Mirroring is in preview. There are currently no costs for compute used to replicate data
from Azure Cosmos DB to Fabric OneLake. Storage costs for OneLake are also free up to
certain limits. For more information, see OneLake pricing for mirroring . The compute
for querying data using SQL, Power BI, or Spark is charged at regular rates.
For Azure Cosmos DB, continuous backup is a prerequisite to mirroring. If you enabled
any continuous backup tier before mirroring, you don't accrue any extra cost. If you
enable continuous backup specifically for mirroring, 7-day backup mode is free of cost;
if you enable 30-day backup, you're billed the price associated with that feature. For
more information, see Azure Cosmos DB pricing .

If you use data explorer to view the source data from Azure Cosmos DB, you will accrue
costs based on Request Units (RU) usage.

How are egress fees handled for mirroring Azure Cosmos DB?
Egress fees are only charged if your Azure Cosmos DB account is in a different region
than your Fabric capacity. Fabric mirrors from the geographically closest Azure region to
Fabric's capacity region in scenarios where an Azure Cosmos DB account has multiple
read regions. For more information, see replication limitations.

Azure Synapse Link and analytical store


Is mirroring using Azure Cosmos DB's analytical
store?
No, mirroring doesn't use the analytical store. Mirroring doesn't affect your transactional
workloads or throughput consumption.

In Azure Cosmos DB, continuous backup is a prerequisite for mirroring. This prerequisite
allows Fabric to mirror your data without impacting your transactional workloads or
requiring the analytical store.

Is mirroring using Azure Synapse Link for Azure Cosmos DB?
No, mirroring in Fabric isn't related to Azure Synapse Link.

In Azure Cosmos DB, continuous backup is a prerequisite for mirroring. This prerequisite
allows Fabric to mirror your data without impacting your transactional workloads or
requiring the analytical store.
Does mirroring affect how Azure Synapse Link
works with Azure Cosmos DB?
No, mirroring in Fabric isn't related to Azure Synapse Link. You can continue to use
Azure Synapse Link while using Fabric mirroring.

Can I continue to use Azure Cosmos DB's analytical store as a change data capture
(CDC) source in Azure Data Factory while using mirroring?
Yes, you can use analytical store and Fabric mirroring on the same Azure Cosmos DB
account. These features work independently of each other. Mirroring doesn't interfere
with analytical store usage.

Can I continue to use Azure Cosmos DB's change feed while using mirroring?
Yes, you can use the change feed and Fabric mirroring on the same Azure Cosmos DB
account. These features work independently of each other. Mirroring doesn't interfere
with change feed usage.

Can I disable analytical store for my Azure Cosmos DB account after using mirroring?
Mirroring requires Azure Cosmos DB continuous backup as a prerequisite. Azure
Cosmos DB accounts with continuous backup enabled can't disable analytical store.
Once you disable analytical store on any collections, you cannot enable continuous
backup. This is a temporary limitation.

With mirroring, are you deprecating Azure Synapse Link for Azure Cosmos DB?
No, Azure Synapse Link and Azure Synapse Analytics are still available for your
workloads. There are no plans to deprecate these workloads. You can continue to use
Azure Synapse Link for your production workloads.
Data connections and authentication
How do I manage mirroring connections for
Azure Cosmos DB?
In the Fabric portal, select the Manage connections and gateways options within the
Settings section.

What authentication methods are allowed for Azure Cosmos DB accounts?
Only read-write account keys are supported.

Can I use single sign-on and role-based access control as authentication for mirroring
Azure Cosmos DB?
No, only read-write account keys are supported at this time.

Can I use managed identities as authentication for mirroring Azure Cosmos DB?
No, only read-write account keys are supported at this time.

What happens if I rotate my Azure Cosmos DB account keys?
You must update the connection credentials for Fabric mirroring if the account keys are
rotated. If you don't update the keys, mirroring fails. To resolve this failure, stop
replication, update the credentials with the newly rotated keys, and then restart
replication.

Setup
Can I select specific containers within an Azure
Cosmos DB database for mirroring?
No, when you mirror a database from Azure Cosmos DB, all containers are replicated
into Fabric OneLake.

Can I use mirroring to replicate a single Azure Cosmos DB database multiple times?
Yes, multiple mirrors are possible but unnecessary. Once the replicated data is in Fabric,
it can be shared to other destinations directly from Fabric.

Can I create shortcuts to my replica of Azure Cosmos DB data that I created using
mirroring?
No, mirroring doesn't support the creation of shortcuts to external sources like Azure
Data Lake Storage (ADLS) Gen2 or Amazon Web Services (AWS) Simple Storage Service
(S3).

Azure Cosmos DB data explorer


In Fabric, when I select "View" and "Source
database" am I seeing data in OneLake or in
Azure Cosmos DB?
The option in Fabric to view the source database provides a read-only view of the live
data in Azure Cosmos DB using the data explorer. This perspective is a real-time view of
the containers that are the source of the replicated data.

This view of the live data directly in the Fabric portal is a useful tool to determine if the
data in OneLake is recent or represented correctly when compared to the source Azure
Cosmos DB database. Operations using the data explorer on the live Azure Cosmos DB
data can accrue request unit consumption.

Analytics on Azure Cosmos DB data


How do I analyze Azure Cosmos DB data
mirrored into OneLake?
Use the Fabric portal to create a new SQL query against your SQL analytics endpoint.
From here, you can run common queries like SELECT TOP 100 * FROM ... .

Additionally, use Lakehouse to analyze the OneLake data along with other data. From
Lakehouse, you can utilize Spark to query data with notebooks.

How is data synced in mirroring for Azure Cosmos DB?
The syncing of the data is fully managed. When you enable mirroring, the data is
replicated into Fabric OneLake in near real-time and mirroring continuously replicates
new changes as they occur in the source database.

Does Azure Cosmos DB mirroring work across Azure and Fabric regions?
Mirroring is supported across regions but this scenario could result in unexpected
network data egress costs and latency. Ideally, match your Fabric capacity to one of your
Azure Cosmos DB account's regions. For more information, see replication limitations.

Is mirrored data for Azure Cosmos DB only available using the SQL analytics endpoint?
You can add existing mirrored databases as shortcuts in Lakehouse. From Lakehouse,
you can explore the data directly, open the data in a notebook for Spark queries, or
build machine learning models.

) Important

The shortcut in Lakehouse is a shortcut to the mirrored database, the OneLake
replica of the Azure Cosmos DB data. The shortcut in Lakehouse doesn't directly
access the Azure Cosmos DB account or data.

How long does initial replication of Azure Cosmos DB data take?
The latency of initial and continuous replication varies based on the volume of data. In
most cases, latency can be a few minutes but it can be longer for large volumes of data.
How long does it take to replicate Azure Cosmos
DB insert, update, and delete operations?
Once the initial data is replicated, individual operations are replicated in near real-time.
In rare cases, there can be a small delay if the source database has a high volume of
update and delete operations within a time window.

Does mirroring have built-in backoff logic with Azure Cosmos DB?
No, mirroring doesn't have built-in backoff logic as replication is continuous and
incremental.

Does mirroring support the change data feed from Azure Cosmos DB?
No, mirroring doesn't currently support the change data feed on mirrored data from
Azure Cosmos DB.

Does mirroring support the medallion architecture for data replicated from Azure
Cosmos DB?
Mirroring doesn't have built-in support for the medallion architecture. You can configure
your own silver and gold layers with watermark logic and processing for transformations
and joins using pipelines or Spark.

Do Power BI reports use Direct Lake mode with mirrored data from Azure Cosmos DB?
Yes.

Does Azure Cosmos DB mirroring support nested data?
Yes, nested data is flattened in OneLake as a JSON string. Use OPENJSON , CROSS APPLY ,
and OUTER APPLY to flatten the data for view. For more information, see nested data.
Does Azure Cosmos DB mirroring support automatic flattening?
No, mirroring doesn't automatically flatten nested data. Methods are available for the
SQL analytics endpoint to work with nested JSON strings. For more information, see
nested data.

Should I be concerned about cold start performance with mirrored data from Azure
Cosmos DB?
No, in general SQL queries in Fabric don't experience cold start latency.

What happens if I delete the source Azure Cosmos DB database in Azure while it is
being mirrored?
Data Explorer and replication begin to fail in Fabric. The OneLake data remains as-is until
you delete the existing mirrored data.

After Azure Cosmos DB is mirrored, how do I connect the SQL analytics endpoint to
client tools or applications?
Connecting to the SQL analytics endpoint for mirrored data is similar to using the same
endpoint for any other item in Fabric. For more information, see connect to data
warehousing in Fabric.

How do I join Azure Cosmos DB mirrored data across databases?
Mirror each Azure Cosmos DB database independently. Then, add one of the SQL
analytics endpoints to the other as a mirrored database item. Next, use a SQL JOIN
query to perform queries across containers in distinct Azure Cosmos DB databases.
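A minimal sketch, assuming two mirrored databases named SalesMirror and
MarketingMirror in the same workspace and hypothetical table and column names;
cross-database queries use three-part names:

SQL

-- Join tables exposed by two mirrored databases in the same workspace.
SELECT
    o.[customer_id],
    o.[order_total],
    c.[campaign_name]
FROM [SalesMirror].[dbo].[Orders] AS o
INNER JOIN [MarketingMirror].[dbo].[Campaigns] AS c
    ON o.[campaign_id] = c.[campaign_id];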
How do I join Azure Cosmos DB mirrored data
with Azure SQL Database or Snowflake data?
Mirror the Azure Cosmos DB database. Then, mirror either the Azure SQL Database or
Snowflake data. Then, add one of the SQL analytics endpoints to the other as a mirrored
database item. Now, use a SQL JOIN query to perform queries across multiple data
services.

Replication actions
How can I stop or disable replication for a
mirrored Azure Cosmos DB database?
Stop replication by using the Fabric portal's stop replication option. This action
completely stops replication but not remove any data that already exists in OneLake.

How do I restart replication for a mirrored Azure Cosmos DB database?
Replication doesn't support the concepts of pause or resume. Stopping replication
completely halts replication and selecting restart replication in the Fabric portal starts
replication entirely from scratch. Restarting replication replaces the OneLake data with
the latest data instead of incrementally updating it.

Why can't I find an option to configure replication for a mirrored Azure Cosmos DB
database?
Mirroring for Azure Cosmos DB automatically mirrors all containers within the selected
database. Because of this nuance, the Fabric portal doesn't contain an option to
configure specific replication options for Azure Cosmos DB.

What does each replication status message mean for replicated Azure Cosmos DB data?
Optimally, you want the replication to have a status of Running. If the replication status
is Running with warning, the replication is successful but there's an issue that you
should resolve. A status of Stopping, Stopped, Failed, or Error indicates more serious
states that require intervention before replication can continue. For more information,
see Monitor Fabric mirroring.

Analytical time-to-live (TTL) or soft deletes
Are items deleted by Azure Cosmos DB's time-to-live (TTL) feature removed from the mirrored database?
Yes, data deleted using TTL is treated in the same way as data deleted using delete
operations in Azure Cosmos DB. The data is then deleted from the mirrored database.
Mirroring doesn't distinguish between these deletion modalities.

Can we configure soft-deletes for analytical data mirrored in Fabric from Azure Cosmos DB?
Delete operations are replicated immediately to OneLake. There's currently no way to
configure soft-deletes or analytical time-to-live (TTL).

Does Azure Cosmos DB mirroring support analytical time-to-live?
No, analytical time-to-live isn't supported.

Accessing OneLake data
Can I access OneLake files generated by Azure Cosmos DB mirroring directly?
Yes, you can access OneLake files directly using the file or storage explorers. You can
also use OneLake delta files in Databricks. For more information, see access Fabric data
directly using OneLake file explorer or integrate OneLake with Azure Databricks.

API support
Can I configure Azure Cosmos DB mirroring programmatically?
No, support for automated mirroring configuration is currently not available.

Is built-in continuous integration or deployment (CI/CD) available for Azure Cosmos DB mirroring?
No, support for built-in CI/CD is currently not available.

Security
Can you access an Azure Cosmos DB mirrored
database using Power BI Gateway or behind a
firewall?
No, this level of access is currently not supported.

Does Azure Cosmos DB mirroring support private endpoints?
No, private endpoints are currently not supported.

Does mirrored data from Azure Cosmos DB ever leave my Fabric tenant?
No, data remains in your Fabric tenant.

Is mirrored data from Azure Cosmos DB stored outside of my environment?
No, data is staged directly in your tenant's OneLake and isn't staged outside of your
environment.

Licensing
What are the licensing options for Azure Cosmos
DB mirroring?
Power BI Premium, Fabric Capacity, or Trial Capacity licensing is required to use
mirroring.

What license is required for a user to create and configure mirroring for Azure Cosmos DB data?
For information about licensing, see Fabric licenses.

What license is required for a user to consume mirrored data from Azure Cosmos DB?
For information about licensing, see Fabric licenses.

Related content
Overview of Microsoft Fabric mirrored databases from Azure Cosmos DB
Troubleshooting: Microsoft Fabric mirrored databases from Azure Cosmos DB
Limitations: Microsoft Fabric mirrored databases from Azure Cosmos DB



Limitations in Microsoft Fabric mirrored
databases from Azure Cosmos DB
(Preview)
Article • 11/20/2024

This article details the current limitations for Azure Cosmos DB accounts mirrored into
Microsoft Fabric. The limitation and quota details on this page are subject to change in
the future.

) Important

Mirroring for Azure Cosmos DB is currently in preview. Production workloads aren't supported during preview. Currently, only Azure Cosmos DB for NoSQL accounts are supported.

Availability
Mirroring is supported in a specific set of regions for Fabric and APIs for Azure Cosmos
DB.

Supported APIs
Mirroring is only available for the Azure Cosmos DB account types listed here.

API for NoSQL: Yes
API for MongoDB (RU-based): No
API for MongoDB (vCore-based): No
API for Apache Gremlin: No
API for Table: No
API for Apache Cassandra (RU-based): No
Managed Instance for Apache Cassandra: No


Supported regions
Here's a list of regions that support mirroring for Azure Cosmos DB:

Asia Pacific:
Australia East
Australia Southeast
Central India
East Asia
Japan East
Korea Central
Southeast Asia
South India

Europe
North Europe
West Europe
France Central
Germany West Central
Norway East
Sweden Central
Switzerland North
Switzerland West
UK South
UK West

Americas:
Brazil South
Canada Central
Canada East
Central US
East US
East US2
North Central US
West US
West US2

Middle East and Africa:


South Africa North
UAE North
Account and database limitations
You can enable mirroring only if the Azure Cosmos DB account is configured with
either 7-day or 30-day continuous backup.
All current limitations of the continuous backup feature in Azure Cosmos DB also
apply to Fabric mirroring.
These limitations include, but aren't limited to, the inability to disable continuous backup once enabled and the lack of support for multi-region write accounts. For more information, see Azure Cosmos DB continuous backup limitations.
You can enable both the analytical store and continuous backup features on the
same Azure Cosmos DB account.
You can't disable the analytical store feature on Azure Cosmos DB accounts with
continuous backup enabled.
You can't enable continuous backup on an Azure Cosmos DB account that
previously disabled the analytical store feature for a container.

Security limitations
Azure Cosmos DB read-write account keys are the only supported mechanism to
connect to the source account. Read-only account keys, managed identities, and
passwordless authentication with role-based access control aren't supported.
You must update the connection credentials for Fabric mirroring if the account
keys are rotated. If you don't update the keys, mirroring fails. To resolve this failure,
stop replication, update the credentials with the newly rotated keys, and then
restart replication.
Fabric users with access to the workspace automatically inherit access to the mirror
database. However, you can granularly control workspace and tenant level access
to manage access for users in your organization.
You can directly share the mirrored database in Fabric.

Permissions
If you only have viewer permissions in Fabric, you can't preview or query data in
the SQL analytics endpoint.
If you intend to use the data explorer, the Azure Cosmos DB data explorer doesn't
use the same permissions as Fabric. Requests to view and query data using the
data explorer are routed to Azure instead of Fabric.

Network security
The source Azure Cosmos DB account must enable public network access for all
networks.
Private endpoints aren't supported for Azure Cosmos DB accounts.
Network isolation using techniques and features like IP addresses or service endpoints isn't supported for Azure Cosmos DB accounts.
Data in OneLake doesn't support private endpoints, customer managed keys, or
double encryption.

Data explorer limitations


Fabric Data Explorer queries are read-only. You can view existing containers, view
items, and query items.
You can't create or delete containers using the data explorer in Fabric.
You can't insert, modify, or delete items using the data explorer in Fabric.
You can avoid sharing the source database by only sharing the SQL analytics
endpoint with other users for analytics.
You can't turn off the data explorer in a mirrored database.

Replication limitations
Mirroring doesn't support containers whose items have property names that include whitespace or wildcard characters. This limitation causes mirroring for the specific container to fail; other containers within the same database can still mirror successfully. If property names are updated to remove these invalid characters, you must configure a new mirror to the same database and container; you can't reuse the old mirror.
Fabric OneLake mirrors from the geographically closest Azure region to Fabric's
capacity region in scenarios where an Azure Cosmos DB account has multiple read
regions. In disaster recovery scenarios, mirroring automatically scans and picks up
new read regions as your read regions could potentially fail over and change.
Delete operations in the source container are immediately reflected in Fabric
OneLake using mirroring. Soft-delete operations using time-to-live (TTL) values
aren't supported.
Mirroring doesn't support custom partitioning.
Fabric has existing limitations with T-SQL. For more information, see T-SQL
limitations.

Schema and data changes


Deleting and adding a similar container replaces the data in the warehouse tables
with only the new container's data.
Changing the type of data in a property across multiple items causes the replicator to upcast the data where applicable. This behavior is in parity with the native delta experience. Any data that doesn't fit the supported criteria becomes a null type. For example, changing an array property to a string upcasts to a null type.
Adding new properties to items causes mirroring to seamlessly detect the new properties and add corresponding columns to the warehouse table. If item properties are removed or missing, they have a null value for the corresponding record.
Replicating data using mirroring doesn't have a full-fidelity or well-defined schema.
Mirroring automatically and continuously tracks property changes and data type
(when allowed).

Nested data
Nested JSON objects in Azure Cosmos DB items are represented as JSON strings in
warehouse tables.
Commands such as OPENJSON , CROSS APPLY , and OUTER APPLY are available to
expand JSON string data selectively.
PowerQuery includes ToJson to expand JSON string data selectively.
Mirroring doesn't have schema constraints on the level of nesting. For more
information, see Azure Cosmos DB analytical store schema constraints.

Data warehouse limitations


Warehouse can't handle JSON string columns greater than 8 KB in size. The error
message for this scenario is "JSON text is not properly formatted. Unexpected
character '"' is found at position".
Nested data represented as a JSON string in SQL analytics endpoint and
warehouse tables can commonly cause the column to increase to more than 8 KB
in size. Monitor the level of nesting and the amount of data if you receive this error message.

Mirrored item limitations


Enabling mirroring for an Azure Cosmos DB account in a workspace requires either
the admin or member role in your workspace.
Stopping replication disables mirroring completely.
Starting replication again reseeds all of the target warehouse tables. This operation
effectively starts mirroring from scratch.

Give feedback
If you would like to give feedback on current limitations, features, or issues, let us know at [email protected].

Related content
Mirroring Azure Cosmos DB (Preview)
FAQ: Microsoft Fabric mirrored databases from Azure Cosmos DB
Troubleshooting: Microsoft Fabric mirrored databases from Azure Cosmos DB



Troubleshoot Microsoft Fabric
mirrored databases from Azure
Cosmos DB (Preview)
FAQ

Mirroring in Microsoft Fabric provides a seamless no-ETL experience to integrate your existing Azure Cosmos DB data with the rest of your data in Fabric. Use the tips in this article to help troubleshoot problems that you might experience when you create a mirrored database for Azure Cosmos DB in Fabric.

) Important

Mirroring for Azure Cosmos DB is currently in preview. Production workloads aren't supported during preview. Currently, only Azure Cosmos DB for NoSQL accounts are supported.

Here's a list of common issues and relevant troubleshooting steps to follow if mirroring
an Azure Cosmos DB database to Microsoft Fabric isn't working as expected.

Mirroring is failing when loading the databases with an "SQLAPIendpoint" error. How do I resolve this error?
This error typically indicates that your Azure Cosmos DB account key is no longer valid
for the connection you selected. Once the connection credentials are updated with a
valid account key, set up mirroring again.

Fabric is unable to configure mirroring with an error indicating that my Azure Cosmos DB account doesn't have continuous backup. How do I fix this error?
Enable continuous backup for your Azure Cosmos DB account. For more information,
see create an Azure Cosmos DB account with continuous backup or migrate an existing
Azure Cosmos DB account to continuous backup.

Once the continuous backup feature is enabled, return to the Fabric mirroring setup and
continue with the remaining steps.

How do I know if my Azure Cosmos DB account has continuous backup enabled?
Using the Azure portal, you can check if the continuous backup feature is enabled by
locating the Point in Time Restore option in the resource menu for the Azure Cosmos
DB account. If this option isn't available, either the account doesn't have continuous
backup enabled, or the account is migrating to continuous backup.

Enabling continuous backup in my Azure Cosmos DB account is causing various errors. Can I still set up Fabric mirroring?
No. Continuous backup must be enabled for Azure Cosmos DB accounts that are intended to be a mirroring source.

If there's an error message when enabling continuous backup for an Azure Cosmos DB
account, the account might have limitations blocking the feature. For example, if you
previously deleted analytical store for the account, the account can't support continuous
backup. In this scenario, the only remaining option is to use a new Azure Cosmos DB
account for mirroring.

Why is replication not working and I'm getting "internal server error" when I select "monitor replication"?
Replication could be working and you're observing a transient error if Azure Cosmos DB
is throttling requests from Fabric. Additionally, there might be a limitation of mirroring
with Azure Cosmos DB causing this issue. For more information, see Azure Cosmos DB
mirroring limitations.

Refresh the Fabric portal and determine if the problem is automatically resolved. Also,
you can stop and start replication. If none of these options work, open a support ticket.

How can I be sure Fabric is replicating data from Azure Cosmos DB?
First, follow general troubleshooting steps for Fabric mirrored databases. For more
information, see troubleshooting.

In most cases, the Monitor replication option can provide further detail indicating
whether data is replicating to Fabric successfully. A common troubleshooting step is to
check if the last refreshed time is recent. If the time isn't recent, stop and then restart
replication as the next step. Note, "last refreshed time" is only updated if the source
database has changes since the time noted for replication. If the source database has no
updates, deletes or inserts, "last refreshed time" will not be updated.

The "monitor replication" pane includes


tables with no rows replicated after a
significant amount of time. Is replication
stuck?
Replication is likely stuck. Stop and restart replication as a first step. If this step doesn't
work, open a support ticket.

Why can't I find any tables in the SQL analytics endpoint?
First, refresh the Schemas and dbo node to determine if the tables are ready. Tables are
automatically loaded after they're ready. If no tables are ready after a significant amount
of time, use the Monitor replication pane to determine if any replication errors
occurred.
Why do my target warehouse tables only include the `_rid` column after replicating?
First, refresh the Schemas and dbo node to determine if the tables are ready. Tables are
automatically loaded after they're ready. If more columns aren't ready after a significant
amount of time, use the Monitor replication pane to determine if any replication errors
occurred.

I added new items to a container in my Azure Cosmos DB database. These items aren't included in the results of my SQL analytics endpoint queries. How do I know if replication is working?
The Monitor replication option can provide further detail indicating whether data is
replicating to Fabric successfully. A common troubleshooting step is to check if the last
refreshed time is recent. If the time isn't recent, stop and then restart replication as the
next step. If the time is recent, attempt your query again. Sometimes, there can be a
delay between data being inserted into Azure Cosmos DB and it being replicated and
available in Fabric.

If the data is still not available, use Lakehouse to create a shortcut and run a Spark query
from a notebook. Spark always shows the latest data. If the data is available in Spark but
not SQL analytics, open a support ticket.

If the data is also not available in Spark, there might be an unintended issue with
replication latency. Wait for some time and retry replication. If problems persist, open a
support ticket.

Why am I getting a "JSON text isn't


properly formatted. Unexpected
character '"' is found at position" error
message when running T-SQL queries
against my SQL analytics endpoint?
Data warehouse can't handle JSON string columns greater than 8 KB in size. Nested data
represented as a JSON string in SQL analytics endpoint or warehouse tables can
commonly cause the column to increase to more than 8 KB in size. Monitor the level of nesting and the amount of data if you receive this error message. For more information, see data warehouse limitations.
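
To narrow down which rows exceed the limit, a rough sketch, assuming a hypothetical table OrdersDB_TestC with a nested JSON column named items, is to check the stored size of the JSON strings:

SQL

SELECT TOP (10)
    id,
    DATALENGTH(items) AS items_size_in_bytes
FROM OrdersDB_TestC
ORDER BY items_size_in_bytes DESC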

Why am I getting an "Invalid column


name" error in the "monitor replication"
pane?
Mirroring doesn't support containers whose items have property names that include whitespace or wildcard characters. This limitation causes mirroring for the specific container to fail. Other containers within the same database can still mirror successfully. For more information, see replication limitations.

Related content
Overview of Microsoft Fabric mirrored databases from Azure Cosmos DB
FAQ: Microsoft Fabric mirrored databases from Azure Cosmos DB
Limitations: Microsoft Fabric mirrored databases from Azure Cosmos DB



How to: Join mirrored Azure Cosmos DB
data with other mirrored databases in
Microsoft Fabric (Preview)
Article • 11/19/2024

In this guide, you join two Azure Cosmos DB for NoSQL containers from separate databases using Fabric mirroring.

You can join data from Azure Cosmos DB with any other mirrored databases, warehouses, or lakehouses within the same Fabric workspace.

) Important

Mirroring for Azure Cosmos DB is currently in preview. Production workloads aren't supported during preview. Currently, only Azure Cosmos DB for NoSQL accounts are supported.

Prerequisites
An existing Azure Cosmos DB for NoSQL account.
If you don't have an Azure subscription, Try Azure Cosmos DB for NoSQL free .
If you have an existing Azure subscription, create a new Azure Cosmos DB for
NoSQL account.
An existing Fabric capacity. If you don't have an existing capacity, start a Fabric
trial.
The Azure Cosmos DB for NoSQL account must be configured for Fabric mirroring.
For more information, see account requirements.

 Tip

During the public preview, it's recommended to use a test or development copy of
your existing Azure Cosmos DB data that can be recovered quickly from a backup.

Set up mirroring and prerequisites


Configure mirroring for the Azure Cosmos DB for NoSQL database. If you're unsure how
to configure mirroring, refer to the configure mirrored database tutorial.
1. Navigate to the Fabric portal .

2. Create a new connection using your Azure Cosmos DB account's credentials.

3. Mirror the first database using the connection you configured.

4. Now, mirror the second database.

5. Wait for replication to finish the initial snapshot of data for both mirrors.

Create a query that joins databases


Now, use the SQL analytics endpoint to create a query across two mirrored database
items, without the need for data movement. Both items should be in the same
workspace.

1. Navigate to one of the mirrored databases in the Fabric portal.

2. Switch from Mirrored Azure Cosmos DB to SQL analytics endpoint.

3. In the menu, select + Warehouses. Select the SQL analytics endpoint item for the
other mirrored database.

4. Open the context menu for the table and select New SQL Query. Write an example
query that combines both databases.
For example, this query would execute across multiple containers and databases,
without any data movement. This example assumes the name of your table and
columns. Use your own table and columns when writing your SQL query.

SQL

SELECT
product_category_count = COUNT (product_category),
product_category
FROM
[StoreSalesDB].[dbo].[storeorders_Sql] as StoreSales
INNER JOIN
[dbo].[OrdersDB_order_status] as OrderStatus
ON StoreSales.order_id = OrderStatus.order_id
WHERE
order_status='delivered'
AND OrderStatus.order_month_year > '6/1/2022'
GROUP BY
product_category
ORDER BY
product_category_count desc

You can add data from more sources and query them seamlessly. Fabric simplifies
and eases bringing your organizational data together.
Related content
FAQ: Microsoft Fabric mirrored databases from Azure Cosmos DB
Troubleshooting: Microsoft Fabric mirrored databases from Azure Cosmos DB
Limitations in Microsoft Fabric mirrored databases from Azure Cosmos DB
(Preview)



How to: Query nested data in Microsoft
Fabric mirrored databases from Azure
Cosmos DB (Preview)
Article • 11/19/2024

Use the mirrored database in Microsoft Fabric to query nested JSON data sourced from
Azure Cosmos DB for NoSQL.

) Important

Mirroring for Azure Cosmos DB is currently in preview. Production workloads aren't supported during preview. Currently, only Azure Cosmos DB for NoSQL accounts are supported.

Prerequisites
An existing Azure Cosmos DB for NoSQL account.
If you don't have an Azure subscription, Try Azure Cosmos DB for NoSQL free .
If you have an existing Azure subscription, create a new Azure Cosmos DB for
NoSQL account.
An existing Fabric capacity. If you don't have an existing capacity, start a Fabric
trial.
The Azure Cosmos DB for NoSQL account must be configured for Fabric mirroring.
For more information, see account requirements.

 Tip

During the public preview, it's recommended to use a test or development copy of
your existing Azure Cosmos DB data that can be recovered quickly from a backup.

Create nested data within the source database


Create JSON items within your Azure Cosmos DB for NoSQL account that contain
varying levels of nested JSON data.

1. Navigate to your Azure Cosmos DB account in the Azure portal .


2. Select Data Explorer from the resource menu.

3. Use + New container to create a new container. For this guide, name the container
TestC . The corresponding database name is arbitrary.

4. Use the + New item option multiple times to create and save these five JSON
items.

JSON

{
"id": "123-abc-xyz",
"name": "A 13",
"country": "USA",
"items": [
{
"purchased": "11/23/2022",
"order_id": "3432-2333-2234-3434",
"item_description": "item1"
},
{
"purchased": "01/20/2023",
"order_id": "3431-3454-1231-8080",
"item_description": "item2"
},
{
"purchased": "02/20/2023",
"order_id": "2322-2435-4354-2324",
"item_description": "item3"
}
]
}

JSON

{
"id": "343-abc-def",
"name": "B 22",
"country": "USA",
"items": [
{
"purchased": "01/20/2023",
"order_id": "2431-2322-1545-2322",
"item_description": "book1"
},
{
"purchased": "01/21/2023",
"order_id": "3498-3433-2322-2320",
"item_description": "book2"
},
{
"purchased": "01/24/2023",
"order_id": "9794-8858-7578-9899",
"item_description": "book3"
}
]
}

JSON

{
"id": "232-abc-x43",
"name": "C 13",
"country": "USA",
"items": [
{
"purchased": "04/03/2023",
"order_id": "9982-2322-4545-3546",
"item_description": "clothing1"
},
{
"purchased": "05/20/2023",
"order_id": "7989-9989-8688-3446",
"item_description": "clothing2"
},
{
"purchased": "05/27/2023",
"order_id": "9898-2322-1134-2322",
"item_description": "clothing3"
}
]
}

JSON

{
"id": "677-abc-yuu",
"name": "D 78",
"country": "USA"
}

JSON

{
"id": "979-abc-dfd",
"name": "E 45",
"country": "USA"
}
Set up mirroring and prerequisites
Configure mirroring for the Azure Cosmos DB for NoSQL database. If you're unsure how
to configure mirroring, refer to the configure mirrored database tutorial.

1. Navigate to the Fabric portal .

2. Create a new connection and mirrored database using your Azure Cosmos DB
account's credentials.

3. Wait for replication to finish the initial snapshot of data.

Query basic nested data


Now, use the SQL analytics endpoint to create a query that can handle simple nested
JSON data.

1. Navigate to the mirrored database in the Fabric portal.

2. Switch from Mirrored Azure Cosmos DB to SQL analytics endpoint.

3. Open the context menu for the test table and select New SQL Query.

4. Run this query to expand on the items array with OPENJSON . This query uses OUTER
APPLY to include extra items that might not have an items array.

SQL

SELECT
t.name,
t.id,
t.country,
P.purchased,
P.order_id,
P.item_description
FROM OrdersDB_TestC AS t
OUTER APPLY OPENJSON(t.items) WITH
(
purchased datetime '$.purchased',
order_id varchar(100) '$.order_id',
item_description varchar(200) '$.item_description'
) as P

 Tip

When choosing the data types in OPENJSON , using varchar(max) for string
types could worsen query performance. Instead, use varchar(n) where n could
be any number. The lower n is, the more likely you will see better query
performance.

5. Use CROSS APPLY in the next query to only show items with an items array.

SQL

SELECT
t.name,
t.id,
t.country,
P.purchased,
P.order_id,
P.item_description
FROM
OrdersDB_TestC as t CROSS APPLY OPENJSON(t.items) WITH (
purchased datetime '$.purchased',
order_id varchar(100) '$.order_id',
item_description varchar(200) '$.item_description'
) as P

Create deeply nested data


To build on this nested data example, let's add a deeply nested data example.

1. Navigate to your Azure Cosmos DB account in the Azure portal .

2. Select Data Explorer from the resource menu.

3. Use + New container to create a new container. For this guide, name the container
TestD . The corresponding database name is arbitrary.

4. Use the + New item option multiple times to create and Save this JSON item.

JSON
{
"id": "eadca09b-e618-4090-a25d-b424a26c2361",
"entityType": "Package",
"packages": [
{
"packageid": "fiwewsb-f342-jofd-a231-c2321",
"storageTemperature": "69",
"highValue": true,
"items": [
{
"id": "1",
"name": "Item1",
"properties": {
"weight": "2",
"isFragile": "no"
}
},
{
"id": "2",
"name": "Item2",
"properties": {
"weight": "4",
"isFragile": "yes"
}
}
]
},
{
"packageid": "d24343-dfdw-retd-x414-f34345",
"storageTemperature": "78",
"highValue": false,
"items": [
{
"id": "3",
"name": "Item3",
"properties": {
"weight": "12",
"isFragile": "no"
}
},
{
"id": "4",
"name": "Item4",
"properties": {
"weight": "12",
"isFragile": "no"
}
}
]
}
],
"consignment": {
"consignmentId": "ae21ebc2-8cfc-4566-bf07-b71cdfb37fb2",
"customer": "Humongous Insurance",
"deliveryDueDate": "2020-11-08T23:38:50.875258Z"
}
}

Query deeply nested data


Finally, create a T-SQL query that can find data deeply nested in a JSON string.

1. Open the context menu for the TestD table and select New SQL Query again.

2. Run this query to expand all levels of nested data using OUTER APPLY with
consignment.

SQL

SELECT
P.id,
R.packageId,
R.storageTemperature,
R.highValue,
G.id,
G.name,
H.weight,
H.isFragile,
Q.consignmentId,
Q.customer,
Q.deliveryDueDate
FROM
OrdersDB_TestD as P CROSS APPLY OPENJSON(P.packages) WITH (
packageId varchar(100) '$.packageid',
storageTemperature INT '$.storageTemperature',
highValue varchar(100) '$.highValue',
items nvarchar(MAX) AS JSON ) as R
OUTER APPLY OPENJSON (R.items) WITH (
id varchar(100) '$.id',
name varchar(100) '$.name',
properties nvarchar(MAX) as JSON
) as G OUTER APPLY OPENJSON(G.properties) WITH (
weight INT '$.weight',
isFragile varchar(100) '$.isFragile'
) as H OUTER APPLY OPENJSON(P.consignment) WITH (
consignmentId varchar(200) '$.consignmentId',
customer varchar(100) '$.customer',
deliveryDueDate Date '$.deliveryDueDate'
) as Q

7 Note
When expanding packages , items is represented as JSON, which can optionally be expanded. The items property has sub-properties represented as JSON, which can also optionally be expanded.

3. Finally, run a query that chooses when to expand specific levels of nesting.

SQL

SELECT
P.id,
R.packageId,
R.storageTemperature,
R.highValue,
R.items,
Q.consignmentId,
Q.customer,
Q.deliveryDueDate
FROM
OrdersDB_TestD as P CROSS APPLY OPENJSON(P.packages) WITH (
packageId varchar(100) '$.packageid',
storageTemperature INT '$.storageTemperature',
highValue varchar(100) '$.highValue',
items nvarchar(MAX) AS JSON
) as R
OUTER APPLY OPENJSON(P.consignment) WITH (
consignmentId varchar(200) '$.consignmentId',
customer varchar(100) '$.customer',
deliveryDueDate Date '$.deliveryDueDate'
) as Q

7 Note

Property limits for nested levels are not enforced in this T-SQL query
experience.

Related content
FAQ: Microsoft Fabric mirrored databases from Azure Cosmos DB
Troubleshooting: Microsoft Fabric mirrored databases from Azure Cosmos DB
Limitations in Microsoft Fabric mirrored databases from Azure Cosmos DB
(Preview)



How to: Access mirrored Azure Cosmos
DB data in Lakehouse and notebooks
from Microsoft Fabric (Preview)
Article • 11/19/2024

In this guide, you learn how to access mirrored Azure Cosmos DB data in Lakehouse and notebooks from Microsoft Fabric (Preview).

) Important

Mirroring for Azure Cosmos DB is currently in preview. Production workloads aren't supported during preview. Currently, only Azure Cosmos DB for NoSQL accounts are supported.

Prerequisites
An existing Azure Cosmos DB for NoSQL account.
If you don't have an Azure subscription, Try Azure Cosmos DB for NoSQL free .
If you have an existing Azure subscription, create a new Azure Cosmos DB for
NoSQL account.
An existing Fabric capacity. If you don't have an existing capacity, start a Fabric
trial.
The Azure Cosmos DB for NoSQL account must be configured for Fabric mirroring.
For more information, see account requirements.

 Tip

During the public preview, it's recommended to use a test or development copy of
your existing Azure Cosmos DB data that can be recovered quickly from a backup.

Set up mirroring and prerequisites


Configure mirroring for the Azure Cosmos DB for NoSQL database. If you're unsure how
to configure mirroring, refer to the configure mirrored database tutorial.

1. Navigate to the Fabric portal .


2. Create a new connection and mirrored database using your Azure Cosmos DB
account's credentials.

3. Wait for replication to finish the initial snapshot of data.

Access mirrored data in Lakehouse and notebooks
Use Lakehouse to further extend the number of tools you can use to analyze your Azure
Cosmos DB for NoSQL mirrored data. Here, you use Lakehouse to build a Spark
notebook to query your data.

1. Navigate to the Fabric portal home again.

2. In the navigation menu, select Create.

3. Select Create, locate the Data Engineering section, and then select Lakehouse.

4. Provide a name for the Lakehouse and then select Create.

5. Now select Get Data, and then New shortcut. From the list of shortcut options,
select Microsoft OneLake.

6. Select the mirrored Azure Cosmos DB for NoSQL database from the list of mirrored
databases in your Fabric workspace. Select the tables to use with Lakehouse, select
Next, and then select Create.

7. Open the context menu for the table in Lakehouse and select New or existing
notebook.

8. A new notebook automatically opens and loads a dataframe using SELECT LIMIT
1000 .

9. Run queries like SELECT * using Spark.

Python

df = spark.sql("SELECT * FROM Lakehouse.OrdersDB_customers LIMIT 1000")


display(df)

7 Note

This example assumes the name of your table. Use your own table when
writing your Spark query.

Write back using Spark


Finally, you can use Spark and Python code to write data back to your source Azure
Cosmos DB account from notebooks in Fabric. You might want to do this to write back
analytical results to Cosmos DB, which can then be used as a serving layer for OLTP applications.

1. Create four code cells within your notebook.

2. First, query your mirrored data.

Python

dfMirror = spark.sql("SELECT * FROM Lakehouse1.OrdersDB_ordercatalog")

 Tip

The table names in these sample code blocks assume a certain data schema.
Feel free to replace this with your own table and column names.

3. Now transform and aggregate the data.

Python
# Import the Spark max aggregate function so it isn't shadowed by Python's built-in max
from pyspark.sql.functions import max

dfCDB = dfMirror.filter(dfMirror.categoryId.isNotNull()).groupBy("categoryId").agg(max("price").alias("max_price"), max("id").alias("id"))

4. Next, configure Spark to write back to your Azure Cosmos DB for NoSQL account
using your credentials, database name, and container name.

Python

writeConfig = {
"spark.cosmos.accountEndpoint" :
"https://xxxx.documents.azure.com:443/",
"spark.cosmos.accountKey" : "xxxx",
"spark.cosmos.database" : "xxxx",
"spark.cosmos.container" : "xxxx"
}

5. Finally, use Spark to write back to the source database.

Python

dfCDB.write.mode("APPEND").format("cosmos.oltp").options(**writeConfig).save()

6. Run all of the code cells.

) Important

Write operations to Azure Cosmos DB will consume request units (RUs).

Related content
FAQ: Microsoft Fabric mirrored databases from Azure Cosmos DB
Troubleshooting: Microsoft Fabric mirrored databases from Azure Cosmos DB
Limitations in Microsoft Fabric mirrored databases from Azure Cosmos DB
(Preview)

Mirroring Azure Databricks Unity
Catalog (Preview)
Article • 11/20/2024

Many organizations today register their data in Unity Catalog within Azure Databricks. A
mirrored Unity Catalog in Fabric enables customers to read data managed by Unity
Catalog from Fabric workloads. Azure Databricks and Fabric are better together.

For a tutorial on configuring your Azure Databricks Workspace for mirroring the Unity
Catalog into Fabric, see Tutorial: Configure Microsoft Fabric mirrored databases from
Azure Databricks (Preview).

Mirrored databases in Fabric give you a highly integrated, end-to-end, and easy-to-use product designed to simplify your analytics needs, built for openness and collaboration between Microsoft Fabric and Azure Databricks.

When you use Fabric to read data that is registered in Unity Catalog, there is no data
movement or data replication. Only the Azure Databricks catalog structure is mirrored to
Fabric and the underlying catalog data is accessed through shortcuts. Hence any
changes in data are reflected immediately in Fabric.

What analytics experiences are built in


Mirrored catalogs are an item in Fabric Data Warehousing distinct from the Warehouse
and SQL analytics endpoint.

When you mirror an Azure Databricks Unity Catalog, Fabric creates three items:

Mirrored Azure Databricks item


A SQL analytics endpoint on a Lakehouse
A default semantic model

You can access your mirrored Azure Databricks data multiple ways:

Each mirrored Azure Databricks item has an autogenerated SQL analytics endpoint
that provides a rich analytical experience created by the mirroring process. Use T-
SQL commands to define and query data objects from the read-only SQL analytics
endpoint.
Use Power BI with Direct Lake mode to create reports against the Azure Databricks
item.
Metadata sync
When you create a new mirrored database from Azure Databricks in Fabric, by default,
the Automatically sync future catalog changes for the selected schema is enabled. The
following metadata changes are reflected from your Azure Databricks workspace to
Fabric if automatic sync is enabled:

Addition of schemas to a catalog.


Deletion of schemas from a catalog.
Addition of tables to a schema.
Deletion of tables from a schema.

Schema/table selection:

By default, the entire catalog is selected when the user adds the catalog.
The user can exclude certain tables within the schema.
Unselecting a schema unselects all the tables within the schema.
If the user goes back and selects the schema, all tables within the schema are
selected again.
Same selection behavior applies to schemas within a catalog.

There are other filtration conditions that are applied to catalogs/schemas/tables:

Materialized views and streaming tables will not be displayed.


External tables that don't support Delta format will not be displayed.

Related content
Tutorial: Configure Microsoft Fabric mirrored databases from Azure Databricks
(Preview)
Secure Fabric mirrored databases from Azure Databricks
Limitations in Microsoft Fabric mirrored databases from Azure Databricks (Preview)
Review the FAQ



Tutorial: Configure Microsoft Fabric
mirrored databases from Azure
Databricks (Preview)
Article • 11/19/2024

Database mirroring in Microsoft Fabric is an enterprise, cloud-based, zero-ETL, SaaS technology. This guide helps you establish a mirrored database from Azure Databricks, which creates a read-only, continuously replicated copy of your Azure Databricks data in OneLake.

Prerequisites
Create or use an existing Azure Databricks workspace with Unity Catalog enabled.
You must have the EXTERNAL USE SCHEMA privilege on the schema in Unity Catalog
that contains the tables that will be accessed from Fabric. For more information,
see Control external access to data in Unity Catalog.
Turn on the tenant setting "Mirrored Azure Databricks Catalog (Preview)" at the
tenant or capacity level for this feature.
You need to use Fabric's permissions model to set access controls for catalogs,
schemas, and tables in Fabric.
Azure Databricks workspaces shouldn't be behind a private endpoint.
Storage accounts containing Unity Catalog data can't be behind a firewall.

Create a mirrored database from Azure Databricks
Follow these steps to create a new mirrored database from your Azure Databricks Unity
Catalog.

1. Navigate to https://powerbi.com .

2. Select + New and then Mirrored Azure Databricks catalog.


3. Select an existing connection if you have one configured.

If you don't have an existing connection, create a new connection and enter
all the details. You can authenticate to your Azure Databricks workspace
using 'Organizational account' or "Service principal". To create a connection,
you must be either a user or an admin of the Azure Databricks workspace.

4. Once you connect to an Azure Databricks workspace, on the Choose tables from a
Databricks catalog page, you're able to select the catalog, schemas, and tables via
the inclusion/exclusion list that you want to add and access from Microsoft Fabric.
Pick the catalog and its related schemas and tables that you want to add to your
Fabric workspace.

You can only see the catalogs, schemas, and tables that you have access to, according to the privileges granted to you under the privilege model described at Unity Catalog privileges and securable objects.
By default, the Automatically sync future catalog changes for the selected
schema is enabled. For more information, see Mirroring Azure Databricks
Unity Catalog (Preview).
When you have made your selections, select Next.

5. By default, the name of the item will be the name of the catalog you're trying to
add to Fabric. On the Review and create page, you can review the details and
optionally change the mirrored database item name, which must be unique in your
workspace. Select Create.

6. A Databricks catalog item is created, and for each table, a corresponding Databricks type shortcut is also created.

Schemas that don't have any tables won't be shown.

7. You can also see a preview of the data when you access a shortcut by selecting the
SQL analytics endpoint. Open the SQL analytics endpoint item to launch the
Explorer and Query editor page. You can query your mirrored Azure Databricks
tables with T-SQL in the SQL Editor.

Create Lakehouse shortcuts to the Databricks catalog item
You can also create shortcuts from your Lakehouse to your Databricks catalog item to
use your Lakehouse data and use Spark Notebooks.

1. First, we create a lakehouse. If you already have a lakehouse in this workspace, you
can use an existing lakehouse.
a. Select your workspace in the navigation menu.
b. Select + New > Lakehouse.
c. Provide a name for your lakehouse in the Name field, and select Create.
2. In the Explorer view of your lakehouse, in the Get data in your lakehouse menu,
under Load data in your lakehouse, select the New shortcut button.
3. Select Microsoft OneLake. Select a catalog. This is the data item that you created
in the previous steps. Then select Next.
4. Select tables within the schema, and select Next.
5. Select Create.
6. Shortcuts are now available in your Lakehouse to use with your other Lakehouse
data. You can also use Notebooks and Spark to perform data processing on the
data for these catalog tables that you added from your Azure Databricks
workspace.

Create a Semantic Model

 Tip

For the best experience, it's recommended that you use Microsoft Edge Browser for
Semantic Modeling Tasks.

Learn more about the default Power BI semantic model.

In addition to the default Power BI semantic model, you can update the default semantic model to add or remove tables from the model, or create a new semantic model. To update the default semantic model:

1. Navigate to your Mirrored Azure Databricks item in your workspace.


2. Select the SQL analytics endpoint from the dropdown list in the toolbar.
3. Under Reporting, select Manage default semantic model.

Manage your semantic model relationships


1. Select Model Layouts from the Explorer in your workspace.
2. Once Model layouts are selected, you are presented with a graphic of the tables
that were included as part of the Semantic Model.
3. To create relationships between tables, drag a column name from one table to
another column name of another table. A popup is presented to identify the
relationship and cardinality for the tables.

Related content
Secure Fabric mirrored databases from Azure Databricks

Limitations in Microsoft Fabric mirrored databases from Azure Databricks (Preview)

Review the FAQ

Mirroring Azure Databricks Unity Catalog (Preview)



Secure Fabric mirrored databases from
Azure Databricks
Article • 11/20/2024

This article helps you establish data security in your mirrored Azure Databricks in
Microsoft Fabric.

Unity Catalog
Users must reconfigure Unity Catalog policies and permissions in Fabric.

To allow Azure Databricks Catalogs to be available in Fabric, see Control external access
to data in Unity Catalog.

Unity Catalog policies and permissions aren't mirrored in Fabric, and you can't reuse them there. Permissions set on catalogs, schemas, and tables inside Azure Databricks don't carry over to Fabric workspaces. You need to use Fabric's permission model to set access control on objects in Fabric.

The credential used to create the Unity Catalog connection for this mirrored catalog is used for all data queries.

Permissions
Permissions set on catalogs, schemas, and tables in your Azure Databricks workspace
can't be replicated to your Fabric workspace. Use Fabric's permissions model to set
access controls for catalogs, schemas, and tables in Fabric.

When selecting objects to mirror, you can only see the catalogs, schemas, and tables that you have access to, according to the privileges granted to you under the privilege model described at Unity Catalog privileges and securable objects.

For more information on setting up Fabric Workspace security, see the Permission model
and Roles in workspaces in Microsoft Fabric.

Related content
Tutorial: Configure Microsoft Fabric mirrored databases from Azure Databricks
(Preview)
Limitations in Microsoft Fabric mirrored databases from Azure Databricks (Preview)
Review the FAQ
Mirroring Azure Databricks Unity Catalog (Preview)



Limitations in Microsoft Fabric mirrored
databases from Azure Databricks
(Preview)
Article • 11/20/2024

This article lists current limitations with mirrored Azure Databricks in Microsoft Fabric.

Network
Azure Databricks workspaces shouldn't be behind a private endpoint.
Storage accounts containing Unity Catalog data can't be behind a firewall.
Azure Databricks IP Access lists aren't supported.

Supported Spark Versions


The Fabric runtime version needs to be at least Spark 3.4 with Delta 2.4. Verify this in Workspace Settings > Data Engineering/Science > Spark Settings > Environment tab.

Limitations
Mirrored Azure Databricks item doesn't support renaming schema, table, or both
when added to the inclusion or exclusion list.
Azure Databricks workspaces shouldn't be behind a private endpoint.
Azure Data Lake Storage Gen 2 account that is utilized by your Azure Databricks
workspace must also be accessible to Fabric.

The following table types are not supported:

Tables with RLS/CLM policies


Lakehouse federated tables
Delta sharing tables
Streaming tables
Views, Materialized views

Supported regions
Here's a list of regions that support mirroring for Azure Databricks Catalog:
Asia Pacific:
Australia East
Australia Southeast
Central India
East Asia
Japan East
Japan West
Korea Central
Southeast Asia
South India

Europe
North Europe
West Europe
France Central
Germany North
Germany West Central
Norway East
Norway West
Sweden Central
Switzerland North
Switzerland West
Poland Central
Italy North
UK South
UK West

Americas:
Brazil South
Canada Central
Canada East
Central US
East US
East US2
North Central US
West US

Middle East and Africa:


South Africa North
South Africa West
UAE North
Related content
Tutorial: Configure Microsoft Fabric mirrored databases from Azure Databricks
(Preview)
Secure Fabric mirrored databases from Azure Databricks
Review the FAQ
Mirroring Azure Databricks Unity Catalog (Preview)



Frequently asked questions for
mirrored databases from Azure
Databricks (preview) in Microsoft
Fabric (Preview)
FAQ

This article answers frequently asked questions about mirrored databases from Azure
Databricks (preview) in Microsoft Fabric.

I am unable to create a new connection to my Databricks Workspace using the Connect to Azure Databricks Workspace wizard.
Try these steps to resolve a connection issue.

1. Select the Settings button in your workspace.


2. Locate and select the Manage connections and gateways.
3. Select New in the workspace.
4. Select Cloud Object.
5. Provide a Connection name.
6. For Connection type, search for or type "Azure Databricks workspace".
7. Paste your Databricks workspace URL into the URL field.
8. Select Edit credentials and select your auth ID.
9. Select Create.
10. You can now go back to the workspace landing page, select New Item+ from the workspace, and then select Mirrored Azure Databricks catalog.
11. This returns you to the wizard. You can now choose the Existing connection option and select the connection you created.

What if I cannot perform an action because there is already an existing sync in progress?
In the event the catalog is performing a metadata sync and Manage catalog is selected,
you could be blocked from modifying the catalog until the sync is complete. Wait for
the sync to complete and try Manage catalog again.

Related content
What is Mirroring in Fabric?
Mirroring Azure Databricks (Preview) Tutorial
Secure Fabric mirrored databases from Azure Databricks
Limitations
Mirroring Azure Databricks Unity Catalog (Preview)



Mirroring Snowflake in Microsoft Fabric
Article • 11/19/2024

Mirroring in Fabric provides an easy experience to avoid complex ETL (Extract Transform
Load) and integrate your existing Snowflake warehouse data with the rest of your data
in Microsoft Fabric. You can continuously replicate your existing Snowflake data directly
into Fabric's OneLake. Inside Fabric, you can unlock powerful business intelligence,
artificial intelligence, Data Engineering, Data Science, and data sharing scenarios.

For a tutorial on configuring your Snowflake database for Mirroring in Fabric, see
Tutorial: Configure Microsoft Fabric mirrored databases from Snowflake.

Why use Mirroring in Fabric?


With Mirroring in Fabric, you don't need to piece together different services from
multiple vendors. Instead, you can enjoy a highly integrated, end-to-end, and easy-to-
use product that is designed to simplify your analytics needs, and built for openness and
collaboration between Microsoft, Snowflake, and the thousands of technology solutions that
can read the open-source Delta Lake table format.

What analytics experiences are built in?


Mirrored databases are an item in Fabric Data Warehousing distinct from the
Warehouse and SQL analytics endpoint.

Mirroring creates three items in your Fabric workspace:


The mirrored database item. Mirroring manages the replication of data into
OneLake and conversion to Parquet, in an analytics-ready format. This enables
downstream scenarios like data engineering, data science, and more.
A SQL analytics endpoint
A default semantic model

Each mirrored database has an autogenerated SQL analytics endpoint that provides a
rich analytical experience on top of the Delta Tables created by the mirroring process.
Users have access to familiar T-SQL commands that can define and query data objects
but not manipulate the data from the SQL analytics endpoint, as it's a read-only copy.
You can perform the following actions in the SQL analytics endpoint:

Explore the tables that reference data in your Delta Lake tables from Snowflake.
Create no code queries and views and explore data visually without writing a line
of code.
Develop SQL views, inline TVFs (Table-valued Functions), and stored procedures to encapsulate your semantics and business logic in T-SQL, as sketched in the example after this list.
Manage permissions on the objects.
Query data in other Warehouses and Lakehouses in the same workspace.
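
As a minimal sketch of the view scenario, assuming a hypothetical mirrored Snowflake table dbo.orders with the columns referenced below:

SQL

CREATE VIEW dbo.vw_delivered_orders
AS
SELECT
    order_id,
    customer_id,
    order_total
FROM dbo.orders
WHERE order_status = 'delivered'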

In addition to the SQL query editor, there's a broad ecosystem of tooling that can query
the SQL analytics endpoint, including SQL Server Management Studio (SSMS), the mssql
extension with Visual Studio Code, and even GitHub Copilot.

Security considerations
To enable Fabric mirroring, you need user permissions for your Snowflake database that include the following permissions:

CREATE STREAM

SELECT table
SHOW tables

DESCRIBE tables

For more information, see Snowflake documentation on Access Control Privileges for
Streaming tables and Required Permissions for Streams .
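
A minimal Snowflake SQL sketch of granting these privileges through a dedicated role, with hypothetical database, schema, role, and user names:

SQL

-- Hypothetical names; adjust the database, schema, role, and user to your environment.
CREATE ROLE IF NOT EXISTS fabric_mirroring_role;
GRANT USAGE ON DATABASE sales_db TO ROLE fabric_mirroring_role;
GRANT USAGE ON SCHEMA sales_db.public TO ROLE fabric_mirroring_role;
GRANT CREATE STREAM ON SCHEMA sales_db.public TO ROLE fabric_mirroring_role;
GRANT SELECT ON ALL TABLES IN SCHEMA sales_db.public TO ROLE fabric_mirroring_role;
GRANT ROLE fabric_mirroring_role TO USER mirroring_user;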

) Important
Any granular security established in the source Snowflake warehouse must be re-
configured in the mirrored database in Microsoft Fabric. For more information, see
SQL granular permissions in Microsoft Fabric.

Mirrored Snowflake cost considerations


Fabric doesn't charge for network data ingress fees into OneLake for Mirroring. There
are no mirroring costs when your Snowflake data is being replicated into OneLake.

There are Snowflake compute and cloud query costs when data is being mirrored: virtual
warehouse compute and cloud services compute.

Snowflake virtual warehouse compute charges:


Compute charges will be charged on the Snowflake side if there are data
changes that are being read in Snowflake, and in turn are being mirrored into
Fabric.
Any metadata queries run behind the scenes to check for data changes are not
charged for any Snowflake compute; however, queries that do produce data
such as a SELECT * will wake up the Snowflake warehouse and compute will be
charged.
Snowflake services compute charges:
Although there aren't any compute charges for behind the scenes tasks such as
authoring, metadata queries, access control, showing data changes, and even
DDL queries, there are cloud costs associated with these queries.
Depending on what type of Snowflake edition you have, you will be charged for
the corresponding credits for any cloud services costs.

In the following screenshot, you can see the virtual warehouse compute and cloud
services compute costs for the associated Snowflake database that is being mirrored
into Fabric. In this scenario, the majority of the cloud services compute costs (in yellow) come from data change queries, based on the points mentioned previously. The virtual warehouse compute charges (in blue) come strictly from the data changes that are being read from Snowflake and mirrored into Fabric.


For more information on Snowflake-specific cloud query costs, see Snowflake docs:
Understanding overall cost .

Next step
Tutorial: Configure Microsoft Fabric mirrored databases from Snowflake

Related content
How to: Secure data Microsoft Fabric mirrored databases from Snowflake
Model data in the default Power BI semantic model in Microsoft Fabric
Monitor Fabric mirrored database replication



Tutorial: Configure Microsoft Fabric
mirrored databases from Snowflake
Article • 11/19/2024

In this tutorial, you'll configure a Fabric mirrored database from Snowflake.

In this example, you learn how to configure a secure connection to your Snowflake data sources, along with other helpful information to get you acquainted and proficient with the concepts of Mirroring in Microsoft Fabric.

7 Note

While this example is specific to Snowflake, you can find detailed steps to configure
Mirroring for other data sources, like Azure SQL Database or Azure Cosmos DB. For
more information, see What is Mirroring in Fabric?

Prerequisites
Create or use an existing Snowflake warehouse. You can connect to any version of
Snowflake instance in any cloud, including Microsoft Azure.
You need an existing Fabric capacity. If you don't, start a Fabric trial.
You need user permissions for your Snowflake database that include the
following permissions. For more information, see Snowflake documentation on
Access Control Privileges for Streaming tables and Required Permissions for
Streams .
CREATE STREAM

SELECT table
SHOW tables

DESCRIBE tables

The user needs to have at least one role assigned that allows access to the
Snowflake database.

Create a mirrored database


In this section, we'll provide a brief overview of how to create a new mirrored database
to use with your mirrored Snowflake data source.

You can use an existing workspace (not My Workspace) or create a new workspace.
1. From your workspace, navigate to the Create hub.
2. After you have selected the workspace that you would like to use, select Create.
3. Scroll down and select the Mirrored Snowflake card.
4. Enter the name for the new database.
5. Select Create.

Connect to your Snowflake instance in any cloud

7 Note

You might need to alter the cloud firewall settings to allow Mirroring to connect to the Snowflake instance.

1. Select Snowflake under "New connection" or select an existing connection.

2. If you selected "New connection", enter the connection details to the Snowflake
database.

Server: You can find your server name by navigating to the accounts on the resource menu in Snowflake. Hover your mouse over the account name to copy the server name to the clipboard, and remove the https:// from the server name.

Warehouse: From the Warehouses section on the resource menu in Snowflake, select Warehouses. The warehouse is the Snowflake warehouse (compute), not the database.

Connection: Create new connection.

Connection name: Should be automatically filled out. Change it to a name that you would like to use.

Authentication kind: Snowflake

Username: Your Snowflake username that you created to sign in to Snowflake.com.

Password: Your Snowflake password that you created when you set up your login information for Snowflake.com.
3. Select database from dropdown list.

Start mirroring process


1. The Configure mirroring screen allows you to mirror all data in the database, by
default.

Mirror all data means that any new tables created after Mirroring is started
will be mirrored.

Optionally, choose only certain objects to mirror. Disable the Mirror all data
option, then select individual tables from your database.

For this tutorial, we select the Mirror all data option.

2. Select Mirror database. Mirroring begins.

3. Wait for 2-5 minutes. Then, select Monitor replication to see the status.

4. After a few minutes, the status should change to Running, which means the tables
are being synchronized.

If you don't see the tables and the corresponding replication status, wait a few
seconds and then refresh the panel.
5. When the initial copying of the tables is finished, a date appears in the
Last refresh column.

6. Now that your data is replicating, various analytics scenarios are available
across all of Fabric.

Important

Any granular security established in the source database must be re-configured in the
mirrored database in Microsoft Fabric.

Monitor Fabric Mirroring


Once mirroring is configured, you're directed to the Mirroring Status page. Here, you
can monitor the current state of replication.

For more information and details on the replication states, see Monitor Fabric mirrored
database replication.

Important

If there are no updates in the source tables, the replicator engine will start to back
off with an exponentially increasing duration, up to an hour. The replicator engine
will automatically resume regular polling after updated data is detected.

Related content
Mirroring Snowflake
What is Mirroring in Fabric?



Frequently asked questions for
Mirroring Snowflake in Microsoft
Fabric
FAQ

This article answers frequently asked questions about Mirroring Snowflake in Microsoft
Fabric.

Features and capabilities


Is there a staging or landing zone for Snowflake?
If so, is it outside of OneLake?
For Snowflake, a landing zone in OneLake is used to store both the snapshot and change
data, which improves performance as these files in the landing zone are converted into
V-Ordered Delta Parquet.

Can views, transient, or external tables be replicated?
Currently, only regular tables can be replicated.

How do I manage connections?


Select the settings cog, then select Manage connections and gateways. You can also
delete existing connections from this page.

Cost efficiency
What should I do to avoid or reduce Snowflake
costs?
Implement Snowflake budgets, use limits on credits, or use a dedicated, smaller
Snowflake instance based on requirements.
How are ingress fees handled?
Fabric doesn't charge for Ingress fees into OneLake for Mirroring.

How are egress fees handled?


If hosted outside of Azure, refer to Snowflake and your cloud documentation for egress
costs. If hosted in Azure but in a different region from your Fabric capacity, data egress
will be charged. If hosted in Azure in the same region, there is no data egress.

Performance
How long does the initial replication take?
It depends on the size of the data that is being brought in.

How long does it take to replicate inserts/updates/deletes?
Near real-time latency.

Will the Power BI reports use direct lake mode?


Yes, tables are all v-ordered delta tables.

Troubleshoot Mirroring Snowflake in Microsoft Fabric
What are the replication statuses?
See Monitor Fabric Mirror replication.

Can Snowflake Mirroring be accessed through the Power BI Gateway or behind a firewall?
Currently, access through the Power BI Gateway or behind a firewall is unsupported.
What does starting the Mirroring do?
The data from source tables will be reinitialized. Each time you stop and start, the entire
table is fetched again.

What happens if I deselect a table from Mirroring?
We stop Mirroring that specific table and delete it from OneLake.

If I delete the Mirror, does it affect the source mirrored database?
No, we just remove the streaming tables.

Can I Mirror the same database multiple times?


Yes, you can, but you shouldn't need to. Once the data is in Fabric, it can be shared from
there.

Can I Mirror specific tables from my source database?
Yes, specific tables can be selected during Mirroring configuration.

Data governance
Is data ever leaving the customer's Fabric tenant?
No.

Is data staged outside of a customer environment?
No. Data isn't staged outside of the customer environment; it's staged in the
customer's OneLake.

Licensing
What are licensing options for Fabric Mirroring?
A Power BI Premium, Fabric Capacity, or Trial Capacity is required. For more information
on licensing, see Microsoft Fabric licenses.

What are the Fabric compute costs associated with Mirroring?
There's no cost for Mirroring or storing mirrored data in Fabric, unless storage exceeds
provisioned capacity.

Related content
What is Mirroring in Fabric?
Snowflake connector overview



How to: Secure data Microsoft Fabric
mirrored databases from Snowflake
Article • 11/19/2024

This guide helps you establish data security in your mirrored Snowflake in Microsoft
Fabric.

Security considerations
To enable Fabric mirroring, you will need user permissions for your Snowflake database
that contains the following permissions:

CREATE STREAM

SELECT table

SHOW tables
DESCRIBE tables

For more information, see Snowflake documentation on Access Control Privileges for
Streaming tables and Required Permissions for Streams .

Important

Any granular security established in the source Snowflake database must be re-
configured in the mirrored database in Microsoft Fabric. For more information, see
SQL granular permissions in Microsoft Fabric.

Data protection features


You can secure tables with column filters and predicate-based row filters for roles and
users in Microsoft Fabric:

Row-level security in Fabric data warehousing


Column-level security in Fabric data warehousing

You can also mask sensitive data from non-admins using dynamic data masking:

Dynamic data masking in Fabric data warehousing


Related content
What is Mirroring in Fabric?
SQL granular permissions in Microsoft Fabric



Limitations in Microsoft Fabric mirrored
databases from Snowflake
Article • 11/19/2024

Current limitations in Microsoft Fabric mirrored databases from Snowflake are listed
on this page. This page is subject to change.

Database level limitations


Any table names and column names with special characters ,;{}()\= and spaces
are not replicated.
If there are no updates in a source table, the replicator engine starts to back off
with an exponentially increasing duration for that table, up to an hour. The same
can occur if there is a transient error, preventing data refresh. The replicator engine
will automatically resume regular polling after updated data is detected.
Only native tables are supported for replication. Currently, External, Transient,
Temporary, and Dynamic tables are not supported.
The maximum number of tables that can be mirrored into Fabric is 500 tables. Any
tables above the 500 limit currently cannot be replicated.
If you select Mirror all data when configuring Mirroring, the tables to be
mirrored over will be determined by taking the first 500 tables when all tables
are sorted alphabetically based on the schema name and then the table name.
The remaining set of tables at the bottom of the alphabetical list will not be
mirrored over.
If you unselect Mirror all data and select individual tables, you are prevented
from selecting more than 500 tables.

Network and firewall


Currently, Mirroring does not support Snowflake instances behind a virtual
network or private networking. If your Snowflake instance is behind a private
network, you cannot enable Snowflake mirroring.

Security
Snowflake authentication is supported only via username/password.
Sharing recipients must be added to the workspace. To share a dataset or report,
first add access to the workspace with a role of admin, member, reader, or
contributor.

Performance
If you're changing most of the data in a large table, it's more efficient to stop and
restart Mirroring. Inserting or updating billions of records can take a long time.
Some schema changes are not reflected immediately. Some schema changes need
a data change (insert/update/delete) before schema changes are replicated to
Fabric.

Fabric regions that support Mirroring


Mirroring for Snowflake is available everywhere that Fabric is available, except for West
US 3.

Related content
What is Mirroring in Fabric?
Mirroring Snowflake
Tutorial: Configure Microsoft Fabric mirrored databases from Snowflake



Open mirroring in Microsoft Fabric
(Preview)
Article • 11/19/2024

Mirroring in Fabric provides an easy experience to avoid complex ETL (Extract Transform
Load) and integrate your existing data into OneLake with the rest of your data in
Microsoft Fabric. You can continuously replicate your existing data directly into Fabric's
OneLake. Inside Fabric, you can unlock powerful business intelligence, artificial
intelligence, Data Engineering, Data Science, and data sharing scenarios.

Open mirroring enables any application to write change data directly into a mirrored
database in Fabric. Open mirroring is designed to be extensible, customizable, and
open. It's a powerful feature that extends mirroring in Fabric based on open Delta Lake
table format.

Once the data lands in OneLake in Fabric, open mirroring simplifies the handling of
complex data changes, ensuring that all mirrored data is continuously up-to-date and
ready for analysis.

Important

This feature is in preview.

For a tutorial on configuring your open mirrored database in Fabric, see Tutorial:
Configure Microsoft Fabric open mirrored databases.

Why use open mirroring in Fabric?


Open mirroring extends the Mirroring in Fabric capability to your own applications, or
existing data providers to land data into a mirrored database within OneLake in Fabric.
Once the data lands in the landing zone, the mirroring replication engine manages the
complexity of changes and converts data into Delta Parquet, an analytics-ready format.
In the OneLake, your data can be analyzed and consumed by all the experiences in
Fabric.

Open mirroring meets your data replication needs if you:

Use your own application to write data into the open mirroring landing zone per
the open mirroring landing zone requirements and formats.
Use one of our existing open mirroring partners to help you ingest data.

What analytics experiences are built in?


All types of mirrored databases are an item in Fabric Data Warehousing distinct from
the Warehouse and SQL analytics endpoint.

Mirroring creates three items in your Fabric workspace:

The mirrored database item. Mirroring manages the replication of data into
OneLake and conversion into Delta Parquet format, and manages the complexity
of the changes, in an analytics-ready format. This enables downstream scenarios
like data engineering, data science, and more.
A SQL analytics endpoint
A default semantic model

Each open mirrored database has an autogenerated SQL analytics endpoint that
provides a rich analytical experience on top of the Delta Tables created by the mirroring
process. Users have access to familiar T-SQL commands that can define and query data
objects but not manipulate the data from the SQL analytics endpoint, as it's a read-only
copy. You can perform the following actions in the SQL analytics endpoint:

Explore the tables that reference data in your Delta Lake tables.
Create no-code queries and views, and explore data visually without writing a line
of code.
Develop SQL views, inline TVFs (Table-valued Functions), and stored procedures to
encapsulate your semantics and business logic in T-SQL.
Manage permissions on the objects.
Query data in other Warehouses and Lakehouses in the same workspace.

In addition to the SQL query editor, there's a broad ecosystem of tooling that can query
the SQL analytics endpoint, including SQL Server Management Studio (SSMS), the mssql
extension with Visual Studio Code, and even GitHub Copilot.

Open mirroring cost considerations


As with all types of mirroring in Fabric, open mirroring offers a free terabyte of
mirroring storage for every capacity unit (CU) you have purchased and provisioned. For
example, if you purchase F64, you get 64 free terabytes worth of storage for your
mirrored replicas. OneLake storage is billed only when the free mirroring storage limit is
exceeded, or the capacity is paused.

In addition, the compute needed to manage the complexity of change data is free and it
doesn't consume capacity. Requests to OneLake as part of the mirroring process
consume capacity like normal with OneLake compute consumption.

Next step
Tutorial: Configure Microsoft Fabric open mirrored databases

Related content
Monitor Fabric mirrored database replication



Tutorial: Configure Microsoft Fabric
open mirrored databases
Article • 11/19/2024

In this tutorial, you configure an open mirrored database in Fabric. This example guides
you through creating a new open mirrored database and shows how to land data in the
landing zone, so you become proficient with the concepts of open mirroring in Microsoft
Fabric.

Important

This feature is in preview.

Prerequisites
You need an existing capacity for Fabric. If you don't, start a Fabric trial.
The Fabric capacity needs to be active and running. A paused or deleted
capacity will affect Mirroring and no data will be replicated.
During the current preview, the ability to create an open mirrored database via the
Fabric portal is not available in all Fabric capacity regions.

Create a mirrored database


In this section, we provide a brief overview of how to create a new open mirrored
database in the Fabric portal. Alternatively, you could use the Create mirrored database
REST API together with the JSON definition example of open mirroring for creation.

1. Use an existing workspace or create a new workspace. From your workspace,
   navigate to the Create hub. Select Create.
2. Locate and select the Mirrored Database card.
3. Enter a name for the new mirrored database.
4. Select Create.
5. Once an open mirrored database is created via the user interface, the mirroring
   process is ready. Review the Home page for the new mirrored database item.
   Locate the Landing zone URL in the details section of the mirrored database
   home page.

Write change data into the landing zone


Your application can now write initial load and incremental change data into the landing
zone.

Follow Connecting to Microsoft OneLake to authorize and write to the mirrored database
landing zone in OneLake.
Review the Open mirroring landing zone requirements and format specifications.
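
For illustration only, the following Python sketch shows one way an application might authorize against OneLake and upload a table's _metadata.json and first data file into the landing zone, following the guidance linked above. It assumes the azure-identity and azure-storage-file-datalake packages; the workspace ID, mirrored database ID, table name, and key column are placeholders rather than values from this tutorial.

Python

# Minimal sketch (assumptions noted above): upload files to an open mirroring
# landing zone in OneLake through its ADLS Gen2-compatible endpoint.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

WORKSPACE_ID = "<workspace id>"            # placeholder
MIRRORED_DB_ID = "<mirrored database id>"  # placeholder
TABLE_NAME = "TableA"                      # placeholder table folder name

service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)
filesystem = service.get_file_system_client(WORKSPACE_ID)
table_folder = f"{MIRRORED_DB_ID}/LandingZone/{TABLE_NAME}"

# Upload the required _metadata.json declaring the table's key column(s).
filesystem.get_file_client(f"{table_folder}/_metadata.json").upload_data(
    b'{"keyColumns": ["EmployeeID"]}', overwrite=True
)

# Upload the first data file; see the landing zone requirements article for
# the Parquet format and 20-digit file naming rules.
with open("00000000000000000001.parquet", "rb") as data:
    filesystem.get_file_client(
        f"{table_folder}/00000000000000000001.parquet"
    ).upload_data(data, overwrite=True)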

Start mirroring process


1. The Configure mirroring screen allows you to mirror all data in the database, by
default.

Mirror all data means that any new tables created after Mirroring is started
will be mirrored.
Optionally, choose only certain objects to mirror. Disable the Mirror all data
option, then select individual tables from your database. For this tutorial, we
select the Mirror all data option.

2. Select Mirror database. Mirroring begins.


3. Wait for 2-5 minutes. Then, select Monitor replication to see the status.
4. After a few minutes, the status should change to Running, which means the tables
are being synchronized. If you don't see the tables and the corresponding
replication status, wait a few seconds and then refresh the panel.
5. When the initial copying of the tables is finished, a date appears in the
   Last refresh column.
6. Now that your data is replicating, various analytics scenarios are available
   across all of Fabric.

Monitor Fabric Mirroring


Once mirroring is configured, you're directed to the Mirroring Status page. Here, you
can monitor the current state of replication.

For more information and details on the replication states, see Monitor Fabric mirrored
database replication.

Related content
Connecting to Microsoft OneLake
Open mirroring landing zone requirements and format



Open mirroring landing zone
requirements and format
Article • 11/19/2024

This article details the landing zone and table/column operation requirements for open
mirroring in Microsoft Fabric.

Important

This feature is in preview.

Once you have created your open mirrored database via the Fabric portal or public API
in your Fabric workspace, you get a landing zone URL in OneLake in the Home page of
your mirrored database item. This landing zone is where your application creates a
metadata file and lands data in Parquet format (uncompressed, Snappy, GZIP, ZSTD).

Landing zone
For every mirrored database, there is a unique storage location in OneLake for metadata
and delta tables. Open mirroring provides a landing zone folder for applications to
create a metadata file and push data into OneLake. Mirroring monitors these files in the
landing zone and reads the folder for new tables and added data.

For example, if you have tables ( Table A , Table B , Table C ) to be created in the landing
zone, create folders like the following URLs:

https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database id>/LandingZone/TableA
https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database id>/LandingZone/TableB
https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database id>/LandingZone/TableC

Metadata file in the landing zone


Every table folder must contain a _metadata.json file.

This table metadata file contains a JSON record that currently specifies only the unique
key columns, as keyColumns .

For example, to declare columns C1 and C2 as a compound unique key for the table:

JSON

{
  "keyColumns": ["C1", "C2"]
}

If keyColumns or _metadata.json is not specified, then updates and deletes are not possible.

This file can be added anytime, but once added, keyColumns can't be changed.

Data file and format in the landing zone


Open mirroring supports Parquet as the landing zone file format with or without
compression. Supported compression formats include Snappy, GZIP, and ZSTD.

All the Parquet files written to the landing zone have the following format:

<RowMarker><DataColumns>

RowMarker : column name is __rowMarker__ (including two underscores before and

after rowMarker ).
RowMarker values:

0 for INSERT
1 for UPDATE

2 for DELETE

4 for UPSERT

Row order: All the logs in the file should be in the natural order in which they were
applied in the transaction. This is important when the same row is updated multiple
times. Open mirroring applies the changes using the order in the files.
File order: Files should be added in monotonically increasing numbers.

File name: File name is 20 digits, like 00000000000000000001.parquet for the first file,
and 00000000000000000002.parquet for the second. File names should be in
continuous numbers. Files will be deleted by the mirroring service automatically,
but the last file will be left so that the publisher system can reference it to add the
next file in sequence.
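
As a rough illustration of these format rules, the following Python sketch (assuming the pyarrow package) writes an initial-load Parquet file that includes the __rowMarker__ column, uses Snappy compression, and follows the 20-digit file naming convention. The table and column names are invented for the example.

Python

# Minimal sketch: produce a landing zone Parquet file with a __rowMarker__ column.
import pyarrow as pa
import pyarrow.parquet as pq

# 0 = INSERT; an initial load typically contains only INSERT rows.
rows = pa.table({
    "__rowMarker__": pa.array([0, 0, 0], type=pa.int32()),
    "EmployeeID": ["E0001", "E0002", "E0003"],             # key column per _metadata.json
    "EmployeeLocation": ["Redmond", "Redmond", "Redmond"],
})

# File names are 20-digit, monotonically increasing numbers.
file_name = f"{1:020d}.parquet"   # -> 00000000000000000001.parquet
pq.write_table(rows, file_name, compression="snappy")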

Initial load
For the initial load of data into an open mirrored database, all rows should have INSERT
as row marker. Without RowMarker data in a file, mirroring treats the entire file as an
INSERT.

Incremental changes
Open mirroring reads incremental changes in order and applies them to the target Delta
table. Order is implicit in the change log and in the order of the files.

Updated rows must contain the full row data, with all columns.

Here is some sample parquet data of the row history to change the EmployeeLocation
for EmployeeID E0001 from Redmond to Bellevue. In this scenario, the EmployeeID
column has been marked as a key column in the metadata file in the landing zone.

parquet

__rowMarker__,EmployeeID,EmployeeLocation
0,E0001,Redmond
0,E0002,Redmond
0,E0003,Redmond
1,E0001,Bellevue

If key columns are updated, the change should be represented by a DELETE row on the
previous key columns and an INSERT row with the new key and data. For example, the
following row history changes the key value for EmployeeID from E0001 to E0002. You
don't need to provide all column data for a DELETE row, only the key columns.

parquet

__rowMarker__,EmployeeID,EmployeeLocation
0,E0001,Bellevue
2,E0001,NULL
0,E0002,Bellevue

Table operations
Open mirroring supports table operations such as add, drop, and rename tables.

Add table
Open mirroring picks up any table added to the landing zone by the application. Open
mirroring scans for new tables in every iteration.

Drop table
Open mirroring keeps track of the folder name. If a table folder is deleted, open
mirroring drops the table in the mirrored database.

If a folder is recreated, open mirroring drops the table and recreates it with the new data
in the folder, accomplished by tracking the ETag for the folder.

When attempting to drop a table, you can try deleting the folder, but there is a chance
that open mirroring is still using the data from the folder, causing a delete failure
for the publisher.

Rename table
To rename a table, drop and recreate the folder with initial and incremental data. Data
will need to be repopulated to the renamed table.

Schema
A table path can be specified within a schema folder. A schema landing zone should
have a <schemaname>.schema folder name. There can be multiple schemas and there can
be multiple tables in a schema.

For example, if you have schemas ( Schema1 , Schema2 ) and tables ( Table A , Table B ,
Table C ) to be created in the landing zone, create folders like the following paths in

OneLake:

https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database id>/LandingZone/Schema1.schema/TableA
https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database id>/LandingZone/Schema1.schema/TableB
https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database id>/LandingZone/Schema2.schema/TableC

Table columns and column operations

Column types
Simple parquet types are supported in the landing zone.
Complex types should be written as a JSON string.
Binary complex types like geography, images, etc. can be stored as binary type in
the landing zone.

Add column
If new columns are added to the parquet files, open mirroring adds the columns to the
delta tables.

Delete column
If a column is dropped from the new log files, open mirroring stores NULL for those
columns in new rows, and old rows keep the columns present in the data. To delete the
column, drop the table and create the table folder in the landing zone again, which
results in recreation of the Delta table with the new schema and data.

Open mirroring always unions all the columns from previous versions of the added data.
To remove a column, recreate the table/folder.

Change column type


To change a column type, drop and recreate the folder with initial and incremental data
with the new column type. Providing a new column type without recreating the table
results in an error, and replication for that table will stop. Once the table folder is
recreated, replication resumes with the new data and schema.

Rename column
To rename a column, delete the table folder and recreate the folder with all the data and
with the new column name.

Next step
Tutorial: Configure Microsoft Fabric open mirrored databases

Related content
Monitor Fabric mirrored database replication



Frequently asked questions for
open mirroring in Microsoft Fabric
FAQ

This article answers frequently asked questions about open mirroring in Microsoft
Fabric.

Important

This feature is in preview.

Features and capabilities


Is there a staging or landing zone for Snowflake?
If so, is it outside of OneLake?
For Snowflake, a landing zone in OneLake is used to store both the snapshot and change
data, which improves performance as these files in the landing zone are converted into
V-Ordered Delta Parquet.

How long does the initial replication take?


It depends on the size of the data and the data provider or application that you use to
write data into open mirroring.

How long does it take to replicate data once it is in the landing zone?
Near real-time latency.

How do I manage connections?


There's no connection to manage in open mirroring.

Can Power BI reports on mirrored data use direct lake mode?
Yes, since tables are all v-ordered delta tables.

What are the replication statuses?


See Monitor Fabric Mirror replication.

Cost management
What are the costs associated with Mirroring?
See Open mirroring cost considerations.

Related content
Open mirroring landing zone requirements and format



Open mirroring (preview) partner
ecosystem
Article • 11/20/2024

Open mirroring in Microsoft Fabric (Preview) is designed to be extensible, customizable,
and open. It's a powerful feature that extends Mirroring in Fabric based on the open
Delta Lake table format. This capability enables any data provider to write change data
directly into a mirrored database item in Microsoft Fabric.

The following are the open mirroring partners who have already built solutions to
integrate with Microsoft Fabric.

Important

This feature is in preview.

This page will be updated during the current preview.

Oracle GoldenGate 23ai


Oracle GoldenGate 23ai integrates with Microsoft Fabric via open mirroring. Now, any
supported Oracle GoldenGate source, including Oracle Database@Azure, can replicate
data into a mirrored database in Microsoft Fabric. This powerful combination unlocks
real-time data integration, continuously synchronizing data across your hybrid and
multicloud environments. Mirrored database in Microsoft Fabric as a destination is
available through the GoldenGate for Distributed Applications and Analytics 23ai
product.

For more information, see Oracle GoldenGate 23ai integration into open mirroring in
Microsoft Fabric .

Striim
SQL2Fabric-Mirroring is a Striim solution that reads data from SQL Server and writes it
to Microsoft Fabric's mirroring landing zone in Delta-Parquet format. Microsoft's Fabric
replication service frequently picks up these files and replicates the file contents into
Fabric data warehouse tables.

For more information, see Striim integration into open mirroring in Microsoft Fabric .
MongoDB
MongoDB integrated with open mirroring to bring operational data from MongoDB Atlas to
Microsoft Fabric for big data analytics, AI, and BI, combining it with the rest of the
enterprise's data estate. Once mirroring is enabled for a MongoDB Atlas collection, the
corresponding table in OneLake is kept in sync with changes in the source MongoDB Atlas
collection, unlocking opportunities for varied analytics, AI, and BI in near real-time.

For more information, see MongoDB integration into open mirroring in Microsoft
Fabric .

Related content
Tutorial: Configure Microsoft Fabric open mirrored databases



Microsoft Fabric mirroring public REST
API
Article • 11/21/2024

The public APIs for Fabric mirroring consist of two categories: (1) CRUD operations for
Fabric mirrored database item and (2) Start/stop and monitoring operations. The
primary online reference documentation for Microsoft Fabric REST APIs can be found in
Microsoft Fabric REST API references.

Note

These REST APIs don't apply to mirrored database from Azure Databricks.

Create mirrored database


REST API - Items - Create mirrored database

Before you create mirrored database, the corresponding data source connection is
needed. If you don't have a connection yet, refer to create new connection using portal
and use that connection ID in the following definition. You can also refer to create new
connection REST API to create new connection using Fabric REST APIs.

Example:

POST https://api.fabric.microsoft.com/v1/workspaces/<your workspace ID>/mirroredDatabases

Body:

JSON

{
"displayName": "Mirrored database 1",
"description": "A mirrored database description",
"definition": {
"parts": [
{
"path": "mirroring.json",
"payload": "eyAicHJvcGVydGllcy..WJsZSIgfSB9IH0gXSB9IH0",
"payloadType": "InlineBase64"
}
]
}
}

The payload property in the previous JSON body is Base64 encoded. You can use Base64
Encode and Decode to encode. The original JSON definition examples for different
types of sources follow:

JSON definition example of Snowflake


JSON definition example of Azure SQL Database
JSON definition example of Azure SQL Managed Instance
JSON definition example of Azure Cosmos DB
JSON definition example of open mirroring

If you want to replicate selective tables instead of all the tables in the specified
database, refer to JSON definition example of replicating specified tables.

Important

To mirror data from Azure SQL Database or Azure SQL Managed Instance, you also need
to do the following before starting mirroring:

1. Enable the System Assigned Managed Identity (SAMI) of your Azure SQL logical
server or Azure SQL Managed Instance.
2. Grant the SAMI Read and Write permission to the mirrored database.
Currently you need to do this on the Fabric portal. Alternatively, you can grant the
SAMI a workspace role using the Add Workspace Role Assignment API.

JSON definition example of Snowflake


JSON

{
"properties": {
"source": {
"type": "Snowflake",
"typeProperties": {
"connection": "a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1",
"database": "xxxx"
}
},
"target": {
"type": "MountedRelationalDatabase",
"typeProperties": {
"defaultSchema": "xxxx",
"format": "Delta"
}
}
}
}

JSON definition example of Azure SQL Database


JSON

{
"properties": {
"source": {
"type": "AzureSqlDatabase",
"typeProperties": {
"connection": "a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1"
}
},
"target": {
"type": "MountedRelationalDatabase",
"typeProperties": {
"defaultSchema": "xxxx",
"format": "Delta"
}
}
}
}

JSON definition example of Azure SQL Managed Instance


JSON

{
"properties": {
"source": {
"type": "AzureSqlMI",
"typeProperties": {
"connection": "a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1"
}
},
"target": {
"type": "MountedRelationalDatabase",
"typeProperties": {
"defaultSchema": "xxxx",
"format": "Delta"
}
}
}
}

JSON definition example of Azure Cosmos DB


JSON

{
"properties": {
"source": {
"type": "CosmosDb",
"typeProperties": {
"connection": "a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1",
"database": "xxxx"
}
},
"target": {
"type": "MountedRelationalDatabase",
"typeProperties": {
"defaultSchema": "xxxx",
"format": "Delta"
}
}
}
}

JSON definition example of open mirroring


JSON

{
"properties": {
"source": {
"type": "GenericMirror",
"typeProperties": {}
},
"target": {
"type": "MountedRelationalDatabase",
"typeProperties": {
"format": "Delta"
}
}
}
}

JSON definition example of replicating specified tables


The previous examples apply to the scenario that automatically replicates all the tables
in the specified database. If you want to specify the tables to replicate, you can specify
the mountedTables property, as in the following example.

JSON

{
"properties": {
"source": {
"type": "Snowflake",
"typeProperties": {
"connection": "a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1",
"database": "xxxx"
}
},
"target": {
"type": "MountedRelationalDatabase",
"typeProperties": {
"defaultSchema": "xxxx",
"format": "Delta"
}
},
"mountedTables": [
{
"source": {
"typeProperties": {
"schemaName": "xxxx",
"tableName": "xxxx"
}
}
}
]
}
}

Response 201:

JSON

{
"id": "<mirrored database ID>",
"type": "MirroredDatabase",
"displayName": "Mirrored database 1",
"description": "A mirrored database description",
"workspaceId": "<your workspace ID>"
}
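
As a hedged illustration of the call above, the following Python sketch Base64-encodes a Snowflake-style definition and posts it with the requests package. The workspace ID, connection ID, and bearer token are placeholders; a real application would first acquire a Microsoft Entra access token for the Fabric API.

Python

# Minimal sketch: create a mirrored database via the Fabric REST API.
import base64
import json
import requests

WORKSPACE_ID = "<your workspace ID>"   # placeholder
ACCESS_TOKEN = "<access token>"        # placeholder - acquire via MSAL or azure-identity

definition = {
    "properties": {
        "source": {
            "type": "Snowflake",
            "typeProperties": {"connection": "<connection ID>", "database": "<database>"},
        },
        "target": {
            "type": "MountedRelationalDatabase",
            "typeProperties": {"defaultSchema": "dbo", "format": "Delta"},
        },
    }
}

body = {
    "displayName": "Mirrored database 1",
    "description": "A mirrored database description",
    "definition": {
        "parts": [
            {
                "path": "mirroring.json",
                "payload": base64.b64encode(json.dumps(definition).encode()).decode(),
                "payloadType": "InlineBase64",
            }
        ]
    },
}

response = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/mirroredDatabases",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=body,
)
response.raise_for_status()
print(response.json())   # expect a response like the 201 body shown above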

Delete mirrored database


REST API - Items - Delete mirrored database

Example:

DELETE https://api.fabric.microsoft.com/v1/workspaces/<your workspace ID>/mirroredDatabases/<mirrored database ID>

Response 200: (No body)

Get mirrored database


REST API - Items - Get mirrored database

Example:

GET https://api.fabric.microsoft.com/v1/workspaces/<your workspace ID>/mirroredDatabases/<mirrored database ID>

Response 200:

JSON

{
"displayName": "Mirrored database 1",
"description": "A mirrored database description.",
"type": "MirroredDatabase",
"workspaceId": "<your workspace ID>",
"id": "<mirrored database ID>",
"properties": {
"oneLakeTablesPath": "https://onelake.dfs.fabric.microsoft.com/<your
workspace ID>/<mirrored database ID>/Tables",
"sqlEndpointProperties": {
"connectionString": "xxxx.xxxx.fabric.microsoft.com",
"id": "b1b1b1b1-cccc-dddd-eeee-f2f2f2f2f2f2",
"provisioningStatus": "Success"
},
"defaultSchema": "xxxx"
}
}

Get mirrored database definition


REST API - Items - Get mirrored database definition

Example:
POST https://api.fabric.microsoft.com/v1/workspaces/<your workspace ID>/mirroredDatabases/<mirrored database ID>/getDefinition

Response 200:

JSON

{
"definition": {
"parts":[
{
"path": "mirroring.json",
"payload": "eyAicHJvcGVydGllcy..WJsZSIgfSB9IH0gXSB9IH0",
"payloadType": "InlineBase64"
}
]
}
}
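
For a small illustrative sketch (not part of the API reference), the InlineBase64 payload returned here can be decoded back into the JSON definition with Python's standard library; the payload value below is a placeholder.

Python

# Minimal sketch: decode the definition payload returned by getDefinition.
import base64
import json

payload = "<InlineBase64 payload from the response>"   # placeholder
definition = json.loads(base64.b64decode(payload))
print(json.dumps(definition, indent=2))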

List mirrored databases


REST API - Items - List mirrored databases

Example:

GET https://api.fabric.microsoft.com/v1/workspaces/<your workspace ID>/mirroredDatabases

Response 200:

JSON

{
"value": [
{
"displayName": "Mirrored database 1",
"description": "A mirrored database description.",
"type": "MirroredDatabase",
"workspaceId": "<your workspace ID>",
"id": "<mirrored database ID>",
"properties": {
"oneLakeTablesPath":
"https://onelake.dfs.fabric.microsoft.com/<your workspace ID>/<mirrored
database ID>/Tables",
"sqlEndpointProperties": {
"connectionString": "xxxx.xxxx.fabric.microsoft.com",
"id": "b1b1b1b1-cccc-dddd-eeee-f2f2f2f2f2f2",
"provisioningStatus": "Success"
},
"defaultSchema": "xxxx"
}
}
]
}

Update mirrored database


REST API - Items - Update mirrored database

Example:

PATCH https://api.fabric.microsoft.com/v1/workspaces/<your workspace ID>/mirroredDatabases/<mirrored database ID>

Body:

JSON

{
"displayName": "MirroredDatabase's New name",
"description": "A new description for mirrored database."
}

Response 200:

JSON

{
"displayName": "MirroredDatabase's New name",
"description": "A new description for mirrored database.",
"type": "MirroredDatabase",
"workspaceId": "<your workspace ID>",
"id": "<mirrored database ID>"
}

Update mirrored database definition


REST API - Items - Update mirrored database definition

Example:

POST https://api.fabric.microsoft.com/v1/workspaces/<your workspace ID>/mirroredDatabases/<mirrored database ID>/updateDefinition


Body:

JSON

{
"definition": {
"parts": [
{
"path": "mirroring.json",
"payload": "eyAicHJvcGVydGllcy..WJsZSIgfSB9IH0gXSB9IH0",
"payloadType": "InlineBase64"
}
]
}
}

Response 200: (No body)

Note

This API supports adding/removing tables by refreshing the mountedTables property. It
also supports updating the source connection ID, database name, and default schema
(these three properties can only be updated when the Get mirroring status API returns
Initialized / Stopped ).

Get mirroring status


REST API - Mirroring - Get mirroring status

This API returns the status of the mirrored database instance. The list of available
statuses is provided at the values of MirroringStatus.

Example:

POST https://api.fabric.microsoft.com/v1/workspaces/<your workspace ID>/mirroredDatabases/<mirrored database ID>/getMirroringStatus

Response 200:

JSON

{
"status": "Running"
}
Start mirroring
REST API - Mirroring - Start mirroring

Example:

POST https://api.fabric.microsoft.com/v1/workspaces/<your workspace ID>/mirroredDatabases/<mirrored database ID>/startMirroring

Response 200: (No body)

Note

Mirroring cannot be started when the Get mirroring status API above returns the
Initializing status.

Get tables mirroring status


REST API - Mirroring - Get tables mirroring status

If mirroring is started and the Get mirroring status API returns the Running status,
this API returns the status and metrics of table replication.

Example:

POST https://api.fabric.microsoft.com/v1/workspaces/<your workspace ID>/mirroredDatabases/<mirrored database ID>/getTablesMirroringStatus

Response 200:

JSON

{
"continuationToken": null,
"continuationUri": null,
"data": [
{
"sourceSchemaName": "dbo",
"sourceTableName": "test",
"status": "Replicating",
"metrics": {
"processedBytes": 1247,
"processedRows": 6,
"lastSyncDateTime": "2024-10-08T05:07:11.0663362Z"
}
}
]
}
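
To tie the start and monitoring calls together, here is a hedged Python sketch that starts mirroring and then polls the status APIs documented above with the requests package; the workspace ID, mirrored database ID, and token are placeholders.

Python

# Minimal sketch: start mirroring and poll status via the Fabric REST API.
import time
import requests

BASE = ("https://api.fabric.microsoft.com/v1/workspaces/<your workspace ID>"
        "/mirroredDatabases/<mirrored database ID>")
HEADERS = {"Authorization": "Bearer <access token>"}   # placeholder token

# Start mirroring (it cannot start while the status is Initializing).
requests.post(f"{BASE}/startMirroring", headers=HEADERS).raise_for_status()

# Poll the database-level status until it reports Running.
while True:
    status = requests.post(f"{BASE}/getMirroringStatus", headers=HEADERS).json()["status"]
    print("Mirroring status:", status)
    if status == "Running":
        break
    time.sleep(30)

# Once running, inspect per-table replication status and metrics.
tables = requests.post(f"{BASE}/getTablesMirroringStatus", headers=HEADERS).json()
for table in tables.get("data", []):
    print(table["sourceSchemaName"], table["sourceTableName"], table["status"])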

Stop mirroring
REST API - Mirroring - Stop mirroring

Example:

POST https://api.fabric.microsoft.com/v1/workspaces/<your workspace ID>/mirroredDatabases/<mirrored database ID>/stopMirroring

Response 200: (No body)

Note

After stopping mirroring, you can call the Get mirroring status API to query the
mirroring status.

Microsoft Fabric .NET SDK


The .NET SDK that supports Fabric mirroring is available at Microsoft Fabric .NET SDK .
The version needs to be >= 1.0.0-beta.11.

Known limitations
Currently Service Principal/Managed Identity authentication is not supported if your
tenant home region is in North Central US or East US. You can use it in other regions.

Related content
REST API - Items



Troubleshoot Fabric mirrored databases
Article • 11/19/2024

Scenarios, resolutions, and workarounds for Microsoft Fabric mirrored databases.

Important

This feature is in preview.

Resources
Review the troubleshooting section of frequently asked questions for each data source:

Troubleshoot Mirroring Azure SQL Database and FAQ about Mirroring Azure SQL
Database
Troubleshoot Mirroring Azure SQL Managed Instance and FAQ about Mirroring
Azure SQL Managed Instance
Troubleshoot Mirroring Azure Cosmos DB and FAQ about Mirroring Azure Cosmos
DB
Troubleshoot Mirroring Snowflake
FAQ about Mirroring Azure Databricks
Troubleshoot mirroring from Fabric SQL database (preview) and FAQ for Mirroring
Fabric SQL database (preview)

Review limitations documentation for each data source:

Limitations in Microsoft Fabric mirrored databases from Azure SQL Database


Limitations in Microsoft Fabric mirrored databases from Azure SQL Managed
Instance (Preview)
Limitations in Microsoft Fabric mirrored databases from Azure Cosmos DB
(Preview)
Limitations in Microsoft Fabric mirrored databases from Azure Databricks (Preview)
Limitations in Microsoft Fabric mirrored databases from Snowflake
Limitations in mirroring from Fabric SQL database

Stop replication
When you select Stop replication, OneLake files remain as is, but incremental replication
stops. You can restart the replication at any time by selecting Start replication. You
might want to do stop/start replication when resetting the state of replication, after
source database changes, or as a troubleshooting tool.

Troubleshoot
This section contains general Mirroring troubleshooting steps.

I can't connect to a source database

1. Check that your connection details are correct: server name, database name, username,
and password.
2. Check that the server is not behind a firewall or private virtual network. Open the
appropriate firewall ports.

No views are replicated

Currently, views are not supported. Only regular tables can be replicated.

No tables are being replicated

1. Check the monitoring status to check the status of the tables. For more
information, see Monitor Fabric mirrored database replication.
2. Select the Configure replication button. Check to see if the tables are present in
the list of tables, or if any Alerts on each table detail are present.

Columns are missing from the destination table

1. Select the Configure replication button.


2. Select the Alert icon next to the table detail if any columns are not being
replicated.

Some of the data in my column appears to be truncated

The Fabric warehouse does not support VARCHAR(MAX); it currently supports only
VARCHAR(8000).

Data doesn't appear to be replicating

In the Monitoring page, the date shown is the last time data was successfully replicated.
I can't change the source database
Changing the source database is not supported. Create a new mirrored database.

Limits error messages


These common error messages have explanations and mitigations:

Error message: "The replication is being throttled due to destination space limit."
Reason: There's a maximum of 10 TB of storage space in the destination per mirrored
database. The replication is being throttled due to the destination space limit.
Mitigation: In the source database, drop tables, remove data, or shard.

Error message: "The tables count may exceed the limit, there could be some tables missing."
Reason: There's a maximum of 500 tables.
Mitigation: In the source database, drop or filter tables. If the new table is the 500th
table, no mitigation is required.

Error message: "The replication is being throttled and expected to continue at
YYYY-MM-DDTHH:MM:ss."
Reason: There's a maximum of 1 TB of change data captured per mirrored database per day.
Mitigation: Wait for throttling to end.

Related content
What is Mirroring in Fabric?
Monitor Fabric mirrored database replication



Statistics in Fabric data warehousing
Article • 09/10/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

The Warehouse in Microsoft Fabric uses a query engine to create an execution plan for a
given SQL query. When you submit a query, the query optimizer tries to enumerate all
possible plans and choose the most efficient candidate. To determine which plan would
require the least overhead (I/O, CPU, memory), the engine needs to be able to evaluate
the amount of work or rows that might be processed at each operator. Then, based on
each plan's cost, it chooses the one with the least amount of estimated work. Statistics
are objects that contain relevant information about your data, to allow query optimizer
to estimate these costs.

How to use statistics


To achieve optimal query performance, it is important to have accurate statistics.
Microsoft Fabric currently supports the following paths to provide relevant and up-to-
date statistics:

User-defined statistics
The user manually uses data definition language (DDL) syntax to create, update,
and drop statistics as needed
Automatic statistics
The engine automatically creates and maintains statistics at querytime

Manual statistics for all tables


The traditional option of maintaining statistics health is available in Microsoft Fabric.
Users can create, update, and drop histogram-based single-column statistics with
CREATE STATISTICS, UPDATE STATISTICS, and DROP STATISTICS, respectively. Users can
also view the contents of histogram-based single-column statistics with DBCC
SHOW_STATISTICS. Currently, a limited version of these statements is supported.

If creating statistics manually, consider focusing on columns heavily used in your


query workload (specifically in GROUP BYs, ORDER BYs, filters, and JOINs).
Consider updating column-level statistics regularly after data changes that
significantly change rowcount or distribution of the data.

Examples of manual statistics maintenance


To create statistics on the dbo.DimCustomer table, based on all the rows in a column
CustomerKey :

SQL

CREATE STATISTICS DimCustomer_CustomerKey_FullScan


ON dbo.DimCustomer (CustomerKey) WITH FULLSCAN;

To manually update the statistics object DimCustomer_CustomerKey_FullScan , perhaps


after a large data update:

SQL

UPDATE STATISTICS dbo.DimCustomer (DimCustomer_CustomerKey_FullScan) WITH


FULLSCAN;

To show information about the statistics object:

SQL

DBCC SHOW_STATISTICS ("dbo.DimCustomer",


"DimCustomer_CustomerKey_FullScan");

To show only information about the histogram of the statistics object:

SQL

DBCC SHOW_STATISTICS ("dbo.DimCustomer", "DimCustomer_CustomerKey_FullScan")


WITH HISTOGRAM;

To manually drop the statistics object DimCustomer_CustomerKey_FullScan :

SQL

DROP STATISTICS dbo.DimCustomer.DimCustomer_CustomerKey_FullScan;

The following T-SQL objects can also be used to check both manually created and
automatically created statistics in Microsoft Fabric:

sys.stats catalog view


sys.stats_columns catalog view
STATS_DATE system function
Automatic statistics at query
Whenever you issue a query and query optimizer requires statistics for plan exploration,
Microsoft Fabric automatically creates those statistics if they don't already exist. Once
statistics have been created, query optimizer can utilize them in estimating the plan
costs of the triggering query. In addition, if the query engine determines that existing
statistics relevant to query no longer accurately reflect the data, those statistics are
automatically refreshed. Because these automatic operations are done synchronously,
you can expect the query duration to include this time if the needed statistics do not yet
exist or significant data changes have happened since the last statistics refresh.

Verify automatic statistics at querytime


There are various cases where you can expect some type of automatic statistics. The
most common are histogram-based statistics, which are requested by the query
optimizer for columns referenced in GROUP BYs, JOINs, DISTINCT clauses, filters (WHERE
clauses), and ORDER BYs. For example, if you want to see the automatic creation of
these statistics, a query will trigger creation if statistics for COLUMN_NAME do not yet exist.
For example:

SQL

SELECT <COLUMN_NAME>
FROM <YOUR_TABLE_NAME>
GROUP BY <COLUMN_NAME>;

In this case, you should expect that statistics for COLUMN_NAME to have been created. If
the column was also a varchar column, you would also see average column length
statistics created. If you'd like to validate statistics were automatically created, you can
run the following query:

SQL

select
object_name(s.object_id) AS [object_name],
c.name AS [column_name],
s.name AS [stats_name],
s.stats_id,
STATS_DATE(s.object_id, s.stats_id) AS [stats_update_date],
s.auto_created,
s.user_created,
s.stats_generation_method_desc
FROM sys.stats AS s
INNER JOIN sys.objects AS o
ON o.object_id = s.object_id
LEFT JOIN sys.stats_columns AS sc
ON s.object_id = sc.object_id
AND s.stats_id = sc.stats_id
LEFT JOIN sys.columns AS c
ON sc.object_id = c.object_id
AND c.column_id = sc.column_id
WHERE o.type = 'U' -- Only check for stats on user-tables
AND s.auto_created = 1
AND o.name = '<YOUR_TABLE_NAME>'
ORDER BY object_name, column_name;

Now, you can find the statistics_name of the automatically generated histogram
statistic (should be something like _WA_Sys_00000007_3B75D760 ) and run the following T-
SQL:

SQL

DBCC SHOW_STATISTICS ('<YOUR_TABLE_NAME>', '<statistics_name>');

For example:

SQL

DBCC SHOW_STATISTICS ('sales.FactInvoice', '_WA_Sys_00000007_3B75D760');

The Updated value in the result set of DBCC SHOW_STATISTICS should be a date (in UTC)
similar to when you ran the original GROUP BY query.

These automatically generated statistics can then be leveraged in subsequent queries by
the query engine to improve plan costing and execution efficiency. If enough changes
occur in the table, the query engine will also refresh those statistics to improve query
optimization. The same sample exercise as before can be applied after changing the
table significantly. In Fabric, the SQL query engine uses the same recompilation
threshold as SQL Server 2016 (13.x) to refresh statistics.

Types of automatically generated statistics


In Microsoft Fabric, there are multiple types of statistics that are automatically generated
by the engine to improve query plans. Currently, they can be found in sys.stats although
not all are actionable:

Histogram statistics
Created per column needing histogram statistics at querytime
These objects contain histogram and density information regarding the
distribution of a particular column. Similar to the statistics automatically created
at querytime in Azure Synapse Analytics dedicated pools.
Name begins with _WA_Sys_ .
Contents can be viewed with DBCC SHOW_STATISTICS
Average column length statistics
Created for variable character (varchar) columns with a length greater than 100 that
need average column length at querytime.
These objects contain a value representing the average row size of the varchar
column at the time of statistics creation.
Name begins with ACE-AverageColumnLength_ .
Contents cannot be viewed and are nonactionable by user.
Table-based cardinality statistics
Created per table needing cardinality estimation at querytime.
These objects contain an estimate of the rowcount of a table.
Named ACE-Cardinality .
Contents cannot be viewed and are nonactionable by user.

Limitations
Only single-column histogram statistics can be manually created and modified.
Multi-column statistics creation is not supported.
Other statistics objects might appear in sys.stats, aside from manually created
statistics and automatically created statistics. These objects are not used for query
optimization.

Related content
Monitoring connections, sessions, and requests using DMVs



Caching in Fabric data warehousing
Article • 11/19/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

Retrieving data from the data lake is a crucial input/output (IO) operation with
substantial implications for query performance. Fabric Data Warehouse employs refined
access
patterns to enhance data reads from storage and elevate query execution speed.
Additionally, it intelligently minimizes the need for remote storage reads by leveraging
local caches.

Caching is a technique that improves the performance of data processing applications


by reducing the IO operations. Caching stores frequently accessed data and metadata in
a faster storage layer, such as local memory or local SSD disk, so that subsequent
requests can be served more quickly, directly from the cache. If a particular set of data
has been previously accessed by a query, any subsequent queries will retrieve that data
directly from the in-memory cache. This approach significantly diminishes IO latency, as
local memory operations are notably faster compared to fetching data from remote
storage.

Caching is fully transparent to the user. Irrespective of the origin, whether it be a


warehouse table, a OneLake shortcut, or even OneLake shortcut that references to non-
Azure services, the query caches all the data it accesses.

There are two types of caches that are described later in this article:

In-memory cache
Disk cache

In-memory cache
As the query accesses and retrieves data from storage, it performs a transformation
process that transcodes the data from its original file-based format into highly
optimized structures in in-memory cache.
Data in cache is organized in a compressed columnar format optimized for analytical
queries. Each column of data is stored together, separate from the others, allowing for
better compression since similar data values are stored together, leading to reduced
memory footprint. When queries need to perform operations on a specific column like
aggregates or filtering, the engine can work more efficiently since it doesn't have to
process unnecessary data from other columns.

Additionally, this columnar storage is also conducive to parallel processing, which can
significantly speed up query execution for large datasets. The engine can perform
operations on multiple columns simultaneously, taking advantage of modern multi-core
processors.

This approach is especially beneficial for analytical workloads where queries involve
scanning large amounts of data to perform aggregations, filtering, and other data
manipulations.

Disk cache
Certain datasets are too large to be accommodated within an in-memory cache. To
sustain rapid query performance for these datasets, Warehouse utilizes disk space as a
complementary extension to the in-memory cache. Any information that is loaded into
the in-memory cache is also serialized to the SSD cache.

Given that the in-memory cache has a smaller capacity compared to the SSD cache, data
that is removed from the in-memory cache remains within the SSD cache for an
extended period. When a subsequent query requests this data, it is retrieved from the SSD
cache into the in-memory cache at a significantly quicker rate than if fetched from
remote storage, ultimately providing you with more consistent query performance.

Cache management
Caching remains consistently active and operates seamlessly in the background,
requiring no intervention on your part. Disabling caching is not needed, as doing so
would inevitably lead to a noticeable deterioration in query performance.

The caching mechanism is orchestrated and upheld by Microsoft Fabric itself, and it
doesn't offer users the capability to manually clear the cache.

Full cache transactional consistency ensures that any modifications to the data in
storage, such as through Data Manipulation Language (DML) operations, after it has
been initially loaded into the in-memory cache, will result in consistent data.

When the cache reaches its capacity threshold and fresh data is being read for the first
time, objects that have remained unused for the longest duration will be removed from
the cache. This process is enacted to create space for the influx of new data and
maintain an optimal cache utilization strategy.

Related content
Fabric Data Warehouse performance guidelines



Understand V-Order for Microsoft
Fabric Warehouse
Article • 08/05/2024

Applies to: Warehouse in Microsoft Fabric

The Warehouse in Microsoft Fabric storage uses the Delta Lake table format for all user
data. In addition to optimizations provided by the Delta format, a warehouse applies
optimizations to storage to provide faster query performance on analytics scenarios
while maintaining adherence to the Parquet format. This article covers V-Order write
optimization, its benefits, and how to control it.

What is V-Order?
V-Order is a write time optimization to the parquet file format that enables lightning-
fast reads under the Microsoft Fabric compute engines, such as Power BI, SQL, Spark,
and others.

Power BI and SQL engines make use of Microsoft Verti-Scan technology and V-Ordered
parquet files to achieve in-memory-like data access times. Spark and other non-Verti-
Scan compute engines also benefit from the V-Ordered files with an average of 10%
faster read times, with some scenarios up to 50%.

V-Order works by applying special sorting, row group distribution, dictionary encoding,
and compression on Parquet files. As a result, compute engines require less network,
disk, and CPU resources to read data from storage, providing cost efficiency and
performance. It's 100% compliant with the open-source parquet format; all parquet
engines can read it as regular parquet files.

Performance considerations
Consider the following before deciding to disable V-Order:

Microsoft Fabric Direct Lake mode depends on V-Order.


In warehouse, the effect of V-Order on performance can vary depending on your
table schemas, data volumes, query, and ingestion patterns.
Make sure you test how V-Order affects the performance of data ingestion and of
your queries before deciding to disable it. Consider creating a copy of your test
warehouse using source control, disabling V-Order on the copy, and executing
data ingestion and querying tasks to test the performance implications.
Scenarios where V-Order might not be beneficial
Consider the effect of V-Order on performance before deciding if disabling V-Order is
right for you.

Caution

Currently, disabling V-Order can only be done at the warehouse level, and it is
irreversible: once disabled, it cannot be enabled again. Users must consider the
performance impact if they choose to disable V-Order in Fabric Warehouse.

Disabling V-Order can be useful for write-intensive warehouses, such as for warehouses
that are dedicated to staging data as part of a data ingestion process. Staging tables are
often dropped and recreated (or truncated) to process new data. These staging tables
might then be read only once or twice, which might not justify the ingestion time added
by applying V-Order. By disabling V-Order and reducing the time to ingest data, your
overall time to process data during ingestion jobs might be reduced. In this case, you
should segment the staging warehouse from your main user-facing warehouse, so that
the analytics queries and Power BI can benefit from V-Order.
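
If, after testing, you decide to disable V-Order for a staging warehouse, the change is made
with T-SQL at the warehouse level. The following is a hedged sketch only; the exact statement
and the is_vorder_enabled catalog column are assumptions here, so confirm the current syntax in
Disable V-Order on Warehouse in Microsoft Fabric before relying on it:

SQL

-- Check whether V-Order is currently enabled for the warehouse (assumed column).
SELECT [name], [is_vorder_enabled]
FROM sys.databases;

-- Disable V-Order for the current warehouse (assumed syntax; irreversible).
ALTER DATABASE CURRENT SET VORDER = OFF;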

Related content
Disable V-Order on Warehouse in Microsoft Fabric
Delta Lake table optimization and V-Order



Monitor Fabric Data warehouse
Article • 11/19/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

Monitoring the usage and activity is crucial for ensuring that your warehouse operates
efficiently.

Fabric provides a set of tools to help you:

Optimize query performance


Gain insights into your Fabric capacity to determine when it's time to scale up or
down
Understand details about running and completed queries

Microsoft Fabric Capacity Metrics app


The Microsoft Fabric Capacity Metrics app provides visibility into capacity usage of each
warehouse allowing you to see the compute charges for all user-generated and system-
generated T-SQL statements within a warehouse and SQL analytics endpoint. For more
information on monitoring capacity usage, see Billing and utilization reporting in Fabric
Data Warehouse.

Query activity
Users are provided a one-stop view of their running and completed queries in an easy-
to-use interface, without having to run T-SQL. For more information, see Monitor your
running and completed queries using Query activity.

Query insights
Query Insights provides historical query data for completed, failed, and canceled queries,
along with aggregated insights to help you tune your query performance. For more
information, see Query insights in Fabric data warehousing.

Dynamic management views (DMVs)


Users can get insights about their live connections, sessions, and requests by querying a
set of dynamic management views (DMVs) with T-SQL. For more information, see
Monitor connections, sessions, and requests using DMVs.

Related content
Billing and utilization reporting in Fabric Data Warehouse
Monitor your running and completed queries using Query activity
Query insights in Fabric data warehousing
Monitor connections, sessions, and requests using DMVs



Billing and utilization reporting in Synapse
Data Warehouse
Article • 08/22/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

The article explains compute usage reporting of the Synapse Data Warehouse in Microsoft Fabric,
which includes read and write activity against the Warehouse, and read activity on the SQL analytics
endpoint of the Lakehouse.

When you use a Fabric capacity, your usage charges appear in the Azure portal under your
subscription in Microsoft Cost Management. To understand your Fabric billing, visit Understand your
Azure bill on a Fabric capacity.

For more information about monitoring current and historical query activity, see Monitor in Fabric
Data warehouse overview.

Capacity
In Fabric, based on the Capacity SKU purchased, you're entitled to a set of Capacity Units (CUs) that
are shared across all Fabric workloads. For more information on licenses supported, see Microsoft
Fabric licenses.

Capacity is a dedicated set of resources that is available at a given time to be used. Capacity defines
the ability of a resource to perform an activity or to produce output. Different resources consume
CUs at different times.

Capacity in Fabric Synapse Data Warehouse


In the capacity-based SaaS model, Fabric data warehousing aims to make the most of the purchased
capacity and provide visibility into usage.

CUs consumed by data warehousing include read and write activity against the Warehouse, and
read activity on the SQL analytics endpoint of the Lakehouse.

In simple terms, 1 Fabric capacity unit = 0.5 Warehouse vCores. For example, a Fabric capacity SKU
F64 has 64 capacity units, which is equivalent to 32 Warehouse vCores.

Compute usage reporting


The Microsoft Fabric Capacity Metrics app provides visibility into capacity usage for all Fabric
workloads in one place. Administrators can use the app to monitor capacity, the performance of
workloads, and their usage compared to purchased capacity.
Initially, you must be a capacity admin to install the Microsoft Fabric Capacity Metrics app. Once
installed, anyone in the organization can have permissions granted or shared to view the app. For
more information, see Install the Microsoft Fabric Capacity Metrics app.

Once you have installed the app, select the Warehouse from the Select item kind: dropdown list.
The Multi metric ribbon chart and the Items (14 days) data table now show only Warehouse
activity.

Warehouse operation categories


You can analyze universal compute capacity usage by workload category, across the tenant. Usage is
tracked by total Capacity Unit Seconds (CUs). The table displayed shows aggregated usage across
the last 14 days.

Both the Warehouse and SQL analytics endpoint roll up under Warehouse in the Metrics app, as they
both use SQL compute. The operation categories seen in this view are:

Warehouse Query: Compute charge for all user-generated and system-generated T-SQL
statements within a Warehouse.
SQL analytics endpoint Query: Compute charge for all user generated and system generated
T-SQL statements within a SQL analytics endpoint.
OneLake Compute: Compute charge for all reads and writes for data stored in OneLake.

Timepoint explore graph
This graph in the Microsoft Fabric Capacity Metrics app shows utilization of resources compared to
capacity purchased. 100% of utilization represents the full throughput of a capacity SKU and is
shared by all Fabric workloads. This is represented by the yellow dotted line. Selecting a specific
timepoint in the graph enables the Explore button, which opens a detailed drill through page.

In general, similar to Power BI, operations are classified either as interactive or background, and
denoted by color. Most operations in Warehouse category are reported as background to take
advantage of 24-hour smoothing of activity to allow for the most flexible usage patterns. Classifying
data warehousing as background reduces the frequency of peaks of CU utilization from triggering
throttling.

Timepoint drill through graph


This table in the Microsoft Fabric Capacity Metrics app provides a detailed view of utilization at
specific timepoints. The amount of capacity provided by the given SKU per 30-second period is
shown along with the breakdown of interactive and background operations. The interactive
operations table represents the list of operations that were executed at that timepoint.

The Background operations table might appear to display operations that were executed much
before the selected timepoint. This is due to background operations undergoing 24-hour
smoothing. For example, the table displays all operations that were executed and are still
being smoothed at the selected timepoint.

Top use cases for this view include:

Identification of a user who scheduled or ran an operation: values can be either a
user's email address, "System", or "Power BI Service".
Examples of user generated statements include running T-SQL queries or activity in the
Fabric portal, such as the SQL Query editor or Visual Query editor.
Examples of "System" generated statements include metadata synchronous activities and
other system background tasks that are run to enable faster query execution.

Identification of an operation status: values can be either "Success", "InProgress", "Cancelled",
"Failure", "Invalid", or "Rejected".
The "Cancelled" status indicates queries that were canceled before completing.
The "Rejected" status can occur because of resource limitations.

Identification of an operation that consumed many resources: sort the table by Total CU(s)
descending to find the most expensive queries, then use Operation Id to uniquely identify an
operation. This is the distributed statement ID, which can be used in other monitoring tools
like dynamic management views (DMVs) and Query Insights for end-to-end traceability, such
as in dist_statement_id in sys.dm_exec_requests, and distributed_statement_id in
queryinsights.exec_requests_history. Examples:

The following sample T-SQL query uses an Operation Id inside a query on the
sys.dm_exec_requests dynamic management view.

SQL

SELECT * FROM sys.dm_exec_requests
WHERE dist_statement_id = '00AA00AA-BB11-CC22-DD33-44EE44EE44EE';
The following T-SQL query uses an Operation Id in a query on the
queryinsights.exec_requests_history view.

SQL

SELECT * FROM queryinsights.exec_requests_history
WHERE distributed_statement_id = '00AA00AA-BB11-CC22-DD33-44EE44EE44EE';

Billing example
Consider the following query:

SQL

SELECT * FROM Nyctaxi;

For demonstration purposes, assume the billing metric accumulates 100 CU seconds.

The cost of this query is CU seconds times the price per CU. Assume in this example that the price
per CU is $0.18/hour. There are 3,600 seconds in an hour. So, the cost of this query would be
(100 x $0.18) / 3600 = $0.005.

The numbers used in this example are for demonstration purposes only and not actual billing
metrics.

Considerations
Consider the following usage reporting nuances:

Cross database reporting: When a T-SQL query joins across multiple warehouses (or across a
Warehouse and a SQL analytics endpoint), usage is reported against the originating resource.
Queries on system catalog views and dynamic management views are billable queries.
Duration(s) field reported in Fabric Capacity Metrics App is for informational purposes only. It
reflects the statement execution duration. Duration might not include the complete end-to-
end duration for rendering results back to the web application like the SQL Query Editor or
client applications like SQL Server Management Studio and Azure Data Studio.

Next step
How to: Observe Synapse Data Warehouse utilization trends

Related content
Monitor connections, sessions, and requests using DMVs
Workload management
Synapse Data Warehouse in Microsoft Fabric performance guidelines
What is the Microsoft Fabric Capacity Metrics app?
Smoothing and throttling in Fabric Data Warehousing
Understand your Azure bill on a Fabric capacity
Understand the metrics app compute page
Pause and resume in Fabric data warehousing
Monitor Fabric Data warehouse



Monitor connections, sessions, and
requests using DMVs
Article • 04/24/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

You can use existing dynamic management views (DMVs) to monitor connection,
session, and request status in Microsoft Fabric. For more information about the tools
and methods of executing T-SQL queries, see Query the Warehouse.

How to monitor connections, sessions, and requests using query lifecycle DMVs
For the current version, there are three dynamic management views (DMVs) provided
for you to receive live SQL query lifecycle insights.

sys.dm_exec_connections
Returns information about each connection established between the warehouse
and the engine.
sys.dm_exec_sessions
Returns information about each session authenticated between the item and
engine.
sys.dm_exec_requests
Returns information about each active request in a session.

These three DMVs provide detailed insight on the following scenarios:

Who is the user running the session?


When was the session started by the user?
What's the ID of the connection to the data Warehouse and the session that is
running the request?
How many queries are actively running?
Which queries are long running?

In this tutorial, learn how to monitor your running SQL queries using dynamic
management views (DMVs).

Example DMV queries


The following example queries sys.dm_exec_sessions to find all sessions that are
currently executing.

SQL

SELECT *
FROM sys.dm_exec_sessions;

Find the relationship between connections and sessions


The following example joins sys.dm_exec_connections and sys.dm_exec_sessions to show
the relationship between an active session and its connection.

SQL

SELECT connections.connection_id,
connections.connect_time,
sessions.session_id, sessions.login_name, sessions.login_time,
sessions.status
FROM sys.dm_exec_connections AS connections
INNER JOIN sys.dm_exec_sessions AS sessions
ON connections.session_id=sessions.session_id;

Identify and KILL a long-running query


This first query identifies long-running queries, ordered by which query has taken the
longest since it arrived.

SQL
SELECT request_id, session_id, start_time, total_elapsed_time
FROM sys.dm_exec_requests
WHERE status = 'running'
ORDER BY total_elapsed_time DESC;

This second query shows which user ran the session that has the long-running query.

SQL

SELECT login_name
FROM sys.dm_exec_sessions
WHERE session_id = 'SESSION_ID WITH LONG-RUNNING QUERY';

This third query shows how to use the KILL command on the session_id with the long-
running query.

SQL

KILL 'SESSION_ID WITH LONG-RUNNING QUERY'

For example

SQL

KILL '101'
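
The three steps above can also be combined into one statement that joins the request and
session DMVs, so you can see who owns each long-running query before you decide to KILL its
session. This is a minimal sketch that uses only the DMVs and columns shown in the previous
examples:

SQL

SELECT requests.session_id,
       sessions.login_name,
       requests.start_time,
       requests.total_elapsed_time
FROM sys.dm_exec_requests AS requests
INNER JOIN sys.dm_exec_sessions AS sessions
    ON requests.session_id = sessions.session_id
WHERE requests.status = 'running'
ORDER BY requests.total_elapsed_time DESC;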

Permissions
An Admin has permissions to execute all three DMVs ( sys.dm_exec_connections ,
sys.dm_exec_sessions , sys.dm_exec_requests ) to see their own and others'

information within a workspace.


A Member, Contributor, or Viewer can execute sys.dm_exec_sessions and
sys.dm_exec_requests and see their own results within the warehouse, but doesn't
have permission to execute sys.dm_exec_connections.


Only an Admin has permission to run the KILL command.

Related content
Query using the SQL Query editor
Query the Warehouse and SQL analytics endpoint in Microsoft Fabric
Query insights in the Warehouse and SQL analytics endpoint in Microsoft Fabric


Monitor your running and completed T-
SQL queries using Query activity
Article • 06/02/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

Monitoring SQL queries is essential for troubleshooting performance and maintaining the
efficiency of your Fabric warehouse. With Query activity, you have a one-stop view of all
running and historical T-SQL queries, along with a list of long-running and frequently run
queries, without having to run any T-SQL code. You can use Query activity to ensure that
queries aren't taking longer than expected to execute and are completing successfully.

Prerequisites
You must be an admin in your workspace to access Query activity. Members,
Contributors, and Viewers don't have permission to access this view.

Get started
There are two ways you can launch the Query activity experience.

Select More Options (...) next to the warehouse you want to monitor within the
workspace view and select Query activity.
Within the query editor of the warehouse you want to monitor, select Query
activity in the ribbon.

Query runs
On the Query runs page, you can see a list of running, succeeded, canceled, and failed
queries up to the past 30 days.

Use the dropdown list to filter for status, submitter, or submit time.
Use the search bar to filter for specific keywords in the query text or other
columns.
For each query, the following details are provided:

Column name              | Description
------------------------ | ------------------------------------------------------
Distributed statement Id | Unique ID for each query
Query text               | Text of the executed query (up to 8,000 characters)
Submit time (UTC)        | Timestamp when the request arrived
Duration                 | Time it took for the query to execute
Status                   | Query status (Running, Succeeded, Failed, or Canceled)
Submitter                | Name of the user or system that sent the query
Session Id               | ID linking the query to a specific user session
Run source               | Name of the client program that initiated the session

When you want to reload the queries that are displayed on the page, select the Refresh
button in the ribbon. If you see a running query that you would like to stop immediately,
select the query using the checkbox and then select the Cancel button. You'll be
prompted with a dialog to confirm before the query is canceled.
Any unselected queries that are part of the same SQL sessions you select will also be
canceled.

The same information regarding running queries can also be found using dynamic
management views.

Query insights
On the Query insights page, you can see a list of long running queries and frequently
run queries to help determine any trends within your warehouse's queries.

For each query in the Long running queries insight, the following details are provided:

Column name                       | Description
--------------------------------- | ----------------------------------------------------
Query text                        | Text of the executed query (up to 8,000 characters)
Median run duration               | Median query execution time (ms) across runs
Run count                         | Total number of times the query was executed
Last run duration                 | Time taken by the last execution (ms)
Last run distributed statement ID | Unique ID for the last query execution
Last run session ID               | Session ID for the last execution

For each query in the Frequently run queries insight, the following details are provided:

Column name                       | Description
--------------------------------- | ----------------------------------------------------
Query text                        | Text of the executed query (up to 8,000 characters)
Average run duration              | Average query execution time (ms) across runs
Max duration                      | Longest query execution time (ms)
Min duration                      | Shortest query execution time (ms)
Last run distributed statement ID | Unique ID for the last query execution
Run count                         | Total number of times the query was executed
Count of successful runs          | Number of successful query executions
Count of failed runs              | Number of failed query executions
Count of canceled runs            | Number of canceled query executions

The same information regarding completed, failed, and canceled queries from Query
runs along with aggregated insights can also be found in Query insights in Fabric data
warehousing.
Limitations
Historical queries can take up to 15 minutes to appear in Query activity depending
on the concurrent workload being executed.
Only the top 10,000 rows can be shown in the Query runs and Query insights tabs
for the given filter selections.
An "Invalid object name queryinsights.exec_requests_history" error might occur if
Query activity is opened immediately after a new warehouse is created, due to the
underlying system views not yet generated. As a workaround, wait two minutes,
then refresh the page.

Related content
Billing and utilization reporting in Synapse Data Warehouse
Query insights in Fabric data warehousing
Monitor connections, sessions, and requests using DMVs



Query insights in Fabric data
warehousing
Article • 11/20/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

In Microsoft Fabric, the query insights feature is a scalable, sustainable, and extendable
solution to enhance the SQL analytics experience. With historical query data, aggregated
insights, and access to actual query text, you can analyze and tune your query
performance. Query insights provides information only on queries run in a user's context;
system queries aren't considered.

The query insights feature provides a central location for historic query data and
actionable insights for 30 days, helping you to make informed decisions to enhance the
performance of your Warehouse or SQL analytics endpoint. When a SQL query runs in
Microsoft Fabric, the query insights feature collects and consolidates its execution data,
providing you with valuable information. You can view complete query text for Admin,
Member, and Contributor roles.

Historical Query Data: The query insights feature stores historical data about
query executions, enabling you to track performance changes over time. System
queries aren't stored in query insights.
Aggregated Insights: The query insights feature aggregates query execution data
into insights that are more actionable, such as identifying long-running queries or
most active users. These aggregations are based on the query shape. For more
information, see How are similar queries aggregated to generate insights?

Before you begin


You should have access to a SQL analytics endpoint or Warehouse within a Premium
capacity workspace with contributor or higher permissions.

When do you need query insights?


The query insights feature addresses several questions and concerns related to query
performance and database optimization, including:

Query Performance Analysis

What is the historical performance of our queries?


Are there any long-running queries that need attention?
Can we identify the queries causing performance bottlenecks?
Was cache utilized for my queries?
Which queries are consuming the most CPU?

Query Optimization and Tuning

Which queries are frequently run, and can their performance be improved?
Can we identify queries that have failed or been canceled?
Can we track changes in query performance over time?
Are there any queries that consistently perform poorly?

User Activity Monitoring

Who submitted a particular query?


Who are the most active users or the users with the most long-running queries?

There are four system views to provide answers to these questions:

queryinsights.exec_requests_history (Transact-SQL)
Returns information about each completed SQL request/query.

queryinsights.exec_sessions_history (Transact-SQL)
Returns information about each completed session.

queryinsights.long_running_queries (Transact-SQL)
Returns information about queries by query execution time.

queryinsights.frequently_run_queries (Transact-SQL)
Returns information about frequently run queries.

Where can you see query insights?


Autogenerated views are under the queryinsights schema in SQL analytics endpoint
and Warehouse. In the Fabric Explorer of a Warehouse for example, find query insights
views under Schemas, queryinsights, Views.
After your query completes execution, you see its execution data in the queryinsights
views of the Warehouse or SQL analytics endpoint you were connected to. If you run a
cross-database query while in the context of WH_2 , your query appears in the query
insights of WH_2 . Completed queries can take up to 15 minutes to appear in query
insights depending on the concurrent workload being executed. The time taken for
queries to appear in query insights increases as the number of concurrently executing
queries increases.
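
To confirm that these views are present in your warehouse, you can list them with the standard
catalog views. This is a small sketch that assumes sys.views and the SCHEMA_ID function behave
as they do in SQL Server:

SQL

SELECT name AS view_name
FROM sys.views
WHERE schema_id = SCHEMA_ID('queryinsights');
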
How are similar queries aggregated to
generate insights?
Queries are considered the same by query insights if they have the same shape, even if
their predicates are different.

You can utilize the query hash column in the views to analyze similar queries and drill
down to each execution.

For example, the following queries are considered the same after their predicates are
parameterized:

SQL

SELECT * FROM Orders
WHERE OrderDate BETWEEN '1996-07-01' AND '1996-07-31';

and

SQL

SELECT * FROM Orders
WHERE OrderDate BETWEEN '2000-07-01' AND '2006-07-31';
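
Because both statements share the same query hash, you can group historical executions by the
query_hash column to see how often each query shape runs and how much CPU it uses. This is a
minimal sketch against queryinsights.exec_requests_history, using only columns that appear in
the examples below:

SQL

SELECT query_hash,
       COUNT(*) AS run_count,
       AVG(allocated_cpu_time_ms) AS avg_cpu_time_ms,
       MAX(start_time) AS last_run_time
FROM queryinsights.exec_requests_history
GROUP BY query_hash
ORDER BY run_count DESC;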

Examples

Identify queries run by you in the last 30 minutes


The following query uses queryinsights.exec_requests_history and the built-in
USER_NAME() function, which returns your current session user name.

SQL

SELECT * FROM queryinsights.exec_requests_history
WHERE start_time >= DATEADD(MINUTE, -30, GETUTCDATE())
  AND login_name = USER_NAME();

Identify top CPU consuming queries by CPU time


The following query returns the top 100 queries by allocated CPU time.
SQL

SELECT TOP 100 distributed_statement_id, query_hash, allocated_cpu_time_ms,
    label, command
FROM queryinsights.exec_requests_history
ORDER BY allocated_cpu_time_ms DESC;

Identify which queries are scanning most data from remote rather than cache
You can determine whether large data scans during query execution are slowing down
your query, and make decisions to tweak your query code accordingly. This analysis
allows you to compare different query executions and identify if the variance in the
amount of data scanned is the reason for performance changes.

Furthermore, you can assess the use of cache by examining the sum of
data_scanned_memory_mb and data_scanned_disk_mb , and comparing it to the

data_scanned_remote_storage_mb for past executions.

Note

The data scanned values might not account for the data moved during the
intermediate stages of query execution. In some cases, the size of the data moved
and the CPU required to process it may be larger than the data scanned value indicates.

SQL

SELECT distributed_statement_id, query_hash, data_scanned_remote_storage_mb,
    data_scanned_memory_mb, data_scanned_disk_mb, label, command
FROM queryinsights.exec_requests_history
ORDER BY data_scanned_remote_storage_mb DESC;

Identify the most frequently run queries using a substring in the query text
The following query returns the most recent queries that match a certain string, ordered
by the number of successful executions descending.

SQL

SELECT * FROM queryinsights.frequently_run_queries
WHERE last_run_command LIKE '%<some_label>%'
ORDER BY number_of_successful_runs DESC;

Identify long-running queries using a substring in the query text
The following query returns the queries that match a certain string, ordered by the
median query execution time descending.

SQL

SELECT * FROM queryinsights.long_running_queries
WHERE last_run_command LIKE '%<some_label>%'
ORDER BY median_total_elapsed_time_ms DESC;

Related content
Monitoring connections, sessions, and requests using DMVs
queryinsights.exec_requests_history (Transact-SQL)
queryinsights.exec_sessions_history (Transact-SQL)
queryinsights.long_running_queries (Transact-SQL)
queryinsights.frequently_run_queries (Transact-SQL)



How to: Observe Synapse Data
Warehouse utilization trends
Article • 08/22/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

Learn how to observe trends and spikes in your data warehousing workload in Microsoft
Fabric using the Microsoft Fabric Capacity Metrics app.

The Microsoft Fabric Capacity Metrics app provides visibility into capacity usage for all
Fabric workloads in one place. It's mostly used by capacity administrators to monitor the
performance of workloads and their usage, compared to purchased capacity.

Prerequisites
Have a Microsoft Fabric license, which grants Capacity Units (CUs) shared across
all Fabric workloads.
Add the Microsoft Fabric Capacity Metrics app from AppSource.

Observe overall trend across all items in Fabric capacity
In the Fabric Capacity Metrics app, use the Multi metric ribbon chart to find peaks in CU
utilization. Look for patterns in your Fabric usage that coincide with peak end-user
activity, nightly processing, periodic reporting, etc. Determine what resources are
consuming the most CUs at peak utilization and/or business hours.

This graph can provide high-level CU trends in the last 14 days to see which Fabric
workload has used the most CU.

1. Use the Item table to identify specific warehouses consuming most Compute. The
Items table in the multi metric ribbon chart provides aggregated consumption at
item level. In this view, for example, you can identify which items have consumed
the most CUs.
2. Select "Warehouse" in the Select item kind(s) dropdown list.
3. Sort the Item table by CU(s), descending.
4. You can now identify the items using the most capacity units, overall duration of
activity, number of users, and more.
Drill through peak activity
Use the timepoint graph to identify a range of activity where CU utilization was at its
peak. We can identify individual interactive and background activities consuming
utilization.

The following animated image walks through several steps you can use to drill through
utilization, throttling, and overage information. For more information, visit Throttling in
Microsoft Fabric.

1. Select the Utilization tab in timepoint explore graph to identify the timepoint at
which capacity utilization exceeded more than what was purchased. The yellow
dotted line provides visibility into upper SKU limit. The upper SKU limit is based on
the SKU purchased along with the enablement of autoscale, if the capacity has
autoscale enabled.
2. Select the Throttling tab and go to the Background rejection section, which is
most applicable for Warehouse requests. In the previous sample animated image,
observe that on October 16, 2023 at 12:57 PM, all background requests in the
capacity were throttled. The 100% line represents the maximum limit based on the
Fabric SKU purchased.
3. Select the Overages tab. This graph gives an overview of the debt that is being
collected and carry forwarded across time periods.
Add % (Green): When the capacity overloads and starts adding to debt
bucket.
Burndown % (Blue): When the debt starts burning down and overall capacity
utilization falls below 100%.
Cumulative % (Red): Represents the total overall debt at timepoints. This
needs to be burned down eventually.
4. In the Utilization, Throttling, or Overages tabs, select a specific timepoint to
enable the Explore button for further drill through analysis.
5. Select Explore. The new page provides tables to explore details of both interactive
and background operations. The page shows some background operations that
are not occurring at that time, due to the 24-hour smoothing logic. In the previous
animated image, operations are displayed between October 15 12:57 PM to
October 16 12:57 PM, because of the background operations still being smoothed
at the selected timepoint.
6. In the Background operations table, you can also identify users, operations,
start/stop times, durations that consumed the most CUs.

The table of operations also provides a list of operations that are InProgress,
so you can understand long running queries and its current CU consumption.

Identification of an operation that consumed many resources: sort the table


by Total CU(s) descending to find the most expensive queries, then use
Operation Id to uniquely identify an operation. This is the distributed
statement ID, which can be used in other monitoring tools like dynamic
management views (DMVs) and Query Insights for end-to-end traceability,
such as in dist_statement_id in sys.dm_exec_requests, and
distributed_statement_id in queryinsights.exec_requests_history. Examples:

The following sample T-SQL query uses the Operation Id inside a query on
the sys.dm_exec_requests dynamic management view.

SQL

SELECT * FROM sys.dm_exec_requests
WHERE dist_statement_id = '00AA00AA-BB11-CC22-DD33-44EE44EE44EE';

The following T-SQL query uses the Operation Id in a query on the


queryinsights.exec_requests_history view.

SQL

SELECT * FROM queryinsights.exec_requests_history
WHERE distributed_statement_id = '00AA00AA-BB11-CC22-DD33-44EE44EE44EE';

7. The Burndown table graph represents the different Fabric workloads that are
running on this capacity and the % compute consumed by them at the selected
timepoint.

The table entry for DMS is your Warehouse workload. In the previous sample
animated image, DMS has added 26% to the overall carryforward debt.
The Cumulative % column provides a percentage of how much the capacity
has overconsumed. This value should be below 100% to avoid throttling. For
example, in the previous sample animated image, 2433.84% indicates that
DMS used 24 times more capacity than what the current SKU (F2) allows.

Related content
Billing and utilization reporting in Synapse Data Warehouse
Monitor connections, sessions, and requests using DMVs
Workload management
Synapse Data Warehouse in Microsoft Fabric performance guidelines
What is the Microsoft Fabric Capacity Metrics app?
Smoothing and throttling in Fabric Data Warehousing
Pause and resume in Fabric data warehousing



Burstable capacity in Fabric data
warehousing
Article • 04/24/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

A Fabric capacity is a distinct pool of resources whose size (or SKU) determines the
amount of computational power available. Warehouse and SQL analytics endpoint
provide burstable capacity that allows workloads to use more resources to achieve
better performance.

Burstable capacity
Burstable capacity has a direct correlation to the SKU that has been assigned to the
Fabric capacity of the workspace. It also is a function of the workload. A non-demanding
workload might never use burstable capacity units. The workload could achieve optimal
performance within the baseline capacity that has been purchased.

To determine if your workload is using burstable capacity, the following formula can be
used to calculate the scale factor for your workload: Capacity Units (CU) / duration /
Baseline CU = Scale factor

As an illustration of this formula, if your capacity is an F8, and your workload takes 100
seconds to complete, and it uses 1500 CU, the scale factor would be calculated as
follows: 1500 / 100 / 8 = 1.875

CU can be determined by using the Microsoft Fabric Capacity Metrics app.

When a scale factor is over 1, it means that burstable capacity is being used to meet the
demands of the workload. It also means that your workload is borrowing capacity units
from a future time interval. This is a fundamental concept of Microsoft Fabric called
smoothing.

Smoothing offers relief for customers who create sudden spikes during their peak times,
while they have a lot of idle capacity that is unused. Smoothing simplifies capacity
management by spreading the evaluation of compute to ensure that customer jobs run
smoothly and efficiently.

SKU guardrails
Burstable capacity is finite. There's a limit applied to the backend compute resources to
greatly reduce the risk of Warehouse and SQL analytics endpoint workloads causing
throttling.

The limit (or guardrail) is a scale factor directly correlated to the Fabric Capacity SKU size
that is assigned to the workspace.

Fabric SKU | Equivalent Premium SKU | Baseline Capacity Units (CU) | Burstable Scale Factor
---------- | ---------------------- | ---------------------------- | ----------------------
F2         |                        | 2                            | 1x - 32x
F4         |                        | 4                            | 1x - 16x
F8         |                        | 8                            | 1x - 12x
F16        |                        | 16                           | 1x - 12x
F32        |                        | 32                           | 1x - 12x
F64        | P1                     | 64                           | 1x - 12x
F128       | P2                     | 128                          | 1x - 12x
F256       | P3                     | 256                          | 1x - 12x
F512       | P4                     | 512                          | 1x - 12x
F1024      | P5                     | 1024                         | 1x - 12x
F2048      |                        | 2048                         | 1x - 12x

Smaller SKU sizes are often used for Dev/Test scenarios or ad hoc workloads. The larger
scale factor shown in the table gives more processing power that aligns with lower
overall utilization typically found in those environments.

Larger SKU sizes have access to more total capacity units, allowing more complex
workloads to run optimally and with more concurrency. Therefore, if desired
performance of a workload is not being achieved, increasing the capacity SKU size might
be beneficial.

Note

The maximum Burstable Scale Factor might only be observed for extremely small
time intervals, often within a single query for seconds or even milliseconds. When
using the Microsoft Fabric Capacity Metrics app to observe burstable capacity, the
scale factor over longer durations will be lower.

Isolation boundaries
Warehouse fully isolates ingestion from query processing, as described in Workload
management.

The burstable scale factor can be achieved independently for ingestion at the same time
the burstable scale factor is achieved for query processing. These scale factors
encapsulate all processes within a single workspace. However, capacity can be assigned
to multiple workspaces. Therefore, the aggregate max scale factor across a capacity
would be represented in the following formula: ([Query burstable scale factor] +
[Ingestion burstable scale factor]) * [number of Fabric workspaces] = [aggregate
burstable scale factor]
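
For example, if a single capacity is assigned to two workspaces and each workspace
independently reaches a 12x scale factor for query processing and a 12x scale factor for
ingestion at the same moment, the aggregate maximum would be (12 + 12) * 2 = 48 times the
baseline CUs, even if only for a brief interval.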

Considerations
Typically, a complex query running in a workspace assigned to a small capacity
SKU size should run to completion. However, if the data retrieval or intermediate
data processing physically can't run within the burstable scale factor, it results in
the following error message: This query was rejected due to current capacity
constraints. Review the performance guidelines to ensure data and query

optimization prior to increasing SKU size. To increase the SKU size, contact your
capacity administrator.

After the capacity is resized, new guardrails will be applied when the next query is
run. Performance should stabilize to the new capacity SKU size within a few
seconds of the first query submission.

A workload running on a nonoptimal capacity size can be subject to resource


contention (such as spilling) that can increase the CU usage of the workload.

Related content
Workload management
Scale your capacity
Smoothing and throttling in Fabric Data Warehousing
Manage capacity settings


Pause and resume in Fabric data
warehousing
Article • 04/24/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

Microsoft Fabric capacity can be paused to enable cost savings for your organization.
Similar to other workloads, Synapse Data Warehouse in Microsoft Fabric is affected
when the Fabric capacity is paused.

A Warehouse or Lakehouse in Microsoft Fabric cannot be paused individually. To learn


more about how to pause and resume your Fabric capacity, visit Pause and resume your
capacity.

Effect on user requests


An administrator can pause an active Fabric capacity at any time, even while SQL
statements are executing. Users can expect the following behavior when a capacity is
paused:

New requests: Once a capacity is paused, users cannot execute new SQL
statements or queries. This also includes activity on the Fabric portal like create
operations, loading data grid, opening model view, opening visual query editor.
Any new activity attempted after capacity is paused returns the following error
message Unable to complete the action because this Fabric capacity is
currently paused.

In client application tools like SQL Server Management Studio (SSMS) or Azure
Data Studio, users signing in to a paused capacity will get the same error text
with SQL error code: 24800.
In client application tools like SQL Server Management Studio (SSMS) or Azure
Data Studio, users attempting to run a new TSQL query on an existing
connection when capacity is paused will see the same error text with SQL error
code: 24802.
In-flight requests: Any open requests, like SQL statements in execution, or activity
on the SQL Query Editor, visual query editor, or modeling view, are canceled with
an error message like Unable to complete the action because this Fabric
capacity is currently paused.

User transactions: When a capacity gets paused in the middle of a user transaction
like BEGIN TRAN and COMMIT TRAN , the transactions roll back.
Note

The user experience of rejecting new requests and canceling in-flight requests is
consistent across both Fabric portal and client applications like SQL Server
Management Studio (SSMS) or Azure Data Studio.

Effect on system background tasks


Like user-initiated tasks, system background tasks that are in-flight are canceled when
capacity is paused. Examples of system-generated statements include metadata
synchronous activities and other background tasks that are run to enable faster query
execution.

Some cleanup activity might be affected when compute is paused. For example,
historical data older than the current data retention settings is not removed while the
capacity is paused. The activities catch up once the capacity resumes.

Effect on cache and performance


When a Fabric capacity is paused, warehouse compute resources are shut down
gracefully. For best performance, caches need to be kept warm all the time. In such
scenarios, it's not recommended to pause the underlying capacity.

When a Fabric capacity is resumed, it restarts the warehouse compute resources with a
clean cache, and it takes a few runs to add relevant data back to the cache. During this
time after a resume operation, there could be perceived performance slowdowns.

Tip

Make a trade-off between performance and cost before deciding to pause the
underlying Fabric capacity.

Effect on billing
When capacity is manually paused, it effectively pauses the compute billing meters
for all Microsoft Fabric workloads, including Warehouse.
Data warehouses do not report compute usage once the pause workflow is initiated.
The OneLake storage billing meter is not paused. You continue to pay for storage
when compute is paused.
Learn more about billing implications here: Understand your Fabric capacity Azure bill.

Considerations and limitations


In the event of pause, in-flight requests in client application tools like SQL Server
Management Studio (SSMS) or Azure Data Studio receive generic error messages
that do not indicate the intent behind cancellation. A few sample error messages in
this case would be (not limited to):
An existing connection was forcibly closed by the remote host
Internal error. Unable to properly update physical metadata. Please try the

operation again and contact Customer Support Services if this persists.


A severe error occurred on the current command. The results, if any, should

be discarded.

Once the capacity resumes, it might take a couple of minutes to start accepting
new requests.
Background cleanup activity might be affected when compute is paused. The
activities catch up once the capacity resumes.

Related content
Scale your capacity
Workload management

Next step
Pause and resume your capacity



Smoothing and throttling in Fabric Data
Warehousing
Article • 10/08/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

This article details the concepts of smoothing and throttling in workloads using
Warehouse and SQL analytics endpoint in Microsoft Fabric.

This article is specific to data warehousing workloads in Microsoft Fabric. For all Fabric
workloads and general information, see Throttling in Microsoft Fabric.

Compute capacity
Capacity forms the foundation in Microsoft Fabric and provides the computing power
that drives all Fabric workloads. Based on the Capacity SKU purchased, you're entitled to
a set of Capacity Units (CUs) that are shared across Fabric. You can review the CUs for
each SKU at Capacity and SKUs.

Smoothing
Capacities have periods where they're under-utilized (idle) and over-utilized (peak).
When a capacity is running multiple jobs, a sudden spike in compute demand might be
generated that exceeds the limits of a purchased capacity. Warehouse and SQL analytics
endpoint provide burstable capacity that allows workloads to use more resources to
achieve better performance.

Smoothing offers relief for customers who create sudden spikes during their peak times
while they have a lot of idle capacity that is unused. Smoothing simplifies capacity
management by spreading the evaluation of compute to ensure that customer jobs run
smoothly and efficiently.

Smoothing won't affect execution time. It helps streamline capacity management by


allowing you to size your capacity based on average, rather than peak, usage.

For interactive jobs run by users: capacity consumption is typically smoothed over
a minimum of 5 minutes, or longer, to reduce short-term temporal spikes.
For scheduled, or background jobs: capacity consumption is spread over 24 hours,
eliminating the concern for job scheduling or contention.
Throttling behavior specific to the warehouse
and SQL analytics endpoint
In general, similar to Power BI, operations are classified either as interactive or
background.

Most operations in the Warehouse category are reported as background to take


advantage of 24-hour smoothing of activity to allow for the most flexible usage
patterns. With 24-hour smoothing, operations can run simultaneously without causing
any spikes at any time during the day. Customers get the benefit of a consistently fast
performance without having to worry about tiny spikes in their workload. Thus,
classifying data warehousing as background reduces the frequency of peaks of CU
utilization from triggering throttling too quickly.

Most Warehouse and SQL analytics endpoint operations only experience operation
rejection after over-utilization averaged over a 24-hour period. For more information,
see Future smoothed consumption.

Throttling considerations
Any inflight operations including long-running queries, stored procedures, batches
won't get throttled mid-way. Throttling policies are applicable to the next
operation after consumption is smoothed.
Warehouse operations are background except for scenarios that involves Modeling
operations (such as creating a measure, adding or removing tables from a default
semantic model, visualize results, etc.) or creating/updating Power BI semantic
models (including a default semantic model) or reports. These operations continue
to follow "Interactive Rejection" policy.
Just like most Warehouse operations, dynamic management views (DMVs) are also
classified as background and covered by the "Background Rejection" policy. As a
result, DMVs cannot be queried when capacity is throttled. Even though DMVs are
not available, capacity admins can go to Microsoft Fabric Capacity Metrics app to
understand the root cause.
When the "Background Rejection" policy is enabled, any activity on the SQL query
editor, visual query editor, or modeling view, might see the error message: Unable
to complete the action because your organization's Fabric compute capacity has
exceeded its limits. Try again later .

When the "Background Rejection" policy is enabled, if you attempt to connect to a


warehouse or run a new TSQL query in client applications like SQL Server
Management Studio (SSMS) or Azure Data Studio via SQL connection string, you
might see SQL error code 24801 and the error text Unable to complete the action
because your organization's Fabric compute capacity has exceeded its limits.

Try again later .

Best practices to recover from overload situations
Review actions you can take to recover from overload situations.

Monitor overload information with Fabric Capacity Metrics App
Capacity administrators can view overload information and drilldown further via
Microsoft Fabric Capacity Metrics app.

For a walkthrough of the app, visit How to: Observe Synapse Data Warehouse utilization
trends.

Use the Microsoft Fabric Capacity Metrics app to view a visual history of any
overutilization of capacity, including carry forward, cumulative, and burndown of
utilization. For more information, refer to Throttling in Microsoft Fabric and Overages in
the Microsoft Fabric Capacity Metrics app.


Next step
How to: Observe Synapse Data Warehouse utilization trends

Related content
Throttling in Microsoft Fabric
Billing and utilization reporting in Synapse Data Warehouse
What is the Microsoft Fabric Capacity Metrics app?
Synapse Data Warehouse in Microsoft Fabric performance guidelines
Understand your Azure bill on a Fabric capacity
Smoothing and throttling in Fabric Data Warehousing
Burstable capacity in Fabric data warehousing
Pause and resume in Fabric data warehousing



Workload management
Article • 06/03/2024

Applies to: SQL analytics endpoint and Warehouse in Microsoft Fabric

This article describes the architecture and workload management behind data
warehousing in Microsoft Fabric.

Data processing
The Warehouse and SQL analytics endpoint share the same underlying processing
architecture. As data is retrieved or ingested, it leverages a distributed engine built for
both small and large-scale data and computational functions.

The processing system is serverless in that backend compute capacity scales up and
down autonomously to meet workload demands.
When a query is submitted, the SQL frontend (FE) performs query optimization to
determine the best plan based on the data size and complexity. Once the plan is
generated, it is given to the Distributed Query Processing (DQP) engine. The DQP
orchestrates distributed execution of the query by splitting it into smaller queries that
are executed on backend compute nodes. Each small query is called a task and
represents a distributed execution unit. It reads file(s) from OneLake, joins results from
other tasks, groups, or orders data retrieved from other tasks. For ingestion jobs, it also
writes data to the proper destination tables.

When data is processed, results are returned to the SQL frontend for serving back to the
user or calling application.

Elasticity and resiliency


Backend compute capacity benefits from a fast provisioning architecture. Although there
is no SLA on resource assignment, typically new nodes are acquired within a few
seconds. As resource demand increases, new workloads use the scaled-out capacity.
Scaling is an online operation and query processing goes uninterrupted.

The system is fault tolerant and if a node becomes unhealthy, operations executing on
the node are redistributed to healthy nodes for completion.

Warehouse and SQL analytics endpoint provide burstable capacity that allows workloads
to use more resources to achieve better performance, and use smoothing to offer relief
for customers who create sudden spikes during their peak times, while they have a lot of
idle capacity that is unused. Smoothing simplifies capacity management by spreading
the evaluation of compute to ensure that customer jobs run smoothly and efficiently.

Scheduling and resourcing


The distributed query processing scheduler operates at a task level. Queries are
represented to the scheduler as a directed acyclic graph (DAG) of tasks. This concept is
familiar to Spark users. A DAG allows for parallelism and concurrency as tasks that do
not depend on each other can be executed simultaneously or out of order.

As queries arrive, their tasks are scheduled based on first-in-first-out (FIFO) principles. If
there is idle capacity, the scheduler might use a "best fit" approach to optimize
concurrency.

When the scheduler identifies resourcing pressure, it invokes a scale operation. Scaling
is managed autonomously and backend topology grows as concurrency increases. As it
takes a few seconds to acquire nodes, the system is not optimized for consistent
subsecond performance of queries that require distributed processing.

When pressure subsides, backend topology scales back down and releases resource
back to the region.

Ingestion isolation
Applies to: Warehouse in Microsoft Fabric
In the backend compute pool of Warehouse in Microsoft Fabric, loading activities are
provided resource isolation from analytical workloads. This improves performance and
reliability, as ingestion jobs can run on dedicated nodes that are optimized for ETL and
do not compete with other queries or applications for resources.

Sessions
The Warehouse and SQL analytics endpoint have a user session limit of 724 per
workspace. When this limit is reached an error will be returned: The user session limit
for the workspace is 724 and has been reached .

Note

As Microsoft Fabric is a SaaS platform, there are many system connections that run
to continuously optimize the environment. DMVs show both system and user
sessions. For more information, see Monitor using DMVs.
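
To see how close a workspace is to the session limit, you can count the sessions currently
reported by the DMV. This is a minimal sketch; per the note above, the count includes system
sessions as well as user sessions, so it overstates the number of user sessions alone:

SQL

SELECT COUNT(*) AS session_count
FROM sys.dm_exec_sessions;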

Best practices
The Microsoft Fabric workspace provides a natural isolation boundary of the distributed
compute system. Workloads can take advantage of this boundary to manage both cost
and performance.
OneLake shortcuts can be used to create read-only replicas of tables in other
workspaces to distribute load across multiple SQL engines, creating an isolation
boundary. This can effectively increase the maximum number of sessions performing
read-only queries.

Related content
OneLake, the OneDrive for data
What is data warehousing in Microsoft Fabric?
Better together: the lakehouse and warehouse
Burstable capacity in Fabric data warehousing
Smoothing and throttling in Fabric Data Warehousing



Migration: Azure Synapse Analytics
dedicated SQL pools to Fabric
Article • 07/17/2024

Applies to: Warehouse in Microsoft Fabric

This article details the strategy, considerations, and methods of migration of data
warehousing in Azure Synapse Analytics dedicated SQL pools to Microsoft Fabric
Warehouse.

Migration introduction
Microsoft Fabric is an all-in-one SaaS analytics solution for enterprises that offers a
comprehensive suite of services, including Data Factory, Data Engineering, Data
Warehousing, Data Science, Real-Time Intelligence, and Power BI.

This article focuses on options for schema (DDL) migration, database code (DML)
migration, and data migration. Microsoft offers several options, and here we discuss
each option in detail and provide guidance on which of these options you should
consider for your scenario. This article uses the TPC-DS industry benchmark for
illustration and performance testing. Your actual result might vary depending on many
factors including type of data, data types, width of tables, data source latency, etc.

Prepare for migration


Carefully plan your migration project before you get started and ensure that your
schema, code, and data are compatible with Fabric Warehouse. There are some
limitations that you need to consider. Quantify the refactoring work of the incompatible
items, as well as any other resources needed before the migration delivery.

Another key goal of planning is to adjust your design to ensure that your solution takes
full advantage of the high query performance that Fabric Warehouse is designed to
provide. Designing data warehouses for scale introduces unique design patterns, so
traditional approaches aren't always the best. Review the Fabric Warehouse performance
guidelines, because although some design adjustments can be made after migration,
making changes earlier in the process will save you time and effort. Migration from one
technology/environment to another is always a major effort.

The following diagram depicts the migration lifecycle and its major pillars: Assess and
Evaluate, Plan and Design, Migrate, Monitor and Govern, and Optimize and Modernize,
with the associated tasks in each pillar to plan and prepare for a smooth migration.

Runbook for migration


Consider the following activities as a planning runbook for your migration from Synapse
dedicated SQL pools to Fabric Warehouse.

1. Assess and Evaluate


a. Identify objectives and motivations. Establish clear desired outcomes.
b. Discovery, assess, and baseline the existing architecture.
c. Identify key stakeholders and sponsors.
d. Define the scope of what is to be migrated.
i. Start small and simple, prepare for multiple small migrations.
ii. Begin to monitor and document all stages of the process.
iii. Build inventory of data and processes for migration.
iv. Define data model changes (if any).
v. Set up the Fabric Workspace.
e. What is your skillset/preference?
i. Automate wherever possible.
ii. Use Azure built-in tools and features to reduce migration effort.
f. Train staff early on the new platform.
i. Identify upskilling needs and training assets, including Microsoft Learn.
2. Plan and Design
a. Define the desired architecture.
b. Select the method/tools for the migration to accomplish the following tasks:
i. Data extraction from the source.
ii. Schema (DDL) conversion, including metadata for tables and views
iii. Data ingestion, including historical data.
i. If necessary, re-engineer the data model using new platform performance
and scalability.
iv. Database code (DML) migration.
i. Migrate or refactor stored procedures and business processes.
c. Inventory and extract the security features and object permissions from the
source.
d. Design and plan to replace/modify existing ETL/ELT processes for incremental
load.
i. Create parallel ETL/ELT processes to the new environment.
e. Prepare a detailed migration plan.
i. Map current state to new desired state.
3. Migrate
a. Perform schema, data, code migration.
i. Data extraction from the source.
ii. Schema (DDL) conversion
iii. Data ingestion
iv. Database code (DML) migration.
b. If necessary, temporarily scale up the dedicated SQL pool resources to speed up
the migration.
c. Apply security and permissions.
d. Migrate existing ETL/ELT processes for incremental load.
i. Migrate or refactor ETL/ELT incremental load processes.
ii. Test and compare parallel increment load processes.
e. Adapt detail migration plan as necessary.
4. Monitor and Govern
a. Run in parallel, compare against your source environment.
i. Test applications, business intelligence platforms, and query tools.
ii. Benchmark and optimize query performance.
iii. Monitor and manage cost, security, and performance.
b. Governance benchmark and assessment.
5. Optimize and Modernize
a. When the business is comfortable, transition applications and primary reporting
platforms to Fabric.
i. Scale resources up/down as workload shifts from Azure Synapse Analytics to
Microsoft Fabric.
ii. Build a repeatable template from the experience gained for future
migrations. Iterate.
iii. Identify opportunities for cost optimization, security, scalability, and
operational excellence
iv. Identify opportunities to modernize your data estate with the latest Fabric
features.
'Lift and shift' or modernize?
In general, there are two types of migration scenarios, regardless of the purpose and
scope of the planned migration: lift and shift as-is, or a phased approach that
incorporates architectural and code changes.

Lift and shift


In a lift and shift migration, an existing data model is migrated with minor changes to
the new Fabric Warehouse. This approach minimizes risk and migration time by reducing
the new work needed to realize the benefits of migration.

Lift and shift migration is a good fit for these scenarios:

You have an existing environment with a small number of data marts to migrate.
You have an existing environment with data that's already in a well-designed star
or snowflake schema.
You're under time and cost pressure to move to Fabric Warehouse.

In summary, this approach works well for workloads that are already optimized in your
current Synapse dedicated SQL pool environment, and therefore don't require major
changes in Fabric.

Modernize in a phased approach with architectural changes
If a legacy data warehouse has evolved over a long period of time, you might need to
re-engineer it to maintain the required performance levels.

You might also want to redesign the architecture to take advantage of the new engines
and features available in the Fabric Workspace.

Design differences: Synapse dedicated SQL pools and Fabric Warehouse
Consider the following Azure Synapse and Microsoft Fabric data warehousing
differences, comparing dedicated SQL pools to the Fabric Warehouse.

Table considerations
When you migrate tables between different environments, typically only the raw data
and the metadata physically migrate. Other database elements from the source system,
such as indexes, usually aren't migrated because they might be unnecessary or
implemented differently in the new environment.

Performance optimizations in the source environment, such as indexes, indicate where
you might add performance optimization in a new environment, but now Fabric takes
care of that automatically for you.

T-SQL considerations
There are several Data Manipulation Language (DML) syntax differences to be aware of.
Refer to T-SQL surface area in Microsoft Fabric. Consider also a code assessment when
choosing method(s) of migration for the database code (DML).

Depending on the parity differences at the time of the migration, you might need to
rewrite parts of your T-SQL DML code.

Data type mapping differences


There are several data type differences in Fabric Warehouse. For more information, see
Data types in Microsoft Fabric.

The following table provides the mapping of supported data types from Synapse
dedicated SQL pools to Fabric Warehouse.

Synapse dedicated SQL pools    Fabric Warehouse

money                          decimal(19,4)
smallmoney                     decimal(10,4)
smalldatetime                  datetime2
datetime                       datetime2
nchar                          char
nvarchar                       varchar
tinyint                        smallint
binary                         varbinary
datetimeoffset*                datetime2

* datetime2 does not store the extra time zone offset information that is stored in
datetimeoffset. Since the datetimeoffset data type is not currently supported in Fabric
Warehouse, the time zone offset data would need to be extracted into a separate column.
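
If you need to preserve the offset during migration, one approach is to split the value at
extraction time. The following is a minimal sketch, assuming a hypothetical source table
dbo.SalesOrders with a datetimeoffset column named OrderTimestamp; adapt the names to
your own schema.

SQL

-- Hypothetical example: split a datetimeoffset value into a datetime2 value
-- plus a separate offset column before loading into Fabric Warehouse.
SELECT
    CAST(OrderTimestamp AS datetime2(6)) AS OrderTimestampLocal,      -- local date/time, offset removed
    DATEPART(TZOFFSET, OrderTimestamp) AS OrderTimestampOffsetMinutes -- offset in minutes
FROM dbo.SalesOrders;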

Schema, code, and data migration methods


Review and identify which of these options fits your scenario, staff skill sets, and the
characteristics of your data. The option(s) chosen will depend on your experience,
preference, and the benefits from each of the tools. Our goal is to continue to develop
migration tools that mitigate friction and manual intervention to make that migration
experience seamless.

This table summarizes information for data schema (DDL), database code (DML), and
data migration methods. We expand further on each scenario later in this article, linked
in the Option column.

Option 1: Data Factory
What it does: Schema (DDL) conversion, data extract, data ingestion
Skill/Preference: ADF/Pipeline
Scenario: Simplified all-in-one schema (DDL) and data migration. Recommended for dimension tables.

Option 2: Data Factory with partition
What it does: Schema (DDL) conversion, data extract, data ingestion
Skill/Preference: ADF/Pipeline
Scenario: Uses partitioning options to increase read/write parallelism, providing 10x the throughput of option 1. Recommended for fact tables.

Option 3: Data Factory with accelerated code
What it does: Schema (DDL) conversion
Skill/Preference: ADF/Pipeline
Scenario: Convert and migrate the schema (DDL) first, then use CETAS to extract and COPY/Data Factory to ingest data for optimal overall ingestion performance.

Option 4: Stored procedures accelerated code
What it does: Schema (DDL) conversion, data extract, code assessment
Skill/Preference: T-SQL
Scenario: SQL users using an IDE with more granular control over which tasks they want to work on. Use COPY/Data Factory to ingest data.

Option 5: SQL Database Project extension for Azure Data Studio
What it does: Schema (DDL) conversion, data extract, code assessment
Skill/Preference: SQL Project
Scenario: SQL Database Project for deployment with the integration of option 4. Use COPY or Data Factory to ingest data.

Option 6: CREATE EXTERNAL TABLE AS SELECT (CETAS)
What it does: Data extract
Skill/Preference: T-SQL
Scenario: Cost-effective and high-performance data extract into Azure Data Lake Storage (ADLS) Gen2. Use COPY/Data Factory to ingest data.

Option 7: Migrate using dbt
What it does: Schema (DDL) conversion, database code (DML) conversion
Skill/Preference: dbt
Scenario: Existing dbt users can use the dbt Fabric adapter to convert their DDL and DML. You must then migrate data using other options in this table.

Choose a workload for the initial migration


When you're deciding where to start on the Synapse dedicated SQL pool to Fabric
Warehouse migration project, choose a workload area where you are able to:

Prove the viability of migrating to Fabric Warehouse by quickly delivering the
benefits of the new environment. Start small and simple, prepare for multiple small
migrations.
Allow your in-house technical staff time to gain relevant experience with the
processes and tools that they use when they migrate to other areas.
Create a template for further migrations that's specific to the source Synapse
environment, and the tools and processes in place to help.

 Tip
Create an inventory of objects that need to be migrated, and document the
migration process from start to end, so that it can be repeated for other dedicated
SQL pools or workloads.

The volume of migrated data in an initial migration should be large enough to
demonstrate the capabilities and benefits of the Fabric Warehouse environment, but not
so large that it prevents you from demonstrating value quickly. A size in the 1-10
terabyte range is typical.

Migration with Fabric Data Factory


In this section, we discuss the options for using Data Factory for the low-code/no-code
persona who is familiar with Azure Data Factory and Synapse pipelines. This drag-and-drop
UI option provides a simple way to convert the DDL and migrate the data.

Fabric Data Factory can perform the following tasks:

Convert the schema (DDL) to Fabric Warehouse syntax.


Create the schema (DDL) on Fabric Warehouse.
Migrate the data to Fabric Warehouse.

Option 1. Schema/Data migration - Copy Wizard and ForEach Copy Activity

This method uses the Data Factory Copy assistant to connect to the source dedicated SQL
pool, convert the dedicated SQL pool DDL syntax to Fabric, and copy data to Fabric
Warehouse. You can select one or more target tables (for the TPC-DS dataset there are 22
tables). It generates a ForEach to loop through the list of tables selected in the UI and
spawns 22 parallel Copy activity threads.

22 SELECT queries (one for each table selected) were generated and executed in
the dedicated SQL pool.
Make sure you have the appropriate DWU and resource class to allow the queries
generated to be executed. For this case, you need a minimum of DWU1000 with
staticrc10 to allow a maximum of 32 queries to handle 22 queries submitted.

Data Factory copying data directly from the dedicated SQL pool to Fabric Warehouse
requires staging. The ingestion process consists of two phases.
The first phase consists of extracting the data from the dedicated SQL pool into
ADLS and is referred to as staging.
The second phase consists of ingesting the data from staging into Fabric
Warehouse. Most of the data ingestion time is spent in the staging phase. In
summary, staging has a huge impact on ingestion performance.
Recommended use
Using the Copy Wizard to generate a ForEach provides a simple UI to convert DDL and
ingest the selected tables from the dedicated SQL pool to Fabric Warehouse in one step.

However, it isn't optimal for overall throughput. The requirement to use staging and the
need to parallelize reads and writes for the "Source to Stage" step are the major
factors in the performance latency. It's recommended to use this option for dimension
tables only.

Option 2. DDL/Data migration - Data pipeline using partition option

To improve throughput when loading larger fact tables using a Fabric data pipeline, it's
recommended to use a Copy activity with the partition option for each fact table.
This provides the best performance with the Copy activity.

You have the option of using the source table's physical partitioning, if available. If the
table does not have physical partitioning, you must specify the partition column and supply
min/max values to use dynamic partitioning. In the following screenshot, the data
pipeline Source options specify a dynamic range of partitions based on the
ws_sold_date_sk column.
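
If you need to supply the min/max values for dynamic partitioning, you can query them from
the source dedicated SQL pool first. The following is a minimal sketch against the TPC-DS
web_sales table used in this example; adapt the table and column names to your own data.

SQL

-- Illustrative only: find the bounds of the partition column to plug into
-- the data pipeline's dynamic range partition settings.
SELECT
    MIN(ws_sold_date_sk) AS partition_lower_bound,
    MAX(ws_sold_date_sk) AS partition_upper_bound
FROM dbo.web_sales;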

While using partitioning can increase throughput in the staging phase, there are
considerations for making the appropriate adjustments:
Depending on your partition range, it might use all the concurrency slots, as it
might generate over 128 queries on the dedicated SQL pool.
You're required to scale to a minimum of DWU6000 to allow all queries to be
executed.
As an example, for the TPC-DS web_sales table, 163 queries were submitted to the
dedicated SQL pool. At DWU6000, 128 queries were executed while 35 queries
were queued.
Dynamic partitioning automatically selects the partition range; in this case, an 11-day
range for each SELECT query submitted to the dedicated SQL pool. For example:

SQL

WHERE [ws_sold_date_sk] > '2451069' AND [ws_sold_date_sk] <= '2451080')


...
WHERE [ws_sold_date_sk] > '2451333' AND [ws_sold_date_sk] <= '2451344')

Recommended use

For fact tables, we recommend using Data Factory with the partitioning option to increase
throughput.

However, the increased parallelized reads require the dedicated SQL pool to scale to a
higher DWU to allow the extract queries to be executed. With partitioning, the rate is
improved 10x over the no-partition option. You could increase the DWU to get additional
throughput via compute resources, but the dedicated SQL pool allows a maximum of 128
active queries. You can observe the extract activity on the source, as sketched below.
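
As a rough way to observe this on the source while the extract runs, you can count running
versus queued requests with the dedicated SQL pool DMVs. This is a minimal sketch; queued
requests appear with a status of Suspended.

SQL

-- Illustrative only: on the source dedicated SQL pool, count running vs. queued requests.
SELECT [status], COUNT(*) AS request_count
FROM sys.dm_pdw_exec_requests
WHERE [status] IN ('Running', 'Suspended')
GROUP BY [status];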

7 Note

For more information on Synapse DWU to Fabric mapping, see Blog: Mapping ​
Azure Synapse dedicated SQL pools to Fabric data warehouse compute .

Option 3. DDL migration - Copy Wizard ForEach Copy Activity
The two previous options are great data migration options for smaller databases. But if
you require higher throughput, we recommend an alternative option:

1. Extract the data from the dedicated SQL pool to ADLS, therefore mitigating the
stage performance overhead.
2. Use either Data Factory or the COPY command to ingest the data into Fabric
Warehouse.

Recommended use

You can continue to use Data Factory to convert your schema (DDL). Using the Copy
Wizard, you can select the specific table or All tables. By design, this migrates the
schema and data in one step, extracting the schema without any rows by using TOP 0 in
the query statement.

The following code sample covers schema (DDL) migration with Data Factory.

Code example: Schema (DDL) migration with Data Factory


You can use Fabric Data Pipelines to easily migrate over your DDL (schemas) for table
objects from any source Azure SQL Database or dedicated SQL pool. This data pipeline
migrates over the schema (DDL) for the source dedicated SQL pool tables to Fabric
Warehouse.

Pipeline design: parameters

This data pipeline accepts a parameter SchemaName , which allows you to specify which
schemas to migrate over. The dbo schema is the default.

In the Default value field, enter a comma-delimited list of table schemas indicating which
schemas to migrate: 'dbo','tpch' to provide two schemas, dbo and tpch .
Pipeline design: Lookup activity

Create a Lookup Activity and set the Connection to point to your source database.

In the Settings tab:

Set Data store type to External.

Connection is your Azure Synapse dedicated SQL pool. Connection type is Azure
Synapse Analytics.

Use query is set to Query.

The Query field needs to be built using a dynamic expression, allowing the
parameter SchemaName to be used in a query that returns a list of target source
tables. Select Query then select Add dynamic content.

This expression within the Lookup activity generates a SQL statement to query the
system views to retrieve a list of schemas and tables. It references the SchemaName
parameter to allow filtering on SQL schemas. The output of this is an array of
SQL schemas and tables that will be used as input into the ForEach activity.

Use the following code to return a list of all user tables with their schema name.

JSON

@concat('
SELECT s.name AS SchemaName,
t.name AS TableName
FROM sys.tables AS t
INNER JOIN sys.schemas AS s
ON t.type = ''U''
AND s.schema_id = t.schema_id
AND s.name in (',coalesce(pipeline().parameters.SchemaName, 'dbo'),')
')
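
For reference, with the SchemaName parameter set to 'dbo','tpch', the expression resolves
to a query along the following lines. This is a sketch of the generated text, not an
additional script you need to run.

SQL

SELECT s.name AS SchemaName,
       t.name AS TableName
FROM sys.tables AS t
INNER JOIN sys.schemas AS s
    ON t.type = 'U'
   AND s.schema_id = t.schema_id
   AND s.name IN ('dbo','tpch')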
Pipeline design: ForEach Loop

For the ForEach Loop, configure the following options in the Settings tab:

Disable Sequential to allow for multiple iterations to run concurrently.


Set Batch count to 50 , limiting the maximum number of concurrent iterations.
The Items field needs to use dynamic content to reference the output of the
LookUp Activity. Use the following code snippet: @activity('Get List of Source
Objects').output.value

Pipeline design: Copy Activity inside the ForEach Loop

Inside the ForEach Activity, add a Copy Activity. This method uses the Dynamic
Expression Language within Data Pipelines to build a SELECT TOP 0 * FROM <TABLE> to
migrate only the schema without data into a Fabric Warehouse.

In the Source tab:

Set Data store type to External.


Connection is your Azure Synapse dedicated SQL pool. Connection type is Azure
Synapse Analytics.
Set Use Query to Query.
In the Query field, paste in the dynamic content query and use this expression
which will return zero rows, only the table schema: @concat('SELECT TOP 0 * FROM
',item().SchemaName,'.',item().TableName)

In the Destination tab:

Set Data store type to Workspace.


The Workspace data store type is Data Warehouse and the Data Warehouse is set
to the Fabric Warehouse.
The destination Table's schema and table name are defined using dynamic
content.
Schema refers to the current iteration's field, SchemaName with the snippet:
@item().SchemaName

Table is referencing TableName with the snippet: @item().TableName


Pipeline design: Sink

For Sink, point to your Warehouse and reference the Source Schema and Table name.

Once you run this pipeline, you'll see your Data Warehouse populated with each table in
your source, with the proper schema.

Migration using stored procedures in Synapse dedicated SQL pool
This option uses stored procedures to perform the Fabric Migration.

You can get the code samples at microsoft/fabric-migration on GitHub.com . This code
is shared as open source, so feel free to contribute, collaborate, and help the
community.

What Migration Stored Procedures can do:

1. Convert the schema (DDL) to Fabric Warehouse syntax.


2. Create the schema (DDL) on Fabric Warehouse.
3. Extract data from Synapse dedicated SQL pool to ADLS.
4. Flag nonsupported Fabric syntax for T-SQL codes (stored procedures, functions,
views).

Recommended use
This is a great option for those who:

Are familiar with T-SQL.


Want to use an integrated development environment such as SQL Server
Management Studio (SSMS).
Want more granular control over which tasks they want to work on.

You can execute the specific stored procedure for the schema (DDL) conversion, data
extract, or T-SQL code assessment.

For the data migration, you'll need to use either COPY INTO or Data Factory to ingest
the data into Fabric Warehouse.

Migrate using SQL database projects


Microsoft Fabric Data Warehouse is supported in the SQL Database Projects extension
available inside Azure Data Studio and Visual Studio Code . This extension enables
capabilities for source control, database testing, and schema validation.

For more information on source control for warehouses in Microsoft Fabric, including Git
integration and deployment pipelines, see Source Control with Warehouse.

Recommended use
This is a great option for those who prefer to use SQL Database Projects for their
deployment. This option essentially integrates the Fabric Migration Stored Procedures
into the SQL Database Project to provide a seamless migration experience.

A SQL Database Project can:

1. Convert the schema (DDL) to Fabric Warehouse syntax.


2. Create the schema (DDL) on Fabric Warehouse.
3. Extract data from Synapse dedicated SQL pool to ADLS.
4. Flag nonsupported syntax for T-SQL codes (stored procedures, functions, views).

For the data migration, you'll then use either COPY INTO or Data Factory to ingest the
data into Fabric Warehouse.

In addition to the Azure Data Studio support for Fabric, the Microsoft Fabric CAT team
has provided a set of PowerShell scripts to handle the extraction, creation, and
deployment of schema (DDL) and database code (DML) via a SQL Database Project. For
a walkthrough of using the SQL Database Project with these helpful PowerShell scripts, see
microsoft/fabric-migration on GitHub.com .

For more information on SQL Database Projects, see Getting started with the SQL
Database Projects extension and Build and Publish a project.

Migration of data with CETAS


The T-SQL CREATE EXTERNAL TABLE AS SELECT (CETAS) command provides the most
cost effective and optimal method to extract data from Synapse dedicated SQL pools to
Azure Data Lake Storage (ADLS) Gen2.

What CETAS can do:

Extract data into ADLS.


This option requires users to create the schema (DDL) on Fabric Warehouse
before ingesting the data. Consider the options in this article to migrate schema
(DDL).

The advantages of this option are:

Only a single query per table is submitted against the source Synapse dedicated
SQL pool. This won't use up all the concurrency slots, and so won't block
concurrent customer production ETL/queries.
Scaling to DWU6000 isn't required, as only a single concurrency slot is used for
each table, so customers can use lower DWUs.
The extract is run in parallel across all the compute nodes, and this is the key to the
improvement of performance.

Recommended use
Use CETAS to extract the data to ADLS as Parquet files. Parquet files provide the
advantage of efficient data storage with columnar compression that takes less
bandwidth to move across the network. Furthermore, since Fabric stores data in the
Delta parquet format, data ingestion will be 2.5x faster compared to text file formats,
because there's no conversion-to-Delta overhead during ingestion.
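
The following is a minimal sketch of a CETAS extract run on the source Synapse dedicated
SQL pool. The storage account, container, credential, schema, and table names are
placeholders, and it assumes a database scoped credential for the storage account has
already been created.

SQL

-- Illustrative only: extract a fact table to ADLS Gen2 as Parquet using CETAS.
CREATE EXTERNAL FILE FORMAT ParquetFileFormat
WITH (FORMAT_TYPE = PARQUET);

CREATE EXTERNAL DATA SOURCE MigrationStage
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://staging@mystorageaccount.dfs.core.windows.net',
    CREDENTIAL = StorageCredential  -- database scoped credential created beforehand
);

CREATE EXTERNAL TABLE ext.web_sales
WITH (
    LOCATION = '/tpcds/web_sales/',
    DATA_SOURCE = MigrationStage,
    FILE_FORMAT = ParquetFileFormat
)
AS
SELECT * FROM dbo.web_sales;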

To increase CETAS throughput:

Add parallel CETAS operations, increasing the use of concurrency slots but allowing
more throughput.
Scale the DWU on Synapse dedicated SQL pool.

Migration via dbt


In this section, we discuss the dbt option for customers who are already using dbt in
their current Synapse dedicated SQL pool environment.

What dbt can do:

1. Convert the schema (DDL) to Fabric Warehouse syntax.


2. Create the schema (DDL) on Fabric Warehouse.
3. Convert database code (DML) to Fabric syntax.

The dbt framework generates DDL and DML (SQL scripts) on the fly with each execution.
With model files expressed in SELECT statements, the DDL/DML can be translated
instantly to any target platform by changing the profile (connection string) and the
adapter type.
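
For context, a dbt model file is just a SELECT statement that dbt wraps in the
platform-specific DDL/DML at run time. A minimal, hypothetical model might look like the
following; the file name, source, and columns are illustrative only.

SQL

-- models/daily_web_sales.sql (hypothetical dbt model)
-- dbt generates the CREATE TABLE/VIEW statement around this SELECT for the configured adapter.
SELECT
    ws_sold_date_sk,
    SUM(ws_net_paid) AS total_net_paid
FROM {{ source('tpcds', 'web_sales') }}
GROUP BY ws_sold_date_sk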
Recommended use
The dbt framework is a code-first approach. The data must be migrated by using the options
listed in this document, such as CETAS or COPY/Data Factory.

The dbt adapter for Microsoft Fabric Synapse Data Warehouse allows existing dbt
projects that were targeting other platforms such as Synapse dedicated SQL pools,
Snowflake, Databricks, Google BigQuery, or Amazon Redshift to be migrated to a Fabric
Warehouse with a simple configuration change.

To get started with a dbt project targeting Fabric Warehouse, see Tutorial: Set up dbt for
Fabric Data Warehouse. This document also lists an option to move between different
warehouses/platforms.

Data Ingestion into Fabric Warehouse


For ingestion into Fabric Warehouse, use COPY INTO or Fabric Data Factory, depending
on your preference. Both methods are the recommended and best performing options,
as they have equivalent performance throughput, given the prerequisite that the files
are already extracted to Azure Data Lake Storage (ADLS) Gen2.
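
As a minimal sketch, a COPY INTO statement for Parquet files staged in ADLS Gen2 might look
like the following. The storage URL, SAS token, and table name are placeholders; choose the
authentication method that matches your storage configuration.

SQL

-- Illustrative only: load staged Parquet files from ADLS Gen2 into a Fabric Warehouse table.
COPY INTO dbo.web_sales
FROM 'https://mystorageaccount.dfs.core.windows.net/staging/tpcds/web_sales/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>')
);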

Several factors to note so that you can design your process for maximum performance:

With Fabric, there isn't any resource contention when loading multiple tables from
ADLS to Fabric Warehouse concurrently. As a result, there is no performance
degradation when loading with parallel threads. The maximum ingestion throughput
is only limited by the compute power of your Fabric capacity.
Fabric workload management provides separation of the resources allocated for load
and query. There's no resource contention while queries and data loading
execute at the same time.

Related content
Create a Warehouse in Microsoft Fabric
Synapse Data Warehouse in Microsoft Fabric performance guidelines
Security for data warehousing in Microsoft Fabric
Blog: Mapping ​Azure Synapse dedicated SQL pools to Fabric data warehouse
compute



Troubleshoot the Warehouse
Article • 11/19/2024

Applies to: ✅ Warehouse in Microsoft Fabric

This article provides guidance in troubleshooting common issues in Warehouse in
Microsoft Fabric.

Transient connection errors


A transient error, also known as a transient fault, has an underlying cause that soon
resolves itself. If a connection to Warehouse used to work fine but starts to fail without
changes in user permission, firewall policy, and network configuration, try these steps
before contacting support:

1. Check the status of Warehouse and ensure it's not paused.


2. Don't immediately retry the failed command. Instead, wait for 5 to 10 minutes,
establish a new connection, then retry the command. Occasionally, the Azure system
quickly shifts hardware resources to better load-balance various workloads. Most
of these reconfiguration events finish in less than 60 seconds. During this
reconfiguration time span, you might have issues with connecting to your
databases. The connection could also fail when the service is being automatically
restarted to resolve certain issues.
3. Connect using a different application and/or from another machine.

Query failure due to tempdb space issue


The tempdb is a system database used by the engine for various temporary storage
needs during query execution. It can't be accessed or configured by users. Queries could
fail due to tempdb running out of space. Take these steps to reduce tempdb space usage:

1. Refer to the article about statistics to verify that proper column statistics have been
created on all tables.
2. Ensure all table statistics are updated after large DML transactions (see the sketch
after this list).
3. Queries with complex JOINs, GROUP BY, and ORDER BY clauses that are expected to return
a large result set use more tempdb space during execution. Update queries to reduce the
number of GROUP BY and ORDER BY columns if possible.
4. Rerun the query when there are no other active queries running, to avoid resource
constraints during query execution.
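
As a minimal sketch of step 2, you can create or refresh a statistics object on the columns
involved in joins and aggregations after a large load. The table, column, and statistics
names here are placeholders.

SQL

-- Illustrative only: create a manual statistics object on a join column,
-- then refresh it after a large DML operation.
CREATE STATISTICS stats_factsales_customerkey
    ON dbo.FactSales (CustomerKey) WITH FULLSCAN;

UPDATE STATISTICS dbo.FactSales (stats_factsales_customerkey) WITH FULLSCAN;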
Query performance seems to degrade over time

Many factors can affect a query's performance, such as changes in table size, data skew,
workload concurrency, available resources, network, etc. Just because a query runs
slower doesn't necessarily mean there's a query performance problem. Take the following
steps to investigate the target query:

1. Identify the differences in all performance-affecting factors between good and bad
performance runs.
2. Refer to the article about statistics to verify that proper column statistics have been
created on all tables.
3. Ensure all table statistics are updated after large DML transactions.
4. Check for data skew in base tables.
5. Pause and resume the service. Then, rerun the query when there are no other active
queries running. You can monitor the warehouse workload using DMVs, as sketched below.
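
A minimal sketch of the monitoring in step 5, using the dynamic management views available
in the warehouse:

SQL

-- Illustrative only: list currently running requests, longest-running first.
SELECT session_id, status, command, start_time, total_elapsed_time
FROM sys.dm_exec_requests
WHERE status = 'running'
ORDER BY total_elapsed_time DESC;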

Query fails after running for a long time. No data is returned to the client.

A SELECT statement could have completed successfully in the backend and fail when
trying to return the query result set to the client. Try the following steps to isolate the
problem:

1. Use different client tools to rerun the same query.

SQL Server Management Studio (SSMS)


Azure Data Studio
The SQL query editor in the Microsoft Fabric portal
The Visual Query editor in the Microsoft Fabric portal
SQLCMD utility (for authentication via Microsoft Entra ID (formerly Azure
Active Directory) Universal with MFA, use parameters -G -U )

2. If step 1 fails, run a CTAS command with the failed SELECT statement to send the
SELECT query result to another table in the same warehouse (see the sketch after this
list). Using CTAS avoids the query result set being sent back to the client machine. If
the CTAS command finishes successfully and the target table is populated, then the
original query failure is likely caused by the warehouse front end or client issues.
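
A minimal sketch of the CTAS approach in step 2, with placeholder table names standing in
for the failing query:

SQL

-- Illustrative only: materialize the failing SELECT into a table in the same warehouse
-- so the result set isn't returned to the client.
CREATE TABLE dbo.DiagnosticResult
AS
SELECT c.CustomerKey, SUM(s.SalesAmount) AS TotalSales
FROM dbo.FactSales AS s
INNER JOIN dbo.DimCustomer AS c
    ON s.CustomerKey = c.CustomerKey
GROUP BY c.CustomerKey;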
What to collect before contacting Microsoft
support
Provide the workspace ID of Warehouse.
Provide the Statement ID and Distributed request ID. They're returned as messages
after a query completes or fails.
Provide the text of the exact error message.
Provide the time when the query completes or fails.

Related content
Query insights in Fabric data warehousing
Monitoring connections, sessions, and requests using DMVs
What is the Microsoft Fabric Capacity Metrics app?
Limitations in Microsoft Fabric
Microsoft Entra authentication as an alternative to SQL authentication in Microsoft
Fabric



Limitations of Microsoft Fabric Data
Warehouse
Article • 11/19/2024

Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric

This article details the current limitations in Microsoft Fabric.

These limitations apply only to Warehouse and SQL analytics endpoint items in Fabric
Synapse Data Warehouse. For limitations of SQL Database in Fabric, see Limitations in
SQL Database in Microsoft Fabric (Preview).

Limitations
Current general product limitations for Data Warehousing in Microsoft Fabric are listed
in this article, with feature level limitations called out in the corresponding feature
article. More functionality will build upon the world class, industry-leading performance
and concurrency story, and will land incrementally. For more information on the future
of Microsoft Fabric, see Fabric Roadmap .

Data warehousing is not supported for multiple geographies at this time.


Currently, parquet files that are no longer needed are not removed from storage
by garbage collection.

For more limitations in specific areas, see:

Clone table
Connectivity
Data types in Microsoft Fabric
Semantic models
Delta lake logs
Pause and resume in Fabric data warehousing
Share your data and manage permissions
Limitations in source control
Statistics
Tables
Transactions
Visual Query editor

Limitations of the SQL analytics endpoint


The following limitations apply to SQL analytics endpoint automatic schema generation
and metadata discovery.

Data should be in Delta Parquet format to be autodiscovered in the SQL analytics
endpoint. Delta Lake is an open-source storage framework that enables building
Lakehouse architecture.

Tables with renamed columns aren't supported in the SQL analytics endpoint.

Delta column mapping by name is supported, but Delta column mapping by ID
is not supported. For more information, see Delta Lake features and Fabric
experiences.
Delta column mapping in the SQL analytics endpoint is currently in preview.

Delta tables created outside of the /tables folder aren't available in the SQL
analytics endpoint.

If you don't see a Lakehouse table in the warehouse, check the location of the
table. Only the tables that are referencing data in the /tables folder are available
in the warehouse. The tables that reference data in the /files folder in the lake
aren't exposed in the SQL analytics endpoint. As a workaround, move your data to
the /tables folder.

Some columns that exist in the Spark Delta tables might not be available in the
tables in the SQL analytics endpoint. Refer to the Data types for a full list of
supported data types.

If you add a foreign key constraint between tables in the SQL analytics endpoint,
you won't be able to make any further schema changes (for example, adding the
new columns). If you don't see the Delta Lake columns with the types that should
be supported in SQL analytics endpoint, check if there is a foreign key constraint
that might prevent updates on the table.

For information and recommendations on performance of the SQL analytics
endpoint, see SQL analytics endpoint performance considerations.

Known issues
For known issues in Microsoft Fabric, visit Microsoft Fabric Known Issues .

Related content
T-SQL surface area
Create a Warehouse in Microsoft Fabric



Change ownership of Fabric Warehouse
Article • 09/13/2024

Applies to: ✅ Warehouse in Microsoft Fabric

The Warehouse item uses the owner's identity when accessing data on OneLake. To
change the owner of this item, currently the solution is to use an API call as
described in this article.

This guide walks you through the steps to change your Warehouse owner to your
organizational account. The takeover API allows you to change the owner's identity to a
service principal (SPN) or another organizational account (Microsoft Entra ID). For more
information, see Microsoft Entra authentication as an alternative to SQL authentication
in Microsoft Fabric.

The takeover API only works for Warehouse, not the SQL analytics endpoint.

Prerequisites
Before you begin, you need:

A Fabric workspace with an active capacity or trial capacity.

A Fabric warehouse on a Lakehouse.

Either be a member of the Administrator, Member, or Contributor roles on the


workspace.

Install and import the Power BI PowerShell module, if not installed already. Open
Windows PowerShell as an administrator in an internet-connected workstation and
execute the following command:

PowerShell

Install-Module -Name MicrosoftPowerBIMgmt


Import-Module MicrosoftPowerBIMgmt

Connect
1. Open Windows PowerShell as an administrator.
2. Connect to your Power BI Service:
PowerShell

Connect-PowerBIServiceAccount

Take ownership of Warehouse


1. In the workspace, navigate to the Warehouse item whose owner you want to change.
Open the SQL editor.
2. Copy the URL from your browser and paste it into a text editor for use later on.
3. Copy the first GUID from the URL, for example, 11aaa111-a11a-1111-1aaa-
aa111111aaa . Don't include the / characters. Store this in a text editor for use soon.

4. Copy the second GUID from the URL, for example, 11aaa111-a11a-1111-1aaa-
aa111111aaa . Don't include the / characters. Store this in a text editor for use soon.

5. In the following script, replace workspaceID with the first GUID you copied. Run the
following command.

PowerShell

$workspaceID = 'workspaceID'

6. In the following script, replace warehouseID with the second GUID you copied. Run
the following command.

PowerShell

$warehouseid = 'warehouseID'

7. Run the following command:

PowerShell

$url = 'groups/' + $workspaceID + '/datawarehouses/' + $warehouseid + '/takeover'

8. Run the following command:

PowerShell

Invoke-PowerBIRestMethod -Url $url -Method Post -Body ""

9. Owner of the warehouse item has now changed.

Full script
PowerShell

# Install the Power BI PowerShell module if not already installed


Install-Module -Name MicrosoftPowerBIMgmt

# Import the Power BI PowerShell module


Import-Module MicrosoftPowerBIMgmt

# Fill the parameters


$workspaceID = 'workspaceID'
$warehouseid = 'warehouseID'

# Connect to the Power BI service


Connect-PowerBIServiceAccount

#Invoke warehouse takeover


$url = 'groups/' + $workspaceID + '/datawarehouses/' + $warehouseid +
'/takeover'
Invoke-PowerBIRestMethod -Url $url -Method Post -Body ""

Related content
Security for data warehousing in Microsoft Fabric



Disable V-Order on Warehouse in
Microsoft Fabric
Article • 08/02/2024

Applies to: Warehouse in Microsoft Fabric

This article explains how to disable V-Order on Warehouse in Microsoft Fabric.

Disabling V-Order causes any new Parquet files produced by the warehouse engine to
be created without V-Order optimization.

U Caution

Currently, disabling V-Order can only be done at the warehouse level, and it is
irreversible: once disabled, it cannot be enabled again. Users must consider the full
performance impact of disabling V-Order before deciding to do so.

Disable V-Order on a warehouse


To permanently disable V-Order on a warehouse, use the following T-SQL code to
execute ALTER DATABASE ... SET in a new query window:

SQL

ALTER DATABASE CURRENT SET VORDER = OFF;

Check the V-Order state of a warehouse


To check the current status of V-Order on all warehouses of your workspace, use the
following T-SQL code to query sys.databases in a new query window:

SQL

SELECT [name], [is_vorder_enabled]
FROM sys.databases;

This query outputs each warehouse in the current workspace with its V-Order status.
A V-Order state of 1 indicates V-Order is enabled for a warehouse, while a state of 0
indicates it is disabled.

Related content
Understand and manage V-Order for Warehouse



Transact-SQL reference (Database
Engine)
Article • 07/12/2023

Applies to: SQL Server Azure SQL Database Azure SQL Managed Instance
Azure Synapse Analytics Analytics Platform System (PDW) SQL analytics
endpoint in Microsoft Fabric Warehouse in Microsoft Fabric SQL database in
Microsoft Fabric

This article gives the basics about how to find and use the Microsoft Transact-SQL (T-
SQL) reference articles. T-SQL is central to using Microsoft SQL products and services. All
tools and applications that communicate with a SQL Server database do so by sending
T-SQL commands.

T-SQL compliance with the SQL standard


For detailed technical documents about how certain standards are implemented in SQL
Server, see the Microsoft SQL Server Standards Support documentation.

Tools that use T-SQL


Some of the Microsoft tools that issue T-SQL commands are:

SQL Server Management Studio (SSMS)


Azure Data Studio
SQL Server Data Tools (SSDT)
sqlcmd

Locate the Transact-SQL reference articles


To find T-SQL articles, use search at the top right of this page, or use the table of
contents on the left side of the page. You can also type a T-SQL key word in the
Management Studio Query Editor window, and press F1.

Find system views


To find the system tables, views, functions, and procedures, see these links, which are in
the Using relational databases section of the SQL documentation.
System catalog Views
System compatibility views
System dynamic management views
System functions
System information schema views
System stored procedures
System tables

"Applies to" references


The T-SQL reference articles encompass multiple versions of SQL Server, starting with
2008, and the other Azure SQL services. Near the top of each article is a section that
indicates which products and services support the subject of the article.

For example, this article applies to all versions, and has the following label.

Applies to: SQL Server Azure SQL Database Azure SQL Managed Instance
Azure Synapse Analytics Analytics Platform System (PDW)

Another example, the following label indicates an article that applies only to Azure
Synapse Analytics and Parallel Data Warehouse.

Applies to: Azure Synapse Analytics Analytics Platform System (PDW)

In some cases, the article is used by a product or service, but all of the arguments aren't
supported. In this case, other Applies to sections are inserted into the appropriate
argument descriptions in the body of the article.

Get help from Microsoft Q & A


For online help, see the Microsoft Q & A Transact-SQL Forum.

See other language references


The SQL docs include these other language references:

XQuery Language Reference


Integration Services Language Reference
Replication Language Reference
Analysis Services Language Reference
Next steps
Tutorial: Writing Transact-SQL Statements
Transact-SQL Syntax Conventions (Transact-SQL)



Microsoft Learn documentation
contributor guide overview
Article • 02/16/2023

Welcome to the Microsoft Learn documentation contributor guide!

Sharing your expertise with others on Microsoft Learn helps everyone achieve more. Use
the information in this guide to publish a new article to Microsoft Learn or make
updates to an existing published article.

Several of the Microsoft documentation sets are open source and hosted on GitHub.
Not all document sets are completely open source, but many have public-facing repos
where you can suggest changes via pull requests (PR). This open-source approach
streamlines and improves communication between product engineers, content teams,
and customers, and it has other advantages:

Open-source repos plan in the open to get feedback on what docs are most
needed.
Open-source repos review in the open to publish the most helpful content on our
first release.
Open-source repos update in the open to make it easier to continuously improve
the content.

The user experience on Microsoft Learn integrates GitHub workflows directly to make
it even easier. Start by editing the document you're viewing. Or help by reviewing new
topics or creating quality issues.

) Important

All repositories that publish to Microsoft Learn have adopted the Microsoft Open
Source Code of Conduct or the .NET Foundation Code of Conduct . For more
information, see the Code of Conduct FAQ . Contact [email protected]
or [email protected] with any questions or comments.

Minor corrections or clarifications to documentation and code examples in public


repositories are covered by the learn.microsoft.com Terms of Use. New or
significant changes generate a comment in the PR, asking you to submit an online
Contribution License Agreement (CLA) if you're not a Microsoft employee. We need
you to complete the online form before we can review or accept your PR.
Quick edits to documentation
Quick edits streamline the process to report and fix small errors and omissions in
documentation. Despite all efforts, small grammar and spelling errors do make their way
into our published documents. While you can create issues to report mistakes, it's faster
and easier to create a PR to fix the issue, when the option is available.

1. Some docs pages allow you to edit content directly in the browser. If so, you'll see
an Edit button like the one shown below. Choosing the Edit (or equivalently
localized) button takes you to the source file on GitHub.

If the Edit button isn't present, it means the content isn't open to public
contributions. Some pages are generated (for example, from inline documentation
in code) and must be edited in the project they belong to.

2. Select the pencil icon to edit the article. If the pencil icon is grayed out, you need
to either log in to your GitHub account or create a new account.

3. Edit the file in the web editor. Choose the Preview tab to check the formatting of
your changes.

4. When you're finished editing, scroll to the bottom of the page. In the Propose
changes area, enter a title and optionally a description for your changes. The title
will be the first line of the commit message. Select Propose changes to create a
new branch in your fork and commit your changes:
5. Now that you've proposed and committed your changes, you need to ask the
owners of the repository to "pull" your changes into their repository. This is done
using something called a "pull request" (PR). When you select Propose changes, a
new page similar to the following is displayed:

Select Create pull request. Next, enter a title and a description for the PR, and then
select Create pull request. If you're new to GitHub, see About pull requests for
more information.

6. That's it! Content team members will review your PR and merge it when it's
approved. You may get feedback requesting changes.

The GitHub editing UI responds to your permissions on the repository. The preceding
images are for contributors who don't have write permissions to the target repository.
GitHub automatically creates a fork of the target repository in your account. The newly
created fork name has the form GitHubUsername / RepositoryName by default. If you have
write access to the target repository, such as your fork, GitHub creates a new branch in
the target repository. The branch name has the default form patch-n, using a numeric
identifier for the patch branch.

We use PRs for all changes, even for contributors who have write access. Most
repositories protect the default branch so that updates must be submitted as PRs.

The in-browser editing experience is best for minor or infrequent changes. If you make
large contributions or use advanced Git features (such as branch management or
advanced merge conflict resolution), you need to fork the repo and work locally.
7 Note

Most localized documentation doesn't offer the ability to edit or provide feedback
through GitHub. To provide feedback on localized content, use the
https://aka.ms/provide-feedback form.

Review open PRs


You can read new topics before they're published by checking the open PR queue.
Reviews follow the GitHub flow process. You can see proposed updates or new articles
in public repositories. Review them and add your comments. Look at any of our
Microsoft Learn repositories, and check the open PRs for areas that interest you.
Community feedback on proposed updates helps the entire community.

Create quality issues


Our docs are a continuous work in progress. Good issues help us focus our efforts on
the highest priorities for the community. The more detail you can provide, the more
helpful the issue. Tell us what information you sought. Tell us the search terms you used.
If you can't get started, tell us how you want to start exploring unfamiliar technology.

Many of Microsoft's documentation pages have a Feedback section at the bottom of


the page where you can choose to leave Product feedback or Content feedback to
track issues that are specific to that article.

Issues start the conversation about what's needed. The content team will respond to
these issues with ideas for what we can add, and ask for your opinions. When we create
a draft, we'll ask you to review the PR.

Get more involved


Other topics in this guide help you get started productively contributing to Microsoft
Learn. They explain working with GitHub repositories, Markdown tools, and extensions
used in the Microsoft Learn content.
