What is data warehousing in Microsoft Fabric?
Article • 08/22/2024
The easy-to-use SaaS experience is also tightly integrated with Power BI for easy analysis and reporting, converging the world of data lakes and warehouses and greatly simplifying an organization's investment in its analytics estate.
The warehouse can be populated by any one of the supported data ingestion methods, such as COPY INTO, Pipelines, Dataflows, or cross-database ingestion options such as CREATE TABLE AS SELECT (CTAS), INSERT..SELECT, or SELECT INTO.
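For example, rows can be loaded from Parquet files in storage with COPY INTO, or a new table can be created and populated in a single CTAS statement. The table, column, and storage names in this sketch are hypothetical, not part of any sample:
SQL
--Load rows from Parquet files in storage with COPY INTO (hypothetical storage path).
COPY INTO [dbo].[FactSales]
FROM 'https://<storage-account>.blob.core.windows.net/<container>/sales/*.parquet'
WITH (FILE_TYPE = 'PARQUET');

--Create and populate a new table from a query over an existing table (CTAS).
CREATE TABLE [dbo].[SalesSummary]
AS
SELECT [ProductKey], SUM([SalesAmount]) AS [TotalSales]
FROM [dbo].[FactSales]
GROUP BY [ProductKey];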
SQL analytics endpoint of the Lakehouse
With the SQL analytics endpoint of the Lakehouse, T-SQL commands can define and
query data objects but not manipulate or modify the data. You can perform the
following actions in the SQL analytics endpoint:
Query the tables that reference data in your Delta Lake folders in the lake.
Create views, inline TVFs, and procedures to encapsulate your semantics and
business logic in T-SQL.
Manage permissions on the objects.
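As an illustration, the following sketch shows the kinds of objects you can create over lake data through the SQL analytics endpoint; the table, column, and role names are hypothetical:
SQL
--A view over a Delta table exposed by the SQL analytics endpoint.
CREATE VIEW [dbo].[vw_ActiveCustomers]
AS
SELECT [CustomerId], [CustomerName], [Region]
FROM [dbo].[Customers]
WHERE [IsActive] = 1;

--An inline table-valued function that encapsulates a parameterized filter.
CREATE FUNCTION [dbo].[fn_CustomersByRegion] (@Region varchar(50))
RETURNS TABLE
AS
RETURN
(
    SELECT [CustomerId], [CustomerName]
    FROM [dbo].[Customers]
    WHERE [Region] = @Region
);

--Manage permissions on the objects (hypothetical role name).
GRANT SELECT ON OBJECT::[dbo].[vw_ActiveCustomers] TO [SalesAnalysts];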
Warehouse or lakehouse
When deciding between using a warehouse or a lakehouse, it's important to consider the specific needs and context of your data management and analytics requirements. Equally important, this is not a one-way decision!
You can always add one or the other later should your business needs change, and regardless of where you start, both the warehouse and the lakehouse use the same powerful SQL engine for all T-SQL queries.
Here are some general guidelines to help you make the decision:
Choose a data warehouse when you need an enterprise-scale solution with open standard format, no-knobs performance, and minimal setup. Best suited for semi-structured and structured data formats, the data warehouse is suitable for both beginner and experienced data professionals, offering simple and intuitive experiences.
Choose a lakehouse when you need a large repository of highly unstructured data from heterogeneous sources, want to leverage low-cost object storage, and want to use Spark as your primary development tool. Acting as a 'lightweight' data warehouse, you always have the option to use the SQL analytics endpoint and T-SQL tools to deliver reporting and data intelligence scenarios in your lakehouse.
For more detailed decision guidance, see Microsoft Fabric decision guide: Choose
between Warehouse and Lakehouse.
Related content
Better together: the lakehouse and warehouse
Create a warehouse in Microsoft Fabric
Create a lakehouse in Microsoft Fabric
Introduction to Power BI datamarts
Create reports on data warehousing in Microsoft Fabric
Source control with Warehouse (preview)
Use this reference guide and the example scenarios to help you choose a data store for
your Microsoft Fabric workloads.
Security: RLS, CLS**, table level (T-SQL), none for Spark | Object level, RLS, CLS, DDL/DML, dynamic data masking | RLS
Advanced analytics: Interface for large-scale data processing, built-in data parallelism, and fault tolerance | Interface for large-scale data processing, built-in data parallelism, and fault tolerance | Time Series native elements, full geo-spatial and query capabilities
Advanced formatting support: Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format | Full indexing for free text and semi-structured data like JSON
* Spark supports reading from tables using shortcuts, but doesn't yet support accessing views, stored procedures, functions, and so on.
Advanced analytics: T-SQL analytical capabilities, data replicated to delta parquet in OneLake for analytics | Interface for data processing with automated performance tuning
Advanced formatting support: Table support for OLTP, JSON, vector, graph, XML, spatial, key-value | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format
Ingestion latency: Available instantly for querying | Available instantly for querying
Scenarios
Review these scenarios for help with choosing a data store in Fabric.
Scenario 1
Susan, a professional developer, is new to Microsoft Fabric. They're ready to get started cleaning, modeling, and analyzing data but need to decide whether to build a data warehouse or a lakehouse. After reviewing the details in the previous table, the primary decision points are the available skill set and the need for multi-table transactions.
Susan has spent many years building data warehouses on relational database engines,
and is familiar with SQL syntax and functionality. Thinking about the larger team, the
primary consumers of this data are also skilled with SQL and SQL analytical tools. Susan
decides to use a Fabric warehouse, which allows the team to interact primarily with T-
SQL, while also allowing any Spark users in the organization to access the data.
Susan creates a new lakehouse and accesses the data warehouse capabilities with the lakehouse SQL analytics endpoint. Using the Fabric portal, Susan creates shortcuts to the external data tables and places them in the /Tables folder. Susan can now write T-SQL queries that reference shortcuts to query Delta Lake data in the lakehouse. The shortcuts automatically appear as tables in the SQL analytics endpoint and can be queried with T-SQL using three-part names.
Scenario 2
Rob, a data engineer, needs to store and model several terabytes of data in Fabric. The
team has a mix of PySpark and T-SQL skills. Most of the team running T-SQL queries are
consumers, and therefore don't need to write INSERT, UPDATE, or DELETE statements.
The remaining developers are comfortable working in notebooks, and because the data
is stored in Delta, they're able to interact with a similar SQL syntax.
Rob decides to use a lakehouse, which allows the data engineering team to use their
diverse skills against the data, while allowing the team members who are highly skilled
in T-SQL to consume the data.
Scenario 3
Ash, a citizen developer, is a Power BI developer. They're familiar with Excel, Power BI, and Office. They need to build a data product for a business unit. They know they don't quite have the skills to build a data warehouse or a lakehouse, and those seem like too much for their needs and data volumes. They review the details in the previous table and see that the primary decision points are their own skills and their need for a self-service, no-code capability, and a data volume under 100 GB.
Ash works with business analysts familiar with Power BI and Microsoft Office, and knows that they already have a Premium capacity subscription. As they think about their larger team, they realize the primary consumers of this data are analysts familiar with no-code and SQL analytical tools. Ash decides to use a Power BI datamart, which allows the team to build the capability quickly using a no-code experience. Queries can be executed via Power BI and T-SQL, while also allowing any Spark users in the organization to access the data.
Scenario 4
Daisy is a business analyst experienced with using Power BI to analyze supply chain bottlenecks for a large global retail chain. They need to build a scalable data solution
that can handle billions of rows of data and can be used to build dashboards and
reports that can be used to make business decisions. The data comes from plants,
suppliers, shippers, and other sources in various structured, semi-structured, and
unstructured formats.
Daisy decides to use an Eventhouse because of its scalability, quick response times,
advanced analytics capabilities including time series analysis, geospatial functions, and
fast direct query mode in Power BI. Queries can be executed using Power BI and KQL to
compare between current and previous periods, quickly identify emerging problems, or
provide geo-spatial analytics of land and maritime routes.
Scenario 5
Kirby is an application architect experienced in developing .NET applications for
operational data. They need a high concurrency database with full ACID transaction
compliance and strongly enforced foreign keys for relational integrity. Kirby wants the
benefit of automatic performance tuning to simplify day-to-day database management.
Kirby decides on a SQL database in Fabric, with the same SQL Database Engine as Azure
SQL Database. SQL databases in Fabric automatically scale to meet demand throughout
the business day. They have the full capability of transactional tables and the flexibility
of transaction isolation levels from serializable to read committed snapshot. SQL
database in Fabric automatically creates and drops nonclustered indexes based on
strong signals from execution plans observed over time.
In Kirby's scenario, data from the operational application must be joined with other data
in Fabric: in Spark, in a warehouse, and from real-time events in an Eventhouse. Every Fabric database includes a SQL analytics endpoint, so data can be accessed in real time from Spark or with Power BI queries using Direct Lake mode. These reporting solutions
spare the primary operational database from the overhead of analytical workloads, and
avoid denormalization. Kirby also has existing operational data in other SQL databases,
and needs to import that data without transformation. To import existing operational
data without any data type conversion, Kirby designs data pipelines with Fabric Data
Factory to import data into the Fabric SQL database.
Related content
Create a lakehouse in Microsoft Fabric
Create a warehouse in Microsoft Fabric
Create an eventhouse
Create a SQL database in the Fabric portal
Power BI datamart
Microsoft Fabric offers two enterprise-scale, open standard format workloads for data
storage: Warehouse and Lakehouse. This article compares the two platforms and the
decision points for each.
Criterion → Recommendation
Spark → Use Lakehouse
T-SQL → Use Warehouse
Yes → Use Warehouse
No → Use Lakehouse
Don't know → Use Lakehouse
Unstructured and structured data → Use Lakehouse
Structured data only → Use Warehouse
Choose a candidate service
Perform a detailed evaluation of the service to confirm that it meets your needs.
The Warehouse item in Fabric Data Warehouse is an enterprise scale data warehouse
with open standard format.
The Lakehouse item in Fabric Data Engineering is a data architecture platform for
storing, managing, and analyzing structured and unstructured data in a single location.
Store, manage, and analyze structured and unstructured data in a single location to gain insights and make decisions faster and more efficiently.
Flexible and scalable solution that allows organizations to handle large volumes of
data of all types and sizes.
Easily ingest data from many different sources, which are converted into a unified
Delta format
Automatic table discovery and registration for a fully managed file-to-table
experience for data engineers and data scientists.
Automatic SQL analytics endpoint and default dataset that allows T-SQL querying
of delta tables in the lake
Warehouse and the SQL analytics endpoint of the Lakehouse compared across these criteria: primary capabilities, developer profile, data loading, storage layer, development experience, and T-SQL capabilities.
Primary capabilities
SQL analytics endpoint: Read-only, system-generated SQL analytics endpoint for the Lakehouse for T-SQL querying and serving. Supports analytics on the Lakehouse Delta tables, and the Delta Lake folders referenced via shortcuts.
Development experience
Warehouse: Warehouse Editor with full support for T-SQL data ingestion, modeling, development, and querying; UI experiences for data ingestion, modeling, and querying; read/write support for 1st and 3rd party tooling.
SQL analytics endpoint: Lakehouse SQL analytics endpoint with limited T-SQL support for views, table-valued functions, and SQL queries; UI experiences for modeling and querying; limited T-SQL support for 1st and 3rd party tooling.
T-SQL capabilities
Warehouse: Full DQL, DML, and DDL T-SQL support; full transaction support.
SQL analytics endpoint: Full DQL, no DML, limited DDL T-SQL support such as SQL views and TVFs.
Related content
Microsoft Fabric decision guide: choose a data store
This article describes how to get started with Warehouse in Microsoft Fabric using the Microsoft Fabric portal, covering the creation and consumption of a warehouse. You learn how to create your warehouse from scratch or from sample data, along with other helpful information to get you acquainted and proficient with the warehouse capabilities offered through the Microsoft Fabric portal.
Tip
You can proceed with either a new blank Warehouse or a new Warehouse with
sample data to continue this series of Get Started steps.
The first hub in the navigation pane is the Home hub. You can start creating your
warehouse from the Home hub by selecting the Warehouse card under the New
section. An empty warehouse is created for you to start creating objects in the
warehouse. You can use either sample data to get a jump start or load your own test
data if you prefer.
Create a warehouse using the Create hub
Another option available to create your warehouse is through the Create hub, which is
the second hub in the navigation pane.
You can create your warehouse from the Create hub by selecting the Warehouse card
under the Data Warehousing section. When you select the card, an empty warehouse is
created for you to start creating objects in the warehouse or use a sample to get started
as previously mentioned.
Once initialized, you can load data into your warehouse. For more information about
getting data into a warehouse, see Ingesting data.
1. The first hub in the navigation pane is the Home hub. You can start creating your
warehouse sample from the Home hub by selecting the Warehouse sample card
under the New section.
2. Provide the name for your sample warehouse and select Create.
3. The create action creates a new Warehouse and starts loading sample data into it. The data loading takes a few minutes to complete.
4. On completion of loading the sample data, the warehouse opens with data loaded into tables and views to query.
If you have an existing warehouse that's empty, the following steps show how to load sample data.
1. Once you have created your warehouse, you can load sample data into the warehouse from the Use sample database card on the home page of the warehouse.
2. On completion of loading the sample data, the warehouse displays data loaded into tables and views to query.
3. The following sample T-SQL scripts can be used on the sample data in your new warehouse.
SQL
/*************************************************
Get number of trips performed by each medallion
**************************************************/
SELECT
M.MedallionID
,M.MedallionCode
,COUNT(T.TripDistanceMiles) AS TotalTripCount
FROM
dbo.Trip AS T
JOIN
dbo.Medallion AS M
ON
T.MedallionID=M.MedallionID
GROUP BY
M.MedallionID
,M.MedallionCode
/****************************************************
How many passengers are being picked up on each trip?
*****************************************************/
SELECT
PassengerCount,
COUNT(*) AS CountOfTrips
FROM
dbo.Trip
WHERE
PassengerCount > 0
GROUP BY
PassengerCount
ORDER BY
PassengerCount
/**********************************************************************************
What is the distribution of trips by hour on working days (non-holiday weekdays)?
**********************************************************************************/
SELECT
ti.HourlyBucket,
COUNT(*) AS CountOfTrips
FROM dbo.Trip AS tr
INNER JOIN dbo.Date AS d
ON tr.DateID = d.DateID
INNER JOIN dbo.Time AS ti
ON tr.PickupTimeID = ti.TimeID
WHERE
d.IsWeekday = 1
AND d.IsHolidayUSA = 0
GROUP BY
ti.HourlyBucket
ORDER BY
ti.HourlyBucket
Next step
Create tables in the Warehouse in Microsoft Fabric
2. Select Table, and an autogenerated CREATE TABLE script template appears in your
new SQL query window, as shown in the following image.
To learn more about supported table creation in Warehouse in Microsoft Fabric, see
Tables in data warehousing in Microsoft Fabric and Data types in Microsoft Fabric.
Next step
Ingest data into your Warehouse using data pipelines
Data pipelines offer an alternative to using the COPY command through a graphical user
interface. A data pipeline is a logical grouping of activities that together perform a data
ingestion task. Pipelines allow you to manage extract, transform, and load (ETL) activities
instead of managing each one individually.
In this tutorial, you'll create a new pipeline that loads sample data into a Warehouse in
Microsoft Fabric.
Note
Some features from Azure Data Factory are not available in Microsoft Fabric, but
the concepts are interchangeable. You can learn more about Azure Data Factory
and Pipelines on Pipelines and activities in Azure Data Factory and Azure Synapse
Analytics. For a quickstart, visit Quickstart: Create your first pipeline to copy data.
3. You'll land in the pipeline canvas area, where you see three options to get started:
Add a pipeline activity, Copy data, and Choose a task to start.
Add pipeline activity: this option launches the pipeline editor, where you can
create new pipelines from scratch by using pipeline activities.
Copy data: this option launches a step-by-step assistant that helps you select
a data source, a destination, and configure data load options such as the
column mappings. On completion, it creates a new pipeline activity with a
Copy Data task already configured for you.
Choose a task to start: this option launches a set of predefined templates to
help get you started with pipelines based on different scenarios.
4. The first page of the Copy data assistant helps you pick your own data from
various data sources, or select from one of the provided samples to get started.
For this tutorial, we'll use the COVID-19 Data Lake sample. Select this option and
select Next.
5. In the next page, you can select a dataset, the source file format, and preview the
selected dataset. Select Bing COVID-19, the CSV format, and select Next.
6. The next page, Data destinations, allows you to configure the type of the
destination workspace. We'll load data into a warehouse in our workspace, so
select the Warehouse tab, and the Data Warehouse option. Select Next.
7. Now it's time to pick the warehouse to load data into. Select your desired
warehouse in the dropdown list and select Next.
8. The last step to configure the destination is to provide a name to the destination
table and configure the column mappings. Here you can choose to load the data
to a new table or to an existing one, provide a schema and table names, change
column names, remove columns, or change their mappings. You can accept the
defaults, or adjust the settings to your preference.
9. The next page gives you the option to use staging, or provide advanced options
for the data copy operation (which uses the T-SQL COPY command). Review the
options without changing them and select Next.
10. The last page in the assistant offers a summary of the copy activity. Select the
option Start data transfer immediately and select Save + Run.
11. You are directed to the pipeline canvas area, where a new Copy Data activity is
already configured for you. The pipeline starts to run automatically. You can
monitor the status of your pipeline in the Output pane:
12. After a few seconds, your pipeline finishes successfully. Navigating back to your
warehouse, you can select your table to preview the data and confirm that the
copy operation concluded.
For more on data ingestion into your Warehouse in Microsoft Fabric, see Ingesting data.
Next step
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
Applies to: SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric
Alternatively, you can use any of these tools to connect to your SQL analytics
endpoint or Warehouse via a T-SQL connection string. For more information, see
Connectivity.
Download SQL Server Management Studio (SSMS).
Download Azure Data Studio .
Note
Review the T-SQL surface area for SQL analytics endpoint or Warehouse in
Microsoft Fabric.
2. A new tab appears for you to create a visual query.
3. Drag and drop tables from the Object Explorer to the Visual query editor window to create a query.
1. Add a SQL analytics endpoint or Warehouse from your current active workspace to the Object Explorer using the + Warehouses action. When you select a SQL analytics endpoint or Warehouse from the dialog, it's added to the Object Explorer for referencing when writing a SQL query or creating a visual query.
2. You can reference the table from added databases using three-part naming. In the
following example, use the three-part name to refer to ContosoSalesTable in the
added database ContosoLakehouse .
SQL
SELECT *
FROM ContosoLakehouse.dbo.ContosoSalesTable AS Contoso
INNER JOIN Affiliation
ON Affiliation.AffiliationId = Contoso.RecordTypeID;
3. Using three-part naming to reference the databases/tables, you can join multiple
databases.
SQL
SELECT *
FROM ContosoLakehouse.dbo.ContosoSalesTable AS Contoso
INNER JOIN My_lakehouse.dbo.Affiliation
ON My_lakehouse.dbo.Affiliation.AffiliationId = Contoso.RecordTypeID;
4. For longer queries, you can use aliases to make them more efficient to write and easier to read.
SQL
SELECT *
FROM ContosoLakehouse.dbo.ContosoSalesTable AS Contoso
INNER JOIN My_lakehouse.dbo.Affiliation as MyAffiliation
ON MyAffiliation.AffiliationId = Contoso.RecordTypeID;
5. Using three-part naming to reference the database and tables, you can insert data
from one database to another.
SQL
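--A sketch of a cross-database insert using three-part names; ContosoWarehouse is a
--hypothetical warehouse name, and compatible table schemas are assumed.
INSERT INTO [ContosoWarehouse].[dbo].[Affiliation]
SELECT *
FROM [My_lakehouse].[dbo].[Affiliation];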
6. You can drag and drop tables from added databases to Visual query editor to
create a cross-database query.
3. Once the script is automatically generated, select the Run button to run the script
and see the results.
Note
At this time, there's limited T-SQL functionality. See T-SQL surface area for a list of
T-SQL commands that are currently not available.
Next step
Create reports on data warehousing in Microsoft Fabric
Microsoft Fabric lets you create reusable and default Power BI semantic models to
create reports in various ways in Power BI. This article describes the various ways you
can use your Warehouse or SQL analytics endpoint, and their default Power BI semantic
models, to create reports.
For example, you can establish a live connection to a shared semantic model in the
Power BI service and create many different reports from the same semantic model. You
can create a data model in Power BI Desktop and publish to the Power BI service. Then,
you and others can create multiple reports in separate .pbix files from that common
data model and save them to different workspaces.
Advanced users can build reports from a warehouse using a composite model or using
the SQL connection string.
Reports that use the Warehouse or SQL analytics endpoint can be created in either of
the following two tools:
Power BI service
Power BI Desktop
Note
Microsoft has renamed the Power BI dataset content type to semantic model. This
applies to Microsoft Fabric as well. For more information, see New name for Power
BI datasets.
With a default semantic model that has tables, selecting New report opens a browser tab to the report editing canvas with a new report built on the semantic model. When you save your new report, you're prompted to choose a workspace, provided you have write permissions for that workspace. If you don't have write permissions, or if you're a free user and the semantic model resides in a Premium capacity workspace, the new report is saved in your My workspace.
For more information on how to create reports using the Power BI service, see Create
reports in the Power BI service.
For a tutorial with Power BI Desktop, see Get started with Power BI Desktop. For
advanced situations where you want to add more data or change the storage mode, see
use composite models in Power BI Desktop.
If you're browsing for a specific SQL analytics endpoint or Warehouse in OneLake, you
can use the integrated OneLake data hub in Power BI Desktop to make a connection
and build reports:
1. Open Power BI Desktop and select Warehouse under the OneLake data hub
dropdown list in the ribbon.
2. Select the desired warehouse.
If you would like to create a live connection to the automatically defined data
model, select Connect.
If you would like to connect directly to the data source and define your own
data model, select the dropdown list arrow for the Connect button and select
Connect to SQL endpoint.
Alternatively, if you have the SQL connection string of your SQL analytics endpoint or
Warehouse and would like more advanced options, such as writing a SQL statement to
filter out specific data, connect to a warehouse in Power BI Desktop:
1. In the Fabric portal, right-click on the Warehouse or SQL analytics endpoint in your
workspace and select Copy SQL connection string. Or, navigate to the Warehouse
Settings in your workspace. Copy the SQL connection string.
2. Open Power BI Desktop and select SQL Server in the ribbon.
3. Paste the SQL connection string under Server.
4. In the Navigator dialog, select the databases and tables you would like to load.
5. If prompted for authentication, select Organizational account.
6. Authenticate using Microsoft Entra ID (formerly Azure Active Directory) multifactor
authentication (MFA). For more information, see Microsoft Entra authentication as
an alternative to SQL authentication in Microsoft Fabric.
Related content
Model data in the default Power BI semantic model in Microsoft Fabric
Create reports in the Power BI service in Microsoft Fabric and Power BI Desktop
Microsoft Fabric provides a one-stop shop for all the analytical needs for every
enterprise. It covers the complete spectrum of services including data movement, data
lake, data engineering, data integration and data science, real time analytics, and
business intelligence. With Microsoft Fabric, there's no need to stitch together different
services from multiple vendors. Instead, the customer enjoys an end-to-end, highly
integrated, single comprehensive product that is easy to understand, onboard, create
and operate. No other product on the market offers the breadth, depth, and level of
integration that Microsoft Fabric offers. Additionally, Microsoft Purview is included by
default in every tenant to meet compliance and governance needs.
1. Sign into your Power BI online account, or if you don't have an account yet, sign up
for a free trial.
2. Enable Microsoft Fabric in your tenant.
In this tutorial, you take on the role of a Warehouse developer at the fictional Wide
World Importers company and complete the following steps in the Microsoft Fabric
portal to build and implement an end-to-end data warehouse solution:
Data sources - Microsoft Fabric makes it easy and quick to connect to Azure Data
Services, other cloud platforms, and on-premises data sources to ingest data from.
Ingestion - With 200+ native connectors as part of the Microsoft Fabric pipeline and
with drag and drop data transformation with dataflow, you can quickly build insights for
your organization. Shortcut is a new feature in Microsoft Fabric that provides a way to
connect to existing data without having to copy or move it. You can find more details
about the Shortcut feature later in this tutorial.
Transform and store - Microsoft Fabric standardizes on Delta Lake format, which means
all the engines of Microsoft Fabric can read and work on the same data stored in
OneLake, with no need for data duplication. This storage allows you to build a data warehouse
or data mesh based on your organizational need. For transformation, you can choose
either low-code or no-code experience with pipelines/dataflows or use T-SQL for a code
first experience.
Consume - Data from the warehouse can be consumed by Power BI, the industry
leading business intelligence tool, for reporting and visualization. Each warehouse
comes with a built-in TDS endpoint for easily connecting to and querying data from
other reporting tools, when needed. When a warehouse is created, a secondary item,
called a default semantic model, is generated at the same time with the same name. You
can use the default semantic model to start visualizing data with just a couple of steps.
Sample data
For sample data, we use the Wide World Importers (WWI) sample database. For our data
warehouse end-to-end scenario, we have generated sufficient data for a sneak peek into
the scale and performance capabilities of the Microsoft Fabric platform.
Wide World Importers (WWI) is a wholesale novelty goods importer and distributor
operating from the San Francisco Bay area. As a wholesaler, WWI's customers are mostly
companies who resell to individuals. WWI sells to retail customers across the United
States including specialty stores, supermarkets, computing stores, tourist attraction
shops, and some individuals. WWI also sells to other wholesalers via a network of agents
who promote the products on WWI's behalf. To learn more about their company profile and operation, see Wide World Importers sample databases for Microsoft SQL.
Typically, you would bring data from transactional systems (or line of business
applications) into a data lake or data warehouse staging area. However, for this tutorial,
we use the dimensional model provided by WWI as our initial data source. We use it as
the source to ingest the data into a data warehouse and transform it through T-SQL.
Data model
While the WWI dimensional model contains multiple fact tables, for this tutorial we focus on the fact_sale table and its related dimensions only, to demonstrate this end-to-end data warehouse scenario.
Next step
Tutorial: Create a Microsoft Fabric workspace
Before you can create a warehouse, you need to create a workspace where you'll build
out the remainder of the tutorial.
Create a workspace
The workspace contains all the items needed for data warehousing, including: Data
Factory pipelines, the data warehouse, Power BI semantic models, operational
databases, and reports.
1. Sign in to Power BI .
Next step
Tutorial: Create a Microsoft Fabric data warehouse
Now that you have a workspace, you can create your first Warehouse in Microsoft
Fabric.
2. Search for the workspace you created in Tutorial: Create a Microsoft Fabric
workspace by typing in the search textbox at the top and selecting your workspace
to open it.
3. Select the + New button to display a full list of available items. From the list of objects to create, choose Warehouse to create a new Warehouse in Microsoft Fabric.
1. Select Create.
Next step
Tutorial: Ingest data into a Microsoft Fabric data warehouse
Now that you have created a Warehouse in Microsoft Fabric, you can ingest data into
that warehouse.
Ingest data
1. From the Build a warehouse landing page, select the Data Warehouse Tutorial
workspace in the navigation menu to return to the workspace item list.
2. Select New > More options to display a full list of available items.
4. On the New pipeline dialog, enter Load Customer Data as the name.
5. Select Create.
8. If necessary, select the newly created Copy data activity from the design canvas
and follow the next steps to configure it.
12. On the New connection page, select or type to select Azure Blobs from the list of
connection options.
c. The Connection name field is automatically populated, but for clarity, type in
Wide World Importers Public Sample .
16. Change the remaining settings on the Source page of the copy activity as follows, to reach the .parquet files in https://fabrictutorialdata.blob.core.windows.net/sampledata/WideWorldImportersDW/parquet/full/dimension_customer/*.parquet:
i. Container: sampledata
17. Select Preview data next to the File path setting to ensure there are no errors.
18. Select the Destination page of the Copy data activity. For Connection, select the
warehouse item WideWorldImporters from the list, or select More to search for
the warehouse.
19. Next to the Table option configuration setting, select the Auto create table radio
button.
20. The dropdown menu next to the Table configuration setting will automatically
change to two text boxes.
21. In the first box next to the Table setting, enter dbo .
22. In the second box next to the Table setting, enter dimension_customer .
25. Monitor the copy activity's progress on the Output page and wait for it to
complete.
Next step
Tutorial: Create tables in a data warehouse
Learn how to create tables in the data warehouse you created in a previous part of the
tutorial.
Create a table
1. Select Workspaces in the navigation menu.
2. Select the workspace created in Tutorial: Create a Microsoft Fabric data workspace,
such as Data Warehouse Tutorial.
3. From the item list, select WideWorldImporters with the type of Warehouse.
4. From the ribbon, select New SQL query. Under Blank, select New SQL query for a
new blank query window.
SQL
/*
1. Drop the dimension_city table if it already exists.
2. Create the dimension_city table.
3. Drop the fact_sale table if it already exists.
4. Create the fact_sale table.
*/
--dimension_city
DROP TABLE IF EXISTS [dbo].[dimension_city];
CREATE TABLE [dbo].[dimension_city]
(
[CityKey] [int] NULL,
[WWICityID] [int] NULL,
[City] [varchar](8000) NULL,
[StateProvince] [varchar](8000) NULL,
[Country] [varchar](8000) NULL,
[Continent] [varchar](8000) NULL,
[SalesTerritory] [varchar](8000) NULL,
[Region] [varchar](8000) NULL,
[Subregion] [varchar](8000) NULL,
[Location] [varchar](8000) NULL,
[LatestRecordedPopulation] [bigint] NULL,
[ValidFrom] [datetime2](6) NULL,
[ValidTo] [datetime2](6) NULL,
[LineageKey] [int] NULL
);
--fact_sale
DROP TABLE IF EXISTS [dbo].[fact_sale];
CREATE TABLE [dbo].[fact_sale]
(
[SaleKey] [bigint] NULL,
[CityKey] [int] NULL,
[CustomerKey] [int] NULL,
[BillToCustomerKey] [int] NULL,
[StockItemKey] [int] NULL,
[InvoiceDateKey] [datetime2](6) NULL,
[DeliveryDateKey] [datetime2](6) NULL,
[SalespersonKey] [int] NULL,
[WWIInvoiceID] [int] NULL,
[Description] [varchar](8000) NULL,
[Package] [varchar](8000) NULL,
[Quantity] [int] NULL,
[UnitPrice] [decimal](18, 2) NULL,
[TaxRate] [decimal](18, 3) NULL,
[TotalExcludingTax] [decimal](29, 2) NULL,
[TaxAmount] [decimal](38, 6) NULL,
[Profit] [decimal](18, 2) NULL,
[TotalIncludingTax] [decimal](38, 6) NULL,
[TotalDryItems] [int] NULL,
[TotalChillerItems] [int] NULL,
[LineageKey] [int] NULL,
[Month] [int] NULL,
[Year] [int] NULL,
[Quarter] [int] NULL
);
9. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
10. Validate the table was created successfully by selecting the refresh icon button on
the ribbon.
11. In the Object explorer, verify that you can see the newly created Create Tables
query, fact_sale table, and dimension_city table.
Next step
Tutorial: Load data using T-SQL
Now that you know how to build a data warehouse, load a table, and generate a report,
it's time to extend the solution by exploring other methods for loading data.
SQL
--Copy data from the public Azure storage account to the dbo.fact_sale table.
COPY INTO [dbo].[fact_sale]
FROM 'https://azuresynapsestorage.blob.core.windows.net/sampledata/WideWorldImportersDW/tables/fact_sale.parquet'
WITH (FILE_TYPE = 'PARQUET');
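--The tutorial loads dbo.dimension_city as well (see step 4 below); this is a sketch
--that assumes the sample Parquet file follows the same storage path pattern as fact_sale.
COPY INTO [dbo].[dimension_city]
FROM 'https://azuresynapsestorage.blob.core.windows.net/sampledata/WideWorldImportersDW/tables/dimension_city.parquet'
WITH (FILE_TYPE = 'PARQUET');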
3. Select Run to execute the query. The query takes between one and four minutes to
execute.
4. After the query is completed, review the messages to see the rows affected, which indicate the number of rows that were loaded into the dimension_city and fact_sale tables respectively.
5. Load the data preview to validate the data loaded successfully by selecting the fact_sale table in the Explorer.
6. Rename the query for reference later. Right-click on SQL query 1 in the Explorer
and select Rename.
7. Type Load Tables to change the name of the query.
8. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
Next step
Tutorial: Clone a table using T-SQL in Microsoft Fabric
This tutorial guides you through creating a table clone in Warehouse in Microsoft Fabric,
using the CREATE TABLE AS CLONE OF T-SQL syntax.
You can use the CREATE TABLE AS CLONE OF T-SQL command to create a table clone at the current point in time or at a previous point in time.
You can also clone tables in the Fabric portal. For examples, see Tutorial: Clone
tables in the Fabric portal.
You can also query data in a warehouse as it appeared in the past, using the T-SQL
OPTION syntax. For more information, see Query data as it existed in the past.
2. To create a table clone as of the current point in time, in the query editor, paste the following code to create clones of the dbo.dimension_city and dbo.fact_sale tables.
SQL
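--A sketch of current point-in-time clones; the clone names match the dimension_city1
--and fact_sale1 tables referenced later in this tutorial.
CREATE TABLE [dbo].[dimension_city1] AS CLONE OF [dbo].[dimension_city];
CREATE TABLE [dbo].[fact_sale1] AS CLONE OF [dbo].[fact_sale];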
3. Select Run to execute the query. The query takes a few seconds to execute.
After the query is completed, the table clones dimension_city1 and fact_sale1
have been created.
4. Load the data preview to validate the data loaded successfully by selecting the dimension_city1 table in the Explorer.
5. To create a table clone as of a past point in time, use the AS CLONE OF ... AT T-SQL syntax. The following sample creates clones of the dbo.dimension_city and dbo.fact_sale tables from a past point in time. Enter the Coordinated Universal Time (UTC) timestamp at which you want the tables cloned.
SQL
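--A sketch of point-in-time clones; the UTC timestamp below is a placeholder, so replace
--it with a time after the tables were loaded.
CREATE TABLE [dbo].[dimension_city2] AS CLONE OF [dbo].[dimension_city] AT '2024-04-24T20:20:00.000';
CREATE TABLE [dbo].[fact_sale2] AS CLONE OF [dbo].[fact_sale] AT '2024-04-24T20:20:00.000';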
6. Select Run to execute the query. The query takes a few seconds to execute.
After the query is completed, the table clones dimension_city2 and fact_sale2
have been created, with data as it existed in the past point in time.
7. Load the data preview to validate the data loaded successfully by selecting the fact_sale2 table in the Explorer.
8. Rename the query for reference later. Right-click on SQL query 2 in the Explorer
and select Rename.
10. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
2. Create a new schema within the WideWorldImporters warehouse named dbo1 . Copy, paste, and run the following T-SQL code, which creates table clones of the dbo.dimension_city and dbo.fact_sale tables, as of the current point in time, across schemas within the same data warehouse.
SQL
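--A sketch: create the dbo1 schema, then clone the tables into it. GO is the client
--batch separator; if your client doesn't support it, run CREATE SCHEMA on its own first.
CREATE SCHEMA [dbo1];
GO
CREATE TABLE [dbo1].[dimension_city1] AS CLONE OF [dbo].[dimension_city];
CREATE TABLE [dbo1].[fact_sale1] AS CLONE OF [dbo].[fact_sale];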
3. Select Run to execute the query. The query takes a few seconds to execute.
After the query is completed, clones dimension_city1 and fact_sale1 are created
in the dbo1 schema.
4. Load the data preview to validate the data loaded successfully by selecting the dimension_city1 table under the dbo1 schema in the Explorer.
5. To create a table clone as of a previous point in time, in the query editor, paste the following code to create clones of the dbo.dimension_city and dbo.fact_sale tables in the dbo1 schema. Enter the Coordinated Universal Time (UTC) timestamp at which you want the tables cloned.
SQL
6. Select Run to execute the query. The query takes a few seconds to execute.
After the query is completed, table clones fact_sale2 and dimension_city2 are
created in the dbo1 schema, with data as it existed in the past point in time.
7. Load the data preview to validate the data loaded successfully by selecting the fact_sale2 table under the dbo1 schema in the Explorer.
8. Rename the query for reference later. Right-click on SQL query 3 in the Explorer
and select Rename.
9. Type Clone Table in another schema to change the name of the query.
10. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
Next step
Tutorial: Transform data using a stored procedure
Related content
Clone table in Microsoft Fabric
Tutorial: Clone tables in the Fabric portal
CREATE TABLE AS CLONE OF
Learn how to create and save a new stored procedure to transform data.
Transform data
1. From the Home tab of the ribbon, select New SQL query.
2. In the query editor, paste the following code to create the stored procedure dbo.populate_aggregate_sale_by_city . This stored procedure creates and loads the dbo.aggregate_sale_by_date_city table in a later step.
SQL
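--A minimal sketch of such a stored procedure; the exact columns and aggregations in the
--tutorial's script may differ. It rebuilds dbo.aggregate_sale_by_date_city from the
--fact_sale and dimension_city tables created earlier in this tutorial.
CREATE PROCEDURE [dbo].[populate_aggregate_sale_by_city]
AS
BEGIN
    DROP TABLE IF EXISTS [dbo].[aggregate_sale_by_date_city];

    CREATE TABLE [dbo].[aggregate_sale_by_date_city]
    AS
    SELECT
        FS.[InvoiceDateKey] AS [Date],
        DC.[City],
        DC.[StateProvince],
        DC.[SalesTerritory],
        SUM(FS.[TotalExcludingTax]) AS [SumOfTotalExcludingTax],
        SUM(FS.[TaxAmount]) AS [SumOfTaxAmount],
        SUM(FS.[TotalIncludingTax]) AS [SumOfTotalIncludingTax],
        SUM(FS.[Profit]) AS [SumOfProfit]
    FROM [dbo].[fact_sale] AS FS
    INNER JOIN [dbo].[dimension_city] AS DC
        ON FS.[CityKey] = DC.[CityKey]
    GROUP BY
        FS.[InvoiceDateKey],
        DC.[City],
        DC.[StateProvince],
        DC.[SalesTerritory];
END;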
3. To save this query for reference later, right-click on the query tab, and select
Rename.
5. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
8. In the Object explorer, verify that you can see the newly created stored procedure
by expanding the StoredProcedures node under the dbo schema.
9. From the Home tab of the ribbon, select New SQL query.
10. In the query editor, paste the following code. This T-SQL executes
dbo.populate_aggregate_sale_by_city to create the
dbo.aggregate_sale_by_date_city table.
SQL
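--Execute the stored procedure created in the previous query to build the aggregate table.
EXEC [dbo].[populate_aggregate_sale_by_city];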
11. To save this query for reference later, right-click on the query tab, and select
Rename.
12. Type Run Create Aggregate Procedure to change the name of the query.
13. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
15. Select the refresh button on the ribbon. The query takes between two and three
minutes to execute.
16. In the Object explorer, load the data preview to validate the data loaded successfully by selecting the aggregate_sale_by_date_city table in the Explorer.
Next step
Tutorial: Time travel using T-SQL at statement level
In this article, learn how to time travel in your warehouse at the statement level using T-
SQL. This feature allows you to query data as it appeared in the past, within a retention
period.
7 Note
Currently, only the Coordinated Universal Time (UTC) time zone is used for time
travel.
Time travel
In this example, we'll update a row and show how to easily query the previous value using the FOR TIMESTAMP AS OF query hint.
1. From the Home tab of the ribbon, select New SQL query.
2. In the query editor, paste the following code to create the view Top10CustomersView . Select Run to execute the query.
SQL
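--An illustrative sketch of the view; the tutorial's actual script may differ. It assumes
--the dbo.dimension_customer table loaded earlier via the pipeline and its Customer column.
CREATE VIEW [dbo].[Top10CustomersView]
AS
SELECT TOP (10)
    DC.[Customer],
    SUM(FS.[TotalIncludingTax]) AS [TotalSalesAmount]
FROM [dbo].[dimension_customer] AS DC
INNER JOIN [dbo].[fact_sale] AS FS
    ON DC.[CustomerKey] = FS.[CustomerKey]
GROUP BY DC.[Customer]
ORDER BY [TotalSalesAmount] DESC;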
3. In the Explorer, verify that you can see the newly created view Top10CustomersView
by expanding the View node under dbo schema.
4. Create another new query, similar to Step 1. From the Home tab of the ribbon,
select New SQL query.
5. In the query editor, paste the following code. This updates the TotalIncludingTax
column value to 200000000 for the record which has the SaleKey value of
22632918 . Select Run to execute the query.
SQL
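--The update described in this step: set TotalIncludingTax to 200000000 for the row
--with SaleKey 22632918.
UPDATE [dbo].[fact_sale]
SET [TotalIncludingTax] = 200000000
WHERE [SaleKey] = 22632918;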
6. In the query editor, paste the following code. The CURRENT_TIMESTAMP T-SQL
function returns the current UTC timestamp as a datetime. Select Run to execute
the query.
SQL
SELECT CURRENT_TIMESTAMP;
8. Paste the following code in the query editor and replace the timestamp value with
the current timestamp value obtained from the prior step. The timestamp syntax
format is YYYY-MM-DDTHH:MM:SS[.FFF] .
10. The following example returns the list of top ten customers by TotalIncludingTax ,
including the new value for SaleKey 22632918 . Select Run to execute the query.
SQL
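--Query the view to see the current top ten customers, including the updated value.
SELECT *
FROM [WideWorldImporters].[dbo].[Top10CustomersView];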
11. Paste the following code in the query editor and replace the timestamp value with a time prior to executing the update script that changed the TotalIncludingTax value. This returns the list of top ten customers before the TotalIncludingTax was updated for SaleKey 22632918. Select Run to execute the query.
SQL
/*View of Top10 Customers as of today before record updates*/
SELECT *
FROM [WideWorldImporters].[dbo].[Top10CustomersView]
OPTION (FOR TIMESTAMP AS OF '2024-04-24T20:49:06.097');
For more examples, visit How to: Query using time travel at the statement level.
Next step
Tutorial: Create a query with the visual query builder
Related content
Query data as it existed in the past
How to: Query using time travel
Query hints (Transact-SQL)
Create and save a query with the visual query builder in the Microsoft Fabric portal.
2. Drag the fact_sale table from the Explorer to the query design pane.
3. Limit the dataset size by selecting Reduce rows > Keep top rows from the
transformations ribbon.
4. In the Keep top rows dialog, enter 10000 .
5. Select OK.
6. Drag the dimension_city table from the explorer to the query design pane.
7. From the transformations ribbon, select the dropdown next to Combine and select
Merge queries as new.
c. Select the CityKey field in the dimension_city table by selecting the column name in the header row to indicate the join column.
d. Select the CityKey field in the fact_sale table by selecting the column name in the header row to indicate the join column.
10. With the Merge step selected, select the Expand button next to fact_sale on the
header of the data grid then select the columns TaxAmount , Profit , and
TotalIncludingTax .
11. Select OK.
a. Change to Advanced.
b. Group by (if necessary, select Add grouping to add more group by columns):
i. Country
ii. StateProvince
iii. City
c. New column name (if necessary, select Add aggregation to add more
aggregate columns and operations):
i. SumOfTaxAmount
i. Choose Operation of Sum and Column of TaxAmount .
ii. SumOfProfit
i. Choose Operation of Sum and Column of Profit .
iii. SumOfTotalIncludingTax
i. Choose Operation of Sum and Column of TotalIncludingTax .
16. Type Sales Summary to change the name of the query.
17. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
Next step
Tutorial: Analyze data with a notebook
In this tutorial, learn how you can analyze data using a T-SQL notebook or using a notebook with a Lakehouse shortcut.
4. Once the notebook is created, you can see WideWorldImporters warehouse is
loaded into the explorer, and the ribbon shows T-SQL as the default language.
5. Right-click to launch the More menu option on the dimension_city table. Select
SELECT TOP 100 to generate a quick SQL template to explore 100 rows from the
table.
6. Run the code cell and you can see messages and results.
4. The new lakehouse loads and the Explorer view opens up, with the Get data in
your lakehouse menu. Under Load data in your lakehouse, select the New
shortcut button.
5. In the New shortcut window, select the button for Microsoft OneLake.
6. In the Select a data source type window, scroll through the list until you find the
Warehouse named WideWorldImporters you created previously. Select it, then
select Next.
7. In the OneLake object browser, expand Tables, expand the dbo schema, and then
select the checkbox for dimension_customer . Select Next. Select Create.
8. If you see a folder called Unidentified under Tables, select the Refresh icon in the
horizontal menu bar.
9. Select the dimension_customer in the Table list to preview the data. The lakehouse
is showing the data from the dimension_customer table from the Warehouse!
10. Next, create a new notebook to query the dimension_customer table. In the Home
ribbon, select the dropdown list for Open notebook and choose New notebook.
12. Select, then drag the dimension_customer from the Tables list into the open
notebook cell. You can see a PySpark query has been written for you to query all
the data from ShortcutExercise.dimension_customer . This notebook experience is
similar to Visual Studio Code Jupyter notebook experience. You can also open the
notebook in VS Code.
13. In the Home ribbon, select the Run all button. Once the query is completed, you
will see you can easily use PySpark to query the Warehouse tables!
Next step
Tutorial: Create cross-warehouse queries with the SQL query editor
In this tutorial, learn how you can easily create and execute T-SQL queries with the SQL query editor across multiple warehouses, including joining together data from a SQL analytics endpoint and a Warehouse in Microsoft Fabric.
2. In the query editor, copy and paste the following T-SQL code.
SQL
SELECT Sales.StockItemKey,
Sales.Description,
SUM(CAST(Sales.Quantity AS int)) AS SoldQuantity,
c.Customer
FROM [dbo].[fact_sale] AS Sales,
[ShortcutExercise].[dbo].[dimension_customer] AS c
WHERE Sales.CustomerKey = c.CustomerKey
GROUP BY Sales.StockItemKey, Sales.Description, c.Customer;
3. Select the Run button to execute the query. After the query is completed, you will
see the results.
4. Rename the query for reference later. Right-click on SQL query 1 in the Explorer
and select Rename.
Next step
Tutorial: Create Power BI reports
Create reports
1. Select the Model view.
2. From the fact_sale table, drag the CityKey field and drop it onto the CityKey
field in the dimension_city table to create a relationship.
a. On the Data pane, expand fact_sale and check the box next to Profit. This creates a column chart and adds the field to the Y-axis.
b. On the Data pane, expand dimension_city and check the box next to
SalesTerritory. This adds the field to the X-axis.
c. Reposition and resize the column chart to take up the top left quarter of the
canvas by dragging the anchor points on the corners of the visual.
7. Select anywhere on the blank canvas (or press the Esc key) so the column chart
visual is no longer selected.
b. From the Data pane, drag StateProvince from the dimension_city table to the
Location bucket on the Visualizations pane.
c. From the Data pane, drag Profit from the fact_sale table to the Size bucket on
the Visualizations pane.
d. If necessary, reposition and resize the map to take up the bottom left quarter of
the canvas by dragging the anchor points on the corners of the visual.
9. Select anywhere on the blank canvas (or press the Esc key) so the map visual is no
longer selected.
b. From the Data pane, check the box next to SalesTerritory on the
dimension_city table.
c. From the Data pane, check the box next to StateProvince on the
dimension_city table.
d. From the Data pane, check the box next to Profit on the fact_sale table.
e. From the Data pane, check the box next to TotalExcludingTax on the fact_sale
table.
f. Reposition and resize the column chart to take up the right half of the canvas by
dragging the anchor points on the corners of the visual.
Next step
Tutorial: Build a report from the OneLake data hub
Learn how to build a report with the data you ingested into your Warehouse in the last
step.
Build a report
1. Select OneLake in the navigation menu.
2. From the item list, select WideWorldImporters with the type of Semantic model
(default).
Note
Microsoft has renamed the Power BI dataset content type to semantic model.
This applies to Microsoft Fabric as well. For more information, see New name
for Power BI datasets.
3. In the Visualize this data section, select Create a report > Auto-create. A report is
generated from the dimension_customer table that was loaded in the previous
section.
6. Enter Customer Quick Summary in the name box. In the Save your report dialog, select Save.
Next step
Tutorial: Clean up tutorial resources
You can delete individual reports, pipelines, warehouses, and other items or remove the
entire workspace. In this tutorial, you will clean up the workspace, individual reports,
pipelines, warehouses, and other items you created as part of the tutorial.
Delete a workspace
1. In the Fabric portal, select Data Warehouse Tutorial in the navigation pane to
return to the workspace item list.
4. Select Delete on the warning to remove the workspace and all its contents.
Next step
What is data warehousing in Microsoft Fabric?
The SQL connection string requires TCP port 1433 to be open. TCP 1433 is the standard
SQL Server port number. The SQL connection string also respects the Warehouse or
Lakehouse SQL analytics endpoint security model for data access. Data can be obtained
for all objects to which a user has access.
1. Navigate to your workspace, select the Warehouse, and select the ... ellipses for
More options.
2. Select Copy SQL connection string to copy the connection string to your
clipboard.
Get started with SQL Server Management
Studio (SSMS)
The following steps detail how to start at the Microsoft Fabric workspace and connect a
warehouse to SQL Server Management Studio (SSMS).
1. When you open SSMS, the Connect to Server window appears. If already open,
you can connect manually by selecting Object Explorer > Connect > Database
Engine.
2. Once the Connect to Server window is open, paste the connection string copied
from the previous section of this article into the Server name box. Select Connect
and proceed with the appropriate credentials for authentication. Remember that
only Microsoft Entra multifactor authentication (MFA) is supported, via the option
Microsoft Entra MFA.
3. Once the connection is established, Object Explorer displays the connected
warehouse from the workspace and its respective tables and views, all of which are
ready to be queried.
When connecting via SSMS (or ADS), you see both a SQL analytics endpoint and
Warehouse listed as warehouses, and it's difficult to differentiate between the two item
types and their functionality. For this reason, we strongly encourage you to adopt a
naming convention that allows you to easily distinguish between the two item types
when you work in tools outside of the Microsoft Fabric portal experience. Only SSMS 19
or higher is supported.
When establishing connectivity via JDBC, check for the following dependencies:
1. Add artifacts. Choose Add Artifact and add the following four dependencies, then
select Download/Update to load all dependencies. For example:
2. Select Test connection, and Finish.
XML
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>msal4j</artifactId>
<version>1.13.3</version>
</dependency>
<dependency>
<groupId>com.microsoft.sqlserver</groupId>
<artifactId>mssql-jdbc_auth</artifactId>
<version>11.2.1.x86</version>
</dependency>
<dependency>
<groupId>com.microsoft.sqlserver</groupId>
<artifactId>mssql-jdbc</artifactId>
<version>12.1.0.jre11-preview</version>
</dependency>
<dependency>
<groupId>com.microsoft.aad</groupId>
<artifactId>adal</artifactId>
<version>4.2.2</version>
</dependency>
The dbt data platform-specific adapter plugins allow users to connect to the data store of choice. To connect to Synapse Data Warehouse in Microsoft Fabric from dbt, use the dbt-fabric adapter. Similarly, the Azure Synapse Analytics dedicated SQL pool has its own dbt adapter.
The dbt Fabric DW adapter uses the pyodbc library to establish connectivity with the Warehouse. The pyodbc library is an ODBC implementation in the Python language that uses the Python Database API Specification v2.0. The pyodbc library directly passes the connection string to the database driver through SQLDriverConnect in the msodbc connection structure to Microsoft Fabric, using a TDS (Tabular Data Stream) proxy service.
For more information, see the Microsoft Fabric Synapse Data Warehouse dbt adapter
setup and Microsoft Fabric Synapse Data Warehouse dbt adapter configuration .
Custom applications
In Microsoft Fabric, a Warehouse and a Lakehouse SQL analytics endpoint provide a SQL
connection string. Data is accessible from a vast ecosystem of SQL tooling, provided
they can authenticate using Microsoft Entra ID (formerly Azure Active Directory). For
more information, see Connection libraries for Microsoft SQL Database. For more
information and sample connection strings, see Microsoft Entra authentication as an
alternative to SQL authentication.
Best practices
We recommend adding retries in your applications and ETL jobs to build resiliency. For more information, see the links in the related content that follows.
Related content
Security for data warehousing in Microsoft Fabric
Microsoft Entra authentication as an alternative to SQL authentication in Microsoft
Fabric
Add Fabric URLs to your allowlist
Azure IP ranges and service tags for public clouds
Natural Language to SQL: Ask Copilot to generate SQL queries using simple
natural language questions.
Code completion: Enhance your coding efficiency with AI-powered code
completions.
Quick actions: Quickly fix and explain SQL queries with readily available actions.
Intelligent Insights: Receive smart suggestions and insights based on your
warehouse schema and metadata.
There are three ways to interact with Copilot in the Fabric Warehouse editor.
Chat Pane: Use the chat pane to ask questions to Copilot through natural
language. Copilot will respond with a generated SQL query or natural language
based on the question asked.
How to: Use the Copilot chat pane for Synapse Data Warehouse
Code completions: Start writing T-SQL in the SQL query editor and Copilot will
automatically generate a code suggestion to help complete your query. The Tab
key accepts the code suggestion, or keep typing to ignore the suggestion.
How to: Use Copilot code completion for Synapse Data Warehouse
Quick Actions: In the ribbon of the SQL query editor, the Fix and Explain options
are quick actions. Highlight a SQL query of your choice and select one of the quick
action buttons to perform the selected action on your query.
Explain: Copilot can provide natural language explanations of your SQL query
and warehouse schema in comments format.
Fix: Copilot can fix errors in your code as error messages arise. Error scenarios
can include incorrect/unsupported T-SQL code, wrong spellings, and more.
Copilot will also provide comments that explain the changes and suggest SQL
best practices.
How to: Use Copilot quick actions for Synapse Data Warehouse
When crafting prompts, be sure to start with a clear and concise description of the
specific information you're looking for.
Natural language to SQL depends on expressive table and column names. If your
table and columns aren't expressive and descriptive, Copilot might not be able to
construct a meaningful query.
Use natural language that is applicable to your table and view names, column
names, primary keys, and foreign keys of your warehouse. This context helps
Copilot generate accurate queries. Specify what columns you wish to see,
aggregations, and any filtering criteria as explicitly as possible. Copilot should be
able to correct typos or understand context given your schema.
Create relationships in the model view of the warehouse to increase the accuracy
of JOIN statements in your generated SQL queries.
When using code completions, leave a comment at the top of the query with -- to
help guide the Copilot with context about the query you are trying to write.
Avoid ambiguous or overly complex language in your prompts. Simplify the
question while maintaining its clarity. This editing ensures Copilot can effectively
translate it into a meaningful T-SQL query that retrieves the desired data from the
associated tables and views.
Currently, natural language to SQL supports English language to T-SQL.
The following example prompts are clear, specific, and tailored to the properties of
your schema and data warehouse, making it easier for Copilot to generate
accurate T-SQL queries:
Show me all properties that sold last year
Count all the products, group by each category
Show agents who have listed more than two properties for sale
Show the rank of each agent by property sales and show name, total sales,
and rank
Enable Copilot
Your administrator needs to enable the tenant switch before you start using
Copilot. For more information, see Copilot tenant settings.
Your F64 or P1 capacity needs to be in one of the regions listed in this article,
Fabric region availability.
If your tenant or capacity is outside the US or France, Copilot is disabled by default
unless your Fabric tenant admin enables the Data sent to Azure OpenAI can be
processed outside your tenant's geographic region, compliance boundary, or
national cloud instance tenant setting in the Fabric Admin portal.
Copilot in Microsoft Fabric isn't supported on trial SKUs. Only paid SKUs (F64 or
higher, or P1 or higher) are supported.
For more information, see Overview of Copilot in Fabric and Power BI.
Copilot features in Fabric are built to meet the Responsible AI Standard, which means
that they're reviewed by multidisciplinary teams for potential harms, and then refined to
include mitigations for those harms.
For more information, see Privacy, security, and responsible use of Copilot for Data
Warehouse (preview).
Copilot doesn't understand previous inputs and can't undo changes after a user
commits a change when authoring, either via user interface or the chat pane. For
example, you can't ask Copilot to "Undo my last 5 inputs." However, users can still
use the existing user interface options to delete unwanted changes or queries.
Copilot can't make changes to existing SQL queries. For example, if you ask Copilot
to edit a specific part of an existing query, it doesn't work.
Copilot might produce inaccurate results when the intent is to evaluate data.
Copilot only has access to the warehouse schema, none of the data inside.
Copilot responses can include inaccurate or low-quality content, so make sure to
review outputs before using them in your work.
People who are able to meaningfully evaluate the content's accuracy and
appropriateness should review the outputs.
Related content
Copilot tenant settings (preview)
How to: Use the Copilot chat pane for Synapse Data Warehouse
How to: Use Copilot quick actions for Synapse Data Warehouse
How to: Use Copilot code completion for Synapse Data Warehouse
Privacy, security, and responsible use of Copilot for Data Warehouse (preview)
Copilot for Data Warehouse includes a chat pane to interact with Copilot in natural
language. In this interface, you can ask Copilot questions specific to your data
warehouse or generally about data warehousing in Fabric. Depending on the question,
Copilot responds with a generated SQL query or a natural language response.
Since Copilot is schema aware and contextualized, you can generate queries tailored to
your Warehouse.
This integration means that Copilot can generate SQL queries for prompts like:
Which agents have listed more than two properties for sale?
Tell me the rank of each agent by property sales and show name, total sales,
and rank
Key capabilities
The supported capabilities of interacting through chat include:
Natural Language to SQL: Generate T-SQL code and get suggestions of questions
to ask to accelerate your workflow.
Q&A: Ask Copilot questions about warehousing in Fabric, and it responds in natural
language.
Explanations: Copilot can provide a summary and natural language explanations of
the T-SQL code within the active query tab.
Fixing errors: Copilot can also fix errors in T-SQL code as they arise. Copilot shares
context with the active query tab and can provide helpful suggestions to
automatically fix SQL query errors.
Prerequisites
Your administrator needs to enable the tenant switch before you start using
Copilot. For more information, see Copilot tenant settings.
Your F64 or P1 capacity needs to be in one of the regions listed in this article,
Fabric region availability.
If your tenant or capacity is outside the US or France, Copilot is disabled by default
unless your Fabric tenant admin enables the Data sent to Azure OpenAI can be
processed outside your tenant's geographic region, compliance boundary, or
national cloud instance tenant setting in the Fabric Admin portal.
Copilot in Microsoft Fabric isn't supported on trial SKUs. Only paid SKUs (F64 or
higher, or P1 or higher) are supported.
For more information, see Overview of Copilot in Fabric and Power BI.
Get started
1. In the Data warehouse workload, open a warehouse, and open a new SQL query.
2. To open the Copilot chat pane, select the Copilot button in the ribbon.
3. The chat pane offers helpful starter prompts to help you get started and become
familiar with Copilot. Select any option to ask Copilot a question. The Ask a question
button provides example questions that are tailored specifically to your warehouse.
4. You can also type a request of your choice in the chat box and Copilot responds
accordingly.
5. To find documentation related to your request, select the Help button.
More powerful use cases
You can ask Copilot questions about the warehouse normally and it should respond
accordingly. However, if you want to force Copilot to perform a specific skill, there are
slash (/) commands that you can use. These commands must be at the start of your chat
message.
/explain: Generate an explanation for the query within the active query tab.
/fix: Generate a fix for the query within the active query tab. You can optionally add
additional context to fix a specific part or aspect of the query.
/question: Generate a natural language response from the prompt submitted to Copilot.
/help: Get help for using Copilot. This links to documentation about Copilot and how to
use it.
Related content
Microsoft Copilot for Synapse Data Warehouse
How to: Use Copilot code completion for Synapse Data Warehouse
How to: Use Copilot quick actions for Synapse Data Warehouse
Privacy, security, and responsible use of Copilot for Data Warehouse (preview)
As you start writing T-SQL code or comments in the editor, Copilot for Data Warehouse
leverages your warehouse schema and query tab context to complement the existing
IntelliSense with inline code suggestions. The completions can come in varied lengths -
sometimes the completion of the current line, and sometimes a whole new block of
code. The code completions support all types of T-SQL queries: data definition language
(DDL), data query language (DQL), and data manipulation language (DML). You can
accept all or part of a suggestion or keep typing to ignore the suggestions. It can also
generate alternative suggestions for you to pick.
Prerequisites
Your administrator needs to enable the tenant switch before you start using
Copilot. For more information, see Copilot tenant settings.
Your F64 or P1 capacity needs to be in one of the regions listed in this article,
Fabric region availability.
If your tenant or capacity is outside the US or France, Copilot is disabled by default
unless your Fabric tenant admin enables the Data sent to Azure OpenAI can be
processed outside your tenant's geographic region, compliance boundary, or
national cloud instance tenant setting in the Fabric Admin portal.
Copilot in Microsoft Fabric isn't supported on trial SKUs. Only paid SKUs (F64 or
higher, or P1 or higher) are supported.
For more information, see Overview of Copilot in Fabric and Power BI.
Key capabilities
Auto-complete partially written queries: Copilot can provide context-aware SQL
code suggestions or completions for your partially written T-SQL query.
Generate suggestions from comments: You can guide Copilot using comments
that describe your code logic and purpose, using natural language. Leave the
comment (using -- ) at the beginning of the query and Copilot will generate the
corresponding query.
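For illustration only, a descriptive comment such as the one below could guide Copilot.
The table and column names (dbo.Trip, dbo.Medallion) are borrowed from the Fabric
sample warehouse, and the completion shown is merely the kind of query Copilot might
suggest.
SQL
-- Show the total number of trips per medallion, ordered by trip count descending
SELECT
    M.MedallionCode,
    COUNT(*) AS TotalTripCount
FROM dbo.Trip AS T
JOIN dbo.Medallion AS M
    ON T.MedallionID = M.MedallionID
GROUP BY M.MedallionCode
ORDER BY TotalTripCount DESC;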
Get started
1. Verify the Show Copilot completions setting is enabled in your warehouse
settings.
You can also check the setting's status through the status bar at the bottom
of the query editor.
If it's not enabled, open your warehouse Settings, select the Copilot pane, and
enable the Show Copilot completions option.
2. Start writing your query in the SQL query editor within the warehouse. As you type,
Copilot will provide real-time code suggestions and completions of your query by
presenting a dimmed ghost text.
3. You can then accept the suggestion with the Tab key, or dismiss it. If you do not
want to accept an entire suggestion from Copilot, you can use the Ctrl+Right
keyboard shortcut to accept the next word of a suggestion.
4. Copilot can provide different suggestions for the same input. You can hover over
the suggestion to preview the other options.
5. To help Copilot understand the query you're writing, you can provide context
about what code you expect by leaving a comment with -- . For example, you
could specify which warehouse object, condition, or methods to use. Copilot can
even autocomplete your comment to help you write clear and accurate comments
more efficiently.
Related content
Microsoft Copilot for Synapse Data Warehouse
How to: Use the Copilot chat pane for Synapse Data Warehouse
How to: Use Copilot quick actions for Synapse Data Warehouse
Privacy, security, and responsible use of Copilot for Data Warehouse (preview)
There are two AI-powered quick actions that are currently supported in Copilot for Data
Warehouse: Explain and Fix.
Quick actions can accelerate productivity by helping you write and understand queries
faster. These buttons are located at the top of the SQL query editor, near the Run
button.
The Explain quick action will leave a summary at the top of the query and in-line
code comments throughout the query to describe what the query is doing.
The Fix quick action will fix errors in your query syntax or logic. After running a SQL
query and being met with an error, you can fix your queries easily. Copilot will
automatically take the SQL error message into context when fixing your query.
Copilot will also leave a comment indicating where and how it has edited the T-
SQL code.
Copilot leverages information about your warehouse schema, query tab contents, and
execution results to give you relevant and useful feedback on your query.
Prerequisites
Your administrator needs to enable the tenant switch before you start using
Copilot. For more information, see Copilot tenant settings.
Your F64 or P1 capacity needs to be in one of the regions listed in this article,
Fabric region availability.
If your tenant or capacity is outside the US or France, Copilot is disabled by default
unless your Fabric tenant admin enables the Data sent to Azure OpenAI can be
processed outside your tenant's geographic region, compliance boundary, or
national cloud instance tenant setting in the Fabric Admin portal.
Copilot in Microsoft Fabric isn't supported on trial SKUs. Only paid SKUs (F64 or
higher, or P1 or higher) are supported.
For more information, see Overview of Copilot in Fabric and Power BI.
Get started
Whether you are a beginner or an expert in writing SQL queries, quick actions allow you
to understand and navigate the complexities of the SQL language to easily solve issues
independently.
Explain
To use Copilot to explain your queries, follow these steps:
1. Highlight the query that you want Copilot to explain. You can select the whole
query or just a part of it.
2. Select the Explain button in the toolbar. Copilot will analyze your query and
generate inline comments that explain what your code does. If applicable, Copilot
will leave a summary at the top of the query as well. The comments will appear
next to the relevant lines of code in your query editor.
3. Review the comments that Copilot generated. You can edit or delete them if you
want. You can also undo the changes if you don't like them, or make further edits.
Fix
To get Copilot's help with fixing an error in your query, follow these steps:
1. Write and run your query as usual. If there are any errors, you will see them in the
output pane.
2. Highlight the query that you want to fix. You can select the whole query or just a
part of it.
3. Select the Fix button in the toolbar. This button will only be enabled after you have
run your T-SQL query and it has returned an error.
4. Copilot will analyze your query and try to find the best way to fix it. It will also add
comments to explain what it fixed and why.
5. Review the changes that Copilot made and select Run to execute the fixed query.
You can also undo the changes if you don't like them, or make further edits.
Related content
Microsoft Copilot for Fabric Data Warehouse
How to: Use Copilot code completion for Fabric Data Warehouse
How to: Use the Copilot chat pane for Fabric Data Warehouse
Privacy, security, and responsible use of Copilot for Data Warehouse (preview)
This article explains the data warehousing workload with the SQL analytics endpoint of
the Lakehouse, and scenarios for use of the Lakehouse in data warehousing.
The SQL analytics endpoint enables you to query data in the Lakehouse using T-SQL
language and TDS protocol. Every Lakehouse has one SQL analytics endpoint, and each
workspace can have more than one Lakehouse. The number of SQL analytics endpoints
in a workspace matches the number of Lakehouse items.
The SQL analytics endpoint is automatically generated for every Lakehouse and
exposes Delta tables from the Lakehouse as SQL tables that can be queried using
the T-SQL language.
Every delta table from a Lakehouse is represented as one table. Data should be in
delta format.
The default Power BI semantic model is created for every SQL analytics endpoint
and it follows the naming convention of the Lakehouse objects.
You don't need to create a SQL analytics endpoint in Microsoft Fabric, and users can't
create one directly in a workspace. A SQL analytics endpoint is automatically created for
every Lakehouse: create a lakehouse, and its SQL analytics endpoint is provisioned for
you.
Note
Behind the scenes, the SQL analytics endpoint uses the same engine as the
Warehouse to serve high performance, low latency SQL queries.
The Lakehouse, with its SQL analytics endpoint, powered by the Warehouse, can simplify
the traditional decision tree of batch, streaming, or lambda architecture patterns.
Together with a warehouse, the lakehouse enables many additive analytics scenarios.
This section explores how to use a Lakehouse together with a Warehouse for a best of
breed analytics strategy.
You can use OneLake shortcuts to reference gold folders in external Azure Data Lake
storage accounts that are managed by Synapse Spark or Azure Databricks engines.
Warehouses can also be added as subject area or domain oriented solutions for specific
subject matter that can have bespoke analytics requirements.
If you choose to keep your data in Fabric, it will always be open and accessible through
APIs, Delta format, and of course T-SQL.
Data in a Microsoft Fabric Lakehouse is physically stored in OneLake with the following
folder structure:
The /Files folder contains raw and unconsolidated (bronze) files that should be
processed by data engineers before they're analyzed. The files might be in various
formats such as CSV, Parquet, different types of images, etc.
The /Tables folder contains refined and consolidated (gold) data that is ready for
business analysis. The consolidated data is in Delta Lake format.
A SQL analytics endpoint can read data in the /Tables folder within OneLake. Analysis is
as simple as querying the SQL analytics endpoint of the Lakehouse. Together with the
Warehouse, you also get cross-database queries and the ability to seamlessly switch from
read-only queries to building additional business logic on top of your OneLake data
with Synapse Data Warehouse.
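As a minimal sketch of such a cross-database query, assume a lakehouse named
MyLakehouse and a warehouse named MyWarehouse in the same workspace that expose
hypothetical dbo.FactSales and dbo.DimDate tables; three-part naming combines data
from both items in a single query:
SQL
-- Join a lakehouse table (via its SQL analytics endpoint) with a warehouse table.
-- All item, table, and column names here are hypothetical.
SELECT TOP 100
    f.SaleKey,
    f.SalesAmount,
    d.CalendarYear
FROM MyLakehouse.dbo.FactSales AS f
JOIN MyWarehouse.dbo.DimDate AS d
    ON f.DateKey = d.DateKey;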
In Fabric, you can use Spark Streaming or Data Engineering to curate your data. You can
use the Lakehouse SQL analytics endpoint to validate data quality and for existing T-SQL
processes. This can be done in a medallion architecture or within multiple layers of your
Lakehouse, serving bronze, silver, gold, or staging, curated, and refined data. You can
customize the folders and tables created through Spark to meet your data engineering
and business requirements. When ready, a Warehouse can serve all of your downstream
business intelligence applications and other analytics use cases, without copying data,
using Views or refining data using CREATE TABLE AS SELECT (CTAS), stored procedures,
and other DML / DDL commands.
Any folder referenced using a shortcut can be analyzed from a SQL analytics endpoint
and a SQL table is created for the referenced data. The SQL table can be used to expose
data in externally managed data lakes and enable analytics on them.
This shortcut acts as a virtual warehouse that can be leveraged from a warehouse for
additional downstream analytics requirements, or queried directly.
Use the following steps to analyze data in external data lake storage accounts:
1. Create a shortcut that references a folder in Azure Data Lake storage or Amazon S3
account. Once you enter connection details and credentials, a shortcut is shown in
the Lakehouse.
2. Switch to the SQL analytics endpoint of the Lakehouse and find a SQL table that
has a name that matches the shortcut name. This SQL table references the
ADLS/S3 folder.
3. Query the SQL table that references data in ADLS/S3. The table can be used as any
other table in the SQL analytics endpoint. You can join tables that reference data in
different storage accounts.
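For example, a query along the following lines joins two hypothetical shortcut-backed
tables; the table and column names are illustrative only:
SQL
-- Each shortcut surfaces as a SQL table in the SQL analytics endpoint.
SELECT TOP 100
    a.CustomerID,
    a.SalesAmount,
    c.Region
FROM dbo.SalesFromADLS AS a
JOIN dbo.CustomersFromS3 AS c
    ON a.CustomerID = c.CustomerID;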
Note
If the SQL table is not immediately shown in the SQL analytics endpoint, you might
need to wait a few minutes. The SQL table that references data in an external storage
account is created with a delay.
Delta Lake tables can be partitioned by columns. This allows you to store historical data
logically separated in a format that allows compute engines to read the data as needed
with performant filtering, versus reading the entire directory and all folders and files
contained within.
Partitioned data enables faster access when queries filter on the partition columns.
A SQL analytics endpoint can easily read this type of data with no configuration
required. For example, you can use any application to archive data into a data lake,
including SQL Server 2022 or Azure SQL Managed Instance. After you partition data
and land it in a lake for archival purposes with external tables, a SQL analytics endpoint
can read partitioned Delta Lake tables as SQL tables and allow your organization to
analyze them. This reduces the total cost of ownership, reduces data duplication, and
lights up big data, AI, and other analytics scenarios.
A SQL analytics endpoint enables you to leave the data in place and still analyze data in
the Warehouse or Lakehouse, even in other Microsoft Fabric workspaces, via seamless
virtualization. Every Microsoft Fabric Lakehouse stores data in OneLake.
Every Microsoft Fabric Warehouse stores table data in OneLake. If a table is append-
only, the table data is exposed as Delta Lake data in OneLake. Shortcuts enable you to
reference folders in any OneLake where the Warehouse tables are exposed.
A Lakehouse SQL analytics endpoint can enable easy sharing of data between
departments and users, where a user can bring their own capacity and warehouse.
Workspaces organize departments, business units, or analytical domains. Using
shortcuts, users can find any Warehouse or Lakehouse's data. Users can instantly
perform their own customized analytics from the same shared data. In addition to
helping with departmental chargebacks and usage allocation, this is also a zero-copy
way to share the data.
The SQL analytics endpoint enables querying of any table and easy sharing. Workspace
roles and security roles can be layered on top to meet additional business requirements.
Note
If the SQL table is not immediately shown in the SQL analytics endpoint, you might
need to wait a few minutes. The SQL table that references data in another
workspace is created with a delay.
Delta Lake tables can be partitioned by columns. Partitioned data sets enable faster data
access when queries filter on the partition columns.
A SQL analytics endpoint can represent partitioned Delta Lake data sets as SQL tables
and enable you to analyze them.
Related content
What is a lakehouse?
Create a lakehouse with OneLake
Default Power BI semantic models
Load data into the lakehouse
How to copy data using Copy activity in Data pipeline
Tutorial: Move data into lakehouse via Copy assistant
Connectivity
SQL analytics endpoint of the lakehouse
Query the Warehouse
All Fabric warehouses by default are configured with case-sensitive (CS) collation
Latin1_General_100_BIN2_UTF8. You can also create warehouses with case-insensitive
(CI) collation - Latin1_General_100_CI_AS_KS_WS_SC_UTF8.
Currently, the only method available for creating a case-insensitive data warehouse is via
REST API. This article provides a step-by-step guide on how to create a warehouse with
case-insensitive collation through the REST API. It also explains how to use Visual Studio
Code with the REST Client extension to facilitate the process.
Prerequisites
A Fabric workspace with an active capacity or trial capacity.
Download and install Visual Studio Code.
Install the REST Client - Visual Studio Marketplace .
API endpoint
To create a warehouse with REST API, use the API endpoint: POST
https://api.fabric.microsoft.com/v1/workspaces/<workspace-id>/items
JSON
{
"type": "Warehouse",
"displayName": "CaseInsensitiveAPIDemo",
"description": "New warehouse with case-insensitive collation",
"creationPayload": {
"defaultCollation": "Latin1_General_100_CI_AS_KS_WS_SC_UTF8"
}
}
2. Input the request details in the file body. Note that there should be a blank line
between the headers and the body, placed after the Authorization line.
JSON
POST
https://api.fabric.microsoft.com/v1/workspaces/<workspaceID>/items
HTTP/1.1
Content-Type: application/json
Authorization: Bearer <bearer token>
{
"type": "Warehouse",
"displayName": "<Warehouse name here>",
"description": "<Warehouse description here>",
"creationPayload": {
"defaultCollation": "Latin1_General_100_CI_AS_KS_WS_SC_UTF8"
}
}
<workspaceID> : Find the workspace GUID in the URL, after the /groups/ section.
4. Select the Send Request link displayed over your POST command in the VS Code
editor.
5. You should receive a response with the status code 202 Accepted, along with
additional details about your POST request.
7. Execute the following T-SQL statement in the Query editor to confirm that the
collation for your warehouse aligns with what you specified in the JSON above:
SQL
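SELECT name, collation_name
FROM sys.databases;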
Related content
Create a Warehouse in Microsoft Fabric
Tables in data warehousing in Microsoft Fabric
Data types in Microsoft Fabric
This article describes how to get started with the sample Warehouse using the Microsoft
Fabric portal, including creation and consumption of the warehouse.
2. Provide the name for your sample warehouse and select Create.
3. The create action creates a new Warehouse and starts loading sample data into it.
The data loading takes a few seconds to complete.
4. On completion of loading sample data, the warehouse opens with data loaded into
tables and views to query.
Sample scripts
Your new warehouse is ready to accept T-SQL queries. The following sample T-SQL
scripts can be used on the sample data in your new warehouse.
Note
It is important to note that much of the functionality described in this section is
also available to users via a TDS end-point connection and tools such as SQL
Server Management Studio (SSMS) or Azure Data Studio (for users who prefer to
use T-SQL for the majority of their data processing needs). For more information,
see Connectivity or Query a warehouse.
SQL
/*************************************************
Get number of trips performed by each medallion
**************************************************/
SELECT
M.MedallionID
,M.MedallionCode
,COUNT(T.TripDistanceMiles) AS TotalTripCount
FROM
dbo.Trip AS T
JOIN
dbo.Medallion AS M
ON
T.MedallionID=M.MedallionID
GROUP BY
M.MedallionID
,M.MedallionCode
/****************************************************
How many passengers are being picked up on each trip?
*****************************************************/
SELECT
PassengerCount,
COUNT(*) AS CountOfTrips
FROM
dbo.Trip
WHERE
PassengerCount > 0
GROUP BY
PassengerCount
ORDER BY
PassengerCount
/*********************************************************************************
What is the distribution of trips by hour on working days (non-holiday weekdays)?
**********************************************************************************/
SELECT
ti.HourlyBucket,
COUNT(*) AS CountOfTrips
FROM dbo.Trip AS tr
INNER JOIN dbo.Date AS d
ON tr.DateID = d.DateID
INNER JOIN dbo.Time AS ti
ON tr.PickupTimeID = ti.TimeID
WHERE
d.IsWeekday = 1
AND d.IsHolidayUSA = 0
GROUP BY
ti.HourlyBucket
ORDER BY
ti.HourlyBucket
Related content
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
Warehouse settings and context menus
If the first run's performance is crucial, try manually creating statistics. Review the
statistics article to better understand the role of statistics and for guidance on how
to create manual statistics to improve your query performance. However, if the first
run's performance is not critical, you can rely on automatic statistics that will be
generated in the first query and will continue to be leveraged in subsequent runs
(so long as underlying data does not change significantly).
Fabric administrators can access the Capacity Utilization and Metrics report for
up-to-date information tracking the utilization of capacity that includes Warehouse.
Use dynamic management views (DMVs) to
monitor query execution
You can use dynamic management views (DMVs) to monitor connection, session, and
request status in the Warehouse.
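For example, a query along the following lines joins the request and session DMVs to list
currently running requests; the columns shown are only a small subset of what the DMVs
return:
SQL
-- List live requests along with the session that submitted them.
SELECT
    r.session_id,
    r.status,
    r.command,
    r.start_time,
    s.login_name
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s
    ON r.session_id = s.session_id;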
Statistics
The Warehouse uses a query engine to create an execution plan for a given SQL query.
When you submit a query, the query optimizer tries to enumerate all possible plans and
choose the most efficient candidate. To determine which plan would require the least
overhead, the engine needs to be able to evaluate the amount of work or rows that
might be processed by each operator. Then, based on each plan's cost, it chooses the
one with the least amount of estimated work. Statistics are objects that contain relevant
information about your data, to allow the query optimizer to estimate these costs.
You can also manually update statistics after each data load or data update to assure
that the best query plan can be built.
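For example, you might create and then refresh a single-column statistics object after a
large load. The sketch below uses the dbo.Trip table and TripDistanceMiles column from
the sample warehouse purely for illustration:
SQL
-- Create a single-column statistics object with a full scan of the data.
CREATE STATISTICS TripDistanceMiles_stats
ON dbo.Trip (TripDistanceMiles) WITH FULLSCAN;

-- Refresh the same statistics object after the next large data load.
UPDATE STATISTICS dbo.Trip (TripDistanceMiles_stats) WITH FULLSCAN;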
For more information about statistics and how you can augment the automatically
created statistics, see Statistics in Fabric data warehousing.
COPY (Transact-SQL)
Data pipelines
Dataflows
Cross-warehouse ingestion
To help determine which option is best for you and to review some data ingestion best
practices, review Ingest data.
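As a minimal sketch of the COPY option, the statement below loads Parquet files into
the sample dbo.Trip table; the storage URL and SAS token are placeholders that you
would replace with your own, and the credential clause can be omitted for public
storage:
SQL
COPY INTO dbo.Trip
FROM 'https://<storage-account>.blob.core.windows.net/<container>/<folder>/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>')
);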
For guidance on how to handle these trickle-load scenarios, see Best practices for
ingesting data.
Consider using CTAS (Transact-SQL) to write the data you want to keep in a table rather
than using DELETE. If a CTAS takes the same amount of time, it's safer to run since it has
minimal transaction logging and can be canceled quickly if needed.
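For example, instead of deleting older rows, a CTAS like the following writes only the
rows you want to keep into a new table; the new table name and the filter are illustrative
and based on the sample warehouse:
SQL
-- Keep only the rows of interest rather than deleting the rest.
CREATE TABLE dbo.Trip_curated
AS
SELECT *
FROM dbo.Trip
WHERE PassengerCount > 0;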
Use integer-based data types if possible. SORT, JOIN, and GROUP BY operations
complete faster on integers than on character data.
For supported data types and more information, see data types.
Data Compaction
Data compaction consolidates smaller Parquet files into fewer, larger files, which
optimizes read operations. This process also helps in efficiently managing deleted rows
by eliminating them from immutable Parquet files. The data compaction process
involves re-writing tables or segments of tables into new Parquet files that are optimized
for performance. For more information, see Blog: Automatic Data Compaction for Fabric
Warehouse .
The data compaction process is seamlessly integrated into the warehouse. As queries
are executed, the system identifies tables that could benefit from compaction and
performs necessary evaluations. There is no manual way to trigger data compaction.
Related content
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
Limitations
Troubleshoot the Warehouse
Data types
T-SQL surface area
Tables in data warehouse
Caching in Fabric data warehousing
The SQL analytics endpoint enables you to query data in the lakehouse using T-SQL
language and TDS protocol. Every lakehouse has one SQL analytics endpoint. The
number of SQL analytics endpoints in a workspace matches the number of lakehouses
and mirrored databases provisioned in that one workspace.
A background process is responsible for scanning the lakehouse for changes, and
keeping the SQL analytics endpoint up to date for all the changes committed to
lakehouses in a workspace. The sync process is transparently managed by the Microsoft
Fabric platform. When a change is detected in a lakehouse, a background process
updates metadata and the SQL analytics endpoint reflects the changes committed to
lakehouse tables. Under normal operating conditions, the lag between a lakehouse and
SQL analytics endpoint is less than one minute. The actual length of time can vary from
a few seconds to minutes depending on a number of factors that are discussed in this
article.
For every Delta table in your Lakehouse, the SQL analytics endpoint automatically
generates a table in the appropriate schema. For autogenerated schema data types for
the SQL analytics endpoint, see Data types in Microsoft Fabric.
Tables in the SQL analytics endpoint are created with a minor delay. Once you create or
update Delta Lake table in the lake, the SQL analytics endpoint table that references the
Delta lake table will be created/refreshed automatically.
The amount of time it takes to refresh the table is related to how optimized the Delta
tables are. For more information, review Delta Lake table optimization and V-Order to
learn more about key scenarios, and an in-depth guide on how to efficiently maintain
Delta tables for maximum performance.
You can manually force a refresh of the automatic metadata scanning in the Fabric
portal. On the page for the SQL analytics endpoint, select the Refresh button in the
Explorer toolbar to refresh the schema. Go to Query your SQL analytics endpoint, and
look for the refresh button, as shown in the following image.
Guidance
Automatic metadata discovery tracks changes committed to lakehouses, and is a
single instance per Fabric workspace. If you are observing increased latency for
changes to sync between lakehouses and the SQL analytics endpoint, it could be due
to a large number of lakehouses in one workspace. In such a scenario, consider
migrating each lakehouse to a separate workspace, as this allows automatic
metadata discovery to scale.
Parquet files are immutable by design. When there's an update or a delete
operation, a Delta table will add new parquet files with the changeset, increasing
the number of files over time, depending on the frequency of updates and deletes. If
there's no maintenance scheduled, eventually this pattern creates a read overhead
and impacts the time it takes to sync changes to the SQL analytics endpoint. To
address this, schedule regular lakehouse table maintenance operations.
In some scenarios, you might observe that changes committed to a lakehouse are
not visible in the associated SQL analytics endpoint. For example, you might have
created a new table in lakehouse, but it's not listed in the SQL analytics endpoint.
Or, you might have committed a large number of rows to a table in a lakehouse
but this data is not visible in SQL analytics endpoint. We recommend initiating an
on-demand metadata sync, triggered from the SQL query editor Refresh ribbon
option. This option forces an on-demand metadata sync, rather than waiting on
the background metadata sync to finish.
Not all Delta features are understood by the automatic sync process. For more
information on the functionality supported by each engine in Fabric, see Delta Lake
Interoperability.
If there is an extremely large volume of table changes during Extract, Transform,
and Load (ETL) processing, an expected delay could occur until all the changes are
processed.
A column with high cardinality (mostly or entirely made of unique values) results in
a large number of partitions. A large number of partitions negatively impacts
performance of the metadata discovery scan for changes. If the cardinality of a
column is high, choose another column for partitioning.
The size of each partition can also affect performance. Our recommendation is to
use a column that would result in a partition of at least (or close to) 1 GB. We
recommend following best practices for Delta table maintenance and optimization.
For a Python script to evaluate partitions, see Sample script for partition details.
A large volume of small-sized parquet files increases the time it takes to sync changes
between a lakehouse and its associated SQL analytics endpoint. You might end up with a
large number of parquet files in a delta table for one or more reasons:
If you choose a partition column for a delta table with a high number of unique
values, it's partitioned by each unique value and might be over-partitioned. Choose
a partition column that doesn't have a high cardinality, and results in an individual
partition size of at least 1 GB.
Batch and streaming data ingestion rates might also result in small files depending
on the frequency and size of changes being written to a lakehouse. For example,
there might be a small volume of changes coming through to the lakehouse, which
would result in small parquet files. To address this, we recommend implementing
regular lakehouse table maintenance.
1. First, provide the ABFSS path for your delta table in the variable
delta_table_path .
You can get the ABFSS path of a delta table from the Fabric portal Explorer.
Right-click on the table name, then select COPY PATH from the list of options.
Python
# Purpose: Print out details of partitions, files per partition, and size per partition in GB.
from notebookutils import mssparkutils

# Define the ABFSS path for your delta table. You can get the ABFSS path of a delta table
# by right-clicking the table name and selecting COPY PATH from the list of options.
delta_table_path = "abfss://<workspace id>@<onelake>.dfs.fabric.microsoft.com/<lakehouse id>/Tables/<tablename>"

# Collect file count and total size for every partition folder (<column>=<value>) under the table path.
partition_details = {}
for item in mssparkutils.fs.ls(delta_table_path):
    if not item.isDir or "=" not in item.name:
        continue  # skip files and non-partition folders such as _delta_log
    files = [f for f in mssparkutils.fs.ls(item.path) if f.isFile]
    total_size = sum(f.size for f in files)
    file_count = len(files)
    partition_details[item.name] = {
        "size_bytes": total_size,
        "file_count": file_count
    }

# Print each partition with its file count and size in GB.
for partition_name, details in partition_details.items():
    print(f"{partition_name}: {details['file_count']} files, {details['size_bytes'] / (1024 ** 3):.2f} GB")
Related content
Better together: the lakehouse and warehouse
Synapse Data Warehouse in Microsoft Fabric performance guidelines
Limitations of the SQL analytics endpoint
This article details key concepts for designing tables in Microsoft Fabric.
In Warehouse, tables are database objects that contain all the transactional data.
Dimension tables contain attribute data that might change but usually changes
infrequently. For example, a customer's name and address are stored in a
dimension table and updated only when the customer's profile changes. To
minimize the size of a large fact table, the customer's name and address don't
need to be in every row of a fact table. Instead, the fact table and the dimension
table can share a customer ID. A query can join the two tables to associate a
customer's profile and transactions.
Integration tables provide a place for integrating or staging data. For example,
you can load data to a staging table, perform transformations on the data in
staging, and then insert the data into a production table.
A table stores data in OneLake as part of the Warehouse. The table and the data persist
whether or not a session is open.
Tables in the Warehouse
To show the organization of the tables, you could use fact , dim , or int as prefixes to
the table names. The following table shows some of the schema and table names for
WideWorldImportersDW sample data warehouse.
WideWorldImportersDW Source Table Name Table Type Data Warehouse Table Name
Create a table
For Warehouse, you can create a table as a new empty table. You can also create and
populate a table with the results of a select statement. The following are the T-SQL
commands for creating a table.
CREATE TABLE: Creates an empty table by defining all the table columns and options.
CREATE TABLE AS SELECT: Populates a new table with the results of a select statement.
The table columns and data types are based on the select statement results. To import
data, this statement can select from an external table.
SQL
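-- Illustrative only: the table, schema, and column names below are hypothetical.
CREATE TABLE dbo.MyTable
(
    Col1 INT NOT NULL,
    Col2 VARCHAR(50) NULL
);

-- Create and populate a second table from the results of a SELECT (CTAS).
CREATE TABLE dbo.MyTableCopy
AS
SELECT Col1, Col2
FROM dbo.MyTable;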
Schema names
Warehouse supports the creation of custom schemas. Like in SQL Server, schemas are a
good way to group together objects that are used in a similar fashion. The following
code creates a user-defined schema called wwi .
SQL
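CREATE SCHEMA wwi;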
Data types
Microsoft Fabric supports the most commonly used T-SQL data types.
For more about data types, see Data types in Microsoft Fabric.
When you create a table in Warehouse, review the data types reference in CREATE
TABLE (Transact-SQL).
For a guide to create a table in Warehouse, see Create tables.
Collation
Latin1_General_100_BIN2_UTF8 is the default collation for both tables and metadata.
Latin1_General_100_BIN2_UTF8 (default)
Latin1_General_100_CI_AS_KS_WS_SC_UTF8
Once the collation is set during database creation, all subsequent objects (tables,
columns, etc.) will inherit this default collation.
Statistics
The query optimizer uses column-level statistics when it creates the plan for executing a
query. To improve query performance, it's important to have statistics on individual
columns, especially columns used in query joins. Warehouse supports automatic
creation of statistics.
If data is coming from multiple data stores, you can port the data into the data
warehouse and store it in an integration table. Once data is in the integration table, you
can use the power of data warehouse to implement transformation operations. Once
the data is prepared, you can insert it into production tables.
Limitations
Warehouse supports many, but not all, of the table features offered by other databases.
The following list shows some of the table features that aren't currently supported.
Important
There are limitations with adding table constraints or columns when using Source
Control with Warehouse.
Related content
What is data warehousing in Microsoft Fabric?
What is data engineering in Microsoft Fabric?
Create a Warehouse
Query a warehouse
OneLake overview
Create tables in Warehouse
Transactions and modify tables
Tables in Microsoft Fabric support the most commonly used T-SQL data types.
** The uniqueidentifier data type is a T-SQL data type without a matching data type in
Delta Parquet. As a result, it's stored as a binary type. Warehouse supports storing and
reading uniqueidentifier columns, but these values can't be read on the SQL analytics
endpoint. Reading uniqueidentifier values in the lakehouse displays a binary
representation of the original values. As a result, features such as cross-joins between
Warehouse and SQL analytics endpoint using a uniqueidentifier column don't work as
expected.
*** Support for varchar (max) and varbinary (max) is currently in preview.
For more information about the supported data types including their precisions, see
data types in CREATE TABLE reference.
The following unsupported data types have recommended alternatives:
money and smallmoney: Use decimal; however, note that it can't store the monetary
unit.
datetimeoffset: Use datetime2; however, you can use datetimeoffset for converting data
with CAST and the AT TIME ZONE (Transact-SQL) function. For an example, see
datetimeoffset.
nchar and nvarchar: Use char and varchar respectively, as there's no similar unicode data
type in Parquet. The char and varchar types in a UTF-8 collation might use more storage
than nchar and nvarchar to store unicode data. To understand the impact on your
environment, see Storage differences between UTF-8 and UTF-16.
geography: No equivalent.
Unsupported data types can still be used in T-SQL code for variables, or any in-memory
use in session. Creating tables or views that persist data on disk with any of these types
isn't allowed.
The rules for mapping original Delta types to the SQL types in the SQL analytics endpoint
are as follows:
DOUBLE: float
DATE: date
TIMESTAMP: datetime2
BINARY: varbinary(n)
The columns that have the types that aren't listed in the table aren't represented as the
table columns in the SQL analytics endpoint.
Related content
T-SQL Surface Area in Microsoft Fabric
This article covers the T-SQL language syntax capabilities of Microsoft Fabric, when
querying the SQL analytics endpoint or Warehouse.
These limitations apply only to Warehouse and SQL analytics endpoint items in Fabric
Synapse Data Warehouse. For limitations of SQL Database in Fabric, see Limitations in
SQL Database in Microsoft Fabric (Preview).
Limitations
At this time, the following commands are NOT supported. Don't try to use these
commands; even though they might appear to succeed, they could cause issues in
your warehouse.
CREATE ROLE
CREATE USER
Hints
IDENTITY Columns
Manually created multi-column stats
Materialized views
MERGE
OPENROWSET
PREDICT
Temporary tables
Triggers
Related content
Query insights in Fabric data warehousing
What is data warehousing in Microsoft Fabric?
Data types in Microsoft Fabric
Limitations in Microsoft Fabric
Learn about table constraints in SQL analytics endpoint and Warehouse in Microsoft
Fabric, including the primary key, foreign keys, and unique keys.
Important
To add or remove primary key, foreign key, or unique constraints, use ALTER TABLE.
These cannot be created inline within a CREATE TABLE statement.
Table constraints
SQL analytics endpoint and Warehouse in Microsoft Fabric support these table
constraints:
PRIMARY KEY is only supported when NONCLUSTERED and NOT ENFORCED are
both used.
FOREIGN KEY is only supported when NOT ENFORCED is used.
UNIQUE constraint is only supported when NONCLUSTERED and NOT ENFORCED
are both used.
SQL analytics endpoint and Warehouse don't support default constraints at this
time.
For more information on tables, see Tables in data warehousing in Microsoft Fabric.
Important
There are limitations with adding table constraints or columns when using Source
Control with Warehouse.
Examples
Create a Microsoft Fabric Warehouse table with a primary key:
SQL
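-- Illustrative only: the table and column names are hypothetical.
-- The constraint must be added with ALTER TABLE, and must be NONCLUSTERED and NOT ENFORCED.
CREATE TABLE dbo.PrimaryKeyTable
(
    c1 INT NOT NULL,
    c2 INT
);

ALTER TABLE dbo.PrimaryKeyTable
ADD CONSTRAINT PK_PrimaryKeyTable PRIMARY KEY NONCLUSTERED (c1) NOT ENFORCED;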
Related content
Design tables in Warehouse in Microsoft Fabric
Data types in Microsoft Fabric
What is data warehousing in Microsoft Fabric?
What is data engineering in Microsoft Fabric?
Warehouse in Microsoft Fabric
Create a Warehouse
Query a warehouse
It's a common requirement in data warehouses to assign a unique identifier to each row
of a table. In SQL Server-based environments, this is typically done by creating an
identity column in a table; however, this feature currently isn't supported in a warehouse
in Microsoft Fabric. Instead, you need to use a workaround technique. We present two
alternatives.
Method 1
This method is most applicable when you need to create identity values, but the order
of the values isn't important (nonsequential values are acceptable).
Unique values are generated in the code that inserts data into the table.
1. To create unique data using this method, create a table that includes a column that
stores unique identifier values. The column data type should be set to bigint. You
should also define the column as NOT NULL to ensure that every row is assigned an
identifier.
SQL
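-- Illustrative only: Row_ID stores the unique identifier values; the remaining columns are hypothetical.
CREATE TABLE dbo.Orders_with_Identifier
(
    Row_ID BIGINT NOT NULL,
    OrderID INT NOT NULL,
    OrderDate DATE NOT NULL
);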
2. When you insert rows into the table, via T-SQL scripts or application code or
otherwise, generate unique data for Row_ID with the NEWID() function. This
function generates a unique value of type uniqueidentifier which can then be cast
and stored as a bigint.
The following code inserts rows into the dbo.Orders_with_Identifier table. The
values for the Row_ID column are computed by converting the values returned by
the newid() function. The function doesn't require an ORDER BY clause and
generates a new value for each record.
SQL
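-- Illustrative only: dbo.Orders_staging and the non-key columns are hypothetical.
INSERT INTO dbo.Orders_with_Identifier (Row_ID, OrderID, OrderDate)
SELECT
    -- Cast the GUID returned by NEWID() to binary, then store it as a bigint.
    CONVERT(BIGINT, CONVERT(VARBINARY(16), NEWID())) AS Row_ID,
    OrderID,
    OrderDate
FROM dbo.Orders_staging;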
Method 2
This method is most applicable when you need to create sequential identity values but
should be used with caution on larger datasets as it can be slower than alternative
methods. Considerations should also be made for multiple processes inserting data
simultaneously as this could lead to duplicate values.
1. To create unique data using this method, create a table that includes a column that
stores unique identifier values. The column data type should be set to int or bigint,
depending on the volume of data you expect to store. You should also define the
column as NOT NULL to ensure that every row is assigned an identifier, and you can
optionally define the column as a unique key.
SQL
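-- Illustrative only: same structure as in method 1; the non-key columns are hypothetical.
CREATE TABLE dbo.Orders_with_Identifier
(
    Row_ID BIGINT NOT NULL,
    OrderID INT NOT NULL,
    OrderDate DATE NOT NULL
);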
2. Before you insert rows into the table, you need to determine the last identifier
value stored in the table. You can do that by retrieving the maximum identifier
value. This value should be assigned to a variable so you can refer to it when you
insert table rows (in the next step).
The following code assigns the last identifier value to a variable named @MaxID .
SQL
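-- Capture the highest identifier currently stored in the table (0 when the table is empty).
DECLARE @MaxID AS BIGINT;

SELECT @MaxID = ISNULL(MAX(Row_ID), 0)
FROM dbo.Orders_with_Identifier;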
The following T-SQL code—which is run in the same batch as the script in step 2—
inserts rows into the Orders_with_Identifier table. The values for the Row_ID
column are computed by adding the @MaxID variable to values returned by the
ROW_NUMBER function. The function must have an ORDER BY clause, which defines
the logical order of the rows within the result set. However when set to SELECT
NULL , no logical order is imposed, meaning identifier values are arbitrarily assigned.
SQL
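-- Run in the same batch as the previous step.
-- Illustrative only: dbo.Orders_staging and the non-key columns are hypothetical.
INSERT INTO dbo.Orders_with_Identifier (Row_ID, OrderID, OrderDate)
SELECT
    @MaxID + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Row_ID,
    OrderID,
    OrderDate
FROM dbo.Orders_staging;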
Related content
Design tables in Warehouse in Microsoft Fabric
Data types in Microsoft Fabric
ROW_NUMBER (Transact-SQL)
SELECT - OVER Clause (Transact-SQL)
Similar to their behavior in SQL Server, transactions allow you to control the commit or
rollback of read and write queries.
You can modify data that is stored in tables in a Warehouse using transactions to group
changes together.
For example, you could commit inserts to multiple tables, or none of the tables if
an error arises. If you're changing details about a purchase order that affects three
tables, you can group those changes into a single transaction. That means when
those tables are queried, they either all have the changes or none of them do.
Transactions are a common practice when you need to ensure your data is
consistent across multiple tables.
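A minimal sketch of that purchase order scenario, using hypothetical table and column
names, looks like the following; all three statements commit together or roll back
together:
SQL
BEGIN TRANSACTION;

UPDATE dbo.PurchaseOrderHeader
SET OrderStatus = 'Shipped'
WHERE PurchaseOrderID = 1023;

UPDATE dbo.PurchaseOrderDetail
SET LineStatus = 'Shipped'
WHERE PurchaseOrderID = 1023;

INSERT INTO dbo.ShipmentLog (PurchaseOrderID, ShippedDate)
VALUES (1023, '2024-07-01');

COMMIT TRANSACTION;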
Transactional capabilities
The same transactional capabilities are supported in the SQL analytics endpoint in
Microsoft Fabric, but for read-only queries.
Transactions can also be used for sequential SELECT statements to ensure the tables
involved all have data from the same point in time. As an example, if a table has new
rows added by another transaction, the new rows don't affect the SELECT queries inside
an open transaction.
Important
Only the snapshot isolation level is supported in Microsoft Fabric. If you use T-SQL
to change your isolation level, the change is ignored at Query Execution time and
snapshot isolation is applied.
These locks prevent conflicts such as a table's schema being changed while rows are
being updated in a transaction.
You can query locks currently held with the dynamic management view (DMV)
sys.dm_tran_locks.
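For example, the following query lists the locks currently held, one row per lock request:
SQL
SELECT
    request_session_id,
    resource_type,
    request_mode,
    request_status
FROM sys.dm_tran_locks;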
Conflicts from two or more concurrent transactions that update one or more rows in a
table are evaluated at the end of the transaction. The first transaction to commit
completes successfully and the other transactions are rolled back with an error returned.
These conflicts are evaluated at the table level and not the individual parquet file level.
INSERT statements always create new parquet files, which means fewer conflicts with
other transactions except for DDL because the table's schema could be changing.
Transaction logging
Transaction logging in Warehouse in Microsoft Fabric is at the parquet file level because
parquet files are immutable (they can't be changed). A rollback results in pointing back
to the previous parquet files. The benefits of this change are that transaction logging
and rollbacks are faster.
Limitations
Distributed transactions are not supported.
Save points are not supported.
Named transactions are not supported.
Marked transactions are not supported.
ALTER TABLE is not supported within an explicit transaction.
At this time, there's limited T-SQL functionality in the warehouse. See TSQL surface
area for a list of T-SQL commands that are currently not available.
If a transaction has data insertion into an empty table and issues a SELECT before
rolling back, the automatically generated statistics can still reflect the uncommitted
data, causing inaccurate statistics. Inaccurate statistics can lead to unoptimized
query plans and execution times. If you roll back a transaction with SELECTs after a
large INSERT, update statistics for the columns mentioned in your SELECT.
Related content
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
Tables in Warehouse
Settings are accessible from the context menu or from the Settings icon in the ribbon
when you open the item. There are some key differences in the actions you can take in
settings depending on if you're interacting with the SQL analytics endpoint or a data
warehouse.
Settings options
This section describes and explains the settings options available based on the item
you're working with and its description.
Last modified by: Name of the user who modified the warehouse recently.
SQL connection string: The SQL connection string for the workspace. You can use the
SQL connection string to create a connection to the warehouse using various tools, such
as SSMS/Azure Data Studio.
The following settings are available for the default Power BI semantic model.
Query caching: Turn on or off caching query results to speed up reports by using
previously saved query results.
Server settings: The XMLA connection string of the default semantic model.
Endorsement and discovery: Endorse the default semantic model independently from
the warehouse and make it discoverable in your org.
Context menus
Applies to: ✅ Warehouse in Microsoft Fabric
Warehouse offers an easy experience to create reports and access supported actions
using its context menus.
Share: Lets users share the warehouse to build content based on the underlying default
Power BI semantic model, query data using SQL, or get access to underlying data files.
Shares the warehouse access (SQL connections only, and the autogenerated semantic
model) with other users in your organization. Users receive an email with links to access
the detail page, where they can find the SQL connection string and can access the
default semantic model to create reports based on it.
Explore this data (preview): Create an exploration to quickly visualize and analyze your
data.
Analyze in Excel: Uses the existing Analyze in Excel capability on the default Power BI
semantic model.
Favorite: Mark specific items to quickly access them from your favorites list.
Rename: Updates the warehouse with the new name. Does not apply to the SQL analytics
endpoint of the Lakehouse.
Delete: Delete the warehouse from the workspace. A confirmation dialog notifies you of
the impact of the delete action. If the Delete action is confirmed, then the warehouse
and related downstream items are deleted. Does not apply to the SQL analytics
endpoint of the Lakehouse.
Manage permissions: Enables users to add other recipients with specified permissions,
similar to allowing the sharing of an underlying semantic model or allowing them to
build content with the data associated with the underlying semantic model.
Copy SQL connection string: Copy the SQL connection string associated with the items
in a specific workspace.
Query activity: Access the query activity view to monitor your running and completed
queries in this warehouse.
View workspace lineage: Shows the end-to-end lineage of all items in this workspace,
from the data sources to the warehouse, the default Power BI semantic model, and other
semantic models (if any) that were built on top of the warehouse, all the way to reports,
dashboards, and apps.
View item lineage: Shows the end-to-end lineage of the specific warehouse selected,
from the data sources to the warehouse, the default Power BI semantic model, and other
semantic models (if any) that were built on top of the specific warehouse, all the way to
reports, dashboards, and apps.
Related content
Warehouse in Microsoft Fabric
Model data in the default Power BI semantic model in Microsoft Fabric
Create reports in the Power BI service
Admin portal
Source control with Warehouse
(preview)
Article • 07/19/2024
This article explains how Git integration and deployment pipelines work for warehouses
in Microsoft Fabric. Learn how to set up a connection to your repository, manage your
warehouses, and deploy them across different environments. Source control for Fabric
Warehouse is currently a preview feature.
You can use both Git integration and Deployment pipelines for different scenarios:
Use Git and SQL database projects to manage incremental change, team
collaboration, commit history in individual database objects.
Use deployment pipelines to promote code changes to different pre-production
and production environments.
Git integration
Git integration in Microsoft Fabric enables developers to integrate their development
processes, tools, and best practices directly into the Fabric platform. To connect your warehouse to a Git repository:
1. Set up the connection. See Get started with Git integration and follow the instructions in Connect to a Git repo, using either Azure DevOps or GitHub as your Git provider.
2. Once connected, your items, including warehouses, appear in the Source
control panel.
3. After you successfully connect the warehouse instances to the Git repo, you see
the warehouse folder structure in the repo. You can now perform source control operations, like creating a pull request.
When you commit the warehouse item to the Git repo, the warehouse is converted to a
source code format, as a SQL database project. A SQL project is a local representation of
SQL objects that comprise the schema for a single database, such as tables, stored
procedures, or functions. The folder structure of the database objects is organized by
Schema/Object Type. Each object in the warehouse is represented with a .sql file that
contains its data definition language (DDL) definition. Warehouse table data and SQL
security features are not included in the SQL database project.
Shared queries are also committed to the repo and inherit the name that they are saved
as.
To download a local copy of your warehouse's schema, select Download SQL database
project in the ribbon.
The downloaded copy of the database project contains the definition of the warehouse schema. Among other uses, the database project can be used to publish the schema to another warehouse:
3. Select the .zip file that was downloaded from the existing warehouse.
4. The warehouse schema is published to the new warehouse.
Deployment pipelines
You can also use deployment pipelines to deploy your warehouse code across different
environments, such as development, test, and production. Deployment pipelines don't
expose a database project.
Use the following steps to complete your warehouse deployment using the deployment
pipeline.
4. Select Deploy to deploy your warehouses across the Development, Test, and
Production stages.
For more information about the Fabric deployment pipelines process, see Overview of
Fabric deployment pipelines.
Modify the new table definition with new constraints or columns, as desired,
using ALTER TABLE .
Delete the old table.
Rename the new table to the name of the old table using sp_rename.
Modify the definition of the old table in the SQL database project in the exact
same way. The SQL database project of the warehouse in source control and the
live warehouse should now match.
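The following T-SQL sketch illustrates the last three steps, assuming a hypothetical dbo.Sales table whose replacement copy, dbo.Sales_New, was already created and loaded (for example, with CREATE TABLE AS SELECT). The object names and the added constraint are illustrative only.
SQL
--Add the new (unenforced) constraint to the replacement table.
ALTER TABLE [dbo].[Sales_New]
ADD CONSTRAINT [PK_Sales_New] PRIMARY KEY NONCLUSTERED ([SalesKey]) NOT ENFORCED;

--Remove the old table.
DROP TABLE [dbo].[Sales];

--Rename the replacement table to the original name.
EXEC sp_rename 'dbo.Sales_New', 'Sales';
Remember to make the matching change in the SQL database project so that source control and the live warehouse stay in sync.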
Currently, do not create a Dataflow Gen2 with an output destination to the
warehouse. Committing and updating from Git would be blocked by a new item
named DataflowsStagingWarehouse that appears in the repository.
SQL analytics endpoint is not supported with Git integration.
Currently, if you use ALTER TABLE to add a constraint or column in the database
project, the table will be dropped and recreated when deploying, resulting in data
loss.
Currently, do not create a Dataflow Gen2 with an output destination to the
warehouse. Deployment would be blocked by a new item named
DataflowsStagingWarehouse that appears in the deployment pipeline.
Related content
Get started with Git integration (preview)
Basic concepts in Git integration
What is lifecycle management in Microsoft Fabric?
Tutorial: Set up dbt for Fabric Data Warehouse
This article describes a query technique to summarize fact table data by using ranges of
fact or dimension table attributes. For example, you might need to determine sales
quantities by sale price. However, instead of grouping by each sale price, you want to
group by range bands of price, like:
$0.00 to $999.99
$1,000.00 to $4,999.99
and others…
Tip
SQL
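--The table definition that originally followed this label isn't shown here.
--The following is a reconstructed sketch of a range band helper table, based
--on the columns used later in this article (Series, RangeLabel, LowerBound,
--UpperBound); the table name and data types are assumptions.
CREATE TABLE [d_RangeBand]
(
    [Series] VARCHAR(20) NOT NULL,
    [RangeLabel] VARCHAR(50) NOT NULL,
    [LowerBound] DECIMAL(19, 4) NOT NULL,
    [UpperBound] DECIMAL(19, 4) NOT NULL
);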
7 Note
Technically, this table isn't a dimension table. It's a helper table that organizes fact
or dimension data for analysis.
You should consider creating a composite primary key or unique constraint based on
the Series and RangeLabel columns to ensure that duplicate ranges within a series can't
be created. You should also verify that the lower and upper boundary values don't
overlap and that there aren't any gaps.
Tip
You can add a RangeLabelSort column with an int data type if you need to control
the sort order of the range bands. This column will help you present the range
bands in a meaningful way, especially when the range label text values don't sort in
a logical order.
ノ Expand table
Series RangeLabel LowerBound UpperBound
Age 0 to 19 years 0 20
Age 20 to 39 years 20 40
Age 40 to 59 years 40 60
The following example queries the f_Sales fact table by joining it with the d_RangeBand
table and sums the fact quantity values. It filters the d_RangeBand table by the Price
series, and groups by the range labels.
SQL
SELECT
[r].[RangeLabel],
SUM([s].[Quantity]) AS [Quantity]
FROM
[d_RangeBand] AS [r],
[f_Sales] AS [s]
WHERE
[r].[Series] = 'Price'
AND [s].[UnitPrice] >= [r].[LowerBound]
AND [s].[UnitPrice] < [r].[UpperBound]
GROUP BY
[r].[RangeLabel];
) Important
Pay close attention to the logical operators used to determine the matching range
band in the WHERE clause. In this example, the lower boundary value is inclusive and
the upper boundary is exclusive. That way, there won't be any overlap of ranges or
gaps between ranges. The appropriate operators will depend on the boundary
values you store in your range band table.
Related content
Dimensional modeling in Microsoft Fabric Warehouse
Design tables in Warehouse in Microsoft Fabric
Data types in Microsoft Fabric
This article is the first in a series about dimensional modeling inside a warehouse. It
provides practical guidance for Warehouse in Microsoft Fabric, which is an experience
that supports many T-SQL capabilities, like creating tables and managing data in tables.
So, you're in complete control of creating your dimensional model tables and loading
them with data.
7 Note
In this article, the term data warehouse refers to an enterprise data warehouse,
which delivers comprehensive integration of critical data across the organization. In
contrast, the standalone term warehouse refers to a Fabric Warehouse, which is a
software as a service (SaaS) relational database offering that you can use to
implement a data warehouse. For clarity, in this article the latter is referred to as
Fabric Warehouse.
Tip
Dimension tables describe the entities relevant to your organization and analytics
requirements. Broadly, they represent the things that you model. Things could be
products, people, places, or any other concept, including date and time. For more
information and design best practices, see Dimension tables in this series.
Fact tables store measurements associated with observations or events. They can
store sales orders, stock balances, exchange rates, temperature readings, and
more. Fact tables contain dimension keys together with granular values that can be
aggregated. For more information and design best practices, see Fact tables in this
series.
A star schema design is optimized for analytic query workloads. For this reason, it's
considered a prerequisite for enterprise Power BI semantic models. Analytic queries are
concerned with filtering, grouping, sorting, and summarizing data. Fact data is
summarized within the context of filters and groupings of the related dimension tables.
It's called a star schema because a fact table forms the center of a star while the
related dimension tables form the points of the star.
A star schema often contains multiple fact tables, and therefore multiple stars.
However, in specific circumstances, a dimensional model in a data warehouse might not be the best approach. For example, self-
service analysts who need freedom and agility to act quickly, and without dependency
on IT, might create semantic models that connect directly to source data. In such cases,
the theory of dimensional modeling is still relevant. That theory helps analysts create
intuitive and efficient models, while avoiding the need to create and load a dimensional
model in a data warehouse. Instead, a quasi-dimensional model can be created by using
Power Query, which defines the logic to connect to, and transform, source data to create
and load the semantic model tables. For more information, see Understand star schema
and the importance for Power BI.
) Important
When you use Power Query to define a dimensional model in the semantic model,
you aren't able to manage historical change, which might be necessary to analyze
the past accurately. If that's a requirement, you should create a data warehouse and
allow periodic ETL processes to capture and appropriately store dimension
changes.
To this end, your data warehouse should strive to store quality, conformed, and
historically accurate data as a single version of the truth. It should deliver understandable
and navigable data with fast performance, and enforce permissions so that the right
data can only ever be accessed by the right people. Strive to design your data
warehouse for resilience, allowing it to adapt to change as your requirements evolve.
Tip
We recommend that you build out your enterprise data warehouse iteratively. Start
with the most important subject areas first, and then over time, according to
priority and resources, extend the data warehouse with other subject areas.
Related content
In the next article in this series, learn about guidance and design best practices for
dimension tables.
7 Note
This article forms part of the Dimensional modeling series of articles. This series
focuses on guidance and design best practices related to dimensional modeling in
Microsoft Fabric Warehouse.
This article provides you with guidance and best practices for designing dimension
tables in a dimensional model. It provides practical guidance for Warehouse in Microsoft
Fabric, which is an experience that supports many T-SQL capabilities, like creating tables
and managing data in tables. So, you're in complete control of creating your
dimensional model tables and loading them with data.
7 Note
In this article, the term data warehouse refers to an enterprise data warehouse,
which delivers comprehensive integration of critical data across the organization. In
contrast, the standalone term warehouse refers to a Fabric Warehouse, which is a
software as a service (SaaS) relational database offering that you can use to
implement a data warehouse. For clarity, in this article the latter is referred to as
Fabric Warehouse.
Tip
SQL
--The opening of this CREATE TABLE statement isn't shown in the original sample.
--The table name and the surrogate key line below are reconstructed from the
--column names described in this article, and they're illustrative only.
CREATE TABLE [d_Salesperson]
(
--Surrogate key
Salesperson_SK INT NOT NULL,
--Natural key(s)
EmployeeID VARCHAR(20) NOT NULL,
--Dimension attributes
FirstName VARCHAR(20) NOT NULL,
<…>
--Audit attributes
AuditMissing BIT NOT NULL,
AuditIsInferred BIT NOT NULL,
AuditCreatedDate DATE NOT NULL,
AuditCreatedBy VARCHAR(15) NOT NULL,
AuditLastModifiedDate DATE NOT NULL,
AuditLastModifiedBy VARCHAR(15) NOT NULL
);
Surrogate key
The sample dimension table has a surrogate key, which is named Salesperson_SK . A
surrogate key is a single-column unique identifier that's generated and stored in the
dimension table. It's a primary key column used to relate to other tables in the
dimensional model.
Surrogate keys help insulate the data warehouse from changes in source data. They
also deliver many other benefits.
A surrogate key column is a recommended practice, even when a natural key (described
next) seems an acceptable candidate. You should also avoid giving meaning to the key
values (except for date and time dimension keys, as described later).
Natural keys
The sample dimension table also has a natural key, which is named EmployeeID . A
natural key is the key stored in the source system. It allows relating the dimension data
to its source system, which is typically done by an Extract, Load, and Transform (ETL)
process to load the dimension table. Sometimes a natural key is called a business key,
and its values might be meaningful to business users.
Sometimes dimensions don't have a natural key. That could be the case for your date
dimension or lookup dimensions, or when you generate dimension data by normalizing
a flat file.
Dimension attributes
A sample dimension table also has dimension attributes, like the FirstName column.
Dimension attributes provide context to the numeric data stored in related fact tables.
They're typically text columns that are used in analytic queries to filter and group (slice
and dice), but not to be aggregated themselves. Some dimension tables contain few
attributes, while others contain many attributes (as many as it takes to support the
query requirements of the dimensional model).
Tip
A good way to determine which dimensions and attributes you need is to find the
right people and ask the right questions. Specifically, stay alert for the mention of
the word by. For example, when someone says they need to analyze sales by
salesperson, by month, and by product category, they're telling you that they need
dimensions that have those attributes.
If you plan to create a Direct Lake semantic model, you should include all possible
columns required for filtering and grouping as dimension attributes. That's because
Direct Lake semantic models don't support calculated columns.
Foreign keys
The sample dimension table also has a foreign key, which is named SalesRegion_FK.
Foreign keys in a dimension table are a special case: their presence indicates that the
table is related to another dimension table, meaning that it might form part of a
snowflake dimension or that it's related to an outrigger dimension.
Fabric Warehouse supports foreign key constraints but they can't be enforced.
Therefore, it's important that your ETL process tests for integrity between related tables
when data is loaded.
It's still a good idea to create foreign keys. One good reason to create unenforced
foreign keys is to allow modeling tools, like Power BI Desktop, to automatically detect
and create relationships between tables in the semantic model.
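For example, the following sketch creates an unenforced foreign key so that modeling tools can detect the relationship; the referenced d_SalesRegion table and its SalesRegion_SK column are assumed names used only for illustration.
SQL
--Create an unenforced foreign key between the salesperson and sales region dimensions.
ALTER TABLE [d_Salesperson]
ADD CONSTRAINT [FK_Salesperson_SalesRegion]
FOREIGN KEY ([SalesRegion_FK]) REFERENCES [d_SalesRegion] ([SalesRegion_SK]) NOT ENFORCED;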
For more information, see Manage historical change later in this article.
Audit attributes
The sample dimension table also has various audit attributes. Audit attributes are
optional but recommended. They allow you to track when and how dimension records
were created or modified, and they can include diagnostic or troubleshooting
information raised during ETL processes. For example, you'll want to track who (or what
process) updated a row, and when. Audit attributes can also help diagnose a
challenging problem, like when an ETL process stops unexpectedly. They can also flag
dimension members as errors or inferred members.
Big dimensions might be sourced from multiple source systems. In this case, dimension
processing needs to combine, merge, deduplicate, and standardize the data; and assign
surrogate keys.
By comparison, some dimensions are tiny. They might represent lookup tables that
contain only a few records and attributes. Often these small dimensions store category
values related to transactions in fact tables, and they're implemented as dimensions with
surrogate keys to relate to the fact records.
Tip
When you have many small dimensions, consider consolidating them into a junk
dimension.
Snowflake dimensions
One exception to denormalization is to design a snowflake dimension. A snowflake
dimension is normalized, and it stores the dimension data across several related tables.
The following diagram depicts a snowflake dimension that comprises three related
dimension tables: Product , Subcategory , and Category .
Consider a snowflake design when:
The dimension is extremely large and storage costs outweigh the need for high
query performance. (However, periodically reassess whether this is still the case.)
You need keys to relate the dimension to higher-grain facts. For example, the sales
fact table stores rows at product level, but the sales target fact table stores rows at
subcategory level.
You need to track historical changes at higher levels of granularity.
7 Note
Bear in mind that a hierarchy in a Power BI semantic model can only be based on
columns from a single semantic model table. Therefore, a snowflake dimension
should deliver a denormalized result by using a view that joins the snowflake tables
together.
Hierarchies
Commonly, dimension columns produce hierarchies. Hierarchies enable exploring data
at distinct levels of summarization. For example, the initial view of a matrix visual might
show yearly sales, and the report consumer can choose to drill down to reveal quarterly
and monthly sales.
There are three ways to store a hierarchy in a dimension: as columns in a single, denormalized dimension; as tables that form a snowflake dimension; or as a parent-child (self-referencing) relationship in the dimension table.
Hierarchies can be balanced or unbalanced. It's also important to understand that some
hierarchies are ragged.
Balanced hierarchies
Balanced hierarchies are the most common type of hierarchy. In a balanced hierarchy,
every branch descends to the same number of levels. A common example of a balanced hierarchy is a calendar
hierarchy in a date dimension that comprises levels for year, quarter, month, and date.
The following diagram depicts a balanced hierarchy of sales regions. It comprises two
levels, which are sales region group and sales region.
Levels of a balanced hierarchy are either based on columns from a single, denormalized
dimension, or from tables that form a snowflake dimension. When based on a single,
denormalized dimension, the columns that represent the higher levels contain
redundant data.
For balanced hierarchies, facts always relate to a single level of the hierarchy, which is
typically the lowest level. That way, the facts can be aggregated (rolled up) to the
highest level of the hierarchy. Facts can relate to any level, which is determined by the
grain of the fact table. For example, the sales fact table might be stored at date level,
while the sales target fact table might be stored at quarter level.
Unbalanced hierarchies
The following diagram depicts an unbalanced hierarchy. It comprises four levels, and
each member in the hierarchy is a salesperson. Notice that salespeople have a different
number of ancestors in the hierarchy according to who they report to.
For unbalanced hierarchies, facts always relate to the dimension grain. For example,
sales facts relate to different salespeople, who have different reporting structures. The
dimension table would have a surrogate key (named Salesperson_SK ) and a
ReportsTo_Salesperson_FK foreign key column, which references the primary key
column. Each salesperson without anyone to manage isn't necessarily at the lowest level
of any branch of the hierarchy. When they're not at the lowest level, a salesperson might
sell products and have reporting salespeople who also sell products. So, the rollup of
fact data must consider the individual salesperson and all their descendants.
Querying parent-child hierarchies can be complex and slow, especially for large
dimensions. While the source system might store relationships as parent-child, we
recommend that you naturalize the hierarchy. In this instance, naturalize means to
transform and store the hierarchy levels in the dimension as columns.
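As a minimal sketch of naturalizing a parent-child hierarchy, the following query assumes the d_Salesperson table described earlier (with Salesperson_SK, FirstName, and ReportsTo_Salesperson_FK columns) and a maximum depth of three levels; deeper hierarchies need more self-joins or an iterative ETL step.
SQL
--Flatten the parent-child relationship into level columns by self-joining the
--dimension. COALESCE fills missing upper levels with the nearest available
--ancestor (or the member itself) for members that have fewer ancestors.
SELECT
    [s].[Salesperson_SK],
    COALESCE([m2].[FirstName], [m1].[FirstName], [s].[FirstName]) AS [Level1],
    COALESCE([m1].[FirstName], [s].[FirstName]) AS [Level2],
    [s].[FirstName] AS [Level3]
FROM [d_Salesperson] AS [s]
LEFT JOIN [d_Salesperson] AS [m1]
    ON [s].[ReportsTo_Salesperson_FK] = [m1].[Salesperson_SK]
LEFT JOIN [d_Salesperson] AS [m2]
    ON [m1].[ReportsTo_Salesperson_FK] = [m2].[Salesperson_SK];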
Tip
If you choose not to naturalize the hierarchy, you can still create a hierarchy based
on a parent-child relationship in a Power BI semantic model. However, this
approach isn't recommended for large dimensions. For more information, see
Understanding functions for parent-child hierarchies in DAX.
Ragged hierarchies
Sometimes a hierarchy is ragged because the parent of a member in the hierarchy exists
at a level that's not immediately above it. In these cases, missing level values repeat the
value of the parent.
It's possible that a dimension could support both SCD type 1 and SCD type 2 changes.
SCD type 3 isn't commonly used, in part due to the fact that it's difficult to use in a
semantic model. Consider carefully whether an SCD type 2 approach would be a better
fit.
Tip
SCD type 1
SCD type 1 changes overwrite the existing dimension row because there's no need to
keep track of changes. This SCD type can also be used to correct errors. It's a common
type of SCD, and it should be used for most changing attributes, like customer name,
email address, and others.
The following diagram depicts the before and after state of a salesperson dimension
member where their phone number has changed.
This SCD type doesn't preserve historical perspective because the existing row is
updated. That means SCD type 1 changes can result in different higher-level
aggregations. For example, if a salesperson is assigned to a different sales region, a SCD
type 1 change would overwrite the dimension row. The rollup of salespeople historic
sales results to region would then produce a different outcome because it now uses the
new current sales region. It's as if that salesperson was always assigned to the new sales
region.
SCD type 2
SCD type 2 changes result in new rows that represent a time-based version of a
dimension member. There's always a current version row, and it reflects the state of the
dimension member in the source system. Historical tracking attributes in the dimension
table store values that allow identifying the current version (current flag is TRUE ) and its
validity time period. A surrogate key is required because there will be duplicate natural
keys when multiple versions are stored.
It's a common type of SCD, but it should be reserved for attributes that must preserve
historical perspective.
1. The update operation overwrites the current version to set the historical tracking
attributes. Specifically, the end validity column is set to the ETL processing date (or
a suitable timestamp in the source system) and the current flag is set to FALSE .
2. The insert operation adds a new, current version, setting the start validity column
to the end validity column value (used to update the prior version) and the current
flag to TRUE .
It's important to understand that the granularity of related fact tables isn't at the
salesperson level, but rather the salesperson version level. The rollup of their historic
sales results to region will produce correct results but there will be two (or more)
salesperson member versions to analyze.
The following diagram depicts the before and after state of a salesperson dimension
member where their sales region has changed. Because the organization wants to
analyze salespeople's effort by the region they're assigned to, it triggers an SCD type 2
change.
Tip
When a dimension table supports SCD type 2 changes, you should include a label
attribute that describes the member and the version. Consider an example when
the salesperson Lynn Tsoflias from Adventure Works changes assignment from the
Australian sales region to the United Kingdom sales region. The label attribute for
the first version could read "Lynn Tsoflias (Australia)" and the label attribute for the
new, current version could read "Lynn Tsoflias (United Kingdom)." If helpful, you
might include the validity dates in the label too.
You should balance the need for historic accuracy versus usability and efficiency. Try
to avoid having too many SCD type 2 changes on a dimension table because it can
result in an overwhelming number of versions that might make it difficult for
analysts to comprehend.
Also, too many versions could indicate that a changing attribute might be better
stored in the fact table. Extending the earlier example, if sales region changes were
frequent, the sales region could be stored as a dimension key in the fact table
rather than implementing an SCD type 2.
SQL
<…>
);
The RecChangeDate_FK column stores the date when the change came into effect. It
allows you to query when changes took place.
The RecValidFromKey and RecValidToKey columns store the effective dates of
validity for the row. Consider storing the earliest date found in the date dimension
for RecValidFromKey to represent the initial version, and storing 01/01/9999 for the
RecValidToKey of the current versions.
The RecReason column is optional. It allows documenting the reason why the
version was inserted. It could encode which attributes changed, or it could be a
code from the source system that states a particular business reason.
The RecIsCurrent column makes it possible to retrieve current versions only. It's
used when the ETL process looks up dimension keys when loading fact tables.
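Putting these columns together, a reduced, illustrative sketch of an SCD type 2 dimension table definition might look like the following; the data types are assumptions, and most dimension attributes are omitted for brevity.
SQL
--Reduced, illustrative definition that shows the historical tracking columns.
CREATE TABLE [d_Salesperson]
(
    [Salesperson_SK] INT NOT NULL,
    [EmployeeID] VARCHAR(20) NOT NULL,
    --Historical tracking attributes
    [RecChangeDate_FK] INT NOT NULL,
    [RecValidFromKey] INT NOT NULL,
    [RecValidToKey] INT NOT NULL,
    [RecReason] VARCHAR(50) NULL,
    [RecIsCurrent] BIT NOT NULL
);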
7 Note
Some source systems don't store historical changes, so it's important that the
dimension is processed regularly to detect changes and implement new versions.
That way, you can detect changes shortly after they occur, and their validity dates
will be accurate.
SCD type 3
SCD type 3 changes track limited history with attributes. This approach can be useful
when there's a need to record the last change, or a number of the latest changes.
This SCD type preserves limited historical perspective. It might be useful when only the
initial and current values should be stored. In this instance, interim changes wouldn't be
required.
The following diagram depicts the before and after state of a salesperson dimension
member where their sales region has changed. Because the organization wants to
determine any previous sales region assignment, it triggers an SCD type 3 change.
Special dimension members
You might insert rows into a dimension that represent missing, unknown, N/A, or error
states. For example, you might use the following surrogate key values.
ノ Expand table
Surrogate key value Purpose
-3 Error
It's uncommon that a source system would have calendar dimension data, so it must be
generated in the data warehouse. Typically, it's generated once, and if it's a calendar
dimension, it's extended with future dates when needed.
Date dimension
The date (or calendar) dimension is the most common dimension used for analysis. It
stores one row per date, and it supports the common requirement to filter or group by
specific periods of dates, like years, quarters, or months.
) Important
A date dimension shouldn't include a grain that extends to time of day. If time of
day analysis is required, you should have both a date dimension and a time
dimension (described next). Fact tables that store time of day facts should have two
foreign keys, one to each of these dimensions.
The natural key of the date dimension should use the date data type. The surrogate key
should store the date by using YYYYMMDD format and the int data type. This accepted
practice is the one exception (alongside the time dimension) to the guidance that
surrogate key values shouldn't have meaning or be human readable. Storing YYYYMMDD as an int
data type is not only efficient and sorted numerically, but it also conforms to the
unambiguous ISO 8601 date format (defined by the International Organization for Standardization).
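For example, a YYYYMMDD surrogate key value can be derived from a date like this (the date literal is arbitrary):
SQL
--Derive the YYYYMMDD integer surrogate key from a date value.
SELECT CAST(CONVERT(VARCHAR(8), CAST('2024-08-22' AS DATE), 112) AS INT) AS [Date_SK];
--Returns 20240822.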
Attributes that are commonly included in a date dimension include:
FiscalYear , FiscalQuarter – some corporate accounting schedules start mid-year,
so that the start/end of the calendar year and the fiscal year are different.
FiscalQuarterNumberInYear , FiscalMonthNumberInYear – which might be required
to sort text labels.
WeekOfYear – there are multiple ways to label the week of year, including an ISO week-numbering standard.
IsHoliday , HolidayText – in some geographies, different holidays are observed, so
you should maintain multiple sets of holiday lists that each geography observes, either as
a separate dimension or naturalized in multiple attributes in the date dimension.
Adding a HolidayText attribute could help identify holidays for reporting.
IsWeekday – similarly, in some geographies, the standard work week isn't Monday
to Friday. For example, the work week is Sunday to Thursday in many Middle
Eastern regions, while other regions employ a four-day or six-day work week.
LastDayOfMonth
As with any dimension, what's important is that it contains attributes that support the
known filtering, grouping, and hierarchy requirements. There might also be attributes
that store translations of labels into other languages.
When the dimension is used to relate to higher-grain facts, the fact table can use the
first date of the date period. For example, a sales target fact table that stores quarterly
salespeople targets would store the first date of the quarter in the date dimension. An
alternative approach is to create key columns in the date table. For example, a quarter
key column could store the quarter by using YYYYQ format and the smallint data type.
The dimension should be populated with the known range of dates used by all fact
tables. It should also include future dates when the data warehouse stores facts about
targets, budgets, or forecasts. As with other dimensions, you might include rows that
represent missing, unknown, N/A, or error situations.
Tip
Search the internet for "date dimension generator" to find scripts and spreadsheets
that generate date data.
Typically, at the beginning of the next year, the ETL process should extend the date
dimension rows to a specific number of years ahead. When the dimension includes
relative offset attributes, the ETL process must be run daily to update offset attribute
values based on the current date (today).
Time dimension
Sometimes, facts need to be stored at a point in time (as in time of day). In this case,
create a time (or clock) dimension. It could have a grain of minutes (24 x 60 = 1,440
rows) or even seconds (24 x 60 x 60 = 86,400 rows). Other possible grains include half
hour or hour.
The natural key of a time dimension should use the time data type. The surrogate key
could use an appropriate format and store values that have meaning and are human
readable, for example, by using the HHMM or HHMMSS format.
Conformed dimensions
Some dimensions might be conformed dimensions. Conformed dimensions relate to
many fact tables, and so they're shared by multiple stars in a dimensional model. They
deliver consistency and can help you to reduce ongoing development and maintenance.
For example, it's typical that fact tables store at least one date dimension key (because
activity is almost always recorded by date and/or time). For that reason, a date
dimension is a common conformed dimension. You should therefore ensure that your
date dimension includes attributes relevant for the analysis of all fact tables.
The following diagram shows the Sales fact table and the Inventory fact table. Each
fact table relates to the Date dimension and Product dimension, which are conformed
dimensions.
As another example, your employees and users could be the same set of people. In this
case, it might make sense to combine the attributes of each entity to produce one
conformed dimension.
Role-playing dimensions
When a dimension is referenced multiple times in a fact table, it's known as a role-
playing dimension.
For example, when a sales fact table has order date, ship date, and delivery date
dimension keys, the date dimension relates in three ways. Each way represents a distinct
role, yet there's only one physical date dimension.
The following diagram depicts a Flight fact table. The Airport dimension is a role-
playing dimension because it's related twice to the fact table as the Departure Airport
dimension and the Arrival Airport dimension.
Junk dimensions
A junk dimension is useful when there are many independent dimensions, especially
when they comprise a few attributes (perhaps one), and when these attributes have low
cardinality (few values). The objective of a junk dimension is to consolidate many small
dimensions into a single dimension. This design approach can reduce the number of
dimensions, and decrease the number of fact table keys and thus fact table storage size.
They also help to reduce Data pane clutter because they present fewer tables to users.
A junk dimension table typically stores the Cartesian product of all dimension attribute
values, with a surrogate key attribute.
Good candidates include flags and indicators, order status, and customer demographic
states (gender, age group, and others).
The following diagram depicts a junk dimension named Sales Status that combines
order status values and delivery status values.
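As an illustrative sketch, assuming hypothetical staging columns that hold the status values, the Cartesian product and a generated surrogate key could be produced like this:
SQL
--Build the junk dimension rows as the Cartesian product of the distinct status
--values, with a generated surrogate key. Table and column names are illustrative.
SELECT
    ROW_NUMBER() OVER (ORDER BY [o].[OrderStatus], [d].[DeliveryStatus]) AS [SalesStatus_SK],
    [o].[OrderStatus],
    [d].[DeliveryStatus]
FROM
    (SELECT DISTINCT [OrderStatus] FROM [staging].[SalesOrder]) AS [o]
CROSS JOIN
    (SELECT DISTINCT [DeliveryStatus] FROM [staging].[SalesOrder]) AS [d];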
Degenerate dimensions
A degenerate dimension can occur when the dimension is at the same grain as the
related facts. A common example of a degenerate dimension is a sales order number
dimension that relates to a sales fact table. Typically, the sales order number is a single, non-
hierarchical attribute in the fact table. So, it's an accepted practice not to copy this data
to create a separate dimension table.
The following diagram depicts a Sales Order dimension that's a degenerate dimension
based on the SalesOrderNumber column in a sales fact table. This dimension is
implemented as a view that retrieves the distinct sales order number values.
Tip
It's possible to create a view in a Fabric Warehouse that presents the degenerate
dimension as a dimension for querying purposes.
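For example, a sketch of such a view, assuming the f_Sales fact table and SalesOrderNo column used elsewhere in this series:
SQL
--Present the distinct sales order numbers as a degenerate dimension.
CREATE VIEW [d_SalesOrder]
AS
SELECT DISTINCT
    [SalesOrderNo]
FROM [f_Sales];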
Outrigger dimensions
When a dimension table relates to other dimension tables, it's known as an outrigger
dimension. An outrigger dimension can help to conform and reuse definitions in the
dimensional model.
For example, you could create a geography dimension that stores geographic locations
for every postal code. That dimension could then be referenced by your customer
dimension and salesperson dimension, which would store the surrogate key of the
geography dimension. That way, customers and salespeople could then be analyzed by
using consistent geographic locations.
Multivalued dimensions
When a dimension attribute must store multiple values, you need to design a
multivalued dimension. You implement a multivalued dimension by creating a bridge
table (sometimes called a join table). A bridge table stores a many-to-many relationship
between entities.
For example, consider there's a salesperson dimension, and that each salesperson is
assigned to one or possibly more sales regions. In this case, it makes sense to create a
sales region dimension. That dimension stores each sales region only once. A separate
table, known as the bridge table, stores a row for each salesperson and sales region
relationship. Physically, there's a one-to-many relationship from the salesperson
dimension to the bridge table, and another one-to-many relationship from the sales
region dimension to the bridge table. Logically, there's a many-to-many relationship
between salespeople and sales regions.
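A minimal sketch of such a bridge table follows; the table name and data types are illustrative, and unenforced foreign keys could be added as described earlier in this article.
SQL
--Bridge table that resolves the many-to-many relationship between salespeople
--and sales regions. Each row stores one salesperson-to-sales-region assignment.
CREATE TABLE [b_Salesperson_SalesRegion]
(
    [Salesperson_SK] INT NOT NULL,
    [SalesRegion_SK] INT NOT NULL
);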
In the following diagram, the Account dimension table relates to the Transaction fact
table. Because customers can have multiple accounts and accounts can have multiple
customers, the Customer dimension table is related via the Customer Account bridge
table.
Related content
In the next article in this series, learn about guidance and design best practices for fact
tables.
7 Note
This article forms part of the Dimensional modeling series of articles. This series
focuses on guidance and design best practices related to dimensional modeling in
Microsoft Fabric Warehouse.
This article provides you with guidance and best practices for designing fact tables in a
dimensional model. It provides practical guidance for Warehouse in Microsoft Fabric,
which is an experience that supports many T-SQL capabilities, like creating tables and
managing data in tables. So, you're in complete control of creating your dimensional
model tables and loading them with data.
7 Note
In this article, the term data warehouse refers to an enterprise data warehouse,
which delivers comprehensive integration of critical data across the organization. In
contrast, the standalone term warehouse refers to a Fabric Warehouse, which is a
software as a service (SaaS) relational database offering that you can use to
implement a data warehouse. For clarity, in this article the latter is referred to as
Fabric Warehouse.
Tip
Fact tables also include dimension keys, which determine the dimensionality of the facts.
The dimension key values determine the granularity of the facts, which is the atomic
level by which facts are defined. For example, an order date dimension key in a sales fact
table sets the granularity of the facts at date level, while a target date dimension key in a
sales target fact table could set the granularity at quarter level.
7 Note
While it's possible to store facts at a higher granularity, it's not easy to split out
measure values to lower levels of granularity (if required). Sheer data volumes,
together with analytic requirements, might provide valid reason to store higher
granularity facts but at the expense of detailed analysis.
To easily identify fact tables, you typically prefix their names with f_ or Fact_ .
SQL
--The opening of this CREATE TABLE statement isn't shown in the original sample.
--The table name and the dimension key lines below are reconstructed from the
--column names described in this article, and they're illustrative only.
CREATE TABLE [f_Sales]
(
--Dimension keys (only two are shown here)
OrderDate_Date_FK INT NOT NULL,
ShipDate_Date_FK INT NOT NULL,
--Attributes
SalesOrderNo INT NOT NULL,
SalesOrderLineNo SMALLINT NOT NULL,
--Measures
Quantity INT NOT NULL,
<…>
--Audit attributes
AuditMissing BIT NOT NULL,
AuditCreatedDate DATE NOT NULL,
AuditCreatedBy VARCHAR(15) NOT NULL,
AuditLastModifiedDate DATE NOT NULL,
AuditLastModifiedBy VARCHAR(15) NOT NULL
);
Primary key
As the example shows, the sample fact table doesn't have a primary key. That's
because a primary key doesn't typically serve a useful purpose, and it would unnecessarily increase
the table storage size. A primary key is often implied by the set of dimension keys and
attributes.
Dimension keys
The sample fact table has various dimension keys, which determine the dimensionality of
the fact table. Dimension keys are references to the surrogate keys (or higher-level
attributes) in the related dimensions.
7 Note
It's an unusual fact table that doesn't include at least one date dimension key.
A fact table can reference a dimension multiple times. In this case, it's known as a role-
playing dimension. In this example, the fact table has the OrderDate_Date_FK and
ShipDate_Date_FK dimension keys. Each dimension key represents a distinct role, yet there's only one physical date dimension.
It's a good practice to set each dimension key as NOT NULL . During the fact table load,
you can use special dimension members to represent missing, unknown, N/A, or error
states (if necessary).
Attributes
The sample fact table has two attributes. Attributes provide additional information and
set the granularity of fact data, but they're neither dimension keys nor dimension
attributes, nor measures. In this example, attribute columns store sales order
information. Other examples could include tracking numbers or ticket numbers. For
analysis purposes, an attribute could form a degenerate dimension.
Measures
The sample fact table also has measures, like the Quantity column. Measure columns
are typically numeric and commonly additive (meaning they can be summed, and
summarized by using other aggregations). For more information, see Measure types
later in this article.
Audit attributes
The sample fact table also has various audit attributes. Audit attributes are optional.
They allow you to track when and how fact records were created or modified, and they
can include diagnostic or troubleshooting information raised during Extract, Transform,
and Load (ETL) processes. For example, you'll want to track who (or what process)
updated a row, and when. Audit attributes can also help diagnose a challenging
problem, like when an ETL process stops unexpectedly.
An inventory fact table is a good example of a periodic snapshot table. It's loaded every
day with the end-of-day stock balance of every product.
Periodic snapshot tables can be used instead of a transaction fact table when recording
large volumes of transactions is expensive, and it doesn't support any useful analytic
requirement. For example, there might be millions of stock movements in a day (which
could be stored in a transaction fact table), but your analysis is only concerned with
trends of end-of-day stock levels.
A fact row is loaded soon after the first event in a process, and then the row is updated
in a predictable sequence every time a milestone event occurs. Updates continue until
the process completes.
Accumulating snapshot fact tables have multiple date dimension keys, each representing
a milestone event. Some dimension keys might record an N/A state until the process
arrives at a certain milestone. Measures typically record durations. Durations between
milestones can provide valuable insight into a business workflow or assembly process.
Measure types
Measures are typically numeric, and commonly additive. However, some measures can't
always be added. These measures are categorized as either semi-additive or non-
additive.
Additive measures
An additive measure can be summed across any dimension. For example, order quantity
and sales revenue are additive measures (providing revenue is recorded for a single
currency).
Semi-additive measures
A semi-additive measure can be summed across only some dimensions. Here are some examples:
Any measure in a periodic snapshot fact table can't be summed across other time
periods. For example, you shouldn't sum the age of an inventory item sampled
nightly, but you could sum the age of all inventory items on a shelf, each night.
A stock balance measure in an inventory fact table can't be summed across other
products.
Sales revenue in a sales fact table that has a currency dimension key can't be
summed across currencies.
Non-additive measures
A non-additive measure can't be meaningfully summed across any dimension. Examples include rates, like unit prices, and ratios. However, it's considered a
better practice to store the values used to compute the ratio, which allows the ratio to
be calculated if needed. For example, a discount percentage of a sales fact could be
stored as a discount amount measure (to be divided by the sales revenue measure). Or,
the age of an inventory item on the shelf shouldn't be summed over time, but you
might observe a trend in the average age of inventory items.
While some measures can't be summed, they're still valid measures. They can be
aggregated by using count, distinct count, minimum, maximum, average, and others.
Also, non-additive measures can become additive when they're used in calculations. For
example, unit price multiplied by order quantity produces sales revenue, which is
additive.
7 Note
Related content
In the next article in this series, learn about guidance and design best practices for
loading dimensional model tables.
7 Note
This article forms part of the Dimensional modeling series of articles. This series
focuses on guidance and design best practices related to dimensional modeling in
Microsoft Fabric Warehouse.
This article provides you with guidance and best practices for loading dimension and
fact tables in a dimensional model. It provides practical guidance for Warehouse in
Microsoft Fabric, which is an experience that supports many T-SQL capabilities, like
creating tables and managing data in tables. So, you're in complete control of creating
your dimensional model tables and loading them with data.
7 Note
In this article, the term data warehouse refers to an enterprise data warehouse,
which delivers comprehensive integration of critical data across the organization. In
contrast, the standalone term warehouse refers to a Fabric Warehouse, which is a
software as a service (SaaS) relational database offering that you can use to
implement a data warehouse. For clarity, in this article the latter is referred to as
Fabric Warehouse.
Tip
For a Fabric Warehouse solution, you can use Data Factory to develop and run your ETL
process. The process can stage, transform, and load source data into your dimensional
model tables.
Use data pipelines to build workflows to orchestrate the ETL process. Data
pipelines can execute SQL scripts, stored procedures, and more.
Use dataflows to develop low-code logic to ingest data from hundreds of data
sources. Dataflows support combining data from multiple sources, transforming
data, and then loading it to a destination, like a dimensional model table.
Dataflows are built by using the familiar Power Query experience that's available
today across many Microsoft products, including Microsoft Excel and Power BI
Desktop.
7 Note
Orchestration
The general workflow of an ETL process is to stage the source data, process the dimension tables, and then process the fact tables.
Fact tables can be processed once all dimension tables are processed.
When all dimensional model tables are processed, you might trigger the refresh of
dependent semantic models. It's also a good idea to send a notification to relevant staff
to inform them of the outcome of the ETL process.
Stage data
Staging source data can help support data loading and transformation requirements. It
involves extracting source system data and loading it into staging tables, which you
create to support the ETL process. We recommend that you stage source data.
Data in staging tables should never be made available to business users. It's only
relevant to the ETL process.
7 Note
When your source data is stored in a Fabric Lakehouse, it might not be necessary to stage
it in the data warehouse. If the lakehouse implements a medallion architecture, you could
source its data from either the bronze, silver, or gold layer.
We recommend that you create a schema in the warehouse, possibly named staging .
Staging tables should resemble the source tables as closely as possible in terms of
column names and data types. The contents of each table should be removed at the
start of the ETL process. However, note that Fabric Warehouse tables can't be truncated.
Instead, you can drop and recreate each staging table before loading it with data.
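For example, here's a minimal sketch that drops and recreates a staging table at the start of the ETL process; the staging schema, table, and columns are illustrative.
SQL
--Recreate the staging table so that it starts empty for this ETL run.
DROP TABLE IF EXISTS [staging].[Product];

CREATE TABLE [staging].[Product]
(
    [ProductID] INT NOT NULL,
    [ProductName] VARCHAR(100) NOT NULL
);
--Load the table by using COPY INTO, a data pipeline, a dataflow, or a
--cross-warehouse query, as described later in this article.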
You can also consider data virtualization alternatives as part of your staging strategy.
You can use:
Mirroring, which is a low-cost and low-latency turnkey solution that allows you to
create a replica of your data in OneLake. For more information, see Why use
Mirroring in Fabric?.
OneLake shortcuts, which point to other storage locations that could contain your
source data. Shortcuts can be used as tables in T-SQL queries.
PolyBase in SQL Server, which is a data virtualization feature for SQL Server.
PolyBase allows T-SQL queries to join data from external sources to relational
tables in an instance of SQL Server.
Data virtualization with Azure SQL Managed Instance, which allows you to execute
T-SQL queries on files storing data in common data formats in Azure Data Lake
Storage (ADLS) Gen2 or Azure Blob Storage, and combine it with locally stored
relational data by using joins.
Transform data
The structure of your source data might not resemble the destination structures of your
dimensional model tables. So, your ETL process needs to reshape the source data to
align with the structure of the dimensional model tables.
Also, the data warehouse must deliver cleansed and conformed data, so source data
might need to be transformed to ensure quality and consistency.
7 Note
The concept of garbage in, garbage out certainly applies to data warehousing—
therefore, avoid loading garbage (low quality) data into your dimensional model
tables.
Here are some transformations that your ETL process could perform.
Combine data: Data from different sources can be integrated (merged) based on
matching keys. For example, product data is stored across different systems (like
manufacturing and marketing), yet they all use a common stock-keeping unit
(SKU). Data can also be appended when it shares a common structure. For
example, sales data is stored in multiple systems. A union of the sales from each
system can produce a superset of all sales data.
Convert data types: Data types can be converted to those defined in the
dimensional model tables.
Calculations: Calculations can be done to produce values for the dimensional
model tables. For example, for an employee dimension table, you might
concatenate first and last names to produce the full name. As another example, for
your sales fact table, you might calculate gross sales revenue, which is the product
of unit price and quantity.
Detect and manage historical change: Change can be detected and appropriately
stored in dimension tables. For more information, see Manage historical change
later in this article.
Aggregate data: Aggregation can be used to reduce fact table dimensionality
and/or to raise the granularity of the facts. For example, the sales fact table doesn't
need to store sales order numbers. Therefore, an aggregated result that groups by
all dimension keys can be used to store the fact table data.
Load data
You can load tables in a Fabric Warehouse by using the following data ingestion options.
COPY INTO (T-SQL): This option is useful when the source data comprise Parquet
or CSV files stored in an external Azure storage account, like ADLS Gen2 or Azure
Blob Storage.
Data pipelines: In addition to orchestrating the ETL process, data pipelines can
include activities that run T-SQL statements, perform lookups, or copy data from a
data source to a destination.
Dataflows: As an alternative to data pipelines, dataflows provide a code-free
experience to transform and clean data.
Cross-warehouse ingestion: When data is stored in the same workspace, cross-
warehouse ingestion allows joining different warehouse or lakehouse tables. It
supports T-SQL commands like INSERT…SELECT , SELECT INTO , and CREATE TABLE AS
SELECT (CTAS) . These commands are especially useful when you want to transform
and load data from staging tables within the same workspace. They're also set-
based operations, which are likely to be the most efficient and fastest way to load
dimensional model tables.
Tip
For a complete explanation of these data ingestion options including best practices,
see Ingest data into the Warehouse.
Logging
ETL processes usually require dedicated monitoring and maintenance. For these reasons,
we recommend that you log the results of the ETL process to non-dimensional model
tables in your warehouse. You should generate a unique ID for each ETL process and use
it to log details about every operation.
Consider logging:
Tip
You can create a semantic model that's dedicated to monitoring and analyzing your
ETL processes. Process durations can help you identify bottlenecks that might
benefit from review and optimization. Row counts can allow you to understand the
size of the incremental load each time the ETL runs, and also help to predict the
future size of the data warehouse (and when to scale up the Fabric capacity, if
appropriate).
The following diagram depicts the logic used to process a dimension table.
Consider the process of the Product dimension table.
When new products are added to the source system, rows are inserted into the
Product dimension table.
When products are modified, existing rows in the dimension table are either
updated or inserted.
When SCD type 1 applies, updates are made to the existing rows.
When SCD type 2 applies, updates are made to expire the current row versions,
and new rows that represent the current version are inserted.
When SCD type 3 applies, a process similar to SCD type 1 occurs, updating the
existing rows without inserting new rows.
Surrogate keys
We recommend that each dimension table has a surrogate key, which should use the
smallest possible integer data type. In SQL Server-based environments, that's typically
done by creating an identity column; however, this feature isn't supported in Fabric
Warehouse. Instead, you'll need to use a workaround technique that generates unique
identifiers.
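One common workaround, shown here as a sketch with illustrative names and a reduced column set, combines the current maximum key with ROW_NUMBER when inserting new members:
SQL
--Generate surrogate key values for new dimension members without an identity column.
INSERT INTO [d_Salesperson] ([Salesperson_SK], [EmployeeID], [FirstName])
SELECT
    COALESCE((SELECT MAX([Salesperson_SK]) FROM [d_Salesperson]), 0)
        + ROW_NUMBER() OVER (ORDER BY [s].[EmployeeID]) AS [Salesperson_SK],
    [s].[EmployeeID],
    [s].[FirstName]
FROM [staging].[Salesperson] AS [s]
WHERE NOT EXISTS
(
    SELECT 1
    FROM [d_Salesperson] AS [d]
    WHERE [d].[EmployeeID] = [s].[EmployeeID]
);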
) Important
7 Note
If the dimension table row is an inferred member (inserted by a fact load process),
you should treat any changes as late arriving dimension details instead of an SCD
change. In this case, any changed attributes should be updated and the inferred
member flag column set to FALSE .
It's possible that a dimension could support SCD type 1 and/or SCD type 2 changes.
SCD type 1
When SCD type 1 changes are detected, use the following logic.
SCD type 2
When SCD type 2 changes are detected, use the following logic.
1. Expire the current version by setting the end date validity column to the ETL
processing date (or a suitable timestamp in the source system) and the current flag
to FALSE .
2. If the table includes last modified date and last modified by columns, set the current
date and process that made the modifications.
3. Insert new members that have the start date validity column set to the end date
validity column value (used to update the prior version) and has the current
version flag set to TRUE .
4. If the table includes created date and created by columns, set the current date and
process that made the insertions.
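A minimal T-SQL sketch of this expire-and-insert logic follows. It assumes a hypothetical staging.SalespersonChange table that holds the detected changes (with a pregenerated NewSalesperson_SK), a reduced set of dimension columns, and the date-key-based validity columns described earlier in this series; audit columns are omitted for brevity.
SQL
DECLARE @ProcessDateKey INT = CAST(CONVERT(VARCHAR(8), GETDATE(), 112) AS INT);

--1) Expire the current version of each changed member.
UPDATE [d_Salesperson]
SET [RecValidToKey] = @ProcessDateKey,
    [RecIsCurrent] = 0
WHERE [RecIsCurrent] = 1
    AND [EmployeeID] IN (SELECT [EmployeeID] FROM [staging].[SalespersonChange]);

--2) Insert the new, current version of each changed member.
INSERT INTO [d_Salesperson]
    ([Salesperson_SK], [EmployeeID], [FirstName], [RecValidFromKey], [RecValidToKey], [RecIsCurrent])
SELECT
    [c].[NewSalesperson_SK],
    [c].[EmployeeID],
    [c].[FirstName],
    @ProcessDateKey,
    99990101,  --Date key for 01/01/9999, representing the open-ended current version.
    1
FROM [staging].[SalespersonChange] AS [c];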
SCD type 3
When SCD type 3 changes are detected, update the attributes by using similar logic to
processing SCD type 1.
The appropriate way to handle source deletions is to record them as a soft delete. A soft
delete marks a dimension member as no longer active or valid. To support this case,
your dimension table should include a Boolean attribute with the bit data type, like
IsDeleted . Update this column for any deleted dimension members to TRUE (1). The
current, latest version of a dimension member might similarly be marked with a Boolean
(bit) value in the IsCurrent or IsActive columns. All reporting queries and Power BI
semantic models should filter out records that are soft deletes.
Date dimension
Calendar and time dimensions are special cases because they usually don't have source
data. Instead, they're generated by using fixed logic.
You should load the date dimension table at the beginning of every new year to extend
its rows to a specific number of years ahead. There might also be other business data to
update regularly, for example fiscal year data, holidays, and week numbers.
When the date dimension table includes relative offset attributes, the ETL process must
be run daily to update offset attribute values based on the current date (today).
We recommend that the logic to extend or update the date dimension table be written
in T-SQL and encapsulated in a stored procedure.
7 Note
Usually the surrogate key can be computed for the date and time dimensions
because they should use YYYYMMDD or HHMM format. For more information, see
Calendar and time.
If a dimension key lookup fails, it could indicate an integrity issue with the source
system. In this case, the fact row must still get inserted into the fact table. A valid
dimension key must still be stored. One approach is to store a special dimension
member (like Unknown). This approach requires a later update to correctly assign the
true dimension key value, when known.
) Important
Because Fabric Warehouse doesn't enforce foreign keys, it's critical that the ETL
process check for integrity when it loads data into fact tables.
Another approach, relevant when there's confidence that the natural key is valid, is to
insert a new dimension member and then store its surrogate key value. For more
information, see Inferred dimension members later in this section.
The following diagram depicts the logic used to process a fact table.
Whenever possible, a fact table should be loaded incrementally, meaning that new facts
are detected and inserted. An incremental load strategy is more scalable, and it reduces
the workload for both the source systems and the destination systems.
) Important
Especially for a large fact table, it should be a last resort to truncate and reload a
fact table. That approach is expensive in terms of process time, compute resources,
and possible disruption to the source systems. It also involves complexity when the
fact table dimensions apply SCD type 2. That's because dimension key lookups will
need to be done within the validity period of the dimension member versions.
Hopefully, you can efficiently detect new facts by relying on source system identifiers or
timestamps. For example, when a source system reliably records sales orders that are in
sequence, you can store the latest sales order number retrieved (known as the high
watermark). The next process can use that sales order number to retrieve newly created
sales orders, and again, store the latest sales order number retrieved for use by the next
process. It might also be possible that a create date column could be used to reliably
detect new orders.
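As a sketch of the high watermark pattern, assuming a hypothetical etl.Watermark control table and a staged source table (names are illustrative):
SQL
DECLARE @HighWatermark INT;

--Read the last sales order number that was loaded.
SET @HighWatermark =
(
    SELECT [LastSalesOrderNo]
    FROM [etl].[Watermark]
    WHERE [TableName] = 'f_Sales'
);

--Retrieve only the newly created sales orders.
SELECT *
FROM [staging].[SalesOrder]
WHERE [SalesOrderNo] > @HighWatermark;

--After a successful load, update [etl].[Watermark] with the new maximum SalesOrderNo.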
If you can't rely on the source system data to efficiently detect new facts, you might be
able to rely on a capability of the source system to perform an incremental load. For
example, SQL Server and Azure SQL Managed Instance have a feature called change
data capture (CDC), which can track changes to each row in a table. Also, SQL Server,
Azure SQL Managed Instance, and Azure SQL Database have a feature called change
tracking, which can identify rows that have changed. When enabled, it can help you to
efficiently detect new or changed data in any database table. You might also be able to
add triggers to relational tables that store keys of inserted, updated, or deleted table
records.
Lastly, you might be able to correlate source data to the fact table by using attributes, for
example, the sales order number and sales order line number. However, for large fact
tables, it could be a very expensive operation to detect new, changed, or deleted facts. It
could also be problematic when the source system archives operational data.
Inferred dimension members
Sometimes a fact arrives before its related dimension member does, and all that's known
about the dimension member is its natural key. In that case, the fact load process
needs to create a new dimension member by using Unknown attribute values.
Importantly, it must set the IsInferredMember audit attribute to TRUE . That way, when
the late arriving details are sourced, the dimension load process can make the necessary
updates to the dimension row. For more information, see Manage historical change in
this article.
When you anticipate fact updates or deletions, you should include attributes (like a sales
order number and its sales order line number) in the fact table to help identify the fact
rows to modify. Be sure to index these columns to support efficient modification
operations.
Lastly, if fact data was inserted by using a special dimension member (like Unknown),
you'll need to run a periodic process that retrieves current source data for such fact rows
and updates the dimension keys to valid values.
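A minimal sketch of such a process, assuming hypothetical table and column names, -1 as the
Unknown dimension key, and that the fact table also stores the source natural key:
SQL
UPDATE f
SET f.[CustomerKey] = c.[CustomerKey]
FROM dbo.FactSale AS f
INNER JOIN dbo.DimCustomer AS c
    ON c.[CustomerCode] = f.[CustomerCode]
WHERE f.[CustomerKey] = -1;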
Related content
For more information about loading data into a Fabric Warehouse, see:
Warehouse in Microsoft Fabric offers built-in data ingestion tools that allow users to
ingest data into warehouses at scale using code-free or code-rich experiences.
Use the COPY (Transact-SQL) statement for code-rich data ingestion operations,
for the highest data ingestion throughput possible, or when you need to add data
ingestion as part of a Transact-SQL logic. For syntax, see COPY INTO (Transact-
SQL).
Use data pipelines for code-free or low-code, robust data ingestion workflows that
run repeatedly, at a schedule, or that involve large volumes of data. For more
information, see Ingest data using Data pipelines.
Use dataflows for a code-free experience that allows custom transformations to
source data before it's ingested. These transformations include (but aren't limited
to) changing data types, adding or removing columns, or using functions to
produce calculated columns. For more information, see Dataflows.
Use cross-warehouse ingestion for code-rich experiences to create new tables
with source data within the same workspace. For more information, see Ingest data
using Transact-SQL and Write a cross-database query.
Note
The COPY statement in Warehouse supports only data sources on Azure storage
accounts; OneLake sources are currently not supported.
For cross-warehouse ingestion, data sources must be within the same Microsoft Fabric
workspace. Queries can be performed using three-part naming for the source data.
SQL
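-- The original sample isn't shown here. As an illustrative sketch (the warehouse and
-- table names are hypothetical), a three-part-name query against another warehouse in
-- the same workspace looks like this:
SELECT TOP (10) *
FROM [Sales_Warehouse].[dbo].[FactSale];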
The COPY (Transact-SQL) statement currently supports the PARQUET and CSV file
formats. For data sources, currently Azure Data Lake Storage (ADLS) Gen2 and Azure
Blob Storage are supported.
Data pipelines and dataflows support a wide variety of data sources and data formats.
For more information, see Data pipelines and Dataflows.
Best practices
The COPY command in Warehouse in Microsoft Fabric provides a simple, flexible,
and fast interface for high-throughput data ingestion for SQL workloads. In the current
version, only loading data from external storage accounts is supported.
You can also use T-SQL to create a new table, insert data into it, and then update and
delete rows. Data can be inserted from any database within the Microsoft Fabric
workspace by using cross-database queries. If you want to ingest data from a Lakehouse
into a warehouse, you can do so with a cross-database query. For example:
SQL
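-- The original sample isn't shown here. As an illustrative sketch (the lakehouse,
-- schema, and table names are hypothetical), data can be ingested from a Lakehouse
-- table into a Warehouse table with a cross-database query:
INSERT INTO [dbo].[SalesStaging]
SELECT *
FROM [Sales_Lakehouse].[dbo].[sales_raw];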
Avoid ingesting data using singleton INSERT statements, as this causes poor
performance on queries and updates. If singleton INSERT statements were used
for data ingestion consecutively, we recommend consolidating the data: create a
new table by using CREATE TABLE AS SELECT (CTAS) or INSERT...SELECT patterns,
drop the original table, and then re-create your table from the one you just
created.
Dropping your existing table impacts your semantic model, including any
custom measures or customizations you might have made to the semantic
model.
When working with external data on files, we recommend that files are at least 4
MB in size.
For large compressed CSV files, consider splitting your file into multiple files.
Azure Data Lake Storage (ADLS) Gen2 offers better performance than Azure Blob
Storage (legacy). Consider using an ADLS Gen2 account whenever possible.
For pipelines that run frequently, consider isolating your Azure storage account
from other services that could access the same files at the same time.
Explicit transactions allow you to group multiple data changes together, so that the
changes become visible when reading one or more tables only after the transaction is
fully committed. You also have the ability to roll back the transaction if any of the
changes fail.
If a SELECT is within a transaction, and was preceded by data insertions, the
automatically generated statistics can be inaccurate after a rollback. Inaccurate
statistics can lead to unoptimized query plans and execution times. If you roll back
a transaction with SELECTs after a large INSERT, update statistics for the columns
mentioned in your SELECT.
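As a minimal sketch using standard T-SQL syntax (the table name is hypothetical),
statistics on an affected table can be refreshed manually:
SQL
-- Refresh statistics after a rollback so subsequent query plans use accurate estimates.
UPDATE STATISTICS dbo.FactSale;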
Note
Regardless of how you ingest data into warehouses, the parquet files produced by
the data ingestion task will be optimized using V-Order write optimization. V-Order
optimizes parquet files to enable lightning-fast reads under the Microsoft Fabric
compute engines such as Power BI, SQL, Spark, and others. Warehouse queries in
general benefit from faster read times with this optimization, while the parquet files
remain 100% compliant with the open-source specification.
Unlike in Fabric Data Engineering, V-Order is a global setting in Synapse Data
Warehouse that cannot be disabled. For more information on V-Order, see
Understand and manage V-Order for Warehouse.
Related content
Ingest data using Data pipelines
Ingest data using the COPY statement
Ingest data using Transact-SQL
Create your first dataflow to get and transform data
COPY (Transact-SQL)
CREATE TABLE AS SELECT (Transact-SQL)
INSERT (Transact-SQL)
Data pipelines offer an alternative to using the COPY command through a graphical user
interface. A data pipeline is a logical grouping of activities that together perform a data
ingestion task. Pipelines allow you to manage extract, transform, and load (ETL) activities
instead of managing each one individually.
In this tutorial, you'll create a new pipeline that loads sample data into a Warehouse in
Microsoft Fabric.
Note
Some features from Azure Data Factory are not available in Microsoft Fabric, but
the concepts are interchangeable. You can learn more about Azure Data Factory
and Pipelines on Pipelines and activities in Azure Data Factory and Azure Synapse
Analytics. For a quickstart, visit Quickstart: Create your first pipeline to copy data.
3. You'll land in the pipeline canvas area, where you see three options to get started:
Add a pipeline activity, Copy data, and Choose a task to start.
Add pipeline activity: this option launches the pipeline editor, where you can
create new pipelines from scratch by using pipeline activities.
Copy data: this option launches a step-by-step assistant that helps you select
a data source, a destination, and configure data load options such as the
column mappings. On completion, it creates a new pipeline activity with a
Copy Data task already configured for you.
Choose a task to start: this option launches a set of predefined templates to
help get you started with pipelines based on different scenarios.
4. The first page of the Copy data assistant helps you pick your own data from
various data sources, or select from one of the provided samples to get started.
For this tutorial, we'll use the COVID-19 Data Lake sample. Select this option and
select Next.
5. In the next page, you can select a dataset, the source file format, and preview the
selected dataset. Select Bing COVID-19, the CSV format, and select Next.
6. The next page, Data destinations, allows you to configure the type of the
destination workspace. We'll load data into a warehouse in our workspace, so
select the Warehouse tab, and the Data Warehouse option. Select Next.
7. Now it's time to pick the warehouse to load data into. Select your desired
warehouse in the dropdown list and select Next.
8. The last step to configure the destination is to provide a name to the destination
table and configure the column mappings. Here you can choose to load the data
to a new table or to an existing one, provide a schema and table names, change
column names, remove columns, or change their mappings. You can accept the
defaults, or adjust the settings to your preference.
9. The next page gives you the option to use staging, or provide advanced options
for the data copy operation (which uses the T-SQL COPY command). Review the
options without changing them and select Next.
10. The last page in the assistant offers a summary of the copy activity. Select the
option Start data transfer immediately and select Save + Run.
11. You are directed to the pipeline canvas area, where a new Copy Data activity is
already configured for you. The pipeline starts to run automatically. You can
monitor the status of your pipeline in the Output pane:
12. After a few seconds, your pipeline finishes successfully. Navigating back to your
warehouse, you can select your table to preview the data and confirm that the
copy operation concluded.
For more on data ingestion into your Warehouse in Microsoft Fabric, visit:
Next step
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
The COPY statement is the primary way to ingest data into Warehouse tables. COPY
performs high-throughput data ingestion from an external Azure storage account,
with the flexibility to configure source file format options, a location to store rejected
rows, skipping header rows, and other options.
This tutorial shows data ingestion examples for a Warehouse table using the T-SQL
COPY statement. It uses the Bing COVID-19 sample data from the Azure Open Datasets.
For details about this data, including its schema and usage rights, see Bing COVID-19.
Note
To learn more about the T-SQL COPY statement including more examples and the
full syntax, see COPY (Transact-SQL).
Create a table
Before you use the COPY statement, the destination table needs to be created. To create
the destination table for this sample, use the following steps:
3. To create the table used as the destination in this tutorial, run the following code:
SQL
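-- The original sample isn't reproduced here. This is a simplified, illustrative subset
-- of the Bing COVID-19 columns; adjust the schema to match the full dataset as needed.
CREATE TABLE [dbo].[bing_covid-19_data]
(
    [id] int,
    [updated] date,
    [confirmed] int,
    [deaths] int,
    [recovered] int,
    [country_region] varchar(100)
);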
SQL
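-- A sketch, not the original sample: load the Parquet file from an Azure storage
-- account into the table created earlier. The URL is a placeholder.
COPY INTO [dbo].[bing_covid-19_data]
FROM 'https://<storage-account>.blob.core.windows.net/<container>/bing_covid-19_data.parquet'
WITH (
    FILE_TYPE = 'PARQUET'
);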
If you ran the previous example to load data from Parquet, consider deleting all data
from your table:
SQL
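-- Remove all previously loaded rows before ingesting the data again.
DELETE FROM [dbo].[bing_covid-19_data];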
To load data from a CSV file skipping a header row, use the following code:
SQL
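-- A sketch, not the original sample: load a CSV file and skip its header row.
-- The URL is a placeholder.
COPY INTO [dbo].[bing_covid-19_data]
FROM 'https://<storage-account>.blob.core.windows.net/<container>/bing_covid-19_data.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2
);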
If you ran both examples without deleting the rows in between runs, you'll see the result
of this query with twice as many rows. While that works for data ingestion in this case,
consider deleting all rows and ingesting data only once if you're going to further
experiment with this data.
Related content
Ingest data using Data pipelines
Ingest data into your Warehouse using Transact-SQL
Ingesting data into the Warehouse
The Transact-SQL language offers options you can use to load data at scale from
existing tables in your lakehouse and warehouse into new tables in your warehouse.
These options are convenient if you need to create new versions of a table with
aggregated data, versions of tables with a subset of the rows, or to create a table as a
result of a complex query. Let's explore some examples.
Note
The examples in this article use the Bing COVID-19 sample dataset. To load the
sample dataset, follow the steps in Ingest data into your Warehouse using the
COPY statement to create the sample data into your warehouse.
The first example illustrates how to create a new table, dbo.[bing_covid-19_data_2023],
that is a copy of the existing table but filtered to data from the year 2023 only:
SQL
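-- A sketch, not the original sample: copy the existing table, keeping only 2023 rows.
CREATE TABLE [dbo].[bing_covid-19_data_2023] AS
SELECT *
FROM [dbo].[bing_covid-19_data]
WHERE DATEPART(YEAR, [updated]) = 2023;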
You can also create a new table with new year, month, and dayofmonth columns, with values
obtained from the updated column in the source table. This can be useful if you're trying to
visualize infection data by year, or to see months when the most COVID-19 cases are
observed:
SQL
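-- A sketch, not the original sample: derive year, month, and dayofmonth columns
-- from the [updated] column (the new table name is hypothetical).
CREATE TABLE [dbo].[bing_covid-19_data_with_parts] AS
SELECT *,
       DATEPART(YEAR,  [updated]) AS [year],
       DATEPART(MONTH, [updated]) AS [month],
       DATEPART(DAY,   [updated]) AS [dayofmonth]
FROM [dbo].[bing_covid-19_data];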
As another example, you can create a new table that summarizes the number of cases
observed in each month, regardless of the year, to evaluate how seasonality affects
spread in a given country/region. It uses the table created in the previous example with
the new month column as a source:
SQL
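-- A sketch, not the original sample: total cases per country/region and calendar month,
-- based on the table created in the previous example (the measure column is illustrative).
CREATE TABLE [dbo].[bing_covid-19_data_by_month] AS
SELECT [country_region],
       [month],
       SUM(CAST([confirmed] AS bigint)) AS [total_confirmed]
FROM [dbo].[bing_covid-19_data_with_parts]
GROUP BY [country_region], [month];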
Based on this new table, we can see that the United States observed more confirmed
cases across all years in the month of January , followed by December and October .
April is the month with the lowest number of cases overall:
SQL
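-- A sketch, not the original sample: rank months for a single country/region.
SELECT [month],
       [total_confirmed]
FROM [dbo].[bing_covid-19_data_by_month]
WHERE [country_region] = 'United States'
ORDER BY [total_confirmed] DESC;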
For more examples and syntax reference, see CREATE TABLE AS SELECT (Transact-SQL).
SQL
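-- The original sample isn't shown here. As a sketch, INSERT...SELECT appends rows to
-- an existing table (here, re-using the tables from the earlier examples):
INSERT INTO [dbo].[bing_covid-19_data_2023]
SELECT *
FROM [dbo].[bing_covid-19_data]
WHERE DATEPART(YEAR, [updated]) = 2023;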
The query criteria for the SELECT statement can be any valid query, as long as the
resulting query column types align with the columns on the destination table. If column
names are specified and include only a subset of the columns from the destination
table, all other columns are loaded as NULL . For more information, see Using INSERT
INTO...SELECT to Bulk Import data with minimal logging and parallelism.
A new table can be created that uses three-part naming to combine data from tables on
other workspace assets, such as a warehouse and a lakehouse in the same workspace:
SQL
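-- A sketch, not the original sample: the warehouse, lakehouse, and table names are
-- hypothetical; three-part naming lets the query combine items in the same workspace.
CREATE TABLE [dbo].[combined_covid_data] AS
SELECT w.*
FROM [Cases_Warehouse].[dbo].[bing_covid-19_data] AS w
INNER JOIN [Cases_Lakehouse].[dbo].[country_reference] AS r
    ON r.[country_region] = w.[country_region];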
To learn more about cross-warehouse queries, see Write a cross-database SQL Query.
Related content
Ingesting data into the Warehouse
Ingest data using the COPY statement
Ingest data using Data pipelines
Write a cross-database SQL Query
This tutorial guides you through setting up dbt and deploying your first project to a
Fabric Warehouse.
Introduction
The dbt (Data Build Tool) open-source framework simplifies data transformation and
analytics engineering. It focuses on SQL-based transformations within the analytics
layer, treating SQL as code. dbt supports version control, modularization, testing, and
documentation.
The dbt adapter for Microsoft Fabric can be used to create dbt projects, which can then
be deployed to a Fabric Data Warehouse.
You can also change the target platform for the dbt project by simply changing the
adapter. For example, a project built for an Azure Synapse dedicated SQL pool can be
upgraded to a Fabric Data Warehouse in a few seconds.
3. Latest version of the dbt-fabric adapter from the PyPI (Python Package Index)
repository using pip install dbt-fabric .
PowerShell
Note
By changing pip install dbt-fabric to pip install dbt-synapse and using
the following instructions, you can install the dbt adapter for Synapse
dedicated SQL pool .
4. Make sure to verify that dbt-fabric and its dependencies are installed by using the
pip list command:
PowerShell
pip list
A long list of the packages and current versions should be returned from this
command.
5. If you don't already have one, create a Warehouse. You can use the trial capacity
for this exercise: sign up for the Microsoft Fabric free trial , create a workspace,
and then create a warehouse.
You can clone a repo with Visual Studio Code's built-in source control.
Or, for example, you can use the git clone command:
PowerShell
yml
config:
  partial_parse: true
jaffle_shop:
  target: fabric-dev
  outputs:
    fabric-dev:
      authentication: CLI
      database: <put the database name here>
      driver: ODBC Driver 18 for SQL Server
      host: <enter your SQL analytics endpoint here>
      schema: dbo
      threads: 4
      type: fabric
Note
Change the type from fabric to synapse to switch the database adapter to
Azure Synapse Analytics, if desired. Any existing dbt project's data platform
can be updated by changing the database adapter. For more information, see
the dbt list of supported data platforms .
Run az login in the Visual Studio Code terminal if you're using Azure CLI
authentication.
For Service Principal or other Microsoft Entra ID (formerly Azure Active
Directory) authentication in Microsoft Fabric, refer to dbt (Data Build Tool)
setup and dbt Resource Configurations . For more information, see
Microsoft Entra authentication as an alternative to SQL authentication in
Microsoft Fabric.
6. Now you're ready to test the connectivity. To test the connectivity to your
warehouse, run dbt debug in the Visual Studio Code terminal.
PowerShell
dbt debug
If all checks pass, you can connect to your warehouse by using the dbt-fabric adapter
from the jaffle_shop dbt project.
7. Now it's time to test whether the adapter is working. First, run dbt seed to insert
sample data into the warehouse.
PowerShell
dbt seed
8. Run dbt run to run the models defined in the demo dbt project.
PowerShell
dbt run
9. Run dbt test to run the tests defined in the demo dbt project.
PowerShell
dbt test
1. Install the new adapter. For more information and full installation instructions, see
dbt adapters .
For more information to operationalize dbt with your warehouse, see Transform data
using dbt with Data Factory in Microsoft Fabric.
Considerations
Important things to consider when using dbt-fabric adapter:
Some T-SQL commands, such as ALTER TABLE, are supported by the dbt-fabric adapter
by using CREATE TABLE AS SELECT (CTAS), DROP, and CREATE commands.
Review Unsupported data types to learn about the supported and unsupported
data types.
You can log issues on the dbt-fabric adapter on GitHub by visiting Issues ·
microsoft/dbt-fabric · GitHub .
Next step
Transform data using dbt with Data Factory in Microsoft Fabric
Related content
What is data warehousing in Microsoft Fabric?
Tutorial: Create a Warehouse in Microsoft Fabric
Tutorial: Transform data using a stored procedure
Source Control with Warehouse
Applies to: SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric
Visualizations and analysis in Power BI reports can now be built in the web - or in just a
few steps in Power BI Desktop - saving users time and resources and, by default, providing
a seamless consumption experience for end users. The default Power BI semantic model
follows the naming convention of the Lakehouse.
Power BI semantic models represent a source of data ready for reporting, visualization,
discovery, and consumption. Power BI semantic models provide:
Note
Microsoft has renamed the Power BI dataset content type to semantic model. This
applies to Microsoft Fabric as well. For more information, see New name for Power
BI datasets.
Direct Lake provides the most performant query and reporting experience. Direct Lake is
a fast path to consume the data from the data lake straight into the Power BI engine,
ready for analysis.
In traditional DirectQuery mode, the Power BI engine directly queries the data from
the source for each query execution, and the query performance depends on the
data retrieval speed. DirectQuery eliminates the need to copy data, ensuring that
any changes in the source are immediately reflected in query results.
In Import mode, the performance is better because the data is readily available in
memory, without having to query the data from the source for each query
execution. However, the Power BI engine must first copy the data into the memory,
at data refresh time. Any changes to the underlying data source are picked up
during the next data refresh.
Direct Lake mode eliminates the Import requirement by consuming the data files
directly into memory. Because there's no explicit import process, it's possible to
pick up any changes at the source as they occur. Direct Lake combines the
advantages of DirectQuery and Import mode while avoiding their disadvantages.
Direct Lake mode is the ideal choice for analyzing very large datasets and datasets
with frequent updates at the source. Direct Lake automatically falls back to
DirectQuery mode, using the SQL analytics endpoint of the Lakehouse or the Warehouse,
when Direct Lake exceeds limits for the SKU or uses unsupported features, allowing
report users to continue uninterrupted.
Direct Lake mode is the storage mode for default Power BI semantic models and for new
Power BI semantic models created in a Warehouse or SQL analytics endpoint. Using
Power BI Desktop, you can also create Power BI semantic models that use the Warehouse
or SQL analytics endpoint as a data source, in Import or DirectQuery storage mode.
The default semantic model is queried via the SQL analytics endpoint and updated via
changes to the Lakehouse or Warehouse. You can also query the default semantic model
via cross-database queries from a Warehouse.
1. Manually enable the Sync the default Power BI semantic model setting for each
Warehouse or SQL analytics endpoint in the workspace. This will restart the
background sync that will incur some consumption costs.
2. Manually pick tables and views to be added to semantic model through Manage
default Power BI semantic model in the ribbon or info bar.
Note
If you aren't using the default Power BI semantic model for reporting purposes,
manually disable the Sync the default Power BI semantic model setting to avoid
adding objects automatically. The setting update ensures that the background sync
won't get triggered and saves on OneLake consumption costs.
2. Review the default layout for the default semantic model objects.
The default layout for BI-enabled tables persists in the user session and is generated
whenever a user navigates to the model view. Look for the Default semantic model
objects tab.
To load the semantic model, select the name of the semantic model.
SQL Server Profiler installs with SQL Server Management Studio (SSMS), and allows
tracing and debugging of semantic model events. Although officially deprecated for SQL
Server, Profiler is still included in SSMS and remains supported for Analysis Services and
Power BI. Use with the Fabric default Power BI semantic model requires SQL Server
Profiler version 18.9 or higher. Users must specify the semantic model as the initial
catalog when connecting with the XMLA endpoint. To learn more, see SQL Server
Profiler for Analysis Services.
View the Tabular Model Scripting Language (TMSL) schema of the semantic model by
scripting it out via the Object Explorer in SSMS. To connect, use the semantic model's
connection string, which looks like powerbi://api.powerbi.com/v1.0/myorg/username . You
can find the connection string for your semantic model in the Settings, under Server
settings. From there, you can generate an XMLA script of the semantic model via
SSMS's Script context menu action. For more information, see Dataset connectivity with
the XMLA endpoint.
Scripting requires Power BI write permissions on the Power BI semantic model. With
read permissions, you can see the data but not the schema of the Power BI semantic
model.
Create a new Power BI semantic model in
Direct Lake storage mode
You can also create additional Power BI semantic models in Direct Lake mode based off
SQL analytics endpoint or Warehouse data. These new Power BI semantic models can
be edited in the workspace using Open data model and can be used with other features
such as write DAX queries and semantic model row-level security.
The New Power BI semantic model button creates a new blank semantic model
separate from the default semantic model.
To create a Power BI semantic model in Direct Lake mode, follow these steps:
1. Open the lakehouse and select New Power BI semantic model from the ribbon.
3. Enter a name for the new semantic model, select a workspace to save it in, and
pick the tables to include. Then select Confirm.
4. The new Power BI semantic model can be edited in the workspace, where you can
add relationships, measures, rename tables and columns, choose how values are
displayed in report visuals, and much more. If the model view does not show after
creation, check the pop-up blocker of your browser.
5. To edit the Power BI semantic model later, select Open data model from the
semantic model context menu or item details page to edit the semantic model
further.
Power BI reports can be created in the workspace by selecting New report from web
modeling, or in Power BI Desktop by live connecting to this new semantic model.
To learn more on how to edit data models in the Power BI service, see Edit Data Models.
1. Open Power BI Desktop, sign in, and click on OneLake data hub.
3. Select the Connect button dropdown and choose Connect to SQL endpoint.
4. Select import or DirectQuery storage mode and the tables to add to the semantic
model.
From there you can create the Power BI semantic model and report to publish to the
workspace when ready.
Limitations
Default Power BI semantic models follow the current limitations for semantic models in
Power BI. Learn more:
If the parquet, Apache Spark, or SQL data types can't be mapped to one of the Power BI
desktop data types, they are dropped as part of the sync process. This is in line with
current Power BI behavior. For these columns, we recommend that you add explicit type
conversions in your ETL processes to convert them to a supported type. If data types
are needed upstream, you can optionally specify a view in SQL with the explicit type
conversion desired. This will be picked up by the sync, or it can be added manually as
previously indicated.
Default Power BI semantic models can only be edited in the SQL analytics endpoint
or warehouse.
Related content
Define relationships in data models for data warehousing in Microsoft Fabric
Model data in the default Power BI semantic model in Microsoft Fabric
Applies to: ✅ SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric
The default Power BI semantic model inherits all relationships between entities defined
in the model view and infers them as Power BI semantic model relationships, when
objects are enabled for BI (Power BI Reports). Inheriting the warehouse's business logic
allows a warehouse developer or BI analyst to decrease the time to value towards
building a useful semantic model and metrics layer for analytical business intelligence
(BI) reports in Power BI, Excel, or external tools like Tableau that read the XMLA format.
While all constraints are translated to relationships, currently in Power BI, only one
relationship can be active at a time, whereas multiple primary and foreign key
constraints can be defined for warehouse entities and are shown visually in the diagram
lines. The active Power BI relationship is represented with a solid line and the rest are
represented with dotted lines. We recommend choosing the primary relationship as
active for BI reporting purposes.
Note
Microsoft has renamed the Power BI dataset content type to semantic model. This
applies to Microsoft Fabric as well. For more information, see New name for Power
BI datasets.
Column name: RelyOnReferentialIntegrity
Description: A boolean value that indicates whether the relationship can rely on
referential integrity or not.
To add objects such as tables or views to the default Power BI semantic model, you have
options:
Manually enable the Sync the default Power BI semantic model setting that will
automatically add objects to the semantic model. For more information, see Sync
the default Power BI semantic model.
Manually add objects to the semantic model.
The auto detect experience detects any tables or views and opportunistically adds
them to the default semantic model.
The manually detect option in the ribbon allows fine-grained control of which objects,
such as tables and/or views, should be added to the default Power BI semantic model:
Select all
Filter for tables or views
Select specific objects
To remove objects, a user can use the manually select button in the ribbon and:
Un-select all
Filter for tables or views
Un-select specific objects
Tip
We recommend reviewing the objects enabled for BI and ensuring they have the
correct logical relationships to ensure a smooth downstream reporting experience.
Related content
Model data in the default Power BI semantic model in Microsoft Fabric
Default Power BI semantic models in Microsoft Fabric
Create reports in the Power BI service in Microsoft Fabric and Power BI Desktop
Share your warehouse and manage permissions
Applies to: ✅ SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric
Warehouse modeling
Modeling the warehouse is possible by setting primary and foreign key constraints and
setting identity columns on the model layouts within the data warehouse user interface.
After you navigate to the model layouts, you can do this in a visual entity relationship
diagram that allows a user to drag and drop tables to infer how the objects relate to one
another. Lines visually connecting the entities infer the type of physical relationships
that exist.
In the model layouts, users can model their warehouse and the canonical autogenerated
default Power BI semantic model. We recommend modeling your data warehouse using
traditional Kimball methodologies, using a star schema, wherever possible. There are
two types of modeling possible:
You only see the table names and columns from which you can choose, you aren't
presented with a data preview, and the relationship choices you make are only validated
when you select Apply changes. Using the Properties pane and its streamlined
approach reduces the number of queries generated when editing a relationship, which
can be important for big data scenarios, especially when using DirectQuery connections.
Relationships created using the Properties pane can also use multi-select in the Model
view diagram layouts. Press the Ctrl key and select more than one line to select multiple
relationships. Common properties can be edited in the Properties pane, and Apply
changes processes the changes in one transaction.
Currently, the model layouts are only persisted in session. However, the database
changes are persisted. Users can use the auto-layout whenever a new tab is created to
visually inspect the database design and understand the modeling.
Next step
Model data in the default Power BI semantic model in Microsoft Fabric
Applies to: ✅ SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric
This article describes three different scenarios you can follow to create reports in the
Power BI service.
If no tables have been added to the default Power BI semantic model, the dialog first
adds tables automatically, prompting the user to confirm or manually select the tables
to include in the canonical default semantic model, ensuring there's always data to
report on.
With a default semantic model that has tables, the New report opens a browser tab to
the report editing canvas to a new report that is built on the semantic model. When you
save your new report you're prompted to choose a workspace, provided you have write
permissions for that workspace. If you don't have write permissions, or if you're a free
user and the semantic model resides in a Premium capacity workspace, the new report is
saved in your My workspace.
In the OneLake catalog, you see warehouses and their associated default semantic
models. Select the warehouse to navigate to the warehouse details page. You can see
the warehouse metadata, supported actions, lineage and impact analysis, along with
related reports created from that warehouse. Default semantic models derived from a
warehouse behave the same as any semantic model.
To find the warehouse, you begin with the OneLake catalog. The following image shows
the OneLake catalog in the Power BI service:
1. Use the Data hub menu in the ribbon to get a list of all items.
3. On the Connect button, select the dropdown, and select Connect to SQL
endpoint.
Related content
Connectivity
Create reports
Tutorial: Get started creating in the Power BI service
This article covers security topics for securing the SQL analytics endpoint of the
lakehouse and the Warehouse in Microsoft Fabric.
For information on connecting to the SQL analytics endpoint and Warehouse, see
Connectivity.
Workspace roles
Workspace roles are used for development team collaboration within a workspace. Role
assignment determines the actions available to the user and applies to all items within
the workspace.
For details on the specific Warehouse capabilities provided through workspace roles, see
Workspace roles in Fabric data warehousing.
Item permissions
In contrast to workspace roles, which apply to all items within a workspace, item
permissions can be assigned directly to individual Warehouses. The user will receive the
assigned permission on that single warehouse. The primary purpose for these
permissions is to enable sharing for downstream consumption of the Warehouse.
For details on the specific permissions provided for warehouses, see Share your
warehouse and manage permissions.
Granular security
Workspace roles and item permissions provide an easy way to assign coarse permissions
to a user for the entire warehouse. However, in some cases, more granular permissions
are needed for a user. To achieve this, standard T-SQL constructs can be used to provide
specific permissions to users.
Microsoft Fabric data warehousing supports several data protection technologies that
administrators can use to protect sensitive data from unauthorized access. By securing
or obfuscating data from unauthorized users or roles, these security features can
provide data protection in both a Warehouse and SQL analytics endpoint without
application changes.
Object-level security
For details on managing granular permissions in SQL, see SQL granular permissions.
Row-level security
Row-level security is a database security feature that restricts access to individual rows
or records within a database table based on specified criteria, such as user roles or
attributes. It ensures that users can only view or manipulate data that is explicitly
authorized for their access, enhancing data privacy and control.
For details on row-level security, see Row-level security in Fabric data warehousing.
Column-level security
Column-level security is a database security measure that limits access to specific
columns or fields within a database table, allowing users to see and interact with only
the authorized columns while concealing sensitive or restricted information. It offers
fine-grained control over data access, safeguarding confidential data within a database.
For details on column-level security, see Column-level security in Fabric data
warehousing.
Dynamic data masking
For details on dynamic data masking, see Dynamic data masking in Fabric data
warehousing.
Share a warehouse
Sharing is a convenient way to provide users read access to your Warehouse for
downstream consumption. Sharing allows downstream users in your organization to
consume a Warehouse using SQL, Spark, or Power BI. You can customize the level of
permissions that the shared recipient is granted to provide the appropriate level of
access.
For more information on sharing, see How to share your warehouse and manage
permissions.
Only team members who are currently collaborating on the solution should be
assigned to Workspace roles (Admin, Member, Contributor), as this provides them
access to all Items within the workspace.
If they primarily require read only access, assign them to the Viewer role and grant
read access on specific objects through T-SQL. For more information, see Manage
SQL granular permissions.
If they are higher privileged users, assign them to Admin, Member, or Contributor
roles. The appropriate role is dependent on the other actions that they need to
perform.
Other users, who only need access to an individual warehouse or require access to
only specific SQL objects, should be given Fabric Item permissions and granted
access through SQL to the specific objects.
You can manage permissions on Microsoft Entra ID (formerly Azure Active
Directory) groups, as well, rather than adding each specific member. For more
information, see Microsoft Entra authentication as an alternative to SQL
authentication in Microsoft Fabric.
For more information on how to access user audit logs, see Track user activities in
Microsoft Fabric and Operations list.
Related content
Connectivity
SQL granular permissions in Microsoft Fabric
How to share your warehouse and manage permissions
Microsoft Entra authentication as an alternative to SQL authentication in Microsoft
Fabric
This article covers technical methods that users and customers can employ to transition
from SQL authentication to Microsoft Entra authentication within Microsoft Fabric.
Microsoft Entra authentication is an alternative to usernames and passwords via SQL
authentication for signing in to the SQL analytics endpoint of the lakehouse or the
Warehouse in Microsoft Fabric. Microsoft Entra authentication is advisable and vital for
creating a secure data platform.
Microsoft Entra plays a crucial role in Microsoft Fabric's security for several reasons:
Authentication: Verify users and service principals using Microsoft Entra ID, which
grants access tokens for operations within Fabric.
Secure access: Connect securely to cloud apps from any device or network,
safeguarding requests made to Fabric.
Conditional access: Admins can set policies that assess user login context, control
access, or enforce extra verification steps.
Integration: Microsoft Entra ID seamlessly works with all Microsoft SaaS offerings,
including Fabric, allowing easy access across devices and networks.
Broad platform: Gain access to Microsoft Fabric with Microsoft Entra ID via any
method, whether through the Fabric portal, SQL connection string, REST API, or
XMLA endpoint.
Microsoft Entra adopts a complete Zero Trust policy, offering a superior alternative to
traditional SQL authentication limited to usernames and passwords. This approach:
Prevents user impersonation.
Enables fine-grained access control considering user identity, environment,
devices, etc.
Supports advanced security like Microsoft Entra multifactor authentication.
Fabric configuration
Microsoft Entra authentication for use with a Warehouse or Lakehouse SQL analytics
endpoint requires configuration in both Tenant and Workspace settings.
Tenant setting
A Fabric admin in your tenant must permit service principal name (SPN) access to
Fabric APIs, which is necessary for an SPN to connect via SQL connection strings to
Fabric warehouse or SQL analytics endpoint items.
This setting is located in the Developer settings section and is labeled Service principals
can use Fabric APIs. Make sure it is Enabled.
Workspace setting
A Fabric admin in your workspace must grant access for a user or SPN to access Fabric
items.
1. In the Manage access option in the Workspace, assign the Contributor role.
For more information, see Service roles.
You can alter the default permissions given to the User or SPN by the system. Use the T-
SQL GRANT and DENY commands to alter permissions as required, or ALTER ROLE to
add membership to roles.
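For example, a minimal sketch (the object, role, and principal names are hypothetical):
SQL
GRANT SELECT ON dbo.DimCustomer TO [[email protected]];
DENY SELECT ON dbo.FactSalary TO [[email protected]];
-- Assumes a database role named SalesReaders already exists.
ALTER ROLE SalesReaders ADD MEMBER [[email protected]];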
Currently, SPNs don't have the same capability as user accounts for detailed permission
configuration with GRANT / DENY.
User identities are the unique credentials for each user within an organization.
SPNs represent application objects within a tenant and act as the identity for
instances of applications, taking on the role of authenticating and authorizing
those applications.
Fabric is compatible with any application or tool able to connect to a product with the
SQL Database Engine. Similar to a SQL Server instance connection, TDS operates on TCP
port 1433. For more information about Fabric SQL connectivity and finding the SQL
connection string, see Connectivity.
Applications and client tools can set the Authentication connection property in the
connection string to choose a Microsoft Entra authentication mode. The following table
details the different Microsoft Entra authentication modes, including support for
Microsoft Entra multifactor authentication (MFA).
Microsoft Entra Interactive
Scenarios: Utilized by applications or tools in situations where user authentication can
occur interactively, or when it is acceptable to have manual intervention for credential
verification.
Comments: Activate MFA and Microsoft Entra Conditional Access policies to enforce
organizational rules.
Microsoft Entra Service Principal
Scenarios: Used by apps for secure authentication without human intervention, most suited
for application integration.
Comments: Advisable to enable Microsoft Entra Conditional Access policies.
Microsoft Entra Password
Scenarios: When applications can't use SPN-based authentication due to incompatibility, or
require a generic username and password for many users, or if other methods are infeasible.
Comments: MFA must be off, and no conditional access policies can be set. We recommend
validating with the customer's security team before opting for this solution.
However, sometimes it's necessary to adjust additional settings such as enabling certain
ports or firewalls to facilitate Microsoft Entra authentication on the host machine.
Applications and tools must upgrade drivers to versions that support Microsoft Entra
authentication and add an authentication mode keyword in their SQL connection string,
like ActiveDirectoryInteractive , ActiveDirectoryServicePrincipal , or
ActiveDirectoryPassword .
Microsoft OLE DB
The OLE DB Driver for SQL Server is a stand-alone data access API designed for OLE DB
and first released with SQL Server 2005 (9.x). Since then, its features have expanded to
include SPN-based authentication in version 18.5.0, adding to the authentication methods
available in earlier versions.
For more information on Microsoft Entra authentication with ODBC, see Using Microsoft
Entra ID with the ODBC Driver sample code.
For a python code snippet using ODBC with SPN-based authentication, see pyodbc-dw-
connectivity.py .
The Microsoft JDBC Driver for SQL Server requires additional jars as dependencies, which
must be compatible with the version of the mssql-jdbc driver used in your application. For
more information, see Feature dependencies of JDBC driver and Client setup requirement.
For a java code snippet using JDBC with SPN-based authentication, see
fabrictoolbox/dw_connect.java and sample pom file pom.xml .
Microsoft.Data.SqlClient.Connect.cs
System.Data.SqlClient.Connect.cs
Related content
Connectivity
Security for data warehousing in Microsoft Fabric
Workspace roles in Fabric data
warehousing
Article • 07/18/2024
This article details the permissions that workspace roles provide in SQL analytics
endpoint and Warehouse. For instructions on assigning workspace roles, see Give
Workspace Access.
Workspace roles
Assigning users to the various workspace roles provides the following capabilities:
Workspace Description
role
Admin Grants the user CONTROL access for each Warehouse and SQL analytics endpoint
within the workspace, providing them with full read/write permissions and the
ability to manage granular user SQL permissions.
Member Grants the user CONTROL access for each Warehouse and SQL analytics endpoint
within the workspace, providing them with full read/write permissions and the
ability to manage granular user SQL permissions.
Contributor Grants the user CONTROL access for each Warehouse and SQL analytics endpoint
within the workspace, providing them with full read/write permissions and the
ability to manage granular user SQL permissions.
Viewer Grants the user CONNECT and ReadData permissions for each Warehouse and
SQL analytics endpoint within the workspace. Viewers have SQL permissions to
read data from tables/views using T-SQL. For more information, see Manage SQL
granular permissions.
Related content
Security for data warehousing in Microsoft Fabric
SQL granular permissions
Row-level security in Fabric data warehousing
Column-level security in Fabric data warehousing
Limitations
CREATE USER cannot be explicitly executed currently. When GRANT or DENY is
executed, the user is created automatically. The user will not be able to connect
until sufficient workspace level rights are given.
View my permissions
When a user connects to the SQL connection string, they can view the permissions
available to them using the sys.fn_my_permissions function.
SQL
SELECT *
FROM sys.fn_my_permissions(NULL, 'Database');
SQL
SELECT *
FROM sys.fn_my_permissions('<schema-name>', 'Schema');
SQL
SELECT *
FROM sys.fn_my_permissions('<schema-name>.<object-name>', 'Object');
SQL
Sharing is a convenient way to provide users read access to your data for downstream
consumption. Sharing allows downstream users in your organization to consume a
Warehouse using T-SQL, Spark, or Power BI. You can customize the level of permissions
that the shared recipient is granted to provide the appropriate level of access.
Note
Get started
After identifying the Warehouse item you would like to share with another user in your
Fabric workspace, select the quick action in the row to Share.
The following animated gif reviews the steps to select a warehouse to share, select the
permissions to assign, and then finally Grant the permissions to another user.
Share a warehouse
1. You can share your Warehouse from the OneLake data hub or Warehouse item by
choosing Share from quick action, as highlighted in the following image.
2. You're prompted with options to select who you would like to share the
Warehouse with, what permissions to grant them, and whether they'll be notified
by email.
4. When the shared recipient receives the email, they can select Open and navigate
to the Warehouse Data Hub page.
5. Depending on the level of access the shared recipient has been granted, the
shared recipient is now able to connect to the SQL analytics endpoint, query the
Warehouse, build reports, or read data through Spark.
Tip
"Read all data using SQL" is selected ("ReadData" permissions)- The shared
recipient can read all the objects within the Warehouse. ReadData is the equivalent
of db_datareader role in SQL Server. The shared recipient can read data from all
tables and views within the Warehouse. If you want to further restrict and provide
granular access to some objects within the Warehouse, you can do this using T-
SQL GRANT / REVOKE / DENY statements.
In the SQL analytics endpoint of the Lakehouse, "Read all SQL Endpoint data" is
equivalent to "Read all data using SQL".
"Read all data using Apache Spark" is selected ("ReadAll" permissions)- The
shared recipient has read access to the underlying parquet files in OneLake, which
can be consumed using Spark. ReadAll should be provided only if the shared
recipient wants complete access to your warehouse's files using the Spark engine.
ReadData permissions
With ReadData permissions, the shared recipient can open the Warehouse editor in
read-only mode and query the tables and views within the Warehouse. The shared
recipient can also choose to copy the SQL analytics endpoint provided and connect to a
client tool to run these queries.
ReadAll permissions
A shared recipient with ReadAll permissions can find the Azure Blob File System (ABFS)
path to the specific file in OneLake from the Properties pane in the Warehouse editor.
The shared recipient can then use this path within a Spark Notebook to read this data.
For example, in the following screenshot, a user with ReadAll permissions can query the
data in FactSale with a Spark query in a new notebook.
Build permissions
With Build permissions, the shared recipient can create reports on top of the default
semantic model that is connected to the Warehouse. The shared recipient can create
Power BI reports from the Data Hub, or do the same by using Power BI Desktop.
Manage permissions
The Manage permissions page shows the list of users who have been given access by
either assigning to Workspace roles or item permissions.
For users who were provided workspace roles, you'll see the corresponding user,
workspace role, and permissions. Members of the Admin, Member, and Contributor
workspace roles have read/write access to items in this workspace. Viewers have
ReadData permissions and can query all tables and views within the Warehouse in that
workspace. Item permissions Read, ReadData, and ReadAll can be provided to users.
Limitations
If you provide item permissions or remove users who previously had permissions,
permission propagation can take up to two hours. The new permissions are visible
in Manage permissions immediately. Sign in again to ensure that the permissions
are reflected in your SQL analytics endpoint.
Shared recipients are able to access the Warehouse using the owner's identity
(delegated mode). Ensure that the owner of the Warehouse is not removed from
the workspace.
Shared recipients only have access to the Warehouse they receive and not any
other items within the same workspace as the Warehouse. If you want to provide
permissions for other users in your team to collaborate on the Warehouse (read
and write access), add them as Workspace roles such as Member or Contributor.
Currently, when you share a Warehouse and choose Read all data using SQL, the
shared recipient can access the Warehouse editor in a read-only mode. These
shared recipients can create queries, but cannot currently save their queries.
Currently, sharing a Warehouse is only available through the user experience.
If you want to provide granular access to specific objects within the Warehouse,
share the Warehouse with no additional permissions, then provide granular access
to specific objects using T-SQL GRANT statement. For more information, see T-SQL
syntax for GRANT, REVOKE, and DENY.
If you see that the ReadAll permissions and ReadData permissions are disabled in
the sharing dialog, refresh the page.
Shared recipients do not have permission to reshare a Warehouse.
If a report built on top of the Warehouse is shared with another recipient, the
shared recipient needs more permissions to access the report. This depends on the
mode of access to the semantic model by Power BI:
If accessed through DirectQuery mode, then ReadData permissions (or granular
SQL permissions to specific tables/views) need to be provided to the
Warehouse.
If accessed through Direct Lake mode, then ReadData permissions (or granular
permissions to specific tables/views) need to be provided to the Warehouse.
Direct Lake mode is the default connection type for semantic models that use a
Warehouse or SQL analytics endpoint as a data source. For more information,
see Direct Lake mode.
If accessed through Import mode then no additional permissions are needed.
Currently, sharing a warehouse directly with an SPN is not supported.
Related content
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
How to use Microsoft Fabric notebooks
Access Fabric OneLake shortcuts in an Apache Spark notebook
Navigate the Fabric Lakehouse explorer
GRANT (Transact-SQL)
Column-level security simplifies the design and coding of security in your application,
allowing you to restrict column access to protect sensitive data. For example, ensuring
that specific users can access only certain columns of a table pertinent to their
department.
Implement column-level security with the GRANT T-SQL statement. For simplicity of
management, assigning permissions to roles is preferred to using individuals.
Only Microsoft Entra authentication is supported. For more information, see Microsoft
Entra authentication as an alternative to SQL authentication in Microsoft Fabric.
Examples
This example will create a table and will limit the columns that [email protected] can
see in the customers table.
SQL
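-- A sketch, not the original sample: a customers table with a sensitive CreditCard column.
CREATE TABLE dbo.Customers
(
    CustomerID int,
    FirstName varchar(100),
    LastName varchar(100),
    Phone varchar(12),
    CreditCard varchar(19)
);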
We will allow Charlie to only access the columns related to the customer, but not the
sensitive CreditCard column:
SQL
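-- A sketch, not the original sample: a column-level GRANT that omits CreditCard.
GRANT SELECT ON dbo.Customers
    (CustomerID, FirstName, LastName, Phone) TO [[email protected]];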
Queries executed as [email protected] will fail if they include the CreditCard column:
SQL
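-- A sketch, not the original sample: executed as [email protected], this query fails
-- because it references the CreditCard column.
SELECT CustomerID, FirstName, LastName, CreditCard
FROM dbo.Customers;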
Output
Next step
Implement column-level security in Fabric Data Warehousing
Related content
Security for data warehousing in Microsoft Fabric
Share your warehouse and manage permissions
Row-level security in Fabric data warehousing
Dynamic data masking in Fabric data warehousing
Row-level security (RLS) enables you to use group membership or execution context to
control access to rows in a database table. For example, you can ensure that workers
access only those data rows that are pertinent to their department. Another example is
to restrict customers' data access to only the data relevant to their company in a
multitenant architecture. The feature is similar to row-level security in SQL Server.
The access restriction logic is in the database tier, not in any single application tier. The
database applies the access restrictions every time data access is attempted, from any
application or reporting platform including Power BI. This makes your security system
more reliable and robust by reducing the surface area of your security system. Row-level
security only applies to queries on a Warehouse or SQL analytics endpoint in Fabric.
Power BI queries on a warehouse in Direct Lake mode will fall back to DirectQuery mode
to abide by row-level security.
Filter predicates are applied while reading data from the base table. They affect all get
operations: SELECT , DELETE , and UPDATE . Each table must have its own row-level security
defined separately. Users who query tables without a row level security policy will view
unfiltered data.
Users can't select or delete rows that are filtered. The user can't update rows that are
filtered. But it's possible to update rows in such a way that they'll be filtered afterward.
You can define a predicate function that joins with another table and/or invokes a
function. If the security policy is created with SCHEMABINDING = ON (the default),
then the join or function is accessible from the query and works as expected
without any additional permission checks.
You can issue a query against a table that has a security predicate defined but
disabled. Any rows that are filtered or blocked aren't affected.
If a dbo user, a member of the db_owner role, or the table owner queries a table
that has a security policy defined and enabled, the rows are filtered or blocked as
defined by the security policy.
Attempts to alter the schema of a table bound by a schema bound security policy
will result in an error. However, columns not referenced by the predicate can be
altered.
Attempts to add a predicate on a table that already has one defined for the
specified operation results in an error. This will happen whether the predicate is
enabled or not.
Define a security policy that filters the rows of a table. The application is unaware
of any rows that are filtered for SELECT , UPDATE , and DELETE operations. Including
situations where all the rows are filtered out. The application can INSERT rows,
even if they will be filtered during any other operation.
Permissions
Creating, altering, or dropping security policies requires the ALTER ANY SECURITY POLICY
permission. Creating or dropping a security policy requires ALTER permission on the
schema.
Additionally, the following permissions are required for each predicate that is added:
REFERENCES permission on every column from the target table used as arguments.
Security policies apply to all users, including dbo users in the database. Dbo users can
alter or drop security policies however their changes to security policies can be audited.
If members of roles like Administrator, Member, or Contributor need to see all rows to
troubleshoot or validate data, the security policy must be written to allow that.
If a security policy is created with SCHEMABINDING = OFF , then to query the target table,
users must have the SELECT or EXECUTE permission on the predicate function and any
additional tables, views, or functions used within the predicate function. If a security
policy is created with SCHEMABINDING = ON (the default), then these permission checks
are bypassed when users query the target table.
For example, a malicious user could craft a query that produces a divide-by-zero error
whenever a salary value equals a probe amount, such as $100,000. Even though there is a
security predicate in place to prevent a malicious user from directly querying other
people's salary, the user can determine that the probe amount is correct when the query
returns a divide-by-zero exception.
Examples
We can demonstrate row-level security in Warehouse and SQL analytics endpoint in
Microsoft Fabric.
The following example creates sample tables that will work with Warehouse in Fabric,
but in SQL analytics endpoint use existing tables. In the SQL analytics endpoint, you
cannot CREATE TABLE , but you can CREATE SCHEMA , CREATE FUNCTION , and CREATE
SECURITY POLICY .
SQL
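-- A minimal sketch (not the original sample code): a hypothetical dbo.Sales table
-- whose SalesRep column stores the user principal name that should be allowed to
-- see each row, plus a few test rows. Table and column names are assumptions.
CREATE TABLE dbo.Sales
(
    OrderID INT,
    SalesRep VARCHAR(100),
    Product VARCHAR(50),
    Quantity INT
);

INSERT INTO dbo.Sales (OrderID, SalesRep, Product, Quantity) VALUES
(1, 'user1@contoso.com', 'Valve', 5),
(2, 'user1@contoso.com', 'Wheel', 2),
(3, 'user2@contoso.com', 'Valve', 4);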
SQL
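-- A minimal sketch, assuming the hypothetical dbo.Sales table above: create a
-- Security schema, an inline table-valued predicate function that compares the
-- SalesRep value with the caller, and the SalesFilter security policy referenced
-- later in this article.
CREATE SCHEMA Security;
GO

CREATE FUNCTION Security.tvf_securitypredicate(@SalesRep AS VARCHAR(100))
    RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS tvf_securitypredicate_result
    WHERE @SalesRep = USER_NAME();
GO

CREATE SECURITY POLICY SalesFilter
ADD FILTER PREDICATE Security.tvf_securitypredicate(SalesRep)
ON dbo.Sales
WITH (STATE = ON);
GO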
To modify a row level security function, you must first drop the security policy. In the
following script, we drop the policy SalesFilter before issuing an ALTER FUNCTION
statement on Security.tvf_securitypredicate . Then, we recreate the policy
SalesFilter .
SQL
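-- A minimal sketch of the drop/alter/recreate pattern described above, assuming the
-- hypothetical objects from the previous example. The extra principal in the altered
-- predicate is purely illustrative.
DROP SECURITY POLICY SalesFilter;
GO

ALTER FUNCTION Security.tvf_securitypredicate(@SalesRep AS VARCHAR(100))
    RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS tvf_securitypredicate_result
    WHERE @SalesRep = USER_NAME() OR USER_NAME() = 'manager@contoso.com';
GO

CREATE SECURITY POLICY SalesFilter
ADD FILTER PREDICATE Security.tvf_securitypredicate(SalesRep)
ON dbo.Sales
WITH (STATE = ON);
GO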
Next step
Implement row-level security in Fabric Data Warehousing
Related content
Security for data warehousing in Microsoft Fabric
Share your warehouse and manage permissions
Column-level security in Fabric data warehousing
Dynamic data masking in Fabric data warehousing
Dynamic data masking helps prevent unauthorized viewing of sensitive data by enabling
administrators to specify how much sensitive data to reveal, with minimal effect on the
application layer. Dynamic data masking can be configured on designated database
fields to hide sensitive data in the result sets of queries. With dynamic data masking, the
data in the database isn't changed, so it can be used with existing applications since
masking rules are applied to query results. Many applications can mask sensitive data
without modifying existing queries.
A central data masking policy acts directly on sensitive fields in the database.
Designate privileged users or roles that do have access to the sensitive data.
Dynamic data masking features full masking and partial masking functions, and a
random mask for numeric data.
Simple Transact-SQL commands define and manage masks.
The purpose of dynamic data masking is to limit exposure of sensitive data, preventing
users who shouldn't have access to the data from viewing it. Dynamic data masking
doesn't aim to prevent database users from connecting directly to the database and
running exhaustive queries that expose pieces of the sensitive data.
Dynamic data masking is complementary to other Fabric security features like column-
level security and row-level security. It's highly recommended to use these data
protection features together in order to protect the sensitive data in the database.
The following masking functions are available. Each entry lists the function, its description, and example syntax.

Default: Full masking according to the data types of the designated fields. For string data types (char, nchar, varchar, nvarchar, text, ntext), the value is masked with XXXX, or fewer Xs if the size of the field is fewer than 4 characters. Example column definition syntax: Phone# varchar(12) MASKED WITH (FUNCTION = 'default()') NULL . Example of alter syntax: ALTER COLUMN Gender ADD MASKED WITH (FUNCTION = 'default()') .

Email: Masking method that exposes the first letter of an email address and the constant suffix ".com", in the form of an email address. Example definition syntax: Email varchar(100) MASKED WITH (FUNCTION = 'email()') NULL . Example of alter syntax: ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()') .

Random: A random masking function for use on any numeric type to mask the original value with a random value within a specified range. Example definition syntax: Account_Number bigint MASKED WITH (FUNCTION = 'random([start range], [end range])') . Example of alter syntax: ALTER COLUMN [Month] ADD MASKED WITH (FUNCTION = 'random(1, 12)') .

Custom String: Masking method that exposes the first and last letters and adds a custom padding string in the middle: prefix,[padding],suffix. If the original value is too short to complete the entire mask, part of the prefix or suffix isn't exposed. Example definition syntax: FirstName varchar(100) MASKED WITH (FUNCTION = 'partial(prefix,[padding],suffix)') NULL . Example of alter syntax: ALTER COLUMN [Phone Number] ADD MASKED WITH (FUNCTION = 'partial(1,"XXXXXXX",0)') , which exposes only the first character of the phone number and pads the rest.
Additional example:
For more examples, see How to implement dynamic data masking in Synapse Data
Warehouse.
Permissions
Users without the Administrator, Member, or Contributor rights on the workspace, and
without elevated permissions on the Warehouse, will see masked data.
You don't need any special permission to create a table with a dynamic data mask, only
the standard CREATE TABLE and ALTER on schema permissions.
Adding, replacing, or removing the mask of a column requires the ALTER ANY
MASK permission and ALTER permission on the table. It's appropriate to grant ALTER ANY
MASK only to a security officer.
Users with SELECT permission on a table can view the table data. Columns that are
defined as masked will display masked data. Grant the UNMASK permission to a user to
enable them to retrieve unmasked data from the columns for which masking is defined.
The CONTROL permission on the database includes both the ALTER ANY
MASK and UNMASK permission that enables the user to view unmasked data.
As an example, consider a user that has sufficient privileges to run queries on the
Warehouse, and tries to 'guess' the underlying data and ultimately infer the actual
values. Assume that we have a mask defined on the [Employee].[Salary] column, and
this user connects directly to the database and starts guessing values, eventually
inferring the [Salary] value in the Employees table:
SQL
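-- A hedged sketch of the kind of guessing query described above, assuming a
-- hypothetical dbo.Employees table with a masked Salary column.
SELECT ID, Name, Salary
FROM dbo.Employees
WHERE Salary > 99999 AND Salary < 100001;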
Results in:
ID Name Salary
This demonstrates that dynamic data masking shouldn't be used alone to fully secure
sensitive data from users with query access to the Warehouse or SQL analytics endpoint.
It's appropriate for preventing sensitive data exposure, but doesn't protect against
malicious intent to infer the underlying data.
It's important to properly manage object-level security with SQL granular permissions,
and to always follow the minimal required permissions principle.
Related content
Workspace roles in Fabric data warehousing
Column-level security in Fabric data warehousing
Row-level security in Fabric data warehousing
Security for data warehousing in Microsoft Fabric
Next step
How to implement dynamic data masking in Synapse Data Warehouse
Column-level security (CLS) in Microsoft Fabric allows you to control access to columns
in a table based on specific grants on these tables. For more information, see Column-
level security in Fabric data warehousing.
This guide will walk you through the steps to implement column-level security in a
Warehouse or SQL analytics endpoint.
Prerequisites
Before you begin, make sure you have the following:
1. Connect
1. Log in using an account with elevated access on the Warehouse or SQL analytics
endpoint. (Either Admin/Member/Contributor role on the workspace or Control
Permissions on the Warehouse or SQL analytics endpoint).
2. Open the Fabric workspace and navigate to the Warehouse or SQL analytics
endpoint where you want to apply column-level security.
2. Implement column-level security with the GRANT T-SQL statement and a column
list. For simplicity of management, assigning permissions to roles is preferred to
using individuals.
SQL
-- Grant select to subset of columns of a table
GRANT SELECT ON YourSchema.YourTable
(Column1, Column2, Column3, Column4, Column5)
TO [SomeGroup];
3. Replace YourSchema with the name of your schema and YourTable with the name
of your target table.
5. Replace the comma-delimited columns list with the columns you want to give the
role access to.
6. Repeat these steps to grant specific column access for other tables if needed.
SQL
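-- A hedged sketch of testing the grant above: connect as a member of [SomeGroup]
-- and query only the granted columns.
SELECT Column1, Column2, Column3, Column4, Column5
FROM YourSchema.YourTable;

-- Selecting a column that was not included in the granted column list (for example,
-- a hypothetical Column6) fails with a permission error for this user.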
3. The user's results are similarly restricted in other applications that use Microsoft
Entra authentication for database access. For more information, see Microsoft
Entra authentication as an alternative to SQL authentication in Microsoft Fabric.
Related content
Column-level security for Fabric data warehousing
Row-level security in Fabric data warehousing
Row-level security (RLS) in Fabric Warehouse and SQL analytics endpoint allows you to
control access to rows in a database table based on user roles and predicates. For more
information, see Row-level security in Fabric data warehousing.
This guide will walk you through the steps to implement row-level security in Microsoft
Fabric Warehouse or SQL analytics endpoint.
Prerequisites
Before you begin, make sure you have the following:
1. Connect
1. Log in using an account with elevated access on the Warehouse or SQL analytics
endpoint. (Either Admin/Member/Contributor role on the workspace or Control
Permissions on the Warehouse or SQL analytics endpoint).
2. Open the Fabric workspace and navigate to the Warehouse or SQL analytics
endpoint where you want to apply row-level security.
2. Create security predicates. Security predicates are conditions that determine which
rows a user can access. You can create security predicates as inline table-valued
functions. This simple exercise assumes there is a column in your data table,
UserName_column , that contains the relevant username, populated by the system
function USER_NAME().
SQL
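-- A minimal sketch, assuming a hypothetical table dbo.YourTable with a
-- UserName_column that stores each row's owner, as described above. The schema,
-- function, and policy names are illustrative.
CREATE SCHEMA Security;
GO

CREATE FUNCTION Security.tvf_securitypredicate(@UserName AS VARCHAR(100))
    RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS tvf_securitypredicate_result
    WHERE @UserName = USER_NAME();
GO

CREATE SECURITY POLICY YourSecurityPolicy
ADD FILTER PREDICATE Security.tvf_securitypredicate(UserName_column)
ON dbo.YourTable
WITH (STATE = ON);
GO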
4. Replace UserName_column with a column in your table that contains user names.
5. Replace WHERE @UserName = USER_NAME(); with a WHERE clause that matches the
desired predicate-based security filter. For example, this filters the data where the
UserName column, mapped to the @UserName parameter, matches the result of the
USER_NAME() system function.
6. Repeat these steps to create security policies for other tables if needed.
SQL
SELECT USER_NAME()
2. Query the database tables to verify that row-level security is working as expected.
Users should only see data that satisfies the security predicate defined in their role.
For example:
SQL
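-- A hedged sketch: as a test user, query the protected table. Only rows where
-- UserName_column matches USER_NAME() should be returned.
SELECT * FROM dbo.YourTable;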
3. The user's results are similarly filtered in other applications that use
Microsoft Entra authentication for database access. For more information, see
Microsoft Entra authentication as an alternative to SQL authentication in Microsoft
Fabric.
Related content
Row-level security in Fabric data warehousing
Column-level security for Fabric data warehousing
For more information, see Dynamic data masking in Fabric data warehousing.
Prerequisites
Before you begin, make sure you have the following:
1. Connect
1. Open the Fabric workspace and navigate to the Warehouse you want to apply
dynamic data masking to.
2. Sign in using an account with elevated access on the Warehouse, either
Admin/Member/Contributor role on the workspace or Control Permissions on the
Warehouse.
2. Configure dynamic data masking
1. Sign into the Fabric portal with your admin account.
3. Select the New SQL query option, and under Blank, select New SQL query.
4. In your SQL script, define dynamic data masking rules using the MASKED WITH
FUNCTION clause. For example:
SQL
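-- A hedged sketch (not the original sample): a test table with masks chosen to match
-- the description below. The table name dbo.EmployeeData reappears later in this
-- article; the column types, Salary column, and sample row are assumptions.
CREATE TABLE dbo.EmployeeData
(
    EmployeeID INT,
    FirstName VARCHAR(50) MASKED WITH (FUNCTION = 'partial(1, "-", 2)') NULL,
    LastName  VARCHAR(50) MASKED WITH (FUNCTION = 'default()') NULL,
    SSN       CHAR(11)    MASKED WITH (FUNCTION = 'partial(0, "XXX-XX-", 4)') NULL,
    Salary    INT NULL
);

INSERT INTO dbo.EmployeeData (EmployeeID, FirstName, LastName, SSN, Salary)
VALUES (1, 'Maria', 'Anderson', '123-45-6789', 90000);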
The FirstName column shows only the first and last two characters of the
string, with - in the middle.
The LastName column shows XXXX .
The SSN column shows XXX-XX- followed by the last four characters of the
string.
7. The script will apply the specified dynamic data masking rules to the designated
columns in your table.
1. Sign in to a tool like Azure Data Studio or SQL Server Management Studio as the
test user, for example [email protected].
2. As the test user, run a query against the table. The masked data is displayed
according to the rules you defined.
SQL
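-- A hedged sketch of the test query: run as the test user, the masked values are
-- returned instead of the real data.
SELECT * FROM dbo.EmployeeData;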
3. With your admin account, grant the UNMASK permission to the test user.
SQL
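-- A hedged sketch; replace the placeholder principal with your actual test user.
GRANT UNMASK ON dbo.EmployeeData TO [TestUser@contoso.com];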
4. As the test user, verify that a user signed in as [email protected] can see
unmasked data.
SQL
5. With your admin account, revoke the UNMASK permission from the test user.
SQL
6. Verify that the test user cannot see unmasked data, only the masked data.
SQL
7. With your admin account, you can grant and revoke the UNMASK permission to and
from a role:
SQL
GRANT UNMASK ON dbo.EmployeeData TO [TestRole];
REVOKE UNMASK ON dbo.EmployeeData TO [TestRole];
1. You can add a mask to an existing column, using the MASKED WITH FUNCTION clause:
SQL
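-- A hedged sketch: add a mask to an existing, previously unmasked column. The Salary
-- column is an assumption carried over from the earlier table sketch.
ALTER TABLE dbo.EmployeeData
ALTER COLUMN Salary ADD MASKED WITH (FUNCTION = 'default()');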
SQL
5. Cleanup
1. To clean up this testing table:
SQL
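-- Remove the test table used in this walkthrough.
DROP TABLE dbo.EmployeeData;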
Related content
Dynamic data masking in Fabric data warehousing
Workspace roles in Fabric data warehousing
Column-level security in Fabric data warehousing
Row-level security in Fabric data warehousing
Security for data warehousing in Microsoft Fabric
Applies to: ✅ SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric
This article describes how to use the visual query editor in the Microsoft Fabric portal to
quickly and efficiently write queries. You can use the visual query editor for a no-code
experience to create your queries.
You can also query the data in your warehouse with multiple tools with a SQL
connection string.
You can use the SQL query editor to write T-SQL queries from the Microsoft Fabric
portal.
You can quickly view data in the Data preview.
Once you've loaded data into your warehouse, you can use the visual query editor to
create queries to analyze your data. There are two ways to get to the visual query editor:
In the ribbon, create a new query using the New visual query button, as shown in the
following image.
To create a query, drag and drop tables from the Object explorer onto the canvas. To
drag a table, select and hold the table until you see it's picked up from the Object
explorer before dragging. Once you drag one or more tables onto the canvas, you can
use the visual experience to design your queries. The warehouse editor uses the Power
Query diagram view experience to enable you to easily query and analyze your data.
Learn more about Power Query diagram view.
As you work on your visual query, the queries are automatically saved every few
seconds. A "saving indicator" appears in your query tab to indicate that your query is
being saved. All workspace users can save their queries in the My queries folder. However,
users in the Viewer role of the workspace or shared recipients of the warehouse are
restricted from moving queries to the Shared queries folder.
The following animated gif shows the merging of two tables using a no-code visual
query editor.
1. First, the table DimCity is dragged from the Explorer into the blank new visual
query editor.
2. Then, the table FactSale is dragged from the Explorer into the visual query editor.
3. In the visual query editor, in the context menu of DimCity , the Merge queries as
new Power Query operator is used to join them on a common key.
4. In the new Merge page, the CityKey column in each table is selected to be the
common key. The Join kind is Inner.
5. The new Merge operator is added to the visual query editor.
6. When you see results, you can use Download Excel file to view the results in Excel, or
Visualize results to create a report on the results.
Save as view
You can save your query as a view on which data load is enabled, using the Save as view
button. Select a schema in which you have permission to create views, provide a name for
the view, and verify the SQL statement before confirming the creation of the view. When the
view is successfully created, it appears in the Explorer.
View SQL
The View SQL feature allows you to see the SQL query based on the applied steps of
your visual query.
Select View query to see the resulting T-SQL, and Edit SQL script to edit the SQL query
in the query editor.
When writing queries that join two or more tables using the Merge queries
action, the query that has load enabled is reflected in the SQL script. To specify
which table's query should be shown in the SQL script, select the context menu and then
Enable load. Expand the merged table's columns in the results to see the steps
reflected in the SQL script.
Save as table
You can use Save as table to save your query results into a table, for the query with load
enabled. Select the warehouse in which you would like to save the results, select a schema
in which you have permission to create tables, and provide a table name; the results are
loaded into the table using a CREATE TABLE AS SELECT statement. When the table is
successfully created, it appears in the Explorer.
To create a cross-warehouse query, drag and drop tables from added warehouses
and add a merge activity. For example, in the following image, store_sales
is added from the sales warehouse and merged with the item table from the marketing
warehouse.
Related content
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
Query using the SQL query editor
Query insights in Fabric data warehousing
Applies to: ✅ SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric
This article describes how to use the SQL query editor in the Microsoft Fabric portal to
quickly and efficiently write queries, and suggestions on how best to see the information
you need.
You can also query the data in your warehouse with multiple tools with a SQL
connection string.
You can build queries graphically with the Visual query editor.
You can quickly view data in the Data preview.
The SQL query editor provides support for IntelliSense, code completion, syntax
highlighting, client-side parsing, and validation. You can run Data Definition Language
(DDL), Data Manipulation Language (DML), and Data Control Language (DCL)
statements.
Create a new query using the New SQL query button in the ribbon.
If you select SQL templates dropdown list, you can easily create T-SQL objects with
code templates that populate in your SQL query window, as shown in the following
image.
As you work on your SQL query, the queries are automatically saved every few seconds.
A "saving" indicator appears in your query tab to indicate that your query is being
saved.
The Results preview is displayed in the Results section. If the number of rows returned is
more than 10,000, the preview is limited to 10,000 rows. You can search for a string
within the results grid to get filtered rows matching the search criteria. The Messages tab
shows SQL messages returned when the SQL query is run.
The status bar indicates the query status, the duration of the run, and the number of rows
and columns returned in the results.
To enable the Save as view, Save as table, Open in Excel, Explore this data (preview), and
Visualize results menus, highlight a SQL statement containing a SELECT statement in
the SQL query editor.
Save as view
You can select the query and save it as a view using the Save as view button.
Select a schema in which you have permission to create views, provide a name for the view,
and verify the SQL statement before confirming the creation of the view. When the view is
successfully created, it appears in the Explorer.
Save as table
You can use Save as table to save your query results into a table. Select the warehouse
in which you would like to save the results, select a schema in which you have permission
to create tables, and provide a table name; the results are loaded into the table using a
CREATE TABLE AS SELECT statement. When the table is successfully created, it appears in the
Explorer.
Open in Excel
The Open in Excel button opens the corresponding T-SQL Query to Excel and executes
the query, enabling you to work with the results in Microsoft Excel on your local
computer.
1. After you select the Continue button, locate the downloaded Excel file in your
Windows File Explorer, for example, in the Downloads folder of your browser.
2. To see the data, select the Enable Editing button in the Protected View ribbon
followed by the Enable Content button in the Security Warning ribbon. Once both
are enabled, you are presented with the following dialog to approve running the
query listed.
3. Select Run.
4. Authenticate your account with the Microsoft account option. Select Connect.
Once you have successfully signed in, you'll see the data presented in the spreadsheet.
Explore this data (preview)
Explore this data (preview) provides the capability to perform ad-hoc exploration of
your query results. With this feature, you can launch a side-by-side matrix and visual
view to better understand any trends or patterns behind your query results before
diving into building a full Power BI report. For more information, see Explore your data
in the Power BI service.
Visualize results
Visualize results allows you to create reports from your query results within the SQL
query editor.
Copy
The Copy dropdown allows you to copy the results and/or column names in the data
grid. You can choose to copy the results with column names, copy only the results, or
copy only the column names.
Cross-warehouse querying
For more information on cross-warehouse querying, see Cross-warehouse querying.
You can write a T-SQL query with three-part naming convention to refer to objects and
join them across warehouses, for example:
SQL
SELECT
emp.Employee
,SUM(Profit) AS TotalProfit
,SUM(Quantity) AS TotalQuantitySold
FROM
[SampleWarehouse].[dbo].[DimEmployee] as emp
JOIN
[WWI_Sample].[dbo].[FactSale] as sale
ON
emp.EmployeeKey = sale.SalespersonKey
WHERE
emp.IsSalesperson = 'TRUE'
GROUP BY
emp.Employee
ORDER BY
TotalProfit DESC;
Keyboard shortcuts
Keyboard shortcuts provide a quick way to navigate and allow users to work more
efficiently in SQL query editor. The table in this article lists all the shortcuts available in
SQL query editor in the Microsoft Fabric portal:
Function Shortcut
Undo Ctrl + Z
Redo Ctrl + Y
Move cursor up ↑
Limitations
In SQL query editor, every time you run the query, it opens a separate session and
closes it at the end of the execution. This means if you set up session context for
multiple query runs, the context is not maintained for independent execution of
queries.
You can run Data Definition Language (DDL), Data Manipulation Language (DML),
and Data Control Language (DCL) statements, but there are limitations for
Transaction Control Language (TCL) statements. In the SQL query editor, when you
select the Run button, you're submitting an independent batch request to execute.
Each Run action in the SQL query editor is a batch request, and a session only
exists per batch. Each execution of code in the same query window runs in a
different batch and session.
For example, when independently executing transaction statements, session
context is not retained. In the following screenshot, BEGIN TRAN was executed in
the first request, but since the second request was executed in a different
session, there is no transaction to commit, resulting in the failure of the
commit/rollback operation. If the SQL batch submitted does not include a
COMMIT TRAN , the changes applied after BEGIN TRAN will not commit.
In the SQL query editor, the GO SQL command creates a new independent batch
in a new session.
When you are running a SQL query with USE, you need to submit the SQL query
with USE as one single request.
Visualize results currently does not support SQL queries with an ORDER BY clause.
T-SQL statements that use the T-SQL OPTION syntax are not currently supported in
the Explore this data or Visualize results options with DirectQuery mode. The
workaround is to create visualizations in Power BI Desktop using Import mode.
The following table summarizes scenarios where the expected behavior does not match
SQL Server Management Studio or Azure Data Studio. For each scenario, the table
compares support in SSMS/ADS with support in the SQL query editor in the Fabric portal.
Related content
Query using the Visual Query editor
Tutorial: Create cross-warehouse queries with the SQL query editor
Next step
How-to: Query the Warehouse
Applies to: ✅ SQL analytics endpoint, Warehouse, and Mirrored Database in Microsoft
Fabric
The Data preview is one of the three switcher modes, along with the Query editor and
Model view, within Fabric Data Warehouse. It provides an easy interface to preview
sample data (the top 1,000 rows) from your tables or views.
You can also query the data in your warehouse with multiple tools with a SQL
connection string.
You can use the SQL query editor to write T-SQL queries from the Microsoft Fabric
portal.
You can build queries graphically with the Visual query editor.
Get started
After creating a warehouse and ingesting data, select a specific table or view from the
Object explorer that you would like to display in the data grid of the Data preview page.
Search value – Type in a specific keyword in the search bar and rows with that
specific keyword will be filtered. In this example, "New Hampshire" is the keyword
and only rows containing this keyword are shown. To clear the search, select the X
inside the search bar.
Sort columns (alphabetically or numerically) – Hover over the column title to see
the More Options (...) button appear. Select it to see the "Sort Ascending" and
"Sort Descending" options.
Copy value – Select a specific cell in the data preview and press Ctrl + C
(Windows) or Cmd + C (Mac).
Related content
Define relationships in data models for data warehousing
Model data in the default Power BI semantic model in Microsoft Fabric
Warehouse in Microsoft Fabric is built on open file formats. User tables are stored in
the Parquet file format, and Delta Lake logs are published for all user tables.
The Delta Lake logs open up direct access to the warehouse's user tables for any
engine that can read Delta Lake tables. This access is limited to read-only to ensure the
user data maintains ACID transaction compliance. All inserts, updates, and deletes to the
data in the tables must be executed through the Warehouse. Once a transaction is
committed, a system background process is initiated to publish the updated Delta Lake
log for the affected tables.
2. In the Object Explorer, you find more options (...) on a selected table in the Tables
folder. Select the Properties menu.
Delta Lake logs can be queried through shortcuts created in a lakehouse. You can
view the files using a Microsoft Fabric Spark Notebook or the Lakehouse explorer
in Synapse Data Engineering in the Microsoft Fabric portal.
Delta Lake logs can be found via Azure Storage Explorer, through Spark
connections such as the Power BI Direct Lake mode, or using any other service that
can read delta tables.
Delta Lake logs can be found in the _delta_log folder of each table through the
OneLake Explorer in Windows, as shown in the following screenshot.
The syntax to pause and resume Delta Lake log publishing is as follows:
SQL
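-- A hedged sketch of the pause statement; the DATA_LAKE_LOG_PUBLISHING option name
-- matches the sys.databases column shown later in this article.
ALTER DATABASE CURRENT SET DATA_LAKE_LOG_PUBLISHING = PAUSED;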
SQL
Queries to warehouse tables on the current warehouse from other Microsoft Fabric
engines (for example, queries from a Lakehouse) now show a version of the data as it
was before pausing Delta Lake log publishing. Warehouse queries still show the latest
version of data.
To resume Delta Lake log publishing, use the following code snippet:
SQL
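-- A hedged sketch of the resume statement, restoring the default AUTO state.
ALTER DATABASE CURRENT SET DATA_LAKE_LOG_PUBLISHING = AUTO;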
When the state is changed back to AUTO, the Fabric Warehouse engine publishes logs
of all recent changes made to tables on the warehouse, allowing other analytical
engines in Microsoft Fabric to read the latest version of data.
SQL
SELECT [name], [DATA_LAKE_LOG_PUBLISHING_DESC] FROM sys.databases
Limitations
Table names can only be used by Spark and other systems if they only contain
these characters: A-Z a-z 0-9 and underscores.
Column names that will be used by Spark and other systems cannot contain spaces,
tabs, carriage returns, or any of the following characters: [ , ; { } ( ) = ]
Related content
Query the Warehouse
How to use Microsoft Fabric notebooks
OneLake overview
Accessing shortcuts
Navigate the Fabric Lakehouse explorer
Warehouse in Microsoft Fabric offers the capability to query historical data as it existed
in the past. The ability to query data as of a specific timestamp is known in the data
warehousing industry as time travel.
Time travel facilitates stable reporting by maintaining the consistency and accuracy
of data over time.
Time travel enables historical trend analysis by querying across various past points
in time, and helps anticipate the future trends.
Time travel simplifies low-cost comparisons between previous versions of data.
Time travel aids in analyzing performance over time.
Time travel allows organizations to audit data changes over time, often required
for compliance purposes.
Time travel helps to reproduce the results from machine learning models.
Time travel can query tables as they existed at a specific point in time across
multiple warehouses in the same workspace.
Microsoft Fabric currently allows retrieval of past states of data in the following ways:
The results obtained from the time travel queries are inherently read-only. Write
operations such as INSERT, UPDATE, and DELETE cannot occur while utilizing the FOR
TIMESTAMP AS OF query hint.
Use the OPTION clause to specify the FOR TIMESTAMP AS OF query hint. Queries return
data exactly as it existed at the timestamp, specified as YYYY-MM-DDTHH:MM:SS[.fff] . For
example:
SQL
SELECT *
FROM [dbo].[dimension_customer] AS DC
OPTION (FOR TIMESTAMP AS OF '2024-03-13T19:39:35.28'); --March 13, 2024 at
7:39:35.28 PM UTC
Use the CONVERT syntax for the necessary datetime format with style 126.
The timestamp can be specified only once using the OPTION clause for queries, stored
procedures, views, etc. The OPTION applies to everything within the SELECT statement.
Data retention
In Microsoft Fabric, a warehouse automatically preserves and maintains various versions
of the data, up to a default retention period of thirty calendar days. This allows the
ability to query tables as of any prior point-in-time. All inserts, updates, and deletes
made to the data warehouse are retained. The retention automatically begins from the
moment the warehouse is created. Expired files are automatically deleted after the
retention threshold.
Currently, a SELECT statement with the FOR TIMESTAMP AS OF query hint returns the
latest version of table schema.
Any records that are deleted in a table are available to be queried as they existed
before deletion, if the deletion is within the retention period.
Any modifications made to the schema of a table, including but not limited to
adding or removing columns from the table, cannot be queried before the schema
change. Similarly, dropping and recreating a table with the same data removes its
history.
Stable reporting
Frequent execution of extract, transform, and load (ETL) jobs is essential to keep up with
the ever-changing data landscape. The ability to time travel supports this goal by
ensuring data integrity while providing the flexibility to generate reports based on the
query results that are returned as of a past point in time, such as the previous evening,
while background processing is ongoing.
ETL activities can run concurrently while the same table is queried as of a prior point-in-
time.
Time travel simplifies the analysis of historical data, helping uncover valuable trends and
patterns through querying data across various past time frames. This facilitates
predictive analysis by enabling experimenting with historical datasets and training of
predictive models. It aids in anticipating future trends and helps making well-informed,
data-driven decisions.
Performance analysis
Time travel can help analyze the performance of warehouse queries over time. This helps
identify performance degradation trends, based on which the queries can be
optimized.
The FOR TIMESTAMP AS OF query hint cannot be used to create views as of a
prior point in time within the retention period. It can be used to query views as of
a past point in time, within the retention period.
The FOR TIMESTAMP AS OF query hint can be used only once within a SELECT
statement.
The FOR TIMESTAMP AS OF query hint can be defined within the SELECT statement in
a stored procedure.
Limitations
Supply at most three digits of fractional seconds in the timestamp. If you supply
more precision, you receive the error message An error occurred during
timestamp conversion. Please provide a timestamp in the format yyyy-MM-ddTHH:mm:ss[.fff].
Currently, only the Coordinated Universal Time (UTC) time zone is used for time
travel.
Currently, the data retention for time travel queries is thirty calendar days.
Time travel is not supported for the SQL analytics endpoint of the Lakehouse.
The OPTION FOR TIMESTAMP AS OF syntax can only be used in queries that begin
with a SELECT statement. Queries such as INSERT INTO SELECT and CREATE TABLE AS
SELECT cannot be used along with the OPTION FOR TIMESTAMP AS OF . Consider
cloning a table as of a past point in time instead.
The FOR TIMESTAMP AS OF syntax for time travel is not currently supported in Power BI.
Next step
How to: Query using time travel
Related content
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
Query hints
In Microsoft Fabric, the capability to time travel unlocks the ability to query the prior
versions of data without the need to generate multiple data copies, saving on storage
costs. This article describes how to query warehouse tables using time travel at the
statement level, using the T-SQL OPTION clause and the FOR TIMESTAMP AS OF syntax.
This feature is currently in preview.
Warehouse tables can be queried up to a retention period of thirty calendar days using
the OPTION clause, providing the date format yyyy-MM-ddTHH:mm:ss[.fff] .
The following examples can be executed in the SQL Query Editor, SQL Server
Management Studio (SSMS), Azure Data Studio, or any T-SQL query editor.
Note
Currently, only the Coordinated Universal Time (UTC) time zone is used for time
travel.
The OPTION T-SQL clause specifies the point-in-time to return the data.
SQL
SQL
SELECT Sales.StockItemKey,
Sales.Description,
CAST(Sales.Quantity AS int) AS SoldQuantity,
c.Customer
FROM [dbo].[fact_sale] AS Sales INNER JOIN [dbo].[dimension_customer] AS c
ON Sales.CustomerKey = c.CustomerKey
GROUP BY Sales.StockItemKey, Sales.Description, Sales.Quantity, c.Customer
ORDER BY Sales.StockItemKey
OPTION (FOR TIMESTAMP AS OF '2024-05-02T20:44:13.700');
The FOR TIMESTAMP AS OF clause cannot directly accept a variable, as values in this
OPTION clause must be deterministic. You can use sp_executesql to pass a strongly typed
datetime value to the stored procedure. This simple example passes a variable and
converts the datetime parameter to the necessary format with date style 126.
SQL
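-- A hedged sketch (the procedure name is an assumption; the table comes from the
-- earlier example): convert the datetime2 parameter to the required format with
-- style 126, build the query text, and execute it with sp_executesql.
CREATE PROCEDURE dbo.usp_CustomerAsOf
    @pointInTime DATETIME2
AS
BEGIN
    DECLARE @asOf  VARCHAR(23)   = CONVERT(VARCHAR(23), @pointInTime, 126);
    DECLARE @query NVARCHAR(MAX) =
        N'SELECT * FROM [dbo].[dimension_customer] '
      + N'OPTION (FOR TIMESTAMP AS OF ''' + @asOf + N''');';
    EXEC sp_executesql @query;
END;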
Then, you can call the stored procedure and pass in a variable as a strongly typed
parameter. For example:
SQL
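-- A hedged sketch calling the hypothetical procedure above with a strongly typed
-- datetime2 variable.
DECLARE @pointInTime DATETIME2 = '2024-05-02T20:44:13.700';
EXEC dbo.usp_CustomerAsOf @pointInTime;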
SQL
SQL
--Create View
CREATE VIEW Top10CustomersView
AS
SELECT TOP (10)
FS.[CustomerKey],
DC.[Customer],
SUM(FS.TotalIncludingTax) AS TotalSalesAmount
FROM
[dbo].[dimension_customer] AS DC
INNER JOIN
[dbo].[fact_sale] AS FS ON DC.[CustomerKey] = FS.[CustomerKey]
GROUP BY
FS.[CustomerKey],
DC.[Customer]
ORDER BY
TotalSalesAmount DESC;
The historical data from tables in a view can only be queried for time travel
beginning from the time the view was created.
After a view is altered, time travel queries on the view are only valid from the time it was altered.
If an underlying table of a view is altered without changing the view, time travel
queries on the view can return the data from before the table change, as expected.
When the underlying table of a view is dropped and recreated without modifying
the view, data for time travel queries is only available from the time after the table
was recreated.
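As an illustration (a hedged sketch, not part of the original article), a view such as Top10CustomersView created above can be queried as of a past point in time by adding the hint to the outer SELECT; the timestamp reuses the value from the earlier example:
SQL
SELECT *
FROM [dbo].[Top10CustomersView]
OPTION (FOR TIMESTAMP AS OF '2024-05-02T20:44:13.700');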
Limitations
For more information on time travel at the statement level limitations with FOR
TIMESTAMP AS OF , see Time travel Limitations.
Related content
Query data as it existed in the past
The T-SQL notebook feature in Microsoft Fabric lets you write and run T-SQL code
within a notebook. You can use T-SQL notebooks to manage complex queries and write
better markdown documentation. It also allows direct execution of T-SQL on connected
warehouse or SQL analytics endpoint. By adding a Data Warehouse or SQL analytics
endpoint to a notebook, T-SQL developers can run queries directly on the connected
endpoint. BI analysts can also perform cross-database queries to gather insights from
multiple warehouses and SQL analytics endpoints.
Most of the existing notebook functionalities are available for T-SQL notebooks. These
include charting query results, coauthoring notebooks, scheduling regular executions,
and triggering execution within Data Integration pipelines.
Important
1. Create a T-SQL notebook from the Data Warehouse homepage: Navigate to the
data warehouse experience, and choose Notebook.
2. Create a T-SQL notebook from an existing warehouse editor: Navigate to an
existing warehouse, from the top navigation ribbon, select New SQL query and
then New T-SQL query notebook
Once the notebook is created, T-SQL is set as the default language. You can add data
warehouse or SQL analytics endpoints from the current workspace into your notebook.
You can autogenerate T-SQL code using the code template from the object explorer's
context menu. The following templates are available for T-SQL notebooks:
You can run one T-SQL code cell by selecting the Run button in the cell toolbar or run
all cells by selecting the Run all button in the toolbar.
Note
Each code cell is executed in a separate session, so the variables defined in one cell
are not available in another cell.
Within the same code cell, there might be multiple lines of code. You can select part of
the code and run only the selection. Each execution also generates a new session.
After the code is executed, expand the message panel to check the execution summary.
The Table tab lists the records from the returned result set. If the execution contains
multiple result sets, you can switch from one to another via the dropdown menu.
Because the Save as table and Save as view menu are only available for the
selected query text, you need to select the query text before using these
menus.
Create View does not support three-part naming, so the view is always
created in the warehouse that is set as the primary warehouse.
Related content
For more information about Fabric notebooks, see the following articles.
Restore in-place can be used to restore the warehouse to a known good state in
the event of accidental corruption, minimizing downtime and data loss.
Restore in-place can be helpful to reset the warehouse to a known good state for
development and testing purposes.
Restore in-place helps to quickly roll back changes to a prior state, due to a failed
database release or migration.
Restore in-place is an essential part of data recovery that allows restoration of the
warehouse to a prior known good state. A restore overwrites the existing warehouse,
using restore points from the existing warehouse.
You can also query data in a warehouse as it appeared in the past, using the T-SQL
OPTION syntax. For more information, see Query data as it existed in the past.
Note
The restore points and restore in place features are currently in preview.
To view all restore points for your warehouse, in the Fabric portal go to Settings ->
Restore points.
System-generated restore points are created throughout the day, and are available for
thirty days. System-generated restore points are created automatically every eight
hours. A system-created restore point might not be available immediately for a new
warehouse. If one is not yet available, create a user-defined restore point.
There can be up to 180 system-generated restore points at any given point in time.
If the warehouse is paused, system-created restore points can't be created unless and
until the warehouse is resumed. You should create a user-defined restore point before
pausing the warehouse. Before a warehouse is dropped, a system-created restore point
isn't automatically created.
System-created restore points can't be deleted, as the restore points are used to
maintain service level agreements (SLAs) for recovery.
Any number of user-defined restore points aligned with your specific business or
organizational recovery strategy can be created. User-defined restore points are
available for thirty calendar days and are automatically deleted on your behalf after the
expiry of the retention period.
For more information about creating and managing restore points, see Manage restore
points in the Fabric portal.
Warehouse deletes both system-created and user-defined restore points at the
expiry of the 30 calendar day retention period.
The age of a restore point is measured by the absolute calendar days from the
time the restore point is taken, including when the Microsoft Fabric capacity is
paused.
System-created and user-generated restore points can't be created when the
Microsoft Fabric capacity is paused. The creation of a restore point fails when the
fabric capacity gets paused while the restore point creation is in progress.
If a restore point is generated and then the capacity remains paused for more than
30 days before being resumed, the restore point remains in existence until a total
of 180 system-created restore points are reached.
At any point in time, Warehouse is guaranteed to be able to store up to 180
system-generated restore points as long as these restore points haven't reached
the thirty day retention period.
All the user-defined restore points that are created for the warehouse are
guaranteed to be stored until the default retention period of 30 calendar days.
Storage billing
The creation of both system-created and user-defined restore points consumes storage.
The storage cost of restore points in OneLake includes the data files stored in parquet
format. There are no storage charges incurred during the process of restore.
Compute billing
Compute charges are incurred during the creation and restore of restore points, and
consume the Microsoft Fabric capacity.
When you restore, the current warehouse is replaced with the restored warehouse. The
name of the warehouse remains the same, and the old warehouse is overwritten. All
components, including objects in the Explorer, modeling, Query Insights, and semantic
models are restored as they existed when the restore point was created.
Each restore point references a UTC timestamp when the restore point was created.
If you encounter Error 5064 after requesting a restore, resubmit the restore again.
Security
Any member of the Admin, Member, or Contributor workspace roles can create,
delete, or rename the user-defined restore points.
Any user that has the workspace roles of a Workspace Administrator, Member,
Contributor, or Viewer can see the list of system-created and user-defined restore
points.
A data warehouse can be restored only by a user that has the Workspace Administrator
role, from a system-created or user-defined restore point.
Limitations
A recovery point can't be restored to create a new warehouse with a different
name, either within or across the Microsoft Fabric workspaces.
Restore points can't be retained beyond the default thirty calendar day retention
period. This retention period isn't currently configurable.
Next step
Restore in-place in the Fabric portal
Related content
Manage restore points in the Fabric portal
Clone table in Microsoft Fabric
Query data as it existed in the past
Microsoft Fabric disaster recovery guide
Restore in-place is an essential part of data recovery that allows restoration of the
warehouse to a prior known good state. A restore overwrites the existing warehouse,
using restore points from the existing warehouse in Microsoft Fabric.
This tutorial guides you through creating restore points and performing a
restore in-place in a warehouse, as well as renaming, managing, and viewing
restore points.
Prerequisites
Review the workspace roles membership required for the following steps. For more
information, see Restore in place Security.
An existing user-defined or system-created restore point.
A system-created restore point might not be available immediately for a new
warehouse. If one is not yet available, create a user-defined restore point.
2. Review and confirm the dialog. Select the checkbox, followed by Restore.
3. A notification appears showing restore progress, followed by success notification.
Restore in-place is a metadata operation, so it can take a while depending on the
size of the metadata that is being restored.
Important
When a restore in-place is initiated, users inside the warehouse are not
alerted that a restore is ongoing. Once the restore operation is completed,
users should refresh the Object Explorer.
If the restore point is over 30 days old and was deleted, the details of the
banner will show as N/A .
Related content
Restore in-place of a warehouse in Microsoft Fabric
Microsoft Fabric terminology
Microsoft Fabric offers the capability to create near-instantaneous zero-copy clones with
minimal storage costs.
You can use the CREATE TABLE AS CLONE OF T-SQL commands to create a table clone.
For a tutorial, see Tutorial: Clone table using T-SQL or Tutorial: Clone tables in the Fabric
portal.
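As a hedged illustration (the clone table names are assumptions; see CREATE TABLE AS CLONE OF for the exact syntax), a clone of a table as of its current state and a clone as of a past point in time within the retention period look like the following:
SQL
-- Clone of the current state of the table
CREATE TABLE [dbo].[dimension_customer_clone] AS CLONE OF [dbo].[dimension_customer];

-- Clone as of a past point in time (UTC) within the thirty-day retention period
CREATE TABLE [dbo].[dimension_customer_pastclone] AS CLONE OF [dbo].[dimension_customer]
AT '2024-05-02T20:44:13.700';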
You can also query data in a warehouse as it appeared in the past, using the T-SQL
OPTION syntax. For more information, see Query data as it existed in the past.
Current point-in-time: The clone is based on the present state of the table.
You can also clone a group of tables at once. This can be useful for cloning a group of
related tables at the same past point in time. For an example, see Clone multiple tables
at once.
You can also query data from tables as they existed in the past, using the Time travel
feature in Warehouse.
Data retention
Warehouse automatically preserves and maintains the data history for thirty calendar
days, allowing for clones to be made at a point in time. All inserts, updates, and deletes
made to the data warehouse are retained for thirty calendar days.
There is no limit on the number of clones created both within and across schemas.
Any changes made through DML or DDL on the source of the clone table are not
reflected in the clone table.
Similarly, any changes made through DDL or DML on the table clone are not
reflected on the source of the clone table.
Users with Admin, Member, or Contributor workspace roles can clone the tables
within the workspace. The Viewer workspace role cannot create a clone.
SELECT permission on all the rows and columns of the source of the table clone is
required.
User must have CREATE TABLE permission in the schema where the table clone will
be created.
Users with Admin, Member, or Contributor workspace roles can delete the table
clone within the workspace.
Users who have ALTER SCHEMA permissions on the schema in which the table
clone resides can delete the table clone.
The clone table inherits object-level SQL security from the source table of the
clone. As the workspace roles provide read access by default, DENY permission can
be set on the table clone if desired.
The clone table inherits the row-level security (RLS) and dynamic data masking
from the source of the clone table.
The clone table inherits all attributes that exist at the source table, whether the
clone was created within the same schema or across different schemas in a
warehouse.
The clone table inherits the primary and unique key constraints defined in the
source table.
A read-only Delta log is created for every table clone that is created within the
Warehouse. The data files, stored as Delta parquet files, are read-only. This ensures
that the data always stays protected from corruption.
Data archiving
For auditing or compliance purposes, zero copy clones can be easily used to create
copies of data as it existed at a particular point in time in the past. Some data might
need to be archived for long-term retention or legal compliance. Cloning the table at
various historical points ensures that data is preserved in its original form.
Limitations
Table clones across warehouses in a workspace are not currently supported.
Table clones across workspaces are not currently supported.
Clone table is not supported on the SQL analytics endpoint of the Lakehouse.
Clone of a warehouse or schema is currently not supported.
A table clone cannot be created as of a point in time earlier than the thirty-day
retention period.
Changes to the table schema prevent a clone from being created as of a point in time
before the schema change.
Next step
Tutorial: Clone tables in the Fabric portal
Related content
Tutorial: Clone a table using T-SQL in Microsoft Fabric
CREATE TABLE AS CLONE OF
Query the SQL analytics endpoint or Warehouse in Microsoft Fabric
Query data as it existed in the past
A zero-copy clone creates a replica of the table by copying the metadata, while still
referencing the same data files in OneLake. This tutorial guides you through creating a
table clone in Warehouse in Microsoft Fabric, using the warehouse editor with a no-
code experience.
On the Clone table pane, the source table schema and name are already populated.
With the table state set to Current, a clone of the source table is created as of its
current state. You can choose the destination schema and edit the pre-populated
destination table name. You can also see the generated T-SQL statement when you expand
the SQL statement section. When you select the Clone button, a clone of the table is
generated and you can see it in the Explorer.
Clone table as of past point-in-time
Similar to current state, you can also choose the past state of the table within last 30
days by selecting the date and time in UTC. This generates a clone of the table from a
specific point in time, selectable in the Date and time of past state fields.
Clone multiple tables at once
You can also clone a group of tables at once. This can be useful for cloning a group of
related tables at the same past point in time. By selecting the source tables, the current
or past table state, and the destination schema, you can clone multiple tables easily and
quickly.
With the Clone tables context menu on Tables folder in the Explorer, you can select
multiple tables for cloning.
The default naming pattern for cloned objects is source_table_name-Clone . The T-SQL
commands for the multiple CREATE TABLE AS CLONE OF statements are provided if
customization of the name is required.
Related content
Clone table in Microsoft Fabric
Tutorial: Clone table using T-SQL
CREATE TABLE AS CLONE OF
Mirroring in Fabric is a low-cost and low-latency solution to bring data from various
systems together into a single analytics platform. You can continuously replicate your
existing data estate directly into Fabric's OneLake from a variety of Azure databases and
external data sources.
With the most up-to-date data in a queryable format in OneLake, you can now use all
the different services in Fabric, such as running analytics with Spark, executing
notebooks, data engineering, visualizing through Power BI Reports, and more.
Mirroring in Fabric allows users to enjoy a highly integrated, end-to-end, and easy-to-
use product that is designed to simplify your analytics needs. Built for openness and
collaboration between Microsoft, and technology solutions that can read the open-
source Delta Lake table format, Mirroring is a low-cost and low-latency turnkey solution
that allows you to create a replica of your data in OneLake which can be used for all
your analytical needs.
The Delta tables can then be used everywhere in Fabric, allowing users to accelerate their
journey into Fabric.
Accessing and working with this data today requires complex ETL (Extract Transform
Load) pipelines, business processes, and decision silos, creating:
Mirroring in Fabric provides an easy experience to speed the time-to-value for insights
and decisions, and to break down data silos between technology solutions:
Near real-time replication of data and metadata into a SaaS data lake, with built-in
analytics for BI and AI
Mirroring manages the replication of data and metadata into OneLake and
conversion to Parquet, in an analytics-ready format. This enables downstream
scenarios like data engineering, data science, and more.
A SQL analytics endpoint
A Default semantic model
In addition to the SQL query editor, there's a broad ecosystem of tooling including SQL
Server Management Studio (SSMS), the mssql extension with Visual Studio Code, and
even GitHub Copilot.
Sharing enables ease of access control and management, to make sure you can control
access to sensitive information. Sharing also enables secure and democratized decision-
making across your organization.
Types of mirroring
Fabric offers three different approaches in bringing data into OneLake through
mirroring.
Enabling Mirroring in Fabric is simple and intuitive, without having the need to
create complex ETL pipelines, allocate other compute resources, and manage data
movement.
Mirroring in Fabric is a fully managed service, so you don't have to worry about
hosting, maintaining, or managing replication of the mirrored connection.
For example, when accessing data registered in Unity Catalog, Fabric mirrors only the
catalog structure from Azure Databricks, allowing the underlying data to be accessed
through shortcuts. This method ensures that any changes in the source data are
instantly reflected in Fabric without requiring data movement, maintaining real-time
synchronization and enhancing efficiency in accessing up-to-date information.
Once data is in the landing zone in the proper format, replication starts running
and manages the complexity of merging the changes (updates, inserts, and deletes) so
they are reflected in Delta tables. This method ensures that any data written into the
landing zone is replicated almost immediately, keeping the data in Fabric up to date.
Sharing
Sharing enables ease of access control and management, while security controls like
Row-level security (RLS) and Object level security (OLS), and more make sure you can
control access to sensitive information. Sharing also enables secure and democratized
decision-making across your organization.
By sharing, users grant other users or a group of users access to a mirrored database
without giving access to the workspace and the rest of its items. When someone shares
a mirrored database, they also grant access to the SQL analytics endpoint and
associated default semantic model.
For more information, see Share your mirrored database and manage permissions.
Cross-database queries
With the data from your mirrored database stored in the OneLake, you can write cross-
database queries, joining data from mirrored databases, warehouses, and the SQL
analytics endpoints of Lakehouses in a single T-SQL query. For more information, see
Write a cross-database query.
For example, you can reference the table from mirrored databases and warehouses
using three-part naming. In the following example, use the three-part name to refer to
ContosoSalesTable in the warehouse ContosoWarehouse . From other databases or
warehouses, the first part of the standard SQL three-part naming convention is the
name of the mirrored database.
SQL
SELECT *
FROM ContosoWarehouse.dbo.ContosoSalesTable AS Contoso
INNER JOIN Affiliation
ON Affiliation.AffiliationId = Contoso.RecordTypeID;
Microsoft Fabric users can access Data Science workloads. From there, they can discover
and access various relevant resources. For example, they can create machine learning
Experiments, Models and Notebooks. They can also import existing Notebooks on the
Data Science Home page.
Related content
What is Microsoft Fabric?
Model data in the default Power BI semantic model in Microsoft Fabric
What is the SQL analytics endpoint for a lakehouse?
Direct Lake overview
Once mirroring is configured, visit the Monitor replication page to monitor the current
state of replication.
The Monitor replication pane shows you the current state of the source database
replication, with the corresponding statuses of the tables, total rows replicated, and last
refresh date/time as well.
Status
The following are the possible statuses for the replication:
At the database level:
Running: Replication is currently running, bringing snapshot and change data into OneLake.
Running with warning: Replication is running, with transient errors.
Stopping/Stopped: Replication has stopped.
Error: Fatal error in replication that can't be recovered.
Related content
Troubleshoot Fabric mirrored databases
What is Mirroring in Fabric?
When you share a mirrored database, you grant other users or groups access to the
mirrored database without giving access to the workspace and the rest of its items.
Sharing a mirrored database also grants access to the SQL analytics endpoint and the
associated default semantic model.
Note
You're prompted with options to select who you would like to share the mirrored
database with, what permissions to grant them, and whether they'll be notified by email.
By default, sharing a mirrored database grants users Read permission to the mirrored
database, the associated SQL analytics endpoint, and the default semantic model. In
addition to these default permissions, you can grant:
"Read all SQL analytics endpoint data": Grants the recipient the ReadData
permission for the SQL analytics endpoint, allowing the recipient to read all data
via the SQL analytics endpoint using Transact-SQL queries.
"Read all OneLake data": Grants the ReadAll permission to the recipient, allowing
them to access the mirrored data in OneLake, for example, by using Spark or
OneLake Explorer.
"Build reports on the default semantic model": Grants the recipient the Build
permission for the default semantic model, enabling users to create Power BI
reports on top of the semantic model.
"Read and write": Grants the recipient the Write permission for the mirrored
database, allowing them to edit the mirrored database configuration and
read/write data from/to the landing zone.
Manage permissions
To review the permissions granted to a mirrored database, its SQL analytics endpoint, or
its default semantic model, navigate to one of these items in the workspace and select
the Manage permissions quick action.
If you have the Share permission for a mirrored database, you can also use the Manage
permissions page to grant or revoke permissions. To view existing recipients, select the
context menu (...) at the end of each row to add or remove specific permission.
Note
When mirroring data from Azure SQL Database or Azure SQL Managed Instance, its
System Assigned Managed Identity needs to have "Read and write" permission on
the mirrored database. If you create the mirrored database from the Fabric portal,
the permission is granted automatically. If you use the API to create the mirrored
database, make sure you grant the permission by following the preceding instructions.
You can search for the recipient by specifying the name of your Azure SQL Database
logical server or Azure SQL Managed Instance.
Related content
What is Mirroring in Fabric?
What is the SQL analytics endpoint for a lakehouse?
Learn more about all the methods to query the data in your mirrored database within
Microsoft Fabric.
To access the SQL analytics endpoint, select the corresponding item in the workspace
view or switch to the SQL analytics endpoint mode in the mirrored database explorer.
For more information, see What is the SQL analytics endpoint for a lakehouse?
For more information, see View data in the Data preview in Microsoft Fabric.
For more information, see Query using the visual query editor.
For more information, see Query using the SQL query editor.
For a step-by-step guide, see Explore data in your mirrored database with notebooks.
For more information, see Create shortcuts in lakehouse and see Explore the data in
your lakehouse with a notebook.
For a step-by-step guide, see Explore data in your mirrored database directly in
OneLake.
Create a report
Create a report directly from the semantic model (default) in three different ways:
For more information, see Create reports in the Power BI service in Microsoft Fabric and
Power BI Desktop.
Related content
What is Mirroring in Fabric?
Model data in the default Power BI semantic model in Microsoft Fabric
What is the SQL analytics endpoint for a lakehouse?
Direct Lake overview
Explore data in your mirrored database
directly in OneLake
Article • 11/19/2024
You can access mirrored database table data in Delta format files. This tutorial provides
steps to connect to Azure Cosmos DB data directly with Azure Storage Explorer.
Prerequisites
Complete the tutorial to create a mirrored database from your source database.
Tutorial: Create a mirrored database from Azure Cosmos DB
Tutorial: Create a mirrored database from Azure Databricks
Tutorial: Create a mirrored database from Azure SQL Database
Tutorial: Create a mirrored database from Azure SQL Managed Instance
Tutorial: Create a mirrored database from Snowflake
Tutorial: Create an open mirrored database
4. Open the Azure Storage Explorer desktop application. If you don't have it,
download and install Azure Storage Explorer .
5. Connect to Azure Storage.
6. On the Select Resource page, select Azure Data Lake Storage (ADLS) Gen2 as the
resource.
7. Select Next.
8. On the Select Connection Method page, select Sign in using OAuth. If you aren't
signed in to the subscription, sign in with OAuth first, and then access the ADLS
Gen2 resource.
9. Select Next.
10. On the Enter Connection Info page, provide a Display name.
11. Paste the SQL analytics endpoint URL into the box for Blob container or directory
URL.
12. Select Next.
13. You can access delta files directly from Azure Storage Explorer.
Tip
More examples:
Related content
Explore data in your mirrored database using Microsoft Fabric
Explore data in your mirrored database with notebooks
Connecting to Microsoft OneLake
You can explore the data replicated from your mirrored database with Spark queries in
notebooks.
Notebooks are a powerful code item for you to develop Apache Spark jobs and machine
learning experiments on your data. You can use notebooks in the Fabric Lakehouse to
explore your mirrored tables.
Prerequisites
Complete the tutorial to create a mirrored database from your source database.
Tutorial: Configure Microsoft Fabric mirrored database for Azure Cosmos DB
(Preview)
Tutorial: Configure Microsoft Fabric mirrored databases from Azure Databricks
(Preview)
Tutorial: Configure Microsoft Fabric mirrored databases from Azure SQL
Database
Tutorial: Configure Microsoft Fabric mirrored databases from Azure SQL
Managed Instance (Preview)
Tutorial: Configure Microsoft Fabric mirrored databases from Snowflake
Create a shortcut
You first need to create a shortcut from your mirrored tables into the Lakehouse, and
then build notebooks with Spark queries in your Lakehouse.
2. If you don't have a Lakehouse created already, select Lakehouse and create a new
Lakehouse by giving it a name.
5. You can see all your mirrored databases in the Fabric workspace.
6. Select the mirrored database you want to add to your Lakehouse, as a shortcut.
7. Select desired tables from the mirrored database.
9. In the Explorer, you can now see selected table data in your Lakehouse.
Tip
You can add other data to the Lakehouse directly, or bring in shortcuts such as S3
or ADLS Gen2. You can navigate to the SQL analytics endpoint of the Lakehouse and
join the data across all these sources with the mirrored data seamlessly.
10. To explore this data in Spark, select the ... dots next to any table. Select New
notebook or Existing notebook to begin analysis.
11. The notebook will automatically open and load the dataframe with a SELECT
... LIMIT 1000 Spark SQL query.
New notebooks can take up to two minutes to load completely. You can
avoid this delay by using an existing notebook with an active session.
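For reference, the generated query is a Spark SQL statement along the following lines; the lakehouse and table names are placeholders, so substitute the names of your own shortcut:
SQL
-- Placeholder names; replace YourLakehouse and YourMirroredTable with your own.
SELECT *
FROM YourLakehouse.YourMirroredTable
LIMIT 1000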
Related content
Explore data in your mirrored database using Microsoft Fabric
Create shortcuts in lakehouse
Explore the data in your lakehouse with a notebook
Mirroring in Fabric provides an easy experience to avoid complex ETL (Extract Transform
Load) and integrate your existing Azure SQL Database estate with the rest of your data
in Microsoft Fabric. You can continuously replicate your existing Azure SQL Databases
directly into Fabric's OneLake. Inside Fabric, you can unlock powerful business
intelligence, artificial intelligence, Data Engineering, Data Science, and data sharing
scenarios.
For a tutorial on configuring your Azure SQL Database for Mirroring in Fabric, see
Tutorial: Configure Microsoft Fabric mirrored databases from Azure SQL Database.
To learn more and watch demos of Mirroring Azure SQL Database in Fabric, watch the
following Data Exposed episode.
https://learn-video.azurefd.net/vod/player?show=data-exposed&ep=key-mirroring-to-
azure-sql-database-in-fabric-benefits-data-exposed&locale=en-
us&embedUrl=%2Ffabric%2Fdatabase%2Fmirrored-database%2Fazure-sql-database
The mirrored database item. Mirroring manages the replication of data into
OneLake and conversion to Parquet, in an analytics-ready format. This enables
downstream scenarios like data engineering, data science, and more.
A SQL analytics endpoint
A default semantic model
Each mirrored Azure SQL Database has an autogenerated SQL analytics endpoint that
provides a rich analytical experience on top of the Delta Tables created by the mirroring
process. Users have access to familiar T-SQL commands that can define and query data
objects but not manipulate the data from the SQL analytics endpoint, as it's a read-only
copy. You can perform the following actions in the SQL analytics endpoint:
Explore the tables that reference data in your Delta Lake tables from Azure SQL
Database.
Create no code queries and views and explore data visually without writing a line
of code.
Develop SQL views, inline TVFs (Table-valued Functions), and stored procedures to
encapsulate your semantics and business logic in T-SQL.
Manage permissions on the objects.
Query data in other Warehouses and Lakehouses in the same workspace.
In addition to the SQL query editor, there's a broad ecosystem of tooling that can query
the SQL analytics endpoint, including SQL Server Management Studio (SSMS), the mssql
extension with Visual Studio Code, and even GitHub Copilot.
Network requirements
Currently, Mirroring doesn't support Azure SQL Database logical servers behind an
Azure Virtual Network or private networking. If your Azure SQL Database logical server
is behind a private network, you can't enable Azure SQL Database mirroring.
Currently, you must update your Azure SQL logical server firewall rules to Allow
public network access.
You must enable the Allow Azure services option to connect to your Azure SQL
Database logical server.
Related content
How to: Secure data Microsoft Fabric mirrored databases from Azure SQL
Database
Limitations in Microsoft Fabric mirrored databases from Azure SQL Database
Monitor Fabric mirrored database replication
Troubleshoot Fabric mirrored databases from Azure SQL Database
Prerequisites
Create or use an existing Azure SQL Database.
The source Azure SQL Database can be either a single database or a database in
an elastic pool.
If you don't have an Azure SQL Database, create a new single database. Use the
Azure SQL Database free offer if you haven't already.
Review the tier and purchasing model requirements for Azure SQL Database.
During the current preview, we recommend using a copy of one of your existing
databases or any existing test or development database that you can recover
quickly from a backup. If you want to use a database from an existing backup,
see Restore a database from a backup in Azure SQL Database.
You need an existing capacity for Fabric. If you don't, start a Fabric trial.
If you want to mirror a database from an existing backup, see Restore a
database from a backup in Azure SQL Database.
The Fabric capacity needs to be active and running. A paused or deleted capacity
will affect Mirroring and no data will be replicated.
Enable the Fabric tenant setting Service principals can use Fabric APIs. To learn how
to enable tenant settings, see Fabric Tenant settings.
Networking requirements for Fabric to access your Azure SQL Database:
Currently, Mirroring doesn't support Azure SQL Database logical servers behind
an Azure Virtual Network or private networking. If you have your Azure SQL
logical server behind a private network, you can't enable Azure SQL Database
mirroring.
You need to update your Azure SQL logical server firewall rules to Allow public
network access, and enable the Allow Azure services option to connect to your
Azure SQL Database logical server.
Enable System Assigned Managed Identity (SAMI) of your
Azure SQL logical server
The System Assigned Managed Identity (SAMI) of your Azure SQL logical server must be
enabled, and must be the primary identity, to publish data to Fabric OneLake.
1. To configure or verify that the SAMI is enabled, go to your logical SQL Server in the
Azure portal. Under Security in the resource menu, select Identity.
2. Under System assigned managed identity, set the Status to On.
3. The SAMI must be the primary identity. Verify the SAMI is the primary identity with
the following T-SQL query: SELECT * FROM sys.dm_server_managed_identities;
You can accomplish this with a login and mapped database user.
Create a SQL Authenticated login named fabric_login . You can choose any
name for this login. Provide your own strong password. Run the following T-
SQL script in the master database:
SQL
SQL
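The script itself isn't reproduced in this extract. As an illustrative sketch only, using the fabric_login name from the step above and a placeholder password, it could look like the following:
SQL
-- Run in the master database; replace the placeholder with your own strong password.
CREATE LOGIN fabric_login WITH PASSWORD = '<strong_password>';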
3. Connect to the Azure SQL Database you plan to mirror to Microsoft Fabric, using
the Azure portal query editor, SQL Server Management Studio (SSMS), or the
mssql extension with Visual Studio Code.
SQL
Or,
SQL
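The two scripts referenced above aren't reproduced here. As a sketch of the two alternatives described in the security guidance for mirrored Azure SQL Databases (a user mapped to the login, or a contained database user), with placeholder names and passwords, and a permission grant you should verify against the original tutorial:
SQL
-- Choose ONE of the following options.

-- Option 1: a database user mapped to the fabric_login created in master.
CREATE USER fabric_user FOR LOGIN fabric_login;

-- Option 2: a contained database user with its own strong password.
-- CREATE USER fabric_user WITH PASSWORD = '<strong_password>';

-- Grant the permissions required for mirroring; CONTROL is shown here only as an
-- example of a database-scoped grant. Verify the exact permissions in the tutorial.
GRANT CONTROL TO fabric_user;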
1. Under New sources, select Azure SQL Database. Or, select an existing Azure SQL
Database connection from the OneLake hub.
2. If you selected New connection, enter the connection details to the Azure SQL
Database.
Server: You can find the Server name by navigating to the Azure SQL
Database Overview page in the Azure portal. For example, server-
name.database.windows.net .
Mirror all data means that any new tables created after Mirroring is started
will be mirrored.
Optionally, choose only certain objects to mirror. Disable the Mirror all data
option, then select individual tables from your database.
3. Wait for 2-5 minutes. Then, select Monitor replication to see the status.
4. After a few minutes, the status should change to Running, which means the tables
are being synchronized.
If you don't see the tables and the corresponding replication status, wait a few
seconds and then refresh the panel.
5. When the initial copying of the tables is finished, a date appears in the Last
refresh column.
6. Now that your data is up and running, there are various analytics scenarios
available across all of Fabric.
Important
For more information and details on the replication states, see Monitor Fabric mirrored
database replication.
Important
If there are no updates in the source tables, the replicator engine will start to back
off with an exponentially increasing duration, up to an hour. The replicator engine
will automatically resume regular polling after updated data is detected.
Related content
Mirroring Azure SQL Database
What is Mirroring in Fabric?
This article answers frequently asked questions about Mirroring Azure SQL Database in
Microsoft Fabric.
For troubleshooting steps, see Troubleshoot Fabric mirrored databases from Azure SQL
Database.
Security
Does data ever leave the customer's Fabric tenant?
No.
Cost Management
What are the costs associated with Mirroring?
There is no compute cost for mirroring data from the source to Fabric OneLake. The
Mirroring storage cost is free up to a certain limit based on the purchased compute
capacity SKU you provision. Learn more from the Mirroring section in Microsoft Fabric -
Pricing .
Licensing
What are licensing options for Fabric Mirroring?
A Power BI Premium, Fabric Capacity, or Trial Capacity is required. For more information
on licensing, see Microsoft Fabric licenses.
Related content
What is Mirroring in Fabric?
Azure SQL Database mirroring in Microsoft Fabric
Troubleshoot Fabric mirrored databases from Azure SQL Database.
This guide helps you establish data security in your mirrored Azure SQL Database in
Microsoft Fabric.
Security requirements
1. The System Assigned Managed Identity (SAMI) of your Azure SQL logical server
needs to be enabled, and must be the primary identity. To configure, go to your
logical SQL Server in the Azure portal. Under Security in the resource menu, select
Identity. Under System assigned managed identity, set the Status to On.
After enabling the SAMI, if the SAMI is disabled or removed, the mirroring of
Azure SQL Database to Fabric OneLake will fail.
After enabling the SAMI, if you add a user assigned managed identity (UAMI),
it will become the primary identity, replacing the SAMI as primary. This will
cause replication to fail. To resolve, remove the UAMI.
2. Fabric needs to connect to the Azure SQL database. For this purpose, create a
dedicated database user with limited permissions, to follow the principle of least
privilege. Create either a login with a strong password and connected user, or a
contained database user with a strong password. For a tutorial, see Tutorial:
Configure Microsoft Fabric mirrored databases from Azure SQL Database.
Important
You can also mask sensitive data from non-admins using dynamic data masking:
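The masking example itself isn't included in this extract. As a minimal sketch with a hypothetical table and column, dynamic data masking can be applied like this:
SQL
-- Hypothetical table and column; masks email values for users without UNMASK permission.
ALTER TABLE dbo.Customers
ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');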
Related content
What is Mirroring in Fabric?
SQL granular permissions in Microsoft Fabric
This article covers troubleshooting steps for mirroring Azure SQL Database.
For troubleshooting the automatically configured mirroring for Fabric SQL database, see
Troubleshoot mirroring from Fabric SQL database (preview).
Fabric capacity paused/deleted: Mirroring will stop.
1. Resume or assign capacity from the Azure portal.
2. Go to the Fabric mirrored database item. From the toolbar, select Stop replication.
3. Start replication by selecting Mirror database for the mirrored item in the Fabric portal.
Fabric capacity resumed: Mirroring will not be resumed.
1. Go to the Fabric mirrored database item. From the toolbar, select Stop replication.
2. Start replication by selecting Mirror database for the mirrored item in the Fabric portal.
Workspace deleted: Mirroring stops automatically. If mirroring is still active on the Azure SQL Database, execute the following stored procedure on your Azure SQL Database: exec sp_change_feed_disable_db;
Fabric capacity exceeded: Mirroring will pause. Wait until the overload state is over or update your capacity. Learn more from Actions you can take to recover from overload situations. Mirroring will continue once the capacity is recovered.
SQL
SQL
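The diagnostic queries aren't reproduced above. They typically inspect the change feed dynamic management views; treat the following as an illustrative sketch:
SQL
-- Review recent change feed log scan activity.
SELECT * FROM sys.dm_change_feed_log_scan_sessions;

-- Review any errors reported by the change feed.
SELECT * FROM sys.dm_change_feed_errors;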
3. If there aren't any issues reported, execute the following stored procedure to
review the current configuration of the mirrored Azure SQL Database. Confirm it
was properly enabled.
SQL
EXEC sp_help_change_feed;
The key columns to look for here are the table_name and state . Any value besides
4 indicates a potential problem.
4. If replication is still not working, verify that the correct SAMI object has
permissions.
a. In the Fabric portal, select the "..." ellipses option on the mirrored database item.
b. Select the Manage Permissions option.
c. Confirm that the Azure SQL logical server name shows with Read, Write
permissions.
d. Ensure that AppId that shows up matches the ID of the SAMI of your Azure SQL
Database logical server.
Managed identity
The System Assigned Managed Identity (SAMI) of the Azure SQL logical server needs to
be enabled, and must be the primary identity. For more information, see Create an
Azure SQL Database server with a user-assigned managed identity.
After enablement, if the SAMI is turned Off, or is disabled and then enabled again, the
mirroring of Azure SQL Database to Fabric OneLake will fail.
The SAMI must be the primary identity. Verify the SAMI is the primary identity with the
following: SELECT * FROM sys.dm_server_managed_identities;
User Assigned Managed Identity (UAMI) is not supported. If you add a UAMI, it becomes
the primary identity, replacing the SAMI as primary. This causes replication to fail. To
resolve, remove the UAMI so that the SAMI becomes the primary identity again.
SPN permissions
Do not remove Azure SQL Database service principal name (SPN) contributor
permissions on Fabric mirrored database item.
If you accidentally remove the SPN permission, Mirroring Azure SQL Database will not
function as expected. No new data can be mirrored from the source database.
If you remove Azure SQL Database SPN permissions or permissions are not set up
correctly, use the following steps.
1. Add the SPN as a user by selecting the ... ellipses option on the mirrored
database item.
2. Select the Manage Permissions option.
3. Enter the name of the Azure SQL Database logical server name. Provide Read and
Write permissions.
Related content
Limitations of Microsoft Fabric Data Warehouse
Frequently asked questions for Mirroring Azure SQL Database in Microsoft Fabric
Limitations in Microsoft Fabric mirrored
databases from Azure SQL Database
Article • 11/19/2024
Current limitations in Microsoft Fabric mirrored databases from Azure SQL Database
are listed on this page. This page is subject to change.
Table level
A table that does not have a defined primary key cannot be mirrored.
A table using a primary key defined as nonclustered primary key cannot be
mirrored.
A table cannot be mirrored if the primary key is one of the data types: sql_variant,
timestamp/rowversion.
Delta Lake supports only six digits of fractional-second precision.
Columns of SQL type datetime2, with a precision of 7 fractional second digits, do
not have a corresponding data type with the same precision in Delta files in Fabric
OneLake. A precision loss happens if columns of this type are mirrored: the
seventh fractional second digit is trimmed.
A table cannot be mirrored if the primary key is one of these data types:
datetime2(7), datetimeoffset(7), time(7), where 7 is seven digits of precision.
The datetimeoffset(7) data type does not have a corresponding data type with
same precision in Delta files in Fabric OneLake. A precision loss (loss of time
zone and seventh time decimal) occurs if columns of this type are mirrored.
Clustered columnstore indexes are not currently supported.
If one or more columns in the table are of type Large Binary Object (LOB) with a size
> 1 MB, the column data is truncated to a size of 1 MB in Fabric OneLake.
Source tables that have any of the following features in use cannot be mirrored.
Temporal history tables and ledger history tables
Always Encrypted
In-memory tables
Graph
External tables
The following table-level data definition language (DDL) operations aren't allowed
on SQL database source tables when enabled for mirroring.
Switch/Split/Merge partition
Alter primary key
When there is a DDL change, a complete data snapshot is restarted for the changed
table, and data is reseeded.
Currently, a table cannot be mirrored if it has the json or vector data type.
Currently, you cannot ALTER a column to the vector or json data type when a
table is mirrored.
Column level
If the source table contains computed columns, these columns cannot be mirrored
to Fabric OneLake.
If the source table contains columns with one of these data types, these columns
cannot be mirrored to Fabric OneLake. The following data types are unsupported
for mirroring:
image
text/ntext
xml
rowversion/timestamp
sql_variant
User Defined Types (UDT)
geometry
geography
Column names for a SQL table cannot contain spaces nor the following characters:
, ; { } ( ) \n \t = .
Warehouse limitations
Source schema hierarchy is not replicated to the mirrored database. Instead,
source schema is flattened, and schema name is encoded into the mirrored
database table name.
The SQL analytics endpoint is the same as the Lakehouse SQL analytics endpoint. It
is the same read-only experience. See SQL analytics endpoint limitations.
Asia Pacific:
Australia East
Australia Southeast
Central India
East Asia
Japan East
Korea Central
Southeast Asia
South India
Europe
North Europe
West Europe
France Central
Germany West Central
Norway East
Sweden Central
Switzerland North
Switzerland West
UK South
UK West
Americas:
Brazil South
Canada Central
Canada East
Central US
East US
East US2
North Central US
West US
West US2
Next step
Tutorial: Configure Microsoft Fabric mirrored databases from Azure SQL Database
Related content
Monitor Fabric mirrored database replication
Model data in the default Power BI semantic model in Microsoft Fabric
Mirroring in Fabric provides an easy experience to avoid complex ETL (Extract Transform
Load) and integrate your existing Azure SQL Managed Instance estate with the rest of
your data in Microsoft Fabric. You can continuously replicate your existing SQL Managed
Instance databases directly into Fabric's OneLake. Inside Fabric, you can unlock powerful
business intelligence, artificial intelligence, Data Engineering, Data Science, and data
sharing scenarios.
For a tutorial on configuring your Azure SQL Managed Instance for Mirroring in Fabric,
see Tutorial: Configure Microsoft Fabric mirrored databases from Azure SQL Managed
Instance (Preview).
The mirrored database item. Mirroring manages the replication of data into
OneLake and conversion to Parquet, in an analytics-ready format. This enables
downstream scenarios like data engineering, data science, and more.
A SQL analytics endpoint
A default semantic model
Each mirrored Azure SQL Managed Instance has an autogenerated SQL analytics
endpoint that provides a rich analytical experience on top of the Delta Tables created by
the mirroring process. Users have access to familiar T-SQL commands that can define
and query data objects but not manipulate the data from the SQL analytics endpoint, as
it's a read-only copy. You can perform the following actions in the SQL analytics
endpoint:
Explore the tables that reference data in your Delta Lake tables from Azure SQL
Managed Instance.
Create no code queries and views and explore data visually without writing a line
of code.
Develop SQL views, inline TVFs (Table-valued Functions), and stored procedures to
encapsulate your semantics and business logic in T-SQL.
Manage permissions on the objects.
Query data in other Warehouses and Lakehouses in the same workspace.
In addition to the SQL query editor, there's a broad ecosystem of tooling that can query
the SQL analytics endpoint, including SQL Server Management Studio (SSMS), Azure
Data Studio, and even GitHub Copilot.
Network requirements
During the current preview, Fabric Mirroring for Azure SQL Managed Instance requires
you to use the Public Endpoint and to configure your SQL managed instance VNET to
allow traffic from and to Azure services. You can use Azure Cloud or Power BI service
tags to scope this configuration:
Currently, you must update your Azure SQL Managed Instance network security to
Enable public endpoints.
Currently, you must allow Public Endpoint traffic in the network security group
option to be able connect your Fabric workspace to your Azure SQL Managed
Instance.
Next step
Tutorial: Configure Microsoft Fabric mirrored databases from Azure SQL Managed
Instance (Preview)
Related content
How to: Secure data Microsoft Fabric mirrored databases from Azure SQL
Managed Instance (Preview)
Limitations in Microsoft Fabric mirrored databases from Azure SQL Managed
Instance (Preview)
Monitor Fabric mirrored Managed Instance database replication
Troubleshoot Fabric mirrored databases from Azure SQL Managed Instance
(Preview)
Prerequisites
Create or use an existing Azure SQL Managed Instance.
Update Policy for source Azure SQL Managed Instance needs to be configured
to "Always up to date"
The source Azure SQL Managed Instance can be either a single SQL managed
instance or a SQL managed instance belonging to an instance pool.
If you don't have an Azure SQL Managed Instance, you can create a new SQL
managed instance. You can use the Azure SQL Managed Instance free offer if
you like.
During the current preview, we recommend using a copy of one of your existing
databases or any existing test or development database that you can recover
quickly from a backup. If you want to use a database from an existing backup,
see Restore a database from a backup in Azure SQL Managed Instance.
You need an existing capacity for Fabric. If you don't, start a Fabric trial.
The Fabric capacity needs to be active and running. A paused or deleted
capacity impacts Mirroring and no data is replicated.
Enable the Fabric tenant setting Service principals can use Fabric APIs. To learn how
to enable tenant settings, see About tenant settings.
Networking requirements for Fabric to access your Azure SQL Managed Instance:
In the current preview, Mirroring requires that your Azure SQL Managed
Instance has a public endpoint that is accessible from the Azure Cloud
or Power BI service tags. For more information, see Use Azure SQL Managed
Instance securely with public endpoints to learn how to securely run a public
endpoint for Azure SQL Managed Instance.
Enable System Assigned Managed Identity (SAMI) of your
Azure SQL Managed Instance
The System Assigned Managed Identity (SAMI) of your Azure SQL Managed Instance
must be enabled, and must be the primary identity, to publish data to Fabric OneLake.
1. To configure or verify that the SAMI is enabled, go to your SQL Managed Instance
in the Azure portal. Under Security in the resource menu, select Identity.
2. Under System assigned managed identity, set the Status to On.
3. The SAMI must be the primary identity. Verify the SAMI is the primary identity with
the following T-SQL query: SELECT * FROM sys.dm_server_managed_identities;
You can accomplish this with a login and mapped database user. Following the principle
of least privilege for security, you should only grant CONTROL DATABASE permission in
the database you intend to mirror.
Create a SQL Authenticated login. You can choose any name for this login,
substitute it in the following script for <fabric_login> . Provide your own
strong password. Run the following T-SQL script in the master database:
SQL
SQL
CREATE LOGIN [[email protected]] FROM EXTERNAL PROVIDER;
ALTER SERVER ROLE [##MS_ServerStateReader##] ADD MEMBER
[[email protected]];
3. Switch your query scope to the database you want to mirror. Substitute the name
of your database for <mirroring_source_database> and run the following T-SQL:
SQL
USE [<mirroring_source_database>];
4. Create a database user connected to the login. Substitute the name of a new
database user for this purpose for <fabric_user> :
SQL
SQL
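The script isn't reproduced above. As a sketch that follows the least-privilege guidance earlier in this tutorial, with placeholder names in angle brackets:
SQL
-- Run in the database you intend to mirror.
CREATE USER [<fabric_user>] FOR LOGIN [<fabric_login>];

-- Grant CONTROL only in this database, per the least-privilege guidance above.
GRANT CONTROL TO [<fabric_user>];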
1. Under New sources, select Azure SQL Managed Instance. Or, select an existing
Azure SQL Managed Instance connection from the OneLake catalog.
a. You can't use existing Azure SQL Managed Instance connections with type "SQL
Server" (generic connection type). Only connections with connection type "SQL
Managed Instance" are supported for mirroring of Azure SQL Managed Instance
data.
2. If you selected New connection, enter the connection details to the Azure SQL
Managed Instance. You need to connect to a specific database; you can't set up
mirroring for the entire SQL managed instance and all its databases.
Server: You can find the Server name by navigating to the Azure SQL
Managed Instance Networking page in the Azure portal (under Security
menu) and looking at the Public Endpoint field. For example,
<managed_instance_name>.public.<dns_zone>.database.windows.net,3342 .
3. Select Connect.
Mirror all data means that any new tables created after Mirroring is started
will be mirrored.
Optionally, choose only certain objects to mirror. Disable the Mirror all data
option, then select individual tables from your database.
If tables can't be mirrored at all, they show an error icon and relevant
explanation text. Likewise, if tables can only mirror with limitations, a warning
icon is shown with relevant explanation text.
2. On the next screen, give the destination item a name and select Create mirrored
database. Now wait a minute or two for Fabric to provision everything for you.
4. After a few minutes, the status should change to Running, which means the tables
are being synchronized.
If you don't see the tables and the corresponding replication status, wait a few
seconds and then refresh the panel.
5. When the initial copying of the tables is finished, a date appears in the Last refresh
column.
6. Now that your data is up and running, there are various analytics scenarios
available across all of Fabric.
Important
If the initial sync is completed, a Last completed timestamp is shown next to the table
name. This timestamp indicates the time when Fabric has last checked the table for
changes.
Also, note the Rows replicated column. It counts all the rows that have been replicated
for the table. Each time a row is replicated, it is counted again. For example, inserting a
row with primary key = 1 on the source increases the "Rows replicated" count by one. If
you update the row with the same primary key, it replicates to Fabric again and the row
count increases by one, even though it's the same row. Fabric counts all replications that
happened on the row, including inserts, deletes, and updates.
The Monitor replication screen also reflects any errors and warnings with tables being
mirrored. If the table has unsupported column types or if the entire table is unsupported
(for example, in memory or columnstore indexes), a notification about the limitation is
shown on this screen. For more information and details on the replication states, see
Monitor Fabric mirrored database replication.
Important
If there are no updates in the source tables, the replicator engine will start to back
off with an exponentially increasing duration, up to an hour. The replicator engine
will automatically resume regular polling after updated data is detected.
Related content
Mirroring Azure SQL Managed Instance (Preview)
What is Mirroring in Fabric?
This article answers frequently asked questions about Mirroring Azure SQL Managed
Instance in Microsoft Fabric.
For troubleshooting steps, see Troubleshoot Fabric mirrored databases from Azure SQL
Managed Instance. Contact support if more troubleshooting is required.
Security
What authentication to the Azure SQL Managed
Instance is allowed?
Currently, for authentication to the source Azure SQL Managed Instance, we support
SQL authentication with user name and password and Microsoft Entra ID. Your SQL
managed instance should have read rights on your Microsoft Entra directory. For more
information, see Configure and manage Microsoft Entra authentication with Azure SQL.
Cost Management
What are the costs associated with Mirroring?
There's no compute cost for mirroring data from the source to Fabric OneLake. The
Mirroring storage cost is free up to a certain limit based on the purchased compute
capacity SKU you provision. Learn more from the Mirroring section in Microsoft Fabric
Pricing . The compute on SQL, Power BI, or Spark to consume the mirrored data will be
charged based on the Capacity.
Licensing
What are licensing options for Fabric Mirroring?
A Power BI Premium, Fabric Capacity, or Trial Capacity is required. For more information
on licensing, see Microsoft Fabric licenses.
This guide helps you establish data security in your mirrored Azure SQL Managed
Instance database in Microsoft Fabric.
Security requirements
1. The System Assigned Managed Identity (SAMI) of your Azure SQL Managed
Instance needs to be enabled, and must be the primary identity. To configure or
verify that the SAMI is enabled, go to your SQL Managed Instance in the Azure
portal. Under Security in the resource menu, select Identity. Under System
assigned managed identity, set the Status to On.
After enabling the SAMI, if the SAMI is disabled or removed, the mirroring of
Azure SQL Managed Instance to Fabric OneLake will fail.
After enabling the SAMI, if you add a user assigned managed identity (UAMI),
it will become the primary identity, replacing the SAMI as primary. This will
cause replication to fail. To resolve, remove the UAMI.
2. Fabric needs to connect to the Azure SQL Managed Instance. For this purpose,
create a dedicated database user with limited permissions, to follow the principle
of least privilege. For a tutorial, see Tutorial: Configure Microsoft Fabric mirrored
databases from Azure SQL Managed Instance (Preview).
Important
You can also mask sensitive data from non-admins using dynamic data masking:
Related content
What is Mirroring in Fabric?
SQL granular permissions in Microsoft Fabric
This article covers troubleshooting steps for mirroring Azure SQL Managed Instance.
Fabric capacity paused/deleted: Mirroring stops.
1. Resume or assign capacity from the Azure portal.
2. Go to the Fabric mirrored database item. From the toolbar, select Stop replication.
3. Start replication by selecting Mirror database for the mirrored item in the Fabric portal.
Fabric capacity resumed: Mirroring isn't resumed.
1. Go to the Fabric mirrored database item. From the toolbar, select Stop replication.
2. Start replication by selecting Mirror database for the mirrored item in the Fabric portal.
Workspace deleted: Mirroring stops automatically. If mirroring is still active on the Azure SQL Managed Instance, execute the following stored procedure on your Azure SQL Managed Instance: exec sp_change_feed_disable_db;
SQL
SELECT * FROM sys.dm_change_feed_log_scan_sessions;
SQL
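The second query isn't shown above; it typically inspects the change feed error DMV, for example (illustrative sketch):
SQL
-- Review any errors reported by the change feed.
SELECT * FROM sys.dm_change_feed_errors;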
3. If there aren't any issues reported, execute the following stored procedure to
review the current configuration of the mirrored Azure SQL Managed Instance.
Confirm it was properly enabled.
SQL
EXEC sp_help_change_feed;
The key columns to look for here are the table_name and state . Any value besides
4 indicates a potential problem. (Tables shouldn't remain in states other than 4 for
long.)
4. If replication is still not working, verify that the correct SAMI object has
permissions (see SPN permissions).
a. In the Fabric portal, select the "..." ellipses option on the mirrored database item.
b. Select the Manage Permissions option.
c. Confirm that the Azure SQL Managed Instance name shows with Read, Write
permissions.
d. Ensure that AppId that shows up matches the ID of the SAMI of your Azure SQL
Managed Instance.
Managed identity
The System Assigned Managed Identity (SAMI) of the Azure SQL Managed Instance
needs to be enabled, and must be the primary identity.
After enablement, if the SAMI is turned Off, or is disabled and then enabled again, the
mirroring of Azure SQL Managed Instance to Fabric OneLake will fail. The SAMI after
re-enabling isn't the same identity as before disabling, so you need to grant the new
SAMI permissions to access the Fabric workspace.
The SAMI must be the primary identity. Verify the SAMI is the primary identity with the
following SQL: SELECT * FROM sys.dm_server_managed_identities;
User Assigned Managed Identity (UAMI) isn't supported. If you add a UAMI, it becomes
the primary identity, replacing the SAMI as primary. This causes replication to fail. To
resolve:
SPN permissions
Don't remove Azure SQL Managed Instance service principal name (SPN) contributor
permissions on Fabric mirrored database item.
If you accidentally remove the SPN permission, mirroring Azure SQL Managed Instance
won't function as expected. No new data can be mirrored from the source database.
If you remove Azure SQL Managed Instance SPN permissions or permissions aren't set
up correctly, use the following steps.
1. Add the SPN as a user by selecting the ... ellipses option on the mirrored
managed instance item.
2. Select the Manage Permissions option.
3. Enter the Azure SQL Managed Instance public endpoint. Provide Read and Write
permissions.
Related content
Limitations in Microsoft Fabric mirrored databases from Azure SQL Managed
Instance (Preview)
Frequently asked questions for Mirroring Azure SQL Managed Instance in
Microsoft Fabric (Preview)
Current limitations in Microsoft Fabric mirrored databases from Azure SQL Managed
Instance are listed on this page. This page is subject to change.
Feature availability
You can configure your Azure SQL Managed Instance for mirroring if it is deployed to
any Azure region except East US 2, West US 2, Central US, and West US. For a complete
list of region support, see Fabric regions that support Mirroring.
Column level
If the source table contains computed columns, these columns can't be mirrored to
Fabric OneLake.
If the source table contains columns with one of these data types, these columns
can't be mirrored to Fabric OneLake. The following data types are unsupported for
mirroring:
image
text/ntext
xml
json
rowversion/timestamp
sql_variant
User Defined Types (UDT)
geometry
geography
Column names for a SQL table can't contain spaces nor the following characters: ,
; { } ( ) \n \t = .
The following column level data definition language (DDL) operations aren't
supported on source tables when they're enabled for SQL Managed Instance
mirroring to Microsoft Fabric:
Alter column
Rename column ( sp_rename )
Asia Pacific:
Australia East
Australia Southeast
Central India
East Asia
Japan East
Korea Central
Southeast Asia
South India
Europe
North Europe
West Europe
France Central
Germany West Central
Norway East
Sweden Central
Switzerland North
Switzerland West
UK South
UK West
Americas:
Brazil South
Canada Central
Canada East
East US2
West US2
Related content
Monitor Fabric mirrored database replication
Model data in the default Power BI semantic model in Microsoft Fabric
Data in OneLake is stored in the open-source delta format and automatically made
available to all analytical engines on Fabric.
You can use built-in Power BI capabilities to access data in OneLake in DirectLake mode.
With Copilot enhancements in Fabric, you can use the power of generative AI to get key
insights on your business data. In addition to Power BI, you can use T-SQL to run
complex aggregate queries or use Spark for data exploration. You can seamlessly access
the data in notebooks and use data science to build machine learning models.
Important
If you're looking for BI reporting or analytics on your operational data in Azure Cosmos
DB, mirroring provides:
Every Mirrored Azure Cosmos DB database has three items you can interact with in your
Fabric workspace:
The mirrored database item. Mirroring manages the replication of data into
OneLake and conversion to Parquet, in an analytics-ready format. This enables
downstream scenarios like data engineering, data science, and more.
SQL analytics endpoint, which is automatically generated
Default semantic model, which is automatically generated
Mirrored database
The mirrored database shows the replication status and the controls to stop or start
replication in Fabric OneLake. You can also view your source database, in read-only
mode, using the Azure Cosmos DB data explorer. Using data explorer, you can view your
containers in your source Azure Cosmos DB database and query them. These operations
consume request units (RUs) from your Azure Cosmos DB account. Any changes to the
source database are reflected immediately in Fabric's source database view. Writing to
the source database isn't allowed from Fabric, as you can only view the data.
SQL analytics endpoint
Each mirrored database has an autogenerated SQL analytics endpoint that provides a
rich analytical experience on top of the OneLake's Delta tables created by the mirroring
process. You have access to familiar T-SQL commands that can define and query data
objects but not manipulate the data from the SQL analytics endpoint, as it's a read-only
copy.
You can perform the following actions in the SQL analytics endpoint:
Explore Delta Lake tables using T-SQL. Each table is mapped to a container from
your Azure Cosmos DB database.
Create no-code queries and views and explore them visually without writing a line
of code.
Join and query data in other mirrored databases, Warehouses, and Lakehouses in
the same workspace.
You can easily visualize and build BI reports based on SQL queries or views.
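As an illustration of the T-SQL options above, a simple view over a table mapped from a mirrored container could look like the following; the table and column names are hypothetical placeholders:
SQL
-- Hypothetical table mapped from an Azure Cosmos DB container.
CREATE VIEW dbo.vw_OrdersByCountry
AS
SELECT customer_country, COUNT(*) AS order_count
FROM dbo.Orders
GROUP BY customer_country;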
In addition to the SQL query editor, there's a broad ecosystem of tooling. These tools
include the mssql extension with Visual Studio Code, SQL Server Management Studio
(SSMS), and even GitHub Copilot. You can supercharge analysis and insights generation
from the tool of your choice.
Semantic model
The default semantic model is an automatically provisioned Power BI Semantic Model.
This feature enables business metrics to be created, shared, and reused. For more
information, see semantic models.
The continuous backup feature is a prerequisite for mirroring. You can enable either 7-
day or 30-day continuous backup on your Azure Cosmos DB account. If you are
enabling continuous backup specifically for mirroring, 7-day continuous backup is
recommended, as it is free of cost.
Note
Mirroring does not use Azure Cosmos DB's analytical store or change feed as a
change data capture source. You can continue to use these capabilities
independently, along with mirroring.
It could take a few minutes to replicate your Azure Cosmos DB Data into Fabric
OneLake. Depending on your data's initial snapshot or the frequency of
updates/deletes, replication could also take longer in some cases. Replication doesn't
affect the request units (RUs) you allocated for your transactional workloads.
Setup considerations
To mirror a database, it should already be provisioned in Azure. You must enable
continuous backup on the account as a prerequisite.
You can only mirror each database individually. You can choose which database to
mirror.
You can mirror the same database multiple times within the same workspace. As a
best practice, a single copy of the database can be reused across lakehouses,
warehouses, or other mirrored databases. You shouldn't need to set up multiple
mirrors to the same database.
You can also mirror the same database across different Fabric workspaces or
tenants.
Changes to Azure Cosmos DB containers, such as adding new containers and
deleting existing ones, are replicated seamlessly to Fabric. You can start mirroring
an empty database with no containers, for example, and mirroring seamlessly picks
up the containers added at a later point in time.
selectively. If you're using Power Query, you can also apply the ToJson function to
expand this data.
Note
Fabric has a limitation for string columns of 8 KB in size. For more information, see
data warehouse limitations.
If you rename a property in an item, Fabric tables retain both the old and new columns.
The old column will show null and the new one will show the latest value, for any items
that are replicated after the renaming operation.
If you change the data type of a property in Azure Cosmos DB items, the changes are
supported for compatible data types that can be converted. If the data types aren't
compatible for conversion in Delta, they're represented as null values.
SQL analytics endpoint tables convert Delta data types to T-SQL data types.
For example, if the Azure Cosmos DB item has addressName and AddressName as unique
properties, Fabric tables have corresponding addressName and AddressName_1 columns.
For more information, see replication limitations.
Security
Connections to your source database are based on account keys for your Azure Cosmos
DB accounts. If you rotate or regenerate the keys, you need to update the connections
to ensure replication works. For more information, see connections.
Account keys aren't directly visible to other Fabric users once the connection is set up.
You can limit who has access to the connections created in Fabric. Writes aren't
permitted to Azure Cosmos DB database either from the data explorer or analytics
endpoint in your mirrored database.
Mirroring doesn't currently support authentication using read-only account keys, single-
sign on (SSO) with Microsoft Entra IDs and role-based access control, or managed
identities.
Once the data is replicated into Fabric OneLake, you need to secure access to this data.
You can secure column filters and predicate-based row filters on tables to roles and
users in Microsoft Fabric:
You can also mask sensitive data from non-admin users using dynamic data masking:
Network security
Currently, mirroring doesn't support private endpoints or customer managed keys
(CMK) on OneLake. Mirroring isn't supported for Azure Cosmos DB accounts with
network security configurations less permissive than all networks, using service
endpoints, using private endpoints, using IP addresses, or using any other settings that
could limit public network access to the account. Azure Cosmos DB accounts should be
open to all networks to work with mirroring.
For an Azure Cosmos DB account with a primary write region and multiple read regions,
mirroring chooses the Azure Cosmos DB read region closest to the region where Fabric
capacity is configured. This selection helps provide low-latency replication for mirroring.
When you switch your Azure Cosmos DB account to a recovery region, mirroring
automatically selects the nearest Azure Cosmos DB region again.
7 Note
Learn more on how to access OneLake using ADLS Gen2 APIs or SDK, the OneLake File
explorer, and Azure Storage explorer.
You can connect to the SQL analytics endpoint from tools such as SQL Server
Management Studio (SSMS) or using drivers like Microsoft Open Database Connectivity
(ODBC) and Java Database Connectivity (JDBC). For more information, see SQL analytics
endpoint connectivity.
You can also access mirrored data with services such as:
Azure services like Azure Databricks, Azure HDInsight, or Azure Synapse Analytics
Fabric Lakehouse using shortcuts for data engineering and data science scenarios
Other mirrored databases or warehouses in the Fabric workspace
You can also build medallion architecture solutions, cleaning and transforming the data
that lands in the mirrored database as the bronze layer. For more information, see
medallion architecture support in Fabric.
Pricing
Mirroring is free of cost for compute used to replicate your Cosmos DB data into Fabric
OneLake. Storage in OneLake is free of cost based on certain conditions. For more
information, see OneLake pricing for mirroring. The compute usage for querying data
via SQL, Power BI, or Spark is still charged based on the Fabric Capacity.
If you're using the data explorer in Fabric mirroring, you accrue typical costs based on
request unit (RU) usage to explore the containers and query the items in the source
Azure Cosmos DB database. The Azure Cosmos DB continuous backup feature is a
prerequisite to mirroring: Standard charges for continuous backup apply. There are no
additional charges for mirroring on continuous backup billing. For more information, see
Azure Cosmos DB pricing .
Next step
Tutorial: Configure Microsoft Fabric mirrored databases from Azure Cosmos DB
(Preview)
Related content
Limitations in Microsoft Fabric mirrored databases from Azure Cosmos DB
(Preview)
FAQ: Microsoft Fabric mirrored databases from Azure Cosmos DB
In this tutorial, you configure a Fabric mirrored database from an existing Azure Cosmos
DB for NoSQL account.
Mirroring incrementally replicates Azure Cosmos DB data into Fabric OneLake in near
real-time, without affecting the performance of transactional workloads or consuming
Request Units (RUs). You can build Power BI reports directly on the data in OneLake,
using DirectLake mode. You can run ad hoc queries in SQL or Spark, build data models
using notebooks and use built-in Copilot and advanced AI capabilities in Fabric to
analyze the data.
Important
Prerequisites
An existing Azure Cosmos DB for NoSQL account.
If you don't have an Azure subscription, Try Azure Cosmos DB for NoSQL free .
If you have an existing Azure subscription, create a new Azure Cosmos DB for
NoSQL account.
An existing Fabric capacity. If you don't have an existing capacity, start a Fabric
trial. Mirroring might not be available in some Fabric regions. For more
information, see supported regions.
Tip
During the public preview, it's recommended to use a test or development copy of
your existing Azure Cosmos DB data that can be recovered quickly from a backup.
Configure your Azure Cosmos DB account
First, ensure that the source Azure Cosmos DB account is correctly configured to use
with Fabric mirroring.
2. Ensure that continuous backup is enabled. If not enabled, follow the guide at
migrate an existing Azure Cosmos DB account to continuous backup to enable
continuous backup. This feature might not be available in some scenarios. For
more information, see database and account limitations.
3. Ensure that the networking options are set to public network access for all
networks. If not, follow the guide at configure network access to an Azure Cosmos
DB account.
4. Select Create, locate the Data Warehouse section, and then select Mirrored Azure
Cosmos DB (Preview).
5. Provide a name for the mirrored database and then select Create.
2. Provide credentials for the Azure Cosmos DB for NoSQL account including these
items:
Note
2. Wait two to five minutes. Then, select Monitor replication to see the status of the
replication action.
3. After a few minutes, the status should change to Running, which indicates that the
containers are being synchronized.
Tip
If you can't find the containers and the corresponding replication status, wait
a few seconds and then refresh the pane. In rare cases, you might receive
transient error messages. You can safely ignore them and continue to refresh.
4. When mirroring finishes the initial copying of the containers, a date appears in the
last refresh column. If data was successfully replicated, the total rows column
would contain the number of items replicated.
2. Here, monitor the current state of replication. For more information and details on
the replication states, see Monitor Fabric mirrored database replication.
2. Select View, then Source database. This action opens the Azure Cosmos DB data
explorer with a read-only view of the source database.
3. Select a container, then open the context menu and select New SQL query.
4. Run any query. For example, use SELECT COUNT(1) FROM container to count the
number of items in the container.
Note
All the reads on source database are routed to Azure and will consume
Request Units (RUs) allocated on the account.
3. Each container in the source database should be represented in the SQL analytics
endpoint as a warehouse table.
4. Select any table, open the context menu, then select New SQL Query, and finally
select Select Top 100.
5. The query executes and returns 100 records in the selected table.
6. Open the context menu for the same table and select New SQL Query. Write an
example query that uses aggregates like SUM, COUNT, MIN, or MAX. Join multiple
tables in the warehouse to execute the query across multiple containers.
Note
SQL
SELECT
    d.[product_category_name],
    t.[order_status],
    c.[customer_country],
    s.[seller_state],
    p.[payment_type],
    SUM(o.[price]) AS price,
    SUM(o.[freight_value]) AS freight_value
FROM
    -- Base table of order line items; substitute your own table names.
    [dbo].[OrdersDB_order_items] o
INNER JOIN
    [dbo].[OrdersDB_order_payments] p
    ON o.[order_id] = p.[order_id]
INNER JOIN
    [dbo].[OrdersDB_order_status] t
    ON o.[order_id] = t.[order_id]
INNER JOIN
    [dbo].[OrdersDB_customers] c
    ON t.[customer_id] = c.[customer_id]
INNER JOIN
    [dbo].[OrdersDB_productdirectory] d
    ON o.[product_id] = d.[product_id]
INNER JOIN
    [dbo].[OrdersDB_sellers] s
    ON o.[seller_id] = s.[seller_id]
GROUP BY
    d.[product_category_name],
    t.[order_status],
    c.[customer_country],
    s.[seller_state],
    p.[payment_type];
This example assumes the name of your table and columns. Use your own
table and columns when writing your SQL query.
7. Select the query and then select Save as view. Give the view a unique name. You
can access this view at any time from the Fabric portal.
9. Select New visual query. Use the query editor to build complex queries.
Tip
You can also optionally use Copilot or other enhancements to build dashboards
and reports without any further data movement.
More examples
Learn more about how to access and query mirrored Azure Cosmos DB data in Fabric:
How to: Query nested data in Microsoft Fabric mirrored databases from Azure
Cosmos DB (Preview)
How to: Access mirrored Azure Cosmos DB data in Lakehouse and notebooks from
Microsoft Fabric (Preview)
How to: Join mirrored Azure Cosmos DB data with other mirrored databases in
Microsoft Fabric (Preview)
Related content
Mirroring Azure Cosmos DB (Preview)
FAQ: Microsoft Fabric mirrored databases from Azure Cosmos DB
This article answers frequently asked questions about Mirrored Azure Cosmos DB
database in Microsoft Fabric.
Important
General questions
How is mirroring different from shortcuts in
relation to Azure Cosmos DB?
Mirroring replicates the source database into Fabric OneLake in open-source delta
format. You can run analytics on this data from anywhere in Fabric. Shortcuts don't
replicate the data into Fabric OneLake. Instead, shortcuts link to the source data without
data movement. Currently, Azure Cosmos DB is only available as a source for mirroring.
In contrast, a copy job is a scheduled job, which can add end-to-end latency for
incremental jobs. Additionally, copy jobs require management to pick up
incremental changes, add to compute costs in Fabric, and affect request unit
consumption on the source database in Azure Cosmos DB.
Copy jobs are useful for one-time copy jobs from Azure Cosmos DB, but mirroring is
ideal for tracking incremental changes.
2 Warning
Pricing
What costs are associated with mirroring Azure
Cosmos DB?
Mirroring is in preview. There are currently no costs for compute used to replicate data
from Azure Cosmos DB to Fabric OneLake. Storage costs for OneLake are also free up to
certain limits. For more information, see OneLake pricing for mirroring . The compute
for querying data using SQL, Power BI, or Spark is charged at regular rates.
For Azure Cosmos DB, continuous backup is a prerequisite to mirroring. If you enabled
any continuous backup tier before mirroring, you don't accrue any extra cost. If you
enable continuous backup specifically for mirroring, 7-day backup mode is free of cost;
if you enable 30-day backup, you're billed the price associated with that feature. For
more information, see Azure Cosmos DB pricing .
If you use data explorer to view the source data from Azure Cosmos DB, you will accrue
costs based on Request Units (RU) usage.
In Azure Cosmos DB, continuous backup is a prerequisite for mirroring. This prerequisite
allows Fabric to mirror your data without impacting your transactional workloads or
requiring the analytical store.
Does mirroring affect how Azure Synapse Link
works with Azure Cosmos DB?
No, mirroring in Fabric isn't related to Azure Synapse Link. You can continue to use
Azure Synapse Link while using Fabric mirroring.
Setup
Can I select specific containers within an Azure
Cosmos DB database for mirroring?
No, when you mirror a database from Azure Cosmos DB, all containers are replicated
into Fabric OneLake.
This view of the live data directly in the Fabric portal is a useful tool to determine if the
data in OneLake is recent or represented correctly when compared to the source Azure
Cosmos DB database. Operations using the data explorer on the live Azure Cosmos DB
data can accrue request unit consumption.
Additionally, use Lakehouse to analyze the OneLake data along with other data. From
Lakehouse, you can utilize Spark to query data with notebooks.
) Important
Replication actions
How can I stop or disable replication for a
mirrored Azure Cosmos DB database?
Stop replication by using the Fabric portal's stop replication option. This action
completely stops replication but doesn't remove any data that already exists in OneLake.
API support
Can I configure Azure Cosmos DB mirroring
programmatically?
No, support for configuring mirroring programmatically is currently not available.
Security
Can you access an Azure Cosmos DB mirrored
database using Power BI Gateway or behind a
firewall?
No, this level of access is currently not supported.
Licensing
What are the licensing options for Azure Cosmos
DB mirroring?
Power BI Premium, Fabric Capacity, or Trial Capacity licensing is required to use
mirroring.
Related content
Overview of Microsoft Fabric mirrored databases from Azure Cosmos DB
Troubleshooting: Microsoft Fabric mirrored databases from Azure Cosmos DB
Limitations: Microsoft Fabric mirrored databases from Azure Cosmos DB
Feedback
Was this page helpful? Yes No
This article details the current limitations for Azure Cosmos DB accounts mirrored into
Microsoft Fabric. The limitation and quota details on this page are subject to change in
the future.
) Important
Availability
Mirroring is supported in a specific set of regions for Fabric and APIs for Azure Cosmos
DB.
Supported APIs
Mirroring is only available for the Azure Cosmos DB account types listed here.
Available regions:
Asia Pacific:
Australia East
Australia Southeast
Central India
East Asia
Japan East
Korea Central
Southeast Asia
South India
Europe:
North Europe
West Europe
France Central
Germany West Central
Norway East
Sweden Central
Switzerland North
Switzerland West
UK South
UK West
Americas:
Brazil South
Canada Central
Canada East
Central US
East US
East US2
North Central US
West US
West US2
Security limitations
Azure Cosmos DB read-write account keys are the only supported mechanism to
connect to the source account. Read-only account keys, managed identities, and
passwordless authentication with role-based access control aren't supported.
You must update the connection credentials for Fabric mirroring if the account
keys are rotated. If you don't update the keys, mirroring fails. To resolve this failure,
stop replication, update the credentials with the newly rotated keys, and then
restart replication.
Fabric users with access to the workspace automatically inherit access to the mirrored
database. However, you can granularly control workspace and tenant level access
to manage access for users in your organization.
You can directly share the mirrored database in Fabric.
Permissions
If you only have viewer permissions in Fabric, you can't preview or query data in
the SQL analytics endpoint.
If you intend to use the data explorer, the Azure Cosmos DB data explorer doesn't
use the same permissions as Fabric. Requests to view and query data using the
data explorer are routed to Azure instead of Fabric.
Network security
The source Azure Cosmos DB account must enable public network access for all
networks.
Private endpoints aren't supported for Azure Cosmos DB accounts.
Network isolation using techniques and features like IP addresses or service
endpoints isn't supported for Azure Cosmos DB accounts.
Data in OneLake doesn't support private endpoints, customer managed keys, or
double encryption.
Replication limitations
Mirroring doesn't support containers that contain items with property names
containing either whitespaces or wild-card characters. This limitation causes
mirroring for the specific container to fail. Other containers within the same
database can still mirror successfully. If property names are updated to remove
these invalid characters, you must configure a new mirror to the same database
and container and you can't use the old mirror.
Fabric OneLake mirrors from the geographically closest Azure region to Fabric's
capacity region in scenarios where an Azure Cosmos DB account has multiple read
regions. In disaster recovery scenarios, mirroring automatically scans and picks up
new read regions as your read regions could potentially fail over and change.
Delete operations in the source container are immediately reflected in Fabric
OneLake using mirroring. Soft-delete operations using time-to-live (TTL) values
aren't supported.
Mirroring doesn't support custom partitioning.
Fabric has existing limitations with T-SQL. For more information, see T-SQL
limitations.
Nested data
Nested JSON objects in Azure Cosmos DB items are represented as JSON strings in
warehouse tables.
Commands such as OPENJSON , CROSS APPLY , and OUTER APPLY are available to
expand JSON string data selectively (see the sketch after this list).
Power Query includes ToJson to expand JSON string data selectively.
Mirroring doesn't have schema constraints on the level of nesting. For more
information, see Azure Cosmos DB analytical store schema constraints.
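For example, here's a minimal sketch of expanding a JSON string column with OPENJSON
and CROSS APPLY; the table and column names are hypothetical and match the nested data
how-to later in this documentation.
SQL
SELECT
    t.id,
    item.item_description
FROM [dbo].[OrdersDB_TestC] AS t
CROSS APPLY OPENJSON(t.items) WITH (
    item_description varchar(200) '$.item_description'
) AS item;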
Give feedback
If you would like to give feedback on current limitations, features, or issues; let us know
at [email protected].
Related content
Mirroring Azure Cosmos DB (Preview)
FAQ: Microsoft Fabric mirrored databases from Azure Cosmos DB
Troubleshooting: Microsoft Fabric mirrored databases from Azure Cosmos DB
Feedback
Was this page helpful? Yes No
) Important
Here's a list of common issues and relevant troubleshooting steps to follow if mirroring
an Azure Cosmos DB database to Microsoft Fabric isn't working as expected.
Once the continuous backup feature is enabled, return to the Fabric mirroring setup and
continue with the remaining steps.
If there's an error message when enabling continuous backup for an Azure Cosmos DB
account, the account might have limitations blocking the feature. For example, if you
previously deleted analytical store for the account, the account can't support continuous
backup. In this scenario, the only remaining option is to use a new Azure Cosmos DB
account for mirroring.
Refresh the Fabric portal and determine if the problem is automatically resolved. Also,
you can stop and start replication. If none of these options work, open a support ticket.
In most cases, the Monitor replication option can provide further detail indicating
whether data is replicating to Fabric successfully. A common troubleshooting step is to
check if the last refreshed time is recent. If the time isn't recent, stop and then restart
replication as the next step. Note, "last refreshed time" is only updated if the source
database has changes since the time noted for replication. If the source database has no
updates, deletes, or inserts, the "last refreshed time" isn't updated.
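As an additional check, you can compare row counts between the mirrored table and the
source container. Here's a minimal sketch to run against the SQL analytics endpoint; the
table name is hypothetical.
SQL
SELECT COUNT(*) AS mirrored_row_count
FROM [dbo].[OrdersDB_customers];
Compare the result with a count query (for example, SELECT VALUE COUNT(1) FROM c ) run
against the source container in the Azure Cosmos DB data explorer. Keep in mind that
reads against the source consume request units.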
If the data is still not available, use Lakehouse to create a shortcut and run a Spark query
from a notebook. Spark always shows the latest data. If the data is available in Spark but
not SQL analytics, open a support ticket.
If the data is also not available in Spark, there might be an unintended issue with
replication latency. Wait for some time and retry replication. If problems persist, open a
support ticket.
Related content
Overview of Microsoft Fabric mirrored databases from Azure Cosmos DB
FAQ: Microsoft Fabric mirrored databases from Azure Cosmos DB
Limitations: Microsoft Fabric mirrored databases from Azure Cosmos DB
Feedback
Was this page helpful? Yes No
In this guide, join two Azure Cosmos DB for NoSQL containers from separate databases
using Fabric mirroring.
You can join data from Azure Cosmos DB with any other mirrored databases, warehouses, or
lakehouses within the same Fabric workspace.
) Important
Prerequisites
An existing Azure Cosmos DB for NoSQL account.
If you don't have an Azure subscription, Try Azure Cosmos DB for NoSQL free .
If you have an existing Azure subscription, create a new Azure Cosmos DB for
NoSQL account.
An existing Fabric capacity. If you don't have an existing capacity, start a Fabric
trial.
The Azure Cosmos DB for NoSQL account must be configured for Fabric mirroring.
For more information, see account requirements.
Tip
During the public preview, it's recommended to use a test or development copy of
your existing Azure Cosmos DB data that can be recovered quickly from a backup.
5. Wait for replication to finish the initial snapshot of data for both mirrors.
3. In the menu, select + Warehouses. Select the SQL analytics endpoint item for the
other mirrored database.
4. Open the context menu for the table and select New SQL Query. Write an example
query that combines both databases.
For example, this query would execute across multiple containers and databases,
without any data movement. This example assumes the name of your table and
columns. Use your own table and columns when writing your SQL query.
SQL
SELECT
    product_category_count = COUNT(product_category),
    product_category
FROM
    [StoreSalesDB].[dbo].[storeorders_Sql] AS StoreSales
INNER JOIN
    [dbo].[OrdersDB_order_status] AS OrderStatus
    ON StoreSales.order_id = OrderStatus.order_id
WHERE
    order_status = 'delivered'
    AND OrderStatus.order_month_year > '6/1/2022'
GROUP BY
    product_category
ORDER BY
    product_category_count DESC
You can add data from more sources and query them seamlessly. Fabric makes it
simple to bring your organizational data together.
Related content
FAQ: Microsoft Fabric mirrored databases from Azure Cosmos DB
Troubleshooting: Microsoft Fabric mirrored databases from Azure Cosmos DB
Limitations in Microsoft Fabric mirrored databases from Azure Cosmos DB
(Preview)
Feedback
Was this page helpful? Yes No
Use the mirrored database in Microsoft Fabric to query nested JSON data sourced from
Azure Cosmos DB for NoSQL.
) Important
Prerequisites
An existing Azure Cosmos DB for NoSQL account.
If you don't have an Azure subscription, Try Azure Cosmos DB for NoSQL free .
If you have an existing Azure subscription, create a new Azure Cosmos DB for
NoSQL account.
An existing Fabric capacity. If you don't have an existing capacity, start a Fabric
trial.
The Azure Cosmos DB for NoSQL account must be configured for Fabric mirroring.
For more information, see account requirements.
Tip
During the public preview, it's recommended to use a test or development copy of
your existing Azure Cosmos DB data that can be recovered quickly from a backup.
3. Use + New container to create a new container. For this guide, name the container
TestC . The corresponding database name is arbitrary.
4. Use the + New item option multiple times to create and save these five JSON
items.
JSON
{
"id": "123-abc-xyz",
"name": "A 13",
"country": "USA",
"items": [
{
"purchased": "11/23/2022",
"order_id": "3432-2333-2234-3434",
"item_description": "item1"
},
{
"purchased": "01/20/2023",
"order_id": "3431-3454-1231-8080",
"item_description": "item2"
},
{
"purchased": "02/20/2023",
"order_id": "2322-2435-4354-2324",
"item_description": "item3"
}
]
}
JSON
{
"id": "343-abc-def",
"name": "B 22",
"country": "USA",
"items": [
{
"purchased": "01/20/2023",
"order_id": "2431-2322-1545-2322",
"item_description": "book1"
},
{
"purchased": "01/21/2023",
"order_id": "3498-3433-2322-2320",
"item_description": "book2"
},
{
"purchased": "01/24/2023",
"order_id": "9794-8858-7578-9899",
"item_description": "book3"
}
]
}
JSON
{
"id": "232-abc-x43",
"name": "C 13",
"country": "USA",
"items": [
{
"purchased": "04/03/2023",
"order_id": "9982-2322-4545-3546",
"item_description": "clothing1"
},
{
"purchased": "05/20/2023",
"order_id": "7989-9989-8688-3446",
"item_description": "clothing2"
},
{
"purchased": "05/27/2023",
"order_id": "9898-2322-1134-2322",
"item_description": "clothing3"
}
]
}
JSON
{
"id": "677-abc-yuu",
"name": "D 78",
"country": "USA"
}
JSON
{
"id": "979-abc-dfd",
"name": "E 45",
"country": "USA"
}
Set up mirroring and prerequisites
Configure mirroring for the Azure Cosmos DB for NoSQL database. If you're unsure how
to configure mirroring, refer to the configure mirrored database tutorial.
2. Create a new connection and mirrored database using your Azure Cosmos DB
account's credentials.
3. Open the context menu for the test table and select New SQL Query.
4. Run this query to expand the items array with OPENJSON . This query uses OUTER
APPLY to include items that might not have an items array.
SQL
SELECT
t.name,
t.id,
t.country,
P.purchased,
P.order_id,
P.item_description
FROM OrdersDB_TestC AS t
OUTER APPLY OPENJSON(t.items) WITH
(
purchased datetime '$.purchased',
order_id varchar(100) '$.order_id',
item_description varchar(200) '$.item_description'
) as P
Tip
When choosing the data types in OPENJSON , using varchar(max) for string
types could worsen query performance. Instead, use varchar(n) , where n is a
number appropriate for your data. The lower n is, the better the query
performance is likely to be.
5. Use CROSS APPLY in the next query to only show items with an items array.
SQL
SELECT
t.name,
t.id,
t.country,
P.purchased,
P.order_id,
P.item_description
FROM
OrdersDB_TestC as t CROSS APPLY OPENJSON(t.items) WITH (
purchased datetime '$.purchased',
order_id varchar(100) '$.order_id',
item_description varchar(200) '$.item_description'
) as P
3. Use + New container to create a new container. For this guide, name the container
TestD . The corresponding database name is arbitrary.
4. Use the + New item option to create and save this JSON item.
JSON
{
"id": "eadca09b-e618-4090-a25d-b424a26c2361",
"entityType": "Package",
"packages": [
{
"packageid": "fiwewsb-f342-jofd-a231-c2321",
"storageTemperature": "69",
"highValue": true,
"items": [
{
"id": "1",
"name": "Item1",
"properties": {
"weight": "2",
"isFragile": "no"
}
},
{
"id": "2",
"name": "Item2",
"properties": {
"weight": "4",
"isFragile": "yes"
}
}
]
},
{
"packageid": "d24343-dfdw-retd-x414-f34345",
"storageTemperature": "78",
"highValue": false,
"items": [
{
"id": "3",
"name": "Item3",
"properties": {
"weight": "12",
"isFragile": "no"
}
},
{
"id": "4",
"name": "Item4",
"properties": {
"weight": "12",
"isFragile": "no"
}
}
]
}
],
"consignment": {
"consignmentId": "ae21ebc2-8cfc-4566-bf07-b71cdfb37fb2",
"customer": "Humongous Insurance",
"deliveryDueDate": "2020-11-08T23:38:50.875258Z"
}
}
1. Open the context menu for the TestD table and select New SQL Query again.
2. Run this query to expand all levels of nested data using OUTER APPLY with
consignment.
SQL
SELECT
P.id,
R.packageId,
R.storageTemperature,
R.highValue,
G.id,
G.name,
H.weight,
H.isFragile,
Q.consignmentId,
Q.customer,
Q.deliveryDueDate
FROM
OrdersDB_TestD as P CROSS APPLY OPENJSON(P.packages) WITH (
packageId varchar(100) '$.packageid',
storageTemperature INT '$.storageTemperature',
highValue varchar(100) '$.highValue',
items nvarchar(MAX) AS JSON ) as R
OUTER APPLY OPENJSON (R.items) WITH (
id varchar(100) '$.id',
name varchar(100) '$.name',
properties nvarchar(MAX) as JSON
) as G OUTER APPLY OPENJSON(G.properties) WITH (
weight INT '$.weight',
isFragile varchar(100) '$.isFragile'
) as H OUTER APPLY OPENJSON(P.consignment) WITH (
consignmentId varchar(200) '$.consignmentId',
customer varchar(100) '$.customer',
deliveryDueDate Date '$.deliveryDueDate'
) as Q
7 Note
When expanding packages , items is represented as JSON, which can
optionally be expanded. The items property has sub-properties as JSON, which
can also optionally be expanded.
3. Finally, run a query that chooses when to expand specific levels of nesting.
SQL
SELECT
P.id,
R.packageId,
R.storageTemperature,
R.highValue,
R.items,
Q.consignmentId,
Q.customer,
Q.deliveryDueDate
FROM
OrdersDB_TestD as P CROSS APPLY OPENJSON(P.packages) WITH (
packageId varchar(100) '$.packageid',
storageTemperature INT '$.storageTemperature',
highValue varchar(100) '$.highValue',
items nvarchar(MAX) AS JSON
) as R
OUTER APPLY OPENJSON(P.consignment) WITH (
consignmentId varchar(200) '$.consignmentId',
customer varchar(100) '$.customer',
deliveryDueDate Date '$.deliveryDueDate'
) as Q
7 Note
Property limits for nested levels are not enforced in this T-SQL query
experience.
Related content
FAQ: Microsoft Fabric mirrored databases from Azure Cosmos DB
Troubleshooting: Microsoft Fabric mirrored databases from Azure Cosmos DB
Limitations in Microsoft Fabric mirrored databases from Azure Cosmos DB
(Preview)
Feedback
Was this page helpful? Yes No
In this guide, you learn how to access mirrored Azure Cosmos DB data in Lakehouse
and notebooks from Microsoft Fabric (Preview).
) Important
Prerequisites
An existing Azure Cosmos DB for NoSQL account.
If you don't have an Azure subscription, Try Azure Cosmos DB for NoSQL free .
If you have an existing Azure subscription, create a new Azure Cosmos DB for
NoSQL account.
An existing Fabric capacity. If you don't have an existing capacity, start a Fabric
trial.
The Azure Cosmos DB for NoSQL account must be configured for Fabric mirroring.
For more information, see account requirements.
Tip
During the public preview, it's recommended to use a test or development copy of
your existing Azure Cosmos DB data that can be recovered quickly from a backup.
3. Select Create, locate the Data Engineering section, and then select Lakehouse.
5. Now select Get Data, and then New shortcut. From the list of shortcut options,
select Microsoft OneLake.
6. Select the mirrored Azure Cosmos DB for NoSQL database from the list of mirrored
databases in your Fabric workspace. Select the tables to use with Lakehouse, select
Next, and then select Create.
7. Open the context menu for the table in Lakehouse and select New or existing
notebook.
8. A new notebook automatically opens and loads a dataframe using SELECT LIMIT
1000 .
Python
7 Note
This example assumes the name of your table. Use your own table when
writing your Spark query.
Python
Tip
The table names in these sample code blocks assume a certain data schema.
Feel free to replace this with your own table and column names.
Python
dfCDB =
dfMirror.filter(dfMirror.categoryId.isNotNull()).groupBy("categoryId").
agg(max("price").alias("max_price"), max("id").alias("id"))
4. Next, configure Spark to write back to your Azure Cosmos DB for NoSQL account
using your credentials, database name, and container name.
Python
writeConfig = {
"spark.cosmos.accountEndpoint" :
"https://xxxx.documents.azure.com:443/",
"spark.cosmos.accountKey" : "xxxx",
"spark.cosmos.database" : "xxxx",
"spark.cosmos.container" : "xxxx"
}
Python
dfCDB.write.mode("APPEND").format("cosmos.oltp").options(**writeConfig)
.save()
) Important
Related content
FAQ: Microsoft Fabric mirrored databases from Azure Cosmos DB
Troubleshooting: Microsoft Fabric mirrored databases from Azure Cosmos DB
Limitations in Microsoft Fabric mirrored databases from Azure Cosmos DB
(Preview)
Feedback
Was this page helpful? Yes No
Provide product feedback | Ask the community
Mirroring Azure Databricks Unity
Catalog (Preview)
Article • 11/20/2024
Many organizations today register their data in Unity Catalog within Azure Databricks. A
mirrored Unity Catalog in Fabric enables customers to read data managed by Unity
Catalog from Fabric workloads. Azure Databricks and Fabric are better together.
For a tutorial on configuring your Azure Databricks Workspace for mirroring the Unity
Catalog into Fabric, see Tutorial: Configure Microsoft Fabric mirrored databases from
Azure Databricks (Preview).
Mirrored databases in Fabric give you a highly integrated, end-to-end, and easy-to-use
experience that is designed to simplify your analytics needs and built for openness and
collaboration between Microsoft Fabric and Azure Databricks.
When you use Fabric to read data that is registered in Unity Catalog, there is no data
movement or data replication. Only the Azure Databricks catalog structure is mirrored to
Fabric and the underlying catalog data is accessed through shortcuts. Hence any
changes in data are reflected immediately in Fabric.
When you mirror an Azure Databricks Unity Catalog, Fabric creates three items:
The mirrored Azure Databricks item
A SQL analytics endpoint
A default semantic model
You can access your mirrored Azure Databricks data multiple ways:
Each mirrored Azure Databricks item has an autogenerated SQL analytics endpoint
that provides a rich analytical experience created by the mirroring process. Use T-
SQL commands to define and query data objects from the read-only SQL analytics
endpoint.
Use Power BI with Direct Lake mode to create reports against the Azure Databricks
item.
Metadata sync
When you create a new mirrored database from Azure Databricks in Fabric, the
Automatically sync future catalog changes for the selected schema option is enabled by
default. The following metadata changes are reflected from your Azure Databricks
workspace to Fabric if automatic sync is enabled:
Schema/table selection:
By default, the entire catalog is selected when the user adds the catalog.
The user can exclude certain tables within the schema.
Unselecting a schema unselects all the tables within the schema.
If the user goes back and selects the schema, all tables within the schema are
selected again.
Same selection behavior applies to schemas within a catalog.
Related content
Tutorial: Configure Microsoft Fabric mirrored databases from Azure Databricks
(Preview)
Secure Fabric mirrored databases from Azure Databricks
Limitations in Microsoft Fabric mirrored databases from Azure Databricks (Preview)
Review the FAQ
Feedback
Was this page helpful? Yes No
Prerequisites
Create or use an existing Azure Databricks workspace with Unity Catalog enabled.
You must have the EXTERNAL USE SCHEMA privilege on the schema in Unity Catalog
that contains the tables that will be accessed from Fabric. For more information,
see Control external access to data in Unity Catalog.
Turn on the tenant setting "Mirrored Azure Databricks Catalog (Preview)" at the
tenant or capacity level for this feature.
You need to use Fabric's permissions model to set access controls for catalogs,
schemas, and tables in Fabric.
Azure Databricks workspaces shouldn't be behind a private endpoint.
Storage accounts containing Unity Catalog data can't be behind a firewall.
1. Navigate to https://powerbi.com .
If you don't have an existing connection, create a new connection and enter
all the details. You can authenticate to your Azure Databricks workspace
using 'Organizational account' or "Service principal". To create a connection,
you must be either a user or an admin of the Azure Databricks workspace.
4. Once you connect to an Azure Databricks workspace, on the Choose tables from a
Databricks catalog page, you're able to select the catalog, schemas, and tables via
the inclusion/exclusion list that you want to add and access from Microsoft Fabric.
Pick the catalog and its related schemas and tables that you want to add to your
Fabric workspace.
You can only see the catalogs, schemas, and tables that you have access to,
based on the privileges granted to you, as described in the privilege model
at Unity Catalog privileges and securable objects.
By default, the Automatically sync future catalog changes for the selected
schema is enabled. For more information, see Mirroring Azure Databricks
Unity Catalog (Preview).
When you have made your selections, select Next.
5. By default, the name of the item will be the name of the catalog you're trying to
add to Fabric. On the Review and create page, you can review the details and
optionally change the mirrored database item name, which must be unique in your
workspace. Select Create.
7. You can also see a preview of the data when you access a shortcut by selecting the
SQL analytics endpoint. Open the SQL analytics endpoint item to launch the
Explorer and Query editor page. You can query your mirrored Azure Databricks
tables with T-SQL in the SQL Editor.
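For example, here's a minimal T-SQL sketch to run in the SQL Editor against the SQL
analytics endpoint; the schema and table names are hypothetical placeholders for the
objects you selected from your catalog.
SQL
SELECT TOP (100) *
FROM [my_schema].[my_table];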
1. First, we create a lakehouse. If you already have a lakehouse in this workspace, you
can use an existing lakehouse.
a. Select your workspace in the navigation menu.
b. Select + New > Lakehouse.
c. Provide a name for your lakehouse in the Name field, and select Create.
2. In the Explorer view of your lakehouse, in the Get data in your lakehouse menu,
under Load data in your lakehouse, select the New shortcut button.
3. Select Microsoft OneLake. Select a catalog. This is the data item that you created
in the previous steps. Then select Next.
4. Select tables within the schema, and select Next.
5. Select Create.
6. Shortcuts are now available in your Lakehouse to use with your other Lakehouse
data. You can also use Notebooks and Spark to perform data processing on the
data for these catalog tables that you added from your Azure Databricks
workspace.
Tip
For the best experience, it's recommended that you use Microsoft Edge Browser for
Semantic Modeling Tasks.
In addition to using the default Power BI semantic model, you can update it to add or
remove tables, or you can create a new semantic model. To update the default semantic
model:
Related content
Secure Fabric mirrored databases from Azure Databricks
Feedback
Was this page helpful? Yes No
This article helps you establish data security in your mirrored Azure Databricks in
Microsoft Fabric.
Unity Catalog
Users must reconfigure Unity Catalog policies and permissions in Fabric.
To allow Azure Databricks Catalogs to be available in Fabric, see Control external access
to data in Unity Catalog.
Unity Catalog policies and permissions aren't mirrored in Fabric. Users can't reuse Unity
Catalog policies and permissions in Fabric. Permissions set on catalogs, schemas, and
tables inside Azure Databricks don't carry over to Fabric workspaces. You need to use
Fabric's permission model to set access control on objects in Fabric.
The credential used to create the connection for this catalog mirroring is used for all
data queries against Unity Catalog.
Permissions
Permissions set on catalogs, schemas, and tables in your Azure Databricks workspace
can't be replicated to your Fabric workspace. Use Fabric's permissions model to set
access controls for catalogs, schemas, and tables in Fabric.
When selecting objects to mirror, you can only see the catalogs, schemas, and tables
that you have access to, based on the privileges granted to you, as described in the
privilege model at Unity Catalog privileges and securable objects.
For more information on setting up Fabric Workspace security, see the Permission model
and Roles in workspaces in Microsoft Fabric.
Related content
Tutorial: Configure Microsoft Fabric mirrored databases from Azure Databricks
(Preview)
Limitations in Microsoft Fabric mirrored databases from Azure Databricks (Preview)
Review the FAQ
Mirroring Azure Databricks Unity Catalog (Preview)
Feedback
Was this page helpful? Yes No
This article lists current limitations with mirrored Azure Databricks in Microsoft Fabric.
Network
Azure Databricks workspaces shouldn't be behind a private endpoint.
Storage accounts containing Unity Catalog data can't be behind a firewall.
Azure Databricks IP Access lists aren't supported.
Limitations
The mirrored Azure Databricks item doesn't support renaming a schema or table
after it's added to the inclusion or exclusion list.
Azure Databricks workspaces shouldn't be behind a private endpoint.
Azure Data Lake Storage Gen 2 account that is utilized by your Azure Databricks
workspace must also be accessible to Fabric.
Supported regions
Here's a list of regions that support mirroring for Azure Databricks Catalog:
Asia Pacific:
Australia East
Australia Southeast
Central India
East Asia
Japan East
Japan West
Korea Central
Southeast Asia
South India
Europe:
North Europe
West Europe
France Central
Germany North
Germany West Central
Norway East
Norway West
Sweden Central
Switzerland North
Switzerland West
Poland Central
Italy North
UK South
UK West
Americas:
Brazil South
Canada Central
Canada East
Central US
East US
East US2
North Central US
West US
Feedback
Was this page helpful? Yes No
This article answers frequently asked questions about mirrored databases from Azure
Databricks (preview) in Microsoft Fabric.
Related content
What is Mirroring in Fabric?
Mirroring Azure Databricks (Preview) Tutorial
Secure Fabric mirrored databases from Azure Databricks
Limitations
Mirroring Azure Databricks Unity Catalog (Preview)
Feedback
Was this page helpful? Yes No
Mirroring in Fabric provides an easy experience to avoid complex ETL (Extract Transform
Load) and integrate your existing Snowflake warehouse data with the rest of your data
in Microsoft Fabric. You can continuously replicate your existing Snowflake data directly
into Fabric's OneLake. Inside Fabric, you can unlock powerful business intelligence,
artificial intelligence, Data Engineering, Data Science, and data sharing scenarios.
For a tutorial on configuring your Snowflake database for Mirroring in Fabric, see
Tutorial: Configure Microsoft Fabric mirrored databases from Snowflake.
Each mirrored database has an autogenerated SQL analytics endpoint that provides a
rich analytical experience on top of the Delta Tables created by the mirroring process.
Users have access to familiar T-SQL commands that can define and query data objects
but not manipulate the data from the SQL analytics endpoint, as it's a read-only copy.
You can perform the following actions in the SQL analytics endpoint:
Explore the tables that reference data in your Delta Lake tables from Snowflake.
Create no code queries and views and explore data visually without writing a line
of code.
Develop SQL views, inline TVFs (Table-valued Functions), and stored procedures to
encapsulate your semantics and business logic in T-SQL.
Manage permissions on the objects.
Query data in other Warehouses and Lakehouses in the same workspace.
In addition to the SQL query editor, there's a broad ecosystem of tooling that can query
the SQL analytics endpoint, including SQL Server Management Studio (SSMS), the mssql
extension with Visual Studio Code, and even GitHub Copilot.
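For example, here's a minimal T-SQL sketch that you could run from any of these tools
against the SQL analytics endpoint. The mirrored table dbo.ORDERS and the lakehouse
MyLakehouse are hypothetical; the cross-database join illustrates querying data in other
items in the same workspace.
SQL
SELECT TOP (100)
    o.*
FROM [dbo].[ORDERS] AS o
INNER JOIN [MyLakehouse].[dbo].[customers] AS c
    ON o.customer_id = c.customer_id;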
Security considerations
To enable Fabric mirroring, you need a user for your Snowflake database that has the
following permissions:
CREATE STREAM
SELECT table
SHOW tables
DESCRIBE tables
For more information, see Snowflake documentation on Access Control Privileges for
Streaming tables and Required Permissions for Streams .
) Important
Any granular security established in the source Snowflake warehouse must be re-
configured in the mirrored database in Microsoft Fabric. For more information, see
SQL granular permissions in Microsoft Fabric.
There are Snowflake compute and cloud query costs when data is being mirrored: virtual
warehouse compute and cloud services compute.
In the following screenshot, you can see the virtual warehouse compute and cloud
services compute costs for the associated Snowflake database that is being mirrored
into Fabric. In this scenario, the majority of the cloud services compute costs (in yellow)
come from data change queries based on the points mentioned previously. The virtual
warehouse compute charges (in blue) come strictly from the data changes being read
from Snowflake and mirrored into Fabric.
For more information on Snowflake-specific cloud query costs, see Snowflake docs:
Understanding overall cost .
Next step
Tutorial: Configure Microsoft Fabric mirrored databases from Snowflake
Related content
How to: Secure data Microsoft Fabric mirrored databases from Snowflake
Model data in the default Power BI semantic model in Microsoft Fabric
Monitor Fabric mirrored database replication
Feedback
Was this page helpful? Yes No
In this example, you learn how to configure a secure connection to your Snowflake
data source(s), along with other helpful information to get you acquainted with the
concepts of Mirroring in Microsoft Fabric.
7 Note
While this example is specific to Snowflake, you can find detailed steps to configure
Mirroring for other data sources, like Azure SQL Database or Azure Cosmos DB. For
more information, see What is Mirroring in Fabric?
Prerequisites
Create or use an existing Snowflake warehouse. You can connect to any version of
Snowflake instance in any cloud, including Microsoft Azure.
You need an existing Fabric capacity. If you don't, start a Fabric trial.
You need a user for your Snowflake database that has the following
permissions. For more information, see Snowflake documentation on
Access Control Privileges for Streaming tables and Required Permissions for
Streams .
CREATE STREAM
SELECT table
SHOW tables
DESCRIBE tables
The user needs to have at least one role assigned that allows access to the
Snowflake database.
You can use an existing workspace (not My Workspace) or create a new workspace.
1. From your workspace, navigate to the Create hub.
2. After you have selected the workspace that you would like to use, select Create.
3. Scroll down and select the Mirrored Snowflake card.
4. Enter the name for the new database.
5. Select Create.
7 Note
You might need to adjust your cloud firewall rules to allow Mirroring to connect to the
Snowflake instance.
2. If you selected "New connection", enter the connection details to the Snowflake
database.
Connection settings:
Server: You can find your server name by navigating to the accounts on the resource
menu in Snowflake. Hover your mouse over the account name to copy the server name to
the clipboard. Remove the https:// from the server name.
Warehouse: From the Warehouses section on the resource menu in Snowflake, select
Warehouses. The warehouse is the Snowflake Warehouse (compute), not the database.
Connection name: Should be automatically filled out. Change it to a name that you would
like to use.
Authentication kind: Snowflake
Username: Your Snowflake username that you created to sign in to Snowflake.com.
Password: Your Snowflake password that you created when you created your login
information for Snowflake.com.
3. Select database from dropdown list.
Mirror all data means that any new tables created after Mirroring is started
will be mirrored.
Optionally, choose only certain objects to mirror. Disable the Mirror all data
option, then select individual tables from your database.
3. Wait for 2-5 minutes. Then, select Monitor replication to see the status.
4. After a few minutes, the status should change to Running, which means the tables
are being synchronized.
If you don't see the tables and the corresponding replication status, wait a few
seconds and then refresh the panel.
5. When they have finished the initial copying of the tables, a date appears in the
Last refresh column.
6. Now that your data is up and running, there are various analytics scenarios
available across all of Fabric.
) Important
For more information and details on the replication states, see Monitor Fabric mirrored
database replication.
) Important
If there are no updates in the source tables, the replicator engine will start to back
off with an exponentially increasing duration, up to an hour. The replicator engine
will automatically resume regular polling after updated data is detected.
Related content
Mirroring Snowflake
What is Mirroring in Fabric?
Feedback
Was this page helpful? Yes No
This article answers frequently asked questions about Mirroring Snowflake in Microsoft
Fabric.
Cost efficiency
What should I do to avoid or reduce Snowflake
costs?
Implement Snowflake budgets, use limits on credits, or use a dedicated smaller
Snowflake instance based on your requirements.
How are ingress fees handled?
Fabric doesn't charge for Ingress fees into OneLake for Mirroring.
Performance
How long does the initial replication take?
It depends on the size of the data that is being brought in.
Data governance
Is data ever leaving the customer's Fabric tenant?
No.
Licensing
What are licensing options for Fabric Mirroring?
A Power BI Premium, Fabric Capacity, or Trial Capacity is required. For more information
on licensing, see Microsoft Fabric licenses.
Related content
What is Mirroring in Fabric?
Snowflake connector overview
Feedback
Was this page helpful? Yes No
This guide helps you establish data security in your mirrored Snowflake in Microsoft
Fabric.
Security considerations
To enable Fabric mirroring, you need a user for your Snowflake database that has the
following permissions (a sample GRANT sketch follows this list):
CREATE STREAM
SELECT table
SHOW tables
DESCRIBE tables
For more information, see Snowflake documentation on Access Control Privileges for
Streaming tables and Required Permissions for Streams .
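Here's a minimal sketch of granting these permissions in Snowflake. The role, database,
and schema names are hypothetical, and your security model might differ; SHOW and
DESCRIBE work once the role has access to the objects.
SQL
-- Hypothetical role, database, and schema names.
GRANT USAGE ON DATABASE SALES_DB TO ROLE FABRIC_MIRROR_ROLE;
GRANT USAGE ON SCHEMA SALES_DB.PUBLIC TO ROLE FABRIC_MIRROR_ROLE;
GRANT CREATE STREAM ON SCHEMA SALES_DB.PUBLIC TO ROLE FABRIC_MIRROR_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA SALES_DB.PUBLIC TO ROLE FABRIC_MIRROR_ROLE;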
) Important
Any granular security established in the source Snowflake database must be re-
configured in the mirrored database in Microsoft Fabric. For more information, see
SQL granular permissions in Microsoft Fabric.
You can also mask sensitive data from non-admins using dynamic data masking:
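For example, here's a minimal sketch of the T-SQL dynamic data masking syntax, assuming a
hypothetical dbo.Customers table with an Email column; check the dynamic data masking
documentation for where masking can be applied in your scenario.
SQL
-- Hypothetical table and column names; masks Email for users without UNMASK permission.
ALTER TABLE dbo.Customers
ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');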
Feedback
Was this page helpful? Yes No
This page lists the current limitations for Microsoft Fabric mirrored databases from
Snowflake. These limitations are subject to change.
Security
Only Snowflake authentication via username and password is supported.
Sharing recipients must be added to the workspace. To share a dataset or report,
first add access to the workspace with a role of Admin, Member, Contributor, or
Viewer.
Performance
If you're changing most of the data in a large table, it's more efficient to stop and
restart Mirroring. Inserting or updating billions of records can take a long time.
Some schema changes aren't reflected immediately. Some schema changes need
a data change (insert, update, or delete) before they're replicated to
Fabric.
Related content
What is Mirroring in Fabric?
Mirroring Snowflake
Tutorial: Configure Microsoft Fabric mirrored databases from Snowflake
Feedback
Was this page helpful? Yes No
Mirroring in Fabric provides an easy experience to avoid complex ETL (Extract Transform
Load) and integrate your existing data into OneLake with the rest of your data in
Microsoft Fabric. You can continuously replicate your existing data directly into Fabric's
OneLake. Inside Fabric, you can unlock powerful business intelligence, artificial
intelligence, Data Engineering, Data Science, and data sharing scenarios.
Open mirroring enables any application to write change data directly into a mirrored
database in Fabric. Open mirroring is designed to be extensible, customizable, and
open. It's a powerful feature that extends mirroring in Fabric based on open Delta Lake
table format.
Once the data lands in OneLake in Fabric, open mirroring simplifies the handling of
complex data changes, ensuring that all mirrored data is continuously up-to-date and
ready for analysis.
) Important
For a tutorial on configuring your open mirrored database in Fabric, see Tutorial:
Configure Microsoft Fabric open mirrored databases.
Use your own application to write data into the open mirroring landing zone per
the open mirroring landing zone requirements and formats.
Use one of our existing open mirroring partners to help you ingest data.
When you create an open mirrored database, Fabric creates the following items:
The mirrored database item. Mirroring manages the replication of data into
OneLake, its conversion into Delta Parquet format, and the complexity of handling
changes, so the data lands in an analytics-ready format. This enables downstream
scenarios like data engineering, data science, and more.
A SQL analytics endpoint
A default semantic model
Each open mirrored database has an autogenerated SQL analytics endpoint that
provides a rich analytical experience on top of the Delta Tables created by the mirroring
process. Users have access to familiar T-SQL commands that can define and query data
objects but not manipulate the data from the SQL analytics endpoint, as it's a read-only
copy. You can perform the following actions in the SQL analytics endpoint:
Explore the tables that reference data in your Delta Lake tables.
Create no code queries and views and explore data visually without writing a line
of code.
Develop SQL views, inline TVFs (Table-valued Functions), and stored procedures to
encapsulate your semantics and business logic in T-SQL.
Manage permissions on the objects.
Query data in other Warehouses and Lakehouses in the same workspace.
In addition to the SQL query editor, there's a broad ecosystem of tooling that can query
the SQL analytics endpoint, including SQL Server Management Studio (SSMS), the mssql
extension with Visual Studio Code, and even GitHub Copilot.
In addition, the compute needed to manage the complexity of change data is free and
doesn't consume capacity. Requests to OneLake made as part of the mirroring process
consume capacity like normal OneLake compute consumption.
Next step
Tutorial: Configure Microsoft Fabric open mirrored databases
Related content
Monitor Fabric mirrored database replication
Feedback
Was this page helpful? Yes No
In this tutorial, you configure an open mirrored database in Fabric. This example guides
you through creating a new open mirrored database and landing data in the landing
zone, so you become proficient with the concepts of open mirroring in Microsoft
Fabric.
) Important
Prerequisites
You need an existing capacity for Fabric. If you don't, start a Fabric trial.
The Fabric capacity needs to be active and running. A paused or deleted
capacity will affect Mirroring and no data will be replicated.
During the current preview, the ability to create an open mirrored database via the
Fabric portal is not available in all Fabric capacity regions.
Mirror all data means that any new tables created after Mirroring is started
will be mirrored.
Optionally, choose only certain objects to mirror. Disable the Mirror all data
option, then select individual tables from your database. For this tutorial, we
select the Mirror all data option.
For more information and details on the replication states, see Monitor Fabric mirrored
database replication.
Related content
Connecting to Microsoft OneLake
Open mirroring landing zone requirements and format
Feedback
Was this page helpful? Yes No
This article details the landing zone and table/column operation requirements for open
mirroring in Microsoft Fabric.
) Important
Once you have created your open mirrored database via the Fabric portal or public API
in your Fabric workspace, you get a landing zone URL in OneLake on the Home page of
your mirrored database item. This landing zone is where your application creates a
metadata file and lands data in Parquet format (uncompressed, Snappy, GZIP, or ZSTD).
Landing zone
For every mirrored database, there is a unique storage location in OneLake for metadata
and Delta tables. Open mirroring provides a landing zone folder for your application to
create a metadata file and push data into OneLake. Mirroring monitors these files in the
landing zone and reads the folder for new tables and data.
For example, if you have tables ( Table A , Table B , Table C ) to be created in the landing
zone, create folders like the following URLs:
https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database
id>/LandingZone/TableA
https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database
id>/LandingZone/TableB
https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database
id>/LandingZone/TableC
This table metadata file contains a JSON record that currently specifies only the unique
key columns as keyColumns .
For example, to declare columns C1 and C2 as a compound unique key for the table:
JSON
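{
    "keyColumns": [ "C1", "C2" ]
}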
All the Parquet files written to the landing zone have the following format:
<RowMarker><DataColumns>
(The row marker column is named __rowMarker__ , and the data columns come after rowMarker.)
RowMarker values:
0 for INSERT
1 for UPDATE
2 for DELETE
4 for UPSERT
Row order: All the logs in the file should be in natural order as applied in the
transaction. This is important for the same row being updated multiple times.
Open mirroring applies the changes using the order in the files.
File order: Files should be added in monotonically increasing numbers.
File name: File name is 20 digits, like 00000000000000000001.parquet for the first file,
and 00000000000000000002.parquet for the second. File names should be in
continuous numbers. Files will be deleted by the mirroring service automatically,
but the last file will be left so that the publisher system can reference it to add the
next file in sequence.
Initial load
For the initial load of data into an open mirrored database, all rows should have INSERT
as row marker. Without RowMarker data in a file, mirroring treats the entire file as an
INSERT.
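For example, here's a minimal sketch of an initial-load file that omits the __rowMarker__
column entirely; every row in such a file is treated as an INSERT. The columns reuse the
employee example shown later in this article.
parquet
EmployeeID,EmployeeLocation
E0001,Redmond
E0002,Redmond
E0003,Redmond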
Incremental changes
Open mirroring reads incremental changes in order and applies them to the target Delta
table. Order is implicit in the change log and in the order of the files.
Updated rows must contain the full row data, with all columns.
Here is some sample parquet data of the row history to change the EmployeeLocation
for EmployeeID E0001 from Redmond to Bellevue. In this scenario, the EmployeeID
column has been marked as a key column in the metadata file in the landing zone.
parquet
__rowMarker__,EmployeeID,EmployeeLocation
0,E0001,Redmond
0,E0002,Redmond
0,E0003,Redmond
1,E0001,Bellevue
If key columns are updated, represent the change as a DELETE of the row with the
previous key columns followed by an INSERT of a row with the new key and data. For
example, this row history changes the key EmployeeID from E0001 to E0002. You don't
need to provide all column data for a DELETE row, only the key columns.
parquet
__rowMarker__,EmployeeID,EmployeeLocation
0,E0001,Bellevue
2,E0001,NULL
0,E0002,Bellevue
Table operations
Open mirroring supports table operations such as add, drop, and rename tables.
Add table
Open mirroring picks up any table added to the landing zone by the application. Open
mirroring scans for new tables in every iteration.
Drop table
Open mirroring keeps track of the folder name. If a table folder is deleted, open
mirroring drops the table in the mirrored database.
If a folder is recreated, open mirroring drops the table and recreates it with the new data
in the folder, accomplished by tracking the ETag for the folder.
When attempting to drop a table, you can try deleting the folder, but there is a chance
that open mirroring is still using the data from the folder, causing a delete failure for
the publisher.
Rename table
To rename a table, drop and recreate the folder with initial and incremental data. Data
will need to be repopulated to the renamed table.
Schema
A table path can be specified within a schema folder. A schema landing zone should
have a <schemaname>.schema folder name. There can be multiple schemas and there can
be multiple tables in a schema.
For example, if you have schemas ( Schema1 , Schema2 ) and tables ( Table A , Table B ,
Table C ) to be created in the landing zone, create folders like the following paths in
OneLake:
https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database
id>/LandingZone/Schema1.schema/TableA
https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database
id>/LandingZone/Schema1.schema/TableB
Column types
Simple parquet types are supported in the landing zone.
Complex types should be written as a JSON string.
Binary complex types like geography, images, etc. can be stored as binary type in
the landing zone.
Add column
If new columns are added to the parquet files, open mirroring adds the columns to the
delta tables.
Delete column
If a column is dropped from the new log files, open mirroring stores NULL for that
column in new rows, and old rows keep the column values in the data. To delete the
column, drop the table and create the table folder in the landing zone again, which
results in recreation of the Delta table with the new schema and data.
Open mirroring always unions all the columns from previous version of added data. To
remove a column, recreate the table/folder.
Rename column
To rename a column, delete the table folder and recreate the folder with all the data and
with the new column name.
Next step
Tutorial: Configure Microsoft Fabric open mirrored databases
Related content
Monitor Fabric mirrored database replication
Feedback
Was this page helpful? Yes No
This article answers frequently asked questions about open mirroring in Microsoft
Fabric.
) Important
Cost management
What are the costs associated with Mirroring?
See Open mirroring cost considerations.
Related content
Open mirroring landing zone requirements and format
Feedback
Was this page helpful? Yes No
The following are the open mirroring partners who have already built solutions to
integrate with Microsoft Fabric.
) Important
For more information, see Oracle GoldenGate 23ai integration into open mirroring in
Microsoft Fabric .
Striim
SQL2Fabric-Mirroring is a Striim solution that reads data from SQL Server and writes it
to Microsoft Fabric's mirroring landing zone in Delta-Parquet format. Microsoft's Fabric
replication service frequently picks up these files and replicates the file contents into
Fabric data warehouse tables.
For more information, see Striim integration into open mirroring in Microsoft Fabric .
MongoDB
MongoDB integrated with open mirroring to provide a solution that brings operational
data from MongoDB Atlas to Microsoft Fabric for big data analytics, AI, and BI,
combining it with the rest of the data estate of the enterprise. Once mirroring is
enabled for a MongoDB Atlas collection, the corresponding table in OneLake is kept in
sync with the changes in the source MongoDB Atlas collection, unlocking varied
analytics, AI, and BI opportunities in near real time.
For more information, see MongoDB integration into open mirroring in Microsoft
Fabric .
Related content
Tutorial: Configure Microsoft Fabric open mirrored databases
Feedback
Was this page helpful? Yes No
The public APIs for Fabric mirroring consist of two categories: (1) CRUD operations for
Fabric mirrored database item and (2) Start/stop and monitoring operations. The
primary online reference documentation for Microsoft Fabric REST APIs can be found in
Microsoft Fabric REST API references.
7 Note
These REST APIs don't apply to mirrored database from Azure Databricks.
Before you create a mirrored database, the corresponding data source connection is
needed. If you don't have a connection yet, refer to create new connection using the
portal and use that connection ID in the following definition. You can also refer to the
create new connection REST API to create a new connection using Fabric REST APIs.
Example:
POST https://api.fabric.microsoft.com/v1/workspaces/<your workspace
ID>/mirroredDatabases
Body:
JSON
{
"displayName": "Mirrored database 1",
"description": "A mirrored database description",
"definition": {
"parts": [
{
"path": "mirroring.json",
"payload": "eyAicHJvcGVydGllcy..WJsZSIgfSB9IH0gXSB9IH0",
"payloadType": "InlineBase64"
}
]
}
}
The payload property in the previous JSON body is Base64 encoded. You can use Base64
Encode and Decode to encode it. The original JSON definition examples for different
types of sources follow:
If you want to replicate selective tables instead of all the tables in the specified
database, refer to JSON definition example of replicating specified tables.
) Important
To mirror data from Azure SQL Database or Azure SQL Managed Instance, you need
to also do the following before you start mirroring:
1. Enable System Assigned Managed Identity (SAMI) of your Azure SQL logical
server or Azure SQL Managed Instance.
2. Grant the SAMI Read and Write permission to the mirrored database.
Currently you need to do this on the Fabric portal. Alternatively, you can grant the
SAMI a workspace role using the Add Workspace Role Assignment API.
{
"properties": {
"source": {
"type": "Snowflake",
"typeProperties": {
"connection": "a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1",
"database": "xxxx"
}
},
"target": {
"type": "MountedRelationalDatabase",
"typeProperties": {
"defaultSchema": "xxxx",
"format": "Delta"
}
}
}
}
{
"properties": {
"source": {
"type": "AzureSqlDatabase",
"typeProperties": {
"connection": "a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1"
}
},
"target": {
"type": "MountedRelationalDatabase",
"typeProperties": {
"defaultSchema": "xxxx",
"format": "Delta"
}
}
}
}
{
"properties": {
"source": {
"type": "AzureSqlMI",
"typeProperties": {
"connection": "a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1"
}
},
"target": {
"type": "MountedRelationalDatabase",
"typeProperties": {
"defaultSchema": "xxxx",
"format": "Delta"
}
}
}
}
{
"properties": {
"source": {
"type": "CosmosDb",
"typeProperties": {
"connection": "a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1",
"database": "xxxx"
}
},
"target": {
"type": "MountedRelationalDatabase",
"typeProperties": {
"defaultSchema": "xxxx",
"format": "Delta"
}
}
}
}
{
"properties": {
"source": {
"type": "GenericMirror",
"typeProperties": {}
},
"target": {
"type": "MountedRelationalDatabase",
"typeProperties": {
"format": "Delta"
}
}
}
}
JSON
{
"properties": {
"source": {
"type": "Snowflake",
"typeProperties": {
"connection": "a0a0a0a0-bbbb-cccc-dddd-e1e1e1e1e1e1",
"database": "xxxx"
}
},
"target": {
"type": "MountedRelationalDatabase",
"typeProperties": {
"defaultSchema": "xxxx",
"format": "Delta"
}
},
"mountedTables": [
{
"source": {
"typeProperties": {
"schemaName": "xxxx",
"tableName": "xxxx"
}
}
}
]
}
}
Response 201:
JSON
{
"id": "<mirrored database ID>",
"type": "MirroredDatabase",
"displayName": "Mirrored database 1",
"description": "A mirrored database description",
"workspaceId": "<your workspace ID>"
}
Example:
Example:
Response 200:
JSON
{
"displayName": "Mirrored database 1",
"description": "A mirrored database description.",
"type": "MirroredDatabase",
"workspaceId": "<your workspace ID>",
"id": "<mirrored database ID>",
"properties": {
"oneLakeTablesPath": "https://onelake.dfs.fabric.microsoft.com/<your
workspace ID>/<mirrored database ID>/Tables",
"sqlEndpointProperties": {
"connectionString": "xxxx.xxxx.fabric.microsoft.com",
"id": "b1b1b1b1-cccc-dddd-eeee-f2f2f2f2f2f2",
"provisioningStatus": "Success"
},
"defaultSchema": "xxxx"
}
}
Example:
POST https://api.fabric.microsoft.com/v1/workspaces/<your workspace
ID>/mirroredDatabases/<mirrored database ID>/getDefinition
Response 200:
JSON
{
"definition": {
"parts":[
{
"path": "mirroring.json",
"payload": "eyAicHJvcGVydGllcy..WJsZSIgfSB9IH0gXSB9IH0",
"payloadType": "InlineBase64"
}
]
}
}
Example:
GET https://api.fabric.microsoft.com/v1/workspaces/<your workspace
ID>/mirroredDatabases
Response 200:
JSON
{
"value": [
{
"displayName": "Mirrored database 1",
"description": "A mirrored database description.",
"type": "MirroredDatabase",
"workspaceId": "<your workspace ID>",
"id": "<mirrored database ID>",
"properties": {
"oneLakeTablesPath":
"https://onelake.dfs.fabric.microsoft.com/<your workspace ID>/<mirrored
database ID>/Tables",
"sqlEndpointProperties": {
"connectionString": "xxxx.xxxx.fabric.microsoft.com",
"id": "b1b1b1b1-cccc-dddd-eeee-f2f2f2f2f2f2",
"provisioningStatus": "Success"
},
"defaultSchema": "xxxx"
}
}
]
}
Example:
Body:
JSON
{
"displayName": "MirroredDatabase's New name",
"description": "A new description for mirrored database."
}
Response 200:
JSON
{
"displayName": "MirroredDatabase's New name",
"description": "A new description for mirrored database.",
"type": "MirroredDatabase",
"workspaceId": "<your workspace ID>",
"id": "<mirrored database ID>"
}
Example:
JSON
{
"definition": {
"parts": [
{
"path": "mirroring.json",
"payload": "eyAicHJvcGVydGllcy..WJsZSIgfSB9IH0gXSB9IH0",
"payloadType": "InlineBase64"
}
]
}
}
Note
This API returns the status of the mirrored database instance. The list of available
statuses is provided in the values of MirroringStatus.
Example:
Response 200:
JSON
{
"status": "Running"
}
Start mirroring
REST API - Mirroring - Start mirroring
Example:
Note
Mirroring can't be started while the Get mirroring status API above returns the
Initializing status.
If mirroring is started and the Get mirroring status API returns the Running status, this
API returns the status and metrics of table replication.
Example:
Response 200:
JSON
{
"continuationToken": null,
"continuationUri": null,
"data": [
{
"sourceSchemaName": "dbo",
"sourceTableName": "test",
"status": "Replicating",
"metrics": {
"processedBytes": 1247,
"processedRows": 6,
"lastSyncDateTime": "2024-10-08T05:07:11.0663362Z"
}
}
]
}
Stop mirroring
REST API - Mirroring - Stop mirroring
Example:
Note
After stopping mirroring, you can call the Get mirroring status API to query the
mirroring status.
Known limitations
Currently Service Principal/Managed Identity authentication is not supported if your
tenant home region is in North Central US or East US. You can use it in other regions.
Related content
REST API - Items
Resources
Review the troubleshooting section of frequently asked questions for each data source:
Troubleshoot Mirroring Azure SQL Database and FAQ about Mirroring Azure SQL
Database
Troubleshoot Mirroring Azure SQL Managed Instance and FAQ about Mirroring
Azure SQL Managed Instance
Troubleshoot Mirroring Azure Cosmos DB and FAQ about Mirroring Azure Cosmos
DB
Troubleshoot Mirroring Snowflake
FAQ about Mirroring Azure Databricks
Troubleshoot mirroring from Fabric SQL database (preview) and FAQ for Mirroring
Fabric SQL database (preview)
Stop replication
When you select Stop replication, OneLake files remain as is, but incremental replication
stops. You can restart the replication at any time by selecting Start replication. You
might want to stop and restart replication when resetting the state of replication, after
source database changes, or as a troubleshooting step.
Troubleshoot
This section contains general Mirroring troubleshooting steps.
1. Check that your connection details are correct: server name, database name,
username, and password.
2. Check that the server is not behind a firewall or private virtual network. Open the
appropriate firewall ports.
Currently, views are not supported. Only regular tables can be replicated.
1. Check the monitoring status to check the status of the tables. For more
information, see Monitor Fabric mirrored database replication.
2. Select the Configure replication button. Check to see if the tables are present in
the list of tables, or if any Alerts on each table detail are present.
The Fabric warehouse does not support VARCHAR(MAX); it currently supports only
VARCHAR(8000).
In the Monitoring page, the date shown is the last time data was successfully replicated.
I can't change the source database
Changing the source database is not supported. Create a new mirrored database.
Error message: "The tables count may exceed the limit, there could be some tables missing."
Mitigation: There's a maximum of 500 tables. In the source database, drop or filter tables.
If the new table is the 500th table, no mitigation is required.
Related content
What is Mirroring in Fabric?
Monitor Fabric mirrored database replication
The Warehouse in Microsoft Fabric uses a query engine to create an execution plan for a
given SQL query. When you submit a query, the query optimizer tries to enumerate all
possible plans and choose the most efficient candidate. To determine which plan would
require the least overhead (I/O, CPU, memory), the engine needs to be able to evaluate
the amount of work or rows that might be processed at each operator. Then, based on
each plan's cost, it chooses the one with the least amount of estimated work. Statistics
are objects that contain relevant information about your data, allowing the query
optimizer to estimate these costs.
User-defined statistics
The user manually uses data definition language (DDL) syntax to create, update,
and drop statistics as needed (see the example after this list).
Automatic statistics
Engine automatically creates and maintains statistics at querytime
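For illustration, a minimal sketch of the user-defined statistics workflow, assuming a
hypothetical dbo.DimCustomer table with a CustomerKey column:
SQL
-- Create single-column statistics (only single-column statistics are supported)
CREATE STATISTICS DimCustomer_CustomerKey_stats
ON dbo.DimCustomer (CustomerKey) WITH FULLSCAN;

-- Refresh the statistics object after significant data changes
UPDATE STATISTICS dbo.DimCustomer (DimCustomer_CustomerKey_stats) WITH FULLSCAN;

-- Remove the statistics object when it's no longer needed
DROP STATISTICS dbo.DimCustomer.DimCustomer_CustomerKey_stats;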
T-SQL objects such as sys.stats, sys.stats_columns, and the STATS_DATE function can
also be used to check both manually created and automatically created statistics in
Microsoft Fabric. For example, consider the following query:
SQL
SELECT <COLUMN_NAME>
FROM <YOUR_TABLE_NAME>
GROUP BY <COLUMN_NAME>;
In this case, you should expect that statistics for COLUMN_NAME have been created. If
the column was also a varchar column, you would also see average column length
statistics created. If you'd like to validate statistics were automatically created, you can
run the following query:
SQL
select
object_name(s.object_id) AS [object_name],
c.name AS [column_name],
s.name AS [stats_name],
s.stats_id,
STATS_DATE(s.object_id, s.stats_id) AS [stats_update_date],
s.auto_created,
s.user_created,
s.stats_generation_method_desc
FROM sys.stats AS s
INNER JOIN sys.objects AS o
ON o.object_id = s.object_id
LEFT JOIN sys.stats_columns AS sc
ON s.object_id = sc.object_id
AND s.stats_id = sc.stats_id
LEFT JOIN sys.columns AS c
ON sc.object_id = c.object_id
AND c.column_id = sc.column_id
WHERE o.type = 'U' -- Only check for stats on user-tables
AND s.auto_created = 1
AND o.name = '<YOUR_TABLE_NAME>'
ORDER BY object_name, column_name;
Now, you can find the statistics_name of the automatically generated histogram
statistic (should be something like _WA_Sys_00000007_3B75D760 ) and run the following T-
SQL:
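A minimal sketch, assuming the hypothetical statistics name shown above; substitute your
own table and statistics names:
SQL
DBCC SHOW_STATISTICS ('<YOUR_TABLE_NAME>', '_WA_Sys_00000007_3B75D760');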
The Updated value in the result set of DBCC SHOW_STATISTICS should be a date (in UTC)
similar to when you ran the original GROUP BY query.
Histogram statistics
Created per column needing histogram statistics at querytime
These objects contain histogram and density information regarding the
distribution of a particular column. Similar to the statistics automatically created
at querytime in Azure Synapse Analytics dedicated pools.
Name begins with _WA_Sys_ .
Contents can be viewed with DBCC SHOW_STATISTICS
Average column length statistics
Created for variable character (varchar) columns with length greater than 100 that
need average column length at querytime.
These objects contain a value representing the average row size of the varchar
column at the time of statistics creation.
Name begins with ACE-AverageColumnLength_ .
Contents cannot be viewed and are nonactionable by user.
Table-based cardinality statistics
Created per table needing cardinality estimation at querytime.
These objects contain an estimate of the rowcount of a table.
Named ACE-Cardinality .
Contents cannot be viewed and are nonactionable by user.
Limitations
Only single-column histogram statistics can be manually created and modified.
Multi-column statistics creation is not supported.
Other statistics objects might appear in sys.stats, aside from manually created
statistics and automatically created statistics. These objects are not used for query
optimization.
Related content
Monitoring connections, sessions, and requests using DMVs
Retrieving data from the data lake is a crucial input/output (IO) operation with substantial
implications for query performance. Fabric Data Warehouse employs refined access
patterns to enhance data reads from storage and elevate query execution speed.
Additionally, it intelligently minimizes the need for remote storage reads by leveraging
local caches.
There are two types of caches that are described later in this article:
In-memory cache
Disk cache
In-memory cache
As a query accesses and retrieves data from storage, it performs a transformation
process that transcodes the data from its original file-based format into highly
optimized structures in the in-memory cache.
Data in cache is organized in a compressed columnar format optimized for analytical
queries. Each column of data is stored together, separate from the others, allowing for
better compression since similar data values are stored together, leading to reduced
memory footprint. When queries need to perform operations on a specific column like
aggregates or filtering, the engine can work more efficiently since it doesn't have to
process unnecessary data from other columns.
Additionally, this columnar storage is also conducive to parallel processing, which can
significantly speed up query execution for large datasets. The engine can perform
operations on multiple columns simultaneously, taking advantage of modern multi-core
processors.
This approach is especially beneficial for analytical workloads where queries involve
scanning large amounts of data to perform aggregations, filtering, and other data
manipulations.
Disk cache
Certain datasets are too large to be accommodated within an in-memory cache. To
sustain rapid query performance for these datasets, Warehouse utilizes disk space as a
complementary extension to the in-memory cache. Any information that is loaded into
the in-memory cache is also serialized to the SSD cache.
Given that the in-memory cache has a smaller capacity compared to the SSD cache, data
that is removed from the in-memory cache remains within the SSD cache for an
extended period. When a subsequent query requests this data, it is retrieved from the SSD
cache into the in-memory cache at a significantly quicker rate than if fetched from
remote storage, ultimately providing you with more consistent query performance.
Cache management
Caching remains consistently active and operates seamlessly in the background,
requiring no intervention on your part. Disabling caching is not needed, as doing so
would inevitably lead to a noticeable deterioration in query performance.
The caching mechanism is orchestrated and upheld by Microsoft Fabric itself, and it
doesn't offer users the capability to manually clear the cache.
Full cache transactional consistency ensures that any modifications to the data in
storage, such as through Data Manipulation Language (DML) operations, after it has
been initially loaded into the in-memory cache, will result in consistent data.
When the cache reaches its capacity threshold and fresh data is being read for the first
time, objects that have remained unused for the longest duration will be removed from
the cache. This process is enacted to create space for the influx of new data and
maintain an optimal cache utilization strategy.
Related content
Fabric Data Warehouse performance guidelines
The Warehouse in Microsoft Fabric storage uses the Delta Lake table format for all user
data. In addition to optimizations provided by the Delta format, a warehouse applies
optimizations to storage to provide faster query performance on analytics scenarios
while maintaining adherence to the Parquet format. This article covers V-Order write
optimization, its benefits, and how to control it.
What is V-Order?
V-Order is a write time optimization to the parquet file format that enables lightning-
fast reads under the Microsoft Fabric compute engines, such as Power BI, SQL, Spark,
and others.
Power BI and SQL engines make use of Microsoft Verti-Scan technology and V-Ordered
parquet files to achieve in-memory-like data access times. Spark and other non-Verti-
Scan compute engines also benefit from the V-Ordered files with an average of 10%
faster read times, with some scenarios up to 50%.
V-Order works by applying special sorting, row group distribution, dictionary encoding,
and compression on Parquet files. As a result, compute engines require less network,
disk, and CPU resources to read data from storage, providing cost efficiency and
performance. It's 100% compliant to the open-source parquet format; all parquet
engines can read it as regular parquet files.
Performance considerations
Consider the following before deciding to disable V-Order:
Caution
Currently, disabling V-Order can only be done at the warehouse level, and it is
irreversible: once disabled, it cannot be enabled again. Users must consider the
performance if they choose to Disable V-Order in Fabric Warehouse.
Disabling V-Order can be useful for write-intensive warehouses, such as for warehouses
that are dedicated to staging data as part of a data ingestion process. Staging tables are
often dropped and recreated (or truncated) to process new data. These staging tables
might then be read only once or twice, which might not justify the ingestion time added
by applying V-Order. By disabling V-Order and reducing the time to ingest data, your
overall time to process data during ingestion jobs might be reduced. In this case, you
should segment the staging warehouse from your main user-facing warehouse, so that
the analytics queries and Power BI can benefit from V-Order.
Related content
Disable V-Order on Warehouse in Microsoft Fabric
Delta Lake table optimization and V-Order
Monitoring the usage and activity is crucial for ensuring that your warehouse operates
efficiently.
Query activity
Users are provided a one-stop view of their running and completed queries in an easy-
to-use interface, without having to run T-SQL. For more information, see Monitor your
running and completed queries using Query activity.
Query insights
Query Insights provides historical query data for completed, failed, canceled queries
along with aggregated insights to help you tune your query performance. For more
information, see Query insights in Fabric data warehousing.
Related content
Billing and utilization reporting in Fabric Data Warehouse
Monitor your running and completed queries using Query activity
Query insights in Fabric data warehousing
Monitor connections, sessions, and requests using DMVs
This article explains compute usage reporting of the Synapse Data Warehouse in Microsoft Fabric,
which includes read and write activity against the Warehouse, and read activity on the SQL analytics
endpoint of the Lakehouse.
When you use a Fabric capacity, your usage charges appear in the Azure portal under your
subscription in Microsoft Cost Management. To understand your Fabric billing, visit Understand your
Azure bill on a Fabric capacity.
For more information about monitoring current and historical query activity, see Monitor in Fabric
Data warehouse overview.
Capacity
In Fabric, based on the Capacity SKU purchased, you're entitled to a set of Capacity Units (CUs) that
are shared across all Fabric workloads. For more information on licenses supported, see Microsoft
Fabric licenses.
Capacity is a dedicated set of resources that is available at a given time to be used. Capacity defines
the ability of a resource to perform an activity or to produce output. Different resources consume
CUs at different times.
CUs consumed by data warehousing include read and write activity against the Warehouse, and
read activity on the SQL analytics endpoint of the Lakehouse.
In simple terms, 1 Fabric capacity unit = 0.5 Warehouse vCores. For example, a Fabric capacity SKU
F64 has 64 capacity units, which is equivalent to 32 Warehouse vCores.
Once you have installed the app, select Warehouse from the Select item kind: dropdown list.
The Multi metric ribbon chart and the Items (14 days) data table now show only Warehouse
activity.
Both the Warehouse and SQL analytics endpoint rollup under Warehouse in the Metrics app, as they
both use SQL compute. The operation categories seen in this view are:
Warehouse Query: Compute charge for all user-generated and system-generated T-SQL
statements within a Warehouse.
SQL analytics endpoint Query: Compute charge for all user-generated and system-generated
T-SQL statements within a SQL analytics endpoint.
OneLake Compute: Compute charge for all reads and writes for data stored in OneLake.
For example:
Timepoint explore graph
This graph in the Microsoft Fabric Capacity Metrics app shows utilization of resources compared to
capacity purchased. 100% of utilization represents the full throughput of a capacity SKU and is
shared by all Fabric workloads. This is represented by the yellow dotted line. Selecting a specific
timepoint in the graph enables the Explore button, which opens a detailed drill through page.
In general, similar to Power BI, operations are classified either as interactive or background, and
denoted by color. Most operations in Warehouse category are reported as background to take
advantage of 24-hour smoothing of activity to allow for the most flexible usage patterns. Classifying
data warehousing as background reduces the frequency of peaks of CU utilization from triggering
throttling.
This table in the Microsoft Fabric Capacity Metrics app provides a detailed view of utilization at
specific timepoints. The amount of capacity provided by the given SKU per 30-second period is
shown along with the breakdown of interactive and background operations. The interactive
operations table represents the list of operations that were executed at that timepoint.
The Background operations table might appear to display operations that were executed much
before the selected timepoint. This is due to background operations undergoing 24-hour
smoothing. For example, the table displays all operations that were executed and still being
smoothed at a selected timepoint.
To identify an operation that consumed many resources, sort the table by Total CU(s)
descending to find the most expensive queries, then use the Operation Id to uniquely identify
an operation. This is the distributed statement ID, which can be used in other monitoring tools
like dynamic management views (DMVs) and Query Insights for end-to-end traceability, such
as in dist_statement_id in sys.dm_exec_requests, and distributed_statement_id in
queryinsights.exec_requests_history. Examples:
The following sample T-SQL query uses an Operation Id inside a query on the
sys.dm_exec_requests dynamic management view.
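The original samples weren't preserved here; the following sketches assume a placeholder
Operation Id and use only the traceability columns named above (dist_statement_id and
distributed_statement_id):
SQL
-- Look up the statement in the dynamic management view while it's running
SELECT request_id, session_id, start_time, total_elapsed_time
FROM sys.dm_exec_requests
WHERE dist_statement_id = '<Operation Id>';

-- Look up the same statement in query insights after it completes
SELECT *
FROM queryinsights.exec_requests_history
WHERE distributed_statement_id = '<Operation Id>';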
Billing example
Consider the following query:
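The original sample statement wasn't preserved here; any query works for the arithmetic. A
stand-in, assuming a hypothetical dbo.FactSales table:
SQL
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY ProductKey;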
For demonstration purposes, assume the billing metric accumulates 100 CU seconds.
The cost of this query is the CU seconds multiplied by the price per CU. Assume in this example
that the price per CU is $0.18/hour. There are 3,600 seconds in an hour, so the cost of this
query would be (100 x 0.18) / 3,600 = $0.005.
The numbers used in this example are for demonstration purposes only and not actual billing
metrics.
Considerations
Consider the following usage reporting nuances:
Cross database reporting: When a T-SQL query joins across multiple warehouses (or across a
Warehouse and a SQL analytics endpoint), usage is reported against the originating resource.
Queries on system catalog views and dynamic management views are billable queries.
The Duration(s) field reported in the Fabric Capacity Metrics app is for informational purposes only. It
reflects the statement execution duration. Duration might not include the complete end-to-
end duration for rendering results back to the web application like the SQL Query Editor or
client applications like SQL Server Management Studio and Azure Data Studio.
Next step
How to: Observe Synapse Data Warehouse utilization trends
Related content
Monitor connections, sessions, and requests using DMVs
Workload management
Synapse Data Warehouse in Microsoft Fabric performance guidelines
What is the Microsoft Fabric Capacity Metrics app?
Smoothing and throttling in Fabric Data Warehousing
Understand your Azure bill on a Fabric capacity
Understand the metrics app compute page
Pause and resume in Fabric data warehousing
Monitor Fabric Data warehouse
You can use existing dynamic management views (DMVs) to monitor connection,
session, and request status in Microsoft Fabric. For more information about the tools
and methods of executing T-SQL queries, see Query the Warehouse.
sys.dm_exec_connections
Returns information about each connection established between the warehouse
and the engine.
sys.dm_exec_sessions
Returns information about each session authenticated between the item and
engine.
sys.dm_exec_requests
Returns information about each active request in a session.
In this tutorial, learn how to monitor your running SQL queries using dynamic
management views (DMVs).
SQL
SELECT *
FROM sys.dm_exec_sessions;
SQL
SELECT connections.connection_id,
connections.connect_time,
sessions.session_id, sessions.login_name, sessions.login_time,
sessions.status
FROM sys.dm_exec_connections AS connections
INNER JOIN sys.dm_exec_sessions AS sessions
ON connections.session_id=sessions.session_id;
SQL
SELECT request_id, session_id, start_time, total_elapsed_time
FROM sys.dm_exec_requests
WHERE status = 'running'
ORDER BY total_elapsed_time DESC;
This second query shows which user ran the session that has the long-running query.
SQL
SELECT login_name
FROM sys.dm_exec_sessions
WHERE session_id = 'SESSION_ID WITH LONG-RUNNING QUERY';
This third query shows how to use the KILL command on the session_id with the long-
running query. For example:
SQL
KILL '101'
Permissions
An Admin has permissions to execute all three DMVs ( sys.dm_exec_connections ,
sys.dm_exec_sessions , sys.dm_exec_requests ) to see their own and others' information
within the workspace.
Related content
Query using the SQL Query editor
Query the Warehouse and SQL analytics endpoint in Microsoft Fabric
Query insights in the Warehouse and SQL analytics endpoint in Microsoft Fabric
Prerequisites
You must be an admin in your workspace to access Query activity. Members,
Contributors, and Viewers don't have permission to access this view.
Get started
There are two ways you can launch the Query activity experience.
Select More Options (...) next to the warehouse you want to monitor within the
workspace view and select Query activity.
Within the query editor of the warehouse you want to monitor, select Query
activity in the ribbon.
Query runs
On the Query runs page, you can see a list of running, succeeded, canceled, and failed
queries up to the past 30 days.
Use the dropdown list to filter for status, submitter, or submit time.
Use the search bar to filter for specific keywords in the query text or other
columns.
For each query, the following details are provided:
Run source: Name of the client program that initiated the session
When you want to reload the queries that are displayed on the page, select the Refresh
button in the ribbon. If you see a query that is running that you would like to
immediately stop the execution of, select the query using the checkbox and select the
Cancel button. You'll be prompted with a dialog to confirm before the query is canceled.
Any unselected queries that are part of the same SQL sessions you select will also be
canceled.
The same information regarding running queries can also be found using dynamic
management views.
Query insights
On the Query insights page, you can see a list of long running queries and frequently
run queries to help determine any trends within your warehouse's queries.
For each query in the Long running queries insight, the following details are provided:
Median run duration: Median query execution time (ms) across runs
Last run distributed statement ID: Unique ID for the last query execution
For each query in the Frequently run queries insight, the following details are provided:
Average run duration: Average query execution time (ms) across runs
Last run distributed statement ID: Unique ID for the last query execution
The same information regarding completed, failed, and canceled queries from Query
runs along with aggregated insights can also be found in Query insights in Fabric data
warehousing.
Limitations
Historical queries can take up to 15 minutes to appear in Query activity depending
on the concurrent workload being executed.
Only the top 10,000 rows can be shown in the Query runs and Query insights tabs
for the given filter selections.
An "Invalid object name queryinsights.exec_requests_history" error might occur if
Query activity is opened immediately after a new warehouse is created, due to the
underlying system views not yet generated. As a workaround, wait two minutes,
then refresh the page.
Related content
Billing and utilization reporting in Synapse Data Warehouse
Query insights in Fabric data warehousing
Monitor connections, sessions, and requests using DMVs
In Microsoft Fabric, the query insights feature is a scalable, sustainable, and extendable
solution to enhance the SQL analytics experience. With historical query data, aggregated
insights, and access to actual query text, you can analyze and tune your query
performance. Query insights (QI) provides information on queries run in a user's context
only; system queries aren't considered.
The query insights feature provides a central location for historic query data and
actionable insights for 30 days, helping you to make informed decisions to enhance the
performance of your Warehouse or SQL analytics endpoint. When a SQL query runs in
Microsoft Fabric, the query insights feature collects and consolidates its execution data,
providing you with valuable information. You can view complete query text for Admin,
Member, and Contributor roles.
Historical Query Data: The query insights feature stores historical data about
query executions, enabling you to track performance changes over time. System
queries aren't stored in query insights.
Aggregated Insights: The query insights feature aggregates query execution data
into insights that are more actionable, such as identifying long-running queries or
most active users. These aggregations are based on the query shape. For more
information, see How are similar queries aggregated to generate insights?
Which queries are frequently run, and can their performance be improved?
Can we identify queries that have failed or been canceled?
Can we track changes in query performance over time?
Are there any queries that consistently perform poorly?
queryinsights.exec_requests_history (Transact-SQL)
Returns information about each completed SQL request/query.
queryinsights.exec_sessions_history (Transact-SQL)
Returns information about each completed session.
queryinsights.long_running_queries (Transact-SQL)
Returns information about queries ranked by query execution time.
queryinsights.frequently_run_queries (Transact-SQL)
Returns information about frequently run queries.
You can utilize the query hash column in the views to analyze similar queries and drill
down to each execution.
For example, the following queries are considered the same after their predicates are
parameterized:
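The original pair of sample queries wasn't preserved here. As an illustration, assuming a
hypothetical dbo.FactSales table, these two statements differ only in their literal values, so
they share the same query hash:
SQL
SELECT * FROM dbo.FactSales WHERE OrderDateKey > 20230101;
and
SQL
SELECT * FROM dbo.FactSales WHERE OrderDateKey > 20240630;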
Examples
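The original examples weren't preserved here. As a starting point, the aggregated views can
simply be queried directly:
SQL
-- Longest-running query shapes captured by query insights
SELECT TOP 10 *
FROM queryinsights.long_running_queries;

-- Most frequently run query shapes
SELECT TOP 10 *
FROM queryinsights.frequently_run_queries;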
Furthermore, you can assess the use of cache by examining the sum of
data_scanned_memory_mb and data_scanned_disk_mb , and comparing it to the amount of
data scanned from remote storage.
Note
The data scanned values might not account the data moved during the
intermediate stages of query execution. In some cases, the size of the data moved
and CPU required to process may be larger than the data scanned value indicates.
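A sketch of such a cache check against queryinsights.exec_requests_history; the column name
for remote storage reads is an assumption and might differ in your environment:
SQL
SELECT distributed_statement_id,
       data_scanned_memory_mb,
       data_scanned_disk_mb,
       data_scanned_memory_mb + data_scanned_disk_mb AS data_scanned_from_cache_mb,
       data_scanned_remote_storage_mb  -- assumed column name for remote storage reads
FROM queryinsights.exec_requests_history;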
Related content
Monitoring connections, sessions, and requests using DMVs
queryinsights.exec_requests_history (Transact-SQL)
queryinsights.exec_sessions_history (Transact-SQL)
queryinsights.long_running_queries (Transact-SQL)
queryinsights.frequently_run_queries (Transact-SQL)
Learn how to observe trends and spikes in your data warehousing workload in Microsoft
Fabric using the Microsoft Fabric Capacity Metrics app.
The Microsoft Fabric Capacity Metrics app provides visibility into capacity usage for all
Fabric workloads in one place. It's mostly used by capacity administrators to monitor the
performance of workloads and their usage, compared to purchased capacity.
Prerequisites
Have a Microsoft Fabric license, which grants Capacity Units (CUs) shared across
all Fabric workloads.
Add the Microsoft Fabric Capacity Metrics app from AppSource.
This graph can provide high-level CU trends in the last 14 days to see which Fabric
workload has used the most CU.
1. Use the Item table to identify specific warehouses consuming the most compute. The
Items table in the multi metric ribbon chart provides aggregated consumption at
item level. In this view, for example, you can identify which items have consumed
the most CUs.
2. Select "Warehouse" in the Select item kind(s) dropdown list.
3. Sort the Item table by CU(s), descending.
4. You can now identify the items using the most capacity units, overall duration of
activity, number of users, and more.
Drill through peak activity
Use the timepoint graph to identify a range of activity where CU utilization was at its
peak, and to identify the individual interactive and background activities consuming
capacity.
The following animated image walks through several steps you can use to drill through
utilization, throttling, and overage information. For more information, visit Throttling in
Microsoft Fabric.
1. Select the Utilization tab in timepoint explore graph to identify the timepoint at
which capacity utilization exceeded more than what was purchased. The yellow
dotted line provides visibility into upper SKU limit. The upper SKU limit is based on
the SKU purchased along with the enablement of autoscale, if the capacity has
autoscale enabled.
2. Select the Throttling tab and go to the Background rejection section, which is
most applicable for Warehouse requests. In the previous sample animated image,
observe that on October 16, 2023 at 12:57 PM, all background requests in the
capacity were throttled. The 100% line represents the maximum limit based on the
Fabric SKU purchased.
3. Select the Overages tab. This graph gives an overview of the debt that is being
collected and carried forward across time periods.
Add % (Green): When the capacity overloads and starts adding to debt
bucket.
Burndown % (Blue): When the debt starts burning down and overall capacity
utilization falls below 100%.
Cumulative % (Red): Represents the total overall debt at timepoints. This
needs to be burned down eventually.
4. In the Utilization, Throttling, or Overages tabs, select a specific timepoint to
enable the Explore button for further drill through analysis.
5. Select Explore. The new page provides tables to explore details of both interactive
and background operations. The page shows some background operations that
are not occurring at that time, due to the 24-hour smoothing logic. In the previous
animated image, operations are displayed between October 15 12:57 PM to
October 16 12:57 PM, because of the background operations still being smoothed
at the selected timepoint.
6. In the Background operations table, you can also identify the users, operations,
start/stop times, and durations that consumed the most CUs.
The table of operations also provides a list of operations that are InProgress,
so you can understand long-running queries and their current CU consumption.
The following sample T-SQL query uses the Operation Id inside a query on
the sys.dm_exec_requests dynamic management view.
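The original sample wasn't preserved here; a minimal sketch, assuming a placeholder Operation
Id and the same columns shown earlier in this document:
SQL
SELECT request_id, session_id, start_time, total_elapsed_time
FROM sys.dm_exec_requests
WHERE dist_statement_id = '<Operation Id>';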
7. The Burndown table graph represents the different Fabric workloads that are
running on this capacity and the % compute consumed by them at the selected
timepoint.
The table entry for DMS is your Warehouse workload. In the previous sample
animated image, DMS has added 26% to the overall carryforward debt.
The Cumulative % column provides a percentage of how much the capacity
has overconsumed. This value should be below 100% to avoid throttling. For
example, in the previous sample animated image, 2433.84% indicates that
DMS used 24 times more capacity than what the current SKU (F2) allows.
Related content
Billing and utilization reporting in Synapse Data Warehouse
Monitor connections, sessions, and requests using DMVs
Workload management
Synapse Data Warehouse in Microsoft Fabric performance guidelines
What is the Microsoft Fabric Capacity Metrics app?
Smoothing and throttling in Fabric Data Warehousing
Pause and resume in Fabric data warehousing
A Fabric capacity is a distinct pool of resources whose size (or SKU) determines the
amount of computational power available. Warehouse and SQL analytics endpoint
provide burstable capacity that allows workloads to use more resources to achieve
better performance.
Burstable capacity
Burstable capacity has a direct correlation to the SKU that has been assigned to the
Fabric capacity of the workspace. It also is a function of the workload. A non-demanding
workload might never use burstable capacity units. The workload could achieve optimal
performance within the baseline capacity that has been purchased.
To determine if your workload is using burstable capacity, the following formula can be
used to calculate the scale factor for your workload: Capacity Units (CU) / duration /
Baseline CU = Scale factor
As an illustration of this formula, if your capacity is an F8, and your workload takes 100
seconds to complete, and it uses 1500 CU, the scale factor would be calculated as
follows: 1500 / 100 / 8 = 1.875
When a scale factor is over 1, it means that burstable capacity is being used to meet the
demands of the workload. It also means that your workload is borrowing capacity units
from a future time interval. This is a fundamental concept of Microsoft Fabric called
smoothing.
Smoothing offers relief for customers who create sudden spikes during their peak times,
while they have a lot of idle capacity that is unused. Smoothing simplifies capacity
management by spreading the evaluation of compute to ensure that customer jobs run
smoothly and efficiently.
SKU guardrails
Burstable capacity is finite. There's a limit applied to the backend compute resources to
greatly reduce the risk of Warehouse and SQL analytics endpoint workloads causing
throttling.
The limit (or guardrail) is a scale factor directly correlated to the Fabric Capacity SKU size
that is assigned to the workspace.
SKU | Capacity Units (CUs) | Burstable scale factor
F2 | 2 | 1x - 32x
F4 | 4 | 1x - 16x
F8 | 8 | 1x - 12x
F16 | 16 | 1x - 12x
F32 | 32 | 1x - 12x
F64 (P1) | 64 | 1x - 12x
Smaller SKU sizes are often used for Dev/Test scenarios or ad hoc workloads. The larger
scale factor shown in the table gives more processing power that aligns with lower
overall utilization typically found in those environments.
Larger SKU sizes have access to more total capacity units, allowing more complex
workloads to run optimally and with more concurrency. Therefore, if desired
performance of a workload is not being achieved, increasing the capacity SKU size might
be beneficial.
Note
The maximum Burstable Scale Factor might only be observed for extremely small
time intervals, often within a single query for seconds or even milliseconds. When
using the Microsoft Fabric Capacity Metrics app to observe burstable capacity, the
scale factor over longer durations will be lower.
Isolation boundaries
Warehouse fully isolates ingestion from query processing, as described in Workload
management.
The burstable scale factor can be achieved independently for ingestion at the same time
the burstable scale factor is achieved for query processing. These scale factors
encapsulate all processes within a single workspace. However, capacity can be assigned
to multiple workspaces. Therefore, the aggregate max scale factor across a capacity
would be represented in the following formula: ([Query burstable scale factor] +
[Ingestion burstable scale factor]) * [number of Fabric workspaces] = [aggregate
burstable scale factor]
Considerations
Typically, a complex query running in a workspace assigned to a small capacity
SKU size should run to completion. However, if the data retrieval or intermediate
data processing physically can't run within the burstable scale factor, it results in
the following error message: This query was rejected due to current capacity
constraints. Review the performance guidelines to ensure data and query
optimization prior to increasing SKU size. To increase the SKU size, contact your
capacity administrator.
After the capacity is resized, new guardrails will be applied when the next query is
run. Performance should stabilize to the new capacity SKU size within a few
seconds of the first query submission.
Related content
Workload management
Scale your capacity
Smoothing and throttling in Fabric Data Warehousing
Manage capacity settings
Microsoft Fabric capacity can be paused to enable cost savings for your organization.
Similar to other workloads, Synapse Data Warehouse in Microsoft Fabric is affected
when the Fabric capacity is paused.
New requests: Once a capacity is paused, users cannot execute new SQL
statements or queries. This also includes activity on the Fabric portal like create
operations, loading data grid, opening model view, opening visual query editor.
Any new activity attempted after capacity is paused returns the following error
message Unable to complete the action because this Fabric capacity is
currently paused.
In client application tools like SQL Server Management Studio (SSMS) or Azure
Data Studio, users signing in to a paused capacity will get the same error text
with SQL error code: 24800.
In client application tools like SQL Server Management Studio (SSMS) or Azure
Data Studio, users attempting to run a new TSQL query on an existing
connection when capacity is paused will see the same error text with SQL error
code: 24802.
In-flight requests: Any open requests, like SQL statements in execution, or activity
on the SQL Query Editor, visual query editor, or modeling view, are canceled with
an error message like Unable to complete the action because this Fabric
capacity is currently paused.
User transactions: When a capacity gets paused in the middle of a user transaction
like BEGIN TRAN and COMMIT TRAN , the transactions roll back.
Note
The user experience of rejecting new requests and canceling in-flight requests is
consistent across both Fabric portal and client applications like SQL Server
Management Studio (SSMS) or Azure Data Studio.
Some cleanup activity might be affected when compute is paused. For example,
historical data older than the current data retention settings is not removed while the
capacity is paused. The activities catch up once the capacity resumes.
When a Fabric capacity is resumed, it restarts the warehouse compute resources with a
clean cache, it will take a few runs to add relevant data to cache. During this time after a
resume operation, there could be perceived performance slowdowns.
Tip
Make a trade-off between performance and cost before deciding to pause the
underlying Fabric capacity.
Effect on billing
When capacity is manually paused, it effectively pauses the compute billing meters
for all Microsoft Fabric workloads, including Warehouse.
Data warehouses do not report compute usage once the pause workflow is initiated.
The OneLake storage billing meter is not paused. You continue to pay for storage
when compute is paused.
Learn more about billing implications here: Understand your Fabric capacity Azure bill.
Once the capacity resumes, it might take a couple of minutes to start accepting
new requests.
Background cleanup activity might be affected when compute is paused. The
activities catch up once the capacity resumes.
Related content
Scale your capacity
Workload management
Next step
Pause and resume your capacity
This article details the concepts of smoothing and throttling in workloads using
Warehouse and SQL analytics endpoint in Microsoft Fabric.
This article is specific to data warehousing workloads in Microsoft Fabric. For all Fabric
workloads and general information, see Throttling in Microsoft Fabric.
Compute capacity
Capacity forms the foundation in Microsoft Fabric and provides the computing power
that drives all Fabric workloads. Based on the Capacity SKU purchased, you're entitled to
a set of Capacity Units (CUs) that are shared across Fabric. You can review the CUs for
each SKU at Capacity and SKUs.
Smoothing
Capacities have periods where they're under-utilized (idle) and over-utilized (peak).
When a capacity is running multiple jobs, a sudden spike in compute demand might be
generated that exceeds the limits of a purchased capacity. Warehouse and SQL analytics
endpoint provide burstable capacity that allows workloads to use more resources to
achieve better performance.
Smoothing offers relief for customers who create sudden spikes during their peak times
while they have a lot of idle capacity that is unused. Smoothing simplifies capacity
management by spreading the evaluation of compute to ensure that customer jobs run
smoothly and efficiently.
For interactive jobs run by users: capacity consumption is typically smoothed over
a minimum of 5 minutes, or longer, to reduce short-term temporal spikes.
For scheduled, or background jobs: capacity consumption is spread over 24 hours,
eliminating the concern for job scheduling or contention.
Throttling behavior specific to the warehouse
and SQL analytics endpoint
In general, similar to Power BI, operations are classified either as interactive or
background.
Most Warehouse and SQL analytics endpoint operations only experience operation
rejection after over-utilization averaged over a 24-hour period. For more information,
see Future smoothed consumption.
Throttling considerations
Any inflight operations including long-running queries, stored procedures, batches
won't get throttled mid-way. Throttling policies are applicable to the next
operation after consumption is smoothed.
Warehouse operations are background except for scenarios that involve modeling
operations (such as creating a measure, adding or removing tables from a default
semantic model, visualize results, etc.) or creating/updating Power BI semantic
models (including a default semantic model) or reports. These operations continue
to follow "Interactive Rejection" policy.
Just like most Warehouse operations, dynamic management views (DMVs) are also
classified as background and covered by the "Background Rejection" policy. As a
result, DMVs cannot be queried when capacity is throttled. Even though DMVs are
not available, capacity admins can go to Microsoft Fabric Capacity Metrics app to
understand the root cause.
When the "Background Rejection" policy is enabled, any activity on the SQL query
editor, visual query editor, or modeling view, might see the error message: Unable
to complete the action because your organization's Fabric compute capacity has
exceeded its limits. Try again later .
For a walkthrough of the app, visit How to: Observe Synapse Data Warehouse utilization
trends.
Use the Microsoft Fabric Capacity Metrics app to view a visual history of any
overutilization of capacity, including carry forward, cumulative, and burndown of
utilization. For more information, refer to Throttling in Microsoft Fabric and Overages in
the Microsoft Fabric Capacity Metrics app.
Next step
How to: Observe Synapse Data Warehouse utilization trends
Related content
Throttling in Microsoft Fabric
Billing and utilization reporting in Synapse Data Warehouse
What is the Microsoft Fabric Capacity Metrics app?
Synapse Data Warehouse in Microsoft Fabric performance guidelines
Understand your Azure bill on a Fabric capacity
Smoothing and throttling in Fabric Data Warehousing
Burstable capacity in Fabric data warehousing
Pause and resume in Fabric data warehousing
This article describes the architecture and workload management behind data
warehousing in Microsoft Fabric.
Data processing
The Warehouse and SQL analytics endpoint share the same underlying processing
architecture. As data is retrieved or ingested, it leverages a distributed engine built for
both small and large-scale data and computational functions.
The processing system is serverless in that backend compute capacity scales up and
down autonomously to meet workload demands.
When a query is submitted, the SQL frontend (FE) performs query optimization to
determine the best plan based on the data size and complexity. Once the plan is
generated, it is given to the Distributed Query Processing (DQP) engine. The DQP
orchestrates distributed execution of the query by splitting it into smaller queries that
are executed on backend compute nodes. Each small query is called a task and
represents a distributed execution unit. It reads file(s) from OneLake, joins results from
other tasks, groups, or orders data retrieved from other tasks. For ingestion jobs, it also
writes data to the proper destination tables.
When data is processed, results are returned to the SQL frontend for serving back to the
user or calling application.
The system is fault tolerant and if a node becomes unhealthy, operations executing on
the node are redistributed to healthy nodes for completion.
Warehouse and SQL analytics endpoint provide burstable capacity that allows workloads
to use more resources to achieve better performance, and use smoothing to offer relief
for customers who create sudden spikes during their peak times, while they have a lot of
idle capacity that is unused. Smoothing simplifies capacity management by spreading
the evaluation of compute to ensure that customer jobs run smoothly and efficiently.
As queries arrive, their tasks are scheduled based on first-in-first-out (FIFO) principles. If
there is idle capacity, the scheduler might use a "best fit" approach to optimize
concurrency.
When the scheduler identifies resourcing pressure, it invokes a scale operation. Scaling
is managed autonomously and backend topology grows as concurrency increases. As it
takes a few seconds to acquire nodes, the system is not optimized for consistent
subsecond performance of queries that require distributed processing.
When pressure subsides, backend topology scales back down and releases resource
back to the region.
Ingestion isolation
Applies to: Warehouse in Microsoft Fabric
In the backend compute pool of Warehouse in Microsoft Fabric, loading activities are
provided resource isolation from analytical workloads. This improves performance and
reliability, as ingestion jobs can run on dedicated nodes that are optimized for ETL and
do not compete with other queries or applications for resources.
Sessions
The Warehouse and SQL analytics endpoint have a user session limit of 724 per
workspace. When this limit is reached, an error is returned: The user session limit
for the workspace is 724 and has been reached .
Note
As Microsoft Fabric is a SaaS platform, there are many system connections that run
to continuously optimize the environment. DMVs show both system and user
sessions. For more information, see Monitor using DMVs.
Best practices
The Microsoft Fabric workspace provides a natural isolation boundary of the distributed
compute system. Workloads can take advantage of this boundary to manage both cost
and performance.
OneLake shortcuts can be used to create read-only replicas of tables in other
workspaces to distribute load across multiple SQL engines, creating an isolation
boundary. This can effectively increase the maximum number of sessions performing
read-only queries.
Related content
OneLake, the OneDrive for data
What is data warehousing in Microsoft Fabric?
Better together: the lakehouse and warehouse
Burstable capacity in Fabric data warehousing
Smoothing and throttling in Fabric Data Warehousing
This article details the strategy, considerations, and methods of migration of data
warehousing in Azure Synapse Analytics dedicated SQL pools to Microsoft Fabric
Warehouse.
Migration introduction
Microsoft Fabric is an all-in-one SaaS analytics solution for enterprises that offers a
comprehensive suite of services, including Data Factory, Data Engineering, Data
Warehousing, Data Science, Real-Time Intelligence, and Power BI.
This article focuses on options for schema (DDL) migration, database code (DML)
migration, and data migration. Microsoft offers several options, and here we discuss
each option in detail and provide guidance on which of these options you should
consider for your scenario. This article uses the TPC-DS industry benchmark for
illustration and performance testing. Your actual result might vary depending on many
factors including type of data, data types, width of tables, data source latency, etc.
Another key goal of planning is to adjust your design to ensure that your solution takes
full advantage of the high query performance that Fabric Warehouse is designed to
provide. Designing data warehouses for scale introduces unique design patterns, so
traditional approaches aren't always the best. Review the Fabric Warehouse performance
guidelines, because although some design adjustments can be made after migration,
making changes earlier in the process will save you time and effort. Migration from one
technology/environment to another is always a major effort.
The following diagram depicts the Migration Lifecycle listing the major pillars consisting
of Assess and Evaluate, Plan and Design, Migrate, Monitor and Govern, Optimize and
Modernize pillars with the associated tasks in each pillar to plan and prepare for the
smooth migration.
You have an existing environment with a small number of data marts to migrate.
You have an existing environment with data that's already in a well-designed star
or snowflake schema.
You're under time and cost pressure to move to Fabric Warehouse.
In summary, this approach works well for workloads that are already optimized in your
current Synapse dedicated SQL pool environment, and therefore don't require major
changes in Fabric.
You might also want to redesign the architecture to take advantage of the new engines
and features available in the Fabric Workspace.
Table considerations
When you migrate tables between different environments, typically only the raw data
and the metadata physically migrate. Other database elements from the source system,
such as indexes, usually aren't migrated because they might be unnecessary or
implemented differently in the new environment.
T-SQL considerations
There are several Data Manipulation Language (DML) syntax differences to be aware of.
Refer to T-SQL surface area in Microsoft Fabric. Consider also a code assessment when
choosing method(s) of migration for the database code (DML).
Depending on the parity differences at the time of the migration, you might need to
rewrite parts of your T-SQL DML code.
The following table provides the mapping of supported data types from Synapse
dedicated SQL pools to Fabric Warehouse.
Synapse dedicated SQL pools | Fabric Warehouse
money | decimal(19,4)
smallmoney | decimal(10,4)
smalldatetime | datetime2
datetime | datetime2
nchar | char
nvarchar | varchar
tinyint | smallint
binary | varbinary
datetimeoffset* | datetime2
* Datetime2 does not store the extra time zone offset information that is stored in
datetimeoffset. Since the datetimeoffset data type is not currently supported in Fabric
Warehouse, the time zone offset data would need to be extracted into a separate column.
This table summarizes information for data schema (DDL), database code (DML), and
data migration methods. We expand further on each scenario later in this article, linked
in the Option column.
Option 7 – Migrate using dbt: covers schema (DDL) conversion and database code (DML)
conversion. Existing dbt users can use the dbt Fabric adapter to convert their DDL and DML.
You must then migrate data using other options in this table.
Tip
Create an inventory of objects that need to be migrated, and document the
migration process from start to end, so that it can be repeated for other dedicated
SQL pools or workloads.
22 SELECT queries (one for each table selected) were generated and executed in
the dedicated SQL pool.
Make sure you have the appropriate DWU and resource class to allow the queries
generated to be executed. For this case, you need a minimum of DWU1000 with
staticrc10 to allow a maximum of 32 queries to handle 22 queries submitted.
Data Factory direct copying data from the dedicated SQL pool to Fabric Warehouse
requires staging. The ingestion process consisted of two phases.
The first phase consists of extracting the data from the dedicated SQL pool into
ADLS and is referred as staging.
The second phase consists of ingesting the data from staging into Fabric
Warehouse. Most of the data ingestion timing is in the staging phase. In
summary, staging has a huge impact on ingestion performance.
Recommended use
Using the Copy Wizard to generate a ForEach provides a simple UI to convert DDL and
ingest the selected tables from the dedicated SQL pool to Fabric Warehouse in one step.
However, it isn't optimal for overall throughput. The requirement to use staging,
the need to parallelize read and write for the "Source to Stage" step are the major
factors for the performance latency. It's recommended to use this option for dimension
tables only.
You have the option of using the source table's physical partitioning, if available. If the
table does not have physical partitioning, you must specify the partition column and supply
min/max values to use dynamic partitioning. In the following screenshot, the data
pipeline Source options are specifying a dynamic range of partitions based on the
ws_sold_date_sk column.
While using partitioning can increase the throughput of the staging phase, there are
considerations for making the appropriate adjustments:
Depending on your partition range, it might potentially use all the concurrency
slots as it might generate over 128 queries on the dedicated SQL pool.
You're required to scale to a minimum of DWU6000 to allow all queries to be
executed.
As an example, for TPC-DS web_sales table, 163 queries were submitted to the
dedicated SQL pool. At DWU6000, 128 queries were executed while 35 queries
were queued.
Dynamic partition automatically selects the range partition. In this case, an 11-day
range for each SELECT query submitted to the dedicated SQL pool. For example:
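The original sample wasn't preserved here; a sketch of one such range query, with hypothetical
ws_sold_date_sk boundary values covering an 11-day window:
SQL
SELECT *
FROM [dbo].[web_sales]
WHERE [ws_sold_date_sk] > 2451058   -- hypothetical lower boundary of the partition range
  AND [ws_sold_date_sk] <= 2451069; -- hypothetical upper boundary, 11 days later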
Recommended use
For fact tables, we recommended using Data Factory with partitioning option to increase
throughput.
However, the increased parallelized reads require the dedicated SQL pool to scale to a higher
DWU to allow the extract queries to be executed. Leveraging partitioning, the rate is
improved 10x over the no-partition option. You could increase the DWU to get additional
throughput via compute resources, but the dedicated SQL pool allows a maximum of 128
active queries.
Note
For more information on Synapse DWU to Fabric mapping, see Blog: Mapping
Azure Synapse dedicated SQL pools to Fabric data warehouse compute .
1. Extract the data from the dedicated SQL pool to ADLS, therefore mitigating the
stage performance overhead.
2. Use either Data Factory or the COPY command to ingest the data into Fabric
Warehouse.
Recommended use
You can continue to use Data Factory to convert your schema (DDL). Using the Copy
Wizard, you can select the specific table or All tables. By design, this migrates the
schema and data in one step, extracting the schema without any rows, using the false
condition, TOP 0 in the query statement.
The following code sample covers schema (DDL) migration with Data Factory.
This Data Pipeline accepts a parameter SchemaName , which allows you to specify which
schemas to migrate over. The dbo schema is the default.
In the Default value field, enter a comma-delimited list of table schema indicating which
schemas to migrate: 'dbo','tpch' to provide two schemas, dbo and tpch .
Pipeline design: Lookup activity
Create a Lookup Activity and set the Connection to point to your source database.
Connection is your Azure Synapse dedicated SQL pool. Connection type is Azure
Synapse Analytics.
The Query field needs to be built using a dynamic expression, allowing the
parameter SchemaName to be used in a query that returns a list of target source
tables. Select Query then select Add dynamic content.
The expression within the Lookup Activity generates a SQL statement that queries the
system views to retrieve a list of schemas and tables. It references the SchemaName
parameter to allow filtering on SQL schemas. The output is an array of SQL schemas and
tables that is used as input into the ForEach Activity.
Use the following code to return a list of all user tables with their schema name.
JSON
@concat('
SELECT s.name AS SchemaName,
t.name AS TableName
FROM sys.tables AS t
INNER JOIN sys.schemas AS s
ON t.type = ''U''
AND s.schema_id = t.schema_id
AND s.name in (',coalesce(pipeline().parameters.SchemaName, 'dbo'),')
')
Pipeline design: ForEach Loop
For the ForEach Loop, configure the following options in the Settings tab:
Inside the ForEach Activity, add a Copy Activity. This method uses the Dynamic
Expression Language within Data Pipelines to build a SELECT TOP 0 * FROM <TABLE>
statement, migrating only the schema, without data, into a Fabric Warehouse.
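For illustration, assuming a Lookup output row where SchemaName is dbo and TableName is web_sales (hypothetical values), the dynamic expression resolves to a source query like the following.
SQL
-- Returns the column definitions but no rows, so only the schema is migrated.
SELECT TOP 0 * FROM [dbo].[web_sales];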
Pipeline design: Sink
For Sink, point to your Warehouse and reference the Source Schema and Table name.
Once you run this pipeline, you'll see your Data Warehouse populated with each table in
your source, with the proper schema.
You can get the code samples at microsoft/fabric-migration on GitHub.com. This code
is shared as open source, so feel free to contribute, collaborate, and help the
community.
Recommended use
This is a great option if you prefer to run a stored procedure for the schema (DDL)
conversion, data extraction, or T-SQL code assessment.
For the data migration, you'll need to use either COPY INTO or Data Factory to ingest
the data into Fabric Warehouse.
This extension is available inside Azure Data Studio and Visual Studio Code. This feature
enables capabilities for source control, database testing and schema validation.
For more information on source control for warehouses in Microsoft Fabric, including Git
integration and deployment pipelines, see Source Control with Warehouse.
Recommended use
This is a great option for those who prefer to use SQL Database Projects for their
deployment. This option essentially integrates the Fabric Migration stored procedures
into the SQL Database Project to provide a seamless migration experience.
For the data migration, you'll then use either COPY INTO or Data Factory to ingest the
data into Fabric Warehouse.
Adding to the Azure Data Studio support for Fabric, the Microsoft Fabric CAT team
provides a set of PowerShell scripts to handle the extraction, creation, and
deployment of schema (DDL) and database code (DML) via a SQL Database Project. For
a walkthrough of using the SQL Database Project with these PowerShell scripts, see
microsoft/fabric-migration on GitHub.com.
For more information on SQL Database Projects, see Getting started with the SQL
Database Projects extension and Build and Publish a project.
Only a single query per table is submitted against the source Synapse dedicated
SQL pool. This won't use up all the concurrency slots, and so won't block
concurrent customer production ETL/queries.
Scaling to DWU6000 isn't required, as only a single concurrency slot is used for
each table, so customers can use lower DWUs.
The extract runs in parallel across all the compute nodes, which is the key to the
performance improvement.
Recommended use
Use CETAS to extract the data to ADLS as Parquet files. Parquet files provide the
advantage of efficient data storage with columnar compression that takes less
bandwidth to move across the network. Furthermore, because Fabric stores data in the
Delta parquet format, data ingestion is 2.5x faster compared to a text file format,
since there's no overhead of converting to the Delta format during ingestion.
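A minimal CETAS sketch follows. The external data source, file format, and object names are assumptions; create them first in your dedicated SQL pool with credentials that can write to your ADLS account.
SQL
-- Assumed objects and names; a database scoped credential for the storage account
-- may also be required (omitted here).
CREATE EXTERNAL DATA SOURCE [MigrationStage]
WITH (TYPE = HADOOP, LOCATION = 'abfss://<container>@<account>.dfs.core.windows.net');

CREATE EXTERNAL FILE FORMAT [ParquetFormat]
WITH (FORMAT_TYPE = PARQUET, DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec');

-- Export one table to ADLS as Parquet; run one CETAS per table, optionally in parallel.
CREATE EXTERNAL TABLE [dbo].[web_sales_ext]
WITH (LOCATION = '/web_sales/', DATA_SOURCE = [MigrationStage], FILE_FORMAT = [ParquetFormat])
AS
SELECT * FROM [dbo].[web_sales];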
To increase throughput:
Add parallel CETAS operations, which increases the use of concurrency slots but allows
more throughput.
Scale the DWU on the Synapse dedicated SQL pool.
The dbt framework generates DDL and DML (SQL scripts) on the fly with each execution.
With model files expressed in SELECT statements, the DDL/DML can be translated
instantly to any target platform by changing the profile (connection string) and the
adapter type.
Recommended use
The dbt framework is a code-first approach. The data must be migrated by using options
listed in this document, such as CETAS or COPY/Data Factory.
The dbt adapter for Microsoft Fabric Synapse Data Warehouse allows existing dbt
projects that targeted different platforms, such as Synapse dedicated SQL pools,
Snowflake, Databricks, Google BigQuery, or Amazon Redshift, to be migrated to a Fabric
Warehouse with a simple configuration change.
To get started with a dbt project targeting Fabric Warehouse, see Tutorial: Set up dbt for
Fabric Data Warehouse. This document also lists an option to move between different
warehouses/platforms.
Several factors to note so that you can design your process for maximum performance:
With Fabric, there isn't any resource contention when loading multiple tables from
ADLS to Fabric Warehouse concurrently. As a result, there is no performance
degradation when loading with parallel threads. The maximum ingestion throughput
is only limited by the compute power of your Fabric capacity.
Fabric workload management provides separation of resources allocated for load
and query. There's no resource contention while queries and data loading are
executed at the same time.
Related content
Create a Warehouse in Microsoft Fabric
Synapse Data Warehouse in Microsoft Fabric performance guidelines
Security for data warehousing in Microsoft Fabric
Blog: Mapping Azure Synapse dedicated SQL pools to Fabric data warehouse
compute
1. Refer to the article about statistics to verify proper column statistics have been
created on all tables.
2. Ensure all table statistics are updated after large DML transactions (see the T-SQL
sketch after this list).
3. Queries with complex JOINs, GROUP BY, and ORDER BY clauses that are expected to
return a large result set use more tempdb space during execution. Update queries to
reduce the number of GROUP BY and ORDER BY columns if possible.
4. Rerun the query when there are no other active queries running to avoid resource
constraints during query execution.
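As a sketch for steps 1 and 2, statistics can be created and refreshed with T-SQL; the table and column names below are hypothetical.
SQL
-- Create single-column statistics if they're missing (hypothetical table and column).
CREATE STATISTICS [stats_web_sales_ws_sold_date_sk]
ON [dbo].[web_sales] ([ws_sold_date_sk]) WITH FULLSCAN;

-- Refresh statistics after large DML operations.
UPDATE STATISTICS [dbo].[web_sales];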
Query performance seems to degrade over
time
Many factors can affect a query's performance, such as changes in table size, data skew,
workload concurrency, available resources, or the network. Just because a query runs
slower doesn't necessarily mean there's a query performance problem. Take the following
steps to investigate the target query:
1. Identify the differences in all performance-affecting factors between good and bad
performance runs.
2. Refer to the article about statistics to verify proper column statistics have been
created on all tables.
3. Ensure all table statistics are updated after large DML transactions.
4. Check for data skew in base tables.
5. Pause and resume the service. Then, rerun the query when there are no other active
queries running. You can monitor the warehouse workload using DMVs; a sample query
follows this list.
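As a minimal example, assuming the sys.dm_exec_requests DMV described in the DMV monitoring article, a query like the following shows whether other requests are running at the same time.
SQL
-- List currently running requests; column names follow the DMV article.
SELECT session_id, start_time, total_elapsed_time, status
FROM sys.dm_exec_requests
WHERE status = 'running';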
2. If step 1 fails, run a CTAS command with the failed SELECT statement to send the
SELECT query result to another table in the same warehouse, as sketched after this
step. Using CTAS avoids the query result set being sent back to the client machine.
If the CTAS command finishes successfully and the target table is populated, then
the original query failure is likely caused by the warehouse front end or client issues.
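The following is a minimal sketch of this technique; the target table name and the SELECT statement are hypothetical placeholders for your failing query.
SQL
-- Materialize the result in the warehouse instead of returning it to the client.
-- Replace the SELECT below with the failing query; the table name is hypothetical.
CREATE TABLE [dbo].[debug_query_result]
AS
SELECT ws_sold_date_sk, SUM(ws_net_paid) AS total_net_paid
FROM [dbo].[web_sales]
GROUP BY ws_sold_date_sk;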
What to collect before contacting Microsoft
support
Provide the workspace ID of the Warehouse.
Provide the Statement ID and Distributed request ID. They're returned as messages
after a query completes or fails.
Provide the text of the exact error message.
Provide the time when the query completes or fails.
Related content
Query insights in Fabric data warehousing
Monitoring connections, sessions, and requests using DMVs
What is the Microsoft Fabric Capacity Metrics app?
Limitations in Microsoft Fabric
Microsoft Entra authentication as an alternative to SQL authentication in Microsoft
Fabric
These limitations apply only to Warehouse and SQL analytics endpoint items in Fabric
Synapse Data Warehouse. For limitations of SQL Database in Fabric, see Limitations in
SQL Database in Microsoft Fabric (Preview).
Limitations
Current general product limitations for Data Warehousing in Microsoft Fabric are listed
in this article, with feature-level limitations called out in the corresponding feature
article. More functionality will build upon the world-class, industry-leading performance
and concurrency story, and will land incrementally. For more information on the future
of Microsoft Fabric, see the Fabric Roadmap.
Clone table
Connectivity
Data types in Microsoft Fabric
Semantic models
Delta lake logs
Pause and resume in Fabric data warehousing
Share your data and manage permissions
Limitations in source control
Statistics
Tables
Transactions
Visual Query editor
Tables with renamed columns aren't supported in the SQL analytics endpoint.
Delta tables created outside of the /tables folder aren't available in the SQL
analytics endpoint.
If you don't see a Lakehouse table in the warehouse, check the location of the
table. Only the tables that are referencing data in the /tables folder are available
in the warehouse. The tables that reference data in the /files folder in the lake
aren't exposed in the SQL analytics endpoint. As a workaround, move your data to
the /tables folder.
Some columns that exist in the Spark Delta tables might not be available in the
tables in the SQL analytics endpoint. Refer to the Data types article for a full list of
supported data types.
If you add a foreign key constraint between tables in the SQL analytics endpoint,
you won't be able to make any further schema changes (for example, adding
new columns). If you don't see the Delta Lake columns with types that should
be supported in the SQL analytics endpoint, check whether there's a foreign key
constraint that might prevent updates on the table.
Known issues
For known issues in Microsoft Fabric, visit Microsoft Fabric Known Issues .
Related content
T-SQL surface area
Create a Warehouse in Microsoft Fabric
The Warehouse item uses the owner's identity when accessing data on OneLake. To
change the owner of these items, the current solution is to use an API call as
described in this article.
This guide walks you through the steps to change your Warehouse owner to your
organizational account. The takeover API allows you to change the owner's identity
to an SPN or another organizational account (Microsoft Entra ID). For more
information, see Microsoft Entra authentication as an alternative to SQL authentication
in Microsoft Fabric.
The takeover API only works for Warehouse, not the SQL analytics endpoint.
Prerequisites
Before you begin, you need:
Install and import the Power BI PowerShell module, if it isn't installed already. Open
Windows PowerShell as an administrator on an internet-connected workstation and
execute the following command:
PowerShell
Connect
1. Open Windows PowerShell as an administrator.
2. Connect to your Power BI Service:
PowerShell
Connect-PowerBIServiceAccount
4. Copy the second GUID from the URL, for example, 11aaa111-a11a-1111-1aaa-
aa111111aaa. Don't include the / characters. Store this in a text editor for later use.
5. In the following script, replace workspaceID with the first GUID you copied. Run the
following command.
PowerShell
$workspaceID = 'workspaceID'
6. In the following script, replace warehouseID with the second GUID you copied. Run
the following command.
PowerShell
$warehouseid = 'warehouseID'
PowerShell
PowerShell
Full script
PowerShell
Related content
Security for data warehousing in Microsoft Fabric
Disabling V-Order causes any new Parquet files produced by the warehouse engine to
be created without V-Order optimization.
Caution
Currently, disabling V-Order can only be done at the warehouse level, and it is
irreversible: once disabled, it can't be enabled again. Consider the full performance
impact of disabling V-Order before deciding to do so.
SQL
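-- Sketch of disabling V-Order at the warehouse level, using the VORDER database option
-- described in the V-Order management article. Remember that this is irreversible.
ALTER DATABASE CURRENT SET VORDER = OFF;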
SQL
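-- Check V-Order status for the warehouses in the workspace; assumes the
-- is_vorder_enabled column on sys.databases described in the V-Order article.
SELECT [name], [is_vorder_enabled]
FROM sys.databases;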
This query outputs each warehouse in the current workspace with its V-Order status.
A V-Order state of 1 indicates V-Order is enabled for a warehouse, while a state of 0
indicates it's disabled.
Related content
Understand and manage V-Order for Warehouse
Applies to: SQL Server Azure SQL Database Azure SQL Managed Instance
Azure Synapse Analytics Analytics Platform System (PDW) SQL analytics
endpoint in Microsoft Fabric Warehouse in Microsoft Fabric SQL database in
Microsoft Fabric
This article gives the basics about how to find and use the Microsoft Transact-SQL (T-
SQL) reference articles. T-SQL is central to using Microsoft SQL products and services. All
tools and applications that communicate with a SQL Server database do so by sending
T-SQL commands.
For example, an article that applies to all versions has the following label.
Applies to: SQL Server Azure SQL Database Azure SQL Managed Instance
Azure Synapse Analytics Analytics Platform System (PDW)
As another example, the following label indicates an article that applies only to Azure
Synapse Analytics and Parallel Data Warehouse.
In some cases, an article is used by a product or service, but not all of the arguments
are supported. In this case, other Applies to sections are inserted into the appropriate
argument descriptions in the body of the article.
Sharing your expertise with others on Microsoft Learn helps everyone achieve more. Use
the information in this guide to publish a new article to Microsoft Learn or make
updates to an existing published article.
Several of the Microsoft documentation sets are open source and hosted on GitHub.
Not all document sets are completely open source, but many have public-facing repos
where you can suggest changes via pull requests (PR). This open-source approach
streamlines and improves communication between product engineers, content teams,
and customers, and it has other advantages:
Open-source repos plan in the open to get feedback on what docs are most
needed.
Open-source repos review in the open to publish the most helpful content on our
first release.
Open-source repos update in the open to make it easier to continuously improve
the content.
The user experience on Microsoft Learn integrates GitHub workflows directly to make
it even easier. Start by editing the document you're viewing. Or help by reviewing new
topics or creating quality issues.
Important
All repositories that publish to Microsoft Learn have adopted the Microsoft Open
Source Code of Conduct or the .NET Foundation Code of Conduct . For more
information, see the Code of Conduct FAQ . Contact [email protected]
or [email protected] with any questions or comments.
1. Some docs pages allow you to edit content directly in the browser. If so, you'll see
an Edit button like the one shown below. Choosing the Edit (or equivalently
localized) button takes you to the source file on GitHub.
If the Edit button isn't present, it means the content isn't open to public
contributions. Some pages are generated (for example, from inline documentation
in code) and must be edited in the project they belong to.
2. Select the pencil icon to edit the article. If the pencil icon is grayed out, you need
to either log in to your GitHub account or create a new account.
3. Edit the file in the web editor. Choose the Preview tab to check the formatting of
your changes.
4. When you're finished editing, scroll to the bottom of the page. In the Propose
changes area, enter a title and optionally a description for your changes. The title
will be the first line of the commit message. Select Propose changes to create a
new branch in your fork and commit your changes:
5. Now that you've proposed and committed your changes, you need to ask the
owners of the repository to "pull" your changes into their repository. This is done
using something called a "pull request" (PR). When you select Propose changes, a
new page similar to the following is displayed:
Select Create pull request. Next, enter a title and a description for the PR, and then
select Create pull request. If you're new to GitHub, see About pull requests for
more information.
6. That's it! Content team members will review your PR and merge it when it's
approved. You may get feedback requesting changes.
The GitHub editing UI responds to your permissions on the repository. The preceding
images are for contributors who don't have write permissions to the target repository.
GitHub automatically creates a fork of the target repository in your account. The newly
created fork name has the form GitHubUsername/RepositoryName by default. If you have
write access to the target repository, such as your fork, GitHub creates a new branch in
the target repository. The branch name has the default form patch-n, using a numeric
identifier for the patch branch.
We use PRs for all changes, even for contributors who have write access. Most
repositories protect the default branch so that updates must be submitted as PRs.
The in-browser editing experience is best for minor or infrequent changes. If you make
large contributions or use advanced Git features (such as branch management or
advanced merge conflict resolution), you need to fork the repo and work locally.
Note
Most localized documentation doesn't offer the ability to edit or provide feedback
through GitHub. To provide feedback on localized content, use the
https://aka.ms/provide-feedback form.
Issues start the conversation about what's needed. The content team will respond to
these issues with ideas for what we can add, and ask for your opinions. When we create
a draft, we'll ask you to review the PR.