Fabric Data Warehouse
What is data warehousing in Microsoft
Fabric?
Article • 08/18/2023
Microsoft Fabric provides customers with a unified product that addresses every aspect
of their data estate by offering a complete, SaaS-ified Data, Analytics and AI platform,
which is lake centric and open. The foundation of Microsoft Fabric enables the novice
user through to the seasoned professional to leverage Database, Analytics, Messaging,
Data Integration and Business Intelligence workloads through a rich, easy to use, shared
SaaS experience with Microsoft OneLake as the centerpiece.
The Warehouse is built for any skill level, from the citizen developer through to the
professional developer, DBA, or data engineer. The rich set of experiences built into the
Microsoft Fabric workspace enables customers to reduce their time to insights by having
an easily consumable, always-connected dataset that is integrated with Power BI in
Direct Lake mode. This enables industry-leading performance that ensures a customer's
report always has the most recent data for analysis and reporting. Cross-database
querying lets you quickly and seamlessly combine data sources that span multiple
databases for fast insights and zero data duplication.
Virtual warehouses with cross database querying
Microsoft Fabric provides customers with the ability to stand up virtual warehouses
containing data from virtually any source by using shortcuts. Customers can build a
virtual warehouse by creating shortcuts to their data wherever it resides. A virtual
warehouse may consist of data from OneLake, Azure Data Lake Storage, or any other
cloud vendor storage within a single boundary and with no data duplication.
Seamlessly unlock value from a variety of data sources through the richness of cross-
database querying in Microsoft Fabric. Cross-database querying enables customers to
quickly leverage multiple data sources for fast insights with zero data duplication.
Data stored in different sources can be easily joined together, enabling customers to
deliver rich insights that previously required significant effort from data integration
and engineering teams.
Cross-database queries can be created through the Visual Query editor, which offers a
no-code path to insights over multiple tables. The SQL Query editor, or other familiar
tools such as SQL Server Management Studio (SSMS), can also be used to create cross-
database queries.
Via the SQL Endpoint of the Lakehouse, the user has a subset of SQL commands that
can define and query data objects but not manipulate the data. You can perform the
following actions in the SQL Endpoint:
Query the tables that reference data in your Delta Lake folders in the lake.
Create views, inline TVFs, and procedures to encapsulate your semantics and
business logic in T-SQL.
Manage permissions on the objects.
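As a brief illustration of these capabilities, a view created through the SQL Endpoint might look like the following sketch; the table and column names are hypothetical and not part of any sample dataset.
SQL
--Illustrative only: encapsulate business logic over a Delta table exposed by the SQL Endpoint.
CREATE VIEW [dbo].[vw_HighValueOrders]
AS
SELECT
    [OrderID],
    [CustomerID],
    [TotalAmount]
FROM [dbo].[Orders]
WHERE [TotalAmount] > 1000;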
In a Microsoft Fabric workspace, a SQL Endpoint is labeled "SQL Endpoint" under the
Type column. Each Lakehouse has an autogenerated SQL Endpoint that can be
leveraged through familiar SQL tools such as SQL Server Management Studio, Azure
Data Studio, or the Microsoft Fabric SQL Query Editor.
To get started with the SQL Endpoint, see Better together: the lakehouse and warehouse
in Microsoft Fabric.
Unlike a SQL Endpoint which only supports read only queries and creation of views and
TVFs, a Warehouse has full transactional DDL and DML support and is created by a
customer. A Warehouse is populated by one of the supported data ingestion methods
such as COPY INTO, Pipelines, Dataflows, or cross database ingestion options such as
CREATE TABLE AS SELECT (CTAS), INSERT..SELECT, or SELECT INTO.
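As a sketch of one of these cross-database ingestion options, the following CTAS statement copies data from a Lakehouse SQL Endpoint table into a new Warehouse table; the lakehouse and table names are assumptions used only for illustration.
SQL
--Illustrative only: create and load a Warehouse table from a Lakehouse table via three-part naming.
CREATE TABLE [dbo].[SalesCopy]
AS
SELECT *
FROM [MyLakehouse].[dbo].[Sales];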
To get started with the Warehouse, see Create a warehouse in Microsoft Fabric.
For more information about querying your data in Microsoft Fabric, see Query the SQL
Endpoint or Warehouse in Microsoft Fabric.
Compare different warehousing capabilities
In order to best serve your analytics use cases, a variety of capabilities are available
to you. Generally, the warehouse can be thought of as a superset of all other
capabilities, providing a synergistic relationship with the other analytics offerings
that provide T-SQL.
Within Fabric, users may need to decide between a Warehouse, a Lakehouse (through its
SQL Endpoint), and even a Power BI datamart.
The comparison spans licensing, primary capabilities, developer profile, development
experience, T-SQL capabilities, data loading, and the storage layer across the Warehouse,
the SQL Endpoint of the Lakehouse, and the Power BI datamart. Key differences include the
following.

Primary capabilities
SQL Endpoint of the Lakehouse: Read-only, system-generated SQL Endpoint for the Lakehouse,
for T-SQL querying and serving. Supports analytics on the Lakehouse Delta tables and the
Delta Lake folders referenced via shortcuts.

Development experience
Warehouse: Warehouse Editor with full support for T-SQL data ingestion, modeling,
development, and querying; UI experiences for data ingestion, modeling, and querying;
read/write support for 1st and 3rd party tooling.
SQL Endpoint of the Lakehouse: Lakehouse SQL Endpoint with limited T-SQL support for
views, table-valued functions, and SQL queries; UI experiences for modeling and querying;
limited T-SQL support for 1st and 3rd party tooling.
Power BI datamart: Datamart Editor with UI experiences and queries support; UI experiences
for data ingestion, modeling, and querying; read-only support for 1st and 3rd party tooling.

T-SQL capabilities
Warehouse: Full DQL, DML, and DDL T-SQL support; full transaction support.
SQL Endpoint of the Lakehouse: Full DQL; no DML; limited DDL T-SQL support, such as SQL
views and TVFs.

Data loading
Power BI datamart: Dataflows only.
For every Delta table in your Lakehouse, the SQL Endpoint automatically generates one
table.
Tables in the SQL Endpoint are created with a delay. Once you create or update a Delta
Lake folder or table in the lake, the warehouse table that references the lake data isn't
created or refreshed immediately. The changes are applied in the warehouse after 5-10
seconds.
For autogenerated schema data types for the SQL Endpoint, see Data types in Microsoft
Fabric.
Next steps
Better together: the lakehouse and warehouse in Microsoft Fabric
Create a warehouse
Create a lakehouse in Microsoft Fabric
Introduction to Power BI datamarts
Creating reports
Microsoft Fabric decision guide: choose
a data store
Article • 09/18/2023
Use this reference guide and the example scenarios to help you choose a data store for
your Microsoft Fabric workloads.
The following rows compare the Data warehouse, Lakehouse, Power BI datamart, and KQL
Database.

Security
Data warehouse: Object level (table, view, function, stored procedure, etc.), column
level, row level, DDL/DML.
Lakehouse: Row level, table level (when using T-SQL), none for Spark.
Power BI datamart: Built-in RLS editor.
KQL Database: Row-level security.

Query across items
Data warehouse: Yes, query across lakehouse and warehouse tables.
Lakehouse: Yes, query across lakehouse and warehouse tables; query across lakehouses
(including shortcuts using Spark).
Power BI datamart: No.
KQL Database: Yes, query across KQL Databases, lakehouses, and warehouses with shortcuts.

Ingestion latency
KQL Database: Queued ingestion; streaming ingestion has a couple of seconds latency.
Scenarios
Review these scenarios for help with choosing a data store in Fabric.
Scenario 1
Susan, a professional developer, is new to Microsoft Fabric. They are ready to get started
cleaning, modeling, and analyzing data but need to decide whether to build a data warehouse
or a lakehouse. After review of the details in the previous table, the primary decision
points are the available skill set and the need for multi-table transactions.
Susan has spent many years building data warehouses on relational database engines,
and is familiar with SQL syntax and functionality. Thinking about the larger team, the
primary consumers of this data are also skilled with SQL and SQL analytical tools. Susan
decides to use a data warehouse, which allows the team to interact primarily with T-
SQL, while also allowing any Spark users in the organization to access the data.
Scenario 2
Rob, a data engineer, needs to store and model several terabytes of data in Fabric. The
team has a mix of PySpark and T-SQL skills. Most of the team running T-SQL queries are
consumers, and therefore don't need to write INSERT, UPDATE, or DELETE statements.
The remaining developers are comfortable working in notebooks, and because the data
is stored in Delta, they're able to interact with a similar SQL syntax.
Rob decides to use a lakehouse, which allows the data engineering team to use their
diverse skills against the data, while allowing the team members who are highly skilled
in T-SQL to consume the data.
Scenario 3
Ash, a citizen developer, is a Power BI developer. They're familiar with Excel, Power BI,
and Office. They need to build a data product for a business unit. They know they don't
quite have the skills to build a data warehouse or a lakehouse, and those seem like too
much for their needs and data volumes. They review the details in the previous table
and see that the primary decision points are their own skills and their need for a
self-service, no-code capability, and a data volume under 100 GB.
Ash works with business analysts familiar with Power BI and Microsoft Office, and knows
that they already have a Premium capacity subscription. As they think about their larger
team, they realize the primary consumers of this data may be analysts, familiar with no-
code and SQL analytical tools. Ash decides to use a Power BI datamart, which allows the
team to build the capability quickly, using a no-code experience. Queries can be
executed via Power BI and T-SQL, while also allowing any Spark users in the organization
to access the data as well.
Scenario 4
Daisy is a business analyst experienced with using Power BI to analyze supply chain
bottlenecks for a large global retail chain. They need to build a scalable data solution
that can handle billions of rows of data and can be used to build dashboards and
reports that can be used to make business decisions. The data comes from plants,
suppliers, shippers, and other sources in various structured, semi-structured, and
unstructured formats.
Daisy decides to use a KQL Database because of its scalability, quick response times,
advanced analytics capabilities including time series analysis, geospatial functions, and
fast direct query mode in Power BI. Queries can be executed using Power BI and KQL to
compare between current and previous periods, quickly identify emerging problems, or
provide geo-spatial analytics of land and maritime routes.
Next steps
What is data warehousing in Microsoft Fabric?
Create a warehouse in Microsoft Fabric
Create a lakehouse in Microsoft Fabric
Introduction to Power BI datamarts
Create a KQL database
This article describes how to get started with Warehouse in Microsoft Fabric using the
Microsoft Fabric portal, including creation and consumption of the warehouse. You learn
how to create your warehouse from scratch or from a sample, along with other helpful
information to get you acquainted and proficient with the warehouse capabilities offered
through the Microsoft Fabric portal.
You can create your warehouse from the Create hub by selecting the Warehouse card
under the Data Warehousing section. When you select the card, an empty warehouse is
created for you to start creating objects in the warehouse or use a sample to get started
as previously mentioned.
Once initialized, you can load data into your warehouse. For more information about
getting data into a warehouse, see Ingesting data.
2. Provide the name for your sample warehouse and select Create.
3. The create action creates a new Warehouse and starts loading sample data into it.
The data loading takes a few minutes to complete.
4. On completion of loading sample data, the warehouse opens with data loaded into
tables and views to query.
1. Once you have created your warehouse, you can load sample data into the warehouse
from the Use sample database card.
3. On completion of loading sample data, the warehouse displays data loaded into
tables and views to query.
Sample scripts
SQL
/*************************************************
Get number of trips performed by each medallion
**************************************************/
SELECT
M.MedallionID
,M.MedallionCode
,COUNT(T.TripDistanceMiles) AS TotalTripCount
FROM
dbo.Trip AS T
JOIN
dbo.Medallion AS M
ON
T.MedallionID=M.MedallionID
GROUP BY
M.MedallionID
,M.MedallionCode
/****************************************************
How many passengers are being picked up on each trip?
*****************************************************/
SELECT
PassengerCount,
COUNT(*) AS CountOfTrips
FROM
dbo.Trip
WHERE
PassengerCount > 0
GROUP BY
PassengerCount
ORDER BY
PassengerCount
/*****************************************************************************
What is the distribution of trips by hour on working days (non-holiday weekdays)?
*****************************************************************************/
SELECT
ti.HourlyBucket,
COUNT(*) AS CountOfTrips
FROM dbo.Trip AS tr
INNER JOIN dbo.Date AS d
ON tr.DateID = d.DateID
INNER JOIN dbo.Time AS ti
ON tr.PickupTimeID = ti.TimeID
WHERE
d.IsWeekday = 1
AND d.IsHolidayUSA = 0
GROUP BY
ti.HourlyBucket
ORDER BY
ti.HourlyBucket
Next steps
Create tables in Warehouse
Create tables in the Warehouse in
Microsoft Fabric
Article • 05/23/2023
2. Instead of selecting New SQL query, you can select the dropdown arrow to see
Templates to create T-SQL objects.
3. Select Table, and an autogenerated CREATE TABLE script template appears in your
new SQL query window, as shown in the following image.
To learn more about supported table creation in Warehouse in Microsoft Fabric, see
Tables in data warehousing in Microsoft Fabric and Data types in Microsoft Fabric.
Next steps
Ingest data into your Warehouse using data pipelines
Ingest data into your Warehouse using
data pipelines
Article • 05/23/2023
Data pipelines offer an alternative to using the COPY command through a graphical user
interface. A data pipeline is a logical grouping of activities that together perform a data
ingestion task. Pipelines allow you to manage extract, transform, and load (ETL) activities
instead of managing each one individually.
In this tutorial, you'll create a new pipeline that loads sample data into a Warehouse in
Microsoft Fabric.
Note
Some features from Azure Data Factory are not available in Microsoft Fabric, but
the concepts are interchangeable. You can learn more about Azure Data Factory
and Pipelines on Pipelines and activities in Azure Data Factory and Azure Synapse
Analytics. For a quickstart, visit Quickstart: Create your first pipeline to copy data.
2. In the New pipeline dialog, provide a name for your new pipeline and select
Create.
3. You'll land in the pipeline canvas area, where you see three options to get started:
Add a pipeline activity, Copy data, and Choose a task to start.
Add pipeline activity: this option launches the pipeline editor, where you can
create new pipelines from scratch by using pipeline activities.
Copy data: this option launches a step-by-step assistant that helps you select
a data source, a destination, and configure data load options such as the
column mappings. On completion, it creates a new pipeline activity with a
Copy Data task already configured for you.
Choose a task to start: this option launches a set of predefined templates to
help get you started with pipelines based on different scenarios.
5. In the next page, you can select a dataset, the source file format, and preview the
selected dataset. Select the Bing COVID-19 dataset, the CSV format, and select
Next.
6. The next page, Data destinations, allows you to configure the type of the
destination dataset. We'll load data into a warehouse in our workspace, so select
the Warehouse tab, and the Data Warehouse option. Select Next.
7. Now it's time to pick the warehouse to load data into. Select your desired
warehouse in the dropdown box and select Next.
8. The last step to configure the destination is to provide a name to the destination
table and configure the column mappings. Here you can choose to load the data
to a new table or to an existing one, provide a schema and table names, change
column names, remove columns, or change their mappings. You can accept the
defaults, or adjust the settings to your preference.
9. The next page gives you the option to use staging, or provide advanced options
for the data copy operation (which uses the T-SQL COPY command). Review the
options without changing them and select Next.
10. The last page in the assistant offers a summary of the copy activity. Select the
option Start data transfer immediately and select Save + Run.
11. You are directed to the pipeline canvas area, where a new Copy Data activity is
already configured for you. The pipeline starts to run automatically. You can
monitor the status of your pipeline in the Output pane:
12. After a few seconds, your pipeline finishes successfully. Navigating back to your
warehouse, you can select your table to preview the data and confirm that the
copy operation concluded.
For more on data ingestion into your Warehouse in Microsoft Fabric, visit:
Next steps
Query the SQL Endpoint or Warehouse in Microsoft Fabric
Query the SQL Endpoint or Warehouse
in Microsoft Fabric
Article • 05/23/2023
Alternatively, you can use any of the following tools to connect to your SQL Endpoint
or Warehouse via T-SQL connection string. For more information, see Connectivity.
Download SQL Server Management Studio (SSMS).
Download Azure Data Studio .
Note
Review the T-SQL surface area for SQL Endpoint or Warehouse in Microsoft Fabric.
2. A new tab appears for you to create a visual query.
3. Drag and drop tables from the Object Explorer to the Visual query editor window to
create a query.
1. Add a SQL Endpoint or Warehouse from your current active workspace to the Object
Explorer using the + Warehouses action. When you select a SQL Endpoint or Warehouse
from the dialog, it gets added into the Object Explorer for referencing when
writing a SQL query or creating a Visual query.
2. You can reference the table from added databases using three-part naming. In the
following example, use the three-part name to refer to ContosoSalesTable in the
added database ContosoLakehouse .
SQL
SELECT *
FROM ContosoLakehouse.dbo.ContosoSalesTable AS Contoso
INNER JOIN Affiliation
ON Affiliation.AffiliationId = Contoso.RecordTypeID;
3. Using three-part naming to reference the databases/tables, you can join multiple
databases.
SQL
SELECT *
FROM ContosoLakehouse.dbo.ContosoSalesTable AS Contoso
INNER JOIN My_lakehouse.dbo.Affiliation
ON My_lakehouse.dbo.Affiliation.AffiliationId = Contoso.RecordTypeID;
4. For more efficient and longer queries, you can use aliases.
SQL
SELECT *
FROM ContosoLakehouse.dbo.ContosoSalesTable AS Contoso
INNER JOIN My_lakehouse.dbo.Affiliation as MyAffiliation
ON MyAffiliation.AffiliationId = Contoso.RecordTypeID;
5. Using three-part naming to reference the database and tables, you can insert data
from one database to another, as in the following example.
SQL
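--Illustrative example: the target must be a Warehouse table (Lakehouse SQL Endpoints are read-only).
--The warehouse and table names below are assumptions that follow the earlier snippets.
INSERT INTO MyWarehouse.dbo.Affiliation
SELECT *
FROM My_lakehouse.dbo.Affiliation;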
6. You can drag and drop tables from added databases to Visual query editor to
create a cross-database query.
3. Once the script is automatically generated, select the Run button to run the script
and see the results.
Note
At this time, there's limited T-SQL functionality. See T-SQL surface area for a list of
T-SQL commands that are currently not available.
Next steps
Create reports on data warehousing in Microsoft Fabric
Create reports on data warehousing in
Microsoft Fabric
Article • 05/23/2023
Microsoft Fabric lets you create reusable and default Power BI datasets to create reports
in various ways in Power BI. This article describes the various ways you can use your
Warehouse or SQL Endpoint, and their default Power BI datasets, to create reports.
For example, you can establish a live connection to a shared dataset in the Power BI
service and create many different reports from the same dataset. You can create a data
model in Power BI Desktop and publish to the Power BI service. Then, you and others
can create multiple reports in separate .pbix files from that common data model and
save them to different workspaces.
Advanced users can build reports from a warehouse using a composite model or using
the SQL connection string.
Reports that use the Warehouse or SQL Endpoint can be created in either of the
following two tools:
Power BI service
Power BI Desktop
If no tables have been added to the default Power BI dataset, the dialog first
automatically adds tables, prompting the user to confirm or manually select the tables
to include in the canonical default dataset, ensuring there's always data to report on.
With a default dataset that has tables, selecting New report opens a browser tab with the
report editing canvas and a new report built on the dataset. When you save your new
report you're prompted to choose a workspace, provided you have write permissions for
that workspace. If you don't have write permissions, or if you're a free user and the
dataset resides in a Premium capacity workspace, the new report is saved in your My
workspace.
For more information on how to create reports using the Power BI service, see Create
reports in the Power BI service.
For a tutorial with Power BI Desktop, see Get started with Power BI Desktop. For
advanced situations where you want to add more data or change the storage mode, see
use composite models in Power BI Desktop.
You can use integrated Data hub experience in Power BI Desktop to select your SQL
Endpoint or Warehouse to make a connection and build reports.
Alternatively, you can complete the following steps to connect to a warehouse in Power
BI Desktop:
1. Navigate to the warehouse settings in your workspace and copy the SQL
connection string. Or, right-click on the Warehouse or SQL Endpoint in your
workspace and select Copy SQL connection string.
2. Select the Warehouse (preview) connector from the Get data or connect to the
default dataset from Data hub.
3. Paste the SQL connection string into the connector dialog.
4. For authentication, select organizational account.
5. Authenticate using Azure Active Directory - MFA.
6. Select Connect.
7. Select the data items you want to include or not include in your dataset.
Next steps
Data modeling in the default Power BI dataset in Microsoft Fabric
Create reports in the Power BI service in Microsoft Fabric
Data warehouse tutorial introduction
Article • 05/23/2023
Microsoft Fabric provides a one-stop shop for all the analytical needs for every
enterprise. It covers the complete spectrum of services including data movement, data
lake, data engineering, data integration and data science, real time analytics, and
business intelligence. With Microsoft Fabric, there's no need to stitch together different
services from multiple vendors. Instead, the customer enjoys an end-to-end, highly
integrated, single comprehensive product that is easy to understand, onboard, create
and operate. No other product on the market offers the breadth, depth, and level of
integration that Microsoft Fabric offers. Additionally, Microsoft Purview is included by
default in every tenant to meet compliance and governance needs.
1. Sign into your Power BI online account, or if you don't have an account yet, sign up
for a free trial.
2. Enable Microsoft Fabric in your tenant.
In this tutorial, you take on the role of a Warehouse developer at the fictional Wide
World Importers company and complete the following steps in the Microsoft Fabric
portal to build and implement an end-to-end data warehouse solution:
Data sources - Microsoft Fabric makes it easy and quick to connect to Azure Data
Services, other cloud platforms, and on-premises data sources to ingest data from.
Ingestion - With 200+ native connectors as part of the Microsoft Fabric pipeline and
with drag and drop data transformation with dataflow, you can quickly build insights for
your organization. Shortcut is a new feature in Microsoft Fabric that provides a way to
connect to existing data without having to copy or move it. You can find more details
about the Shortcut feature later in this tutorial.
Transform and store - Microsoft Fabric standardizes on Delta Lake format, which means
all the engines of Microsoft Fabric can read and work on the same dataset stored in
OneLake - no need for data duplication. This storage allows you to build a data warehouse
or data mesh based on your organizational need. For transformation, you can choose
either low-code or no-code experience with pipelines/dataflows or use T-SQL for a code
first experience.
Consume - Data from the data warehouse can be consumed by Power BI, the industry
leading business intelligence tool, for reporting and visualization. Each data warehouse
comes with a built-in TDS/SQL endpoint for easily connecting to and querying data from
other reporting tools, when needed. When a data warehouse is created, a secondary
item, called a default dataset, is generated at the same time with the same name. You
can use the default dataset to start visualizing data with just a couple of steps.
Sample data
For sample data, we use the Wide World Importers (WWI) sample database. For our data
warehouse end-to-end scenario, we have generated sufficient data for a sneak peek into
the scale and performance capabilities of the Microsoft Fabric platform.
Wide World Importers (WWI) is a wholesale novelty goods importer and distributor
operating from the San Francisco Bay area. As a wholesaler, WWI's customers are mostly
companies who resell to individuals. WWI sells to retail customers across the United
States including specialty stores, supermarkets, computing stores, tourist attraction
shops, and some individuals. WWI also sells to other wholesalers via a network of agents
who promote the products on WWI's behalf. To learn more about their company profile
and operation, see Wide World Importers sample databases for Microsoft SQL.
Typically, you would bring data from transactional systems (or line of business
applications) into a data lake or data warehouse staging area. However, for this tutorial,
we use the dimensional model provided by WWI as our initial data source. We use it as
the source to ingest the data into a data warehouse and transform it through T-SQL.
Data model
While the WWI dimensional model contains multiple fact tables, for this tutorial we
focus on the Sale Fact table and its related dimensions only, as follows, to demonstrate
this end-to-end data warehouse scenario:
Next steps
Tutorial: Create a Microsoft Fabric workspace
Tutorial: Create a Microsoft Fabric
workspace
Article • 05/23/2023
Before you can create a warehouse, you need to create a workspace where you'll build
out the remainder of the tutorial.
Create a workspace
The workspace contains all the items needed for data warehousing, including: Data
Factory pipelines, the data warehouse, Power BI datasets, and reports.
1. Sign in to Power BI .
Next steps
Tutorial: Create a Microsoft Fabric data warehouse
Tutorial: Create a Warehouse in
Microsoft Fabric
Article • 05/23/2023
Now that you have a workspace, you can create your first Warehouse in Microsoft
Fabric.
2. Search for the workspace you created in Tutorial: Create a Microsoft Fabric
workspace by typing in the search textbox at the top and selecting your workspace
to open it.
3. Select the + New button to display a full list of available items. From the list of
objects to create, choose Warehouse (Preview) to create a new Warehouse in
Microsoft Fabric.
4. On the New warehouse dialog, enter WideWorldImporters as the name.
6. Select Create.
Next steps
Tutorial: Ingest data into a Microsoft Fabric data warehouse
Tutorial: Ingest data into a Warehouse in
Microsoft Fabric
Article • 05/23/2023
Now that you have created a Warehouse in Microsoft Fabric, you can ingest data into
that warehouse.
Ingest data
1. From the Build a warehouse landing page, select Data Warehouse Tutorial in the
navigation menu to return to the workspace item list.
2. In the upper left corner, select New > Show all to display a full list of available
items.
3. In the Data Factory section, select Data pipeline.
4. On the New pipeline dialog, enter Load Customer Data as the name.
5. Select Create.
6. Select Add pipeline activity from the Start building your data pipeline landing
page.
8. If necessary, select the newly created Copy data activity from the design canvas
and follow the next steps to configure it.
11. Next to the Connection box, select New to create a new connection.
12. On the New connection page, select Azure Blob Storage from the list of
connection options.
16. Change the remaining settings on the Source page of the copy activity as follows,
to reach the .parquet files in
https://azuresynapsestorage.blob.core.windows.net/sampledata/WideWorldImportersDW/parquet/full/dimension_city/*.parquet :
i. Container: sampledata
17. Select Preview data next to the File path setting to ensure there are no errors.
18. On the Destination page, select Workspace for the Data store type.
19. Select Data Warehouse for the Workspace data store type.
20. In the Data Warehouse drop down, select WideWorldImporters from the list.
21. Next to the Table configuration setting, check the box under the dropdown list
labeled Edit. The dropdown changes to two text boxes.
22. In the first box next to the Table setting, enter dbo .
23. In the second box next to the Table setting, enter dimension_customer .
27. Select Save and run from the dialog box. The pipeline to load the
dimension_customer table will start.
28. Monitor the copy activity's progress on the Output page and wait for it to
complete.
Next steps
Tutorial: Create tables in a data warehouse
Tutorial: Create tables in a data
warehouse
Article • 05/23/2023
Learn how to create tables in the data warehouse you created in a previous part of the
tutorial.
Create a table
1. Select Workspaces in the navigation menu.
2. Select the workspace created in Tutorial: Create a Microsoft Fabric workspace,
such as Data Warehouse Tutorial.
3. From the item list, select WideWorldImporters with the type of Warehouse.
SQL
/*
1. Drop the dimension_city table if it already exists.
2. Create the dimension_city table.
3. Drop the fact_sale table if it already exists.
4. Create the fact_sale table.
*/
--dimension_city
DROP TABLE IF EXISTS [dbo].[dimension_city];
CREATE TABLE [dbo].[dimension_city]
(
[CityKey] [int] NULL,
[WWICityID] [int] NULL,
[City] [varchar](8000) NULL,
[StateProvince] [varchar](8000) NULL,
[Country] [varchar](8000) NULL,
[Continent] [varchar](8000) NULL,
[SalesTerritory] [varchar](8000) NULL,
[Region] [varchar](8000) NULL,
[Subregion] [varchar](8000) NULL,
[Location] [varchar](8000) NULL,
[LatestRecordedPopulation] [bigint] NULL,
[ValidFrom] [datetime2](6) NULL,
[ValidTo] [datetime2](6) NULL,
[LineageKey] [int] NULL
);
--fact_sale
CREATE TABLE [dbo].[fact_sale]
(
[SaleKey] [bigint] NULL,
[CityKey] [int] NULL,
[CustomerKey] [int] NULL,
[BillToCustomerKey] [int] NULL,
[StockItemKey] [int] NULL,
[InvoiceDateKey] [datetime2](6) NULL,
[DeliveryDateKey] [datetime2](6) NULL,
[SalespersonKey] [int] NULL,
[WWIInvoiceID] [int] NULL,
[Description] [varchar](8000) NULL,
[Package] [varchar](8000) NULL,
[Quantity] [int] NULL,
[UnitPrice] [decimal](18, 2) NULL,
[TaxRate] [decimal](18, 3) NULL,
[TotalExcludingTax] [decimal](29, 2) NULL,
[TaxAmount] [decimal](38, 6) NULL,
[Profit] [decimal](18, 2) NULL,
[TotalIncludingTax] [decimal](38, 6) NULL,
[TotalDryItems] [int] NULL,
[TotalChillerItems] [int] NULL,
[LineageKey] [int] NULL,
[Month] [int] NULL,
[Year] [int] NULL,
[Quarter] [int] NULL
);
6. Select Run to execute the query.
7. To save this query for reference later, right-click on the query tab just above the
editor and select Rename.
9. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
10. Validate the table was created successfully by selecting the refresh button on the
ribbon.
11. In the Object explorer, verify that you can see the newly created Create Tables
query, fact_sale table, and dimension_city table.
Next steps
Tutorial: Load data using T-SQL
Tutorial: Load data using T-SQL
Article • 05/23/2023
Now that you know how to build a data warehouse, load a table, and generate a report,
it's time to extend the solution by exploring other methods for loading data.
SQL
--Copy data from the public Azure storage account to the dbo.fact_sale table.
COPY INTO [dbo].[fact_sale]
FROM 'https://azuresynapsestorage.blob.core.windows.net/sampledata/WideWorldImportersDW/tables/fact_sale.parquet'
WITH (FILE_TYPE = 'PARQUET');
3. Select Run to execute the query. The query takes between one and four minutes to
execute.
4. After the query is completed, review the messages to see the rows affected, which
indicate the number of rows that were loaded into the dimension_city and fact_sale
tables respectively.
5. Load the data preview to validate the data loaded successfully by selecting on the
fact_sale table in the Explorer.
6. Rename the query for reference later. Right-click on SQL query 1 in the Explorer
and select Rename.
7. Type Load Tables to change the name of the query.
8. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
Next steps
Tutorial: Transform data using a stored procedure
Tutorial: Clone table using T-SQL in
Microsoft Fabric
Article • 10/03/2023
This tutorial guides you through creating a table clone in Warehouse in Microsoft Fabric,
using the CREATE TABLE AS CLONE OF T-SQL syntax.
2. In the query editor, paste the following code to create clones of the
dbo.dimension_city and dbo.fact_sale tables.
SQL
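--Illustrative statements reconstructed from the step description; see CREATE TABLE AS CLONE OF for the exact syntax.
CREATE TABLE [dbo].[dimension_city1] AS CLONE OF [dbo].[dimension_city];
CREATE TABLE [dbo].[fact_sale1] AS CLONE OF [dbo].[fact_sale];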
3. Select Run to execute the query. The query takes a few seconds to execute.
After the query is completed, the table clones dimension_city1 and fact_sale1
have been created.
4. Load the data preview to validate the data loaded successfully by selecting on the
dimension_city1 table in the Explorer.
5. Rename the query for reference later. Right-click on SQL query 3 in the Explorer
and select Rename.
7. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
SQL
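--Reconstructed from context: the dbo1 schema referenced in the next step must exist before cloning into it.
CREATE SCHEMA [dbo1];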
1. In the query editor, paste the following code to create clones of the
dbo.dimension_city and dbo.fact_sale tables in the dbo1 schema.
SQL
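--Illustrative statements reconstructed from the step description: clone the tables into the dbo1 schema.
CREATE TABLE [dbo1].[dimension_city1] AS CLONE OF [dbo].[dimension_city];
CREATE TABLE [dbo1].[fact_sale1] AS CLONE OF [dbo].[fact_sale];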
2. Select Run to execute the query. The query takes a few seconds to execute.
After the query is completed, clones dimension_city1 and fact_sale1 are created
in the dbo1 schema.
3. Load the data preview to validate the data loaded successfully by selecting on the
dimension_city1 table under dbo1 schema in the Explorer.
4. Rename the query for reference later. Right-click on SQL query 2 in the Explorer
and select Rename.
5. Type Clone Table in another schema to change the name of the query.
6. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
Related content
Clone table in Microsoft Fabric
CREATE TABLE AS CLONE OF
Next step
Tutorial: Transform data using a stored procedure
Learn how to create and save a new stored procedure to transform data.
Transform data
1. From the Home tab of the ribbon, select New SQL query.
2. In the query editor, paste the following code to create the stored procedure
dbo.populate_aggregate_sale_by_city . This stored procedure will create and load the
dbo.aggregate_sale_by_date_city table in a later step.
SQL
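--Sketch only: the tutorial's stored procedure aggregates sales by date and city; the exact column list may differ.
CREATE OR ALTER PROCEDURE [dbo].[populate_aggregate_sale_by_city]
AS
BEGIN
    --Drop the aggregate table if it exists, then rebuild it from fact_sale joined to dimension_city.
    DROP TABLE IF EXISTS [dbo].[aggregate_sale_by_date_city];

    CREATE TABLE [dbo].[aggregate_sale_by_date_city]
    AS
    SELECT
        FS.[InvoiceDateKey] AS [Date],
        DC.[City],
        DC.[StateProvince],
        DC.[SalesTerritory],
        SUM(FS.[TaxAmount]) AS [SumOfTaxAmount],
        SUM(FS.[Profit]) AS [SumOfProfit],
        SUM(FS.[TotalIncludingTax]) AS [SumOfTotalIncludingTax]
    FROM [dbo].[fact_sale] AS FS
    INNER JOIN [dbo].[dimension_city] AS DC
        ON FS.[CityKey] = DC.[CityKey]
    GROUP BY
        FS.[InvoiceDateKey],
        DC.[City],
        DC.[StateProvince],
        DC.[SalesTerritory];
END;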
3. To save this query for reference later, right-click on the query tab just above the
editor and select Rename.
5. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
9. From the Home tab of the ribbon, select New SQL query.
10. In the query editor, paste the following code. This T-SQL executes
dbo.populate_aggregate_sale_by_city to create the
dbo.aggregate_sale_by_date_city table.
SQL
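--Execute the stored procedure to build and load the aggregate table.
EXEC [dbo].[populate_aggregate_sale_by_city];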
11. To save this query for reference later, right-click on the query tab just above the
editor and select Rename.
12. Type Run Create Aggregate Procedure to change the name of the query.
13. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
15. Select the refresh button on the ribbon. The query takes between two and three
minutes to execute.
16. In the Object explorer, load the data preview to validate the data loaded
successfully by selecting the aggregate_sale_by_date_city table in the Explorer.
Next steps
Tutorial: Create a query with the visual query builder
Tutorial: Create a query with the visual
query builder
Article • 05/23/2023
Create and save a query with the visual query builder in the Microsoft Fabric portal.
2. Drag the fact_sale table from the Explorer to the query design pane.
3. Limit the dataset size by selecting Reduce rows > Keep top rows from the
transformations ribbon.
4. In the Keep top rows dialog, enter 10000 .
5. Select OK.
6. Drag the dimension_city table from the explorer to the query design pane.
7. From the transformations ribbon, select the dropdown next to Combine and select
Merge queries as new.
c. Select the CityKey field in the dimension_city table by selecting on the column
name in the header row to indicate the join column.
d. Select the CityKey field in the fact_sale table by selecting on the column
name in the header row to indicate the join column.
10. With the Merge step selected, select the Expand button next to fact_sale on the
header of the data grid then select the columns TaxAmount , Profit , and
TotalIncludingTax .
11. Select OK.
a. Change to Advanced.
b. Group by (if necessary, select Add grouping to add more group by columns):
i. Country
ii. StateProvince
iii. City
c. New column name (if necessary, select Add aggregation to add more
aggregate columns and operations):
i. SumOfTaxAmount
i. Choose Operation of Sum and Column of TaxAmount .
ii. SumOfProfit
i. Choose Operation of Sum and Column of Profit .
iii. SumOfTotalIncludingTax
i. Choose Operation of Sum and Column of TotalIncludingTax .
16. Type Sales Summary to change the name of the query.
17. Press Enter on the keyboard or select anywhere outside the tab to save the
change.
Next steps
Tutorial: Analyze data with a notebook
Tutorial: Analyze data with a notebook
Article • 05/23/2023
In this tutorial, learn about how you can save your data once and then use it with many
other services. Shortcuts can also be created to data stored in Azure Data Lake Storage
and S3 to enable you to directly access delta tables from external systems.
Create a lakehouse
First, we create a new lakehouse. To create a new lakehouse in your Microsoft Fabric
workspace:
4. The new lakehouse loads and the Explorer view opens up, with the Get data in
your lakehouse menu. Under Load data in your lakehouse, select the New
shortcut button.
5. In the New shortcut window, select the button for Microsoft OneLake.
6. In the Select a data source type window, scroll through the list until you find the
Warehouse named WideWorldImporters you created previously. Select it, then
select Next.
7. In the OneLake object browser, expand Tables, expand the dbo schema, and then
select the radio button beside dimension_customer . Select the Create button.
8. If you see a folder called Unidentified under Tables, select the Refresh icon in the
horizontal menu bar.
9. Select the dimension_customer in the Table list to preview the data. Notice that the
lakehouse is showing the data from the dimension_customer table from the
Warehouse!
10. Next, create a new notebook to query the dimension_customer table. In the Home
ribbon, select the drop down for Open notebook and choose New notebook.
11. Select, then drag the dimension_customer from the Tables list into the open
notebook cell. You can see a PySpark query has been written for you to query all
the data from ShortcutExercise.dimension_customer . This notebook experience is
similar to Visual Studio Code Jupyter notebook experience. You can also open the
notebook in VS Code.
12. In the Home ribbon, select the Run all button. Once the query is completed, you
will see you can easily use PySpark to query the Warehouse tables!
Next steps
Tutorial: Create cross-warehouse queries with the SQL query editor
Tutorial: Create cross-warehouse queries
with the SQL query editor
Article • 06/05/2023
In this tutorial, learn about how you can easily create and execute T-SQL queries with
the SQL query editor across multiple warehouses, including joining together data from a
SQL Endpoint and a Warehouse in Microsoft Fabric.
2. In the query editor, copy and paste the following T-SQL code.
SQL
SELECT Sales.StockItemKey,
Sales.Description,
SUM(CAST(Sales.Quantity AS int)) AS SoldQuantity,
c.Customer
FROM [dbo].[fact_sale] AS Sales,
[ShortcutExercise].[dbo].[dimension_customer] AS c
WHERE Sales.CustomerKey = c.CustomerKey
GROUP BY Sales.StockItemKey, Sales.Description, c.Customer;
3. Select the Run button to execute the query. After the query is completed, you will
see the results.
4. Rename the query for reference later. Right-click on SQL query 1 in the Explorer
and select Rename.
Next steps
Tutorial: Create a Power BI report
Tutorial: Create Power BI reports
Article • 05/23/2023
Create reports
1. Select the Model view from the options in the bottom left corner, just outside the
canvas.
2. From the fact_sale table, drag the CityKey field and drop it onto the CityKey
field in the dimension_city table to create a relationship.
4. Select Confirm.
a. On the Data pane, expand fact_sale and check the box next to Profit. This
creates a column chart and adds the field to the Y-axis.
b. On the Data pane, expand dimension_city and check the box next to
SalesTerritory. This adds the field to the X-axis.
c. Reposition and resize the column chart to take up the top left quarter of the
canvas by dragging the anchor points on the corners of the visual.
7. Select anywhere on the blank canvas (or press the Esc key) so the column chart
visual is no longer selected.
a. On the Visualizations pane, select the ArcGIS Maps for Power BI visual.
b. From the Data pane, drag StateProvince from the dimension_city table to the
Location bucket on the Visualizations pane.
c. From the Data pane, drag Profit from the fact_sale table to the Size bucket on
the Visualizations pane.
d. If necessary, reposition and resize the map to take up the bottom left quarter of
the canvas by dragging the anchor points on the corners of the visual.
9. Select anywhere on the blank canvas (or press the Esc key) so the map visual is no
longer selected.
b. From the Data pane, check the box next to SalesTerritory on the
dimension_city table.
c. From the Data pane, check the box next to StateProvince on the
dimension_city table.
d. From the Data pane, check the box next to Profit on the fact_sale table.
e. From the Data pane, check the box next to TotalExcludingTax on the fact_sale
table.
f. Reposition and resize the column chart to take up the right half of the canvas by
dragging the anchor points on the corners of the visual.
Next steps
Tutorial: Build a report from the OneLake data hub
Tutorial: Build a report from the
OneLake data hub
Article • 05/23/2023
Learn how to build a report with the data you ingested into your Warehouse in the last
step.
Build a report
1. Select the OneLake data hub in the navigation menu.
2. From the item list, select WideWorldImporters with the type of Dataset (default).
3. In the Visualize this data section, select Create a report > Auto-create. A report is
generated from the dimension_customer table that was loaded in the previous
section.
6. Enter Customer Quick Summary in the name box. In the Save your report dialogue,
select Save.
7. Your tutorial is complete!
Next steps
Tutorial: Clean up tutorial resources
Tutorial: Clean up tutorial resources
Article • 05/23/2023
You can delete individual reports, pipelines, warehouses, and other items or remove the
entire workspace. In this tutorial, you will clean up the workspace, individual reports,
pipelines, warehouses, and other items you created as part of the tutorial.
Delete a workspace
1. Select Data Warehouse Tutorial in the navigation menu to return to the workspace
item list.
4. Select Delete on the warning to remove the workspace and all its contents.
Next steps
What is data warehousing in Microsoft Fabric?
Connectivity to data warehousing in
Microsoft Fabric
Article • 06/08/2023
The SQL connection string requires TCP port 1433 to be open. TCP 1433 is the standard
SQL Server port number. The SQL connection string also respects the Warehouse or
Lakehouse SQL Endpoint security model for data access. Data can be obtained for all
objects to which a user has access.
1. Navigate to your workspace, select the Warehouse, and select More options.
2. Select Copy SQL connection string to copy the connection string to your
clipboard.
1. When you open SSMS, the Connect to Server window appears. If already open,
you can connect manually by selecting Object Explorer > Connect > Database
Engine.
2. Once the Connect to Server window is open, paste the connection string copied
from the previous section of this article into the Server name box. Select Connect
and proceed with the appropriate credentials for authentication. Remember that
only Azure Active Directory - MFA authentication is supported.
When connecting via SSMS (or ADS), you see both a SQL Endpoint and Warehouse
listed as warehouses, and it's difficult to differentiate between the two item types and
their functionality. For this reason, we strongly encourage you to adopt a naming
convention that allows you to easily distinguish between the two item types when you
work in tools outside of the Microsoft Fabric portal experience.
When establishing connectivity via JDBC, check for the following dependencies:
1. Add artifacts: choose Add Artifact and add the following four dependencies, then
select Download/Update to load all dependencies.
XML
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>msal4j</artifactId>
<version>1.13.3</version>
</dependency>
<dependency>
<groupId>com.microsoft.sqlserver</groupId>
<artifactId>mssql-jdbc_auth</artifactId>
<version>11.2.1.x86</version>
</dependency>
<dependency>
<groupId>com.microsoft.sqlserver</groupId>
<artifactId>mssql-jdbc</artifactId>
<version>12.1.0.jre11-preview</version>
</dependency>
<dependency>
<groupId>com.microsoft.aad</groupId>
<artifactId>adal</artifactId>
<version>4.2.2</version>
</dependency>
The dbt data platform-specific adapter plugins allow users to connect to the data store
of their choice. To connect to Synapse Data Warehouse in Microsoft Fabric from dbt, use
the dbt-fabric adapter. Similarly, the Azure Synapse Analytics dedicated SQL pool has its
own adapter, dbt-synapse.
Both adapters support Azure Active Directory (Azure AD) authentication and allow
developers to use az cli authentication. However, SQL authentication is not supported
for dbt-fabric.
The dbt Fabric DW adapter uses the pyodbc library to establish connectivity with the
Warehouse. The pyodbc library is an ODBC implementation in the Python language that
uses the Python Database API Specification v2.0. The pyodbc library directly passes the
connection string to the database driver through SQLDriverConnect in the msodbc
connection structure to Microsoft Fabric, using a TDS (Tabular Data Streaming) proxy
service.
For more information, see the Microsoft Fabric Synapse Data Warehouse dbt adapter
setup and Microsoft Fabric Synapse Data Warehouse dbt adapter configuration .
Custom applications
In Microsoft Fabric, a Warehouse and a Lakehouse SQL Endpoint provide a SQL
connection string. Data is accessible from a vast ecosystem of SQL tooling, provided
they can authenticate using Azure AD. For more information, see Connection libraries
for Microsoft SQL Database.
Next steps
Create a warehouse in Microsoft Fabric
Better together: the lakehouse and warehouse in Microsoft Fabric
Better together: the lakehouse and
warehouse
Article • 05/23/2023
This article explains the data warehousing experience with the SQL Endpoint of the
Lakehouse, and scenarios for use of the Lakehouse in data warehousing.
The SQL Endpoint enables you to query data in the Lakehouse using T-SQL language
and TDS protocol. Every Lakehouse has one SQL Endpoint, and each workspace can
have more than one Lakehouse. The number of SQL Endpoints in a workspace matches
the number of Lakehouse items.
The SQL Endpoint is automatically generated for every Lakehouse and exposes
Delta tables from the Lakehouse as SQL tables that can be queried using the T-SQL
language.
Every delta table from a Lakehouse is represented as one table. Data should be in
delta format.
The default Power BI dataset is created for every SQL Endpoint and it follows the
naming convention of the Lakehouse objects.
There's no need to create a SQL Endpoint in Microsoft Fabric. Microsoft Fabric users
can't create a SQL Endpoint in a workspace. A SQL Endpoint is automatically created for
every Lakehouse. To get a SQL Endpoint, create a lakehouse and a SQL Endpoint will be
automatically created for the Lakehouse.
Note
Behind the scenes, the SQL Endpoint is using the same engine as the Warehouse to
serve high performance, low latency SQL queries.
Automatic Metadata Discovery
A seamless process reads the delta logs and the files folder, and ensures SQL
metadata for tables, such as statistics, is always up to date. There's no user action
needed, and no need to import, copy data, or set up infrastructure. For more
information, see Automatically generated schema in the SQL Endpoint.
The Lakehouse, with its SQL Endpoint, powered by the Warehouse, can simplify the
traditional decision tree of batch, streaming, or lambda architecture patterns. Together
with a warehouse, the lakehouse enables many additive analytics scenarios. This section
explores how to leverage a Lakehouse together with a Warehouse for a best of breed
analytics strategy.
You can use OneLake shortcuts to reference gold folders in external Azure Data Lake
storage accounts that are managed by Synapse Spark or Azure Databricks engines.
Warehouses can also be added as subject area or domain oriented solutions for specific
subject matter that may have bespoke analytics requirements.
If you choose to keep your data in Fabric, it will always be open and accessible through
APIs, Delta format, and of course T-SQL.
Data in a Microsoft Fabric Lakehouse is physically stored in OneLake with the following
folder structure:
The /Files folder contains raw and unconsolidated (bronze) files that should be
processed by data engineers before they're analyzed. The files might be in various
formats such as CSV, Parquet, different types of images, etc.
The /Tables folder contains refined and consolidated (gold) data that is ready for
business analysis. The consolidated data is in Delta Lake format.
A SQL Endpoint can read data in the /tables folder within OneLake. Analysis is as
simple as querying the SQL Endpoint of the Lakehouse. Together with the Warehouse,
you also get cross-database queries and the ability to seamlessly switch from read-only
queries to building additional business logic on top of your OneLake data with Synapse
Data Warehouse.
In Fabric, you can leverage Spark Streaming or Data Engineering to curate your data.
You can use the Lakehouse SQL Endpoint to validate data quality and for existing T-SQL
processes. This can be done in a medallion architecture or within multiple layers of your
Lakehouse, serving bronze, silver, gold, or staging, curated, and refined data. You can
customize the folders and tables created through Spark to meet your data engineering
and business requirements. When ready, you can then leverage a Warehouse to serve all
of your downstream business intelligence applications and other analytics use cases,
without copying data, using Views or refining data using CREATE TABLE AS SELECT
(CTAS), stored procedures, and other DML / DDL commands.
Any folder referenced using a shortcut can be analyzed from a SQL Endpoint and a SQL
table is created for the referenced dataset. The SQL table can be used to expose data in
externally managed data lakes and enable analytics on them.
This shortcut acts as a virtual warehouse that can be leveraged from a warehouse for
additional downstream analytics requirements, or queried directly.
Use the following steps to analyze data in external data lake storage accounts:
1. Create a shortcut that references a folder in Azure Data Lake storage or Amazon S3
account. Once you enter connection details and credentials, a shortcut is shown in
the Lakehouse.
2. Switch to the SQL Endpoint of the Lakehouse and find a SQL table that has a name
that matches the shortcut name. This SQL table references the folder in ADLS/S3
folder.
3. Query the SQL table that references data in ADLS/S3. The table can be used as any
other table in the SQL Endpoint. You can join tables that reference data in different
storage accounts.
Note
If the SQL table is not immediately shown in the SQL Endpoint, you might need to
wait a few minutes. The SQL table that references data in an external storage account
is created with a delay.
Delta Lake data in the lake can be partitioned by one or more columns. This allows you to
store historical data logically separated in a format that allows compute engines to read
the data as needed with performant filtering, versus reading the entire directory and all
folders and files contained within.
Partitioned data enables faster access if the queries are filtering on the predicates that
compare predicate columns with a value.
A SQL Endpoint can easily read this type of data with no configuration required. For
example, you can use any application to archive data into a data lake, including SQL
Server 2022 or Azure SQL Managed Instance. After you partition data and land it in a
lake for archival purposes with external tables, a SQL Endpoint can read partitioned
Delta Lake tables as SQL tables and allow your organization to analyze them. This
reduces the total cost of ownership, reduces data duplication, and lights up big data,
AI, and other analytics scenarios.
A SQL Endpoint enables you to leave the data in place and still analyze data in the
Warehouse or Lakehouse, even in other Microsoft Fabric workspaces, via a seamless
virtualization. Every Microsoft Fabric Lakehouse stores data in OneLake.
Every Microsoft Fabric Warehouse stores table data in OneLake. If a table is append-
only, the table data is exposed as Delta Lake datasets in OneLake. Shortcuts enable you
to reference folders in any OneLake where the Warehouse tables are exposed.
A Lakehouse SQL Endpoint can enable easy sharing of data between departments and
users, where a user can bring their own capacity and warehouse. Workspaces organize
departments, business units, or analytical domains. Using shortcuts, users can find any
Warehouse or Lakehouse's data. Users can instantly perform their own customized
analytics from the same shared data. In addition to helping with departmental
chargebacks and usage allocation, this is a zero-copy version of the data as well.
The SQL Endpoint enables querying of any table and easy sharing. The added controls
of workspace roles and security roles can be further layered to meet additional
business requirements.
Note
If the SQL table is not immediately shown in the SQL Endpoint, you might need to
wait a few minutes. The SQL table that references data in another workspace is
created with a delay.
Delta Lake data sets can also be partitioned by columns. Partitioned data sets enable
faster data access if the queries filter data using predicates that compare the
partitioning columns with a value.
A SQL Endpoint can represent partitioned Delta Lake data sets as SQL tables and enable
you to analyze them.
Next steps
What is a lakehouse?
Create a lakehouse with OneLake
Understand default Power BI datasets
Load data into the lakehouse
How to copy data using Copy activity in Data pipeline
Tutorial: Move data into lakehouse via Copy assistant
Connectivity
Warehouse of the lakehouse
Query the Warehouse
Create a sample Warehouse in Microsoft
Fabric
Article • 09/29/2023
This article describes how to get started with a sample Warehouse using the Microsoft
Fabric portal, including creation and consumption of the warehouse.
2. Provide the name for your sample warehouse and select Create.
3. The create action creates a new Warehouse and starts loading sample data into it.
The data loading takes a few seconds to complete.
4. On completion of loading sample data, the warehouse opens with data loaded into
tables and views to query.
1. Once you have created your warehouse, you can load sample data into the warehouse
from the Use sample database card.
3. On completion of loading sample data, the warehouse displays data loaded into
tables and views to query.
Sample scripts
SQL
/*************************************************
Get number of trips performed by each medallion
**************************************************/
SELECT
M.MedallionID
,M.MedallionCode
,COUNT(T.TripDistanceMiles) AS TotalTripCount
FROM
dbo.Trip AS T
JOIN
dbo.Medallion AS M
ON
T.MedallionID=M.MedallionID
GROUP BY
M.MedallionID
,M.MedallionCode
/****************************************************
How many passengers are being picked up on each trip?
*****************************************************/
SELECT
PassengerCount,
COUNT(*) AS CountOfTrips
FROM
dbo.Trip
WHERE
PassengerCount > 0
GROUP BY
PassengerCount
ORDER BY
PassengerCount
/****************************************************************************
What is the distribution of trips by hour on working days (non-holiday weekdays)?
*****************************************************************************/
SELECT
ti.HourlyBucket,
COUNT(*) AS CountOfTrips
FROM dbo.Trip AS tr
INNER JOIN dbo.Date AS d
ON tr.DateID = d.DateID
INNER JOIN dbo.Time AS ti
ON tr.PickupTimeID = ti.TimeID
WHERE
d.IsWeekday = 1
AND d.IsHolidayUSA = 0
GROUP BY
ti.HourlyBucket
ORDER BY
ti.HourlyBucket
Next steps
Query warehouse
Warehouse settings and context menus
) Important
Included in this document are some specific articles devoted to guidelines that apply
only during this Preview period.
During this preview, execute your query several times and focus on the
performance of later executions.
Metrics for monitoring performance
Currently, the Monitoring Hub does not include Warehouse. If you choose the Data
Warehouse experience, you will not be able to access the Monitoring Hub from the left
nav menu.
Fabric administrators will be able to access the Capacity Utilization and Metrics report
for up-to-date information tracking the utilization of capacity that includes Warehouse.
Statistics
The Warehouse uses a query engine to create an execution plan for a given SQL query.
When you submit a query, the query optimizer tries to enumerate all possible plans and
choose the most efficient candidate. To determine which plan would require the least
overhead, the engine needs to be able to evaluate the amount of work or rows that
might be processed by each operator. Then, based on each plan's cost, it chooses the
one with the least amount of estimated work. Statistics are objects that contain relevant
information about your data, to allow the query optimizer to estimate these costs.
For more information about statistics and how you can augment the automatically created statistics, see Statistics in Fabric data warehousing.
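For example, a minimal sketch (assuming the dbo.Trip table from the sample warehouse as the target) of creating and then refreshing single-column statistics manually:
SQL
-- Hypothetical statistics name and column; adjust to your own tables.
CREATE STATISTICS TripDistanceMiles_stats
ON dbo.Trip (TripDistanceMiles) WITH FULLSCAN;

-- Refresh the statistics after a large data change.
UPDATE STATISTICS dbo.Trip (TripDistanceMiles_stats);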
To help determine which option is best for you and to review some data ingestion best
practices, review Ingest data.
SQL
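-- Illustrative sketch of the singleton ("trickle") INSERT pattern discussed below;
-- the table and column names are hypothetical.
INSERT INTO dbo.SalesStaging (SaleId, SaleAmount)
VALUES (1, 20.00);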
For guidance on how to handle these trickle load scenarios, see Best practices for
ingesting data.
Consider using CTAS (Transact-SQL) to write the data you want to keep in a table rather
than using DELETE. If a CTAS takes the same amount of time, it's safer to run since it has
minimal transaction logging and can be canceled quickly if needed.
Use integer-based data types if possible. SORT, JOIN, and GROUP BY operations
complete faster on integers than on character data.
For supported data types and more information, see data types.
Next steps
Query the SQL Endpoint or Warehouse in Microsoft Fabric
Limitations
Troubleshoot the Warehouse
Data types
T-SQL surface area
Tables in data warehouse
Caching in Fabric data warehousing
This article details key concepts for designing tables in Microsoft Fabric.
In Warehouse, tables are database objects that contain all the transactional data.
) Important
Dimension tables contain attribute data that might change but usually changes
infrequently. For example, a customer's name and address are stored in a
dimension table and updated only when the customer's profile changes. To
minimize the size of a large fact table, the customer's name and address don't
need to be in every row of a fact table. Instead, the fact table and the dimension
table can share a customer ID. A query can join the two tables to associate a
customer's profile and transactions.
Integration tables provide a place for integrating or staging data. For example,
you can load data to a staging table, perform transformations on the data in
staging, and then insert the data into a production table.
A table stores data in OneLake as part of the Warehouse. The table and the data persist whether or not a session is open.
Create a table
For Warehouse, you can create a table as a new empty table. You can also create and
populate a table with the results of a select statement. The following are the T-SQL
commands for creating a table.
CREATE TABLE: Creates an empty table by defining all the table columns and options.
CREATE TABLE AS SELECT: Populates a new table with the results of a select statement. The table columns and data types are based on the select statement results. To import data, this statement can select from an external table.
SQL
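-- A minimal sketch of creating an empty table; the column names and types are illustrative.
CREATE TABLE dbo.MyTable
(
    Id INT NOT NULL,
    ProductName VARCHAR(100),
    SaleDate DATE
);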
Schema names
Warehouse supports the creation of custom schemas. Like in SQL Server, schemas are a
good way to group together objects that are used in a similar fashion. The following
code creates a user-defined schema called wwi .
SQL
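-- Creates the user-defined schema referenced above.
CREATE SCHEMA wwi;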
Data types
Microsoft Fabric supports the most commonly used T-SQL data types.
For more about data types, see Data types in Microsoft Fabric.
When you create a table in Warehouse, review the data types reference in CREATE
TABLE (Transact-SQL).
For a guide to create a table in Warehouse, see Create tables.
Collation
Currently, Latin1_General_100_BIN2_UTF8 is the default and only supported collation for
both tables and metadata.
Statistics
The query optimizer uses column-level statistics when it creates the plan for executing a
query. To improve query performance, it's important to have statistics on individual
columns, especially columns used in query joins. Warehouse supports automatic
creation of statistics.
If data is coming from multiple data stores, you can port the data into the data
warehouse and store it in an integration table. Once data is in the integration table, you
can use the power of data warehouse to implement transformation operations. Once
the data is prepared, you can insert it into production tables.
Limitations
Warehouse supports many, but not all, of the table features offered by other databases.
The following list shows some of the table features that aren't currently supported.
During preview, this list is subject to change.
Computed columns
Indexed views
Sequence
Sparse columns
Surrogate keys on number sequences with Identity columns
Synonyms
Triggers
Unique indexes
User-defined types
Temporary tables
Next steps
What is data warehousing in Microsoft Fabric?
What is data engineering in Microsoft Fabric?
Create a Warehouse
Query a warehouse
OneLake overview
Create tables in Warehouse
Transactions and modify tables
Data types in Microsoft Fabric
Article • 06/06/2023
Tables in Microsoft Fabric support the most commonly used T-SQL data types.
) Important
7 Note
The precision for datetime2 and time is limited to 6 digits of precision on fractions
of seconds.
The uniqueidentifier data type is a T-SQL data type, without a matching data type in
Parquet. As a result, it's stored as a binary type. Warehouse supports storing and reading
uniqueidentifier columns, but these values can't be read on the SQL Endpoint. Reading
uniqueidentifier values in the lakehouse displays a binary representation of the original
values. As a result, features such as cross-joins between Warehouse and SQL Endpoint using a uniqueidentifier column don't work as expected.
For more information about the supported data types including their precisions, see
data types in CREATE TABLE reference.
money and smallmoney: Use decimal; however, note that it can't store the monetary unit.
nchar and nvarchar: Use char and varchar respectively, as there's no similar unicode data type in Parquet. Char and varchar types in a UTF-8 collation may use more storage than nchar and nvarchar to store unicode data. To understand the impact on your environment, see Storage differences between UTF-8 and UTF-16.
Unsupported data types can still be used in T-SQL code for variables, or any in-memory
use in session. Creating tables or views that persist data on disk with any of these types
isn't allowed.
The rules for mapping original Delta types to the SQL types in SQL Endpoint are shown
in the following table:
DOUBLE: float
DATE: date
TIMESTAMP: datetime2
BINARY: varbinary(n)
Columns with types that aren't listed in the table aren't represented as table columns in the SQL Endpoint.
Next steps
T-SQL Surface Area in Microsoft Fabric
T-SQL surface area in Microsoft Fabric
Article • 07/12/2023
This article covers the T-SQL language syntax capabilities of Microsoft Fabric, when
querying the SQL Endpoint or Warehouse.
) Important
Limitations
At this time, the following commands are not supported. Don't try to use these commands; even though they may appear to succeed, they could cause issues in your warehouse.
Temp Tables
Triggers
TRUNCATE
Next steps
Data types in Microsoft Fabric
Limitations in Microsoft Fabric
Primary keys, foreign keys, and unique
keys in Warehouse in Microsoft Fabric
Article • 05/23/2023
Learn about table constraints in Warehouse in Microsoft Fabric, including the primary
key, foreign keys, and unique keys.
) Important
To add or remove primary key, foreign key, or unique constraints, use ALTER TABLE.
) Important
Table constraints
Warehouse in Microsoft Fabric supports these table constraints:
PRIMARY KEY is only supported when NONCLUSTERED and NOT ENFORCED are
both used.
UNIQUE constraint is only supported when NONCLUSTERED and NOT ENFORCED
is used.
FOREIGN KEY is only supported when NOT ENFORCED is used.
Remarks
Having primary key, foreign key and/or unique key allows Warehouse in Microsoft Fabric
to generate an optimal execution plan for a query.
) Important
After creating a table with a primary key or unique constraint in Warehouse in Microsoft Fabric, users need to make sure all values in those columns are unique. A violation of that may cause the query to return inaccurate results. Foreign keys are not enforced.
This example shows how a query may return inaccurate results if the primary key or unique constraint column includes duplicate values.
SQL
-- Create table t1
CREATE TABLE t1 (a1 INT NOT NULL, b1 INT)
/*
a1 total
----------- -----------
1 2
2 1
3 1
4 1
(4 rows affected)
*/
/*
a1 total
----------- -----------
2 1
4 1
1 1
3 1
1 1
(5 rows affected)
*/
/*
a1 total
----------- -----------
2 1
4 1
1 1
3 1
1 1
(5 rows affected)
*/
/*
a1 b1
----------- -----------
2 200
3 300
4 400
0 1000
1 100
(5 rows affected)
*/
/*
a1 total
----------- -----------
2 1
3 1
4 1
0 1
1 1
(5 rows affected)
*/
/*
a1 total
----------- -----------
2 1
3 1
4 1
0 1
1 1
(5 rows affected)
*/
Examples
Create a Warehouse in Microsoft Fabric table with a primary key:
SQL
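-- Sketch: primary keys must be NONCLUSTERED and NOT ENFORCED; the table and column
-- names are illustrative.
CREATE TABLE dbo.PrimaryKeyTable (c1 INT NOT NULL, c2 INT);

ALTER TABLE dbo.PrimaryKeyTable
ADD CONSTRAINT PK_PrimaryKeyTable PRIMARY KEY NONCLUSTERED (c1) NOT ENFORCED;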
SQL
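-- Sketch: unique constraints must be NONCLUSTERED and NOT ENFORCED; names are illustrative.
CREATE TABLE dbo.UniqueConstraintTable (c1 INT NOT NULL, c2 INT);

ALTER TABLE dbo.UniqueConstraintTable
ADD CONSTRAINT UK_UniqueConstraintTable UNIQUE NONCLUSTERED (c1) NOT ENFORCED;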
SQL
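-- Sketch: foreign keys must be NOT ENFORCED; this references the table from the first example.
CREATE TABLE dbo.ForeignKeyReferenceTable (c1 INT NOT NULL);

ALTER TABLE dbo.ForeignKeyReferenceTable
ADD CONSTRAINT FK_ForeignKeyReferenceTable FOREIGN KEY (c1)
    REFERENCES dbo.PrimaryKeyTable (c1) NOT ENFORCED;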
Next steps
Design tables in Warehouse in Microsoft Fabric
Data types in Microsoft Fabric
What is data warehousing in Microsoft Fabric?
What is data engineering in Microsoft Fabric?
Warehouse in Microsoft Fabric
Create a Warehouse
Query a warehouse
Transactions in Warehouse tables in
Microsoft Fabric
Article • 05/23/2023
Similar to their behavior in SQL Server, transactions allow you to control the commit or
rollback of read and write queries.
You can modify data that is stored in tables in a Warehouse using transactions to group
changes together.
For example, you could commit inserts to multiple tables, or none of them, if an error arises. If you're changing details about a purchase order that affects three
tables, you can group those changes into a single transaction. That means when
those tables are queried, they either all have the changes or none of them do.
Transactions are a common practice for when you need to ensure your data is
consistent across multiple tables.
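For example, a minimal sketch (the table and column names are illustrative) of grouping two inserts into one transaction:
SQL
BEGIN TRANSACTION;

INSERT INTO dbo.PurchaseOrderHeader (OrderId, OrderDate) VALUES (1, '2023-05-01');
INSERT INTO dbo.PurchaseOrderDetail (OrderId, LineNumber, Quantity) VALUES (1, 1, 10);

-- Both inserts become visible together; use ROLLBACK TRANSACTION instead if either fails.
COMMIT TRANSACTION;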
) Important
Transactional capabilities
The same transactional capabilities are supported in the SQL Endpoint in Microsoft
Fabric, but for read-only queries.
Transactions can also be used for sequential SELECT statements to ensure the tables
involved all have data from the same point in time. As an example, if a table has new
rows added by another transaction, the new rows don't affect the SELECT queries inside
an open transaction.
) Important
Only the snapshot isolation level is supported in Microsoft Fabric. If you use T-SQL
to change your isolation level, the change is ignored at Query Execution time and
snapshot isolation is applied.
Cross-database query transaction support
Warehouse in Microsoft Fabric supports transactions that span across databases that are
within the same workspace including reading from the SQL Endpoint of the Lakehouse.
Every Lakehouse has one SQL Endpoint and each workspace can have more than one
lakehouse.
These locks prevent conflicts such as a table's schema being changed while rows are
being updated in a transaction.
You can query locks currently held with the dynamic management view (DMV)
sys.dm_tran_locks.
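For example, a quick way to inspect the locks held at a point in time:
SQL
-- Lists locks currently held in the warehouse.
SELECT * FROM sys.dm_tran_locks;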
Conflicts from two or more concurrent transactions that update one or more rows in a
table are evaluated at the end of the transaction. The first transaction to commit
completes successfully and the other transactions are rolled back with an error returned.
These conflicts are evaluated at the table level and not the individual parquet file level.
INSERT statements always create new parquet files, which means fewer conflicts with
other transactions except for DDL because the table's schema could be changing.
Transaction logging
Transaction logging in Warehouse in Microsoft Fabric is at the parquet file level because
parquet files are immutable (they can't be changed). A rollback results in pointing back
to the previous parquet files. The benefits of this change are that transaction logging
and rollbacks are faster.
Limitations
Distributed transactions are not supported.
Save points are not supported.
Named transactions are not supported.
Marked transactions are not supported.
At this time, there's limited T-SQL functionality in the warehouse. See TSQL surface
area for a list of T-SQL commands that are currently not available.
If a transaction has data insertion into an empty table and issues a SELECT before
rolling back, the automatically generated statistics may still reflect the
uncommitted data, causing inaccurate statistics. Inaccurate statistics can lead to
unoptimized query plans and execution times. If you roll back a transaction with
SELECTs after a large INSERT, you may want to update statistics for the columns
mentioned in your SELECT.
Next steps
Query the Warehouse
Tables in Warehouse
Warehouse settings and context menus
Article • 05/23/2023
Settings are accessible from the context menu or from the Settings icon in the ribbon
when you open the item. There are some key differences in the actions you can take in
settings depending on if you're interacting with the SQL Endpoint or a data warehouse.
) Important
Settings options
This section describes and explains the settings options available based on the item
you're working with and its description.
The following table is a list of settings available for each warehouse.
Warehouse description: Lets users add metadata details to provide descriptive information about a warehouse.
SQL connection string: The SQL connection string for the workspace. You can use the SQL connection string to create a connection to the warehouse using various tools, such as SSMS/Azure Data Studio.
The following table shows settings for the default Power BI dataset.
Query caching: Turn on or off caching query results for speeding up reports by using previously saved query results.
Endorsement and discovery: Endorse the default dataset and make it discoverable in your org.
Context menus
Applies to: Warehouse in Microsoft Fabric
Warehouse offers an easy experience to create reports and access supported actions
using its context menus.
Share: Lets users share the warehouse to build content based on the underlying default Power BI dataset, query data using SQL, or get access to underlying data files. Shares the warehouse access (SQL-connect only, and autogenerated dataset) with other users in your organization. Users receive an email with links to access the detail page where they can find the SQL connection string and can access the default dataset to create reports based on it.
Analyze in Excel: Uses the existing Analyze in Excel capability on the default Power BI dataset. Learn more: Analyze in Excel.
Create report: Build a report in DirectQuery mode. Learn more: Get started creating in the Power BI service.
Rename: Updates the warehouse with the new name. Does not apply to the (Lakehouse) SQL endpoint.
Delete: Deletes the warehouse from the workspace. A confirmation dialog notifies you of the impact of the delete action. If the Delete action is confirmed, then the warehouse and related downstream items are deleted. Does not apply to the (Lakehouse) SQL endpoint.
Manage permissions: Enables users to add other recipients with specified permissions, similar to allowing the sharing of an underlying dataset or allowing to build content with the data associated with the underlying dataset.
View lineage: This option shows the end-to-end lineage of the warehouse from the data sources to the warehouse, the default Power BI dataset, and other datasets (if any) that were built on top of the warehouse, all the way to reports, dashboards, and apps.
Next steps
Warehouse in Microsoft Fabric
Data modeling in the default Power BI dataset
Create reports in the Power BI service
Admin portal
Ingest data into the Warehouse
Article • 09/18/2023
Warehouse in Microsoft Fabric offers built-in data ingestion tools that allow users to
ingest data into warehouses at scale using code-free or code-rich experiences.
) Important
Use the COPY (Transact-SQL) statement for code-rich data ingestion operations,
for the highest data ingestion throughput possible, or when you need to add data
ingestion as part of a Transact-SQL logic. For syntax, see COPY INTO (Transact-
SQL).
Use data pipelines for code-free or low-code, robust data ingestion workflows that run repeatedly, at a schedule, or that involve large volumes of data. For more information, see Ingest data using Data pipelines.
Use dataflows for a code-free experience that allows custom transformations to source data before it's ingested. These transformations include (but aren't limited to) changing data types, adding or removing columns, or using functions to produce calculated columns. For more information, see Dataflows.
Use cross-warehouse ingestion for code-rich experiences to create new tables
with source data within the same workspace. For more information, see Ingest data
using Transact-SQL and Write a cross-database query.
7 Note
The COPY statement in Warehouse supports only data sources on Azure storage accounts, with authentication using Shared Access Signature (SAS), Storage Account Key (SAK), or accounts with public access. For other limitations, see COPY (Transact-SQL).
For cross-warehouse ingestion, data sources must be within the same Microsoft Fabric
workspace. Queries can be performed using three-part naming for the source data.
SQL
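-- Sketch of three-part naming for cross-warehouse ingestion; the warehouse, lakehouse,
-- and table names are hypothetical.
INSERT INTO [my_warehouse].[dbo].[sales_summary]
SELECT *
FROM [my_lakehouse].[dbo].[staging_sales];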
The COPY (Transact-SQL) statement currently supports the PARQUET and CSV file
formats. For data sources, currently Azure Data Lake Storage (ADLS) Gen2 and Azure
Blob Storage are supported.
Data pipelines and dataflows support a wide variety of data sources and data formats.
For more information, see Data pipelines and Dataflows.
Best practices
The COPY command feature in Warehouse in Microsoft Fabric uses a simple, flexible,
and fast interface for high-throughput data ingestion for SQL workloads. In the current
version, we support loading data from external storage accounts only.
You can also use T-SQL to create a new table, insert into it, and then update and delete rows of data. Data can be inserted from any database within the Microsoft Fabric workspace using cross-database queries. If you want to ingest data from a Lakehouse to a warehouse, you can do this with a cross-database query. For example:
SQL
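-- Sketch of ingesting from a Lakehouse into a warehouse table with a cross-database
-- query; the asset and table names are hypothetical.
CREATE TABLE [dbo].[new_sales]
AS
SELECT *
FROM [my_lakehouse].[dbo].[existing_sales];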
Avoid ingesting data using singleton INSERT statements, as this causes poor
performance on queries and updates. If singleton INSERT statements were used
for data ingestion consecutively, we recommend creating a new table by using
CREATE TABLE AS SELECT (CTAS) or INSERT...SELECT patterns, dropping the
original table, and then creating your table again from the table you created using
CREATE TABLE AS SELECT (CTAS) or INSERT...SELECT.
When working with external data on files, we recommend that files are at least 4
MB in size.
For large compressed CSV files, consider splitting your file into multiple files.
Azure Data Lake Storage (ADLS) Gen2 offers better performance than Azure Blob
Storage (legacy). Consider using an ADLS Gen2 account whenever possible.
For pipelines that run frequently, consider isolating your Azure storage account
from other services that could access the same files at the same time.
Explicit transactions allow you to group multiple data changes together so that
they're only visible when reading one or more tables when the transaction is fully
committed. You also have the ability to roll back the transaction if any of the
changes fail.
If a SELECT is within a transaction, and was preceded by data insertions, the
automatically generated statistics may be inaccurate after a rollback. Inaccurate
statistics can lead to unoptimized query plans and execution times. If you roll back
a transaction with SELECTs after a large INSERT, you may want to update statistics
for the columns mentioned in your SELECT.
7 Note
Regardless of how you ingest data into warehouses, the parquet files produced by the data ingestion task will be optimized using V-Order write optimization. V-Order optimizes parquet files to enable lightning-fast reads under Microsoft Fabric compute engines such as Power BI, SQL, Spark, and others. Warehouse queries in general benefit from faster read times with this optimization, while still ensuring the parquet files are 100% compliant with the open-source specification. Unlike in Fabric Data Engineering, V-Order is a global setting in Synapse Data Warehouse that cannot be disabled.
Next steps
Ingest data using Data pipelines
Ingest data using the COPY statement
Ingest data using Transact-SQL
Create your first dataflow to get and transform data
COPY (Transact-SQL)
CREATE TABLE AS SELECT (Transact-SQL)
INSERT (Transact-SQL)
Data pipelines offer an alternative to using the COPY command through a graphical user
interface. A data pipeline is a logical grouping of activities that together perform a data
ingestion task. Pipelines allow you to manage extract, transform, and load (ETL) activities
instead of managing each one individually.
In this tutorial, you'll create a new pipeline that loads sample data into a Warehouse in
Microsoft Fabric.
7 Note
Some features from Azure Data Factory are not available in Microsoft Fabric, but
the concepts are interchangeable. You can learn more about Azure Data Factory
and Pipelines on Pipelines and activities in Azure Data Factory and Azure Synapse
Analytics. For a quickstart, visit Quickstart: Create your first pipeline to copy data.
) Important
2. In the New pipeline dialog, provide a name for your new pipeline and select
Create.
3. You'll land in the pipeline canvas area, where you see three options to get started:
Add a pipeline activity, Copy data, and Choose a task to start.
Add pipeline activity: this option launches the pipeline editor, where you can
create new pipelines from scratch by using pipeline activities.
Copy data: this option launches a step-by-step assistant that helps you select
a data source, a destination, and configure data load options such as the
column mappings. On completion, it creates a new pipeline activity with a
Copy Data task already configured for you.
Choose a task to start: this option launches a set of predefined templates to
help get you started with pipelines based on different scenarios.
5. In the next page, you can select a dataset, the source file format, and preview the
selected dataset. Select the Bing COVID-19 dataset, the CSV format, and select
Next.
6. The next page, Data destinations, allows you to configure the type of the
destination dataset. We'll load data into a warehouse in our workspace, so select
the Warehouse tab, and the Data Warehouse option. Select Next.
7. Now it's time to pick the warehouse to load data into. Select your desired
warehouse in the dropdown box and select Next.
8. The last step to configure the destination is to provide a name to the destination
table and configure the column mappings. Here you can choose to load the data
to a new table or to an existing one, provide a schema and table names, change
column names, remove columns, or change their mappings. You can accept the
defaults, or adjust the settings to your preference.
9. The next page gives you the option to use staging, or provide advanced options
for the data copy operation (which uses the T-SQL COPY command). Review the
options without changing them and select Next.
10. The last page in the assistant offers a summary of the copy activity. Select the
option Start data transfer immediately and select Save + Run.
11. You are directed to the pipeline canvas area, where a new Copy Data activity is
already configured for you. The pipeline starts to run automatically. You can
monitor the status of your pipeline in the Output pane:
12. After a few seconds, your pipeline finishes successfully. Navigating back to your
warehouse, you can select your table to preview the data and confirm that the
copy operation concluded.
For more on data ingestion into your Warehouse in Microsoft Fabric, visit:
Next steps
Query the SQL Endpoint or Warehouse in Microsoft Fabric
Ingest data into your Warehouse using
the COPY statement
Article • 07/09/2023
The COPY statement is the primary way to ingest data into Warehouse tables. COPY performs high-throughput data ingestion from an external Azure storage account, with the flexibility to configure source file format options, a location to store rejected rows, skipping header rows, and other options.
This tutorial shows data ingestion examples for a Warehouse table using the T-SQL
COPY statement. It uses the Bing COVID-19 sample data from the Azure Open Datasets.
For details about this dataset, including its schema and usage rights, see Bing COVID-19.
7 Note
To learn more about the T-SQL COPY statement including more examples and the
full syntax, see COPY (Transact-SQL).
) Important
Create a table
Before you use the COPY statement, the destination table needs to be created. To create
the destination table for this sample, use the following steps:
3. To create the table used as the destination in this tutorial, run the following code:
SQL
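-- A simplified sketch of the destination table; the full Bing COVID-19 schema includes
-- additional columns.
CREATE TABLE [dbo].[bing_covid-19_data]
(
    [id] INT,
    [updated] DATE,
    [confirmed] INT,
    [deaths] INT,
    [recovered] INT,
    [country_region] VARCHAR(50)
);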
SQL
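-- Sketch of loading the sample data from a Parquet file; replace <storage-path> with
-- the actual location of the Bing COVID-19 sample file.
COPY INTO [dbo].[bing_covid-19_data]
FROM 'https://<storage-path>/bing_covid-19_data.parquet'
WITH (
    FILE_TYPE = 'PARQUET'
);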
If you ran the previous example to load data from Parquet, consider deleting all data
from your table:
SQL
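-- Removes all rows loaded by the previous example.
DELETE FROM [dbo].[bing_covid-19_data];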
To load data from a CSV file skipping a header row, use the following code:
SQL
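-- Sketch of loading from a CSV file while skipping the header row; replace <storage-path>
-- with the actual file location.
COPY INTO [dbo].[bing_covid-19_data]
FROM 'https://<storage-path>/bing_covid-19_data.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2
);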
If you ran both examples without deleting the rows in between runs, you'll see the result
of this query with twice as many rows. While that works for data ingestion in this case,
consider deleting all rows and ingesting data only once if you're going to further
experiment with this data.
Next steps
Ingest data using Data pipelines
Ingest data into your Warehouse using Transact-SQL
Ingesting data into the Warehouse
Ingest data into your Warehouse using
Transact-SQL
Article • 05/23/2023
The Transact-SQL language offers options you can use to load data at scale from
existing tables in your lakehouse and warehouse into new tables in your warehouse.
These options are convenient if you need to create new versions of a table with
aggregated data, versions of tables with a subset of the rows, or to create a table as a
result of a complex query. Let's explore some examples.
) Important
7 Note
The examples in this article use the Bing COVID-19 sample dataset. To load the
sample dataset, follow the steps in Ingest data into your Warehouse using the
COPY statement to create the sample data into your warehouse.
The first example illustrates how to create a new table, dbo.[bing_covid-19_data_2023], that is a copy of the existing sample table but filtered to data from the year 2023 only:
SQL
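-- Sketch: assumes the sample table and its [updated] date column.
CREATE TABLE [dbo].[bing_covid-19_data_2023]
AS
SELECT *
FROM [dbo].[bing_covid-19_data]
WHERE DATEPART(YEAR, [updated]) = 2023;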
You can also create a new table with new year, month, and dayofmonth columns, with values obtained from the updated column in the source table. This can be useful if you're trying to visualize infection data by year, or to see months when the most COVID-19 cases are observed:
SQL
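-- Sketch: derives year, month, and dayofmonth columns from the [updated] column.
CREATE TABLE [dbo].[bing_covid-19_data_with_year_month_day]
AS
SELECT
    DATEPART(YEAR, [updated]) AS [year],
    DATEPART(MONTH, [updated]) AS [month],
    DATEPART(DAY, [updated]) AS [dayofmonth],
    *
FROM [dbo].[bing_covid-19_data];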
As another example, you can create a new table that summarizes the number of cases
observed in each month, regardless of the year, to evaluate how seasonality may affect
spread in a given country/region. It uses the table created in the previous example with
the new month column as a source:
SQL
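-- Sketch: assumes [confirmed] and [country_region] columns in the sample data.
CREATE TABLE [dbo].[bing_covid-19_data_by_month]
AS
SELECT
    [country_region],
    [month],
    SUM(CAST([confirmed] AS BIGINT)) AS [confirmed_sum]
FROM [dbo].[bing_covid-19_data_with_year_month_day]
GROUP BY [country_region], [month];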
Based on this new table, we can see that the United States observed more confirmed cases across all years in the month of January, followed by December and October. April is the month with the lowest number of cases overall:
SQL
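-- Sketch: ranks months for the United States by total confirmed cases.
SELECT [month], [confirmed_sum]
FROM [dbo].[bing_covid-19_data_by_month]
WHERE [country_region] = 'United States'
ORDER BY [confirmed_sum] DESC;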
For more examples and syntax reference, see CREATE TABLE AS SELECT (Transact-SQL).
SQL
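-- Sketch of loading rows into an existing table with INSERT...SELECT; the table names
-- are illustrative.
INSERT INTO [dbo].[bing_covid-19_data_2023]
SELECT *
FROM [dbo].[bing_covid-19_data]
WHERE DATEPART(YEAR, [updated]) = 2023;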
The query criteria for the SELECT statement can be any valid query, as long as the
resulting query column types align with the columns on the destination table. If column
names are specified and include only a subset of the columns from the destination
table, all other columns are loaded as NULL . For more information, see Using INSERT
INTO...SELECT to Bulk Import data with minimal logging and parallelism.
assets:
A new table can be created that uses three-part naming to combine data from tables on
these workspace assets:
SQL
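-- Sketch: combines a Warehouse table and a Lakehouse table using three-part naming;
-- the workspace asset and table names are hypothetical.
CREATE TABLE [dbo].[cases_with_population]
AS
SELECT c.[country_region], c.[confirmed_sum], p.[population]
FROM [my_warehouse].[dbo].[bing_covid-19_data_by_month] AS c
INNER JOIN [my_lakehouse].[dbo].[region_population] AS p
    ON c.[country_region] = p.[country_region];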
To learn more about cross-warehouse queries, see Write a cross-database SQL Query.
Next steps
Ingesting data into the Warehouse
Ingest data using the COPY statement
Ingest data using Data pipelines
Write a cross-database SQL Query
Tutorial: Set up dbt for Fabric Data
Warehouse
Article • 08/01/2023
This tutorial guides you through setting up dbt and deploying your first project to a Synapse Data Warehouse in Microsoft Fabric.
) Important
Introduction
dbt (Data Build Tool) is an open-source framework that simplifies data transformation
and analytics engineering. It focuses on SQL-based transformations within the analytics
layer, treating SQL as code. dbt supports version control, modularization, testing, and
documentation.
The dbt adapter for Microsoft Fabric can be used to create dbt projects, which can then
be deployed to a Fabric Synapse Data Warehouse.
You can also change the target platform for the dbt project by simply changing the adapter. For example, a project built for an Azure Synapse dedicated SQL pool can be upgraded in a few seconds to a Fabric Synapse Data Warehouse.
3. Latest version of the dbt-fabric adapter from the PyPI (Python Package Index)
repository using pip install dbt-fabric .
PowerShell
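# Installs the dbt adapter for Microsoft Fabric from PyPI, as described above.
pip install dbt-fabric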
7 Note
4. Make sure to verify that dbt-fabric and its dependencies are installed by using the pip list command:
PowerShell
pip list
A long list of the packages and current versions should be returned from this
command.
5. Create a warehouse if you haven't done so already. You can use the trial capacity
for this exercise: sign up for the Microsoft Fabric free trial , create a workspace,
and then create a warehouse.
You can clone a repo with Visual Studio Code's built-in source control.
Or, for example, you can use the git clone command:
PowerShell
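# Assumes the dbt jaffle_shop demo project referenced later in this tutorial.
git clone https://github.com/dbt-labs/jaffle_shop.git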
yml
config:
  partial_parse: true
jaffle_shop:
  target: fabric-dev
  outputs:
    fabric-dev:
      authentication: CLI
      database: <put the database name here>
      driver: ODBC Driver 18 for SQL Server
      host: <enter your sql endpoint here>
      schema: dbo
      threads: 4
      type: fabric
7 Note
Change the type from fabric to synapse to switch the database adapter to
Azure Synapse Analytics, if desired. Any existing dbt project's data platform
can be updated by changing the database adapter. For more information, see
the dbt list of supported data platforms .
Run az login in Visual Studio Code terminal if you're using Azure CLI
authentication.
For Service Principal or other Azure Active Directory authentication to
Synapse Data Warehouse in Microsoft Fabric, refer to dbt (Data Build Tool)
setup and dbt Resource Configurations .
6. Now you're ready to test the connectivity. Run dbt debug in the Visual Studio Code
terminal to test the connectivity to your warehouse.
PowerShell
dbt debug
If all checks pass, you can connect to your warehouse using the dbt-fabric adapter from the jaffle_shop dbt project.
7. Now, it's time to test if the adapter is working or not. First run dbt seed to insert sample data into the warehouse.
8. Run dbt test to test the models defined in the demo dbt project, and dbt run to build them.
PowerShell
dbt test
PowerShell
dbt run
That's it! You have now deployed a dbt project to Synapse Data Warehouse in Fabric.
1. Install the new adapter. For more information and full installation instructions, see
dbt adapters .
Considerations
Important things to consider when using dbt-fabric adapter:
Fabric supports Azure Active Directory (Azure AD) authentication for user principals, user identities, and service principals. The recommended authentication mode for working interactively on the warehouse is CLI (command-line interface); use service principals for automation.
You can log issues on the dbt-fabric adapter by visiting Issues · microsoft/dbt-
fabric · GitHub .
Next steps
What is data warehousing in Microsoft Fabric?
Tutorial: Create a Warehouse in Microsoft Fabric
Tutorial: Transform data using a stored procedure
Default Power BI datasets in Microsoft
Fabric
Article • 06/04/2023
In Microsoft Fabric, Power BI datasets are a semantic model with metrics; a logical
description of an analytical domain, with business friendly terminology and
representation, to enable deeper analysis. This semantic model is typically a star schema
with facts that represent a domain, and dimensions that allow you to analyze, or slice
and dice the domain to drill down, filter, and calculate different analyses. With the
default dataset, the dataset is created automatically for you, and the aforementioned
business logic gets inherited from the parent lakehouse or Warehouse respectively,
jump-starting the downstream analytics experience for business intelligence and
analysis with an item in Microsoft Fabric that is managed, optimized, and kept in sync
with no user intervention.
Visualizations and analyses in Power BI reports can now be built completely in the web - or in just a few steps in Power BI Desktop - saving users time and resources, and by default, providing a seamless consumption experience for end users. The default Power BI dataset follows the naming convention of the Lakehouse.
) Important
The default dataset is queried via the SQL Endpoint and updated via changes to the
Lakehouse. You can also query the default dataset via cross-database queries from a
Warehouse.
By default, all tables and views in the Warehouse are automatically added to the default
Power BI dataset. Users can also manually select tables or views from the Warehouse
they want included in the model for more flexibility. Objects that are in the default
Power BI dataset are created as a layout in the model view.
The background sync that includes objects (tables and views) waits for the downstream dataset to not be in use to update the dataset, honoring bounded staleness. Users can always go and manually pick tables they want or don't want in the dataset.
The default layout for BI enabled tables persists in the user session and is generated
whenever a user navigates to the model view. Look for the Default dataset objects tab.
The New Power BI dataset button inherits the default dataset's configuration and allows
for further customization. The default dataset acts as a starter template, helping to
ensure a single version of the truth. For example, if you use the default dataset and
define new relationships, and then use the New Power BI dataset button, the new
dataset will inherit those relationships if the tables selected include those new
relationships.
2. In the Reporting ribbon, select New Power BI dataset, and then in the New
dataset dialog, select tables to be included, and then select Confirm.
3. Power BI automatically saves the dataset in the workspace based on the name of
your Warehouse, and then opens the dataset in Power BI.
4. Select Open data model to open the Power BI Web modeling experience where
you can add table relationships and DAX measures.
To learn more on how to edit data models in the Power BI service, see Edit Data Models.
Limitations
Default Power BI datasets follow the current limitations for datasets in Power BI. Learn
more:
If the parquet, Apache Spark, or SQL data types can't be mapped to one of the above types, they are dropped as part of the sync process. This is in line with current Power BI behavior. For these columns, we recommend that you add explicit type conversions in your ETL processes to convert them to a supported type. If there are data types that are needed upstream, users can optionally specify a view in SQL with the explicit type conversion desired. This will be picked up by the sync or can be added manually as previously indicated.
Next steps
Define relationships in data models
Data modeling in the default Power BI dataset
Data modeling in the default Power BI
dataset in Microsoft Fabric
Article • 06/14/2023
The default Power BI dataset inherits all relationships between entities defined in the
model view and infers them as Power BI dataset relationships, when objects are enabled
for BI (Power BI Reports). Inheriting the warehouse's business logic allows a warehouse
developer or BI analyst to decrease the time to value towards building a useful semantic
model and metrics layer for analytical business intelligence (BI) reports in Power BI,
Excel, or external tools like Tableau that read the XMLA format.
While all constraints are translated to relationships, currently in Power BI, only one
relationship can be active at a time, whereas multiple primary and foreign key
constraints can be defined for warehouse entities and are shown visually in the diagram
lines. The active Power BI relationship is represented with a solid line and the rest are represented with dotted lines. We recommend choosing the primary relationship as active for BI reporting purposes.
) Important
RelyOnReferentialIntegrity A boolean value that indicates whether the relationship can rely on
referential integrity or not.
To add objects such as tables or views to the default Power BI dataset, you have options:
1. Automatically add objects to the dataset, which happens by default with no user
intervention needed.
The auto detect experience determines any tables or views and opportunistically adds
them.
The manually detect option in the ribbon allows fine grained control of which object(s),
such as tables and/or views, should be added to the default Power BI dataset:
Select all
Filter for tables or views
Select specific objects
To remove objects, a user can use the manually select button in the ribbon and:
Un-select all
Filter for tables or views
Un-select specific objects
Tip
We recommend reviewing the objects enabled for BI and ensuring they have the
correct logical relationships to ensure a smooth downstream reporting experience.
Create a measure
A measure is a collection of standardized metrics. Similar to Power BI Desktop, the DAX
editing experience in warehouse presents a rich editor complete with autocomplete for
formulas (IntelliSense). The DAX editor enables you to easily develop measures right in
warehouse, making it a more effective single source for business logic, semantics, and
business critical calculations.
1. To create a measure, select the New Measure button in the ribbon, as shown in the
following image.
2. Enter the measure into the formula bar and specify the table and the column to
which it applies. The formula bar lets you enter your measure. For detailed
information on measures, see Tutorial: Create your own measures in Power BI
Desktop.
3. You can expand the table to find the measure in the table.
Select Hide in Report view from the menu that appears to hide the item from
downstream reporting.
You can also hide the entire table and individual columns by using the Model view
canvas options, as shown in the following image.
Next steps
Define relationships in data models
Create reports in the Power BI service
Define relationships in data models for
data warehousing in Microsoft Fabric
Article • 05/23/2023
) Important
Warehouse modeling
Modeling the warehouse is possible by setting primary and foreign key constraints and
setting identity columns on the model view within the data warehouse UX. After you navigate to the model view, you can do this in a visual entity relationship diagram that allows a user to drag and drop tables to infer how the objects relate to one another.
Lines visually connecting the entities infer the type of physical relationships that exist.
In the model view, users can model their warehouse and the canonical autogenerated
default Power BI dataset. We recommend modeling your data warehouse using
traditional Kimball methodologies, using a star schema, wherever possible. There are
two types of modeling possible:
2. Select the Confirm button when your relationship is complete to save the
relationship information. The relationship set will effectively:
a. Set the physical relationships - primary and foreign key constraints in the
database
b. Set the logical relationships - primary and foreign key constraints in the default
Power BI dataset
You only see the table names and columns from which you can choose, you aren't
presented with a data preview, and the relationship choices you make are only validated
when you select Apply changes. Using the Properties pane and its streamlined
approach reduces the number of queries generated when editing a relationship, which
can be important for big data scenarios, especially when using DirectQuery connections.
Relationships created using the Properties pane can also use multi-select relationships in the Model view diagram layouts. Press the Ctrl key and select more than one line to select multiple relationships. Common properties can be edited in the Properties pane and Apply changes processes the changes in one transaction.
Next steps
Data modeling in the default Power BI dataset
Create reports in the Power BI service in
Microsoft Fabric and Power BI Desktop
Article • 05/23/2023
This article describes three different scenarios you can follow to create reports in the
Power BI service.
) Important
If no tables have been added to the default Power BI dataset, the dialog automatically adds tables first, prompting the user to confirm or manually select the tables to include in the canonical default dataset, ensuring there's always data first.
With a default dataset that has tables, the New report opens a browser tab to the report
editing canvas to a new report that is built on the dataset. When you save your new
report you're prompted to choose a workspace, provided you have write permissions for
that workspace. If you don't have write permissions, or if you're a free user and the
dataset resides in a Premium capacity workspace, the new report is saved in your My
workspace.
Select Create report to open the report editing canvas to a new report on the dataset.
When you save your new report, it's saved in the workspace that contains the dataset as
long as you have write permissions on that workspace. If you don't have write
permissions, or if you're a free user and the dataset resides in a Premium capacity
workspace, the new report is saved in your My workspace.
In the Data hub, you see warehouses and their associated default datasets. Select the warehouse to navigate to the warehouse details page. You can see the warehouse metadata, supported actions, lineage and impact analysis, along with related reports created from that warehouse. Default datasets derived from a warehouse behave the same as any dataset.
To find the warehouse, you begin with the Data hub. The following image shows the
Data hub in the Power BI service:
1. Use the Data hub menu in the ribbon to get a list of all items.
3. From the dropdown on the Connect button, select Connect to SQL endpoint.
Next steps
Connectivity
Create reports
Tutorial: Get started creating in the Power BI service
Security for data warehousing in
Microsoft Fabric
Article • 07/12/2023
This article covers security topics for securing the SQL Endpoint of the lakehouse and
the Warehouse in Microsoft Fabric.
) Important
For information on connecting to the SQL Endpoint and Warehouse, see Connectivity.
Workspace roles
Workspace roles are used for development team collaboration within a workspace. Role
assignment determines the actions available to the user and applies to all items within
the workspace.
Item permissions
In contrast to workspace roles, which apply to all items within a workspace, item
permissions can be assigned directly to individual Warehouses. The user will receive the
assigned permission on that single Warehouse. The primary purpose for these
permissions is to enable sharing for downstream consumption of the Warehouse.
For details on the specific permissions provided for warehouses, see Share your
warehouse and manage permissions.
Object-level security
Workspace roles and item permissions provide an easy way to assign coarse permissions
to a user for the entire warehouse. However, in some cases, more granular permissions
are needed for a user. To achieve this, standard T-SQL constructs can be used to provide
specific permissions to users.
For details on managing granular permissions in SQL, see SQL granular permissions.
Share a warehouse
Sharing is a convenient way to provide users read access to your Warehouse for
downstream consumption. Sharing allows downstream users in your organization to
consume a Warehouse using SQL, Spark, or Power BI. You can customize the level of
permissions that the shared recipient is granted to provide the appropriate level of
access.
For more information on sharing, see How to share your warehouse and manage
permissions.
Guidance
When evaluating the permissions to assign to a user, consider the following guidance:
Only team members who are currently collaborating on the solution should be
assigned to Workspace roles (Admin, Member, Contributor), as this provides them
access to all Items within the workspace.
If they primarily require read only access, assign them to the Viewer role and grant
read access on specific objects through T-SQL. For more information, see Manage
SQL granular permissions.
If they are higher privileged users, assign them to Admin, Member or Contributor
roles. The appropriate role is dependent on the other actions that they will need to
perform.
Other users, who only need access to an individual warehouse or require access to
only specific SQL objects, should be given Fabric Item permissions and granted
access through SQL to the specific objects.
You can manage permissions on Azure Active Directory groups, as well, rather than adding each specific member.
Next steps
Connectivity
SQL granular permissions in Microsoft Fabric
How to share your warehouse and manage permissions
Workspace roles in Fabric data
warehousing
Article • 05/23/2023
This article details the permissions that workspace roles provide in SQL Endpoint and
Warehouse. For instructions on assigning workspace roles, see Give Workspace Access.
) Important
Workspace roles
Assigning users to the various workspace roles provides the following capabilities:
Admin: Grants the user CONTROL access for each Warehouse and SQL Endpoint within the workspace, providing them with full read/write permissions and the ability to manage granular user SQL permissions.
Member: Grants the user CONTROL access for each Warehouse and SQL Endpoint within the workspace, providing them with full read/write permissions and the ability to manage granular user SQL permissions.
Contributor: Grants the user CONTROL access for each Warehouse and SQL Endpoint within the workspace, providing them with full read/write permissions and the ability to manage granular user SQL permissions.
Viewer: Grants the user CONNECT permissions for each Warehouse and SQL Endpoint within the workspace. Viewers can be granted granular SQL permissions to read data from tables/views using T-SQL. For more information, see Manage SQL granular permissions.
Next steps
Security for data warehousing in Microsoft Fabric
SQL granular permissions
Connectivity
Monitoring connections, sessions, and requests using DMVs
SQL granular permissions in Microsoft
Fabric
Article • 10/05/2023
) Important
Limitations
CREATE USER cannot be explicitly executed currently. When GRANT or DENY is
executed, the user is created automatically. The user will not be able to connect
until sufficient workspace level rights are given.
Row-level security is currently not supported.
Dynamic data masking is currently not supported.
View my permissions
When a user connects to the SQL connection string, they can view the permissions
available to them using the sys.fn_my_permissions function.
SQL
SELECT *
FROM sys.fn_my_permissions(NULL, 'Database');
SQL
SELECT *
FROM sys.fn_my_permissions('<schema-name>', 'Schema');
SQL
SELECT *
FROM sys.fn_my_permissions('<schema-name>.<object-name>', 'Object');
SQL
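-- Sketch of granting a granular permission to a user; the object and user names
-- are hypothetical.
GRANT SELECT ON dbo.MyTable TO [user@contoso.com];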
1. Provide the user with the Fabric Read permission only. This will grant them
CONNECT permissions only for the Warehouse. Optionally, create a custom role
and add the user to the role, if you'd like to restrict access based on roles.
SQL
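-- Sketch: optionally create a custom role and add the user to it; names are hypothetical.
CREATE ROLE PrivateSalesRole;
ALTER ROLE PrivateSalesRole ADD MEMBER [user@contoso.com];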
2. Create a view that queries the table for which you'd like to restrict row access
3. Add a WHERE clause within the VIEW definition, using the SUSER_SNAME() or
IS_ROLEMEMBER() system functions, to filter based on user name or role
SQL
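-- Sketch: a view that filters rows to the signed-in user; the table and column names
-- are hypothetical.
CREATE VIEW dbo.SalesForCurrentUser
AS
SELECT *
FROM dbo.Sales
WHERE [SalesRepEmail] = SUSER_SNAME();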
SQL
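-- Sketch: an alternative view that filters rows by role membership; names are hypothetical.
CREATE VIEW dbo.PrivateSales
AS
SELECT *
FROM dbo.Sales
WHERE IS_ROLEMEMBER('PrivateSalesRole') = 1;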
Related content
Security for data warehousing in Microsoft Fabric
GRANT, REVOKE, and DENY
How to share your warehouse and manage permissions
Sharing is a convenient way to provide users read access to your Warehouse for
downstream consumption. Sharing allows downstream users in your organization to
consume a Warehouse using SQL, Spark, or Power BI. You can customize the level of
permissions that the shared recipient is granted to provide the appropriate level of
access.
) Important
7 Note
Get started
After identifying the Warehouse you would like to share with another user in your Fabric
workspace, select the quick action in the row to Share a Warehouse.
The following animated gif reviews the steps to select a warehouse to share, select the
permissions to assign, and then finally Grant the permissions to another user.
You can share your Warehouse from the OneLake Data Hub or the Synapse Data
Warehouse by choosing Share from quick action, as highlighted in the following image.
Share a Warehouse
You are prompted with options to select who you would like to share the Warehouse
with, what permission(s) to grant them, and whether they will be notified by email.
When you have filled in all the required fields, select Grant access.
ReadData, ReadAll, and Build are separate permissions that do not overlap.
"Read all SQL endpoint data" is selected ("ReadData" permissions)- The shared
recipient can read all the database objects within the Warehouse. ReadData is the
equivalent of db_datareader role in SQL Server. The shared recipient can read data
from all tables and views within the Warehouse. If you want to further restrict and
provide granular access to some objects within the Warehouse, you can do this
using T-SQL GRANT/REVOKE/DENY statements.
"Read all data using Apache Spark" is selected ("ReadAll" permissions)- The
shared recipient has read access to the underlying parquet files in OneLake, which
can be consumed using Spark. ReadAll should be provided only if the shared
recipient wants complete access to your warehouse's files using the Spark engine.
When the shared recipient receives the email, they can select Open and navigate to the
Warehouse Data Hub page.
Depending on the level of access the shared recipient has been granted, the shared
recipient is now able to connect to the SQL Endpoint, query the Warehouse, build
reports, or read data through Spark.
ReadData permissions
With ReadData permissions, the shared recipient can open the Warehouse editor in
read-only mode and query the tables and views within the Warehouse. The shared
recipient can also choose to copy the SQL Endpoint provided and connect to a client
tool to run these queries.
For example, in the following screenshot, a user with ReadData permissions can query
the warehouse.
ReadAll permissions
A shared recipient with ReadAll permissions can find the Azure Blob File System (ABFS)
path to the specific file in OneLake from the Properties pane in the Warehouse editor.
The shared recipient can then use this path within a Spark Notebook to read this data.
For example, in the following screenshot, a user with ReadAll permissions can query the
data in FactSale with a Spark query in a new notebook.
Build permissions
With Build permissions, the shared recipient can create reports on top of the default
dataset that is connected to the Warehouse. The shared recipient can create Power BI
reports from the Data Hub or also do the same using Power BI Desktop.
For example, in the following screenshot a user with Build permissions can start to
Auto-create a Power BI report based on the shared warehouse.
Manage permissions
The Manage permissions page shows the list of users who have been given access by
either assigning to Workspace roles or item permissions.
If you are an Admin or Member, go to your workspace and select More options. Then,
select Manage permissions.
For users who were provided workspace roles, it shows the corresponding user, workspace role, and permissions. Admins, Members, and Contributors have read/write access to items in this workspace. Viewers have ReadData permissions and can query all tables and views within the Warehouse in that workspace. Item permissions Read, ReadData, and ReadAll can be provided to users.
You can choose to add or remove permissions using the "Manage permissions"
experience:
Limitations
If you provide item permissions or remove users who previously had permissions,
permission propagation can take up to two hours. The new permissions may
reflect in "Manage permissions" immediately. Sign in again to ensure that the
permissions are reflected in your SQL Endpoint.
Shared recipients are able to access the Warehouse using owner's identity
(delegated mode). Ensure that the owner of the Warehouse is not removed from
the workspace.
Shared recipients only have access to the Warehouse they receive and not any
other artifacts within the same workspace as the Warehouse. If you want to
provide permissions for other users in your team to collaborate on the Warehouse
(read and write access), add them as Workspace roles such as "Member" or
"Contributor".
Currently, when you share a Warehouse and choose Read all SQL endpoint data,
the shared recipient can access the Warehouse editor in a read-only mode. These
shared recipients can create queries, but cannot currently save their queries.
Currently, sharing a Warehouse is only available through the user experience.
If you want to provide granular access to specific objects within the Warehouse,
share the Warehouse with no additional permissions, then provide granular access
to specific objects using T-SQL GRANT statement. For more information, see T-SQL
syntax for GRANT, REVOKE, and DENY.
If you see that the ReadAll permissions and ReadData permissions are disabled in
the sharing dialog, refresh the page.
Shared recipients do not have permission to reshare a Warehouse.
If a report built on top of the Warehouse is shared with another recipient, the
shared recipient needs more permissions to access the report. This depends on the
dataset mode:
If accessed through DirectQuery mode, then ReadData permissions (or granular SQL permissions to specific tables/views) need to be provided to the Warehouse.
If accessed through Direct Lake mode, then ReadData permissions (or granular permissions to specific tables/views) need to be provided to the Warehouse.
If accessed through Import mode, then no additional permissions are needed.
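As a sketch of the granular-access approach described in the list above, the following statement grants SELECT on a single table to a shared recipient; the table name and user principal are illustrative placeholders.
SQL
-- Illustrative sketch: grant SELECT on one object to a shared recipient.
-- dbo.FactSale and the user principal are placeholders.
GRANT SELECT ON OBJECT::[dbo].[FactSale] TO [user@contoso.com];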
Next steps
Query the Warehouse
How to use Microsoft Fabric notebooks
Accessing shortcuts
Navigate the Fabric Lakehouse explorer
Query using the visual query editor
Article • 06/07/2023
This article describes how to use the visual query editor in the Microsoft Fabric portal to
quickly and efficiently write queries. You can use the visual query editor for a no-code
experience to create your queries.
You can also query the data in your warehouse with multiple tools with a SQL
connection string.
You can use the SQL query editor to write T-SQL queries from the Microsoft Fabric
portal.
You can quickly view data in the Data preview.
) Important
Once you've loaded data into your warehouse, you can use the visual query editor to
create queries to analyze your data. There are two ways to get to the visual query editor:
In the ribbon, create a new query using the New visual query button, as shown in the
following image.
To create a query, drag and drop tables from the Object explorer on the left onto the
canvas. Once you drag one or more tables onto the canvas, you can use the visual
experience to design your queries. The warehouse editor uses the Power Query diagram
view experience to enable you to easily query and analyze your data. Learn more about
Power Query diagram view.
As you work on your visual query, the queries are automatically saved every few
seconds. A "saving indicator" appears in your query tab to indicate that your query is
being saved.
The following animated GIF shows the merging of two tables using the no-code visual query editor. First, DimCity and then FactSale are dragged from the Explorer into the visual query editor. Then, the Power Query Merge operator is used to join them on a common key.
When you see results, you can use Download Excel file to view the results in Excel, or Visualize results to create a report on the results.
To create a cross-warehouse query, drag and drop tables from added warehouses and add a merge activity. For example, in the following image, store_sales is added from the sales warehouse and merged with the item table from the marketing warehouse.
Next steps
How-to: Query the Warehouse
Query using the SQL Query editor
Query using the SQL query editor
Article • 10/03/2023
This article describes how to use the SQL query editor in the Microsoft Fabric portal to
quickly and efficiently write queries, and suggestions on how best to see the information
you need.
You can also query the data in your warehouse with multiple tools with a SQL
connection string.
You can build queries graphically with the Visual query editor.
You can quickly view data in the Data preview.
The SQL query editor provides support for IntelliSense, code completion, syntax
highlighting, client-side parsing, and validation. You can run Data Definition Language
(DDL), Data Manipulation Language (DML) and Data Control Language (DCL)
statements.
) Important
Select the Query icon located at the bottom of the warehouse editor window.
Create a new query using the New SQL query button. If you select the dropdown,
you can easily create T-SQL objects with code templates that populate in your SQL
query window, as shown in the following image.
View query results
Once you've written the T-SQL query, select Run to execute the query.
The Results preview is displayed in the Results section. If the number of rows returned is more than 10,000, the preview is limited to 10,000 rows. You can search for a string within the results grid to get filtered rows matching the search criteria. The Messages tab shows the SQL messages returned when the query is run.
The status bar indicates the query status, the duration of the run, and the number of rows and columns returned in the results.
To enable the Save as view, Save as table, Download Excel file, and Visualize results menus, highlight the SQL statement containing a SELECT statement in the SQL query editor.
Save as view
You can select the query and save it as a view using the Save as view button. Select the schema name, provide a name for the view, and verify the SQL statement before confirming the creation of the view. When the view is successfully created, it appears in the Explorer.
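The result is equivalent to wrapping the highlighted SELECT in a CREATE VIEW statement, roughly as in the following sketch; the schema, view, table, and column names are illustrative.
SQL
-- Illustrative sketch of what Save as view produces; names are placeholders.
CREATE VIEW [dbo].[vw_SalesProfit]
AS
SELECT SalespersonKey, Profit, Quantity
FROM [dbo].[FactSale];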
Save as table
You can use Save as table to save your query results into a table. Select the warehouse in which you would like to save the results, select the schema, and provide a table name to load the results into the table using a CREATE TABLE AS SELECT statement. When the table is successfully created, it appears in the Explorer.
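Under the covers this corresponds to a CREATE TABLE AS SELECT (CTAS) statement, roughly as in the following sketch; the schema, table, and column names are illustrative.
SQL
-- Illustrative sketch of what Save as table produces; names are placeholders.
CREATE TABLE [dbo].[SalesProfitSummary]
AS
SELECT SalespersonKey, SUM(Profit) AS TotalProfit, SUM(Quantity) AS TotalQuantity
FROM [dbo].[FactSale]
GROUP BY SalespersonKey;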
1. After you select the Continue button, locate the downloaded Excel file in Windows File Explorer, for example, in your browser's Downloads folder.
2. To see the data, select the Enable Editing button in the Protected View ribbon, followed by the Enable Content button in the Security Warning ribbon. Once both are enabled, you are presented with the following dialog to approve running the query listed.
3. Select Run.
Once you have successfully signed in, you'll see the data presented in the spreadsheet.
Visualize results
Visualize results allows you to create reports from your query results within the SQL
query editor.
As you work on your SQL query, the queries are automatically saved every few seconds. A "saving" indicator appears at the bottom of your query tab to indicate that your query is being saved.
When you run multiple queries and they return multiple results, you can select the results drop-down list to see individual results.
Cross-warehouse querying
For more information on cross-warehouse querying, see Cross-warehouse querying.
You can write a T-SQL query with a three-part naming convention to refer to objects and join them across warehouses, for example:
SQL
SELECT
emp.Employee
,SUM(Profit) AS TotalProfit
,SUM(Quantity) AS TotalQuantitySold
FROM
[SampleWarehouse].[dbo].[DimEmployee] as emp
JOIN
[WWI_Sample].[dbo].[FactSale] as sale
ON
emp.EmployeeKey = sale.SalespersonKey
WHERE
emp.IsSalesperson = 'TRUE'
GROUP BY
emp.Employee
ORDER BY
TotalProfit DESC;
Keyboard shortcuts
Keyboard shortcuts provide a quick way to navigate and allow users to work more efficiently in the SQL query editor. The table in this article lists the shortcuts available in the SQL query editor in the Microsoft Fabric portal:
Function | Shortcut
Undo | Ctrl + Z
Redo | Ctrl + Y
Move cursor up | ↑
Limitations
In the SQL query editor, every time you run a query, it opens a separate session and closes it at the end of the execution. This means that if you set up session context for multiple query runs, the context is not maintained across independent executions of queries.
You can run Data Definition Language (DDL), Data Manipulation Language (DML)
and Data Control Language (DCL) statements, but there are limitations for
Transaction Control Language (TCL) statements. In the SQL query editor, when you
select the Run button, you're submitting an independent batch request to execute.
Each Run action in the SQL query editor is a batch request, and a session only
exists per batch. Each execution of code in the same query window runs in a
different batch and session.
For example, when transaction statements are executed independently, session context is not retained. In the following screenshot, BEGIN TRAN was executed in the first request, but because the second request was executed in a different session, there was no transaction to commit, resulting in the failure of the commit/rollback operation. If the SQL batch submitted does not include a COMMIT TRAN, the changes applied after BEGIN TRAN will not commit.
In the SQL query editor, the GO SQL command creates a new independent batch
in a new session.
When you run a SQL query that includes USE, you need to submit the SQL query with USE as a single request.
Visualize Results currently does not support SQL queries with an ORDER BY clause.
The following table summarizes where the expected behavior doesn't match SQL Server Management Studio or Azure Data Studio:
Related content
Query using the Visual Query editor
Tutorial: Create cross-warehouse queries with the SQL query editor
Next step
How-to: Query the Warehouse
The Data preview is one of the three switcher modes, along with the Query editor and Model view, within the warehouse experience. It provides an easy interface to view the data within your tables or views and preview sample data (top 1,000 rows).
You can also query the data in your warehouse with multiple tools with a SQL
connection string.
You can use the SQL query editor to write T-SQL queries from the Microsoft Fabric
portal.
You can build queries graphically with the Visual query editor.
) Important
Get started
After creating a warehouse and ingesting data, select the Data tab. Choose a specific
table or view you would like to display in the data grid of the Data preview page.
Search value – Type a specific keyword in the search bar and rows containing that keyword are filtered. In this example, "New York" is the keyword and only rows containing this keyword are shown. To clear the search, select the X inside the search bar.
Sort columns (alphabetically or numerically) – Hover over the column title and select the up/down arrow that appears next to the title.
Copy value – Right-click a cell within the table and a Copy option will appear to
copy the specific selection.
Warehouse in Microsoft Fabric is built on open file formats. User tables are stored in the Parquet file format, and Delta Lake logs are published for all user tables.
The Delta Lake logs open up direct access to the warehouse's user tables for any engine that can read Delta Lake tables. This access is limited to read-only to ensure that the user data maintains ACID transaction compliance. All inserts, updates, and deletes to the data in the tables must be executed through the Warehouse. Once a transaction is committed, a system background process is initiated to publish the updated Delta Lake log for the affected tables.
2. In the Object Explorer, you find more options (...) on a selected table in the Tables
folder. Select the Properties menu.
Delta Lake logs can be queried through shortcuts created in a lakehouse. You can
view the files using a Microsoft Fabric Spark Notebook or the Lakehouse explorer
in Synapse Data Engineering in the Microsoft Fabric portal.
Delta Lake logs can be found via Azure Storage Explorer, through Spark
connections such as the Power BI Direct Lake mode, or using any other service that
can read delta tables.
Delta Lake logs can be found in the _delta_log folder of each table through the
OneLake Explorer (Preview) in Windows, as shown in the following screenshot.
Limitations
Currently, tables with inserts only are supported.
Currently, Delta Lake log checkpoint and vacuum functions are unavailable.
Table Names can only be used by Spark and other systems if they only contain
these characters: A-Z a-z 0-9 and underscores.
Column Names that will be used by Spark and other systems cannot contain spaces, tabs, carriage returns, or any of these characters: [ , ; { } ( ) = ]
Next steps
Query the Warehouse
How to use Microsoft Fabric notebooks
OneLake overview
Accessing shortcuts
Navigate the Fabric Lakehouse explorer
Clone table in Microsoft Fabric
Article • 06/29/2023
Microsoft Fabric offers the capability to create near-instantaneous zero-copy clones with
minimal storage costs.
You can use the CREATE TABLE AS CLONE OF T-SQL command to create a table clone.
For a tutorial, see Tutorial: Clone table using T-SQL.
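For instance, a zero-copy clone might be created as in the following sketch; the table names are illustrative.
SQL
-- Illustrative sketch: create a zero-copy clone of an existing table; names are placeholders.
CREATE TABLE [dbo].[FactSale_Clone] AS CLONE OF [dbo].[FactSale];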
) Important
There is no limit on the number of clones created both within and across schemas.
Separate and independent
Upon creation, a table clone is an independent and separate copy of the data from its
source. Changes made to the source table, such as adding new attributes or data, are
not reflected in the cloned table.
Similarly, any new attributes or data added to the cloned table are not applied to the
source table.
Users with Admin, Member, or Contributor workspace roles can clone the tables
within the workspace. The Viewer workspace role cannot create a clone.
SELECT permission on all the rows and columns of the source of the table clone is
required.
User must have CREATE TABLE permission in the schema where the table clone will
be created.
All attributes that exist at the source table are inherited by the table clone, whether
the clone was created within the same schema or across different schemas in a
warehouse.
The primary and unique key constraints defined in the source table are inherited
by the table clone.
A read-only delta log is created for every table clone that is created within the Warehouse. The data files stored as delta parquet files are read-only. This ensures that the data always stays protected from corruption.
Limitations
Table clones across warehouses in a workspace are not currently supported.
Table clones across workspaces are not currently supported.
The tables present in SQL Endpoint cannot be cloned through T-SQL.
Clone creation as of a previous point in time is not currently supported.
Clone of a warehouse or schema is currently not supported.
Next steps
CREATE TABLE AS CLONE OF
Tutorial: Clone table using T-SQL
Query the Warehouse
Statistics in Fabric data warehousing
Article • 07/09/2023
The Warehouse in Microsoft Fabric uses a query engine to create an execution plan for a
given SQL query. When you submit a query, the query optimizer tries to enumerate all
possible plans and choose the most efficient candidate. To determine which plan would
require the least overhead (I/O, CPU, memory), the engine needs to be able to evaluate
the amount of work or rows that might be processed at each operator. Then, based on
each plan's cost, it chooses the one with the least amount of estimated work. Statistics
are objects that contain relevant information about your data, to allow query optimizer
to estimate these costs.
) Important
User-defined statistics
The user issues DDL to create, update, and drop statistics as needed (see the sketch after this list).
Automatic statistics
The engine automatically creates and maintains statistics at query time.
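As a sketch of the user-defined path, the following statements create, update, and drop a single-column statistics object; the placeholders follow the convention used in the validation query later in this article.
SQL
-- Illustrative sketch of user-defined statistics maintenance; replace the placeholders.
CREATE STATISTICS <STATISTICS_NAME>
ON <YOUR_TABLE_NAME> (<COLUMN_NAME>) WITH FULLSCAN;

UPDATE STATISTICS <YOUR_TABLE_NAME> (<STATISTICS_NAME>) WITH FULLSCAN;

DROP STATISTICS <YOUR_TABLE_NAME>.<STATISTICS_NAME>;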
The following T-SQL objects can also be used to check both manually created and
automatically created statistics in Microsoft Fabric:
sys.stats catalog view
sys.stats_columns catalog view
STATS_DATE system function
SQL
SELECT <COLUMN_NAME>
FROM <YOUR_TABLE_NAME>
GROUP BY <COLUMN_NAME>;
In this case, you should expect statistics for COLUMN_NAME to have been created. If the column was also a varchar column, you would also see average column length statistics created. If you'd like to validate that statistics were automatically created, you can run the following query:
SQL
select
object_name(s.object_id) AS [object_name],
c.name AS [column_name],
s.name AS [stats_name],
s.stats_id,
STATS_DATE(s.object_id, s.stats_id) AS [stats_update_date],
s.auto_created,
s.user_created,
s.stats_generation_method_desc
FROM sys.stats AS s
INNER JOIN sys.objects AS o
ON o.object_id = s.object_id
INNER JOIN sys.stats_columns AS sc
ON s.object_id = sc.object_id
AND s.stats_id = sc.stats_id
INNER JOIN sys.columns AS c
ON sc.object_id = c.object_id
AND c.column_id = sc.column_id
WHERE o.type = 'U' -- Only check for stats on user-tables
AND s.auto_created = 1
AND o.name = '<YOUR_TABLE_NAME>'
ORDER BY object_name, column_name;
This query only looks for column-based statistics. If you'd like to see all statistics that
exist for this table, remove the JOINs on sys.stats_columns and sys.columns .
Now, you can find the statistics_name of the automatically generated histogram
statistic (should be something like _WA_Sys_00000007_3B75D760 ) and run the following T-
SQL:
SQL
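-- Sketch (the original snippet isn't preserved here): view the statistic's contents.
-- Replace the placeholders with your table name and the statistics_name found above.
DBCC SHOW_STATISTICS ('<YOUR_TABLE_NAME>', '<STATISTICS_NAME>');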
For example:
SQL
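-- Sketch using the illustrative statistic name mentioned above.
DBCC SHOW_STATISTICS ('<YOUR_TABLE_NAME>', '_WA_Sys_00000007_3B75D760');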
The Updated value in the result set of DBCC SHOW_STATISTICS should be a date (in UTC)
similar to when you ran the original GROUP BY query.
Histogram statistics
Created per column needing histogram statistics at query time.
These objects contain histogram and density information regarding the distribution of a particular column. They are similar to the statistics automatically created at query time in Azure Synapse Analytics dedicated pools.
Name begins with _WA_Sys_ .
Contents can be viewed with DBCC SHOW_STATISTICS.
Average column length statistics
Created for character columns (char and varchar) needing average column length at query time.
These objects contain a value representing the average row size of the varchar column at the time of statistics creation.
Name begins with ACE-AverageColumnLength_ .
Contents cannot be viewed and are not actionable by the user.
Table-based cardinality statistics
Created per table needing cardinality estimation at query time.
These objects contain an estimate of the rowcount of a table.
Named ACE-Cardinality .
Contents cannot be viewed and are not actionable by the user.
Limitations
Only single-column histogram statistics can be manually created and modified.
Multi-column statistics creation is not supported.
Other statistics objects may show under sys.stats aside from manually created
statistics and automatically created statistics. These objects are not used for query
optimization.
Next steps
Monitoring connections, sessions, and requests using DMVs
Caching in Fabric data warehousing
Article • 09/06/2023
) Important
Retrieving data from the data lake is a crucial input/output (IO) operation with substantial
implications for query performance. In Microsoft Fabric, Synapse Data Warehouse
employs refined access patterns to enhance data reads from storage and elevate query
execution speed. Additionally, it intelligently minimizes the need for remote storage
reads by leveraging local caches.
There are two types of caches that are described later in this article:
In-memory cache
Disk cache
In-memory cache
As the query accesses and retrieves data from storage, it performs a transformation
process that transcodes the data from its original file-based format into highly
optimized structures in the in-memory cache.
Data in cache is organized in a compressed columnar format optimized for analytical
queries. Each column of data is stored together, separate from the others, allowing for
better compression since similar data values are stored together, leading to reduced
memory footprint. When queries need to perform operations on a specific column like
aggregates or filtering, the engine can work more efficiently since it doesn't have to
process unnecessary data from other columns.
Additionally, this columnar storage is also conducive to parallel processing, which can
significantly speed up query execution for large datasets. The engine can perform
operations on multiple columns simultaneously, taking advantage of modern multi-core
processors.
This approach is especially beneficial for analytical workloads where queries involve
scanning large amounts of data to perform aggregations, filtering, and other data
manipulations.
Disk cache
Certain datasets are too large to be accommodated within an in-memory cache. To
sustain rapid query performance for these datasets, Warehouse utilizes disk space as a
complementary extension to the in-memory cache. Any information that is loaded into
the in-memory cache is also serialized to the SSD cache.
Given that the in-memory cache has a smaller capacity than the SSD cache, data that is removed from the in-memory cache remains within the SSD cache for an extended period. When a subsequent query requests this data, it is retrieved from the SSD cache into the in-memory cache at a significantly quicker rate than if it were fetched from remote storage, ultimately providing you with more consistent query performance.
Cache management
Caching remains consistently active and operates seamlessly in the background,
requiring no intervention on your part. Disabling caching is not needed, as doing so
would inevitably lead to a noticeable deterioration in query performance.
The caching mechanism is orchestrated and maintained by Microsoft Fabric itself, and it doesn't offer users the capability to manually clear the cache.
Full cache transactional consistency ensures that any modifications to the data in storage, such as through Data Manipulation Language (DML) operations after it has been initially loaded into the in-memory cache, result in consistent data.
When the cache reaches its capacity threshold and fresh data is being read for the first
time, objects that have remained unused for the longest duration will be removed from
the cache. This process is enacted to create space for the influx of new data and
maintain an optimal cache utilization strategy.
Next steps
Synapse Data Warehouse in Microsoft Fabric performance guidelines
This article describes the architecture and workload management behind data
warehousing in Microsoft Fabric.
) Important
Data processing
The Warehouse and SQL Endpoint share the same underlying processing architecture.
As data is retrieved or ingested, it leverages a distributed engine built for both small and
large-scale data and computational functions.
The processing system is serverless in that backend compute capacity scales up and
down autonomously to meet workload demands.
When a query is submitted, the SQL frontend (FE) performs query optimization to
determine the best plan based on the data size and complexity. Once the plan is
generated, it is given to the Distributed Query Processing (DQP) engine. The DQP
orchestrates distributed execution of the query by splitting it into smaller queries that
are executed on backend compute nodes. Each small query is called a task and
represents a distributed execution unit. It reads file(s) from OneLake, joins results from
other tasks, groups, or orders data retrieved from other tasks. For ingestion jobs, it also
writes data to the proper destination tables.
When data is processed, results are returned to the SQL frontend for serving back to the
user or calling application.
The system is fault tolerant and if a node becomes unhealthy, operations executing on
the node are redistributed to healthy nodes for completion.
As queries arrive, their tasks are scheduled based on first-in-first-out (FIFO) principles. If
there is idle capacity, the scheduler may use a "best fit" approach to optimize
concurrency.
When the scheduler identifies resourcing pressure, it invokes a scale operation. Scaling
is managed autonomously and backend topology grows as concurrency increases. As it
takes a few seconds to acquire nodes, the system is not optimized for consistent
subsecond performance of queries that require distributed processing.
When pressure subsides, backend topology scales back down and releases resource
back to the region.
Ingestion isolation
Applies to: Warehouse in Microsoft Fabric
In the backend compute pool of Warehouse in Microsoft Fabric, loading activities are
provided resource isolation from analytical workloads. This improves performance and
reliability, as ingestion jobs can run on dedicated nodes that are optimized for ETL and
do not compete with other queries or applications for resources.
Best practices
The Microsoft Fabric workspace provides a natural isolation boundary of the distributed
compute system. Workloads can take advantage of this boundary to manage both cost
and performance.
This article explains compute usage reporting for the Synapse Data Warehouse in
Microsoft Fabric, which includes read and write activity against the Warehouse, and read
activity on the SQL Endpoint of the Lakehouse.
When you use a Fabric capacity, your usage charges appear in the Azure portal under
your subscription in Microsoft Cost Management. To understand your Fabric billing, visit
Understand your Azure bill on a Fabric capacity.
Capacity
In Fabric, based on the Capacity SKU purchased, you're entitled to a set of Capacity
Units (CUs) that are shared across all Fabric workloads. For more information on licenses
supported, see Microsoft Fabric licenses.
CUs consumed by data warehousing include read and write activity against the
Warehouse, and read activity on the SQL Endpoint of the Lakehouse.
The "CPU time" metric captures usage of compute resources when requests are
executed. "CPU time" isn't the same as elapsed time, it's the time spent by compute
cores in execution of a request. Similar to how Windows accounts for Processor Time,
multi-threaded workloads record more than one second of "CPU time" per second.
Once you have installed the app, select Warehouse from the Select item kind dropdown list. The Multi metric ribbon chart and the Items (14 days) data table now show only Warehouse activity.
Both the Warehouse and SQL Endpoint roll up under Warehouse in the Metrics app, as
they both use SQL compute. The operation categories seen in this view are:
Warehouse Query: Compute charge for all user generated and system generated
T-SQL statements within a warehouse.
SQL Endpoint Query: Compute charge for all user generated and system
generated T-SQL statements within a SQL Endpoint.
OneLake Compute: Compute charge for all reads and writes for data stored in
OneLake.
Billing example
Consider the following query:
SQL
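-- Illustrative query only; the original sample isn't preserved here.
-- Any user-submitted statement accrues "CPU time" while it executes.
SELECT SalespersonKey, SUM(Profit) AS TotalProfit
FROM [dbo].[FactSale]
GROUP BY SalespersonKey;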
For demonstration purposes, assume the billing metric accumulates 100 CPU seconds.
The cost of this query is CPU time times the price per CU. Assume in this example that
the price per CU is $0.18/hour. The cost would be (100 x 0.18)/3600 = $0.005.
The numbers used in this example are for demonstration purposes only and not actual
billing metrics.
Considerations
Consider the following usage reporting nuances:
Cross database reporting: When a T-SQL query joins across multiple warehouses
(or across a Warehouse and a SQL Endpoint), usage is reported against the
originating resource.
Queries on system catalog views and dynamic management views are billable
queries.
The Duration(s) field reported in the Fabric Capacity Metrics app is for informational purposes only. It reflects the statement execution duration and might not include the complete end-to-end duration for rendering results back to the web application, such as the SQL query editor, or to client applications such as SQL Server Management Studio and Azure Data Studio.
Related content
Monitor connections, sessions, and requests using DMVs
Workload management
Synapse Data Warehouse in Microsoft Fabric performance guidelines
What is the Microsoft Fabric Capacity Metrics app?
Smoothing and throttling in Fabric Data Warehousing
Understand your Azure bill on a Fabric capacity
Understand the metrics app overview page
Next step
How to: Observe Synapse Data Warehouse utilization trends
Smoothing and throttling in Fabric Data
Warehousing
Article • 10/04/2023
This article details the concepts of smoothing and throttling in workloads using
Warehouse and SQL Endpoint in Microsoft Fabric.
This article is specific to data warehousing workloads in Microsoft Fabric. For all Fabric
workloads, visit Throttling in Microsoft Fabric.
) Important
Compute capacity
Capacity forms the foundation in Microsoft Fabric and provides the computing power
that drives all Fabric workload experiences. Based on the Capacity SKU purchased, you're
entitled to a set of Capacity Units (CUs) that are shared across Fabric. You can review the
CUs for each SKU at Capacity and SKUs.
Smoothing
Capacities have periods where they're under-utilized (idle) and over-utilized (peak).
When a capacity is running multiple jobs, a sudden spike in compute demand may be
generated that exceeds the limits of a purchased capacity.
Smoothing offers relief for customers who create sudden spikes during their peak times
while they have a lot of idle capacity that is unused. Smoothing simplifies capacity
management by spreading the evaluation of compute to ensure that customer jobs run
smoothly and efficiently.
For interactive jobs run by users: capacity consumption is typically smoothed over
a minimum of 5 minutes, or longer, to reduce short-term temporal spikes.
For scheduled, or background jobs: capacity consumption is spread over 24 hours,
eliminating the concern for job scheduling or contention.
Throttling
Throttling occurs when a customer's capacity consumes more CPU resources than were purchased. After consumption is smoothed, capacity throttling policies are checked based on the amount of future capacity consumed, which can result in a degraded end-user experience. When a capacity enters a throttled state, it only affects operations that are requested after the capacity has begun throttling.
Throttling policies are applied at a capacity level, meaning that while one capacity, or set
of workspaces, may be experiencing reduced performance due to being overloaded,
other capacities may continue running normally.
Future Smoothed Consumption - Policy Limits | Throttling Policy | Experience Impact
10 minutes < Usage <= 60 minutes | Interactive Delay | User-requested interactive jobs are delayed 20 seconds at submission.
60 minutes < Usage <= 24 hours | Interactive Rejection | User-requested interactive-type jobs are rejected.
Usage > 24 hours | Background Rejection | All new jobs are rejected from execution. This is the category for most Warehouse operations.
All Warehouse and SQL Endpoint operations follow "Background Rejection" policy, and
as a result experience operation rejection only after over-utilization averaged over a 24-
hour period.
Throttling considerations
Any in-flight operations, including long-running queries, stored procedures, and batches, won't get throttled mid-way. Throttling policies are applicable to the next operation after consumption is smoothed.
Almost all Warehouse requests are considered background. Some requests may
trigger a string of operations that are throttled differently. This can make a
background operation become subject to throttling as an interactive operation.
Some Warehouse operations in the Fabric Portal may be subject to the "Interactive
Rejection" policy, as they invoke other Power BI services. Examples include creating
a warehouse, which invokes a call to Power BI to create a default dataset, and
loading the "Model" page, which invokes a call to Power BI modeling service.
Just like most Warehouse operations, dynamic management views (DMVs) are also
classified as background and covered by the "Background Rejection" policy. Even
though DMVs are not available, capacity admins can go to Microsoft Fabric
Capacity Metrics app to understand the root cause.
If you attempt to issue a T-SQL query when the "Background Rejection" policy is enabled, you may see the error message: Your request was rejected due to resource constraints. Try again later.
If you attempt to connect to a warehouse via a SQL connection string when the "Background Rejection" policy is enabled, you may see the error message: Your request was rejected due to resource constraints. Try again later (Microsoft SQL Server, Error: 18456).
Utilization tab
This screenshot shows when the "Autoscale %" (the yellow line) was enabled to prevent
throttling of peak utilization. When the "Interactive %" (red line) exceeded the CU limit,
throttling policies were in effect. This example doesn't indicate any throttling of
background operations in capacity.
Throttling tab
To monitor and analyze throttling policies, a throttling tab is added to the usage graph.
With this, capacity admins can easily observe future usage as a percentage of each limit,
and even drill down to specific workloads that contributed to an overage. For more
information, refer to Throttling in the Metrics App.
Overages Tab
The Overages tab provides a visual history of any overutilization of capacity, including
carry forward, cumulative, and burndown of utilization. For more information, refer to
Throttling in Microsoft Fabric and Overages in the Microsoft Fabric Capacity Metrics app.
Related content
Billing and utilization reporting in Synapse Data Warehouse
What is the Microsoft Fabric Capacity Metrics app?
How to: Observe Synapse Data Warehouse utilization trends
Synapse Data Warehouse in Microsoft Fabric performance guidelines
Understand your Azure bill on a Fabric capacity
Throttling in Microsoft Fabric
You can use existing dynamic management views (DMVs) to monitor connection,
session, and request status in Microsoft Fabric. For more information about the tools
and methods of executing T-SQL queries, see Query the Warehouse.
) Important
sys.dm_exec_connections
Returns information about each connection established between the warehouse
and the engine.
sys.dm_exec_sessions
Returns information about each session authenticated between the item and
engine.
sys.dm_exec_requests
Returns information about each active request in a session.
SQL
SELECT *
FROM sys.dm_exec_sessions;
SQL
SELECT connections.connection_id,
connections.connect_time,
sessions.session_id, sessions.login_name, sessions.login_time,
sessions.status
FROM sys.dm_exec_connections AS connections
INNER JOIN sys.dm_exec_sessions AS sessions
ON connections.session_id=sessions.session_id;
SQL
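-- Sketch (the original snippet isn't preserved here): find the longest-running requests.
SELECT request_id, session_id, start_time, total_elapsed_time
FROM sys.dm_exec_requests
ORDER BY total_elapsed_time DESC;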
This second query shows which user ran the session that has the long-running query.
SQL
SELECT login_name
FROM sys.dm_exec_sessions
WHERE session_id = 'SESSION_ID WITH LONG-RUNNING QUERY';
This third query shows how to use the KILL command on the session_id with the long-
running query.
SQL
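-- Template: replace the placeholder with the session_id found above.
KILL 'SESSION_ID WITH LONG-RUNNING QUERY';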
For example
SQL
KILL '101'
Permissions
An Admin has permissions to execute all three DMVs ( sys.dm_exec_connections ,
sys.dm_exec_sessions , sys.dm_exec_requests ) to see their own and others'
information within a workspace.
A Member, Contributor, and Viewer can execute sys.dm_exec_sessions and
sys.dm_exec_requests and see their own results within the warehouse, but does
Learn how to observe trends and spikes in your data warehousing workload in Microsoft
Fabric using the Microsoft Fabric Capacity Metrics app.
The Microsoft Fabric Capacity Metrics app provides visibility into capacity usage for all
Fabric workloads in one place. It's mostly used by capacity administrators to monitor the
performance of workloads and their usage, compared to purchased capacity.
Prerequisites
Have a Microsoft Fabric license, which grants Capacity Units (CUs) shared across all Fabric workloads.
Add the Microsoft Fabric Capacity Metrics app from AppSource.
This graph can provide high-level CU trends over the last 14 days to see which Fabric workload has used the most CUs.
1. Use the Items table to identify specific warehouses consuming the most compute. The Items table below the multi metric ribbon chart provides aggregated consumption at the item level. In this view, for example, you can identify which items have consumed the most CUs.
2. Select "Warehouse" in the Select item kind(s) dropdown list.
3. Sort by CU(s) descending.
This graph shows granular usage as a list of operations at the selected timepoint.
The yellow dotted line provides visibility into the upper SKU limit boundary, based on the SKU purchased and whether autoscale is enabled for the capacity.
When you zoom in and select a specific timepoint, you can observe the usage at the CU limit. With a specific timepoint or range selected, select the Explore button.
Apply a filter to drill down into specific warehouse usage in the familiar Filter
pane. Expand the ItemKind list and Warehouse.
Sort by total CU(s) descending.
In this example, you can identify the users, operations, start/stop times, and durations that consumed the most CUs.
The table includes an Operation Id for a specific operation. This is the unique identifier that can be used in other monitoring tools, such as dynamic management views (DMVs), for end-to-end traceability, such as in dist_statement_id in sys.dm_exec_requests (see the sketch after this list).
The table of operations also provides a list of operations that are InProgress, so you can identify long-running queries and their current CU consumption.
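As a sketch of that traceability, the following query looks up requests by Operation Id; the GUID value shown is a placeholder, and the query assumes the dist_statement_id column described above.
SQL
-- Illustrative sketch: correlate a Metrics app Operation Id with a request.
-- The GUID below is a placeholder; use the Operation Id from the Items table.
SELECT *
FROM sys.dm_exec_requests
WHERE dist_statement_id = '00000000-0000-0000-0000-000000000000';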
Related content
Billing and utilization reporting in Synapse Data Warehouse
Monitor connections, sessions, and requests using DMVs
Workload management
Synapse Data Warehouse in Microsoft Fabric performance guidelines
What is the Microsoft Fabric Capacity Metrics app?
Smoothing and throttling in Fabric Data Warehousing
) Important
1. Refer to the article about statistics to verify proper column statistics have been
created on all tables.
2. Ensure all table statistics are updated after large DML transactions.
3. Queries with complex JOIN, GROUP BY, and ORDER BY clauses that are expected to return a large result set use more tempdb space during execution. Update queries to reduce the number of GROUP BY and ORDER BY columns if possible.
4. Check for data skew in base tables.
5. Rerun the query when there are no other active queries running to avoid resource constraints during query execution.
6. Pause and resume the service to flush tempdb data.
1. Identify the differences in all performance-affecting factors among good and bad
performance runs.
2. Refer to the article about statistics to verify proper column statistics have been
created on all tables.
3. Ensure all table statistics are updated after large DML transactions.
4. Check for data skew in base tables.
5. Pause and resume the service. Then, rerun the query when there are no other active queries running. You can monitor the warehouse workload using DMVs.
Next steps
Monitoring connections, sessions, and requests using DMVs
Limitations in Microsoft Fabric
Article • 07/12/2023
) Important
Limitations
Data Warehousing in Microsoft Fabric is currently in preview. The focus of this preview is
on providing a rich set of SaaS features and functionality tailored to all skill levels. The
preview delivers on the promise of providing a simplified experience through an open
data format over a single copy of data. This release is not focused on performance,
concurrency, and scale. Additional functionality will build upon the world class, industry-
leading performance and concurrency story, and will land incrementally as we progress
towards General Availability of data warehousing in Microsoft Fabric.
Current general product limitations for Data Warehousing in Microsoft Fabric are listed
in this article, with feature level limitations called out in the corresponding feature
article.
IMPORTANT At this time, there's limited T-SQL functionality, and certain T-SQL
commands can cause warehouse corruption. See T-SQL surface area for a list of T-
SQL command limitations.
Warehouse recovery capabilities are not available during preview.
Data warehousing is not supported for multiple geographies at this time. Your
Warehouse and Lakehouse items should not be moved to a different region during
preview.
Delta tables created outside of the /tables folder aren't available in the SQL
Endpoint.
If you don't see a Lakehouse table in the warehouse, check the location of the
table. Only the tables that are referencing data in the /tables folder are available
in the warehouse. The tables that reference data in the /files folder in the lake
aren't exposed in the SQL Endpoint. As a workaround, move your data to the
/tables folder.
Some columns that exist in the Spark Delta tables might not be available in the
tables in the SQL Endpoint. Refer to the Data types for a full list of supported data
types.
If you add a foreign key constraint between tables in the SQL Endpoint, you won't
be able to make any further schema changes (for example, adding the new
columns). If you don't see the Delta Lake columns with the types that should be
supported in SQL Endpoint, check if there is a foreign key constraint that might
prevent updates on the table.
Known issues
For known issues in Microsoft Fabric, visit Microsoft Fabric Known Issues .
Next steps
Get Started with Warehouse
Transact-SQL reference (Database
Engine)
Article • 07/12/2023
Applies to: SQL Server Azure SQL Database Azure SQL Managed Instance
Azure Synapse Analytics Analytics Platform System (PDW) SQL Endpoint in
Microsoft Fabric Warehouse in Microsoft Fabric
This article gives the basics about how to find and use the Microsoft Transact-SQL (T-
SQL) reference articles. T-SQL is central to using Microsoft SQL products and services. All
tools and applications that communicate with a SQL Server database do so by sending
T-SQL commands.
For example, this article applies to all versions, and has the following label.
Applies to: SQL Server Azure SQL Database Azure SQL Managed Instance
Azure Synapse Analytics Analytics Platform System (PDW)
Another example, the following label indicates an article that applies only to Azure
Synapse Analytics and Parallel Data Warehouse.
In some cases, the article is used by a product or service, but all of the arguments aren't
supported. In this case, other Applies to sections are inserted into the appropriate
argument descriptions in the body of the article.
Next steps
Tutorial: Writing Transact-SQL Statements
Transact-SQL Syntax Conventions (Transact-SQL)
Microsoft Learn documentation
contributor guide overview
Article • 02/16/2023
Sharing your expertise with others on Microsoft Learn helps everyone achieve more. Use
the information in this guide to publish a new article to Microsoft Learn or make
updates to an existing published article.
Several of the Microsoft documentation sets are open source and hosted on GitHub.
Not all document sets are completely open source, but many have public-facing repos
where you can suggest changes via pull requests (PR). This open-source approach
streamlines and improves communication between product engineers, content teams,
and customers, and it has other advantages:
Open-source repos plan in the open to get feedback on what docs are most
needed.
Open-source repos review in the open to publish the most helpful content on our
first release.
Open-source repos update in the open to make it easier to continuously improve
the content.
The user experience on Microsoft Learn integrates GitHub workflows directly to make
it even easier. Start by editing the document you're viewing. Or help by reviewing new
topics or creating quality issues.
) Important
All repositories that publish to Microsoft Learn have adopted the Microsoft Open
Source Code of Conduct or the .NET Foundation Code of Conduct . For more
information, see the Code of Conduct FAQ . Contact [email protected]
or [email protected] with any questions or comments.
1. Some docs pages allow you to edit content directly in the browser. If so, you'll see
an Edit button like the one shown below. Choosing the Edit (or equivalently
localized) button takes you to the source file on GitHub.
If the Edit button isn't present, it means the content isn't open to public
contributions. Some pages are generated (for example, from inline documentation
in code) and must be edited in the project they belong to.
2. Select the pencil icon to edit the article. If the pencil icon is grayed out, you need
to either log in to your GitHub account or create a new account.
3. Edit the file in the web editor. Choose the Preview tab to check the formatting of
your changes.
4. When you're finished editing, scroll to the bottom of the page. In the Propose
changes area, enter a title and optionally a description for your changes. The title
will be the first line of the commit message. Select Propose changes to create a
new branch in your fork and commit your changes:
5. Now that you've proposed and committed your changes, you need to ask the
owners of the repository to "pull" your changes into their repository. This is done
using something called a "pull request" (PR). When you select Propose changes, a
new page similar to the following is displayed:
Select Create pull request. Next, enter a title and a description for the PR, and then
select Create pull request. If you're new to GitHub, see About pull requests for
more information.
6. That's it! Content team members will review your PR and merge it when it's
approved. You may get feedback requesting changes.
The GitHub editing UI responds to your permissions on the repository. The preceding
images are for contributors who don't have write permissions to the target repository.
GitHub automatically creates a fork of the target repository in your account. The newly created fork name has the form GitHubUsername/RepositoryName by default. If you have write access to the target repository, such as your fork, GitHub creates a new branch in the target repository. The branch name has the default form patch-n, using a numeric identifier for the patch branch.
We use PRs for all changes, even for contributors who have write access. Most
repositories protect the default branch so that updates must be submitted as PRs.
The in-browser editing experience is best for minor or infrequent changes. If you make
large contributions or use advanced Git features (such as branch management or
advanced merge conflict resolution), you need to fork the repo and work locally.
7 Note
Most localized documentation doesn't offer the ability to edit or provide feedback
through GitHub. To provide feedback on localized content, use
https://aka.ms/provide-feedback form.
Issues start the conversation about what's needed. The content team will respond to
these issues with ideas for what we can add, and ask for your opinions. When we create
a draft, we'll ask you to review the PR.