
Master Microsoft Fabric: A Complete End-to-End Project - CICD
An end-to-end project with Continuous Integration and Continuous Deployment (CICD)
Covers: Fabric Lakehouse, Fabric Warehouse, OneLake, Data Factory

Author: Shanmukh Sattiraju
https://in.linkedin.com/in/shanmukh-sattiraju
End to End flow
Raw → Landing → Bronze → Silver → Gold
Prerequisites
• No prior experience with Microsoft Fabric is needed
• An Azure account for hands-on practice
• Basic knowledge of Python/PySpark and SQL/Spark SQL
• Basic knowledge of the Azure cloud environment
What you'll get from this course
• 21+ hours of updated learning content
• A deep dive into the components of Microsoft Fabric
• A hands-on end-to-end project
• Implementing CICD in Microsoft Fabric
• Lifetime access to this course
• A certificate of completion at the end of the course
Fabric Project Architecture
LMS data → Landing → Bronze → Silver → Gold → Power BI (data analysis) and Data Science
Analyze LMS data with Power BI

Continuous Integration and Continuous Deployment (CICD) in Microsoft Fabric
Dev Workspace → Prod Workspace
Learning Structure
• Understanding Microsoft Fabric
• Lakehouse
• Data Factory
• OneLake
• Synapse Data Engineering
• Synapse Data Warehouse
• Access Control and Permissions
• Power BI in Fabric
• End to End project
• Git Integration
• Synapse migration to Fabric
• Capacity Metrics App
Environment Setup

Microsoft Fabric
A unified analytics solution for the era of AI
Evolution of Data Architectures
Data Warehouse → Modern Data Warehouse (Data lake) → Lakehouse Architecture (Delta lake)
Lakehouse Architecture
Lakehouse = the best elements of the data lake + the best elements of the data warehouse
Lakehouse Architecture
BI Reports · Data Science · ML
Metadata and caching layer
Data lake
Structured, semi-structured & unstructured data
How to create a Delta Lake table?
Instead of parquet, replace the format with delta:

dataframe.write \
    .format("parquet") \
    .save("/data/")

becomes

dataframe.write \
    .format("delta") \
    .save("/data/")
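For completeness, a minimal sketch of reading the table back, assuming a Spark environment with Delta Lake available (as in a Fabric notebook) and the /data/ path from the slide above:

from delta.tables import DeltaTable

# Read the Delta table written above
df = spark.read.format("delta").load("/data/")
df.show(5)

# The transaction log enables table history (time travel, auditing)
DeltaTable.forPath(spark, "/data/").history().show()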
Delta format
Delta = Parquet files + a transaction log, stored in Azure Data Lake Storage
Why Microsoft Fabric?

Typical Data workflow
Source → ETL → Report
Typical Data workflow
Source → Ingest → Transform → Load → Store → Report
Azure Data Services
Services: Azure Data Factory, Synapse Analytics, SQL DW, Power BI, Data Lake, Data Explorer
Computes: Serverless SQL pool, Dedicated SQL pool, Spark pool, Data Explorer pool
Microsoft Fabric
Workloads: Data Factory, Synapse Data Engineering, Synapse Data Warehousing, Synapse Data Science, Real-Time Analytics, Power BI, Data Activator
OneLake
SaaS Foundation
Why Microsoft Fabric?
Microsoft's definition:
"Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, Real-Time Analytics, and business intelligence. It offers a comprehensive suite of services, including data lake, data engineering, and data integration, all in one place."
How to enable / access Microsoft Fabric?
Fabric Capacity License
SKU | Capacity Units (CU) | Power BI SKU | Power BI v-cores
F2 | 2 | - | 0.25
F4 | 4 | - | 0.5
F8 | 8 | EM/A1 | 1
F16 | 16 | EM2/A2 | 2
F32 | 32 | EM3/A3 | 4
F64 | 64 | P1/A4 | 8
Trial | 64 | - | 8
F128 | 128 | P2/A5 | 16
F256 | 256 | P3/A6 | 32
F512 | 512 | P4/A7 | 64
F1024 | 1024 | P5/A8 | 128
F2048 | 2048 | - | 256
Note: for SKUs below F64, consuming Power BI content additionally requires Power BI Pro licenses.
Reference: Microsoft Fabric concepts - Microsoft Fabric | Microsoft Learn
Spark nodes per capacity
Reference: Configure and manage starter pools in Fabric Spark - Microsoft Fabric | Microsoft Learn
Cost of using Fabric?
Compute cost:
SKU | Capacity Units (CU) | Pay-as-you-go
F2 | 2 | $0.36/hour
F4 | 4 | $0.72/hour
F8 | 8 | $1.44/hour
F16 | 16 | $2.88/hour
F32 | 32 | $5.76/hour
F64 | 64 | $11.52/hour
F128 | 128 | $23.04/hour
F256 | 256 | $46.08/hour
F512 | 512 | $92.16/hour
F1024 | 1024 | $184.32/hour
F2048 | 2048 | $368.64/hour
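A quick sanity check on the pricing model (illustrative arithmetic, not a price quote): every SKU in the table works out to the same unit rate, $0.36/hour ÷ 2 CU = $0.18 per CU per hour. So F64 costs 64 × $0.18 = $11.52/hour, or roughly $11.52 × 730 ≈ $8,410 for a month of always-on pay-as-you-go capacity; pausing the capacity or reserved pricing reduces this.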
Cost of using Fabric?
Storage cost:
Storage | Price
OneLake storage/month | $0.023 per GB
OneLake BCDR storage/month | $0.0414 per GB
OneLake cache/month | $0.246 per GB
Resources: Microsoft Fabric - Pricing | Microsoft Azure; Pricing Calculator | Microsoft Azure
Benefits of Fabric
• One environment for everyone
• The SaaS platform makes development easier
• Universal compute
• All data is stored in the open-source Delta Lake format
• OneLake = one copy of data
• Sharing data between workspaces
• OneSecurity
Components / Features of Fabric
Experiences of Microsoft Fabric

Fabric Home

Experiences

Power BI

Data Engineering

Fabric Terminology
Fabric home → Experience → Workspace → Item
Each experience surfaces workspaces, and each workspace contains items.
Fabric Terminology
Example:
• Experience: Data Engineering → Workspace: Finance → Items: Data pipeline, Notebook
• Experience: Power BI → Workspace: Marketing → Items: Semantic Model, Report
OneLake
Before Fabric: each team or project provisions its own data lake. After Fabric: a single OneLake for the entire organization.
OneLake
One copy of the data
One Copy for all computes
Workloads: Data Factory, Synapse Data Engineering, Synapse Data Warehousing, Synapse Data Science, Real-Time Analytics, Power BI, Data Activator
Serverless compute engines: Spark, SQL, KQL, Analysis Services
Items: Lakehouse, Warehouse, Semantic model
OneLake: every item is stored once, in the Delta-Parquet format
Lakehouse

Access Microsoft Fabric
Microsoft Fabric Tenant → Capacities → Workspaces
A tenant contains one or more capacities, and each capacity hosts workspaces (e.g., Team A, Team B, Team C, Finance, Sales).
Workspace roles
Each workspace role contains permissions that allow users to perform certain actions.
Role | Can add admins? | Can add members? | Can write data and create items? | Can read data?
Admin | Yes | Yes | Yes | Yes
Member | No | Yes | Yes | Yes
Contributor | No | No | Yes | Yes
Viewer | No | No | No | Yes
Creating a Lakehouse
Creating a lakehouse produces three items:
• Lakehouse
• Semantic Model (default)
• SQL analytics endpoint
Inside the Lakehouse (Lakehouse Explorer)
• Tables (the managed area): contains only tables, whether they were automatically generated or explicitly created and registered in the metastore.
• Unidentified: displays any folders or files present in the managed area that lack associated tables. If a table is created in a non-delta format, it automatically gets saved in the Unidentified folder.
• Files (the unmanaged area): the "landing zone" for raw data ingested from various sources; not registered in the metastore.
Upload data to Lakehouse
To see some features of the Lakehouse we need sample data, so we upload a file:
1. Upload a CSV to the Tables section (it will show as Unidentified)
2. Upload a CSV to the Files section
OneLake Explorer
• An application that integrates OneLake with Windows File Explorer
• You can install it and use it like OneDrive
• Installation URL: Access Fabric data locally with OneLake file explorer - Microsoft Fabric | Microsoft Learn
SQL analytics endpoint of the Lakehouse
You can perform the following actions in the SQL analytics endpoint:
• Query the tables that reference data in your Delta Lake folders in the lake.
• Create views, inline TVFs, and procedures to encapsulate your semantics and business logic in T-SQL.
• Manage permissions on the objects.
Access the SQL analytics endpoint using SSMS
1. Copy the SQL connection string
2. Paste it into SSMS
3. Use the lakehouse name as the database name
4. Authenticate with the same credentials as Fabric
SQL endpoint - Visual query
1. A GUI for querying tables in your lakehouse.
2. Drag one or more tables onto the canvas and use the visual experience to design your queries.
3. Save the visual queries as SQL views.
Semantic model
• A source of data ready for reporting and visualization.
• Created by default when creating a Lakehouse.
• All tables and views created in the lakehouse are synced to the default semantic model.
• You can create a Power BI report using this, or create a new semantic model and build the report on that.
Data Factory

Ways to load data into Lakehouse
1. Local file/folder upload ✓
2. Copy tool in pipelines -----------> Data Factory
3. Dataflow Gen2 ------------------> Data Factory
4. Notebook code
5. Shortcut

Considerations when loading data
Use case | Recommendation
Small file upload from local machine | Use local file upload
Small data or a specific connector | Use Dataflows
Large data source | Use the Copy tool in pipelines
Complex data transformations | Use Notebook code
Ingest on-premises SQL Server data into OneLake
Azure Data Factory | Fabric Data Factory
Self-hosted Integration Runtime | On-premises Gateway
Linked Service to SQL Server | Connection to SQL Server
Dataset + Pipeline | Pipeline
Azure Data Lake | OneLake
Data Gateway Types
• On-premises Data Gateway: provides quick and secure data transfer between on-premises data and several Microsoft cloud services, such as Power BI and Power Apps.
• VNet Data Gateway: a virtual network (VNet) data gateway helps you connect from Microsoft cloud services to your Azure data services within a virtual network.
Installing the on-premises gateway
• Install a Standard Gateway using: Install an on-premises data gateway | Microsoft Learn
Connections
Connections establish authentication to data sources, like linked services in Azure Data Factory / Synapse Analytics.
Pipeline Triggers in Fabric
Triggers invoke pipelines in Data Factory. The trigger types are:
1. Schedule trigger
2. Storage event trigger (preview) - only for Azure Blob Storage
Dataflow Gen2
Source → Transform → Destination
Data pipeline

Dataflow Gen2
Datalake Gen2 → Dataflow Gen2 → Lakehouse
Required role on the data lake: Storage Blob Data Contributor
Data Factory summary
• Ingest data and schedule workflows
• Prep, clean and transform data
OneLake

Ways to load data into Lakehouse
1. Local file/folder upload ✓
2. Copy tool in pipelines -----------> Data Factory ✓
3. Dataflow Gen2 ------------------> Data Factory ✓
4. Shortcut ------------------> OneLake
5. Notebook code ------------------> Data Engineering

Shortcut
Sources: Azure Datalake, Amazon S3, Dataverse, Google Cloud Storage (preview) → OneLake → Lakehouse
Shortcuts in Fabric

Creating a shortcut
1. Source location
   A. Lakehouse (internal to Fabric)
   B. Warehouse (internal to Fabric)
   C. Azure Datalake (external to Fabric)
   D. Amazon S3 (external to Fabric)
   E. Dataverse (external to Fabric)
2. Authentication to the source data
3. Destination
   A. Lakehouse
   B. KQL Database
Creating a shortcut in Files
Azure Datalake Gen2 (container: shortcutfile, subfolder: Emp, files: Emp1.csv, Emp2.csv, etc.) → Lakehouse, Files section, folder: Emp
Required role: Storage Blob Data Contributor
Creating a shortcut in Tables
Azure Datalake Gen2 (container: shortcutfile, subfolder: Emp, files: Emp1.csv, Emp2.csv, etc.) → Lakehouse, Tables section
Required role: Storage Blob Data Contributor
Creating a shortcut in Tables
Azure Datalake Gen2 (container: shortcutdeltaroot, files in Delta format) → Lakehouse, Tables section ✓
Required role: Storage Blob Data Contributor
Creating a delta file: a Delta file is a Parquet file plus a transaction log; once the Parquet data is converted to Delta, the shortcut is recognized as a table.
Creating a shortcut in Tables
Azure Datalake Gen2 (container: shortcutdeltasub, files under data/<delta format>) → Lakehouse, Tables section
Required role: Storage Blob Data Contributor
Creating a shortcut in Tables
Azure Datalake Gen2 (container: shortcutparquet, files in Parquet format) → Lakehouse, Tables section
Required role: Storage Blob Data Contributor
Shortcuts in Fabric
Shortcuts in Tables
• In the Tables folder, you can only create shortcuts at the top level. Shortcuts aren't supported in other subdirectories of the Tables folder.
• Data must be in Delta/Parquet format so that the lakehouse automatically synchronizes the metadata and recognizes the folder as a table.
Shortcuts in Fabric
Shortcuts in Files
• If the data at your shortcut location is organized in subdirectories, store it in Files.
• If the data is not in Delta/Parquet format, store it in Files.
Shortcut updating scenario
Azure Datalake Gen2 (container: shortcutdeltaroot, files in Delta format) → Lakehouse
Updating data in a shortcut table
• If the shortcut's connection has only read access to ADLS, updating data from the Lakehouse side fails.
• With write access, updating data from the Lakehouse side succeeds and the change is written back to Azure Datalake Gen2. ✓
Updating data in the data lake
Updating data directly in Azure Datalake Gen2 is reflected in the Lakehouse shortcut. ✓
Shortcut deletion scenarios
Azure Datalake Gen2 (container: shortcutdeltaroot, files in Delta format) → Lakehouse
Scenario 1: Delete content in a shortcut in the Files section
Deleting data through the Files shortcut in the Lakehouse also deletes it in Azure Datalake Gen2.
Scenario 2: Delete specific content in ADLS
Deleting data directly in Azure Datalake Gen2 also removes it from the Lakehouse Files shortcut.
Scenario 3: Delete content in a shortcut in the Tables section
Deleting data through the Tables shortcut in the Lakehouse also deletes it in Azure Datalake Gen2.
Scenario 4: Delete specific content of a delta table in ADLS
Deleting delta table data directly in Azure Datalake Gen2 also removes it from the Lakehouse Tables shortcut.
Scenario 5: Delete the shortcut itself in the Lakehouse
Deleting the shortcut from the Lakehouse does not delete the underlying data in Azure Datalake Gen2.
Shortcut deletion scenarios - summary
Scenario 1 | File | Delete content in shortcut of Files section | Deletes in Datalake ✓
Scenario 2 | File | Delete specific content in ADLS | Deletes in Lakehouse ✓
Scenario 3 | Table | Delete content in shortcut of Tables section | Deletes in Datalake ✓
Scenario 4 | Table | Delete specific content of delta table in ADLS | Deletes in Lakehouse ✓
Scenario 5 | File & Table | Delete the shortcut completely in Lakehouse | ADLS data is not deleted
Synapse Data Engineering

Ways to load data into Lakehouse
1. Local file/folder upload ✓
2. Copy tool in pipelines -----------> Data Factory ✓
3. Dataflow Gen2 ------------------> Data Factory ✓
4. Shortcut ------------------> OneLake

5. Notebook code ------------------> Data Engineering

Spark in Microsoft Fabric

Spark pools in Microsoft Fabric
Spark pools: Starter pool (default) and Custom Spark pool
Starter pools
The default Spark compute in Fabric.
Node Size
Size | vCores | Memory
Small | 4 | 32 GB
Medium | 8 | 64 GB
Large | 16 | 128 GB
X-Large | 32 | 256 GB
XX-Large | 64 | 512 GB
Starter pools
Starter pool configuration:
Node Family | Memory Optimized
Node Size | Medium
Min and Max Nodes | 1 to 10
Auto scale | On
Dynamic Allocation | On
Customize Starter pool

Custom pools
Clusters sized based on customer specifications.
Standard session (default)
A standard session serves a single notebook.
High Concurrency
Multiple notebooks (e.g., Notebook 1, Notebook 2, Notebook 3) share a single Spark session.
Session start comparison:
• Notebook 1 (new standard session): 2-3 minutes
• Notebook 2 (attaching to an existing high-concurrency session): under 15 seconds
Benefits of High Concurrency
• Multi-task: one user can use multiple notebooks with one session, preventing delays due to session creation
• Security: session sharing is always within a single user boundary
• Cost-effective: better resource utilization and cost savings
Session sharing conditions
• Sessions should be within a single user boundary.
• Sessions should have the same default Lakehouse
configuration.
• Sessions should have the same Spark compute
properties.

Notebook basics

MSSparkUtils

MSSparkUtils
• Microsoft Spark Utilities (MSSparkUtils) is a built-in package
to help you easily perform common tasks
• You can use MSSparkUtils to work with file systems, to get
environment variables, to chain notebooks together, and to
work with secrets

mssparkutils.help()
• notebook: utility for notebook operations (e.g., chaining Fabric notebooks together)
• lakehouse: [Preview] utility for Lakehouse operations (e.g., create, delete, update, list lakehouses)
• fs: utility for filesystem operations in Fabric
• credentials: utility for obtaining credentials (tokens and keys) for Fabric resources
• runtime: utility for getting context information about the runtime of the current session
mssparkutils.fs.help()
Mounting:
• Mount an Azure Datalake
• Mount a lakehouse
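A minimal sketch of mounting and listing storage from a Fabric notebook; mssparkutils is preinstalled there, and the storage account, container, and mount point names below are placeholders:

# Mount an ADLS Gen2 container (authenticates as the notebook user by default)
mssparkutils.fs.mount(
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net",
    "/adls_mount"
)

# List files through the mount's local path
files = mssparkutils.fs.ls("file://" + mssparkutils.fs.getMountPath("/adls_mount"))
for f in files:
    print(f.name, f.size)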
fs utils - fastcp
• Performant file copy
• A faster way to copy files, especially large volumes
• Makes use of AzCopy
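A quick usage sketch with placeholder paths (source in ADLS, destination in the lakehouse Files area):

# Recursively copy a large folder; the third argument enables recursion
mssparkutils.fs.fastcp(
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/data/",
    "Files/raw/",
    True
)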
Notebook utils
• exit
• run
Notebook - runMultiple
• Execute multiple notebooks simultaneously without waiting for each one to finish
• Specify dependencies and order of execution using JSON
• You can optimize the Spark compute resources this way
• You can view a snapshot of each notebook run record
• You can take the exit value of one notebook and use it in downstream tasks
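A minimal sketch of a runMultiple DAG; the notebook names are placeholders:

# Run nb_silver only after nb_bronze finishes
dag = {
    "activities": [
        {"name": "nb_bronze", "path": "nb_bronze", "timeoutPerCellInSeconds": 600},
        {"name": "nb_silver", "path": "nb_silver", "timeoutPerCellInSeconds": 600,
         "dependencies": ["nb_bronze"]},
    ]
}
results = mssparkutils.notebook.runMultiple(dag)
print(results)  # per-notebook run status and exit values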
Ways to load data into Lakehouse
1. Local file/folder upload ✓
2. Copy tool in pipelines -----------> Data Factory ✓
3. Dataflow Gen2 ------------------> Data Factory ✓
4. Shortcut ------------------> OneLake

5. Notebook code ------------------> Data Engineering

Ingest data from ADLS to Lakehouse
Authentication methods from Notebook:
• Using Microsoft Entra ID of user
• Using Service principal
• Using Service principal with key vault

Using the Microsoft Entra ID of the user
The user is granted the Storage Blob Data Contributor role (role-based access) on Azure Datalake Gen2.
Using a service principal
The service principal (App ID, Tenant ID, Secret Key) is granted the Storage Blob Data Contributor role (role-based access) on Azure Datalake Gen2.
Using a service principal with Key Vault
Same as above, but the App ID, Tenant ID, and Secret Key are kept in Azure Key Vault and retrieved at runtime instead of being hard-coded in the notebook.
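A sketch of the service-principal pattern from a notebook, using the standard ABFS OAuth settings; the Key Vault URI, secret name, storage account, app ID, and tenant ID are all placeholders:

# Pull the client secret from Key Vault instead of hard-coding it
client_secret = mssparkutils.credentials.getSecret(
    "https://mykeyvault.vault.azure.net/", "sp-client-secret"
)

account = "mystorageaccount.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{account}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{account}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}", "<app-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{account}",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

df = spark.read.csv(f"abfss://mycontainer@{account}/raw/", header=True)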
Using a Fabric workspace identity
Instead of managing an App ID, Tenant ID, and Secret Key, the Fabric workspace identity itself is granted the Storage Blob Data Contributor role (role-based access) on Azure Datalake Gen2.
Ways to load data into Lakehouse
1. Local file/folder upload ✓
2. Copy tool in pipelines -----------> Data Factory ✓
3. Dataflow Gen2 ------------------> Data Factory ✓
4. Shortcut ------------------> OneLake

5. Notebook code ------------------> Data Engineering ✓

Managed vs External table
• Managed: both the data and the table definition are managed by the Microsoft Fabric engine
• External: the table definition is managed by the Fabric engine, but the data is stored elsewhere
Managed table
• Handles both data and metadata
• Data is stored in the Lakehouse's Tables folder
• Metadata includes info about the Lakehouse, tables, schema, etc.
• Dropping the table removes ALL data and metadata
Ways to create a managed table (see the sketch below)
1. Using .saveAsTable() from a dataframe
2. Using SQL CREATE TABLE syntax
3. Using the DeltaTableBuilder API
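A minimal sketch of all three ways, assuming a notebook attached to a lakehouse; table and column names are placeholders:

from delta.tables import DeltaTable

df = spark.createDataFrame([(1, "HR"), (2, "IT")], ["dept_id", "dept_name"])

# 1. From a dataframe
df.write.mode("overwrite").saveAsTable("dept_managed")

# 2. SQL DDL
spark.sql("CREATE TABLE IF NOT EXISTS dept_sql (dept_id INT, dept_name STRING)")

# 3. DeltaTableBuilder API
(DeltaTable.createIfNotExists(spark)
    .tableName("dept_builder")
    .addColumn("dept_id", "INT")
    .addColumn("dept_name", "STRING")
    .execute())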
External table
• Handles metadata only
• Specify a LOCATION to store the table data
• Dropping the table removes the metadata, but the data persists in the specified LOCATION
Ways to create an external table (see the sketch below)
1. Using .saveAsTable() from a dataframe
2. Using SQL CREATE TABLE syntax
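A sketch under the same assumptions, reusing the df from the managed-table sketch; the Files/ paths are placeholders inside the attached lakehouse:

# 1. From a dataframe: an explicit path makes the table external
df.write.mode("overwrite") \
    .option("path", "Files/external/dept") \
    .saveAsTable("dept_ext")

# 2. SQL DDL with LOCATION
spark.sql("""
    CREATE TABLE IF NOT EXISTS dept_ext_sql (dept_id INT, dept_name STRING)
    USING DELTA
    LOCATION 'Files/external/dept_sql'
""")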
Data wrangler
• A GUI-based tool in notebooks to perform the most common operations on dataframes
• A no-code approach
• Provides a quick summary view of the data
Environments in notebooks
• Use your custom libraries, configure the runtime, and upload resources
• A flexible way to customize compute configurations for running your Spark jobs
• Configure session-level properties to customize the memory and cores of executors
V-Order optimization
• A write-time optimization to the parquet file format
• Enables lightning-fast reads under the Fabric compute engines
• 100% compliant with the open-source parquet format
• Enabled by default
• Works by applying special sorting, row group distribution, and compression to parquet files
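If you need to toggle V-Order per session or per write, a sketch; these configuration names follow the Fabric documentation as I recall it, so verify against the current docs:

# Disable V-Order writing for the current Spark session
spark.conf.set("spark.sql.parquet.vorder.enabled", "false")

# Or control it per write
df.write.format("delta") \
    .option("parquet.vorder.enabled", "true") \
    .save("Files/vorder_demo")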
Spark job definition
• Allows you to submit batch/streaming jobs to Spark clusters
• To run these, you must have at least one lakehouse to serve as the default file system context
• You can schedule the job definition
Synapse - Choosing Between Spark Notebook vs Spark Job Definition - Microsoft Community Hub
Source → Central Storage → Consumers
Example domains: Finance, HR, IT
Domains
• Microsoft Fabric's data mesh architecture supports organizing data into domains
• Enables data consumers to filter and discover content by domain
• Ensures that the data in your organization is well structured and effectively governed, and that data consumers can easily find the content they need
• Domains are a key enabler for data mesh, providing the infrastructure for a decentralized architecture
Migration to Microsoft Fabric

Migrate notebooks from Azure Synapse to Fabric
• Migrate notebooks - Microsoft Fabric | Microsoft Learn
Migrate Synapse notebooks to Fabric
Option 1: export from Synapse and import into Fabric
Option 2: script the export of notebooks from Synapse and import them into Fabric using the API
Migrate Synapse notebooks to Fabric using the API
Prerequisites:
• A Fabric workspace and Lakehouse
• A Synapse workspace
• A service principal
• Access for the service principal on the Synapse workspace
Migrate Synapse/Data Factory pipelines to Fabric

Migrate ADLS data to Fabric OneLake
• Option 1: keep ADLS Gen2 as storage (shortcuts)
• Option 2: use OneLake as storage, copying the data with:
  • mssparkutils fastcp
  • AzCopy
  • Azure Data Factory, Azure Synapse, or Data Factory in Fabric
  • Azure Storage Explorer
Capacity Metrics App

Capacity Metrics App
• The Microsoft Fabric Capacity Metrics app is designed to provide
monitoring capabilities for Microsoft Fabric capacities
• Each capacity has its own number of Capacity Units (CU).
• CUs are used to measure the compute power available for your
capacity.

Install the Capacity Metrics App
• You must be a capacity admin to install and view the Microsoft Fabric Capacity Metrics app
• Install it via: Install the Microsoft Fabric capacity metrics app - Microsoft Fabric | Microsoft Learn
Capacity Metrics App
The compute page is divided into three visuals:
• the top two visuals are a ribbon chart and a line and stacked column chart
• the bottom visual is a matrix table
• To view metrics, select the capacity name
Multi metric ribbon chart
• Provides an hourly view of your capacity's usage; to identify daily patterns, drill down to a specific day
• Displays the following four values, showing the top results per item during the past two weeks:
  • CU - Capacity Units (CU) processing time in seconds
  • Duration - processing time in seconds
  • Operations - the number of operations that took place
  • Users - the number of users that performed operations
Multi metric ribbon chart
• Use the tabs at the top right corner of the visual to toggle how it is displayed:
  • Linear - a linear scale that starts at 0 percent
  • Logarithmic - a logarithmic scale that depends on your CU consumption
Line and stacked column chart
• Understanding CU (s): Capacity Units in seconds
Throttling

Smoothing
[Charts: capacity utilization (%) over time for an over-utilized capacity vs an under-utilized capacity]
Throttling stages
Policy | Consumption | Impact
Overage protection | Usage <= 10 minutes | Jobs can consume 10 minutes of future capacity use without throttling.
Interactive delay | 10 minutes < usage <= 60 minutes | User-requested interactive jobs are delayed 20 seconds at submission.
Interactive rejection | 60 minutes < usage <= 24 hours | User-requested interactive jobs are rejected.
Background rejection | Usage > 24 hours | User-scheduled background jobs are rejected and not executed.
Microsoft documentation: Understand your Fabric capacity throttling - Microsoft Fabric | Microsoft Learn
Overage protection
[Charts: capacity utilization (%) over time. Usage above the 100% line is overage; as long as the accumulated overage stays at or below 10 minutes of future capacity, jobs keep running without throttling, and the overage is burned down against future idle capacity.]
Interactive delay
[Chart: accumulated overage where 10 minutes < CU usage <= 60 minutes; user-requested interactive jobs are delayed 20 seconds at submission.]
Interactive rejection
[Chart: accumulated overage where 60 minutes < CU usage <= 24 hours; user-requested interactive jobs are rejected.]
Background rejection
[Chart: accumulated overage exceeding 24 hours; user-scheduled background jobs are rejected and not executed.]
Overages

System events

Matrix
Fabric documentation: Understand the metrics app compute page - Microsoft Fabric | Microsoft Learn
Synapse Data Warehouse

Warehouse
• Stores structured data
• Contains databases, schemas, and tables
• Interact with it using SQL
• Supports transactions, DDL, and DML queries
Create a warehouse
When a warehouse is created, you get:
• Warehouse
• Semantic Model (default)
Fabric workspace
All items store their data in OneLake as delta-parquet and expose T-SQL:
• Lakehouse (via its SQL analytics endpoint): read-only T-SQL
• Warehouse: full T-SQL, including write/update
Loading data into the warehouse
A warehouse is populated by one of the supported data ingestion methods, such as:
1. The COPY INTO statement
2. Pipelines
3. Dataflow Gen2
4. Cross-database ingestion options such as CREATE TABLE AS SELECT (CTAS), INSERT..SELECT, or SELECT INTO
Using the COPY INTO command
Azure Datalake Gen2 → Warehouse
Using the COPY INTO command
• COPY performs high-throughput data ingestion from an external Azure storage account
• In Microsoft Fabric, it currently supports the PARQUET and CSV file formats
• For data sources, only Azure Data Lake Storage Gen2 accounts are supported
• The user's Entra ID account must have access to the underlying files, through Azure role-based access control (RBAC) or data lake ACLs
• Trusted workspace access is in preview, supported for F64 or above capacities (not trial)
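A minimal T-SQL sketch, with placeholder account, container, and table names:

COPY INTO dbo.Sales
FROM 'https://mystorageaccount.dfs.core.windows.net/mycontainer/sales/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET'
);
-- With no CREDENTIAL clause, the caller's Entra ID is used, which is why
-- the RBAC/ACL access described above is required.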
Using Pipelines
Ingest data from on-premises SQL Server into a Warehouse in Fabric:
1. You need an on-premises gateway
2. Create a connection to SQL Server
3. Build the pipeline
4. Configure a staging area:
• Must be Blob or ADLS storage
• Authenticated only via SAS or account key (OAuth is not supported)
• Check "allow this connection to be used with on-prem data gateways"
Using Dataflow Gen2
On-prem SQL Server → Dataflow Gen2 → Warehouse
Data sharing - Lakehouse & Warehouse
Lakehouse ↔ Warehouse
Cross-database ingestion options (see the sketch below)
Get data from Lakehouse tables using:
1. CTAS (CREATE TABLE AS SELECT)
2. INSERT INTO ... SELECT * (a target table must already exist)
3. SELECT INTO
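A T-SQL sketch of the three options, using placeholder names; MyLakehouse is assumed to be a lakehouse in the same workspace, addressable by three-part naming:

-- 1. CTAS: create and load in one statement
CREATE TABLE dbo.Sales_dw
AS SELECT * FROM MyLakehouse.dbo.Sales;

-- 2. INSERT..SELECT: the target table must already exist
INSERT INTO dbo.Sales_dw
SELECT * FROM MyLakehouse.dbo.Sales;

-- 3. SELECT INTO: creates the target table from the result set
SELECT * INTO dbo.Sales_copy
FROM dbo.Sales_dw;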
Loading data into warehouse
A Warehouse is populated by one of the supported data
ingestion methods such as
1. COPY INTO statement ✓
2. Pipelines ✓
3. Dataflow Gen2 ✓
4. Cross database ingestion options such as CREATE TABLE AS SELECT
(CTAS), INSERT..SELECT, or SELECT INTO. ✓

Microsoft Fabric Architecture
Workloads: Data Factory, Synapse Data Engineering, Synapse Data Warehousing, Synapse Data Science, Real-Time Analytics, Power BI, Data Activator
OneLake
SaaS Foundation
Lakehouse vs Warehouse
Both the Lakehouse and the Warehouse sit on OneLake and store their data in the Delta format.
Lakehouse | Warehouse
• Structured, semi-structured and unstructured data | • Structured data
• Spark notebooks | • SQL queries, stored procs
• PySpark, Scala, Spark SQL, Spark R | • Full T-SQL support
• Write transformed data into Lakehouse tables or files | • Write transformed data into Warehouse tables
• Data engineers, data scientists | • SQL developers, data analysts
Medallion architecture patterns
Bronze → Silver → Gold, implemented in a Lakehouse, in a Warehouse, or in a mix of both.
Updating data in Lakehouse and Warehouse
An UPDATE fails against Lakehouse tables (the SQL analytics endpoint is read-only) but succeeds against Warehouse tables.
SQL query as session
• In the Fabric SQL editor, each statement acts as an individual session
• The same statements behave differently in SSMS, where the query window shares one session
Zero-Copy clone feature
Source table (data + metadata) → clone (metadata only, referencing the same data files)
Zero-Copy clone feature
• A zero-copy clone creates a replica of the table by copying the metadata, while referencing the same data files in OneLake
• The metadata is copied, while the underlying data of the table, stored as parquet files, is not
• A clone of a table can be created within or across schemas in a warehouse
• A table clone is an independent and separate copy of the data from its source
Creating a table clone
A clone of a table can be created based on either:
• The current point in time: the clone is based on the present state of the table
• A previous point in time: the clone is based on a point in time up to seven days in the past; this feature is known as "time travel"
• The point in time is specified as a UTC timestamp
Syntax:

CREATE TABLE
  { database_name.schema_name.table_name
    | schema_name.table_name
    | table_name }
AS CLONE OF
  { database_name.schema_name.table_name
    | schema_name.table_name
    | table_name }
[AT {point_in_time}]  -- 'YYYY-MM-DDThh:mm:ss'
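A concrete instance of the syntax above, with placeholder table names:

-- Clone the current state of dbo.Sales
CREATE TABLE dbo.Sales_clone AS CLONE OF dbo.Sales;

-- Clone dbo.Sales as it existed at a past point in time (UTC)
CREATE TABLE dbo.Sales_yesterday AS CLONE OF dbo.Sales
AT '2024-06-24T10:00:00';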
Time travel in Warehouse
• The ability to query data as of a specific past timestamp
• Low-cost comparisons between previous versions of data
• Audit data changes over time
• Default retention period of seven calendar days
• Any modifications made to the schema of a table, including but not limited to adding or removing columns, prevent querying the table as of a time before the schema change
• Time travel is not supported for the SQL analytics endpoint of the Lakehouse
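A sketch of a time-travel query; to my knowledge the documented T-SQL form is the FOR TIMESTAMP AS OF query hint (placeholder table name and timestamp):

SELECT *
FROM dbo.Sales
OPTION (FOR TIMESTAMP AS OF '2024-06-24T10:00:00');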
Retention of data history
• Warehouse automatically preserves and maintains the data history
for seven calendar days, allowing for clones to be made at a point in
time.
• All inserts, updates, and deletes made to the data warehouse are
retained for seven calendar days.
• There is no limit on the number of clones created both within and
across schemas.

Benefits of Zero-Copy clone
• Development and testing
• Low-cost, near-instantaneous recovery
• Data archiving

Limitations
• Table clones across warehouses in a workspace are not currently supported
• Table clones across workspaces are not currently supported
• Cloning a table is not supported on the SQL analytics endpoint of the Lakehouse
• Cloning a warehouse or schema is currently not supported
• Table clones for a point in time before the seven-day retention period cannot be created
• Changes to the table schema prevent a clone from being created for a point in time before the schema change
Visual query editor
• Applies to the SQL analytics endpoint, warehouse, and mirrored database in Fabric
• You can use the visual query editor for a no-code experience to create your queries
Limitations of Visual query editor
• In the visual query editor, you can only run DQL (Data Query
Language) or read-only SELECT statements. DDL or DML statements
are not supported.
• Only a subset of Power Query operations that support Query folding
are currently supported.

Query Insights
• The query insights feature provides a central location for historic
query data and actionable insights for 30 days, helping you to make
informed decisions to enhance the performance of your Warehouse
or SQL analytics endpoint.
• Available in SQL analytics endpoint and Warehouse in Microsoft
Fabric
• Provides information on queries run in a user's context only, system
queries aren't considered.

Types of query insights:
• Historical query data: the query insights feature stores historical data about query executions, enabling you to track performance changes over time. System queries aren't stored in query insights.
• Aggregated insights: the query insights feature aggregates query execution data into more actionable insights, such as identifying long-running queries or the most active users.
exec_requests_history
Provides information about each completed SQL request.

Column name | Data type | Description
distributed_statement_id | uniqueidentifier | Unique ID for each query.
start_time | datetime2 | Time when the query started running.
command | varchar(8000) | Complete text of the executed query.
login_name | varchar(128) | Name of the user or system that sent the query.
row_count | bigint | Number of rows retrieved by the query.
total_elapsed_time_ms | int | Total time (ms) taken by the query to finish.
status | varchar(30) | Query status (Succeeded, Failed, or Canceled).
session_id | smallint | ID linking the query to a specific user session.
connection_id | uniqueidentifier (nullable) | Identification number for the query's connection.
batch_id | uniqueidentifier (nullable) | ID for grouped queries (if applicable).
root_batch_id | uniqueidentifier (nullable) | ID for the main group of queries (if nested).
frequently_run_queries
Provides information about frequently run queries in Fabric Data Warehousing.

Column name | Data type | Description
last_run_start_time | datetime2 | Time of the most recent query execution.
last_run_command | varchar(8000) | Text of the last query execution.
number_of_runs | int | Total number of times the query was executed.
avg_total_elapsed_time_ms | int | Average query execution time (ms) across all runs.
last_run_total_elapsed_time_ms | int | Time taken by the last execution (ms).
last_dist_statement_id | uniqueidentifier | ID linking the query to queryinsights.exec_requests_history.
last_run_session_id | smallint | User session ID for the last execution.
min_run_total_elapsed_time_ms | int | Shortest query execution time (ms).
max_run_total_elapsed_time_ms | int | Longest query execution time (ms).
number_of_successful_runs | int | Number of successful query executions.
number_of_failed_runs | int | Number of failed query executions.
number_of_cancelled_runs | int | Number of canceled query executions.
long_running_queries
Provides information about SQL query execution times.

Column name | Data type | Description
last_run_start_time | datetime2 | Time of the most recent query execution.
last_run_command | varchar(8000) | Text of the last query execution.
median_total_elapsed_time_ms | int | Median query execution time (ms) across runs.
number_of_runs | int | Total number of times the query was executed.
last_run_total_elapsed_time_ms | int | Time taken by the last execution (ms).
last_dist_statement_id | uniqueidentifier | ID linking the query to queryinsights.exec_requests_history.
last_run_session_id | smallint | User session ID for the last execution.
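Because these views live in the queryinsights schema, they can be queried with ordinary T-SQL. A minimal sketch (view and column names as in the tables above; the TOP 10 cutoff and the ordering are illustrative choices):

-- Ten slowest queries by median elapsed time
SELECT TOP 10
    last_run_command,
    median_total_elapsed_time_ms,
    number_of_runs
FROM queryinsights.long_running_queries
ORDER BY median_total_elapsed_time_ms DESC;

-- Recent failed requests from the full request history
SELECT distributed_statement_id, start_time, command, status
FROM queryinsights.exec_requests_history
WHERE status = 'Failed'
ORDER BY start_time DESC;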
Access control and permissions in Fabric
Microsoft Fabric structure
The enterprise hierarchy has five levels, and permissions can be applied at each of them:
• Tenant level: the Fabric tenant
• Capacity level: one or more capacities within the tenant
• Workspace level: workspaces within a capacity
• Item level: items such as a Warehouse, Lakehouse, or Notebook
• Object level: objects within items, such as [dbo].[Dept], [dbo].[Employee], Tables/Sales, or Bronze.ipynb
Warehouse security
Within a warehouse, access can be restricted at three levels, illustrated on the [dbo].[Dept] and [dbo].[Employee] tables:
• Column level security, e.g. hiding the CC Number column
• Row level security, e.g. restricting which DeptID rows a user sees
• Dynamic data masking, e.g. showing xxxx31 instead of the full card number
Lakehouse security
In a lakehouse, both the Tables and Files sections are organized as folders (e.g. Sales, Marketing, Finance), so access can be granted per folder: the Sales Team sees the Sales folders while the Finance Team sees the Finance folders.
Tenant – Level permissions
(Hierarchy diagram with the Tenant level highlighted, above capacities, workspaces, items, and objects.)
Capacity level permissions
(The same hierarchy diagram, with the Tenant level checked off and the Capacity level highlighted.)
Workspace level security
(The same hierarchy diagram, with the Tenant and Capacity levels checked off and the Workspace level highlighted.)
Workspace roles
Each workspace role contains permissions that allow users to perform certain actions.

Role | Can add admins? | Can add members? | Can write data and create items? | Can read data?
Admin | Yes | Yes | Yes | Yes
Member | No | Yes | Yes | Yes
Contributor | No | No | Yes | Yes
Viewer | No | No | No | Yes
Workspace Administration

Workspace action | Admin | Member | Contributor | Viewer
Update/Delete workspace | ✓ | ✗ | ✗ | ✗
Add/Remove users | ✓ | ✗ | ✗ | ✗
Add/Remove members (and below) | ✓ | ✓ | ✗ | ✗
Allow others to re-share items | ✓ | ✓ | ✗ | ✗
Schedule refresh (via on-prem data gateway) | ✓ | ✓ | ✓ | ✗
Modify gateway connection strings | ✓ | ✓ | ✓ | ✗

Keep in mind that you also need permissions on the gateway. Those permissions are managed elsewhere, independent of workspace roles and permissions.
Data pipeline permissions

Data pipeline actions | Admin | Member | Contributor | Viewer
View content and output | ✓ | ✓ | ✓ | ✓
Execute and cancel pipeline execution | ✓ | ✓ | ✓ | ✗
Schedule pipeline refreshes | ✓ | ✓ | ✓ | ✗
Create, modify and delete pipelines | ✓ | ✓ | ✓ | ✗
Notebook, Spark Jobs, Experiments, ML Models, Event streams permissions

Item actions | Admin | Member | Contributor | Viewer
View content and output | ✓ | ✓ | ✓ | ✓
Execute and cancel item/pipeline execution | ✓ | ✓ | ✓ | ✗
Create, modify and delete items | ✓ | ✓ | ✓ | ✗
Data warehouse permissions

Data warehouse actions | Admin | Member | Contributor | Viewer
Connect to SQL analytics endpoint | ✓ | ✓ | ✓ | ✓
Read data and shortcuts through SQL endpoint | ✓ | ✓ | ✓ | ✓
Read through OneLake API | ✓ | ✓ | ✓ | ✗
Read through Spark (shortcut) | ✓ | ✓ | ✓ | ✗
Create, modify tables / views, etc. | ✓ | ✓ | ✓ | ✗

Shortcuts:
1. Reading through shortcuts requires additional permissions on the shortcut destination for objects internal to Fabric.
2. ADLS shortcuts use a delegated authorization model.
Accessing shortcuts internal to fabric
(Diagram: LH_B in WSP_2 contains a shortcut to LH_A in WSP_1. Alice, a Contributor in WSP_1, created the shortcut. Bob, a Contributor in WSP_2, can query the shortcut through T-SQL because the creator's identity is delegated, even though he has no role in WSP_1; reading the same shortcut through Spark fails for Bob, since that path requires his own access at the destination.)
Accessing shortcuts internal to fabric
When accessing shortcuts through Power BI semantic models or T-SQL, the calling user's identity (whoever is currently using the session) is not passed through to the shortcut target. Instead, the identity of the item owner (whoever created the shortcut) is passed, delegating access to the calling user.

This behavior is different when accessing from a notebook: the caller needs access at the shortcut destination.

Reference: OneLake shortcuts - Microsoft Fabric | Microsoft Learn
Accessing ADLS shortcuts
(Diagram: LH_B in WSP_2 contains a shortcut to Azure Data Lake Storage, created by Alice, who holds the Storage Blob Data Contributor role on the storage account. Bob, a Contributor in WSP_2, can read the shortcut through both T-SQL and Spark without any direct ADLS permissions, because ADLS shortcuts use a delegated authorization model based on the connection stored on the shortcut.)
Lakehouse permissions

Lakehouse actions | Admin | Member | Contributor | Viewer
Connect to SQL analytics endpoint | ✓ | ✓ | ✓ | ✓
Read data and shortcuts through SQL endpoint | ✓ | ✓ | ✓ | ✓
Read data and shortcuts through Lakehouse Explorer | ✓ | ✓ | ✓ | ✗
Read through OneLake API | ✓ | ✓ | ✓ | ✗
Read through Spark | ✓ | ✓ | ✓ | ✗
Reading Files | ✓ | ✓ | ✓ | ✗
Create / modify / delete tables / files | ✓ | ✓ | ✓ | ✗

Shortcuts:
1. Reading through shortcuts requires additional permissions on the shortcut destination for objects internal to Fabric.
2. ADLS shortcuts use a delegated authorization model.
Item-level permissions
(The same hierarchy diagram, with the Tenant, Capacity, and Workspace levels checked off and the Item level, i.e. Warehouse, Lakehouse, and Notebook, highlighted.)
Item-Level permissions
• You can share individual items by clicking the three dots beside the item.
Item level sharing is useful when:
• You want to collaborate with colleagues who don't have a role in the workspace
• You want to grant additional item level permissions to colleagues who already have a role in the workspace

Official documentation: Share items in Microsoft Fabric - Microsoft Fabric | Microsoft Learn
Item Level permissions
(Diagram: the sharing option shown on individual items, e.g. Data pipeline, Dataflow Gen2, Eventstream.)
Data warehouse sharing
Warehouse item sharing permissions

Permission granted while sharing | Effect
If no additional permissions are selected | The recipient receives the default "Read" permission, which only allows them to connect to the SQL analytics endpoint; they cannot query any table or view, or execute any function or stored procedure.
Read all data using SQL (ReadData) | Read all objects within the warehouse using T-SQL; the recipient can read all database objects. ReadData is equivalent to the db_datareader role in SQL Server, and access can be further restricted with GRANT/REVOKE/DENY statements (see the sketch below).
Read all OneLake data (ReadAll) | Read the warehouse's underlying OneLake files using Apache Spark, pipelines or shortcuts, or other apps that access the OneLake data directly.
Build reports on the default semantic model (Build) | Build reports on top of the default semantic model connected to the warehouse.
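Because ReadData behaves like db_datareader, object-level T-SQL security can be layered on top of it. A minimal sketch (the table and principal names are hypothetical):

-- Recipient holds ReadData on the warehouse, but this table stays off-limits
DENY SELECT ON dbo.Employee TO [[email protected]];

-- Explicitly re-allow a safe table if needed
GRANT SELECT ON dbo.Dept TO [[email protected]];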
Warehouse – Read all data using SQL (ReadData)
(Diagram: WH_A in WSP_1 is shared with Steve, who has no workspace role, with Read and ReadData permissions. Steve can read all of WH_A's objects through T-SQL. ReadData by itself does not cover the underlying OneLake files, e.g. access via a shortcut from LH_A in WSP_2, where Alice has Contributor access.)
Warehouse - Read all OneLake (ReadAll)
(Diagram: the same scenario, now with Read, ReadData, and ReadAll permissions. With ReadAll, Steve can also read WH_A's underlying OneLake files directly, e.g. through Spark or a shortcut from LH_A in WSP_2.)
Lakehouse sharing
Lakehouse item sharing permissions

Permission granted while sharing | Effect
If no additional permissions are selected | The recipient receives the default "Read" permission, which only allows them to connect to the SQL analytics endpoint; they cannot query any table or view, and see nothing in Lakehouse Explorer.
Read all SQL Endpoint data | Read data from the SQL analytics endpoint of the lakehouse; the user cannot create or modify tables and needs GRANT/MODIFY from admins to make changes.
Read all Apache Spark (ReadAll) | Read lakehouse data through OneLake APIs and Spark; read lakehouse data through Lakehouse Explorer.
Build reports on the default semantic model (Build) | Build reports on top of the default semantic model connected to the lakehouse.
Notebook - sharing
Item-level permissions
When sharing, three link types are available:

People in your organization: This link allows people in your organization to access the item. It doesn't work for external or guest users. Use this link type when:
• You want to share with someone in your organization.
• You're comfortable with the link being shared with other people in your organization.
• You want to ensure that the link doesn't work for external or guest users.

People with existing access: This link generates a URL to the item, but it doesn't grant any access. Use it if you just want to send a link to somebody who already has access.

Specific people: This link allows specific people or groups to access the report. It also lets you share with guest users in your organization's Microsoft Entra ID. You can't share with external users who aren't guests in your organization.
Notebook – Additional permissions

Additional permission | Effect
If no additional permissions are selected | The recipient can only see the notebook cells but cannot execute them.
Share | Share the notebook with other people. This permission is also known as Reshare.
Edit | Edit all notebook cells. This permission is also known as Write.
Run | Run all notebook cells. This permission is also known as Execute.

You must also grant Run permission to any user who gets Edit permission.
OneLake data access
• This is a new feature that enables you to apply role-based access control (RBAC) to your data stored in OneLake.
• You can define security roles that grant read access to specific folders within a Fabric item, and assign them to users or groups.
• By default, everyone has the DefaultReader role, which allows reading all folders.
OneLake data access
• You can grant access to the Tables or Files section of a lakehouse; everything is presented as folders.
• For OneLake shortcuts, permissions must be defined on the destination table; defining permissions on the shortcut itself is not allowed.
Row-level security
(Diagram: row level security on a warehouse filters which rows of [dbo].[Dept] and [dbo].[Employee], with columns DeptID, DeptName, Budget, and CC Number, a given user can see.)
Available in:
• Synapse Data Warehousing
• SQL analytics endpoint in lakehouse
Row-level security
• The access restriction logic is in the database tier
• The database applies the access restrictions every time data access is
attempted
• Access to row-level data in a table is restricted by a security predicate defined as an inline table-valued function.
• The function is then invoked and enforced by a security policy (see the sketch below).
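A minimal T-SQL sketch of the predicate-plus-policy pattern (the schema, table, and column names are hypothetical):

-- Inline table-valued function: returns a row only when the predicate holds
CREATE SCHEMA Security;
GO
CREATE FUNCTION Security.fn_securitypredicate(@SalesRep AS varchar(128))
    RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS fn_securitypredicate_result
           WHERE @SalesRep = USER_NAME();
GO
-- Security policy: invokes the predicate as a filter on every read of the table
CREATE SECURITY POLICY SalesFilter
ADD FILTER PREDICATE Security.fn_securitypredicate(SalesRep)
ON dbo.Orders
WITH (STATE = ON);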
Dynamic data masking
(Diagram: the CC Number column of [dbo].[Employee] appears masked, e.g. xxxx31 and xxxx53, while DeptID, DeptName, and Budget remain visible; row level security is already checked off.)
Available in:
• Synapse Data Warehousing
• SQL analytics endpoint in lakehouse
Dynamic Data Masking
• Dynamic data masking limits sensitive data exposure by masking it to
nonprivileged users.
• Prevent unauthorized viewing of sensitive data by enabling
administrators to specify how much sensitive data to reveal
• Can be configured on designated database fields to hide sensitive
data in the result sets of queries
• Users without the Administrator, Member, or Contributor rights on
the workspace, and without elevated permissions on the Warehouse,
will see masked data.
Function | Description
default() | Full masking according to the data type of the designated field. Strings (char, nchar, varchar, nvarchar, text, ntext): XXXX, or fewer X's if the field is shorter than 4 characters. Numeric types (bigint, bit, decimal, int, money, numeric, smallint, smallmoney, tinyint, float, real): a zero value. Date and time types (date, datetime2, datetime, datetimeoffset, smalldatetime, time): 1900-01-01 00:00:00.0000000. Binary types (binary, varbinary, image): a single byte of ASCII value 0.
email() | Exposes the first letter of the email address and the constant suffix ".com", in the form of an email address: [email protected].
random() | A random masking function for use on any numeric type; masks the original value with a random value within a specified range.
Custom String | Exposes the first and last letters and adds a custom padding string in the middle: prefix,[padding],suffix. If the original value is too short to complete the entire mask, part of the prefix or suffix isn't exposed.
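A minimal T-SQL sketch applying these masking functions (the table, column, and principal names are hypothetical):

CREATE TABLE dbo.Customer
(
    CustomerID  int,
    FirstName   varchar(100) MASKED WITH (FUNCTION = 'partial(1, "XXXXXXX", 0)'),  -- custom string
    Phone       varchar(12)  MASKED WITH (FUNCTION = 'default()'),
    Email       varchar(100) MASKED WITH (FUNCTION = 'email()'),
    CreditLimit int          MASKED WITH (FUNCTION = 'random(1, 100)')
);

-- Privileged users can be exempted from masking
GRANT UNMASK TO [[email protected]];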
default() function
Full masking according to the data types of the designated fields
Default() function

Column type | Data types | Masked value
Strings | char, nchar, varchar, nvarchar, text, ntext | XXXX
Numeric | bigint, bit, decimal, int, money, numeric, smallint, smallmoney, tinyint, float, real | 0
Date and DateTime | date, datetime2, datetime, datetimeoffset, smalldatetime, time | 1900-01-01 00:00:00.0000000
Binary | binary, varbinary, image | ASCII value of 0
Email() function
• Masking method that exposes the first letter of an email address and
the constant suffix ".com", in the form of an email
address. [email protected].
Bypassing masking using inference or brute-force techniques
• Dynamic data masking is useful for preventing accidental exposure of sensitive data when data is accessed directly.
• However, unprivileged users with query permissions can apply inference or brute-force techniques to gain access to the actual data.
• Dynamic data masking shouldn't be used alone to fully secure sensitive data from users with query access to the Warehouse or SQL analytics endpoint.
• It's appropriate for preventing accidental exposure, but doesn't protect against malicious intent to infer the underlying data.
• Follow the principle of least privilege and use object-level security with SQL GRANT/REVOKE/DENY statements.
Column level security
(Diagram: column level security hides the CC Number column of [dbo].[Employee] while the remaining columns stay visible; row level security and dynamic data masking are already checked off.)
Available in:
• Synapse Data Warehousing
• SQL analytics endpoint in lakehouse
Column-level security
• Allows you to restrict column access to certain users (see the sketch below).
• The access restriction logic is located in the database tier.
• The database applies the access restrictions every time data access is attempted.
• If a user has the "Read all data using SQL" (ReadData) permission, column-level security will not apply.
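Column-level restriction is expressed as a GRANT with an explicit column list. A minimal sketch (the principal and column names are hypothetical):

-- Charlie can read only these Employee columns; the credit card column is omitted
GRANT SELECT ON dbo.Employee (EmployeeID, Name, DeptID) TO [[email protected]];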
Best practices
Follow the principle of least privilege:
• If users only need access to a single lakehouse or data item, use the share feature to grant them access to only that item.
• Assign a user to a workspace role only if that user needs to see ALL items in that workspace.
• Use OneLake data access roles (preview) to restrict access to folders and tables within a lakehouse for access through OneLake APIs or Apache Spark notebooks.
Power BI
Semantic Model
Semantic model
• Semantic models represent a source of data ready for reporting and visualization.
• A default semantic model is created automatically when you create a lakehouse or warehouse.
• The default semantic model contains all the tables created in the lakehouse or warehouse.
Semantic model with XMLA endpoint
• XMLA (XML for Analysis) allows client tools to communicate with semantic models in Fabric.
• It supports read and write operations. Read is enabled by default; to perform write operations, such as modifying the semantic model from external client tools, enable read-write in the Fabric admin portal.
• Client tools include SSMS, DAX Studio, Visual Studio, PowerShell cmdlets, and Tabular Editor 2.
• This is a Premium capacity feature: Premium per capacity or Premium per user license types.
Direct Query mode
(Flow: the report sends DAX queries, the engine translates them into the source dialect, e.g. SQL, and data is pulled from the source in real time.)
Import mode
(Flow: data is copied into the Power BI model; the report sends DAX queries against that in-memory copy.)
Direct Query: slow, but real-time.
Import: fast, but latent and duplicative.
Direct Lake
(Diagram: Data Factory, Synapse Data Engineering, Synapse Data Warehousing, and Synapse Data Science all write parquet/Delta files to OneLake; Direct Lake reads those files directly, with no import step.)
Refresh Semantic Model
• Refresh Semantic Model – Manual
• Refresh Semantic Model – Notebook
• Refresh Semantic Model – XMLA endpoint
• Refresh Semantic Model – Data pipeline
Fallback to direct query from Direct lake
Direct Lake = Direct Lake mode reads Delta tables directly from OneLake.
DirectQuery = queries use SQL to retrieve the results from the SQL endpoint of the lakehouse or warehouse, which can impact query performance.
Fallback to direct query from Direct lake
Scenario 1: Exceeding SKU limits

Fabric SKUs | Parquet files per table | Row groups per table | Rows per table (millions) | Max model size on disk/OneLake (GB) | Max memory (GB)
F2 | 1,000 | 1,000 | 300 | 10 | 3
F4 | 1,000 | 1,000 | 300 | 10 | 3
F8 | 1,000 | 1,000 | 300 | 10 | 3
F16 | 1,000 | 1,000 | 300 | 20 | 5
F32 | 1,000 | 1,000 | 300 | 40 | 10
F64/FT1/P1 | 5,000 | 5,000 | 1,500 | Unlimited | 25
F128/P2 | 5,000 | 5,000 | 3,000 | Unlimited | 50
F256/P3 | 5,000 | 5,000 | 6,000 | Unlimited | 100
F512/P4 | 10,000 | 10,000 | 12,000 | Unlimited | 200
F1024/P5 | 10,000 | 10,000 | 24,000 | Unlimited | 400
F2048 | 10,000 | 10,000 | 24,000 | Unlimited | 400
Fallback to direct query from Direct lake
Scenario 2:
• When using features that don't support Direct Lake mode, such as SQL views in a warehouse, the query falls back to DirectQuery mode.
Handling fallback behavior
Direct Lake models include the DirectLakeBehavior property, which has
three options:
• Automatic - (Default) Specifies queries fall back to DirectQuery mode
if data can't be efficiently loaded into memory.
• DirectLakeOnly - Specifies all queries use Direct Lake mode only.
Fallback to DirectQuery mode is disabled. If data can't be loaded into
memory, an error is returned. Use this setting to determine if DAX
queries fail to load data into memory, forcing an error to be returned.
• DirectQueryOnly - Specifies all queries use DirectQuery mode only.
Use this setting to test fallback performance.
Copy measures from Model1 to Model2
(Diagram: the measures Total_UnEmployed, Total_Employed, Total_Sales, and Total_Attended are copied from Semantic Model 1 to Semantic Model 2.)
Row-level security
• Row-level security (RLS) in Power BI can be used to restrict data access for given users.
• RLS only restricts data access for users with Viewer permissions. It doesn't apply to Admins, Members, or Contributors.
• You can also configure RLS on semantic models that use DirectQuery.
• You can't define roles within Power BI Desktop for Analysis Services live connections; you need to do that within the Analysis Services model (this applies to Direct Lake).
• USERPRINCIPALNAME() returns the email address (UPN) of the current user and is typically used in role filter expressions.
Object Level Security
(Diagram: within one semantic model, OLS gives User 1 and User 2 visibility over different subsets of Table1 through Table6.)
There are 3 options
• Default = the role sees the table by default
• None = the table is hidden from this role
• Read = the table is visible to this role
Column Level Security
(Diagram: on the Employee table, the columns ID, Name, Subject, and Fees are shown, while OE1 and OE2 are hidden.)
End to end project implementation in Fabric
Medallion Architecture foundation
Bronze → Silver → Gold
Medallion Architecture in Fabric

 | Bronze | Silver | Gold
What happens in this layer? | Ingest raw data | Cleanse and validate data | Additional transformations and modelling
What tool is used? | Pipelines, dataflows and notebooks | Dataflows or notebooks | SQL Endpoint or semantic models
Incremental ingestion types
Timestamp-based incremental ingestion
• Ingest data based on timestamp column (e.g. last_updated or
created_at)
• Fetch records where the timestamp is greater than the maximum timestamp from the last ingestion (see the sketch after this list)
Supported Data Sources:
• Relational Databases: MySQL, PostgreSQL, Oracle, SQL Server
• NoSQL Databases: MongoDB, Cassandra, DynamoDB
• APIs: Rest APIs, SOAP APIs, GraphQL APIs
• Data warehouse: Snowflake, Redshift, Google BigQuery, Azure
Synapse
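A minimal sketch of the watermark filter in SQL (the table, column, and stored watermark value are hypothetical):

-- The previous run recorded max(last_updated) = '2024-09-16 00:00:00'
SELECT *
FROM dbo.enrollments
WHERE last_updated > '2024-09-16 00:00:00';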
Change Data Capture (CDC)
• Captures changes (inserts, updates, deletes) from source system
• Use database logs or triggers to track changes
Supported Data Sources:
• Relational Databases: MySQL, PostgreSQL, Oracle, SQL Server
• NoSQL Databases: MongoDB, Cassandra, DynamoDB
• Data warehouse: Snowflake, Redshift, Google BigQuery, Azure
Synapse
Delta Ingestion
• Ingest only the new or changed data since the last ingestion
• Use unique identifiers to track and compare changes
Supported Data Sources:
• Relational Databases: MySQL, PostgreSQL, Oracle, SQL Server
• NoSQL Databases: MongoDB, Cassandra, DynamoDB
• File Systems: HDFS, S3, Azure Blob or ADLS, Cloud Storage
Batch Window Ingestion
• Ingest data in fixed time interval (e.g. hourly, daily)
• Fetch data within the defined time window (see the sketch after this list)
Supported Data Sources:
• Relational Databases: MySQL, PostgreSQL, Oracle, SQL Server
• NoSQL Databases: MongoDB, Cassandra, DynamoDB
• File Systems: HDFS, S3, Azure Blob or ADLS, Cloud Storage
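A minimal sketch of a fixed daily window in SQL (the names and window bounds are hypothetical):

-- Ingest yesterday's batch: [2024-09-16, 2024-09-17)
SELECT *
FROM dbo.enrollments
WHERE event_time >= '2024-09-16 00:00:00'
  AND event_time <  '2024-09-17 00:00:00';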
Partition-based ingestion
• Ingest data based on partitions (e.g. date, region)
• Ingest data for each partition incrementally
Supported Data Sources:
• Relational Databases: MySQL, PostgreSQL, Oracle, SQL Server
• NoSQL Databases: MongoDB, Cassandra
• File Systems: HDFS, S3, Azure Blob or ADLS, Cloud Storage
• Data warehouse: Snowflake, Redshift, Google BigQuery, Azure
Synapse
End to End Fabric project
Online Learning platform
Dataset information

No. | Column | Description
1 | Student_ID | Unique identifier for each student.
2 | Name | Student's full name.
3 | Age | Student's age.
4 | Gender | Student's gender (M/F).
5 | Grade_Level | Student's current grade level.
6 | Course_ID | Unique identifier for each course.
7 | Course_Name | Name of the course.
8 | Enrollment_Date | Date the student enrolled in the course.
9 | Completion_Date | Date the student completed the course.
10 | Status | Current status of the student in the course (e.g., In Progress, Completed).
11 | Final_Grade | Final grade obtained by the student in the course.
12 | Attendance_Rate | Percentage of classes attended by the student.
13 | Time_Spent_on_Course (hrs) | Total hours spent by the student on the course.
Dataset information

No. | Column | Description
14 | Assignments_Completed | Number of assignments completed by the student.
15 | Quizzes_Completed | Number of quizzes completed by the student.
16 | Forum_Posts | Number of forum posts made by the student.
17 | Messages_Sent | Number of messages sent by the student.
18 | Quiz_Average_Score | Average score of all quizzes taken by the student.
19 | Assignment_Scores | Assignment scores by students.
20 | Assignment_Average_Score | Average score of all assignments completed by the student.
21 | Project_Score | Score of the final project completed by the student.
22 | Extra_Credit | Extra credit points earned by the student.
23 | Overall_Performance | Overall performance score considering all aspects of the course.
Dataset information

No. | Column | Description
24 | Feedback_Score | Average feedback score provided by the student for the course.
25 | Parent_Involvement | Level of parent involvement in the student's education (e.g., High, Medium, Low).
26 | Demographic_Group | Demographic group the student belongs to (e.g., Urban, Suburban, Rural).
27 | Internet_Access | Whether the student has access to the internet at home (Yes/No).
28 | Learning_Disabilities | Any learning disabilities the student may have.
29 | Preferred_Learning_Style | Student's preferred learning style (e.g., Visual, Auditory, Kinesthetic).
30 | Language_Proficiency | Proficiency level in the language of instruction (e.g., Beginner, Intermediate, Advanced).
31 | Participation_Rate | Percentage of active participation in class activities.
Fabric Project Architecture
(Diagram: LMS data flows through Landing → Bronze → Silver → Gold, feeding Power BI for data analysis and the data science workload.)
Raw to landing layer
(Diagram: LMS data arrives and is written to the Landing layer.)
Raw to landing layer
(Diagram: files arrive in Raw; each day's file is written to Landing under a ProcessingDate partition, e.g. ProcessingDate=2024-09-17 and ProcessingDate=2024-09-18.)
Steps for Raw to landing layer
1. Data arrives in the Raw layer as files named 'LMS_YYYY-MM-DD'.
2. Only one file is generated per day.
3. The arrived data needs to be written to the Landing zone.
4. While writing, the data is partitioned by today's date (e.g. 2024-09-17) as Processed_Date.
5. The Processed_Date column is used for partitioning the data (see the sketch below).
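A minimal sketch of the partitioned write in Spark SQL (the table names, the column subset, and the raw_lms_today view are hypothetical):

-- Landing table partitioned by the processing date
CREATE TABLE IF NOT EXISTS landing_lms (
    Student_ID INT,
    Course_ID INT,
    Final_Grade STRING,
    Processed_Date DATE
)
USING DELTA
PARTITIONED BY (Processed_Date);

-- Append today's file under today's partition
INSERT INTO landing_lms
SELECT Student_ID, Course_ID, Final_Grade, current_date() AS Processed_Date
FROM raw_lms_today;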
Landing to Bronze layer
(Diagram: on each run, e.g. today = 2024-09-17, the Landing partition for today's ProcessingDate is read and upserted into the bronze_data table in LH_Bronze.)
Understanding UPSERT to bronze
New data arriving from Landing is merged into the bronze table on the key (Std_ID, Course_ID):
1. A row arrives with Std_ID = 101, Name = "Michael brown", Course_ID = 71. No row with that key exists yet, so it is INSERTed.
2. The same key later arrives with Name = "Michael White". The key matches the existing row, so that row is UPDATEd in place.
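This insert-or-update step maps directly onto a Delta MERGE. A minimal sketch in Spark SQL (the table names follow the slides; the match key is the one described above):

MERGE INTO bronze_data AS tgt
USING new_data AS src
    ON  tgt.Std_ID    = src.Std_ID
    AND tgt.Course_ID = src.Course_ID
WHEN MATCHED THEN UPDATE SET *   -- existing key: overwrite the row
WHEN NOT MATCHED THEN INSERT *;  -- new key: insert the row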
Bronze to Silver layer
(Diagram: on each run, e.g. today = 2024-09-17, rows of bronze_data with ProcessingDate = today are cleaned, transformed, and upserted into silver_data in LH_Silver.)
Bronze to Silver layer

Data Cleaning
1. Handle duplicates
2. Handle missing or NULL values:
   - Delete rows with missing critical values
   - Fill other columns with default values
3. Standardize date formats
4. Check for logical consistency
Bronze to Silver layer

Business transformations

Column Name | Logic
Completion_Time_Days | Completion_Date – Enrollment_Date
Performance_Score | (Quiz_Average_Score * 0.2) + (Assignment_Average_Score * 0.2) + (Project_Score * 0.1)
Course_Completion_Rate | If Completion_Time_Days <= 90 then "On-time" else "Delayed"
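A minimal sketch of these transformations in Spark SQL (the column names are from the dataset description; the source table name follows the slides):

SELECT
    *,
    datediff(Completion_Date, Enrollment_Date) AS Completion_Time_Days,
    (Quiz_Average_Score * 0.2)
      + (Assignment_Average_Score * 0.2)
      + (Project_Score * 0.1) AS Performance_Score,
    CASE WHEN datediff(Completion_Date, Enrollment_Date) <= 90
         THEN 'On-time' ELSE 'Delayed'
    END AS Course_Completion_Rate
FROM bronze_data;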
Understanding UPSERT to Silver
The cleaned and transformed data from LH_Bronze is merged into the silver table on the same key (Std_ID, Course_ID):
1. A row arrives with Std_ID = 311, Location = Ohio, Course_ID = 52. No row with that key exists, so it is INSERTed.
2. The same key later arrives with Location = London, so the existing row is UPDATEd (Ohio → London).
The same MERGE pattern shown for the bronze layer applies here.
Silver to Gold layer
(Diagram: on each run, e.g. today = 2024-09-17, rows of silver_data with ProcessingDate = today are modelled into fact and dimension tables in LH_Gold.)
Data modelling
(Star schema: fact_student_performance joins to dim_student on Student_ID and to dim_course on Course_ID.)
Git Integration
End to End flow
(The Raw → Landing → Bronze → Silver → Gold flow is built and run in the DEV workspace, then the same flow is deployed to the PROD workspace.)
Supported items for Git integration
• Data pipelines
• Lakehouse
• Notebooks
• Paginated reports
• Reports (except reports connected to semantic models hosted in Azure Analysis
Services, SQL Server Analysis Services or reports exported by Power BI Desktop
that depend on semantic models hosted in MyWorkspace)
• Semantic models (except push datasets, live connections to Analysis Services,
model v1).
• Spark Job Definitions
• Spark environment
• Warehouses
Limitations of enabling Git
• Currently, only Git in Azure Repos with the same tenant as the Fabric
tenant is supported.
• If the workspace and Git repo are in two different geographical
regions, the tenant admin must enable cross-geo exports.
• Azure DevOps on-prem isn't supported.
• Sovereign clouds aren't supported.
Permissions in workspace
• Admin: Can perform any operation on the workspace, limited only by
their Azure DevOps role.
• Member/Contributor: Once they connect to a workspace, a
member/contributor can commit and update changes, depending on
their Azure DevOps role. For actions related to the workspace
connection (for example, connect, disconnect, or switch branches)
seek help from an Admin.
• Viewer: Can't perform any actions. The viewer can't see any Git
related information in the workspace.
Operation | Workspace role | Git permissions
Connect workspace to Git repo | Admin | Read=Allow
Sync workspace with Git repo | Admin | Read=Allow
Disconnect workspace from Git repo | Admin | No permissions are needed
Switch branch in the workspace (or any change in connection setting) | Admin | Read=Allow (in target repo/directory/branch)
View Git connection details | Admin, Member, Contributor | Read or None
See workspace 'Git status' | Admin, Member, Contributor | Read=Allow
Operation | Workspace role | Git permissions
Update from Git | All of the following: Contributor in the workspace (WRITE permission on all items); owner of the item (if the tenant switch blocks updates for non-owners); BUILD on external dependencies (where applicable) | Read=Allow
Commit workspace changes to Git | All of the following: Contributor in the workspace (WRITE permission on all items); owner of the item (if the tenant switch blocks updates for non-owners); BUILD on external dependencies (where applicable) | Read=Allow; Contribute=Allow; branch policy should allow direct commit
Create new Git branch from within Fabric | Admin | Role=Write; Create branch=Allow
Considerations with Azure DevOps
• The Azure DevOps account must be registered to the same user that is using the
Fabric workspace.
• Power BI Datasets connected to Analysis Services aren't supported at this time.
Continuous Integration
(Flow: a feature workspace is connected to a /feature branch in Azure DevOps. A notebook change is committed to the feature branch, a pull request is opened against /main, and once approved and merged, the Dev workspace, which is connected to /main, picks up the change.)
Changed items are listed with an icon indicating the status:
• new
• modified
• deleted
• conflict
Deployment pipeline (Continuous deployment)
Continuous Deployment
(A deployment pipeline promotes content from the Dev workspace to the Prod workspace. Each stage points at its own storage: the Dev workspace uses Dev storage and the Prod workspace uses Prod storage.)
Deployment rules
Change data sources after deployment: the Prod stage might point to different datasets.

Item | Data source rule | Parameter rule | Default lakehouse rule | Details
Dataflow | ✅ | ✅ | ❌ | Use to determine the values of the data sources or parameters for a specific dataflow.
Semantic model | ✅ | ✅ | ❌ | Use to determine the values of the data sources or parameters for a specific semantic model.
Datamart | ✅ | ✅ | ❌ | Use to determine the values of the data sources or parameters for a specific datamart.
Paginated report | ✅ | ❌ | ❌ | Defined for the data sources of each paginated report. Use to determine the data sources of the paginated report.
Notebook | ❌ | ❌ | ✅ | Use to determine the default lakehouse for a specific notebook.
Supported Items for deployment
When you deploy content from one pipeline stage to another, the copied
content can contain the following items:
• Data pipelines
• Dataflows Gen1
• Datamarts
• Lakehouse
• Notebooks
• Paginated reports
• Reports (based on supported semantic models)
• Spark environment
• Semantic models (except for DirectLake semantic models)
• Warehouses
Version control for Power BI items
(Flow: clone the Azure DevOps repo, create a feature branch, save the report as a .pbip file into the repo from Visual Studio, commit and push the changes, then raise a PR to main in Azure DevOps.)
Congratulations on completing the Course !