Lakehouse Data at Scale with Power BI
Benni De Jagere
Senior Program Manager | Fabric Customer Advisory Team (Fabric CAT)
Fabric CAT
.be Member
@BenniDeJagere
/bennidejagere
/bennidejagere
/bennidejagere
#SayNoToPieCharts
Stijn Wynants
Senior Customer Engineer | FastTrack Engineering
FastTrack
.be Member
@SQLStijn
/stijn-wynants-ba528660/
/Stijn-wynants
Fabric Espresso
#OneMore?
Disclaimer: We’re not benchmarking
Session Objectives
OneLake
One copy for all computes
Real separation of compute and storage
Workloads: Data Factory | Synapse Data Engineering | Synapse Data Science | Synapse Data Warehousing | Real-Time Analytics | Data Activator | Power BI
The data is stored in a single, common format
Compute services (serverless): Spark | T-SQL | KQL | Analysis Services
Storage: OneLake
Once data is stored in the lake, it is directly accessible by all the engines without needing any import/export
“Import Mode”
Power BI imports the Lakehouse/Warehouse tables (database files) into a copy inside the Analysis Services engine; reports then run DAX queries as scans against that in-memory copy.
Latent and duplicative, but fast
“Direct Query Mode”
Reports run DAX queries against Power BI Analysis Services, which translates them into SQL queries scanning the Lakehouse/Warehouse tables directly; no copy of the data is kept.
Slow, but real-time
With the tables stored as Parquet/Delta Lake in OneLake, reports can read the Lakehouse data directly. Perfect!
Why Delta (Parquet)?
Header:
  RowGroup1:
    StoreID: StoreA, StoreA, StoreA
    DateTime: 2023-01-01, 2023-01-02, 2023-01-03
    ProductID: SKU001, SKU001, SKU001
    Value: 10, 15, 12
  RowGroup2:
    …
Footer:
Inside Delta (Parquet) – Dictionary IDs
Header:
  RowGroup1:
    StoreID: 1, 1, 1
    DateTime: 1, 2, 3
    ProductID: 1, 1, 1
    Value: 1, 2, 3
  RowGroup2:
    …
Footer:
Introducing V-Order
V-Order in our demo case: ×3.2 reduced IO for workloads
STOP! Demo time!
Using Direct Lake mode over a Lakehouse
Direct Lake Mode
• On start, no data is loaded in-memory
• Column data is transcoded from Parquet files when queried
• Multi-column tables can have a mix of transcoded (resident) and non-resident columns
• Column data can get evicted over time
• Direct Lake falls back to the SQL endpoint (DirectQuery) for suitable sub-queries
• “Framing” of the dataset determines what gets loaded from Delta Lake
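The resident/non-resident behaviour in the bullets above can be mimicked with a tiny column cache. This is purely a conceptual sketch; `ColumnCache` and its LRU eviction are our illustration, not a real Fabric API or the actual eviction policy:

```python
# Conceptual sketch of Direct Lake's on-demand column loading and
# eviction: columns start non-resident, are "transcoded" into memory
# on first query, and can be evicted again later.
from collections import OrderedDict

class ColumnCache:
    def __init__(self, capacity, loader):
        self.capacity = capacity        # max resident columns
        self.loader = loader            # transcodes a column from Parquet
        self.resident = OrderedDict()   # column name -> data, LRU order

    def get(self, name):
        if name in self.resident:       # already transcoded (resident)
            self.resident.move_to_end(name)
        else:                           # transcode on demand
            if len(self.resident) >= self.capacity:
                self.resident.popitem(last=False)  # evict least-recently-used
            self.resident[name] = self.loader(name)
        return self.resident[name]

cache = ColumnCache(capacity=2, loader=lambda name: f"<{name} data>")
cache.get("Duration")        # transcoded
cache.get("Station")         # transcoded
cache.get("Bike")            # transcoded; "Duration" is evicted
print(list(cache.resident))  # ['Station', 'Bike']
```

The point of the sketch: a query only pays the transcoding cost for the columns it actually touches, and memory pressure decides what stays resident.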
Diagram: in Direct Lake mode, the dataset’s Verti-Scan engine transcodes columns (Duration, Station, Bike) on demand from the Lakehouse Delta Lake parquet files (Trips002.parquet, Trips003.parquet, DimBike001.parquet) selected by framing, with DQ fallback when needed.
What is framing?
A “point in time” way of tracking what data can be queried by Direct Lake.
Why is this important?
Delta Lake data is transient for many reasons.
ETL process:
• Ingest data into Delta Lake tables
• Transform as needed using your preferred tool
• When ready, perform a framing operation on the dataset
• Framing is near-instant and acts like a cursor: it determines the set of .parquet files to use/ignore for transcoding operations
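The cursor behaviour can be sketched as a pinned list of parquet files. `DeltaTable` and `FramedDataset` below are our illustrative names, not the Fabric or Delta Lake API:

```python
# Conceptual sketch of framing: a Delta table keeps accumulating
# parquet files, but the dataset only "sees" the set of files that
# existed at the moment it was last framed.

class DeltaTable:
    def __init__(self):
        self.files = []                  # parquet files in the delta log

    def ingest(self, filename):
        self.files.append(filename)

class FramedDataset:
    def __init__(self, table):
        self.table = table
        self.framed_files = []

    def frame(self):
        # Near-instant: just pin the current file list, like a cursor.
        self.framed_files = list(self.table.files)

    def queryable_files(self):
        return self.framed_files

table = DeltaTable()
dataset = FramedDataset(table)

table.ingest("Trips002.parquet")
dataset.frame()
table.ingest("Trips003.parquet")  # ingested after framing

print(dataset.queryable_files())  # ['Trips002.parquet']
dataset.frame()
print(dataset.queryable_files())  # ['Trips002.parquet', 'Trips003.parquet']
```

This is why a mid-flight ETL run never shows half-written data in reports: queries stick to the previously framed file set until you explicitly frame again.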
Framing
Diagram: source data (ADLS parquet files) lands in Delta Lake and is framed into the Power BI dataset. Three loads arrive (1,2,3; then 4,5,6; then 7,8,9), each followed by a full refresh; after the third refresh, EVALUATE ‘Table’ returns Value = 1 through 9.
STOP! Demo time!
Let’s look at Framing
Optimizing Delta for Direct Lake mode
• V-Order makes a big difference, as it’s tailored for Verti-Scan
• Direct Lake will work over shortcuts to external data
  • Expect a performance impact, because reasons…
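Since V-Order matters this much, it is worth showing where the switch lives. A hedged sketch for a Fabric Spark notebook; the setting and statement names are taken from the Fabric docs at the time of writing (verify against current docs, as they have been renamed across runtimes), and `lakehouse.Trips` is a hypothetical table name:

```python
# Fabric Spark notebook sketch (requires a Fabric Spark session, so the
# `spark` object is assumed to exist):

# Enable V-Order for new parquet writes in this session.
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

# Compact/rewrite an existing Delta table with V-Order applied.
spark.sql("OPTIMIZE lakehouse.Trips VORDER")
```

New writes pick up the session setting; the OPTIMIZE pass is what retrofits V-Order onto files that were written without it (for example, data arriving through a shortcut you have copied in).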