
Using Lakehouse data at scale with Power BI


Featuring Power BI
Direct Lake mode!
Stijn Wynants
Benni De Jagere

Benni De Jagere
Senior Program Manager | Fabric Customer Advisory Team (FabricCAT)

Fabric CAT
.be Member
@BenniDeJagere
/bennidejagere
#SayNoToPieCharts
Stijn Wynants
Senior Customer Engineer | FastTrack Engineering

FastTrack
.be Member
@SQLStijn
/stijn-wynants-ba528660/
/Stijn-wynants
Fabric Espresso
#OneMore?
Disclaimer: We’re not benchmarking
Session Objectives

Introduce Fabric and OneLake
Set the scene for Direct Lake
Take it for a spin… ☺
Introducing Fabric
Microsoft Fabric
The unified data platform for the era of AI

[Diagram: the Fabric workloads on OneLake: Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehousing, Synapse Real-Time Analytics, Data Activator, and Power BI]

OneLake
One Copy for all computes

• Real separation of compute and storage
• All the compute engines store their data automatically in OneLake
• The data is stored in a single common format
• Delta – Parquet, an open standards format, is the storage format for all tabular data in Microsoft Fabric
• Once data is stored in the lake, it is directly accessible by all the engines without needing any import/export
• All the compute engines have been fully optimized to work with Delta – Parquet as their native format
• A shared universal security model is enforced across all the engines

[Diagram: the Fabric workloads and compute services (Spark, T-SQL, KQL, Analysis Services) all reading the same Delta – Parquet data (e.g. Customers 360, Service Telemetry, Business KPIs, Finance) in OneLake]
"Direct Query Mode"
[Diagram: DAX queries from Power BI Analysis Services are translated into SQL queries against the Warehouse/Lakehouse, which scans the database files in storage at query time]
Slow, but real time

"Import Mode"
[Diagram: tables are imported from the Warehouse/Lakehouse into Power BI Analysis Services, which keeps its own copy of the tables; DAX queries scan that copy]
Latent & duplicative, but fast

"Direct Lake Mode"
[Diagram: DAX queries from Power BI Analysis Services scan the Parquet/Delta Lake files in OneLake directly; no SQL translation and no copy of the tables]
Perfect!
Why Delta (Parquet)?

• Open standard for data format
• Column oriented, efficient data storage and retrieval
• Efficient data compression and encoding
• Becoming the industry standard
• Well suited for pruning (column, rowgroup)
• Thrives on bulk operations
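To make this concrete, here's a minimal sketch of landing tabular data as a Delta table from a Fabric notebook. The `spark` session is the one preconfigured in Fabric notebooks, and the table name and rows are illustrative, matching the "Inside Delta (Parquet)" example below:

```python
from pyspark.sql import Row

# Illustrative rows matching the "Inside Delta (Parquet)" slide
df = spark.createDataFrame([
    Row(StoreID="StoreA", DateTime="2023-01-01", ProductID="SKU001", Value=10),
    Row(StoreID="StoreA", DateTime="2023-01-02", ProductID="SKU001", Value=15),
    Row(StoreID="StoreA", DateTime="2023-01-03", ProductID="SKU001", Value=12),
])

# Stored as Delta – Parquet files, registered as a lakehouse table
df.write.format("delta").mode("overwrite").saveAsTable("demo_sales")
```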


Inside Delta (Parquet)

Header:
RowGroup1:
  StoreID: StoreA, StoreA, StoreA
  DateTime: 2023-01-01, 2023-01-02, 2023-01-03
  ProductID: SKU001, SKU001, SKU001
  Value: 10, 15, 12
RowGroup2:
  …
Footer:
Inside Delta (Parquet) – Dictionary IDs

Header:
RowGroup1:
  StoreID: 1, 1, 1
  DateTime: 1, 2, 3
  ProductID: 1, 1, 1
  Value: 1, 2, 3
RowGroup2:
  …
Footer:
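You can inspect these row groups and encodings yourself. A minimal sketch using pyarrow; the file path is hypothetical, so point it at any .parquet file underneath a lakehouse table:

```python
import pyarrow.parquet as pq

# Hypothetical path: any parquet file from a lakehouse Delta table
pf = pq.ParquetFile("demo_sales/part-00000.parquet")
meta = pf.metadata

print(f"row groups: {meta.num_row_groups}")
rg = meta.row_group(0)
for i in range(rg.num_columns):
    col = rg.column(i)
    # A dictionary encoding in `encodings` means the column stores
    # dictionary IDs, as in the slide above
    print(col.path_in_schema, col.encodings, col.total_compressed_size)
```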
Introducing V-Ordering

• Write-time optimization to parquet files
  Sorting, row group distribution, dictionary encoding, and compression (shuffling)
• Complies with the open standard
• Z-Order, compaction, vacuum, time travel, etc. are orthogonal to V-Order
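A minimal sketch of turning V-Order on for Spark writes in Fabric. The config and table property names are the ones in the delta-optimization-and-v-order doc linked in the resources, and may change across Fabric runtimes:

```python
# Session level: all Parquet/Delta writes in this Spark session use V-Order
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

# Table level: mark a single table so writers apply V-Order to it
spark.sql(
    "ALTER TABLE demo_sales "
    "SET TBLPROPERTIES ('delta.parquet.vorder.enabled' = 'true')"
)
```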
V-ordering in action
Microsoft Internal DB (162 tables)

CSV       Parquet   V-Order
880 GB    268 GB    84 GB

x3.2 reduction vs. plain Parquet; reduced IO for workloads
V-ordering in our demo case
STOP! Demo time!
Using Direct Lake mode over a Lakehouse
Direct Lake Mode
• On start, no data is loaded in memory
• Column data is transcoded from Parquet files when queried
• Multi-column tables can have a mix of transcoded (resident) and non-resident columns
• Column data can get evicted over time
• Direct Lake falls back to SQL (DirectQuery) for suitable sub-queries
• "Framing" of the dataset determines what gets loaded from the Delta Lake
DQ Fallback
[Diagram: a DAX/MDX query arrives at the Direct Lake dataset; if fallback is needed, it is answered via DirectQuery; otherwise VertiScan transcodes the needed columns (e.g. Trips[Duration], Station, Bike) on demand from the lakehouse Delta Lake parquet files (Trips001.parquet, Trips002.parquet, Trips003.parquet, DimBike001.parquet)]
Framing

What is framing?
A "point in time" way of tracking what data can be queried by Direct Lake

Why is this important?
Delta Lake data is transient for many reasons

ETL process:
• Ingest data into Delta Lake tables
• Transform as needed using your preferred tool
• When ready, perform a framing operation on the dataset (sketch below)
• Framing is near instant and acts like a cursor: it determines the set of .parquet files to use/ignore for transcoding operations
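In practice, a framing operation is triggered like any dataset refresh. A minimal sketch using the standard Power BI REST refresh endpoint; the IDs and token are placeholders:

```python
import requests

workspace_id = "<workspace-guid>"  # placeholder
dataset_id = "<dataset-guid>"      # placeholder
token = "<aad-access-token>"       # placeholder: acquire via MSAL / service principal

# For a Direct Lake dataset this refresh copies no data: it re-frames the
# dataset, pinning the current set of .parquet files from the Delta log
resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}/datasets/{dataset_id}/refreshes",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
```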
Framing

[Diagram: source data batches (1,2,3 then 4,5,6 then 7,8,9) land as parquet files in the Delta Lake (ADLS); after each framing operation ("Full Refresh" 1, 2, 3) the Power BI dataset sees only the files framed so far, so EVALUATE 'Table' returns all values 1 through 9 only after the third framing]
STOP! Demo time!
Let’s look at Framing
Optimizing Delta for Direct Lake mode

• V-Order makes a big difference, as it's tailored for VertiScan
• Direct Lake will work over Shortcuts to external data
  Expect a performance impact, because reasons…
• Direct Lake thrives on fewer, larger .parquet files
  Physical structure will always be crucial
  OPTIMIZE (bin compaction) and VACUUM in the Data Engineering process will be key (sketch below)
  Especially with streaming/small-batch architectures, keep this in mind
• The principle of lean models still applies
  Only include what's needed for the reports and datasets
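A minimal maintenance sketch from a Fabric notebook, assuming a lakehouse table named `Trips`; the VORDER clause is the Fabric Spark extension described in the linked docs:

```python
# Bin compaction: rewrite many small files into fewer, larger ones,
# applying V-Order for a VertiScan-friendly layout (Fabric Spark extension)
spark.sql("OPTIMIZE Trips VORDER")

# Remove files no longer referenced by the Delta log and older than 7 days,
# so framing/transcoding only ever sees the compacted files
spark.sql("VACUUM Trips RETAIN 168 HOURS")
```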
Common Answers to Common Questions
"Greatest Hits"

• Delta doesn't like spaces in object names ☺
• Delta tables are a hard requirement for Direct Lake mode
  Dataflows Gen2, Pipelines, and Notebooks can create them for you in the lakehouse
• Web modelling is the only way to use Direct Lake for now
• XMLA Read/Write is not yet supported
  No External Tools, Calc Groups, …
• Direct Lake doesn't have unique DAX limitations
  The DirectQuery fallback does…
• No confirmed plans right now to support Apache Iceberg, Hudi, …
• No, you can't have Copilot yet
What does this mean for my data modelling?
Thanks, @KoVer!

"Data should be transformed as far upstream as possible, and as far downstream as necessary."

Matthew Roche, 2021
(The purple-haired sword aficionado)
https://ssbipolar.com/2021/05/31/roches-maxim
Resources

https://learn.microsoft.com/en-us/power-bi/enterprise/directlake-overview
https://learn.microsoft.com/en-us/power-bi/enterprise/directlake-analyze-qp
https://learn.microsoft.com/en-us/fabric/data-engineering/lakehouse-pbi-reporting
https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order?tabs=sparksql
https://fabric.guru/power-bi-direct-lake-mode-frequently-asked-questions
https://www.fourmoo.com/2023/05/24/using-power-bi-directlake-in-microsoft-fabric/

Slides
https://github.com/BenniDeJagere/Presentations/{Year}/{YYYYMMDD}_{Event}
Thank you
