
Using Lakehouse data at scale with Power BI


Featuring Power BI
Direct Lake mode!
Stijn Wynants
Benni De Jagere

Benni De Jagere
Senior Program Manager | Fabric Customer Advisory Team (FabricCAT)

Fabric CAT
.be Member
@BenniDeJagere
/bennidejagere
#SayNoToPieCharts
Stijn Wynants
Senior Customer Engineer | FastTrack Engineering

FastTrack
.be Member
@SQLStijn
/stijn-wynants-ba528660/
/Stijn-wynants
Fabric Espresso
#OneMore?
Disclaimer: We’re not benchmarking
Session Objectives

Introduce Fabric and OneLake
Set the scene for Direct Lake
Take it for a spin… ☺
Introducing Fabric
Microsoft Fabric
The unified data platform for the era of AI

[Diagram: the Fabric workloads on OneLake: Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehousing, Synapse Real-Time Analytics, Data Activator, and Power BI]

OneLake
One Copy for all computes

• Real separation of compute and storage
• All the compute engines store their data automatically in OneLake
• The data is stored in a single common format
• Delta – Parquet, an open standards format, is the storage format for all tabular data in Microsoft Fabric
• Once data is stored in the lake, it is directly accessible by all the engines without needing any import/export
• All the compute engines have been fully optimized to work with Delta – Parquet as their native format
• A shared universal security model is enforced across all the engines

[Diagram: the Fabric workloads and compute services (Spark, T-SQL, KQL, Analysis Services) all reading the same Delta – Parquet data (e.g. Customers 360, Service Telemetry, Business KPIs, Finance) in OneLake]
"Direct Query Mode"
[Diagram: DAX queries from Power BI Analysis Services are translated into SQL queries against the Warehouse/Lakehouse, which scans the database files in storage at query time]
Slow, but real time

"Import Mode"
[Diagram: tables are imported from the Warehouse/Lakehouse into Power BI Analysis Services, which keeps its own copy of the tables; DAX queries scan that copy]
Latent & duplicative, but fast

"Direct Lake Mode"
[Diagram: DAX queries from Power BI Analysis Services scan the Parquet/Delta Lake files in OneLake directly; no SQL translation and no copy of the tables]
Perfect!
Why Delta (Parquet)?

• Open standard for data format
• Column oriented, efficient data storage and retrieval
• Efficient data compression and encoding
• Becoming the industry standard
• Well suited for pruning (column, rowgroup)
• Thrives on bulk operations
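To make this concrete, here's a minimal sketch of landing tabular data as a Delta table from a Fabric notebook. The `spark` session is the one preconfigured in Fabric notebooks, and the table name and rows are illustrative, matching the "Inside Delta (Parquet)" example below:

```python
from pyspark.sql import Row

# Illustrative rows matching the "Inside Delta (Parquet)" slide
df = spark.createDataFrame([
    Row(StoreID="StoreA", DateTime="2023-01-01", ProductID="SKU001", Value=10),
    Row(StoreID="StoreA", DateTime="2023-01-02", ProductID="SKU001", Value=15),
    Row(StoreID="StoreA", DateTime="2023-01-03", ProductID="SKU001", Value=12),
])

# Stored as Delta – Parquet files, registered as a lakehouse table
df.write.format("delta").mode("overwrite").saveAsTable("demo_sales")
```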


Inside Delta (Parquet)

Header:
RowGroup1:
  StoreID: StoreA, StoreA, StoreA
  DateTime: 2023-01-01, 2023-01-02, 2023-01-03
  ProductID: SKU001, SKU001, SKU001
  Value: 10, 15, 12
RowGroup2:
  …
Footer:
Inside Delta (Parquet) – Dictionary IDs

Header:
RowGroup1:
  StoreID: 1, 1, 1
  DateTime: 1, 2, 3
  ProductID: 1, 1, 1
  Value: 1, 2, 3
RowGroup2:
  …
Footer:
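You can inspect these row groups and encodings yourself. A minimal sketch using pyarrow; the file path is hypothetical, so point it at any .parquet file underneath a lakehouse table:

```python
import pyarrow.parquet as pq

# Hypothetical path: any parquet file from a lakehouse Delta table
pf = pq.ParquetFile("demo_sales/part-00000.parquet")
meta = pf.metadata

print(f"row groups: {meta.num_row_groups}")
rg = meta.row_group(0)
for i in range(rg.num_columns):
    col = rg.column(i)
    # A dictionary encoding in `encodings` means the column stores
    # dictionary IDs, as in the slide above
    print(col.path_in_schema, col.encodings, col.total_compressed_size)
```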
Introducing V-Ordering

• Write-time optimization to parquet files
  Sorting, row group distribution, dictionary encoding, and compression (shuffling)
• Complies with the open standard
• Z-Order, compaction, vacuum, time travel, etc. are orthogonal to V-Order
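A minimal sketch of turning V-Order on for Spark writes in Fabric. The config and table property names are the ones in the delta-optimization-and-v-order doc linked in the resources, and may change across Fabric runtimes:

```python
# Session level: all Parquet/Delta writes in this Spark session use V-Order
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

# Table level: mark a single table so writers apply V-Order to it
spark.sql(
    "ALTER TABLE demo_sales "
    "SET TBLPROPERTIES ('delta.parquet.vorder.enabled' = 'true')"
)
```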
V-ordering in action
Microsoft Internal DB (162 tables)

CSV       Parquet   V-Order
880 GB    268 GB    84 GB

x3.2 reduction vs. plain Parquet; reduced IO for workloads
V-ordering in our demo case
STOP! Demo time!
Using Direct Lake mode over a Lakehouse
Direct Lake Mode
• On start, no data is loaded in memory
• Column data is transcoded from Parquet files when queried
• Multi-column tables can have a mix of transcoded (resident) and non-resident columns
• Column data can get evicted over time
• Direct Lake falls back to SQL (DirectQuery) for suitable sub-queries
• "Framing" of the dataset determines what gets loaded from the Delta Lake
DQ Fallback
[Diagram: a DAX/MDX query arrives at the Direct Lake dataset; if fallback is needed, it is answered via DirectQuery; otherwise VertiScan transcodes the needed columns (e.g. Trips[Duration], Station, Bike) on demand from the lakehouse Delta Lake parquet files (Trips001.parquet, Trips002.parquet, Trips003.parquet, DimBike001.parquet)]
Framing

What is framing?
A "point in time" way of tracking what data can be queried by Direct Lake

Why is this important?
Delta Lake data is transient for many reasons

ETL process:
• Ingest data into Delta Lake tables
• Transform as needed using your preferred tool
• When ready, perform a framing operation on the dataset (sketch below)
• Framing is near instant and acts like a cursor: it determines the set of .parquet files to use/ignore for transcoding operations
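In practice, a framing operation is triggered like any dataset refresh. A minimal sketch using the standard Power BI REST refresh endpoint; the IDs and token are placeholders:

```python
import requests

workspace_id = "<workspace-guid>"  # placeholder
dataset_id = "<dataset-guid>"      # placeholder
token = "<aad-access-token>"       # placeholder: acquire via MSAL / service principal

# For a Direct Lake dataset this refresh copies no data: it re-frames the
# dataset, pinning the current set of .parquet files from the Delta log
resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}/datasets/{dataset_id}/refreshes",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
```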
Framing

[Diagram: source data batches (1,2,3 then 4,5,6 then 7,8,9) land as parquet files in the Delta Lake (ADLS); after each framing operation ("Full Refresh" 1, 2, 3) the Power BI dataset sees only the files framed so far, so EVALUATE 'Table' returns all values 1 through 9 only after the third framing]
STOP! Demo time!
Let’s look at Framing
Optimizing Delta for Direct Lake mode

• V-Order makes a big difference, as it's tailored for VertiScan
• Direct Lake will work over Shortcuts to external data
  Expect a performance impact, because reasons…
• Direct Lake thrives on fewer, larger .parquet files
  Physical structure will always be crucial
  OPTIMIZE (bin compaction) and VACUUM in the Data Engineering process will be key (sketch below)
  Especially with streaming/small-batch architectures, keep this in mind
• The principle of lean models still applies
  Only include what's needed for the reports and datasets
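A minimal maintenance sketch from a Fabric notebook, assuming a lakehouse table named `Trips`; the VORDER clause is the Fabric Spark extension described in the linked docs:

```python
# Bin compaction: rewrite many small files into fewer, larger ones,
# applying V-Order for a VertiScan-friendly layout (Fabric Spark extension)
spark.sql("OPTIMIZE Trips VORDER")

# Remove files no longer referenced by the Delta log and older than 7 days,
# so framing/transcoding only ever sees the compacted files
spark.sql("VACUUM Trips RETAIN 168 HOURS")
```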
Common Answers to Common Questions
"Greatest Hits"

• Delta doesn't like spaces in object names ☺
• Delta tables are a hard requirement for Direct Lake mode
  Dataflows Gen2, Pipelines, and Notebooks can create them for you in the lakehouse
• Web modelling is the only way to use Direct Lake for now
• XMLA Read/Write is not yet supported
  No External Tools, Calc Groups, …
• Direct Lake doesn't have unique DAX limitations
  The DirectQuery fallback does…
• No confirmed plans right now to support Apache Iceberg, Hudi, …
• No, you can't have Copilot yet
What does this mean for my data modelling?
Thanks, @KoVer!

"Data should be transformed as far upstream as possible, and as far downstream as necessary."

Matthew Roche, 2021
(The purple-haired sword aficionado)
https://ssbipolar.com/2021/05/31/roches-maxim
Resources

https://learn.microsoft.com/en-us/power-bi/enterprise/directlake-overview
https://learn.microsoft.com/en-us/power-bi/enterprise/directlake-analyze-qp
https://learn.microsoft.com/en-us/fabric/data-engineering/lakehouse-pbi-reporting
https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order?tabs=sparksql
https://fabric.guru/power-bi-direct-lake-mode-frequently-asked-questions
https://www.fourmoo.com/2023/05/24/using-power-bi-directlake-in-microsoft-fabric/

Slides
https://github.com/BenniDeJagere/Presentations/{Year}/{YYYYMMDD}_{Event}
Thank you
