Power BI on Databricks Best Practices Cheat Sheet (April 2025)
This document summarizes the most relevant best practices when using Power BI with Databricks, organized by phase: data preparation, SQL serving, Power BI integration, and Power BI report design.
PHASE: DATA PREPARATION
Read if you are: preparing the dataset that will be served for the report.
Your objective: ■ Implement an efficient data model ■ Optimize the storage layer for query performance

PHASE: SQL SERVING
Read if you are: setting up and configuring the Databricks SQL Warehouse to which Power BI connects.
Your objective: ■ Meet performance SLA and scalability ■ Keep costs under control

PHASE: POWER BI INTEGRATION
Read if you are: configuring the integration between Power BI and Databricks.
Your objective: ■ Optimize semantic model performance ■ Leverage unified governance

PHASE: POWER BI REPORT DESIGN
Read if you are: designing semantic models and reports in Power BI.
Your objective: ■ Create fast and efficient reports and dashboards ■ Inherit the optimizations performed in all the other layers
Best practices: What you should do

DATA PREPARATION
■ Adopt the medallion architecture on Delta Lake (link) and serve the Gold layer only (link) → leverage the benefits of Delta and aggregated data for better report performance.
■ When designing your data model, opt for a star schema where possible (link, link) → Power BI performs better.
■ Leverage SQL views and persisted tables with the required granularity (e.g., date/time) → improve performance by pre-aggregating data for the most common and resource-intensive queries.
■ Declare primary and foreign keys (link) and use RELY (link) where possible → Power BI can leverage this information to automatically create table relationships, and the Databricks SQL engine can optimize queries using PK constraints.
■ Avoid wide data types and high-cardinality columns → reduce Power BI semantic model size and improve query performance.
■ Use auto-generated columns when you need to derive a value from other existing columns (link) → persisted columns minimize the need to calculate values at query time.
■ Use Liquid Clustering (link); alternatively, use Z-ordering (link) and consider partitioning larger tables (>1 TB) (link) → improve query performance through efficient file and data skipping.
■ Optimize your data layout by using predictive optimization (link) or by running VACUUM (link) and OPTIMIZE (link) → improve performance by deleting old files and optimizing the physical data layout.
■ Periodically compute statistics (link) or use automatic statistics (currently in preview, link) → improve performance by enabling a more efficient join strategy.
■ Evaluate materialized views (link) to calculate results incrementally from the latest data in the source tables → improve performance by leveraging precomputed results.

SQL SERVING
■ Always use a SQL warehouse (not an all-purpose cluster) (link) → SQL warehouses are optimized for BI workloads.
■ Use a Serverless SQL warehouse (link) → offers optimal price/performance and delivers instant, elastic compute. Leverage the Serverless query result cache, which persists even when the SQL warehouse is scaled or restarted.
■ Enable SQL warehouse Auto Stop (scale-to-zero) only if the SLA permits; the SQL warehouse takes just 5–10 s to start up when the first query arrives → save costs while the warehouse is not used.
■ Use a higher cluster size for larger datasets (link) → the larger the cluster (M, L, XL, etc.), the faster complex queries run. If you have only simple, short-running queries, don't increase the size (it might be slower due to data shuffling).
■ Use SQL warehouse scaling (min/max cluster count) (link) → handle more concurrent users and queries. The SQL warehouse scales out to handle the increased workload; when hitting limits, queries get queued, not rejected.
■ If you expect many concurrent queries, increase the minimum number of clusters → prevent queries from queueing while the warehouse scales out.
■ Use the same SQL warehouse whenever the same dataset is queried → leverage the various layers of caching available, increasing performance.
■ Use separate SQL warehouses for different workloads and/or business units → right-size SQL warehouses to achieve better performance at reasonable cost.
■ If unsure about sizing, start with a Medium SQL warehouse scaling between 1 and 10 clusters; monitor query response time and scaling → then adjust the sizing based on the observed results.
■ Leverage the Query History (link) and SQL warehouse events (link) system tables to programmatically monitor SQL warehouses and query performance → identify performance bottlenecks and issues, perform more detailed analysis, and set up alerts.

POWER BI INTEGRATION
■ Ensure that Power BI and Databricks are hosted as closely as possible, ideally in the same region → minimize network latency and possibly avoid cross-region traffic costs.
■ Use the most appropriate Power BI storage mode: DirectQuery for fact tables and Dual for dimension tables (NOT Import) (link) → let Power BI generate more efficient SQL queries.
■ Evaluate where and how to use composite models (link) → allow mixed usage of DirectQuery, Dual, and Import mode tables, as well as aggregation and hybrid tables.
■ Use hybrid tables whenever you need aggregated historical data augmented with detailed real-time data in the same table (link) → combine efficient, fast in-memory queries with the latest data changes directly from the source.
■ In Import mode, use table partitioning (link); alternatively, use incremental refresh (link) → import data faster and manage larger datasets.
■ Check the query parallelization configuration settings (link) → improve query parallelization and maximize utilization of the SQL warehouse to improve overall performance.
■ Connect Power BI to Databricks using single sign-on (SSO) (link) → leverage the security and governance controls implemented in Databricks Unity Catalog (link) and enable auditing of data access.
■ If you need to connect to different Databricks environments, use Power BI parameters (link) → gain flexibility when connecting to different Databricks workspaces or different Databricks SQL warehouses.
■ Use gateway clusters (link) to connect to IP ACL- or Private Link-secured Databricks workspaces → avoid single points of failure and load-balance traffic across the gateways in a cluster.
■ Use Publish to Power BI Service (link) → enables seamless catalog integration and data model sync, allowing you to publish datasets directly to the Power BI Service without leaving the Databricks UI.
■ Use Automatic Publishing to Power BI (link) → publish datasets from Unity Catalog to Power BI directly from data pipelines.

POWER BI REPORT DESIGN
■ Limit the number of visuals on each report page → limit the number of queries that will be executed.
■ Limit the number of rows and columns in semantic models and report visuals → avoid large data transfers.
■ Leverage user-defined aggregations (link) → improve query performance over large DirectQuery semantic models by caching pre-aggregated data.
■ Use automatic aggregations (link) → continuously optimize DirectQuery semantic models by building aggregations based on Query History, for maximum report performance.
■ If referential integrity has been validated in the upstream ingestion, use "Assume Referential Integrity" when defining table relations (link) → enable more efficient join strategies in SQL queries.
■ Avoid many-to-many relationships where possible → decrease complexity and improve Power BI model efficiency.
■ Configure "Is nullable" for table columns where applicable → Power BI generates simpler and more efficient SQL queries.
■ "Move left" transformations whenever possible (e.g., prefer SQL views over Power Query transformations and DAX formulas) → leverage the power of the Databricks SQL engine for more efficient report execution.
■ If using DAX, review your code for efficient DAX calculations (link) → inefficient calculations can lead to deteriorated performance.
■ Leverage query reduction settings by adding an Apply/Clear All Slicers button (link) → prevent a new query from being sent to the data source every time the user interacts with the report's filters.
■ Avoid DAX calculated columns and calculated tables in semantic models → they perform better if defined directly in your Gold tables. Measures that can be precomputed as columns also perform best when done in the Gold layer.
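Several of the data-preparation practices above (star schema in the Gold layer, PK/FK constraints with RELY, persisted generated columns, Liquid Clustering, pre-aggregation, and regular maintenance) can be sketched in Databricks SQL. This is an illustrative sketch, not a prescribed schema: the catalog, schema, table, and column names below are hypothetical, constraints require Unity Catalog, and materialized views require a serverless SQL warehouse.

```sql
-- Hypothetical Gold-layer star schema; all names are illustrative.
CREATE TABLE IF NOT EXISTS gold.sales.dim_date (
  date_key   DATE NOT NULL,
  -- Derived value persisted at write time, so it is not recomputed per query.
  year_month STRING GENERATED ALWAYS AS (date_format(date_key, 'yyyy-MM')),
  -- RELY lets the SQL engine (and Power BI) trust the constraint.
  CONSTRAINT pk_dim_date PRIMARY KEY (date_key) RELY
);

CREATE TABLE IF NOT EXISTS gold.sales.fct_sales (
  sale_id      BIGINT NOT NULL,
  date_key     DATE   NOT NULL,
  customer_key BIGINT NOT NULL,
  amount       DECIMAL(18, 2),
  CONSTRAINT pk_fct_sales PRIMARY KEY (sale_id) RELY,
  -- Informational FK: Power BI can use it to create table relationships.
  CONSTRAINT fk_sales_date FOREIGN KEY (date_key) REFERENCES gold.sales.dim_date
)
CLUSTER BY (date_key, customer_key);  -- Liquid Clustering for data skipping

-- Pre-aggregate a common, resource-intensive query as a materialized view.
CREATE MATERIALIZED VIEW IF NOT EXISTS gold.sales.mv_sales_by_month AS
SELECT d.year_month, SUM(f.amount) AS total_amount
FROM gold.sales.fct_sales f
JOIN gold.sales.dim_date d ON f.date_key = d.date_key
GROUP BY d.year_month;

-- Periodic maintenance if predictive optimization is not enabled.
OPTIMIZE gold.sales.fct_sales;
VACUUM   gold.sales.fct_sales;
ANALYZE TABLE gold.sales.fct_sales COMPUTE STATISTICS FOR ALL COLUMNS;
```

With this layout, a DirectQuery report joining the fact table to the date dimension can benefit from constraint-aware join optimization, file skipping on the clustering keys, and the precomputed monthly aggregate.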
Troubleshooting: Why is my report slow?

DATA PREPARATION
1. Ensure the data layout is regularly optimized with VACUUM and OPTIMIZE.
2. Evaluate generating aggregated views for the most common and resource-intensive queries whenever applicable.
3. In the SQL Warehouse/Monitoring page (link), review the Query History for the queries with the longest duration:
   a. Ensure the query leverages the way the storage layer has been optimized, or adapt it if needed.
   b. Open the Query Profile, enable verbose mode, and review the tasks that took the most time and/or memory, looking for:
      i. "Cloud storage request duration", to check whether the cloud storage has been slow to respond.
      ii. "Files read" and "Size of the smallest file read", to ensure not too many small files are read; if they are, ensure OPTIMIZE is run regularly.
      iii. "Number of output rows" and "Rows skipped", to ensure filters are applied as early as possible.

SQL SERVING
1. Monitor the SQL warehouse performance in the SQL Warehouse/Monitoring page (link), looking at:
   a. Running queries vs. queued queries. If too many queries are queued: 1) increase the number of clusters if already reaching the maximum allowed, 2) evaluate whether increasing the cluster size is worthwhile if there are long-running queries.
   b. Number of active clusters: 1) consider setting a higher minimum number of clusters if queries are queued while scaling out, 2) check whether the warehouse had scaled in to zero when the queries arrived.
2. Monitor the Query History in the SQL Warehouse/Monitoring page (link), reviewing query details for:
   a. "Scheduling" (time spent in the queue) vs. "Optimizing Query" (time spent mainly identifying data to be skipped) vs. "Executing" (time spent executing the query).
   b. "Result fetching by client" (time spent by the client downloading the result set).
   c. "Rows returned", to ensure the query does not return too many rows.
   d. "Bytes read from cache", to evaluate the disk cache efficiency.
   e. "Bytes spilled to disk", to ensure no data is spilled to disk; if it is, increase the cluster size.

POWER BI INTEGRATION
1. Evaluate which Power BI storage mode is used. Prefer DirectQuery for fact tables and Dual for dimensions → the ideal compromise between performance and data freshness.
2. For very small, static, performance-sensitive (<2 s) reports, evaluate Import mode → it would provide the best report performance.
3. Especially for DirectQuery, check how many queries Power BI can send to Databricks in parallel, and ensure the Databricks SQL warehouse is sized to handle the required level of parallelism → avoid queries being queued, which results in a slow report.
4. For performance fine-tuning, evaluate the following properties of Power BI semantic models:
   a. "Maximum connections per data source"
   b. "Maximum number of simultaneous evaluations"
   c. "Maximum number of concurrent jobs"
   d. "MaxParallelismPerQuery"
5. Monitor the queries in the SQL Warehouse/Monitoring page, looking at "Started at" and the times at which the queries arrived, to validate the effective parallelism of Power BI.

POWER BI REPORT DESIGN
1. Use Power BI Performance Analyzer to examine report element performance (link) → identify the visual that takes the longest to load and where the bottleneck is (DAX query, visual display, DirectQuery, etc.).
2. Ensure there are not too many visuals in the same report → many visuals can generate many queries, which may be queued by Power BI or Databricks (i.e., the query itself runs fast but spends time in the queue).
3. For the most common and resource-intensive queries, request SQL views or persisted tables in the Gold layer that provide pre-aggregated data (often useful for the Date dimension) → results in better overall performance.
4. Ensure there are no SQL queries returning large result sets (thousands of records), which is often an indication of inefficient DAX formulas (e.g., the TOPN function) → evaluate the complexity of DAX formulas and optimize where possible.
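As one way to carry out the monitoring steps above programmatically rather than through the UI, the longest-running queries and their queue vs. execution time can be pulled from the Query History system table. This sketch assumes Unity Catalog system tables are enabled; the `system.query.history` columns shown reflect the documented schema but may vary by platform version, so treat it as illustrative:

```sql
-- Illustrative: recent queries with the longest total duration, split into
-- time spent queued (waiting) vs. time spent executing.
SELECT
  statement_id,
  executed_by,
  total_duration_ms,
  waiting_for_compute_duration_ms
    + waiting_at_capacity_duration_ms AS queued_ms,   -- queue/startup time
  execution_duration_ms,                              -- actual execution time
  read_rows,
  produced_rows                                       -- flag large result sets
FROM system.query.history
WHERE start_time >= current_timestamp() - INTERVAL 1 DAY
ORDER BY total_duration_ms DESC
LIMIT 20;
```

A high `queued_ms` relative to `execution_duration_ms` points at warehouse sizing or scaling (add clusters, avoid scale-to-zero during business hours), while a high `produced_rows` points at inefficient DAX or missing pre-aggregation.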
What you should read

DATA PREPARATION
■ Power Up With Power BI and Lakehouse in Azure Databricks: Part 3 — Tuning Azure Databricks SQL (link)
■ Star Schema Data Modeling Best Practices on Databricks SQL (link)
■ One Big Table vs. Dimensional Modeling on Databricks SQL (link)

SQL SERVING
■ SQL Warehouse Sizing, Scaling and Queuing Behavior (link)
■ DBSQL Warehouse Advisor (link)
■ Tune Query Performance in Databricks SQL With the Query Profile (link)

POWER BI INTEGRATION
■ Power Up Your BI With Microsoft Power BI and Lakehouse in Azure Databricks: Part 1 — Essentials (link)
■ Power BI — Databricks SQL QuickStart Samples (link)
■ Power BI on DBSQL Design Patterns (link)
■ Boosting Power BI Performance With Azure Databricks Through Automatic Aggregations (link)

POWER BI REPORT DESIGN
■ Power Up Your BI With Microsoft Power BI and Lakehouse in Azure Databricks: Part 2 — Tuning Power BI (link)
■ Optimization Guide for Power BI (link)
■ Monitor Report Performance in Power BI (link)
■ Troubleshoot Report Performance in Power BI (link)