15 Manage Datasets in Power BI
32 min
Module
11 Units
Intermediate
Data Analyst
Power BI
Microsoft Power Platform
With Microsoft Power BI, you can build multiple reports from a single dataset,
meaning that if you change the dataset, all reports will be updated with that change.
Additionally, you can clean and prep data once rather than repeatedly for each
report.
Learning objectives
In this module, you will:
Create dynamic reports with parameters.
Create what-if parameters.
Use a Power BI gateway to connect to on-premises data sources.
Configure a scheduled refresh for a dataset.
Configure incremental refresh settings.
Manage and promote datasets.
Troubleshoot service connectivity.
Boost performance with query caching (Premium).
Prerequisites
None
Introduction
2 minutes
For this module's scenario, you work as a Power BI developer for Tailwind Traders.
You have created reports for multiple teams across the organization; however, your
work is not yet completed. Report users have asked you to make the reports more
dynamic so that they can filter the data themselves. Additionally, they want you to
find a way for them to run what-if scenarios. Management has also requested that
you help guarantee the coherency and integrity of your datasets. They want the
datasets available in one place, for future use, and they want you to automate the
refresh process to ensure that the data is kept up to date.
Dynamic reports are reports in which the data can be changed by a developer
according to user specifications. Dynamic reports are valuable because a single
report can be used for multiple purposes. If you use dynamic reports, you'll have
fewer individual reports to create, which will save organizational time and resources.
With parameters, you choose the values that you want to see data for, and the
report updates accordingly by filtering the data for you.
Creating dynamic reports allows you to give users more power over the data that is
displayed in your reports; they can change the data source and filter the data by
themselves.
In the following example, you've created a report for the Sales team at Tailwind
Traders that displays the sales data in the SQL Server database. The report gives a
holistic view of how the Sales team is performing. While the report is useful, the Sales
team members want to be able to filter the report so that they can view their own
data only and track their performance against their sales targets.
Create dynamic reports for individual values
To create a dynamic report, you first need to write your SQL query. Then, you will use
the Get data feature in Power BI Desktop to connect to the database.
In this example, you will connect to your database on SQL Server by following these
steps:
Now, you need to adjust the code in the SQL query to reference your new parameter:
Though you won't notice a difference on the screen, Power BI will have run the query.
4. To confirm that the query was run, you can run a test by selecting the
parameter query and entering a new value in the Current Value box.
5. A warning icon might display next to the query. If so, select that query to
view the warning message, which states that permission is required to
run this native database query. Select Edit Permission and then
select Run.
When the query runs successfully, the parameter will update and display
the new value.
You can now create a report that displays data for one particular value at a time. If
you want to display data for multiple values at the same time, you need to complete
additional steps, as outlined in the next section.
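The idea of a query that is filtered by one parameter value at a time can be sketched outside Power BI. The following Python example is only an illustration, not the Power Query (M) code that Power BI Desktop generates: it uses an in-memory SQLite database in place of SQL Server, and the Sales table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical stand-in for the SQL Server sales table used in the module.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (SalesPersonID INTEGER, Amount REAL)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [(279, 1200.0), (279, 800.0), (282, 450.0)],
)


def sales_for(sales_person_id: int) -> list[tuple[int, float]]:
    """Run the query with the current parameter value bound to the filter."""
    query = (
        "SELECT SalesPersonID, Amount FROM Sales "
        "WHERE SalesPersonID = ? ORDER BY Amount"
    )
    return conn.execute(query, (sales_person_id,)).fetchall()


# Changing the parameter value changes the rows returned, not the query text.
rows_279 = sales_for(279)
rows_282 = sales_for(282)
print(rows_279)
print(rows_282)
```

Each call corresponds to entering a new value in the Current Value box: the query text stays the same and only the bound value changes.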
Next, use the Get data feature in Power BI Desktop to connect to the data in that
Excel worksheet, and then follow these steps:
Next, you need to create a function that will pass the new SalesPersonID query
into Query1:
1. Right-click Query1 and then select Create function.
The New column name will update automatically and the table that contains the
values that you're going to pass through the parameter will be selected by default.
7. Select the two arrows icon in the new column header and then select the
check boxes of the columns that you want to load. This section is where
you will determine the details that will be available in the report for each
value (sales person ID).
8. Clear the Use original column name as prefix check box at the bottom
of the screen because you don't need to see a prefix with the column
names in the report.
9. Select OK.
You should be able to view the data for the columns that you selected,
for each value (sales person ID).
If necessary, you can add more values (sales people IDs) to
the SalesPersonID column in the Excel worksheet, or you can change
the existing values.
10. Save your changes and then return to Power Query Editor.
11. On the Home tab, select Refresh Preview, and then run the native
query again (if necessary). You should see the sales from the new sales
people IDs that you added into the worksheet.
12. Select Close and Apply to return to the report editor, where you'll see
the new column names in the Fields pane.
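Conceptually, the steps above invoke the function once for each value in the SalesPersonID column and then expand the selected columns into the result. A minimal Python sketch of that pattern follows. The sample rows, the column names, and the helper names query1 and invoke_and_expand are hypothetical and are not part of Power BI.

```python
from typing import Callable

# Hypothetical sales rows standing in for the data behind Query1.
SALES = [
    {"SalesPersonID": 279, "OrderID": 1, "Amount": 1200.0},
    {"SalesPersonID": 282, "OrderID": 2, "Amount": 450.0},
    {"SalesPersonID": 279, "OrderID": 3, "Amount": 800.0},
    {"SalesPersonID": 290, "OrderID": 4, "Amount": 300.0},
]


def query1(sales_person_id: int) -> list[dict]:
    """Stand-in for the function created from Query1: filter by one ID."""
    return [row for row in SALES if row["SalesPersonID"] == sales_person_id]


def invoke_and_expand(
    ids: list[int],
    fn: Callable[[int], list[dict]],
    columns: list[str],
) -> list[dict]:
    """Call fn for every ID and keep only the selected columns, without a prefix."""
    expanded = []
    for sales_person_id in ids:
        for row in fn(sales_person_id):
            selected = {name: row[name] for name in columns}
            expanded.append({"SalesPersonID": sales_person_id, **selected})
    return expanded


# IDs as they might be listed in the Excel worksheet (hypothetical values).
worksheet_ids = [279, 282]
report_rows = invoke_and_expand(worksheet_ids, query1, ["OrderID", "Amount"])
for row in report_rows:
    print(row)
```

Adding another ID to worksheet_ids and running the code again mirrors adding a value to the worksheet and selecting Refresh Preview.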
You can use what-if parameters in multiple situations, such as to determine the effect
of increased sales or deeper discounts, or to let sales consultants see their
compensation if they meet certain sales goals or percentages.
In the following example, you want to enable the Sales team to find out how much
sales growth (as a percentage) they need to achieve to reach USD 2 million in
gross sales each month.
For decimal numbers, make sure that you precede the value with a zero (as
in 0.50 versus .50). Otherwise, the number won't validate and the OK button won't
be selectable.
The new slicer visual will appear on the current report page. You can move the slider
to see the numbers increase according to the settings that you applied. You should
also see a new field for the Sales Forecast Percentage table in the Fields pane, and
when you expand that field, the what-if parameter should be selected.
Similarly, you should see that a measure was also created. You can use this measure
to visualize the current value of the what-if parameter.
After you have created a what-if parameter, the parameter and the measure will
become part of your model; therefore, they will be available throughout the report
and can be used on other report pages. Additionally, because the parameter and
measure are part of the model, you can delete the slicer from the report page. If you
want it back, you can drag the what-if parameter from the Fields list onto the canvas
and then change the visual type to a slicer.
Initially, the bars are similar; however, as you move the slider, notice that the Gross
Sales Forecast column reflects the sales forecast percentage amount.
To enhance the visual, you can add a constant line so that you can clearly see how
the organization is performing against a particular threshold or target. In this
example, you will add a constant line with USD 2 million as the threshold value. Then,
you will use the slider to find out the percentage by which gross sales need to
increase each month to reach that threshold. In the following image, the gross sales
need to increase by 1.40 percent to reach the USD 2 million threshold.
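The arithmetic behind the slider can be checked with a short calculation. The Python sketch below assumes that the forecast is the current gross sales multiplied by one plus the growth percentage, which is an assumption because the module's measure definition is not shown here. The monthly sales figure and the slider range are hypothetical.

```python
def forecast(gross_sales: float, growth_pct: float) -> float:
    """Assumed forecast formula: grow the current sales by growth_pct percent."""
    return gross_sales * (1 + growth_pct / 100)


def required_growth_pct(gross_sales: float, target: float) -> float:
    """Growth percentage needed for the forecast to reach the target."""
    return (target / gross_sales - 1) * 100


def slider_values(minimum: float, maximum: float, increment: float) -> list[float]:
    """Values a what-if slider would step through, from minimum to maximum."""
    steps = round((maximum - minimum) / increment)
    return [round(minimum + i * increment, 10) for i in range(steps + 1)]


current_sales = 1_950_000.0  # hypothetical monthly gross sales
target = 2_000_000.0         # the USD 2 million constant line

needed = required_growth_pct(current_sales, target)
print(f"Growth needed: {needed:.2f} percent")

# First slider position at which the forecast meets or exceeds the target.
first_hit = next(
    pct for pct in slider_values(0, 5, 0.5) if forecast(current_sales, pct) >= target
)
print(f"First slider value that reaches the target: {first_hit}")
```

Moving the slider in the report performs the same comparison visually: the forecast bar crosses the constant line at the first value that satisfies the inequality.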
Use a Power BI gateway to connect to on-premises data sources
Gateway software acts like a bridge; it allows organizations to retain databases and
other data sources on their on-premises networks and access that on-premises data
in cloud services, such as Power BI and Microsoft Azure Analysis Services.
When the on-premises gateway is installed and configured, you can start the
gateway and then sign in by using your Microsoft 365 organization account.
When you are working in the cloud and interacting with an element that is
connected to an on-premises data source, the following actions occur:
1. The cloud service creates a query and the encrypted credentials for the
on-premises data source. The query and credentials are sent to the
gateway queue for processing.
2. The gateway cloud service analyzes the query and pushes the request
to Microsoft Azure Service Bus.
3. Service Bus sends the pending requests to the gateway.
4. The gateway gets the query, decrypts the credentials, and then connects
to one or more data sources with those credentials.
5. The gateway sends the query to the data source to be run.
6. The results are sent from the data source back to the gateway and then
to the cloud service. The service then uses the results.
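The flow above can be pictured as a small queue-based simulation. The Python sketch below is a toy model only: it does not use any real gateway or Service Bus API, and every function name, account, and row in it is hypothetical. Base64 encoding is used purely as a placeholder for the encryption step and provides no security.

```python
import base64
import json
from collections import deque

# Placeholder only: base64 is an encoding, NOT encryption. It marks where the
# cloud service would encrypt and where the gateway would decrypt.
def protect(credentials: dict) -> str:
    return base64.b64encode(json.dumps(credentials).encode("utf-8")).decode("ascii")


def unprotect(blob: str) -> dict:
    return json.loads(base64.b64decode(blob).decode("utf-8"))


# Hypothetical on-premises rows and the account allowed to read them.
ON_PREM_ROWS = [("North", 1200.0), ("South", 800.0)]
ALLOWED = {"user": "svc_reports", "password": "example-only"}


def on_prem_source(credentials: dict, query: str) -> list[tuple[str, float]]:
    """Stand-in data source: checks the credentials and returns fixed rows.

    The query text is accepted but not parsed in this toy model.
    """
    if credentials != ALLOWED:
        raise PermissionError("data source rejected the credentials")
    return list(ON_PREM_ROWS)


pending = deque()  # stand-in for the queue that holds requests for the gateway


def cloud_request(query: str, credentials: dict) -> None:
    """Cloud side: queue the query together with the protected credentials."""
    pending.append({"query": query, "credentials": protect(credentials)})


def gateway_poll() -> list[tuple[str, float]]:
    """Gateway side: take a request, recover the credentials, run the query."""
    request = pending.popleft()
    credentials = unprotect(request["credentials"])
    return on_prem_source(credentials, request["query"])


cloud_request("SELECT Region, Amount FROM Sales", ALLOWED)
results = gateway_poll()  # the results travel back to the cloud service
print(results)
```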
Troubleshoot an on-premises data gateway
Troubleshooting a gateway is an ever-changing topic. Refer to the following
documents for the latest troubleshooting guidance:
In this example, you are creating a report, but then realize that the version of the
sales data that you're using isn't the most up to date. You check the refresh status
and notice that it was last refreshed 10 days ago, and no refresh is scheduled to take
place.
Considering how important it is to have accurate sales data, you need to find a
solution. Usually, the data is updated weekly, but you don't want to return to the
report every week to manually refresh the dataset, and you know that you
occasionally forget to do so. Therefore, you decide to use the Scheduled
refresh functionality in Power BI to solve this problem.
1. Go to the Datasets + dataflows page.
2. Hover over the dataset for which you want to set up the schedule and
then select the Schedule refresh icon.
3. On the Settings page, turn on the Scheduled refresh feature.
4. Select the Refresh frequency and ensure that the correct time zone is
selected.
5. Add the time(s) that you want the refresh to occur. You can configure up
to eight daily time slots, if your dataset is on shared capacity, or 48 time
slots on Power BI Premium.
6. When you have finished configuring the scheduled refresh, select Apply.
Note
While you can set a time for the refresh, be aware that the refresh might not take
place at that exact time. Power BI starts scheduled refreshes on a best effort basis.
The goal is to initiate the refresh within 15 minutes of the scheduled time slot, but a
delay of up to one hour can occur if the service can't allocate the required resources
sooner.
In this example, you want the system to refresh the sales data on a daily basis, at 6:00
AM, 10:00 AM, and 3:00 PM, as illustrated in the following image.
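The limits described above (eight daily time slots on shared capacity, 48 on Power BI Premium, and a start that can slip by up to one hour) can be expressed as a small check. The Python sketch below is a local illustration with hypothetical function names. It does not call Power BI and does not reproduce the service's own validation.

```python
from datetime import datetime, timedelta

# Daily time-slot limits as stated in the steps above.
MAX_DAILY_SLOTS = {"shared": 8, "premium": 48}


def validate_schedule(times: list[str], capacity: str) -> list[str]:
    """Return the sorted, de-duplicated HH:MM slots, or raise if over the limit."""
    slots = sorted({datetime.strptime(t, "%H:%M").strftime("%H:%M") for t in times})
    limit = MAX_DAILY_SLOTS[capacity]
    if len(slots) > limit:
        raise ValueError(f"{len(slots)} time slots exceed the {capacity} limit of {limit}")
    return slots


def start_window(slot: str) -> tuple[str, str]:
    """Targeted latest start (15 minutes) and worst-case start (one hour)."""
    base = datetime.strptime(slot, "%H:%M")
    target = (base + timedelta(minutes=15)).strftime("%H:%M")
    worst_case = (base + timedelta(hours=1)).strftime("%H:%M")
    return target, worst_case


# The example schedule: 6:00 AM, 10:00 AM, and 3:00 PM.
schedule = validate_schedule(["15:00", "06:00", "10:00"], "shared")
print(schedule)
print(start_window("06:00"))
```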
When you have configured a refresh schedule, the dataset settings page informs you
of the next refresh time, as shown in the following image.
For example, you might want to refresh now because you need to view the most
recent data and can't wait for the next refresh time, or you might want to test your
gateway and data source configuration.
Note
Power BI deactivates your refresh schedule after four consecutive failures or when
the service detects an unrecoverable error that requires a configuration update, such
as invalid or expired credentials. It is not possible to change the consecutive failures
threshold.
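The deactivation rule in this note amounts to a counter over the refresh history. The following Python sketch models it under simplifying assumptions: the outcome labels "ok", "failed", and "unrecoverable" and the function name are hypothetical, not values reported by Power BI.

```python
FAILURE_THRESHOLD = 4  # fixed by the service; it cannot be changed


def schedule_is_active(history: list[str]) -> bool:
    """Return False once the history would have deactivated the schedule.

    history lists refresh outcomes, oldest first, using the hypothetical
    labels "ok", "failed", and "unrecoverable".
    """
    consecutive_failures = 0
    for outcome in history:
        if outcome == "unrecoverable":
            return False  # for example, invalid or expired credentials
        if outcome == "failed":
            consecutive_failures += 1
        else:
            consecutive_failures = 0  # a success resets the count
        if consecutive_failures >= FAILURE_THRESHOLD:
            return False
    return True


print(schedule_is_active(["failed", "failed", "failed", "ok", "failed"]))
print(schedule_is_active(["ok", "failed", "failed", "failed", "failed"]))
```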
A quick way to check the refresh status is to view the list of datasets in a workspace.
If a dataset displays a small warning icon, you'll know that the dataset is currently
experiencing an issue. Select the warning icon to get more information.
You should also check the refresh history occasionally to review the success or failure
status of past synchronization cycles. To view the refresh history, open the dataset's
settings page and then select Refresh history.
Warning
Incremental refresh should only be used on data sources and queries that support
query folding. If query folding isn't supported, incremental refresh could lead to a
bad user experience because, while it will still issue the queries for the relevant
partitions, it will pull all data, potentially multiple times.
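The cost described in this warning can be made concrete by counting the rows that leave the data source. The Python sketch below uses an in-memory SQLite table as a hypothetical source. In the "folded" case the date filter runs at the source, and in the "not folded" case every row is transferred before being filtered locally. The table, its rows, and the function names are illustrative only.

```python
import sqlite3

# Hypothetical source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (OrderDate TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [
        ("2023-01-15", 100.0),
        ("2023-02-10", 200.0),
        ("2023-03-05", 300.0),
        ("2023-03-20", 400.0),
    ],
)


def folded(start: str, end: str) -> tuple[list, int]:
    """The filter is part of the source query, so only matching rows transfer."""
    rows = conn.execute(
        "SELECT OrderDate, Amount FROM Sales "
        "WHERE OrderDate >= ? AND OrderDate < ? ORDER BY OrderDate",
        (start, end),
    ).fetchall()
    return rows, len(rows)


def not_folded(start: str, end: str) -> tuple[list, int]:
    """Every row transfers; the filter is applied afterward on the client."""
    everything = conn.execute(
        "SELECT OrderDate, Amount FROM Sales ORDER BY OrderDate"
    ).fetchall()
    rows = [row for row in everything if start <= row[0] < end]
    return rows, len(everything)


march_folded, transferred_folded = folded("2023-03-01", "2023-04-01")
march_unfolded, transferred_unfolded = not_folded("2023-03-01", "2023-04-01")
print(transferred_folded, transferred_unfolded)
```

Both functions return the same rows for the partition, but the second one transfers the whole table each time, which is the repeated full pull that the warning describes.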
Traditionally, complex code was required for running incremental refreshes, but you
can now define a refresh policy within Power BI Desktop. The refresh policy is applied
when you publish to Power BI service, which then does the work of managing
partitions for optimized data loads, resulting in the following benefits:
In this example, the Sales team has come to you with a dilemma. The data in their
report is already out-of-date. It isn't feasible for you to manually refresh the data by
adding a new file because the refreshes need to happen regularly to match the
frequency of the sales transactions that are occurring. Also, the manual refresh task is
becoming more difficult because the datasets have millions of rows. Consequently,
you need to implement a better data refresh solution.
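One way to picture what an incremental refresh does for a table like this is to split it into date partitions and reload only the most recent ones. In Power BI, each partition's bounds are supplied through the reserved RangeStart and RangeEnd parameters. The Python sketch below assumes monthly partitions and a hypothetical policy of refreshing the last two months. The function names are illustrative and are not a Power BI API.

```python
from datetime import date


def add_months(first_of_month: date, months: int) -> date:
    """Shift a first-of-month date by a number of months (may be negative)."""
    index = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def partitions_to_refresh(today: date, refresh_months: int) -> list[tuple[date, date]]:
    """Return (range_start, range_end) pairs for the most recent monthly partitions.

    range_start is inclusive and range_end is exclusive, so adjacent
    partitions never overlap.
    """
    current = today.replace(day=1)
    windows = []
    for offset in range(refresh_months - 1, -1, -1):
        range_start = add_months(current, -offset)
        windows.append((range_start, add_months(range_start, 1)))
    return windows


# Hypothetical policy: refresh the two most recent months.
windows = partitions_to_refresh(date(2024, 1, 15), 2)
for range_start, range_end in windows:
    print(f"reload rows where {range_start} <= OrderDate < {range_end}")
```

Older partitions are left untouched, which is why the refresh stays fast even when the table holds millions of rows.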
You can define an incremental refresh policy to solve this business problem. This
process involves the following steps:
To define the parameters for the incremental refresh, follow these steps:
To ensure that your organization has consistent data for making decisions and a healthy data
culture, it's important to create and share optimized datasets and then endorse those datasets
as the one source of truth. Report creators can then reuse those endorsed datasets to build
accurate, standardized reports.
Promotion - Promote your datasets when they're ready for broad usage. Power
BI Admins have permissions to promote datasets.
Certification - Request certification for a promoted dataset from an admin user
that is defined in the Dataset Certification tenant admin setting. This
certification adds another layer of security for your datasets. Certification can be
a highly selective process, so only the truly reliable and authoritative datasets
are used across the organization.
In this example, you and the other teams are using a workspace in Power BI service to
organize all your reports and dashboards. However, you begin to receive emails from
confused users who expected to see a sales report and are now looking at a product report
instead. You need to make some changes to direct your users to the datasets that they should
be accessing, and you can accomplish this task with the endorsing capability in Power BI.
In this example, the certification type of endorsement is best suited for the Sales team
because it will require users to have special access before they can view the Sales dashboards.
By implementing the certification, you'll lead your users to the most appropriate reports and
dashboards, avoiding the inevitable confusion that might arise with building and sharing a
diversity of reports.
Though you'll soon learn how to certify the dataset, you'll first learn how to promote a
dataset, in case you prefer to use that method.
Promote a dataset
You can only promote a dataset if you're a Power BI admin user or the owner of that dataset.
To promote a dataset, go to your workspace in Power BI service, and then open the settings
page for the dataset that you want to promote. In this example, you want to promote the
Tailwind Traders dataset.
Select the Endorsement setting.
Certify a dataset
You can only certify a dataset if you've been listed as a user in the tenant settings. The
certification option will appear dimmed for other users.
To certify a dataset, you would start the same way as you did to promote the dataset. This
time, however, you will select the Certified option in the Endorsement settings.
When you apply your change, the Certified setting will update to display a message
regarding who certified the dataset and when they did so.
Cloud services, such as SharePoint, do not require a gateway because the data is
already in the cloud. You only need to provide your authorization credentials to set
up a data source connection.
If your report fails to refresh, ensure that your data source credentials are up to date.
If they are not, update them. If the refresh still fails, you'll need to take further
action to investigate and resolve the issue.
With the Query Caching feature, you can use the local caching services of Power BI to
process query results. Instead of relying on the dataset to calculate queries, which when
overloaded can reduce performance, you can use cloud resources on your Premium capacities
on Power BI service to load your report and, thereby, ensure constant performance.
To continue with the module scenario, as you begin collaborating with more teams to build
reports and dashboards, you notice that some of your datasets are causing the reports to load
more slowly than before, an issue that is starting to annoy your users. The Sales team wants
to know how they can improve performance and make these reports load faster. You decide
to use the Query Caching ability in Power BI to help solve this problem.
Query caching
Query Caching is a local caching feature that maintains results on a user and report
basis. This service is only available to users with Power BI Premium or Power BI
Embedded.
When you use query caching, the query results are specific to a user, and you can only use
query caching on a specific page of a report. Benefits of using query caching
include:
Note
Switching from On to Off will clear all previously saved query results. When turning off
query caching (either through the default or the Off option), a small delay will occur in query
loading because the report queries are running against the dataset and it does not have saved
queries to fall back on.
Warning
If many datasets have query caching enabled, and a refresh occurs, a reduction in
performance might occur because a large number of queries are being processed at once.
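The caching behavior described in this unit (results kept per user and per report, and cleared when the setting is switched off) can be modeled with a small dictionary-backed cache. The Python sketch below is a conceptual model with hypothetical names. It does not describe how the Power BI service implements query caching.

```python
from typing import Callable


class QueryCache:
    """Toy cache keyed by (user, report, query)."""

    def __init__(self) -> None:
        self.enabled = False
        self._results: dict[tuple[str, str, str], str] = {}

    def set_enabled(self, enabled: bool) -> None:
        """Switching the cache off clears all previously saved results."""
        if not enabled:
            self._results.clear()
        self.enabled = enabled

    def run(self, user: str, report: str, query: str,
            execute: Callable[[str], str]) -> str:
        """Return a saved result if one exists; otherwise query the dataset."""
        if not self.enabled:
            return execute(query)
        key = (user, report, query)
        if key not in self._results:
            self._results[key] = execute(query)
        return self._results[key]


calls: list[str] = []  # records every query that actually reaches the dataset


def execute(query: str) -> str:
    calls.append(query)
    return f"result of {query}"


cache = QueryCache()
cache.set_enabled(True)
cache.run("ana", "Sales", "q1", execute)  # runs against the dataset
cache.run("ana", "Sales", "q1", execute)  # served from the cache
cache.run("ben", "Sales", "q1", execute)  # different user, so it runs again
print(len(calls))
```

In this model, calling set_enabled(False) empties the saved results, so the next run for any user goes back to the dataset, matching the small delay that the note mentions.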
2. What reserved parameters configure the start and end of where incremental refresh
should occur?
What is the difference between Promotion and Certification when you are endorsing
a dataset?
Promotion requires write access while Certification requires permission from the
dataset owner to access the dataset.
Promotion is for broad usage while Certification needs permission granted on the
Admin Tenant settings.
Promotion is for specific users while Certification needs permission granted on the
Admin Tenant settings.
Summary
1 minute
This module allowed you to take advantage of the following features of Power BI to
help you manage your datasets:
Parameters that are stored in a Microsoft Excel workbook that help you
create dynamic reports in Power BI service. These dynamic reports help
you provide your users with the ability to filter data for specific values.
Parameters to create what-if scenarios for more in-depth analysis of the
data.
Two data refresh options to help you automate the refresh process and
make it more efficient.
Endorsement features for your most critical datasets to help your users
identify the datasets that they should use.
The on-premises gateway and ideas on how to troubleshoot potential
connectivity issues.
These dataset management techniques will help you make your datasets easier to access
and keep them up to date, and will help you build high-quality reports
and dashboards so that your users can make real-time decisions.