4 Clean, Transform, and Load Data in Power BI
Power BI
2 hr 1 min
Module
10 Units
Intermediate
Data Analyst
Power BI
Microsoft Power Platform
Power Query has an incredible number of features dedicated to helping you
clean and prepare your data for analysis. You will learn how to simplify a complicated
model, change data types, rename objects, and pivot data. You will also learn how to
profile columns so that you know which columns have the valuable data you need for
deeper analytics.
Learning objectives
By the end of this module, you’ll be able to:
Resolve inconsistencies, unexpected or null values, and data quality
issues.
Apply user-friendly value replacements.
Profile data so you can learn more about a specific column before using
it.
Evaluate and transform column data types.
Apply data shape transformations to table structures.
Combine queries.
Apply user-friendly naming conventions to columns and queries.
Edit M code in the Advanced Editor.
Prerequisites
None
Introduction
2 minutes
Consider the scenario where you have imported data into Power BI from several
different sources and, when you examine the data, it is not prepared for analysis.
What could make the data unprepared for analysis?
You start working with the data, but every time you create visuals on reports, you get
bad data, incorrect results, and simple reports about sales totals are wrong.
Dirty data can be overwhelming and, though you might feel frustrated, you decide to
get to work and figure out how to make this data model as pristine as possible.
Fortunately, Power BI and Power Query offer you a powerful environment to clean
and prepare the data. Clean data produces more accurate aggregations and measures,
makes tables easier to navigate, and simplifies report building.
Shape the initial data
18 minutes
Power Query Editor in Power BI Desktop allows you to shape (transform) your imported
data. You can accomplish actions such as renaming columns or tables, changing text to
numbers, removing rows, setting the first row as headers, and much more. It is important to
shape your data to ensure that it meets your needs and is suitable for use in reports.
You need to use Power Query Editor to clean up and shape this data before you can start
building reports.
Get started with Power Query Editor
To start shaping your data, open Power Query Editor by selecting the Transform data option
on the Home tab of Power BI Desktop.
When you work in Power Query Editor, all steps that you take to shape your data are
recorded. Then, each time the query connects to the data source, it automatically applies your
steps, so your data is always shaped the way that you specified. Power Query Editor only
changes a particular view of your data, so you can feel confident that your original data
source is not affected. You can see a list of your steps on the right side of the screen, in
the Query Settings pane, along with the query's properties.
The Power Query Editor ribbon contains many buttons you can use to select, view, and shape
your data.
To learn more about the available features and functions, see The query ribbon.
Note
In Power Query Editor, the right-click context menus and Transform tab in
the ribbon provide many of the same options.
Identify column headers and names
The first step in shaping your initial data is to identify the column headers and names within
the data and then evaluate where they are located to ensure that they are in the right place.
In the following screenshot, the source data in the csv file for SalesTarget (sample not
provided) had a target categorized by products and a subcategory split by months, both of
which are organized into columns.
Consequently, the data is difficult to read. A problem exists with the data in its current
state because column headers appear in different rows (marked in red), and several
columns have nondescriptive names, such as Column1, Column2, and so on.
When you have identified where the column headers and names are located, you can make
changes to reorganize the data.
Promote headers
If the real column headers were imported as a row of data, you can promote that row so
that its values become the column headers, for example by using the Use First Row as
Headers feature.
Rename columns
The next step in shaping your data is to examine the column headers. You might discover
that one or more columns have the wrong headers, a header has a spelling error,
or the header naming convention is not consistent or user-friendly.
You can rename column headers in two ways. One approach is to right-click the header,
select Rename, edit the name, and then press Enter. Alternatively, you can double-click the
column header and overwrite the name with the correct name.
You can also work around this issue by removing (skipping) the first two rows and then
renaming the columns to the correct name.
Remove top rows
When shaping your data, you might need to remove some of the top rows, for example, if
they are blank or if they contain data that you do not need in your reports.
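If you prefer to see what these UI actions generate behind the scenes, the following is a
minimal M sketch that combines the steps described above: it skips the top row, promotes
the real header row, and renames a column. The SalesTargetRaw table and its values are
hypothetical stand-ins, because the sample file isn't provided.

    let
        // Hypothetical raw data standing in for the SalesTarget csv file
        SalesTargetRaw = #table(
            {"Column1", "Column2", "Column3"},
            {
                {"Report title", null, null},   // stray row above the real headers
                {"Category", "Jan", "Feb"},     // the real column headers
                {"Bikes", 100, 120},
                {"Helmets", 40, 35}
            }
        ),
        // Remove Top Rows: skip the row above the real headers
        RemovedTopRows = Table.Skip(SalesTargetRaw, 1),
        // Use First Row as Headers: promote the header row
        PromotedHeaders = Table.PromoteHeaders(RemovedTopRows),
        // Rename Columns: apply a friendlier name
        RenamedColumns = Table.RenameColumns(PromotedHeaders, {{"Category", "Product Category"}})
    in
        RenamedColumns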
Remove columns
A key step in the data shaping process is to remove unnecessary columns. It is much better to
remove columns as early as possible. One way to remove columns is to limit the
columns when you get data from the data source. For instance, if you are extracting data
from a relational database by using SQL, you would want to limit the columns that you
extract by using a column list in the SELECT statement.
Removing columns at an early stage in the process rather than later is best, especially when
you have established relationships between your tables. Removing unnecessary columns will
help you to focus on the data that you need and help improve the overall performance of your
Power BI Desktop datasets and reports.
Examine each column and ask yourself if you really need the data that it contains. If you
don't plan on using that data in a report, the column adds no value to your data model.
Therefore, the column should be removed. You can always add the column later, if your
requirements change over time.
You can remove columns in two ways. The first method is to select the columns that you
want to remove and then, on the Home tab, select Remove Columns.
Alternatively, you can select the columns that you want to keep and then, on the Home tab,
select Remove Columns > Remove Other Columns.
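In M, the two removal approaches correspond to Table.RemoveColumns and
Table.SelectColumns. The following is a minimal sketch; the table and column names are
hypothetical.

    let
        Source = #table(
            {"SalesOrderNumber", "OrderDate", "InternalNotes", "SalesAmount"},
            {{"SO1001", #date(2019, 1, 15), "n/a", 250.00}}
        ),
        // Remove Columns: drop the columns you don't need
        RemovedColumns = Table.RemoveColumns(Source, {"InternalNotes"}),
        // Remove Other Columns: keep only the columns you select
        KeptColumns = Table.SelectColumns(Source, {"SalesOrderNumber", "OrderDate", "SalesAmount"})
    in
        KeptColumns

Both steps produce the same result here; keeping only the columns you select is often the
safer choice because columns added to the source later won't slip into the model unnoticed.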
Unpivot columns
Unpivoting is a useful feature of Power BI. You can use this feature with data from any data
source, but you would most often use it when importing data from Excel. The following
example shows a sample Excel document with sales data.
Though the data might initially make sense, it would be difficult to create a total of all sales
combined from 2018 and 2019. Your goal would then be to use this data in Power BI with
three columns: Month, Year, and SalesAmount.
When you import the data into Power Query, it will look like the following image.
Next, rename the first column to Month. This column was mislabeled because that header in
Excel was labeling the 2018 and 2019 columns. Highlight the 2018 and 2019 columns, select
the Transform tab in Power Query, and then select Unpivot.
You can rename the Attribute column to Year and the Value column to SalesAmount.
Unpivoting streamlines the process of creating DAX measures on the data later. By
completing this process, you have now created a simpler way of slicing the data with
the Year and Month columns.
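As a rough M equivalent of the unpivot steps just described (the values are hypothetical,
because the Excel sample isn't provided):

    let
        // Hypothetical shape of the imported data: one column per year
        Source = #table(
            {"Month", "2018", "2019"},
            {
                {"January", 1000, 1200},
                {"February", 950, 1100}
            }
        ),
        // Unpivot Columns: turn the selected year columns into Attribute/Value rows
        Unpivoted = Table.Unpivot(Source, {"2018", "2019"}, "Attribute", "Value"),
        // Rename the generated columns to Year and SalesAmount
        Renamed = Table.RenameColumns(Unpivoted, {{"Attribute", "Year"}, {"Value", "SalesAmount"}})
    in
        Renamed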
Pivot columns
If the data that you are shaping is flat (in other words, it has a lot of detail but is not organized
or grouped in any way), the lack of structure can complicate your ability to identify patterns
in the data.
You can use the Pivot Column feature to convert your flat data into a table that contains an
aggregate value for each unique value in a column. For example, you might want to use this
feature to summarize data by using different math functions such
as Count, Minimum, Maximum, Median, Average, or Sum.
In the SalesTarget example, you can pivot the columns to get the quantity of product
subcategories in each product category.
The following image illustrates how the Pivot Column feature changes the way that the data
is organized.
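A minimal M sketch of the same idea, using a hypothetical flat table of categories and
subcategories and List.Count as the aggregation:

    let
        Source = #table(
            {"Category", "Subcategory"},
            {
                {"Bikes", "Mountain Bikes"},
                {"Bikes", "Road Bikes"},
                {"Accessories", "Helmets"}
            }
        ),
        // Pivot Column: one column per category, counting the subcategories in each
        Pivoted = Table.Pivot(
            Source,
            List.Distinct(Source[Category]),
            "Category",
            "Subcategory",
            List.Count
        )
    in
        Pivoted

The result has one column per category (Bikes, Accessories) containing the count of its
subcategories.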
Power Query Editor records all steps that you take to shape your data, and the list of steps is
shown in the Query Settings pane. After you have made all the required changes, select Close &
Apply to close Power Query Editor and apply your changes to your data model.
However, before you select Close & Apply, you can take further steps to clean up and
transform your data in Power Query Editor. These additional steps are covered later in this
module.
To continue with the previous scenario where you shaped the initial data in your
model, you need to take further action to simplify the structure of the sales
data and get it ready for developing reports for the Sales team. You have already
renamed the columns, but now you need to examine the names of the queries
(tables) to determine if any improvements can be made. You also need to review the
contents of the columns and replace any values that require correction.
Rename a query
It's good practice to change uncommon or unhelpful query names to names that are
more obvious or that the user is more familiar with. For instance, if you import a
product fact table into Power BI Desktop and the query name displays
as FactProductTable, you might want to change it to a more user-friendly name, such
as Products. Similarly, if you import a view, the view might have a name that contains
a prefix of v, such as vProduct. People might find this name unclear and confusing, so
you might want to remove the prefix.
Replace values
You can use the Replace Values feature in Power Query Editor to replace any value
with another value in a selected column.
Replace null values
Occasionally, you might find that your data sources contain null values. For example,
a freight amount on a sales order might have a null value if it's synonymous with
zero. If the value stays null, the averages will not calculate correctly. One solution
would be to change the nulls to zero, which will produce the more accurate freight
average. In this instance, using the same steps that you followed previously will help
you replace the null values with zero.
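Behind the Replace Values dialog, Power Query typically generates a Table.ReplaceValue
step. A minimal sketch with a hypothetical Freight column:

    let
        Source = #table(
            {"SalesOrderNumber", "Freight"},
            {{"SO1001", 12.5}, {"SO1002", null}}
        ),
        // Replace Values: replace null freight amounts with 0 so averages calculate correctly
        ReplacedNulls = Table.ReplaceValue(Source, null, 0, Replacer.ReplaceValue, {"Freight"})
    in
        ReplacedNulls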
Remove duplicates
You can also remove duplicates from columns to only keep unique names in a
selected column by using the Remove Duplicates feature in Power Query.
In this example, notice that the Category Name column contains duplicates for each
category. As a result, you want to create a table with unique categories and use it in
your data model. You can achieve this action by selecting a column, right-clicking on
the header of the column, and then selecting the Remove Duplicates option.
You might consider copying the table before removing the duplicates.
The Copy option is at the top of the context menu, as shown in the following
screenshot. Copying the table before removing duplicates will give you a comparison
of the tables and will let you use both tables, if needed.
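In M, Remove Duplicates corresponds to Table.Distinct over the selected column. A minimal
sketch with hypothetical category data:

    let
        Source = #table(
            {"Category Name"},
            {{"Bikes"}, {"Bikes"}, {"Accessories"}}
        ),
        // Remove Duplicates: keep only unique values in the selected column
        UniqueCategories = Table.Distinct(Source, {"Category Name"})
    in
        UniqueCategories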
Best practices for naming tables, columns, and
values
Naming conventions for tables, columns, and values have no fixed rules; however, we
recommend that you use the language and abbreviations that are commonly used
within your organization and that everyone agrees on as common terminology.
A best practice is to give your tables, columns, and measures descriptive business
terms and replace underscores ("_") with spaces. Be consistent with abbreviations,
prefixes, and words like "number" and "ID." Excessively short abbreviations can
cause confusion if they are not commonly used within the organization.
Also, by removing prefixes or suffixes that you might use in table names and instead
naming them in a simple format, you will help avoid confusion.
When replacing values, try to imagine how those values will appear on the report.
Values that are too long might be difficult to read and fit on a visual. Values that are
too short might be difficult to interpret. Avoiding acronyms in values is also a good
idea, provided that the text will fit on the visual.
Evaluate and change column data types
10 minutes
When you import a table from any data source, Power BI Desktop automatically
starts scanning the first 1,000 rows (default setting) and tries to detect the type of
data in the columns. Some situations might occur where Power BI Desktop does not
detect the correct data type. Where incorrect data types occur, you will experience
performance issues.
You have a higher chance of getting data type errors when you are dealing with flat
files, such as comma-separated values (.CSV) files and Excel workbooks (.XLSX),
because data was entered manually into the worksheets and mistakes were made.
Conversely, in databases, the data types are predefined when tables or views are
created.
A best practice is to evaluate the column data types in Power Query Editor before
you load the data into a Power BI data model. If you determine that a data type is
incorrect, you can change it. You might also want to apply a format to the values in a
column and change the summarization default for a column.
To continue with the scenario where you are cleaning and transforming sales data in
preparation for reporting, you now need to evaluate the columns to ensure that they
have the correct data type. You need to correct any errors that you identify.
You evaluate the OrderDate column. As expected, it contains numeric data, but
Power BI Desktop has incorrectly set the column data type to Text. To report on this
column, you need to change the data type of this column from Text to Date.
Incorrect data types will prevent you from creating certain calculations, deriving
hierarchies, or creating proper relationships with other tables. For example, if you try
to calculate the Quantity of Orders YTD, you will get the following error stating that
the OrderDate column data type is not Date, which is required in time-based
calculations.
Another issue with having an incorrect data type applied on a date field is the
inability to create a date hierarchy, which would allow you to analyze your data on a
yearly, monthly, or weekly basis. The following screenshot shows that the SalesDate
field is not recognized as type Date and will only be presented as a list of dates in the
Table visual. However, it is a best practice to use a dedicated date table and turn off the auto
date/time option to avoid the automatically generated hierarchy. For more information about
this process, see the auto date/time documentation.
Change the column data type
You can change the data type of a column in two places: in Power Query Editor and
in the Power BI Desktop Report view by using the column tools. It is best to change
the data type in the Power Query Editor before you load the data.
In Power Query Editor, you can change the column data type in two ways. One way is
to select the column that has the issue, select Data Type in the Transform tab, and
then select the correct data type from the list.
Another method is to select the data type icon next to the column header and then
select the correct data type from the list.
As with any other changes that you make in Power Query Editor, the change that you
make to the column data type is saved as a programmed step. This step is
called Changed Type, and it is reapplied every time the data is refreshed.
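The Changed Type step that the UI records is a call to Table.TransformColumnTypes. For
example, a minimal sketch (with hypothetical data) that converts OrderDate from Text to
Date:

    let
        Source = #table(
            {"SalesOrderNumber", "OrderDate"},
            {{"SO1001", "2019-01-15"}}
        ),
        // Changed Type: convert the OrderDate column from Text to Date
        ChangedType = Table.TransformColumnTypes(Source, {{"OrderDate", type date}})
    in
        ChangedType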
After you have completed all steps to clean and transform your data, select Close &
Apply to close Power Query Editor and apply your changes to your data model. At
this stage, your data should be in great shape for analysis and reporting.
The ability to combine queries is powerful because it allows you to append or merge different
tables or queries. You can combine tables into a single table in two different ways: merging
and appending.
Assume that you are developing Power BI reports for the Sales and HR teams. They have
asked you to create a contact information report that contains the contact information and
location of every employee, supplier, and customer. The data is in the HR.Employees,
Production.Suppliers, and the Sales.Customers tables, as shown in the following image.
However, this data comes from multiple tables, so the dilemma is determining how you can
merge the data in these multiple tables and create one source-of-truth table to create a report
from. The inherent functionality of Power BI allows you to combine and merge queries into a
single table.
Append queries
When you append queries, you will be adding rows of data to another table or query. For
example, you could have two tables, one with 300 rows and another with 100 rows, and when
you append queries, you will end up with 400 rows. When you merge queries, you will be
adding columns from one table (or query) into another. To merge two tables, you must have a
column that is the key between the two tables.
Before you begin combining queries, you can remove extraneous columns that you don't need
for this task from your tables. To complete this task, format each table to have only four
columns with your pertinent information, and rename them so they all have the same column
headers: ID, company, name, and phone. The following images are snippets of the
reformatted Sales.Customers, Production.Suppliers, and HR.Employees tables.
After you have finished reformatting, you can combine the queries. On the Home tab on the
Power Query Editor ribbon, select the drop-down list for Append Queries. You can
select Append Queries as New, which means that the output of appending will result in a
new query or table, or you can select Append Queries, which will add the rows from an
existing table into another.
Your next task is to create a new master table, so you need to select Append Queries as
New. This selection will bring you to a window where you can add the tables that you want
to append from Available Tables to Tables to Append, as shown in the following image.
After you have added the tables that you want to append, select OK. You will be routed to a
new query that contains all rows from all three of your tables, as shown in the following
image.
You have now succeeded in creating a master table that contains the information for the
employees, suppliers, and customers. You can exit Power Query Editor and build any report
elements surrounding this master table.
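Append Queries as New generates a Table.Combine step over the listed queries. A minimal
sketch, using hypothetical stand-ins for the three reformatted tables:

    let
        // Hypothetical tables already reformatted to share the same four column headers
        Customers = #table({"ID", "company", "name", "phone"}, {{1, "Contoso", "Ana", "555-0100"}}),
        Suppliers = #table({"ID", "company", "name", "phone"}, {{2, "Fabrikam", "Ben", "555-0101"}}),
        Employees = #table({"ID", "company", "name", "phone"}, {{3, "Adventure Works", "Cai", "555-0102"}}),
        // Append Queries as New: stack the rows of all three tables into one master table
        MasterTable = Table.Combine({Customers, Suppliers, Employees})
    in
        MasterTable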
However, if you wanted to merge tables instead of appending the data from one table to
another, the process would be different.
Merge queries
When you merge queries, you are combining the data from multiple tables into one based on
a column that is common between the tables. This process is similar to the JOIN clause in
SQL. Consider a scenario where the Sales team now wants you to consolidate orders and
their corresponding details (which are currently in two tables) into a single table. You can
accomplish this task by merging the two tables, Orders and OrderDetails, as shown in the
following image. The column that is shared between these two tables is OrderID.
When you merge queries, you can choose from several join kinds, including:
Left Outer - Displays all rows from the first table and only the matching rows
from the second.
Full Outer - Displays all rows from both tables.
Inner - Displays only the matched rows between the two tables.
For this scenario, you will choose to use a Left Outer join. Select OK, which will route you
to a new window where you can view your merged query.
Now, you can merge two queries or tables in different ways so that you can view your data
in the most appropriate way for your business requirements.
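Merge Queries generates a Table.NestedJoin step, followed by an expand step that brings the
joined columns into the query. A minimal sketch of the Orders and OrderDetails scenario,
with hypothetical columns:

    let
        Orders = #table({"OrderID", "OrderDate"}, {{1, #date(2019, 1, 15)}}),
        OrderDetails = #table({"OrderID", "ProductID", "Quantity"}, {{1, 710, 3}}),
        // Merge Queries: Left Outer join on the shared OrderID column
        Merged = Table.NestedJoin(Orders, {"OrderID"}, OrderDetails, {"OrderID"}, "OrderDetails", JoinKind.LeftOuter),
        // Expand the nested table to add the detail columns to the Orders query
        Expanded = Table.ExpandTableColumn(Merged, "OrderDetails", {"ProductID", "Quantity"}, {"ProductID", "Quantity"})
    in
        Expanded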
For more information on this topic, see the Shape and Combine Data in Power
BI documentation.
Profiling data is about studying the nuances of the data: determining anomalies, examining
and developing the underlying data structures, and querying data statistics such as row
counts, value distributions, minimum and maximum values, averages, and so on. This
concept is important because it allows you to shape and organize the data so that interacting
with the data and identifying its distribution is uncomplicated, which makes it much easier to
work with the data on the front end when you develop report elements.
Assume that you are developing reports for the Sales team at your organization. You are
uncertain how the data is structured and contained within the tables, so you want to profile
the data behind the scenes before you begin developing the visuals. Power BI has inherent
functionality that makes these tasks user-friendly and straightforward.
Select View on the ribbon, and under Data Preview, you can choose from a few options. To
understand data anomalies and statistics, select the Column Distribution, Column Quality,
and Column Profile options. The following figure shows the statistics that appear.
By default, Power Query examines the first 1,000 rows of your data set. To change this, select
the profiling status in the status bar and select Column profiling based on entire data set.
Column distribution shows you the distribution of the data within the column and the counts
of distinct and unique values, both of which can tell you details about the data counts.
Distinct values are all the different values in a column, including duplicates and null values,
while unique values do not include duplicates or nulls. Therefore, distinct in this table tells
you the total count of how many values are present, while unique tells you how many of
those values only appear once.
Column profile gives you a more in-depth look at the statistics within a column for the
first 1,000 rows of data. It provides several different values, including the count of
rows, which is important when you're verifying whether your data imported successfully.
For example, if your original database had 100 rows, you could use this row count to verify
that 100 rows were, in fact, imported correctly. Additionally, the profile shows how many
rows Power BI has identified as outliers, how many rows are empty or contain empty
strings, and the min and max, which tell you the smallest and largest values in the column,
respectively. This distinction is particularly important for numeric data because it
immediately notifies you if the maximum value is beyond what your business identifies as a
"maximum," letting you focus your efforts when you delve deeper into the data. For text
columns, as shown in the previous image, the minimum value is the first value and the
maximum value is the last value in alphabetical order.
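The profiling options described here are surfaced through the View ribbon, but if you want
the same kind of statistics as a query result, M also offers a Table.Profile function. A
minimal sketch with hypothetical data:

    let
        Source = #table(
            {"SalesPerson", "Profit"},
            {{"Anthony Grosse", 120}, {"Anthony Grosse", 95}, {"Lily Code", null}}
        ),
        // Table.Profile returns statistics such as Min, Max, Average, Count,
        // NullCount, and DistinctCount for each column
        Profile = Table.Profile(Source)
    in
        Profile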
Additionally, the Value distribution graph tells you the counts for each distinct value in that
specific column. When looking at the graph in the previous image, notice that the value
distribution indicates that "Anthony Grosse" appears the greatest number of times within
the SalesPerson column and that "Lily Code" appears the fewest times. This
information is particularly important because it helps identify outliers. If a value appears far
more often than other values in a column, the Value distribution feature allows you to
pinpoint a place to begin your investigation into why this is so.
For example, while looking through invoice data, you notice that the Value
distribution graph shows that a few salespeople in the SalesPerson column appear the same
amount of times within the data. Additionally, you notice the same situation has occurred in
the Profit column and in a few other tables as well. During your investigation, you discover
that the data you were using was bad data and needed to be refreshed, so you immediately
complete the refresh. Without viewing this graph, you might not have seen this error so
quickly and, for this reason, value distribution is essential.
After you have completed your edits in Power Query Editor and are ready to begin building
visuals, return to Home on the Power Query Editor ribbon. Select Close & Apply, which will
return you to Power BI Desktop and any column edits/transformations will also be applied.
You have now determined the elements that make up profiling data in Power BI, which
include loading data in Power BI, interrogating column properties to gain clarity about and
make further edits to the type and format of data in columns, finding data anomalies, and
viewing data statistics in Power Query Editor. With this knowledge, you can include in your
toolkit the ability to study your data in an efficient and effective manner.
Each time you shape data in Power Query, you create a step in the Power Query
process. Those steps can be reordered, deleted, and modified where it makes
sense. Each cleaning step that you made was likely created by using the graphical
interface, but Power Query uses the M language behind the scenes. The combined
steps are available to read by using the Power Query Advanced Editor. The M
language is always available to be read and modified directly. It is not required that
you use M code to take advantage of Power Query. You will rarely need to write M
code, but it can still prove useful. Because each step in Power Query is written in M
code, even if the UI created it for you, you can use those steps to learn M code and
customize it to suit your needs.
After creating steps to clean data, select the View ribbon of Power Query and then
select Advanced Editor.
You might notice that M code is written top-down. Later steps in the process can
refer to previous steps by the variable name to the left of the equal sign. Be careful
about reordering these steps because doing so can break the dependencies between
statements. The query's output is defined by the in statement; generally, the last query
step is used as the final data set result.
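To make that structure concrete, the following is a minimal sketch of what a query might
look like in the Advanced Editor. The data and step names are hypothetical; the point is the
top-down let block of named steps and the in statement that returns the final step.

    let
        // Each UI action becomes a named step; later steps refer to earlier ones by name
        Source = #table({"Column1", "Column2"}, {{"Category", "Target"}, {"Bikes", "100"}}),
        PromotedHeaders = Table.PromoteHeaders(Source),
        ChangedType = Table.TransformColumnTypes(PromotedHeaders, {{"Target", Int64.Type}}),
        RenamedColumns = Table.RenameColumns(ChangedType, {{"Target", "Sales Target"}})
    in
        // The in statement returns the final step as the query result
        RenamedColumns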
Exercise: Load data in Power BI Desktop
45 minutes
This unit includes a lab to complete.
Use the free resources provided in the lab to complete the exercises in this unit. You
will not be charged.
Microsoft provides this lab experience and related content for educational purposes.
All presented information is owned by Microsoft and intended solely for learning
about the covered products and services in this Microsoft Learn module.
Launch lab
Tip
To dock the lab environment so that it fills the window, select the PC icon at the top
and then select Fit Window to Machine.
Overview
The estimated time to complete the lab is 45 minutes.
In this lab, you'll begin to apply transformations to queries. You'll then apply the
queries to load each as a table to the data model.
Before you start this lab, you will need to open the lab environment link above, and
log in to the lab environment. There is no need to provide your own environment, as
an environment has been prepared for this lab.
Load Data
In this exercise, you'll apply transformations to each of the queries.
1. Double-click the Power BI Desktop icon. (This may take a minute or two
to open.)
The query name will determine the model table name. It's recommended
to define concise, yet friendly, names.
You'll now filter the query rows to retrieve only employees who are
salespeople.
o EmployeeKey
o EmployeeNationalIDAlternateKey
o FirstName
o LastName
o Title
o EmailAddress
14. Click OK.
18. Right-click either of the selected column headers, and then in the context
menu, select Merge Columns.
21. Click OK.
UPN is an acronym for User Principal Name. The values in this column
will be used when you configure row-level security in Lab 05A.
25. At the bottom-left, in the status bar, verify that the query has 5 columns
and 18 rows.
It's important that you do not proceed if your query does not produce
the correct result; it won't be possible to complete later labs. If it doesn't,
refer back to the steps in this task to fix any problems.
5. Right-click either of the selected column headers, and then in the context
menu, select Remove Columns.
6. In the status bar, verify that the query has 2 columns and 39 rows.
1. Select the DimProduct query.
o ProductKey
o EnglishProductName
o StandardCost
o Color
o DimProductSubcategory
9. Uncheck the Use Original Column Name as Prefix checkbox.
10. Click OK.
o EnglishProductName to Product
o StandardCost to Standard Cost (include a space)
o EnglishProductSubcategoryName to Subcategory
o EnglishProductCategoryName to Category
13. In the status bar, verify that the query has six columns and 397 rows.
1. Select the DimReseller query.
o ResellerKey
o BusinessType
o ResellerName
o DimGeography
o City
o StateProvinceName
o EnglishCountryRegionName
o In the Replace With box, enter Warehouse
8. Click OK.
o BusinessType to Business Type (include a space)
o ResellerName to Reseller
o StateProvinceName to State-Province
o EnglishCountryRegionName to Country-Region
10. In the status bar, verify that the query has 6 columns and 701 rows.
1. Select the DimSalesTerritory query.
o SalesTerritoryKey
o SalesTerritoryRegion
o SalesTerritoryCountry
o SalesTerritoryGroup
o SalesTerritoryRegion to Region
o SalesTerritoryCountry to Country
o SalesTerritoryGroup to Group
6. In the status bar, verify that the query has 4 columns and 10 rows.
Configure the Sales query
1. Select the FactResellerSales query.
o SalesOrderNumber
o OrderDate
o ProductKey
o ResellerKey
o EmployeeKey
o SalesTerritoryKey
o OrderQuantity
o UnitPrice
o TotalProductCost
o SalesAmount
o DimProduct
8. For your convenience, you can copy the expression from the
D:\DA100\Lab03A\Assets\Snippets.txt file.
9. Click OK.
o TotalProductCost
o StandardCost
o OrderQuantity to Quantity
o UnitPrice to Unit Price (include a space)
o SalesAmount to Sales
o Unit Price
o Sales
o Cost
The fixed decimal number data type stores values with full precision, and
so requires more storage space than the decimal number. It's important
to use the fixed decimal number type for financial values, or rates (like
exchange rates).
14. In the status bar, verify that the query has 10 columns and 999+ rows.
A maximum of 1000 rows will be loaded as preview data for each query.
4. Right-click either of the selected column headers, and then in the context
menu, select Unpivot Other Columns.
o Value to Target
10. Click OK.
13. Notice that the first row is for year 2017 and month number 7.
The virtual machine uses US regional settings, so this date is in fact July
1, 2017.
15. Notice that the grid cells update with predicted values.
The feature has accurately predicted that you're combining values from
two columns.
16. Notice also the formula presented above the query grid.
19. Click OK.
o Year
o MonthNumber
24. Click OK.
25. In the status bar, verify that the query has 3 columns and 809 rows.
1. Select the ColorFormats query.
4. In the status bar, verify that the query has 3 columns and 10 rows.
Update the Product query
1. Select the Product query.
Merging queries allows integrating data, in this case from different data
sources (SQL Server and a CSV file).
7. Click Save.
8. In the Merge window, click OK.
10. In the status bar, verify that the query now has 8 columns and 397 rows.
1. Select the ColorFormats query.
Disabling the load means it won't load as a table to the data model. This
is done because the query was merged with the Product query, which is
enabled to load to the data model.
4. Click OK.
Finish up
o Salesperson
o SalespersonRegion
o Product
o Reseller
o Region
o Sales
o Targets
In the next lab, you'll configure data model tables and relationships.
Summary
1 minute
This module explained how you can take data that is difficult to read, build
calculations on, and discover, and make it simpler for report authors and others to
use. Additionally, you learned how to combine queries so that they were fewer in
number, which makes data navigation more streamlined. You also renamed
columns and queries into a human-readable form and reviewed good naming
conventions for objects in Power BI.
Check your knowledge
1.
Correct. AVERAGE takes the total and divides by the number of non-null values. If
NULL is synonymous with zero in the data, the average will be different from the
accurate average.
2.
If you have two queries that have different data but the same column headers, and
you want to combine both tables into one query with all the combined rows, which
operation should you perform?
Append
Correct. Append will take two tables and combine them into one query. The combined
query will have more rows while keeping the same number of columns.
Merge
Combine Column
Which of the following selections are not best practices for naming conventions in
Power BI?
Correct. Abbreviations lead to confusion because they are often overused or not
universally agreed on.
Lab
Load Data in Power BI Desktop
In this lab, you will apply transformations to each of the queries created in the
previous lab. You will then apply the queries to load each as a table to the data model.
Lab story
This lab is one of many in a series of labs that was designed as a complete story from data
preparation to publication as reports and dashboards. You can complete the labs in any order.
However, if you intend to work through multiple labs, we suggest you do the first 10 labs
in order.
In this task, you will set up the environment for the lab.
Important: If you are continuing on from the previous lab (and you completed that lab
successfully), do not complete this task; instead, continue from the next task.
1. To open the Power BI Desktop, on the taskbar, click the Microsoft Power BI Desktop
shortcut.
2. To close the getting started window, at the top-left of the window, click X.
3. To open the starter Power BI Desktop file, click the File ribbon tab to open the
backstage view.
4. Select Open Report.
5. Click Browse Reports.
The message alerts you to the fact that the queries have not been applied to load as
model tables. You’ll apply the queries later in this lab.
11. To dismiss the warning message, at the right of the yellow warning message, click X.
12. To create a copy of the file, click the File ribbon tab to open the backstage view.
13. Select Save As.
The query name will determine the model table name. It’s recommended to define
concise, yet friendly, names.
Tip: This technique is useful when a query contains many columns. If there aren't too
many columns, you can simply scroll horizontally to locate the column of interest.
Each transformation you create results in additional step logic. It’s possible to edit or
delete steps. It’s also possible to select a step to preview the query results at that
stage of the query transformation.
17. Right-click either of the selected column headers, and then in the context menu,
select Merge Columns.
Many common transformations can be applied by right-clicking the column header,
and then choosing them from the context menu. Note, however, more transformations
are available in the ribbon.
20. Click OK.
21. To rename the EmployeeNationalIDAlternateKey column, double-click
the EmployeeNationalIDAlternateKey column header.
22. Replace the text with EmployeeID, and then press Enter.
Important: When instructed to rename columns, it’s important that you rename them
exactly as described.
UPN is an acronym for User Principal Name. The values in this column will be used
when you configure row-level security in the Model Data in Power BI Desktop, Part
2 lab.
24. At the bottom-left, in the status bar, verify that the query has five columns and 18
rows.
Important: It’s important that you do not proceed if your query does not produce the
correct result—it won’t be possible to complete later labs. If the query columns or
rows don’t match, refer back to the steps in this task to fix any problems.
Important: When detailed instructions have already been provided, the lab steps will now
provide more concise instructions. If you need the detailed instructions, you can refer back to
the steps of previous tasks.
1. Select the DimProduct query.
Query column names must always be unique. If left checked, this checkbox would
prefix each column with the expanded column name (in this
case DimProductSubcategory). Because it’s known that the selected column names
don’t collide with column names in the Product query, the option is deselected.
10. Click OK.
11. Notice that the transformation resulted in the addition of two columns, and that
the DimProductSubcategory column has been removed.
12. Expand the DimProductCategory column, and then introduce only
the EnglishProductCategoryName column.
13. Rename the following four columns:
o EnglishProductName to Product
o StandardCost to Standard Cost (include a space)
o EnglishProductSubcategoryName to Subcategory
o EnglishProductCategoryName to Category
14. In the status bar, verify that the query has six columns and 397 rows.
1. Select the DimReseller query.
8. Click OK.
1. Select the DimSalesTerritory query.
1. Select the FactResellerSales query.
2. Rename the query to Sales.
3. Remove all columns, except the following:
o SalesOrderNumber
o OrderDate
o ProductKey
o ResellerKey
o EmployeeKey
o SalesTerritoryKey
o OrderQuantity
o UnitPrice
o TotalProductCost
o SalesAmount
o DimProduct
9. Click OK.
13. Modify the following three column data types to Fixed Decimal Number.
o Unit Price
o Sales
o Cost
The fixed decimal number data type stores values with full precision, and so requires
more storage space than the decimal number. It's important to use the fixed decimal
number type for financial values, or rates (like exchange rates).
14. In the status bar, verify that the query has 10 columns and 999+ rows.
A maximum of 1000 rows will be loaded as preview data for each query.
1. Select the ResellerSalesTargets query.
4. Right-click either of the selected column headers, and then in the context menu,
select Unpivot Other Columns.
5. Notice that the column names now appear in the Attribute column, and the values
appear in the Value column.
6. Apply a filter to the Value column to remove hyphen (-) values.
You may recall that the hyphen character was used in the source CSV file to
represent zero (0).
You’ll now apply transformations to produce a date column. The date will be derived
from the Year and MonthNumber columns. You’ll create the column by using
the Columns From Examples feature.
10. Click OK.
11. Modify the MonthNumber column data type to Whole Number.
13. Notice that the first row is for year 2017 and month number 7.
14. In the Column1 column, in the first grid cell, commence entering 7/1/2017, and then
press Enter.
The virtual machine uses US regional settings, so this date is in fact July 1, 2017.
15. Notice that the grid cells update with predicted values.
The feature has accurately predicted that you are combining values from
the Year and MonthNumber columns.
16. Notice also the formula presented above the query grid.
24. Click OK.
25. In the status bar, verify that the query has three columns and 809 rows.
1. Select the ColorFormats query.
2. Notice that the first row contains the column names.
3. On the Home ribbon tab, from inside the Transform group, click Use First Row as
Headers.
4. In the status bar, verify that the query has three columns and 10 rows.
1. Select the Product query.
2. To merge the ColorFormats query, on the Home ribbon tab, from inside
the Combine group, click Merge Queries.
Merging queries allows integrating data, in this case from different data sources
(SQL Server and a CSV file).
Privacy levels can be configured for each data source to determine whether data can be
shared between sources. Setting each data source to Organizational allows them to
share data, if necessary. Note that Private data sources can never be shared with
other data sources. It doesn't mean that Private data cannot be shared; it means that
the Power Query engine cannot share data between the sources.
7. Click Save.
1. Select the ColorFormats query.
Disabling the load means it will not load as a table to the data model. This is done
because the query was merged with the Product query, which is enabled to load to
the data model.
4. Click OK.
3. In the Fields pane (located at the right), notice the seven tables loaded to the data
model.
You’ll configure data model tables and relationships in the Model Data in Power BI
Desktop, Part 1 lab.
Congratulations!
You have successfully completed this module. To mark the lab as complete, click End.