2. Prepare Data for Analysis
Before you can create reports, you must first extract data from the various
data sources. Interacting with SQL Server differs from interacting with Excel,
so you should learn the nuances of both systems. After gaining an
understanding of the systems, you can use Power Query to help you clean the
data, such as
renaming columns, replacing values, removing errors, and combining query
results. Power Query is also available in Excel. After the data has been
cleaned and organized, you're ready to build reports in Power BI. Finally,
you'll publish your combined dataset and reports to Power BI service. From
there, other people can use your dataset and build their own reports or they
can use the reports you’ve already built. Additionally, if someone else built a
dataset you'd like to use, you can build reports from that too!
This module will focus on the first step of getting the data from the different
data sources and importing it into Power BI by using Power Query.
Organizations often export and store data in files. One possible file format is
a flat file. A flat file is a type of file that has only one data table and every
row of data is in the same structure. The file doesn't contain hierarchies.
Likely, you're familiar with the most common types of flat files, which are
comma-separated values (.csv) files, delimited text (.txt) files, and
fixed-width files. Another type of file is the output file from an
application, such as a Microsoft Excel workbook (.xlsx).
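As a rough illustration of what "flat" means, the following Python sketch parses a small comma-separated sample and checks that every row has the same structure as the header. The column names and values are invented for this example and are not taken from a real file.

```python
import csv
import io

# Hypothetical flat-file content: one table, one header row, and every
# data row follows the same structure.
SAMPLE_CSV = """EmployeeName,HireDate,Position,Manager
Ana Diaz,2019-03-01,Analyst,Lee Park
Sam Cole,2021-07-15,Engineer,Lee Park
"""


def read_flat_file(text):
    """Split delimited text into a header list and a list of data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows


header, rows = read_flat_file(SAMPLE_CSV)

# A flat file has no hierarchy, so each row has exactly one value per column.
is_flat = all(len(row) == len(header) for row in rows)
print(len(rows), is_flat)
```

A file with nested or repeating groups would fail this check, which is one sign that it needs reshaping before it can be treated as a single table.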
Power BI Desktop allows you to get data from many types of files. You can
find a list of the available options when you use the Get data feature in
Power BI Desktop. The following sections explain how you can import data
from an Excel file that is stored on a local computer.
Scenario
The Human Resources (HR) team at Tailwind Traders has prepared a flat file
that contains some of your organization's employee data, such as employee
name, hire date, position, and manager. They've requested that you build
Power BI reports by using this data, and data that is located in several other
data sources.
The first step is to determine which file location you want to use to export
and store your data.
Using a cloud option such as OneDrive or SharePoint Team Sites is the most
effective way to keep your file and your dataset, reports, and dashboards in
Power BI in sync. However, if your data doesn't change regularly, saving files
on a local computer is a suitable option.
Connect to data in a file
In Power BI, on the Home tab, select Get data. In the list that displays,
select the option that you require, such as Text/CSV or XML. For this
example, you'll select Excel.
Tip
The Home tab contains quick access data source options, such as Excel,
next to the Get data button.
Depending on your selection, you need to find and open your data source.
You might be prompted to sign in to a service, such as OneDrive, to
authenticate your request. In this example, you'll open the Employee
Data Excel workbook that is stored on the Desktop. (Remember, no files are
provided for practice; these are hypothetical steps.)
Select the check box(es) of the table(s) that you want to bring in to Power BI.
This selection activates the Load and Transform Data buttons as shown in
the following image.
Now you can select the Load button to automatically load your data into the
Power BI model or select the Transform Data button to launch the Power
Query Editor, where you can review and clean your data before loading it
into the Power BI model.
We often recommend that you transform data, but that process will be
discussed later in this module. For this example, you can select Load.
Change the source file
You might have to change the location of a source file for a data source
during development, or if a file storage location changes. To keep your
reports up to date, you'll need to update your file connection paths in Power
BI.
Power Query provides many ways for you to accomplish this task, so that you
can make this type of change when needed.
If you are changing a file path, make sure that you reconnect to the same file
with the same file structure. Any structural changes to a file, such as deleting
or renaming columns in the source file, will break the reporting model.
For example, try changing the data source file path in the data source
settings. Select Data source settings in Power Query. In the Data source
settings window, select your file and then select Change Source. Update
the File path or use the Browse option to locate your file, select OK, and
then select Close.
Get data from relational data sources
If your organization uses a relational database for sales, you can use Power
BI Desktop to connect directly to the database instead of using exported flat
files.
Connecting Power BI to your database will help you to monitor the progress
of your business and identify trends, so you can forecast sales figures, plan
budgets and set performance indicators and targets. Power BI Desktop can
connect to many relational databases that are either in the cloud or
on-premises.
Scenario
The Sales team at Tailwind Traders has requested that you connect to the
organization's on-premises SQL Server database and get the sales data into
Power BI Desktop so you can build sales reports.
Connect to data in a relational database
You can use the Get data feature in Power BI Desktop and select the
applicable option for your relational database. For this example, you would
select the SQL Server option, as shown in the following screenshot.
Tip
Next to the Get Data button are quick access data source options, such
as SQL Server.
Your next step is to enter your database server name and a database name
in the SQL Server database window. The two options in data connectivity
mode are: Import (selected by default, recommended) and DirectQuery.
In most cases, you select Import. Other advanced options are also available in
the SQL Server database window, but you can ignore them for now.
After you've added your server and database names, you'll be prompted to
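sign in with a username and password. You'll have three sign-in options:
Windows - Use your Windows account.
Database - Use your database credentials.
Microsoft account - Use your Microsoft account credentials.
Select a sign-in option, enter your username and password, and then
select Connect.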
Select data to import
Select the check box(es) of the table(s) that you want to bring in to Power BI
Desktop, and then select either the Load or Transform Data option.
Load - Automatically load your data into a Power BI model in its current
state.
Transform Data - Open your data in Microsoft Power Query, where you
can perform actions such as deleting unnecessary rows or columns,
grouping your data, removing errors, and many other data quality tasks.
Another way you can import data is to write an SQL query to specify only the
tables and columns that you need.
To write your SQL query, on the SQL Server database window, enter your
server and database names, and then select the arrow next to Advanced
options to expand this section and view your options. In the SQL
statement box, write your query statement, and then select OK. In this
example, you'll use an SQL SELECT statement to load the ID, NAME, and
SALESAMOUNT columns from the SALES table.
Change data source settings
After you create a data source connection and load data into Power BI
Desktop, you can return and change your connection settings at any time.
This action is often required due to a security policy within the organization,
for example, when the password needs to be updated every 90 days. You
can change the data source, edit permissions or clear permissions.
On the Home tab, select Transform data, and then select the Data
source settings option.
From the list of data sources that displays, select the data source that you
want to update. Then, you can right-click that data source to view the
available update options or you can use the update option buttons on the
lower left of the window. Select the update option that you need, change the
settings as required, and then apply your changes.
You can also change your data source settings from within Power Query.
Select the table, and then select the Data source settings option on
the Home ribbon. Alternatively, you can go to the Query Settings panel on
the right side of the screen and select the settings icon next to Source (or
double-click Source). In the window that displays, update the server and
database details, and then select OK.
After you have made the changes, select Close & Apply to apply those
changes to your data source settings.
As previously mentioned, you can import data into your Power BI model by
using an SQL query. SQL stands for Structured Query Language and is a
standardized programming language that is used to manage relational
databases and perform various data management operations.
Consider the scenario where your database has a large table that is
comprised of sales data over several years. Sales data from 2009 isn't
relevant to the report that you're creating. This situation is where SQL is
beneficial because it allows you to load only the required set of data by
specifying exact columns and rows in your SQL statement and then
importing them into your data model. You can also join different tables, run
specific calculations, create logical statements, and filter data in your SQL
query.
The following example shows a simple query where the ID, NAME and
SALESAMOUNT are selected from the SALES table.
The SQL query starts with a SELECT statement, which allows you to choose
the specific fields that you want to pull from your database. In this example,
you want to load the ID, NAME, and SALESAMOUNT columns.
    SELECT
        ID
        , NAME
        , SALESAMOUNT
    FROM
FROM specifies the name of the table that you want to pull the data from. In
this case, it's the SALES table. The following example is the full SQL query:
    SELECT
        ID
        , NAME
        , SALESAMOUNT
    FROM
        SALES
When using an SQL query to import data, try to avoid using the wildcard
character (*) in your query. If you use the wildcard character (*) in your
SELECT statement, you import every column from the specified table,
including columns that you don't need.
The following example shows the query using the wildcard character.
    SELECT *
    FROM
        SALES
The wildcard character (*) will import all columns within the Sales table. This
method isn't recommended because it will lead to redundant data in your
data model, which will cause performance issues and require extra steps to
normalize your data for reporting.
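To make the difference concrete, here is a small Python sketch that uses the standard library's sqlite3 module as a stand-in for SQL Server. The SALES table layout, including the extra REGION and NOTES columns, and the sample row are assumptions made for this illustration.

```python
import sqlite3

# In-memory database standing in for the source system (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES (ID INTEGER, NAME TEXT, SALESAMOUNT REAL, "
    "REGION TEXT, NOTES TEXT)"
)
conn.execute("INSERT INTO SALES VALUES (1, 'Contoso', 250.0, 'West', 'n/a')")


def column_names(query):
    """Return the names of the columns that a query brings back."""
    cursor = conn.execute(query)
    return [col[0] for col in cursor.description]


explicit_cols = column_names("SELECT ID, NAME, SALESAMOUNT FROM SALES")
wildcard_cols = column_names("SELECT * FROM SALES")

print(explicit_cols)  # only the three requested columns
print(wildcard_cols)  # every column in the table, needed or not
```

The explicit column list also keeps the result stable if someone later adds columns to the table, whereas the wildcard query silently grows.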
Where possible, queries should also have a WHERE clause. This clause filters
the rows so that you retrieve only the records that you want. In this example,
if you want to get recent sales data from January 1, 2020 onward, add a WHERE
clause. The evolved query would look like the following example.
    SELECT
        ID
        , NAME
        , SALESAMOUNT
    FROM
        SALES
    WHERE
        OrderDate >= '1/1/2020'
It's a best practice to avoid doing this directly in Power BI. Instead, consider
writing a query like this in a view. A view is an object in a relational
database, similar to a table. Views have rows and columns, and can contain
almost every operator in the SQL language. When Power BI retrieves data
through a view, the query can participate in query folding, a feature of
Power Query. Query folding will be explained later, but in short, Power Query
optimizes data retrieval according to how the data is used later.
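A minimal sketch of the view approach follows, again using Python's sqlite3 module in place of SQL Server. The view name RecentSales, the sample rows, and the choice to store OrderDate as ISO 8601 text are assumptions for illustration. ISO text is used so that string comparison orders the dates correctly in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES (ID INTEGER, NAME TEXT, SALESAMOUNT REAL, OrderDate TEXT)"
)
conn.executemany(
    "INSERT INTO SALES VALUES (?, ?, ?, ?)",
    [
        (1, "Contoso", 250.0, "2009-06-30"),
        (2, "Fabrikam", 410.0, "2020-01-01"),
        (3, "Contoso", 125.5, "2021-11-12"),
    ],
)

# The column list and the date filter live in the database, inside a view
# (RecentSales is a hypothetical name).
conn.execute(
    """
    CREATE VIEW RecentSales AS
    SELECT ID, NAME, SALESAMOUNT
    FROM SALES
    WHERE OrderDate >= '2020-01-01'
    """
)

# A reporting tool can then read from the view as if it were a table.
recent = conn.execute(
    "SELECT ID, NAME, SALESAMOUNT FROM RecentSales ORDER BY ID"
).fetchall()
print(recent)
```

Keeping the filter in the view means that a change to the cutoff date is made once in the database rather than in every report that uses the data.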
Scenario
Software developers at Tailwind Traders created an application to manage
shipping and tracking products from their warehouses that uses Cosmos DB,
a NoSQL database, as the data repository. This application uses Cosmos DB
to store JSON documents. JSON is an open standard file format that is
primarily used to transmit data between a server and a web application. You
need to import this data into a Power BI data model for reporting.
On the Preview Connector window, select Continue and then enter your
database credentials. In this example, on the Azure Cosmos DB window,
you can enter the database details. You can specify the Azure Cosmos DB
account endpoint URL that you want to get the data from (you can get the
URL from the Keys blade of your Azure portal). Alternatively, you can enter
the database name, collection name or use the navigator to select the
database and collection to identify the data source.
If you are connecting to an endpoint for the first time, as you are in this
example, make sure that you enter your account key. You can find this key
in the Primary Key box in the Read-only Keys blade of your Azure portal.
JSON type records must be extracted and normalized before you can report
on them, so you need to transform the data before loading it into Power BI
Desktop.
After you have connected to the database account, the Navigator window
opens, showing a list of databases under that account. Select the table that
you want to import. In this example, you will select the Product table. The
preview pane only shows Record items because all records in the document
are represented as a Record type in Power BI.
Select the Edit button to open the records in Power Query.
Review the selected data to ensure that you are satisfied with it, then
select Close & Apply to load the data into Power BI Desktop.
The data now resembles a table with rows and columns. Data from Cosmos
DB can now be related to data from other data sources and can eventually
be used in a Power BI report.
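To show what extracting and normalizing JSON records involves, the following Python sketch expands nested documents into flat rows with one value per column. It is not how Power Query is implemented; the document shape, field names, and the dotted column naming are all assumptions for illustration.

```python
import json

# Hypothetical product documents, shaped like records in a JSON document store.
DOCUMENTS = """
[
  {"id": "p1", "name": "Drill", "warehouse": {"city": "Seattle", "bin": 12}},
  {"id": "p2", "name": "Saw", "warehouse": {"city": "Austin", "bin": 7}}
]
"""


def flatten(record, prefix=""):
    """Recursively expand nested objects into dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # A nested record becomes several columns, such as warehouse.city.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


table = [flatten(doc) for doc in json.loads(DOCUMENTS)]
print(table[0])
```

After flattening, every row exposes the same columns, which is the table-like shape that reporting tools expect.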
Scenario
Tailwind Traders uses SharePoint to collaborate and store sales data. It's the
start of the new financial year and the sales managers want to enter new
goals for the sales team. The form that the leadership uses exists in
SharePoint. You're required to establish a connection to this data within
Power BI Desktop, so that the sales goals can be used alongside other sales
data to determine the health of the sales pipeline.
The following sections examine how to use the Power BI Desktop Get
Data feature to connect to data sources that are produced by external
applications. To illustrate this process, we've provided an example that
shows how to connect to a SharePoint site and import data from an online
list.
When connecting to data in an application, you would begin in the same way
as you would when connecting to the other data sources: by selecting
the Get data feature in Power BI Desktop. Then, select the option that you
need from the Online Services category. In this example, you
select SharePoint Online List.
After you've selected Connect, you'll be asked for your SharePoint URL. This
URL is the one that you use to sign into your SharePoint site through a web
browser. You can copy the URL from your SharePoint site and paste it into
the connection window in Power BI. You don't need to enter your full URL file
path; you only need to load your site URL because, when you're connected,
you can select the specific list that you want to load. Depending on the URL
that you copied, you might need to delete the last part of your URL, as
illustrated in the following image.
After you've entered your URL, select OK. Power BI needs to authorize the
connection to SharePoint, so sign in with your Microsoft account and then
select Connect.
Choose the application data to import
The most popular way to use data in Power BI is to import it into a Power BI
dataset. Importing the data means that the data is stored in the Power BI file
and gets published along with the Power BI reports. This process helps make
it easier for you to interact directly with your data. However, this approach
might not work for all organizations.
To continue with the scenario, you're building Power BI reports for the Sales
department at Tailwind Traders, where importing the data isn't an ideal
method. The first task you need to accomplish is to create your datasets in
Power BI so you can build visuals and other report elements. The Sales
department has many different datasets of varying sizes. For security
reasons, you aren't allowed to import local copies of the data into your
reports, so directly importing data is no longer an option. Therefore, you
need to create a direct connection to the Sales department’s data source.
The following section describes how you can ensure that these business
requirements are satisfied when you're importing data into Power BI.
You can choose from three storage modes:
Import
DirectQuery
Dual (Composite)
You can access storage modes by switching to the Model view, selecting a
data table, and in the resulting Properties pane, selecting which mode that
you want to use from the Storage mode drop-down list, as shown in the
following visual.
Let’s take a closer look at the different types of Storage Modes.
Import mode
The Import mode allows you to create a local Power BI copy of your datasets
from your data source. You can use all Power BI service features with this
storage mode, including Q&A and Quick Insights. Data refreshes can be
scheduled or on-demand. Import mode is the default for creating new Power
BI reports.
DirectQuery mode
The DirectQuery option is useful when you don't want to save local copies of
your data because your data won't be cached. Instead, you can query the
specific tables that you'll need by using native Power BI queries, and the
required data will be retrieved from the underlying data source. Essentially,
you're creating a direct connection to the data source. Using this model
ensures that you're always viewing the most up-to-date data, and that all
security requirements are satisfied. Additionally, this mode is suited for when
you have large datasets to pull data from. Instead of slowing down
performance by having to load large amounts of data into Power BI, you can
use DirectQuery to create a connection to the source, solving data latency
issues as well.
Dual (Composite) mode
In Dual mode, you can identify some data to be directly imported and other
data that must be queried. Any table that is brought into your report is a
product of both Import and DirectQuery modes. Using the Dual mode allows
Power BI to choose the most efficient form of data retrieval.
Notable differences between Azure Analysis Services and SQL Server are:
As previously mentioned, you use the Get data feature in Power BI Desktop.
When you select Analysis Services, you're prompted for the server address
and the database name with two options: Import and Connect live.
Connect live is an option for Azure Analysis Services. Azure Analysis
Services uses the tabular model and DAX to build calculations, similar to
Power BI. These models are compatible with one another. Using the Connect
live option helps you keep the data and DAX calculations in their original
location, without having to import them all into Power BI. Azure Analysis
Services can have a fast refresh schedule, which means that when data is
refreshed in the service, Power BI reports will immediately be updated,
without the need to initiate a Power BI refresh schedule. This process can
improve the timeliness of the data in your report.
Similar to a relational database, you can choose the tables that you want to
use. If you want to directly query the Azure Analysis Services model, you can
use DAX or MDX.
You'll likely import the data directly into Power BI. An acceptable alternative
is to import all other data that you want (from Excel, SQL Server, and so on)
into the Azure Analysis Services model and then use a live connection. This
approach simplifies your solution by keeping the data modeling and DAX
measures in one place.
Consider the scenario where you're building reports for the Sales team in
your organization. You’ve imported your data, which is in several tables
within the Sales team’s SQL database, by creating a data connection to the
database through DirectQuery. When you create preliminary visuals and
filters, you notice that some tables are queried faster than others, and some
filters are taking longer to process compared to others.
Query folding
Query folding in Power Query Editor helps you increase the
performance of your Power BI reports. Query folding is the process by which
the transformations and edits that you make in Power Query Editor are
tracked as native queries, or simple SQL SELECT statements,
while you're actively making transformations. The reason for implementing
this process is to ensure that these transformations can take place in the
original data source server and don't overwhelm Power BI computing
resources.
You can use Power Query to load data into Power BI. Then use Power Query
Editor to transform your data, such as renaming or deleting columns,
appending, parsing, filtering, or grouping your data.
Consider a scenario where you’ve renamed a few columns in the Sales data
and merged a city and state column together in the “city state” format.
Meanwhile, the query folding feature tracks those changes in native queries.
Then, when you load your data, the transformations take place
independently in the original source, which ensures that performance is
optimized in Power BI.
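To picture what folding means, the sketch below expresses a column rename and a city and state merge as one SQL statement that the source runs itself, so only the finished shape comes back. This is not the query that Power Query actually generates. The table, column names, and aliases are hypothetical, and sqlite3 stands in for the source database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALES (ID INTEGER, CUST_NM TEXT, CITY TEXT, STATE TEXT)")
conn.execute("INSERT INTO SALES VALUES (1, 'Contoso', 'Seattle', 'WA')")

# Both transformations are written as part of the SELECT, so the source
# database does the work instead of the reporting tool.
folded_query = """
SELECT
    ID,
    CUST_NM AS CustomerName,            -- rename step
    CITY || ' ' || STATE AS CityState   -- merge step, "city state" format
FROM SALES
"""

cursor = conn.execute(folded_query)
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```

When a step can't be expressed in the source's query language, that step and the steps after it have to run locally instead.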
The following scenario shows query folding in action. In this scenario, you
apply a set of queries to multiple tables. After you add a new data source by
using Power Query, and you're directed to the Power Query Editor, you go to
the Query Settings pane and right-click the last applied step, as shown in
the following figure.
If the View Native Query option isn't available (not displayed in bold type),
then query folding isn't possible for this step, and you'll have to work
backward in the Applied Steps area until you reach the step in which View
Native Query is available (displays in bold type). This process will reveal
the native query that is used to transform the dataset.
Query diagnostics
Another tool that you can use to study query performance is query
diagnostics. You can determine what bottlenecks may exist while loading
and transforming your data, refreshing your data in Power Query, running
SQL statements in Query Editor, and so on.
Selecting Diagnose Step shows you the length of time that it takes to run
that step, as shown in the following image. This selection can tell you if a
step takes longer to complete than others, which then serves as a starting
point for further investigation.
This tool is useful when you want to analyze performance on the Power
Query side for tasks such as loading datasets, running data refreshes, or
running other transformative tasks.
For more information, refer to Query Folding Guidance and Query Folding.
While importing data into Power BI, you may encounter errors resulting from
factors such as:
Power BI imports from numerous data sources.
Each data source might have dozens (and sometimes hundreds) of
different error messages.
Other components can cause errors, such as hard drives, networks,
software services, and operating systems.
Data often can't comply with any specific schema.
The following sections cover some of the more common error messages that
you might encounter in Power BI.
Relational source systems often have many people who are concurrently
using the same data in the same database. Some relational systems and
their administrators seek to limit a user from monopolizing all hardware
resources by setting a query timeout. These timeouts can be configured for
any timespan, from as little as five seconds to as much as 30 minutes or
more.
For instance, if you’re pulling data from your organization’s SQL Server, you
might see the error shown in the following figure.
You can resolve this error by pulling fewer columns or rows from a single
table. While you're writing SQL statements, it might be a common practice to
include groupings and aggregations. You can also join multiple tables in a
single SQL statement. Additionally, you can perform complicated subqueries
and nested queries in a single statement. These complexities add to the
query processing requirements of the relational system and can greatly
increase the time that the query takes to run.
If you need the rows, columns, and complexity, consider taking small chunks
of data and then bringing them back together by using Power Query. For
instance, you can retrieve half the columns in one query and the other half
in a different query. As long as both queries include a common key column,
Power Query can merge those two queries back together after you're finished.
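The split-and-merge idea can be sketched in Python as follows. Two narrower queries each carry the key column, and the results are joined on that key afterward. The table layout and the merge_on_key helper are hypothetical, and sqlite3 stands in for the source database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES (ID INTEGER PRIMARY KEY, NAME TEXT, "
    "SALESAMOUNT REAL, REGION TEXT)"
)
conn.executemany(
    "INSERT INTO SALES VALUES (?, ?, ?, ?)",
    [(1, "Contoso", 250.0, "West"), (2, "Fabrikam", 410.0, "East")],
)

# Each query pulls part of the columns plus the shared key (ID).
first_half = conn.execute("SELECT ID, NAME FROM SALES ORDER BY ID").fetchall()
second_half = conn.execute(
    "SELECT ID, SALESAMOUNT, REGION FROM SALES ORDER BY ID"
).fetchall()


def merge_on_key(left, right):
    """Join two lists of row tuples on their first element (the key)."""
    right_by_key = {row[0]: row[1:] for row in right}
    return [row + right_by_key[row[0]] for row in left if row[0] in right_by_key]


merged = merge_on_key(first_half, second_half)
print(merged)
```

Each smaller query asks less of the source at one time, which can help it finish inside the timeout, at the cost of an extra merge step afterward.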
Occasionally, you may encounter the “We couldn’t find any data formatted
as a table” error while importing data from Microsoft Excel. Fortunately, this
error is self-explanatory. Power BI expects to find data formatted as a table
from Excel. The error event tells you the resolution. Perform the following
steps to resolve the issue:
1. Open your Excel workbook, and highlight the data that you want to
import.
2. Press the Ctrl+T keyboard shortcut to format the selection as a table.
The first row will likely be your column headers.
3. Verify that the column headers reflect how you want to name your
columns. Then, try to import data from Excel again. This time, it should
work.
Couldn't find file
While importing data from a file, you may get the "Couldn't find file" error.
Usually, this error is caused by the file moving locations or the permissions
to the file changing. If the cause is the former, you need to find the file and
change the source settings.
Sometimes, when you import data into Power BI, the columns appear blank.
This situation happens because of an error in interpreting the data type in
Power BI. The resolution to this error is unique to the data source. For
instance, if you're importing data from SQL Server and see blank columns,
you could try to convert to the correct data type in the query.
By specifying the correct type at the data source, you eliminate many of
these common data source errors.
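As one example of converting a type in the query, the following Python sketch casts a text column to a numeric type before the data leaves the source. The SALES_RAW table and its contents are hypothetical, and sqlite3 stands in for SQL Server, which also supports CAST but uses its own type names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical staging table where amounts were stored as text.
conn.execute("CREATE TABLE SALES_RAW (ID INTEGER, SALESAMOUNT TEXT)")
conn.executemany(
    "INSERT INTO SALES_RAW VALUES (?, ?)",
    [(1, "250.50"), (2, "410")],
)

# Convert the column in the query so the consumer receives numbers.
typed = conn.execute(
    "SELECT ID, CAST(SALESAMOUNT AS REAL) AS SALESAMOUNT "
    "FROM SALES_RAW ORDER BY ID"
).fetchall()
print(typed)
```

With the conversion done at the source, the importing tool no longer has to guess the type from the text values.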
You may encounter different types of errors in Power BI that are caused by
the diverse data source systems where your data resides.