Module 2 - Connecting & Shaping Data
SHAPING DATA
TYPES OF DATA CONNECTORS
• Formula Bar (this is “M” code)
• Table Name & Properties
• Query Pane
• Applied Steps (like a macro)
*In older versions of Power BI, the Transform Data option may be named Edit Queries
QUERY EDITING TOOLS
The HOME tab includes general settings and common table transformation tools
The TRANSFORM tab includes tools to modify existing columns (splitting/grouping, transposing, extracting text, etc.)
The ADD COLUMN tab includes tools to create new columns (based on conditional rules, text operations, calculations, dates, etc.)
BASIC TABLE TRANSFORMATIONS
• Sort values (A-Z, Low-High, etc.)
• Change data type (date, $, %, text, etc.)
• Promote header row
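The three transformations above can be sketched outside Power BI to show what each one does to the data. This is a minimal illustration in plain Python, not the M code that Power BI generates; the sample rows and the helper names (`promote_header_row`, `change_data_types`, `sort_values`) are hypothetical.

```python
from datetime import date

# Hypothetical raw table: the first row holds the column names.
raw_rows = [
    ["Product", "Price", "OrderDate"],
    ["Widget", "4.50", "2024-03-01"],
    ["Gadget", "12.00", "2024-01-15"],
    ["Doohickey", "7.25", "2024-02-10"],
]

def promote_header_row(rows):
    """Use the first row as column names for every remaining row."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]

def change_data_types(records):
    """Convert text values into typed values (decimal number, date)."""
    return [
        {
            "Product": r["Product"],
            "Price": float(r["Price"]),
            "OrderDate": date.fromisoformat(r["OrderDate"]),
        }
        for r in records
    ]

def sort_values(records, column, descending=False):
    """Sort rows by one column, low to high unless descending is set."""
    return sorted(records, key=lambda r: r[column], reverse=descending)

table = sort_values(change_data_types(promote_header_row(raw_rows)), "Price")
for row in table:
    print(row)
```

Changing the data type before sorting matters: as text, "12.00" would sort ahead of "4.50", while as numbers the order is 4.5, 7.25, 12.0.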
Date & Time tools are relatively straightforward, and include the following options:
• Age: Difference between the current time and the date in each row
• Date Only: Removes the time component of a date/time field
• Year/Month/Quarter/Week/Day: Extracts individual components from a date field
(Time-specific options include Hour, Minute, Second, etc.)
• Earliest/Latest: Evaluates the earliest or latest date from a column as a single value
(can only be accessed from the “Transform” menu)
Note: You will almost always want to perform these operations from the “Add Column” menu to
build out new fields, rather than transforming an individual date/time column
PRO TIP:
Load up a table containing a single date column and use Date tools to build out an entire calendar table
CREATING A BASIC CALENDAR TABLE
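As a rough illustration of the idea, the sketch below starts from a single column of dates and adds one new field per date operation, much like using the Date tools from the “Add Column” menu. It is plain Python rather than Power BI itself; the function names, the 2024 date range, and the fixed `today` value (used so the Age result is reproducible) are assumptions made for the example.

```python
from datetime import date, timedelta

def date_column(start, end):
    """Single column of consecutive dates, the starting point for the calendar."""
    days = (end - start).days
    return [start + timedelta(days=n) for n in range(days + 1)]

def add_date_fields(dates, today):
    """Add one new field per date operation, leaving the Date column intact."""
    table = []
    for d in dates:
        table.append({
            "Date": d,
            "Year": d.year,
            "Quarter": (d.month - 1) // 3 + 1,
            "Month": d.month,
            "Day": d.day,
            "StartOfMonth": d.replace(day=1),
            "StartOfYear": d.replace(month=1, day=1),
            "AgeInDays": (today - d).days,  # "Age" relative to a fixed date
        })
    return table

dates = date_column(date(2024, 1, 1), date(2024, 12, 31))
calendar_table = add_date_fields(dates, today=date(2025, 1, 1))

# Earliest/Latest collapse the whole column into a single value.
earliest = min(row["Date"] for row in calendar_table)
latest = max(row["Date"] for row in calendar_table)
print(len(calendar_table), earliest, latest)
```

Each operation produces a new field next to the original date, which mirrors the advice to build out new columns rather than overwrite the date column.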
NOTE: Any fields not specified in the Group By settings are lost
GROUPING & AGGREGATING DATA (ADVANCED)
This time we’re transforming the daily, transaction-level table into a summary
of “TotalQuantity” aggregated by both “ProductKey” and “CustomerKey”
(using the advanced option in the dialog box)
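The logic of this grouping step can be sketched in plain Python. This is an illustration of the aggregation only, not the M code Power BI writes for a Group By step; the sample transactions and the `group_and_sum` helper are hypothetical, while the column names follow the slide.

```python
from collections import defaultdict

# Hypothetical daily, transaction-level rows.
transactions = [
    {"Date": "2024-01-01", "ProductKey": 101, "CustomerKey": 1, "Quantity": 2},
    {"Date": "2024-01-02", "ProductKey": 101, "CustomerKey": 1, "Quantity": 3},
    {"Date": "2024-01-02", "ProductKey": 101, "CustomerKey": 2, "Quantity": 1},
    {"Date": "2024-01-03", "ProductKey": 202, "CustomerKey": 1, "Quantity": 5},
]

def group_and_sum(rows, group_by, value_column, result_column):
    """Sum value_column for each distinct combination of the group_by columns."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[column] for column in group_by)
        totals[key] += row[value_column]
    # Only the grouping columns and the aggregate survive in the output.
    return [
        {**dict(zip(group_by, key)), result_column: total}
        for key, total in totals.items()
    ]

summary = group_and_sum(
    transactions, ["ProductKey", "CustomerKey"], "Quantity", "TotalQuantity"
)
for row in summary:
    print(row)
```

The output has one row per ProductKey/CustomerKey pair and no Date field, which is the behavior the note about fields not specified in the Group By settings describes.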
PRO TIP:
Use the “Folder” option (Get Data > More > Folder) to append all files within a folder (assuming they share
the same structure); as you add new files, simply refresh the query and they will automatically append!
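To show why a refresh picks up new files, here is a minimal sketch of the append-a-folder idea in plain Python. It is not how the Power BI Folder connector is implemented; the CSV format, the file names, and the `append_folder` and `write_csv` helpers are assumptions made for the example.

```python
import csv
import tempfile
from pathlib import Path

def append_folder(folder):
    """Append the rows of every CSV file in a folder, checking for a shared header."""
    combined = []
    expected_header = None
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.DictReader(handle)
            if expected_header is None:
                expected_header = reader.fieldnames
            elif reader.fieldnames != expected_header:
                raise ValueError(f"{path.name} has a different structure")
            combined.extend(reader)
    return combined

def write_csv(path, rows):
    """Helper for the demo: write a small CSV file."""
    with path.open("w", newline="") as handle:
        csv.writer(handle).writerows(rows)

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    write_csv(folder / "sales_2024_01.csv", [["ProductKey", "Quantity"], ["101", "2"]])
    write_csv(folder / "sales_2024_02.csv", [["ProductKey", "Quantity"], ["202", "5"]])
    first_load = append_folder(folder)

    # A new file arrives; re-running the same query is the "refresh".
    write_csv(folder / "sales_2024_03.csv", [["ProductKey", "Quantity"], ["101", "1"]])
    refreshed = append_folder(folder)

print(len(first_load), len(refreshed))
```

Because the query is defined against the folder rather than against individual files, nothing about the query changes when a file is added; only the result does.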
DATA SOURCE SETTINGS
Within each query, you can click each item within the “Applied Steps”
pane to view each stage of the transformation, add new steps or delete
existing ones, or modify individual steps by clicking the gear icons
*Formerly known as “Edit Queries”
REFRESHING QUERIES
PRO TIP:
Exclude queries that don’t change often,
like lookups or static data tables
DEFINING DATA CATEGORIES
*In older versions of Power BI, these tools can be found in the Modeling tab in the Data view
DEFINING HIERARCHIES
Hierarchies are groups of nested columns that reflect multiple levels of granularity
• For example, a “Geography” hierarchy might include Country, State, and City columns
• Each hierarchy can be treated as a single item in tables and reports, allowing users to “drill up” and
“drill down” through different levels of the hierarchy in a meaningful way
1) From within the Data view, right-click a field (or click the ellipsis) and select “New hierarchy” (here we’ve selected “Start of Year”)
2) This creates a hierarchy field containing “Start of Year”, which we’ve renamed “Date Hierarchy”
3) Right-click other fields (like “Start of Month”) and select “Add to Hierarchy”
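Drilling up and down amounts to aggregating the same rows at a coarser or finer level of the hierarchy. The sketch below illustrates this in plain Python using the “Geography” example from the bullets above; it is not Power BI code, and the sales figures and the `totals_at_level` helper are hypothetical.

```python
from collections import defaultdict

# Hierarchy levels, coarsest first.
GEOGRAPHY = ["Country", "State", "City"]

# Hypothetical sales rows.
sales = [
    {"Country": "USA", "State": "MA", "City": "Boston", "Sales": 100},
    {"Country": "USA", "State": "MA", "City": "Salem", "Sales": 40},
    {"Country": "USA", "State": "NY", "City": "Albany", "Sales": 60},
    {"Country": "Canada", "State": "ON", "City": "Toronto", "Sales": 80},
]

def totals_at_level(rows, hierarchy, depth):
    """Total sales using the first `depth` levels of the hierarchy as the key."""
    levels = hierarchy[:depth]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row["Sales"]
    return dict(totals)

by_country = totals_at_level(sales, GEOGRAPHY, 1)  # fully drilled up
by_state = totals_at_level(sales, GEOGRAPHY, 2)    # drilled down one level
by_city = totals_at_level(sales, GEOGRAPHY, 3)     # fully drilled down
print(by_country)
print(by_state)
```

The order of the levels is what makes the drill path meaningful: each level only subdivides the one above it.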
PRO TIP: IMPORTING MODELS FROM EXCEL
PRO TIP:
Power Pivot includes some features that Power BI does not (filtering options, DAX function help, etc.); if you
are more comfortable in the Excel environment, build your models there and then import them into Power BI!
*In older versions of Power BI, this import option was called “Excel Workbook Contents”
BEST PRACTICES: CONNECTING & SHAPING DATA
Get yourself organized before loading the data into Power BI
• Define clear and intuitive table names (no spaces!) from the start; updating them later
can be a headache, especially if you’ve referenced them in multiple places
• Establish a file/folder structure that makes sense from the start, to avoid having to
modify data source settings if file names or locations change
When working with large tables, only load the data you need
• Don’t include hourly data when you only need daily, or product-level transactions when
you only care about store-level performance; extra data will only slow you down