Module 3 - Creating a Data Model

The document explains the concept of data modeling, emphasizing the importance of creating relationships between tables to ensure data integrity and efficiency. It covers topics such as database normalization, the distinction between data and lookup tables, and best practices for managing relationships and filter flows. Additionally, it highlights the significance of avoiding complex cross-filtering and the necessity of hiding irrelevant fields in report views.

Uploaded by Vishal Kapoor
Copyright © All Rights Reserved

CREATING A DATA MODEL

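WHAT’S A “DATA MODEL”?

This IS NOT a data model

• This is a collection of independent tables, which share no connections or relationships
• If you tried to visualize Orders and Returns by Product, this is what you’d get: every product shows the same grand totals, because no filter from the Product table can reach the other tables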
WHAT’S A “DATA MODEL”?

This IS a data model!

• The tables are connected via relationships, based on the common ProductKey field
• Now the Sales and Returns tables know how to filter using fields from the Product table!
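To make the difference concrete, here is a minimal plain-Python analogy. It is only an illustration and does not show how Power BI actually evaluates a visual. The product names, keys, quantities, and the `total_quantity` helper are all hypothetical. Grouping by product without a relationship repeats the grand total, while filtering through ProductKey splits it correctly.

```python
# Hypothetical sample data (not from the course files)
products = {1: "Cream Soda", 2: "Root Beer"}   # Product table: ProductKey -> product name
sales = [(1, 10), (1, 5), (2, 7)]              # Sales table rows: (ProductKey, quantity)


def total_quantity(rows):
    """Sum the quantity column of a list of (ProductKey, quantity) rows."""
    return sum(qty for _, qty in rows)


# No relationship: the product filter never reaches the Sales rows,
# so every product shows the same grand total.
unrelated = {name: total_quantity(sales) for name in products.values()}

# Relationship on ProductKey: each product keeps only its matching Sales rows.
related = {
    name: total_quantity([row for row in sales if row[0] == key])
    for key, name in products.items()
}

print(unrelated)  # {'Cream Soda': 22, 'Root Beer': 22}
print(related)    # {'Cream Soda': 15, 'Root Beer': 7}
```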
DATABASE NORMALIZATION
Normalization is the process of organizing the tables and columns in a relational
database to reduce redundancy and preserve data integrity. It’s commonly used to:
• Eliminate redundant data to decrease table sizes and improve processing speed & efficiency
• Minimize errors and anomalies from data modifications (inserting, updating or deleting records)
• Simplify queries and structure the database for meaningful analysis

TIP: In a normalized database, each table should serve a distinct and specific purpose (e.g. product information, dates, transaction records, customer attributes, etc.)

When you don’t normalize, you end up with tables like this; all of the rows with duplicate product info could be eliminated with a lookup table based on product_id

This may not seem critical now, but minor inefficiencies can become major problems as databases scale in size!
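The normalization step described above can be sketched in plain Python. This is a minimal illustration that assumes hypothetical rows in which product_name and price repeat on every transaction. The hypothetical `normalize` helper moves those attributes into a lookup keyed by product_id, keeps a slim fact table, and refuses to proceed if one product_id carries conflicting attributes.

```python
# Hypothetical denormalized rows: product details repeat on every transaction
flat_rows = [
    {"date": "2024-01-01", "product_id": 101, "product_name": "Cream Soda", "price": 1.50, "quantity": 3},
    {"date": "2024-01-02", "product_id": 101, "product_name": "Cream Soda", "price": 1.50, "quantity": 2},
    {"date": "2024-01-02", "product_id": 102, "product_name": "Diet Cream Soda", "price": 1.25, "quantity": 4},
]

PRODUCT_COLUMNS = ("product_name", "price")  # attributes that belong in the product lookup


def normalize(rows):
    """Split flat rows into a product lookup (one entry per product_id) and a slim fact table."""
    product_lookup = {}
    fact_rows = []
    for row in rows:
        attrs = {col: row[col] for col in PRODUCT_COLUMNS}
        # setdefault stores attrs the first time a product_id is seen and returns the stored copy after that
        existing = product_lookup.setdefault(row["product_id"], attrs)
        if existing != attrs:
            raise ValueError(f"conflicting attributes for product_id {row['product_id']}")
        # The fact table keeps only the date key, the product key, and the measured value
        fact_rows.append({"date": row["date"], "product_id": row["product_id"], "quantity": row["quantity"]})
    return product_lookup, fact_rows


product_lookup, sales_fact = normalize(flat_rows)
print(len(product_lookup), len(sales_fact))  # 2 3
```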
DATA TABLES VS. LOOKUP TABLES
Models generally contain two types of tables: data (or “fact”) tables, and lookup (or “dimension”) tables
• Data tables contain numbers or values, typically at a granular level, with ID or “key” columns that can be used to
create table relationships
• Lookup tables provide descriptive, often text-based attributes about each dimension in a table

This Calendar Lookup table provides additional attributes about each date (month, year, weekday, quarter, etc.)

This Product Lookup table provides additional attributes about each product (brand, product name, sku, price, etc.)

This Data Table contains “quantity” values, and connects to lookup tables via the “date” and “product_id” columns
PRIMARY VS. FOREIGN KEYS

These columns are foreign keys; they contain multiple instances of each value, and are used to match the primary keys in related lookup tables

These columns are primary keys; they uniquely identify each row of a table, and match the foreign keys in related data tables
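The two key roles can be checked with a small plain-Python sketch. The column values and the helper names `is_primary_key` and `orphan_foreign_keys` are hypothetical. A primary key column must be unique and non-blank. Every foreign key value should find a match among the lookup's primary keys.

```python
def is_primary_key(values):
    """True if the column uniquely identifies each row: no duplicates and no blanks (None)."""
    return None not in values and len(set(values)) == len(values)


def orphan_foreign_keys(foreign_values, primary_values):
    """Foreign-key values with no matching primary key in the lookup table."""
    return sorted(set(foreign_values) - set(primary_values))


# Hypothetical columns
lookup_product_ids = [101, 102, 103]                # Product Lookup: product_id (primary key)
sales_product_ids = [101, 101, 102, 102, 102, 104]  # Data table: product_id (foreign key)

print(is_primary_key(lookup_product_ids))  # True
print(is_primary_key(sales_product_ids))   # False (values repeat, as foreign keys may)
print(orphan_foreign_keys(sales_product_ids, lookup_product_ids))  # [104]
```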
RELATIONSHIPS VS. MERGED TABLES
Can’t I just merge queries or use LOOKUPVALUE or RELATED functions to pull those attributes into the fact table itself, so that I have everything in one place??
-Anonymous confused man

(Figure: merged table showing Original Fact Table fields, Attributes from Calendar Lookup table, and Attributes from Product Lookup table)

Sure you can, but it’s inefficient!

• Merging data in this way creates redundant data and uses significantly more memory and processing power than creating relationships between multiple small tables
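A rough back-of-the-envelope sketch shows where the redundancy comes from. It counts stored values only and ignores the columnar compression Power BI applies, so it will not match actual memory use. All table sizes below are hypothetical, and so are the helper names `merged_cells` and `related_cells`.

```python
def merged_cells(fact_rows, fact_cols, lookups):
    """Stored values if every lookup attribute is copied onto every fact row."""
    return fact_rows * (fact_cols + sum(attr_cols for _, attr_cols in lookups))


def related_cells(fact_rows, fact_cols, lookups):
    """Stored values if each lookup keeps its attributes once per key (+1 column for its primary key)."""
    return fact_rows * fact_cols + sum(rows * (attr_cols + 1) for rows, attr_cols in lookups)


# Hypothetical sizes: a 1,000,000-row fact table with 4 columns (including date and product_id keys),
# a Calendar lookup with 3,650 rows x 6 attributes, and a Product lookup with 500 rows x 8 attributes.
fact_rows, fact_cols = 1_000_000, 4
lookups = [(3_650, 6), (500, 8)]  # (rows, attribute columns) per lookup table

print(merged_cells(fact_rows, fact_cols, lookups))   # 18000000
print(related_cells(fact_rows, fact_cols, lookups))  # 4030050
```

The merged version grows with every attribute multiplied by every fact row. The related version pays for each attribute only once per lookup row.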
CREATING TABLE RELATIONSHIPS
Option 1: Click and drag to connect primary and foreign keys within the Relationships pane
Option 2: Add or detect relationships using the “Manage Relationships” dialog box
CREATING “SNOWFLAKE” SCHEMAS

The Sales_Data table can connect to Products using the ProductKey field,
but cannot connect directly to the Subcategories or Categories tables

By creating relationships from Products to Subcategories (using ProductSubcategoryKey) and Subcategories to Categories (using ProductCategoryKey), we have essentially connected Sales_Data to each lookup table; filter context will now flow all the way down the chain

PRO TIP:
Models with chains of dimension tables are often called
“snowflake” schemas (whereas “star” schemas usually have
individual lookup tables surrounding a central data table)
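Filter context flowing down a snowflake chain can be imitated in plain Python. This is a hypothetical sketch, not Power BI's engine. The category names, key values, and the `quantity_for_category` helper are all assumptions. A filter on Categories becomes a set of keys, and that set is handed down one relationship at a time until it reaches Sales_Data.

```python
# Hypothetical snowflake chain: Categories -> Subcategories -> Products -> Sales_Data
categories = {1: "Bikes", 2: "Accessories"}            # ProductCategoryKey -> category name
subcategories = {10: 1, 11: 1, 20: 2}                  # ProductSubcategoryKey -> ProductCategoryKey
products = {100: 10, 101: 11, 200: 20}                 # ProductKey -> ProductSubcategoryKey
sales_data = [(100, 2), (101, 1), (200, 5), (200, 3)]  # (ProductKey, OrderQuantity)


def quantity_for_category(category_name):
    # Step 1: filter the Categories table
    cat_keys = {k for k, name in categories.items() if name == category_name}
    # Step 2: pass the filter to Subcategories via ProductCategoryKey
    sub_keys = {k for k, cat in subcategories.items() if cat in cat_keys}
    # Step 3: pass it to Products via ProductSubcategoryKey
    prod_keys = {k for k, sub in products.items() if sub in sub_keys}
    # Step 4: finally filter Sales_Data via ProductKey and sum the quantity
    return sum(qty for key, qty in sales_data if key in prod_keys)


print(quantity_for_category("Bikes"))        # 3
print(quantity_for_category("Accessories"))  # 8
```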
MANAGING & EDITING RELATIONSHIPS

The “Manage Relationships” dialog box allows you to add, edit, or delete table relationships

Editing tools allow you to activate/deactivate relationships, view cardinality, and modify the cross filter direction (stay tuned!)
ACTIVE VS. INACTIVE RELATIONSHIPS

The Sales_Data table contains two date fields (OrderDate & StockDate), but
there can only be one active relationship to the Date field in the Calendar table

Double-click the relationship line, and check the “Make this relationship active” box to toggle (note that you have to deactivate one in order to activate another)
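The effect of toggling which date relationship is active can be sketched in plain Python. The rows, the `Relationship` class, and the helper functions are hypothetical, and this is an analogy rather than Power BI's implementation. Whichever relationship is active decides which date column drives the monthly totals.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    from_column: str  # column in Sales_Data
    to_column: str    # column in Calendar
    active: bool


# Hypothetical rows: (OrderDate, StockDate, OrderQuantity)
sales_data = [
    ("2024-01-05", "2023-12-20", 2),
    ("2024-02-10", "2024-01-15", 4),
]

relationships = [
    Relationship("OrderDate", "Date", active=True),
    Relationship("StockDate", "Date", active=False),
]


def active_date_column(rels):
    active = [r for r in rels if r.active]
    # At most one of these relationships can be active; this sketch needs exactly one to know which date to use
    if len(active) != 1:
        raise ValueError("exactly one Sales_Data -> Calendar relationship must be active")
    return active[0].from_column


def quantity_by_month(rels):
    column_index = {"OrderDate": 0, "StockDate": 1}[active_date_column(rels)]
    totals = {}
    for row in sales_data:
        month = row[column_index][:7]  # "YYYY-MM"
        totals[month] = totals.get(month, 0) + row[2]
    return totals


print(quantity_by_month(relationships))  # {'2024-01': 2, '2024-02': 4}

# Toggle: deactivate the OrderDate relationship, then activate the StockDate one
relationships[0].active = False
relationships[1].active = True
print(quantity_by_month(relationships))  # {'2023-12': 2, '2024-01': 4}
```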
RELATIONSHIP CARDINALITY

Cardinality refers to the uniqueness of values in a column

• For our purposes, all relationships in the data model should follow a “one-to-many” cardinality; one instance of each primary key, but potentially many instances of each foreign key

In this case, there is only ONE instance of each ProductKey in the Products table (noted by the “1”), since each row contains attributes of a single product (Name, SKU, Description, Retail Price, etc.)
There are MANY instances of each ProductKey in the Sales_Data table (noted
by the asterisk *), since there are multiple sales associated with each product
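Cardinality follows directly from whether each side's key column is unique. Here is a minimal plain-Python sketch of that check. The `cardinality` helper and the key values are hypothetical.

```python
def cardinality(left_keys, right_keys):
    """Classify a relationship by checking whether each side's key column is unique."""
    left_unique = len(set(left_keys)) == len(left_keys)
    right_unique = len(set(right_keys)) == len(right_keys)
    if left_unique and right_unique:
        return "one-to-one"
    if left_unique:
        return "one-to-many"
    if right_unique:
        return "many-to-one"
    return "many-to-many"


# Hypothetical ProductKey columns
products_keys = [1, 2, 3]        # Products: one row per product
sales_keys = [1, 1, 2, 3, 3, 3]  # Sales_Data: many sales per product

print(cardinality(products_keys, sales_keys))  # one-to-many
```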
CARDINALITY CASE STUDY: MANY-TO-MANY

• If we try to connect these tables using product_id, we’ll get a “many-to-many relationship” error since there are multiple instances of each ID in both tables
• Even if we could create this relationship, how would
you know which product was actually sold on each
date – Cream Soda or Diet Cream Soda?
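The ambiguity can be made visible with a naive row-level match in plain Python. This is a hypothetical sketch that assumes product_id 101 is shared by Cream Soda and Diet Cream Soda, as the question above implies. It does not show how Power BI handles such a relationship. Each sale matches both product rows, so the engine has no way to tell which product was sold, and summing the matches double-counts.

```python
# Hypothetical tables in which product_id 101 is NOT unique on either side
product_table = [
    {"product_id": 101, "product_name": "Cream Soda"},
    {"product_id": 101, "product_name": "Diet Cream Soda"},
]
sales_table = [
    {"date": "2024-01-01", "product_id": 101, "quantity": 3},
    {"date": "2024-01-02", "product_id": 101, "quantity": 2},
]

# Match every sales row to every product row with the same ID
matches = [
    (sale["date"], product["product_name"], sale["quantity"])
    for sale in sales_table
    for product in product_table
    if sale["product_id"] == product["product_id"]
]

# 2 sales rows became 4 candidate rows: each sale now "belongs" to both products,
# and summing quantity double-counts (10 instead of 5)
print(len(matches), sum(q for _, _, q in matches))  # 4 10
```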
CARDINALITY CASE STUDY: ONE-TO-ONE

• Connecting the two tables above using the product_id field creates a one-to-one relationship,
since each ID only appears once in each table
• Unlike many-to-many, there is nothing illegal about this relationship; it’s just inefficient

To eliminate the inefficiency, you could simply merge the two tables into a single, valid lookup

NOTE: this still respects the laws of normalization, since all rows
are unique and capture attributes related to the primary key
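Merging a one-to-one pair into a single lookup is straightforward when both sides share exactly the same unique keys. Here is a minimal plain-Python sketch. The two tables and the `merge_one_to_one` helper are hypothetical.

```python
# Hypothetical one-to-one tables: each product_id appears exactly once in each
product_names = {101: "Cream Soda", 102: "Diet Cream Soda"}
product_prices = {101: 1.50, 102: 1.25}


def merge_one_to_one(left, right):
    """Combine two tables keyed by the same unique ID into one lookup table."""
    # dict key views compare like sets, so this checks that both sides have the same IDs
    if left.keys() != right.keys():
        raise ValueError("key sets differ; these tables are not a clean one-to-one pair")
    return {key: {"product_name": left[key], "price": right[key]} for key in left}


product_lookup = merge_one_to_one(product_names, product_prices)
print(product_lookup[102])  # {'product_name': 'Diet Cream Soda', 'price': 1.25}
```

The result is still one row per product_id, so it keeps the normalized shape described in the note above.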
CONNECTING MULTIPLE DATA TABLES

This model contains two data tables: Sales_Data and Returns_Data
• Note that the Returns table connects to
Calendar and Product_Lookup just like the
Sales table, but without a CustomerKey field
it cannot be joined to Customer_Lookup
• This allows us to analyze sales and returns
within the same view, but only if we filter or
segment the data using shared lookups
• In other words, we know which product was
returned and on which date, but nothing
about which customer made the return

HEY THIS IS IMPORTANT!

In general, never create direct relationships between data tables; instead, connect them through shared lookups
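Analyzing two data tables through a shared lookup can be imitated in plain Python. This is a hypothetical sketch: the product names, keys, quantities, and the `by_product` helper are assumptions. Both data tables are segmented by a field from Product_Lookup, and no direct link between Sales_Data and Returns_Data is needed.

```python
# Hypothetical model: Sales_Data and Returns_Data both relate to Product_Lookup,
# but only Sales_Data carries a CustomerKey
product_lookup = {1: "Helmet", 2: "Bottle"}           # ProductKey -> ProductName
sales_data = [(1, 501, 4), (1, 502, 2), (2, 501, 6)]  # (ProductKey, CustomerKey, OrderQuantity)
returns_data = [(1, 1), (2, 2)]                       # (ProductKey, ReturnQuantity)


def by_product():
    """Segment both data tables by a field from the shared Product lookup."""
    report = {}
    for key, name in product_lookup.items():
        orders = sum(q for p, _, q in sales_data if p == key)
        returns = sum(q for p, q in returns_data if p == key)
        report[name] = (orders, returns)
    return report


print(by_product())  # {'Helmet': (6, 1), 'Bottle': (6, 2)}
```

There is no equivalent breakdown of returns by customer, because Returns_Data has no CustomerKey to connect through.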
FILTER FLOW

Here we have two data tables (Sales_Data and Returns_Data), connected to Territory_Lookup

Note the filter directions (shown as arrows) in each relationship; by default*, these will point from the “one” side of the relationship (lookups) to the “many” side (data)
• When you filter a table, that filter context is passed along to all
related “downstream” tables (following the direction of the arrow)
• Filters cannot flow “upstream” (against the direction of the arrow)

PRO TIP:
Arrange your lookup tables above your data tables in your model as a visual reminder that filters flow “downstream”

*In some cases filters may default to “two-way” depending on your Power BI Desktop settings
FILTER FLOW (CONT.)
In this case, the only valid way to filter both Sales and Returns data by Territory is to use the TerritoryKey field from the Territory_Lookup table, which is upstream and related to both data tables
• Filtering using TerritoryKey from the Sales table yields incorrect
Returns values, since the filter context cannot flow upstream to
either one of the other tables

• Similarly, filtering using TerritoryKey from the Returns table yields incorrect Sales data; in addition, only territories that registered returns are visible in the table (even though the other territories registered sales)

(Figure panels: 1) Filtering using TerritoryKey from the Territory_Lookup table; 2) Filtering using TerritoryKey from the Sales_Data table; 3) Filtering using TerritoryKey from the Returns_Data table)
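The three results above can be reproduced with a small plain-Python analogy of one-way filter flow. It is not Power BI's engine. The territory data, the `reaches` and `report` helpers, and the assumption that returns exist only in territories 1 and 4 are all hypothetical. A grouping table filters a data table only if a path of arrows leads from one to the other. Where no path exists, the measure falls back to its grand total. Where a filtered total is empty, the sketch shows 0 and Power BI would typically show a blank.

```python
from collections import deque

# Hypothetical data: returns were registered only in territories 1 and 4
sales_data = [(1, 10), (2, 20), (3, 30), (4, 40)]  # (TerritoryKey, OrderQuantity)
returns_data = [(1, 1), (4, 4)]                    # (TerritoryKey, ReturnQuantity)
KEYS = {
    "Territory_Lookup": [1, 2, 3, 4],
    "Sales_Data": [k for k, _ in sales_data],
    "Returns_Data": [k for k, _ in returns_data],
}


def reaches(source, target, edges):
    """True if filter context can travel from source to target following arrow directions."""
    seen, queue = {source}, deque([source])
    while queue:
        table = queue.popleft()
        if table == target:
            return True
        for a, b in edges:
            if a == table and b not in seen:
                seen.add(b)
                queue.append(b)
    return False


def report(group_table, edges):
    """One row per TerritoryKey found in group_table, with (Sales, Returns) totals."""
    sales_filtered = reaches(group_table, "Sales_Data", edges)
    returns_filtered = reaches(group_table, "Returns_Data", edges)
    return {
        key: (
            sum(q for k, q in sales_data if k == key or not sales_filtered),
            sum(q for k, q in returns_data if k == key or not returns_filtered),
        )
        for key in sorted(set(KEYS[group_table]))
    }


# One-way filters: arrows point from the lookup (one side) to each data table (many side)
one_way = {("Territory_Lookup", "Sales_Data"), ("Territory_Lookup", "Returns_Data")}

print(report("Territory_Lookup", one_way))  # {1: (10, 1), 2: (20, 0), 3: (30, 0), 4: (40, 4)}
print(report("Sales_Data", one_way))        # {1: (10, 5), 2: (20, 5), 3: (30, 5), 4: (40, 5)}
print(report("Returns_Data", one_way))      # {1: (100, 1), 4: (100, 4)}
```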
TWO-WAY FILTERS

Updating the filter direction between Sales and Territory from “Single” to “Both” allows filter context to flow both ways
• This means that filters applied to the Sales_Data table will pass to
the lookup, and then down to the Returns_Data table
NOTE: The “Apply security filter in both directions” option relates to row-level security (RLS)
settings, which are not covered in this course
TWO-WAY FILTERS (CONT.)
With two-way cross-filtering enabled between the Sales and Territory
tables, we now see correct values using TerritoryKey from either table
• The filter context for Sales_Data[TerritoryKey] now passes up to the
Territory_Lookup, and then down to the Returns_Data table

• Note that we still see incorrect values when filtering using TerritoryKey from
the Returns table, since the filter context is isolated to that single table

(Figure panels: 1) Filtering using TerritoryKey from the Territory_Lookup table; 2) Filtering using TerritoryKey from the Sales_Data table; 3) Filtering using TerritoryKey from the Returns_Data table)
TWO-WAY FILTERS (CONT.)
In this case, we’ve enabled two-way cross-filtering between the
Returns and Territory tables
• As expected, we now see incorrect values when filtering using TerritoryKey
from the Sales table, since the filter context is isolated to that single table

• While the values appear to be correct when filtering using TerritoryKey from
the Returns table, we’re missing sales data from any territories that didn’t
register returns (specifically Territories 2 & 3)

Since no information about Territory 2 or 3 is passed from the Returns_Data table to Territory_Lookup, they get filtered out of the lookup, and subsequently filtered out of the Sales_Data

(Figure panels: 1) Filtering using TerritoryKey from the Territory_Lookup table; 2) Filtering using TerritoryKey from the Sales_Data table; 3) Filtering using TerritoryKey from the Returns_Data table)
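The two two-way setups described on these slides (Sales ↔ Territory, then Returns ↔ Territory) can be compared with a plain-Python analogy. This is a hypothetical sketch, not Power BI's engine. The data, the helpers `edges_for`, `reaches`, and `report`, and the assumption that returns exist only in territories 1 and 4 are illustrative. A two-way relationship simply adds a reverse arrow from the data table back up to the lookup.

```python
from collections import deque

# Hypothetical data: returns were registered only in territories 1 and 4
sales_data = [(1, 10), (2, 20), (3, 30), (4, 40)]  # (TerritoryKey, OrderQuantity)
returns_data = [(1, 1), (4, 4)]                    # (TerritoryKey, ReturnQuantity)
KEYS = {
    "Territory_Lookup": [1, 2, 3, 4],
    "Sales_Data": [k for k, _ in sales_data],
    "Returns_Data": [k for k, _ in returns_data],
}


def edges_for(two_way_tables):
    """One-way lookup -> data arrows, plus a reverse arrow for each two-way relationship."""
    edges = {("Territory_Lookup", "Sales_Data"), ("Territory_Lookup", "Returns_Data")}
    edges |= {(table, "Territory_Lookup") for table in two_way_tables}
    return edges


def reaches(source, target, edges):
    """True if filter context can travel from source to target following arrow directions."""
    seen, queue = {source}, deque([source])
    while queue:
        table = queue.popleft()
        if table == target:
            return True
        for a, b in edges:
            if a == table and b not in seen:
                seen.add(b)
                queue.append(b)
    return False


def report(group_table, edges):
    """One row per TerritoryKey found in group_table, with (Sales, Returns) totals."""
    sales_filtered = reaches(group_table, "Sales_Data", edges)
    returns_filtered = reaches(group_table, "Returns_Data", edges)
    return {
        key: (
            sum(q for k, q in sales_data if k == key or not sales_filtered),
            sum(q for k, q in returns_data if k == key or not returns_filtered),
        )
        for key in sorted(set(KEYS[group_table]))
    }


sales_two_way = edges_for({"Sales_Data"})
print(report("Sales_Data", sales_two_way))    # {1: (10, 1), 2: (20, 0), 3: (30, 0), 4: (40, 4)}
print(report("Returns_Data", sales_two_way))  # {1: (100, 1), 4: (100, 4)}

returns_two_way = edges_for({"Returns_Data"})
print(report("Sales_Data", returns_two_way))    # {1: (10, 5), 2: (20, 5), 3: (30, 5), 4: (40, 5)}
print(report("Returns_Data", returns_two_way))  # {1: (10, 1), 4: (40, 4)}
```

The last line shows the effect described above. Territories 2 and 3 never appear in Returns_Data, so their sales drop out of the result.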
TWO-WAY FILTERS: A WORD OF WARNING

Use two-way filters carefully, and only when necessary*

• If you try to use multiple two-way filters in a more complex model,
you run the risk of creating “ambiguous relationships” by introducing
multiple filter paths between tables:

In this model, filter context from the Product_Lookup table can pass down to
Returns_Data and up to Territory_Lookup, which would filter accordingly based on the
TerritoryKey values passed from the Returns table

If we were able to activate the relationship between Product_Lookup and Sales_Data as well, filters could pass from the Product_Lookup table through EITHER the Sales or Returns table to reach the Territory_Lookup, which could yield conflicting filter context

PRO TIP:
Design your models with one-way filters and 1-to-Many cardinality, unless more complex relationships are necessary

*Two-way filters are not recommended for models with multiple data tables, but may be used when you need to filter a lookup using a data table, or connect two “many” tables via a shared lookup (not covered in this course)
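An "ambiguous relationship" means there is more than one filter path between the same two tables. Counting those paths shows the problem. This is a minimal plain-Python sketch, not Power BI's actual validation logic. It assumes both Territory relationships are set to "Both", which the slide's description implies, and the helpers `edges_for` and `count_paths` are hypothetical.

```python
def edges_for(relationships):
    """Turn (one side, many side, direction, active) tuples into directed filter arrows."""
    edges = set()
    for one_side, many_side, direction, active in relationships:
        if not active:
            continue  # inactive relationships do not propagate filters
        edges.add((one_side, many_side))      # default: the one side filters the many side
        if direction == "Both":
            edges.add((many_side, one_side))  # two-way: the many side also filters the one side
    return edges


def count_paths(source, target, edges, visited=None):
    """Count simple directed paths along which filter context could travel."""
    if source == target:
        return 1
    visited = (visited or set()) | {source}
    return sum(
        count_paths(b, target, edges, visited)
        for a, b in edges
        if a == source and b not in visited
    )


# Hypothetical model: (one side, many side, cross filter direction, active?)
model = [
    ("Territory_Lookup", "Sales_Data", "Both", True),
    ("Territory_Lookup", "Returns_Data", "Both", True),
    ("Product_Lookup", "Returns_Data", "Single", True),
    ("Product_Lookup", "Sales_Data", "Single", False),  # inactive, as on the slide
]
print(count_paths("Product_Lookup", "Territory_Lookup", edges_for(model)))  # 1

# Activating Product_Lookup -> Sales_Data opens a second path: the model becomes ambiguous
model[3] = ("Product_Lookup", "Sales_Data", "Single", True)
print(count_paths("Product_Lookup", "Territory_Lookup", edges_for(model)))  # 2
```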
HIDING FIELDS FROM REPORT VIEW

Hiding fields from Report View makes them inaccessible from the Report tab (although they can still be accessed within the Data or Relationships views)
This is commonly used to prevent users from filtering
using invalid fields, or to hide irrelevant metrics from view

PRO TIP:
Hide the foreign key columns in your data tables to force
users to filter using the primary keys in the lookup tables
BEST PRACTICES: DATA MODELING
Focus on building a normalized model from the start
• Make sure that each table in your model serves a single, distinct purpose
• Use relationships vs. merged tables; long & narrow tables are better than short & wide

Organize lookup tables above data tables in the diagram view
• This serves as a visual reminder that filters flow “downstream”

Avoid complex cross-filtering unless absolutely necessary
• Don’t use two-way filters when one-way filters will get the job done

Hide fields from report view to prevent invalid filter context
• Hide foreign keys in data tables, so that users can only access valid fields
Disclaimer
The information in this document is highly confidential and may be legally privileged. It
is intended solely for the addressee. Access to this presentation by anyone else is
unauthorized. If you are not the intended recipient, any disclosure, copying, distribution
or any action taken or omitted to be taken in reliance on it, is prohibited and may be
unlawful. The sample screens shown in this presentation are CONVZ FZE’s IP and
cannot be used or distributed without their prior consent. This presentation is
considered approved for submission to the Client by the Above-Authorized signatory.
