Introduction To Data Modeling For Power BI - Gray


www.sqlbi.com

INTRODUCTION TO DATA MODELING FOR POWER BI

We write: Books
We teach: Courses
We provide: Consulting
We are recognized: BI Experts

Remote Consulting
Power BI/SSAS Optimization
BI Architectural Review
On-Site Consulting
Custom Training & Mentoring
www.sqlbi.com

Introduction to data modeling
Why is data modeling important?

Working with a single table

o In Excel, you work with a single table
o As simple as it is, it is already a data model
o It comes with several limitations
  • Number of rows: about 1 million (1,048,576)
  • Speed and memory usage are not optimal
  • Can only perform basic calculations
o The limit on size becomes a limit on the data model

Granularity

o Granularity is the level of detail of your table
o The more columns, the higher the granularity
o Higher granularity
  • More detailed information
  • More powerful model
  • Increase in the number of rows
o Lower granularity
  • Faster and smaller model
  • Less analytical power

Granularity and table size

o Increasing granularity increases the size of the model
  • More columns → More rows
o You quickly hit Excel's limit of 1M rows…

Category   Sales
Bikes      10,000
Helmets     5,000

Category   Subcategory           Sales
Bikes      Cross bikes           8,000
Bikes      Mountain bikes        2,000
Helmets    Colorful helmets      2,000
Helmets    Lightweight helmets   3,000

Scattered information

o Higher granularity is not always the best choice
o Too high is as bad as too low
o Example: yearly income repeated on every row
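As a rough illustration of the two sales tables in "Granularity and table size", the following Python sketch rolls the subcategory-level rows up to the category level. The figures come from the slide; the function name `roll_up_to_category` is a hypothetical helper introduced only for this example.

```python
from collections import defaultdict

# Rows at (Category, Subcategory) granularity, copied from the slide.
sales_by_subcategory = [
    ("Bikes", "Cross bikes", 8000),
    ("Bikes", "Mountain bikes", 2000),
    ("Helmets", "Colorful helmets", 2000),
    ("Helmets", "Lightweight helmets", 3000),
]


def roll_up_to_category(rows):
    """Aggregate subcategory rows to the coarser Category granularity."""
    totals = defaultdict(int)
    for category, _subcategory, amount in rows:
        totals[category] += amount
    return dict(totals)


sales_by_category = roll_up_to_category(sales_by_subcategory)
# sales_by_category == {"Bikes": 10000, "Helmets": 5000}
```

Going from the detailed table to the coarse one is always possible; the reverse is not, which is why the lower-granularity model is smaller but has less analytical power.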

Leveraging the data model

o Using the data model, you can load multiple tables
o Load Customers and Sales as separate tables
o Two tables need to be linked through a relationship
o Sales[CustomerKey] = Customer[CustomerKey]

Customer is a business entity

o Being a business entity, it deserves a table by itself

If YearlyIncome is customer-related information, then you need a separate Customer table to store it.
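The relationship Sales[CustomerKey] = Customer[CustomerKey] can be sketched in Python as a key lookup. This is only an illustration: the sample rows, the `Name` and `Amount` columns, and the helper `related_income` are assumptions, not part of the original slides.

```python
# Hypothetical sample data; only CustomerKey and YearlyIncome come from the slides.
customers = {
    1: {"Name": "Customer A", "YearlyIncome": 50000},
    2: {"Name": "Customer B", "YearlyIncome": 80000},
}
sales = [
    {"CustomerKey": 1, "Amount": 100},
    {"CustomerKey": 1, "Amount": 250},
    {"CustomerKey": 2, "Amount": 75},
]


def related_income(sale):
    """Follow Sales[CustomerKey] = Customer[CustomerKey] to read YearlyIncome."""
    return customers[sale["CustomerKey"]]["YearlyIncome"]


# If YearlyIncome were stored on every sales row, summing it would count
# customer 1 twice. Stored once in Customer, each customer counts once.
income_repeated_per_row = sum(related_income(s) for s in sales)             # 180000
income_per_customer = sum(c["YearlyIncome"] for c in customers.values())    # 130000
```

The mismatch between the two totals is the "scattered information" problem: the attribute lives at customer granularity, so it belongs in the Customer table.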

Business Entities

o Each business has different entities
  • Products, Customers, Resellers
  • Patients, Medications, Doctors
  • Claims, Customers
  • Teams, Workers, Buildings, Projects
  • Software, Features, Bugs, Customers
o Each business entity has unique characteristics

1 Entity = 1 Table

Relationships

o Many tables need relationships
o Links two tables
o Has a direction
  • Customer: one side
  • Sales: many side
  • Many sales, one customer
o Best practice: same column name in both tables
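A minimal Python sketch of the "one side" and "many side" of a relationship: the key must be unique on the one side (Customer), while it may repeat on the many side (Sales). The key values and the helper name `is_valid_one_side` are hypothetical.

```python
from collections import Counter


def is_valid_one_side(keys):
    """Return True when every key appears exactly once (a valid 'one' side)."""
    keys = list(keys)
    return len(keys) == len(set(keys))


# Hypothetical key columns: Customer[CustomerKey] and Sales[CustomerKey].
customer_keys = [1, 2, 3]
sales_customer_keys = [1, 1, 2, 3, 3, 3]

customer_is_one_side = is_valid_one_side(customer_keys)        # True
sales_is_one_side = is_valid_one_side(sales_customer_keys)     # False

# Many sales, one customer: count the sales rows related to each customer.
sales_per_customer = Counter(sales_customer_keys)
```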

Granularity with multiple tables

o With multiple tables, granularity is a different topic
o Each table has its own granularity
  • Customers: at the customer level
  • Date: at the date level
  • Product: at the product level
o Sales has granularity defined by related tables
  • Customer, Date and Product level
  • If you have those three tables
o We will come back to granularity pretty often…

Normalization and denormalization
Adding and removing tables is the key skill of any data modeler

Normalization vs Denormalization

o Normalization is the process of organizing the columns (attributes) and tables of a database to reduce data redundancy and improve data integrity
o Denormalization is the opposite of normalization, that is, increasing data redundancy, with the goal of improving the understanding of the model
o Let us see the concept with some examples

Normalization of a table

Brand Code   Brand Name
1            Adventure Works
2            Contoso
3            Fabrikam
4            Proseware
5            The Phone Company
…            …
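To make the two definitions concrete, here is a small Python sketch that normalizes a product list by moving the repeated brand name into a separate brand table, and then denormalizes it back. The brand names come from the slide; the product rows and the function names are assumptions made for illustration.

```python
# Hypothetical denormalized product rows: the brand name is repeated per product.
products = [
    {"Product": "Product 1", "Brand Name": "Adventure Works"},
    {"Product": "Product 2", "Brand Name": "Contoso"},
    {"Product": "Product 3", "Brand Name": "Contoso"},
    {"Product": "Product 4", "Brand Name": "Fabrikam"},
]


def normalize_brands(rows):
    """Split rows into a product table holding Brand Code and a brand table."""
    brand_codes = {}
    normalized = []
    for row in rows:
        name = row["Brand Name"]
        if name not in brand_codes:
            brand_codes[name] = len(brand_codes) + 1  # assign the next code
        normalized.append({"Product": row["Product"], "Brand Code": brand_codes[name]})
    brands = [{"Brand Code": code, "Brand Name": name} for name, code in brand_codes.items()]
    return normalized, brands


def denormalize(normalized, brands):
    """Bring the brand name back into each product row."""
    names = {b["Brand Code"]: b["Brand Name"] for b in brands}
    return [
        {"Product": r["Product"], "Brand Name": names[r["Brand Code"]]}
        for r in normalized
    ]


normalized_products, brand_table = normalize_brands(products)
round_trip = denormalize(normalized_products, brand_table)
```

After normalization each brand name is stored once; after denormalization the model is redundant again but easier to read, which is the trade-off the slides describe.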

Working with a single table

o All columns are denormalized

Normalized models (OLTP)

o This whole model represents a customer in an OLTP database
o There are a lot of different tables…
o Not a good model for queries

Denormalized model (BI model)

o Denormalization is welcome, to make the model easier

Introducing star schemas
Star schemas are the most popular way of modeling data in Business Intelligence

Separation between facts and dimensions

o Different entities need different ways of handling
o Fact: something that happened
  • The sale of a product to a customer
  • A cash withdrawal at an ATM
  • The signature of an order
  • The prescription of a medical treatment
o Dimension: something that describes a fact
  • Attribute of a fact
  • The name of the customer, or of the patient
  • The date when the fact happened
  • The currency of the cash withdrawal

What makes a dimension?

o One business entity = one table
o Attributes of an entity in the same table
o Customer is a business entity
  • Attributes: city, country, region, education, gender, age
o Usually Country is not an entity
  • It is an attribute of other dimensions
  • Country of customer, country of store
o Exception: demographic data
  • Measure: population (fact table)
  • Dimension: country (which is an entity in this model)

Placing tables in a diagram

o Fact table
  • Stands alone, in the center
o Dimensions
  • All around the fact table
  • Directly linked to it
o The figure that appears looks like a star
o Hence, the name: star schema

Introducing star schemas

[Diagram: one Fact table surrounded by five Dimension tables]

Star schemas

o Very easy to understand at first glance
  • You slice by dimensions and aggregate facts
  • There is no ambiguity
  • One level of indirection makes it easy to see roles of tables
o Very fast
  • Modern engines are optimized for star schemas
o Drive a clean modeling path
  • Numbers go in the fact table
  • Strings go in the dimension
  • Everything else… we need to understand what it is

If you don't have a star schema

o Most of the time, you are in trouble
o Any model change towards a star schema is a good step
o We will see several examples of this
o Your model is not different from all the other ones
  • Like anybody else, you have a "special" model
  • With special requirements and special calculations
  • However, a star schema will fit it well!
o If you are unable to identify facts and dimensions
  • It is likely you do not yet understand the model well
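"You slice by dimensions and aggregate facts" can be sketched in a few lines of Python. This is a toy model under stated assumptions: the tables, keys, and the helper `slice_and_aggregate` are hypothetical, and a real engine does this far more efficiently.

```python
from collections import defaultdict

# Dimensions: strings that describe the facts (hypothetical sample data).
product = {1: {"Category": "Bikes"}, 2: {"Category": "Helmets"}}
customer = {10: {"Country": "Italy"}, 11: {"Country": "USA"}}

# Fact table: keys pointing to the dimensions, plus the numbers to aggregate.
sales = [
    {"ProductKey": 1, "CustomerKey": 10, "Amount": 8000},
    {"ProductKey": 1, "CustomerKey": 11, "Amount": 2000},
    {"ProductKey": 2, "CustomerKey": 10, "Amount": 5000},
]


def slice_and_aggregate(fact_rows, dimension, key_column, attribute):
    """Group fact rows by one dimension attribute and sum Amount."""
    totals = defaultdict(int)
    for row in fact_rows:
        group = dimension[row[key_column]][attribute]  # one hop: fact -> dimension
        totals[group] += row["Amount"]
    return dict(totals)


by_category = slice_and_aggregate(sales, product, "ProductKey", "Category")
by_country = slice_and_aggregate(sales, customer, "CustomerKey", "Country")
# by_category == {"Bikes": 10000, "Helmets": 5000}
# by_country == {"Italy": 13000, "USA": 2000}
```

Every dimension is exactly one hop away from the fact table, so the same function works for any slicing attribute. That single level of indirection is what keeps the model unambiguous.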

Why data modeling is useful

Why choose a different model?

o If the model is not the right one
  • DAX code tends to be very complex
  • Formulas are hard to reason about
  • Complexity turns into performance issues
o With the correct model
  • DAX code is simple, as it should be
  • Performance is great
o Building the right model requires experience

Tasks of a data modeler

o Data modeling means
  • Knowing several patterns
  • Being able to match your model to a pattern
  • Applying the pattern
  • Adapting the small differences appearing in custom models
o You learn patterns with experience
o In this course, we present multiple patterns
o The goal is not learning them, but seeing them in action
  • Learning requires time, you will do it later

Is your model a different one?

o At the beginning, you always feel your model is different from the standard ones
o 99.9% of the time, this is not the case
o Do not deviate from standard modeling, unless you really know what you are doing
o Business Intelligence was born in 1958
o In 60 years, we analyzed nearly any existing model
o And we found star schemas to be the best option

Data modeling scenarios

Common scenarios

o Header / detail tables
o Multiple fact tables
o Handling multiple dates
o Events with different durations

Header / detail tables
Let us see a first deviation from star schemas

Introducing header/detail schemas

o Two fact tables, linked through a relationship
  • Invoices / lines of invoice
  • Orders / lines of the order
  • Teams / Individuals
o The model appears when you link fact tables
o Linking dimensions in hierarchies is possible
  • Even if not a best practice
o Linking fact tables increases the complexity and is usually a very bad idea

Sales headers and sales details

[Diagram: two Fact tables (header and detail) and five Dimension tables]

Header/details issues

o Being two fact tables, both tables store information
o Aggregating the header produces incorrect results, if sliced by any dimension not linked to it
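The second issue can be reproduced with a small Python sketch. It assumes a hypothetical `Freight` value stored on the order header and a product stored only on the detail lines; one way the wrong number appears is when each detail line simply picks up the whole header value. The table contents and function name are illustrative only.

```python
from collections import defaultdict

# Hypothetical header fact: one row per order, with a value stored at order level.
order_header = {
    1: {"Freight": 30},
    2: {"Freight": 10},
}
# Hypothetical detail fact: one row per order line; Product is known only here.
order_detail = [
    {"OrderKey": 1, "Product": "Bike", "Amount": 800},
    {"OrderKey": 1, "Product": "Helmet", "Amount": 200},
    {"OrderKey": 2, "Product": "Helmet", "Amount": 100},
]

true_freight = sum(h["Freight"] for h in order_header.values())  # 40


def naive_freight_by_product(header, detail):
    """Slice the header value by product by repeating it on every line (wrong)."""
    totals = defaultdict(int)
    for line in detail:
        totals[line["Product"]] += header[line["OrderKey"]]["Freight"]
    return dict(totals)


naive = naive_freight_by_product(order_header, order_detail)
# naive == {"Bike": 30, "Helmet": 40}, which adds up to 70 instead of 40.
```

Order 1 contributes its full freight to both Bike and Helmet, so the per-product figures no longer add up to the real total.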

Back to a star schema

Once correctly denormalized, the model becomes a star schema again.

Once you identify the set of facts and dimensions, you no longer need header/detail tables.

Star schemas are always the best choice

Multiple fact tables
Computing over multiple star schemas
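One possible way to denormalize a header/detail pair is to push the header value down to the detail rows, so that a single fact table remains. The Python sketch below allocates a hypothetical header `Freight` in proportion to each line's `Amount`; the allocation rule, the sample data, and the function names are assumptions, not something the slides prescribe.

```python
from collections import defaultdict

# Hypothetical header and detail tables.
order_header = {1: {"Freight": 30}, 2: {"Freight": 10}}
order_detail = [
    {"OrderKey": 1, "Product": "Bike", "Amount": 800},
    {"OrderKey": 1, "Product": "Helmet", "Amount": 200},
    {"OrderKey": 2, "Product": "Helmet", "Amount": 100},
]


def denormalize_header(header, detail):
    """Return detail rows with the header Freight allocated by line Amount."""
    order_totals = defaultdict(int)
    for line in detail:
        order_totals[line["OrderKey"]] += line["Amount"]
    result = []
    for line in detail:
        freight = header[line["OrderKey"]]["Freight"]
        share = freight * line["Amount"] / order_totals[line["OrderKey"]]
        result.append({**line, "Freight": share})
    return result


def freight_by_product(rows):
    """Slice the allocated freight by product on the single fact table."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Product"]] += row["Freight"]
    return dict(totals)


single_fact = denormalize_header(order_header, order_detail)
allocated = freight_by_product(single_fact)
# allocated == {"Bike": 24.0, "Helmet": 16.0}, which adds up to the real total of 40.
```

With the value stored at line granularity, slicing by product (or by any other dimension linked to the detail rows) returns figures that sum to the correct total.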

Using multiple fact tables

o Very common scenario
  • Sales and purchases
  • Orders and shipments
  • Sales and weather information
o What we cover in this section
  • Build the correct set of dimensions
  • Use one fact table to filter the other one(s)

Denormalized fact tables

Building a star schema

o A proper star schema is nearly always the best choice
o But how do we build the Product table?

[Diagram: two Fact tables and one Dimension table]

Options to build the new dimension

o Use an SQL view, if feasible
o Use M code in Power Query
  • Available in Excel and Power BI
o Use DAX code and build a calculated table
  • Available in Power BI and SSAS 2016
o You need a key for the new dimension
  • Easy in SQL
  • Harder in M or DAX, if primary key not already available
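Whichever tool is used (SQL view, M, or a DAX calculated table), the logic is the same: collect the distinct product values from both fact tables and give each one a key. The following Python sketch shows that logic only; the `ProductName` column, the sample rows, and the helper names are hypothetical.

```python
# Two hypothetical denormalized fact tables that both carry the product name.
sales = [
    {"ProductName": "Cross bike", "Amount": 8000},
    {"ProductName": "Helmet", "Amount": 2000},
]
purchases = [
    {"ProductName": "Helmet", "Cost": 1200},
    {"ProductName": "Mountain bike", "Cost": 900},
]


def build_dimension(name_column, *fact_tables):
    """Map every distinct name found in the fact tables to a new integer key."""
    names = sorted({row[name_column] for table in fact_tables for row in table})
    return {name: key for key, name in enumerate(names, start=1)}


def add_key(table, name_column, keys, key_column):
    """Return a copy of the fact rows with the dimension key added."""
    return [{**row, key_column: keys[row[name_column]]} for row in table]


product_keys = build_dimension("ProductName", sales, purchases)
# product_keys == {"Cross bike": 1, "Helmet": 2, "Mountain bike": 3}

sales_keyed = add_key(sales, "ProductName", product_keys, "ProductKey")
purchases_keyed = add_key(purchases, "ProductName", product_keys, "ProductKey")
```

Because both fact tables now point to the same Product dimension, a filter on a product reaches sales and purchases alike.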

Handling multiple dates
In a fact table you might have multiple dates, how should you handle them?

Multiple date tables

o Multiple date tables
o Single fact table
o The model becomes more complicated
o Slicing multiple fact tables becomes troublesome
o Not a best practice

[Diagram: one Fact table and four Dimension tables]

Multiple relationships with date

o One table, multiple relationships
o Only one relationship can be active

[Diagram: one Fact table and two Dimension tables]

Events with different durations
Different events, different durations, different fact tables…
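The idea of one date table with several relationships, only one of them active, can be mimicked in Python by choosing which date column drives the grouping. In DAX an inactive relationship is typically activated per calculation with USERELATIONSHIP; the sketch below is only an analogy, and the `OrderDate` and `DeliveryDate` columns and sample rows are assumptions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows with two date columns.
sales = [
    {"OrderDate": date(2020, 1, 30), "DeliveryDate": date(2020, 2, 2), "Amount": 100},
    {"OrderDate": date(2020, 1, 31), "DeliveryDate": date(2020, 2, 5), "Amount": 50},
    {"OrderDate": date(2020, 2, 1), "DeliveryDate": date(2020, 2, 3), "Amount": 25},
]

# Only one relationship is active: it is used unless a calculation says otherwise.
ACTIVE_RELATIONSHIP = "OrderDate"


def amount_by_month(rows, date_column=ACTIVE_RELATIONSHIP):
    """Sum Amount per (year, month) through the chosen date relationship."""
    totals = defaultdict(int)
    for row in rows:
        day = row[date_column]
        totals[(day.year, day.month)] += row["Amount"]
    return dict(totals)


ordered = amount_by_month(sales)                    # uses the active relationship
delivered = amount_by_month(sales, "DeliveryDate")  # explicitly picks the other one
# ordered == {(2020, 1): 150, (2020, 2): 25}
# delivered == {(2020, 2): 175}
```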

Different durations

o This scenario happens when you have
  • Multiple fact tables
  • Each fact table contains some sort of event
  • The start date and the duration of different events are unrelated
o Example
  • Fact: hours worked by employees
  • Fact: the store where the employee is working
  • Fact: the salary of the employee, changing over time

The scenario

o SalaryEmployee
  • Salary of an employee
  • From date, to date
o StoreEmployee
  • Assignment of an employee to a given store
  • From date, to date
o Schedule
  • Working schedule of an employee
  • Daily granularity

Employees, with salary and stores

Precompute the values

o Using two calculated columns
o Remove the links with the bridge tables

[Diagrams for these two slides: Fact, Dimension, and Helper tables]
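Precomputing the values means looking up, for each daily Schedule row, the salary and the store that were valid on that day, and storing them as two extra columns. The Python sketch below illustrates this under stated assumptions: the table names come from the slides, while the column names, the sample rows, the inclusive `ToDate`, and the helper `value_on` are hypothetical.

```python
from datetime import date

# Events with different durations (hypothetical rows; ToDate assumed inclusive).
salary_employee = [
    {"Employee": "E1", "FromDate": date(2020, 1, 1), "ToDate": date(2020, 6, 30), "Salary": 100},
    {"Employee": "E1", "FromDate": date(2020, 7, 1), "ToDate": date(2020, 12, 31), "Salary": 120},
]
store_employee = [
    {"Employee": "E1", "FromDate": date(2020, 1, 1), "ToDate": date(2020, 3, 31), "Store": "S1"},
    {"Employee": "E1", "FromDate": date(2020, 4, 1), "ToDate": date(2020, 12, 31), "Store": "S2"},
]
# Schedule has daily granularity.
schedule = [
    {"Employee": "E1", "Date": date(2020, 2, 10), "Hours": 8},
    {"Employee": "E1", "Date": date(2020, 5, 4), "Hours": 6},
    {"Employee": "E1", "Date": date(2020, 9, 1), "Hours": 8},
]


def value_on(ranges, employee, day, column):
    """Return the value of `column` valid for the employee on the given day."""
    for r in ranges:
        if r["Employee"] == employee and r["FromDate"] <= day <= r["ToDate"]:
            return r[column]
    return None  # no range covers this day


# The two precomputed columns: Salary and Store, stored on every Schedule row.
enriched_schedule = [
    {
        **row,
        "Salary": value_on(salary_employee, row["Employee"], row["Date"], "Salary"),
        "Store": value_on(store_employee, row["Employee"], row["Date"], "Store"),
    }
    for row in schedule
]
```

Once Schedule carries both values at daily granularity, it no longer needs relationships to the range-based tables and can act as the single fact table of a star schema.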

Thank you!
Check our articles, whitepapers and courses on
www.sqlbi.com
