Entity-Relationship Model: Data Warehouse Data Models

The document discusses different data modeling techniques for data warehouses, including entity-relationship (ER) modeling and dimensional modeling. It provides details on the core components of each model, including entities, attributes, and relationships for ER modeling and facts, measures, dimensions, and dimension hierarchies for dimensional modeling. Dimensional modeling is presented as being more suitable for data warehousing given its simplicity, ability to align with business questions, and efficiency for summarization and analytics.

Uploaded by

Mahima Prasad

Data Warehouse Data Models
You can choose either of two data modeling techniques, entity-relationship (ER) modeling or
dimensional modeling, depending on your application scenario. The entity-relationship model
has been in use much longer than the relatively new dimensional model.
Entity-Relationship Model
ER modeling is more popular in operational applications, or OLTP systems, though it is also
used in data warehousing. It is a requirement in the top-down approach, where a central
data warehouse is built using the ER model. Several graphical tools help you create
an entity-relationship diagram (ERD) to conceptualize data. These tools primarily use three basic
constructs: entity, relationship, and attribute.
• Entity An entity is a concept, real or abstract, about which information is collected. It
represents a class of objects such as products, an object such as a car, or an event such as a
sale that can be classified by its properties and characteristics. It usually has a business
definition and is identified uniquely by a primary key in the data model.
• Relationship A relationship is an association among entities in a model and indicates how
two or more entities are related to one another. For example, "a customer owns a car"
describes how the customer is related to the car. A relationship is represented by a line
drawn between entities. Entities are related to each other with a cardinality, which
defines the maximum number of instances of one entity related to a single instance of another
entity. A relationship can have one-to-one, one-to-many, or many-to-many cardinality.
• Attributes Both entities and relationships have attributes that describe their characteristics
or properties. For example, a car has a VIN attribute.

Relational databases are designed using the normalized ER model to remove redundant
data. Six normal forms have been defined to date, but a database is considered adequately
normalized if it is in third normal form (3NF). Normalization is a systematic process for assigning
attributes to entities so that the database keeps its integrity: information is broken down into its
smallest divisible parts and duplicated data is removed. It is an incremental process, which means
that to be transformed into 3NF, the entities must first qualify for 2NF, and for 1NF before that.
1NF removes repeating groups by creating one-to-many relationships between master and detail
entities. For example, you move a set of repeating, similar columns out of a table into a separate
detail table. 2NF goes a step further by moving attributes that depend on only part of a composite
primary key into a separate table. 3NF removes the columns that do not depend directly on the
primary key, that is, columns that depend on another non-key column.
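
The 3NF step can be illustrated with a minimal Python sketch. The table and column names
(flat_orders, customer_city, and so on) are hypothetical and chosen only for illustration. Here
customer_city depends on customer_id rather than on the key order_id, so it is moved into its own
table.

```python
# Hypothetical denormalized rows: customer_city depends on customer_id, not on
# the key order_id, so the same city is repeated on every order of a customer.
flat_orders = [
    {"order_id": 1, "customer_id": 10, "customer_city": "Leeds", "amount": 25.0},
    {"order_id": 2, "customer_id": 10, "customer_city": "Leeds", "amount": 40.0},
    {"order_id": 3, "customer_id": 11, "customer_city": "York", "amount": 15.0},
]


def split_customer_attributes(rows):
    """Move attributes that depend on customer_id into a customers table."""
    customers = {}
    orders = []
    for row in rows:
        # One entry per customer: the city is now stored exactly once.
        customers[row["customer_id"]] = {"customer_city": row["customer_city"]}
        # The order keeps only its own attributes plus the foreign key.
        orders.append(
            {
                "order_id": row["order_id"],
                "customer_id": row["customer_id"],
                "amount": row["amount"],
            }
        )
    return customers, orders


customers, orders = split_customer_attributes(flat_orders)
```

After the split, changing a customer's city means updating one row instead of every order for
that customer.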

As the normalization level increases, the data is broken down further and its granularity becomes
finer. Such highly granular data models are very efficient at returning or updating small amounts
of data. This is what OLTP systems need, because they have many users working on small pieces of
information, and it is the reason the ER model is highly successful in relational database
applications. However, the requirements of a data warehouse are different. The queries are usually
small in number, but each can generate heavy I/O activity on the server. These requirements are
better met by the dimensional model.
Dimensional Model
Dimensional modeling is a newer modeling technique than ER modeling. Recent
trends in modeling preference favor dimensional data modeling, as it is simple to build, is easy to
understand even for business users, and aligns with the questions usually asked of a data warehouse.
This model is very efficient at summarizing values and presenting data to analytical tools. It
keeps the numerical values, called facts, in one table, while the attributes that describe these
values are grouped together in tables called dimensions. This structure makes it particularly
suitable for answering business questions such as "What were sales this quarter?" or "How do sales
this quarter compare with sales in the previous quarter?"
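
As a rough illustration of the quarter-by-quarter question, the following Python sketch builds a
tiny fact table and date dimension in an in-memory SQLite database and sums sales per quarter. The
table names, column names, and sample values are assumptions made for this example; they do not
come from any particular warehouse.

```python
import sqlite3

# In-memory database with one hypothetical fact table and one date dimension.
star_db = sqlite3.connect(":memory:")
star_db.executescript(
    """
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        year     INTEGER,
        quarter  INTEGER
    );
    CREATE TABLE fact_sales (
        date_key     INTEGER REFERENCES dim_date (date_key),
        sales_amount REAL
    );
    INSERT INTO dim_date VALUES
        (20240215, 2024, 1), (20240310, 2024, 1), (20240505, 2024, 2);
    INSERT INTO fact_sales VALUES
        (20240215, 100.0), (20240310, 50.0), (20240505, 200.0), (20240505, 25.0);
    """
)

# Facts are summed; the dimension supplies the grouping attributes.
sales_by_quarter = star_db.execute(
    """
    SELECT d.year, d.quarter, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY d.year, d.quarter
    ORDER BY d.year, d.quarter
    """
).fetchall()
```

Comparing one quarter with the previous one is then a matter of reading two adjacent rows of
sales_by_quarter.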

Fact
The numeric values, along with some contextual data, are stored in fact tables. A data
warehouse contains one or more subject-oriented fact tables. A fact typically represents an item, an
event, or a business transaction, such as the sale of an item, that can be used to perform business
analysis. A fact row consists of some columns containing values and some foreign keys linking to
dimensions. As transactions are added to the fact table, it can soon become quite large; consider
that each transaction could represent a row in the fact table.

It is the granularity that really determines the size of your fact data. You have to understand the
business requirements completely, both what they call for now and what they will call for in the
future, to decide the level of detail you want to keep in the fact data. You might choose to keep
every transaction, or you might choose to keep aggregated data at the day level if the business is
never going to drill down to transaction-level detail. The finer the grain, the more rows there are
and the more disk space is required. Disk space is rarely the deciding factor, however, because
data warehouses are meant to be large. Always be careful when choosing granularity, as a change in
granularity means the whole data warehouse has to be reloaded.
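
The effect of the grain on row count can be sketched in a few lines of Python. The transaction
tuples and the function name below are hypothetical; the point is only that rolling
transaction-level rows up to a (day, product) grain yields fewer rows and discards the individual
transactions.

```python
from collections import defaultdict

# Hypothetical transaction-grain rows: (sale_date, product_id, amount).
transactions = [
    ("2024-03-01", "P1", 10.0),
    ("2024-03-01", "P1", 5.0),
    ("2024-03-01", "P2", 7.5),
    ("2024-03-02", "P1", 12.0),
]


def aggregate_to_day_grain(rows):
    """Sum amounts per (sale_date, product_id), dropping transaction detail."""
    totals = defaultdict(float)
    for sale_date, product_id, amount in rows:
        totals[(sale_date, product_id)] += amount
    return dict(totals)


daily_facts = aggregate_to_day_grain(transactions)
```

Four transaction rows become three daily rows here. The two separate sales of P1 on 2024-03-01
cannot be recovered from daily_facts, which is why a later move to a finer grain requires going
back to the source data.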

Measure
A measure is closely related to a fact, and sometimes the terms are used interchangeably. A
measure is what you want to measure, and the fact is a measure with context. So, a measure is a
numeric value used to indicate the performance of the business, and the fact equates to a row in the
Fact table. A measure is used in combination with dimension members, while the value is taken from
facts. For example, TotalSales-by-year and SupplyCost-by-month are measures.

Dimension
A dimension contains the same type of information broken down into levels of interest. For
example, a time dimension can contain year, quarter, month, and day levels. A dimension holds
the information by which a business wants to analyze the facts. This information does not change
very often. Typically, a dimension table is small compared to a fact table. A data warehouse
can have many dimensions attached to each fact table. Common dimensions include
Time, Product, Employee, Location, and Customer. A dimension is made up of members and
hierarchies.

Dimension Members
Members of a dimension are arranged in levels. For instance, members in a time dimension have
different levels: day, week, month, quarter, and year. Similarly, all cities, states, and countries
are members of a geography dimension. While analyzing the data, you can choose any dimension
member level, and the associated facts are used in the analysis automatically.

Dimension Hierarchies
The members of a dimension can be arranged in a hierarchical order with multiple levels to
create dimension hierarchies. You can create more than one hierarchy, and a dimension member can
participate in more than one hierarchy. For example, you can create two hierarchies for a time
dimension: one with Day, Month, Quarter, and Year as levels and the other with Day and Week
as levels.
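
The two time hierarchies can be sketched in Python by deriving, for a single day, its path through
each hierarchy. The function names are hypothetical, and the week hierarchy assumes ISO 8601 week
numbering, which is only one of several possible week definitions.

```python
from datetime import date


def calendar_path(d):
    """Path of a day through the Year > Quarter > Month > Day hierarchy."""
    quarter = (d.month - 1) // 3 + 1
    return (d.year, f"Q{quarter}", d.month, d.day)


def week_path(d):
    """Path of a day through a Week > Day hierarchy (ISO weeks assumed)."""
    iso_year, iso_week, _weekday = d.isocalendar()
    return (iso_year, iso_week, d.isoformat())


sample_day = date(2024, 5, 17)
sample_calendar_path = calendar_path(sample_day)
sample_week_path = week_path(sample_day)
```

The same member, the day 2024-05-17, participates in both hierarchies, so facts recorded against it
can be rolled up either by month and quarter or by week.
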
Dimension Types
Dimensions can be designed in different ways to meet specific business functions and to
enhance performance. Four types of configurations are covered here.

Conformed Dimension
At times you will use a dimension in more than one subject-oriented data mart. If you keep
the dimensions exactly the same, or source them from the central data warehouse, which
makes them the same, these dimensions are called conformed dimensions. A
conformed dimension does not need to be exactly the same as the main dimension; it is still
conformed to the main dimension if it is a subset of the detailed dimension. In this case, the
attributes in a conformed dimension need to be labeled exactly as they are in the detailed
dimension. The most common example is a date dimension used across many data marts with the same
attributes, such as date, month, quarter, and year.

Junk Dimension
Business data generally contains some attributes that are not related to any dimension but
are associated more with the fact. These are easy to identify, as they generally
take the form of indicators or flags. For example, in a car rental business, flags
such as IsDamaged, IsStolen, and IsExchanged are common in the databases. These flags are not
exactly part of any dimension, but businesses do want to analyze data using them. If you leave
them with the fact data, queries that filter on them will perform poorly because of the large size
of the fact table, and indexes won't help much because of the yes/no nature of such flags. You
could put each flag in its own dimension, in which case you would have as many dimensions as there
are such attributes. Very frequently the number of indicators or flags in the data exceeds 20. In
that case, your data mart design will be cluttered with many dimensions that each hold only one
attribute, enough to confuse users. A recommended approach is to combine all these unrelated
attributes in one table, thus creating a junk dimension. For instance, you can select the distinct
combinations of the attributes and add them to the junk dimension, where each distinct row is
assigned a surrogate key that is referenced in the fact table. Keeping the flags and indicators in
one dimension makes it easier for users to find these attributes, and queries perform much better.
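
A minimal Python sketch of this approach follows, using the three rental flags named above. The
sample rows, the rental_id column, and the function name are assumptions made for illustration.
Each distinct flag combination gets one surrogate key, and each fact row stores only that key.

```python
FLAG_NAMES = ("IsDamaged", "IsStolen", "IsExchanged")

# Hypothetical source rows from a car rental system.
rentals = [
    {"rental_id": 1, "IsDamaged": False, "IsStolen": False, "IsExchanged": False},
    {"rental_id": 2, "IsDamaged": True, "IsStolen": False, "IsExchanged": False},
    {"rental_id": 3, "IsDamaged": False, "IsStolen": False, "IsExchanged": False},
    {"rental_id": 4, "IsDamaged": True, "IsStolen": False, "IsExchanged": True},
]


def build_junk_dimension(rows, flag_names):
    """Assign a surrogate key (1, 2, 3, ...) to each distinct flag combination."""
    junk_dim = {}
    for row in rows:
        combo = tuple(row[name] for name in flag_names)
        if combo not in junk_dim:
            junk_dim[combo] = len(junk_dim) + 1
    return junk_dim


junk_dim = build_junk_dimension(rentals, FLAG_NAMES)

# Each fact row carries one junk-dimension key instead of three flag columns.
fact_rows = [
    (row["rental_id"], junk_dim[tuple(row[name] for name in FLAG_NAMES)])
    for row in rentals
]
```

With three yes/no flags the junk dimension can never exceed eight rows, however large the fact
table grows.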

Degenerate Dimension
Another type of attribute in the data is the number associated with each business transaction,
such as an invoice number or a ticket number. Though these attributes are not actively used in
analysis, every now and then businesses need to look them up, as they
link back to operational systems. These attributes are also very useful in reconciling the data
warehouse with ODS systems. When the grain of the fact table is the same as that of a transaction,
the likelihood of a degenerate dimension increases. Because the fact grain is the same as that of
the degenerate dimension, it sometimes forms an integral part of the primary key of the fact table.
A degenerate dimension should be stored with the facts in the fact table. There is no point in
creating a separate dimension table for it, as that table would grow with the fact table and become
quite large.

Role-Playing Dimensions
These dimensions exist when the same dimension is referenced multiple times by a fact. For
instance, a fact row could contain SaleDate and DeliveryDate columns, both pointing to the date
attribute of the date dimension. In this case, the key of the date dimension is used multiple times
in the fact row, and the date dimension is often referred to as a role-playing dimension. You
create a table alias or a view of the date dimension for each role and join it to the corresponding
foreign key in the fact table.
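
The alias technique can be sketched with an in-memory SQLite database in Python. The table and
column names are hypothetical. The single dim_date table is joined twice, once under the alias
sale_date and once under the alias delivery_date.

```python
import sqlite3

roles_db = sqlite3.connect(":memory:")
roles_db.executescript(
    """
    CREATE TABLE dim_date (
        date_key   INTEGER PRIMARY KEY,
        full_date  TEXT,
        month_name TEXT
    );
    CREATE TABLE fact_sales (
        sale_date_key     INTEGER REFERENCES dim_date (date_key),
        delivery_date_key INTEGER REFERENCES dim_date (date_key),
        sales_amount      REAL
    );
    INSERT INTO dim_date VALUES
        (1, '2024-01-30', 'January'), (2, '2024-02-02', 'February');
    INSERT INTO fact_sales VALUES (1, 2, 99.0);
    """
)

# One physical dimension table plays two roles through two aliases.
role_row = roles_db.execute(
    """
    SELECT sale_date.month_name, delivery_date.month_name, f.sales_amount
    FROM fact_sales AS f
    JOIN dim_date AS sale_date ON sale_date.date_key = f.sale_date_key
    JOIN dim_date AS delivery_date ON delivery_date.date_key = f.delivery_date_key
    """
).fetchone()
```

A view per role would work the same way; the alias form simply avoids creating extra database
objects.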

Loading a Dimension Using a Slowly Changing Dimension
Loading a static dimension is a simple one-off task, but you will come across dimensions that
change with time, and loading such a dimension is more challenging. Some data
warehouse dimensions do not change that often. For instance, a date dimension's members stay as
they are and never change. By contrast, other business dimensions do change with time as a business
evolves; for example, a product could change in size and volume. Typically, the attributes of a row
in the dimension table have different change requirements:
• Some attributes never change, such as IDs or, in the case of a car, the VIN.
• Some attributes will change, but the business always wants to see the current value; for
instance, a business may not care about the old description of a product once the description
has changed. This type of change is called a Type 1 change: the attribute value
is overwritten, and the business cannot go back and see the old value or learn when the change
was applied.
• Some attributes will change, and the business will want to see the current value as well as the
historical value. This happens when a business is interested in analyzing the facts before and
after a particular change has been applied. For instance, a business might want to see the effect
on sales when the size of a can of beans is changed from 400 g to 350 g. This type of change is
called a Type 2 change.
• Most business requirements can be covered with the preceding types. Other
types of changes have been defined, such as Type 3 and so on; however, their primary function is
to improve storage efficiency and query performance. These change types are not covered here.
Refer to data warehousing books if you want to know more about them. Also, the SCD
transformation in SSIS supports only up to Type 2 changes.

You have studied the Slowly Changing Dimension (SCD) transformation in Chapter 10 and
have done a Hands-On exercise as well. To recap: the SCD transformation is designed to
help you load a dimension that changes over time, which is generally a challenge with an ETL tool.
The SCD transformation supports the following attribute change types to meet the previously
mentioned requirements.

• Fixed Attribute Change Type This change type supports the attributes that do not change
(Type 0) and aligns with the first scenario mentioned previously.

• Changing Attribute Change Type Using this change type, you can load the attribute values
that are Type 1 changes; it overwrites the existing values with the new values. This is an
in-place modification.

• Historic Attribute Change Type In this change type, a new row is added that will be valid for
current and future transactions. Typically, three columns are used to handle this type of change:
StartDate, EndDate, and IsActive. When a row gets a Type 2 update, the transformation sets its
IsActive flag to 'N' and stamps EndDate with the current date and time to indicate that
the row is no longer active; the period during which the row was active can be found from
StartDate and EndDate. At the same time, a new row is added with the same values in all the columns
except the Type 2 column that is being updated. In this Type 2 column, the new value is
inserted, StartDate gets the current date and time, EndDate is kept as null, and the
IsActive column gets a 'Y' value to indicate that this row is active for the particular member.
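
The Changing and Historic attribute change types can be sketched in plain Python as follows. This
is only an illustration of the row handling described above, not the SSIS SCD transformation
itself. The ProductKey and ProductCode columns, the sample row, and the function names are
assumptions; StartDate, EndDate, and IsActive follow the text. Applying the Type 1 overwrite to
every row version of the member is also a choice made for this sketch.

```python
from datetime import datetime


def apply_type1_change(rows, business_key, column, new_value):
    """Changing attribute (Type 1): overwrite in place, keeping no history."""
    for row in rows:
        if row["ProductCode"] == business_key:
            row[column] = new_value


def apply_type2_change(rows, business_key, column, new_value, now):
    """Historic attribute (Type 2): expire the active row and add a new one."""
    for row in rows:
        if row["ProductCode"] == business_key and row["IsActive"] == "Y":
            if row[column] == new_value:
                return  # value unchanged, nothing to do
            new_row = dict(row)  # copy every column value from the active row
            row["IsActive"] = "N"  # mark the old row as no longer active
            row["EndDate"] = now
            new_row.update(
                {
                    "ProductKey": max(r["ProductKey"] for r in rows) + 1,
                    column: new_value,
                    "StartDate": now,
                    "EndDate": None,
                    "IsActive": "Y",
                }
            )
            rows.append(new_row)
            return
    raise KeyError(business_key)


dim_product = [
    {
        "ProductKey": 1,  # surrogate key (hypothetical)
        "ProductCode": "BEANS",  # business key (hypothetical)
        "Description": "Baked beans",
        "Size": "400 g",
        "StartDate": datetime(2023, 1, 1),
        "EndDate": None,
        "IsActive": "Y",
    }
]

change_time = datetime(2024, 6, 1, 9, 0)
apply_type2_change(dim_product, "BEANS", "Size", "350 g", change_time)
apply_type1_change(dim_product, "BEANS", "Description", "Baked beans in tomato sauce")
```

After these two calls, dim_product holds two rows for the beans: an expired 400 g row and an active
350 g row, both carrying the overwritten description. Facts loaded before the change still point
to the old surrogate key, which is what allows sales before and after the size change to be
compared.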

The SCD transformation is one of several methods you can try for loading a dimension. If
your dimension is too large for the SCD transformation to be a good fit, you can always
create a script in your package to load a slowly changing dimension.
