Data Modelling 242
Characteristics of a DW
Subject-oriented Data
collects all data for a subject, from different sources
Read-only Requests
loaded during off-hours, read-only during day hours
Pre-aggregated data
to improve runtime performance
[Figure: data warehouse architecture. Data Staging Area (storage: flat files, RDBMS; processing; no user query services) feeds the DWH Servers (Data Mart 1, dimensional, conforms to the DW bus; Data Mart 2), which serve End User Data Access (query tools, report writers, mining tools).]
optional
to cleanse the source data
Accepts data from different sources
Data model is required at staging area
Multiple data models may be required for
parking different sources and for transformed
data to be pushed out to warehouse
Data Modeling
WHAT IS A DATA MODEL???
A data model is an abstraction of some aspect of the real
world (system).
WHY A DATA MODEL???
Multidimensional analysis
Analyse data content by looking at it in different
perspectives
Data mining
discover patterns and clustering attributes in data
Requirements of a Decision
Support Query Environment
To provide a method for testing hypotheses (e.g.
what if ...)
To allow ad-hoc queries
To allow human input (DSS makes decisions
with users )
Expects user knowledge of problem
To simulate the behaviour of a real-world
problem
Multidimensional Analysis
Data Mining
discovers unusual patterns
requires low level of detail data
[Figure: data warehouse components. Labels: Operational Data, External Data, Detailed Information, Summary Information, Meta Data, Warehouse Manager, two further manager components, OLAP.]
DW Architectures
Architecture Choices depend on
Current infrastructure
Business environment
Desired management and control structure
resources
commitment ..
DW Architectures
Architecture Choices determine
Where will DW reside?
Centrally / locally / distributed
3 choices
Global
Independent
Interconnected
DW Architectures
Global Architecture
DW Architectures
Independent Architecture
stand-alone
controlled by a department
minimal integration
no global view
very fast to implement
DW Architectures
Interconnected Architecture
distributed
integrated and interconnected
gives a global view of enterprise
more complexity
who manages / controls data
another tier in architecture to share common data
between multiple data marts
have a data sharing schema across data marts
[Figure: three separate data marts]
Data Mart Solution
Share a uniform architecture to allow them to
be fused coherently
Classical Architectures
Physical data warehouse (physical)
Data warehouse --> data marts
Data marts --> data warehouse
Parallel data warehouse and data marts
[Figures: three architecture diagrams. The first two show Source Data, Operational Data, Staging Area, Data Warehouse and Data Marts; the third shows Source Data, Operational Data, Staging Area and Data Marts only.]
DW Implementation Approaches
Top Down
Bottom-up
Combination of both
Choices depend on:
current infrastructure
resources
architecture
ROI
Implementation speed
Bottom Up Implementation
DW Implementation Approaches
Top Down
More planning and design
initially
Involve people from
different work-groups,
departments
Data marts may be built
later from Global DW
Overall data model to be
decided up-front
Bottom Up
Can plan initially without
waiting for global
infrastructure
built incrementally
can be built before or in
parallel with Global DW
Less complexity in design
DW Implementation Approaches
Top Down
Consistent data definition
and enforcement of
business rules across
enterprise
High cost, lengthy
process, time consuming
Works well when there is
centralized IS department
responsible for all H/W
and resources
Bottom Up
Data redundancy and
inconsistency between
data marts may occur
Integration requires great
planning
Less cost of H/W and
other resources
Faster pay-back
DW Implementation Approaches
Combined Approach
Determine degree of planning and design for a global
approach to integrate data marts being built by bottom-up
approach
Develop base level infrastructure definition for global DW
at business level
Develop plan to handle data elements needed by multiple
data marts
Build a common data store to be used by data marts and
global DW
Levels of modeling
Business Process -> Conceptual Model -> Logical Model -> Physical Model
Levels of modeling
Conceptual modeling
Describe data requirements from a business
point of view without technical details
Logical modeling
Refine conceptual models
Data structure oriented, platform independent
Physical modeling
Detailed specification of what is physically
implemented using specific technology
Conceptual Model
A conceptual model shows data through
business eyes.
All entities which have business meaning.
Important relationships
Few significant attributes in the entities.
Few identifiers or candidate keys.
Sample Conceptual Model
[Figure: entities Customer Invoices, Customers, Sales Reps, Customer Addresses, Geographic Boundaries]
Logical Model
Replaces many-to-many relationships with
associative entities.
Defines a full population of entity attributes.
May use non-physical entities for domains
and sub-types.
Establishes entity identifiers.
Has no specifics for any RDBMS or
configuration.
[Figure: sample logical model (# = identifier attribute)]
PRODUCT: #PRODUCT CODE, PRODUCT DESCRIPTION
CUSTOMER: #CUSTOMER ID, #SNAPSHOT DATE, CUSTOMER NAME
CUSTOMER ADDRESS: #CUSTOMER ID, #ADDRESS ID
SALES REP: #SALES REP ID
GEOGRAPHIC BOUNDARY: #GEO CODE
Relationship labels: sold by; for the customer; located within; managed by; sold to by; the salesman for
Physical Model
A Physical data model may include
Referential Integrity
Indexes
Views
Alternate keys and other constraints
Tablespaces and physical storage objects.
Sample Physical Model (# = key column, o = optional column)
PRODUCTS: #PRODUCT_CODE, PRODUCT_DESCRIPTION, CATEGORY_CODE, CATEGORY_DESCRIPTION
SALES_REPS: #SALES_REP_ID, LAST_NAME, FIRST_NAME, oMANAGER_FIRST_NAME, oMANAGER_LAST_NAME
CUSTOMER_INVOICES: #INVOICE_ID, #LINE_ITEM_SEQ, INVOICE_DATE, CUSTOMER_ID, BILL_TO_ADDRESS_ID, SALES_REP_ID, MANAGER_REP_ID, ORGANIZATION_ID, ORG_ADDRESS_ID, PRODUCT_CODE, QUANTITY, UNIT_PRICE, AMOUNT, oPRODUCT_COST, LOAD_DATE
CUSTOMERS: #CUSTOMER_ID, #SNAPSHOT_DATE, CUSTOMER_NAME, oAGE, oMARITAL_STATUS, CREDIT_RATING
CUSTOMER_ADDRESSES: #CUSTOMER_ID, #ADDRESS_ID, ADDRESS_LINE1, oADDRESS_LINE2, oPOSTAL_CODE, SALES_REP_ID, GEO_CODE, LOAD_DATE
GEOGRAPHIC_BOUNDARIES: #GEO_CODE, CITY_NAME, STATE_NAME, COUNTRY_NAME, oCITY_ABBRV, oSTATE_ABBRV, oCOUNTRY_ABBRV
Data Architecting
What is data architecting???
Structure and locate data according to its
characteristics
3 Basic types of data
Real time data
Derived data
Reconciled data
Granularity
Level of summarization of data elements
Level of detail available in the data
The more detail, the lower the level of granularity
Why is it important in DW???
Opportunity for TRADE-OFF
performance
vs. volume of data stored
ability to access detailed data vs. cost of storage
Granularity
To overcome trade-offs between data volume and
query capability :
Divide the data in the DW
Create 2 levels of granularity of data
Detailed Raw data
keep it on separate storage medium
load when required
Summarized data
Geography (location)
Product (more generically, by line of business)
Organizational unit
A combination of the above
ODS
YES !
DATAWAREHOUSE/DATAMART
YES!
Dimensional modeling
Use for modeling during analysis and design
phases
Can be implemented using other modeling
styles e.g. object-oriented, relational
E-R Modeling
Produces a data model, using two basic
concepts entities and the relationships
between those entities.
Detailed ER models also contain attributes,
which can be properties of either the entities
or the relationships.
EMPLOYEE
EmpName
Address
Attributes
Relationships or Associations
Belongs
To
Entities
Principal data objects about which information
is to be collected.
Usually recognizable concepts such as person,
things, or events.
Examples : EMPLOYEES, PROJECTS,
INVOICES.
One - One
1:1
One - Many
1:m
Many - Many
m:n
Normalization
Normalization - 1NF
Eliminate Repeating groups
Person   Skills
A        Oracle, DB2
B        MS Access, Oracle
C        Oracle, CICS, SQL
D        DB2, CICS
Who are the ones who have DB2 skills???
Normalization - 2NF
Eliminate Redundant data
Skill ID Skill Description
S1 DB2
S2 Oracle
S3 MS Access
S4 CICS
S5 SQL
Normalization - 3NF
Eliminate Columns Not Dependent On Key
Memb ID  Skill ID
A        S1
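The normalization steps above can be sketched in Python using the Person/Skills data from the slides. This is only an illustration: the variable and function names are assumptions, not part of the original material.

```python
# Unnormalized: the Skills column holds a repeating group of values.
unnormalized = {
    "A": "Oracle, DB2",
    "B": "MS Access, Oracle",
    "C": "Oracle, CICS, SQL",
    "D": "DB2, CICS",
}

# 1NF: eliminate the repeating group, one (person, skill) pair per row.
person_skill = [
    (person, skill.strip())
    for person, skills in unnormalized.items()
    for skill in skills.split(",")
]

# Skill lookup table, as listed on the 2NF slide.
skill_ids = {"DB2": "S1", "Oracle": "S2", "MS Access": "S3", "CICS": "S4", "SQL": "S5"}

# Associative table (Memb ID, Skill ID) that references the lookup table.
member_skill = [(person, skill_ids[skill]) for person, skill in person_skill]


def members_with(skill_name):
    """Answer 'who has this skill?' with a simple equality test."""
    skill_id = skill_ids[skill_name]
    return sorted(person for person, sid in member_skill if sid == skill_id)


print(members_with("DB2"))  # ['A', 'D']
```

Once the repeating group is gone, the question "who has DB2 skills?" no longer needs string matching inside a multi-valued column.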
Relational modeling
Represents business entities, data items
associated with each entity, and the
relationships of business interest among the
entities
Entities are usually broken down into
smallest possible units and combined using
relationships
Diagram looks like a spiderweb
Description
to describe precisely what the entity represents
required for sharing and reuse of data model
components
Category
classifies entities sharing common characteristics
Acronyms
avoid (not understood by all, not unique)
if used, document them
Homonyms
Same or similar in sound or spelling as another
BUT DIFFERENT IN MEANING!!
Create CONFUSION!
Synonyms
Same meaning ...
Same logical concept ...
Assigned different names!!
Introduce redundancy in model!
IDENTIFY AND RESOLVE them - for entities
and attributes!!
Synonyms (contd.)
Compare Definition, Relationships to other entities, Key
structure, attributes, domain values
Attribute Completeness
Checklist
Name
to uniquely identify the attribute
to meet naming conventions/standards
Description
to describe precisely what the attribute represents
Type
refers to how the attribute is used in the datamodel
Non-Key attributes
Specific Domain
Enumerated domain
specific set of values that are valid and allowed
static values (eg. Flat type : 2 bed, 3 bed, duplex etc)
Acronyms
avoid (not understood by all, not unique)
if used, document them
Key use
applies only to primary keys
will serve as primary or foreign key in child entity
Source
whether attribute is primitive or derived
Traceability
why is the attribute there
refer to source (paragraph, citation of statement, physical
data structure element ...)
mapped to metadata object that is maintained as part of
system lifecycle (eg. Critical success factor, objective,
physical system element like file, table)
Derived Attributes
Created by accumulating values of multiple
instances of attributes. Eg.
Aggregation/summarization
Library Branch: Branch id, Total Titles
Branch Holding: Branch id, book id, number of copies
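A minimal sketch of deriving Total Titles for a Library Branch from its Branch Holding rows. It assumes that Total Titles means the number of distinct books a branch holds; the sample rows and the function name are hypothetical.

```python
from collections import defaultdict

# Branch Holding rows: (branch id, book id, number of copies). Sample data is made up.
branch_holdings = [
    ("B1", "book-1", 3),
    ("B1", "book-2", 1),
    ("B2", "book-1", 2),
]


def total_titles(holdings):
    """Derive Total Titles per branch by accumulating over many holding rows."""
    titles = defaultdict(set)
    for branch_id, book_id, _copies in holdings:
        titles[branch_id].add(book_id)
    return {branch_id: len(books) for branch_id, books in titles.items()}


print(total_titles(branch_holdings))  # {'B1': 2, 'B2': 1}
```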
Calculated Attributes
Describes a feature of a single instance of entity
Calculated from another single instance of related attribute
Attribute Metadata
TASK
Task id
Task Start date
Task End Date
Task Duration
Calculation formula for task duration:
Task duration = task end date - task start date
Derivation Dependencies :
1. Task start date and Task duration
2. Task end date and Task duration
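The formula and its derivation dependencies can be sketched as follows. The function names are illustrative assumptions, and duration is assumed to be measured in whole days.

```python
from datetime import date, timedelta


def task_duration(start_date, end_date):
    """Task duration = task end date - task start date (in days)."""
    return (end_date - start_date).days


def task_end_date(start_date, duration_days):
    """Dependency 1: start date and duration determine the end date."""
    return start_date + timedelta(days=duration_days)


def task_start_date(end_date, duration_days):
    """Dependency 2: end date and duration determine the start date."""
    return end_date - timedelta(days=duration_days)


start, end = date(2024, 1, 1), date(2024, 1, 11)
print(task_duration(start, end))  # 10
```

Because any two of the three values determine the third, storing all three creates a consistency obligation that the attribute metadata should record.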
[Figure: derived attributes example]
TIME PERIOD
ORDER: Order # (PK), order date
PRODUCT: Product # (PK), Product Name, Product Price
PRODUCT ORDER: Product # (PK), Order # (PK), Total units sold, Total sales price
PRODUCT PERIOD: Product # (PK), Period Start Date (PK), Period End Date (PK), Total product period sales
Attribute Names
SHOULD NOT
replace or contradict definition of attribute
contain abbreviations not approved by authority
Attribute Names
SHOULD NOT CONTAIN
Attribute Description
Builds on and is consistent with attribute name
unambiguous, clear, economically worded
stand alone (not dependent on another attribute
definition to convey meaning. BEWARE of circular
attribute definitions)
Never MISS giving a description
AVOID:
restating the name of attribute and/or characteristics (eg.
Length, data type, domain values)
using technical jargon
limiting description to direct extract from dictionary
Pretty Good
driving license #
passport #
SS #
None of them are definitive
Fingerprint ID Is DEFINITIVE
Relationships- Checklist
Name & Description - Optional
Type (identifying/non-identifying)
Cardinality (Degree/Nature)
one-to-one 1:1
many-to-one m:1
one-to-many 1:m
many-to-many m:m (resolved using associative entities)
Dimensional Modeling
Dimensional modeling uses three basic
concepts : measures, facts, dimensions.
Is powerful in representing the requirements
of the business user in the context of
database tables.
Focuses on numeric data, such as values,
counts, weights, balances and occurrences.
Dimensional modeling
Must identify
Facts
A fact is a collection of related data items,
consisting of measures and context data.
Each fact typically represents a business
item, a business transaction, or an event that
can be used in analyzing the business or
business process.
Facts are measured, continuously valued,
rapidly changing information. Can be
calculated and/or derived.
Fact Table
A table that is used to store business
information (measures) that can be used in
mathematical equations.
Quantities
Percentages
Prices
Dimensions
A dimension is a collection of members or
units of the same type of views.
Dimensions determine the contextual
background for the facts.
Dimensions represent the way business
people talk about the data resulting from a
business process, e.g., who, what, when,
where, why, how
Dimension Table
Table used to store qualitative data about
fact records
Who
What
When
Where
Why
verbose, descriptive
complete
no misspellings, impossible values
indexed
equally available
documented ( metadata to explain origin,
interpretation of each attribute)
Dimensional model
visualise a dimensional model as a CUBE
(hypercube because dimensions can be more than
3 in number)
Operations for OLAP
Drill Down: move to a more detailed level of data
Roll Up: move to a more summarized level of data
(The navigation path is determined by hierarchies within dimensions.)
Dimensions
Collection of members or units of the same type of
views.
determine the contextual background for the facts.
the parameters over which we want to perform
OLAP (Eg. Time, Location/region, Customers)
Member is a distinct name that determines a data item's
position (eg. Time - Month, quarter)
Hierarchy arranges members into hierarchies or levels
Hierarchies
Allow for the rollup of data to more
summarized levels.
Time
day
month
quarter
year
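A minimal sketch of rolling daily data up the Time hierarchy (day, month, quarter, year). The sample figures and the `roll_up` helper are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from datetime import date

# Each level of the Time hierarchy maps a date to the key of its member.
LEVEL_KEYS = {
    "day": lambda d: (d.year, d.month, d.day),
    "month": lambda d: (d.year, d.month),
    "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
    "year": lambda d: (d.year,),
}


def roll_up(daily_sales, level):
    """Summarize day-level sales to the requested hierarchy level."""
    key_of = LEVEL_KEYS[level]
    totals = defaultdict(float)
    for day, amount in daily_sales.items():
        totals[key_of(day)] += amount
    return dict(totals)


daily_sales = {
    date(2024, 1, 15): 100.0,
    date(2024, 2, 1): 50.0,
    date(2024, 4, 10): 25.0,
    date(2025, 1, 1): 10.0,
}
print(roll_up(daily_sales, "quarter"))  # {(2024, 1): 150.0, (2024, 2): 25.0, (2025, 1): 10.0}
print(roll_up(daily_sales, "year"))     # {(2024,): 175.0, (2025,): 10.0}
```

Drilling down is the reverse navigation: asking for a lower level such as "month" on the same data.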
Hierarchies
Aggregates
Aggregate tables are pre-stored
summarized tables created at a higher
level of granularity across any or all of the
dimensions.
If the existing granularity is Day wise sales,
then creating a separate month wise sales
table is an example of Aggregate Table.
Aggregates
The use of such aggregates is the single
most effective tool the data warehouse
designer has to improve query performance.
Usage of Aggregates can increase the
performance of Queries by several times.
Measures
A measure is a numeric attribute of a fact,
representing the performance or behaviour of the
business relative to dimensions.
The actual numbers are called variables.
Eg. sales in money, sales volume, quantity supplied, supply cost,
transaction amount
THE CUBE
Types of Facts
Additive
Able to add the facts along all the dimensions
Discrete numerical measures eg. Retail sales in $
Semi Additive
Snapshot, taken at a point in time
Measures of Intensity
Not additive along time dimension eg. Account
balance, Inventory balance
Added and divided by the number of time periods to get
a time-average
Types of Facts
Non Additive
Numeric measures that cannot be added across any
dimensions
Intensity measure averaged across all dimensions eg.
Room temperature
Textual facts - AVOID THEM
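The difference between additive and semi-additive facts can be illustrated with account balances. The data and function names below are hypothetical.

```python
# Balance snapshots keyed by (account, day). Sample values are made up.
balances = {
    ("acct-1", "Mon"): 100.0,
    ("acct-1", "Tue"): 120.0,
    ("acct-2", "Mon"): 50.0,
    ("acct-2", "Tue"): 30.0,
}


def total_on_day(snapshots, day):
    """Adding across accounts on one day is meaningful."""
    return sum(v for (_acct, d), v in snapshots.items() if d == day)


def time_average(snapshots, account):
    """Across time, add the snapshots and divide by the number of periods."""
    values = [v for (acct, _d), v in snapshots.items() if acct == account]
    return sum(values) / len(values)


print(total_on_day(balances, "Mon"))     # 150.0
print(time_average(balances, "acct-1"))  # 110.0
```

Summing acct-1 over Monday and Tuesday would give 220.0, a figure that corresponds to no real balance, which is why the time dimension needs an average instead.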
Advantages of Dimensional
Modeling
Allows complex multi-dimensional data
structure to be defined with a very simple data
model.
Reduces number of physical joins the query
has to process
Simplifies the view of data model.
Allows DWH to expand and evolve with
relatively low maintenance.
[Figure labels: Customers, Location, Sales Rep, Date, Product, Products; Manufacturing (units), Employee Compensation, Sales Commission ($), Payroll (gross) ($)]
TIME PERIOD: Invoice date, Fiscal year, Quarter, Month, Week
PRODUCT: Product description, Category code, Category description
SALES REP: Last name, First name
CUSTOMERS: Customer name
ADDRESS: Address line 1, Address line 2, City name, State abbreviation, Postal code, Country name
CUSTOMER DEMOGRAPHICS: Snapshot date, Credit rating, Marital status, Age
Sample Physical Model for Data Warehouse (# = key column, o = optional column)
PRODUCT_SNAPSHOTS: #PRODUCT_CODE, #SNAPSHOT_DATE, MSRP, UOM, PRIMARY_SUPPLIER_NAME, SUPPLIER_CITY_NAME, SUPPLIER_STATE_ABBRV, SUPPLIER_COUNTRY_NAME
PRODUCTS: #PRODUCT_CODE, PRODUCT_DESCRIPTION, CATEGORY_CODE, CATEGORY_DESCRIPTION
SALES_REPS: #SALES_REP_ID, LAST_NAME, FIRST_NAME, oMANAGER_FIRST_NAME, oMANAGER_LAST_NAME
CUSTOMER_INVOICES: #INVOICE_ID, #LINE_ITEM_SEQ, INVOICE_DATE, CUSTOMER_ID, BILL_TO_ADDRESS_ID, SALES_REP_ID, MANAGER_REP_ID, ORGANIZATION_ID, ORG_ADDRESS_ID, PRODUCT_CODE, QUANTITY, UNIT_PRICE, AMOUNT, oPRODUCT_COST, LOAD_DATE
CUSTOMER_ADDRESSES: #CUSTOMER_ID, #ADDRESS_ID, ADDRESS_LINE1, oADDRESS_LINE2, oPOSTAL_CODE, SALES_REP_ID, GEO_CODE, LOAD_DATE
PURCHASE_INVOICES: #INVOICE_ID, #LINE_ITEM_SEQ, INVOICE_DATE, SUPPLIER_ID, ADDRESS_ID, BUDGET_ID, REVISION_SEQ, BUDGET_LINE_ITEM_SEQ, PRODUCT_CODE, QUANTITY, UNIT_PRICE, AMOUNT, LOAD_DATE
CUSTOMERS: #CUSTOMER_ID, #SNAPSHOT_DATE, CUSTOMER_NAME, oAGE, oMARITAL_STATUS, CREDIT_RATING
BUDGET_DETAILS: #BUDGET_ID, #REVISION_SEQ, #LINE_ITEM_SEQ, BLI_TYPE_CODE, BLI_TYPE_DESCRIPTION, ORGANIZATION_ID, ADDRESS_ID, BUDGET_PERIOD, LOAD_DATE, BUDGET_AMOUNT, EXPENDITURES, oPRODUCT_CODE
SUPPLIER_ADDRESSES: #SUPPLIER_ID, #ADDRESS_ID, SUPPLIER_NAME, oPOSTAL_CODE, GEO_CODE, LOAD_DATE
GEOGRAPHIC_BOUNDARIES: #GEO_CODE, CITY_NAME, STATE_NAME, COUNTRY_NAME, oCITY_ABBRV, oSTATE_ABBRV, oCOUNTRY_ABBRV
INTERNAL_ORG_ADDRESSES: #ORGANIZATION_ID, #ADDRESS_ID, ORG_TYPE, ORGANIZATION_NAME, ADDRESS_LINE1, oADDRESS_LINE2, oPOSTAL_CODE, GEO_CODE, oPARENT_ORG_ID, LOAD_DATE
Snowflake - Disadvantages
Normalization of dimension makes it
difficult for user to understand
Decreases the query performance because it
involves more joins
Dimension tables are normally smaller than
fact tables - space may not be a major issue
to warrant snowflaking
Keys ..
Primary Keys
uniquely identify a record
Foreign Keys
primary key of another table referred here
Surrogate Keys
system-generated key for dimensions
key on its own has no meaning
integer key, less space
More Keys ..
Smart Keys
primary key out of various attributes of
dimension
AVOID THEM!
Join to Fact table should be on single surrogate
key
Production Keys
DO NOT USE Production defined attributes
Business may reuse/change them - DW cannot!
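A minimal sketch of assigning surrogate keys, assuming an in-memory lookup from production key to a system-generated integer. The class name and the sample production keys are hypothetical.

```python
import itertools


class SurrogateKeyGenerator:
    """Hands out meaningless integer keys, one per distinct production key."""

    def __init__(self):
        self._next_key = itertools.count(1)
        self._by_production_key = {}

    def key_for(self, production_key):
        # Reuse the existing surrogate if this production key was seen before.
        if production_key not in self._by_production_key:
            self._by_production_key[production_key] = next(self._next_key)
        return self._by_production_key[production_key]


keys = SurrogateKeyGenerator()
print(keys.key_for("CUST-0042"))  # 1
print(keys.key_for("CUST-0099"))  # 2
print(keys.key_for("CUST-0042"))  # 1
```

The fact table joins on the integer only, so a later change to the production code format touches the lookup rather than the fact rows.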
TYPE 1
Overwrite dimension record with new
values
used when old value of attribute has no
significance
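A Type 1 change can be sketched as an in-place overwrite of the dimension record. The dimension contents and the `apply_type1` helper are hypothetical.

```python
# Dimension rows keyed by surrogate key. Sample data is made up.
customer_dim = {1: {"name": "A. Kumar", "city": "Pune"}}


def apply_type1(dimension, surrogate_key, new_values):
    """Overwrite attributes in place; the old values are lost."""
    dimension[surrogate_key].update(new_values)


apply_type1(customer_dim, 1, {"city": "Mumbai"})
print(customer_dim)  # {1: {'name': 'A. Kumar', 'city': 'Mumbai'}}
```

No new row is added and the surrogate key stays the same, so existing fact rows now report against the new value only.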
Large Dimensions
Dimensions containing several million records!!!
HOW TO SUPPORT???
Database to support indexing technology
that support rapid browsing
Find and suppress duplicate entries in the
dimension (eg. Name and address
matching)
Never use Type 2 to solve changing
dimensions (i.e. adding records)
HOW TO SUPPORT???
Break the Monster dimension into separate
dimension tables
Constant information into original table
New dimension table can have discrete
values for each attribute
Choose pre-defined set of values per
attribute
Before:
Fact Table
Customer Dimension: Customer_Key (PK), Name, Original_Address, date_of_birth, first_order_date, .., Income, Education, Number_children, marital_status, credit_score, purchase_score
Becomes..
Fact Table
Customer Dimension: Customer_Key (PK), Name, Original_Address, date_of_birth, first_order_date, ..
Demographics Dimension: Demog_Key (PK), Income, Education, Number_children, marital_status, credit_score, purchase_score
In general:
Customer Dimension: Customer_Key (PK), relatively constant attributes
Demographics dimension: demog_key, demographic attributes
Fact Table: any fact table containing customer_key, demog_key and purch_cred_demog_key as foreign keys
Drawbacks
Forced to use ranges of discrete values for
dimensional attributes
New dimension cannot be too big (not >1M)
Data in new dimension can be accessed along with
static data only through the fact table - slower
Only if event occurs, link the static and changing
portions of dimension - keep a dummy event in fact
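The first drawback, being forced into ranges of discrete values, can be sketched as follows. The income bands, the attribute choice, and the function names are assumptions made for illustration only.

```python
from itertools import product

# Pre-defined discrete values per attribute (hypothetical bands).
INCOME_BANDS = ["<20K", "20K-50K", ">50K"]
MARITAL_STATUSES = ["single", "married"]

# The demographics dimension holds one row per combination of values.
demog_dim = {
    key: combo
    for key, combo in enumerate(product(INCOME_BANDS, MARITAL_STATUSES), start=1)
}
demog_lookup = {combo: key for key, combo in demog_dim.items()}


def income_band(income):
    """Map a continuous income onto one of the pre-defined bands."""
    if income < 20_000:
        return "<20K"
    if income <= 50_000:
        return "20K-50K"
    return ">50K"


def demog_key(income, marital_status):
    """Find the demographics key to store in the fact row."""
    return demog_lookup[(income_band(income), marital_status)]


print(len(demog_dim))                # 6
print(demog_key(35_000, "married"))  # 4
```

The dimension size is the product of the number of values per attribute, which is why the number of attributes and bands has to be kept small.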
Degenerate Dimensions
Occur in line item oriented fact tables
occur when dimension table is left only
with a single key and no other fields
all other attributes have been moved into
other dimension tables
Moved to fact table - not joined to anything
Junk Dimensions
Number of miscellaneous flags and text
attributes left over after design
WHAT TO DO WITH THEM????
DO NOT
Leave them behind in the fact table
Make each flag and attribute into its own dimension
Strip off all such flags and attributes
Conformed Dimensions
Dimension that means the same thing with every
possible fact table that it is joined.
Dimension is identically the same dimension in each
data mart
Major responsibility of the central DW design team is to
establish, publish, maintain and enforce them
DW cannot function as an integrated whole without
strict adherence to conformed dimensions
Time Dimension
An exclusive Time dimension is required
because the SQL date semantics and
functions cannot generate several important
attributes required for analytical purposes.
Attributes like weekdays, weekends, fiscal
period, holidays, season cannot be
generated by SQL statements.
Time Dimension
Moreover SQL date stamps occupy more
space largely increasing the size of the fact
table.
Joins on such SQL generated date-stamps
are costly decreasing the query speed
significantly.
Time Dimension
The Day of week (Monday, ...) is useful to
create reports comparing, for example, Monday
sales to Friday sales.
The Day number in month is useful for
comparing measures for the same day in
each month.
The last day in month flag is useful for
performing payday analysis.
Time Dimension
The holiday flag and season attributes are
useful for holiday VS non-holiday analysis
and season business analysis.
Event attribute is needed to record special
days like strike days, etc..
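A minimal sketch of populating a Time dimension with the attributes discussed above. The column names and the holiday set are assumptions; a real calendar would also carry fiscal period, season and event attributes supplied by the business.

```python
import calendar
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def time_dimension_rows(start, end, holidays=frozenset()):
    """Build one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    time_key = 1
    while day <= end:
        days_in_month = calendar.monthrange(day.year, day.month)[1]
        rows.append({
            "time_key": time_key,  # surrogate key used by fact tables
            "date": day,
            "day_of_week": DAY_NAMES[day.weekday()],
            "day_number_in_month": day.day,
            "last_day_in_month_flag": day.day == days_in_month,
            "weekend_flag": day.weekday() >= 5,
            "holiday_flag": day in holidays,  # holidays come from a business list
        })
        day += timedelta(days=1)
        time_key += 1
    return rows


rows = time_dimension_rows(date(2024, 1, 29), date(2024, 2, 1),
                           holidays={date(2024, 2, 1)})  # hypothetical holiday
for row in rows:
    print(row["time_key"], row["date"], row["day_of_week"],
          row["last_day_in_month_flag"], row["holiday_flag"])
```

Rows are generated once and loaded into the dimension table, so queries filter on ready-made columns such as `holiday_flag` instead of computing them.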
Case Study
on
Data Modeling
[Figure: case study star schema]
Store: Store Key, Store Id, Store Name, Locality, Region, ..
Product: Product Key, Product Id, Product category, .., Brand Name, SKU, ..
Time: Time Key, Time Id, Date, Month, Year, ..
Promotion: Promotion key, Promotion Id, Promotion Category, .., Promotion Name, ..
Sales Fact: Time Key, Product Key, Store Key, Promotion Key, Sales (Rs.)
Promotion Fact: Time Key, Product key, Store key, Promotion key
Aggregates
Consider a schema with Product and Time
Dimensions with a granularity of individual
product Brand and day wise sales.
The Product Hierarchy:
Category-Product-Brand
The Time Hierarchy:
Year-Month-Day
Aggregates
Product Dimension
Categories : 3
Products : 30
Brands : 150
i.e. 150 rows in the Product Dimension
Time Dimension
Year : 5
Month : 60
Days : 365*5=1825
i.e 1825 rows in the Time Dimension
Aggregates
Assuming a transaction for each of the
Brands everyday; we have 1825*150 rows
in our sales Fact table.
A Query like: Show Category wise sales
figures for the past five years would have to
access 1825*150 rows to get the answer.
Aggregates
Aggregated Tables
Product
Category: 3
Time
Year : 5
Month: 60
There would be 60*3=180 rows in this
aggregated fact table.
The query on this table needs to access only
180 rows to get the same set of results.
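The row counts in this example can be checked with a few lines of arithmetic. The variable names are illustrative; the figures are the ones given on the slides (leap days are ignored, as in the original).

```python
categories, products, brands = 3, 30, 150
years = 5
months = years * 12   # 60
days = years * 365    # 1825

# Base fact table: one row per brand per day.
base_fact_rows = days * brands
# Aggregated fact table: one row per category per month.
aggregate_rows = months * categories

print(base_fact_rows)                   # 273750
print(aggregate_rows)                   # 180
print(base_fact_rows // aggregate_rows) # 1520
```

The category-wise query therefore reads roughly 1,500 times fewer rows from the aggregate than from the base table.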
Aggregates
[Figure: aggregate schema]
MONTH: Time_Key, Month, Fiscal_Period, Season
CATEGORY: Category_Key, Department, Category
AGG. SALES FACT: Category_Key, Time_Key, Sales, Cost
Aggregates
Aggregates increase the complexity of the
data model.
Aggregates increase the maintenance load
on the Data warehouse. They must be
updated as the base table data gets updated.
Aggregates occupy storage space. Hence
aggregates should be created only for
frequent and time-consuming queries.
Aggregate Navigation
Aggregate Navigation features enable end users to query the data mart without
worrying about the presence of aggregates.
Without Aggregate Navigation, end users
need to know which aggregates exist
so that they can query the
aggregated table instead of the detailed table,
which increases the complexity of the user
interface.
Aggregate Navigation
An aggregate navigator intercepts the
client's SQL and, if possible, transforms
base-level SQL into aggregate-aware SQL.
Aggregate Aware function in Business
Objects 4.1 is an example of Aggregate
navigator.
Aggregate Navigation
New features in Oracle 8i, such as materialized
views and query rewrite,
enable aggregate navigation to be built
within the data mart DBMS instead of in front
end access tools.
This enables all front end access tools to utilize
the aggregate navigation feature.
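The core decision an aggregate navigator makes can be sketched as follows. This is a simplified illustration, not how Business Objects or Oracle implement it: the table names, the metadata layout, and the `choose_table` function are all assumptions.

```python
# Hierarchy levels per dimension, ordered from most detailed to most summarized.
LEVELS = {
    "time": ["day", "month", "year"],
    "product": ["brand", "product", "category"],
}

# Metadata about the available fact tables (hypothetical names and row counts).
TABLES = [
    {"name": "agg_sales_month_category",
     "grain": {"time": "month", "product": "category"}, "rows": 180},
    {"name": "sales_fact",
     "grain": {"time": "day", "product": "brand"}, "rows": 273750},
]


def can_answer(table_grain, requested):
    """A table can answer a query if its grain is at or below every requested level."""
    return all(
        LEVELS[dim].index(table_grain[dim]) <= LEVELS[dim].index(level)
        for dim, level in requested.items()
    )


def choose_table(requested):
    """Pick the smallest table that can still answer the query."""
    candidates = [t for t in TABLES if can_answer(t["grain"], requested)]
    return min(candidates, key=lambda t: t["rows"])["name"]


print(choose_table({"time": "year", "product": "category"}))  # agg_sales_month_category
print(choose_table({"time": "day", "product": "category"}))   # sales_fact
```

The user always writes the query against the base-level schema; only the navigator needs to know which aggregates exist.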
[Figure: attendance fact table joined to the Time, STUDENT, COURSE and TEACHER dimensions]
Attendance fact: Time_Key, Student_Key, Course_Key, Teacher_Key, attendance = 1
The grain of this fact table is the individual attendance event.
A dummy measure, attendance, is included to make the SQL
more readable.
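A small sketch of why the dummy measure helps, using Python's built-in `sqlite3` module. The table name, column names and sample rows are assumptions based on the keys listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE attendance_fact (
        time_key    INTEGER,
        student_key INTEGER,
        course_key  INTEGER,
        teacher_key INTEGER,
        attendance  INTEGER DEFAULT 1  -- dummy measure, always 1
    )
    """
)
# One row per attendance event; the sample keys are made up.
conn.executemany(
    "INSERT INTO attendance_fact (time_key, student_key, course_key, teacher_key) "
    "VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 7), (1, 11, 100, 7), (2, 10, 100, 7), (2, 10, 200, 8)],
)

# SUM(attendance) reads like any other measure query.
attendance_by_course = conn.execute(
    "SELECT course_key, SUM(attendance) FROM attendance_fact "
    "GROUP BY course_key ORDER BY course_key"
).fetchall()
print(attendance_by_course)  # [(100, 3), (200, 1)]
```

The same answer could be obtained with COUNT(*), but summing a named measure keeps the query in the same shape as queries against ordinary fact tables.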
Disadvantages
Increased risk of producing wrong set of requirements
Disadvantages
Expectations to be closely managed.
Case Study
CelDial Case Study
HOW??
Determine for each measure which additional dimensions can
be added to increase its granularity
OR
record the discount amount and generate the
quantity sold at a discount by adding up the
quantity sold where the discount amount is not
zero.
Solution: Merge Fact 2, 3, 4