Computer Science Faculty Information Systems Department: Data Warehousing & BI
• A large number of possible connections between any two (or more) tables
Need for DM: The Paradox
• The Paradox: Trying to make information accessible using tables resulted in an
inability to query them!
• ER modeling and normalization result in a large number of tables, which are:
• Hard for users (DB programmers) to understand
• Hard for DBMS software to navigate optimally
• The real value of ER lies in using tables individually or in pairs
• ER models are too complex for queries that span multiple tables with a large number of records
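The join-path problem can be made concrete by treating a normalized schema as a graph whose edges are foreign-key links and enumerating the simple paths between two tables. The sketch below is a minimal illustration; the table names and links in `FK_LINKS` are hypothetical, not taken from any real schema.

```python
# Hypothetical normalized schema: each pair is a foreign-key link between two tables.
FK_LINKS = [
    ("orders", "customers"),
    ("orders", "order_items"),
    ("order_items", "products"),
    ("products", "categories"),
    ("customers", "cities"),
    ("cities", "regions"),
    ("orders", "stores"),
    ("stores", "cities"),
    ("products", "suppliers"),
    ("suppliers", "cities"),
]

def build_graph(links):
    """Build an undirected adjacency map: table -> set of directly joinable tables."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def join_paths(graph, start, goal):
    """Return every simple join path (no table visited twice) from start to goal."""
    paths = []

    def walk(node, path):
        if node == goal:
            paths.append(path)
            return
        for nxt in sorted(graph[node]):
            if nxt not in path:
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths

graph = build_graph(FK_LINKS)
for path in join_paths(graph, "order_items", "regions"):
    print(" -> ".join(path))
```

Even in this ten-link schema, `order_items` reaches `regions` along three different paths (via the customer's city, the store's city, or the supplier's city), and each path answers a different business question. A user or a query optimizer has to pick the right one, which is exactly the difficulty the slide describes.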
[Figure: Star and snow-flake schemas; labels include Books, Clothes, Engg, Medical, QUARTER, YEAR]
The Benefit of Simplicity
• Business processes are often referred to as data marts, which is why
many people criticize DM as being data-mart oriented.
[Figure: Star-1, Snow-flake and Star-2 schemas]
• Typical grains
• Individual Transactions
• Daily aggregates (snapshots)
• Monthly aggregates
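The three grains above differ only in how far transaction rows are rolled up. A minimal sketch, assuming hypothetical transaction rows of the form `(sale_date, product, amount)`:

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction-grain rows: (sale_date, product, amount)
transactions = [
    (date(2024, 1, 3), "Books", 120.0),
    (date(2024, 1, 3), "Books", 80.0),
    (date(2024, 1, 4), "Books", 50.0),
    (date(2024, 2, 1), "Books", 200.0),
]

def roll_up(rows, period_of):
    """Sum amounts per (period, product); period_of maps a date to the target grain."""
    totals = defaultdict(float)
    for sale_date, product, amount in rows:
        totals[(period_of(sale_date), product)] += amount
    return dict(totals)

daily = roll_up(transactions, lambda d: d)                     # daily aggregate (snapshot)
monthly = roll_up(transactions, lambda d: (d.year, d.month))   # monthly aggregate

print(len(transactions), len(daily), len(monthly))  # 4 3 2
```

Each coarser grain stores fewer rows, but detail is lost: once rows are rolled up to a month, individual transactions can no longer be recovered from the aggregate.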
[Figure: chart of values by week, Week-1 to Week-4]
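Identifying facts and dimensions in a requirement: “We need monthly sales volume and Rs. by week, product and Zone”
• Facts: sales volume, sales in Rs.
• Dimensions: week, product, zone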
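Once the facts and dimensions are identified, answering the requirement amounts to summing the facts over each combination of dimension values. The sketch below is hypothetical: the rows are invented, and the dimension attributes are inlined in the fact rows rather than joined from separate dimension tables, to keep it short.

```python
from collections import defaultdict

# Hypothetical fact rows: (week, product, zone, sales_volume, sales_rs)
sales_facts = [
    ("Week-1", "Books", "North", 10, 1500.0),
    ("Week-1", "Books", "North", 5, 750.0),
    ("Week-1", "Clothes", "South", 3, 900.0),
    ("Week-2", "Books", "North", 7, 1050.0),
]

def summarize(facts):
    """Sum both facts (volume, Rs.) for each (week, product, zone) combination."""
    totals = defaultdict(lambda: [0, 0.0])
    for week, product, zone, volume, rupees in facts:
        totals[(week, product, zone)][0] += volume
        totals[(week, product, zone)][1] += rupees
    return {key: tuple(value) for key, value in totals.items()}

summary = summarize(sales_facts)
for key, (volume, rupees) in sorted(summary.items()):
    print(key, volume, rupees)
```

The dimensions form the grouping key and the facts are the numeric values being added, which is the split that a star schema makes explicit.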
• Example: Time_of_Day may change several times within a daily aggregate, but
not within a single transaction
• Choose the dimensions that are applicable within the selected grain.
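A dimension is applicable at a grain only if it takes a single value within each row at that grain. A minimal check, assuming hypothetical rows of the form `(txn_id, sale_date, time_of_day, product)`:

```python
from collections import defaultdict

# Hypothetical transaction rows: (txn_id, sale_date, time_of_day, product)
transactions = [
    (1, "2024-01-03", "Morning", "Books"),
    (2, "2024-01-03", "Evening", "Books"),
    (3, "2024-01-04", "Morning", "Books"),
]

def single_valued_at_grain(rows, grain_cols, attr_col):
    """True if attr_col has exactly one value inside every group defined by grain_cols."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[i] for i in grain_cols)
        seen[key].add(row[attr_col])
    return all(len(values) == 1 for values in seen.values())

# Transaction grain: one group per txn_id, so Time_of_Day (column 2) is applicable.
print(single_valued_at_grain(transactions, (0,), 2))    # True
# Daily grain per product: 2024-01-03 Books mixes Morning and Evening.
print(single_valued_at_grain(transactions, (1, 3), 2))  # False
```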
• Algebraic
• The aggregate can be computed from a constant-sized summary of each subgroup
• Examples: STDDEV, AVERAGE
• For AVERAGE, the summary kept for each subgroup is (SUM, COUNT)
• Holistic
• Requires an unbounded amount of information about each subgroup
• Examples: MEDIAN, COUNT DISTINCT
• Usually impractical for data warehouses!
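The difference can be sketched with two partitions of the same data. AVERAGE and (population) STDDEV are recovered exactly from a fixed-size summary (COUNT, SUM, SUM of squares) per partition, whereas the MEDIAN of the whole cannot be derived from the partitions' medians. The partition values are hypothetical.

```python
import math
import statistics

def partial_summary(values):
    """Constant-sized summary of one subgroup: (COUNT, SUM, SUM of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))

def merge(a, b):
    """Combine two subgroup summaries; the result is still three numbers."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def average(summary):
    n, total, _ = summary
    return total / n

def pop_stddev(summary):
    """Population standard deviation from the summary: sqrt(E[x^2] - mean^2)."""
    n, total, sq = summary
    mean = total / n
    return math.sqrt(max(sq / n - mean * mean, 0.0))

part1, part2 = [2, 4, 4, 4], [5, 5, 7, 9]
merged = merge(partial_summary(part1), partial_summary(part2))
print(average(merged), pop_stddev(merged))  # 5.0 2.0

# Holistic: the median of partial medians is not the overall median.
print(statistics.median([statistics.median(part1), statistics.median(part2)]))  # 5.0
print(statistics.median(part1 + part2))                                         # 4.5
```

The sum-of-squares formula is the simplest algebraic form but can lose precision for large values; a pairwise (Welford-style) variance merge is the numerically safer alternative with the same constant-sized summary idea.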
• Dilemma: They want to track both the old and the new descriptions. What do they
use for the key, and where do they put the two values of the
changed ingredient attribute?
• Create an additional dimension record at the time of the change, holding the new attribute
values.
• Requires appending two to three version digits to the end of the key, e.g., SKU#+1, SKU#+2,
etc.
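Adding a new dimension record per change is what is commonly called a Type 2 slowly changing dimension. The sketch below assumes a hypothetical `ProductRow` layout and a hypothetical key format of SKU# followed by a two-digit version number; a real warehouse would typically also record effective dates.

```python
from dataclasses import dataclass

@dataclass
class ProductRow:
    key: str          # hypothetical format: SKU# followed by a two-digit version
    sku: str
    ingredients: str
    current: bool     # True only for the latest version of this SKU

def add_version(dimension, sku, new_ingredients):
    """Type 2 change: keep existing rows, append a new row under the next version key."""
    versions = [row for row in dimension if row.sku == sku]
    for row in versions:
        row.current = False
    next_version = len(versions) + 1
    new_row = ProductRow(f"{sku}{next_version:02d}", sku, new_ingredients, True)
    dimension.append(new_row)
    return new_row

dimension = [ProductRow("1234501", "12345", "sugar, flour", True)]
new_row = add_version(dimension, "12345", "honey, flour")
for row in dimension:
    print(row)
```

Old fact rows keep pointing at key 1234501 with the old ingredients, while new facts use 1234502, so both descriptions remain available for analysis.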
• Note: For a detailed analysis, refer to Chapter 2 of Dimensional Modelling (third edition),
which presents 7 options for handling SCDs.