
DataModel Session ELTP

Data modeling is an abstraction of some aspect of the real world (system). It is a highly detailed, iterative process that uses basic objects to deliver a pictorial image of requirements.

Uploaded by

bharatheedara
Copyright
© Attribution Non-Commercial (BY-NC)

Data Modeling

© Mahindra Satyam 2009


AGENDA

Time                  Topics to be Covered
9.30am to 11.00am     Overview of Data Model
                      Need of Data Model
                      Types of Data Model
                      Overview of Normalized Data Model and Case Study discussion
11.00am to 11.15am    Tea Break
1.00pm to 2.00pm      Lunch Break
2.00pm to 3.30pm      Dimensional Data Model (Cont…)
3.30pm to 3.45pm      Tea Break
3.45pm to 5.15pm      Dimension model with ERWin Demo
5.30pm to 6.30pm      ERWin Demo with Q & A Session



What is Data Modeling?

A data model is an abstraction of some aspect of the real world (system).
• Data-oriented activity
• Part art, part science
• Highly detailed, iterative process
• Uses basic objects to deliver a pictorial image of requirements
– Entities (ERD & DDM)
– Attributes (ERD & DDM)
– Relationships (ERD & DDM)
• Uses metadata to supplement the data requirements described by the pictorial image



What happens if you don’t have one?

Individual Data Store



What happens if you don’t have one?

Corporate Data Store



Where Data Models are used

Operational Systems
• Traditional applications designed to run the day-to-day business of the enterprise
External Systems ***
• Data used within an enterprise that is obtained from outside sources
Staging Areas ***
• Created to aid in the collection and transformation of data that is targeted for a data warehouse
Operational Data Store ***
• W. H. Inmon and Claudia Imhoff definition: “A subject-oriented, integrated, volatile, current-valued data store containing only corporate detailed data.”
Data Warehouse (DW)
• W. H. Inmon definition: “A subject-oriented, integrated, non-volatile, time-variant collection of data organized to support management needs.”
Data Mart (DM)
• TDWI definition: “A data structure that is optimized for access. It is designed to facilitate end-user analysis of data. It typically supports a single analytic application used by a distinct set of workers.”

*** Not discussed here



DATA MODELING TECHNIQUES

Entity Relationship Model (ERM)

Dimensional Data Model (DDM)



Where to use what?

Stage           Type of Model
OLTP            Normalized data model
Staging Area    Flat table without constraints
ODS             Normalized model
Data marts      Dimensional model



DW and role of E/R Modeling

Bill Inmon says:
• The ER model is suitable for data warehouses because it is stable and supports consistency and flexibility
• Normalized data is the ideal basis for the design of the Data Warehouse and the ODS
• It may not be suitable for the data mart, which deals heavily with regular query activity and time-variant analysis

Ralph Kimball says:
• ER models are too complicated for end users to understand
• ER modeling/normalizing is only suitable for OLTP or in the data staging area, since it eliminates redundancy
• It results in too many tables to be easy to query
• ER models are optimized for update activity, not high-performance querying

Who is right?

Normalized Data Model



TOPICS TO BE COVERED…

ER Model Concepts
• ER Diagrams - Notation
• Entities and Attributes
• Weak Entity Types
• Entity Types, Value Sets, and Key Attributes
• Relationships and Relationship Types
• Roles and Attributes in Relationship Types
ER Diagram for COMPANY Schema



DATABASE DESIGN STEPS



ENTITIES

An entity is the principal data object about which information is to be collected.

Entities are recognizable concepts, either concrete or abstract, such as persons, places, things, or events, which have relevance to the database.

Examples of entities are EMPLOYEES, PROJECTS, and INVOICES.

• An entity is analogous to a table in the relational model.

Student is an entity.



WEAK ENTITY TYPES

A weak entity is an entity that does not have a key attribute of its own.
• A weak entity must participate in an identifying relationship type with an owner or identifying entity type.
• Weak entities are identified by the combination of:
– A partial key of the weak entity type
– The particular entity they are related to in the identifying entity type
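
The identification rule above can be sketched in a relational database. This is a minimal illustration, assuming SQLite through Python's standard sqlite3 module; the table and column names (employee, dependant, employee_ssno) and the second employee are hypothetical and chosen only to echo the case study later in the deck.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE employee (
    ssno TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
-- Weak entity: identified by the owner's key plus its own partial key.
CREATE TABLE dependant (
    employee_ssno TEXT NOT NULL REFERENCES employee(ssno) ON DELETE CASCADE,
    name          TEXT NOT NULL,   -- partial key
    relationship  TEXT,
    PRIMARY KEY (employee_ssno, name)
);
""")

conn.execute("INSERT INTO employee VALUES ('123456789', 'John Smith')")
conn.execute("INSERT INTO employee VALUES ('987654321', 'Mary Jones')")
# The same partial key ('Alice') is fine under two different owners.
conn.execute("INSERT INTO dependant VALUES ('123456789', 'Alice', 'Daughter')")
conn.execute("INSERT INTO dependant VALUES ('987654321', 'Alice', 'Daughter')")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO dependant VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert(("123456789", "Alice", "Niece"))  # same owner, same partial key
orphan_ok = try_insert(("000000000", "Bob", "Son"))         # no owning employee exists
```

Both attempted inserts are rejected: the first repeats an owner and partial key pair, and the second has no owner to identify it.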



ATTRIBUTES

Attributes are data objects that either identify or describe entities.

Attributes that identify entities are key attributes.

Attributes that describe an entity are non-key attributes.

Example: attributes of Student
• Name
– Last Name
– First Name
• Address
– Street Address
– City
– State or Province



ATTRIBUTES

Attributes are properties used to describe an entity.
E.g.: An EMPLOYEE entity may have a Name, SSN, Address, Sex, and Birthdate.

A specific entity will have a value for each of its attributes.
E.g.: A specific employee entity may have Name='John Smith', SSN='123456789', …

Each attribute has a value set (or data type) associated with it.
E.g.: integer, string, date, enumerated type, …



TYPES OF ATTRIBUTES

Simple Attributes
• Each entity has a single atomic value for the attribute.
E.g.: SSN or Sex

Composite Attributes
• The attribute may be composed of several components.
E.g.: Address (Apt#, House#, Street, City, State, Zip_Code, Country) or Name (First_Name, Middle_Name, Last_Name).
• Composition may form a hierarchy where some components are themselves composite.

Multi-valued Attributes
• An entity may have multiple values for the attribute.
E.g.: Color of a CAR or Previous Degrees of a STUDENT.

Nested Attributes
• In general, composite and multi-valued attributes may be nested arbitrarily to any number of levels, although this is rare.
E.g.: Previous Degrees of a STUDENT is a composite multi-valued attribute denoted by {Previous Degrees (College, Year, Degree, Field)}.



RELATIONSHIP

A relationship represents an association between two or more entities.

Entities are strong or weak (independent or dependent) depending on the relationship.



RELATIONSHIP

BUILDING (strong entity) 1:N APARTMENT (weak entity)



CLASSIFYING RELATIONSHIPS

Relationships are classified by their:
• Degree
• Connectivity
• Cardinality
• Direction
• Existence



DEGREE OF A RELATIONSHIP

The degree of a relationship is the number of entities associated with the relationship.

Binary relationships (degree two) are the most common type in the real world.

A ternary relationship (degree three) is used when a binary relationship is inadequate.



DEGREE OF RELATIONSHIP

• Unary (recursive): one entity related to another of the same entity type
• Binary: entities of two different types related to each other
• Ternary: entities of three different types related to each other



CONNECTIVITY AND CARDINALITY

Connectivity describes the mapping of associated entity instances in the relationship.
• The values of connectivity are "one" or "many".

Cardinality is the actual number of related occurrences for each of the two entities. The resulting relationship types are:
• one-to-one
• one-to-many
• many-to-many



CARDINALITY…



CARDINALITY…



CARDINALITY…

Many-to-many relationships cannot be directly translated to relational tables but instead must be transformed into two or more one-to-many relationships using associative entities.

Example: Employee and Projects are linked through the associative entity Emp_Proj.
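
The transformation into two one-to-many relationships can be sketched as follows. This is only an illustration, assuming SQLite through Python's sqlite3 module; the column names, the hours attribute placement, and the sample rows are assumptions, not part of the slide.

```python
import sqlite3

# Hypothetical tables and data; emp_proj is the associative entity.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id  INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, pname TEXT);
CREATE TABLE emp_proj (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_id INTEGER REFERENCES project(proj_id),
    hours   REAL,                       -- attribute of the relationship itself
    PRIMARY KEY (emp_id, proj_id)       -- one row per employee/project pair
);
INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO project  VALUES (10, 'Billing'), (20, 'Portal');
INSERT INTO emp_proj VALUES (1, 10, 12.0), (1, 20, 8.0), (2, 10, 20.0);
""")

# One employee, many projects.
projects_of_asha = [row[0] for row in conn.execute(
    "SELECT p.pname FROM emp_proj ep JOIN project p ON p.proj_id = ep.proj_id "
    "WHERE ep.emp_id = 1 ORDER BY p.pname")]

# One project, many employees.
staff_on_billing = [row[0] for row in conn.execute(
    "SELECT e.name FROM emp_proj ep JOIN employee e ON e.emp_id = ep.emp_id "
    "WHERE ep.proj_id = 10 ORDER BY e.name")]
```

Each side of the original many-to-many relationship becomes a one-to-many relationship into emp_proj, which is why both queries go through that table.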



DIRECTION

The direction of a relationship indicates the originating entity of a binary relationship.

The entity from which a relationship originates is the parent entity.

The entity where the relationship terminates is the child entity.

Example: Patient (parent entity) and Patient History (child entity)



EXISTENCE

Existence denotes whether the existence of an entity instance is dependent upon the existence of another, related entity instance.

The existence of an entity in a relationship is either mandatory or optional.

Mandatory: "Every project must be managed by a single department."

Optional: "Employees may be assigned to a BU."



CONSTRAINTS ON RELATIONSHIPS

Constraints on Relationship Types (also known as ratio constraints)

• Cardinality constraints: the number of instances of one entity that can or must be associated with each instance of another entity.
• Minimum cardinality (also called participation constraint or existence dependency constraint)
– If zero, then participation is optional, not existence-dependent
– If one or more, then participation is mandatory, existence-dependent
• Maximum cardinality: the maximum number of related instances
– One-to-one (1:1)
– One-to-many (1:N) or many-to-one (N:1)
– Many-to-many (M:N)



CONCEPTUAL MODELING

A conceptual model shows data through business eyes.

Identify entities which have business meaning.

Identify important relationships

Identify significant attributes in the entities.



CONCEPTUAL MODELING

Next step is to build the ER Diagram from the entities and data
items identified in the requirements.

Determine if there are any relationships between the entities.

An entity that does not relate to any other entity may end up
as a “stand alone” table with no defined relationships.



ER – DIAGRAM NOTATIONS



CASE STUDY

The XYZ Company wants Satyam to design and develop a database system for its regular operations.

The database should record information about the departments, projects, employees and their dependants. The company is organized into departments. Employees work for a department and may work on many projects. Departments control the projects which are being operated from that location. Each department has to be managed by someone.

There are managers who manage and monitor the work done by the employees. When an employee is assigned to a project, the hours are calculated based on the number of hours the employee is scheduled to work on the project.

Although most employees have managers, senior staff. The date on which a manager started managing the department could be stored as an attribute of the department.

A department may be spread over many locations. The department name and number are unique for the department. An employee may have a number of dependants.



IDENTIFYING ENTITIES

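Entities and attributes shown in the ER diagram
• Employee: Name (Fname, Mname, Lname), SSNO, Address, Salary, Sex, Bdate
• Department: Dname, Dnumber, Dlocation, Number of employees
• Project: Pname, Pnumber, Plocation
• Dependant: Name, Sex, Bdate, Relationship
• Startdate: the date on which a manager started managing the department

Relationships shown in the ER diagram
• WORKS_FOR: Employee (N) to Department (1)
• MANAGES: Employee (1) to Department (1)
• CONTROLS: Department (1) to Project (N)
• WORKS_ON: Employee (M) to Project (N), with attribute Hours
• SUPERVISION: Employee as supervisor (1) to Employee as supervisee (N)
• DEPENDANTS_OF: Employee (1) to Dependant (N)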



1-TO-1 ENTITY RELATIONSHIP

Employee (Manager) (1) MANAGES Department (1)

The relationship between these two entities is 1 to 1 because, in this company, only one manager is allowed to manage a single department.
Every department is required to have an assigned manager.
What kind of table design does this suggest?
A single table for the Department entity that includes the Manager entity.



ONE-TO-MANY (1:N) RELATIONSHIP

Department (1) WORKFOR Employees (N)

The relationship between these two entities is 1 to many because there can be one or more employees in each department.
Every department is required to have at least one employee, and no employee can belong to more than one department.
What kind of table design does this suggest?
A single table for each entity: the Department table and the Employee table.



MANY-TO-ONE (N:1) RELATIONSHIP

Dependants (N) DEPENDANT_OF Employees (1)

The relationship between these two entities is many to 1 because there can be one or more dependants for each employee.
What kind of table design does this suggest?
A single table for each entity: the Dependants table and the Employee table.



MANY-TO-MANY (N:M) RELATIONSHIP

Employee "Works On" Project; Project "Have" Employee

These two entities have two relationships, one-to-many in each direction, resulting in a many-to-many relationship.
Employees are optionally assigned to one or more projects, as appropriate. A project must have at least one employee.
What kind of table design does this suggest?
Two tables plus a third table with a column for each entity (Employee, Project, Employee_Project).
RECURSIVE RELATIONSHIPS

EMPLOYEE MANAGES EMPLOYEE

We can also have a recursive relationship type.
Both participations are by the same entity type, in different roles.
E.g.: SUPERVISION (MANAGES) relationships between EMPLOYEE (in the role of supervisor or boss) and (another) EMPLOYEE (in the role of subordinate or worker).
In the ER diagram, role names need to be displayed to distinguish the participations.
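
A recursive relationship is commonly implemented as a foreign key that points back at the same table. The sketch below assumes SQLite through Python's sqlite3 module; the column name supervisor_id and the three sample employees are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: supervisor_id references another row of the same table.
CREATE TABLE employee (
    emp_id        INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    supervisor_id INTEGER REFERENCES employee(emp_id)   -- NULL when nobody supervises
);
INSERT INTO employee VALUES (1, 'Director', NULL), (2, 'Lead', 1), (3, 'Analyst', 2);
""")

# The two aliases play the two roles that the ER diagram labels with role names.
pairs = conn.execute("""
    SELECT sup.name AS supervisor, sub.name AS supervisee
    FROM employee AS sub
    JOIN employee AS sup ON sup.emp_id = sub.supervisor_id
    ORDER BY sub.emp_id
""").fetchall()
```

The aliases sup and sub in the query serve the same purpose as role names in the diagram: they tell the two participations of EMPLOYEE apart.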



ATTRIBUTES OF RELATIONSHIP TYPES

Here, the date completed attribute pertains specifically to the employee's completion of a course; it is an attribute of the relationship.



NOTATION

The (min, max) notation for relationship constraints
(0,1) (1,1)

(1,N) (1,1)



PROBLEM WITH ER

The entity relationship model in its original form did not support specialization and generalization.



Rationale for
Dimensional Modeling

Dimensional Model

Definition
• Logical data model used to represent the measures and dimensions that pertain to one or more business subject areas
• Dimensional Model = Star Schema
• Serves as the basis for the design of a relational database schema
• Can easily translate into a multi-dimensional database design if required
• Overcomes OLTP design shortcomings



Dimensional Model Advantages

Understandable
Systematically represents history
Reliable join paths

High performance query

Enterprise scalability



Subject Area Models

Subject area E/R models
• Manufacturing and Process Control
• Shipping and Inventory Management
• Sales Order Entry and Campaign Management
• Customer Support and Relationship Management

Subject areas
• Product Development
• Operations
• Sales and Marketing
• Customer Services

Subject area dimensional models



Enterprise Models

Enterprise scope E/R model

Enterprise scope dimensional model



PROCESS MEASUREMENT

Measures
• Metrics or indicators by which people evaluate a business process
• Referred to as "Facts"
Examples
• Margin
• Inventory Amount
• Sales Dollars
• Receivable Dollars
• Return Rate

Coffee Maker Fulfillment Report

Brand            Product                 Units Sold   Units Shipped   % Shipped
Captain Coffee   Standard Coffee Maker   5,000        3,800           76%
                 Thermal Coffee Maker    2,400        1,632           68%
                 Deluxe Coffee Maker     2,073        1,658           80%
                 All Products            9,473        7,090           75%

The Units Sold, Units Shipped, and % Shipped columns are the facts.



Star Schema Dimension Tables

Dimension tables
• Store dimension values
• Textual content
• Dimension tables are usually referred to simply as 'dimensions'
• Spend extra effort to add dimensional attributes



Dimension Keys

Synthetic keys
• Each table is assigned a unique primary key, specifically generated for the data warehouse
• Primary keys from source systems may be present in the dimension, but are not used as primary keys in the star schema



Dimension Columns

Dimension attributes
• Specify the way in which measures are viewed: rolled up, broken out or summarized
• Often follow the word “by”, as in “Show me Sales by Region and Quarter”
• Frequently referred to as 'Dimensions'



Star Schema Fact Table

Process measures
• Start by assigning one fact table per business subject area
• Fact tables store the process measures (aka Facts)
• Compared to dimension tables, fact tables usually have a very large number of rows



Fact Table Primary Key

Every fact table
• A multi-part primary key is added
• The key is made up of foreign keys referencing dimensions



Fact Table Grain

Grain
• The level of detail represented by a row in the fact table
• Must be identified early
• Cause of greatest confusion during the design process
Example
• Each row in the fact table represents the daily item sales total



Designing a Star Schema

Five initial design steps
• Based on Kimball's six steps
• Start designing in order
• Re-visit and adjust over project life

The five steps
1. Identify fact table
2. Identify fact table grain
3. Identify dimensions
4. Select facts
5. Identify dimensional attributes



EXERCISE 1

Scenario
Industry: Automobile manufacturing
Company: Millennium Motors
Value chain focus: Sales
Sample business questions:
What are the top 10 selling car models this month?
How do this month's top 10 selling models compare to the top 10 over the last six months?
Show me dealer sales by region by model by day
What is the total number of cars sold by month by dealer by state?
List facts and dimensions



EXERCISE 1 SOLUTION

Facts
Sales revenue
Quantity sold
Dimensions
Model name
Month
Dealer name
Region
State
Date



Example Fact Table

Sales Facts
model_key
dealer_key
time_key

revenue
quantity



Example Fact Table Records

Sales Facts

time_key   model_key   dealer_key   revenue     quantity
1          1           1            75840.27    2
1          2           1            152260.37   3
1          3           1            28360.15    1
1          4           1            132675.22   4
1          5           1            43789.45    1
1          1           2            35678.98    1
1          3           2            57864.78    2
1          5           2            92876.67    2

The three key columns form the primary key; revenue and quantity are the facts.
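
The records above can be loaded into a small fact table to see the multi-part primary key and the additive facts at work. This is a sketch only, assuming SQLite through Python's sqlite3 module; the table name sales_facts and the column types are assumptions, while the rows are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sales_facts (
    time_key   INTEGER NOT NULL,
    model_key  INTEGER NOT NULL,
    dealer_key INTEGER NOT NULL,
    revenue    REAL    NOT NULL,
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (time_key, model_key, dealer_key)   -- multi-part key of dimension keys
)""")

rows = [
    (1, 1, 1, 75840.27, 2),
    (1, 2, 1, 152260.37, 3),
    (1, 3, 1, 28360.15, 1),
    (1, 4, 1, 132675.22, 4),
    (1, 5, 1, 43789.45, 1),
    (1, 1, 2, 35678.98, 1),
    (1, 3, 2, 57864.78, 2),
    (1, 5, 2, 92876.67, 2),
]
conn.executemany("INSERT INTO sales_facts VALUES (?, ?, ?, ?, ?)", rows)

# Fully additive facts can simply be summed along any dimension, here by dealer.
by_dealer = {
    dealer: (revenue, quantity)
    for dealer, revenue, quantity in conn.execute(
        "SELECT dealer_key, SUM(revenue), SUM(quantity) "
        "FROM sales_facts GROUP BY dealer_key")
}
```

Grouping by model_key or time_key instead works the same way, which is what makes revenue and quantity convenient facts to store.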
Facts

Fully additive
• Can be summed across any and all dimensions
• Stored in fact table
• Examples: revenue, quantity

Star schema shown on the slide
• Sales Facts: model_key, dealer_key, time_key, revenue, quantity
• Time: time_key, year, quarter, month, date
• Model: model_key, brand, category, line, model
• Dealer: dealer_key, region, state, city, dealer



Facts

Semi-additive
• Can be summed across most dimensions but not all
• Examples: inventory quantities, account balances, or personnel counts
• Anything that measures a “level”
• Must be careful with ad-hoc reporting
• Often aggregated across the “forbidden dimension” by averaging

Star schema shown on the slide
• Sales Facts: model_key, dealer_key, time_key, inventory
• Time: time_key, year, quarter, month, date
• Model: model_key, brand, category, line, model
• Dealer: dealer_key, region, state, city, dealer
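
A small worked example may help show why a "level" such as inventory is only semi-additive. The snapshot figures and dealer names below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical month-end inventory snapshots: (month, dealer, units on hand).
snapshots = [
    ("2001-01", "Dealer A", 40),
    ("2001-01", "Dealer B", 25),
    ("2001-02", "Dealer A", 30),
    ("2001-02", "Dealer B", 20),
]

# Summing across dealers within a month is meaningful: total stock that month.
stock_by_month = defaultdict(int)
for month, _dealer, units in snapshots:
    stock_by_month[month] += units

# Summing across months is not: the same cars would be counted once per snapshot.
naive_total = sum(stock_by_month.values())

# Across time, the usual choice is an average of the per-period levels.
average_stock = sum(stock_by_month.values()) / len(stock_by_month)
```

Here time is the forbidden dimension: the monthly totals (65 and 50) are valid, their sum (115) describes nothing real, and the average (57.5) is the figure a report would normally show.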
Facts

Non-additive
• Cannot be summed across any dimension
• All ratios are non-additive
• Break down to fully additive components and store them in the fact table

Star schema shown on the slide
• Sales Facts: model_key, dealer_key, time_key, revenue, margin_amt
• Time: time_key, year, quarter, month, date
• Model: model_key, brand, category, line, model
• Dealer: dealer_key, region, state, city, dealer

Margin_rate is non-additive
Margin_rate = margin_amt / revenue
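
The advice to store the additive components rather than the ratio can be checked with a short calculation. The two fact rows below are invented; only the formula Margin_rate = margin_amt / revenue comes from the slide.

```python
# Hypothetical fact rows: (dealer, revenue, margin_amt).
rows = [
    ("Dealer A", 1000.0, 100.0),
    ("Dealer B", 3000.0, 900.0),
]

# Correct: sum the additive components first, then take the ratio.
total_revenue = sum(revenue for _, revenue, _ in rows)
total_margin = sum(margin for _, _, margin in rows)
correct_rate = total_margin / total_revenue

# Incorrect: combining per-row rates, whether by summing or by plain averaging.
per_row_rates = [margin / revenue for _, revenue, margin in rows]
summed_rates = sum(per_row_rates)
averaged_rates = summed_rates / len(per_row_rates)
```

The overall margin rate is 25%, while summing the row rates gives 40% and averaging them gives 20%. Neither matches, because the rows carry different revenue weights.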
Unit Amounts

Unit price, unit cost, etc.
• Are numeric, but not measures
• Store the extended amounts, which are additive
• Unit amounts may be useful as dimensions for “price point analysis”
• May store unit values to save space

Factless Fact Table
• A fact table with no measures in it
• Nothing to measure, except the convergence of dimensional attributes
• Sometimes stores a “1” for convenience
• Examples: Attendance, Customer Assignments, Coverage

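
A factless fact table for the attendance example might look like the sketch below. It assumes SQLite through Python's sqlite3 module; the table name, key columns, and rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical factless fact table: only dimension keys, no measure columns.
CREATE TABLE attendance_facts (
    date_key    INTEGER NOT NULL,
    student_key INTEGER NOT NULL,
    class_key   INTEGER NOT NULL,
    PRIMARY KEY (date_key, student_key, class_key)
);
INSERT INTO attendance_facts VALUES (1, 1, 10), (1, 2, 10), (1, 1, 20), (2, 1, 10);
""")

# With no stored measure, questions are answered by counting rows.
attendance_by_class = dict(conn.execute(
    "SELECT class_key, COUNT(*) FROM attendance_facts GROUP BY class_key"))
```

Storing a constant 1 in an extra column, as the slide mentions, would let reporting tools use SUM instead of COUNT without changing the result.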


Example Dimension Table Records

Dealer Dimension

dealer_key   region      state           city        dealer
1            Northeast   Massachusetts   Boston      Honest Ted's
2            Northeast   Massachusetts   Boston      Stoller Co.
3            Southwest   Arizona         Tucson      Wright Motors
12           Southwest   California      San Diego   American
245          Central     Illinois        Chicago     Lugwig Motors

dealer_key is the synthetic key; the remaining columns are attributes.



Dimension Tables

Characteristics
• Hold the dimensional attributes
• Usually have a large number of attributes (“wide”)
• Add flags and indicators that make it easy to produce specific types of reports
• Have a small number of rows in comparison to fact tables (most of the time)

Don't normalize dimension tables
• Normalizing saves very little space
• Normalizing impacts performance
• Can confuse matters when multiple hierarchies exist
• A star schema with normalized dimensions is called a "snowflake schema"
• Usually advocated by software vendors whose products require snowflake for performance



Slowly Changing Dimension Example

Example: A woman gets married
• Possible changes to customer dimension
– Last Name
– Marital Status
– Address
– Household Income
• Existing facts need to remain associated with her single profile
• New facts need to be associated with her married profile



Slowly Changing Dimension Types

Three types of slowly changing dimensions
• Type 1
– Updates existing record with modifications
– Does not maintain history
• Type 2
– Adds new record
– Maintains old record
– Does maintain history
• Type 3
– Keeps old and new values in the existing row
– Requires a design change



Designing Loads to Handle SCD

Design and implementation guidelines
• Gather SCD requirements when designing data mapping and loading
• SCD needs to be defined and implemented at the dimensional attribute level
• Each column in a dimension table needs to be identified as a Type 1 or a Type 2 SCD
• If one Type 1 column changes, then all Type 1 columns will be updated
• If one Type 2 column changes, then a new record will be inserted into the dimension table
Type 1 Example

Before the change

Customer OLTP
Cust ID   Name        Marital Status   Home Income
123       Sue Jones   S                $30K

Customer Dim (star schema)
Cust Key   Cust ID   Name        Marital Status   Home Income   Status
1          123       Sue Jones   S                $30K          0

Sales Facts (star schema)
Cust Key   Day Key   Sales
1          1         $40

Day Dim (star schema)
Day Key   Business Date
1         1/31/01

Sue gets married 2/1/01

After the change

Customer OLTP
Cust ID   Name        Marital Status   Home Income
123       Sue Smith   M                $60K

Customer Dim (star schema)
Cust Key   Cust ID   Name        Marital Status   Home Income   Status
1          123       Sue Smith   M                $60K          0

Sales Facts (star schema)
Cust Key   Day Key   Sales
1          1         $40
1          2         $50

Day Dim (star schema)
Day Key   Business Date
1         1/31/01
2         2/01/01



Type 2 Example

Before the change

Customer OLTP
Cust ID   Name        Marital Status   Home Income
123       Sue Jones   S                $30K

Customer Dim (star schema)
Cust Key   Cust ID   Name        Marital Status   Home Income   Status
1          123       Sue Jones   S                $30K          0

Sales Facts (star schema)
Cust Key   Day Key   Sales
1          1         $40

Day Dim (star schema)
Day Key   Business Date
1         1/31/01

Sue gets married 2/1/01

After the change

Customer OLTP
Cust ID   Name        Marital Status   Home Income
123       Sue Smith   M                $60K

Customer Dim (star schema)
Cust Key   Cust ID   Name        Marital Status   Home Income   Status
1          123       Sue Jones   S                $30K          1
2          123       Sue Smith   M                $60K          0

Sales Facts (star schema)
Cust Key   Day Key   Sales
1          1         $40
2          2         $50

Day Dim (star schema)
Day Key   Business Date
1         1/31/01
2         2/01/01
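
The difference between the Type 1 and Type 2 examples can be sketched in a few lines of Python. The function names and dictionary keys are hypothetical, and the reading of the Status column (0 for the current row, 1 for an expired row) is an assumption inferred from the example tables.

```python
import copy


def apply_type1(dimension, cust_id, changes):
    """Type 1: overwrite the attributes in place; history is lost."""
    for row in dimension:
        if row["cust_id"] == cust_id:
            row.update(changes)


def apply_type2(dimension, cust_id, changes):
    """Type 2: expire the current row and append a new row with a new synthetic key."""
    current = next(row for row in dimension
                   if row["cust_id"] == cust_id and row["status"] == 0)
    new_row = {**current, **changes,
               "cust_key": max(row["cust_key"] for row in dimension) + 1}
    current["status"] = 1  # assumption: 1 marks an expired row, 0 the current one
    dimension.append(new_row)
    return new_row["cust_key"]


before = [{"cust_key": 1, "cust_id": 123, "name": "Sue Jones",
           "marital_status": "S", "home_income": "$30K", "status": 0}]
change = {"name": "Sue Smith", "marital_status": "M", "home_income": "$60K"}

type1_dim = copy.deepcopy(before)
apply_type1(type1_dim, 123, change)

type2_dim = copy.deepcopy(before)
new_key = apply_type2(type2_dim, 123, change)
```

After the Type 2 change, sales loaded from 2/1/01 onward would carry the new customer key, so earlier facts stay attached to the Sue Jones row.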



Aggregate Designs

Aggregates
• Pre-stored fact summaries
• Along one or more dimensions
• The most effective tool for improving performance
Examples
• Summary of sales by region, by product, by category
• Monthly sales

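
A monthly sales aggregate can be sketched as a table that is computed once from the detailed facts and then queried directly. The example assumes SQLite through Python's sqlite3 module; the table names, the month label stored as text (a simplification of a separate month dimension), and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical daily-grain star schema fragment.
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE sales_facts (time_key INTEGER, product_key INTEGER,
                          quantity INTEGER, amount REAL);
INSERT INTO time_dim VALUES (1, '2001-01-30', '2001-01'),
                            (2, '2001-01-31', '2001-01'),
                            (3, '2001-02-01', '2001-02');
INSERT INTO sales_facts VALUES (1, 1, 2, 40.0), (2, 1, 1, 20.0),
                               (3, 1, 5, 100.0), (3, 2, 1, 50.0);

-- Pre-stored summary: one row per month and product instead of per day.
CREATE TABLE mthly_sales_facts_agg AS
SELECT t.month           AS month,
       f.product_key     AS product_key,
       SUM(f.quantity)   AS quantity,
       SUM(f.amount)     AS amount
FROM sales_facts f
JOIN time_dim t ON t.time_key = f.time_key
GROUP BY t.month, f.product_key;
""")

monthly = conn.execute(
    "SELECT month, product_key, quantity, amount "
    "FROM mthly_sales_facts_agg ORDER BY month, product_key").fetchall()
```

Queries at month level can read the smaller aggregate table, while the daily fact table remains available for detail.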


Aggregate Background

Aggregate rationale
• Improve end user query performance
• Reduce required CPU cycles
• Powerful cost saving tool

Restrictions
• Additive facts only
• Must use dimensional design

Aggregate Guidelines
• Don't start with aggregates
• Design and build based on usage
• Sooner or later you'll need to build aggregates



Aggregate Types

Separate Tables
• Separate fact table for every aggregate
• Separate dimension table for every aggregate dimension
• Same number of fact records as level field tables
Advantages
• Removes possibility of double counting
• Schema clarity



Separate Tables

One-way aggregate (by month)
• Mthly Sales Facts Agg: month_key, product_key, market_key, Quantity, Amount
• Month: month_key, Year, Fiscal Period, Month

Base star schema
• Sales Facts: time_key, product_key, market_key, Quantity, Amount
• Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
• Product: product_key, Category, Brand, Product, Diet Indicator
• Market: market_key, Region, District, State, City



Multiple Fact Tables

Different business processes usually require different fact tables.

There are also several cases where a single business process will require multiple fact tables.



Different Dimensions or Grain

Fact tables
• Shipment Facts: time_key, product_key, shipper_key, market_key, Quantity, Weight
• Sales Facts: time_key, product_key, market_key, Quantity, Amount

Dimensions
• Shipper: shipper_key, name, type, mode, address
• Time: time_key, Year, Fiscal Period, Month, Day, Day of Week
• Product: product_key, Category, Brand, Product, Diet Indicator
• Market: market_key, Region, District, State, City



DRILLING

Drilling down
• Adding dimensional detail
• Further breaks out a measure in some way

Quarterly Auto Sales Summary (before drilling down)
Columns: Region, Units Sold, Revenue
Rows: Northeast, Southeast, Central, Northwest, Southwest

Quarterly Auto Sales Summary (after drilling down to State)
Columns: Region, State, Units Sold, Revenue
Rows: Northeast (Maine, New York, Massachusetts), Southeast (Florida, Georgia, Virginia)



DRILLING

Rolling up
• Removing dimensional detail
• Rolls up a measure

Quarterly Auto Sales Summary (before rolling up)
Columns: Region, State, Units Sold, Revenue
Rows: Northeast (Maine, New York, Massachusetts), Southeast (Florida, Georgia, Virginia)

Quarterly Auto Sales Summary (after rolling up to Region)
Columns: Region, Units Sold, Revenue
Rows: Northeast, Southeast, Central, Northwest, Southwest
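
Drilling down and rolling up amount to grouping the same facts by more or fewer dimensional attributes. The sketch below uses invented unit counts, since the slide tables show no figures, and a hypothetical helper named summarize.

```python
from collections import defaultdict

# Hypothetical fact rows; the slide tables carry no numbers.
facts = [
    {"region": "Northeast", "state": "Maine", "units": 10},
    {"region": "Northeast", "state": "New York", "units": 40},
    {"region": "Southeast", "state": "Florida", "units": 30},
    {"region": "Southeast", "state": "Georgia", "units": 20},
]


def summarize(rows, by):
    """Total units grouped by the listed dimensional attributes."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[attribute] for attribute in by)] += row["units"]
    return dict(totals)


by_region = summarize(facts, ["region"])                 # rolled up
by_region_state = summarize(facts, ["region", "state"])  # drilled down
```

Adding "state" to the grouping list drills down; removing it rolls back up, and the grand total is the same either way.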



CONFORMED DIMENSIONS

Definition
Dimensions are conformed when they are the same, or when one dimension is a strict rollup of another.



CONFORMED DIMENSIONS

Same dimensions must:
1. have exactly the same set of primary keys, and
2. have the same number of records



CONFORMED DIMENSIONS

Rolled up dimension
When one dimension is a strict rollup of another

Which means
Two conformed dimensions can be combined into a single
logical dimension by creating a union of the attributes
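
One practical consequence of conformed dimensions is that measures from different fact tables can be lined up on the shared dimension. The sketch below assumes SQLite through Python's sqlite3 module; the product dimension, both fact tables, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical conformed dimension shared by two fact tables.
CREATE TABLE product (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE sales_facts    (product_key INTEGER, quantity INTEGER);
CREATE TABLE shipment_facts (product_key INTEGER, quantity INTEGER);
INSERT INTO product VALUES (1, 'Captain'), (2, 'Deluxe');
INSERT INTO sales_facts    VALUES (1, 5), (1, 3), (2, 4);
INSERT INTO shipment_facts VALUES (1, 6), (2, 2), (2, 1);
""")

# Summarize each fact table separately, then join the summaries on the shared key.
report = conn.execute("""
    SELECT p.brand, s.sold, h.shipped
    FROM product p
    JOIN (SELECT product_key, SUM(quantity) AS sold
          FROM sales_facts GROUP BY product_key) s
      ON s.product_key = p.product_key
    JOIN (SELECT product_key, SUM(quantity) AS shipped
          FROM shipment_facts GROUP BY product_key) h
      ON h.product_key = p.product_key
    ORDER BY p.brand
""").fetchall()
```

Each fact table is aggregated on its own before the join, which avoids multiplying rows when the two tables have different grains.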



CONFORMED DIMENSIONS

Description
Shared common dimensions
Integrates logical design
Ensures consistency between data marts
Allows incremental development
Independent of physical location
Some re-work may be required



CONFORMED DIMENSIONS

Advantages
Enables an incremental development approach
Easier and cheaper to maintain
Drastically reduces extraction and loading complexity
Answers business questions that cross data marts
Supports both centralized and distributed architectures



Erwin



ERWIN

• AllFusion ERwin Data Modeler, commonly known as ERwin, is a powerful and leading data modeling tool from Computer Associates.
• Has many powerful features that you can use to design entity relationship data models and dimensional models
• Currently used version: 4.1.4
• CA has recently released version ERwin Data Modeler r7



ERWIN BASIC FEATURES

• Creating a Model
– Templates: To save time, you can also start working from a template that you or others in your workgroup have created. When you create a model from a template, all the objects and display settings in the template are automatically applied to the new model.
– Subject Areas: For each new model, ERwin also automatically creates a subject area (Main Subject Area). You can create additional subject areas.
– Stored Displays: Represent a different view of a subject area without the need to change settings repeatedly.
– Model Types: Logical, Physical, Logical/Physical, or Logical/Dimensional
– Modeling Preferences: You can customize your working environment using ERwin's many display options and model preferences. You can also choose to create your model using IDEF1X or IE notation.



ERWIN BASIC FEATURES

• Creating a Model
– Reverse Engineering: Create a model by reverse engineering an existing database.
• Printing Models: Can print on normal printers or plotters



ERWIN FILE FORMATS

• ER1: Standard ERwin file format. ERwin version 3.5.2 and later are supported.
• XML: ERwin metamodel saved as an Extensible Markup Language file. When you open an ERwin model saved in XML format, ERwin reads the data structure specified in the XML file and automatically reverse engineers the database and creates a matching data model diagram.
• ERS, SQL DDL (Data Definition Language): Schema script text file. When you open a text file with this extension, ERwin reads the data structure specified in the text file and automatically reverse engineers the database and creates a matching data model.
• DBF: A file name with this extension is a database file in dBASE format. When you open a DBF file, ERwin automatically reverse engineers the database and creates a matching data model.
• MDB: A file name with this extension is a database file in Microsoft Access format. When you open an *.mdb file, ERwin automatically reverse engineers the database and creates a matching data model.



ERWIN WORKPLACE

Model Explorer & Toolbars



MODEL EXPLORER



LOGICAL AND PHYSICAL MODELS



NOTATIONS



DIMENSIONAL MODEL NOTATION



ADDING ENTITIES AND TABLES



RELATIONSHIPS



DOMAINS



RELATIONSHIP



RELATIONSHIP



ROLENAMES



DISPLAY LEVEL



SUBJECT AREAS



TRANSFORMS



NAMING STANDARDS



NAMING STANDARDS



FORWARD ENGINEERING



REVERSE ENGINEERING



REPORTS



Required Data Modeler Skills

OLAP Data Modelers:
• Design at multiple levels of granularity
• Recognize when the requirements call for different model types
• Understand when to normalize or not
• Don't be afraid of redundancy!
• Know the impact of “time” on a model
• Adapt to ever changing data requirements
• Be a resource to ETL teams and understand data mapping issues
• Look good in all sorts of hats!



Are the expected benefits being realized?

There is no magic solution!



Are the expected benefits being realized?

The data model is required for good data management, but it is only one of the elements.

Today's systems are tomorrow's legacy systems!

© Mahindra Satyam 2009 110


Barriers to good data management

 Data problems
– lack of resources, data hoarding, lack of data knowledge

 System users
– not committed, not convinced, lack of time

 Legacy systems and data stores

 Different business interests

 Cost


Dimensional Modeling
Case Studies

CASE STUDY - 1
PURPOSE:-
The aim of the case study is to introduce you to the concepts
and principles involved in dimensional modeling design and
development.

You are expected to produce a small dimensional model based on
the scenario given in the following slides.


CASE STUDY - 1
PROBLEM STATEMENT
Telecom Sales Assignment (Star Schema) : -
A telecom company wants to develop a data warehouse system to computerize its sales
management. Here are the details:-
 The company is tracking the sales of its products (made in different manufacturing plants) to
different customers.
 The company comprises two broad operations:
– Manufacturing products in its manufacturing plants
– Sales of these products by its sales outlets to customers
 The customers of the company are either big corporate companies or retailers who buy directly
over the counter.
 Each customer purchases one or more products through an order.
 There are two types of seller outlets:
– Corporate sales office
– Retail stores
 The products can be bought in the following two ways:
– In the case of retail (non-corporate) customers, the products are
purchased over the counter from retail outlets.
– In the case of corporate customers, orders can be placed over the phone
and goods are delivered directly from the plant to the relevant corporate office.


CASE STUDY - 1
Business Questions to be answered : -
1 What are the total cost and revenue for each model sold today, summarized by outlet,
outlet type, region?
2 What are the total cost and revenue for each model sold today, summarized by
manufacturing plant and region?
3 For each month how much was the ordered revenue by customer region? How much
was delivered?
4 What are the top five models sold last month by total revenue? By quantity sold? By total
cost?
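
As a starting point for the exercise, one possible star schema can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions, not the expected solution: every table name, column name and sample row below is hypothetical, and the grain is assumed to be one row per product per order line. The query answers question 1 for a given sale date.

```python
import sqlite3

# Hypothetical star schema for the telecom sales scenario.
# All table names, column names and sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, model TEXT);
CREATE TABLE dim_plant    (plant_key INTEGER PRIMARY KEY, plant_name TEXT, region TEXT);
CREATE TABLE dim_outlet   (outlet_key INTEGER PRIMARY KEY, outlet_name TEXT,
                           outlet_type TEXT, region TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_type TEXT,
                           region TEXT);
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    plant_key    INTEGER REFERENCES dim_plant(plant_key),
    outlet_key   INTEGER REFERENCES dim_outlet(outlet_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    quantity INTEGER,
    cost     INTEGER,
    revenue  INTEGER
);
""")

conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    (1, "2009-06-01", "2009-06"),
    (2, "2009-06-02", "2009-06"),
])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [
    (1, "Model A"),
    (2, "Model B"),
])
conn.executemany("INSERT INTO dim_plant VALUES (?, ?, ?)", [
    (1, "Plant 1", "North"),
])
conn.executemany("INSERT INTO dim_outlet VALUES (?, ?, ?, ?)", [
    (1, "Retail Store 1", "Retail", "North"),
    (2, "Corporate Office 1", "Corporate", "South"),
])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", [
    (1, "Retail", "North"),
    (2, "Corporate", "South"),
])
# Fact rows: date, product, plant, outlet, customer, quantity, cost, revenue
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 1, 2, 100, 150),
    (1, 1, 1, 1, 1, 1, 50, 75),
    (1, 2, 1, 2, 2, 4, 400, 600),
    (2, 1, 1, 1, 1, 3, 150, 225),
])


def sales_by_outlet(sale_date):
    """Total cost and revenue per model, by outlet, outlet type and region."""
    return conn.execute("""
        SELECT p.model, o.outlet_name, o.outlet_type, o.region,
               SUM(f.cost), SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_date d    ON d.date_key = f.date_key
        JOIN dim_product p ON p.product_key = f.product_key
        JOIN dim_outlet o  ON o.outlet_key = f.outlet_key
        WHERE d.full_date = ?
        GROUP BY p.model, o.outlet_name, o.outlet_type, o.region
        ORDER BY p.model, o.outlet_name
    """, (sale_date,)).fetchall()


rows = sales_by_outlet("2009-06-01")
for row in rows:
    print(row)
```

Question 2 follows the same pattern with dim_plant in place of dim_outlet. Question 3 (ordered versus delivered revenue) would likely need a separate delivered measure or a second fact table, which this sketch does not model.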



CASE STUDY - 2
PURPOSE:-
The aim of the case study is to introduce you to the concepts
and principles involved in dimensional modeling design and
development.

You are expected to produce a small dimensional model based on
the scenario given in the following slides.


CASE STUDY - 2
PROBLEM STATEMENT
Company Payroll Assignment (Star Schema) : -
A software company wants to develop a data warehouse system to computerize its payroll
management. Here are the details:-
 The company has 10000 employees on payroll, of which 9000 are
permanent employees and 1000 are contract employees.
 The company has 20 divisions.
 The company offices and development centers are in 50 locations (both
offshore and onsite).
 The payroll cycle is monthly and payment is made on the first of every month.
 Paychecks are issued in local currency (depending on the employee's
assignment).
 An employee's salary depends on grade. Every grade has a salary band
with a lower and an upper limit.


CASE STUDY - 2
Business Questions to be answered : -
1 What is the total payroll cost for each division for each pay cycle?
2 What is the grade-wise payroll cost as a percentage of
total payroll cost per cycle?
3 What is the location-wise payroll cost for each month?
4 Which are the top 5 divisions that have incurred the maximum payroll
cost?
5 What is the ratio of the supporting divisions' payroll cost to the total
payroll cost?
6 What is the payroll cost of temporary employees as a ratio of total
payroll cost?
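
The percentage and ratio questions above can be checked against a small payroll fact table. The sketch below uses Python's sqlite3 module and is only one possible design: all table and column names are hypothetical, "temporary employees" in question 6 is assumed to mean the contract employees from the problem statement, and amounts are assumed to be already converted from local currency into a single reporting currency so that they can be summed.

```python
import sqlite3

# Hypothetical payroll star schema. All names and sample rows are assumptions.
# "amount" is assumed to be in one reporting currency, not local currency.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_employee (employee_key INTEGER PRIMARY KEY, grade TEXT,
                           employment_type TEXT);
CREATE TABLE dim_division (division_key INTEGER PRIMARY KEY, division_name TEXT);
CREATE TABLE fact_payroll (
    pay_cycle    TEXT,
    employee_key INTEGER REFERENCES dim_employee(employee_key),
    division_key INTEGER REFERENCES dim_division(division_key),
    amount       INTEGER
);
""")
conn.executemany("INSERT INTO dim_employee VALUES (?, ?, ?)", [
    (1, "G1", "Permanent"),
    (2, "G1", "Contract"),
    (3, "G2", "Permanent"),
])
conn.executemany("INSERT INTO dim_division VALUES (?, ?)", [
    (1, "Delivery"),
    (2, "HR"),
])
conn.executemany("INSERT INTO fact_payroll VALUES (?, ?, ?, ?)", [
    ("2009-06", 1, 1, 1000),
    ("2009-06", 2, 1, 1000),
    ("2009-06", 3, 2, 6000),
])

params = {"cycle": "2009-06"}

# Question 1: total payroll cost for each division in one pay cycle.
cost_by_division = conn.execute("""
    SELECT d.division_name, SUM(f.amount)
    FROM fact_payroll f JOIN dim_division d USING (division_key)
    WHERE f.pay_cycle = :cycle
    GROUP BY d.division_name
    ORDER BY d.division_name
""", params).fetchall()

# Question 2: grade-wise cost as a percentage of the cycle total.
pct_by_grade = conn.execute("""
    SELECT e.grade,
           100.0 * SUM(f.amount) /
           (SELECT SUM(amount) FROM fact_payroll WHERE pay_cycle = :cycle)
    FROM fact_payroll f JOIN dim_employee e USING (employee_key)
    WHERE f.pay_cycle = :cycle
    GROUP BY e.grade
    ORDER BY e.grade
""", params).fetchall()

# Question 6: contract-employee cost as a ratio of the cycle total.
contract_ratio = conn.execute("""
    SELECT 1.0 * SUM(CASE WHEN e.employment_type = 'Contract'
                          THEN f.amount ELSE 0 END) / SUM(f.amount)
    FROM fact_payroll f JOIN dim_employee e USING (employee_key)
    WHERE f.pay_cycle = :cycle
""", params).fetchone()[0]

print(cost_by_division, pct_by_grade, contract_ratio)
```

A location dimension and a currency or exchange-rate dimension would be natural additions for questions 3 and for the local-currency requirement; they are left out here to keep the sketch short.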



CASE STUDY - 3
PURPOSE:-
The aim of the case study is to introduce you to the concepts
and principles involved in dimensional modeling design and
development.

You are expected to produce a small dimensional model based on
the scenario given in the following slides.


CASE STUDY - 3
PROBLEM STATEMENT
Automobile Finance Assignment (Star Schema) : -
An automobile company wants to develop a data warehouse system to computerize its finance
management. Here are the details:-

 The company has 1000 dealers (i.e. customers).
 The company has 10 profit centers.
 The revenue is accrued in local currency.
 The company has 5 product groups. Each product group has several models.
 The region for the salesperson and the customer is the same as that of the profit center.


CASE STUDY - 3
Business Questions to be answered : -
1 What is the total revenue for each profit center for each month?
2 What is the year-on-year revenue growth for each profit center?
3 Which are the top 10 customers by revenue?
4 Which are the top 10 products by revenue?
5 Which regions are underperforming on revenue?
6 Who are the 5 best sales representatives by revenue accruals
for this year?
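
Year-on-year growth (question 2) is computed as (current year revenue - previous year revenue) / previous year revenue, after rolling the monthly fact rows up to the year level. The short Python sketch below illustrates that calculation only. The row layout, function names and sample figures are hypothetical, and revenue is assumed to be already converted to one reporting currency.

```python
from collections import defaultdict

# Hypothetical fact rows: (profit_center, year, month, revenue).
# Revenue is assumed to be in a single reporting currency.
fact_revenue = [
    ("PC-01", 2008, 1, 100),
    ("PC-01", 2008, 2, 100),
    ("PC-01", 2009, 1, 150),
    ("PC-01", 2009, 2, 100),
    ("PC-02", 2008, 1, 400),
    ("PC-02", 2009, 1, 300),
]


def yearly_revenue(rows):
    """Roll monthly revenue up to (profit_center, year) totals."""
    totals = defaultdict(int)
    for profit_center, year, _month, revenue in rows:
        totals[(profit_center, year)] += revenue
    return dict(totals)


def yoy_growth(totals, profit_center, year):
    """Growth versus the previous year as a fraction, or None if undefined."""
    previous = totals.get((profit_center, year - 1))
    current = totals.get((profit_center, year))
    if not previous or current is None:
        return None  # no base year (or a zero base) to compare against
    return (current - previous) / previous


totals = yearly_revenue(fact_revenue)
growth_pc01 = yoy_growth(totals, "PC-01", 2009)
growth_pc02 = yoy_growth(totals, "PC-02", 2009)
print(growth_pc01, growth_pc02)
```

Because the comparison needs both years at the same grain, the date dimension should carry a year attribute, and the fact table should hold at least two years of history.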



CASE STUDY - 4
PURPOSE:-
The aim of the case study is to introduce you to the concepts
and principles involved in dimensional modeling design and
development.

You are expected to produce a small dimensional model based on
the scenario given in the following slides.


CASE STUDY - 4
PROBLEM STATEMENT
Automobile Inventory Assignment (Star Schema) : -
An automobile company (say Tata Motors) wants to develop a data warehouse system to
computerize its inventory management. Here are the details:-

 The company has 3 manufacturing plant units (Pune, Lucknow and JSR).
 The company has 5 cost centers, categorized by product group
(HCV, MCV, LCV, Tata Indica and Tata Safari).
 Each plant has several store locations.
 The company has 5 product groups. Each product group has several models.


CASE STUDY - 4

Business Questions to be answered : -


1 What is the total inventory quantity and amount for each plant location at opening of
each month?
2 How much is the inventory cost for each cost center for each quarter end?
3 Which are the top 10 products that have the maximum inventory cost at the opening of each
month?
4 What is the total inventory quantity and amount for each store location at opening of
each month?
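
Inventory questions of this kind are commonly modeled with a periodic snapshot fact table: one row per store and product at each month opening. The Python sqlite3 sketch below is a hypothetical illustration of that idea, not the expected answer. All table names, column names, store names and figures are assumptions. Inventory balances can be summed across stores or plants for one snapshot, but summing them across months would double count stock, so the query groups by snapshot month.

```python
import sqlite3

# Hypothetical periodic-snapshot schema. All names and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          plant TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, model TEXT,
                          product_group TEXT);
CREATE TABLE fact_inventory_snapshot (
    snapshot_month TEXT,
    store_key      INTEGER REFERENCES dim_store(store_key),
    product_key    INTEGER REFERENCES dim_product(product_key),
    quantity       INTEGER,
    amount         INTEGER
);
""")
conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?)", [
    (1, "Store P1", "Pune"),
    (2, "Store P2", "Pune"),
    (3, "Store L1", "Lucknow"),
])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "Model X", "HCV"),
    (2, "Model Y", "LCV"),
])
# One row per store and product at the opening of each month.
conn.executemany("INSERT INTO fact_inventory_snapshot VALUES (?, ?, ?, ?, ?)", [
    ("2009-06", 1, 1, 10, 1000),
    ("2009-06", 2, 2, 5, 750),
    ("2009-06", 3, 1, 4, 400),
    ("2009-07", 1, 1, 8, 800),
])

# Question 1: opening quantity and amount per plant, per month.
opening_by_plant = conn.execute("""
    SELECT f.snapshot_month, s.plant, SUM(f.quantity), SUM(f.amount)
    FROM fact_inventory_snapshot f
    JOIN dim_store s ON s.store_key = f.store_key
    GROUP BY f.snapshot_month, s.plant
    ORDER BY f.snapshot_month, s.plant
""").fetchall()

for row in opening_by_plant:
    print(row)
```

Grouping by s.store_name instead of s.plant gives the store-level view asked for in question 4. Cost centers could be reached through dim_product.product_group, since the scenario categorizes cost centers by product group.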



Q&A



Thank you
