IBM Data Model1
Introduction
www.ilearnonline.co.in
Agenda
Data Modeling
Different types of models
Operational Systems: 3rd NF (ER Modeling)
Enterprise Data Modeling
Modeling Lifecycle
Modeling stakeholders
Validating models
Questions
Model
A business can be viewed from many perspectives:
How it works is defined by the process model.
How the organization is structured is defined by the org chart.
Where the organization operates is explained by a location or geo map.
How data flows through an application is shown by a data flow diagram.
What information is needed to run the business is defined by the data model.
Types of Models
Contextual
  Define the main terms and definitions for the high-level entities.
  Not every project has this model; it is created only when the scope is big.
Conceptual (Overview)
  Important entities which hold the application together
  Business concepts and rules
  Identifying the relationships between the entities
Logical
  Complete, detailed requirements, resolving the M:M relationships identified in the Conceptual Model
  Detailing all the attributes needed for every entity
  Providing relationships in terms of PK and FK
Physical Model
  Exploiting the RDBMS to take advantage of the model
Contextual
Conceptual
Logical
Physical
Usually there is a 1:5 to 1:7 ratio of entities between different models.
A data model is the place where you maintain information about things (entities).
The facts about those things are stored as rows.
It is a real thing, not a technology.
It is stable as long as the business does not change; when the business changes, the data model has to be flexible enough to accommodate that change.
It drives consistency in dealing with information.
It should always start simple.
A data model is relevant to the business.
As the business rules and requirements get complex, the painful detail comes.
To be a better data modeler you need both domain knowledge and good exposure to data modeling concepts.
A data model is a living thing, so expect changes.
It is our visualization of information; a proactive vision of the business is key to building a model that does not change often.
A data model is a non-technical description of your business in terms of the things it needs to know about.
By following a few techniques a business analyst can build great data models:
  Collect all the facts about the business through knowledge transfer sessions, and spend at least 10% of your time reading about your business and its processes.
  Take part in discussions with the various business line managers to get an overview of what they are doing.
Information Visualization
Business
Line of Business
Business Process
Activity
Information
Data
Information Visualization
Business: HDFC
Line of Business: Credit Card
Business Process: Customer Care
Activity: Customer verification
Information: Verifier, address, etc.
Data
Zachman Framework
It explains the major processes.
It focuses on a specific problem area (scope).
Master entities are defined.
At this level we never talk about the data that will be stored (no details).
It is an excellent starting point for data modeling.
Vocabulary: define the terms.
Conceptual Model
  Entities are identified
  Not normalized
  M:M relationships
  No keys
  Can list the important attributes
Logical Data Model
  M:M resolved
  Shows PK, FK and all attributes
  Fully normalized
  All attributes are atomic
Customers / markets
Products
Services
Create a model to hold all the employee information and the career path each individual chooses.
Each career path has a certain set of training programs.
Individuals can choose training programs based on their current capabilities and future needs.
Assign individuals as participants in upcoming training programs based on the requests they have made.
Fill each training program with manager-recommended candidates first, then with the requested resources, based on the seats available in the program.
Create the set of training programs and associate the proposed candidates based on the business rules.
The system should be able to capture attendance for each training program.
Logical Model
Requirement gathering and understanding the business process are the foundation for getting the right data model.
  Gather the information
  Questions and answers
  Analyze and confirm
  Add to model
Make the above 4 steps an iterative process before freezing the model.
Apply the business scenarios and check whether the data model accommodates them.
Customer
Order contains
Product
About entities
An entity is a person, place, thing, event or anything else of interest to the enterprise about which facts may be recorded.
Name it using a real-world term.
Eventually an entity becomes a table in a relational database.
Examples: Employee, Region, Department, Customer
Entities should not be designed from input or output formats (screens and reports); that will not give enough flexibility in the data model.
Entity Definition
Entity example
Building
A ground-based structure which supports the company's business operations by housing employees, equipment, or supplies.
Examples: office buildings, retail stores, plants and warehouses
Can be owned or leased
May or may not have a street address
Key Points
We write entity definitions to avoid misunderstandings between entities.
They are crucial, so write them early.
Entity Types
Kernel
  Can exist independently
  Ideally, it is the starting point for modeling
  Everything else either further describes, associates or classifies the kernel entities
Associative
  Relates two other entities
  Evolves from resolving M:M relationships
  Important associative entities are shown in the conceptual model; the remaining ones are detailed in the logical model
Types
  A kind of kernel entity that classifies or categorizes other entities
  Example: attribute Customer Type in the customer table, or Account Type and the Account table
Transaction tables
Attributes
A property of a thing that can be expressed as a piece of information; one of the facts about things that must be maintained.
Attributes are the properties of entities.
Example: for the customer entity, the following are the attributes
Questions to ask
  Is it a fundamental attribute or could it be a derived attribute?
  Business rules associated at the attribute level
  Exceptions
An association between two things (entities) is called a relationship.
We have three different types of relationships in RDBMS modeling:
  1:1 (One to One): rare
  1:M (One to Many): common
  M:M (Many to Many): more in the conceptual model, none in the logical and physical models
Examples
  1:1 (Person to PAN ID)
  1:M (Customer to Phone)
  M:M (Doctor and Patient)
M:M relationships
No RDBMS supports the M:M relationship directly.
Only the conceptual model can contain this relationship.
In the logical model, we have to resolve it.
We resolve M:M by using the associative table concept.
doctor
Patient
doctor
Patient
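The doctor/patient resolution can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `appointment` table, its `visit_date` column and the sample rows are assumptions, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doctor  (doctor_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Associative table (hypothetical name): one row per doctor/patient visit.
CREATE TABLE appointment (
    doctor_id  INTEGER NOT NULL REFERENCES doctor(doctor_id),
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    visit_date TEXT    NOT NULL,
    PRIMARY KEY (doctor_id, patient_id, visit_date)
);
INSERT INTO doctor  VALUES (1, 'Dr. A'), (2, 'Dr. B');
INSERT INTO patient VALUES (10, 'P1'), (11, 'P2');
INSERT INTO appointment VALUES
    (1, 10, '2011-04-23'), (1, 11, '2011-04-24'), (2, 10, '2011-04-25');
""")

# One doctor sees many patients ...
patients_of_dr_a = [r[0] for r in conn.execute(
    "SELECT p.name FROM appointment a JOIN patient p USING (patient_id) "
    "WHERE a.doctor_id = 1 ORDER BY p.name")]
# ... and one patient sees many doctors.
doctors_of_p1 = [r[0] for r in conn.execute(
    "SELECT d.name FROM appointment a JOIN doctor d USING (doctor_id) "
    "WHERE a.patient_id = 10 ORDER BY d.name")]
print(patients_of_dr_a, doctors_of_p1)
```

The M:M relationship is replaced by two 1:M relationships that meet in the associative table.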
M:M relationship
An M:M relationship is always intersected by some other parameter; in this case it is the orders.
In the case of a survey you will have multiple questions, so the set of questions becomes the parameter.
In this case a specific course can be conducted in more than one branch, and one branch conducts different courses.
Special cases
Recursion
Empno and mgrno are stored in the same entity.
Mgrno is also an employee number.
Usually this is implemented as a self-referential integrity constraint.
We define mgrno as a foreign key which refers to the PK of the same table.
A recursive relationship is fully optional.
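A minimal sketch of the recursive relationship, using sqlite3. The `employee` table name, the `ename` column and the sample rows are assumptions for illustration; only `empno` and `mgrno` come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("""
CREATE TABLE employee (
    empno INTEGER PRIMARY KEY,
    ename TEXT NOT NULL,
    mgrno INTEGER REFERENCES employee(empno)  -- nullable: the relationship is optional
)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Head", None), (2, "Lead", 1), (3, "Analyst", 2)])

# A manager number that is not an existing employee number is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (4, 'Ghost', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Self-join: each employee with the name of their manager (None at the top).
pairs = conn.execute("""
    SELECT e.ename, m.ename
    FROM employee e LEFT JOIN employee m ON m.empno = e.mgrno
    ORDER BY e.empno""").fetchall()
print(rejected, pairs)
```

The nullable `mgrno` is what makes the relationship fully optional: the top of the hierarchy has no manager.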
customer
account
phone
Call records
Sample Tables
customer
Cust_id | Cust_name | Email id | Contact name | Cust_since
100 | Citi | [email protected] | Bill H | 10-JAN-08
101 | HSBC | [email protected] | Tim D | 15-APR-10
102 | SBI | [email protected] | Ram K | 12-APR-11

account
Act_id | Cust_id | No_of_phones
1234 | 100 | 2
1235 | 101 | 3
1236 | 100 | 2
1237 | 102 | 1

Call records
Call_id | Duration | Phone_no
56789 | 14 | 123456776
56790 | 15 | 987654321
-------- | -------- | --------
098756782 1234

phone
Phone_no | Act_id
123456776 | 1234
987654321 | 1235
456780987 | 1234
The current system generates bills based on the customer id. Today Bank A bought Bank B, so from now on Bank B's bills should be generated to Bank A only. How do we change this model?
Normalization
It is a methodology we follow to make sure there is no redundant data in the data model.
It is part of the logical modeling process within database design.
Advantages
  Reduced space in the database
  Transaction speed increases
Disadvantage
  Complex queries: a query that joins a larger number of tables tends to perform worse.
customer
Cust_id Cust_name Cust_dob Cust_phone Cust_email Cust_city
Assume we have 10000 rows in this table.
We have customers from 5 different cities.
Mumbai has 2000 customers.
One of the city names changed again (Mumbai -- > ).
One activity happened in the real world. What statement will you issue to record that change in our data? How many records did you change?
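The question can be tried out with a small sqlite3 sketch. The table names `customer_flat`, `city` and `customer`, the placeholder city names other than Mumbai, and the target value 'NEW_NAME' are all assumptions for illustration; the slide does not give the new city name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cities = ["Mumbai", "City2", "City3", "City4", "City5"]  # placeholders except Mumbai

# Denormalized: the city name is repeated on every customer row.
conn.execute("CREATE TABLE customer_flat (cust_id INTEGER PRIMARY KEY, cust_city TEXT)")
conn.executemany("INSERT INTO customer_flat VALUES (?, ?)",
                 [(i, cities[i % 5]) for i in range(10000)])

# Normalized: customers point at a single row per city.
conn.execute("CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT)")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, "
             "city_id INTEGER REFERENCES city(city_id))")
conn.executemany("INSERT INTO city VALUES (?, ?)", list(enumerate(cities)))
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(i, i % 5) for i in range(10000)])

# One real-world event, recorded in each design ('NEW_NAME' is a placeholder).
flat_changed = conn.execute(
    "UPDATE customer_flat SET cust_city = 'NEW_NAME' "
    "WHERE cust_city = 'Mumbai'").rowcount
norm_changed = conn.execute(
    "UPDATE city SET city_name = 'NEW_NAME' "
    "WHERE city_name = 'Mumbai'").rowcount
print(flat_changed, norm_changed)  # 2000 rows versus 1 row
```

In the flat table one business event touches 2000 rows; after moving the city into its own entity it touches one.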
Normalization process
It is about breaking entities into their most granular form.
1st Normal Form (1NF)
  Every attribute must be atomic
  Repeating attributes are moved to a separate entity
2nd Normal Form (2NF)
  Every non-key attribute must be fully functionally dependent on the whole primary key, not on part of a concatenated key
3rd Normal Form (3NF)
  Every non-key attribute must depend directly on the primary key, not on another non-key attribute

No repeating elements or groups of elements: 1NF
No partial dependencies on a concatenated key: 2NF
No dependencies on non-key attributes: 3NF
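A partial dependency can be spotted in sample data with a small helper. The sketch below is illustrative only: the `holds` function and the order-line rows are hypothetical, and a dependency that holds in sample rows is a hint to confirm with the business, not proof.

```python
def holds(rows, lhs, rhs):
    """Return True if, in these rows, the lhs columns determine the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # The first value seen for a key must match every later one.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical order lines with the concatenated key (order_id, product_id).
rows = [
    {"order_id": 1, "product_id": "P1", "product_name": "Pen",  "qty": 2},
    {"order_id": 1, "product_id": "P2", "product_name": "Book", "qty": 1},
    {"order_id": 2, "product_id": "P1", "product_name": "Pen",  "qty": 5},
]

# product_name depends on product_id alone: a partial dependency (breaks 2NF).
partial = holds(rows, ["product_id"], ["product_name"])
# qty is not determined by product_id alone; it needs the whole key.
qty_needs_full_key = not holds(rows, ["product_id"], ["qty"])
print(partial, qty_needs_full_key)
```

Here `product_name` would move to a separate product entity, while `qty` stays with the order line.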
Table to be normalized
CUSTOMER | VEHICLE | COST | DATE_OUT | DATE_IN | CUSTOMER_PHONE | CUSTOMER_CITY
Dolye, Dawn | Ford | 59.99 | 10-SEP-01 | 12-SEP-01 | 123789489 | DALLAS
Davidow, Joel | chevy | 79.99 | 12-OCT-01 | 15-OCT-01 | 879393399 | NEW YORK
Fox, Valerie | Suzuki | 69.99 | 15-NOV-01 | 19-NOV-01 | 89303809 | DALLAS
Vidal, Alina | Honda | 39.99 | 24-NOV-01 | 25-NOV-01 | 74990039 | NEW YORK
Uma T | Honda | 39.99 | 23-APR-11 | 25-APR-11 | 89009000 | New york
Dolye, Dawn | Ford | 59.99 | 15-jun-11 | 19-jun-11 | 123789489 | DALLAS
This spreadsheet-style data tracks all the transactions that happen when a customer rents a vehicle. It has some customer information, vehicle information, and the rented-out and returned dates. Normalize this table based on your assumptions.
Normalized model
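One possible normalization of the rental table, sketched in Python. It rests on assumptions that the exercise leaves open: the customer name identifies a customer, the vehicle make identifies a vehicle, and the cost depends on the vehicle only. The surrogate ids and dictionary keys are hypothetical.

```python
rows = [
    ("Dolye, Dawn",   "Ford",   59.99, "10-SEP-01", "12-SEP-01", "123789489", "DALLAS"),
    ("Davidow, Joel", "chevy",  79.99, "12-OCT-01", "15-OCT-01", "879393399", "NEW YORK"),
    ("Fox, Valerie",  "Suzuki", 69.99, "15-NOV-01", "19-NOV-01", "89303809",  "DALLAS"),
    ("Vidal, Alina",  "Honda",  39.99, "24-NOV-01", "25-NOV-01", "74990039",  "NEW YORK"),
    ("Uma T",         "Honda",  39.99, "23-APR-11", "25-APR-11", "89009000",  "New york"),
    ("Dolye, Dawn",   "Ford",   59.99, "15-jun-11", "19-jun-11", "123789489", "DALLAS"),
]

customers, vehicles, rentals = {}, {}, []
for name, make, cost, date_out, date_in, phone, city in rows:
    # Customer facts are stored once per customer (assumed key: name).
    cust = customers.setdefault(
        name, {"cust_id": len(customers) + 1, "phone": phone, "city": city.upper()})
    # Vehicle facts are stored once per vehicle (assumed key: make).
    veh = vehicles.setdefault(
        make, {"vehicle_id": len(vehicles) + 1, "cost": cost})
    # The rental keeps only the keys and the facts about the rental event.
    rentals.append((cust["cust_id"], veh["vehicle_id"], date_out, date_in))

print(len(customers), len(vehicles), len(rentals))
```

The repeat customer and the repeated vehicles each collapse to a single row, and the rental entity carries only foreign keys plus the two dates.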
Tables to analyze
Store name | Address | Jan_sales | Feb_Sales | Mar_sales | Apr_Sales | May_sales | June_sales
ABC Stores | 123, MGRoad, Chennai | 345609 | 94040 | 45958 | 748490 | 849938 | 84949
BBC Stores | 124, 5th Cross, Anna Nagar, CHN | 849409 | 440400 | 9840940 | 89989 | 456655 | 23455
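The month columns in this table are a repeating group. A minimal sketch of moving them into rows, one per store and month, is shown below; the variable names and the (store, month, amount) layout are assumptions, and the address column is left out for brevity.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "June"]

# Wide layout: one column per month, as in the table to analyze.
wide = {
    "ABC Stores": [345609, 94040, 45958, 748490, 849938, 84949],
    "BBC Stores": [849409, 440400, 9840940, 89989, 456655, 23455],
}

# Long layout: one row per (store, month), so a new month adds rows, not columns.
store_sales = [
    (store, month, amount)
    for store, amounts in wide.items()
    for month, amount in zip(months, amounts)
]

for row in store_sales[:3]:
    print(row)
```

With the long layout, adding July means inserting rows rather than altering the table.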
An agent is the resource through which we sell products to our end customers. Without an agent we cannot sell any product.
Products have different life spans, minimum payments, minimum terms, etc.
A product can be sold only during its offer period.
The company provides training to agents when a new product is launched. Only agents who have completed the training can sell that product.
When a customer buys a product through an agent, we create a contract called a policy.
Each policy number uniquely identifies a product, a customer and an agent.
We get revenue once the contract is in place and have to pay commission to the agent. The commission depends on the premium they pay.
The system should be able to track premium payment data and notify the respective agents to follow up on payments which are due.
We need to store multiple addresses for each customer, and each agent's family, addresses and bank/account information for payment processing.
The organization should be able to generate the expected commission for next month based on the policy premiums we expect.
agent commission
Policy transactions
Normalization
1NF, 2NF and 3NF break entities into their most granular form.
4NF and 5NF are also about granularity, but only the granularity of associative entities with 3 or more parents.
AB, BC
5NF: AB, BC, CA
Example
This is OK if any combination is valid.
Program
Worker
Role
Assignment
BUT..
Workers have defined roles
Programs require certain roles
Workers are assigned to programs independent of roles
Then we end up writing lots and lots of code to implement these business rules.
If we normalize it, some of the rules are taken care of automatically.
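The three rules can be held as three two-parent relations, with the valid assignments derived by joining them. The sketch below uses plain Python sets; the worker, role and program values are made up for illustration.

```python
# Three binary relations, one per business rule (sample values are hypothetical).
worker_role = {("W1", "Tester"), ("W1", "Developer"), ("W2", "Tester")}
program_role = {("Billing", "Tester"), ("Billing", "Developer"), ("CRM", "Tester")}
worker_program = {("W1", "Billing"), ("W2", "CRM")}

# A (worker, role, program) assignment is valid only if all three pairs exist.
assignment = {
    (w, r, p)
    for (w, r) in worker_role
    for (p, r2) in program_role if r2 == r
    for (w2, p2) in worker_program if w2 == w and p2 == p
}

for row in sorted(assignment):
    print(row)
```

Because the assignment set is derived from the three relations, a combination that breaks a rule cannot be stored; no extra validation code is needed for it.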
Modeling Approach
Top Down
A good way of getting a model.
We think outside the box because we talk about entities, how two entities relate to each other, etc.
Think about the business in a broader sense by working with the business analysts / subjects.
Most companies follow the top-down approach and validate the model against scenarios.
Bottom Up
The model is created from the outputs.
It is sometimes a kind of reverse engineering.
Easy to normalize, but we may miss the bigger picture.
Internal department
In this scenario, an order can be placed only by a customer or by an internal department.
So the relationships are mutually exclusive (solid lines: exactly one).
Another example is the payment table in a telecom billing system, which can have credit_card, cash or check. For one payment you accept exactly one of these three, so they are mutually exclusive.
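One common way to enforce a mutually exclusive relationship in a table is a CHECK constraint over the two foreign key columns. The sqlite3 sketch below assumes hypothetical column names `customer_id` and `department_id`; it is one option, not the only design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    customer_id   INTEGER,   -- set when a customer placed the order
    department_id INTEGER,   -- set when an internal department placed it
    -- Exactly one of the two columns must be NULL, so exactly one is filled.
    CHECK ((customer_id IS NULL) + (department_id IS NULL) = 1)
)""")

def try_insert(row):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [try_insert(r) for r in [
    (1, 100, None),   # customer only: accepted
    (2, None, 7),     # department only: accepted
    (3, 100, 7),      # both: rejected
    (4, None, None),  # neither: rejected
]]
print(results)
```

The rule lives in the table definition, so every application writing orders gets the same check.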
Television
Internet
In this scenario, an advertisement can be placed in different media.
So the relationships are mutually inclusive (dotted lines: one or more).
state
branches contract Agent
Contract type
category
city
product
Cust_type
www.aroha.co.in
Do not change the stored data; if you do, you lose the history and reporting becomes as-is only.
Add new records when there is a change. This way you track the history of changes to that record and when each change happened.
If corrections are allowed, capture the correction date as well, so that you have all the information needed to trace back what happened.
Have audit columns in place to capture the time and history.
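A minimal append-only sketch of this idea with sqlite3. The table `customer_city_hist`, its columns and the sample dates are assumptions for illustration; `recorded_at` plays the role of an audit column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer_city_hist (
    cust_id        INTEGER NOT NULL,
    city           TEXT    NOT NULL,
    effective_date TEXT    NOT NULL,                    -- ISO date of the change
    recorded_at    TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- audit column
    PRIMARY KEY (cust_id, effective_date)
)""")

def record_city(cust_id, city, effective_date):
    """Add a new row for every change; existing rows are never updated."""
    conn.execute(
        "INSERT INTO customer_city_hist (cust_id, city, effective_date) "
        "VALUES (?, ?, ?)", (cust_id, city, effective_date))

def city_as_of(cust_id, as_of):
    """Return the city that was valid on the given date, or None."""
    row = conn.execute(
        "SELECT city FROM customer_city_hist "
        "WHERE cust_id = ? AND effective_date <= ? "
        "ORDER BY effective_date DESC LIMIT 1", (cust_id, as_of)).fetchone()
    return row[0] if row else None

record_city(100, "DALLAS", "2008-01-10")
record_city(100, "NEW YORK", "2011-04-12")
print(city_as_of(100, "2010-06-01"), city_as_of(100, "2012-01-01"))
```

Reports can then be run as of any date, rather than only against the current state.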
Comments
Almost the same as the Logical Model, with little variation.
Take advantage of the database you are dealing with.
Make sure all the PKs and FKs are numbers. (Try a POC before committing to this.)
Examples:
  Should I use partitions, materialized views, external tables, etc.?
  Should we use a columnar database (Vertica, open-source DB) for certain data marts?
  Designing a rolling window based on partitions to improve performance.
  Creating views to simulate the scenario rather than increasing the database size (especially for small tables).
Logical model as the input.
Factors of the DB on which we deploy the tables (Oracle, SQL Server, DB2, MySQL, etc.)
Data velocity factors (frequency of activities in the business)
Initial load data and incremental load planning
Data volume calculation (capacity planning)
Identifying the importance of tables in terms of joins and table data volume
Deciding partitions, indexes, views, materialized views
Implementing a backup and recovery mechanism for the system.
hp sells its products through channel partners. The scope of the project is limited to sales.
Through the presales system hp generates quotes and provides them to channel partners; channel partners sell the products to end customers based on the quote (this is outside of this system).
Certain channel partners can sell only certain hp products.
An end customer who buys hp products through a channel partner can be an individual or a company (end customers are not part of the scope).
Based on the sales made by a channel partner, hp has to raise invoices to that channel partner.
The system should be able to store the grade of each channel partner.
hp pays commission to channel partners automatically once every three months, based on the sales they made.
The system should be able to store the addresses, different kinds of contacts and bank information of channel partners.
Channel partner payments against our invoices and commission payments to channel partners are tracked only in this system.
De-normalization
A process by which we increase query performance.
Used especially for reporting, not to speed up transaction processing.
To get the best of both worlds, create the normalized model for faster transaction processing and take advantage of Oracle's materialized views to make reports run faster.
We are making another copy of the data, but the system takes care of it. This way we do not introduce new bugs.
Insurance Business
The company sells insurance to various customers through agents.
We sell different kinds of insurance policies such as risk, kids' education, endowment, pension, etc.
Nominations must exist for every policy; a policy can have at most 2 nominations.
A policy can be surrendered once the minimum number of premiums has been paid. This number varies between policies.
Customers can get a loan against a policy they hold.
Agent commission should be processed by the system. The commission differs based on the policy as well as the premium paid.
Charges apply when a policy is surrendered.
When a premium is paid late, interest should be added to the payment.
Build a conceptual model and a logical model for the business described above.
Telephone Billing
A customer comes and buys a telephone. A customer can have multiple phones.
Customers can be either corporate or individual.
Billing should be flexible enough to group certain phone numbers, so that a consolidated bill can be sent to different groups within the corporation.
The billing cycle can be decided by the customer.
We must maintain different addresses for the customer.
The bill must always be mailed to the Billing Address, with the flexibility to send it to an electronic address as well.
A customer can subscribe to multiple services such as wireless and internet.
A customer can select a specific plan (rate).
Assume we want to create a data model for a retail chain. The model should store supplier information and the order management through which we place orders with all the suppliers. We have one warehouse, from which we distribute to the various stores in the city, and we store the point-of-sale data. A customer can return goods within 7 days. Some products cannot be taken back. One employee can work in multiple stores.
Trg fact
Enroll fact
Premium fact
Comm. fact
claims fact
workflow fact
product
time
Payment type
channel
location
Tran_type_id desc
Cust Policies
cust_dim: Cust_id, Name, Address, Phone, State_id
txn_fact
emp_dim
policy_dim
Quarter_lookup: Quarter_id, Quarter, Year_id
month_lookup: Month_id, Cal_month, Quarter_id
Week_lookup: Week_id, Cal_week, Month_id
Cal_date_id, Date, Week_id
Online Sales
Call Center
Thank you
Questions