Data Warehousing & DATA MINING (SE-409) : Lecture-2

Here are the key rules for first normal form (1NF): - Each column should contain a single value (atomicity) - no repeating groups of values. - The domain (set of possible values) of each column should be well-defined and not change for different rows. - Each row must be uniquely identifiable by its primary key. - Columns have unique names to avoid confusion. The goal of 1NF is to eliminate repeating groups and ensure each cell contains a single value. If a table follows these rules, it is considered to be in first normal form. Normalization helps reduce data redundancy and ensures data dependencies make logical sense.

Uploaded by

Huma Qayyum MohyudDin

Data Warehousing & DATA MINING (SE-409)
Lecture-2
Introduction and Background

Huma Ayub
Software Engineering department
University of Engineering and Technology, Taxila

How is it Different?
• Starts with a 6x12 availability requirement ... but 7x24 usually becomes the goal.
  - Decision makers typically don't work 24 hrs a day and 7 days a week. An ATM (OLTP) system does.
  - Once decision makers start using the DWH, and start gaining the benefits, they start liking it.
  - They start using the DWH more often, until they want it available 100% of the time.
  - For businesses across the globe, 50% of the world may be sleeping at any one time, but the businesses are up 100% of the time.
  - 100% availability is not a minor task; loading strategies, refresh rates, etc. must be taken into account.
DWH-Ahsan Abdullah
How is it Different?
• Does not follow the traditional development model.

Requirements → Program

Classical SDLC
  - Requirements gathering
  - Analysis
  - Design
  - Programming
  - Testing
  - Integration
  - Implementation
How is it Different?
• Does not follow the traditional development model.

DWH → Program → Requirements

DWH SDLC (CLDS)
  - Implement warehouse
  - Integrate data
  - Test for bias
  - Program w.r.t. data
  - Design DSS system
  - Analyze results
  - Understand requirements
Data Warehouse Vs. OLTP

OLTP (On Line Transaction Processing)

SELECT tx_date, balance
FROM tx_table
WHERE account_ID = 23876;
Data Warehouse Vs. OLTP

DWH

SELECT balance, age, sal, gender
FROM customer_table, tx_table
WHERE age BETWEEN 30 AND 40
  AND Education = 'graduate'
  AND customer_table.CustID = tx_table.Customer_ID;
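The contrast between the two query styles can be tried out with Python's built-in sqlite3 module. This is only a sketch: the column types, the tx_id column, and every row of sample data are assumptions made up for illustration; only the table names, the column names used in the two queries, and the filter values come from the slides.

```python
import sqlite3

# Hypothetical schema and rows, invented only to make the two slide queries runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_table (
    CustID INTEGER PRIMARY KEY, age INTEGER, sal INTEGER,
    gender TEXT, Education TEXT
);
CREATE TABLE tx_table (
    tx_id INTEGER PRIMARY KEY, Customer_ID INTEGER, account_ID INTEGER,
    tx_date TEXT, balance INTEGER
);
INSERT INTO customer_table VALUES
    (1, 35, 5000, 'F', 'graduate'),
    (2, 52, 7000, 'M', 'graduate'),
    (3, 31, 4000, 'M', 'graduate'),
    (4, 38, 3000, 'F', 'undergraduate');
INSERT INTO tx_table VALUES
    (10, 1, 23876, '2020-01-05', 900),
    (11, 1, 23876, '2020-01-09', 750),
    (12, 2, 11111, '2020-01-06', 300),
    (13, 3, 22222, '2020-01-07', 120),
    (14, 4, 33333, '2020-01-08', 640);
""")

# OLTP style: one account, one table, highly selective.
oltp_rows = conn.execute(
    "SELECT tx_date, balance FROM tx_table WHERE account_ID = ? ORDER BY tx_date",
    (23876,),
).fetchall()

# DWH style: a join filtered on non-key attributes, so many rows may qualify.
dwh_rows = conn.execute(
    """
    SELECT t.balance, c.age, c.sal, c.gender
    FROM customer_table AS c
    JOIN tx_table AS t ON c.CustID = t.Customer_ID
    WHERE c.age BETWEEN 30 AND 40
      AND c.Education = 'graduate'
    ORDER BY t.tx_id
    """
).fetchall()

print(oltp_rows)
print(dwh_rows)
```

The OLTP query names one account and reads only tx_table, while the DWH query has to join both tables and scan on age and education, which is why it is described as having low selectivity.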
Data Warehouse Vs. OLTP
OLTP: OnLine Transaction Processing (MIS or Database System)

OLTP                               DWH
Primary key used                   Primary key NOT used
No concept of Primary Index        Primary index used
Few rows returned                  Many rows returned
May use a single table             Uses multiple tables
High selectivity of query          Low selectivity of query
Indexing on primary key (unique)   Indexing on primary index (non-unique)
Putting the pieces together

[Figure: four-tier data warehouse architecture.
 Tier 0, Data sources: operational databases, semistructured sources, www data, archived data.
 Tier 1, Data Warehouse Server: Extract, Transform, Load (ETL) into the warehouse, with metadata and data marts.
 Tier 2, OLAP Servers: MOLAP, ROLAP.
 Tier 3, Clients/Tools: query/reporting, analysis, data mining, used by business users and IT users.]
Types & Typical Applications of DWH

Types of data warehouse

• Financial
• Telecommunication
• Insurance
• Human Resource
• Global
• Exploratory

Types of data warehouse
Financial
• First data warehouse that an organization builds. This is appealing because:
  - Nerve center, easy to get attention.
  - Most organizations start work from the smallest data set [due to risk factor, more complexity].
  - Touches all aspects of an organization, with a common denominator, i.e. money.
Types of data warehouse
Telecommunication
• Dominated by the sheer volume of data.
• Many ways to accommodate call level detail:
  - Only a few months of call level detail,
  - Storing lots of call level detail scattered over different storage media,
  - Storing only selective call level detail, etc.
  - Unfortunately, for many kinds of processing, working at an aggregate level is simply not possible.
Types of data warehouse
Insurance
Insurance data warehouses are similar to other data warehouses, BUT with a few exceptions:
• Stored data is very, very old and is used for actuarial processing (risk assessment).
• A typical business may have changed dramatically over the last 40-50 years, but not insurance.
• In retailing or telecom there are a few important dates, but in the insurance environment there are many dates of many kinds.
• Long operational business cycles, measured in years. Processing time is in months. Thus the operating speed is different.
• Transactions are not gathered and processed, but are kind of "frozen".
• Thus a very unique approach to design & implementation.
Typical Applications
Impact on organization’s core business is to streamline
and maximize profitability.

• Fraud detection.
• Profitability analysis.
• Direct mail/database marketing.
• Credit risk prediction.
• Customer retention modeling.
• Yield management.
• Inventory management.

Typical Applications
Fraud detection

• By observing data usage patterns.

• People have typical purchase patterns.
• Deviation from patterns.
• Certain cities notorious for fraud.
• Certain items bought by stolen cards.
• Similar behavior for stolen phone cards.

Typical Applications
Profitability Analysis
• Every bank knows whether it is profitable or not.
• But it doesn't know which customers are profitable.
• Typically more than 50% are NOT profitable.
• Which ones? The bank doesn't know.
• Balance is not enough, transactional behavior is the key.
• Restructure products and pricing strategies.
• Life-time profitability models (next 3-5 years).
Typical Applications
Direct mail marketing

• Targeted marketing.
• Offering high bandwidth package NOT to all
users.
• Know from call detail records of web surfing.
• Saves marketing expense, saving pennies.

Typical Applications
Credit risk prediction

• Who should get a loan?

• Customer separation i.e. stable vs. rolling.
• Qualitative decision making NOT subjective.
• Different interest rates for different customers.
• Do not fund bad customer on the basis of good.

Normalization
• What is normalization?
• What are the goals of normalization?
  - Eliminate redundant data.
  - Ensure data dependencies make sense.
• What is the result of normalization?
• What are the levels of normalization?
Rules for First Normal Form
The first normal form expects you to follow a few simple rules while designing your database, and they are:

Rule 1: Single Valued Attributes

Each column of your table should be single valued, which means it should not contain multiple values. We will explain this with the help of an example later; let's see the other rules for now.

Rule 2: Attribute Domain should not change

This is more of a "Common Sense" rule. In each column the values stored must be of the same kind or type.

For example: if you have a column dob to save the dates of birth of a set of people, then you must not save the names of some of them in that column along with the dates of birth of others. It should hold only a date of birth for all the records/rows.
Rule 3: Unique name for Attributes/Columns

This rule expects that each column in a table should have a unique name. This is to avoid confusion at the time of retrieving data or performing any other operation on the stored data. If two or more columns have the same name, the DBMS cannot tell which one is meant.

Rule 4: Order doesn't matter

This rule says that the order in which you store the data in your table doesn't matter.

Time for an Example

Here is our table, with some sample data added to it.
roll_no  name  subject
101      Akon  OS, CN
103      Ckon  Java
102      Bkon  C, C++

The subject column holds more than one value for students 101 and 102, which violates Rule 1.

How to solve this Problem?
It's very simple, because all we have to do is break the values into atomic values.
Here is our updated table and it now satisfies the First Normal Form.

roll_no  name  subject
101      Akon  OS
101      Akon  CN
103      Ckon  Java
102      Bkon  C
102      Bkon  C++
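The step of breaking multi-valued cells into atomic values can be sketched in Python. The function name to_first_normal_form is hypothetical, and the sketch assumes that multiple subjects are always written as a comma-separated list, as in the sample table.

```python
# Rows from the un-normalized sample table: (roll_no, name, subject).
students = [
    (101, "Akon", "OS, CN"),
    (103, "Ckon", "Java"),
    (102, "Bkon", "C, C++"),
]

def to_first_normal_form(rows):
    """Emit one row per (student, subject) so every cell holds a single value."""
    atomic_rows = []
    for roll_no, name, subjects in rows:
        # Assumption: subjects in one cell are separated by commas.
        for subject in subjects.split(","):
            atomic_rows.append((roll_no, name, subject.strip()))
    return atomic_rows

atomic = to_first_normal_form(students)
for row in atomic:
    print(row)
```

Each student with several subjects now appears on several rows, so roll_no alone no longer identifies a row; that repetition is what the later normal forms deal with.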
Second Normal Form
• For a table to be in the Second Normal form, it
should be in the First Normal form and it should not
have Partial Dependency.
• Partial Dependency exists, when for a composite
primary key, any attribute in the table depends only
on a part of the primary key and not on the complete
primary key.
• To remove Partial dependency, we can divide the
table, remove the attribute which is causing partial
dependency, and move it to some other table where
it fits in well.
Let's create another table for Subject, which will have subject_id and subject_name fields, and subject_id will be the primary key.

subject_id  subject_name
1           Java
2           C++
3           Php

Let's create another table, Score, to store the marks obtained by students in the respective subjects. We will also be saving the name of the teacher who teaches that subject along with the marks.

score_id  student_id  subject_id  marks  teacher
1         10          1           70     Java Teacher
2         10          2           75     C++ Teacher
3         11          1           80     Java Teacher

In the Score table we are saving the student_id to know which student's marks these are, and the subject_id to know which subject the marks are for.

Together, student_id + subject_id forms a Candidate Key for this table, which can be the Primary key.

Confused about how this combination can be a primary key?

See, if I ask you to get me the marks of the student with student_id 10, can you get them from this table? No, because you don't know for which subject. And if I give you a subject_id, you would not know for which student. Hence we need student_id + subject_id to uniquely identify any row.
But where is Partial Dependency?
• Now if you look at the Score table, we have a column named teacher which depends only on the subject: for Java it's Java Teacher, for C++ it's C++ Teacher, and so on.
• As we just discussed, the primary key for this table is a composite of two columns, student_id and subject_id, but the teacher's name depends only on the subject, hence on subject_id, and has nothing to do with student_id.
• This is Partial Dependency, where an attribute in a table depends on only a part of the primary key and not on the whole key.
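The reasoning above can be checked mechanically on the sample Score rows. The sketch below is an illustration only: the helper name determines is hypothetical, the truncated teacher value in the third row is read as "Java Teacher", and a check on three sample rows can refute a dependency but cannot prove that it holds for all possible data.

```python
from collections import defaultdict

# Score rows from the sample table: (score_id, student_id, subject_id, marks, teacher).
COLUMNS = ("score_id", "student_id", "subject_id", "marks", "teacher")
score = [
    (1, 10, 1, 70, "Java Teacher"),
    (2, 10, 2, 75, "C++ Teacher"),
    (3, 11, 1, 80, "Java Teacher"),
]

def determines(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always imply equal rhs values."""
    seen = defaultdict(set)
    for row in rows:
        record = dict(zip(COLUMNS, row))
        key = tuple(record[c] for c in lhs)
        seen[key].add(tuple(record[c] for c in rhs))
    return all(len(values) == 1 for values in seen.values())

# The full composite key identifies the marks; either half alone does not.
print(determines(score, ["student_id", "subject_id"], ["marks"]))
print(determines(score, ["student_id"], ["marks"]))
print(determines(score, ["subject_id"], ["marks"]))

# teacher is fixed by subject_id alone: this is the partial dependency.
print(determines(score, ["subject_id"], ["teacher"]))
```

A non-key column that is already determined by one part of a composite key, as teacher is by subject_id here, is the signal that the table is not in second normal form.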
How to remove Partial Dependency?
There can be many different solutions for this, but our objective is to remove the teacher's name from the Score table.
The simplest solution is to remove the teacher column from the Score table and add it to the Subject table. Hence, the Subject table will become:

subject_id  subject_name  teacher
1           Java          Java Teacher
2           C++           C++ Teacher
3           Php           Php Teacher

And our Score table is now in the second normal form, with no partial dependency.

score_id  student_id  subject_id  marks
1         10          1           70
2         10          2           75
3         11          1           80
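As a rough sketch of this decomposition, the Python below moves the teacher out of the Score rows into a lookup keyed by subject_id and then rebuilds the original rows to confirm nothing was lost. The function names decompose and rejoin are hypothetical, and the lookup holds only the subjects that actually appear in the Score rows (so Php is absent).

```python
# Score table before decomposition, with the teacher stored on every score row.
score_with_teacher = [
    # (score_id, student_id, subject_id, marks, teacher)
    (1, 10, 1, 70, "Java Teacher"),
    (2, 10, 2, 75, "C++ Teacher"),
    (3, 11, 1, 80, "Java Teacher"),
]

def decompose(rows):
    """Split rows into a teacher lookup keyed by subject_id and a slimmer score table."""
    teacher_by_subject = {}
    score = []
    for score_id, student_id, subject_id, marks, teacher in rows:
        # setdefault keeps the first teacher seen; a conflict would mean
        # subject_id does not actually determine teacher.
        if teacher_by_subject.setdefault(subject_id, teacher) != teacher:
            raise ValueError(f"subject_id {subject_id} maps to two teachers")
        score.append((score_id, student_id, subject_id, marks))
    return teacher_by_subject, score

def rejoin(teacher_by_subject, score):
    """Rebuild the original rows by looking the teacher up through subject_id."""
    return [row + (teacher_by_subject[row[2]],) for row in score]

teachers, score = decompose(score_with_teacher)
print(teachers)
print(score)
print(rejoin(teachers, score) == score_with_teacher)
```

Because the teacher is now stored once per subject instead of once per score, changing a subject's teacher means updating a single entry rather than every matching score row.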
Third Normal Form (3NF)

• Requirements for Third Normal Form

• For a table to be in the third normal form,
• It should be in the Second Normal form.
• And it should not have Transitive Dependency.
• By transitive functional dependency, we mean
we have the following relationships in the
table: B is functionally dependent on A, and C
is functionally dependent on B (A → B → C). In this
case, C is transitively dependent on A via B.
• 3rd Normal Form Example
• Consider the following example:
