
DB Lecture W09a Normalization

Uploaded by

Zeshan Khan

Database Systems

Normalization
• The main objective of normalization is to eliminate anomalies
• Normalization is not mandatory, but it is highly recommended
– A database designed without normalization will still work
– However, skipping normalization at design time tends to create many problems later
• The conclusion: go for normalization
– The process of organizing data to minimize redundancy is called normalization.
• The goal of database normalization is
– to decompose relations that have anomalies
– to produce smaller, well-structured relations.

Normalization (Cont.)
• Normalization usually involves
– dividing large tables into smaller (and less redundant) tables
– and defining relationships between them.

• The objective is to isolate data so that
– additions, deletions, and modifications of a field can be made in just one table and
– propagated through the rest of the database via the defined relationships.
• E. F. Codd, the inventor of the relational model,
– introduced the concept of normalization
– and what is now known as the First Normal Form (1NF) in 1970

Normalization (Cont.)
– Codd went on to define the Second Normal Form (2NF) and Third Normal Form (3NF) in 1971.
– Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in 1974; BCNF is a stricter version of 3NF, not the Fourth Normal Form (4NF).
– Higher normal forms (e.g., 4NF and 5NF) were defined by other theorists in subsequent years.
• Informally, a relational database table (the computerized representation of a relation)
– is often described as "normalized" if it is in the Third Normal Form.
– Most 3NF tables are free of insertion, update, and deletion anomalies.

Anomalies
• Anomaly (flaw)
– A situation that can make a database incorrect or inconsistent.
• Anomalies occur easily if the database is not designed carefully
– They can be controlled if the designer is aware of them

• Types of Anomalies
1. Repetition Anomaly
2. Update Anomaly
3. Insertion Anomaly
4. Deletion Anomaly

Repetition Anomaly
• The ENAME, TITLE, SAL attribute values are repeated
for each project that the employee is involved in.
– Waste of space
– Complicates updates
EMP

ENO  ENAME      TITLE        SAL    PNO  RESP        DUR
E1   J. Doe     Elect. Eng.  40000  P1   Manager     12
E2   M. Smith   Analyst      34000  P1   Analyst     24
E2   M. Smith   Analyst      34000  P2   Analyst     6
E3   A. Lee     Mech. Eng.   27000  P3   Consultant  10
E3   A. Lee     Mech. Eng.   27000  P4   Engineer    48
E4   J. Miller  Programmer   24000  P2   Programmer  18
E5   B. Casey   Syst. Anal.  34000  P2   Manager     24
E6   L. Chu     Elect. Eng.  40000  P4   Manager     48
E7   R. Davis   Mech. Eng.   27000  P3   Engineer    36
E8   J. Jones   Syst. Anal.  34000  P3   Manager     40
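
The repetition in EMP can be measured with a short Python sketch. It assumes, as the data suggests, that ENAME, TITLE and SAL depend only on ENO; the names `employees` and `assignments` are illustrative and not part of the lecture.

```python
# Rows of the EMP relation: (ENO, ENAME, TITLE, SAL, PNO, RESP, DUR)
EMP = [
    ("E1", "J. Doe", "Elect. Eng.", 40000, "P1", "Manager", 12),
    ("E2", "M. Smith", "Analyst", 34000, "P1", "Analyst", 24),
    ("E2", "M. Smith", "Analyst", 34000, "P2", "Analyst", 6),
    ("E3", "A. Lee", "Mech. Eng.", 27000, "P3", "Consultant", 10),
    ("E3", "A. Lee", "Mech. Eng.", 27000, "P4", "Engineer", 48),
    ("E4", "J. Miller", "Programmer", 24000, "P2", "Programmer", 18),
    ("E5", "B. Casey", "Syst. Anal.", 34000, "P2", "Manager", 24),
    ("E6", "L. Chu", "Elect. Eng.", 40000, "P4", "Manager", 48),
    ("E7", "R. Davis", "Mech. Eng.", 27000, "P3", "Engineer", 36),
    ("E8", "J. Jones", "Syst. Anal.", 34000, "P3", "Manager", 40),
]

# Employee facts stored once per employee (duplicates collapse in the set).
employees = sorted({row[:4] for row in EMP})

# Assignment facts stored once per (employee, project) pair.
assignments = [(row[0],) + row[4:] for row in EMP]

# Number of rows in EMP that only repeat an already stored employee.
repeated = len(EMP) - len(employees)
print(len(employees), len(assignments), repeated)
```

With only ten rows the saving is small, but every repeated row is also a second place where a salary or title change has to be applied.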

Update Anomaly
• If an attribute stored in the Employees' Skills relation (say, an employee's address) is updated,
– multiple tuples have to be updated to reflect the change.
– If some tuples are missed, the data becomes inconsistent: for example, employee 519 is shown as having different addresses on different records
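
A minimal Python sketch of how such an inconsistency could be detected. The rows below are invented for illustration (only the employee number 519 comes from the slide), and `conflicting_keys` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical Employees' Skills rows: (employee_id, address, skill).
skills = [
    (426, "87 Sycamore Grove", "Typing"),
    (426, "87 Sycamore Grove", "Shorthand"),
    (519, "94 Chestnut Street", "Public Speaking"),
    (519, "96 Walnut Avenue", "Carpentry"),
]

def conflicting_keys(rows, key_index, attr_index):
    """Map each key to its attribute values, keeping only keys with more than one."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key_index]].add(row[attr_index])
    return {key: values for key, values in seen.items() if len(values) > 1}

conflicts = conflicting_keys(skills, 0, 1)
print(conflicts)  # only employee 519 appears, with two different addresses
```

If the address were stored in a separate relation keyed by employee, a single update would suffice and this check would have nothing to report.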

Insertion Anomaly
• Until the new faculty member, Dr. Newsome, is assigned
to teach at least one course, his details cannot be
recorded
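
The insertion anomaly can be reproduced with Python's built-in `sqlite3` module. The table and column names and the faculty ID 424 are assumptions made for this sketch; the slide only names Dr. Newsome.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized design: faculty details live in the same row as a course,
# and the course code is part of the key, so it cannot be NULL.
conn.execute("""
    CREATE TABLE faculty_course (
        faculty_id  INTEGER NOT NULL,
        name        TEXT    NOT NULL,
        course_code TEXT    NOT NULL,
        PRIMARY KEY (faculty_id, course_code)
    )
""")

try:
    conn.execute(
        "INSERT INTO faculty_course (faculty_id, name, course_code) "
        "VALUES (?, ?, ?)",
        (424, "Dr. Newsome", None),  # no course assigned yet
    )
    inserted = True
except sqlite3.IntegrityError:
    inserted = False  # the NOT NULL constraint rejects the row

# Decomposed design: faculty details get a table of their own.
conn.execute(
    "CREATE TABLE faculty (faculty_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO faculty VALUES (?, ?)", (424, "Dr. Newsome"))
faculty_rows = conn.execute("SELECT faculty_id, name FROM faculty").fetchall()
print(inserted, faculty_rows)
```

In the decomposed design a course assignment would reference `faculty_id`, so a faculty member can exist before any course is assigned.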

Deletion Anomaly
• If an engineer who is the only employee on a project leaves the company, his personal information cannot be deleted without also losing the information about that project.
– May have to delete many tuples.
EMP

ENO  ENAME      TITLE        SAL    PNO  RESP        DUR
E1   J. Doe     Elect. Eng.  40000  P1   Manager     12
E2   M. Smith   Analyst      34000  P1   Analyst     24
E2   M. Smith   Analyst      34000  P2   Analyst     6
E3   A. Lee     Mech. Eng.   27000  P3   Consultant  10
E3   A. Lee     Mech. Eng.   27000  P4   Engineer    48
E4   J. Miller  Programmer   24000  P2   Programmer  18
E5   B. Casey   Syst. Anal.  34000  P5   Manager     24
E6   L. Chu     Elect. Eng.  40000  P4   Manager     48
E7   R. Davis   Mech. Eng.   27000  P6   Engineer    36
E8   J. Jones   Syst. Anal.  34000  P3   Manager     40
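
A short Python sketch of the deletion anomaly on this EMP relation. The helper `projects` is hypothetical; the data is copied from the table above.

```python
# Rows of the EMP relation: (ENO, ENAME, TITLE, SAL, PNO, RESP, DUR)
EMP = [
    ("E1", "J. Doe", "Elect. Eng.", 40000, "P1", "Manager", 12),
    ("E2", "M. Smith", "Analyst", 34000, "P1", "Analyst", 24),
    ("E2", "M. Smith", "Analyst", 34000, "P2", "Analyst", 6),
    ("E3", "A. Lee", "Mech. Eng.", 27000, "P3", "Consultant", 10),
    ("E3", "A. Lee", "Mech. Eng.", 27000, "P4", "Engineer", 48),
    ("E4", "J. Miller", "Programmer", 24000, "P2", "Programmer", 18),
    ("E5", "B. Casey", "Syst. Anal.", 34000, "P5", "Manager", 24),
    ("E6", "L. Chu", "Elect. Eng.", 40000, "P4", "Manager", 48),
    ("E7", "R. Davis", "Mech. Eng.", 27000, "P6", "Engineer", 36),
    ("E8", "J. Jones", "Syst. Anal.", 34000, "P3", "Manager", 40),
]

def projects(rows):
    """Set of project numbers that the relation still knows about."""
    return {row[4] for row in rows}

# Employee E7 leaves the company: delete every tuple that mentions E7.
remaining = [row for row in EMP if row[0] != "E7"]

lost = projects(EMP) - projects(remaining)
print(lost)  # projects that vanished together with the employee
```

Because E7 is the only employee on P6, removing the employee also removes the only record that P6 exists.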

What to do?
• Take each relation individually and improve it in terms of the desired characteristics
• The major activity of normalization is decomposition
– The process of breaking a relation into smaller relations
– A popular approach is the Universal Relation Approach
• Start from a single (universal) relation and move towards more relations, until no anomalies exist.
1. Consider the whole database as a single relation
2. Analyze the relation and remove its anomalies
– The relation is divided (decomposed) into two relations
3. Repeat step 2 for each resulting relation
4. Continue until no anomalies remain
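
One decomposition step can be sketched in Python as follows. The slides do not say how the split is chosen. This sketch assumes the common approach of splitting on a dependency X → Y, and it assumes that ENO determines ENAME, TITLE and SAL in the EMP example. `split_on_dependency` is a hypothetical helper.

```python
def split_on_dependency(schema, lhs, rhs):
    """Split a schema on the dependency lhs -> rhs.

    The first relation holds lhs together with the attributes it determines;
    the second keeps everything except the determined attributes.
    """
    lhs = set(lhs)
    rhs = set(rhs) - lhs
    first = [a for a in schema if a in (lhs | rhs)]
    second = [a for a in schema if a not in rhs]
    return first, second

# Universal relation from the EMP example.
universal = ["ENO", "ENAME", "TITLE", "SAL", "PNO", "RESP", "DUR"]

emp, asg = split_on_dependency(universal, ["ENO"], ["ENAME", "TITLE", "SAL"])
print(emp)  # employee attributes, stored once per employee
print(asg)  # assignment attributes, one row per employee and project
```

Both resulting relations keep ENO, which is what allows them to be joined back together later.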

Normalization Issues
1. How do we decompose a schema into a desirable normal form?
2. What criteria should the decomposed schemas satisfy in order to preserve the semantics of the original schema?
A. Lossless Decomposition: when a relation is decomposed into two relations, no information should be lost.
– i.e., joining the decomposed relations must reproduce the original relation in terms of
i. the scheme of the relation
ii. the data of the relation

Normalization Issues (Cont.)
– For example, if a relation originally has 15 attributes
• the same 15 attributes must be obtained when the decomposed relations are joined back
– If the relation originally has 100 records
• the same 100 records must be obtained when the decomposed relations are joined back
– Note: obtaining fewer or extra attributes / records
• is a loss of information
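
The lossless criterion can be checked on a small instance by projecting a relation onto two schemas and joining the projections back. The Python sketch below uses four tuples from the EMP relation; `project` and `natural_join` are hypothetical helpers written for this illustration.

```python
ATTRS = ("ENO", "ENAME", "TITLE", "SAL", "PNO", "RESP", "DUR")

# Each tuple is stored as a frozenset of (attribute, value) pairs.
EMP = {
    frozenset(zip(ATTRS, values))
    for values in [
        ("E1", "J. Doe", "Elect. Eng.", 40000, "P1", "Manager", 12),
        ("E2", "M. Smith", "Analyst", 34000, "P1", "Analyst", 24),
        ("E2", "M. Smith", "Analyst", 34000, "P2", "Analyst", 6),
        ("E6", "L. Chu", "Elect. Eng.", 40000, "P4", "Manager", 48),
    ]
}

def project(relation, attrs):
    """Projection: keep only attrs; the set drops duplicate tuples."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in relation}

def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(frozenset({**ld, **rd}.items()))
    return joined

employee = project(EMP, ("ENO", "ENAME", "TITLE", "SAL"))

# Split that shares ENO: the join gives back exactly the original tuples.
good = natural_join(employee, project(EMP, ("ENO", "PNO", "RESP", "DUR")))

# Split that shares only TITLE: the join produces extra (spurious) tuples.
bad = natural_join(employee, project(EMP, ("TITLE", "PNO", "RESP", "DUR")))

print(good == EMP)          # True
print(len(bad) - len(EMP))  # 2
```

The second split returns extra records, which is the "extra number of records" case that the note above counts as a loss of information.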

B. Dependency preservation: The constraints (i.e., dependencies) that hold on the original relation should be enforceable by means of the constraints (i.e., dependencies) defined on the decomposed relations.

Dependency Structure
• The normalization process is applied to database / table schemes
– It is based on dependencies: starting from a universal relation, the scheme is decomposed into 2, 4, 5, 6, etc. relations
• The question is
– On what basis does such decomposition take place?
• The answer is “dependencies”
• A designer CANNOT create dependencies
– Dependencies already exist in the environment / system being developed; the designer needs to identify them
• After identification, describe them in technical language
• Then use the dependencies in the DB design

Dependency
• Given a relation R defined on attributes A = (A1, A2, A3, ..., An), if X ⊆ A, Y ⊆ A, and for each value of X there is exactly one value of Y, then X functionally determines Y, i.e., X → Y
– Attribute Y is functionally dependent on attribute X
– That is, two tuples that agree on X must also agree on Y

• For example, PROJ(PNO, PNAME, BUDGET)
• For each PNO there is exactly one PNAME and one BUDGET
– PNAME and BUDGET are both functionally dependent on PNO
– PNO functionally determines PNAME and BUDGET.
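
The definition translates directly into a check over the tuples of a relation instance. The Python sketch below is illustrative only: `holds` is a hypothetical helper and the PROJ rows are invented sample data, since the slide gives only the attribute names.

```python
def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by any pair of rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first Y seen for an X is remembered; a different Y later
        # for the same X violates the dependency.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical PROJ instance (values invented for illustration).
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 150000},
]

print(holds(PROJ, ["PNO"], ["PNAME", "BUDGET"]))  # True
print(holds(PROJ, ["BUDGET"], ["PNO"]))           # False: P1 and P4 share a budget
```

A check like this can only refute a dependency on the data at hand. Whether PNO → PNAME, BUDGET holds in general has to come from the designer's knowledge of the environment.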

Forms of Dependencies
• Functional Dependency (FD)
– 2NF, 3NF and BCNF are defined in terms of FDs (1NF only requires atomic attribute values)
• Multi-valued Dependency (MVD)
– Fourth normal form (4NF) is based on MVDs
• Projection-Join Dependency (P-JD)
– Fifth normal form (5NF) is based on P-JDs
