Normalization

Uploaded by

Pradeep Kumar
Introduction to Normalization
• Normalization: the process of decomposing unsatisfactory ("bad") relations by
breaking up their attributes into smaller relations.
• Normalization removes redundant data from your tables
in order to improve storage efficiency, data integrity and scalability.
• This improvement is balanced against an increase in complexity and
potential performance losses from joining the normalized tables at
query time.
• There are two goals of the normalization process:
– Eliminating redundant data (for example, storing the same data in more than one table).
– Ensuring data dependencies make sense (only storing related data in a table).
Both are worthy goals, as they reduce the amount of space a database consumes and
ensure that data is logically stored.
Introduction to Normalization
Normal form: a condition, stated in terms of the keys and functional dependencies (FDs) of a
relation, that certifies whether a relation schema is in a particular normal form
– 2NF, 3NF, BCNF based on keys and FDs of a relation schema
– 4NF based on keys, multi-valued dependencies
There is a sequence to normal forms:
– 1NF is considered the weakest,
– 2NF is stronger than 1NF,
– 3NF is stronger than 2NF, and
– BCNF is the strongest of these four

Also,
– any relation that is in BCNF, is in 3NF;
– any relation in 3NF is in 2NF; and
– any relation in 2NF is in 1NF.
WHY DO WE NEED NORMALIZATION?

Normalization is the aim of a well-designed database in a Relational Database
Management System (RDBMS). It is a step-by-step set of rules by
which data is put in its simplest form. We normalize a
relational database for the following reasons:
– To minimize data redundancy, i.e. no unnecessary duplication of data.
– To make the database structure flexible, i.e. it should be possible to add new
data values and rows without reorganizing the database structure.
– To make complex queries required by the user easy to handle.
– To keep data consistent throughout the database, i.e. it should not
suffer from anomalies.
WHY DO WE NEED NORMALIZATION?

• Update anomaly: To update the address of a student who occurs two or more
times in a table, we have to update the S_Address column in all of those rows; otherwise the data
becomes inconsistent.
• Insertion anomaly: Suppose that for a new admission we have the student id (S_id), name
and address of a student, but the student has not opted for any subjects yet. We then have
to insert NULL for the subject, which is an insertion anomaly.
• Deletion anomaly: If student (S_id) 401 has only one subject and temporarily drops it,
then deleting that row deletes the entire student record along with it.
Example: Anomalies in DBMS
Idea: In the Student Info table below, we have tried to store all the data
about students.
Result: The entire branch data must be repeated for every student of the branch.

S_id  Name  Age  Br_code  Br_name  Hod_Name
1     A     18   101      CS       XYZ
2     B     19   101      CS       XYZ
3     C     18   101      CS       XYZ
4     D     21   102      EC       PQR
5     E     20   102      EC       PQR
6     F     19   103      ME       KLM
Redundancy: When the same data is stored multiple times unnecessarily in a database.
Disadvantages: 1. Insertion, deletion and modification anomalies.
2. Data inconsistency.
3. Increase in database size and in access time (slower access).

Insertion anomaly: When certain data (attributes) cannot be inserted into the database
without the presence of other data. For example, in the above table, if no student has enrolled in a
branch, then we cannot add that branch's name.

Deletion anomaly: When deleting some (unwanted) data also deletes other (wanted) data
stored in the same rows. For example, deleting student 6 also removes the only record of branch ME.

Update anomaly: When we want to update a single piece of data, but the update must be applied to all of
its copies (due to multiple entries).
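The update anomaly can be made concrete with a minimal Python sketch built on the Student Info rows above. The variable names (`student_info`, `students`, `branches`) and the replacement HOD name "NEW" are illustrative assumptions, not part of the slides.

```python
# Unnormalized rows: (S_id, Name, Age, Br_code, Br_name, Hod_Name)
student_info = [
    (1, "A", 18, 101, "CS", "XYZ"),
    (2, "B", 19, 101, "CS", "XYZ"),
    (3, "C", 18, 101, "CS", "XYZ"),
    (4, "D", 21, 102, "EC", "PQR"),
    (5, "E", 20, 102, "EC", "PQR"),
    (6, "F", 19, 103, "ME", "KLM"),
]

# In the single table, changing the HOD of branch 101 touches
# every student row of that branch.
rows_touched = sum(1 for row in student_info if row[3] == 101)

# Decomposed design: branch facts are stored once, keyed by Br_code.
students = {s_id: (name, age, br) for s_id, name, age, br, _, _ in student_info}
branches = {br: (br_name, hod) for _, _, _, br, br_name, hod in student_info}

# The same change is now a single update ("NEW" is a made-up value).
branches[101] = (branches[101][0], "NEW")

print(rows_touched)   # 3
print(len(branches))  # 3
```

With the branch data held once, a branch with no students can also be stored, and deleting a student no longer deletes branch information.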
Normalization
Systematic process of reducing the redundancy and avoiding the existing anomalies in
the relation.

Objectives of Normalization

• Develop a good description of the data, its relationships and constraints
• Produce a stable set of relations that
– Is a faithful model of the enterprise
– Is highly flexible
– Reduces redundancy, which saves space and reduces inconsistency in data
– Is free of update, insertion and deletion anomalies
Types of Normalization
First Normal Form (1NF)
As per the rule of first normal form, an attribute (column) of a table cannot hold
multiple values. It should hold only atomic values.
First Normal Form
• Disallows composite attributes, multivalued attributes, and nested
relations, i.e. attributes whose values for an individual tuple are non-atomic.

• We say a relation is in 1NF if all values stored in the relation are
single-valued and atomic.
First Normal Form

• An outer join between Employee and Employee Degree will produce the
information we saw before
Implications of 1NF

• Each column must hold a single value per row; a null value may be present.
• The order of rows and the order of columns are insignificant.
• Every value in a column must belong to the same domain.
• Every column should have a unique name.
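As a rough illustration of moving a multivalued attribute into 1NF, the sketch below splits a list-valued column into a separate relation and then rebuilds the combined view with a left outer join. The employee names, degrees and the helper `join_employee_degrees` are hypothetical; the slides only mention Employee and Employee Degree relations.

```python
# Hypothetical non-1NF rows: the "degrees" column holds a list of values.
employees = [
    {"emp_id": 1, "name": "Asha", "degrees": ["BSc", "MSc"]},
    {"emp_id": 2, "name": "Ravi", "degrees": []},
    {"emp_id": 3, "name": "Mei", "degrees": ["BA"]},
]

# 1NF: every column holds one atomic value per row.
employee = [{"emp_id": e["emp_id"], "name": e["name"]} for e in employees]
employee_degree = [
    {"emp_id": e["emp_id"], "degree": d}
    for e in employees
    for d in e["degrees"]
]

def join_employee_degrees(left, right):
    """Left outer join on emp_id; employees without a degree get degree=None."""
    joined = []
    for l_row in left:
        matches = [r for r in right if r["emp_id"] == l_row["emp_id"]]
        if matches:
            joined.extend({**l_row, **m} for m in matches)
        else:
            joined.append({**l_row, "degree": None})
    return joined

combined = join_employee_degrees(employee, employee_degree)
print(len(combined))  # 4
```

The outer join keeps the employee with no degree, which an inner join would drop.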


Second Normal Form
• A relation is in 2NF if it is in 1NF, and every non-key attribute is fully
dependent on each candidate key. (That is, we don’t have any partial
functional dependency.)

• 2NF (and 3NF) both involve the concepts of key and non-key attributes.

• A key attribute is any attribute that is part of a key; any attribute that is
not a key attribute, is a non-key attribute
Second Normal Form
• {SSN, PNUMBER} → HOURS is a full FD, since neither SSN → HOURS
nor PNUMBER → HOURS holds
• {SSN, PNUMBER} → ENAME is not a full FD (it is called a partial
dependency), since SSN → ENAME also holds
• R can be decomposed into 2NF relations via the process of 2NF
normalization
Second Normal Form
Example: 2nd Normal Form
Example: R(ABCD)

Where AB → D and B → C

The candidate key is AB, since (AB)+ = {ABCD}

Here, prime attributes: A, B

Non-prime attributes: C, D
– D depends on both A and B (full dependency)
– C depends only on B, which is called a partial dependency

A relation schema R is in second normal form (2NF) if every non-prime
attribute A in R is fully functionally dependent on the primary key.
R can be decomposed into 2NF relations via the process of 2NF
normalization.
Example: 2nd Normal Form
Relation: R(ABC), where B → C

Now, (AB)+ = {ABC}, so AB is the candidate key.

Relation R
A  B  C
a  1  X
b  2  Y
a  3  Z
c  3  Z
d  3  Z
e  3  Z

R is not in 2nd NF, because C has a partial dependency (on B alone).

R(ABC) is decomposed into R1(AB) and R2(BC):

Relation R1
A  B
a  1
b  2
a  3
c  3
d  3
e  3

Relation R2
B  C
1  X
2  Y
3  Z

Now, it is in 2nd NF
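The decomposition of R(ABC) can be checked with a short Python sketch: project R onto (A, B) and (B, C), then verify that the natural join of the two projections gives back exactly R. Representing relations as Python sets of tuples is an assumption made for illustration.

```python
# Relation R(A, B, C) from the example, with B -> C.
R = {("a", 1, "X"), ("b", 2, "Y"), ("a", 3, "Z"),
     ("c", 3, "Z"), ("d", 3, "Z"), ("e", 3, "Z")}

# Project R onto (A, B) and (B, C); sets drop duplicate tuples.
R1 = {(a, b) for a, b, _ in R}
R2 = {(b, c) for _, b, c in R}

# Natural join of R1 and R2 on the shared attribute B.
rejoined = {(a, b, c) for a, b in R1 for b2, c in R2 if b == b2}

print(len(R1), len(R2))  # 6 3
print(rejoined == R)     # True
```

The fact "B = 3 goes with C = Z" is stored four times in R but only once in R2, and the join loses nothing.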
Third Normal Form
• A database is in third normal form if it satisfies the following conditions:
– It is in second normal form
– There is no transitive functional dependency

• Definition
– Transitive functional dependency: X → Y is transitive if there is a set
of attributes Z that is neither a candidate key nor a subset of any key,
and both X → Z and Z → Y hold.
3rd Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no
non-prime attribute A in R is transitively dependent on the primary key.
Third normal form (3NF) is based on the concept of transitive dependency. A
functional dependency X → Y in a relation schema R is a transitive
dependency if there exists a set of attributes Z in R that is neither a
candidate key nor a subset of any key of R, and both X → Z and Z → Y
hold.
Third Normal Form
• In the table above, [Book ID] determines [Genre ID], and [Genre ID]
determines [Genre Type]. Therefore, [Book ID] determines [Genre Type]
via [Genre ID]. This is a transitive functional dependency, so the
structure does not satisfy third normal form.
• To bring this table to third normal form, we split the table into two as
follows:

Now all non-key attributes are fully functionally dependent only on the primary key. In
[TABLE_BOOK], both [Genre ID] and [Price] depend only on [Book ID]. In
[TABLE_GENRE], [Genre Type] depends only on [Genre ID].
EXAMPLE
Consider this table:

FD set:
{STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE ->
STUD_COUNTRY, STUD_NO -> STUD_AGE}

Candidate Key:
{STUD_NO}

For this relation, STUD_NO -> STUD_STATE and STUD_STATE ->
STUD_COUNTRY hold, so STUD_COUNTRY is transitively dependent on STUD_NO. This
violates third normal form. To convert it to third normal form, we decompose the
relation STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE) as:

STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
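The 3NF test on this example can be sketched in Python. The function name `violates_3nf` and the encoding of FDs as pairs of frozensets are assumptions made for illustration; the check relies on the rule that a non-trivial FD X → A is acceptable in 3NF only if X is a superkey or A is a prime attribute, and it expects the full list of candidate keys as input.

```python
def violates_3nf(fds, candidate_keys):
    """Return the (X, A) pairs where X -> A breaks 3NF.

    X -> A is allowed if A is in X (trivial), X contains a candidate key
    (X is a superkey), or A is a prime attribute.
    """
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_superkey = any(key <= lhs for key in candidate_keys)
        for attr in sorted(rhs - lhs):
            if not is_superkey and attr not in prime:
                bad.append((lhs, attr))
    return bad

# FD set and candidate key from the STUDENT example.
fds = [
    (frozenset({"STUD_NO"}), frozenset({"STUD_NAME"})),
    (frozenset({"STUD_NO"}), frozenset({"STUD_STATE"})),
    (frozenset({"STUD_STATE"}), frozenset({"STUD_COUNTRY"})),
    (frozenset({"STUD_NO"}), frozenset({"STUD_AGE"})),
]
keys = [frozenset({"STUD_NO"})]

for lhs, attr in violates_3nf(fds, keys):
    print(sorted(lhs), "->", attr)  # ['STUD_STATE'] -> STUD_COUNTRY
```

After the decomposition, the offending FD lives in STATE_COUNTRY, where its left side is the key, so neither relation reports a violation.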
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever
a non-trivial FD X → A holds in R, then X is a superkey of R

– Each normal form is strictly stronger than the
previous one:
• Every 2NF relation is in 1NF
• Every 3NF relation is in 2NF
• Every BCNF relation is in 3NF
– There exist relations that are in 3NF but not in
BCNF
– The goal is to have each relation in BCNF (or
3NF)
Boyce-Codd Normal Form
In BCNF:
o When a relation has more than one candidate key, anomalies may result even though the
relation is in 3NF.
o 3NF does not deal satisfactorily with the case of a relation with overlapping candidate
keys, i.e. composite candidate keys with at least one attribute in common.
o BCNF is based on the concept of a determinant.
o A determinant is any attribute (simple or composite) on which some other attribute is
fully functionally dependent.
o A relation is in BCNF if, and only if, every determinant is a candidate key.
The difference between 3NF and BCNF is that for a functional dependency A → B,
3NF allows this dependency in a relation if B is a prime (key) attribute and A is not
a candidate key, whereas BCNF insists that for this dependency to remain in a
relation, A must be a candidate key.
BCNF (Boyce-Codd Normal Form)
For a table to be in BCNF, following conditions must be satisfied:
• R must be in 3rd Normal Form
• and, for each functional dependency ( X -> Y ), X should be a super Key.
Example:

Consider the relation R(A, B, C) with

A -> BC, B -> A

A and B are both superkeys, so the above relation is in BCNF.

Note:

A BCNF decomposition is not always dependency preserving; however, it
always satisfies the lossless join condition. For
example, take the relation R(V, W, X, Y, Z) with functional dependencies:
VW -> X, YZ -> X, W -> Y
Splitting it on the violating dependencies one at a time yields BCNF relations but loses one of the dependencies.

Note: Redundancies are sometimes still present in a BCNF relation, as it
is not always possible to eliminate them completely.
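A BCNF test can be sketched with an attribute-closure helper: X is a superkey exactly when the closure of X covers every attribute of the relation. The function names `closure` and `bcnf_violations` are illustrative assumptions, and the second FD set is read as VW → X, YZ → X, W → Y.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose left side is not a superkey of the relation."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(attributes)]

# R(A, B, C) with A -> BC and B -> A: both left sides are superkeys.
fds_abc = [(set("A"), set("BC")), (set("B"), set("A"))]
print(bcnf_violations("ABC", fds_abc))  # []

# R(V, W, X, Y, Z) with VW -> X, YZ -> X, W -> Y: no left side is a superkey.
fds_vwxyz = [(set("VW"), set("X")), (set("YZ"), set("X")), (set("W"), set("Y"))]
print(len(bcnf_violations("VWXYZ", fds_vwxyz)))  # 3
```

Checking only the listed FDs is enough to decide whether the original relation is in BCNF; checking a decomposed relation requires the FDs projected onto it, which this sketch does not compute.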
Types of Dependency
❑ Functional Dependency
❑ Partial Dependency
❑ Fully Functional Dependency
❑ Multivalued dependency
❑ Transitive dependency
Key terms
Here are some key terms for functional dependency:

Key Term        Description

Axiom           Axioms are a set of inference rules used to infer all the
                functional dependencies on a relational database.

Decomposition   A rule that suggests that if you have a table that appears to
                contain two entities which are determined by the same
                primary key, then you should consider breaking them up
                into two different tables.

Dependent       It is displayed on the right side of the functional
                dependency diagram.

Determinant     It is displayed on the left side of the functional
                dependency diagram.

Union           It suggests that if two tables are separate and the PK is the
                same, you should consider putting them together.
Functional Dependence (FD)
• A functional dependency is an association between two attributes of the
same relational database table.
• One of the attributes is called the determinant and the other attribute is
called the determined.
• For each value of the determinant there is associated one and only one
value of the determined.
Functional dependency
● A functional dependency (FD) describes the relation of one attribute to
another attribute in a database management system (DBMS).
● Functional dependencies help you maintain the quality of data in the
database.
● A functional dependency is denoted by an arrow →.
● The functional dependency of Y on X is represented by X → Y.
● Functional dependencies play a vital role in telling
good database design from bad.
Example : Functional Dependency
• If A is the determinant and B is the determined then we say that A
functionally determines B and graphically represent this as A -> B.
Example
• Here, Sname is FD on Sno, because
Sname can take only one value for
a given value of Sno.

• Consider another database of
shipments with the following attributes:

• In this case Qty is FD on the
combination of (Sno, Pno), because
each combination of Sno and Pno
corresponds to only one quantity.

• SP (Sno, Pno) → SP.QTY
Dependency Diagrams - Example
A dependency diagram consists of the
attribute names and all functional
dependencies in a given table. The
dependency diagram of the Supplier table is:

Here, the following functional dependencies
exist in the Supplier table:
• Sno → Sname
• Sname → Sno
• Sno → City
• Sno → Status
• Sname → City
• Sname → Status
• City → Status
Dependency Diagrams - Example
Here, the following functional
dependencies exist in the Part
table:

• Pno → Pname
• Pno → Color
• Pno → Wt
Dependency Diagrams - Example

Here, the following functional
dependency exists in the
shipment (SP) table:

SP (Sno, Pno) → SP.QTY
Functional Dependencies
• Functional dependencies (FDs) are used to specify formal measures of the
"goodness" of relational designs.
• FDs and keys are used to define normal forms for relations
• FDs are constraints that are derived from the meaning and
interrelationships of the data attributes
• A set of attributes X functionally determines a set of attributes Y if the
value of X determines a unique value for Y
• X → Y holds if whenever two tuples have the same value for X, they must
have the same value for Y
If t1[X] = t2[X], then t1[Y] = t2[Y] in any relation
instance r(R)
• X → Y in R specifies a constraint on all relation instances r(R)
• If K is a key of R, then K functionally determines all attributes in R (since
we never have two distinct tuples with t1[K]=t2[K])
• FDs are derived from the real-world constraints on the attributes
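The tuple-based definition can be turned into a small Python check that tests whether X → Y holds in one relation instance. The function name `holds` is an assumption, and the sample rows reuse the Student Info table from the anomalies example earlier in the deck.

```python
def holds(rows, X, Y):
    """True if X -> Y holds in this instance: equal X values imply equal Y values."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        # Remember the first Y value seen for each X value, then compare.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

cols = ("S_id", "Name", "Age", "Br_code", "Br_name", "Hod_Name")
data = [
    (1, "A", 18, 101, "CS", "XYZ"),
    (2, "B", 19, 101, "CS", "XYZ"),
    (3, "C", 18, 101, "CS", "XYZ"),
    (4, "D", 21, 102, "EC", "PQR"),
    (5, "E", 20, 102, "EC", "PQR"),
    (6, "F", 19, 103, "ME", "KLM"),
]
rows = [dict(zip(cols, r)) for r in data]

print(holds(rows, ["Br_code"], ["Br_name", "Hod_Name"]))  # True
print(holds(rows, ["Age"], ["Br_code"]))                  # False
```

Because an FD is a constraint on all instances, a single instance can refute an FD but cannot prove it; a True result only means this data does not contradict the dependency.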
Fully Functional Dependency
❑ An attribute is fully functionally dependent on a set of attributes if it is functionally
dependent on that set and not on any of its proper subsets.

❑ For example, an attribute Q is fully functionally dependent on P if it is
functionally dependent on P and not on any proper subset of P.

❑ Let us see an example:

<EmployeeProject>
EmpID  ProjectID  Days (spent on the project)
E099   001        320
E056   002        190

<ProjectCost>
ProjectID  ProjectCost
001        1000
002        5000

The above relations state:

EmpID, ProjectID, ProjectCost -> Days

However, Days is not fully functionally dependent on this set, because
the proper subset {EmpID, ProjectID} already determines the {Days} spent on
the project by the employee.
This gives our fully functional dependency:

{EmpID, ProjectID} -> {Days}


Fully Functional Dependence (FFD)
• Fully Functional Dependence (FFD) is defined as follows: attribute Y is FFD on
attribute X if it is FD on X and not FD on any proper subset of X.
• For example, in the relation Supplier, different cities may have the same
status. It may be possible that cities like Amritsar and Jalandhar have
the same status 10.
So, City is not FD on Status.
But the combination of Sno and Status
can give only one corresponding
City, because Sno is unique. Thus,
(Sno, Status) → City
This means City is FD on the composite
attribute (Sno, Status); however, City is
not fully functionally dependent on this
composite attribute, because Sno alone already determines City.
Partial Dependency
If any proper subset of the key determines any of the non-key attributes, then
there exists a partial dependency.
• Example: Given a relation R(A,B,C,D,E) with functional dependency
AB→CDE, the primary key (or simply "key") is AB.

• Then each of the following, if it holds, is a partial dependency:
A→C, A→D, A→E, B→C, B→D, B→E
Partial dependency
❑ Partial Dependency occurs when a nonprime attribute is functionally dependent on part of a
candidate key.

❑ The 2nd Normal Form (2NF) eliminates the Partial Dependency. Let us see an example:

❑ <StudentProject>

StudentID  ProjectNo  StudentName  ProjectName
S01        199        Katie        Geo Location
S02        120        Ollie        Cluster Exploration

In the above table, we have partial dependencies; let us see how:

❑ The prime (key) attributes are StudentID and ProjectNo.
❑ As stated, a non-prime attribute (here StudentName or ProjectName) is partially
dependent if it is functionally dependent on only part of a candidate key.
❑ StudentName can be determined by StudentID alone, which makes the relation partially
dependent.
❑ ProjectName can be determined by ProjectNo alone, which also makes the relation partially
dependent.
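The partial dependencies in StudentProject can be found mechanically with a short Python sketch. The function name `partial_dependencies` and the encoding of FDs as pairs of frozensets are assumptions, and the sketch assumes the relation has a single candidate key.

```python
def partial_dependencies(key, fds):
    """FDs X -> A where X is a proper subset of the key and A is non-prime.

    Assumes `key` is the only candidate key of the relation.
    """
    found = []
    for lhs, rhs in fds:
        if lhs < key:  # proper subset of the candidate key
            found.extend((lhs, attr) for attr in sorted(rhs - key))
    return found

key = frozenset({"StudentID", "ProjectNo"})
fds = [
    (frozenset({"StudentID"}), frozenset({"StudentName"})),
    (frozenset({"ProjectNo"}), frozenset({"ProjectName"})),
]

for lhs, attr in partial_dependencies(key, fds):
    print(sorted(lhs), "->", attr)
# ['StudentID'] -> StudentName
# ['ProjectNo'] -> ProjectName
```

Each reported dependency suggests a relation of its own in a 2NF design, for example Student(StudentID, StudentName) and Project(ProjectNo, ProjectName), with StudentProject keeping only the two key attributes.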
Transitive Dependency
The transitivity rule is perhaps the most important one. It states that
if X functionally determines Y and Y functionally
determines Z, then X functionally determines Z.

X → Y
Y → Z
Therefore X → Z
Multi-Value Dependency
• A Multi-Value Dependency (MVD) occurs when two or more
independent multi valued facts about the same attribute occur within the
same table.
• Multivalued dependencies occur when the presence of one or
more rows in a table implies the presence of one or more other rows in
that same table.
• Example: Imagine a car company that manufactures
many models of car, but always makes both red and blue colors of each
model. If you have a table that contains the model name, color and year
of each car the company manufactures, there is a multivalued
dependency in that table.
• If there is a row for a certain model name and year in blue, there must
also be a similar row corresponding to the red version of that same car.
Multivalued dependency
Multivalued dependency occurs in the situation where there are multiple
independent multivalued attributes in a single table. A multivalued
dependency is a complete constraint between two sets of attributes in a
relation. It requires that certain tuples be present in a relation.

Example:

In this example, maf_year and color are independent of each other but
dependent on car_model. These two columns are said to
be multivalue dependent on car_model.
This dependence can be represented like this:
car_model ->> maf_year
car_model ->> color

Car_model  Maf_year  Color
H001       2017      Metallic
H001       2017      Green
H005       2018      Metallic
H005       2018      Blue
H010       2015      Metallic
H033       2012      Gray
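Whether an instance satisfies a multivalued dependency can be checked with a brief Python sketch: for each car_model, the (maf_year, color) pairs present must equal the cross product of that model's years and colors. The function name `mvd_holds` and the three-column layout (X, Y, and the remaining column Z) are assumptions made for illustration.

```python
from itertools import product

def mvd_holds(rows, x, y, z):
    """Check X ->> Y on an instance whose columns are X, Y and Z (the rest).

    x, y, z are column positions in each row tuple.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[x], set()).add((row[y], row[z]))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        # Every Y value must appear with every Z value for this X value.
        if pairs != set(product(ys, zs)):
            return False
    return True

cars = [
    ("H001", 2017, "Metallic"),
    ("H001", 2017, "Green"),
    ("H005", 2018, "Metallic"),
    ("H005", 2018, "Blue"),
    ("H010", 2015, "Metallic"),
    ("H033", 2012, "Gray"),
]

print(mvd_holds(cars, 0, 1, 2))  # True
# Adding a 2018 Metallic H001 without a 2018 Green H001 breaks the MVD.
print(mvd_holds(cars + [("H001", 2018, "Metallic")], 0, 1, 2))  # False
```

The second call shows the "certain tuples must be present" aspect: once H001 has two years and two colors, all four combinations are required.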
Join Dependency
• A join dependency exists if a relation R is equal to the join of its
projections (for example, projections onto attribute sets X, ..., Z).

• A join dependency is a constraint on the set of legal relations over a
database scheme. A table T is subject to a join dependency if T can
always be recreated by joining multiple tables, each having a subset of
the attributes of T. If one of the tables in the join has all the attributes of
the table T, the join dependency is called trivial.
Inference Rules for FDs
• Given a set of FDs F, we can infer additional FDs that hold whenever the
FDs in F hold
• Armstrong's inference rules
A1. (Reflexive) If Y subset-of X, then X → Y
A2. (Augmentation) If X → Y, then XZ → YZ
(Notation: XZ stands for X U Z)
A3. (Transitive) If X → Y and Y → Z, then X → Z

• A1, A2, A3 form a sound and complete set of inference rules

Additional Useful Inference Rules
• Decomposition
– If X → YZ, then X → Y and X → Z
• Union
– If X → Y and X → Z, then X → YZ
• Pseudo-transitivity
– If X → Y and WY → Z, then WX → Z
• Closure of a set F of FDs is the set F+ of all FDs that can be inferred
from F
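Listing all of F+ is rarely practical, so membership in F+ is usually tested through attribute closure: X → Y is in F+ exactly when Y is contained in the closure of X under F. The Python sketch below uses assumed function names (`closure`, `implies`) and single-letter attribute names to reproduce the pseudo-transitivity rule.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, by repeatedly applying the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be inferred from fds (it belongs to F+)."""
    return set(rhs) <= closure(lhs, fds)

# F = {X -> Y, WY -> Z}
F = [({"X"}, {"Y"}), ({"W", "Y"}, {"Z"})]

print(sorted(closure({"W", "X"}, F)))  # ['W', 'X', 'Y', 'Z']
print(implies(F, {"W", "X"}, {"Z"}))   # True  (pseudo-transitivity)
print(implies(F, {"X"}, {"Z"}))        # False (W is also needed)
```

The same closure routine answers key questions too: a set of attributes is a superkey when its closure contains every attribute of the relation.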
