5 - Normalization

The document discusses database normalization and functional dependencies. It defines concepts like redundancy, anomalies, functional dependency, trivial and non-trivial dependencies. It also covers Armstrong's axioms and different types of functional dependencies.

Uploaded by Yasemin Gövez

GAZI Industrial Engineering

Normalization
Lecturer : Dr. Murat AKIN
e-mail : [email protected]

Gazi University Industrial Engineering – Database Management Systems Lecture Notes


Introduction
• Normalization is the theory and process by which to evaluate and
improve relational database design.
• It typically involves dividing larger tables into smaller, less redundant tables.
• It spans both logical and physical database design.

Objectives of Normalization:
• Make the schema informative.
• Minimize information duplication.
• Avoid modification anomalies.
• Disallow spurious (bogus) tuples.


Redundancy in databases
Redundancy in a database denotes the repetition of stored data.
Redundancy can cause various anomalies, as well as problems with storage
requirements:
• Insertion anomalies: It may be impossible to store certain information
without storing some other, unrelated information.
• Update anomalies: If one copy of repeated data is updated, all
copies need to be updated to prevent inconsistency.
• Deletion anomalies: It may be impossible to delete certain information
without losing some other, unrelated information.
• Increased storage requirements: Because the same data is stored repeatedly,
storage grows more than necessary.

Anomalies in DBMS
• Insertion – can’t enter a new employee without having the employee
take a class.
• Update – giving a salary increase to employee 100 forces us to update
multiple records.
• Deletion – if we remove employee 140, we lose information about the
existence of a Tax Acc class.
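A minimal Python sketch of the update and deletion anomalies above. Employee IDs 100 and 140 and the Tax Acc class come from the slide; the salaries, the other class names and the row layout are hypothetical, since the original example table is not reproduced here.

```python
# Hypothetical EMPLOYEE_CLASS rows: one row per (employee, class) pair,
# so an employee's salary is repeated once for every class taken.
rows = [
    {"emp_id": 100, "salary": 48000, "class": "SQL"},
    {"emp_id": 100, "salary": 48000, "class": "C++"},
    {"emp_id": 140, "salary": 52000, "class": "Tax Acc"},
]


def raise_salary_first_row_only(rows, emp_id, new_salary):
    """Careless update that changes only the first matching row."""
    for row in rows:
        if row["emp_id"] == emp_id:
            row["salary"] = new_salary
            return


def salaries_of(rows, emp_id):
    """All distinct salary values stored for one employee, sorted."""
    return sorted({row["salary"] for row in rows if row["emp_id"] == emp_id})


# Update anomaly: employee 100 now has two conflicting salaries.
raise_salary_first_row_only(rows, 100, 50000)
print(salaries_of(rows, 100))  # [48000, 50000]

# Deletion anomaly: removing employee 140 also removes the only "Tax Acc" row.
remaining = [row for row in rows if row["emp_id"] != 140]
print(any(row["class"] == "Tax Acc" for row in remaining))  # False
```

Storing the salary once per employee in a separate table would make both problems disappear, which is exactly what the decompositions later in these notes do.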

Functional Dependency (FD)
• In a relation r, a set of attributes Y is functionally dependent on another set
of attributes X (X → Y) if,
for all pairs of tuples $t_1$ and $t_2$ in r,
if $t_1[X] = t_2[X]$, then it MUST be the case that $t_1[Y] = t_2[Y]$.
Functional Dependency: A constraint that specifies the relationship between
two sets of attributes, where one set can accurately determine the value of
the other set. It is denoted as X → Y, where X is a set of attributes that is capable of
determining the value of Y.
• The attribute set on the left side of the arrow, X, is called the Determinant, while
the set on the right side, Y, is called the Dependent.
• Functional dependencies are used to mathematically express relations among
database entities and are essential for understanding advanced concepts in
relational database systems.
In a nutshell, there is a functional dependency X → Y if choosing a value of X
always gives you the same value of Y.
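The formal definition translates directly into a check over a relation instance. A minimal Python sketch, assuming a relation is represented as a list of dicts (one dict per tuple); the function name `holds_fd` and the sample relation r(A, B, C) are hypothetical. A check on one instance can only refute an FD, not prove that it holds for all relation states.

```python
from itertools import combinations


def holds_fd(rows, X, Y):
    """Return True if X -> Y is satisfied by this relation instance.

    Checks the definition literally: for every pair of tuples t1, t2,
    if t1[X] = t2[X] then t1[Y] must equal t2[Y].
    """
    for t1, t2 in combinations(rows, 2):
        agree_on_x = all(t1[a] == t2[a] for a in X)
        agree_on_y = all(t1[b] == t2[b] for b in Y)
        if agree_on_x and not agree_on_y:
            return False  # found a violating pair of tuples
    return True


# Hypothetical relation r(A, B, C)
r = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]
print(holds_fd(r, ["A"], ["B"]))  # True
print(holds_fd(r, ["A"], ["C"]))  # False: tuples 1 and 2 agree on A but not on C
print(holds_fd(r, ["C"], ["B"]))  # False: tuples 1 and 3 agree on C but not on B
```

Comparing all pairs is quadratic in the number of tuples; grouping tuples by their X values would be faster, but the pairwise version mirrors the definition.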

Functional Dependency (FD)
roll_no name dept_name dept_building
42 abc CO A4
43 pqr IT A3
44 xyz CO A4
45 xyz IT A3
46 mno EC B2
47 jkl ME B2

From the above table we can conclude some valid functional dependencies:
• roll_no → {name, dept_name, dept_building}: roll_no determines the values of the fields name,
dept_name and dept_building, hence this is a valid functional dependency.
• roll_no → dept_name: since roll_no determines the whole set {name, dept_name, dept_building}, it
also determines its subset dept_name.
• dept_name → dept_building: dept_name identifies dept_building accurately, since all rows with the
same dept_name have the same dept_building (different departments may still share a building).
• More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name, dept_building}, etc.

Here are some invalid functional dependencies:
• name → dept_name: students with the same name can have different dept_name values (xyz is in
both CO and IT), hence this is not a valid functional dependency.
• dept_building → dept_name: there can be multiple departments in the same building. For example, in
the above table, departments ME and EC are both in building B2, hence dept_building → dept_name
is an invalid functional dependency.
• More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no, dept_building →
roll_no, etc. ({name, dept_name} → roll_no happens not to be violated by this particular table, but it is
invalid because two students with the same name may be in the same department.)
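The valid and invalid dependencies above can be checked against the student table with a short Python sketch. The helper `violations` is a hypothetical name; it lists the roll numbers of tuple pairs that agree on X but differ on Y. An empty list only means this instance does not violate the FD: {name, dept_name} → roll_no returns no violations here, yet it is still invalid for the reason given above.

```python
from itertools import combinations

# The student table from this slide, one dict per tuple
students = [
    {"roll_no": 42, "name": "abc", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 43, "name": "pqr", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 44, "name": "xyz", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 45, "name": "xyz", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 46, "name": "mno", "dept_name": "EC", "dept_building": "B2"},
    {"roll_no": 47, "name": "jkl", "dept_name": "ME", "dept_building": "B2"},
]


def violations(rows, X, Y):
    """Roll-number pairs of tuples that agree on X but differ on Y."""
    return [
        (t1["roll_no"], t2["roll_no"])
        for t1, t2 in combinations(rows, 2)
        if all(t1[a] == t2[a] for a in X) and any(t1[b] != t2[b] for b in Y)
    ]


print(violations(students, ["roll_no"], ["name", "dept_name", "dept_building"]))  # []
print(violations(students, ["dept_name"], ["dept_building"]))  # []
print(violations(students, ["name"], ["dept_name"]))  # [(44, 45)]
print(violations(students, ["dept_building"], ["dept_name"]))  # [(46, 47)]
print(violations(students, ["name", "dept_name"], ["roll_no"]))  # [] (not refuted by this instance)
```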

Armstrong’s axioms (inference rules for functional dependencies):
Reflexivity: If Y is a subset of X, then X → Y holds by the reflexivity rule.
For example, {roll_no, name} → name is valid.
Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by the augmentation rule.
For example, if {roll_no, name} → dept_building is valid, then {roll_no, name, dept_name} → {dept_building,
dept_name} is also valid.
Transitivity: If X → Y and Y → Z are both valid dependencies, then X → Z is also valid by the transitivity rule.
For example, since roll_no → dept_name and dept_name → dept_building, roll_no → dept_building is also valid.
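Repeatedly applying these axioms leads to the standard attribute closure algorithm: X → Y follows from a set of FDs exactly when Y is contained in the closure X⁺. A minimal Python sketch, assuming FDs are given as (left-hand side, right-hand side) pairs of sets; the FD set below is an assumption built from the examples on this slide, and the function name `closure` is hypothetical.

```python
def closure(attrs, fds):
    """Compute the attribute closure X+ of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs of sets. Any FD whose left-hand
    side is already inside the result adds its right-hand side, until
    nothing changes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Assumed FDs for the student relation
fds = [
    ({"roll_no"}, {"name", "dept_name"}),
    ({"dept_name"}, {"dept_building"}),
]

print(sorted(closure({"roll_no"}, fds)))
# ['dept_building', 'dept_name', 'name', 'roll_no']
print("dept_building" in closure({"roll_no"}, fds))  # True: roll_no -> dept_building by transitivity
print("roll_no" in closure({"dept_name"}, fds))      # False: dept_name does not determine roll_no
```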

Functional Dependency (FD)

Types of functional dependencies in DBMS:

1. Trivial functional dependency
2. Non-trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency

Trivial Functional Dependency

• In a trivial functional dependency, the dependent is always a subset of
the determinant.
• That is, if X → Y and Y is a subset of X, it is called a trivial functional
dependency.
For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent
name is a subset of the determinant set {roll_no, name}.
Similarly, roll_no → roll_no is also an example of a trivial functional dependency.

Non-Trivial Functional Dependency

• In a non-trivial functional dependency, the dependent is not a
subset of the determinant.
That is, if X → Y and Y is not a subset of X, it is called a non-trivial
functional dependency.
For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is
not a subset of the determinant roll_no.
Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not
a subset of {roll_no, name}.
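The trivial/non-trivial distinction is a simple subset test on the two sides of the arrow. A minimal Python sketch applied to the examples from the last two slides; the function name `classify_fd` is hypothetical.

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y: trivial if Y is a subset of X, otherwise non-trivial."""
    return "trivial" if set(rhs) <= set(lhs) else "non-trivial"


print(classify_fd({"roll_no", "name"}, {"name"}))  # trivial
print(classify_fd({"roll_no"}, {"roll_no"}))       # trivial
print(classify_fd({"roll_no"}, {"name"}))          # non-trivial
print(classify_fd({"roll_no", "name"}, {"age"}))   # non-trivial
```

Trivial FDs hold in every relation by reflexivity, so only non-trivial FDs carry information about the data.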

Multi-Valued Functional Dependency

• In a multivalued functional dependency, the attributes of the dependent set
are not dependent on each other.
That is, if a → {b, c} and there is no functional dependency between
b and c, it is called a multivalued functional dependency.
(This term should not be confused with the multivalued dependency X ↠ Y used to define 4NF.)
For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
45 abc 19

Here, roll_no → {name, age} is a multivalued functional dependency, since the dependents
name and age do not depend on each other: neither name → age nor age → name holds
(e.g., name abc appears with ages 17 and 19, and age 18 appears with names pqr and xyz).

Transitive Functional Dependency

• In a transitive functional dependency, the dependent depends on the determinant
indirectly.
That is, if a → b and b → c, then by the axiom of transitivity, a → c.
This is a transitive functional dependency.
For example,
enrol_no name dept building_no
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2
Here, enrol_no → dept and dept → building_no.
Hence, by the axiom of transitivity, enrol_no → building_no is a valid functional dependency.
This is an indirect functional dependency, hence it is called a transitive functional dependency.

Important FD Definitions

Trivial FD: $X \to Y$, where $Y \subseteq X$

Non-Prime: An attribute that does not occur in any key (opposite: Prime)

Full FD: $X \to Y$ such that $\forall A \in X:\ (X - \{A\}) \not\to Y$

Transitive FD: $X \to Z$, where $X \to Y$ and $Y \to Z$

Functional Dependencies

• One cannot determine whether FDs hold for all relation states unless the meaning of and
relationships among the attributes are known:
– These are the “data dependencies” foreshadowed in the Relational Model.
– If you do have this domain knowledge, it is possible to identify candidate keys (minimal
subsets of attributes that functionally determine all attributes).

• One can, however, state that an FD does not hold in a given relation state by identifying
violating tuple(s).


FD - Exercise

• Consider the following visual depiction of the functional dependencies of a relational schema.
1. List all FDs in algebraic notation
2. Identify all key(s) of this relation

FD - Exercise

Functional Dependencies:
A → B
CD → E
BD → A
D → C

Keys:
{D, A}
{D, B}


Normalization Forms

1NF

2NF

3NF

Boyce-Codd

4NF

5NF

Normalization Process
• Submit a relational schema to a set of tests (related to FDs) to certify
whether it satisfies a normal form
• If it does not pass, decompose it into smaller relations that satisfy the
normal form
– The decomposition must be non-additive (i.e., it must not produce spurious tuples!)
• The normal form of a relation refers to the highest normal form that it
meets
– As of 2002, the most constraining is 6NF
• The normal form of a database refers to the lowest normal form that any
of its relations meets
– In practice, a database is considered normalized if all relations are in 3NF or higher
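For a decomposition into two relations, the non-additive (lossless) join property has a well-known test: {R1, R2} is non-additive with respect to a set of FDs F if and only if (R1 ∩ R2) → (R1 − R2) or (R1 ∩ R2) → (R2 − R1) follows from F. A minimal Python sketch using attribute closure; the attribute names are borrowed from the TEACHER example in the 2NF section, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary decomposition test: {R1, R2} is non-additive iff the common
    attributes functionally determine R1 - R2 or R2 - R1."""
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


fds = [({"TEACHER_ID"}, {"TEACHER_AGE"})]

# Split on TEACHER_ID, which determines TEACHER_AGE: non-additive
print(is_lossless({"TEACHER_ID", "TEACHER_AGE"}, {"TEACHER_ID", "SUBJECT"}, fds))  # True

# Split on TEACHER_AGE, which determines nothing else: not guaranteed non-additive
print(is_lossless({"TEACHER_ID", "TEACHER_AGE"}, {"TEACHER_AGE", "SUBJECT"}, fds))  # False
```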

First Normal Form (1NF)
• A relation is in 1NF if every attribute contains only atomic (indivisible) values.
• It states that an attribute of a table cannot hold multiple values; each attribute must
be single-valued.
• First normal form disallows multi-valued attributes, composite
attributes, and their combinations.
To bring relations into 1NF:
• Eliminate repeating columns in each table.
• Create a separate table for each set of related data.
• Identify each set of related data with a primary key.
• Ensure all attributes are single-valued and non-repeating.

First Normal Form (1NF)

Example: Relation EMPLOYEE is not in 1NF because of the multi-valued
attribute EMP_PHONE.
EMPLOYEE table
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 Florida
20 Harry 8574783832 Texas
12 Sam 7390372389, 8589830302 Washington

First Normal Form (1NF)

The decomposition of the EMPLOYEE table into 1NF is shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 Florida
14 John 9064738238 Florida
20 Harry 8574783832 Texas
12 Sam 7390372389 Washington
12 Sam 8589830302 Washington
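The 1NF decomposition above amounts to emitting one row per phone number. A minimal Python sketch, assuming the multi-valued EMP_PHONE cell is stored as a comma-separated string; phone numbers are kept as strings so that leading zeros would survive, and the function name `to_1nf` is hypothetical.

```python
# EMPLOYEE rows from the example; EMP_PHONE holds a comma-separated list,
# which violates 1NF.
employees = [
    (14, "John", "7272826385, 9064738238", "Florida"),
    (20, "Harry", "8574783832", "Texas"),
    (12, "Sam", "7390372389, 8589830302", "Washington"),
]


def to_1nf(rows):
    """Return one (EMP_ID, EMP_NAME, EMP_PHONE, EMP_STATE) row per phone number."""
    flat = []
    for emp_id, name, phones, state in rows:
        for phone in phones.split(","):
            flat.append((emp_id, name, phone.strip(), state))
    return flat


for row in to_1nf(employees):
    print(row)
```

Note that EMP_ID alone no longer identifies a row in the result; the combination {EMP_ID, EMP_PHONE} does.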

Second Normal Form (2NF)
• To be in 2NF, a relational schema must be in 1NF.
• In the second normal form, all non-key attributes are fully functionally dependent on the
primary key.
• 1NF PLUS every non-key attribute is fully functionally dependent on the
ENTIRE primary key.
• Every non-key attribute must be defined by the entire key, not by only
part of the key.
• No partial functional dependencies.

Second Normal Form (2NF)
Example: Assume a school stores data about teachers and the
subjects they teach. In a school, a teacher can teach more than one
subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38

Second Normal Form (2NF)
• In the given table, the candidate key is {TEACHER_ID, SUBJECT}. The non-prime attribute
TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset of that candidate key.
This partial dependency violates the rule for 2NF.
• To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38

TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
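The 2NF check looks for non-prime attributes determined by a proper subset of a candidate key. A minimal Python sketch using attribute closure; the FD TEACHER_ID → TEACHER_AGE and the candidate key {TEACHER_ID, SUBJECT} come from the example, while the helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, non_prime, fds):
    """List (subset, attribute) pairs where a proper subset of `key`
    determines a non-prime attribute, i.e. the partial FDs that break 2NF."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_prime):
                found.append((subset, attr))
    return found


fds = [({"TEACHER_ID"}, {"TEACHER_AGE"})]

# Original TEACHER table: key {TEACHER_ID, SUBJECT}
print(partial_dependencies({"TEACHER_ID", "SUBJECT"}, {"TEACHER_AGE"}, fds))
# [(('TEACHER_ID',), 'TEACHER_AGE')]

# TEACHER_DETAIL after decomposition: single-attribute key, no proper subsets
print(partial_dependencies({"TEACHER_ID"}, {"TEACHER_AGE"}, fds))  # []
```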

Third Normal Form (3NF)
• A relation is in 3NF if it is in 2NF and does not contain any transitive
dependency.
• 3NF is used to reduce data duplication and to help maintain data integrity.
• If a relation in 2NF has no transitive dependency of non-prime attributes on a key, then
the relation is in third normal form.
• A relation is in third normal form if at least one of the following
conditions holds for every non-trivial functional dependency X → Y:
– X is a super key.
– Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Third Normal Form (3NF)

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY


222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal

Super keys in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the
super key (EMP_ID). This violates the rule of third normal form.

That's why we need to move EMP_STATE and EMP_CITY to a new EMPLOYEE_ZIP
table, with EMP_ZIP as its primary key.

EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007

EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
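The 3NF condition (for every non-trivial X → Y, X is a super key or every attribute of Y outside X is prime) can be checked with attribute closure. A minimal Python sketch, assuming the FDs EMP_ID → {EMP_NAME, EMP_ZIP} and EMP_ZIP → {EMP_STATE, EMP_CITY} from the analysis above, and that candidate keys are supplied by hand ({EMP_ID} for the original table, {EMP_ZIP} for EMPLOYEE_ZIP). The sketch only tests the FDs it is given, so the FDs for each decomposed table are listed by hand; the helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violates_3nf(attributes, fds, candidate_keys):
    """Return the FDs X -> Y that break 3NF: X is not a super key and
    some attribute of Y - X is non-prime."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        extra = rhs - lhs
        if not extra:
            continue  # trivial FD, always allowed
        if closure(lhs, fds) >= attributes:
            continue  # X is a super key
        if extra <= prime:
            continue  # every dependent attribute is prime
        bad.append((sorted(lhs), sorted(extra)))
    return bad


emp_detail = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
fds = [
    ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),
    ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"}),
]
print(violates_3nf(emp_detail, fds, [{"EMP_ID"}]))
# [(['EMP_ZIP'], ['EMP_CITY', 'EMP_STATE'])]

# After decomposition (FDs listed by hand for each table)
print(violates_3nf({"EMP_ID", "EMP_NAME", "EMP_ZIP"}, fds[:1], [{"EMP_ID"}]))  # []
print(violates_3nf({"EMP_ZIP", "EMP_STATE", "EMP_CITY"}, fds[1:], [{"EMP_ZIP"}]))  # []
```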

Exercise
• Consider the schema for relation T, as well as all FDs.
• What is the normal form of T?
• If T violates 3NF, provide a 3NF decomposition that satisfies the FDs
(including the primary key) and does not produce spurious tuples.
• Show and explain all steps of your analysis and decomposition (if
applicable).

Answer - 1

Answer - 2

Answer - 3

Answer - 4

Answer - 5

Supplier_ID Status City Part_ID Qty

Summary

Normal Form Description

1NF A relation is in 1NF if all of its attributes contain only atomic values.

2NF A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally
dependent on the primary key. No partial dependency.

3NF A relation is in 3NF if it is in 2NF and no transitive dependency exists.

Summary

• Normalization is the theory and process by which to evaluate and
improve relational database design
– Makes the schema informative
– Minimizes information duplication
– Avoids modification anomalies
– Disallows spurious tuples

• Make sure all your relations are at least in 3NF!
– Higher normal forms exist
– We may relax this (denormalize) during physical design

DBMS - End of Lesson

Any Questions
