Databases Lecture 5

The document discusses database normalization and different normal forms including first, second, third and Boyce-Codd normal forms. It explains key concepts like functional dependencies and anomalies. The goals of normalization are to eliminate redundant data and ensure logical data storage. Normalization is important for database design and modifying existing databases.

Database Systems

Prof. Ahmed Awad

Lecture 5. Functional Dependencies

Outline of Today’s Lecture

• Functional Dependencies
• Normalisation
• Types of Normal Form
• 1st Normal Form (1NF)
• 2nd Normal Form (2NF)
• 3rd Normal Form (3NF)
• Boyce-Codd Normal Form
• There are higher normal forms but we do not cover them in this course

Functional Dependencies
• A functional dependency is a constraint between two sets of attributes in a relation.
• A functional dependency says that if two tuples have the same values for attributes A1, A2, ..., An, then
those two tuples must also have the same values for attributes B1, B2, ..., Bm.
• A functional dependency is written with an arrow (→): X → Y means that X functionally
determines Y.
• The attributes on the left-hand side determine the values of the attributes on the right-hand side.
• The data in the real world follows a set of rules. For example, a person's name cannot consist of
numeric characters.
• Some constraints that apply to a university database are given below:
▪ Students and teachers are identified by their unique IDs.
▪ Every student and teacher has only one name.
▪ Every student and teacher is associated with only one department.
▪ Each department has one value for the chairperson, building and budget.

Functional Dependencies
• Example

student_id  name   department  address
5           John   CS          Tallinn
9           Bob    IT          Tartu
10          Alice  SE          Narva
3           John   IT          Parnu

• The student_id is unique in the above table, so we can retrieve the record of any student through student_id.
• If two or more students have the same name, we can tell them apart by student_id.

Functional Dependencies Definition
• Let R be a relational schema, with
α ⊆ R and β ⊆ R
• The functional dependency
α → β

holds on R if and only if, for every legal relation r(R), whenever two tuples t1 and t2 of r agree on the
attributes α, they also agree on the attributes β. That is,

t1[α] = t2[α] ⇒ t1[β] = t2[β]

• Example: In the student table, student_id → name holds, but name → student_id does not, because two students are called John.
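
The definition above can be tested mechanically on a table instance. The following is a minimal sketch, assuming rows are stored as Python dictionaries; the function name fd_holds is hypothetical. Note that one instance can only refute a dependency: a dependency holds on the schema only if it holds in every legal instance.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if lhs -> rhs is satisfied by this relation instance."""
    seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # setdefault keeps the first rhs value recorded for this lhs value;
        # a different rhs value for the same lhs value violates the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


# The student table from the example slide.
students = [
    {"student_id": 5, "name": "John", "department": "CS", "address": "Tallinn"},
    {"student_id": 9, "name": "Bob", "department": "IT", "address": "Tartu"},
    {"student_id": 10, "name": "Alice", "department": "SE", "address": "Narva"},
    {"student_id": 3, "name": "John", "department": "IT", "address": "Parnu"},
]

print(fd_holds(students, ["student_id"], ["name"]))  # True
print(fd_holds(students, ["name"], ["student_id"]))  # False: two students named John
```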

Armstrong’s Axioms
• Armstrong's Axioms are a set of inference rules that, when applied repeatedly, generate the closure of a set of functional
dependencies.
• If F is a set of functional dependencies, then the closure of F, denoted F+, is the set of all functional
dependencies logically implied by F.
• Reflexive rule − If α is a set of attributes and β ⊆ α, then α → β holds.
• Augmentation rule − If a → b holds and y is a set of attributes, then ay → by also holds. That is, adding
the same attributes to both sides of a dependency does not invalidate it.
• Transitivity rule − As with the transitive rule in algebra, if a → b holds and b → c holds, then a → c also
holds. In a → b, a is said to functionally determine b.
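
In practice, F+ is rarely enumerated. A common shortcut is to compute the closure of an attribute set: all attributes it determines under F. The sketch below assumes the university constraints from the earlier slide are written as two dependencies; the attribute names and the helper names fd, attribute_closure and implies are illustrative, not part of the lecture.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def attribute_closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes functionally determined by attrs under fds."""
    closure = set(attrs)  # reflexivity: attrs determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already covers lhs, it also determines rhs.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def implies(fds: List[FD], lhs: str, rhs: str) -> bool:
    """True if lhs -> rhs belongs to the closure F+ of fds."""
    return set(rhs.split()) <= attribute_closure(lhs.split(), fds)


# Assumed encoding of the university constraints.
university = [
    fd("student_id", "name department"),
    fd("department", "chairperson building budget"),
]

print(sorted(attribute_closure({"student_id"}, university)))
print(implies(university, "student_id", "budget"))  # True, by transitivity
print(implies(university, "department", "name"))    # False
```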

Trivial Functional Dependency

• Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a
trivial FD. Trivial FDs always hold.

• Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial FD.

• Completely non-trivial − If an FD X → Y holds, where X ∩ Y = ∅, it is said to be a completely
non-trivial FD.
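
The three cases can be told apart with plain set operations. This is a small sketch; the function name classify_fd and the sample attribute sets are assumptions made for illustration.

```python
from typing import Set


def classify_fd(lhs: Set[str], rhs: Set[str]) -> str:
    """Classify X -> Y by how Y overlaps with X."""
    if rhs <= lhs:
        return "trivial"  # Y is a subset of X
    if lhs & rhs:
        return "non-trivial"  # Y is not a subset of X, but they overlap
    return "completely non-trivial"  # X and Y share no attribute


print(classify_fd({"student_id", "name"}, {"name"}))
print(classify_fd({"student_id", "name"}, {"name", "department"}))
print(classify_fd({"student_id"}, {"name"}))
```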

Normalization

• Normalization is the process of making a database consistent by reducing redundancy and
ensuring the integrity of data through lossless decomposition.
• It decomposes unsatisfactory relations by splitting their
attributes into smaller relations.
• If the design of a database is not thorough, it may contain anomalies.
• Managing a database with anomalies is difficult and error-prone.
• There are three types of anomalies that can affect interaction with a database.
▪ Update anomalies − If the same data item is stored in several places, an update may reach only
some of the copies.
▪ For example, when we update a data item that has copies in many rows or tables, it is
very likely that some copies are updated while others remain unchanged, leading to
inconsistency.
▪ Deletion anomalies − Deleting a record can leave copies of its data behind in other places, or can
unintentionally remove other facts that were stored only in that record.
▪ Insert anomalies − A fact cannot be inserted because the record it would have to be stored in does not exist yet.
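
A small sketch can make the update and deletion anomalies concrete. It borrows the teacher table used later in this lecture for 2NF (teacher_id, subject, teacher_age); the helper name inconsistent_teachers is hypothetical.

```python
from typing import Dict, List, Set

rows = [
    {"teacher_id": 2, "subject": "Calculus", "teacher_age": 30},
    {"teacher_id": 2, "subject": "Algebra", "teacher_age": 30},
    {"teacher_id": 4, "subject": "Programming", "teacher_age": 35},
    {"teacher_id": 4, "subject": "Software Testing", "teacher_age": 35},
    {"teacher_id": 6, "subject": "Database", "teacher_age": 38},
]


def inconsistent_teachers(table: List[dict]) -> List[int]:
    """Teachers for whom the table stores more than one age."""
    ages: Dict[int, Set[int]] = {}
    for row in table:
        ages.setdefault(row["teacher_id"], set()).add(row["teacher_age"])
    return sorted(t for t, seen in ages.items() if len(seen) > 1)


# Update anomaly: only one of the two copies of teacher 2's age is changed.
rows[0]["teacher_age"] = 31
print(inconsistent_teachers(rows))  # [2]

# Deletion anomaly: removing teacher 6's only subject also loses the age.
remaining = [r for r in rows if r["teacher_id"] != 6 or r["subject"] != "Database"]
known_after_delete = {r["teacher_id"] for r in remaining}
print(6 in known_after_delete)  # False
```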

Normalization

• Normalization should be part of the database design process.
• However, it is difficult to separate the normalization process from the ER modelling process,
so the two techniques should be used concurrently.
• An Entity Relationship Diagram (ERD) provides the big picture, or macro view, of an
organization’s data requirements and operations.
• It is created through an iterative process that involves identifying relevant entities, their
attributes and their relationships.
• The normalization procedure focuses on the characteristics of specific entities and represents
the micro view of the entities in the ERD after they have been transformed into relational tables.

Normalization (Cont.)

• Normalization rules divide larger tables into smaller tables and link them using
relationships.
• The purpose of normalization is to eliminate redundant (repetitive) data and ensure data is
stored logically.
• The theory of normalization was introduced by Edgar Codd with the introduction of First
Normal Form (1NF).
• Edgar Codd extended his theory with Second and Third Normal Form.
• Later on, Edgar Codd joined Raymond F. Boyce to develop the theory of Boyce-Codd Normal
Form.

Normalization (Cont.)

Importance of Normalization
• During the normalization step of database design, we ensure that the proposed entities
meet the required normal form before the table structures are created.
• Many real-world databases were improperly designed, or have become burdened with anomalies
through improper modifications over time.
• Database Administrators may be asked to redesign and modify existing databases.
• This can be a large undertaking if the tables are not properly normalized.

Normalization (Cont.)

Goals of Normalization
• Normalization is the branch of relational theory that provides design insights. It is the
process of determining how much redundancy exists in a table.
• The goals of normalization are to:
▪ Be able to characterize the level of redundancy in a relational schema.
▪ Provide mechanisms for transforming schemas in order to remove redundancy.

Types of Normal Form

• Seven normal forms are commonly listed:
1. 1NF (First Normal Form)
2. 2NF (Second Normal Form)
3. 3NF (Third Normal Form)
4. BCNF (Boyce-Codd Normal Form)
5. 4NF (Fourth Normal Form)
6. 5NF (Fifth Normal Form)
7. 6NF (Sixth Normal Form)
• In practical applications, normalizing up to Third Normal Form (3NF) is usually sufficient.
• We will discuss four of them, 1NF, 2NF, 3NF and Boyce-Codd
Normal Form, as these are the standard normal forms used.

First Normal Form (1NF)

• A table is in 1NF if every attribute contains only atomic (single) values.
• An attribute of a table cannot hold multiple values; it must be single-valued.
• The domain of an attribute should not change: all values in a column come from the same domain.
• Each attribute/column of a table must have a unique name.
• The first normal form disallows multi-valued attributes, composite attributes, and their
combinations.
• Do you recall the mapping rules for composite and multi-valued attributes in an
ERD?

First Normal Form (1NF) (Cont.)

• Example: An instance of the Employee table.

emp_id  emp_name  emp_phone
1       John      1212112, 2154887
2       Bob       1233214
3       Alice     5847415, 9574185

First Normal Form (1NF) (Cont.)

• Employee table, after decomposing into 1NF.
• Each multi-valued entry in the emp_phone column is replaced by single values, with a separate row
for each additional value.

emp_id  emp_name  emp_phone
1       John      1212112
1       John      2154887
2       Bob       1233214
3       Alice     5847415
3       Alice     9574185
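
The same transformation can be sketched in a few lines. The example assumes that the multi-valued column is stored as a comma-separated string; the function name to_1nf is hypothetical.

```python
from typing import Dict, List

employees = [
    {"emp_id": 1, "emp_name": "John", "emp_phone": "1212112, 2154887"},
    {"emp_id": 2, "emp_name": "Bob", "emp_phone": "1233214"},
    {"emp_id": 3, "emp_name": "Alice", "emp_phone": "5847415, 9574185"},
]


def to_1nf(rows: List[Dict], column: str, sep: str = ",") -> List[Dict]:
    """Emit one row per value of a multi-valued column."""
    result = []
    for row in rows:
        for value in str(row[column]).split(sep):
            new_row = dict(row)  # copy the single-valued attributes
            new_row[column] = value.strip()
            result.append(new_row)
    return result


flat = to_1nf(employees, "emp_phone")
for row in flat:
    print(row["emp_id"], row["emp_name"], row["emp_phone"])
```

After the split, emp_id alone no longer identifies a row; the pair (emp_id, emp_phone) does.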

Second Normal Form (2NF)

• A table is said to be in 2NF if:
• The table is in 1NF.
• There is no Partial Dependency.
• Partial Dependency: an attribute in a table depends on only a part of the primary key
and not on the whole key.
• In the second normal form, all non-key attributes are fully functionally dependent on the
primary key.

Second Normal Form (2NF) (Cont.)

• Example: An instance of a table of teachers in a university, where a teacher can teach more than one
subject.

teacher_id  subject           teacher_age
2           Calculus          30
2           Algebra           30
4           Programming       35
4           Software Testing  35
6           Database          38

• The table above is in 1NF because each attribute has a single value.
• It is not in 2NF: the primary key is (teacher_id, subject), but teacher_age depends on teacher_id alone, which is a partial dependency.
• To remove the partial dependency, the table is divided.
• The attribute causing the partial dependency is moved to another table where it fits.
Second Normal Form (2NF) (Cont.)
• To convert the given table into 2NF, the table is decomposed into two tables.
• teacher_details Table

teacher_id  teacher_age
2           30
4           35
6           38

• teacher_subjects Table

teacher_id  subject
2           Calculus
2           Algebra
4           Programming
4           Software Testing
6           Database
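
The decomposition is two projections of the original table, and it is lossless if joining them on teacher_id gives back exactly the original rows. A minimal sketch follows; the helper names project, natural_join and as_set are hypothetical.

```python
from typing import Dict, List, Sequence

teachers = [
    {"teacher_id": 2, "subject": "Calculus", "teacher_age": 30},
    {"teacher_id": 2, "subject": "Algebra", "teacher_age": 30},
    {"teacher_id": 4, "subject": "Programming", "teacher_age": 35},
    {"teacher_id": 4, "subject": "Software Testing", "teacher_age": 35},
    {"teacher_id": 6, "subject": "Database", "teacher_age": 38},
]


def project(rows: List[Dict], columns: Sequence[str]) -> List[Dict]:
    """Relational projection: keep the given columns, drop duplicate rows."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


def natural_join(left: List[Dict], right: List[Dict]) -> List[Dict]:
    """Join two tables on all column names they share."""
    if not left or not right:
        return []
    common = [c for c in left[0] if c in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[c] == r[c] for c in common)]


def as_set(rows: List[Dict]) -> set:
    """Order-independent view of a table, for comparison."""
    return {tuple(sorted(r.items())) for r in rows}


teacher_details = project(teachers, ["teacher_id", "teacher_age"])
teacher_subjects = project(teachers, ["teacher_id", "subject"])
rejoined = natural_join(teacher_subjects, teacher_details)

print(len(teacher_details), len(teacher_subjects))  # 3 5
print(as_set(rejoined) == as_set(teachers))         # True: nothing lost or invented
```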
Third Normal Form (3NF)
• A table is said to be in 3NF if:

▪ The table is in 2NF.

▪ The table has no Transitive Dependency.

• Equivalently, a table is in 3NF if it is in 2NF and, for each non-trivial functional
dependency X → Y, at least one of the following conditions holds:

▪ X is a super key of the table

OR

▪ Y is a prime attribute of the table (an attribute that belongs to some candidate key)

Third Normal Form (3NF) (Cont.)
• Example: We have three tables: students, subjects and score.

• students Table

student_id  name   reg_no  department  address
5           John   AB-11   CS          Tallinn
9           Bob    AC-22   IT          Tartu
10          Alice  BB-22   SE          Narva

• subjects Table

subject_id  subject_name         teacher
1           Programming          Smith
2           Calculus             Andrus
3           Database             Arthur
6           Mechanical Workshop  Joseph

Third Normal Form (3NF) (Cont.)
• score Table

student_id  subject_id  marks
5           1           60
5           2           25
9           1           65
11          6           150

• To record the exam name and the maximum marks for that exam, we add two more columns to the score table.

student_id  subject_id  marks  exam_name  max_marks
5           1           60     Theory     70
5           2           25     Practical  30
9           1           65     Theory     70
11          6           150    Workshop   200
Third Normal Form (3NF) (Cont.)
• Explaining Transitive Dependency
• The primary key of the score table is a composite key made up of two attributes/columns, i.e.,
student_id + subject_id.
• The new column exam_name depends on the student as well as the subject.
▪ Example: A mechanical engineering student has a workshop exam, whereas a computer science student does not.
Likewise, some subjects have a practical exam, while others have
only theoretical exams.
• We can conclude that exam_name depends on both student_id and subject_id.

Third Normal Form (3NF) (Cont.)
• The max_marks column depends on exam_name, because the maximum score changes with the exam type (fewer marks
for practical and more marks for theory).

• The column exam_name is neither the primary key nor part of the primary key of the score table.

• However, the column max_marks depends on the exam_name column.

• This type of dependency is called a Transitive Dependency: a non-prime attribute depends on
another non-prime attribute rather than directly on the primary key attributes.
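
The 3NF condition (X is a super key, or Y is prime) can be checked from the dependencies alone. The sketch below assumes the score table's dependencies are {student_id, subject_id} → {marks, exam_name} and {exam_name} → {max_marks}, as argued on these slides; the function names are hypothetical, and candidate keys are found by brute force, which is only sensible for small schemas.

```python
from itertools import combinations
from typing import List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Set[str], fds: List[FD]) -> Set[str]:
    """All attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes: Set[str], fds: List[FD]) -> List[Set[str]]:
    """Minimal attribute sets whose closure is the whole schema."""
    keys: List[Set[str]] = []
    ordered = sorted(attributes)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def violations_3nf(attributes: Set[str], fds: List[FD]) -> List[Tuple[List[str], List[str]]]:
    """FDs X -> Y where X is not a super key and Y has a non-prime attribute."""
    prime = set().union(*candidate_keys(attributes, fds))
    found = []
    for lhs, rhs in fds:
        non_prime_targets = rhs - lhs - prime
        if closure(lhs, fds) != attributes and non_prime_targets:
            found.append((sorted(lhs), sorted(non_prime_targets)))
    return found


score_attrs = {"student_id", "subject_id", "marks", "exam_name", "max_marks"}
score_fds = [
    ({"student_id", "subject_id"}, {"marks", "exam_name"}),
    ({"exam_name"}, {"max_marks"}),
]

print(candidate_keys(score_attrs, score_fds))
print(violations_3nf(score_attrs, score_fds))  # [(['exam_name'], ['max_marks'])]
```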

Third Normal Form (3NF) (Cont.)
• Removing the Transitive Dependency and transforming the table to 3NF.
• We decompose the score table into two tables, the score table and the exam table.
• Take the columns exam_name and max_marks out of the score table and create a new table
exam.

• The score table before removing the Transitive Dependency.

student_id  subject_id  marks  exam_name  max_marks
5           1           60     Theory     70
5           2           25     Practical  30
9           1           65     Theory     70
11          6           150    Workshop   200

Third Normal Form (3NF) (Cont.)
• The score table after removing the Transitive Dependency.

student_id  subject_id  marks  exam_id
5           1           60     1
5           2           25     2
9           1           65     1
11          6           150    3

• The newly created exam table.

exam_id  exam_name  max_marks
1        Theory     70
2        Practical  30
3        Workshop   200

• After removing the Transitive Dependency, the amount of duplicated data is reduced.
• Data integrity improves, because the maximum marks of each exam are stored in exactly one place.
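
The split into a score table and an exam table can be sketched as follows. The surrogate key exam_id is numbered in order of first appearance, which is an assumption that happens to reproduce the ids shown above; the function name extract_lookup is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

score = [
    {"student_id": 5, "subject_id": 1, "marks": 60, "exam_name": "Theory", "max_marks": 70},
    {"student_id": 5, "subject_id": 2, "marks": 25, "exam_name": "Practical", "max_marks": 30},
    {"student_id": 9, "subject_id": 1, "marks": 65, "exam_name": "Theory", "max_marks": 70},
    {"student_id": 11, "subject_id": 6, "marks": 150, "exam_name": "Workshop", "max_marks": 200},
]


def extract_lookup(rows: List[Dict], columns: Sequence[str],
                   id_column: str) -> Tuple[List[Dict], List[Dict]]:
    """Move the given columns into a lookup table referenced by id_column."""
    ids: Dict[tuple, int] = {}
    lookup_rows: List[Dict] = []
    reduced: List[Dict] = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in ids:
            ids[key] = len(ids) + 1  # next surrogate id
            lookup_rows.append({id_column: ids[key], **dict(zip(columns, key))})
        slim = {k: v for k, v in row.items() if k not in columns}
        slim[id_column] = ids[key]
        reduced.append(slim)
    return reduced, lookup_rows


score_3nf, exam = extract_lookup(score, ["exam_name", "max_marks"], "exam_id")

for row in exam:
    print(row)
for row in score_3nf:
    print(row)
```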
Boyce-Codd Normal Form (BCNF)

• Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form (3NF).
• BCNF is also known as 3.5 Normal Form.
• This Normal Form deals with certain types of anomalies that are not handled by 3NF.
• A 3NF table which does not have multiple overlapping candidate keys is also in
BCNF.
• For a table to be in BCNF, the following conditions should be satisfied.
▪ The relation (table) must be in 3NF.
▪ For every non-trivial dependency X → Y, X must be a super key. In particular, a dependency A → B in which B is
a prime attribute is not allowed unless A is a super key.
Boyce-Codd Normal Form (BCNF)
• Example: We take an instance of the university_enrollment table, which has the columns student_id,
subject_name and teacher.

student_id  subject_name  teacher
11          Programming   Smith
11          Database      Arthur
20          Programming   Mosh
33          Calculus      Andrus
44          Database      Arthur
51          Programming   Smith

Boyce-Codd Normal Form (BCNF)

• In the university_enrollment table, we can observe that:
▪ One student can be enrolled in multiple subjects.
▪ Example: The student with student_id = 11 is enrolled in the Programming and Database
courses.
▪ For each subject a student takes, exactly one teacher is assigned to that student.
▪ Several teachers can teach the same subject, e.g., Programming is taught by Smith
and also by Mosh.
Boyce-Codd Normal Form (BCNF)

• The primary key of the above table is (student_id, subject_name).
• We can find any record in the table using student_id and subject_name.
• In this case, one teacher can teach only one subject; however, one subject can have multiple
teachers.
• There is therefore a dependency between teacher and subject_name, where subject_name depends on
teacher (teacher → subject_name).
Boyce-Codd Normal Form (BCNF)

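• This table satisfies 1NF, as all the columns have atomic (single) values, all the columns have
unique names, and the values of each column come from the same domain.
• The 2NF conditions are also satisfied, as there is no Partial Dependency: teacher depends on the whole key (student_id, subject_name).
• The table also satisfies the conditions of 3NF: the only dependency whose left-hand side is not a super key is
teacher → subject_name, and its right-hand side, subject_name, is a prime attribute.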

Boyce-Codd Normal Form (BCNF)

• However, this table is still not in Boyce-Codd Normal Form.
• In the table, student_id and subject_name form the primary key, so subject_name is a
prime attribute.
• There is also the dependency teacher → subject_name. Here subject_name is a prime attribute,
but teacher is not part of the primary key and is not a super key, since it does not determine student_id.
• This type of dependency is not allowed by BCNF.
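
The BCNF condition can be checked in the same mechanical way as the other normal forms. The sketch below assumes the two dependencies discussed on these slides, {student_id, subject_name} → {teacher} and {teacher} → {subject_name}; the function names are hypothetical and the key search is brute force, suitable only for small schemas.

```python
from itertools import combinations
from typing import List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Set[str], fds: List[FD]) -> Set[str]:
    """All attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes: Set[str], fds: List[FD]) -> List[List[str]]:
    """Minimal attribute sets whose closure is the whole schema."""
    keys: List[Set[str]] = []
    ordered = sorted(attributes)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return [sorted(key) for key in keys]


def bcnf_violations(attributes: Set[str], fds: List[FD]) -> List[Tuple[List[str], List[str]]]:
    """Non-trivial FDs whose left-hand side is not a super key."""
    return [(sorted(lhs), sorted(rhs)) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != attributes]


enrollment_attrs = {"student_id", "subject_name", "teacher"}
enrollment_fds = [
    ({"student_id", "subject_name"}, {"teacher"}),
    ({"teacher"}, {"subject_name"}),
]

print(candidate_keys(enrollment_attrs, enrollment_fds))
print(bcnf_violations(enrollment_attrs, enrollment_fds))  # [(['teacher'], ['subject_name'])]
```

Under these assumed dependencies the search reports two overlapping candidate keys, (student_id, subject_name) and (student_id, teacher). That overlap is what allows a table to satisfy 3NF while still violating BCNF.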
Boyce-Codd Normal Form (BCNF)
• To remove the dependency, we divide the table into two tables, the teacher table and the student table.
• The teacher table:

teacher_id  teacher  subject_name
1           Smith    Programming
2           Arthur   Database
3           Mosh     Programming
4           Andrus   Calculus

• The student table:

student_id  teacher_id
11          1
11          2
20          3
33          4
44          2
51          1
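
The decomposition can be reproduced and checked with a short sketch. It assumes that teacher names are unique, so that each name receives one surrogate teacher_id in order of first appearance; the function name decompose is hypothetical.

```python
from typing import Dict, List, Tuple

# (student_id, subject_name, teacher) rows of university_enrollment.
enrollment = [
    (11, "Programming", "Smith"),
    (11, "Database", "Arthur"),
    (20, "Programming", "Mosh"),
    (33, "Calculus", "Andrus"),
    (44, "Database", "Arthur"),
    (51, "Programming", "Smith"),
]


def decompose(rows: List[Tuple[int, str, str]]):
    """Split enrollment into a teacher table and a student table."""
    teacher_ids: Dict[str, int] = {}
    teacher_table: List[Tuple[int, str, str]] = []
    student_table: List[Tuple[int, int]] = []
    for student_id, subject_name, teacher in rows:
        if teacher not in teacher_ids:
            teacher_ids[teacher] = len(teacher_ids) + 1  # next surrogate id
            teacher_table.append((teacher_ids[teacher], teacher, subject_name))
        student_table.append((student_id, teacher_ids[teacher]))
    return teacher_table, student_table


teacher_table, student_table = decompose(enrollment)

# Joining on teacher_id should give back the original rows.
by_id = {tid: (name, subject) for tid, name, subject in teacher_table}
rejoined = [(sid, by_id[tid][1], by_id[tid][0]) for sid, tid in student_table]

print(teacher_table)
print(student_table)
print(rejoined == enrollment)  # True
```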