Databases Lecture 5
Outline of Today’s Lecture
• Functional Dependencies
• Normalization
• Types of Normal Form
• 1st Normal Form (1NF)
• 2nd Normal Form (2NF)
• 3rd Normal Form (3NF)
• Boyce-Codd Normal Form
• There are higher normal forms but we do not cover them in this course
Functional Dependencies
• A functional dependency is a constraint between two sets of attributes in a relation.
• It says that if two tuples have the same values for attributes A1, A2, ..., An, then those two tuples must also have the same values for attributes B1, B2, ..., Bm.
• A functional dependency is written with an arrow (→): X → Y means that X functionally determines Y.
• The left-hand side attributes determine the values of the attributes on the right-hand side.
• Real-world data is subject to rules. For example, a person's name cannot consist of numeric characters.
• Some constraints that could apply to a university database are given below:
Students and teachers are identified by their unique IDs.
Every student and teacher has only one name.
Every student and teacher is associated with only one department.
Each department has one value for the chairperson, building and budget.
Functional Dependencies
• Example
• The student_id is unique in the student table, so a student's record can be retrieved through student_id.
• If two or more students have the same name, we can tell them apart by student_id.
Functional Dependencies Definition
• Let R be a relational schema, with
α ⊆ R and β ⊆ R
• The functional dependency
α → β
holds on R if and only if, for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
• Example: In the student table, student_id → name holds, but name → student_id does not.
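The definition can be tested against a concrete set of rows. The sketch below is only illustrative: the function name fd_holds and the sample student rows are assumptions, not part of the lecture. Note that passing the check on one instance does not prove the dependency holds on the schema; failing it does show the dependency is violated.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of column names.
    """
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# Hypothetical student rows; two students share the name "Ali".
students = [
    {"student_id": 1, "name": "Ali"},
    {"student_id": 2, "name": "Sara"},
    {"student_id": 3, "name": "Ali"},
]

id_determines_name = fd_holds(students, ["student_id"], ["name"])
name_determines_id = fd_holds(students, ["name"], ["student_id"])
print(id_determines_name, name_determines_id)
```

With these rows, student_id → name holds while name → student_id does not, because two different ids share one name.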
Armstrong’s Axioms
• Armstrong's Axioms are a set of inference rules that, when applied repeatedly, generate the closure of a set of functional dependencies.
• If F is a set of functional dependencies, then the closure of F, denoted F+, is the set of all functional dependencies logically implied by F.
• Reflexivity rule − If α is a set of attributes and β ⊆ α, then α → β holds.
• Augmentation rule − If α → β holds and γ is a set of attributes, then αγ → βγ also holds. That is, adding the same attributes to both sides of a dependency does not invalidate it.
• Transitivity rule − As with transitivity in algebra, if α → β holds and β → γ holds, then α → γ also holds.
• In α → β, α is said to functionally determine β.
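Enumerating all of F+ is rarely practical. A common shortcut is to compute the closure of a set of attributes instead: X → Y is in F+ exactly when Y is contained in the closure of X. The sketch below is a minimal version of that fixed-point computation; the function name closure and the two dependencies are assumptions loosely based on the university constraints listed earlier.

```python
def closure(attrs, fds):
    """Compute the closure of the attribute set attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already determine all of lhs, we also determine rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Assumed dependencies for a university schema (illustrative only).
fds = [
    ({"student_id"}, {"name", "department"}),
    ({"department"}, {"chairperson", "building", "budget"}),
]

student_closure = closure({"student_id"}, fds)
print(sorted(student_closure))
```

Here student_id reaches the department attributes only through transitivity, which is why the loop has to repeat until nothing new is added.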
Trivial Functional Dependency
• A functional dependency α → β is trivial if β ⊆ α, for example {student_id, name} → name.
Normalization
• Normalization is a process of making the database consistent by reducing the redundancies and
ensuring the integrity of data through lossless decomposition.
• Normalization is the process of decomposing unsatisfactory/bad relations by breaking up their
attributes into smaller relations.
• If a database is not designed carefully, it may contain anomalies.
• Managing a database with anomalies is difficult.
• There are three types of anomalies that can affect interaction with a database.
Update anomalies − If copies of the same data item are scattered across tables and not linked properly, an update can leave the database inconsistent.
For example, when we update a data item that has copies in many other tables, it is very likely that some copies are updated while others are left unchanged.
Deletion anomalies − When a record is deleted, parts of it can be left behind because copies of the data are also stored in other tables that the deletion did not touch.
Insert anomalies − Data cannot be inserted because the record it belongs to does not exist yet.
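An update anomaly is easy to reproduce with a few rows. The sketch below uses hypothetical data (the enrolment rows and the helper buildings_for are assumptions) in which a department's building is repeated on every student row, and only one copy gets updated.

```python
# Hypothetical denormalized rows: the department's building is repeated per student.
enrolment = [
    {"student_id": 1, "department": "CS", "building": "Block A"},
    {"student_id": 2, "department": "CS", "building": "Block A"},
    {"student_id": 3, "department": "Maths", "building": "Block B"},
]

# The CS department moves, but only one of its rows is updated.
enrolment[0]["building"] = "Block C"


def buildings_for(rows, department):
    """Collect every building recorded for one department."""
    return {r["building"] for r in rows if r["department"] == department}


cs_buildings = buildings_for(enrolment, "CS")
print(cs_buildings)  # more than one building means the data is inconsistent
```

Storing the building once, in a separate department table, would make this partial update impossible.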
Normalization (Cont.)
• Normalization rules divide larger tables into smaller tables and link them using relationships.
• The purpose of normalization is to eliminate redundant (repetitive) data and ensure data is
stored logically.
• The theory of normalization was introduced by Edgar Codd with the introduction of First
Normal Form (1NF).
• Edgar Codd extended his theory with Second and Third Normal Form.
• Later on, Edgar Codd joined Raymond F. Boyce to develop the theory of Boyce-Codd Normal
Form.
Normalization (Cont.)
Importance of Normalization
• During the normalization process of database design, it is ensured that proposed entities
meet required normal form before table structures are created.
• Many real-world databases were improperly designed, or have become burdened with anomalies through improper modification over time.
• Database Administrators may be asked to redesign and modify existing databases.
• This can be a large undertaking if the tables are not properly normalized.
Normalization (Cont.)
Goals of Normalization
• Normalization is the branch of relational theory that provides design insights. It is the
process of determining how much redundancy exists in a table.
• The goals of normalization are to:
Be able to characterize the level of redundancy in a relational schema.
Provide mechanisms for transforming schemas in order to remove redundancy.
Types of Normal Form
First Normal Form (1NF)
First Normal Form (1NF) (Cont.)
1  John   1212112, 2154887
2  Bob    1233214
3  Alice  5847415, 9574185
First Normal Form (1NF) (Cont.)
1  John   1212112
1  John   2154887
2  Bob    1233214
3  Alice  5847415
3  Alice  9574185
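The conversion between the two tables above can be sketched in a few lines. The slide's header row is not available, so the column meanings used here (an id, a name and a phone number) are assumptions, as is the function name to_1nf.

```python
# Rows as in the first table: the third column may hold several values.
# Column meanings (id, name, phone) are assumed; the header row is not available.
unnormalized = [
    (1, "John", "1212112, 2154887"),
    (2, "Bob", "1233214"),
    (3, "Alice", "5847415, 9574185"),
]


def to_1nf(rows):
    """Emit one row per (id, name, phone) so every cell holds a single value."""
    result = []
    for person_id, name, phones in rows:
        for phone in phones.split(","):
            result.append((person_id, name, phone.strip()))
    return result


flat = to_1nf(unnormalized)
for row in flat:
    print(row)
```

The result repeats the id and name for each phone number, which is the redundancy that the later normal forms address.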
Second Normal Form (2NF)
Second Normal Form (2NF) (Cont.)
• Example: An instance of a table of teachers in a university, where a teacher can teach more than one subject.
teacher_id  subject           teacher_age
2           Calculus          30
2           Algebra           30
4           Programming       35
4           Software Testing  35
6           Database          38
• The table above is in 1NF because each attribute holds a single value.
• It is not in 2NF: the key is (teacher_id, subject), but teacher_age depends on teacher_id alone, which is a partial dependency.
• To remove the partial dependency, the table is divided: the attribute causing it is moved to another table where it fits.
Second Normal Form (2NF) (Cont.)
• To convert the given table into 2NF, the table is decomposed into two tables.
• teacher_details Table
teacher_id  teacher_age
2           30
4           35
6           38
• teacher_subjects Table
teacher_id  subject
2           Calculus
2           Algebra
4           Programming
4           Software Testing
6           Database
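The decomposition can be reproduced from the original teacher rows, and joining the two results on teacher_id should give the original table back. The following is a minimal sketch using plain tuples; the variable names are illustrative.

```python
teachers = [
    # (teacher_id, subject, teacher_age)
    (2, "Calculus", 30),
    (2, "Algebra", 30),
    (4, "Programming", 35),
    (4, "Software Testing", 35),
    (6, "Database", 38),
]

# teacher_age depends only on teacher_id, so it moves to its own table.
teacher_details = sorted({(tid, age) for tid, _, age in teachers})
teacher_subjects = [(tid, subject) for tid, subject, _ in teachers]

# Joining on teacher_id should give back the original rows (lossless decomposition).
age_of = dict(teacher_details)
rejoined = [(tid, subject, age_of[tid]) for tid, subject in teacher_subjects]

print(teacher_details)
print(rejoined == teachers)
```

Each teacher's age is now stored once, so changing it cannot leave two rows disagreeing.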
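Third Normal Form (3NF)
• A table is said to be in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on the primary key.
• Third Normal Form can also be explained as: a table is in 3NF if it is in 2NF and, for each non-trivial functional dependency X → Y, at least one of the following conditions holds:
X is a super key of the table
OR
Y is a prime attribute of the table (each attribute of Y is part of some candidate key)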
Third Normal Form (3NF) (Cont.)
• Example: We have 3 tables of students, subjects and score
Third Normal Form (3NF) (Cont.)
• score Table
student_id  subject_id  marks
5           1           60
5           2           25
9           1           65
11          6           150
• To record the exam name and the maximum marks for that exam, we add two more columns to the score table.
student_id  subject_id  marks  exam_name  max_marks
5           1           60     Theory     70
5           2           25     Practical  30
9           1           65     Theory     70
11          6           150    Workshop   200
Third Normal Form (3NF) (Cont.)
• Explaining Transitive Dependency
• The primary key of the score table is a composite key made up of two attributes/columns, i.e., student_id + subject_id.
• The new column exam_name depends on the student as well as the subject.
Example: A mechanical engineering student has a workshop exam, whereas a computer science student does not. Likewise, some subjects have a practical exam, while others have only theory exams.
• So exam_name is dependent on both student_id and subject_id.
Third Normal Form (3NF) (Cont.)
• The max_marks column depends on exam_name, because the maximum marks change with the exam type (fewer marks for practical and more marks for theory).
• The column exam_name is neither the primary key nor part of the primary key of the score table.
• This type of dependency is called a Transitive Dependency: a non-prime attribute depends on another non-prime attribute rather than directly on the primary key attributes.
Third Normal Form (3NF) (Cont.)
• To transform the table to 3NF, we remove the Transitive Dependency.
• We decompose the score table into two tables, the score table and the exam table.
• Take the columns exam_name and max_marks out of the score table and create a new table, exam.
Third Normal Form (3NF) (Cont.)
• The score table after removing Transitive Dependency.
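The split can be sketched as follows. The layout of the lecture's exam table is not available here, so this version makes an assumption: the exam table is keyed by exam_name, and the score table keeps exam_name as its reference to it. A design with a separate exam identifier would work the same way.

```python
score = [
    # (student_id, subject_id, marks, exam_name, max_marks)
    (5, 1, 60, "Theory", 70),
    (5, 2, 25, "Practical", 30),
    (9, 1, 65, "Theory", 70),
    (11, 6, 150, "Workshop", 200),
]

# exam table (assumed to be keyed by exam_name): max_marks is stored once per exam.
exam = {}
for _, _, _, exam_name, max_marks in score:
    # Guard: the split is only valid if exam_name -> max_marks holds.
    assert exam.setdefault(exam_name, max_marks) == max_marks

# score table without the transitively dependent column max_marks.
score_3nf = [(sid, sub, marks, name) for sid, sub, marks, name, _ in score]

print(exam)
print(score_3nf)
```

After the split, the maximum marks for an exam type are changed in one place only.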
Boyce-Codd Normal Form (BCNF)
• Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form (3NF).
• BCNF is also known as 3.5 Normal Form.
• This normal form deals with certain types of anomaly that are not handled by 3NF.
• A 3NF table that does not have multiple overlapping candidate keys is guaranteed to be in BCNF.
• For a table to be in BCNF, the following conditions should be satisfied.
The relation (table) must be in 3NF.
For any non-trivial dependency X → Y, X should be a super key; i.e., for a dependency A → B, A cannot be a non-prime attribute if B is a prime attribute.
Boyce-Codd Normal Form (BCNF)
• Example: We take an instance of a university_enrollment table with the columns student_id, subject_name and teacher.
student_id  subject_name  teacher
11          Programming   Smith
11          Database      Arthur
20          Programming   Mosh
33          Calculus      Andrus
44          Database      Arthur
51          Programming   Smith
• The primary key for the above table is the combination of student_id and subject_name.
• We can identify every record in the table using student_id and subject_name.
• In this case, one teacher can teach only one subject; however, one subject can have multiple teachers.
• There is therefore a dependency teacher → subject_name: subject_name depends on teacher.
• This table satisfies 1NF, as all the columns have atomic (single) values, all the columns have unique names, and the values in each column come from the same domain.
• The 2NF conditions are also satisfied, as there is no Partial Dependency.
• The table also satisfies the conditions of 3NF, as there is no Transitive Dependency.
• It is not in BCNF, however: teacher → subject_name holds, but teacher is not a super key.
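Whether the table meets the BCNF condition can be checked against the sample rows: the dependency teacher → subject_name should hold, and the question is whether teacher is a super key. This is a sketch over one instance only; the helper name determines and the column indexes are assumptions made for illustration.

```python
enrollment = [
    # (student_id, subject_name, teacher)
    (11, "Programming", "Smith"),
    (11, "Database", "Arthur"),
    (20, "Programming", "Mosh"),
    (33, "Calculus", "Andrus"),
    (44, "Database", "Arthur"),
    (51, "Programming", "Smith"),
]


def determines(rows, lhs, rhs):
    """True if rows that agree on the lhs column indexes also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


SUBJECT, TEACHER = 1, 2
teacher_gives_subject = determines(enrollment, [TEACHER], [SUBJECT])
# A super key must determine every column of the table.
teacher_is_super_key = determines(enrollment, [TEACHER], [0, 1, 2])

print(teacher_gives_subject, teacher_is_super_key)
```

The dependency holds but its left-hand side is not a super key (Smith appears with two different students), which is exactly the situation BCNF rules out.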
• The student table:
student_id  teacher_id
11          1
11          2
20          3
33          4
44          2
51          1
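The student table above can be derived from the enrollment rows by giving each teacher an identifier and moving teacher and subject_name into a table of their own. In this sketch the teacher table layout and the rule for assigning teacher_id (order of first appearance) are assumptions; that rule happens to reproduce the teacher_id values shown above.

```python
enrollment = [
    # (student_id, subject_name, teacher)
    (11, "Programming", "Smith"),
    (11, "Database", "Arthur"),
    (20, "Programming", "Mosh"),
    (33, "Calculus", "Andrus"),
    (44, "Database", "Arthur"),
    (51, "Programming", "Smith"),
]

# teacher table (assumed layout): each teacher appears once with the one subject taught.
teacher_ids = {}
teacher_table = []
for _, subject, teacher in enrollment:
    if teacher not in teacher_ids:
        teacher_ids[teacher] = len(teacher_ids) + 1  # ids in order of first appearance
        teacher_table.append((teacher_ids[teacher], teacher, subject))

# student table: each enrollment now refers to a teacher by id.
student_table = [(sid, teacher_ids[teacher]) for sid, _, teacher in enrollment]

# Joining the two tables on teacher_id recovers the original rows.
by_id = {tid: (teacher, subject) for tid, teacher, subject in teacher_table}
rejoined = [(sid, by_id[tid][1], by_id[tid][0]) for sid, tid in student_table]

print(student_table)
print(rejoined == enrollment)
```

In the teacher table every dependency has a key on its left-hand side, so the fact that a teacher teaches one subject is stored exactly once.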