Chapter 4 Logical Database Design Normalization, Redundancy and
Mekonnen K.
What is Redundancy?
▪ Redundancy refers to the duplication of data within a database system. While some degree of
redundancy is inevitable and even necessary for efficient data retrieval, excessive redundancy can lead
to various issues, including
- increased storage requirements,
- data inconsistency, and
- decreased performance.
▪ Redundancy in a DBMS refers to the storage of the same piece of data in multiple places.
▪ It can arise due to various reasons, such as
- denormalized database design,
- a lack of proper data modeling, and
- the replication of data for backup or distribution purposes.
Redundancy
▪ Row Level Redundancy:
▪ This occurs when entire rows (records) in a table are repeated or contain very similar information.
For example, the row (1, Jojo, 20) below appears twice:
1 Jojo 20
1 Jojo 20
2 Kit 25
Redundancy (Cont..)
▪ Column Level Redundancy:
▪ This happens when attributes (columns) store repetitive or derived information that can be
obtained from other columns or tables.
▪ Here no two rows are identical, because Sid is the primary key, but the values in the other
columns (course, faculty and salary data) can be repeated across rows (redundant column values).
Sid  Sname  Cid  Cname  Fid  Fname  Salary
1    AA     C1   DBMS   F1   Jojo   30000
2    BB     C2   JAVA   F2   KK     50000
What is an Anomaly?
▪ Problems that can occur in poorly planned, unnormalized databases where all the
data is stored in one table (a flat-file database).
▪ Types of Anomalies:
• Insert
• Delete
• Update
Anomalies in DBMS
▪ Insert Anomaly: An insert anomaly occurs when certain attributes cannot be inserted into the
database without the presence of other attributes.
▪ Delete Anomaly: A delete anomaly exists when certain attributes are lost because of the deletion
of other attributes.
▪ Update Anomaly: An update anomaly exists when some, but not all, instances of duplicated data
are updated.
Anomaly Example
▪ The University table consists of seven attributes: Sid, Sname, Cid, Cname, Fid,
Fname, and Salary. Sid acts as the key attribute (primary key) of the relation.
Insertion Anomaly
▪ Suppose a new faculty member joins the University, and the Database Administrator tries to insert the
faculty data into the University table. The insert fails because the new faculty member has no student
yet, and Sid is the primary key, which cannot be NULL. This type of anomaly is known as an insertion anomaly.
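The insertion anomaly can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch, assuming a University table with the seven attributes above and the two sample rows from the column-level redundancy example; the new faculty member (F3, NewFaculty, 45000) is hypothetical. Sid is declared `INT NOT NULL PRIMARY KEY` rather than `INTEGER PRIMARY KEY`, because SQLite would silently auto-assign a row id to a NULL `INTEGER PRIMARY KEY`.

```python
import sqlite3

# Single unnormalized table, as in the University example.
# "INT" (not "INTEGER") keeps Sid from becoming SQLite's auto-assigned rowid alias.
SCHEMA = """
CREATE TABLE University (
    Sid    INT NOT NULL PRIMARY KEY,
    Sname  TEXT,
    Cid    TEXT,
    Cname  TEXT,
    Fid    TEXT,
    Fname  TEXT,
    Salary INTEGER
)
"""

ROWS = [
    (1, "AA", "C1", "DBMS", "F1", "Jojo", 30000),
    (2, "BB", "C2", "JAVA", "F2", "KK", 50000),
]

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO University VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)


def insert_faculty_only(conn, fid, fname, salary):
    """Try to store a faculty member who has no student yet (Sid is NULL)."""
    try:
        conn.execute(
            "INSERT INTO University (Sid, Fid, Fname, Salary) VALUES (NULL, ?, ?, ?)",
            (fid, fname, salary),
        )
        return True
    except sqlite3.IntegrityError as err:
        # The primary key cannot be NULL, so the faculty data is rejected.
        print("Insertion anomaly:", err)
        return False


# Hypothetical new faculty member F3.
inserted = insert_faculty_only(conn, "F3", "NewFaculty", 45000)
f3_count = conn.execute("SELECT COUNT(*) FROM University WHERE Fid = 'F3'").fetchone()[0]
```

In a normalized design, faculty data would live in its own table keyed by Fid, so it could be inserted without any student.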
Delete Anomaly
▪ When the Database Administrator deletes the details of the student with Sid=2 from the University table,
the faculty and course information stored only in that row is deleted too and cannot be recovered.
SQL:
DELETE FROM University WHERE Sid=2;
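A small sqlite3 sketch of the same deletion, assuming the University table holds only the two sample rows from the column-level redundancy example. After the `DELETE`, nothing in the database records that faculty F2 or course C2 ever existed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE University (Sid INT NOT NULL PRIMARY KEY, Sname TEXT, "
    "Cid TEXT, Cname TEXT, Fid TEXT, Fname TEXT, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO University VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "AA", "C1", "DBMS", "F1", "Jojo", 30000),
        (2, "BB", "C2", "JAVA", "F2", "KK", 50000),
    ],
)

# Delete the student with Sid = 2, as in the SQL statement above.
conn.execute("DELETE FROM University WHERE Sid = 2")

# Faculty F2 and course C2 were stored only in that row, so they are gone too.
f2_left = conn.execute("SELECT COUNT(*) FROM University WHERE Fid = 'F2'").fetchone()[0]
c2_left = conn.execute("SELECT COUNT(*) FROM University WHERE Cid = 'C2'").fetchone()[0]
rows_left = conn.execute("SELECT COUNT(*) FROM University").fetchone()[0]
print(f2_left, c2_left, rows_left)  # 0 0 1
```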
Update Anomaly
▪ When the Database Administrator wants to change the salary of faculty F1 from 30000 to 40000 in
the University table, the salary has to be updated in more than one row because of data
redundancy. If any of those rows is missed, F1 ends up with two different salaries. This is an update anomaly.
SQL:
UPDATE University
SET Salary = 40000
WHERE Fid = 'F1';
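The sketch below shows both sides of the update anomaly with sqlite3. It assumes a hypothetical third row (Sid 3, student CC) that repeats faculty F1's data, since F1 must appear in more than one row for the anomaly to arise. A careless update keyed on Sid leaves F1 with two salaries; the slide's statement, keyed on Fid, touches every copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE University (Sid INT NOT NULL PRIMARY KEY, Sname TEXT, "
    "Cid TEXT, Cname TEXT, Fid TEXT, Fname TEXT, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO University VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "AA", "C1", "DBMS", "F1", "Jojo", 30000),
        (2, "BB", "C2", "JAVA", "F2", "KK", 50000),
        (3, "CC", "C1", "DBMS", "F1", "Jojo", 30000),  # hypothetical: F1 repeated
    ],
)

# A careless update that changes only one of F1's rows.
conn.execute("UPDATE University SET Salary = 40000 WHERE Sid = 1")
f1_salaries = sorted(
    {s for (s,) in conn.execute("SELECT Salary FROM University WHERE Fid = 'F1'")}
)
print(f1_salaries)  # [30000, 40000]: F1 now has two different salaries

# The statement from the slide updates every row that stores F1's salary.
cursor = conn.execute("UPDATE University SET Salary = 40000 WHERE Fid = 'F1'")
print(cursor.rowcount)  # 2
```

Note that string literals in SQL use single quotes; double quotes denote identifiers.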
What is Functional Dependency?
▪ A functional dependency (FD) is a relationship between two attributes, typically between the primary
key and other non-key attributes within a table.
▪ A functional dependency, denoted X → Y, is an association between two sets of attributes X and Y:
any two tuples that have the same values for X must also have the same values for Y.
▪ For example, SIN → Name, Address, Birthdate.
▪ Here, SIN determines Name, Address and Birthdate. So, SIN is the determinant and
Name, Address and Birthdate are the dependents.
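The definition can be tested directly on a table instance. This is a minimal sketch, assuming rows are Python dictionaries; the sample people are hypothetical. A single instance can only show that an FD is violated; it cannot prove the FD holds for all future data, which is a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """Check X -> Y on one relation instance: rows that agree on lhs must agree on rhs."""
    seen = {}
    for row in rows:
        x_value = tuple(row[a] for a in lhs)
        y_value = tuple(row[a] for a in rhs)
        # setdefault stores the first Y seen for this X and returns it afterwards.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


# Hypothetical sample data; the SIN values and names are made up.
people = [
    {"SIN": "111", "Name": "Abebe", "Address": "Addis Ababa", "Birthdate": "1990-01-05"},
    {"SIN": "222", "Name": "Sara", "Address": "Adama", "Birthdate": "1992-03-17"},
    {"SIN": "333", "Name": "Abebe", "Address": "Hawassa", "Birthdate": "1988-11-30"},
]

print(fd_holds(people, ["SIN"], ["Name", "Address", "Birthdate"]))  # True
print(fd_holds(people, ["Name"], ["SIN"]))  # False: two people named Abebe
```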
Functional Dependency
▪ Types of functional dependency
1. Fully-Functional Dependency
2. Partial Dependency
3. Transitive Dependency
4. Trivial Dependency
5. Multivalued Dependency
Functional Dependency
1. Full functional Dependency
▪ A functional dependency X → Y is said to be a full functional dependency if Y is
functionally dependent on X, and not on any proper subset of X.
- If you remove any attribute A from X, the dependency no longer holds.
- This means Y depends on the entire set X, and not just a part of it.
▪ For example,
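A full functional dependency can be checked with attribute closure (the set of all attributes that a given set determines). This is a minimal sketch, assuming FDs are stored as pairs of Python sets; the attribute Hours (hours an employee works on a project) is hypothetical. Only the subsets obtained by dropping one attribute need checking, because any smaller subset determines even less.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_dependency(lhs, rhs, fds):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False  # lhs -> rhs does not hold at all
    # Dropping any single attribute must break the dependency.
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


# Hypothetical FDs; "Hours" is illustrative only.
FDS = [
    ({"Emp_num", "Proj_num"}, {"Hours"}),
    ({"Emp_num"}, {"Emp_name"}),
]

print(is_full_dependency({"Emp_num", "Proj_num"}, {"Hours"}, FDS))     # True
print(is_full_dependency({"Emp_num", "Proj_num"}, {"Emp_name"}, FDS))  # False
```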
Functional Dependency
2. Partial functional Dependency
▪ A partial functional dependency occurs when a non-prime attribute is functionally dependent on part
(but not all) of a candidate key.
- Let X → Y be a functional dependency.
- If X is a composite candidate key (i.e., consists of two or more attributes), and
- There exists a proper subset A of X such that A → Y also holds,
- Then X → Y is a partial dependency, because Y is not fully dependent on the whole of X, just a
part of it.
- For example,
- If {Emp_num, Proj_num} → Emp_name but also Emp_num → Emp_name, then Emp_name is
partially functionally dependent on {Emp_num, Proj_num}.
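The definition translates into a search for a proper subset A of the composite key with A → Y. A minimal sketch, assuming FDs are stored as pairs of Python sets and using the two FDs from the example above; the attribute Salary in the usage check is hypothetical and determined by nothing.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependency_witness(key, attr, fds):
    """Return a smallest proper subset A of the composite key with A -> attr, or None."""
    key = sorted(key)
    for size in range(1, len(key)):  # proper subsets only
        for subset in combinations(key, size):
            if attr in closure(subset, fds):
                return set(subset)
    return None


FDS = [
    ({"Emp_num", "Proj_num"}, {"Emp_name"}),
    ({"Emp_num"}, {"Emp_name"}),
]

print(partial_dependency_witness({"Emp_num", "Proj_num"}, "Emp_name", FDS))  # {'Emp_num'}
```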
Functional Dependency
3. Transitive Dependency
▪ Suppose A → B and B → C.
▪ Functional dependencies are transitive, which means that we then also have the
functional dependency A → C.
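The transitivity rule can be applied mechanically until no new dependency appears. This sketch assumes single-attribute FDs stored as (determinant, dependent) pairs and uses the EmpNum/DeptNum/DeptName dependencies from this chapter's example.

```python
def apply_transitivity(fds):
    """Repeatedly apply: if A -> B and B -> C, then A -> C (single-attribute FDs)."""
    derived = set(fds)
    while True:
        new = {
            (a, c)
            for (a, b) in derived
            for (b2, c) in derived
            if b == b2 and a != c
        }
        if new <= derived:
            return derived  # nothing new can be derived
        derived |= new


FDS = {("EmpNum", "DeptNum"), ("DeptNum", "DeptName")}
result = apply_transitivity(FDS)
print(("EmpNum", "DeptName") in result)  # True
```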
Functional Dependency
3. Transitive Dependency
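▪ For example,
EmpNum → DeptNum
DeptNum → DeptName
so EmpNum → DeptName holds transitively.
4. Trivial Dependency
▪ A functional dependency X → Y is trivial if Y is a subset of X.
▪ For example, {Emp_num, Emp_name} → Emp_num is a trivial functional dependency since Emp_num is a subset of
{Emp_num, Emp_name}.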
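Triviality is a pure subset test, so every trivial FD for a given left-hand side can be listed. A small sketch, assuming attribute sets are Python sets of names:

```python
from itertools import combinations


def is_trivial(lhs, rhs):
    """X -> Y is trivial when Y is a subset of X."""
    return set(rhs) <= set(lhs)


def trivial_fds(lhs):
    """Enumerate every trivial FD lhs -> Y with a non-empty Y."""
    attrs = sorted(lhs)
    return [
        (tuple(attrs), combo)
        for size in range(1, len(attrs) + 1)
        for combo in combinations(attrs, size)
    ]


print(is_trivial({"Emp_num", "Emp_name"}, {"Emp_num"}))  # True
print(is_trivial({"Emp_num"}, {"Emp_name"}))             # False
print(len(trivial_fds({"Emp_num", "Emp_name"})))          # 3
```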
5. Multivalued Dependency
▪ A multivalued dependency occurs when there are two or more independent multivalued
attributes in a single table.
▪ For example, car_model →→ colour and car_model →→ manuf_year are multivalued dependencies, since
manuf_year and colour are both multivalued attributes and are independent of each other.
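On a table instance, X →→ Y holds when, for each X value, every Y value appears together with every value of the remaining attributes Z. A minimal sketch, assuming rows are dictionaries; the car data (model M1, two years, two colours) is hypothetical.

```python
from collections import defaultdict


def mvd_holds(rows, attrs, x, y):
    """Check X ->> Y on one relation instance.

    For every X value, the set of (Y, Z) combinations must be the full cross
    product of its Y values and its Z values, where Z = attrs - X - Y.
    """
    z = [a for a in attrs if a not in x and a not in y]
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in x)
        groups[key].add((tuple(row[a] for a in y), tuple(row[a] for a in z)))
    for pairs in groups.values():
        y_values = {p[0] for p in pairs}
        z_values = {p[1] for p in pairs}
        if len(pairs) != len(y_values) * len(z_values):
            return False  # some (Y, Z) combination is missing
    return True


ATTRS = ["car_model", "manuf_year", "colour"]
# Hypothetical data: model M1 is made in two years and offered in two colours.
cars = [
    {"car_model": "M1", "manuf_year": 2019, "colour": "Red"},
    {"car_model": "M1", "manuf_year": 2019, "colour": "Blue"},
    {"car_model": "M1", "manuf_year": 2020, "colour": "Red"},
    {"car_model": "M1", "manuf_year": 2020, "colour": "Blue"},
]

print(mvd_holds(cars, ATTRS, ["car_model"], ["colour"]))      # True
print(mvd_holds(cars[:3], ATTRS, ["car_model"], ["colour"]))  # False
```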
What is Normalization ?
▪ Normalization is the process of identifying the logical associations between data items and designing
a database that will represent such associations but without any type of anomalies.
▪ It is a database design technique that organizes tables in a manner that reduces redundancy and
dependency of data.
▪ Normalization is used to avoid redundancy and the problems arising out of redundancy.
▪ Normalization is the process of structuring the relationships between data to minimize
redundancy in relational tables and to avoid insertion, update and deletion anomalies.
▪ It divides large tables into smaller tables and links them through relationships.
▪ It removes redundant data and makes it easier to add, modify or delete table fields.
Purpose of Normalization
▪ Normalization is the process of efficiently organizing data in a database.
1. Eliminating redundant data (for example, storing the same data in more than
one table)
2. Ensuring data dependencies make sense (only storing related data in a table)
▪ Both of these are worthy goals as they reduce the amount of space a database
consumes and ensure that data is logically stored.
Purpose of Normalization
▪ The benefits of using a database that has a suitable set of relations are that the
database will be:
1. Easier for the user to access and maintain the data;
Normal forms
▪ Normalization defines rules that determine whether a relational table
satisfies a given normal form.
▪ We have various levels or steps in normalization called Normal Forms.
▪ Each normal form evaluates a relation against defined criteria; normalizing
to it removes problematic functional, multivalued and join dependencies
from the relation.
▪ The level of complexity, the strictness of the rules, and the amount of
decomposition increase as we move from a lower normal form to a higher one.
▪ A table in a relational database is said to be in a certain normal form if it
satisfies certain constraints.
▪ In a well-normalized table, inserting, updating or deleting data does not cause
anomalies, which helps improve the table's integrity and efficiency.
Normal forms
▪ The Theory of Data Normalization in SQL is still being developed further. For example, there are
▪ However, in most practical applications, normalization achieves its best in 3rd Normal Form.
First Normal Form (1NF)
1NF Example
▪ Example:
The following Course_Content relation is not in 1NF because the Content attribute contains
multiple values.
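Bringing such a relation into 1NF means giving every row a single, atomic Content value. A minimal sketch, assuming Content is stored as a comma-separated string; the Course_Content rows are hypothetical.

```python
def to_1nf(rows, multi_attr, sep=","):
    """Split a multi-valued attribute so that every row holds one atomic value."""
    flat = []
    for row in rows:
        for value in row[multi_attr].split(sep):
            new_row = dict(row)  # copy the other attributes unchanged
            new_row[multi_attr] = value.strip()
            flat.append(new_row)
    return flat


# Hypothetical Course_Content rows; Content holds a comma-separated list.
course_content = [
    {"Course": "Programming", "Content": "Java, C++"},
    {"Course": "Web", "Content": "HTML, PHP, ASP"},
]

for row in to_1nf(course_content, "Content"):
    print(row["Course"], row["Content"])
```

The result has one row per (Course, Content) pair, so the key of the 1NF table becomes {Course, Content}.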
1NF Example (Cont..)
Second Normal Form (2NF)
Prime and Non Prime Attributes
Prime attributes: The attributes that are part of some candidate key are called prime attributes.
Non-prime attributes: The attributes that are not part of any candidate key are called non-prime
attributes.
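Prime and non-prime attributes follow once the candidate keys are known. This sketch finds candidate keys by brute force (practical only for small relations), assuming FDs are stored as pairs of Python sets; the relation and the attribute Hours are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force search for minimal attribute sets whose closure is all of attrs."""
    attrs = sorted(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if set(attrs) <= closure(candidate, fds):
                keys.append(candidate)
    return keys


ATTRS = {"Emp_num", "Proj_num", "Emp_name", "Hours"}
FDS = [
    ({"Emp_num", "Proj_num"}, {"Hours"}),
    ({"Emp_num"}, {"Emp_name"}),
]

keys = candidate_keys(ATTRS, FDS)
prime = set().union(*keys)
non_prime = ATTRS - prime
print(keys, sorted(prime), sorted(non_prime))
```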
Second Normal Form (2NF)
• Steps to achieve 2NF:
⚬ Ensure the table is in 1NF.
⚬ Remove partial dependencies (attributes must depend on the whole
primary key).
Example 2NF
▪ The Course Name depends only on CourseID, which is just part of the primary key
{CourseID, SemesterID}, not the whole key. This is called a partial dependency.
▪ Solution:
▪ Move CourseID and Course Name together into a new table; CourseID stays in the original table to link the two.
Example 2NF (Cont..)
CourseID  SemesterID  Num Student
IT101     201301      25
IT101     201302      25
IT102     201301      30
IT102     201302      35
IT103     201401      20
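The 2NF decomposition is two projections, and joining them on CourseID should reproduce the original rows exactly (a lossless decomposition). A minimal sketch: the enrollment numbers come from the table above, while the course names are hypothetical placeholders.

```python
# Hypothetical course names; the enrollment numbers come from the table above.
original = [
    ("IT101", "201301", "Course 101", 25),
    ("IT101", "201302", "Course 101", 25),
    ("IT102", "201301", "Course 102", 30),
    ("IT102", "201302", "Course 102", 35),
    ("IT103", "201401", "Course 103", 20),
]

# Course Name depends only on CourseID, so it moves to its own table.
course = sorted({(cid, name) for cid, _, name, _ in original})
# The remaining attribute depends on the whole key {CourseID, SemesterID}.
offering = sorted({(cid, sem, num) for cid, sem, _, num in original})

# Joining on CourseID must give back exactly the original rows (lossless).
course_name = dict(course)  # valid because CourseID -> Course Name
rejoined = sorted((cid, sem, course_name[cid], num) for cid, sem, num in offering)

print(len(course))                   # 3
print(rejoined == sorted(original))  # True
```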
Third Normal Form (3NF)
Example 3NF
▪ The Teacher Tel is a nonkey attribute, and the Teacher Name is also a nonkey attribute.
But Teacher Tel depends on Teacher Name. This is called a transitive dependency.
▪ Solution: Remove Teacher Name and Teacher Tel together to create a new table.
Example 3NF
StudyID  Course Name  T.ID
1        Database     T1
2        Database     T2
3        Web Prog     T3
4        Web Prog     T3
5        Networking   T4
Done?
Oh no, it is still not in 1NF yet.
Remove the repeating row.
ID   Teacher Name   Teacher Tel
T1   Sok Piseth     012 123 456
T2   Sao Kanha      0977 322 111
T3   Chan Veasna    012 412 333
T4   Pou Sambath    077 545 221
Note about primary key:
- In theory, you can choose Teacher Name to be a primary key.
- But in practice, you should add Teacher ID as the primary key.
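The sketch below replays this 3NF split. It assumes the un-decomposed table stored the teacher details on every study row; the wide table is rebuilt by joining the two tables above on T.ID. Projecting the teacher columns first yields five rows with T3 repeated, and removing the duplicate leaves the four-row teacher table.

```python
# Wide table rebuilt from the two tables above (T.ID included as the teacher key).
wide = [
    (1, "Database", "T1", "Sok Piseth", "012 123 456"),
    (2, "Database", "T2", "Sao Kanha", "0977 322 111"),
    (3, "Web Prog", "T3", "Chan Veasna", "012 412 333"),
    (4, "Web Prog", "T3", "Chan Veasna", "012 412 333"),
    (5, "Networking", "T4", "Pou Sambath", "077 545 221"),
]

# StudyID -> Course Name, T.ID stays in the study table.
study = [(sid, course, tid) for sid, course, tid, _, _ in wide]

# T.ID -> Teacher Name, Teacher Tel moves to a teacher table.
teacher_rows = [(tid, name, tel) for _, _, tid, name, tel in wide]
teacher = sorted(set(teacher_rows))  # removes the repeated T3 row

print(len(teacher_rows), len(teacher))  # 5 4
```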
Boyce Codd Normal Form(BCNF)
▪ Boyce Codd normal form (BCNF) is an advanced version of 3NF. It is stricter than 3NF.
▪ A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a superkey of the table.
▪ For BCNF, the table should be in 3NF, and for every FD, the LHS must be a superkey.
▪ Example: assume there is a company where employees work in more than one department.
EMPLOYEE table:
Boyce Codd Normal Form(BCNF)
▪ The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
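The BCNF test can be automated: for each non-trivial FD, compute the closure of its left-hand side and check whether it covers the whole table. A minimal sketch, assuming FDs are stored as pairs of Python sets; EMP_ID → EMP_COUNTRY comes from this example, while DEPT_TYPE and EMP_DEPT → DEPT_TYPE are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """List non-trivial FDs whose left-hand side is not a superkey of attrs."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FDs never violate BCNF
        if not set(attrs) <= closure(lhs, fds):
            violations.append((sorted(lhs), sorted(rhs)))
    return violations


# EMP_DEPT -> DEPT_TYPE is a hypothetical FD added for illustration.
EMPLOYEE = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE"}
FDS = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE"}),
]
print(bcnf_violations(EMPLOYEE, FDS))  # both FDs violate BCNF

# After decomposition, a table (EMP_ID, EMP_COUNTRY) has no violations.
print(bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, FDS[:1]))  # []
```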
Boyce Codd Normal Form(BCNF)
▪ To convert the given table into BCNF, we decompose it into three tables:
1. EMP_ID → EMP_COUNTRY
▪ Candidate keys:
▪ Now, this is in BCNF because the left-hand side of both functional dependencies is a key.
Fourth Normal Form (4NF)
▪ A relation is in 4NF if it is in Boyce Codd normal form and has no non-trivial multivalued
dependency.
▪ The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities.
▪ In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math,
and two hobbies, Dancing and Singing. So, there is a multivalued dependency on STU_ID, which
Fourth Normal Form (4NF)
▪ So, to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE (STU_ID, COURSE) and STUDENT_HOBBY (STU_ID, HOBBY)
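A sketch of the 4NF decomposition for student 21, assuming every course is paired with every hobby in the STUDENT table (which is what the multivalued dependency on STU_ID implies). Four combined rows shrink to two rows per table, and a natural join on STU_ID rebuilds the original.

```python
# Rows for student 21, assuming every course is paired with every hobby.
student = {
    (21, "Computer", "Dancing"),
    (21, "Computer", "Singing"),
    (21, "Math", "Dancing"),
    (21, "Math", "Singing"),
}

student_course = {(sid, course) for sid, course, _ in student}
student_hobby = {(sid, hobby) for sid, _, hobby in student}

# Natural join on STU_ID rebuilds the original table, so nothing is lost.
rejoined = {
    (sid, course, hobby)
    for sid, course in student_course
    for sid2, hobby in student_hobby
    if sid == sid2
}

print(len(student), len(student_course), len(student_hobby))  # 4 2 2
print(rejoined == student)  # True
```

Adding a third hobby would now cost one row in STUDENT_HOBBY instead of one extra row per course.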
Denormalization
▪ The data from one table is included in another table to reduce the number of joins in
the query and hence helps in speeding up the performance.
Denormalization
▪ A denormalized database should never be confused with a database that has never been
normalized.
▪ Example: Suppose that after normalization we have two tables: a Student table and a Branch
table. The Student table has the attributes Roll_no, Student-name, Age, and Branch_id.
Denormalization
▪ The branch table is related to the Student table with Branch_id as the foreign key in the Student table.
► If we want the names of students along with the names of their branches, we need to perform
a join operation. The problem is that if the tables are large, the join takes a lot of time.
So, we can copy Branch_name from the Branch table into the Student table. This removes the
join from such queries and thus speeds them up.
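The Student/Branch example can be sketched with sqlite3. This assumes the column name Student_name (the slide writes Student-name, which is not a plain SQL identifier) and hypothetical sample rows. It shows that the denormalized query returns the same answer without a join, and also the price: renaming a branch leaves the copied value stale until Student is updated too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Branch (Branch_id INTEGER PRIMARY KEY, Branch_name TEXT);
    CREATE TABLE Student (
        Roll_no      INTEGER PRIMARY KEY,
        Student_name TEXT,
        Age          INTEGER,
        Branch_id    INTEGER REFERENCES Branch(Branch_id)
    );
    -- Hypothetical sample data.
    INSERT INTO Branch VALUES (1, 'CSE'), (2, 'EEE');
    INSERT INTO Student VALUES (101, 'Abel', 20, 1), (102, 'Hana', 21, 2);
    """
)

# Normalized design: the branch name needs a join.
JOIN_QUERY = """
    SELECT s.Student_name, b.Branch_name
    FROM Student AS s JOIN Branch AS b ON s.Branch_id = b.Branch_id
    ORDER BY s.Roll_no
"""
normalized = conn.execute(JOIN_QUERY).fetchall()

# Denormalize: copy Branch_name into Student so the same question needs no join.
conn.executescript(
    """
    ALTER TABLE Student ADD COLUMN Branch_name TEXT;
    UPDATE Student SET Branch_name =
        (SELECT Branch_name FROM Branch WHERE Branch.Branch_id = Student.Branch_id);
    """
)
denormalized = conn.execute(
    "SELECT Student_name, Branch_name FROM Student ORDER BY Roll_no"
).fetchall()
print(normalized == denormalized)  # True

# Cost of redundancy: renaming a branch now has to touch two tables.
conn.execute("UPDATE Branch SET Branch_name = 'CS' WHERE Branch_id = 1")
stale = conn.execute("SELECT Branch_name FROM Student WHERE Roll_no = 101").fetchone()[0]
print(stale)  # CSE (out of date until Student is updated too)
```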
Denormalization
▪ Advantages of Denormalization
▪ Disadvantages of Denormalization
1. Because the denormalized design contains redundant data, update and insert operations are more
expensive and take more time.
Conclusion
Exercise
2. StudentID is the primary key. Is the table in 1NF? How can you bring it into 1NF?