Chapter 4- Database Design - (Normalization)

Chapter 4 discusses functional dependencies and normalization in relational databases, outlining the importance of clear semantics, reducing redundancy, and avoiding anomalies. It details various normal forms, including First, Second, and Third Normal Forms, which are essential for efficient database design. The chapter emphasizes the need for normalization to eliminate data redundancy and maintain data integrity while also addressing the potential performance issues that may arise from normalization.

Uploaded by Tesfalegn Yakob
Copyright © All Rights Reserved

Chapter 4: Functional Dependencies and Normalization


Outline
• Functional Dependency and Normalization
• Functional Dependency
• Normal Forms
• First Normal Form
• Second Normal Form
• Third Normal Form
• Boyce-Codd Normal Form
Introduction
• In the relational model, each relation schema consists of a number of attributes, and the relational database schema consists of a number of relation schemas.
• Conceptual data models, such as the ER or Enhanced-ER (EER) model, lead the designer to identify entity types, relationship types, and their respective attributes, which results in a natural and logical grouping of the attributes into relations.
• There are two levels at which we can discuss the goodness of relation schemas:
I. The logical (or conceptual) level: how users interpret the relation schemas and the meaning of their attributes.
II. The implementation (or physical storage) level: how the tuples in a base relation, which are physically stored as files, are stored and updated.
Informal Design Guidelines for Relation Schemas
• There are four informal measures that may be used to assess the quality of a relation schema design:
1. Making sure that the semantics of the attributes is clear in the
schema.
2. Reducing the redundant information in tuples.
3. Reducing the NULL values in tuples.
4. Disallowing the possibility of generating spurious tuples.
A. Clear Semantics for Attributes in Relations
The semantics of a relation refers to the interpretation of the attribute values in a tuple.
• Semantics: the meaning of the attributes in a tuple and how they relate to each other.
• Example: in the DMU database, the meaning of each relation schema is fairly straightforward. Look at the Employee relation:
– SSN: the primary key
– FNAME: the name of the employee
– MAN_SSN: a foreign key representing an implicit relationship within the DMU database
• Make the semantics clear and easy to explain: one relation should describe only one real-world entity.
B. Reducing Redundant Information in Tuples
• One goal of schema design is to minimize the storage space used by the base relations (and hence the corresponding files). The way attributes are grouped into relation schemas has a significant effect on storage space.
• Consider the two base relations EMPLOYEE and DEPARTMENT as shown in the figure below.
• Now consider the EMP_DEPT base relation shown in the diagram below, which is the result of applying a NATURAL JOIN operation to EMPLOYEE and DEPARTMENT from the diagram above.
• In EMP_DEPT, the attribute values pertaining to a particular department (Dnumber, Dname, Dmgr_ssn) are repeated for every employee who works for that department.
• In contrast, each department's information appears only once in the DEPARTMENT relation in the first figure (which shows the EMPLOYEE and DEPARTMENT relations separately).
• Only the department number (Dnumber) is repeated in the EMPLOYEE relation, as a foreign key, for each employee who works in that department.
• Another problem with the EMP_DEPT base relation shown above is that it suffers from update anomalies.
• Update anomalies: problems that arise when several copies of the same data are scattered across the database without a proper relationship/link, so that the copies can become inconsistent with one another.
• Anomalies in a DBMS are caused by too much redundancy in the database's information.
• These can be classified into:
– insertion anomalies,
– deletion anomalies, and
– modification anomalies.
To understand these anomalies, let us take the example of a Student table.

rollno  name  branch  hod    office_tel
401     Akon  CSE     Mr. X  53337
402     Bkon  CSE     Mr. X  53337
403     Ckon  CSE     Mr. X  53337
404     Dkon  CSE     Mr. X  53337

• The table above holds data for 4 Computer Science students.
• As we can see, the data in the fields branch, hod (Head of Department), and office_tel is repeated for every student in the same branch of the college. This is data redundancy.
Insertion Anomalies
• An insertion anomaly occurs when a new record is inserted into a relation: the user cannot insert a fact about one entity until he or she also has an additional fact about another entity.
• For example:
1. Suppose that for a new admission the student has not yet opted for a branch. Then the student's data cannot be inserted unless we set the branch information to NULL.
2. Also, if we have to insert data for 100 students of the same branch, the branch information will be repeated for all 100 students.
These scenarios are insertion anomalies.
Deletion Anomalies
• A deletion anomaly is a problem that arises when information about an existing database entry is removed.
• Deleting one piece of data may also cause the loss of other information.
• Example:
• In our Student table, two different kinds of information are kept together: student information and branch information. Hence, if the student records are deleted at the end of the academic year, we also lose the branch information. This is a deletion anomaly.
Modification Anomalies
• A modification anomaly is a failure to consistently update information about an existing database entry: when the same fact is stored in many records, every copy must be changed.
• For example:
• What if Mr. X leaves the college, or is no longer the HOD of the computer science department? In that case all the student records will have to be updated, and if by mistake we miss any record, the data will become inconsistent. This is a modification (update) anomaly.
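The modification and deletion anomalies can be reproduced with a minimal in-memory sketch that stores the Student table above as a list of Python dictionaries and compares it with a decomposed design that stores each branch fact only once. The faulty update that skips roll 404 and the new HOD "Mr. Y" are hypothetical.

```python
# Flat (unnormalized) Student table: branch facts are repeated in every row.
students_flat = [
    {"rollno": 401, "name": "Akon", "branch": "CSE", "hod": "Mr. X", "office_tel": 53337},
    {"rollno": 402, "name": "Bkon", "branch": "CSE", "hod": "Mr. X", "office_tel": 53337},
    {"rollno": 403, "name": "Ckon", "branch": "CSE", "hod": "Mr. X", "office_tel": 53337},
    {"rollno": 404, "name": "Dkon", "branch": "CSE", "hod": "Mr. X", "office_tel": 53337},
]


def hods_of(rows, branch):
    """Return the sorted distinct HOD values recorded for a branch."""
    return sorted({r["hod"] for r in rows if r["branch"] == branch})


# Modification anomaly: an update that misses one row leaves two HODs for CSE.
for row in students_flat[:3]:  # hypothetical faulty update that skips roll 404
    row["hod"] = "Mr. Y"
print(hods_of(students_flat, "CSE"))  # ['Mr. X', 'Mr. Y'] -> inconsistent

# Deletion anomaly: deleting every student also deletes the branch facts.
flat_after_delete = [r for r in students_flat if r["rollno"] > 404]
print(hods_of(flat_after_delete, "CSE"))  # [] -> the HOD is no longer recorded

# Decomposed design: each branch fact is stored exactly once.
branches = {"CSE": {"hod": "Mr. X", "office_tel": 53337}}
students = [{"rollno": r["rollno"], "name": r["name"], "branch": r["branch"]}
            for r in students_flat]

branches["CSE"]["hod"] = "Mr. Y"  # a single update; no copy can be missed
students.clear()                  # deleting all students...
print(branches["CSE"])            # ...keeps {'hod': 'Mr. Y', 'office_tel': 53337}
```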
C. NULL Values in Tuples
• NULL: represents attribute values that are unknown or do not apply to a particular row.
• In some schema designs, if many of the attributes do not apply to all tuples in the relation, we end up with many NULLs in those tuples.
• This can waste space at the storage level and may also make the meaning of the attributes harder to understand.
• Another problem with NULLs is how to account for them in aggregate operations such as COUNT or SUM.
• As far as possible, avoid placing attributes in a base relation whose values may frequently be NULL.
• If NULLs are unavoidable, make sure that they apply in exceptional cases only and do not apply to a majority of the tuples in the relation.
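The aggregate problem can be made concrete with a small Python emulation of standard SQL behaviour, in which COUNT(*) counts every row while COUNT(column), SUM, and AVG skip NULLs. None stands in for NULL here; the column name bonus, its values, and the helper names are hypothetical.

```python
from typing import Optional

# Hypothetical column values; None plays the role of SQL NULL.
bonus: list[Optional[int]] = [500, None, 300, None]


def count_star(values):
    """Emulates COUNT(*): every row is counted, NULL or not."""
    return len(values)


def count_column(values):
    """Emulates COUNT(column): only non-NULL values are counted."""
    return sum(1 for v in values if v is not None)


def sql_sum(values):
    """Emulates SUM(column): NULLs are skipped; all-NULL input yields NULL."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def sql_avg(values):
    """Emulates AVG(column): the mean of the non-NULL values only."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


print(count_star(bonus), count_column(bonus))  # 4 2
print(sql_sum(bonus), sql_avg(bonus))          # 800 400.0
print(sql_sum(bonus) / count_star(bonus))      # 200.0 if NULL is read as "no bonus"
```

Whether the "average bonus" is 400 or 200 depends on whether NULL means "no bonus" or "unknown", which is exactly the interpretation problem the guideline warns about.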
D. Generation of Spurious Tuples
• Spurious tuples: tuples that represent wrong information; they occur when two or more badly designed relations are joined.
• Consider the two base relations EMPLOYEE and DEPARTMENT given below.
• If we JOIN the above relations without any join condition (i.e., take their Cartesian product), we obtain the following relation.
• In the above relation, you can observe that there are some meaningless tuples (called spurious tuples).
• For example, consider the second tuple in the above relation. It shows that the employee with E_id = 101, who gets a salary of 100, belongs to the Electrical department in addition to CS & IT.
• This is clearly spurious information, since one employee cannot belong to two departments. So this tuple is a spurious tuple and is marked by an asterisk (*).
• To obtain the correct data, we have to apply a condition on the JOIN operation. For example, with the condition EMPLOYEE.E_id = DEPARTMENT.Dep_id we retrieve only tuples 1, 5, and 9, which are the required ones.
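The effect of the missing join condition can be reproduced with a minimal sketch. Only E_id 101 with salary 100 and the department names CS & IT and Electrical come from the text; the other rows, the department-name column, and the assumption that Dep_id values line up with E_id values (which is what the join condition above implies) are hypothetical.

```python
from itertools import product

# Hypothetical relation contents as tuples.
employee = [(101, 100), (102, 200), (103, 300)]  # (E_id, Salary)
department = [(101, "CS & IT"), (102, "Electrical"), (103, "Mechanical")]  # (Dep_id, Dname)

# A JOIN with no condition is a Cartesian product: every pairing appears.
cartesian = [e + d for e, d in product(employee, department)]
print(len(cartesian))  # 9
print(cartesian[1])    # (101, 100, 102, 'Electrical') -> a spurious tuple

# JOIN with the condition EMPLOYEE.E_id = DEPARTMENT.Dep_id.
joined = [row for row in cartesian if row[0] == row[2]]
positions = [i + 1 for i, row in enumerate(cartesian) if row[0] == row[2]]
print(positions)       # [1, 5, 9]
```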
Data Dependency
• A functional dependency is a relationship among attributes.
• The logical associations between data items that point the database designer toward a good database design are referred to as determinant or dependent relationships.
• Two data items A and B are said to be in a determinant or dependent relationship if certain values of data item B always appear with certain values of data item A.
• A functional dependency is denoted by an arrow "→". The functional dependency of B on A is represented by A → B.
• If data item A is the determinant and B the dependent data item, then the direction of the association is from A to B and not vice versa.
• The essence of this idea is that if a value of A exists, then B must also exist and have a certain (single) value; we then say that "B is functionally dependent on A."
• It is also possible to say that "A functionally determines B," "B is a function of A," "A functionally governs B," or "If A, then B."
Example: if we know an employee's SSN, we can obtain the employee's name, city, salary, etc. Hence the employee name, city, and salary are functionally dependent on the SSN.

SSN   Employee Name   Salary   City
1     Dana            50000    D/markos
2     Francis         38000    Burie
3     Andrew          25000    F/selam

Similarly, since the type of wine served depends on the type of dinner, we say Wine is functionally dependent on Dinner, i.e., Dinner → Wine.
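Whether a table instance is consistent with an FD X → Y can be checked mechanically: no two tuples may agree on X and disagree on Y. The sketch below applies that test to the SSN table above; the function name fd_holds and the extra violating row are hypothetical. Note that an instance can only refute an FD. The FD itself is a constraint that must hold in every legal state of the relation, so it comes from the meaning of the attributes, not from a few sample rows.

```python
from itertools import combinations

# The SSN / Employee Name / Salary / City table, one dict per tuple.
employees = [
    {"SSN": 1, "Name": "Dana", "Salary": 50000, "City": "D/markos"},
    {"SSN": 2, "Name": "Francis", "Salary": 38000, "City": "Burie"},
    {"SSN": 3, "Name": "Andrew", "Salary": 25000, "City": "F/selam"},
]


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on every lhs attribute but differ on some rhs attribute."""
    for t1, t2 in combinations(rows, 2):
        if all(t1[a] == t2[a] for a in lhs) and any(t1[b] != t2[b] for b in rhs):
            return False
    return True


print(fd_holds(employees, ["SSN"], ["Name", "City", "Salary"]))  # True

# A hypothetical extra row that reuses SSN 1 with a different city violates SSN -> City.
bad = employees + [{"SSN": 1, "Name": "Dana", "Salary": 50000, "City": "Burie"}]
print(fd_holds(bad, ["SSN"], ["City"]))  # False
```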
Types of Functional Dependency
• Full Functional Dependency
• Partial Functional Dependency
• Transitive Functional Dependency
• Multivalued Dependency
Full Dependency
• If an attribute that is not a member of the primary key depends on the whole key, and not on any part of it, then that attribute is fully functionally dependent on the primary key.
• Let {A, B} be the primary key and C a non-key attribute.
• Then, if {A, B} → C holds but neither A → C nor B → C holds, C is fully functionally dependent on {A, B}.
• E.g.: {Ssn, P_number} → Hours is a full dependency, because Hours depends on both attributes Ssn and P_number together, not on either one of them separately.
Let us see an example:

<ProjectCost>
ProjectID   ProjectCost
001         1000
002         5000

<EmployeeProject>
EmpID   ProjectID   Days (spent on the project)
E099    001         320
E056    002         190

The above relations state that {EmpID, ProjectID, ProjectCost} → Days.
• However, this dependency is not a full functional dependency, because the subset {EmpID, ProjectID} can already determine the Days spent on the project by the employee.
• Removing ProjectCost from the determinant therefore gives our full functional dependency:

{EmpID, ProjectID} → Days
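The step from {EmpID, ProjectID, ProjectCost} → Days to {EmpID, ProjectID} → Days can be sketched with attribute closure: an attribute can be dropped from the determinant whenever the remaining attributes still determine Days. The sketch assumes the FDs ProjectID → ProjectCost (ProjectID identifies a row of ProjectCost) and {EmpID, ProjectID} → Days; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def reduce_determinant(lhs, target, fds):
    """Drop attributes from lhs while the rest still determines target (left reduction)."""
    reduced = set(lhs)
    for attr in sorted(lhs):
        candidate = reduced - {attr}
        if target in closure(candidate, fds):
            reduced = candidate
    return reduced


# Assumed FDs for the ProjectCost and EmployeeProject relations.
fds = [
    (frozenset({"ProjectID"}), frozenset({"ProjectCost"})),
    (frozenset({"EmpID", "ProjectID"}), frozenset({"Days"})),
]

print(sorted(reduce_determinant({"EmpID", "ProjectID", "ProjectCost"}, "Days", fds)))
# ['EmpID', 'ProjectID']
```

This greedy left reduction returns one minimal determinant; if nothing can be dropped, the original dependency was already full.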


Partial Dependency
• If an attribute that is not a member of the primary key depends on some part of the primary key, then that attribute is partially functionally dependent on the primary key.
• Let {A, B} be the primary key and C a non-key attribute.
• Then, if {A, B} → C and either A → C or B → C holds, C is partially functionally dependent on {A, B}.
• E.g.: {Ssn, Pnumber} → Ename is a partial dependency, because Ssn → Ename holds.
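The full/partial distinction can be tested with attribute closure: key → C is partial if some proper subset of the key already determines C. A minimal sketch, assuming only the two FDs quoted in the text ({Ssn, Pnumber} → Hours and Ssn → Ename); the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def classify(key, attr, fds):
    """Return 'full', 'partial', or 'none' for the dependency key -> attr."""
    if attr not in closure(key, fds):
        return "none"
    for a in key:
        # Testing subsets one attribute smaller is enough (see note below).
        if attr in closure(key - {a}, fds):
            return "partial"
    return "full"


key = frozenset({"Ssn", "Pnumber"})
fds = [
    (frozenset({"Ssn", "Pnumber"}), frozenset({"Hours"})),
    (frozenset({"Ssn"}), frozenset({"Ename"})),
]

print(classify(key, "Hours", fds))  # full
print(classify(key, "Ename", fds))  # partial
```

Only subsets one attribute smaller than the key are tried, because closure is monotone: if a smaller subset determines C, so does every larger subset of the key that contains it.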
Transitive Dependency
• When a functional dependency arises through an indirect relationship, it is called a transitive dependency.
• In mathematics and logic, a transitive relationship is a relationship of the following form: "If A implies B, and B implies C, then A implies C."
• E.g., in the Dinner and Wine case above: if Dinner → Wine and also Wine → Service, then Dinner → Service.
• In general: if (A → B) and (B → C), then A → C.
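The inference from Dinner → Wine and Wine → Service to Dinner → Service is what an attribute-closure computation does step by step. A minimal sketch that also records which FD contributed each new attribute; the function name is hypothetical.

```python
def closure_with_trace(attrs, fds):
    """Attribute closure that also records which FD added which attributes."""
    result = set(attrs)
    trace = []
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                new = rhs - result
                result |= rhs
                trace.append((sorted(lhs), sorted(new)))
                changed = True
    return result, trace


fds = [
    (frozenset({"Dinner"}), frozenset({"Wine"})),
    (frozenset({"Wine"}), frozenset({"Service"})),
]

derived, steps = closure_with_trace({"Dinner"}, fds)
for lhs, new in steps:
    print(lhs, "->", new)
# ['Dinner'] -> ['Wine']
# ['Wine'] -> ['Service']
print("Service" in derived)  # True: Dinner -> Service holds transitively
```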
NORMALIZATION
• A relational database is merely a collection of data, organized in a
particular manner.
• Database normalization is a series of steps followed to obtain a
database design that allows for consistent storage and efficient
access of data in a relational database.
• The concept of normalization was introduced by E.F. Codd (known as the father of the relational data model) as the basis for database design.
• He defined the first, second, and third normal forms according to the constraints that each normal form satisfies.
• Normalization is used to avoid redundancy and the problems
arising out of redundancy.
• Normalization is the process of identifying the logical
associations between data items and designing a database that
will represent such associations but without any type of
anomalies.
• Normalization may reduce system performance, since data must be cross-referenced (joined) across many tables.
• Thus de-normalization is sometimes used to improve
performance, at the cost of reduced consistency guarantees.
Steps of Normalization
• We have various levels or steps in normalization
called Normal Forms.
• The complexity, the strength of the rules, and the degree of decomposition increase as we move from a lower normal form to a higher one.
• A table in a relational database is said to be in a
certain normal form if it satisfies certain
constraints.
Normalization towards a logical design consists of the following
steps:
• UnNormalized Form (UNF): Identify all data elements
• First Normal Form (1NF): Find the key with which you can find all data, i.e., remove any repeating groups.
• Second Normal Form (2NF): Remove part-key
dependencies (partial dependency). Make all data
dependent on the whole key.
• Third Normal Form (3NF): Remove non-key
dependencies (transitive dependencies). Make all data
dependent on nothing but the key.
• For most practical purposes, databases are considered
normalized if they adhere to the third normal form (there
is no transitive dependency).
First Normal Form (1NF)
• A relation is said to be in first normal form (1NF) if and only if all underlying domains contain atomic values only. That is, the domain of an attribute must include only atomic (simple, indivisible) values, and the value of any attribute in a tuple must be a single value from the domain of that attribute.
• 1NF does not allow:
– composite attributes
– multivalued attributes
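A minimal sketch of the two 1NF rules above, using hypothetical EMPLOYEE data (the attribute names EmpID, Name, and Phones and all values are illustrative, not taken from the original diagram): the composite Name is split into atomic FName and LName, and each value of the multivalued Phones attribute gets its own tuple.

```python
# Hypothetical unnormalized rows: "Name" is a composite attribute (first, last)
# and "Phones" is a multivalued attribute, so this relation is not in 1NF.
unf = [
    {"EmpID": "E1", "Name": ("Dana", "Abebe"), "Phones": ["0911", "0912"]},
    {"EmpID": "E2", "Name": ("Francis", "Kebede"), "Phones": ["0913"]},
]


def to_1nf(rows):
    """Split the composite attribute and give each phone value its own tuple."""
    flat = []
    for row in rows:
        first, last = row["Name"]               # composite -> two atomic attributes
        for phone in row["Phones"] or [None]:   # an employee with no phone gets NULL
            flat.append({"EmpID": row["EmpID"], "FName": first,
                         "LName": last, "Phone": phone})
    return flat


for t in to_1nf(unf):
    print(t)
```

The key of the resulting relation becomes {EmpID, Phone}. Moving the phones into a separate relation is an equally valid 1NF fix; flattening in place keeps the example short, and the repeated names it creates (FName depends only on EmpID, part of the key) are what 2NF later removes.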
The following diagram depicts the steps of normalization into 1NF.
Second Normal Form (2NF)
• There is no partial dependency of a non-key attribute on part of the primary key. Removing such dependencies results in a set of relations in Second Normal Form.
• Definition: A table (relation) is in 2NF if:
– it is in 1NF, and
– all non-key attributes are dependent on the entire primary key, i.e., there is no partial dependency.
• That means a relation R is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key of R.
Example for 2NF:
• Consider the relation schema given below.
Business rule: whenever an employee participates in a project, he/she is entitled to an incentive.
This schema is in 1NF since it has no repeating groups or multivalued attributes. To convert it into 2NF, we need to remove all partial dependencies of non-key attributes on part of the primary key.
• As we can see, some non-key attributes are partially dependent on part of the primary key.
• This can be seen by analyzing the first two functional dependencies (FD1 and FD2).
• Thus, each such functional dependency, together with its dependent attributes, should be moved to a new relation (as shown in the diagram below) in which the determinant becomes the primary key.
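The 2NF step above (move each FD whose determinant is only part of the key into its own relation, keyed by that determinant) can be sketched as follows. The original schema figure is not reproduced here, so the sketch assumes a hypothetical EMP_PROJ-style relation with key {EmpID, ProjNo}, where FD1 (EmpID → EmpName) and FD2 (ProjNo → ProjName) are the partial dependencies and FD3 gives the incentive; all attribute names are illustrative.

```python
def decompose_2nf(attributes, key, fds):
    """Move each partial dependency (determinant is a proper subset of key) to its own relation."""
    relations = []
    moved = set()
    for lhs, rhs in fds:
        if lhs < key:  # proper subset of the primary key -> partial dependency
            relations.append((lhs, lhs | rhs))
            moved |= rhs
    # The original relation keeps the key and the fully dependent attributes.
    relations.insert(0, (key, attributes - moved))
    return relations


attributes = frozenset({"EmpID", "EmpName", "ProjNo", "ProjName", "Incentive"})
key = frozenset({"EmpID", "ProjNo"})
fds = [
    (frozenset({"EmpID"}), frozenset({"EmpName"})),              # FD1 (assumed)
    (frozenset({"ProjNo"}), frozenset({"ProjName"})),            # FD2 (assumed)
    (frozenset({"EmpID", "ProjNo"}), frozenset({"Incentive"})),  # FD3 (assumed)
]

for pk, attrs in decompose_2nf(attributes, key, fds):
    print(sorted(pk), sorted(attrs))
# ['EmpID', 'ProjNo'] ['EmpID', 'Incentive', 'ProjNo']
# ['EmpID'] ['EmpID', 'EmpName']
# ['ProjNo'] ['ProjName', 'ProjNo']
```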
Third Normal Form (3NF)
• Eliminate columns that depend on another non-primary-key attribute: if attributes do not contribute to a description of the key, move them to a separate table.
• This level avoids the update and deletion anomalies caused by transitive dependencies.
Definition: A table (relation) is in 3NF if:
– it is in 2NF, and
– there are no transitive dependencies between the primary key and non-primary-key attributes.
Example for (3NF)
• Assumption: students of the same batch (same year) live in the same dormitory.
This schema is in 2NF since the primary key is a single attribute and there are no repeating groups (multivalued attributes).
To convert it into 3NF, we need to remove all transitive dependencies of non-key attributes on other non-key attributes.
The non-primary-key attributes that depend on each other are moved to another table and linked with the main table using a candidate key/foreign key relationship, as shown below.
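The 3NF step can be sketched in the same style: each FD whose determinant contains no key attribute (here Year → Dormitory, from the stated assumption) is moved to its own table, and its determinant stays behind as a foreign key. The attribute names StudID, StudName, Dept, Year, and Dormitory are hypothetical, since the original schema figure is not shown; the sketch assumes the input is already in 2NF with a single-attribute key.

```python
def decompose_3nf(attributes, key, fds):
    """Split off each transitive dependency key -> X -> Y, where X is non-key."""
    main = set(attributes)
    extra = []
    for lhs, rhs in fds:
        if lhs.isdisjoint(key):  # determinant made only of non-key attributes
            extra.append((lhs, lhs | rhs))  # new relation with lhs as its key
            main -= rhs  # lhs stays in the main relation as a foreign key
    return [(key, frozenset(main))] + extra


attributes = frozenset({"StudID", "StudName", "Dept", "Year", "Dormitory"})
key = frozenset({"StudID"})
fds = [
    (frozenset({"StudID"}), frozenset({"StudName", "Dept", "Year"})),
    (frozenset({"Year"}), frozenset({"Dormitory"})),  # same batch -> same dormitory
]

for pk, attrs in decompose_3nf(attributes, key, fds):
    print(sorted(pk), sorted(attrs))
# ['StudID'] ['Dept', 'StudID', 'StudName', 'Year']
# ['Year'] ['Dormitory', 'Year']
```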
• Generally, even though there are four additional levels of normalization, a table is said to be normalized if it reaches 3NF.
• A database in which all tables are in 3NF is said to be a normalized database.
• Tips for remembering the rationale for normalization up to 3NF:
– No redundancy: no repeating fields in the table.
– The fields depend upon the key: every field should depend on the key.
– The whole key: no partial key dependency.
– And nothing but the key: no dependency between non-key fields.
Pitfalls of Normalization: Problems Associated with Normalization
• Requires (sample) data to see the problems
• May reduce the performance of the system
• Is time consuming
• Difficult to design and apply
• Prone to human error
Reading Assignment

• Refer to textbooks (from the library) or the Internet for the following other levels of normalization:
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
• Domain-Key Normal Form (DKNF)
Thank You
