
DB Lecture 4

Uploaded by

mtaddis19

Chapter 4

Functional Dependency and Normalization

4.1. Functional Dependency
• When we design a database for an enterprise, the main
objective is to create an accurate representation of the data,
the relationships between the data, and the constraints on the
data that are relevant to the enterprise.
• One or more database design techniques are used to achieve
this objective.
• Normalization is one of these techniques.
• Before moving to the definition and application of
normalization, it is important to have an understanding of
"functional dependency".
• The logical associations between data items that point the
database designer in the direction of a good database design
are referred to as determinant or dependent relationships.

• A functional dependency is a relationship between two sets of
attributes in a database, where one set (the determinant)
determines the values of the other set (the dependent).
• Two data items A and B are said to be in a determinant or
dependent relationship if certain values of data item B always
appear with certain values of data item A.
• If data item A is the determinant and B the dependent, then the
direction of the association is from A to B and not vice versa
(for example, NAME cannot determine ID).
• The essence of this idea is that if the existence of something, call it
A, implies that B must exist and have a certain value, then we say
that "B is functionally dependent on A".
• We also often express this idea by saying that "A determines B" or
that "B is a function of A" or that "A functionally governs B".

• Often, the notions of functionality and functional dependency are
expressed briefly by the statement, "If A, then B".
• It is important to note that the value B must be unique for a given
value of A, i.e., any given value of A must imply one and only
one value of B, in order for the relationship to qualify for the name
"function".
• However, this does not prevent different values of A from implying
the same value of B (for example, two different IDs may belong to
people with the same name).
• If B is functionally dependent on A, it is denoted as A → B.
• A is called determinant and B is called dependent.
• Both A and B can refer to a single attribute or group of attributes.
• For instance, you may have a set of attributes (A, B, C, etc.) and an
arrow (->) denoting the dependency. For example, if we have a table
of employee data with columns "EmployeeID," "FirstName," and
"LastName," we can express a functional dependency like
this: EmployeeID -> FirstName, LastName.
• A → B holds if, whenever two tuples have the same value for A,
they must have the same value for B (this does not necessarily
prevent different values of A from implying the same value of B).
• In general, a functional dependency is a relationship among
attributes.
• In relational databases, we can have a determinant that governs one
other attribute or several other attributes.
• Functional dependencies (FDs) are derived from the real-world
constraints on the attributes.
• Partial Dependency
– If a relation has a composite primary key and a non-key attribute
depends on only part of that key, then the attribute is partially
functionally dependent on the primary key.
– E.g. {SSN, PNUMBER} -> ENAME
is not a full FD (it is called a partial dependency) since SSN ->
ENAME also holds.
– Let {A,B} be the primary key and C a non-key attribute. If
{A,B} → C and either A → C or B → C holds, then C is partially
functionally dependent on {A,B}.
• Full Dependency
– If a relation has a composite primary key and a non-key attribute
depends on the whole key and not on any part of it, then that
attribute is fully functionally dependent on the primary key.
– Let {A,B} be the primary key and C a non-key attribute. If
{A,B} → C holds but neither A → C nor B → C holds (neither A
alone nor B alone can determine C), then C is fully
functionally dependent on {A,B}.
– E.g. {SSN, PNUMBER} -> HOURS
is a full FD since neither SSN -> HOURS nor PNUMBER ->
HOURS holds.
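
The definition of A → B and the partial/full distinction can be checked mechanically against sample rows. The following is a minimal Python sketch and is not part of the original slides; the sample rows and the helper names fd_holds and classify_dependency are assumptions made for illustration. Sample data can only refute a dependency: whether it really holds is decided by the real-world constraints.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


def classify_dependency(rows, key, attr):
    """Classify key -> attr as 'full', 'partial' or 'none'."""
    if not fd_holds(rows, key, [attr]):
        return "none"
    # Partial if some proper, non-empty part of the key already determines attr.
    for size in range(1, len(key)):
        for part in combinations(key, size):
            if fd_holds(rows, list(part), [attr]):
                return "partial"
    return "full"


# Hypothetical sample rows; the primary key is {SSN, PNUMBER}.
works_on = [
    {"SSN": 1, "PNUMBER": 10, "ENAME": "Abebe", "HOURS": 20},
    {"SSN": 1, "PNUMBER": 20, "ENAME": "Abebe", "HOURS": 15},
    {"SSN": 2, "PNUMBER": 10, "ENAME": "Almaz", "HOURS": 30},
    {"SSN": 2, "PNUMBER": 20, "ENAME": "Almaz", "HOURS": 15},
]

print(classify_dependency(works_on, ["SSN", "PNUMBER"], "ENAME"))  # partial
print(classify_dependency(works_on, ["SSN", "PNUMBER"], "HOURS"))  # full
```

ENAME is classified as partial because SSN alone already determines it, while HOURS needs both key attributes.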
• Transitive Dependency
– In mathematics and logic, a transitive relationship is a
relationship of the following form: "If A implies B,
and if also B implies C, then A implies C".
– A general way of describing a transitive dependency
is:
• If A functionally governs B, and B functionally governs
C, then A functionally governs C, provided that neither C
nor B determines A.
• A transitive dependency occurs when some non-key attribute
determines some other attribute.

• E.g. Emp-ID determines F-Name, L-Name and Dept-ID. Similarly, Dept-ID
determines Dept-Name. Therefore, Emp-ID can determine Dept-Name. This
indicates there is a transitive dependency.
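
One way to see this transitive dependency is to compute everything that Emp-ID determines by applying the stated dependencies repeatedly. This is the attribute closure, a standard technique that the slides do not name. A minimal Python sketch follows; the function name closure and the encoding of each dependency as a pair of sets are assumptions for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds.

    fds is a list of (determinant, dependent) pairs of frozensets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If every determinant attribute is known, the dependents are too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The two dependencies stated in the example.
fds = [
    (frozenset({"Emp-ID"}), frozenset({"F-Name", "L-Name", "Dept-ID"})),
    (frozenset({"Dept-ID"}), frozenset({"Dept-Name"})),
]

print("Dept-Name" in closure({"Emp-ID"}, fds))  # True
print("Emp-ID" in closure({"Dept-ID"}, fds))    # False
```

Dept-Name is reached from Emp-ID only through Dept-ID, and Dept-ID does not determine Emp-ID, which is the pattern described above.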
4.2. Normal Forms
• A relational database is merely a collection of data, organized in a
particular manner.
• One of the best ways to determine what information should be
stored in a database is to clarify what questions will be asked of it
and what data would be included in the answers.
• Database normalization is a series of steps followed to obtain a
database design that allows for consistent storage and efficient
access of data in a relational database.
• These steps reduce data redundancy and the risk of data
becoming inconsistent.
• Normalization is the process of identifying the logical
associations between data items and designing a database that
represents those associations without suffering from update
anomalies.

• Normalization may reduce system performance since data will be
cross referenced from many tables.
• Thus de-normalization is sometimes used to improve
performance, at the cost of reduced consistency guarantees.
• A normalization step is normally considered good if it is a
lossless decomposition.
• The decomposition of a given relation X is known as a lossless
decomposition when X is decomposed into two relations X1 and
X2 in such a way that the natural join of X1 and X2 gives back
the original relation X.
• All the normalization rules will eventually remove the update
anomalies that may exist during data manipulation after the
implementation.
• Normalization is a process of applying a series of rules to ensure
that a database achieves an optimal structure.
• Normalization begins by examining the relationships between
attributes called functional dependencies.
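
The lossless decomposition test described above (project X onto X1 and X2, take the natural join, compare with X) can be sketched in Python. This is only an illustration under assumptions: the relation, its attribute names, and the helpers project, natural_join and is_lossless are hypothetical and do not come from the slides.

```python
def project(rows, attrs):
    """Project rows onto attrs, removing duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(left, right):
    """Join two relations on all attributes they have in common."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly rows."""
    rejoined = natural_join(project(rows, attrs1), project(rows, attrs2))

    def as_set(rel):
        return {tuple(sorted(row.items())) for row in rel}

    return as_set(rejoined) == as_set(rows)


# Hypothetical relation X(Staff, Branch, City).
x = [
    {"Staff": "S1", "Branch": "B1", "City": "A.A."},
    {"Staff": "S2", "Branch": "B2", "City": "A.A."},
    {"Staff": "S3", "Branch": "B1", "City": "A.A."},
]

print(is_lossless(x, ["Staff", "Branch"], ["Branch", "City"]))  # True
print(is_lossless(x, ["Staff", "City"], ["Branch", "City"]))    # False
```

The second decomposition joins on City, which does not identify a branch, so the join produces extra tuples that were never in X.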
• The purpose of normalization is to identify a suitable set of relations
that support the data requirement of an enterprise.
• The characteristics of suitable relations include:
1. Minimal number of attributes to support the data
requirement of an enterprise.
2. Attributes with a close logical relationship are placed in the
same relation.
3. Minimal redundancy, with each attribute represented only
once.
• Note: the exception to the third characteristic is attributes
that form all or part of foreign keys, which are essential for
joining related relations and can therefore be repeated.

• A major aim of relational database design is to group attributes
into relations to minimize data redundancy.
• If data redundancy is minimized:
– Updates to the data stored in the database are achieved with a
minimal number of operations – reducing the opportunities
for data inconsistencies occurring in the database.
– Reduction in the file storage required by the base relations thus
minimizing costs.
• It is important to distinguish unwanted redundancies from required
redundancies (for example, foreign key values).

• The problems that can occur in an insufficiently normalized
table are called update anomalies, which include:
– Insertion anomalies
– Deletion anomalies
– Modification anomalies
• Insertion anomalies
– An "insertion anomaly" is a failure to place information about a
new database entry into all the places in the database where
information about that new entry needs to be stored.
– In a properly normalized database, information about a new
entry needs to be inserted into only one place in the database;
in an inadequately normalized database, information about a
new entry may need to be inserted into more than one
place (wherever it is repeated).
• Deletion anomalies
– A "deletion anomaly" is a failure to remove information about
an existing database entry when it is time to remove that entry.
– In a properly normalized database, information about an old,
to-be-gotten-rid-of entry needs to be deleted from only one
place in the database; in an inadequately normalized
database, information about that old entry may need to be
deleted from more than one place, and some of the needed
deletions may be missed.

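• Modification anomalies
– A modification of a database involves changing the
value of some attribute of a table.
– In a properly normalized database, a fact is stored in only one
place, so a single change takes effect everywhere it is used; in
an inadequately normalized database, the same value may have to
be changed in several places, and a missed change leaves the
data inconsistent.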
• The purpose of normalization is to reduce the
chances for anomalies to occur in a database.

• Example: consider the following tables that illustrate the problems
with unwanted data redundancies.
– Staff(SNo,SName,Position,Salary,BNo)
– Branch(BNo,BAddress)
– StaffBranch(SNo,SName,Position,Salary,BNo,BAddress)

StaffBranch
SNo SName Position Salary BNo BAddress
SL21 Abebe Manager 3000 B005 A.A.
SG37 Hagos Assistant 1200 B003 D.D.
SG14 Almaz Supervisor 1800 B003 D.D.
SA9 Aster Assistant 900 B007 B.D.
SG5 Hailu Manager 2400 B003 D.D.
SL41 Kebede Assistant 900 B005 A.A.
• In the StaffBranch relation there is redundant data.
• The details of a branch are repeated for every member of staff
located at that branch.
• Relations of this type, which hold redundant data, may suffer from
problems called update anomalies (irregularities).
• To enter details of a new member of staff into the StaffBranch
relation, we must include the details of the branch at which the
staff member is to be located.
• To enter details of a new branch that currently has no members of
staff into the StaffBranch relation, it is necessary to enter NULL
into attributes for staff such as SNo.
• However as SNo is the primary key for the StaffBranch relation,
attempting to enter NULL into SNo violates entity integrity and is
not allowed.

• We therefore can not enter a tuple for a new branch into the
StaffBranch relation with a NULL for the SNo.
• If we delete a tuple from the StaffBranch relation, the details about
that branch are also lost from the database.
– E.g. if we delete the tuple for the staff number SA9(Aster) from
the StaffBranch relation, the details relating to the branch
number B007 are lost from the database.
• If we want to change the value of one of the attributes of a
particular branch in the StaffBranch relation, e.g. the address for
branch number B003, we must update the tuples of all staff located
at that branch.
• If this modification is not carried out on all the appropriate tuples
of the StaffBranch relation, the database will become inconsistent.
• We avoid these update anomalies by decomposing the StaffBranch
relation into two relations, Staff and Branch, as follows:
Staff
SNo SName Position Salary BNo
SL21 Abebe Manager 3000 B005
SG37 Hagos Assistant 1200 B003
SG14 Almaz Supervisor 1800 B003
SA9 Aster Assistant 900 B007
SG5 Hailu Manager 2400 B003
SL41 Kebede Assistant 900 B005

Branch
BNo BAddress
B003 D.D.
B005 A.A.
B007 B.D.

• Branch details appear only once for each branch in the Branch
relation.
• And only the BNo is repeated in the Staff relation to represent
where each member of the staff is located.
• The latter two relations do not suffer from potential
inconsistency because we enter only appropriate branch number
for each staff member in the Staff relation.
• The details of the branch number B007 are recorded in the database
as a single tuple in the Branch relation.
• Update anomalies are avoided in the two relations Staff and
Branch because branch details are stored in the Branch relation,
separately from staff details, which are stored in the Staff relation.
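
The deletion and modification anomalies discussed for StaffBranch can be reproduced with the sample rows from the slides. The Python sketch below is an illustration only; representing each relation as a list of tuples, and the variable names, are assumptions.

```python
# StaffBranch rows from the slides: (SNo, SName, Position, Salary, BNo, BAddress)
staff_branch = [
    ("SL21", "Abebe", "Manager", 3000, "B005", "A.A."),
    ("SG37", "Hagos", "Assistant", 1200, "B003", "D.D."),
    ("SG14", "Almaz", "Supervisor", 1800, "B003", "D.D."),
    ("SA9", "Aster", "Assistant", 900, "B007", "B.D."),
    ("SG5", "Hailu", "Manager", 2400, "B003", "D.D."),
    ("SL41", "Kebede", "Assistant", 900, "B005", "A.A."),
]

# Decompose by projection: Staff drops BAddress, Branch keeps one row per BNo.
staff = [row[:5] for row in staff_branch]
branch = sorted({(row[4], row[5]) for row in staff_branch})

# Deletion anomaly: removing SA9 from StaffBranch also loses branch B007.
remaining = [row for row in staff_branch if row[0] != "SA9"]
print("B007" in {row[4] for row in remaining})  # False

# In the decomposed design the Branch tuple is untouched by the same deletion.
staff_after = [row for row in staff if row[0] != "SA9"]
print(len(staff_after), ("B007", "B.D.") in branch)  # 5 True

# Modification: changing the address of B003 touches 3 tuples versus 1.
rows_in_staff_branch = sum(1 for row in staff_branch if row[4] == "B003")
rows_in_branch = sum(1 for row in branch if row[0] == "B003")
print(rows_in_staff_branch, rows_in_branch)  # 3 1
```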

• Normalization can be used to derive well-formed relations.
• We have various levels or steps in normalization called Normal
Forms.
• The complexity, the strength of the rules, and the degree of
decomposition increase as we move from a lower normal form to a
higher one.
• A table in a relational database is said to be in a certain normal
form if it satisfies certain constraints/rules or requirements for
the normal form.

• Normalization towards a logical design consists of the following steps:
– UnNormalized Form:
• Identify all data elements.
– First Normal Form:
• Find the key (the primary key) with which you can find all data.
– Second Normal Form:
• Remove part-key dependencies.
• Make all data dependent on the whole key.
• E.g. with the attributes teacher id, subject and teacher age, where
teacher id and subject together form the key, teacher age depends on
teacher id alone and must be moved out.
– Third Normal Form:
• Remove non-key dependencies (attributes that depend on something
other than the primary key).
• Make all data dependent on nothing but the key.
• For most practical purposes, databases are considered normalized if
they adhere to third normal form.
• First Normal Form (1NF)
– Requires that all column values in a table are atomic (e.g., a
number such as an age is an atomic value, while a list or a set,
such as several mobile numbers in one cell, is not).
– We have two ways of achieving this:
1. Putting each repeating group into a separate table and
connecting them with a primary key-foreign key
relationship, e.g. the Staff and Branch relations in the
previous case.
2. Moving the repeating groups to new rows by
repeating the common attributes. If so, then find the
key with which you can find all data.
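
The second approach, moving each value of a repeating group to its own row while repeating the common attributes, can be sketched in Python. The sample rows, the mobile numbers, and the helper name to_first_normal_form are hypothetical and serve only as an illustration.

```python
def to_first_normal_form(rows, repeating):
    """Emit one row per value of the repeating attribute."""
    flat = []
    for row in rows:
        for value in row[repeating]:
            flat_row = dict(row)          # repeat the common attributes
            flat_row[repeating] = value   # keep a single, atomic value
            flat.append(flat_row)
    return flat


# Hypothetical unnormalized rows: Mobile holds a list, which is not atomic.
unnormalized = [
    {"SNo": "SL21", "SName": "Abebe", "Mobile": ["0911000001", "0911000002"]},
    {"SNo": "SG37", "SName": "Hagos", "Mobile": ["0911000003"]},
]

flat = to_first_normal_form(unnormalized, "Mobile")

# SNo alone no longer identifies a row; the key becomes {SNo, Mobile}.
keys = {(row["SNo"], row["Mobile"]) for row in flat}
print(len(flat), len(keys))  # 3 3
```

Every cell is now atomic, at the cost of repeating SName, which is the kind of redundancy the later normal forms address.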

• Definition: a table (relation) is in 1NF if
– There are no duplicated rows in the table (each row has a unique
identifier).
– Each cell is single-valued (i.e., there are no repeating groups).
– Entries in a column (attribute, field) are of the same kind (the
same data type).
• Second Normal Form (2NF)
– No non-key attribute is partially dependent on part of the
primary key.
– Removing partial dependencies results in a set of relations in
Second Normal Form.
– Any table that is in 1NF and has a single-attribute (i.e., a non-
composite) primary key is automatically in 2NF.

• Definition: a table (relation) is in 2NF if
– It is in 1NF and
– All non-key attributes are dependent on the entire primary key, i.e.
there is no partial dependency. E.g. in a teacher table with the
composite key {TEACHER_ID, SUBJECT}, the non-prime attribute
TEACHER_AGE depends on TEACHER_ID alone, which violates the rule of
2NF. So divide the table into two: (TEACHER_ID, TEACHER_AGE) and
(TEACHER_ID, SUBJECT).
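
The 2NF split of the teacher table amounts to two projections. In the Python sketch below the rows are hypothetical, since the original table did not survive extraction, and the helper name project is an assumption.

```python
def project(rows, attrs):
    """Project onto attrs, dropping duplicates and keeping first-seen order."""
    seen = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in seen:
            seen.append(projected)
    return seen


# Hypothetical rows; the key is {TEACHER_ID, SUBJECT}.
teacher = [
    {"TEACHER_ID": 25, "SUBJECT": "Chemistry", "TEACHER_AGE": 30},
    {"TEACHER_ID": 25, "SUBJECT": "Biology", "TEACHER_AGE": 30},
    {"TEACHER_ID": 47, "SUBJECT": "English", "TEACHER_AGE": 35},
    {"TEACHER_ID": 83, "SUBJECT": "Math", "TEACHER_AGE": 38},
    {"TEACHER_ID": 83, "SUBJECT": "Computer", "TEACHER_AGE": 38},
]

# TEACHER_AGE depends on TEACHER_ID alone, so it moves to its own table.
teacher_age = project(teacher, ["TEACHER_ID", "TEACHER_AGE"])
teacher_subject = project(teacher, ["TEACHER_ID", "SUBJECT"])

print(len(teacher_age))      # 3
print(len(teacher_subject))  # 5
```

Each teacher's age is now stored once instead of once per subject taught.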

• To convert the given table into 2NF, we decompose it into two
tables: one holding TEACHER_ID and TEACHER_AGE, the other holding
TEACHER_ID and SUBJECT.

• Third Normal Form (3NF)
– Eliminate columns that depend on a non-key attribute rather than on
the primary key.
– If attributes do not contribute to a description of the key, remove
them to a separate table.
– This level avoids update and delete anomalies.
E.g. A bank registration record has the attributes id, name, acc.no,
bank code and bank. Id is the primary key; name, acc.no and bank code
depend on id, and bank depends on bank code. Bank therefore depends
on id only through bank code, which is a transitive functional
dependency, and 3NF requires that this dependency be removed. So the
table is split into two: (id, name, acc.no, bank code) as one table
and (bank code, bank) as the other.
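
The 3NF split of the bank example can be sketched as two tables linked by a foreign key, here using Python's standard sqlite3 module with an in-memory database. The table names, column names and sample values are hypothetical; only the attribute grouping comes from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- bank depends only on bank_code, so it gets its own table.
    CREATE TABLE bank (
        bank_code TEXT PRIMARY KEY,
        bank      TEXT NOT NULL
    );
    -- Every remaining attribute depends directly on id.
    CREATE TABLE customer (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        acc_no    TEXT NOT NULL,
        bank_code TEXT NOT NULL REFERENCES bank(bank_code)
    );
    INSERT INTO bank VALUES ('B01', 'Bank One'), ('B02', 'Bank Two');
    INSERT INTO customer VALUES
        (1, 'Abebe', 'ACC-001', 'B01'),
        (2, 'Almaz', 'ACC-002', 'B01'),
        (3, 'Hagos', 'ACC-003', 'B02');
""")

# Renaming a bank is a single-row update, however many customers use it.
cur = conn.execute(
    "UPDATE bank SET bank = 'Bank One Renamed' WHERE bank_code = 'B01'"
)
print(cur.rowcount)  # 1

# The original five-attribute view is recovered with a join.
rows = conn.execute("""
    SELECT c.id, c.name, b.bank
    FROM customer AS c JOIN bank AS b ON b.bank_code = c.bank_code
    ORDER BY c.id
""").fetchall()
print(rows)
```

Both customers of bank code B01 see the new bank name after one update, which is the modification anomaly that 3NF is meant to prevent.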

• Definition: a table (relation) is in 3NF:
If
– It is in 2NF and
– There are no transitive dependencies between a
primary key and non-primary key attributes.
• Generally, although there are four additional levels of
normalization, a table is said to be normalized if it reaches 3NF.
• A database with all tables in 3NF is said to be a normalized
database.

Thank
you!!!