DB Design Normalization

This document discusses database normalization and related concepts. It begins with refreshing key concepts like primary keys, candidate keys, and superkeys. It then defines normalization as a technique for organizing data to reduce redundancy and anomalies by grouping logically related data. The benefits of normalization include minimizing storage space, reducing inconsistencies, and easier maintenance. Data redundancy can lead to different types of anomalies when data is updated, deleted or modified. The document provides examples to illustrate how normalization addresses these issues.

SEN235 Introduction to Database Systems

(Fall 2022)
Database Design-Normalization
Asst.Prof.Dr. Hasan ÇİFCİ
Agenda

▪ Refresh: Keys
▪ Normalization
▪ Data Redundancy
▪ Anomalies
▪ Functional Dependencies (FD)
▪ The Process of Normalization
▪ Examples

2/62
Refresh: Keys

▪ Primary key: In the relational model of databases, a primary key is a
specific choice of a minimal set of attributes (columns) that uniquely
specify a tuple (row) in a relation (table).
▪ Informally, a primary key is "which attributes identify a record," and in
simple cases it consists of a single attribute: a unique ID.
▪ More formally, a primary key is a choice of candidate key (a minimal
superkey); any other candidate key is an alternate key.
▪ A primary key may consist of real-world observables, in which case it is
called a natural key, while an attribute created to function as a key and not
used for identification outside the database is called a surrogate key.
▪ For example, for a database of people (of a given nationality), time
and location of birth could be a natural key.
▪ National identification number is another example of an attribute that
may be used as a natural key.

3/62
Refresh: Keys

▪ Candidate key: A candidate key, or simply a key, of a relational
database is a minimal superkey.
▪ In other words, it is any set of columns that have a unique combination of
values in each row (which makes it a superkey), with the additional
constraint that removing any column would possibly produce duplicate
rows (which makes it a minimal superkey).
▪ Specific candidate keys are sometimes called primary keys, secondary
keys or alternate keys.
▪ The columns in a candidate key are called prime attributes, and a column
that does not occur in any candidate key is called a non-prime attribute.
▪ Every relation without NULL values will have at least one candidate key:
Since there cannot be duplicate rows, the set of all columns is a superkey,
and if that isn't minimal, some subset of that will be minimal.
▪ There is a functional dependency from the candidate key to all the
attributes in the relation.
▪ The candidate keys of a relation are all the possible ways we can identify
a row. As such, they are an important concept for the design of database
schema.
4/62
Refresh: Keys

▪ Superkey: In the relational data model a superkey is a set of attributes
that uniquely identifies each tuple of a relation.
▪ Because superkey values are unique, tuples with the same superkey
value must also have the same non-key attribute values.
▪ That is, non-key attributes are functionally dependent on the superkey.
▪ The set of all attributes is always a superkey (the trivial superkey). Tuples
in a relation are by definition unique, with duplicates removed after each
operation, so the set of all attributes is always uniquely valued for every
tuple. A candidate key (or minimal superkey) is a superkey that can't be
reduced to a simpler superkey by removing an attribute.
▪ For example, in an employee schema with attributes employeeID, name,
job, and departmentID, if employeeID values are unique then employeeID
combined with any or all of the other attributes can uniquely identify
tuples in the table.
▪ Each combination, {employeeID}, {employeeID, name}, {employeeID, name,
job}, and so on is a superkey.
▪ {employeeID} is a candidate key--no subset of its attributes is also a superkey.
{employeeID, name, job, departmentID} is the trivial superkey.
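
The superkey and candidate key definitions can be checked mechanically against sample data. The following is a minimal sketch, not a definitive implementation: the rows are made up (only the attribute names come from the slide), and the helper names is_superkey and candidate_keys are assumptions. A sample can only suggest keys; whether a key really holds depends on the meaning of the attributes.

```python
from itertools import combinations

# Hypothetical employee rows; only the attribute names come from the slide.
employees = [
    {"employeeID": 1, "name": "Ann", "job": "Clerk", "departmentID": 10},
    {"employeeID": 2, "name": "Bob", "job": "Clerk", "departmentID": 10},
    {"employeeID": 3, "name": "Ann", "job": "Manager", "departmentID": 20},
    {"employeeID": 4, "name": "Ann", "job": "Clerk", "departmentID": 10},
]

def is_superkey(rows, attrs):
    """True if no two rows share the same combination of values for attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Minimal superkeys of this particular sample, smallest first."""
    attributes = sorted(rows[0])
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Skip combinations that contain an already-found key: not minimal.
            if any(set(key) <= set(combo) for key in keys):
                continue
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys

print(is_superkey(employees, ["employeeID", "name"]))  # a superkey, but not minimal
print(candidate_keys(employees))
```

In this sample, rows 1 and 4 differ only in employeeID, so no combination that omits employeeID can be a superkey.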
5/62
Normalization

• A technique for producing a set of relations with desirable properties,
given the data requirements of an enterprise.
• The purpose of normalization is to identify a suitable set of relations
that support the data requirements of an enterprise.
• The characteristics of a suitable set of relations include the following:
• the minimal number of attributes necessary to support the data
requirements of the enterprise.
• attributes with a close logical relationship (described as functional
dependency) are found in the same relation.
• minimal redundancy, with each attribute represented only once,
with the important exception of attributes that form all or part of
foreign keys, which are essential for the joining of related
relations.

6/62
Benefit of Normalization

• Minimizes data redundancy in a database, which reduces the storage
space required to store the data
• Reduces data anomalies
• Makes it easier for users to access data in the database
• Makes it easier for users to maintain data

7/62
How Normalization Supports Database
Design
• Two approaches for using normalization:
• Approach 1 – normalization can be used as a bottom-up
standalone database design technique.
• Approach 2 – normalization can be used as a validation technique
to check the structure of relations that were created using a
top-down approach such as ER modeling.
• Goal – create a set of well-designed relations that meet the data
requirements of the enterprise.

8/62
Data Redundancy

• Data redundancy = the same details stored repeatedly for the same data
• To minimize data redundancy → group attributes into relations in a
Relational Database (RD).
• This benefits the implemented database:
• Updates need a minimal number of operations
• Fewer opportunities for data inconsistencies
• Less file storage and therefore lower cost

9/62
Data Redundancy

• However, a certain amount of data redundancy is required in Relational
Databases
• copies of Primary Keys (or candidate keys) act as Foreign Keys
in related relations to enable the modeling of relationships
between data.

10/62
Data Redundancy

Situation 1 VS Situation 2
• Situation 1: information about staff is in the Staff relation and
information about branches is in the Branch relation
• Situation 2: information about both staff and branches is in the
StaffBranch relation
11/62
Update Anomalies

Figure 1

Unwanted data redundancy may cause update anomalies.

Look at the tuples in rows 2, 3 and 5. The details of a branch (B003) are
repeated for every member of staff located at that branch → data
redundancy, which defeats one of the purposes of building a database.
Categories of update anomalies:
• Insertion anomalies
• Deletion anomalies
• Modification anomalies

12/62
Insertion Anomalies

• There are two main types of insertion anomalies. The first type:
• To insert the details of new members of staff into the StaffBranch
relation
• we must include the details of the branch at which the staff are to
be located. For example, to insert the details of new staff located
at branch number B007, we must enter the correct details of
branch number B007 so that the branch details are consistent with
values for branch B007 in other tuples of the StaffBranch relation.
• The relations shown in Situation 1 do not suffer from this potential
inconsistency, because we enter only the appropriate branch
number for each staff member in the Staff relation. Instead, the
details of branch number B007 are recorded in the database as a
single tuple in the Branch relation.

An anomaly caused by the design of the relation that makes it
difficult to insert new data.
13/62
Insertion Anomalies

• To insert details of a new branch that currently has no members of
staff into the StaffBranch relation
• it is necessary to enter nulls into the attributes for staff, such as
staffNo. However, as staffNo is the primary key for the StaffBranch
relation, attempting to enter nulls for staffNo violates entity
integrity and is not allowed.
• We therefore cannot enter a tuple for a new branch into the
StaffBranch relation with a null for the staffNo.
• The design of the relations shown in Situation 1 avoids this
problem, because branch details are entered in the Branch relation
separately from the staff details. The details of staff ultimately
located at that branch are entered at a later date into the Staff
relation.

14/62
Deletion Anomalies

• If we delete a tuple from the StaffBranch relation that represents the
last member of staff located at a branch, the details about that branch
are also lost from the database.
• For example, if we delete the tuple for staff number SA9 (Mary
Howe) from the StaffBranch relation, the details relating to branch
number B007 are lost from the database.
• The design of the relations in Situation 1 avoids this problem,
because branch tuples are stored separately from staff tuples and
only the attribute branchNo relates the two relations.
• If we delete the tuple for staff number SA9 from the Staff relation, the
details on branch number B007 remain unaffected in the Branch
relation.

An anomaly caused by the design of the relation that makes it
difficult to delete data. It might cause unwanted data loss.
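
The deletion anomaly can be reproduced with a few rows. This is an illustrative sketch only: apart from SA9 and the branch numbers B003 and B007, the staff numbers and address strings are made-up placeholders, and the helper name known_branches is an assumption.

```python
# Situation 2: a single StaffBranch relation (placeholder rows, not the slide's figure).
staff_branch = [
    {"staffNo": "S1", "branchNo": "B003", "bAddress": "address of B003"},
    {"staffNo": "S2", "branchNo": "B003", "bAddress": "address of B003"},
    {"staffNo": "SA9", "branchNo": "B007", "bAddress": "address of B007"},
]

def known_branches(rows):
    """Branch numbers that still appear somewhere in the given rows."""
    return {row["branchNo"] for row in rows}

# Deleting the last member of staff at B007 also removes every trace of B007.
after_delete = [row for row in staff_branch if row["staffNo"] != "SA9"]
print(sorted(known_branches(after_delete)))

# Situation 1: Staff and Branch are separate relations linked only by branchNo.
staff = [{"staffNo": "SA9", "branchNo": "B007"}]
branch = [
    {"branchNo": "B003", "bAddress": "address of B003"},
    {"branchNo": "B007", "bAddress": "address of B007"},
]
staff_after = [row for row in staff if row["staffNo"] != "SA9"]

# The Branch relation is untouched by the staff deletion.
print(sorted(known_branches(branch)))
```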
15/62
Modification Anomalies

• If we want to change the value of one of the attributes of a particular
branch in the StaffBranch relation (for example, the address for
branch number B003)
• we must update the tuples of all staff located at that branch. If this
modification is not carried out on all the appropriate tuples of the
StaffBranch relation, the database will become inconsistent.
• In this example, branch number B003 may appear to have different
addresses in different staff tuples.

An anomaly caused by the design of the relation that makes it
difficult to modify existing data. It might lead to inconsistency
of the database.
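
A partial update is easy to simulate. The sketch below assumes made-up rows (only the branch numbers B003 and B007 come from the slides) and a hypothetical helper, inconsistent_branches, that reports every branch recorded with more than one address.

```python
from collections import defaultdict

# Placeholder StaffBranch rows; staff numbers and addresses are invented.
staff_branch = [
    {"staffNo": "S1", "branchNo": "B003", "bAddress": "address of B003"},
    {"staffNo": "S2", "branchNo": "B003", "bAddress": "address of B003"},
    {"staffNo": "S3", "branchNo": "B007", "bAddress": "address of B007"},
]

def inconsistent_branches(rows):
    """Map each branch that has conflicting addresses to those addresses."""
    addresses = defaultdict(set)
    for row in rows:
        addresses[row["branchNo"]].add(row["bAddress"])
    return {b: found for b, found in addresses.items() if len(found) > 1}

# A faulty modification: only the first B003 tuple receives the new address.
for row in staff_branch:
    if row["branchNo"] == "B003":
        row["bAddress"] = "new address of B003"
        break

conflicts = inconsistent_branches(staff_branch)
print(conflicts)
```

With separate Staff and Branch relations the address is stored once, so the same faulty loop could not leave two versions behind.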

16/62
Functional Dependencies (FD)

• An important concept associated with normalization is functional
dependency, which describes the relationship between attributes
(Maier, 1983).
• If A and B are attributes of relation R, B is functionally dependent
on A (written as: A → B) if each value of A is associated with
exactly one value of B. (A and B may each consist of one or more
attributes.)
• When a functional dependency is present, the dependency is specified
as a constraint between the attributes.
• Alternative wording → ‘A functionally determines B’
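
The definition translates directly into a check over sample rows: A → B fails as soon as one value of A appears with two different values of B. This is a minimal sketch with made-up staff rows; the function name fd_holds is an assumption. Note that a sample can refute a dependency, but it cannot prove that the dependency holds for all possible data.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in these rows, each lhs value maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical rows: two members of staff share the same position.
staff = [
    {"staffNo": "S1", "position": "Manager"},
    {"staffNo": "S2", "position": "Assistant"},
    {"staffNo": "S3", "position": "Assistant"},
]

print(fd_holds(staff, ["staffNo"], ["position"]))  # True
print(fd_holds(staff, ["position"], ["staffNo"]))  # False
```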

17/62
FD: Determinant

• Determinant: Refers to the attribute, or group of attributes, on the
left-hand side of the arrow of a functional dependency.

determinant

18/62
Example of a Functional Dependency

(a) staffNo functionally determines position
• staffNo → position
• Relationship from staffNo to position is 1:1
(each staff number has exactly one position)

(b) position does not functionally determine staffNo
• position ↛ staffNo
• Relationship from position to staffNo is 1:*
(one position can be held by many members of staff)

19/62
Functional Dependencies

• In normalization we are only interested in FDs that “hold at all times”
• These indicate a 1:1 relationship from the determinant to the
dependent attributes
• The reason is that we want to identify functional dependencies that
hold for all possible values for attributes of a relation as these
represent the types of integrity constraints that we need to identify.
• Approach → understand the purpose of each attribute in an
identified relation.

20/62
Functional Dependencies

• Which of these two FDs holds at all times?
1. The staffNo attribute functionally determines the sName attribute.
• staffNo → sName
2. The sName attribute functionally determines the staffNo attribute.
• sName → staffNo

21/62
Functional Dependencies

• Characteristics of the functional dependencies used in normalization
• A one-to-one relationship exists between the attribute(s) on the
left-hand side (determinant) and those on the right-hand side of a
functional dependency.
• They hold for all time.
• The determinant has the minimal number of attributes necessary
to maintain the dependency with the attribute(s) on the right-hand
side → there must be a full functional dependency between the
attribute(s) on the left-hand and right-hand sides of the
dependency.

22/62
Full Functional Dependency

• Full functional dependency: Indicates that if A and B are attributes of a
relation, B is fully functionally dependent on A if B is functionally
dependent on A, but not on any proper subset of A.
• Only applicable when there is more than one attribute on the left-hand
side of the FD.
StaffBranch (staffNo, sName, position, salary, branchNo, bAddress)
• A functional dependency A → B is a full functional dependency if
removal of any attribute from A results in the dependency no longer
existing.
branchNo, position → salary

• A functional dependency A → B is a partial dependency if there is
some attribute that can be removed from A and yet the dependency
still holds.
staffNo, sName → branchNo (staffNo alone already determines branchNo)
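
Given a set of FDs, the full/partial distinction can be tested with attribute closure: a dependency is partial if dropping some attribute from the determinant still lets the rest determine the right-hand side. The sketch below is one possible way to do this, using the StaffBranch FDs listed in these slides; the function names closure and is_full_fd are assumptions.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_full_fd(lhs, rhs, fds):
    """True if lhs -> rhs holds and no single attribute of lhs can be dropped."""
    if not set(rhs) <= closure(lhs, fds):
        return False
    for attr in lhs:
        reduced = set(lhs) - {attr}
        if set(rhs) <= closure(reduced, fds):
            return False  # a smaller determinant is enough: partial dependency
    return True

# FDs of StaffBranch as listed in the slides.
fds = [
    (("staffNo",), ("sName", "position", "salary", "branchNo", "bAddress")),
    (("branchNo",), ("bAddress",)),
    (("bAddress",), ("branchNo",)),
    (("branchNo", "position"), ("salary",)),
    (("bAddress", "position"), ("salary",)),
]

print(is_full_fd(("branchNo", "position"), ("salary",), fds))  # True
print(is_full_fd(("staffNo", "sName"), ("branchNo",), fds))    # False
```

Checking single-attribute removals is sufficient here, because if any smaller subset determines the right-hand side, so does every larger subset that contains it.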

23/62
Transitive Dependency

• A condition where A, B, and C are attributes of a relation such that if A
→ B and B → C, then C is transitively dependent on A via B (provided
that A is not functionally dependent on B or C).
staffNo → sName, position, salary, branchNo, bAddress
branchNo → bAddress
• bAddress is transitively dependent on staffNo via branchNo.
• That is, the staffNo attribute functionally determines bAddress via the
branchNo attribute.
• Neither branchNo nor bAddress functionally determines staffNo.
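
The three conditions in the definition (A → B, B → C, and A not dependent on B or C) can be tested with attribute closure. This is a sketch for single-attribute A, B and C only, using the two FDs shown on this slide; the names closure and is_transitive are assumptions.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_transitive(a, b, c, fds):
    """True if c depends on a via b, and a depends on neither b nor c."""
    a_determines_b = b in closure({a}, fds)
    b_determines_c = c in closure({b}, fds)
    a_depends_on_b_or_c = a in closure({b}, fds) or a in closure({c}, fds)
    return a_determines_b and b_determines_c and not a_depends_on_b_or_c

fds = [
    (("staffNo",), ("sName", "position", "salary", "branchNo", "bAddress")),
    (("branchNo",), ("bAddress",)),
]

print(is_transitive("staffNo", "branchNo", "bAddress", fds))  # True
print(is_transitive("staffNo", "sName", "position", fds))     # False
```

The second call is False because sName does not determine position, so position depends on staffNo directly rather than through sName.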

24/62
Identifying Functional Dependencies

• Identifying FDs requires an understanding of the meaning and purpose of
each attribute and of the relationships between the attributes.
• Example:
StaffBranch (staffNo, sName, position, salary, branchNo, bAddress)
• (Note: for the purpose of discussion, assume that the position
held, and the branch can determine the salary)
• FDs:
• staffNo → sName, position, salary, branchNo, bAddress
• branchNo → bAddress
• bAddress → branchNo
• branchNo, position → salary
• bAddress, position → salary

25/62
Identifying Functional Dependencies
The sample relation displaying data for attributes A, B, C, D and E, and
the functional dependencies (fd1 to fd5) that exist between these
attributes.
A → C (fd1)
C → A (fd2)
B → D (fd3)
A,B → E (fd4)
B,C → E (fd5)

26/62
Identifying Functional Dependency

• To identify the functional dependencies that exist between attributes
A, B, C, D, and E:
• We examine the Sample relation and identify when values in one
column are consistent with the presence of particular values in
other columns.
• We begin with the first column on the left-hand side and work our
way over to the right-hand side of the relation and then we look at
combinations of columns; in other words, where values in two or
more columns are consistent with the appearance of values in
other columns.
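
The column-by-column procedure can be sketched as a brute-force search. The rows below are illustrative, not the table from the slide's figure; they were chosen so that the five dependencies fd1 to fd5 hold. The names fd_holds and discover are assumptions, and the search is limited to determinants of at most two attributes.

```python
from itertools import combinations

# Illustrative sample relation over A..E (one string per row, one letter per column).
sample = [
    dict(zip("ABCDE", row))
    for row in ["abzwq", "ebrwp", "adzwt", "edrwq", "afzst", "efrst"]
]

def fd_holds(rows, lhs, rhs):
    """True if each combination of lhs values appears with one rhs value only."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

def discover(rows, max_lhs=2):
    """Minimal FDs (lhs, rhs) consistent with the sample, single columns first."""
    attrs = sorted(rows[0])
    found = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(attrs, size):
            for rhs in attrs:
                if rhs in lhs:
                    continue
                # Skip if a smaller determinant already gives this attribute.
                if any(set(l) < set(lhs) and r == rhs for l, r in found):
                    continue
                if fd_holds(rows, lhs, rhs):
                    found.append((lhs, rhs))
    return found

for lhs, rhs in discover(sample):
    print(", ".join(lhs), "->", rhs)
```

Dependencies found this way are only candidates: they are consistent with the sample, and still have to be confirmed against the meaning of the attributes.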

27/62
The Process of Normalization

• Formal technique for analyzing relations based on their primary key
(or candidate keys) and functional dependencies.
• Involves a series of rules that can be used to test individual
relations so that a database can be normalized to any degree.
• A database is said to be a normalized database if all relations are in
the “highest” normal form (usually 3NF onwards)
• When a requirement is not met, the relation violating the requirement
must be decomposed into relations that individually meet the
requirements of normalization.
• Checking begins with 1NF and moves upward until at least 3NF
(this course covers up to BCNF).

28/62
The Process of Normalization

• Three normal forms:


• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)

29/62
The Process of Normalization

[Figure: the process of normalization, starting from the data sources]

30/62
First Normal Form (1NF)

• Before normalization, data from a source table may be
in unnormalized form (UNF): a table that contains one or
more repeating groups.
• 1NF – A relation in which the intersection of each row and column
contains one and only one value.
• Repeating group – an attribute or group of attributes within a table
that occurs with multiple values for a single occurrence of the
nominated key attribute(s) for that table.
• Two approaches to removing repeating groups from UNF:
• By entering appropriate data in the empty columns of rows
containing the repeating data. (1st approach)
• By placing the repeating data, along with a copy of the original key
attribute(s), in a separate relation. (2nd approach)
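
Both approaches can be sketched on a tiny unnormalized record. The data values are invented and the attribute list is shortened for brevity; the function names flatten and split are assumptions.

```python
# Hypothetical UNF record: one client with a repeating group of rentals.
unf = [
    {
        "clientNo": "C1",
        "cName": "Client One",
        "rentals": [
            {"propertyNo": "P1", "rent": 350},
            {"propertyNo": "P2", "rent": 450},
        ],
    },
]

def flatten(unf_rows):
    """1st approach: repeat the client data in every rental row."""
    return [
        {"clientNo": c["clientNo"], "cName": c["cName"], **rental}
        for c in unf_rows
        for rental in c["rentals"]
    ]

def split(unf_rows):
    """2nd approach: move the repeating group to its own relation,
    together with a copy of the key attribute clientNo."""
    clients = [{"clientNo": c["clientNo"], "cName": c["cName"]} for c in unf_rows]
    rentals = [
        {"clientNo": c["clientNo"], **rental}
        for c in unf_rows
        for rental in c["rentals"]
    ]
    return clients, rentals

print(flatten(unf))
print(split(unf))
```

The first approach yields one wide relation with redundant client data; the second yields two relations and stores each client name once.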

31/62
UNF: Example

[Figure: the unnormalized ClientRental table, a collection of DreamHome
leases, with its repeating groups marked]

32/62
1NF Example: 1st Approach

→ Remove the repeating group in the UNF table (property rented
details) by entering the appropriate client data into each row.

1NF relation schema:
ClientRental (clientNo, propertyNo, cName, pAddress, rentStart,
rentFinish, rent, ownerNo, oName)

33/62
1NF Example: 2nd Approach

→ Remove the repeating group (property rented details) from the UNF table
by placing it, along with a copy of the original key attribute (i.e. the
PK, clientNo), in a relation separate from the non-repeating group
relation. Name both relations appropriately.

→ 2 newly formed 1NF relations with a single value at
the intersection of each row and column:

1NF relation schema:
Client (clientNo, cName)
PropertyRentalOwner (clientNo, propertyNo, pAddress, rentStart,
rentFinish, rent, ownerNo, oName)
34/62
1NF: Determine Primary Key (PK)

ClientRental (clientNo, propertyNo, cName, pAddress, rentStart, rentFinish,
rent, ownerNo, oName)

35/62
First Normal Form (1NF)

Important
▪ If you are using normalization as a validation technique for
relations derived from a conceptual ER model, you can safely
assume that the derived relations are already in 1NF, on the
assumption that each row in the relation is one
complete tuple.

36/62
Second Normal Form (2NF)

• 2NF: A relation that is in 1NF and in which every non-primary-key
attribute is fully functionally dependent on the primary key.
• A 1NF relation with a single attribute PK is automatically a 2NF
relation.
• A 1NF relation with composite PK MAY or MAY NOT BE a 2NF relation.
• The normalization of 1NF relations to 2NF involves the removal of
partial dependencies.
• If a partial dependency exists, remove the partially dependent
attribute(s) from the relation by placing them in a new relation
along with a copy of their determinant.

37/62
1NF to 2NF: What to do?

• Identify the functional dependencies in the relation.
• Examine the FDs. If partial dependencies on the primary key exist,
move all partially dependent attributes out of the relation by placing
them in a new relation along with a copy of their determinant.

38/62
2NF Example

• Let us use the 1NF ClientRental relation (obtained from the 1st
approach).
• list its FDs
• determine the PK for ClientRental

39/62
2NF Example

FDs list in ClientRental


FD1 : clientNo, propertyNo → rentStart, rentFinish
FD2 : clientNo → cName
FD3 : propertyNo → pAddress, rent, ownerNo, oName
FD4 : ownerNo → oName
FD5 : clientNo, rentStart → propertyNo, pAddress, rentFinish, rent, ownerNo, oName
FD6 : propertyNo, rentStart → clientNo, cName, rentFinish

40/62
2NF Example

Identify FDs with partial dependencies on the PK

1. Is there any FD whose determinant is a proper subset of the PK?
2. If so, then that FD is not a full FD, hence the relation violates the 2NF rule.

→ PK is clientNo, propertyNo
→ Proper subsets of PK:
→ {clientNo}, {propertyNo}
→ the determinants in FD2 and FD3 are proper subsets of the PK
FDs in ClientRental
FD1 : clientNo, propertyNo → rentStart, rentFinish
FD2 : clientNo → cName (partial dependency)
FD3 : propertyNo → pAddress, rent, ownerNo, oName (partial dependency)
FD4 : ownerNo → oName
FD5 : clientNo, rentStart → propertyNo, pAddress, rentFinish, rent, ownerNo, oName
FD6 : propertyNo, rentStart → clientNo, cName, rentFinish
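
The two questions above amount to a proper-subset test on each determinant. A minimal sketch, using the ClientRental FDs as listed; the function name partial_dependencies is an assumption.

```python
pk = {"clientNo", "propertyNo"}

# FDs of ClientRental as (determinant, dependents).
fds = {
    "FD1": ({"clientNo", "propertyNo"}, {"rentStart", "rentFinish"}),
    "FD2": ({"clientNo"}, {"cName"}),
    "FD3": ({"propertyNo"}, {"pAddress", "rent", "ownerNo", "oName"}),
    "FD4": ({"ownerNo"}, {"oName"}),
    "FD5": ({"clientNo", "rentStart"},
            {"propertyNo", "pAddress", "rentFinish", "rent", "ownerNo", "oName"}),
    "FD6": ({"propertyNo", "rentStart"}, {"clientNo", "cName", "rentFinish"}),
}

def partial_dependencies(pk, fds):
    """Names of FDs whose determinant is a proper subset of the primary key
    and that determine at least one attribute outside the key."""
    return [
        name
        for name, (lhs, rhs) in fds.items()
        if lhs < pk and rhs - pk   # '<' on sets means proper subset
    ]

print(partial_dependencies(pk, fds))  # ['FD2', 'FD3']
```

FD4 is not reported because ownerNo is not part of the key; it is handled later as a transitive dependency.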
41/62
2NF Example

3. Move all partially dependent attributes out of the relation
i. Create a new relation for each partial FD
ii. Name all relations appropriately

2NF Relations:
From FD2: Client (clientNo, cName)
From FD3: PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)
The rest : Rental (clientNo, propertyNo, rentStart, rentFinish)

This is our latest set of relations. All of them are now in 2NF,
as every non-PK attribute is fully FD on the PK of its relation.
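
Step 3 can be sketched at the schema level: each partial FD becomes its own relation (determinant plus dependents), and the dependents are removed from the original. The function name decompose_2nf is an assumption; the attribute and FD lists follow the ClientRental example.

```python
def decompose_2nf(attributes, partial_fds):
    """Return one attribute list per new relation, then the reduced original."""
    relations = []
    remaining = list(attributes)
    for lhs, rhs in partial_fds:
        # New relation: a copy of the determinant plus the moved attributes.
        relations.append(list(lhs) + list(rhs))
        remaining = [a for a in remaining if a not in rhs]
    relations.append(remaining)
    return relations

client_rental = [
    "clientNo", "propertyNo", "cName", "pAddress",
    "rentStart", "rentFinish", "rent", "ownerNo", "oName",
]
partial_fds = [
    (["clientNo"], ["cName"]),                                  # FD2
    (["propertyNo"], ["pAddress", "rent", "ownerNo", "oName"]),  # FD3
]

client, property_owner, rental = decompose_2nf(client_rental, partial_fds)
print(client)
print(property_owner)
print(rental)
```

The determinants clientNo and propertyNo stay in the reduced relation, where they act as foreign keys to the two new relations.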

42/62
2NF Example

43/62
Third Normal Form (3NF)

• 3NF: A relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key.
• 2NF relations may still suffer from update anomalies.
• These update anomalies are caused by transitive dependencies.
• The normalization of 2NF into 3NF involves the removal of transitive
dependencies.
• If a transitive dependency exists, move the transitively dependent
attribute(s) from the relation by placing the attribute(s) in a new
relation along with a copy of the determinant.

44/62
2NF to 3NF: What to do?

• List the FDs for all 2NF relations.
• Examine each FD, looking for transitive dependencies.
• If transitive dependencies on the primary key exist, remove the
transitively dependent attributes by placing them in a new relation
along with a copy of their determinant.

45/62
3NF: Example

2NF relations:
Client (clientNo, cName)
i. FD2 : clientNo → cName

Rental (clientNo, propertyNo, rentStart, rentFinish)
i. FD1 : clientNo, propertyNo → rentStart, rentFinish
ii. FD5 : clientNo, rentStart → propertyNo, rentFinish
iii. FD6 : propertyNo, rentStart → clientNo, rentFinish

PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)
i. FD3 : propertyNo → pAddress, rent, ownerNo, oName
ii. FD4 : ownerNo → oName (Transitive dependency)

Look at the PropertyOwner relation:
→ In FD3, propertyNo → ownerNo and propertyNo → oName
→ In FD4, ownerNo → oName
→ oName is transitively dependent (TD) on propertyNo via ownerNo
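
For a relation that is already in 2NF, transitive dependencies show up as FDs whose determinant is determined by the primary key but is not itself a key. The sketch below applies that test to PropertyOwner; the names closure and transitive_dependencies are assumptions, and the check presumes the relation is in 2NF.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def transitive_dependencies(pk, fds):
    """(determinant, attribute) pairs where the attribute hangs off a non-key
    determinant that is itself determined by the primary key."""
    found = []
    for lhs, rhs in fds:
        lhs_is_key = pk <= closure(lhs, fds)
        if not lhs_is_key and lhs <= closure(pk, fds):
            for attr in sorted(rhs - pk - lhs):
                found.append((tuple(sorted(lhs)), attr))
    return found

pk = {"propertyNo"}
fds = [
    ({"propertyNo"}, {"pAddress", "rent", "ownerNo", "oName"}),  # FD3
    ({"ownerNo"}, {"oName"}),                                    # FD4
]

print(transitive_dependencies(pk, fds))  # [(('ownerNo',), 'oName')]
```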
46/62
3NF: Example

• Move the transitively dependent attributes out of PropertyOwner by
placing them in a new relation, along with a copy of their determinant
• Owner (ownerNo, oName)
• Rename the original relation (PropertyOwner) appropriately and keep
the remaining attributes in it
• PropertyForRent (propertyNo, pAddress, rent, ownerNo)

47/62
3NF: Example

PropertyOwner
(2NF)

PropertyForRent (3NF)
Owner (3NF)

48/62
3NF: Example

• This is our final set of normalized relations in 3NF, derived from
the ClientRental relation.
• Client (clientNo, cName)
• Rental (clientNo, propertyNo, rentStart, rentFinish)
• PropertyForRent (propertyNo, pAddress, rent, ownerNo)
• Owner (ownerNo, oName)
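
As a sketch of how these four relations might be declared, the example below creates them in an in-memory SQLite database from Python. The column types, NOT NULL constraints, and sample values are assumptions, not part of the slides; the primary and foreign keys follow the relation schemas above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Client (clientNo TEXT PRIMARY KEY, cName TEXT NOT NULL);
CREATE TABLE Owner (ownerNo TEXT PRIMARY KEY, oName TEXT NOT NULL);
CREATE TABLE PropertyForRent (
    propertyNo TEXT PRIMARY KEY,
    pAddress   TEXT NOT NULL,
    rent       INTEGER NOT NULL,
    ownerNo    TEXT NOT NULL REFERENCES Owner(ownerNo)
);
CREATE TABLE Rental (
    clientNo   TEXT NOT NULL REFERENCES Client(clientNo),
    propertyNo TEXT NOT NULL REFERENCES PropertyForRent(propertyNo),
    rentStart  TEXT NOT NULL,
    rentFinish TEXT NOT NULL,
    PRIMARY KEY (clientNo, propertyNo)
);
""")

# Made-up rows: one owner with two properties.
conn.execute("INSERT INTO Owner VALUES ('O1', 'Owner One')")
conn.execute("INSERT INTO PropertyForRent VALUES ('P1', 'address 1', 350, 'O1')")
conn.execute("INSERT INTO PropertyForRent VALUES ('P2', 'address 2', 450, 'O1')")

# The owner's name is stored once, so one UPDATE changes it for every property.
conn.execute("UPDATE Owner SET oName = 'Renamed Owner' WHERE ownerNo = 'O1'")
names = conn.execute(
    "SELECT DISTINCT oName FROM PropertyForRent JOIN Owner USING (ownerNo)"
).fetchall()
print(names)

# A property that refers to an unknown owner is rejected by the foreign key.
try:
    conn.execute("INSERT INTO PropertyForRent VALUES ('P3', 'address 3', 500, 'O9')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```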

49/62
Boyce-Codd Normal Form (BCNF)

• BCNF is based on functional dependencies that take into account all
candidate keys in a relation; however, BCNF also has additional
constraints compared with the general definition of 3NF.
• Boyce–Codd normal form (BCNF)
• A relation is in BCNF if and only if every determinant is a candidate
key.

50/62
Boyce-Codd Normal Form (BCNF)

• Difference between 3NF and BCNF is that for a functional dependency
A → B:
• 3NF allows this dependency in a relation if B is a primary-key
attribute and A is not a candidate key.
• Whereas BCNF insists that for this dependency to remain in a
relation, A must be a candidate key.
• Every relation in BCNF is also in 3NF. However, a relation in 3NF is not
necessarily in BCNF.
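
The BCNF condition can be tested with attribute closure: every determinant must determine all attributes of its relation. The sketch below checks for superkeys, which is the usual way to state the test; the function names closure and bcnf_violations are assumptions, and the FDs are those listed earlier for Rental and PropertyOwner.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose determinant does not determine every attribute."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not attributes <= closure(lhs, fds)
    ]

# Rental has three determinants, and each one is a candidate key.
rental = {"clientNo", "propertyNo", "rentStart", "rentFinish"}
rental_fds = [
    ({"clientNo", "propertyNo"}, {"rentStart", "rentFinish"}),  # FD1
    ({"clientNo", "rentStart"}, {"propertyNo", "rentFinish"}),  # FD5
    ({"propertyNo", "rentStart"}, {"clientNo", "rentFinish"}),  # FD6
]

# PropertyOwner (before its 3NF decomposition) has a non-key determinant.
property_owner = {"propertyNo", "pAddress", "rent", "ownerNo", "oName"}
property_owner_fds = [
    ({"propertyNo"}, {"pAddress", "rent", "ownerNo", "oName"}),  # FD3
    ({"ownerNo"}, {"oName"}),                                    # FD4
]

print(bcnf_violations(rental, rental_fds))                   # []
print(bcnf_violations(property_owner, property_owner_fds))
```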

51/62
BCNF: Example

• In this example, we extend the DreamHome case study to include a
description of client interviews by members of staff.

Non-candidate key determinant: fd4
This relation is in 1NF, 2NF and 3NF, BUT NOT in BCNF, because the
determinant of fd4 is not a candidate key (CK).
52/62
BCNF: Example

53/62
Example of DreamHome

[Figure: the DreamHome relations Client, Rental, PropertyForRent and Owner, labeled BCNF]

54/62
Another Example

• Given the following table, normalize up to BCNF.

55/62
Another Example

• UNF to 1NF:
• StaffPropertyInspection (propertyNo, iDate, iTime, pAddress,
comments, staffNo, sName, carReg)

56/62
Another Example

• Dependency

57/62
Another Example
1NF:
StaffPropertyInspection (propertyNo, iDate, iTime, pAddress, comments, staffNo, sName, carReg)

2NF:
PropertyInspection (propertyNo, iDate, iTime, comments, staffNo, sName, carReg)
Property (propertyNo, pAddress)

Transitive dependency

58/62
Another Example

3NF:
PropertyInspection (propertyNo, iDate, iTime, comments, staffNo, carReg)
Staff (staffNo, sName)
Property (propertyNo, pAddress)

Non candidate key

BCNF:
Inspection (propertyNo, iDate, iTime, comments, staffNo)
StaffCar (staffNo, iDate, carReg)
Staff (staffNo, sName)
Property (propertyNo, pAddress)
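
The usual BCNF split on a violating dependency X → Y keeps X and Y together in a new relation and removes Y from the original. The sketch below applies it to the 3NF inspection relation. The two FDs are assumptions inferred from the relations shown here (the dependency diagram itself is not reproduced in the text), and the names closure, first_violation and split_on are hypothetical.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def first_violation(attributes, fds):
    """First FD whose determinant is not a superkey, or None."""
    for lhs, rhs in fds:
        if not attributes <= closure(lhs, fds):
            return lhs, rhs
    return None

def split_on(attributes, lhs, rhs):
    """Split into (lhs + rhs) and the original without the moved attributes."""
    return lhs | rhs, attributes - (rhs - lhs)

# 3NF relation from the slide; the FDs below are assumed for illustration.
property_inspection = {"propertyNo", "iDate", "iTime", "comments", "staffNo", "carReg"}
fds = [
    ({"propertyNo", "iDate"}, {"iTime", "comments", "staffNo", "carReg"}),
    ({"staffNo", "iDate"}, {"carReg"}),
]

violation = first_violation(property_inspection, fds)
staff_car, inspection = split_on(property_inspection, *violation)
print(sorted(staff_car))
print(sorted(inspection))
```

Under these assumed FDs, carReg moves to the new relation and the determinant (staffNo, iDate) stays behind in the original as a foreign key.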

59/62
Another Example

60/62
Summary

• 1NF: remove repeating groups
• 2NF: remove partial dependencies
• 3NF: remove transitive dependencies
• BCNF: remove dependencies whose determinant is not a candidate key

61/62
Thank you…
Hasan ÇİFCİ
