Chapter Four

This document provides an overview of database normalization. It begins by defining what a relation is in database terms and outlines the requirements for a table to be considered a relation. It then discusses why normalization is important by addressing issues like data redundancy and anomalies. The document introduces the different normal forms as steps to normalize data and reduce anomalies. It provides examples of anomalies like insertion, deletion, and modification anomalies. Finally, it covers concepts like functional dependencies, primary keys, and the steps to normalize relations.

Uploaded by

Abreham Kassa

Chapter Four

Logical Database Design

Normalization
Objectives
 Recalling Relational concepts
 Understand different anomalies and functional dependency
concepts
 Use normalization to convert anomalous tables to well-structured relations
 Why normalise
 What is normalisation
 Identify three problems solved by normalisation
 Example of how to normalise

Relation
 Definition: A relation is a named, two-dimensional table of data
 Table consists of rows (records) and columns (attribute or field)
 Requirements for a table to qualify as a relation:
 It must have a unique name
 Every attribute value must be atomic (not multivalued, not
composite)
 Every row must be unique (can’t have two rows with exactly the
same values for all their fields)
 Attributes (columns) in tables must have unique names
 The order of the columns and rows must be irrelevant

NOTE: all relations are in 1st Normal form
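
The requirements above can be checked mechanically. The following is a minimal sketch, assuming a table is given as a list of column names plus a list of row tuples; the function name relation_violations and the sample data are hypothetical and not part of the original slides.

```python
def relation_violations(columns, rows):
    """Return a list of reasons why the table fails to qualify as a relation."""
    problems = []
    # Attribute (column) names must be unique.
    if len(set(columns)) != len(columns):
        problems.append("duplicate column names")
    # Every attribute value must be atomic: no lists, sets, tuples or dicts.
    for row in rows:
        if any(isinstance(v, (list, set, tuple, dict)) for v in row):
            problems.append("non-atomic value")
            break
    # Every row must be unique. repr() lets rows with unhashable cells be compared.
    if len({repr(row) for row in rows}) != len(rows):
        problems.append("duplicate rows")
    return problems


# Hypothetical sample data, invented for illustration.
columns = ["Emp_ID", "Name", "Skills"]
bad_rows = [(100, "Abebe", ["SQL", "Java"]), (140, "Almaz", ["C++"])]
good_rows = [(100, "Abebe", "SQL"), (100, "Abebe", "Java"), (140, "Almaz", "C++")]

print(relation_violations(columns, bad_rows))   # ['non-atomic value']
print(relation_violations(columns, good_rows))  # []
```

The unique-name requirement for the table itself and the irrelevance of row and column order are properties of the surrounding database, so the sketch does not test them.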

Relation …
 A relational database is merely a collection of data,
organized in a particular manner.
 E. F. Codd, regarded as the father of the relational database
approach, defined a series of rules called normal
forms that help define that organization
 Recall that
 One of the best ways to determine what information should be
stored in a database is to clarify what questions will be asked of
it and what data would be included in the answers.

Why normalize
 Data design aims to identify data stored in a system
 Almost certainly stored in a relational database
 Normalization intended to
 Eliminate redundancy
 Organize data efficiently
 Reduce the potential for anomalies

What is normalization
 Decompose a relation into a set of smaller relations
 That achieve the goals stated previously
 A relation is in a specific normal form (NF) if it
 Satisfies requirements of all previous NFs
 Satisfies requirements of the current NF
 We concentrate on first 3 NFs
 Data/Database normalization is a series of steps followed to
obtain a database design that allows for consistent storage and
efficient access of data in a relational database.
 These steps reduce data redundancy and the risk of data
becoming inconsistent.

Cont…
 Formal definition
 NORMALIZATION is the process of identifying the
logical associations between data items and designing a
database that represents those associations without
suffering the update anomalies, which are:
 Insertion Anomalies
 Deletion Anomalies
 Modification Anomalies

Reading Assignment: Read and understand the three kinds of anomalies
Cont…
 Normalization may reduce system performance, since queries must
join data that is spread across many tables.
 Thus denormalization is sometimes used to improve
performance, at the cost of reduced consistency guarantees.
 Taken together, the normalization rules remove the update
anomalies that could otherwise arise when data is manipulated
after implementation

Cont…
 Denormalisation
 Doesn't always make sense for data to be normalised
 Some applications work better with denormalised data
 Usually those that rely on lots of read only operations

Reading assignment: The why, when and how of denormalization
Well-Structured Relations
 A relation that contains minimal data redundancy and allows users to insert,
delete, and update rows without causing data inconsistencies
 As said before the goal is to avoid anomalies
 Insertion Anomaly–adding new rows forces user to create duplicate data
 Deletion Anomaly–deleting rows may cause a loss of data that would be
needed for other future rows
 Modification Anomaly–changing data in a row forces changes to other
rows because of duplication

General rule of thumb: A table should not pertain to more than one entity type (to have a well-structured relation)
Example 2

Question–Is this a relation? Answer–Yes: Unique rows and no multivalued attributes

Question–What’s the primary key? Answer–Composite: Emp_ID, Course_Title

Anomalies in this Table
 Insertion–can’t enter a new employee without having the employee take a class
 Deletion–if we remove employee 140, we lose information about the existence
of a Tax Acc class
 Modification–giving a salary increase to employee 100 forces us to update
multiple records

Why do these anomalies exist?

Because there are multiple themes (entity types) in this one relation. This results in data duplication and an unnecessary dependency between the entities
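
The three anomalies can be reproduced on a small in-memory table. This is a sketch under stated assumptions: the table on the slide is a figure that did not survive in the text, so the rows below are hypothetical. Only employee 100, employee 140 and the Tax Acc class are taken from the slide; names, salaries and the other course titles are invented.

```python
# Hypothetical rows; the key is the composite (Emp_ID, Course_Title).
rows = [
    {"Emp_ID": 100, "Name": "A", "Salary": 48000, "Course_Title": "SPSS"},
    {"Emp_ID": 100, "Name": "A", "Salary": 48000, "Course_Title": "Surveys"},
    {"Emp_ID": 140, "Name": "B", "Salary": 52000, "Course_Title": "Tax Acc"},
]

# Modification anomaly: one salary change must touch every row of employee 100.
touched = [r for r in rows if r["Emp_ID"] == 100]
for r in touched:
    r["Salary"] = 50000
print(len(touched))  # 2 rows updated to record a single fact

# Deletion anomaly: removing employee 140 also removes the only Tax Acc row.
remaining = [r for r in rows if r["Emp_ID"] != 140]
tax_acc_still_known = any(r["Course_Title"] == "Tax Acc" for r in remaining)
print(tax_acc_still_known)  # False

# Insertion anomaly: a new employee without a course lacks part of the key.
new_employee = {"Emp_ID": 190, "Name": "C", "Salary": 45000, "Course_Title": None}
key_is_complete = all(new_employee[k] is not None for k in ("Emp_ID", "Course_Title"))
print(key_is_complete)  # False: the row cannot be stored under this key
```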
Functional Dependencies and Keys
 Functional Dependency: The value of one attribute (the
determinant) determines the value of another attribute
 Candidate Key:
 A unique identifier. One of the candidate keys will become the
primary key
 E.g. perhaps there is both credit card number and SS# in a table…in this
case both are candidate keys
 Each non-key field is functionally dependent on every candidate key

Data Dependency
 The logical associations between data items that point the
database designer in the direction of a good database design
are referred to as determinant or dependent relationships.
 Two data items A and B are said to be in a determinant or
dependent relationship if certain values of data item B always
appear with certain values of data item A.
 If the data item A is the determinant data item and B the
dependent data item then the direction of the association is
from A to B and not vice versa.
 FDs are derived from the real-world constraints on the
attributes

Data Dependency
 The essence of this idea is that if the existence of something,
call it A, implies that B must exist and have a certain value,
then we say that "B is functionally dependent on A."
 We also often express this idea by saying that "A determines
B," or that "B is a function of A," or that "A functionally
governs B."

Data Dependency
 Often, the notions of functionality and functional
dependency are expressed briefly by the statement, "If A,
then B."
 It is important to note that the value B must be unique for a
given value of A, i.e., any given value of A must imply just
one and only one value of B, in order for the relationship to
qualify for the name "function."
 (However, this does not necessarily prevent different values
of A from implying the same value of B.)

Data Dependency
 X → Y holds if whenever two tuples have the same value
for X, they must have the same value for Y
 The notation is: A → B, which is read as: B is functionally
dependent on A
 In general, a functional dependency is a relationship among
attributes.
 In relational databases, we can have a determinant that
governs one other attribute or several other attributes.
 Who will tell us this FD? How do we know?
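
The tuple-based definition of X → Y can be tested directly against sample data. A minimal sketch, assuming rows are stored as dictionaries; the function name fd_holds and the sample rows are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds in the given rows (a list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen and returns the stored value,
        # so a mismatch means two tuples agree on lhs but differ on rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample rows, invented for illustration.
rows = [
    {"A": 1, "B": "x"},
    {"A": 2, "B": "x"},
    {"A": 1, "B": "x"},
]

print(fd_holds(rows, ["A"], ["B"]))  # True: each A value has exactly one B value
print(fd_holds(rows, ["B"], ["A"]))  # False: B = "x" appears with A = 1 and A = 2
```

A check like this can only refute a dependency. A dependency that happens to hold in the current rows is not guaranteed to hold in general, because FDs are derived from real-world constraints rather than from the data.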

Data Dependency…
 Partial Dependency
 If an attribute which is not a member of the primary
key is dependent on some part of the primary key (if
we have composite primary key) then that attribute is
partially functionally dependent on the primary key.
 Let {A,B} be the primary key and C a non-key
attribute.
 Then if {A,B} → C and B → C
 Then C is partially functionally dependent on {A,B}

Data Dependency…
 Full Dependency
 If an attribute which is not a member of the primary key depends
on the whole key rather than on any part of it
(if we have composite primary key) then that attribute is fully
functionally dependent on the primary key.
 Let {A,B} be the primary key and C a non-key attribute
 Then if {A,B} → C holds and neither B → C nor A → C holds
 Then C is fully functionally dependent on {A,B}
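
The distinction between partial and full dependency can be expressed as a small classifier. This is a sketch under stated assumptions: functional dependencies are supplied as (determinant, dependents) pairs of sets, only the FDs listed explicitly are examined, and the attribute D is added here purely for illustration. The function name dependency_kind is hypothetical.

```python
def dependency_kind(primary_key, fds, attribute):
    """Classify a non-key attribute as 'partial' or 'full' for a composite key.

    fds is a list of (determinant, dependents) pairs of attribute-name sets.
    Implied dependencies are not derived; only the listed ones are checked.
    """
    key = frozenset(primary_key)
    for determinant, dependents in fds:
        # A determinant that is a proper subset of the key signals a partial dependency.
        if attribute in dependents and frozenset(determinant) < key:
            return "partial"
    return "full"


# {A,B} is the primary key; C and D are non-key attributes (D is an assumption).
key = {"A", "B"}
fds = [
    ({"A", "B"}, {"C", "D"}),
    ({"B"}, {"C"}),
]

print(dependency_kind(key, fds, "C"))  # partial, because B -> C
print(dependency_kind(key, fds, "D"))  # full, only the whole key determines D
```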

Data Dependency…
 Transitive Dependency
 In mathematics and logic, a transitive relationship is a relationship
of the following form: "If A implies B, and if also B implies C, then
A implies C."
 Example:
 If Mr X is a Human, and if every Human is an Animal, then Mr X must be
an Animal.
 Generalized way of describing transitive dependency is
that:
 If A functionally governs B, AND if B functionally governs C,
THEN A functionally governs C
 Provided that neither C nor B determines A, i.e. (B ↛ A and C
↛ A)
 In the normal notation:
 {(A → B) AND (B → C)} ==> A → C, provided that B ↛ A and C ↛ A
Steps of Normalization
 We have various levels or steps in normalization called
Normal Forms.
 The complexity, the strength of the rule and the degree of
decomposition increase as we move from a lower
Normal Form to a higher one.
 A table in a relational database is said to be in a certain
normal form if it satisfies certain constraints.
 Each successive normal form represents a stronger condition
than the previous one

Steps in normalization- Pictorial representation

First Normal Form (1NF)
 Requires that all column values in a table are atomic (e.g., a
number is an atomic value, while a list or a set is not).
 Solution
 Move each repeating group to a new row, repeating the common
attributes. Then find the key that identifies all the data in a row
 Thus
 No multivalued attributes
 Every attribute value is atomic
 For example Fig. 5-25 is not in 1st Normal Form (multivalued
attributes) → it is not a relation
 While Fig. 5-26 is in 1st Normal form
 Remark
 All relations are in 1st Normal Form
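
Moving a repeating group into new rows can be sketched as follows. The function name to_1nf, the column names and the sample rows are hypothetical; the sketch assumes the multivalued attribute is stored as a Python list.

```python
def to_1nf(rows, multivalued):
    """Expand a multivalued attribute into one row per value."""
    flat = []
    for row in rows:
        for value in row[multivalued]:
            # Repeat the common attributes and replace the list with a single value.
            flat.append({**row, multivalued: value})
    return flat


# Hypothetical sample data, invented for illustration.
rows = [
    {"EmpID": 12, "Name": "Abebe", "Skill": ["SQL", "Prolog"]},
    {"EmpID": 16, "Name": "Lemma", "Skill": ["C++"]},
]

flat = to_1nf(rows, "Skill")
print(len(flat))  # 3

# After flattening, EmpID alone no longer identifies a row; (EmpID, Skill) does.
keys = {(r["EmpID"], r["Skill"]) for r in flat}
print(len(keys) == len(flat))  # True
```

A row whose list is empty produces no output rows in this sketch, so such rows would need separate handling.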

Cont…
 Formal Definition: a table (relation) is in 1NF
 If
 There are no duplicated rows in the table (each row has a unique identifier).
 Each cell is single-valued (i.e., there are no repeating groups).
 Entries in a column (attribute, field) are of the same kind.

Example for First Normal Form (1NF)

Example 2: Table with multivalued attributes, not in 1st normal form

Note: this is NOT a relation

Table with no multivalued attributes and unique rows, in 1st normal form

Note: this is a relation, but not a well-structured one


Second Normal Form (2NF)
 No partial dependency of a non key attribute on part of the
primary key.
 Removing them results in a set of relations that are in Second
Normal Form.
 Any table that is in 1NF and has a single-attribute (i.e., a
non-composite) key is automatically also in 2NF.

Cont…
 Formal Definition: a table (relation) is in 2NF
 If
 It is in 1NF and
 If all non-key attributes are dependent on the entire
primary key. i.e. no partial dependency.

Example 1: for 2NF
EMP_PROJ
EmpID EmpName ProjNo ProjName ProjLoc ProjFund ProjMangID Incentive

EMP_PROJ rearranged
EmpID ProjNo EmpName ProjName ProjLoc ProjFund ProjMangID Incentive

Business rule: Whenever an employee participates in a project, he/she will be entitled to an incentive.

This schema is in 1NF since we don’t have any repeating groups or attributes with multi-valued property.

Cont…
 To convert it to a 2NF we need to remove all partial
dependencies of non key attributes on part of the primary
key.

{EmpID, ProjNo} → EmpName, ProjName, ProjLoc, ProjFund, ProjMangID, Incentive

But in addition to this we have the following dependencies

FD1: {EmpID} → EmpName
FD2: {ProjNo} → ProjName, ProjLoc, ProjFund, ProjMangID
FD3: {EmpID, ProjNo} → Incentive

Cont…
 As we can see, some non key attributes are partially
dependent on some part of the primary key.
 This can be witnessed by analyzing the first two
functional dependencies (FD1 and FD2).
 Thus, each functional dependency, together with its
dependent attributes, should be moved to a new relation
in which the determinant is the primary key.
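
Moving each functional dependency into its own relation amounts to projecting EMP_PROJ onto the attributes of FD1, FD2 and FD3. A minimal sketch, assuming hypothetical sample rows; the helper project and the variable names employee, project_rel and assignment are assumptions chosen for illustration, not names from the slides.

```python
def project(rows, attributes):
    """Relational projection: keep the given attributes and drop duplicate rows."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


columns = ["EmpID", "EmpName", "ProjNo", "ProjName", "ProjLoc",
           "ProjFund", "ProjMangID", "Incentive"]
# Hypothetical sample rows, invented for illustration.
data = [
    (1, "Abebe", 10, "Payroll", "Addis Ababa", 50000, 7, 300),
    (1, "Abebe", 20, "Census", "Adama", 80000, 9, 450),
    (2, "Lemma", 10, "Payroll", "Addis Ababa", 50000, 7, 300),
]
emp_proj = [dict(zip(columns, row)) for row in data]

# FD1: {EmpID} -> EmpName
employee = project(emp_proj, ["EmpID", "EmpName"])
# FD2: {ProjNo} -> ProjName, ProjLoc, ProjFund, ProjMangID
project_rel = project(emp_proj, ["ProjNo", "ProjName", "ProjLoc", "ProjFund", "ProjMangID"])
# FD3: {EmpID, ProjNo} -> Incentive
assignment = project(emp_proj, ["EmpID", "ProjNo", "Incentive"])

print(len(employee), len(project_rel), len(assignment))  # 2 2 3
```

Each employee name and each set of project details is now stored once, so renaming a project touches a single row.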

Cont…

Example 2: Functional dependency diagram for INVOICE

Order_ID → Order_Date, Customer_ID, Customer_Name, Customer_Address
Customer_ID → Customer_Name, Customer_Address
Product_ID → Product_Description, Product_Finish, Unit_Price
Order_ID, Product_ID → Order_Quantity

Therefore, NOT in 2nd Normal Form
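
The conclusion can be checked by computing attribute closures over the four dependencies listed for INVOICE. This is a sketch, assuming the dependencies are encoded as pairs of lists; the function name closure and the variable names are hypothetical.

```python
def closure(attributes, fds):
    """Return every attribute functionally determined by the given attributes."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole determinant is available, its dependents are gained.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    (["Order_ID"], ["Order_Date", "Customer_ID", "Customer_Name", "Customer_Address"]),
    (["Customer_ID"], ["Customer_Name", "Customer_Address"]),
    (["Product_ID"], ["Product_Description", "Product_Finish", "Unit_Price"]),
    (["Order_ID", "Product_ID"], ["Order_Quantity"]),
]
all_attributes = {a for lhs, rhs in fds for a in lhs + rhs}
key = ["Order_ID", "Product_ID"]

print(closure(key, fds) == all_attributes)           # True: the pair is a key
print(closure(["Order_ID"], fds) == all_attributes)  # False: Order_ID alone is not

# Non-key attributes reachable from only one part of the key are partial dependencies.
partial = (closure(["Order_ID"], fds) | closure(["Product_ID"], fds)) - set(key)
print(sorted(partial))
```

Only Order_Quantity is missing from the final list, which is why it is the one attribute that depends on the whole key.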


Removing partial dependencies

Getting it into Second Normal Form

Partial dependencies are removed, but there are still transitive dependencies
Third Normal Form (3NF)
 Eliminate Columns Dependent on another non-Primary
Key - If attributes do not contribute to a description of
the key, remove them to a separate table.
 This level avoids update and delete anomalies.
 Formal Definition: a Table (Relation) is in 3NF
 If
 It is in 2NF and
 There are no transitive dependencies between a
primary key and non-primary key attributes.

Cont…
 2NF PLUS no transitive dependencies (functional dependencies
on non-primary-key attributes)
 Note: This is called transitive, because the primary key is a determinant
for another attribute, which in turn is a determinant for a third
 Solution:
 A non-key determinant with transitive dependencies goes into a new table;
 the non-key determinant becomes the primary key in the new table and stays as
a foreign key in the old table

Removing transitive dependencies

Getting it into Third Normal Form

Transitive dependencies are removed

2nd Example for (3NF)
 Assumption: Students of same batch (same year) live in one
building or dormitory
 Student

StudID   Stud_F_Name  Stud_L_Name  Dept     Year  Dormitary
125/97   Abebe        Mekuria      Info Sc  1     401
654/95   Lemma        Alemu        Geog     3     403
842/95   Chane        Kebede       CompSc   3     403
165/97   Alem         Kebede       InfoSc   1     401
985/95   Almaz        Belay        Geog     3     403

This schema is in its 2NF since the primary key is a single attribute.
Cont…
 StudID → Year AND Year → Dormitary
And
 Year cannot determine StudID and Dormitary cannot
determine StudID
 Then transitively StudID → Dormitary

 To convert it to 3NF we need to remove all transitive dependencies of non-key attributes on another non-key attribute.
 The non-primary-key attributes that depend on each other will be
moved to another table and linked with the main table using a
candidate key / foreign key relationship.
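
The decomposition can be sketched on the Student rows given above. This is an illustration under stated assumptions: the variable names student_3nf and year_dorm are hypothetical, since the slide showing the resulting tables did not survive in the text.

```python
# Student rows from the example: StudID, Stud_F_Name, Stud_L_Name, Dept, Year, Dormitary
student = [
    ("125/97", "Abebe", "Mekuria", "Info Sc", 1, 401),
    ("654/95", "Lemma", "Alemu", "Geog", 3, 403),
    ("842/95", "Chane", "Kebede", "CompSc", 3, 403),
    ("165/97", "Alem", "Kebede", "InfoSc", 1, 401),
    ("985/95", "Almaz", "Belay", "Geog", 3, 403),
]

# Year -> Dormitary moves to its own relation; Year stays behind as a foreign key.
student_3nf = [row[:5] for row in student]                  # StudID ... Year
year_dorm = sorted({(row[4], row[5]) for row in student})   # Year, Dormitary

print(year_dorm)  # [(1, 401), (3, 403)]

# Joining the two relations on Year rebuilds the original rows, so nothing is lost.
dorm_of = dict(year_dorm)
rejoined = [row + (dorm_of[row[4]],) for row in student_3nf]
print(rejoined == student)  # True
```

The dormitory of each batch is now stored once, so moving a batch to another building changes a single row.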

Cont…

Cont…
 Generally, even though there are four additional levels
of normalization, a table is said to be normalized if it reaches
3NF.
 A database with all tables in 3NF is said to be a normalized
database.

 Reading Assignment
 Boyce-Codd Normal Form (BCNF)
 Fourth Normal Form (4NF)
 Fifth Normal Form (5NF)
 Domain-Key Normal Form (DKNF)

Summary of the Process of
Normalization
 To understand normalisation you need to know these
problems
 1NF: Repeating groups
 2NF: Partial Dependency
 3NF: Transitive Dependency and Derived Attributes
 Normalisation is the process of decomposing relations

Cont…
 Some important indicators
 No Repeating or Redundancy: no repeating fields in the table.
 The Fields Depend Upon the Key: the fields should depend solely on
the key.
 The Whole Key: no partial key dependency.
 And Nothing But The Key: no inter data dependency.

Cont…
 Pitfalls of Normalization
 Requires data to see the problems
 May reduce performance of the system
 Is time consuming,
 Difficult to design and apply and
 Prone to human error

End of Chapter Four
