Lec02 - Normalization
Dependencies and Normalization for Relational Databases
Lecture 2
Dr. Marwa Hussien
Outline
1. Informal Design Guidelines for Relational Databases
2. Normalization
3. Functional Dependencies (FDs)
4. Normal Forms Based on Primary Keys
• First Normal Form
• Second Normal Form
• Third Normal Form
• Boyce-Codd Normal Form
Informal Design Guidelines for Relational Databases
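1. Semantics of the Relation Attributes
• Guideline 1: The attributes of a relation must have clear meanings, so that it is easy to explain what each row (tuple) represents.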
Example: A simplified COMPANY relational database schema
Example: A relational database schema violating Guideline 1
Redundant Information in Tuples and Update Anomalies
Example: Two Relations Suffering from Anomalies
Example: Redundant Information in Tuples Causes Anomalies
Note: once the relation is populated, the redundant columns are repeated in every row; with 1000 employees, the values in these columns are written 1000 times.
Example: Redundant Information in Tuples Causes Anomalies
Note: the employee-project relationship is many-to-many, so an employee's name is repeated once for every project that employee works on.
Example: An Insert Anomaly
Example: An Update Anomaly
Example: A Delete Anomaly
2. Redundant Information in Tuples and
Update Anomalies
• Guideline 2: Design a schema that does not suffer from the insertion,
deletion and update anomalies.
• If there are any anomalies present, then note them so that
applications can be made to take them into account.
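A minimal Python sketch of the update and delete anomalies named in Guideline 2. The relation, attribute names, and rows are made up for illustration (they are not taken from the slides); the only assumption is a table in which department data is repeated in every employee row.

```python
# Hypothetical denormalized relation: department data repeated per employee.
emp_dept = [
    {"ssn": "111", "ename": "Ali",  "dnumber": 5, "dname": "Research"},
    {"ssn": "222", "ename": "Sara", "dnumber": 5, "dname": "Research"},
    {"ssn": "333", "ename": "Omar", "dnumber": 4, "dname": "Admin"},
]

def dnames_for(rows, dnumber):
    """Return the set of department names stored for one department number."""
    return {row["dname"] for row in rows if row["dnumber"] == dnumber}

# Update anomaly: renaming department 5 in only one row leaves the data inconsistent.
emp_dept[0]["dname"] = "R&D"
inconsistent = dnames_for(emp_dept, 5)  # two different names for the same department

# Delete anomaly: removing the last employee of department 4 also loses the department.
emp_dept = [row for row in emp_dept if row["ssn"] != "333"]
dept4_known = any(row["dnumber"] == 4 for row in emp_dept)
```

An insert anomaly is the mirror image of the delete case: in this layout a new department cannot be recorded until it has at least one employee row.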
3. Null Values in Tuples
• Reasons for nulls:
• Attribute not applicable or invalid.
• For example, Visa_status may not apply to U.S. students.
• Attribute value unknown (the attribute is optional).
• For example, the Phone_no attribute if some students do not have a mobile phone.
• Value known to exist, but unavailable.
• For example, if only 15% of employees have individual offices, there is little justification
for including an attribute Office_number in the EMPLOYEE relation; rather, a relation
EMP_OFFICES(Essn, Office_number) can be created to include tuples for only the
employees with individual offices.
• GUIDELINE 3: Relations should be designed such that their tuples will
have as few NULL values as possible.
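The EMP_OFFICES idea above can be sketched in Python as follows. The rows are made up, and `None` stands in for NULL; the names `employee_slim` and `count_nulls` are hypothetical helpers introduced only for this example.

```python
# Hypothetical EMPLOYEE rows; None stands for NULL.
employee = [
    {"ssn": "111", "ename": "Ali",  "office_number": 12},
    {"ssn": "222", "ename": "Sara", "office_number": None},
    {"ssn": "333", "ename": "Omar", "office_number": None},
    {"ssn": "444", "ename": "Mona", "office_number": None},
]

def count_nulls(rows):
    """Count attribute values that are NULL (None) across all rows."""
    return sum(value is None for row in rows for value in row.values())

# Keep only the always-present attributes in EMPLOYEE ...
employee_slim = [{"ssn": r["ssn"], "ename": r["ename"]} for r in employee]
# ... and store offices only for the employees who have one.
emp_offices = [
    {"essn": r["ssn"], "office_number": r["office_number"]}
    for r in employee
    if r["office_number"] is not None
]

nulls_before = count_nulls(employee)
nulls_after = count_nulls(employee_slim) + count_nulls(emp_offices)
```

The trade-off is that finding an employee's office now requires a join between the two relations.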
Normalization
• Normalization of data is a process of analyzing the given relation schemas based on their functional dependencies (FDs) and primary keys to achieve the desirable properties of (1) minimizing redundancy and (2) minimizing the insertion, deletion, and update anomalies.
• Normalization is the process of decomposing unsatisfactory ("not well designed") relations by breaking up their attributes into smaller relations.
Functional Dependencies
• Functional dependencies (FDs)
• Used to specify formal measures of the "goodness" of relational designs
• Together with keys, used to define normal forms for relations
• Constraints that are derived from the meaning and interrelationships of the data attributes
• A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y
• Note: a key functionally determines every attribute of its relation
Defining Functional Dependencies
• Note: in a well-designed relation, non-prime attributes should depend on the prime (key) attributes.
• X → Y holds if whenever two tuples have the same value for X, they
must have the same value for Y
• For any two tuples t1 and t2 in any relation instance r(R):
If t1[X]=t2[X], then t1[Y]=t2[Y].
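The tuple-based definition above translates directly into a check over a relation instance. This is a minimal sketch: `fd_holds` is a hypothetical helper name, and the rows are made up for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this relation instance.

    rows: list of dicts (the tuples); lhs, rhs: lists of attribute names.
    """
    seen = {}  # maps each X-value to the Y-value first seen with it
    for t in rows:
        x_val = tuple(t[a] for a in lhs)
        y_val = tuple(t[a] for a in rhs)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # two tuples agree on X but differ on Y
    return True

# Made-up instance r(R) for illustration.
r = [
    {"SSN": "111", "ENAME": "Ali",  "PNUMBER": 1, "HOURS": 10},
    {"SSN": "111", "ENAME": "Ali",  "PNUMBER": 2, "HOURS": 5},
    {"SSN": "222", "ENAME": "Sara", "PNUMBER": 1, "HOURS": 20},
]

ssn_to_ename = fd_holds(r, ["SSN"], ["ENAME"])
ssn_to_hours = fd_holds(r, ["SSN"], ["HOURS"])
```

A single instance can only refute an FD. An FD is a constraint on every legal instance, so a `True` result here shows consistency with the FD, not proof of it.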
Examples of FD constraints
• Social security number determines employee name
• SSN → ENAME (read: SSN determines ENAME, or ENAME is functionally dependent on SSN)
Normal Forms based on Primary Keys
• First Normal Form
• Second Normal Form
• Third Normal Form
• Boyce-Codd Normal Form (a variation of the third normal form)
The normal forms are applied as sequential steps: 1NF, then 2NF, then 3NF, then BCNF.
First Normal Form
• Disallows:
• composite attributes.
• multivalued attributes.
• nested relations; attributes whose values for an individual tuple are non-atomic.
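Normalizing multivalued attributes into 1NF
• Solution: move the multivalued attribute into a new table. The primary key of the first table appears in the new table both as a foreign key and as part of the new table's primary key.
• Packing several values into one cell (for example, several grades stored in a single grade field next to ssn) does not work, because the DBMS treats the cell's content as one value.
• With one value per row in the new table, the design is in 1NF.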
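The decomposition described above can be sketched in Python. The STUDENT relation, its rows, and the helper name `to_1nf` are assumptions made for illustration; only the ssn and grade attribute names echo the slide note.

```python
# Hypothetical non-1NF relation: 'grades' holds several values in one cell.
student = [
    {"ssn": "111", "name": "Ali",  "grades": ["A", "B"]},
    {"ssn": "222", "name": "Sara", "grades": ["C"]},
]

def to_1nf(rows, key, multivalued):
    """Split a multivalued attribute into its own relation.

    Returns (base, detail): base drops the multivalued attribute; detail has
    one row per value, with primary key (key, multivalued) and key acting as
    a foreign key to base.
    """
    base = [{a: v for a, v in row.items() if a != multivalued} for row in rows]
    detail = [
        {key: row[key], multivalued: value}
        for row in rows
        for value in row[multivalued]
    ]
    return base, detail

student_1nf, student_grades = to_1nf(student, "ssn", "grades")
```

Every cell in both resulting relations now holds a single atomic value.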
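Normalizing nested relations into 1NF
• Solution: the same as for multivalued attributes. Move the nested relation into a new table that carries the primary key of the first table.
• Note: no composite key.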
Second Normal Form
• Problem addressed: partial dependency, where a non-prime attribute depends on only part of a composite key.
• Definitions:
• Prime attribute: An attribute that is a member of the primary key K.
• Full functional dependency: an FD Y → Z where removal of any attribute from Y means the FD does not hold any more.
• Examples:
• {SSN, PNUMBER} → HOURS is a full FD since neither SSN → HOURS nor PNUMBER → HOURS holds.
• {SSN, PNUMBER} → ENAME is not a full FD (it is called a partial dependency) since SSN → ENAME also holds.
• A relation schema R is in second normal form (2NF) if every non-prime
attribute A in R is fully functionally dependent on the primary key.
• R can be decomposed into 2NF relations via the process of 2NF
normalization or “second normalization”.
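A minimal sketch of how the 2NF definition can be tested from a set of FDs, using the SSN/PNUMBER example from this slide. The helper names `closure` and `partial_dependencies` are hypothetical; the approach (attribute closure over proper subsets of the primary key) is one common way to find partial dependencies, not necessarily the method intended by the lecture.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(attributes, primary_key, fds):
    """List (subset_of_key, attribute) pairs that violate 2NF."""
    non_prime = set(attributes) - set(primary_key)
    violations = []
    # Try every non-empty proper subset of the primary key.
    for size in range(1, len(primary_key)):
        for subset in combinations(sorted(primary_key), size):
            determined = closure(subset, fds) & non_prime
            violations += [(subset, a) for a in sorted(determined)]
    return violations

fds = [
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
    (frozenset({"SSN"}), frozenset({"ENAME"})),
]
found = partial_dependencies(
    ["SSN", "PNUMBER", "HOURS", "ENAME"], ["SSN", "PNUMBER"], fds
)
```

Here `found` reports that ENAME depends on SSN alone, which is the partial dependency named in the example; HOURS is not reported because it needs the full key.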
Second Normal Form
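Normalizing into 2NF
• Solution: separate the partial dependencies into different tables (this situation arises in many-to-many relations).
• Example: (std_no, course_no, mark, course_name, std_name), with primary key {std_no, course_no}:
• mark depends on the whole key: OK.
• course_name depends only on course_no: partial dependency.
• std_name depends only on std_no: partial dependency.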
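The student/course example can be decomposed by projecting each partial dependency into its own table. This is a sketch with made-up rows; `project` is a hypothetical helper that mimics relational projection with duplicate elimination.

```python
def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples, keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out

# Made-up rows; the primary key is (std_no, course_no).
enrollment = [
    {"std_no": 1, "course_no": "DB", "mark": 90,
     "course_name": "Databases", "std_name": "Ali"},
    {"std_no": 1, "course_no": "OS", "mark": 75,
     "course_name": "Operating Systems", "std_name": "Ali"},
    {"std_no": 2, "course_no": "DB", "mark": 82,
     "course_name": "Databases", "std_name": "Sara"},
]

marks = project(enrollment, ["std_no", "course_no", "mark"])  # full dependency
courses = project(enrollment, ["course_no", "course_name"])   # course_no -> course_name
students = project(enrollment, ["std_no", "std_name"])        # std_no -> std_name
```

After the split, each course name and each student name is stored once instead of once per enrollment.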
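Third Normal Form
• Problem addressed: transitive dependency, where a non-prime attribute depends on another non-prime attribute (X → Y and Y → Z, with X the primary key).
• Solution: separate Y and Z into a new table. Y remains in the first table as a foreign key and becomes the primary key of the new table.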
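A small Python sketch of this 3NF decomposition. The relation and its rows are hypothetical: ssn plays the role of X, dnumber of Y, and dname of Z.

```python
# Hypothetical relation with X = ssn (primary key), Y = dnumber, Z = dname.
# ssn -> dnumber and dnumber -> dname, so dname depends transitively on ssn.
emp_dept = [
    ("111", "Ali",  5, "Research"),
    ("222", "Sara", 5, "Research"),
    ("333", "Omar", 4, "Admin"),
]  # columns: ssn, ename, dnumber, dname

# First table keeps Y (dnumber) as a foreign key.
emp = sorted({(ssn, ename, dnumber) for ssn, ename, dnumber, _ in emp_dept})
# New table holds Y and Z, with Y (dnumber) as its primary key.
dept = dict(sorted({(dnumber, dname) for _, _, dnumber, dname in emp_dept}))

# Joining on dnumber rebuilds the original rows, so no information is lost.
rejoined = [(ssn, ename, dnumber, dept[dnumber]) for ssn, ename, dnumber in emp]
```

Storing `dept` as a dictionary keyed by dnumber mirrors the fact that Y is the primary key of the new table: each department number maps to exactly one name.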
Normalizing into 3NF
Summary of Normal Forms Based on Primary Keys and Corresponding Normalization
First (1NF)
• Rule: Relation should have no multivalued attributes or nested relations.
• Normalization: Form new relations for each multivalued attribute or nested relation.
Second (2NF)
• Rule: For relations where the primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key.
• Normalization: Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.
Third (3NF)
• Rule: Relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key.
• Normalization: Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).
General Normal Form Definitions (For Multiple Keys)
General Definition of 2NF (For Multiple Candidate Keys)
General Definition of Third Normal Form
• Definition: Superkey of relation schema R - a set of attributes S of R that contains a key of R.
• A relation schema R is in third normal form (3NF) if whenever an FD X → A holds in R, then either:
• (a) X is a superkey of R, or
• (b) A is a prime attribute of R
• The LOTS1 relation violates 3NF because Area → Price, and Area is not a superkey in LOTS1.
Third Normal Form
• NOTE:
• In X → Y and Y → Z, with X as the primary key, we consider this a problem only if Y is not a candidate key.
• When Y is a candidate key, there is no problem with the transitive dependency.
• E.g., consider EMP (SSN, Emp#, Salary).
• Here, SSN → Emp# → Salary and Emp# is a candidate key.
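The EMP example can be checked against the general 3NF definition (X is a superkey, or A is prime) with a short Python sketch. The helper names are hypothetical, the candidate-key search is brute force and only suitable for tiny schemas, and the check looks only at the FDs listed. The second FD set, where Emp# no longer determines SSN, is an assumed contrast case and not part of the slides.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute lists."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute-force search for minimal attribute sets whose closure is all of R."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys

def violations_3nf(attributes, fds):
    """Return (X, A) pairs where X is not a superkey and A is not prime."""
    prime = {a for key in candidate_keys(attributes, fds) for a in key}
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(attributes):
            continue  # condition (a): X is a superkey
        for a in rhs:
            if a not in prime and a not in lhs:  # condition (b) fails; trivial FDs skipped
                bad.append((tuple(lhs), a))
    return bad

EMP = ["SSN", "Emp#", "Salary"]
# Emp# is a candidate key, so it also determines SSN.
with_candidate_key = [(["SSN"], ["Emp#"]), (["Emp#"], ["Salary"]), (["Emp#"], ["SSN"])]
# Contrast case (an assumption for illustration): Emp# does not determine SSN.
without_candidate_key = [(["SSN"], ["Emp#"]), (["Emp#"], ["Salary"])]

ok = violations_3nf(EMP, with_candidate_key)
broken = violations_3nf(EMP, without_candidate_key)
```

With Emp# as a candidate key, no FD violates 3NF; in the contrast case, Emp# → Salary is reported because Emp# is not a superkey and Salary is not prime.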