
Lec02 - Normalization

The document outlines the basics of functional dependencies and normalization for relational databases, covering informal design guidelines, normalization processes, and various normal forms. It emphasizes the importance of minimizing redundancy and preventing anomalies such as insertion, deletion, and update issues. Key concepts include functional dependencies, the definitions of normal forms (1NF, 2NF, 3NF, and Boyce-Codd Normal Form), and guidelines for designing effective relational schemas.

Uploaded by

Amira Gohar

Basics of Functional Dependencies and Normalization for Relational Databases
Lecture 2
Dr. Marwa Hussien

1
Outline
1. Informal Design Guidelines for Relational Databases
2. Normalization
3. Functional Dependencies (FDs)
4. Normal Forms Based on Primary Keys
• First Normal Form
• Second Normal Form
• Third Normal Form
• Boyce-Codd Normal Form

2
Informal Design Guidelines for Relational Databases

1. Semantics of the Relation Attributes
2. Redundant Information in Tuples and Update Anomalies
3. Null Values in Tuples

3
1. Semantics of the Relation Attributes: attribute meanings must be clear

• Guideline 1: Informally, each tuple (row) in a relation should represent one entity.
• Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.
• Only foreign keys should be used to refer to other entities.
• Bottom Line: Design a schema that can be explained easily relation by relation. The semantics of attributes should be easy to interpret.

4
Example: A simplified COMPANY relational database schema

5
Example: A relational database schema violating Guideline 1

6
Redundant Information in Tuples and Update Anomalies

• Storing information redundantly:
  • Wastes storage
  • Causes anomaly problems:
    • Insertion anomalies
    • Deletion anomalies
    • Modification (update) anomalies
• The only acceptable redundancy is the repetition of foreign key values.

7
Example: Two Relations Suffering from Anomalies

8
Example: Redundant Information in Tuples Causes Anomalies
Note: after populating the data, the same values are repeated in these columns; with 1000 employees they are written 1000 times.

9
Example: Redundant Information in Tuples Causes Anomalies
Note: the relationship is many-to-many, so an employee's name is repeated once for every project the employee works on.

10
Example: An Insert Anomaly

• Consider the relation:
  • EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
• Insert Anomaly:
  • Cannot insert a project unless an employee is assigned to it.
  • Conversely, cannot insert an employee unless he/she is assigned to a project.
• Reason: Emp# and Proj# together form the primary key, and a primary key attribute cannot be null. A new employee therefore needs a project number, and a new project needs an employee number.

11
Example: An Update Anomaly

• Consider the relation:
  • EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
• Update Anomaly:
  • Changing the name of project number P1 from “ProductX” to “Customer-Accounting” may cause this update to be made for all 100 employees working on project P1.
• Note: if any of these rows is missed, the same Proj# appears with two different Pname values, which breaks the functional dependency Proj# → Pname.
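
The update anomaly can be sketched in Python. The sample rows, the helper `rename_project`, and the separate `project` lookup are hypothetical illustrations, not part of the lecture; they only show how many rows one logical change touches in each design.

```python
# Hypothetical EMP_PROJ instance: 100 employees all work on project P1.
emp_proj = [
    {"emp": f"E{i}", "proj": "P1", "ename": f"Employee {i}",
     "pname": "ProductX", "hours": 10}
    for i in range(1, 101)
]


def rename_project(rows, proj, new_name):
    """Rename a project in the unnormalized table and count the rows touched."""
    touched = 0
    for row in rows:
        if row["proj"] == proj:
            row["pname"] = new_name
            touched += 1
    return touched


touched = rename_project(emp_proj, "P1", "Customer-Accounting")
print(touched)  # 100 rows had to change for one logical update

# In a decomposed design (assumed here) the name is stored once per project,
# so the same rename changes a single entry.
project = {"P1": "ProductX"}
project["P1"] = "Customer-Accounting"
```

If the loop stopped early, some rows would still say "ProductX", which is exactly the inconsistency the anomaly describes.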

12
Example: A Delete Anomaly

• Consider the relation:
  • EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
• Delete Anomaly:
  • When a project is deleted, it will result in deleting all the employees who work on that project.
  • Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.
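
A minimal sketch of the delete anomaly, assuming a tiny hypothetical EMP_PROJ instance (the employee and project names are invented for illustration). Deleting by project or by employee removes whole rows, so facts about the other entity can disappear with them.

```python
# Hypothetical EMP_PROJ rows: (emp, proj, ename, pname, hours)
emp_proj = [
    ("E1", "P1", "Smith", "ProductX", 20),
    ("E2", "P1", "Wong", "ProductX", 15),
    ("E2", "P2", "Wong", "ProductY", 10),
]


def delete_project(rows, proj):
    """Remove every row that mentions the given project."""
    return [r for r in rows if r[1] != proj]


def delete_employee(rows, emp):
    """Remove every row that mentions the given employee."""
    return [r for r in rows if r[0] != emp]


def known_employees(rows):
    return {r[0] for r in rows}


def known_projects(rows):
    return {r[1] for r in rows}


# Deleting project P1 also erases employee E1, who worked only on P1.
after_proj = delete_project(emp_proj, "P1")
lost_employees = known_employees(emp_proj) - known_employees(after_proj)
print(lost_employees)  # {'E1'}

# Deleting employee E2 also erases project P2, whose sole employee was E2.
after_emp = delete_employee(emp_proj, "E2")
lost_projects = known_projects(emp_proj) - known_projects(after_emp)
print(lost_projects)  # {'P2'}
```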

13
2. Redundant Information in Tuples and Update Anomalies

• Guideline 2: Design a schema that does not suffer from the insertion, deletion and update anomalies.
• If there are any anomalies present, then note them so that applications can be made to take them into account.

14
3. Null Values in Tuples
• Reasons for nulls:
  • Attribute not applicable or invalid.
    • For example, Visa_status may not apply to U.S. students.
  • Attribute value unknown (the attribute is optional).
    • For example, the Phone_no attribute, if some students do not have a mobile phone.
  • Value known to exist, but unavailable.
• Attributes that are NULL for most tuples can be moved to a separate relation. For example, if only 15% of employees have individual offices, there is little justification for including an attribute Office_number in the EMPLOYEE relation; rather, a relation EMP_OFFICES(Essn, Office_number) can be created to include tuples for only the employees with individual offices.
• GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible.

15
Normalization
• Normalization of data is a process of analyzing the given relation schemas based on their FDs (functional dependencies) and primary keys to achieve the desirable properties of (1) minimizing redundancy and (2) minimizing the insertion, deletion, and update anomalies.
• Normalization is the process of decomposing unsatisfactory (“not well designed”) relations by breaking up their attributes into smaller relations.

16
Functional Dependencies
• Functional dependencies (FDs)
  • Used to specify formal measures of the "goodness" of relational designs
  • FDs and keys are used to define normal forms for relations
  • Constraints that are derived from the meaning and interrelationships of the data attributes
• A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y (for example, a key determines every other attribute of its relation)

17
Defining Functional Dependencies

• X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y.
• For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].
• Note: in a well-designed relation, non-prime attributes should depend on the prime (key) attributes.
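
The definition can be checked mechanically against a relation instance. The sketch below is a minimal illustration: the function name `fd_holds` and the sample rows are assumptions, and a passing check only shows that this particular instance does not violate the FD. An FD is a constraint on the schema, so data can refute it but never prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # maps each lhs value to the rhs value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value, different Y value
    return True


# Hypothetical instance with attributes SSN, PNUMBER, ENAME, HOURS.
works_on = [
    {"SSN": "111", "PNUMBER": 1, "ENAME": "Smith", "HOURS": 20},
    {"SSN": "111", "PNUMBER": 2, "ENAME": "Smith", "HOURS": 10},
    {"SSN": "222", "PNUMBER": 1, "ENAME": "Wong", "HOURS": 15},
]

print(fd_holds(works_on, ["SSN"], ["ENAME"]))             # True
print(fd_holds(works_on, ["SSN"], ["HOURS"]))             # False
print(fd_holds(works_on, ["SSN", "PNUMBER"], ["HOURS"]))  # True
```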

18
Examples of FD constraints
• Social security number determines employee name
  • SSN → ENAME (read: SSN determines ENAME, or ENAME is functionally dependent on SSN)
• Project number determines project name and location
  • PNUMBER → {PNAME, PLOCATION}
• Employee SSN and project number determine the hours per week that the employee works on the project
  • {SSN, PNUMBER} → HOURS

19
Normal Forms based on Primary Keys
• First Normal Form
• Second Normal Form
• Third Normal Form
• Boyce-Codd Normal Form (a variation of the third normal form)
• Fourth Normal Form
• Fifth Normal Form

20
Note: the normal forms are applied as sequential steps.

21
First Normal Form
• Disallows:
  • composite attributes.
  • multivalued attributes.
  • nested relations; attributes whose values for an individual tuple are non-atomic.

22
Normalizing multivalued attributes into 1NF

Solution: move the multivalued attribute into a new table. The primary key of the first table appears in the new table both as a foreign key and as part of the primary key.

Not in 1NF (grade is multivalued, e.g. {PhD, BSc}, and the DBMS would treat the whole set as one value):
  ssn_ | ename | grade

In 1NF (a trailing underscore marks a key attribute):
  ssn_ | ename
  ssn_ | grade_
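
A minimal sketch of this 1NF step in Python. The helper `to_1nf` and the sample employees are hypothetical; the point is only that each value of the multivalued attribute becomes its own row, keyed by the original key plus the value.

```python
# Hypothetical non-1NF rows: 'grade' holds a list of values per employee.
employee = [
    {"ssn": "111", "ename": "Smith", "grade": ["PhD", "BSc"]},
    {"ssn": "222", "ename": "Wong", "grade": ["BSc"]},
]


def to_1nf(rows, key, multivalued):
    """Split one multivalued attribute out into its own relation."""
    # Base relation: every attribute except the multivalued one.
    base = [{k: v for k, v in r.items() if k != multivalued} for r in rows]
    # New relation: one row per (key, single value) pair.
    detail = [{key: r[key], multivalued: v}
              for r in rows for v in r[multivalued]]
    return base, detail


emp, emp_grade = to_1nf(employee, "ssn", "grade")
print(emp)
print(emp_grade)
```

In `emp_grade` the pair (ssn, grade) acts as the primary key, and ssn alone is a foreign key back to `emp`.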

23
Normalizing nested relations into 1NF
Solution: the same as for multivalued attributes.

no composite key

24
Second Normal Form

• 2NF addresses partial dependencies, where an attribute depends on only part of a composite key.
• Definitions:
  • Prime attribute: an attribute that is a member of the primary key K.
  • Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
• Examples:
  • {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
  • {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds.
• A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.
• R can be decomposed into 2NF relations via the process of 2NF normalization or “second normalization”.
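
The full versus partial distinction can be sketched as a check on a relation instance. The function names and sample rows below are assumptions for illustration; as with any instance-based check, the result describes this data only, not the schema-level constraint.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds and stops holding when any lhs attribute is removed."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for a in lhs:
        reduced = [b for b in lhs if b != a]
        if fd_holds(rows, reduced, rhs):
            return False  # a smaller left-hand side suffices: partial dependency
    return True


# Hypothetical instance with attributes SSN, PNUMBER, ENAME, HOURS.
works_on = [
    {"SSN": "111", "PNUMBER": 1, "ENAME": "Smith", "HOURS": 20},
    {"SSN": "111", "PNUMBER": 2, "ENAME": "Smith", "HOURS": 10},
    {"SSN": "222", "PNUMBER": 1, "ENAME": "Wong", "HOURS": 15},
]

print(is_full_fd(works_on, ["SSN", "PNUMBER"], ["HOURS"]))  # True
print(is_full_fd(works_on, ["SSN", "PNUMBER"], ["ENAME"]))  # False (partial)
```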

25
Second Normal Form

• Disallow partial dependencies (must be in 1NF).
• A relational schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.
• R can be decomposed into 2NF relations via the process of 2NF normalization.

26
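Normalizing into 2NF

Solution: separate the partial dependencies into different tables (the original relation represents a many-to-many relationship).

Not in 2NF (a trailing underscore marks a key attribute):
  stdno_ | crsno_ | mark | crsname | stdname
  • {stdno, crsno} → mark: full dependency (OK)
  • crsno → crsname: partial dependency
  • stdno → stdname: partial dependency

In 2NF:
  stdno_ | crsno_ | mark
  crsno_ | crsname
  stdno_ | stdname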
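
The 2NF decomposition above can be sketched as three projections. The helper `project` and the sample enrolment rows are hypothetical; the final join is included only to suggest that, for this data, no information is lost by splitting the table.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (order preserved)."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


# Hypothetical instance of the unnormalized relation.
enrolment = [
    {"stdno": 1, "crsno": "C1", "mark": 90, "crsname": "Databases", "stdname": "Amal"},
    {"stdno": 1, "crsno": "C2", "mark": 80, "crsname": "Networks", "stdname": "Amal"},
    {"stdno": 2, "crsno": "C1", "mark": 70, "crsname": "Databases", "stdname": "Omar"},
]

marks = project(enrolment, ["stdno", "crsno", "mark"])   # full dependency stays
courses = project(enrolment, ["crsno", "crsname"])       # partial dependency
students = project(enrolment, ["stdno", "stdname"])      # partial dependency

# Joining the three relations on their keys rebuilds the original rows.
rejoined = [
    {**m, **c, **s}
    for m in marks
    for c in courses if c["crsno"] == m["crsno"]
    for s in students if s["stdno"] == m["stdno"]
]
print(len(marks), len(courses), len(students))  # 3 2 2
print(rejoined == enrolment)                    # True
```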


27
Third Normal Form
• A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
• R can be decomposed into 3NF relations via the process of 3NF normalization.

28
Third Normal Form

• Disallow transitive dependencies: X determines Y and Y determines Z, so a non-prime attribute depends on another non-prime attribute.
• Solution: move Y and Z into a new table; Y is the primary key of the new table and remains in the first table as a foreign key.
• Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
• Examples:
  • SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
  • SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
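
One way to look for a transitive dependency is to compute attribute closures over a set of FDs. The sketch below is an illustration under stated assumptions: the helpers `closure` and `transitive_via` are hypothetical, the FD set is limited to the dependencies named in the examples, and only single-attribute intermediates that do not determine the key are considered.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def transitive_via(key, target, fds):
    """List attributes Y (not key, not target) with key -> Y and Y -> target."""
    intermediates = []
    for y in sorted(closure([key], fds) - {key, target}):
        y_closure = closure([y], fds)
        # Y must determine the target without determining the key itself.
        if target in y_closure and key not in y_closure:
            intermediates.append(y)
    return intermediates


# FDs taken from the examples: SSN -> {ENAME, DNUMBER}, DNUMBER -> DMGRSSN.
fds = [
    (["SSN"], ["ENAME", "DNUMBER"]),
    (["DNUMBER"], ["DMGRSSN"]),
]

print(transitive_via("SSN", "DMGRSSN", fds))  # ['DNUMBER']
print(transitive_via("SSN", "ENAME", fds))    # []
```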

Exercise: the following relation is in 1NF (a trailing underscore marks a key attribute); normalize it further.
  stID_ | stname | add | major | courseid_ | c-title | instruct.name | ins.loc | grade

29
Normalizing into 3NF

30
Summary of Normal Forms Based on Primary Keys and Corresponding Normalization

Normal Form | Rule | Normalization
First (1NF) | Relation should have no multivalued attributes or nested relations. | Form new relations for each multivalued attribute or nested relation.
Second (2NF) | For relations where the primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key. | Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.
Third (3NF) | Relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key. | Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).

31
General Normal Form Definitions (For Multiple Keys)

• The above definitions consider the primary key only.
• The following more general definitions take into account relations with multiple candidate keys.
• Any attribute involved in a candidate key is a prime attribute.
• All other attributes are called non-prime attributes.

32
General Definition of 2NF (For Multiple Candidate Keys)

• A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R.
• The FD County_name → Tax_rate violates 2NF (a partial dependency: Tax_rate depends on only part of a candidate key).
• So, second normalization converts LOTS into:
  LOTS1(Property_id#, County_name, Lot#, Area, Price)
  LOTS2(County_name, Tax_rate)
• The dependency Area → Price is a transitive dependency; it is addressed by 3NF.

33
General Definition of Third Normal Form
• Definition: a superkey of relation schema R is a set of attributes S of R that contains a key of R.
• A relation schema R is in third normal form (3NF) if whenever an FD X → A holds in R, then either:
  • (a) X is a superkey of R, or
  • (b) A is a prime attribute of R
• The LOTS1 relation violates 3NF because Area → Price holds, Area is not a superkey of LOTS1, and Price is not a prime attribute.
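
The general 3NF test can be sketched directly from conditions (a) and (b). Everything below is an assumption-laden illustration: the helper names are hypothetical, `#` is dropped from attribute names, and the candidate keys of LOTS1 are assumed to be {Property_id} and {County_name, Lot}, which the slide does not state explicitly.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations_3nf(attributes, fds, candidate_keys):
    """Return (X, A) pairs where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys)
    found = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attributes)
        for a in rhs:
            if a in lhs:
                continue  # trivial dependency, always allowed
            if not is_superkey and a not in prime:
                found.append((tuple(lhs), a))
    return found


lots1 = ["Property_id", "County_name", "Lot", "Area", "Price"]
keys = [["Property_id"], ["County_name", "Lot"]]  # assumed candidate keys
fds = [
    (["Property_id"], ["County_name", "Lot", "Area", "Price"]),
    (["County_name", "Lot"], ["Property_id", "Area", "Price"]),
    (["Area"], ["Price"]),
]

print(violations_3nf(lots1, fds, keys))  # [(('Area',), 'Price')]
```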

34
Third Normal Form
• NOTE:
  • In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key.
  • When Y is a candidate key, there is no problem with the transitive dependency.
  • E.g., consider EMP(SSN, Emp#, Salary).
    • Here, SSN -> Emp# -> Salary and Emp# is a candidate key.

35
