Unit 3 Normalization

The document provides an overview of the relational model in database management systems, detailing concepts such as tables, attributes, tuples, and relational schemas. It explains key constraints, integrity rules, and operations like selection, projection, and joins, emphasizing the importance of domain constraints and referential integrity. Additionally, it introduces relational algebra and various operators used for data retrieval and manipulation.

Uploaded by

mayankgrover846
Copyright
© © All Rights Reserved

Data Base Management System

Unit -3
Relational Model
 Main idea:
 Table: relation
 Column header: attribute
 Row: tuple

 Relational schema: name(attributes)


 Example: employee(ssno,name,salary)
 Attributes:
 Each attribute has a domain – domain constraint
 Each attribute is atomic: we cannot refer to or directly see
a subpart of the value.
Relation Example

Account
AccountId  CustomerId  Balance
150        20          11,000
160        23          2,300
180        23          32,000

Customer
Id  Name  Addr
20  Tom   Irvine
23  Jane  LA
32  Jack  Riverside
• Database schema consists of
– a set of relation schema
– Account(AccountId, CustomerId, Balance)
– Customer(Id, Name, Addr)
– a set of constraints over the relation schema
– AccountId and CustomerId must be integers
– Name and Addr must be strings of characters
– CustomerId in Account must be one of the Ids in Customer
– etc.
NULL value

Customer(Id, Name, Addr)


Id Name Addr
20 Tom Irvine
23 Jane LA
32 Jack NULL

• Attributes can take a special value: NULL
– NULL means the value is either not known (we don’t know Jack’s address) or not applicable.
Domain Constraints
 Every attribute has a type:
 integer, float, date, boolean, string, etc.
 An attribute can have a domain. E.g.:
 Id > 0
 Salary > 0
 age < 100
 City in {Irvine, LA, Riverside}
• An insertion can violate a domain constraint.
• The DBMS checks whether an insertion violates a domain constraint and, if so, rejects the insertion.

Id (Integer)  Name (String)  City (String)
20            Tom            Irvine
23            Jane           San Diego      <- violation: City not in {Irvine, LA, Riverside}
-2            Jack           Riverside      <- violation: Id > 0
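
The check a DBMS performs on insertion can be sketched in a few lines of Python. This is only an illustration, not how any particular DBMS implements it; the names `DOMAINS`, `violations`, and `insert` are hypothetical, and the rules simply mirror the Id > 0 and City examples above.

```python
# Hypothetical domain rules mirroring the slide: Id > 0, City from a fixed set.
ALLOWED_CITIES = {"Irvine", "LA", "Riverside"}

DOMAINS = {
    "Id": lambda v: isinstance(v, int) and v > 0,
    "Name": lambda v: isinstance(v, str),
    "City": lambda v: v in ALLOWED_CITIES,
}

def violations(row):
    """Return the names of attributes whose value falls outside its domain."""
    return [attr for attr, ok in DOMAINS.items() if not ok(row[attr])]

def insert(table, row):
    """Append row to table only if it satisfies every domain constraint."""
    bad = violations(row)
    if bad:
        raise ValueError(f"domain constraint violated on: {bad}")
    table.append(row)

customers = []
insert(customers, {"Id": 20, "Name": "Tom", "City": "Irvine"})
print(violations({"Id": 23, "Name": "Jane", "City": "San Diego"}))  # ['City']
print(violations({"Id": -2, "Name": "Jack", "City": "Riverside"}))  # ['Id']
```

Calling `insert` with either of the last two rows raises `ValueError`, which plays the role of the DBMS rejecting the insertion.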
Key Constraints

• Superkey: a set of one or more attributes (columns) that uniquely identifies a row (tuple) in a table. No two rows in the table can have the same values for a superkey. Every candidate key is a superkey, but not every superkey is a candidate key.
• Any superset of a superkey is also a superkey; for example, any superset of {LogId} in the Log relation below.
• There can be multiple superkeys.

Log(LogId, AccountId, Xact#, Time, Amount)

LogID  AccountID  Xact#  Time      Amount
1001   111        4      1/12/02   $100
1001   122        4      12/28/01  $20      <- illegal: LogID 1001 appears twice
1003   333        6      9/1/00    $60
Example of Super Key
Table: Employee
Possible Super Keys:
1. {Emp_ID} (unique by itself)
2. {Emp_ID, Name} (contains an extra attribute)
3. {Email} (each email is unique)
4. {Phone} (each phone number is unique)
5. {Emp_ID, Email, Phone, Dept_ID} (still unique but redundant)
Minimal Super Keys like {Emp_ID}, {Email}, {Phone} are Candidate Keys.
Redundant Super Keys contain extra attributes and are not minimal.
Emp_ID Name Email Phone Dept_ID
101 Alice [email protected] 9876543210 HR
102 Bob [email protected] 9876543211 IT
103 Charlie [email protected] 9876543212 HR
Keys

 Key:
 Minimal superkey (no proper subset is a superkey)
 If more than one key: choose one as a primary key
 Example:
 Key 1: LogID (primary key)
 Key 2: AccountId, Xact#
 Superkeys: all supersets of the keys

Log(LogId, AccountId, Xact#, Time, Amount)

LogID  AccountID  Xact#  Time      Amount
1001   111        4      1/12/02   $100
1002   122        4      12/28/01  $20
1003   333        6      9/1/00    $60
OK: no two tuples agree on LogID, and no two agree on (AccountId, Xact#).
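
As a rough illustration, the uniqueness requirement behind a superkey can be tested against a table instance. The helper `is_superkey` is a hypothetical name; note that a key is a constraint on every legal instance of the schema, so passing this check on one instance shows only that the instance does not violate the key.

```python
def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

COLUMNS = ("LogID", "AccountID", "Xact#", "Time", "Amount")
legal = [dict(zip(COLUMNS, r)) for r in [
    (1001, 111, 4, "1/12/02", 100),
    (1002, 122, 4, "12/28/01", 20),
    (1003, 333, 6, "9/1/00", 60),
]]
# Same data, but the second tuple reuses LogID 1001 (the illegal instance).
illegal = [dict(row) for row in legal]
illegal[1]["LogID"] = 1001

print(is_superkey(legal, ["LogID"]))               # True
print(is_superkey(legal, ["AccountID", "Xact#"]))  # True
print(is_superkey(legal, ["Xact#"]))               # False
print(is_superkey(illegal, ["LogID"]))             # False
```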
Integrity Rules

There are two Integrity Rules that every relation should follow :
1. Entity Integrity (Rule 1)
2. Referential Integrity (Rule 2)

Entity Integrity states that:

If attribute A of a relation R is part of the primary key of R, then A cannot accept NULL values, and no two tuples of R may have the same primary key value.
Referential Integrity Constraints
• Given two relations R and S, where R has a primary key X (a set of attributes).
• A set of attributes Y is a foreign key of S if:
– the attributes in Y have the same domains as the attributes in X, and
– for every tuple s in S, either s[Y] is entirely NULL or there exists a tuple r in R with s[Y] = r[X].
• A referential integrity constraint from attributes Y of S to R means that Y is a foreign key that refers to the primary key of R.
• A foreign key value must either equal the primary key value of some tuple in R or be entirely NULL.

[Figure: tuple s in S, whose foreign key Y refers to the primary key X of tuple r in R]
Examples of Referential Integrity

Account
AccountId  CustomerId  Balance
150        20          11,000
160        23          2,300
180        23          32,000

Customer
Id  Name  Addr
20  Tom   Irvine
23  Jane  LA
32  Jack  Riverside

Account.customerId to Customer.Id

Student
Id    Name   Dept
1111  Mike   ICS
2222  Harry  CE
3333  Ford   ICS

Dept
Name  Chair
ICS   Tom
CE    Jane
MATH  Jack

Student.dept to Dept.name: every value of Student.dept must also be a value of Dept.name.
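
A minimal sketch of the Student.dept to Dept.name check follows. The function name `dangling_references` and the extra student row (Id 4444) are hypothetical additions for illustration; Python's `None` stands in for NULL.

```python
def dangling_references(child_rows, fk, parent_rows, pk):
    """Return child rows whose non-NULL foreign key matches no parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows
            if row[fk] is not None and row[fk] not in parent_keys]

dept = [
    {"Name": "ICS", "Chair": "Tom"},
    {"Name": "CE", "Chair": "Jane"},
    {"Name": "MATH", "Chair": "Jack"},
]
student = [
    {"Id": 1111, "Name": "Mike", "Dept": "ICS"},
    {"Id": 2222, "Name": "Harry", "Dept": "CE"},
    {"Id": 3333, "Name": "Ford", "Dept": "ICS"},
]

print(dangling_references(student, "Dept", dept, "Name"))  # []

# Hypothetical extra rows: one with an unknown department, one with NULL.
student.append({"Id": 4444, "Name": "Ann", "Dept": "EE"})
student.append({"Id": 5555, "Name": "Lee", "Dept": None})
print([s["Id"] for s in dangling_references(student, "Dept", dept, "Name")])  # [4444]
```

The row with a NULL department is accepted, matching the rule that a foreign key may be entirely NULL.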
Relational Algebra

Relational Algebra is :
1. The formal description of how a relational database
operates
2. An interface to the data stored in the database itself.
3. The mathematics which underpin SQL operations

The DBMS must take whatever SQL statements the user types in and translate them into relational algebra operations before applying them to the database.
Operators - Retrieval
There are two groups of operations:

1. Operations based on mathematical set theory:
UNION, INTERSECTION, DIFFERENCE, and CARTESIAN PRODUCT.
2. Special database-oriented operations:
SELECT, PROJECT and JOIN.
Symbolic Notation
• SELECT σ (sigma)
• PROJECT π (pi)
• PRODUCT × (times)
• JOIN ⋈ (bow-tie)
• UNION ∪ (cup)
• INTERSECTION ∩ (cap)
• DIFFERENCE − (minus)
• RENAME ρ (rho)
SET Operations - requirements
For set operations to function correctly the relations R and S must be union compatible. Two relations are union compatible if:
• They have the same number of attributes.
• The domain of each attribute, in column order, is the same in both R and S.
Set Operations - semantics
Consider two relations R and S.
 UNION of R and S
the union of two relations is a relation that includes all
the tuples that are either in R or in S or in both R and S.
Duplicate tuples are eliminated.

 INTERSECTION of R and S
the intersection of R and S is a relation that includes
all tuples that are both in R and S.

 DIFFERENCE of R and S
the difference of R and S is the relation that contains
all the tuples that are in R but that are not in S.
Union ∪, Intersection ∩, Difference −

Set operators. Relations must have the same schema.

R(name, dept)
Name  Dept
Jack  Physics
Tom   ICS

S(name, dept)
Name  Dept
Jack  Physics
Mary  Math

R ∪ S
Name  Dept
Jack  Physics
Tom   ICS
Mary  Math

R ∩ S
Name  Dept
Jack  Physics

R − S
Name  Dept
Tom   ICS
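
Because relations are sets of tuples, the three operators map directly onto Python's built-in set operations. The sketch below reproduces the R and S example above; representing each tuple as a plain Python tuple in column order (name, dept) is an assumption of this illustration.

```python
# Each relation is a set of (name, dept) tuples, so duplicates cannot occur.
R = {("Jack", "Physics"), ("Tom", "ICS")}
S = {("Jack", "Physics"), ("Mary", "Math")}

print(sorted(R | S))  # union:        [('Jack', 'Physics'), ('Mary', 'Math'), ('Tom', 'ICS')]
print(sorted(R & S))  # intersection: [('Jack', 'Physics')]
print(sorted(R - S))  # difference:   [('Tom', 'ICS')]
```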
Relational SELECT
SELECT is used to obtain a subset of the tuples of a
relation that satisfy a select condition.
For example, find all employees born after 1st Jan 1950:
SELECT dob > '01/JAN/1950' (employee)
or
σ dob > '01/JAN/1950' (employee)
Conditions can be combined together using ∧ (AND) and ∨ (OR). For example, all employees in department 1 called 'Smith':
σ depno = 1 ∧ surname = 'Smith' (employee)
Selection σ

• σ_{C}(R): return tuples in R that satisfy condition C.

Emp(name, dept, salary)
Name  Dept     Salary
Jane  ICS      30K
Jack  Physics  30K
Tom   ICS      75K
Joe   Math     40K
Jack  Math     50K

σ_{salary>35K}(Emp)
Name  Dept  Salary
Tom   ICS   75K
Joe   Math  40K
Jack  Math  50K

σ_{dept=ics and salary<40K}(Emp)
Name  Dept  Salary
Jane  ICS   30K
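
Selection amounts to filtering rows with a predicate. A small sketch, assuming rows are stored as dictionaries and salaries are given in thousands; `select` is an illustrative helper name.

```python
def select(rows, predicate):
    """Selection: keep the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]

emp = [
    {"name": "Jane", "dept": "ICS", "salary": 30},
    {"name": "Jack", "dept": "Physics", "salary": 30},
    {"name": "Tom", "dept": "ICS", "salary": 75},
    {"name": "Joe", "dept": "Math", "salary": 40},
    {"name": "Jack", "dept": "Math", "salary": 50},
]

high = select(emp, lambda t: t["salary"] > 35)
ics_low = select(emp, lambda t: t["dept"] == "ICS" and t["salary"] < 40)

print([t["name"] for t in high])     # ['Tom', 'Joe', 'Jack']
print([t["name"] for t in ics_low])  # ['Jane']
```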
Relational PROJECT
The PROJECT operation is used to select a subset of the attributes of a
relation by specifying the names of the required attributes.

For example, to get a list of all employees with their salary:
PROJECT ename, salary (employee)
or
π ename, salary (employee)
Projection π

π_{A1,…,Ak}(R): pick columns of attributes A1,…,Ak of R.

Emp(name, dept, salary)
Name  Dept     Salary
Jane  ICS      30K
Jack  Physics  30K
Tom   ICS      75K
Joe   Math     40K
Jack  Math     50K

π_{name,dept}(Emp)
Name  Dept
Jane  ICS
Jack  Physics
Tom   ICS
Joe   Math
Jack  Math

π_{name}(Emp)
Name
Jane
Jack
Tom
Joe
Duplicates (“Jack”) eliminated.
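
Projection keeps the requested columns and then removes duplicate tuples. The sketch below uses a hypothetical helper `project` that preserves first-seen order so the output lines up with the tables above.

```python
def project(rows, attrs):
    """Projection: keep only attrs, dropping duplicate result tuples."""
    result = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in result:   # set semantics: each tuple appears once
            result.append(t)
    return result

emp = [
    {"name": "Jane", "dept": "ICS", "salary": 30},
    {"name": "Jack", "dept": "Physics", "salary": 30},
    {"name": "Tom", "dept": "ICS", "salary": 75},
    {"name": "Joe", "dept": "Math", "salary": 40},
    {"name": "Jack", "dept": "Math", "salary": 50},
]

print(project(emp, ["name"]))               # [('Jane',), ('Jack',), ('Tom',), ('Joe',)]
print(len(project(emp, ["name", "dept"])))  # 5
```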
CARTESIAN PRODUCT
The Cartesian Product is also an operator which
works on two sets. It is sometimes called the
CROSS PRODUCT or CROSS JOIN.

It combines the tuples of one relation with all the tuples of the other relation.
Cartesian Product: ×
R × S: pair each tuple r in R with each tuple s in S.

Emp(name, dept)
Name  Dept
Jack  Physics
Tom   ICS

Contact(name, addr)
Name  Addr
Jack  Irvine
Tom   LA
Mary  Riverside

Emp × Contact
E.name  Dept     C.Name  Addr
Jack    Physics  Jack    Irvine
Jack    Physics  Tom     LA
Jack    Physics  Mary    Riverside
Tom     ICS      Jack    Irvine
Tom     ICS      Tom     LA
Tom     ICS      Mary    Riverside
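
The pairing of every tuple with every other tuple can be sketched with `itertools.product`. The function `cartesian` and the choice to prefix every attribute with a relation alias ("E", "C") are assumptions of this illustration.

```python
from itertools import product

def cartesian(r, s, r_name, s_name):
    """Pair every tuple of r with every tuple of s, prefixing attribute names."""
    return [
        {**{f"{r_name}.{k}": v for k, v in a.items()},
         **{f"{s_name}.{k}": v for k, v in b.items()}}
        for a, b in product(r, s)
    ]

emp = [{"name": "Jack", "dept": "Physics"}, {"name": "Tom", "dept": "ICS"}]
contact = [
    {"name": "Jack", "addr": "Irvine"},
    {"name": "Tom", "addr": "LA"},
    {"name": "Mary", "addr": "Riverside"},
]

pairs = cartesian(emp, contact, "E", "C")
print(len(pairs))  # 6  (2 tuples x 3 tuples)
print(pairs[0])    # {'E.name': 'Jack', 'E.dept': 'Physics', 'C.name': 'Jack', 'C.addr': 'Irvine'}
```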
JOIN Example
• JOIN is used to combine related tuples from two relations R and S.
• In its simplest form the JOIN operator is just the cross product of the two relations and is represented as (R ⋈ S).
• JOIN allows you to evaluate a join condition between the attributes of the relations on which the join is undertaken.
The notation used is R ⋈_{C} S, where C is the join condition.
Join
R ⋈_{C} S = σ_{C}(R × S)
• Join condition C is of the form:
<cond_1> AND <cond_2> AND … AND <cond_k>
Each cond_i is of the form A op B, where:
– A is an attribute of R, B is an attribute of S
– op is a comparison operator: =, <, >, ≤, ≥, or ≠.
• Different types:
– Theta-join
– Equi-join
– Natural join
Theta-Join

R ⋈_{R.A>S.C} S

R(A,B)
A  B
3  4
5  7

S(C,D)
C  D
2  7
6  8

R × S
R.A  R.B  S.C  S.D
3    4    2    7
3    4    6    8
5    7    2    7
5    7    6    8

Result
R.A  R.B  S.C  S.D
3    4    2    7
5    7    2    7
Theta-Join

R ⋈_{R.A>S.C, R.B≠S.D} S

R(A,B)
A  B
3  4
5  7

S(C,D)
C  D
2  7
6  8

R × S
R.A  R.B  S.C  S.D
3    4    2    7
3    4    6    8
5    7    2    7
5    7    6    8

Result
R.A  R.B  S.C  S.D
3    4    2    7
Equi-Join

• Special kind of theta-join: C only uses the equality operator.

R(A,B)
A  B
3  4
5  7

S(C,D)
C  D
2  7
6  8

R ⋈_{R.B=S.D} S

R × S
R.A  R.B  S.C  S.D
3    4    2    7
3    4    6    8
5    7    2    7
5    7    6    8

Result
R.A  R.B  S.C  S.D
5    7    2    7
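
The definition of a join as a selection over the Cartesian product can be written out directly. In this sketch `theta_join` is a hypothetical helper, tuples of R are (A, B) pairs, and tuples of S are (C, D) pairs, so `r[0]` is R.A and `s[1]` is S.D.

```python
from itertools import product

def theta_join(r, s, condition):
    """Selection over R x S: keep concatenated pairs that satisfy condition."""
    return [a + b for a, b in product(r, s) if condition(a, b)]

R = [(3, 4), (5, 7)]  # columns (A, B)
S = [(2, 7), (6, 8)]  # columns (C, D)

# Theta-join on R.A > S.C
print(theta_join(R, S, lambda r, s: r[0] > s[0]))   # [(3, 4, 2, 7), (5, 7, 2, 7)]

# Equi-join on R.B = S.D
print(theta_join(R, S, lambda r, s: r[1] == s[1]))  # [(5, 7, 2, 7)]
```

A real query processor would avoid materialising the full product; the nested enumeration here only mirrors the algebraic definition.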
Natural-Join

• Relations R and S. Let L be the union of their attributes.
• Let A1,…,Ak be their common attributes.

R ⋈ S = π_{L}(R ⋈_{R.A1=S.A1,…,R.Ak=S.Ak} S)
Natural-Join

Emp(name, dept)
Name  Dept
Jack  Physics
Tom   ICS

Contact(name, addr)
Name  Addr
Jack  Irvine
Tom   LA
Mary  Riverside

Emp ⋈ Contact: all employee names, depts, and addresses.

Emp × Contact
Emp.name  Emp.Dept  Contact.name  Contact.addr
Jack      Physics   Jack          Irvine
Jack      Physics   Tom           LA
Jack      Physics   Mary          Riverside
Tom       ICS       Jack          Irvine
Tom       ICS       Tom           LA
Tom       ICS       Mary          Riverside

Result
Name  Dept     Addr
Jack  Physics  Irvine
Tom   ICS      LA
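
A natural join can be sketched as an equality match on every shared attribute, with the shared columns merged. The helper `natural_join` is illustrative and assumes rows are dictionaries and that all rows of one relation have the same keys.

```python
def natural_join(r, s):
    """Join rows that agree on every attribute the two relations share."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [
        {**a, **b}  # shared attributes are merged into one column
        for a in r for b in s
        if all(a[c] == b[c] for c in common)
    ]

emp = [{"name": "Jack", "dept": "Physics"}, {"name": "Tom", "dept": "ICS"}]
contact = [
    {"name": "Jack", "addr": "Irvine"},
    {"name": "Tom", "addr": "LA"},
    {"name": "Mary", "addr": "Riverside"},
]

for row in natural_join(emp, contact):
    print(row["name"], row["dept"], row["addr"])
# Jack Physics Irvine
# Tom ICS LA
```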
Outer Joins

• Motivation: “join” can lose information.
• E.g.: the natural join of R and S loses info about Tom, Mike, and Mary, since they do not join with other tuples.
• Called “dangling tuples”.

R
Name  Dept
Jack  Physics
Tom   ICS

S
Name  Addr
Jack  Irvine
Mike  LA
Mary  Riverside

• Outer join: natural join, but use NULL values to fill in dangling tuples.
• Three types: “left”, “right”, or “full”
Left Outer Join

R
Name  Dept
Jack  Physics
Tom   ICS

S
Name  Addr
Jack  Irvine
Mike  LA
Mary  Riverside

Left outer join: R ⟕ S
Name  Dept     Addr
Jack  Physics  Irvine
Tom   ICS      NULL

Pad null values for left dangling tuples.

Right Outer Join

R
Name  Dept
Jack  Physics
Tom   ICS

S
Name  Addr
Jack  Irvine
Mike  LA
Mary  Riverside

Right outer join: R ⟖ S
Name  Dept     Addr
Jack  Physics  Irvine
Mike  NULL     LA
Mary  NULL     Riverside

Pad null values for right dangling tuples.

Full Outer Join

R
Name  Dept
Jack  Physics
Tom   ICS

S
Name  Addr
Jack  Irvine
Mike  LA
Mary  Riverside

Full outer join: R ⟗ S
Name  Dept     Addr
Jack  Physics  Irvine
Tom   ICS      NULL
Mike  NULL     LA
Mary  NULL     Riverside

Pad null values for both left and right dangling tuples.
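
The three outer joins differ only in which dangling tuples are kept. The following sketch handles the simple case of one shared attribute; `outer_join` and its `how` parameter are hypothetical names, both inputs are assumed non-empty, and `None` plays the role of NULL.

```python
def outer_join(r, s, key, how="full"):
    """Natural join on one shared attribute, padding dangling tuples with None."""
    r_attrs = [a for a in r[0] if a != key]
    s_attrs = [a for a in s[0] if a != key]
    r_keys = {t[key] for t in r}
    s_keys = {t[key] for t in s}
    # Matching tuples, exactly as in the natural join.
    out = [{**a, **b} for a in r for b in s if a[key] == b[key]]
    if how in ("left", "full"):
        out += [{**a, **dict.fromkeys(s_attrs)} for a in r if a[key] not in s_keys]
    if how in ("right", "full"):
        out += [{key: b[key], **dict.fromkeys(r_attrs), **b}
                for b in s if b[key] not in r_keys]
    return out

R = [{"Name": "Jack", "Dept": "Physics"}, {"Name": "Tom", "Dept": "ICS"}]
S = [
    {"Name": "Jack", "Addr": "Irvine"},
    {"Name": "Mike", "Addr": "LA"},
    {"Name": "Mary", "Addr": "Riverside"},
]

for t in outer_join(R, S, "Name", how="full"):
    print(t["Name"], t["Dept"], t["Addr"])
# Jack Physics Irvine
# Tom ICS None
# Mike None LA
# Mary None Riverside
```

Passing `how="left"` or `how="right"` yields the two-row and three-row results shown on the earlier slides.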
Joins Revised

Result of applying these joins in a query:


INNER JOIN: Select only those rows that have values in common in the
columns specified in the ON clause.
LEFT, RIGHT, or FULL OUTER JOIN: Select all rows from the table on the left (or
right, or both) regardless of whether the other table has values in common
and (usually) enter NULL where data is missing.
Combining Different Operations

• Construct general expressions using basic operations.
• Schema of each operation:
– ∪, ∩, −: same as the schema of the two relations
– Selection σ: same as the relation’s schema
– Projection π: attributes in the projection
– Cartesian product ×: attributes in two relations, use prefix to avoid confusion
– Theta Join ⋈_{C}: same as ×
– Natural Join ⋈: union of relations’ attributes, merge common attributes
– Renaming ρ: new renamed attributes
Example 1
customer(ssn, name, city)
account(custssn, balance)
“List account balances of Tom.”

π_{balance}(σ_{custssn=ssn}(account × σ_{name=tom}(customer)))

Tree representation:

π_{balance}
  σ_{custssn=ssn}
    ×
      account
      σ_{name=tom}
        customer
Example 1 (cont)
customer(ssn, name, city)
account(custssn, balance)
“List account balances of Tom.”

π_{balance}(account ⋈_{ssn=custssn} σ_{name=tom}(customer))

Tree representation:

π_{balance}
  ⋈_{ssn=custssn}
    account
    σ_{name=tom}
      customer
Comparing RA and SQL
Relational algebra:
 is closed (the result of every expression is a relation)
 has a rigorous foundation
 has simple semantics
 is used for reasoning, query optimisation, etc.
SQL:
 is a superset of relational algebra
 has convenient formatting features, etc.
 provides aggregate functions
 has complicated semantics
 is an end-user language.
Functional Dependencies and Normalization
Schema Normalization

 Decompose relational schemes to


 remove redundancy
 remove anomalies
 Result of normalization:
 Semantically-equivalent relational scheme
 Represent the same information as the original
 Be able to reconstruct the original from
decomposed relations.
Functional Dependencies
 Motivation: avoid redundancy in database design.
Relation R(A1,...,An,B1,...,Bm,C1,...,Cl)
Definition: A1,...,An functionally determine B1,...,Bm, i.e.,
(A1,...,An → B1,...,Bm)
iff for any two tuples r1 and r2 in R,
r1(A1,...,An) = r2(A1,...,An)
implies r1(B1,...,Bm) = r2(B1,...,Bm)
 By definition: a superkey → all attributes of the
relation.
Example
Take(StudentID, CID, Semester, Grade)
FD: (StudentId, Cid, Semester) → Grade

StudentId  Cid     Semester   Grade
1111       ICS184  Winter 02  A
1111       ICS184  Winter 02  B      <- illegal
2222       ICS143  Fall 01    A-

What if FD: (StudentId, Cid) → Semester?

StudentId  Cid     Semester   Grade
1111       ICS184  Winter 02  A
1111       ICS184  Spring 02  A      <- illegal
2222       ICS143  Fall 01    A-

“Each student can take a course only once.”
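
The definition of a functional dependency translates into a simple check over a table instance: group rows by their left-hand-side values and confirm each group has a single right-hand-side value. The helper `satisfies_fd` is a hypothetical name; as with keys, an instance can only refute an FD, not prove that it holds for the schema.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if any two rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs seen for this lhs and returns it.
        if seen.setdefault(left, right) != right:
            return False
    return True

COLS = ("StudentId", "Cid", "Semester", "Grade")
first = [dict(zip(COLS, r)) for r in [
    (1111, "ICS184", "Winter 02", "A"),
    (1111, "ICS184", "Winter 02", "B"),
    (2222, "ICS143", "Fall 01", "A-"),
]]
second = [dict(zip(COLS, r)) for r in [
    (1111, "ICS184", "Winter 02", "A"),
    (1111, "ICS184", "Spring 02", "A"),
    (2222, "ICS143", "Fall 01", "A-"),
]]

print(satisfies_fd(first, ["StudentId", "Cid", "Semester"], ["Grade"]))   # False
print(satisfies_fd(second, ["StudentId", "Cid", "Semester"], ["Grade"]))  # True
print(satisfies_fd(second, ["StudentId", "Cid"], ["Semester"]))           # False
```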


FD Sets
 A set of FDs on a relation: e.g., R(A,B,C), {A→B,
B→C, A→C, AB→A}
 Some dependencies can be derived
 e.g., A→C can be derived from {A→B, B→C}.
 Some dependencies are trivial
 e.g., AB→A is “trivial.”
Trivial Dependencies

• Those that are true for every relation
• A1 A2…An → B1 B2…Bm is trivial if the B’s are a subset of the A’s.
– Example: XY → X (here X is a subset of XY)
• Called nontrivial if at least one of the B’s is not among the A’s, and completely nontrivial if none of the B’s is one of the A’s.
– Example: AB→C (no attribute on the right side of the FD also appears on the left side)
Closure of FD Set
 Definition: Let F be a set of FDs of a relation R.
We use F+ to denote the set of all FDs that must
hold over R, i.e.:
F+ = { X → Y | F logically implies X → Y}
 F+ is called the closure of F.
 Example: F = {A→B, B→C}, then A→C is in F+.
Armstrong’s Axioms: Inferring All FDs

Given a set of FDs F over a relation R, how to compute F+?

• Reflexivity:
– If Y is a subset of X, then X →Y.
– Example: AB→A, ABC→AB, etc.

• Augmentation:
– If X→Y, then XZ→YZ.
– Example: If A→B, then AC→BC.

• Transitivity:
– If X→Y, and Y→Z, then X→Z.
– Example: If AB→C, and C→D, then AB→D.
More Rules Derived from AAs

 Union Rule( or additivity):


 If X→Y, X→Z, then X→YZ

 Projectivity
 If X→YZ, then X→Y and X→Z

 Pseudo-Transitivity Rule:
 If X→Y, WY→Z, then WX→Z
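
Enumerating all of F+ is exponential in the number of attributes, so in practice one usually computes the closure X+ of an attribute set instead: X → Y is in F+ exactly when Y is a subset of X+. The sketch below uses that standard technique rather than applying the axioms one by one; `closure` and `implies` are illustrative names, and FDs are represented as (lhs, rhs) pairs of Python sets.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already derivable, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in the closure of fds."""
    return set(rhs) <= closure(lhs, fds)

F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(closure({"A"}, F)))  # ['A', 'B', 'C']
print(implies(F, {"A"}, {"C"}))   # True  (transitivity)
print(implies(F, {"C"}, {"A"}))   # False

# Pseudo-transitivity: X -> Y and WY -> Z give WX -> Z.
G = [({"X"}, {"Y"}), ({"W", "Y"}, {"Z"})]
print(implies(G, {"W", "X"}, {"Z"}))  # True
```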
The Normalization Process
• In relational databases the term normalization refers to a reversible step-by-step process in which a given set of relations is decomposed into a set of smaller relations that have a progressively simpler and more regular structure.

• The objectives of the normalization process are:
– To make it feasible to represent any relation in the database (applies to First Normal Form).
– To free relations from undesirable insertion, update and deletion anomalies (applies to all normal forms).
The Normalization Process

• The entire normalization process is based upon the analysis of relations:
– their schemes
– their primary keys
– their functional dependencies.
Normalization

[Figure: nested normal forms]
Unnormalized relations
First normal form: functional dependency of nonkey attributes on the primary key; atomic values only
Second normal form: full functional dependency of nonkey attributes on the primary key
Third normal form: no transitive dependency between nonkey attributes
Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency
Normal Forms

1st Normal Form          No repeating data groups
2nd Normal Form          No partial key dependency
3rd Normal Form          No transitive dependency
Boyce-Codd Normal Form   Reduce keys dependency
Unnormalized Relations

• The first step in normalization is to convert the data into a two-dimensional table.
• A relation is said to be unnormalized if it does not contain only atomic values.
Eg of Unnormalized Relation
Patient # | Surgeon # | Surg. date | Patient Name | Patient Addr | Surgeon | Surgery | Postop drug | Drug side effects
1111 | 145; 311 | Jan 1, 1995; June 12, 1995 | John White | 15 New St. New York, NY | Beth Little; Michael Diamond | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none
1234 | 243; 467 | Apr 5, 1994; May 10, 1995 | Mary Jones | 10 Main St. Rye, NY | Charles Field; Patricia Gold | Eye Cataract removal; Thrombosis removal | Tetracycline; none | Fever; none
2345 | 189 | Jan 8, 1996 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | Nov 5, 1995 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | May 10, 1995 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | Apr 5, 1994; Dec 15, 1984 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement; Eye cataract removal | Tetracycline | Fever
First Normal Form

• To move to First Normal Form a relation must contain only atomic values at each row and column.
– No repeating groups
• A relation in 1NF contains only atomic values.
First Normal Form
• Three formal definitions of First Normal Form:
– A relation r is said to be in First Normal Form (1NF) if and only if every entry of the relation (each cell) has at most a single value.
– A relation is in first normal form (1NF) if and only if all underlying simple domains contain atomic values only.
– A relation is in 1NF if and only if all of its attributes are based upon a simple domain.
• These three definitions are equivalent.
• If all relations of a database are in 1NF, we can say that the database is in 1NF.
Eg of First Normal Form
The normalized representation of the PROJECT table
PROJECT
Proj-ID Proj-Name Proj-Mgr-ID Emp-ID Emp-Name Emp-Dpt Emp-Hrly-Rate Total-Hrs
100 E-commerce 789487453 123423479 Heydary MIS 65 10
100 E-commerce 789487453 980808980 Jones TechSupport 45 6
100 E-commerce 789487453 234809000 Alexander TechSupport 35 6
100 E-commerce 789487453 542298973 Johnson TechDoc 30 12
110 Distance-Ed 820972445 432329700 Mantle MIS 50 5
110 Distance-Ed 820972445 689231199 Richardson TechSupport 35 12
110 Distance-Ed 820972445 712093093 Howard TechDoc 30 8
120 Cyber 980212343 834920043 Lopez Engineering 80 4
120 Cyber 980212343 380802233 Harrison TechSupport 35 11
120 Cyber 980212343 553208932 Olivier TechDoc 30 12
120 Cyber 980212343 123423479 Heydary MIS 65 07
130 Nitts 550227043 340783453 Shaw MIS 65 07
First Normal Form
 This normalized PROJECT table is not a relation
because it does not have a primary key.
 The attribute Proj-ID no longer identifies uniquely
any row.
 To transform this table into a relation a primary key
needs to be defined.
 A suitable PK for this table is the composite key
(Proj-ID, Emp-ID)
No other combination of the attributes of the table
will work as a PK.
Partial Dependencies
 Identifying the partial dependencies in the PROJECT-
EMPLOYEE relation.

 The PK of this relation is formed by the attributes Proj-ID


and Emp-ID.
 This implies that {Proj-ID, Emp-ID} uniquely identifies a
tuple in the relation.
 They functionally determine any individual attribute or
any combination of attributes of the relation.
 However, we only need attribute Emp-ID to functionally
determine the following attributes:
 Emp-Name, Emp-Dpt, Emp-Hrly-Rate.
Second Normal Form

And we need only the Proj-ID attribute to functionally determine Proj-Name and Proj-Mgr-ID.
So, we decompose the relation into the following two relations:

PROJECT
Proj-ID  Proj-Name    Proj-Mgr-ID
100      E-commerce   789487453
110      Distance-Ed  820972445
120      Cyber        980212343
130      Nitts        550227043
Second Normal Form
EMPLOYEE

Emp-ID Emp-Name Emp-Dpt Emp-Hrly-Rate
123423479 Heydary MIS 65
980808980 Jones TechSupport 45
234809000 Alexander TechSupport 35
542298973 Johnson TechDoc 30
432329700 Mantle MIS 50
689231199 Richardson TechSupport 35
712093093 Howard TechDoc 30
834920043 Lopez Engineering 80
380802233 Harrison TechSupport 35
553208932 Olivier TechDoc 30
340783453 Shaw MIS 65
• There are no partial dependencies in either table because the key of each table consists of a single attribute.
• For example, Emp-ID alone determines Emp-Name, Emp-Dpt, and Emp-Hrly-Rate, and Proj-ID alone determines Proj-Name and Proj-Mgr-ID.
• To relate these two relations, we create a third table (relationship table) that consists of the primary keys of both relations as foreign keys and the attribute ‘Total-Hrs-Worked’, which is fully dependent on the key {Proj-ID, Emp-ID} of the new relation.
Second Normal Form
A relation is said to be in Second Normal Form if it is in 1NF and every nonkey attribute is fully functionally dependent on the primary key.
Or: no nonprime attribute is partially dependent on any key.

Now, the example relation scheme is in 2NF with the following relations:
Project (Proj-Id, Proj-Name, Proj-Mgr-Id)
Employee (Emp-Id, Emp-Name, Emp-Dpt, Emp-Hrly-Rate)
Proj_Emp (Proj-Id, Emp-Id, Total-Hrs-Worked)
Data Anomalies in 2NF Relations

• Insertion anomalies occur in the EMPLOYEE relation.
– Consider a situation where we would like to set in advance the rate to be charged by the employees of a new department.
– We cannot insert this information until there is an employee assigned to that department.
Notice that the rate that a department charges is independent of whether or not it has employees.
Data Anomalies in 2NF Relations

• The EMPLOYEE relation is also susceptible to deletion anomalies.
• This type of anomaly occurs whenever we delete the tuple of an employee who happens to be the only employee left in a department.
– In this case, we will also lose the information about the rate that the department charges.
Data Anomalies in 2NF Relations

• Update anomalies will also occur in the EMPLOYEE relation because there may be several employees from the same department working on different projects.
• If the department rate changes, we need to make sure that the corresponding rate is changed for all employees that work for that department. Otherwise the database may end up in an inconsistent state.
Transitive Dependencies
• A transitive dependency is a functional dependency which holds by virtue of transitivity. A transitive dependency can occur only in a relation that has three or more attributes. Let A, B, and C designate three distinct attributes and let the following conditions hold:
1. A→B (where A is the key of the relation)
2. B→C
• Then the functional dependency A → C (which follows from 1 and 2 by the axiom of transitivity) is a transitive dependency.
 For eg: If in a relation Book is the key and
{Book} → {Author}
{Author} → {Nationality}
Therefore {Book} → {Nationality} is a transitive dependency.
 Transitive dependency occurs when a non-key attribute determines another
non-key attribute.
Transitive Dependencies
• Assume the following functional dependencies of attributes A, B and C of relation r(R):

[Figure: dependency diagram A → B → C]
Third Normal Form
• A relation is in 3NF iff it is in 2NF and every nonkey attribute is non-transitively dependent on the primary key.

• A relation r(R) is in Third Normal Form (3NF) if and only if the following conditions are satisfied simultaneously:
– r(R) is already in 2NF.
– No nonprime attribute is transitively dependent on the key.

• The objective of transforming relations into 3NF is to remove all transitive dependencies.
• Given a relation R with FDs F, test if R is in 3NF:
– Compute all the candidate keys of R.
– For each nontrivial X→Y in F, check if it violates 3NF.
– If X is not a superkey, and Y is not part of a candidate key, then X→Y violates 3NF.
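
The test above can be sketched in Python using attribute closure to find candidate keys. All function names are illustrative, the brute-force key search is only suitable for small schemas, and the attribute names are shortened forms of the Employee relation from the 2NF example (with the FD from department to hourly rate assumed, as the anomaly discussion suggests).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue  # a superset of a known key is not minimal
            if closure(x, fds) >= schema:
                keys.append(x)
    return keys

def violates_3nf(schema, fds):
    """Return the FDs X -> Y where X is no superkey and Y has a nonprime attribute."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        extra = rhs - lhs  # ignore the trivial part of the FD
        if extra and not closure(lhs, fds) >= schema and not extra <= prime:
            bad.append((lhs, rhs))
    return bad

employee = {"EmpID", "EmpName", "EmpDpt", "Rate"}
fds = [({"EmpID"}, {"EmpName", "EmpDpt"}), ({"EmpDpt"}, {"Rate"})]
print(candidate_keys(employee, fds))  # [{'EmpID'}]
print(violates_3nf(employee, fds))    # [({'EmpDpt'}, {'Rate'})]
```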
Conversion to Third Normal Form

R(A*, B, C) with A → B and B → C
Convert to
R1(A*, B) and R2(B*, C)
* indicates the key or the determinant of the relation.
Third Normal Form
 Using the general procedure, we will transform our 2NF
relation example to a 3NF relation.
 The relation EMPLOYEE is not in 3NF because there is a
transitive dependency of a nonprime attribute on the primary
key of the relation.
 In this case, the nonprime attribute Emp-Hrly-Rate is
transitively dependent on the key through the functional
dependency Emp-Dpt → Emp-Hrly-Rate.
 To transform this relation into a 3NF relation:
 it is necessary to remove any transitive dependency of a
nonprime attribute on the key.
 It is necessary to create two new relations.
Third Normal Form

The scheme of the first relation, which we have named EMPLOYEE, is:

EMPLOYEE (Emp-ID, Emp-Name, Emp-Dpt)

The scheme of the second relation, which we have named CHARGES, is:

CHARGES (Emp-Dpt, Emp-Hrly-Rate)


Data Anomalies in Third Normal Form
 The Third Normal Form helped us to get rid of the data
anomalies caused either by
 transitive dependencies on the PK or
 by dependencies of a nonprime attribute on another
nonprime attribute.

• However, relations in 3NF are still susceptible to data anomalies, particularly when
– the relations have two overlapping candidate keys or
– a nonprime attribute functionally determines a prime attribute.
Boyce-Codd Normal Form (BCNF)

• A relation is in BCNF iff every determinant is a candidate key.


OR
• In other words, a relational schema R is in Boyce–Codd normal form if and only if for every one of its dependencies X → Y, at least one of the following conditions holds:
– X → Y is a trivial functional dependency (Y ⊆ X)
– X is a superkey for schema R

• The definition of 3NF does not deal with a relation that:
– has multiple candidate keys, where
– those candidate keys are composite, and
– the candidate keys overlap (i.e., have at least one common attribute)
Example of BCNF

Relation SSP(sid, sname, part_id, qty)
Candidate keys are (sid, part_id) and (sname, part_id).
With the following FDs:
1. { sid, part_id } → qty
2. { sname, part_id } → qty
3. sid → sname
4. sname → sid

The relation is in 3NF:
For sid → sname, sname is part of a candidate key.
For sname → sid, sid is part of a candidate key.

However, the relation is not in BCNF, because neither sid nor sname is a superkey, and this leads to redundancy and loss of information.
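
The BCNF condition (every nontrivial FD has a superkey on its left side) can be checked mechanically with attribute closure. This is a sketch under the FDs listed above; `closure` and `bcnf_violations` are illustrative names, and FDs are (lhs, rhs) pairs of sets.

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Nontrivial FDs whose left side is not a superkey of schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= schema]

SSP = {"sid", "sname", "part_id", "qty"}
F = [
    ({"sid", "part_id"}, {"qty"}),
    ({"sname", "part_id"}, {"qty"}),
    ({"sid"}, {"sname"}),
    ({"sname"}, {"sid"}),
]

print([sorted(lhs) for lhs, _ in bcnf_violations(SSP, F)])  # [['sid'], ['sname']]

# A decomposition into (sid, sname) and (sid, part_id, qty) has no violations.
print(bcnf_violations({"sid", "sname"}, [({"sid"}, {"sname"}), ({"sname"}, {"sid"})]))  # []
print(bcnf_violations({"sid", "part_id", "qty"}, [({"sid", "part_id"}, {"qty"})]))      # []
```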


Example of BCNF
If we decompose the schema into
R1 = ( sid, sname ), R2 = ( sid, part_id, qty )
These are in BCNF.

The decomposition is dependency preserving.
{ sname, part_id } → qty can be deduced from
(1) sname → sid (given)
(2) { sname, part_id } → { sid, part_id } (augmentation on (1))
(3) { sid, part_id } → qty (given)
and finally transitivity on (2) and (3).


3NF vs BCNF
 Only in rare cases does a 3NF table not meet the
requirements of BCNF. A 3NF table which does not have
multiple overlapping candidate keys is guaranteed to be in
BCNF. Depending on what its functional dependencies are, a
3NF table with two or more overlapping candidate keys may
or may not be in BCNF.
 If a relation schema is not in BCNF
 it is possible to obtain a lossless-join decomposition into a
collection of BCNF relation schemas.
 Dependency-preserving is not guaranteed.
 3NF
 There is always a dependency-preserving, lossless-join
decomposition into a collection of 3NF relation schemas.
Properties of a good Decomposition
A decomposition of a relation R into sub-relations R1, R2,…….,
Rn should possess following properties:

The decomposition should be

• Attribute Preserving ( All the attributes in the given relation


must occur in any of the sub – relations)
• Dependency Preserving ( All the FDs in the given relation
must be preserved in the decomposed relations)
• Lossless join ( The natural join of decomposed relations should
produce the same original relation back, without any spurious
tuples).
• No redundancy ( The redundancy should be minimized in the
decomposed relations).
Lossless Join Decomposition
The set of relation schemas { R1, R2, …, Rn } is a lossless-join decomposition of R if:
for all possible relations r on schema R,
r = π_{R1}(r) ⋈ π_{R2}(r) ⋈ … ⋈ π_{Rn}(r)
Example:
Student = ( sid, sname, major )
F = { sid → sname, sid → major }

{ sid, sname } + { sid, major } is a lossless join decomposition:
the intersection {sid} is a key in both schemas.

{ sid, major } + { sname, major } is not a lossless join decomposition:
the intersection {major} is not a key in either { sid, major } or { sname, major }.
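
The difference between the two decompositions can be seen by projecting a small instance and joining the projections back together. The two student rows below are made-up sample data, and `project`, `natural_join`, and `as_set` are illustrative helpers.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dictionaries."""
    return [dict(t) for t in {tuple((a, row[a]) for a in attrs) for row in rows}]

def natural_join(r, s):
    """Join rows that agree on every shared attribute."""
    common = [a for a in r[0] if a in s[0]]
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]

def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}

# Hypothetical instance: two students sharing the same major.
student = [
    {"sid": 1, "sname": "Ann", "major": "CS"},
    {"sid": 2, "sname": "Bob", "major": "CS"},
]

good = natural_join(project(student, ["sid", "sname"]),
                    project(student, ["sid", "major"]))
bad = natural_join(project(student, ["sid", "major"]),
                   project(student, ["sname", "major"]))

print(as_set(good) == as_set(student))  # True: the original relation is recovered
print(len(bad))                         # 4: two of these tuples are spurious
```

Joining on major pairs every sid with every sname in that major, which is where the spurious tuples come from.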
Another Example

R = { A, B, C, D }
F = { A → B, C → D }.
Key is {AC}.
Decomposition: { (A, B), (C, D), (A, C) }, where (A, C) is introduced virtually.
Consider it a two step decomposition:
1. Decompose R into R1 = (A, B), R2 = (A, C, D)
2. Decompose R2 into R3 = (C, D), R4 = (A, C)
This is a lossless join decomposition.

If R is decomposed into (A, B), (C, D):
This is a lossy-join decomposition.
Fourth Normal Form
A relation R is in 4NF if and only if it satisfies the following conditions:
• R is already in 3NF or in BCNF.
• R contains no nontrivial multivalued dependencies (MVDs).

MVDs occur when two or more independent multivalued facts about the same attribute occur within the same relation.

This means that in a relation R with attributes A, B and C, where B and C are multivalued facts about A, written A→→B and A→→C, an MVD exists only if B and C are independent of each other.
Example: 4NF
Fifth Normal Form
 A relation R is in 5NF (also called Projection-Join Normal form
or PJNF) iff every join dependency in the relation R is implied
by the candidate keys of the relation R.

• A relation decomposed into two relations must have the lossless join property, which ensures that no spurious tuples are generated when the relations are reunited using a natural join.

• There are requirements to decompose a relation into more than two relations. Such cases are managed by join dependency and 5NF.

• 5NF implies that relations that have been decomposed in previous normal forms can be recombined via natural joins to recreate the original relation.
Fifth Normal Form
Consider the different case where, if an agent is an agent for a company and that
company makes a product, then he always sells that product for the company.
Under these circumstances, the 'agent company product' table is as shown
below . This relation contains following dependencies.
Agent →→ Company
Agent →→ Product_Name
Company→→Product_Name
agent_company_product_table
Fifth Normal Form
The table is necessary in order to show all the information required.
Suneet, for example, sells ABC's Nuts and Screws, but not ABC's Bolts. Raj is
not an agent for CDE and does not sell ABC's Nuts or Screws. The table is
in 4NF because it contains no multi-valued dependency. It does,
however, contain an element of redundancy in that it records the fact
that Suneet is an agent for ABC twice. Suppose that the table is
decomposed into its two projections, P1 and P2.

The redundancy has been eliminated, but the information about which
companies make which products and which of these products they
supply to which agents has been lost. The natural join of these two
projections will result in some spurious tuples (additional tuples which were
not present in the original relation).
Fifth Normal Form
This table can be decomposed into its three projections without loss of
information as demonstrated below .

If we take the natural join of these relations then we get the original
relation back. So this is the correct decomposition.

decompose into three projection


THANK
YOU
