
Dbms Unit 51

This chapter discusses relational database design and normalization. It covers informal design guidelines like minimizing null values and avoiding update anomalies. The chapter defines functional dependencies and describes how they are used to determine candidate keys and define normal forms like 1NF, 2NF and 3NF. Normalization helps minimize data anomalies by removing redundant data and dependencies between attributes. The chapter provides examples to illustrate concepts like functional dependencies, candidate keys, and how normalization can resolve issues like insertion, deletion and update anomalies.

Uploaded by abcd efgh
Copyright © All Rights Reserved

UNIT 5 – RELATIONAL DATABASE DESIGN

Chapter Outline
1 Informal Design Guidelines for Relational Databases
1.1 Semantics of the Relation Attributes
1.2 Redundant Information in Tuples and Update Anomalies
1.3 Null Values in Tuples
1.4 Spurious Tuples
2 Functional Dependencies (FDs)
2.1 Definition of FD
2.2 Inference Rules for FDs
3 Normal Forms Based on Primary Keys
3.1 Normalization of Relations
3.2 Practical Use of Normal Forms
3.3 Definitions of Keys and Attributes Participating in Keys
3.4 First Normal Form
3.5 Second Normal Form
3.6 Third Normal Form
4 BCNF (Boyce-Codd Normal Form)
1.1 Semantics of the Relation Attributes
GUIDELINE 1: Informally, each tuple in a relation should represent one entity
or relationship instance. (Applies to individual relations and their
attributes).

• Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.
• Only foreign keys should be used to refer to other entities.
• Entity and relationship attributes should be kept apart as much as possible.
Bottom Line: Design a schema that can be explained easily relation by relation. The semantics of attributes should be easy to interpret.
[Figure: GUIDELINE 1, a simplified COMPANY relational database schema (annotated "To minimize the storage")]
1.2 Redundant Information in Tuples and Anomalies
GUIDELINE 2: Design a schema that does not suffer from insertion, deletion and update anomalies. If any anomalies are present, note them so that applications can be made to take them into account.
• Mixing attributes of multiple entities may cause problems.
• Information is stored redundantly, wasting storage.
• Problems:
– Insertion Anomaly
– Deletion Anomaly
– Update Anomaly
Data Anomalies: Example
Redundant data means that the same 'information' is stored more than once; i.e., the redundant copies could be removed without any loss of information.
Example: We have the following relation that contains staff and department details:
staffNo  job       dept  dname       city
SL10     Salesman  10    Sales       Stratford
SA51     Manager   20    Accounts    Barking
DS40     Clerk     20    Accounts    Barking
OS45     Clerk     30    Operations  Barking
Such 'redundancy' could lead to the following 'anomalies':
Insert Anomaly: We can't insert a dept without inserting a member of staff that works in that department.
Deletion Anomaly: By removing employee SL10 we have removed all information pertaining to the Sales dept.
Update Anomaly: Changing the city of the Accounts dept (20) requires updating every tuple of its staff (SA51 and DS40); missing one leaves the data inconsistent.
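The redundancy in the staff relation can be illustrated with a short Python sketch. It is only an illustration: the split into a STAFF(staffNo, job, dept) part and a DEPT(dept, dname, city) part and the helper rows_to_update are assumptions made here, not taken from the slides.

```python
# Staff/department relation from the example above: (staffNo, job, dept, dname, city)
staff = [
    ("SL10", "Salesman", 10, "Sales", "Stratford"),
    ("SA51", "Manager", 20, "Accounts", "Barking"),
    ("DS40", "Clerk", 20, "Accounts", "Barking"),
    ("OS45", "Clerk", 30, "Operations", "Barking"),
]

def rows_to_update(rows, dept, dept_col):
    """Count the tuples that must change if the city of `dept` changes."""
    return sum(1 for row in rows if row[dept_col] == dept)

# Hypothetical decomposition: department facts are stored once per dept.
staff_part = sorted({(no, job, dept) for no, job, dept, _, _ in staff})
dept_part = sorted({(dept, dname, city) for _, _, dept, dname, city in staff})

print(rows_to_update(staff, 20, dept_col=2))      # 2 tuples repeat the Accounts city
print(rows_to_update(dept_part, 20, dept_col=0))  # 1 tuple after decomposition

# Joining the two parts on dept gives back exactly the original tuples.
rejoined = sorted(
    (no, job, dept, dname, city)
    for no, job, dept in staff_part
    for d, dname, city in dept_part
    if d == dept
)
print(rejoined == sorted(staff))  # True
```

With department details kept in their own relation, a new department can also be inserted before any staff member is assigned to it, which removes the insert anomaly.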
Example of Insert, Delete and Update Anomalies
Consider the relation:
EMP_PROJ (Emp#, Proj#, Ename, Pname, No_hours)
• Insert Anomaly: Cannot insert a project unless an employee is assigned to it.
Conversely, cannot insert an employee unless he/she is assigned to a project.
• Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternatively, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.
• Update Anomaly: Changing the name of project number P1 from “Billing” to “Customer-Accounting” requires this update to be made in the tuples of all 100 employees working on project P1.
1.3 Null Values in Tuples
GUIDELINE 3: Relations should be designed such that their tuples will have as
few NULL values as possible
• Attributes that are NULL frequently could be placed in separate relations (together with the primary key).
• Problems with NULL values:
— They can waste space at the storage level.
— Applying aggregate operations like COUNT or SUM becomes difficult.
— For example, consider the table below:
— Student(stno, name, address, phn_no, marks, scholarship_amount, scholarship_date). Suppose there are 500 students in total, but the college gives a scholarship only to the first three students. The scholarship attributes then apply to only three records; for the remaining records they will be NULL, wasting space. It is better to keep the scholarship attributes in a separate table.
• Reasons for nulls:
— attribute not applicable or invalid
— attribute value unknown (may exist)
— value known to exist, but unavailable
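The effect of NULLs on aggregates can be seen with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the Student table is trimmed to four columns, and the scholarship amount (10000) and date are made-up values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed Student table: only the columns needed for this illustration.
conn.execute(
    "CREATE TABLE Student (stno INTEGER PRIMARY KEY, name TEXT, "
    "scholarship_amount INTEGER, scholarship_date TEXT)"
)
# 500 students; only the first three get a scholarship, the rest store NULLs.
rows = [
    (i, f"student{i}", 10000 if i <= 3 else None, "2024-07-01" if i <= 3 else None)
    for i in range(1, 501)
]
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)", rows)

# COUNT(*) counts rows; COUNT(column) and AVG(column) silently skip NULLs.
total, with_amount, avg_amount = conn.execute(
    "SELECT COUNT(*), COUNT(scholarship_amount), AVG(scholarship_amount) FROM Student"
).fetchone()
print(total, with_amount, avg_amount)  # 500 3 10000.0

# Separate relation: one row per scholarship holder, keyed by stno, no NULLs.
conn.execute("CREATE TABLE Scholarship (stno INTEGER PRIMARY KEY, amount INTEGER, sdate TEXT)")
conn.execute(
    "INSERT INTO Scholarship SELECT stno, scholarship_amount, scholarship_date "
    "FROM Student WHERE scholarship_amount IS NOT NULL"
)
print(conn.execute("SELECT COUNT(*) FROM Scholarship").fetchone()[0])  # 3
```

Because COUNT(column) and COUNT(*) disagree on the wide table, a query writer must know which columns can be NULL; the separate Scholarship table avoids that question.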
1.4 Spurious Tuples
GUIDELINE 4:
— Design relation schemas so that they can be joined with equality
conditions on attributes that are either primary keys or foreign keys in
a way that guarantees that no spurious tuples are generated.

• Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations, because joining on such attributes may produce spurious tuples.
• A spurious tuple is a tuple that appears in the result of a join but was not present in the original data; it is created when two tables are joined badly (on attributes that are not keys).
• Bad designs for a relational database may produce erroneous results for certain JOIN operations.
• Smaller relations are often preferable, so a big relation can be decomposed into more than one smaller relation.
• Problem: when the smaller relations are joined together, the result must be exactly the original bigger relation.
Spurious Tuples
• For example, consider the relation car(Id, Make, Color):
Id   Make    Color
123  Toyota  Blue
456  Audi    Blue
789  Toyota  Red
• Let us decompose relation car into two smaller relations as below:
— car1: (Id, Color)
— car2: (Color, Make)
car1             car2
Id   Color       Color  Make
123  Blue        Blue   Toyota
456  Blue        Blue   Audi
789  Red         Red    Toyota
• What happens when we join car1 and car2?
• Applying a natural join between car1: (Id, Color) and car2: (Color, Make) on the common attribute Color gives:
Id   Color  Make
123  Blue   Toyota
123  Blue   Audi     (spurious)
456  Blue   Toyota   (spurious)
456  Blue   Audi
789  Red    Toyota
• The result has five tuples, whereas the original table car has only three:
Id   Make    Color
123  Toyota  Blue
456  Audi    Blue
789  Toyota  Red
• The two extra tuples are spurious: Color is not a key of car2, so each Blue car in car1 matches both Blue rows of car2.
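The join above can be reproduced with Python sets, treating each relation as a set of tuples. This is a minimal sketch; the variable names are illustrative.

```python
# Original relation car(Id, Make, Color)
car = [(123, "Toyota", "Blue"), (456, "Audi", "Blue"), (789, "Toyota", "Red")]

# Decompose into car1(Id, Color) and car2(Color, Make); sets drop duplicates.
car1 = {(i, color) for i, _, color in car}
car2 = {(color, make) for _, make, color in car}

# Natural join on the shared attribute Color, producing (Id, Color, Make).
joined = {(i, c1, make) for i, c1 in car1 for c2, make in car2 if c1 == c2}

original = {(i, color, make) for i, make, color in car}
spurious = sorted(joined - original)
print(len(joined), "tuples after join;", len(original), "originally")  # 5 ... 3
print("spurious:", spurious)  # [(123, 'Blue', 'Audi'), (456, 'Blue', 'Toyota')]
```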
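Closure Method
• The closure method is used to find candidate keys.
• Functional dependency X -> Y: Y is determined by X (Y is the dependent, X is the determinant).
• Example 1: R(A, B, C, D), FD = {A -> B, B -> C, C -> D}
– A+ = (A, B, C, D), so A is a candidate key
– B+ = (B, C, D)
– D+ = (D)
– Prime attribute: (A); non-prime attributes: (B, C, D)
• Example 2: R(A, B, C, D, E), FD = {A -> B, BC -> D, E -> C, D -> A}
– Attributes on the right-hand side: (B, D, C, A). E never appears on the right, so every candidate key must contain E.
– E+ = (E, C), so E alone is not a key
– AE+ = (A, E, C, B, D), so AE is a candidate key. To find more keys, check whether A or E appears on the right-hand side of some FD: D -> A, so try DE.
– DE+ = (D, E, A, B, C)
– D appears on the right of BC -> D, giving BCE, which reduces to BE because E -> C: BE+ = (B, E, C, D, A)
– CE+ = (C, E)
– Candidate keys: AE, DE, BE; prime attributes: (A, B, D, E); non-prime attribute: (C)
• Example 3: R = ABCDE, F = {A -> BE, C -> BE, B -> D}
– By decomposition: A -> B, A -> E, C -> B, C -> E
– Attributes on the right-hand side: (B, E, D); A and C never appear there
– AC+ = (A, C, B, E, D), so the candidate key is AC
• Example 4: R = ABCD, F = {AB -> C, BC -> D, CD -> A}
– Attributes on the right-hand side: (C, D, A); B never appears there
– B+ = (B)
– AB+ = (A, B, C, D)
– CB+ = (C, B, D, A)
– DB+ = (D, B)
– Candidate keys: AB, BC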
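The closure method can be sketched in Python. Assumptions made here: attributes are single letters, an FD is a (lhs, rhs) pair of strings, and the helper names closure and candidate_keys are illustrative. candidate_keys simply tries attribute sets in increasing size (brute force, exponential in the number of attributes), which is fine for classroom-sized schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+: apply every FD whose left side is already in the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # superset of a key found earlier: not minimal
            if closure(combo, fds) == set(schema):
                keys.append("".join(combo))
    return keys

# Example 2: R(A,B,C,D,E) with F = {A -> B, BC -> D, E -> C, D -> A}
F = [("A", "B"), ("BC", "D"), ("E", "C"), ("D", "A")]
print(sorted(closure("E", F)))     # ['C', 'E']
print(candidate_keys("ABCDE", F))  # ['AE', 'BE', 'DE']
```

The same helper reproduces the other answers: candidate_keys("ABCDE", [("A", "BE"), ("C", "BE"), ("B", "D")]) gives ['AC'], and candidate_keys("ABCD", [("AB", "C"), ("BC", "D"), ("CD", "A")]) gives ['AB', 'BC'].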
Functional Dependency
X -> Y (X is the determinant and Y is the dependent)
Example: Eid -> ename
Trivial FD and Non-trivial FD
Trivial FD
X -> Y is trivial if Y is a subset of X.
Eid -> Eid (reflexive property of FDs)
Eid, ename -> ename
For a trivial FD, X ∩ Y is never empty, e.g. (EID, ENAME) ∩ (EID) = (EID)
Non-trivial FD
X -> Y is non-trivial if Y is not a subset of X. When X ∩ Y is also empty, the FD is called completely non-trivial; all the examples below are of this kind:
Stud_sapno -> sname
Emp_code -> empname
Stud_sapno -> semester
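A small Python helper, classify_fd (a name chosen here for illustration), applies these definitions to an FD given as two attribute sets. The extra label "completely non-trivial" for the case X ∩ Y = ∅ is a common textbook term, used here as an assumption beyond the slides.

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y as trivial, non-trivial, or completely non-trivial."""
    x, y = set(lhs), set(rhs)
    if y <= x:
        return "trivial"                 # Y is a subset of X
    if not (x & y):
        return "completely non-trivial"  # X and Y share no attribute
    return "non-trivial"                 # Y is not a subset of X, but overlaps it

print(classify_fd({"Eid", "ename"}, {"ename"}))  # trivial
print(classify_fd({"Stud_sapno"}, {"sname"}))    # completely non-trivial
print(classify_fd({"Eid"}, {"Eid", "ename"}))    # non-trivial
```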
2.1 Functional Dependencies
• Functional dependencies (FDs) are used to specify formal measures of
the "goodness" of relational designs
• FDs and keys are used to define normal forms for relations
• FDs are constraints that are derived from the meaning and
interrelationships of the data attributes
• A set of attributes X functionally determines a set of attributes Y if the
value of X determines a unique value for Y
• X -> Y holds if whenever two tuples have the same value for X, they must
have the same value for Y
• For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X],
then t1[Y]=t2[Y]
• X -> Y in R specifies a constraint on all relation instances r(R)
• Written as X -> Y; it can be displayed graphically on a relation schema, where the FD is denoted by an arrow.
• FDs are derived from the real-world constraints on the attributes
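The definition "if t1[X] = t2[X] then t1[Y] = t2[Y]" can be checked on one relation instance with a short Python function (fd_holds is a hypothetical name). Since FDs come from real-world constraints, an instance can only show that an FD is violated; passing the check on one instance does not prove that the FD holds for all instances. The data below reuses the car relation from the spurious-tuples example.

```python
def fd_holds(rows, lhs, rhs):
    """Check X -> Y on one relation instance: equal X-values must give equal Y-values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on X but differ on Y
    return True

cars = [
    {"Id": 123, "Make": "Toyota", "Color": "Blue"},
    {"Id": 456, "Make": "Audi", "Color": "Blue"},
    {"Id": 789, "Make": "Toyota", "Color": "Red"},
]
print(fd_holds(cars, ["Id"], ["Make", "Color"]))  # True
print(fd_holds(cars, ["Make"], ["Color"]))        # False: Toyota is both Blue and Red
```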
Functional Dependencies: Example 1
[Figure: Y is not functionally dependent on X, because the same value of X appears with different values of Y]
X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y.
Functional Dependencies: Example 2
• Employee SSN and project number determine the hours per week that the employee works on the project:
{SSN, PNUMBER} -> HOURS
• Social security number determines employee name:
SSN -> ENAME
• Project number determines project name and location:
PNUMBER -> {PNAME, PLOCATION}
• F denotes the set of FDs that are specified on relation schema R:
F = { SSN -> ENAME,
{SSN, PNUMBER} -> HOURS,
PNUMBER -> {PNAME, PLOCATION} }
2.2 Inference Rules for FDs (1)
• Dependencies can be inferred or deduced from the FDs in F.
• For example, each department has one manager.
— So dno -> mgrno holds.
• Each manager has a unique phone number.
— So mgrno -> mgrphone holds.
• This leads to dno -> mgrphone, an inferred FD.
• So, formally, it is useful to define a concept called closure that includes all possible dependencies that can be inferred from the given set F.
• Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F.
• Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X.
2.2 Inference Rules for FDs
• Armstrong's inference rules:
IR1. (Reflexive) If Y subset-of X, then X -> Y: a set of attributes always determines itself or any of its subsets. IR1 generates dependencies that are always true; such dependencies are known as trivial.
Example: Sid, sname -> sname
The FD X -> Y is trivial if Y ⊆ X; otherwise it is non-trivial.
IR2. (Augmentation) If X -> Y, then XZ -> YZ
Example: Sid -> sname gives Sid, semail -> sname, semail
(Notation: XZ stands for X U Z)
IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z
Example: (sid -> sname) and (sname -> city) give (sid -> city)
IR4. (Decomposition) If X -> YZ, then X -> Y and X -> Z
IR5. (Union) If X -> Y and X -> Z, then X -> YZ
IR6. (Pseudotransitivity) If X -> Y and WY -> Z, then WX -> Z
Inference Rules for FDs: Example 1 (Transitive)
• Each department has one manager
— FD: dept_no -> manager_no
• Each manager has a unique phone no.
— FD: manager_no -> phn_no
• The two dependencies together imply:
— FD: dept_no -> phn_no
This is an inferred functional dependency.
Dept_no  Manager_no  Phn_no
01       101         876665401
02       102         678888802
03       103         765489903
Inference Rules for FDs: Example 2
• F = { Eno -> {Ename, DOB, Address, Dno},
Dno -> {Dname, Dmgrno} }
• Additional FDs that we can infer from F belong to the closure of F, denoted by F+.
• Some members of F+:
Eno -> {Dname, Dmgrno} (Transitive)
Eno -> Eno (Trivial)
Dno -> Dname (Decomposition)
Dno -> Dmgrno (Decomposition)
Inference Rules for FDs: Example 3
• R = (A, B, C, G, H, I)
F = { A -> B
A -> C
CG -> H
CG -> I
B -> H }
• Some members of F+:
— A -> H
– by transitivity from A -> B and B -> H
— AG -> I
– by augmenting A -> C with G, to get AG -> CG, and then transitivity with CG -> I
— CG -> HI
– by union from CG -> H and CG -> I
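Rather than applying Armstrong's rules by hand, a standard shortcut tests whether X -> Y is in F+ by checking whether Y ⊆ X+. Because the rules are sound and complete, both approaches give the same answer. A sketch in Python (helper names are illustrative; attributes are single letters):

```python
def closure(attrs, fds):
    """Attribute closure X+ under the FD list `fds` of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """X -> Y is in F+ exactly when Y is a subset of X+ (computed from F)."""
    return set(rhs) <= closure(lhs, fds)

# R = (A, B, C, G, H, I)
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
for x, y in [("A", "H"), ("AG", "I"), ("CG", "HI"), ("B", "C")]:
    print(f"{x} -> {y}:", implies(F, x, y))
# A -> H: True
# AG -> I: True
# CG -> HI: True
# B -> C: False
```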
Attribute Set Closure: Example 1
• R = (A, B, C, G, H, I)
• F = { A -> B
A -> C
CG -> H
CG -> I
B -> H }
• (AG)+
1. result = AG
2. result = ABCG (A -> C and A -> B)
3. result = ABCGH (CG -> H and CG ⊆ ABCG)
4. result = ABCGHI (CG -> I and CG ⊆ ABCGH)
Therefore, AG is a superkey. It is also a candidate key, because neither proper subset is a superkey: A+ = ABCH and G+ = G.
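A sketch that records each intermediate result of the closure computation, plus a minimality check for candidate keys. The names closure_steps and is_candidate_key are illustrative. This version records a step after every FD that adds attributes, so it shows one more intermediate row (ABG) than the slide, which applies A -> B and A -> C together.

```python
from itertools import combinations

def closure_steps(attrs, fds):
    """Compute X+ and return the result after each FD that adds new attributes."""
    result = set(attrs)
    steps = ["".join(sorted(result))]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                steps.append("".join(sorted(result)))
                changed = True
    return steps

def is_candidate_key(key, schema, fds):
    """Superkey (closure is all of R) and no proper subset is a superkey."""
    full = "".join(sorted(schema))
    if closure_steps(key, fds)[-1] != full:
        return False
    return all(
        closure_steps(sub, fds)[-1] != full
        for size in range(1, len(key))
        for sub in combinations(key, size)
    )

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(closure_steps("AG", F))               # ['AG', 'ABG', 'ABCG', 'ABCGH', 'ABCGHI']
print(is_candidate_key("AG", "ABCGHI", F))  # True
print(closure_steps("A", F)[-1])            # ABCH (no G, so A alone is not a key)
```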
Attribute Set Closure: Example 2
Given the set of functional dependencies F:
F = { Eno -> Ename
Pno -> {Pname, Plocation}
{Eno, Pno} -> Hours }
Find {Eno}+, {Pno}+, {Eno, Pno}+
Sol.: {Eno}+ = {Eno, Ename}
{Pno}+ = {Pno, Pname, Plocation}
{Eno, Pno}+ = {Eno, Ename, Pno, Pname, Plocation, Hours}
3 Normal Forms Based on Primary Keys
3.1 Normalization of Relations
3.2 Practical Use of Normal Forms
3.3 Definitions of Keys and Attributes Participating in Keys
3.4 First Normal Form
3.5 Second Normal Form
3.6 Third Normal Form
3.1 Normalization of Relations (1)
• Normalization: The process of decomposing
unsatisfactory "bad" relations by breaking up their
attributes into smaller relations

• Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form.
3.2 Practical Use of Normal Forms
• Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
• The database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).
• Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
3.3 Definitions of Keys and Attributes Participating in Keys (2)
• If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
• A Prime attribute must be a member of some candidate key.
• A Nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
3.4 First Normal Form
First Normal Form: Example 1
1NF: Violation
First Normal Form
First Normal Form: Example 2
Figure 10.9 Normalizing nested relations into 1NF
1NF also disallows multivalued attributes that themselves are composite. These are called nested relations because each tuple can have a relation within it.
In EMP_PROJ, each tuple represents an employee entity, and a relation PROJS within each tuple represents the employee's projects and the hours per week that the employee works on each project.
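Unnesting can be sketched in Python: the nested PROJS relation is moved into a separate flat relation that carries the employee key. The attribute names Ssn, Pnumber and Hours, the employee values, and the helper unnest are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical nested EMP_PROJ tuples: each employee carries a nested PROJS relation
# of (Pnumber, Hours) pairs.
emp_proj = [
    {"Ssn": "111", "Ename": "Smith", "PROJS": [("P1", 32.5), ("P2", 7.5)]},
    {"Ssn": "222", "Ename": "Wong", "PROJS": [("P2", 10.0)]},
]

def unnest(nested):
    """Flatten to 1NF: an employee relation plus one flat tuple per (employee, project)."""
    emp = sorted({(e["Ssn"], e["Ename"]) for e in nested})
    works_on = [(e["Ssn"], pno, hours) for e in nested for pno, hours in e["PROJS"]]
    return emp, works_on

emp, works_on = unnest(emp_proj)
print(emp)       # [('111', 'Smith'), ('222', 'Wong')]
print(works_on)  # [('111', 'P1', 32.5), ('111', 'P2', 7.5), ('222', 'P2', 10.0)]
```

Every value in both results is atomic, and the key Ssn in works_on links each project row back to its employee.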
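3.5 Second Normal Form
Functional Dependencies
Partial Dependency
Second Normal Form: Example 1
Cust_id  Store_id  Location
1        1         Delhi
1        3         Mumbai
2        1         Delhi
3        2         Bangalore
4        3         Mumbai
{Cust_id, Store_id} -> Location, but Location depends only on Store_id, which is a proper subset of the key (a partial dependency). Decompose into (Cust_id, Store_id) and (Store_id, Location).
• R(A, B, C, D, E, F)
• FD: {C -> F, E -> A, EC -> D, A -> B}
• E and C never appear on the right-hand side, so every candidate key contains EC.
• EC -> A, EC -> B, EC -> D, EC -> F
• EC+ = (E, C, F, D, B, A), so EC is the candidate key
• Prime attributes: (E, C)
• Non-prime attributes: (A, B, D, F)
• Condition for a partial dependency: the LHS of an FD is a proper subset of a candidate key and the RHS is a non-prime attribute.
• EC as a whole is a subset of the key, while E and C alone are proper subsets. Hence C -> F and E -> A are partial dependencies, and R is not in 2NF.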
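The partial-dependency condition can be checked mechanically by computing the closure of every proper subset of the candidate key. This Python sketch assumes a single candidate key (as in this example) and uses the illustrative helper name partial_dependencies.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under the FD list `fds` of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, nonprime, fds):
    """Return (S, a) pairs where S is a proper subset of `key` and S -> a for a non-prime a."""
    found = []
    for size in range(1, len(key)):
        for sub in combinations(key, size):
            for a in sorted(closure(sub, fds) & set(nonprime)):
                found.append(("".join(sub), a))
    return found

# R(A,B,C,D,E,F) with F = {C -> F, E -> A, EC -> D, A -> B}; candidate key EC
F = [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]
print(partial_dependencies("EC", "ABDF", F))  # [('E', 'A'), ('E', 'B'), ('C', 'F')]
```

Working with closures rather than only the listed FDs also reveals E -> B, which follows from E -> A and A -> B by transitivity. Only D is fully dependent on EC.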
Second Normal Form: Example 2
2NF: EXAMPLE 3
• Consider the table below:
• The table describes parcels of land for sale in various counties of a state.
• Two candidate keys: Property_id#, {County_name, Lot#}
• Property_id# is the primary key.
• Lot# values are unique only within a county.
• Property_id# is unique throughout the state.
• Total FDs:
1. Property_id# -> {County_name, Lot#, Area, Price, Tax_rate}
2. {County_name, Lot#} -> {Property_id#, Area, Price, Tax_rate}
3. County_name -> Tax_rate // the tax rate is fixed for a given county
4. Area -> Price // the price of a lot is determined by its area, regardless of which county it is in
Does the relation satisfy 2NF? No: Tax_rate is dependent on Property_id#, but it is not fully functionally dependent on the candidate key {County_name, Lot#}; it is partially dependent on County_name alone (FD 3). So decompose the table.
[Figure: 2NF Example 3, the decomposed tables]
3.6 Third Normal Form
Transitive Dependency
3NF: EXAMPLE 1
Roll No  State    City
1        Punjab   Mohali
2        Haryana  Ambala
3        Punjab   Mohali
4        Bihar    Patna
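3NF condition: for each non-trivial FD X -> Y, X must be a candidate key or superkey, OR Y must be a prime attribute.
• R(A, B, C, D) with FD: AB -> C, C -> D is not in 3NF:
– AB+ = ABCD, so CK = (AB)
– Prime attributes: (A, B); non-prime attributes: (C, D)
– C -> D violates 3NF: C is not a superkey and D is non-prime (the transitive dependency AB -> C -> D).
• R(A, B, C, D) with FD: AB -> CD, D -> A
– Right-hand-side attributes are A, D, C; B never appears on the right, so every key contains B
– B+ = B
– AB+ = ABCD and DB+ = DBAC, so the candidate keys are AB and DB
– Prime attributes: (A, B, D); non-prime attribute: (C)
– AB -> CD has a superkey on the left, and in D -> A the right side A is prime, so R is in 3NF (but not in BCNF, since D is not a superkey).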
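The 3NF condition can be tested with a Python sketch that finds candidate keys by brute force and then inspects each listed FD. The helper names are illustrative, attributes are single letters, and the FDs are checked exactly as listed, which is how the slide examples are worked.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under the FD list `fds` of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(set(k) <= set(combo) for k in keys):
                continue
            if closure(combo, fds) == set(schema):
                keys.append("".join(combo))
    return keys

def violates_3nf(schema, fds):
    """Return FDs X -> a that break 3NF: a not in X, X not a superkey, a non-prime."""
    prime = set("".join(candidate_keys(schema, fds)))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == set(schema)
        for a in rhs:
            if a not in lhs and not is_superkey and a not in prime:
                bad.append((lhs, a))
    return bad

print(violates_3nf("ABCD", [("AB", "C"), ("C", "D")]))   # [('C', 'D')]
print(violates_3nf("ABCD", [("AB", "CD"), ("D", "A")]))  # []
```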
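Boyce-Codd Normal Form (BCNF)
• A relation is in BCNF if it is in 3NF and the LHS of each FD is a candidate key or superkey.
Rno  Name  Vid    Age
1    AB    AB123  20
2    CD    CD304  21
3    AB    AB786  23
4    PQ    PQ286  21
FD: Rno -> Name
Rno -> Vid
Vid -> Age
Vid -> Rno
Candidate keys: Rno and Vid (each on its own). The LHS of every FD is a candidate key, so the relation is in BCNF.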
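The BCNF test can be sketched in Python: every non-trivial FD must have a superkey on its left side. Attribute names are given as tuples of strings here; the helper bcnf_violations is an illustrative name, not from the slides.

```python
def closure(attrs, fds):
    """Attribute closure X+ under the FD list `fds` of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Non-trivial FDs X -> Y whose left side X is not a superkey of the schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(schema)
    ]

schema = {"Rno", "Name", "Vid", "Age"}
F = [(("Rno",), ("Name",)), (("Rno",), ("Vid",)), (("Vid",), ("Age",)), (("Vid",), ("Rno",))]
print(bcnf_violations(schema, F))  # []: every LHS (Rno or Vid) is a key

# R(A,B,C,D) with AB -> CD, D -> A is in 3NF but not in BCNF:
print(bcnf_violations(set("ABCD"), [("AB", "CD"), ("D", "A")]))  # [('D', 'A')]
```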
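Decomposition
• Lossless/Lossy Decomposition
R                 R1          R2
A  B  C           A  B        B  C
1  2  1           1  2        2  1
2  2  2           2  2        2  2
3  3  2           3  3        3  2
R1 ⋈ R2 (the rows of R1 × R2 that agree on B):
A  B  C
1  2  1
1  2  2
2  2  1
2  2  2
3  3  2
• The join returns five tuples instead of three; (1, 2, 2) and (2, 2, 1) are spurious, so decomposing R(A, B, C) into R1(A, B) and R2(B, C) is lossy. The common attribute B is not a key of R1 or R2.
• Conditions for a lossless decomposition of R into R1 and R2:
– The common attribute(s) should be a CK/SK of R1 or R2 or both
– R1 U R2 = R, e.g. (AB) U (AC) = (ABC)
– R1 ∩ R2 ≠ ∅, e.g. (AB) ∩ (AC) = (A)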
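The instance above can be checked in Python by projecting R, joining the projections back, and comparing with R. This is a sketch with illustrative helper names; note that a check on one instance can only demonstrate lossiness for that data, whereas the key condition on the common attributes guarantees losslessness for every instance.

```python
def project(rows, attrs):
    """Projection: each row becomes a tuple of (name, value) pairs; duplicates vanish."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join of two projected relations on their common attribute names."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(tuple(sorted({**ld, **rd}.items())))
    return out

# Instance of R(A, B, C) from the slide above
R = [{"A": 1, "B": 2, "C": 1}, {"A": 2, "B": 2, "C": 2}, {"A": 3, "B": 3, "C": 2}]
original = {tuple(sorted(row.items())) for row in R}

for r1, r2 in [("AB", "BC"), ("AB", "AC")]:
    joined = natural_join(project(R, r1), project(R, r2))
    verdict = "lossless" if joined == original else f"lossy, {len(joined - original)} spurious"
    print(r1, r2, verdict)
# AB BC lossy, 2 spurious
# AB AC lossless
```

The (AB), (AC) split is lossless on this instance because the common attribute A happens to be unique here.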

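• R(A, B, C, D, E, F)
• FD: AB -> C, C -> DE, E -> F, F -> A
• Check the highest normal form.
• CK = (AB, FB, EB, CB)
• PA = (A, B, C, E, F)
• NPA = (D)
• C -> D (from C -> DE) is a partial dependency: C is a proper subset of the key CB and D is non-prime. Therefore:
• Not BCNF
• Not 3NF
• Not 2NF
• 1NF (highest normal form)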
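The whole check can be combined into one sketch, highest_normal_form (a hypothetical helper), which assumes R is already in 1NF and tests 2NF, 3NF and BCNF in turn using candidate keys found by brute force. It checks the FDs as listed (plus closures of key subsets for 2NF), matching how the slide examples are worked; it is not an optimized algorithm.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under the FD list `fds` of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(set(k) <= set(combo) for k in keys):
                continue
            if closure(combo, fds) == set(schema):
                keys.append("".join(combo))
    return keys

def highest_normal_form(schema, fds):
    """Return '1NF', '2NF', '3NF' or 'BCNF'; R itself is assumed to be in 1NF."""
    full = set(schema)
    keys = candidate_keys(schema, fds)
    prime = set("".join(keys))
    nonprime = full - prime
    # 2NF: no proper subset of a candidate key may determine a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for sub in combinations(key, size):
                if closure(sub, fds) & nonprime:
                    return "1NF"
    # 3NF / BCNF: inspect each non-trivial part X -> a of the listed FDs.
    in_bcnf = True
    for lhs, rhs in fds:
        if closure(lhs, fds) == full:
            continue  # X is a superkey: fine for both 3NF and BCNF
        for a in set(rhs) - set(lhs):
            in_bcnf = False
            if a not in prime:
                return "2NF"  # X not a superkey and a is non-prime
    return "BCNF" if in_bcnf else "3NF"

F = [("AB", "C"), ("C", "DE"), ("E", "F"), ("F", "A")]
print(candidate_keys("ABCDEF", F))       # ['AB', 'BC', 'BE', 'BF']
print(highest_normal_form("ABCDEF", F))  # 1NF
```

On the 3NF examples, highest_normal_form("ABCD", [("AB", "C"), ("C", "D")]) returns '2NF' and highest_normal_form("ABCD", [("AB", "CD"), ("D", "A")]) returns '3NF'.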
3NF: EXAMPLE 2
Do the tables above satisfy 3NF?
In LOTS1, the FD Area -> Price violates 3NF.
According to the general definition of 3NF: Area is not a superkey, and Price is not a prime attribute.
According to the basic (primary-key-based) definition of 3NF: FD4 violates 3NF because a transitive dependency holds:
Property_id# -> Area -> Price
So decompose the table to satisfy 3NF.
[Figure: 3NF Example 2, the decomposed tables]
4 BCNF (Boyce-Codd Normal Form)
• A relation schema R is in Boyce-Codd Normal Form
(BCNF) if whenever an FD X -> A holds in R, then X is
a superkey of R
• Each normal form is strictly stronger than the previous one
— Every 2NF relation is in 1NF
— Every 3NF relation is in 2NF
— Every BCNF relation is in 3NF
• There exist relations that are in 3NF but not in BCNF
• The goal is to have each relation in BCNF (or 3NF)
BCNF: Example 1
BCNF: Example 2
• Suppose we have thousands of lots in the relation.
• The lots are from only two counties: Dekalb and Fulton.
• Suppose also that lot sizes in Dekalb county are only 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0 acres, whereas lot sizes in Fulton county are 1.1, 1.2, …, 1.9, 2.0 acres.
• In such a situation we will have an additional FD:
— Area -> County_name
— If we add the above FD to LOTS1A, the relation still satisfies 3NF (County_name is a prime attribute, being part of the candidate key {County_name, Lot#}) but not BCNF, because Area is not a superkey.
— So decompose the relation so that it satisfies BCNF.
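A BCNF decomposition can be sketched in Python by repeatedly splitting a relation on a violating FD X -> X+ into (X+) and (R minus (X+ minus X)). Assumptions made here: LOTS1A is taken to have the attributes Property_id#, County_name, Lot# and Area, and the helper names find_violation and bcnf_decompose are illustrative. The search is brute force over attribute subsets, so it only suits small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under the FD list `fds` of (lhs, rhs) attribute tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(rel, fds):
    """Find X inside `rel` that determines more than itself but is not a superkey of `rel`."""
    for size in range(1, len(rel)):
        for x in combinations(sorted(rel), size):
            xp = closure(x, fds) & rel  # X+ restricted to this relation
            if xp != set(x) and xp != rel:
                return set(x), xp
    return None

def bcnf_decompose(schema, fds):
    done, todo = [], [set(schema)]
    while todo:
        rel = todo.pop()
        v = find_violation(rel, fds)
        if v is None:
            done.append(rel)
        else:
            x, xp = v
            todo.append(xp)              # R1 = X+ (within rel)
            todo.append(rel - (xp - x))  # R2 = rel minus what X determines
    return done

F = [
    (("Property_id#",), ("County_name", "Lot#", "Area")),
    (("County_name", "Lot#"), ("Property_id#", "Area")),
    (("Area",), ("County_name",)),
]
lots1a = {"Property_id#", "County_name", "Lot#", "Area"}
parts = bcnf_decompose(lots1a, F)
print(sorted(sorted(p) for p in parts))
# [['Area', 'County_name'], ['Area', 'Lot#', 'Property_id#']]
```

The split is lossless, but the FD {County_name, Lot#} -> Property_id# can no longer be checked within a single relation, since no resulting relation contains both County_name and Lot#; BCNF decompositions can lose dependencies in this way.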
BCNF: Example 2
Multivalued Attributes
Partial Dependency
Transitive Dependency
Normalization
Normal Forms Review
1NF: Example 1
Example 1
