Dbms Unit 51
Chapter Outline
1 Informal Design Guidelines for Relational Databases
1.1 Semantics of the Relation Attributes
1.2 Redundant Information in Tuples and Update Anomalies
1.3 Null Values in Tuples
1.4 Spurious Tuples
2 Functional Dependencies (FDs)
2.1 Definition of FD
2.2 Inference Rules for FDs
3 Normal Forms Based on Primary Keys
3.1 Normalization of Relations
3.2 Practical Use of Normal Forms
3.3 Definitions of Keys and Attributes Participating in Keys
3.4 First Normal Form
3.5 Second Normal Form
3.6 Third Normal Form
4 BCNF (Boyce-Codd Normal Form)
1.1 Semantics of the Relation Attributes
GUIDELINE 1: Informally, each tuple in a relation should represent one entity
or relationship instance. (Applies to individual relations and their
attributes).
Insertion Anomaly
Deletion Anomaly
Update Anomaly
Data Anomalies: Example
Redundant data is where we have stored the same ‘information’
more than once. i.e., the redundant data could be removed
without the loss of information.
Example: The following relation contains staff and department details:
staffNo  job       dept  dname       city
SL10     Salesman  10    Sales       Stratford
SA51     Manager   20    Accounts    Barking
DS40     Clerk     20    Accounts    Barking
OS45     Clerk     30    Operations  Barking
Such ‘redundancy’ could lead to the following ‘anomalies’:
Insert Anomaly: We can’t insert a dept without inserting a member of
staff that works in that department
Deletion Anomaly: By removing employee SL10 we have removed all
information pertaining to the Sales dept.
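The deletion and update anomalies above can be reproduced in a minimal Python sketch. It assumes the staff table is held as a list of dicts; the names `staff`, `departments`, `lost` and `rows_to_update` are illustrative and do not come from the notes.

```python
# Staff/department rows from the example, one dict per tuple.
staff = [
    {"staffNo": "SL10", "job": "Salesman", "dept": 10, "dname": "Sales", "city": "Stratford"},
    {"staffNo": "SA51", "job": "Manager", "dept": 20, "dname": "Accounts", "city": "Barking"},
    {"staffNo": "DS40", "job": "Clerk", "dept": 20, "dname": "Accounts", "city": "Barking"},
    {"staffNo": "OS45", "job": "Clerk", "dept": 30, "dname": "Operations", "city": "Barking"},
]


def departments(rows):
    """Return the set of department names still recorded in the rows."""
    return {row["dname"] for row in rows}


# Deletion anomaly: removing SL10 also removes the only record of Sales.
after_delete = [row for row in staff if row["staffNo"] != "SL10"]
lost = departments(staff) - departments(after_delete)

# Update anomaly: changing department 20 must touch every row that repeats it.
rows_to_update = [row for row in staff if row["dept"] == 20]

print(lost)                 # {'Sales'}
print(len(rows_to_update))  # 2
```

Splitting the data into a staff relation and a department relation would store each department once, so neither effect would occur.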
Example Of Insert & Delete Anomaly
Consider the relation:
EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours)
• Insert Anomaly: Cannot insert a project unless an employee is assigned
to it.
Conversely, cannot insert an employee unless he/she is assigned to a
project.
• Delete Anomaly: When a project is deleted, it will result in deleting all
the employees who work on that project. Alternately, if an employee is
the sole employee on a project, deleting that employee would result in
deleting the corresponding project.
• Update Anomaly: Changing the name of project number P1 from
“Billing” to “Customer-Accounting” may cause this update to be made for
all 100 employees working on project P1.
1.3 Null Values in Tuples
GUIDELINE 3: Relations should be designed such that their tuples will have as
few NULL values as possible
• Attributes that are NULL frequently could be placed in separate relations
(with the primary key)
• Problems with NULL values:
— Can waste space at storage level.
— Applying aggregate operations like count or sum becomes difficult.
— For example, consider the table below:
— Student(stno, name, address, phn_no, marks, scholarship_amount,
scholarship_date). Suppose there are 500 students in total but the
college gives a scholarship to only the first three students. The
scholarship attributes then apply to only three records; for the
remaining 497 records they are NULL, wasting space. It is better to keep
the scholarship attributes in a separate table.
• Reasons for nulls:
— attribute not applicable or invalid
— attribute value unknown (may exist)
— value known to exist, but unavailable
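Why aggregates become awkward with NULLs can be seen in a small sketch using Python's built-in `sqlite3` module. The table is a cut-down, hypothetical version of the Student relation above (five rows instead of 500, and made-up amounts), so only the behaviour of the aggregates matters here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (stno INTEGER PRIMARY KEY, name TEXT, scholarship_amount REAL)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        (1, "A", 1000.0),
        (2, "B", 2000.0),
        (3, "C", 3000.0),
        (4, "D", None),  # no scholarship: stored as NULL
        (5, "E", None),
    ],
)

# COUNT(*) counts rows, COUNT(column) and AVG(column) skip NULL values.
total_rows, non_null, average = conn.execute(
    "SELECT COUNT(*), COUNT(scholarship_amount), AVG(scholarship_amount) FROM student"
).fetchone()

print(total_rows)  # 5
print(non_null)    # 3
print(average)     # 2000.0 (NULL rows are skipped, not treated as 0)
conn.close()
```

The two counts disagree and the average is taken over three rows only, which is easy to misread as an average over all students.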
1.4 Spurious Tuples
GUIDELINE 4:
— Design relation schemas so that they can be joined with equality
conditions on attributes that are either primary keys or foreign keys in
a way that guarantees that no spurious tuples are generated.
• Avoid relations that contain matching attributes that are not (foreign key,
primary key) combinations, because joining on such attributes may
produce spurious tuples.
• A spurious tuple is a record that was not in the original data but gets
created when two tables are joined badly.
• Bad designs for a relational database may produce erroneous results for
certain JOIN operations.
• Smaller relations are generally preferred: a big relation can be
decomposed into two or more smaller relations.
• Problem: When the smaller relations are joined together you should get
exactly the same bigger relation.
Spurious Tuples
• For example, consider the relation Car(Id, Make, Color).
Original table Car
Id   Make    Color
123  Toyota  Blue
456  Audi    Blue
789  Toyota  Red
Table with spurious tuples (tuples marked with * are spurious)
Id   Color  Make
123  Blue   Toyota
123  Blue   Audi    *
456  Blue   Toyota  *
456  Blue   Audi
789  Red    Toyota
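The five-row result can be reproduced with a short Python sketch. The notes do not say how Car was decomposed; the sketch assumes it was split into (Id, Color) and (Color, Make) and joined back on Color, which matches the column order of the joined table.

```python
# Original Car relation as a set of (Id, Make, Color) tuples.
car = {(123, "Toyota", "Blue"), (456, "Audi", "Blue"), (789, "Toyota", "Red")}

# Assumed decomposition: project onto (Id, Color) and (Color, Make).
id_color = {(i, c) for (i, m, c) in car}
color_make = {(c, m) for (i, m, c) in car}

# Natural join on Color, which is not a key of either projection.
joined = {
    (i, m, c)
    for (i, c) in id_color
    for (c2, m) in color_make
    if c == c2
}

# Tuples in the join that were never in the original relation.
spurious = joined - car
print(len(joined))       # 5
print(sorted(spurious))  # [(123, 'Audi', 'Blue'), (456, 'Toyota', 'Blue')]
```

Joining on Id instead (the primary key) would return exactly the original three tuples, which is what Guideline 4 asks for.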
Closure Method
• It is used to find candidate keys.
• Functional dependency: X -> Y (Y is determined by X; X is the
determinant and Y is the dependent)
• Example: R(A,B,C,D)
• FD(A->B, B->C, C->D)
• A+=(A,B,C,D)
• B+=(B,C,D)
• D+=(D)
• Candidate key: A. Prime attribute (A), non-prime attributes (B,C,D)
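The closure computation can be written as a short loop: keep adding the right-hand side of every FD whose left-hand side is already in the result, until nothing changes. This is a minimal sketch; the function name `closure` and the encoding of FDs as pairs of strings of single-letter attributes are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is a string of
    single-letter attribute names, e.g. ("BC", "D") for BC -> D.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs if lhs is covered and rhs adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [("A", "B"), ("B", "C"), ("C", "D")]
print(sorted(closure("A", fds)))  # ['A', 'B', 'C', 'D']
print(sorted(closure("B", fds)))  # ['B', 'C', 'D']
print(sorted(closure("D", fds)))  # ['D']
```

Only A+ covers every attribute of R(A,B,C,D), so A is the only single-attribute key.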
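• R(A,B,C,D,E)
• FD=(A->B,BC->D,E->C,D->A)
• Attributes on the right-hand sides = (B,D,C,A). E never appears on a
right-hand side, so E must be part of every candidate key.
• E+=(E,C), so E alone is not a key
• AE+=(A,E,C,B,D), so AE is a candidate key (to find further keys, check
whether A or E appears on the right side of an FD: A appears in D->A, so
A can be replaced by D)
• DE+=(D,E,A,B,C)
• BE+=(B,E,C,D,A)
• CE+=(C,E), so CE is not a key
• Candidate keys (AE,DE,BE), prime(A,B,D,E), NPA(C)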
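• R = ABCDE, F = {A -> BE, C -> BE, B -> D}
• A->BE splits into A->B and A->E
• C->BE splits into C->B and C->E
• Attributes on the right-hand sides = (B,E,D), so A and C must be in
every candidate key
• AC+=(A,C,B,E,D) (Candidate key=AC)
• R = ABCD, F={AB -> C, BC -> D, CD -> A}
• Attributes on the right-hand sides = (C,D,A), so B must be in every
candidate key
• B+=(B)
• AB+=(A,B,C,D)
• CB+=(C,B,D,A)
• DB+=(D,B)
• Candidate keys=AB,BC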
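The three worked examples can be cross-checked by brute force: try attribute sets from smallest to largest and keep those whose closure is the whole relation and that contain no smaller key. This is a sketch for small schemas only, since it enumerates all subsets; `closure` and `candidate_keys` are illustrative names.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    all_attrs = set(relation)
    keys = []
    # Smaller sets first, so any superset of a found key can be skipped.
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a key already, so it is not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return ["".join(sorted(key)) for key in keys]


print(candidate_keys("ABCDE", [("A", "B"), ("BC", "D"), ("E", "C"), ("D", "A")]))
# ['AE', 'BE', 'DE']
print(candidate_keys("ABCDE", [("A", "BE"), ("C", "BE"), ("B", "D")]))
# ['AC']
print(candidate_keys("ABCD", [("AB", "C"), ("BC", "D"), ("CD", "A")]))
# ['AB', 'BC']
```

The shortcut used in the notes (attributes that never appear on a right-hand side must be in every key) prunes this search considerably when done by hand.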
Functional Dependency
X->Y (X is the determinant and Y is the dependent)
Example: Eid->Ename
Trivial FD and Non-trivial FD
Trivial FD
X->Y is trivial when Y is a subset of X
Eid->Eid (reflexive property of FD)
Eid,Ename->Ename
X intersect Y is never empty
(Eid,Ename) intersection (Ename) = (Ename)
Non-trivial FD
X->Y is non-trivial when Y is not a subset of X. In the examples below
(X intersection Y) is empty, which makes them completely non-trivial.
Stud_sapno->sname
Emp_code->empname
Stud_sapno->semester
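The subset test for trivial FDs is a one-liner in Python. This is a small sketch; `is_trivial` is an illustrative name and attribute sets are passed as Python sets of attribute names.

```python
def is_trivial(lhs, rhs):
    """An FD lhs -> rhs is trivial when rhs is a subset of lhs."""
    return set(rhs) <= set(lhs)


print(is_trivial({"Eid"}, {"Eid"}))             # True
print(is_trivial({"Eid", "Ename"}, {"Ename"}))  # True
print(is_trivial({"Eid"}, {"Ename"}))           # False
print(is_trivial({"Stud_sapno"}, {"sname"}))    # False
```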
2.1 Functional Dependencies
• Functional dependencies (FDs) are used to specify formal measures of
the "goodness" of relational designs
• FDs and keys are used to define normal forms for relations
• FDs are constraints that are derived from the meaning and
interrelationships of the data attributes
• A set of attributes X functionally determines a set of attributes Y if the
value of X determines a unique value for Y
• X -> Y holds if whenever two tuples have the same value for X, they must
have the same value for Y
• For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X],
then t1[Y]=t2[Y]
• X -> Y in R specifies a constraint on all relation instances r(R)
• Written as X -> Y; can be displayed graphically on a relation schema as in
the figures (denoted by an arrow from X to Y).
• FDs are derived from the real-world constraints on the attributes
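The tuple-based definition (if t1[X] = t2[X] then t1[Y] = t2[Y]) translates directly into a check over one relation instance. This is a sketch only: the function name `fd_holds` and the sample rows are hypothetical, and a single instance can refute an FD but cannot prove that it holds for every instance r(R).

```python
def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, else returns the old y.
        if seen.setdefault(x, y) != y:
            return False  # same X value, different Y value
    return True


# Hypothetical employee rows, used only to exercise the check.
rows = [
    {"Eid": 1, "Ename": "Asha", "Dept": "Sales"},
    {"Eid": 2, "Ename": "Ravi", "Dept": "Sales"},
    {"Eid": 3, "Ename": "Asha", "Dept": "Accounts"},
]
print(fd_holds(rows, ["Eid"], ["Ename"]))   # True
print(fd_holds(rows, ["Ename"], ["Dept"]))  # False
```

Ename -> Dept fails because two tuples agree on Ename ("Asha") but differ on Dept.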
Functional Dependencies: Example 1
Not functionally dependent, as there are different values of Y for the
same X
01 101 876665401
02 102 678888802
03 103 765489903
Inference Rules for FDs: Example 2
• F = {Eno -> {Ename, DOB, Address, Dno},
Dno -> {Dname, Dmgrno}}
• Additional FDs that we can infer from F belong to the closure of F,
denoted by F+
• Some members of F+:
Eno -> {Dname, Dmgrno} (Transitive)
Eno -> Eno (Trivial)
Dno -> Dname (Decomposition)
Dno -> Dmgrno (Decomposition)
Inference Rules for FDs: Example 3
• R = (A, B, C, G, H, I)
• F = {A -> B,
A -> C,
CG -> H,
CG -> I,
B -> H}
• Some members of F+
— A -> H
– by transitivity from A -> B and B -> H
— AG -> I
– by augmenting A -> C with G, to get AG -> CG,
and then transitivity with CG -> I
— CG -> HI
– by the union rule applied to CG -> H and CG -> I
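Instead of chaining inference rules by hand, membership in F+ can be tested with attribute closure: X -> Y follows from F exactly when Y is contained in X+. The sketch below applies this to the three derived FDs; `closure` and `implies` are illustrative names.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F implies lhs -> rhs exactly when rhs is inside the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(implies(F, "A", "H"))    # True  (transitivity)
print(implies(F, "AG", "I"))   # True  (augmentation, then transitivity)
print(implies(F, "CG", "HI"))  # True  (union)
print(implies(F, "B", "A"))    # False (not in F+)
```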
Attribute Set Closure: Example1
• R = (A, B, C, G, H, I)
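• F = {A -> B,
A -> C,
CG -> H,
CG -> I,
B -> H}
• (AG)+
1. result = AG
2. result = ABCG (A -> C and A -> B)
3. result = ABCGH (CG -> H and CG ⊆ AGBC)
4. result = ABCGHI (CG -> I and CG ⊆ AGBCH)
(AG)+ contains every attribute of R, so AG is a super key. Neither
A+ = ABCH nor G+ = G covers R, therefore AG is a candidate key.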
Attribute Set Closure: Example2
Given set of Functional dependencies F:
F = { Eno -> Ename,
Pno -> {Pname, Plocation},
{Eno, Pno} -> Hours }
Find {Eno}+, {Pno}+, {Eno, Pno}+
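One way to check answers to this exercise is to run the closure loop on sets of attribute names. This is a sketch under the assumption that each FD is encoded as a pair of Python sets; the function and variable names are illustrative.

```python
def closure(attrs, fds):
    """Closure of a set of attribute names under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


F = [
    ({"Eno"}, {"Ename"}),
    ({"Pno"}, {"Pname", "Plocation"}),
    ({"Eno", "Pno"}, {"Hours"}),
]
print(sorted(closure({"Eno"}, F)))
# ['Ename', 'Eno']
print(sorted(closure({"Pno"}, F)))
# ['Plocation', 'Pname', 'Pno']
print(sorted(closure({"Eno", "Pno"}, F)))
# ['Ename', 'Eno', 'Hours', 'Plocation', 'Pname', 'Pno']
```

Only {Eno, Pno}+ contains all six attributes, so {Eno, Pno} is the key of this relation.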
1NF: Violation
First Normal Form
First Normal Form: Example 2
Figure 10.9 Normalizing nested relations into 1NF
1NF also disallows multivalued attributes that are themselves composite.
3.5 Second Normal Form
Functional Dependencies
Partial Dependency
Second Normal Form: Example 1
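Cust_id  Store_id  Location
1        1         Delhi
1        3         Mumbai
2        1         Delhi
3        2         Bangalore
4        3         Mumbai
• Candidate key: (Cust_id, Store_id), so Cust_id, Store_id -> Location
• Location depends on Store_id alone (Store_id -> Location), which is a
partial dependency
• Decompose into (Cust_id, Store_id) and (Store_id, Location)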
• R(A,B,C,D,E,F)
• FD(C->F, E->A, EC->D, A->B)
• EC+=(E,C,F,D,B,A), so EC is the candidate key and
(EC->A, EC->B, EC->D, EC->F) all hold
• Prime Attributes (E,C)
• Non-Prime Attributes (A,B,D,F)
• *Condition to check for a partial dependency X->Y: the LHS should be a
proper subset of a candidate key and the RHS should be a non-prime attribute
• (EC) together is a subset of the key but not a proper subset; E and C
are its proper subsets
• C->F and E->A are therefore partial dependencies, so R is not in 2NF
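The partial-dependency condition can be checked mechanically: take every proper subset of the candidate key, compute its closure, and report any non-prime attribute it reaches. This is a sketch; `partial_dependencies` is an illustrative name, and because it works on closures it also reports dependencies that hold only transitively.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, non_prime, fds):
    """List (X, A): X a proper subset of key, A a non-prime attribute in X+."""
    found = []
    for size in range(1, len(key)):  # sizes 1 .. len(key)-1 are proper subsets
        for combo in combinations(sorted(key), size):
            determined = closure(combo, fds) & set(non_prime)
            for attr in sorted(determined):
                found.append(("".join(combo), attr))
    return found


fds = [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]
print(partial_dependencies("EC", "ABDF", fds))
# [('C', 'F'), ('E', 'A'), ('E', 'B')]
```

E -> B appears because E -> A and A -> B give it by transitivity; D is the only non-prime attribute that needs the full key EC.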
Second Normal Form: Example 2
2NF: EXAMPLE 3
• Consider the table below:
• Total FDs:
1. Propertyid# -> {County_name, Lot#, Area, Price, Tax_rate}
2. {County_name, Lot#} -> {Propertyid#, Area, Price, Tax_rate}
3. County_name -> Tax_rate //tax rate is fixed for a given county
4. Area -> Price //price of a lot is determined by its area
regardless of which county it is in
Does relation satisfy 2NF?
2NF: EXAMPLE 3
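• Consider the relation below:
R(A,B,C,D)
FD: AB->CD, D->A
Attributes on the right-hand sides = (A,D,C), so B must be in every
candidate key
B+=(B)
AB+=(A,B,C,D)
DB+=(D,B,A,C)
Candidate keys = (AB, DB)
PA=(A,B,D) NPA=(C)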
Boyce-Codd Normal Form (BCNF)
• The relation should be in 3NF, and the LHS of every non-trivial FD should
be a candidate key or super key
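The BCNF condition can be tested with closures: a left-hand side is a super key exactly when its closure is the whole relation. The sketch below applies this to the relation R(A,B,C,D) with AB->CD and D->A used earlier in these notes; `bcnf_violations` is an illustrative name.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    all_attrs = set(relation)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != all_attrs
    ]


print(bcnf_violations("ABCD", [("AB", "CD"), ("D", "A")]))
# [('D', 'A')]
print(bcnf_violations("ABCD", [("A", "B"), ("B", "C"), ("C", "D")]))
# [('B', 'C'), ('C', 'D')]
```

An empty list means every given FD has a super key on the left, so the relation is in BCNF with respect to that FD set.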
• For a lossless decomposition of R into R1 and R2, the common attribute
should be a CK/SK of either R1 or R2 or both
• R1 U R2 = R, e.g. (AB) U (AC) = (ABC)
• R1 n R2 != empty, e.g. (AB) n (AC) = (A)
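These conditions for a two-way decomposition can be sketched in Python: the union must give back R, the intersection must be non-empty, and the closure of the common attributes must cover all of R1 or all of R2. The FD A -> B used in the example is hypothetical, since the notes give only the attribute sets, and `is_lossless` is an illustrative name.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(relation, r1, r2, fds):
    """Test the three conditions for decomposing relation into r1 and r2."""
    if set(r1) | set(r2) != set(relation):
        return False  # R1 U R2 must equal R
    common = set(r1) & set(r2)
    if not common:
        return False  # R1 n R2 must not be empty
    # Common attributes must be a super key of R1 or of R2.
    common_closure = closure(common, fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


# Hypothetical FD A -> B on R(A, B, C).
print(is_lossless("ABC", "AB", "AC", [("A", "B")]))  # True
print(is_lossless("ABC", "AB", "BC", [("A", "B")]))  # False
```

In the second call the common attribute B determines neither A nor C, so joining the two parts back could produce spurious tuples.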
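• R(A,B,C,D,E,F)
• FD: AB->C, C->DE, E->F, F->A
• Check the highest normal form.
• CK=(AB,FB,EB,CB)
• PA=(A,B,C,E,F)
• NPA=(D)
• Not BCNF: in C->DE, C is not a super key
• Not 3NF: in C->D, C is not a super key and D is non-prime
• Not 2NF: C is a proper subset of the candidate key CB and C->D with D
non-prime (partial dependency)
• 1NF (Highest Normal Form)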
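The whole check can be sketched end to end in Python: find the candidate keys by brute force, derive the prime attributes, then test the 2NF, 3NF and BCNF conditions in turn. This is a sketch for small schemas under the assumption that the relation is already in 1NF; all function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    all_attrs = set(relation)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # superset of a key, so not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


def highest_normal_form(relation, fds):
    """Return '1NF', '2NF', '3NF' or 'BCNF' (1NF is assumed to hold)."""
    all_attrs = set(relation)
    keys = candidate_keys(relation, fds)
    prime = set().union(*keys)
    non_prime = all_attrs - prime

    def is_superkey(attrs):
        return closure(attrs, fds) == all_attrs

    # 2NF: no proper subset of a candidate key determines a non-prime attribute.
    in_2nf = all(
        not (closure(subset, fds) & non_prime)
        for key in keys
        for size in range(1, len(key))
        for subset in combinations(sorted(key), size)
    )
    # 3NF: for each FD, the LHS is a super key or every new RHS attribute is prime.
    in_3nf = all(
        is_superkey(lhs) or (set(rhs) - set(lhs)) <= prime for lhs, rhs in fds
    )
    # BCNF: every non-trivial FD has a super key on the left.
    in_bcnf = all(is_superkey(lhs) or set(rhs) <= set(lhs) for lhs, rhs in fds)

    if not in_2nf:
        return "1NF"
    if not in_3nf:
        return "2NF"
    if not in_bcnf:
        return "3NF"
    return "BCNF"


print(highest_normal_form("ABCDEF", [("AB", "C"), ("C", "DE"), ("E", "F"), ("F", "A")]))
# 1NF
print(highest_normal_form("ABCD", [("AB", "CD"), ("D", "A")]))
# 3NF
```

The first call agrees with the hand analysis above; the second applies the same check to the earlier R(A,B,C,D) example with AB->CD and D->A.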
3NF: EXAMPLE 2