Week8 DBMS
Normalization:
Good relation schema design
The goodness of a relation schema can be assessed at two levels:
⚫ Logical level
⚫ Implementation level
The first is the logical (or conceptual) level—how users interpret the relation schemas
and the meaning of their attributes. Having good relation schemas at this level enables users
to understand clearly the meaning of the data in the relations, and hence to formulate their
queries correctly. The second is the implementation (or physical storage) level—how the
tuples in a base relation are stored and updated. This level applies only to schemas of base
relations—which will be physically stored as files—whereas at the logical level we are
interested in schemas of both base relations and views (virtual relations).
Database design may be performed using two approaches: bottom-up or top-down. A bottom-up
design methodology considers the basic relationships among individual attributes
as the starting point and uses those to construct relation schemas. This approach is not very
popular in practice because it suffers from the problem of having to collect a large number of
binary relationships among attributes as the starting point. For practical situations, it is next to
impossible to capture binary relationships among all such pairs of attributes. In contrast, a
top-down design methodology (also called design by analysis) starts with a number of
groupings of attributes into relations that exist together naturally, for example, on an invoice,
a form, or a report. The relations are then analysed individually and collectively, leading to
further decomposition until all desirable properties are met. The database design theory
discussed in this week is applicable to both the top-down and bottom-up design approaches
but is more appropriate when used with the top-down approach.
Informal Design Guidelines:
Before starting with normalization, we look at four informal guidelines that may be used as
measures of the quality of a relation schema design:
■ Making sure that the semantics of the attributes is clear in the schema
■ Reducing the redundant information in tuples
■ Reducing the NULL values in tuples
■ Disallowing the possibility of generating spurious tuples
Making sure that the semantics of the attributes is clear in the schema:
In the examples below, Employee and Department, or Employee and Project, are combined into a
single relation. In an ER diagram each of these would be a separate entity type, and mapping the
ER diagram to a relational schema would yield separate relations.
Ex. Emp_dept
(a) ssn, ename, add, dno, dname, mgrssn
Ex. Emp_proj
(b) ssn, pno, hours, name, pname, plocation
Guideline 1:
Design a relation schema that does not combine attributes from multiple entity types and
relationship types.
Guideline 2: Design the base relation schemas so that no insertion, deletion, or modification
anomalies are present in the relations. If any anomalies are present, note them clearly and
make sure that the programs that update the database will operate correctly.
⚫ Tables are put together to reduce storage space.
⚫ Create views for base relations with joins for easy querying.
When emp_proj and emp_loc are joined on ploc (their only common attribute), the table given
above is generated. Look at the values: SSN 11 has name Ram and project PA, and SSN 13 has name
Ram and project PC. Since the tables hold only three SSNs and three distinct names, different
SSNs should have different names. The tuple (13, Ram, …) must therefore be wrong. It is called a
spurious tuple: a tuple that is not in the database but is generated by a wrong join.
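The effect can be reproduced with a small sketch. The rows below are hypothetical (the original table is not reproduced here); only SSN 11 with Ram/PA and SSN 13 with PC follow the text. The sketch projects one relation into emp_loc and emp_proj, joins them back on ploc, and lists the tuples that were never in the original data.

```python
# Hypothetical original data: (ssn, name, pno, ploc). Only the SSN 11 / Ram / PA
# and SSN 13 / PC pairings come from the notes; the rest is assumed.
original = [
    (11, "Ram", "PA", "Chennai"),
    (12, "Sam", "PB", "Delhi"),
    (13, "Tom", "PC", "Chennai"),
]

# Decompose into two relations whose only common attribute is ploc,
# which is neither a primary key nor a foreign key.
emp_loc = {(name, ploc) for _, name, _, ploc in original}
emp_proj = {(ssn, pno, ploc) for ssn, _, pno, ploc in original}

# Natural join on ploc.
joined = {
    (ssn, name, pno, ploc)
    for ssn, pno, ploc in emp_proj
    for name, loc in emp_loc
    if loc == ploc
}

# Tuples produced by the join that were not in the original relation.
spurious = sorted(joined - set(original))
print(spurious)  # [(11, 'Tom', 'PA', 'Chennai'), (13, 'Ram', 'PC', 'Chennai')]
```

Because two projects share a location, the join pairs every name at that location with every SSN at that location, which is where the (13, Ram, …) tuple comes from.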
Guideline 4:
Design relation schemas so that they can be joined with equality conditions on attributes that
are either PRIMARY KEYS or FOREIGN KEYS.
1. Avoid relations whose common attributes are neither PK nor FK.
2. If such relations are unavoidable, do not join them on those attributes, because the
join produces spurious tuples.
Functional Dependencies:
Functional dependency is a constraint between 2 sets of attributes from the database.
Functional dependency is a property of the semantics of the attributes.
If Y ⊆ X, then X -> Y (reflexivity)
If X -> Y and X -> Z, then X -> YZ (union)
Either all or a few of these rules can be applied to a set of dependencies to find the minimal
set of dependencies.
Key of a Relation:
Having seen what normalization and functional dependencies are, let us see how to normalize
a relation with a given set of FDs. Before we look at normal forms, we have to find the key of the
relation from the given FDs.
2. For every FD Si in S, take the LHS attribute and find its closure
3. If an attribute determines all the attributes in the relation, it is the PK
4. If no single LHS attribute determines all the attributes, check combinations of LHS
attributes
X+ := X
Repeat
for each FD U -> V in F do
if U ⊆ X+ then X+ := X+ ∪ V
Until X+ does not change
End
Start by creating X+, the closure of X, where X is the LHS of a given FD. First add X itself,
then add the attributes determined by X. Keep adding the attributes determined by the attributes
already in X+ until nothing further can be added.
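The closure procedure described above can be sketched in Python as follows. The function name closure and the representation of an FD as an (lhs, rhs) pair are assumptions made for illustration, and the two FDs in the usage lines are hypothetical.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of
    attribute names (a string of single-letter attributes also works).
    """
    result = set(attrs)          # first add X itself
    changed = True
    while changed:               # repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            # An FD contributes only when its whole LHS is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Hypothetical FDs: A -> B and B -> C
fds = [("A", "B"), ("B", "C")]
print(sorted(closure("A", fds)))  # ['A', 'B', 'C']
print(sorted(closure("B", fds)))  # ['B', 'C']
```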
Example: Finding the key of the relation
Let us take the relation Student department and its set of FDs
Stud_dept (Reg, name, prog, dcode, dname, dean)
FD = {Reg -> name; Reg -> prog; Reg -> dcode; dcode -> dname, dean}
We have only Reg and dcode on the LHS of the FDs. Find the closure of Reg and of dcode.
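As a rough check of this example, the two closures can be computed with a short Python sketch. The helper closure and the set-based encoding of the FDs are illustrative assumptions; the attribute names and FDs are the ones listed above.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


relation = {"Reg", "name", "prog", "dcode", "dname", "dean"}
fds = [
    ({"Reg"}, {"name"}),
    ({"Reg"}, {"prog"}),
    ({"Reg"}, {"dcode"}),
    ({"dcode"}, {"dname", "dean"}),
]

reg_closure = closure({"Reg"}, fds)
dcode_closure = closure({"dcode"}, fds)

print(reg_closure == relation)  # True
print(sorted(dcode_closure))    # ['dcode', 'dean', 'dname']
```

Reg+ contains every attribute of Stud_dept, so Reg is the key; dcode+ covers only the department attributes, so dcode is not.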
Minimal Cover
Steps to find Minimal Cover
Step 1: Make the Right Hand Side [RHS] of every FD a singleton attribute
Step 2: Identify extraneous attributes and remove them
Step 3: Remove redundant dependencies
Since none of the key attribute closures yields all the attributes, try the closure of a
combination of key attributes, which may yield all the attributes in the relation.
AD+ = ADCEB [Since A->C, D->E, C->B]
From the given set of functional dependencies, AD is a Candidate Key.
Example 4: Key Attribute closure without candidate Key
Consider the Functional Dependencies for R(A, B, C,D,E)
A -> B
C -> D
BC -> E
A+ = AB [Since A->B]
C+ = CD [Since C->D]
BC+ = BCED [Since BC->E, C->D]
Since none of the key attribute closures yields all the attributes, try the closure of a
combination of key attributes, which may yield all the attributes in the relation.
AC+ = ACBDE [Since A->B, C->D, BC->E]
From the given set of functional dependencies, AC is a Candidate Key.
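The search in Example 4 can be sketched as a brute-force loop over attribute combinations, smallest first. This is an illustrative sketch, not the only way to find keys: the names closure and candidate_keys are assumptions, and trying every combination (rather than only LHS attributes) is a simplification that is practical only for small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Return every minimal attribute set whose closure is the whole relation."""
    all_attrs = set(relation)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            # A superset of a key already found is not minimal, so skip it.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


# Example 4: R(A, B, C, D, E) with A -> B, C -> D, BC -> E
fds = [("A", "B"), ("C", "D"), ("BC", "E")]
keys = candidate_keys("ABCDE", fds)
print(["".join(sorted(k)) for k in keys])  # ['AC']
```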
Step 2: Removing Extraneous Attributes
If an attribute adds no meaning to a functional dependency, we say it is extraneous
and remove it
Consider the functional dependencies
A -> B
AB -> C
D -> AC
D -> E
A -> B
A -> C
D -> A
D->C
D -> E
If an LHS has more than one attribute, check whether there exists an extraneous
(Extra/Unwanted) attribute. If so, remove it.
The only FD whose LHS has 2 attributes is AB -> C
A+ = ABC [Since A->B, AB->C]
B+ = B [Reflexivity]
A alone already determines C (C is in A+), whereas B on its own determines nothing beyond
itself. An LHS attribute that can be dropped while the remaining LHS attributes still determine
the RHS is extraneous.
So, B is extraneous in AB->C, which reduces to A->C
New FDs are A->B, A->C, D->AC, D->E
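The test for an extraneous attribute can be sketched as: drop the attribute from the LHS and check whether the closure of what remains still contains the RHS. The function names below are assumptions for illustration; the FDs are the ones from this example.

```python
def closure(attrs, fds):
    """Closure of attrs under a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def extraneous_attributes(lhs, rhs, fds):
    """Return the attributes of lhs that can be dropped from lhs -> rhs."""
    extra = []
    for attr in lhs:
        reduced = set(lhs) - {attr}
        # attr is extraneous if the rest of the LHS still determines the RHS.
        if reduced and set(rhs) <= closure(reduced, fds):
            extra.append(attr)
    return extra


fds = [("A", "B"), ("AB", "C"), ("D", "AC"), ("D", "E")]
print(extraneous_attributes("AB", "C", fds))  # ['B']
```

Dropping A leaves B, whose closure is just B, so A is needed; dropping B leaves A, whose closure ABC still contains C, so B is extraneous.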
Step 3: Removing Redundant Functional Dependencies
Finding Redundant Dependencies and Minimal Cover – Ex 1
Now we have to identify the redundant dependencies among the FDs below
New FDs
A->B
A->C
D->AC
D->E
Applying Singleton to RHS
A->B
A->C
D->A
D->C
D->E
4. Remove D->C and find the attribute closure for D
D+ = DAEBC [Since D->A, D->E, A->B, A->C] – Here, even if we don't consider D->C, C can be
found in D+. So D->C is a redundant dependency and should be removed.
5. Remove D->E and find the attribute closure for D
D+ = DACB [Since D->A, A->C, A->B] – Here, if we don't consider D->E, E cannot be
found in D+. So D->E is not a redundant dependency.
So, the minimal cover is what remains after removing
a) Extraneous Attributes
b) Redundant Dependencies
Minimal Functional Dependencies are
A -> B
A -> C
D -> A
D -> E
Example 2
Consider the Functional Dependencies
A -> B
B -> C
A -> C
Remove A -> B and find the attribute closure for A.
A+ = AC [B is not derived – Not redundant]
Remove B -> C and find the attribute closure for B.
B+ = B [C is not derived – Not redundant]
Remove A -> C and find the attribute closure for A.
A+ = ABC [A->B, B->C. C is derived – So, redundant]
Final FDs without redundancy are
A->B and B->C
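The redundancy check in Example 2 can be sketched as a loop: take each FD out in turn, compute the closure of its LHS from the remaining FDs, and drop the FD if its RHS is still reached. The function names are assumptions for illustration, and the sketch assumes the FD list contains no duplicates.

```python
def closure(attrs, fds):
    """Closure of attrs under a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def remove_redundant(fds):
    """Drop every FD that is implied by the FDs that remain."""
    kept = list(fds)
    for fd in fds:
        others = [f for f in kept if f != fd]
        lhs, rhs = fd
        # If the RHS is still derivable without this FD, the FD is redundant.
        if set(rhs) <= closure(lhs, others):
            kept = others
    return kept


fds = [("A", "B"), ("B", "C"), ("A", "C")]
print(remove_redundant(fds))  # [('A', 'B'), ('B', 'C')]
```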
Example 3:
R(A,B,C,D,E)
F = {A->D, BC->AD, C->B, E->A, E->D}
Step1: Singleton RHS
Step2: Remove Extraneous Attribute
Step3: Redundant Dependency
Step1: Singleton RHS
A->D
BC->A
BC->D
C->B
E->A
E->D
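Step2: Remove Extraneous Attribute
In BC->A and BC->D, B is extraneous because C->B, so they reduce to C->A and C->D
A->D
C->A
C->D
C->B
E->A
E->D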
Step3: Redundant Dependency
Remove A->D, A+ = A – Not Redundant [D not derived]
Remove C->A, C+ = CDB – Not Redundant [A not derived]
Remove C->D, C+ = CABD – Redundant [D derived]
New
F= {A->D,
C->A,
C->B,
E->A,
E->D}
Remove C->B, C+ = CAD – Not Redundant [B not derived]
Remove E->A, E+ = ED – Not Redundant [A not derived]
Remove E->D, E+ = EAD – Redundant [D derived]
Final – Minimal Cover
F= {A->D,
C->A,
C->B,
E->A}
EC+ = ECADB
EC will be the candidate Key
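The three steps can be put together in one sketch and run on Example 3. The function names and the encoding of FDs as pairs of single-letter attribute strings are assumptions for illustration. The order in which FDs are tested follows the order in which they are listed, and a different order can in general give a different (equally valid) minimal cover.

```python
def closure(attrs, fds):
    """Closure of attrs under a list of (lhs, rhs) pairs of single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: make every RHS a singleton attribute.
    step1 = [(frozenset(lhs), attr) for lhs, rhs in fds for attr in rhs]

    # Step 2: remove extraneous attributes from each LHS.
    step2 = []
    for lhs, attr in step1:
        lhs = set(lhs)
        for candidate in sorted(lhs):
            reduced = lhs - {candidate}
            if reduced and attr in closure(reduced, step1):
                lhs = reduced
        fd = (frozenset(lhs), attr)
        if fd not in step2:
            step2.append(fd)

    # Step 3: remove FDs that the remaining ones already imply.
    cover = list(step2)
    for fd in step2:
        others = [f for f in cover if f != fd]
        if fd[1] in closure(fd[0], others):
            cover = others
    return cover


# Example 3: R(A, B, C, D, E)
fds = [("A", "D"), ("BC", "AD"), ("C", "B"), ("E", "A"), ("E", "D")]
cover = minimal_cover(fds)
print(["".join(sorted(lhs)) + "->" + rhs for lhs, rhs in cover])
# ['A->D', 'C->A', 'C->B', 'E->A']

print(closure("EC", cover) == set("ABCDE"))  # True
```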