Normalisation
Management Systems
1. Understand and successfully apply logical database design principles, including E-R diagrams and database normalization.
4. Learn database architectures, DBMS advancements and their use in advanced applications.
Course Outcomes: Upon completion of the course, the students will be able to:
1. Design ER models to represent simple database application scenarios and improve the database design by normalization.
2. Design a relational database model and apply SQL and PL/SQL concepts for database programming.
4. Identify an appropriate database architecture for real-world database applications.
A → B : B is functionally dependent on A
Determinant: Refers to the attribute or group of attributes on the left-hand side of the arrow of a functional dependency.
A → B, B → C
1. B → A
2. AD → B (using decomposition inference rule on AD → BC)
   AD → C (using decomposition inference rule on AD → BC)
3. C → A (using decomposition inference rule on C → ABD)
   C → B (using decomposition inference rule on C → ABD)
   C → D (using decomposition inference rule on C → ABD)
A. {E, F}
B. {E, F, H}
C. {E, F, H, K, L}
D. {E}
A. CD -> AC
B. BD -> CD
C. BC -> CD
D. AC -> BC
Hint: To check whether an FD A->B can be derived from an FD set F,
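One standard way to test whether an FD X -> Y follows from a set F is to compute the attribute closure of X under F and check that Y is contained in it. The sketch below is a minimal illustration in Python. The function names `closure` and `implies` are hypothetical, and the sample FD set reuses the A → B, B → C example from these notes purely for demonstration.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already derivable, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)


# Sample FD set (assumed for illustration): A -> B, B -> C
F = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(sorted(closure({"A"}, F)))  # ['A', 'B', 'C']
print(implies(F, {"A"}, {"C"}))   # True (follows by transitivity)
print(implies(F, {"B"}, {"A"}))   # False
```

The loop repeats until no FD adds a new attribute, so the result does not depend on the order in which the FDs are listed.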
Question. Consider the database given below. Suppose we want to insert a new staff member into the StaffBranch relation. What problem can arise when inserting the new staff details?
StaffBranch
staffNo  sName        position    salary  branchNo  bAddress
SL21     John White   Manager     30000   B005      22 Deer Rd, London
SG37     Ann Beech    Assistant   12000   B003      163 Main St, Glasgow
SG14     David Ford   Supervisor  18000   B003      163 Main St, Glasgow
SA9      Mary Howe    Assistant   9000    B007      16 Argyll St, Aberdeen
SG5      Susan Brand  Manager     24000   B003      163 Main St, Glasgow
SL41     Julie Lee    Assistant   9000    B005      22 Deer Rd, London
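The following sketch explores the question with Python's built-in `sqlite3` module. It is only an illustration: the new staff row (SG99), the branch B009 and the mistyped address are hypothetical values, and the table definition is an assumed rendering of StaffBranch with staffNo as the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE StaffBranch (
        staffNo  TEXT PRIMARY KEY NOT NULL,
        sName    TEXT, position TEXT, salary INTEGER,
        branchNo TEXT, bAddress TEXT
    )""")
conn.executemany(
    "INSERT INTO StaffBranch VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("SG37", "Ann Beech", "Assistant", 12000, "B003", "163 Main St, Glasgow"),
        ("SG14", "David Ford", "Supervisor", 18000, "B003", "163 Main St, Glasgow"),
    ],
)

# Hypothetical new staff member: the branch address must be retyped, and
# nothing stops it from disagreeing with the existing rows for B003.
conn.execute(
    "INSERT INTO StaffBranch VALUES (?, ?, ?, ?, ?, ?)",
    ("SG99", "New Person", "Assistant", 10000, "B003", "16 Main St, Glasgow"),
)
addresses = [row[0] for row in conn.execute(
    "SELECT DISTINCT bAddress FROM StaffBranch "
    "WHERE branchNo = 'B003' ORDER BY bAddress")]
print(addresses)  # branch B003 now has two different addresses on record

# A branch with no staff cannot be stored: staffNo (the key) would be NULL.
try:
    conn.execute(
        "INSERT INTO StaffBranch VALUES "
        "(NULL, NULL, NULL, NULL, 'B009', '1 Example Rd')")
    inserted = True
except sqlite3.IntegrityError:
    inserted = False
print(inserted)  # False
```

Both effects come from storing branch details in the same relation as staff details.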
A method to remove all these anomalies and bring the database to a consistent state.
Rules :
▪ All the attributes in a relation must have atomic domains.
▪ The values in an atomic domain must be indivisible units.
▪ Each attribute must contain only a single value from its pre-defined domain.
Conversion to 1NF
Consider the relation Course_info
1. With the first approach, we remove the repeating group (property rented
details) by entering the appropriate client data into each row.
ClientNo  propertyNo  cName          pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         John Kay       6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        John Kay       5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         Aline Stewart  6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        Aline Stewart  2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        Aline Stewart  5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw
Full functional dependency: if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.
Partial dependency: a functional dependency A → B is a partial dependency if some attribute can be removed from A and the dependency still holds.
Second Normal Form: a relation that is in first normal form and in which every non-primary-key attribute is fully functionally dependent on the primary key.
The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the functionally dependent attributes from the relation by placing them in a new relation along with a copy of their determinant.
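A partial dependency can be detected mechanically: take each proper subset of the key and see which non-key attributes its closure reaches. The sketch below is a minimal illustration. The FD set for the client rental relation is an assumption made for this example (the notes do not list it), and the helper names `closure` and `partial_dependencies` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds, attributes):
    """Map each proper subset of `key` to the non-key attributes it determines."""
    non_key = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_key
            if determined:
                found[subset] = determined
    return found


# Assumed FDs for the client rental relation (illustrative only).
attributes = {"clientNo", "propertyNo", "cName", "pAddress",
              "rentStart", "rentFinish", "rent", "ownerNo", "oName"}
key = {"clientNo", "propertyNo"}
fds = [
    ({"clientNo", "propertyNo"}, {"rentStart", "rentFinish"}),
    ({"clientNo"}, {"cName"}),
    ({"propertyNo"}, {"pAddress", "rent", "ownerNo", "oName"}),
]

result = partial_dependencies(key, fds, attributes)
for subset, determined in sorted(result.items()):
    print(subset, "->", sorted(determined))
```

Under these assumed FDs, each reported subset together with the attributes it determines would move into its own relation when converting to 2NF.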
PropertyOwner
Rules :
▪ For a relation to be in Third Normal Form, it must be in Second Normal Form and the
following must hold:
▪ No non-prime attribute is transitively dependent on the primary key.
▪ For any non-trivial functional dependency X → A, either
◦ X is a super key, or
◦ A is a prime attribute.
Consider relation Stud_info
Stud_info {rno,name,marks,zip,city,dob}
rno name marks zip city dob
Reflection Spot 3
Question. Does the above relation satisfy the 3NF criteria? If not, convert it to 3NF.
Ans: No. The given relation does not satisfy 3NF because it contains the following transitive
dependency:
We have rno -> zip and zip -> city.
Therefore rno -> city (transitive dependency).
Conversion to 3NF :
Stud_info {rno,name,marks,dob,zip}
Zip_city {zip,city}
zip city
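The decomposition above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the column types and all sample rows are hypothetical, chosen only to show that joining Stud_info with Zip_city on zip reproduces the original six-column relation while each city is stored once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Zip_city (zip TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE Stud_info (
        rno   INTEGER PRIMARY KEY,
        name  TEXT,
        marks INTEGER,
        dob   TEXT,
        zip   TEXT REFERENCES Zip_city(zip)
    );
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO Zip_city VALUES (?, ?)",
                 [("Z100", "City A"), ("Z200", "City B")])
conn.executemany("INSERT INTO Stud_info VALUES (?, ?, ?, ?, ?)",
                 [(1, "Student One", 82, "2001-04-12", "Z100"),
                  (2, "Student Two", 74, "2000-11-30", "Z100"),
                  (3, "Student Three", 91, "2001-07-05", "Z200")])

# Joining on zip rebuilds the original Stud_info {rno,name,marks,zip,city,dob}.
rows = conn.execute("""
    SELECT s.rno, s.name, s.marks, s.zip, z.city, s.dob
    FROM Stud_info AS s JOIN Zip_city AS z ON s.zip = z.zip
    ORDER BY s.rno
""").fetchall()
for row in rows:
    print(row)

city_rows = conn.execute("SELECT COUNT(*) FROM Zip_city").fetchone()[0]
print(city_rows)  # 2: each zip/city pair is stored once
```

Because zip is the key of Zip_city, each student row matches exactly one city row, so the join neither loses nor invents tuples.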
PropertyOwner
propertyNo  pAddress                rent  ownerNo
PG4         6 Lawrence St, Glasgow  350   CO40
PG16        5 Novar Dr, Glasgow     450   CO93
PG36        2 Manor Rd, Glasgow     370   CO93

Owner
ownerNo  oName
CO40     Tina Murphy
CO93     Tony Shaw
[Figure: one-to-many (1 to ∞) relationships into the results table. SUBJECTS TABLE (key = Subject): a subject can be listed MANY times in the results table (for different students). A student can be listed MANY times in the results table (for different subjects).]
Or…
▪ Query processing
▪ translation of query into low-level activities
▪ evaluation of query
▪ data extraction
▪ Query optimization
▪ selecting the most efficient query evaluation
▪ Definition- The process of choosing a suitable execution strategy for processing a
query.
8/13/2020 31
DBMS
Query Processing (Cont.)
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > (SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5);
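One way to read this query is as two blocks: the inner block computes a single value (the highest salary in department 5), and the outer block then compares each employee against that value. The sketch below runs both forms with Python's built-in `sqlite3` module. The EMPLOYEE rows are hypothetical sample data, and the table is reduced to the four columns the query uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SALARY INTEGER, DNO INTEGER)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", [
    ("Ann", "Lee", 40000, 5),
    ("Bob", "Kim", 38000, 5),
    ("Cara", "Diaz", 55000, 4),
    ("Dev", "Rao", 30000, 1),
])

# The nested query as written.
nested = conn.execute("""
    SELECT LNAME, FNAME
    FROM EMPLOYEE
    WHERE SALARY > (SELECT MAX(SALARY)
                    FROM EMPLOYEE
                    WHERE DNO = 5)
""").fetchall()

# The same result in two steps: inner block first, then the outer block.
(max_salary,) = conn.execute(
    "SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5").fetchone()
two_step = conn.execute(
    "SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > ?",
    (max_salary,)).fetchall()

print(nested)              # [('Diaz', 'Cara')]
print(nested == two_step)  # True
```

The inner block does not refer to the outer query, so it only needs to be evaluated once.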
Why Optimize?
Evaluation Plans
▪ An execution plan for a relational algebra query consists of a combination of the
relational algebra query tree and information about the access methods to be used
for each relation as well as the methods to be used in computing the relational
operators stored in the tree.
▪ Materialized evaluation: the result of an operation is stored as a temporary
relation.
▪ Pipelined evaluation: as the result of an operator is produced, it is forwarded to the
next operator in sequence
▪ Specify which access path to follow
▪ Specify which algorithm to use to evaluate operator
▪ Specify how operators interleave
▪ Optimization:
[Figure: annotated evaluation plan tree with the operators π name and σ name=Paul, an index-nested loop join (courseid) and a hash join (cid), over the relations student, takes and course.]
Estimating Cost
• What needs to be considered:
▪ Disk I/Os
▪ sequential
▪ random
▪ CPU time
▪ Network communication
▪ What are we going to consider:
▪ Disk I/Os
▪ page reads/writes
▪ Ignoring cost of writing final output
Estimating Cost
• What needs to be considered:
• operation (σ, π, …)
• implementation
• size of inputs
• size of outputs
• sorting
• transforms expressions
• equivalent expressions
• heuristics, rules of thumb
• perform selections early
• perform projections early
• replace products followed by selection σ(R × S) with joins R ⋈ S
• start with joins, selections with smallest result
• create left-deep join trees
Combining Operations using Pipelining
■ Motivation
▪ A query is mapped into a sequence of operations.
▪ Each execution of an operation produces a temporary result.
▪ Generating and saving temporary files on disk is time consuming and expensive.
■ Alternative:
▪ Avoid constructing temporary results as much as possible.
▪ Pipeline the data through multiple operations - pass the result of a previous operator to
the next without waiting to complete the previous operation.
Example:
For a 2-way join, combine the 2 selections on the input and one projection on the output
with the Join.
Results of a select operation are fed in a "Pipeline" to the join algorithm
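The 2-way join example can be sketched with Python generators, where each operator pulls rows from the one below it instead of writing a temporary result. This is an illustrative sketch only: the operator functions, the student and takes rows, and the `year` column are all hypothetical.

```python
def scan(rows):
    """Yield stored rows one at a time."""
    for row in rows:
        yield row


def select(rows, predicate):
    """Pass on only the rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def join(left, right, left_key, right_key):
    """Hash join: the right input is consumed to build the hash table,
    then left rows are streamed through and matched one by one."""
    index = {}
    for right_row in right:
        index.setdefault(right_row[right_key], []).append(right_row)
    for left_row in left:
        for right_row in index.get(left_row[left_key], []):
            yield {**left_row, **right_row}


def project(rows, columns):
    """Keep only the listed columns of each row."""
    for row in rows:
        yield {c: row[c] for c in columns}


# Hypothetical sample data.
student = [{"sid": 1, "name": "Paul"}, {"sid": 2, "name": "Mary"}]
takes = [
    {"sid": 1, "courseid": "C1", "year": 2020},
    {"sid": 1, "courseid": "C2", "year": 2019},
    {"sid": 2, "courseid": "C1", "year": 2020},
]

# Two selections on the inputs, one projection on the output, fused with the join.
plan = project(
    join(
        select(scan(student), lambda r: r["name"] == "Paul"),
        select(scan(takes), lambda r: r["year"] == 2020),
        "sid", "sid",
    ),
    ["name", "courseid"],
)
result = list(plan)
print(result)  # [{'name': 'Paul', 'courseid': 'C1'}]
```

No intermediate list is built for either selection or for the join output. Only the hash table on the build side of the join is held in memory.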
Using Selectivity and Cost Estimates in Query Optimization
■ Cost-based query optimization:
▪ Estimate and compare the costs of executing a query using different execution
strategies and choose the strategy with the lowest cost estimate.
▪ (Compare to heuristic query optimization)
■ Issues :
▪ Cost function
▪ Number of execution strategies to be considered
■ Cost Components for Query Execution
▪ Access cost to secondary storage
▪ Storage cost
▪ Computation cost
▪ Memory usage cost
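The idea of estimating each strategy's cost and keeping the cheapest can be sketched in a few lines. The example below counts only access cost to secondary storage, using the textbook worst-case estimate for a block nested-loop join (read the outer relation once, and the inner relation once per outer block). The block counts and the function names are hypothetical, and a real optimizer would weigh the other cost components as well.

```python
def block_nested_loop_cost(outer_blocks, inner_blocks):
    """Worst-case block transfers for a block nested-loop join."""
    return outer_blocks + outer_blocks * inner_blocks


def cheapest(strategies):
    """Return the name of the strategy with the lowest cost estimate."""
    return min(strategies, key=strategies.get)


# Hypothetical relation sizes, in disk blocks.
b_student, b_takes = 100, 400

strategies = {
    "student outer, takes inner": block_nested_loop_cost(b_student, b_takes),
    "takes outer, student inner": block_nested_loop_cost(b_takes, b_student),
}

for name, cost in strategies.items():
    print(name, cost)
best = cheapest(strategies)
print(best)  # student outer, takes inner
```

With these numbers the two estimates are 40100 and 40400 block transfers, so placing the smaller relation on the outer side comes out cheaper.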
End of Unit 1 Part 2