
Normalisation

ppt on normalisation

Uploaded by

haamidatksa
Copyright
© © All Rights Reserved

CS312-Database Management Systems

SCHOOL OF COMPUTER ENGINEERING AND TECHNOLOGY


CS312-Database Management Systems
Examination scheme: Marks-50 [Continuous Assessment]
Course Objectives:

1. Understand and successfully apply logical database design principles, including E-R diagrams and database normalization.

2. Learn Database Programming languages and apply in DBMS applications.

3. Understand transaction processing and concurrency control in DBMS.

4. Learn database architectures, DBMS advancements and their use in advanced applications.

Course Outcomes: Upon completion of the course, the students will be able to:

1. Design ER models to represent simple database application scenarios and improve the database design by normalization.

2. Design a relational database model and apply SQL and PL/SQL concepts for database programming.

3. Describe Transaction Processing and Concurrency Control techniques for databases.

4. Identify appropriate database architecture for the real world database applications.

DATABASE MANAGEMENT SYSTEMS 2


Module 2- Relational Database Design and Normalisation
Relational Database Design and Normalization
Relational Model: Attributes, Tuple, Domain, Codd's rules, Relational Integrity,
Referential Integrities, Enterprise Constraints, Normalization: 1NF, 2NF, 3NF,
BCNF, Functional Dependency, Decomposition
Query Processing: Overview, Measures of Query cost, Selection and Join
operations, Evaluation of Expressions
Introduction to Query optimization: Estimation, Transformation of Relational
Expression



Codd’s Rules

Rule 1: Information Rule


• All Information (including metadata) is to be
represented as data stored in cells of tables.
• The rows and columns have to be strictly unordered.
Rule 2: Guaranteed Access
• Each unique piece of data (atomic value) should be accessible by:
TableName + Primary Key (Row) + Attribute (Column)
• Violation: Ability to directly access via pointers
Rule 3: Systematic treatment of NULL
• NULLs may mean: Missing data, Not applicable, No value
• Should be handled consistently - Not Zero or Blank
• Primary keys: Not NULL
• Expressions on NULL should give NULL



Codd’s Rules

Rule 4: Active On-Line Catalogue


• Database dictionary (Catalog) to have description of the Database
• Catalog to be governed by same rules as rest of the database
• The same query language to be used on catalog as on the application database
Rule 5: Powerful language
• One well-defined language to provide all manners of access to data
• Example: SQL
• Violation: the file supporting a table can be accessed in any manner other than through the SQL interface
Rule 6: View Updating Rule
• All views that are theoretically updatable should be updatable
• Example: If a view is formed as join of 3 tables, changes to view should be
reflected in base tables



Codd’s Rules

Rule 7: Relational level operations


• There must be insert, update, delete operations at the level of Relations
• Set operations like Union, Intersection and Minus should be supported

Rule 8: Physical Data Independence
• The physical storage of data should not matter to the system
• If, say, a file supporting a table is renamed or moved from one disk to another, it should not affect the applications.
Rule 9: Logical Data Independence
• If there is a change in the logical structure (table structures) of the database, the user view of the data should not change
• Implemented through views. Say, if a table is split into two tables, a new view should give the result as the join of the two tables



Codd’s Rules

Rule 10: Integrity Independence


• The database should be able to enforce its own integrity rather than using other programs
• Integrity rules = Filter to allow correct data, should be stored in Data Dictionary
• Key and check constraints, triggers etc should be stored in Data Dictionary
• This also makes RDBMS independent of front end
Rule 11: Distribution Independence
• A database should work properly regardless of its distribution across a network
• This lays the foundation of distributed databases
• Similar to Rule 8, except that Rule 8 applies to storage on a local disk
Rule 12: Nonsubversion Rule
• If low-level access to the system is allowed, it should not be able to subvert or bypass integrity rules to change data
• This may be achieved by some sort of locking or encryption
• Some low-level access tools provided by vendors violate this rule for extra speed
Codd’s Rules

1. Information represented at the logical level in tables.


2. Data is determined by table, primary key, and column.
3. Missing information is modeled as null values.
4. Metadata is part of the database.
5. Single language for all tasks in DBMS.
6. Views and tables must change simultaneously.



Codd’s Rules

7. Single operations for retrieve, insert, delete, update.


8. Operations independent of physical storage and access.
9. Database modifiable without affecting applications.
10. Constraints are part of database.
11. DML independent of physical layer (distributed, etc.)
12. Row-processing obeys same rules as set-processing.



Functional Dependency
▪ Redundancy in relational databases is often caused by a functional dependency
▪ A functional dependency (FD) : a link between two sets of attributes in a relation
▪ We can normalize a relation by removing undesirable FD
▪ A set of attributes, A, functionally determines another set, B (written A → B), if there exists a functional dependency between A and B.
▪ A → B holds if, whenever two rows of the relation have the same values for all the attributes in A, they also have the same values for all the attributes in B.

A → B: B is functionally dependent on A.
Determinant: refers to the attribute or group of attributes on the left-hand side of the arrow of a functional dependency.
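The row-based definition above can be checked mechanically. The following is a minimal sketch, assuming rows are stored as Python dictionaries; the function name `fd_holds` and the sample rows are made up for illustration. Note that a sample of rows can only refute a dependency; it cannot prove that the dependency holds for every legal state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by the given rows."""
    seen = {}  # maps each combination of lhs values to the rhs values first seen with it
    for row in rows:
        a = tuple(row[c] for c in lhs)
        b = tuple(row[c] for c in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(a, b) != b:
            return False
    return True


# Hypothetical sample data (not from the slides).
rows = [
    {"ID": 1, "First": "Ann", "modCode": "M1", "modName": "Databases"},
    {"ID": 1, "First": "Ann", "modCode": "M2", "modName": "Networks"},
    {"ID": 2, "First": "Bob", "modCode": "M1", "modName": "Databases"},
]

print(fd_holds(rows, ["ID"], ["First"]))         # ID determines First in this sample
print(fd_holds(rows, ["ID"], ["modName"]))       # violated: ID 1 appears with two module names
print(fd_holds(rows, ["modCode"], ["modName"]))
```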



Functional Dependencies Continued
Example
Set of FDs :
1. {ID} → {First, Last}
2. {ID, modCode} → {First, Last, modName}
3. {modCode} → {modName}

An FD is represented by an arrow sign (→), that is, X → Y, where X functionally determines Y.
The left-hand side attributes determine the values of attributes on the right-hand side.



Fully Functional Dependencies Continued
Example

{Roll_Number, Subject_Name} –> Paper_Hour
Roll_Number –> Paper_Hour (does not hold)
Subject_Name –> Paper_Hour (does not hold)



Partial Functional Dependencies Continued
Example

{Roll_Number, Subject_Name} –> Student_Name


Roll_Number –> Student_Name



Transitive Functional Dependencies Continued
Example

Roll_Number –> Pin_Code


Pin_Code –> City_Name



Trivial Functional Dependencies Continued
Example

{Roll_Number, Student_Name} –> Roll_Number



Functional Dependencies Continued

Movie( Title, Year, Length, Genre, StudioName, StarName)



Functional Dependencies Continued

StudID -> lastname


StudID-> lastname, status, credits
Status -> Credits



Armstrong’s Axioms
▪ Closure : If F is a set of functional dependencies then the closure of F, denoted as F+ , is the set of all functional
dependencies logically implied by F.
▪ Set of rules, that when applied repeatedly, generates a closure of functional dependencies.
▪ Rules are as follows:
Reflexive rule : if Y ⊆ X then X → Y
Augmentation rule : if X → Y then XZ → YZ for any Z
Transitivity rule : if X → Y and Y → Z then X → Z
Union : if X → Y and X → Z then X → YZ
Decomposition : if X → YZ then X → Y and X → Z
Pseudo Transitivity : if X → Y and YZ → W then XZ → W



Example
R = (A, B, C, G, H, I)
F= {A → B, A → C, CG → H, CG → I , B → H}

A → H. Since A → B and B → H hold, we apply the transitivity rule

CG → HI. Since CG → H and CG → I, the union rule implies that CG → HI.

AG → I. Since A → C and CG → I , the pseudotransitivity rule implies that AG → I holds.

Alternatively, using only Armstrong's axioms: apply the augmentation rule to A → C to infer AG → CG. Applying the transitivity rule to this dependency and CG → I, we infer AG → I.
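The derivations in this example can be cross-checked with an attribute-closure computation. The sketch below assumes single-letter attribute names, so a string such as "CG" stands for the attribute set {C, G}; the function names `closure` and `implies` are illustrative and not taken from the slides.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is already in the closure, add the right-hand side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """lhs -> rhs is in F+ exactly when rhs is contained in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


# F from the example: A -> B, A -> C, CG -> H, CG -> I, B -> H
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(implies(F, "A", "H"))    # derived above by transitivity
print(implies(F, "CG", "HI"))  # derived above by the union rule
print(implies(F, "AG", "I"))   # derived above by pseudotransitivity
print(sorted(closure("AG", F)))
```

Computing a closure avoids enumerating all of F+, which can be very large even for a small F.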



Functional Dependency Types
▪ Trivial Functional Dependency :
If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a trivial FD.
Trivial FDs always hold.
▪ Non-trivial Functional Dependency : If an FD X → Y holds, where Y is not a subset of X, then it is
called a non-trivial FD.
▪ Completely Non-trivial : If an FD X → Y holds, where X ∩ Y = ∅, it is said to be a completely non-trivial FD.



Non-trivial Functional Dependency
An employee table with three attributes: emp_id, emp_name, emp_address.
emp_id -> emp_name (emp_name is not a subset of emp_id)
emp_id -> emp_address (emp_address is not a subset of emp_id)
Trivial Functional Dependency
{emp_id, emp_name} -> emp_name [emp_name is a subset of {emp_id, emp_name}]



Canonical Cover
▪ Minimal cover:
Definition 1:
A minimal cover of a set of FDs F is a minimal set of functional dependencies Fmin that is equivalent to
F. There can be many such minimal covers for a set of functional dependencies F.
Definition 2:
A set of FDs F is minimum if F has as few FDs as any equivalent set of FDs.
Simple properties/steps of minimal cover:
1. Right Hand Side (RHS) of all FDs should be single attribute.
2. Remove extraneous attributes.
3. Eliminate redundant functional dependencies.



Canonical cover Example
R=(ABC)
F= {A → BC, B → C, A → B, AB → C}
Let us compute a canonical cover for F.
• There are two functional dependencies with the same set of attributes on the left
side of the arrow: A → BC, A → B
We combine these functional dependencies into A → BC.
• A is extraneous in AB → C because F logically implies (F − {AB → C}) ∪ {B → C}. This assertion is
true because B →C is already in our set of functional dependencies.
• C is extraneous in A → BC, since A → BC is logically implied by A → B and B → C.

The canonical cover is therefore: {A → B, B → C}
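The three steps (single-attribute right-hand sides, removal of extraneous attributes, removal of redundant FDs) can be sketched as follows. This is one possible implementation under the assumption of single-letter attribute names; `minimal_cover` is an illustrative name. A set of FDs can have several minimal covers, and the one returned here depends on the order in which FDs are processed.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: split right-hand sides so every FD has a single attribute on the right.
    g = list(dict.fromkeys((frozenset(lhs), a) for lhs, rhs in fds for a in rhs))
    # Step 2: drop extraneous left-hand-side attributes.
    for i, (lhs, a) in enumerate(g):
        for b in sorted(lhs):
            smaller = g[i][0] - {b}
            if smaller and a in closure(smaller, g):
                g[i] = (smaller, a)
    g = list(dict.fromkeys(g))  # step 2 may have produced duplicates
    # Step 3: drop any FD that is implied by the remaining ones.
    for fd in list(g):
        rest = [x for x in g if x != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g


F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
for lhs, a in minimal_cover(F):
    print("".join(sorted(lhs)), "->", a)
```

For the F of this example the sketch ends with A → B and B → C, matching the result derived by hand.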



Canonical cover Example
Given a relational schema R(A, B, C, D) and the set of functional dependencies
FD = { B → A, AD → BC, C → ABD }, find the canonical cover.

1. B → A
2. AD → B ( using decomposition inference rule on AD → BC)
AD → C ( using decomposition inference rule on AD → BC)
3. C → A ( using decomposition inference rule on C → ABD)
C → B ( using decomposition inference rule on C → ABD)
C → D ( using decomposition inference rule on C → ABD)

Now new set of FD = { B → A, AD → B, AD → C, C → A, C → B, C → D }



Canonical cover Example
Consider R = {E, F, G, H, I, J, K, L, M, N} and the set of functional dependencies
{{E, F} -> {G}, {F} -> {I, J}, {E, H} -> {K, L}, {K} -> {M}, {L} -> {N}} on R. What is the key for R?

A. {E, F}
B. {E, F, H}
C. {E, F, H, K, L}
D. {E}
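A question of this kind can be checked by brute force: a key is a minimal attribute set whose closure is the whole schema. The sketch below assumes single-letter attribute names; `candidate_keys` is an illustrative helper, and the exhaustive search is only practical for small schemas such as this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Try attribute subsets from smallest to largest and keep the minimal ones."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) == attrs:
                keys.append(s)
    return keys


R = "EFGHIJKLMN"
FDS = [("EF", "G"), ("F", "IJ"), ("EH", "KL"), ("K", "M"), ("L", "N")]
print([sorted(k) for k in candidate_keys(R, FDS)])
```

E, F and H never appear on the right-hand side of any FD, so every key must contain all three; the search confirms that together they already determine the whole relation.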



Canonical cover Example
In a schema with attributes A, B, C, D and E, the following set of functional dependencies is given:
{A -> B, A -> C, CD -> E, B -> D, E -> A}
Which of the following functional dependencies is NOT implied by the above set?

A. CD -> AC
B. BD -> CD
C. BC -> CD
D. AC -> BC

Hint: To check whether an FD A->B can be derived from an FD set F, find (A)+ using FD set F. If B is a subset of (A)+, then A->B is true, else not true.



Database Normalization



Database Normalization : Need
What is an Anomaly?
Anything we try to do with a database that leads to unexpected and/or unpredictable
results.
Relations that have redundant data may have problems called update anomalies
Three types of Update Anomaly to guard against:
▪ insert
▪ delete
▪ update

Need to check your database design carefully:


▪ the only good database is an anomaly free database.



Reflection Spot 1

Question. Consider the database given below. Suppose we want to insert a new
staff in the StaffBranch relation. What can be the problem for inserting new
staff details?
StaffBranch
staffNo sName position salary branchNo bAddress
SL21 John White Manager 30000 B005 22 Deer Rd, London
SG37 Ann Beech Assistant 12000 B003 163 Main St,Glasgow
SG14 David Ford Supervisor 18000 B003 163 Main St,Glasgow
SA9 Mary Howe Assistant 9000 B007 16 Argyll St, Aberdeen
SG5 Susan Brand Manager 24000 B003 163 Main St,Glasgow
SL41 Julie Lee Assistant 9000 B005 22 Deer Rd, London

VASUNDHARA GHATE,MIT COE,IT DEPT 29


1.Insert Anomaly
Answer: The attempt to insert staff details will be prevented until the new staff member has been associated with some branch.
An insert anomaly occurs when we want to enter a value into a data cell but the attempt is prevented because another value is not known.
staffNo sName position salary branchNo bAddress
SL21 John White Manager 30000 B005 22 Deer Rd, London
SG37 Ann Beech Assistant 12000 B003 163 Main St,Glasgow
SG14 David Ford Supervisor 18000 B003 163 Main St,Glasgow
SA9 Mary Howe Assistant 9000 B007 16 Argyll St, Aberdeen
SG5 Susan Brand Manager 24000 B003 163 Main St,Glasgow
SL41 Julie Lee Assistant 9000 B005 22 Deer Rd, London
? ?



2. Delete Anomaly
▪ A delete anomaly occurs when deleting a value also deletes values we wish to keep.
▪ Example: deleting the tuple that represents the last member of staff located at branch B007.
▪ If we delete the staff details, the details of the branch are also lost.
staffNo sName position salary branchNo bAddress
SL21 John White Manager 30000 B005 22 Deer Rd, London
SG37 Ann Beech Assistant 12000 B003 163 Main St,Glasgow
SG14 David Ford Supervisor 18000 B003 163 Main St,Glasgow
SA9 Mary Howe Assistant 9000 B007 16 Argyll St, Aberdeen
SG5 Susan Brand Manager 24000 B003 163 Main St,Glasgow
SL41 Julie Lee Assistant 9000 B005 22 Deer Rd, London



3.Update Anomaly
▪ An update anomaly occurs when we want to change a single data item value, but must update multiple entries.
e.g. To change the address of B003: if we update it in one tuple, it must be updated for all staff associated with that branch.

staffNo sName position salary branchNo bAddress


SL21 John White Manager 30000 B005 22 Deer Rd, London
SG37 Ann Beech Assistant 12000 B003 163 Main St,Glasgow
SG14 David Ford Supervisor 18000 B003 163 Main St,Glasgow
SA9 Mary Howe Assistant 9000 B007 16 Argyll St, Aberdeen
SG5 Susan Brand Manager 24000 B003 163 Main St,Glasgow
SL41 Julie Lee Assistant 9000 B005 22 Deer Rd, London
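The update and delete anomalies can be reproduced on the StaffBranch rows with Python's built-in sqlite3 module. This is a small illustrative sketch: the table definition and the replacement address '1 New St, Glasgow' are assumptions made for the demonstration, not part of the slides.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE StaffBranch (staffNo TEXT PRIMARY KEY, sName TEXT, position TEXT, "
    "salary INTEGER, branchNo TEXT, bAddress TEXT)"
)
con.executemany(
    "INSERT INTO StaffBranch VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("SL21", "John White", "Manager", 30000, "B005", "22 Deer Rd, London"),
        ("SG37", "Ann Beech", "Assistant", 12000, "B003", "163 Main St, Glasgow"),
        ("SG14", "David Ford", "Supervisor", 18000, "B003", "163 Main St, Glasgow"),
        ("SA9", "Mary Howe", "Assistant", 9000, "B007", "16 Argyll St, Aberdeen"),
        ("SG5", "Susan Brand", "Manager", 24000, "B003", "163 Main St, Glasgow"),
        ("SL41", "Julie Lee", "Assistant", 9000, "B005", "22 Deer Rd, London"),
    ],
)

# Update anomaly: changing the address of B003 in only one row leaves
# the same branch with two different addresses.
con.execute("UPDATE StaffBranch SET bAddress = '1 New St, Glasgow' WHERE staffNo = 'SG37'")
b003_addresses = {
    row[0] for row in con.execute("SELECT bAddress FROM StaffBranch WHERE branchNo = 'B003'")
}
print(b003_addresses)

# Delete anomaly: removing the last member of staff at B007 also removes
# every trace of the branch itself.
con.execute("DELETE FROM StaffBranch WHERE staffNo = 'SA9'")
b007_left = con.execute(
    "SELECT COUNT(*) FROM StaffBranch WHERE branchNo = 'B007'"
).fetchone()[0]
print(b007_left)
```

Storing branch details in a separate Branch relation keyed on branchNo would remove both problems, since each address would then be stored exactly once.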



Database Normalization Forms

Normalization: a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.
The process of normalization is a formal method that identifies relations based on their primary or candidate keys and the functional dependencies among their attributes.
Normal Forms :
▪ 1 NF (Atomicity)
▪ 2 NF (Remove Partial Dependency)
▪ 3NF (Remove Transitive Dependency)
▪ Boyce Codd NF (Super key)
▪ 4 NF (Multi-valued Dependencies)
▪ 5 NF (Join Dependency)



1 NF : First Normal Form

A method to remove all these anomalies and bring the database to a consistent state.

Rules :
All the attributes in a relation must have atomic domains.
The values in an atomic domain must be indivisible units.

Conversion to 1NF
Consider the relation Course_info
▪ Each attribute must contain only a single value from its pre-defined domain.



1 NF : First Normal Form

Consider the relation ClientRental

Repeating group = (propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

ClientNo  cName          propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      John Kay       PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
                         PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      Aline Stewart  PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
                         PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
                         PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw

▪ Each attribute must contain only a single value from its pre-defined domain.



1 NF : First Normal Form

There are two approaches to removing repeating groups from unnormalized tables:
1. Remove the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data.
2. Remove the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is identified for the new relation.



1 NF : First Normal Form Continued

1. With the first approach, we remove the repeating group (property rented details) by entering the appropriate client data into each row.

ClientNo  propertyNo  cName          pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         John Kay       6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        John Kay       5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         Aline Stewart  6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        Aline Stewart  2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        Aline Stewart  5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw

1NF ClientRental relation with the first approach



1 NF : First Normal Form Continued
2. With the second approach, we remove the repeating group (property rented details) by placing the repeating data along with a copy of the original key attribute (clientNo) in a separate relation.

Client
ClientNo  cName
CR76      John Kay
CR56      Aline Stewart

ClientRental
ClientNo  propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw

1NF ClientRental relation with the second approach
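Both approaches to removing the repeating group can be sketched in a few lines. The nested structure below is an assumed in-memory representation of the unnormalized ClientRental table, and only a subset of its columns (propertyNo, rentStart, rentFinish) is carried along to keep the example short.

```python
# Unnormalized form: each client holds a repeating group of rentals.
unnormalized = [
    {"clientNo": "CR76", "cName": "John Kay",
     "rentals": [("PG4", "1-Jul-00", "31-Aug-01"),
                 ("PG16", "1-Sep-02", "1-Sep-02")]},
    {"clientNo": "CR56", "cName": "Aline Stewart",
     "rentals": [("PG4", "1-Sep-99", "10-Jun-00"),
                 ("PG36", "10-Oct-00", "1-Dec-01"),
                 ("PG16", "1-Nov-02", "1-Aug-03")]},
]

# Approach 1: repeat the client data in every row of the repeating group.
flat = [(c["clientNo"], c["cName"], *r) for c in unnormalized for r in c["rentals"]]

# Approach 2: move the repeating group to its own relation,
# together with a copy of the original key attribute (clientNo).
client = [(c["clientNo"], c["cName"]) for c in unnormalized]
rental = [(c["clientNo"], *r) for c in unnormalized for r in c["rentals"]]

for row in flat:
    print(row)
print(client)
print(rental)
```

Approach 1 introduces redundancy (the client name is repeated in every row), whereas approach 2 stores each client once.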



2 NF : Second Normal Form

Full functional dependency: if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.
Partial dependency: a functional dependency A → B is partial if some attribute can be removed from A and the dependency still holds.

Second Normal Form: a relation that is in first normal form and in which every non-primary-key attribute is fully functionally dependent on the primary key.
The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the functionally dependent attributes from the relation by placing them in a new relation along with a copy of their determinant.



2 NF : Second Normal Form
2 NF
Prime attribute − An attribute that is part of a candidate key is known as a prime attribute.
Non-prime attribute − An attribute that is not part of any candidate key is said to be a non-prime attribute.
Consider the relation student_project
{Stud_id, Proj_id, Stud_name, Project_Title}
Rule : Every non-prime attribute should be fully functionally dependent on the key. That is, if X → A holds, where X is the key, then there should not be any proper subset Y of X for which Y → A also holds true.

Conversion to 2 NF
From the example, we find that Stud_name can be identified by Stud_id and Project_Title can be identified by Proj_id independently. This is called partial dependency, which is not allowed in Second Normal Form.
Therefore, we can convert the relation as shown below:
Student {Stud_id, Stud_name}
Project {Proj_id, Project_Title}
A third relation on the key attributes {Stud_id, Proj_id} is also needed to keep track of which student works on which project.
2 NF: Second Normal Form Continued

The ClientRental relation has the following functional dependencies:


fd1 clientNo, propertyNo → rentStart, rentFinish (Primary Key)
fd2 clientNo → cName (Partial dependency)
fd3 propertyNo → pAddress, rent, ownerNo, oName (Partial dependency)
fd4 ownerNo → oName (Transitive Dependency)
fd5 clientNo, rentStart → propertyNo, pAddress, rentFinish, rent, ownerNo, oName (Candidate key)
fd6 propertyNo, rentStart → clientNo, cName, rentFinish (Candidate key)
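The partial dependencies in this list can be picked out mechanically: they are the FDs whose left-hand side is a proper subset of the primary key and whose right-hand side contains an attribute outside the key. The sketch below is a simplification that looks only at the chosen primary key (clientNo, propertyNo) and ignores the other candidate keys; `partial_dependencies` is an illustrative name.

```python
def partial_dependencies(primary_key, fds):
    """Return the FDs whose determinant is a proper subset of the primary key."""
    key = set(primary_key)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if set(lhs) < key and not set(rhs) <= key  # '<' tests for a proper subset
    ]


PK = ("clientNo", "propertyNo")
FDS = [
    (("clientNo", "propertyNo"), ("rentStart", "rentFinish")),      # fd1
    (("clientNo",), ("cName",)),                                    # fd2
    (("propertyNo",), ("pAddress", "rent", "ownerNo", "oName")),    # fd3
    (("ownerNo",), ("oName",)),                                     # fd4
]

for lhs, rhs in partial_dependencies(PK, FDS):
    print(", ".join(lhs), "->", ", ".join(rhs))
```

Each reported dependency becomes its own relation during conversion to 2NF, with the determinant as the key of the new relation.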



2 NF: Second Normal Form Continued

After removing the partial dependencies, three new relations are created, called Client, Rental, and PropertyOwner:

Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)

Client
ClientNo  cName
CR76      John Kay
CR56      Aline Stewart

Rental
ClientNo  propertyNo  rentStart  rentFinish
CR76      PG4         1-Jul-00   31-Aug-01
CR76      PG16        1-Sep-02   1-Sep-02
CR56      PG4         1-Sep-99   10-Jun-00
CR56      PG36        10-Oct-00  1-Dec-01
CR56      PG16        1-Nov-02   1-Aug-03

PropertyOwner
propertyNo  pAddress                rent  ownerNo  oName
PG4         6 Lawrence St, Glasgow  350   CO40     Tina Murphy
PG16        5 Novar Dr, Glasgow     450   CO93     Tony Shaw
PG36        2 Manor Rd, Glasgow     370   CO93     Tony Shaw



3 NF : Third Normal Form
Transitive dependency
A condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).

Third normal form (3NF)
A relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key.
The normalization of 2NF relations to 3NF involves the removal of transitive dependencies by placing the attribute(s) in a new relation along with a copy of the determinant.



3 NF : Third Normal Form

Rules :
▪ For a relation to be in Third Normal Form, it must be in Second Normal Form and the following must be satisfied −
▪ No non-prime attribute is transitively dependent on the prime key attribute.
▪ For any non-trivial functional dependency X → A, either
◦ X is a super key or,
◦ A is a prime attribute.
Consider relation Stud_info
Stud_info {rno,name,marks,zip,city,dob}
rno name marks zip city dob
Reflection Spot 3
Question. Does the above relation satisfy the 3NF criteria? If not, convert it to 3NF.



3 NF : Third Normal Form

Ans: No. The given relation does not satisfy 3NF, as it contains the following transitive dependency:
We have rno -> zip and zip -> city.
Therefore rno -> city (transitive dependency).

Conversion to 3NF :
Stud_info {rno,name,marks,dob,zip}

rno name marks dob zip

Zip_city {zip,city}
zip city
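The 3NF rule (for every non-trivial X → A, either X is a super key or A is a prime attribute) can be tested with a closure computation. The sketch below assumes that rno is the only candidate key of Stud_info and that it determines every other attribute; `violates_3nf` is an illustrative name, and the candidate keys are supplied by the caller rather than computed.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violates_3nf(attrs, fds, candidate_keys):
    """Return (X, A) pairs where X -> A is non-trivial, X is not a super key and A is not prime."""
    attrs = set(attrs)
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attrs
        for a in set(rhs) - set(lhs):  # the non-trivial part of the FD
            if not is_superkey and a not in prime:
                bad.append((tuple(lhs), a))
    return bad


stud_info = ["rno", "name", "marks", "zip", "city", "dob"]
fds = [
    (("rno",), ("name", "marks", "zip", "city", "dob")),  # assumed: rno is the key
    (("zip",), ("city",)),
]
print(violates_3nf(stud_info, fds, [("rno",)]))

# After the decomposition, neither relation has a violating dependency.
print(violates_3nf(["rno", "name", "marks", "dob", "zip"],
                   [(("rno",), ("name", "marks", "dob", "zip"))], [("rno",)]))
print(violates_3nf(["zip", "city"], [(("zip",), ("city",))], [("zip",)]))
```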



3 NF : Third Normal Form Continued
The functional dependencies for the Client, Rental and PropertyOwner relations are as follows:
Client
fd2 clientNo → cName (Primary Key)
Rental
fd1 clientNo, propertyNo → rentStart, rentFinish (Primary Key)
fd5 clientNo, rentStart → propertyNo, rentFinish (Candidate key)
fd6 propertyNo, rentStart → clientNo, rentFinish (Candidate key)
PropertyOwner
fd3 propertyNo → pAddress, rent, ownerNo, oName (Primary Key)
fd4 ownerNo → oName (Transitive Dependency)

Agent  Company  Product
Smith  Ford     car
Smith  Ford     truck
Smith  GM       car
Smith  GM       Truck
Jones  Ford     Car



3 NF : Third Normal Form Continued
The resulting 3NF relations have the forms:

Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)

Client
ClientNo  cName
CR76      John Kay
CR56      Aline Stewart

Rental
ClientNo  propertyNo  rentStart  rentFinish
CR76      PG4         1-Jul-00   31-Aug-01
CR76      PG16        1-Sep-02   1-Sep-02
CR56      PG4         1-Sep-99   10-Jun-00
CR56      PG36        10-Oct-00  1-Dec-01
CR56      PG16        1-Nov-02   1-Aug-03

PropertyOwner
propertyNo  pAddress                rent  ownerNo
PG4         6 Lawrence St, Glasgow  350   CO40
PG16        5 Novar Dr, Glasgow     450   CO93
PG36        2 Manor Rd, Glasgow     370   CO93

Owner
ownerNo  oName
CO40     Tina Murphy
CO93     Tony Shaw



Exercise : Is it 1NF?

Take the following table.

StudentID is the primary key.


Create new rows so each cell contains only one value

But now look – is the studentID primary key still valid?


So. We now have 1NF.
Create new tables for Normalization

Create a new table for each primary key field


Give each new table its own primary key
Move columns from the original table to the new table that matches their primary
key.
Step 1

STUDENT TABLE (key = StudentID)


Step 2

STUDENT TABLE (key = StudentID)

SUBJECTS TABLE (key = Subject)


Step 3

STUDENT TABLE (key = StudentID)

SUBJECTS TABLE (key = Subject)

RESULTS TABLE (key = StudentID+Subject)
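The split into three tables in Steps 1 to 3 can be sketched as follows. The slide images with the actual data did not survive, so every column name other than StudentID and Subject, and every value, is made up for illustration. The last step rejoins the three tables to confirm that no information was lost.

```python
# Hypothetical flat table: (StudentID, StudentName, Subject, Teacher, Result).
flat = [
    (1, "Asha", "Maths", "Mr Rao", "A"),
    (1, "Asha", "Physics", "Ms Iyer", "B"),
    (2, "Ben", "Maths", "Mr Rao", "C"),
]

# One table per key: columns move to the table whose key determines them.
students = sorted({(sid, name) for sid, name, _, _, _ in flat})          # key = StudentID
subjects = sorted({(subj, teacher) for _, _, subj, teacher, _ in flat})  # key = Subject
results = sorted({(sid, subj, res) for sid, _, subj, _, res in flat})    # key = StudentID+Subject

# Rejoin through the keys to check the decomposition is lossless.
name_of = dict(students)
teacher_of = dict(subjects)
rejoined = sorted(
    (sid, name_of[sid], subj, teacher_of[subj], res) for sid, subj, res in results
)

print(students)
print(subjects)
print(results)
print(rejoined == sorted(flat))
```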


Step 4 - relationships
STUDENT TABLE (key = StudentID)

SUBJECTS TABLE (key = Subject)

RESULTS TABLE (key = StudentID+Subject)


Step 4 - cardinality

STUDENT TABLE (key = StudentID)

SUBJECTS TABLE (key = Subject)

RESULTS TABLE (key = StudentID+Subject)

▪ Each student can only appear ONCE in the student table (the "1" side of the relationship).
▪ Each subject can only appear ONCE in the subjects table (the "1" side of the relationship).
▪ A subject can be listed MANY times in the results table (for different students): 1 to ∞ from SUBJECTS to RESULTS.
▪ A student can be listed MANY times in the results table (for different subjects): 1 to ∞ from STUDENT to RESULTS.

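A 3NF fix

[Figure: relationship diagram with 1-to-∞ links between the tables]

SUBJECTS TABLE (key = Subject)

RESULTS TABLE (key = StudentID+Subject)


A 3NF win!

[Figure: relationship diagram with 1-to-∞ links between the tables]

SUBJECTS TABLE (key = Subject)

RESULTS TABLE (key = StudentID+Subject)
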
Or…


Query Processing

▪ Query processing
▪ translation of query into low-level activities
▪ evaluation of query
▪ data extraction
▪ Query optimization
▪ selecting the most efficient query evaluation
▪ Definition- The process of choosing a suitable execution strategy for processing a
query.

8/13/2020 31
Query Processing(Cont.)

1. Translating SQL Queries into Relational Algebra


▪ Query block:
▪ The basic unit that can be translated into the algebraic operators and optimized.
▪ A query block contains a single SELECT-FROM-WHERE expression, as well as
GROUP BY and HAVING clause if these are part of the block.
▪ Nested queries within a query are identified as separate query blocks.
▪ Aggregate operators in SQL must be included in the extended algebra.

Query Processing(Cont.)
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > (SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5);


Outer block (C is the result of the inner block):
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > C
πLNAME, FNAME (σSALARY>C (EMPLOYEE))

Inner block:
SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5
ℱMAX SALARY (σDNO=5 (EMPLOYEE))
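The decomposition into query blocks can be tried out with Python's built-in sqlite3 module: the inner block is evaluated once to obtain the constant C, which is then substituted into the outer block. The EMPLOYEE rows below are made up for the demonstration, and this only illustrates the idea of block-wise evaluation; it is not a description of how any particular DBMS executes the query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, FNAME TEXT, SALARY INTEGER, DNO INTEGER)")
con.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [  # hypothetical sample rows
        ("Smith", "John", 30000, 5),
        ("Wong", "Franklin", 40000, 5),
        ("Borg", "James", 55000, 1),
        ("Zelaya", "Alicia", 25000, 4),
    ],
)

# The original nested query, evaluated in one go.
nested = con.execute(
    "SELECT LNAME, FNAME FROM EMPLOYEE "
    "WHERE SALARY > (SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5)"
).fetchall()

# Inner block: computes the constant C.
c = con.execute("SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5").fetchone()[0]

# Outer block: uses C as an ordinary constant.
outer = con.execute(
    "SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > ?", (c,)
).fetchall()

print(c)
print(nested)
print(outer)
```

Because the inner block does not refer to the outer query, it only needs to be evaluated once.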

Query Processing(Cont.)

• SELECT * FROM student WHERE name = 'Paul'
• σname=Paul(student)
• πname(σcid<00112235(student))
• πname(σcoursename=Advanced DBs((student ⋈cid takes) ⋈courseid course))

Why Optimize?

• Many alternative options to evaluate a query

• Several options to evaluate a single operation


• σname=Paul(student)
• scan file
• use secondary index on student.name
• Multiple access paths
• access path: how can records be accessed

Evaluation Plans
▪ An execution plan for a relational algebra query consists of a combination of the
relational algebra query tree and information about the access methods to be used
for each relation as well as the methods to be used in computing the relational
operators stored in the tree.
▪ Materialized evaluation: the result of an operation is stored as a temporary
relation.
▪ Pipelined evaluation: as the result of an operator is produced, it is forwarded to the
next operator in sequence

Evaluation Plans
▪ Specify which access path to follow
▪ Specify which algorithm to use to evaluate operator
▪ Specify how operators interleave
▪ Optimization:
▪ estimate the cost of each plan (not all plans)
▪ select plan with lowest estimated cost
[Figure: annotated evaluation plan tree with nodes πname, σcoursename=Advanced DBs, σname=Paul (use index i), join on courseid (index-nested loop) and join on cid (hash join), over the relations student, takes and course]
Estimating Cost
• What needs to be considered:
▪ Disk I/Os
▪ sequential
▪ random
▪ CPU time
▪ Network communication
▪ What are we going to consider:
▪ Disk I/Os
▪ page reads/writes
▪ Ignoring cost of writing final output

Estimating Cost
• What needs to be considered:
• operation (σ, π, …)
• implementation
• size of inputs
• size of outputs
• sorting

• transforms expressions
• equivalent expressions
• heuristics, rules of thumb
• perform selections early
• perform projections early
• replace products followed by selection σ (R x S) with joins R ⋈ S
• start with joins, selections with smallest result
• create left-deep join trees
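The heuristic of replacing a product followed by a selection with a join can be illustrated on two tiny in-memory relations. The relations R and S below are made up, and the join is implemented as a simple hash join; the point is that both strategies return the same tuples while the join never builds the full product.

```python
# Hypothetical relations: R(a, k) and S(k, label).
R = [(i, i % 3) for i in range(6)]
S = [(k, f"s{k}") for k in range(3)]

# Strategy 1: Cartesian product, then selection on R.k = S.k.
product = [(r, s) for r in R for s in S]
via_product = [(r, s) for r, s in product if r[1] == s[0]]

# Strategy 2: hash join on k; the intermediate product is never materialized.
index = {}
for s in S:
    index.setdefault(s[0], []).append(s)
via_join = [(r, s) for r in R for s in index.get(r[1], [])]

print(len(product))   # size of the intermediate result of strategy 1
print(len(via_join))  # size of the final result
print(via_join == via_product)
```

The gap between the intermediate and final sizes grows with the size of the inputs, which is why this rewrite is applied early during optimization.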
Combining Operations using Pipelining

■ Motivation
▪ A query is mapped into a sequence of operations.
▪ Each execution of an operation produces a temporary result.
▪ Generating and saving temporary files on disk is time consuming and expensive.
■ Alternative:
▪ Avoid constructing temporary results as much as possible.
▪ Pipeline the data through multiple operations - pass the result of a previous operator to
the next without waiting to complete the previous operation.
Example:
For a 2-way join, combine the 2 selections on the input and one projection on the output
with the Join.
Results of a select operation are fed in a "Pipeline" to the join algorithm
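Pipelining maps naturally onto Python generators: each operator pulls one row at a time from the operator below it, so no temporary relation is built between operators. The operator names and the student rows in this sketch are illustrative assumptions.

```python
def scan(rows):
    """Leaf operator: produce the rows of a stored relation one at a time."""
    for row in rows:
        yield row


def select(pred, rows):
    """Selection: pass on only the rows that satisfy the predicate."""
    for row in rows:
        if pred(row):
            yield row


def project(cols, rows):
    """Projection: keep only the requested columns."""
    for row in rows:
        yield tuple(row[c] for c in cols)


# Hypothetical student relation.
students = [
    {"cid": 1, "name": "Paul"},
    {"cid": 2, "name": "Mary"},
    {"cid": 3, "name": "Paul"},
]

# No work happens until a result row is requested from the top of the pipeline.
pipeline = project(["cid"], select(lambda r: r["name"] == "Paul", scan(students)))

first = next(pipeline)  # available before the scan has finished
rest = list(pipeline)
print(first, rest)
```

A materialized plan would instead call `list(...)` after every operator, storing each intermediate result in full before the next operator starts.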
Using Selectivity and Cost Estimates in Query Optimization
■ Cost-based query optimization:
▪ Estimate and compare the costs of executing a query using different execution
strategies and choose the strategy with the lowest cost estimate.
▪ (Compare to heuristic query optimization)
■ Issues :
▪ Cost function
▪ Number of execution strategies to be considered
■ Cost Components for Query Execution
▪ Access cost to secondary storage
▪ Storage cost
▪ Computation cost
▪ Memory usage cost
End of Unit 1 Part 2
