DBMS DC Unit 3
This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only to the respective group /
learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient you are
notified that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.
22IT202
DATABASE MANAGEMENT SYSTEMS
Created by:
Dr. J. Jeno Jasmine
Mr.D.Kirubakaran
1.TABLE OF CONTENTS
1. Contents
2. Course Objectives
3. Pre Requisites
4. Syllabus
5. Course outcomes
7. Lecture Plan
9. Lecture Notes
10. Assignments
3. PRE REQUISITES
PRE-REQUISITE
4. SYLLABUS
List of Exercise/Experiments
Case Study using real life database applications anyone from the
following list
a) Inventory Management for a EMart Grocery Shop
b) Society Financial Management
c) Cop Friendly App – Eseva
d) Property Management – eMall
e) Star Small and Medium Banking and Finance
● Build Entity Model diagram. The diagram should align with the
business and functional goals stated in the application.
List of Exercise/Experiments
Case Study using real life database applications anyone from the following
list and do the following exercises.
a) Inventory Management for a EMart Grocery Shop
b) Society Financial Management
c) Cop Friendly App – Eseva
d) Property Management – eMall
e) Star Small and Medium Banking and Finance
List of Exercise/Experiments
1. Case Study using real life database applications anyone from the following list
Inventory Management for a EMart Grocery Shop
Society Financial Management
Cop Friendly App – Eseva
Property Management – eMall
Star Small and Medium Banking and Finance.
Apply Normalization rules in designing the tables in scope.
UNIT IV TRANSACTIONS, CONCURRENCY CONTROL AND DATA STORAGE 9+6
Transaction Concepts – ACID Properties – Schedules based on Recoverability,
Serializability – Concurrency Control – Need for Concurrency – Locking Protocols – Two
Phase Locking – Transaction Recovery –Concepts – Deferred Update – Immediate
Update.Organization of Records in Files – Unordered, Ordered – Hashing Techniques –
RAID – Ordered Indexes – Multilevel Indexes - B+ tree Index Files – B tree Index Files.
List of Exercise/Experiments
Case Study using real life database applications anyone from the following list
a) Inventory Management for a EMart Grocery Shop
b) Society Financial Management
c) Cop Friendly App – Eseva
d) Property Management – eMall
e) Star Small and Medium Banking and Finance
Ability to showcase ACID Properties with sample queries with appropriate settings
for the above scenario
UNIT V QUERY OPTIMIZATION AND ADVANCED DATABASES 9+6
Query Processing Overview – Algorithms for SELECT and JOIN operations – Query
optimization using Heuristics.Distributed Database Concepts – Design –Concurrency Control
and Recovery – NOSQL Systems – Document-Based NOSQL Systems and MongoDB.
List of Exercise/Experiments
Case Study using real life database applications anyone from the following list
a) Inventory Management for a EMart Grocery Shop
b) Society Financial Management
c) Cop Friendly App – Eseva
d) Property Management – eMall
e) Star Small and Medium Banking and Finance
design.
PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1 2 1 1 1 1 1 1 2 2 2 2 2
CO2 3 2 2 1 1 1 1 2 2 2 2 2
CO3 2 1 1 1 1 1 1 2 2 2 2 2
CO4 2 1 1 1 1 1 1 2 2 2 2 2
CO5 2 1 1 1 1 1 1 2 2 2 2 2
CO6 2 1 1 1 1 1 1 2 2 2 2 2
Course Code | Course Outcome Statement | Cognitive/Affective Level of the Course Outcome | Expected Level of Attainment
6. CO-PO/PSO MAPPING
Course Outcomes (COs) | Level | PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
Level of PO/PSO: K3 K4 K5 K5 K3/K5 A2 A3 A3 A3 A3 A3 A2 K3 K3 K3
C212.1 K4 3 3 2 2 3 3 3
C212.2 K3 3 2 1 1 3 3 3
C212.3 K4 3 3 2 2 3 3 3
C212.4 K4 3 3 2 2 3 3 3
C212.5 K4 3 3 2 2 3 3 3
C212.6 K4 3 3 2 2 3 3 3
C212.7 A2 3
C212.8 A2 2 2 2 3
C212.9 A3 3 3 3 3 3
C305 3 3 2 2 3 3 3
7. LECTURE PLAN
1 | Relational Algebra | 1 | CO2 | K3 | PPT
8 | Join Dependencies and Fifth Normal Form | 1 | CO3 | K4 | PPT
8. ACTIVITY BASED LEARNING
1. Crossword Puzzle
Across
3. Attributes whose values are obtained from other attribute values.
5. If not all / only a few entities in E participate in the relation R.
6. A level that describes how the data is actually stored on disk.
9. The level that describes how data items are related to each other.
10. Results of analysis and synthesis of data.
11. Organized data sets based on a relationship structure.
15. The DBMS component that evaluates the query.
19. Properties of an entity.
22. Records of existing or occurring phenomena or facts.
24. Objects that distinguish from other objects.
26. Minimal set of attributes that can uniquely distinguish each row of data in a
table.
28. Which is used to uniquely distinguish an entity from other entities in the
entity set.
29. Language for manipulating data.
30. Atomic attributes which cannot be further divided into smaller subsections.
DOWN
1. One example of a DBMS.
2. Relationships that point to the same entity.
4. Languages for creating database schemas.
7. DBMS component that provides an interface between application programs and
data stored in the database.
8. Attributes that have only one single value.
12. The set of one or more attributes that can uniquely distinguish each row of
data in a table.
13. Attributes that can be further divided into smaller sub-attributes, which have
meaning.
14. has only a few values.
16. Relationships between several entities.
17. One of the advantages of databases.
18. The main tool for identifying entities in an entity set.
20. The value contained in the database at one time
21. Users who are responsible for database management (abbreviated).
23. Data regarding data.
25. Complete description of the terrain structure, records, and data relationships
in the database.
27. The software that creates and maintains databases.
9. LECTURE NOTES
1. RELATIONAL ALGEBRA
select
project
union
set difference
cartesian product
rename
Here the select, project and rename operations are called unary operations,
because they operate on one relation. The other three operations operate on
pairs of relations and are therefore called binary operations.
2. Operations
1) The select operation:
The lower case Greek letter sigma (σ) is used to denote the selection. The predicate
appears as the subscript to σ.
Comparisons are allowed in the predicate using the relational operators =, ≠, >, ≥, <, ≤.
Several predicates can be combined into a larger predicate by using the connectives
∧ (and), ∨ (or), ¬ (not).
Question 1: - Select those tuples of the loan relation where the branch-name
is Perryridge.
The equivalent relational algebra query for the given question is,
σ branch-name = “Perryridge” (loan)
Question 2:- Find all the tuples in which the amount lent is more than 1200
in loan relation.
The equivalent relational algebra query for the given question is,
σ amount > 1200 (loan)
The result of the query is given below.
loan-number branch-name Amount
Question 3: - Find those tuples pertaining to loans of more than 1200 made by the
Perryridge branch.
The equivalent relational algebra query for the given question is,
σ branch-name = “Perryridge” ∧ amount > 1200 (loan)
2) The project operation:
The project operation is a unary operation that returns its argument relation, with
certain attributes left out. Projection is denoted by the uppercase Greek letter pi (Π).
The attributes that should appear in the result are listed as a subscript to Π. The
argument relation follows in parentheses.
Question 1:- List all the loan-numbers and amount of the loan.
The equivalent relational algebra query for the given question is,
Π loan-number, amount (loan)
3) The union operation:
The union operation is a binary operation that combines two relations. The union
operation is denoted by the symbol (∪). For a union operation r ∪ s to be valid, two
conditions must hold.
The relations r and s must be of the same arity. That is, they must have the
same number of attributes.
The domains of the ith attribute of r and the ith attribute of s must be the same
for all i.
Question 1:- Find the names of all customers who have either an
account or a loan or both?
The equivalent relational algebra query for the given question is,
Π customer-name (borrower) ∪ Π customer-name (depositor)
4) The set-difference operation:
The set-difference operation, denoted by (–), finds the tuples that are in one
relation but are not in another. It is a binary operation. For a set difference
operation r – s to be valid, the relations r and s should be of the same arity and
the domains of the ith attribute of r and the ith attribute of s must be the same.
Question 1:- Find all the customers of the bank who have an account but not
a loan?
The equivalent relational algebra query for the given question is,
Π customer-name (depositor) – Π customer-name (borrower)
Result:
5) The Cartesian product operation:
If r1 contains n1 tuples and r2 contains n2 tuples, then there are n1*n2 ways of
choosing a pair of tuples-one tuple from each relation.
Example:
Question 1:- Find the names of all customers who have a loan at the Perryridge branch.
This question needs the information in both the loan relation and the borrower relation.
σ branch-name = “Perryridge” (borrower × loan)
The above expression restricts the result to the Perryridge branch. However, the
customer-name column may still contain customers who do not have a loan at
the Perryridge branch, because the Cartesian product pairs every borrower tuple
with every loan tuple. Therefore, to obtain the correct result the query has
to be written as below.
σ borrower.loan-number = loan.loan-number (σ branch-name = “Perryridge”
(borrower × loan))
The result of the above query is given below:
Finally, since only the customer-name is needed, the projection operation is used as
below.
Π customer-name (σ borrower.loan-number = loan.loan-number
(σ branch-name = “Perryridge” (borrower × loan)))
Final result:
Customer-name
Jackson
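The three steps of this query (Cartesian product, selection, projection) can be sketched in Python. This is only an illustration: the rows of loan and borrower below are hypothetical sample data, since the notes do not list the relation contents, and the helper name cartesian is an assumption of this sketch.

```python
from itertools import product

# Hypothetical sample rows (not taken from the notes).
loan = [
    {"loan-number": "L-14", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-17", "branch-name": "Downtown", "amount": 1000},
]
borrower = [
    {"customer-name": "Jackson", "loan-number": "L-14"},
    {"customer-name": "Jones", "loan-number": "L-17"},
]

def cartesian(r, s, r_name, s_name):
    """r x s: pair every tuple of r with every tuple of s.
    Attribute names are prefixed with the relation name to keep them distinct."""
    return [
        {**{f"{r_name}.{k}": v for k, v in t.items()},
         **{f"{s_name}.{k}": v for k, v in u.items()}}
        for t, u in product(r, s)
    ]

# borrower x loan has n1 * n2 = 2 * 2 = 4 tuples
pairs = cartesian(borrower, loan, "borrower", "loan")

# Selection on branch-name alone still keeps mismatched pairs
at_branch = [t for t in pairs if t["loan.branch-name"] == "Perryridge"]

# Second selection keeps only pairs that refer to the same loan
matched = [t for t in at_branch
           if t["borrower.loan-number"] == t["loan.loan-number"]]

# Projection on customer-name (a set removes duplicates)
customers = {t["borrower.customer-name"] for t in matched}
print(customers)
```

With this sample data the first selection still contains Jones paired with loan L-14, which is why the second selection on loan-number is needed.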
6) The rename operation:
The rename operator is used to rename relations and attributes. The rename operator is
denoted by the lowercase Greek letter rho (ρ). Given a relational algebra
expression E, the expression ρ x (E) returns the result of expression E under the name
x.
ρ x(A1, A2, …, An) (E)
Returns the result of expression E under the name x, and with the attributes
renamed to A1, A2,….,An.
Examples:
Computation:
This query requires to (1) compute first a temporary relation consisting of those
balances that are not the largest and (2) take the set difference between the
relation П balance (account) and the temporary relation just computed, to obtain
the result.
Step 1:
The expression for the temporary relation that consists of the balances that are not
the largest is
Π account.balance (σ account.balance < d.balance (account × ρ d (account)))
This expression gives those balances in the account relation for which a larger
balance appears somewhere in the account relation renamed as d. The result
contains all balances except the largest one as shown next.
500
400
400
Step 2:
The query to find the largest account balance in the bank can be written as:
Π balance (account) –
Π account.balance (σ account.balance < d.balance
(account × ρ d (account)))
balance
900
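The two-step computation of the largest balance can be traced with a small Python sketch. The account rows are hypothetical; only the maximum value 900 is taken from the result shown in the notes. The variable d plays the role of the renamed copy ρ d (account).

```python
from itertools import product

# Hypothetical account rows; only the maximum (900) matches the notes.
account = [
    {"account-number": "A-101", "balance": 500},
    {"account-number": "A-215", "balance": 400},
    {"account-number": "A-102", "balance": 900},
    {"account-number": "A-305", "balance": 350},
]

# rho_d(account): a second reference to the same relation under the name d
d = account

# Step 1: balances for which a larger balance appears somewhere in d
not_largest = {a["balance"] for a, b in product(account, d)
               if a["balance"] < b["balance"]}

# Step 2: all balances minus the balances that are not the largest
all_balances = {a["balance"] for a in account}
largest = all_balances - not_largest
print(largest)
```

The sketch uses no max function on purpose: the fundamental operations have no aggregation, so the maximum is obtained purely from product, selection, projection and set difference.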
Question 2:- Find the names of all customers who live on the same street
and in the same city as smith.
Computation:
The smith’s street and city can be obtained by,
Π customer-street, customer-city (σ customer-name = “Smith” (customer))
In order to find other customers with this street and city, the customer relation must be
referred second time. The rename operation is used for this purpose. The resulting
Additional Operations:
Division operation
Assignment operation
The set intersection operation is denoted by (∩). It returns the tuples that are
common to both relations. It is a binary operation. The set intersection can be
expressed with a pair of set difference operations.
r ∩ s = r – (r – s)
Question 1: Find all customers who have both a loan and an account?
The equivalent relational algebra query for the given question is,
Π customer-name (borrower) ∩ Π customer-name (depositor)
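The identity r ∩ s = r – (r – s) can be checked quickly with Python sets. The customer names below are hypothetical sample values, and the function name is an assumption of this sketch.

```python
# Hypothetical customer names from the borrower and depositor relations.
borrower_names = {"Hayes", "Jones", "Smith"}
depositor_names = {"Hayes", "Jones", "Lindsay"}

def intersect_via_difference(r, s):
    """Compute r ∩ s using only set difference: r - (r - s)."""
    only_in_r = r - s        # tuples of r that are not in s
    return r - only_in_r     # what remains of r is common to both

both = intersect_via_difference(borrower_names, depositor_names)
print(sorted(both))  # ['Hayes', 'Jones']
```

This is why intersection is listed as an additional operation: it adds convenience but no expressive power beyond set difference.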
The natural join operation forms a cartesian product of its two arguments,
performs a selection forcing equality on those attributes that appear in both
relation schemas and finally removes duplicate attributes.
Example 1:Consider the loan and borrower relation. Refer Cartesian product
operation for the relations.
Question 1:- Find the names of all customers who have a loan at the bank,
and find the amount of the loan?
Π customer-name, loan-number, amount (borrower ⋈ loan)
Since the schemas for borrower and loan have the attribute loan-number in
common, the natural join operation considers only pair of tuples that have the
same value on loan number. It combines each such pair of tuples into a single
tuple on the union of two schemas. After performing the projection the following
result is obtained.
Result:
Customer-name Loan-number Amount
customer-name acc-no
Hayes A-102
John A-101
John A-201
Jones A-217
Question 2:- Find the names of all branches with customers who have an
account in the bank and who live in Harrison?
Relational algebra Expression:
Π branch-name (σ customer-city = “Harrison” (customer ⋈ account ⋈ depositor))
Result:
Branch-name
Brighton
Perryridge
r1 = Пbranch-name(σbranch-city = “Brooklyn”(branch))
The (customer-name, branch-name) pairs of all customers who have an account at a branch
can be found by the expression,
r2 = Π customer-name, branch-name (depositor ⋈ account)
customer-name branch-name
Hayes Perryridge
John Downtown
John Brighton
Jones Brighton
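To find customers who appear in r2 with every branch name in r1, the divide operation is
used as given below.
Π customer-name, branch-name (depositor ⋈ account)
÷ Π branch-name (σ branch-city = “Brooklyn” (branch))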
Result:
Customer-name
John
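The division r2 ÷ r1 can be sketched in Python using the (customer-name, branch-name) pairs listed above. The contents of r1 are an assumption: the notes do not show which branches are in Brooklyn, so Brighton and Downtown are assumed here because that choice reproduces the result John.

```python
# r1: branch names located in Brooklyn (ASSUMED values, not shown in the notes)
r1 = {"Brighton", "Downtown"}

# r2: (customer-name, branch-name) pairs as listed in the notes
r2 = {
    ("Hayes", "Perryridge"),
    ("John", "Downtown"),
    ("John", "Brighton"),
    ("Jones", "Brighton"),
}

def divide(r, s):
    """r ÷ s for a binary relation r(customer, branch) and a unary s(branch):
    the customers that appear in r together with EVERY branch in s."""
    customers = {c for c, _ in r}
    return {c for c in customers if all((c, b) in r for b in s)}

result = divide(r2, r1)
print(result)  # {'John'}
```

Jones is excluded because the pair (Jones, Downtown) is missing, and Hayes because neither Brooklyn branch appears with Hayes.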
The Assignment Operation:
temp2 s
Generalized Projection:
ПF1,F2,….,Fn(E)
Question:
The credit-info relation lists the credit limit and the expenses incurred so far. To find how
much more each person can spend, the following expression is written:
Π customer-name, limit – credit-balance (credit-info)
The attribute resulting from the expression limit – credit-balance does not have a
name. The rename operation can be applied for this purpose as below.
Π customer-name, (limit – credit-balance) as credit-available (credit-info)
Result:
6. Aggregate Functions:
Aggregate functions take a collection of values and return a single value as a result.
Question:
To find out the total sum of salaries of all part-time employees in the bank, the
following relational algebra expression is used.
𝒢 sum(salary) (pt-works)
7. Groups:
The result can be grouped based on some attribute. For example to partition the
relation pt-works into groups based on the branch, and to apply aggregation on
each group, the query is written as below.
branch-name 𝒢 sum(salary) (pt-works)
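The difference between a plain aggregate and a grouped aggregate can be sketched in Python. The pt-works rows below are hypothetical sample data; the attribute names follow the expressions above.

```python
from collections import defaultdict

# Hypothetical rows of the pt-works relation.
pt_works = [
    {"employee-name": "Adams", "branch-name": "Perryridge", "salary": 1500},
    {"employee-name": "Brown", "branch-name": "Perryridge", "salary": 1300},
    {"employee-name": "Gopal", "branch-name": "Downtown", "salary": 5300},
]

# Aggregate over the whole relation: one value for all tuples
total = sum(t["salary"] for t in pt_works)

# Grouped aggregate: partition by branch-name, then sum within each group
by_branch = defaultdict(int)
for t in pt_works:
    by_branch[t["branch-name"]] += t["salary"]

print(total)            # 8100
print(dict(by_branch))  # {'Perryridge': 2800, 'Downtown': 5300}
```

The grouped form returns one tuple per branch, whereas the ungrouped form returns a single value.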
(i) Left outer join (ii) Right outer join (iii) Full outer join
(i) The left outer join (⟕): This takes all tuples in the left relation that did not match
with any tuple in the right relation, pads the tuples with null values for all other
attributes from the right relation, and adds them to the result of the natural join.
The result of employee ⟕ ft-works is given below.
(ii) The right outer join (⟖): It is symmetric with the left outer join. It pads tuples
from the right relation that did not match any from the left relation with nulls and
adds them to the result of the natural join. The result of employee ⟖ ft-works is
given below.
(iii) The full outer join (⟗): It does both of the above operations, padding tuples from
the left relation that did not match any from the right relation, as well as tuples
from the right relation that did not match any from the left relation, and adding
them to the result of the join. The relation below shows the result of employee ⟗
ft-works.
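The three outer joins can be sketched in Python as the natural join plus padded unmatched tuples. The rows of employee and ft-works are hypothetical sample data, None stands in for null, and the function names are assumptions of this sketch. The helpers assume both relations are non-empty and share exactly one attribute, key.

```python
# Hypothetical sample rows; None plays the role of null.
employee = [{"employee-name": "Coyote", "city": "Toon"},
            {"employee-name": "Smith", "city": "Revolver"}]
ft_works = [{"employee-name": "Coyote", "salary": 1500},
            {"employee-name": "Gates", "salary": 5300}]

def natural_join(r, s, key):
    """Combine tuples of r and s that agree on the shared attribute key."""
    return [{**t, **u} for t in r for u in s if t[key] == u[key]]

def left_outer_join(r, s, key):
    """Natural join plus unmatched tuples of r, padded with nulls."""
    s_attrs = [a for a in s[0] if a != key]
    result = natural_join(r, s, key)
    matched = {t[key] for t in result}
    for t in r:
        if t[key] not in matched:
            result.append({**t, **{a: None for a in s_attrs}})
    return result

def right_outer_join(r, s, key):
    """Symmetric to the left outer join: swap the roles of r and s."""
    return left_outer_join(s, r, key)

def full_outer_join(r, s, key):
    """Left outer join plus unmatched tuples of s, padded with nulls."""
    left = left_outer_join(r, s, key)
    seen = {t[key] for t in left}
    r_attrs = [a for a in r[0] if a != key]
    extra = [{**u, **{a: None for a in r_attrs}}
             for u in s if u[key] not in seen]
    return left + extra

left = left_outer_join(employee, ft_works, "employee-name")
right = right_outer_join(employee, ft_works, "employee-name")
full = full_outer_join(employee, ft_works, "employee-name")
```

In this data Smith has no ft-works tuple and Gates has no employee tuple, so Smith appears only in the left and full results and Gates only in the right and full results.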
3. Relational Calculus
Each tuple variable usually ranges over a particular database relation, meaning
that the variable may take as its value any individual tuple from that relation.
{t | COND(t)}
The result of such a query is the set of all tuples t that satisfy COND (t).
For example, to find all employees whose salary is above $50,000, we can write
the following tuple calculus expression:
{t | EMPLOYEE(t) AND t.Salary>50000}
The condition EMPLOYEE(t) specifies that the range relation of tuple variable t
is EMPLOYEE.
To retrieve only some of the attributes, say, the first and last names, we write
{t.Fname, t.Lname | EMPLOYEE(t) AND t.Salary>50000}
For each tuple variable t, the range relation R of t. This value is specified by a
condition of the form R(t).
In tuple relational calculus, we first specify the requested attributes t.Bdate and
t.Address for each selected tuple t. Then we specify the condition for selecting a
tuple following the bar (|)—namely, that t be a tuple of the EMPLOYEE relation
whose Fname, Minit, and Lname attribute values are ‘John’, ‘B’, and ‘Smith’,
respectively.
{t1.Aj, t2.Ak, ... , tn.Am | COND(t1, t2, ..., tn, tn+1, tn+2, ..., tn+m)}
where t1, t2, … , tn, tn+1, … , tn+m are tuple variables, each Ai is an attribute of
the relation on which ti ranges, and COND is a condition or formula of the tuple
relational calculus.
1. An atom of the form R(ti), where R is a relation name and ti is a tuple variable.
This atom identifies the range of the tuple variable ti as the relation whose name is R.
2. An atom of the form ti.A op tj.B, where op is one of the comparison operators in
the set {=, <, ≤, >, ≥, ≠}, ti and tj are tuple variables, A is an attribute of the
relation on which ti ranges, and B is an attribute of the relation on which tj ranges.
3. An atom of the form ti.A op c or c op tj.B, where op is one of the comparison
operators in the set {=, <, ≤, >, ≥, ≠}, ti and tj are tuple variables, A is an attribute
of the relation on which ti ranges, B is an attribute of the relation on which tj
ranges, and c is a constant value.
Each of the preceding atoms evaluates to either TRUE or FALSE for a specific
combination of tuples; this is called the truth value of an atom.
In general, a tuple variable t ranges over all possible tuples in the universe. For
atoms of the form R(t), if t is assigned to a tuple that is a member of the specified
relation R, the atom is TRUE; otherwise, it is FALSE.
In atoms of types 2 and 3, if the tuple variables are assigned to tuples such that
the values of the specified attributes of the tuples satisfy the condition, then the
atom is TRUE.
In addition, two special symbols called quantifiers can appear in formulas; these
are the universal quantifier (∀) and the existential quantifier (∃).
logical connectives—(F1 AND F2), (F1 OR F2), NOT(F1), and NOT(F2)— depending
on whether it is free or bound in F1 or F2 (if it occurs in either).
Notice that in a formula of the form F = (F1 AND F2) or F = (F1 OR F2), a tuple
variable may be free in F1 and bound in F2, or vice versa; in this case, one
occurrence of the tuple variable is bound and the other is free in F.
form F′= (∃t)(F) or F′ = (∀t)(F). The tuple variable is bound to the quantifier
specified in F′. For example, consider the following formulas:
The tuple variable d is free in both F1 and F2, whereas it is bound to the (∀)
quantifier in F3. Variable t is bound to the (∃) quantifier in F2.
Rule 3: If F is a formula, then so is (∃t)(F), where t is a tuple variable. The formula
(∃t)(F) is TRUE if the formula F evaluates to TRUE for some (at least one) tuple
assigned to free occurrences of t in F; otherwise, (∃t)(F) is FALSE.
Query 1. List the name and address of all employees who work for the ‘Research’
department.
Query 2. For every project located in ‘Stafford’, list the project number, the
controlling department number, and the department manager’s last name, birth
date, and address.
Query 3. List the names of employees who work on all the projects controlled by
department number 5. One way to specify this query is to use the universal
quantifier as shown:
x.Pnumber=w.Pno))))}
Query 5. List the names of managers who have at least one dependent.
This query is handled by interpreting managers who have at least one dependent
as managers for whom there exists some dependent.
Domain calculus differs from tuple calculus in the type of variables used in
formulas: rather than having variables range over tuples, the variables range over
single values from domains of attributes.
To form a relation of degree n for a query result, we must have n of these domain
variables—one for each attribute. An expression of the domain calculus is of the
form
{x1, x2, ..., xn | COND(x1, x2, ..., xn, xn+1, xn+2, ..., xn+m)} where x1, x2, … , xn,
xn+1, xn+2, … , xn+m are domain variables that range over domains (of
attributes), and COND is a condition or formula of the domain relational calculus.
A formula is made up of atoms. The atoms of a formula are slightly different from
those for the tuple calculus and can be one of the following:
1. An atom of the form R(x1, x2, … , xj), where R is the name of a relation of
degree j and each xi, 1 ≤ i ≤ j, is a domain variable. This atom states that a list of
values of <x1, x2, … , xj> must be a tuple in the relation whose name is R, where xi
is the value of the ith attribute value of the tuple. To make a domain calculus
expression more concise, we can drop the commas in a list of variables; thus, we
can write: {x1, x2, ..., xn | R(x1 x2 x3) AND ...} instead of: {x1, x2, ... , xn | R(x1,
x2, x3) AND ...}
2. An atom of the form xi op xj, where op is one of the comparison operators in the
set {=, <, ≤, >, ≥, ≠}, and xi and xj are domain variables.
B. Smith’.
Q0: {u, v | (∃q) (∃r) (∃s) (∃t) (∃w) (∃x) (∃y) (∃z) (EMPLOYEE(qrstuvwxyz) AND
q=‘John’ AND r=‘B’ AND s=‘Smith’)}
Query 1. Retrieve the name and address of all employees who work for the
‘Research’ department.
Query 2. For every project located in ‘Stafford’, list the project number, the
controlling department number, and the department manager’s last name, birth
date, and address.
Example:
Relation Instance:
X 🡒 Y holds if whenever two tuples have the same value for X, they must have
the same value for Y
For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then
t1[Y]=t2[Y]
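The definition above translates directly into a check on a relation instance. The sketch below is illustrative only: the function name holds and the sample rows are assumptions, and the attribute names PNUMBER and HOURS are assumed for the project number and weekly hours.

```python
def holds(rows, X, Y):
    """Return True if the FD X -> Y holds in this relation instance:
    any two tuples that agree on X must also agree on Y."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # same X-value, different Y-value
    return True

# Hypothetical relation instance.
emp_proj = [
    {"SSN": 1, "ENAME": "Smith", "PNUMBER": 10, "HOURS": 20},
    {"SSN": 1, "ENAME": "Smith", "PNUMBER": 20, "HOURS": 10},
    {"SSN": 2, "ENAME": "Wong", "PNUMBER": 10, "HOURS": 40},
]

print(holds(emp_proj, ["SSN"], ["ENAME"]))             # True
print(holds(emp_proj, ["SSN", "PNUMBER"], ["HOURS"]))  # True
print(holds(emp_proj, ["PNUMBER"], ["HOURS"]))         # False
```

Note that such a check can only refute an FD. An FD is a constraint on every legal instance of the schema, so one instance that satisfies it does not prove that it holds in general.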
Examples of FD constraints
SSN 🡒 ENAME
{SSN, PNUMBER} 🡒 HOURS
Employee SSN and project number determine the hours per week that the
employee works on the project.
One way to reduce the size of the set of FDs is to eliminate the
trivial dependencies. An FD is trivial if and only if the right side is a subset of the
left side.
An FD is non-trivial if and only if the right side is not a subset of the left side.
Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F.
{supplier-no, part-no}->{city}
{supplier-no, part-no}->{qty}
As another example, consider the relation R with attributes A,B and C, such that
the FDs A->B and B->C both hold for R. Then it is easy to see that the FD A->C
also holds for R. The FD A->C is an example of a transitive FD i.e. C is said to
depend on A transitively via B. The set of all FDs that are implied by a given set S
of FDs is called the closure of S, written S+
Let A,B and C be arbitrary subsets of the set of attributes of given relation
R and let AB mean the union of A and B. Then we have:
Self-determination: A->A
Let R be the relation with attributes A,B,C,D,E,F and the FDs are:
We now show that the FD AD->F holds for R and is thus a member of the closure of
A->BC (given)
CD->EF (given)
Closure of a set of attributes X with respect to F is the set X+ of all attributes that
are functionally determined by X.
X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F.
algorithm that says “Repeatedly apply the rules from the previous section until
Let R be the relation, Z be the set of all attributes of R and S be the set of
FDs that hold for R. From this we can determinate the set of all attributes of R
CLOSURE[Z,S] := Z;
do “forever”;
   for each FD X->Y in S
   do;
      if X is a subset of CLOSURE[Z,S]
      then CLOSURE[Z,S] := CLOSURE[Z,S] U Y;
   end;
   if CLOSURE[Z,S] did not change on this iteration
   then leave the loop;
end;
We now compute the closure{A,B}+ of the set of attributes {A,B} under this set of
FDs.
We now go round the inner loop four times, once for each of the given FDs. On the
first iteration (for the FD A->BC), we find that the left side is a subset of
CLOSURE[Z,S]. so we add attributes (B and C) to the result. CLOSURE[Z,S] is now
the set {A,B,C}.
On the second iteration (for the FD E->CF), we find that the left side is not a subset
of the result, which thus remains unchanged.
On the third iteration (for the FD B->E), we add E to CLOSURE[Z,S], which now has
the value {A,B,C,E}.
Now we go round the inner loop four times again. On the first iteration, the result
does not change; on the second, it expands to {A,B,C,E,F}, on the third and
fourth it does not change.
Now we go round the inner loop four times again. CLOSURE[Z,S] does not change,
and so the whole process terminates with {A,B}+ = {A,B,C,E,F}.
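The closure computation traced above can be reproduced with a short Python sketch. The FD list is an assumption where the notes are incomplete: the trace names A->BC, E->CF and B->E, and the fourth FD is taken to be CD->EF, which appears earlier in the notes. Attributes are single letters, so a string such as "BC" stands for the set {B, C}.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set attrs under the FDs in fds.
    Each FD is a (lhs, rhs) pair of iterables of attribute names."""
    result = set(attrs)
    changed = True
    while changed:                      # the outer "forever" loop
        changed = False
        for lhs, rhs in fds:            # the inner loop over the FDs
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# FDs from the worked example; CD->EF is ASSUMED to be the fourth FD.
fds = [("A", "BC"), ("E", "CF"), ("B", "E"), ("CD", "EF")]

print(sorted(closure("AB", fds)))  # ['A', 'B', 'C', 'E', 'F']
```

The loop stops on the first full pass that adds nothing, which corresponds to the final round of the trace in which CLOSURE[Z,S] does not change.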
Thus if Z is a set of attributes of relation R and S is a set of FDs that hold for R, then
set of FDs that hold for R with Z as the left side is the set consisting of all FDs of
the form Z->Z’, where Z’ is some subset of the closure Z+ of Z under S. The
closure S+ of the original set S of FDs is then the union of all such sets of FDs,
taken over all possible attribute sets Z.
Definition (Covers):
We cannot remove any dependency from F and have a set of dependencies that is
equivalent to F.
Example:
Consider the relation PARTS for which the following FDs hold:
PART-NO->PART-NAME
PART-NO->COLOUR
PART-NO->WEIGHT
PART-NO->CITY
PART-NO->WEIGHT
PART-NO->CITY
PART-NO-> PART-NAME
PART-NO->WEIGHT
PART-NO->CITY
PART-NO-> PART-NO The first FD can be discarded without changing the closure.
PART-NO-> PART-NAME
PART-NO->COLOUR
PART-NO->WEIGHT
PART-NO->CITY
So, for every set of FDs there exist at least one equivalent set
that is irreducible.
Example:
A->BC
B->C
A->B
AB->C
AC->D
We now compute the irreducible set of FDs that is equivalent to the given set:
The first step is to rewrite the FDs such that each has a singleton right side:
A->B
A->C
B->C
A->B
AB->C
AC->D
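Next, attribute C can be eliminated from the left side of the FD AC->D, because we
have A->C, which gives A->AC by augmentation. Then:
A->AC (from A->C by augmentation)
AC->D (given)
so A->D by transitivity, and the C on the left side of AC->D is redundant.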
Finally, the FD A->C is implied by the FDs A->B and B->C, so it can also be
eliminated.
A->B
B->C
A->D
NOTE:
The irreducible sets can also be represented by the terms minimal sets,
minimal cover and canonical cover.
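The reduction steps used in the example above (singleton right sides, removal of redundant left-side attributes, removal of implied FDs) can be sketched in Python. This is one possible ordering of the steps, not necessarily the order followed in the notes, and the function names are assumptions. An irreducible set is not unique in general, so a different ordering may return a different but equivalent set.

```python
def closure(attrs, fds):
    """Closure of an attribute set under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: rewrite with singleton right sides, dropping duplicates
    g = []
    for lhs, rhs in fds:
        for b in rhs:
            fd = (frozenset(lhs), b)
            if fd not in g:
                g.append(fd)
    # Step 2: remove attributes from a left side when the rest still implies b
    reduced = []
    for lhs, b in g:
        for a in sorted(lhs):
            smaller = lhs - {a}
            if smaller and b in closure(smaller, g):
                lhs = smaller
        if (lhs, b) not in reduced:
            reduced.append((lhs, b))
    # Step 3: remove every FD that is implied by the remaining FDs
    result = list(reduced)
    for fd in reduced:
        rest = [f for f in result if f != fd]
        if fd[1] in closure(fd[0], rest):
            result = rest
    return result

# The example set from the notes
given = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C"), ("AC", "D")]
cover = ["".join(sorted(lhs)) + "->" + rhs for lhs, rhs in minimal_cover(given)]
print(cover)  # ['A->B', 'B->C', 'A->D']
```

For the example set, this sketch arrives at the same three FDs as the manual derivation.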
ANOMALIES
Database anomalies are problems that arise due to limitations or flaws in the design of a
database. Anomalies can be classified into insertion anomalies, deletion anomalies, and
modification anomalies. These anomalies are discussed based on the EMP_DEPT relation
given below.
(i) Insertion Anomalies:
Insertion anomalies can be differentiated into two types, illustrated by the following
examples based on the EMP_DEPT relation:
To insert a new employee tuple into EMP_DEPT, we must include either the attribute values
for the department that the employee works for, or NULLs (if the employee does not work
for a department as yet). For example, to insert a new tuple for an employee who works
in department number 5, we must enter all the attribute values
of department 5 correctly so that they are consistent with the corresponding values for
department 5 in other tuples in EMP_DEPT.
It is difficult to insert a new department that has no employees as yet in the EMP_DEPT
relation. The only way to do this is to place NULL values in the attributes for employee.
This violates the entity integrity for EMP_DEPT because SSN is its primary key.
(ii) Deletion Anomalies:
If we delete from EMP_DEPT an employee tuple that happens to represent the last
employee working for a particular department, the information concerning that
department is lost from the database.
For example, if we delete the details of Borg, James E., who works for Headquarters, then the
details of that department are lost.
(iii) Modification Anomalies:
8. NORMALIZATION
A superkey of a relation schema R = {A1, A2, ...., An} is a set of attributes S subset-of R
with the property that no two tuples t1 and t2 in any legal relation state r of R will have
t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will
cause K not to be a superkey any more.
Figure.1 (a)
Description:
There are three main techniques to achieve first normal form for such a relation:
Expand the key so that there will be a separate tuple in the original DEPARTMENT
relation for each location of a DEPARTMENT, as shown in Figure 1(c). In this case,
the primary key becomes the combination {Dnumber, Dlocation}. This solution
has the disadvantage of introducing redundancy in the relation.
Remove the attribute Dlocations that violates 1NF and place it in a separate
relation DEPT_LOCATIONS along with the primary key Dnumber of DEPARTMENT.
The primary key of this relation is the combination {Dnumber, Dlocation}, as
shown in the below Figure 2. A distinct tuple in DEPT_LOCATIONS exists for each
location of a department. This decomposes the non-1NF relation into two 1NF
relations.
Definition.
The EMP_PROJ relation in Figure 15.10 (a) is in 1NF but is not in 2NF. The nonprime
attribute Ename violates 2NF because of FD2, as do the nonprime attributes Pname
and Plocation because of FD3. The functional dependencies FD2 and FD3 make
Ename, Pname, and Plocation partially dependent on the primary key {Ssn,
Pnumber} of EMP_PROJ, thus violating the 2NF.
Therefore, the functional dependencies FD1, FD2, and FD3 in Figure 15.10(a) lead
to the decomposition of EMP_PROJ into the three relation schemas EP1, EP2, and
EP3 shown in Figure 15.10(a), each of which is in 2NF.
8.3 Third Normal Form:
We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas
ED1 and ED2 shown in Figure 15.10(b). ED1 and ED2 represent independent entity
facts about employees and departments. A NATURAL JOIN operation on ED1 and
ED2 will recover the original relation EMP_DEPT.
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y
is not a candidate key.
Example:
For each subject, each student of that subject is taught only by one teacher.
teacher->subject
S Student
J Subject
T Teacher
Insertion anomaly: if a new faculty member (say Faculty5) joins and no subject is assigned,
the faculty member cannot be inserted, as a prime attribute cannot be null.
Deletion anomaly: if the student with id 789 is deleted, then the information about
Faculty4 is also deleted.
This difficulty is caused by the fact that the attribute teacher is a determinant but
not a candidate key. Whenever a non-prime attribute determines one or more
prime attributes, the relation violates BCNF. Here teacher (a non-prime attribute)
determines subject (a prime attribute).
The solution to the problem is to split or decompose the original relation by two
BCNF projections as below:
ST{student, teacher}
TJ{Teacher, subject}
Student Teacher
123 Faculty1
123 Faculty2
456 Faculty3
789 Faculty4
999 Faculty1
Teacher Subject
Faculty1 Physics
Faculty2 Music
Faculty3 Biology
Faculty4 Physics
The anomalies can be overcome by decomposing the relation SJT into two
relations ST and TJ.
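That this decomposition loses no information can be checked with a small Python sketch. The SJT rows below are an assumption: the original SJT table is not shown in the notes, so it is reconstructed here by combining the ST and TJ tables listed above.

```python
# SJT(student, subject, teacher), RECONSTRUCTED from the ST and TJ tables.
sjt = {
    (123, "Physics", "Faculty1"),
    (123, "Music", "Faculty2"),
    (456, "Biology", "Faculty3"),
    (789, "Physics", "Faculty4"),
    (999, "Physics", "Faculty1"),
}

# Projections; using sets removes duplicate tuples automatically
st = {(s, t) for s, j, t in sjt}   # ST{student, teacher}
tj = {(t, j) for s, j, t in sjt}   # TJ{teacher, subject}

# Natural join of ST and TJ on the common attribute teacher
rejoined = {(s, j, t) for s, t in st for t2, j in tj if t == t2}

print(len(st), len(tj))   # 5 4
print(rejoined == sjt)    # True
```

The join produces no spurious tuples because teacher determines subject, so each ST tuple matches exactly one TJ tuple. After the decomposition, deleting student 789 from ST leaves the tuple (Faculty4, Physics) in TJ untouched.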
9. MULTIVALUED DEPENDENCIES
An MVD is represented as X-->>Y.
A tuple in this Emp relation represents the fact that an employee whose name is Ename
works on the project whose name is Pname and has a dependent whose name is Dname.
Ename-->>Pname|Dname.
The employee with Ename ‘Smith’ works on projects with Pname X and Y and has 2
dependents with Dname ‘John’ and ‘Anna’. If we stored only the first two tuples in Emp
(< ‘Smith’, ‘X’, ‘John’ > and < ‘Smith’, ‘Y’, ‘Anna’ >), we would incorrectly show
associations between project X and John and between project Y and Anna. These should
not be conveyed, because no such meaning is intended in this relation.
Hence we must store the other 2 tuples (< ‘Smith’, ‘X’, ‘Anna’ >) and (<
‘Smith’, ‘Y’, ‘John’ >) to show that {X, Y} and {John, Anna} are associated only
with Smith, i.e., there is no association between Pname and Dname, which means that the
two attributes are independent.
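The requirement that all four tuples be stored can be tested mechanically. The sketch below checks the MVD Ename-->>Pname on rows of (Ename, Pname, Dname) using the Smith example; the function name mvd_holds is an assumption of this sketch.

```python
def mvd_holds(rows):
    """Check Ename -->> Pname on a set of (Ename, Pname, Dname) tuples.
    For every two tuples with the same Ename, the tuple that takes Pname
    from the first and Dname from the second must also be present."""
    return all((e1, p1, d2) in rows
               for e1, p1, d1 in rows
               for e2, p2, d2 in rows
               if e1 == e2)

first_two = {("Smith", "X", "John"), ("Smith", "Y", "Anna")}
all_four = first_two | {("Smith", "X", "Anna"), ("Smith", "Y", "John")}

print(mvd_holds(first_two))  # False
print(mvd_holds(all_four))   # True
```

With only the first two tuples the check fails because (Smith, X, Anna) is missing, which is exactly the unintended association between projects and dependents described above.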
An MVD X-->>Y in R is trivial if Y is a subset of X or X U Y = R.
An MVD that does not satisfy either of the above conditions is nontrivial.
9.1 FOURTH NORMAL FORM:
The Emp relation in the example is not in 4NF because in the nontrivial MVDs
Ename-->>Pname and Ename-->>Dname, Ename is not a superkey of Emp.
Emp_proj Emp_dep
Therefore, the emp relation is decomposed into Emp_proj and Emp_dep as given
below:
Join dependencies constrain the set of legal relations over a schema R to those relations
for which a given decomposition is a lossless-join decomposition.
if:
A relation schema R is in fifth normal form (5NF) (or Project-Join Normal Form
(PJNF)) with respect to a set F of functional, multivalued, and join dependencies
if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by
F), every Ri is a superkey of R.
Supply:
The natural join of all three produces the state of the original relation.
R1 R2 R3
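A join dependency over three projections can be illustrated with a small Python sketch. The Supply rows and the attribute names (Sname, Part_name, Proj_name) below are illustrative assumptions, not data from these notes. The sketch shows that joining only two projections produces spurious tuples, while joining all three restores the original relation.

```python
# Illustrative Supply tuples, as (Sname, Part_name, Proj_name).
supply = {
    ("Smith", "Bolt", "ProjX"),
    ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"),
    ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"),
    ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
}

# The three binary projections.
r1 = {(sname, part) for (sname, part, _) in supply}  # (Sname, Part_name)
r2 = {(sname, proj) for (sname, _, proj) in supply}  # (Sname, Proj_name)
r3 = {(part, proj) for (_, part, proj) in supply}    # (Part_name, Proj_name)

# Joining only R1 and R2 on Sname.
join_r1_r2 = {
    (sname, part, proj)
    for (sname, part) in r1
    for (r2_sname, proj) in r2
    if sname == r2_sname
}

# Joining the result with R3 on (Part_name, Proj_name).
join_all = {
    (sname, part, proj)
    for (sname, part, proj) in join_r1_r2
    if (part, proj) in r3
}

print("Spurious after two-way join:", sorted(join_r1_r2 - supply))
print("Three-way join equals Supply:", join_all == supply)
```

With this sample data the two-way join contains two tuples that are not in Supply, and the third projection filters them out.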
10. What are normal forms? Explain the types of normal form with an example. (CO3, K2)
11. What are the pitfalls in relational database design? With a suitable example,
explain the role of functional dependency in the process of normalization. (CO3, K2)
15. Explain the principles of: (i) lossless-join decomposition (ii) join dependencies
(iii) fifth normal form. (CO3, K2)
13. SUPPORTIVE ONLINE CERTIFICATION COURSES
15. CONTENT BEYOND SYLLABUS
In the hierarchical model, each child node must have only one parent, but a parent
node can have more than one child. Multiple parents are not allowed. This is the major
difference between the hierarchical and network database models. The first node of the
tree is called the root node. When data needs to be retrieved, the tree is traversed
starting from the root node. This model represents one-to-many relationships.
Let us see one example: Let us assume that we have a main directory which
contains other subdirectories. Each subdirectory contains more files and directories.
Each directory or file can be in one directory only, i.e. it has only one parent.
Here A is the main directory, i.e. the root node. B1 and B2 are its children, or
subdirectories. B1 and B2 also have two children C1, C2 and C2, C3 respectively.
They may be directories or other files. This depicts one-to-many relationships.
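The directory example can be sketched in Python by storing, for every node, a pointer to its single parent. Retrieval then starts at the root and walks down the tree. The node names under B2 (C3 and C4) are chosen here so that every node has exactly one parent; the function names are assumptions made for this illustration.

```python
# Each node maps to its single parent; the root has no parent.
parent = {
    "A": None,
    "B1": "A",
    "B2": "A",
    "C1": "B1",
    "C2": "B1",
    "C3": "B2",
    "C4": "B2",
}


def children(node):
    """Return the children of a node in name order."""
    return sorted(name for name, par in parent.items() if par == node)


def traverse(node="A"):
    """Depth-first walk of the tree, starting from the given node."""
    order = [node]
    for child in children(node):
        order.extend(traverse(child))
    return order


def path_from_root(node):
    """Follow parent pointers up to the root to build the single access path."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return "/".join(reversed(path))


print(traverse())            # ['A', 'B1', 'C1', 'C2', 'B2', 'C3', 'C4']
print(path_from_root("C3"))  # A/B2/C3
```

Because a dictionary key can map to only one value, this structure cannot give a node two parents, which mirrors the single-parent rule of the hierarchical model.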
The hierarchical database model was widely used during the mainframe computer
era. Today, it is used mainly for storing file systems and geographic information. It is
used in applications where high performance is required such as telecommunications
and banking. A hierarchical database is also used for Windows Registry in the
Microsoft Windows operating system. It is useful where the following two conditions
are met:
The data in a hierarchical pattern must be accessed through a single path only.
Advantages
Data can be retrieved easily due to the explicit links present between the table
structures.
Referential integrity is always maintained i.e. any changes made in the parent table
are automatically updated in a child table.
High performance.
Disadvantages
If the parent table and child table are unrelated, then adding a new entry in the child
table is difficult because an additional entry must be added in the parent table.
Parent-child relationship: Each child can have only one parent but a parent can
have more than one child.
Pointer: Pointers are used for linking records; they indicate which record is the
parent and which is the child.
Disk input and output is minimized: Parent and child records are stored close to
each other on the storage device, which minimizes hard disk input and output.
Fast navigation: Because parent and child records are stored close to each other,
access time is reduced and navigation becomes faster.
Examples
Let us take an example of college students who take different courses. A course can
be assigned to only a single student, but a student can take as many courses as
they want, so this follows a one-to-many relationship.
Now we can represent the above hierarchical model as relational tables as shown
below:
Student
Course
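As a rough sketch, the Student and Course tables can be created with Python's built-in sqlite3 module, with Student as the parent and Course as the child that carries a foreign key to its single parent. The column names and sample rows are hypothetical and are not taken from these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Parent table.
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# Child table: every course row points to exactly one student.
conn.execute(
    """CREATE TABLE course (
           course_id INTEGER PRIMARY KEY,
           title TEXT NOT NULL,
           student_id INTEGER NOT NULL REFERENCES student(student_id)
       )"""
)

conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [(10, "DBMS", 1), (11, "Networks", 1), (12, "Maths", 2)],
)

# One student, many courses.
rows = conn.execute(
    """SELECT s.name, c.title
       FROM student s JOIN course c ON c.student_id = s.student_id
       ORDER BY s.name, c.title"""
).fetchall()
print(rows)

# A course without an existing parent student is rejected.
try:
    conn.execute("INSERT INTO course VALUES (13, 'Physics', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print("Orphan course rejected:", orphan_rejected)
```

The foreign key plays the role of the parent pointer in the hierarchical model: a child row cannot exist unless its parent row does.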
16. ASSESSMENT SCHEDULE
TEXT BOOKS:
REFERENCES:
Design an E-R model for each of the following and also apply normalization
1) Blood bank management system
Hospitals register to request the blood they need, and donors sign up with the blood bank
to donate blood. Donors are available to donate in particular areas according to their
registered data. A hospital requests blood, and the blood bank provides the details of donors
near that hospital. The blood bank also shows the hospitals which blood groups are available.
Data about the blood donated to hospitals can also be maintained.
2) School management system
Staff details will be stored in the system against a staff id and can be retrieved at any time
by using that id. Student information and student marks will also be stored in the system.
Salary management for the staff members of the school can also be done in this system, and
the fees of the students can be maintained. Another feature will hold section information and
the class teacher of each section.
3) Employee management system
Create a system where the admin is the manager. The manager logs in with his id, adds all
the details about the employees, and can add any new employees who join the organization.
Add a feature to calculate the salaries of the employees based on their designation and
attendance. Add a feature to display the details of all the employees in the organization,
along with the salaries calculated for the current month.
4) Railway system
Users can book train tickets to reach their destination. This option includes things like the
present station, the destination station and the train that they want to travel in. Allow the
user to check the details of a train by using the train id; it must show the arrival time, the
platform at which the train is arriving and the departure time. Also add an option that allows
the user to book a meal while traveling on the train. An option can also be added to show the
price range of the different booking classes such as AC, second class, sleeper, and others.
Try to think of further options yourself.
5) Hospital Data Management
Assign unique IDs to the patients and store the relevant information under them. Add the
patient’s name, personal details, contact number, disease name, and the treatment the patient
is going through. Mention which hospital department the patient is under (such as cardiac,
gastro, etc.). Add information about the hospital’s doctors. A doctor can treat multiple patients,
and he/she would have a unique ID as well. Doctors would also be classified into different
departments. Add the information of ward boys and nurses working in the hospital and
assigned to different rooms. Patients would get admitted into rooms, so add that information to
your database too.