DBMS Unit 5 HWN
SQL is a language for relational databases. It is based on relational algebra and tuple
relational calculus. SQL comprises both a data definition language and a data manipulation
language. Using the data definition features of SQL, one can design and modify a database
schema, whereas the data manipulation features allow one to store and retrieve data from the
database.
The basic form of an SQL query is:
SELECT [DISTINCT] select-list
FROM from-list
WHERE condition;
Example: SQL query to get the names of students who scored more than 60 marks.
SELECT sname
FROM student
WHERE marks > 60;
This query corresponds to a relational algebra expression involving selection, projection, and
cross product.
The SELECT clause specifies which columns are retained in the result.
Select-list: the list of column names to return.
The FROM clause specifies the cross product of tables.
From-list: the list of table names.
The WHERE clause specifies the selection condition on the tables mentioned in the FROM clause.
Conceptual evaluation strategy:
1. Compute the cross product of the tables in the from-list.
2. Delete the rows of the cross product that fail the condition.
3. Delete the columns that do not appear in the select-list.
4. If DISTINCT is specified, eliminate the duplicate rows.
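The conceptual evaluation strategy can be traced with a small sketch. It is written in plain Python rather than SQL, the `student` rows are made-up sample data, and the helper name `evaluate` is hypothetical. It only illustrates the order of the steps (with duplicates removed only when DISTINCT is requested); a real DBMS uses a far more efficient plan.

```python
from itertools import product

# Hypothetical sample instance of student(sname, marks); not taken from the notes.
student = [
    {"sname": "Asha", "marks": 72},
    {"sname": "Ravi", "marks": 55},
    {"sname": "Asha", "marks": 91},
]

def evaluate(from_list, condition, select_list, distinct=False):
    """Conceptual evaluation of SELECT [DISTINCT] ... FROM ... WHERE ..."""
    # Step 1: cross product of all tables in the from-list.
    # (Assumes column names are distinct across the tables.)
    rows = []
    for combo in product(*from_list):
        merged = {}
        for row in combo:
            merged.update(row)
        rows.append(merged)
    # Step 2: discard rows that fail the WHERE condition.
    rows = [r for r in rows if condition(r)]
    # Step 3: keep only the columns named in the select-list.
    result = [tuple(r[c] for c in select_list) for r in rows]
    # Step 4: remove duplicates only when DISTINCT is requested.
    if distinct:
        result = list(dict.fromkeys(result))
    return result

all_rows = evaluate([student], lambda r: r["marks"] > 60, ["sname"])
unique_rows = evaluate([student], lambda r: r["marks"] > 60, ["sname"], distinct=True)
print(all_rows)     # [('Asha',), ('Asha',)]
print(unique_rows)  # [('Asha',)]
```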
The queries in this unit use the following table definitions.
Sailors (sid: integer, sname: string, rating: integer, age: real)
Boats (bid: integer, bname: string, color: string)
Reserves (sid: integer, bid: integer, day: date)

sid  sname    rating  age
22   Dustin   7       45.0
29   Brutus   1       33.0
31   Lubber   8       55.5
32   Andy     8       25.5
58   Rusty    10      35.0
64   Horatio  7       35.0
71   Zorba    10      16.0
74   Horatio  9       35.0
85   Art      3       25.5
95   Bob      3       63.5
Figure: An instance S3 of Sailors

sid  bid  day
22   101  10/10/98
22   102  10/10/98
22   103  10/8/98
22   104  10/7/98
31   102  11/10/98
31   103  11/6/98
31   104  11/12/98
64   101  9/5/98
64   102  9/8/98
74   103  9/8/98
Figure: An instance R2 of Reserves
Example: Find the names of sailors who have reserved a red boat.
SELECT S.sname
FROM Sailors S
WHERE S.sid IN ( SELECT R.sid
FROM Reserves R
WHERE R.bid IN ( SELECT B.bid
FROM Boats B
WHERE B.color = 'red' ));
Example: Find the names of sailors who have not reserved a red boat.
SELECT S.sname
FROM Sailors S
WHERE S.sid NOT IN ( SELECT R.sid
FROM Reserves R
WHERE R.bid IN ( SELECT B.bid
FROM Boats B
WHERE B.color = 'red' ));
Correlated Nested Queries:
A correlated subquery refers to a column of the outer query. Conceptually it is processed row by
row: the subquery is evaluated once for every row of the outer query.
Example: Find the names of sailors who have reserved boat number 103.
SELECT S.sname
FROM Sailors S
WHERE EXISTS ( SELECT *
FROM Reserves R
WHERE R.bid = 103 AND R.sid = S.sid)
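The correlated query can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the rows are the Sailors S3 and Reserves R2 instances listed earlier, the dates are stored as plain text, and the `ORDER BY` clause is added only to make the output order predictable. The same result is also computed with an uncorrelated `IN` subquery for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
""")
sailors = [
    (22, "Dustin", 7, 45.0), (29, "Brutus", 1, 33.0), (31, "Lubber", 8, 55.5),
    (32, "Andy", 8, 25.5), (58, "Rusty", 10, 35.0), (64, "Horatio", 7, 35.0),
    (71, "Zorba", 10, 16.0), (74, "Horatio", 9, 35.0), (85, "Art", 3, 25.5),
    (95, "Bob", 3, 63.5),
]
reserves = [
    (22, 101, "10/10/98"), (22, 102, "10/10/98"), (22, 103, "10/8/98"),
    (22, 104, "10/7/98"), (31, 102, "11/10/98"), (31, 103, "11/6/98"),
    (31, 104, "11/12/98"), (64, 101, "9/5/98"), (64, 102, "9/8/98"),
    (74, 103, "9/8/98"),
]
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", sailors)
conn.executemany("INSERT INTO Reserves VALUES (?, ?, ?)", reserves)

# Correlated form: the subquery mentions S.sid from the outer query.
correlated = conn.execute("""
    SELECT S.sname FROM Sailors S
    WHERE EXISTS (SELECT * FROM Reserves R
                  WHERE R.bid = 103 AND R.sid = S.sid)
    ORDER BY S.sid
""").fetchall()

# Uncorrelated form: the subquery can be evaluated once on its own.
uncorrelated = conn.execute("""
    SELECT S.sname FROM Sailors S
    WHERE S.sid IN (SELECT R.sid FROM Reserves R WHERE R.bid = 103)
    ORDER BY S.sid
""").fetchall()

print(correlated)  # [('Dustin',), ('Lubber',), ('Horatio',)]
```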
Set-Comparison Operators:
SQL supports the set operators EXISTS, IN, and UNIQUE. It also supports op ANY and op ALL,
where op is one of the arithmetic comparison operators (<, <=, =, <>, >=, >). SOME is also
available, but it is just a synonym for ANY.
The ANY and ALL operators are used with a WHERE or HAVING clause.
The ANY operator returns true if the comparison holds for at least one of the subquery values.
The ALL operator returns true if the comparison holds for all of the subquery values.
SOME: This operator compares a value with the single-column set of values returned by a
subquery. It must be preceded by a comparison operator, and the condition is true if the
comparison holds for at least one value returned by the subquery.
SOME is generally used in a WHERE clause to check whether a column value matches any of the
values returned by the subquery.
Example: Write a query to find the sailors whose rating is greater than 8.
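The meaning of `op ANY` and `op ALL` can be sketched with Python's built-in `any()` and `all()`. Plain Python is used here because SQLite does not implement the ANY/ALL/SOME operators. The ratings come from the Sailors instance S3; the query being imitated (sailors rated higher than some, or every, sailor named Horatio) is an illustrative assumption, not a query from these notes.

```python
# (sname, rating) pairs from the Sailors instance S3.
sailors = [
    ("Dustin", 7), ("Brutus", 1), ("Lubber", 8), ("Andy", 8), ("Rusty", 10),
    ("Horatio", 7), ("Zorba", 10), ("Horatio", 9), ("Art", 3), ("Bob", 3),
]

# Plays the role of: SELECT S2.rating FROM Sailors S2 WHERE S2.sname = 'Horatio'
subquery = [rating for name, rating in sailors if name == "Horatio"]

# WHERE S.rating > ANY (subquery): true if the comparison holds for at least one value.
better_than_some = [name for name, rating in sailors
                    if any(rating > v for v in subquery)]

# WHERE S.rating > ALL (subquery): true only if the comparison holds for every value.
better_than_all = [name for name, rating in sailors
                   if all(rating > v for v in subquery)]

print(subquery)          # [7, 9]
print(better_than_some)  # ['Lubber', 'Andy', 'Rusty', 'Zorba', 'Horatio']
print(better_than_all)   # ['Rusty', 'Zorba']
```

As in SQL, an empty subquery makes the ANY comparison false and the ALL comparison true, which is also how `any([])` and `all([])` behave.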
INNER JOIN: creates a new result table by combining column values of two tables (table1 and
table2) based upon the join predicate. Only pairs of rows that satisfy the predicate appear in
the result.
RIGHT JOIN: returns all rows from the right table, even if there are no matches in the left
table. If the ON clause matches zero rows in the left table for some row of the right table, the
join still returns that row, with NULL in each column from the left table.
In other words, a right join returns all the rows from the right table, plus the matched values
from the left table, or NULL where the join predicate has no match.
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
RIGHT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result
+------+----------+--------+---------------------+
| ID   | NAME     | AMOUNT | DATE                |
+------+----------+--------+---------------------+
|    3 | kaushik  |   3000 | 2009-10-08 00:00:00 |
|    3 | kaushik  |   1500 | 2009-10-08 00:00:00 |
|    2 | Khilan   |   1560 | 2009-11-20 00:00:00 |
|    4 | Chaitali |   2060 | 2008-05-20 00:00:00 |
+------+----------+--------+---------------------+
FULL JOIN: combines the results of both left and right outer joins.
The joined table contains all records from both tables and fills in NULLs for missing
matches on either side. Each matched pair of rows appears only once.
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
FULL JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result
+------+----------+--------+---------------------+
| ID   | NAME     | AMOUNT | DATE                |
+------+----------+--------+---------------------+
|    1 | Ramesh   |   NULL | NULL                |
|    2 | Khilan   |   1560 | 2009-11-20 00:00:00 |
|    3 | kaushik  |   3000 | 2009-10-08 00:00:00 |
|    3 | kaushik  |   1500 | 2009-10-08 00:00:00 |
|    4 | Chaitali |   2060 | 2008-05-20 00:00:00 |
|    5 | Hardik   |   NULL | NULL                |
|    6 | Komal    |   NULL | NULL                |
|    7 | Muffy    |   NULL | NULL                |
+------+----------+--------+---------------------+
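The two outer joins can be reproduced with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the CUSTOMERS and ORDERS rows are reconstructed from the result tables above, the `oid` values are made up, and the dates are stored as text. Because older SQLite versions do not support RIGHT JOIN or FULL JOIN, the right join is written as a LEFT JOIN with the tables swapped, and the full join is emulated as a UNION of the two outer joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (oid INTEGER PRIMARY KEY, customer_id INTEGER,
                     amount INTEGER, date TEXT);
INSERT INTO customers VALUES
    (1, 'Ramesh'), (2, 'Khilan'), (3, 'kaushik'), (4, 'Chaitali'),
    (5, 'Hardik'), (6, 'Komal'), (7, 'Muffy');
-- oid values are hypothetical; only customer_id, amount and date matter here.
INSERT INTO orders VALUES
    (1, 3, 3000, '2009-10-08'), (2, 3, 1500, '2009-10-08'),
    (3, 2, 1560, '2009-11-20'), (4, 4, 2060, '2008-05-20');
""")

# CUSTOMERS RIGHT JOIN ORDERS, written as ORDERS LEFT JOIN CUSTOMERS.
right_join = conn.execute("""
    SELECT c.id, c.name, o.amount, o.date
    FROM orders o LEFT JOIN customers c ON c.id = o.customer_id
""").fetchall()

# FULL JOIN emulated as (left join) UNION (right join). UNION removes the
# matched rows that would otherwise be listed twice.
full_join = conn.execute("""
    SELECT c.id, c.name, o.amount, o.date
    FROM customers c LEFT JOIN orders o ON c.id = o.customer_id
    UNION
    SELECT c.id, c.name, o.amount, o.date
    FROM orders o LEFT JOIN customers c ON c.id = o.customer_id
""").fetchall()

print(len(right_join), len(full_join))  # 4 8
```

Note that UNION would also collapse rows that are genuinely identical, so this emulation is only safe when the selected columns distinguish every joined row, as they do here.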
Complex integrity constraints in SQL:
Integrity constraints over a single table:
We can specify complex integrity constraints over a single table using table constraints, which
have the form CHECK conditional-expression. When a row is inserted into the table or an existing
row is modified, the conditional expression in the CHECK constraint is evaluated. If it evaluates
to false, the command is rejected.
Example:
Ensure that rating is an integer in the range 1 to 10.
CREATE TABLE Sailors ( sid INTEGER,
sname CHAR(10),
rating INTEGER,
age REAL,
PRIMARY KEY (sid),
CHECK (rating >= 1 AND rating <= 10));
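The effect of the CHECK constraint can be observed with Python's built-in `sqlite3` module. In this minimal sketch the second inserted row (sid 99) is a made-up value chosen to violate the constraint; SQLite reports the violation as `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Sailors (
        sid    INTEGER,
        sname  CHAR(10),
        rating INTEGER,
        age    REAL,
        PRIMARY KEY (sid),
        CHECK (rating >= 1 AND rating <= 10)
    )
""")

# A rating inside the range is accepted.
conn.execute("INSERT INTO Sailors VALUES (22, 'Dustin', 7, 45.0)")

# A rating outside the range makes the CHECK false, so the insert is rejected.
try:
    conn.execute("INSERT INTO Sailors VALUES (99, 'Test', 11, 30.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Sailors").fetchone()[0]
print(rejected, row_count)  # True 1
```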
Restrict values:
Example: Enforce the constraint that Interlake boats cannot be reserved.
CREATE TABLE Reserves ( sid INTEGER,
bid INTEGER,
day DATE,
FOREIGN KEY (sid) REFERENCES Sailors,
FOREIGN KEY (bid) REFERENCES Boats,
CONSTRAINT noInterlakes
CHECK ('Interlake' <> ( SELECT B.bname
FROM Boats B
WHERE B.bid = Reserves.bid )));
Domain Constraints and distinct types:
A user can define a new domain using the CREATE DOMAIN statement, which uses check
constraints.
CREATE DOMAIN ratingval INTEGER DEFAULT 1
CHECK (VALUE >=1 AND VALUE <=10)
Assertions:
Table constraints are associated with a single table, although the conditional expression in the
CHECK clause can refer to other tables. Table constraints are required to hold only if the
associated table is nonempty. Thus, when a constraint involves two or more tables, the table
constraint mechanism is not quite what is desired. To cover such situations, SQL supports the
creation of assertions. Assertions are constraints not associated with any one table.
For example, suppose that we wish to enforce the constraint that the number of
boats plus the number of sailors should be less than 100. Because this constraint involves two
tables, it is expressed as an assertion.
Example: The database contains the CUSTOMERS table (id, name, age, address, salary). Create a
row-level trigger on the CUSTOMERS table that displays the salary difference between the old
and new values and that fires for INSERT, UPDATE, or DELETE operations performed on the
CUSTOMERS table.
Drop Trigger: A trigger can be removed from the database using the DROP command.
Syntax: DROP TRIGGER [IF EXISTS] [schema_name.]trigger_name
Example: DROP TRIGGER IF EXISTS display_salary_changes
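A row-level trigger of this kind can be sketched with Python's built-in `sqlite3` module. Several things here are assumptions made for the illustration: SQLite triggers cannot print, so the salary difference is written to a hypothetical `salary_log` table instead of being displayed; the trigger covers only UPDATE (SQLite needs one trigger per event); and the customer row is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
                        address TEXT, salary REAL);
-- Hypothetical log table standing in for "display the salary difference".
CREATE TABLE salary_log (customer_id INTEGER, old_salary REAL,
                         new_salary REAL, difference REAL);

CREATE TRIGGER display_salary_changes
AFTER UPDATE OF salary ON customers
FOR EACH ROW
BEGIN
    INSERT INTO salary_log
    VALUES (OLD.id, OLD.salary, NEW.salary, NEW.salary - OLD.salary);
END;

INSERT INTO customers VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.0);
UPDATE customers SET salary = salary + 500 WHERE id = 1;
""")

log_before_drop = conn.execute("SELECT * FROM salary_log").fetchall()

# After the trigger is dropped, further updates are no longer logged.
conn.execute("DROP TRIGGER IF EXISTS display_salary_changes")
conn.execute("UPDATE customers SET salary = salary + 100 WHERE id = 1")
log_after_drop = conn.execute("SELECT * FROM salary_log").fetchall()

print(log_before_drop)      # [(1, 2000.0, 2500.0, 500.0)]
print(len(log_after_drop))  # 1
```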
Active Databases:
An active database is a database that has a set of triggers associated with it. Such databases
are difficult to maintain because it is hard to understand the combined effect of the triggers.
Before executing a statement that modifies the database, the DBMS first checks whether the
statement activates any trigger.
If a trigger is activated, the DBMS evaluates its condition part and executes its action part
only if the condition evaluates to true. A single statement can activate more than one trigger.
In such a situation, the DBMS processes the triggers in some arbitrary order. The execution of
the action part of a trigger may in turn activate other triggers, or the same trigger that
initiated the action. A trigger that activates itself is called a 'recursive trigger'. The DBMS
executes such chains of triggers.
For example, consider the Hourly_Emps relation. ssn is the key for this relation, and the
hourly_wages attribute is determined by the rating attribute. That is, for a given rating value,
there is only one permissible hourly_wages value. This integrity constraint is an example of a
functional dependency. It leads to possible redundancy in the relation Hourly_Emps.
Redundant storage: the rating value 8 corresponds to the hourly wage 10, and this association is
repeated three times; the rating value 5 corresponds to the hourly wage 7, and this association
is repeated two times.
Problems related to decomposition: Decomposing a relation schema can create more problems
than it solves. Before decomposing a table into several smaller tables, we should understand
why the decomposition is needed and what problems it may introduce.
Example: Relation R(ABC) is decomposed into R1(AB) and R2(BC). Check whether the
decomposition is lossy or lossless.
R(ABC)=
A B C
1 2 1
2 2 2
3 3 2
R1(AB)=
A B
1 2
2 2
3 3
R2(BC)=
B C
2 1
2 2
3 2
R1 X R2=
A B B C
1 2 2 1
1 2 2 2
1 2 3 2
2 2 2 1
2 2 2 2
2 2 3 2
3 3 2 1
3 3 2 2
3 3 3 2
R1 ⋈ R2 =
A B C
1 2 1
1 2 2
2 2 1
2 2 2
3 3 2
R1 ⋈ R2 ≠ R, because the join contains the extra tuples (1, 2, 2) and (2, 2, 1). This
decomposition is lossy.
Example: Relation R(ABC) is decomposed into R1(AB) and R2(AC). Check whether the
decomposition is lossy or lossless.
R(ABC)=
A B C
1 1 1
2 1 2
3 2 1
4 3 2
R(ABC) is decomposed into R1(AB) and R2(AC)
R1=
A B
1 1
2 1
3 2
4 3
R2=
A C
1 1
2 2
3 1
4 2
R1 ⋈ R2 = R
A B C
1 1 1
2 1 2
3 2 1
4 3 2
This decomposition is lossless.
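Both worked examples can be checked mechanically by projecting R onto the two attribute lists, taking the natural join, and comparing the result with R. The helper names `project`, `natural_join`, and `is_lossless_on_instance` are hypothetical. The sketch tests one particular instance only; proving that a decomposition is lossless for every instance requires reasoning with the functional dependencies.

```python
def project(rows, attrs, keep):
    """Projection of rows (tuples over attrs) onto the attributes in keep."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(r[i] for i in idx) for r in rows}

def natural_join(r1, attrs1, r2, attrs2):
    """Natural join; result columns are attrs1 followed by the new columns of attrs2."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    joined = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in common):
                joined.add(t1 + tuple(t2[attrs2.index(a)] for a in extra))
    return joined

def is_lossless_on_instance(rows, attrs, part1, part2):
    r1 = project(rows, attrs, part1)
    r2 = project(rows, attrs, part2)
    joined = natural_join(r1, part1, r2, part2)
    # Reorder the joined columns to match the original attribute order.
    order = part1 + [a for a in part2 if a not in part1]
    reordered = {tuple(t[order.index(a)] for a in attrs) for t in joined}
    return reordered == set(rows)

attrs = ["A", "B", "C"]
first = {(1, 2, 1), (2, 2, 2), (3, 3, 2)}
second = {(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 2)}

print(is_lossless_on_instance(first, attrs, ["A", "B"], ["B", "C"]))   # False
print(is_lossless_on_instance(second, attrs, ["A", "B"], ["A", "C"]))  # True
```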
A decomposition of a relation schema R with FD set F into R1 and R2, with FD sets F1 and F2, is
said to be dependency preserving if the closure of F1 ∪ F2 is equal to the closure of F, that
is, (F1 ∪ F2)+ = F+.
Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies.
2. Non-trivial functional dependency
A → B is a non-trivial functional dependency if B is not a subset of A.
Example:
ID → Name,
Name → DOB
Functional Dependency Set: Functional Dependency set or FD set of a relation is the set of all
FDs present in the relation.
Armstrong Axioms: The Armstrong axioms are a set of inference rules. They are used to find
the closure of a set of functional dependencies (F+) from the given set of functional
dependencies (F).
Given set of functional dependencies (F) = {A → B, A → C, CG → H, CG → I, B → H}
Here we have
A → B, and we also have B → H, so we can write A → H (Transitivity)
CG → H and CG → I, so we can write CG → HI (Union)
A → C and CG → H, so we can write AG → H (Pseudo transitivity)
A → C and CG → I, so we can write AG → I (Pseudo transitivity)
The closure of the set of functional dependencies (F+) therefore includes {A → B, A → C,
CG → H, CG → I, B → H, A → H, CG → HI, AG → H, AG → I}
Example: A relation has attributes R(ABCDEF), and its set of functional dependencies is
F = {A → B, A → C, CD → E, CD → F, B → E}. Find the closure of the set of functional
dependencies for relation R.
Given set of functional dependencies F = {A → B, A → C, CD → E, CD → F, B → E}
Here we have
A → B and A → C, so we can write A → BC (Union)
CD → E and CD → F, so we can write CD → EF (Union)
A → B and B → E, so we can write A → E (Transitivity)
A → C and CD → E, so we can write AD → E (Pseudo transitivity)
A → C and CD → F, so we can write AD → F (Pseudo transitivity)
The closure of the set of functional dependencies (F+) therefore includes {A → B, A → C,
CD → E, CD → F, B → E, A → BC, CD → EF, A → E, AD → F, AD → E}
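The derived dependencies in both examples can be double-checked with the attribute-closure algorithm: X → Y follows from F exactly when Y is contained in the closure of X under F. The sketch below reads the two FD sets as A → B, A → C, CG → H, CG → I, B → H and A → B, A → C, CD → E, CD → F, B → E. The helper names `closure` and `implies` are hypothetical, and attributes are assumed to be single letters so that a string such as "CG" stands for the set {C, G}.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already derivable, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in the closure F+ of fds."""
    return set(rhs) <= closure(lhs, fds)

F1 = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
F2 = [("A", "B"), ("A", "C"), ("CD", "E"), ("CD", "F"), ("B", "E")]

derived_1 = [("A", "H"), ("CG", "HI"), ("AG", "H"), ("AG", "I")]
derived_2 = [("A", "BC"), ("CD", "EF"), ("A", "E"), ("AD", "E"), ("AD", "F")]

print(all(implies(F1, l, r) for l, r in derived_1))  # True
print(all(implies(F2, l, r) for l, r in derived_2))  # True
print(implies(F1, "B", "A"))                         # False
```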
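Minimal cover or canonical cover or irreducible set of functional dependencies:
To find the minimal set of functional dependencies we follow three steps: (1) write every FD
with a single attribute on the right-hand side, (2) remove extraneous attributes from the
left-hand sides, and (3) remove redundant FDs.
Here attribute B is extraneous in AB → C, because from attribute A we can get attribute B, but
from attribute B we can't get attribute A.
Removing the extraneous attribute B, the functional dependency AB → C can be written as A → C.
After removing the extraneous attribute B the FDs are
A → B,
B → C,
A → C
Step 3: we should remove the redundant FDs. Among
A → B,
B → C,
A → C
the dependency A → C is redundant, because it follows from A → B and B → C by transitivity.
The minimal cover is
A → B,
B → C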
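The three steps can be sketched as a short program. The starting set of the worked example is not shown in these notes, so the input {A → B, B → C, AB → C} is an assumption, and `closure` and `minimal_cover` are hypothetical helper names. Attributes are assumed to be single letters. The program may test extraneous attributes in a different order than the hand derivation, but for this input it arrives at the same cover.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: split right-hand sides so each FD has a single attribute on the right.
    work = []
    for lhs, rhs in fds:
        for attr in rhs:
            fd = (frozenset(lhs), attr)
            if fd not in work:
                work.append(fd)
    # Step 2: remove extraneous attributes from left-hand sides.
    changed = True
    while changed:
        changed = False
        for i, (lhs, attr) in enumerate(work):
            for b in sorted(lhs):
                reduced = lhs - {b}
                # b is extraneous if the rest of the left side still derives attr.
                if reduced and attr in closure(reduced, work):
                    work[i] = (reduced, attr)
                    changed = True
                    break
            if changed:
                break
    work = list(dict.fromkeys(work))  # drop duplicates created by step 2
    # Step 3: remove FDs that follow from the remaining ones.
    result = list(work)
    for fd in work:
        rest = [g for g in result if g != fd]
        if fd[1] in closure(fd[0], rest):
            result = rest
    return sorted(("".join(sorted(lhs)), attr) for lhs, attr in result)

cover = minimal_cover([("A", "B"), ("B", "C"), ("AB", "C")])
print(cover)  # [('A', 'B'), ('B', 'C')]
```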
Example 2: Find the canonical cover for the given set of functional dependencies
G = {A → C, AB → C, C → DI, CD → I, EC → AB, EI → C, A → E}.
AB → C
To find the extraneous attribute we should find the closure of A and the closure of B.
Closure of A: {A}+ = {A, C, D, I, E, B}
Closure of B: {B}+ = {B}
Here attribute B is extraneous, so AB → C reduces to
A → C
CD → I
To find the extraneous attribute we should find the closure of C and the closure of D.
Closure of D: {D}+ = {D}
Closure of C: {C}+ = {C, D, I}
Since I is in {C}+, attribute D is extraneous: CD → I reduces to C → I, which already follows
from C → DI.
EI → C
To find the extraneous attribute we should find the closure of E and the closure of I.
Closure of E: {E}+ = {E}
Closure of I: {I}+ = {I}
Here there is no extraneous attribute.
EC → B
To find the extraneous attribute we should find the closure of E and the closure of C.
Closure of E: {E}+ = {E}
Closure of C: {C}+ = {C, D, I}
Here there is no extraneous attribute.
Example 3: Find the canonical cover for the given set of functional dependencies
F = {A → D, E → AD, BC → AD, C → B}
Example 4: Find the canonical cover of F = {A → BC, B → CE, A → E, AC → H, D → B}
Normalization
Normalization is the process of organizing the data in a database. It is used to minimize
redundancy in a relation or set of relations. Normalization divides a larger table into smaller
tables and links them using relationships.
Types of Normal Forms
First Normal Form (1NF):
A relation is in 1NF if every attribute contains only atomic values. An attribute of a table
cannot hold multiple values; it must be single-valued. 1NF does not allow multi-valued
attributes, composite attributes, or their combinations.
The decomposition of the STUDENTT table into 1NF has been shown below:
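Example (2NF): consider the following relation.
STUD_NO  COURSE_NO  COURSE_FEE
1        C1         1000
2        C2         1500
1        C4         2000
4        C3         1000
4        C1         1000
2        C5         2000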
Note that there are many courses having the same course fee.
Here,
COURSE_FEE alone cannot decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO;
Hence,
COURSE_FEE is a non-prime attribute, as it does not belong to the only candidate key
{STUD_NO, COURSE_NO}.
But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO,
which is a proper subset of the candidate key. A non-prime attribute that depends on a proper
subset of the candidate key is a partial dependency, so this relation is not in 2NF.
To convert the above relation to 2NF, we need to split the table into two tables:
Table 1: STUD_NO, COURSE_NO
STUD_NO  COURSE_NO
1        C1
2        C2
1        C4
4        C3
4        C1
2        C5
Table 2: COURSE_NO, COURSE_FEE
COURSE_NO  COURSE_FEE
C1         1000
C2         1500
C4         2000
C3         1000
C5         2000
NOTE: 2NF reduces the redundant data stored in the database. For instance, if 100 students take
course C1, we do not need to store its fee of 1000 in all 100 records; instead, we store once in
the second table that the course fee for C1 is 1000.
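The 2NF split can be checked on the example rows. In this sketch the `enrolment` rows are the (STUD_NO, COURSE_NO, COURSE_FEE) rows read from the two-table listing above, and `holds` is a hypothetical helper that tests whether a functional dependency is satisfied by the given rows (columns are referred to by position).

```python
# (STUD_NO, COURSE_NO, COURSE_FEE) rows of the example relation.
enrolment = [
    (1, "C1", 1000), (2, "C2", 1500), (1, "C4", 2000),
    (4, "C3", 1000), (4, "C1", 1000), (2, "C5", 2000),
]

def holds(rows, lhs, rhs):
    """True if the FD lhs -> rhs (lists of column indexes) holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        # setdefault stores the first value seen for this key; any later
        # row with the same key must agree with it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Partial dependency: COURSE_NO alone determines COURSE_FEE.
print(holds(enrolment, [1], [2]))  # True
# The fee does not determine the course (C1 and C3 both cost 1000).
print(holds(enrolment, [2], [1]))  # False

# 2NF decomposition: project onto the two smaller tables.
table1 = sorted({(stud, course) for stud, course, _ in enrolment})
table2 = sorted({(course, fee) for _, course, fee in enrolment})

# Joining the two tables on COURSE_NO gives back the original rows.
rejoined = sorted((stud, course, fee)
                  for stud, course in table1
                  for course2, fee in table2 if course == course2)
print(len(table2))                    # 5
print(rejoined == sorted(enrolment))  # True
```

Each course fee is now stored once in the second table, however many students take the course.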
Example 2: Assume a school stores data about teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.
TEACHER table
TEACHER_DETAIL table:
TEACHER_SUBJECT table:
TEACHER_ID  SUBJECT
25          Chemistry
25          Biology
47          English
83          Math
83          Computer
A relation is in 3NF if at least one of the following conditions holds for every non-trivial
functional dependency X -> Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Transitive dependency: If A->B and B->C are two FDs, then A->C is called a transitive
dependency.
Example 1: In relation STUDENT
STUD_ID  STUD_NAME  STUD_STATE  STUD_COUNTRY  STUD_AGE
1        RAM        HARYANA     INDIA         20
2        RAM        PUNJAB      INDIA         19
3        SURESH     PUNJAB      INDIA         21
FD set: {STUD_ID -> STUD_NAME, STUD_ID -> STUD_STATE, STUD_STATE ->
STUD_COUNTRY, STUD_ID -> STUD_AGE}
Candidate Key: {STUD_ID}
For this relation, STUD_ID -> STUD_STATE and STUD_STATE -> STUD_COUNTRY are
true. So STUD_COUNTRY is transitively dependent on STUD_ID. It violates the third normal
form.
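To convert it to third normal form, we decompose the relation STUDENT into two relations: one
that keeps the student attributes, shown below, and one that maps each state to its country,
with attributes (STUD_STATE, STUD_COUNTRY).
STUDENT (STUD_ID, STUD_NAME, STUD_STATE, STUD_AGE)
STUD_ID  STUD_NAME  STUD_STATE  STUD_AGE
1        RAM        HARYANA     20
2        RAM        PUNJAB      19
3        SURESH     PUNJAB      21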
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 INDIA
364 UK
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing
Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
Now this is in BCNF, because the left-hand side of both functional dependencies is a key.
Fourth normal form (4NF):
A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued
dependency.
For a dependency A → B, if for a single value of A multiple values of B exist, then the
dependency is a multi-valued dependency.
Example
STUDENT RELATION
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities. Hence, there is no relationship between COURSE and HOBBY.
So to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE, STUDENT_HOBBY.
Student_Course
STU_ID  COURSE
21      COMPUTER
21      MATH
34      CHEMISTRY
74      BIOLOGY
59      PHYSICS
Student_Hobby
STU_ID  HOBBY
21      DANCING
21      SINGING
34      DANCING
74      CRICKET
59      HOCKEY
Fifth normal form (5NF):
A relation is in 5NF if it is in 4NF, does not contain any join dependency, and can be
rejoined without loss.
5NF is satisfied when all the tables are broken into as many tables as possible in
order to avoid redundancy.
5NF is also known as project-join normal form (PJ/NF).
In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't
take the Math class for Semester 2. In this case, the combination of all these fields is required
to identify valid data. Suppose we add a new semester, Semester 3, but do not yet know the
subject or who will be taking it, so we would leave Lecturer and Subject as NULL. But all three
columns together act as the primary key, so we can't leave the other two columns blank.
So to bring the above table into 5NF, we can decompose it into three relations P1, P2 and P3:
Table P1
SEMESTER    SUBJECT
Semester 1  COMPUTER
Semester 1  MATH
Semester 2  MATH
Semester 1  CHEMISTRY
Table P2
SUBJECT    LECTURER
COMPUTER   Ansika
COMPUTER   John
MATH       John
MATH       Akash
CHEMISTRY  Praveen
Table P3
SEMESTER    LECTURER
Semester 1  Ansika
Semester 1  John
Semester 2  Akash
Semester 1  Praveen
Example: Relation R has attributes R(ABCDEF) and the set of FDs {AB → C, C → D, C → E, E → F,
F → A}. Check the highest normal form of relation R.
2NF: The relation should not contain any partial dependency, that is, an FD whose L.H.S. is a
proper subset of a candidate key and whose R.H.S. is a non-prime attribute.
It is not in 2NF because C is a proper subset of a candidate key and D is a non-prime attribute
(C → D).
The highest normal form of the given relation R is first normal form (1NF).
Example: Check the highest normal form of the Student relation R, whose set of functional
dependencies F is {Rollno → Name, Rollno → Voterid, Voterid → age, Voterid → Rollno}.
To check the highest normal form, we can check from highest to lowest.
BCNF: The L.H.S. of every functional dependency is a candidate key or super key of the relation.
The candidate keys of the relation are Rollno and Voterid.
The above relation is in BCNF because the L.H.S. of every functional dependency is a candidate
key.
Example: R (A, B, C) with the set of FDs F = {AB → C, C → A}. Show that R is in 3NF but not in
BCNF.
The above relation is in 3NF because AB is a candidate key in the FD AB → C and A is a prime
attribute in the FD C → A.
The above relation is not in BCNF because C is not a super key in the FD C → A of
relation R.
Step 1. As we can see, (AC)+ = {A, C, B, E, D}, but no proper subset of AC can determine all
attributes of the relation, so AC is a candidate key. A and C cannot be derived from any other
attribute of the relation, so there is only one candidate key, {AC}.
Step 2. Prime attributes are those attributes that are part of a candidate key ({A, C} in this
example); the others are non-prime ({B, D, E} in this example).
Step 3. The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or
composite attributes.
The relation is in 2nd normal form because BC->D satisfies 2NF (BC is not a proper
subset of candidate key AC), AC->BE satisfies 2NF (AC is the candidate key), and B->E
satisfies 2NF (B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because in BC->D neither is BC a super key nor is D a
prime attribute, and in B->E neither is B a super key nor is E a prime attribute. To satisfy 3rd
normal form, either the LHS of an FD should be a super key or the RHS should be a prime attribute.
So the highest normal form of the relation is 2nd normal form.
Example: Find the highest normal form of R (A, B, C, D, E) under the following functional
dependencies.
ABC -> D, CD -> AE
1) It is always a good idea to start checking from BCNF, then 3NF, and so on.
2) If a functional dependency satisfies a normal form, there is no need to check it for
lower normal forms.
For example, ABC -> D is in BCNF (note that ABC is a super key), so there is no need to check
this dependency for lower normal forms.
Candidate keys in the given relation are {ABC, BCD}.
BCNF: ABC -> D is in BCNF. Let us check CD -> AE: CD is not a super key, so this dependency
is not in BCNF. So R is not in BCNF.
3NF: We don't need to check ABC -> D, as it already satisfies BCNF. Let us consider CD -> AE.
Since E is not a prime attribute, the relation is not in 3NF.
2NF: In 2NF, we need to check for partial dependencies. CD is a proper subset of the candidate
key BCD and it determines E, which is a non-prime attribute. So the given relation is also not
in 2NF. The highest normal form is therefore 1NF.
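The checking order used in these examples (BCNF first, then 3NF, then 2NF) can be sketched as a small program. The helper names `closure`, `candidate_keys`, and `highest_normal_form` are hypothetical, attributes are assumed to be single letters, and candidate keys are found by brute force over all attribute subsets, which is only practical for small textbook-sized relations.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) == set(attrs):
                keys.append(subset)
    return keys

def highest_normal_form(attrs, fds):
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    # Non-trivial FDs with a single attribute on the right-hand side.
    split = [(set(lhs), a) for lhs, rhs in fds for a in rhs if a not in lhs]

    def is_superkey(x):
        return closure(x, fds) == set(attrs)

    if all(is_superkey(lhs) for lhs, _ in split):
        return "BCNF"
    if all(is_superkey(lhs) or a in prime for lhs, a in split):
        return "3NF"
    # 2NF: no proper subset of a candidate key may determine a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if (closure(part, fds) - set(part)) - prime:
                    return "1NF"
    return "2NF"

print(highest_normal_form("ABCDE", [("ABC", "D"), ("CD", "AE")]))             # 1NF
print(highest_normal_form("ABC", [("AB", "C"), ("C", "A")]))                  # 3NF
print(highest_normal_form("ABCDE", [("BC", "D"), ("AC", "BE"), ("B", "E")]))  # 2NF
```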