Unit 2
203105251
Prof. Swapnil Umbarkar, Prof. Mitali Acharya, Prof. Kiran Parmar
Assistant Professor
Computer Science and Engineering
UNIT-2
Relational Query Languages
Relational Algebra
• To query the instances of a relational database we have Relational Algebra and Relational Calculus.
• Relational Algebra takes relation instances as input and yields relation instances as output.
• Fundamental operations included in the Relational Algebra:
A) Select
B) Project
C) Union
D) Set Difference
E) Cartesian Product
F) Rename
• Fundamental operations are divided into:
A) Unary Operations namely Select, Project, Rename
B) Binary Operations namely Union, Set Difference, Cartesian Product
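UNARY OPERATIONS:
• 1) Select: Selects the tuples of a relation that satisfy the given predicate. Denoted by sigma (σ). The predicate is written as a subscript of σ.
Example (here, Eno=123 is the predicate and R is any relation with an Eno attribute): σEno=123(R)
• 2) Project: Keeps only the listed attributes of a relation and removes duplicate tuples. Denoted by pi (Π).
• 3) Rename: Gives a new name to a relation or to its attributes. Denoted by rho (ρ).
BINARY OPERATIONS:
• 1) UNION: Performs the set union of two union-compatible relations. Denoted by the symbol ∪.
Example (R and S are any two relations with Eno and name attributes): ΠEno, name(R) ∪ ΠEno, name(S)
• 2) SET DIFFERENCE: Returns the tuples that are present in the first relation but not in the second. Denoted by the symbol −.
Example: ΠEno, name(R) − ΠEno, name(S)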
• 3) CARTESIAN PRODUCT: Combines every tuple of one relation with every tuple of another relation. Denoted by the symbol ×.
Example: σAddress='Vadodara'(Student × Faculties)
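The operations listed above can be tried out on small in-memory relations. The following is a minimal Python sketch, not a database implementation; the relations employee and manager and their rows are made-up sample data, and the function names are illustrative only.

```python
# Relations are modelled as lists of dicts (one dict per tuple).
# All relation names and rows below are hypothetical sample data.

def select(relation, predicate):
    """sigma: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """pi: keep only the listed attributes and drop duplicate tuples."""
    seen, out = set(), []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def union(r, s):
    """r U s for union-compatible relations (same attributes)."""
    return r + [t for t in s if t not in r]

def difference(r, s):
    """r - s: tuples of r that do not appear in s."""
    return [t for t in r if t not in s]

def product(r, s):
    """Cartesian product; assumes r and s share no attribute name."""
    return [{**t, **u} for t in r for u in s]

employee = [{"Eno": 123, "name": "Asha"}, {"Eno": 124, "name": "Ravi"}]
manager = [{"Eno": 124, "name": "Ravi"}, {"Eno": 200, "name": "Meena"}]

picked = select(employee, lambda t: t["Eno"] == 123)
names = project(employee, ["name"])
everyone = union(employee, manager)
only_employees = difference(employee, manager)
pairs = product(project(employee, ["Eno"]), project(manager, ["name"]))
```

Every function takes relations and returns a relation, which is why the operations can be nested freely, as in the product of two projections on the last line.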
Relational Calculus
• Relational Calculus is a non-procedural query language. The user states only what result is required. Relational calculus is never concerned with how the result is obtained, only with what result is obtained.
• Types of Relational Calculus:
A) Tuple Relational Calculus
B) Domain Relational Calculus
A) Tuple Relational Calculus:-
[1]
DDL and DML Constructs
• DDL (Data Definition Language) is the set of SQL commands used to define the structure that holds the data.
Example: CREATE, ALTER, etc.
• DML (Data Manipulation Language), on the other hand, is the set of SQL commands used to manipulate the data itself.
Example: INSERT, UPDATE, DELETE, etc.
Examples of DML:
• INSERT – is used to insert data into a table.
• UPDATE – is used to update existing data within a table.
• DELETE – is used to delete records from a database table.
Examples of DDL commands:
• CREATE – is used to create the database or its objects (such as tables, indexes, functions, views, stored procedures and triggers).
• DROP – is used to delete objects from the database.
• ALTER – is used to alter the structure of the database.
• TRUNCATE – is used to remove all records from a table; the space allocated for the records is released as well.
• COMMENT – is used to add comments to the data dictionary.
• RENAME – is used to rename an object existing in the database.
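The split between DDL and DML can be seen by running a few commands against a throwaway database. This is a small sketch using Python's built-in sqlite3 module with an in-memory database; the lecturer columns follow the ones used in this unit, and SQLite's dialect may differ in details from the DBMS used in class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the table, then change its structure.
cur.execute(
    "CREATE TABLE lecturer (L_id INTEGER PRIMARY KEY, L_name TEXT, subject_code TEXT)"
)
cur.execute("ALTER TABLE lecturer ADD COLUMN phone_no TEXT")

# DML: change the rows stored in the table.
cur.executemany(
    "INSERT INTO lecturer (L_id, L_name, subject_code) VALUES (?, ?, ?)",
    [(454, "ABC", "03485"), (456, "XYZ", "03486")],
)
cur.execute("UPDATE lecturer SET subject_code = '03567' WHERE L_id = 456")
cur.execute("DELETE FROM lecturer WHERE L_id = 454")

rows = cur.execute(
    "SELECT L_id, L_name, subject_code, phone_no FROM lecturer"
).fetchall()

# DDL again: remove the table itself.
cur.execute("DROP TABLE lecturer")
conn.close()
```

The DDL statements change what columns and tables exist, while the DML statements only change which rows those tables contain.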
DDL and DML difference
TABLE 2: lecturer
L_id   L_name   Subject_code
454    ABC      03485
456    XYZ      03486
564    ASD      03567
451    GHJ      03485
563    PQS      03566
SQL Examples
⮚ Select rows from table 1 where the student is not enrolled in AI.
Select * from students
where NOT enrolled_in = 'AI';
⮚ Find the names of students that are 5 characters long and end with t.
Select name from students
where name LIKE '____t' ;
⮚ Find the number of students born in the year 2000.
Select count(DOB) from students
where DOB LIKE '%2000' ;
⮚ Select the id and name of students who are taught by lecturer ASD.
Select students.S_id, students.name from students
INNER JOIN lecturer ON students.L_id = lecturer.L_id
Where lecturer.L_name = 'ASD';
⮚ Update the course enrolled in of student id 303 to TOC.
Update students
set enrolled_in = 'TOC'
where s_id = 303 ;
⮚ Insert record in table 2 : 345, FGH, 03105 .
Insert into lecturer
values(345, 'FGH', '03105') ;
⮚ Select the names of students enrolled in a subject whose code ends with 66.
Select students.name from students
INNER JOIN lecturer ON students.L_id = lecturer.L_id
where lecturer.subject_code LIKE '%66' ;
⮚ Select student names from table students with no redundancy.
Select DISTINCT name from students ;
⮚ Count the number of students in each division and list it in a table.
Select COUNT(Name) , division from students
GROUP BY division ;
⮚ Select students name and id with L_id renamed to FR.
Select s_id, name, L_id AS FR from students ;
⮚ Select students id where L_id is 454 or 561.
Select S_id from students
where L_id = 454 OR L_id = 561 ;
⮚ Give student details with DOB in descending order.
Select * from students
order by DOB desc ;
⮚ Display division where number of students is greater than 1.
Select division , COUNT(*) from students
GROUP BY division
HAVING COUNT(*) > 1 ;
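A condition on a group, such as "more than one student in the division", is normally written with HAVING, because WHERE filters individual rows before grouping. The sketch below runs this with Python's sqlite3 module; the students rows are made-up sample data and only the columns needed here are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (S_id INTEGER, name TEXT, division TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    # Hypothetical rows: two students in B1, one in B8.
    [(301, "Amit", "B1"), (302, "Neha", "B1"), (303, "Kirat", "B8")],
)

# One output row per division.
per_division = conn.execute(
    "SELECT division, COUNT(*) FROM students GROUP BY division ORDER BY division"
).fetchall()

# HAVING keeps only the groups whose count is greater than 1.
busy_divisions = conn.execute(
    "SELECT division, COUNT(*) FROM students "
    "GROUP BY division HAVING COUNT(*) > 1"
).fetchall()
conn.close()
```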
⮚ Delete the record from table 2 where subject code ends with 86.
Delete from lecturer where subject_code LIKE '%86' ;
⮚ Change the default value of column enrolled_in in table 1 to ICT.
Alter table students
alter column enrolled_in SET DEFAULT 'ICT' ;
⮚ Add a new column, Phone_no, in table lecturer.
Alter table lecturer
ADD Phone_no varchar(12) ;
⮚ Use the ANY clause to find the details of students who are taught by any lecturer
whose name ends with 'S'.
Select * from students
where L_id = ANY (Select L_id from lecturer where L_name LIKE '%S') ;
⮚ Select student id if there exists a student in division B1.
Select S_id from students
where EXISTS (Select * from students where division = 'B1' ) ;
⮚ Create a virtual table named VT where lecturer name starts with A.
Create VIEW VT AS
Select * from lecturer
where L_name LIKE 'A%' ;
⮚ Remove primary key S_id from students.
Alter table students
DROP PRIMARY KEY ;
⮚ Count the number of lecturers taking subject 03485.
Select count(L_id) from lecturer
where subject_code = '03485' ;
⮚ Select lecturer details where subject code is not 03102, 03103, 03566.
Select * from lecturer
where subject_code NOT IN ('03102', '03103', '03566') ;
⮚ Select students with DOB later than 21/03/2000 (the date is written as a quoted literal, assuming DOB is a DATE column).
Select * from students
where DOB > '2000-03-21' ;
⮚ Select details of lecturers who are not teaching any subject, i.e., subject_code is NULL.
Select * from lecturer
where subject_code IS NULL ;
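The reason for writing IS NULL rather than = NULL is that a comparison with NULL is never true. A short sketch with Python's sqlite3 module shows the difference; the lecturer with L_id 600 is a made-up row added only to have a NULL subject code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lecturer (L_id INTEGER, L_name TEXT, subject_code TEXT)")
conn.executemany(
    "INSERT INTO lecturer VALUES (?, ?, ?)",
    [(454, "ABC", "03485"), (600, "NEW", None)],  # 600/NEW is hypothetical
)

# "= NULL" evaluates to unknown for every row, so nothing is returned.
with_equals = conn.execute(
    "SELECT L_id FROM lecturer WHERE subject_code = NULL"
).fetchall()

# IS NULL is the test that actually finds missing values.
with_is_null = conn.execute(
    "SELECT L_id FROM lecturer WHERE subject_code IS NULL"
).fetchall()
conn.close()
```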
⮚ Select lecturers who teach subject 03102 or 03485 (a single row cannot hold both codes, so the two conditions are combined with OR).
Select * from lecturer
where subject_code = '03102' OR subject_code = '03485' ;
⮚ Select L_id from table 2 if ALL students in table 1 have division B8.
Select L_id from lecturer
where L_id = ALL (Select L_id from students where division = 'B8') ;
⮚ Drop table lecturer.
DROP table lecturer ;
⮚ Insert values 307, Siya, OS into the table students.
Insert into students(S_id, Name, Enrolled_in)
values(307, 'Siya', 'OS');
Section - 2
Relational Database Design
Domain and data dependency:
• A dependency in a DBMS is a relationship between two or more
attributes. It has the following types in DBMS:
• As the table shows, for each value of Sid there is only
one Sname value. Hence, Sid → Sname is a valid FD.
Types of Functional Dependency:
Trivial FD:
•For example, Consider Table 1.1. Following FDs are Trivial FDs.
• Sid → Sid
• Sname → Sname
• SidSname → Sname
• SidSname → SidSname
Types of Functional Dependency:
Non- Trivial FD:
•Answer: [A]+ = { A, B, C, D, E }
•[AF]+ = { A, B, C, D, E, F, G}
Super Key:
• Example: R(ABCD), F = { A → B, B → C, C → D }
• [A]+ = { A, B, C, D}
Example:
•R(ABCDE), F = { AB → C, C → D, B → E }
•[AB]+ = { A, B, C, D, E}
•Hence, AB is one of the super keys.
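The closure computations in these examples follow one simple loop: keep adding the right-hand side of every FD whose left-hand side is already in the result, until nothing changes. Below is a minimal Python sketch of that loop; the representation of FDs as pairs of strings with single-letter attributes is a convenience chosen here, not notation from the slides.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a string of
    single-letter attribute names, e.g. ("AB", "C") for AB -> C.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_super_key(attrs, schema, fds):
    """X is a super key when X+ contains every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)

# R(ABCD), F = { A -> B, B -> C, C -> D }
fds_1 = [("A", "B"), ("B", "C"), ("C", "D")]
a_plus = closure("A", fds_1)

# F = { AB -> C, C -> D, B -> E }
fds_2 = [("AB", "C"), ("C", "D"), ("B", "E")]
ab_plus = closure("AB", fds_2)
```

With these definitions, is_super_key("A", "ABCD", fds_1) holds, while is_super_key("B", "ABCD", fds_1) does not, since [B]+ never reaches A.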
Example:
•R(ABCDEF), F = { A → D, F → E, C → F, F → A }
•Here, B is not part of non-trivial FD set so it must be part of
candidate key.
•[C]+ = { A, C, D, E, F }
•Candidate key: { BC }
•Prime attributes: B and C
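For small schemas the candidate keys can be checked by brute force: try attribute sets from smallest to largest and keep those whose closure covers the schema and which contain no smaller key. The following Python sketch does this for the example R(ABCDEF), F = { A → D, F → E, C → F, F → A }. It is exponential in the number of attributes, so it is only meant for classroom-sized examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Brute-force search from the smallest attribute sets upwards."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            # A superset of an already found key is not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) >= set(schema):
                keys.append(candidate)
    return keys

fds = [("A", "D"), ("F", "E"), ("C", "F"), ("F", "A")]
keys = candidate_keys("ABCDEF", fds)
prime = set().union(*keys)  # prime attributes: members of some candidate key
```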
Candidate Key:
Example:
•R(ABCDEF), F = { AB → C, E → F, D → C, C → B }
•Here, A, D and E appear on the determinant side only, so they must be part of every candidate key.
•[ADE]+ = { A, B, C, D, E, F }
•Candidate key: { ADE }
•Prime attributes: A, D and E
1. R(ABCDE), F = { AB → B, BC → D, CD → E, E → A }
2. R(ABCD), F = { AB → CD, C → A, D → B }
3. R(ABCDEF), F = { AB → C, C → DE, E → F, EF → B, E → A }
4. R(ABCDEFG), F = { AB → CD, AF → D, DE → F, C → G, F → E, G → A }
Normalization:
• Normalization is the procedure to reduce the Redundancy.
• Problem:
• It is difficult to retrieve the list of customers living in 'Vadodara'
from the above table.
• The reason is that the address attribute is a composite attribute which contains
the name of the area as well as the city name in a single cell.
1NF:
• Solution:
• Divide each composite attribute into a number of sub-attributes and
insert each value in the proper sub-attribute.
Table 1.7: 1 NF Example solution
How can redundancy occur in a given FD X → Y?
• If X → Y is a non-trivial FD and X is NOT a super key, the FD forms redundancy in the relation.
• R(ABCDE), F = { AB → C, C → A, C → D, D → E }
• Super key/Candidate key: { AB, BC }
FD       Determinant X                                   Effect
AB → C   Super key                                       Doesn't form redundancy
C → A    Proper subset of super key (not a super key)    Forms redundancy
C → D    Proper subset of super key (not a super key)    Forms redundancy
D → E    Non-prime attribute (not a super key)           Forms redundancy
X → Y, a non-trivial FD in R with X not a super key, can be one of:
Case 1: Proper subset of CK (X) → Non-prime attribute (Y). E.g. C → D
Case 2: Non-prime attribute (X) → Non-prime attribute (Y). E.g. D → E
Case 3: X not a super key → Prime attribute (Y). E.g. C → A
Whether each case may appear in a relation of the given normal form:
         1NF   2NF   3NF   BCNF
Case 1   Yes   No    No    No
Case 2   Yes   Yes   No    No
Case 3   Yes   Yes   Yes   No
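The case table can be turned into a small classifier. The Python sketch below labels each FD of R(ABCDE), F = { AB → C, C → A, C → D, D → E } with candidate keys AB and BC. Two assumptions are made here: Case 3 is taken to mean that the dependent attribute is prime, and any remaining FD with a non-prime dependent is treated as Case 2. The function name and labels are illustrative only.

```python
def classify(lhs, rhs, candidate_keys):
    """Label a non-trivial FD lhs -> rhs according to the three cases."""
    x, y = set(lhs), set(rhs)
    prime = set().union(*candidate_keys)  # attributes in some candidate key
    if any(set(key) <= x for key in candidate_keys):
        return "super key: no redundancy"
    if y <= prime:
        # Assumed meaning of Case 3: dependent attribute is prime.
        return "case 3: allowed in 3NF, violates BCNF"
    if any(x < set(key) for key in candidate_keys):
        return "case 1: partial dependency, violates 2NF"
    return "case 2: transitive dependency, violates 3NF"

keys = ["AB", "BC"]
labels = {
    "AB->C": classify("AB", "C", keys),
    "C->A": classify("C", "A", keys),
    "C->D": classify("C", "D", keys),
    "D->E": classify("D", "E", keys),
}
```

The highest normal form of the relation is set by its worst FD: here C → D is a Case 1 dependency, so the relation is in 1NF but not in 2NF.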
2NF:
Problem:
• For example, in the case of a joint account, multiple customers share a
common account.
• If an account, say 'A02', is held jointly by two customers, say 'C02'
and 'C04', then the values of attributes balance and bname will
be duplicated in the two tuples of customers 'C02' and 'C04'.
2NF Example:
Solution:
• Decompose the relation in such a way that the resultant relations do not
have any partial FD.
• For this purpose, remove the partially dependent attributes that
violate 2NF from the relation.
• Place them in a separate new relation along with the prime
attribute on which they are fully dependent.
• The primary key of the new relation will be the attribute on which they are
fully dependent.
• Keep the other attributes in the original table with the same primary key.
Normalization:
3NF:
Problem:
• Transitive dependency results in data redundancy.
• In this relation the branch address will be stored repeatedly for each
account of the same branch, which occupies more space.
3NF Example:
Table 1.11 : 3NF Solution
Solution:
1.R(ABCDE), F = { ABC → D, D → A, BD → E}
Question:
Solution:
• UNF
• Lecturer Number, Lecturer Name, Lecturer Grade, Department Code, Department Name, Subject Code, Subject Name, Subject Level
• After removing multi-valued attributes we get 1NF
• Lecturer Number, Lecturer Name, Lecturer Grade, Department Code, Department Name
• Lecturer Number, Subject Code, Subject Name, Subject Level (Partial Dependency)
• After removing partial dependency we get 2NF
• Lecturer Number, Lecturer Name, Lecturer Grade, Department Code, Department Name (Transitive Dependency)
• Lecturer Number, Subject Code
• Subject Code, Subject Name, Subject Level
• After removing transitive dependency we get 3NF
• Lecturer Number, Lecturer Name, Lecturer Grade, Department Code
• Department Code, Department Name
• Lecturer Number, Subject Code
• Subject Code, Subject Name, Subject Level
Decomposition:
• Properties of Decomposition:
[Figure: relation R with FD set F is decomposed into relations R1, R2, R3 with FD sets F1, F2, F3]
Dependency Preserving Decomposition:
In general, a decomposition is dependency preserving when (F1 ∪ F2 ∪ … ∪ Fn)+ = F+.
Example 1:
R2 (BC) : (F2)
[B]+ = { B, C, D, E} [ We can derive C from B. So B → C. A,D,E are
not in R2]
[C]+ = { B, C, D, E} [ We can derive B from C. So C → B. A,D,E are
not in R2]
R3 (CD) : (F3)
[C]+ = { B, C, D, E} [ We can derive D from C. So C → D. A,B,E are
not in R3]
[D]+ = { B, C, D, E} [ We can derive C from D. So D → C. A,B,E are
not in R3]
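{F1 ∪ F2 ∪ F3 ∪ F4 } = {A → B, B → C, C → B, C → D, D → C, D → E }
Now, F = { A → B, B → C, C → D, D → E, D → B }
D → B follows from D → C and C → B, so every FD in F is implied by F1 ∪ F2 ∪ F3 ∪ F4 and the decomposition is dependency preserving.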
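The check in Example 1 can be automated without writing out each Fi. For every FD X → Y in F, start from X and repeatedly add whatever can be derived inside one sub-relation at a time; the FD is preserved if Y is eventually reached. The Python sketch below applies this standard test. It assumes that the decomposition in Example 1 is R1(AB), R2(BC), R3(CD), R4(DE), since only R2 and R3 are shown above; the second call uses an extra textbook-style example that is not from these slides.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves_dependencies(fds, parts):
    """True when every FD in fds is implied by the projections onto parts."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # What can be derived from the known attributes of this part,
                # restricted to the attributes the part actually contains.
                gained = closure(result & set(part), fds) & set(part)
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True

fds = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("D", "B")]
example_1 = preserves_dependencies(fds, ["AB", "BC", "CD", "DE"])

# Extra illustration (not from the slides): R(ABC), F = { AB -> C, C -> B }
# split into AC and BC loses AB -> C.
counter_example = preserves_dependencies([("AB", "C"), ("C", "B")], ["AC", "BC"])
```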
Example 2:
R1 (ABC) : (F1)
[AB]+ = { A, B, C, D} [ We can derive C from A and B. So AB → C. D
is not in R1]
R2 (AD) : (F2)
[A]+ = {A} [ We cannot derive D from A.]
[D]+ = { A,D} [ We can derive A from D. So D → A. Both A and D are in
R2]
R3 (BCD) : (F3)
[B]+ = {B} [ Nothing else can be derived from B.]
[C]+ = {C} [ Nothing else can be derived from C.]
[BD]+ = { A, B, C, D} [ We can derive C from B and D. So BD → C. A
is not in R3]
{F1 ∪ F2 ∪ F3 } = {AB → C, D → A, BD → C }
[Figure: relation R before decomposition; relations R1, R2, R3 after decomposition]
Lossless join Decomposition:
In general,
[R1 ⋈ R2 ⋈ R3 ⋈ …… ⋈ Rn] ⊇ R
R1          R2           R1 ⋈ R2
S1  A       S1  C1       S1  A  C1
S2  B       S1  C2       S1  A  C2
S3  B       S2  C2       S2  B  C2
            S3  C3       S3  B  C3
{Sid}: key
{SidCid}: key
R1 ⋈ R2 = R, so it is a LOSSLESS JOIN DECOMPOSITION.
Lossless join Decomposition:
R1          R2          R1 ⋈ R2
S1  A       A  C1       S1  A  C1
S2  B       A  C2       S1  A  C2
S3  B       B  C2       S2  B  C2
            B  C3       S2  B  C3
                        S3  B  C2
                        S3  B  C3
{Sid}: key
{SidCid}: key
R1 ⋈ R2 ⊃ R, so it is a LOSSY JOIN DECOMPOSITION. The extra tuples (S2, B, C3) and (S3, B, C2) are not in R.
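The two decompositions can be replayed in a few lines of Python: project R onto each sub-relation, join the projections back, and compare with R. The sketch assumes that R holds the four tuples (S1, A, C1), (S1, A, C2), (S2, B, C2), (S3, B, C3) and names the columns Sid, Sname and Cid; the helper functions are illustrative, not part of any library.

```python
def project(rows, attrs):
    """Projection with duplicate removal; each row is a dict."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Join two projected relations on the attributes they share."""
    out = set()
    for t in r1:
        for u in r2:
            a, b = dict(t), dict(u)
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.add(tuple(sorted({**a, **b}.items())))
    return out

R = [
    {"Sid": "S1", "Sname": "A", "Cid": "C1"},
    {"Sid": "S1", "Sname": "A", "Cid": "C2"},
    {"Sid": "S2", "Sname": "B", "Cid": "C2"},
    {"Sid": "S3", "Sname": "B", "Cid": "C3"},
]
whole = {tuple(sorted(row.items())) for row in R}

# Common attribute Sid is a key of R1(Sid, Sname): nothing extra appears.
lossless = natural_join(project(R, ["Sid", "Sname"]), project(R, ["Sid", "Cid"]))

# Common attribute Sname is not a key of either part: spurious tuples appear.
lossy = natural_join(project(R, ["Sid", "Sname"]), project(R, ["Sname", "Cid"]))
spurious = lossy - whole
```

A lossy join never loses tuples of R; it adds tuples that were not there, which is why the result is a proper superset of R.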
Lossless join Decomposition:
Example:
R(ABCDE), F = { AB → C, C → D, B → E}
A decomposition into R1 and R2 is lossless when R1 ∪ R2 = R and [R1 ∩ R2]+ contains all attributes of R1 or all attributes of R2.
Relation R is decomposed into,
1) D = { ABC, CD }
R1(ABC) ∩ R2(CD) = C
[C]+ = { C,D }
C is a key of R2, but R1 ∪ R2 = ABCD leaves out E, so the join cannot give back R.
2) D = { ABC, DE }
R1(ABC) ∩ R2(DE) = ∅, so the decomposition is lossy.
3) D = { ABC, CDE }
R1(ABC) ∩ R2(CDE) = C
[C]+ = { C,D }
[C]+ contains neither ABC nor CDE, so the decomposition is lossy.
4) D = { ABCD, BE }
R1(ABCD) ∩ R2(BE) = B
[B]+ = { B,E }
[B]+ contains R2, so the decomposition is lossless.
5) D = { ABC, ABDE }
R1(ABC) ∩ R2(ABDE) = AB
[AB]+ = { A,B,C,D,E }
[AB]+ contains R1 and R2, so the decomposition is lossless.
6) D = { ABC, CD, BE }
R1(ABC) ∩ R2(CD) = C
[C]+ = { C,D }
[C]+ contains R2, so R1 ⋈ R2 = ABCD is lossless. Next, ABCD ∩ R3(BE) = B and [B]+ = { B,E } contains R3, so the decomposition is lossless.
7) D = { ABC, CD, DE }
R1(ABC) ∩ R2(CD) = C
[C]+ = { C,D }
[C]+ contains R2, so R1 ⋈ R2 = ABCD is lossless. Next, ABCD ∩ R3(DE) = D and [D]+ = { D } contains neither ABCD nor DE, so the decomposition is lossy.
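For decompositions into more than two relations, the pairwise intersection test has to be applied step by step. A general alternative is the chase (tableau) test, sketched below in Python as a swapped-in technique rather than the method used on the slides: build one row per sub-relation, mark the columns it contains with "a", and use the FDs to equate symbols until some row is all "a" or nothing changes. The function name and symbol labels are illustrative.

```python
def chase_lossless(schema, fds, parts):
    """Chase test: True when joining `parts` always gives back the relation."""
    # One row per part: "a" marks a column the part contains,
    # a unique label such as "b0D" marks a column it does not contain.
    rows = [
        {col: "a" if col in part else f"b{i}{col}" for col in schema}
        for i, part in enumerate(parts)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[c] != r2[c] for c in lhs):
                        continue
                    # r1 and r2 agree on lhs, so they must agree on rhs.
                    for c in rhs:
                        if r1[c] != r2[c]:
                            keep = "a" if "a" in (r1[c], r2[c]) else min(r1[c], r2[c])
                            drop = r2[c] if r1[c] == keep else r1[c]
                            for row in rows:
                                if row[c] == drop:
                                    row[c] = keep
                            changed = True
    return any(all(v == "a" for v in row.values()) for row in rows)

fds = [("AB", "C"), ("C", "D"), ("B", "E")]
decompositions = [
    ["ABC", "CD"], ["ABC", "DE"], ["ABC", "CDE"], ["ABCD", "BE"],
    ["ABC", "ABDE"], ["ABC", "CD", "BE"], ["ABC", "CD", "DE"],
]
results = [chase_lossless("ABCDE", fds, parts) for parts in decompositions]
```

In results, the entries for decompositions 4, 5 and 6 are True and the rest are False; decomposition 1 fails because no sub-relation contains E.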
[Figure: query processing overview; labels: Database Catalog, Data, Statistics about Data]
Steps in Query Processing
1. Parsing and Translation:
• Parsing (Parser):
• Check syntax
• Check schema elements
• Translation (Translator):
• Parse tree → Relational algebra
2. Optimization(Optimizer):
• Communication cost:
• Applicable to distributed/parallel system.
• CPU Cycles:
• Difficult to calculate
• CPU speed improves at much faster rate as compared to
Disk speed
Measures of Query cost
• Disk Access:
• Dominates the total time to execute a query
1. Linear Search(A1)
2. Binary Search(A2)
Selection operation
1. Linear Search(A1):
• This algorithm scans all available blocks and tests every
record to determine whether or not it satisfies
the selection condition.
• Cost(A1) = BR (worst case)
where BR denotes the number of blocks
• If the condition is on a key (primary) attribute, the system
can stop searching as soon as the desired record is found.
• Cost(A1) = BR/2 (average case)
• If the condition is on a non-key attribute, multiple
blocks may contain desired records, so the cost of
scanning those blocks has to be added to the estimate.
• This is slower than Binary Search.
2. Binary Search(A2):
• The file (relation) is ordered on attribute A (primary index).
• Cost(A2) = ⌈log2(BR)⌉
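The two estimates can be compared directly for a given file size. The following Python sketch simply evaluates the formulas above; the file size of 1024 blocks is a hypothetical figure and the function names are illustrative.

```python
import math

def linear_search_cost(blocks, on_key=False):
    """Blocks read by a linear scan (A1) over a file of `blocks` blocks."""
    # On a key attribute the scan can stop at the first match,
    # so about half the blocks are read on average.
    return blocks / 2 if on_key else blocks

def binary_search_cost(blocks):
    """Blocks read by binary search (A2) on a file sorted on the attribute."""
    return math.ceil(math.log2(blocks))

b_r = 1024  # hypothetical number of blocks in the relation
full_scan = linear_search_cost(b_r)
key_scan = linear_search_cost(b_r, on_key=True)
binary = binary_search_cost(b_r)
```

For 1024 blocks the scan reads 1024 blocks (512 with early stop on a key), while binary search reads 10, which is why an ordered file pays off for selections on the ordering attribute.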
• To evaluate a query with several operations we have to evaluate the operations one
by one in the proper order (bottom-to-top execution).
• There are two methods to evaluate a multiple-operation expression:
1. Materialization
2. Pipelining
[Figure: expression tree executed bottom to top; node labels: σBalance<25000, Customer, Account]
Materialization
• Materialization starts at the bottom of the expression and performs
a single operation at a time.
• Each intermediate result is materialized (stored in a temporary relation)
and used as input to evaluate the
next-level operation.
• The cost of materialization can be quite high, as the overall cost is
computed as:
Overall Cost = Sum of costs of individual operations + Cost
of writing intermediate results to the disk
• Disadvantages of materialization are:
• Due to intermediate results, it creates lots of temporary
relations.
• It performs many input/output operations.
Pipelining
• In Pipelining, the output of one operation is passed as input to
another operation. i.e. it forms a queue.
• As the output of one operation is passed to the next operation in
the Pipelines, the number of intermediate temporary relations
will be reduced.
• Performing operations in Pipeline eliminates the cost of writing
and reading temporary relations.
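The difference between the two evaluation methods can be imitated in Python, where a list plays the role of a stored temporary relation and a generator plays the role of a pipelined operator. This is only an analogy under simplifying assumptions (everything is in memory, and the account rows and the acc_no column are made-up sample data).

```python
def scan(rows):
    """Leaf operator: hands out stored tuples one at a time."""
    for row in rows:
        yield row

def select(rows, predicate):
    """Selection operator: passes on the tuples that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row

def project(rows, attrs):
    """Projection operator: keeps only the listed attributes."""
    for row in rows:
        yield {a: row[a] for a in attrs}

account = [
    {"acc_no": "A01", "balance": 12000},
    {"acc_no": "A02", "balance": 40000},
    {"acc_no": "A03", "balance": 8000},
]

# Materialization: each intermediate result is stored in full (temp1)
# before the next operation starts.
temp1 = list(select(scan(account), lambda r: r["balance"] < 25000))
materialized = list(project(temp1, ["acc_no"]))

# Pipelining: the operators are chained, and each tuple flows through
# all of them without any temporary relation being built.
pipeline = project(select(scan(account), lambda r: r["balance"] < 25000), ["acc_no"])
pipelined = list(pipeline)
```

Both strategies return the same tuples; what differs is whether temp1 ever exists.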
E1 ⋈θ E2 = E2 ⋈θ E1
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Note: set difference is not commutative