Relational Algebra and SQL
Chapter 6
Relational Query Languages
Languages for describing queries on a
relational database
Structured Query Language (SQL)
Predominant application-level query language
Declarative
Relational Algebra
Intermediate language used within DBMS
Procedural
What is an Algebra?
A language based on operators and a domain of values
Operators map values taken from the domain into
other domain values
Hence, an expression involving operators and
arguments produces a value in the domain
When the domain is the set of all relations (and the
operators are as described later), we get the relational
algebra
We refer to the expression as a query and the value
produced as the query result
Relational Algebra
Domain: set of relations
Basic operators: select, project, union, set
difference, Cartesian product
Derived operators: set intersection, division, join
Procedural: Relational expression specifies query
by describing an algorithm (the sequence in which
operators are applied) for determining the result of
an expression
Relational Algebra in a DBMS
[Figure: query processing inside the DBMS.
Stages: SQL query → relational algebra expression → optimized relational
algebra expression → query execution plan → executable code.
Components: parser, query optimizer, code generator]
Select Operator
Produce table containing subset of rows of
argument table satisfying condition
σ_{condition}(relation)
Example: σ_{Hobby='stamps'}(Person)

Person
Id    Name  Address    Hobby
1123  John  123 Main   stamps
1123  John  123 Main   coins
5556  Mary  7 Lake Dr  hiking
9876  Bart  5 Pine St  stamps

σ_{Hobby='stamps'}(Person)
Id    Name  Address    Hobby
1123  John  123 Main   stamps
9876  Bart  5 Pine St  stamps
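A minimal Python sketch of the select operator, assuming a relation is modeled as a set of tuples. The names `select`, `PERSON`, and `HOBBY` are illustrative and do not come from the slides; the data is the Person table shown above.

```python
# Person(Id, Name, Address, Hobby) as a set of tuples, copied from the slide.
PERSON = {
    (1123, "John", "123 Main", "stamps"),
    (1123, "John", "123 Main", "coins"),
    (5556, "Mary", "7 Lake Dr", "hiking"),
    (9876, "Bart", "5 Pine St", "stamps"),
}

HOBBY = 3  # position of the Hobby attribute in a Person tuple


def select(relation, condition):
    """Return the subset of rows for which condition(row) is true."""
    return {row for row in relation if condition(row)}


# sigma_{Hobby='stamps'}(Person)
stamp_collectors = select(PERSON, lambda row: row[HOBBY] == "stamps")
```

The result has the same columns as the argument; only the set of rows shrinks.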
Selection Condition
Operators: <, ≤, ≥, >, =, ≠
Simple selection condition:
<attribute> operator <constant>
<attribute> operator <attribute>
<condition> AND <condition>
<condition> OR <condition>
NOT <condition>
Selection Condition - Examples
σ_{NOT(Hobby='hiking')}(Person)
σ_{Hobby≠'hiking'}(Person)
Project Operator
Produces table containing subset of columns
of argument table
π_{attribute list}(relation)
Example:
π_{Name,Hobby}(Person)
Expressions
π_{Id,Name}(σ_{Hobby='stamps' OR Hobby='coins'}(Person))
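The composed expression can be sketched in Python, assuming relations are sets of tuples with a separate attribute list. The helpers `select` and `project` and the name `ATTRS` are illustrative, not part of the slides. Because the result is a set, projection removes duplicate rows, as relational algebra requires.

```python
ATTRS = ("Id", "Name", "Address", "Hobby")
PERSON = {
    (1123, "John", "123 Main", "stamps"),
    (1123, "John", "123 Main", "coins"),
    (5556, "Mary", "7 Lake Dr", "hiking"),
    (9876, "Bart", "5 Pine St", "stamps"),
}


def select(relation, condition):
    """Keep the rows for which condition(row) is true."""
    return {row for row in relation if condition(row)}


def project(relation, schema, attrs):
    """Keep only the listed attributes; duplicates vanish because the result is a set."""
    positions = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in positions) for row in relation}


hobby = ATTRS.index("Hobby")
# Inner expression first (select), then the outer one (project).
collectors = select(PERSON, lambda row: row[hobby] in ("stamps", "coins"))
result = project(collectors, ATTRS, ("Id", "Name"))
```

John satisfies the condition twice (stamps and coins) but appears only once in the result.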
Set Operators
Relation is a set of tuples => set operations
should apply
Result of combining two relations with a set
operator is a relation => all its elements
must be tuples having same structure
Hence, scope of set operations limited to
union compatible relations
Union Compatible Relations
Two relations are union compatible if
Both have same number of columns
Names of attributes are the same in both
Attributes with the same name in both relations
have the same domain
Union compatible relations can be
combined using union, intersection, and set
difference
Example
Tables:
Person (SSN, Name, Address, Hobby)
Professor (Id, Name, Office, Phone)
are not union compatible. However
π_{Name}(Person) and π_{Name}(Professor)
are union compatible and
π_{Name}(Person) − π_{Name}(Professor)
makes sense.
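A small Python sketch of the union-compatibility check for this example. It assumes a schema can be modeled as a mapping from attribute names to domains, with Python types standing in for domains; the helper names and the instance data are hypothetical.

```python
# Schemas map attribute names to domains (Python types are stand-ins for domains).
PERSON_SCHEMA = {"SSN": int, "Name": str, "Address": str, "Hobby": str}
PROFESSOR_SCHEMA = {"Id": int, "Name": str, "Office": str, "Phone": str}


def union_compatible(schema1, schema2):
    """Same attribute names, each with the same domain (so also the same column count)."""
    return schema1 == schema2


def project_schema(schema, attrs):
    """Schema of the projection onto attrs."""
    return {a: schema[a] for a in attrs}


names_compatible = union_compatible(
    project_schema(PERSON_SCHEMA, ["Name"]),
    project_schema(PROFESSOR_SCHEMA, ["Name"]),
)

# Hypothetical instances of the two projections on Name.
person_names = {("John",), ("Mary",), ("Bart",)}
professor_names = {("Mary",), ("Smith",)}
non_professors = person_names - professor_names  # set difference
```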
Cartesian Product
If R and S are two relations, R × S is the set of all
concatenated tuples <x,y>, where x is a tuple in R
and y is a tuple in S
(R and S need not be union compatible)
R × S is expensive to compute:
Factor of two in the size of each row
Quadratic in the number of rows

R
a   b
x1  x2
x3  x4

S
c   d
y1  y2
y3  y4

R × S
a   b   c   d
x1  x2  y1  y2
x1  x2  y3  y4
x3  x4  y1  y2
x3  x4  y3  y4
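The product on this slide can be reproduced with a short Python sketch, assuming relations are sets of tuples. The function name `cartesian_product` is illustrative.

```python
from itertools import product

R = {("x1", "x2"), ("x3", "x4")}  # attributes a, b
S = {("y1", "y2"), ("y3", "y4")}  # attributes c, d


def cartesian_product(r, s):
    """Concatenate every tuple of r with every tuple of s."""
    return {x + y for x, y in product(r, s)}


RS = cartesian_product(R, S)
```

The result has |R| * |S| rows, and each row is as wide as a row of R plus a row of S, which is why the slide calls the operator expensive.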
Renaming
Result of expression evaluation is a relation
Attributes of relation must have distinct names.
This is not guaranteed with Cartesian product
e.g., suppose in previous example a and c have the
same name
Renaming operator tidies this up. To assign the
names A1, A2, …, An to the attributes of the n-column
relation produced by expression expr use
expr [A1, A2, …, An]
Example
Equijoin Join - Example
Equijoin: Join condition is a conjunction of equalities.
π_{Name,CrsCode}(Student ⋈_{Id=StudId} σ_{Grade='A'}(Transcript))

Student
Id   Name  Addr  Status
111  John  ..    ..
222  Mary  ..    ..
333  Bill  ..    ..
444  Joe   ..    ..

Transcript
StudId  CrsCode  Sem  Grade
111     CSE305   S00  B
222     CSE306   S99  A
333     CSE304   F99  A

Result
Name  CrsCode
Mary  CSE306
Bill  CSE304

The equijoin is used very frequently since it combines
related data in different relations.
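The equijoin example can be traced step by step in Python. This is a sketch under the assumption that relations are sets of tuples; the Addr and Status values are elided on the slide and are kept here as the placeholder "..".

```python
# Student(Id, Name, Addr, Status); ".." marks values elided on the slide.
STUDENT = {
    (111, "John", "..", ".."),
    (222, "Mary", "..", ".."),
    (333, "Bill", "..", ".."),
    (444, "Joe", "..", ".."),
}
# Transcript(StudId, CrsCode, Sem, Grade)
TRANSCRIPT = {
    (111, "CSE305", "S00", "B"),
    (222, "CSE306", "S99", "A"),
    (333, "CSE304", "F99", "A"),
}

# Step 1: select the transcript rows with Grade = 'A'.
a_grades = {t for t in TRANSCRIPT if t[3] == "A"}

# Step 2: equijoin on Id = StudId (concatenate the matching pairs).
joined = {s + t for s in STUDENT for t in a_grades if s[0] == t[0]}

# Step 3: project onto Name (position 1) and CrsCode (position 5).
result = {(row[1], row[5]) for row in joined}
```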
Natural Join
Special case of equijoin:
join condition equates all and only those attributes with the
same name (condition doesn't have to be explicitly stated)
duplicate columns eliminated from the result
Transcript ⋈ Teaching =
π_{StudId, Transcript.CrsCode, Transcript.Sem, Grade, ProfId}(
Transcript ⋈_{CrsCode=CrsCode AND Sem=Sem} Teaching)
[StudId, CrsCode, Sem, Grade, ProfId]
Natural Join (cont)
More generally:
R ⋈ S = π_{attr-list}(σ_{join-cond}(R × S))
where
attr-list = attributes(R) ∪ attributes(S)
(duplicates are eliminated) and join-cond has
the form:
R.A1 = S.A1 AND … AND R.An = S.An
where
{A1, …, An} = attributes(R) ∩ attributes(S)
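The general definition can be sketched in Python, assuming each row is a dict from attribute name to value so that the shared attributes can be found by name. The function `natural_join` and the Teaching rows are hypothetical; the Transcript rows follow the earlier example.

```python
def natural_join(r, s):
    """Join rows that agree on every attribute name the two relations share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])  # attributes(R) ∩ attributes(S)
    result = []
    for x in r:
        for y in s:
            if all(x[a] == y[a] for a in common):
                row = {**x, **y}  # shared columns appear only once
                if row not in result:  # keep set semantics
                    result.append(row)
    return result


TRANSCRIPT = [
    {"StudId": 111, "CrsCode": "CSE305", "Sem": "S00", "Grade": "B"},
    {"StudId": 222, "CrsCode": "CSE306", "Sem": "S99", "Grade": "A"},
]
# Hypothetical Teaching(ProfId, CrsCode, Sem) rows.
TEACHING = [
    {"ProfId": 7, "CrsCode": "CSE306", "Sem": "S99"},
    {"ProfId": 8, "CrsCode": "CSE305", "Sem": "F99"},
]

joined = natural_join(TRANSCRIPT, TEACHING)
```

Only the CSE306 row survives: CSE305 matches on CrsCode but not on Sem, and the join condition requires agreement on both.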
Natural Join Example
List all Ids of students who took at least
two different courses:
π_{StudId}(σ_{CrsCode≠CrsCode2}(
Transcript ⋈
Transcript[StudId, CrsCode2, Sem2, Grade2]))
Division
Goal: Produce the tuples in one relation, r,
that match all tuples in another relation, s
r (A1, …, An, B1, …, Bm)
s (B1, …, Bm)
r/s, with attributes A1, …, An, is the set of all tuples
<a> such that for every tuple <b> in s, <a,b> is
in r
Can be expressed in terms of projection, set
difference, and cross-product
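One standard way to express division with those three operators is r/s = π_A(r) − π_A((π_A(r) × s) − r). The Python sketch below follows that identity, assuming relations are sets of tuples and that the first n components of each r-tuple are the A attributes. The function name `divide` and the sample data are hypothetical.

```python
def divide(r, s, n):
    """r/s, where the first n components of each tuple in r are the A attributes."""
    a_part = {t[:n] for t in r}                        # projection of r onto A
    all_pairs = {a + b for a in a_part for b in s}     # cross-product with s
    missing = all_pairs - r                            # pairs <a,b> absent from r
    disqualified = {t[:n] for t in missing}            # a-values lacking some b
    return a_part - disqualified                       # set difference


# Hypothetical data: (StudId, CrsCode) pairs for passed courses.
passed = {(1, "CSE305"), (1, "CSE306"), (2, "CSE305")}
offered = {("CSE305",), ("CSE306",)}

passed_all = divide(passed, offered, 1)
```

Student 2 is disqualified because the pair (2, CSE306) is in the cross-product but not in `passed`.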
Division (cont)
Division - Example
List the Ids of students who have passed all
courses that were taught in spring 2000
Numerator: StudId and CrsCode for every
course passed by every student
π_{StudId, CrsCode}(σ_{Grade≠'F'}(Transcript))
Denominator: CrsCode of all courses
taught in spring 2000
π_{CrsCode}(σ_{Semester='S2000'}(Teaching))
Result is numerator/denominator
Schema for Student Registration System
Query Sublanguage of SQL
SELECT C.CrsName
FROM Course C
WHERE C.DeptId = 'CS'
Tuple variable C ranges over rows of Course.
Evaluation strategy:
FROM clause produces Cartesian product of listed tables
WHERE clause assigns rows to C in sequence and produces
table containing only rows satisfying condition
SELECT clause retains listed columns
Equivalent to: π_{CrsName}(σ_{DeptId='CS'}(Course))
Join Queries
SELECT C.CrsName
FROM Course C, Teaching T
WHERE C.CrsCode=T.CrsCode AND T.Sem='S2000'
List courses taught in S2000
Tuple variables clarify meaning.
Join condition C.CrsCode=T.CrsCode
eliminates garbage (pairs of unrelated rows)
Selection condition T.Sem='S2000'
eliminates irrelevant rows
Equivalent (using natural join) to:
π_{CrsName}(Course ⋈ σ_{Sem='S2000'}(Teaching))
π_{CrsName}(σ_{Sem='S2000'}(Course ⋈ Teaching))
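The join query can be tried out with Python's built-in sqlite3 module. This is a sketch: the table contents are hypothetical, only the columns the query needs are created, and an ORDER BY is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Course (CrsCode TEXT PRIMARY KEY, DeptId TEXT, CrsName TEXT);
CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Sem TEXT);
INSERT INTO Course VALUES
  ('CSE305', 'CS',  'Databases'),
  ('CSE306', 'CS',  'Operating Systems'),
  ('ECO105', 'ECO', 'Microeconomics');
INSERT INTO Teaching VALUES
  (1, 'CSE305', 'S2000'),
  (2, 'CSE306', 'F1999'),
  (3, 'ECO105', 'S2000');
""")

rows = conn.execute("""
SELECT C.CrsName
FROM Course C, Teaching T
WHERE C.CrsCode = T.CrsCode AND T.Sem = 'S2000'
ORDER BY C.CrsName
""").fetchall()
```

With this data the query returns Databases and Microeconomics: it has no condition on DeptId, so courses from every department qualify.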
Correspondence Between SQL
and Relational Algebra
SELECT C.CrsName
FROM Course C, Teaching T
WHERE C.CrsCode=T.CrsCode AND T.Sem='S2000'
Also equivalent to:
π_{CrsName}(σ_{C_CrsCode=T_CrsCode AND Sem='S2000'}
(Course [C_CrsCode, DeptId, CrsName, Desc]
× Teaching [ProfId, T_CrsCode, Sem]))
This is the simple evaluation algorithm for SELECT.
Relational algebra expressions are procedural. Which of
the two equivalent expressions is more easily evaluated?
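One way to explore that question is to evaluate both forms over the same data and compare the size of the intermediate results. The Python sketch below does this with hypothetical rows, modeling relations as sets of tuples; it illustrates the idea and is not a description of how any particular DBMS evaluates queries.

```python
# Course(CrsCode, DeptId, CrsName) and Teaching(ProfId, CrsCode, Sem); hypothetical rows.
COURSE = {
    ("CSE305", "CS", "Databases"),
    ("CSE306", "CS", "Operating Systems"),
    ("ECO105", "ECO", "Microeconomics"),
}
TEACHING = {
    (1, "CSE305", "S2000"),
    (2, "CSE306", "F1999"),
    (3, "ECO105", "S2000"),
}

# Plan 1: Cartesian product first, then select, then project.
# Product rows: (CrsCode, DeptId, CrsName, ProfId, CrsCode, Sem).
product = {c + t for c in COURSE for t in TEACHING}
plan1 = {row[2] for row in product if row[0] == row[4] and row[5] == "S2000"}

# Plan 2: select on Teaching first, then join, then project.
s2000 = {t for t in TEACHING if t[2] == "S2000"}
joined = {c + t for c in COURSE for t in s2000 if c[0] == t[1]}
plan2 = {row[2] for row in joined}
```

Both plans give the same answer, but the first materializes 9 intermediate rows and the second only 2; the gap grows quadratically with table size.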
Self-join Queries
Find Ids of all professors who taught at least two
courses in the same semester:
SELECT T1.ProfId
FROM Teaching T1, Teaching T2
WHERE T1.ProfId = T2.ProfId
AND T1.Semester = T2.Semester
AND T1.CrsCode <> T2.CrsCode
Duplicate rows are eliminated from the result only on request:
SELECT DISTINCT ..
FROM ..
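A sqlite3 sketch of the self-join, with hypothetical Teaching rows, shows why DISTINCT matters here: every qualifying pair of courses is matched twice, once in each order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Semester TEXT);
INSERT INTO Teaching VALUES
  (1, 'CSE305', 'S2000'),
  (1, 'CSE306', 'S2000'),
  (2, 'CSE305', 'F1999');
""")

# The same query text is run with and without the DISTINCT keyword.
QUERY = """
SELECT {distinct} T1.ProfId
FROM Teaching T1, Teaching T2
WHERE T1.ProfId = T2.ProfId
  AND T1.Semester = T2.Semester
  AND T1.CrsCode <> T2.CrsCode
"""

with_duplicates = conn.execute(QUERY.format(distinct="")).fetchall()
without_duplicates = conn.execute(QUERY.format(distinct="DISTINCT")).fetchall()
```

Professor 1 is reported twice without DISTINCT (CSE305 paired with CSE306, and the reverse) and once with it.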
Use of Expressions
Equality and comparison operators apply to
strings (based on lexical ordering)
WHERE S.Name < 'P'
Concatenate operator applies to strings
WHERE S.Name || '--' || S.Address = ...
Expressions can also be used in SELECT clause:
SELECT S.Name || '--' || S.Address AS NmAdd
FROM Student S
Set Operators
SQL provides UNION, EXCEPT (set difference), and
INTERSECT for union compatible tables
Example: Find all professors in the CS Department and
all professors that have taught CS courses
(SELECT P.Name
FROM Professor P, Teaching T
WHERE P.Id=T.ProfId AND T.CrsCode LIKE 'CS%')
UNION
(SELECT P.Name
FROM Professor P
WHERE P.DeptId = 'CS')
Nested Queries
List all courses that were not taught in S2000
SELECT C.CrsName
FROM Course C
WHERE C.CrsCode NOT IN
(SELECT T.CrsCode --subquery
FROM Teaching T
WHERE T.Sem = 'S2000')
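The nested query can be run as a sqlite3 sketch. The rows are hypothetical and only the columns the query uses are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Course (CrsCode TEXT PRIMARY KEY, CrsName TEXT);
CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Sem TEXT);
INSERT INTO Course VALUES
  ('CSE305', 'Databases'),
  ('CSE306', 'Operating Systems'),
  ('ECO105', 'Microeconomics');
INSERT INTO Teaching VALUES
  (1, 'CSE305', 'S2000'),
  (2, 'CSE306', 'F1999'),
  (3, 'ECO105', 'S2000');
""")

not_taught = conn.execute("""
SELECT C.CrsName
FROM Course C
WHERE C.CrsCode NOT IN
  (SELECT T.CrsCode       -- subquery: courses taught in S2000
   FROM Teaching T
   WHERE T.Sem = 'S2000')
""").fetchall()
```

The subquery does not refer to C, so it can be evaluated once and its result reused for every row of Course.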
Division
Query type: Find the subset of items in one
set that are related to all items in another set
Example: Find professors who have taught
courses in all departments
Why does this involve division?
Numerator (ProfId, DeptId): contains row <p,d> if professor
p has taught a course in department d
Denominator (DeptId): all department Ids
Division
Strategy for implementing division in SQL:
Find set, A, of all departments in which a
particular professor, p, has taught a course
Find set, B, of all departments
Output p if A ⊇ B, or, equivalently, if B − A is
empty
Division SQL Solution
SELECT P.Id
FROM Professor P
WHERE NOT EXISTS
(SELECT D.DeptId -- set B of all dept Ids
FROM Department D
EXCEPT
SELECT C.DeptId -- set A of dept Ids of depts in
-- which P has taught a course
FROM Teaching T, Course C
WHERE T.ProfId=P.Id -- global variable
AND T.CrsCode=C.CrsCode)
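The division query can be checked with a sqlite3 sketch. It assumes SQLite accepts the correlated reference to P.Id inside the EXCEPT branch, and it uses hypothetical rows with only the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Professor (Id INTEGER PRIMARY KEY);
CREATE TABLE Department (DeptId TEXT PRIMARY KEY);
CREATE TABLE Course (CrsCode TEXT PRIMARY KEY, DeptId TEXT);
CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT);
INSERT INTO Professor VALUES (1), (2);
INSERT INTO Department VALUES ('CS'), ('ECO');
INSERT INTO Course VALUES ('CSE305', 'CS'), ('ECO105', 'ECO');
INSERT INTO Teaching VALUES (1, 'CSE305'), (1, 'ECO105'), (2, 'CSE305');
""")

taught_everywhere = conn.execute("""
SELECT P.Id
FROM Professor P
WHERE NOT EXISTS
  (SELECT D.DeptId              -- set B of all dept Ids
   FROM Department D
   EXCEPT
   SELECT C.DeptId              -- set A of depts in which P has taught
   FROM Teaching T, Course C
   WHERE T.ProfId = P.Id        -- P comes from the outer query
     AND T.CrsCode = C.CrsCode)
""").fetchall()
```

For professor 2 the difference B − A is {ECO}, which is not empty, so only professor 1 is output.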
Aggregates
Functions that operate on sets:
COUNT, SUM, AVG, MAX, MIN
Produce numbers (not tables)
Not part of relational algebra
GROUP BY - Example
[Figure: the rows of Transcript for student 1234 are grouped into one
output row, (1234, 3.3, 4), with attributes: student's Id, avg grade,
number of courses]
Example
Output the name and address of all seniors
on the Dean's List
View Benefits (cont)
Customization: Users need not see full
complexity of database. View creates the
illusion of a simpler database customized to
the needs of a particular category of users
A view is similar in many ways to a
subroutine in standard programming
Can be used in multiple queries
Nulls
Conditions: x op y (where op is <, >, <>, =, etc.) has
value unknown (U) when either x or y is null
WHERE T.cost > T.price
Arithmetic expression: x op y (where op is +, -, *,
etc.) has value NULL if x or y is null
WHERE (T.price/T.cost) > 2
Aggregates: COUNT(*) counts rows whether or not they contain
nulls; COUNT(attribute) and the other aggregates ignore nulls
SELECT COUNT (T.CrsCode), AVG (T.Grade)
FROM Transcript T
WHERE T.StudId = 1234
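How nulls interact with aggregates and conditions can be observed with a sqlite3 sketch over hypothetical Transcript rows. In standard SQL, and in SQLite as used here, COUNT(*) counts rows, while COUNT(column) and AVG skip rows where the column is null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transcript (StudId INTEGER, CrsCode TEXT, Grade REAL);
INSERT INTO Transcript VALUES
  (1234, 'CSE305', 4.0),
  (1234, 'CSE306', 3.0),
  (1234, 'CSE307', NULL);   -- grade not yet assigned
""")

count_rows, count_grades, avg_grade = conn.execute("""
SELECT COUNT(*), COUNT(T.Grade), AVG(T.Grade)
FROM Transcript T
WHERE T.StudId = 1234
""").fetchone()

# A condition on a null evaluates to unknown, so the row is filtered out
# even though the two comparisons look exhaustive.
decided = conn.execute("""
SELECT COUNT(*) FROM Transcript T
WHERE T.Grade > 3 OR T.Grade <= 3
""").fetchone()[0]
```

The average is 3.5, computed over the two non-null grades rather than over three rows.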
Nulls (cont)
WHERE clause uses a three-valued logic to filter
rows. Portion of truth table:
C1  C2  C1 AND C2  C1 OR C2
T   U   U          T
F   U   F          U
U   U   U          U
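The truth table can be reproduced with a small Python sketch in which `None` plays the role of unknown (U). The function names `and3`, `or3`, and `not3` are illustrative.

```python
def and3(a, b):
    """Three-valued AND: false dominates, then unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Three-valued OR: true dominates, then unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a):
    """Three-valued NOT: unknown stays unknown."""
    return None if a is None else not a


# The WHERE clause keeps a row only when the condition is true,
# so both false and unknown filter the row out.
def keeps_row(condition_value):
    return condition_value is True
```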
Bulk Insertion
Insert the rows output by a SELECT
CREATE TABLE DeansList (
StudId INTEGER,
Credits INTEGER,
CumGpa FLOAT,
PRIMARY KEY (StudId) )
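A sqlite3 sketch of bulk insertion into this table using INSERT with a SELECT. Everything about the SELECT is an assumption made for illustration: the Transcript rows, the rule that each course is worth 3 credits, and the 3.5 cutoff for the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DeansList (
  StudId  INTEGER,
  Credits INTEGER,
  CumGpa  FLOAT,
  PRIMARY KEY (StudId) );
CREATE TABLE Transcript (StudId INTEGER, CrsCode TEXT, Grade REAL);
INSERT INTO Transcript VALUES
  (1, 'CSE305', 4.0), (1, 'CSE306', 3.5),
  (2, 'CSE305', 3.0), (2, 'CSE306', 2.0);
""")

# Every row produced by the SELECT is inserted into DeansList.
conn.execute("""
INSERT INTO DeansList (StudId, Credits, CumGpa)
SELECT T.StudId, 3 * COUNT(*), AVG(T.Grade)   -- assumes 3 credits per course
FROM Transcript T
GROUP BY T.StudId
HAVING AVG(T.Grade) > 3.5                     -- assumed cutoff
""")

deans_list = conn.execute("SELECT StudId, Credits, CumGpa FROM DeansList").fetchall()
```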
Modifying Tables - Delete
Similar to SELECT except:
No project list in DELETE clause
No Cartesian product in FROM clause (only 1 table
name)
Rows satisfying WHERE clause (general form,
including subqueries, allowed) are deleted instead of
output
Updating Views - Problem 1
INSERT INTO CsReg (StudId, CrsCode, Semester)
VALUES (1111, 'CSE305', 'S2000')
Question: What value should be placed in
attributes of underlying table that have been
projected out (e.g., Grade)?
Answer: NULL (assuming null allowed in
the missing attribute) or DEFAULT
Updating Views - Problem 2
INSERT INTO CsReg (StudId, CrsCode, Semester)
VALUES (1111, 'ECO105', 'S2000')
Updating Views - Problem 3 (cont)
Updating Views - Restrictions
Updatable views are restricted to those in which
No Cartesian product in FROM clause
No aggregates, GROUP BY, HAVING