Chapter 3

The document discusses the expressive power of query languages, focusing on relational algebra and its operators, which include select, project, union, set difference, Cartesian product, and rename. It explains how these operators can be composed to form complex queries and introduces additional operations like natural join and outer join. The document also covers the use of aggregate functions and the nonprocedural query language based on predicate calculus.

Uploaded by

yeshwanth vemula
Modified from: Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use

• Expressive power of a query language
  ◦ What queries can be expressed in this language?
• Procedural versus non-procedural (declarative) languages
• “Pure” languages:
  ◦ Relational algebra
  ◦ Tuple relational calculus
  ◦ Domain relational calculus
• The three pure languages above are equivalent in computing power

• Relational algebra:
  ◦ Algebra of relations: a set of operators that take relations as input and produce relations as output
  ◦ Composable: the output of evaluating an expression in relational algebra can be used as input to another relational-algebra expression
• Next: a first introduction to the operators of the relational algebra
• Procedural language
• Six basic operators
  ◦ select: σ
  ◦ project: Π
  ◦ union: ∪
  ◦ set difference: −
  ◦ Cartesian product: ×
  ◦ rename: ρ
• The operators take one or two relations as inputs and produce a new relation as a result.
• Because inputs and outputs are both relations, the operators are composable.

Select operation
• Notation: σ_{p}(r)
• p is called the selection predicate
• Defined as:
    σ_{p}(r) = {t | t ∈ r and p(t)}
• Where p is a formula in propositional calculus consisting of terms connected by: ∧ (and), ∨ (or), ¬ (not)
• Each term is one of:
    <attribute> op <attribute> or <attribute> op <constant>
  where op is one of: =, ≠, >, ≥, <, ≤
• Example:
    σ_{dept_name = “Physics”}(instructor)

Project operation
• Notation: Π_{A1, A2, …, Ak}(r)
  where A1, A2, …, Ak are attribute names and r is a relation name.
• The result is defined as the relation of k columns obtained by erasing the columns that are not listed
• Duplicate rows are removed from the result, since relations are sets
• Let A be a subset of the attributes of relation r; then:
    Π_{A}(r) = {t[A] | t ∈ r}
• Example: to eliminate the dept_name attribute of instructor:
    Π_{ID, name, salary}(instructor)

Union operation
• Notation: r ∪ s
• Defined as:
    r ∪ s = {t | t ∈ r or t ∈ s}
• For r ∪ s to be valid:
  1. r and s must have the same arity (same number of attributes)
  2. The attribute domains must be compatible (example: the 2nd column of r deals with the same type of values as does the 2nd column of s)
• Example: to find all courses taught in the Fall semester, or in the Spring semester, or in both
  1. First get all the sections offered in the Fall and Spring semesters:
       σ_{semester = “Fall”}(section) ∪ σ_{semester = “Spring”}(section)
  2. Then project onto the course_id attribute:
       Π_{course_id}(σ_{semester = “Fall”}(section) ∪ σ_{semester = “Spring”}(section))

Set-difference operation
• Notation: r − s
• Defined as:
    r − s = {t | t ∈ r and t ∉ s}
• Set differences must be taken between compatible relations.
  ◦ r and s must have the same arity
  ◦ attribute domains of r and s must be compatible
 Example: to find all courses taught in the Fall
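semester but not in the Spring semester, project each selection onto course_id and then take the difference:
    Π_{course_id}(σ_{semester = “Fall”}(section)) − Π_{course_id}(σ_{semester = “Spring”}(section))
  The projection must come first: a single section tuple never carries both semester values, so σ_{semester = “Fall”}(section) − σ_{semester = “Spring”}(section) would simply return every Fall section.

Cartesian-product operation
• Notation: r × s
• Defined as:
    r × s = {t q | t ∈ r and q ∈ s}
• Assume that the attributes of r(R) and s(S) are disjoint (that is, R ∩ S = ∅).
• If the attributes of r(R) and s(S) are not disjoint, then renaming must be used.

Formal definition
• A basic expression in the relational algebra consists of one of the following: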
  ◦ A relation in the database
  ◦ A constant relation, e.g., {(1), (2)}
• Let E1 and E2 be relational-algebra expressions; the following are all relational-algebra expressions:
  ◦ E1 ∪ E2
  ◦ E1 − E2
  ◦ E1 × E2
  ◦ σ_{P}(E1), where P is a predicate on attributes in E1
  ◦ Π_{S}(E1), where S is a list consisting of some of the attributes in E1
  ◦ ρ_{x}(E1), where x is the new name for the result of E1

Rename operation
• Allows us to name, and therefore to refer to, the results of relational-algebra expressions.
• Allows us to refer to a relation by more than one name.
• Example:
    ρ_{x}(E)
  returns the expression E under the name x
• If a relational-algebra expression E has arity n, then
    ρ_{x(A1, A2, …, An)}(E)
  returns the result of expression E under the name x, and with the attributes renamed to A1, A2, …, An.

Example query: find the largest salary in the university
• Step 1: find the instructor salaries that are less than some other instructor salary (that is, not the maximum), using a copy of instructor under a new name d:
    Π_{instructor.salary}(σ_{instructor.salary < d.salary}(instructor × ρ_{d}(instructor)))
• Step 2: find the largest salary. Step 1 returns every salary except the maximum, so subtracting that result from the set of all salaries leaves only the largest one:
    Π_{salary}(instructor) − Π_{instructor.salary}(σ_{instructor.salary < d.salary}(instructor × ρ_{d}(instructor)))
• Result: 95000
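
The six basic operators and the two-step largest-salary query can be sketched in Python. This is only an illustration under assumptions that are not part of the slides: a relation is modelled as a pair (attribute names, set of row tuples), the helper names are invented for this sketch, and the three instructor rows are sample data.

```python
from itertools import product

# Assumption: a relation is (tuple of attribute names, set of row tuples).
# All function names below are hypothetical helpers, not a library API.

def select(pred, rel):
    attrs, rows = rel
    return attrs, {row for row in rows if pred(dict(zip(attrs, row)))}

def project(keep, rel):
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    # Building a set removes duplicate rows, as in the algebra.
    return tuple(keep), {tuple(row[i] for i in idx) for row in rows}

def union(r, s):
    assert len(r[0]) == len(s[0]), "union needs relations of the same arity"
    return r[0], r[1] | s[1]

def difference(r, s):
    assert len(r[0]) == len(s[0]), "difference needs relations of the same arity"
    return r[0], r[1] - s[1]

def rename(name, rel):
    attrs, rows = rel
    return tuple(f"{name}.{a}" for a in attrs), rows

def cartesian(r, s):
    assert not set(r[0]) & set(s[0]), "attribute names must be disjoint"
    return r[0] + s[0], {a + b for a, b in product(r[1], s[1])}

# Sample data (assumed for illustration).
instructor = (
    ("ID", "name", "salary"),
    {("10101", "Srinivasan", 65000), ("12121", "Wu", 90000), ("22222", "Einstein", 95000)},
)

# Step 1: salaries that are smaller than some other salary.
i = rename("i", instructor)
d = rename("d", instructor)
pairs = select(lambda t: t["i.salary"] < t["d.salary"], cartesian(i, d))
not_max = project(["i.salary"], pairs)

# Step 2: all salaries minus the non-maximal ones.
all_salaries = project(["i.salary"], i)
largest = difference(all_salaries, not_max)
print(largest[1])  # {(95000,)}
```

Renaming both copies keeps the attribute names disjoint, which is the precondition the Cartesian product slide states.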
Additional operations
We define additional operations that do not add any expressive power to the relational algebra, but that simplify common queries.
• Natural join
• Assignment
• Outer join

Natural-join operation
• Notation: r ⋈ s
• Let r and s be relations on schemas R and S respectively. Then, r ⋈ s is a relation on schema R ∪ S obtained as follows:
  ◦ Consider each pair of tuples tr from r and ts from s.
  ◦ If tr and ts have the same value on each of the attributes in R ∩ S, add a tuple t to the result, where t has the same value as tr on r and t has the same value as ts on s.
• Example:
    R = (A, B, C, D)
    S = (E, B, D)
  ◦ Result schema = (A, B, C, D, E)
  ◦ r ⋈ s is defined as:
      Π_{r.A, r.B, r.C, r.D, s.E}(σ_{r.B = s.B ∧ r.D = s.D}(r × s))
• In general, let r and s be relations on schemas R and S respectively, with R ∩ S = {A1, A2, …, An}. Then r ⋈ s is defined as:
    r ⋈ s = Π_{R ∪ S}(σ_{r.A1 = s.A1 ∧ r.A2 = s.A2 ∧ … ∧ r.An = s.An}(r × s))
• [Figure: example relations r and s, and the result r ⋈ s]
• Find the names of all instructors in the Comp. Sci. department together with the course titles of all the courses that the instructors teach:
    Π_{name, title}(σ_{dept_name = “Comp. Sci.”}(instructor ⋈ teaches ⋈ course))
• Natural join is associative:
    (instructor ⋈ teaches) ⋈ course is equivalent to instructor ⋈ (teaches ⋈ course)
• Natural join is commutative (we ignore attribute order):
    instructor ⋈ teaches is equivalent to teaches ⋈ instructor
• The theta join operation r ⋈_{θ} s is defined as:
    r ⋈_{θ} s = σ_{θ}(r × s)

Assignment operation
• The assignment operation (←) provides a convenient way to express complex queries.
  ◦ Write the query as a sequential program consisting of a series of assignments, followed by an expression whose value is displayed as the result of the query.
  ◦ Assignment must always be made to a temporary relation variable.
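
As a rough sketch, the natural join and the assignment style of writing a query can be mimicked in Python. The representation (a relation as a list of dicts), the function name natural_join, and the sample rows are assumptions made for this example. The sample course relation deliberately omits dept_name, so the join matches on course_id only.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts (one dict per tuple)."""
    result = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()        # attributes in R ∩ S
            if all(tr[a] == ts[a] for a in common):
                merged = {**tr, **ts}             # tuple on schema R ∪ S
                if merged not in result:          # relations are sets: skip duplicates
                    result.append(merged)
    return result

# Sample data (assumed for illustration).
instructor = [
    {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci."},
    {"ID": "12121", "name": "Wu", "dept_name": "Finance"},
]
teaches = [
    {"ID": "10101", "course_id": "CS-101"},
    {"ID": "12121", "course_id": "FIN-201"},
]
course = [
    {"course_id": "CS-101", "title": "Intro. to Computer Science"},
    {"course_id": "FIN-201", "title": "Investment Banking"},
]

# Each Python assignment plays the role of "temp ← expression".
temp1 = natural_join(natural_join(instructor, teaches), course)
temp2 = [t for t in temp1 if t["dept_name"] == "Comp. Sci."]   # selection
answer = {(t["name"], t["title"]) for t in temp2}              # projection on name, title
print(answer)
```

Swapping the arguments of natural_join changes only the attribute order inside each tuple, which matches the remark that commutativity holds when attribute order is ignored.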
 An extension of the join operation that
avoids loss of information.
 Computes the join and then adds tuples
from one relation that do not match tuples in the other relation to the result of the join.
 Uses null values:
 null signifies that the value is unknown or does
not exist
All comparisons involving null are (roughly
speaking) false by definition.
We shall study precise meaning of comparisons with
nulls later
 Relation instructor1

ID      name         dept_name
10101   Srinivasan   Comp. Sci.
12121   Wu           Finance
15151   Mozart       Music

• Relation teaches1

ID      course_id
10101   CS-101
12121   FIN-201
76766   BIO-101

• Join: instructor1 ⋈ teaches1

ID      name         dept_name    course_id
10101   Srinivasan   Comp. Sci.   CS-101
12121   Wu           Finance      FIN-201

• Left outer join: instructor1 ⟕ teaches1

ID      name         dept_name    course_id
10101   Srinivasan   Comp. Sci.   CS-101
12121   Wu           Finance      FIN-201
15151   Mozart       Music        null

• Right outer join: instructor1 ⟖ teaches1

ID      name         dept_name    course_id
10101   Srinivasan   Comp. Sci.   CS-101
12121   Wu           Finance      FIN-201
76766   null         null         BIO-101

• Full outer join: instructor1 ⟗ teaches1

ID      name         dept_name    course_id
10101   Srinivasan   Comp. Sci.   CS-101
12121   Wu           Finance      FIN-201
15151   Mozart       Music        null
76766   null         null         BIO-101
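
The three outer joins shown in the tables above can be reproduced with a small Python sketch. The function names and the list-of-dicts representation are assumptions for this example; None stands in for null, and both input relations are assumed to be non-empty.

```python
def pad(t, attrs):
    # Attributes that the tuple lacks become None, which plays the role of null.
    return {a: t.get(a) for a in attrs}

def outer_join(left, right, key, keep_left, keep_right):
    """Join two non-empty lists of dicts on `key`, optionally keeping unmatched tuples."""
    attrs = list(left[0]) + [a for a in right[0] if a not in left[0]]
    rows = [{**l, **r} for l in left for r in right if l[key] == r[key]]
    matched = {t[key] for t in rows}
    if keep_left:
        rows += [pad(l, attrs) for l in left if l[key] not in matched]
    if keep_right:
        rows += [pad(r, attrs) for r in right if r[key] not in matched]
    return rows

instructor1 = [
    {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci."},
    {"ID": "12121", "name": "Wu", "dept_name": "Finance"},
    {"ID": "15151", "name": "Mozart", "dept_name": "Music"},
]
teaches1 = [
    {"ID": "10101", "course_id": "CS-101"},
    {"ID": "12121", "course_id": "FIN-201"},
    {"ID": "76766", "course_id": "BIO-101"},
]

left_outer = outer_join(instructor1, teaches1, "ID", True, False)
right_outer = outer_join(instructor1, teaches1, "ID", False, True)
full_outer = outer_join(instructor1, teaches1, "ID", True, True)

for row in full_outer:
    print(row)
```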
 Outer join can be expressed using basic
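operations. For relations r(R) and s(S), for example, the left outer join r ⟕ s can be written as
    (r ⋈ s) ∪ ((r − Π_{R}(r ⋈ s)) × {(null, …, null)})
  where the constant relation {(null, …, null)} is on the schema S − R.

Division operation
• Given relations r(R) and s(S), such that S ⊂ R, r ÷ s is the largest relation t(R − S) such that
    t × s ⊆ r
• Alternatively, r ÷ s contains all tuples from Π_{R − S}(r) such that all their extensions on R ∩ S with tuples from s exist in r
• We can write r ÷ s as
    temp1 ← Π_{R − S}(r)
    temp2 ← Π_{R − S}((temp1 × s) − Π_{R − S, S}(r))
    result = temp1 − temp2
• Example: return the names of all persons that read all newspapers

    reads                      newspaper
    name    newspaper          newspaper
    Peter   Times              Times
    Bob     Wall Street        Wall Street
    Alice   Times
    Alice   Wall Street

  Here reads ÷ newspaper contains only Alice, the one person paired with every newspaper.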
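
A minimal Python sketch of the division, following the temp1/temp2 formulation and using the reads and newspaper data from the example. The function name divide and the set-of-pairs representation are assumptions made for this illustration.

```python
def divide(r, s):
    """r: set of (name, newspaper) pairs; s: set of newspapers. Returns r ÷ s."""
    temp1 = {name for name, _ in r}                      # projection of r on R − S
    missing = {(n, p) for n in temp1 for p in s} - r     # (temp1 × s) − r
    temp2 = {name for name, _ in missing}                # names lacking some newspaper
    return temp1 - temp2

reads = {
    ("Peter", "Times"),
    ("Bob", "Wall Street"),
    ("Alice", "Times"),
    ("Alice", "Wall Street"),
}
newspaper = {"Times", "Wall Street"}

print(divide(reads, newspaper))  # {'Alice'}
```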
 Generalized Projection
 Aggregate Functions
 Extends the projection operation by allowing arithmetic
functions to be used in the projection list:
    Π_{F1, F2, …, Fn}(E)
• E is any relational-algebra expression
• Each of F1, F2, …, Fn is an arithmetic expression or function call involving constants and attributes in the schema of E.
• Example: given relation instructor(ID, name, dept_name, salary) where salary is the annual salary, get the same information but with the monthly salary:
    Π_{ID, name, dept_name, salary/12}(instructor)
• Adding functions increases expressive power: in standard relational algebra there is no way to change attribute values
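
The monthly-salary example can be sketched in Python as follows. The helper generalized_project, the output attribute name monthly_salary, and the two sample rows are assumptions for this illustration only.

```python
def generalized_project(rel, **exprs):
    """Apply one expression per output attribute to every tuple of rel."""
    rows = []
    for t in rel:
        row = {name: f(t) for name, f in exprs.items()}
        if row not in rows:          # keep set semantics: drop duplicate rows
            rows.append(row)
    return rows

# Sample data (assumed); salary is the annual salary.
instructor = [
    {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci.", "salary": 60000},
    {"ID": "12121", "name": "Wu", "dept_name": "Finance", "salary": 90000},
]

monthly = generalized_project(
    instructor,
    ID=lambda t: t["ID"],
    name=lambda t: t["name"],
    dept_name=lambda t: t["dept_name"],
    monthly_salary=lambda t: t["salary"] / 12,
)
print(monthly)
```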
 Aggregation function takes a set of values and returns
a single value as a result.
  ◦ avg: average value
  ◦ min: minimum value
  ◦ max: maximum value
  ◦ sum: sum of values
  ◦ count: number of values
• Aggregate operation in relational algebra:
    _{G1, G2, …, Gn}𝒢_{F1(A1), F2(A2), …, Fn(An)}(E)
  ◦ E is any relational-algebra expression
  ◦ G1, G2, …, Gn is a list of attributes on which to group (can be empty)
  ◦ Each Fi is an aggregate function
  ◦ Each Ai is an attribute name
• Note: some books/articles use γ instead of 𝒢 (calligraphic G)
• Example: relation r:

    A   B   C
    α   α   7
    α   β   7
    β   β   3
    β   β   10

  𝒢_{sum(C)}(r) returns:

    sum(C)
    27

• Find the average salary in each department:
    _{dept_name}𝒢_{avg(salary)}(instructor)
  [Result table: columns dept_name, avg_salary]
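
Grouping and aggregation can be sketched in Python along these lines. The function aggregate and the instructor rows are assumptions for this example; the relation r reuses the values from the sum(C) example. Values are collected into lists rather than sets so that duplicates (the two 7s) still count.

```python
from collections import defaultdict

def aggregate(rel, group_by, func, attr):
    """Group tuples on the attributes in group_by and apply func to attr per group."""
    groups = defaultdict(list)
    for t in rel:
        groups[tuple(t[g] for g in group_by)].append(t[attr])
    return {key: func(values) for key, values in groups.items()}

def avg(values):
    return sum(values) / len(values)

r = [
    {"A": "α", "B": "α", "C": 7},
    {"A": "α", "B": "β", "C": 7},
    {"A": "β", "B": "β", "C": 3},
    {"A": "β", "B": "β", "C": 10},
]
print(aggregate(r, [], sum, "C"))  # {(): 27}

# Sample instructor rows (assumed for illustration).
instructor = [
    {"name": "Srinivasan", "dept_name": "Comp. Sci.", "salary": 65000},
    {"name": "Katz", "dept_name": "Comp. Sci.", "salary": 75000},
    {"name": "Einstein", "dept_name": "Physics", "salary": 95000},
]
print(aggregate(instructor, ["dept_name"], avg, "salary"))
```

An empty grouping list puts every tuple into the single group keyed by the empty tuple, which corresponds to the case where G1, …, Gn is empty.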
 The content of the database may be
modified using the following
operations:
Deletion
Insertion
Updating
 All these operations can be expressed
using the assignment operator
 Example: Delete instructors with salary
over $1,000,000:
    instructor ← instructor − σ_{salary > 1000000}(instructor)
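
Deletion as an assignment can be illustrated with Python sets. The rows below are invented sample data, and the tuple layout (ID, name, salary) is an assumption of this sketch.

```python
# Sample relation as a set of (ID, name, salary) tuples; the rows are hypothetical.
instructor = {
    ("10101", "Srinivasan", 65000),
    ("99999", "Hypothetical", 1500000),
}

# Deletion: assign back the relation minus the selected tuples.
to_delete = {t for t in instructor if t[2] > 1_000_000}
instructor = instructor - to_delete

print(instructor)  # {('10101', 'Srinivasan', 65000)}
```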
 A nonprocedural query language, where each
query is of the form
    {t | P(t)}
• This language is the tuple relational calculus; the query denotes the set of all tuples t such that predicate P is true for t
• t is a tuple variable; t[A] denotes the value of tuple t on attribute A
• t ∈ r denotes that tuple t is in relation r
• P is a formula similar to that of the predicate calculus

Predicate calculus formula
1. Set of attributes and constants
2. Set of comparison operators (e.g., <, ≤, =, ≠, >, ≥)
3. Set of logical connectives: and (∧), or (∨), not (¬)
4. Implication (⇒): x ⇒ y means that if x is true, then y is true
     x ⇒ y ≡ ¬x ∨ y
5. Set of quantifiers:
   ◦ ∃ t ∈ r (Q(t)): “there exists” a tuple t in relation r such that predicate Q(t) is true
   ◦ ∀ t ∈ r (Q(t)): Q is true “for all” tuples t in relation r

Example queries
• Find the ID, name, dept_name, salary for instructors whose salary is greater than $80,000:
    {t | t ∈ instructor ∧ t[salary] > 80000}
• As in the previous query, but output only the ID attribute value:
    {t | ∃ s ∈ instructor (t[ID] = s[ID] ∧ s[salary] > 80000)}
  Notice that a relation on schema (ID) is implicitly defined by the query, because
  1) t is not bound to any relation by the predicate
  2) we implicitly state that t has an ID attribute (t[ID] = s[ID])
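
Python comprehensions read much like these calculus expressions, so the two queries above can be sketched as follows. The sample rows are assumed for illustration, and the comprehension only mirrors the declarative form; it says nothing about how a database would evaluate the query.

```python
# Sample data (assumed for illustration).
instructor = [
    {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci.", "salary": 65000},
    {"ID": "22222", "name": "Einstein", "dept_name": "Physics", "salary": 95000},
    {"ID": "83821", "name": "Brandt", "dept_name": "Comp. Sci.", "salary": 92000},
]

# {t | t ∈ instructor ∧ t[salary] > 80000}
well_paid = [t for t in instructor if t["salary"] > 80000]

# {t | ∃ s ∈ instructor (t[ID] = s[ID] ∧ s[salary] > 80000)}
# The result tuples have the single attribute ID, taken from some witness s.
well_paid_ids = {(s["ID"],) for s in instructor if s["salary"] > 80000}

print(well_paid_ids)
```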
 Find the names of all instructors whose
department is in the Watson building
    {t | ∃ s ∈ instructor (t[name] = s[name]
           ∧ ∃ u ∈ department (u[dept_name] = s[dept_name]
                ∧ u[building] = “Watson”))}

• Find the set of all courses taught in the Fall 2009 semester, or in the Spring 2010 semester, or both:
    {t | ∃ s ∈ section (t[course_id] = s[course_id] ∧ s[semester] = “Fall” ∧ s[year] = 2009)
         ∨ ∃ u ∈ section (t[course_id] = u[course_id] ∧ u[semester] = “Spring” ∧ u[year] = 2010)}

• Find the set of all courses taught in the Fall 2009 semester, and in the Spring 2010 semester:
    {t | ∃ s ∈ section (t[course_id] = s[course_id] ∧ s[semester] = “Fall” ∧ s[year] = 2009)
         ∧ ∃ u ∈ section (t[course_id] = u[course_id] ∧ u[semester] = “Spring” ∧ u[year] = 2010)}

• Find the set of all courses taught in the Fall 2009 semester, but not in the Spring 2010 semester:
    {t | ∃ s ∈ section (t[course_id] = s[course_id] ∧ s[semester] = “Fall” ∧ s[year] = 2009)
         ∧ ¬ ∃ u ∈ section (t[course_id] = u[course_id] ∧ u[semester] = “Spring” ∧ u[year] = 2010)}
 It is possible to write tuple calculus expressions
that generate infinite relations.
• For example, {t | ¬ t ∈ r} results in an infinite relation if the domain of any attribute of relation r is infinite
• To guard against the problem, we restrict the set of allowable expressions to safe expressions.
• An expression {t | P(t)} in the tuple relational calculus is safe if every component of t appears in one of the relations, tuples, or constants that appear in P
  ◦ NOTE: this is more than just a syntax condition.
  ◦ E.g., {t | t[A] = 5 ∨ true} is not safe: it defines an infinite set with attribute values that do not appear in any relation or tuples or constants in P.

Universal quantification
• Find all students who have taken all courses offered in the Biology department:
    {t | ∃ r ∈ student (t[ID] = r[ID]) ∧
         (∀ u ∈ course (u[dept_name] = “Biology” ⇒
              ∃ s ∈ takes (t[ID] = s[ID] ∧ s[course_id] = u[course_id])))}
  ◦ Note that without the existential quantification on student, the above query would be unsafe if the Biology department has not offered any courses.
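
The universally quantified query maps onto Python's all() and any(). In this sketch the function name, the student IDs, and the sample rows are assumptions. The implication "Biology ⇒ taken" becomes a filter on the courses that all() ranges over.

```python
def took_all_biology(student, course, takes):
    """IDs of students who have taken every course offered by the Biology department."""
    return {
        r["ID"]
        for r in student                      # t is tied to student: keeps the result finite
        if all(
            any(s["ID"] == r["ID"] and s["course_id"] == u["course_id"] for s in takes)
            for u in course
            if u["dept_name"] == "Biology"    # the implication only constrains Biology courses
        )
    }

# Sample data (assumed for illustration).
student = [{"ID": "S1"}, {"ID": "S2"}]
course = [
    {"course_id": "BIO-101", "dept_name": "Biology"},
    {"course_id": "BIO-301", "dept_name": "Biology"},
    {"course_id": "CS-101", "dept_name": "Comp. Sci."},
]
takes = [
    {"ID": "S1", "course_id": "BIO-101"},
    {"ID": "S1", "course_id": "BIO-301"},
    {"ID": "S2", "course_id": "BIO-101"},
]

print(took_all_biology(student, course, takes))  # {'S1'}

# With no Biology courses the universal condition holds vacuously for every student.
no_biology = [c for c in course if c["dept_name"] != "Biology"]
print(took_all_biology(student, no_biology, takes))
```

The second call shows why the binding to student matters: when the condition is vacuously true, iterating over student is what limits the answer to actual students.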
 A nonprocedural query language equivalent in
power to the tuple relational calculus: the domain relational calculus
• Each query is an expression of the form:
    {<x1, x2, …, xn> | P(x1, x2, …, xn)}
  ◦ x1, x2, …, xn represent domain variables (variables that range over attribute values)
  ◦ P represents a formula similar to that of the predicate calculus
• Tuples can be formed using < >
  ◦ E.g., <‘Einstein’, ‘Physics’>

Example queries
• Find the ID, name, dept_name, salary for instructors whose salary is greater than $80,000:
    {<i, n, d, s> | <i, n, d, s> ∈ instructor ∧ s > 80000}
• As in the previous query, but output only the ID attribute value:
    {<i> | ∃ n, d, s (<i, n, d, s> ∈ instructor ∧ s > 80000)}
• Find the names of all instructors whose department is in the Watson building:
    {<n> | ∃ i, d, s (<i, n, d, s> ∈ instructor
           ∧ ∃ b, a (<d, b, a> ∈ department ∧ b = “Watson”))}
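
In Python, tuple unpacking gives a close analogue of domain variables, so the three queries above can be sketched like this. The rows are sample data assumed for the example, with department laid out as (dept_name, building, budget).

```python
# Sample data (assumed): instructor(ID, name, dept_name, salary),
# department(dept_name, building, budget).
instructor = {
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
    ("22222", "Einstein", "Physics", 95000),
    ("83821", "Brandt", "Comp. Sci.", 92000),
}
department = {
    ("Comp. Sci.", "Taylor", 100000),
    ("Physics", "Watson", 70000),
}

# {<i, n, d, s> | <i, n, d, s> ∈ instructor ∧ s > 80000}
q1 = {(i, n, d, s) for (i, n, d, s) in instructor if s > 80000}

# {<i> | ∃ n, d, s (<i, n, d, s> ∈ instructor ∧ s > 80000)}
q2 = {(i,) for (i, n, d, s) in instructor if s > 80000}

# {<n> | ∃ i, d, s (<i, n, d, s> ∈ instructor ∧ ∃ b, a (<d, b, a> ∈ department ∧ b = "Watson"))}
q3 = {
    (n,)
    for (i, n, d, s) in instructor
    if any(dn == d and b == "Watson" for (dn, b, a) in department)
}

print(q2, q3)
```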
 Find the set of all courses taught in the Fall semester, or in
the Spring semester, or both
    {<c> | ∃ a, s, y, b, r, t (<c, a, s, y, b, r, t> ∈ section ∧ s = “Fall”)
           ∨ ∃ a, s, y, b, r, t (<c, a, s, y, b, r, t> ∈ section ∧ s = “Spring”)}
  This case can also be written as
    {<c> | ∃ a, s, y, b, r, t (<c, a, s, y, b, r, t> ∈ section ∧ ((s = “Fall”) ∨ (s = “Spring”)))}

• Find the set of all courses taught in the Fall semester, and in the Spring semester:
    {<c> | ∃ a, s, y, b, r, t (<c, a, s, y, b, r, t> ∈ section ∧ s = “Fall”)
           ∧ ∃ a, s, y, b, r, t (<c, a, s, y, b, r, t> ∈ section ∧ s = “Spring”)}
Safety of expressions
The expression
    {<x1, x2, …, xn> | P(x1, x2, …, xn)}
is safe if all of the following hold:
1. All values that appear in tuples of the expression are values from dom(P) (that is, the values appear either as constants in P or in a tuple of a relation mentioned in P).
2. For every “there exists” subformula of the form ∃ x (P1(x)), the subformula is true if and only if there is a value of x in dom(P1) such that P1(x) is true.
3. For every “for all” subformula of the form ∀ x (P1(x)), the subformula is true if and only if P1(x) is true for all values x from dom(P1).
 Find all students who have taken all courses offered in the
Biology department:
    {<i> | ∃ n, d, tc (<i, n, d, tc> ∈ student ∧
           (∀ ci, ti, dn, cr (<ci, ti, dn, cr> ∈ course ∧ dn = “Biology”
                ⇒ ∃ si, se, y, g (<i, ci, si, se, y, g> ∈ takes))))}
• Note that without the existential quantification on student, the above query would be unsafe if the Biology department has not offered any courses.

* The query above fixes a bug in the last query on page 246.


 Codd’s theorem
  ◦ Relational algebra and tuple calculus are equivalent in terms of expressiveness
• That means that every query expressible in relational algebra can also be expressed in tuple calculus, and vice versa
• Since domain calculus is as expressive as tuple calculus, the same holds for the domain calculus
• Note: here relational algebra refers to the standard version (no aggregation and no projection with functions)
