
Relational Algebra II: CS121: Relational Databases Fall 2018 - Lecture 3

This document discusses extensions to the relational algebra operations. It introduces generalized projection, which allows computed values and naming derived attributes. Aggregate functions like sum, avg, count, min, and max are discussed, which operate on collections of values. Grouping and aggregation allows computing aggregates per group, like counting puzzles completed by each person. Distinct values ensures aggregates are computed over sets rather than multisets by eliminating duplicate values.

Uploaded by

Raghav Rathi

RELATIONAL ALGEBRA II
CS121: Relational Databases
Fall 2018 – Lecture 3
Last Lecture
 Query languages provide support for retrieving information from a database
 Introduced the relational algebra
¤ A procedural query language
¤ Six fundamental operations:
 select, project, set-union, set-difference, Cartesian product, rename
¤ Several additional operations, built upon the fundamental operations
 set-intersection, natural join, division, assignment
Extended Operations
 Relational algebra operations have been extended in various ways
¤ More generalized
¤ More useful!
 Three major extensions:
¤ Generalized projection
¤ Aggregate functions
¤ Additional join operations
 All of these appear in SQL standards
Generalized Projection Operation
 Would like to include computed results in relations
¤ e.g. “Retrieve all credit accounts, computing the
current ‘available credit’ for each account.”
¤ Available credit = credit limit – current balance
 Project operation is generalized to include computed
results
¤ Can specify functions on attributes, as well as
attributes themselves
¤ Can also assign names to computed values
¤ (Renaming attributes is also allowed, even though this is also provided by the ρ operator)
Generalized Projection
 Written as: Π_{F1, F2, …, Fn}(E)
¤ Fi are arithmetic expressions
¤ E is an expression that produces a relation
¤ Can also name values: Fi as name
 Can use to provide derived attributes
¤ Values are always computed from other attributes stored in
database
 Also useful for updating values in database
¤ (more on this later)
Generalized Projection Example
 “Compute available credit for every credit account.”
Π_{cred_id, (limit – balance) as available_credit}(credit_acct)

credit_acct:
cred_id  limit  balance
C-273    2500   150
C-291    750    600
C-304    15000  3500
C-313    300    25

Result:
cred_id  available_credit
C-273    2350
C-291    150
C-304    11500
C-313    275
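
The projection above can be sketched in Python. This is only an illustration, assuming a relation is modeled as a list of dicts; the helper name generalized_project is hypothetical and not part of the lecture.

```python
# Assumption: a relation is a list of dicts, one dict per tuple.
credit_acct = [
    {"cred_id": "C-273", "limit": 2500, "balance": 150},
    {"cred_id": "C-291", "limit": 750, "balance": 600},
    {"cred_id": "C-304", "limit": 15000, "balance": 3500},
    {"cred_id": "C-313", "limit": 300, "balance": 25},
]


def generalized_project(relation, **exprs):
    """Hypothetical helper: each keyword names an output attribute and maps
    to a function that computes its value from one input tuple."""
    projected = [{name: f(t) for name, f in exprs.items()} for t in relation]
    # A relation is a set, so drop duplicate result tuples (first-seen order).
    unique = []
    for t in projected:
        if t not in unique:
            unique.append(t)
    return unique


available = generalized_project(
    credit_acct,
    cred_id=lambda t: t["cred_id"],
    available_credit=lambda t: t["limit"] - t["balance"],
)
for row in available:
    print(row)
```

Each keyword argument plays the role of one "Fi as name" term in the projection list.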
Aggregate Functions
 Very useful to apply a function to a collection of values to generate a single result
 Most common aggregate functions:
sum    sums the values in the collection
avg    computes average of values in the collection
count  counts number of elements in the collection
min    returns minimum value in the collection
max    returns maximum value in the collection
 Aggregate functions work on multisets, not sets
¤ A value can appear in the input multiple times
Aggregate Function Examples
credit_acct:
cred_id  limit  balance
C-273    2500   150
C-291    750    600
C-304    15000  3500
C-313    300    25

 “Find the total amount owed to the credit company.”
𝒢_{sum(balance)}(credit_acct)
Result: 4275
 “Find the maximum available credit of any account.”
𝒢_{max(available_credit)}(Π_{(limit – balance) as available_credit}(credit_acct))
Result: 11500
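
As a rough check of the two results, the same aggregates can be computed in Python over the credit_acct data from the slide. The list-of-dicts representation is an assumption made for this sketch.

```python
# Assumption: a relation is a list of dicts, one dict per tuple.
credit_acct = [
    {"cred_id": "C-273", "limit": 2500, "balance": 150},
    {"cred_id": "C-291", "limit": 750, "balance": 600},
    {"cred_id": "C-304", "limit": 15000, "balance": 3500},
    {"cred_id": "C-313", "limit": 300, "balance": 25},
]

# sum(balance) over the whole relation (no grouping: one big group).
total_owed = sum(t["balance"] for t in credit_acct)

# max over the derived attribute (limit - balance), i.e. projection first,
# then aggregation.
max_available = max(t["limit"] - t["balance"] for t in credit_acct)

print(total_owed)     # 4275
print(max_available)  # 11500
```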
Grouping and Aggregation
 Sometimes need to compute aggregates on a per-item basis
 Back to the puzzle database:
puzzle_list(puzzle_name)
completed(person_name, puzzle_name)
 Examples:
¤ How many puzzles has each person completed?
¤ How many people have completed each puzzle?

puzzle_list:
puzzle_name
altekruse
soma cube
puzzle box

completed:
person_name  puzzle_name
Alex         altekruse
Alex         soma cube
Bob          puzzle box
Carl         altekruse
Bob          soma cube
Carl         puzzle box
Alex         puzzle box
Carl         soma cube
Grouping and Aggregation (2)
 “How many puzzles has each person completed?”
_{person_name}𝒢_{count(puzzle_name)}(completed)
 First, input relation completed is grouped by unique values of person_name
 Then, count(puzzle_name) is applied separately to each group
(The puzzle_list and completed relations are the same as on the previous slide.)
Grouping and Aggregation (3)
_{person_name}𝒢_{count(puzzle_name)}(completed)
 Input relation is grouped by person_name:
person_name  puzzle_name
Alex         altekruse
Alex         soma cube
Alex         puzzle box
Bob          puzzle box
Bob          soma cube
Carl         altekruse
Carl         puzzle box
Carl         soma cube
 Aggregate function is applied to each group:
person_name  count(puzzle_name)
Alex         3
Bob          2
Carl         3
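
The two phases (group, then aggregate) can be sketched in Python as follows. The tuple-based representation of completed is an assumption for illustration only.

```python
from collections import defaultdict

# Assumption: each tuple of completed is (person_name, puzzle_name).
completed = [
    ("Alex", "altekruse"), ("Alex", "soma cube"), ("Bob", "puzzle box"),
    ("Carl", "altekruse"), ("Bob", "soma cube"), ("Carl", "puzzle box"),
    ("Alex", "puzzle box"), ("Carl", "soma cube"),
]

# Phase 1: group tuples by the value of person_name.
groups = defaultdict(list)
for person_name, puzzle_name in completed:
    groups[person_name].append(puzzle_name)

# Phase 2: apply count(puzzle_name) separately to each group.
puzzles_per_person = {person: len(puzzles) for person, puzzles in groups.items()}

print(puzzles_per_person)  # {'Alex': 3, 'Bob': 2, 'Carl': 3}
```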
Distinct Values
 Sometimes want to compute aggregates over sets of values, instead of multisets
 Example:
¤ Change puzzle database to include a completed_times relation, which records multiple solutions of a puzzle
 How many puzzles has each person completed?
¤ Using completed_times relation this time

completed_times:
person_name  puzzle_name  seconds
Alex         altekruse    350
Alex         soma cube    45
Bob          puzzle box   240
Carl         altekruse    285
Bob          puzzle box   215
Alex         altekruse    290
Distinct Values (2)
 “How many puzzles has each person completed?”
 Each puzzle appears multiple times now.

completed_times:
person_name  puzzle_name  seconds
Alex         altekruse    350
Alex         soma cube    45
Bob          puzzle box   240
Carl         altekruse    285
Bob          puzzle box   215
Alex         altekruse    290

 Need to count distinct occurrences of each puzzle’s name
_{person_name}𝒢_{count-distinct(puzzle_name)}(completed_times)
Eliminating Duplicates
 Can append -distinct to any aggregate function to
specify elimination of duplicates
¤ Usually used with count: count-distinct
¤ Makes no sense with min, max (duplicates never change the minimum or maximum)
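
A small Python sketch of the difference between count and count-distinct, using the completed_times data from the earlier slide. The tuple layout is an assumption for illustration.

```python
from collections import defaultdict

# Assumption: each tuple is (person_name, puzzle_name, seconds).
completed_times = [
    ("Alex", "altekruse", 350), ("Alex", "soma cube", 45),
    ("Bob", "puzzle box", 240), ("Carl", "altekruse", 285),
    ("Bob", "puzzle box", 215), ("Alex", "altekruse", 290),
]

as_multiset = defaultdict(list)  # keeps duplicates, like plain count
as_set = defaultdict(set)        # drops duplicates, like count-distinct
for person_name, puzzle_name, _seconds in completed_times:
    as_multiset[person_name].append(puzzle_name)
    as_set[person_name].add(puzzle_name)

plain_count = {p: len(v) for p, v in as_multiset.items()}
distinct_count = {p: len(v) for p, v in as_set.items()}

print(plain_count)     # {'Alex': 3, 'Bob': 2, 'Carl': 1}
print(distinct_count)  # {'Alex': 2, 'Bob': 1, 'Carl': 1}
```

Plain count reports solutions recorded, while count-distinct reports puzzles completed, which is what the question asks for.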
General Form of Aggregates
 General form: _{G1, G2, …, Gn}𝒢_{F1(A1), F2(A2), …, Fm(Am)}(E)
¤ E evaluates to a relation
¤ Leading Gi are attributes of E to group on
¤ Each Fj is aggregate function applied to attribute Aj of E
 First, input relation is divided into groups
¤ If no attributes Gi specified, no grouping is
performed (it’s just one big group)
 Then, aggregate functions applied to each group
General Form of Aggregates (2)
 General form: _{G1, G2, …, Gn}𝒢_{F1(A1), F2(A2), …, Fm(Am)}(E)
 Tuples in E are grouped such that:
¤ All tuples in a group have same values for attributes
G1, G2, …, Gn
¤ Tuples in different groups have different values for
G1, G2, …, Gn
 Thus, the values {g1, g2, …, gn} in each
group uniquely identify the group
¤ {G1, G2, …, Gn} are a superkey for the result
relation
General Form of Aggregates (3)
 General form: _{G1, G2, …, Gn}𝒢_{F1(A1), F2(A2), …, Fm(Am)}(E)
 Tuples in result have the form:
{g1, g2, …, gn, a1, a2, …, am}
¤ gi are values for that particular group
¤ aj is result of applying Fj to the multiset of values of Aj in that group
 Important note: Fj(Aj) attributes are unnamed!
¤ Informally we refer to them as Fj(Aj) in results,
but they have no name.
¤ Specify a name, same as before: Fj(Aj) as
attr_name
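
The general form can be sketched as a single Python function. This is a minimal illustration under stated assumptions: relations are lists of dicts, the function name group_aggregate is hypothetical, every aggregate must be given a result name, and null handling is ignored.

```python
def group_aggregate(relation, group_attrs, aggregates):
    """Hypothetical helper.
    relation:    list of dicts (one per tuple)
    group_attrs: attribute names G1..Gn to group on (may be empty)
    aggregates:  {result_name: (function Fj, attribute Aj)}
    """
    # Divide the input into groups keyed by the values (g1, ..., gn).
    groups = {}
    for t in relation:
        key = tuple(t[g] for g in group_attrs)
        groups.setdefault(key, []).append(t)

    # Apply each Fj to the multiset of Aj values within each group.
    result = []
    for key, members in groups.items():
        row = dict(zip(group_attrs, key))
        for name, (func, attr) in aggregates.items():
            row[name] = func([m[attr] for m in members])
        result.append(row)
    return result


completed_times = [
    {"person_name": "Alex", "puzzle_name": "altekruse", "seconds": 350},
    {"person_name": "Alex", "puzzle_name": "soma cube", "seconds": 45},
    {"person_name": "Bob", "puzzle_name": "puzzle box", "seconds": 240},
    {"person_name": "Carl", "puzzle_name": "altekruse", "seconds": 285},
    {"person_name": "Bob", "puzzle_name": "puzzle box", "seconds": 215},
    {"person_name": "Alex", "puzzle_name": "altekruse", "seconds": 290},
]

per_person = group_aggregate(
    completed_times,
    ["person_name"],
    {"num_solutions": (len, "puzzle_name"), "best_time": (min, "seconds")},
)

# No grouping attributes: the whole relation is one big group.
overall = group_aggregate(completed_times, [], {"total_seconds": (sum, "seconds")})
```

The grouping values in each result row are unique across rows, which is the superkey property noted earlier.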
One More Aggregation Example
 “How many people have completed each puzzle?”
_{puzzle_name}𝒢_{count(person_name)}(completed)
(The puzzle_list and completed relations are the same as before.)
 What if nobody has tried a particular puzzle?
¤ Won’t appear in completed relation
One More Aggregation Example
 New puzzle added to puzzle_list relation
¤ Would like to see { “clutch box”, 0 } in result…
¤ “clutch box” won’t appear in result!

puzzle_list:
puzzle_name
altekruse
soma cube
puzzle box
clutch box

(The completed relation is unchanged.)
 Joining the two tables doesn’t help either
¤ Natural join won’t produce any rows with “clutch box”
Outer Joins
 Natural join requires that both left and right tables have a matching tuple
r ⋈ s = Π_{R ∪ S}(σ_{r.A1=s.A1 ∧ r.A2=s.A2 ∧ … ∧ r.An=s.An}(r × s))
 Outer join is an extension of join operation
¤ Designed to handle missing information
 Missing information is represented by null values
in the result
¤ null = unknown or unspecified value
Forms of Outer Join
 Left outer join: r ⟕ s
¤ If a tuple tr ∈ r doesn’t match any tuple in s, result contains { tr, null, …, null }
¤ If a tuple ts ∈ s doesn’t match any tuple in r, it’s excluded
 Right outer join: r ⟖ s
¤ If a tuple tr ∈ r doesn’t match any tuple in s, it’s excluded
¤ If a tuple ts ∈ s doesn’t match any tuple in r, result contains { null, …, null, ts }
Forms of Outer Join (2)
 Full outer join: r ⟗ s
¤ Includes tuples from r that don’t match s, as well as tuples from s that don’t match r
 Summary:

r =
attr1  attr2
a      r1
b      r2
c      r3

s =
attr1  attr3
b      s2
c      s3
d      s4

r ⋈ s
attr1  attr2  attr3
b      r2     s2
c      r3     s3

r ⟕ s
attr1  attr2  attr3
a      r1     null
b      r2     s2
c      r3     s3

r ⟖ s
attr1  attr2  attr3
b      r2     s2
c      r3     s3
d      null   s4

r ⟗ s
attr1  attr2  attr3
a      r1     null
b      r2     s2
c      r3     s3
d      null   s4
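
The four join variants in the summary can be sketched with one Python function. This is an illustrative sketch only: the function name outer_join and its flags are hypothetical, None stands in for null, and the join attribute attr1 is assumed to be non-null in both inputs.

```python
r = [("a", "r1"), ("b", "r2"), ("c", "r3")]  # tuples are (attr1, attr2)
s = [("b", "s2"), ("c", "s3"), ("d", "s4")]  # tuples are (attr1, attr3)


def outer_join(r, s, keep_left=False, keep_right=False):
    """Join r and s on attr1; optionally pad unmatched tuples with None."""
    result = []
    matched_s = set()  # positions of s tuples that found a partner
    for r_attr1, attr2 in r:
        found = False
        for j, (s_attr1, attr3) in enumerate(s):
            if r_attr1 == s_attr1:
                result.append((r_attr1, attr2, attr3))
                matched_s.add(j)
                found = True
        if not found and keep_left:
            result.append((r_attr1, attr2, None))
    if keep_right:
        for j, (s_attr1, attr3) in enumerate(s):
            if j not in matched_s:
                result.append((s_attr1, None, attr3))
    return result


natural = outer_join(r, s)
left = outer_join(r, s, keep_left=True)
right = outer_join(r, s, keep_right=True)
full = outer_join(r, s, keep_left=True, keep_right=True)
```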
Effects of null Values
 Introducing null values affects everything!
¤ null means “unknown” or “nonexistent”
 Must specify effect on results when null is present
¤ These choices are somewhat arbitrary…
¤ (Read your database user’s manual!)
 Arithmetic operations (+, –, *, /) involving null
always evaluate to null (e.g. 5 + null = null)
 Comparison operations involving null evaluate to
unknown
¤ unknown is a third truth-value
¤ Note: Yes, even null = null evaluates to unknown.
Boolean Operators and unknown
 and
true ∧ unknown = unknown
false ∧ unknown = false
unknown ∧ unknown = unknown
 or
true ∨ unknown = true
false ∨ unknown = unknown
unknown ∨ unknown = unknown
 not
¬ unknown = unknown
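
The three-valued logic can be sketched in Python, using None for unknown (and for null in comparisons). The function names and3, or3, not3 and eq3 are hypothetical; the rule that the negation of unknown is unknown is the standard one.

```python
def and3(a, b):
    # false dominates "and"; otherwise any unknown makes the result unknown.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    # true dominates "or"; otherwise any unknown makes the result unknown.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a):
    # The negation of unknown is still unknown.
    return None if a is None else not a


def eq3(x, y):
    # Any comparison involving null evaluates to unknown, even null = null.
    if x is None or y is None:
        return None
    return x == y


print(and3(False, None))  # False
print(or3(False, None))   # None
print(eq3(None, None))    # None
```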
Relational Operations
 For each relational operation, need to specify behavior with respect to null and unknown
 Select: σ_P(E)
¤ If P evaluates to unknown for a tuple, that tuple is excluded from result (i.e. definition of σ doesn’t change)
 Natural join: r ⋈ s
¤ Includes a Cartesian product, then a select
¤ If a common attribute has a null value, tuples are excluded from join result
¤ Why?
 null = (anything) evaluates to unknown
Project and Set-Operations
 Project: Π(E)
¤ Project operation must eliminate duplicates
¤ null value is treated like any other value
¤ Duplicate tuples containing null values are also
eliminated
 Union, Intersection, and Difference
¤ null values are treated like any other value
¤ Set union, intersection, difference computed as expected
 These choices are somewhat arbitrary
¤ null means “value is unknown or missing”…
¤ …but in these cases, two null values are considered
equal.
¤ Technically, two null values aren’t the same. (oh
Grouping and Aggregation
 In grouping phase:
¤ null is treated like any other value
¤ If two tuples have same values (including null) on
the grouping attributes, they end up in same group
 In aggregation phase:
¤ null values are removed from the input multiset before
the aggregate function is applied!
 Slightly different from arithmetic behavior; it keeps one null
value from wiping out an aggregate computation.
¤ If the aggregate function gets an empty multiset for
input, the result is null…
 …except for count! In that case, count returns 0.
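
The aggregation-phase rules can be sketched in Python, with None standing in for null. The helper name apply_aggregate is hypothetical and exists only for this illustration.

```python
def apply_aggregate(name, values):
    """Apply the named aggregate to a multiset of values that may contain None."""
    # null values are removed from the input multiset first.
    non_null = [v for v in values if v is not None]
    if name == "count":
        return len(non_null)   # count of an empty multiset is 0
    if not non_null:
        return None            # every other aggregate yields null
    if name == "avg":
        return sum(non_null) / len(non_null)
    return {"sum": sum, "min": min, "max": max}[name](non_null)


print(apply_aggregate("sum", [10, None, 5]))    # 15
print(apply_aggregate("count", [None, None]))   # 0
print(apply_aggregate("max", [None]))           # None
```

Compare with plain arithmetic, where 10 + null + 5 would evaluate to null.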
Generalized Projection, Outer Joins
 Generalized Projection operation:
¤ A combination of simple projection and
arithmetic operations
¤ Easy to figure out from previous rules
 Outer joins:
¤ Behave just like natural join operation, except
for padding missing values with null
Back to Our Puzzle!
 “How many people have completed each puzzle?”

puzzle_list:
puzzle_name
altekruse
soma cube
puzzle box
clutch box

completed:
person_name  puzzle_name
Alex         altekruse
Alex         soma cube
Bob          puzzle box
Carl         altekruse
Bob          soma cube
Carl         puzzle box
Alex         puzzle box
Carl         soma cube

 Use an outer join to include all puzzles, not just solved ones
puzzle_list ⟕ completed

puzzle_name  person_name
altekruse    Alex
altekruse    Carl
soma cube    Alex
soma cube    Bob
soma cube    Carl
puzzle box   Bob
puzzle box   Carl
puzzle box   Alex
clutch box   null
Counting the Solutions
 Now, use grouping and aggregation
¤ Group on puzzle name
¤ Count up the people!
_{puzzle_name}𝒢_{count(person_name)}(puzzle_list ⟕ completed)

Result:
puzzle_name  count(person_name)
altekruse    2
soma cube    3
puzzle box   3
clutch box   0
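
The whole pipeline (left outer join, then group and count while skipping nulls) can be sketched in Python. The data layout is an assumption, and None stands in for null.

```python
puzzle_list = ["altekruse", "soma cube", "puzzle box", "clutch box"]
completed = [  # tuples are (person_name, puzzle_name)
    ("Alex", "altekruse"), ("Alex", "soma cube"), ("Bob", "puzzle box"),
    ("Carl", "altekruse"), ("Bob", "soma cube"), ("Carl", "puzzle box"),
    ("Alex", "puzzle box"), ("Carl", "soma cube"),
]

# Left outer join of puzzle_list with completed on puzzle_name.
joined = []
for puzzle in puzzle_list:
    people = [person for person, done in completed if done == puzzle]
    if people:
        joined.extend((puzzle, person) for person in people)
    else:
        joined.append((puzzle, None))  # pad the unmatched puzzle with null

# Group on puzzle_name; count(person_name) ignores null values.
counts = {}
for puzzle, person in joined:
    counts.setdefault(puzzle, 0)
    if person is not None:
        counts[puzzle] += 1

print(counts)
# {'altekruse': 2, 'soma cube': 3, 'puzzle box': 3, 'clutch box': 0}
```

The clutch box group contains only a null person_name, so count sees an empty multiset and returns 0.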
Database Modification
 Often need to modify data in a database
 Can use assignment operator ← for this
 Operations:
¤ r ← r ∪ E   Insert new tuples into a relation
¤ r ← r – E   Delete tuples from a relation
¤ r ← Π(r)    Update tuples already in the relation
 Remember: r is a relation-variable
¤ Assignment operator assigns a new relation-value to
r
¤ Hence, RHS expression may need to include
existing version of r, to avoid losing unchanged
tuples
Inserting New Tuples
 Inserting tuples simply involves a union:
r ← r ∪ E
¤ E has to have correct arity
 Can specify actual tuples to insert:
completed ← completed ∪ { (“Bob”, “altekruse”), (“Carl”, “clutch box”) }
¤ The set of tuples on the right is a constant relation
¤ Adds two new tuples to completed relation
 Can specify constant relations as a set of values
¤ Each tuple is enclosed with parentheses
¤ Entire set of tuples enclosed with curly-braces
Inserting New Tuples (2)
 Can also insert tuples generated from an expression
 Example:
“Dave is joining the puzzle club. He has done
every puzzle that Bob has done.”
¤ Find out puzzles that Bob has completed, then
construct new tuples to add to completed
Inserting New Tuples (3)
 How to construct new tuples with name “Dave” and each of Bob’s puzzles?
¤ Could use a Cartesian product:
{ (“Dave”) } × Π_{puzzle_name}(σ_{person_name=“Bob”}(completed))
¤ Or, use generalized projection with a constant:
Π_{“Dave” as person_name, puzzle_name}(σ_{person_name=“Bob”}(completed))
 Add new tuples to completed relation:
completed ← completed ∪ Π_{“Dave” as person_name, puzzle_name}(σ_{person_name=“Bob”}(completed))
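
The insertion can be sketched with Python sets, which match the set semantics of a relation. Modeling completed as a set of (person_name, puzzle_name) tuples is an assumption for this illustration.

```python
completed = {  # tuples are (person_name, puzzle_name)
    ("Alex", "altekruse"), ("Alex", "soma cube"), ("Bob", "puzzle box"),
    ("Carl", "altekruse"), ("Bob", "soma cube"), ("Carl", "puzzle box"),
    ("Alex", "puzzle box"), ("Carl", "soma cube"),
}

# Select Bob's tuples and project puzzle_name.
bobs_puzzles = {puzzle for person, puzzle in completed if person == "Bob"}

# Generalized projection with the constant "Dave" as person_name.
new_tuples = {("Dave", puzzle) for puzzle in bobs_puzzles}

# Assignment: the new relation value is the union of old and new tuples.
completed = completed | new_tuples

print(sorted(t for t in completed if t[0] == "Dave"))
# [('Dave', 'puzzle box'), ('Dave', 'soma cube')]
```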
Deleting Tuples
 Deleting tuples uses the – operation:
r ← r – E
 Example: Get rid of “soma cube” puzzle.
 Problem: the completed relation references the puzzle_list relation
¤ To respect referential integrity constraints, should delete from completed first.

puzzle_list:
puzzle_name
altekruse
soma cube
puzzle box

completed:
person_name  puzzle_name
Alex         altekruse
Alex         soma cube
Bob          puzzle box
Carl         altekruse
Bob          soma cube
Carl         puzzle box
Alex         puzzle box
Carl         soma cube
Deleting Tuples (2)
 completed references puzzle_list
¤ puzzle_name is a key
¤ completed shouldn’t have any values for puzzle_name
that don’t appear in puzzle_list
¤ Delete tuples from completed first.
¤ Then delete tuples from puzzle_list.

completed ← completed – σ_{puzzle_name=“soma cube”}(completed)
puzzle_list ← puzzle_list – σ_{puzzle_name=“soma cube”}(puzzle_list)
 Of course, could also write:
completed ← σ_{puzzle_name≠“soma cube”}(completed)
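
The two-step delete can be sketched with Python sets. As a simplification for this illustration, puzzle_list is modeled as a set of names rather than a set of 1-tuples.

```python
puzzle_list = {"altekruse", "soma cube", "puzzle box"}
completed = {  # tuples are (person_name, puzzle_name)
    ("Alex", "altekruse"), ("Alex", "soma cube"), ("Bob", "puzzle box"),
    ("Carl", "altekruse"), ("Bob", "soma cube"), ("Carl", "puzzle box"),
    ("Alex", "puzzle box"), ("Carl", "soma cube"),
}

# Step 1: delete from the referencing relation first.
completed = completed - {t for t in completed if t[1] == "soma cube"}

# Step 2: now it is safe to delete from the referenced relation.
puzzle_list = puzzle_list - {"soma cube"}

# Referential integrity check: every puzzle_name in completed
# must still appear in puzzle_list.
dangling = {puzzle for _person, puzzle in completed if puzzle not in puzzle_list}
print(dangling)  # set()
```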
Deleting Tuples (3)
 In the relational model, we have to think
about foreign key constraints ourselves…
 Relational database systems take care of these
things for us, automatically.
¤ Will explore the various capabilities and options in a
few weeks
Updating Tuples
 General form uses generalized projection:
r ← Π_{F1, F2, …, Fn}(r)
 Updates all tuples in r
 Example:
“Add 5% interest to all bank account balances.”
account ← Π_{acct_id, branch_name, balance*1.05}(account)
¤ Note: Must include unchanged attributes too
¤ Otherwise you will change the schema of account

account:
acct_id  branch_name  balance
A-301    New York     350
A-307    Seattle      275
A-318    Los Angeles  550
A-319    New York     80
A-322    Los Angeles  275
Updating Some Tuples
 Updating only some tuples is more verbose
¤ Relation-variable is set to the entire result of the
evaluation
¤ Must include both updated tuples, and non-updated tuples,
in result
 Example:
“Add 5% interest to accounts with a balance less than
$10,000.”
account ← Π_{acct_id, branch_name, balance*1.05}(σ_{balance<10000}(account)) ∪ σ_{balance≥10000}(account)
Updating Some Tuples (2)
 Another example:
“Add 5% interest to accounts with a balance less than $10,000, and 6% interest to accounts with a balance of $10,000 or more.”
account ← Π_{acct_id, branch_name, balance*1.05}(σ_{balance<10000}(account)) ∪ Π_{acct_id, branch_name, balance*1.06}(σ_{balance≥10000}(account))
 Don’t forget to include any non-updated tuples in your update operations!
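
The two-rate update can be sketched in Python. The account rows come from the earlier slide, except A-999, which is a hypothetical row added here so that the 6% branch has something to act on. Decimal is used to avoid binary floating-point rounding in the balances.

```python
from decimal import Decimal

account = {  # tuples are (acct_id, branch_name, balance)
    ("A-301", "New York", Decimal("350")),
    ("A-307", "Seattle", Decimal("275")),
    ("A-318", "Los Angeles", Decimal("550")),
    ("A-319", "New York", Decimal("80")),
    ("A-322", "Los Angeles", Decimal("275")),
    ("A-999", "Seattle", Decimal("12000")),  # hypothetical row, not in the slides
}

# Select each subset, then project with the updated balance.
# Unchanged attributes are carried through so the schema stays the same.
low = {(a, b, bal * Decimal("1.05")) for a, b, bal in account if bal < 10000}
high = {(a, b, bal * Decimal("1.06")) for a, b, bal in account if bal >= 10000}

# The relation variable receives the union of both parts; together the two
# selections cover every tuple, so nothing is lost.
account = low | high

for row in sorted(account):
    print(row)
```

If one of the two selections were left out of the union, the corresponding tuples would disappear from account.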
Relational Algebra Summary
 Very expressive query language for retrieving
information from a relational database
¤ Simple selection, projection
¤ Computing correlations between relations using
joins
¤ Grouping and aggregation operations
 Can also specify changes to the contents of a
relation-variable
¤ Inserts, deletes, updates
 The relational algebra is a procedural query
language
¤ State a sequence of operations for computing a
result
Relational Algebra Summary (2)
 Benefit of relational algebra is that it can be formally
specified and reasoned about
 Drawback is that it is very verbose!
 Database systems usually provide much simpler
query languages
¤ Most popular by far is SQL, the Structured Query
Language
 However, many databases use relational algebra-like
operations internally!
¤ Great for representing execution plans, due to its
procedural nature
Next Time
 Transition from relational algebra to SQL
 Start working with “real” databases
