04 Rel-Algebra2

R and S are bags. γA,SUM(B)(R) groups the bag R by attribute A and computes the sum of attribute B within each group, producing the result {(0, 2), (2, 7), (3, 4)}. The outer join of R and S preserves any dangling tuples by padding them with NULL values.

Uploaded by

steph30forwork
Relational Algebra on Bags

• A bag is like a set, but an element may appear
more than once.
– Multiset is another name for “bag.”
• Example: {1,2,1,3} is a bag. {1,2,3} is also a bag
that happens to be a set.
• Bags also resemble lists, but order in a bag is
unimportant.
– Example:
• {1,2,1} = {1,1,2} as bags, but
• [1,2,1] != [1,1,2] as lists.
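
The distinction between bags, sets, and lists can be illustrated with a small Python sketch. It assumes `collections.Counter` as a stand-in for a bag (it stores each element with its multiplicity); the variable names are chosen here only for illustration.

```python
from collections import Counter

# A Counter maps each element to its multiplicity, so it behaves like a bag.
bag_a = Counter([1, 2, 1])
bag_b = Counter([1, 1, 2])

same_as_bags = bag_a == bag_b            # order is ignored, counts must match
same_as_lists = [1, 2, 1] == [1, 1, 2]   # lists compare position by position
same_as_sets = set([1, 2, 1]) == {1, 2}  # a set drops the duplicate 1

print(same_as_bags, same_as_lists, same_as_sets)
```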
Why bags?
• SQL, the most important query language for
relational databases, is actually a bag language.
– SQL will eliminate duplicates, but usually only if you
ask it to do so explicitly.
• Some operations, like projection or union, are
much more efficient on bags than sets.
– Why?
Operations on Bags
• Selection applies to each tuple, so its effect on
bags is like its effect on sets.
• Projection also applies to each tuple, but as a bag
operator, we do not eliminate duplicates.
• Products and joins are done on each pair of
tuples, so duplicates in bags have no effect on
how we operate.
Example: Bag Selection

R( A  B )      S( B  C )
   1  2           3  4
   5  6           7  8
   1  2

σA+B<5 (R) =   A  B
               1  2
               1  2
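
A minimal Python sketch of this selection, assuming a relation is modeled as a list of tuples whose order carries no meaning. The helper name `select` is made up for this illustration.

```python
# Relation R(A, B) as a list of tuples; duplicates are allowed.
R = [(1, 2), (5, 6), (1, 2)]

def select(relation, predicate):
    """Bag selection: keep every tuple (duplicates included) that passes."""
    return [t for t in relation if predicate(t)]

# The condition A + B < 5 is tested on each tuple independently.
result = select(R, lambda t: t[0] + t[1] < 5)
print(result)  # [(1, 2), (1, 2)]
```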
Example: Bag Projection

R( A, B )      S( B, C )
   1  2           3  4
   5  6           7  8
   1  2

πA (R) =   A
           1
           5
           1

• Bag projection always yields the same number of tuples as
the original relation.
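
As a rough sketch, bag projection can be written as a single pass over the tuples, while a set projection needs an extra duplicate-elimination step. The function names `project_bag` and `project_set` are hypothetical, and relations are again assumed to be lists of tuples.

```python
R = [(1, 2), (5, 6), (1, 2)]

def project_bag(relation, indexes):
    """Bag projection: one output tuple per input tuple, no deduplication."""
    return [tuple(t[i] for i in indexes) for t in relation]

def project_set(relation, indexes):
    """Set projection: same pass, followed by duplicate elimination."""
    return set(project_bag(relation, indexes))

bag_result = project_bag(R, [0])   # project onto attribute A (index 0)
set_result = project_set(R, [0])

print(bag_result)  # [(1,), (5,), (1,)]
print(len(bag_result) == len(R))  # True
```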
Example: Bag Product

R( A, B )      S( B, C )
   1  2           3  4
   5  6           7  8
   1  2

R × S =   A  R.B  S.B  C
          1   2    3   4
          1   2    7   8
          5   6    3   4
          5   6    7   8
          1   2    3   4
          1   2    7   8

• Each copy of the tuple (1,2) of R is being paired with
each tuple of S.
• So, the duplicates do not have an effect on the way we compute
the product.
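
The product can be sketched in Python with `itertools.product`, assuming relations are lists of tuples. Every copy of a tuple is paired on its own, so no special handling of duplicates is needed.

```python
from itertools import product

R = [(1, 2), (5, 6), (1, 2)]
S = [(3, 4), (7, 8)]

# Pair every tuple of R (each copy separately) with every tuple of S.
RxS = [r + s for r, s in product(R, S)]

for row in RxS:
    print(row)
```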
Bag Union
• Union, intersection, and difference need new
definitions for bags.
• An element appears in the union of two bags as many
times as the sum of the number of times it appears in each
bag.
• Example:
{1,2,1} ∪ {1,1,2,3,1}
= {1,1,1,1,1,2,2,3}
Bag Intersection
• An element appears in the intersection of two
bags as many times as the minimum of the number of times it
appears in either bag.
• Example:
{1,2,1} ∩ {1,2,3}
= {1,2}.
Bag Difference

• An element appears in the difference A – B of
bags as many times as it appears in A,
minus the number of times it appears in B.
– But never less than 0 times.
• Example: {1,2,1} – {1,2,3}
= {1}.
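
The three bag definitions (union, intersection, difference) can be checked with a short Python sketch. It assumes `collections.Counter` as the bag representation: its `+` adds multiplicities, `&` takes the minimum, and `-` subtracts while dropping counts that would fall to zero or below. Note that Counter's `|` operator takes the maximum, which is not the bag union defined here.

```python
from collections import Counter

a = Counter([1, 2, 1])

# Bag union: multiplicities add.
union = a + Counter([1, 1, 2, 3, 1])

# Bag intersection: minimum of the multiplicities.
intersection = a & Counter([1, 2, 3])

# Bag difference: subtract multiplicities, never going below zero.
difference = a - Counter([1, 2, 3])

print(sorted(union.elements()))         # [1, 1, 1, 1, 1, 2, 2, 3]
print(sorted(intersection.elements()))  # [1, 2]
print(sorted(difference.elements()))    # [1]
```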
Beware: Bag Laws != Set Laws
• Not all algebraic laws that hold for sets also hold for bags.
• For one example, the commutative law for union (R ∪ S = S ∪ R)
does hold for bags.
– Since addition is commutative, adding the number of times
that tuple x appears in R and S doesn’t depend on the order
of R and S.
• Set union is idempotent, meaning that S ∪ S = S.
• However, for bags, if x appears n times in S, then it appears 2n
times in S ∪ S.
• Thus S ∪ S != S in general.
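
Both laws can be tested on a small example. This is a sketch that assumes `collections.Counter` as the bag representation, with `+` playing the role of bag union; the sample values are arbitrary.

```python
from collections import Counter

R = Counter([1, 2, 1])
S = Counter([1, 1, 2])

# Commutativity holds for bag union because addition is commutative.
commutative = (R + S) == (S + R)

# Idempotence fails for bags: every multiplicity in S doubles.
bag_idempotent = (S + S) == S

# For comparison, set union is idempotent.
s = {1, 2}
set_idempotent = (s | s) == s

print(commutative, bag_idempotent, set_idempotent)
```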
The Extended Algebra

1. δ = eliminate duplicates from bags.
2. τ = sort tuples.
3. Extended projection: arithmetic, duplication
of columns.
4. γ = grouping and aggregation.
5. OUTERJOIN: avoids “dangling tuples” =
tuples that do not join with anything.
Example: Duplicate Elimination
• R1 := δ(R2).
• R1 consists of one copy of each tuple that
appears in R2 one or more times.
R= A B
1 2
3 4
1 2

δ(R) = A B
1 2
3 4
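
One possible way to sketch δ in Python, assuming a relation is a list of hashable tuples. `dict.fromkeys` keeps the first copy of each tuple; the function name `delta` is chosen here for illustration.

```python
R = [(1, 2), (3, 4), (1, 2)]

def delta(relation):
    """Duplicate elimination: one copy of each tuple that appears at all."""
    return list(dict.fromkeys(relation))

print(delta(R))  # [(1, 2), (3, 4)]
```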
Sorting
• R1 := τ L (R2).
– L is a list of some of the attributes of R2.

• R1 is the list of tuples of R2 sorted first on
the value of the first attribute on L, then on
the second attribute of L, and so on.
• τ is the only operator whose result is neither
a set nor a bag.
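
A minimal sketch of τ in Python, assuming attributes are addressed by position and using the built-in `sorted`. The sample relation and the name `tau` are made up for this illustration and do not come from the slides.

```python
# Hypothetical relation R2(A, B, C), used only to illustrate the sort.
R2 = [(3, 1, 9), (1, 2, 8), (1, 1, 7)]

def tau(relation, indexes):
    """Sort on the attributes listed in `indexes`, first one most significant."""
    return sorted(relation, key=lambda t: tuple(t[i] for i in indexes))

# Sort first on A (index 0), then on B (index 1); the result is an ordered list.
R1 = tau(R2, [0, 1])
print(R1)  # [(1, 1, 7), (1, 2, 8), (3, 1, 9)]
```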
Example: Extended Projection
• Using the same πL operator, we allow the list L
to contain arbitrary expressions involving
attributes, for example:
1. Arithmetic on attributes, e.g., A+B.
2. Duplicate occurrences of the same attribute.

R= A B
1 2
3 4

πA+BC,AA1,AA2 (R) = C A1 A2
3 1 1
7 3 3
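
The extended projection in this example computes A+B as a new column C and emits A twice, as A1 and A2. A short Python sketch, assuming R is a list of (A, B) tuples:

```python
R = [(1, 2), (3, 4)]

# Each output tuple is (C, A1, A2) = (A + B, A, A).
result = [(a + b, a, a) for a, b in R]

print(result)  # [(3, 1, 1), (7, 3, 3)]
```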
Aggregation Operators

• They apply to entire columns of a table and
produce a single result.
• The most important examples:
– SUM
– AVG
– COUNT
– MIN
– MAX
Example: Aggregation

R= A B
1 3
3 4
3 2

SUM(A) = 7
COUNT(A) = 3
MAX(B) = 4
MIN(B) = 2
AVG(B) = 3
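
These values can be reproduced with Python built-ins, assuming R is a list of (A, B) tuples. The column lists `col_a` and `col_b` are helper names introduced here.

```python
R = [(1, 3), (3, 4), (3, 2)]

col_a = [a for a, _ in R]
col_b = [b for _, b in R]

sum_a = sum(col_a)                # SUM(A)
count_a = len(col_a)              # COUNT(A) counts duplicates too
max_b = max(col_b)                # MAX(B)
min_b = min(col_b)                # MIN(B)
avg_b = sum(col_b) / len(col_b)   # AVG(B)

print(sum_a, count_a, max_b, min_b, avg_b)
```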
Grouping Operator

• R1 := γL (R2).
• L is a list of elements that are either:
1. Individual (grouping ) attributes.
2. AGG(A), where AGG is one of the
aggregation operators and A is an attribute.
Applying γL(R)
• Group R according to all the grouping attributes on
list L.
– That is, form one group for each distinct list of values for
those attributes in R.

• Within each group, compute AGG(A) for each
aggregation on list L.
• Result has grouping attributes and aggregations as
attributes.
• One tuple for each list of values for the grouping
attributes and their group’s aggregations.
Example: Grouping/Aggregation

R =   A  B  C
      1  2  3
      4  5  6
      1  2  5

γ A,B,AVG(C) (R) = ??

First, group R:
      A  B  C
      1  2  3
      1  2  5
      4  5  6

Then, average C within groups:
      A  B  AVG(C)
      1  2  4
      4  5  6
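
The two steps (group, then aggregate) can be sketched in Python as follows. This assumes R is a list of (A, B, C) tuples and uses a plain dictionary keyed by the grouping attributes; it is an illustration, not how a database engine necessarily implements γ.

```python
R = [(1, 2, 3), (4, 5, 6), (1, 2, 5)]

# Step 1: form one group per distinct (A, B) pair, collecting the C values.
groups = {}
for a, b, c in R:
    groups.setdefault((a, b), []).append(c)

# Step 2: compute AVG(C) within each group; one output tuple per group.
result = [(a, b, sum(cs) / len(cs)) for (a, b), cs in groups.items()]

print(result)  # [(1, 2, 4.0), (4, 5, 6.0)]
```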
Example: Grouping/Aggregation
• StarsIn(title, year, starName)
• We want, for each star who has appeared in at least three
movies, the earliest year in which he or she appeared.
– First we group, using starName as a grouping attribute.
– Then, we have to compute the MIN(year) for each group.
– However, we also need to compute the COUNT(title) aggregate for
each group, in order to filter out those stars with fewer than three
movies.
• σ ctTitle≥3 [γ starName, MIN(year)→minYear, COUNT(title)→ctTitle (StarsIn)]
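
A sketch of this query in Python, using invented sample data (the titles, years, and star names below are placeholders, not real StarsIn contents). The filter keeps groups with a count of at least three, matching the stated goal.

```python
# Hypothetical StarsIn(title, year, starName) tuples, for illustration only.
StarsIn = [
    ("Movie A", 1990, "Star X"),
    ("Movie B", 1985, "Star X"),
    ("Movie C", 2001, "Star X"),
    ("Movie D", 1999, "Star Y"),
    ("Movie E", 2003, "Star Y"),
]

# Grouping: collect the years and titles for each starName.
groups = {}
for title, year, star in StarsIn:
    years, titles = groups.setdefault(star, ([], []))
    years.append(year)
    titles.append(title)

# Aggregation: (starName, minYear, ctTitle) for each group.
aggregated = [(star, min(ys), len(ts)) for star, (ys, ts) in groups.items()]

# Selection: keep stars who appeared in at least three movies.
result = [row for row in aggregated if row[2] >= 3]

print(result)  # [('Star X', 1985, 3)]
```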
Outerjoin

• Suppose we join R ⋈ S.
• A tuple of R that has no tuple of S with which
it joins is said to be dangling.
– Similarly for a tuple of S.
• Outerjoin preserves dangling tuples by
padding them with a special NULL symbol in
the result.
Example: Outerjoin

R =   A  B      S =   B  C
      1  2            2  3
      4  5            6  7

(1,2) joins with (2,3), but the other two tuples
are dangling.

R OUTERJOIN S =   A     B  C
                  1     2  3
                  4     5  NULL
                  NULL  6  7
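
A minimal Python sketch of the outerjoin on this example, assuming R(A, B) and S(B, C) are lists of tuples joined on the shared attribute B, with `None` standing in for NULL. The function name `outerjoin` and the nested-loop strategy are choices made for this illustration.

```python
R = [(1, 2), (4, 5)]
S = [(2, 3), (6, 7)]

def outerjoin(r_rel, s_rel):
    """Natural join on B, keeping dangling tuples padded with None (NULL)."""
    result = []
    matched_s = set()  # positions of S tuples that joined with some R tuple
    for a, b in r_rel:
        found = False
        for j, (b2, c) in enumerate(s_rel):
            if b == b2:
                result.append((a, b, c))
                matched_s.add(j)
                found = True
        if not found:
            result.append((a, b, None))   # dangling tuple of R
    for j, (b2, c) in enumerate(s_rel):
        if j not in matched_s:
            result.append((None, b2, c))  # dangling tuple of S
    return result

joined = outerjoin(R, S)
print(joined)  # [(1, 2, 3), (4, 5, None), (None, 6, 7)]
```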
Problem
• R(A,B) = {(0,1), (2,3), (0,1), (2,4), (3,4)}
• S(B,C) = {(0,1), (2,4), (2,5), (3,4), (0,2), (3,4)}

• γA,SUM(B)(R)

• R OUTERJOIN S
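
For checking the grouping part of the problem, a short Python sketch is given here, assuming R is a list of (A, B) tuples. Its output agrees with the result stated in the summary at the top of this document.

```python
R = [(0, 1), (2, 3), (0, 1), (2, 4), (3, 4)]

# Group by A and sum B within each group; duplicates contribute each time.
sums = {}
for a, b in R:
    sums[a] = sums.get(a, 0) + b

gamma_result = sorted(sums.items())
print(gamma_result)  # [(0, 2), (2, 7), (3, 4)]
```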
