Lec04 SQL Aggregation Grouping
Lec04 SQL Aggregation Grouping
1
Aggregation in SQL
>sqlite3 lecture04
2
Comment about SQLite
• One cannot load NULL values such that they
are actually loaded as null values
select sum(quantity)
from Purchase
where quantity is not null; 5
Aggregates and NULL Values
Null values are not used in aggregates
insert into Purchase
values(12, 'gadget', NULL, NULL, 'april')
Let’s try the following
select count(*) from Purchase
-- NULL is counted in count(*)
select count(quantity) from Purchase
-- NULL is ignored in count(quantity)
select sum(quantity)
from Purchase
where quantity is not null;
-- “is not null” is redundant
6
Counting Duplicates
COUNT applies to duplicates, unless otherwise stated:
We probably want:
SELECT Count(DISTINCT product)
FROM Purchase
WHERE price> 4.99
7
More Examples
8
Simple Aggregations
Purchase Product Price Quantity
Bagel 3 20
Bagel 1.50 20
Banana 0.5 50
Banana 2 10
Banana 4 10
SELECT Sum(price * quantity)
FROM Purchase 90 (= 60+30)
WHERE product = ‘Bagel’
9
Simple Aggregations
Purchase Product Price Quantity
Bagel 3 20
Bagel 1.50 20
Banana 0.5 50
Banana 2 10
Banana 4 10
SELECT Sum(price * quantity)
FROM Purchase 90 (= 60+30)
WHERE product = ‘Bagel’
10
More Examples
How can we find the average revenue per sale?
11
More Examples
SELECT sum(price * quantity) / count(*)
FROM Purchase
WHERE product = ‘bagel’
12
Grouping and Aggregation
Purchase(product, price, quantity)
13
Grouping and Aggregation
Purchase(product, price, quantity)
FWGS
15
1&2. FROM-WHERE-GROUPBY
16
FWGS
3. SELECT
Product Price Quantity
Bagel 3 20 Product sum(quantity)
Other Examples
Compare these
two queries:
SELECT product,
sum(quantity) AS SumQuantity,
How about
max(price) AS MaxPrice this one?
FROM Purchase
GROUP BY product
18
Need to be Careful…
SELECT product, max(quantity) Product Price Quantity
FROM Purchase Bagel 3 20
GROUP BY product
Bagel 1.50 20
SELECT product, quantity Banana 0.5 50
FROM Purchase
GROUP BY product Banana 2 10
Banana 4 10
sqlite allows this
query to be executed
with strange Better DBMS (e.g., SQL
behavior. Server) gives an error
19
Purchase(pid, product, price, quantity, month)
Ordering Results
FWGOS
20
Purchase(pid, product, price, quantity, month)
Ordering Results
FWGOS
HAVING Clause
Same query as earlier, except that we consider only products
that had at least 30 sales.
Exercise
Compute the total income per month
Show only months with less than 10 items sold
Order by quantity sold and display as “TotalSold”
23
WHERE vs. HAVING
• WHERE condition is applied to individual rows
– The rows may or may not contribute to the aggregate
– No aggregates allowed here
24
Purchase(pid, product, price, quantity, month)
Mystery Query
What do they compute?
26
Purchase(pid, product, price, quantity, month)
Product(pid, pname, manufacturer)
27
Nested Loop Semantics for SFW
SELECT x1.a1, x2.a2, … xm.am
FROM R1 as x1, R2 as x2, … Rm as xm
WHERE Cond
for x1 in R1:
for x2 in R2:
... Nested loop
for xm in Rm: semantics
if Cond(x1, x2…):
output(x1.a1, x2.a2, … xm.am)
28
Semantics for SFWGH
SELECT S
FROM R1,…,Rn
WHERE C1
GROUP BY a1,…,ak Why ?
HAVING C2
S = may contain attributes a1,…,ak and/or any
aggregates, but NO OTHER ATTRIBUTES
C1 = is any condition on the attributes in R1,…,Rn
C2 = is any condition on aggregate expressions
and on attributes a1,…,ak
29
Semantics for SFWGH
SELECT S
FROM R1,…,Rn
WHERE C1
GROUP BY a1,…,ak
HAVING C2
Evaluation steps:
1. Evaluate FROM-WHERE using Nested Loop Semantics
2. Group by the attributes a1,…,ak
3. Apply condition C2 to each group (may have aggregates)
4. Compute aggregates in S and return the result
21
Semantics for SFWGH
SELECT S
Execution order:
FROM R1,…,Rn
WHERE C1 FWGHOS
GROUP BY a1,…,ak
HAVING C2
Evaluation steps:
1. Evaluate FROM-WHERE using Nested Loop Semantics
2. Group by the attributes a1,…,ak
3. Apply condition C2 to each group (may have aggregates)
4. Compute aggregates in S and return the result
22
Purchase(pid, product, price, quantity, month)
Product(pid, pname, manufacturer)
32
Empty Groups
• In the result of a group by query, there is one
row per group in the result
• No group can be empty! What if there
• In particular, count(*) is never 0 are no
purchases for a
manufacturer
SELECT manufacturer, count(*)
FROM Product, Purchase
WHERE pname = product
GROUP BY manufacturer
33
Empty Group Solution:
Outer Join
34
Purchase(pid, product, price, quantity, month)
Product(pid, pname, manufacturer)
Exercise:
35
Purchase(pid, product, price, quantity, month)
Product(pid, pname, manufacturer)
Exercise:
36
Purchase(pid, product, price, quantity, month)
Product(pid, pname, manufacturer)
Exercise:
37
Purchase(pid, product, price, quantity, month)
Product(pid, pname, manufacturer)
Exercise:
38