
Lec04 SQL Aggregation Grouping


Introduction to Database Systems

SQL Aggregation & Grouping

Aggregation in SQL
>sqlite3 lecture04

sqlite> create table Purchase(
          pid int primary key,
          product text,
          price float,
          quantity int,
          month varchar(15));

sqlite> -- download data.txt
sqlite> .import lec04-data.txt Purchase

Note: other DBMSs have other ways of importing data.

Comment about SQLite
• The .import command cannot load NULL values directly: a missing value arrives as ordinary text, not as an actual null

• So we need to use two steps:
  – Load null values using some type of special value (here the text 'null')
  – Update the special values to actual null values

  update Purchase
  set price = null
  where price = 'null'
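The two-step fix can be tried out with Python's built-in sqlite3 module. This is only a sketch: the two rows are made up for illustration, and the placeholder text 'null' is inserted by hand to stand in for what an import would produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Purchase(pid int primary key, product text, "
             "price float, quantity int, month varchar(15))")

# Step 1: simulate an import where a missing price arrived as the text 'null'.
# (Hypothetical rows, not taken from lec04-data.txt.)
conn.execute("insert into Purchase values (1, 'bagel', 'null', 20, 'april')")
conn.execute("insert into Purchase values (2, 'bagel', 1.5, 20, 'april')")
before = conn.execute(
    "select count(*) from Purchase where price is null").fetchone()[0]

# Step 2: turn the placeholder text into a real NULL.
conn.execute("update Purchase set price = null where price = 'null'")
after = conn.execute(
    "select count(*) from Purchase where price is null").fetchone()[0]

print(before, after)
```

Before the update no row has a real NULL price; afterwards exactly one does.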
Simple Aggregations
Five basic aggregate operations in SQL

  select count(*) from Purchase
  select sum(quantity) from Purchase
  select avg(price) from Purchase
  select max(quantity) from Purchase
  select min(quantity) from Purchase

Except count(*), every aggregate applies to a single attribute (column)
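A quick way to see all five aggregates at once is to run them from Python's sqlite3 module. The five sample rows below are an assumption (the contents of lec04-data.txt are not shown); the months in particular are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Purchase(pid int primary key, product text, "
             "price float, quantity int, month varchar(15))")

# Hypothetical sample rows; not the real contents of lec04-data.txt.
rows = [(1, "Bagel", 3, 20, "april"), (2, "Bagel", 1.5, 20, "april"),
        (3, "Banana", 0.5, 50, "may"), (4, "Banana", 2, 10, "may"),
        (5, "Banana", 4, 10, "june")]
conn.executemany("insert into Purchase values (?, ?, ?, ?, ?)", rows)

# All five aggregates in a single query: one output row for the whole table.
stats = conn.execute(
    "select count(*), sum(quantity), avg(price), max(quantity), min(quantity) "
    "from Purchase").fetchone()
print(stats)
```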
Aggregates and NULL Values
Null values are not used in aggregates

  insert into Purchase
  values(12, 'gadget', NULL, NULL, 'april')

Let’s try the following

  select count(*) from Purchase
  -- count(*) counts rows, so the row with NULLs is counted

  select count(quantity) from Purchase
  -- count(quantity) ignores rows where quantity is NULL

  select sum(quantity) from Purchase
  -- sum also skips NULLs

  select sum(quantity)
  from Purchase
  where quantity is not null;
  -- "is not null" is redundant: same result as the previous query
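The four queries can be checked with a small sqlite3 script. This is a sketch on assumed data: five made-up rows plus the all-NULL 'gadget' row from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Purchase(pid int primary key, product text, "
             "price float, quantity int, month varchar(15))")

# Hypothetical rows, followed by the row with NULL price and quantity.
conn.executemany("insert into Purchase values (?, ?, ?, ?, ?)", [
    (1, "Bagel", 3, 20, "april"), (2, "Bagel", 1.5, 20, "april"),
    (3, "Banana", 0.5, 50, "may"), (4, "Banana", 2, 10, "may"),
    (5, "Banana", 4, 10, "june"), (12, "gadget", None, None, "april")])

def one(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

count_star = one("select count(*) from Purchase")         # counts every row
count_qty = one("select count(quantity) from Purchase")   # skips NULL quantity
sum_qty = one("select sum(quantity) from Purchase")       # skips NULL quantity
sum_filtered = one("select sum(quantity) from Purchase "
                   "where quantity is not null")          # explicit filter
print(count_star, count_qty, sum_qty, sum_filtered)
```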
Counting Duplicates
COUNT counts duplicates, unless otherwise stated:

  SELECT Count(product)    -- same as Count(*) if product has no nulls
  FROM Purchase
  WHERE price > 4.99

We probably want:

  SELECT Count(DISTINCT product)
  FROM Purchase
  WHERE price > 4.99
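The difference between the two counts shows up as soon as a product appears twice above the price threshold. The rows below are hypothetical and chosen only to make that happen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Purchase(product text, price float)")

# Hypothetical rows: 'gizmo' is bought twice at a price above 4.99.
conn.executemany("insert into Purchase values (?, ?)", [
    ("gizmo", 9.99), ("gizmo", 19.99), ("gadget", 29.99), ("bagel", 1.5)])

with_dups = conn.execute(
    "select count(product) from Purchase where price > 4.99").fetchone()[0]
distinct = conn.execute(
    "select count(distinct product) from Purchase "
    "where price > 4.99").fetchone()[0]
print(with_dups, distinct)
```

Count(product) counts qualifying rows (purchases), while Count(DISTINCT product) counts qualifying products.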
More Examples
What do they mean?

  SELECT Sum(price * quantity)
  FROM Purchase

  SELECT Sum(price * quantity)
  FROM Purchase
  WHERE product = 'bagel'
Simple Aggregations
Purchase

  Product   Price   Quantity
  Bagel     3       20
  Bagel     1.50    20
  Banana    0.5     50
  Banana    2       10
  Banana    4       10

  SELECT Sum(price * quantity)
  FROM Purchase
  WHERE product = 'Bagel'

Result: 90 (= 60 + 30)
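The worked example can be reproduced with the five rows from the table. The script is a sketch that keeps only the three columns shown on the slide; it also computes the unfiltered total for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Purchase(product text, price float, quantity int)")

# The five rows from the slide's Purchase table.
conn.executemany("insert into Purchase values (?, ?, ?)", [
    ("Bagel", 3, 20), ("Bagel", 1.5, 20),
    ("Banana", 0.5, 50), ("Banana", 2, 10), ("Banana", 4, 10)])

# Revenue over all purchases: 60 + 30 + 25 + 20 + 40.
total = conn.execute(
    "select sum(price * quantity) from Purchase").fetchone()[0]

# Revenue from bagels only: 60 + 30.
bagel = conn.execute(
    "select sum(price * quantity) from Purchase "
    "where product = 'Bagel'").fetchone()[0]
print(total, bagel)
```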
More Examples
How can we find the average revenue per sale?

  SELECT sum(price * quantity) / count(*)
  FROM Purchase
  WHERE product = 'bagel'

How can we find the average price of a bagel sold?

  SELECT sum(price * quantity) / sum(quantity)
  FROM Purchase
  WHERE product = 'bagel'
More Examples
What happens to the two queries above if there are NULLs in price or quantity?

sum(price * quantity) skips any row where price or quantity is NULL, but count(*) still counts that row, and sum(quantity) still adds its quantity when only price is NULL. The averages can therefore be computed over mismatched sets of rows.

Lesson: disallow NULLs unless you need to handle them
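The effect of a NULL on both averages can be seen by running the two queries before and after adding one incomplete row. This is a sketch on assumed data: the two bagel rows from the earlier table, plus a hypothetical purchase of 10 bagels whose price is unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Purchase(product text, price float, quantity int)")
conn.executemany("insert into Purchase values (?, ?, ?)",
                 [("bagel", 3, 20), ("bagel", 1.5, 20)])

def averages():
    """Return (average revenue per sale, average price per bagel sold)."""
    return conn.execute(
        "select sum(price * quantity) / count(*), "
        "       sum(price * quantity) / sum(quantity) "
        "from Purchase where product = 'bagel'").fetchone()

clean = averages()

# Hypothetical extra row: 10 bagels sold, price missing.
conn.execute("insert into Purchase values ('bagel', NULL, 10)")
with_null = averages()

print(clean, with_null)
```

The numerator is unchanged by the new row, while both denominators grow, so both averages drop even though no revenue information was added.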
Grouping and Aggregation
Purchase(product, price, quantity)

Find number of bagels sold for more than $1

  SELECT Sum(quantity) as TotalSold
  FROM Purchase
  WHERE price > 1 and product = 'bagel'
Grouping and Aggregation
Purchase(product, price, quantity)

Find number sold for more than $1 for each product

  SELECT product, Sum(quantity)
  FROM Purchase
  WHERE price > 1
  GROUP BY product

Let’s see what this means…
Grouping and Aggregation

1. Compute the FROM and WHERE clauses.
2. Group by the attributes in the GROUP BY.
3. Compute the SELECT clause: grouped attributes and aggregates.

Evaluation order: FWGS (FROM, WHERE, GROUP BY, SELECT)
1&2. FROM-WHERE-GROUPBY (FWGS)

  Product   Price   Quantity
  Bagel     3       20
  Bagel     1.50    20
  Banana    0.5     50      (removed by WHERE price > 1)
  Banana    2       10
  Banana    4       10

3. SELECT (FWGS)

  Product   sum(quantity)
  Bagel     40
  Banana    20

  SELECT product, Sum(quantity)
  FROM Purchase
  WHERE price > 1
  GROUP BY product
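The FWGS walk-through can be confirmed by running the query on the five rows from the table. A minimal sqlite3 sketch, keeping only the three columns shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Purchase(product text, price float, quantity int)")
conn.executemany("insert into Purchase values (?, ?, ?)", [
    ("Bagel", 3, 20), ("Bagel", 1.5, 20),
    ("Banana", 0.5, 50), ("Banana", 2, 10), ("Banana", 4, 10)])

# WHERE drops (Banana, 0.5, 50) before the groups are formed,
# so its 50 units never reach sum(quantity).
grouped = sorted(conn.execute(
    "select product, sum(quantity) from Purchase "
    "where price > 1 group by product").fetchall())
print(grouped)
```

The result is sorted in Python only to make the output order predictable; the query itself has no ORDER BY.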
Purchase(pid, product, price, quantity, month)

Other Examples
Compare these two queries:

  SELECT product, count(*)
  FROM Purchase
  GROUP BY product

  SELECT month, count(*)
  FROM Purchase
  GROUP BY month

How about this one?

  SELECT product,
         sum(quantity) AS SumQuantity,
         max(price) AS MaxPrice
  FROM Purchase
  GROUP BY product
Need to be Careful…

  Product   Price   Quantity
  Bagel     3       20
  Bagel     1.50    20
  Banana    0.5     50
  Banana    2       10
  Banana    4       10

  SELECT product, max(quantity)
  FROM Purchase
  GROUP BY product

  SELECT product, quantity
  FROM Purchase
  GROUP BY product

The second query selects quantity, which is neither grouped nor aggregated. sqlite allows this query to be executed, with strange behavior. A stricter DBMS (e.g., SQL Server) gives an error.
Purchase(pid, product, price, quantity, month)

Ordering Results

  SELECT product, sum(price*quantity)
  FROM Purchase
  GROUP BY product
  ORDER BY sum(price*quantity) DESC

FWGOS
Purchase(pid, product, price, quantity, month)

Ordering Results

  SELECT product, sum(price*quantity) as rev
  FROM Purchase
  GROUP BY product
  ORDER BY rev desc

FWGOS

Note: some SQL engines want you to say ORDER BY sum(price*quantity)

CSE 414 - Fall 2017
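In SQLite both spellings of the ORDER BY are accepted and give the same ordering. The sketch below checks this on the five rows from the earlier Purchase table (an assumption, since the real data file is not shown).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Purchase(product text, price float, quantity int)")
conn.executemany("insert into Purchase values (?, ?, ?)", [
    ("Bagel", 3, 20), ("Bagel", 1.5, 20),
    ("Banana", 0.5, 50), ("Banana", 2, 10), ("Banana", 4, 10)])

# Order by the alias defined in the SELECT clause.
by_alias = conn.execute(
    "select product, sum(price*quantity) as rev from Purchase "
    "group by product order by rev desc").fetchall()

# Order by the aggregate expression written out again.
by_expr = conn.execute(
    "select product, sum(price*quantity) from Purchase "
    "group by product order by sum(price*quantity) desc").fetchall()

print(by_alias)
```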
Purchase(pid, product, price, quantity, month)

HAVING Clause
Same query as earlier, except that we consider only products with more than 30 items sold.

  SELECT product, sum(price*quantity)
  FROM Purchase
  WHERE price > 1
  GROUP BY product
  HAVING sum(quantity) > 30

FWGHOS

HAVING clause contains conditions on groups.
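To see HAVING drop a whole group, the query can be run with and without it. A sketch on the five rows from the earlier Purchase table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Purchase(product text, price float, quantity int)")
conn.executemany("insert into Purchase values (?, ?, ?)", [
    ("Bagel", 3, 20), ("Bagel", 1.5, 20),
    ("Banana", 0.5, 50), ("Banana", 2, 10), ("Banana", 4, 10)])

# Without HAVING: one row per product that has any purchase with price > 1.
all_groups = sorted(conn.execute(
    "select product, sum(price*quantity) from Purchase "
    "where price > 1 group by product").fetchall())

# With HAVING: after the WHERE filter, Banana has only 20 items left,
# so the whole Banana group is discarded.
big_groups = conn.execute(
    "select product, sum(price*quantity) from Purchase "
    "where price > 1 group by product "
    "having sum(quantity) > 30").fetchall()

print(all_groups, big_groups)
```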
Purchase(pid, product, price, quantity, month)

Exercise
Compute the total income per month
Show only months with fewer than 10 items sold
Order by quantity sold and display as “TotalSold”

  SELECT month, sum(price*quantity),
         sum(quantity) as TotalSold
  FROM Purchase
  GROUP BY month
  HAVING sum(quantity) < 10
  ORDER BY sum(quantity)

FWGHOS
WHERE vs. HAVING
• WHERE condition is applied to individual rows
  – The rows may or may not contribute to the aggregate
  – No aggregates allowed here

• HAVING condition is applied to the entire group
  – Entire group is returned, or not at all
  – May use aggregate functions over the group
Purchase(pid, product, price, quantity, month)

Mystery Query
What do they compute?

  SELECT month, sum(quantity), max(price)
  FROM Purchase
  GROUP BY month

  SELECT month, sum(quantity)
  FROM Purchase
  GROUP BY month

  SELECT month
  FROM Purchase
  GROUP BY month

Lesson: DISTINCT is a special case of GROUP BY
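The lesson can be checked directly: a GROUP BY with no aggregates returns the same set of values as SELECT DISTINCT. The months below are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Purchase(product text, month varchar(15))")

# Hypothetical rows with repeated months.
conn.executemany("insert into Purchase values (?, ?)", [
    ("Bagel", "april"), ("Bagel", "april"),
    ("Banana", "may"), ("Banana", "june"), ("Banana", "may")])

via_group_by = sorted(conn.execute(
    "select month from Purchase group by month").fetchall())
via_distinct = sorted(conn.execute(
    "select distinct month from Purchase").fetchall())
print(via_group_by, via_distinct)
```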
Aggregates and Joins
create table Product(
pid int primary key,
pname varchar(15),
manufacturer varchar(15));

insert into product values(1,'bagel','Sunshine Co.');
insert into product values(2,'banana','BusyHands');
insert into product values(3,'gizmo','GizmoWorks');
insert into product values(4,'gadget','BusyHands');
insert into product values(5,'powerGizmo','PowerWorks');
Purchase(pid, product, price, quantity, month)
Product(pid, pname, manufacturer)

Aggregate + Join Example
Let’s figure out what these mean…

  SELECT manufacturer, count(*)
  FROM Product, Purchase
  WHERE pname = product
  GROUP BY manufacturer

  SELECT manufacturer, month, count(*)
  FROM Product, Purchase
  WHERE pname = product
  GROUP BY manufacturer, month
Nested Loop Semantics for SFW

  SELECT x1.a1, x2.a2, … xm.am
  FROM R1 as x1, R2 as x2, … Rm as xm
  WHERE Cond

Nested loop semantics:

  for x1 in R1:
    for x2 in R2:
      ...
        for xm in Rm:
          if Cond(x1, x2…):
            output(x1.a1, x2.a2, … xm.am)
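For two relations the nested-loop reading can be written out literally in Python. This is a sketch of the semantics only, not of how a DBMS actually executes a join; the relations are tiny hypothetical lists of dicts, and cond plays the role of the WHERE condition pname = product.

```python
# Hypothetical relations, each a list of rows (dicts).
Product = [
    {"pname": "bagel", "manufacturer": "Sunshine Co."},
    {"pname": "banana", "manufacturer": "BusyHands"},
]
Purchase = [
    {"product": "bagel", "quantity": 20},
    {"product": "banana", "quantity": 50},
    {"product": "bagel", "quantity": 20},
]

def cond(x1, x2):
    """The WHERE condition: pname = product."""
    return x1["pname"] == x2["product"]

# SELECT x1.manufacturer, x2.quantity
# FROM Product as x1, Purchase as x2
# WHERE x1.pname = x2.product
result = []
for x1 in Product:
    for x2 in Purchase:
        if cond(x1, x2):
            result.append((x1["manufacturer"], x2["quantity"]))

print(result)
```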
Semantics for SFWGH

  SELECT S
  FROM R1,…,Rn
  WHERE C1
  GROUP BY a1,…,ak
  HAVING C2

S = may contain attributes a1,…,ak and/or any aggregates, but NO OTHER ATTRIBUTES (Why?)
C1 = is any condition on the attributes in R1,…,Rn
C2 = is any condition on aggregate expressions and on attributes a1,…,ak
Semantics for SFWGH

  SELECT S
  FROM R1,…,Rn
  WHERE C1
  GROUP BY a1,…,ak
  HAVING C2

Execution order: FWGHOS

Evaluation steps:
1. Evaluate FROM-WHERE using Nested Loop Semantics
2. Group by the attributes a1,…,ak
3. Apply condition C2 to each group (may have aggregates)
4. Compute aggregates in S and return the result
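The four evaluation steps can be mirrored one by one in plain Python. This is a sketch of the semantics, not of a real query engine; it evaluates the earlier HAVING query (price > 1, grouped by product, more than 30 items sold) on the five rows from the Purchase table.

```python
from collections import defaultdict

# Rows of Purchase as (product, price, quantity).
purchase = [("Bagel", 3.0, 20), ("Bagel", 1.5, 20),
            ("Banana", 0.5, 50), ("Banana", 2.0, 10), ("Banana", 4.0, 10)]

# 1. FROM-WHERE: keep the rows that satisfy C1 (price > 1).
filtered = [row for row in purchase if row[1] > 1]

# 2. GROUP BY product: collect the surviving rows per product.
groups = defaultdict(list)
for row in filtered:
    groups[row[0]].append(row)

# 3. HAVING: keep only groups that satisfy C2 (sum(quantity) > 30).
kept = {product: rows for product, rows in groups.items()
        if sum(q for _, _, q in rows) > 30}

# 4. SELECT: one output row per remaining group,
#    here (product, sum(price * quantity)).
result = [(product, sum(p * q for _, p, q in rows))
          for product, rows in kept.items()]

print(result)
```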
Empty Groups
• In the result of a group by query, there is one row per group
• No group can be empty!
• In particular, count(*) is never 0

What if there are no purchases for a manufacturer?

  SELECT manufacturer, count(*)
  FROM Product, Purchase
  WHERE pname = product
  GROUP BY manufacturer
Empty Group Solution: Outer Join

  SELECT manufacturer, count(quantity)
  FROM Product LEFT OUTER JOIN Purchase
       ON pname = product
  GROUP BY manufacturer

Why not count(*)?
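Running the inner join, the outer join with count(quantity), and the outer join with count(*) side by side shows the difference. The Product rows are the ones inserted earlier; the three Purchase rows are hypothetical and deliberately leave gizmo, gadget and powerGizmo without any purchase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Product(pid int primary key, pname varchar(15), "
             "manufacturer varchar(15))")
conn.execute("create table Purchase(pid int primary key, product text, "
             "price float, quantity int, month varchar(15))")
conn.executemany("insert into Product values (?, ?, ?)", [
    (1, "bagel", "Sunshine Co."), (2, "banana", "BusyHands"),
    (3, "gizmo", "GizmoWorks"), (4, "gadget", "BusyHands"),
    (5, "powerGizmo", "PowerWorks")])
# Hypothetical purchases: only bagels and bananas were sold.
conn.executemany("insert into Purchase values (?, ?, ?, ?, ?)", [
    (1, "bagel", 3, 20, "april"), (2, "bagel", 1.5, 20, "april"),
    (3, "banana", 0.5, 50, "may")])

# Inner join: manufacturers without purchases form no group at all.
inner = dict(conn.execute(
    "select manufacturer, count(*) from Product, Purchase "
    "where pname = product group by manufacturer").fetchall())

# Outer join + count(quantity): unmatched products contribute a row whose
# quantity is NULL, which count(quantity) ignores.
outer_qty = dict(conn.execute(
    "select manufacturer, count(quantity) "
    "from Product left outer join Purchase on pname = product "
    "group by manufacturer").fetchall())

# Outer join + count(*): the NULL-padded rows are counted as purchases.
outer_star = dict(conn.execute(
    "select manufacturer, count(*) "
    "from Product left outer join Purchase on pname = product "
    "group by manufacturer").fetchall())

print(inner, outer_qty, outer_star)
```

With count(*), a manufacturer with no purchases would be reported as having one, because the padded row itself is counted.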
Purchase(pid, product, price, quantity, month)
Product(pid, pname, manufacturer)

Exercise:
Find all manufacturers with more than 10 items sold.
Return manufacturer name and number of items sold.

  SELECT manufacturer, sum(quantity)
  FROM Product, Purchase
  WHERE pname = product
  GROUP BY manufacturer
  HAVING sum(quantity) > 10
Purchase(pid, product, price, quantity, month)
Product(pid, pname, manufacturer)

Exercise:
Find all manufacturers with more than 1 distinct product sold.
Return the name of the manufacturer and number of distinct products sold.

  SELECT manufacturer, count(distinct product)
  FROM Product, Purchase
  WHERE pname = product
  GROUP BY manufacturer
  HAVING count(distinct product) > 1
Purchase(pid, product, price, quantity, month)
Product(pid, pname, manufacturer)

Exercise:
Find all products with more than 2 purchases.
Return the name of the product and the maximum price at which it was sold.

  SELECT pname, max(price)
  FROM Product, Purchase
  WHERE pname = product
  GROUP BY pname
  HAVING COUNT(*) > 2
Purchase(pid, product, price, quantity, month)
Product(pid, pname, manufacturer)

Exercise:
Find all manufacturers with at least 5 purchases in one month.
Return manufacturer name, month, and number of items sold.

  SELECT manufacturer, month, sum(quantity)
  FROM Product, Purchase
  WHERE pname = product
  GROUP BY manufacturer, month
  HAVING count(*) >= 5
