Lec04 Aggregates
§ Homework 2
• Posted
• Due on Friday
• Sqlite
§ Homework 3
• To be posted later this week
• Due on April 19
• SQL Azure in the cloud
Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000

Regist
UserID  Car
123     Charger
567     Civic
567     Pinto
April 1, 2024 Joins 3
Recap: Inner Join
For each employee, find the cars that they drive
SELECT P.Name, R.Car
FROM Payroll AS P
JOIN Regist AS R
ON P.UserID = R.UserID;

Name   Car
Jack   Charger
Magda  Civic
Magda  Pinto

Allison and Dan are missing.
Recap: Outer Join
For each employee, find the cars that they drive
SELECT P.Name, R.Car
FROM Payroll AS P
LEFT OUTER JOIN Regist AS R
ON P.UserID = R.UserID;

Name     Car
Jack     Charger
Magda    Civic
Magda    Pinto
Allison  NULL
Dan      NULL

NULL means "unknown" or "missing".
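
To try the two recap queries side by side, here is a minimal sketch that loads the Payroll and Regist rows from the slides into an in-memory SQLite database through Python's sqlite3 module. The column types and the ORDER BY clauses (added only to make the output order predictable) are assumptions, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    CREATE TABLE Regist  (UserID INT, Car TEXT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
    INSERT INTO Regist VALUES
        (123, 'Charger'), (567, 'Civic'), (567, 'Pinto');
""")

# Inner join: employees without a car do not appear at all.
inner = conn.execute("""
    SELECT P.Name, R.Car
    FROM Payroll AS P
    JOIN Regist AS R ON P.UserID = R.UserID
    ORDER BY P.UserID, R.Car
""").fetchall()

# Left outer join: every Payroll row survives; a missing car becomes NULL (None).
outer = conn.execute("""
    SELECT P.Name, R.Car
    FROM Payroll AS P
    LEFT OUTER JOIN Regist AS R ON P.UserID = R.UserID
    ORDER BY P.UserID, R.Car
""").fetchall()

print(inner)
print(outer)
```

In Python, SQL NULL shows up as None, so Allison and Dan come back as ('Allison', None) and ('Dan', None).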
Recap: Outer Join
§ LEFT OUTER JOIN
• Add missing tuples from the LEFT
Complications:
§ Expressions with NULLs?
§ Conditions with NULLs?
SELECT Name
FROM Payroll
WHERE Job = 'TA';

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  NULL  60000
567     Magda    Prof  90000
789     Dan      Prof  NULL
432     NULL     Prof  NULL

Only Jack is returned. Allison is not included: her Job is NULL, so Job = 'TA' evaluates to unknown rather than true.
SQL uses 3-valued logic.
Three-valued Logic
false = 0; unknown = 0.5; true = 1
x AND y = min(x,y);
x OR y = max(x,y);
not x = 1-x
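
The numeric encoding above can be checked directly. The following is a small sketch in plain Python; the helper names (and3, or3, not3, eq3) are made up for this illustration and are not SQL or library functions.

```python
# Truth values encoded as on the slide.
FALSE, UNKNOWN, TRUE = 0.0, 0.5, 1.0

def and3(x, y):
    return min(x, y)

def or3(x, y):
    return max(x, y)

def not3(x):
    return 1 - x

def eq3(a, b):
    """Equality test where None stands for NULL: any NULL operand gives unknown."""
    if a is None or b is None:
        return UNKNOWN
    return TRUE if a == b else FALSE

# A row whose Job is NULL, tested against: Job != 'Prof' OR Job = 'Prof'
job = None
cond = or3(not3(eq3(job, "Prof")), eq3(job, "Prof"))
print(cond)  # 0.5, i.e. unknown, so a WHERE clause would not keep this row
```

A WHERE clause keeps a row only when its condition evaluates to true (1), so a result of 0.5 drops the row just as 0 does.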
SELECT *
FROM Payroll
WHERE Job != 'Prof'
   OR Salary > 80000;

UserID  Name   Job   Salary
123     Jack   TA    50000
567     Magda  Prof  90000
SELECT *
FROM Payroll
WHERE Job != 'Prof' OR Job = 'Prof';

Should return everyone, but we are missing Allison! Her Job is NULL, so both comparisons evaluate to unknown.
SELECT *
FROM Payroll
WHERE Job != 'Prof' OR Job = 'Prof' OR Job IS NULL;
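
The three WHERE conditions above can be run against the Payroll table that contains NULLs. This is a sketch using an in-memory SQLite database via Python's sqlite3 module; the helper user_ids is a made-up name for this example, and it returns only UserIDs (sorted) to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', NULL, 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', NULL),
        (432, NULL, 'Prof', NULL);
""")

def user_ids(where):
    """Return the sorted UserIDs of the Payroll rows kept by a WHERE condition."""
    sql = f"SELECT UserID FROM Payroll WHERE {where} ORDER BY UserID"
    return [row[0] for row in conn.execute(sql)]

mixed = user_ids("Job != 'Prof' OR Salary > 80000")
tautology = user_ids("Job != 'Prof' OR Job = 'Prof'")
with_is_null = user_ids("Job != 'Prof' OR Job = 'Prof' OR Job IS NULL")

print(mixed)         # [123, 567]
print(tautology)     # [123, 432, 567, 789]  (345, Allison, is missing)
print(with_is_null)  # [123, 345, 432, 567, 789]
```

Only the explicit IS NULL test brings Allison (UserID 345) back into the result.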
Outer Joins Revisited: ON vs. WHERE

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000

Regist
UserID  Car
123     Charger
567     Civic
567     Pinto

1. Perform the join with the ON clause
2. Add all missing tuples from LEFT
3. Check the WHERE clause (if any)

SELECT P.Name, R.Car
FROM Payroll AS P
LEFT OUTER JOIN Regist AS R
ON P.UserID = R.UserID
AND R.Car = 'Charger';

Result after steps 1, 2, 3:
Name     Car
Jack     Charger
Allison  NULL
Magda    NULL
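
To see why the placement of the Car condition matters, the sketch below runs the ON version from the slide next to a WHERE version in an in-memory SQLite database (Python's sqlite3 module). The WHERE variant is written for this illustration and is not copied from the slides; the ORDER BY is added only for a predictable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    CREATE TABLE Regist  (UserID INT, Car TEXT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000);
    INSERT INTO Regist VALUES
        (123, 'Charger'), (567, 'Civic'), (567, 'Pinto');
""")

# Condition in ON: it only decides which Regist rows match (step 1);
# unmatched Payroll rows are still added back with NULL (step 2).
on_rows = conn.execute("""
    SELECT P.Name, R.Car
    FROM Payroll AS P
    LEFT OUTER JOIN Regist AS R
      ON P.UserID = R.UserID AND R.Car = 'Charger'
    ORDER BY P.UserID
""").fetchall()

# Condition in WHERE: it is checked after the NULL rows were added (step 3),
# and NULL = 'Charger' is unknown, so those rows are filtered out again.
where_rows = conn.execute("""
    SELECT P.Name, R.Car
    FROM Payroll AS P
    LEFT OUTER JOIN Regist AS R
      ON P.UserID = R.UserID
    WHERE R.Car = 'Charger'
    ORDER BY P.UserID
""").fetchall()

print(on_rows)     # [('Jack', 'Charger'), ('Allison', None), ('Magda', None)]
print(where_rows)  # [('Jack', 'Charger')]
```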
§ 3-Valued Logic
§ Aggregates in SQL:
• sum(1, 4, 3, 4) = 1+4+3+4 = 12
• max(1, 4, 3, 4) = 4
• min(1, 4, 3, 4) = 1
• count(1, 4, 3, 4) = 4
• avg(1, 4, 3, 4) = 3
The collection may have duplicates!
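
The five aggregates can be checked on the same collection (1, 4, 3, 4). This sketch stores the values in a one-column table in an in-memory SQLite database; the table name T and column name x are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (x INT)")
conn.executemany("INSERT INTO T VALUES (?)", [(1,), (4,), (3,), (4,)])

# All five aggregates over the same collection; the duplicate 4 is kept.
result = conn.execute(
    "SELECT sum(x), max(x), min(x), count(x), avg(x) FROM T"
).fetchone()

print(result)  # (12, 4, 1, 4, 3.0)
```

Note that count is 4, not 3: the value 4 appears twice and both copies are counted.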
COUNT
How many records are in Payroll?
SELECT count(*) AS C
FROM Payroll;

C
4

How many TA's are there?
SELECT count(*)
FROM Payroll
WHERE Job = 'TA';

count(*)
2

How many cars are in the database?
How many people have salary > 55000?
SUM, MIN, MAX, AVG
What is the sum of all salaries?
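
The query for this question did not survive in these notes, so the following is only a plausible sketch: it computes sum, min, max, and avg over the Salary column of the Payroll table from the slides, using an in-memory SQLite database via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
""")

# One pass over Payroll computes all four aggregates.
total, lowest, highest, average = conn.execute(
    "SELECT sum(Salary), min(Salary), max(Salary), avg(Salary) FROM Payroll"
).fetchone()

print(total, lowest, highest, average)  # 300000 50000 100000 75000.0
```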
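Semantics
SELECT agg(attrs)
FROM ... WHERE ...;

Step 1: evaluate the query without the aggregate (SELECT attrs FROM ... WHERE ...;) to get a collection of attrs values
Step 2: apply the aggregate agg to that collection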
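
The two-step reading can be mimicked by hand: run the query without the aggregate, then aggregate the returned values in the host language. The sketch below does this in Python with sqlite3 and compares the outcome with SQL's own sum; the choice of sum over TA salaries is just an example picked for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
""")

# Step 1: evaluate SELECT attrs FROM ... WHERE ... without the aggregate.
salaries = [row[0] for row in
            conn.execute("SELECT Salary FROM Payroll WHERE Job = 'TA'")]

# Step 2: apply the aggregate to the resulting collection.
two_step_sum = sum(salaries)

# The same computation done entirely in SQL.
direct_sum = conn.execute(
    "SELECT sum(Salary) FROM Payroll WHERE Job = 'TA'"
).fetchone()[0]

print(two_step_sum, direct_sum)  # 110000 110000
```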
COUNT
SELECT count(*)
FROM Payroll;

Step 1 query:
SELECT *
FROM Payroll;

Result: 4

SELECT count(Job)
FROM Payroll;
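
A short sketch of how count(*) and count(Job) relate, run in an in-memory SQLite database via Python's sqlite3 module. count(Job) counts only non-NULL Job values, and count(DISTINCT Job) counts each value once. The UPDATE that sets Allison's Job to NULL is a hypothetical change made for this illustration, mirroring the earlier Payroll table with NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
""")

COUNTS = "SELECT count(*), count(Job), count(DISTINCT Job) FROM Payroll"

# No NULLs yet: count(*) and count(Job) agree.
clean = conn.execute(COUNTS).fetchone()

# Hypothetical change for illustration: Allison's Job becomes NULL.
conn.execute("UPDATE Payroll SET Job = NULL WHERE UserID = 345")
with_null = conn.execute(COUNTS).fetchone()

print(clean)      # (4, 4, 2)
print(with_null)  # (4, 3, 2)
```

count(*) still sees four rows after the update, while count(Job) skips the NULL.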
§ Sum: same as 0
Aggregates and Joins

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000

Regist
UserID  Car
123     Charger
123     Pinto
567     Civic
567     Pinto

Find the average salary of people driving a Pinto
SELECT avg(P.Salary)
FROM Payroll P, Regist R
WHERE P.UserID = R.UserID
  AND R.Car = 'Pinto';

Step 1 evaluates SELECT P.Salary FROM Payroll P, Regist R ...; step 2 averages the result:
avg(…)
70000
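
The Pinto query can be reproduced with the Regist variant in which both Jack and Magda have a Pinto registered. This is a sketch using an in-memory SQLite database through Python's sqlite3 module; the step-1 query and its ORDER BY are spelled out here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    CREATE TABLE Regist  (UserID INT, Car TEXT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
    INSERT INTO Regist VALUES
        (123, 'Charger'), (123, 'Pinto'), (567, 'Civic'), (567, 'Pinto');
""")

# Step 1: the join without the aggregate, one row per Pinto registration.
pinto_rows = conn.execute("""
    SELECT P.Name, P.Salary
    FROM Payroll P, Regist R
    WHERE P.UserID = R.UserID AND R.Car = 'Pinto'
    ORDER BY P.UserID
""").fetchall()

# Step 2: the aggregate over those rows, computed by SQL itself.
pinto_avg = conn.execute("""
    SELECT avg(P.Salary)
    FROM Payroll P, Regist R
    WHERE P.UserID = R.UserID AND R.Car = 'Pinto'
""").fetchone()[0]

print(pinto_rows)  # [('Jack', 50000), ('Magda', 90000)]
print(pinto_avg)   # 70000.0
```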
Duplicates
§ Need to watch for duplicates introduced when we
join two tables
How many people drive a car?
SELECT count(*)
FROM Payroll P, Regist R
WHERE P.UserID = R.UserID;

Step 1 (SELECT * FROM ...;) returns one row per registered car, so Magda appears twice:
UserID  Name   Job   Salary  UserID  Car
123     Jack   TA    50000   123     Charger
567     Magda  Prof  90000   567     Civic
567     Magda  Prof  90000   567     Pinto

count(*)
3
Wrong!

SELECT count(DISTINCT P.UserID)
FROM Payroll P, Regist R
WHERE P.UserID = R.UserID;

Step 1 (SELECT DISTINCT P.UserID FROM ...;):
UserID
123
567

count(DISTINCT P.UserID)
2
Right!
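
Both counting queries can be run against the original Payroll and Regist rows to see the duplicate at work. This is a sketch with an in-memory SQLite database via Python's sqlite3 module; UserID is written as P.UserID because both tables have a column of that name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    CREATE TABLE Regist  (UserID INT, Car TEXT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
    INSERT INTO Regist VALUES
        (123, 'Charger'), (567, 'Civic'), (567, 'Pinto');
""")

# Counts join rows, i.e. registrations, not people.
wrong = conn.execute("""
    SELECT count(*)
    FROM Payroll P, Regist R
    WHERE P.UserID = R.UserID
""").fetchone()[0]

# Counts each driver once, however many cars they have.
right = conn.execute("""
    SELECT count(DISTINCT P.UserID)
    FROM Payroll P, Regist R
    WHERE P.UserID = R.UserID
""").fetchone()[0]

print(wrong, right)  # 3 2
```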
What is the average salary of car drivers?
SELECT avg(P.Salary)
FROM Payroll P, Regist R
WHERE P.UserID = R.UserID;

Step 1 (SELECT P.Salary FROM ...;):
Name   Salary
Jack   50000
Magda  90000
Magda  90000   Duplicate!

avg(…)
76667
Wrong!

Does DISTINCT fix it?

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    50000
567     Magda    Prof  90000
789     Dan      Prof  100000

Regist
UserID  Car
123     Charger
345     Tesla
567     Civic
567     Pinto

SELECT avg(DISTINCT P.Salary)
FROM Payroll P, Regist R
WHERE P.UserID = R.UserID;

Step 1 (SELECT DISTINCT P.Salary FROM ...;):
Salary
50000
90000

avg(…)
70000
Wrong!
Correct answer: 63333

This query is harder to fix. We will discuss it on Friday.
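
The numbers above can be reproduced with the modified tables (Allison earns 50000 and drives a Tesla). This sketch uses an in-memory SQLite database via Python's sqlite3 module. The last query is only one possible way to reach 63333, written for this illustration under the assumption that a subquery is acceptable; it is not necessarily the fix the lecture goes on to present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    CREATE TABLE Regist  (UserID INT, Car TEXT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 50000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
    INSERT INTO Regist VALUES
        (123, 'Charger'), (345, 'Tesla'), (567, 'Civic'), (567, 'Pinto');
""")

JOIN = "FROM Payroll P, Regist R WHERE P.UserID = R.UserID"

# Magda's salary is counted twice (two cars).
plain_avg = conn.execute(f"SELECT avg(P.Salary) {JOIN}").fetchone()[0]

# DISTINCT removes Magda's duplicate but also merges Jack's and Allison's 50000.
distinct_avg = conn.execute(f"SELECT avg(DISTINCT P.Salary) {JOIN}").fetchone()[0]

# Assumed fix for illustration: pick each driver once, then average.
per_driver_avg = conn.execute("""
    SELECT avg(Salary)
    FROM Payroll
    WHERE UserID IN (SELECT UserID FROM Regist)
""").fetchone()[0]

print(plain_avg, distinct_avg, round(per_driver_avg))  # 70000.0 70000.0 63333
```

With this data the plain average also happens to be 70000, because Magda's extra row and Allison's equal salary offset each other; neither query averages over the three drivers.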
Summary for Today
§ NULLs:
• Once a NULL, always a NULL
• 3-Valued Logic (3VL)
• Outer-joins revisited
§ Aggregates
• sum, min, max, count, avg
• Two-step semantics
• Subtle interactions with joins, duplicates, nulls