Lec04 Aggregates

The document covers data management concepts, focusing on SQL operations like inner and outer joins, as well as handling NULL values in queries. It explains how NULLs represent missing or unknown data and discusses their implications in expressions and conditions. Additionally, it introduces three-valued logic in SQL, illustrating how it affects query results.

Uploaded by

alessiatxy
Introduction to Data Management

Nulls and Aggregates

April 1, 2024 Aggregates 1


Announcements

§ Homework 2
• Posted
• Due on Friday
• SQLite

§ Homework 3
• To be posted later this week
• Due on April 19
• SQL Azure in the cloud



Recap: Inner Join
For each employee, find the cars that they drive

SELECT P.Name, R.Car
FROM Payroll AS P
JOIN Regist AS R
ON P.UserID = R.UserID;

Result
Name   Car
Jack   Charger
Magda  Civic
Magda  Pinto

Allison and Dan are missing from the result.

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000

Regist
UserID  Car
123     Charger
567     Civic
567     Pinto

Recap: Outer Join
For each employee, find the cars that they drive

SELECT P.Name, R.Car
FROM Payroll AS P
LEFT OUTER JOIN Regist AS R
ON P.UserID = R.UserID;

Result
Name     Car
Jack     Charger
Magda    Civic
Magda    Pinto
Allison  NULL
Dan      NULL

NULL means “unknown” or “missing”

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000

Regist
UserID  Car
123     Charger
567     Civic
567     Pinto

Recap: Outer Join
§ LEFT OUTER JOIN
• Add missing tuples from the LEFT

§ RIGHT OUTER JOIN
• Add missing tuples from the RIGHT

§ FULL OUTER JOIN
• Add missing tuples from both

Let’s discuss NULLs…
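The difference between the inner join and the left outer join can be checked with a small script. This is only a sketch: it assumes Python's standard sqlite3 module and an in-memory database, and the column types (INT, TEXT) are assumptions, since the slides do not show the table definitions. Only LEFT OUTER JOIN is exercised here.

```python
import sqlite3

# In-memory database holding the Payroll and Regist tables from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    CREATE TABLE Regist (UserID INT, Car TEXT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
    INSERT INTO Regist VALUES
        (123, 'Charger'), (567, 'Civic'), (567, 'Pinto');
""")

# Inner join: employees without a car do not appear at all.
inner = conn.execute("""
    SELECT P.Name, R.Car
    FROM Payroll AS P JOIN Regist AS R ON P.UserID = R.UserID
    ORDER BY P.UserID, R.Car
""").fetchall()

# Left outer join: every Payroll row appears; a missing car shows up as None (NULL).
left = conn.execute("""
    SELECT P.Name, R.Car
    FROM Payroll AS P LEFT OUTER JOIN Regist AS R ON P.UserID = R.UserID
    ORDER BY P.UserID, R.Car
""").fetchall()

print(inner)
print(left)
```

The ORDER BY clause is added only to make the printed order deterministic; it is not part of the slide's query.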


NULLs



NULLs in SQL
A NULL value means missing, or unknown, or undefined, or inapplicable

.nullvalue NULL    (tells SQLite how to print a NULL)
INSERT INTO Payroll
VALUES (123, 'Jack', 'TA', 50000),
       (345, 'Allison', NULL, 60000),
       (567, 'Magda', 'Prof', 90000),
       (789, 'Dan', 'Prof', NULL),
       (432, NULL, 'Prof', NULL);

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  NULL  60000
567     Magda    Prof  90000
789     Dan      Prof  NULL
432     NULL     Prof  NULL

Complications:
§ Expressions with NULLs?
§ Conditions with NULLs?

Expressions with NULLs
If any term is NULL, the entire expression is NULL

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  NULL  60000
567     Magda    Prof  90000
789     Dan      Prof  NULL
432     NULL     Prof  NULL

Give everyone a 10% raise

SELECT Name, Salary*1.1 as NewSalary
FROM Payroll;

Result
Name     NewSalary
Jack     55000
Allison  66000
Magda    99000
Dan      NULL
NULL     NULL

Everybody works for free!

SELECT Name, Salary*0 as NewSalary
FROM Payroll;

Result (NULL*0 is not 0)
Name     NewSalary
Jack     0
Allison  0
Magda    0
Dan      NULL
NULL     NULL

SELECT Name, 0 as NewSalary
FROM Payroll;

Result (now it works)
Name     NewSalary
Jack     0
Allison  0
Magda    0
Dan      0
NULL     0

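The propagation rule can be reproduced with a short script. This is a sketch that assumes Python's standard sqlite3 module; the column types are assumptions, and ORDER BY UserID is added only so the output order is predictable. SQL NULL is returned to Python as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', NULL, 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', NULL),
        (432, NULL, 'Prof', NULL);
""")

# A 10% raise: a NULL salary stays NULL.
raise_rows = conn.execute(
    "SELECT Name, Salary*1.1 AS NewSalary FROM Payroll ORDER BY UserID"
).fetchall()

# Multiplying by 0 does not turn NULL into 0.
free_rows = conn.execute(
    "SELECT Name, Salary*0 AS NewSalary FROM Payroll ORDER BY UserID"
).fetchall()

# Selecting the constant 0 involves no NULL term, so every row gets 0.
const_rows = conn.execute(
    "SELECT Name, 0 AS NewSalary FROM Payroll ORDER BY UserID"
).fetchall()

for label, rows in [("raise", raise_rows), ("times 0", free_rows), ("constant", const_rows)]:
    print(label, rows)
```

The raised salaries come back as floating-point numbers, so they may print with a small rounding error rather than as exact integers.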
Conditions with NULLs
How should NULLs affect conditions in WHERE?

SELECT Name
FROM Payroll
WHERE Job = 'TA';

Result
Name
Jack

Should Allison, whose Job is NULL, also appear? She is not included: SQL uses 3-valued logic.

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  NULL  60000
567     Magda    Prof  90000
789     Dan      Prof  NULL
432     NULL     Prof  NULL

Three-valued Logic
false = 0; unknown = 0.5; true = 1

x AND y = min(x,y);
x OR y = max(x,y);
NOT x = 1-x

What are these conditions?

§ true AND unknown = min(1, 0.5) = unknown
§ true OR unknown = max(1, 0.5) = true
§ unknown AND false = min(0.5, 0) = false
§ unknown OR (NOT unknown) = max(0.5, 1 - 0.5) = unknown
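The numeric encoding on the slide translates directly into code. The sketch below is illustrative only; the function names and3, or3 and not3 are hypothetical helpers, not SQL or library functions.

```python
# Truth values encoded as numbers, as on the slide.
FALSE, UNKNOWN, TRUE = 0.0, 0.5, 1.0
NAMES = {FALSE: "false", UNKNOWN: "unknown", TRUE: "true"}

def and3(x, y):
    """x AND y = min(x, y)"""
    return min(x, y)

def or3(x, y):
    """x OR y = max(x, y)"""
    return max(x, y)

def not3(x):
    """NOT x = 1 - x"""
    return 1 - x

# The four conditions asked about on the slide.
results = {
    "true AND unknown": and3(TRUE, UNKNOWN),
    "true OR unknown": or3(TRUE, UNKNOWN),
    "unknown AND false": and3(UNKNOWN, FALSE),
    "unknown OR (NOT unknown)": or3(UNKNOWN, not3(UNKNOWN)),
}

for expr, value in results.items():
    print(f"{expr} = {NAMES[value]}")
```

Under this encoding, a condition and its negation joined by OR need not come out true, because NOT unknown is again unknown.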


Three-valued Logic

What does this query return?

SELECT *
FROM Payroll
WHERE Job != 'Prof'
   or Salary > 80000;

Payroll, with the value of the WHERE condition for each row
UserID  Name     Job   Salary  Condition
123     Jack     TA    50000   True
345     Allison  NULL  60000   Unknown
567     Magda    Prof  90000   True
789     Dan      Prof  NULL    Unknown
432     NULL     Prof  NULL    Unknown

Result (only the rows where the condition is True)
UserID  Name   Job   Salary
123     Jack   TA    50000
567     Magda  Prof  90000

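The per-row truth values can be inspected by selecting the condition itself. This is a sketch assuming Python's standard sqlite3 module and assumed column types; SQLite reports true as 1, false as 0 and unknown as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', NULL, 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', NULL),
        (432, NULL, 'Prof', NULL);
""")

# Value of the WHERE condition for every row (1 = true, None = unknown).
conditions = conn.execute("""
    SELECT UserID, (Job != 'Prof' OR Salary > 80000) AS Cond
    FROM Payroll ORDER BY UserID
""").fetchall()

# WHERE keeps only the rows whose condition is true.
kept = conn.execute("""
    SELECT UserID, Name FROM Payroll
    WHERE Job != 'Prof' OR Salary > 80000
    ORDER BY UserID
""").fetchall()

print(conditions)
print(kept)
```

Rows whose condition is unknown are dropped exactly like rows whose condition is false.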
Three-valued Logic

NULLs are the nightmare of query optimizers

SELECT *
FROM Payroll
WHERE Job != 'Prof' or Job = 'Prof';

Should return everyone, but we are missing Allison!

SELECT *
FROM Payroll
WHERE Job != 'Prof' or Job = 'Prof' or Job IS NULL;

Now we get everyone!

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  NULL  60000
567     Magda    Prof  90000
789     Dan      Prof  NULL
432     NULL     Prof  NULL

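Both versions of the query can be compared side by side. This is a sketch assuming Python's standard sqlite3 module and assumed column types; only UserID is selected to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', NULL, 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', NULL),
        (432, NULL, 'Prof', NULL);
""")

# Looks like a tautology, but is unknown when Job is NULL.
without_null_test = [row[0] for row in conn.execute("""
    SELECT UserID FROM Payroll
    WHERE Job != 'Prof' OR Job = 'Prof'
    ORDER BY UserID
""")]

# Adding an explicit IS NULL test covers the remaining case.
with_null_test = [row[0] for row in conn.execute("""
    SELECT UserID FROM Payroll
    WHERE Job != 'Prof' OR Job = 'Prof' OR Job IS NULL
    ORDER BY UserID
""")]

print(without_null_test)  # 345 (Allison) is absent
print(with_null_test)     # all five rows
```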
Outer Joins Revisited: ON vs. WHERE
1. Perform the join with the ON clause
2. Add all missing tuples from LEFT
3. Check the WHERE clause (if any)

SELECT P.Name, R.Car
FROM Payroll AS P
LEFT OUTER JOIN Regist AS R
ON P.UserID = R.UserID
AND R.Car = 'Charger';

Result after steps 1, 2, 3
Name     Car
Jack     Charger
Allison  NULL
Magda    NULL

What differs if we place R.Car = 'Charger' in the WHERE clause?

SELECT P.Name, R.Car
FROM Payroll AS P
LEFT OUTER JOIN Regist AS R
ON P.UserID = R.UserID
WHERE R.Car = 'Charger';

After steps 1, 2
Name     Car
Jack     Charger
Allison  NULL
Magda    Civic
Magda    Pinto

After step 3
Name     Car
Jack     Charger

ON, WHERE differ

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000

Regist
UserID  Car
123     Charger
567     Civic
567     Pinto

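The two placements of the Charger condition can be run against the three-row Payroll table used on these slides. This is a sketch assuming Python's standard sqlite3 module and assumed column types; ORDER BY is added only for a stable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    CREATE TABLE Regist (UserID INT, Car TEXT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000);
    INSERT INTO Regist VALUES
        (123, 'Charger'), (567, 'Civic'), (567, 'Pinto');
""")

# Condition inside ON: it only decides which Regist rows match.
# Unmatched Payroll rows are still added back with a NULL car.
on_rows = conn.execute("""
    SELECT P.Name, R.Car
    FROM Payroll AS P
    LEFT OUTER JOIN Regist AS R
      ON P.UserID = R.UserID AND R.Car = 'Charger'
    ORDER BY P.UserID
""").fetchall()

# Condition inside WHERE: it is checked after the outer join,
# so rows with a NULL or non-Charger car are filtered out.
where_rows = conn.execute("""
    SELECT P.Name, R.Car
    FROM Payroll AS P
    LEFT OUTER JOIN Regist AS R
      ON P.UserID = R.UserID
    WHERE R.Car = 'Charger'
    ORDER BY P.UserID
""").fetchall()

print(on_rows)
print(where_rows)
```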
Discussion
§ NULL = missing/unknown/undefined

§ Propagates through operations

§ 3-Valued Logic

§ (A = 5) OR (A != 5) is not necessarily TRUE

§ We now understand OUTER JOINs really well!



Aggregates



Aggregates

§ Aggregate: many values to one value

§ Aggregates in SQL:
• sum(1, 4, 3, 4) = 1+4+3+4 = 12
• max(1, 4, 3, 4) = 4
• min(1, 4, 3, 4) = 1
• count(1, 4, 3, 4) = 4
• avg(1, 4, 3, 4) = 3

The collection may have duplicates!


COUNT
How many records are in Payroll?

SELECT count(*) as C
FROM Payroll;

C
4

How many cars are in the database?

SELECT count(*)
FROM Regist;

count(*)
3

How many TA’s are there?

SELECT count(*)
FROM Payroll
WHERE Job = 'TA';

count(*)
2

How many people have salary > 55000?

SELECT count(*)
FROM Payroll
WHERE Salary > 55000;

count(*)
3

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000

Regist
UserID  Car
123     Charger
567     Civic
567     Pinto

SUM, MIN, MAX, AVG
What is the sum of all salaries?

SELECT sum(Salary)
FROM Payroll;

sum(…)
300000

What is the average salary?

SELECT avg(Salary)
FROM Payroll;

avg(…)
75000

On your own:
What is the smallest salary?
What is the largest salary?

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000

Regist
UserID  Car
123     Charger
567     Civic
567     Pinto

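All five aggregates can be tried in one script, including one possible way to write the two "on your own" questions with min and max. This is a sketch assuming Python's standard sqlite3 module and assumed column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
""")

# Several aggregates can be computed in a single SELECT.
total, average, smallest, largest = conn.execute(
    "SELECT sum(Salary), avg(Salary), min(Salary), max(Salary) FROM Payroll"
).fetchone()

# count(*) combined with a WHERE clause counts only the qualifying rows.
(well_paid,) = conn.execute(
    "SELECT count(*) FROM Payroll WHERE Salary > 55000"
).fetchone()

print(total, average, smallest, largest, well_paid)
```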
Semantics
SELECT agg(attrs)
FROM ... WHERE ...;

§ agg is count or sum or …
§ attrs is * or Salary or …

Step 1: drop aggregate, compute query

SELECT attrs
FROM ... WHERE ...;

This produces a collection of attrs values.

Step 2: apply aggregate to that collection to get a single value (55 in the slide's sketch)
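The two-step reading can be imitated by hand: run the query without the aggregate, then aggregate the returned values outside SQL. This is a sketch assuming Python's standard sqlite3 module; the choice of sum and of the filter Job = 'TA' is only an example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
""")

# Step 1: drop the aggregate and compute the plain query.
values = [row[0] for row in conn.execute(
    "SELECT Salary FROM Payroll WHERE Job = 'TA' ORDER BY UserID"
)]

# Step 2: apply the aggregate to the collection of values.
two_step = sum(values)

# The same thing done by SQL in one go.
(direct,) = conn.execute(
    "SELECT sum(Salary) FROM Payroll WHERE Job = 'TA'"
).fetchone()

print(values, two_step, direct)
```

This mirrors the semantics only; a database system is free to compute the aggregate without materializing the intermediate result.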


COUNT
SELECT count(*)
FROM Payroll;

Step 1: SELECT * FROM Payroll;
Step 2: count = 4

How many jobs are there in this institution?

SELECT count(Job)
FROM Payroll;

Step 1: SELECT Job FROM Payroll;
Job
TA
TA
Prof
Prof

Step 2: count = 4. WRONG!

SELECT count(DISTINCT Job)
FROM Payroll;

Step 1: SELECT DISTINCT Job FROM Payroll;
Job
TA
Prof

Step 2: count = 2. Correct

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000

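The effect of DISTINCT inside count can be confirmed directly. This is a sketch assuming Python's standard sqlite3 module and assumed column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
""")

# count(Job) counts one value per row, duplicates included.
(job_values,) = conn.execute("SELECT count(Job) FROM Payroll").fetchone()

# count(DISTINCT Job) removes duplicates before counting.
(distinct_jobs,) = conn.execute("SELECT count(DISTINCT Job) FROM Payroll").fetchone()

print(job_values, distinct_jobs)
```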
Aggregates and NULLs
Aggregates ignore NULLs:

§ Sum: same as 0

§ Avg: NOT the same as 0

§ Min/max: same as +∞, −∞

§ Count: count(attr) doesn’t include them, but count(*) still counts every row


Aggregates and NULLs

SELECT sum(Salary)
FROM Payroll;

sum(…)
200000    (50000 + 60000 + 90000)

NULLs are just ignored.

SELECT avg(Salary)
FROM Payroll;

avg(…)
66667     (200000 / 3, rounded)

Just as you expected.

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  NULL  60000
567     Magda    Prof  90000
789     Dan      Prof  NULL
432     NULL     Prof  NULL

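How each aggregate treats NULL salaries can be checked on the five-row Payroll table. This is a sketch assuming Python's standard sqlite3 module and assumed column types; it also contrasts count(*) with count(Salary).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', NULL, 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', NULL),
        (432, NULL, 'Prof', NULL);
""")

# sum, avg, min and max skip the two NULL salaries.
total, average, smallest, largest = conn.execute(
    "SELECT sum(Salary), avg(Salary), min(Salary), max(Salary) FROM Payroll"
).fetchone()

# count(*) counts rows; count(Salary) counts only non-NULL salaries.
all_rows, known_salaries = conn.execute(
    "SELECT count(*), count(Salary) FROM Payroll"
).fetchone()

print(total, average, smallest, largest)
print(all_rows, known_salaries)
```

The average divides by the number of non-NULL salaries (3), not by the number of rows (5), which is why treating NULL as 0 would give a different answer.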
Discussion: Aggregates

§ Semantics: two steps

§ NULLs are ignored



Aggregates and Joins



Aggregates and Joins

§ Joins combine records from multiple tables

§ Aggregates: many values to one value

§ Together they form a very powerful SQL tool



Aggregates and Joins
Find the average salary of people driving a Pinto

SELECT avg(P.Salary)
FROM Payroll P, Regist R
WHERE P.UserID = R.UserID
  and R.Car = 'Pinto';

Step 1: SELECT P.Salary FROM Payroll P, Regist R ...;
Name   Salary
Jack   50000
Magda  90000

Step 2: avg(…) = 70000

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000

Regist
UserID  Car
123     Charger
123     Pinto
567     Civic
567     Pinto

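The Pinto query can be run against the four-row Regist table shown on these slides. This is a sketch assuming Python's standard sqlite3 module and assumed column types; the intermediate query also selects Name, as the slide's sketch does, to make the rows easier to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    CREATE TABLE Regist (UserID INT, Car TEXT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
    INSERT INTO Regist VALUES
        (123, 'Charger'), (123, 'Pinto'), (567, 'Civic'), (567, 'Pinto');
""")

# Step 1: the join without the aggregate.
pinto_drivers = conn.execute("""
    SELECT P.Name, P.Salary
    FROM Payroll P, Regist R
    WHERE P.UserID = R.UserID AND R.Car = 'Pinto'
    ORDER BY P.UserID
""").fetchall()

# Step 2: the aggregate applied to the joined rows.
(avg_salary,) = conn.execute("""
    SELECT avg(P.Salary)
    FROM Payroll P, Regist R
    WHERE P.UserID = R.UserID AND R.Car = 'Pinto'
""").fetchone()

print(pinto_drivers, avg_salary)
```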
Duplicates
§ Need to watch for duplicates introduced when we join two tables

§ Sometimes duplicates are easy to deal with, e.g. COUNT(DISTINCT …)

§ Sometimes they are much harder to deal with, and we will discuss this in future lectures


Duplicates
How many people drive a car?

SELECT count(*)
FROM Payroll P, Regist R
WHERE P.UserID = R.UserID;

Step 1: SELECT * FROM ...;
UserID  Name   Job   Salary  UserID  Car
123     Jack   TA    50000   123     Charger
567     Magda  Prof  90000   567     Civic
567     Magda  Prof  90000   567     Pinto

Step 2: count(*) = 3. Wrong!

SELECT count(DISTINCT P.UserID)
FROM Payroll P, Regist R
WHERE P.UserID = R.UserID;

Step 1: SELECT DISTINCT P.UserID FROM ...;
UserID
123
567

Step 2: count = 2. Right!

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000

Regist
UserID  Car
123     Charger
567     Civic
567     Pinto

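The over-count caused by the join, and the DISTINCT remedy, can be reproduced as follows. This is a sketch assuming Python's standard sqlite3 module and assumed column types. The column is written as P.UserID because both tables have a UserID column and SQLite rejects the unqualified name as ambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    CREATE TABLE Regist (UserID INT, Car TEXT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
    INSERT INTO Regist VALUES
        (123, 'Charger'), (567, 'Civic'), (567, 'Pinto');
""")

# Counts joined rows: Magda appears once per car she drives.
(joined_rows,) = conn.execute("""
    SELECT count(*)
    FROM Payroll P, Regist R
    WHERE P.UserID = R.UserID
""").fetchone()

# Counts each driver once, however many cars they have.
(drivers,) = conn.execute("""
    SELECT count(DISTINCT P.UserID)
    FROM Payroll P, Regist R
    WHERE P.UserID = R.UserID
""").fetchone()

print(joined_rows, drivers)
```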
Duplicates
What is the average salary of car drivers?

SELECT avg(P.Salary)
FROM Payroll P, Regist R
WHERE P.UserID = R.UserID;

Step 1: SELECT P.Salary FROM ...;
Name   Salary
Jack   50000
Magda  90000
Magda  90000    (duplicate!)

Step 2: avg(…) = 76667. Wrong!

Payroll
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    60000
567     Magda    Prof  90000
789     Dan      Prof  100000

Regist
UserID  Car
123     Charger
567     Civic
567     Pinto

Duplicates
What is the average salary of car drivers?

SELECT avg(DISTINCT P.Salary)
FROM Payroll P, Regist R
WHERE P.UserID = R.UserID;

Does DISTINCT fix it?

Step 1: SELECT DISTINCT P.Salary FROM ...;
Salary
50000
90000

Step 2: avg(…) = 70000. Wrong! Correct answer: 63333

This query is harder to fix. We will discuss it on Friday.

Payroll (Allison now earns 50000)
UserID  Name     Job   Salary
123     Jack     TA    50000
345     Allison  TA    50000
567     Magda    Prof  90000
789     Dan      Prof  100000

Regist (Allison now drives a Tesla)
UserID  Car
123     Charger
345     Tesla
567     Civic
567     Pinto

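The numbers on the last slide can be reproduced with the modified tables (Allison earning 50000 and driving a Tesla). This is a sketch assuming Python's standard sqlite3 module and assumed column types. The third query, which filters Payroll with an IN subquery, is only one possible way to count each driver once; it is an assumption of this sketch and not necessarily the fix the lecture presents later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    CREATE TABLE Regist (UserID INT, Car TEXT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 50000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
    INSERT INTO Regist VALUES
        (123, 'Charger'), (345, 'Tesla'), (567, 'Civic'), (567, 'Pinto');
""")

# Plain join: Magda's salary is counted twice.
(naive,) = conn.execute("""
    SELECT avg(P.Salary) FROM Payroll P, Regist R
    WHERE P.UserID = R.UserID
""").fetchone()

# DISTINCT on Salary: Jack and Allison collapse into one value.
(distinct_salary,) = conn.execute("""
    SELECT avg(DISTINCT P.Salary) FROM Payroll P, Regist R
    WHERE P.UserID = R.UserID
""").fetchone()

# One possible alternative (an assumption, not the lecture's stated fix):
# pick each driver once from Payroll, then average.
(per_driver,) = conn.execute("""
    SELECT avg(Salary) FROM Payroll
    WHERE UserID IN (SELECT UserID FROM Regist)
""").fetchone()

print(naive, distinct_salary, round(per_driver))
```

DISTINCT removes duplicate salary values, not duplicate people, which is why it cannot distinguish two drivers who happen to earn the same amount.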
Summary for Today
§ NULLs:
• Once a NULL, always a NULL
• 3-Valued Logic (3VL)
• Outer-joins revisited

§ Aggregates
• sum, min, max, count, avg
• Two-step semantics
• Subtle interactions with joins, duplicates, nulls
