0% found this document useful (0 votes)
49 views

SQL Advanced

M 3.33 F 3.5 The document discusses SQL aggregation queries using the GROUP BY clause. It provides examples of using aggregation functions like COUNT, SUM, AVG, MAX, MIN with a simple SELECT statement and with a GROUP BY clause to aggregate results by different attributes like gender or age. The GROUP BY clause groups the rows based on the specified columns before applying aggregation functions.

Uploaded by

Kallan
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
49 views

SQL Advanced

M 3.33 F 3.5 The document discusses SQL aggregation queries using the GROUP BY clause. It provides examples of using aggregation functions like COUNT, SUM, AVG, MAX, MIN with a simple SELECT statement and with a GROUP BY clause to aggregate results by different attributes like gender or age. The GROUP BY clause groups the rows based on the specified columns before applying aggregation functions.

Uploaded by

Kallan
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 60

CMPT 354:

Database System I
Lecture 4. SQL Advanced

Sept 20 /18

1
Announcements!
• A1 is due today

• A2 is released (due in 2 weeks)

2
Outline
• Joins
• Inner Join
• Outer Join
• Aggregation Queries
• Simple Aggregations
• Group By
• Having
• Discussion

3
Joins: Recap
Student Enroll
name gpa stdName course
Mary 3.8 Mary 354
Tom 3.6 Tom 354
Jack 3.7 Tom 454
Bob 354

SELECT name, course


FROM Student, Enroll
WHERE name = stdName

name gpa
Mary 354
Tom 354
Tom 454 4
Two equivalent ways to write
joins
SELECT name, course
FROM Student, Enroll
WHERE name = stdName

SELECT name, course


FROM Student JOIN Enroll ON name =
stdName

5
Join Types
SELECT name, course
FROM Student INNER JOIN Enroll ON
name = stdName

SELECT name, course


FROM Student FULL OUTER JOIN Enroll ON
name = stdName

SELECT name
FROM Student LEFT OUTER JOIN Enroll ON
name = stdName

SELECT name
FROM Student RIGHT OUTER JOIN Enroll ON
name = stdName 6
Join Types
SELECT name, course
FROM Student INNER JOIN Enroll ON
name = stdName

SELECT name, course


FROM Student FULL OUTER JOIN Enroll ON
name = stdName

SELECT name
FROM Student LEFT OUTER JOIN Enroll ON
name = stdName

SELECT name
FROM Student RIGHT OUTER JOIN Enroll ON
name = stdName 7
Left Join
Student Enroll
name gpa stdName course
Mary 3.8 Mary 354
Tom 3.6 Tom 354
Jack 3.7 Tom 454
Bob 354

SELECT name, course


FROM Student JOIN
LEFTEnroll
JOIN ON
Enroll ON
name = stdName

We want to include all students no matter


whether they enroll a course or not. How?
8
SELECT name, course
FROM Student LEFT JOIN Enroll ON
name = stdName
Student Enroll
name gpa stdName course
Mary 3.8 Mary 354
Tom 3.6 Tom 354
Jack 3.7 Tom 454
Bob 354

Output
name course
Mary 354
Tom 354
Tom 454
Jack NULL

9
SELECT name, course
FROM Student RIGHT JOIN Enroll ON
name = stdName
Student Enroll
name gpa stdName course
Mary 3.8 Mary 354
Tom 3.6 Tom 354
Jack 3.7 Tom 454
Bob 354

Output
name course
Mary 354
Tom 354
Tom 454
NULL 354
10
SELECT name, course
FROM Enroll FULL JOIN Student ON
name = stdName
Enroll Student
stdName course name gpa
Mary 354 Mary 3.8
Tom 354 Tom 3.6
Tom 454 Jack 3.7
Bob 354

Output
name course
Mary 354
Tom 354
Tom 454
Jack NULL
NULL 354 11
Outer Join
TableA (LEFT/RIGHT/FULL) JOIN TableB

• Left outer join:


• Include tuples from tableA even if no match
• Right outer join:
• Include tuples from tableA even if no match
• Full outer join:
• Include tuples from both even if no match

12
Exercise - 1
SELECT name, course
FROM Student LEFT JOIN Enroll ON
name = stdName AND course = 354

Student Enroll
name gpa stdName course
Mary 3.8 Mary 354
Tom 3.6 Tom 354
Jack 3.7 Tom 454
Bob 354

name course
A Mary 354 name course
Tom 354 Mary 354
Jack NULL Tom 354
13
(A) (B)
Exercise - 2
SELECT name, course
FROM Student LEFT JOIN Enroll ON
name = stdName
WHERE course = 354
Student Enroll
name gpa stdName course
Mary 3.8 Mary 354
Tom 3.6 Tom 354
Jack 3.7 Tom 454
Bob 354

Select + From name course


Then add the
Where condition Mary 354 name course
Will remove the Tom 354 Mary 354
Jack row from A, Jack NULL Tom 354
so answer is B. 14
(A) (B)
Outline
• Joins
• Inner Join
• Outer Join
• Aggregation Queries
• Simple Aggregations
• Group By
• Having
• Discussion

15
Simple Aggregation

SELECT agg(column)
FROM <table name>
WHERE <conditions>

agg = COUNT, SUM, AVG, MAX, MIN, etc.

Except count, all aggregations apply to a single attribute


16
Examples
name gender gpa
Bob M 3
Mike M 3
Alice F 3
Mary F 4
Tom M 4

SELECT COUNT(*) FROM Student 5


SELECT SUM(gpa) FROM Student 17

SELECT AVG(gpa) FROM Student 3.4


SELECT MIN(gpa) FROM Student 3

SELECT MAX(gpa) FROM Student 4 17


Examples
name gender gpa
Bob M 3
Mike M 3
Alice F 3
Mary F 4
Tom M 4

SELECT COUNT(DISTINCT gpa) FROM Student 2

SELECT SUM(DISTINCT gpa) FROM Student 7

SELECT AVG(gpa) FROM Student


WHERE gender = ‘F’ 3.5
18
The need for Group By
• How to get AVG(gpa) for each gender?
SELECT AVG(gpa) FROM Student WHERE gender = ‘M’

SELECT AVG(gpa) FROM Student WHERE gender = ‘F’

• How to get AVG(gpa) for each age?


SELECT AVG(gpa) FROM Student WHERE age = 18

SELECT AVG(gpa) FROM Student WHERE age = 19

SELECT AVG(gpa) FROM Student WHERE age = 20


.
.
.
19
Grouping and Aggregation
SELECT agg(column)
FROM <table name>
WHERE <conditions>
GROUP BY <columns>

• How to get AVG(gpa) for each gender?


SELECT AVG(gpa) FROM Student GROUP BY gender

• How to get AVG(gpa) for each age?

SELECT AVG(gpa) FROM Student GROUP BY age


20
Grouping and Aggregation
• How is the following query processed?
SELECT gender, AVG(gpa)
FROM Student
WHERE gpa > 2.5
GROUP BY gender

• Semantics of the query


1. Compute the FROM and WHERE clauses
2. Group by the attributes in the GROUP BY
3. Compute the SELECT clause: grouped attributes and
aggregates
21
1. Compute the FROM and WHERE clauses

SELECT gender, AVG(gpa)


FROM Student
WHERE gpa > 2.5
GROUP BY gender

name gender gpa


Bob M 2 name gender gpa
Mike M 3 Mike M 3
Alice F 3 Alice F 3
Mary F 4 Mary F 4
Tom M 3 Tom M 3
22
2. Group by the attributes in the GROUP BY

SELECT gender, AVG(gpa)


FROM Student
WHERE gpa > 2.5
GROUP BY gender

name gender gpa gender name gpa


Mike M 3 Mike 3
Alice F 3 M
Tom 3
Mary F 4 Alice 3
Tom M 3 F
Mary 4
23
3. Compute the SELECT clause: grouped
attributes and aggregates
SELECT gender, AVG(gpa)
FROM Student
WHERE gpa > 2.5
GROUP BY gender

gender name gpa


gender AVG(gpa)
Mike 3
M M 3
Tom 3
F 3.5
Alice 3
F
Mary 4
24
Exercise: Empty Group
name gender gpa
Bob M 3 SELECT gender, AVG(gpa)
Mike M 3 FROM Student
Alice F 4 WHERE gpa > 3.5
Mary F 4 GROUP BY gender
Tom M 3

gender AVG(gpa)
gender AVG(gpa)
VS F 4
F 4
M NULL

(A) (B) 25
Exercise: Empty Group
name gender gpa
Bob M 3 SELECT gender, AVG(gpa)
Mike M 3 FROM Student
Alice F 4 WHERE gpa > 3.5
Mary F 4 GROUP BY gender
Tom M 3

name gender gpa


Alice F 4
Mary F 4

26
Exercise: Empty Group
name gender gpa
Bob M 3 SELECT gender, AVG(gpa)
Mike M 3 FROM Student
Alice F 4 WHERE gpa > 3.5
Mary F 4
GROUP BY gender
Tom M 3

name gender gpa gender name gpa


Alice F 4 Alice 4
F
Mary F 4 Mary 4

27
Exercise: Empty Group
name gender gpa
Bob M 3 SELECT gender, AVG(gpa)
Mike M 3 FROM Student
Alice F 4 WHERE gpa > 3.5
Mary F 4 GROUP BY gender
Tom M 3

name gender gpa gender name gpa


gender AVG(gpa)
Alice F 4 Alice 4
F F 4
Mary F 4 Mary 4

28
Exercise: Invalid Selection
name gender gpa
Bob M 3 SELECT gender, AVG(gpa), name
Mike M 3 FROM Student
Alice F 4 WHERE gpa > 3.5
Mary F 4 GROUP BY gender
Tom M 3

gender AVG(gpa) name gender AVG(gpa) name


VS
F 4 Alice F 4 Mary

(A) (B) 29
Exercise: Invalid Selection
name gender gpa
Bob M 3 SELECT gender, AVG(gpa), name
Mike M 3 FROM Student
Alice F 4 WHERE gpa > 3.5
Mary F 4 GROUP BY gender
Tom M 3

name gender gpa


Alice F 4
Mary F 4

30
Exercise: Invalid Selection
name gender gpa
Bob M 3 SELECT gender, AVG(gpa), name
Mike M 3 FROM Student
Alice F 4 WHERE gpa > 3.5
Mary F 4 GROUP BY gender
Tom M 3

name gender gpa gender name gpa


Alice F 4 Alice 4
F
Mary F 4 Mary 4

31
Exercise: Invalid Selection
Everything in SELECT must be either a
GROUP-BY attribute, or an aggregate
name gender gpa
Bob M 3 SELECT gender, AVG(gpa), name
Mike M 3 FROM Student
Alice F 4 WHERE gpa > 3.5
Mary F 4 GROUP BY gender
Tom M 3

Some database will report invalid selection,


Some will pick randomly from Alice and Mary and display that one
name gender gpa gender name gpa
Alice F 4 gender AVG(gpa) name
Alice 4
F F 4 ???
Mary F 4 Mary 4

32
HAVING Clause
• Specify which groups you are interested in

SELECT agg(column)
FROM <table name>
WHERE <conditions>
GROUP BY <columns>
HAVING <columns>

33
HAVING Clause
• Same query as before, except that we require each
group has more than 10 students

SELECT AVG(gpa), gender


FROM Student
WHERE gpa > 2.5
GROUP BY gender
HAVING COUNT(*) > 10

HAVING clause contains conditions on aggregates.


34
Order of Evaluation
SELECT S
FROM R1,…,Rn
WHERE C1
GROUP BY a1,…,ak
HAVING C2

• Create the cross product of the tables in the FROM clause


• Remove rows not meeting the WHERE condition
• Divide records into groups by the GROUP BY clause
• Remove groups not meeting the HAVING clause
• Create one row for each group and remove columns not in
the SELECT clause

35
Exercise
StudentInfo
name gender gpa
Bob M 3
Mike M 3
Alice F 4
Mary F 4
Tom M 3

SELECT gender, AVG(gpa) SELECT gender, AVG(gpa)


FROM StudentInfo FROM StudentInfo
WHERE gpa > 2.5 WHERE gpa > 2.5
GROUP BY gender GROUP BY gender
HAVING COUNT(*) > 2 A HAVING SUM(gpa) < 9 B

gender AVG(gpa) gender AVG(gpa) gender AVG(gpa)

M 3 F 4 M 3
F 4
36
(A) (B) (C)
Imagine you are a data scientist
at a Bank

37
Computer Science vs. Data Science

What When Who Goal


Computer 1950- Software Engineer Write software to make computers work
Science

Plan  Design  Develop  Test  Deploy  Maintain

What When Who Goal


Data 2010- Data Scientist Extract insights from data to answer questions
Science

Collect  Clean  Integrate  Analyze  Visualize  Communicate

38
Discussion

Q1. Who is the richest customer?


Q2. Which customers have ONLY one account?

39
Discussion

Q3. How many customers does each branch have?


Q4. Which branch have a higher pay?

40
Outline
• Joins
• Inner Join
• Outer Join
• Self Join
• Aggregation Queries
• Simple Aggregations
• Group By
• Having
• Subqueries
• In the FROM clause
• In the WHERE clause
41
Subqueries
• A subquery is a SQL query nested inside a larger query
• Such inner-outer queries are called nested queries

SELECT C.customerID, C.birthDate, C.income


Outer Query
FROM Customer C
WHERE C.customerID IN
(
SELECT O.customerID
FROM Account A, Owns A Inner Query
WHERE A.accNumber = O.accNumber
AND A.branchName = 'Lonsdale’
) 42
Subqueries
• Subqueries may appear in
• A FROM clause,
• A WHERE clause, and
• A HAVING clause
SELECT <columns>
FROM <table name>
WHERE <conditions>
GROUP BY <columns>
HAVING <columns>

43
Subqueries in FROM
• Sometimes we need to compute an intermediate
table only to use it later in a SELECT-FROM-WHERE
• Who is the richest customer?

SELECT firstName, lastName, MAX(sumBalance)


FROM (SELECT firstName, lastName, sum(balance) AS sumBalance
FROM Customer C, Account A, Owns O
WHERE C.customerID = O.customerID
AND O.accNumber = A.accNumber
GROUP BY C.customerID )

44
Subqueries in FROM
• Sometimes we need to compute an intermediate
table only to use it later in a SELECT-FROM-WHERE
• Which customers have a total balance equal to 0?

SELECT firstName, lastName, sumBalance


FROM (SELECT firstName, lastName, sum(balance) AS sumBalance
FROM Customer C, Account A, Owns O
WHERE C.customerID = O.customerID
AND O.accNumber = A.accNumber
GROUP BY C.customerID) AS T
WHERE T. sumBalance = 0
45
Subqueries in FROM
• Sometimes we need to compute an intermediate
table only to use it later in a SELECT-FROM-WHERE
• Which customers have a total balance equal to 0?

SELECT firstName, lastName, sum(balance) AS sumBalance


FROM Customer C, Account A, Owns O
WHERE C.customerID = O.customerID AND O.accNumber = A.accNumber
GROUP BY C.customerID
HAVING sumBalance = 0

Rule of thumb: avoid nested queries when possible


46
Subqueries in WHERE
• Subqueries return a single constant
• >, <, =, <>, >=, <=
• Find the customerIDs of customers whose income is
larger than avg(income)

SELECT C.customerID
FROM Customer C1
WHERE C1.income > (SELECT avg(C2.income)
FROM Customer C2)

47
Subqueries in WHERE
• Subqueries return a relation
• IN
• NOT IN
• EXISTS
• NOT EXISTS
• ANY
• ALL

48
Accounts IN Burnaby
• Find the customerIDs of customers with an account at
the Burnaby branch

SELECT C.customerID
FROM Customer C
WHERE C.customerID IN (SELECT O.customerID
FROM Account A, Owns O
WHERE A.accNumber = O.accNumber
AND A.branchName = ’Burnaby’)

49
Accounts NOT IN Burnaby
• Find the customerIDs of customers who do not have an
account at the Burnaby branch

SELECT C.customerID
FROM Customer C
WHERE C.customerID NOT IN(
SELECT O.customerID
FROM Account A, Owns O
WHERE A.accNumber = O.accNumber AND
A.branchName = ’Burnaby’)

50
Uncorrelated Queries
• The query shown previously contains an uncorrelated,
or independent, sub-query
• The sub-query does not contain references to attributes of
the outer query

• An independent sub-query can be evaluated before


evaluation of the outer query
• And needs to be evaluated only once
• The sub-query result can be checked for each row of the outer query
• The cost is the cost for performing the sub-query (once) and
the cost of scanning the outer relation
51
EXISTing BurnabyAccounts
• Find the customerIDs of customers with an account at
the Burnaby branch
SELECT C.customerID
FROM Customer C
WHERE EXISTS ( SELECT *
FROM Account A, Owns O
WHERE C.customerID = O.customerID
AND A.accNumber = O.accNumber
AND A.branchName = ’Burnaby’)

EXISTS and NOT EXISTS test whether the associated


sub-query is non-empty or empty
52
Correlated Queries
• The previous query contained a correlated sub-
query
• With references to attributes of the outer query
• … WHERE C.customerID = O.customerID …
• It is evaluated once for each row in the outer query
• i.e. for each row in the Customer table
• Correlated queries are often inefficient

53
EXISTing BurnabyAccounts
• Find the customerIDs of customers with an account at
the Burnaby branch

SELECT DISTINCT C.customerID


FROM Customer C, Account A, Owns O
WHERE C.customerID = A.accNumber
AND A.accNumber = O.customerID
AND A.branchName = ’Burnaby’

54
Have an account in all branches
• Find the customerIDs of customers who have an
account in all branches
SQ1 – A list of all SQ2 – A list of
branch names branch names that
a customer has an
account at
EXCEPT

If the customer has an account at every branch then


this result is empty
55
Have an account in all branches
• Putting it all together we have

SELECT C.customerID
FROM Customer C
WHERE NOT EXISTS ( (SELECT B.branchName
FROM Branch B)
EXCEPT
(SELECT A.branchName
FROM Account A, Owns O
WHERE O.customerID = C.customerID
AND O.accNumber = A.accNumber))
56
ANYone Richer Than Bruce
• Find the customerIDs of customers who earn more
than some customer called Bruce

SELECT C.customerID
FROM Customer C
WHERE C.income > ANY
(SELECT Bruce.income
FROM Customer Bruce
WHERE Bruce.firstName = 'Bruce')

Customers in the result table must have incomes greater


than at least one of the rows in the sub-query result 57
Richer Than ALL the Bruces
• Find the customerIDs of customers who earn more
than all customer called Bruce
SELECT C.customerID
FROM Customer C
WHERE C.income > ALL
(SELECT Bruce.income
FROM Customer Bruce
WHERE Bruce.firstName = 'Bruce')

If there were no customers called Bruce this query would


return all customers 58
Summary
• Selection
• Projection
• Set Operators (UNION, INTERSECT, EXCEPT)
• Joins (INNER, OUTER)
• Aggregation
• Group By
• Having
• Order By
• Distinct
• Subqueries
59
SQL operators can be composed just like building LEGO buildings
Acknowledge
• Some lecture slides were copied from or inspired by the
following course materials
• “W4111: Introduction to databases” by Eugene Wu at
Columbia University
• “CSE344: Introduction to Data Management” by Dan Suciu at
University of Washington
• “CMPT354: Database System I” by John Edgar at Simon Fraser
University
• “CS186: Introduction to Database Systems” by Joe Hellerstein
at UC Berkeley
• “CS145: Introduction to Databases” by Peter Bailis at Stanford
• “CS 348: Introduction to Database Management” by Grant
Weddell at University of Waterloo
60

You might also like