Correlated Queries, Nested Queries, and Common Table Expressions
Subqueries
INTERMEDIATE SQL
Mona Khalil
Data Scientist, Greenhouse Software
Correlated subquery
Uses values from the outer query to generate a result
Correlated subqueries are subqueries that reference one or more columns in the
main query. Correlated subqueries depend on information in the main query to run,
and thus, cannot be executed on their own.
Correlated subqueries are evaluated in SQL once per row of data retrieved -- a
process that takes a lot more computing power and time than a simple subquery.
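The difference can be sketched on a toy table. This is a minimal illustration, assuming SQLite (through Python's built-in sqlite3 module) rather than the course database; the four rows and the aliases `m` and `sub` are made up for the example.

```python
import sqlite3

# Hypothetical toy data: four matches in two countries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE match (
        id INTEGER, country_id INTEGER,
        home_goal INTEGER, away_goal INTEGER);
    INSERT INTO match VALUES
        (1, 1, 2, 1),
        (2, 1, 0, 0),
        (3, 2, 4, 3),
        (4, 2, 1, 1);
""")

# Simple subquery text: it runs on its own.
simple = "SELECT AVG(home_goal + away_goal) FROM match"
overall_avg = conn.execute(simple).fetchone()[0]

# Correlated subquery text: it references m.country_id from a main query.
correlated = """
    SELECT AVG(sub.home_goal + sub.away_goal)
    FROM match AS sub
    WHERE sub.country_id = m.country_id
"""
try:
    conn.execute(correlated)
    ran_alone = True
except sqlite3.OperationalError:
    # Without the main query there is no m.country_id to refer to.
    ran_alone = False

# Inside a main query the reference resolves for each outer row.
rows = conn.execute(f"""
    SELECT m.id, ({correlated}) AS country_avg
    FROM match AS m
    ORDER BY m.id
""").fetchall()

print(overall_avg, ran_alone, rows)
```

The correlated text only becomes a valid query once it is embedded in a main query that supplies `m.country_id`.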
A simple example
Which match stages tend to have a higher than average number of goals
scored?
SELECT
s.stage,
ROUND(s.avg_goals,2) AS avg_goal,
(SELECT AVG(home_goal + away_goal)
FROM match
WHERE season = '2012/2013') AS overall_avg
FROM (SELECT
stage,
AVG(home_goal + away_goal) AS avg_goals
FROM match
WHERE season = '2012/2013'
GROUP BY stage) AS s -- Subquery in FROM
WHERE s.avg_goals > (SELECT AVG(home_goal + away_goal)
FROM match
WHERE season = '2012/2013'); -- Subquery in WHERE
A correlated example
SELECT
s.stage,
ROUND(s.avg_goals,2) AS avg_goal,
(SELECT AVG(home_goal + away_goal)
FROM match
WHERE season = '2012/2013') AS overall_avg
FROM
(SELECT
stage,
AVG(home_goal + away_goal) AS avg_goals
FROM match
WHERE season = '2012/2013'
GROUP BY stage) AS s
WHERE s.avg_goals > (SELECT AVG(home_goal + away_goal)
FROM match AS m
WHERE s.stage > m.stage);
-- The subquery in WHERE is correlated: for each stage returned by the
-- subquery in FROM, it averages goals over matches from earlier stages
-- (m.stage < s.stage), and the main query keeps the stages whose average
-- is higher than that value.
| stage | avg_goals |
|-------|-----------|
| 3 | 2.83 |
| 4 | 2.8 |
| 6 | 2.78 |
| 8 | 3.09 |
| 10 | 2.96 |
Simple vs. correlated subqueries
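| Simple subquery | Correlated subquery |
|-----------------|---------------------|
| Can be run independently of the main query | Depends on the main query to run |
| Evaluated once for the whole query | Evaluated once per row of data retrieved |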
Correlated subqueries
What is the average number of | country | avg_goals
In the previous exercise, you generated a list of matches with extremely high scores for each country. In this exercise, you're going to add a second column to match on in the correlated subquery to answer the question: what was the highest-scoring match for each country in each season?
SELECT
-- Select country ID, date, home, and away goals from match
main.country_id,
main.date,
main.home_goal,
main.away_goal
FROM match AS main
WHERE
-- Filter for matches with the highest number of goals scored
(home_goal + away_goal) =
(SELECT max(sub.home_goal + sub.away_goal)
FROM match AS sub
WHERE main.country_id = sub.country_id
AND main.season = sub.season);
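To see what the two matching conditions do, the same pattern can be run on a handful of rows. This is a sketch assuming SQLite through Python's sqlite3 module; the five matches are hypothetical, and the outer alias is `m` instead of `main`.

```python
import sqlite3

# Hypothetical toy data: two countries, two seasons.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE match (
        id INTEGER, country_id INTEGER, season TEXT,
        date TEXT, home_goal INTEGER, away_goal INTEGER);
    INSERT INTO match VALUES
        (1, 1, '2011/2012', '2011-08-20', 2, 1),
        (2, 1, '2011/2012', '2011-09-10', 5, 4),
        (3, 1, '2012/2013', '2012-08-18', 1, 0),
        (4, 2, '2011/2012', '2011-08-21', 3, 3),
        (5, 2, '2011/2012', '2011-09-11', 0, 0);
""")

# For every outer row, the subquery finds the highest total within the
# same country AND the same season; the outer row is kept only if its
# own total equals that maximum.
top_matches = conn.execute("""
    SELECT m.country_id, m.season, m.date, m.home_goal, m.away_goal
    FROM match AS m
    WHERE (m.home_goal + m.away_goal) =
          (SELECT MAX(sub.home_goal + sub.away_goal)
           FROM match AS sub
           WHERE m.country_id = sub.country_id
             AND m.season = sub.season)
    ORDER BY m.country_id, m.season
""").fetchall()

for row in top_matches:
    print(row)
```

Dropping the season condition would compare each match against its country's all-time maximum, so country 1 would return a single row instead of one per season.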
Nested Subqueries
Nested subqueries??
Subquery inside another subquery
INTERMEDIATE SQL
A subquery...
How much did each country's average differ from the overall average?
SELECT
c.name AS country,
AVG(m.home_goal + m.away_goal) AS avg_goals,
AVG(m.home_goal + m.away_goal) -
(SELECT AVG(home_goal + away_goal)
FROM match) AS avg_diff
FROM country AS c
LEFT JOIN match AS m
ON c.id = m.country_id
GROUP BY country;
| country | avg_goals | avg_diff |
|-------------|-----------|----------|
| Belgium | 2.8015 | 0.096 |
| England | 2.7105 | 0.005 |
| France | 2.4431 | -0.2624 |
| Germany | 2.9016 | 0.196 |
| Italy | 2.6168 | -0.0887 |
| Netherlands | 3.0809 | 0.3754 |
| Poland | 2.425 | -0.2805 |
| Portugal | 2.5346 | -0.1709 |
| Scotland | 2.6338 | -0.0718 |
| Spain | 2.7671 | 0.0616 |
| Switzerland | 2.9297 | 0.2241 |
...inside a subquery!
How does each month's total goals differ from the average monthly
total of goals scored?
SELECT
EXTRACT(MONTH FROM date) AS month,
SUM(m.home_goal + m.away_goal) AS total_goals,
SUM(m.home_goal + m.away_goal) -
(SELECT AVG(goals)
FROM (SELECT
EXTRACT(MONTH FROM date) AS month,
SUM(home_goal + away_goal) AS goals
FROM match
GROUP BY month) AS s) AS avg_diff
FROM match AS m
GROUP BY month;
Inner subquery
SELECT
EXTRACT(MONTH from date) AS month,
SUM(home_goal + away_goal) AS goals
FROM match
GROUP BY month;
| month | goals |
|-------|-------|
| 01 | 2988 |
| 02 | 3768 |
| 03 | 3936 |
| 04 | 4055 |
| 05 | 2719 |
| 06 | 84 |
| 07 | 366 |
Outer subquery
SELECT AVG(goals)
FROM (SELECT
EXTRACT(MONTH from date) AS month,
SUM(home_goal + away_goal) AS goals
FROM match
GROUP BY month) AS s;
2944.75
Final query
SELECT
EXTRACT(MONTH FROM date) AS month,
SUM(m.home_goal + m.away_goal) AS total_goals,
SUM(m.home_goal + m.away_goal) -
(SELECT AVG(goals)
FROM (SELECT
EXTRACT(MONTH FROM date) AS month,
SUM(home_goal + away_goal) AS goals
FROM match
GROUP BY month) AS s) AS diff
FROM match AS m
GROUP BY month;
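The nesting can be checked by hand on a few rows. This is a sketch assuming SQLite through Python's sqlite3 module, so `strftime('%m', date)` stands in for `EXTRACT(MONTH FROM date)`; the five matches are hypothetical.

```python
import sqlite3

# Hypothetical toy data: monthly totals are 5 (Jan), 7 (Feb), 3 (Mar).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE match (
        id INTEGER, date TEXT, home_goal INTEGER, away_goal INTEGER);
    INSERT INTO match VALUES
        (1, '2012-01-05', 2, 1),
        (2, '2012-01-20', 1, 1),
        (3, '2012-02-11', 4, 3),
        (4, '2012-03-03', 0, 0),
        (5, '2012-03-17', 2, 1);
""")

# Inner subquery: total goals per month.
# Outer subquery: average of those monthly totals.
# Main query: each month's total minus that average.
monthly_diff = conn.execute("""
    SELECT
        strftime('%m', m.date) AS month,
        SUM(m.home_goal + m.away_goal) AS total_goals,
        SUM(m.home_goal + m.away_goal) -
            (SELECT AVG(goals)
             FROM (SELECT
                       strftime('%m', date) AS month,
                       SUM(home_goal + away_goal) AS goals
                   FROM match
                   GROUP BY month) AS s) AS diff
    FROM match AS m
    GROUP BY month
    ORDER BY month
""").fetchall()

for row in monthly_diff:
    print(row)
```

With totals of 5, 7, and 3, the average monthly total is 5, so the differences are 0, 2, and -2.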
Correlated nested subqueries
Nested subqueries can be correlated or uncorrelated
Or...a combination of the two
What is each country's average number of goals scored in the 2011/2012
season?
SELECT
c.name AS country,
(SELECT AVG(home_goal + away_goal)
FROM match AS m
WHERE m.country_id = c.id -- Correlates with main query
AND id IN (
SELECT id -- Begin inner subquery
FROM match
WHERE season = '2011/2012')) AS avg_goals
FROM country AS c;
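The two levels can be traced on toy data. This is a sketch assuming SQLite through Python's sqlite3 module; the two countries and five matches are hypothetical. The outer GROUP BY is left out here because the outer query has no aggregate and already returns one row per country.

```python
import sqlite3

# Hypothetical toy data: two countries, two seasons.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER, name TEXT);
    CREATE TABLE match (
        id INTEGER, country_id INTEGER, season TEXT,
        home_goal INTEGER, away_goal INTEGER);
    INSERT INTO country VALUES (1, 'Belgium'), (2, 'England');
    INSERT INTO match VALUES
        (1, 1, '2011/2012', 2, 1),
        (2, 1, '2011/2012', 3, 2),
        (3, 1, '2012/2013', 0, 0),
        (4, 2, '2011/2012', 1, 1),
        (5, 2, '2012/2013', 4, 4);
""")

season_avgs = conn.execute("""
    SELECT
        c.name AS country,
        (SELECT AVG(m.home_goal + m.away_goal)
         FROM match AS m
         WHERE m.country_id = c.id          -- correlated with the main query
           AND m.id IN (
               SELECT id                    -- uncorrelated inner subquery
               FROM match
               WHERE season = '2011/2012')) AS avg_goals
    FROM country AS c
    ORDER BY c.name
""").fetchall()

for row in season_avgs:
    print(row)
```

The inner subquery can be run on its own and returns the 2011/2012 match ids; the subquery around it cannot, because it depends on `c.id`.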
| country | avg_goals |
|-------------|------------------|
| Belgium | 2.87916666666667 |
| England | 2.80526315789474 |
| France | 2.51578947368421 |
| Germany | 2.85947712418301 |
| Italy | 2.58379888268156 |
| Netherlands | 3.25816993464052 |
| Poland | 2.19583333333333 |
| Portugal | 2.64166666666667 |
| Scotland | 2.6359649122807 |
| Spain | 2.76315789473684 |
| Switzerland | 2.62345679012346 |
What's the average number of matches per season where a team scored 5 or more goals? How does this differ by country?
SELECT
c.name AS country,
-- Calculate the average matches per season
AVG(outer_s.matches) AS avg_seasonal_high_scores
FROM country AS c
-- Left join outer_s to country
LEFT JOIN (
SELECT country_id, season,
COUNT(id) AS matches
FROM (
SELECT country_id, season, id
FROM match
WHERE home_goal >= 5 OR away_goal >= 5) AS inner_s
-- Close parentheses and alias the subquery
GROUP BY country_id, season) AS outer_s
ON c.id = outer_s.country_id
GROUP BY country;
Common Table Expressions
When adding subqueries...
Query complexity increases quickly!
Information can be difficult to keep track of
Common Table Expressions (CTEs)
Table declared before the main query
Named and referenced later in FROM statement
Setting up CTEs
WITH cte AS (
SELECT col1, col2
FROM table)
SELECT
AVG(col1) AS avg_col
FROM cte;
Take a subquery in FROM
SELECT
c.name AS country,
COUNT(s.id) AS matches
FROM country AS c
INNER JOIN (
SELECT country_id, id
FROM match
WHERE (home_goal + away_goal) >= 10) AS s
ON c.id = s.country_id
GROUP BY country;
| country | matches |
|-------------|---------|
| England | 3 |
| Germany | 1 |
| Netherlands | 1 |
| Spain | 4 |
Place it at the beginning
(
SELECT country_id, id
FROM match
WHERE (home_goal + away_goal) >= 10
)
WITH s AS (
SELECT country_id, id
FROM match
WHERE (home_goal + away_goal) >= 10
)
Show me the CTE
WITH s AS (
SELECT country_id, id
FROM match
WHERE (home_goal + away_goal) >= 10
)
SELECT
c.name AS country,
COUNT(s.id) AS matches
FROM country AS c
INNER JOIN s
ON c.id = s.country_id
GROUP BY country;
| country | matches |
|-------------|---------|
| England | 3 |
| Germany | 1 |
| Netherlands | 1 |
| Spain | 4 |
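That the rewrite changes only the layout, not the result, can be checked directly. This is a sketch assuming SQLite through Python's sqlite3 module; the three countries and five matches are hypothetical and do not reproduce the counts in the table above.

```python
import sqlite3

# Hypothetical toy data: only England and Spain have a match
# with 10 or more goals.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER, name TEXT);
    CREATE TABLE match (
        id INTEGER, country_id INTEGER,
        home_goal INTEGER, away_goal INTEGER);
    INSERT INTO country VALUES (1, 'England'), (2, 'Spain'), (3, 'Poland');
    INSERT INTO match VALUES
        (1, 1, 7, 3),
        (2, 1, 1, 0),
        (3, 2, 6, 4),
        (4, 2, 8, 3),
        (5, 3, 1, 1);
""")

# Version 1: subquery in FROM.
with_subquery = conn.execute("""
    SELECT c.name AS country, COUNT(s.id) AS matches
    FROM country AS c
    INNER JOIN (
        SELECT country_id, id
        FROM match
        WHERE (home_goal + away_goal) >= 10) AS s
    ON c.id = s.country_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Version 2: the same subquery declared first as a CTE named s.
with_cte = conn.execute("""
    WITH s AS (
        SELECT country_id, id
        FROM match
        WHERE (home_goal + away_goal) >= 10
    )
    SELECT c.name AS country, COUNT(s.id) AS matches
    FROM country AS c
    INNER JOIN s
    ON c.id = s.country_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(with_subquery)
print(with_cte)
```

Both versions return the same rows; the CTE version only moves the subquery's definition ahead of the main query.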
Show me all the CTEs
WITH s1 AS (
SELECT country_id, id
FROM match
WHERE (home_goal + away_goal) >= 10),
s2 AS ( -- New subquery
SELECT country_id, id
FROM match
WHERE (home_goal + away_goal) <= 1
)
SELECT
c.name AS country,
COUNT(DISTINCT s1.id) AS high_scores,
COUNT(DISTINCT s2.id) AS low_scores -- New column
FROM country AS c
INNER JOIN s1
ON c.id = s1.country_id
INNER JOIN s2 -- New join
ON c.id = s2.country_id
GROUP BY country;
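One thing to watch when joining two CTEs to the same table: every high-scoring match in a country pairs with every low-scoring match in that country, so a plain COUNT counts pairs rather than matches. The sketch below, assuming SQLite through Python's sqlite3 module and hypothetical data, compares plain and DISTINCT counts; COUNT(DISTINCT ...) is one possible fix, assuming match ids are unique.

```python
import sqlite3

# Hypothetical toy data: one country with 2 high-scoring matches
# (ids 1, 2) and 3 low-scoring matches (ids 3, 4, 5).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER, name TEXT);
    CREATE TABLE match (
        id INTEGER, country_id INTEGER,
        home_goal INTEGER, away_goal INTEGER);
    INSERT INTO country VALUES (1, 'Spain');
    INSERT INTO match VALUES
        (1, 1, 7, 3),
        (2, 1, 6, 5),
        (3, 1, 0, 0),
        (4, 1, 1, 0),
        (5, 1, 0, 1),
        (6, 1, 2, 2);
""")

counts = conn.execute("""
    WITH s1 AS (
        SELECT country_id, id
        FROM match
        WHERE (home_goal + away_goal) >= 10),
    s2 AS (
        SELECT country_id, id
        FROM match
        WHERE (home_goal + away_goal) <= 1)
    SELECT
        c.name AS country,
        COUNT(s1.id) AS high_pairs,           -- counts joined rows (2 x 3)
        COUNT(s2.id) AS low_pairs,            -- counts joined rows (2 x 3)
        COUNT(DISTINCT s1.id) AS high_scores, -- counts each match once
        COUNT(DISTINCT s2.id) AS low_scores   -- counts each match once
    FROM country AS c
    INNER JOIN s1 ON c.id = s1.country_id
    INNER JOIN s2 ON c.id = s2.country_id
    GROUP BY c.name
""").fetchall()

print(counts)
```

Because both joins are INNER JOINs, a country with no match in either CTE drops out of the result entirely.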
Why use CTEs?
Executed once
CTE is then stored in memory
How do you get both the home and away team names into one final query result?
WITH home AS (
SELECT m.id, m.date,
t.team_long_name AS hometeam, m.home_goal
FROM match AS m
LEFT JOIN team AS t
ON m.hometeam_id = t.team_api_id),
-- Declare and set up the away CTE
away AS (
SELECT m.id, m.date,
t.team_long_name AS awayteam, m.away_goal
FROM match AS m
LEFT JOIN team AS t
ON m.awayteam_id = t.team_api_id)
-- Select date, home_goal, and away_goal
SELECT
home.date,
home.hometeam,
away.awayteam,
home.home_goal,
away.away_goal
-- Join away and home on the id column
FROM home
INNER JOIN away
ON home.id = away.id;
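The two-CTE pattern can be tried end to end on a tiny schema. This is a sketch assuming SQLite through Python's sqlite3 module; the team names and the two matches are hypothetical, while the column names follow the query above.

```python
import sqlite3

# Hypothetical toy data: two teams that play each other twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (team_api_id INTEGER, team_long_name TEXT);
    CREATE TABLE match (
        id INTEGER, date TEXT,
        hometeam_id INTEGER, awayteam_id INTEGER,
        home_goal INTEGER, away_goal INTEGER);
    INSERT INTO team VALUES (10, 'Team A'), (20, 'Team B');
    INSERT INTO match VALUES
        (1, '2012-08-18', 10, 20, 2, 1),
        (2, '2012-12-01', 20, 10, 0, 0);
""")

# Each CTE joins match to team once, on a different id column;
# the main query stitches the two halves together on the match id.
results = conn.execute("""
    WITH home AS (
        SELECT m.id, m.date,
               t.team_long_name AS hometeam, m.home_goal
        FROM match AS m
        LEFT JOIN team AS t
        ON m.hometeam_id = t.team_api_id),
    away AS (
        SELECT m.id, m.date,
               t.team_long_name AS awayteam, m.away_goal
        FROM match AS m
        LEFT JOIN team AS t
        ON m.awayteam_id = t.team_api_id)
    SELECT home.date, home.hometeam, away.awayteam,
           home.home_goal, away.away_goal
    FROM home
    INNER JOIN away
    ON home.id = away.id
    ORDER BY home.id
""").fetchall()

for row in results:
    print(row)
```

Each match id appears exactly once in `home` and once in `away`, so the final join returns one row per match.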
Deciding on techniques to use
Different names for the same thing?
Considerable overlap...
Differentiating Techniques
Joins Correlated Subqueries
So which do I use?
Depends on your database/question
Different use cases
Joins Correlated Subqueries
What is the average deal size closed by each sales representative in the quarter?
How did the marketing, sales, growth, & engineering teams perform on key metrics?
Let's Practice!