Correlated Queries, Nested Queries, and Common Table Expressions

Correlated subqueries use values from the outer query to generate results. They must be re-run for every row in the final dataset. Nested subqueries involve a subquery within another subquery to perform multiple layers of data transformation. Both correlated and nested subqueries allow for more advanced data filtering, joining, and evaluation compared to simple subqueries.

Correlated Subqueries
INTERMEDIATE SQL

Mona Khalil
Data Scientist, Greenhouse Software
Correlated subquery
Uses values from the outer query to generate a result

Re-run for every row generated in the final data set

Used for advanced joining, filtering, and evaluating data

Correlated subqueries are subqueries that reference one or more columns in the
main query. Correlated subqueries depend on information in the main query to run,
and thus, cannot be executed on their own.

Correlated subqueries are evaluated in SQL once per row of data retrieved -- a
process that takes a lot more computing power and time than a simple subquery.
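The dependence on the main query can be checked with a small sketch. It uses Python's built-in sqlite3 module and an in-memory database with a few made-up rows (the data and the reduced table layout are assumptions, not the course database). The table name is quoted as "match" because MATCH is a keyword in SQLite.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not the course data).
conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "match" (id INTEGER PRIMARY KEY, country_id INTEGER,
                          home_goal INTEGER, away_goal INTEGER);
    INSERT INTO country VALUES (1, 'Belgium'), (2, 'England');
    INSERT INTO "match" VALUES (1, 1, 2, 1), (2, 1, 0, 1), (3, 2, 3, 3);
''')

# A simple subquery runs on its own.
simple = 'SELECT AVG(home_goal + away_goal) FROM "match"'
overall_avg = conn.execute(simple).fetchone()[0]

# A correlated subquery references c.id, which only exists in the main query.
correlated = '''
    SELECT AVG(home_goal + away_goal)
    FROM "match" AS m
    WHERE m.country_id = c.id
'''
try:
    conn.execute(correlated)
    runs_alone = True
except sqlite3.OperationalError:
    # SQLite cannot resolve c.id without the outer query.
    runs_alone = False

# Embedded in the main query, it is evaluated for each country row.
main_query = ('SELECT c.name, (' + correlated + ') AS avg_goals '
              'FROM country AS c ORDER BY c.name')
rows = conn.execute(main_query).fetchall()
print(overall_avg, runs_alone, rows)
```

The same subquery text fails on its own and works once the main query supplies `c.id`.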

A simple example
Which match stages tend to have a higher than average number of goals scored?

SELECT
    s.stage,
    ROUND(s.avg_goals, 2) AS avg_goal,
    (SELECT AVG(home_goal + away_goal)
     FROM match
     WHERE season = '2012/2013') AS overall_avg
FROM (SELECT
          stage,
          AVG(home_goal + away_goal) AS avg_goals
      FROM match
      WHERE season = '2012/2013'
      GROUP BY stage) AS s -- Subquery in FROM
WHERE s.avg_goals > (SELECT AVG(home_goal + away_goal)
                     FROM match
                     WHERE season = '2012/2013'); -- Subquery in WHERE

A correlated example
SELECT
    s.stage,
    ROUND(s.avg_goals, 2) AS avg_goal,
    (SELECT AVG(home_goal + away_goal)
     FROM match
     WHERE season = '2012/2013') AS overall_avg
FROM (SELECT
          stage,
          AVG(home_goal + away_goal) AS avg_goals
      FROM match
      WHERE season = '2012/2013'
      GROUP BY stage) AS s
WHERE s.avg_goals > (SELECT AVG(home_goal + away_goal)
                     FROM match AS m
                     WHERE s.stage > m.stage); -- Correlated with s in the outer query

The subquery in WHERE is correlated: it references s.stage, which comes from the subquery in FROM. For each stage in s, it averages the goals of matches played in earlier stages, and the outer query keeps only the stages whose average is higher than that value.

A correlated example
| stage | avg_goals |
|-------|-----------|
| 3 | 2.83 |
| 4 | 2.8 |
| 6 | 2.78 |
| 8 | 3.09 |
| 10 | 2.96 |

Simple vs. correlated subqueries
| Simple Subquery                              | Correlated Subquery                    |
|----------------------------------------------|----------------------------------------|
| Can be run independently from the main query | Dependent on the main query to execute |
| Evaluated once in the whole query            | Evaluated in loops                     |
|                                              | Significantly slows down query runtime |

Correlated subqueries
What is the average number of goals scored in each country?

SELECT
    c.name AS country,
    AVG(m.home_goal + m.away_goal) AS avg_goals
FROM country AS c
LEFT JOIN match AS m
    ON c.id = m.country_id
GROUP BY country;

| country     | avg_goals        |
|-------------|------------------|
| Belgium     | 2.89344262295082 |
| England     | 2.76776315789474 |
| France      | 2.51052631578947 |
| Germany     | 2.94607843137255 |
| Italy       | 2.63150867823765 |
| Netherlands | 3.14624183006536 |
| Poland      | 2.49375          |
| Portugal    | 2.63255360623782 |
| Scotland    | 2.74122807017544 |
| Spain       | 2.78223684210526 |
| Switzerland | 2.81054131054131 |

Correlated subqueries
What is the average number of goals scored in each country?

SELECT
    c.name AS country,
    (SELECT
         AVG(home_goal + away_goal)
     FROM match AS m
     WHERE m.country_id = c.id) AS avg_goals
FROM country AS c
GROUP BY country;

| country     | avg_goals        |
|-------------|------------------|
| Belgium     | 2.89344262295082 |
| England     | 2.76776315789474 |
| France      | 2.51052631578947 |
| Germany     | 2.94607843137255 |
| Italy       | 2.63150867823765 |
| Netherlands | 3.14624183006536 |
| Poland      | 2.49375          |
| Portugal    | 2.63255360623782 |
| Scotland    | 2.74122807017544 |
| Spain       | 2.78223684210526 |
| Switzerland | 2.81054131054131 |
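As a rough check that the join version and the correlated version answer the same question, the sketch below runs both against a tiny in-memory SQLite database through Python's sqlite3 module. The rows are made up for illustration, and the table is quoted as "match" because MATCH is an SQLite keyword. The correlated version here leaves out GROUP BY, since the country table already has one row per country; that simplification is an assumption of this sketch.

```python
import sqlite3

# Hypothetical sample data, not the course database.
conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "match" (id INTEGER PRIMARY KEY, country_id INTEGER,
                          home_goal INTEGER, away_goal INTEGER);
    INSERT INTO country VALUES (1, 'Belgium'), (2, 'England');
    INSERT INTO "match" VALUES (1, 1, 2, 1), (2, 1, 0, 1), (3, 2, 3, 3);
''')

# Version 1: join the tables, then aggregate per country.
join_version = '''
    SELECT c.name AS country,
           AVG(m.home_goal + m.away_goal) AS avg_goals
    FROM country AS c
    LEFT JOIN "match" AS m
        ON c.id = m.country_id
    GROUP BY c.name
    ORDER BY c.name
'''

# Version 2: one correlated subquery per country row.
correlated_version = '''
    SELECT c.name AS country,
           (SELECT AVG(home_goal + away_goal)
            FROM "match" AS m
            WHERE m.country_id = c.id) AS avg_goals
    FROM country AS c
    ORDER BY c.name
'''

join_rows = conn.execute(join_version).fetchall()
correlated_rows = conn.execute(correlated_version).fetchall()
print(join_rows == correlated_rows, join_rows)
```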

Let's practice!
In the previous exercise, you generated a list of matches with extremely high scores for each country. In this exercise, you're going to add an additional column for matching to answer the question: what was the highest scoring match for each country, in each season?

SELECT
    -- Select country ID, date, home, and away goals from match
    main.country_id,
    main.date,
    main.home_goal,
    main.away_goal
FROM match AS main
WHERE
    -- Filter for matches with the highest number of goals scored
    (home_goal + away_goal) =
        (SELECT MAX(sub.home_goal + sub.away_goal)
         FROM match AS sub
         WHERE main.country_id = sub.country_id
               AND main.season = sub.season);
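The two-column correlation (country and season) can be tried out on a handful of rows. This is a minimal sketch using Python's sqlite3 module with invented matches; the season column is added to the output only to make the result easier to read, and "match" is quoted because MATCH is an SQLite keyword.

```python
import sqlite3

# Hypothetical rows: two countries, two seasons.
conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE "match" (id INTEGER PRIMARY KEY, country_id INTEGER,
                          season TEXT, date TEXT,
                          home_goal INTEGER, away_goal INTEGER);
    INSERT INTO "match" VALUES
        (1, 1, '2011/2012', '2011-08-01', 2, 1),
        (2, 1, '2011/2012', '2011-09-01', 4, 3),
        (3, 1, '2012/2013', '2012-08-01', 1, 0),
        (4, 2, '2011/2012', '2011-08-05', 0, 0),
        (5, 2, '2011/2012', '2011-08-12', 5, 1);
''')

query = '''
    SELECT main.country_id, main.season, main.date,
           main.home_goal, main.away_goal
    FROM "match" AS main
    WHERE (main.home_goal + main.away_goal) =
          -- Highest total within the same country AND the same season
          (SELECT MAX(sub.home_goal + sub.away_goal)
           FROM "match" AS sub
           WHERE main.country_id = sub.country_id
             AND main.season = sub.season)
    ORDER BY main.country_id, main.season
'''
top_matches = conn.execute(query).fetchall()
for row in top_matches:
    print(row)
```

Each (country, season) pair contributes the match, or matches in case of a tie, with the highest combined score.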

Nested Subqueries

Nested subqueries??
Subquery inside another subquery

Perform multiple layers of transformation

A subquery...
How much did each country's average differ from the overall average?

SELECT
    c.name AS country,
    AVG(m.home_goal + m.away_goal) AS avg_goals,
    AVG(m.home_goal + m.away_goal) -
        (SELECT AVG(home_goal + away_goal)
         FROM match) AS avg_diff
FROM country AS c
LEFT JOIN match AS m
    ON c.id = m.country_id
GROUP BY country;

A subquery...
| country | avg_goals | avg_diff |
|-------------|-----------|----------|
| Belgium | 2.8015 | 0.096 |
| England | 2.7105 | 0.005 |
| France | 2.4431 | -0.2624 |
| Germany | 2.9016 | 0.196 |
| Italy | 2.6168 | -0.0887 |
| Netherlands | 3.0809 | 0.3754 |
| Poland | 2.425 | -0.2805 |
| Portugal | 2.5346 | -0.1709 |
| Scotland | 2.6338 | -0.0718 |
| Spain | 2.7671 | 0.0616 |
| Switzerland | 2.9297 | 0.2241 |

...inside a subquery!
How does each month's total goals differ from the average monthly total of goals scored?

SELECT
    EXTRACT(MONTH FROM date) AS month,
    SUM(m.home_goal + m.away_goal) AS total_goals,
    SUM(m.home_goal + m.away_goal) -
        (SELECT AVG(goals)
         FROM (SELECT
                   EXTRACT(MONTH FROM date) AS month,
                   SUM(home_goal + away_goal) AS goals
               FROM match
               GROUP BY month) AS s) AS avg_diff
FROM match AS m
GROUP BY month;

Inner subquery
SELECT
    EXTRACT(MONTH FROM date) AS month,
    SUM(home_goal + away_goal) AS goals
FROM match
GROUP BY month;

| month | goals |
|-------|-------|
| 01 | 2988 |
| 02 | 3768 |
| 03 | 3936 |
| 04 | 4055 |
| 05 | 2719 |
| 06 | 84 |
| 07 | 366 |

Outer subquery
SELECT AVG(goals)
FROM (SELECT
          EXTRACT(MONTH FROM date) AS month,
          SUM(home_goal + away_goal) AS goals
      FROM match
      GROUP BY month) AS s;

2944.75

Final query
SELECT
    EXTRACT(MONTH FROM date) AS month,
    SUM(m.home_goal + m.away_goal) AS total_goals,
    SUM(m.home_goal + m.away_goal) -
        (SELECT AVG(goals)
         FROM (SELECT
                   EXTRACT(MONTH FROM date) AS month,
                   SUM(home_goal + away_goal) AS goals
               FROM match
               GROUP BY month) AS s) AS diff
FROM match AS m
GROUP BY month;

| month | goals | diff    |
|-------|-------|---------|
| 01    | 5821  | -36.25  |
| 02    | 7448  | 1590.75 |
| 03    | 7298  | 1440.75 |
| 04    | 8145  | 2287.75 |
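The layering of the nested subquery can be traced on a few rows. This is a sketch under stated assumptions: it runs in SQLite via Python's sqlite3 module, so `strftime('%m', date)` stands in for `EXTRACT(MONTH FROM date)`, which SQLite does not provide; the rows are invented, and "match" is quoted because MATCH is an SQLite keyword.

```python
import sqlite3

# Hypothetical matches spread over three months.
conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE "match" (id INTEGER PRIMARY KEY, date TEXT,
                          home_goal INTEGER, away_goal INTEGER);
    INSERT INTO "match" VALUES
        (1, '2012-01-10', 2, 1),
        (2, '2012-01-20', 1, 1),
        (3, '2012-02-05', 0, 1),
        (4, '2012-03-03', 3, 3);
''')

query = '''
    SELECT strftime('%m', m.date) AS month,
           SUM(m.home_goal + m.away_goal) AS total_goals,
           SUM(m.home_goal + m.away_goal) -
               -- Outer subquery: average of the monthly totals
               (SELECT AVG(goals)
                -- Inner subquery: total goals per month
                FROM (SELECT strftime('%m', date) AS month,
                             SUM(home_goal + away_goal) AS goals
                      FROM "match"
                      GROUP BY month) AS s) AS diff
    FROM "match" AS m
    GROUP BY month
    ORDER BY month
'''
monthly = conn.execute(query).fetchall()
for row in monthly:
    print(row)
```

With monthly totals of 5, 1, and 6, the average monthly total is 4, so each row's diff is its total minus 4.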

Correlated nested subqueries
Nested subqueries can be correlated or uncorrelated
Or...a combination of the two

Can reference information from the outer subquery or main query

Correlated nested subquery
What is each country's average goals scored in the 2011/2012 season?

SELECT
    c.name AS country,
    (SELECT AVG(home_goal + away_goal)
     FROM match AS m
     WHERE m.country_id = c.id -- Correlates with main query
           AND id IN (
               SELECT id -- Begin inner subquery
               FROM match
               WHERE season = '2011/2012')) AS avg_goals
FROM country AS c
GROUP BY country;

Correlated nested subqueries
| country | avg_goals |
|-------------|------------------|
| Belgium | 2.87916666666667 |
| England | 2.80526315789474 |
| France | 2.51578947368421 |
| Germany | 2.85947712418301 |
| Italy | 2.58379888268156 |
| Netherlands | 3.25816993464052 |
| Poland | 2.19583333333333 |
| Portugal | 2.64166666666667 |
| Scotland | 2.6359649122807 |
| Spain | 2.76315789473684 |
| Switzerland | 2.62345679012346 |
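To see the correlated and uncorrelated parts working together, here is a small sketch in Python's sqlite3 module with invented rows. The inner subquery (the season filter) could run on its own, while the subquery around it depends on `c.id` from the main query. The GROUP BY is omitted in this sketch because country already has one row per country, and "match" is quoted because MATCH is an SQLite keyword.

```python
import sqlite3

# Hypothetical sample data covering two seasons.
conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "match" (id INTEGER PRIMARY KEY, country_id INTEGER,
                          season TEXT, home_goal INTEGER, away_goal INTEGER);
    INSERT INTO country VALUES (1, 'Belgium'), (2, 'England');
    INSERT INTO "match" VALUES
        (1, 1, '2011/2012', 2, 1),
        (2, 1, '2011/2012', 1, 0),
        (3, 1, '2012/2013', 5, 5),
        (4, 2, '2011/2012', 3, 3);
''')

query = '''
    SELECT c.name AS country,
           (SELECT AVG(m.home_goal + m.away_goal)
            FROM "match" AS m
            WHERE m.country_id = c.id          -- correlated with main query
              AND m.id IN (SELECT id           -- uncorrelated inner subquery
                           FROM "match"
                           WHERE season = '2011/2012')) AS avg_goals
    FROM country AS c
    ORDER BY c.name
'''
season_avgs = conn.execute(query).fetchall()
print(season_avgs)
```

The 10-goal Belgian match from 2012/2013 is excluded by the inner subquery, so it does not affect Belgium's average.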

Let's practice!
What's the average number of matches per season where a team scored 5 or more goals? How does this differ by country?

SELECT
    c.name AS country,
    -- Calculate the average matches per season
    AVG(outer_s.matches) AS avg_seasonal_high_scores
FROM country AS c
-- Left join outer_s to country
LEFT JOIN (
    SELECT country_id, season,
           COUNT(id) AS matches
    FROM (
        SELECT country_id, season, id
        FROM match
        WHERE home_goal >= 5 OR away_goal >= 5) AS inner_s
    -- Close parentheses and alias the subquery
    GROUP BY country_id, season) AS outer_s
    ON c.id = outer_s.country_id
GROUP BY country;

Common Table Expressions

When adding subqueries...
Query complexity increases quickly!
Information can be difficult to keep track of

Solution: Common Table Expressions!

Common Table Expressions
Common Table Expressions (CTEs)
Table declared before the main query
Named and referenced later in FROM statement

Setting up CTEs
WITH cte AS (
    SELECT col1, col2
    FROM table)
SELECT
    AVG(col1) AS avg_col
FROM cte;

Take a subquery in FROM
SELECT
    c.name AS country,
    COUNT(s.id) AS matches
FROM country AS c
INNER JOIN (
    SELECT country_id, id
    FROM match
    WHERE (home_goal + away_goal) >= 10) AS s
    ON c.id = s.country_id
GROUP BY country;

| country | matches |
|-------------|---------|
| England | 3 |
| Germany | 1 |
| Netherlands | 1 |
| Spain | 4 |

Place it at the beginning
(
    SELECT country_id, id
    FROM match
    WHERE (home_goal + away_goal) >= 10
)

WITH s AS (
    SELECT country_id, id
    FROM match
    WHERE (home_goal + away_goal) >= 10
)

Show me the CTE
WITH s AS (
    SELECT country_id, id
    FROM match
    WHERE (home_goal + away_goal) >= 10
)
SELECT
    c.name AS country,
    COUNT(s.id) AS matches
FROM country AS c
INNER JOIN s
    ON c.id = s.country_id
GROUP BY country;

| country | matches |
|-------------|---------|
| England | 3 |
| Germany | 1 |
| Netherlands | 1 |
| Spain | 4 |
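The move from a subquery in FROM to a CTE should not change the result, which the following sketch checks. It runs both forms against an in-memory SQLite database through Python's sqlite3 module; the rows are invented, and "match" is quoted because MATCH is an SQLite keyword.

```python
import sqlite3

# Hypothetical sample data, not the course database.
conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "match" (id INTEGER PRIMARY KEY, country_id INTEGER,
                          home_goal INTEGER, away_goal INTEGER);
    INSERT INTO country VALUES (1, 'England'), (2, 'Spain');
    INSERT INTO "match" VALUES
        (1, 1, 7, 3), (2, 1, 1, 0), (3, 2, 8, 2), (4, 2, 6, 5);
''')

# Subquery written inline in the FROM clause.
subquery_version = '''
    SELECT c.name AS country, COUNT(s.id) AS matches
    FROM country AS c
    INNER JOIN (
        SELECT country_id, id
        FROM "match"
        WHERE (home_goal + away_goal) >= 10) AS s
        ON c.id = s.country_id
    GROUP BY c.name
    ORDER BY c.name
'''

# The same subquery declared first as a CTE named s.
cte_version = '''
    WITH s AS (
        SELECT country_id, id
        FROM "match"
        WHERE (home_goal + away_goal) >= 10
    )
    SELECT c.name AS country, COUNT(s.id) AS matches
    FROM country AS c
    INNER JOIN s
        ON c.id = s.country_id
    GROUP BY c.name
    ORDER BY c.name
'''

subquery_rows = conn.execute(subquery_version).fetchall()
cte_rows = conn.execute(cte_version).fetchall()
print(subquery_rows == cte_rows, cte_rows)
```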

Show me all the CTEs
WITH s1 AS (
    SELECT country_id, id
    FROM match
    WHERE (home_goal + away_goal) >= 10),
s2 AS ( -- New subquery
    SELECT country_id, id
    FROM match
    WHERE (home_goal + away_goal) <= 1
)
SELECT
    c.name AS country,
    COUNT(s1.id) AS high_scores,
    COUNT(s2.id) AS low_scores -- New column
FROM country AS c
INNER JOIN s1
    ON c.id = s1.country_id
INNER JOIN s2 -- New join
    ON c.id = s2.country_id
GROUP BY country;

Why use CTEs?
Executed once
CTE is then stored in memory

Improves query performance

Improving organization of queries

Referencing other CTEs

Referencing itself (SELF JOIN)
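One of the points above, a CTE referencing another CTE, can be sketched as follows. The CTE names `high` and `per_country` are hypothetical, the rows are invented, and the query runs in SQLite via Python's sqlite3 module ("match" is quoted because MATCH is an SQLite keyword).

```python
import sqlite3

# Hypothetical sample data, not the course database.
conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "match" (id INTEGER PRIMARY KEY, country_id INTEGER,
                          home_goal INTEGER, away_goal INTEGER);
    INSERT INTO country VALUES (1, 'England'), (2, 'Spain');
    INSERT INTO "match" VALUES
        (1, 1, 7, 3), (2, 1, 1, 0), (3, 2, 8, 2), (4, 2, 6, 5);
''')

query = '''
    WITH high AS (
        -- Step 1: keep only matches with 10 or more goals
        SELECT country_id, id
        FROM "match"
        WHERE (home_goal + away_goal) >= 10
    ),
    per_country AS (
        -- Step 2: this CTE reads from the CTE declared above it
        SELECT country_id, COUNT(id) AS matches
        FROM high
        GROUP BY country_id
    )
    SELECT c.name AS country, p.matches
    FROM country AS c
    INNER JOIN per_country AS p
        ON c.id = p.country_id
    ORDER BY c.name
'''
chained = conn.execute(query).fetchall()
print(chained)
```

Declaring the steps in order keeps each transformation readable, which is the organizational benefit the slide refers to.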

Let's Practice!
How do you get both the home and away team names into one final query result?

Let's explore the final method: common table expressions.

WITH home AS (
    SELECT m.id, m.date,
           t.team_long_name AS hometeam, m.home_goal
    FROM match AS m
    LEFT JOIN team AS t
        ON m.hometeam_id = t.team_api_id),
-- Declare and set up the away CTE
away AS (
    SELECT m.id, m.date,
           t.team_long_name AS awayteam, m.away_goal
    FROM match AS m
    LEFT JOIN team AS t
        ON m.awayteam_id = t.team_api_id)
-- Select date, home_goal, and away_goal
SELECT
    home.date,
    home.hometeam,
    away.awayteam,
    home.home_goal,
    away.away_goal
-- Join away and home on the id column
FROM home
INNER JOIN away
    ON home.id = away.id;
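A quick way to try the home/away pattern is a two-team, two-match sketch. The team names and scores are placeholders, the schema is reduced to the columns the query uses, and the query runs in SQLite via Python's sqlite3 module ("match" is quoted because MATCH is an SQLite keyword). An ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Placeholder teams and matches, not the course data.
conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE team (team_api_id INTEGER PRIMARY KEY, team_long_name TEXT);
    CREATE TABLE "match" (id INTEGER PRIMARY KEY, date TEXT,
                          hometeam_id INTEGER, awayteam_id INTEGER,
                          home_goal INTEGER, away_goal INTEGER);
    INSERT INTO team VALUES (10, 'Team A'), (20, 'Team B');
    INSERT INTO "match" VALUES
        (1, '2012-01-01', 10, 20, 2, 1),
        (2, '2012-02-01', 20, 10, 0, 0);
''')

query = '''
    WITH home AS (
        -- One row per match with the home team's name
        SELECT m.id, m.date, t.team_long_name AS hometeam, m.home_goal
        FROM "match" AS m
        LEFT JOIN team AS t
            ON m.hometeam_id = t.team_api_id),
    away AS (
        -- One row per match with the away team's name
        SELECT m.id, m.date, t.team_long_name AS awayteam, m.away_goal
        FROM "match" AS m
        LEFT JOIN team AS t
            ON m.awayteam_id = t.team_api_id)
    SELECT home.date, home.hometeam, away.awayteam,
           home.home_goal, away.away_goal
    FROM home
    INNER JOIN away
        ON home.id = away.id
    ORDER BY home.id
'''
fixtures = conn.execute(query).fetchall()
for row in fixtures:
    print(row)
```

Because both CTEs carry the match id, joining them on that column puts the two team names on the same row.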
Deciding on techniques to use

Different names for the same thing?
Considerable overlap...

...but not identical!

Differentiating Techniques
Joins
Combine 2+ tables
Simple operations/aggregations

Correlated Subqueries
Match subqueries & tables
Avoid limits of joins
High processing time

Multiple/Nested Subqueries
Multi-step transformations
Improve accuracy and reproducibility

Common Table Expressions
Organize subqueries sequentially
Can reference other CTEs

So which do I use?
Depends on your database/question

The technique that best allows you to:


Use and reuse your queries

Generate clear and accurate results

Different use cases
Joins
2+ tables (What is the total sales per employee?)

Correlated Subqueries
Who does each employee report to in a company?

Multiple/Nested Subqueries
What is the average deal size closed by each sales representative in the quarter?

Common Table Expressions
How did the marketing, sales, growth, & engineering teams perform on key metrics?

Let's Practice!