0% found this document useful (0 votes)
57 views20 pages

Learnsql Com Blog Advanced SQL Interview Questions

Sql

Uploaded by

xaras67961
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
57 views20 pages

Learnsql Com Blog Advanced SQL Interview Questions

Sql

Uploaded by

xaras67961
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 20

Boost your career with the SQL courses! - 40 hours only!

Get 65+ online SQL courses and save $420 40h : 59m : 55s

Courses Pricing For Students SQL Library Log in Create free account

Back to articles list Articles Cookbook

23rd Feb 2023 33 minutes read

Top 27 Advanced SQL Interview Questions with


Answers

Tihomir Babic Sql Jobs And Career

Where can an SQL professional find a comprehensive guide to advanced SQL


interview questions? The shortest answer is: here! We selected the 27 most important
SQL questions and answered them for you.

Preparing for an SQL interview is not easy, especially if your job requires the
knowledge of advanced SQL. This article contains the 27 most commonly asked
advanced SQL interview questions and provides detailed answers and resources for
further reading.

We’ll go through these four main concepts and a few more besides:

JOINs
GROUP BY , WHERE , and HAVING
CTEs (Common Table Expressions) and recursive queries
Window Functions

SQL The best way to refresh your advanced SQL knowledge is by taking our interactive
Advanced SQL track. It has over 300 hands-on exercises for window functions,
test
Common Table Expressions, recursive functions, and more.
Let’s attack these questions frontally, without further ado!

1. What Is a JOIN in SQL?


JOIN is an SQL command that allows you to combine two or more tables. This is done
via a common column (i.e. a column that has the same values in both tables), which
allows using data from two or more tables at the same time. Joining tables in SQL is
essential due to the nature of relational databases: data is atomized into tables, with
each table holding only a part of the data available in the database.

We’ll use two tables to showcase how this works. The first table is football_players .

id first_name last_name national_team_id games_played

1 Gianfranco Zola 1 35

2 Virgil van Dijk 2 53

3 Marcus Rashford 3 51

4 Kylian Mbappé 5 66

5 Phil Foden 3 22

6 Frenkie de Jong 2 22

7 Mario Balotelli 1 36

8 Erling Haaland 6 23

The second is national_team .

id country

1 Italy

2 Netherlands

3 England

4 Croatia

Here’s a query that joins two tables:

Code

SELECT
fp.id,
first_name,
last_name,
national_team_id,
country,
games_played
FROM football_players fp
JOIN national_team nt
ON fp.national_team_id = nt.id
ORDER BY fp.id;

It selects columns from both tables. To join them, we first reference one table in the
FROM clause. This is followed by JOIN , and after that comes the second table. We use
the ON clause to specify the condition with which the tables will be joined: the
national_team_id in the football_players table must be equal to the id column in
the national_team table.

The output of the query is:

id first_name last_name national_team_id country

1 Gianfranco Zola 1 Italy

2 Virgil van Dijk 2 Netherlands

3 Marcus Rashford 3 England

5 Phil Foden 3 England

6 Frenkie de Jong 2 Netherlands


SQL
7 Mario Balotelli 1 Italy
test
INNER JOIN is one of the several distinct joins in SQL. Its characteristic is that it only
returns data from the joined tables where the joining condition is true. Here are more
details on how the SQL INNER JOIN works.

2. What Is the Difference


Between INNER JOIN, LEFT JOIN,
RIGHT JOIN, and FULL JOIN?
There are different types of joins in SQL. The most commonly used joins are INNER
JOIN , LEFT JOIN , RIGHT JOIN , and FULL JOIN . LEFT JOIN , RIGHT JOIN , and FULL
JOIN are so-called outer joins. JOIN (aka INNER JOIN ) is an inner join. In this case,
‘inner’ means that it returns only the rows from both tables that satisfy the joining
condition; outer joins return all the rows in one table, plus matching rows in the other
table(s). The exception is FULL JOIN , which returns all rows from both tables.

Here’s the result of the INNER JOIN from the previous example. Let’s have it here
again. That way, it’ll be easier to see the difference between different joins.

id first_name last_name national_team_id country

1 Gianfranco Zola 1 Italy

2 Virgil van Dijk 2 Netherlands

3 Marcus Rashford 3 England

5 Phil Foden 3 England

6 Frenkie de Jong 2 Netherlands

7 Mario Balotelli 1 Italy

LEFT JOIN returns all the data from the left table (i.e. the first table, which is listed
before\to the left of the JOIN keyword) and only the matching rows from the right
table (the second table, listed after\to the right of the JOIN keyword). If there is no
matching data in the right table, the missing values are shown as NULLs . Here’s the
same query with LEFT JOIN substituting for INNER JOIN :

Code

SELECT
fp.id,
first_name,
last_name,
national_team_id,
country,
games_played
FROM football_players fp
LEFT JOIN national_team nt
ON fp.national_team_id = nt.id
ORDER BY fp.id;

The left table here is football_players , and the right is national_team . As you’d
expect, the output is different:

id first_name last_name national_team_id country

1 Gianfranco Zola 1 Italy

2 Virgil van Dijk 2 Netherlands

3 Marcus Rashford 3 England

4 Kylian Mbappé 5 NULL

5 Phil Foden 3 England

6 Frenkie de Jong 2 Netherlands

7 Mario Balotelli 1 Italy

8 Erling Haaland 6 NULL

SQL
All the football players from the left table are here. However, Kylian Mbappe and
test
Erling Haaland don’t have a matching country in the right table, so there are NULLs in
the country column for those players. These rows were not present in the INNER JOIN
result. They were added by the LEFT JOIN .

The RIGHT JOIN does the opposite: it returns all the data from the right table and only
the matching data from the left table. When there is no matching data in the left table,
the missing values are shown as NULLs .

Here’s the code:

Code

SELECT
fp.id,
first_name,
last_name,
national_team_id,
country,
games_played
FROM football_players fp
RIGHT JOIN national_team nt
ON fp.national_team_id = nt.id
ORDER BY fp.id;

Everything stays the same, except we’re using RIGHT JOIN instead of LEFT JOIN . This is
the output:

id first_name last_name national_team_id country

1 Gianfranco Zola 1 Italy

2 Virgil van Dijk 2 Netherlands

3 Marcus Rashford 3 England

5 Phil Foden 3 England

6 Frenkie de Jong 2 Netherlands

7 Mario Balotelli 1 Italy

NULL NULL NULL NULL Croatia

We now have all the national teams and their players. But you can see there’s one
country (Croatia) that has no players in the left table. The player columns for Croatia
are filled with NULLs.

FULL JOIN outputs all the data from all the joined tables. Again, if there is no matching
data in the corresponding table, the missing values will appear as NULL .

Once again, we change the join type in the query:

Code

SELECT
fp.id,
first_name,
last_name,
national_team_id,
country,
games_played
FROM football_players fp
FULL JOIN national_team nt
ON fp.national_team_id = nt.id
ORDER BY fp.id;

It will return all data from both tables. Any non-matching data is replaced by NULLs .
All players are in the result, even if they don’t have a corresponding country in the
other table. All countries are in the result, even if they don’t have players in the
football_player table. The FULL JOIN result is the union of LEFT JOIN and RIGHT
JOIN :

id first_name last_name national_team_id country

1 Gianfranco Zola 1 Italy

2 Virgil van Dijk 2 Netherlands

SQL 3 Marcus Rashford 3 England

test 4 Kylian Mbappé 5 NULL


5 Phil Foden 3 England

6 Frenkie de Jong 2 Netherlands

7 Mario Balotelli 1 Italy

8 Erling Haaland 6 NULL

NULL NULL NULL NULL Croatia

You can find more info in the article about different JOIN types. You can also see our
SQL JOINs Cheat Sheet for a quick refresher.

Level up your SQL with our interactive SQL JOINs course. Its 93 hands-on exercises
let you learn by doing.

3. What Is a CROSS JOIN?


A CROSS JOIN is another join type available in SQL. It returns a Cartesian product. This
means that CROSS JOIN will return each row from the first table combined with each
row from the second table.

It is not used very often. But if you’re tempted to use it, think twice. Returning all the
row combinations might take some time – if the query finishes at all!

As an example, let’s use the tables as in the previous two questions. To write the query,
use the CROSS JOIN keyword. Since this is a join type that returns all the row
combinations from all tables, there’s no ON clause. Have a look:

Code

SELECT
fp.id,
first_name,
last_name,
national_team_id,
country,
games_played
FROM football_players fp
CROSS JOIN national_team nt;

Here’s the output. All players in the football_players table are listed with all countries
in the national_team table.

id first_name last_name national_team_id country

1 Gianfranco Zola 1 Italy

2 Virgil van Dijk 2 Italy

3 Marcus Rashford 3 Italy

4 Kylian Mbappé 5 Italy

5 Phil Foden 3 Italy

6 Frenkie de Jong 2 Italy

7 Mario Balotelli 1 Italy

8 Erling Haaland 6 Italy

1 Gianfranco Zola 1 Netherlands

2 Virgil van Dijk 2 Netherlands

3 Marcus Rashford 3 Netherlands

4 Kylian Mbappé 5 Netherlands

5 Phil Foden 3 Netherlands

6 Frenkie de Jong 2 Netherlands

7 Mario Balotelli 1 Netherlands

8 Erling Haaland 6 Netherlands


SQL
1 Gianfranco Zola 1 England
test
2 Virgil van Dijk 2 England

3 Marcus Rashford 3 England

4 Kylian Mbappé 5 England

5 Phil Foden 3 England

6 Frenkie de Jong 2 England

7 Mario Balotelli 1 England

8 Erling Haaland 6 England

1 Gianfranco Zola 1 Croatia

2 Virgil van Dijk 2 Croatia

3 Marcus Rashford 3 Croatia

4 Kylian Mbappé 5 Croatia

5 Phil Foden 3 Croatia

6 Frenkie de Jong 2 Croatia

7 Mario Balotelli 1 Croatia

8 Erling Haaland 6 Croatia

You can learn more about the CROSS JOIN in this illustrated guide to the SQL CROSS
JOIN.

4. What Is a Self-Join in SQL?


As you probably suspect, a self-join occurs when the table is joined with itself. It’s
important to note that it’s not a distinct command in SQL: any JOIN type can be used
to join a table with itself.

The joining is done like any other JOIN , but this time you’ll reference the same table
on both sides of the JOIN keyword. Self-joining is especially useful when a table has a
foreign key referencing its primary key. This lets you query hierarchical data, such as
family trees or a company’s organizational hierarchy. It’s also helpful when you want to
find pairs of values. In the example below, we are looking for players from the same
national team:

Code

SELECT
fp1.id,
fp1.first_name,
fp1.last_name,
fp1.national_team_id,
fp2.id AS id_2,
fp2.first_name AS first_name_2,
fp2.last_name AS last_name_2,
fp2.national_team_id as national_team_id_2
FROM football_players fp1
JOIN football_players fp2
ON fp1.id <> fp2.id
AND fp1.national_team_id = fp2.national_team_id;

Self-joining means instead of two tables, you’re specifying the same table twice: once
in the FROM clause and once after the JOIN clause. Since you’re using the same table
twice, you must use aliases for the tables. Each occurrence of the table should be given
a distinct alias ( fp1 , fp2 in our query) so that it is clear which occurrence of the table
we’re referring to.

We’re joining players from the same national team (their national_team_id values are
equal). However, we don’t want to list a player with himself, so we exclude the case
when fp1.id and fp2.id are equal.

The query’s output is this:

id first_name last_name national_team_id id_2 first_n

SQL 1 Gianfranco Zola 1 7 Mario

test 2 Virgil van Dijk 2 6 Frenkie


3 Marcus Rashford 3 5 Phil

5 Phil Foden 3 3 Marcus

6 Frenkie de Jong 2 2 Virgil

7 Mario Balotelli 1 1 Gianfra

You can use the national_team_id columns from both tables to confirm that, really,
Gianfranco Zola and Mario Balotelli played for the same team.

You can learn more about this in our article on self-join examples.

5. Join Two Tables Using a Two-


Column JOIN
You’re given two tables. The first one is employee , which has the following data:

id first_name last_name

1 Steve Bergman

2 Steve Johnson

3 Steve King

The second table is customer , which has the following data:

id first_name last_name

1 Ann Coleman

2 Steve Bergman

3 Steve Young

4 Donna Winter

5 Steve King

Your task is to return all the employees that are also the company’s customers.
Unfortunately, you can’t use the id column because it’s the employee ID in one table
and the customer ID in another. In other words, there’s no single column in one table
that references the other.

The solution is to join tables on the first and last name, i.e. to use a two-column JOIN .

The code below will first join the tables on the first name. After that, the keyword AND
sets the second joining condition, which is the last name. That way, you’ll get data from
both tables where the first name/last name combination is the same. If we used only
one of these columns, we could’ve gotten the wrong data because employees and
customers can have the same first name but different surnames (or vice versa). Here’s
the query:

Code

SELECT
e.first_name,
e.last_name
FROM employee e
JOIN customer c
ON e.first_name = c.first_name
AND e.last_name = c.last_name;

Here’s the code output.:

first_name last_name

Steve Bergman

Steve King

The result shows that Steve Bergman and Steve King are both the company’s
SQL employees and customers.

test
6. Join Two Tables Using a Non-
Equi Join
Up till now, we’ve been using equi-joins: joins where there’s an equality sign in the ON
condition. Conversely, the non-equi join is a join that has a non-equality condition in the
ON clause.

This time, we have data on mobile users and their data usage. The first table is
mobile_user , which shows mobile users and their monthly mobile data limit in MB:

id first_name last_name mobile_data_limit

1 Michael Watson 5,000

2 Nicole Gomez 10,000

3 Sam Stone 8,000

The second table is data_usage , which shows the user's actual monthly data usage in
MB:

id mobile_user_id data_used period

1 1 4,987 2022_10

2 2 6,875 2022_10

3 3 12,547 2022_10

4 1 5,037 2022_11

5 2 11,111 2022_11

6 3 4,897 2022_11

The task is to find all the data where the actual usage was above the monthly limit. We
want to see the user’s first and last name, monthly limit, actual data used, and the time
period.

The solution is to use the non-equi join, as shown below:

Code

SELECT
first_name,
last_name,
mobile_data_limit,
data_used,
period
FROM mobile_user mu
JOIN data_usage du
ON mu.id = du.mobile_user_id
AND mobile_data_limit < data_used;

The query selects all the required info from two tables. The tables are joined using an
INNER JOIN . We first join them where the user ID is the same. Then, we add the second
condition after the AND keyword. Here we have a non-equality condition that will get
us data where the limit is below the monthly usage.

You can see the result below:

first_name last_name mobile_data_limit data_used period

Sam Stone 8,000 12,547 2022_10

Michael Watson 5,000 5,037 2022_11

Nicole Gomez 10,000 11,111 2022_11

If you’re interested, here are some more non-equi join examples.

7. What Does DISTINCT Do?


SQL
DISTINCT ’s purpose, in general, is to remove duplicate values. Or, put another way, to
test
show unique values in your query’s output.
Imagine you’re working with this loans table that shows loan IDs and their durations
in months.

loan_id loan_duration

100112 60

100205 60

100208 48

100333 120

100357 48

100398 120

Different loans can have the same duration, so you want to extract the list of possible
loan durations. You can do this by using DISTINCT :

Code

SELECT DISTINCT loan_duration


FROM loans
ORDER BY loan_duration;

The output shows there are loans with durations of 48, 60, and 120 months:

loan_duration

48

60

120

DISTINCT can be used in SELECT with one column to show only the unique values of
that column, as in the above example. If it’s used in SELECT but with multiple columns,
then the output will show the unique combinations of all these columns.

You can also use DISTINCT with aggregate functions. If you do that, your query will
eliminate duplicate aggregation results.

You can see examples of these usages in our article talking about the role of DISTINCT
in SQL.

8. What Does GROUP BY Do in


SQL?
GROUP BY is an SQL clause used for arranging data into groups based on a common
value or values. It is most commonly used with aggregate functions; this combination
will return aggregated data for every group. However, it’s important to know that using
aggregate functions within the GROUP BY clause is not allowed.

The general GROUP BY syntax is:

Code

SELECT
column_1,
column_2,
…,
FROM table_name
WHERE …
GROUP BY column_1, column_2
HAVING …
ORDER BY column_1, column_2;

Suppose there’s the table salaries :

id first_name last_name salary department

SQL 1 Nicholas Poirot 4,798.44 IT

test 2 Samantha Wolf 5,419.24 IT


3 Stewart Johnsons 5,419.24 IT

4 Jackie Biden 8,474.54 Sales

5 Mark Hamilton 10,574.84 Sales

6 Mariana Costa 9,747.54 Sales

7 Paul Stewart 3,498.12 Accounting

8 Mary Rutte 4,187.23 Accounting

9 Thomas Schwarz 3,748.55 Accounting

We’ll use GROUP BY and AVG() to find the average salary by department:

Code

SELECT
department,
AVG(salary) AS average_salary
FROM salaries
GROUP BY department;

We want to see the departments, so we select this column. To calculate the average
salary, we apply the AVG() function to the column salary .

All the columns listed in GROUP BY define the data groups. In our example, the groups
are defined by the department column: we calculate the average salary for each
department.

Our data grouping and aggregation looks like this:

department average_salary

Accounting 3,811.30

Sales 9,598.97

IT 5,212.31

The average salary in Accounting is 3,811.30. The average salaries in the other two
departments are 9,598.97 and 5,212.31, respectively.

When writing a query, GROUP BY always has to come after WHERE but before the
HAVING clause. You can learn more about it in this article about GROUP BY in SQL.

9. How Do You Filter GROUP BY


Groups?
Once you get the groups you specified in GROUP BY , sometimes you’ll want to filter
them. The clue to doing that lies in the syntax from the previous question. The clause
that allows you to filter groups is HAVING .

After the filtering criteria are specified in HAVING , the query will return all the data that
satisfies the criteria. All other data will be filtered out.

Let’s learn SQL from A to Z in our complete track!

Here’s how it works on the data from the previous question if we had to show only
departments with an average salary below 5.500 dollars.

Code

SELECT
department,
AVG(salary) AS average_salary
FROM salaries
GROUP BY department
HAVING AVG(salary) < 5500;

SQL
test
The code is very similar to the one in the previous question. The difference is the
HAVING clause. We use it to filter the results and show only departments with salaries
below 5,500.

Here’s what the code returns:

department average_salary

Accounting 3,811.30

IT 5,212.31

The department missing from the output is Sales because its average salary is 9,598.97.

You can find additional practical examples of this clause in this article explaining
HAVING in SQL.

10. What’s the Difference


Between WHERE and HAVING?
If you know the answers to the previous two questions, you probably know the answer
to this question.

The main difference is that WHERE is used for filtering data before it’s grouped. Its
position in the SELECT statement shows this: it comes before GROUP BY . Due to its
purpose, no aggregate functions are allowed in WHERE .

HAVING , on the contrary, is used to filter data after grouping; that’s why it is used after
GROUP BY . Also, HAVING allows conditions that include aggregate functions.

The best way to learn the distinction is to read this article on WHERE vs. HAVING in
SQL.

11. What Will the Following


Query Attempting to Filter NULLs
Return?
You’ll often get this type of question at your advanced SQL interview: you’ll be given a
code and have to describe the query's return. While writing and reading SQL code go
hand-in-hand, it still feels different when you have to analyze the code someone else
wrote.

You have data in the table contributors :

id first_name last_name start_date termination_date

1 Valeria Bogdanov 2022-10-11 NULL

2022-04-
2 Nicholas Bertolucci 2022-11-11
07

2022-05-
3 Mathilde Bauman 2022-10-01
25

2022-01-
4 Trevor Trucks NULL
28

2022-03-
5 Maria Szabo NULL
15

What will this code return?

Code

SELECT
first_name,
last_name,
start_date,
termination_date
SQL FROM contributors
WHERE termination_date != '2022-10-01';
test
If you answer that it will return all rows except ID = 3, you’re wrong! This is a kind of
trick question. When reading the WHERE condition, you could read it as: return all the
data where the termination date is different from 2022-10-01. By looking at the table,
you would think it’s all rows except one.

It is, but not for SQL! As you can see, there are three rows with NULL values. For SQL,
NULL doesn’t equal a value; it’s a non-value. So when you set up the condition in
WHERE like that, you will be excluding all dates that are not equal to 2022-10-01 and
NULL values.

Here’s the output as proof:

first_name last_name start_date termination_date

2022-04-
Nicholas Bertolucci 2022-11-11
07

You can learn more about this and other comparison operators used with NULL.

12. Write a Query That Finds the


Number of Songs by Artist. Use
LEFT JOIN and COUNT().
Suppose you’re given two tables: artist and song .

Here’s the artist data:

id artist_name

1 Prince

2 Jimi Hendrix

3 Santana

4 Otis Redding

5 Lou Rawls

Below is the song data:

id artist_id song_title

1 1 Purple Rain

2 2 Purple Haze

3 3 Europa

4 1 Cream

5 1 Bambi

6 1 Why You Wanna Treat Me So Bad?

7 2 Spanish Castle Magic

8 3 Taboo

9 3 Incident at Neshabur

10 3 Flor D' Luna

You need to use LEFT JOIN and COUNT() to find all the artists, their IDs, and the
number of their songs in the database.

You could be tempted to suggest this solution:

Code

SELECT
a.id,
artist_name,
COUNT(*) AS number_of_songs
FROM artist a
SQL LEFT JOIN song s
ON a.id = s.artist_id
test GROUP BY a.id, artist_name
ORDER BY a.id;

Let’s take a look at the output:

id artist_name number_of_songs

1 Prince 4

2 Jimi Hendrix 2

3 Santana 4

4 Otis Redding 1

5 Lou Rawls 1

The output shows all the artists; that’s OK. However, the number of songs for Otis
Redding and Lou Rawls is one, which is wrong! Take a look at the table song , and
you’ll see there are no artist IDs equal to 4 or 5.

What went wrong? When using COUNT(*) with LEFT JOIN , the aggregate function will
count all the non-matched values ( NULLs ). That’s why the result showed one song
each for Otis Redding and Lou Rawls, even though they have no songs in the table.

COUNT(*) is used to count all the rows. To give a correct answer, you should use
COUNT(song_title) instead.

Code

SELECT
a.id,
artist_name,
COUNT(song_title) AS number_of_songs
FROM artist a
LEFT JOIN song s
ON a.id = s.artist_id
GROUP BY a.id, artist_name
ORDER BY a.id;

This code will give you the right output:

id artist_name number_of_songs

1 Prince 4

2 Jimi Hendrix 2

3 Santana 4

4 Otis Redding 0

5 Lou Rawls 0

The number of songs by Prince, Jimi Hendrix, and Santana stayed the same as in the
previous output. However, the number of songs by the other two artists is now zero,
and that’s the correct count.

13. What’s the Difference


Between JOIN and UNION?
JOIN is an SQL clause used for joining two or more tables. It allows using data from all
the joined tables. In other words, columns from all tables are shown next to each
other, meaning data is stacked horizontally.

UNION is a set operator used for combining the results of two or more SELECT
statements. Data is stacked vertically. One of the requirements when using UNION is
there has to be an equal number of columns in all unionized SELECT statements. Also,
all the selected columns have to be of the same data type.

14. What’s the Difference


SQL
test Between UNION and UNION ALL?
What they have in common is that both are set operators. Also, both operators are
used for the same purpose: merging data from two or more SELECT statements.

The requirements regarding the number of columns and their data type are also the
same.

Now, the difference is that UNION returns only unique records. On the other hand,
UNION ALL returns all the records, which includes duplicates.

Usually, UNION ALL is faster because it doesn’t sort the result to remove the duplicates.
The rule of thumb is to use UNION ALL by default. Use UNION only if you need unique
results or you’re absolutely sure your query won’t return duplicate data.

You can learn more about their syntax and use in this article about the differences
between UNION and UNION ALL.

15. What Is a Subquery in SQL?


A subquery is a query written inside another SQL query. The ‘another’ query is called
the main query, while a subquery is sometimes also called a nested query.

Subqueries can be used in the SELECT , INSERT , UPDATE , and DELETE statements.
They can also be used in clauses like FROM or WHERE , which is the most common use.

Here’s an example. The table is products , and it stores information on product names,
quantities, and categories:

id product_name quantity product_category

Apple MacBook Air


1 (2020) MGN63N/A 319 Laptop
Space Gray

Fairphone 4 128GB
2 208 Mobile phone
Green 5G

Apple iMac 24" (2021)


16GB/512GB Apple
3 157 Desktop
M1 with 8 core GPU
Silver

4 HP 17-cp0971nd 487 Laptop

Huawei P30 Pro -


5 148 Mobile phone
128GB - Blue

Lenovo Legion T5 -
AMD Ryzen 9 - 32 GB
6 - 2TB HDD+SSD - 514 Desktop
Windows 10 Home
PC

Toshiba Dynabook
7 Satellite Pro E10-S- 207 Laptop
101 Notebook

Samsung Galaxy S23


8 5G - 256GB - 56 Mobile phone
Phantom Black

Intel Compleet PC |
9 459 Desktop
Intel Core i7-10700

We’ll use a subquery and show the total quantity by product category, but only for the
individual products whose quantity is above the average quantity for all products.

Here’s the solution:

Code

SELECT
product_category,
SUM(quantity) AS product_quantity
FROM products
WHERE quantity > (SELECT AVG(quantity)
FROM products)
GROUP BY product_category;

SQL
The query selects the product category and sums the quantity using the SUM()
test aggregate function. There’s a condition in WHERE that says only those individual
products with a quantity above the average will be included in the sum. We use the
subquery and the AVG() function to get this average.

The query returns two rows:

product_category product_quantity

Laptop 806

Desktop 973

One category is missing because it doesn’t satisfy the filtering criteria – mobile phones.

There are different types of subqueries, such as scalar, multiple-row, and correlated
queries. You can learn more about them in our article on subquery types.

Complete learning path for modern SQL. Try out our SQL from A to Z track. 7
hands-on SQL courses with over 800 exercises!

16. Write a Query to Return


Salespersons and Their Monthly
Sales Data Above Their Personal
Sales Average. Use a Correlated
Subquery.
A correlated subquery is a type of subquery that uses values from the outer query. It is
checked once for each row the outer query returns, which can slow performance.

However, the question insists on using it, so let’s see the data.

The first table is salesperson :

id first_name last_name

1 Nina Lee

2 Carolina Green

3 Mick Johnson

The other table is sales :

id salesperson_id monthly_sales period

1 1 1,200.47 2021_10

2 2 5,487.22 2021_10

3 3 700.47 2021_10

4 1 15,747.54 2021_11

5 2 16,700.87 2021_11

5 3 14,322.87 2021_11

6 1 9,745.55 2021_12

7 2 9,600.97 2021_12

8 3 6,749.58 2021_12

Your task is to use a correlated subquery and return the salesperson’s full name, their
monthly sales, and the periods where their sales are above their personal average.

Here’s the solution:

Code

SELECT
first_name,
SQL last_name,
monthly_sales,
test period
FROM salesperson sp
JOIN sales s
ON sp.id = s.salesperson_id
WHERE monthly_sales >
(SELECT AVG(monthly_sales)
FROM sales
WHERE salesperson_id = sp.id);

The query selects all the required columns. This data is from both tables, so we joined
them.

Now comes the crucial part. To filter data, we use the WHERE clause. The condition
says the query should return all data where monthly sales are higher than each
salesperson’s average sales. How do we calculate these individual average sales? By
using the AVG() function in the subquery that we write in the WHERE clause.

Here’s the output:

first_name last_name monthly_sales period

Nina Lee 15,747.54 2021_11

Carolina Green 16,700.87 2021_11

Mick Johnson 14,322.87 2021_11

Nina Lee 9,745.55 2021_12

There are some more examples in this article on correlated subqueries.

17. What Are Window Functions


in SQL?
SQL window functions get their name from the fact they are applied to a data window.
This window is simply a set of rows related to the current row.

Window functions are initiated by the OVER() clause. Another important clause is
PARTITION BY , which defines data partitions within a window frame. When this clause is
omitted, the partition is the entire result table. When PARTITION BY is used, you can
define one or more columns by which data will be partitioned. You can look at it as
GROUP BY for window functions.

Another important clause is ORDER BY . It sorts data within the window. In the context of
window functions, this clause gives instructions on the order in which the function will
be executed.

To learn more, refer to this window functions article.

18. What’s the Difference


Between Window Functions and
GROUP BY?
The only similarity they share is that both GROUP BY and window functions can be –
and very often are – used with the aggregate functions, and they both work on a set of
rows.

However, when using GROUP BY , the output is shown as groups, and you can’t see the
individual rows forming the group.

Window functions don’t have such problems. One of their characteristics is that they
don’t collapse the individual rows when showing aggregated data. This means it’s
possible to show aggregated and non-aggregated data simultaneously.

Window functions are used for more than data aggregation, as you’ll see in the
following question. But if you want to know more about the current topic, we have an
article that explains window functions vs. GROUP BY.
SQL
test
19. What Window Functions Do
You Know?
SQL window functions can be generally divided into four categories:

Ranking Functions
Distribution Functions
Analytic Functions
Aggregate Functions

The ranking functions are:

ROW_NUMBER() – Returns a unique number for each row within a partition; tied
values have different row numbers.
RANK() – Ranks data within a partition; tied values have the same rank, and
there’s a gap following ties (e.g. 1, 2, 3, 3, 5).
DENSE_RANK() – Ranks data within a partition; tied values have the same rank and
there’s no ranking gap (e.g. 1, 2, 3, 3, 4).

The distribution functions are:

PERCENT_RANK() – Returns the relative rank within a partition.


CUME_DIST() – Returns the cumulative distribution within a partition.

The analytic functions are:

LEAD() – Allows accessing values from a subsequent row in relation to the


current row.
LAG() – Allows accessing values from a previous row in relation to the current
row.
NTILE() – Divides rows within a partition into approximately equal groups.
FIRST_VALUE() – Allows accessing values from the first row within a partition.
LAST_VALUE() – Allows accessing values from the last row within a partition.
NTH_VALUE() – Allows accessing the n-th row within a partition.

Finally, the aggregate functions are:

AVG() – Returns an average value for the rows in a partition.


COUNT() – Returns the number of values in the rows in a partition.
MAX() – Returns the maximum value for the rows in a partition.
MIN() – Returns the minimum value for the rows in a partition.
SUM() – Returns the sum value of the rows in a partition.

Our Window Functions Cheat Sheet will provide you with more information about all
these functions.

The best way to learn about window functions is through practice. I recommend
this Window Functions course. It has 218 interactive exercises, which equals about
20 hours of coding.

20. How Do You Create a


Ranking in SQL?
The easiest way to rank data in SQL is to use one of three ranking window functions:

ROW_NUMBER()
RANK()
DENSE_RANK()

You’re given this dataset named album_sales with the following data:

id artist album copies_sold

1 Eagles Hotel California 42,000,000

SQL 2 Led Zeppelin Led Zeppelin IV 37,000,000

test 3 Shania Twain Come On Over 40,000,000


Fleetwood
4 Rumours 40,000,000
Mac

5 AC/DC Back in Black 50,000,000

6 Bee Gees Saturday Night Fever 40,000,000

Michael
7 Thriller 70,000,000
Jackson

The Dark Side of the


8 Pink Floyd 45,000,000
Moon

Whitney
9 The Bodyguard 45,000,000
Houston

Their Greatest Hits


10 Eagles 44,000,000
(1971-1975)

These are the sales of the ten best-selling albums in history. As you can see, the albums
are not ranked. That’s what we’ll do here: rank them from best- to worst-selling using
window functions.

If you use ROW_NUMBER() , the query will look like this:

Code

SELECT
ROW_NUMBER() OVER (ORDER BY copies_sold DESC) AS rank,
artist,
album,
copies_sold
FROM album_sales;

The syntax is simple. First, you choose the window function. Then you use the
mandatory OVER() clause that signals it’s a window function. In ORDER BY , you sort
data descendingly. This now means the row numbers will be assigned according to the
copies sold from high to low.

Of course, list all other columns you need and reference the table to get the same
output:

rank artist album copies_sold

Michael
1 Thriller 70,000,000
Jackson

2 AC/DC Back in Black 50,000,000

Whitney
3 The Bodyguard 45,000,000
Houston

The Dark Side of


4 Pink Floyd 45,000,000
the Moon

Their Greatest Hits


5 Eagles 44,000,000
(1971-1975)

6 Eagles Hotel California 42,000,000

7 Shania Twain Come On Over 40,000,000

Fleetwood
8 Rumours 40,000,000
Mac

Saturday Night
9 Bee Gees 40,000,000
Fever

10 Led Zeppelin Led Zeppelin IV 37,000,000

As you can see, the albums are ranked from one to ten. Two albums sold 45 million
copies. However, they are ranked differently (third and fourth) according to random
criteria. The same happens with three albums that sold 40 million copies.

If you use RANK() , the syntax is the same, except you use a different window function:

Code

SQL
test
SELECT
RANK() OVER (ORDER BY copies_sold DESC) AS rank,
artist,
album,
copies_sold
FROM album_sales;

However, the output is different:

rank artist album copies_sold

Michael
1 Thriller 70,000,000
Jackson

2 AC/DC Back in Black 50,000,000

Whitney
3 The Bodyguard 45,000,000
Houston

The Dark Side of


3 Pink Floyd 45,000,000
the Moon

Their Greatest Hits


5 Eagles 44,000,000
(1971-1975)

6 Eagles Hotel California 42,000,000

7 Shania Twain Come On Over 40,000,000

Fleetwood
7 Rumours 40,000,000
Mac

Saturday Night
7 Bee Gees 40,000,000
Fever

10 Led Zeppelin Led Zeppelin IV 37,000,000

You can see that the tied albums are ranked as third (two times). The next non-tie
album is ranked fifth. The same happens with the albums ranked seventh.

Let’s see what happens if we use DENSE_RANK() :

Code

SELECT
DENSE_RANK() OVER (ORDER BY copies_sold DESC) AS rank,
artist,
album,
copies_sold
FROM album_sales;

Here’s the result:

rank artist album copies_sold

Michael
1 Thriller 70,000,000
Jackson

2 AC/DC Back in Black 50,000,000

Whitney
3 The Bodyguard 45,000,000
Houston

The Dark Side of


3 Pink Floyd 45,000,000
the Moon

Their Greatest Hits


4 Eagles 44,000,000
(1971-1975)

5 Eagles Hotel California 42,000,000

6 Shania Twain Come On Over 40,000,000

Fleetwood
6 Rumours 40,000,000
Mac

Saturday Night
6 Bee Gees 40,000,000
Fever

7 Led Zeppelin Led Zeppelin IV 37,000,000

The first tied albums are ranked as third, which is the same as in the previous result. But
SQL the difference is that the next non-tie rank is fourth – meaning the ranking is not
test skipped.
The three other tied albums are now ranked sixth, not seventh as before. Also, the
highest rank is seventh, not tenth.

As you can see, each method returns different results. You should use the one that best
suits your data and what you want to achieve with ranking. To learn more, read the
article on ranking rows in SQL.

21. What’s the Difference


Between RANK() and
DENSE_RANK()?
We already touched on the difference in the previous question. You saw it there in a
practical example, and now let’s formulate it to answer this question.

RANK() assigns the same rank to rows with the same values. When it gets to the
following non-tied row, it skips the rank by the number of tied ranks.

DENSE_RANK() also gives the same rank to tied values. However, the rank is not skipped
when it reaches the following non-tied row. In other words, DENSE_RANK() ranks data
sequentially.

More details are explained in this article about RANK() and DENSE_RANK() differences.

22. Find the Top n Rows in SQL


Using a Window Function and a
CTE.
This is a common question and can be solved in several ways. We will use the window
function in a CTE to return the desired result.

The available data is stored in the salary table:

id first_name last_name salary department

1 Tim Thompson 10,524.74 Sales

2 Martina Hrabal 7,895.14 Accounting

3 Susan Truman 15,478.69 Sales

4 Ciro Conte 8,794.41 Accounting

5 Jorge De Lucia 7,489.15 Sales

6 Carmen Lopez 10,479.15 Accounting

7 Catherine Molnar 8,794.89 Sales

8 Richard Buchanan 12,487.69 Accounting

9 Mark Wong 9,784.19 Sales

10 Silvia Karelias 9,748.64 Accounting

The task here is to return the top three highest paid employees in each department,
together with their salary and department.

The approach is this:

Code

WITH ranking AS (
SELECT
first_name,
last_name,
salary,
department,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM salary
)
SQL
test SELECT *
FROM ranking

You might also like