SQL Inter Q&A
Top N
Lesson Content
What is Top N Problem?
Top N Records
Top N Per Category
Top N Per Category With Ties
📌 Sample Questions
• What are the Top 5 highest-rated movies?
• What are the Top 3 highest paid employees per department?
• What are the Top 3 highest paid employees per department when there are ties?
Top N Records
Query the 5th largest value in the table t.
Assume the values are unique, and there are more than 5 values in the table.
Table t:
value
10
50
Although we are not returning all top 5 values, we still treat this as a Top N problem.
The only difference is that we exclude the top N - 1 values from the result.
LIMIT and OFFSET
MySQL
SELECT value
FROM t
ORDER BY value DESC
LIMIT 1
OFFSET 4;
Sort the values in descending order, use OFFSET 4 to skip the four largest values, and LIMIT 1 to return the 5th row.
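To try the LIMIT/OFFSET query locally, here is a minimal sketch that runs it against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for MySQL here; it accepts the same LIMIT ... OFFSET syntax. The sample values are hypothetical, since only 10 and 50 survive from table t above.

```python
import sqlite3

# Hypothetical sample data: unique values, more than 5 of them
values = [10, 50, 30, 20, 40, 60]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (value INTEGER)")
conn.executemany("INSERT INTO t (value) VALUES (?)", [(v,) for v in values])

# Skip the 4 largest values, then return the next one: the 5th largest
fifth_largest = conn.execute(
    """
    SELECT value
    FROM t
    ORDER BY value DESC
    LIMIT 1
    OFFSET 4
    """
).fetchone()[0]

print(fifth_largest)  # 20
conn.close()
```

Sorted in descending order the values are 60, 50, 40, 30, 20, 10, so skipping four rows lands on 20.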
MS SQL Server
SELECT value
FROM t
ORDER BY value DESC
OFFSET 4 ROWS
FETCH NEXT 1 ROWS ONLY;
SQL Server has no LIMIT keyword; use OFFSET ... ROWS to skip rows and FETCH NEXT ... ROWS ONLY to specify how many rows to return.
Window Functions
SELECT value
FROM (
SELECT value,
ROW_NUMBER() OVER(ORDER BY value DESC) AS rn
FROM t
) AS rk_table
WHERE rn = 5;
💡 Window functions are evaluated after WHERE, so we cannot filter on a window function's result directly in the same query → compute it in a subquery and filter in the outer query.
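The same lookup with ROW_NUMBER(), again as a sketch on hypothetical data in SQLite via Python's sqlite3 (window functions need SQLite 3.25 or newer; sqlite3.sqlite_version shows the bundled version). The ranking happens in the inner query and the filter in the outer one. The alias rn is used rather than row because ROW is a SQL keyword and is reserved in some engines.

```python
import sqlite3

values = [10, 50, 30, 20, 40, 60]  # hypothetical sample data

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (value INTEGER)")
conn.executemany("INSERT INTO t (value) VALUES (?)", [(v,) for v in values])

# Rank in the inner query; filter on the rank in the outer query
fifth_largest = conn.execute(
    """
    SELECT value
    FROM (
        SELECT value,
               ROW_NUMBER() OVER (ORDER BY value DESC) AS rn
        FROM t
    ) AS rk_table
    WHERE rn = 5
    """
).fetchone()[0]

print(fifth_largest)  # 20
conn.close()
```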
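Comparing the three ranking functions on the same sorted values:
value   ROW_NUMBER()   DENSE_RANK()   RANK()
5       1              1              1
4.9     2              2              2
4.9     3              2              2
4.8     4              3              4
ROW_NUMBER() gives every row a unique number and breaks ties arbitrarily. DENSE_RANK() gives tied rows the same rank and does not skip any rank afterward. RANK() gives tied rows the same rank, and each row's rank is one plus the number of rows with a strictly higher value, so ranks are skipped after a tie (1, 2, 2, 4).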
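A quick way to check the table above is to compute all three functions side by side. This sketch uses SQLite through Python's sqlite3 as a stand-in engine; the table name scores is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (value REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO scores (value) VALUES (?)", [(5.0,), (4.9,), (4.9,), (4.8,)]
)

# All three functions share the same ordering; they differ only in tie handling
rows = conn.execute(
    """
    SELECT value,
           ROW_NUMBER() OVER (ORDER BY value DESC) AS row_num,
           DENSE_RANK() OVER (ORDER BY value DESC) AS dense_rk,
           RANK()       OVER (ORDER BY value DESC) AS rk
    FROM scores
    ORDER BY row_num
    """
).fetchall()

for r in rows:
    print(r)  # (value, ROW_NUMBER, DENSE_RANK, RANK)
conn.close()
```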
Top N Per Category
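🌿 LIMIT and OFFSET cannot be used here, because they limit the whole result set rather than each category. Use a window function such as ROW_NUMBER() (or RANK() / DENSE_RANK() when ties matter); the window function is the better choice.
Query the top 5 restaurants with the highest average rating in each city.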
If two restaurants have the same average ratings, return either restaurant.
Table rating:
… … … …
💡 Idea:
1. Compute average ratings for all the restaurants.
2. Sort ratings.
3. Select the top 5 in each city.
SELECT
name,
city,
AVG(rating * 1.0) AS avg_rating
FROM rating
GROUP BY name, city;
Since the ratings are integers, multiply by 1.0 to avoid integer division: in some engines, such as SQL Server, AVG over an integer column returns a truncated integer.
2. Sort ratings.
WITH avg_ratings AS (
SELECT
name, city,
AVG(rating * 1.0) AS avg_rating
FROM rating
GROUP BY name, city
)
SELECT
name, city, avg_rating,
ROW_NUMBER() OVER(PARTITION BY city ORDER BY avg_rating DESC) AS rn
FROM avg_ratings;
Since we need exactly 5 restaurants per city and ties can be broken arbitrarily, ROW_NUMBER() is enough: it numbers the restaurants 1, 2, 3, ... within each city.
WITH avg_ratings AS (
SELECT
name, city,
AVG(rating * 1.0) AS avg_rating
FROM rating
GROUP BY name, city
),
rating_rank AS (
SELECT
name, city, avg_rating,
ROW_NUMBER() OVER(PARTITION BY city ORDER BY avg_rating DESC) AS rn
FROM avg_ratings
)
SELECT
name, city, avg_rating
FROM rating_rank
WHERE rn <= 5;
Keep only the rows numbered 5 or less → the 5 top-rated restaurants per city.
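Here is a runnable sketch of the full ROW_NUMBER() pipeline in SQLite via Python's sqlite3. Assumptions: the rating table has columns name, city and rating (inferred from the queries above, since the original table was lost), the rows are hypothetical, and N is bound as a parameter set to 2 instead of 5 to keep the sample small. Bravo and Delta tie at 4.0 in Austin, so ROW_NUMBER() keeps exactly one of them, chosen arbitrarily.

```python
import sqlite3

# Hypothetical data for table rating(name, city, rating)
ratings = [
    ("Alpha", "Austin", 5), ("Alpha", "Austin", 4),
    ("Bravo", "Austin", 4), ("Bravo", "Austin", 4),
    ("Delta", "Austin", 4), ("Delta", "Austin", 4),
    ("Charlie", "Austin", 3),
    ("Echo", "Boston", 5),
    ("Foxtrot", "Boston", 2), ("Foxtrot", "Boston", 3),
]
n = 2  # the lesson uses 5; 2 keeps the sample small

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rating (name TEXT, city TEXT, rating INTEGER)")
conn.executemany("INSERT INTO rating VALUES (?, ?, ?)", ratings)

top_n = conn.execute(
    """
    WITH avg_ratings AS (
        SELECT name, city, AVG(rating * 1.0) AS avg_rating
        FROM rating
        GROUP BY name, city
    ),
    rating_rank AS (
        SELECT name, city, avg_rating,
               ROW_NUMBER() OVER (PARTITION BY city ORDER BY avg_rating DESC) AS rn
        FROM avg_ratings
    )
    SELECT name, city, avg_rating
    FROM rating_rank
    WHERE rn <= ?
    """,
    (n,),
).fetchall()

for row in top_n:
    print(row)  # exactly n rows per city; the Bravo/Delta tie is broken arbitrarily
conn.close()
```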
❓ What if there are ties in the ranks, and we want to get all the restaurants with the top 5 ratings
per city? How do we modify the query?
If the restaurants have the same average ratings, return all restaurants with the same ratings.
value   ROW_NUMBER()   DENSE_RANK()   RANK()
5       1              1              1
4.9     2              2              2
4.9     3              2              2
4.8     4              3              4
WITH avg_ratings AS (
SELECT
name, city,
AVG(rating * 1.0) AS avg_rating
FROM rating
GROUP BY name, city
),
rating_rank AS (
SELECT
name, city, avg_rating,
DENSE_RANK() OVER(PARTITION BY city ORDER BY avg_rating DESC) as rk
FROM avg_ratings
)
SELECT
name, city, avg_rating
FROM rating_rank
WHERE rk <= 5;
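The same hypothetical data run through the DENSE_RANK() version shows the difference: both restaurants tied at 4.0 in Austin are kept, so that city returns more than N rows. As before, this is a sketch in SQLite via Python's sqlite3, with assumed column names and N = 2 instead of 5.

```python
import sqlite3

# Hypothetical data for table rating(name, city, rating); Bravo and Delta tie in Austin
ratings = [
    ("Alpha", "Austin", 5), ("Alpha", "Austin", 4),
    ("Bravo", "Austin", 4), ("Bravo", "Austin", 4),
    ("Delta", "Austin", 4), ("Delta", "Austin", 4),
    ("Charlie", "Austin", 3),
    ("Echo", "Boston", 5),
    ("Foxtrot", "Boston", 2), ("Foxtrot", "Boston", 3),
]
n = 2

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rating (name TEXT, city TEXT, rating INTEGER)")
conn.executemany("INSERT INTO rating VALUES (?, ?, ?)", ratings)

top_n_ties = conn.execute(
    """
    WITH avg_ratings AS (
        SELECT name, city, AVG(rating * 1.0) AS avg_rating
        FROM rating
        GROUP BY name, city
    ),
    rating_rank AS (
        SELECT name, city, avg_rating,
               DENSE_RANK() OVER (PARTITION BY city ORDER BY avg_rating DESC) AS rk
        FROM avg_ratings
    )
    SELECT name, city, avg_rating
    FROM rating_rank
    WHERE rk <= ?
    """,
    (n,),
).fetchall()

names = sorted(name for name, _, _ in top_n_ties)
print(names)  # ['Alpha', 'Bravo', 'Delta', 'Echo', 'Foxtrot']
conn.close()
```

Austin now returns three restaurants (ranks 1, 2, 2), while Charlie at rank 3 is still excluded.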
🔥 During interviews:
• Clarify the logic - whether to output top N records, or all records (≥ N) that match the top N
scores.
Ratios
Calculating Ratios
Two Methods
Example: Subscription Rate
Example: Immediate Order
Calculating Ratios
The problem is to compute a ratio or a percentage given some data entries or system logs.
For example:
Query the percentage of users who had some behavior from a table with user behavior logs.
Query the percentage of products that satisfy some criteria based on a purchase history table.
Usually the numerator and the denominator are counts that come from the same table.
Two Methods
1. Subquery: Use a subquery to compute the denominator and the main query to compute the
numerator and ratio.
2. CASE WHEN: Use CASE WHEN to compute the numerator and the main query to compute the
denominator and ratio.
Example: Subscription Rate
Write a query to calculate the premium subscription rate: the count of premium subscribers over the total number of users.
Table subscription:
user_id   premium
1         TRUE
2         FALSE
3         TRUE
Subquery method
SELECT
COUNT(user_id) * 1.0 / (SELECT COUNT(user_id) FROM subscription)
AS ratio
FROM subscription
WHERE premium = 'TRUE';
In MS SQL Server, dividing an integer count by another integer count performs integer division, which truncates 2 / 3 to 0; multiply the numerator by 1.0 to get a decimal result.
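SQLite also truncates integer / integer, so it can reproduce the pitfall. This sketch runs the subquery method via Python's sqlite3 on the three sample rows above, once without and once with * 1.0. Storing premium as the text 'TRUE' / 'FALSE' is an assumption that matches the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscription (user_id INTEGER, premium TEXT)")
conn.executemany(
    "INSERT INTO subscription VALUES (?, ?)",
    [(1, "TRUE"), (2, "FALSE"), (3, "TRUE")],
)

# Without * 1.0: integer / integer is integer division, so 2 / 3 becomes 0
int_ratio = conn.execute(
    """
    SELECT COUNT(user_id) / (SELECT COUNT(user_id) FROM subscription)
    FROM subscription
    WHERE premium = 'TRUE'
    """
).fetchone()[0]

# With * 1.0: the division is done in floating point
ratio = conn.execute(
    """
    SELECT COUNT(user_id) * 1.0 / (SELECT COUNT(user_id) FROM subscription) AS ratio
    FROM subscription
    WHERE premium = 'TRUE'
    """
).fetchone()[0]

print(int_ratio, ratio)  # 0 0.6666666666666666
conn.close()
```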
CASE WHEN method
SUM → numerator
The CASE WHEN expression returns 1 for each row that meets the condition and 0 otherwise; SUM over these values gives the count of rows that meet the condition.
SUM(
CASE
WHEN condition THEN 1
ELSE 0
END
)
SELECT
SUM(CASE WHEN premium = 'TRUE' THEN 1 ELSE 0 END) * 1.0 / COUNT(user_id)
AS ratio
FROM subscription;
SUM(CASE WHEN ...) gives the count of premium subscribers, and COUNT(user_id) gives the total number of users.
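A sketch of the CASE WHEN method on the same three sample rows, using SQLite via Python's sqlite3; it returns the numerator and the ratio side by side so the intermediate count is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscription (user_id INTEGER, premium TEXT)")
conn.executemany(
    "INSERT INTO subscription VALUES (?, ?)",
    [(1, "TRUE"), (2, "FALSE"), (3, "TRUE")],
)

# CASE WHEN turns each row into 1 or 0; SUM counts the 1s
premium_count, ratio = conn.execute(
    """
    SELECT
        SUM(CASE WHEN premium = 'TRUE' THEN 1 ELSE 0 END) AS premium_count,
        SUM(CASE WHEN premium = 'TRUE' THEN 1 ELSE 0 END) * 1.0 / COUNT(user_id) AS ratio
    FROM subscription
    """
).fetchone()

print(premium_count, ratio)  # 2 0.6666666666666666
conn.close()
```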
AVG
A simpler way to get the ratio: the average of a column of 1s and 0s is the fraction of rows that are 1, so AVG computes the denominator for us.
SELECT
AVG(CASE WHEN premium = 'TRUE' THEN 1.0 ELSE 0.0 END)
AS ratio
FROM subscription;
This method only works when the denominator is the total number of rows being averaged (all rows that pass the WHERE clause, if there is one).
Note: return the decimals 1.0 and 0.0 instead of 1 and 0; otherwise AVG over integers can return a truncated integer in engines such as SQL Server.
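A sketch checking that the AVG shortcut and the SUM / COUNT form agree on the sample rows, using SQLite via Python's sqlite3. SQLite's AVG always returns a floating-point value, so the 1.0 / 0.0 detail matters in engines like SQL Server rather than in this stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscription (user_id INTEGER, premium TEXT)")
conn.executemany(
    "INSERT INTO subscription VALUES (?, ?)",
    [(1, "TRUE"), (2, "FALSE"), (3, "TRUE")],
)

# AVG over 1.0 / 0.0 flags: the denominator (row count) is implicit
ratio_avg, ratio_sum = conn.execute(
    """
    SELECT
        AVG(CASE WHEN premium = 'TRUE' THEN 1.0 ELSE 0.0 END) AS ratio_avg,
        SUM(CASE WHEN premium = 'TRUE' THEN 1 ELSE 0 END) * 1.0 / COUNT(user_id) AS ratio_sum
    FROM subscription
    """
).fetchone()

print(ratio_avg, ratio_sum)
conn.close()
```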
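Example: Immediate Order
Table delivery:
customer_id   order_date   pref_delivery_date
1             2019-08-01   2019-08-02
2             2019-08-02   2019-08-02
1             2019-09-02   2019-09-04
3             2019-10-12   2019-10-12
3             2019-10-09   2019-10-11
2             2019-08-11   2019-08-13
4             2019-01-09   2019-01-09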
Query the percentage of users who placed their first order as an immediate order.
The first order is the earliest order that a customer placed based on the order date.
An immediate order is a same-day order: its customer preferred delivery date equals its order date.
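For example, customer 3 placed two orders: one on 2019-10-09 (preferred delivery 2019-10-11) and one on 2019-10-12 (preferred delivery 2019-10-12). Only the first order, placed on 2019-10-09, counts, and it is not an immediate order.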
Subquery method
SELECT
customer_id
FROM delivery
GROUP BY customer_id
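HAVING MIN(order_date) = MIN(pref_delivery_date)
This keeps the customers whose first order is immediate. Assuming a preferred delivery date is never earlier than its order date, the two minimums are equal only when the customer's earliest order has its preferred delivery on the same day, i.e. the first order is immediate.
Use a WITH common table expression to store the result we just got.
WITH first_order AS (
SELECT
customer_id
FROM delivery
GROUP BY customer_id
HAVING MIN(order_date) = MIN(pref_delivery_date)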
)
SELECT
COUNT(customer_id) * 1.0 /
(SELECT COUNT(DISTINCT customer_id) FROM delivery)
AS immediate_percentage
FROM first_order
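A runnable sketch of the subquery method on the lesson's sample delivery rows, using SQLite via Python's sqlite3. Dates are stored as ISO 'YYYY-MM-DD' text (an assumption), which makes MIN compare them chronologically.

```python
import sqlite3

# Sample rows from the lesson: (customer_id, order_date, pref_delivery_date)
deliveries = [
    (1, "2019-08-01", "2019-08-02"),
    (2, "2019-08-02", "2019-08-02"),
    (1, "2019-09-02", "2019-09-04"),
    (3, "2019-10-12", "2019-10-12"),
    (3, "2019-10-09", "2019-10-11"),
    (2, "2019-08-11", "2019-08-13"),
    (4, "2019-01-09", "2019-01-09"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE delivery (customer_id INTEGER, order_date TEXT, pref_delivery_date TEXT)"
)
conn.executemany("INSERT INTO delivery VALUES (?, ?, ?)", deliveries)

immediate_percentage = conn.execute(
    """
    WITH first_order AS (
        SELECT customer_id
        FROM delivery
        GROUP BY customer_id
        HAVING MIN(order_date) = MIN(pref_delivery_date)
    )
    SELECT COUNT(customer_id) * 1.0 /
           (SELECT COUNT(DISTINCT customer_id) FROM delivery) AS immediate_percentage
    FROM first_order
    """
).fetchone()[0]

print(immediate_percentage)  # 0.5
conn.close()
```

Customers 2 and 4 qualify; customers 1 and 3 do not, so the result is 2 / 4.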
CASE WHEN method
SELECT
AVG(CASE
WHEN first_order_date = pref_delivery_date THEN 1.0
ELSE 0.0 END
) AS immediate_percentage
FROM ...
We need a table that identifies each customer's first order → rank each customer's orders by order date and keep the rows ranked 1.
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rk
FROM delivery
Result:
customer_id   order_date   pref_delivery_date   order_rk
1             2019-08-01   2019-08-02           1
2             2019-08-02   2019-08-02           1
1             2019-09-02   2019-09-04           2
3             2019-10-12   2019-10-12           2
3             2019-10-09   2019-10-11           1
2             2019-08-11   2019-08-13           2
4             2019-01-09   2019-01-09           1
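To confirm which order gets rank 1, this sketch runs the ROW_NUMBER() query on the same sample rows in SQLite via Python's sqlite3. Customer 3's 2019-10-09 order comes first because the ranking is by order_date ascending, not by table position.

```python
import sqlite3

# Sample rows from the lesson: (customer_id, order_date, pref_delivery_date)
deliveries = [
    (1, "2019-08-01", "2019-08-02"),
    (2, "2019-08-02", "2019-08-02"),
    (1, "2019-09-02", "2019-09-04"),
    (3, "2019-10-12", "2019-10-12"),
    (3, "2019-10-09", "2019-10-11"),
    (2, "2019-08-11", "2019-08-13"),
    (4, "2019-01-09", "2019-01-09"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE delivery (customer_id INTEGER, order_date TEXT, pref_delivery_date TEXT)"
)
conn.executemany("INSERT INTO delivery VALUES (?, ?, ?)", deliveries)

# Number each customer's orders from earliest to latest
ranked = conn.execute(
    """
    SELECT customer_id, order_date, pref_delivery_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rk
    FROM delivery
    ORDER BY customer_id, order_rk
    """
).fetchall()

for row in ranked:
    print(row)
conn.close()
```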
Put the query into a WITH CTE and call the table ordered_delivery.
WITH ordered_delivery
AS (SELECT
*,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rk
FROM delivery)
SELECT
AVG(CASE
WHEN order_date = pref_delivery_date THEN 1.0
ELSE 0.0 END
) AS immediate_percentage
FROM ordered_delivery
WHERE order_rk = 1; -- first order
The main query keeps only each customer's first order (order_rk = 1) and averages the 1.0 / 0.0 flags to get the percentage.
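A sketch of the full CASE WHEN / AVG version on the same sample rows, in SQLite via Python's sqlite3. It uses a `--` comment, which both MySQL and SQL Server accept, and it arrives at the same 0.5 as the subquery method on this data.

```python
import sqlite3

# Sample rows from the lesson: (customer_id, order_date, pref_delivery_date)
deliveries = [
    (1, "2019-08-01", "2019-08-02"),
    (2, "2019-08-02", "2019-08-02"),
    (1, "2019-09-02", "2019-09-04"),
    (3, "2019-10-12", "2019-10-12"),
    (3, "2019-10-09", "2019-10-11"),
    (2, "2019-08-11", "2019-08-13"),
    (4, "2019-01-09", "2019-01-09"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE delivery (customer_id INTEGER, order_date TEXT, pref_delivery_date TEXT)"
)
conn.executemany("INSERT INTO delivery VALUES (?, ?, ?)", deliveries)

immediate_percentage = conn.execute(
    """
    WITH ordered_delivery AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rk
        FROM delivery
    )
    SELECT AVG(CASE WHEN order_date = pref_delivery_date THEN 1.0 ELSE 0.0 END)
           AS immediate_percentage
    FROM ordered_delivery
    WHERE order_rk = 1  -- first order
    """
).fetchone()[0]

print(immediate_percentage)  # 0.5
conn.close()
```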