Zero To Hero - 50 SQL Qns
Why SQL is a Critical Skill for All Tech Workers
In today’s data-driven world, SQL (Structured Query Language) is more than
just a tool for database administrators; it’s a foundational skill for anyone in
the tech industry. Whether you’re a software developer, data scientist,
product manager, or analyst, you will likely interact with data at some point
in your career. SQL allows you to query, manipulate, and analyze vast
amounts of data efficiently, making it an indispensable tool for decision-
making, product development, and business insights.
4. Incremental Learning: Books and tutorials can only take you so far
before concepts become abstract. SQL problem-solving progresses
naturally from simple to complex, helping you solidify each concept
before moving to the next. This structured learning approach ensures
you build a strong foundation.
1. Portability: ANSI SQL makes your queries portable, meaning they can
run on any database system that supports the standard. This is essential
when switching databases or working in diverse environments.
are used with standard aggregate functions like COUNT(), SUM(), AVG(), etc.
Why Are There So Many Ways to Solve the Same SQL Task?
SQL is a highly flexible language, offering multiple ways to solve the same
problem due to its declarative nature. Unlike procedural languages, SQL
focuses on what result you want, not how to achieve it, leaving
implementation details to the database engine. Here’s why multiple solutions
often exist for the same SQL task:
1. Built-in Functions and Constructs: SQL has a rich set of functions (e.g., COUNT(), SUM()) and constructs (e.g., CASE expressions and JOINs), and different combinations of these can yield the same outcome.
Now let’s solve 50 questions taken from the leetcode.com website. They range in difficulty from easy to medium to hard, and each question’s LeetCode number is provided for easy reference. Enjoy!
Question 1
Difficulty: Easy
#### Example
**Input**:
Products table:
**Output**:
| product_id |
|-------------|
| 1 |
| 3 |
**Explanation**:
Only products 1 and 3 are both low fat and recyclable.
[LeetCode 1757]
Solution
SELECT product_id
FROM products
WHERE low_fats = 'Y'
AND recyclable = 'Y';
The task is to find products that are both low fat and recyclable. The solution uses
a simple SELECT query to retrieve the product_id of such products from the
Products table.
Question 2
Difficulty: Easy
#### Example
**Input**:
Customer table:
| id | name | referee_id |
|-----|-------|------------|
| 1 | Will | null |
| 2 | Jane | null |
| 3 | Alex | 2 |
| 4 | Bill | null |
| 5 | Zack | 1 |
| 6 | Mark | 2 |
**Output**:
| name |
|-------|
| Will |
| Jane |
| Bill |
| Zack |
**Explanation**:
The customers referred by customer with `id = 2` are Alex and Mark.
The customers who are not referred by customer 2 are Will, Jane, Bill,
and Zack.
[LeetCode 584]
Solution
SELECT name
FROM customer
WHERE referee_id != 2
OR referee_id IS NULL;
The SQL query selects the names of customers who were either not referred by the customer with id = 2 or who have no referee at all (indicated by a NULL value in the referee_id column). The condition referee_id != 2 filters out those referred by customer 2. The second condition is needed because comparing NULL with any value yields UNKNOWN rather than TRUE, so referee_id != 2 on its own would silently drop every customer whose referee_id is NULL; referee_id IS NULL brings those customers back into the result.
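To see why the IS NULL branch matters, the sketch below runs the filter with and without it against the example rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL; NULL comparisons behave the same way in both engines. The ORDER BY id is added only to make the printed order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, referee_id INTEGER);
    INSERT INTO customer VALUES
        (1, 'Will', NULL), (2, 'Jane', NULL), (3, 'Alex', 2),
        (4, 'Bill', NULL), (5, 'Zack', 1), (6, 'Mark', 2);
""")

# Without the IS NULL branch: NULL != 2 is UNKNOWN, so those rows are dropped.
only_not_equal = [r[0] for r in conn.execute(
    "SELECT name FROM customer WHERE referee_id != 2 ORDER BY id")]

# With the IS NULL branch, customers with no referee are kept.
with_null_check = [r[0] for r in conn.execute(
    "SELECT name FROM customer WHERE referee_id != 2 OR referee_id IS NULL ORDER BY id")]

print(only_not_equal)   # ['Zack']
print(with_null_check)  # ['Will', 'Jane', 'Bill', 'Zack']
```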
Question 3
Big Countries
Difficulty: Easy
Write a query to find the name, population, and area of the big countries.
#### Example
**Input**:
World table:
**Output**:
[LeetCode 595]
Solution
The solution retrieves the name, population, and area of countries that are considered "big" based on the conditions provided. A country is classified as big if it either has an area of at least 3,000,000 km² or a population of at least 25,000,000. The query uses the SELECT statement to fetch the relevant columns from the world table, and the WHERE clause filters the rows to return only the countries that satisfy either of the two conditions (area or population). The OR operator ensures that countries meeting at least one of the criteria are included in the result.
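The query itself is not reproduced in this section, so the sketch below reconstructs it from the explanation and runs it with Python's sqlite3 module. The World rows are illustrative values chosen for the demo, not the article's input table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative rows (the article's World input table is not shown).
conn.executescript("""
    CREATE TABLE world (name TEXT PRIMARY KEY, area INTEGER, population INTEGER);
    INSERT INTO world VALUES
        ('Afghanistan',  652230, 25500100),
        ('Albania',       28748,  2831741),
        ('Algeria',     2381741, 37100000),
        ('Andorra',         468,    78115),
        ('Angola',      1246700, 20609294);
""")

# Reconstructed from the explanation: big = large area OR large population.
big = conn.execute("""
    SELECT name, population, area
    FROM world
    WHERE area >= 3000000
       OR population >= 25000000
    ORDER BY name
""").fetchall()
print(big)  # [('Afghanistan', 25500100, 652230), ('Algeria', 37100000, 2381741)]
```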
Question 4
Article Views I
Difficulty: Easy
- There is no primary key for this table, meaning the table may have duplicate
rows.
- Each row in this table indicates that a viewer viewed an article
(written by an author) on a specific date.
- `author_id` and `viewer_id` being equal means the author viewed their
own article.
#### Example
**Input**:
Views table:
**Output**:
| id |
|------|
| 4 |
| 7 |
**Explanation**:
Authors 4 and 7 viewed at least one of their own articles,
which is indicated by rows where `author_id = viewer_id`.
[LeetCode 1148]
Solution
SELECT DISTINCT author_id AS id
FROM views
WHERE author_id = viewer_id
ORDER BY author_id ASC;
The query selects distinct author_id values where the author has viewed their own article, indicated by the condition author_id = viewer_id. The DISTINCT keyword removes duplicate entries, so each author appears only once in the result. The output is then sorted in ascending order of author_id using the ORDER BY clause. The query thus retrieves a list of authors who have viewed at least one of their own articles.
Question 5
Invalid Tweets
Difficulty: Easy
#### Example
**Input**:
Tweets table:
| tweet_id | content |
|----------|----------------------------------|
| 1 | Vote for Biden |
| 2 | Let us make America great again! |
**Output**:
| tweet_id |
|----------|
| 2 |
**Explanation**:
- Tweet 1 has a length of 14 characters, so it is valid.
- Tweet 2 has a length of 32 characters, which exceeds the 15-character limit,
making it an invalid tweet.
[LeetCode 1683]
Solution
SELECT tweet_id
FROM tweets
WHERE LENGTH(content) > 15;
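This query retrieves the tweet_id of tweets that are considered invalid. The LENGTH(content) function measures the content column, and the WHERE clause keeps the tweets whose content is longer than 15 characters. The query returns the tweet_id of all such invalid tweets. Note that in MySQL, LENGTH() returns the length in bytes while CHAR_LENGTH() returns the number of characters; the two agree for plain ASCII text like this example, but CHAR_LENGTH() is the safer choice when content may contain multi-byte characters.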
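A small sketch of the filter, plus the characters-versus-bytes distinction. It uses Python's sqlite3 module, where LENGTH() on text counts characters and casting to BLOB exposes the byte count, which mirrors MySQL's CHAR_LENGTH() versus LENGTH(). The 'café' string is a hypothetical non-ASCII value added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, content TEXT);
    INSERT INTO tweets VALUES
        (1, 'Vote for Biden'),
        (2, 'Let us make America great again!');
""")

invalid = [r[0] for r in conn.execute(
    "SELECT tweet_id FROM tweets WHERE LENGTH(content) > 15 ORDER BY tweet_id")]
print(invalid)  # [2]

# Characters vs. bytes for a hypothetical non-ASCII string ('é' is 2 bytes in UTF-8).
chars, nbytes = conn.execute(
    "SELECT LENGTH('café'), LENGTH(CAST('café' AS BLOB))").fetchone()
print(chars, nbytes)  # 4 5
```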
Question 6
- `id` is the primary key for this table, meaning it contains unique values.
- Each row contains the `id` and the `name` of an employee.
#### Example
**Input**:
Employees table:
| id | name |
|-----|----------|
| 1 | Alice |
| 7 | Bob |
| 11 | Meir |
| 90 | Winston |
| 3 | Jonathan |
EmployeeUNI table:
| id | unique_id |
|-----|-----------|
| 3 | 1 |
| 11 | 2 |
| 90 | 3 |
**Output**:
| unique_id | name |
|-----------|----------|
| null | Alice |
| null | Bob |
| 2 | Meir |
| 3 | Winston |
| 1 | Jonathan |
**Explanation**:
- Alice and Bob do not have a unique ID, so `null` is displayed for them.
- Meir's unique ID is 2, Winston's is 3, and Jonathan's is 1.
[LeetCode 1378]
Solution
This query uses a LEFT JOIN to combine data from the Employees and EmployeeUNI tables. It selects the unique_id from the EmployeeUNI table and the name from the Employees table. The LEFT JOIN ensures that every employee from the Employees table is included, even if they don't have a corresponding unique_id in the EmployeeUNI table; for those employees, unique_id is returned as null.
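The query is not shown in this section; a minimal sketch of the LEFT JOIN described above, run on the example tables with Python's sqlite3 module, looks like this. The ORDER BY e.id is an addition so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employeeuni (id INTEGER, unique_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Alice'), (7, 'Bob'), (11, 'Meir'), (90, 'Winston'), (3, 'Jonathan');
    INSERT INTO employeeuni VALUES (3, 1), (11, 2), (90, 3);
""")

rows = conn.execute("""
    SELECT u.unique_id, e.name
    FROM employees AS e
    LEFT JOIN employeeuni AS u
      ON e.id = u.id          -- unmatched employees get a NULL unique_id
    ORDER BY e.id
""").fetchall()
print(rows)
# [(None, 'Alice'), (1, 'Jonathan'), (None, 'Bob'), (2, 'Meir'), (3, 'Winston')]
```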
Question 7
#### Example
**Input**:
Sales table:
Product table:
| product_id | product_name |
|------------|--------------|
| 100 | Nokia |
| 200 | Apple |
| 300 | Samsung |
**Output**:
**Explanation**:
From `sale_id = 1`, we conclude that Nokia was sold for 5000 in 2008.
From `sale_id = 2`, Nokia was sold for 5000 in 2009.
From `sale_id = 7`, Apple was sold for 9000 in 2011.
[LeetCode 1068]
Solution
The first query uses an implicit join, also known as a comma join, which behaves like an inner join. It combines the sales and product tables and returns only rows where the product_id matches in both tables. If a record in the sales table has no corresponding product_id in the product table, that sales record is excluded from the result.
Result: Only records where there is a match between the sales and product tables based on product_id are returned.
Data Loss: Any rows in the sales table without a matching product_id in the product table are discarded.
Alternate Solution
The second query uses an explicit LEFT JOIN, which includes all rows from the sales table, even if they don't have a matching product_id in the product table. For unmatched rows, the columns from the product table (e.g., product_name) will return NULL.
Result: All rows from the sales table are included. If a row from the sales table has no corresponding row in the product table, the product_name will be NULL.
Data Preservation: Unlike the first query, this query retains all records from sales, even if there is no matching product_id in the product table.
Key Differences:
The first query returns only the rows where product_id exists in both sales and
product tables.
The second query returns all rows from sales, including those without a match in the product table.
Data Handling:
The first query is essentially an inner join, so unmatched rows are excluded.
The second query is a left join, ensuring all sales rows are preserved, even if the product data is missing.
Null Handling:
The first query never produces nulls for unmatched product_id values, because those rows are excluded.
The second query may return NULL for product_name when there’s no match in the product table.
In summary, use the first query when you only want matching rows between the
two tables. Use the second query when you need all rows from the sales table,
regardless of whether they have a matching product_id in the product table.
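The difference is easiest to see side by side. This sketch uses Python's sqlite3 module with a simplified Sales table (no quantity column); sales 1, 2 and 7 follow the explanation, while sale 8 is a hypothetical row whose product_id 400 has no match in the product table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                        year INTEGER, price INTEGER);
    INSERT INTO product VALUES (100, 'Nokia'), (200, 'Apple'), (300, 'Samsung');
    INSERT INTO sales VALUES
        (1, 100, 2008, 5000),
        (2, 100, 2009, 5000),
        (7, 200, 2011, 9000),
        (8, 400, 2012, 3000);   -- hypothetical: product 400 does not exist
""")

# Implicit (comma) join: behaves like an INNER JOIN.
inner_rows = conn.execute("""
    SELECT p.product_name, s.year, s.price
    FROM sales s, product p
    WHERE s.product_id = p.product_id
    ORDER BY s.sale_id
""").fetchall()

# Explicit LEFT JOIN: keeps every sales row.
left_rows = conn.execute("""
    SELECT p.product_name, s.year, s.price
    FROM sales s
    LEFT JOIN product p ON s.product_id = p.product_id
    ORDER BY s.sale_id
""").fetchall()

print(inner_rows)  # sale 8 is missing
print(left_rows)   # sale 8 appears with product_name None
```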
Question 8
Difficulty: Easy
#### Example
**Input**:
Visits table:
| visit_id | customer_id |
|----------|-------------|
| 1 | 23 |
| 2 | 9 |
| 4 | 30 |
| 5 | 54 |
| 6 | 96 |
| 7 | 54 |
| 8 | 54 |
Transactions table:
**Output**:
| customer_id | count_no_trans |
|-------------|----------------|
| 54 | 2 |
| 30 | 1 |
| 96 | 1 |
**Explanation**:
- Customer with `id = 23` visited once and made a transaction during the visit.
- Customer with `id = 9` visited once and made a transaction.
- Customer with `id = 30` visited once and did not make any transactions.
- Customer with `id = 54` visited three times. During 2 visits, no
transactions were made, but in one visit, they made 3 transactions.
- Customer with `id = 96` visited once and did not make any transactions.
[LeetCode 1581]
Solution
FROM visits: The data is fetched from the visits table, which holds information
about customers who visited the mall.
This query effectively identifies customers who visited the mall but did not
complete any transactions and returns how many such visits they made.
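The solution's SQL is not reproduced in this section. One common approach consistent with the explanation is an anti-join: LEFT JOIN visits to transactions and keep the visits with no match. The sketch below uses that approach (not necessarily the author's exact query) in Python's sqlite3 module; the Transactions rows are illustrative values consistent with the explanation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (visit_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE transactions (transaction_id INTEGER PRIMARY KEY,
                               visit_id INTEGER, amount INTEGER);
    INSERT INTO visits VALUES
        (1, 23), (2, 9), (4, 30), (5, 54), (6, 96), (7, 54), (8, 54);
    -- Illustrative rows: visit 5 has 3 transactions, visits 1 and 2 have one each.
    INSERT INTO transactions VALUES
        (2, 5, 310), (3, 5, 300), (9, 5, 200), (12, 1, 910), (13, 2, 970);
""")

rows = conn.execute("""
    SELECT v.customer_id, COUNT(v.visit_id) AS count_no_trans
    FROM visits AS v
    LEFT JOIN transactions AS t ON v.visit_id = t.visit_id
    WHERE t.transaction_id IS NULL      -- keep visits with no matching transaction
    GROUP BY v.customer_id
    ORDER BY count_no_trans DESC, v.customer_id
""").fetchall()
print(rows)  # [(54, 2), (30, 1), (96, 1)]
```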
Question 9
Rising Temperature
Difficulty: Easy
#### Example
**Input**:
Weather table:
| id | recordDate | temperature |
|----|------------|-------------|
| 1 | 2015-01-01 | 10 |
| 2 | 2015-01-02 | 25 |
| 3 | 2015-01-03 | 20 |
| 4 | 2015-01-04 | 30 |
**Output**:
| id |
|----|
| 2 |
| 4 |
**Explanation**:
- On 2015-01-02, the temperature was higher than the previous day (10 -> 25).
- On 2015-01-04, the temperature was higher than the previous day (20 -> 30).
[LeetCode 197]
Solution
SELECT W1.id
FROM Weather W1, Weather W2
WHERE W1.recordDate = DATE_ADD(W2.recordDate, INTERVAL 1 DAY)
AND W1.temperature > W2.temperature;
This query uses a self-join on the Weather table to compare each day's record with the previous day's record. It selects W1.id where the recordDate of W1 is exactly one day after W2's recordDate and the temperature on W1 is higher than on W2.
Alternate Solution
SELECT w1.id
FROM Weather AS w1
JOIN Weather AS w2
ON DATEDIFF(w1.recordDate, w2.recordDate) = 1
WHERE w1.recordDate > w2.recordDate
AND w1.temperature > w2.temperature;
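This query achieves the same result using a self-join with the DATEDIFF function to find records where the recordDate of w1 is exactly one day later than that of w2. It then filters the results to ensure w1's temperature is higher than w2's. DATEDIFF(w1.recordDate, w2.recordDate) returns the first date minus the second in days, so a result of 1 already implies that w1 is the later date; the extra w1.recordDate > w2.recordDate condition is redundant, though harmless. Computing the day difference directly can read more simply than shifting a date with DATE_ADD.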
First Query: Uses DATE_ADD to shift the earlier record's date forward by one day, then compares temperatures.
Second Query: Uses DATEDIFF to directly calculate the day difference and check
temperatures.
Both queries effectively compare temperatures between consecutive days but use
different approaches for handling date differences.
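SQLite has no DATE_ADD, but its date() function with a '+1 day' modifier performs the same shift. A sketch of the first approach on the example data, using Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE weather (id INTEGER PRIMARY KEY, recordDate TEXT, temperature INTEGER);
    INSERT INTO weather VALUES
        (1, '2015-01-01', 10), (2, '2015-01-02', 25),
        (3, '2015-01-03', 20), (4, '2015-01-04', 30);
""")

# date(x, '+1 day') plays the role of MySQL's DATE_ADD(x, INTERVAL 1 DAY).
warmer = [r[0] for r in conn.execute("""
    SELECT w1.id
    FROM weather AS w1
    JOIN weather AS w2
      ON w1.recordDate = date(w2.recordDate, '+1 day')
    WHERE w1.temperature > w2.temperature
    ORDER BY w1.id
""")]
print(warmer)  # [2, 4]
```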
Question 10
Difficulty: Easy
#### Example
**Input**:
Activity table:
**Output**:
| machine_id | processing_time |
|------------|-----------------|
| 0 | 0.894 |
| 1 | 0.995 |
| 2 | 1.456 |
**Explanation**:
- Machine 0's average time: ((1.520 - 0.712) + (4.120 - 3.140)) / 2 = 0.894
- Machine 1's average time: ((1.550 - 0.550) + (1.420 - 0.430)) / 2 = 0.995
- Machine 2's average time: ((4.512 - 4.100) + (5.000 - 2.500)) / 2 = 1.456
[LeetCode 1661]
Solution
SELECT A.MACHINE_ID,
ROUND(AVG(B.TIMESTAMP - A.TIMESTAMP), 3) AS PROCESSING_TIME
FROM ACTIVITY A, ACTIVITY B
WHERE A.MACHINE_ID = B.MACHINE_ID
AND A.PROCESS_ID = B.PROCESS_ID
AND A.ACTIVITY_TYPE = 'START'
AND B.ACTIVITY_TYPE = 'END'
GROUP BY A.MACHINE_ID;
This query uses a self-join between the ACTIVITY table as A (for start times) and B (for end times), matched on machine_id and process_id. It calculates the difference between the end and start timestamps for each process, averages the results per machine, and rounds the result to 3 decimal places.
Alternate Solution
This query achieves the same result as the first one but uses an explicit JOIN. It matches rows where a1 has a start activity and a2 has an end activity for the same machine_id and process_id, calculates the time differences, averages them, and rounds the result to 3 decimal places. It includes an additional ORDER BY clause to sort the results by machine_id.
Superior Option:
The second query is superior because it uses the more modern and clearer JOIN
syntax, making it more readable and easier to maintain than the older, implicit
join (comma-separated) syntax in Solution 1.
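The explicit-JOIN version is described but not shown here, so the sketch below writes it out and runs it with Python's sqlite3 module. The Activity rows are rebuilt from the timestamps quoted in the explanation. One dialect difference: SQLite compares text case-sensitively by default, unlike MySQL's default collations, so the literals must match the stored lowercase 'start' and 'end'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE activity (machine_id INTEGER, process_id INTEGER,
                                      activity_type TEXT, timestamp REAL)""")
# Rows rebuilt from the timestamps quoted in the explanation.
conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", [
    (0, 0, 'start', 0.712), (0, 0, 'end', 1.520),
    (0, 1, 'start', 3.140), (0, 1, 'end', 4.120),
    (1, 0, 'start', 0.550), (1, 0, 'end', 1.550),
    (1, 1, 'start', 0.430), (1, 1, 'end', 1.420),
    (2, 0, 'start', 4.100), (2, 0, 'end', 4.512),
    (2, 1, 'start', 2.500), (2, 1, 'end', 5.000),
])

rows = conn.execute("""
    SELECT a1.machine_id,
           ROUND(AVG(a2.timestamp - a1.timestamp), 3) AS processing_time
    FROM activity AS a1
    JOIN activity AS a2
      ON a1.machine_id = a2.machine_id
     AND a1.process_id = a2.process_id
     AND a1.activity_type = 'start'   -- case-sensitive match in SQLite
     AND a2.activity_type = 'end'
    GROUP BY a1.machine_id
    ORDER BY a1.machine_id
""").fetchall()
print(rows)  # [(0, 0.894), (1, 0.995), (2, 1.456)]
```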
Question 11
Employee Bonus
Difficulty: Easy
If an employee does not have a bonus, display `null` for the bonus.
#### Example
**Input**:
Employee table:
| empId | name | supervisor | salary |
|-------|--------|------------|--------|
| 3 | Brad | null | 4000 |
| 1 | John | 3 | 1000 |
| 2 | Dan | 3 | 2000 |
| 4 | Thomas | 3 | 4000 |
Bonus table:
| empId | bonus |
|-------|-------|
| 2 | 500 |
| 4 | 2000 |
**Output**:
| name | bonus |
|-------|-------|
| Brad | null |
| John | null |
| Dan | 500 |
[LeetCode 577]
Solution
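LEFT JOIN: The Employee table is joined with the Bonus table using a LEFT JOIN to ensure that all employees are included, even those without a bonus.
WHERE clause: Filters the results to include employees whose bonus is below the threshold (1000 in LeetCode 577) and employees with no bonus at all (bonus IS NULL). The IS NULL check is required because a missing bonus compares as UNKNOWN, not TRUE, against the threshold.
The query returns the name and bonus for each qualifying employee, displaying null for employees without a bonus.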
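A minimal sketch of the query described above, run with Python's sqlite3 module on the example tables. The bonus threshold of 1000 is an assumption taken from the LeetCode 577 problem statement, which is not reproduced here; ORDER BY e.empId is added for a deterministic order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empId INTEGER PRIMARY KEY, name TEXT,
                           supervisor INTEGER, salary INTEGER);
    CREATE TABLE bonus (empId INTEGER PRIMARY KEY, bonus INTEGER);
    INSERT INTO employee VALUES
        (3, 'Brad', NULL, 4000), (1, 'John', 3, 1000),
        (2, 'Dan', 3, 2000), (4, 'Thomas', 3, 4000);
    INSERT INTO bonus VALUES (2, 500), (4, 2000);
""")

rows = conn.execute("""
    SELECT e.name, b.bonus
    FROM employee AS e
    LEFT JOIN bonus AS b ON e.empId = b.empId
    WHERE b.bonus < 1000        -- assumed threshold from LeetCode 577
       OR b.bonus IS NULL       -- employees with no bonus row
    ORDER BY e.empId
""").fetchall()
print(rows)  # [('John', None), ('Dan', 500), ('Brad', None)]
```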
Question 12
Difficulty: Easy
#### Example
**Input**:
Students table:
| student_id | student_name |
|------------|--------------|
| 1 | Alice |
| 2 | Bob |
| 13 | John |
| 6 | Alex |
Subjects table:
| subject_name |
|--------------|
| Math |
| Physics |
| Programming |
Examinations table:
| student_id | subject_name |
|------------|--------------|
| 1 | Math |
| 1 | Physics |
| 1 | Programming |
| 2 | Programming |
| 1 | Physics |
| 1 | Math |
| 13 | Math |
| 13 | Programming |
| 13 | Physics |
| 2 | Math |
| 1 | Math |
**Output**:
**Explanation**:
- Alice attended the Math exam 3 times, Physics 2 times, and Programming 1
time.
- Bob attended the Math exam once, Programming once, and did not attend the
Physics exam.
- Alex did not attend any exams.
- John attended all three exams once.
[LeetCode 1280]
Solution
SELECT student_subject.student_id,
student_subject.student_name,
student_subject.subject_name,
COUNT(exam.student_id) AS attended_exams
FROM
(
SELECT student.*, subject.*
FROM students AS student
CROSS JOIN subjects AS subject
) AS student_subject
LEFT JOIN examinations AS exam
ON student_subject.student_id = exam.student_id
AND student_subject.subject_name = exam.subject_name
GROUP BY student_subject.student_id, student_subject.student_name,
student_subject.subject_name
ORDER BY student_subject.student_id, student_subject.student_name,
student_subject.subject_name;
This query works as follows:
student_subject: Combines all students with all subjects using CROSS JOIN, producing one row for every student-subject pair.
exam: The examinations table, which records one row each time a student attends an exam for a specific subject.
The LEFT JOIN ensures that even if a student didn’t attend an exam, they are still included in the result with a count of 0 for that subject; COUNT(exam.student_id) ignores the NULLs produced by unmatched rows.
The results are grouped by student and subject, counting how many times a student attended the exam for each subject. The output is then ordered by student_id and subject_name.
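Running the query on the example data confirms the zero counts for Bob's Physics exam and for Alex. The sketch below is the article's query executed with Python's sqlite3 module, with the derived table aliased as ss for brevity and the ORDER BY shortened to student_id and subject_name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, student_name TEXT);
    CREATE TABLE subjects (subject_name TEXT PRIMARY KEY);
    CREATE TABLE examinations (student_id INTEGER, subject_name TEXT);
    INSERT INTO students VALUES (1, 'Alice'), (2, 'Bob'), (13, 'John'), (6, 'Alex');
    INSERT INTO subjects VALUES ('Math'), ('Physics'), ('Programming');
    INSERT INTO examinations VALUES
        (1, 'Math'), (1, 'Physics'), (1, 'Programming'), (2, 'Programming'),
        (1, 'Physics'), (1, 'Math'), (13, 'Math'), (13, 'Programming'),
        (13, 'Physics'), (2, 'Math'), (1, 'Math');
""")

rows = conn.execute("""
    SELECT ss.student_id, ss.student_name, ss.subject_name,
           COUNT(exam.student_id) AS attended_exams   -- NULLs from unmatched rows count as 0
    FROM (SELECT student.*, subject.*
          FROM students AS student
          CROSS JOIN subjects AS subject) AS ss
    LEFT JOIN examinations AS exam
      ON ss.student_id = exam.student_id
     AND ss.subject_name = exam.subject_name
    GROUP BY ss.student_id, ss.student_name, ss.subject_name
    ORDER BY ss.student_id, ss.subject_name
""").fetchall()
for row in rows:
    print(row)
```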
Question 13
Difficulty: Medium
#### Example
**Input:**
Employee table:
**Output:**
| name |
|------|
| John |
**Explanation:**
John is the only manager with at least five direct reports:
Dan, James, Amy, Anne, and Ron.
[LeetCode 570]
Solution
SELECT e1.name
FROM employee AS e1
JOIN (
SELECT managerId
FROM employee
GROUP BY managerId
HAVING COUNT(managerId) > 4
) AS e2
ON e1.id = e2.managerId;
This query works as follows:
Subquery (e2):
Groups the employee table by managerId and uses HAVING COUNT(managerId) > 4 to keep managers who have more than 4 direct reports.
Main Query:
Joins the original employee table (e1) with the results of the subquery (e2) on e1.id = e2.managerId to look up each qualifying manager's name.
This query finds the names of managers who have at least 5 direct reports.
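The Employee input table is not shown, so the sketch below uses illustrative rows in which John manages the five people named in the explanation, and runs the article's query with Python's sqlite3 module. A side effect worth noting: the group for managerId = NULL has COUNT(managerId) = 0, so HAVING removes it automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
                           department TEXT, managerId INTEGER);
    -- Illustrative rows: John (101) manages the five people named in the explanation.
    INSERT INTO employee VALUES
        (101, 'John', 'A', NULL), (102, 'Dan', 'A', 101), (103, 'James', 'A', 101),
        (104, 'Amy', 'A', 101), (105, 'Anne', 'A', 101), (106, 'Ron', 'B', 101);
""")

managers = [r[0] for r in conn.execute("""
    SELECT e1.name
    FROM employee AS e1
    JOIN (SELECT managerId
          FROM employee
          GROUP BY managerId
          HAVING COUNT(managerId) > 4) AS e2   -- NULL group has COUNT 0, so it drops out
      ON e1.id = e2.managerId
""")]
print(managers)  # ['John']
```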
Question 14
Confirmation Rate
Difficulty: Medium
#### Example
**Input:**
**Signups table:**
| user_id | time_stamp |
|---------|---------------------|
| 3 | 2020-03-21 10:16:13 |
| 7 | 2020-01-04 13:57:59 |
| 2 | 2020-07-29 23:09:44 |
| 6 | 2020-12-09 10:39:37 |
**Confirmations table:**
**Output:**
| user_id | confirmation_rate |
|---------|-------------------|
| 6 | 0.00 |
| 3 | 0.00 |
| 7 | 1.00 |
| 2 | 0.50 |
**Explanation:**
- User 6 did not request any confirmation messages; confirmation rate is 0.
- User 3 requested 2 confirmations, both timed out; rate is 0.
- User 7 requested 3 confirmations, all were confirmed; rate is 1.
- User 2 requested 2 confirmations, 1 was confirmed and 1 timed out;
rate is 0.5.
[LeetCode 1934]
Solution
SELECT
s.user_id,
ROUND(COALESCE(c.confirmation_rate, 0), 2) AS confirmation_rate
FROM
signups AS s
LEFT JOIN (
SELECT
user_id,
AVG(CASE WHEN action = 'confirmed' THEN 1 ELSE 0 END)
AS confirmation_rate
FROM
confirmations
GROUP BY
user_id
) AS c
ON
s.user_id = c.user_id;
Subquery (c):
Purpose: Calculate the confirmation rate for each user.
Details:
AVG(CASE WHEN action = 'confirmed' THEN 1 ELSE 0 END) computes the average of 1 for 'confirmed' actions and 0 for 'timeout' actions, effectively giving the proportion of confirmations.
GROUP BY user_id ensures that the rate is computed for each user.
Main Query:
LEFT JOIN: Joins the signups table with the subquery result on user_id, so every signed-up user is kept.
COALESCE(…, 0): Replaces the NULL rate of users who never requested a confirmation with 0.
ROUND(…, 2): Rounds the confirmation rate to two decimal places for cleaner output.
Summary: This query calculates the confirmation rate for each user from the confirmations table and ensures that users who didn't request any confirmations are still included with a rate of 0.
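The article's query, run with Python's sqlite3 module. The Confirmations rows are illustrative values that match the explanation (timestamps omitted), and ORDER BY s.user_id is added for a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE signups (user_id INTEGER PRIMARY KEY, time_stamp TEXT);
    CREATE TABLE confirmations (user_id INTEGER, action TEXT);
    INSERT INTO signups VALUES
        (3, '2020-03-21 10:16:13'), (7, '2020-01-04 13:57:59'),
        (2, '2020-07-29 23:09:44'), (6, '2020-12-09 10:39:37');
    -- Illustrative rows matching the explanation (timestamps omitted).
    INSERT INTO confirmations VALUES
        (3, 'timeout'), (3, 'timeout'),
        (7, 'confirmed'), (7, 'confirmed'), (7, 'confirmed'),
        (2, 'confirmed'), (2, 'timeout');
""")

rows = conn.execute("""
    SELECT s.user_id,
           ROUND(COALESCE(c.confirmation_rate, 0), 2) AS confirmation_rate
    FROM signups AS s
    LEFT JOIN (SELECT user_id,
                      AVG(CASE WHEN action = 'confirmed' THEN 1 ELSE 0 END)
                          AS confirmation_rate
               FROM confirmations
               GROUP BY user_id) AS c
      ON s.user_id = c.user_id
    ORDER BY s.user_id
""").fetchall()
print(rows)  # [(2, 0.5), (3, 0.0), (6, 0.0), (7, 1.0)]
```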
Question 15
Difficulty: Easy
#### Example
**Input:**
Cinema table:
**Output:**
**Explanation:**
Movies with odd-numbered IDs are 1, 3, and 5.
The movie with ID 3 is excluded because its description is "boring".
[LeetCode 620]
Solution
SELECT *
FROM cinema
WHERE id % 2 = 1
AND description <> 'boring'
ORDER BY rating DESC;
The query uses the condition id % 2 = 1 to filter rows where the id is odd. The
modulus operator % calculates the remainder of dividing id by 2. If the
remainder is 1, then id is odd, and the condition evaluates to true. This ensures
that only rows with odd id values are included in the result.
Question 16
Difficulty: Easy
#### Example
**Input:**
Prices table:
UnitsSold table:
**Output:**
| product_id | average_price |
|------------|---------------|
| 1 | 6.96 |
| 2 | 16.96 |
**Explanation:**
The average selling price is the total revenue for the product (units sold × the price in effect on the purchase date) divided by the total number of units sold.
[LeetCode 1251]
Solution
SELECT p.product_id,
COALESCE(ROUND(SUM(u.units * p.price) / SUM(u.units), 2), 0)
AS average_price
FROM prices p
LEFT JOIN unitssold u
ON p.product_id = u.product_id
AND u.purchase_date BETWEEN p.start_date AND p.end_date
GROUP BY p.product_id;
This query calculates the average selling price of each product by joining the prices and unitssold tables, matching each sale to the price period that contains its purchase_date. It multiplies the units sold by that price for each sale, sums these values, and divides by the total units sold to find the average price. The LEFT JOIN keeps products with no sales, and COALESCE turns their NULL result into 0. The result is rounded to 2 decimal places.
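The input tables are not shown here, so this sketch uses assumed rows that mirror the LeetCode 1251 sample (they reproduce the 6.96 and 16.96 above), plus a hypothetical product 3 with no sales to exercise COALESCE. Unlike MySQL, SQLite truncates integer division, so the sketch multiplies by 1.0 first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prices (product_id INTEGER, start_date TEXT, end_date TEXT, price INTEGER);
    CREATE TABLE unitssold (product_id INTEGER, purchase_date TEXT, units INTEGER);
    -- Assumed rows; product 3 is hypothetical and has no sales.
    INSERT INTO prices VALUES
        (1, '2019-02-17', '2019-02-28', 5), (1, '2019-03-01', '2019-03-22', 20),
        (2, '2019-02-01', '2019-02-20', 15), (2, '2019-02-21', '2019-03-31', 30),
        (3, '2019-02-01', '2019-02-28', 10);
    INSERT INTO unitssold VALUES
        (1, '2019-02-25', 100), (1, '2019-03-01', 15),
        (2, '2019-02-10', 200), (2, '2019-03-22', 30);
""")

rows = conn.execute("""
    SELECT p.product_id,
           -- 1.0 * forces real division; SQLite would otherwise truncate int / int
           COALESCE(ROUND(1.0 * SUM(u.units * p.price) / SUM(u.units), 2), 0)
               AS average_price
    FROM prices AS p
    LEFT JOIN unitssold AS u
      ON p.product_id = u.product_id
     AND u.purchase_date BETWEEN p.start_date AND p.end_date
    GROUP BY p.product_id
    ORDER BY p.product_id
""").fetchall()
print(rows)  # [(1, 6.96), (2, 16.96), (3, 0)]
```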
Question 17
Project Employees I
Difficulty: Easy
#### Example
**Input:**
Project table:
| project_id | employee_id |
|------------|-------------|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 4 |
Employee table:
**Output:**
| project_id | average_years |
|------------|---------------|
| 1 | 2.00 |
| 2 | 2.50 |
**Explanation:**
The average experience years for the first project is
(3 + 2 + 1) / 3 = 2.00, and for the second project, it is (3 + 2) / 2 = 2.50.
[LeetCode 1075]
Solution
This query calculates the average experience years of employees for each project. It joins the project table with the employee table on employee_id, computes the average experience for each project, and rounds the result to 2 decimal places. The LEFT JOIN keeps every project row even if an assigned employee_id has no matching row in the employee table; such rows contribute NULL, which AVG ignores.
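The query is not shown in this section; a minimal sketch of the join-and-average it describes, run with Python's sqlite3 module, follows. The Employee table is simplified to two columns, with experience values taken from the explanation's arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (project_id INTEGER, employee_id INTEGER);
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, experience_years INTEGER);
    INSERT INTO project VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 4);
    -- Experience values taken from the explanation's arithmetic.
    INSERT INTO employee VALUES (1, 3), (2, 2), (3, 1), (4, 2);
""")

rows = conn.execute("""
    SELECT p.project_id,
           ROUND(AVG(e.experience_years), 2) AS average_years
    FROM project AS p
    LEFT JOIN employee AS e ON p.employee_id = e.employee_id
    GROUP BY p.project_id
    ORDER BY p.project_id
""").fetchall()
print(rows)  # [(1, 2.0), (2, 2.5)]
```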
Question 18
Difficulty: Easy
- `user_id` is the primary key (column with unique values) for this table.
- Each row of this table contains the name and the id of a user.
#### Example
**Input:**
Users table:
| user_id | user_name |
|---------|-----------|
| 6 | Alice |
| 2 | Bob |
| 7 | Alex |
Register table:
| contest_id | user_id |
|------------|---------|
| 215 | 6 |
| 209 | 2 |
| 208 | 2 |
| 210 | 6 |
| 208 | 6 |
| 209 | 7 |
| 209 | 6 |
| 215 | 7 |
| 208 | 7 |
| 210 | 2 |
| 207 | 2 |
| 210 | 7 |
**Output:**
| contest_id | percentage |
|------------|------------|
| 208 | 100.0 |
| 209 | 100.0 |
| 210 | 100.0 |
| 215 | 66.67 |
| 207 | 33.33 |
**Explanation:**
All the users registered in contests 208, 209, and 210.
The percentage is 100% and we sort them in the answer table by
`contest_id` in ascending order. Alice and Alex registered in
contest 215 and the percentage is ((2/3) * 100) = 66.67%.
Bob registered in contest 207 and the percentage is ((1/3) * 100) = 33.33%.
[LeetCode 1633]
Solution
SELECT r.contest_id,
ROUND(COUNT(r.user_id) * 100 /
(SELECT COUNT(u.user_id) FROM users u), 2)
AS percentage
FROM register r
GROUP BY r.contest_id
ORDER BY percentage DESC, r.contest_id ASC;
This query calculates the percentage of users registered for each contest.
COUNT(r.user_id): Counts the registrations for each contest; this equals the number of distinct users as long as a user cannot register for the same contest twice.
(SELECT COUNT(u.user_id) FROM users u): Counts all users, which serves as the denominator.
ORDER BY percentage DESC, r.contest_id ASC: Orders the results first by the percentage in descending order and then by contest_id in ascending order to break ties.
This ensures that contests with the highest registration percentages appear first and, in case of ties, are ordered by their contest_id.
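The query on the example data, run with Python's sqlite3 module. MySQL's / returns a decimal for integer operands, but SQLite truncates integer division, so this sketch multiplies by 100.0 instead of 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE register (contest_id INTEGER, user_id INTEGER);
    INSERT INTO users VALUES (6, 'Alice'), (2, 'Bob'), (7, 'Alex');
    INSERT INTO register VALUES
        (215, 6), (209, 2), (208, 2), (210, 6), (208, 6), (209, 7),
        (209, 6), (215, 7), (208, 7), (210, 2), (207, 2), (210, 7);
""")

rows = conn.execute("""
    SELECT r.contest_id,
           ROUND(COUNT(r.user_id) * 100.0 /          -- 100.0 avoids integer division
                 (SELECT COUNT(u.user_id) FROM users AS u), 2) AS percentage
    FROM register AS r
    GROUP BY r.contest_id
    ORDER BY percentage DESC, r.contest_id ASC
""").fetchall()
print(rows)
# [(208, 100.0), (209, 100.0), (210, 100.0), (215, 66.67), (207, 33.33)]
```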
Question 19
Difficulty: Easy
#### Example
**Input:**
Queries table:
**Output:**
**Explanation:**
- Dog queries quality is ((5 / 1) + (5 / 2) + (1 / 200)) / 3 = 2.50.
- Dog queries poor_query_percentage is (1 / 3) * 100 = 33.33%.
[LeetCode 1211]
Solution
SELECT query_name,
ROUND(AVG(rating / position), 2) AS quality,
ROUND(SUM(CASE WHEN rating < 3 THEN 1 ELSE 0 END) * 100.0 /
COUNT(rating), 2) AS poor_query_percentage
FROM queries
WHERE query_name IS NOT NULL
GROUP BY query_name;
WHERE query_name IS NOT NULL : Ensures that only rows with a non-null
query_name are included in the calculations.
Alternative Solution
SELECT query_name,
ROUND(AVG(rating / position), 2) AS quality,
ROUND(SUM(IF(rating < 3, 1, 0)) * 100.0 / COUNT(rating), 2)
AS poor_query_percentage
FROM Queries
WHERE query_name IS NOT NULL
GROUP BY query_name;
WHERE query_name IS NOT NULL : Filters out rows where query_name is null,
ensuring that only valid query names are considered.
Comparison
Functionality:
Both queries achieve the same results: calculating the average quality and the
percentage of poor queries per query_name .
The first solution uses standard SQL with CASE WHEN for conditional logic, which
is more portable across different SQL dialects.
The second solution uses IF, which is specific to MySQL. For databases other than MySQL (e.g., SQL Server, Oracle), IF might not be available or might require different syntax.
The IF function in the second query is concise but might be less familiar to those
used to other SQL dialects.
Portability:
The first solution using CASE WHEN is more portable and compatible with various
SQL databases.
The second solution using IF is specific to MySQL and might need adjustments for
other SQL environments.
Conclusion:
The first query is generally better due to its portability and adherence to standard
SQL practices. It is more likely to work across different SQL databases and might be
preferred in environments where SQL dialect compatibility is a concern.
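The portable CASE version runs unchanged in SQLite apart from one detail: rating / position would be integer division there, so the sketch multiplies by 1.0. The three "Dog" rows are illustrative values matching the explanation's arithmetic, with the result column omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE queries (query_name TEXT, position INTEGER, rating INTEGER);
    -- Illustrative "Dog" rows matching the explanation's arithmetic.
    INSERT INTO queries VALUES ('Dog', 1, 5), ('Dog', 2, 5), ('Dog', 200, 1);
""")

rows = conn.execute("""
    SELECT query_name,
           ROUND(AVG(rating * 1.0 / position), 2) AS quality,
           ROUND(SUM(CASE WHEN rating < 3 THEN 1 ELSE 0 END) * 100.0 /
                 COUNT(rating), 2) AS poor_query_percentage
    FROM queries
    WHERE query_name IS NOT NULL
    GROUP BY query_name
""").fetchall()
print(rows)  # [('Dog', 2.5, 33.33)]
```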
Question 20
Difficulty: Medium
#### Example
**Input:**
Transactions table:
**Output:**
**Explanation:**
- In December 2018, the US had 2 transactions (1 approved), with a total
amount of 3000 (1000 from the approved transaction).
- In January 2019, the US had 1 transaction (approved) for 2000.
- In January 2019, Germany (DE) had 1 approved transaction for 2000.
[LeetCode 1193]
Solution
SELECT
DATE_FORMAT(trans_date, '%Y-%m') AS month,
country,
COUNT(amount) AS trans_count,
COUNT(CASE WHEN state = 'approved' THEN amount ELSE NULL END) AS approved_count,
SUM(amount) AS trans_total_amount,
SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
FROM Transactions
GROUP BY month, country;
COUNT(amount) : Counts the total number of transactions per month and country.
COUNT(CASE WHEN state = 'approved' THEN amount ELSE NULL END) : Counts the number of approved transactions. The CASE expression returns amount when the state is 'approved' and NULL otherwise; since COUNT ignores NULLs, only approved transactions are counted.
SUM(amount) : Calculates the total transaction amount per month and country.
SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) : Sums the
transaction amounts where the state is 'approved' . If not approved, it adds 0.
Alternative Solution
SELECT
CONCAT(YEAR(trans_date), '-',
CASE WHEN LENGTH(MONTH(trans_date)) = 1
THEN CONCAT('0', MONTH(trans_date))
ELSE MONTH(trans_date)
END) AS month,
country,
COUNT(id) AS trans_count,
SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
SUM(amount) AS trans_total_amount,
SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END)
AS approved_total_amount
FROM Transactions
GROUP BY month, country;
COUNT(id) : Counts the total number of transactions per month and country.
SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) : Counts the number of
approved transactions by summing 1 for approved transactions and 0 otherwise.
SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) : Similar to solution
1, sums the approved transaction amounts.
Comparison
Date Formatting:
Solution 1 uses DATE_FORMAT , which is simpler and cleaner for date formatting in
YYYY-MM .
Solution 2 manually constructs the month using CONCAT and CASE , which
provides more control over the date format but adds complexity.
Counting Logic:
Both solutions count approved transactions in similar ways, but Solution 1 uses COUNT(CASE ...), which is more intuitive, whereas Solution 2 uses SUM(CASE ...).
Simplicity:
Solution 2 is slightly more complex due to the manual formatting of the month and the use of SUM to count approved transactions, which some readers may find less intuitive.
Performance:
Both solutions should perform similarly in most cases; any advantage from Solution 1's use of the built-in DATE_FORMAT is likely to be negligible.
Solution 1 is better for this scenario due to its simplicity and clean handling of
date formatting and counting logic. Solution 2 provides more control over
formatting but adds unnecessary complexity for this specific case.
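Solution 1 ported to SQLite, where strftime('%Y-%m', ...) stands in for MySQL's DATE_FORMAT(trans_date, '%Y-%m'). The Transactions rows are illustrative values consistent with the explanation; the non-approved state is assumed to be 'declined'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, country TEXT, state TEXT,
                               amount INTEGER, trans_date TEXT);
    -- Illustrative rows consistent with the explanation.
    INSERT INTO transactions VALUES
        (121, 'US', 'approved', 1000, '2018-12-18'),
        (122, 'US', 'declined', 2000, '2018-12-19'),
        (123, 'US', 'approved', 2000, '2019-01-01'),
        (124, 'DE', 'approved', 2000, '2019-01-07');
""")

rows = conn.execute("""
    SELECT strftime('%Y-%m', trans_date) AS month,
           country,
           COUNT(amount) AS trans_count,
           COUNT(CASE WHEN state = 'approved' THEN amount END) AS approved_count,
           SUM(amount) AS trans_total_amount,
           SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
    FROM transactions
    GROUP BY strftime('%Y-%m', trans_date), country
    ORDER BY month, country
""").fetchall()
for row in rows:
    print(row)
```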
Question 21
Difficulty: Medium
- `delivery_id` is the primary key (column of unique values) for this table.
- The table holds information about food delivery orders where customers
specify their preferred delivery date, which can be the same as the
`order_date` (immediate) or after (scheduled).
The first order of a customer is the one with the earliest `order_date`.
It is guaranteed that each customer has precisely one first order.
#### Example
**Input:**
Delivery table:
**Output:**
| immediate_percentage |
|----------------------|
| 50.00 |
**Explanation:**
- Customer 1's first order is delivery id 1 (scheduled).
- Customer 2's first order is delivery id 2 (immediate).
- Customer 3's first order is delivery id 5 (scheduled).
- Customer 4's first order is delivery id 7 (immediate).
[LeetCode 1174]
Solution
SELECT
ROUND(SUM(CASE
WHEN customer_pref_delivery_date = order_date
THEN 1 ELSE 0 END) / COUNT(*) * 100, 2) AS immediate_percentage
FROM (
SELECT *
FROM delivery
WHERE (customer_id, order_date) IN (
SELECT customer_id, MIN(order_date)
FROM delivery
GROUP BY customer_id
)
) AS subquery;
Alternate Solution
WITH cte AS (
SELECT *,
CASE WHEN order_date = customer_pref_delivery_date THEN 'immediate'
ELSE 'scheduled' END AS order_type,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date)
AS order_rnk
FROM delivery
)
SELECT ROUND(
(SUM(CASE WHEN order_rnk = 1 AND order_type = 'immediate'
THEN 1 ELSE 0 END) /
SUM(CASE WHEN order_rnk = 1 THEN 1 ELSE 0 END)) * 100, 2
) AS immediate_percentage
FROM cte;
A Common Table Expression (CTE) is a temporary result set defined within a SQL
query using the WITH clause. It improves readability by breaking complex queries
into smaller, logical parts, and can be referenced multiple times in the main query.
CTE Method: Uses a WITH clause and ROW_NUMBER() to rank orders for each
customer, simplifying filtering for first orders.
Breakdown:
partition by customer_id : This divides the rows by customer_id, meaning that the row numbering restarts for each customer. Essentially, each customer's orders are treated as a separate group.
order by order_date : This specifies that the row numbers should be assigned in
the order of the order_date . The earliest order gets the rank of 1, the next one gets
2, and so on.
as order_rnk : This names the resulting column as order_rnk to indicate the rank
of the order within each customer’s sequence of orders.
In this context, order_rnk = 1 helps to identify the first order for each customer.
The CTE solution is generally more readable, as it clearly separates the ranking logic from the aggregation; which version runs faster on large datasets depends on the database engine and the available indexes.
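A compact variant of the ROW_NUMBER() approach, run with Python's sqlite3 module (window functions need SQLite 3.25 or later). It filters to order_rnk = 1 before aggregating instead of using two conditional sums, and multiplies by 100.0 to avoid SQLite's integer division. The Delivery rows are illustrative values consistent with the explanation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE delivery (delivery_id INTEGER PRIMARY KEY, customer_id INTEGER,
                           order_date TEXT, customer_pref_delivery_date TEXT);
    -- Illustrative rows consistent with the explanation (first orders: 1, 2, 5, 7).
    INSERT INTO delivery VALUES
        (1, 1, '2019-08-01', '2019-08-02'),
        (2, 2, '2019-08-02', '2019-08-02'),
        (3, 1, '2019-08-11', '2019-08-12'),
        (4, 3, '2019-08-24', '2019-08-24'),
        (5, 3, '2019-08-21', '2019-08-22'),
        (6, 2, '2019-08-11', '2019-08-13'),
        (7, 4, '2019-08-09', '2019-08-09');
""")

(pct,) = conn.execute("""
    WITH ranked AS (
        SELECT order_date, customer_pref_delivery_date,
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rnk
        FROM delivery
    )
    SELECT ROUND(100.0 * SUM(CASE WHEN order_date = customer_pref_delivery_date
                                  THEN 1 ELSE 0 END) / COUNT(*), 2)
    FROM ranked
    WHERE order_rnk = 1          -- keep only each customer's first order
""").fetchone()
print(pct)  # 50.0
```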
Question 22
Difficulty: Medium
You need to count the number of players who logged in for at least two
consecutive days starting from their first login date, then divide that
number by the total number of players.
#### Example
**Input:**
Activity table:
**Output:**
| fraction |
|-----------|
| 0.33 |
**Explanation:**
- Player 1 logged in on both `2016-03-01` and `2016-03-02`, so they logged in
again after their first login.
- Player 2 logged in only on `2017-06-25` and did not return the next day.
- Player 3 logged in twice, but the dates were far apart (`2016-03-02` and
`2018-07-03`), so they did not log in the day after their first login.
Thus, only 1 out of 3 players logged in again the day after their first
login, making the fraction 1/3 = 0.33.
[LeetCode 550]
Solution
WITH first_day AS (
-- Get the first login date for each player
SELECT player_id, MIN(event_date) AS event_date
FROM activity
GROUP BY player_id
),
second_day AS (
-- Check if a player logged in on the day after their first login
SELECT a.player_id
FROM activity a
JOIN first_day f ON a.player_id = f.player_id
WHERE a.event_date = DATE_ADD(f.event_date, INTERVAL 1 DAY)
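)
SELECT ROUND(
(SELECT COUNT(*) FROM second_day) / (SELECT COUNT(*) FROM first_day),
2) AS fraction;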
first_day CTE:
This common table expression (CTE) selects the minimum event_date (first login)
for each player_id from the activity table.
second_day CTE:
This CTE checks whether a player logged in on the day after their first login by joining the
activity table with the first_day CTE. It keeps only the records whose event_date is
exactly one day after that player's first login date.
Main Query:
The main query calculates the fraction of players who logged in again on the day
after their first login. It divides the number of players in second_day (those who
logged in on their first two consecutive days) by the total number of players in
first_day (every player with at least one login), then rounds the result to two
decimal places.
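The logic can be checked end to end with a minimal sketch. It runs the two CTEs, plus a final SELECT that divides the two counts as the main-query explanation describes, on an in-memory SQLite database through Python's built-in sqlite3 module. SQLite stands in for MySQL here, which means two changes. DATE_ADD becomes date(..., '+1 day'), and the division is forced to real arithmetic. The Activity table is reduced to player_id and event_date (an assumption) and filled with the logins listed in the explanation above.

```python
import sqlite3

# Logins taken from the explanation; only player_id and event_date are modelled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (player_id INTEGER, event_date TEXT)")
conn.executemany(
    "INSERT INTO activity VALUES (?, ?)",
    [(1, "2016-03-01"), (1, "2016-03-02"), (2, "2017-06-25"),
     (3, "2016-03-02"), (3, "2018-07-03")],
)

query = """
WITH first_day AS (
    SELECT player_id, MIN(event_date) AS event_date
    FROM activity
    GROUP BY player_id
),
second_day AS (
    SELECT a.player_id
    FROM activity a
    JOIN first_day f ON a.player_id = f.player_id
    -- SQLite equivalent of MySQL's DATE_ADD(f.event_date, INTERVAL 1 DAY)
    WHERE a.event_date = date(f.event_date, '+1 day')
)
-- 1.0 * forces real division; SQLite would otherwise truncate 1 / 3 to 0
SELECT ROUND(1.0 * (SELECT COUNT(*) FROM second_day)
             / (SELECT COUNT(*) FROM first_day), 2) AS fraction
"""
fraction = conn.execute(query).fetchone()[0]
print(fraction)  # 0.33
```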
Question 23
Difficulty: Easy
#### Example
**Input:**
Teacher table:
**Output:**
| teacher_id | cnt |
|------------|-----|
| 1 | 2 |
| 2 | 4 |
**Explanation:**
- Teacher 1 teaches subject 2 in departments 3 and 4, and subject 3 in
department 3, totaling 2 unique subjects.
- Teacher 2 teaches subjects 1, 2, 3, and 4, all in department 1,
totaling 4 unique subjects.
[Leetcode 2356]
Solution
## Using a Subquery
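SELECT teacher_id, COUNT(*) AS cnt
FROM (
SELECT DISTINCT teacher_id, subject_id
FROM teacher
) AS unique_subject
GROUP BY teacher_id;
The subquery first selects the distinct (teacher_id, subject_id) pairs, so a subject
taught in several departments is counted only once. Then, the outer query groups the
results by teacher_id to count the unique subjects taught by each teacher.
This approach is straightforward but can be harder to read, especially in complex
queries.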
Alternate Solution
WITH unique_subject AS (
SELECT DISTINCT teacher_id, subject_id
FROM teacher
)
SELECT teacher_id, COUNT(*) AS cnt
FROM unique_subject
GROUP BY teacher_id;
The CTE makes the logic clearer by separating the selection of distinct subjects from
the main query.
Both solutions effectively yield the same result, counting the number of unique
subjects taught by each teacher. The choice between them often comes down to
personal or team preferences for readability and maintainability. CTEs are
generally preferred for complex queries, while subqueries might suffice for simpler
cases.
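As a quick check that the subquery and CTE versions agree, this sketch runs both on an in-memory SQLite database via Python's sqlite3 module. The rows are reconstructed from the explanation above. The dept_id column name is an assumption. ORDER BY teacher_id is added only so the two results can be compared directly.

```python
import sqlite3

# Rows reconstructed from the explanation; "dept_id" is an assumed column name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (teacher_id INTEGER, subject_id INTEGER, dept_id INTEGER)")
conn.executemany(
    "INSERT INTO teacher VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 2, 4), (1, 3, 3),
     (2, 1, 1), (2, 2, 1), (2, 3, 1), (2, 4, 1)],
)

# Subquery version: de-duplicate (teacher, subject) pairs inline.
subquery_version = """
SELECT teacher_id, COUNT(*) AS cnt
FROM (SELECT DISTINCT teacher_id, subject_id FROM teacher) AS unique_subject
GROUP BY teacher_id
ORDER BY teacher_id
"""

# CTE version: the same de-duplication, named up front.
cte_version = """
WITH unique_subject AS (
    SELECT DISTINCT teacher_id, subject_id FROM teacher
)
SELECT teacher_id, COUNT(*) AS cnt
FROM unique_subject
GROUP BY teacher_id
ORDER BY teacher_id
"""

from_subquery = conn.execute(subquery_version).fetchall()
from_cte = conn.execute(cte_version).fetchall()
print(from_subquery)              # [(1, 2), (2, 4)]
print(from_cte == from_subquery)  # True
```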
Question 24
#### Example
**Input:**
Activity table:
**Output:**
| day | active_users |
|------------|--------------|
| 2019-07-20 | 2 |
| 2019-07-21 | 2 |
**Explanation:**
Note that we do not care about days with zero active users.
[Leetcode 1141]
Solution
WITH cte AS (
SELECT DISTINCT user_id, activity_date
FROM Activity
WHERE activity_date <= '2019-07-27'
AND activity_date > DATE_SUB('2019-07-27', INTERVAL 30 DAY)
)
SELECT activity_date AS day, COUNT(*) AS active_users
FROM cte
GROUP BY activity_date;
The WITH cte AS (...) clause defines a CTE named cte that selects distinct
user_id and activity_date from the Activity table.
It filters the records to include only those within the 30 days leading up to July 27,
2019, ensuring that only relevant activity data is processed.
Final Selection:
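The outer query selects activity_date as day and counts the active users with
COUNT(*). Because cte already holds distinct (user_id, activity_date) pairs,
COUNT(*) per day equals the number of distinct users active that day.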
The results are grouped by activity_date , which allows us to see the number of
active users for each day.
This query aims to calculate the daily active user count over a 30-day period
ending on July 27, 2019, by first isolating the relevant records and then
aggregating them by date.
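Here is a minimal sketch of the same logic: the CTE plus the outer daily count described above, run on SQLite via Python's sqlite3 module. It assumes a simplified Activity table with only user_id and activity_date, filled with hypothetical rows. SQLite has no DATE_ADD or DATE_SUB, so date('2019-07-27', '-30 days') computes the lower bound of the window.

```python
import sqlite3

# Hypothetical rows; only user_id and activity_date are modelled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Activity (user_id INTEGER, activity_date TEXT)")
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?)",
    [
        (1, "2019-07-20"), (1, "2019-07-20"),  # same user twice on one day
        (2, "2019-07-20"), (2, "2019-07-21"),
        (3, "2019-07-21"),
        (4, "2019-06-25"),                     # before the 30-day window, so ignored
    ],
)

query = """
WITH cte AS (
    SELECT DISTINCT user_id, activity_date
    FROM Activity
    WHERE activity_date <= '2019-07-27'
      AND activity_date > date('2019-07-27', '-30 days')  -- i.e. after 2019-06-27
)
SELECT activity_date AS day, COUNT(*) AS active_users
FROM cte
GROUP BY activity_date
ORDER BY activity_date
"""
daily = conn.execute(query).fetchall()
print(daily)  # [('2019-07-20', 2), ('2019-07-21', 2)]
```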
Question 25
Difficulty: Medium
- `product_id` is the primary key (column with unique values) for this table.
- Each row of this table indicates the product name of each product.
#### Example
**Input:**
Sales table:
Product table:
| product_id | product_name |
|------------|--------------|
| 100 | Nokia |
| 200 | Apple |
| 300 | Samsung |
**Output:**
**Explanation:**
The output shows the first year each product was sold along with
the corresponding quantity and price.
[Leetcode 1070]
Solution
WITH cte AS (
SELECT product_id, MIN(year) AS first_year
FROM sales
GROUP BY product_id
)
SELECT s.product_id, t.first_year, s.quantity, s.price
FROM sales s
JOIN cte t ON t.product_id = s.product_id AND t.first_year = s.year;
Common Table Expression (CTE) - cte :
This part of the query creates a temporary result set called cte .
It selects the product_id and the minimum year (i.e., the first year of sales) for
each product.
The GROUP BY product_id clause ensures that we get one row per product.
Main Query:
The main query returns the product_id, first_year, quantity and price of each matching sale.
It joins the sales table ( s ) with the cte CTE ( t ) using the product_id and
matching the first_year to the year in the sales records.
This ensures that only the sales records from the first year of each product are
returned.
The query effectively identifies the first year a product was sold and retrieves the
associated quantity and price for that year, allowing for a comprehensive view of
the initial sales data for each product.
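This small sketch shows why the query joins back to sales instead of stopping at MIN(year). It runs on SQLite via Python's sqlite3 module. The rows are hypothetical. Both the sale_id column and the second 2008 sale for product 100 (sale_id 8) are assumptions, added to show that every sale from the first year is returned.

```python
import sqlite3

# Hypothetical rows; sale_id 8 is an extra 2008 sale added for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER, product_id INTEGER, year INTEGER,"
    " quantity INTEGER, price INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [(1, 100, 2008, 10, 5000), (2, 100, 2009, 12, 5000),
     (7, 200, 2011, 15, 9000), (8, 100, 2008, 3, 4800)],
)

query = """
WITH cte AS (
    SELECT product_id, MIN(year) AS first_year
    FROM sales
    GROUP BY product_id
)
SELECT s.product_id, t.first_year, s.quantity, s.price
FROM sales s
JOIN cte t ON t.product_id = s.product_id AND t.first_year = s.year
ORDER BY s.sale_id
"""
first_year_sales = conn.execute(query).fetchall()
print(first_year_sales)
# [(100, 2008, 10, 5000), (200, 2011, 15, 9000), (100, 2008, 3, 4800)]
```

A single GROUP BY product_id with MIN(year) could return only one row per product. It could therefore not report both 2008 sales of product 100, each with its own quantity and price. The join back to sales is what keeps them.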
Question 26
Difficulty: Easy
#### Example
**Input:**
Courses table:
| student | class |
|---------|----------|
| A | Math |
| B | English |
| C | Math |
| D | Biology |
| E | Math |
| F | Computer |
| G | Math |
| H | Math |
| I | Math |
**Output:**
| class |
|---------|
| Math |
**Explanation:**
- Math has 6 students, so we include it.
- English, Biology, and Computer each have 1 student, so we do not
include them.
[Leetcode 596]
Solution
SELECT class
FROM courses
GROUP BY class
HAVING COUNT(student) >= 5;
GROUP BY: This groups the data by class , so that the number of students in each
class can be counted.
HAVING COUNT(student) >= 5: The HAVING clause filters the results to only
include classes that have at least 5 students, as calculated by COUNT(student) .
This query returns all the classes that have 5 or more students enrolled.
Question 27
Difficulty: Easy
#### Example
**Input:**
Followers table:
| user_id | follower_id |
|---------|-------------|
| 0 | 1 |
| 1 | 0 |
| 2 | 0 |
| 2 | 1 |
**Output:**
| user_id | followers_count |
|---------|-----------------|
| 0 | 1 |
| 1 | 1 |
| 2 | 2 |
**Explanation:**
- The followers of user `0` are `{1}`.
- The followers of user `1` are `{0}`.
- The followers of user `2` are `{0,1}`.
[Leetcode 1729]
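Solution
SELECT user_id, COUNT(follower_id) AS followers_count
FROM followers
GROUP BY user_id
ORDER BY user_id;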
SELECT user_id, COUNT(follower_id) : This line selects the user_id from the
followers table and counts the number of follower_id s associated with each
user_id . The result is returned as followers_count .
FROM followers : This specifies the table followers that contains the user and
follower relationships.
GROUP BY user_id : This groups the result by user_id , ensuring that we count the
followers for each user individually.
ORDER BY user_id : This orders the final output by user_id in ascending order, as
the problem requires.
The query returns the number of followers for each user in ascending order of
user_id . Each row in the result table has two columns: user_id and followers_count.
This solution is efficient for calculating and returning the follower count for every
user in a social media app scenario.
Question 28
Difficulty: Easy
- This table may contain duplicates (there is no primary key for this
table in SQL).
- Each row of this table contains an integer.
#### Problem Statement
Find the largest single number, that is, the largest number that appears only once in the MyNumbers table. If there is no single number, report null.
#### Example
**Input:**
MyNumbers table:
| num |
|-----|
| 8 |
| 8 |
| 3 |
| 3 |
| 1 |
| 4 |
| 5 |
| 6 |
**Output:**
| num |
|-----|
| 6 |
**Explanation:**
The single numbers are 1, 4, 5, and 6. Since 6 is the largest
single number, we return it.
**Example 2:**
**Input:**
MyNumbers table:
| num |
|-----|
| 8 |
| 8 |
| 7 |
| 7 |
| 3 |
| 3 |
| 3 |
**Output:**
| num |
|------|
| null |
**Explanation:**
There are no single numbers in the input table, so we return null.
[LeetCode 619]
Solution
WITH UniqueNumbers AS (
SELECT num, COUNT(*) AS count
FROM MyNumbers
GROUP BY num
HAVING count = 1
)
SELECT MAX(num) AS num
FROM UniqueNumbers;
This CTE selects distinct numbers from the MyNumbers table and counts how many
times each number appears.
The HAVING count = 1 clause filters the results to only include numbers that
appear exactly once, identifying them as "single numbers."
Final Selection:
The main query selects the maximum number from the UniqueNumbers CTE using
MAX(num) .
This returns the largest single number found in the original table. If there are no
single numbers, the CTE is empty, and MAX over an empty set returns NULL , which
is exactly the required output.
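Both examples can be reproduced with a small sketch on SQLite through Python's sqlite3 module. The helper function largest_single_number is hypothetical. HAVING COUNT(*) = 1 replaces HAVING count = 1, because referring to a SELECT alias in HAVING is not standard SQL, even though MySQL accepts it.

```python
import sqlite3


def largest_single_number(nums):
    """Hypothetical helper: load nums into MyNumbers and run the CTE solution."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyNumbers (num INTEGER)")
    conn.executemany("INSERT INTO MyNumbers VALUES (?)", [(n,) for n in nums])
    query = """
    WITH UniqueNumbers AS (
        SELECT num, COUNT(*) AS cnt
        FROM MyNumbers
        GROUP BY num
        HAVING COUNT(*) = 1
    )
    SELECT MAX(num) AS num FROM UniqueNumbers
    """
    # MAX over an empty CTE still returns one row, holding NULL (None in Python).
    return conn.execute(query).fetchone()[0]


example1 = largest_single_number([8, 8, 3, 3, 1, 4, 5, 6])
example2 = largest_single_number([8, 8, 7, 7, 3, 3, 3])
print(example1, example2)  # 6 None
```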
Question 29
Difficulty: Medium
- `product_key` is the primary key (column with unique values) for this table.
#### Example
**Input:**
Customer table:
| customer_id | product_key |
|-------------|-------------|
| 1 | 5 |
| 2 | 6 |
| 3 | 5 |
| 3 | 6 |
| 1 | 6 |
Product table:
| product_key |
|-------------|
| 5 |
| 6 |
**Output:**
| customer_id |
|-------------|
| 1 |
| 3 |
**Explanation:**
- The products in the `Product` table are 5 and 6.
- Customers with IDs 1 and 3 have bought both products,
so they are included in the result.
[LeetCode 1045]
Solution
WITH no_duplicate AS (
SELECT DISTINCT * FROM customer
)
SELECT nd.customer_id
FROM no_duplicate nd
GROUP BY nd.customer_id
HAVING COUNT(*) = (SELECT COUNT(*) AS product_count FROM product);
The CTE named no_duplicate selects distinct entries from the customer table to
eliminate duplicate records. This ensures that each product is counted only once
per customer, even if the customer bought it multiple times.
The main query selects customer_id from the no_duplicate CTE.
It groups the results by customer_id , aggregating all purchases for each customer.
The HAVING clause checks that the count of distinct products each customer has
purchased is equal to the total count of products in the product table.
This is done using a subquery that counts all products in the Product table
( SELECT COUNT(*) AS product_count FROM product ).
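A repeat purchase shows most clearly what the DISTINCT step does. This sketch runs the solution and a version without de-duplication on SQLite via Python's sqlite3 module. It reuses the example rows above, plus one hypothetical extra purchase of product 6 by customer 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, product_key INTEGER)")
conn.execute("CREATE TABLE product (product_key INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    # Example rows plus one hypothetical repeat purchase: (2, 6) appears twice.
    [(1, 5), (2, 6), (3, 5), (3, 6), (1, 6), (2, 6)],
)
conn.executemany("INSERT INTO product VALUES (?)", [(5,), (6,)])

with_distinct = """
WITH no_duplicate AS (SELECT DISTINCT * FROM customer)
SELECT customer_id
FROM no_duplicate
GROUP BY customer_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM product)
ORDER BY customer_id
"""
# Same query without de-duplication, for comparison.
without_distinct = """
SELECT customer_id
FROM customer
GROUP BY customer_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM product)
ORDER BY customer_id
"""
correct = [row[0] for row in conn.execute(with_distinct)]
naive = [row[0] for row in conn.execute(without_distinct)]
print(correct)  # [1, 3]
print(naive)    # [1, 2, 3]: customer 2 bought product 6 twice and is wrongly included
```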
Question 30
Difficulty: Easy
Write a solution to report the IDs and the names of all managers,
the number of employees who report directly to them, and the average
age of the reports rounded to the nearest integer.
**Input:**
Employees table:
**Output:**
**Explanation:**
Hercy has 2 people reporting directly to him, Alice and Bob.
Their average age is (41+36)/2 = 38.5, which is 39 after rounding
to the nearest integer.
**Example 2:**
**Input:**
Employees table:
**Output:**
[LeetCode 1731]
Solution
SELECT
e.employee_id,
e.name,
COUNT(sub.employee_id) AS reports_count,
ROUND(AVG(sub.age)) AS average_age
FROM
employees e
JOIN
employees sub
ON e.employee_id = sub.reports_to
GROUP BY
e.employee_id,
e.name
ORDER BY
e.employee_id;
The query retrieves information from the employees table, specifically focusing on
employees and their direct reports.
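JOIN Operation:
The query uses a self-join on the employees table. The sub alias refers to
subordinates, and the join condition ( ON e.employee_id = sub.reports_to ) links
each employee to their direct reports. Because this is an inner join, employees
with no direct reports drop out, so only managers appear in the result.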
Grouping:
The results are grouped by e.employee_id and e.name , which allows aggregation
functions like COUNT and AVG to calculate values specific to each employee.
Ordering:
The final result set is ordered by e.employee_id , ensuring that the output is sorted
in ascending order by employee ID.
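Here is a minimal sketch on SQLite via Python's sqlite3 module. It assumes the column names used in the query and hypothetical rows. Alice (41) and Bob (36) report to Hercy, as in the explanation, while Hercy's and Winston's ages are made up. Note that SQLite's ROUND returns a floating-point value.

```python
import sqlite3

# Hypothetical rows; only Alice's and Bob's ages come from the explanation.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, name TEXT, reports_to INTEGER, age INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(9, "Hercy", None, 43), (6, "Alice", 9, 41), (4, "Bob", 9, 36), (2, "Winston", None, 37)],
)

query = """
SELECT e.employee_id, e.name,
       COUNT(sub.employee_id) AS reports_count,
       ROUND(AVG(sub.age)) AS average_age   -- (41 + 36) / 2 = 38.5, rounded to 39
FROM employees e
JOIN employees sub ON e.employee_id = sub.reports_to
GROUP BY e.employee_id, e.name
ORDER BY e.employee_id
"""
managers = conn.execute(query).fetchall()
print(managers)  # [(9, 'Hercy', 2, 39.0)]
```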
Question 31
Difficulty: Easy
Write a solution to report all the employees with their primary department.
For employees who belong to one department, report their only department.
#### Example
**Input:**
Employee table:
**Output:**
| employee_id | department_id |
|-------------|---------------|
| 1 | 1 |
| 2 | 1 |
| 3 | 3 |
| 4 | 3 |
**Explanation:**
- The primary department for employee 1 is 1 (since they belong
to only one department).
- The primary department for employee 2 is 1.
- The primary department for employee 3 is 3 (as they belong to
only one department).
- The primary department for employee 4 is 3.
[LeetCode 1789]
Solution
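-- Using UNION
SELECT employee_id, department_id
FROM Employee
WHERE primary_flag = 'Y'
UNION
SELECT employee_id, MAX(department_id) AS department_id -- one row per group, so MAX returns that row's department
FROM Employee
GROUP BY employee_id
HAVING COUNT(employee_id) = 1;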
The first part of the query selects employees who have marked a department as
their primary department ( primary_flag = 'Y' ).
The second part of the query uses GROUP BY to find employees who belong to only
one department (i.e., where COUNT(employee_id) = 1 ).
The UNION combines the results of these two queries, ensuring that duplicate
entries are removed (since UNION eliminates duplicates).
Performance could suffer due to the need to scan the table twice, once for each
SELECT query.
Alternative Solution
-- Using OR Condition
SELECT
employee_id,
department_id
FROM
Employee
WHERE
primary_flag = 'Y'
OR
employee_id IN (
SELECT
employee_id
FROM
Employee
GROUP BY
employee_id
HAVING
COUNT(*) = 1
);
The first condition ( primary_flag = 'Y' ) selects employees who have a primary
department.
The second condition uses a subquery with GROUP BY to find employees who belong
to only one department. The subquery returns the employee_id s of employees who
belong to exactly one department, and the outer query checks if the employee_id is
in this result.
The OR operator ensures that employees who either have a primary department or
belong to only one department are included in the result.
This query avoids the second SELECT of the UNION version and expresses the filtering
in a single query. However, the IN subquery still reads the Employee table, so any
speed difference depends on the optimizer.
Unlike UNION, this query does not deduplicate its output. That is not a problem here:
each row of Employee is tested once, so it can appear in the result at most once.
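This sketch runs both solutions on the same data, using SQLite via Python's sqlite3 module. The rows are hypothetical, chosen to produce the expected output above. ORDER BY clauses are added so the two results can be compared directly.

```python
import sqlite3

# Hypothetical rows chosen to reproduce the expected output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (employee_id INTEGER, department_id INTEGER, primary_flag TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, 1, "N"), (2, 1, "Y"), (2, 2, "N"), (3, 3, "N"),
     (4, 2, "N"), (4, 3, "Y"), (4, 4, "N")],
)

union_version = """
SELECT employee_id, department_id FROM Employee WHERE primary_flag = 'Y'
UNION
SELECT employee_id, MAX(department_id) FROM Employee
GROUP BY employee_id HAVING COUNT(*) = 1
ORDER BY 1
"""
or_version = """
SELECT employee_id, department_id FROM Employee
WHERE primary_flag = 'Y'
   OR employee_id IN (SELECT employee_id FROM Employee
                      GROUP BY employee_id HAVING COUNT(*) = 1)
ORDER BY employee_id
"""
via_union = conn.execute(union_version).fetchall()
via_or = conn.execute(or_version).fetchall()
print(via_union)            # [(1, 1), (2, 1), (3, 3), (4, 3)]
print(via_union == via_or)  # True
```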
Question 32
Triangle Judgement
Difficulty: Easy
#### Example
**Input**:
Triangle table:
| x | y | z |
|----|----|----|
| 13 | 15 | 30 |
| 10 | 20 | 15 |
**Output**:
| x | y | z | triangle |
|----|----|----|----------|
| 13 | 15 | 30 | No |
| 10 | 20 | 15 | Yes |
**Explanation**:
- For the first row, 13 + 15 is not greater than 30,
so the three sides cannot form a triangle.
- For the second row, all conditions of the triangle
inequality theorem are satisfied, so the three sides form a triangle.
[LeetCode 610]
Solution
SELECT x, y, z,
CASE
WHEN x + y > z AND x + z > y AND y + z > x
THEN 'Yes'
ELSE 'No'
END AS triangle
FROM triangle;
SELECT x, y, z : This part retrieves the values of x, y, and z from the table
named triangle .
CASE ... END : This is a conditional statement used to create a new column called
triangle that checks whether the values of x, y, and z can form a valid
triangle.
WHEN x + y > z AND x + z > y AND y + z > x : This condition checks if the three
values satisfy the triangle inequality theorem, which states that for any three sides
of a triangle:
The sum of any two sides must be greater than the third side.
If all three conditions are true, then the values can form a triangle.
THEN 'Yes' : If the condition is met (i.e., the three sides form a triangle), the query
will return "Yes" for that row.
ELSE 'No' : If the condition is not met, the query will return "No", indicating the
values do not form a triangle.
Question 33
Consecutive Numbers
Difficulty: Medium
#### Example
**Input**:
Logs table:
| id | num |
|----|-----|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 1 |
| 6 | 2 |
| 7 | 2 |
**Output**:
| ConsecutiveNums |
|-----------------|
| 1 |
**Explanation**:
- Number `1` appears consecutively three times from `id` 1 to `id` 3.
- Number `2` only appears consecutively twice, so it does not meet
the requirement.
[LeetCode 180]
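Solution
SELECT DISTINCT l1.num AS ConsecutiveNums
FROM logs l1
JOIN logs l2 ON l2.id = l1.id + 1
JOIN logs l3 ON l3.id = l2.id + 1
WHERE l1.num = l2.num AND l2.num = l3.num;
This query uses self-joins to find three consecutive rows in the logs table where the num
value is the same.
l1 , l2 , and l3 represent three different rows from the same table logs , joined so that
each row's id is one more than the previous row's id.
l1.num = l2.num and l2.num = l3.num : the num value must be the same in all
three consecutive rows.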
The DISTINCT ensures only unique num values are returned as ConsecutiveNums .
This query requires multiple joins, which may not be efficient for large datasets.
Each row is compared with its preceding and succeeding rows, making the
computation more expensive.
Alternative Solution
WITH cte AS (
SELECT num, id,
LAG(num, 1) OVER (ORDER BY id) AS prev_num,
LEAD(num, 1) OVER (ORDER BY id) AS next_num,
LAG(id, 1) OVER (ORDER BY id) AS prev_id,
LEAD(id, 1) OVER (ORDER BY id) AS next_id
FROM logs
)
SELECT DISTINCT num AS ConsecutiveNums
FROM cte
WHERE num = prev_num AND prev_num = next_num
AND id = prev_id + 1 AND id = next_id - 1;
This query uses window functions ( LAG and LEAD ) to look at previous and next
rows within the same result set without needing self-joins.
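num = prev_num AND prev_num = next_num : Ensures that num is the same across
three consecutive rows.
id = prev_id + 1 AND id = next_id - 1 : Ensures that the neighboring rows have
adjacent ids, so a gap in the id sequence is not mistaken for consecutive rows.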
More efficient: The window functions read the previous and next rows during a single
ordered pass over the table, avoiding the need for multiple joins.
Clearer logic: LAG and LEAD state directly which neighboring values are compared,
which makes the query easier to follow.
Comparison:
Efficiency: The second query is generally more efficient because it replaces two
self-joins with window functions that inspect adjacent rows. The actual difference
depends on the engine and on indexing; with an index on id, the self-joins can also be cheap.
Simplicity: The second query expresses the comparison with adjacent rows through LAG
and LEAD instead of encoding it in join conditions.
Joins vs Window Functions: The first query relies on self-joins, which can be
slower for larger datasets, while the second query uses window functions, which
are typically faster for this kind of operation.
Flexibility: The second query can be more flexible in cases where you might need
more complex window-based operations or comparisons.
In summary, while both queries solve the same problem, the second query is usually more
efficient and more modern due to the use of window functions.
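Here is a minimal sketch that runs the self-join version and the LAG/LEAD version on the example Logs table. It uses SQLite through Python's sqlite3 module; window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, num INTEGER)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 1), (6, 2), (7, 2)],
)

# Self-join version: l2 and l3 are the next two rows after l1.
self_join = """
SELECT DISTINCT l1.num AS ConsecutiveNums
FROM logs l1
JOIN logs l2 ON l2.id = l1.id + 1
JOIN logs l3 ON l3.id = l2.id + 1
WHERE l1.num = l2.num AND l2.num = l3.num
"""

# Window-function version: compare each row with its neighbours by id.
window = """
WITH cte AS (
    SELECT num, id,
           LAG(num, 1) OVER (ORDER BY id) AS prev_num,
           LEAD(num, 1) OVER (ORDER BY id) AS next_num,
           LAG(id, 1) OVER (ORDER BY id) AS prev_id,
           LEAD(id, 1) OVER (ORDER BY id) AS next_id
    FROM logs
)
SELECT DISTINCT num AS ConsecutiveNums
FROM cte
WHERE num = prev_num AND prev_num = next_num
  AND id = prev_id + 1 AND id = next_id - 1
"""
self_join_rows = conn.execute(self_join).fetchall()
window_rows = conn.execute(window).fetchall()
print(self_join_rows, window_rows)  # [(1,)] [(1,)]
```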
Question 34
Difficulty: Medium
#### Example
**Input**:
Products table:
**Output**:
| product_id | price |
|------------|-------|
| 2 | 50 |
| 1 | 35 |
| 3 | 10 |
**Explanation**:
- Product `1` has its price changed to `35` on `2019-08-16`.
- Product `2` has its price changed to `50` on `2019-08-14`,
and no changes on `2019-08-16`, so its price remains `50`.
- Product `3` has no price changes before `2019-08-16`,
so its price is the default of `10`.
[LeetCode 1164]
Solution
WITH LatestChangeDate AS (
SELECT product_id, MAX(change_date) AS last_change_date
FROM products
WHERE change_date <= '2019-08-16'
GROUP BY product_id
)
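SELECT p.product_id, p.new_price AS price -- new_price: price column in the LeetCode 1164 schema
FROM products p
JOIN LatestChangeDate lcd
ON p.product_id = lcd.product_id AND p.change_date = lcd.last_change_date
UNION
SELECT DISTINCT product_id, 10 AS price
FROM products
WHERE product_id NOT IN (SELECT product_id FROM LatestChangeDate);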
The LatestChangeDate CTE keeps only price changes made on or before '2019-08-16' ,
and the GROUP BY product_id returns the last change date for each product up to
that date.
The first part of the UNION retrieves the most recent price for each of those products,
based on the latest change date.
The JOIN is performed between the original products table ( p ) and the
LatestChangeDate CTE ( lcd ). Matching on both p.product_id = lcd.product_id and
p.change_date = lcd.last_change_date ensures that only the most recent price for each
product is selected.
The second part of the UNION selects products that were not part of the
LatestChangeDate CTE, meaning they didn’t have any price change before or on
'2019-08-16' .
Summary:
The query first finds the latest price for products that had price changes before or
on '2019-08-16' by using the LatestChangeDate CTE.
Products without a price change before that date are assigned a default price of
10 .
The UNION merges both results, giving the final output of products with either
their latest price or a default price.
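This sketch runs the CTE-plus-UNION approach on SQLite via Python's sqlite3 module. The column names product_id, new_price and change_date are assumptions, since the input table is not shown above. The rows are hypothetical but consistent with the explanation.

```python
import sqlite3

# Assumed schema (product_id, new_price, change_date); hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER, new_price INTEGER, change_date TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    (1, 20, "2019-08-14"), (2, 50, "2019-08-14"), (1, 30, "2019-08-15"),
    (1, 35, "2019-08-16"), (2, 65, "2019-08-17"), (3, 20, "2019-08-18"),
])

query = """
WITH LatestChangeDate AS (
    SELECT product_id, MAX(change_date) AS last_change_date
    FROM products
    WHERE change_date <= '2019-08-16'
    GROUP BY product_id
)
-- Products changed on or before the date: price set by their last change.
SELECT p.product_id, p.new_price AS price
FROM products p
JOIN LatestChangeDate lcd
  ON p.product_id = lcd.product_id AND p.change_date = lcd.last_change_date
UNION
-- Products with no change yet: default price of 10.
SELECT product_id, 10 AS price
FROM products
WHERE product_id NOT IN (SELECT product_id FROM LatestChangeDate)
ORDER BY 1
"""
prices = conn.execute(query).fetchall()
print(prices)  # [(1, 35), (2, 50), (3, 10)]
```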
Question 35
Difficulty: Medium
#### Example
**Input**:
Queue table:
**Output**:
| person_name |
|-------------|
| John Cena |
**Explanation**:
The table ordered by `turn` looks like this:
- John Cena is the last person to fit on the bus before the total
weight exceeds 1000 kg.
[LeetCode 1204]
Solution
WITH CumulativeWeightQueue AS (
SELECT
person_name,
weight,
SUM(weight) OVER (ORDER BY turn) AS cumulative_weight
FROM queue
)
SELECT person_name
FROM CumulativeWeightQueue
WHERE cumulative_weight <= 1000
ORDER BY cumulative_weight DESC
LIMIT 1;
The Common Table Expression (CTE) CumulativeWeightQueue uses SUM(weight) OVER (ORDER BY turn) to add up the weights in boarding order.
This gives us the cumulative weight for each person in the queue as they are
processed one by one.
The main query selects the person_name from the CumulativeWeightQueue CTE
where the cumulative weight is less than or equal to 1000 .
The LIMIT 1 ensures that we only return the name of the person whose cumulative
weight is the highest while still being less than or equal to 1000 .
Summary:
The query calculates a running total of weights for people in the queue.
It then selects the person whose cumulative weight is the largest while still being
less than or equal to 1000 , effectively finding the last person who can board the bus
without exceeding the weight limit.
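You can inspect the running total with a small sketch on SQLite via Python's sqlite3 module. The queue rows (person_id, weights and turns) are hypothetical, chosen so that John Cena is the last person who fits.

```python
import sqlite3

# Hypothetical queue; running totals by turn are 250, 600, 1000, 1200, 1375, 1875.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queue (person_id INTEGER, person_name TEXT, weight INTEGER, turn INTEGER)")
conn.executemany("INSERT INTO queue VALUES (?, ?, ?, ?)", [
    (5, "Alice", 250, 1), (4, "Bob", 175, 5), (3, "Alex", 350, 2),
    (6, "John Cena", 400, 3), (1, "Winston", 500, 6), (2, "Marie", 200, 4),
])

query = """
WITH CumulativeWeightQueue AS (
    SELECT person_name, weight,
           SUM(weight) OVER (ORDER BY turn) AS cumulative_weight
    FROM queue
)
SELECT person_name
FROM CumulativeWeightQueue
WHERE cumulative_weight <= 1000
ORDER BY cumulative_weight DESC
LIMIT 1
"""
last_to_board = conn.execute(query).fetchone()[0]
print(last_to_board)  # John Cena
```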
Question 36
Count Salary Categories
Difficulty: Medium
The result must contain all three categories. If there are no accounts in
a category, return `0`.
#### Example
**Input**:
Accounts table:
| account_id | income |
|------------|--------|
| 3 | 108939 |
| 2 | 12747 |
| 8 | 87709 |
| 6 | 91796 |
**Output**:
| category | accounts_count |
|----------------|----------------|
| Low Salary | 1 |
| Average Salary | 0 |
| High Salary | 3 |
**Explanation**:
- **Low Salary**: Account 2 has an income strictly less than $20,000.
- **Average Salary**: No accounts have income between $20,000 and
$50,000 inclusive.
- **High Salary**: Accounts 3, 6, and 8 have incomes strictly greater
than $50,000.
[LeetCode 1907]
Solution
WITH accounts_cat_count AS
(
SELECT
SUM(IF(income < 20000, 1, 0)) AS "Low_Salary",
SUM(IF(income >= 20000 AND income <= 50000, 1, 0))
AS "Average_Salary",
SUM(IF(income > 50000, 1, 0)) AS "High_Salary"
FROM Accounts
)
SELECT 'High Salary' AS category, High_Salary AS accounts_count
FROM accounts_cat_count
UNION ALL
SELECT 'Low Salary', Low_Salary
FROM accounts_cat_count
UNION ALL
SELECT 'Average Salary', Average_Salary
FROM accounts_cat_count;
Aggregation in a CTE:
Single Aggregation: This solution computes all categories in one go, aggregating
the counts for “Low_Salary,” “Average_Salary,” and “High_Salary” in a single
CTE.
Efficiency: This approach is potentially faster for large datasets because it scans
the Accounts table once and performs the counting in a single query. This can be
more efficient than multiple table scans.
Structure: It requires you to access the counts multiple times from the
accounts_cat_count CTE in the UNION ALL .
Alternative Solution
WITH cte AS (
SELECT "Low Salary" AS category, COUNT(*) AS accounts_count
FROM Accounts
WHERE income < 20000
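UNION
SELECT "Average Salary", COUNT(*)
FROM Accounts
WHERE income BETWEEN 20000 AND 50000
UNION
SELECT "High Salary", COUNT(*)
FROM Accounts
WHERE income > 50000
)
SELECT category, accounts_count
FROM cte;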
Multiple Scans: Each SELECT statement with WHERE clauses runs independently,
meaning the table is scanned multiple times (three times in this case). This can
lead to slower performance on large datasets compared to the aggregation-based
solution.
Flexibility: This method is easier to understand and maintain, as each category
has its own filtering logic explicitly written in the WHERE clause. If more categories
or complex conditions were needed, they could be added easily.
Efficiency
Solution 1 (Aggregation) is likely more efficient for large datasets because it scans
the Accounts table only once.
Solution 2 (Filtering) scans the table separately for each salary category, resulting
in three scans of the Accounts table. This could be less performant for large
datasets.
Solution 1 (Aggregation):
More compact: The logic for all salary categories is contained within a single
query.
Difficult to extend: If you need to add more salary ranges or categories, the logic
would become increasingly complicated, as you’d need to add more conditional
aggregations.
Solution 2 (Filtering):
More readable: The logic is straightforward, with each SELECT statement clearly
defining the condition for each salary range. This makes the query easier to read
and extend if needed.
Ordering
Solution 1 (Aggregation) does not include an ORDER BY clause. The result will be
in the default UNION ALL order, which depends on how the database processes the
query. If you want to order the results by accounts_count , you would need to add
an ORDER BY clause to the final query.
Solution 1 uses UNION ALL which keeps all results and avoids the overhead of
deduplication that occurs with UNION .
Conclusion:
Performance: Solution 1 is generally more efficient for large datasets due to fewer
table scans. It is ideal for scenarios where performance is critical.
Use Case:
If you’re working with a large dataset and want better performance, Solution 1 is
likely the better choice.
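This sketch runs the aggregation solution and the filtering solution side by side on SQLite via Python's sqlite3 module, using the example Accounts rows. For portability, it uses CASE WHEN ... THEN 1 ELSE 0 END instead of MySQL's IF().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (account_id INTEGER, income INTEGER)")
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?)",
    [(3, 108939), (2, 12747), (8, 87709), (6, 91796)],
)

# Solution 1: one scan with conditional counts.
aggregation = """
WITH accounts_cat_count AS (
    SELECT
        SUM(CASE WHEN income < 20000 THEN 1 ELSE 0 END) AS Low_Salary,
        SUM(CASE WHEN income BETWEEN 20000 AND 50000 THEN 1 ELSE 0 END) AS Average_Salary,
        SUM(CASE WHEN income > 50000 THEN 1 ELSE 0 END) AS High_Salary
    FROM Accounts
)
SELECT 'High Salary' AS category, High_Salary AS accounts_count FROM accounts_cat_count
UNION ALL
SELECT 'Low Salary', Low_Salary FROM accounts_cat_count
UNION ALL
SELECT 'Average Salary', Average_Salary FROM accounts_cat_count
"""

# Solution 2: one filtered COUNT(*) per category.
filtering = """
SELECT 'Low Salary' AS category, COUNT(*) AS accounts_count FROM Accounts WHERE income < 20000
UNION
SELECT 'Average Salary', COUNT(*) FROM Accounts WHERE income BETWEEN 20000 AND 50000
UNION
SELECT 'High Salary', COUNT(*) FROM Accounts WHERE income > 50000
"""
counts_1 = dict(conn.execute(aggregation).fetchall())
counts_2 = dict(conn.execute(filtering).fetchall())
print(counts_1["Average Salary"])  # 0
print(counts_1 == counts_2)        # True
```

Consider one edge case. If Accounts were empty, the SUM calls in Solution 1 would return NULL rather than 0, while COUNT(*) in Solution 2 would still return 0. Wrapping each SUM in COALESCE(..., 0) closes that gap.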
Question 37
Difficulty: Easy
#### Example
**Input**:
Employees table:
**Output**:
| employee_id |
|-------------|
| 11 |
**Explanation**:
- The employees with a salary less than $30,000 are 1 (Kalel) and 11 (Joziah).
- Kalel's manager is employee 11, who is still in the company (Joziah).
- Joziah's manager is employee 6, who left the company because there is no
row for employee 6 as it was deleted.
[LeetCode 1978]
Solution
SELECT employee_id
FROM employees
WHERE manager_id NOT IN (SELECT employee_id FROM employees)
AND salary < 30000
ORDER BY employee_id;
Main Query: Selects employee_id from the employees table.
It keeps only employees whose manager_id does not appear among the employee_id
values in the table, i.e., employees whose manager has left the company. Employees
with a NULL manager_id are not returned: NULL NOT IN (...) evaluates to unknown
rather than true, so those rows fail the WHERE clause.
salary < 30000 : Further filters the result to include only employees with a salary
below 30,000.
The query retrieves employees whose manager has left the company and who earn less
than 30,000.
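A short sketch on SQLite via Python's sqlite3 module confirms how NOT IN treats a NULL manager_id. The rows are hypothetical. Kalel and Joziah follow the explanation, and employee 20 is an extra low earner whose manager_id is NULL.

```python
import sqlite3

# Hypothetical rows; employee 20 has no manager at all (manager_id NULL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT,"
    " manager_id INTEGER, salary INTEGER)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, "Kalel", 11, 21241), (11, "Joziah", 6, 28485),
    (9, "Mikaela", None, 50937), (20, "Zed", None, 10000),
])

query = """
SELECT employee_id
FROM employees
WHERE manager_id NOT IN (SELECT employee_id FROM employees)
  AND salary < 30000
ORDER BY employee_id
"""
result = [row[0] for row in conn.execute(query)]
# Employee 20 is excluded: NULL NOT IN (...) is unknown, not true.
print(result)  # [11]
```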
Question 38
Exchange Seats
Difficulty: Medium
- `id` is the primary key (unique value) column for this table.
- Each row of this table indicates the name and the ID of a student.
- The ID sequence always starts from 1 and increments continuously.
#### Example
**Input**:
Seat table:
| id | student |
|----|---------|
| 1 | Abbot |
| 2 | Doris |
| 3 | Emerson |
| 4 | Green |
| 5 | Jeames |
**Output**:
| id | student |
|----|---------|
| 1 | Doris |
| 2 | Abbot |
| 3 | Green |
| 4 | Emerson |
| 5 | Jeames |
**Explanation**:
- Note that if the number of students is odd, there is no need to change
the last student's seat. In this case, the last student is Jeames,
and his seat remains unchanged.
[LeetCode 626]
Solution
WITH max_id AS (
SELECT MAX(id) AS mid
FROM seat
)
SELECT
CASE
WHEN id % 2 = 1 AND id != max_id.mid THEN id + 1
WHEN id % 2 = 0 THEN id - 1
ELSE id
END AS id,
student
FROM seat, max_id
ORDER BY id;
WITH max_id AS (SELECT MAX(id) AS mid FROM seat) : This part calculates the
maximum id value from the seat table and stores it in a CTE called max_id with
an alias mid .
Main Query:
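SELECT Statement: The main query selects and transforms the id and student columns.
CASE Statement:
WHEN id % 2 = 1 AND id != max_id.mid THEN id + 1 : An odd id that is not the last
seat moves to the next seat.
WHEN id % 2 = 0 THEN id - 1 : An even id moves to the previous seat.
ELSE id : If neither condition is met (i.e., the id is odd and is also the last seat), it
keeps the id unchanged.
FROM Clause: Uses a Cartesian join between the seat table and the single-row max_id CTE,
making the maximum id value available for use in the CASE logic.
ORDER BY id : Sorts by the new, swapped id, so the output lists the seats in order.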
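Here is a minimal sketch that runs the query on the example Seat table, using SQLite through Python's sqlite3 module. As in MySQL, an unqualified name in ORDER BY resolves to the output alias first. Here that alias is the new id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (id INTEGER PRIMARY KEY, student TEXT)")
conn.executemany("INSERT INTO seat VALUES (?, ?)", [
    (1, "Abbot"), (2, "Doris"), (3, "Emerson"), (4, "Green"), (5, "Jeames"),
])

query = """
WITH max_id AS (SELECT MAX(id) AS mid FROM seat)
SELECT
    CASE
        WHEN id % 2 = 1 AND id != max_id.mid THEN id + 1  -- odd, not last: next seat
        WHEN id % 2 = 0 THEN id - 1                        -- even: previous seat
        ELSE id                                            -- odd and last: stay put
    END AS id,
    student
FROM seat, max_id
ORDER BY id  -- the output alias, i.e. the new seat number
"""
swapped = conn.execute(query).fetchall()
print(swapped)
# [(1, 'Doris'), (2, 'Abbot'), (3, 'Green'), (4, 'Emerson'), (5, 'Jeames')]
```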
Question 39
Movie Rating
Difficulty: Medium
- `movie_id` is the primary key (column with unique values) for this table.
- `title` is the name of the movie.
- `user_id` is the primary key (column with unique values) for this table.
- The column `name` has unique values.
#### Example
**Input**:
Movies table:
| movie_id | title |
|----------|----------|
| 1 | Avengers |
| 2 | Frozen 2 |
| 3 | Joker |
Users table:
| user_id | name |
|---------|--------|
| 1 | Daniel |
| 2 | Monica |
| 3 | Maria |
| 4 | James |
MovieRating table:
**Output**:
| results |
|-----------|
| Daniel |
| Frozen 2 |
**Explanation**:
- Daniel and Monica have rated 3 movies ("Avengers", "Frozen 2" and "Joker"),
but Daniel is lexicographically smaller.
- "Frozen 2" and "Joker" have an average rating of 3.5 in February,
but "Frozen 2" is lexicographically smaller.
[LeetCode 1341]
Solution
WITH results1 AS (
SELECT u.name AS results
FROM MovieRating AS r
JOIN Users AS u ON u.user_id = r.user_id
GROUP BY u.user_id
ORDER BY COUNT(*) DESC, u.name ASC
LIMIT 1
),
febmov AS (
SELECT movie_id, AVG(rating) AS av
FROM MovieRating
WHERE created_at >= '2020-02-01' AND created_at <= '2020-02-29'
GROUP BY movie_id
),
results2 AS (
SELECT m.title AS results
FROM Movies AS m
JOIN febmov fm ON m.movie_id = fm.movie_id
ORDER BY fm.av DESC, m.title ASC
LIMIT 1
)
SELECT results FROM results1
UNION ALL
SELECT results FROM results2;
results1 CTE:
This common table expression finds the user with the highest number of movie
ratings.
It joins the MovieRating table with the Users table to group ratings by each user.
The result is ordered by the number of ratings in descending order and then by the
user’s name in ascending order. The top user (with the most ratings) is selected
using LIMIT 1 .
febmov CTE:
This CTE calculates the average rating for each movie in February 2020.
The WHERE clause filters ratings within the specified date range (February 1st to
February 29th).
Results are grouped by movie_id , with the average rating ( av ) calculated for each
movie.
results2 CTE:
This part selects the title of the movie with the highest average rating in February
2020.
It joins the Movies table with the febmov CTE to order by the average rating in
descending order and by movie title in ascending order. The top-rated movie is
selected using LIMIT 1 .
Final Query:
The final SELECT statement combines the results from results1 and results2
using UNION ALL . This will display the user with the most ratings and the highest-
rated movie in February 2020, both in a single output.
Summary:
The query retrieves the top user based on the number of movie ratings and the
highest-rated movie in February 2020, presenting them in a combined result set.
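This sketch runs the three CTEs together with a final UNION ALL of results1 and results2 on SQLite via Python's sqlite3 module. The Movies and Users rows come from the example. The MovieRating rows are hypothetical, chosen to reproduce the output and explanation above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (movie_id INTEGER, title TEXT);
CREATE TABLE Users (user_id INTEGER, name TEXT);
CREATE TABLE MovieRating (movie_id INTEGER, user_id INTEGER, rating INTEGER, created_at TEXT);
""")
conn.executemany("INSERT INTO Movies VALUES (?, ?)",
                 [(1, "Avengers"), (2, "Frozen 2"), (3, "Joker")])
conn.executemany("INSERT INTO Users VALUES (?, ?)",
                 [(1, "Daniel"), (2, "Monica"), (3, "Maria"), (4, "James")])
# Hypothetical ratings: Daniel and Monica rate 3 movies each; in February,
# Frozen 2 and Joker both average 3.5.
conn.executemany("INSERT INTO MovieRating VALUES (?, ?, ?, ?)", [
    (1, 1, 3, "2020-01-12"), (1, 2, 4, "2020-02-11"), (1, 3, 2, "2020-02-12"),
    (1, 4, 1, "2020-01-01"), (2, 1, 5, "2020-02-17"), (2, 2, 2, "2020-02-01"),
    (2, 3, 2, "2020-03-01"), (3, 1, 3, "2020-02-22"), (3, 2, 4, "2020-02-25"),
])

query = """
WITH results1 AS (
    SELECT u.name AS results
    FROM MovieRating AS r
    JOIN Users AS u ON u.user_id = r.user_id
    GROUP BY u.user_id
    ORDER BY COUNT(*) DESC, u.name ASC
    LIMIT 1
),
febmov AS (
    SELECT movie_id, AVG(rating) AS av
    FROM MovieRating
    WHERE created_at >= '2020-02-01' AND created_at <= '2020-02-29'
    GROUP BY movie_id
),
results2 AS (
    SELECT m.title AS results
    FROM Movies AS m
    JOIN febmov fm ON m.movie_id = fm.movie_id
    ORDER BY fm.av DESC, m.title ASC
    LIMIT 1
)
SELECT results FROM results1
UNION ALL
SELECT results FROM results2
"""
answers = [row[0] for row in conn.execute(query)]
print(answers)  # ['Daniel', 'Frozen 2']
```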
Question 40
Restaurant Growth
Difficulty: Medium
#### Example
**Input**:
Customer table:
| customer_id | name | visited_on | amount |
|-------------|---------|------------|--------|
| 1 | Jhon | 2019-01-01 | 100 |
| 2 | Daniel | 2019-01-02 | 110 |
| 3 | Jade | 2019-01-03 | 120 |
| 4 | Khaled | 2019-01-04 | 130 |
| 5 | Winston | 2019-01-05 | 110 |
| 6 | Elvis | 2019-01-06 | 140 |
| 7 | Anna | 2019-01-07 | 150 |
| 8 | Maria | 2019-01-08 | 80 |
| 9 | Jaze | 2019-01-09 | 110 |
| 1 | Jhon | 2019-01-10 | 130 |
| 3 | Jade | 2019-01-10 | 150 |
**Output**:
**Explanation**:
- The 1st moving average from 2019-01-01 to 2019-01-07 has an `average_amount`
of (100 + 110 + 120 + 130 + 110 + 140 + 150)/7 = 122.86.
- The 2nd moving average from 2019-01-02 to 2019-01-08 has an `average_amount`
of (110 + 120 + 130 + 110 + 140 + 150 + 80)/7 = 120.
- The 3rd moving average from 2019-01-03 to 2019-01-09 has an `average_amount`
of (120 + 130 + 110 + 140 + 150 + 80 + 110)/7 = 120.
- The 4th moving average from 2019-01-04 to 2019-01-10 has an `average_amount`
of (130 + 110 + 140 + 150 + 80 + 110 + 130 + 150)/7 = 142.86.
[LeetCode 1321]
Solution
WITH customer_aggregated AS (
SELECT visited_on, SUM(amount) AS amount
FROM customer
GROUP BY visited_on
),
7_days_results AS (
SELECT
visited_on,
SUM(amount) OVER (
ORDER BY visited_on
ROWS BETWEEN 6 PRECEDING AND 0 FOLLOWING
) AS amount,
ROUND(AVG(amount) OVER (
ORDER BY visited_on
ROWS BETWEEN 6 PRECEDING AND 0 FOLLOWING
), 2) AS average_amount
FROM customer_aggregated
ORDER BY visited_on
)
SELECT *
FROM 7_days_results
WHERE visited_on >= (
SELECT MIN(visited_on) + INTERVAL 6 DAY
FROM customer_aggregated
);
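customer_aggregated CTE:
This CTE adds up the amount for each visited_on date, so a day with several customers
becomes a single row.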
7_days_results CTE:
This CTE performs a 7-day window analysis on the aggregated sales data.
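amount : The sum of the amount values over a 7-day window, defined by the frame
ROWS BETWEEN 6 PRECEDING AND 0 FOLLOWING . The frame covers the current row and
the six previous rows. Because customer_aggregated has one row per date, that
corresponds to the current day and the six previous days, provided no date is missing.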
average_amount : The average amount spent over the same 7-day window, rounded
to two decimal places.
Final Query:
The final SELECT statement filters the 7_days_results to include only those dates
that are at least 6 days after the earliest visited_on date.
Summary:
The query analyzes customer spending over rolling 7-day windows and retrieves
the sum and average for each date, starting from the first complete 7-day window.
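You can reproduce the rolling-window query on SQLite via Python's sqlite3 module with the example Customer rows. This sketch makes two dialect changes. First, the CTE is renamed to seven_day_results, because SQLite does not accept an unquoted name that starts with a digit. Second, MIN(visited_on) + INTERVAL 6 DAY becomes date(MIN(visited_on), '+6 days').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, name TEXT, visited_on TEXT, amount INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    (1, "Jhon", "2019-01-01", 100), (2, "Daniel", "2019-01-02", 110),
    (3, "Jade", "2019-01-03", 120), (4, "Khaled", "2019-01-04", 130),
    (5, "Winston", "2019-01-05", 110), (6, "Elvis", "2019-01-06", 140),
    (7, "Anna", "2019-01-07", 150), (8, "Maria", "2019-01-08", 80),
    (9, "Jaze", "2019-01-09", 110), (1, "Jhon", "2019-01-10", 130),
    (3, "Jade", "2019-01-10", 150),
])

query = """
WITH customer_aggregated AS (
    SELECT visited_on, SUM(amount) AS amount   -- one row per date
    FROM customer
    GROUP BY visited_on
),
seven_day_results AS (
    SELECT visited_on,
           SUM(amount) OVER (ORDER BY visited_on
                             ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS amount,
           ROUND(AVG(amount) OVER (ORDER BY visited_on
                                   ROWS BETWEEN 6 PRECEDING AND CURRENT ROW), 2) AS average_amount
    FROM customer_aggregated
)
SELECT *
FROM seven_day_results
WHERE visited_on >= (SELECT date(MIN(visited_on), '+6 days') FROM customer_aggregated)
ORDER BY visited_on
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('2019-01-07', 860, 122.86)
# ('2019-01-08', 840, 120.0)
# ('2019-01-09', 840, 120.0)
# ('2019-01-10', 1000, 142.86)
```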
Question 41
Difficulty: Medium
#### Example
**Input**:
RequestAccepted table:
**Output**:
| id | num |
|----|-----|
| 3 | 3 |
**Explanation**:
- The person with `id 3` is friends with people 1, 2, and 4,
so they have 3 friends, which is the most among all others.
#### Follow-up
In a real-world scenario, multiple people could have the same
number of friends. Can you find all such people in this case?
[LeetCode 602]
Solution
WITH combined_counts AS (
SELECT requester_id AS id, COUNT(*) AS cnt
FROM RequestAccepted
GROUP BY requester_id
UNION ALL
SELECT accepter_id AS id, COUNT(*) AS cnt
FROM RequestAccepted
GROUP BY accepter_id
)
SELECT id, SUM(cnt) AS num
FROM combined_counts
GROUP BY id
ORDER BY num DESC
LIMIT 1;
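First part: It counts how many times each requester_id appears in the
RequestAccepted table.
Second part: It counts how many times each accepter_id appears in the same table.
The UNION ALL keeps every row from both parts, so both the requests initiated and
accepted by each user are collected. Plain UNION would merge a user's requester row and
accepter row whenever their id and cnt happen to match, undercounting that user's friends.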
Main Query:
The main query aggregates the counts ( cnt ) for each id (which is either a
requester_id or accepter_id ) using SUM(cnt) to get the total number of
interactions per user.
GROUP BY id is used to group all interactions for each unique user.
The results are ordered by the total number of interactions ( num ) in descending
order ( ORDER BY num DESC ).
The LIMIT 1 ensures that only the user with the highest number of interactions is
returned.
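The follow-up asks for every user tied for the most friends, which LIMIT 1 cannot do. A minimal sketch, assuming SQLite via Python's sqlite3 in place of MySQL and a RequestAccepted table reduced to its two id columns: it keeps the combined_counts CTE, totals the counts in a second CTE (named totals here, a hypothetical name), and keeps every user whose total equals MAX(num).

```python
import sqlite3

MOST_FRIENDS_SQL = """
WITH combined_counts AS (
    SELECT requester_id AS id, COUNT(*) AS cnt
    FROM RequestAccepted
    GROUP BY requester_id
    UNION ALL  -- keep duplicates: each row is a partial count for one user
    SELECT accepter_id AS id, COUNT(*) AS cnt
    FROM RequestAccepted
    GROUP BY accepter_id
),
totals AS (
    SELECT id, SUM(cnt) AS num
    FROM combined_counts
    GROUP BY id
)
-- return every user whose total equals the maximum, not just one
SELECT id, num
FROM totals
WHERE num = (SELECT MAX(num) FROM totals)
ORDER BY id
"""


def most_friends(pairs):
    """pairs: iterable of (requester_id, accepter_id) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE RequestAccepted (requester_id INTEGER, accepter_id INTEGER)"
    )
    conn.executemany("INSERT INTO RequestAccepted VALUES (?, ?)", pairs)
    result = conn.execute(MOST_FRIENDS_SQL).fetchall()
    conn.close()
    return result
```

With the friendships 1-2, 1-3, 2-3 and 3-4 only user 3 is returned; adding a 4-1 friendship makes users 1 and 3 tie at three friends, and both are returned.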
Question 42
Difficulty: Medium
#### Example
**Input**:
Insurance table:
**Output**:
| tiv_2016 |
|----------|
| 45.00 |
**Explanation**:
- The first and fourth records meet both conditions:
- The `tiv_2015` value 10 is shared with other records (records 1, 3, 4).
- Their `(lat, lon)` locations are unique.
- The second record fails the first condition, and the third record fails
the second condition (same location as record 2).
- The sum of `tiv_2016` for records 1 and 4 is `5 + 40 = 45.00`.
[LeetCode 585]
Solution
WITH tiv_2015_counts AS (
SELECT tiv_2015, COUNT(tiv_2015) AS cnt
FROM Insurance
GROUP BY tiv_2015
),
location_count AS (
SELECT CONCAT(lat, '-', lon) AS loc, COUNT(*) AS cnt
FROM Insurance
GROUP BY CONCAT(lat, '-', lon)
)
SELECT ROUND(SUM(I.tiv_2016), 2) AS tiv_2016
FROM Insurance I
JOIN tiv_2015_counts T
ON I.tiv_2015 = T.tiv_2015 AND T.cnt >= 2
JOIN location_count L
ON CONCAT(I.lat, '-', I.lon) = L.loc AND L.cnt = 1;
CTE tiv_2015_counts:
This common table expression groups the Insurance table by the tiv_2015
column (Total Insured Value for 2015) and counts the occurrences of each
tiv_2015 value.
The result gives the number of records sharing the same tiv_2015.
CTE location_count:
This CTE creates an identifier for each location by concatenating lat and lon
with a hyphen, then counts how many records share each location.
Main Query:
The main query joins the Insurance table with the two CTEs (tiv_2015_counts
and location_count) using these conditions:
I.tiv_2015 = T.tiv_2015: The tiv_2015 value from the Insurance table matches
a value in the tiv_2015_counts CTE.
T.cnt >= 2: The tiv_2015 value occurs in two or more records.
CONCAT(I.lat, '-', I.lon) = L.loc: The location from the Insurance table
matches the location identifier in the location_count CTE.
L.cnt = 1: No other record shares this location.
It then calculates the total sum of tiv_2016 (Total Insured Value for 2016) for the
matching records and rounds it to two decimal places.
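A runnable sketch of the same filtering, assuming SQLite via Python's sqlite3 instead of MySQL. One deliberate change: location_count groups on the (lat, lon) pair directly instead of on a concatenated string, which avoids building strings and also works in SQLite releases that lack CONCAT(). The sample rows are meant to mirror the example explanation (records 1 and 4 qualify with tiv_2016 values 5 and 40); the other tiv_2016 values and all coordinates are hypothetical.

```python
import sqlite3

INVESTMENTS_SQL = """
WITH tiv_2015_counts AS (
    SELECT tiv_2015, COUNT(*) AS cnt
    FROM Insurance
    GROUP BY tiv_2015
),
location_count AS (
    -- group on the two columns directly instead of a concatenated string
    SELECT lat, lon, COUNT(*) AS cnt
    FROM Insurance
    GROUP BY lat, lon
)
SELECT ROUND(SUM(I.tiv_2016), 2) AS tiv_2016
FROM Insurance I
JOIN tiv_2015_counts T
  ON I.tiv_2015 = T.tiv_2015 AND T.cnt >= 2          -- shared 2015 value
JOIN location_count L
  ON I.lat = L.lat AND I.lon = L.lon AND L.cnt = 1   -- unique location
"""


def total_tiv_2016(rows):
    """rows: iterable of (pid, tiv_2015, tiv_2016, lat, lon) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Insurance "
        "(pid INTEGER, tiv_2015 REAL, tiv_2016 REAL, lat REAL, lon REAL)"
    )
    conn.executemany("INSERT INTO Insurance VALUES (?, ?, ?, ?, ?)", rows)
    (total,) = conn.execute(INVESTMENTS_SQL).fetchone()
    conn.close()
    return total
```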
Question 43
**Difficulty**: Hard
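Write an SQL query to return a result table that lists the department name, employee name, and salary of every employee whose salary is among the top three unique salaries in their department.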
#### Example
**Input**:
**Employee** table:
**Department** table:
| id | name |
|-----|-------|
| 1 | IT |
| 2 | Sales |
**Output**:
**Explanation**:
- In the IT department:
- Max earns the highest unique salary.
- Both Randy and Joe earn the second-highest unique salary.
- Will earns the third-highest unique salary.
- In the Sales department:
- Henry earns the highest salary.
- Sam earns the second-highest salary.
- There is no third-highest salary as there are only two employees.
[LeetCode 185]
Solution
WITH top_3 AS (
SELECT *
FROM (
SELECT DISTINCT departmentId, salary,
DENSE_RANK() OVER (PARTITION BY departmentId ORDER BY salary DESC)
AS rn
FROM employee
) AS subquery
WHERE rn <= 3
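)
SELECT D.name AS Department, E.name AS Employee, E.salary AS Salary
FROM employee E
JOIN department D ON E.departmentId = D.id
JOIN top_3 T ON E.departmentId = T.departmentId AND E.salary = T.salary
ORDER BY E.salary DESC, E.name, D.name;
CTE top_3:
This Common Table Expression (CTE) identifies the top 3 unique salaries in each
department:
DENSE_RANK() numbers the distinct salaries within each department from highest
to lowest; tied salaries share a rank and no rank numbers are skipped.
Only ranks 1 to 3 are retained in the CTE (WHERE rn <= 3), capturing the highest
three unique salaries (or fewer if a department has fewer than three distinct salaries).
Main Query:
The main query joins the employee table (E) with the department table (D) and
the CTE top_3 (T), matching on both department and salary, to retrieve every
employee who earns one of the top 3 unique salaries in their department.
Finally, the query orders the results by E.salary (descending), then by employee
name and department name.
The query returns the top earners from each department, showing each
department’s name, the employee name, and the salary, ordered by salary in
descending order and then alphabetically by employee and department.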
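The same DENSE_RANK() approach can be checked in SQLite (3.25 or later) through Python's sqlite3; this is a sketch, not the author's exact query. It is meant to be run on hypothetical salaries that mirror the example explanation, including an extra IT employee whose salary is the fourth distinct one, and its ORDER BY sorts by department first so the output is easy to compare.

```python
import sqlite3

TOP_THREE_SQL = """
WITH top_3 AS (
    -- one row per (department, distinct salary) with its dense rank
    SELECT DISTINCT departmentId, salary,
           DENSE_RANK() OVER (
               PARTITION BY departmentId ORDER BY salary DESC
           ) AS rn
    FROM Employee
)
SELECT D.name AS Department, E.name AS Employee, E.salary AS Salary
FROM Employee E
JOIN Department D ON E.departmentId = D.id
JOIN top_3 T ON E.departmentId = T.departmentId AND E.salary = T.salary
WHERE T.rn <= 3            -- top three distinct salaries per department
ORDER BY D.name, E.salary DESC, E.name
"""


def top_earners(departments, employees):
    """departments: (id, name); employees: (id, name, salary, departmentId)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Department (id INTEGER, name TEXT)")
    conn.execute(
        "CREATE TABLE Employee "
        "(id INTEGER, name TEXT, salary INTEGER, departmentId INTEGER)"
    )
    conn.executemany("INSERT INTO Department VALUES (?, ?)", departments)
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", employees)
    rows = conn.execute(TOP_THREE_SQL).fetchall()
    conn.close()
    return rows
```

Joining top_3 on both departmentId and salary matters: without the department condition, an employee in one department could match a top salary from another department.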
Question 44
**Difficulty**: Easy
#### Example
**Input**:
Users table:
| user_id | name |
|---------|-------|
| 1 | aLice |
| 2 | bOB |
**Output**:
| user_id | name |
|---------|-------|
| 1 | Alice |
| 2 | Bob |
**Explanation**:
- The name `aLice` should be changed to `Alice`.
- The name `bOB` should be changed to `Bob`.
- The result is ordered by `user_id`.
[LeetCode 1667]
Solution
SELECT
user_id,
CONCAT(UPPER(SUBSTRING(name, 1, 1)), LOWER(SUBSTRING(name, 2))) AS name
FROM users
ORDER BY user_id;
This query retrieves a list of user IDs and formats each user's name to capitalize
only the first letter, while converting the rest of the name to lowercase.
Formatting name :
UPPER(SUBSTRING(name, 1, 1)) : Extracts the first character from the name and
converts it to uppercase.
LOWER(SUBSTRING(name, 2)) : Extracts the rest of the name starting from the second
character and converts it to lowercase.
CONCAT(...) : Combines the formatted parts (uppercase first letter and lowercase
rest of the name) into a single string.
Ordering:
ORDER BY user_id sorts the result by user_id in ascending order.
This query is useful for standardizing how user names are displayed,
ensuring that each name appears with an initial capital letter followed by
lowercase letters, regardless of the original format.
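A quick way to check the formatting logic is this SQLite sketch via Python's sqlite3. It assumes SQLite's || operator and SUBSTR() as stand-ins for MySQL's CONCAT() and SUBSTRING(), since older SQLite releases have no CONCAT().

```python
import sqlite3

FIX_NAMES_SQL = """
SELECT user_id,
       -- first character upper-cased, the rest lower-cased
       UPPER(SUBSTR(name, 1, 1)) || LOWER(SUBSTR(name, 2)) AS name
FROM Users
ORDER BY user_id
"""


def fix_names(users):
    """users: iterable of (user_id, name) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (user_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO Users VALUES (?, ?)", users)
    rows = conn.execute(FIX_NAMES_SQL).fetchall()
    conn.close()
    return rows
```

Called with the example rows (2, 'bOB') and (1, 'aLice'), it returns [(1, 'Alice'), (2, 'Bob')].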
Question 45
**Difficulty**: Easy
#### Example
**Input**:
Patients table:
**Output**:
**Explanation**:
- Bob and George have conditions that start with "DIAB1".
- Patients with IDs 1, 2, and 5 do not meet the criteria.
[LeetCode 1527]
Solution
SELECT
patient_id,
patient_name,
conditions
FROM patients
WHERE
conditions LIKE 'DIAB1%'
OR conditions LIKE '% DIAB1%';
This query retrieves the patient_id, patient_name, and conditions columns
from the patients table, selecting only those patients who have a condition
code starting with "DIAB1", either at the very beginning of the conditions
string or at the start of a later space-separated code.
Filtering Conditions:
conditions LIKE 'DIAB1%': Matches a conditions string whose first code starts with "DIAB1".
conditions LIKE '% DIAB1%': Matches a conditions string where "DIAB1" appears
right after a space, that is, at the start of a later code in the list (e.g., "ASTHMA
DIAB1").
The OR operator ensures that rows meeting either condition will be included.
A single pattern such as '%DIAB1%' would not be enough, because it would also
match codes that merely contain DIAB1 in the middle, such as "SADIAB100".
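To see why both LIKE patterns are needed, here is a small SQLite sketch via Python's sqlite3. It is meant to be run on hypothetical patient rows, including one with the code SADIAB100 that a naive '%DIAB1%' would wrongly accept. Like MySQL with its default collations, SQLite's LIKE is case-insensitive for ASCII letters.

```python
import sqlite3

DIAB1_SQL = """
SELECT patient_id
FROM Patients
WHERE conditions LIKE 'DIAB1%'      -- the first code starts with DIAB1
   OR conditions LIKE '% DIAB1%'    -- a later, space-separated code starts with DIAB1
ORDER BY patient_id
"""


def diabetes_patients(patients):
    """patients: iterable of (patient_id, patient_name, conditions) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Patients "
        "(patient_id INTEGER, patient_name TEXT, conditions TEXT)"
    )
    conn.executemany("INSERT INTO Patients VALUES (?, ?, ?)", patients)
    ids = [row[0] for row in conn.execute(DIAB1_SQL)]
    conn.close()
    return ids
```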
Question 46
**Difficulty**: Easy
**Note**:
- You need to write a `DELETE` statement, not a `SELECT` query.
- After the script runs, the table should show only unique emails with the
lowest `id` for each.
#### Example
**Input**:
Person table:
| id | email |
|-----|------------------|
| 1 | john@example.com |
| 2 | bob@example.com |
| 3 | john@example.com |
**Output**:
| id | email |
|-----|------------------|
| 1 | john@example.com |
| 2 | bob@example.com |
**Explanation**:
- The email `john@example.com` appears twice with IDs 1 and 3. We delete
the entry with the larger ID, keeping the one with ID 1.
[LeetCode 196]
Solution
DELETE A
FROM Person A, Person B
WHERE A.email = B.email
AND A.id > B.id;
This query deletes duplicate rows from the Person table based on the email
column, keeping only the row with the smallest id for each unique email.
FROM Clause:
Person A and Person B represent two instances of the same Person table,
enabling comparison between pairs of rows in the table.
WHERE Clause:
A.email = B.email : Matches rows where both A and B have the same email
value.
A.id > B.id : Ensures that only the row with the higher id (considered the
duplicate) is selected for deletion.
In this way, for each unique email, only the row with the smallest id is retained,
and any additional rows with the same email but a higher id are removed from
the table. This approach removes all duplicates in a single statement while
preserving the first occurrence (lowest id) of each unique email.
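SQLite does not support MySQL's multi-table DELETE A FROM Person A, Person B form, so this sketch expresses the same self-comparison with a correlated EXISTS subquery: a row is deleted when another row with the same email and a smaller id exists. It runs through Python's sqlite3, and the sample emails it is meant to be tried with are hypothetical.

```python
import sqlite3

DEDUP_SQL = """
DELETE FROM Person
WHERE EXISTS (
    SELECT 1
    FROM Person AS B
    WHERE B.email = Person.email   -- same email ...
      AND B.id < Person.id         -- ... and a smaller id, so this row is a duplicate
)
"""


def delete_duplicate_emails(rows):
    """rows: iterable of (id, email); returns what is left, ordered by id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO Person VALUES (?, ?)", rows)
    conn.execute(DEDUP_SQL)
    remaining = conn.execute("SELECT id, email FROM Person ORDER BY id").fetchall()
    conn.close()
    return remaining
```

Because the lowest-id row of each email never has a smaller-id twin, it is never deleted, so the result does not depend on the order in which rows are checked.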
Question 47
**Difficulty**: Medium
#### Example 1
**Input**:
Employee table:
| id | salary |
|-----|--------|
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
**Output**:
| SecondHighestSalary |
|---------------------|
| 200 |
#### Example 2
**Input**:
Employee table:
| id | salary |
|-----|--------|
| 1 | 100 |
**Output**:
| SecondHighestSalary |
|---------------------|
| null |
**Explanation**:
- In the first example, the second highest distinct salary is 200.
- In the second example, there is only one salary, so the second highest
salary does not exist and `null` is returned.
[LeetCode 176]
Solution
WITH RankedSalaries AS (
SELECT salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn
FROM (SELECT DISTINCT salary FROM employee) AS sub
)
SELECT
(SELECT salary AS SecondHighestSalary
FROM RankedSalaries
WHERE rn = 2) AS SecondHighestSalary;
This query identifies the second-highest unique salary from the employee table.
The RankedSalaries CTE takes the distinct salaries from the employee table,
removing any duplicates, and numbers them from highest to lowest with
ROW_NUMBER().
Main Query:
The main query selects the salary with rn = 2 from RankedSalaries, which
represents the second-highest unique salary.
Because the selection is wrapped in a scalar subquery, the query returns NULL as
SecondHighestSalary when fewer than two distinct salaries exist: a scalar
subquery that finds no row evaluates to NULL instead of producing an empty result.
The name RankedSalaries makes the purpose of the CTE clear: it ranks the unique
salaries in descending order.
Alternate solution
SELECT (
SELECT DISTINCT salary
FROM Employee
ORDER BY salary DESC
LIMIT 1 OFFSET 1
) AS SecondHighestSalary;
The inner query directly selects distinct salaries from Employee, sorts them in
descending order, and uses LIMIT 1 OFFSET 1 to skip the highest salary and return
the next one.
If there’s only one unique salary, no row exists at OFFSET 1, so the scalar
subquery evaluates to NULL.
Simplicity: This version is more concise, using a single scalar subquery without any CTEs.
Efficiency: It may be more efficient, since the database only needs the top two
distinct salaries and does not compute ROW_NUMBER() for every distinct salary.
Question 48
**Difficulty**: Easy
- There is no primary key for this table, and it may contain duplicates.
- Each row records the name of a product sold and the date it was sold.
#### Example 1
**Input**:
Activities table:
| sell_date | product |
|------------|-------------|
| 2020-05-30 | Headphone |
| 2020-06-01 | Pencil |
| 2020-06-02 | Mask |
| 2020-05-30 | Basketball |
| 2020-06-01 | Bible |
| 2020-06-02 | Mask |
| 2020-05-30 | T-Shirt |
**Output**:
**Explanation**:
- For `2020-05-30`, the sold items were `Headphone`, `Basketball`,
and `T-Shirt`. After sorting lexicographically, we return:
`Basketball,Headphone,T-Shirt`.
- For `2020-06-01`, the sold items were `Pencil` and `Bible`. After
sorting lexicographically, we return: `Bible,Pencil`.
- For `2020-06-02`, the only sold item was `Mask`.
[LeetCode 1484]
Solution
SELECT
sell_date,
COUNT(DISTINCT product) AS num_sold,
GROUP_CONCAT(DISTINCT product ORDER BY product ASC SEPARATOR ',') AS product
FROM
Activities
GROUP BY
sell_date
ORDER BY
sell_date;
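COUNT(DISTINCT product): Counts each product only once per sell_date, since the
table may contain duplicate rows (for example, Mask appears twice on 2020-06-02
but is counted once).
GROUP_CONCAT(DISTINCT product ORDER BY product ASC SEPARATOR ','): Joins the
distinct product names for each date into one string, sorted alphabetically.
SEPARATOR ',': Specifies that the products are separated by a comma (with no
space) in the final list.
GROUP BY sell_date: Groups the rows by sell_date to get one summary row per date.
ORDER BY sell_date: Sorts the result by date.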
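SQLite's group_concat() only accepts an ORDER BY inside the call in recent releases (3.44 and later), so instead of a SQL demo this is a plain-Python model of what the query computes: one row per sell_date with the count of distinct products and their sorted, comma-separated names. It illustrates the logic under that assumption and is not a replacement for the SQL.

```python
from collections import defaultdict


def group_sold_products(activities):
    """activities: iterable of (sell_date, product) rows; duplicates allowed."""
    # GROUP BY sell_date, keeping each product once per date (DISTINCT)
    products_by_date = defaultdict(set)
    for sell_date, product in activities:
        products_by_date[sell_date].add(product)

    result = []
    for sell_date in sorted(products_by_date):            # ORDER BY sell_date
        products = sorted(products_by_date[sell_date])    # ORDER BY product ASC
        result.append((
            sell_date,
            len(products),                                # COUNT(DISTINCT product)
            ",".join(products),                           # GROUP_CONCAT(... SEPARATOR ',')
        ))
    return result
```

Run on the seven Activities rows from the example, it returns ('2020-05-30', 3, 'Basketball,Headphone,T-Shirt'), ('2020-06-01', 2, 'Bible,Pencil') and ('2020-06-02', 1, 'Mask').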
Question 49
List the Products Ordered in a Period
**Difficulty**: Easy
- `product_id` is the primary key for this table, which contains data
about the company's products.
#### Example 1
**Input**:
Products table:
Orders table:
**Output**:
| product_name | unit |
|--------------------|------|
| Leetcode Solutions | 130 |
| Leetcode Kit | 100 |
**Explanation**:
- Product `1` (Leetcode Solutions) has 60 units ordered on `2020-02-05`
and 70 units on `2020-02-10`, totaling 130 units in February 2020.
- Product `5` (Leetcode Kit) has 50 units ordered on `2020-02-25` and
50 units on `2020-02-27`, totaling 100 units.
- Product `2` has 80 units ordered in February, which is less than 100,
and product `3` has only 5 units. Therefore, they are not included in the
result.
Solution
WITH MonthlyProductSales AS (
SELECT product_id,
SUM(unit) AS total_units_sold
FROM Orders
WHERE order_date BETWEEN '2020-02-01' AND '2020-02-29'
GROUP BY product_id
HAVING total_units_sold >= 100
)
SELECT
p.product_name,
mps.total_units_sold AS unit
FROM Products p
JOIN MonthlyProductSales mps
ON p.product_id = mps.product_id;
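CTE: MonthlyProductSales
This CTE keeps only the orders placed in February 2020 (order_date BETWEEN
'2020-02-01' AND '2020-02-29') and sums unit for each product_id.
Only products with at least 100 units sold are retained (HAVING total_units_sold
>= 100).
Main Query
The main query joins Products to the CTE on product_id to retrieve the
product_name and total units sold for each qualifying product.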
This approach is efficient for filtering products with substantial sales in a specific
period and displaying the relevant product names and units sold.
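A sketch of the same query in SQLite via Python's sqlite3, assuming order_date is stored as 'YYYY-MM-DD' text so BETWEEN compares dates correctly. It departs from the MySQL version in small ways: HAVING repeats SUM(unit) rather than relying on the alias, the output column is named unit, and an ORDER BY is added so the output order is predictable. Product names other than Leetcode Solutions and Leetcode Kit, order dates not given in the explanation, and an extra March order used to exercise the date filter are all hypothetical.

```python
import sqlite3

FEB_2020_SQL = """
WITH MonthlyProductSales AS (
    SELECT product_id, SUM(unit) AS total_units_sold
    FROM Orders
    WHERE order_date BETWEEN '2020-02-01' AND '2020-02-29'  -- February 2020 only
    GROUP BY product_id
    HAVING SUM(unit) >= 100
)
SELECT p.product_name, mps.total_units_sold AS unit
FROM Products p
JOIN MonthlyProductSales mps ON p.product_id = mps.product_id
ORDER BY p.product_id
"""


def products_over_100_in_feb_2020(products, orders):
    """products: (product_id, product_name); orders: (product_id, order_date, unit)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Products (product_id INTEGER, product_name TEXT)")
    conn.execute(
        "CREATE TABLE Orders (product_id INTEGER, order_date TEXT, unit INTEGER)"
    )
    conn.executemany("INSERT INTO Products VALUES (?, ?)", products)
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", orders)
    rows = conn.execute(FEB_2020_SQL).fetchall()
    conn.close()
    return rows
```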
Question 50
Find Users With Valid E-Mails
**Difficulty**: Easy
- `user_id` is the primary key (column with unique values) for this table.
- This table contains information about the users signed up on a website.
Some emails may be invalid.
#### Example 1
**Input**:
Users table:
**Output**:
Solution
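To solve this problem, we need to select users whose email addresses match a
specific pattern:
The prefix name (the part before '@') must start with a letter.
The prefix name may contain only letters (upper or lower case), digits,
underscores '_', periods '.', and dashes '-'.
The domain must be '@leetcode.com'.
We can achieve this using a REGEXP pattern in MySQL to enforce these conditions.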
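The pattern itself can be tried out in Python's re module before moving it into SQL. This sketch assumes these rules: the prefix starts with a letter, continues with letters, digits, '_', '.', or '-', and the domain is exactly '@leetcode.com'. In MySQL the equivalent filter would be along the lines of WHERE REGEXP_LIKE(mail, '^[A-Za-z][A-Za-z0-9_.-]*@leetcode[.]com$', 'c'), where the 'c' match type forces case-sensitive matching; a plain REGEXP follows the column's collation and may therefore accept '@LEETCODE.COM'. The mail column name and the sample addresses are assumptions.

```python
import re

# Assumed rules: a letter first, then letters, digits, '_', '.', or '-',
# followed by exactly '@leetcode.com' (matched case-sensitively here).
VALID_EMAIL = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*@leetcode[.]com")


def valid_emails(users):
    """users: iterable of (user_id, name, mail) tuples; keeps the valid ones."""
    return [user for user in users if VALID_EMAIL.fullmatch(user[2])]
```

fullmatch() anchors the pattern at both ends, playing the role of ^ and $ in the SQL version.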
Appendix
CREATE
Definition: Creates a new database object such as a table, view, or index.
Usage:
CREATE TABLE employees (id INT, name VARCHAR(50), salary DECIMAL(10,2));
ALTER
Definition: Modifies an existing database object such as a table or view.
Usage:
DROP
Definition: Deletes an existing database object such as a table, view, or
index.
Usage:
TRUNCATE
Definition: Removes all rows from a table without logging individual row
deletions.
Usage:
RENAME
Definition: Renames a database object such as a table or column.
Usage:
DML (Data Manipulation Language) in SQL includes commands such as INSERT,
UPDATE, DELETE, and SELECT to add, change, remove, or retrieve data from
tables. Unlike DDL, which focuses on the structure of the database, DML is
concerned with the actual data stored in the tables. These operations are
essential for managing and working with the data in a database without
altering its schema or structure.
SELECT
Definition: Retrieves data from one or more tables.
Usage:
SELECT name, salary FROM employees WHERE department = 'HR';
INSERT
Definition: Inserts new data into a table.
Usage:
INSERT INTO employees (id, name, salary) VALUES (1, 'John Doe', 50000);
UPDATE
Definition: Updates existing data in a table.
Usage:
DELETE
Definition: Removes data from a table.
Usage:
MERGE
Definition: Combines insert, update, and delete operations in a single
statement.
Usage:
CALL
Definition: Executes a stored procedure.
Usage:
EXPLAIN PLAN
Definition: Shows the execution plan of a SQL statement.
Usage:
LOCK TABLE
Definition: Locks a table to prevent other transactions from modifying it.
Usage:
LOCK TABLE employees IN EXCLUSIVE MODE;
DCL (Data Control Language) in SQL is used to manage access and control
permissions within a database. It includes commands like GRANT and REVOKE
to give or take away user privileges for actions such as querying, inserting,
updating, or deleting data. DCL ensures that users have the appropriate level
of access to the database, allowing administrators to enforce security and
control who can perform specific operations on the data or the structure of
the database.
GRANT
Definition: Gives privileges to users or roles.
Usage:
REVOKE
Definition: Removes privileges from users or roles.
Usage:
COMMIT
Definition: Saves all changes made in the current transaction.
Usage:
COMMIT;
ROLLBACK
Definition: Undoes all changes made in the current transaction.
Usage:
ROLLBACK;
SAVEPOINT
Definition: Sets a point in a transaction to which you can later roll back.
Usage:
SAVEPOINT sp1;
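COMMIT, ROLLBACK and SAVEPOINT work together inside one transaction. A minimal sketch using SQLite through Python's sqlite3 (SQLite spells the partial rollback ROLLBACK TO SAVEPOINT): the first insert survives, the one made after the savepoint is undone, and COMMIT saves the rest. isolation_level=None is set so the module does not open transactions on its own; the table contents are made up.

```python
import sqlite3


def savepoint_demo():
    # isolation_level=None: the sqlite3 module issues no implicit BEGIN/COMMIT,
    # so the transaction statements below run exactly as written.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
    conn.execute("BEGIN")
    conn.execute("INSERT INTO employees VALUES (1, 'Ana')")
    conn.execute("SAVEPOINT sp1")
    conn.execute("INSERT INTO employees VALUES (2, 'Ben')")
    conn.execute("ROLLBACK TO SAVEPOINT sp1")  # undoes only the second insert
    conn.execute("COMMIT")                     # saves everything still pending
    rows = conn.execute("SELECT id, name FROM employees").fetchall()
    conn.close()
    return rows
```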
SET TRANSACTION
Definition: Sets the characteristics of the current transaction, such as
isolation level.
Usage:
Clauses
Clauses in SQL are keywords used to specify conditions and modify queries
to retrieve or manipulate data in a more refined way. Common clauses
include WHERE to filter rows, GROUP BY to group rows based on a column,
ORDER BY to sort results, and HAVING to filter groups. Clauses work in
combination with SQL commands like SELECT, UPDATE, and DELETE to control
how data is selected, updated, or removed. They allow you to narrow down
the dataset or apply specific rules for processing the data.
FROM
Definition: Specifies the table from which to retrieve data.
Usage:
GROUP BY
Definition: Groups rows sharing a property so aggregate functions can be
applied to each group.
Usage:
HAVING
Definition: Filters groups based on a condition, typically used with GROUP
BY.
Usage:
ORDER BY
Definition: Sorts the result set in ascending or descending order.
Usage:
SELECT * FROM employees ORDER BY salary DESC;
LIMIT
Definition: Limits the number of rows returned.
Usage:
OFFSET
Definition: Skips a specific number of rows before returning the result set.
Usage:
UNION
Definition: Combines the result sets of two or more SELECT queries,
excluding duplicates.
Usage:
SELECT name FROM employees UNION SELECT name FROM contractors;
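The difference between UNION and UNION ALL is easy to see with two tiny tables. A sketch in SQLite via Python's sqlite3; the employees and contractors contents it is meant to be run with are made up.

```python
import sqlite3


def union_vs_union_all(employee_names, contractor_names):
    """Return the names produced by UNION and by UNION ALL, sorted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT)")
    conn.execute("CREATE TABLE contractors (name TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?)", [(n,) for n in employee_names])
    conn.executemany("INSERT INTO contractors VALUES (?)", [(n,) for n in contractor_names])
    # UNION removes duplicate rows across both inputs
    union = [r[0] for r in conn.execute(
        "SELECT name FROM employees UNION SELECT name FROM contractors ORDER BY name")]
    # UNION ALL keeps every row, duplicates included
    union_all = [r[0] for r in conn.execute(
        "SELECT name FROM employees UNION ALL SELECT name FROM contractors ORDER BY name")]
    conn.close()
    return union, union_all
```

With employees Ana and Ben and contractors Ben and Cy, UNION yields three names while UNION ALL yields four, listing Ben twice.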
INTERSECT
Definition: Returns only the rows that are common to the result sets of two
SELECT queries.
Usage:
EXCEPT
Definition: Returns rows from the first SELECT query that are not present in
the second SELECT query.
Usage:
JOIN
Definition: Combines rows from two or more tables based on a related
column.
Usage (INNER JOIN):
ON
Definition: Specifies the condition for a join operation between tables.
Usage:
USING
Definition: Specifies the column to be used for a join between tables that
have the same column name.
Usage:
Other Keywords
DISTINCT
Definition: Removes duplicate rows from the result set.
Usage:
ALL
Definition: Returns all rows in the result set, including duplicates (used with
SELECT or in combination with set operators like UNION).
Usage:
AND
Definition: Combines two or more conditions in a query, where all must be true.
Usage:
SELECT * FROM employees WHERE salary > 50000 AND department = 'HR';
OR
Definition: Combines two or more conditions in a query, where at least one
must be true.
Usage:
NOT
Definition: Reverses the result of a condition.
Usage:
IN
Definition: Filters records based on a list of values.
Usage:
SELECT * FROM employees WHERE department IN ('HR', 'Finance', 'IT');
BETWEEN
Definition: Filters records within a specific range.
Usage:
LIKE
Definition: Performs pattern matching using wildcards.
Usage:
IS NULL
Definition: Checks if a column contains NULL values.
Usage:
IS NOT NULL
Definition: Checks if a column does not contain NULL values.
Usage:
AS
Definition: Assigns a temporary alias to a column or table within a query.
Usage:
CASE
Definition: Provides conditional logic in a query, similar to IF-THEN-ELSE.
Usage:
SELECT name,
CASE WHEN salary > 50000 THEN 'High' ELSE 'Low' END AS salary_category
FROM employees;
WHEN
Definition: Specifies a condition for use in a CASE statement.
Usage:
SELECT name,
CASE
WHEN salary > 50000 THEN 'High'
WHEN salary > 30000 THEN 'Medium'
ELSE 'Low' END AS salary_category
FROM employees;
THEN
Definition: Defines the result to return if a condition in a CASE statement is
true.
Usage:
SELECT name,
CASE WHEN salary > 50000 THEN 'High' ELSE 'Low' END AS salary_category
FROM employees;
ELSE
Definition: Specifies the result if none of the conditions in a CASE statement
are true.
Usage:
SELECT name,
CASE WHEN salary > 50000 THEN 'High' ELSE 'Low' END AS salary_category
FROM employees;
END
Definition: Marks the end of a CASE statement.
Usage:
SELECT name,
CASE WHEN salary > 50000 THEN 'High' ELSE 'Low' END AS salary_category
FROM employees;
EXISTS
Definition: Tests for the existence of any rows in a subquery.
Usage:
ANY
Definition: Compares a value to any value in a subquery or list.
Usage:
SOME
Definition: Similar to ANY, compares a value to any value in a subquery or
list.
Usage:
DEFAULT
Definition: Specifies a default value for a column when no value is provided.
Usage:
CHECK
Definition: Adds a condition that data must satisfy before being inserted into
a table.
Usage:
PRIMARY KEY
Definition: Defines one or more columns as a unique identifier for each row.
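Usage:
FOREIGN KEY
Definition: Links a column to the key of another table, enforcing referential
integrity between the two tables.
Usage:
CREATE TABLE employees (id INT, department_id INT, FOREIGN KEY (department_id)
REFERENCES departments(id));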
UNIQUE
Definition: Ensures that all values in a column or group of columns are
unique.
Usage:
INDEX
Definition: Creates an index on a table to improve query performance.
Usage:
VIEW
Definition: Creates a virtual table based on the result of a SELECT query.
Usage:
CREATE VIEW high_salary_employees AS
SELECT * FROM employees WHERE salary > 60000;
PROCEDURE
Definition: Defines a stored procedure for reusable logic in the database.
Usage:
FUNCTION
Definition: Creates a user-defined function that returns a value.
Usage:
TRIGGER
Definition: Executes a specified action automatically in response to certain
events on a table or view.
Usage:
CREATE TRIGGER before_employee_insert BEFORE INSERT ON employees
FOR EACH ROW SET NEW.salary = GREATEST(NEW.salary, 30000);
CASCADE
Definition: Specifies that related rows in other tables should also be deleted
or updated when a row is deleted or updated.
Usage:
RESTRICT
Definition: Prevents the deletion or update of a row if related rows exist in
other tables.
Usage:
WITH
Definition: Defines a Common Table Expression (CTE) for use within a
query.
Usage:
Built-in Functions
SQL provides many built-in functions, such as aggregate functions (e.g., AVG),
string functions (e.g., CONCAT, SUBSTRING), and date functions (e.g., NOW).
Aggregate Functions
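Aggregate functions in SQL perform a calculation on a set of rows and return a
single value. Common aggregate functions include SUM to calculate the total,
COUNT to count the number of rows, AVG to find the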
average, MAX to get the highest value, and MIN to get the lowest. These
functions are often used with the GROUP BY clause to group data and apply
the function to each group, summarizing large datasets into meaningful
insights, such as total sales per region or average salary by department.
AVG()
Definition: Returns the average value of a numeric column.
Usage:
SELECT AVG(salary) FROM employees;
COUNT()
Definition: Returns the number of rows that match a specified condition.
Usage:
MAX()
Definition: Returns the maximum value in a column.
Usage:
MIN()
Definition: Returns the minimum value in a column.
Usage:
SUM()
Definition: Returns the total sum of a numeric column.
Usage:
Numeric Functions
ABS()
Definition: Returns the absolute value of a number.
Usage:
SELECT ABS(-10);
ROUND()
Definition: Rounds a number to a specified number of decimal places.
Usage:
SELECT ROUND(123.456, 2);
CEIL()
Definition: Returns the smallest integer greater than or equal to a number.
Usage:
SELECT CEIL(4.2);
FLOOR()
Definition: Returns the largest integer less than or equal to a number.
Usage:
SELECT FLOOR(4.9);
EXP()
Definition: Returns e raised to the power of a given number.
Usage:
SELECT EXP(1);
LN()
Definition: Returns the natural logarithm of a number.
Usage:
SELECT LN(2.718);
LOG()
Definition: Returns the logarithm of a number to a specified base.
Usage:
MOD()
Definition: Returns the remainder of the division of two numbers.
Usage:
POWER()
Definition: Returns a number raised to the power of another number.
Usage:
SIGN()
Definition: Returns the sign of a number: -1 if negative, 0 if zero, 1 if positive.
Usage:
SELECT SIGN(-10);
SQRT()
Definition: Returns the square root of a number.
Usage:
SELECT SQRT(16);
TRUNC()
Definition: Truncates a number to a specified number of decimal places.
Usage:
String Functions
String functions in SQL are used to perform operations on text data (strings).
Common string functions include CONCAT to join two or more strings,
SUBSTRING to extract part of a string, LENGTH to get the length of a string,
UPPER or LOWER to convert a string to uppercase or lowercase, and TRIM to
remove whitespace from the beginning or end of a string. These functions
help manipulate text in various ways, such as formatting names, extracting
specific parts of data, or cleaning up strings for better data consistency.
CONCAT()
Definition: Concatenates two or more strings into one.
Usage:
LENGTH()
Definition: Returns the length of a string.
Usage:
SELECT LENGTH('hello');
LOWER()
Definition: Converts a string to lowercase.
Usage:
SELECT LOWER('HELLO');
UPPER()
Definition: Converts a string to uppercase.
Usage:
SELECT UPPER('hello');
LTRIM()
Definition: Removes leading spaces from a string.
Usage:
RTRIM()
Definition: Removes trailing spaces from a string.
Usage:
TRIM()
Definition: Removes leading and trailing spaces from a string.
Usage:
REPLACE()
Definition: Replaces occurrences of a substring in a string with another
substring.
Usage:
POSITION()
Definition: Returns the position of a substring within a string.
Usage:
Date and Time Functions
Date and time functions in SQL are used to perform operations on date and
time values, allowing for various manipulations and calculations. Common
functions include NOW() to retrieve the current date and time, DATEADD
(DATE_ADD in MySQL) to add a specified interval to a date, DATEDIFF to
calculate the difference between two dates, EXTRACT to retrieve specific parts
of a date (like year or month), and FORMAT (DATE_FORMAT in MySQL) to display
dates in a specified format. These functions are
essential for handling and analyzing temporal data, enabling tasks like
calculating age, finding time intervals, and formatting dates for presentation
in reports.
CURRENT_DATE
Definition: Returns the current date.
Usage:
SELECT CURRENT_DATE;
CURRENT_TIME
Definition: Returns the current time.
Usage:
SELECT CURRENT_TIME;
CURRENT_TIMESTAMP
Definition: Returns the current date and time.
Usage:
SELECT CURRENT_TIMESTAMP;
EXTRACT()
Definition: Extracts a specific part of a date (e.g., year, month, day).
Usage:
DATE_ADD()
Definition: Adds a specified interval to a date.
Usage:
DATE_SUB()
Definition: Subtracts a specified interval from a date.
Usage:
DATEDIFF()
Definition: Returns the difference between two dates.
Usage:
SELECT DATEDIFF(CURRENT_DATE, '2023-01-01');
DATE_FORMAT()
Definition: Formats a date according to a specified format.
Usage:
Conversion Functions
Conversion functions in SQL are used to change data from one type to
another, ensuring that data is in the correct format for processing or
analysis. Common conversion functions include CAST and CONVERT, which
allow you to convert data types such as converting a string to an integer or a
date to a string. Other functions, like TO_CHAR or TO_DATE, are used to format
date values into strings or convert strings into date formats, respectively.
These functions are crucial for data integrity, enabling the proper handling
of different data types in queries, calculations, and comparisons.
CAST()
Definition: Converts a value from one data type to another.
Usage:
Conditional Functions
COALESCE()
Definition: Returns the first non-null value in a list of expressions.
Usage:
NULLIF()
Definition: Returns NULL if two expressions are equal, otherwise returns the
first expression.
Usage:
GREATEST()
Definition: Returns the greatest value from a list of expressions.
Usage:
SELECT GREATEST(10, 20, 30);
LEAST()
Definition: Returns the least value from a list of expressions.
Usage:
System Functions
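System functions in SQL return information about the database environment.
Common examples include USER() and CURRENT_USER() to return the current
database user, CURRENT_DATABASE to get the name of the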
active database, and VERSION() to retrieve the version of the database
management system. These functions are useful for auditing,
troubleshooting, and optimizing database operations, helping users
understand the context and configuration of their SQL environment.
USER()
Definition: Returns the current database user.
Usage:
SELECT USER();
CURRENT_USER()
Definition: Returns the name of the current user.
Usage:
SELECT CURRENT_USER();
SESSION_USER()
Definition: Returns the session user name for the current session.
Usage:
SELECT SESSION_USER();
SYSTEM_USER()
Definition: Returns the system user name for the current session.
Usage:
SELECT SYSTEM_USER();
Window Functions
ROW_NUMBER()
Definition: Assigns a unique row number to each row in the result set.
Usage:
RANK()
Definition: Assigns a rank to each row in a result set with possible gaps
between ranks.
Usage:
DENSE_RANK()
Definition: Assigns a rank to each row without gaps between ranks.
Usage:
LAG()
Definition: Returns the value from a previous row in the result set.
Usage:
LEAD()
Definition: Returns the value from a subsequent row in the result set.
Usage:
FIRST_VALUE()
Definition: Returns the first value in an ordered set.
Usage:
SELECT name, FIRST_VALUE(salary) OVER (ORDER BY salary) FROM employees;
LAST_VALUE()
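Definition: Returns the last value in the window frame. With an ORDER BY and the
default frame, the frame ends at the current row (and its ties), so specify ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to get the last value of the
whole partition.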
Usage:
PARTITION BY
Definition: Divides the result set into partitions and applies a window
function to each partition. It is often used with window functions like
ROW_NUMBER(), RANK(), LAG(), etc.
Usage:
SELECT name,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;
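ROW_NUMBER(), RANK() and DENSE_RANK() differ only in how they treat ties, which is easiest to see side by side. A sketch in SQLite (3.25 or later) via Python's sqlite3, meant to be run with made-up salaries; ROW_NUMBER() gets name as a tie-breaker so its numbering is deterministic, and the aliases avoid the word rank, which is reserved in MySQL 8.

```python
import sqlite3

RANKING_SQL = """
SELECT name, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,   -- always unique
       RANK()       OVER (ORDER BY salary DESC)       AS rnk,       -- ties share, then gap
       DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk  -- ties share, no gap
FROM employees
ORDER BY salary DESC, name
"""


def ranking_demo(rows):
    """rows: iterable of (name, salary) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
    result = conn.execute(RANKING_SQL).fetchall()
    conn.close()
    return result
```

With salaries 90, 80, 80 and 70, the two tied rows share RANK() 2 and DENSE_RANK() 2; the last row then gets RANK() 4 but DENSE_RANK() 3.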