
Zero To Hero - 50 SQL Qns

The document emphasizes the importance of SQL as a critical skill for tech workers across various roles, highlighting its universality and the necessity of hands-on problem-solving for mastery. It advocates for using ANSI SQL to ensure compatibility and maintainability across different database systems. The document also provides a series of SQL questions and solutions to help readers practice and improve their SQL skills.

Uploaded by Samridh Vikhas S

Copyright © All Rights Reserved

Solve 50 SQL questions to move from zero to hero: Free E-Book

Ebo Jackson


107 min read · 3 days ago
Why SQL is a Critical Skill for All Tech Workers
In today’s data-driven world, SQL (Structured Query Language) is more than
just a tool for database administrators; it’s a foundational skill for anyone in
the tech industry. Whether you’re a software developer, data scientist,
product manager, or analyst, you will likely interact with data at some point
in your career. SQL allows you to query, manipulate, and analyze vast
amounts of data efficiently, making it an indispensable tool for decision-making,
product development, and business insights.

SQL is also universal. Many database systems exist (MySQL, PostgreSQL,
Oracle, and others), but they share the same core SQL. Once you learn it, your
skills transfer across platforms, and only dialect-specific details remain to be picked up.
It is a skill that opens doors to various roles and empowers professionals to make
data-driven decisions, automate tasks, and drive growth in any industry.

Why Solving SQL Problems Leads to Mastery


Many people believe reading books or watching tutorials is enough to master
SQL, but true proficiency comes from solving real-world problems. Here’s
why:

1. Hands-on Practice: Solving SQL questions forces you to think critically
and apply concepts in a practical setting. Each problem presents unique
challenges that push you to explore SQL’s full capabilities.

2. Learning Through Failure: By tackling complex SQL problems, you’ll
encounter mistakes and errors. This process of debugging and refining
your queries reinforces your learning far better than passive reading or
watching.

3. Real-World Relevance: Many SQL exercises are modeled on realistic data
scenarios. By solving them, you’re not just memorizing syntax but
learning how to handle real-life business problems like analyzing sales
trends, customer behaviors, or system performance.

4. Incremental Learning: Books and tutorials can only take you so far
before concepts become abstract. SQL problem-solving progresses
naturally from simple to complex, helping you solidify each concept
before moving to the next. This structured learning approach ensures
you build a strong foundation.

Why It’s Better to Use ANSI SQL to Solve SQL Problems


ANSI SQL, or standard SQL, helps keep queries compatible across
different database systems, such as MySQL, PostgreSQL, and SQL Server. Here’s
why it’s beneficial to use ANSI SQL:

1. Portability: ANSI SQL makes your queries portable, meaning they can
run on any database system that supports the standard. This is essential
when switching databases or working in diverse environments.

2. Maintainability: Following the SQL standard keeps your code consistent
and easier to read, maintain, and collaborate on. Other developers can
understand and modify your queries without needing database-specific
knowledge.

3. Future-Proofing: ANSI SQL compliance helps protect your queries
against future changes or updates in database technology. Database
systems evolve, but adhering to the standard makes it much more likely
that your queries keep working.

4. Broad Adoption: Most major database platforms support ANSI SQL, so
using it maximizes compatibility and leverages widely accepted best
practices.

Using ANSI SQL promotes consistency, portability, and long-term flexibility
across different database systems.

To determine if a SQL solution is ANSI-compliant, you need to ensure that
the SQL query adheres to the SQL standard (published by ISO as ISO/IEC 9075
and adopted by the American National Standards Institute, ANSI) and avoids
proprietary or non-standard SQL extensions used by specific database systems.
Here are key steps to check for ANSI compliance:

1. Avoid Database-Specific Functions


Check for proprietary functions: Some databases provide custom
functions that are not part of ANSI SQL. For example, `GETDATE()` (SQL
Server) and `SYSDATE` (Oracle) are not ANSI-compliant.

ANSI-compliant: `CURRENT_DATE`, `CURRENT_TIMESTAMP`

2. Use Standard SQL Syntax


Ensure standard syntax for joins: Use the standard JOIN syntax (`INNER JOIN`,
`LEFT JOIN`) instead of database-specific variations such as Oracle's legacy
`(+)` outer-join operator.

ANSI-compliant: `SELECT * FROM table1 INNER JOIN table2 ON table1.id = table2.id`

3. Avoid Vendor-Specific Data Types


Check data types: Avoid using non-standard data types like `VARCHAR2`
(Oracle) or `TINYINT` (MySQL). Stick to standard types like `VARCHAR`,
`INTEGER`, `DECIMAL`, etc.

ANSI-compliant: `VARCHAR`, `INTEGER`, `DATE`

4. Check for ANSI Standard Functions


Common functions: Functions like `UPPER()`, `LOWER()`, and `SUBSTRING()` are
ANSI-compliant, but functions like `SUBSTR()` (Oracle) may not be.

ANSI-compliant: `SELECT UPPER(column_name) FROM table_name`

5. Standard Date and Time Handling


Date formatting: Ensure that you use the standard SQL functions for date
handling.

ANSI-compliant: `CURRENT_DATE`, `EXTRACT(YEAR FROM date_column)`

6. GROUP BY and HAVING Clauses


Follow standard use of aggregate functions: Ensure GROUP BY and HAVING
are used with standard aggregate functions like `COUNT()`, `SUM()`, `AVG()`,
etc.

ANSI-compliant: `SELECT COUNT(*) FROM table_name GROUP BY column_name`

7. Limit, Offset, or Pagination


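Pagination or limiting results: The `LIMIT` clause is not part of the standard.
Use `FETCH FIRST` (added in SQL:2008) for compliance. Support still varies:
MySQL accepts only `LIMIT`, and SQL Server requires `ORDER BY ... OFFSET ... FETCH`.

ANSI-compliant: `SELECT * FROM table_name FETCH FIRST 10 ROWS ONLY`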

8. Test on Multiple Database Systems


Cross-database testing: Run your query on multiple database platforms
(e.g., MySQL, PostgreSQL, SQL Server) to see if it works consistently. If it
works on all of them, it is probably sticking to the standard, although passing
on several engines is good evidence rather than proof.

By following these steps and avoiding database-specific syntax, functions,
and features, you’ll ensure that your SQL solution is ANSI-compliant.

Why Are There So Many Ways to Solve the Same SQL Task?
SQL is a highly flexible language, offering multiple ways to solve the same
problem due to its declarative nature. Unlike procedural languages, SQL
focuses on what result you want, not how to achieve it, leaving
implementation details to the database engine. Here’s why multiple solutions
often exist for the same SQL task:
1. Built-in Functions and Constructs: SQL has a rich set of functions and
constructs (e.g., `COUNT()`, `SUM()`, `CASE` expressions, joins, subqueries), and
different combinations of these can yield the same outcome.

2. Database-Specific Extensions: Different databases (MySQL, PostgreSQL,
SQL Server) offer unique extensions and optimizations, which lead to
alternative approaches.

3. Performance Considerations: While several methods can produce the
same result, performance varies. Some queries may run faster depending
on factors like indexing, query complexity, or dataset size.

4. Readability vs. Efficiency: One query might prioritize clarity and
maintainability, while another might focus on optimal performance or
use advanced SQL features.

In SQL, the diversity of solutions is a strength, allowing flexibility based on
the context, database, and performance needs.
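Here is a concrete example of that flexibility. The sketch below uses Python's built-in `sqlite3` module, with SQLite standing in for whatever engine you actually use. The `orders` table and its rows are hypothetical and exist only for this illustration. Three differently shaped queries count the shipped orders and all arrive at the same answer.

```python
import sqlite3

# Hypothetical table, used only to illustrate equivalent queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "shipped"), (2, "pending"), (3, "shipped"), (4, "cancelled")],
)

# Way 1: filter the rows first, then count what is left.
way_where = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'shipped'"
).fetchone()[0]

# Way 2: scan every row and count conditionally with a CASE expression.
way_case = conn.execute(
    "SELECT SUM(CASE WHEN status = 'shipped' THEN 1 ELSE 0 END) FROM orders"
).fetchone()[0]

# Way 3: count the ids returned by a subquery.
way_subquery = conn.execute(
    "SELECT COUNT(*) FROM orders "
    "WHERE order_id IN (SELECT order_id FROM orders WHERE status = 'shipped')"
).fetchone()[0]

print(way_where, way_case, way_subquery)  # 2 2 2
```

Which form is best depends on the engine's optimizer and on what else the query must compute. For example, the CASE form can count several statuses in a single pass.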

Now let’s go ahead and solve 50 questions taken from the leetcode.com
website. They range in difficulty from easy to medium to hard. The LeetCode
question number is provided with each problem for easy reference. Enjoy!

Question 1

Recyclable and Low Fat Products

Difficulty: Easy

#### Table: Products


| Column Name | Type |
|-------------|---------|
| product_id | int |
| low_fats | enum |
| recyclable | enum |

- `product_id` is the primary key for this table.
- `low_fats` is an ENUM of type ('Y', 'N'), where 'Y' means the product is
low fat, and 'N' means it is not.
- `recyclable` is an ENUM of type ('Y', 'N'), where 'Y' means the product is
recyclable, and 'N' means it is not.

#### Problem Statement


Write a query to find the `product_id`s of products that are both low fat
and recyclable.

Return the result table in any order.

#### Example

**Input**:
Products table:

| product_id | low_fats | recyclable |
|-------------|----------|------------|
| 0 | Y | N |
| 1 | Y | Y |
| 2 | N | Y |
| 3 | Y | Y |
| 4 | N | N |

**Output**:

| product_id |
|-------------|
| 1 |
| 3 |

**Explanation**:
Only products 1 and 3 are both low fat and recyclable.

[LeetCode 1757]

Solution

```sql
SELECT product_id
FROM products
WHERE low_fats = 'Y'
  AND recyclable = 'Y';
```

The task is to find products that are both low fat and recyclable. The solution uses
a simple SELECT query to retrieve the product_id of such products from the
Products table. The two conditions are combined with AND, so a row qualifies only
when both flags are 'Y'.
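You can check the solution against the example by running it on an in-memory database. This sketch assumes Python's standard `sqlite3` module. SQLite has no ENUM type, so `low_fats` and `recyclable` are stored as TEXT here. An `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

# Example rows from the problem statement; ENUM columns become TEXT in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (product_id INTEGER PRIMARY KEY, low_fats TEXT, recyclable TEXT)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [(0, "Y", "N"), (1, "Y", "Y"), (2, "N", "Y"), (3, "Y", "Y"), (4, "N", "N")],
)

# The solution query; SQLite table names are case-insensitive, so "products" works.
rows = conn.execute(
    """
    SELECT product_id
    FROM products
    WHERE low_fats = 'Y'
      AND recyclable = 'Y'
    ORDER BY product_id
    """
).fetchall()
result = [r[0] for r in rows]
print(result)  # [1, 3]
```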

Question 2

Find Customer Referee

Difficulty: Easy

#### Table: Customer

| Column Name | Type |
|-------------|---------|
| id | int |
| name | varchar |
| referee_id | int |

- `id` is the primary key column for this table.
- Each row of this table indicates the id of a customer,
their name, and the id of the customer who referred them.

#### Problem Statement


Write a query to find the names of the customers that are not referred
by the customer with id = 2.

Return the result table in any order.

#### Example

**Input**:
Customer table:

| id | name | referee_id |
|-----|-------|------------|
| 1 | Will | null |
| 2 | Jane | null |
| 3 | Alex | 2 |
| 4 | Bill | null |
| 5 | Zack | 1 |
| 6 | Mark | 2 |

**Output**:

| name |
|-------|
| Will |
| Jane |
| Bill |
| Zack |

**Explanation**:
The customers referred by customer with `id = 2` are Alex and Mark.
The customers who are not referred by customer 2 are Will, Jane, Bill,
and Zack.

[LeetCode 584]

Solution

```sql
SELECT name
FROM customer
WHERE referee_id <> 2
   OR referee_id IS NULL;
```

The SQL query selects the names of customers who were not referred by the
customer with id = 2, including customers who have no referee at all (indicated
by a NULL value in the referee_id column). The condition `referee_id <> 2` (the
standard spelling of `!=`) keeps customers referred by someone other than
customer 2. The `referee_id IS NULL` branch is necessary because comparing NULL
with 2 yields UNKNOWN rather than TRUE. Without that branch, customers with no
referee would be silently dropped. The query therefore returns every customer
who was either referred by someone other than customer 2 or not referred by anyone.

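The role of the `IS NULL` branch is easiest to see by running the query with and without it. This sketch uses Python's `sqlite3` module and the example rows. SQLite is only a stand-in here, and the NULL behavior it shows is standard SQL three-valued logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT, referee_id INTEGER)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(1, "Will", None), (2, "Jane", None), (3, "Alex", 2),
     (4, "Bill", None), (5, "Zack", 1), (6, "Mark", 2)],
)

# Inequality only: NULL <> 2 is UNKNOWN, so customers with no referee vanish.
without_null_check = [r[0] for r in conn.execute(
    "SELECT name FROM Customer WHERE referee_id <> 2 ORDER BY id")]

# Full solution: the IS NULL branch keeps customers with no referee.
with_null_check = [r[0] for r in conn.execute(
    "SELECT name FROM Customer WHERE referee_id <> 2 OR referee_id IS NULL ORDER BY id")]

print(without_null_check)  # ['Zack']
print(with_null_check)     # ['Will', 'Jane', 'Bill', 'Zack']
```
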
Question 3

Big Countries

Difficulty: Easy

#### Table: World

| Column Name | Type |
|-------------|---------|
| name | varchar |
| continent | varchar |
| area | int |
| population | int |
| gdp | bigint |

- `name` is the primary key for this table.
- Each row contains information about the name of a country,
the continent to which it belongs, its area, population, and GDP.

#### Problem Statement


A country is considered big if:
- It has an area of at least 3,000,000 km², or
- It has a population of at least 25,000,000.

Write a query to find the name, population, and area of the big countries.

Return the result table in any order.

#### Example

**Input**:
World table:

| name | continent | area | population | gdp |
|-------------|-----------|---------|------------|--------------|
| Afghanistan | Asia | 652230 | 25500100 | 20343000000 |
| Albania | Europe | 28748 | 2831741 | 12960000000 |
| Algeria | Africa | 2381741 | 37100000 | 188681000000 |
| Andorra | Europe | 468 | 78115 | 3712000000 |
| Angola | Africa | 1246700 | 20609294 | 100990000000 |

**Output**:

| name | population | area |
|-------------|------------|---------|
| Afghanistan | 25500100 | 652230 |
| Algeria | 37100000 | 2381741 |

[LeetCode 595]

Solution

```sql
SELECT name, population, area
FROM world
WHERE area >= 3000000
   OR population >= 25000000;
```

The solution retrieves the `name`, `population`, and `area` of countries that are
considered "big" based on the conditions provided. A country is classified as big if
it either has an area of at least 3,000,000 km² or a population of at least
25,000,000. The query uses the SELECT statement to fetch the relevant columns
from the world table, and the WHERE clause filters the rows to return only the
countries that satisfy either of the two conditions (area or population). The OR
operator ensures that countries meeting at least one of the criteria are included in
the result.
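The OR condition can also be written as two single-condition queries merged with UNION, which is another instance of "many ways to solve the same task". This sketch runs both forms on the example rows with Python's `sqlite3` module, with SQLite as a stand-in engine, and checks that they agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE World (name TEXT PRIMARY KEY, continent TEXT, "
    "area INTEGER, population INTEGER, gdp INTEGER)"
)
conn.executemany(
    "INSERT INTO World VALUES (?, ?, ?, ?, ?)",
    [
        ("Afghanistan", "Asia", 652230, 25500100, 20343000000),
        ("Albania", "Europe", 28748, 2831741, 12960000000),
        ("Algeria", "Africa", 2381741, 37100000, 188681000000),
        ("Andorra", "Europe", 468, 78115, 3712000000),
        ("Angola", "Africa", 1246700, 20609294, 100990000000),
    ],
)

# The solution from the text: a single scan with an OR condition.
with_or = conn.execute(
    "SELECT name, population, area FROM World "
    "WHERE area >= 3000000 OR population >= 25000000 ORDER BY name"
).fetchall()

# Equivalent: one query per condition, merged with UNION
# (UNION also removes duplicates for a country that meets both conditions).
with_union = conn.execute(
    "SELECT name, population, area FROM World WHERE area >= 3000000 "
    "UNION "
    "SELECT name, population, area FROM World WHERE population >= 25000000 "
    "ORDER BY name"
).fetchall()

print(with_or)  # [('Afghanistan', 25500100, 652230), ('Algeria', 37100000, 2381741)]
```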

Question 4

Article Views I

Difficulty: Easy

#### Table: Views

| Column Name | Type |
|---------------|---------|
| article_id | int |
| author_id | int |
| viewer_id | int |
| view_date | date |

- There is no primary key for this table, meaning the table may have duplicate
rows.
- Each row in this table indicates that a viewer viewed an article
(written by an author) on a specific date.
- `author_id` and `viewer_id` being equal means the author viewed their
own article.

#### Problem Statement


Write a query to find all the authors who viewed at least one of their
own articles.

Return the result table sorted by `id` in ascending order.

#### Example

**Input**:
Views table:

| article_id | author_id | viewer_id | view_date |
|------------|-----------|-----------|------------|
| 1 | 3 | 5 | 2019-08-01 |
| 1 | 3 | 6 | 2019-08-02 |
| 2 | 7 | 7 | 2019-08-01 |
| 2 | 7 | 6 | 2019-08-02 |
| 4 | 7 | 1 | 2019-07-22 |
| 3 | 4 | 4 | 2019-07-21 |
| 3 | 4 | 4 | 2019-07-21 |

**Output**:

| id |
|------|
| 4 |
| 7 |

**Explanation**:
Authors 4 and 7 viewed at least one of their own articles,
which is indicated by rows where `author_id = viewer_id`.

[LeetCode 1148]

Solution

```sql
SELECT DISTINCT author_id AS id
FROM views
WHERE author_id = viewer_id
ORDER BY author_id ASC;
```

The query selects distinct `author_id`s where the author has viewed their own
article, indicated by the condition `author_id = viewer_id`. The DISTINCT keyword
ensures that duplicate entries for the same author are removed, meaning each
author appears only once in the result. The output is then sorted in ascending
order of author_id using the ORDER BY clause. This query effectively retrieves a list
of authors who have viewed at least one of their own articles.
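Because the Views table has no primary key, the same author can match several rows. This sketch uses Python's `sqlite3` module as a stand-in engine and loads the example data, including its duplicate row. Running the query with and without DISTINCT shows what the keyword removes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Views (article_id INTEGER, author_id INTEGER, "
    "viewer_id INTEGER, view_date TEXT)"
)
conn.executemany(
    "INSERT INTO Views VALUES (?, ?, ?, ?)",
    [
        (1, 3, 5, "2019-08-01"), (1, 3, 6, "2019-08-02"),
        (2, 7, 7, "2019-08-01"), (2, 7, 6, "2019-08-02"),
        (4, 7, 1, "2019-07-22"), (3, 4, 4, "2019-07-21"),
        (3, 4, 4, "2019-07-21"),  # duplicate row: the table has no primary key
    ],
)

# Without DISTINCT, author 4 appears once per matching self-view row.
no_distinct = [r[0] for r in conn.execute(
    "SELECT author_id AS id FROM Views WHERE author_id = viewer_id ORDER BY id")]

# The solution: DISTINCT collapses the repeated author ids.
with_distinct = [r[0] for r in conn.execute(
    "SELECT DISTINCT author_id AS id FROM Views "
    "WHERE author_id = viewer_id ORDER BY author_id ASC")]

print(no_distinct)    # [4, 4, 7]
print(with_distinct)  # [4, 7]
```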

Question 5

Invalid Tweets

Difficulty: Easy

#### Table: Tweets

| Column Name | Type |
|-------------|---------|
| tweet_id | int |
| content | varchar |

- `tweet_id` is the primary key for this table, meaning it contains
unique values.
- This table contains all the tweets in a social media app.

#### Problem Statement


Write a query to find the IDs of the invalid tweets. A tweet is
considered invalid if its `content` exceeds 15 characters in length.

Return the result table in any order.

#### Example

**Input**:
Tweets table:

| tweet_id | content |
|----------|----------------------------------|
| 1 | Vote for Biden |
| 2 | Let us make America great again! |

**Output**:

| tweet_id |
|----------|
| 2 |

**Explanation**:
- Tweet 1 has a length of 14 characters, so it is valid.
- Tweet 2 has a length of 32 characters, which exceeds the limit,
making it an invalid tweet.

[LeetCode 1683]

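Solution

```sql
SELECT tweet_id
FROM tweets
WHERE CHAR_LENGTH(content) > 15;
```

This query retrieves the tweet_id of tweets that are considered invalid. The
`CHAR_LENGTH(content)` function counts the number of characters in the content
column, and the WHERE clause keeps the tweets whose content is longer than
15 characters. `CHAR_LENGTH` is the standard SQL name. In MySQL, `LENGTH()`
returns the length in bytes rather than characters, so a filter on `LENGTH(content)`
can misclassify tweets containing multi-byte characters such as accented letters
or emoji. The query returns the tweet_id of all such invalid tweets.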
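The difference between counting characters and counting bytes is easy to demonstrate. This sketch uses Python's `sqlite3` module as a stand-in engine. It relies on the fact that SQLite's `LENGTH()` counts characters for TEXT values and bytes for BLOB values, so casting to BLOB mimics MySQL's byte-based `LENGTH()`. The accented string is a made-up value, not part of the example data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tweets (tweet_id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO Tweets VALUES (?, ?)",
    [(1, "Vote for Biden"), (2, "Let us make America great again!")],
)

# On TEXT, SQLite's LENGTH() is a character count, so it acts like CHAR_LENGTH().
invalid = [r[0] for r in conn.execute(
    "SELECT tweet_id FROM Tweets WHERE LENGTH(content) > 15 ORDER BY tweet_id")]
print(invalid)  # [2]

# 'é' is one character but two bytes in UTF-8 (counted via a BLOB cast).
chars, nbytes = conn.execute(
    "SELECT LENGTH('é'), LENGTH(CAST('é' AS BLOB))").fetchone()
print(chars, nbytes)  # 1 2
```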

Question 6

Replace Employee ID With The Unique Identifier


Difficulty: Easy

#### Table: Employees

| Column Name | Type |
|-------------|---------|
| id | int |
| name | varchar |

- `id` is the primary key for this table, meaning it contains unique values.
- Each row contains the `id` and the `name` of an employee.

#### Table: EmployeeUNI

| Column Name | Type |
|-------------|---------|
| id | int |
| unique_id | int |

- `(id, unique_id)` is the primary key for this table.
- Each row contains the `id` and the corresponding `unique_id` of an employee.

#### Problem Statement


Write a query to display the `unique_id` of each user. If a user does not
have a `unique_id`, display `null` instead.

Return the result table in any order.

#### Example

**Input**:
Employees table:

| id | name |
|-----|----------|
| 1 | Alice |
| 7 | Bob |
| 11 | Meir |
| 90 | Winston |
| 3 | Jonathan |

EmployeeUNI table:

| id | unique_id |
|-----|-----------|
| 3 | 1 |
| 11 | 2 |
| 90 | 3 |

**Output**:

| unique_id | name |
|-----------|----------|
| null | Alice |
| null | Bob |
| 2 | Meir |
| 3 | Winston |
| 1 | Jonathan |

**Explanation**:
- Alice and Bob do not have a unique ID, so `null` is displayed for them.
- Meir's unique ID is 2, Winston's is 3, and Jonathan's is 1.

[LeetCode 1378]

Solution

```sql
SELECT b.unique_id, a.name
FROM Employees AS a
LEFT JOIN EmployeeUNI AS b
  ON a.id = b.id;
```

This query uses a LEFT JOIN to combine data from the Employees and
EmployeeUNI tables. It selects the unique_id from the EmployeeUNI table and the
name from the Employees table. The LEFT JOIN ensures that all employees from
the Employees table are included, even if they don't have a corresponding
unique_id in the EmployeeUNI table, returning null for those without a match.
This retrieves the unique IDs for employees, or null if an employee lacks one.
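Running the same join as a LEFT JOIN and as an INNER JOIN on the example data shows exactly which rows the LEFT JOIN preserves. This sketch uses Python's `sqlite3` module as a stand-in engine. Python's `None` corresponds to SQL NULL, and `ORDER BY a.id` is added only for a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE EmployeeUNI (id INTEGER, unique_id INTEGER, PRIMARY KEY (id, unique_id))"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [(1, "Alice"), (7, "Bob"), (11, "Meir"), (90, "Winston"), (3, "Jonathan")],
)
conn.executemany("INSERT INTO EmployeeUNI VALUES (?, ?)", [(3, 1), (11, 2), (90, 3)])

# The solution: every employee is kept; missing unique_ids come back as NULL (None).
left = conn.execute(
    "SELECT b.unique_id, a.name FROM Employees AS a "
    "LEFT JOIN EmployeeUNI AS b ON a.id = b.id ORDER BY a.id"
).fetchall()

# For contrast, an INNER JOIN drops Alice and Bob entirely.
inner = conn.execute(
    "SELECT b.unique_id, a.name FROM Employees AS a "
    "INNER JOIN EmployeeUNI AS b ON a.id = b.id ORDER BY a.id"
).fetchall()

print(left)   # [(None, 'Alice'), (1, 'Jonathan'), (None, 'Bob'), (2, 'Meir'), (3, 'Winston')]
print(inner)  # [(1, 'Jonathan'), (2, 'Meir'), (3, 'Winston')]
```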

Question 7

Product Sales Analysis I


Difficulty: Easy

#### Table: Sales

| Column Name | Type |
|-------------|-------|
| sale_id | int |
| product_id | int |
| year | int |
| quantity | int |
| price | int |

- `(sale_id, year)` is the primary key for this table.
- `product_id` is a foreign key referencing the `Product` table.
- Each row shows a sale of the product for a certain year, and
the `price` is per unit.

#### Table: Product

| Column Name | Type |
|--------------|---------|
| product_id | int |
| product_name | varchar |

- `product_id` is the primary key for this table.
- Each row contains the `product_id` and `product_name`.

#### Problem Statement


Write a query to report the `product_name`, `year`, and `price`
for each sale in the `Sales` table.

Return the result table in any order.

#### Example

**Input**:

Sales table:

| sale_id | product_id | year | quantity | price |
|---------|------------|------|----------|-------|
| 1 | 100 | 2008 | 10 | 5000 |
| 2 | 100 | 2009 | 12 | 5000 |
| 7 | 200 | 2011 | 15 | 9000 |

Product table:

| product_id | product_name |
|------------|--------------|
| 100 | Nokia |
| 200 | Apple |
| 300 | Samsung |

**Output**:

| product_name | year | price |
|--------------|-------|-------|
| Nokia | 2008 | 5000 |
| Nokia | 2009 | 5000 |
| Apple | 2011 | 9000 |

**Explanation**:
From `sale_id = 1`, we conclude that Nokia was sold for 5000 in 2008.
From `sale_id = 2`, Nokia was sold for 5000 in 2009.
From `sale_id = 7`, Apple was sold for 9000 in 2011.

[LeetCode 1068]

Solution

```sql
SELECT p.product_name, s.year, s.price
FROM sales AS s, product AS p
WHERE s.product_id = p.product_id;
```

The first query uses an implicit join, also known as a comma join, which behaves like
an inner join. It combines the sales and product tables and only returns
rows where the product_id in both tables matches. If there is no corresponding
product_id in the product table for a record in the sales table, that sales
record will be excluded from the result.

Result: Only records where there is a match between the sales and product tables
based on product_id are returned.

Data Loss: Any rows in the sales table without a matching product_id in the
product table are discarded.

Alternate Solution

```sql
SELECT p.product_name, s.year, s.price
FROM sales AS s
LEFT JOIN product AS p
  ON s.product_id = p.product_id;
```

The second query uses an explicit `LEFT JOIN`, which includes all rows from the sales
table, even if they don't have a matching product_id in the product table. For
unmatched rows, the columns from the product table (e.g., product_name) will
return NULL.

Result: All rows from the sales table are included. If a row from the sales table
has no corresponding row in the product table, the product_name will be NULL.

Data Preservation: Unlike the first query, this query retains all records from
sales, even if there is no matching product_id in the product table.

Key Differences:

- Matching vs. Non-Matching Data: The first query returns only the rows where
product_id exists in both the sales and product tables. The second query returns
all rows from sales, including those without a match in the product table.
- Data Handling: The first query is essentially an inner join, so unmatched rows
are excluded. The second query is a left join, ensuring all sales rows are
preserved, even if the product data is missing.
- Null Handling: The first query never produces NULLs for unmatched product_ids
because those rows are excluded. The second query may return NULL for
product_name when there’s no match in the product table.

Because `product_id` in Sales is declared as a foreign key to Product, an enforced
constraint guarantees that every sale has a matching product. In that case both
queries return the same rows, as they do in the example. The difference only shows
up when the data contains orphaned sales.

In summary, use the first query when you only want matching rows between the
two tables. Use the second query when you need all rows from the sales table,
regardless of whether they have a matching product_id in the product table.
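The two queries can be compared directly by adding one hypothetical "orphaned" sale whose `product_id` (999, invented for this sketch) has no Product row. The sketch uses Python's `sqlite3` module as a stand-in engine. SQLite does not enforce foreign keys unless `PRAGMA foreign_keys = ON` is set, which is what allows the orphan row to be inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # foreign keys are not enforced by default
conn.execute("CREATE TABLE Product (product_id INTEGER PRIMARY KEY, product_name TEXT)")
conn.execute(
    "CREATE TABLE Sales (sale_id INTEGER, product_id INTEGER, year INTEGER, "
    "quantity INTEGER, price INTEGER, PRIMARY KEY (sale_id, year))"
)
conn.executemany("INSERT INTO Product VALUES (?, ?)",
                 [(100, "Nokia"), (200, "Apple"), (300, "Samsung")])
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
    [(1, 100, 2008, 10, 5000), (2, 100, 2009, 12, 5000), (7, 200, 2011, 15, 9000),
     (8, 999, 2012, 1, 100)],  # hypothetical sale with no matching product
)

# Implicit (comma) join: behaves like an INNER JOIN, so the orphan sale is dropped.
implicit_join = conn.execute(
    "SELECT p.product_name, s.year, s.price FROM Sales AS s, Product AS p "
    "WHERE s.product_id = p.product_id ORDER BY s.sale_id"
).fetchall()

# LEFT JOIN: the orphan sale is kept, with a NULL (None) product_name.
left_join = conn.execute(
    "SELECT p.product_name, s.year, s.price FROM Sales AS s "
    "LEFT JOIN Product AS p ON s.product_id = p.product_id ORDER BY s.sale_id"
).fetchall()

print(len(implicit_join), len(left_join))  # 3 4
print(left_join[-1])                       # (None, 2012, 100)
```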

Question 8

Customer Who Visited but Did Not Make Any Transactions

Difficulty: Easy

#### Table: Visits

| Column Name | Type |
|-------------|---------|
| visit_id | int |
| customer_id | int |

- `visit_id` is the unique identifier for this table.
- This table contains information about the customers who visited the mall.

#### Table: Transactions


| Column Name | Type |
|----------------|---------|
| transaction_id | int |
| visit_id | int |
| amount | int |

- `transaction_id` is the unique identifier for this table.
- This table contains information about the transactions made during a
particular `visit_id`.

#### Problem Statement


Write a query to find the `customer_id` of the users who visited the mall
without making any transactions, along with the number of times
(`count_no_trans`) they made such visits.

Return the result table in any order.

#### Example

**Input**:

Visits table:

| visit_id | customer_id |
|----------|-------------|
| 1 | 23 |
| 2 | 9 |
| 4 | 30 |
| 5 | 54 |
| 6 | 96 |
| 7 | 54 |
| 8 | 54 |

Transactions table:

| transaction_id | visit_id | amount |
|----------------|----------|--------|
| 2 | 5 | 310 |
| 3 | 5 | 300 |
| 9 | 5 | 200 |
| 12 | 1 | 910 |
| 13 | 2 | 970 |

**Output**:

| customer_id | count_no_trans |
|-------------|----------------|
| 54 | 2 |
| 30 | 1 |
| 96 | 1 |

**Explanation**:
- Customer with `id = 23` visited once and made a transaction during the visit.
- Customer with `id = 9` visited once and made a transaction.
- Customer with `id = 30` visited once and did not make any transactions.
- Customer with `id = 54` visited three times. During 2 visits, no
transactions were made, but in one visit, they made 3 transactions.
- Customer with `id = 96` visited once and did not make any transactions.

[LeetCode 1581]

Solution

```sql
SELECT customer_id, COUNT(customer_id) AS count_no_trans
FROM visits
WHERE visit_id NOT IN (SELECT visit_id FROM transactions)
GROUP BY customer_id;
```

This query works as follows:

- `SELECT customer_id, COUNT(customer_id)`: This retrieves the customer_id and
counts how many times each customer visited the mall without making any
transactions.
- `FROM visits`: The data is fetched from the visits table, which holds information
about customers who visited the mall.
- `WHERE visit_id NOT IN (SELECT visit_id FROM transactions)`: The subquery
selects all visit_ids that are present in the transactions table. The outer query
then filters out these visit_ids, leaving only those visits where no transactions
were made.
- `GROUP BY customer_id`: This groups the results by customer_id to calculate the
number of visits without transactions for each customer.

This query effectively identifies customers who visited the mall but did not
complete any transactions and returns how many such visits they made.
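A common alternative is a LEFT JOIN "anti-join" that keeps visits with no matching transaction. Under standard SQL, `NOT IN` has a pitfall that the anti-join avoids: if the subquery returns even one NULL, `x NOT IN (...)` can never be TRUE, so no rows survive. This sketch uses Python's `sqlite3` module as a stand-in engine. It runs both forms on the example data, then adds one hypothetical transaction with a NULL `visit_id` (not part of the example) to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Visits (visit_id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.execute(
    "CREATE TABLE Transactions (transaction_id INTEGER PRIMARY KEY, "
    "visit_id INTEGER, amount INTEGER)"
)
conn.executemany("INSERT INTO Visits VALUES (?, ?)",
                 [(1, 23), (2, 9), (4, 30), (5, 54), (6, 96), (7, 54), (8, 54)])
conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?)",
                 [(2, 5, 310), (3, 5, 300), (9, 5, 200), (12, 1, 910), (13, 2, 970)])

NOT_IN_SQL = (
    "SELECT customer_id, COUNT(customer_id) FROM Visits "
    "WHERE visit_id NOT IN (SELECT visit_id FROM Transactions) "
    "GROUP BY customer_id"
)
# Anti-join: keep visits for which the LEFT JOIN found no transaction row.
ANTI_JOIN_SQL = (
    "SELECT v.customer_id, COUNT(*) FROM Visits AS v "
    "LEFT JOIN Transactions AS t ON v.visit_id = t.visit_id "
    "WHERE t.transaction_id IS NULL "
    "GROUP BY v.customer_id"
)

not_in = dict(conn.execute(NOT_IN_SQL))
anti_join = dict(conn.execute(ANTI_JOIN_SQL))
print(sorted(not_in.items()))  # [(30, 1), (54, 2), (96, 1)]

# Hypothetical transaction with a NULL visit_id.
conn.execute("INSERT INTO Transactions VALUES (99, NULL, 0)")
not_in_after = dict(conn.execute(NOT_IN_SQL))
anti_join_after = dict(conn.execute(ANTI_JOIN_SQL))
print(not_in_after)  # {} because every NOT IN test is now UNKNOWN
```

The anti-join result is unchanged after the NULL row is added. For that reason, many people prefer the anti-join, or `NOT EXISTS`, whenever the subquery column is nullable.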

Question 9

Rising Temperature

Difficulty: Easy

#### Table: Weather

| Column Name | Type |
|---------------|---------|
| id | int |
| recordDate | date |
| temperature | int |

- `id` is the column with unique values for this table.
- There are no different rows with the same `recordDate`.
- This table contains information about the temperature on a certain day.

#### Problem Statement


Write a query to find all dates' IDs with higher temperatures compared
to their previous dates (yesterday).

Return the result table in any order.

#### Example

**Input**:
Weather table:

| id | recordDate | temperature |
|----|------------|-------------|
| 1 | 2015-01-01 | 10 |
| 2 | 2015-01-02 | 25 |
| 3 | 2015-01-03 | 20 |
| 4 | 2015-01-04 | 30 |

**Output**:

| id |
|----|
| 2 |
| 4 |

**Explanation**:
- On 2015-01-02, the temperature was higher than the previous day (10 -> 25).
- On 2015-01-04, the temperature was higher than the previous day (20 -> 30).

[LeetCode 197]

Solution

```sql
SELECT W1.id
FROM Weather W1, Weather W2
WHERE W1.recordDate = DATE_ADD(W2.recordDate, INTERVAL 1 DAY)
  AND W1.temperature > W2.temperature;
```

This query uses a self-join on the Weather table to compare each day's record with
the previous day's record. It selects W1.id where the recordDate of W1 is exactly
one day after W2's recordDate and the temperature on W1 is higher than on W2.
Note that DATE_ADD is a MySQL function, not part of ANSI SQL.

Alternate Solution

```sql
SELECT w1.id
FROM Weather AS w1
JOIN Weather AS w2
  ON DATEDIFF(w1.recordDate, w2.recordDate) = 1
WHERE w1.recordDate > w2.recordDate
  AND w1.temperature > w2.temperature;
```

This query achieves the same result using a self-join with the DATEDIFF function to
find records where the recordDate of w1 is exactly one day later than w2. It then
filters the results to ensure w1's temperature is higher than w2's. In MySQL,
DATEDIFF(a, b) returns a minus b in days, so the join condition already guarantees
that w1 is the later date and the `w1.recordDate > w2.recordDate` check is redundant.
DATEDIFF is also vendor-specific: SQL Server's version takes a date-part argument
first, e.g. `DATEDIFF(day, start, end)`.

- First Query: Uses DATE_ADD to match each date with the previous day's date and
then compares temperatures.
- Second Query: Uses DATEDIFF to directly calculate the day difference and check
temperatures.

Both queries effectively compare temperatures between consecutive days but use
different approaches for handling date differences.
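To try the self-join outside MySQL, the date arithmetic has to be expressed in the target engine's dialect. This sketch uses Python's `sqlite3` module. SQLite has neither DATE_ADD nor DATEDIFF, so it swaps in SQLite's `date(x, '+1 day')` modifier, which returns the next calendar day for ISO-8601 date strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Weather (id INTEGER PRIMARY KEY, recordDate TEXT, temperature INTEGER)"
)
conn.executemany(
    "INSERT INTO Weather VALUES (?, ?, ?)",
    [(1, "2015-01-01", 10), (2, "2015-01-02", 25),
     (3, "2015-01-03", 20), (4, "2015-01-04", 30)],
)

# Self-join: pair each day (w1) with the calendar day before it (w2).
# date(w2.recordDate, '+1 day') plays the role of MySQL's DATE_ADD(..., INTERVAL 1 DAY).
rows = conn.execute(
    "SELECT w1.id FROM Weather AS w1 "
    "JOIN Weather AS w2 ON w1.recordDate = date(w2.recordDate, '+1 day') "
    "WHERE w1.temperature > w2.temperature ORDER BY w1.id"
).fetchall()
result = [r[0] for r in rows]
print(result)  # [2, 4]
```

Joining on the calendar date, rather than on the previous row, matters if some days are missing from the table. In that case, a day must not be compared with an older, non-adjacent reading.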

Question 10

Average Time of Process per Machine

Difficulty: Easy

#### Table: Activity

| Column Name | Type |
|---------------|---------|
| machine_id | int |
| process_id | int |
| activity_type | enum |
| timestamp | float |

- `(machine_id, process_id, activity_type)` is the primary key of this table.
- `machine_id` is the ID of a machine.
- `process_id` is the ID of a process running on the machine.
- `activity_type` is an ENUM ('start', 'end'), representing when a
process starts or ends.
- `timestamp` is a float representing the time in seconds.
- The `start` timestamp is always before the `end` timestamp for each
(machine_id, process_id) pair.

#### Problem Statement


Write a query to find the average time each machine takes to complete a
process. The average time is the total time to complete all processes on
the machine divided by the number of processes, rounded to 3 decimal places.

Return the result table in any order.

#### Example

**Input**:
Activity table:

| machine_id | process_id | activity_type | timestamp |
|------------|------------|---------------|-----------|
| 0 | 0 | start | 0.712 |
| 0 | 0 | end | 1.520 |
| 0 | 1 | start | 3.140 |
| 0 | 1 | end | 4.120 |
| 1 | 0 | start | 0.550 |
| 1 | 0 | end | 1.550 |
| 1 | 1 | start | 0.430 |
| 1 | 1 | end | 1.420 |
| 2 | 0 | start | 4.100 |
| 2 | 0 | end | 4.512 |
| 2 | 1 | start | 2.500 |
| 2 | 1 | end | 5.000 |

**Output**:

| machine_id | processing_time |
|------------|-----------------|
| 0 | 0.894 |
| 1 | 0.995 |
| 2 | 1.456 |

**Explanation**:
- Machine 0's average time: ((1.520 - 0.712) + (4.120 - 3.140)) / 2 = 0.894
- Machine 1's average time: ((1.550 - 0.550) + (1.420 - 0.430)) / 2 = 0.995
- Machine 2's average time: ((4.512 - 4.100) + (5.000 - 2.500)) / 2 = 1.456

[LeetCode 1661]

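Solution

```sql
SELECT A.MACHINE_ID,
       ROUND(AVG(B.TIMESTAMP - A.TIMESTAMP), 3) AS PROCESSING_TIME
FROM ACTIVITY A, ACTIVITY B
WHERE A.MACHINE_ID = B.MACHINE_ID
  AND A.PROCESS_ID = B.PROCESS_ID
  AND A.ACTIVITY_TYPE = 'start'
  AND B.ACTIVITY_TYPE = 'end'
GROUP BY A.MACHINE_ID;
```

This query uses a self-join between the ACTIVITY table as A (for start times) and B
(for end times), based on machine_id and process_id. It calculates the
difference between the start and end timestamps for each process, averages the
results per machine, and rounds the result to 3 decimal places. The literals
'start' and 'end' are written in lowercase to match the stored values. MySQL's
default collations compare strings case-insensitively, but many other databases
do not, and on those 'START' would match no rows. The GROUP BY column is qualified
as A.MACHINE_ID because an unqualified MACHINE_ID can be ambiguous in a self-join
on some databases.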

Alternate Solution

```sql
SELECT a1.machine_id AS machine_id,
       ROUND(AVG(a2.timestamp - a1.timestamp), 3) AS processing_time
FROM activity AS a1
JOIN activity AS a2
  ON a1.machine_id = a2.machine_id
 AND a1.process_id = a2.process_id
 AND a1.activity_type = 'start'
 AND a2.activity_type = 'end'
GROUP BY a1.machine_id
ORDER BY a1.machine_id;
```

This query achieves the same as the first one but uses an explicit JOIN. It matches
rows where a1 has a start activity and a2 has an end activity for the same
machine_id and process_id, calculates the time differences, averages them, and
rounds the result to 3 decimal places. It includes an additional ORDER BY clause to
sort the results by machine_id.

Superior Option:
The second query is preferable because it uses the more modern and clearer JOIN
syntax, making it more readable and easier to maintain than the older, implicit
join (comma-separated) syntax in the first solution. Both return the same results.
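Running the explicit-JOIN version on the example data with Python's `sqlite3` module (a stand-in engine) also exposes the case-sensitivity trap. SQLite's default BINARY collation compares text case-sensitively, so a filter on 'START' finds nothing. Because the averages are floating-point, the results are checked approximately rather than digit-for-digit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity (machine_id INTEGER, process_id INTEGER, "
    "activity_type TEXT, timestamp REAL, "
    "PRIMARY KEY (machine_id, process_id, activity_type))"
)
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?, ?, ?)",
    [
        (0, 0, "start", 0.712), (0, 0, "end", 1.520),
        (0, 1, "start", 3.140), (0, 1, "end", 4.120),
        (1, 0, "start", 0.550), (1, 0, "end", 1.550),
        (1, 1, "start", 0.430), (1, 1, "end", 1.420),
        (2, 0, "start", 4.100), (2, 0, "end", 4.512),
        (2, 1, "start", 2.500), (2, 1, "end", 5.000),
    ],
)

# Case-sensitive comparison: the uppercase literal matches no stored value.
upper_matches = conn.execute(
    "SELECT COUNT(*) FROM Activity WHERE activity_type = 'START'"
).fetchone()[0]
print(upper_matches)  # 0

# Self-join pairing each process's start row (a1) with its end row (a2).
rows = conn.execute(
    "SELECT a1.machine_id, ROUND(AVG(a2.timestamp - a1.timestamp), 3) "
    "FROM Activity AS a1 JOIN Activity AS a2 "
    "ON a1.machine_id = a2.machine_id AND a1.process_id = a2.process_id "
    "AND a1.activity_type = 'start' AND a2.activity_type = 'end' "
    "GROUP BY a1.machine_id ORDER BY a1.machine_id"
).fetchall()

# Compare with the expected averages, allowing for floating-point rounding.
expected = [0.894, 0.995, 1.456]
matches = all(abs(t - e) < 0.002 for (_, t), e in zip(rows, expected))
print(matches)  # True
```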

Question 11

Employee Bonus

Difficulty: Easy

#### Table: Employee

| Column Name | Type |
|-------------|---------|
| empId | int |
| name | varchar |
| supervisor | int |
| salary | int |

- `empId` is the column with unique values for this table.
- Each row contains the name, ID of an employee, their salary,
and the ID of their manager.

#### Table: Bonus

| Column Name | Type |
|-------------|------|
| empId | int |
| bonus | int |

- `empId` is a foreign key referencing the `Employee` table.
- Each row contains the `empId` and the bonus amount for an employee.

#### Problem Statement


Write a solution to report the `name` and `bonus` amount of each
employee with a bonus less than 1000.

If an employee does not have a bonus, display `null` for the bonus.

Return the result table in any order.

#### Example

**Input**:
Employee table:

| empId | name | supervisor | salary |
|-------|--------|------------|--------|
| 3 | Brad | null | 4000 |
| 1 | John | 3 | 1000 |
| 2 | Dan | 3 | 2000 |
| 4 | Thomas | 3 | 4000 |

Bonus table:

| empId | bonus |
|-------|-------|
| 2 | 500 |
| 4 | 2000 |

**Output**:

| name | bonus |
|-------|-------|
| Brad | null |
| John | null |
| Dan | 500 |

[LeetCode 577]

Solution

```sql
SELECT E.name, B.bonus
FROM Employee AS E
LEFT JOIN Bonus AS B
  ON E.empId = B.empId
WHERE B.bonus < 1000 OR B.bonus IS NULL;
```

This query works as follows:

LEFT JOIN: The Employee table is joined with the Bonus table using a LEFT JOIN
so that all employees are included, even those without a bonus (for them,
B.bonus is NULL).

WHERE clause: Filters the results to include:

Employees with a bonus less than 1000.

Employees without a bonus (B.bonus IS NULL). This check is required because a
comparison such as NULL < 1000 is never true, so without it these employees
would be filtered out.

The query returns the name and bonus of each qualifying employee, displaying null
for employees without a bonus.
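To see why the IS NULL branch matters, here is a minimal sketch that runs the solution on the example data in an in-memory SQLite database (via Python's sqlite3 module), next to a variant without `OR B.bonus IS NULL`. SQLite stands in for MySQL here; the query itself is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (empId INT PRIMARY KEY, name TEXT, supervisor INT, salary INT);
    CREATE TABLE Bonus (empId INT, bonus INT);
    INSERT INTO Employee VALUES (3, 'Brad', NULL, 4000), (1, 'John', 3, 1000),
                                (2, 'Dan', 3, 2000), (4, 'Thomas', 3, 4000);
    INSERT INTO Bonus VALUES (2, 500), (4, 2000);
""")

WITH_NULL_CHECK = """
    SELECT E.name, B.bonus
    FROM Employee AS E
    LEFT JOIN Bonus AS B ON E.empId = B.empId
    WHERE B.bonus < 1000 OR B.bonus IS NULL
"""

# Same query without the IS NULL branch: NULL < 1000 is not true,
# so employees with no Bonus row (Brad, John) are dropped.
WITHOUT_NULL_CHECK = """
    SELECT E.name, B.bonus
    FROM Employee AS E
    LEFT JOIN Bonus AS B ON E.empId = B.empId
    WHERE B.bonus < 1000
"""

with_check = sorted(conn.execute(WITH_NULL_CHECK).fetchall())
without_check = sorted(conn.execute(WITHOUT_NULL_CHECK).fetchall())
print(with_check)     # [('Brad', None), ('Dan', 500), ('John', None)]
print(without_check)  # [('Dan', 500)]
```

Python's `None` corresponds to SQL `null` in the expected output.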

Question 12

Students and Examinations

Difficulty: Easy

#### Table: Students

| Column Name | Type |
|---------------|---------|
| student_id | int |
| student_name | varchar |

- `student_id` is the primary key of this table.
- Each row contains the ID and the name of a student in the school.

#### Table: Subjects

| Column Name | Type |
|---------------|---------|
| subject_name | varchar |

- `subject_name` is the primary key of this table.
- Each row contains the name of a subject taught in the school.

#### Table: Examinations

| Column Name | Type |
|---------------|---------|
| student_id | int |
| subject_name | varchar |

- There is no primary key for this table.
- Each row indicates that the student with `student_id` attended an exam in
`subject_name`.

#### Problem Statement


Write a query to find the number of times each student attended each exam.

Return the result table ordered by `student_id` and `subject_name`.

#### Example

**Input**:
Students table:

| student_id | student_name |
|------------|--------------|
| 1 | Alice |
| 2 | Bob |
| 13 | John |
| 6 | Alex |

Subjects table:

| subject_name |
|--------------|
| Math |
| Physics |
| Programming |

Examinations table:

| student_id | subject_name |
|------------|--------------|
| 1 | Math |
| 1 | Physics |
| 1 | Programming |
| 2 | Programming |
| 1 | Physics |
| 1 | Math |
| 13 | Math |
| 13 | Programming |
| 13 | Physics |
| 2 | Math |
| 1 | Math |

**Output**:

| student_id | student_name | subject_name | attended_exams |
|------------|--------------|--------------|----------------|
| 1 | Alice | Math | 3 |
| 1 | Alice | Physics | 2 |
| 1 | Alice | Programming | 1 |
| 2 | Bob | Math | 1 |
| 2 | Bob | Physics | 0 |
| 2 | Bob | Programming | 1 |
| 6 | Alex | Math | 0 |
| 6 | Alex | Physics | 0 |
| 6 | Alex | Programming | 0 |
| 13 | John | Math | 1 |
| 13 | John | Physics | 1 |
| 13 | John | Programming | 1 |

**Explanation**:
- Alice attended the Math exam 3 times, Physics 2 times, and Programming 1
time.
- Bob attended the Math exam once, Programming once, and did not attend the
Physics exam.
- Alex did not attend any exams.
- John attended all three exams once.

[LeetCode 1280]

Solution

SELECT student_subject.student_id,
student_subject.student_name,
student_subject.subject_name,
COUNT(exam.student_id) AS attended_exams
FROM
(
SELECT student.*, subject.*
FROM students AS student
CROSS JOIN subjects AS subject
) AS student_subject
LEFT JOIN examinations AS exam
ON student_subject.student_id = exam.student_id
AND student_subject.subject_name = exam.subject_name
GROUP BY student_subject.student_id, student_subject.student_name,
student_subject.subject_name
ORDER BY student_subject.student_id, student_subject.student_name,
student_subject.subject_name;
This query works as follows:

student_subject: Combines all students with all subjects using CROSS JOIN, which
creates one row for every student and subject pair.

exam: The examinations table, which has one row for each time a student attended
an exam in a subject.

The LEFT JOIN keeps every student and subject pair, even when there is no matching
exam row. For such pairs the exam columns are NULL, and because
COUNT(exam.student_id) ignores NULLs, the count for that subject is 0.

The results are grouped by student and subject, counting how many times each
student attended the exam for each subject. The output is then ordered by
student_id and subject_name.
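A minimal sketch of the solution on the example data, run in SQLite through Python's sqlite3 module. It lists the cross-joined columns explicitly instead of `student.*, subject.*` and uses shorter aliases (`ss`, `e`), which is an editorial simplification with the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INT PRIMARY KEY, student_name TEXT);
    CREATE TABLE subjects (subject_name TEXT PRIMARY KEY);
    CREATE TABLE examinations (student_id INT, subject_name TEXT);
    INSERT INTO students VALUES (1, 'Alice'), (2, 'Bob'), (13, 'John'), (6, 'Alex');
    INSERT INTO subjects VALUES ('Math'), ('Physics'), ('Programming');
    INSERT INTO examinations VALUES
        (1, 'Math'), (1, 'Physics'), (1, 'Programming'), (2, 'Programming'),
        (1, 'Physics'), (1, 'Math'), (13, 'Math'), (13, 'Programming'),
        (13, 'Physics'), (2, 'Math'), (1, 'Math');
""")

rows = conn.execute("""
    SELECT ss.student_id, ss.student_name, ss.subject_name,
           COUNT(e.student_id) AS attended_exams  -- NULLs from unmatched pairs count as 0
    FROM (SELECT s.student_id, s.student_name, sub.subject_name
          FROM students AS s CROSS JOIN subjects AS sub) AS ss
    LEFT JOIN examinations AS e
      ON ss.student_id = e.student_id AND ss.subject_name = e.subject_name
    GROUP BY ss.student_id, ss.student_name, ss.subject_name
    ORDER BY ss.student_id, ss.subject_name
""").fetchall()

counts = {(name, subject): n for _, name, subject, n in rows}
print(len(rows))                   # 12 (4 students x 3 subjects)
print(counts[("Alice", "Math")])   # 3
print(counts[("Bob", "Physics")])  # 0
```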

Question 13

Managers with at Least 5 Direct Reports

Difficulty: Medium

#### Table: Employee

| Column Name | Type |
|-------------|---------|
| id | int |
| name | varchar |
| department | varchar |
| managerId | int |

- `id` is the primary key of this table.
- Each row indicates the name of an employee, their department,
and the id of their manager.
- If `managerId` is null, the employee does not have a manager.
- No employee will be their own manager.

#### Problem Statement


Write a query to find the managers who have at least five direct reports.

Return the result table in any order.

#### Example

**Input:**
Employee table:

| id | name | department | managerId |
|-----|-------|------------|-----------|
| 101 | John | A | null |
| 102 | Dan | A | 101 |
| 103 | James | A | 101 |
| 104 | Amy | A | 101 |
| 105 | Anne | A | 101 |
| 106 | Ron | B | 101 |

**Output:**

| name |
|------|
| John |

**Explanation:**
John is the only manager with at least five direct reports:
Dan, James, Amy, Anne, and Ron.

[Leetcode 570]

Solution

SELECT e1.name
FROM employee AS e1
JOIN (
SELECT managerId
FROM employee
GROUP BY managerId
HAVING COUNT(managerId) > 4
) AS e2
ON e1.id = e2.managerId;
This query works as follows:

Subquery ( e2 ):

Selects managerId from the employee table.

Groups results by managerId .

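Uses HAVING COUNT(managerId) > 4 to keep only managers with more than 4 (that is,
at least 5) direct reports. COUNT(managerId) ignores NULLs, so the group of
employees without a manager has a count of 0 and is filtered out.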

Main Query:

Joins the original employee table ( e1 ) with the results of the subquery ( e2 ).

Matches e1.id with e2.managerId .

Selects the name of managers who meet the criteria.

This query finds the names of managers who have at least 5 direct reports.
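A minimal sketch that runs the query on the example data with Python's sqlite3 module, then deletes one report (a hypothetical change, not part of the problem) to show that the `> 4` threshold then excludes John.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INT PRIMARY KEY, name TEXT, department TEXT, managerId INT);
    INSERT INTO employee VALUES
        (101, 'John', 'A', NULL), (102, 'Dan', 'A', 101), (103, 'James', 'A', 101),
        (104, 'Amy', 'A', 101), (105, 'Anne', 'A', 101), (106, 'Ron', 'B', 101);
""")

QUERY = """
    SELECT e1.name
    FROM employee AS e1
    JOIN (SELECT managerId
          FROM employee
          GROUP BY managerId
          HAVING COUNT(managerId) > 4) AS e2  -- the NULL group has COUNT(managerId) = 0
      ON e1.id = e2.managerId
"""

with_five_reports = conn.execute(QUERY).fetchall()
print(with_five_reports)  # [('John',)]

# Hypothetical change: remove Ron, leaving John with 4 reports, so nobody qualifies.
conn.execute("DELETE FROM employee WHERE id = 106")
with_four_reports = conn.execute(QUERY).fetchall()
print(with_four_reports)  # []
```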

Question 14

Confirmation Rate

Difficulty: Medium

#### Table: Signups

| Column Name | Type |
|-------------|----------|
| user_id | int |
| time_stamp | datetime |

- `user_id` is the unique identifier for each user.
- Each row contains the signup time for the user with `user_id`.

#### Table: Confirmations

| Column Name | Type |
|-------------|----------|
| user_id | int |
| time_stamp | datetime |
| action | ENUM |

- `(user_id, time_stamp)` is the primary key of this table.
- `user_id` is a foreign key referencing the `Signups` table.
- `action` is an ENUM ('confirmed', 'timeout') indicating whether
the confirmation was successful or not.

#### Problem Statement


The confirmation rate for a user is calculated as the number of 'confirmed'
messages divided by the total number of requested confirmation messages. For
users who did not request any confirmation messages, the rate is 0. Round the
confirmation rate to two decimal places.

Write a query to find the confirmation rate of each user.

Return the result table in any order.

#### Example

**Input:**

**Signups table:**

| user_id | time_stamp |
|---------|---------------------|
| 3 | 2020-03-21 10:16:13 |
| 7 | 2020-01-04 13:57:59 |
| 2 | 2020-07-29 23:09:44 |
| 6 | 2020-12-09 10:39:37 |

**Confirmations table:**

| user_id | time_stamp | action |
|---------|---------------------|-----------|
| 3 | 2021-01-06 03:30:46 | timeout |
| 3 | 2021-07-14 14:00:00 | timeout |
| 7 | 2021-06-12 11:57:29 | confirmed |
| 7 | 2021-06-13 12:58:28 | confirmed |
| 7 | 2021-06-14 13:59:27 | confirmed |
| 2 | 2021-01-22 00:00:00 | confirmed |
| 2 | 2021-02-28 23:59:59 | timeout |

**Output:**

| user_id | confirmation_rate |
|---------|-------------------|
| 6 | 0.00 |
| 3 | 0.00 |
| 7 | 1.00 |
| 2 | 0.50 |

**Explanation:**
- User 6 did not request any confirmation messages; confirmation rate is 0.
- User 3 requested 2 confirmations, both timed out; rate is 0.
- User 7 requested 3 confirmations, all were confirmed; rate is 1.
- User 2 requested 2 confirmations, 1 was confirmed and 1 timed out;
rate is 0.5.

[LeetCode 1934]

Solution

SELECT
s.user_id,
ROUND(COALESCE(c.confirmation_rate, 0), 2) AS confirmation_rate
FROM
signups AS s
LEFT JOIN (
SELECT
user_id,
AVG(CASE WHEN action = 'confirmed' THEN 1 ELSE 0 END)
AS confirmation_rate
FROM
confirmations
GROUP BY
user_id
) AS c
ON
s.user_id = c.user_id;

Subquery ( c ):
Purpose: Calculate the confirmation rate for each user.

Details:

AVG(CASE WHEN action = 'confirmed' THEN 1 ELSE 0 END) computes the average
of 1 for 'confirmed' actions and 0 for 'timeout' actions, effectively giving the
proportion of confirmations.

GROUP BY user_id ensures that the rate is computed for each user.

Main Query:

LEFT JOIN: Joins the signups table with the subquery result on user_id .

COALESCE(c.confirmation_rate, 0) ensures that users without confirmation data
get a rate of 0.

ROUND(…, 2): Rounds the confirmation rate to two decimal places for cleaner
output.

Summary: This query calculates the confirmation rate for each user from the
confirmations table and ensures that users who didn't request any confirmations
are still included with a rate of 0.
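A minimal sketch of the solution on the example data, using SQLite via Python's sqlite3 module (timestamps are stored as TEXT, which is enough here because the query never compares them). User 6 has no confirmations, so the LEFT JOIN yields NULL and COALESCE turns it into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE signups (user_id INT, time_stamp TEXT);
    CREATE TABLE confirmations (user_id INT, time_stamp TEXT, action TEXT);
    INSERT INTO signups VALUES (3, '2020-03-21 10:16:13'), (7, '2020-01-04 13:57:59'),
                               (2, '2020-07-29 23:09:44'), (6, '2020-12-09 10:39:37');
    INSERT INTO confirmations VALUES
        (3, '2021-01-06 03:30:46', 'timeout'), (3, '2021-07-14 14:00:00', 'timeout'),
        (7, '2021-06-12 11:57:29', 'confirmed'), (7, '2021-06-13 12:58:28', 'confirmed'),
        (7, '2021-06-14 13:59:27', 'confirmed'), (2, '2021-01-22 00:00:00', 'confirmed'),
        (2, '2021-02-28 23:59:59', 'timeout');
""")

rates = dict(conn.execute("""
    SELECT s.user_id, ROUND(COALESCE(c.confirmation_rate, 0), 2) AS confirmation_rate
    FROM signups AS s
    LEFT JOIN (SELECT user_id,
                      AVG(CASE WHEN action = 'confirmed' THEN 1 ELSE 0 END) AS confirmation_rate
               FROM confirmations
               GROUP BY user_id) AS c
      ON s.user_id = c.user_id
""").fetchall())

print(rates[6], rates[2])  # 0.0 0.5
```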

Question 15

Not Boring Movies

Difficulty: Easy

#### Table: Cinema


| Column Name | Type |
|----------------|----------|
| id | int |
| movie | varchar |
| description | varchar |
| rating | float |

- `id` is the primary key for this table.
- `movie` is the name of the movie.
- `description` is a text describing the movie.
- `rating` is a float with 2 decimal places, representing the movie's rating.

#### Problem Statement


Write a query to find movies with an odd-numbered `id` and a `description` that
is not "boring".

Return the result table ordered by `rating` in descending order.

#### Example

**Input:**
Cinema table:

| id | movie | description | rating |
|----|------------|-------------|--------|
| 1 | War | great 3D | 8.9 |
| 2 | Science | fiction | 8.5 |
| 3 | Irish | boring | 6.2 |
| 4 | Ice Song | Fantasy | 8.6 |
| 5 | House Card | Interesting | 9.1 |

**Output:**

| id | movie | description | rating |
|----|------------|-------------|--------|
| 5 | House Card | Interesting | 9.1 |
| 1 | War | great 3D | 8.9 |

**Explanation:**
Movies with odd-numbered IDs are 1, 3, and 5.
The movie with ID 3 is excluded because its description is "boring".

[Leetcode 620]

Solution
SELECT *
FROM cinema
WHERE id % 2 = 1
AND description <> 'boring'
ORDER BY rating DESC;

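The query uses the condition id % 2 = 1 to filter rows where the id is odd. The
modulus operator % calculates the remainder of dividing id by 2. If the
remainder is 1, then id is odd, and the condition evaluates to true. The condition
description <> 'boring' then excludes boring movies, and ORDER BY rating DESC
lists the remaining movies from highest to lowest rating. Note that % is not part
of ANSI SQL; the standard form is MOD(id, 2) = 1, which MySQL, PostgreSQL and
Oracle also support.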
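A quick check of the filter and sort on the example data, using SQLite through Python's sqlite3 module (SQLite supports the same `%` operator as MySQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cinema (id INT PRIMARY KEY, movie TEXT, description TEXT, rating REAL);
    INSERT INTO cinema VALUES
        (1, 'War', 'great 3D', 8.9), (2, 'Science', 'fiction', 8.5),
        (3, 'Irish', 'boring', 6.2), (4, 'Ice Song', 'Fantasy', 8.6),
        (5, 'House Card', 'Interesting', 9.1);
""")

rows = conn.execute("""
    SELECT *
    FROM cinema
    WHERE id % 2 = 1               -- odd ids only (1, 3, 5)
      AND description <> 'boring'  -- drops id 3
    ORDER BY rating DESC
""").fetchall()

for row in rows:
    print(row)
# (5, 'House Card', 'Interesting', 9.1)
# (1, 'War', 'great 3D', 8.9)
```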

Question 16

Average Selling Price

Difficulty: Easy

#### Table: Prices

| Column Name | Type |
|-------------|------|
| product_id | int |
| start_date | date |
| end_date | date |
| price | int |

- `(product_id, start_date, end_date)` is the primary key.
- This table shows the price of each `product_id` for a specific
date range (from `start_date` to `end_date`).
- There are no overlapping periods for the same `product_id`.

#### Table: UnitsSold

| Column Name | Type |
|---------------|------|
| product_id | int |
| purchase_date | date |
| units | int |

- Each row in this table indicates the units of `product_id` sold on
`purchase_date`. There may be duplicate rows.

#### Problem Statement


Write a query to find the average selling price for each product.
The `average_price` should be rounded to 2 decimal places.
If no units of a product are sold, the `average_price` is assumed to be 0.

Return the result table in any order.

#### Example

**Input:**
Prices table:

| product_id | start_date | end_date | price |
|------------|------------|------------|-------|
| 1 | 2019-02-17 | 2019-02-28 | 5 |
| 1 | 2019-03-01 | 2019-03-22 | 20 |
| 2 | 2019-02-01 | 2019-02-20 | 15 |
| 2 | 2019-02-21 | 2019-03-31 | 30 |

UnitsSold table:

| product_id | purchase_date | units |
|------------|---------------|-------|
| 1 | 2019-02-25 | 100 |
| 1 | 2019-03-01 | 15 |
| 2 | 2019-02-10 | 200 |
| 2 | 2019-03-22 | 30 |

**Output:**

| product_id | average_price |
|------------|---------------|
| 1 | 6.96 |
| 2 | 16.96 |

**Explanation:**
The average selling price is the total revenue (units sold multiplied by the
price in effect on the purchase date) divided by the total number of units sold:

- Product 1: `((100 * 5) + (15 * 20)) / 115 = 6.96`
- Product 2: `((200 * 15) + (30 * 30)) / 230 = 16.96`

[LeetCode 1251]

Solution
SELECT p.product_id,
COALESCE(ROUND(SUM(u.units * p.price) / SUM(u.units), 2), 0)
AS average_price
FROM prices p
LEFT JOIN unitssold u
ON p.product_id = u.product_id
AND u.purchase_date BETWEEN p.start_date AND p.end_date
GROUP BY p.product_id;

This query calculates the average selling price of each product by joining the
prices and unitssold tables. It multiplies the units sold by the product price for
each sale, sums these values, and divides by the total units sold to find the average
price. The COALESCE function ensures that if a product has no sales, the result will
be 0. The result is rounded to 2 decimal places.
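A minimal sketch of the solution in SQLite via Python's sqlite3 module. Two assumptions differ from the original: SQLite truncates integer division (MySQL's `/` returns a decimal), so the sketch multiplies by 1.0 first; and product 3 is a hypothetical product with a price but no sales, added only to exercise the COALESCE branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prices (product_id INT, start_date TEXT, end_date TEXT, price INT);
    CREATE TABLE unitssold (product_id INT, purchase_date TEXT, units INT);
    INSERT INTO prices VALUES
        (1, '2019-02-17', '2019-02-28', 5), (1, '2019-03-01', '2019-03-22', 20),
        (2, '2019-02-01', '2019-02-20', 15), (2, '2019-02-21', '2019-03-31', 30),
        (3, '2019-02-21', '2019-03-31', 10);  -- hypothetical product with no sales
    INSERT INTO unitssold VALUES
        (1, '2019-02-25', 100), (1, '2019-03-01', 15),
        (2, '2019-02-10', 200), (2, '2019-03-22', 30);
""")

averages = dict(conn.execute("""
    SELECT p.product_id,
           -- 1.0 * forces real division in SQLite
           COALESCE(ROUND(1.0 * SUM(u.units * p.price) / SUM(u.units), 2), 0) AS average_price
    FROM prices AS p
    LEFT JOIN unitssold AS u
      ON p.product_id = u.product_id
     AND u.purchase_date BETWEEN p.start_date AND p.end_date  -- ISO dates compare as text
    GROUP BY p.product_id
""").fetchall())

for product_id in sorted(averages):
    print(product_id, f"{averages[product_id]:.2f}")
# 1 6.96
# 2 16.96
# 3 0.00
```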

Question 17

Project Employees I

Difficulty: Easy

#### Table: Project

| Column Name | Type |
|-------------|------|
| project_id | int |
| employee_id | int |

- `(project_id, employee_id)` is the primary key of this table.
- `employee_id` is a foreign key to the Employee table.
- Each row of this table indicates that the employee with
`employee_id` is working on the project with `project_id`.

#### Table: Employee

| Column Name | Type |
|------------------|---------|
| employee_id | int |
| name | varchar |
| experience_years | int |

- `employee_id` is the primary key of this table.
- Each row contains information about one employee.
- `experience_years` is an integer and guaranteed not to be NULL.

#### Problem Statement


Write an SQL query that reports the average experience years of
all the employees for each project, rounded to 2 digits.

Return the result table in any order.

#### Example

**Input:**
Project table:

| project_id | employee_id |
|------------|-------------|
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 4 |

Employee table:

| employee_id | name | experience_years |
|-------------|--------|------------------|
| 1 | Khaled | 3 |
| 2 | Ali | 2 |
| 3 | John | 1 |
| 4 | Doe | 2 |

**Output:**

| project_id | average_years |
|------------|---------------|
| 1 | 2.00 |
| 2 | 2.50 |

**Explanation:**
The average experience years for the first project is
(3 + 2 + 1) / 3 = 2.00, and for the second project, it is (3 + 2) / 2 = 2.50.

[LeetCode 1075]
Solution

SELECT p.project_id, ROUND(AVG(e.experience_years), 2) AS average_years
FROM project p
LEFT JOIN employee e
ON p.employee_id = e.employee_id
GROUP BY p.project_id;

This query calculates the average experience years of employees for each project. It
joins the project table with the employee table on employee_id, computes the
average experience for each project, and rounds the result to 2 decimal places. The
LEFT JOIN keeps every row of the project table even if its employee_id has no match
in the employee table (AVG then ignores the resulting NULL). Because employee_id is
a foreign key, every assignment should have a match, so an inner JOIN returns the
same result here.
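A minimal sketch, using SQLite via Python's sqlite3 module, that runs the query on the example data with both LEFT JOIN and inner JOIN to confirm they agree when the foreign key holds. The `average_years` helper is hypothetical; it only swaps in a fixed join keyword, never user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (project_id INT, employee_id INT, PRIMARY KEY (project_id, employee_id));
    CREATE TABLE employee (employee_id INT PRIMARY KEY, name TEXT, experience_years INT);
    INSERT INTO project VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 4);
    INSERT INTO employee VALUES (1, 'Khaled', 3), (2, 'Ali', 2), (3, 'John', 1), (4, 'Doe', 2);
""")


def average_years(join_type):
    """Run the solution with the given join keyword ('LEFT JOIN' or 'JOIN')."""
    return dict(conn.execute(f"""
        SELECT p.project_id, ROUND(AVG(e.experience_years), 2) AS average_years
        FROM project AS p
        {join_type} employee AS e ON p.employee_id = e.employee_id
        GROUP BY p.project_id
    """).fetchall())


left = average_years("LEFT JOIN")
inner = average_years("JOIN")
print(left == inner, left[1], left[2])  # True 2.0 2.5
```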

Question 18

1633. Percentage of Users Attended a Contest

Difficulty: Easy

#### Table: Users

| Column Name | Type |
|-------------|---------|
| user_id | int |
| user_name | varchar |

- `user_id` is the primary key (column with unique values) for this table.
- Each row of this table contains the name and the id of a user.

#### Table: Register

| Column Name | Type |
|-------------|---------|
| contest_id | int |
| user_id | int |

- `(contest_id, user_id)` is the primary key (combination of columns
with unique values) for this table.
- Each row of this table contains the id of a user and the contest
they registered into.

#### Problem Statement


Write a solution to find the percentage of the users registered in
each contest rounded to two decimals.

Return the result table ordered by percentage in descending order.
In case of a tie, order it by `contest_id` in ascending order.

#### Example

**Input:**
Users table:

| user_id | user_name |
|---------|-----------|
| 6 | Alice |
| 2 | Bob |
| 7 | Alex |

Register table:

| contest_id | user_id |
|------------|---------|
| 215 | 6 |
| 209 | 2 |
| 208 | 2 |
| 210 | 6 |
| 208 | 6 |
| 209 | 7 |
| 209 | 6 |
| 215 | 7 |
| 208 | 7 |
| 210 | 2 |
| 207 | 2 |
| 210 | 7 |

**Output:**

| contest_id | percentage |
|------------|------------|
| 208 | 100.0 |
| 209 | 100.0 |
| 210 | 100.0 |
| 215 | 66.67 |
| 207 | 33.33 |

**Explanation:**
All the users registered in contests 208, 209, and 210.
The percentage is 100% and we sort them in the answer table by
`contest_id` in ascending order. Alice and Alex registered in
contest 215 and the percentage is ((2/3) * 100) = 66.67%.
Bob registered in contest 207 and the percentage is ((1/3) * 100) = 33.33%.

[LeetCode 1633]

Solution

SELECT r.contest_id,
ROUND(COUNT(r.user_id) * 100 /
(SELECT COUNT(u.user_id) FROM users u), 2)
AS percentage
FROM register r
GROUP BY r.contest_id
ORDER BY percentage DESC, r.contest_id ASC;

This query calculates the percentage of users registered for each contest.

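COUNT(r.user_id): Counts the users registered for each contest. Because
(contest_id, user_id) is the primary key of register, each row is a different user,
so no DISTINCT is needed.

(SELECT COUNT(u.user_id) FROM users u): Retrieves the total number of users
from the users table.

COUNT(r.user_id) * 100 / (...): In MySQL, / always performs decimal division. In
databases where dividing two integers truncates (e.g., PostgreSQL and SQL Server),
write 100.0 instead of 100 to avoid an integer result.

ROUND(..., 2): Rounds the calculated percentage to two decimal places.

GROUP BY r.contest_id: Groups the results by contest_id to get the percentage
for each contest.

ORDER BY percentage DESC, r.contest_id ASC: Orders the results first by the
percentage in descending order and then by contest_id in ascending order to
handle ties.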

This ensures that contests with the highest user registration percentages appear
first and, in case of ties, are ordered by their contest_id .
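A minimal sketch of the solution on the example data, run in SQLite through Python's sqlite3 module. Because SQLite truncates integer division, the sketch writes 100.0 instead of 100; in MySQL either works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INT PRIMARY KEY, user_name TEXT);
    CREATE TABLE register (contest_id INT, user_id INT, PRIMARY KEY (contest_id, user_id));
    INSERT INTO users VALUES (6, 'Alice'), (2, 'Bob'), (7, 'Alex');
    INSERT INTO register VALUES
        (215, 6), (209, 2), (208, 2), (210, 6), (208, 6), (209, 7),
        (209, 6), (215, 7), (208, 7), (210, 2), (207, 2), (210, 7);
""")

rows = conn.execute("""
    SELECT r.contest_id,
           -- 100.0 (not 100) avoids integer division in SQLite
           ROUND(COUNT(r.user_id) * 100.0 / (SELECT COUNT(u.user_id) FROM users AS u), 2)
               AS percentage
    FROM register AS r
    GROUP BY r.contest_id
    ORDER BY percentage DESC, r.contest_id ASC
""").fetchall()

for contest_id, pct in rows:
    print(contest_id, f"{pct:.2f}")
# 208 100.00
# 209 100.00
# 210 100.00
# 215 66.67
# 207 33.33
```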

Question 19

1211. Queries Quality and Percentage

Difficulty: Easy

#### Table: Queries

| Column Name | Type |
|-------------|---------|
| query_name | varchar |
| result | varchar |
| position | int |
| rating | int |

- `query_name` is the name of the query.
- `result` is the result of the query.
- `position` indicates the position of the query (values from 1 to 500).
- `rating` represents the quality rating of the query (values from 1 to 5).
Queries with a rating less than 3 are considered poor.

#### Problem Statement


Write a solution to find each `query_name`, the `quality` and
`poor_query_percentage`.

- The `quality` is defined as the average of the ratio between query
rating and its position.
- The `poor_query_percentage` is the percentage of all queries with a
rating less than 3.

Both `quality` and `poor_query_percentage` should be rounded to 2
decimal places.

Return the result table in any order.

#### Example

**Input:**
Queries table:

| query_name | result | position | rating |
|------------|-------------------|----------|--------|
| Dog | Golden Retriever | 1 | 5 |
| Dog | German Shepherd | 2 | 5 |
| Dog | Mule | 200 | 1 |
| Cat | Shirazi | 5 | 2 |
| Cat | Siamese | 3 | 3 |
| Cat | Sphynx | 7 | 4 |

**Output:**

| query_name | quality | poor_query_percentage |
|------------|---------|-----------------------|
| Dog | 2.50 | 33.33 |
| Cat | 0.66 | 33.33 |

**Explanation:**
- Dog queries quality is ((5 / 1) + (5 / 2) + (1 / 200)) / 3 = 2.50.
- Dog queries poor_query_percentage is (1 / 3) * 100 = 33.33%.
- Cat queries quality equals ((2 / 5) + (3 / 3) + (4 / 7)) / 3 = 0.66.
- Cat queries poor_query_percentage is (1 / 3) * 100 = 33.33%.

[LeetCode 1211]

Solution

SELECT query_name,
ROUND(AVG(rating / position), 2) AS quality,
ROUND(SUM(CASE WHEN rating < 3 THEN 1 ELSE 0 END) * 100.0 /
COUNT(rating), 2) AS poor_query_percentage
FROM queries
WHERE query_name IS NOT NULL
GROUP BY query_name;

This query works as follows:


ROUND(AVG(rating / position), 2) AS quality: This calculates the average of the
ratio between rating and position for each query_name, rounded to 2 decimal
places. It provides an indication of the query's overall quality.

ROUND(SUM(CASE WHEN rating < 3 THEN 1 ELSE 0 END) * 100.0 / COUNT(rating), 2)
AS poor_query_percentage: This calculates the percentage of queries with a
rating less than 3 (considered poor) for each query_name. It first sums up all the
poor queries, then divides by the total number of queries to get the percentage,
rounded to 2 decimal places.

WHERE query_name IS NOT NULL: Ensures that only rows with a non-null
query_name are included in the calculations.

GROUP BY query_name: Groups the results by query_name so that aggregate
functions (like AVG and SUM) are applied within each query name group.

Alternative Solution

SELECT query_name,
ROUND(AVG(rating / position), 2) AS quality,
ROUND(SUM(IF(rating < 3, 1, 0)) * 100.0 / COUNT(rating), 2)
AS poor_query_percentage
FROM Queries
WHERE query_name IS NOT NULL
GROUP BY query_name;

This query works as follows:

ROUND(AVG(rating / position), 2) AS quality: Calculates the average of the
ratio between rating and position for each query_name, rounded to 2 decimal
places, providing the quality metric for each query.

ROUND(SUM(IF(rating < 3, 1, 0)) * 100.0 / COUNT(rating), 2) AS
poor_query_percentage: Computes the percentage of queries with a rating less
than 3. It uses the IF function to count poor queries, then calculates the
percentage, rounded to 2 decimal places.

WHERE query_name IS NOT NULL: Filters out rows where query_name is null,
ensuring that only valid query names are considered.

GROUP BY query_name: Groups the results by query_name to apply aggregate
functions (like AVG and SUM) within each group.

Comparison

Functionality:

Both queries achieve the same results: calculating the average quality and the
percentage of poor queries per query_name .

The first solution uses standard SQL with CASE WHEN for conditional logic, which
is more portable across different SQL dialects.

The second solution uses IF , which is specific to MySQL. For databases other than
MySQL (e.g., SQL Server, Oracle), IF might not be available or might require
different syntax.

Clarity and Readability:

Both queries are quite similar in terms of readability.

The CASE WHEN approach in the first query is more universally recognized and may
be easier to understand for those familiar with different SQL dialects.

The IF function in the second query is concise but might be less familiar to those
used to other SQL dialects.

Portability:

The first solution using CASE WHEN is more portable and compatible with various
SQL databases.

The second solution using IF is specific to MySQL and might need adjustments for
other SQL environments.

Conclusion:

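The first query is generally better due to its portability and adherence to standard
SQL practices. It is more likely to work across different SQL databases and might be
preferred in environments where SQL dialect compatibility is a concern. One caveat
applies to both queries: rating / position relies on MySQL's / operator returning a
decimal. In databases where dividing two integers truncates (e.g., PostgreSQL and
SQL Server), write rating * 1.0 / position instead.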
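The integer-division caveat is easy to demonstrate in SQLite, which, like PostgreSQL, truncates integer division. This minimal sketch (Python's sqlite3 module, example data) runs the CASE WHEN solution twice: once with `rating * 1.0 / position` and once with the original `rating / position`. The `quality_report` helper is hypothetical and only splices in one of two fixed expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE queries (query_name TEXT, result TEXT, position INT, rating INT);
    INSERT INTO queries VALUES
        ('Dog', 'Golden Retriever', 1, 5), ('Dog', 'German Shepherd', 2, 5),
        ('Dog', 'Mule', 200, 1), ('Cat', 'Shirazi', 5, 2),
        ('Cat', 'Siamese', 3, 3), ('Cat', 'Sphynx', 7, 4);
""")


def quality_report(ratio_expr):
    """Run the CASE WHEN solution with the given rating/position expression."""
    rows = conn.execute(f"""
        SELECT query_name,
               ROUND(AVG({ratio_expr}), 2) AS quality,
               ROUND(SUM(CASE WHEN rating < 3 THEN 1 ELSE 0 END) * 100.0 / COUNT(rating), 2)
                   AS poor_query_percentage
        FROM queries
        WHERE query_name IS NOT NULL
        GROUP BY query_name
    """).fetchall()
    return {name: (quality, poor) for name, quality, poor in rows}


real_division = quality_report("rating * 1.0 / position")
int_division = quality_report("rating / position")  # 5/2 becomes 2, 1/200 becomes 0

print(f"{real_division['Dog'][0]:.2f} {real_division['Cat'][0]:.2f}")  # 2.50 0.66
print(f"{int_division['Dog'][0]:.2f} {int_division['Cat'][0]:.2f}")    # 2.33 0.33
```

Only the real-division version reproduces the expected 2.50 and 0.66; the poor-query percentage is unaffected because it already uses 100.0.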

Question 20

1193. Monthly Transactions I

Difficulty: Medium

#### Table: Transactions

| Column Name | Type |
|-------------|---------|
| id | int |
| country | varchar |
| state | enum |
| amount | int |
| trans_date | date |

- `id` is the primary key of this table.
- The table contains information about incoming transactions.
- The `state` column is an enum with possible values:
`["approved", "declined"]`.

#### Problem Statement


Write an SQL query to find, for each month and country:
- The total number of transactions.
- The total amount of all transactions.
- The number of approved transactions.
- The total amount of approved transactions.

Return the result in any order.

#### Example

**Input:**

Transactions table:

| id | country | state | amount | trans_date |
|-----|---------|----------|--------|------------|
| 121 | US | approved | 1000 | 2018-12-18 |
| 122 | US | declined | 2000 | 2018-12-19 |
| 123 | US | approved | 2000 | 2019-01-01 |
| 124 | DE | approved | 2000 | 2019-01-07 |

**Output:**

| month | country | trans_count | approved_count | trans_total_amount | approved_total_amount |
|---------|---------|-------------|----------------|--------------------|-----------------------|
| 2018-12 | US | 2 | 1 | 3000 | 1000 |
| 2019-01 | US | 1 | 1 | 2000 | 2000 |
| 2019-01 | DE | 1 | 1 | 2000 | 2000 |

#### Explanation:
- In December 2018, the US had 2 transactions (1 approved), with a total
amount of 3000 (1000 from the approved transaction).
- In January 2019, the US had 1 transaction (approved) for 2000.
- In January 2019, Germany (DE) had 1 approved transaction for 2000.

[LeetCode 1193]
Solution

SELECT
DATE_FORMAT(trans_date, '%Y-%m') AS month,
country,
COUNT(amount) AS trans_count,
COUNT(CASE WHEN state = 'approved' THEN amount ELSE NULL END) AS approved_count,
SUM(amount) AS trans_total_amount,
SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
FROM Transactions
GROUP BY month, country;

DATE_FORMAT(trans_date, '%Y-%m'): Extracts the year and month from
trans_date in the format YYYY-MM. DATE_FORMAT is MySQL-specific; PostgreSQL
and Oracle use TO_CHAR(trans_date, 'YYYY-MM') instead.

COUNT(amount): Counts the total number of transactions per month and country.

COUNT(CASE WHEN state = 'approved' THEN amount ELSE NULL END): Counts the
number of approved transactions. The CASE expression returns amount when the
state is 'approved' and NULL otherwise; because COUNT ignores NULLs, only
approved transactions are counted.

SUM(amount) : Calculates the total transaction amount per month and country.

SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) : Sums the
transaction amounts where the state is 'approved' . If not approved, it adds 0.
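A minimal sketch of Solution 1 on the example data, run in SQLite through Python's sqlite3 module. SQLite has no DATE_FORMAT, so `strftime('%Y-%m', trans_date)` stands in for it; the ORDER BY clause is added only to make the printed order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Transactions (id INT PRIMARY KEY, country TEXT, state TEXT,
                               amount INT, trans_date TEXT);
    INSERT INTO Transactions VALUES
        (121, 'US', 'approved', 1000, '2018-12-18'),
        (122, 'US', 'declined', 2000, '2018-12-19'),
        (123, 'US', 'approved', 2000, '2019-01-01'),
        (124, 'DE', 'approved', 2000, '2019-01-07');
""")

rows = conn.execute("""
    SELECT strftime('%Y-%m', trans_date) AS month,  -- SQLite stand-in for DATE_FORMAT
           country,
           COUNT(amount) AS trans_count,
           COUNT(CASE WHEN state = 'approved' THEN amount ELSE NULL END) AS approved_count,
           SUM(amount) AS trans_total_amount,
           SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
    FROM Transactions
    GROUP BY month, country
    ORDER BY month, country DESC
""").fetchall()

for row in rows:
    print(row)
# ('2018-12', 'US', 2, 1, 3000, 1000)
# ('2019-01', 'US', 1, 1, 2000, 2000)
# ('2019-01', 'DE', 1, 1, 2000, 2000)
```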

Alternative Solution
SELECT
CONCAT(YEAR(trans_date), '-',
CASE WHEN LENGTH(MONTH(trans_date)) = 1
THEN CONCAT('0', MONTH(trans_date))
ELSE MONTH(trans_date)
END) AS month,
country,
COUNT(id) AS trans_count,
SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
SUM(amount) AS trans_total_amount,
SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END)
AS approved_total_amount
FROM Transactions
GROUP BY month, country;

CONCAT(YEAR(trans_date), '-', ...): Manually constructs the month by
concatenating the YEAR and MONTH values from trans_date. The CASE ensures
that months with a single digit (e.g., 1) are formatted as 01 by adding a leading
zero.

COUNT(id): Counts the total number of transactions per month and country.

SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END): Counts the number of
approved transactions by summing 1 for approved transactions and 0 otherwise.

SUM(amount): Calculates the total transaction amount.

SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END): Similar to Solution
1, sums the approved transaction amounts.

Comparison

Date Formatting:
Solution 1 uses DATE_FORMAT , which is simpler and cleaner for date formatting in
YYYY-MM .

Solution 2 manually constructs the month using CONCAT and CASE , which
provides more control over the date format but adds complexity.

Counting Approved Transactions:

Both solutions count approved transactions correctly, but Solution 1 uses
COUNT(CASE ...), which reads directly as a count, whereas Solution 2 uses
SUM(CASE ...) with 1 and 0, which expresses the count less directly.

Simplicity:

Solution 1 is more straightforward, using MySQL's built-in DATE_FORMAT and the
COUNT(CASE ...) idiom to keep the code clean and concise.

Solution 2 is slightly more complex due to the manual formatting of the month
and the use of SUM for counting approved transactions, which could be considered
less intuitive.

Performance:

Both solutions should perform similarly in most cases. However, Solution 1 might
have a slight edge due to its simplicity and direct usage of built-in functions like
DATE_FORMAT.

Solution 1 is better for this scenario due to its simplicity and clean handling of
date formatting and counting logic. Solution 2 provides more control over
formatting but adds unnecessary complexity for this specific case.

Question 21

Immediate Food Delivery II

Difficulty: Medium

#### Table: Delivery

| Column Name | Type |
|-----------------------------|---------|
| delivery_id | int |
| customer_id | int |
| order_date | date |
| customer_pref_delivery_date | date |

- `delivery_id` is the primary key (column of unique values) for this table.
- The table holds information about food delivery orders where customers
specify their preferred delivery date, which can be the same as the
`order_date` (immediate) or after (scheduled).

#### Problem Statement


If the customer's preferred delivery date is the same as the order date,
the order is called immediate; otherwise, it is called scheduled.

The first order of a customer is the one with the earliest `order_date`.
It is guaranteed that each customer has precisely one first order.

Write a solution to find the percentage of immediate orders in the first
orders of all customers, rounded to 2 decimal places.

#### Example

**Input:**
Delivery table:

| delivery_id | customer_id | order_date | customer_pref_delivery_date |
|-------------|-------------|------------|-----------------------------|
| 1 | 1 | 2019-08-01 | 2019-08-02 |
| 2 | 2 | 2019-08-02 | 2019-08-02 |
| 3 | 1 | 2019-08-11 | 2019-08-12 |
| 4 | 3 | 2019-08-24 | 2019-08-24 |
| 5 | 3 | 2019-08-21 | 2019-08-22 |
| 6 | 2 | 2019-08-11 | 2019-08-13 |
| 7 | 4 | 2019-08-09 | 2019-08-09 |

**Output:**
| immediate_percentage |
|----------------------|
| 50.00 |

**Explanation:**
- Customer 1's first order is delivery id 1 (scheduled).
- Customer 2's first order is delivery id 2 (immediate).
- Customer 3's first order is delivery id 5 (scheduled).
- Customer 4's first order is delivery id 7 (immediate).

Thus, 50% of customers have immediate first orders.

[Leetcode 1174]

Solution

SELECT
ROUND(SUM(CASE
WHEN customer_pref_delivery_date = order_date
THEN 1 ELSE 0 END) / COUNT(*) * 100, 2) AS immediate_percentage
FROM (
SELECT *
FROM delivery
WHERE (customer_id, order_date) IN (
SELECT customer_id, MIN(order_date)
FROM delivery
GROUP BY customer_id
)
) AS subquery;

The subquery finds each customer's first order using MIN(order_date).

The outer query calculates the percentage of immediate orders (where
customer_pref_delivery_date = order_date) by dividing the count of immediate
orders by the total count of first orders, multiplying by 100, and rounding the
result to two decimal places.
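A minimal sketch of the subquery solution on the example data, using SQLite via Python's sqlite3 module. It assumes SQLite 3.15 or later for the row-value `(customer_id, order_date) IN (...)` test, and multiplies by 100.0 before dividing because SQLite truncates integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE delivery (delivery_id INT PRIMARY KEY, customer_id INT,
                           order_date TEXT, customer_pref_delivery_date TEXT);
    INSERT INTO delivery VALUES
        (1, 1, '2019-08-01', '2019-08-02'), (2, 2, '2019-08-02', '2019-08-02'),
        (3, 1, '2019-08-11', '2019-08-12'), (4, 3, '2019-08-24', '2019-08-24'),
        (5, 3, '2019-08-21', '2019-08-22'), (6, 2, '2019-08-11', '2019-08-13'),
        (7, 4, '2019-08-09', '2019-08-09');
""")

(pct,) = conn.execute("""
    SELECT ROUND(SUM(CASE WHEN customer_pref_delivery_date = order_date THEN 1 ELSE 0 END)
                 * 100.0 / COUNT(*), 2) AS immediate_percentage
    FROM (SELECT *
          FROM delivery
          WHERE (customer_id, order_date) IN (SELECT customer_id, MIN(order_date)
                                              FROM delivery
                                              GROUP BY customer_id)) AS first_orders
""").fetchone()

print(pct)  # 50.0
```

The subquery keeps delivery ids 1, 2, 5 and 7; two of these four are immediate, giving 50.0.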
Alternative Solution

WITH cte AS (
SELECT *,
CASE WHEN order_date = customer_pref_delivery_date THEN 'immediate'
ELSE 'scheduled' END AS order_type,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date)
AS order_rnk
FROM delivery
)

SELECT ROUND(
(SUM(CASE WHEN order_rnk = 1 AND order_type = 'immediate'
THEN 1 ELSE 0 END) /
SUM(CASE WHEN order_rnk = 1 THEN 1 ELSE 0 END)) * 100, 2
) AS immediate_percentage
FROM cte;

A Common Table Expression (CTE) is a temporary result set defined within a SQL
query using the WITH clause. It improves readability by breaking complex queries
into smaller, logical parts, and can be referenced multiple times in the main query.

CTE Method: Uses a WITH clause and ROW_NUMBER() to rank orders for each
customer, simplifying filtering for first orders.

The expression ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY
order_date) AS order_rnk assigns a sequential rank to each order of a customer,
based on the order date.

Breakdown:

ROW_NUMBER(): Assigns a sequential number (starting from 1) to each row within
a partition.

OVER (PARTITION BY customer_id): Partitions the data by customer_id, so the row
numbering restarts for each customer. Each customer's orders are treated as a
separate group.

ORDER BY order_date: Assigns the row numbers in order_date order. The earliest
order gets rank 1, the next one gets 2, and so on.

AS order_rnk: Names the resulting column order_rnk to indicate the rank of the
order within each customer's sequence of orders.

In this context, order_rnk = 1 identifies the first order for each customer.

Subquery Method: Uses a subquery with MIN(order_date) to find first orders. It is
slightly more complex but achieves the same result.

The CTE solution is generally more readable, as it clearly separates the ranking
logic from the aggregation; which version runs faster depends on the database
engine and the available indexes.
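A minimal sketch, using Python's sqlite3 module, that prints the `order_rnk` values so you can see the numbering restart per customer, then runs the CTE solution. It assumes the SQLite library behind Python is version 3.25 or newer (the first release with window functions), and uses 100.0 so the division is not truncated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE delivery (delivery_id INT PRIMARY KEY, customer_id INT,
                           order_date TEXT, customer_pref_delivery_date TEXT);
    INSERT INTO delivery VALUES
        (1, 1, '2019-08-01', '2019-08-02'), (2, 2, '2019-08-02', '2019-08-02'),
        (3, 1, '2019-08-11', '2019-08-12'), (4, 3, '2019-08-24', '2019-08-24'),
        (5, 3, '2019-08-21', '2019-08-22'), (6, 2, '2019-08-11', '2019-08-13'),
        (7, 4, '2019-08-09', '2019-08-09');
""")

# Rank each customer's orders by date; numbering restarts for every customer_id.
ranked = conn.execute("""
    SELECT customer_id, delivery_id,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rnk
    FROM delivery
    ORDER BY customer_id, order_rnk
""").fetchall()
for row in ranked:
    print(row)  # (customer_id, delivery_id, order_rnk)

first_orders = [delivery_id for _, delivery_id, rnk in ranked if rnk == 1]
print(first_orders)  # [1, 2, 5, 7]

# The alternative solution's CTE, with 100.0 to avoid SQLite's integer division.
(pct,) = conn.execute("""
    WITH cte AS (
        SELECT *,
               CASE WHEN order_date = customer_pref_delivery_date
                    THEN 'immediate' ELSE 'scheduled' END AS order_type,
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rnk
        FROM delivery
    )
    SELECT ROUND(SUM(CASE WHEN order_rnk = 1 AND order_type = 'immediate'
                          THEN 1 ELSE 0 END) * 100.0
                 / SUM(CASE WHEN order_rnk = 1 THEN 1 ELSE 0 END), 2)
    FROM cte
""").fetchone()
print(pct)  # 50.0
```

Customer 3's orders show why ordering matters: delivery 5 (2019-08-21) gets rank 1 even though delivery 4 has the lower id.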

Question 22

Game Play Analysis IV

Difficulty: Medium

#### Table: Activity

| Column Name | Type |
|--------------|---------|
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |

- `(player_id, event_date)` is the primary key of this table.
- Each row represents the activity of a player who logged in and played a
number of games (possibly 0) on a specific day using a specific device.

#### Problem Statement


Write a solution to report the fraction of players that logged in again on
the day after the day they first logged in, rounded to two decimal places.

You need to count the number of players who logged in for at least two
consecutive days starting from their first login date, then divide that
number by the total number of players.

#### Example

**Input:**
Activity table:

| player_id | device_id | event_date | games_played |
|-----------|-----------|------------|--------------|
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-03-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |

**Output:**

| fraction |
|-----------|
| 0.33 |

**Explanation:**
- Player 1 logged in on both `2016-03-01` and `2016-03-02`, so they logged in
again after their first login.
- Player 2 logged in only on `2017-06-25` and did not return the next day.
- Player 3 logged in twice, but the dates were far apart (`2016-03-02` and
`2018-07-03`), so they did not log in the day after their first login.

Thus, only 1 out of 3 players logged in again the day after their first
login, so the fraction is 1/3, which rounds to 0.33.

[LeetCode 550]

Solution
WITH first_day AS (
-- Get the first login date for each player
SELECT player_id, MIN(event_date) AS event_date
FROM activity
GROUP BY player_id
),

second_day AS (
-- Check if a player logged in on the day after their first login
SELECT a.player_id
FROM activity a
JOIN first_day f ON a.player_id = f.player_id
WHERE a.event_date = DATE_ADD(f.event_date, INTERVAL 1 DAY)
)

-- Calculate the fraction of players who logged in the day after
-- their first login (multiplying by 1.0 avoids integer division
-- on databases that truncate integer quotients)
SELECT ROUND(
(SELECT COUNT(player_id) FROM second_day) * 1.0 /
(SELECT COUNT(player_id) FROM first_day),
2
) AS fraction;

first_day CTE:

This common table expression (CTE) selects the minimum event_date (first login)
for each player_id from the activity table.

second_day CTE:

This CTE finds players who logged in on the day after their first login. It joins the
activity table with the first_day CTE and keeps only rows whose event_date is exactly
one day after that player's first login date. DATE_ADD(..., INTERVAL 1 DAY) is MySQL
syntax; other databases spell date arithmetic differently.

Main Query:
The main query divides the number of players in second_day (those who logged in
again on the day after their first login) by the total number of players in
first_day, and rounds the result to two decimal places.
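The sketch below replays the example above with Python's sqlite3 module. SQLite has no DATE_ADD, so the second_day CTE uses SQLite's DATE(value, '+1 day') instead. That substitution belongs to this sketch only and is not part of the original MySQL solution.

```python
import sqlite3

# Example Activity rows from the problem statement above
rows = [
    (1, 2, "2016-03-01", 5),
    (1, 2, "2016-03-02", 6),
    (2, 3, "2017-06-25", 1),
    (3, 1, "2016-03-02", 0),
    (3, 4, "2018-07-03", 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (player_id INTEGER, device_id INTEGER, "
    "event_date TEXT, games_played INTEGER)"
)
conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", rows)

query = """
WITH first_day AS (
    SELECT player_id, MIN(event_date) AS event_date
    FROM activity
    GROUP BY player_id
),
second_day AS (
    SELECT a.player_id
    FROM activity a
    JOIN first_day f ON a.player_id = f.player_id
    -- SQLite spelling of DATE_ADD(f.event_date, INTERVAL 1 DAY)
    WHERE a.event_date = DATE(f.event_date, '+1 day')
)
SELECT ROUND(
    (SELECT COUNT(player_id) FROM second_day) * 1.0 /
    (SELECT COUNT(player_id) FROM first_day),
    2
) AS fraction
"""
fraction = conn.execute(query).fetchone()[0]
print(fraction)  # 0.33
```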

Question 23

Number of Unique Subjects Taught by Each Teacher

Difficulty: Easy

#### Table: Teacher

| Column Name | Type |
|-------------|------|
| teacher_id | int |
| subject_id | int |
| dept_id | int |

- `(subject_id, dept_id)` is the primary key (a combination of columns
with unique values) for this table.
- Each row indicates that the teacher with `teacher_id` teaches the
subject `subject_id` in the department `dept_id`.

#### Problem Statement


Write a solution to calculate the number of unique subjects each
teacher teaches in the university.

Return the result table in any order.

#### Example

**Input:**
Teacher table:

| teacher_id | subject_id | dept_id |
|------------|------------|---------|
| 1 | 2 | 3 |
| 1 | 2 | 4 |
| 1 | 3 | 3 |
| 2 | 1 | 1 |
| 2 | 2 | 1 |
| 2 | 3 | 1 |
| 2 | 4 | 1 |

**Output:**

| teacher_id | cnt |
|------------|-----|
| 1 | 2 |
| 2 | 4 |

**Explanation:**
- Teacher 1 teaches subject 2 in departments 3 and 4, and subject 3 in
department 3, totaling 2 unique subjects.
- Teacher 2 teaches subjects 1, 2, 3, and 4, all in department 1,
totaling 4 unique subjects.

[LeetCode 2356]

Solution

## Using a Subquery

SELECT subquery.teacher_id, COUNT(*) AS cnt
FROM (
SELECT DISTINCT teacher_id, subject_id
FROM teacher
) AS subquery
GROUP BY teacher_id;

This solution uses a subquery to first select the distinct pairs of teacher_id and subject_id.

It then groups the results by teacher_id to count the unique subjects taught by
each teacher.

This approach is straightforward, but nested subqueries can become harder to read
as queries grow more complex.
Alternate Solution

## Alternate Solution: Using a Common Table Expression (CTE)

WITH unique_subject AS (
SELECT DISTINCT teacher_id, subject_id
FROM teacher
)
SELECT teacher_id, COUNT(*) AS cnt
FROM unique_subject
GROUP BY teacher_id;

This solution uses a CTE to achieve the same result.

The CTE makes the logic clearer by separating the selection of distinct subjects from
the main query.

This improves readability and maintainability, making the flow of data easier to
understand.

Both solutions effectively yield the same result, counting the number of unique
subjects taught by each teacher. The choice between them often comes down to
personal or team preferences for readability and maintainability. CTEs are
generally preferred for complex queries, while subqueries might suffice for simpler
cases.
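As a quick check, the sketch below runs the CTE solution against the example data using Python's sqlite3 module. It also runs a more compact variant that counts with COUNT(DISTINCT subject_id) inside a single GROUP BY. That variant is standard SQL and should produce the same counts. The ORDER BY clauses are added only to make the output order deterministic for comparison.

```python
import sqlite3

# Example Teacher rows: (teacher_id, subject_id, dept_id)
rows = [(1, 2, 3), (1, 2, 4), (1, 3, 3), (2, 1, 1), (2, 2, 1), (2, 3, 1), (2, 4, 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (teacher_id INTEGER, subject_id INTEGER, dept_id INTEGER)")
conn.executemany("INSERT INTO teacher VALUES (?, ?, ?)", rows)

cte_query = """
WITH unique_subject AS (
    SELECT DISTINCT teacher_id, subject_id
    FROM teacher
)
SELECT teacher_id, COUNT(*) AS cnt
FROM unique_subject
GROUP BY teacher_id
ORDER BY teacher_id
"""

# Compact variant: COUNT(DISTINCT ...) removes duplicates inside the aggregate
distinct_query = """
SELECT teacher_id, COUNT(DISTINCT subject_id) AS cnt
FROM teacher
GROUP BY teacher_id
ORDER BY teacher_id
"""

cte_result = conn.execute(cte_query).fetchall()
distinct_result = conn.execute(distinct_query).fetchall()
print(cte_result)       # [(1, 2), (2, 4)]
print(distinct_result)  # [(1, 2), (2, 4)]
```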

Question 24

User Activity for the Past 30 Days I


Difficulty: Easy
#### Table: Activity

| Column Name | Type |
|---------------|---------|
| user_id | int |
| session_id | int |
| activity_date | date |
| activity_type | enum |

- This table may have duplicate rows.
- The `activity_type` column is an ENUM of type
('open_session', 'end_session', 'scroll_down', 'send_message').
- The table shows user activities for a social media website.
Each session belongs to exactly one user.

#### Problem Statement


Write a solution to find the daily active user count for a period of
30 days ending 2019-07-27 inclusively. A user was active on a day if
they made at least one activity on that day.

Return the result table in any order.

#### Example

**Input:**
Activity table:

| user_id | session_id | activity_date | activity_type |
|---------|------------|---------------|---------------|
| 1 | 1 | 2019-07-20 | open_session |
| 1 | 1 | 2019-07-20 | scroll_down |
| 1 | 1 | 2019-07-20 | end_session |
| 2 | 4 | 2019-07-20 | open_session |
| 2 | 4 | 2019-07-21 | send_message |
| 2 | 4 | 2019-07-21 | end_session |
| 3 | 2 | 2019-07-21 | open_session |
| 3 | 2 | 2019-07-21 | send_message |
| 3 | 2 | 2019-07-21 | end_session |
| 4 | 3 | 2019-06-25 | open_session |
| 4 | 3 | 2019-06-25 | end_session |

**Output:**
| day | active_users |
|------------|--------------|
| 2019-07-20 | 2 |
| 2019-07-21 | 2 |

**Explanation:**
Note that we do not care about days with zero active users.

[LeetCode 1141]
Solution

WITH cte AS (
SELECT DISTINCT user_id, activity_date
FROM Activity
WHERE activity_date <= '2019-07-27'
AND activity_date > DATE_ADD('2019-07-27', INTERVAL -30 DAY)
)

SELECT activity_date AS day, COUNT(*) AS active_users
FROM cte
GROUP BY activity_date;

Common Table Expression (CTE):

The WITH cte AS (...) clause defines a CTE named cte that selects distinct
(user_id, activity_date) pairs from the Activity table.

It keeps only the records in the 30-day window that ends on (and includes) July 27,
2019, that is, June 28 through July 27, so only relevant activity data is processed.

Final Selection:

The outer query selects activity_date as day and counts the active users with
COUNT(*). Because the CTE has already removed duplicate (user_id, activity_date)
pairs, each counted row is a distinct user.

The results are grouped by activity_date, giving the number of active users for
each day.

This query calculates the daily active user count over the 30-day period
ending on July 27, 2019, by first isolating the relevant records and then
aggregating them by date.
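The sketch below runs the query on the example data with Python's sqlite3 module. SQLite has no DATE_ADD, so this sketch swaps in DATE('2019-07-27', '-30 days'), which evaluates to '2019-06-27'. The trailing ORDER BY is added only for a deterministic result.

```python
import sqlite3

# Example Activity rows: (user_id, session_id, activity_date, activity_type)
rows = [
    (1, 1, "2019-07-20", "open_session"),
    (1, 1, "2019-07-20", "scroll_down"),
    (1, 1, "2019-07-20", "end_session"),
    (2, 4, "2019-07-20", "open_session"),
    (2, 4, "2019-07-21", "send_message"),
    (2, 4, "2019-07-21", "end_session"),
    (3, 2, "2019-07-21", "open_session"),
    (3, 2, "2019-07-21", "send_message"),
    (3, 2, "2019-07-21", "end_session"),
    (4, 3, "2019-06-25", "open_session"),
    (4, 3, "2019-06-25", "end_session"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity (user_id INTEGER, session_id INTEGER, "
    "activity_date TEXT, activity_type TEXT)"
)
conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", rows)

query = """
WITH cte AS (
    SELECT DISTINCT user_id, activity_date
    FROM Activity
    WHERE activity_date <= '2019-07-27'
      -- SQLite spelling of DATE_ADD('2019-07-27', INTERVAL -30 DAY)
      AND activity_date > DATE('2019-07-27', '-30 days')
)
SELECT activity_date AS day, COUNT(*) AS active_users
FROM cte
GROUP BY activity_date
ORDER BY day
"""
result = conn.execute(query).fetchall()
print(result)  # [('2019-07-20', 2), ('2019-07-21', 2)]
```

User 4's activity on 2019-06-25 falls outside the window, so it does not appear.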

Question 25

Product Sales Analysis III

Difficulty: Medium

#### Table: Sales

| Column Name | Type |
|-------------|-------|
| sale_id | int |
| product_id | int |
| year | int |
| quantity | int |
| price | int |

- `(sale_id, year)` is the primary key (combination of columns with unique
values) for this table.
- `product_id` is a foreign key (reference column) to the Product table.
- Each row of this table shows a sale on the product `product_id`
in a certain year.
- Note that the price is per unit.

#### Table: Product

| Column Name | Type |
|--------------|---------|
| product_id | int |
| product_name | varchar |

- `product_id` is the primary key (column with unique values) for this table.
- Each row of this table indicates the product name of each product.

#### Problem Statement


Write a solution to select the product id, year, quantity, and price for
the first year of every product sold.

Return the resulting table in any order.

#### Example
**Input:**
Sales table:

| sale_id | product_id | year | quantity | price |
|---------|------------|------|----------|-------|
| 1 | 100 | 2008 | 10 | 5000 |
| 2 | 100 | 2009 | 12 | 5000 |
| 7 | 200 | 2011 | 15 | 9000 |

Product table:

| product_id | product_name |
|------------|--------------|
| 100 | Nokia |
| 200 | Apple |
| 300 | Samsung |

**Output:**

| product_id | first_year | quantity | price |
|------------|------------|----------|-------|
| 100 | 2008 | 10 | 5000 |
| 200 | 2011 | 15 | 9000 |

**Explanation:**
The output shows the first year each product was sold along with
the corresponding quantity and price.

[LeetCode 1070]

Solution

WITH cte AS (
SELECT product_id, MIN(year) AS first_year
FROM sales
GROUP BY product_id
)
SELECT s.product_id, t.first_year, s.quantity, s.price
FROM sales s
JOIN cte t ON t.product_id = s.product_id AND t.first_year = s.year;

Common Table Expression (CTE) - cte:

This part of the query creates a temporary result set called cte.

It selects the product_id and the minimum year (i.e., the first year of sales) for
each product.

The GROUP BY product_id clause ensures that we get one row per product.

Main Query:

The main query returns product_id, first_year, quantity, and price. It takes
first_year from the CTE and the other columns from the sales table.

It joins the sales table (s) with the cte CTE (t) on product_id, matching
first_year to the year of each sales record.

This ensures that only the sales records from each product's first year are
returned. If a product has several sales rows in its first year, all of them
appear in the output.

The query identifies the first year each product was sold and retrieves the
associated quantity and price for that year.
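Here is a minimal check of the solution against the example Sales rows, using Python's sqlite3 module. The Product table is left out because the query never reads it. The ORDER BY is added only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER, product_id INTEGER, year INTEGER, "
    "quantity INTEGER, price INTEGER)"
)
# Example Sales rows: (sale_id, product_id, year, quantity, price)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [(1, 100, 2008, 10, 5000), (2, 100, 2009, 12, 5000), (7, 200, 2011, 15, 9000)],
)

query = """
WITH cte AS (
    SELECT product_id, MIN(year) AS first_year
    FROM sales
    GROUP BY product_id
)
SELECT s.product_id, t.first_year, s.quantity, s.price
FROM sales s
JOIN cte t ON t.product_id = s.product_id AND t.first_year = s.year
ORDER BY s.product_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(100, 2008, 10, 5000), (200, 2011, 15, 9000)]
```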

Question 26

Classes More Than 5 Students

Difficulty: Easy

#### Table: Courses


| Column Name | Type |
|-------------|---------|
| student | varchar |
| class | varchar |

- `(student, class)` is the primary key (combination of columns with
unique values) for this table.
- Each row of this table indicates the name of a student and the class
in which they are enrolled.

#### Problem Statement


Write a solution to find all the classes that have at least five students.

Return the result table in any order.

#### Example

**Input:**
Courses table:

| student | class |
|---------|----------|
| A | Math |
| B | English |
| C | Math |
| D | Biology |
| E | Math |
| F | Computer |
| G | Math |
| H | Math |
| I | Math |

**Output:**

| class |
|---------|
| Math |

**Explanation:**
- Math has 6 students, so we include it.
- English, Biology, and Computer each have 1 student, so we do not
include them.

[LeetCode 596]

Solution
SELECT class
FROM courses
GROUP BY class
HAVING COUNT(student) >= 5;

GROUP BY: This groups the rows by class, so that the number of students in each
class can be counted.

HAVING COUNT(student) >= 5: The HAVING clause keeps only the classes that have at
least 5 students. Because (student, class) is the primary key, each student is
counted at most once per class.

This query returns all the classes that have 5 or more students enrolled.
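A minimal sketch that runs the query on the example Courses rows with Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (student TEXT, class TEXT)")
# Example Courses rows: (student, class)
conn.executemany("INSERT INTO courses VALUES (?, ?)", [
    ("A", "Math"), ("B", "English"), ("C", "Math"), ("D", "Biology"),
    ("E", "Math"), ("F", "Computer"), ("G", "Math"), ("H", "Math"), ("I", "Math"),
])

query = """
SELECT class
FROM courses
GROUP BY class
HAVING COUNT(student) >= 5
"""
result = [row[0] for row in conn.execute(query).fetchall()]
print(result)  # ['Math']
```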

Question 27

Find Followers Count

Difficulty: Easy

#### Table: Followers

| Column Name | Type |
|-------------|------|
| user_id | int |
| follower_id | int |

- `(user_id, follower_id)` is the primary key (combination of columns
with unique values) for this table.
- This table contains the IDs of a user and a follower in a social media
app where the follower follows the user.

#### Problem Statement


Write a solution that will, for each user, return the number of followers.

Return the result table ordered by `user_id` in ascending order.

#### Example
**Input:**
Followers table:

| user_id | follower_id |
|---------|-------------|
| 0 | 1 |
| 1 | 0 |
| 2 | 0 |
| 2 | 1 |

**Output:**

| user_id | followers_count |
|---------|-----------------|
| 0 | 1 |
| 1 | 1 |
| 2 | 2 |

**Explanation:**
- The followers of user `0` are `{1}`.
- The followers of user `1` are `{0}`.
- The followers of user `2` are `{0,1}`.

[LeetCode 1729]

Solution

SELECT user_id, COUNT(follower_id) AS followers_count
FROM followers
GROUP BY user_id
ORDER BY user_id;

SELECT user_id, COUNT(follower_id): This line selects each user_id from the
followers table and counts the follower_id values associated with it. The count
is returned as followers_count.

FROM followers: This specifies the followers table, which contains the user and
follower relationships.

GROUP BY user_id: This groups the rows by user_id, so the followers are counted
for each user individually.

ORDER BY user_id: This sorts the final output by user_id in ascending order, as
the problem requires.

The query returns one row per user, ordered by user_id, with two columns:

user_id: The ID of the user.

followers_count: The number of followers the user has. Because (user_id,
follower_id) is the primary key, each follower is counted only once.

A single GROUP BY over the table is enough to produce the follower count for
every user.
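A quick sketch that checks the query against the example Followers rows with Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE followers (user_id INTEGER, follower_id INTEGER)")
# Example Followers rows: (user_id, follower_id)
conn.executemany("INSERT INTO followers VALUES (?, ?)", [(0, 1), (1, 0), (2, 0), (2, 1)])

query = """
SELECT user_id, COUNT(follower_id) AS followers_count
FROM followers
GROUP BY user_id
ORDER BY user_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(0, 1), (1, 1), (2, 2)]
```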

Question 28

Biggest Single Number

Difficulty: Easy

#### Table: MyNumbers

| Column Name | Type |
|-------------|------|
| num | int |

- This table may contain duplicates (there is no primary key for this
table in SQL).
- Each row of this table contains an integer.

#### Problem Statement

A single number is a number that appeared only once in the `MyNumbers` table.
Find the largest single number. If there is no single number, report null.

#### Example

**Input:**
MyNumbers table:

| num |
|-----|
| 8 |
| 8 |
| 3 |
| 3 |
| 1 |
| 4 |
| 5 |
| 6 |

**Output:**

| num |
|-----|
| 6 |

**Explanation:**
The single numbers are 1, 4, 5, and 6. Since 6 is the largest
single number, we return it.

**Example 2:**

**Input:**
MyNumbers table:

| num |
|-----|
| 8 |
| 8 |
| 7 |
| 7 |
| 3 |
| 3 |
| 3 |

**Output:**

| num |
|------|
| null |

**Explanation:**
There are no single numbers in the input table, so we return null.

[LeetCode 619]

Solution

WITH UniqueNumbers AS (
SELECT num, COUNT(*) AS occurrences
FROM MyNumbers
GROUP BY num
HAVING COUNT(*) = 1
)

SELECT MAX(num) AS num
FROM UniqueNumbers;

Common Table Expression (CTE) - UniqueNumbers:

This CTE groups the MyNumbers table by num and counts how many times each
number appears.

The HAVING COUNT(*) = 1 clause keeps only numbers that appear exactly once,
identifying them as "single numbers." Repeating the aggregate in HAVING, rather
than referring to a column alias, is the portable ANSI form; MySQL also accepts
the alias.

Final Selection:

The main query selects the largest number in the UniqueNumbers CTE using
MAX(num). If there are no single numbers, the CTE is empty and MAX returns NULL,
so the query still produces one row containing NULL.
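The sketch below runs the query on both examples with Python's sqlite3 module. The helper name biggest_single_number is hypothetical and exists only for this illustration. SQL NULL comes back to Python as None.

```python
import sqlite3

QUERY = """
WITH UniqueNumbers AS (
    SELECT num, COUNT(*) AS occurrences
    FROM MyNumbers
    GROUP BY num
    HAVING COUNT(*) = 1
)
SELECT MAX(num) AS num
FROM UniqueNumbers
"""

def biggest_single_number(nums):
    """Load nums into an in-memory MyNumbers table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyNumbers (num INTEGER)")
    conn.executemany("INSERT INTO MyNumbers VALUES (?)", [(n,) for n in nums])
    return conn.execute(QUERY).fetchone()[0]

example1 = biggest_single_number([8, 8, 3, 3, 1, 4, 5, 6])
example2 = biggest_single_number([8, 8, 7, 7, 3, 3, 3])
print(example1)  # 6
print(example2)  # None (SQL NULL)
```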

Question 29

Customers Who Bought All Products

Difficulty: Medium

#### Table: Customer

| Column Name | Type |
|-------------|---------|
| customer_id | int |
| product_key | int |

- This table may contain duplicate rows.
- `customer_id` is not NULL.
- `product_key` is a foreign key (reference column) to the `Product` table.

#### Table: Product

| Column Name | Type |
|-------------|------|
| product_key | int |

- `product_key` is the primary key (column with unique values) for this table.

#### Problem Statement


Write a solution to report the `customer_id` from the `Customer`
table who bought all the products listed in the `Product` table.

Return the result table in any order.

#### Example

**Input:**

Customer table:

| customer_id | product_key |
|-------------|-------------|
| 1 | 5 |
| 2 | 6 |
| 3 | 5 |
| 3 | 6 |
| 1 | 6 |

Product table:

| product_key |
|-------------|
| 5 |
| 6 |

**Output:**

| customer_id |
|-------------|
| 1 |
| 3 |

**Explanation:**
- The products in the `Product` table are 5 and 6.
- Customers with IDs 1 and 3 have bought both products,
so they are included in the result.

[LeetCode 1045]

Solution

WITH no_duplicate AS (
SELECT DISTINCT * FROM customer
)

SELECT nd.customer_id
FROM no_duplicate nd
GROUP BY nd.customer_id
HAVING COUNT(*) = (SELECT COUNT(*) AS product_count FROM product);

The CTE named no_duplicate selects distinct rows from the customer table to
eliminate duplicate records. This ensures that each (customer, product) pair is
counted only once, even if a customer bought the same product multiple times.

The main query selects customer_id from the no_duplicate CTE and groups the
results by customer_id, aggregating all purchases for each customer.

The HAVING clause checks that the number of distinct products each customer has
purchased equals the total number of products in the Product table.

That total comes from a subquery that counts all products in the Product table
(SELECT COUNT(*) AS product_count FROM product). The comparison works because
product_key is a foreign key to Product, so a customer cannot have bought a
product that is missing from that table.
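To show why removing duplicates matters, this sketch loads the example rows into SQLite through Python's sqlite3 module. It also adds one hypothetical duplicate row, (2, 6), that is not part of the problem's example. Without the DISTINCT step, customer 2's two identical rows would be counted as two products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER NOT NULL, product_key INTEGER)")
conn.execute("CREATE TABLE product (product_key INTEGER PRIMARY KEY)")
# Example rows from the problem, plus one hypothetical duplicate (2, 6)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, 5), (2, 6), (3, 5), (3, 6), (1, 6), (2, 6)],
)
conn.executemany("INSERT INTO product VALUES (?)", [(5,), (6,)])

dedup_query = """
WITH no_duplicate AS (
    SELECT DISTINCT * FROM customer
)
SELECT nd.customer_id
FROM no_duplicate nd
GROUP BY nd.customer_id
HAVING COUNT(*) = (SELECT COUNT(*) AS product_count FROM product)
ORDER BY nd.customer_id
"""

# Same check without removing duplicates first (shown only for contrast)
naive_query = """
SELECT customer_id
FROM customer
GROUP BY customer_id
HAVING COUNT(*) = (SELECT COUNT(*) FROM product)
ORDER BY customer_id
"""

dedup_result = [row[0] for row in conn.execute(dedup_query).fetchall()]
naive_result = [row[0] for row in conn.execute(naive_query).fetchall()]
print(dedup_result)  # [1, 3]
print(naive_result)  # [1, 2, 3]  (customer 2 wrongly included)
```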

Question 30

The Number of Employees Which Report to Each Employee

Difficulty: Easy

#### Table: Employees

| Column Name | Type |
|--------------|---------|
| employee_id | int |
| name | varchar |
| reports_to | int |
| age | int |

- `employee_id` is the column with unique values for this table.
- This table contains information about the employees and the ID of
the manager they report to. Some employees do not report to anyone
(`reports_to` is null).

#### Problem Statement


For this problem, we will consider a manager an employee who has at
least 1 other employee reporting to them.

Write a solution to report the IDs and the names of all managers,
the number of employees who report directly to them, and the average
age of the reports rounded to the nearest integer.

Return the result table ordered by `employee_id`.


#### Example

**Input:**
Employees table:

| employee_id | name | reports_to | age |
|-------------|---------|------------|-----|
| 9 | Hercy | null | 43 |
| 6 | Alice | 9 | 41 |
| 4 | Bob | 9 | 36 |
| 2 | Winston | null | 37 |

**Output:**

| employee_id | name | reports_count | average_age |
|-------------|-------|---------------|-------------|
| 9 | Hercy | 2 | 39 |

**Explanation:**
Hercy has 2 people reporting directly to him, Alice and Bob.
Their average age is (41+36)/2 = 38.5, which is 39 after rounding
to the nearest integer.

**Example 2:**

**Input:**
Employees table:

| employee_id | name | reports_to | age |
|-------------|---------|------------|-----|
| 1 | Michael | null | 45 |
| 2 | Alice | 1 | 38 |
| 3 | Bob | 1 | 42 |
| 4 | Charlie | 2 | 34 |
| 5 | David | 2 | 40 |
| 6 | Eve | 3 | 37 |
| 7 | Frank | null | 50 |
| 8 | Grace | null | 48 |

**Output:**

| employee_id | name | reports_count | average_age |
|-------------|---------|---------------|-------------|
| 1 | Michael | 2 | 40 |
| 2 | Alice | 2 | 37 |
| 3 | Bob | 1 | 37 |

[LeetCode 1731]
Solution

SELECT
e.employee_id,
e.name,
COUNT(sub.employee_id) AS reports_count,
ROUND(AVG(sub.age)) AS average_age
FROM
employees e
JOIN
employees sub
ON e.employee_id = sub.reports_to
GROUP BY
e.employee_id,
e.name
ORDER BY
e.employee_id;

The query reads the employees table twice: once for the managers and once for
their direct reports.

e.employee_id: The ID of the manager.

e.name: The name of the manager.

COUNT(sub.employee_id) AS reports_count: This counts the direct reports of each
manager by counting the employee_id values from the sub alias, which represents
the subordinate employees.

ROUND(AVG(sub.age)) AS average_age: This calculates the average age of the
subordinates, rounded to the nearest whole number.

JOIN Operation:
The query uses a self-join on the employees table. The sub alias refers to
subordinates, and the join condition (ON e.employee_id = sub.reports_to) links
each employee to their direct reports. Because this is an inner join, employees
with no reports are left out, which matches the problem's definition of a manager.

Grouping:

The results are grouped by e.employee_id and e.name, so aggregate functions
like COUNT and AVG are calculated separately for each manager.

Ordering:

The final result set is ordered by e.employee_id in ascending order.
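The sketch below runs the self-join on both examples through Python's sqlite3 module. The helper name managers is hypothetical. SQLite's ROUND returns a floating-point value, so the averages come back as 39.0 rather than 39. In Example 1, 38.5 rounds to 39.0.

```python
import sqlite3

QUERY = """
SELECT e.employee_id,
       e.name,
       COUNT(sub.employee_id) AS reports_count,
       ROUND(AVG(sub.age)) AS average_age
FROM employees e
JOIN employees sub ON e.employee_id = sub.reports_to
GROUP BY e.employee_id, e.name
ORDER BY e.employee_id
"""

def managers(rows):
    """Load rows into an in-memory employees table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees "
        "(employee_id INTEGER, name TEXT, reports_to INTEGER, age INTEGER)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()

# None stands in for SQL NULL (no manager)
example1 = managers([
    (9, "Hercy", None, 43), (6, "Alice", 9, 41),
    (4, "Bob", 9, 36), (2, "Winston", None, 37),
])
example2 = managers([
    (1, "Michael", None, 45), (2, "Alice", 1, 38), (3, "Bob", 1, 42),
    (4, "Charlie", 2, 34), (5, "David", 2, 40), (6, "Eve", 3, 37),
    (7, "Frank", None, 50), (8, "Grace", None, 48),
])
print(example1)  # [(9, 'Hercy', 2, 39.0)]
print(example2)  # [(1, 'Michael', 2, 40.0), (2, 'Alice', 2, 37.0), (3, 'Bob', 1, 37.0)]
```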

Question 31

Primary Department for Each Employee

Difficulty: Easy

#### Table: Employee

| Column Name | Type |
|---------------|---------|
| employee_id | int |
| department_id | int |
| primary_flag | varchar |

- `(employee_id, department_id)` is the primary key for this table.
- `employee_id` is the ID of the employee.
- `department_id` is the ID of the department to which the employee belongs.
- `primary_flag` is an ENUM of type ('Y', 'N'). If the flag is 'Y',
the department is the primary department for the employee.
If the flag is 'N', the department is not the primary.

#### Problem Statement


Employees can belong to multiple departments. When the employee joins
other departments, they need to decide which department is their primary
department. Note that when an employee belongs to only one department,
their primary column is 'N'.

Write a solution to report all the employees with their primary department.
For employees who belong to one department, report their only department.

Return the result table in any order.

#### Example

**Input:**
Employee table:

| employee_id | department_id | primary_flag |
|-------------|---------------|--------------|
| 1 | 1 | N |
| 2 | 1 | Y |
| 2 | 2 | N |
| 3 | 3 | N |
| 4 | 2 | N |
| 4 | 3 | Y |
| 4 | 4 | N |

**Output:**

| employee_id | department_id |
|-------------|---------------|
| 1 | 1 |
| 2 | 1 |
| 3 | 3 |
| 4 | 3 |

**Explanation:**
- The primary department for employee 1 is 1 (since they belong
to only one department).
- The primary department for employee 2 is 1.
- The primary department for employee 3 is 3 (as they belong to
only one department).
- The primary department for employee 4 is 3.

[LeetCode 1789]

Solution
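-- Using UNION

SELECT employee_id, department_id
FROM employee
WHERE primary_flag = 'Y'

UNION

-- MAX() returns the only department_id in each one-row group; selecting the
-- bare column is rejected under strict GROUP BY rules (ANSI SQL, MySQL's
-- ONLY_FULL_GROUP_BY mode)
SELECT employee_id, MAX(department_id) AS department_id
FROM employee
GROUP BY employee_id
HAVING COUNT(employee_id) = 1;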

The first part of the query selects employees who have marked a department as
their primary department (primary_flag = 'Y').

The second part uses GROUP BY to find employees who belong to only one
department (i.e., where COUNT(employee_id) = 1).

UNION combines the results of these two queries and removes duplicate rows.

This approach is simple and easy to understand.

Because UNION removes duplicates, an employee who satisfied both conditions
(a primary department and only one department) would still appear only once.

Performance could suffer because the table is scanned twice, once for each
SELECT.

Alternative Solution
-- Using OR Condition

SELECT
employee_id,
department_id
FROM
Employee
WHERE
primary_flag = 'Y'
OR
employee_id IN (
SELECT
employee_id
FROM
Employee
GROUP BY
employee_id
HAVING
COUNT(*) = 1
);

The first condition (primary_flag = 'Y') selects employees who have a primary
department.

The second condition uses a subquery with GROUP BY to find employees who belong
to only one department. The subquery returns the employee_id values of employees
who belong to exactly one department, and the outer query checks whether each
row's employee_id is in this result.

The OR operator ensures that employees who either have a primary department or
belong to only one department are included in the result.

This version avoids combining two result sets and the duplicate elimination that
UNION performs, although the IN subquery still reads the table. Whether it is
faster in practice depends on the query optimizer.

Because this query filters rows of Employee rather than merging two result sets,
and (employee_id, department_id) is the primary key, each row can appear at most
once in the output, so no extra de-duplication is needed.
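To compare the two approaches, this sketch runs both on the example Employee rows with Python's sqlite3 module. The second SELECT of the UNION query uses MAX(department_id) so that it also runs under strict GROUP BY rules. The results are compared as sets because the problem allows any row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_id INTEGER, department_id INTEGER, primary_flag TEXT)"
)
# Example Employee rows: (employee_id, department_id, primary_flag)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (1, 1, "N"), (2, 1, "Y"), (2, 2, "N"), (3, 3, "N"),
    (4, 2, "N"), (4, 3, "Y"), (4, 4, "N"),
])

union_query = """
SELECT employee_id, department_id
FROM employee
WHERE primary_flag = 'Y'
UNION
SELECT employee_id, MAX(department_id) AS department_id
FROM employee
GROUP BY employee_id
HAVING COUNT(employee_id) = 1
"""

or_query = """
SELECT employee_id, department_id
FROM employee
WHERE primary_flag = 'Y'
   OR employee_id IN (
       SELECT employee_id
       FROM employee
       GROUP BY employee_id
       HAVING COUNT(*) = 1
   )
"""

union_result = set(conn.execute(union_query).fetchall())
or_result = set(conn.execute(or_query).fetchall())
print(sorted(union_result))  # [(1, 1), (2, 1), (3, 3), (4, 3)]
print(sorted(or_result))     # [(1, 1), (2, 1), (3, 3), (4, 3)]
```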

Question 32

Triangle Judgement

Difficulty: Easy

#### Table: Triangle

| Column Name | Type |
|-------------|------|
| x | int |
| y | int |
| z | int |

- `(x, y, z)` is the primary key for this table.
- Each row contains the lengths of three line segments.

#### Problem Statement


Write a query to determine whether three line segments
(`x`, `y`, and `z`) can form a triangle.
According to the triangle inequality theorem,
the following conditions must be met:
- `x + y > z`
- `x + z > y`
- `y + z > x`

Return the result table in any order.

#### Example

**Input**:
Triangle table:

| x | y | z |
|----|----|----|
| 13 | 15 | 30 |
| 10 | 20 | 15 |

**Output**:

| x | y | z | triangle |
|----|----|----|----------|
| 13 | 15 | 30 | No |
| 10 | 20 | 15 | Yes |

**Explanation**:
- For the first row, 13 + 15 is not greater than 30,
so the three sides cannot form a triangle.
- For the second row, all conditions of the triangle
inequality theorem are satisfied, so the three sides form a triangle.

[LeetCode 610]

Solution

SELECT x, y, z,
CASE
WHEN x + y > z AND x + z > y AND y + z > x
THEN 'Yes'
ELSE 'No'
END AS triangle
FROM triangle;

SELECT x, y, z: This part retrieves the values of x, y, and z from the table
named triangle.

CASE ... END: This conditional expression creates a new column called
triangle that checks whether the values of x, y, and z can form a valid
triangle.

WHEN x + y > z AND x + z > y AND y + z > x: This condition checks whether the
three values satisfy the triangle inequality theorem, which states that for any
three sides of a triangle, the sum of any two sides must be greater than the third
side. If all three conditions are true, the values can form a triangle.

THEN 'Yes': If the condition is met (i.e., the three sides form a triangle), the
query returns 'Yes' for that row.

ELSE 'No': If the condition is not met, the query returns 'No', indicating that
the values do not form a triangle.
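A minimal sketch that applies the CASE expression to the example rows with Python's sqlite3 module. The rows are compared as a set because the result may be returned in any order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triangle (x INTEGER, y INTEGER, z INTEGER)")
# Example Triangle rows: (x, y, z)
conn.executemany("INSERT INTO triangle VALUES (?, ?, ?)", [(13, 15, 30), (10, 20, 15)])

query = """
SELECT x, y, z,
       CASE
           -- Triangle inequality: every pair of sides must exceed the third
           WHEN x + y > z AND x + z > y AND y + z > x THEN 'Yes'
           ELSE 'No'
       END AS triangle
FROM triangle
"""
result = set(conn.execute(query).fetchall())
print(sorted(result))  # [(10, 20, 15, 'Yes'), (13, 15, 30, 'No')]
```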

Question 33

Consecutive Numbers

Difficulty: Medium

#### Table: Logs

| Column Name | Type |
|-------------|---------|
| id | int |
| num | varchar |

- `id` is the primary key for this table and is an autoincrementing
column starting from 1.

#### Problem Statement


Write a query to find all numbers that appear at least three times
consecutively in the `Logs` table.

Return the result table in any order.

#### Example

**Input**:
Logs table:

| id | num |
|----|-----|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 1 |
| 6 | 2 |
| 7 | 2 |

**Output**:

| ConsecutiveNums |
|-----------------|
| 1 |

**Explanation**:
- Number `1` appears consecutively three times from `id` 1 to `id` 3.
- Number `2` only appears consecutively twice, so it does not meet
the requirement.

[LeetCode 180]

Solution

SELECT DISTINCT l1.num AS ConsecutiveNums
FROM logs AS l1
JOIN logs AS l2
ON l1.id = l2.id - 1 AND l1.num = l2.num
JOIN logs AS l3
ON l2.id = l3.id - 1 AND l2.num = l3.num;

This query uses self-joins to find consecutive rows in the logs table where the num
value is the same across three rows.

l1, l2, and l3 represent three different rows from the same table, logs.

The ON conditions enforce:

l1.id = l2.id - 1: l1 must be immediately followed by l2.

l2.id = l3.id - 1: l2 must be immediately followed by l3.

l1.num = l2.num and l2.num = l3.num: the num value must be the same in all
three consecutive rows.

DISTINCT ensures that each qualifying num value is returned only once as
ConsecutiveNums.

This approach joins the table with itself twice. With an index on id (the primary
key) each lookup is typically cheap, but on large tables the extra joins can still
make it more expensive than a single pass over the rows.

Alternative Solution

WITH cte AS (
SELECT num, id,
LAG(num, 1) OVER (ORDER BY id) AS prev_num,
LEAD(num, 1) OVER (ORDER BY id) AS next_num,
LAG(id, 1) OVER (ORDER BY id) AS prev_id,
LEAD(id, 1) OVER (ORDER BY id) AS next_id
FROM logs
)
SELECT DISTINCT num AS ConsecutiveNums
FROM cte
WHERE num = prev_num AND prev_num = next_num
AND id = prev_id + 1 AND id = next_id - 1;

This query uses the window functions LAG and LEAD to look at the previous and
next rows within the same result set, without needing self-joins.

LAG(num, 1): Returns the value of num in the previous row (ordered by id).

LEAD(num, 1): Returns the value of num in the next row.

LAG and LEAD are applied to id in the same way, to check that the rows are
consecutive.

In the outer query:

num = prev_num AND prev_num = next_num: Ensures that num is the same across
three consecutive rows.

id = prev_id + 1 AND id = next_id - 1: Ensures the id values are consecutive.

DISTINCT ensures unique results for ConsecutiveNums.

Comparison:

Efficiency: The second query reads the table once and inspects adjacent rows
through window functions, whereas the first joins the table with itself twice.
On larger datasets this often makes the window-function version faster, although
the actual difference depends on the database engine and indexes.

Simplicity: LAG and LEAD express "previous row" and "next row" directly, which
makes the query more concise and easier to maintain and understand.

Flexibility: The window-function approach extends more easily to other
window-based comparisons, for example comparing rows further apart by changing
the offset argument of LAG or LEAD.

In summary, both queries solve the same problem; the second uses window
functions, which are usually the more efficient and more modern choice for
comparing adjacent rows.
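To confirm that the two approaches agree, the sketch below runs both on the example Logs rows with Python's sqlite3 module. The num column is stored as TEXT to mirror the varchar type in the schema. The LAG/LEAD query assumes SQLite 3.25 or newer, the first release with window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, num TEXT)")
# Example Logs rows: (id, num)
conn.executemany("INSERT INTO logs VALUES (?, ?)", [
    (1, "1"), (2, "1"), (3, "1"), (4, "2"), (5, "1"), (6, "2"), (7, "2"),
])

self_join_query = """
SELECT DISTINCT l1.num AS ConsecutiveNums
FROM logs AS l1
JOIN logs AS l2 ON l1.id = l2.id - 1 AND l1.num = l2.num
JOIN logs AS l3 ON l2.id = l3.id - 1 AND l2.num = l3.num
"""

window_query = """
WITH cte AS (
    SELECT num, id,
           LAG(num, 1)  OVER (ORDER BY id) AS prev_num,
           LEAD(num, 1) OVER (ORDER BY id) AS next_num,
           LAG(id, 1)   OVER (ORDER BY id) AS prev_id,
           LEAD(id, 1)  OVER (ORDER BY id) AS next_id
    FROM logs
)
SELECT DISTINCT num AS ConsecutiveNums
FROM cte
WHERE num = prev_num AND prev_num = next_num
  AND id = prev_id + 1 AND id = next_id - 1
"""

self_join_result = [row[0] for row in conn.execute(self_join_query).fetchall()]
window_result = [row[0] for row in conn.execute(window_query).fetchall()]
print(self_join_result)  # ['1']
print(window_result)     # ['1']
```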

Question 34

Product Price at a Given Date

Difficulty: Medium

#### Table: Products

| Column Name | Type |
|---------------|---------|
| product_id | int |
| new_price | int |
| change_date | date |

- `(product_id, change_date)` is the primary key of this table.
- Each row represents a price change for a product on a specific date.

#### Problem Statement


Write a solution to find the prices of all products on `2019-08-16`.
If a product has no price change on or before this date,
assume its price is `10`.

Return the result table in any order.

#### Example

**Input**:
Products table:

| product_id | new_price | change_date |
|------------|-----------|-------------|
| 1 | 20 | 2019-08-14 |
| 2 | 50 | 2019-08-14 |
| 1 | 30 | 2019-08-15 |
| 1 | 35 | 2019-08-16 |
| 2 | 65 | 2019-08-17 |
| 3 | 20 | 2019-08-18 |

**Output**:

| product_id | price |
|------------|-------|
| 2 | 50 |
| 1 | 35 |
| 3 | 10 |

**Explanation**:
- Product `1` has its price changed to `35` on `2019-08-16`.
- Product `2` has its price changed to `50` on `2019-08-14`,
and no changes on `2019-08-16`, so its price remains `50`.
- Product `3` has no price changes before `2019-08-16`,
so its price is the default of `10`.

[LeetCode 1164]

Solution

WITH LatestChangeDate AS (
SELECT product_id, MAX(change_date) AS last_change_date
FROM products
WHERE change_date <= '2019-08-16'
GROUP BY product_id
)

SELECT p.product_id, p.new_price AS price
FROM products p
JOIN LatestChangeDate lcd
ON p.change_date = lcd.last_change_date AND p.product_id = lcd.product_id

UNION

SELECT product_id, 10 AS price
FROM products
WHERE product_id NOT IN (SELECT product_id FROM LatestChangeDate);

The Common Table Expression (CTE) is named `LatestChangeDate`, which describes its
purpose.
It selects each `product_id` and its most recent `change_date` (`MAX(change_date)`)
on or before `'2019-08-16'`.

The `GROUP BY product_id` ensures that we get one last change date per product
up to that date.

The first `SELECT` after the CTE retrieves the price that was set on each product's
latest change date.

The `JOIN` is performed between the original `products` table (`p`) and the
`LatestChangeDate` CTE (`lcd`).

The condition `ON p.change_date = lcd.last_change_date AND p.product_id = lcd.product_id`
ensures that only the row holding each product's most recent price is selected.

We return the `product_id` and the corresponding `new_price` as `price`.

The `UNION` combines these rows with the rows of a second `SELECT`.

The second `SELECT` picks products that do not appear in the `LatestChangeDate` CTE,
meaning they had no price change on or before `'2019-08-16'`.

For these products, we assign the default price of `10`. Because `products` can hold
several rows for the same product, this branch can produce duplicate `(product_id, 10)`
rows; `UNION` (unlike `UNION ALL`) removes them.

Summary:

The query first finds the latest price for products that had a price change on or
before `'2019-08-16'`, using the `LatestChangeDate` CTE.
Products without a price change on or before that date are assigned the default
price of `10`.

The `UNION` merges both results, giving the final output of products with either
their latest price or the default price.
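One way to check the query against the example is to run it in an in-memory database. The sketch below assumes Python's standard-library `sqlite3` module as a harness; the table definition (with dates stored as ISO-formatted `TEXT`, which SQLite compares correctly as strings) and the variable names are illustrative choices, not part of the original solution.

```python
import sqlite3

# In-memory database holding the example Products table (dates stored as ISO text).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_id INTEGER, new_price INTEGER, change_date TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        (1, 20, "2019-08-14"), (2, 50, "2019-08-14"), (1, 30, "2019-08-15"),
        (1, 35, "2019-08-16"), (2, 65, "2019-08-17"), (3, 20, "2019-08-18"),
    ],
)

# The solution query, unchanged apart from indentation.
query = """
WITH LatestChangeDate AS (
    SELECT product_id, MAX(change_date) AS last_change_date
    FROM products
    WHERE change_date <= '2019-08-16'
    GROUP BY product_id
)
SELECT p.product_id, p.new_price AS price
FROM products p
JOIN LatestChangeDate lcd
  ON p.change_date = lcd.last_change_date AND p.product_id = lcd.product_id
UNION
SELECT product_id, 10 AS price
FROM products
WHERE product_id NOT IN (SELECT product_id FROM LatestChangeDate)
"""

# The query has no ORDER BY, so sort in Python to get a stable order.
prices = sorted(conn.execute(query).fetchall())
print(prices)  # [(1, 35), (2, 50), (3, 10)]
```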

Question 35

Last Person to Fit in the Bus

Difficulty: Medium

#### Table: Queue

| Column Name | Type |
|-------------|---------|
| person_id | int |
| person_name | varchar |
| weight | int |
| turn | int |

- `person_id` column contains unique values.


- `turn` determines the boarding order, where `turn = 1` is the
first person and `turn = n` is the last.
- The bus has a **weight limit of 1000 kg**.

#### Problem Statement


Write a solution to find the `person_name` of the last person that
can fit on the bus **without exceeding** the weight limit. Only one
person can board the bus per turn.

The result table can be returned in any order.

#### Example

**Input**:
Queue table:

| person_id | person_name | weight | turn |
|-----------|-------------|--------|------|
| 5 | Alice | 250 | 1 |
| 4 | Bob | 175 | 5 |
| 3 | Alex | 350 | 2 |
| 6 | John Cena | 400 | 3 |
| 1 | Winston | 500 | 6 |
| 2 | Marie | 200 | 4 |

**Output**:

| person_name |
|-------------|
| John Cena |

**Explanation**:
The table ordered by `turn` looks like this:

| Turn | person_id | person_name | weight | Total Weight | Note |
|------|-----------|-------------|--------|--------------|------|
| 1 | 5 | Alice | 250 | 250 | |
| 2 | 3 | Alex | 350 | 600 | |
| 3 | 6 | John Cena | 400 | 1000 | last person to board |
| 4 | 2 | Marie | 200 | 1200 | cannot board |
| 5 | 4 | Bob | 175 | ___ | |
| 6 | 1 | Winston | 500 | ___ | |

- John Cena is the last person to fit on the bus before the total
weight exceeds 1000 kg.

[LeetCode 1204]

Solution

WITH CumulativeWeightQueue AS (
SELECT
person_name,
weight,
SUM(weight) OVER (ORDER BY turn) AS cumulative_weight
FROM queue
)

SELECT person_name
FROM CumulativeWeightQueue
WHERE cumulative_weight <= 1000
ORDER BY cumulative_weight DESC
LIMIT 1;

The Common Table Expression (CTE) is named `CumulativeWeightQueue`,
indicating that it tracks the cumulative weight of the people in the queue.

It selects `person_name` and `weight` from the `queue` table.

`SUM(weight) OVER (ORDER BY turn)` is a window function that calculates a
running total (cumulative sum) of the `weight` values in `turn` order.

This gives, for each person, the total weight on the bus once they and everyone
before them have boarded.

The main query selects `person_name` from the `CumulativeWeightQueue` CTE
where the cumulative weight is less than or equal to `1000`.

`ORDER BY cumulative_weight DESC` sorts the remaining rows by cumulative weight in
descending order, so the largest cumulative weight that is still within the limit
comes first.

`LIMIT 1` returns only that person. Assuming every weight is positive, the running
total grows with every turn, so the person with the largest total within the limit
is also the last person who can board.

Summary:

The query calculates a running total of weights for people in the queue.

It then selects the person whose cumulative weight is the largest but still less than or
equal to `1000`, effectively finding the last person who can board the bus
without exceeding the weight limit.
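A minimal sketch, again using Python's built-in `sqlite3` module as an assumed harness (window functions need SQLite 3.25 or later), loads the example `Queue` rows, prints the running totals from the explanation table, and runs the solution query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (person_id INTEGER, person_name TEXT, weight INTEGER, turn INTEGER)"
)
conn.executemany(
    "INSERT INTO queue VALUES (?, ?, ?, ?)",
    [
        (5, "Alice", 250, 1), (4, "Bob", 175, 5), (3, "Alex", 350, 2),
        (6, "John Cena", 400, 3), (1, "Winston", 500, 6), (2, "Marie", 200, 4),
    ],
)

# Running total of weight in boarding order, matching the explanation table.
running_totals = conn.execute(
    "SELECT turn, SUM(weight) OVER (ORDER BY turn) FROM queue ORDER BY turn"
).fetchall()

query = """
WITH CumulativeWeightQueue AS (
    SELECT person_name, weight,
           SUM(weight) OVER (ORDER BY turn) AS cumulative_weight
    FROM queue
)
SELECT person_name
FROM CumulativeWeightQueue
WHERE cumulative_weight <= 1000
ORDER BY cumulative_weight DESC
LIMIT 1
"""
last_to_board = conn.execute(query).fetchone()[0]

print(running_totals)  # [(1, 250), (2, 600), (3, 1000), (4, 1200), (5, 1375), (6, 1875)]
print(last_to_board)   # John Cena
```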

Question 36
Count Salary Categories

Difficulty: Medium

#### Table: Accounts

| Column Name | Type |
|-------------|------|
| account_id | int |
| income | int |

- `account_id` is the primary key for this table.


- Each row contains information about the monthly income for one bank account.

#### Problem Statement


Write a solution to calculate the number of bank accounts for each salary
category. The salary categories are:

- **"Low Salary"**: All the salaries strictly less than $20,000.


- **"Average Salary"**: All the salaries in the inclusive range [$20,000,
$50,000].
- **"High Salary"**: All the salaries strictly greater than $50,000.

The result must contain all three categories. If there are no accounts in
a category, return `0`.

#### Example

**Input**:
Accounts table:

| account_id | income |
|------------|--------|
| 3 | 108939 |
| 2 | 12747 |
| 8 | 87709 |
| 6 | 91796 |

**Output**:

| category | accounts_count |
|----------------|----------------|
| Low Salary | 1 |
| Average Salary | 0 |
| High Salary | 3 |

**Explanation**:
- **Low Salary**: Account 2 has an income strictly less than $20,000.
- **Average Salary**: No accounts have income between $20,000 and
$50,000 inclusive.
- **High Salary**: Accounts 3, 6, and 8 have incomes strictly greater
than $50,000.

[LeetCode 1907]

Solution

WITH accounts_cat_count AS
(
SELECT
SUM(IF(income < 20000, 1, 0)) AS "Low_Salary",
SUM(IF(income >= 20000 AND income <= 50000, 1, 0))
AS "Average_Salary",
SUM(IF(income > 50000, 1, 0)) AS "High_Salary"
FROM Accounts
)
SELECT 'High Salary' AS category, High_Salary AS accounts_count
FROM accounts_cat_count
UNION ALL
SELECT 'Low Salary', Low_Salary
FROM accounts_cat_count
UNION ALL
SELECT 'Average Salary', Average_Salary
FROM accounts_cat_count;

Aggregation in a CTE:

Aggregation Logic: Uses `SUM(IF(...))` to calculate the number of accounts in
each salary category in a single pass over the table. `IF()` is a MySQL function;
in standard SQL the same count is written `SUM(CASE WHEN ... THEN 1 ELSE 0 END)`.

Single Aggregation: This solution computes all categories in one go, aggregating
the counts for "Low_Salary", "Average_Salary", and "High_Salary" in a single
CTE.
Efficiency: This approach is potentially faster for large datasets because it scans
the `Accounts` table once and performs all the counting in a single query, rather
than scanning the table once per category.

Structure: The one-row `accounts_cat_count` CTE holds the three counts as columns,
so the final query reads it three times with `UNION ALL` to turn those columns into
one row per category.
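Because `IF()` is MySQL-specific, the sketch below is an assumed portable rewrite of the aggregation solution using `CASE`, run through Python's `sqlite3` module. It also wraps each `SUM` in `COALESCE`, since `SUM` over zero rows returns `NULL` rather than `0`; the lower-case column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id INTEGER, income INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(3, 108939), (2, 12747), (8, 87709), (6, 91796)],
)

# Conditional aggregation with CASE instead of MySQL's IF(),
# and COALESCE so that an empty table still reports 0 per category.
query = """
WITH accounts_cat_count AS (
    SELECT
        COALESCE(SUM(CASE WHEN income < 20000 THEN 1 ELSE 0 END), 0) AS low_salary,
        COALESCE(SUM(CASE WHEN income BETWEEN 20000 AND 50000 THEN 1 ELSE 0 END), 0)
            AS average_salary,
        COALESCE(SUM(CASE WHEN income > 50000 THEN 1 ELSE 0 END), 0) AS high_salary
    FROM accounts
)
SELECT 'Low Salary' AS category, low_salary AS accounts_count FROM accounts_cat_count
UNION ALL
SELECT 'Average Salary', average_salary FROM accounts_cat_count
UNION ALL
SELECT 'High Salary', high_salary FROM accounts_cat_count
"""

counts = dict(conn.execute(query).fetchall())
print(counts)  # {'Low Salary': 1, 'Average Salary': 0, 'High Salary': 3}

# With an empty table every category still reports 0.
conn.execute("DELETE FROM accounts")
empty_counts = dict(conn.execute(query).fetchall())
print(empty_counts)  # {'Low Salary': 0, 'Average Salary': 0, 'High Salary': 0}
```

The `COUNT(*)`-based alternative does not need `COALESCE`: an aggregate query without `GROUP BY` always returns one row, and `COUNT(*)` over no rows is `0`.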

Alternative Solution

WITH cte AS (
SELECT 'Low Salary' AS category, COUNT(*) AS accounts_count
FROM Accounts
WHERE income < 20000

UNION

SELECT 'Average Salary' AS category, COUNT(*) AS accounts_count
FROM Accounts
WHERE income >= 20000 AND income <= 50000

UNION

SELECT 'High Salary' AS category, COUNT(*) AS accounts_count
FROM Accounts
WHERE income > 50000
)

SELECT * FROM cte ORDER BY accounts_count DESC;

Multiple Scans: Each SELECT statement with WHERE clauses runs independently,
meaning the table is scanned multiple times (three times in this case). This can
lead to slower performance on large datasets compared to the aggregation-based
solution.
Flexibility: This method is easier to understand and maintain, as each category
has its own filtering logic explicitly written in the WHERE clause. If more categories
or complex conditions were needed, they could be added easily.

Efficiency

Solution 1 (Aggregation) is likely more efficient for large datasets because it scans
the Accounts table only once.

Solution 2 (Filtering) scans the table separately for each salary category, resulting
in three scans of the Accounts table. This could be less performant for large
datasets.

Readability and Maintainability

Solution 1 (Aggregation):

More compact: The logic for all salary categories is contained within a single
query.

More complex: The use of `SUM(IF(...))` may be harder to understand for
beginners, especially if more categories are added.

Difficult to extend: If you need to add more salary ranges or categories, the logic
would become increasingly complicated, as you’d need to add more conditional
aggregations.

Solution 2 (Filtering):
More readable: The logic is straightforward, with each SELECT statement clearly
defining the condition for each salary range. This makes the query easier to read
and extend if needed.

Easier to extend: Adding a new category or modifying an existing one is simpler.
You just add another `SELECT` with a `WHERE` clause, without modifying existing
logic.

Ordering

Solution 1 (Aggregation) does not include an `ORDER BY` clause. In practice `UNION ALL`
often returns the rows in the order the branches are written, but without an
`ORDER BY` the order is not guaranteed. If you want to order the results by
`accounts_count`, you would need to add an `ORDER BY` clause to the final query.

Solution 2 (Filtering) explicitly orders the results by `accounts_count` in
descending order, which is useful if you want to see the categories with the most
accounts first.

Use of UNION vs. UNION ALL

Solution 1 uses UNION ALL which keeps all results and avoids the overhead of
deduplication that occurs with UNION .

Solution 2 uses `UNION`, which removes duplicate rows. In this case the deduplication
never removes anything, because each branch produces a different `category` value, so
`UNION` returns the same rows as `UNION ALL`, only with the extra cost of the
duplicate check.

Conclusion:
Performance: Solution 1 is generally more efficient for large datasets due to fewer
table scans. It is ideal for scenarios where performance is critical.

Readability and Extensibility: Solution 2 is easier to understand and extend,
making it more suitable for simpler cases or when you need to frequently update or
add conditions.

Use Case:

If you’re working with a large dataset and want better performance, Solution 1 is
likely the better choice.

If the dataset is small or if you prioritize simplicity and maintainability, Solution
2 may be more appropriate.

Question 37

Employees Whose Manager Left the Company

Difficulty: Easy

#### Table: Employees

| Column Name | Type |
|--------------|----------|
| employee_id | int |
| name | varchar |
| manager_id | int |
| salary | int |

- `employee_id` is the primary key for this table.


- This table contains information about the employees, their salary,
and the ID of their manager. Some employees do not have a manager
(i.e., `manager_id` is null).

#### Problem Statement


Find the IDs of the employees whose salary is strictly less than
$30,000 and whose manager left the company. When a manager leaves the
company, their information is deleted from the Employees table,
but the reports still have their `manager_id` set to the manager that left.

Return the result table ordered by `employee_id`.

#### Example

**Input**:
Employees table:

| employee_id | name | manager_id | salary |
|-------------|-----------|------------|--------|
| 3 | Mila | 9 | 60301 |
| 12 | Antonella | null | 31000 |
| 13 | Emery | null | 67084 |
| 1 | Kalel | 11 | 21241 |
| 9 | Mikaela | null | 50937 |
| 11 | Joziah | 6 | 28485 |

**Output**:

| employee_id |
|-------------|
| 11 |

**Explanation**:
- The employees with a salary less than $30,000 are 1 (Kalel) and 11 (Joziah).
- Kalel's manager is employee 11, who is still in the company (Joziah).
- Joziah's manager is employee 6, who left the company: there is no row for
employee 6 because it was deleted.

[LeetCode 1978]

Solution

SELECT employee_id
FROM employees
WHERE manager_id NOT IN (SELECT employee_id FROM employees)
AND salary < 30000
ORDER BY employee_id;
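
Main Query: Selects `employee_id` from the `employees` table.

WHERE Clause: `manager_id NOT IN (SELECT employee_id FROM employees)`:

Keeps only employees whose `manager_id` does not match any `employee_id` still in the
table, that is, employees whose manager has left. Rows where `manager_id` is `NULL`
are excluded, because `NULL NOT IN (...)` evaluates to UNKNOWN rather than TRUE, so
employees without a manager are not reported. (The subquery itself never returns
`NULL`, since `employee_id` is the primary key; if it could, `NOT IN` would filter out
every row.)

`salary < 30000`: Further restricts the result to employees with a salary
below 30,000.

ORDER BY Clause: Orders the output by `employee_id` in ascending order.

The query retrieves employees whose `manager_id` refers to a manager who is no longer
in the table and who earn less than 30,000.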
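The sketch below, which assumes Python's `sqlite3` module and an illustrative table definition, runs the solution on the example data next to an equivalent `NOT EXISTS` formulation, and shows directly that `NULL NOT IN (...)` evaluates to `NULL` (UNKNOWN) rather than true:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT, "
    "manager_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (3, "Mila", 9, 60301), (12, "Antonella", None, 31000),
        (13, "Emery", None, 67084), (1, "Kalel", 11, 21241),
        (9, "Mikaela", None, 50937), (11, "Joziah", 6, 28485),
    ],
)

not_in_query = """
SELECT employee_id
FROM employees
WHERE manager_id NOT IN (SELECT employee_id FROM employees)
  AND salary < 30000
ORDER BY employee_id
"""

# Equivalent NOT EXISTS form; the explicit IS NOT NULL reproduces NOT IN's
# behaviour of skipping employees who have no manager at all.
not_exists_query = """
SELECT e.employee_id
FROM employees e
WHERE e.manager_id IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM employees m WHERE m.employee_id = e.manager_id)
  AND e.salary < 30000
ORDER BY e.employee_id
"""

via_not_in = conn.execute(not_in_query).fetchall()
via_not_exists = conn.execute(not_exists_query).fetchall()
# NULL NOT IN (...) is UNKNOWN, which Python receives as None.
null_check = conn.execute("SELECT NULL NOT IN (1, 2)").fetchone()[0]

print(via_not_in)      # [(11,)]
print(via_not_exists)  # [(11,)]
print(null_check)      # None
```

`NOT EXISTS` is often preferred in practice because it is not affected by `NULL`s returned from the subquery; the explicit `IS NOT NULL` keeps its result identical to the `NOT IN` version.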

Question 38

Exchange Seats

Difficulty: Medium

#### Table: Seat

| Column Name | Type |
|-------------|---------|
| id | int |
| student | varchar |

- `id` is the primary key (unique value) column for this table.
- Each row of this table indicates the name and the ID of a student.
- The ID sequence always starts from 1 and increments continuously.

#### Problem Statement


Write a solution to swap the seat `id` of every two consecutive students.
If the number of students is odd, the `id` of the last student is not swapped.

Return the result table ordered by `id` in ascending order.

#### Example
**Input**:
Seat table:

| id | student |
|----|---------|
| 1 | Abbot |
| 2 | Doris |
| 3 | Emerson |
| 4 | Green |
| 5 | Jeames |

**Output**:

| id | student |
|----|---------|
| 1 | Doris |
| 2 | Abbot |
| 3 | Green |
| 4 | Emerson |
| 5 | Jeames |

**Explanation**:
- Note that if the number of students is odd, there is no need to change
the last student's seat. In this case, the last student is Jeames,
and his seat remains unchanged.

[LeetCode 626]

Solution

WITH max_id AS (
SELECT MAX(id) AS mid
FROM seat
)

SELECT
CASE
WHEN id % 2 = 1 AND id != max_id.mid THEN id + 1
WHEN id % 2 = 0 THEN id - 1
ELSE id
END AS id,
student
FROM seat, max_id
ORDER BY id;

Common Table Expression (CTE):

`WITH max_id AS (SELECT MAX(id) AS mid FROM seat)`: This part calculates the
maximum `id` value in the `seat` table and stores it in a CTE called `max_id`, under
the column alias `mid`.

Main Query:

SELECT Statement: The main query selects and transforms the `id` and `student`
columns from the `seat` table.

CASE Statement:

`WHEN id % 2 = 1 AND id != max_id.mid THEN id + 1`: If the `id` is odd and not
equal to the maximum `id`, it increments the `id` by 1.

`WHEN id % 2 = 0 THEN id - 1`: If the `id` is even, it decrements the `id` by 1.

`ELSE id`: If neither condition is met (i.e., the `id` is odd and equal to the maximum
`id`, which only happens when the number of students is odd), it keeps the `id`
unchanged.

FROM Clause: Uses a Cartesian (cross) join between the `seat` table and the one-row
`max_id` CTE, making the maximum `id` value available to every row in the `CASE` logic.

ORDER BY Clause: Orders the output by the transformed `id` (the `id` alias defined in
the `SELECT` list).

The query swaps each odd `id` with the even `id` that follows it, except for the last
`id` when it is odd, which remains unchanged.
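A quick check on the example data, using Python's `sqlite3` module as an assumed harness. SQLite, like MySQL, resolves the bare `id` in `ORDER BY` to the `CASE` alias in the `SELECT` list, so the rows come back in the new seat order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (id INTEGER PRIMARY KEY, student TEXT)")
conn.executemany(
    "INSERT INTO seat VALUES (?, ?)",
    [(1, "Abbot"), (2, "Doris"), (3, "Emerson"), (4, "Green"), (5, "Jeames")],
)

query = """
WITH max_id AS (
    SELECT MAX(id) AS mid
    FROM seat
)
SELECT
    CASE
        WHEN id % 2 = 1 AND id != max_id.mid THEN id + 1
        WHEN id % 2 = 0 THEN id - 1
        ELSE id
    END AS id,
    student
FROM seat, max_id
ORDER BY id
"""

swapped = conn.execute(query).fetchall()
print(swapped)
# [(1, 'Doris'), (2, 'Abbot'), (3, 'Green'), (4, 'Emerson'), (5, 'Jeames')]
```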

Question 39

Movie Rating

Difficulty: Medium

#### Table: Movies

| Column Name | Type |
|---------------|---------|
| movie_id | int |
| title | varchar |

- `movie_id` is the primary key (column with unique values) for this table.
- `title` is the name of the movie.

#### Table: Users

| Column Name | Type |
|---------------|---------|
| user_id | int |
| name | varchar |

- `user_id` is the primary key (column with unique values) for this table.
- The column `name` has unique values.

#### Table: MovieRating

| Column Name | Type |
|---------------|---------|
| movie_id | int |
| user_id | int |
| rating | int |
| created_at | date |

- (`movie_id`, `user_id`) is the primary key (column with unique values)
for this table.
- This table contains the rating of a movie by a user in their review.
- `created_at` is the user's review date.

#### Problem Statement


Write a solution to:
1. Find the name of the user who has rated the greatest number of movies.
In case of a tie, return the lexicographically smaller user name.
2. Find the movie name with the highest average rating in February 2020.
In case of a tie, return the lexicographically smaller movie name.

Return the result table in any order.

#### Example

**Input**:
Movies table:

| movie_id | title |
|----------|----------|
| 1 | Avengers |
| 2 | Frozen 2 |
| 3 | Joker |

Users table:

| user_id | name |
|---------|--------|
| 1 | Daniel |
| 2 | Monica |
| 3 | Maria |
| 4 | James |

MovieRating table:

| movie_id | user_id | rating | created_at |
|----------|---------|--------|------------|
| 1 | 1 | 3 | 2020-01-12 |
| 1 | 2 | 4 | 2020-02-11 |
| 1 | 3 | 2 | 2020-02-12 |
| 1 | 4 | 1 | 2020-01-01 |
| 2 | 1 | 5 | 2020-02-17 |
| 2 | 2 | 2 | 2020-02-01 |
| 2 | 3 | 2 | 2020-03-01 |
| 3 | 1 | 3 | 2020-02-22 |
| 3 | 2 | 4 | 2020-02-25 |

**Output**:

| results |
|-----------|
| Daniel |
| Frozen 2 |

**Explanation**:
- Daniel and Monica have rated 3 movies ("Avengers", "Frozen 2" and "Joker"),
but Daniel is lexicographically smaller.
- "Frozen 2" and "Joker" have an average rating of 3.5 in February,
but "Frozen 2" is lexicographically smaller.

[LeetCode 1341]

Solution

WITH results1 AS (
SELECT u.name AS results
FROM MovieRating AS r
JOIN Users AS u ON u.user_id = r.user_id
GROUP BY u.user_id
ORDER BY COUNT(*) DESC, u.name ASC
LIMIT 1
),

febmov AS (
SELECT movie_id, AVG(rating) AS av
FROM MovieRating
WHERE created_at >= '2020-02-01' AND created_at <= '2020-02-29'
GROUP BY movie_id
),

results2 AS (
SELECT m.title AS results
FROM Movies AS m
JOIN febmov fm ON m.movie_id = fm.movie_id
ORDER BY fm.av DESC, m.title ASC
LIMIT 1
)

SELECT * FROM results1
UNION ALL
SELECT * FROM results2;

results1 CTE:
This common table expression finds the user who has rated the most movies.

It joins the `MovieRating` table with the `Users` table and groups the ratings by user.

The groups are ordered by the number of ratings in descending order and then by the
user's name in ascending order, which implements the tie-break on the
lexicographically smaller name. The top user is selected using `LIMIT 1`.

febmov CTE:

This CTE calculates the average rating for each movie in February 2020.

The `WHERE` clause keeps ratings within the specified date range (February 1st to
February 29th, since 2020 is a leap year).

Results are grouped by `movie_id`, with the average rating (`av`) calculated for each
movie.

results2 CTE:

This part selects the title of the movie with the highest average rating in February
2020.

It joins the `Movies` table with the `febmov` CTE and orders by the average rating in
descending order and then by movie title in ascending order. The top-rated movie is
selected using `LIMIT 1`.

Final Query:
The final `SELECT` statement combines the results from `results1` and `results2`
using `UNION ALL`, so the user with the most ratings and the highest-rated movie of
February 2020 appear together in a single `results` column. `UNION ALL` (rather than
`UNION`) matters here: if the user's name happened to equal the movie title, `UNION`
would collapse the two rows into one.

Summary:

The query retrieves the top user based on the number of movie ratings and the
highest-rated movie in February 2020, presenting them in a combined result set.
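A minimal sketch that runs the solution on the example tables through Python's `sqlite3` module (an assumed harness; dates are stored as ISO `TEXT`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (movie_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE MovieRating (movie_id INTEGER, user_id INTEGER, rating INTEGER, created_at TEXT);
""")
conn.executemany("INSERT INTO Movies VALUES (?, ?)",
                 [(1, "Avengers"), (2, "Frozen 2"), (3, "Joker")])
conn.executemany("INSERT INTO Users VALUES (?, ?)",
                 [(1, "Daniel"), (2, "Monica"), (3, "Maria"), (4, "James")])
conn.executemany("INSERT INTO MovieRating VALUES (?, ?, ?, ?)", [
    (1, 1, 3, "2020-01-12"), (1, 2, 4, "2020-02-11"), (1, 3, 2, "2020-02-12"),
    (1, 4, 1, "2020-01-01"), (2, 1, 5, "2020-02-17"), (2, 2, 2, "2020-02-01"),
    (2, 3, 2, "2020-03-01"), (3, 1, 3, "2020-02-22"), (3, 2, 4, "2020-02-25"),
])

query = """
WITH results1 AS (
    SELECT u.name AS results
    FROM MovieRating AS r
    JOIN Users AS u ON u.user_id = r.user_id
    GROUP BY u.user_id
    ORDER BY COUNT(*) DESC, u.name ASC
    LIMIT 1
),
febmov AS (
    SELECT movie_id, AVG(rating) AS av
    FROM MovieRating
    WHERE created_at >= '2020-02-01' AND created_at <= '2020-02-29'
    GROUP BY movie_id
),
results2 AS (
    SELECT m.title AS results
    FROM Movies AS m
    JOIN febmov fm ON m.movie_id = fm.movie_id
    ORDER BY fm.av DESC, m.title ASC
    LIMIT 1
)
SELECT * FROM results1
UNION ALL
SELECT * FROM results2
"""

results = conn.execute(query).fetchall()
print(results)  # [('Daniel',), ('Frozen 2',)]
```

If `created_at` held timestamps rather than dates, a half-open range such as `created_at >= '2020-02-01' AND created_at < '2020-03-01'` would avoid missing ratings made later in the day on February 29th.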

Question 40

Restaurant Growth

Difficulty: Medium

#### Table: Customer

| Column Name | Type |
|---------------|---------|
| customer_id | int |
| name | varchar |
| visited_on | date |
| amount | int |

- (`customer_id`, `visited_on`) is the primary key for this table.


- This table contains data about customer transactions in a restaurant.
- `visited_on` is the date on which the customer visited the restaurant.
- `amount` is the total paid by a customer.

#### Problem Statement


You are the restaurant owner and want to analyze a possible expansion.
For each day, compute the total amount customers paid in a seven-day
window (i.e., current day + 6 days before) as `amount`, and the average amount
paid per day over that window as `average_amount`, rounded to two decimal places.

Return the result table ordered by `visited_on` in ascending order.

#### Example

**Input**:
Customer table:
| customer_id | name | visited_on | amount |
|-------------|---------|------------|--------|
| 1 | Jhon | 2019-01-01 | 100 |
| 2 | Daniel | 2019-01-02 | 110 |
| 3 | Jade | 2019-01-03 | 120 |
| 4 | Khaled | 2019-01-04 | 130 |
| 5 | Winston | 2019-01-05 | 110 |
| 6 | Elvis | 2019-01-06 | 140 |
| 7 | Anna | 2019-01-07 | 150 |
| 8 | Maria | 2019-01-08 | 80 |
| 9 | Jaze | 2019-01-09 | 110 |
| 1 | Jhon | 2019-01-10 | 130 |
| 3 | Jade | 2019-01-10 | 150 |

**Output**:

| visited_on | amount | average_amount |
|------------|--------|----------------|
| 2019-01-07 | 860 | 122.86 |
| 2019-01-08 | 840 | 120 |
| 2019-01-09 | 840 | 120 |
| 2019-01-10 | 1000 | 142.86 |

**Explanation**:
- The 1st moving average from 2019-01-01 to 2019-01-07 has an `average_amount`
of (100 + 110 + 120 + 130 + 110 + 140 + 150)/7 = 122.86.
- The 2nd moving average from 2019-01-02 to 2019-01-08 has an `average_amount`
of (110 + 120 + 130 + 110 + 140 + 150 + 80)/7 = 120.
- The 3rd moving average from 2019-01-03 to 2019-01-09 has an `average_amount`
of (120 + 130 + 110 + 140 + 150 + 80 + 110)/7 = 120.
- The 4th moving average from 2019-01-04 to 2019-01-10 has an `average_amount`
of (130 + 110 + 140 + 150 + 80 + 110 + 130 + 150)/7 = 142.86.

[LeetCode 1321]

Solution

WITH customer_aggregated AS (
SELECT visited_on, SUM(amount) AS amount
FROM customer
GROUP BY visited_on
),
7_days_results AS (
SELECT
visited_on,
SUM(amount) OVER (
ORDER BY visited_on
ROWS BETWEEN 6 PRECEDING AND 0 FOLLOWING
) AS amount,
ROUND(AVG(amount) OVER (
ORDER BY visited_on
ROWS BETWEEN 6 PRECEDING AND 0 FOLLOWING
), 2) AS average_amount
FROM customer_aggregated
ORDER BY visited_on
)

SELECT *
FROM 7_days_results
WHERE visited_on >= (
SELECT MIN(visited_on) + INTERVAL 6 DAY
FROM customer_aggregated
);

customer_aggregated CTE:

This common table expression aggregates daily sales data.

It calculates the total amount spent (`SUM(amount)`) for each `visited_on` date, so
that several customers visiting on the same day count as a single day.

7_days_results CTE:

This CTE performs a 7-day window analysis on the aggregated sales data.

For each `visited_on` date, it calculates:

`amount`: The sum of the daily totals over a 7-day window, defined by the frame
`ROWS BETWEEN 6 PRECEDING AND 0 FOLLOWING` (equivalent to
`ROWS BETWEEN 6 PRECEDING AND CURRENT ROW`). This window includes the current
day and the six previous rows. Because a `ROWS` frame counts rows rather than
calendar days, it matches "the six previous days" only when every date has at least
one visit; with missing dates, a date-based `RANGE` frame or a join against a
calendar table would be needed.

`average_amount`: The average daily total over the same 7-day window, rounded
to two decimal places.

The window functions process the data in `visited_on` order.

Final Query:

The final `SELECT` statement filters `7_days_results` to include only dates
that are at least 6 days after the earliest `visited_on` date.

The subquery `(SELECT MIN(visited_on) + INTERVAL 6 DAY FROM customer_aggregated)`
finds the earliest date for which a complete 7-day window exists. (`INTERVAL 6 DAY`
is MySQL syntax; other databases express date arithmetic differently.)

Summary:

The query analyzes customer spending over rolling 7-day windows and retrieves
the sum and average for each date, starting from the first complete 7-day window.
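The solution uses MySQL syntax, so the sketch below is an assumed SQLite adaptation run through Python's `sqlite3` module: the CTE is renamed to `seven_day_results` (standard SQL and SQLite do not accept an unquoted identifier that starts with a digit), `CURRENT ROW` replaces `0 FOLLOWING`, and SQLite's `date(..., '+6 days')` replaces `+ INTERVAL 6 DAY`. Like the original, it assumes there is at least one visit on every date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER, name TEXT, visited_on TEXT, amount INTEGER)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    (1, "Jhon", "2019-01-01", 100), (2, "Daniel", "2019-01-02", 110),
    (3, "Jade", "2019-01-03", 120), (4, "Khaled", "2019-01-04", 130),
    (5, "Winston", "2019-01-05", 110), (6, "Elvis", "2019-01-06", 140),
    (7, "Anna", "2019-01-07", 150), (8, "Maria", "2019-01-08", 80),
    (9, "Jaze", "2019-01-09", 110), (1, "Jhon", "2019-01-10", 130),
    (3, "Jade", "2019-01-10", 150),
])

query = """
WITH customer_aggregated AS (
    SELECT visited_on, SUM(amount) AS amount
    FROM customer
    GROUP BY visited_on
),
seven_day_results AS (
    SELECT
        visited_on,
        SUM(amount) OVER (
            ORDER BY visited_on ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS amount,
        ROUND(AVG(amount) OVER (
            ORDER BY visited_on ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ), 2) AS average_amount
    FROM customer_aggregated
)
SELECT visited_on, amount, average_amount
FROM seven_day_results
WHERE visited_on >= (
    SELECT date(MIN(visited_on), '+6 days') FROM customer_aggregated
)
ORDER BY visited_on
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('2019-01-07', 860, 122.86)
# ('2019-01-08', 840, 120.0)
# ('2019-01-09', 840, 120.0)
# ('2019-01-10', 1000, 142.86)
```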

Question 41

Friend Requests II: Who Has the Most Friends

Difficulty: Medium

#### Table: RequestAccepted

| Column Name | Type |
|--------------|-------|
| requester_id | int |
| accepter_id | int |
| accept_date | date |

- (`requester_id`, `accepter_id`) is the primary key for this table.


- This table contains the ID of the user who sent the friend request
(`requester_id`), the ID of the user who received the request
(`accepter_id`), and the date when the request was accepted (`accept_date`).

#### Problem Statement


Write a solution to find the person who has the most friends and the
number of friends they have. The test cases are generated so that
only one person has the most friends.

Return the result as follows:


- `id`: the ID of the person with the most friends.
- `num`: the total number of friends.

#### Example

**Input**:
RequestAccepted table:

| requester_id | accepter_id | accept_date |
|--------------|-------------|-------------|
| 1 | 2 | 2016/06/03 |
| 1 | 3 | 2016/06/08 |
| 2 | 3 | 2016/06/08 |
| 3 | 4 | 2016/06/09 |

**Output**:

| id | num |
|----|-----|
| 3 | 3 |

**Explanation**:
- The person with `id 3` is friends with people 1, 2, and 4,
so they have 3 friends, which is the most among all others.

#### Follow-up
In a real-world scenario, multiple people could have the same
number of friends. Can you find all such people in this case?

[LeetCode 602]

Solution
WITH combined_counts AS (
SELECT requester_id AS id, COUNT(*) AS cnt
FROM RequestAccepted
GROUP BY requester_id
UNION ALL
SELECT accepter_id AS id, COUNT(*) AS cnt
FROM RequestAccepted
GROUP BY accepter_id
)

SELECT id, SUM(cnt) AS num
FROM combined_counts
GROUP BY id
ORDER BY num DESC
LIMIT 1;

CTE (Common Table Expression):

The `combined_counts` CTE combines two sets of counts:

First part: It counts how many times each `requester_id` appears in the
`RequestAccepted` table.

Second part: It counts how many times each `accepter_id` appears.

The `UNION ALL` keeps both sets of rows, so a user can contribute one count for the
requests they sent and another for the requests they accepted.

Main Query:

The main query adds up the counts (`cnt`) for each `id` (which is either a
`requester_id` or an `accepter_id`) using `SUM(cnt)`. Each accepted request is one
friendship between its two users, so, assuming the same pair never appears twice in
opposite directions, this total is the user's number of friends.
`GROUP BY id` combines both counts for each unique user.

The results are ordered by the number of friends (`num`) in descending
order (`ORDER BY num DESC`).

`LIMIT 1` ensures that only the user with the most friends is returned, which is
enough here because the test cases guarantee that exactly one person has the most
friends.
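For the follow-up about ties, one option (a sketch, not the only approach) is to keep every user whose count equals the maximum instead of using `LIMIT 1`. The harness below assumes Python's `sqlite3` module. It counts each user once per accepted request they appear in, which gives the same totals as `combined_counts`, and then adds a hypothetical extra request to create a tie:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RequestAccepted (requester_id INTEGER, accepter_id INTEGER, accept_date TEXT)"
)
conn.executemany("INSERT INTO RequestAccepted VALUES (?, ?, ?)", [
    (1, 2, "2016/06/03"), (1, 3, "2016/06/08"),
    (2, 3, "2016/06/08"), (3, 4, "2016/06/09"),
])

# One row per (user, accepted request), counted per user; keep every user at the maximum.
query = """
WITH all_ids AS (
    SELECT requester_id AS id FROM RequestAccepted
    UNION ALL
    SELECT accepter_id AS id FROM RequestAccepted
),
friend_counts AS (
    SELECT id, COUNT(*) AS num
    FROM all_ids
    GROUP BY id
)
SELECT id, num
FROM friend_counts
WHERE num = (SELECT MAX(num) FROM friend_counts)
ORDER BY id
"""

single_winner = conn.execute(query).fetchall()
print(single_winner)  # [(3, 3)]

# Hypothetical extra request that gives user 2 as many friends as user 3.
conn.execute("INSERT INTO RequestAccepted VALUES (2, 4, '2016/06/10')")
tied_winners = conn.execute(query).fetchall()
print(tied_winners)  # [(2, 3), (3, 3)]
```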

Question 42

Investments in 2016

Difficulty: Medium

#### Table: Insurance

| Column Name | Type |
|-------------|-------|
| pid | int |
| tiv_2015 | float |
| tiv_2016 | float |
| lat | float |
| lon | float |

- `pid` is the primary key for this table.


- This table contains information about policyholders’ total
investment values in 2015 and 2016, as well as their geographical
coordinates (`lat`, `lon`).

#### Problem Statement


Write a query to report the sum of all `tiv_2016` values for
policyholders who:
- Have the same `tiv_2015` value as one or more other policyholders.
- Are not located in the same city as any other policyholder
(i.e., the `(lat, lon)` pairs must be unique).

Round the result to two decimal places.

#### Example

**Input**:
Insurance table:

| pid | tiv_2015 | tiv_2016 | lat | lon |
|-----|----------|----------|-----|-----|
| 1 | 10 | 5 | 10 | 10 |
| 2 | 20 | 20 | 20 | 20 |
| 3 | 10 | 30 | 20 | 20 |
| 4 | 10 | 40 | 40 | 40 |

**Output**:

| tiv_2016 |
|----------|
| 45.00 |

**Explanation**:
- The first and fourth records meet both conditions:
  - The `tiv_2015` value 10 is shared with other records (records 1, 3, 4).
  - Their `(lat, lon)` locations are unique.
- The second record fails the first condition, and the third record fails
the second condition (same location as record 2).
- The sum of `tiv_2016` for records 1 and 4 is `5 + 40 = 45.00`.

[LeetCode 585]

Solution

WITH tiv_2015_counts AS (
SELECT tiv_2015, COUNT(tiv_2015) AS cnt
FROM Insurance
GROUP BY tiv_2015
),

location_count AS (
SELECT DISTINCT CONCAT(lat, "-", lon) AS loc, COUNT(*) AS cnt
FROM Insurance
GROUP BY CONCAT(lat, "-", lon)
)

SELECT ROUND(SUM(tiv_2016), 2) AS tiv_2016
FROM Insurance I
JOIN tiv_2015_counts T
JOIN location_count L
ON I.tiv_2015 = T.tiv_2015
AND T.cnt >= 2
AND CONCAT(I.lat, "-", I.lon) = L.loc
AND L.cnt = 1;

CTE `tiv_2015_counts`:

This common table expression groups the `Insurance` table by the `tiv_2015`
column (Total Insured Value for 2015) and counts the occurrences of each
`tiv_2015` value.

The result gives the number of records sharing the same `tiv_2015`.

CTE `location_count`:

This CTE builds a single key for each location by concatenating `lat` (latitude)
and `lon` (longitude), separated by `-`, as `loc`.

It then counts the number of occurrences of each location. (The `DISTINCT` is
redundant here, because `GROUP BY` already returns one row per location.)

Main Query:

The main query joins the `Insurance` table with the two CTEs (`tiv_2015_counts`
and `location_count`).

The join conditions are:

`I.tiv_2015 = T.tiv_2015`: The `tiv_2015` value from the `Insurance` table matches
the value in the `tiv_2015_counts` CTE.

`T.cnt >= 2`: The `tiv_2015` value occurs at least twice, i.e., it is shared with at
least one other policyholder.
`CONCAT(I.lat, "-", I.lon) = L.loc`: The location from the `Insurance` table
matches the location key in the `location_count` CTE.

`L.cnt = 1`: The location appears exactly once, so no other policyholder shares it.

It calculates the total sum of `tiv_2016` (Total Insured Value for 2016) for the
matching records and rounds it to two decimal places.
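An alternative formulation, sketched here under the assumption of a database with window functions (SQLite through Python's `sqlite3` module), replaces both CTE joins with `COUNT(*) OVER (PARTITION BY ...)` and partitions directly on `(lat, lon)` instead of a concatenated string key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Insurance (pid INTEGER PRIMARY KEY, tiv_2015 REAL, tiv_2016 REAL, "
    "lat REAL, lon REAL)"
)
conn.executemany("INSERT INTO Insurance VALUES (?, ?, ?, ?, ?)", [
    (1, 10, 5, 10, 10), (2, 20, 20, 20, 20),
    (3, 10, 30, 20, 20), (4, 10, 40, 40, 40),
])

# Each row learns how many rows share its tiv_2015 value and its (lat, lon) pair.
query = """
WITH flagged AS (
    SELECT
        tiv_2016,
        COUNT(*) OVER (PARTITION BY tiv_2015) AS same_tiv_2015,
        COUNT(*) OVER (PARTITION BY lat, lon) AS same_location
    FROM Insurance
)
SELECT ROUND(SUM(tiv_2016), 2) AS tiv_2016
FROM flagged
WHERE same_tiv_2015 >= 2 AND same_location = 1
"""

total = conn.execute(query).fetchone()[0]
print(total)  # 45.0
```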

Question 43

Department Top Three Salaries

**Difficulty**: Hard

#### Table: Employee

| Column Name | Type |
|--------------|---------|
| id | int |
| name | varchar |
| salary | int |
| departmentId | int |

- `id` is the primary key for this table.
- `departmentId` is a foreign key referencing the `id` in the Department table.
- Each row contains the employee's `id`, `name`, `salary`, and the department they
belong to (`departmentId`).

#### Table: Department

| Column Name | Type |
|-------------|---------|
| id | int |
| name | varchar |

- `id` is the primary key for this table.


- Each row contains the department's `id` and `name`.

#### Problem Statement


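A company wants to know who the top three highest-paid employees in each department
are. Here, a top earner is an employee whose salary is among the three highest unique
salaries in their department.

Write an SQL query to return a result table that lists the department name, employee
name, and salary of these employees.
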
#### Example

**Input**:

**Employee** table:

| id | name | salary | departmentId |
|-----|-------|--------|--------------|
| 1 | Joe | 85000 | 1 |
| 2 | Henry | 80000 | 2 |
| 3 | Sam | 60000 | 2 |
| 4 | Max | 90000 | 1 |
| 5 | Janet | 69000 | 1 |
| 6 | Randy | 85000 | 1 |
| 7 | Will | 70000 | 1 |

**Department** table:

| id | name |
|-----|-------|
| 1 | IT |
| 2 | Sales |

**Output**:

| Department | Employee | Salary |
|------------|----------|--------|
| IT | Max | 90000 |
| IT | Joe | 85000 |
| IT | Randy | 85000 |
| IT | Will | 70000 |
| Sales | Henry | 80000 |
| Sales | Sam | 60000 |

**Explanation**:
- In the IT department:
  - Max earns the highest unique salary.
  - Both Randy and Joe earn the second-highest unique salary.
  - Will earns the third-highest unique salary.
  - Janet's salary (69000) is only the fourth-highest unique salary, so she is not listed.
- In the Sales department:
  - Henry earns the highest salary.
  - Sam earns the second-highest salary.
  - There is no third-highest salary as there are only two employees.

[LeetCode 185]
Solution

WITH top_3 AS (
SELECT *
FROM (
SELECT DISTINCT departmentId, salary,
DENSE_RANK() OVER (PARTITION BY departmentId ORDER BY salary DESC)
AS rn
FROM employee
) AS subquery
WHERE rn <= 3
)

SELECT D.name AS Department,
E.name AS Employee,
E.salary AS Salary
FROM employee E
JOIN department D ON E.departmentId = D.id
JOIN top_3 T ON E.departmentId = T.departmentId
AND E.salary = T.salary
ORDER BY E.salary DESC, Employee, Department;

CTE `top_3`:

This Common Table Expression (CTE) identifies the top three distinct salaries in each
department:

It uses `DENSE_RANK()` to rank salaries within each `departmentId` in descending
order. `DENSE_RANK()` gives tied salaries the same rank and leaves no gaps, so ranks
1 to 3 correspond to the three highest unique salaries. The `SELECT DISTINCT` keeps
one row per (department, salary) pair.

Only ranks 1 to 3 are retained in the CTE (`WHERE rn <= 3`), capturing the highest
three salaries (or fewer if a department has fewer than three distinct salaries).

Main Query:
The main query joins the `employee` table (`E`) with the `department` table (`D`) and
the CTE `top_3` (`T`) to retrieve every employee whose salary is one of those top three.

`E.departmentId = D.id` links each employee with their department name.

`E.departmentId = T.departmentId` and `E.salary = T.salary` keep only employees whose
salary is one of their department's top three salaries.

Finally, the query orders the results by `E.salary` (descending), then by `Employee`
name and `Department` name.

The query returns every employee earning one of the top three salaries in their
department, so a department can contribute more than three rows when salaries are tied
(IT returns four employees in the example). Each row shows the department name, the
employee name, and the salary, ordered by salary in descending order and then
alphabetically by employee and department.
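A possible simplification, shown as a sketch rather than the author's method, ranks the employees themselves with `DENSE_RANK()`, so no join back to a list of salaries is needed. It runs through Python's `sqlite3` module (an assumed harness) and orders by department to match the example output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, departmentId INTEGER);
CREATE TABLE Department (id INTEGER PRIMARY KEY, name TEXT);
""")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    (1, "Joe", 85000, 1), (2, "Henry", 80000, 2), (3, "Sam", 60000, 2),
    (4, "Max", 90000, 1), (5, "Janet", 69000, 1), (6, "Randy", 85000, 1),
    (7, "Will", 70000, 1),
])
conn.executemany("INSERT INTO Department VALUES (?, ?)", [(1, "IT"), (2, "Sales")])

# Rank every employee within their department; ties share a rank and ranks have no gaps.
query = """
WITH ranked AS (
    SELECT
        d.name AS Department,
        e.name AS Employee,
        e.salary AS Salary,
        DENSE_RANK() OVER (PARTITION BY e.departmentId ORDER BY e.salary DESC) AS salary_rank
    FROM Employee e
    JOIN Department d ON d.id = e.departmentId
)
SELECT Department, Employee, Salary
FROM ranked
WHERE salary_rank <= 3
ORDER BY Department, Salary DESC, Employee
"""

top_earners = conn.execute(query).fetchall()
for row in top_earners:
    print(row)
# ('IT', 'Max', 90000)
# ('IT', 'Joe', 85000)
# ('IT', 'Randy', 85000)
# ('IT', 'Will', 70000)
# ('Sales', 'Henry', 80000)
# ('Sales', 'Sam', 60000)
```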

Question 44

Fix Names in a Table

**Difficulty**: Easy

#### Table: Users

| Column Name | Type |
|-------------|---------|
| user_id | int |
| name | varchar |

- `user_id` is the primary key for this table.


- Each row contains a user's `user_id` and `name`. The `name` column
consists of both lowercase and uppercase characters.

#### Problem Statement


Write an SQL query to fix the names in the `Users` table so that only
the first character is uppercase and the rest are lowercase.

Return the result ordered by `user_id`.


#### Example

**Input**:

Users table:

| user_id | name |
|---------|-------|
| 1 | aLice |
| 2 | bOB |

**Output**:

| user_id | name |
|---------|-------|
| 1 | Alice |
| 2 | Bob |

**Explanation**:
- The name `aLice` should be changed to `Alice`.
- The name `bOB` should be changed to `Bob`.
- The result is ordered by `user_id`.

[LeetCode 1667]

Solution

SELECT
user_id,
CONCAT(UPPER(SUBSTRING(name, 1, 1)), LOWER(SUBSTRING(name, 2))) AS name
FROM users
ORDER BY user_id;

This query retrieves a list of user IDs and formats each user's name to capitalize
only the first letter, while converting the rest of the name to lowercase.

Formatting name :
UPPER(SUBSTRING(name, 1, 1)) : Extracts the first character from the name and
converts it to uppercase.

LOWER(SUBSTRING(name, 2)) : Extracts the rest of the name starting from the second
character and converts it to lowercase.

CONCAT(...) : Combines the formatted parts (uppercase first letter and lowercase
rest of the name) into a single string.

Ordering:

Results are ordered by user_id in ascending order.

This query is useful for standardizing the format of user names in the database,
ensuring that each name appears with an initial capital letter followed by
lowercase letters, regardless of the original format.
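To try the query without a MySQL server, here is a minimal sketch that runs an equivalent statement on an in-memory SQLite database through Python's built-in `sqlite3` module. It assumes SQLite's dialect: `CONCAT()` only arrived in SQLite 3.44, so the ANSI `||` operator joins the two parts, and `SUBSTR` stands in for `SUBSTRING`. The helper name `fix_names` is hypothetical.

```python
import sqlite3

def fix_names(rows):
    """Apply Question 44's capitalization rule to (user_id, name) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (user_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO Users VALUES (?, ?)", rows)
    # SQLite dialect (assumption of this sketch): || instead of CONCAT,
    # SUBSTR instead of SUBSTRING.
    query = """
        SELECT user_id,
               UPPER(SUBSTR(name, 1, 1)) || LOWER(SUBSTR(name, 2)) AS name
        FROM Users
        ORDER BY user_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(fix_names([(1, "aLice"), (2, "bOB")]))  # [(1, 'Alice'), (2, 'Bob')]
```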

Question 45

Patients With a Condition

**Difficulty**: Easy

#### Table: Patients

| Column Name | Type |
|--------------|---------|
| patient_id | int |
| patient_name | varchar |
| conditions | varchar |

- `patient_id` is the primary key for this table.
- The `conditions` column contains 0 or more condition codes separated by spaces.

#### Problem Statement

Write an SQL query to find the `patient_id`, `patient_name`, and `conditions` of
the patients who have a condition code starting with the prefix `DIAB1`.

Return the result in any order.

#### Example

**Input**:

Patients table:

| patient_id | patient_name | conditions |
|------------|--------------|--------------|
| 1 | Daniel | YFEV COUGH |
| 2 | Alice | |
| 3 | Bob | DIAB100 MYOP |
| 4 | George | ACNE DIAB100 |
| 5 | Alain | DIAB201 |

**Output**:

| patient_id | patient_name | conditions |
|------------|--------------|--------------|
| 3 | Bob | DIAB100 MYOP |
| 4 | George | ACNE DIAB100 |

**Explanation**:
- Bob and George have conditions that start with "DIAB1".
- Patients with IDs 1, 2, and 5 do not meet the criteria.

[LeetCode 1527]

Solution

SELECT
patient_id,
patient_name,
conditions
FROM patients
WHERE
conditions LIKE 'DIAB1%'
OR conditions LIKE '% DIAB1%';
This query retrieves the patient_id , patient_name , and conditions columns
from the patients table, keeping only patients whose conditions list contains
a code beginning with "DIAB1", either as the first code or as any later code.

Filtering Conditions:

conditions LIKE 'DIAB1%' : Matches rows whose first condition code starts with "DIAB1".

conditions LIKE '% DIAB1%' : Matches rows where a later code starts with "DIAB1",
that is, "DIAB1" immediately follows a space (e.g., "ACNE DIAB100"). Requiring the
space prevents false matches where "DIAB1" sits inside a longer code, such as "SADIAB100".

The OR operator ensures that rows meeting either condition will be included.

This query identifies patients with a condition code starting with "DIAB1",
whether that code appears first or elsewhere in the list. This approach is
useful for filtering patients with particular health conditions.
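A sketch (Python `sqlite3`, in-memory database) that runs the same filter on the example rows plus one hypothetical extra row, `SADIAB100`, to show why the second pattern requires a leading space. One dialect caveat: SQLite's `LIKE` is case-insensitive for ASCII letters by default, which does not matter here because the codes are upper case. The helper `diab1_patients` and the extra row are illustrative assumptions.

```python
import sqlite3

def diab1_patients(rows):
    """Return the patient_ids whose conditions include a code starting with DIAB1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Patients (patient_id INTEGER PRIMARY KEY, "
        "patient_name TEXT, conditions TEXT)"
    )
    conn.executemany("INSERT INTO Patients VALUES (?, ?, ?)", rows)
    # First pattern: the list starts with a DIAB1 code.
    # Second pattern: a later code starts with DIAB1 (it follows a space).
    query = """
        SELECT patient_id, patient_name, conditions
        FROM Patients
        WHERE conditions LIKE 'DIAB1%'
           OR conditions LIKE '% DIAB1%'
        ORDER BY patient_id
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result

patients = [
    (1, "Daniel", "YFEV COUGH"),
    (2, "Alice", ""),
    (3, "Bob", "DIAB100 MYOP"),
    (4, "George", "ACNE DIAB100"),
    (5, "Alain", "DIAB201"),
    (6, "Zoe", "SADIAB100"),  # hypothetical row: DIAB1 inside a code, not at its start
]
print(diab1_patients(patients))  # [3, 4]
```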

Question 46

Delete Duplicate Emails

**Difficulty**: Easy

#### Table: Person

| Column Name | Type |
|-------------|---------|
| id | int |
| email | varchar |

- `id` is the primary key for this table.
- Each row contains an email address, with no uppercase letters.

#### Problem Statement

Write an SQL query to delete all duplicate emails, keeping only one
entry for each unique email with the smallest `id`.

**Note**:
- You need to write a `DELETE` statement, not a `SELECT` query.
- After the script runs, the table should show only unique emails with the
lowest `id` for each.

#### Example

**Input**:

Person table:

| id | email |
|-----|------------------|
| 1 | john@example.com |
| 2 | bob@example.com |
| 3 | john@example.com |

**Output**:

| id | email |
|-----|------------------|
| 1 | john@example.com |
| 2 | bob@example.com |

**Explanation**:
- The email `john@example.com` appears twice with IDs 1 and 3. We delete
the entry with the larger ID, keeping the one with ID 1.

[LeetCode 196]

Solution

DELETE A
FROM Person A, Person B
WHERE A.email = B.email
AND A.id > B.id;
This query deletes duplicate rows from the Person table based on the email
column, keeping only the row with the smallest id for each unique email.

FROM Clause:

Person A and Person B represent two instances of the same Person table,
enabling comparison between pairs of rows in the table.

WHERE Clause:

A.email = B.email : Matches rows where both A and B have the same email
value.

A.id > B.id : Ensures that only the row with the higher id (considered the
duplicate) is selected for deletion.

In this way, for each unique email, only the row with the smallest id is retained,
and any additional rows with the same email but a higher id are removed from
the table, preserving the first occurrence of each unique email. Note that
DELETE A FROM Person A, Person B is MySQL's multi-table DELETE syntax rather
than ANSI SQL.
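The self-join `DELETE` above does not run on SQLite, and PostgreSQL spells the same idea as `DELETE ... USING`. A more portable sketch uses a correlated `EXISTS`: delete a row whenever another row with the same email has a smaller `id`. It runs on an in-memory SQLite database via Python's `sqlite3`, with assumed sample addresses; `dedupe_emails` is a hypothetical helper. MySQL typically rejects a subquery that reads the table being deleted from (error 1093), which is one reason the self-join form is the usual answer there.

```python
import sqlite3

def dedupe_emails(rows):
    """Keep only the lowest-id row per email and return the surviving rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO Person VALUES (?, ?)", rows)
    # A row is deleted if some other row with the same email has a smaller id,
    # so the smallest id for each email always survives.
    conn.execute("""
        DELETE FROM Person
        WHERE EXISTS (
            SELECT 1
            FROM Person AS keeper
            WHERE keeper.email = Person.email
              AND keeper.id < Person.id
        )
    """)
    conn.commit()
    result = conn.execute("SELECT id, email FROM Person ORDER BY id").fetchall()
    conn.close()
    return result

# Sample data (assumed addresses)
rows = [(1, "john@example.com"), (2, "bob@example.com"), (3, "john@example.com")]
print(dedupe_emails(rows))  # [(1, 'john@example.com'), (2, 'bob@example.com')]
```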

Question 47

Second Highest Salary

**Difficulty**: Medium

#### Table: Employee

| Column Name | Type |
|-------------|------|
| id | int |
| salary | int |

- `id` is the primary key for this table.
- Each row contains the salary of an employee.

#### Problem Statement

Write a solution to find the second highest distinct salary from the
`Employee` table. If there is no second highest salary, return `null`.

#### Example 1

**Input**:

Employee table:

| id | salary |
|-----|--------|
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |

**Output**:

| SecondHighestSalary |
|---------------------|
| 200 |

#### Example 2

**Input**:

Employee table:

| id | salary |
|-----|--------|
| 1 | 100 |

**Output**:

| SecondHighestSalary |
|---------------------|
| null |

**Explanation**:
- In the first example, the second highest distinct salary is 200.
- In the second example, there is only one salary, so the second highest
salary does not exist and `null` is returned.

[LeetCode 176]
Solution

WITH RankedSalaries AS (
SELECT salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn
FROM (SELECT DISTINCT salary FROM employee) AS sub
)

SELECT
(SELECT salary AS SecondHighestSalary
FROM RankedSalaries
WHERE rn = 2) AS SecondHighestSalary;

This query identifies the second-highest unique salary from the employee table.

CTE (Common Table Expression) RankedSalaries :

The RankedSalaries CTE retrieves distinct salaries from the employee table,
removing any duplicates.

ROW_NUMBER() assigns a rank ( rn ) to each distinct salary, ordered in descending
order. The highest salary receives rn = 1 , the next receives rn = 2 , and so on.

Main Query:

The main query selects the salary with rn = 2 from RankedSalaries , which
represents the second-highest unique salary.

If fewer than two distinct salaries exist, the inner SELECT finds no row with
rn = 2 . Because it is used as a scalar subquery, it then evaluates to NULL, so
the query still returns one row with NULL as SecondHighestSalary .
The name RankedSalaries signals that the CTE ranks unique salaries in
descending order.
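A quick way to check the NULL behaviour is to run the CTE solution against both examples. The sketch below uses Python's `sqlite3` with an in-memory database (window functions need SQLite 3.25 or later); the helper `second_highest` is a hypothetical name. The third call adds a duplicated top salary to show that `DISTINCT` is applied before the rows are numbered.

```python
import sqlite3

# The CTE solution from above, with the redundant inner alias dropped.
QUERY = """
WITH RankedSalaries AS (
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn
    FROM (SELECT DISTINCT salary FROM Employee) AS sub
)
SELECT (SELECT salary FROM RankedSalaries WHERE rn = 2) AS SecondHighestSalary
"""

def second_highest(salaries):
    """Load the salaries into a fresh Employee table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, salary INTEGER)")
    conn.executemany("INSERT INTO Employee (salary) VALUES (?)",
                     [(s,) for s in salaries])
    (value,) = conn.execute(QUERY).fetchone()  # always exactly one row
    conn.close()
    return value

print(second_highest([100, 200, 300]))  # 200
print(second_highest([100]))            # None (SQL NULL)
print(second_highest([300, 300, 200]))  # 200: duplicates collapse before numbering
```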

Alternate solution

SELECT (
SELECT DISTINCT salary
FROM Employee
ORDER BY salary DESC
LIMIT 1 OFFSET 1
) AS SecondHighestSalary;

The inner query directly selects distinct salaries from Employee , sorts them in
descending order, and uses LIMIT 1 OFFSET 1 to get the second-highest salary.

If there's only one unique salary, no row exists at OFFSET 1. On its own, the
inner query would return an empty result; wrapping it in the outer SELECT ( ... )
turns it into a scalar subquery, which yields NULL instead.

Simplicity: This is more concise, using a single query without any CTEs.

Efficiency: The query might be more efficient, since it needs only the top two
distinct salaries rather than numbering every distinct salary with ROW_NUMBER().
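The outer `SELECT ( ... )` is what turns "no row" into `NULL`. The sketch below (Python `sqlite3`; SQLite accepts the same `LIMIT ... OFFSET` syntax as MySQL) runs the bare inner query and the wrapped query against a one-salary table. Note that `LIMIT`/`OFFSET` is not ANSI SQL; the standard form is `OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, salary INTEGER)")
conn.execute("INSERT INTO Employee VALUES (1, 100)")  # only one distinct salary

bare = """
SELECT DISTINCT salary FROM Employee
ORDER BY salary DESC
LIMIT 1 OFFSET 1
"""
# Wrapping the same query makes it a scalar subquery.
wrapped = f"SELECT ({bare}) AS SecondHighestSalary"

print(conn.execute(bare).fetchall())     # []: no row at all
print(conn.execute(wrapped).fetchall())  # [(None,)]: one row holding NULL
```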

Question 48

Group Sold Products By The Date

**Difficulty**: Easy

#### Table: Activities


| Column Name | Type |
|-------------|---------|
| sell_date | date |
| product | varchar |

- There is no primary key for this table, and it may contain duplicates.
- Each row records the name of a product sold and the date it was sold.

#### Problem Statement


Write a solution to find, for each date, the number of **different**
products sold and their names, sorted lexicographically.
Return the result table ordered by `sell_date`.

#### Example 1

**Input**:

Activities table:

| sell_date | product |
|------------|-------------|
| 2020-05-30 | Headphone |
| 2020-06-01 | Pencil |
| 2020-06-02 | Mask |
| 2020-05-30 | Basketball |
| 2020-06-01 | Bible |
| 2020-06-02 | Mask |
| 2020-05-30 | T-Shirt |

**Output**:

| sell_date | num_sold | products |
|------------|----------|-------------------------------|
| 2020-05-30 | 3 | Basketball,Headphone,T-Shirt |
| 2020-06-01 | 2 | Bible,Pencil |
| 2020-06-02 | 1 | Mask |

**Explanation**:
- For `2020-05-30`, the sold items were `Headphone`, `Basketball`,
and `T-Shirt`. After sorting lexicographically, we return:
`Basketball,Headphone,T-Shirt`.
- For `2020-06-01`, the sold items were `Pencil` and `Bible`. After
sorting lexicographically, we return: `Bible,Pencil`.
- For `2020-06-02`, the only sold item was `Mask` (sold twice, but counted once).

[LeetCode 1484]
Solution

SELECT
sell_date,
COUNT(DISTINCT product) AS num_sold,
GROUP_CONCAT(DISTINCT product ORDER BY product ASC SEPARATOR ',') AS products
FROM
Activities
GROUP BY
sell_date
ORDER BY
sell_date;

COUNT(DISTINCT product): Counts the number of unique products sold on each
date ( sell_date ).

GROUP_CONCAT(DISTINCT product ORDER BY product ASC SEPARATOR ','):

DISTINCT product : Ensures each product appears only once in the list for a given
sell_date , even if it was sold several times (as Mask was on 2020-06-02).

ORDER BY product ASC : Sorts products lexicographically (alphabetical order).

SEPARATOR ',' : Specifies that each product should be separated by a comma in the
final list.

GROUP BY sell_date: Groups the results by sell_date to get a summary per date.

ORDER BY sell_date: Orders the output by sell_date as required in the problem.
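`GROUP_CONCAT` is MySQL-specific (PostgreSQL uses `STRING_AGG`, and SQL:2016 defines `LISTAGG`), and SQLite's own `group_concat` only accepts an `ORDER BY` from version 3.44. To keep the sketch deterministic on any SQLite, it registers a small Python aggregate, `sorted_concat` (a hypothetical name), that mimics `GROUP_CONCAT(DISTINCT product ORDER BY product SEPARATOR ',')`. Python's `sorted` compares by code point, which may differ from a MySQL collation for mixed-case data.

```python
import sqlite3

class SortedDistinctConcat:
    """Collect values, then return them de-duplicated, sorted, and comma-joined."""

    def __init__(self):
        self.values = set()

    def step(self, value):
        if value is not None:  # aggregates conventionally ignore NULLs
            self.values.add(value)

    def finalize(self):
        return ",".join(sorted(self.values))

conn = sqlite3.connect(":memory:")
conn.create_aggregate("sorted_concat", 1, SortedDistinctConcat)
conn.execute("CREATE TABLE Activities (sell_date TEXT, product TEXT)")
conn.executemany("INSERT INTO Activities VALUES (?, ?)", [
    ("2020-05-30", "Headphone"), ("2020-06-01", "Pencil"), ("2020-06-02", "Mask"),
    ("2020-05-30", "Basketball"), ("2020-06-01", "Bible"), ("2020-06-02", "Mask"),
    ("2020-05-30", "T-Shirt"),
])

rows = conn.execute("""
    SELECT sell_date,
           COUNT(DISTINCT product) AS num_sold,
           sorted_concat(product) AS products
    FROM Activities
    GROUP BY sell_date
    ORDER BY sell_date
""").fetchall()
for row in rows:
    print(row)
# ('2020-05-30', 3, 'Basketball,Headphone,T-Shirt')
# ('2020-06-01', 2, 'Bible,Pencil')
# ('2020-06-02', 1, 'Mask')
```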

Question 49
List the Products Ordered in a Period

**Difficulty**: Easy

#### Table: Products

| Column Name | Type |
|------------------|---------|
| product_id | int |
| product_name | varchar |
| product_category | varchar |

- `product_id` is the primary key for this table, which contains data
about the company's products.

#### Table: Orders

| Column Name | Type |
|---------------|---------|
| product_id | int |
| order_date | date |
| unit | int |

- `product_id` is a foreign key that references the `Products` table.
- `unit` is the number of products ordered on `order_date`.

#### Problem Statement

Write a solution to get the names of products that have at least
100 units ordered in **February 2020** and their total order amount.
Return the result in any order.

#### Example 1

**Input**:

Products table:

| product_id | product_name | product_category |
|------------|-----------------------|------------------|
| 1 | Leetcode Solutions | Book |
| 2 | Jewels of Stringology | Book |
| 3 | HP | Laptop |
| 4 | Lenovo | Laptop |
| 5 | Leetcode Kit | T-shirt |

Orders table:

| product_id | order_date | unit |
|------------|-------------|------|
| 1 | 2020-02-05 | 60 |
| 1 | 2020-02-10 | 70 |
| 2 | 2020-01-18 | 30 |
| 2 | 2020-02-11 | 80 |
| 3 | 2020-02-17 | 2 |
| 3 | 2020-02-24 | 3 |
| 4 | 2020-03-01 | 20 |
| 4 | 2020-03-04 | 30 |
| 4 | 2020-03-04 | 60 |
| 5 | 2020-02-25 | 50 |
| 5 | 2020-02-27 | 50 |
| 5 | 2020-03-01 | 50 |

**Output**:

| product_name | unit |
|--------------------|------|
| Leetcode Solutions | 130 |
| Leetcode Kit | 100 |

**Explanation**:
- Product `1` (Leetcode Solutions) has 60 units ordered on `2020-02-05`
and 70 units on `2020-02-10`, totaling 130 units in February 2020.
- Product `5` (Leetcode Kit) has 50 units ordered on `2020-02-25` and
50 units on `2020-02-27`, totaling 100 units.
- Product `2` has 80 units ordered in February, which is less than 100,
and product `3` has only 5 units. Therefore, they are not included in the
result.

Solution

WITH MonthlyProductSales AS (
SELECT product_id,
SUM(unit) AS total_units_sold
FROM Orders
WHERE order_date BETWEEN '2020-02-01' AND '2020-02-29'
GROUP BY product_id
HAVING SUM(unit) >= 100
)

SELECT
p.product_name,
mps.total_units_sold AS unit
FROM Products p
JOIN MonthlyProductSales mps
ON p.product_id = mps.product_id;

CTE: MonthlyProductSales

The MonthlyProductSales Common Table Expression (CTE) aggregates the
number of units sold ( SUM(unit) ) for each product_id in February 2020 ( WHERE
order_date BETWEEN '2020-02-01' AND '2020-02-29' ). BETWEEN includes both
endpoints, so orders placed on February 29 are counted.

Only products with at least 100 units sold are retained ( HAVING SUM(unit) >= 100 ).
Repeating the aggregate, rather than referring to the alias total_units_sold ,
keeps the HAVING clause portable: MySQL accepts a select-list alias there, but
PostgreSQL does not.

Main Query

The main query retrieves the product_name and total units sold for each
qualifying product.

It joins the Products table ( p ) with the MonthlyProductSales CTE ( mps ) on
product_id .

The final output lists the product_name and total_units_sold (renamed as
unit , matching the expected output) for each product meeting the criteria.

This approach filters products with substantial sales in a specific period and
displays the relevant product names and units sold.
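The sketch below checks the solution against the example data on an in-memory SQLite database (Python `sqlite3`, dates stored as ISO-8601 text). It swaps the `BETWEEN` bounds for a half-open range, `>= '2020-02-01' AND < '2020-03-01'`: a design choice, not the original query, that avoids hard-coding February's length and still works if `order_date` ever carries a time part (a value such as `'2020-02-29 15:00'` falls outside `BETWEEN ... '2020-02-29'`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (product_id INTEGER PRIMARY KEY, "
             "product_name TEXT, product_category TEXT)")
conn.execute("CREATE TABLE Orders (product_id INTEGER, order_date TEXT, unit INTEGER)")
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", [
    (1, "Leetcode Solutions", "Book"), (2, "Jewels of Stringology", "Book"),
    (3, "HP", "Laptop"), (4, "Lenovo", "Laptop"), (5, "Leetcode Kit", "T-shirt"),
])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (1, "2020-02-05", 60), (1, "2020-02-10", 70), (2, "2020-01-18", 30),
    (2, "2020-02-11", 80), (3, "2020-02-17", 2), (3, "2020-02-24", 3),
    (4, "2020-03-01", 20), (4, "2020-03-04", 30), (4, "2020-03-04", 60),
    (5, "2020-02-25", 50), (5, "2020-02-27", 50), (5, "2020-03-01", 50),
])

query = """
WITH MonthlyProductSales AS (
    SELECT product_id, SUM(unit) AS total_units_sold
    FROM Orders
    -- half-open range: all of February without knowing its last day
    WHERE order_date >= '2020-02-01' AND order_date < '2020-03-01'
    GROUP BY product_id
    HAVING SUM(unit) >= 100
)
SELECT p.product_name, mps.total_units_sold AS unit
FROM Products AS p
JOIN MonthlyProductSales AS mps ON p.product_id = mps.product_id
"""
result = sorted(conn.execute(query).fetchall())  # "any order", so sort for display
print(result)  # [('Leetcode Kit', 100), ('Leetcode Solutions', 130)]
```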

Question 50
Find Users With Valid E-Mails

**Difficulty**: Easy

#### Table: Users

| Column Name | Type |
|-------------|---------|
| user_id | int |
| name | varchar |
| mail | varchar |

- `user_id` is the primary key (column with unique values) for this table.
- This table contains information about the users signed up on a website.
Some emails may be invalid.

#### Problem Statement


Write a solution to find the users who have valid emails. A valid email
must meet the following criteria:
1. The email prefix may contain letters (upper or lower case), digits,
underscore '_', period '.', and/or dash '-'.
2. The prefix must start with a letter.
3. The domain must be `@leetcode.com`.

Return the result table in any order.

#### Example 1

**Input**:

Users table:

| user_id | name | mail |
|---------|-----------|-------------------------|
| 1 | Winston | winston@leetcode.com |
| 2 | Jonathan | jonathanisgreat |
| 3 | Annabelle | bella-@leetcode.com |
| 4 | Sally | sally.come@leetcode.com |
| 5 | Marwan | quarz#2020@leetcode.com |
| 6 | David | david69@gmail.com |
| 7 | Shapiro | .shapo@leetcode.com |

**Output**:

| user_id | name | mail |
|---------|-----------|-------------------------|
| 1 | Winston | winston@leetcode.com |
| 3 | Annabelle | bella-@leetcode.com |
| 4 | Sally | sally.come@leetcode.com |

**Explanation**:
- User 1's email is valid.
- User 2's email lacks a domain.
- User 3's email is valid.
- User 4's email is valid.
- User 5's email contains an invalid character (`#`).
- User 6's email does not have the correct domain (`@leetcode.com`).
- User 7's email starts with a period, making it invalid.

Solution

SELECT user_id, name, mail
FROM Users
WHERE mail REGEXP '^[a-zA-Z][a-zA-Z0-9._-]*@leetcode\\.com$';

To solve this problem, we need to select users whose email addresses match a
specific pattern:

The email prefix must:

Start with a letter.

Contain only letters, digits, underscores ( _ ), periods ( . ), or dashes ( - ).

The domain must be @leetcode.com .

We can achieve this using a REGEXP pattern in MySQL to enforce these conditions.

Regex Pattern: The regular expression ^[a-zA-Z][a-zA-Z0-9._-]*@leetcode\\.com$
is broken down as follows:

^ : Start of the string.

[a-zA-Z] : Ensures the first character in the prefix is a letter.

[a-zA-Z0-9._-]* : Allows any number of letters, digits, underscores, periods, or
dashes after the first character.

@leetcode\\.com : Ensures the email has the domain @leetcode.com . (MySQL's
string-literal parser consumes one backslash, so the regex engine receives \. ,
which matches a literal period rather than any character.)

$ : End of the string, so nothing may follow @leetcode.com .
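SQLite parses `X REGEXP Y` but ships no implementation; it calls a user function `regexp(Y, X)` if one is registered. The sketch below registers one backed by Python's `re` module and runs the pattern on sample rows modeled on the example (assumed addresses). Because SQLite string literals do not treat backslash as an escape, one backslash suffices here where MySQL needs two. Separately, in MySQL `REGEXP` may follow the column's collation, so with a case-insensitive collation a domain like `@LEETCODE.COM` could also match.

```python
import re
import sqlite3

def regexp(pattern, value):
    """SQLite evaluates `value REGEXP pattern` as regexp(pattern, value)."""
    if value is None:
        return None  # propagate SQL NULL
    return re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE Users (user_id INTEGER PRIMARY KEY, name TEXT, mail TEXT)")
conn.executemany("INSERT INTO Users VALUES (?, ?, ?)", [  # sample rows (assumed)
    (1, "Winston", "winston@leetcode.com"),
    (2, "Jonathan", "jonathanisgreat"),
    (3, "Annabelle", "bella-@leetcode.com"),
    (4, "Sally", "sally.come@leetcode.com"),
    (5, "Marwan", "quarz#2020@leetcode.com"),
    (6, "David", "david69@gmail.com"),
    (7, "Shapiro", ".shapo@leetcode.com"),
])

query = r"""
    SELECT user_id FROM Users
    WHERE mail REGEXP '^[a-zA-Z][a-zA-Z0-9._-]*@leetcode\.com$'
    ORDER BY user_id
"""
print([row[0] for row in conn.execute(query)])  # [1, 3, 4]
```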

Appendix

THE ULTIMATE ANSI SQL KEYWORDS CHEAT SHEET

Data Definition Language (DDL)

DDL (Data Definition Language) in SQL is used to define, modify, or remove
the structure of database objects like tables, indexes, and views. It includes
commands such as CREATE, ALTER, and DROP to create a new table, change an
existing one, or delete it. DDL is focused on managing the schema of the
database, which means it handles how data is stored rather than
manipulating the actual data itself. These operations typically happen at the
structural level, affecting how data can be organized and accessed.

CREATE
Definition: Creates a new database object such as a table, view, or index.
Usage:
CREATE TABLE employees (id INT, name VARCHAR(50), salary DECIMAL(10,2));

ALTER
Definition: Modifies an existing database object such as a table or view.
Usage:

ALTER TABLE employees ADD COLUMN department VARCHAR(50);

DROP
Definition: Deletes an existing database object such as a table, view, or
index.
Usage:

DROP TABLE employees;

TRUNCATE
Definition: Removes all rows from a table without logging individual row
deletions.
Usage:

TRUNCATE TABLE employees;


COMMENT
Definition: Adds a comment to a database object.
Usage:

COMMENT ON TABLE employees IS 'Table storing employee details';

RENAME
Definition: Renames a database object such as a table or column.
Usage:

ALTER TABLE employees RENAME COLUMN name TO employee_name;

Data Manipulation Language (DML)

DML (Data Manipulation Language) in SQL is used to interact with and
modify the data within a database. It includes commands like INSERT,
UPDATE, DELETE, and SELECT to add, change, remove, or retrieve data from
tables. Unlike DDL, which focuses on the structure of the database, DML is
concerned with the actual data stored in the tables. These operations are
essential for managing and working with the data in a database without
altering its schema or structure.

SELECT
Definition: Retrieves data from one or more tables.
Usage:
SELECT name, salary FROM employees WHERE department = 'HR';

INSERT
Definition: Inserts new data into a table.
Usage:

INSERT INTO employees (id, name, salary) VALUES (1, 'John Doe', 50000);

UPDATE
Definition: Updates existing data in a table.
Usage:

UPDATE employees SET salary = 60000 WHERE id = 1;

DELETE
Definition: Removes data from a table.
Usage:

DELETE FROM employees WHERE id = 1;

MERGE
Definition: Combines insert, update, and delete operations in a single
statement.
Usage:

MERGE INTO employees AS e
USING (SELECT 1 AS id) AS s ON (e.id = s.id)
WHEN MATCHED THEN UPDATE SET salary = 60000
WHEN NOT MATCHED THEN INSERT (id, salary) VALUES (1, 50000);
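SQLite has no `MERGE` statement, so the sketch below swaps in SQLite's upsert syntax, `INSERT ... ON CONFLICT ... DO UPDATE` (PostgreSQL supports the same clause), to show the `WHEN NOT MATCHED` and `WHEN MATCHED` branches of the example in action. It needs SQLite 3.24 or later and does not model MERGE's delete branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, salary INTEGER)")

# Insert id 1 if absent; otherwise update its salary (an upsert, not MERGE).
upsert = """
INSERT INTO employees (id, salary) VALUES (1, 50000)
ON CONFLICT (id) DO UPDATE SET salary = 60000
"""
conn.execute(upsert)  # no row with id 1 yet: acts like WHEN NOT MATCHED -> INSERT
print(conn.execute("SELECT id, salary FROM employees").fetchall())  # [(1, 50000)]
conn.execute(upsert)  # id 1 now exists: acts like WHEN MATCHED -> UPDATE
print(conn.execute("SELECT id, salary FROM employees").fetchall())  # [(1, 60000)]
```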

CALL
Definition: Executes a stored procedure.
Usage:

CALL calculate_bonus(1, 500);

EXPLAIN PLAN
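Definition: Generates the execution plan of a SQL statement. EXPLAIN PLAN FOR
is Oracle syntax, which stores the plan in a plan table; MySQL and PostgreSQL
use EXPLAIN instead.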
Usage:

EXPLAIN PLAN FOR SELECT * FROM employees;

LOCK TABLE
Definition: Locks a table to prevent other transactions from modifying it.
Usage:
LOCK TABLE employees IN EXCLUSIVE MODE;

Data Control Language (DCL)

DCL (Data Control Language) in SQL is used to manage access and control
permissions within a database. It includes commands like GRANT and REVOKE
to give or take away user privileges for actions such as querying, inserting,
updating, or deleting data. DCL ensures that users have the appropriate level
of access to the database, allowing administrators to enforce security and
control who can perform specific operations on the data or the structure of
the database.

GRANT
Definition: Gives privileges to users or roles.
Usage:

GRANT SELECT ON employees TO user1;

REVOKE
Definition: Removes privileges from users or roles.
Usage:

REVOKE SELECT ON employees FROM user1;


Transaction Control Language (TCL)

TCL (Transaction Control Language) in SQL is used to manage transactions
within a database. It includes commands like COMMIT, ROLLBACK, and
SAVEPOINT to control the changes made by DML operations. COMMIT saves all
changes permanently, while ROLLBACK undoes changes made since the last
commit, and SAVEPOINT sets a point within a transaction that can be rolled
back to if needed. TCL helps ensure data integrity by allowing you to group
multiple operations into a single transaction and control when changes are
finalized or undone.

COMMIT
Definition: Saves all changes made in the current transaction.
Usage:

COMMIT;

ROLLBACK
Definition: Undoes all changes made in the current transaction.
Usage:

ROLLBACK;

SAVEPOINT
Definition: Sets a point in a transaction to which you can later roll back.
Usage:
SAVEPOINT sp1;

SET TRANSACTION
Definition: Sets the characteristics of the current transaction, such as
isolation level.
Usage:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
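To see COMMIT, SAVEPOINT, and a partial rollback working together, here is a small sketch on an in-memory SQLite database via Python's `sqlite3`. It opens the connection with `isolation_level=None` so the module does not start transactions on its own and the TCL statements run exactly as written; the table and values are illustrative.

```python
import sqlite3

# isolation_level=None: autocommit mode, so BEGIN/SAVEPOINT/ROLLBACK/COMMIT
# below are the only transaction control in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, salary INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO employees VALUES (1, 50000)")
conn.execute("SAVEPOINT sp1")
conn.execute("INSERT INTO employees VALUES (2, 55000)")
conn.execute("ROLLBACK TO sp1")  # undoes only the work done after sp1
conn.execute("COMMIT")           # makes the first insert permanent

print(conn.execute("SELECT id FROM employees").fetchall())  # [(1,)]
```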

Clauses

Clauses in SQL are keywords used to specify conditions and modify queries
to retrieve or manipulate data in a more refined way. Common clauses
include WHERE to filter rows, GROUP BY to group rows based on a column,
ORDER BY to sort results, and HAVING to filter groups. Clauses work in
combination with SQL commands like SELECT , UPDATE , and DELETE to control
how data is selected, updated, or removed. They allow you to narrow down
the dataset or apply specific rules for processing the data.

FROM
Definition: Specifies the table from which to retrieve data.
Usage:

SELECT * FROM employees;


WHERE
Definition: Filters records based on a condition.
Usage:

SELECT * FROM employees WHERE salary > 50000;

GROUP BY
Definition: Groups rows sharing a property so aggregate functions can be
applied to each group.
Usage:

SELECT department, AVG(salary) FROM employees GROUP BY department;

HAVING
Definition: Filters groups based on a condition, typically used with GROUP
BY.
Usage:

SELECT department, AVG(salary) FROM employees
GROUP BY department HAVING AVG(salary) > 50000;

ORDER BY
Definition: Sorts the result set in ascending or descending order.
Usage:
SELECT * FROM employees ORDER BY salary DESC;

LIMIT
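Definition: Limits the number of rows returned. LIMIT is supported by MySQL,
PostgreSQL, and SQLite; the ANSI standard form is FETCH FIRST n ROWS ONLY.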
Usage:

SELECT * FROM employees LIMIT 10;

OFFSET
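Definition: Skips a specific number of rows before returning the result set.
Pair it with ORDER BY so the skipped rows are well defined; MySQL also requires
it to follow LIMIT.
Usage:

SELECT * FROM employees ORDER BY id LIMIT 10 OFFSET 5;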

UNION
Definition: Combines the result sets of two or more SELECT queries,
excluding duplicates.
Usage:

SELECT name FROM employees UNION SELECT name FROM contractors;


UNION ALL
Definition: Combines the result sets of two or more SELECT queries,
including duplicates.
Usage:

SELECT name FROM employees UNION ALL SELECT name FROM contractors;

INTERSECT
Definition: Returns only the rows that are common to the result sets of two
SELECT queries.
Usage:

SELECT name FROM employees INTERSECT SELECT name FROM contractors;

EXCEPT
Definition: Returns rows from the first SELECT query that are not present in
the second SELECT query.
Usage:

SELECT name FROM employees EXCEPT SELECT name FROM contractors;
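The four set operators differ mainly in how they treat duplicates. The sketch below runs each one on two tiny tables in an in-memory SQLite database (Python `sqlite3`); the names are made-up sample data, and "Bob" is deliberately duplicated in `employees`. `UNION`, `INTERSECT`, and `EXCEPT` return distinct rows, while `UNION ALL` keeps every copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT)")
conn.execute("CREATE TABLE contractors (name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?)", [("Ann",), ("Bob",), ("Bob",)])
conn.executemany("INSERT INTO contractors VALUES (?)", [("Bob",), ("Cy",)])

def run(op):
    """Combine the two name lists with the given set operator, sorted by name."""
    sql = f"SELECT name FROM employees {op} SELECT name FROM contractors ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

print(run("UNION"))      # ['Ann', 'Bob', 'Cy']
print(run("UNION ALL"))  # ['Ann', 'Bob', 'Bob', 'Bob', 'Cy']
print(run("INTERSECT"))  # ['Bob']
print(run("EXCEPT"))     # ['Ann']
```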

JOIN
Definition: Combines rows from two or more tables based on a related
column.
Usage (INNER JOIN):

SELECT e.name, d.department_name FROM employees e
INNER JOIN departments d ON e.department_id = d.id;

ON
Definition: Specifies the condition for a join operation between tables.
Usage:

SELECT e.name, d.department_name FROM employees e
INNER JOIN departments d ON e.department_id = d.id;

USING
Definition: Specifies the column to be used for a join between tables that
have the same column name.
Usage:

SELECT e.name, d.department_name FROM employees e
INNER JOIN departments d USING (department_id);

Other Keywords

DISTINCT
Definition: Removes duplicate rows from the result set.
Usage:

SELECT DISTINCT department FROM employees;

LIKE
Definition: Performs pattern matching in string searches using wildcards:
% matches any sequence of characters and _ matches exactly one character.
Usage:

SELECT * FROM employees WHERE name LIKE 'J%';

ALL
Definition: Returns all rows in the result set, including duplicates (used with
SELECT or in combination with set operators like UNION).
Usage:

SELECT ALL name FROM employees;


AND
Definition: Combines two or more conditions in a query, all of which must
be true.
Usage:

SELECT * FROM employees WHERE salary > 50000 AND department = 'HR';

OR
Definition: Combines two or more conditions in a query, where at least one
must be true.
Usage:

SELECT * FROM employees WHERE department = 'HR' OR salary > 50000;

NOT
Definition: Reverses the result of a condition.
Usage:

SELECT * FROM employees WHERE NOT department = 'HR';

IN
Definition: Filters records based on a list of values.
Usage:
SELECT * FROM employees WHERE department IN ('HR', 'Finance', 'IT');

BETWEEN
Definition: Filters records within a specific range; both endpoints are included.
Usage:

SELECT * FROM employees WHERE salary BETWEEN 50000 AND 100000;

IS NULL
Definition: Checks if a column contains NULL values.
Usage:

SELECT * FROM employees WHERE department IS NULL;

IS NOT NULL
Definition: Checks if a column does not contain NULL values.
Usage:

SELECT * FROM employees WHERE department IS NOT NULL;

AS
Definition: Renames a column or table in the result set.
Usage:

SELECT name AS employee_name, salary AS employee_salary FROM employees;

CASE
Definition: Provides conditional logic in a query, similar to IF-THEN-ELSE.
Usage:

SELECT name,
CASE WHEN salary > 50000 THEN 'High' ELSE 'Low' END AS salary_category
FROM employees;

WHEN
Definition: Specifies a condition for use in a CASE statement.
Usage:

SELECT name,
CASE
WHEN salary > 50000 THEN 'High'
WHEN salary > 30000 THEN 'Medium'
ELSE 'Low' END AS salary_category
FROM employees;

THEN
Definition: Defines the result to return if a condition in a CASE statement is
true.
Usage:

SELECT name,
CASE WHEN salary > 50000 THEN 'High' ELSE 'Low' END AS salary_category
FROM employees;

ELSE
Definition: Specifies the result if none of the conditions in a CASE statement
are true.
Usage:

SELECT name,
CASE WHEN salary > 50000 THEN 'High' ELSE 'Low' END AS salary_category
FROM employees;

END
Definition: Marks the end of a CASE statement.
Usage:
SELECT name,
CASE WHEN salary > 50000 THEN 'High' ELSE 'Low' END AS salary_category
FROM employees;
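The CASE, WHEN, THEN, ELSE, and END entries above all describe one expression. The sketch below runs the three-bucket version on sample rows in an in-memory SQLite database (Python `sqlite3`). It shows two behaviours worth knowing: WHEN branches are tested in order and the first true one wins, and a NULL salary makes every comparison unknown, so it falls through to ELSE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",  # sample rows (assumed)
                 [("Ann", 70000), ("Bob", 40000), ("Cy", 30000), ("Di", None)])

query = """
SELECT name,
       CASE
           WHEN salary > 50000 THEN 'High'    -- checked first
           WHEN salary > 30000 THEN 'Medium'  -- only reached if the first is not true
           ELSE 'Low'                         -- also catches NULL salaries
       END AS salary_category
FROM employees
ORDER BY name
"""
print(conn.execute(query).fetchall())
# [('Ann', 'High'), ('Bob', 'Medium'), ('Cy', 'Low'), ('Di', 'Low')]
```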

EXISTS
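Definition: Tests for the existence of any rows in a subquery. The subquery is
usually correlated with the outer query, as below, so the test is made per row.
Usage:

SELECT * FROM employees e
WHERE EXISTS (SELECT 1 FROM departments d
WHERE d.id = e.department_id AND d.department_name = 'HR');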

ANY
Definition: Compares a value to any value in a subquery or list.
Usage:

SELECT * FROM employees
WHERE salary > ANY (SELECT salary FROM employees WHERE department = 'HR');

SOME
Definition: A synonym for ANY; the comparison is true if it holds for at least
one value returned by the subquery.
Usage:

SELECT * FROM employees
WHERE salary > SOME (SELECT salary FROM employees WHERE department = 'HR');

DEFAULT
Definition: Specifies a default value for a column when no value is provided.
Usage:

CREATE TABLE employees
(id INT, name VARCHAR(50), salary DECIMAL(10,2) DEFAULT 30000);

CHECK
Definition: Adds a condition that data must satisfy before being inserted into
or updated in a table.
Usage:

CREATE TABLE employees
(id INT, name VARCHAR(50), salary DECIMAL(10,2), CHECK (salary > 0));

PRIMARY KEY
Definition: Defines one or more columns as a unique identifier for each row.
Usage:

CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(50));


FOREIGN KEY
Definition: Establishes a relationship between two tables.
Usage:

CREATE TABLE employees (id INT, department_id INT, FOREIGN KEY (department_id)
REFERENCES departments(id));

UNIQUE
Definition: Ensures that all values in a column or group of columns are
unique.
Usage:

CREATE TABLE employees (id INT, email VARCHAR(100) UNIQUE);

INDEX
Definition: Creates an index on a table to improve query performance.
Usage:

CREATE INDEX idx_salary ON employees(salary);

VIEW
Definition: Creates a virtual table based on the result of a SELECT query.
Usage:
CREATE VIEW high_salary_employees AS
SELECT * FROM employees WHERE salary > 60000;

PROCEDURE
Definition: Defines a stored procedure for reusable logic in the database.
Usage:

CREATE PROCEDURE raise_salary(employee_id INT, increment DECIMAL(10,2))
BEGIN UPDATE employees SET salary = salary + increment WHERE id = employee_id;
END;

FUNCTION
Definition: Creates a user-defined function that returns a value.
Usage:

CREATE FUNCTION get_employee_name(employee_id INT) RETURNS VARCHAR(50)
BEGIN RETURN (SELECT name FROM employees WHERE id = employee_id);
END;

TRIGGER
Definition: Executes a specified action automatically in response to certain
events on a table or view.
Usage:
CREATE TRIGGER before_employee_insert BEFORE INSERT ON employees
FOR EACH ROW SET NEW.salary = GREATEST(NEW.salary, 30000);

CASCADE
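Definition: As a foreign-key action (ON DELETE CASCADE / ON UPDATE CASCADE),
specifies that related rows in other tables should also be deleted or updated
when the referenced row is deleted or updated. (In DROP statements, CASCADE
instead drops dependent objects as well.)
Usage:

ALTER TABLE employees ADD CONSTRAINT fk_department
FOREIGN KEY (department_id) REFERENCES departments(id) ON DELETE CASCADE;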

RESTRICT
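Definition: As a foreign-key action (ON DELETE RESTRICT / ON UPDATE RESTRICT),
prevents the deletion or update of a row if related rows exist in other tables.
Usage:

ALTER TABLE employees ADD CONSTRAINT fk_department
FOREIGN KEY (department_id) REFERENCES departments(id) ON DELETE RESTRICT;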
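The difference between the two foreign-key actions is easiest to see side by side. The sketch below builds a hypothetical schema in an in-memory SQLite database (Python `sqlite3`): `employees` references `departments` with `ON DELETE CASCADE`, and a made-up `budgets` table references it with `ON DELETE RESTRICT`. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    department_id INTEGER REFERENCES departments(id) ON DELETE CASCADE)""")
conn.execute("""CREATE TABLE budgets (  -- hypothetical table for the RESTRICT case
    department_id INTEGER REFERENCES departments(id) ON DELETE RESTRICT,
    amount INTEGER)""")

conn.executemany("INSERT INTO departments VALUES (?, ?)", [(1, "HR"), (2, "IT")])
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])
conn.execute("INSERT INTO budgets VALUES (2, 90000)")

# CASCADE: deleting department 1 also deletes employees 10 and 11.
conn.execute("DELETE FROM departments WHERE id = 1")
print(conn.execute("SELECT id FROM employees").fetchall())  # [(12,)]

# RESTRICT: a budget row still refers to department 2, so the delete fails
# and the whole statement (including its cascade to employee 12) is undone.
blocked = False
try:
    conn.execute("DELETE FROM departments WHERE id = 2")
except sqlite3.IntegrityError as exc:
    blocked = True
    print("blocked:", exc)
```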

WITH
Definition: Defines a Common Table Expression (CTE) for use within a
query.
Usage:

WITH high_salaries AS (SELECT * FROM employees WHERE salary > 50000)
SELECT * FROM high_salaries;

Built-in Functions

Built-in functions in SQL are predefined functions provided by the database
system to perform various operations on data. These functions can be
categorized into different types like aggregate functions (e.g., SUM, COUNT,
AVG), string functions (e.g., CONCAT, SUBSTRING), date functions (e.g., NOW,
DATEADD), and mathematical functions (e.g., ROUND, ABS). They simplify
complex calculations or data manipulations, allowing users to process and
transform data easily without writing custom code. Built-in functions
enhance SQL queries by enabling operations like summarizing, formatting,
or transforming data directly within the query.

Aggregate Functions

Aggregate functions in SQL are used to perform calculations on a set of
values and return a single result. Common aggregate functions include SUM
to calculate the total, COUNT to count the number of rows, AVG to find the
average, MAX to get the highest value, and MIN to get the lowest. These
functions are often used with the GROUP BY clause to group data and apply
the function to each group, summarizing large datasets into meaningful
insights, such as total sales per region or average salary by department.

AVG()
Definition: Returns the average value of a numeric column.
Usage:
SELECT AVG(salary) FROM employees;

COUNT()
Definition: Returns the number of rows that match a specified condition.
Usage:

SELECT COUNT(*) FROM employees WHERE department = 'HR';

MAX()
Definition: Returns the maximum value in a column.
Usage:

SELECT MAX(salary) FROM employees;

MIN()
Definition: Returns the minimum value in a column.
Usage:

SELECT MIN(salary) FROM employees;

SUM()
Definition: Returns the total sum of a numeric column.
Usage:

SELECT SUM(salary) FROM employees WHERE department = 'HR';
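One detail the definitions above leave implicit is how aggregates treat NULL. The sketch below (Python `sqlite3`, in-memory database, sample rows) gives one HR employee a NULL salary: `COUNT(*)` counts every row, while `COUNT(salary)`, `SUM`, `AVG`, `MIN`, and `MAX` skip the NULL, so the average divides by 2 rather than 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [  # sample rows (assumed)
    ("Ann", "HR", 50000), ("Bob", "HR", 70000), ("Cy", "HR", None), ("Di", "IT", 90000),
])

row = conn.execute("""
    SELECT COUNT(*), COUNT(salary), SUM(salary), AVG(salary), MIN(salary), MAX(salary)
    FROM employees
    WHERE department = 'HR'
""").fetchone()
print(row)  # (3, 2, 120000, 60000.0, 50000, 70000)
```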

Numeric Functions

Numeric functions in SQL are built-in functions that perform operations on
numeric data. They include functions like ROUND to round a number to a
specified number of decimal places, ABS to return the absolute value, CEIL
or FLOOR to round a number up or down, POWER to raise a number to a power,
and MOD to return the remainder of a division. These functions are useful for
manipulating numeric data in calculations, helping to refine or transform
numbers in queries for tasks like rounding, finding remainders, or
performing more complex mathematical operations.

ABS()
Definition: Returns the absolute value of a number.
Usage:

SELECT ABS(-10);

ROUND()
Definition: Rounds a number to a specified number of decimal places.
Usage:
SELECT ROUND(123.456, 2);

CEIL()
Definition: Returns the smallest integer greater than or equal to a number.
Usage:

SELECT CEIL(4.2);

FLOOR()
Definition: Returns the largest integer less than or equal to a number.
Usage:

SELECT FLOOR(4.9);

EXP()
Definition: Returns e raised to the power of a given number.
Usage:

SELECT EXP(1);

LN()
Definition: Returns the natural logarithm of a number.
Usage:

SELECT LN(2.718);

LOG()
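Definition: Returns the logarithm of a number to a specified base. In MySQL and
PostgreSQL the base comes first, so LOG(10, 100) returns 2; SQL Server's
LOG(x, base) takes the arguments in the opposite order.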
Usage:

SELECT LOG(10, 100);

MOD()
Definition: Returns the remainder of the division of two numbers.
Usage:

SELECT MOD(10, 3);

POWER()
Definition: Returns a number raised to the power of another number.
Usage:

SELECT POWER(2, 3);


SIGN()
Definition: Returns the sign of a number (-1, 0, or 1).
Usage:

SELECT SIGN(-10);

SQRT()
Definition: Returns the square root of a number.
Usage:

SELECT SQRT(16);

TRUNC()
Definition: Truncates a number to a specified number of decimal places
(Oracle and PostgreSQL; MySQL calls this TRUNCATE(x, d)).
Usage:

SELECT TRUNC(123.456, 2);
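FLOOR, CEIL, TRUNC, and MOD are easiest to tell apart on negative inputs. This sketch is plain Python rather than SQL: it uses standard `math` functions as stand-ins whose rounding direction matches the SQL functions in common engines such as MySQL and PostgreSQL (SQLite does not always include these math functions, hence the stand-ins). The last line shows that Python's own `%` is not a drop-in replacement for SQL `MOD`.

```python
import math

x = -4.2
print(math.floor(x))      # -5: like FLOOR, rounds toward negative infinity
print(math.ceil(x))       # -4: like CEIL, rounds toward positive infinity
print(math.trunc(x))      # -4: like TRUNC with 0 decimals, rounds toward zero
print(math.fmod(-10, 3))  # -1.0: like SQL MOD(-10, 3), sign follows the dividend
print(-10 % 3)            # 2: Python's % follows the divisor's sign instead
```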

String Functions

String functions in SQL are used to perform operations on text data (strings).
Common string functions include CONCAT to join two or more strings,
SUBSTRING to extract part of a string, LENGTH to get the length of a string,
UPPER or LOWER to convert a string to uppercase or lowercase, and TRIM to
remove whitespace from the beginning or end of a string. These functions
help manipulate text in various ways, such as formatting names, extracting
specific parts of data, or cleaning up strings for better data consistency.

CONCAT()
Definition: Concatenates two or more strings into one.
Usage:

SELECT CONCAT('First', ' ', 'Last');
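CONCAT and the standard `||` operator treat NULL differently. In PostgreSQL, sketched below, CONCAT skips NULL arguments while `||` returns NULL. MySQL's CONCAT instead returns NULL if any argument is NULL. MySQL also treats `||` as logical OR unless the PIPES_AS_CONCAT SQL mode is enabled.

```sql
-- PostgreSQL: CONCAT skips NULL arguments, while || propagates NULL
SELECT CONCAT('First', NULL, 'Last') AS concat_result,  -- 'FirstLast'
       'First' || NULL || 'Last'     AS pipe_result;    -- NULL
```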

LENGTH()
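Definition: Returns the length of a string. PostgreSQL counts characters; MySQL's LENGTH counts bytes (use CHAR_LENGTH to count characters), and SQL Server calls the function LEN.
Usage:

SELECT LENGTH('hello');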

LOWER()
Definition: Converts a string to lowercase.
Usage:

SELECT LOWER('HELLO');

UPPER()
Definition: Converts a string to uppercase.
Usage:

SELECT UPPER('hello');

LTRIM()
Definition: Removes leading spaces from a string.
Usage:

SELECT LTRIM(' hello');

RTRIM()
Definition: Removes trailing spaces from a string.
Usage:

SELECT RTRIM('hello ');

TRIM()
Definition: Removes leading and trailing spaces from a string.
Usage:

SELECT TRIM(' hello ');


SUBSTRING()
Definition: Extracts a substring from a string, given a 1-based start position and a length.
Usage:

SELECT SUBSTRING('hello world', 1, 5);

REPLACE()
Definition: Replaces occurrences of a substring in a string with another
substring.
Usage:

SELECT REPLACE('hello world', 'world', 'everyone');

POSITION()
Definition: Returns the 1-based position of a substring within a string, or 0 if it is not found.
Usage:

SELECT POSITION('world' IN 'hello world');
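SUBSTRING and POSITION combine naturally, for example to pull the domain out of an email address. This is a PostgreSQL sketch, and the inline addresses are hypothetical.

```sql
-- Extract the domain from hypothetical inline email addresses
WITH users(email) AS (
  VALUES ('jane@example.com'), ('raj@mail.example.org')
)
SELECT email,
       SUBSTRING(email FROM POSITION('@' IN email) + 1) AS domain
FROM users;
-- jane@example.com     | example.com
-- raj@mail.example.org | mail.example.org
```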

Date and Time Functions

Date and time functions in SQL are used to perform operations on date and
time values, allowing for various manipulations and calculations. Common
functions include NOW() to retrieve the current date and time, DATEADD (SQL
Server) or DATE_ADD (MySQL) to add a specified interval to a date, DATEDIFF to
calculate the difference between two dates, EXTRACT to retrieve specific parts
of a date (like year or month), and FORMAT (SQL Server) or DATE_FORMAT (MySQL)
to display dates in a specified format. Names and argument orders vary
considerably between databases; the DATE_ADD, DATE_SUB, DATEDIFF, and
DATE_FORMAT examples below use MySQL syntax. These functions are essential for
handling and analyzing temporal data, enabling tasks like calculating age,
finding time intervals, and formatting dates for presentation in reports.

CURRENT_DATE
Definition: Returns the current date.
Usage:

SELECT CURRENT_DATE;

CURRENT_TIME
Definition: Returns the current time.
Usage:

SELECT CURRENT_TIME;

CURRENT_TIMESTAMP
Definition: Returns the current date and time.
Usage:

SELECT CURRENT_TIMESTAMP;

EXTRACT()
Definition: Extracts a specific part of a date (e.g., year, month, day).
Usage:

SELECT EXTRACT(YEAR FROM CURRENT_DATE);
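EXTRACT is most useful inside GROUP BY, for example to count orders per month. This is a PostgreSQL sketch, and the inline `orders` table is hypothetical.

```sql
-- Count orders per year and month (hypothetical inline sample data)
WITH orders(order_id, order_date) AS (
  VALUES (1, DATE '2023-01-15'),
         (2, DATE '2023-01-28'),
         (3, DATE '2023-02-03')
)
SELECT EXTRACT(YEAR FROM order_date)  AS order_year,
       EXTRACT(MONTH FROM order_date) AS order_month,
       COUNT(*)                       AS order_count
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
ORDER BY order_year, order_month;
-- 2023 | 1 | 2
-- 2023 | 2 | 1
```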

DATE_ADD()
Definition: Adds a specified interval to a date (MySQL).
Usage:

SELECT DATE_ADD(CURRENT_DATE, INTERVAL 7 DAY);

DATE_SUB()
Definition: Subtracts a specified interval from a date (MySQL).
Usage:

SELECT DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY);
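DATE_ADD and DATE_SUB are MySQL functions. Standard SQL, as implemented by PostgreSQL, adds and subtracts intervals with ordinary arithmetic operators. A sketch assuming PostgreSQL:

```sql
-- PostgreSQL: interval arithmetic instead of DATE_ADD / DATE_SUB
SELECT CURRENT_DATE + INTERVAL '7' DAY AS next_week,       -- date + interval gives a timestamp
       CURRENT_DATE - INTERVAL '7' DAY AS last_week,       -- likewise a timestamp
       CURRENT_DATE + 7                AS next_week_date;  -- date + integer (days) stays a date
```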

DATEDIFF()
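Definition: Returns the number of days between two dates (in MySQL, the first date minus the second). SQL Server's DATEDIFF takes a date part first, for example DATEDIFF(day, start_date, end_date).
Usage:

SELECT DATEDIFF(CURRENT_DATE, '2023-01-01');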
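PostgreSQL has no DATEDIFF function. Subtracting one date from another returns the number of days as an integer. A minimal PostgreSQL sketch:

```sql
-- PostgreSQL: date - date returns an integer number of days
SELECT DATE '2023-12-31' - DATE '2023-01-01' AS days_in_2023_span,  -- 364
       DATE '2024-03-01' - DATE '2024-02-01' AS feb_2024_days;      -- 29 (leap year)
```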

DATE_FORMAT()
Definition: Formats a date according to a specified format string (MySQL).
Usage:

SELECT DATE_FORMAT(CURRENT_DATE, '%Y-%m-%d');
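PostgreSQL and Oracle format dates with TO_CHAR and template patterns such as YYYY, MM and DD, rather than MySQL's % codes. A PostgreSQL sketch:

```sql
-- PostgreSQL: TO_CHAR with template patterns
SELECT TO_CHAR(DATE '2023-07-04', 'YYYY-MM-DD')  AS iso_date,   -- '2023-07-04'
       TO_CHAR(DATE '2023-07-04', 'DD Mon YYYY') AS long_date;  -- '04 Jul 2023'
```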

Conversion Functions

Conversion functions in SQL are used to change data from one type to
another, ensuring that data is in the correct format for processing or
analysis. Common conversion functions include CAST and CONVERT, which allow
you to convert data types, such as converting a string to an integer or a date
to a string. Other functions, like TO_CHAR and TO_DATE (available in PostgreSQL
and Oracle), format date values as strings or convert strings into dates,
respectively. These functions are crucial for data integrity, enabling the
proper handling of different data types in queries, calculations, and
comparisons.

CAST()
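Definition: Converts a value from one data type to another. This is the ANSI-standard form; MySQL does not accept INT as a target type and uses SIGNED or UNSIGNED instead.
Usage:

SELECT CAST('123' AS INT);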


CONVERT()
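Definition: Converts a value from one data type to another (similar to
CAST()), but the argument order depends on the database: SQL Server puts the
target type first, while MySQL puts the value first and uses SIGNED for integers.
Usage:

SELECT CONVERT(INT, '123');     -- SQL Server
SELECT CONVERT('123', SIGNED);  -- MySQL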
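The sketch below, assuming PostgreSQL, covers the remaining conversion cases. It parses a string with an explicit format via TO_DATE and casts ISO-formatted text and numeric strings with CAST. It also shows PostgreSQL's non-standard `::` shorthand.

```sql
-- PostgreSQL: explicit-format parsing vs. plain casts
SELECT TO_DATE('05 Dec 2000', 'DD Mon YYYY') AS parsed_date,   -- 2000-12-05
       CAST('2000-12-05' AS DATE)            AS cast_date,     -- 2000-12-05
       CAST('19.99' AS NUMERIC(5, 2))        AS price,         -- 19.99
       '123'::INTEGER                        AS pg_shorthand;  -- 123 (PostgreSQL-only syntax)
```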

Conditional Functions
COALESCE()
Definition: Returns the first non-null value in a list of expressions.
Usage:

SELECT COALESCE(NULL, 'hello', 'world');
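A typical COALESCE use is choosing the first available value across several nullable columns. This is a PostgreSQL sketch, and the `customers` rows are hypothetical inline data.

```sql
-- Pick the first available contact detail per customer (hypothetical data)
WITH customers(name, phone, email) AS (
  VALUES ('Ada',   '555-0100', 'ada@example.com'),
         ('Grace', NULL,       'grace@example.com'),
         ('Alan',  NULL,       NULL)
)
SELECT name,
       COALESCE(phone, email, 'no contact') AS contact
FROM customers;
-- Ada   | 555-0100
-- Grace | grace@example.com
-- Alan  | no contact
```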

NULLIF()
Definition: Returns NULL if two expressions are equal, otherwise returns the
first expression.
Usage:

SELECT NULLIF(10, 10);
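NULLIF is commonly paired with division. Turning a zero divisor into NULL makes the expression return NULL instead of raising a division-by-zero error. A PostgreSQL sketch with hypothetical inline sales figures:

```sql
-- NULLIF turns a zero divisor into NULL; dividing by NULL yields NULL, not an error
WITH sales(region, revenue, order_count) AS (
  VALUES ('North', 1000.00, 4),
         ('South',    0.00, 0)
)
SELECT region,
       revenue / NULLIF(order_count, 0) AS avg_order_value
FROM sales;
-- North: 250, South: NULL
```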

GREATEST()
Definition: Returns the greatest value from a list of expressions.
Usage:

SELECT GREATEST(10, 20, 30);

LEAST()
Definition: Returns the least value from a list of expressions.
Usage:

SELECT LEAST(10, 20, 30);
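Nesting GREATEST and LEAST clamps a value into a range. NULL handling differs between databases: PostgreSQL ignores NULL arguments, whereas MySQL returns NULL if any argument is NULL. A PostgreSQL sketch with hypothetical scores:

```sql
-- Clamp scores into the 0-100 range (hypothetical inline data)
WITH results(student, score) AS (
  VALUES ('A', -5), ('B', 72), ('C', 130)
)
SELECT student,
       GREATEST(0, LEAST(score, 100)) AS clamped_score
FROM results;
-- A | 0
-- B | 72
-- C | 100
```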

System Functions

System functions in SQL are built-in functions that provide information
about the database system or the environment in which SQL is running.
They can be used to retrieve metadata, perform administrative tasks, or
obtain system-specific information. Common system functions include USER
to return the current database user, CURRENT_DATABASE() (PostgreSQL) to get
the name of the active database, and VERSION() (MySQL and PostgreSQL) to
retrieve the version of the database management system. These functions are
useful for auditing, troubleshooting, and optimizing database operations,
helping users understand the context and configuration of their SQL
environment.

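USER()
Definition: Returns the user name and host name that the client supplied when connecting (MySQL).
Usage:

SELECT USER();

CURRENT_USER()
Definition: Returns the account (user name and host) that the server actually used to authenticate the current client (MySQL). This can differ from USER(), for example when an anonymous account matched. Standard SQL and PostgreSQL write it without parentheses: CURRENT_USER.
Usage:

SELECT CURRENT_USER();

SESSION_USER()
Definition: In MySQL, a synonym for USER(). In standard SQL and PostgreSQL, SESSION_USER (without parentheses) returns the user who opened the session, which stays fixed when SET ROLE changes CURRENT_USER.
Usage:

SELECT SESSION_USER();

SYSTEM_USER()
Definition: In MySQL, also a synonym for USER().
Usage:

SELECT SYSTEM_USER();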
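In PostgreSQL, the equivalent checks use the standard parenthesis-free forms alongside CURRENT_DATABASE() and VERSION(). A minimal sketch, assuming a PostgreSQL session:

```sql
-- PostgreSQL: session and server information
SELECT CURRENT_USER       AS current_user_name,  -- role used for permission checks
       SESSION_USER       AS session_user_name,  -- role that opened the session
       CURRENT_DATABASE() AS database_name,
       VERSION()          AS server_version;     -- e.g. 'PostgreSQL 16.x on ...'
```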

Window Functions

Window functions in SQL allow you to perform calculations across a set of
table rows that are related to the current row, without collapsing the result
into a single row like aggregate functions do. They operate over a “window”
of rows, which can be defined by the PARTITION BY and ORDER BY clauses,
enabling you to calculate values like running totals, ranks, or moving
averages while keeping all rows intact. This makes window functions useful
for tasks where you want to maintain the full dataset but also need additional
computed values.

ROW_NUMBER()
Definition: Assigns a sequential number, starting at 1, to each row according to the window's ORDER BY; tied rows still receive distinct numbers.
Usage:

SELECT name, ROW_NUMBER() OVER (ORDER BY salary) FROM employees;

RANK()
Definition: Assigns a rank to each row; tied rows share a rank and the following
rank is skipped, leaving gaps (1, 2, 2, 4).
Usage:

SELECT name, RANK() OVER (ORDER BY salary DESC) FROM employees;

DENSE_RANK()
Definition: Assigns a rank to each row; tied rows share a rank and the next rank follows without gaps (1, 2, 2, 3).
Usage:

SELECT name, DENSE_RANK() OVER (ORDER BY salary DESC) FROM employees;
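Running ROW_NUMBER, RANK and DENSE_RANK side by side over tied values makes their differences concrete. This is a PostgreSQL sketch, and the four employees are hypothetical inline data.

```sql
-- Compare the three ranking functions on tied salaries (hypothetical data)
WITH employees(name, salary) AS (
  VALUES ('Ana', 300), ('Ben', 200), ('Cai', 200), ('Dee', 100)
)
SELECT name,
       salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
       RANK()       OVER (ORDER BY salary DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees
ORDER BY salary DESC, name;
-- Ana | 300 | 1 | 1 | 1
-- Ben | 200 | 2 | 2 | 2   (Ben and Cai may swap row_num: ties have no guaranteed order)
-- Cai | 200 | 3 | 2 | 2
-- Dee | 100 | 4 | 4 | 3
```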


NTILE()
Definition: Distributes the result set into a specified number of roughly
equal groups.
Usage:

SELECT name, NTILE(4) OVER (ORDER BY salary DESC) FROM employees;

LAG()
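Definition: Returns the value from a row a given number of rows (default 1) before the current row in the window's ordering. When no such row exists it returns NULL, or the default supplied as an optional third argument.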
Usage:

SELECT name, LAG(salary, 1) OVER (ORDER BY salary) FROM employees;

LEAD()
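Definition: Returns the value from a row a given number of rows (default 1) after the current row in the window's ordering. When no such row exists it returns NULL, or the default supplied as an optional third argument.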
Usage:

SELECT name, LEAD(salary, 1) OVER (ORDER BY salary) FROM employees;
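LAG is often used to compute the change from the previous row. This PostgreSQL sketch assumes a hypothetical `hire_date` column, which the article's `employees` examples do not show.

```sql
-- Salary change compared with the previous hire (hypothetical inline data)
WITH employees(name, hire_date, salary) AS (
  VALUES ('Ana', DATE '2021-01-10', 50000),
         ('Ben', DATE '2022-03-01', 55000),
         ('Cai', DATE '2023-06-15', 53000)
)
SELECT name,
       salary,
       salary - LAG(salary) OVER (ORDER BY hire_date) AS change_from_prev
FROM employees;
-- Ana | 50000 |  NULL  (no previous row)
-- Ben | 55000 |  5000
-- Cai | 53000 | -2000
```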

FIRST_VALUE()
Definition: Returns the first value in the window frame (here, the lowest salary, because rows are ordered by salary).
Usage:

SELECT name, FIRST_VALUE(salary) OVER (ORDER BY salary) FROM employees;

LAST_VALUE()
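Definition: Returns the last value in the window frame. With ORDER BY alone, the default frame ends at the current row (and its ties), so an explicit frame is needed to get the last value of the whole result set.
Usage:

SELECT name, LAST_VALUE(salary) OVER (ORDER BY salary
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM employees;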

PARTITION BY
Definition: Divides the result set into partitions and applies a window
function to each partition. It is often used with window functions like
ROW_NUMBER(), RANK(), LAG(), etc.
Usage:

SELECT name,
       salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;
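PARTITION BY with ROW_NUMBER is the usual way to get the top N rows per group. Window functions cannot be referenced in WHERE, so the ranking goes in a CTE and the filter in the outer query. A PostgreSQL sketch with hypothetical inline data:

```sql
-- Keep the two highest-paid employees per department (hypothetical data)
WITH employees(name, department, salary) AS (
  VALUES ('Ana', 'Sales', 300), ('Ben', 'Sales', 250), ('Cai', 'Sales', 200),
         ('Dee', 'IT',    400), ('Eli', 'IT',    350)
),
ranked AS (
  SELECT name,
         department,
         salary,
         ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
  FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE salary_rank <= 2  -- filter on the window result in the outer query
ORDER BY department, salary DESC;
-- Dee | IT    | 400
-- Eli | IT    | 350
-- Ana | Sales | 300
-- Ben | Sales | 250
```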

Written by Ebo Jackson

IT / Telecom / Software / Data Professional
