
1000+ SQL Interview Questions & Answers | By Zero Analyst

1000+ SQL Interview Questions & Answers


By Zero Analyst(N.H.)

“Every query is a new opportunity to grow!”

A Note from the Author


This eBook is more than just a collection of questions; it’s a roadmap to success. I’ve poured
my experience, knowledge, and heart into creating a resource that simplifies your learning
journey. Whether you’re prepping for your first interview or aiming to advance your career, I
hope this guide brings you one step closer to your goals.
Good luck, and remember—every query you write takes you closer to mastery.

Warm Regards,
N.H.
Founder, Zero Analyst


Introduction

Copyright Information
© 2025 Zero Analyst(N.H.)
All rights reserved. No part of this eBook may be reproduced, distributed, or
transmitted in any form or by any means, including photocopying, recording, or other
electronic or mechanical methods, without the prior written permission of the author,
except in the case of brief quotations embodied in critical reviews and certain other
non-commercial uses permitted by copyright law.

ISBN: 9798306737812
Imprint: Independently published


Dedication
To all aspiring Data Analysts, Data Engineers, Business Analysts, SQL Developers,
and tech enthusiasts working tirelessly to secure their dream job—this book is for
you.


Acknowledgments

I want to express my deepest gratitude to my amazing students, followers, and connections who have made this journey possible. To my 1:1 students who inspire me
daily with their questions and growth, and to my network who contributed resources
and feedback to shape this guide—I owe so much to your support.
A heartfelt thank you to my community of learners who have trusted me to guide
them on their path to success. This eBook is a reflection of our shared efforts, and I
am forever grateful for the encouragement and collaboration.


About the Author


With over 5 years of hands-on experience as a Data Analyst, Business Analyst, and
Data Engineer, I’ve mentored hundreds of students, conducted SQL mock interviews,
and helped candidates land roles at top MNCs. As the founder of Zero Analyst, my
mission is to empower aspiring professionals with the knowledge and confidence to
excel in their careers.
This eBook is the culmination of my passion for teaching and my dedication to
simplifying SQL for learners at all levels.
📧 Contact [email protected]
🌐 www.zeroanalyst.com/sqlmentor


Who Is This Book For?


This eBook is designed for professionals and aspirants targeting roles such as:

• Data Analyst
• Data Engineer
• Business Analyst
• SQL Developer
• Full Stack Developer
• Cloud Data Engineer

Whether you’re a fresher or an experienced professional looking to sharpen your SQL skills, this book will prepare you for real-world challenges and help you ace your
interviews.


What to Expect
• 1000+ SQL Interview Questions & Answers: Covering fundamental,
intermediate, and advanced levels.
• Detailed Explanations: Understand not just the “how” but also the “why”
behind SQL concepts.
• Practical Scenarios: Real-world problems and solutions to bridge the gap
between theory and practice.
• Interview Tips: Insights to help you stand out in interviews and impress
hiring managers.


How to Use This Book



1. Start with Basics: If you’re new to SQL, go through the foundational questions and
explanations.
2. Practice Hands-On: For intermediate and advanced questions, implement solutions
in your preferred SQL environment.
3. Mock Interviews: Use the questions to simulate interview scenarios and improve
your confidence.
4. Revisit Regularly: SQL is a skill that sharpens with practice—make this book your
go-to reference.
5. Utilize the eBook: Access the eBook for interactive datasets and solutions, making
your practice more engaging and practical.


Index
Introduction - Page No.3

• Overview of SQL: Importance in interviews and career advancement.

Everything in One Place (All Links 🔗)

• Report a Question, Video Explanations, Goal Tracker.

SQL Questions by Difficulty Level - Page No.23

• Super Easy: - Page No.23


Beginner-level questions on SELECT, WHERE, and basic operations.
Range: Q.1 – Q.20
• Easy: - Page No.42
Slightly advanced queries with GROUP BY, ORDER BY, and basic JOINs.
Range: Q.21 – Q.40
• Medium: - Page No.64
Complex queries involving subqueries, multi-table joins, and aggregation.
Range: Q.41 – Q.70
• Hard: - Page No.102
Advanced topics like CTEs, window functions, and complex joins.
Range: Q.71 – Q.100

SQL Questions by Topic - Page No.162

• SELECT Statement: - Page No.159


Understanding SELECT and various use cases.
Range: Q.100 – Q.110
• COUNT: - Page No.169
Counting rows and values using COUNT.
Range: Q.111 – Q.120
• WHERE Clause: - Page No.176
Filtering data using WHERE conditions.
Range: Q.121 – Q.130
• GROUP BY: - Page No.183
Aggregating data for summaries.
Range: Q.130 – Q.140
• GROUP BY + HAVING: - Page No.194
Filtering aggregated results.
Range: Q.141 – Q.150


• ORDER BY: - Page No.206


Sorting data in ascending or descending order.
Range: Q.151 – Q.160
• JOINS: - Page No.216
Inner, Outer, Left, Right, and Self Joins.
Range: Q.161 – Q.170
• Subqueries: - Page No.231
Writing queries within queries.
Range: Q.171 – Q.180
• Common Table Expressions (CTEs): - Page No.245
Reusable query blocks for simplification.
Range: Q.181 – Q.190
• Window Functions: - Page No.257
ROW_NUMBER, RANK, DENSE_RANK, etc.
Range: Q.191 – Q.200
• String Functions: - Page No.268
CONCAT, SUBSTRING, LENGTH, etc.
Range: Q.200 – Q.210
• Date Functions: - Page No.279
DATEADD, DATEDIFF, CURRENT_DATE, etc.
Range: Q.211 – Q.220
• CASE Statements: - Page No.289
Conditional queries with CASE.
Range: Q.221 – Q.230
• Set Operations: - Page No.300
UNION, UNION ALL, EXCEPT, INTERSECT.
Range: Q.231 – Q.240
• Recursive CTE: - Page No.312
Recursive queries for hierarchical and sequential data.
Range: Q.241 – Q.250
• DDL (Data Definition Language): - Page No.322
CREATE, ALTER, DROP, etc.
Range: Q.251 – Q.260
• DML (Data Manipulation Language): - Page No.329
INSERT, UPDATE, DELETE operations.
Range: Q.261 – Q.270
• Data Cleaning: - Page No.338
Queries for cleaning and preparing data.
Range: Q.271 – Q.280

SQL Questions by Company - Page No.350

• Amazon: - Page No.350


Real-world data challenges inspired by Amazon.
Range: Q.281 – Q.300
• Google: - Page No.378
Focused on scalability and query efficiency.
Range: Q.301 – Q.320


• Walmart: - Page No.402


Retail-specific data queries.
Range: Q.321 – Q.340
• Flipkart: - Page No.435
E-commerce-related data handling.
Range: Q.341 – Q.360
• Spotify: - Page No.456
Music-streaming database queries.
Range: Q.361 – Q.380
• Airbnb: - Page No.476
Rental data analysis and queries.
Range: Q.381 – Q.400
• Microsoft: - Page No.499
Enterprise software and database management.
Range: Q.401 – Q.420
• Meta (Facebook): - Page No.522
Social media data insights.
Range: Q.421 – Q.440
• Netflix: - Page No.549
Content streaming and recommendation system queries.
Range: Q.441 – Q.460
• Uber: - Page No.576
Ride-hailing platform data scenarios.
Range: Q.461 – Q.480
• PayPal: - Page No.600
Payment transactions and data.
Range: Q.481 – Q.500
• PwC: - Page No.622
Accounting and employee-related data.
Range: Q.501 – Q.520
• Cisco: - Page No.645
Energy and manufacturing data problems.
Range: Q.521 – Q.540
• Zomato: - Page No.669
Restaurant data challenges.
Range: Q.541 – Q.560
• Swiggy: - Page No.698
Quick commerce and food delivery datasets.
Range: Q.561 – Q.580
• Tesla: - Page No.723
Electric car company data queries.
Range: Q.581 – Q.600
• TikTok: - Page No.754
Social media and video platform data.
Range: Q.601 – Q.620
• Apple: - Page No.781
Hardware and computer parts-related queries.
Range: Q.621 – Q.640


• Adobe: - Page No.807


Creative tools-related database queries.
Range: Q.641 – Q.660
• Samsung: - Page No.828
Consumer electronics database handling.
Range: Q.661 – Q.680
• IBM: - Page No.858
Enterprise and cloud-based database queries.
Range: Q.681 – Q.700
• Dell: - Page No.880
Hardware and IT services-related queries.
Range: Q.701 – Q.720
• American Express: - Page No.901
Credit card transactions data.
Range: Q.721 – Q.740
• EY: - Page No.922
Services-based data and employee wellbeing.
Range: Q.741 – Q.760
• Capgemini: - Page No.949
Services-based data queries.
Range: Q.761 – Q.780

SQL Questions by Job Profile - Page No.968

• Data Analyst: - Page No.968


Analysis-driven SQL scenarios.
Range: Q.781 – Q.790
• Data Engineer: - Page No.979
Building ETL pipelines and handling big data.
Range: Q.791 – Q.800
• Business Analyst: - Page No.990
SQL for decision-making and insights.
Range: Q.801 – Q.810
• SQL Developer: - Page No.1001
Query optimization and database management.
Range: Q.811 – Q.820
• Full Stack Developer: - Page No.1008
Database integration in applications.
Range: Q.821 – Q.830
• Cloud Data Engineer: - Page No.1012
Queries for cloud-native databases (e.g., Snowflake, Redshift).
Range: Q.831 – Q.840
• Machine Learning Engineer: - Page No.1022
SQL for feature extraction and preparation.
Range: Q.841 – Q.850
• Backend Developer: - Page No.1025
Efficient SQL for backend systems.
Range: Q.851 – Q.860


Special Sections - Page No.1028

• 50+ Most Asked Questions & Answers: - Page No.1028


Most common SQL interview questions across all roles.
Range: Q.861 – Q.910
• 100 Theoretical SQL Questions: - Page No.1091
Fundamental concepts and SQL theory.
Range: Q.911 – Q.1010
• Super Hard Questions: - Page No.1185
SQL performance tuning and real-world problem-solving.
Range: Q.1011 – Q.1060
• SQL Cheatsheet: - Page No.1259
A quick reference for common SQL syntax and functions.
Range: -
• Thank You Page: - Page No.1259
Acknowledgments and closing notes.


Additional Information - All Links 🔗

🔗 Access Online eBook & Drive 📕

Click the below link or scan the QR Code

Click or Type Below Link into your browser!


🔗 https://tinyurl.com/za-sql1000


🎯 Completion Goal Tracker


Click the below link or scan the QR Code

Access Your Completions Tracker - Start Solving & Tracking Progress

Click or Type Below Link into your browser!


🔗https://tinyurl.com/za-sql1000-t


GitHub Repository - 🛢

Click the below link or scan the QR Code

Click or Type Below Link into your browser!


🔗 https://tinyurl.com/za-sql1000-g


Report Bugs in the Questions - 🐛


Click the below link or scan the QR Code

Click or Type Below Link into your browser!


🔗 https://tinyurl.com/za-sql1000-b


Request Video Explanations - 🎬


Click the below link or scan the QR Code

Click or Type Below Link into your browser!


🔗 https://tinyurl.com/za-sql1000-b


Learn SQL in 21 Hours (+4 Real Data Projects) from me

Click the below link or scan the QR Code

Click or Type Below Link into your browser!


🔗https://tinyurl.com/za-learnsql


Book 1:1 Session with me 💻


Click the below link or scan the QR Code

Click or Type Below Link into your browser!


🔗 https://tinyurl.com/za-bookcall


Are you ready?


So let’s start the game!


Questions by Difficulty Level


• Super Easy
o Q.1

Retrieve all rows from the students table.

Datasets & Schemas


Students Table:
CREATE TABLE students (
student_id INT PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
age INT
);

INSERT INTO students (student_id, first_name, last_name, age)
VALUES
(1, 'Alice', 'Johnson', 20),
(2, 'Bob', 'Smith', 22),
(3, 'Charlie', 'Brown', 19),
(4, 'Diana', 'Prince', 21),
(5, 'Eve', 'Adams', 20);

Learning:

• To retrieve all rows from the table, we use the SELECT statement.
• The * wildcard is used to select all columns from the table.
• We don’t need to filter any rows, so no WHERE clause is needed.
• The SELECT * query will return every row and every column from the
students table.

Answer:
SELECT * FROM students;

This query will retrieve all the data from the students table. The result will
include all five rows and four columns: student_id, first_name,
last_name, and age.

o Q.2

Find all employees whose salary is greater than 4000.


Datasets & Schemas


Employees Table:
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(50),
salary DECIMAL(10, 2)
);

INSERT INTO employees (employee_id, name, salary)
VALUES
(1, 'John Doe', 5000.00),
(2, 'Jane Smith', 3500.00),
(3, 'Emily Davis', 4200.00),
(4, 'Michael Scott', 3800.00),
(5, 'Pam Beesly', 4500.00);

Learning:

• To filter employees based on their salary, we use the WHERE clause.


• The > (greater than) operator allows us to select rows where the
salary is greater than a specified value (4000 in this case).
• We will use the WHERE condition to return only employees with a salary
greater than 4000.

Answer:
SELECT *
FROM employees
WHERE salary > 4000;

Explanation:

• This query will return all employees whose salary is greater than 4000.
The result will include the employee details (ID, name, and salary) for
those who meet the salary condition.
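A common follow-up in interviews is to return only specific columns instead of every column. A minimal variation of the query above, listing just the columns named in the question:

```sql
-- Return only the name and salary columns for the matching employees
SELECT name, salary
FROM employees
WHERE salary > 4000;
```

Listing columns explicitly also makes the query more robust if the table gains new columns later.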

o Q.3

List all books published in the year 2023.

Datasets & Schemas


Books Table:
CREATE TABLE books (
book_id INT PRIMARY KEY,
title VARCHAR(100),
author VARCHAR(50),
published_year INT
);


INSERT INTO books (book_id, title, author, published_year)
VALUES
(1, 'SQL for Beginners', 'Author A', 2023),
(2, 'Advanced SQL Techniques', 'Author B', 2022),
(3, 'Database Design', 'Author C', 2023),
(4, 'Learn SQL in 7 Days', 'Author D', 2021),
(5, 'Mastering SQL', 'Author E', 2023);

Learning:

• To filter books published in a specific year, we will use the WHERE clause.
• The = operator allows us to filter rows where the published_year
matches the specified value (2023 in this case).
• By applying this filter, we will retrieve only the books published in
2023.

Answer:
SELECT *
FROM books
WHERE published_year = 2023;

Explanation:

• This query retrieves all the books from the books table that were
published in the year 2023.
• The WHERE clause ensures that only the rows with the published_year
equal to 2023 are returned.
• The result will include the book_id, title, author, and published_year
for each book that matches the condition.

o Q.4

Count the number of products in the products table.

Datasets & Schemas


Products Table:
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(50),
price DECIMAL(10, 2)
);

INSERT INTO products (product_id, product_name, price)
VALUES
(1, 'Laptop', 1000.00),
(2, 'Smartphone', 700.00),
(3, 'Tablet', 300.00),
(4, 'Headphones', 50.00),
(5, 'Monitor', 200.00);

Learning:

• To count the total number of rows in a table, we can use the COUNT()
aggregate function.
• The COUNT() function will return the number of rows in a table, based
on a specified column or all rows.
• If we use COUNT(*), it counts all rows regardless of any column’s
value, which is ideal when counting the total number of products in
this case.

Answer:
SELECT COUNT(*) AS total_products
FROM products;

Explanation:

• This query counts the total number of rows in the products table.
• The COUNT(*) function returns the total number of products in the
table, regardless of the product details.
• The result will be a single number showing the total number of
products, which in this case should return 5 since there are 5 rows in
the table.
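Interviewers often probe the difference between COUNT(*) and COUNT(column): COUNT(*) counts every row, while COUNT(column) skips rows where that column is NULL. A sketch against the table above (the two results differ only in the hypothetical case where some price were NULL):

```sql
-- COUNT(*) counts all rows; COUNT(price) counts only rows with a non-NULL price
SELECT COUNT(*) AS total_rows,
       COUNT(price) AS priced_rows
FROM products;
```

With the dataset above both columns return 5, since no price is NULL.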

o Q.5

Find all orders placed by the customer with customer_id = 1.

Datasets & Schemas


Orders Table:
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
product_name VARCHAR(50),
order_date DATE
);

INSERT INTO orders (order_id, customer_id, product_name, order_date)
VALUES
(1, 1, 'Laptop', '2024-12-01'),
(2, 2, 'Smartphone', '2024-12-02'),
(3, 1, 'Tablet', '2024-12-03'),
(4, 3, 'Headphones', '2024-12-04'),
(5, 2, 'Monitor', '2024-12-05');

Learning:

• To filter orders based on a specific customer_id, we will use the WHERE clause.
• The WHERE clause allows us to specify a condition, in this case,
customer_id = 1.
• This query will return all orders placed by the customer with
customer_id = 1.

Answer:
SELECT *
FROM orders
WHERE customer_id = 1;

Explanation:

• The SELECT * statement retrieves all columns (order_id, customer_id, product_name, and order_date) from the orders table.
• The WHERE customer_id = 1 condition filters the rows to include
only the orders placed by the customer with customer_id = 1.
• The result will return the orders for Laptop and Tablet, as those are
the products ordered by this customer.

o Q.6

Retrieve all rows from the cities table.

Datasets & Schemas


Cities Table:
CREATE TABLE cities (
city_id INT PRIMARY KEY,
city_name VARCHAR(50),
country VARCHAR(50)
);


INSERT INTO cities (city_id, city_name, country)
VALUES
(1, 'New York', 'USA'),
(2, 'London', 'UK'),
(3, 'Paris', 'France'),
(4, 'Tokyo', 'Japan'),
(5, 'Sydney', 'Australia');

Learning:

• To retrieve all rows from a table, we use the SELECT statement.


• The * wildcard is used to select all columns from the table.
• No filtering is needed, so there’s no WHERE clause in this query.
• The query will return every row and every column in the cities table.

Answer:
SELECT *
FROM cities;

Explanation:

• The SELECT * statement will retrieve all columns (city_id, city_name, and country) from the cities table.
• Since we are not applying any filters, it will return all rows from the
table.
• The result will include all 5 cities in the table: New York, London,
Paris, Tokyo, and Sydney.

o Q.7

Find all customers who are older than 30.

Datasets & Schemas


Customers Table:
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(50),
age INT
);

INSERT INTO customers (customer_id, name, age)
VALUES
(1, 'Anna', 28),
(2, 'Brian', 35),
(3, 'Clara', 32),

28
1000+ SQL Interview Questions & Answers | By Zero Analyst

(4, 'David', 29),


(5, 'Ella', 40);

Learning:

• To filter customers who are older than a specific age, we use the WHERE
clause with a condition.
• The > (greater than) operator helps us filter rows where the age is
greater than 30.
• The query will return all customers who satisfy this condition (i.e.,
customers whose age is greater than 30).

Answer:
SELECT *
FROM customers
WHERE age > 30;

Explanation:

• The SELECT * statement retrieves all columns (customer_id, name, and age) from the customers table.
• The WHERE age > 30 condition filters the rows to include only
customers who are older than 30.
• The result will include Brian, Clara, and Ella, as their ages are greater
than 30.

o Q.8

Display all animals whose species is 'Dog'.

Datasets & Schemas


Animals Table:
CREATE TABLE animals (
animal_id INT PRIMARY KEY,
name VARCHAR(50),
species VARCHAR(50),
age INT
);

INSERT INTO animals (animal_id, name, species, age)
VALUES
(1, 'Buddy', 'Dog', 5),
(2, 'Whiskers', 'Cat', 3),
(3, 'Max', 'Dog', 7),
(4, 'Charlie', 'Dog', 2),
(5, 'Luna', 'Cat', 4);

Learning:

• To filter rows based on a specific condition, we use the WHERE clause.


• The condition species = 'Dog' ensures we only retrieve rows where
the species of the animal is 'Dog'.
• This query will return all columns (animal_id, name, species, and
age) for the animals that meet this condition.

Answer:
SELECT *
FROM animals
WHERE species = 'Dog';

Explanation:

• The SELECT * statement retrieves all columns (animal_id, name, species, and age) from the animals table.
• The WHERE species = 'Dog' condition filters the results to show
only animals whose species is 'Dog'.
• The result will include the animals Buddy, Max, and Charlie, as they
belong to the 'Dog' species.

o Q.9

Count the number of movies in the movies table.

Datasets & Schemas


Movies Table:
CREATE TABLE movies (
movie_id INT PRIMARY KEY,
title VARCHAR(100),
genre VARCHAR(50),
release_year INT
);

INSERT INTO movies (movie_id, title, genre, release_year)
VALUES
(1, 'The Shawshank Redemption', 'Drama', 1994),
(2, 'Inception', 'Sci-Fi', 2010),
(3, 'The Godfather', 'Crime', 1972),
(4, 'Frozen', 'Animation', 2013),
(5, 'Avengers: Endgame', 'Action', 2019);


Learning:

• To count the total number of rows in a table, we use the COUNT() aggregate function.
• The COUNT(*) function counts all rows, regardless of the column's
value, making it ideal for counting the total number of movies.
• The query will return a single number representing the total number of
rows (i.e., the total number of movies in the table).

Answer:
SELECT COUNT(*) AS total_movies
FROM movies;

Explanation:

• The COUNT(*) function counts all rows in the movies table.


• The result will be a single value indicating the total number of movies.
In this case, the result should return 5, as there are 5 movies in the
table.

o Q.10

Find all transactions with an amount greater than 100.

Datasets & Schemas


Transactions Table:
CREATE TABLE transactions (
transaction_id INT PRIMARY KEY,
customer_id INT,
amount DECIMAL(10, 2),
transaction_date DATE
);

INSERT INTO transactions (transaction_id, customer_id, amount, transaction_date)
VALUES
(1, 1, 50.00, '2024-12-01'),
(2, 2, 150.00, '2024-12-02'),
(3, 3, 75.00, '2024-12-03'),
(4, 4, 200.00, '2024-12-04'),
(5, 5, 300.00, '2024-12-05');

Learning:

• To filter rows based on a condition, we use the WHERE clause.


• The > (greater than) operator allows us to retrieve rows where the
amount is greater than 100.
• This query will return all columns (transaction_id, customer_id,
amount, transaction_date) for the transactions where the amount is
greater than 100.

Answer:
SELECT *
FROM transactions
WHERE amount > 100;

Explanation:

• The SELECT * statement retrieves all columns from the transactions table.
• The WHERE amount > 100 condition filters the rows to include only
transactions where the amount is greater than 100.
• The result will include the transactions with transaction_id 2, 4, and
5, as these transactions have amounts greater than 100.

o Q.11

List all the employees working at TCS.

Datasets & Schemas


Employees Table:
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(50),
company VARCHAR(50),
city VARCHAR(50)
);

INSERT INTO employees (employee_id, name, company, city)
VALUES
(1, 'Amit', 'TCS', 'Mumbai'),
(2, 'Riya', 'Infosys', 'Bangalore'),
(3, 'Karan', 'TCS', 'Chennai'),
(4, 'Sara', 'Wipro', 'Hyderabad'),
(5, 'Neha', 'TCS', 'Pune');

Learning:

• To filter rows based on a specific value in a column, we use the WHERE clause.
• The WHERE company = 'TCS' condition ensures we only retrieve rows where the company column is equal to 'TCS'.
• This query will return all columns (employee_id, name, company, and
city) for the employees working at TCS.

Answer:
SELECT *
FROM employees
WHERE company = 'TCS';

Explanation:

• The SELECT * statement retrieves all columns from the employees table.
• The WHERE company = 'TCS' condition filters the rows to include
only employees whose company is 'TCS'.
• The result will include Amit, Karan, and Neha, as they are the
employees working at TCS.

o Q.12

Find all Flipkart products priced above 1000.

Datasets & Schemas


Products Table:
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(50),
company VARCHAR(50),
price DECIMAL(10, 2)
);

INSERT INTO products (product_id, product_name, company, price)
VALUES
(1, 'Smartphone', 'Flipkart', 1500.00),
(2, 'Shoes', 'Myntra', 800.00),
(3, 'Laptop', 'Flipkart', 50000.00),
(4, 'T-shirt', 'Ajio', 500.00),
(5, 'Headphones', 'Flipkart', 1200.00);

Learning:

• To filter products based on both price and company, we use the WHERE
clause with multiple conditions.

• We can combine conditions using the AND operator. In this case, we want to filter by both the company being 'Flipkart' and the price being greater than 1000.
• This query will return all columns (product_id, product_name,
company, price) for the products from Flipkart that are priced above
1000.

Answer:
SELECT *
FROM products
WHERE company = 'Flipkart'
AND price > 1000;

Explanation:

• The SELECT * statement retrieves all columns from the products table.
• The WHERE company = 'Flipkart' AND price > 1000 condition
filters the rows to include only those products where the company is
'Flipkart' and the price is greater than 1000.
• The result will include Smartphone, Laptop, and Headphones, as
these are the Flipkart products with prices above 1000.
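When AND is mixed with OR, parentheses control how the conditions group, since AND binds more tightly than OR. A sketch extending the query above — the extra 'Myntra' condition is purely hypothetical, for illustration:

```sql
-- Without the parentheses this would be parsed as:
--   company = 'Flipkart' OR (company = 'Myntra' AND price > 1000)
SELECT *
FROM products
WHERE (company = 'Flipkart' OR company = 'Myntra')
  AND price > 1000;
```

Making the grouping explicit with parentheses avoids a classic interview trap.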

o Q.13

Count the number of Ola rides taken in December 2024.

Datasets & Schemas


Rides Table:
CREATE TABLE rides (
ride_id INT PRIMARY KEY,
customer_name VARCHAR(50),
company VARCHAR(50),
ride_date DATE
);

INSERT INTO rides (ride_id, customer_name, company, ride_date)
VALUES
(1, 'Raj', 'Ola', '2024-12-01'),
(2, 'Simran', 'Uber', '2024-12-03'),
(3, 'Arjun', 'Ola', '2024-12-05'),
(4, 'Deepa', 'Ola', '2024-12-10'),
(5, 'Maya', 'Uber', '2024-12-15');

Learning:

• To count rows based on a specific condition, we use the COUNT() function.
• We can filter rows based on the company column (where the value is
'Ola') and the ride_date column, ensuring the rides occurred in
December 2024.
• The COUNT(*) function will return the number of rows that match both
conditions.

Answer:
SELECT COUNT(*) AS ola_ride_count
FROM rides
WHERE company = 'Ola'
AND ride_date BETWEEN '2024-12-01' AND '2024-12-31';

Explanation:

• The SELECT COUNT(*) function counts the total number of rows in the
rides table that meet the specified conditions.
• The WHERE company = 'Ola' condition filters for rides taken with the
company 'Ola'.
• The AND ride_date BETWEEN '2024-12-01' AND '2024-12-31'
condition ensures that only rides taken in December 2024 are counted.
• The result will be 3, as there are three rides taken with 'Ola' in
December 2024 (on the 1st, 5th, and 10th).
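An alternative to BETWEEN is extracting the year and month from the date. This is dialect-dependent: EXTRACT is standard SQL and works in PostgreSQL, while other databases offer YEAR()/MONTH() or strftime instead — treat the sketch below as a PostgreSQL-flavored variant:

```sql
-- Dialect-dependent variant using EXTRACT (PostgreSQL / standard SQL)
SELECT COUNT(*) AS ola_ride_count
FROM rides
WHERE company = 'Ola'
  AND EXTRACT(YEAR FROM ride_date) = 2024
  AND EXTRACT(MONTH FROM ride_date) = 12;
```

BETWEEN on a date range is usually friendlier to indexes, so prefer it when performance matters.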

o Q.14

List all cities where Wipro has offices.

Datasets & Schemas


Offices Table:
CREATE TABLE offices (
office_id INT PRIMARY KEY,
company VARCHAR(50),
city VARCHAR(50)
);

INSERT INTO offices (office_id, company, city)
VALUES
(1, 'TCS', 'Mumbai'),
(2, 'Infosys', 'Bangalore'),
(3, 'Wipro', 'Hyderabad'),
(4, 'Wipro', 'Kolkata'),
(5, 'Infosys', 'Chennai');


Learning:

• To retrieve distinct (unique) values from a column, we use the DISTINCT keyword.
• In this case, we want to list all the cities where the company is Wipro.
• The DISTINCT keyword ensures that we only get each city once, even
if multiple offices of Wipro exist in the same city.

Answer:
SELECT DISTINCT city
FROM offices
WHERE company = 'Wipro';

Explanation:

• The SELECT DISTINCT city statement retrieves all unique cities from
the offices table.
• The WHERE company = 'Wipro' condition filters the rows to include
only those where the company is 'Wipro'.
• The result will return Hyderabad and Kolkata, as these are the cities
where Wipro has offices.
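To see what DISTINCT actually removes, imagine a second Wipro office in Hyderabad (a hypothetical row, not in the dataset above):

```sql
-- Hypothetical extra row, for illustration only:
-- INSERT INTO offices VALUES (6, 'Wipro', 'Hyderabad');

-- Without DISTINCT: one city per office row, so Hyderabad would appear twice
SELECT city FROM offices WHERE company = 'Wipro';

-- With DISTINCT: each city appears exactly once
SELECT DISTINCT city FROM offices WHERE company = 'Wipro';
```

With the original five rows both queries return the same result, which is why the duplicate row is needed to make the difference visible.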

o Q.15

Find the total revenue generated by Zomato orders.

Datasets & Schemas


Orders Table:
CREATE TABLE orders (
order_id INT PRIMARY KEY,
company VARCHAR(50),
amount DECIMAL(10, 2)
);

INSERT INTO orders (order_id, company, amount)
VALUES
(1, 'Zomato', 500.00),
(2, 'Swiggy', 300.00),
(3, 'Zomato', 700.00),
(4, 'Swiggy', 200.00),
(5, 'Zomato', 1000.00);

Learning:

• To calculate the total revenue or sum of a specific column, we use the SUM() function.
• The WHERE clause allows us to filter the rows based on a condition—in
this case, we want to filter for orders from Zomato.
• The SUM(amount) function will add up the amount values for the
Zomato orders.

Answer:
SELECT SUM(amount) AS total_revenue
FROM orders
WHERE company = 'Zomato';

Explanation:

• The SELECT SUM(amount) function calculates the total sum of the amount column for the filtered rows.
• The WHERE company = 'Zomato' condition filters the rows to include
only orders from Zomato.
• The result will return the total revenue, which is the sum of the
amounts for the Zomato orders (500 + 700 + 1000 = 2200).
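A common follow-up is revenue per company rather than for a single one; GROUP BY generalizes the SUM above to every company in one query:

```sql
-- Total revenue for each company in the orders table
SELECT company, SUM(amount) AS total_revenue
FROM orders
GROUP BY company;
-- With the data above: Zomato → 2200.00, Swiggy → 500.00
```

GROUP BY replaces the WHERE filter with one result row per distinct company value.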

o Q.16

Retrieve all customers who have booked flights with Indigo.

Datasets & Schemas


Flights Table:
CREATE TABLE flights (
flight_id INT PRIMARY KEY,
customer_name VARCHAR(50),
airline VARCHAR(50)
);

INSERT INTO flights (flight_id, customer_name, airline)
VALUES
(1, 'Akshay', 'Indigo'),
(2, 'Meera', 'Air India'),
(3, 'Vishal', 'Indigo'),
(4, 'Nidhi', 'Vistara'),
(5, 'Kiran', 'Indigo');

Learning:

37
1000+ SQL Interview Questions & Answers | By Zero Analyst

• To filter rows based on a specific value in a column, we use the WHERE clause.
• The WHERE airline = 'Indigo' condition filters the rows to include
only those where the airline column is equal to 'Indigo'.
• This query will return all columns (flight_id, customer_name,
airline) for the customers who have booked flights with Indigo.

Answer:
SELECT *
FROM flights
WHERE airline = 'Indigo';

Explanation:

• The SELECT * statement retrieves all columns from the flights table.
• The WHERE airline = 'Indigo' condition filters the rows to include
only those where the airline is Indigo.
• The result will include Akshay, Vishal, and Kiran as these are the
customers who booked flights with Indigo.

o Q.17

Find all ITC products in the products table.

Datasets & Schemas


Products Table:
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(50),
company VARCHAR(50),
category VARCHAR(50)
);

INSERT INTO products (product_id, product_name, company, category)


VALUES
(1, 'Bingo', 'ITC', 'Snacks'),
(2, 'Notebook', 'Classmate', 'Stationery'),
(3, 'Sunfeast', 'ITC', 'Biscuits'),
(4, 'Shampoo', 'HUL', 'Personal Care'),
(5, 'Aashirvaad', 'ITC', 'Flour');

Learning:

• To filter products based on a specific company, we can use the WHERE clause.


• The WHERE company = 'ITC' condition will filter the rows to return
only products where the company is ITC.
• This query will return all columns (product_id, product_name,
company, category) for ITC products.

Answer:
SELECT *
FROM products
WHERE company = 'ITC';

Explanation:

• The SELECT * statement retrieves all columns from the products table.
• The WHERE company = 'ITC' condition filters the rows to include
only products manufactured by ITC.
• The result will include Bingo, Sunfeast, and Aashirvaad, which are
the ITC products in the table.

o Q.18

Find all Jio customers who recharged for more than 300.

Datasets & Schemas


Recharges Table:
CREATE TABLE recharges (
recharge_id INT PRIMARY KEY,
customer_name VARCHAR(50),
company VARCHAR(50),
amount DECIMAL(10, 2)
);

INSERT INTO recharges (recharge_id, customer_name, company, amount)


VALUES
(1, 'Ankit', 'Jio', 350.00),
(2, 'Rohit', 'Airtel', 200.00),
(3, 'Priya', 'Jio', 400.00),
(4, 'Sneha', 'Vodafone', 250.00),
(5, 'Vivek', 'Jio', 150.00);

Learning:

• To filter rows based on two conditions, we use the WHERE clause with
multiple conditions connected by AND.
• The first condition filters by company = 'Jio', and the second
condition filters by amount > 300.


• This query will return only Jio customers who have recharged for
amounts greater than 300.

Answer:
SELECT *
FROM recharges
WHERE company = 'Jio' AND amount > 300;

Explanation:

• The SELECT * statement retrieves all columns from the recharges table.
• The WHERE company = 'Jio' condition filters the rows to include
only those with the company 'Jio'.
• The AND amount > 300 condition ensures that only the rows where
the recharge amount is greater than 300 are included.
• The result will include Ankit and Priya, as they are the Jio customers
who recharged for more than 300.

o Q.19

Count the number of Paytm transactions done in November 2024.

Datasets & Schemas


Transactions Table:
CREATE TABLE transactions (
transaction_id INT PRIMARY KEY,
company VARCHAR(50),
amount DECIMAL(10, 2),
transaction_date DATE
);

INSERT INTO transactions (transaction_id, company, amount, transaction_date)
VALUES
(1, 'Paytm', 100.00, '2024-11-01'),
(2, 'Google Pay', 200.00, '2024-11-02'),
(3, 'Paytm', 300.00, '2024-11-03'),
(4, 'PhonePe', 150.00, '2024-11-04'),
(5, 'Paytm', 250.00, '2024-11-05');

Learning:

• To count the number of rows that meet a certain condition, we use the
COUNT() function.


• The WHERE clause is used to filter rows based on specific conditions, such as the company being 'Paytm' and the transaction date being in November 2024.
• In this case, we need to count the number of transactions for Paytm
that occurred in November 2024.

Answer:
SELECT COUNT(*) AS paytm_transactions_november
FROM transactions
WHERE company = 'Paytm'
AND transaction_date BETWEEN '2024-11-01' AND '2024-11-30';

Explanation:

• The COUNT(*) function counts the total number of rows that match the
given conditions.
• The WHERE company = 'Paytm' condition filters the rows to include
only Paytm transactions.
• The AND transaction_date BETWEEN '2024-11-01' AND '2024-11-30' condition filters the transactions to include only those that occurred in November 2024.
• The result will give the total number of Paytm transactions in
November 2024.
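Hard-coding the month's start and end dates works, but it is easy to get the end date wrong for months of different lengths. As a sketch, the same filter can be written in PostgreSQL with EXTRACT so the month boundaries are derived for you (syntax varies by database — MySQL uses YEAR() and MONTH() instead):

```sql
-- PostgreSQL variant: let the database derive the month boundaries
SELECT COUNT(*) AS paytm_transactions_november
FROM transactions
WHERE company = 'Paytm'
  AND EXTRACT(YEAR FROM transaction_date) = 2024
  AND EXTRACT(MONTH FROM transaction_date) = 11;
```

Be aware that wrapping a column in a function like this can prevent the use of an index on transaction_date, so the BETWEEN form is often preferred on large tables.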

o Q.20

List all customers who have accounts in SBI.

Datasets & Schemas


Accounts Table:
CREATE TABLE accounts (
account_id INT PRIMARY KEY,
customer_name VARCHAR(50),
bank_name VARCHAR(50)
);

INSERT INTO accounts (account_id, customer_name, bank_name)


VALUES
(1, 'Ramesh', 'SBI'),
(2, 'Geeta', 'HDFC'),
(3, 'Suresh', 'SBI'),
(4, 'Kavita', 'ICICI'),
(5, 'Anjali', 'SBI');

Learning:


• To filter rows based on a specific value in a column, we can use the WHERE clause.
• In this case, we want to filter the rows to include only those where the
bank_name is 'SBI'.
• This query will return the names of all customers who have accounts in
the SBI bank.

Answer:
SELECT customer_name
FROM accounts
WHERE bank_name = 'SBI';

Explanation:

• The SELECT customer_name statement retrieves the names of the customers from the accounts table.
• The WHERE bank_name = 'SBI' condition filters the rows to only
include those where the bank name is SBI.
• The result will include the customer_name for all customers who have
an account with SBI.

• Easy
o Q.21

Find all suppliers who provide products in the Electronics category.

Datasets & Schemas


Suppliers Table:
CREATE TABLE suppliers (
supplier_id INT PRIMARY KEY,
supplier_name VARCHAR(100)
);

Products Table:
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50),
supplier_id INT
);

INSERT INTO suppliers VALUES


(1, 'Supplier A'),
(2, 'Supplier B');

INSERT INTO products VALUES


(1, 'Smartphone', 'Electronics', 1),
(2, 'Washing Machine', 'Appliances', 2);


Learning:

• To find suppliers providing products in a specific category, we need to join the suppliers and products tables using the supplier_id.
• The INNER JOIN allows us to combine the two tables on the common
supplier_id.
• The WHERE clause is used to filter the products by the category value
('Electronics' in this case).

Answer:
SELECT DISTINCT s.supplier_name
FROM suppliers s
JOIN products p ON s.supplier_id = p.supplier_id
WHERE p.category = 'Electronics';

Explanation:

• The SELECT DISTINCT s.supplier_name retrieves the unique names of suppliers who provide products in the specified category.
• The JOIN products p ON s.supplier_id = p.supplier_id
ensures that we match each product to its corresponding supplier based
on the supplier_id.
• The WHERE p.category = 'Electronics' condition filters the
results to include only products in the Electronics category.
• The DISTINCT keyword ensures that we only list each supplier once,
even if they provide multiple products in the Electronics category.
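An equivalent way to express this, avoiding the need for DISTINCT, is an EXISTS subquery — a sketch that should work in any mainstream SQL dialect:

```sql
-- EXISTS returns each supplier row at most once, so DISTINCT is not needed
SELECT s.supplier_name
FROM suppliers s
WHERE EXISTS (
    SELECT 1
    FROM products p
    WHERE p.supplier_id = s.supplier_id
      AND p.category = 'Electronics'
);
```

Interviewers sometimes ask for both forms; EXISTS also makes the intent ("suppliers for which at least one such product exists") explicit.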

o Q.22

List all orders along with customer names from the orders and customers
tables.

Datasets & Schemas


Customers Table:
CREATE TABLE customers (
id INT PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50)
);

Orders Table:
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,


order_date DATE
);

INSERT INTO customers VALUES


(1, 'Raj', 'Verma'),
(2, 'Sneha', 'Rao');

INSERT INTO orders VALUES


(1, 1, '2023-05-12'),
(2, 2, '2023-05-14');

Learning:

• To retrieve all orders with the corresponding customer names, we need to join the orders and customers tables using the customer_id.
• The INNER JOIN allows us to combine the two tables where the
customer_id matches in both tables.
• We'll select the first_name and last_name from the customers table
and the order_date from the orders table.

Answer:
SELECT c.first_name, c.last_name, o.order_date
FROM orders o
JOIN customers c ON o.customer_id = c.id;

Explanation:

• The SELECT c.first_name, c.last_name, o.order_date retrieves the customer names and their respective order dates.
• The JOIN customers c ON o.customer_id = c.id joins the
orders table with the customers table based on the customer_id
from the orders table and the id from the customers table.
• This query ensures that each order is associated with the correct
customer.

o Q.23

Get the number of unique orders placed in January 2024.

Datasets & Schemas


Orders Table:
CREATE TABLE orders (
order_id INT, -- no PRIMARY KEY here, so the duplicate order_id rows below are allowed
order_date DATE
);

INSERT INTO orders VALUES


(1, '2024-01-05'),
(1, '2024-01-05'),
(2, '2024-01-15'),
(3, '2023-02-10');

Learning:

• To count unique orders in January 2024, we can use the COUNT function along with DISTINCT to ensure we count only unique order_id values.
• We also need to filter the orders placed in January 2024 using the WHERE clause with a date-range condition such as BETWEEN.

Answer:
SELECT COUNT(DISTINCT order_id) AS unique_orders
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';

Explanation:

• The COUNT(DISTINCT order_id) counts the number of unique order IDs, ensuring that duplicate entries for the same order are not counted multiple times.
• The WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31' clause filters the orders to only those placed in January 2024.
• The result will return the count of unique orders placed during that
month.
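If you prefer not to spell out the month's first and last day, PostgreSQL's date_trunc can reduce the filter to a single comparison — a dialect-specific sketch (in MySQL, DATE_FORMAT(order_date, '%Y-%m') = '2024-01' achieves the same thing):

```sql
-- PostgreSQL variant: truncate each date to the first day of its month
SELECT COUNT(DISTINCT order_id) AS unique_orders
FROM orders
WHERE date_trunc('month', order_date) = DATE '2024-01-01';
```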

o Q.24

Find all suppliers who provide products in the Electronics category.

Datasets & Schemas


Suppliers Table:
CREATE TABLE suppliers (
supplier_id INT PRIMARY KEY,
supplier_name VARCHAR(100)
);

Products Table:
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50),
supplier_id INT


);

INSERT INTO suppliers VALUES


(1, 'Supplier A'),
(2, 'Supplier B');

INSERT INTO products VALUES


(1, 'Smartphone', 'Electronics', 1),
(2, 'Washing Machine', 'Appliances', 2);

Learning:

• To find suppliers who provide products in the Electronics category, we need to join the suppliers and products tables.
• The INNER JOIN is used to combine the data from both tables based on
the supplier_id field.
• We will filter the products where the category is 'Electronics'
using the WHERE clause.

Answer:
SELECT DISTINCT s.supplier_name
FROM suppliers s
JOIN products p ON s.supplier_id = p.supplier_id
WHERE p.category = 'Electronics';

Explanation:

• The SELECT DISTINCT s.supplier_name retrieves the names of suppliers that provide products in the Electronics category.
• The JOIN combines the suppliers and products tables based on the
supplier_id field.
• The WHERE p.category = 'Electronics' filters the products to
include only those in the Electronics category.
• DISTINCT ensures that suppliers are not repeated if they provide
multiple Electronics products.

o Q.25

Count the total number of transactions per customer.

Datasets & Schemas


Transactions Table:
CREATE TABLE transactions (
transaction_id INT PRIMARY KEY,
customer_id INT
);


INSERT INTO transactions VALUES


(1, 1),
(2, 2),
(3, 1),
(4, 3);

Learning:

• To count the total number of transactions per customer, we need to group the data by customer_id.
• The COUNT function will be used to count the number of transactions
for each customer.
• The GROUP BY clause will ensure that the count is calculated for each
unique customer_id.

Answer:
SELECT customer_id, COUNT(transaction_id) AS total_transactions
FROM transactions
GROUP BY customer_id;

Explanation:

• The SELECT customer_id, COUNT(transaction_id) AS total_transactions selects each customer_id and counts the number of transactions associated with them.
• The GROUP BY customer_id ensures that the count is performed for
each distinct customer.
• The result will show the customer_id along with the total number of
transactions for each customer.
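A common follow-up is to keep only customers above some activity threshold. The HAVING clause filters on the aggregated count after grouping — for example, customers with more than one transaction (in this sample data, only customer_id 1 qualifies):

```sql
-- HAVING filters groups after aggregation, unlike WHERE which filters rows
SELECT customer_id, COUNT(transaction_id) AS total_transactions
FROM transactions
GROUP BY customer_id
HAVING COUNT(transaction_id) > 1;
```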

o Q.26

Given the employee table with columns EMP_ID and SALARY, write an SQL
query to find all salaries greater than the average salary. Return EMP_ID and
SALARY.

Datasets & Schemas


Employee Table:
DROP TABLE IF EXISTS employee;

CREATE TABLE employee (


EMP_ID INT PRIMARY KEY,
SALARY DECIMAL(10, 2)
);


INSERT INTO employee (EMP_ID, SALARY) VALUES


(1, 50000),
(2, 60000),
(3, 70000),
(4, 45000),
(5, 80000),
(6, 55000),
(7, 75000),
(8, 62000),
(9, 48000),
(10, 85000);

Learning:

• To find salaries greater than the average salary, we will first calculate
the average salary using the AVG() function.
• Then, we will filter the employees whose salary is greater than the
calculated average.
• Aggregate functions like AVG() cannot be referenced directly in a WHERE clause, so we use a subquery to compute the average salary and compare each employee's salary against it. (HAVING filters grouped results, which is not what we need here.)

Answer:
SELECT EMP_ID, SALARY
FROM employee
WHERE SALARY > (SELECT AVG(SALARY) FROM employee);

Explanation:

• The SELECT AVG(SALARY) FROM employee subquery calculates the average salary for all employees.
• The outer query retrieves the EMP_ID and SALARY for all employees
whose salary is greater than the calculated average.
• The condition WHERE SALARY > ensures that only salaries greater than
the average salary are returned.
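On databases that support window functions (PostgreSQL, MySQL 8+, SQL Server), the same result can be produced in a single scan by attaching the overall average to every row — a sketch:

```sql
-- Compute the global average once with a window function, then filter
SELECT EMP_ID, SALARY
FROM (
    SELECT EMP_ID, SALARY, AVG(SALARY) OVER () AS avg_salary
    FROM employee
) t
WHERE SALARY > avg_salary;
```

The derived-table alias (t here) is required by MySQL and is good practice everywhere.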

o Q.27

List all courses offered by a training institute.

Datasets & Schemas


Courses Table:
CREATE TABLE courses (
course_id INT,
course_name VARCHAR(100),
duration VARCHAR(20)
);


INSERT INTO courses VALUES


(1, 'Data Science', '6 months'),
(2, 'Web Development', '3 months'),
(3, 'Digital Marketing', '2 months'),
(4, 'AI & ML', '4 months'),
(5, 'Cloud Computing', '5 months');

Learning:

• To list all courses offered, a simple SELECT query is sufficient.


• We will retrieve all columns from the courses table, i.e., course_id,
course_name, and duration.

Answer:
SELECT *
FROM courses;

Explanation:

• The SELECT * query retrieves all columns from the courses table,
which includes the course ID, course name, and duration.
• There is no need to filter the data as the question asks for all available
courses.

o Q.28

Find all public sector banks established before 2000.

Datasets & Schemas


Public Sector Banks Table:
CREATE TABLE public_sector_banks (
bank_id INT PRIMARY KEY,
bank_name VARCHAR(100),
established_year INT
);

INSERT INTO public_sector_banks VALUES


(1, 'State Bank of India', 1955),
(2, 'Bank of Baroda', 1908),
(3, 'Union Bank of India', 2001);

Learning:

• To retrieve banks established before the year 2000, we will use the
WHERE clause with a condition that filters the established_year.


• We will compare the established_year to 2000 using the < operator.

Answer:
SELECT *
FROM public_sector_banks
WHERE established_year < 2000;

Explanation:

• The query retrieves all the columns from the public_sector_banks table.
• The WHERE established_year < 2000 condition filters the results to
show only banks established before the year 2000.
• In this dataset, State Bank of India (1955) and Bank of Baroda
(1908) meet this condition.

o Q.29

Find the total revenue generated from each product.

Datasets & Schemas


Sales Table:
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
product_id INT,
revenue DECIMAL(10, 2)
);

INSERT INTO sales VALUES


(1, 1, 50000.00),
(2, 2, 75000.00),
(3, 1, 30000.00);

Products Table:
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100)
);

INSERT INTO products VALUES


(1, 'Smartphone'),
(2, 'Laptop'),
(3, 'Washing Machine');

Learning:

• To find the total revenue for each product, we will use the SUM()
function, which calculates the total revenue for each product_id.


• We need to join the sales table with the products table on the
product_id to get the product name along with the total revenue.
• Use GROUP BY to aggregate the results by product_name.

Answer:
SELECT p.product_name, SUM(s.revenue) AS total_revenue
FROM sales s
JOIN products p ON s.product_id = p.product_id
GROUP BY p.product_name;

Explanation:

• JOIN: We are joining the sales table and the products table using the
product_id column so we can display the product names alongside
the revenue.
• SUM(s.revenue): This sums up the revenue for each product.
• GROUP BY p.product_name: This groups the results by the product
name so that we get the total revenue for each individual product.

Output Example:

product_name total_revenue

Smartphone 80000.00

Laptop 75000.00

Note: Washing Machine does not appear in the result because it has no rows in the sales table, and an INNER JOIN returns only products with at least one matching sale.
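If products with no sales should still appear with a revenue of 0.00, the INNER JOIN can be swapped for a LEFT JOIN from products, with COALESCE supplying the zero — a sketch:

```sql
-- LEFT JOIN keeps every product; COALESCE turns the NULL sum into 0
SELECT p.product_name,
       COALESCE(SUM(s.revenue), 0) AS total_revenue
FROM products p
LEFT JOIN sales s ON s.product_id = p.product_id
GROUP BY p.product_name;
```

With the sample data above, this version would also return Washing Machine with a total_revenue of 0.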

o Q.30

List products with stock less than 10.

Datasets & Schemas


Products Table:
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
stock INT
);

INSERT INTO products VALUES


(1, 'Smartphone', 5),
(2, 'Laptop', 20),


(3, 'T-shirt', 8);

Learning:

• To filter products with stock less than a certain value, we use the
WHERE clause.
• The < operator will help us filter products with stock less than 10.
• We can simply select the product_name and stock columns to display
the required results.

Answer:
SELECT product_name, stock
FROM products
WHERE stock < 10;

Explanation:

• WHERE stock < 10: This condition filters out the products whose
stock is less than 10.
• The query will return the product_name and stock of all products that
meet this condition.

Output Example:

product_name stock

Smartphone 5

T-shirt 8

o Q.31

Retrieve the details of all mobile phones with their brands.

Datasets & Schemas


Mobile Phones Table:
CREATE TABLE mobile_phones (
phone_id INT,
phone_name VARCHAR(100),
brand VARCHAR(100)
);


INSERT INTO mobile_phones VALUES


(1, 'iPhone 14', 'Apple'),
(2, 'Galaxy S21', 'Samsung'),
(3, 'OnePlus 9', 'OnePlus'),
(4, 'Poco X3', 'Xiaomi'),
(5, 'Nokia G20', 'Nokia');

Learning:

• We can retrieve details from the mobile_phones table using a simple SELECT query.
• The query should return the phone_name and brand for each mobile
phone.

Answer:
SELECT phone_name, brand
FROM mobile_phones;

Explanation:

• The query retrieves the phone_name and brand for all rows from the
mobile_phones table.
• No filter is applied, so it returns all records in the table.

Output Example:

phone_name brand

iPhone 14 Apple

Galaxy S21 Samsung

OnePlus 9 OnePlus

Poco X3 Xiaomi

Nokia G20 Nokia

o Q.32

Write a SQL query to find the total number of employees in each company.


Datasets & Schemas


Companies Table:
CREATE TABLE companies (
company_id INT PRIMARY KEY,
company_name VARCHAR(100)
);

Employees Table:
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
company_id INT,
salary DECIMAL(10, 2),
FOREIGN KEY (company_id) REFERENCES companies(company_id)
);

Data Insertion:
INSERT INTO companies (company_id, company_name) VALUES
(1, 'TechCorp'),
(2, 'HealthInc'),
(3, 'FinanceSolutions'),
(4, 'EduGlobal'),
(5, 'RetailWorld');

INSERT INTO employees (employee_id, employee_name, company_id, salary) VALUES
(1, 'Alice', 1, 90000),
(2, 'Bob', 2, 70000),
(3, 'Charlie', 1, 80000),
(4, 'David', 3, 95000),
(5, 'Eva', 4, 65000),
(6, 'Frank', 5, 60000),
(7, 'Grace', 2, 72000);

Learning:

• To find the total number of employees for each company, you can use
the GROUP BY clause to group the employees by company.
• The COUNT() function will then count the number of employees for
each company.

Answer:
SELECT c.company_name, COUNT(e.employee_id) AS total_employees
FROM companies c
JOIN employees e ON c.company_id = e.company_id
GROUP BY c.company_name;

Explanation:

• JOIN: This is used to combine the companies and employees tables on the company_id column.
• COUNT(e.employee_id): This counts the number of employees
(employee_id) for each company.


• GROUP BY c.company_name: This groups the result by company_name so that the count of employees is calculated for each company.

Output Example:

company_name total_employees

TechCorp 2

HealthInc 2

FinanceSolutions 1

EduGlobal 1

RetailWorld 1
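As with any INNER JOIN, a company with zero employees would silently drop out of this result. A LEFT JOIN from companies keeps such companies and reports 0 for them (with this sample data the output is identical, since every company has at least one employee):

```sql
-- LEFT JOIN keeps companies even when no employee row matches;
-- COUNT(e.employee_id) counts only non-NULL matches, so it yields 0
SELECT c.company_name, COUNT(e.employee_id) AS total_employees
FROM companies c
LEFT JOIN employees e ON c.company_id = e.company_id
GROUP BY c.company_name;
```

Note that COUNT(*) would return 1 instead of 0 for an unmatched company, because it counts the joined row itself.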

o Q.33

List all Indian tech companies with more than 50,000 employees.

Datasets & Schemas


Tech Companies Table:
CREATE TABLE tech_companies (
company_id INT PRIMARY KEY,
company_name VARCHAR(100),
employees INT
);

Data Insertion:
INSERT INTO tech_companies VALUES
(1, 'Infosys', 25000),
(2, 'TCS', 150000),
(3, 'Wipro', 20000);

Learning:

• To filter companies based on the number of employees, we can use the WHERE clause.
• In this case, we need to retrieve only those companies where the
employees column has a value greater than 50,000.


Answer:
SELECT company_name, employees
FROM tech_companies
WHERE employees > 50000;

Explanation:

• SELECT company_name, employees: This selects the company_name and employees columns from the tech_companies table.
• WHERE employees > 50000: This filters the rows to include only
those companies where the number of employees is greater than
50,000.

Output Example:

company_name employees

TCS 150000

o Q.34

Get the names and contact numbers of all clients.

Datasets & Schemas


Clients Table:
CREATE TABLE clients (
client_id INT,
client_name VARCHAR(100),
contact_number VARCHAR(15)
);

Data Insertion:
INSERT INTO clients VALUES
(1, 'ABC Corp', '9876543210'),
(2, 'XYZ Ltd.', '9123456780'),
(3, 'Tech Solutions', '8765432100'),
(4, 'Innovatech', '9988776655'),
(5, 'Alpha Industries', '9988123456');

Learning:

• This query involves a simple SELECT statement to retrieve specific columns from the clients table.
• We will retrieve the client_name and contact_number columns.


Answer:
SELECT client_name, contact_number
FROM clients;

Explanation:

• SELECT client_name, contact_number: This selects the client_name and contact_number columns from the clients table.
• FROM clients: This specifies that we are retrieving data from the
clients table.

Output Example:

client_name contact_number

ABC Corp 9876543210

XYZ Ltd. 9123456780

Tech Solutions 8765432100

Innovatech 9988776655

Alpha Industries 9988123456

o Q.35

Get all manufacturers producing electric vehicles.

Datasets & Schemas


Manufacturers Table:
CREATE TABLE manufacturers (
manufacturer_id INT PRIMARY KEY,
manufacturer_name VARCHAR(100),
product_type VARCHAR(100)
);

Data Insertion:
INSERT INTO manufacturers VALUES
(1, 'Tata Motors', 'Electric Vehicle'),
(2, 'Mahindra', 'Diesel Vehicle'),
(3, 'Reva', 'Electric Vehicle');

Learning:


• To filter manufacturers producing electric vehicles, we will use a WHERE clause to match the product_type column with the value 'Electric Vehicle'.
• WHERE clause is used to filter records based on specific conditions.

Answer:
SELECT manufacturer_name
FROM manufacturers
WHERE product_type = 'Electric Vehicle';

Explanation:

• SELECT manufacturer_name: This selects the manufacturer_name column from the manufacturers table.
• FROM manufacturers: Specifies the table from which to retrieve data.
• WHERE product_type = 'Electric Vehicle': Filters the results to
include only those rows where the product_type is 'Electric
Vehicle'.

Output Example:
manufacturer_name

Tata Motors

Reva

o Q.36

Retrieve the top 3 highest-paid employees from the employees table.

Datasets & Schemas


Employees Table:
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
name VARCHAR(50),
salary DECIMAL(10,2)
);

Data Insertion:
INSERT INTO employees VALUES


(1, 'Amit', 90000),


(2, 'Kavya', 75000),
(3, 'Rahul', 60000);

Learning:

• To retrieve the top N highest-paid employees, we will use the ORDER BY clause to sort employees by salary in descending order.
• We can limit the results to the top 3 using the LIMIT keyword (in
MySQL and PostgreSQL) or TOP (in SQL Server).

Answer:

For MySQL or PostgreSQL:


SELECT emp_id, name, salary
FROM employees
ORDER BY salary DESC
LIMIT 3;

For SQL Server:


SELECT TOP 3 emp_id, name, salary
FROM employees
ORDER BY salary DESC;

Explanation:

• SELECT emp_id, name, salary: This selects the emp_id, name, and
salary columns from the employees table.
• ORDER BY salary DESC: Orders the employees by salary in
descending order (highest to lowest).
• LIMIT 3: Restricts the results to the top 3 highest-paid employees (for
MySQL/PostgreSQL).
• TOP 3: In SQL Server, the TOP keyword is used to fetch the first 3
rows.

Output Example:

emp_id name salary

1 Amit 90000

2 Kavya 75000

3 Rahul 60000


This query will retrieve the top 3 highest-paid employees from the employees
table.
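LIMIT and TOP cut off at exactly N rows, which can be arbitrary when salaries tie at the boundary. Where window functions are available (PostgreSQL, MySQL 8+, SQL Server), DENSE_RANK keeps every employee who shares a tied salary — a sketch:

```sql
-- DENSE_RANK gives tied salaries the same rank, so ties are not dropped
SELECT emp_id, name, salary
FROM (
    SELECT emp_id, name, salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
) ranked
WHERE rnk <= 3;
```

SQL Server's TOP 3 WITH TIES (combined with ORDER BY) offers a similar tie-aware behaviour.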

o Q.37

Find products that belong to the Electronics category.

Datasets & Schemas


Products Table:
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(100)
);

Data Insertion:
INSERT INTO products VALUES
(1, 'Smartphone', 'Electronics'),
(2, 'T-shirt', 'Clothing'),
(3, 'Laptop', 'Electronics');

Learning:

• To filter products belonging to the Electronics category, we will use the WHERE clause.
• We will specify that the category column must be equal to
'Electronics'.

Answer:
SELECT product_id, product_name, category
FROM products
WHERE category = 'Electronics';

Explanation:

• SELECT product_id, product_name, category: This selects the columns product_id, product_name, and category from the products table.
• WHERE category = 'Electronics': This filters the products,
returning only those where the category is 'Electronics'.

Output Example:

product_id product_name category


1 Smartphone Electronics

3 Laptop Electronics

This query will return all the products that belong to the Electronics category
from the products table.

o Q.38

List employees who work in the IT department.

Datasets & Schemas


Departments Table:
CREATE TABLE departments (
department_id INT,
department_name VARCHAR(100)
);

Employees Table:
CREATE TABLE employees (
employee_id INT,
employee_name VARCHAR(100),
department_id INT
);

Data Insertion:
INSERT INTO departments VALUES
(1, 'HR'),
(2, 'Finance'),
(3, 'IT'),
(4, 'Marketing');

INSERT INTO employees VALUES


(1, 'Ravi Gupta', 1),
(2, 'Nisha Patil', 3),
(3, 'Amit Shah', 2);

Learning:

• To find employees working in the IT department, we need to join the employees table with the departments table using the department_id.
• We can filter the result based on the department_name being 'IT'.

Answer:
SELECT e.employee_id, e.employee_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE d.department_name = 'IT';


Explanation:

• SELECT e.employee_id, e.employee_name: This selects the employee_id and employee_name columns from the employees table.
• JOIN departments d ON e.department_id = d.department_id:
This joins the employees table with the departments table on the
department_id column to get the department names.
• WHERE d.department_name = 'IT': This filters the results to only
include employees who belong to the IT department.

Output Example:

employee_id employee_name

2 Nisha Patil

This query will return the list of employees who are working in the IT
department.
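The same filter can be written without a join, using a subquery against departments — a sketch; the join form is usually preferred when you also need columns from the departments table in the output:

```sql
-- Subquery alternative: no join needed when only employee columns are returned
SELECT employee_id, employee_name
FROM employees
WHERE department_id IN (
    SELECT department_id
    FROM departments
    WHERE department_name = 'IT'
);
```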

o Q.39

Find all orders from customers in Bangalore.

Datasets & Schemas


Customers Table:
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100),
city VARCHAR(100)
);

Orders Table:
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT
);

Data Insertion:
INSERT INTO customers VALUES
(1, 'Rajesh', 'Bangalore'),
(2, 'Aditi', 'Mumbai');

INSERT INTO orders VALUES


(1, 1),
(2, 2);


Learning:

• To find orders placed by customers from a specific city, we need to join the customers and orders tables.
• The WHERE clause will filter the results to include only those customers
whose city is 'Bangalore'.

Answer:
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.city = 'Bangalore';

Explanation:

• SELECT o.order_id, c.customer_name: This selects the order_id from the orders table and the customer_name from the customers table.
• JOIN customers c ON o.customer_id = c.customer_id: This
performs an inner join between the orders table and the customers
table based on the customer_id.
• WHERE c.city = 'Bangalore': This filters the results to include only
those customers who are from Bangalore.

Output Example:

order_id customer_name

1 Rajesh

o Q.40

Retrieve the names of all employees who are also managers. In other words,
find employees who appear as managers in the manager_id column.

Datasets & Schemas


Employees Table:
DROP TABLE IF EXISTS employees;

CREATE TABLE employees (


emp_id INT PRIMARY KEY,
name VARCHAR(100),
manager_id INT,


FOREIGN KEY (manager_id) REFERENCES employees(emp_id)


);

Data Insertion:
INSERT INTO employees (emp_id, name, manager_id)
VALUES
(1, 'John Doe', NULL),
(2, 'Jane Smith', 1),
(3, 'Alice Johnson', 1),
(4, 'Bob Brown', 3),
(5, 'Emily White', NULL),
(6, 'Michael Lee', 3),
(7, 'David Clark', NULL),
(8, 'Sarah Davis', 2),
(9, 'Kevin Wilson', 2),
(10, 'Laura Martinez', 4);

Learning:

• To find employees who are also managers, we need to check for employees who appear as a manager_id for other employees.
• This means looking for distinct emp_id values in the manager_id
column.
• We can do this with a subquery that collects all manager_id values, or equivalently with a self-join.

Answer:
SELECT DISTINCT e.name
FROM employees e
WHERE e.emp_id IN (SELECT DISTINCT manager_id FROM employees WHERE
manager_id IS NOT NULL);

Explanation:

• SELECT DISTINCT e.name: This retrieves the distinct names of employees who are managers.
• FROM employees e: We are querying the employees table (alias e).
• WHERE e.emp_id IN (SELECT DISTINCT manager_id FROM
employees WHERE manager_id IS NOT NULL): This subquery finds
all unique manager_id values from the employees table where the
manager_id is not NULL. The emp_id of these employees is then used
to filter the main query.
• DISTINCT ensures that we only retrieve unique names of employees
who are managers.
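The self-join version mentioned in the Learning section looks like this — joining employees to itself on manager_id, which some interviewers ask for explicitly:

```sql
-- Self-join: m is the manager row, e is an employee reporting to m
SELECT DISTINCT m.name
FROM employees m
JOIN employees e ON e.manager_id = m.emp_id;
```

With the sample data, both forms return John Doe, Jane Smith, Alice Johnson, and Bob Brown.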

• Medium
o Q.41

Find the total revenue generated by each product category for Flipkart.

Explanation:


In this question, we need to calculate the total revenue generated by each product category. The revenue for each product can be calculated as the price of the product multiplied by the quantity sold. The challenge is to group the results by product category and sum the revenue for each group.

The solution requires:

1. Joining the products table (which contains product information) and the orders table (which contains information about the quantity ordered for each product).

2. Using GROUP BY to group the results by product category.

3. Using SUM to calculate the total revenue for each category.

Datasets and SQL Schemas:

Tables and Data


-- Products table
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(50),
category VARCHAR(50),
price DECIMAL(10, 2)
);

-- Orders table
CREATE TABLE orders (
order_id INT PRIMARY KEY,
product_id INT,
quantity INT,
FOREIGN KEY (product_id) REFERENCES products(product_id)
);

-- Sample data for products


INSERT INTO products (product_id, product_name, category, price)
VALUES
(1, 'Smartphone', 'Electronics', 15000.00),
(2, 'Shoes', 'Footwear', 2000.00),
(3, 'Laptop', 'Electronics', 50000.00),
(4, 'T-shirt', 'Clothing', 500.00),
(5, 'Headphones', 'Electronics', 1500.00);

-- Sample data for orders


INSERT INTO orders (order_id, product_id, quantity)
VALUES
(1, 1, 3),
(2, 2, 4),
(3, 3, 2),
(4, 4, 10),
(5, 5, 5);

Learnings:

• JOIN: We learn how to use JOIN to combine data from two related
tables based on a common column (product_id).
• GROUP BY: We understand how to group data by a column (in this
case, the category) to perform aggregate operations.


• SUM(): We use the SUM function to calculate the total revenue by multiplying the price of each product by its quantity sold and summing them up for each category.

This problem reinforces the basics of SQL aggregation and joins.

Solutions:

PostgreSQL Solution:
The solution works the same way in PostgreSQL because the query relies on
standard SQL functions (JOIN, SUM, GROUP BY), which are supported in both
PostgreSQL and MySQL.
-- PostgreSQL Solution
SELECT p.category,
SUM(p.price * o.quantity) AS total_revenue
FROM products p
JOIN orders o ON p.product_id = o.product_id
GROUP BY p.category;

MySQL Solution:
Similarly, the same solution will work in MySQL.
-- MySQL Solution
SELECT p.category,
SUM(p.price * o.quantity) AS total_revenue
FROM products p
JOIN orders o ON p.product_id = o.product_id
GROUP BY p.category;
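The same JOIN + GROUP BY query can be verified with SQLite against this question's sample data (a sketch; the numbers below follow directly from the inserts above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT, price REAL)")
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER)")
cur.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", [
    (1, 'Smartphone', 'Electronics', 15000.00), (2, 'Shoes', 'Footwear', 2000.00),
    (3, 'Laptop', 'Electronics', 50000.00), (4, 'T-shirt', 'Clothing', 500.00),
    (5, 'Headphones', 'Electronics', 1500.00),
])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, 3), (2, 2, 4), (3, 3, 2), (4, 4, 10), (5, 5, 5),
])

# Revenue per category = SUM(price * quantity) over the join.
revenue = dict(cur.execute(
    "SELECT p.category, SUM(p.price * o.quantity) "
    "FROM products p JOIN orders o ON p.product_id = o.product_id "
    "GROUP BY p.category"))
print(revenue)
```

Electronics contributes 3×15000 + 2×50000 + 5×1500 = 152500, Footwear 4×2000 = 8000, Clothing 10×500 = 5000.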

o Q.42

Find the highest-paid employee in each department for Infosys.

Explanation:
We need to find the employee with the highest salary in each department. A
subquery with MAX(salary) can help retrieve the highest salary for each
department, and then we can join this result with the original table to get the
employee details.

Datasets and SQL Schemas:

Tables and Data


-- Employees table
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(50),
department VARCHAR(50),
salary DECIMAL(10, 2)
);

-- Sample data for employees


INSERT INTO employees (employee_id, name, department, salary)
VALUES
(1, 'Ankit', 'HR', 50000.00),


(2, 'Riya', 'Engineering', 100000.00),
(3, 'Vivek', 'HR', 60000.00),
(4, 'Sara', 'Engineering', 120000.00),
(5, 'Neha', 'Finance', 70000.00);

Learnings:

• Subqueries: How to use a subquery to find the maximum salary for each department.
• Correlation: Understanding correlated subqueries where the subquery refers to the outer query (department in this case).

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT e.department, e.name, e.salary
FROM employees e
WHERE e.salary = (
SELECT MAX(salary)
FROM employees
WHERE department = e.department
);

MySQL Solution:
-- MySQL Solution
SELECT e.department, e.name, e.salary
FROM employees e
WHERE e.salary = (
SELECT MAX(salary)
FROM employees
WHERE department = e.department
);
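The correlated subquery can be checked quickly with SQLite (a sketch on this question's sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL)")
cur.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, 'Ankit', 'HR', 50000.00), (2, 'Riya', 'Engineering', 100000.00),
    (3, 'Vivek', 'HR', 60000.00), (4, 'Sara', 'Engineering', 120000.00),
    (5, 'Neha', 'Finance', 70000.00),
])

# For each row, the subquery re-computes MAX(salary) for that row's department.
rows = sorted(cur.execute(
    "SELECT e.department, e.name, e.salary FROM employees e "
    "WHERE e.salary = (SELECT MAX(salary) FROM employees "
    "                  WHERE department = e.department)"))
print(rows)
```

Sara tops Engineering, Neha is alone in Finance, and Vivek out-earns Ankit in HR.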

o Q.43

List all customers who placed orders worth more than the average order value
on Swiggy.

Explanation:
We need to find customers who have placed orders with a total_amount
greater than the average order value. First, we calculate the average order
value, then we filter customers whose order values are above this average.

Datasets and SQL Schemas:

Tables and Data


-- Customers table
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(50)
);

-- Orders table
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,


total_amount DECIMAL(10, 2),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Sample data for customers


INSERT INTO customers (customer_id, name)
VALUES
(1, 'Rahul'),
(2, 'Priya'),
(3, 'Arjun'),
(4, 'Meera'),
(5, 'Kiran');

-- Sample data for orders


INSERT INTO orders (order_id, customer_id, total_amount)
VALUES
(1, 1, 500.00),
(2, 2, 800.00),
(3, 3, 600.00),
(4, 4, 700.00),
(5, 5, 300.00);

Learnings:

• JOIN: How to join two tables (customers and orders) to retrieve relevant information.
• Subquery: Using a subquery to calculate the average order value.
• Comparison: Filtering records based on a comparison with the average.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT c.name, o.total_amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.total_amount > (
SELECT AVG(total_amount)
FROM orders
);

MySQL Solution:
-- MySQL Solution
SELECT c.name, o.total_amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.total_amount > (
SELECT AVG(total_amount)
FROM orders
);
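With this question's sample data the average order value is (500 + 800 + 600 + 700 + 300) / 5 = 580, which the query below confirms in SQLite (a sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total_amount REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [
    (1, 'Rahul'), (2, 'Priya'), (3, 'Arjun'), (4, 'Meera'), (5, 'Kiran'),
])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, 500.00), (2, 2, 800.00), (3, 3, 600.00), (4, 4, 700.00), (5, 5, 300.00),
])

# Customers whose order exceeds the overall average order value (580).
rows = sorted(cur.execute(
    "SELECT c.name, o.total_amount "
    "FROM customers c JOIN orders o ON c.customer_id = o.customer_id "
    "WHERE o.total_amount > (SELECT AVG(total_amount) FROM orders)"))
print(rows)
```

Only Priya (800), Meera (700) and Arjun (600) sit above the 580 average.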

o Q.44

Find all cities where TCS has more than 3 employees.

Explanation:
To solve this, we need to:


1. Group the employees by city.

2. Count the number of employees in each city.

3. Filter cities where the count of employees is greater than 3.

Datasets and SQL Schemas:

Tables and Data


-- Employees table
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(50),
company VARCHAR(50),
city VARCHAR(50)
);

-- Sample data for employees


INSERT INTO employees (employee_id, name, company, city)
VALUES
(1, 'Amit', 'TCS', 'Mumbai'),
(2, 'Riya', 'TCS', 'Mumbai'),
(3, 'Karan', 'TCS', 'Pune'),
(4, 'Sara', 'TCS', 'Mumbai'),
(5, 'Neha', 'TCS', 'Pune'),
(6, 'Raj', 'TCS', 'Hyderabad'),
(7, 'Meena', 'TCS', 'Mumbai');

Learnings:

• GROUP BY: Grouping data by city to aggregate the number of employees.
• HAVING: Filtering the results after aggregation (i.e., cities with more than 3 employees).

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT city
FROM employees
WHERE company = 'TCS'
GROUP BY city
HAVING COUNT(employee_id) > 3;

MySQL Solution:
-- MySQL Solution
SELECT city
FROM employees
WHERE company = 'TCS'
GROUP BY city
HAVING COUNT(employee_id) > 3;

o Q.45

List all sellers on Amazon who sold more than 5 different products.


Explanation:
We need to:

1. Count the number of distinct products sold by each seller.

2. Filter sellers who have sold more than 5 different products.

Datasets and SQL Schemas:

Tables and Data


-- Sellers table
CREATE TABLE sellers (
seller_id INT PRIMARY KEY,
name VARCHAR(50)
);

-- Products table
CREATE TABLE products (
product_id INT PRIMARY KEY,
seller_id INT,
name VARCHAR(50),
FOREIGN KEY (seller_id) REFERENCES sellers(seller_id)
);

-- Sample data for sellers


INSERT INTO sellers (seller_id, name)
VALUES
(1, 'Seller A'),
(2, 'Seller B'),
(3, 'Seller C');

-- Sample data for products


INSERT INTO products (product_id, seller_id, name)
VALUES
(1, 1, 'Laptop'),
(2, 1, 'Mouse'),
(3, 1, 'Keyboard'),
(4, 1, 'Monitor'),
(5, 1, 'Speaker'),
(6, 1, 'Tablet'),
(7, 2, 'Shoes'),
(8, 2, 'T-shirt'),
(9, 3, 'Headphones');

Learnings:

• JOIN: Joining the sellers and products tables to link product sales
to sellers.
• COUNT(DISTINCT): Counting distinct products sold by each seller.
• HAVING: Filtering the results to only show sellers who sold more
than 5 distinct products.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT s.name
FROM sellers s
JOIN products p ON s.seller_id = p.seller_id
GROUP BY s.seller_id
HAVING COUNT(DISTINCT p.product_id) > 5;


MySQL Solution:
-- MySQL Solution
SELECT s.name
FROM sellers s
JOIN products p ON s.seller_id = p.seller_id
GROUP BY s.seller_id
HAVING COUNT(DISTINCT p.product_id) > 5;

o Q.46

Find the most ordered product on Zomato.

Explanation:
We need to identify which product has the highest total quantity ordered. To
do this, we can use GROUP BY to group the orders by product and then use
SUM(quantity) to calculate the total quantity for each product. Finally, we
can sort the results and pick the top product.

Datasets and SQL Schemas:

Tables and Data


-- Orders table
CREATE TABLE orders (
order_id INT PRIMARY KEY,
product_name VARCHAR(50),
quantity INT
);

-- Sample data for orders


INSERT INTO orders (order_id, product_name, quantity)
VALUES
(1, 'Pizza', 5),
(2, 'Burger', 3),
(3, 'Pizza', 7),
(4, 'Pasta', 2),
(5, 'Pizza', 6),
(6, 'Burger', 4);

Learnings:

• GROUP BY: Grouping the data by product_name to calculate the total quantity for each product.
• SUM(): Summing the quantity for each product.
• ORDER BY: Sorting the result to find the product with the highest quantity.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT product_name
FROM orders
GROUP BY product_name
ORDER BY SUM(quantity) DESC
LIMIT 1;


MySQL Solution:
-- MySQL Solution
SELECT product_name
FROM orders
GROUP BY product_name
ORDER BY SUM(quantity) DESC
LIMIT 1;

o Q.47

Find all customers who have accounts in both SBI and ICICI.

Explanation:
To solve this, we need to find customers who appear with both 'SBI' and
'ICICI' in the accounts table. This can be done using a JOIN or a GROUP BY
approach with a HAVING clause to ensure that each customer has accounts in
both banks.

Datasets and SQL Schemas:

Tables and Data


-- Accounts table
CREATE TABLE accounts (
customer_id INT,
bank_name VARCHAR(50)
);

-- Sample data for accounts


INSERT INTO accounts (customer_id, bank_name)
VALUES
(1, 'SBI'),
(2, 'ICICI'),
(3, 'SBI'),
(1, 'ICICI'),
(4, 'HDFC'),
(3, 'ICICI');

Learnings:

• GROUP BY: Grouping by customer_id to check each customer's bank accounts.
• HAVING: Filtering customers who have accounts in both 'SBI' and 'ICICI'.
• COUNT DISTINCT: Using COUNT(DISTINCT bank_name) to ensure the customer has accounts in both banks.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT customer_id
FROM accounts
WHERE bank_name IN ('SBI', 'ICICI')
GROUP BY customer_id
HAVING COUNT(DISTINCT bank_name) = 2;


MySQL Solution:
-- MySQL Solution
SELECT customer_id
FROM accounts
WHERE bank_name IN ('SBI', 'ICICI')
GROUP BY customer_id
HAVING COUNT(DISTINCT bank_name) = 2;
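The COUNT(DISTINCT ...) = 2 trick can be demonstrated in SQLite (a sketch on this question's data; customers 1 and 3 are the only ones holding both banks):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE accounts (customer_id INTEGER, bank_name TEXT)")
cur.executemany("INSERT INTO accounts VALUES (?, ?)", [
    (1, 'SBI'), (2, 'ICICI'), (3, 'SBI'),
    (1, 'ICICI'), (4, 'HDFC'), (3, 'ICICI'),
])

# Keep only SBI/ICICI rows, then require both distinct banks per customer.
both = [row[0] for row in cur.execute(
    "SELECT customer_id FROM accounts "
    "WHERE bank_name IN ('SBI', 'ICICI') "
    "GROUP BY customer_id "
    "HAVING COUNT(DISTINCT bank_name) = 2 "
    "ORDER BY customer_id")]
print(both)  # [1, 3]
```

DISTINCT matters here: without it, a customer with two SBI accounts and no ICICI account would slip through.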

o Q.48

Find the employee(s) with the second-highest salary in Infosys.

Explanation:
To find the employee(s) with the second-highest salary, we can use a subquery
to first identify the highest salary, then another query to find the maximum
salary that is less than the highest salary. Finally, we can use that result to
filter out the employee(s) with the second-highest salary.

Datasets and SQL Schemas:

Tables and Data


-- Employees table
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(50),
salary DECIMAL(10, 2)
);

-- Sample data for employees


INSERT INTO employees (employee_id, name, salary)
VALUES
(1, 'Amit', 50000.00),
(2, 'Riya', 60000.00),
(3, 'Vivek', 80000.00),
(4, 'Sara', 70000.00),
(5, 'Neha', 60000.00);

Learnings:

• Subquery: Using a subquery to find the highest and second-highest salary.
• WHERE: Filtering the employee(s) based on the second-highest salary.
• MAX(): Finding the second-largest value by excluding the highest salary.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT name, salary
FROM employees
WHERE salary = (
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees)


);

MySQL Solution:
-- MySQL Solution
SELECT name, salary
FROM employees
WHERE salary = (
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees)
);
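The nested-MAX pattern can be verified with SQLite (a sketch; the top salary here is 80000, so the second-highest is 70000 and only Sara matches):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
cur.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, 'Amit', 50000.00), (2, 'Riya', 60000.00), (3, 'Vivek', 80000.00),
    (4, 'Sara', 70000.00), (5, 'Neha', 60000.00),
])

# Second-highest salary = MAX of the salaries strictly below the overall MAX.
second = cur.execute(
    "SELECT name, salary FROM employees "
    "WHERE salary = (SELECT MAX(salary) FROM employees "
    "                WHERE salary < (SELECT MAX(salary) FROM employees))").fetchall()
print(second)  # [('Sara', 70000.0)]
```

Note that ties are handled naturally: if two employees shared the 70000 salary, both rows would be returned.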

o Q.49

Find all movies released after 2015 with a rating higher than the average rating
of all movies.

Explanation:
To solve this:

1. First, calculate the average rating of all movies using AVG(rating).

2. Then, filter movies that were released after 2015 and have a rating
higher than the average rating.

Datasets and SQL Schemas:

Tables and Data


-- Movies table
CREATE TABLE movies (
movie_id INT PRIMARY KEY,
name VARCHAR(50),
release_year INT,
rating DECIMAL(3, 1)
);

-- Sample data for movies


INSERT INTO movies (movie_id, name, release_year, rating)
VALUES
(1, 'Movie A', 2014, 8.2),
(2, 'Movie B', 2016, 7.5),
(3, 'Movie C', 2018, 8.8),
(4, 'Movie D', 2020, 7.9),
(5, 'Movie E', 2013, 6.5);

Learnings:

• AVG(): Calculating the average rating of all movies.
• WHERE: Filtering by release_year and rating.
• Subquery: Using a subquery to compare the movie ratings with the overall average rating.

Solutions:

PostgreSQL Solution:


-- PostgreSQL Solution
SELECT name, release_year, rating
FROM movies
WHERE release_year > 2015
AND rating > (SELECT AVG(rating) FROM movies);

MySQL Solution:
-- MySQL Solution
SELECT name, release_year, rating
FROM movies
WHERE release_year > 2015
AND rating > (SELECT AVG(rating) FROM movies);

o Q.50

Find the total number of transactions done per day by Paytm, sorted in
descending order of the number of transactions.

Explanation:
We need to:

1. Filter the transactions to only include those made by Paytm.

2. Group the transactions by transaction_date.

3. Count the number of transactions per day.

4. Sort the result in descending order of the transaction count.

Datasets and SQL Schemas:

Tables and Data


-- Transactions table
CREATE TABLE transactions (
transaction_id INT PRIMARY KEY,
company VARCHAR(50),
transaction_date DATE
);

-- Sample data for transactions


INSERT INTO transactions (transaction_id, company, transaction_date)
VALUES
(1, 'Paytm', '2024-12-01'),
(2, 'Paytm', '2024-12-01'),
(3, 'Google Pay', '2024-12-01'),
(4, 'Paytm', '2024-12-02'),
(5, 'Paytm', '2024-12-02'),
(6, 'PhonePe', '2024-12-02');

Learnings:

• GROUP BY: Grouping transactions by transaction_date to count the transactions per day.
• COUNT(): Counting the total number of transactions for each day.
• WHERE: Filtering the data for Paytm only.
• ORDER BY: Sorting the results in descending order based on the transaction count.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT transaction_date, COUNT(transaction_id) AS total_transactions
FROM transactions
WHERE company = 'Paytm'
GROUP BY transaction_date
ORDER BY total_transactions DESC;

MySQL Solution:
-- MySQL Solution
SELECT transaction_date, COUNT(transaction_id) AS total_transactions
FROM transactions
WHERE company = 'Paytm'
GROUP BY transaction_date
ORDER BY total_transactions DESC;

o Q.51

Find the top 3 most profitable companies in each industry.

Explanation:
To solve this:

1. Rank the companies within each industry based on profit.

2. Use the ROW_NUMBER() window function to assign a rank to each company within its respective industry.

3. Filter out companies whose rank is greater than 3 to get only the top 3 most profitable companies in each industry.

Datasets and SQL Schemas:

Tables and Data


-- Companies table
CREATE TABLE companies (
company_id INT PRIMARY KEY,
name VARCHAR(50),
industry VARCHAR(50),
revenue DECIMAL(15, 2),
profit DECIMAL(15, 2)
);

-- Sample data for companies


INSERT INTO companies (company_id, name, industry, revenue, profit)
VALUES
(1, 'Apple', 'Technology', 365000000000, 94680000000),
(2, 'Microsoft', 'Technology', 198000000000, 72900000000),
(3, 'Amazon', 'E-commerce', 469800000000, 33240000000),
(4, 'Tesla', 'Automotive', 53800000000, 5563000000),
(5, 'Google', 'Technology', 282000000000, 76000000000),
(6, 'Walmart', 'Retail', 572800000000, 15000000000);


Learnings:

• ROW_NUMBER(): Assigning ranks to companies within each industry based on their profit.
• PARTITION BY: Partitioning the data by industry to rank companies within each industry.
• ORDER BY: Ordering the data by profit in descending order to get the most profitable companies first.
• WHERE: Filtering to keep only the top 3 companies in each industry.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
WITH RankedCompanies AS (
    SELECT name, industry, profit,
           ROW_NUMBER() OVER (PARTITION BY industry ORDER BY profit DESC) AS rnk
    FROM companies
)
SELECT name, industry, profit
FROM RankedCompanies
WHERE rnk <= 3;

MySQL Solution:
-- MySQL Solution (MySQL 8.0+; RANK is a reserved word there, so the alias rnk avoids quoting)
WITH RankedCompanies AS (
    SELECT name, industry, profit,
           ROW_NUMBER() OVER (PARTITION BY industry ORDER BY profit DESC) AS rnk
    FROM companies
)
SELECT name, industry, profit
FROM RankedCompanies
WHERE rnk <= 3;
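The CTE can be run as-is in SQLite 3.25+ (which added window functions). In the sketch below the alias rnk is used for the row number, sidestepping RANK, which is a reserved word in MySQL 8:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT, industry TEXT, revenue REAL, profit REAL)")
cur.executemany("INSERT INTO companies VALUES (?, ?, ?, ?, ?)", [
    (1, 'Apple', 'Technology', 365000000000, 94680000000),
    (2, 'Microsoft', 'Technology', 198000000000, 72900000000),
    (3, 'Amazon', 'E-commerce', 469800000000, 33240000000),
    (4, 'Tesla', 'Automotive', 53800000000, 5563000000),
    (5, 'Google', 'Technology', 282000000000, 76000000000),
    (6, 'Walmart', 'Retail', 572800000000, 15000000000),
])

# ROW_NUMBER() restarts at 1 for each industry partition.
rows = list(cur.execute("""
    WITH RankedCompanies AS (
        SELECT name, industry, profit,
               ROW_NUMBER() OVER (PARTITION BY industry ORDER BY profit DESC) AS rnk
        FROM companies
    )
    SELECT name, industry, profit FROM RankedCompanies WHERE rnk <= 3
"""))
tech = sorted(name for name, industry, profit in rows if industry == 'Technology')
print(tech)  # ['Apple', 'Google', 'Microsoft']
```

With only six sample companies every row survives the rnk <= 3 filter, but the Technology partition shows the ranking at work: Apple, Google, Microsoft in profit order.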

o Q.52

Calculate the average revenue and profit for each sector and list sectors where
the average profit exceeds $10 billion.

Explanation:
To solve this:

1. GROUP BY the industry to aggregate data by sector.

2. Calculate the average revenue and average profit for each sector
using AVG().

3. Use a HAVING clause to filter the sectors where the average profit
exceeds $10 billion.

Datasets and SQL Schemas:


Tables and Data


-- Companies table
CREATE TABLE companies (
company_id INT PRIMARY KEY,
name VARCHAR(50),
industry VARCHAR(50),
revenue DECIMAL(15, 2),
profit DECIMAL(15, 2)
);

-- Sample data for companies


INSERT INTO companies (company_id, name, industry, revenue, profit)
VALUES
(1, 'Apple', 'Technology', 365000000000, 94680000000),
(2, 'Microsoft', 'Technology', 198000000000, 72900000000),
(3, 'Amazon', 'E-commerce', 469800000000, 33240000000),
(4, 'Tesla', 'Automotive', 53800000000, 5563000000),
(5, 'Google', 'Technology', 282000000000, 76000000000),
(6, 'Walmart', 'Retail', 572800000000, 15000000000);

Learnings:

• GROUP BY: Aggregating data by sector (industry).
• AVG(): Calculating average values for both revenue and profit.
• HAVING: Filtering results to include only sectors where the average profit exceeds $10 billion.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT industry,
AVG(revenue) AS avg_revenue,
AVG(profit) AS avg_profit
FROM companies
GROUP BY industry
HAVING AVG(profit) > 10000000000;

MySQL Solution:
-- MySQL Solution
SELECT industry,
AVG(revenue) AS avg_revenue,
AVG(profit) AS avg_profit
FROM companies
GROUP BY industry
HAVING AVG(profit) > 10000000000;

o Q.53

Find the company with the second-highest revenue in the Technology sector.

Explanation:
To solve this:

1. Use a subquery to find the company with the highest revenue in the Technology sector.

2. Exclude this company and then use another query to find the maximum revenue again to get the second-highest revenue in the same sector.

Datasets and SQL Schemas:

Tables and Data


-- Companies table
CREATE TABLE companies (
company_id INT PRIMARY KEY,
name VARCHAR(50),
industry VARCHAR(50),
revenue DECIMAL(15, 2),
profit DECIMAL(15, 2)
);

-- Sample data for companies


INSERT INTO companies (company_id, name, industry, revenue, profit)
VALUES
(1, 'Apple', 'Technology', 365000000000, 94680000000),
(2, 'Microsoft', 'Technology', 198000000000, 72900000000),
(3, 'Amazon', 'E-commerce', 469800000000, 33240000000),
(4, 'Tesla', 'Automotive', 53800000000, 5563000000),
(5, 'Google', 'Technology', 282000000000, 76000000000),
(6, 'Walmart', 'Retail', 572800000000, 15000000000);

Learnings:

• Subquery: A subquery is used to find the highest revenue in the Technology sector, which is then excluded from the main query.
• WHERE with <: Excluding the company with the highest revenue by keeping only rows with a lower revenue.
• MAX(): Finding the maximum revenue among the remaining companies to get the second-highest revenue.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT name, revenue
FROM companies
WHERE industry = 'Technology'
  AND revenue = (
    SELECT MAX(revenue)
    FROM companies
    WHERE industry = 'Technology'
      AND revenue < (SELECT MAX(revenue) FROM companies WHERE industry = 'Technology')
);

MySQL Solution:
-- MySQL Solution
SELECT name, revenue
FROM companies
WHERE industry = 'Technology'
  AND revenue = (
    SELECT MAX(revenue)
    FROM companies
    WHERE industry = 'Technology'
      AND revenue < (SELECT MAX(revenue) FROM companies WHERE industry = 'Technology')
);


o Q.54

List all employees of Google who earn above the average salary of employees
in the Technology sector.

Explanation:
To solve this:

1. First, calculate the average salary of employees in the Technology sector.

2. Then, select the employees from Google whose salary is greater than the calculated average salary.

Datasets and SQL Schemas:

Tables and Data


-- Employees table
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(50),
company VARCHAR(50),
sector VARCHAR(50),
salary DECIMAL(15, 2)
);

-- Sample data for employees


INSERT INTO employees (employee_id, name, company, sector, salary)
VALUES
(1, 'Alice', 'Google', 'Technology', 200000.00),
(2, 'Bob', 'Google', 'Technology', 180000.00),
(3, 'Charlie', 'Microsoft', 'Technology', 150000.00),
(4, 'Dave', 'Amazon', 'E-commerce', 170000.00),
(5, 'Eve', 'Google', 'Technology', 220000.00);

Learnings:

• AVG(): Calculating the average salary for the Technology sector.
• WHERE: Filtering employees of Google whose salary is greater than the calculated average.
• Subquery: Using a subquery to calculate the average salary in the Technology sector.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT name, salary
FROM employees
WHERE company = 'Google'
AND salary > (
SELECT AVG(salary)
FROM employees
WHERE sector = 'Technology'
);


MySQL Solution:
-- MySQL Solution
SELECT name, salary
FROM employees
WHERE company = 'Google'
AND salary > (
SELECT AVG(salary)
FROM employees
WHERE sector = 'Technology'
);

o Q.55

Find all companies that generate more than 10% of the total revenue of their
respective industry.

Explanation:
To solve this:

1. First, calculate the total revenue for each industry.

2. Then, for each company, check if their revenue is more than 10% of the total revenue of their respective industry.

3. Use a JOIN or a subquery to compare each company's revenue with the industry's total revenue.

Datasets and SQL Schemas:

Tables and Data (Updated to include company revenues)


-- Companies table
CREATE TABLE companies (
company_id INT PRIMARY KEY,
name VARCHAR(50),
industry VARCHAR(50),
revenue DECIMAL(15, 2)
);

-- Sample data for companies


INSERT INTO companies (company_id, name, industry, revenue)
VALUES
(1, 'Apple', 'Technology', 365000000000),
(2, 'Microsoft', 'Technology', 198000000000),
(3, 'Amazon', 'E-commerce', 469800000000),
(4, 'Tesla', 'Automotive', 53800000000),
(5, 'Google', 'Technology', 282000000000),
(6, 'Walmart', 'Retail', 572800000000);

Learnings:

• Correlated subquery with SUM(): To calculate the total revenue of each company's industry on the fly.
• WHERE: To keep only companies whose revenue exceeds 10% of their industry's total revenue.


Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT name, industry, revenue
FROM companies c
WHERE revenue > 0.1 * (
SELECT SUM(revenue)
FROM companies
WHERE industry = c.industry
);

MySQL Solution:
-- MySQL Solution
SELECT name, industry, revenue
FROM companies c
WHERE revenue > 0.1 * (
SELECT SUM(revenue)
FROM companies
WHERE industry = c.industry
);

o Q.56

List all products sold by Amazon that generate more than 15% of Amazon's
total sales.

Explanation:
To solve this:

1. Calculate Amazon's total sales by summing up the sales of all products sold by Amazon.

2. For each product sold by Amazon, check if its sales exceed 15% of the total sales.

3. Use a subquery to calculate Amazon's total sales and filter products based on this threshold.

Datasets and SQL Schemas:

Tables and Data (Corrected to include products and sales)


-- Products table
CREATE TABLE products (
product_id INT PRIMARY KEY,
name VARCHAR(50),
company VARCHAR(50),
price DECIMAL(15, 2)
);

-- Sales table
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
product_id INT,
quantity INT,


FOREIGN KEY (product_id) REFERENCES products(product_id)
);

-- Sample data for products and sales


INSERT INTO products (product_id, name, company, price)
VALUES
(1, 'Laptop', 'Amazon', 1500),
(2, 'Smartphone', 'Amazon', 800),
(3, 'Tablet', 'Amazon', 400),
(4, 'Headphones', 'Amazon', 100);

INSERT INTO sales (sale_id, product_id, quantity)
VALUES
(1, 1, 100), -- Laptop sales
(2, 2, 200), -- Smartphone sales
(3, 3, 150), -- Tablet sales
(4, 4, 50); -- Headphones sales

Learnings:

• JOIN: Join the products and sales tables to get the sales details for
each product.
• SUM(): Calculate the total sales for each product and the overall total
sales for Amazon.
• Subquery: Use a subquery to calculate Amazon's total sales, and then
filter products whose total sales exceed 15% of this total.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT p.name,
SUM(s.quantity * p.price) AS product_sales
FROM products p
JOIN sales s ON p.product_id = s.product_id
WHERE p.company = 'Amazon'
GROUP BY p.product_id, p.name
HAVING SUM(s.quantity * p.price) > 0.15 * (
SELECT SUM(s2.quantity * p2.price)
FROM products p2
JOIN sales s2 ON p2.product_id = s2.product_id
WHERE p2.company = 'Amazon'
);

MySQL Solution:
-- MySQL Solution
SELECT p.name,
SUM(s.quantity * p.price) AS product_sales
FROM products p
JOIN sales s ON p.product_id = s.product_id
WHERE p.company = 'Amazon'
GROUP BY p.product_id, p.name
HAVING SUM(s.quantity * p.price) > 0.15 * (
SELECT SUM(s2.quantity * p2.price)
FROM products p2
JOIN sales s2 ON p2.product_id = s2.product_id
WHERE p2.company = 'Amazon'
);
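The HAVING comparison against a 15% threshold can be checked with SQLite (a sketch; Amazon's total here is 150000 + 160000 + 60000 + 5000 = 375000, so the cutoff is 56250):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, company TEXT, price REAL)")
cur.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER)")
cur.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", [
    (1, 'Laptop', 'Amazon', 1500), (2, 'Smartphone', 'Amazon', 800),
    (3, 'Tablet', 'Amazon', 400), (4, 'Headphones', 'Amazon', 100),
])
cur.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, 1, 100), (2, 2, 200), (3, 3, 150), (4, 4, 50),
])

# Products whose sales exceed 15% of Amazon's total sales.
result = dict(cur.execute("""
    SELECT p.name, SUM(s.quantity * p.price) AS product_sales
    FROM products p JOIN sales s ON p.product_id = s.product_id
    WHERE p.company = 'Amazon'
    GROUP BY p.product_id, p.name
    HAVING SUM(s.quantity * p.price) > 0.15 * (
        SELECT SUM(s2.quantity * p2.price)
        FROM products p2 JOIN sales s2 ON p2.product_id = s2.product_id
        WHERE p2.company = 'Amazon')
"""))
print(result)
```

Headphones (5000 in sales) falls below the 56250 cutoff; the other three products pass.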

o Q.57


Find the total number of employees working in each sector and list sectors
with more than 1 million employees.

Explanation:
To solve this:

1. GROUP BY: Group the employees by their sector to calculate the total number of employees in each sector.

2. HAVING: Filter the sectors where the number of employees exceeds 1 million.

3. COUNT(): Use the COUNT() function to count the number of employees in each sector.

Datasets and SQL Schemas:

Tables and Data


-- Employees table
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(50),
company VARCHAR(50),
sector VARCHAR(50),
salary DECIMAL(15, 2)
);

-- Sample data for employees


INSERT INTO employees (employee_id, name, company, sector, salary)
VALUES
(1, 'Alice', 'Google', 'Technology', 200000.00),
(2, 'Bob', 'Google', 'Technology', 180000.00),
(3, 'Charlie', 'Microsoft', 'Technology', 150000.00),
(4, 'Dave', 'Amazon', 'E-commerce', 170000.00),
(5, 'Eve', 'Google', 'Technology', 220000.00);

Learnings:

• GROUP BY: To group employees by sector and count the number of employees per sector.
• HAVING: To filter out sectors with more than 1 million employees.
• COUNT(): To count the number of employees in each sector.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT sector, COUNT(employee_id) AS total_employees
FROM employees
GROUP BY sector
HAVING COUNT(employee_id) > 1000000;

MySQL Solution:
-- MySQL Solution
SELECT sector, COUNT(employee_id) AS total_employees
FROM employees


GROUP BY sector
HAVING COUNT(employee_id) > 1000000;

o Q.58

Identify the company that has the highest employee-to-revenue ratio.

Explanation:
To solve this:

1. Employee-to-revenue ratio: This can be calculated by dividing the number of employees by the revenue for each company.

2. Find the company with the maximum employee-to-revenue ratio.

3. Use ORDER BY to sort the companies by this ratio and then select the top one.

Datasets and SQL Schemas:

Tables and Data


-- Companies table
CREATE TABLE companies (
company_id INT PRIMARY KEY,
name VARCHAR(50),
revenue DECIMAL(15, 2),
employees INT
);

-- Sample data for companies


INSERT INTO companies (company_id, name, revenue, employees)
VALUES
(1, 'Apple', 365000000000, 147000),
(2, 'Walmart', 572800000000, 2300000),
(3, 'Amazon', 469800000000, 1600000),
(4, 'Tesla', 53800000000, 110000),
(5, 'Google', 282000000000, 156500);

Learnings:

• Employee-to-revenue ratio: This is calculated by dividing the number of employees by the total revenue.
• ORDER BY: To sort companies based on the employee-to-revenue ratio in descending order.
• LIMIT: To select only the company with the highest ratio.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT name, revenue, employees,
       (employees / revenue) AS employee_to_revenue_ratio
FROM companies
ORDER BY employee_to_revenue_ratio DESC
LIMIT 1;


MySQL Solution:
-- MySQL Solution
SELECT name, revenue, employees,
       (employees / revenue) AS employee_to_revenue_ratio
FROM companies
ORDER BY employee_to_revenue_ratio DESC
LIMIT 1;
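A SQLite sketch of the ratio query follows. One caveat worth knowing: in the book's schemas revenue is DECIMAL, so PostgreSQL/MySQL divide exactly, but SQLite's `/` between two integers is integer division, so the sketch multiplies by 1.0 to force a real-valued ratio:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT, revenue INTEGER, employees INTEGER)")
cur.executemany("INSERT INTO companies VALUES (?, ?, ?, ?)", [
    (1, 'Apple', 365000000000, 147000),
    (2, 'Walmart', 572800000000, 2300000),
    (3, 'Amazon', 469800000000, 1600000),
    (4, 'Tesla', 53800000000, 110000),
    (5, 'Google', 282000000000, 156500),
])

# employees * 1.0 avoids SQLite's integer division truncating every ratio to 0.
top = cur.execute(
    "SELECT name, (employees * 1.0 / revenue) AS ratio "
    "FROM companies ORDER BY ratio DESC LIMIT 1").fetchone()
print(top[0])  # Walmart
```

Walmart's 2.3 million employees against 572.8 billion in revenue give it the highest headcount per revenue dollar of the five.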

o Q.59

Find the total sales for the top 5 performing products of Apple.

Explanation:
To solve this:

1. Identify top 5 performing products: You can calculate the performance of a product based on its total sales, which would typically be calculated by multiplying product price by quantity sold.

2. Calculate total sales: After identifying the top 5 products based on sales performance, sum their total sales.

Datasets and SQL Schemas:

Tables and Data (Updated to include products and sales)


-- Products table
CREATE TABLE products (
product_id INT PRIMARY KEY,
name VARCHAR(50),
company VARCHAR(50),
price DECIMAL(15, 2)
);

-- Sales table
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
product_id INT,
quantity INT,
FOREIGN KEY (product_id) REFERENCES products(product_id)
);

-- Sample data for products and sales


INSERT INTO products (product_id, name, company, price)
VALUES
(1, 'iPhone', 'Apple', 1000),
(2, 'MacBook Pro', 'Apple', 2500),
(3, 'iPad', 'Apple', 500),
(4, 'Apple Watch', 'Apple', 400),
(5, 'AirPods', 'Apple', 150),
(6, 'iMac', 'Apple', 1800),
(7, 'iPhone 13', 'Apple', 1200),
(8, 'Apple TV', 'Apple', 200);

INSERT INTO sales (sale_id, product_id, quantity)
VALUES
(1, 1, 5000), -- iPhone
(2, 2, 3000), -- MacBook Pro
(3, 3, 7000), -- iPad
(4, 4, 10000), -- Apple Watch
(5, 5, 12000), -- AirPods
(6, 6, 2000), -- iMac


(7, 7, 4000), -- iPhone 13
(8, 8, 8000); -- Apple TV

Learnings:

• JOIN: To join the products and sales tables based on product_id to get product prices and quantities sold.
• SUM() with GROUP BY: To aggregate total sales per product (price × quantity sold) before ranking.
• ORDER BY: To order products by their total sales.
• LIMIT: To restrict the results to the top 5 performing products.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT p.name,
       SUM(s.quantity * p.price) AS total_sales
FROM products p
JOIN sales s ON p.product_id = s.product_id
WHERE p.company = 'Apple'
GROUP BY p.product_id, p.name
ORDER BY total_sales DESC
LIMIT 5;

MySQL Solution:
-- MySQL Solution
SELECT p.name,
       SUM(s.quantity * p.price) AS total_sales
FROM products p
JOIN sales s ON p.product_id = s.product_id
WHERE p.company = 'Apple'
GROUP BY p.product_id, p.name
ORDER BY total_sales DESC
LIMIT 5;
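A grouped variant (summing sales per product before ranking, which stays correct even if a product has several sale rows) can be checked in SQLite against this question's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, company TEXT, price REAL)")
cur.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER)")
cur.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", [
    (1, 'iPhone', 'Apple', 1000), (2, 'MacBook Pro', 'Apple', 2500),
    (3, 'iPad', 'Apple', 500), (4, 'Apple Watch', 'Apple', 400),
    (5, 'AirPods', 'Apple', 150), (6, 'iMac', 'Apple', 1800),
    (7, 'iPhone 13', 'Apple', 1200), (8, 'Apple TV', 'Apple', 200),
])
cur.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, 1, 5000), (2, 2, 3000), (3, 3, 7000), (4, 4, 10000),
    (5, 5, 12000), (6, 6, 2000), (7, 7, 4000), (8, 8, 8000),
])

# Total sales per product, highest first, top 5 only.
top5 = list(cur.execute(
    "SELECT p.name, SUM(s.quantity * p.price) AS total_sales "
    "FROM products p JOIN sales s ON p.product_id = s.product_id "
    "WHERE p.company = 'Apple' "
    "GROUP BY p.product_id, p.name "
    "ORDER BY total_sales DESC LIMIT 5"))
print(top5)
```

MacBook Pro leads at 2500 × 3000 = 7.5M, followed by iPhone (5M), iPhone 13 (4.8M), Apple Watch (4M) and iMac (3.6M).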

o Q.60

List all industries where at least 3 companies have profits above the industry
average.

Explanation:
To solve this:

1. Calculate the industry average profit: For each industry, calculate the average profit of all companies within that industry.

2. Compare each company's profit: Find companies that have profits greater than the industry average.

3. Count the companies per industry: For each industry, count how many companies have profits greater than the industry average.

4. Filter industries: Only include industries where at least 3 companies have profits above the average.

1000+ SQL Interview Questions & Answers | By Zero Analyst

Datasets and SQL Schemas:

Tables and Data


-- Companies table
CREATE TABLE companies (
company_id INT PRIMARY KEY,
name VARCHAR(50),
industry VARCHAR(50),
revenue DECIMAL(15, 2),
profit DECIMAL(15, 2),
employees INT
);

-- Sample data for companies

INSERT INTO companies (company_id, name, industry, revenue, profit, employees)
VALUES
(1, 'Apple', 'Technology', 365000000000, 94680000000, 147000),
(2, 'Microsoft', 'Technology', 198000000000, 72900000000, 150000),
(3, 'Amazon', 'E-commerce', 469800000000, 33240000000, 1600000),
(4, 'Tesla', 'Automotive', 53800000000, 5563000000, 110000),
(5, 'Google', 'Technology', 282000000000, 76000000000, 156500),
(6, 'Walmart', 'Retail', 572800000000, 15000000000, 2300000);

Learnings:

• CTE / Subquery: To calculate the average profit for each industry.
• HAVING: To filter out industries where fewer than 3 companies have profits above the industry average.
• COUNT(): To count the number of companies in each industry with profits above the average.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
WITH IndustryAvgProfit AS (
SELECT industry, AVG(profit) AS avg_profit
FROM companies
GROUP BY industry
)
SELECT c.industry
FROM companies c
JOIN IndustryAvgProfit iap ON c.industry = iap.industry
WHERE c.profit > iap.avg_profit
GROUP BY c.industry
HAVING COUNT(c.company_id) >= 3;

MySQL Solution:
-- MySQL Solution
WITH IndustryAvgProfit AS (
SELECT industry, AVG(profit) AS avg_profit
FROM companies
GROUP BY industry
)
SELECT c.industry
FROM companies c
JOIN IndustryAvgProfit iap ON c.industry = iap.industry
WHERE c.profit > iap.avg_profit
GROUP BY c.industry
HAVING COUNT(c.company_id) >= 3;
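Worth noting: with the six sample rows above, no industry actually qualifies — Technology is the only industry with three companies, and only Apple's profit beats the Technology average. An illustrative sqlite3 sketch (not part of the book's solutions) confirms the empty result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id INT PRIMARY KEY, name TEXT, industry TEXT,
                        revenue REAL, profit REAL, employees INT);
INSERT INTO companies VALUES
  (1, 'Apple', 'Technology', 365000000000, 94680000000, 147000),
  (2, 'Microsoft', 'Technology', 198000000000, 72900000000, 150000),
  (3, 'Amazon', 'E-commerce', 469800000000, 33240000000, 1600000),
  (4, 'Tesla', 'Automotive', 53800000000, 5563000000, 110000),
  (5, 'Google', 'Technology', 282000000000, 76000000000, 156500),
  (6, 'Walmart', 'Retail', 572800000000, 15000000000, 2300000);
""")
rows = conn.execute("""
    WITH IndustryAvgProfit AS (
        SELECT industry, AVG(profit) AS avg_profit
        FROM companies
        GROUP BY industry
    )
    SELECT c.industry
    FROM companies c
    JOIN IndustryAvgProfit iap ON c.industry = iap.industry
    WHERE c.profit > iap.avg_profit
    GROUP BY c.industry
    HAVING COUNT(c.company_id) >= 3;
""").fetchall()
print(rows)  # [] -- no industry has 3 companies above its own average
```

An empty result here is not a bug: the query logic is correct, the sample data is simply too small for any industry to pass the HAVING filter.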


o Q.61

Find the year with the highest number of new patents filed by Microsoft.

Explanation:
To solve this:

1. Group patents by year: First, group the patents by filing year for
Microsoft.

2. Count patents per year: For each year, count the number of patents
filed.

3. Identify the year with the maximum count: Use the ORDER BY clause
to order the years based on the count of patents in descending order
and use LIMIT 1 to get the year with the highest number.

Datasets and SQL Schemas:

Tables and Data


-- Patents table
CREATE TABLE patents (
patent_id INT PRIMARY KEY,
company VARCHAR(50),
filing_year INT
);

-- Sample data for patents


INSERT INTO patents (patent_id, company, filing_year)
VALUES
(1, 'Microsoft', 2021),
(2, 'Microsoft', 2020),
(3, 'Microsoft', 2020),
(4, 'Apple', 2021),
(5, 'Microsoft', 2021);

Learnings:

• GROUP BY: To group patents by filing year for Microsoft.


• COUNT(): To count the number of patents filed in each year.
• ORDER BY: To order the results by the count of patents in
descending order.
• LIMIT: To get only the year with the highest number of patents.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT filing_year, COUNT(*) AS num_patents
FROM patents
WHERE company = 'Microsoft'
GROUP BY filing_year
ORDER BY num_patents DESC
LIMIT 1;


MySQL Solution:
-- MySQL Solution
SELECT filing_year, COUNT(*) AS num_patents
FROM patents
WHERE company = 'Microsoft'
GROUP BY filing_year
ORDER BY num_patents DESC
LIMIT 1;
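One caveat with the sample data: Microsoft filed two patents in 2020 and two in 2021, so ORDER BY ... LIMIT 1 breaks the tie arbitrarily. An illustrative sqlite3 sketch shows the tie:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patents (patent_id INT PRIMARY KEY, company TEXT, filing_year INT);
INSERT INTO patents VALUES
  (1, 'Microsoft', 2021), (2, 'Microsoft', 2020), (3, 'Microsoft', 2020),
  (4, 'Apple', 2021), (5, 'Microsoft', 2021);
""")
rows = conn.execute("""
    SELECT filing_year, COUNT(*) AS num_patents
    FROM patents
    WHERE company = 'Microsoft'
    GROUP BY filing_year
    ORDER BY num_patents DESC
    LIMIT 1;
""").fetchall()
print(rows)  # one row: either (2020, 2) or (2021, 2) -- the tie is broken arbitrarily
```

Adding a secondary sort key (e.g. ORDER BY num_patents DESC, filing_year DESC) makes the result deterministic, which interviewers often expect you to point out.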

o Q.62

List all companies whose profit margin (profit/revenue) exceeds the average
margin across all companies.

Explanation:
To solve this:

1. Calculate the profit margin: For each company, calculate the profit
margin as the ratio of profit to revenue.

2. Calculate the average profit margin: Compute the average profit margin across all companies.

3. Filter companies: Find companies whose profit margin exceeds the calculated average.

Datasets and SQL Schemas:

Tables and Data


-- Companies table
CREATE TABLE companies (
company_id INT PRIMARY KEY,
name VARCHAR(50),
revenue DECIMAL(15, 2),
profit DECIMAL(15, 2)
);

-- Sample data for companies


INSERT INTO companies (company_id, name, revenue, profit)
VALUES
(1, 'Apple', 365000000000, 94680000000),
(2, 'Microsoft', 198000000000, 72900000000),
(3, 'Amazon', 469800000000, 33240000000),
(4, 'Tesla', 53800000000, 5563000000),
(5, 'Google', 282000000000, 76000000000),
(6, 'Walmart', 572800000000, 15000000000);

Learnings:

• Profit Margin Calculation: Profit margin is computed as profit / revenue.
• Subquery: Used to calculate the average profit margin across all companies.
• WHERE: Filters for companies whose profit margin is above the average.


Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
WITH AverageMargin AS (
SELECT AVG(profit / revenue) AS avg_margin
FROM companies
)
SELECT name
FROM companies
WHERE (profit / revenue) > (SELECT avg_margin FROM AverageMargin);

MySQL Solution:
-- MySQL Solution
WITH AverageMargin AS (
SELECT AVG(profit / revenue) AS avg_margin
FROM companies
)
SELECT name
FROM companies
WHERE (profit / revenue) > (SELECT avg_margin FROM AverageMargin);
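In PostgreSQL and MySQL the DECIMAL division above yields a fraction directly. SQLite, used in this illustrative sketch, truncates integer division, so the values are seeded with REAL affinity to keep the arithmetic fractional:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id INT PRIMARY KEY, name TEXT,
                        revenue REAL, profit REAL);
INSERT INTO companies VALUES
  (1, 'Apple', 365000000000, 94680000000),
  (2, 'Microsoft', 198000000000, 72900000000),
  (3, 'Amazon', 469800000000, 33240000000),
  (4, 'Tesla', 53800000000, 5563000000),
  (5, 'Google', 282000000000, 76000000000),
  (6, 'Walmart', 572800000000, 15000000000);
""")
rows = conn.execute("""
    WITH AverageMargin AS (
        SELECT AVG(profit / revenue) AS avg_margin
        FROM companies
    )
    SELECT name
    FROM companies
    WHERE (profit / revenue) > (SELECT avg_margin FROM AverageMargin);
""").fetchall()
print(rows)
```

The average margin works out to roughly 18.3%, so Apple (25.9%), Microsoft (36.8%) and Google (27.0%) qualify.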

o Q.63

Identify the city where Tesla has the maximum number of sales.

Explanation:
To solve this:

1. Group sales by city: First, group the sales data by city for Tesla.

2. Sum units sold per city: Calculate the total number of units sold in
each city.

3. Identify the city with the maximum sales: Use ORDER BY to sort the
cities based on total units sold in descending order and limit the result
to the top city.

Datasets and SQL Schemas:

Tables and Data


-- Sales table
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
company VARCHAR(50),
city VARCHAR(50),
units_sold INT
);

-- Sample data for sales


INSERT INTO sales (sale_id, company, city, units_sold)
VALUES
(1, 'Tesla', 'Los Angeles', 1000),
(2, 'Tesla', 'New York', 1200),
(3, 'Tesla', 'San Francisco', 1500),
(4, 'Tesla', 'Chicago', 900);


Learnings:

• GROUP BY: Group sales data by city.


• SUM(): Calculate the total number of units sold in each city.
• ORDER BY: Sort the cities based on the sum of units sold in
descending order.
• LIMIT: Retrieve only the city with the maximum sales.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT city
FROM sales
WHERE company = 'Tesla'
GROUP BY city
ORDER BY SUM(units_sold) DESC
LIMIT 1;

MySQL Solution:
-- MySQL Solution
SELECT city
FROM sales
WHERE company = 'Tesla'
GROUP BY city
ORDER BY SUM(units_sold) DESC
LIMIT 1;
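The same query can be checked against the sample rows with an illustrative sqlite3 sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INT PRIMARY KEY, company TEXT, city TEXT, units_sold INT);
INSERT INTO sales VALUES
  (1, 'Tesla', 'Los Angeles', 1000),   (2, 'Tesla', 'New York', 1200),
  (3, 'Tesla', 'San Francisco', 1500), (4, 'Tesla', 'Chicago', 900);
""")
rows = conn.execute("""
    SELECT city
    FROM sales
    WHERE company = 'Tesla'
    GROUP BY city
    ORDER BY SUM(units_sold) DESC
    LIMIT 1;
""").fetchall()
print(rows)  # [('San Francisco',)]
```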

o Q.64

List all employees who earn above the 90th percentile in their company.

Explanation:
To solve this:

1. Percentile Calculation: Calculate the 90th percentile salary for each company.

2. Filter Employees: Identify employees whose salaries exceed the 90th percentile in their respective company.

3. Per-Company Percentiles: In PostgreSQL this is done with the ordered-set aggregate PERCENTILE_CONT(); in MySQL it can be approximated with the NTILE() window function.

Datasets and SQL Schemas:

Tables and Data


-- Employee table (Company, Salary)
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(50),
    company VARCHAR(50),
    salary DECIMAL(15, 2)
);

-- Sample data for employees with their respective companies and salaries
INSERT INTO employees (employee_id, name, company, salary)
VALUES
(1, 'Alice', 'Apple', 200000.00),
(2, 'Bob', 'Apple', 180000.00),
(3, 'Charlie', 'Apple', 150000.00),
(4, 'Dave', 'Apple', 250000.00),
(5, 'Eve', 'Google', 220000.00),
(6, 'Frank', 'Google', 190000.00),
(7, 'Grace', 'Google', 170000.00),
(8, 'Hank', 'Google', 210000.00);

Learnings:

• Ordered-Set Aggregate (PostgreSQL): PERCENTILE_CONT() calculates the 90th percentile salary within each company.
• NTILE (MySQL): Divide the data into 100 buckets to approximate percentiles and select employees above the 90th.
• WHERE Clause: Filter employees whose salary is greater than the 90th percentile for their respective company.

Solutions:

PostgreSQL Solution:
PostgreSQL has built-in support for percentile calculation using
PERCENTILE_CONT(). It is an ordered-set aggregate rather than a window
function, so compute it per company with GROUP BY and join back:
-- PostgreSQL Solution
WITH Percentiles AS (
    SELECT company,
           PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY salary) AS percentile_90
    FROM employees
    GROUP BY company
)
SELECT e.name, e.company, e.salary
FROM employees e
JOIN Percentiles p ON e.company = p.company
WHERE e.salary > p.percentile_90;

MySQL Solution:
MySQL doesn't have PERCENTILE_CONT(), but you can approximate percentiles
with NTILE(). Order ascending so that bucket numbers above 90 correspond to
the top 10% (meaningful only when each partition holds at least 100 rows):
-- MySQL Solution
SELECT name, company, salary
FROM (
    SELECT
        e.name,
        e.company,
        e.salary,
        NTILE(100) OVER (PARTITION BY e.company ORDER BY e.salary ASC) AS percentile_rank
    FROM employees e
) AS RankedEmployees
WHERE percentile_rank > 90;


Explanation:

• PostgreSQL: PERCENTILE_CONT(0.9) computes the 90th percentile salary for each company; joining it back to employees filters those who earn more than that percentile.
• MySQL: NTILE(100) splits each company's employees into 100 buckets by ascending salary, and we keep those in the top 10% (bucket number greater than 90).
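The interpolation PERCENTILE_CONT performs is easy to reproduce in plain Python, which helps verify which employees the solutions should return. The percentile_cont helper below is a hypothetical stand-in that mirrors PostgreSQL's linear-interpolation semantics:

```python
def percentile_cont(values, p):
    """Linear interpolation between closest ranks, as PERCENTILE_CONT does."""
    xs = sorted(values)
    k = (len(xs) - 1) * p
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

# The chapter's sample employees, grouped by company.
employees = {
    "Apple":  [("Alice", 200000), ("Bob", 180000), ("Charlie", 150000), ("Dave", 250000)],
    "Google": [("Eve", 220000), ("Frank", 190000), ("Grace", 170000), ("Hank", 210000)],
}

above = set()
for company, staff in employees.items():
    p90 = percentile_cont([salary for _, salary in staff], 0.9)
    above |= {name for name, salary in staff if salary > p90}

print(above)  # {'Dave', 'Eve'} -- the 90th percentiles are 235,000 and 217,000
```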
o Q.65

Rank the top 3 industries by total profit and list the companies contributing to
those profits.

Explanation:
To solve this:

1. Group by industry: Group companies by their industry to calculate the total profit for each industry.

2. Sum profits: Calculate the total profit for each industry by summing the profits of all companies in the industry.

3. Rank industries: Rank the industries based on total profit in descending order.

4. List companies: For the top 3 industries, list the companies contributing to those profits.

Datasets and SQL Schemas:

Tables and Data


-- Companies table (Revenue, Profit, Industry)
CREATE TABLE companies (
company_id INT PRIMARY KEY,
name VARCHAR(50),
industry VARCHAR(50),
revenue DECIMAL(15, 2),
profit DECIMAL(15, 2)
);

-- Sample data for companies with profits and industries


INSERT INTO companies (company_id, name, industry, revenue, profit)
VALUES
(1, 'Apple', 'Technology', 365000000000, 94680000000),
(2, 'Microsoft', 'Technology', 198000000000, 72900000000),
(3, 'Amazon', 'E-commerce', 469800000000, 33240000000),
(4, 'Tesla', 'Automotive', 53800000000, 5563000000),
(5, 'Google', 'Technology', 282000000000, 76000000000),
(6, 'Walmart', 'Retail', 572800000000, 15000000000);

Learnings:

• GROUP BY: Group companies by their industry.


• SUM(): Calculate the total profit for each industry.


• ORDER BY: Rank industries based on their total profit in descending order.
• JOIN: Join the companies to their respective industry ranks to list all companies in the top 3 industries.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
WITH IndustryProfit AS (
SELECT industry, SUM(profit) AS total_profit
FROM companies
GROUP BY industry
ORDER BY total_profit DESC
LIMIT 3
)
SELECT c.industry, c.name, c.profit
FROM companies c
JOIN IndustryProfit ip ON c.industry = ip.industry
ORDER BY ip.total_profit DESC, c.profit DESC;

MySQL Solution:
-- MySQL Solution
WITH IndustryProfit AS (
SELECT industry, SUM(profit) AS total_profit
FROM companies
GROUP BY industry
ORDER BY total_profit DESC
LIMIT 3
)
SELECT c.industry, c.name, c.profit
FROM companies c
JOIN IndustryProfit ip ON c.industry = ip.industry
ORDER BY ip.total_profit DESC, c.profit DESC;
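SQLite also accepts an ORDER BY/LIMIT inside a CTE, so the ranking can be checked locally. With the sample data, Automotive (Tesla) is the industry that drops out. This sketch is illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id INT PRIMARY KEY, name TEXT, industry TEXT,
                        revenue REAL, profit REAL);
INSERT INTO companies VALUES
  (1, 'Apple', 'Technology', 365000000000, 94680000000),
  (2, 'Microsoft', 'Technology', 198000000000, 72900000000),
  (3, 'Amazon', 'E-commerce', 469800000000, 33240000000),
  (4, 'Tesla', 'Automotive', 53800000000, 5563000000),
  (5, 'Google', 'Technology', 282000000000, 76000000000),
  (6, 'Walmart', 'Retail', 572800000000, 15000000000);
""")
rows = conn.execute("""
    WITH IndustryProfit AS (
        SELECT industry, SUM(profit) AS total_profit
        FROM companies
        GROUP BY industry
        ORDER BY total_profit DESC
        LIMIT 3
    )
    SELECT c.industry, c.name, c.profit
    FROM companies c
    JOIN IndustryProfit ip ON c.industry = ip.industry
    ORDER BY ip.total_profit DESC, c.profit DESC;
""").fetchall()
for industry, name, profit in rows:
    print(industry, name, profit)
```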

o Q.66

Find the company that has the highest revenue per employee in the Retail
sector.

Explanation:
To solve this:

1. Filter companies by the Retail sector: We are only interested in companies from the Retail sector.

2. Calculate revenue per employee: This is calculated as revenue / employees for each company.

3. Identify the company with the highest revenue per employee: Use ORDER BY to sort the companies based on the revenue per employee in descending order and select the top company.

Datasets and SQL Schemas:


Tables and Data


-- Companies table (Revenue and Employees)
CREATE TABLE companies (
company_id INT PRIMARY KEY,
name VARCHAR(50),
revenue DECIMAL(15, 2),
employees INT,
sector VARCHAR(50)
);

-- Sample data for companies in different sectors


INSERT INTO companies (company_id, name, revenue, employees, sector)
VALUES
(1, 'Apple', 365000000000, 147000, 'Technology'),
(2, 'Walmart', 572800000000, 2300000, 'Retail'),
(3, 'Amazon', 469800000000, 1600000, 'E-commerce'),
(4, 'Tesla', 53800000000, 110000, 'Automotive'),
(5, 'Google', 282000000000, 156500, 'Technology'),
(6, 'Target', 78000000000, 400000, 'Retail');

Learnings:

• Revenue per employee: The ratio of a company's revenue to the number of employees.
• WHERE clause: To filter for companies in the Retail sector.
• ORDER BY: To sort companies based on the revenue per employee in descending order.
• LIMIT: To get the company with the highest revenue per employee.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT name, (revenue / employees) AS revenue_per_employee
FROM companies
WHERE sector = 'Retail'
ORDER BY revenue_per_employee DESC
LIMIT 1;

MySQL Solution:
-- MySQL Solution
SELECT name, (revenue / employees) AS revenue_per_employee
FROM companies
WHERE sector = 'Retail'
ORDER BY revenue_per_employee DESC
LIMIT 1;
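An illustrative sqlite3 check (values seeded as REAL so the division stays fractional) confirms that Walmart's roughly 249,000 of revenue per employee beats Target's 195,000:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id INT PRIMARY KEY, name TEXT,
                        revenue REAL, employees INT, sector TEXT);
INSERT INTO companies VALUES
  (1, 'Apple', 365000000000, 147000, 'Technology'),
  (2, 'Walmart', 572800000000, 2300000, 'Retail'),
  (3, 'Amazon', 469800000000, 1600000, 'E-commerce'),
  (4, 'Tesla', 53800000000, 110000, 'Automotive'),
  (5, 'Google', 282000000000, 156500, 'Technology'),
  (6, 'Target', 78000000000, 400000, 'Retail');
""")
rows = conn.execute("""
    SELECT name, (revenue / employees) AS revenue_per_employee
    FROM companies
    WHERE sector = 'Retail'
    ORDER BY revenue_per_employee DESC
    LIMIT 1;
""").fetchall()
print(rows)
```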

o Q.67

Identify the quarter in which Apple generated its highest revenue for 2024.

Explanation:
To solve this:

1. Filter the data for Apple: We are only interested in Apple's revenue.


2. Find the maximum revenue: Identify the highest revenue value for
Apple across the quarters.

3. Retrieve the quarter: Use a query to select the quarter corresponding to the highest revenue.

Datasets and SQL Schemas:

Tables and Data


-- Quarterly revenue table (Company, Quarter, Revenue)
CREATE TABLE quarterly_revenue (
id INT PRIMARY KEY,
company VARCHAR(50),
quarter VARCHAR(10),
revenue DECIMAL(15, 2)
);

-- Sample data for Apple's quarterly revenue in 2024


INSERT INTO quarterly_revenue (id, company, quarter, revenue)
VALUES
(1, 'Apple', 'Q1', 100000000000),
(2, 'Apple', 'Q2', 95000000000),
(3, 'Apple', 'Q3', 110000000000),
(4, 'Apple', 'Q4', 115000000000);

Learnings:

• MAX(): Used to find the highest value of revenue for a company.


• WHERE clause: To filter records only for Apple.
• ORDER BY: To arrange quarters based on revenue in descending
order and select the highest.

Solutions:

PostgreSQL Solution:
-- PostgreSQL Solution
SELECT quarter, revenue
FROM quarterly_revenue
WHERE company = 'Apple'
ORDER BY revenue DESC
LIMIT 1;

MySQL Solution:
-- MySQL Solution
SELECT quarter, revenue
FROM quarterly_revenue
WHERE company = 'Apple'
ORDER BY revenue DESC
LIMIT 1;
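A quick illustrative sqlite3 run of the same query against the sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quarterly_revenue (id INT PRIMARY KEY, company TEXT,
                                quarter TEXT, revenue REAL);
INSERT INTO quarterly_revenue VALUES
  (1, 'Apple', 'Q1', 100000000000), (2, 'Apple', 'Q2', 95000000000),
  (3, 'Apple', 'Q3', 110000000000), (4, 'Apple', 'Q4', 115000000000);
""")
rows = conn.execute("""
    SELECT quarter, revenue
    FROM quarterly_revenue
    WHERE company = 'Apple'
    ORDER BY revenue DESC
    LIMIT 1;
""").fetchall()
print(rows)  # Q4, with 115 billion
```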

o Q.68

Identify products from Amazon that had declining sales over the last 3
quarters.

Explanation:


To solve this:

1. Product Revenue Data: Assuming a similar table structure where sales or revenue data is tracked for products, we'll compare the sales of products for the last 3 quarters.

2. Declining Sales: To identify products with declining sales, we need to check if the sales for a product are consistently decreasing across the last three quarters.

3. Window Function: Use LAG() or self-joins to compare sales data across quarters for each product.

Datasets and SQL Schemas:

Tables and Data:


-- Table to track quarterly sales by product (company, product, quarter, revenue)
CREATE TABLE quarterly_revenue (
id INT PRIMARY KEY,
company VARCHAR(50),
product_name VARCHAR(50),
quarter VARCHAR(10),
revenue DECIMAL(15, 2)
);

-- Sample data: Quarterly revenue for products at Amazon

INSERT INTO quarterly_revenue (id, company, product_name, quarter, revenue)
VALUES
(1, 'Amazon', 'Laptop', 'Q1', 150000.00),
(2, 'Amazon', 'Laptop', 'Q2', 130000.00),
(3, 'Amazon', 'Laptop', 'Q3', 120000.00),
(4, 'Amazon', 'Smartphone', 'Q1', 250000.00),
(5, 'Amazon', 'Smartphone', 'Q2', 240000.00),
(6, 'Amazon', 'Smartphone', 'Q3', 230000.00),
(7, 'Amazon', 'Headphones', 'Q1', 50000.00),
(8, 'Amazon', 'Headphones', 'Q2', 52000.00),
(9, 'Amazon', 'Headphones', 'Q3', 51000.00);

Learnings:

• Self Join: You can join the table to itself to compare current quarter's
sales with previous quarters.
• Window Function (PostgreSQL): LAG() can help compare each
product's revenue with the previous quarter.
• Sales Decline: To detect a decline, check if the revenue in each quarter
is less than the previous quarter's revenue.

Solutions:

PostgreSQL Solution:
PostgreSQL has the LAG() function, which allows you to access the value of a
previous row within a window.
-- PostgreSQL Solution
WITH RevenueComparison AS (
    SELECT
        company,
        product_name,
        quarter,
        revenue,
        LAG(revenue, 1) OVER (PARTITION BY company, product_name ORDER BY quarter) AS prev_quarter_revenue,
        LAG(revenue, 2) OVER (PARTITION BY company, product_name ORDER BY quarter) AS prev_quarter2_revenue
    FROM quarterly_revenue
    WHERE company = 'Amazon'
)
SELECT
    company,
    product_name
FROM RevenueComparison
WHERE revenue < prev_quarter_revenue
  AND prev_quarter_revenue < prev_quarter2_revenue
GROUP BY company, product_name;

MySQL Solution:
In MySQL, you can use LAG() or JOIN to compare revenue for each product
across quarters.
-- MySQL Solution (using the LAG function, MySQL 8.0+)
WITH RevenueComparison AS (
    SELECT
        company,
        product_name,
        quarter,
        revenue,
        LAG(revenue, 1) OVER (PARTITION BY company, product_name ORDER BY quarter) AS prev_quarter_revenue,
        LAG(revenue, 2) OVER (PARTITION BY company, product_name ORDER BY quarter) AS prev_quarter2_revenue
    FROM quarterly_revenue
    WHERE company = 'Amazon'
)
SELECT
    company,
    product_name
FROM RevenueComparison
WHERE revenue < prev_quarter_revenue
  AND prev_quarter_revenue < prev_quarter2_revenue
GROUP BY company, product_name;

Explanation:

• PostgreSQL and MySQL (with LAG): The LAG() function helps compare the current quarter's revenue with the previous quarters' revenues for each product.
• Declining Sales Logic: We select the products where each quarter's revenue is lower than the previous quarter's revenue, indicating a consistent decline.
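SQLite 3.25+ supports LAG() with the same syntax, so the decline check can be verified locally. In this illustrative sketch the text quarters happen to sort correctly ('Q1' < 'Q2' < 'Q3'); real data should order by a proper date or year/quarter pair:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quarterly_revenue (id INT PRIMARY KEY, company TEXT,
                                product_name TEXT, quarter TEXT, revenue REAL);
INSERT INTO quarterly_revenue VALUES
  (1, 'Amazon', 'Laptop', 'Q1', 150000),     (2, 'Amazon', 'Laptop', 'Q2', 130000),
  (3, 'Amazon', 'Laptop', 'Q3', 120000),
  (4, 'Amazon', 'Smartphone', 'Q1', 250000), (5, 'Amazon', 'Smartphone', 'Q2', 240000),
  (6, 'Amazon', 'Smartphone', 'Q3', 230000),
  (7, 'Amazon', 'Headphones', 'Q1', 50000),  (8, 'Amazon', 'Headphones', 'Q2', 52000),
  (9, 'Amazon', 'Headphones', 'Q3', 51000);
""")
rows = conn.execute("""
    WITH RevenueComparison AS (
        SELECT company, product_name, quarter, revenue,
               LAG(revenue, 1) OVER (PARTITION BY company, product_name ORDER BY quarter) AS prev_q,
               LAG(revenue, 2) OVER (PARTITION BY company, product_name ORDER BY quarter) AS prev_q2
        FROM quarterly_revenue
        WHERE company = 'Amazon'
    )
    SELECT product_name
    FROM RevenueComparison
    WHERE revenue < prev_q AND prev_q < prev_q2
    GROUP BY product_name;
""").fetchall()
print(sorted(r[0] for r in rows))  # ['Laptop', 'Smartphone'] -- Headphones rose in Q2
```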

o Q.69

Find the total revenue and profit for each company for the last 5 years, sorted
by profit in descending order.


Explanation:
This query involves calculating the total revenue and profit for each company
over a period of time (the last 5 years). The goal is to sort the results by profit
in descending order. The data provided is assumed to be for quarterly revenue,
and we will also include profit calculation (assumed to be 20% of revenue).

Datasets and SQL Schemas:


CREATE TABLE quarterly_revenue (
id INT PRIMARY KEY,
company VARCHAR(50),
quarter VARCHAR(10),
revenue DECIMAL(15, 2),
profit DECIMAL(15, 2) -- Assuming 20% of revenue as profit
);

INSERT INTO quarterly_revenue (id, company, quarter, revenue, profit)
VALUES
(1, 'Apple', 'Q1', 100000000000, 20000000000),  -- 20% of revenue as profit
(2, 'Apple', 'Q2', 95000000000, 19000000000),
(3, 'Apple', 'Q3', 110000000000, 22000000000),
(4, 'Apple', 'Q4', 115000000000, 23000000000),
(5, 'Microsoft', 'Q1', 90000000000, 18000000000),
(6, 'Microsoft', 'Q2', 95000000000, 19000000000),
(7, 'Microsoft', 'Q3', 100000000000, 20000000000),
(8, 'Microsoft', 'Q4', 105000000000, 21000000000),
(9, 'Amazon', 'Q1', 80000000000, 16000000000),
(10, 'Amazon', 'Q2', 85000000000, 17000000000),
(11, 'Amazon', 'Q3', 90000000000, 18000000000),
(12, 'Amazon', 'Q4', 95000000000, 19000000000);

SQL Statement:
SELECT
company,
SUM(revenue) AS total_revenue,
SUM(profit) AS total_profit
FROM
quarterly_revenue
GROUP BY
company
ORDER BY
total_profit DESC;

Learnings:

• Aggregation: The SUM function is used to calculate total revenue and profit for each company.
• Sorting: The ORDER BY clause sorts the results based on total profit in
descending order.
• Assumed Profit Margin: The profit is calculated as a fixed percentage
(20%) of revenue in this case. This can vary in real-world scenarios.

Solutions:

• PostgreSQL and MySQL: The SQL statement provided works identically in both PostgreSQL and MySQL, as there are no database-specific differences for this query type.
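The aggregation can be checked end to end with an illustrative sqlite3 sketch; Apple should come first with 420 billion revenue and 84 billion profit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quarterly_revenue (id INT PRIMARY KEY, company TEXT,
                                quarter TEXT, revenue REAL, profit REAL);
INSERT INTO quarterly_revenue VALUES
  (1, 'Apple', 'Q1', 100000000000, 20000000000),
  (2, 'Apple', 'Q2', 95000000000, 19000000000),
  (3, 'Apple', 'Q3', 110000000000, 22000000000),
  (4, 'Apple', 'Q4', 115000000000, 23000000000),
  (5, 'Microsoft', 'Q1', 90000000000, 18000000000),
  (6, 'Microsoft', 'Q2', 95000000000, 19000000000),
  (7, 'Microsoft', 'Q3', 100000000000, 20000000000),
  (8, 'Microsoft', 'Q4', 105000000000, 21000000000),
  (9, 'Amazon', 'Q1', 80000000000, 16000000000),
  (10, 'Amazon', 'Q2', 85000000000, 17000000000),
  (11, 'Amazon', 'Q3', 90000000000, 18000000000),
  (12, 'Amazon', 'Q4', 95000000000, 19000000000);
""")
rows = conn.execute("""
    SELECT company, SUM(revenue) AS total_revenue, SUM(profit) AS total_profit
    FROM quarterly_revenue
    GROUP BY company
    ORDER BY total_profit DESC;
""").fetchall()
print(rows)
```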
o Q.70


List all companies whose revenue grew by more than 10% year-over-year
consistently for the past 3 years.

Explanation:
To identify companies whose revenue grew by more than 10% year-over-year
for the past 3 years, we need to calculate the year-over-year growth for each
company. We then check for those that consistently showed a growth rate
greater than 10% for each of the past three years.

Assumptions:

1. The data is broken down into quarters. For simplicity, we assume there
is sufficient data for the past 3 years, ideally having 4 quarters per
year.

2. Year-over-year growth is calculated based on comparing the same quarter in consecutive years.

Data (adjusted: a year column distinguishes the same quarter across years):

CREATE TABLE quarterly_revenue (
    id INT PRIMARY KEY,
    company VARCHAR(50),
    year INT,
    quarter VARCHAR(10),
    revenue DECIMAL(15, 2)
);

INSERT INTO quarterly_revenue (id, company, year, quarter, revenue)
VALUES
(1, 'Apple', 2023, 'Q1', 100000000000),
(2, 'Apple', 2023, 'Q2', 95000000000),
(3, 'Apple', 2023, 'Q3', 110000000000),
(4, 'Apple', 2023, 'Q4', 115000000000),
(5, 'Apple', 2024, 'Q1', 110000000000),      -- 10% growth from the previous year
(6, 'Apple', 2024, 'Q2', 105000000000),
(7, 'Apple', 2024, 'Q3', 121000000000),
(8, 'Apple', 2024, 'Q4', 125000000000),
(9, 'Microsoft', 2023, 'Q1', 50000000000),
(10, 'Microsoft', 2023, 'Q2', 51000000000),
(11, 'Microsoft', 2023, 'Q3', 52000000000),
(12, 'Microsoft', 2023, 'Q4', 53000000000),
(13, 'Microsoft', 2024, 'Q1', 60000000000),  -- 20% growth
(14, 'Microsoft', 2024, 'Q2', 61000000000),
(15, 'Microsoft', 2024, 'Q3', 62500000000),
(16, 'Microsoft', 2024, 'Q4', 63500000000);

SQL Statement:
WITH year_over_year_growth AS (
    SELECT
        a.company,
        a.year,
        a.quarter,
        a.revenue AS current_revenue,
        b.revenue AS previous_year_revenue,
        ((a.revenue - b.revenue) / b.revenue) * 100 AS growth_percentage
    FROM
        quarterly_revenue a
    JOIN
        quarterly_revenue b
    ON
        a.company = b.company
        AND a.quarter = b.quarter
        AND a.year = b.year + 1
)
SELECT
    company
FROM
    year_over_year_growth
GROUP BY
    company
HAVING
    COUNT(*) = 4                      -- one comparison per quarter (8 with three full years of data)
    AND MIN(growth_percentage) > 10;  -- consistent growth > 10% year-over-year

Explanation:

• Year-over-year growth: We calculate the revenue growth percentage for each quarter compared to the same quarter in the previous year.
• WITH clause: We use a WITH clause to first calculate the year-over-year growth for each company.
• Filtering: The HAVING clause ensures that only companies whose growth exceeded 10% in every quarter comparison are selected.

Learnings:

• Self Join: The query uses a self-join to compare each quarter with the
same quarter in the previous year.
• Growth Calculation: The growth is calculated as
(current_year_revenue - previous_year_revenue) /
previous_year_revenue * 100.
• Filtering Consistent Growth: The query ensures that all four quarters
of each year have a growth rate of more than 10%.
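A practical way to test the year-over-year join is to keep the year in its own INT column and join on a.year = b.year + 1 (EXTRACT only works on real date values, not on text labels like 'Q1'). The illustrative sqlite3 sketch below takes that approach with the sample figures; note that Apple's Q1 and Q3 growth is exactly 10% (not strictly greater), so only Microsoft qualifies:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quarterly_revenue (id INT PRIMARY KEY, company TEXT,
                                year INT, quarter TEXT, revenue REAL);
INSERT INTO quarterly_revenue VALUES
  (1, 'Apple', 2023, 'Q1', 100000000000),  (2, 'Apple', 2023, 'Q2', 95000000000),
  (3, 'Apple', 2023, 'Q3', 110000000000),  (4, 'Apple', 2023, 'Q4', 115000000000),
  (5, 'Apple', 2024, 'Q1', 110000000000),  (6, 'Apple', 2024, 'Q2', 105000000000),
  (7, 'Apple', 2024, 'Q3', 121000000000),  (8, 'Apple', 2024, 'Q4', 125000000000),
  (9, 'Microsoft', 2023, 'Q1', 50000000000),  (10, 'Microsoft', 2023, 'Q2', 51000000000),
  (11, 'Microsoft', 2023, 'Q3', 52000000000), (12, 'Microsoft', 2023, 'Q4', 53000000000),
  (13, 'Microsoft', 2024, 'Q1', 60000000000), (14, 'Microsoft', 2024, 'Q2', 61000000000),
  (15, 'Microsoft', 2024, 'Q3', 62500000000), (16, 'Microsoft', 2024, 'Q4', 63500000000);
""")
rows = conn.execute("""
    WITH yoy AS (
        SELECT a.company,
               ((a.revenue - b.revenue) / b.revenue) * 100 AS growth_pct
        FROM quarterly_revenue a
        JOIN quarterly_revenue b
          ON a.company = b.company
         AND a.quarter = b.quarter
         AND a.year = b.year + 1
    )
    SELECT company
    FROM yoy
    GROUP BY company
    HAVING COUNT(*) = 4 AND MIN(growth_pct) > 10;
""").fetchall()
print(rows)  # [('Microsoft',)] -- Apple's Q4 growth is only ~8.7%
```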

• Hard
o Q.71

Rank all employees at Deloitte based on their monthly performance score.

Assumptions:

1. The table provided only includes employee performance scores but does not have company information. We'll assume that all the data provided pertains to employees at Deloitte, as implied in the question.

2. We need to rank employees based on their performance score within their department or overall.

Data:
CREATE TABLE performance (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
department VARCHAR(50),
performance_score INT
);


INSERT INTO performance VALUES


(1, 'Alice Smith', 'Finance', 90),
(2, 'John Doe', 'Finance', 85),
(3, 'Emma Wilson', 'IT', 95),
(4, 'Liam Brown', 'IT', 89),
(5, 'Sophia Johnson', 'Finance', 87);

SQL Statement:
SELECT
employee_id,
employee_name,
department,
performance_score,
RANK() OVER (PARTITION BY department ORDER BY performance_score DESC)
AS rank
FROM
performance;

Explanation:

• RANK(): The RANK() function assigns a unique rank to each employee based on the order of the performance score in descending order.
• PARTITION BY: This clause divides the employees into departments, so the ranking is performed separately for each department (i.e., "Finance" and "IT").
• ORDER BY: The employees are ordered within their department by their performance score in descending order, meaning the highest score will receive rank 1.

Output (Example):

employee_id employee_name department performance_score rank

3 Emma Wilson IT 95 1

4 Liam Brown IT 89 2

1 Alice Smith Finance 90 1

5 Sophia Johnson Finance 87 2

2 John Doe Finance 85 3

Explanation of the Output:

• Employees in each department are ranked based on their performance score.
• For example, Emma Wilson in IT has the highest score, so she is ranked 1 in the IT department.


• The Finance department has a ranking for each employee with the
highest performer (Alice Smith) ranked 1.

Notes:

• If two employees have the same score, they will receive the same rank,
and the next rank will be skipped (e.g., if two people tie for rank 1, the
next rank will be 3, not 2).
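The example output can be reproduced with an illustrative sqlite3 sketch (the rank column is aliased rnk here, since rank is a reserved word in MySQL 8.0):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE performance (employee_id INT PRIMARY KEY, employee_name TEXT,
                          department TEXT, performance_score INT);
INSERT INTO performance VALUES
  (1, 'Alice Smith', 'Finance', 90), (2, 'John Doe', 'Finance', 85),
  (3, 'Emma Wilson', 'IT', 95),      (4, 'Liam Brown', 'IT', 89),
  (5, 'Sophia Johnson', 'Finance', 87);
""")
rows = conn.execute("""
    SELECT employee_name, department,
           RANK() OVER (PARTITION BY department ORDER BY performance_score DESC) AS rnk
    FROM performance
    ORDER BY department, rnk;
""").fetchall()
for name, dept, rnk in rows:
    print(dept, rnk, name)
```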
o Q.72

Write a SQL query to find the customer who made the most recent order.

Explanation:
To solve this problem, you need to find the customer who made the most
recent order. The solution requires joining the customers and orders tables,
then identifying the most recent order by selecting the maximum order_date.
The query will then return the customer who made that order.

Datasets and SQL Schemas:


-- Table creation and sample data
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100)
);

CREATE TABLE orders (


order_id INT PRIMARY KEY,
customer_id INT,
amount DECIMAL(10, 2),
order_date DATE,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Sample data insertions


INSERT INTO customers (customer_id, customer_name) VALUES
(1, 'Anjali'),
(2, 'Rohan'),
(3, 'Suresh'),
(4, 'Priya'),
(5, 'Rahul');

INSERT INTO orders (order_id, customer_id, amount, order_date) VALUES


(1, 1, 2500, '2023-01-01'),
(2, 2, 3000, '2023-01-02'),
(3, 1, 1500, '2023-02-03'),
(4, 3, 4000, '2023-02-12'),
(5, 1, 3000, '2023-01-05'),
(6, 2, 4500, '2023-01-06'),
(7, 4, 5000, '2023-01-07'),
(8, 5, 2000, '2023-01-08');

Learnings:

• Join: Combine data from multiple tables (customers and orders) based on a common field (customer_id).
• Aggregation and Sorting: Find the most recent order by using MAX() to get the latest order_date.
• Subquery: Use a subquery to identify the most recent order_date.

Solutions:
PostgreSQL Solution:
SELECT c.customer_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date = (SELECT MAX(order_date) FROM orders);

MySQL Solution:
SELECT c.customer_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date = (SELECT MAX(order_date) FROM orders);

In both PostgreSQL and MySQL, this query retrieves the customer who placed
the most recent order by matching the order_date to the maximum
order_date found in the orders table.
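Because the dates are in ISO format, they compare correctly even as text, so the query also works in an illustrative sqlite3 check:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INT PRIMARY KEY, customer_name TEXT);
CREATE TABLE orders (order_id INT PRIMARY KEY, customer_id INT,
                     amount REAL, order_date TEXT);
INSERT INTO customers VALUES
  (1, 'Anjali'), (2, 'Rohan'), (3, 'Suresh'), (4, 'Priya'), (5, 'Rahul');
INSERT INTO orders VALUES
  (1, 1, 2500, '2023-01-01'), (2, 2, 3000, '2023-01-02'),
  (3, 1, 1500, '2023-02-03'), (4, 3, 4000, '2023-02-12'),
  (5, 1, 3000, '2023-01-05'), (6, 2, 4500, '2023-01-06'),
  (7, 4, 5000, '2023-01-07'), (8, 5, 2000, '2023-01-08');
""")
rows = conn.execute("""
    SELECT c.customer_name
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    WHERE o.order_date = (SELECT MAX(order_date) FROM orders);
""").fetchall()
print(rows)  # [('Suresh',)] -- order 4 on 2023-02-12 is the latest
```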

o Q.73

List all customers who have made purchases in all product categories and the
total amount they spent.

Explanation:
To solve this, you need to:

1. Identify customers who have made purchases in all product categories.

2. Calculate the total amount spent by each customer.

You can use a GROUP BY clause to aggregate the total amount spent per
customer. Additionally, you need to ensure that the customer has purchased
products from all available categories, which can be verified by checking that
the count of distinct categories a customer has bought products from matches
the total number of product categories.

Datasets and SQL Schemas:


-- Table creation and sample data
CREATE TABLE customers (
customer_id INT,
customer_name VARCHAR(100)
);

CREATE TABLE products (


product_id INT,
product_name VARCHAR(100),
category_id INT,
price DECIMAL(10, 2)
);

CREATE TABLE purchases (
    purchase_id INT,
    customer_id INT,
    product_id INT
);

-- Sample data insertions


INSERT INTO customers VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie');

INSERT INTO products VALUES


(1, 'Laptop', 1, 800),
(2, 'Smartphone', 1, 600),
(3, 'Book', 2, 20),
(4, 'Headphones', 1, 150);

INSERT INTO purchases VALUES


(1, 1, 1),
(2, 1, 2),
(3, 1, 3),
(4, 2, 2),
(5, 2, 3),
(6, 3, 1),
(7, 3, 2),
(8, 3, 4);

Learnings:

• Joins: Necessary to combine data from multiple tables (customers, products, purchases).
• Aggregation: The SUM() function is used to calculate the total amount spent by each customer.
• Groupings: The COUNT(DISTINCT category_id) function is used to determine the number of categories a customer has made purchases in.
• Subquery: Used to determine the total number of product categories.

Solutions:
PostgreSQL Solution:
SELECT c.customer_name, SUM(p.price) AS total_spent
FROM customers c
JOIN purchases pu ON c.customer_id = pu.customer_id
JOIN products p ON pu.product_id = p.product_id
GROUP BY c.customer_name
HAVING COUNT(DISTINCT p.category_id) = (SELECT COUNT(DISTINCT category_id)
FROM products);

MySQL Solution:
SELECT c.customer_name, SUM(p.price) AS total_spent
FROM customers c
JOIN purchases pu ON c.customer_id = pu.customer_id
JOIN products p ON pu.product_id = p.product_id
GROUP BY c.customer_name
HAVING COUNT(DISTINCT p.category_id) = (SELECT COUNT(DISTINCT category_id)
FROM products);

Explanation of Solutions:

• Join: We join the customers, purchases, and products tables on the relevant keys (customer_id and product_id).
• Aggregation: SUM(p.price) calculates the total amount spent by each customer.
• Filtering: HAVING COUNT(DISTINCT p.category_id) ensures that a customer has made purchases from all distinct product categories. The COUNT(DISTINCT category_id) subquery returns the number of categories available in the products table.
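The HAVING-against-subquery pattern is easy to verify with an illustrative sqlite3 sketch; Charlie only bought category-1 items, so he is filtered out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INT, customer_name TEXT);
CREATE TABLE products (product_id INT, product_name TEXT, category_id INT, price REAL);
CREATE TABLE purchases (purchase_id INT, customer_id INT, product_id INT);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
INSERT INTO products VALUES
  (1, 'Laptop', 1, 800), (2, 'Smartphone', 1, 600),
  (3, 'Book', 2, 20),    (4, 'Headphones', 1, 150);
INSERT INTO purchases VALUES
  (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 2, 2),
  (5, 2, 3), (6, 3, 1), (7, 3, 2), (8, 3, 4);
""")
rows = conn.execute("""
    SELECT c.customer_name, SUM(p.price) AS total_spent
    FROM customers c
    JOIN purchases pu ON c.customer_id = pu.customer_id
    JOIN products p ON pu.product_id = p.product_id
    GROUP BY c.customer_name
    HAVING COUNT(DISTINCT p.category_id) =
           (SELECT COUNT(DISTINCT category_id) FROM products);
""").fetchall()
print(dict(rows))  # Alice spent 1420, Bob 620; Charlie never bought a category-2 item
```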

o Q.74

Identify the top 3 employees with the highest salaries within each department
at PwC.

Explanation:
To solve this, you need to identify the top 3 employees with the highest
salaries within each department. You can use a window function such as
ROW_NUMBER() to assign a rank to employees within each department, ordered
by salary in descending order. Then, filter out employees who have a rank
greater than 3.

Datasets and SQL Schemas:


-- Table creation and sample data
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO employees VALUES
(1, 'Chris Martin', 'Consulting', 85000.00),
(2, 'Jennifer Lewis', 'Finance', 92000.00),
(3, 'Emily Taylor', 'Finance', 88000.00),
(4, 'Michael Scott', 'Consulting', 78000.00),
(5, 'David Lee', 'Finance', 95000.00);

Learnings:

• Window Functions: ROW_NUMBER() is used to rank employees within each department based on their salary.
• Filtering: By using the window function to assign ranks, we can filter for the top 3 employees in each department.
• ORDER BY: The employees are ranked by salary in descending order to ensure the highest salaries come first.

Solutions:
PostgreSQL Solution:
WITH RankedEmployees AS (
    SELECT employee_id, employee_name, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department
                              ORDER BY salary DESC) AS salary_rank
    FROM employees
)
SELECT employee_id, employee_name, department, salary
FROM RankedEmployees
WHERE salary_rank <= 3;

MySQL Solution:
In MySQL 8.0 and later, the same approach works with window functions; note that RANK is a reserved word in MySQL 8.0+, so avoid it as a column alias:
WITH RankedEmployees AS (
    SELECT employee_id, employee_name, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department
                              ORDER BY salary DESC) AS salary_rank
    FROM employees
)
SELECT employee_id, employee_name, department, salary
FROM RankedEmployees
WHERE salary_rank <= 3;

Explanation of Solutions:

• CTE (WITH): We first create a Common Table Expression (CTE) called RankedEmployees that computes the rank of each employee within their department using ROW_NUMBER().
• ROW_NUMBER(): This window function assigns a sequential number (starting at 1) to each employee, ordered by salary in descending order within each department (partitioned by department).
• Filtering: In the final SELECT, we filter out the employees whose rank is greater than 3, thereby selecting only the top 3 employees in each department.

This query works in both PostgreSQL and MySQL (8.0+), which support
window functions.
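As a sanity check, the top-3-per-department query can be run against the sample data in-memory with Python's sqlite3 module (SQLite 3.25+ ships the same window functions):

```python
import sqlite3

# Run the top-3-per-department query on the chapter's sample data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    employee_name TEXT,
    department TEXT,
    salary REAL
);
INSERT INTO employees VALUES
(1, 'Chris Martin', 'Consulting', 85000),
(2, 'Jennifer Lewis', 'Finance', 92000),
(3, 'Emily Taylor', 'Finance', 88000),
(4, 'Michael Scott', 'Consulting', 78000),
(5, 'David Lee', 'Finance', 95000);
""")
rows = cur.execute("""
WITH RankedEmployees AS (
    SELECT employee_name, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department
                              ORDER BY salary DESC) AS salary_rank
    FROM employees
)
SELECT department, employee_name, salary_rank
FROM RankedEmployees
WHERE salary_rank <= 3
ORDER BY department, salary_rank;
""").fetchall()
for row in rows:
    print(row)
```

With only five employees, all of them survive the top-3 filter; Finance ranks David, Jennifer, Emily in that order.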

o Q.75

Determine the total number of unique suppliers used by both Barclays and
HSBC in the same year.

Explanation:
To solve this problem, you need to find the suppliers who have contracts with
both Barclays and HSBC in the same year. This involves:

1. Joining the contracts table with the suppliers table based on supplier_id.

2. Filtering contracts for both Barclays and HSBC.

3. Ensuring that the contracts are from the same year.

4. Counting the number of unique suppliers who meet these criteria.


Datasets and SQL Schemas:


-- Table creation and sample data
CREATE TABLE suppliers (
supplier_id INT PRIMARY KEY,
supplier_name VARCHAR(100)
);

CREATE TABLE contracts (
    contract_id INT PRIMARY KEY,
    supplier_id INT,
    company VARCHAR(50),
    contract_date DATE
);

-- Sample data insertions
INSERT INTO suppliers VALUES
(1, 'Tech Supplies Ltd'),
(2, 'Finance Services Ltd');

INSERT INTO contracts VALUES
(1, 1, 'Barclays', '2023-02-10'),
(2, 1, 'HSBC', '2023-02-25');

Learnings:

• Join: Combining the contracts and suppliers tables based on the supplier_id.
• Filtering: Ensuring the supplier has contracts with both Barclays and HSBC in the same year.
• Aggregation: Using COUNT(DISTINCT) to count unique suppliers.
• Date functions: Extracting the year from the contract_date to ensure the contracts are within the same year.

Solutions:
PostgreSQL Solution:
SELECT COUNT(DISTINCT supplier_id) AS unique_suppliers
FROM (
    SELECT c.supplier_id
    FROM contracts c
    JOIN suppliers s ON c.supplier_id = s.supplier_id
    WHERE c.company IN ('Barclays', 'HSBC')
    GROUP BY c.supplier_id, EXTRACT(YEAR FROM c.contract_date)
    HAVING COUNT(DISTINCT c.company) = 2
) AS qualified;

MySQL Solution:
SELECT COUNT(DISTINCT supplier_id) AS unique_suppliers
FROM (
    SELECT c.supplier_id
    FROM contracts c
    JOIN suppliers s ON c.supplier_id = s.supplier_id
    WHERE c.company IN ('Barclays', 'HSBC')
    GROUP BY c.supplier_id, YEAR(c.contract_date)
    HAVING COUNT(DISTINCT c.company) = 2
) AS qualified;

Note that the count must sit in an outer query: counting inside the grouped query would return one row per (supplier, year) group rather than a single total.

Explanation of Solutions:

• Join: We join the contracts table with the suppliers table on supplier_id to get supplier details.
• Filtering: We filter the contracts to only consider those related to Barclays or HSBC (WHERE c.company IN ('Barclays', 'HSBC')).
• Grouping: We group by supplier_id and the year extracted from the contract_date, using EXTRACT(YEAR FROM c.contract_date) in PostgreSQL or YEAR(c.contract_date) in MySQL.
• HAVING clause: We use HAVING COUNT(DISTINCT c.company) = 2 to keep only suppliers that have contracts with both Barclays and HSBC in the same year.
• Counting: Finally, we count the distinct supplier_id values that meet the condition to get a single total of unique suppliers.

This solution works for both PostgreSQL and MySQL, with slight variations
in date extraction functions.
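The same logic can be exercised in-memory with sqlite3. SQLite has neither EXTRACT() nor YEAR(), so strftime('%Y', ...) stands in for the year extraction; supplier 2 below is extra illustrative data with only a Barclays contract, to show it is excluded:

```python
import sqlite3

# Sanity check for "both Barclays and HSBC in the same year".
# strftime('%Y', ...) replaces EXTRACT()/YEAR(), which SQLite lacks.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE suppliers (supplier_id INT PRIMARY KEY, supplier_name TEXT);
CREATE TABLE contracts (contract_id INT PRIMARY KEY, supplier_id INT,
                        company TEXT, contract_date TEXT);
INSERT INTO suppliers VALUES (1, 'Tech Supplies Ltd'), (2, 'Finance Services Ltd');
-- Supplier 1 works with both banks in 2023; supplier 2 only with Barclays.
INSERT INTO contracts VALUES
(1, 1, 'Barclays', '2023-02-10'),
(2, 1, 'HSBC',     '2023-02-25'),
(3, 2, 'Barclays', '2023-05-01');
""")
(unique_suppliers,) = cur.execute("""
SELECT COUNT(DISTINCT supplier_id)
FROM (
    SELECT c.supplier_id
    FROM contracts c
    WHERE c.company IN ('Barclays', 'HSBC')
    GROUP BY c.supplier_id, strftime('%Y', c.contract_date)
    HAVING COUNT(DISTINCT c.company) = 2
)
""").fetchone()
print(unique_suppliers)  # 1
```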

o Q.76

Write a SQL query to find customers who have ordered more than once and
their total spending.

Explanation:
To solve this, you need to:

1. Identify customers who have placed more than one order.

2. Calculate the total amount spent by each of these customers.

You can use GROUP BY to aggregate orders by customer, use HAVING to filter
customers who have ordered more than once, and SUM() to calculate the total
spending.

Datasets and SQL Schemas:


-- Table creation and sample data
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100)
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    amount DECIMAL(10, 2),
    order_date DATE,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Sample data insertions
INSERT INTO customers (customer_id, customer_name) VALUES
(1, 'Anjali'),
(2, 'Rohan'),
(3, 'Suresh'),
(4, 'Priya'),
(5, 'Rahul');

INSERT INTO orders (order_id, customer_id, amount, order_date) VALUES
(1, 1, 2500, '2023-01-01'),
(2, 1, 3000, '2023-01-02'),
(3, 2, 4500, '2023-01-05'),
(4, 3, 4000, '2023-01-12'),
(5, 1, 1500, '2023-01-15'),
(6, 2, 3500, '2023-01-20');

Learnings:

• Grouping: GROUP BY is used to group orders by customer_id.
• Filtering: HAVING COUNT(order_id) > 1 filters customers who have placed more than one order.
• Aggregation: SUM(amount) is used to calculate the total spending by each customer.

Solutions:
PostgreSQL Solution:
SELECT c.customer_name, SUM(o.amount) AS total_spending
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id
HAVING COUNT(o.order_id) > 1;

MySQL Solution:
SELECT c.customer_name, SUM(o.amount) AS total_spending
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id
HAVING COUNT(o.order_id) > 1;

Explanation of Solutions:

• Join: The customers table is joined with the orders table on customer_id to combine customer details with their orders.
• Grouping: GROUP BY c.customer_id groups the results by each customer.
• Filtering: The HAVING COUNT(o.order_id) > 1 clause ensures that only customers who have made more than one order are included.
• Aggregation: SUM(o.amount) calculates the total spending for each customer.

This query works identically in both PostgreSQL and MySQL. It returns the
customers who have placed multiple orders and the total amount they have
spent.
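Running the query on the sample data with sqlite3 confirms the expected result:

```python
import sqlite3

# Run the repeat-customers query on the chapter's sample data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (customer_id INT PRIMARY KEY, customer_name TEXT);
CREATE TABLE orders (order_id INT PRIMARY KEY, customer_id INT,
                     amount REAL, order_date TEXT);
INSERT INTO customers VALUES
(1, 'Anjali'), (2, 'Rohan'), (3, 'Suresh'), (4, 'Priya'), (5, 'Rahul');
INSERT INTO orders VALUES
(1, 1, 2500, '2023-01-01'),
(2, 1, 3000, '2023-01-02'),
(3, 2, 4500, '2023-01-05'),
(4, 3, 4000, '2023-01-12'),
(5, 1, 1500, '2023-01-15'),
(6, 2, 3500, '2023-01-20');
""")
rows = cur.execute("""
SELECT c.customer_name, SUM(o.amount) AS total_spending
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id
HAVING COUNT(o.order_id) > 1
ORDER BY c.customer_name;
""").fetchall()
print(rows)  # Anjali and Rohan each ordered more than once
```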

o Q.77

Find the top 5 products with the highest average rating in each category,
including their manufacturer and the number of reviews.

Explanation:
To solve this, you need to:

1. Calculate the average rating for each product using AVG(rating).

2. Count the number of reviews for each product using COUNT(review_id).

3. Rank products within each category based on the average rating.

4. Return the top 5 products with the highest average rating per category.

You can use window functions like ROW_NUMBER() or RANK() to rank the products in each category based on their average ratings and limit the result to the top 5 in each category.

Datasets and SQL Schemas:


-- Table creation and sample data
CREATE TABLE products (
product_id INT,
product_name VARCHAR(100),
category_id INT,
manufacturer VARCHAR(100)
);

CREATE TABLE reviews (
    review_id INT,
    product_id INT,
    rating DECIMAL(2, 1)
);

-- Sample data insertions
INSERT INTO products VALUES
(1, 'Laptop', 1, 'BrandA'),
(2, 'Smartphone', 1, 'BrandB'),
(3, 'Tablet', 2, 'BrandC'),
(4, 'Headphones', 2, 'BrandA'),
(5, 'Smartwatch', 3, 'BrandD');

INSERT INTO reviews VALUES
(1, 1, 4.5),
(2, 1, 4.7),
(3, 2, 4.3),
(4, 2, 4.8),
(5, 3, 4.0),
(6, 3, 4.5),
(7, 4, 4.8),
(8, 5, 4.2);

Learnings:

• Aggregation: Use AVG() to compute the average rating and COUNT() to compute the number of reviews for each product.
• Window Functions: Use RANK() or ROW_NUMBER() to rank the products within each category based on their average rating.
• Partitioning: PARTITION BY allows you to compute the rank for each category separately.
• Filtering: Use the WHERE clause to filter out products that are not in the top 5.

Solutions:
PostgreSQL Solution:
WITH ProductRatings AS (
    SELECT p.product_id, p.product_name, p.category_id, p.manufacturer,
           AVG(r.rating) AS avg_rating, COUNT(r.review_id) AS num_reviews,
           RANK() OVER (PARTITION BY p.category_id
                        ORDER BY AVG(r.rating) DESC) AS rating_rank
    FROM products p
    JOIN reviews r ON p.product_id = r.product_id
    GROUP BY p.product_id, p.product_name, p.category_id, p.manufacturer
)
SELECT product_id, product_name, category_id, manufacturer, avg_rating, num_reviews
FROM ProductRatings
WHERE rating_rank <= 5
ORDER BY category_id, rating_rank;

MySQL Solution:
In MySQL 8.0+, RANK is a reserved word, so avoid rank as a column alias:
WITH ProductRatings AS (
    SELECT p.product_id, p.product_name, p.category_id, p.manufacturer,
           AVG(r.rating) AS avg_rating, COUNT(r.review_id) AS num_reviews,
           RANK() OVER (PARTITION BY p.category_id
                        ORDER BY AVG(r.rating) DESC) AS rating_rank
    FROM products p
    JOIN reviews r ON p.product_id = r.product_id
    GROUP BY p.product_id, p.product_name, p.category_id, p.manufacturer
)
SELECT product_id, product_name, category_id, manufacturer, avg_rating, num_reviews
FROM ProductRatings
WHERE rating_rank <= 5
ORDER BY category_id, rating_rank;

Explanation of Solutions:

• CTE (WITH): The Common Table Expression (CTE) ProductRatings calculates the average rating (AVG(r.rating)) and the number of reviews (COUNT(r.review_id)) for each product. It also uses the RANK() window function to rank the products within each category by their average rating.
• RANK(): This window function assigns a rank to each product within its category, ordered by the average rating in descending order.
• Partitioning: PARTITION BY p.category_id ensures that the ranking is done separately for each category.
• Filtering: The filter on the computed rank (<= 5) ensures that only the top 5 products in each category are selected.
• Sorting: Ordering by category_id and then by rank ensures the final result is sorted by category and rank.

This solution works for both PostgreSQL and MySQL (8.0+), which support
window functions like RANK(). It returns the top 5 products in each category
along with their average rating, manufacturer, and the number of reviews.
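The ranking logic can be checked in-memory with sqlite3. This sketch splits the aggregation and the RANK() into two CTEs; it is equivalent to the single-CTE form above and also works on engines that dislike aggregate calls inside an OVER clause:

```python
import sqlite3

# Check the per-category ranking on the chapter's sample data (SQLite 3.25+).
# Aggregation and ranking are split into two CTEs for portability.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE products (product_id INT, product_name TEXT,
                       category_id INT, manufacturer TEXT);
CREATE TABLE reviews (review_id INT, product_id INT, rating REAL);
INSERT INTO products VALUES
(1, 'Laptop', 1, 'BrandA'), (2, 'Smartphone', 1, 'BrandB'),
(3, 'Tablet', 2, 'BrandC'), (4, 'Headphones', 2, 'BrandA'),
(5, 'Smartwatch', 3, 'BrandD');
INSERT INTO reviews VALUES
(1, 1, 4.5), (2, 1, 4.7), (3, 2, 4.3), (4, 2, 4.8),
(5, 3, 4.0), (6, 3, 4.5), (7, 4, 4.8), (8, 5, 4.2);
""")
rows = cur.execute("""
WITH ProductRatings AS (
    SELECT p.product_id, p.product_name, p.category_id,
           AVG(r.rating) AS avg_rating, COUNT(r.review_id) AS num_reviews
    FROM products p
    JOIN reviews r ON p.product_id = r.product_id
    GROUP BY p.product_id, p.product_name, p.category_id
),
Ranked AS (
    SELECT *, RANK() OVER (PARTITION BY category_id
                           ORDER BY avg_rating DESC) AS rating_rank
    FROM ProductRatings
)
SELECT category_id, product_name, rating_rank, num_reviews
FROM Ranked
WHERE rating_rank <= 5
ORDER BY category_id, rating_rank;
""").fetchall()
for row in rows:
    print(row)
```

In category 1 the Laptop (avg 4.6) outranks the Smartphone (avg 4.55); in category 2 the Headphones (4.8) outrank the Tablet (4.25).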

o Q.78

Find the top 5 products with the highest average rating in each category,
including their manufacturer and the number of reviews.

Explanation:
To solve this, you need to:


1. Calculate the average rating for each product using AVG(rating).

2. Count the number of reviews for each product using COUNT(review_id).

3. Rank products within each category based on the average rating.

4. Retrieve only the top 5 products for each category.

You can use window functions like RANK() or ROW_NUMBER() to rank the
products based on their average ratings. The products should be ordered by
rating in descending order, and you can use PARTITION BY to rank them
separately within each category.

Datasets and SQL Schemas:


-- Table creation and sample data
CREATE TABLE products (
product_id INT,
product_name VARCHAR(100),
category_id INT,
manufacturer VARCHAR(100)
);

CREATE TABLE reviews (
    review_id INT,
    product_id INT,
    rating DECIMAL(2, 1)
);

-- Sample data insertions
INSERT INTO products VALUES
(1, 'Laptop', 1, 'BrandA'),
(2, 'Smartphone', 1, 'BrandB'),
(3, 'Tablet', 2, 'BrandC'),
(4, 'Headphones', 2, 'BrandA'),
(5, 'Smartwatch', 3, 'BrandD');

INSERT INTO reviews VALUES
(1, 1, 4.5),
(2, 1, 4.7),
(3, 2, 4.3),
(4, 2, 4.8),
(5, 3, 4.0),
(6, 3, 4.5),
(7, 4, 4.8),
(8, 5, 4.2);

Learnings:

• Aggregation: You need to aggregate the rating and review_id for each product to get the average rating and the total number of reviews.
• Window Functions: RANK() or ROW_NUMBER() are used to rank the products based on their average rating within each category.
• Partitioning: Use PARTITION BY to ensure the ranking is done per category, and ORDER BY to sort products by their average rating in descending order.

Solutions:


PostgreSQL Solution:
WITH ProductRatings AS (
    SELECT p.product_id, p.product_name, p.category_id, p.manufacturer,
           AVG(r.rating) AS avg_rating, COUNT(r.review_id) AS num_reviews,
           RANK() OVER (PARTITION BY p.category_id
                        ORDER BY AVG(r.rating) DESC) AS rating_rank
    FROM products p
    LEFT JOIN reviews r ON p.product_id = r.product_id
    GROUP BY p.product_id, p.product_name, p.category_id, p.manufacturer
)
SELECT product_id, product_name, category_id, manufacturer, avg_rating, num_reviews
FROM ProductRatings
WHERE rating_rank <= 5
ORDER BY category_id, rating_rank;

MySQL Solution:
In MySQL 8.0+, RANK is a reserved word, so avoid rank as a column alias:
WITH ProductRatings AS (
    SELECT p.product_id, p.product_name, p.category_id, p.manufacturer,
           AVG(r.rating) AS avg_rating, COUNT(r.review_id) AS num_reviews,
           RANK() OVER (PARTITION BY p.category_id
                        ORDER BY AVG(r.rating) DESC) AS rating_rank
    FROM products p
    LEFT JOIN reviews r ON p.product_id = r.product_id
    GROUP BY p.product_id, p.product_name, p.category_id, p.manufacturer
)
SELECT product_id, product_name, category_id, manufacturer, avg_rating, num_reviews
FROM ProductRatings
WHERE rating_rank <= 5
ORDER BY category_id, rating_rank;

Explanation of Solutions:

• CTE (WITH): We first create a Common Table Expression (CTE) called ProductRatings that calculates the average rating (AVG(r.rating)) and the total number of reviews (COUNT(r.review_id)) for each product. We also calculate the rank using the RANK() window function, which ranks the products by their average rating within each category.
• RANK(): The RANK() function assigns a rank to each product based on the average rating, ordered in descending order within each category. Products with the same rating get the same rank.
• LEFT JOIN: We use LEFT JOIN to ensure that all products are included, even if they don't have reviews (for products with no reviews, the review count will be zero and the average rating will be NULL).
• Partitioning: PARTITION BY p.category_id ensures the ranking is done separately for each category.
• Filtering: The filter on the computed rank (<= 5) removes products that are not in the top 5 for each category.
• Sorting: The final ORDER BY sorts the results first by category, then by rank.

This solution works for both PostgreSQL and MySQL 8.0+, which support
window functions like RANK(). The query returns the top 5 products with the
highest average rating in each category, including their manufacturer and the
number of reviews.
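The practical difference between JOIN and LEFT JOIN here is easy to demonstrate with sqlite3; the 'Earbuds' row below is extra illustrative data, not part of the chapter's sample set, and it has no reviews:

```python
import sqlite3

# Why Q.78 uses LEFT JOIN: a product with no reviews still shows up.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE products (product_id INT, product_name TEXT, category_id INT);
CREATE TABLE reviews (review_id INT, product_id INT, rating REAL);
INSERT INTO products VALUES (4, 'Headphones', 2), (6, 'Earbuds', 2);
INSERT INTO reviews VALUES (7, 4, 4.8);
""")
inner = cur.execute("""
SELECT p.product_name
FROM products p JOIN reviews r ON p.product_id = r.product_id
GROUP BY p.product_id, p.product_name;
""").fetchall()
outer = cur.execute("""
SELECT p.product_name, COUNT(r.review_id) AS num_reviews
FROM products p LEFT JOIN reviews r ON p.product_id = r.product_id
GROUP BY p.product_id, p.product_name;
""").fetchall()
print(inner)  # the inner join drops the unreviewed product
print(outer)  # the left join keeps it with a count of 0
```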


o Q.79

Find all employees who have not taken any training sessions in the last year
and the number of projects they are currently assigned to.

Explanation:
To solve this, you need to:

1. Identify employees who have not taken any training sessions in the last year. This involves filtering out employees who have a training_date within the last 12 months.

2. Count the number of projects each employee is currently assigned to, using the projects table.

3. Use LEFT JOIN to ensure all employees are included, even those without any training sessions or project assignments.

4. Use WHERE and NOT EXISTS to filter out employees who have training sessions within the last year.

Datasets and SQL Schemas:


-- Table creation and sample data
CREATE TABLE employees (
employee_id INT,
employee_name VARCHAR(100)
);

CREATE TABLE training_sessions (
    session_id INT,
    employee_id INT,
    training_date DATE
);

CREATE TABLE projects (
    project_id INT,
    employee_id INT
);

-- Sample data insertions
INSERT INTO employees VALUES
(1, 'John'),
(2, 'Jane'),
(3, 'Mark'),
(4, 'Lucy');

INSERT INTO training_sessions VALUES
(1, 1, '2022-01-10'),
(2, 1, '2021-06-15'),
(3, 2, '2023-02-20'),
(4, 2, '2021-11-01');

INSERT INTO projects VALUES
(1, 1),
(2, 1),
(3, 2),
(4, 3);

Learnings:


• Date Functions: Use the CURRENT_DATE function to compare the training_date with the last year.
• Filtering: Use NOT EXISTS or LEFT JOIN with a condition to filter employees who haven't attended any training session in the last year.
• Counting: Use COUNT() to determine the number of projects assigned to each employee.
• Handling Nulls: Ensure that employees with no project assignments are still counted as having 0 projects.

Solutions:
PostgreSQL Solution:
SELECT e.employee_name, COUNT(p.project_id) AS num_projects
FROM employees e
LEFT JOIN training_sessions t ON e.employee_id = t.employee_id
AND t.training_date >= CURRENT_DATE - INTERVAL '1 year'
LEFT JOIN projects p ON e.employee_id = p.employee_id
WHERE t.session_id IS NULL -- No training session in the last year
GROUP BY e.employee_id, e.employee_name
ORDER BY e.employee_name;

MySQL Solution:
SELECT e.employee_name, COUNT(p.project_id) AS num_projects
FROM employees e
LEFT JOIN training_sessions t ON e.employee_id = t.employee_id
AND t.training_date >= CURDATE() - INTERVAL 1 YEAR
LEFT JOIN projects p ON e.employee_id = p.employee_id
WHERE t.session_id IS NULL -- No training session in the last year
GROUP BY e.employee_id, e.employee_name
ORDER BY e.employee_name;

Explanation of Solutions:

• LEFT JOIN with training_sessions: We join the employees table with the training_sessions table and match only the sessions that occurred within the last year using the AND t.training_date >= CURRENT_DATE - INTERVAL '1 year' condition (PostgreSQL) or AND t.training_date >= CURDATE() - INTERVAL 1 YEAR (MySQL).
• Filtering for employees without recent training: The WHERE t.session_id IS NULL ensures that only employees who do not have a training session within the last year are included.
• LEFT JOIN with projects: We use a LEFT JOIN with the projects table to include all employees, even those who may not have any projects assigned. If an employee has no projects, the count will be 0.
• Counting projects: COUNT(p.project_id) counts the number of projects each employee is assigned to. This will return 0 for employees with no projects.
• Grouping: We use GROUP BY to group by employee_id and employee_name, ensuring the results are aggregated per employee.
• Sorting: The ORDER BY e.employee_name clause sorts the employees alphabetically.

Key Points:


• The solution handles employees with no training sessions and no project assignments.
• The filtering logic (WHERE t.session_id IS NULL) ensures that only employees without recent training sessions are included.
• The query works for both PostgreSQL and MySQL with minimal differences in date handling.

This query will give you the list of employees who haven't taken any training
in the past year along with the number of projects they are assigned to.
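In SQLite the interval arithmetic becomes date('now', '-1 year'). Since the sample training dates (2021-2023) are all more than a year in the past by now, every employee qualifies when this sketch is run today:

```python
import sqlite3

# SQLite version of the Q.79 query; date('now', '-1 year') replaces the
# PostgreSQL/MySQL interval syntax. All sample training dates are old
# enough that every employee passes the "no recent training" filter.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE employees (employee_id INT, employee_name TEXT);
CREATE TABLE training_sessions (session_id INT, employee_id INT, training_date TEXT);
CREATE TABLE projects (project_id INT, employee_id INT);
INSERT INTO employees VALUES (1, 'John'), (2, 'Jane'), (3, 'Mark'), (4, 'Lucy');
INSERT INTO training_sessions VALUES
(1, 1, '2022-01-10'), (2, 1, '2021-06-15'),
(3, 2, '2023-02-20'), (4, 2, '2021-11-01');
INSERT INTO projects VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")
rows = cur.execute("""
SELECT e.employee_name, COUNT(p.project_id) AS num_projects
FROM employees e
LEFT JOIN training_sessions t ON e.employee_id = t.employee_id
    AND t.training_date >= date('now', '-1 year')
LEFT JOIN projects p ON e.employee_id = p.employee_id
WHERE t.session_id IS NULL    -- no training session in the last year
GROUP BY e.employee_id, e.employee_name
ORDER BY e.employee_name;
""").fetchall()
print(rows)  # Lucy has no projects, so her count is 0
```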

o Q.80

Write an SQL query to find the name of the product with the highest price in
each country.

Explanation:
To solve this problem, the task is to:

1. Identify the highest-priced product in each country.

2. Group by country to ensure we get the highest-priced product per country.

3. Join the Product and Supplier tables based on supplier_id to match products with their respective suppliers and countries.

4. Use the ROW_NUMBER() window function or MAX() aggregation to ensure we get the product with the highest price for each country.

Datasets and SQL Schemas:


-- Table creation and sample data
CREATE TABLE suppliers(
supplier_id INT PRIMARY KEY,
supplier_name VARCHAR(25),
country VARCHAR(25)
);

INSERT INTO suppliers VALUES
(501, 'alan', 'India'),
(502, 'rex', 'US'),
(503, 'dodo', 'India'),
(504, 'rahul', 'US'),
(505, 'zara', 'Canada'),
(506, 'max', 'Canada');

CREATE TABLE products(
product_id INT PRIMARY KEY,
product_name VARCHAR(25),
supplier_id INT,
price FLOAT,
FOREIGN KEY (supplier_id) REFERENCES suppliers(supplier_id)
);

INSERT INTO products VALUES
(201, 'iPhone 14', 501, 1299),
(202, 'iPhone 8', 502, 999),

(204, 'iPhone 13', 502, 1199),
(203, 'iPhone 11', 503, 1199),
(205, 'iPhone 12', 502, 1199),
(206, 'iPhone 14', 501, 1399),
(214, 'iPhone 15', 503, 1499),
(207, 'iPhone 15', 505, 1499),
(208, 'iPhone 15', 504, 1499),
(209, 'iPhone 12', 502, 1299),
(210, 'iPhone 13', 502, 1199),
(211, 'iPhone 11', 501, 1099),
(212, 'iPhone 14', 503, 1399),
(213, 'iPhone 8', 502, 1099),
-- adding more products
(222, 'Samsung Galaxy S21', 504, 1699),
(223, 'Samsung Galaxy S20', 505, 1899),
(224, 'Google Pixel 6', 501, 899),
(225, 'Google Pixel 5', 502, 799),
(226, 'OnePlus 9 Pro', 503, 1699),
(227, 'OnePlus 9', 502, 1999),
(228, 'Xiaomi Mi 11', 501, 899),
(229, 'Xiaomi Mi 10', 504, 699),
(230, 'Huawei P40 Pro', 505, 1099),
(231, 'Huawei P30', 502, 1299),
(232, 'Sony Xperia 1 III', 503, 1199),
(233, 'Sony Xperia 5 III', 501, 999),
(234, 'LG Velvet', 505, 1899),
(235, 'LG G8 ThinQ', 504, 799),
(236, 'Motorola Edge Plus', 502, 1099),
(237, 'Motorola One 5G', 501, 799),
(238, 'ASUS ROG Phone 5', 503, 1999),
(239, 'ASUS ZenFone 8', 504, 999),
(240, 'Nokia 8.3 5G', 502, 899),
(241, 'Nokia 7.2', 501, 699),
(242, 'BlackBerry Key2', 504, 1899),
(243, 'BlackBerry Motion', 502, 799),
(244, 'HTC U12 Plus', 501, 899),
(245, 'HTC Desire 20 Pro', 505, 699),
(246, 'Lenovo Legion Phone Duel', 503, 1499),
(247, 'Lenovo K12 Note', 504, 1499),
(248, 'ZTE Axon 30 Ultra', 501, 1299),
(249, 'ZTE Blade 20', 502, 1599),
(250, 'Oppo Find X3 Pro', 503, 1999);

-- Sample data for Supplier and Product tables
SELECT * FROM suppliers;
SELECT * FROM products;

Learnings:

• Joins: You need to join two tables: Product and Supplier using
supplier_id to associate products with suppliers and their countries.
• Aggregation: Use MAX() or window functions to find the highest-
priced product for each country.
• Group By: Group the results by country to get one entry per country.
• Subqueries: To get the highest-priced product per country, you may
need a subquery or a window function to filter the product with the
highest price for each country.

Solutions:

PostgreSQL / MySQL Solution:

WITH MaxPriceProducts AS (
    SELECT p.product_name, p.price, s.country,
           ROW_NUMBER() OVER (PARTITION BY s.country
                              ORDER BY p.price DESC) AS price_rank
    FROM products p
    JOIN suppliers s ON p.supplier_id = s.supplier_id
)
SELECT product_name, country, price
FROM MaxPriceProducts
WHERE price_rank = 1
ORDER BY country;

Note the alias: RANK is a reserved word in MySQL 8.0+, so the ranking column is named price_rank.

Explanation of Solution:

1. CTE (WITH MaxPriceProducts):

• The CTE calculates the rank for each product within its country based on the highest price (ROW_NUMBER() OVER (PARTITION BY s.country ORDER BY p.price DESC)).
• The PARTITION BY s.country ensures that the ranking is done separately for each country, while ORDER BY p.price DESC ensures that the highest-priced product is ranked first.

2. Filtering on rank = 1:

• This ensures that only the highest-priced product per country is selected. ROW_NUMBER() breaks ties arbitrarily; use RANK() instead if every product tied for the top price should be returned.

3. Final SELECT:

• The query retrieves the product_name, country, and price for the top product in each country.

4. Sorting:

• The results are ordered by country for a clearer output.

Expected Output:
The query will return the name of the product with the highest price in each country along with the price. With the sample data above:

product_name        country  price
Samsung Galaxy S20  Canada   1899
ASUS ROG Phone 5    India    1999
OnePlus 9           US       1999

Canada has two products tied at 1899 (Samsung Galaxy S20 and LG Velvet) and India two tied at 1999 (ASUS ROG Phone 5 and Oppo Find X3 Pro); since ROW_NUMBER() breaks ties arbitrarily, the tied row returned may differ. This query works efficiently for both PostgreSQL and MySQL and returns the highest-priced product for each country.
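The tie behaviour is worth seeing concretely. On a reduced version of the dataset (Canada only), ROW_NUMBER() returns a single arbitrary winner while RANK() returns both tied products:

```python
import sqlite3

# ROW_NUMBER() vs RANK() when two products tie for the top price,
# on a reduced subset of the Q.80 data (Canada only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE suppliers (supplier_id INT PRIMARY KEY, country TEXT);
CREATE TABLE products (product_id INT PRIMARY KEY, product_name TEXT,
                       supplier_id INT, price REAL);
INSERT INTO suppliers VALUES (505, 'Canada');
INSERT INTO products VALUES
(223, 'Samsung Galaxy S20', 505, 1899),
(234, 'LG Velvet',          505, 1899),
(207, 'iPhone 15',          505, 1499);
""")
row_number_top = cur.execute("""
WITH Ranked AS (
    SELECT p.product_name,
           ROW_NUMBER() OVER (PARTITION BY s.country
                              ORDER BY p.price DESC) AS price_rank
    FROM products p JOIN suppliers s ON p.supplier_id = s.supplier_id
)
SELECT product_name FROM Ranked WHERE price_rank = 1;
""").fetchall()
rank_top = cur.execute("""
WITH Ranked AS (
    SELECT p.product_name,
           RANK() OVER (PARTITION BY s.country
                        ORDER BY p.price DESC) AS price_rank
    FROM products p JOIN suppliers s ON p.supplier_id = s.supplier_id
)
SELECT product_name FROM Ranked WHERE price_rank = 1;
""").fetchall()
print(len(row_number_top))  # 1 -- one arbitrary winner
print(len(rank_top))        # 2 -- both tied products
```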

o Q.81

Write an SQL query to calculate the difference between the highest salaries in
the marketing and engineering departments. Output the absolute difference in
salaries.


Explanation:
To solve this problem, we need to:

1. Find the highest salary in the Marketing department.

2. Find the highest salary in the Engineering department.

3. Calculate the absolute difference between these two salaries.

4. Use SQL aggregation functions (like MAX()) and the absolute difference (ABS() function) to achieve this.

Datasets and SQL Schemas:


-- DDL for Salaries table
CREATE TABLE Salaries (
emp_name VARCHAR(50),
department VARCHAR(50),
salary INT,
PRIMARY KEY (emp_name, department)
);

-- DML for Salaries table
INSERT INTO Salaries (emp_name, department, salary) VALUES
('Kathy', 'Engineering', 50000),
('Roy', 'Marketing', 30000),
('Charles', 'Engineering', 45000),
('Jack', 'Engineering', 85000),
('Benjamin', 'Marketing', 34000),
('Anthony', 'Marketing', 42000),
('Edward', 'Engineering', 102000),
('Terry', 'Engineering', 44000),
('Evelyn', 'Marketing', 53000),
('Arthur', 'Engineering', 32000);

Learnings:

• Aggregation: Using MAX() to get the highest salary for each department.
• Conditional Filtering: Filtering by departments (e.g., Marketing, Engineering) in the query.
• Absolute Difference: Using the ABS() function to calculate the absolute difference between two values.

Solutions:

PostgreSQL / MySQL Solution:

SELECT ABS(
    (SELECT MAX(salary) FROM Salaries WHERE department = 'Engineering') -
    (SELECT MAX(salary) FROM Salaries WHERE department = 'Marketing')
) AS salary_difference;

Explanation of Solution:

1. Subqueries:

• The first subquery (SELECT MAX(salary) FROM Salaries WHERE department = 'Engineering') retrieves the maximum salary in the Engineering department.
• The second subquery (SELECT MAX(salary) FROM Salaries WHERE department = 'Marketing') retrieves the maximum salary in the Marketing department.

2. ABS() function:

• The ABS() function ensures that the result is always positive, regardless of which department has the higher salary.

3. Final Output:

• The result will be the absolute difference between the highest salaries in the Engineering and Marketing departments.

Expected Output:
The query will return the absolute difference between the highest salaries in
the two departments.
salary_difference
49000

• The highest salary in Engineering is 102000 (Edward).
• The highest salary in Marketing is 53000 (Evelyn).
• The absolute difference is |102000 - 53000| = 49000.

This query works efficiently for both PostgreSQL and MySQL and provides
the correct absolute difference in salaries.
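The arithmetic is easy to verify by running the query on the sample data with sqlite3:

```python
import sqlite3

# Verify the salary-difference query against the chapter's sample data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Salaries (emp_name TEXT, department TEXT, salary INT);
INSERT INTO Salaries VALUES
('Kathy', 'Engineering', 50000), ('Roy', 'Marketing', 30000),
('Charles', 'Engineering', 45000), ('Jack', 'Engineering', 85000),
('Benjamin', 'Marketing', 34000), ('Anthony', 'Marketing', 42000),
('Edward', 'Engineering', 102000), ('Terry', 'Engineering', 44000),
('Evelyn', 'Marketing', 53000), ('Arthur', 'Engineering', 32000);
""")
(diff,) = cur.execute("""
SELECT ABS(
    (SELECT MAX(salary) FROM Salaries WHERE department = 'Engineering') -
    (SELECT MAX(salary) FROM Salaries WHERE department = 'Marketing')
) AS salary_difference;
""").fetchone()
print(diff)  # 102000 - 53000
```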

o Q.82

Write an SQL query to find the average order amount for male and female
customers separately. Return the results with 2 decimal points.

Explanation:
To solve this problem:

1. Join the customers table with the orders table on the customer_id column.

2. Group the data by gender to calculate the average order amount for male and female customers separately.

3. Use the AVG() function to compute the average order amount for each gender.

4. Format the result to 2 decimal places using the ROUND() function.

Datasets and SQL Schemas:


-- Create customers table
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(255),
age INT,
gender VARCHAR(10)
);

-- Create orders table
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
total_amount DECIMAL(10, 2),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Insert values into the customers table
INSERT INTO customers (customer_id, customer_name, age, gender)
VALUES
(1, 'John Doe', 30, 'Male'),
(2, 'Jane Smith', 25, 'Female'),
(3, 'Alice Johnson', 35, 'Female'),
(4, 'Bob Brown', 40, 'Male');

-- Insert values into the orders table
INSERT INTO orders (order_id, customer_id, order_date, total_amount)
VALUES
(101, 1, '2023-01-15', 150.50),
(102, 2, '2022-02-20', 200.25),
(103, 3, '2023-03-10', 180.75),
(104, 4, '2023-04-05', 300.00),
(105, 1, '2022-05-12', 175.80),
(106, 2, '2021-06-18', 220.40),
(107, 3, '2023-07-22', 190.30),
(108, 4, '2023-08-30', 250.60),
(109, 4, '2021-08-30', 250.60),
(110, 4, '2024-01-30', 250.60),
(111, 4, '2023-08-30', 250.60);

Learnings:

• JOIN operation: Using JOIN to combine data from two tables based
on a common key (customer_id).
• AVG() function: Computing the average of numerical data.
• GROUP BY: Grouping the result by gender to calculate averages for
each group.
• ROUND() function: Formatting the result to a specified number of
decimal places.

Solutions:

PostgreSQL / MySQL Solution:

SELECT
    c.gender,
    ROUND(AVG(o.total_amount), 2) AS avg_order_amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.gender;

Explanation of Solution:

1. JOIN: The JOIN clause is used to combine the customers table (c)
with the orders table (o) on the customer_id column.

2. AVG(): The AVG(o.total_amount) computes the average order amount for each gender.

3. ROUND(): The ROUND() function is used to format the average order amount to two decimal places.

4. GROUP BY: The query groups the data by the gender column from the customers table, so we calculate the average for each gender separately.

Expected Output:

gender avg_order_amount
Male   232.67
Female 197.93

• Male customers (John and Bob, 7 orders totalling 1628.70) have an average order amount of 232.67.
• Female customers (Jane and Alice, 4 orders totalling 791.70) have an average order amount of 197.93.

This query works in both PostgreSQL and MySQL and provides the correct
average order amounts for male and female customers, rounded to two
decimal places.
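A quick in-memory check of the per-gender averages. SQLite stores the amounts as binary floats rather than DECIMAL, so this sketch compares the raw AVG() values with a small tolerance instead of trusting ROUND()'s last digit:

```python
import sqlite3

# Compute the per-gender averages on the chapter's sample data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (customer_id INT PRIMARY KEY, customer_name TEXT,
                        age INT, gender TEXT);
CREATE TABLE orders (order_id INT PRIMARY KEY, customer_id INT,
                     order_date TEXT, total_amount REAL);
INSERT INTO customers VALUES
(1, 'John Doe', 30, 'Male'), (2, 'Jane Smith', 25, 'Female'),
(3, 'Alice Johnson', 35, 'Female'), (4, 'Bob Brown', 40, 'Male');
INSERT INTO orders VALUES
(101, 1, '2023-01-15', 150.50), (102, 2, '2022-02-20', 200.25),
(103, 3, '2023-03-10', 180.75), (104, 4, '2023-04-05', 300.00),
(105, 1, '2022-05-12', 175.80), (106, 2, '2021-06-18', 220.40),
(107, 3, '2023-07-22', 190.30), (108, 4, '2023-08-30', 250.60),
(109, 4, '2021-08-30', 250.60), (110, 4, '2024-01-30', 250.60),
(111, 4, '2023-08-30', 250.60);
""")
avgs = dict(cur.execute("""
SELECT c.gender, AVG(o.total_amount)
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.gender;
""").fetchall())
print(avgs)  # Male ~232.67, Female ~197.93 before rounding
```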

o Q.83

Write an SQL query to obtain the third transaction of every user. Output the
user id, spend, and transaction date.

Explanation:
To retrieve the third transaction for each user, the approach involves:

1. Sorting the transactions by transaction date for each user, as the third transaction should be the one with the 3rd earliest transaction date.

2. Filtering the results to only get the third transaction for each user.

3. Using the ROW_NUMBER() window function to assign a unique number to each transaction for each user, and filtering for the third one.


Datasets and SQL Schemas:


-- Create transactions table
CREATE TABLE transactions (
user_id INTEGER,
spend DECIMAL(10, 2),
transaction_date TIMESTAMP
);

-- Insert data into transactions table
INSERT INTO transactions (user_id, spend, transaction_date) VALUES
(111, 100.50, '2022-01-08 12:00:00'),
(111, 55.00, '2022-01-10 12:00:00'),
(121, 36.00, '2022-01-18 12:00:00'),
(145, 24.99, '2022-01-26 12:00:00'),
(111, 89.60, '2022-02-05 12:00:00');

Learnings:

• Window Functions: Using ROW_NUMBER() to rank transactions per user.
• ORDER BY: Sorting the transactions by transaction_date to ensure the correct order.
• Filtering: Using WHERE to get the third transaction for each user.

Solutions:

PostgreSQL / MySQL Solution (with window function support):

WITH RankedTransactions AS (
    SELECT
        user_id,
        spend,
        transaction_date,
        ROW_NUMBER() OVER (PARTITION BY user_id
                           ORDER BY transaction_date) AS row_num
    FROM transactions
)
SELECT user_id, spend, transaction_date
FROM RankedTransactions
WHERE row_num = 3;

Explanation of Solution:

1. Common Table Expression (CTE): The RankedTransactions CTE uses the ROW_NUMBER() window function to assign a sequential number (row_num) to each transaction, ordered by transaction_date for each user_id. The PARTITION BY user_id clause ensures the numbering restarts for each user.

2. ROW_NUMBER(): This window function is used to assign a rank to each transaction within each user group based on the transaction_date. The first transaction gets row number 1, the second gets 2, and so on.

3. Filtering the Third Transaction: The final query selects only the rows where row_num = 3, which gives us the third transaction for each user.


Expected Output:

user_id spend transaction_date

111 89.60 2022-02-05 12:00:00

• The result shows the third transaction for user 111, which is the
transaction on 2022-02-05 with a spend of 89.60.

Notes:

• Partitioning: By partitioning the data by user_id, we ensure each
user’s transactions are ranked independently.
• ROW_NUMBER(): If a user has fewer than three transactions, they
won't appear in the result, as they do not have a third transaction.

This solution works well for both PostgreSQL and MySQL with window
function support.
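As a quick sanity check, the same ROW_NUMBER() query can be run in-memory through Python's sqlite3 module, since SQLite (3.25+) shares this window-function syntax. This is an illustrative sketch using the sample data above, not part of the book's solutions:

```python
import sqlite3

# Build the Q.83 sample table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE transactions (user_id INTEGER, spend REAL, transaction_date TEXT)"
)
cur.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (111, 100.50, "2022-01-08 12:00:00"),
        (111, 55.00, "2022-01-10 12:00:00"),
        (121, 36.00, "2022-01-18 12:00:00"),
        (145, 24.99, "2022-01-26 12:00:00"),
        (111, 89.60, "2022-02-05 12:00:00"),
    ],
)
# The solution query, verbatim: rank per user by date, keep row 3.
rows = cur.execute("""
    WITH RankedTransactions AS (
        SELECT user_id, spend, transaction_date,
               ROW_NUMBER() OVER (PARTITION BY user_id
                                  ORDER BY transaction_date) AS row_num
        FROM transactions
    )
    SELECT user_id, spend, transaction_date
    FROM RankedTransactions
    WHERE row_num = 3
""").fetchall()
print(rows)  # [(111, 89.6, '2022-02-05 12:00:00')]
```

Only user 111 has three or more transactions, so only that user appears in the output.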

Q.84

Find the top 5 products whose revenue decreased in 2023 compared to the
previous year, 2022. Return the product name, revenue for the previous year,
revenue for the current year, the revenue decrease, and the decrease ratio
(percentage).

Explanation:
We need to:

1. Compare the revenue of each product between 2022 and 2023.

2. Filter the products whose revenue has decreased in 2023 compared to 2022.

3. Calculate the difference in revenue (revenue_decreased) and the
percentage decrease (decreased_ratio).

4. Sort the results by the largest decrease and return the top 5 products.

Datasets and SQL Schemas:


-- Create product_revenue table
CREATE TABLE product_revenue (
product_name VARCHAR(255),
year INTEGER,
revenue DECIMAL(10, 2)
);

-- Insert sample records


INSERT INTO product_revenue (product_name, year, revenue) VALUES
('Product A', 2022, 10000.00),
('Product A', 2023, 9500.00),
('Product B', 2022, 15000.00),


('Product B', 2023, 14500.00),


('Product C', 2022, 8000.00),
('Product C', 2023, 8500.00),
('Product D', 2022, 12000.00),
('Product D', 2023, 12500.00),
('Product E', 2022, 20000.00),
('Product E', 2023, 19000.00),
('Product F', 2022, 7000.00),
('Product F', 2023, 7200.00),
('Product G', 2022, 18000.00),
('Product G', 2023, 17000.00),
('Product H', 2022, 3000.00),
('Product H', 2023, 3200.00),
('Product I', 2022, 9000.00),
('Product I', 2023, 9200.00),
('Product J', 2022, 6000.00),
('Product J', 2023, 5900.00);

Learnings:

• Self-Join or Window Functions: Using JOIN or LEAD/LAG window
functions to compare revenue between two years.
• Mathematical Operations: Calculating the revenue difference and
percentage change.
• Sorting: Sorting results based on the highest decrease in revenue.

Solutions:

PostgreSQL / MySQL Solution (with JOIN):


SELECT
pr_2023.product_name,
pr_2022.revenue AS revenue_2022,
pr_2023.revenue AS revenue_2023,
(pr_2022.revenue - pr_2023.revenue) AS revenue_decreased,
ROUND((pr_2022.revenue - pr_2023.revenue) / pr_2022.revenue * 100, 2)
AS decreased_ratio
FROM
product_revenue pr_2022
JOIN
product_revenue pr_2023
ON
pr_2022.product_name = pr_2023.product_name
AND pr_2022.year = 2022
AND pr_2023.year = 2023
WHERE
pr_2023.revenue < pr_2022.revenue
ORDER BY
revenue_decreased DESC
LIMIT 5;

Explanation of Solution:

1. Self Join: The query joins the product_revenue table on itself
(pr_2022 for 2022 and pr_2023 for 2023) using the product_name
and year columns to compare the revenue for each product in both
years.

2. Revenue Decreased: The difference in revenue (revenue_decreased)
is calculated as pr_2022.revenue - pr_2023.revenue.


3. Percentage Decrease: The percentage decrease is calculated as
(revenue_decreased / pr_2022.revenue) * 100, and ROUND is
used to keep the result to two decimal places.

4. Filtering: The WHERE clause ensures that only products with a revenue
decrease in 2023 compared to 2022 are included.

5. Sorting and Limiting: The query is sorted by the
revenue_decreased in descending order, and only the top 5 products
are returned using LIMIT 5.

Expected Output:

product_name   revenue_2022   revenue_2023   revenue_decreased   decreased_ratio

Product E      20000.00       19000.00       1000.00             5.00

Product G      18000.00       17000.00       1000.00             5.56

Product A      10000.00       9500.00        500.00              5.00

Product B      15000.00       14500.00       500.00              3.33

Product J      6000.00        5900.00        100.00              1.67

Notes:

• ROUND Function: In PostgreSQL and MySQL, ROUND is used to limit
the percentage decrease to 2 decimal places.
• Efficiency: This query assumes that each product has only one record
for 2022 and 2023. If the dataset is larger, more complex filtering and
grouping may be needed.

This solution works well for both PostgreSQL and MySQL.
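The self-join pattern can also be checked in-memory with Python's sqlite3 module, since SQLite accepts the same JOIN and ROUND syntax. This sketch uses a hypothetical three-product subset of the sample data; Product C is included to show that products whose revenue grew are filtered out:

```python
import sqlite3

# Small subset of the Q.84 data: two decreasing products, one increasing.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE product_revenue (product_name TEXT, year INTEGER, revenue REAL)"
)
cur.executemany(
    "INSERT INTO product_revenue VALUES (?, ?, ?)",
    [
        ("Product A", 2022, 10000.00), ("Product A", 2023, 9500.00),
        ("Product C", 2022, 8000.00),  ("Product C", 2023, 8500.00),  # grew: excluded
        ("Product E", 2022, 20000.00), ("Product E", 2023, 19000.00),
    ],
)
# Self-join 2022 rows to 2023 rows on product_name, keep decreases only.
rows = cur.execute("""
    SELECT pr_2023.product_name,
           pr_2022.revenue AS revenue_2022,
           pr_2023.revenue AS revenue_2023,
           (pr_2022.revenue - pr_2023.revenue) AS revenue_decreased,
           ROUND((pr_2022.revenue - pr_2023.revenue) / pr_2022.revenue * 100, 2)
               AS decreased_ratio
    FROM product_revenue pr_2022
    JOIN product_revenue pr_2023
      ON pr_2022.product_name = pr_2023.product_name
     AND pr_2022.year = 2022 AND pr_2023.year = 2023
    WHERE pr_2023.revenue < pr_2022.revenue
    ORDER BY revenue_decreased DESC
    LIMIT 5
""").fetchall()
print(rows)  # Product E first (1000.00 drop, 5.00%), then Product A (500.00 drop, 5.00%)
```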

Q.85

Write a query that calculates the total viewership for laptops and mobile
devices, where mobile is defined as the sum of tablet and phone viewership.
Output the total viewership for laptops as laptop_views and the total
viewership for mobile devices as mobile_views.

Explanation:
We need to:

1. Sum the viewership_count for laptop devices.


2. Sum the viewership_count for tablet and phone devices to


compute the total mobile viewership.

3. Return these two sums as laptop_views and mobile_views.

Datasets and SQL Schemas:


-- Create viewership table
CREATE TABLE viewership (
device_type VARCHAR(255),
viewership_count INTEGER
);

-- Insert sample records


INSERT INTO viewership (device_type, viewership_count) VALUES
('laptop', 5000),
('tablet', 3000),
('phone', 7000),
('laptop', 6000),
('tablet', 4000),
('phone', 8000),
('laptop', 5500),
('tablet', 3500),
('phone', 7500);

Learnings:

• Aggregating Viewership: Using SUM to aggregate the viewership
count for each device type.
• Grouping Data: We don't need to group data by device type but need
to compute separate sums for laptop and mobile devices (which
includes both tablet and phone).
• Conditional Aggregation: Using CASE or OR in the WHERE clause to
distinguish between laptops and mobile devices.

Solutions:

PostgreSQL / MySQL Solution:


SELECT
SUM(CASE WHEN device_type = 'laptop' THEN viewership_count ELSE 0 END)
AS laptop_views,
SUM(CASE WHEN device_type IN ('tablet', 'phone') THEN viewership_count
ELSE 0 END) AS mobile_views
FROM
viewership;

Explanation:

1. SUM with CASE:


• For laptop_views, we use a CASE statement inside SUM to sum
the viewership_count only for rows where device_type is
'laptop'.
• For mobile_views, we use a CASE statement to sum the
viewership_count for both tablet and phone rows by
checking if device_type is either 'tablet' or 'phone'.


2. ELSE 0: If the device type doesn't match the specified condition, we
add 0 to the sum.

Expected Output:

laptop_views mobile_views

16500 33000

Notes:

• Efficient Aggregation: This query calculates the total viewership in a
single pass over the viewership table, making it efficient.
• Device Classification: Mobile viewership is classified as the sum of
both tablet and phone devices.

This solution works for both PostgreSQL and MySQL.
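The conditional-aggregation query is portable to SQLite unchanged, so it can be verified with Python's sqlite3 module. A sketch with the sample data (laptops sum to 16500; tablets 10500 plus phones 22500 give 33000 mobile views):

```python
import sqlite3

# Q.85 sample data: three rows per device type.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE viewership (device_type TEXT, viewership_count INTEGER)")
cur.executemany(
    "INSERT INTO viewership VALUES (?, ?)",
    [("laptop", 5000), ("tablet", 3000), ("phone", 7000),
     ("laptop", 6000), ("tablet", 4000), ("phone", 8000),
     ("laptop", 5500), ("tablet", 3500), ("phone", 7500)],
)
# One pass over the table; each CASE routes a row's count to one sum.
row = cur.execute("""
    SELECT
        SUM(CASE WHEN device_type = 'laptop'
                 THEN viewership_count ELSE 0 END) AS laptop_views,
        SUM(CASE WHEN device_type IN ('tablet', 'phone')
                 THEN viewership_count ELSE 0 END) AS mobile_views
    FROM viewership
""").fetchone()
print(row)  # (16500, 33000)
```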

Q.86

Write a query to identify the top two highest-grossing products within each
category in the year 2022. The output should include the category, product,
and total spend.

Explanation
To solve this problem, you need to calculate the total spend per product within
each category for the year 2022. Then, you need to rank the products by their
total spend within each category and select the top two highest-grossing
products per category. You can achieve this by using the RANK() or
ROW_NUMBER() window function, along with filtering by year and summing
the spend for each product.

Datasets and SQL Schemas


Table creation and sample data
-- Create product_spend table
CREATE TABLE product_spend (
category VARCHAR(255),
product VARCHAR(255),
user_id INTEGER,
spend DECIMAL(10, 2),
transaction_date TIMESTAMP
);

-- Insert sample records


INSERT INTO product_spend (category, product, user_id, spend, transaction_date) VALUES
('appliance', 'refrigerator', 165, 246.00, '2021-12-26 12:00:00'),
('appliance', 'refrigerator', 123, 299.99, '2022-03-02 12:00:00'),
('appliance', 'washing machine', 123, 219.80, '2022-03-02 12:00:00'),
('electronics', 'vacuum', 178, 152.00, '2022-04-05 12:00:00'),
('electronics', 'wireless headset', 156, 249.90, '2022-07-08 12:00:00'),
('electronics', 'vacuum', 145, 189.00, '2022-07-15 12:00:00');


Learnings

• Window functions (RANK() or ROW_NUMBER()) are useful to assign
rankings to rows based on specific criteria like total spend.
• Aggregation is necessary to calculate total spend per product.
• Date filtering (by year) can be done using the EXTRACT() function or
DATE_TRUNC() depending on the SQL dialect.
• Partitioning by category allows calculating ranks independently
within each category.

Solutions
PostgreSQL solution:
WITH total_spend_per_product AS (
SELECT
category,
product,
SUM(spend) AS total_spend
FROM
product_spend
WHERE
EXTRACT(YEAR FROM transaction_date) = 2022
GROUP BY
category, product
),
ranked_products AS (
SELECT
category,
product,
total_spend,
        RANK() OVER (PARTITION BY category ORDER BY total_spend DESC) AS rank
FROM
total_spend_per_product
)
SELECT
category,
product,
total_spend
FROM
ranked_products
WHERE
rank <= 2
ORDER BY
category, rank;

MySQL solution:
WITH total_spend_per_product AS (
    SELECT
        category,
        product,
        SUM(spend) AS total_spend
    FROM
        product_spend
    WHERE
        YEAR(transaction_date) = 2022
    GROUP BY
        category, product
),
ranked_products AS (
    SELECT
        category,
        product,
        total_spend,
        -- RANK is a reserved word in MySQL 8.0, so the alias is rnk
        RANK() OVER (PARTITION BY category ORDER BY total_spend DESC) AS rnk
    FROM
        total_spend_per_product
)
SELECT
    category,
    product,
    total_spend
FROM
    ranked_products
WHERE
    rnk <= 2
ORDER BY
    category, rnk;

Explanation of SQL Components:

1. CTEs (Common Table Expressions): The query uses two CTEs:


• total_spend_per_product aggregates the spend per product
for the year 2022.
• ranked_products ranks the products within each category
based on their total spend in descending order.

2. RANK(): This window function assigns a rank to each product within
its category based on total spend.

3. Filtering: The WHERE rank <= 2 filters to return only the top two
products per category.
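The two-CTE pipeline can be sketched in-memory with Python's sqlite3 module. SQLite has no YEAR()/EXTRACT(), so strftime('%Y', ...) stands in for the year filter (an adaptation, not the book's exact query), and the alias rnk sidesteps RANK being a reserved word in MySQL 8.0:

```python
import sqlite3

# Q.86 sample data; the 2021 refrigerator purchase should be filtered out.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE product_spend (category TEXT, product TEXT, "
    "user_id INTEGER, spend REAL, transaction_date TEXT)"
)
cur.executemany(
    "INSERT INTO product_spend VALUES (?, ?, ?, ?, ?)",
    [
        ("appliance", "refrigerator", 165, 246.00, "2021-12-26 12:00:00"),
        ("appliance", "refrigerator", 123, 299.99, "2022-03-02 12:00:00"),
        ("appliance", "washing machine", 123, 219.80, "2022-03-02 12:00:00"),
        ("electronics", "vacuum", 178, 152.00, "2022-04-05 12:00:00"),
        ("electronics", "wireless headset", 156, 249.90, "2022-07-08 12:00:00"),
        ("electronics", "vacuum", 145, 189.00, "2022-07-15 12:00:00"),
    ],
)
# Aggregate 2022 spend per product, then rank within each category.
rows = cur.execute("""
    WITH total_spend_per_product AS (
        SELECT category, product, SUM(spend) AS total_spend
        FROM product_spend
        WHERE strftime('%Y', transaction_date) = '2022'
        GROUP BY category, product
    ),
    ranked_products AS (
        SELECT category, product, total_spend,
               RANK() OVER (PARTITION BY category
                            ORDER BY total_spend DESC) AS rnk
        FROM total_spend_per_product
    )
    SELECT category, product, total_spend
    FROM ranked_products
    WHERE rnk <= 2
    ORDER BY category, rnk
""").fetchall()
print(rows)  # both vacuum purchases combine to 341.00 before ranking
```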

Q.87

Write a query to obtain a histogram of tweets posted per user in 2022. The
output should include the tweet count per user as the bucket and the number of
Twitter users who fall into that bucket.

Explanation
To solve this problem, we need to calculate the total number of tweets posted
by each user in the year 2022, then create a histogram showing how many
users fall into each tweet count "bucket". This can be done by grouping the
tweets by user and counting the number of tweets per user, then counting how
many users fall into each bucket of tweet counts.

Datasets and SQL Schemas


Table creation and sample data
-- Create tweets table
CREATE TABLE tweets (
tweet_id INTEGER,
user_id INTEGER,
msg VARCHAR(255),
tweet_date TIMESTAMP
);


-- Insert sample records


INSERT INTO tweets (tweet_id, user_id, msg, tweet_date) VALUES
(214252, 111, 'Am considering taking Tesla private at $420. Funding secured.', '2021-12-30 00:00:00'),
(739252, 111, 'Despite the constant negative press covfefe', '2022-01-01 00:00:00'),
(846402, 111, 'Following @NickSinghTech on Twitter changed my life!', '2022-02-14 00:00:00'),
(241425, 254, 'If the salary is so competitive why won’t you tell me what it is?', '2022-03-01 00:00:00'),
(231574, 148, 'I no longer have a manager. I can''t be managed', '2022-03-23 00:00:00');

Learnings

• Aggregation: The query requires aggregation to count tweets per user.


• Group by: We use GROUP BY to group the data by user_id and
calculate the tweet count.
• Bucketization: The use of COUNT() for bucketization allows
categorizing users based on their tweet activity.
• Date filtering: Filtering by the year 2022 ensures that only tweets
from that year are considered.

Solutions
PostgreSQL solution:
WITH tweet_counts AS (
SELECT
user_id,
COUNT(tweet_id) AS tweet_count
FROM
tweets
WHERE
EXTRACT(YEAR FROM tweet_date) = 2022
GROUP BY
user_id
)
SELECT
tweet_count AS tweet_bucket,
COUNT(*) AS user_count
FROM
tweet_counts
GROUP BY
tweet_count
ORDER BY
tweet_bucket;

MySQL solution:
WITH tweet_counts AS (
SELECT
user_id,
COUNT(tweet_id) AS tweet_count
FROM
tweets
WHERE
YEAR(tweet_date) = 2022
GROUP BY
user_id
)
SELECT
tweet_count AS tweet_bucket,
COUNT(*) AS user_count
FROM
tweet_counts
GROUP BY


tweet_count
ORDER BY
tweet_bucket;

Explanation of SQL Components:

1. CTE (Common Table Expression): The tweet_counts CTE groups
tweets by user_id and counts how many tweets each user posted in
2022.

2. Main query: The main query then groups these tweet counts
(tweet_count) and counts how many users have a particular number
of tweets (i.e., how many users fall into each "bucket").

3. Date filtering: Both PostgreSQL and MySQL use the EXTRACT(YEAR
FROM tweet_date) function or YEAR(tweet_date) to filter for tweets
from 2022.

4. Grouping and ordering: The final result groups by the tweet count
and orders the output by the tweet count to generate the histogram.
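The count-then-count-again histogram pattern can be verified with Python's sqlite3 module; strftime('%Y', ...) replaces EXTRACT()/YEAR() in SQLite (an adaptation for the sketch). With the sample data, user 111 has two tweets inside 2022 and users 254 and 148 have one each:

```python
import sqlite3

# Q.87 sample tweets; the 2021 tweet must not be counted.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE tweets (tweet_id INTEGER, user_id INTEGER, "
    "msg TEXT, tweet_date TEXT)"
)
cur.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?)",
    [
        (214252, 111, "tweet from 2021", "2021-12-30 00:00:00"),  # filtered out
        (739252, 111, "tweet 1", "2022-01-01 00:00:00"),
        (846402, 111, "tweet 2", "2022-02-14 00:00:00"),
        (241425, 254, "tweet 3", "2022-03-01 00:00:00"),
        (231574, 148, "tweet 4", "2022-03-23 00:00:00"),
    ],
)
# First count tweets per user, then count users per tweet-count bucket.
rows = cur.execute("""
    WITH tweet_counts AS (
        SELECT user_id, COUNT(tweet_id) AS tweet_count
        FROM tweets
        WHERE strftime('%Y', tweet_date) = '2022'
        GROUP BY user_id
    )
    SELECT tweet_count AS tweet_bucket, COUNT(*) AS user_count
    FROM tweet_counts
    GROUP BY tweet_count
    ORDER BY tweet_bucket
""").fetchall()
print(rows)  # [(1, 2), (2, 1)]
```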

Q.88

Write a query to find the employees who are high earners in each of the
company's departments. A high earner in a department is an employee who
has a salary in the top three unique salaries for that department.

Explanation
To solve this problem, you need to find the employees who have one of the
top three unique salaries within each department. This can be achieved by first
ranking employees within each department based on their salary in descending
order, then filtering to keep only those with the top three distinct salary values.
The solution requires handling of ties and ensuring that the "top three" salaries
are unique.

Datasets and SQL Schemas


Table creation and sample data
-- Create Department table
CREATE TABLE Department (
id INT PRIMARY KEY,
name VARCHAR(50)
);

-- Insert values into Department table


INSERT INTO Department (id, name) VALUES
(1, 'IT'),
(2, 'Sales');

-- Create Employee table


CREATE TABLE Employee (
id INT PRIMARY KEY,
name VARCHAR(50),


salary INT,
departmentId INT,
FOREIGN KEY (departmentId) REFERENCES Department(id)
);

-- Insert additional records into Employee table


INSERT INTO Employee (id, name, salary, departmentId) VALUES
(8, 'Alice', 75000, 2),
(9, 'Bob', 82000, 2),
(10, 'Carol', 78000, 1),
(11, 'David', 70000, 1),
(12, 'Eva', 85000, 2),
(13, 'Frank', 72000, 1),
(14, 'Gina', 83000, 1),
(15, 'Hank', 68000, 1),
(16, 'Irene', 76000, 2),
(17, 'Jack', 74000, 2),
(18, 'Kelly', 79000, 1),
(19, 'Liam', 71000, 1),
(20, 'Molly', 77000, 2),
(21, 'Nathan', 81000, 1),
(22, 'Olivia', 73000, 2),
(23, 'Peter', 78000, 1),
(24, 'Quinn', 72000, 1),
(25, 'Rachel', 80000, 2),
(26, 'Steve', 75000, 2),
(27, 'Tina', 79000, 1);

Learnings

• Ranking: The RANK() or DENSE_RANK() function can be used to rank
employees based on their salary within each department.
• Handling ties: The query needs to consider cases where two or more
employees have the same salary.
• Filtering top values: We can use DISTINCT or filter after ranking to
ensure we get only the top unique salaries.
• Window functions: Understanding how to partition and rank data
using window functions like RANK() is crucial.

Solutions
PostgreSQL solution:
WITH ranked_salaries AS (
SELECT
e.id AS employee_id,
e.name AS employee_name,
e.salary,
e.departmentId,
        DENSE_RANK() OVER (PARTITION BY e.departmentId ORDER BY e.salary DESC) AS salary_rank
FROM
Employee e
)
SELECT
d.name AS department_name,
rs.employee_name,
rs.salary
FROM
ranked_salaries rs
JOIN
Department d ON rs.departmentId = d.id
WHERE
rs.salary_rank <= 3
ORDER BY
d.name, rs.salary DESC;


MySQL solution:
WITH ranked_salaries AS (
SELECT
e.id AS employee_id,
e.name AS employee_name,
e.salary,
e.departmentId,
        DENSE_RANK() OVER (PARTITION BY e.departmentId ORDER BY e.salary DESC) AS salary_rank
FROM
Employee e
)
SELECT
d.name AS department_name,
rs.employee_name,
rs.salary
FROM
ranked_salaries rs
JOIN
Department d ON rs.departmentId = d.id
WHERE
rs.salary_rank <= 3
ORDER BY
d.name, rs.salary DESC;

Explanation of SQL Components:

1. CTE (Common Table Expression): The ranked_salaries CTE
ranks employees within each department using DENSE_RANK(),
partitioned by departmentId and ordered by salary in descending
order. DENSE_RANK() ensures that salaries with ties get the same rank.

2. Join with Department table: The Department table is joined to add
the department name alongside each employee's data.

3. Filtering by rank: The main query filters to keep only employees
whose rank is less than or equal to 3, meaning they are in the top three
salaries for their department.

4. Ordering: The results are ordered by department name and salary in
descending order to show the highest earners first.
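The DENSE_RANK tie behavior is easiest to see on a tiny, made-up team rather than the full sample table, so this sketch (run through Python's sqlite3, whose window-function syntax matches) uses five hypothetical IT employees where Ben and Cara share the second-highest salary:

```python
import sqlite3

# One department, five employees, with a tie at 80000.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Department (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO Department VALUES (1, 'IT')")
cur.execute(
    "CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT, "
    "salary INTEGER, departmentId INTEGER)"
)
cur.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, "Amy", 90000, 1), (2, "Ben", 80000, 1), (3, "Cara", 80000, 1),
     (4, "Dan", 75000, 1), (5, "Eve", 60000, 1)],
)
# DENSE_RANK gives ranks 1, 2, 2, 3, 4 — so the top-three-unique-salary
# filter keeps four people and drops only Eve.
rows = cur.execute("""
    WITH ranked_salaries AS (
        SELECT e.name AS employee_name, e.salary, e.departmentId,
               DENSE_RANK() OVER (PARTITION BY e.departmentId
                                  ORDER BY e.salary DESC) AS salary_rank
        FROM Employee e
    )
    SELECT d.name AS department_name, rs.employee_name, rs.salary
    FROM ranked_salaries rs
    JOIN Department d ON rs.departmentId = d.id
    WHERE rs.salary_rank <= 3
    ORDER BY rs.salary DESC, rs.employee_name
""").fetchall()
print(rows)
```

With RANK() instead of DENSE_RANK(), the tied 80000 salaries would push Dan's 75000 to rank 4 and wrongly exclude him; that is why DENSE_RANK() is the right tool here.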

Q.89

Write an SQL query to find, for each month and country, the number of
transactions and their total amount, as well as the number of approved
transactions and their total amount.
The result should be ordered by country and month.

Explanation
You need to calculate the number of transactions and their total amount for
each month and country, as well as the number of approved transactions and
their total amount. This requires grouping the transactions by month and
country, and using conditional aggregation to separate approved and declined
transactions.

Datasets and SQL Schemas


Table creation and sample data
CREATE TABLE Transactions (
id INT PRIMARY KEY,
country VARCHAR(255),
    state VARCHAR(255),
amount INT,
trans_date DATE
);

-- Insert sample records


INSERT INTO Transactions (id, country, state, amount, trans_date) VALUES
(121, 'US', 'approved', 1000, '2018-12-18'),
(122, 'US', 'declined', 2000, '2018-12-19'),
(123, 'US', 'approved', 2000, '2019-01-01'),
(124, 'DE', 'approved', 2000, '2019-01-07');

Learnings

• Date manipulation: You need to extract the month and year from the
transaction date to group by month.
• Conditional aggregation: Using CASE WHEN to conditionally count
and sum the approved transactions.
• Grouping: Group by both country and the month/year extracted from
the transaction date.
• Sorting: Sorting the result by country and month.

Solutions
PostgreSQL solution:
SELECT
    country,
    TO_CHAR(trans_date, 'YYYY-MM') AS month, -- format the date to year-month
    COUNT(*) AS total_transactions, -- total number of transactions
    SUM(amount) AS total_amount, -- total amount of all transactions
    COUNT(CASE WHEN state = 'approved' THEN 1 END) AS approved_transactions, -- approved transactions count
    SUM(CASE WHEN state = 'approved' THEN amount END) AS approved_amount -- approved transactions total amount
FROM
    Transactions
GROUP BY
    country, TO_CHAR(trans_date, 'YYYY-MM')
ORDER BY
    country, month;

MySQL solution:
SELECT
    country,
    DATE_FORMAT(trans_date, '%Y-%m') AS month, -- format the date to year-month
    COUNT(*) AS total_transactions, -- total number of transactions
    SUM(amount) AS total_amount, -- total amount of all transactions
    COUNT(CASE WHEN state = 'approved' THEN 1 END) AS approved_transactions, -- approved transactions count
    SUM(CASE WHEN state = 'approved' THEN amount END) AS approved_amount -- approved transactions total amount
FROM
    Transactions
GROUP BY
    country, DATE_FORMAT(trans_date, '%Y-%m')
ORDER BY
    country, month;

Explanation of SQL Components:

1. Date formatting: We use TO_CHAR(trans_date, 'YYYY-MM') in
PostgreSQL and DATE_FORMAT(trans_date, '%Y-%m') in MySQL to
extract the year and month from the transaction date.

2. Conditional aggregation:
• COUNT(CASE WHEN state = 'approved' THEN 1 END)
counts the number of approved transactions.
• SUM(CASE WHEN state = 'approved' THEN amount END)
sums the amounts for approved transactions.

3. Grouping: We group the results by country and the formatted month
to ensure the output reflects both country and monthly breakdowns.

4. Sorting: The result is ordered by country and month to meet the
specified output order.
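The conditional-aggregation query can be checked with Python's sqlite3 module, where strftime('%Y-%m', ...) plays the role of TO_CHAR/DATE_FORMAT (a SQLite adaptation for this sketch):

```python
import sqlite3

# Q.89 sample transactions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE Transactions (id INTEGER PRIMARY KEY, country TEXT, "
    "state TEXT, amount INTEGER, trans_date TEXT)"
)
cur.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?, ?, ?)",
    [(121, "US", "approved", 1000, "2018-12-18"),
     (122, "US", "declined", 2000, "2018-12-19"),
     (123, "US", "approved", 2000, "2019-01-01"),
     (124, "DE", "approved", 2000, "2019-01-07")],
)
# Per (country, month): totals over all rows, plus approved-only totals
# via CASE expressions that yield NULL for declined rows.
rows = cur.execute("""
    SELECT country,
           strftime('%Y-%m', trans_date) AS month,
           COUNT(*) AS total_transactions,
           SUM(amount) AS total_amount,
           COUNT(CASE WHEN state = 'approved' THEN 1 END) AS approved_transactions,
           SUM(CASE WHEN state = 'approved' THEN amount END) AS approved_amount
    FROM Transactions
    GROUP BY country, month
    ORDER BY country, month
""").fetchall()
print(rows)
```

Note how US 2018-12 counts two transactions totaling 3000, of which only one (1000) is approved.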

Q.90

Given the reviews table, write a query to retrieve the average star rating for
each product, grouped by month. The output should display:

• The month as a numerical value (1-12),
• The product ID,
• The average star rating rounded to two decimal places.

The result should be sorted first by month and then by product ID.

Explanation
To solve this problem:

1. We need to extract the month and year from the submit_date column
to group the reviews by month.

2. We calculate the average star rating for each product within each
month.

3. The results must be rounded to two decimal places.

4. Finally, we sort the results first by month and then by product ID.


Datasets and SQL Schemas


Table creation and sample data
CREATE TABLE reviews (
review_id INTEGER,
user_id INTEGER,
submit_date TIMESTAMP,
product_id INTEGER,
stars INTEGER
);

-- Insert sample records


INSERT INTO reviews (review_id, user_id, submit_date, product_id, stars) VALUES
(6171, 123, '2022-06-08 00:00:00', 50001, 4),
(7802, 265, '2022-06-10 00:00:00', 69852, 4),
(5293, 362, '2022-06-18 00:00:00', 50001, 3),
(6352, 192, '2022-07-26 00:00:00', 69852, 3),
(4517, 981, '2022-07-05 00:00:00', 69852, 2),
(8654, 753, '2022-08-15 00:00:00', 50001, 5),
(9743, 642, '2022-08-22 00:00:00', 69852, 3),
(1025, 874, '2022-08-05 00:00:00', 50001, 4),
(2089, 512, '2022-09-10 00:00:00', 69852, 2),
(3078, 369, '2022-09-18 00:00:00', 50001, 5),
(4056, 785, '2022-09-25 00:00:00', 69852, 4),
(5034, 641, '2022-10-12 00:00:00', 50001, 3),
(6023, 829, '2022-10-18 00:00:00', 69852, 5),
(7012, 957, '2022-10-25 00:00:00', 50001, 2),
(8001, 413, '2022-11-05 00:00:00', 69852, 4),
(8990, 268, '2022-11-15 00:00:00', 50001, 3),
(9967, 518, '2022-11-22 00:00:00', 69852, 3),
(1086, 753, '2022-12-10 00:00:00', 50001, 5),
(1175, 642, '2022-12-18 00:00:00', 69852, 4),
(1264, 874, '2022-12-25 00:00:00', 50001, 3),
(1353, 512, '2022-12-31 00:00:00', 69852, 2),
(1442, 369, '2023-01-05 00:00:00', 50001, 4),
(1531, 785, '2023-01-15 00:00:00', 69852, 5),
(1620, 641, '2023-01-22 00:00:00', 50001, 3),
(1709, 829, '2023-01-30 00:00:00', 69852, 4);

Learnings

• Date functions: Extracting the month and year from a timestamp is
essential for grouping by month.
• Aggregation: Using AVG() to calculate the average star rating for each
product within each month.
• Rounding: Using ROUND() to round the average to two decimal places.
• Grouping and sorting: We need to group by product_id and the
extracted month, and then order the results by month and product ID.

Solutions
PostgreSQL solution:
SELECT
    EXTRACT(MONTH FROM submit_date) AS month, -- extract the month from the date
    product_id,
    ROUND(AVG(stars)::numeric, 2) AS avg_star_rating -- calculate and round the average star rating
FROM
    reviews
GROUP BY
    EXTRACT(MONTH FROM submit_date), product_id
ORDER BY
    month, product_id;

MySQL solution:
SELECT
MONTH(submit_date) AS month, -- Extract the month from the date
product_id,
    ROUND(AVG(stars), 2) AS avg_star_rating -- calculate and round the average star rating
FROM
reviews
GROUP BY
MONTH(submit_date), product_id
ORDER BY
month, product_id;

Explanation of SQL Components:

1. Month extraction:
• In PostgreSQL, EXTRACT(MONTH FROM submit_date) extracts
the month part of the submit_date.
• In MySQL, MONTH(submit_date) does the same.

2. Average calculation: The AVG(stars) function computes the average
star rating for each product in a given month. In PostgreSQL, we
explicitly cast the result to numeric and round it to two decimal places
using ROUND(). In MySQL, ROUND(AVG(stars), 2) directly rounds
the result to two decimal places.

3. Grouping: The GROUP BY clause groups the data by month and
product_id, so we get the average per product per month.

4. Sorting: The ORDER BY clause ensures the results are ordered first by
month and then by product ID.
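The monthly-average pattern can be sketched with Python's sqlite3 module on a five-row subset of the sample data; CAST(strftime('%m', ...) AS INTEGER) substitutes for EXTRACT(MONTH ...)/MONTH() in SQLite (an adaptation, not the book's exact query):

```python
import sqlite3

# Five of the Q.90 reviews, spanning June and July 2022.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE reviews (review_id INTEGER, user_id INTEGER, "
    "submit_date TEXT, product_id INTEGER, stars INTEGER)"
)
cur.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?, ?, ?)",
    [(6171, 123, "2022-06-08 00:00:00", 50001, 4),
     (7802, 265, "2022-06-10 00:00:00", 69852, 4),
     (5293, 362, "2022-06-18 00:00:00", 50001, 3),
     (6352, 192, "2022-07-26 00:00:00", 69852, 3),
     (4517, 981, "2022-07-05 00:00:00", 69852, 2)],
)
# Average stars per (month, product), rounded to two decimals.
rows = cur.execute("""
    SELECT CAST(strftime('%m', submit_date) AS INTEGER) AS month,
           product_id,
           ROUND(AVG(stars), 2) AS avg_star_rating
    FROM reviews
    GROUP BY month, product_id
    ORDER BY month, product_id
""").fetchall()
print(rows)  # [(6, 50001, 3.5), (6, 69852, 4.0), (7, 69852, 2.5)]
```

One caveat worth knowing: grouping by month number alone merges the same month from different years, so on the full dataset (which runs into January 2023) you may also want the year in the GROUP BY.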

Q.91

Identify users who have made purchases totaling more than $10,000 in the last
month from the purchases table. The table contains information about
purchases, including the user ID, date of purchase, product ID, and the amount
spent.

Explanation
To solve this:

1. We need to filter the records to include only purchases made in the last
month.

2. Sum the total amount spent by each user during this period.

3. Select only those users whose total amount spent exceeds $10,000.


4. Ensure that the query dynamically calculates "last month" based on the
current date.

Datasets and SQL Schemas


Table creation and sample data
CREATE TABLE purchases (
purchase_id INT PRIMARY KEY,
user_id INT,
date_of_purchase TIMESTAMP,
product_id INT,
amount_spent DECIMAL(10, 2)
);

-- Insert sample records


INSERT INTO purchases (purchase_id, user_id, date_of_purchase, product_id,
amount_spent) VALUES
(2171, 145, '2024-02-22 00:00:00', 43001, 1000),
(3022, 578, '2024-02-24 00:00:00', 25852, 4000),
(4933, 145, '2024-02-28 00:00:00', 43001, 7000),
(6322, 248, '2024-02-19 00:00:00', 25852, 2000),
(4717, 578, '2024-02-12 00:00:00', 25852, 7000),
(2172, 145, '2024-01-15 00:00:00', 43001, 8000),
(3023, 578, '2024-01-18 00:00:00', 25852, 3000),
(4934, 145, '2024-01-28 00:00:00', 43001, 9000),
(6323, 248, '2024-02-20 00:00:00', 25852, 1500),
(4718, 578, '2024-02-25 00:00:00', 25852, 6000);

Learnings

• Date manipulation: To dynamically get purchases made in the "last
month," you can use CURRENT_DATE or NOW() along with INTERVAL or
DATE_TRUNC() depending on the SQL database.
• Aggregating amounts: You need to sum the amount_spent for each
user.
• Filtering results: Filter the users whose total amount spent exceeds
$10,000.
• Grouping and Sorting: Group by user_id to calculate the total
amount spent for each user.

Solutions
PostgreSQL solution:
SELECT
user_id,
SUM(amount_spent) AS total_spent
FROM
purchases
WHERE
    date_of_purchase >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month' -- start of last month
    AND date_of_purchase < DATE_TRUNC('month', CURRENT_DATE) -- end of last month (start of current month)
GROUP BY
user_id
HAVING
SUM(amount_spent) > 10000 -- Only users who spent more than $10,000
ORDER BY
user_id;

MySQL solution:


SELECT
    user_id,
    SUM(amount_spent) AS total_spent
FROM
    purchases
WHERE
    date_of_purchase >= DATE_FORMAT(CURDATE() - INTERVAL 1 MONTH, '%Y-%m-01') -- start of last month
    AND date_of_purchase < DATE_FORMAT(CURDATE(), '%Y-%m-01') -- start of the current month
GROUP BY
    user_id
HAVING
    SUM(amount_spent) > 10000 -- only users who spent more than $10,000
ORDER BY
    user_id;

Explanation of SQL Components:

1. Date filtering:
• In PostgreSQL, DATE_TRUNC('month', CURRENT_DATE) -
INTERVAL '1 month' gives the start of the previous month.
• In MySQL, DATE_FORMAT(CURDATE() - INTERVAL 1 MONTH, '%Y-%m-01')
gives the start of the previous month. (CURDATE() - INTERVAL 1 MONTH on
its own gives "one month ago today," not the first day of last month.)
• The end of the last month is the start of the current month:
DATE_TRUNC('month', CURRENT_DATE) in PostgreSQL and
DATE_FORMAT(CURDATE(), '%Y-%m-01') in MySQL.

2. SUM() aggregation: The SUM(amount_spent) function is used to
calculate the total amount spent by each user within the last month.

3. HAVING: The HAVING clause is used to filter out users whose total
spend is less than or equal to $10,000.

4. Grouping: We group by user_id to calculate the total spend for each user.

5. Sorting: The result is ordered by user_id for better readability and
analysis.
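The calendar-month window can be sketched with Python's sqlite3 module. To keep the result reproducible, a fixed anchor date stands in for CURRENT_DATE/CURDATE() (an assumption of this sketch); SQLite's date(..., 'start of month', '-1 month') mirrors PostgreSQL's DATE_TRUNC:

```python
import sqlite3

TODAY = "2024-03-15"  # hypothetical stand-in for CURRENT_DATE / CURDATE()

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY, user_id INTEGER, "
    "date_of_purchase TEXT, product_id INTEGER, amount_spent REAL)"
)
cur.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?, ?)",
    [(2171, 145, "2024-02-22 00:00:00", 43001, 1000),
     (3022, 578, "2024-02-24 00:00:00", 25852, 4000),
     (4933, 145, "2024-02-28 00:00:00", 43001, 7000),
     (4717, 578, "2024-02-12 00:00:00", 25852, 7000),
     (2172, 145, "2024-01-15 00:00:00", 43001, 8000),  # January: excluded
     (4718, 578, "2024-02-25 00:00:00", 25852, 6000)],
)
# Half-open range [first day of last month, first day of this month).
rows = cur.execute("""
    SELECT user_id, SUM(amount_spent) AS total_spent
    FROM purchases
    WHERE date_of_purchase >= date(:today, 'start of month', '-1 month')
      AND date_of_purchase < date(:today, 'start of month')
    GROUP BY user_id
    HAVING SUM(amount_spent) > 10000
    ORDER BY user_id
""", {"today": TODAY}).fetchall()
print(rows)  # user 145's February total is 8000, so only user 578 qualifies
```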

Q.92

Given the data on IBM employees, write a query to find the average duration
of service for employees across different departments. The duration of service
is calculated as end_date - start_date. If end_date is NULL, consider it
as the current date.

Explanation
To solve this:

1. The duration of service for each employee is calculated by subtracting
the start_date from the end_date.


2. If the end_date is NULL (indicating the employee is still employed),
we need to use the current date.

3. The result should be grouped by department, and the average duration
should be calculated for each department.

Datasets and SQL Schemas


Table creation and sample data
CREATE TABLE employee_service (
employee_id INT PRIMARY KEY,
name VARCHAR(50),
start_date DATE,
end_date DATE,
department VARCHAR(50)
);

-- Insert sample records


INSERT INTO employee_service (employee_id, name, start_date, end_date, department) VALUES
(101, 'John', '2015-01-15', '2020-06-30', 'Technology'),
(102, 'Emma', '2016-08-01', NULL, 'Management'),
(103, 'Ava', '2017-05-30', '2019-08-01', 'Strategy'),
(104, 'Oliver', '2018-11-11', NULL, 'Technology'),
(105, 'Sophia', '2020-01-17', NULL, 'Management'),
(106, 'William', '2019-03-20', NULL, 'Strategy'),
(107, 'James', '2018-09-10', NULL, 'Technology'),
(108, 'Charlotte', '2017-12-05', NULL, 'Management'),
(109, 'Michael', '2016-06-15', '2021-02-28', 'Technology'),
(110, 'Amelia', '2019-11-25', NULL, 'Strategy'),
(111, 'Ethan', '2018-04-08', '2022-01-10', 'Management'),
(112, 'Mia', '2020-07-15', NULL, 'Technology'),
(113, 'Alexander', '2017-10-30', '2020-09-15', 'Strategy'),
(114, 'Isabella', '2016-05-22', '2021-08-20', 'Management'),
(115, 'Liam', '2019-02-12', '2023-04-05', 'Technology'),
(116, 'Ella', '2018-08-05', '2022-11-28', 'Strategy'),
(117, 'Noah', '2020-09-18', NULL, 'Management'),
(118, 'Avery', '2017-11-10', NULL, 'Technology'),
(119, 'Benjamin', '2016-04-04', NULL, 'Strategy'),
(120, 'Abigail', '2019-08-30', NULL, 'Management');

Learnings

• Date differences: Calculate the difference between two dates to
determine the length of an employee's service.
• Handling NULL values: Use COALESCE or CASE to substitute the
current date when end_date is NULL.
• Aggregation: Calculate the average duration of service for each
department using the AVG() function.
• Grouping: Use GROUP BY to group the data by department to get the
average duration per department.

Solutions
PostgreSQL solution:
SELECT
department,
AVG(
CASE
            WHEN end_date IS NULL THEN CURRENT_DATE - start_date -- use current date if end_date is NULL
            ELSE end_date - start_date
        END
) AS avg_duration_of_service
FROM
employee_service
GROUP BY
department
ORDER BY
department;

MySQL solution:
SELECT
department,
AVG(
CASE
            WHEN end_date IS NULL THEN DATEDIFF(CURDATE(), start_date) -- use current date if end_date is NULL
ELSE DATEDIFF(end_date, start_date)
END
) AS avg_duration_of_service
FROM
employee_service
GROUP BY
department
ORDER BY
department;

Explanation of SQL Components:

1. Date calculation:
• PostgreSQL: CURRENT_DATE - start_date calculates the
duration from start_date to the current date when end_date
is NULL.
• MySQL: DATEDIFF(CURDATE(), start_date) calculates the
difference in days between the start_date and the current
date if end_date is NULL.
• When end_date is not NULL, the difference between
end_date and start_date is calculated directly.

2. CASE statement: The CASE expression is used to check whether
end_date is NULL. If it is, we use the current date (via CURRENT_DATE
in PostgreSQL or CURDATE() in MySQL); otherwise, we calculate the
duration based on end_date.

3. AVG() aggregation: The AVG() function is used to compute the
average duration of service for each department.

4. GROUP BY: The query is grouped by department to compute the
average duration for each department.

5. Sorting: The result is sorted by department for easy comparison.

This solution computes the average service duration for employees in each
department, accounting for those whose service is ongoing.
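The COALESCE-plus-date-difference logic can be sketched with Python's sqlite3 module, using julianday() for date arithmetic. A fixed stand-in for "today" (an assumption of this sketch) keeps the output stable, and the two-row dataset is chosen so both branches of the COALESCE are exercised:

```python
import sqlite3

TODAY = "2024-01-01"  # hypothetical stand-in for CURRENT_DATE / CURDATE()

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE employee_service (employee_id INTEGER PRIMARY KEY, name TEXT, "
    "start_date TEXT, end_date TEXT, department TEXT)"
)
cur.executemany(
    "INSERT INTO employee_service VALUES (?, ?, ?, ?, ?)",
    [(101, "John", "2020-01-01", "2020-12-31", "Technology"),   # 365 days
     (104, "Oliver", "2023-01-01", None, "Technology")],        # 365 days up to TODAY
)
# julianday() difference counts days; COALESCE substitutes "today" for
# a NULL end_date, matching the CASE logic in the book's solutions.
rows = cur.execute("""
    SELECT department,
           AVG(julianday(COALESCE(end_date, :today)) - julianday(start_date))
               AS avg_duration_of_service
    FROM employee_service
    GROUP BY department
    ORDER BY department
""", {"today": TODAY}).fetchall()
print(rows)  # [('Technology', 365.0)]
```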


Q.93

Write a query to identify the top 3 posts with the highest engagement (likes +
comments) for each user on a Facebook page. Display the user ID, post ID,
engagement count, and rank for each post.

Explanation
To solve this:

1. Calculate the total engagement for each post by summing likes and
comments.

2. Rank the posts for each user based on the total engagement.

3. For each user, select the top 3 posts with the highest engagement.

4. The result should include the user ID, post ID, engagement count (likes
+ comments), and rank for each post.

Datasets and SQL Schemas


Table creation and sample data
CREATE TABLE fb_posts (
post_id INT PRIMARY KEY,
user_id INT,
likes INT,
comments INT,
post_date DATE
);

-- Insert sample records


INSERT INTO fb_posts (post_id, user_id, likes, comments, post_date) VALUES
(1, 101, 50, 20, '2024-02-27'),
(2, 102, 30, 15, '2024-02-28'),
(3, 103, 70, 25, '2024-02-29'),
(4, 101, 80, 30, '2024-03-01'),
(5, 102, 40, 10, '2024-03-02'),
(6, 103, 60, 20, '2024-03-03'),
(7, 101, 90, 35, '2024-03-04'),
(8, 101, 90, 35, '2024-03-05'),
(9, 102, 50, 15, '2024-03-06'),
(10, 103, 30, 10, '2024-03-07'),
(11, 101, 60, 25, '2024-03-08'),
(12, 102, 70, 30, '2024-03-09'),
(13, 103, 80, 35, '2024-03-10'),
(14, 101, 40, 20, '2024-03-11'),
(15, 102, 90, 40, '2024-03-12'),
(16, 103, 20, 5, '2024-03-13'),
(17, 101, 70, 25, '2024-03-14'),
(18, 102, 50, 15, '2024-03-15'),
(19, 103, 30, 10, '2024-03-16'),
(20, 101, 60, 20, '2024-03-17');

Learnings

• Aggregation: Summing up the likes and comments to calculate the
total engagement.


• Ranking: Using window functions like ROW_NUMBER() or RANK() to
assign ranks based on engagement.
• Window Functions: The PARTITION BY clause is used in conjunction
with window functions to rank posts per user.
• Filtering: Limiting the result to only the top 3 posts per user using
WHERE or ROW_NUMBER().

Solutions
PostgreSQL solution:
WITH ranked_posts AS (
SELECT
user_id,
post_id,
(likes + comments) AS engagement,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY (likes + comments) DESC) AS rank
FROM
fb_posts
)
SELECT
user_id,
post_id,
engagement,
rank
FROM
ranked_posts
WHERE
rank <= 3
ORDER BY
user_id, rank;

MySQL solution:
-- RANK is a reserved word in MySQL 8.0+, so the alias must be backquoted
WITH ranked_posts AS (
    SELECT
        user_id,
        post_id,
        (likes + comments) AS engagement,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY (likes + comments) DESC) AS `rank`
    FROM
        fb_posts
)
SELECT
    user_id,
    post_id,
    engagement,
    `rank`
FROM
    ranked_posts
WHERE
    `rank` <= 3
ORDER BY
    user_id, `rank`;

Explanation of SQL Components:

1. Engagement Calculation: The total engagement for each post is
calculated by summing the likes and comments columns: (likes +
comments).

2. Window Function:


• The ROW_NUMBER() function assigns a unique rank to each post,
ordered by the total engagement in descending order.
• PARTITION BY user_id: This ensures the ranking is done
separately for each user.
• ORDER BY (likes + comments) DESC: This orders posts by
engagement in descending order for each user.

3. Filter for Top 3 Posts: The WHERE rank <= 3 clause filters the results
to only include the top 3 posts for each user.

4. Sorting: The result is ordered first by user_id, and then by rank,
ensuring the top 3 posts are displayed in order.

This query retrieves the top 3 posts for each user based on engagement, with
the user ID, post ID, engagement count, and rank for each post.
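Ranking queries like this can be smoke-tested locally: SQLite 3.25+ supports CTEs and ROW_NUMBER(), so a short Python sketch can run essentially the same query against a subset of the sample rows. Two deliberate deviations, which are assumptions of this sketch rather than part of the original query: a post_id tie-breaker makes the ordering deterministic, and the alias post_rank sidesteps the reserved word RANK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fb_posts (post_id INT, user_id INT, likes INT, comments INT)")
conn.executemany(
    "INSERT INTO fb_posts VALUES (?, ?, ?, ?)",
    [(1, 101, 50, 20), (4, 101, 80, 30), (7, 101, 90, 35),
     (8, 101, 90, 35), (2, 102, 30, 15), (15, 102, 90, 40)],
)

# Top-3 posts per user by engagement; post_id breaks ties deterministically
query = """
WITH ranked_posts AS (
    SELECT user_id, post_id,
           (likes + comments) AS engagement,
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY (likes + comments) DESC, post_id) AS post_rank
    FROM fb_posts
)
SELECT user_id, post_id, engagement, post_rank
FROM ranked_posts
WHERE post_rank <= 3
ORDER BY user_id, post_rank
"""
result = list(conn.execute(query))
for row in result:
    print(row)
```

The same in-memory harness works for any of the window-function answers in this chapter.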

o Q.94

Write a query to retrieve the count of companies that have posted duplicate job
listings.
Definition:
Duplicate job listings are defined as two job listings within the same company
that share identical titles and descriptions.

Explanation
To solve this:

1. We need to identify job listings within the same company that have
identical titles and descriptions.

2. We should count how many companies have posted at least one
duplicate job listing.

3. A "duplicate" is defined as job listings having the same company_id,
title, and description.

Datasets and SQL Schemas


Table creation and sample data
CREATE TABLE job_listings (
job_id INTEGER PRIMARY KEY,
company_id INTEGER,
title TEXT,
description TEXT
);

-- Insert sample records


INSERT INTO job_listings (job_id, company_id, title, description)
VALUES
(248, 827, 'Business Analyst', 'Business analyst evaluates past and current business data with the primary goal of improving decision-making processes within organizations.'),
(149, 845, 'Business Analyst', 'Business analyst evaluates past and current business data with the primary goal of improving decision-making processes within organizations.'),
(945, 345, 'Data Analyst', 'Data analyst reviews data to identify key insights into a business''s customers and ways the data can be used to solve problems.'),
(164, 345, 'Data Analyst', 'Data analyst reviews data to identify key insights into a business''s customers and ways the data can be used to solve problems.'),
(172, 244, 'Data Engineer', 'Data engineer works in a variety of settings to build systems that collect, manage, and convert raw data into usable information for data scientists and business analysts to interpret.'),
(573, 456, 'Software Engineer', 'Software engineer designs, develops, tests, and maintains software applications.'),
(324, 789, 'Software Engineer', 'Software engineer designs, develops, tests, and maintains software applications.'),
(890, 123, 'Data Scientist', 'Data scientist analyzes and interprets complex data to help organizations make informed decisions.'),
(753, 123, 'Data Scientist', 'Data scientist analyzes and interprets complex data to help organizations make informed decisions.');

Learnings

• Group By: We will group the records by company_id, title, and
description to identify the duplicates.
• Having: The HAVING clause will help filter the groups that have more
than one listing, indicating duplicates.
• Count: We will count the distinct companies that have duplicate job
listings.

Solutions
PostgreSQL Solution:
SELECT COUNT(DISTINCT company_id) AS duplicate_companies
FROM (
    SELECT company_id
    FROM job_listings
    GROUP BY company_id, title, description
    HAVING COUNT(*) > 1
) AS duplicates;

MySQL Solution:
SELECT COUNT(DISTINCT company_id) AS duplicate_companies
FROM (
    SELECT company_id
    FROM job_listings
    GROUP BY company_id, title, description
    HAVING COUNT(*) > 1
) AS duplicates;

Explanation of SQL Components:

1. Grouping:
• The inner query groups by company_id, title, and description to find
duplicate listings within the same company.

2. HAVING:
• HAVING COUNT(*) > 1: This filters the groups to only include those
that have more than one job posting with the same title and
description.

3. Counting Companies:
• COUNT(DISTINCT company_id): The outer query counts the distinct
companies that appear in at least one duplicate group. The subquery is
required because a bare GROUP BY ... HAVING would return one row per
duplicate group rather than a single total.

Expected Output:
The result will be a single value representing the number of companies that
have posted duplicate job listings based on identical titles and descriptions.
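The single-total behaviour is easy to verify with SQLite from Python (descriptions are abbreviated here for readability): companies 345 and 123 each repeat a listing, while 827 and 845 share identical text but are different companies, so the expected count is 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_listings (job_id INT, company_id INT, title TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO job_listings VALUES (?, ?, ?, ?)",
    [
        (248, 827, "Business Analyst", "desc A"),  # same text, different companies
        (149, 845, "Business Analyst", "desc A"),
        (945, 345, "Data Analyst", "desc B"),      # duplicate within company 345
        (164, 345, "Data Analyst", "desc B"),
        (890, 123, "Data Scientist", "desc C"),    # duplicate within company 123
        (753, 123, "Data Scientist", "desc C"),
    ],
)

# The subquery collapses each duplicate group to one row; the outer
# COUNT then produces a single company total
query = """
SELECT COUNT(DISTINCT company_id)
FROM (
    SELECT company_id
    FROM job_listings
    GROUP BY company_id, title, description
    HAVING COUNT(*) > 1
)
"""
(duplicate_companies,) = conn.execute(query).fetchone()
print(duplicate_companies)  # 2
```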

o Q.95

Identify the region with the lowest sales amount for the previous month.
Return the region name and total sales amount.

Explanation
To solve this, we need to:

1. Filter the data to consider only sales from the previous month
(February 2024).

2. Group the data by region to calculate the total sales amount for each
region.

3. Identify the region with the lowest total sales amount.

4. Return the region name and the total sales amount for that region.

Datasets and SQL Schemas


Table creation and sample data
CREATE TABLE Sales (
SaleID SERIAL PRIMARY KEY,
Region VARCHAR(50),
Amount DECIMAL(10, 2),
SaleDate DATE
);

-- Insert sample records


INSERT INTO Sales (Region, Amount, SaleDate) VALUES
('North', 5000.00, '2024-02-01'),
('South', 6000.00, '2024-02-02'),
('East', 4500.00, '2024-02-03'),
('West', 7000.00, '2024-02-04'),
('North', 5500.00, '2024-02-05'),
('South', 6500.00, '2024-02-06'),
('East', 4800.00, '2024-02-07'),
('West', 7200.00, '2024-02-08'),
('North', 5200.00, '2024-02-09'),
('South', 6200.00, '2024-02-10'),
('East', 4700.00, '2024-02-11'),
('West', 7100.00, '2024-02-12'),
('North', 5300.00, '2024-02-13'),
('South', 6300.00, '2024-02-14'),
('East', 4600.00, '2024-02-15'),
('West', 7300.00, '2024-02-16'),
('North', 5400.00, '2024-02-17'),
('South', 6400.00, '2024-02-18'),
('East', 4900.00, '2024-02-19'),
('West', 7400.00, '2024-02-20'),
('North', 5600.00, '2024-02-21'),
('South', 6600.00, '2024-02-22'),
('East', 5000.00, '2024-02-23'),
('West', 7500.00, '2024-02-24'),
('North', 5700.00, '2024-02-25'),
('South', 6700.00, '2024-02-26'),
('East', 5100.00, '2024-02-27'),
('West', 7600.00, '2024-02-28');

Learnings

• Group By: We need to group the sales data by Region to calculate
total sales per region.
• Aggregation: We will use the SUM() function to calculate the total
sales per region.
• Filtering Dates: We will filter records for the previous month
(February 2024).
• Sorting: To find the region with the lowest sales, we will sort the
results in ascending order of total sales.

Solutions
PostgreSQL Solution:
SELECT Region, SUM(Amount) AS total_sales
FROM Sales
WHERE SaleDate BETWEEN '2024-02-01' AND '2024-02-29'
GROUP BY Region
ORDER BY total_sales ASC
LIMIT 1;

MySQL Solution:
SELECT Region, SUM(Amount) AS total_sales
FROM Sales
WHERE SaleDate BETWEEN '2024-02-01' AND '2024-02-29'
GROUP BY Region
ORDER BY total_sales ASC
LIMIT 1;

Explanation of SQL Components:

1. Date Filtering:
• WHERE SaleDate BETWEEN '2024-02-01' AND '2024-02-
29': Filters sales records for February 2024.

2. Grouping:
• GROUP BY Region: Groups the data by region to calculate total
sales for each region.

3. Aggregation:
• SUM(Amount) AS total_sales: Sums up the sales amount for
each region.

4. Sorting:


• ORDER BY total_sales ASC: Sorts the regions in ascending
order of total sales, so the region with the lowest sales comes
first.

5. Limit:
• LIMIT 1: Returns only the region with the lowest total sales.

Expected Output:
The result will display the region with the lowest sales for the month of
February 2024, along with the total sales amount for that region.
Example output:

Region  total_sales
East    33600.00
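The per-region totals are quick to re-derive in plain Python from the sample rows, confirming East as the lowest:

```python
from collections import defaultdict

# (region, amount) pairs taken from the February 2024 sample rows above
sales = [
    ("North", 5000), ("South", 6000), ("East", 4500), ("West", 7000),
    ("North", 5500), ("South", 6500), ("East", 4800), ("West", 7200),
    ("North", 5200), ("South", 6200), ("East", 4700), ("West", 7100),
    ("North", 5300), ("South", 6300), ("East", 4600), ("West", 7300),
    ("North", 5400), ("South", 6400), ("East", 4900), ("West", 7400),
    ("North", 5600), ("South", 6600), ("East", 5000), ("West", 7500),
    ("North", 5700), ("South", 6700), ("East", 5100), ("West", 7600),
]

totals = defaultdict(float)
for region, amount in sales:
    totals[region] += amount

# min over (region, total) pairs mirrors ORDER BY total_sales ASC LIMIT 1
lowest = min(totals.items(), key=lambda kv: kv[1])
print(lowest)  # ('East', 33600.0)
```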

o Q.96

Find the median within a series of numbers in SQL.

Explanation
To find the median in SQL:

1. Median Definition: The median is the middle number in a sorted list
of numbers. If the number of elements is odd, the median is the middle
element. If the number of elements is even, the median is the average
of the two middle values.

2. We need to:
• Sort the views column.
• Find the middle value (or the average of the two middle values
if the number of rows is even).

3. We will use window functions or subqueries to handle this.

Datasets and SQL Schemas


Table creation and sample data
CREATE TABLE tiktok (
views INT
);

-- Insert records into the tiktok table


INSERT INTO tiktok (views)
VALUES
(100), (800), (350),
(150), (600),
(700), (700), (950);


Learnings

• Window Functions: You can use ROW_NUMBER(), RANK(), or NTILE()
to handle ranking and finding the middle element.
• Handling Odd and Even Rows: Use COUNT() to determine if the
number of rows is odd or even, and then apply different logic for each
case.
• Median Calculation: If the number of rows is odd, the median is the
middle element. If even, it is the average of the two middle values.

Solutions
PostgreSQL Solution:
In PostgreSQL, we can use window functions to rank the rows and then
average the middle value(s):
WITH RankedViews AS (
    SELECT views,
           ROW_NUMBER() OVER (ORDER BY views) AS row_num,
           COUNT(*) OVER () AS total_rows
    FROM tiktok
)
SELECT AVG(views) AS median
FROM RankedViews
WHERE row_num IN ((total_rows + 1) / 2, (total_rows + 2) / 2);

Explanation:

1. Window Functions:
• ROW_NUMBER() OVER (ORDER BY views): This generates a
unique row number for each row based on the sorted views
column.
• COUNT(*) OVER (): This counts the total number of rows in
the table.

2. Median Logic:
• With integer division, (total_rows + 1) / 2 and (total_rows + 2) / 2
resolve to the same position when total_rows is odd, so the median is
that single middle value.
• When total_rows is even, they resolve to the two middle positions,
and AVG(views) returns their average.

MySQL Solution:
MySQL 8.0+ supports the same window functions, so nearly the same query
works (older versions such as 5.7 would need a variable-based or
self-join workaround):
WITH RankedViews AS (
    SELECT views,
           ROW_NUMBER() OVER (ORDER BY views) AS row_num,
           COUNT(*) OVER () AS total_rows
    FROM tiktok
)
SELECT AVG(views) AS median
FROM RankedViews
WHERE row_num IN (FLOOR((total_rows + 1) / 2), FLOOR((total_rows + 2) / 2));

Explanation:

• In MySQL, / performs decimal division, so FLOOR() is applied to get
the integer positions of the middle row(s).
• Odd Number of Rows: Both expressions resolve to the same middle
position, so the median is that single value.
• Even Number of Rows: The expressions resolve to the two middle
positions, and AVG(views) returns their average.

Expected Output
For the given sample data:
views: 100, 800, 350, 150, 600, 700, 700, 950

Sorted Views:
100, 150, 350, 600, 700, 700, 800, 950

• Since there are 8 values (even), the median will be the average of the
4th and 5th values (600 and 700):
(600 + 700) / 2 = 650

So, the expected output will be:

median
650.00

Key Takeaways

• Window Functions: Used to rank rows and calculate positions.


• Handling Odd/Even: The logic to calculate the median varies
depending on whether the total number of rows is odd or even.
• Subqueries: Used to calculate averages or select specific rows in case
of an even number of rows.

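As a cross-check, Python's statistics.median applies exactly the same odd/even rule to a list:

```python
import statistics

views = [100, 800, 350, 150, 600, 700, 700, 950]

# For an even count, statistics.median sorts the list and averages the
# two middle values, matching the SQL logic above
median = statistics.median(views)
print(median)  # 650.0
```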

o Q.97

Identify the region with the lowest sales amount for the previous month.
Return the region name and the total sale amount.

Explanation
You need to identify which region had the lowest total sales amount in the
previous month. This involves filtering the sales records to include only those
from the last month, summing the sales amounts by region, and then selecting
the region with the smallest total sales.

Datasets and SQL Schemas


Table Creation and Sample Data
-- Create Sales table
CREATE TABLE Sales (
SaleID SERIAL PRIMARY KEY,
Region VARCHAR(50),
Amount DECIMAL(10, 2),
SaleDate DATE
);

-- Insert sample data into Sales table


INSERT INTO Sales (Region, Amount, SaleDate) VALUES
('North', 5000.00, '2024-02-01'),
('South', 6000.00, '2024-02-02'),
('East', 4500.00, '2024-02-03'),
('West', 7000.00, '2024-02-04'),
('North', 5500.00, '2024-02-05'),
('South', 6500.00, '2024-02-06'),
('East', 4800.00, '2024-02-07'),
('West', 7200.00, '2024-02-08'),
('North', 5200.00, '2024-02-09'),
('South', 6200.00, '2024-02-10'),
('East', 4700.00, '2024-02-11'),
('West', 7100.00, '2024-02-12'),
('North', 5300.00, '2024-02-13'),
('South', 6300.00, '2024-02-14'),
('East', 4600.00, '2024-02-15'),
('West', 7300.00, '2024-02-16'),
('North', 5400.00, '2024-02-17'),
('South', 6400.00, '2024-02-18'),
('East', 4900.00, '2024-02-19'),
('West', 7400.00, '2024-02-20'),
('North', 5600.00, '2024-02-21'),
('South', 6600.00, '2024-02-22'),
('East', 5000.00, '2024-02-23'),
('West', 7500.00, '2024-02-24'),
('North', 5700.00, '2024-02-25'),
('South', 6700.00, '2024-02-26'),
('East', 5100.00, '2024-02-27'),
('West', 7600.00, '2024-02-28');

Learnings

• Filtering by Date: You’ll need to filter the data for the previous
month, which can be done using date functions such as CURRENT_DATE,
DATE_TRUNC, and INTERVAL.
• Aggregation: Use the SUM() function to aggregate sales for each
region.
• Grouping: Use GROUP BY to group the data by region.
• Ordering and Limiting: Use ORDER BY and LIMIT to get the region
with the lowest total sales.

Solutions

PostgreSQL Solution
SELECT Region, SUM(Amount) AS total_sale
FROM Sales
WHERE SaleDate >= date_trunc('month', CURRENT_DATE) - INTERVAL '1 month'
AND SaleDate < date_trunc('month', CURRENT_DATE)
GROUP BY Region
ORDER BY total_sale ASC
LIMIT 1;

MySQL Solution
-- DATE_FORMAT(..., '%Y-%m-01') truncates a date to the first day of its month,
-- giving the previous calendar month rather than a rolling 30-day window
SELECT Region, SUM(Amount) AS total_sale
FROM Sales
WHERE SaleDate >= DATE_FORMAT(CURDATE() - INTERVAL 1 MONTH, '%Y-%m-01')
  AND SaleDate < DATE_FORMAT(CURDATE(), '%Y-%m-01')
GROUP BY Region
ORDER BY total_sale ASC
LIMIT 1;
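Both queries rely on the same boundary arithmetic: a half-open range from the first day of the previous month up to, but excluding, the first day of the current month. A small Python sketch of that arithmetic (the function name is ours, not from the book):

```python
from datetime import date, timedelta

def previous_month_range(today):
    """Half-open [start, end) range covering the previous calendar month."""
    first_of_current = today.replace(day=1)
    # stepping back one day from the 1st always lands in the previous month
    last_of_previous = first_of_current - timedelta(days=1)
    return last_of_previous.replace(day=1), first_of_current

start, end = previous_month_range(date(2024, 3, 15))
print(start, end)  # 2024-02-01 2024-03-01
```

The half-open comparison (>= start AND < end) also handles months of any length, including leap-year February, without hardcoding an end date.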

o Q.98

Which metro city had the highest number of restaurant orders in September
2021?
Write the SQL query to retrieve the city name and the total count of orders,
ordered by the total count of orders in descending order.


Explanation
You need to find out which of the listed metro cities had the highest number of
restaurant orders in September 2021. This requires filtering the orders based
on the month and year, counting the number of orders for each city, and then
sorting the results by the count in descending order.

Datasets and SQL Schemas


Table Creation and Sample Data
-- Create the restaurant_orders table
CREATE TABLE restaurant_orders (
city VARCHAR(50),
restaurant_id INT,
order_id INT,
order_date DATE
);

-- Insert sample records into restaurant_orders


INSERT INTO restaurant_orders (city, restaurant_id, order_id, order_date)
VALUES
('Delhi', 101, 1, '2021-09-05'),
('Bangalore', 102, 12, '2021-09-08'),
('Bangalore', 102, 13, '2021-09-08'),
('Bangalore', 102, 14, '2021-09-08'),
('Mumbai', 103, 3, '2021-09-10'),
('Mumbai', 103, 30, '2021-09-10'),
('Chennai', 104, 4, '2021-09-15'),
('Delhi', 105, 5, '2021-09-20'),
('Bangalore', 106, 6, '2021-09-25'),
('Mumbai', 107, 7, '2021-09-28'),
('Chennai', 108, 8, '2021-09-30'),
('Delhi', 109, 9, '2021-10-05'),
('Bangalore', 110, 10, '2021-10-08'),
('Mumbai', 111, 11, '2021-10-10'),
('Chennai', 112, 12, '2021-10-15'),
('Kolkata', 113, 13, '2021-10-20'),
('Hyderabad', 114, 14, '2021-10-25'),
('Pune', 115, 15, '2021-10-28'),
('Jaipur', 116, 16, '2021-10-30');

Learnings

• Filtering by Date: You'll need to filter orders to only include those
from September 2021.
• Aggregation: Use the COUNT() function to aggregate the number of
orders for each city.
• Grouping: Group the results by city to calculate the total number of
orders for each city.
• Ordering: Use ORDER BY to sort the cities by their total number of
orders in descending order.

Solutions

PostgreSQL Solution

SELECT city, COUNT(order_id) AS total_orders
FROM restaurant_orders
WHERE city IN ('Delhi', 'Mumbai', 'Bangalore', 'Hyderabad')
AND order_date >= '2021-09-01'
AND order_date < '2021-10-01'
GROUP BY city
ORDER BY total_orders DESC
LIMIT 1;

MySQL Solution
SELECT city, COUNT(order_id) AS total_orders
FROM restaurant_orders
WHERE city IN ('Delhi', 'Mumbai', 'Bangalore', 'Hyderabad')
AND order_date >= '2021-09-01'
AND order_date < '2021-10-01'
GROUP BY city
ORDER BY total_orders DESC
LIMIT 1;

o Q.99

Identify the drivers with the highest average rating in the last 6 months.
For each driver, calculate their average rating, the number of completed
rides, and rank them based on their average rating in descending order.
Display the driver ID, average rating, number of completed rides, and
rank.

Explanation
You need to calculate the average rating for each driver in the last 6 months,
count the number of completed rides for each driver, and rank them by their
average rating. The challenge here involves:

• Subqueries: To calculate the average rating for each driver.


• Window Functions: To rank the drivers based on their average rating.
• Date Filtering: Considering only rides from the last 6 months.
• Aggregation: Using COUNT() to count the completed rides.

Datasets and SQL Schemas


Table Creation and Sample Data
-- Create the rapido_rides table
CREATE TABLE rapido_rides (
ride_id INT PRIMARY KEY,
driver_id INT,
rating DECIMAL(3, 2), -- Rating out of 5
ride_status VARCHAR(20), -- Completed, Cancelled, etc.
ride_date DATE
);

-- Insert sample records into rapido_rides


INSERT INTO rapido_rides (ride_id, driver_id, rating, ride_status, ride_date)
VALUES
(101, 1, 4.5, 'Completed', '2024-05-01'),
(102, 2, 4.7, 'Completed', '2024-05-05'),
(103, 3, 4.3, 'Completed', '2024-06-01'),
(104, 1, 3.8, 'Completed', '2024-06-10'),
(105, 2, 5.0, 'Completed', '2024-06-15'),
(106, 4, 4.6, 'Completed', '2024-07-01'),
(107, 1, 4.7, 'Completed', '2024-07-10'),
(108, 3, 4.4, 'Completed', '2024-07-15'),
(109, 2, 4.9, 'Completed', '2024-07-20'),
(110, 4, 3.9, 'Completed', '2024-08-01'),
(111, 1, 4.8, 'Completed', '2024-08-05'),
(112, 3, 4.2, 'Completed', '2024-08-10'),
(113, 2, 4.6, 'Completed', '2024-08-15'),
(114, 4, 4.1, 'Completed', '2024-09-01'),
(115, 1, 4.9, 'Completed', '2024-09-05'),
(116, 3, 4.0, 'Completed', '2024-09-10');

Learnings

• Subqueries: A subquery can be used to calculate the total number of
completed rides and average ratings for each driver.
• Window Functions: RANK() or ROW_NUMBER() can be used to rank
drivers based on their average rating in descending order.
• Date Filtering: Filtering rides to include only those within the last 6
months using a date comparison.
• Aggregation and Grouping: Using AVG() to calculate the average
rating and COUNT() to count the number of completed rides.

Solutions

PostgreSQL Solution
WITH driver_ratings AS (
SELECT driver_id,
COUNT(ride_id) AS total_rides,
AVG(rating) AS avg_rating
FROM rapido_rides
WHERE ride_status = 'Completed'
AND ride_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY driver_id
)
SELECT driver_id,
avg_rating,
total_rides,
RANK() OVER (ORDER BY avg_rating DESC) AS rating_rank
FROM driver_ratings
ORDER BY rating_rank;

MySQL Solution
WITH driver_ratings AS (
SELECT driver_id,
COUNT(ride_id) AS total_rides,
AVG(rating) AS avg_rating
FROM rapido_rides
WHERE ride_status = 'Completed'
AND ride_date >= CURDATE() - INTERVAL 6 MONTH
GROUP BY driver_id
)
SELECT driver_id,
avg_rating,
total_rides,
RANK() OVER (ORDER BY avg_rating DESC) AS rating_rank
FROM driver_ratings
ORDER BY rating_rank;

Explanation of Solutions

1. Subquery (driver_ratings):
• This part of the query calculates the total number of rides and
the average rating for each driver in the last 6 months.
• We filter the rides to include only the ones with the status
"Completed".
• The COUNT() function is used to get the number of completed
rides, and AVG() is used to calculate the average rating.

2. Window Function (RANK()):


• After calculating the average ratings, the RANK() window
function assigns a rank to each driver based on their average
rating in descending order.

3. Date Filtering:
• We use CURRENT_DATE - INTERVAL '6 months'
(PostgreSQL) or CURDATE() - INTERVAL 6 MONTH (MySQL)
to filter the records for the last 6 months.

4. Final Output:
• The query returns the driver_id, avg_rating, total_rides,
and the rank for each driver, ordered by their rank.
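Setting the 6-month date filter aside, the aggregate-then-rank pipeline can be reproduced in a few lines of plain Python over the sample ratings. This is a sketch of the logic, not the SQL itself; since the four sample averages happen to be distinct, a descending sort stands in for RANK():

```python
from collections import defaultdict

# (driver_id, rating) pairs for the completed rides in the sample data
rides = [
    (1, 4.5), (2, 4.7), (3, 4.3), (1, 3.8), (2, 5.0), (4, 4.6),
    (1, 4.7), (3, 4.4), (2, 4.9), (4, 3.9), (1, 4.8), (3, 4.2),
    (2, 4.6), (4, 4.1), (1, 4.9), (3, 4.0),
]

ratings = defaultdict(list)
for driver_id, rating in rides:
    ratings[driver_id].append(rating)

# average rating and ride count per driver, highest average first
stats = sorted(
    ((d, sum(r) / len(r), len(r)) for d, r in ratings.items()),
    key=lambda t: t[1],
    reverse=True,
)
for rank, (driver_id, avg_rating, total_rides) in enumerate(stats, start=1):
    print(driver_id, round(avg_rating, 2), total_rides, rank)
```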

o Q.100

Identify the customers who have purchased eyewear or sunglasses from
Lenskart in the last 3 months, along with the total number of items they
have bought, the total amount spent, and their rank based on the total
amount spent. Display the customer ID, total number of items, total
amount spent, and rank.

Explanation
You need to calculate:

1. The total number of items purchased by each customer.

2. The total amount spent by each customer on eyewear or sunglasses
within the last 3 months.

3. Rank customers based on the total amount spent in descending order.


The problem involves:

• Subqueries: To calculate the total number of items and the total
amount spent by each customer.
• Window Functions: To rank customers based on the total amount
spent.
• Date Filtering: Considering only purchases made within the last 3
months.
• Aggregation: Using SUM() for the total amount spent and COUNT() for
the total number of items.

Datasets and SQL Schemas


Table Creation and Sample Data
-- Create the lenskart_purchases table
CREATE TABLE lenskart_purchases (
purchase_id INT PRIMARY KEY,
customer_id INT,
product_name VARCHAR(255),
product_category VARCHAR(50), -- Eyewear, Sunglasses, etc.
quantity INT,
price DECIMAL(10, 2),
purchase_date DATE
);

-- Insert sample records into lenskart_purchases


INSERT INTO lenskart_purchases (purchase_id, customer_id, product_name, product_category, quantity, price, purchase_date)
VALUES
(101, 1, 'Aviator Sunglasses', 'Sunglasses', 1, 2500.00, '2024-05-15'),
(102, 2, 'Round Eyewear', 'Eyewear', 2, 1500.00, '2024-06-10'),
(103, 3, 'Wayfarer Sunglasses', 'Sunglasses', 1, 3000.00, '2024-07-01'),
(104, 1, 'Cat Eye Sunglasses', 'Sunglasses', 1, 3500.00, '2024-07-05'),
(105, 2, 'Square Eyewear', 'Eyewear', 1, 1200.00, '2024-07-12'),
(106, 4, 'Polarized Sunglasses', 'Sunglasses', 2, 2000.00, '2024-08-05'),
(107, 3, 'Goggles', 'Eyewear', 1, 1800.00, '2024-08-15'),
(108, 1, 'Aviator Sunglasses', 'Sunglasses', 1, 2500.00, '2024-08-20'),
(109, 5, 'Round Eyewear', 'Eyewear', 3, 1500.00, '2024-08-25'),
(110, 2, 'Wayfarer Eyewear', 'Eyewear', 1, 2000.00, '2024-09-01'),
(111, 6, 'Aviator Sunglasses', 'Sunglasses', 1, 2500.00, '2024-09-10'),
(112, 7, 'Cat Eye Eyewear', 'Eyewear', 1, 3000.00, '2024-09-15');

Learnings

• Subqueries: Used to aggregate total items and total amount spent by
each customer.
• Window Functions: To rank customers based on the total amount
spent.
• Date Filtering: Only considering purchases from the last 3 months
using date comparison.


• Aggregation and Grouping: Use SUM() to calculate the total amount
and COUNT() for the total quantity of items purchased.
• String Matching: To filter only the relevant products (Eyewear or
Sunglasses).

Solutions

PostgreSQL Solution
WITH customer_spending AS (
SELECT customer_id,
COUNT(*) AS total_items,
SUM(quantity * price) AS total_spent
FROM lenskart_purchases
WHERE product_category IN ('Eyewear', 'Sunglasses')
AND purchase_date >= CURRENT_DATE - INTERVAL '3 months'
GROUP BY customer_id
)
SELECT customer_id,
total_items,
total_spent,
RANK() OVER (ORDER BY total_spent DESC) AS rank
FROM customer_spending
ORDER BY rank;

MySQL Solution
-- RANK is a reserved word in MySQL 8.0+, so the alias must be backquoted
WITH customer_spending AS (
    SELECT customer_id,
           COUNT(*) AS total_items,
           SUM(quantity * price) AS total_spent
    FROM lenskart_purchases
    WHERE product_category IN ('Eyewear', 'Sunglasses')
      AND purchase_date >= CURDATE() - INTERVAL 3 MONTH
    GROUP BY customer_id
)
SELECT customer_id,
       total_items,
       total_spent,
       RANK() OVER (ORDER BY total_spent DESC) AS `rank`
FROM customer_spending
ORDER BY `rank`;

Explanation of Solutions

1. Subquery (customer_spending):
• This part of the query calculates the total number of items
purchased (COUNT(*)) and the total amount spent by each
customer (SUM(quantity * price)) on products in the
Eyewear or Sunglasses categories, considering only the last 3
months.
• We filter the purchases to include only those with the relevant
categories (Eyewear and Sunglasses) and within the specified
time range (purchase_date >= CURRENT_DATE - INTERVAL
'3 months' for PostgreSQL or CURDATE() - INTERVAL 3
MONTH for MySQL).


2. Window Function (RANK()):


• After calculating the total amount spent by each customer, the
RANK() window function is used to assign ranks to customers,
with the highest spender ranked first.

3. Final Output:
• The query returns the customer_id, total_items,
total_spent, and rank for each customer, ordered by rank.

Key Points

• This problem showcases the use of subqueries for data aggregation
and window functions for ranking results.
• It also emphasizes the importance of date filtering and category
filtering to target specific products and time ranges.

Questions By Topic
SELECT Statement

• Q.101

Question
Retrieve all distinct job titles from the employee table.
Explanation
You need to query the Employee table and retrieve distinct job titles, meaning no
duplicates.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Employee (
EmployeeID INT,
Name VARCHAR(50),
JobTitle VARCHAR(50),
Department VARCHAR(50),
Salary DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Employee (EmployeeID, Name, JobTitle, Department, Salary) VALUES
(1, 'Alice', 'Software Engineer', 'Engineering', 120000.00),
(2, 'Bob', 'Data Scientist', 'Data Analytics', 115000.00),
(3, 'Charlie', 'Software Engineer', 'Engineering', 120000.00),
(4, 'Daisy', 'HR Manager', 'Human Resources', 95000.00),
(5, 'Eve', 'Product Manager', 'Product', 130000.00);

Learnings

• Use of the DISTINCT keyword to remove duplicate values.


• Basic SELECT query to retrieve specific columns from a table.

Solutions

• - PostgreSQL solution
SELECT DISTINCT JobTitle
FROM Employee;

• - MySQL solution
SELECT DISTINCT JobTitle
FROM Employee;

• Q.102

Question
List all distinct product categories available on the platform.
Explanation
You need to query the Product table and retrieve the distinct categories, ensuring no
duplicates in the result.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Product (
ProductID INT,
ProductName VARCHAR(50),
Category VARCHAR(50),
Price DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Product (ProductID, ProductName, Category, Price) VALUES
(1, 'Echo Dot', 'Electronics', 49.99),
(2, 'Fire Stick', 'Electronics', 39.99),
(3, 'Running Shoes', 'Sports', 79.99),
(4, 'Yoga Mat', 'Sports', 19.99),
(5, 'Smart Bulb', 'Electronics', 24.99);

Learnings

• Use of the DISTINCT keyword to get unique values.


• Basic SELECT query to retrieve specific columns from a table.

Solutions

• - PostgreSQL solution
SELECT DISTINCT Category
FROM Product;

• - MySQL solution
SELECT DISTINCT Category
FROM Product;


• Q.103

Question
Select the product name and price of the most expensive product from the Products
table.
Explanation
You need to find the most expensive product and return only its name and price.
Sort by price in descending order and keep the first row, or compare the price
against a MAX() subquery; selecting ProductName alongside a bare MAX(Price)
does not pair the name with its own price.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Products (
ProductID INT,
ProductName VARCHAR(50),
Category VARCHAR(50),
Price DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Products (ProductID, ProductName, Category, Price) VALUES
(1, 'Laptop', 'Electronics', 1200.00),
(2, 'Smartphone', 'Electronics', 800.00),
(3, 'Tablet', 'Electronics', 600.00),
(4, 'Coffee Maker', 'Appliances', 100.00),
(5, 'Toaster', 'Appliances', 50.00);

Solutions

• - PostgreSQL solution
SELECT ProductName, Price
FROM Products
ORDER BY Price DESC
LIMIT 1;

• - MySQL solution
SELECT ProductName, Price
FROM Products
ORDER BY Price DESC
LIMIT 1;
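A quick SQLite check of the ORDER BY ... LIMIT 1 pattern. Unlike a bare SELECT ProductName, MAX(Price) — which PostgreSQL rejects outright and other engines resolve nondeterministically — it is guaranteed to pair the product name with its own price:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductID INT, ProductName TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [(1, "Laptop", 1200.00), (2, "Smartphone", 800.00), (3, "Tablet", 600.00),
     (4, "Coffee Maker", 100.00), (5, "Toaster", 50.00)],
)

# Sort by price descending and keep only the first row
row = conn.execute(
    "SELECT ProductName, Price FROM Products ORDER BY Price DESC LIMIT 1"
).fetchone()
print(row)  # ('Laptop', 1200.0)
```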

• Q.104

Question
List all distinct industries running ad campaigns.
Explanation
You need to query the AdCampaigns table and retrieve distinct industries involved in
ad campaigns. The DISTINCT keyword ensures no duplicates in the result.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE AdCampaigns (
CampaignID INT,
CompanyName VARCHAR(50),
Industry VARCHAR(50),
Budget DECIMAL(10, 2)
);

-- Datasets
INSERT INTO AdCampaigns (CampaignID, CompanyName, Industry, Budget) VALUES
(1, 'Company A', 'Retail', 5000.00),
(2, 'Company B', 'Technology', 10000.00),
(3, 'Company C', 'Education', 3000.00),
(4, 'Company D', 'Healthcare', 8000.00),
(5, 'Company E', 'Retail', 7000.00);

Learnings

• Use of DISTINCT to eliminate duplicate entries.


• Basic SELECT query to retrieve specific columns from a table.

Solutions

• - PostgreSQL solution
SELECT DISTINCT Industry
FROM AdCampaigns;

• - MySQL solution
SELECT DISTINCT Industry
FROM AdCampaigns;
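
The effect of DISTINCT is easy to observe on this dataset, where 'Retail' appears twice. A small sqlite3 sketch (the query text is identical to the PostgreSQL/MySQL solutions above):

```python
import sqlite3

# Mirror the AdCampaigns dataset; 'Retail' appears in two rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AdCampaigns (CampaignID INT, CompanyName TEXT, Industry TEXT, Budget REAL)")
conn.executemany(
    "INSERT INTO AdCampaigns VALUES (?, ?, ?, ?)",
    [(1, "Company A", "Retail", 5000.00),
     (2, "Company B", "Technology", 10000.00),
     (3, "Company C", "Education", 3000.00),
     (4, "Company D", "Healthcare", 8000.00),
     (5, "Company E", "Retail", 7000.00)],
)

# Five rows collapse to four distinct industries
industries = [r[0] for r in conn.execute("SELECT DISTINCT Industry FROM AdCampaigns")]
print(sorted(industries))  # ['Education', 'Healthcare', 'Retail', 'Technology']
```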

• Q.105

Question
Find all genres of Netflix titles.
Explanation
You need to query the NetflixTitles table and retrieve distinct genres of Netflix
titles using the DISTINCT keyword to ensure no duplicates in the result.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE NetflixTitles (
TitleID INT,
TitleName VARCHAR(50),
Genre VARCHAR(50),
ReleaseYear INT
);

-- Datasets
INSERT INTO NetflixTitles (TitleID, TitleName, Genre, ReleaseYear) VALUES
(1, 'Stranger Things', 'Sci-Fi', 2016),
(2, 'The Crown', 'Drama', 2016),
(3, 'Money Heist', 'Thriller', 2017),
(4, 'Bridgerton', 'Romance', 2020),
(5, 'Breaking Bad', 'Crime', 2008);

Learnings

• Use of DISTINCT to fetch unique values.


• Basic SELECT query to retrieve specific columns.

Solutions

• - PostgreSQL solution

SELECT DISTINCT Genre
FROM NetflixTitles;

• - MySQL solution
SELECT DISTINCT Genre
FROM NetflixTitles;

• Q.106

Question
Display distinct license types available for Microsoft software.
Explanation
You need to query the Licenses table and retrieve distinct license types for Microsoft
software. You can filter by the SoftwareName column containing Microsoft products
and ensure there are no duplicates in the license types.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Licenses (
LicenseID INT,
SoftwareName VARCHAR(50),
LicenseType VARCHAR(50),
Price DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Licenses (LicenseID, SoftwareName, LicenseType, Price) VALUES
(1, 'Microsoft Office', 'Personal', 69.99),
(2, 'Microsoft Office', 'Business', 149.99),
(3, 'Windows 11', 'Home', 119.99),
(4, 'Windows 11', 'Pro', 199.99),
(5, 'Azure', 'Enterprise', 499.99);

Learnings

• Using DISTINCT to retrieve unique values.


• Filtering data based on specific conditions in the WHERE clause.
• Basic SELECT query to pull relevant data from a table.

Solutions

• - PostgreSQL solution
SELECT DISTINCT LicenseType
FROM Licenses
WHERE SoftwareName LIKE 'Microsoft%';

• - MySQL solution
SELECT DISTINCT LicenseType
FROM Licenses
WHERE SoftwareName LIKE 'Microsoft%';
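
The trailing `%` anchors the match at the start of the string, so only names beginning with 'Microsoft' qualify; a pattern like `'%Microsoft%'` would also match the word anywhere inside the name. A quick sqlite3 check of the prefix form (same query text as above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Licenses (LicenseID INT, SoftwareName TEXT, LicenseType TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Licenses VALUES (?, ?, ?, ?)",
    [(1, "Microsoft Office", "Personal", 69.99),
     (2, "Microsoft Office", "Business", 149.99),
     (3, "Windows 11", "Home", 119.99),
     (4, "Windows 11", "Pro", 199.99),
     (5, "Azure", "Enterprise", 499.99)],
)

# Prefix match: only rows whose SoftwareName starts with 'Microsoft'
types = [r[0] for r in conn.execute(
    "SELECT DISTINCT LicenseType FROM Licenses WHERE SoftwareName LIKE 'Microsoft%'")]
print(sorted(types))  # ['Business', 'Personal']
```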

• Q.107

Question


Select the first order received time on 31st December 2024 from the Orders table.
Explanation
You need to find the earliest order time (OrderTime) for all orders made on 31st
December 2024. This assumes all data corresponds to that date, and you simply want
the first order's time.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Orders (
OrderID INT,
CustomerID INT,
OrderDate DATE,
OrderTime TIME,
Amount DECIMAL(10, 2)
);

-- Datasets (assumes all orders are for 31st December 2024)


INSERT INTO Orders (OrderID, CustomerID, OrderDate, OrderTime, Amount) VALUES
(1, 101, '2024-12-31', '09:30:00', 500.00),
(2, 102, '2024-12-31', '08:15:00', 300.00),
(3, 103, '2024-12-31', '10:00:00', 150.00),
(4, 104, '2024-12-31', '12:45:00', 200.00),
(5, 105, '2024-12-31', '11:00:00', 400.00);

Solutions

• - PostgreSQL solution
SELECT MIN(OrderTime) AS FirstOrderTime
FROM Orders
WHERE OrderDate = '2024-12-31';

• - MySQL solution
SELECT MIN(OrderTime) AS FirstOrderTime
FROM Orders
WHERE OrderDate = '2024-12-31';
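
MIN() works on TIME values just as it does on numbers, picking the earliest one. A sqlite3 sketch (SQLite stores times as 'HH:MM:SS' text, which compares correctly in this fixed format):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INT, CustomerID INT, OrderDate TEXT, OrderTime TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [(1, 101, "2024-12-31", "09:30:00", 500.00),
     (2, 102, "2024-12-31", "08:15:00", 300.00),
     (3, 103, "2024-12-31", "10:00:00", 150.00),
     (4, 104, "2024-12-31", "12:45:00", 200.00),
     (5, 105, "2024-12-31", "11:00:00", 400.00)],
)

# Filtering on OrderDate keeps the query correct even if other days are added later
first = conn.execute(
    "SELECT MIN(OrderTime) FROM Orders WHERE OrderDate = '2024-12-31'"
).fetchone()[0]
print(first)  # 08:15:00
```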

• Q.108

Question
Select the total price of all products from the Products table.
Explanation
You need to find the total price of all products in the Products table using the SUM()
function.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Products (
ProductID INT,
ProductName VARCHAR(50),
Category VARCHAR(50),
Price DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Products (ProductID, ProductName, Category, Price) VALUES
(1, 'Laptop', 'Electronics', 1200.00),
(2, 'Smartphone', 'Electronics', 800.00),
(3, 'Tablet', 'Electronics', 600.00),
(4, 'Coffee Maker', 'Appliances', 100.00),
(5, 'Toaster', 'Appliances', 50.00);

Solutions

• - PostgreSQL solution
SELECT SUM(Price) AS TotalPrice
FROM Products;

• - MySQL solution
SELECT SUM(Price) AS TotalPrice
FROM Products;

• Q.109

Question
Select the average salary of employees in the Employees table.
Explanation
You need to calculate the average salary of all employees in the Employees table
using the AVG() function.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Employees (
EmployeeID INT,
EmployeeName VARCHAR(50),
Department VARCHAR(50),
Salary DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Employees (EmployeeID, EmployeeName, Department, Salary) VALUES
(1, 'Alice', 'Engineering', 100000.00),
(2, 'Bob', 'Engineering', 95000.00),
(3, 'Charlie', 'HR', 70000.00),
(4, 'David', 'HR', 75000.00),
(5, 'Eve', 'Marketing', 60000.00);

Solutions

• - PostgreSQL solution
SELECT AVG(Salary) AS AvgSalary
FROM Employees;

• - MySQL solution
SELECT AVG(Salary) AS AvgSalary
FROM Employees;

• Q.110

Question
Select the maximum salary from the Employees table.
Explanation


You need to find the highest salary in the Employees table using the MAX() function.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Employees (
EmployeeID INT,
EmployeeName VARCHAR(50),
Department VARCHAR(50),
Salary DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Employees (EmployeeID, EmployeeName, Department, Salary) VALUES
(1, 'Alice', 'Engineering', 100000.00),
(2, 'Bob', 'Engineering', 95000.00),
(3, 'Charlie', 'HR', 70000.00),
(4, 'David', 'HR', 75000.00),
(5, 'Eve', 'Marketing', 60000.00);

Solutions

• - PostgreSQL solution
SELECT MAX(Salary) AS MaxSalary
FROM Employees;

• - MySQL solution
SELECT MAX(Salary) AS MaxSalary
FROM Employees;
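
The aggregates from Q.108–Q.110 (SUM, AVG, MAX, plus MIN) can all be computed in a single SELECT over one pass of the table. A sqlite3 sketch against the same Employees dataset:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INT, EmployeeName TEXT, Department TEXT, Salary REAL)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "Alice", "Engineering", 100000.00),
     (2, "Bob", "Engineering", 95000.00),
     (3, "Charlie", "HR", 70000.00),
     (4, "David", "HR", 75000.00),
     (5, "Eve", "Marketing", 60000.00)],
)

# Four aggregates computed together in one query
total, avg, hi, lo = conn.execute(
    "SELECT SUM(Salary), AVG(Salary), MAX(Salary), MIN(Salary) FROM Employees"
).fetchone()
print(total, avg, hi, lo)  # 400000.0 80000.0 100000.0 60000.0
```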

• COUNT
o Q.111

Question
Count the total number of orders placed in the Orders table.
Explanation
You need to count the total number of orders in the Orders table using
COUNT().
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Orders (
OrderID INT,
CustomerID INT,
OrderDate DATE,
Amount DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Orders (OrderID, CustomerID, OrderDate, Amount) VALUES
(1, 101, '2024-12-01', 50.00),
(2, 102, '2024-12-02', 30.00),
(3, 103, '2024-12-03', 70.00),
(4, 104, '2024-12-04', 100.00),
(5, 105, '2024-12-05', 25.00);

Solutions

• - PostgreSQL solution

SELECT COUNT(OrderID) AS TotalOrders
FROM Orders;

• - MySQL solution
SELECT COUNT(OrderID) AS TotalOrders
FROM Orders;

o Q.112

Question
Count the number of unique products in the Products table.
Explanation
You need to count the distinct number of products in the Products table using
COUNT(DISTINCT).

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Products (
ProductID INT,
ProductName VARCHAR(50),
Category VARCHAR(50),
Price DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Products (ProductID, ProductName, Category, Price) VALUES
(1, 'Laptop', 'Electronics', 800.00),
(2, 'Smartphone', 'Electronics', 500.00),
(3, 'Tablet', 'Electronics', 300.00),
(4, 'Smartwatch', 'Accessories', 150.00),
(5, 'Laptop', 'Electronics', 800.00);

Solutions

• - PostgreSQL solution
SELECT COUNT(DISTINCT ProductName) AS UniqueProducts
FROM Products;

• - MySQL solution
SELECT COUNT(DISTINCT ProductName) AS UniqueProducts
FROM Products;
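
The difference between COUNT(column) and COUNT(DISTINCT column) shows up directly in this dataset, where 'Laptop' is listed twice. A sqlite3 sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductID INT, ProductName TEXT, Category TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [(1, "Laptop", "Electronics", 800.00),
     (2, "Smartphone", "Electronics", 500.00),
     (3, "Tablet", "Electronics", 300.00),
     (4, "Smartwatch", "Accessories", 150.00),
     (5, "Laptop", "Electronics", 800.00)],  # duplicate product name
)

# COUNT counts every non-NULL value; COUNT(DISTINCT ...) collapses duplicates
total, unique = conn.execute(
    "SELECT COUNT(ProductName), COUNT(DISTINCT ProductName) FROM Products"
).fetchone()
print(total, unique)  # 5 4
```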

o Q.113

Question
Count the number of customers who made an order greater than $50 from the
Orders table.

Explanation
You need to count how many distinct customers placed an order with an amount greater
than 50, using COUNT(DISTINCT CustomerID) together with a WHERE condition.
Datasets and SQL Schemas


-- Table creation
CREATE TABLE Orders (
OrderID INT,
CustomerID INT,
OrderDate DATE,
Amount DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Orders (OrderID, CustomerID, OrderDate, Amount) VALUES
(1, 101, '2024-12-01', 50.00),
(2, 102, '2024-12-02', 30.00),
(3, 103, '2024-12-03', 70.00),
(4, 104, '2024-12-04', 100.00),
(5, 105, '2024-12-05', 25.00);

Solutions

• - PostgreSQL solution
SELECT COUNT(DISTINCT CustomerID) AS CustomersAbove50
FROM Orders
WHERE Amount > 50;

• - MySQL solution
SELECT COUNT(DISTINCT CustomerID) AS CustomersAbove50
FROM Orders
WHERE Amount > 50;

o Q.114

Question
Count the total number of products available in each category from the
Products table.

Explanation
You need to count the total number of products in the Products table, and
then apply COUNT() to the records, making sure the category is taken into
account.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Products (
ProductID INT,
ProductName VARCHAR(50),
Category VARCHAR(50),
Price DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Products (ProductID, ProductName, Category, Price) VALUES
(1, 'Laptop', 'Electronics', 800.00),
(2, 'Smartphone', 'Electronics', 500.00),
(3, 'Washing Machine', 'Home Appliances', 300.00),
(4, 'Refrigerator', 'Home Appliances', 600.00),
(5, 'Air Conditioner', 'Home Appliances', 700.00);

Solutions

• - PostgreSQL solution
SELECT Category, COUNT(ProductID) AS TotalProducts
FROM Products
GROUP BY Category;

• - MySQL solution
SELECT Category, COUNT(ProductID) AS TotalProducts
FROM Products
GROUP BY Category;

o Q.115

Question
Count the number of employees assigned to each project from the
EmployeeProjects table.

Explanation
You need to count how many employees are assigned to each project. Use
COUNT() to aggregate the number of employees for each project.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE EmployeeProjects (
EmployeeID INT,
ProjectName VARCHAR(50),
Department VARCHAR(50)
);

-- Datasets
INSERT INTO EmployeeProjects (EmployeeID, ProjectName, Department) VALUES
(1, 'AI Development', 'Engineering'),
(2, 'Cloud Migration', 'Engineering'),
(3, 'AI Development', 'Engineering'),
(4, 'Data Analytics', 'Analytics'),
(5, 'Cloud Migration', 'Engineering');

Learnings

• Understanding how to use the COUNT() function.


• Aggregating data using GROUP BY.
• Working with multiple columns to generate grouped counts.

Solutions

• - PostgreSQL solution
SELECT ProjectName, COUNT(EmployeeID) AS EmployeeCount
FROM EmployeeProjects
GROUP BY ProjectName;

• - MySQL solution
SELECT ProjectName, COUNT(EmployeeID) AS EmployeeCount
FROM EmployeeProjects
GROUP BY ProjectName;
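
GROUP BY produces one output row per project, and COUNT() runs once per group rather than once over the whole table. A sqlite3 sketch of the grouped count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeeProjects (EmployeeID INT, ProjectName TEXT, Department TEXT)")
conn.executemany(
    "INSERT INTO EmployeeProjects VALUES (?, ?, ?)",
    [(1, "AI Development", "Engineering"),
     (2, "Cloud Migration", "Engineering"),
     (3, "AI Development", "Engineering"),
     (4, "Data Analytics", "Analytics"),
     (5, "Cloud Migration", "Engineering")],
)

# One (project, count) pair per group
counts = dict(conn.execute(
    "SELECT ProjectName, COUNT(EmployeeID) FROM EmployeeProjects GROUP BY ProjectName"
))
print(counts)
```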

o Q.116

Question


Count the number of vehicles in each type available in the Vehicles table.
Explanation
You need to count the total number of vehicles available, based on the type, in
the Vehicles table. COUNT() is used to determine the total number of entries
for each vehicle type.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Vehicles (
VehicleID INT,
VehicleType VARCHAR(50),
Model VARCHAR(50),
Year INT
);

-- Datasets
INSERT INTO Vehicles (VehicleID, VehicleType, Model, Year) VALUES
(1, 'Car', 'Sedan', 2022),
(2, 'Car', 'SUV', 2021),
(3, 'Bike', 'Mountain', 2020),
(4, 'Car', 'Hatchback', 2023),
(5, 'Bike', 'Road', 2021);

Solutions

• - PostgreSQL solution
SELECT VehicleType, COUNT(VehicleID) AS TotalVehicles
FROM Vehicles
GROUP BY VehicleType;

• - MySQL solution
SELECT VehicleType, COUNT(VehicleID) AS TotalVehicles
FROM Vehicles
GROUP BY VehicleType;

o Q.117

Question
Count the total number of student records in the Students table.
Explanation
You need to count the total number of student records in the Students table
using COUNT().
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Students (
StudentID INT,
Name VARCHAR(50),
Grade VARCHAR(2),
EnrollmentDate DATE
);

-- Datasets
INSERT INTO Students (StudentID, Name, Grade, EnrollmentDate) VALUES
(1, 'John Doe', 'A', '2023-01-10'),
(2, 'Jane Smith', 'B', '2022-09-20'),
(3, 'Sam Brown', 'A', '2021-05-15'),
(4, 'Anna White', 'C', '2024-02-22'),
(5, 'Peter Black', 'B', '2022-11-03');


Solutions

• - PostgreSQL solution
SELECT COUNT(StudentID) AS TotalStudents
FROM Students;

• - MySQL solution
SELECT COUNT(StudentID) AS TotalStudents
FROM Students;

o Q.118

Question
Count how many rows in the Users table have a non-null date of birth (DOB).
Explanation
You need to count how many users have a non-null DOB in the Users table.
Use COUNT() to determine the number of users with a valid date of birth.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Users (
UserID INT,
UserName VARCHAR(50),
DOB DATE
);

-- Datasets
INSERT INTO Users (UserID, UserName, DOB) VALUES
(1, 'Alice', '1990-06-15'),
(2, 'Bob', NULL),
(3, 'Charlie', '1985-09-22'),
(4, 'David', NULL),
(5, 'Eve', '1992-01-11');

Solutions

• - PostgreSQL solution
SELECT COUNT(DOB) AS UsersWithDOB
FROM Users;

• - MySQL solution
SELECT COUNT(DOB) AS UsersWithDOB
FROM Users;
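
This question hinges on a detail worth memorizing: COUNT(column) silently skips NULLs, while COUNT(*) counts every row. A sqlite3 sketch showing the two side by side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserID INT, UserName TEXT, DOB TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [(1, "Alice", "1990-06-15"),
     (2, "Bob", None),       # NULL DOB
     (3, "Charlie", "1985-09-22"),
     (4, "David", None),     # NULL DOB
     (5, "Eve", "1992-01-11")],
)

# COUNT(*) counts all rows; COUNT(DOB) counts only non-NULL dates of birth
all_rows, with_dob = conn.execute("SELECT COUNT(*), COUNT(DOB) FROM Users").fetchone()
print(all_rows, with_dob)  # 5 3
```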

o Q.119

Question
Count how many records in the Books table have a publication year after
2010.
Explanation

You need to count how many books in the Books table were published after
2010. Use COUNT() together with a WHERE filter on PublicationYear.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Books (
BookID INT,
BookTitle VARCHAR(100),
Author VARCHAR(50),
PublicationYear INT
);

-- Datasets
INSERT INTO Books (BookID, BookTitle, Author, PublicationYear) VALUES
(1, 'The Great Gatsby', 'F. Scott Fitzgerald', 1925),
(2, 'The Catcher in the Rye', 'J.D. Salinger', 1951),
(3, 'Sapiens', 'Yuval Noah Harari', 2014),
(4, 'Becoming', 'Michelle Obama', 2018),
(5, 'Educated', 'Tara Westover', 2018);

Solutions

• - PostgreSQL solution
SELECT COUNT(BookID) AS BooksAfter2010
FROM Books
WHERE PublicationYear > 2010;

• - MySQL solution
SELECT COUNT(BookID) AS BooksAfter2010
FROM Books
WHERE PublicationYear > 2010;
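
The same count can also be expressed without a WHERE clause by moving the condition inside the aggregate: COUNT() skips NULLs, so a CASE expression that yields NULL for non-matching rows does the filtering. A sqlite3 sketch of this alternative (the CASE form is standard SQL and works in PostgreSQL and MySQL as well):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (BookID INT, BookTitle TEXT, Author TEXT, PublicationYear INT)")
conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?, ?)",
    [(1, "The Great Gatsby", "F. Scott Fitzgerald", 1925),
     (2, "The Catcher in the Rye", "J.D. Salinger", 1951),
     (3, "Sapiens", "Yuval Noah Harari", 2014),
     (4, "Becoming", "Michelle Obama", 2018),
     (5, "Educated", "Tara Westover", 2018)],
)

# CASE yields NULL for pre-2011 books, and COUNT() ignores NULLs
after_2010 = conn.execute(
    "SELECT COUNT(CASE WHEN PublicationYear > 2010 THEN 1 END) FROM Books"
).fetchone()[0]
print(after_2010)  # 3
```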

o Q.120

Question
Count the number of distinct cities in the Locations table.
Explanation
You need to count how many distinct cities are available in the Locations
table using COUNT(DISTINCT).
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Locations (
LocationID INT,
City VARCHAR(50),
Country VARCHAR(50),
Population INT
);

-- Datasets
INSERT INTO Locations (LocationID, City, Country, Population) VALUES
(1, 'Mumbai', 'India', 20000000),
(2, 'Delhi', 'India', 19000000),
(3, 'New York', 'USA', 8500000),
(4, 'Los Angeles', 'USA', 4000000),
(5, 'Mumbai', 'India', 20000000);

Solutions


• - PostgreSQL solution
SELECT COUNT(DISTINCT City) AS DistinctCities
FROM Locations;

• - MySQL solution
SELECT COUNT(DISTINCT City) AS DistinctCities
FROM Locations;

WHERE Clause

• Q.121

Question
Fetch the distinct customer IDs who made purchases in the last month.
Explanation
You need to retrieve the unique customer IDs from the Transactions table where the
TransactionDate falls within the last month (here, December 2024). Use date
filtering to restrict the data accordingly.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Transactions (
TransactionID INT,
CustomerID INT,
ProductID INT,
TransactionDate DATE,
Amount DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Transactions (TransactionID, CustomerID, ProductID, TransactionDate,
Amount) VALUES
(1, 101, 1, '2024-12-01', 49.99),
(2, 102, 2, '2024-12-02', 39.99),
(3, 101, 3, '2024-12-03', 79.99),
(4, 103, 4, '2024-12-04', 19.99),
(5, 104, 5, '2024-12-05', 24.99);

Learnings

• Use DISTINCT to get unique values.


• Date filtering with CURRENT_DATE or equivalent to extract records from the
last month.
• Familiarity with date manipulation functions to adjust the date range.

Solutions

• - PostgreSQL solution
SELECT DISTINCT CustomerID
FROM Transactions
WHERE TransactionDate BETWEEN '2024-12-01' AND '2024-12-31';

• - MySQL solution
SELECT DISTINCT CustomerID
FROM Transactions
WHERE TransactionDate BETWEEN '2024-12-01' AND '2024-12-31';

This solution covers purchases made in the last month, here fixed to December 2024
so that it matches the sample data; in practice you would derive the window
dynamically from CURRENT_DATE.

• Q.122

Question
Retrieve all aircraft orders where the quantity is greater than 50 from the
AircraftOrders table.
Explanation
You need to select the rows from the AircraftOrders table where the Quantity
column is greater than 50. The WHERE clause is used to apply this condition.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE AircraftOrders (
OrderID INT,
CustomerName VARCHAR(50),
AircraftModel VARCHAR(50),
Quantity INT
);

-- Datasets
INSERT INTO AircraftOrders (OrderID, CustomerName, AircraftModel, Quantity) VALUES
(1, 'Lufthansa', 'A320', 60),
(2, 'Air France', 'A380', 30),
(3, 'British Airways', 'A350', 70),
(4, 'Ryanair', 'A320', 40),
(5, 'Iberia', 'A321', 55);

Learnings

• Understanding how to apply the WHERE condition for filtering data.


• Working with numeric conditions in SQL to filter results based on a value
(e.g., Quantity > 50).

Solutions

• - PostgreSQL solution
SELECT OrderID, CustomerName, AircraftModel, Quantity
FROM AircraftOrders
WHERE Quantity > 50;

• - MySQL solution
SELECT OrderID, CustomerName, AircraftModel, Quantity
FROM AircraftOrders
WHERE Quantity > 50;


• Q.123

Question
Retrieve all cars sold in the year 2024 from the CarSales table.
Explanation
You need to select all records from the CarSales table where the SaleYear is 2024.
The WHERE clause will be used to filter the data based on the year.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE CarSales (
SaleID INT,
ModelName VARCHAR(50),
SaleYear INT,
SaleAmount DECIMAL(10, 2)
);

-- Datasets
INSERT INTO CarSales (SaleID, ModelName, SaleYear, SaleAmount) VALUES
(1, 'X5', 2024, 80000.00),
(2, 'i4', 2023, 60000.00),
(3, '3 Series', 2024, 45000.00),
(4, 'i7', 2022, 120000.00),
(5, '5 Series', 2024, 65000.00);

Learnings

• Filtering data based on specific conditions (e.g., a certain year) using the
WHERE clause.
• Working with date-based or year-based data in SQL queries.

Solutions

• - PostgreSQL solution
SELECT SaleID, ModelName, SaleYear, SaleAmount
FROM CarSales
WHERE SaleYear = 2024;

• - MySQL solution
SELECT SaleID, ModelName, SaleYear, SaleAmount
FROM CarSales
WHERE SaleYear = 2024;

• Q.124

Question
Get all distinct Tesla vehicle models ordered.
Explanation
You need to query the VehicleOrders table and retrieve distinct vehicle models that
are Tesla vehicles. No duplicates should appear in the result.
Datasets and SQL Schemas
-- Table creation

CREATE TABLE VehicleOrders (
OrderID INT,
CustomerID INT,
Model VARCHAR(50),
OrderDate DATE,
Price DECIMAL(10, 2)
);

-- Datasets
INSERT INTO VehicleOrders (OrderID, CustomerID, Model, OrderDate, Price) VALUES
(1, 201, 'Model S', '2024-11-01', 79999.99),
(2, 202, 'Model 3', '2024-11-02', 49999.99),
(3, 201, 'Model X', '2024-11-03', 89999.99),
(4, 203, 'Model 3', '2024-11-04', 49999.99),
(5, 204, 'Model Y', '2024-11-05', 69999.99);

Learnings

• Use of the DISTINCT keyword to retrieve unique values.


• Filtering or retrieving specific columns with SELECT.

Solutions

• - PostgreSQL solution
SELECT DISTINCT Model
FROM VehicleOrders
WHERE Model LIKE 'Model%';

• - MySQL solution
SELECT DISTINCT Model
FROM VehicleOrders
WHERE Model LIKE 'Model%';

• Q.125

Question
Select the total amount spent by a specific customer (CustomerID = 101) from the
Purchases table.

Explanation
You need to calculate the total amount spent by a specific customer by selecting only
the CustomerID and the sum of the Amount column. Use SUM() to calculate the total.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Purchases (
PurchaseID INT,
CustomerID INT,
Amount DECIMAL(10, 2),
PurchaseDate DATE
);

-- Datasets
INSERT INTO Purchases (PurchaseID, CustomerID, Amount, PurchaseDate) VALUES
(1, 101, 250.00, '2024-01-10'),
(2, 102, 150.00, '2024-02-20'),
(3, 101, 100.00, '2024-03-15'),
(4, 103, 200.00, '2024-04-01'),
(5, 101, 50.00, '2024-05-22');


Solutions

• - PostgreSQL solution
SELECT CustomerID, SUM(Amount) AS TotalSpent
FROM Purchases
WHERE CustomerID = 101;

• - MySQL solution
SELECT CustomerID, SUM(Amount) AS TotalSpent
FROM Purchases
WHERE CustomerID = 101;

• Q.126

Question
Retrieve all sales of Diesel where the quantity is between 1000 and 5000 liters from
the FuelSales table.
Explanation
You need to select all records from the FuelSales table where the FuelType is
'Diesel' and the QuantityLiters is between 1000 and 5000 liters. The WHERE clause
will be used to filter both conditions.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE FuelSales (
SaleID INT,
FuelType VARCHAR(50),
QuantityLiters INT,
SaleAmount DECIMAL(10, 2)
);

-- Datasets
INSERT INTO FuelSales (SaleID, FuelType, QuantityLiters, SaleAmount) VALUES
(1, 'Diesel', 1200, 2500.00),
(2, 'Petrol', 800, 2000.00),
(3, 'Diesel', 4000, 5000.00),
(4, 'Diesel', 6000, 8000.00),
(5, 'Petrol', 1500, 3000.00);

Learnings

• Combining multiple conditions in a WHERE clause to filter data based on


numeric ranges.
• Using AND to combine conditions and retrieve specific data based on multiple
criteria (fuel type and quantity).

Solutions

• - PostgreSQL solution
SELECT SaleID, FuelType, QuantityLiters, SaleAmount
FROM FuelSales
WHERE FuelType = 'Diesel' AND QuantityLiters BETWEEN 1000 AND 5000;


• - MySQL solution
SELECT SaleID, FuelType, QuantityLiters, SaleAmount
FROM FuelSales
WHERE FuelType = 'Diesel' AND QuantityLiters BETWEEN 1000 AND 5000;
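
BETWEEN is inclusive on both ends, so a sale of exactly 1000 or 5000 liters is kept, whereas `QuantityLiters > 1000 AND QuantityLiters < 5000` would drop it. A sqlite3 sketch; note the fifth row is changed from the book's dataset to a hypothetical 1000-liter Diesel sale purely to expose the boundary behavior:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FuelSales (SaleID INT, FuelType TEXT, QuantityLiters INT, SaleAmount REAL)")
conn.executemany(
    "INSERT INTO FuelSales VALUES (?, ?, ?, ?)",
    [(1, "Diesel", 1200, 2500.00),
     (2, "Petrol", 800, 2000.00),
     (3, "Diesel", 4000, 5000.00),
     (4, "Diesel", 6000, 8000.00),
     (5, "Diesel", 1000, 2100.00)],  # boundary value, added for illustration
)

ids = [r[0] for r in conn.execute(
    "SELECT SaleID FROM FuelSales "
    "WHERE FuelType = 'Diesel' AND QuantityLiters BETWEEN 1000 AND 5000 "
    "ORDER BY SaleID")]
print(ids)  # [1, 3, 5] -- the 1000-liter boundary row is included
```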

• Q.127

Question
Retrieve all products with "Table" in their name and a price less than 200 EUR from
the ProductInventory table.
Explanation
You need to select all products where the ProductName contains the word "Table"
and the Price is less than 200 EUR. The WHERE clause will be used to filter both
conditions: a substring match for the product name and a condition for the price.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE ProductInventory (
ProductID INT,
ProductName VARCHAR(50),
Price DECIMAL(10, 2),
Stock INT
);

-- Datasets
INSERT INTO ProductInventory (ProductID, ProductName, Price, Stock) VALUES
(1, 'Dining Table', 150.00, 100),
(2, 'Coffee Table', 180.00, 200),
(3, 'Office Chair', 100.00, 300),
(4, 'Bed Frame', 250.00, 150),
(5, 'Side Table', 90.00, 500);

Learnings

• Using LIKE for pattern matching in the WHERE clause to find a substring within
a column (e.g., products containing "Table").
• Combining multiple conditions in the WHERE clause (e.g., price less than 200
EUR).

Solutions

• - PostgreSQL solution
SELECT ProductID, ProductName, Price, Stock
FROM ProductInventory
WHERE ProductName LIKE '%Table%' AND Price < 200;

• - MySQL solution
SELECT ProductID, ProductName, Price, Stock
FROM ProductInventory
WHERE ProductName LIKE '%Table%' AND Price < 200;


• Q.128

Question
Retrieve all users with a Premium subscription who joined after 2022 from the
UserSubscriptions table.
Explanation
You need to select all users where the SubscriptionType is 'Premium' and the
JoinYear is greater than 2022. The WHERE clause will be used to filter these two
conditions.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE UserSubscriptions (
UserID INT,
UserName VARCHAR(50),
SubscriptionType VARCHAR(50),
JoinYear INT
);

-- Datasets
INSERT INTO UserSubscriptions (UserID, UserName, SubscriptionType, JoinYear) VALUES
(1, 'Alice', 'Premium', 2023),
(2, 'Bob', 'Free', 2022),
(3, 'Charlie', 'Premium', 2024),
(4, 'Diana', 'Free', 2021),
(5, 'Eve', 'Premium', 2021);

Learnings

• Filtering data using the WHERE clause with multiple conditions (e.g.,
SubscriptionType = 'Premium' and JoinYear > 2022).
• Working with text-based columns (e.g., SubscriptionType) and numeric
columns (e.g., JoinYear).

Solutions

• - PostgreSQL solution
SELECT UserID, UserName, SubscriptionType, JoinYear
FROM UserSubscriptions
WHERE SubscriptionType = 'Premium' AND JoinYear > 2022;

• - MySQL solution
SELECT UserID, UserName, SubscriptionType, JoinYear
FROM UserSubscriptions
WHERE SubscriptionType = 'Premium' AND JoinYear > 2022;

• Q.129

Question
Retrieve all chocolate products sold in either Germany or France with sales above 500
units from the ProductSales table.


Explanation
You need to select all records where the ProductCategory is 'Chocolate', the
Country is either 'Germany' or 'France', and the UnitsSold is greater than 500. Use
the WHERE clause to filter the conditions.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE ProductSales (
SaleID INT,
ProductCategory VARCHAR(50),
Country VARCHAR(50),
UnitsSold INT
);

-- Datasets
INSERT INTO ProductSales (SaleID, ProductCategory, Country, UnitsSold) VALUES
(1, 'Chocolate', 'Germany', 600),
(2, 'Chocolate', 'France', 700),
(3, 'Beverage', 'Germany', 400),
(4, 'Snacks', 'France', 300),
(5, 'Chocolate', 'Spain', 200);

Learnings

• Using logical operators (OR) in the WHERE clause to combine multiple


conditions.
• Filtering data based on both categorical (Country) and numeric (UnitsSold)
columns.

Solutions

• - PostgreSQL solution
SELECT SaleID, ProductCategory, Country, UnitsSold
FROM ProductSales
WHERE ProductCategory = 'Chocolate' AND (Country = 'Germany' OR Country =
'France') AND UnitsSold > 500;

• - MySQL solution
SELECT SaleID, ProductCategory, Country, UnitsSold
FROM ProductSales
WHERE ProductCategory = 'Chocolate' AND (Country = 'Germany' OR Country =
'France') AND UnitsSold > 500;
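
The parentheses matter: AND binds tighter than OR, so without them the condition parses as `(Chocolate AND Germany) OR (France AND UnitsSold > 500)`. A sqlite3 sketch contrasting the two forms; a sixth, hypothetical row ('Beverage', 'France', 900) is added to the book's dataset to make the difference visible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductSales (SaleID INT, ProductCategory TEXT, Country TEXT, UnitsSold INT)")
conn.executemany(
    "INSERT INTO ProductSales VALUES (?, ?, ?, ?)",
    [(1, "Chocolate", "Germany", 600),
     (2, "Chocolate", "France", 700),
     (3, "Beverage", "Germany", 400),
     (4, "Snacks", "France", 300),
     (5, "Chocolate", "Spain", 200),
     (6, "Beverage", "France", 900)],  # extra row added to expose the precedence difference
)

with_parens = [r[0] for r in conn.execute(
    "SELECT SaleID FROM ProductSales WHERE ProductCategory = 'Chocolate' "
    "AND (Country = 'Germany' OR Country = 'France') AND UnitsSold > 500 "
    "ORDER BY SaleID")]
without_parens = [r[0] for r in conn.execute(
    "SELECT SaleID FROM ProductSales WHERE ProductCategory = 'Chocolate' "
    "AND Country = 'Germany' OR Country = 'France' AND UnitsSold > 500 "
    "ORDER BY SaleID")]
print(with_parens)     # [1, 2]
print(without_parens)  # [1, 2, 6] -- the French beverage slips in
```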

• Q.130

Question
Retrieve all users whose average daily data usage exceeds 2GB but is below 5GB
from the NetworkUsage table.
Explanation
You need to select all records where the AverageDailyUsageGB is greater than 2 and
less than 5. Use the WHERE clause to filter this numeric condition.


Datasets and SQL Schemas


-- Table creation
CREATE TABLE NetworkUsage (
UsageID INT,
UserID INT,
AverageDailyUsageGB DECIMAL(5, 2)
);

-- Datasets
INSERT INTO NetworkUsage (UsageID, UserID, AverageDailyUsageGB) VALUES
(1, 101, 2.5),
(2, 102, 1.8),
(3, 103, 4.7),
(4, 104, 5.5),
(5, 105, 3.2);

Learnings

• Filtering numeric data using comparison operators (>, <) to find values within
a specific range.
• Working with DECIMAL data type and performing range-based filtering.

Solutions

• - PostgreSQL solution
SELECT UsageID, UserID, AverageDailyUsageGB
FROM NetworkUsage
WHERE AverageDailyUsageGB > 2 AND AverageDailyUsageGB < 5;

• - MySQL solution
SELECT UsageID, UserID, AverageDailyUsageGB
FROM NetworkUsage
WHERE AverageDailyUsageGB > 2 AND AverageDailyUsageGB < 5;

GROUP BY

• Q.131

Question
Retrieve the distinct product names along with their corresponding average price from
each manufacturer.
Explanation
You need to select the product names and calculate the average price for each product
from different manufacturers. Use GROUP BY for the manufacturer and AVG() for the
average price.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Products (
ProductID INT,
ProductName VARCHAR(50),
Manufacturer VARCHAR(50),
Price DECIMAL(10, 2)
);


-- Datasets
INSERT INTO Products (ProductID, ProductName, Manufacturer, Price) VALUES
(1, 'Laptop', 'Company A', 1200.00),
(2, 'Smartphone', 'Company B', 800.00),
(3, 'Tablet', 'Company A', 600.00),
(4, 'Smartwatch', 'Company B', 200.00),
(5, 'Laptop', 'Company A', 1100.00);

Solutions

• - PostgreSQL solution
SELECT ProductName, Manufacturer, AVG(Price) AS AveragePrice
FROM Products
GROUP BY ProductName, Manufacturer;

• - MySQL solution
SELECT ProductName, Manufacturer, AVG(Price) AS AveragePrice
FROM Products
GROUP BY ProductName, Manufacturer;

• Q.132

Question
Select the customer names along with their corresponding order IDs and the total
amount spent for each order.
Explanation
You need to select the customer name, order ID, and the total amount for each order
by grouping data based on CustomerID and OrderID. Use aggregation to calculate the
total amount spent for each order.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Orders (
OrderID INT,
CustomerID INT,
CustomerName VARCHAR(50),
Amount DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Orders (OrderID, CustomerID, CustomerName, Amount) VALUES
(1, 101, 'Alice', 250.00),
(2, 102, 'Bob', 150.00),
(3, 101, 'Alice', 100.00),
(4, 103, 'Charlie', 200.00),
(5, 102, 'Bob', 50.00);

Solutions

• - PostgreSQL solution
SELECT CustomerName, OrderID, SUM(Amount) AS TotalAmount
FROM Orders
GROUP BY CustomerName, OrderID;

• - MySQL solution
SELECT CustomerName, OrderID, SUM(Amount) AS TotalAmount
FROM Orders
GROUP BY CustomerName, OrderID;

• Q.133

Question
Fetch all distinct ride types offered by Uber, along with the average price for each ride
type.
Explanation
You need to retrieve distinct ride types from the Rides table. Additionally, for each
ride type, calculate the average price. Use the GROUP BY clause to group the data by
RideType and AVG() to calculate the average price.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Rides (
RideID INT,
CustomerID INT,
RideType VARCHAR(50),
Distance DECIMAL(5, 2),
Price DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Rides (RideID, CustomerID, RideType, Distance, Price) VALUES
(1, 401, 'UberX', 5.2, 12.50),
(2, 402, 'UberXL', 10.5, 25.00),
(3, 403, 'UberX', 7.8, 15.00),
(4, 404, 'Uber Black', 3.2, 20.00),
(5, 405, 'UberXL', 8.1, 30.00);

Learnings

• Use of DISTINCT to get unique values.


• Grouping data using GROUP BY and performing aggregate functions like
AVG().
• Filtering or sorting data based on calculated fields.

Solutions

• - PostgreSQL solution
SELECT RideType, AVG(Price) AS AveragePrice
FROM Rides
GROUP BY RideType;

• - MySQL solution
SELECT RideType, AVG(Price) AS AveragePrice
FROM Rides
GROUP BY RideType;

• Q.134

Question


Select the average salary per department and the department with the highest average
salary.
Explanation
You need to calculate the average salary for each department, then return the
department with the highest average. Group by Department, compute AVG(Salary),
sort the averages in descending order, and keep the top row with LIMIT 1.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Employees (
EmployeeID INT,
EmployeeName VARCHAR(50),
Department VARCHAR(50),
Salary DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Employees (EmployeeID, EmployeeName, Department, Salary) VALUES
(1, 'Alice', 'Engineering', 100000.00),
(2, 'Bob', 'Engineering', 95000.00),
(3, 'Charlie', 'HR', 70000.00),
(4, 'David', 'HR', 75000.00),
(5, 'Eve', 'Marketing', 60000.00);

Solutions

• - PostgreSQL solution
SELECT Department, AVG(Salary) AS AvgSalary
FROM Employees
GROUP BY Department
ORDER BY AvgSalary DESC
LIMIT 1;

• - MySQL solution
SELECT Department, AVG(Salary) AS AvgSalary
FROM Employees
GROUP BY Department
ORDER BY AvgSalary DESC
LIMIT 1;
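
Ordering by an aggregate and taking the first row is a common pattern for "top group" questions. A sqlite3 sketch against the same Employees dataset:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INT, EmployeeName TEXT, Department TEXT, Salary REAL)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "Alice", "Engineering", 100000.00),
     (2, "Bob", "Engineering", 95000.00),
     (3, "Charlie", "HR", 70000.00),
     (4, "David", "HR", 75000.00),
     (5, "Eve", "Marketing", 60000.00)],
)

# Group, aggregate, sort by the aggregate, keep the top row
top = conn.execute(
    "SELECT Department, AVG(Salary) FROM Employees "
    "GROUP BY Department ORDER BY AVG(Salary) DESC LIMIT 1"
).fetchone()
print(top)  # ('Engineering', 97500.0)
```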

Question 2
Find the highest and lowest priced products from each category.
Explanation
You need to find both the maximum (MAX()) and minimum (MIN()) price for products
within each category. This requires grouping the products by their category.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Products (
ProductID INT,
ProductName VARCHAR(50),
Category VARCHAR(50),
Price DECIMAL(10, 2)
);

-- Datasets

INSERT INTO Products (ProductID, ProductName, Category, Price) VALUES
(1, 'Laptop', 'Electronics', 1200.00),
(2, 'Smartphone', 'Electronics', 800.00),
(3, 'Tablet', 'Electronics', 600.00),
(4, 'Coffee Maker', 'Appliances', 100.00),
(5, 'Toaster', 'Appliances', 50.00);

Solutions

• - PostgreSQL solution
SELECT Category, MAX(Price) AS MaxPrice, MIN(Price) AS MinPrice
FROM Products
GROUP BY Category;

• - MySQL solution
SELECT Category, MAX(Price) AS MaxPrice, MIN(Price) AS MinPrice
FROM Products
GROUP BY Category;

Question 3
Select the total amount spent by each customer, along with the date of their first and
last purchase.
Explanation
For each customer, calculate the total amount spent (use SUM()), and also find the date
of their first and last purchase using MIN() and MAX() respectively. Group the result
by customer.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Purchases (
PurchaseID INT,
CustomerID INT,
CustomerName VARCHAR(50),
Amount DECIMAL(10, 2),
PurchaseDate DATE
);

-- Datasets
INSERT INTO Purchases (PurchaseID, CustomerID, CustomerName, Amount, PurchaseDate
) VALUES
(1, 101, 'Alice', 250.00, '2024-01-10'),
(2, 102, 'Bob', 150.00, '2024-02-20'),
(3, 101, 'Alice', 100.00, '2024-03-15'),
(4, 103, 'Charlie', 200.00, '2024-04-01'),
(5, 102, 'Bob', 50.00, '2024-05-22');

Solutions

• - PostgreSQL solution
SELECT CustomerName, SUM(Amount) AS TotalSpent, MIN(PurchaseDate) AS FirstPurchase, MAX(PurchaseDate) AS LastPurchase
FROM Purchases
GROUP BY CustomerName;

• - MySQL solution

SELECT CustomerName, SUM(Amount) AS TotalSpent, MIN(PurchaseDate) AS FirstPurchase, MAX(PurchaseDate) AS LastPurchase
FROM Purchases
GROUP BY CustomerName;

These questions focus on SELECT statements that combine aggregate functions such
as AVG(), SUM(), MAX(), and MIN(), handle dates, and involve multiple columns.

• Q.135

Question
Find the number of missions launched by each country and filter those with more than
2 missions from the SpaceMissions table.
Explanation
You need to calculate the number of missions launched by each country. After
counting the missions per country, filter the results to show only countries with more
than 2 missions. This can be achieved by using aggregation and a HAVING clause to
filter based on the count.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE SpaceMissions (
MissionID INT,
MissionName VARCHAR(50),
LaunchCountry VARCHAR(50),
LaunchDate DATE
);

-- Datasets
INSERT INTO SpaceMissions (MissionID, MissionName, LaunchCountry, LaunchDate) VALUES
(1, 'Apollo 11', 'USA', '1969-07-16'),
(2, 'Luna 2', 'USSR', '1959-09-12'),
(3, 'Voyager 1', 'USA', '1977-09-05'),
(4, 'Mars Rover', 'USA', '2003-06-10'),
(5, 'Venera 7', 'USSR', '1970-08-17');

Learnings

• Using the COUNT() function to aggregate the number of records (missions) per
group (country).
• Filtering the aggregated results using the HAVING clause to display only groups
that meet the condition (missions > 2).

Solutions

• - PostgreSQL solution
SELECT LaunchCountry, COUNT(MissionID) AS MissionCount
FROM SpaceMissions
GROUP BY LaunchCountry
HAVING COUNT(MissionID) > 2;

• - MySQL solution

SELECT LaunchCountry, COUNT(MissionID) AS MissionCount
FROM SpaceMissions
GROUP BY LaunchCountry
HAVING COUNT(MissionID) > 2;

• Q.136

Question
List the number of aircraft delivered for each model and filter for models with total
deliveries exceeding 50 from the AircraftDeliveries table.
Explanation
You need to calculate the total number of deliveries for each aircraft model. After
calculating the sum, filter the results to show only models with total deliveries greater
than 50. This can be done using aggregation and the HAVING clause to filter based on
the summed units.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE AircraftDeliveries (
DeliveryID INT,
Model VARCHAR(50),
UnitsDelivered INT
);

-- Datasets
INSERT INTO AircraftDeliveries (DeliveryID, Model, UnitsDelivered) VALUES
(1, 'A320', 30),
(2, 'A380', 20),
(3, 'A350', 60),
(4, 'A320', 40),
(5, 'A321', 70);

Learnings

• Using the SUM() function to aggregate the total units delivered for each model.
• Filtering aggregated results using the HAVING clause to only display models
with total deliveries exceeding 50.

Solutions

• - PostgreSQL solution
SELECT Model, SUM(UnitsDelivered) AS TotalDeliveries
FROM AircraftDeliveries
GROUP BY Model
HAVING SUM(UnitsDelivered) > 50;

• - MySQL solution
SELECT Model, SUM(UnitsDelivered) AS TotalDeliveries
FROM AircraftDeliveries
GROUP BY Model
HAVING SUM(UnitsDelivered) > 50;

• Q.137

Question

Find the total quantity of fuel sold for each type and filter fuel types with total sales
exceeding 5000 liters from the FuelTypes table.
Explanation
You need to calculate the total quantity of fuel sold for each fuel type. After
calculating the total, filter the results to show only fuel types where the total quantity
sold exceeds 5000 liters. This can be achieved by using aggregation and the HAVING
clause to filter based on the summed quantity.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE FuelTypes (
SaleID INT,
FuelType VARCHAR(50),
QuantityLiters INT
);

-- Datasets with additional records


INSERT INTO FuelTypes (SaleID, FuelType, QuantityLiters) VALUES
(1, 'Diesel', 2000),
(2, 'Petrol', 1800),
(3, 'Diesel', 3500),
(4, 'Petrol', 4000),
(5, 'Gasoline', 1000),
(6, 'Diesel', 4500),
(7, 'Petrol', 2200),
(8, 'Diesel', 6000),
(9, 'Gasoline', 2000),
(10, 'Diesel', 3000),
(11, 'Petrol', 5000),
(12, 'Gasoline', 1500);

Learnings

• Using the SUM() function to aggregate the total quantity of fuel sold for each
fuel type.
• Filtering aggregated results using the HAVING clause to only display fuel types
with sales exceeding 5000 liters.

Solutions

• - PostgreSQL solution
SELECT FuelType, SUM(QuantityLiters) AS TotalQuantitySold
FROM FuelTypes
GROUP BY FuelType
HAVING SUM(QuantityLiters) > 5000;

• - MySQL solution
SELECT FuelType, SUM(QuantityLiters) AS TotalQuantitySold
FROM FuelTypes
GROUP BY FuelType
HAVING SUM(QuantityLiters) > 5000;

• Q.138

Question

Find the total number of streams for each artist and filter for artists with more than
1000 streams from the MusicStreams table.
Explanation
You need to calculate the total number of streams for each artist. After calculating the
total, filter the results to show only those artists who have more than 1000 streams.
This can be achieved by using aggregation and the HAVING clause to filter based on
the summed streams.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE MusicStreams (
StreamID INT,
ArtistName VARCHAR(50),
Streams INT
);

-- Datasets with additional records


INSERT INTO MusicStreams (StreamID, ArtistName, Streams) VALUES
(1, 'Taylor Swift', 1200),
(2, 'Ed Sheeran', 800),
(3, 'Drake', 2000),
(4, 'BTS', 1500),
(5, 'Adele', 700),
(6, 'Billie Eilish', 1800),
(7, 'Justin Bieber', 2200),
(8, 'Ariana Grande', 2500),
(9, 'Post Malone', 3200),
(10, 'Lil Nas X', 1100),
(11, 'Shawn Mendes', 1200),
(12, 'The Weeknd', 500);

Learnings

• Using the SUM() function to aggregate the total number of streams for each
artist.
• Filtering aggregated results using the HAVING clause to only display artists
with more than 1000 streams.

Solutions

• - PostgreSQL solution
SELECT ArtistName, SUM(Streams) AS TotalStreams
FROM MusicStreams
GROUP BY ArtistName
HAVING SUM(Streams) > 1000;

• - MySQL solution
SELECT ArtistName, SUM(Streams) AS TotalStreams
FROM MusicStreams
GROUP BY ArtistName
HAVING SUM(Streams) > 1000;

• Q.139

Question

Count the distinct products sold for each category and filter categories with more than
2 distinct products from the ProductCategories table.
Explanation
You need to count the number of distinct products sold within each category. After
counting, filter the categories to include only those with more than 2 distinct products.
This can be done using the COUNT(DISTINCT ProductName) to count distinct
products and the HAVING clause to filter the categories.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE ProductCategories (
ProductID INT,
CategoryName VARCHAR(50),
ProductName VARCHAR(50)
);

-- Datasets with more product entries


INSERT INTO ProductCategories (ProductID, CategoryName, ProductName) VALUES
(1, 'Chocolate', 'KitKat'),
(2, 'Beverages', 'Nescafe'),
(3, 'Chocolate', 'Munch'),
(4, 'Beverages', 'Nestea'),
(5, 'Chocolate', 'MilkyBar'),
(6, 'Chocolate', 'DairyMilk'),
(7, 'Beverages', 'Lipton'),
(8, 'Beverages', 'CocaCola'),
(9, 'Snacks', 'Lays'),
(10, 'Snacks', 'Doritos');

Learnings

• Using COUNT(DISTINCT) to count unique products in each category.


• Filtering the results with HAVING to only show categories with more than 2
distinct products.

Solutions

• - PostgreSQL solution
SELECT CategoryName, COUNT(DISTINCT ProductName) AS DistinctProductCount
FROM ProductCategories
GROUP BY CategoryName
HAVING COUNT(DISTINCT ProductName) > 2;

• - MySQL solution
SELECT CategoryName, COUNT(DISTINCT ProductName) AS DistinctProductCount
FROM ProductCategories
GROUP BY CategoryName
HAVING COUNT(DISTINCT ProductName) > 2;

• Q.140

Question
Calculate the total data usage for each plan type and filter plans with data usage
exceeding 50 GB from the NetworkUsage table.
Explanation

You need to calculate the total data usage for each PlanType. After calculating the
total, filter the results to show only the plan types with total data usage exceeding 50
GB. This can be done using the SUM() function to aggregate the data usage and the
HAVING clause to filter the results.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE NetworkUsage (
UsageID INT,
PlanType VARCHAR(50),
DataUsageGB DECIMAL(5, 2)
);

-- Datasets with more data entries


INSERT INTO NetworkUsage (UsageID, PlanType, DataUsageGB) VALUES
(1, 'Unlimited', 25.5),
(2, 'Family', 30.0),
(3, 'Unlimited', 35.0),
(4, 'Prepaid', 20.5),
(5, 'Family', 25.0),
(6, 'Unlimited', 15.0),
(7, 'Prepaid', 10.0),
(8, 'Family', 10.0),
(9, 'Unlimited', 40.0),
(10, 'Family', 22.0),
(11, 'Unlimited', 50.0),
(12, 'Prepaid', 40.0),
(13, 'Family', 55.0),
(14, 'Unlimited', 60.0),
(15, 'Prepaid', 30.0),
(16, 'Family', 45.0),
(17, 'Unlimited', 70.0),
(18, 'Prepaid', 50.0),
(19, 'Family', 35.0),
(20, 'Unlimited', 100.0);

Learnings

• Using SUM() to calculate the total of a column (DataUsageGB).


• Filtering the result using HAVING to limit to plan types with more than 50 GB
of total data usage.

Solutions

• - PostgreSQL solution
SELECT PlanType, SUM(DataUsageGB) AS TotalDataUsage
FROM NetworkUsage
GROUP BY PlanType
HAVING SUM(DataUsageGB) > 50;

• - MySQL solution
SELECT PlanType, SUM(DataUsageGB) AS TotalDataUsage
FROM NetworkUsage
GROUP BY PlanType
HAVING SUM(DataUsageGB) > 50;

GROUP BY + HAVING
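
Before the next set of questions, it helps to keep the evaluation order straight: WHERE filters individual rows before grouping, while HAVING filters the groups produced by GROUP BY. A minimal sketch using the CustomerOrders table defined in Q.141 (the WHERE condition here is purely illustrative):

```sql
-- WHERE runs first (row-level filter), HAVING runs on the grouped totals
SELECT CustomerName, SUM(OrderAmount) AS TotalSpent
FROM CustomerOrders
WHERE OrderAmount > 0           -- applied to individual rows, before grouping
GROUP BY CustomerName
HAVING SUM(OrderAmount) > 500;  -- applied to each group, after aggregation
```

A handy rule of thumb: any condition that mentions an aggregate (SUM, COUNT, AVG, ...) must go in HAVING; everything else usually belongs in WHERE, where it can also prune rows earlier and cheaper.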

• Q.141

Question
Find the total order amount for each customer and filter customers who spent more
than 500 EUR.
Explanation
Calculate the total order amount for each customer and then filter the customers
whose total order amount exceeds 500 EUR.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE CustomerOrders (
OrderID INT,
CustomerName VARCHAR(50),
OrderAmount DECIMAL(10, 2)
);

• - Datasets
INSERT INTO CustomerOrders (OrderID, CustomerName, OrderAmount) VALUES
(1, 'Alice', 150.00),
(2, 'Bob', 400.00),
(3, 'Charlie', 550.00),
(4, 'Diana', 200.00),
(5, 'Eve', 300.00),
(6, 'Frank', 600.00),
(7, 'Grace', 700.00),
(8, 'Hannah', 800.00),
(9, 'Ivan', 450.00),
(10, 'Jack', 250.00),
(11, 'Kathy', 650.00),
(12, 'Leo', 100.00),
(13, 'Mia', 1200.00),
(14, 'Nina', 350.00),
(15, 'Oscar', 800.00);

Learnings

• Aggregation (SUM)
• Grouping data (GROUP BY)
• Filtering using HAVING clause

Solutions

• - PostgreSQL solution
SELECT CustomerName, SUM(OrderAmount) AS TotalSpent
FROM CustomerOrders
GROUP BY CustomerName
HAVING SUM(OrderAmount) > 500;

• - MySQL solution
SELECT CustomerName, SUM(OrderAmount) AS TotalSpent
FROM CustomerOrders
GROUP BY CustomerName
HAVING SUM(OrderAmount) > 500;

• Q.142

Question
Count the number of returns for each category and filter categories with more than 3
returns.
Explanation
Count the number of returns for each product category and then filter out categories
that have 3 or fewer returns.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE ProductReturns (
ReturnID INT,
CategoryName VARCHAR(50),
ReturnReason VARCHAR(100)
);

• - Datasets
INSERT INTO ProductReturns (ReturnID, CategoryName, ReturnReason) VALUES
(1, 'Shoes', 'Size issue'),
(2, 'Apparel', 'Defect'),
(3, 'Shoes', 'Damaged'),
(4, 'Apparel', 'Color mismatch'),
(5, 'Shoes', 'Wrong size'),
(6, 'Shoes', 'Quality issue'),
(7, 'Apparel', 'Defect'),
(8, 'Shoes', 'Size issue'),
(9, 'Accessories', 'Color mismatch'),
(10, 'Shoes', 'Wrong size'),
(11, 'Shoes', 'Damaged'),
(12, 'Apparel', 'Size issue'),
(13, 'Accessories', 'Defect'),
(14, 'Shoes', 'Quality issue'),
(15, 'Apparel', 'Fit issue'),
(16, 'Shoes', 'Wrong size'),
(17, 'Accessories', 'Color mismatch'),
(18, 'Shoes', 'Defect'),
(19, 'Apparel', 'Defect'),
(20, 'Shoes', 'Size issue');

Learnings

• Counting rows with COUNT()


• Grouping data with GROUP BY
• Filtering groups with HAVING clause

Solutions

• - PostgreSQL solution
SELECT CategoryName, COUNT(ReturnID) AS ReturnCount
FROM ProductReturns
GROUP BY CategoryName
HAVING COUNT(ReturnID) > 3;

• - MySQL solution
SELECT CategoryName, COUNT(ReturnID) AS ReturnCount
FROM ProductReturns
GROUP BY CategoryName
HAVING COUNT(ReturnID) > 3;

• Q.143

Question
Find the total absences for each department and filter departments with total absences
greater than 20.
Explanation
Sum the absences for each department and filter the results to include only
departments with a total absence count greater than 20.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE EmployeeAbsences (
AbsenceID INT,
Department VARCHAR(50),
Absences INT
);

• - Datasets
INSERT INTO EmployeeAbsences (AbsenceID, Department, Absences) VALUES
(1, 'Engineering', 12),
(2, 'HR', 5),
(3, 'Marketing', 10),
(4, 'Engineering', 15),
(5, 'HR', 8),
(6, 'Engineering', 10),
(7, 'Marketing', 7),
(8, 'HR', 4),
(9, 'Engineering', 20),
(10, 'Sales', 12),
(11, 'Engineering', 5),
(12, 'Marketing', 15),
(13, 'Sales', 5),
(14, 'HR', 6),
(15, 'Sales', 8),
(16, 'Marketing', 9),
(17, 'Engineering', 13),
(18, 'Sales', 10),
(19, 'HR', 3),
(20, 'Marketing', 10);

Learnings

• Aggregation with SUM()


• Grouping data with GROUP BY
• Filtering with HAVING clause

Solutions

• - PostgreSQL solution
SELECT Department, SUM(Absences) AS TotalAbsences
FROM EmployeeAbsences
GROUP BY Department
HAVING SUM(Absences) > 20;

• - MySQL solution
SELECT Department, SUM(Absences) AS TotalAbsences
FROM EmployeeAbsences
GROUP BY Department
HAVING SUM(Absences) > 20;

• Q.144

Question
Find the number of books borrowed by each member and filter members who
borrowed more than 10 books in total.
Explanation
Sum the number of books borrowed by each member, then filter to only include
members who borrowed more than 10 books.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE BookBorrowing (
BorrowID INT,
MemberID INT,
BookTitle VARCHAR(100),
BorrowCount INT
);

• - Datasets
INSERT INTO BookBorrowing (BorrowID, MemberID, BookTitle, BorrowCount) VALUES
(1, 101, 'Introduction to SQL', 3),
(2, 102, 'Advanced SQL', 5),
(3, 103, 'Database Design', 4),
(4, 101, 'Data Structures', 2),
(5, 102, 'Algorithms', 6),
(6, 104, 'Operating Systems', 1),
(7, 105, 'Computer Networks', 7),
(8, 101, 'Machine Learning', 3),
(9, 103, 'Web Development', 6),
(10, 105, 'Cloud Computing', 5);

Learnings

• Using SUM() for aggregation


• Filtering aggregated results with HAVING
• Grouping with GROUP BY

Solutions

• - PostgreSQL solution
SELECT MemberID, SUM(BorrowCount) AS TotalBooksBorrowed
FROM BookBorrowing
GROUP BY MemberID
HAVING SUM(BorrowCount) > 10;

• - MySQL solution
SELECT MemberID, SUM(BorrowCount) AS TotalBooksBorrowed
FROM BookBorrowing
GROUP BY MemberID
HAVING SUM(BorrowCount) > 10;

• Q.145

Question
Find the total number of patient visits by department and filter departments with more
than 30 visits.
Explanation
Sum the total number of visits for each department and filter out departments with 30
or fewer visits.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE PatientVisits (
VisitID INT,
Department VARCHAR(50),
PatientID INT,
VisitCount INT
);

• - Datasets
INSERT INTO PatientVisits (VisitID, Department, PatientID, VisitCount) VALUES
(1, 'Cardiology', 201, 3),
(2, 'Neurology', 202, 4),
(3, 'Orthopedics', 203, 5),
(4, 'Cardiology', 204, 6),
(5, 'Neurology', 205, 7),
(6, 'Orthopedics', 206, 2),
(7, 'Pediatrics', 207, 8),
(8, 'Cardiology', 208, 4),
(9, 'Orthopedics', 209, 5),
(10, 'Pediatrics', 210, 5),
(11, 'Neurology', 211, 6),
(12, 'Orthopedics', 212, 4),
(13, 'Cardiology', 213, 3),
(14, 'Pediatrics', 214, 9),
(15, 'Orthopedics', 215, 6);

Learnings

• Using SUM() with GROUP BY


• Filtering results with HAVING
• Counting and grouping data by department

Solutions

• - PostgreSQL solution
SELECT Department, SUM(VisitCount) AS TotalVisits
FROM PatientVisits
GROUP BY Department
HAVING SUM(VisitCount) > 30;

• - MySQL solution
SELECT Department, SUM(VisitCount) AS TotalVisits
FROM PatientVisits
GROUP BY Department
HAVING SUM(VisitCount) > 30;

• Q.146

Question
Find the number of students enrolled in each course and filter out courses with fewer
than 5 students.
Explanation
Count the number of students in each course and keep only courses with at least
5 students enrolled, filtering out those with fewer than 5.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE CourseEnrollments (
EnrollmentID INT,
CourseName VARCHAR(100),
StudentID INT
);

• - Datasets
INSERT INTO CourseEnrollments (EnrollmentID, CourseName, StudentID) VALUES
(1, 'Data Science', 301),
(2, 'Machine Learning', 302),
(3, 'Data Science', 303),
(4, 'Artificial Intelligence', 304),
(5, 'Data Science', 305),
(6, 'Web Development', 306),
(7, 'Machine Learning', 307),
(8, 'Web Development', 308),
(9, 'Artificial Intelligence', 309),
(10, 'Data Science', 310),
(11, 'Machine Learning', 311),
(12, 'Web Development', 312),
(13, 'Artificial Intelligence', 313),
(14, 'Data Science', 314),
(15, 'Machine Learning', 315);

Learnings

• Counting rows with COUNT()


• Using GROUP BY with aggregate functions
• Filtering groups with HAVING

Solutions

• - PostgreSQL solution

SELECT CourseName, COUNT(StudentID) AS TotalStudents
FROM CourseEnrollments
GROUP BY CourseName
HAVING COUNT(StudentID) >= 5;

• - MySQL solution
SELECT CourseName, COUNT(StudentID) AS TotalStudents
FROM CourseEnrollments
GROUP BY CourseName
HAVING COUNT(StudentID) >= 5;

• Q.147

Question
Find the top 5 customers who spent the most on products in each product category.
Include only customers who have spent more than 1000 EUR in a single category and
sort by the total amount spent in descending order.
Explanation
For each product category, sum the spending for each customer. Filter to only include
customers with spending greater than 1000 EUR, and return the top 5 customers in
each category based on their spending.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE ProductSales (
SaleID INT,
ProductID INT,
ProductName VARCHAR(100),
Category VARCHAR(50),
CustomerID INT,
SaleAmount DECIMAL(10, 2),
SaleDate DATE
);

• - Datasets
INSERT INTO ProductSales (SaleID, ProductID, ProductName, Category, CustomerID, SaleAmount, SaleDate) VALUES
(1, 101, 'Laptop', 'Electronics', 201, 1200.00, '2023-01-01'),
(2, 102, 'Phone', 'Electronics', 202, 800.00, '2023-01-10'),
(3, 103, 'Shoes', 'Apparel', 201, 150.00, '2023-02-15'),
(4, 104, 'Jacket', 'Apparel', 203, 350.00, '2023-02-20'),
(5, 105, 'Headphones', 'Electronics', 204, 300.00, '2023-03-05'),
(6, 106, 'Smartwatch', 'Electronics', 201, 900.00, '2023-03-10'),
(7, 107, 'Shirt', 'Apparel', 202, 100.00, '2023-04-01'),
(8, 108, 'Laptop', 'Electronics', 203, 1300.00, '2023-04-05'),
(9, 109, 'Phone', 'Electronics', 205, 500.00, '2023-04-10'),
(10, 110, 'Shoes', 'Apparel', 206, 200.00, '2023-05-15'),
(11, 111, 'Tablet', 'Electronics', 207, 1100.00, '2023-05-25'),
(12, 112, 'Shirt', 'Apparel', 204, 250.00, '2023-06-05'),
(13, 113, 'Smartwatch', 'Electronics', 201, 1100.00, '2023-06-20');

Learnings

• Aggregation with SUM()


• Filtering with HAVING

• Ranking (ROW_NUMBER for top customers in each category)


• Grouping by multiple fields (category and customer)

Solutions

• - PostgreSQL solution
WITH RankedCustomers AS (
    SELECT Category, CustomerID, SUM(SaleAmount) AS TotalSpent,
           ROW_NUMBER() OVER (PARTITION BY Category ORDER BY SUM(SaleAmount) DESC) AS rnk
    FROM ProductSales
    GROUP BY Category, CustomerID
    HAVING SUM(SaleAmount) > 1000
)
SELECT Category, CustomerID, TotalSpent
FROM RankedCustomers
WHERE rnk <= 5;

• - MySQL solution
WITH RankedCustomers AS (
    SELECT Category, CustomerID, SUM(SaleAmount) AS TotalSpent,
           -- RANK is a reserved word in MySQL 8.0, so use a neutral alias
           ROW_NUMBER() OVER (PARTITION BY Category ORDER BY SUM(SaleAmount) DESC) AS rnk
    FROM ProductSales
    GROUP BY Category, CustomerID
    HAVING SUM(SaleAmount) > 1000
)
SELECT Category, CustomerID, TotalSpent
FROM RankedCustomers
WHERE rnk <= 5;
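
ROW_NUMBER() breaks ties arbitrarily, so two customers with identical spending may not both make the top 5. If tied customers should all be kept, DENSE_RANK() is a drop-in alternative; this sketch assumes the same ProductSales table:

```sql
-- DENSE_RANK keeps every customer tied at a given spending level
WITH RankedCustomers AS (
    SELECT Category, CustomerID, SUM(SaleAmount) AS TotalSpent,
           DENSE_RANK() OVER (PARTITION BY Category ORDER BY SUM(SaleAmount) DESC) AS rnk
    FROM ProductSales
    GROUP BY Category, CustomerID
    HAVING SUM(SaleAmount) > 1000
)
SELECT Category, CustomerID, TotalSpent
FROM RankedCustomers
WHERE rnk <= 5;
```

The trade-off: with DENSE_RANK the query can return more than 5 rows per category when ties occur, which is often what "top 5" actually means in an interview follow-up.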

• Q.148

Question
Identify the products that have more than 10 returns and have an average return
amount greater than 100 EUR. Only include products where the total sales amount is
greater than 5000 EUR.
Explanation
For each product, calculate the total sales amount and the total number of returns.
Filter products with more than 10 returns, an average return amount greater than 100
EUR, and total sales greater than 5000 EUR.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE SalesReturns (
ReturnID INT,
ProductID INT,
ProductName VARCHAR(100),
SaleAmount DECIMAL(10, 2),
ReturnAmount DECIMAL(10, 2),
ReturnDate DATE
);

• - Datasets

INSERT INTO SalesReturns (ReturnID, ProductID, ProductName, SaleAmount, ReturnAmount, ReturnDate) VALUES
(1, 201, 'Laptop', 1500.00, 200.00, '2023-01-10'),
(2, 202, 'Phone', 800.00, 120.00, '2023-01-20'),
(3, 201, 'Laptop', 1200.00, 150.00, '2023-02-05'),
(4, 203, 'Shoes', 100.00, 20.00, '2023-02-15'),
(5, 204, 'Headphones', 150.00, 50.00, '2023-03-01'),
(6, 201, 'Laptop', 1300.00, 300.00, '2023-03-10'),
(7, 205, 'Smartwatch', 200.00, 80.00, '2023-03-25'),
(8, 201, 'Laptop', 1200.00, 250.00, '2023-04-05'),
(9, 203, 'Shoes', 150.00, 30.00, '2023-04-10'),
(10, 202, 'Phone', 900.00, 100.00, '2023-05-01'),
(11, 201, 'Laptop', 1800.00, 400.00, '2023-05-10'),
(12, 205, 'Smartwatch', 300.00, 90.00, '2023-05-15'),
(13, 204, 'Headphones', 250.00, 60.00, '2023-06-01');

Learnings

• Using HAVING for both aggregation and filtering


• Calculating the average of returns (AVG())
• Filtering by multiple criteria

Solutions

• - PostgreSQL solution
SELECT ProductID, ProductName, COUNT(ReturnID) AS TotalReturns, AVG(ReturnAmount)
AS AvgReturnAmount, SUM(SaleAmount) AS TotalSales
FROM SalesReturns
GROUP BY ProductID, ProductName
HAVING COUNT(ReturnID) > 10
AND AVG(ReturnAmount) > 100
AND SUM(SaleAmount) > 5000;

• - MySQL solution
SELECT ProductID, ProductName, COUNT(ReturnID) AS TotalReturns, AVG(ReturnAmount)
AS AvgReturnAmount, SUM(SaleAmount) AS TotalSales
FROM SalesReturns
GROUP BY ProductID, ProductName
HAVING COUNT(ReturnID) > 10
AND AVG(ReturnAmount) > 100
AND SUM(SaleAmount) > 5000;

• Q.149

Question
Find the top 5 customers who spent the most across all their orders in a specific year
(e.g., 2023). Include only customers who placed at least 3 orders in that year, and
filter for customers whose total spending exceeds 5000 EUR.
Explanation
For each customer, sum the spending across all their orders in 2023. Filter out
customers with fewer than 3 orders and total spending less than 5000 EUR. Sort by
total spending in descending order and return the top 5.
Datasets and SQL Schemas

• - Table creation

CREATE TABLE Orders (
OrderID INT,
CustomerID INT,
OrderDate DATE,
OrderAmount DECIMAL(10, 2)
);

• - Datasets
INSERT INTO Orders (OrderID, CustomerID, OrderDate, OrderAmount) VALUES
(1, 301, '2023-01-15', 1500.00),
(2, 302, '2023-02-10', 1200.00),
(3, 303, '2023-03-05', 200.00),
(4, 301, '2023-04-01', 700.00),
(5, 304, '2023-05-15', 250.00),
(6, 305, '2023-06-01', 350.00),
(7, 301, '2023-07-10', 1200.00),
(8, 303, '2023-08-25', 700.00),
(9, 302, '2023-09-05', 1200.00),
(10, 305, '2023-09-20', 500.00),
(11, 301, '2023-10-15', 3000.00),
(12, 304, '2023-11-10', 450.00),
(13, 305, '2023-12-01', 600.00);

Learnings

• Grouping by customer and filtering by multiple conditions
• Aggregation using SUM()
• Filtering using HAVING for both count and value conditions

Solutions

• - PostgreSQL solution
WITH CustomerSpending AS (
SELECT CustomerID, COUNT(OrderID) AS OrderCount, SUM(OrderAmount) AS TotalSpending
FROM Orders
WHERE EXTRACT(YEAR FROM OrderDate) = 2023
GROUP BY CustomerID
HAVING COUNT(OrderID) >= 3 AND SUM(OrderAmount) > 5000
)
SELECT CustomerID, TotalSpending
FROM CustomerSpending
ORDER BY TotalSpending DESC
LIMIT 5;

• - MySQL solution
WITH CustomerSpending AS (
SELECT CustomerID, COUNT(OrderID) AS OrderCount, SUM(OrderAmount) AS TotalSpending
FROM Orders
WHERE YEAR(OrderDate) = 2023
GROUP BY CustomerID
HAVING COUNT(OrderID) >= 3 AND SUM(OrderAmount) > 5000
)
SELECT CustomerID, TotalSpending
FROM CustomerSpending
ORDER BY TotalSpending DESC
LIMIT 5;

These questions test advanced SQL skills involving GROUP BY, HAVING, WHERE,
window functions, and aggregate functions.
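
One portability note on Q.149: EXTRACT(YEAR FROM ...) and YEAR(...) wrap the column in a function, which prevents most engines from using an index on OrderDate. A half-open date range is equivalent and index-friendly in both dialects; this sketch assumes the same Orders table:

```sql
-- Same 2023 filter, written as a sargable (index-usable) range
WITH CustomerSpending AS (
    SELECT CustomerID, COUNT(OrderID) AS OrderCount, SUM(OrderAmount) AS TotalSpending
    FROM Orders
    WHERE OrderDate >= '2023-01-01'
      AND OrderDate <  '2024-01-01'   -- half-open: includes all of Dec 31
    GROUP BY CustomerID
    HAVING COUNT(OrderID) >= 3 AND SUM(OrderAmount) > 5000
)
SELECT CustomerID, TotalSpending
FROM CustomerSpending
ORDER BY TotalSpending DESC
LIMIT 5;
```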

• Q.150

Question
Identify the top 5 products with the highest return rate, where the return rate is defined
as the percentage of returns relative to the total number of sales. Include only products
where the total number of returns is greater than 20, and filter for products where the
return rate is greater than 15%.
Explanation
For each product, calculate the return rate (number of returns / total sales). Filter out
products with fewer than 20 returns and where the return rate is greater than 15%.
Sort by return rate in descending order and return the top 5 products.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE ProductSales (
SaleID INT,
ProductID INT,
ProductName VARCHAR(100),
SaleAmount DECIMAL(10, 2),
SaleDate DATE
);

CREATE TABLE ProductReturns (


ReturnID INT,
ProductID INT,
ReturnAmount DECIMAL(10, 2),
ReturnDate DATE
);

• - Datasets
INSERT INTO ProductSales (SaleID, ProductID, ProductName, SaleAmount, SaleDate) VALUES
(1, 101, 'Laptop', 1500.00, '2023-01-01'),
(2, 101, 'Laptop', 1200.00, '2023-01-15'),
(3, 102, 'Phone', 800.00, '2023-02-10'),
(4, 101, 'Laptop', 1300.00, '2023-02-20'),
(5, 103, 'Headphones', 150.00, '2023-03-01'),
(6, 102, 'Phone', 900.00, '2023-03-05'),
(7, 104, 'Smartwatch', 300.00, '2023-04-01'),
(8, 101, 'Laptop', 1600.00, '2023-04-10'),
(9, 102, 'Phone', 700.00, '2023-05-15'),
(10, 103, 'Headphones', 250.00, '2023-06-01'),
(11, 101, 'Laptop', 1200.00, '2023-06-20');

INSERT INTO ProductReturns (ReturnID, ProductID, ReturnAmount, ReturnDate) VALUES
(1, 101, 200.00, '2023-01-10'),
(2, 101, 250.00, '2023-02-05'),
(3, 102, 120.00, '2023-02-15'),
(4, 101, 300.00, '2023-03-01'),
(5, 102, 150.00, '2023-03-20'),
(6, 101, 250.00, '2023-04-01'),
(7, 103, 50.00, '2023-05-10'),
(8, 102, 100.00, '2023-05-25'),
(9, 101, 400.00, '2023-06-05'),
(10, 103, 75.00, '2023-06-15');

Learnings

• Using COUNT() to calculate the number of returns


• Calculating return rates as percentages
• Filtering groups using HAVING
• Sorting by aggregated values

Solutions

• - PostgreSQL solution
WITH ProductReturnRates AS (
    -- COUNT(DISTINCT ...) is essential here: the join pairs every sale with
    -- every return for a product, so plain COUNT() would inflate both counts
    -- and make the ratio meaningless
    SELECT ps.ProductID, ps.ProductName,
           COUNT(DISTINCT pr.ReturnID) AS TotalReturns,
           COUNT(DISTINCT ps.SaleID) AS TotalSales,
           (COUNT(DISTINCT pr.ReturnID) * 1.0 / COUNT(DISTINCT ps.SaleID)) * 100 AS ReturnRate
    FROM ProductSales ps
    LEFT JOIN ProductReturns pr ON ps.ProductID = pr.ProductID
    GROUP BY ps.ProductID, ps.ProductName
    HAVING COUNT(DISTINCT pr.ReturnID) > 20
       AND (COUNT(DISTINCT pr.ReturnID) * 1.0 / COUNT(DISTINCT ps.SaleID)) * 100 > 15
)
SELECT ProductID, ProductName, TotalReturns, ReturnRate
FROM ProductReturnRates
ORDER BY ReturnRate DESC
LIMIT 5;

• - MySQL solution
WITH ProductReturnRates AS (
    -- COUNT(DISTINCT ...) is essential here: the join pairs every sale with
    -- every return for a product, so plain COUNT() would inflate both counts
    -- and make the ratio meaningless
    SELECT ps.ProductID, ps.ProductName,
           COUNT(DISTINCT pr.ReturnID) AS TotalReturns,
           COUNT(DISTINCT ps.SaleID) AS TotalSales,
           (COUNT(DISTINCT pr.ReturnID) * 1.0 / COUNT(DISTINCT ps.SaleID)) * 100 AS ReturnRate
    FROM ProductSales ps
    LEFT JOIN ProductReturns pr ON ps.ProductID = pr.ProductID
    GROUP BY ps.ProductID, ps.ProductName
    HAVING COUNT(DISTINCT pr.ReturnID) > 20
       AND (COUNT(DISTINCT pr.ReturnID) * 1.0 / COUNT(DISTINCT ps.SaleID)) * 100 > 15
)
SELECT ProductID, ProductName, TotalReturns, ReturnRate
FROM ProductReturnRates
ORDER BY ReturnRate DESC
LIMIT 5;
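
Joining raw sales to raw returns multiplies rows (every sale pairs with every return for the same product), so any counts taken over that join double-count unless made DISTINCT. An alternative that avoids the fan-out entirely is to aggregate each table first and join the results; this sketch assumes the same two tables:

```sql
-- Aggregate per product before joining, so no de-duplication is needed
SELECT s.ProductID, s.ProductName, r.TotalReturns,
       (r.TotalReturns * 1.0 / s.TotalSales) * 100 AS ReturnRate
FROM (
    SELECT ProductID, ProductName, COUNT(*) AS TotalSales
    FROM ProductSales
    GROUP BY ProductID, ProductName
) s
JOIN (
    SELECT ProductID, COUNT(*) AS TotalReturns
    FROM ProductReturns
    GROUP BY ProductID
) r ON r.ProductID = s.ProductID
WHERE r.TotalReturns > 20
  AND (r.TotalReturns * 1.0 / s.TotalSales) * 100 > 15
ORDER BY ReturnRate DESC
LIMIT 5;
```

Pre-aggregating also tends to be cheaper on large tables, since the join operates on one row per product instead of one row per sale-return pair.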

ORDER BY

• Q.151

Question
Get all distinct sizes of beverages offered, but ensure that the sizes are ordered by
their price in descending order.
Explanation

You need to retrieve distinct sizes of beverages from the Beverages table.
Additionally, order the sizes by their corresponding price in descending order to
display the most expensive sizes first.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Beverages (
BeverageID INT,
BeverageName VARCHAR(50),
Size VARCHAR(50),
Price DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Beverages (BeverageID, BeverageName, Size, Price) VALUES
(1, 'Latte', 'Tall', 3.50),
(2, 'Latte', 'Grande', 4.00),
(3, 'Latte', 'Venti', 4.50),
(4, 'Espresso', 'Solo', 2.50),
(5, 'Espresso', 'Doppio', 3.00);

Learnings

• Use of DISTINCT to retrieve unique values.


• Use of ORDER BY to sort results, combined with DESC to order by price in
descending order.
• SELECT query to retrieve and filter data based on specific conditions.

Solutions

• - PostgreSQL solution
-- SELECT DISTINCT cannot ORDER BY a column that is not in the select list,
-- so group by Size and sort by each size's highest price instead
SELECT Size
FROM Beverages
GROUP BY Size
ORDER BY MAX(Price) DESC;

• - MySQL solution
-- SELECT DISTINCT cannot ORDER BY a column that is not in the select list,
-- so group by Size and sort by each size's highest price instead
SELECT Size
FROM Beverages
GROUP BY Size
ORDER BY MAX(Price) DESC;

• Q.152

Question
Sort Properties by Price in Ascending Order
Explanation
The task is to retrieve all property listings from the PropertyListings table and sort
them by the Price column in ascending order. This can be achieved using the ORDER
BY clause with the ASC keyword, which ensures that the rows are sorted from the
lowest to the highest price.
Datasets and SQL Schemas

• - Table creation

CREATE TABLE PropertyListings (
ListingID INT,
PropertyName VARCHAR(100),
City VARCHAR(50),
Price DECIMAL(10, 2)
);

• - Datasets
INSERT INTO PropertyListings (ListingID, PropertyName, City, Price) VALUES
(1, 'Luxury Villa', 'Miami', 2500000.00),
(2, 'Cozy Condo', 'San Diego', 800000.00),
(3, 'Suburban House', 'Austin', 350000.00),
(4, 'Downtown Loft', 'Seattle', 1200000.00),
(5, 'Lakefront Cabin', 'Denver', 900000.00);

Learnings

• Using ORDER BY to sort query results.


• Sorting in ascending order using ASC (default order).
• Sorting by a numeric field (Price) to arrange data in a meaningful order.

Solutions

• - PostgreSQL solution
SELECT *
FROM PropertyListings
ORDER BY Price ASC;

• - MySQL solution
SELECT *
FROM PropertyListings
ORDER BY Price ASC;

• Q.153

Question
Sort Rental Properties by Monthly Rent in Descending Order
Explanation
The task is to retrieve all rental property listings from the RentalProperties table
and sort them by the MonthlyRent column in descending order. This can be done
using the ORDER BY clause with the DESC keyword, which ensures that the rows are
sorted from the highest to the lowest monthly rent.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE RentalProperties (
RentalID INT,
PropertyName VARCHAR(100),
City VARCHAR(50),
MonthlyRent DECIMAL(10, 2)
);


• - Datasets
INSERT INTO RentalProperties (RentalID, PropertyName, City, MonthlyRent) VALUES
(1, 'Urban Apartment', 'Boston', 2500.00),
(2, 'Lakeside Duplex', 'Denver', 1800.00),
(3, 'Riverside Studio', 'Portland', 2200.00),
(4, 'City Heights Condo', 'San Francisco', 3500.00),
(5, 'Suburban Flat', 'Austin', 1500.00);

Learnings

• Using ORDER BY to sort query results by numeric values.
• Sorting by a numeric column (MonthlyRent) in descending order (DESC) to list the highest rents first.
• Sorting rental data based on price to prioritize higher rent properties.

Solutions

• - PostgreSQL solution
SELECT *
FROM RentalProperties
ORDER BY MonthlyRent DESC;

• - MySQL solution
SELECT *
FROM RentalProperties
ORDER BY MonthlyRent DESC;

• Q.154

Question
Sort Properties by Year Built with Unknown Years Last
Explanation
The task is to sort the properties by the YearBuilt column in ascending order,
ensuring that properties with a NULL value in the YearBuilt column appear at the end
of the list. This can be done using the ORDER BY clause with the ASC keyword, and
specifying NULLS LAST to ensure that NULL values are treated as the largest possible
values when sorting.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE LuxuryProperties (
PropertyID INT,
PropertyName VARCHAR(100),
City VARCHAR(50),
Price DECIMAL(10, 2),
YearBuilt INT
);

• - Datasets


INSERT INTO LuxuryProperties (PropertyID, PropertyName, City, Price, YearBuilt) VALUES
(1, 'Skyline Villa', 'New York', 2000000.00, 2015),
(2, 'Ocean Breeze', 'Miami', 1500000.00, NULL),
(3, 'Mountain Retreat', 'Denver', 2500000.00, 2020),
(4, 'City Penthouse', 'Los Angeles', 1200000.00, 2018);

Learnings

• Using ORDER BY to sort by a numeric column (YearBuilt).
• Handling NULL values explicitly with NULLS LAST to ensure they appear at the end of the sorted list.
• Sorting with ASC (ascending order) and treating NULL as the largest possible value in the column.

Solutions

• - PostgreSQL solution
SELECT *
FROM LuxuryProperties
ORDER BY YearBuilt ASC NULLS LAST;

• - MySQL solution
SELECT *
FROM LuxuryProperties
ORDER BY (YearBuilt IS NULL), YearBuilt ASC;

MySQL does not support the NULLS LAST clause, and by default it sorts NULL
values first in ascending order, not last. Ordering first on the
(YearBuilt IS NULL) flag (0 for known years, 1 for NULL) pushes the NULL
rows to the end.
This query will return all luxury properties, sorted by the year they were
built in ascending order, with properties where the year is NULL appearing
last.
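The portable NULLS-LAST trick (sorting on an `IS NULL` flag first) can be verified with Python's sqlite3 module, used here as a stand-in for MySQL, which sorts NULLs first in ascending order:

```python
# Sort known years ascending, with NULL YearBuilt rows forced to the end.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LuxuryProperties (PropertyID INT, PropertyName TEXT,
    City TEXT, Price REAL, YearBuilt INT);
INSERT INTO LuxuryProperties VALUES
(1, 'Skyline Villa', 'New York', 2000000.00, 2015),
(2, 'Ocean Breeze', 'Miami', 1500000.00, NULL),
(3, 'Mountain Retreat', 'Denver', 2500000.00, 2020),
(4, 'City Penthouse', 'Los Angeles', 1200000.00, 2018);
""")
# (YearBuilt IS NULL) evaluates to 0 for known years and 1 for NULL,
# so the NULL row sorts after every dated row.
names = [row[0] for row in conn.execute("""
    SELECT PropertyName FROM LuxuryProperties
    ORDER BY (YearBuilt IS NULL), YearBuilt ASC
""")]
print(names)
# ['Skyline Villa', 'City Penthouse', 'Mountain Retreat', 'Ocean Breeze']
```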

• Q.155

Question
Retrieve Properties Sorted by Price-per-Square-Foot Value
Explanation
The task is to calculate the price-per-square-foot value for each property and sort the
results in ascending order based on this value. The price-per-square-foot is calculated
by dividing the Price by the SquareFeet. The query then orders the properties by
this calculated value.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE RealEstateProperties (
PropertyID INT,
PropertyName VARCHAR(100),
City VARCHAR(50),
Price DECIMAL(10, 2),
SquareFeet INT
);

• - Datasets
INSERT INTO RealEstateProperties (PropertyID, PropertyName, City, Price, SquareFeet) VALUES
(1, 'Seaside Cottage', 'Miami', 800000.00, 2000),
(2, 'Downtown Loft', 'Seattle', 1200000.00, 1500),
(3, 'Suburban House', 'Austin', 350000.00, 2500),
(4, 'Lakefront Cabin', 'Denver', 900000.00, 1800);

Learnings

• Using arithmetic operations (division) to calculate derived values like price per square foot.
• Sorting based on a calculated field (PricePerSquareFoot).
• The ORDER BY clause can be used to sort by both columns and calculated expressions.

Solutions

• - PostgreSQL solution
SELECT PropertyName, Price / SquareFeet AS PricePerSquareFoot
FROM RealEstateProperties
ORDER BY PricePerSquareFoot ASC;

• - MySQL solution
SELECT PropertyName, Price / SquareFeet AS PricePerSquareFoot
FROM RealEstateProperties
ORDER BY PricePerSquareFoot ASC;
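A quick check of the derived-column sort with Python's sqlite3 module (a stand-in for the two engines above; the arithmetic stays in floating point because Price is declared as a decimal/real type):

```python
# Sort properties by computed price per square foot, cheapest first.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RealEstateProperties (PropertyID INT, PropertyName TEXT,
    City TEXT, Price REAL, SquareFeet INT);
INSERT INTO RealEstateProperties VALUES
(1, 'Seaside Cottage', 'Miami', 800000.00, 2000),
(2, 'Downtown Loft', 'Seattle', 1200000.00, 1500),
(3, 'Suburban House', 'Austin', 350000.00, 2500),
(4, 'Lakefront Cabin', 'Denver', 900000.00, 1800);
""")
rows = conn.execute("""
    SELECT PropertyName, Price / SquareFeet AS PricePerSquareFoot
    FROM RealEstateProperties
    ORDER BY PricePerSquareFoot ASC
""").fetchall()
print(rows[0])  # ('Suburban House', 140.0) -- 350000 / 2500
```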

• Q.156

Question
Sorting Dates Stored as Text
Explanation
The task is to sort real estate transactions by their sale date, but the dates are stored as
text in the SaleDate column, formatted as 'DD-MM-YYYY'. Since text sorting is
lexicographical and doesn't handle dates as expected, we need to first convert the text
into a valid date format for correct sorting.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE RealEstateTransactions (
TransactionID INT,
PropertyName VARCHAR(100),
City VARCHAR(50),
SaleDate VARCHAR(20) -- Dates stored as text
);


• - Datasets
INSERT INTO RealEstateTransactions (TransactionID, PropertyName, City, SaleDate)
VALUES
(1, 'Luxury Villa', 'London', '12-05-2023'),
(2, 'Suburban House', 'Berlin', '01-03-2022'),
(3, 'Downtown Loft', 'Paris', '25-12-2021'),
(4, 'Cozy Apartment', 'Madrid', '08-07-2024'),
(5, 'Lakeside Cabin', 'Geneva', '15-09-2023');

Learnings

• Handling dates stored as text and converting them to a valid date format for
sorting.
• Using date functions (TO_DATE in PostgreSQL and STR_TO_DATE in MySQL)
to convert text into date format.
• Sorting by the converted date values.

Solutions

• - PostgreSQL solution
SELECT *
FROM RealEstateTransactions
ORDER BY TO_DATE(SaleDate, 'DD-MM-YYYY') ASC;

• - MySQL solution
SELECT *
FROM RealEstateTransactions
ORDER BY STR_TO_DATE(SaleDate, '%d-%m-%Y') ASC;

Explanation:

• In PostgreSQL, the TO_DATE() function is used to convert the text field SaleDate into a date using the format 'DD-MM-YYYY'.
• In MySQL, the STR_TO_DATE() function performs the conversion of the text field SaleDate to a date using the same format.
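SQLite (used below via Python's sqlite3 module as a quick test bed) has neither TO_DATE() nor STR_TO_DATE(), which is a good excuse to show a third, dialect-free trick: rebuild the 'DD-MM-YYYY' text as 'YYYYMMDD', which then sorts correctly as a plain string.

```python
# Sort text dates by rearranging them into a lexicographically sortable key.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RealEstateTransactions (TransactionID INT, PropertyName TEXT,
    City TEXT, SaleDate TEXT);
INSERT INTO RealEstateTransactions VALUES
(1, 'Luxury Villa', 'London', '12-05-2023'),
(2, 'Suburban House', 'Berlin', '01-03-2022'),
(3, 'Downtown Loft', 'Paris', '25-12-2021'),
(4, 'Cozy Apartment', 'Madrid', '08-07-2024'),
(5, 'Lakeside Cabin', 'Geneva', '15-09-2023');
""")
# substr(SaleDate, 7, 4) = YYYY, substr(SaleDate, 4, 2) = MM,
# substr(SaleDate, 1, 2) = DD, so the key is 'YYYYMMDD'.
names = [row[0] for row in conn.execute("""
    SELECT PropertyName FROM RealEstateTransactions
    ORDER BY substr(SaleDate, 7, 4) || substr(SaleDate, 4, 2) || substr(SaleDate, 1, 2)
""")]
print(names[0], names[-1])  # Downtown Loft Cozy Apartment
```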

• Q.157

Question
Sort Employees by Department and Calculate Average Salary
Explanation
The task is to retrieve all employees, sorted first by their Department, and then by
their Salary in descending order within each department. Additionally, calculate the
average salary for each department.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT,
EmployeeName VARCHAR(100),
Department VARCHAR(50),
Salary DECIMAL(10, 2)
);

• - Datasets
INSERT INTO Employees (EmployeeID, EmployeeName, Department, Salary) VALUES
(1, 'Alice', 'Engineering', 90000.00),
(2, 'Bob', 'HR', 50000.00),
(3, 'Charlie', 'Engineering', 95000.00),
(4, 'Diana', 'Marketing', 60000.00),
(5, 'Eve', 'HR', 55000.00),
(6, 'Frank', 'Marketing', 75000.00);

Learnings

• Using ORDER BY to sort data by multiple columns.
• Calculating averages with AVG() for grouped data using GROUP BY.
• Combining ORDER BY with aggregate functions.

Solutions

• - PostgreSQL solution
-- A window function keeps every row (sorted as asked) while also
-- reporting each department's average salary.
SELECT EmployeeName, Department, Salary,
       AVG(Salary) OVER (PARTITION BY Department) AS AvgDeptSalary
FROM Employees
ORDER BY Department, Salary DESC;

• - MySQL solution (window functions require MySQL 8.0+)
SELECT EmployeeName, Department, Salary,
       AVG(Salary) OVER (PARTITION BY Department) AS AvgDeptSalary
FROM Employees
ORDER BY Department, Salary DESC;
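The question asks for both the sorted rows and each department's average salary; a plain ORDER BY alone cannot show the average. One way to get both, sketched here with Python's sqlite3 module (window functions need SQLite 3.25+, bundled with any recent Python, and MySQL 8.0+), is an AVG() window over each department:

```python
# Each row carries its department's average salary via a window function.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INT, EmployeeName TEXT,
    Department TEXT, Salary REAL);
INSERT INTO Employees VALUES
(1, 'Alice', 'Engineering', 90000.00),
(2, 'Bob', 'HR', 50000.00),
(3, 'Charlie', 'Engineering', 95000.00),
(4, 'Diana', 'Marketing', 60000.00),
(5, 'Eve', 'HR', 55000.00),
(6, 'Frank', 'Marketing', 75000.00);
""")
rows = conn.execute("""
    SELECT EmployeeName, Department, Salary,
           AVG(Salary) OVER (PARTITION BY Department) AS AvgDeptSalary
    FROM Employees
    ORDER BY Department, Salary DESC
""").fetchall()
print(rows[0])  # ('Charlie', 'Engineering', 95000.0, 92500.0)
```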

• Q.158

Question
Sort Projects by Deadline and Count Employees Involved
Explanation
The task is to sort projects by their deadline in ascending order and count the number
of employees involved in each project. The data is spread across two tables: Projects
and ProjectEmployees.
Datasets and SQL Schemas

• - Table creation for Projects


CREATE TABLE Projects (
ProjectID INT,
ProjectName VARCHAR(100),
Deadline DATE
);

• - Table creation for ProjectEmployees
CREATE TABLE ProjectEmployees (
ProjectID INT,
EmployeeID INT
);

• - Datasets
INSERT INTO Projects (ProjectID, ProjectName, Deadline) VALUES
(1, 'Project Alpha', '2024-12-01'),
(2, 'Project Beta', '2023-08-15'),
(3, 'Project Gamma', '2023-11-30');

INSERT INTO ProjectEmployees (ProjectID, EmployeeID) VALUES
(1, 101),
(1, 102),
(1, 103),
(2, 104),
(3, 105);

Learnings

• Using JOIN to combine data from multiple tables.


• Sorting by date (Deadline).
• Counting employees per project using COUNT() with GROUP BY.

Solutions

• - PostgreSQL solution
-- Every selected non-aggregated column must appear in GROUP BY here,
-- since ProjectID is not declared as a primary key.
SELECT p.ProjectName, p.Deadline, COUNT(pe.EmployeeID) AS EmployeeCount
FROM Projects p
JOIN ProjectEmployees pe ON p.ProjectID = pe.ProjectID
GROUP BY p.ProjectID, p.ProjectName, p.Deadline
ORDER BY p.Deadline ASC;

• - MySQL solution
SELECT p.ProjectName, p.Deadline, COUNT(pe.EmployeeID) AS EmployeeCount
FROM Projects p
JOIN ProjectEmployees pe ON p.ProjectID = pe.ProjectID
GROUP BY p.ProjectID, p.ProjectName, p.Deadline
ORDER BY p.Deadline ASC;
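A sqlite3 check of the join-and-count (listing all non-aggregated columns in GROUP BY, which also satisfies MySQL's ONLY_FULL_GROUP_BY mode and PostgreSQL's grouping rules for tables without a declared primary key):

```python
# Count employees per project, sorted by deadline.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Projects (ProjectID INT, ProjectName TEXT, Deadline DATE);
CREATE TABLE ProjectEmployees (ProjectID INT, EmployeeID INT);
INSERT INTO Projects VALUES
(1, 'Project Alpha', '2024-12-01'),
(2, 'Project Beta', '2023-08-15'),
(3, 'Project Gamma', '2023-11-30');
INSERT INTO ProjectEmployees VALUES
(1, 101), (1, 102), (1, 103), (2, 104), (3, 105);
""")
rows = conn.execute("""
    SELECT p.ProjectName, COUNT(pe.EmployeeID) AS EmployeeCount
    FROM Projects p
    JOIN ProjectEmployees pe ON p.ProjectID = pe.ProjectID
    GROUP BY p.ProjectID, p.ProjectName, p.Deadline
    ORDER BY p.Deadline ASC
""").fetchall()
print(rows)  # [('Project Beta', 1), ('Project Gamma', 1), ('Project Alpha', 3)]
```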

• Q.159

Question
Sort Students by Total Marks and Filter Those Above a Threshold
Explanation
The task is to retrieve all students, sort them by their total marks in descending order,
and filter only those students whose total marks exceed a specific threshold (e.g.,
300).
Datasets and SQL Schemas

• - Table creation
CREATE TABLE Students (
StudentID INT,
StudentName VARCHAR(100)
);

• - Table creation for Marks


CREATE TABLE Marks (
StudentID INT,
Subject VARCHAR(50),
Marks INT
);

• - Datasets
INSERT INTO Students (StudentID, StudentName) VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie'),
(4, 'Diana');

INSERT INTO Marks (StudentID, Subject, Marks) VALUES
(1, 'Math', 95),
(1, 'Science', 90),
(1, 'English', 85),
(2, 'Math', 80),
(2, 'Science', 70),
(2, 'English', 85),
(3, 'Math', 100),
(3, 'Science', 90),
(3, 'English', 85),
(4, 'Math', 75),
(4, 'Science', 65),
(4, 'English', 70);

Learnings

• Using JOIN to combine tables and aggregate data.


• Using SUM() to calculate the total marks for each student.
• Filtering results with HAVING after aggregation to enforce a condition on the
sum.
• Sorting by total marks (ORDER BY).

Solutions

• - PostgreSQL solution
-- StudentName is added to GROUP BY because StudentID is not declared
-- as a primary key.
SELECT s.StudentName, SUM(m.Marks) AS TotalMarks
FROM Students s
JOIN Marks m ON s.StudentID = m.StudentID
GROUP BY s.StudentID, s.StudentName
HAVING SUM(m.Marks) > 300
ORDER BY TotalMarks DESC;

• - MySQL solution
SELECT s.StudentName, SUM(m.Marks) AS TotalMarks
FROM Students s
JOIN Marks m ON s.StudentID = m.StudentID
GROUP BY s.StudentID, s.StudentName
HAVING SUM(m.Marks) > 300
ORDER BY TotalMarks DESC;
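One caveat worth checking: with the sample data above, no student actually exceeds 300 (Charlie leads with 275), so a threshold of 300 returns an empty set. The sqlite3 sketch below uses 250 instead, purely to show the HAVING filter selecting rows:

```python
# Total marks per student, kept only when the total exceeds 250.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (StudentID INT, StudentName TEXT);
CREATE TABLE Marks (StudentID INT, Subject TEXT, Marks INT);
INSERT INTO Students VALUES (1,'Alice'),(2,'Bob'),(3,'Charlie'),(4,'Diana');
INSERT INTO Marks VALUES
(1,'Math',95),(1,'Science',90),(1,'English',85),
(2,'Math',80),(2,'Science',70),(2,'English',85),
(3,'Math',100),(3,'Science',90),(3,'English',85),
(4,'Math',75),(4,'Science',65),(4,'English',70);
""")
rows = conn.execute("""
    SELECT s.StudentName, SUM(m.Marks) AS TotalMarks
    FROM Students s
    JOIN Marks m ON s.StudentID = m.StudentID
    GROUP BY s.StudentID, s.StudentName
    HAVING SUM(m.Marks) > 250
    ORDER BY TotalMarks DESC
""").fetchall()
print(rows)  # [('Charlie', 275), ('Alice', 270)]
```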

• Q.160


Question
Select the maximum salary from the Employees table.
Explanation
You need to find the highest salary in the Employees table using the MAX() function.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE Employees (
EmployeeID INT,
EmployeeName VARCHAR(50),
Department VARCHAR(50),
Salary DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Employees (EmployeeID, EmployeeName, Department, Salary) VALUES
(1, 'Alice', 'Engineering', 100000.00),
(2, 'Bob', 'Engineering', 95000.00),
(3, 'Charlie', 'HR', 70000.00),
(4, 'David', 'HR', 75000.00),
(5, 'Eve', 'Marketing', 60000.00);

Solutions

• - PostgreSQL solution
SELECT MAX(Salary) AS MaxSalary
FROM Employees;

• - MySQL solution
SELECT MAX(Salary) AS MaxSalary
FROM Employees;

JOINS

• Q.161

Question
Write an SQL query to retrieve all students along with their grades.
Explanation
This question requires you to perform a JOIN operation between the Students and
Grades tables to get a list of students along with their corresponding grades based on
the StudentID.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE Students (
StudentID INT,
Name VARCHAR(50)
);

CREATE TABLE Grades (
StudentID INT,
Grade VARCHAR(2)
);

• - Insert Records into Students Table


INSERT INTO Students (StudentID, Name)
VALUES
(1, 'John'),
(2, 'Alice'),
(3, 'Bob'),
(4, 'Emily'),
(5, 'Michael'),
(6, 'Sophia'),
(7, 'Daniel'),
(8, 'Olivia'),
(9, 'William'),
(10, 'Ava'),
(11, 'James'),
(12, 'Emma'),
(13, 'Alexander'),
(14, 'Mia'),
(15, 'Benjamin'),
(16, 'Charlotte'),
(17, 'Ethan'),
(18, 'Amelia'),
(19, 'Elijah'),
(20, 'Harper'),
(21, 'Henry'),
(22, 'Evelyn'),
(23, 'Jacob'),
(24, 'Abigail'),
(25, 'Matthew'),
(26, 'Emily'),
(27, 'David'),
(28, 'Liam'),
(29, 'Avery'),
(30, 'Michael'),
(31, 'Sofia'),
(32, 'Lucas'),
(33, 'Madison'),
(34, 'Ella'),
(35, 'Logan');

• - Insert Records into Grades Table


INSERT INTO Grades (StudentID, Grade)
VALUES
(1, 'A'),
(2, 'B'),
(3, 'C'),
(4, 'B'),
(5, 'A'),
(6, 'B'),
(7, 'A'),
(8, 'C'),
(9, 'A'),
(10, 'B'),
(11, 'B'),
(12, 'C'),
(13, 'A'),
(14, 'B'),
(15, 'C'),
(16, 'A'),
(17, 'A'),
(18, 'B'),
(19, 'C'),
(20, 'A'),
(21, 'A'),
(22, 'B'),


(23, 'B'),
(24, 'C'),
(25, 'A'),
(26, 'B'),
(27, 'C'),
(28, 'A'),
(29, 'A'),
(30, 'B'),
(31, 'B'),
(32, 'C'),
(33, 'A'),
(34, 'B'),
(35, 'C');

Learnings

• Use of JOIN to combine related tables.


• Retrieving data from multiple tables using a common column (StudentID in
this case).
• Handling data from related tables to produce meaningful results.

Solutions

• - PostgreSQL solution
SELECT s.StudentID, s.Name, g.Grade
FROM Students s
JOIN Grades g ON s.StudentID = g.StudentID;

• - MySQL solution
SELECT s.StudentID, s.Name, g.Grade
FROM Students s
JOIN Grades g ON s.StudentID = g.StudentID;
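A minimal check of the inner join with Python's sqlite3 module, using just the first three students from the dataset above (the full 35-row dataset behaves the same way):

```python
# Inner join: each student paired with their grade via StudentID.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (StudentID INT, Name TEXT);
CREATE TABLE Grades (StudentID INT, Grade TEXT);
INSERT INTO Students VALUES (1, 'John'), (2, 'Alice'), (3, 'Bob');
INSERT INTO Grades VALUES (1, 'A'), (2, 'B'), (3, 'C');
""")
rows = conn.execute("""
    SELECT s.StudentID, s.Name, g.Grade
    FROM Students s
    JOIN Grades g ON s.StudentID = g.StudentID
""").fetchall()
# Row order is unspecified without ORDER BY, so sort before printing.
print(sorted(rows))  # [(1, 'John', 'A'), (2, 'Alice', 'B'), (3, 'Bob', 'C')]
```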

• Q.162

Question
Write an SQL query to retrieve all employees along with their department names.
Explanation
This question requires performing an INNER JOIN between the Employees and
Departments tables using the common DepartmentID column. The goal is to retrieve
the department name for each employee.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT,
Name VARCHAR(50),
DepartmentID INT
);

CREATE TABLE Departments (
DepartmentID INT,
DepartmentName VARCHAR(50)
);


• - Insert Records into Employees Table


INSERT INTO Employees (EmployeeID, Name, DepartmentID)
VALUES
(1, 'John', 1),
(2, 'Alice', 2),
(3, 'Bob', 1),
(4, 'Emily', 3),
(5, 'Michael', 2),
(6, 'Sophia', 1),
(7, 'Daniel', 2),
(8, 'Olivia', 3),
(9, 'William', 1),
(10, 'Ava', 2);

• - Insert Records into Departments Table


INSERT INTO Departments (DepartmentID, DepartmentName)
VALUES
(1, 'Human Resources'),
(2, 'Finance'),
(3, 'Marketing');

Learnings

• Using INNER JOIN to combine data from two related tables.


• Joining tables on a common key (DepartmentID).
• Retrieving related data from multiple tables in a single query.

Solutions

• - PostgreSQL solution
SELECT e.EmployeeID, e.Name, d.DepartmentName
FROM Employees e
JOIN Departments d ON e.DepartmentID = d.DepartmentID;

• - MySQL solution
SELECT e.EmployeeID, e.Name, d.DepartmentName
FROM Employees e
JOIN Departments d ON e.DepartmentID = d.DepartmentID;

• Q.163

Question
Write an SQL query to select all orders along with their customer names.
Explanation
This question requires you to perform a JOIN operation between the Customers and
Orders tables using the common CustomerID column. The goal is to retrieve both the
order details and the corresponding customer names.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE Customers (
CustomerID INT,
CustomerName VARCHAR(50)
);

CREATE TABLE Orders (
OrderID INT,
CustomerID INT,
TotalAmount DECIMAL(10, 2)
);

• - Insert Records into Customers Table


INSERT INTO Customers (CustomerID, CustomerName)
VALUES
(101, 'John Doe'),
(102, 'Alice Smith'),
(103, 'Bob Johnson'),
(104, 'Emily Brown'),
(105, 'Michael Wilson'),
(106, 'Sophia Taylor'),
(107, 'Daniel Anderson'),
(108, 'Olivia Martinez'),
(109, 'William Garcia'),
(110, 'Ava Lopez'),
(111, 'James Lee'),
(112, 'Emma Hernandez'),
(113, 'Alexander Adams'),
(114, 'Mia Evans'),
(115, 'Benjamin Baker'),
(116, 'Charlotte Hill'),
(117, 'Ethan Nelson'),
(118, 'Amelia Green'),
(119, 'Elijah Carter'),
(120, 'Harper Hughes'),
(121, 'Henry Flores'),
(122, 'Evelyn Collins'),
(123, 'Jacob Stewart'),
(124, 'Abigail Morris'),
(125, 'Matthew Rogers');

• - Insert Records into Orders Table


INSERT INTO Orders (OrderID, CustomerID, TotalAmount)
VALUES
(1, 101, 100.50),
(2, 102, 200.75),
(3, 103, 350.20),
(4, 104, 450.90),
(5, 105, 550.25),
(6, 106, 450.00),
(7, 107, 600.75),
(8, 108, 700.50),
(9, 109, 800.25),
(10, 110, 900.00),
(11, 111, 1000.50),
(12, 112, 1050.75),
(13, 113, 1200.20),
(14, 114, 1350.90),
(15, 115, 1450.25),
(16, 116, 1550.00),
(17, 117, 1600.75),
(18, 118, 1700.50),
(19, 119, 1800.25),
(20, 120, 1900.00),
(21, 121, 2000.50),
(22, 122, 2100.75),
(23, 123, 2200.20),
(24, 124, 2300.90),
(25, 125, 2400.25);


Learnings

• Using JOIN to fetch related data from multiple tables.


• Joining tables on a common key (CustomerID).
• Retrieving multiple fields from different tables in a single query.

Solutions

• - PostgreSQL solution
SELECT o.OrderID, o.TotalAmount, c.CustomerName
FROM Orders o
JOIN Customers c ON o.CustomerID = c.CustomerID;

• - MySQL solution
SELECT o.OrderID, o.TotalAmount, c.CustomerName
FROM Orders o
JOIN Customers c ON o.CustomerID = c.CustomerID;

• Q.164

Question
Write an SQL query to select all products along with their category names.
Explanation
This question requires you to perform a JOIN between the Products and Categories
tables using the CategoryID column, in order to retrieve product details along with
their respective category names.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE Categories (
CategoryID INT,
CategoryName VARCHAR(50)
);

CREATE TABLE Products (
ProductID INT,
ProductName VARCHAR(50),
CategoryID INT
);

• - Insert Records into Categories Table


INSERT INTO Categories (CategoryID, CategoryName)
VALUES
(1, 'Electronics'),
(2, 'Clothing'),
(3, 'Books'),
(4, 'Home Appliances'),
(5, 'Toys');

• - Insert Records into Products Table

INSERT INTO Products (ProductID, ProductName, CategoryID)
VALUES
(101, 'Laptop', 1),
(102, 'Smartphone', 1),
(103, 'T-shirt', 2),
(104, 'Jeans', 2),
(105, 'Novel', 3),
(106, 'Cookbook', 3),
(107, 'Refrigerator', 4),
(108, 'Washing Machine', 4),
(109, 'Action Figure', 5),
(110, 'Board Game', 5);

Learnings

• Using JOIN to combine data from multiple tables.


• Joining on the foreign key (CategoryID) to link products with their categories.
• Retrieving related data across multiple tables.

Solutions

• - PostgreSQL solution
SELECT p.ProductID, p.ProductName, c.CategoryName
FROM Products p
JOIN Categories c ON p.CategoryID = c.CategoryID;

• - MySQL solution
SELECT p.ProductID, p.ProductName, c.CategoryName
FROM Products p
JOIN Categories c ON p.CategoryID = c.CategoryID;

• Q.165

Question
Write an SQL query to select all employees and their salaries. Include both employees
without a salary and salary records without a corresponding employee.
Explanation
This question requires using a FULL OUTER JOIN between the Employees and
Salaries tables. The FULL OUTER JOIN ensures that all records from both tables are
included, even if there is no corresponding match in the other table. If an employee
has no salary record, the salary will be NULL, and if a salary record does not have a
corresponding employee, the employee's name will be NULL.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT,
Name VARCHAR(50)
);

CREATE TABLE Salaries (
EmployeeID INT,
Salary DECIMAL(10, 2)
);


• - Insert Records into Employees Table


INSERT INTO Employees (EmployeeID, Name)
VALUES
(1, 'John'),
(2, 'Alice'),
(3, 'Bob'),
(4, 'Emily');

• - Insert Records into Salaries Table


INSERT INTO Salaries (EmployeeID, Salary)
VALUES
(1, 50000.00),
(2, 60000.00),
(NULL, 55000.00),
(4, NULL);

Learnings

• Using a FULL OUTER JOIN to return all records from both tables, even if there
is no matching record in the other table.
• Handling NULL values for unmatched records in either table.

Solutions

• - PostgreSQL solution
SELECT e.Name, s.Salary
FROM Employees e
FULL OUTER JOIN Salaries s ON e.EmployeeID = s.EmployeeID;

• - MySQL solution (MySQL does not support FULL OUTER JOIN directly)
SELECT e.Name, s.Salary
FROM Employees e
LEFT JOIN Salaries s ON e.EmployeeID = s.EmployeeID
UNION
SELECT e.Name, s.Salary
FROM Employees e
RIGHT JOIN Salaries s ON e.EmployeeID = s.EmployeeID;
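The UNION emulation can be checked with Python's sqlite3 module. Older SQLite builds (before 3.39) also lack FULL OUTER JOIN and RIGHT JOIN, so the sketch below writes the RIGHT JOIN half as a swapped LEFT JOIN, which is equivalent:

```python
# Emulate FULL OUTER JOIN with two LEFT JOINs combined by UNION.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INT, Name TEXT);
CREATE TABLE Salaries (EmployeeID INT, Salary REAL);
INSERT INTO Employees VALUES (1,'John'),(2,'Alice'),(3,'Bob'),(4,'Emily');
INSERT INTO Salaries VALUES (1,50000.00),(2,60000.00),(NULL,55000.00),(4,NULL);
""")
rows = conn.execute("""
    SELECT e.Name, s.Salary
    FROM Employees e LEFT JOIN Salaries s ON e.EmployeeID = s.EmployeeID
    UNION
    SELECT e.Name, s.Salary
    FROM Salaries s LEFT JOIN Employees e ON e.EmployeeID = s.EmployeeID
""").fetchall()
# 5 distinct rows: Bob has no salary record, and the 55000 salary row
# has no matching employee, yet both survive the emulated full join.
print(sorted(rows, key=str))
```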

• Q.166

Question
Write an SQL query to select all teachers along with the subjects they teach.
Explanation
This question requires you to perform a JOIN between the Teachers and Subjects
tables using the SubjectID column, in order to retrieve the teacher names along with
the subjects they are assigned to.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE Subjects (
SubjectID INT,
SubjectName VARCHAR(50)
);

CREATE TABLE Teachers (
TeacherID INT,
Name VARCHAR(50),
SubjectID INT
);

• - Insert Records into Subjects Table


INSERT INTO Subjects (SubjectID, SubjectName)
VALUES
(1, 'Mathematics'),
(2, 'Physics'),
(3, 'Biology'),
(4, 'History'),
(5, 'English');

• - Insert Records into Teachers Table


INSERT INTO Teachers (TeacherID, Name, SubjectID)
VALUES
(101, 'Mr. Smith', 1),
(102, 'Ms. Johnson', 2),
(103, 'Mrs. Lee', 3),
(104, 'Mr. Garcia', 4),
(105, 'Dr. Martinez', 5),
(106, 'Ms. Clark', 1),
(107, 'Mr. Rodriguez', 2),
(108, 'Mrs. Wilson', 3),
(109, 'Mr. Taylor', 4),
(110, 'Ms. Brown', 5);

Learnings

• Using JOIN to relate teacher data with subject data.


• Joining on a foreign key (SubjectID) to link teachers with the subjects they
teach.
• Retrieving data from two related tables in a single query.

Solutions

• - PostgreSQL solution
SELECT t.Name, s.SubjectName
FROM Teachers t
JOIN Subjects s ON t.SubjectID = s.SubjectID;

• - MySQL solution
SELECT t.Name, s.SubjectName
FROM Teachers t
JOIN Subjects s ON t.SubjectID = s.SubjectID;

• Q.167

Question
Write an SQL query to retrieve the list of customers who have made orders for
products in the "Electronics" or "Books" categories, including their names, order total
amounts, and the product names. The query should include data from the following
tables: Customers, Orders, Products, and Categories.
Explanation
This question requires performing a complex JOIN operation across four tables:

• Customers (to get customer details),
• Orders (to get order details),
• Products (to get product details),
• Categories (to filter products by category).

You will need to filter products by the categories "Electronics" and "Books" and join
them with the respective customer order data.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE Customers (
CustomerID INT,
CustomerName VARCHAR(50)
);

-- Note: a ProductID column is added to Orders here. Without it there is
-- no way to link an order to the product it contains, and the original
-- join condition (o.OrderID = p.ProductID) matched no rows at all.
CREATE TABLE Orders (
OrderID INT,
CustomerID INT,
ProductID INT,
TotalAmount DECIMAL(10, 2)
);

CREATE TABLE Products (
ProductID INT,
ProductName VARCHAR(50),
CategoryID INT
);

CREATE TABLE Categories (
CategoryID INT,
CategoryName VARCHAR(50)
);

• - Insert Records into Customers Table
INSERT INTO Customers (CustomerID, CustomerName)
VALUES
(1, 'John Doe'),
(2, 'Alice Smith'),
(3, 'Bob Johnson'),
(4, 'Emily Brown');

• - Insert Records into Orders Table
INSERT INTO Orders (OrderID, CustomerID, ProductID, TotalAmount)
VALUES
(101, 1, 1001, 100.50),
(102, 2, 1003, 200.75),
(103, 3, 1005, 350.20),
(104, 4, 1006, 450.90);

• - Insert Records into Products Table
INSERT INTO Products (ProductID, ProductName, CategoryID)
VALUES
(1001, 'Laptop', 1),
(1002, 'Smartphone', 1),
(1003, 'Novel', 3),
(1004, 'Cookbook', 3),
(1005, 'Headphones', 1),
(1006, 'Action Figure', 5);

• - Insert Records into Categories Table
INSERT INTO Categories (CategoryID, CategoryName)
VALUES
(1, 'Electronics'),
(2, 'Clothing'),
(3, 'Books'),
(4, 'Home Appliances'),
(5, 'Toys');

Learnings

• Performing JOIN operations across multiple tables (Customers, Orders, Products, and Categories).
• Using filters in the WHERE clause to get specific product categories.
• Retrieving combined data (customer, product, and order) in a single query.

Solutions

• - PostgreSQL solution
SELECT c.CustomerName, o.TotalAmount, p.ProductName
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID
JOIN Products p ON o.ProductID = p.ProductID
JOIN Categories cat ON p.CategoryID = cat.CategoryID
WHERE cat.CategoryName IN ('Electronics', 'Books');

• - MySQL solution
SELECT c.CustomerName, o.TotalAmount, p.ProductName
FROM Customers c
JOIN Orders o ON c.CustomerID = o.CustomerID
JOIN Products p ON o.ProductID = p.ProductID
JOIN Categories cat ON p.CategoryID = cat.CategoryID
WHERE cat.CategoryName IN ('Electronics', 'Books');
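The four-table join can be verified with Python's sqlite3 module. One assumption is flagged here: Orders carries a ProductID column linking each order to a product, since the schema as printed gives no order-to-product link (matching OrderID against ProductID would return nothing):

```python
# Four-table join filtered to two categories.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INT, CustomerName TEXT);
CREATE TABLE Orders (OrderID INT, CustomerID INT, ProductID INT, TotalAmount REAL);
CREATE TABLE Products (ProductID INT, ProductName TEXT, CategoryID INT);
CREATE TABLE Categories (CategoryID INT, CategoryName TEXT);
INSERT INTO Customers VALUES
(1,'John Doe'),(2,'Alice Smith'),(3,'Bob Johnson'),(4,'Emily Brown');
INSERT INTO Orders VALUES
(101,1,1001,100.50),(102,2,1003,200.75),(103,3,1005,350.20),(104,4,1006,450.90);
INSERT INTO Products VALUES
(1001,'Laptop',1),(1002,'Smartphone',1),(1003,'Novel',3),
(1004,'Cookbook',3),(1005,'Headphones',1),(1006,'Action Figure',5);
INSERT INTO Categories VALUES
(1,'Electronics'),(2,'Clothing'),(3,'Books'),(4,'Home Appliances'),(5,'Toys');
""")
rows = conn.execute("""
    SELECT c.CustomerName, o.TotalAmount, p.ProductName
    FROM Customers c
    JOIN Orders o ON c.CustomerID = o.CustomerID
    JOIN Products p ON o.ProductID = p.ProductID
    JOIN Categories cat ON p.CategoryID = cat.CategoryID
    WHERE cat.CategoryName IN ('Electronics', 'Books')
""").fetchall()
# Emily Brown's order (Action Figure, Toys) is filtered out: 3 rows remain.
print(sorted(rows))
```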

• Q.167

Question
Write an SQL query to retrieve the list of employees who have the same manager and
their corresponding manager's name. You need to use a self-join on the Employees
table. The result should show the employee's name and the name of their manager.
Explanation
This question requires performing a self-join on the Employees table. The idea is to
join the table with itself, where one instance of the table represents employees and the
other instance represents their managers. You will need to link the ManagerID (a
foreign key to EmployeeID) to fetch the manager's name for each employee.


Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT,
Name VARCHAR(50),
ManagerID INT
);

• - Insert Records into Employees Table


INSERT INTO Employees (EmployeeID, Name, ManagerID)
VALUES
(1, 'John', NULL), -- John has no manager (top-level)
(2, 'Alice', 1), -- Alice's manager is John
(3, 'Bob', 1), -- Bob's manager is John
(4, 'Emily', 2), -- Emily's manager is Alice
(5, 'Michael', 2), -- Michael's manager is Alice
(6, 'Sophia', 3), -- Sophia's manager is Bob
(7, 'Daniel', 3); -- Daniel's manager is Bob

Learnings

• Using a self-join to join a table with itself.


• Using aliases (e1 and e2) to represent the same table multiple times in a query.
• Handling hierarchical relationships (e.g., employees and their managers) using
joins.

Solutions

• - PostgreSQL solution
SELECT e1.Name AS EmployeeName, e2.Name AS ManagerName
FROM Employees e1
JOIN Employees e2 ON e1.ManagerID = e2.EmployeeID
ORDER BY e1.Name;

• - MySQL solution
SELECT e1.Name AS EmployeeName, e2.Name AS ManagerName
FROM Employees e1
JOIN Employees e2 ON e1.ManagerID = e2.EmployeeID
ORDER BY e1.Name;
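The self-join can be checked with Python's sqlite3 module; note that John, the top-level employee with a NULL ManagerID, drops out of the inner join because NULL matches no EmployeeID:

```python
# Self-join: pair each employee with their manager's name.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INT, Name TEXT, ManagerID INT);
INSERT INTO Employees VALUES
(1,'John',NULL),(2,'Alice',1),(3,'Bob',1),(4,'Emily',2),
(5,'Michael',2),(6,'Sophia',3),(7,'Daniel',3);
""")
rows = conn.execute("""
    SELECT e1.Name AS EmployeeName, e2.Name AS ManagerName
    FROM Employees e1
    JOIN Employees e2 ON e1.ManagerID = e2.EmployeeID
    ORDER BY e1.Name
""").fetchall()
print(rows)
# [('Alice', 'John'), ('Bob', 'John'), ('Daniel', 'Bob'),
#  ('Emily', 'Alice'), ('Michael', 'Alice'), ('Sophia', 'Bob')]
```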

• Q.168

Question
Write an SQL query to select all subjects and the corresponding teacher names.
Include subjects without a corresponding teacher.
Explanation
This question requires using a LEFT JOIN between the Subjects and Teachers
tables. The LEFT JOIN ensures that all records from the Subjects table are returned,
even if there is no corresponding teacher in the Teachers table. If there is no
matching teacher, the teacher's name will be NULL.


Datasets and SQL Schemas

• - Table creation
CREATE TABLE Subjects (
SubjectID INT,
SubjectName VARCHAR(50)
);

CREATE TABLE Teachers (
TeacherID INT,
Name VARCHAR(50),
SubjectID INT
);

• - Insert Records into Subjects Table


INSERT INTO Subjects (SubjectID, SubjectName)
VALUES
(1, 'Mathematics'),
(2, 'Physics'),
(3, 'Biology'),
(4, 'History'),
(5, 'English');

• - Insert Records into Teachers Table


INSERT INTO Teachers (TeacherID, Name, SubjectID)
VALUES
(101, 'Mr. Smith', 1),
(102, 'Ms. Johnson', 2),
(103, 'Mrs. Lee', 3),
(104, 'Mr. Garcia', NULL),
(105, 'Dr. Martinez', 5),
(106, 'Ms. Clark', NULL);

Learnings

• Using a LEFT JOIN to return all records from the left table (Subjects), even if
there is no matching record in the right table (Teachers).
• Handling NULL values when a subject does not have a corresponding teacher.

Solutions

• - PostgreSQL solution
SELECT s.SubjectName, t.Name AS TeacherName
FROM Subjects s
LEFT JOIN Teachers t ON s.SubjectID = t.SubjectID;

• - MySQL solution
SELECT s.SubjectName, t.Name AS TeacherName
FROM Subjects s
LEFT JOIN Teachers t ON s.SubjectID = t.SubjectID;
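A sqlite3 check of the LEFT JOIN: 'History' has no assigned teacher, so its TeacherName comes back as NULL (None in Python), while every subject row is preserved:

```python
# LEFT JOIN keeps all subjects, with NULL for unmatched teachers.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Subjects (SubjectID INT, SubjectName TEXT);
CREATE TABLE Teachers (TeacherID INT, Name TEXT, SubjectID INT);
INSERT INTO Subjects VALUES
(1,'Mathematics'),(2,'Physics'),(3,'Biology'),(4,'History'),(5,'English');
INSERT INTO Teachers VALUES
(101,'Mr. Smith',1),(102,'Ms. Johnson',2),(103,'Mrs. Lee',3),
(104,'Mr. Garcia',NULL),(105,'Dr. Martinez',5),(106,'Ms. Clark',NULL);
""")
rows = conn.execute("""
    SELECT s.SubjectName, t.Name AS TeacherName
    FROM Subjects s
    LEFT JOIN Teachers t ON s.SubjectID = t.SubjectID
""").fetchall()
print(sorted(rows, key=str))
```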

• Q.169

Question


Write an SQL query to retrieve a list of all products and their respective sales,
including products that have not been sold yet. The result should show the product
name, sales amount, and quantity sold. For products that have not been sold, the sales
amount and quantity sold should be NULL. Use a RIGHT JOIN between the Products
and Sales tables.
Explanation
This question requires performing a RIGHT JOIN between the Products and Sales
tables. The goal is to ensure that all products are listed, even if they have no
corresponding sales records. For those products with no sales, the sales-related fields
(SalesAmount, QuantitySold) should show NULL.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Products (
ProductID INT,
ProductName VARCHAR(50)
);

CREATE TABLE Sales (
SaleID INT,
ProductID INT,
SalesAmount DECIMAL(10, 2),
QuantitySold INT
);

• - Insert Records into Products Table (10+ products)


INSERT INTO Products (ProductID, ProductName)
VALUES
(1, 'Laptop'),
(2, 'Smartphone'),
(3, 'Headphones'),
(4, 'Tablet'),
(5, 'Smartwatch'),
(6, 'Monitor'),
(7, 'Keyboard'),
(8, 'Mouse'),
(9, 'External Hard Drive'),
(10, 'Printer'),
(11, 'Speakers'),
(12, 'Smart TV'),
(13, 'USB Drive'),
(14, 'Webcam'),
(15, 'Charger');

• - Insert Records into Sales Table (10+ sales records)


INSERT INTO Sales (SaleID, ProductID, SalesAmount, QuantitySold)
VALUES
(1, 1, 1200.50, 3),
(2, 2, 800.75, 5),
(3, 4, 500.00, 2),
(4, 6, 300.25, 7),
(5, 7, 150.00, 10),
(6, 8, 80.00, 15),
(7, 10, 200.50, 6),
(8, 12, 950.75, 4),
(9, 13, 25.00, 30),

(10, 2, 850.50, 8),
(11, 14, 50.00, 20);

Learnings

• Using a RIGHT JOIN to keep all records from the right table (Products) and matching records from the left table (Sales).
• Handling NULL values for products with no sales records.
• Retrieving data where some rows may not have corresponding records in one of the tables.

Solutions

• - PostgreSQL solution
-- Products must be on the right-hand side of the RIGHT JOIN so that
-- unsold products are kept, with NULL in the sales columns.
SELECT p.ProductName, s.SalesAmount, s.QuantitySold
FROM Sales s
RIGHT JOIN Products p ON s.ProductID = p.ProductID
ORDER BY p.ProductName;

• - MySQL solution
SELECT p.ProductName, s.SalesAmount, s.QuantitySold
FROM Sales s
RIGHT JOIN Products p ON s.ProductID = p.ProductID
ORDER BY p.ProductName;
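To keep every product, Products must be the preserved side of the outer join. Older sqlite3 builds lack RIGHT JOIN, so the check below uses the equivalent Products LEFT JOIN Sales, which preserves the same side:

```python
# Equivalent of "Sales RIGHT JOIN Products": unsold products surface
# with NULL sales fields.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductID INT, ProductName TEXT);
CREATE TABLE Sales (SaleID INT, ProductID INT, SalesAmount REAL, QuantitySold INT);
INSERT INTO Products VALUES
(1,'Laptop'),(2,'Smartphone'),(3,'Headphones'),(4,'Tablet'),(5,'Smartwatch'),
(6,'Monitor'),(7,'Keyboard'),(8,'Mouse'),(9,'External Hard Drive'),(10,'Printer'),
(11,'Speakers'),(12,'Smart TV'),(13,'USB Drive'),(14,'Webcam'),(15,'Charger');
INSERT INTO Sales VALUES
(1,1,1200.50,3),(2,2,800.75,5),(3,4,500.00,2),(4,6,300.25,7),(5,7,150.00,10),
(6,8,80.00,15),(7,10,200.50,6),(8,12,950.75,4),(9,13,25.00,30),
(10,2,850.50,8),(11,14,50.00,20);
""")
rows = conn.execute("""
    SELECT p.ProductName, s.SalesAmount, s.QuantitySold
    FROM Products p
    LEFT JOIN Sales s ON s.ProductID = p.ProductID
    ORDER BY p.ProductName
""").fetchall()
unsold = [r[0] for r in rows if r[1] is None]
print(sorted(unsold))
# ['Charger', 'External Hard Drive', 'Headphones', 'Smartwatch', 'Speakers']
```

The Smartphone row appears twice (two sales), so 15 products yield 16 result rows.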

• Q.170

Question
Write an SQL query to select all grades along with the student names. Include grades
without a corresponding student.
Explanation
This question requires using a LEFT JOIN between the Grades and Students tables.
The LEFT JOIN ensures that all records from the Grades table are returned, even if
there is no corresponding student in the Students table. If there is no matching
student, the student name will be NULL.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE Students (
StudentID INT,
Name VARCHAR(50)
);

CREATE TABLE Grades (
StudentID INT,
Grade VARCHAR(2)
);

• - Insert Records into Students Table

INSERT INTO Students (StudentID, Name)
VALUES
(1, 'John'),
(2, 'Alice'),
(3, 'Bob'),
(4, 'Emily');

• - Insert Records into Grades Table


INSERT INTO Grades (StudentID, Grade)
VALUES
(1, 'A'),
(2, 'B'),
(NULL, 'C'),
(4, 'B');

Learnings

• Using a LEFT JOIN to ensure all records from one table (Grades) are included,
even when there is no matching record in the other table (Students).
• Handling NULL values when there is no corresponding match in the Students
table.

Solutions

• - PostgreSQL solution
SELECT g.Grade, s.Name
FROM Grades g
LEFT JOIN Students s ON g.StudentID = s.StudentID;

• - MySQL solution
SELECT g.Grade, s.Name
FROM Grades g
LEFT JOIN Students s ON g.StudentID = s.StudentID;

Subqueries

• Q.171

Question
Select the employee name and their date of joining (DOJ) who joined the earliest from
the Employees table.
Explanation
You need to find the employee with the earliest DOJ (Date of Joining) and select their
name and the DOJ field. Use MIN() to find the earliest date.

Explanation:

• Subquery: (SELECT MIN(DOJ) FROM Employees) finds the earliest date of
joining from the Employees table.
• Main query: The main query uses that result to select the employee name
and DOJ for the employee who has that earliest DOJ.


Datasets and SQL Schemas


-- Table creation
CREATE TABLE Employees (
EmployeeID INT,
EmployeeName VARCHAR(50),
DOJ DATE,
Department VARCHAR(50),
Salary DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Employees (EmployeeID, EmployeeName, DOJ, Department, Salary) VALUES
(1, 'Alice', '2015-01-10', 'Engineering', 100000.00),
(2, 'Bob', '2017-06-25', 'HR', 95000.00),
(3, 'Charlie', '2014-08-15', 'Marketing', 70000.00),
(4, 'David', '2019-04-01', 'Engineering', 75000.00),
(5, 'Eve', '2016-03-22', 'HR', 80000.00);

Solutions

• - PostgreSQL solution
-- PostgreSQL solution
SELECT EmployeeName, DOJ AS EarliestDOJ
FROM Employees
WHERE DOJ = (SELECT MIN(DOJ) FROM Employees);

• - MySQL solution
-- MySQL solution
SELECT EmployeeName, DOJ AS EarliestDOJ
FROM Employees
WHERE DOJ = (SELECT MIN(DOJ) FROM Employees);
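The MIN() scalar subquery can be verified quickly with Python's built-in sqlite3 module. A minimal sketch (the in-memory database and trimmed column list are assumptions for brevity, not part of the book's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INT, EmployeeName VARCHAR(50), DOJ DATE);
INSERT INTO Employees VALUES
(1,'Alice','2015-01-10'),(2,'Bob','2017-06-25'),
(3,'Charlie','2014-08-15'),(4,'David','2019-04-01'),(5,'Eve','2016-03-22');
""")
rows = conn.execute("""
    SELECT EmployeeName, DOJ AS EarliestDOJ
    FROM Employees
    WHERE DOJ = (SELECT MIN(DOJ) FROM Employees)
""").fetchall()
print(rows)  # Charlie joined first, on 2014-08-15
```

Note that ISO-8601 date strings ('YYYY-MM-DD') compare correctly even as text, which is why MIN() works on the DOJ column in SQLite as well as in PostgreSQL and MySQL.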

• Q.172

Question
Write an SQL query to select all employees who belong to the department with ID 5.
Explanation
This query requires a simple SELECT statement with a WHERE clause that filters the
employees based on their DepartmentID. The condition should be DepartmentID =
5.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
ID INT,
Name VARCHAR(50),
DepartmentID INT
);

• - Insert Records into Employees Table


INSERT INTO Employees (ID, Name, DepartmentID)
VALUES
(1, 'John', 5),
(2, 'Alice', 3),
(3, 'Bob', 5),
(4, 'Jane', 7),
(5, 'Sam', 5),
(6, 'Emily', 2),
(7, 'Michael', 5),
(8, 'Sophia', 4),
(9, 'Daniel', 5),
(10, 'Olivia', 6),
(11, 'William', 5),
(12, 'Ava', 5),
(13, 'James', 5),
(14, 'Emma', 3),
(15, 'Alexander', 5),
(16, 'Mia', 5),
(17, 'Benjamin', 1),
(18, 'Charlotte', 5),
(19, 'Ethan', 5),
(20, 'Amelia', 5),
(21, 'Elijah', 5),
(22, 'Harper', 5),
(23, 'Henry', 5),
(24, 'Evelyn', 5),
(25, 'Jacob', 5),
(26, 'Abigail', 5),
(27, 'Matthew', 5),
(28, 'Emily', 5),
(29, 'David', 5),
(30, 'Liam', 5),
(31, 'Avery', 5),
(32, 'Michael', 5),
(33, 'Sofia', 5),
(34, 'Lucas', 5),
(35, 'Madison', 5);

Learnings

• Filtering data using a WHERE clause to find specific records based on column
values.

Solutions

• - PostgreSQL solution
SELECT *
FROM Employees
WHERE DepartmentID = 5;

• - MySQL solution
SELECT *
FROM Employees
WHERE DepartmentID = 5;

• Q.173

Question
Write a SQL query to select the order with the highest total amount.
Explanation
To find the order with the highest total amount, we can use a subquery. The subquery
will first find the maximum TotalAmount from the Orders table, and then the outer
query will select the order that has this maximum value.


Datasets and SQL Schemas

• - Table creation
CREATE TABLE Orders (
OrderID INT,
CustomerID INT,
TotalAmount DECIMAL(10, 2)
);

• - Insert Records into Orders Table


INSERT INTO Orders (OrderID, CustomerID, TotalAmount)
VALUES
(1, 101, 100.50),
(2, 102, 200.75),
(3, 103, 350.20),
(4, 104, 450.90),
(5, 105, 550.25),
(6, 106, 450.00),
(7, 107, 600.75),
(8, 108, 700.50),
(9, 109, 800.25),
(10, 110, 900.00),
(11, 111, 1000.50),
(12, 112, 1050.75),
(13, 113, 1200.20),
(14, 114, 1350.90),
(15, 115, 1450.25),
(16, 116, 1550.00),
(17, 117, 1600.75),
(18, 118, 1700.50),
(19, 119, 1800.25),
(20, 120, 1900.00),
(21, 121, 2000.50),
(22, 122, 2100.75),
(23, 123, 2200.20),
(24, 124, 2300.90),
(25, 125, 2400.25),
(26, 126, 2500.00),
(27, 127, 2600.75),
(28, 128, 2700.50),
(29, 129, 2800.25),
(30, 130, 2900.00),
(31, 131, 3000.50),
(32, 132, 3100.75),
(33, 133, 3200.20),
(34, 134, 3300.90),
(35, 135, 3400.25);

Learnings

• Using a subquery to first find the maximum value and then using it in the
outer query to filter the result.
• Subqueries in SELECT or WHERE clauses to perform aggregate operations.

Solutions

• - PostgreSQL solution
SELECT *
FROM Orders
WHERE TotalAmount = (SELECT MAX(TotalAmount) FROM Orders);


• - MySQL solution
SELECT *
FROM Orders
WHERE TotalAmount = (SELECT MAX(TotalAmount) FROM Orders);

• Q.174

Question
Write an SQL query to retrieve employee details from each department who have a
salary greater than the average salary in their department.

Explanation
This task requires a correlated subquery. For each employee, compare their salary
with the average salary in their department. The subquery should calculate the
average salary for the department, and the main query should return employees whose
salary exceeds that average.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
Emp_No DECIMAL(4,0) NOT NULL,
Emp_Name VARCHAR(10),
Job_Name VARCHAR(9),
Manager_Id DECIMAL(4,0),
HireDate DATE,
Salary DECIMAL(7,2),
Commission DECIMAL(7,2),
Department VARCHAR(20)
);

• - Insert data
INSERT INTO Employees (Emp_No, Emp_Name, Job_Name, Manager_Id, HireDate, Salary,
Commission, Department) VALUES
(7839, 'KING', 'PRESIDENT', NULL, '1981-11-17', 5000, NULL, 'IT'),
(7698, 'BLAKE', 'MANAGER', 7839, '1981-05-01', 2850, NULL, 'HR'),
(7782, 'CLARK', 'MANAGER', 7839, '1981-06-09', 2450, NULL, 'Marketing'),
(7566, 'JONES', 'MANAGER', 7839, '1981-04-02', 2975, NULL, 'Operations'),
(7788, 'SCOTT', 'ANALYST', 7566, '1987-07-29', 3000, NULL, 'Operations'),
(7902, 'FORD', 'ANALYST', 7566, '1981-12-03', 3000, NULL, 'Operations'),
(7369, 'SMITH', 'CLERK', 7902, '1980-12-17', 800, NULL, 'Operations'),
(7499, 'ALLEN', 'SALESMAN', 7698, '1981-02-20', 1600, 300, 'HR'),
(7521, 'WARD', 'SALESMAN', 7698, '1981-02-22', 1250, 500, 'HR'),
(7654, 'MARTIN', 'SALESMAN', 7698, '1981-09-28', 1250, 1400, 'HR'),
(7844, 'TURNER', 'SALESMAN', 7698, '1981-09-08', 1500, 0, 'HR'),
(7876, 'ADAMS', 'CLERK', 7788, '1987-06-02', 1100, NULL, 'Operations'),
(7900, 'JAMES', 'CLERK', 7698, '1981-12-03', 950, NULL, 'HR'),
(7934, 'MILLER', 'CLERK', 7782, '1982-01-23', 1300, NULL, 'Marketing'),
(7905, 'BROWN', 'SALESMAN', 7698, '1981-11-12', 1250, 1400, 'HR'),
(7906, 'DAVIS', 'ANALYST', 7566, '1987-07-13', 3000, NULL, 'Operations'),
(7907, 'GARCIA', 'MANAGER', 7839, '1981-08-12', 2975, NULL, 'IT'),
(7908, 'HARRIS', 'SALESMAN', 7698, '1981-06-21', 1600, 300, 'HR'),
(7909, 'JACKSON', 'CLERK', 7902, '1981-11-17', 800, NULL, 'Operations'),
(7910, 'JOHNSON', 'MANAGER', 7839, '1981-04-02', 2850, NULL, 'Marketing'),
(7911, 'LEE', 'ANALYST', 7566, '1981-09-28', 1250, 1400, 'Operations'),
(7912, 'MARTINEZ', 'CLERK', 7902, '1981-12-03', 1250, NULL, 'Operations'),
(7913, 'MILLER', 'MANAGER', 7839, '1981-01-23', 2450, NULL, 'HR'),
(7914, 'RODRIGUEZ', 'SALESMAN', 7698, '1981-12-03', 1500, 0, 'Marketing'),
(7915, 'SMITH', 'CLERK', 7902, '1980-12-17', 1100, NULL, 'IT'),
(7916, 'TAYLOR', 'CLERK', 7902, '1981-02-20', 950, NULL, 'Marketing'),
(7917, 'THOMAS', 'SALESMAN', 7698, '1981-02-22', 1250, 500, 'Operations'),
(7918, 'WHITE', 'ANALYST', 7566, '1981-09-28', 1300, NULL, 'IT'),
(7919, 'WILLIAMS', 'MANAGER', 7839, '1981-11-17', 5000, NULL, 'Marketing'),
(7920, 'WILSON', 'SALESMAN', 7698, '1981-05-01', 2850, NULL, 'HR'),
(7921, 'YOUNG', 'CLERK', 7902, '1981-06-09', 2450, NULL, 'Operations'),
(7922, 'ADAMS', 'ANALYST', 7566, '1987-07-13', 3000, NULL, 'HR'),
(7923, 'BROWN', 'MANAGER', 7839, '1981-08-12', 2975, NULL, 'Marketing'),
(7924, 'DAVIS', 'SALESMAN', 7698, '1981-06-21', 1600, 300, 'Operations');

Learnings

• Correlated subqueries are used for row-by-row comparisons.
• Use of aggregate functions like AVG() to calculate average salaries.
• Filtering results based on computed aggregates.

Solutions

• - PostgreSQL solution
SELECT Emp_No, Emp_Name, Job_Name, Salary, Department
FROM Employees e1
WHERE Salary > (
SELECT AVG(Salary)
FROM Employees e2
WHERE e1.Department = e2.Department
);

• - MySQL solution
SELECT Emp_No, Emp_Name, Job_Name, Salary, Department
FROM Employees e1
WHERE Salary > (
SELECT AVG(Salary)
FROM Employees e2
WHERE e1.Department = e2.Department
);
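The per-department correlated subquery can be exercised with sqlite3. This sketch uses a deliberately trimmed, hypothetical subset of the employee rows (two departments, five people) so the expected averages are easy to check by hand:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (Emp_Name VARCHAR(10), Department VARCHAR(20), Salary DECIMAL(7,2));
-- Trimmed sample: IT average is 3050, HR average is 1800.
INSERT INTO Employees VALUES
('KING','IT',5000),('SMITH','IT',1100),
('BLAKE','HR',2850),('JAMES','HR',950),('ALLEN','HR',1600);
""")
rows = conn.execute("""
    SELECT Emp_Name, Salary, Department
    FROM Employees e1
    WHERE Salary > (SELECT AVG(Salary) FROM Employees e2
                    WHERE e1.Department = e2.Department)
""").fetchall()
print(rows)  # only KING (5000 > 3050) and BLAKE (2850 > 1800) qualify
```

The inner query is re-evaluated per outer row, but its result only changes per department, which is why the comparison behaves like a per-group filter.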

• Q.175

Question
Find the details of employees whose salary is greater than the average salary across
the entire company.

Explanation
This problem involves calculating the overall average salary for all employees in the
company and retrieving the details of employees whose salary exceeds this average. A
subquery can be used to compute the average salary, which is then compared with
each employee's salary in the main query.

Datasets and SQL Schemas

• - Table creation

CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
employee_name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10, 2)
);

• - Insert data
INSERT INTO employees (employee_name, department, salary)
VALUES
('John Doe', 'HR', 50000.00),
('Jane Smith', 'HR', 55000.00),
('Michael Johnson', 'HR', 60000.00),
('Emily Davis', 'IT', 60000.00),
('David Brown', 'IT', 65000.00),
('Sarah Wilson', 'Finance', 70000.00),
('Robert Taylor', 'Finance', 75000.00),
('Jennifer Martinez', 'Finance', 80000.00);

Learnings

• Using aggregate functions like AVG() for overall calculations.
• Subqueries can be used to calculate a global value (like the average salary).
• Filtering with comparison operators based on a calculated value.

Solutions

• - PostgreSQL solution
SELECT employee_id, employee_name, department, salary
FROM employees
WHERE salary > (
SELECT AVG(salary)
FROM employees
);

• - MySQL solution
SELECT employee_id, employee_name, department, salary
FROM employees
WHERE salary > (
SELECT AVG(salary)
FROM employees
);

• Q.176

Question
Write a SQL query to find the names of managers who have at least five direct
reports. Ensure that no employee is their own manager. Return the result table in any
order.

Explanation
To solve this, we need to identify employees who have at least five direct reports. A
direct report is an employee whose managerId matches the id of another employee.


A subquery can be used to count the number of direct reports for each manager and
filter managers who have five or more direct reports.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
id INT PRIMARY KEY,
name VARCHAR(255),
department VARCHAR(255),
managerId INT
);

• - Insert data
INSERT INTO Employees (id, name, department, managerId) VALUES
(101, 'John', 'A', NULL),
(102, 'Dan', 'A', 101),
(103, 'James', 'A', 101),
(104, 'Amy', 'A', 101),
(105, 'Anne', 'A', 101),
(106, 'Ron', 'B', 101),
(107, 'Michael', 'C', NULL),
(108, 'Sarah', 'C', 107),
(109, 'Emily', 'C', 107),
(110, 'Brian', 'C', 107);

Learnings

• Self-joins or subqueries can be used to count related rows (i.e., direct reports).
• Filtering based on the count of related rows (employees managed by each
manager).
• Ensuring there are no circular relationships (no employee is their own
manager).

Solutions

• - PostgreSQL solution
SELECT name
FROM Employees
WHERE id IN (
SELECT managerId
FROM Employees
WHERE managerId IS NOT NULL
GROUP BY managerId
HAVING COUNT(*) >= 5
);

• - MySQL solution
SELECT name
FROM Employees
WHERE id IN (
SELECT managerId
FROM Employees
WHERE managerId IS NOT NULL
GROUP BY managerId
HAVING COUNT(*) >= 5
);
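The GROUP BY / HAVING subquery can be confirmed against the book's dataset with sqlite3 (the in-memory database is the only assumption; the rows and query are the ones above, minus the unused department column):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (id INT PRIMARY KEY, name VARCHAR(255), managerId INT);
INSERT INTO Employees VALUES
(101,'John',NULL),(102,'Dan',101),(103,'James',101),(104,'Amy',101),
(105,'Anne',101),(106,'Ron',101),
(107,'Michael',NULL),(108,'Sarah',107),(109,'Emily',107),(110,'Brian',107);
""")
rows = conn.execute("""
    SELECT name FROM Employees
    WHERE id IN (SELECT managerId FROM Employees
                 WHERE managerId IS NOT NULL
                 GROUP BY managerId
                 HAVING COUNT(*) >= 5)
""").fetchall()
print(rows)  # John has five direct reports; Michael has only three
```

The WHERE managerId IS NOT NULL filter also guarantees no employee is counted as their own manager in this dataset, satisfying the question's constraint.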

• Q.177

Question
Write an SQL query to find the average market price of houses in each state and city,
where the average market price exceeds 300,000, using a correlated subquery.

Explanation
The task is to calculate the average market price for each state and city using a
correlated subquery. For each row, the subquery will calculate the average market
price of houses in the same state and city, and the main query will filter out those
where the average market price is greater than 300,000.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE house_price (
id INT,
state VARCHAR(255),
city VARCHAR(255),
street_address VARCHAR(255),
mkt_price INT
);

• - Insert data
INSERT INTO house_price (id, state, city, street_address, mkt_price) VALUES
(1, 'NY', 'New York City', '66 Trout Drive', 449761),
(2, 'NY', 'New York City', 'Atwater', 277527),
(3, 'NY', 'New York City', '58 Gates Street', 268394),
(4, 'NY', 'New York City', 'Norcross', 279929),
(5, 'NY', 'New York City', '337 Shore Ave.', 151592),
(6, 'NY', 'New York City', 'Plainfield', 624531),
(7, 'NY', 'New York City', '84 Central Street', 267345),
(8, 'NY', 'New York City', 'Passaic', 88504),
(9, 'NY', 'New York City', '951 Fulton Road', 270476),
(10, 'NY', 'New York City', 'Oxon Hill', 118112),
(11, 'CA', 'Los Angeles', '692 Redwood Court', 150707),
(12, 'CA', 'Los Angeles', 'Lewiston', 463180),
(13, 'CA', 'Los Angeles', '8368 West Acacia Ave.', 538865),
(14, 'CA', 'Los Angeles', 'Pearl', 390896),
(15, 'CA', 'Los Angeles', '8206 Old Riverview Rd.', 117754),
(16, 'CA', 'Los Angeles', 'Seattle', 424588),
(17, 'CA', 'Los Angeles', '7227 Joy Ridge Rd.', 156850),
(18, 'CA', 'Los Angeles', 'Battle Ground', 643454),
(19, 'CA', 'Los Angeles', '233 Bedford Ave.', 713841),
(20, 'CA', 'Los Angeles', 'Saint Albans', 295852),
(21, 'IL', 'Chicago', '8830 Baker St.', 12944),
(22, 'IL', 'Chicago', 'Watertown', 410766),
(23, 'IL', 'Chicago', '632 Princeton St.', 160696),
(24, 'IL', 'Chicago', 'Waxhaw', 464144),
(25, 'IL', 'Chicago', '7773 Tailwater Drive', 129393),
(26, 'IL', 'Chicago', 'Bonita Springs', 174886),
(27, 'IL', 'Chicago', '31 Summerhouse Rd.', 296008),
(28, 'IL', 'Chicago', 'Middleburg', 279000),
(29, 'IL', 'Chicago', '273 Windfall Avenue', 424846),
(30, 'IL', 'Chicago', 'Graham', 592268),
(31, 'TX', 'Houston', '91 Canterbury Dr.', 632014),
(32, 'TX', 'Houston', 'Dallas', 68868),
(33, 'TX', 'Houston', '503 Elmwood St.', 454184),
(34, 'TX', 'Houston', 'Kennewick', 186280),
(35, 'TX', 'Houston', '739 Chapel Street', 334474),
(36, 'TX', 'Houston', 'San Angelo', 204460),
(37, 'TX', 'Houston', '572 Parker Dr.', 678443),
(38, 'TX', 'Houston', 'Bellmore', 401090),
(39, 'TX', 'Houston', '8653 South Oxford Street', 482214),
(40, 'TX', 'Houston', 'Butler', 330868),
(41, 'AZ', 'Phoenix', '8667 S. Joy Ridge Court', 316291),
(42, 'AZ', 'Phoenix', 'Torrance', 210392),
(43, 'AZ', 'Phoenix', '35 Harvard St.', 167502),
(44, 'AZ', 'Phoenix', 'Nutley', 327554),
(45, 'AZ', 'Phoenix', '7313 Vermont St.', 285135),
(46, 'AZ', 'Phoenix', 'Lemont', 577667),
(47, 'AZ', 'Phoenix', '8905 Buttonwood Dr.', 212301),
(48, 'AZ', 'Phoenix', 'Lafayette', 317504);

Learnings

• A correlated subquery is a subquery that references columns from the outer
query.
• The subquery is evaluated for each row in the outer query.
• This technique can be used for calculating aggregates such as averages, sums,
counts, etc.

Solutions

• - PostgreSQL solution
SELECT state, city, AVG(mkt_price) AS avg_mkt_price
FROM house_price hp1
WHERE (
SELECT AVG(mkt_price)
FROM house_price hp2
WHERE hp1.state = hp2.state AND hp1.city = hp2.city
) > 300000
GROUP BY state, city;

• - MySQL solution
SELECT state, city, AVG(mkt_price) AS avg_mkt_price
FROM house_price hp1
WHERE (
SELECT AVG(mkt_price)
FROM house_price hp2
WHERE hp1.state = hp2.state AND hp1.city = hp2.city
) > 300000
GROUP BY state, city;
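When the interviewer does not insist on a correlated subquery, the same result is usually expressed more cheaply with GROUP BY and HAVING. A sqlite3 sketch of that equivalent form, using a trimmed two-city sample (the tiny dataset is an assumption made so the averages are checkable by hand):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE house_price (state VARCHAR(255), city VARCHAR(255), mkt_price INT);
-- NY/New York City averages 363644; IL/Chicago averages 211855.
INSERT INTO house_price VALUES
('NY','New York City',449761),('NY','New York City',277527),
('IL','Chicago',12944),('IL','Chicago',410766);
""")
rows = conn.execute("""
    SELECT state, city, AVG(mkt_price) AS avg_mkt_price
    FROM house_price
    GROUP BY state, city
    HAVING AVG(mkt_price) > 300000
""").fetchall()
print(rows)  # only the NY / New York City group clears 300,000
```

HAVING filters after aggregation, so the average is computed once per group instead of once per row, which is why this form typically scans the table fewer times than the correlated version.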

• Q.178

Question
Find the customer with the highest total purchase value based on their orders and
order items.

Explanation
The task is to calculate the total purchase value for each customer by multiplying the
quantity of each item ordered by its unit price, then summing the values for all items
ordered by each customer. The customer with the highest total purchase value should
be returned.

Datasets and SQL Schemas

• - Create tables
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
CustomerName VARCHAR(100)
);

CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE,
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

CREATE TABLE OrderItems (
OrderItemID INT PRIMARY KEY,
OrderID INT,
ProductName VARCHAR(100),
Quantity INT,
UnitPrice DECIMAL(10, 2),
FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
);

• - Insert data
INSERT INTO Customers VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Alice Johnson');

INSERT INTO Orders VALUES
(101, 1, '2025-01-01'),
(102, 2, '2025-01-02'),
(103, 3, '2025-01-03');

INSERT INTO OrderItems VALUES
(201, 101, 'Laptop', 1, 1000.00),
(202, 102, 'Mouse', 2, 25.00),
(203, 102, 'Keyboard', 1, 50.00),
(204, 103, 'Monitor', 1, 200.00);

Learnings

• Subqueries: A correlated subquery is used to calculate the total purchase
value for each customer.
• Joins: The Orders table is joined with OrderItems to link customer orders
with the individual items they purchased.
• Aggregation: The total purchase value is calculated by summing the quantity
of items multiplied by their unit price.
• Ordering: The query orders the results by the total value in descending order
to find the customer with the highest total purchase.

Solutions

• - PostgreSQL solution


SELECT CustomerName
FROM Customers
WHERE CustomerID = (
SELECT CustomerID
FROM Orders
JOIN OrderItems ON Orders.OrderID = OrderItems.OrderID
GROUP BY CustomerID
ORDER BY SUM(Quantity * UnitPrice) DESC
LIMIT 1
);

• - MySQL solution
SELECT CustomerName
FROM Customers
WHERE CustomerID = (
SELECT CustomerID
FROM Orders
JOIN OrderItems ON Orders.OrderID = OrderItems.OrderID
GROUP BY CustomerID
ORDER BY SUM(Quantity * UnitPrice) DESC
LIMIT 1
);
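The ORDER BY ... LIMIT 1 scalar subquery works in SQLite too, so the whole pattern can be checked with sqlite3. A minimal sketch with the book's rows (the OrderItems columns are trimmed to the ones the query touches):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers  (CustomerID INT PRIMARY KEY, CustomerName VARCHAR(100));
CREATE TABLE Orders     (OrderID INT PRIMARY KEY, CustomerID INT);
CREATE TABLE OrderItems (OrderID INT, Quantity INT, UnitPrice DECIMAL(10,2));
INSERT INTO Customers VALUES (1,'John Doe'),(2,'Jane Smith'),(3,'Alice Johnson');
INSERT INTO Orders VALUES (101,1),(102,2),(103,3);
-- Totals: John 1000.00, Jane 2*25 + 50 = 100.00, Alice 200.00
INSERT INTO OrderItems VALUES (101,1,1000.00),(102,2,25.00),(102,1,50.00),(103,1,200.00);
""")
rows = conn.execute("""
    SELECT CustomerName FROM Customers
    WHERE CustomerID = (SELECT CustomerID
                        FROM Orders
                        JOIN OrderItems ON Orders.OrderID = OrderItems.OrderID
                        GROUP BY CustomerID
                        ORDER BY SUM(Quantity * UnitPrice) DESC
                        LIMIT 1)
""").fetchall()
print(rows)  # John Doe's single laptop order is the largest total
```

One caveat worth mentioning in an interview: LIMIT 1 silently picks one winner on ties, so when ties must all be returned, compare against MAX of the per-customer sums instead.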

• Q.179

Question
Find employees who earn above the average salary of their department.

Explanation
The task is to find employees whose salary is greater than the average salary of
employees in the same department. A correlated subquery is used to calculate the
average salary for each department while comparing it against each employee’s salary
in the main query.

Datasets and SQL Schemas

• - Create tables
CREATE TABLE Departments (
DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(100)
);

CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100),
DepartmentID INT,
Salary DECIMAL(10, 2),
FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);

• - Insert data
INSERT INTO Departments VALUES
(1, 'Engineering'),
(2, 'Marketing'),
(3, 'Finance');

INSERT INTO Employees VALUES
(1, 'Alice', 1, 80000.00),
(2, 'Bob', 1, 60000.00),
(3, 'Charlie', 2, 70000.00),
(4, 'David', 2, 75000.00),
(5, 'Eve', 3, 90000.00);

Learnings

• Correlated Subqueries: The subquery references a column from the outer
query (DepartmentID) to calculate the average salary within the same
department.
• Comparison: The main query filters employees whose salary is greater than
the computed average salary for their respective department.

Solutions

• - PostgreSQL solution
SELECT EmployeeName
FROM Employees E
WHERE Salary > (
SELECT AVG(Salary)
FROM Employees
WHERE DepartmentID = E.DepartmentID
);

• - MySQL solution
SELECT EmployeeName
FROM Employees E
WHERE Salary > (
SELECT AVG(Salary)
FROM Employees
WHERE DepartmentID = E.DepartmentID
);

• Q.180

Question
Find the names of employees who earn more than the average salary of employees in
the same department, but only for departments where the average salary is greater
than $50,000.

Explanation
This query involves two key parts:

1. Correlated Subquery: For each employee, we compute the average salary in
their department and check if their salary is above that average.

2. Subquery Condition: The outer query filters only those departments where
the average salary is greater than $50,000.

Datasets and SQL Schemas

• - Create tables

243
1000+ SQL Interview Questions & Answers | By Zero Analyst

CREATE TABLE Departments (


DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(100)
);

CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100),
DepartmentID INT,
Salary DECIMAL(10, 2),
FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);

• - Insert data
INSERT INTO Departments VALUES
(1, 'Engineering'),
(2, 'Marketing'),
(3, 'Finance');

INSERT INTO Employees VALUES
(1, 'Alice', 1, 80000.00),
(2, 'Bob', 1, 60000.00),
(3, 'Charlie', 2, 70000.00),
(4, 'David', 2, 75000.00),
(5, 'Eve', 3, 90000.00),
(6, 'Frank', 3, 45000.00),
(7, 'Grace', 1, 120000.00);

Learnings

• Correlated Subqueries: The inner subquery calculates the average salary for
each department.
• Subquery Filtering: The outer query filters only departments where the
average salary exceeds $50,000.
• Aggregation: The AVG() function is used for calculating the average salary in
each department.

Solutions

• - PostgreSQL solution
SELECT EmployeeName
FROM Employees E
WHERE Salary > (
SELECT AVG(Salary)
FROM Employees
WHERE DepartmentID = E.DepartmentID
)
AND E.DepartmentID IN (
SELECT DepartmentID
FROM Employees
GROUP BY DepartmentID
HAVING AVG(Salary) > 50000
);

• - MySQL solution
SELECT EmployeeName
FROM Employees E
WHERE Salary > (
SELECT AVG(Salary)
FROM Employees
WHERE DepartmentID = E.DepartmentID
)
AND E.DepartmentID IN (
SELECT DepartmentID
FROM Employees
GROUP BY DepartmentID
HAVING AVG(Salary) > 50000
);

Explanation of the Query:

• The inner subquery inside the WHERE clause calculates the average salary for
each department.
• The outer query compares each employee’s salary with the average salary of
their department.
• The second subquery in the AND condition ensures that only those departments
where the average salary is greater than $50,000 are considered.
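Both conditions can be exercised together with sqlite3 against this question's dataset. A minimal sketch (only the columns the query needs are kept):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeName VARCHAR(100), DepartmentID INT, Salary DECIMAL(10,2));
-- Dept 1 avg ~86,667; dept 2 avg 72,500; dept 3 avg 67,500 (all above 50,000).
INSERT INTO Employees VALUES
('Alice',1,80000),('Bob',1,60000),('Charlie',2,70000),('David',2,75000),
('Eve',3,90000),('Frank',3,45000),('Grace',1,120000);
""")
rows = conn.execute("""
    SELECT EmployeeName
    FROM Employees E
    WHERE Salary > (SELECT AVG(Salary) FROM Employees
                    WHERE DepartmentID = E.DepartmentID)
      AND E.DepartmentID IN (SELECT DepartmentID FROM Employees
                             GROUP BY DepartmentID
                             HAVING AVG(Salary) > 50000)
""").fetchall()
print(rows)
```

Grace's 120,000 salary drags Engineering's average above Alice's 80,000, so Alice drops out even though she out-earns Bob, which is a useful edge case to mention when discussing correlated averages.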

Common Table Expressions (CTEs)

• Q.181

Question
Write a query to find employees whose salary is greater than the average salary of all
employees.

Explanation
This can be solved using a CTE to calculate the average salary first, then filter
employees based on this average.

Datasets and SQL Schemas

• - Create tables
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100),
Salary DECIMAL(10, 2)
);

• - Insert data
INSERT INTO Employees VALUES
(1, 'John Doe', 50000.00),
(2, 'Jane Smith', 60000.00),
(3, 'Alice Johnson', 70000.00),
(4, 'Bob Brown', 65000.00);

Learnings

• CTE (Common Table Expression): Using a CTE to calculate the average
salary in a separate step.
• Filtering: Using the result of the CTE to filter employees with salaries above
the average.


MySQL & Postgres Solution


WITH AvgSalary AS (
SELECT AVG(Salary) AS avg_salary
FROM Employees
)
SELECT EmployeeName
FROM Employees
WHERE Salary > (SELECT avg_salary FROM AvgSalary);
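SQLite supports the WITH clause, so this CTE runs unchanged under sqlite3. A minimal check against the question's rows (the in-memory database and trimmed column list are the only assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeName VARCHAR(100), Salary DECIMAL(10,2));
INSERT INTO Employees VALUES
('John Doe',50000),('Jane Smith',60000),('Alice Johnson',70000),('Bob Brown',65000);
""")
rows = conn.execute("""
    WITH AvgSalary AS (SELECT AVG(Salary) AS avg_salary FROM Employees)
    SELECT EmployeeName FROM Employees
    WHERE Salary > (SELECT avg_salary FROM AvgSalary)
""").fetchall()
print(rows)  # the company average is 61,250, so only two names qualify
```

The CTE computes the average once by name, which keeps the filter readable compared with repeating the aggregate subquery inline.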

• Q.182

Question
Write a query to find all departments that have more than 2 employees, along with the
department name and the number of employees in each department.

Explanation
This can be solved using a CTE to calculate the number of employees in each
department, then filtering the results to only show departments with more than 2
employees.

Datasets and SQL Schemas

• - Create tables
CREATE TABLE Departments (
DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(100)
);

CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100),
DepartmentID INT,
FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);

• - Insert data
INSERT INTO Departments VALUES
(1, 'Engineering'),
(2, 'Marketing'),
(3, 'Finance');

INSERT INTO Employees VALUES
(1, 'Alice', 1),
(2, 'Bob', 1),
(3, 'Charlie', 2),
(4, 'David', 2),
(5, 'Eve', 3);

Learnings

• CTE: Using a CTE to count employees per department.
• Grouping and Filtering: Using GROUP BY to count employees and filtering
using HAVING to only return departments with more than 2 employees.

MySQL & Postgres Solution


WITH DeptEmployeeCount AS (
SELECT DepartmentID, COUNT(*) AS EmployeeCount
FROM Employees
GROUP BY DepartmentID
)
SELECT D.DepartmentName, EC.EmployeeCount
FROM Departments D
JOIN DeptEmployeeCount EC ON D.DepartmentID = EC.DepartmentID
WHERE EC.EmployeeCount > 2;

• Q.183

Question
Write a query to find employees who report to the same manager. The result should
include the employee name and their manager's name.

Explanation
This can be solved using a CTE that joins the Employees table with itself to find
employees who have the same manager.

Datasets and SQL Schemas

• - Create tables
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100),
ManagerID INT
);

• - Insert data
INSERT INTO Employees VALUES
(1, 'John Doe', NULL),
(2, 'Jane Smith', 1),
(3, 'Alice Johnson', 1),
(4, 'Bob Brown', 2),
(5, 'Charlie Davis', 2);

Learnings

• Self Join: Using a CTE to perform a self-join on the Employees table.
• CTE for Cleaner Code: Using CTEs to keep the query simple and readable.

MySQL & Postgres Solution


WITH EmployeeManagers AS (
SELECT EmployeeID, EmployeeName, ManagerID
FROM Employees
WHERE ManagerID IS NOT NULL
)
SELECT E.EmployeeName AS Employee, M.EmployeeName AS Manager
FROM EmployeeManagers E
JOIN Employees M ON E.ManagerID = M.EmployeeID
ORDER BY Manager, Employee;
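Getting employee/manager pairs requires joining ManagerID back to EmployeeID; joining ManagerID to ManagerID instead pairs co-workers who merely share a manager, which is a common slip with this question. A minimal sqlite3 check of the ManagerID-to-EmployeeID self-join, using this question's rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INT PRIMARY KEY, EmployeeName VARCHAR(100), ManagerID INT);
INSERT INTO Employees VALUES
(1,'John Doe',NULL),(2,'Jane Smith',1),(3,'Alice Johnson',1),
(4,'Bob Brown',2),(5,'Charlie Davis',2);
""")
rows = conn.execute("""
    SELECT E.EmployeeName AS Employee, M.EmployeeName AS Manager
    FROM Employees E
    JOIN Employees M ON E.ManagerID = M.EmployeeID
    ORDER BY Manager, Employee
""").fetchall()
print(rows)
```

Employees reporting to the same manager come out grouped under the same Manager value (Jane and Alice under John; Bob and Charlie under Jane), and the top-level manager with a NULL ManagerID is excluded by the inner join.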

• Q.184


Question
Write a query to find the names of products that have never been ordered.

Explanation
This can be done using a CTE to list all products and then using a LEFT JOIN to
check which products don't have any corresponding orders in the OrderItems table.

Datasets and SQL Schemas

• - Create tables
CREATE TABLE Products (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(100)
);

CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
OrderDate DATE
);

CREATE TABLE OrderItems (
OrderItemID INT PRIMARY KEY,
OrderID INT,
ProductID INT,
FOREIGN KEY (OrderID) REFERENCES Orders(OrderID),
FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
);

• - Insert data
INSERT INTO Products VALUES
(1, 'Laptop'),
(2, 'Mouse'),
(3, 'Keyboard'),
(4, 'Monitor');

INSERT INTO Orders VALUES
(101, '2025-01-01'),
(102, '2025-01-02');

INSERT INTO OrderItems VALUES
(201, 101, 1),
(202, 102, 2);

Learnings

• CTEs: Using CTEs to break down complex filtering tasks.
• LEFT JOIN: To find products with no corresponding order.

MySQL Solution
WITH ProductList AS (
SELECT ProductID, ProductName
FROM Products
)
SELECT ProductName
FROM ProductList PL
LEFT JOIN OrderItems OI ON PL.ProductID = OI.ProductID
WHERE OI.OrderID IS NULL;


Postgres Solution
WITH ProductList AS (
SELECT ProductID, ProductName
FROM Products
)
SELECT ProductName
FROM ProductList PL
LEFT JOIN OrderItems OI ON PL.ProductID = OI.ProductID
WHERE OI.OrderID IS NULL;
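The anti-join (LEFT JOIN ... IS NULL) pattern runs unchanged in SQLite, so it can be verified with sqlite3. A minimal sketch with this question's rows (the Orders table is omitted since the query never touches it):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products   (ProductID INT PRIMARY KEY, ProductName VARCHAR(100));
CREATE TABLE OrderItems (OrderItemID INT PRIMARY KEY, OrderID INT, ProductID INT);
INSERT INTO Products VALUES (1,'Laptop'),(2,'Mouse'),(3,'Keyboard'),(4,'Monitor');
INSERT INTO OrderItems VALUES (201,101,1),(202,102,2);
""")
rows = conn.execute("""
    WITH ProductList AS (SELECT ProductID, ProductName FROM Products)
    SELECT ProductName
    FROM ProductList PL
    LEFT JOIN OrderItems OI ON PL.ProductID = OI.ProductID
    WHERE OI.OrderID IS NULL
""").fetchall()
print(rows)  # Keyboard and Monitor have no matching order items
```

An equivalent formulation interviewers often ask for uses NOT EXISTS with a correlated subquery; both return the same two unordered products here.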

• Q.185

Question
Write a query to find customers who have purchased more than 5 different products.

Explanation
Using a CTE, we first count the number of distinct products each customer has
purchased, then filter those who have purchased more than 5 products.

Datasets and SQL Schemas

• - Create tables
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
CustomerName VARCHAR(100)
);

CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE,
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

CREATE TABLE OrderItems (
OrderItemID INT PRIMARY KEY,
OrderID INT,
ProductID INT,
FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
);

• - Insert data
INSERT INTO Customers VALUES
(1, 'John Doe'),
(2, 'Jane Smith');

INSERT INTO Orders VALUES
(101, 1, '2025-01-01'),
(102, 1, '2025-02-01'),
(103, 2, '2025-03-01');

INSERT INTO OrderItems VALUES
(201, 101, 1),
(202, 101, 2),
(203, 101, 3),
(204, 101, 4),
(205, 101, 5),
(206, 102, 1),
(207, 102, 2);


Learnings

• CTEs: Used to simplify aggregation and filtering.
• COUNT(): Used to count distinct products purchased.
• GROUP BY: Groups data by customers to calculate the number of different
products.

MySQL Solution
WITH ProductCounts AS (
SELECT O.CustomerID, COUNT(DISTINCT OI.ProductID) AS ProductCount
FROM Orders O
JOIN OrderItems OI ON O.OrderID = OI.OrderID
GROUP BY O.CustomerID
)
SELECT C.CustomerName
FROM Customers C
JOIN ProductCounts PC ON C.CustomerID = PC.CustomerID
WHERE PC.ProductCount > 5;

Postgres Solution
WITH ProductCounts AS (
SELECT O.CustomerID, COUNT(DISTINCT OI.ProductID) AS ProductCount
FROM Orders O
JOIN OrderItems OI ON O.OrderID = OI.OrderID
GROUP BY O.CustomerID
)
SELECT C.CustomerName
FROM Customers C
JOIN ProductCounts PC ON C.CustomerID = PC.CustomerID
WHERE PC.ProductCount > 5;

• Q.186

Question
Write a query to find the most expensive product in each category.

Explanation
We can use a CTE to calculate the maximum price for each category, then use this to
filter out the most expensive product in each category.

Datasets and SQL Schemas

• - Create tables
CREATE TABLE Categories (
CategoryID INT PRIMARY KEY,
CategoryName VARCHAR(100)
);

CREATE TABLE Products (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(100),
CategoryID INT,
Price DECIMAL(10, 2),
FOREIGN KEY (CategoryID) REFERENCES Categories(CategoryID)
);

• - Insert data

INSERT INTO Categories VALUES
(1, 'Electronics'),
(2, 'Furniture'),
(3, 'Clothing');

INSERT INTO Products VALUES
(1, 'Laptop', 1, 1500.00),
(2, 'Smartphone', 1, 800.00),
(3, 'Table', 2, 200.00),
(4, 'Chair', 2, 100.00),
(5, 'T-Shirt', 3, 30.00),
(6, 'Jeans', 3, 50.00);

Learnings

• CTEs: To calculate the maximum price for each category.
• Filtering by Max: Using the CTE to find the most expensive product in each
category.
• Aggregation: Using MAX() to determine the most expensive product.

MySQL Solution
WITH MaxPrices AS (
SELECT CategoryID, MAX(Price) AS MaxPrice
FROM Products
GROUP BY CategoryID
)
SELECT P.ProductName, C.CategoryName, P.Price
FROM Products P
JOIN MaxPrices MP ON P.CategoryID = MP.CategoryID AND P.Price = MP.MaxPrice
JOIN Categories C ON P.CategoryID = C.CategoryID;

Postgres Solution
WITH MaxPrices AS (
SELECT CategoryID, MAX(Price) AS MaxPrice
FROM Products
GROUP BY CategoryID
)
SELECT P.ProductName, C.CategoryName, P.Price
FROM Products P
JOIN MaxPrices MP ON P.CategoryID = MP.CategoryID AND P.Price = MP.MaxPrice
JOIN Categories C ON P.CategoryID = C.CategoryID;
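The MAX-per-group CTE plus join-back pattern can be checked with sqlite3 against this question's rows. A minimal sketch (in-memory database assumed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (CategoryID INT PRIMARY KEY, CategoryName VARCHAR(100));
CREATE TABLE Products (ProductID INT PRIMARY KEY, ProductName VARCHAR(100),
                       CategoryID INT, Price DECIMAL(10,2));
INSERT INTO Categories VALUES (1,'Electronics'),(2,'Furniture'),(3,'Clothing');
INSERT INTO Products VALUES
(1,'Laptop',1,1500.00),(2,'Smartphone',1,800.00),(3,'Table',2,200.00),
(4,'Chair',2,100.00),(5,'T-Shirt',3,30.00),(6,'Jeans',3,50.00);
""")
rows = conn.execute("""
    WITH MaxPrices AS (SELECT CategoryID, MAX(Price) AS MaxPrice
                       FROM Products GROUP BY CategoryID)
    SELECT P.ProductName, C.CategoryName, P.Price
    FROM Products P
    JOIN MaxPrices MP ON P.CategoryID = MP.CategoryID AND P.Price = MP.MaxPrice
    JOIN Categories C ON P.CategoryID = C.CategoryID
""").fetchall()
print(rows)  # one winner per category: Laptop, Table, Jeans
```

Note that joining back on the price returns every product tied for the maximum in its category, which is usually the desired tie behavior; a ROW_NUMBER() window function is the alternative when exactly one row per category is required.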

• Q.187

Question
Calculate the department-wise average salary using a Common Table Expression
(CTE).

Explanation
You need to calculate the average salary for each department. Use a CTE to join the
Employees, Salaries, and Departments tables, then group by department to
calculate the average salary.

Datasets and SQL Schemas

• - Table creation

CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100),
DepartmentID INT
);

CREATE TABLE Departments (
DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(100)
);

CREATE TABLE Salaries (
EmployeeID INT,
Salary DECIMAL(10, 2),
FOREIGN KEY (EmployeeID) REFERENCES Employees(EmployeeID)
);

• - Datasets
INSERT INTO Employees VALUES
(1, 'Arun Kumar', 1),
(2, 'Priya Sharma', 2),
(3, 'Ravi Patel', 1),
(4, 'Sita Mehta', 3);

INSERT INTO Departments VALUES
(1, 'HR'),
(2, 'Finance'),
(3, 'Marketing');

INSERT INTO Salaries VALUES
(1, 50000),
(2, 70000),
(3, 60000),
(4, 80000);

Learnings

• Use of Common Table Expressions (CTEs)
• Joins across multiple tables
• Aggregation with AVG function
• Grouping by a non-numeric column

Solutions

• - PostgreSQL solution
WITH DepartmentSalaries AS (
SELECT E.DepartmentID, D.DepartmentName, AVG(S.Salary) AS AvgSalary
FROM Employees E
JOIN Salaries S ON E.EmployeeID = S.EmployeeID
JOIN Departments D ON E.DepartmentID = D.DepartmentID
GROUP BY E.DepartmentID, D.DepartmentName
)
SELECT * FROM DepartmentSalaries;

• - MySQL solution
WITH DepartmentSalaries AS (
SELECT E.DepartmentID, D.DepartmentName, AVG(S.Salary) AS AvgSalary
FROM Employees E
JOIN Salaries S ON E.EmployeeID = S.EmployeeID
JOIN Departments D ON E.DepartmentID = D.DepartmentID
GROUP BY E.DepartmentID, D.DepartmentName
)
SELECT * FROM DepartmentSalaries;
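If you want to verify the CTE's output without setting up a database server, the identical query runs on SQLite, which ships with Python. The harness below is illustrative only, not one of the book's solutions; it assumes only the standard sqlite3 module:

```python
import sqlite3

# Load the sample schema and rows into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT, DepartmentID INTEGER);
CREATE TABLE Departments (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);
CREATE TABLE Salaries (EmployeeID INTEGER, Salary REAL);
INSERT INTO Employees VALUES (1,'Arun Kumar',1),(2,'Priya Sharma',2),(3,'Ravi Patel',1),(4,'Sita Mehta',3);
INSERT INTO Departments VALUES (1,'HR'),(2,'Finance'),(3,'Marketing');
INSERT INTO Salaries VALUES (1,50000),(2,70000),(3,60000),(4,80000);
""")

# The CTE from the solution, unchanged apart from an ORDER BY for stable output.
rows = cur.execute("""
WITH DepartmentSalaries AS (
    SELECT E.DepartmentID, D.DepartmentName, AVG(S.Salary) AS AvgSalary
    FROM Employees E
    JOIN Salaries S ON E.EmployeeID = S.EmployeeID
    JOIN Departments D ON E.DepartmentID = D.DepartmentID
    GROUP BY E.DepartmentID, D.DepartmentName
)
SELECT DepartmentName, AvgSalary FROM DepartmentSalaries ORDER BY DepartmentID;
""").fetchall()
print(rows)  # [('HR', 55000.0), ('Finance', 70000.0), ('Marketing', 80000.0)]
```

HR averages Arun (50000) and Ravi (60000) to 55000, while Finance and Marketing each have a single employee.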

• Q.188

Question
Find the Employees Who Have the Highest Salary in Each Department.

Explanation
The task is to find the employees with the highest salary in each department using a
Common Table Expression (CTE). The solution involves:

• Identifying the maximum salary for each department.
• Joining the Employees, Salaries, and Departments tables to retrieve the
employee details who have the highest salary in their respective departments.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100),
DepartmentID INT
);

CREATE TABLE Departments (
DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(100)
);

CREATE TABLE Salaries (
EmployeeID INT,
Salary DECIMAL(10, 2),
FOREIGN KEY (EmployeeID) REFERENCES Employees(EmployeeID)
);

• - Datasets
INSERT INTO Employees VALUES
(1, 'Arun Kumar', 1),
(2, 'Priya Sharma', 2),
(3, 'Ravi Patel', 1),
(4, 'Sita Mehta', 3);

INSERT INTO Departments VALUES
(1, 'HR'),
(2, 'Finance'),
(3, 'Marketing');

INSERT INTO Salaries VALUES
(1, 50000),
(2, 70000),
(3, 60000),
(4, 80000);

Learnings


• Using CTEs to calculate aggregate values (like MAX salary) for each group
(e.g., department).
• Joins across multiple tables (Employees, Salaries, Departments) to fetch
related information.
• Filtering the results to only show employees with the maximum salary in each
department.

Solutions

• - PostgreSQL solution
WITH DepartmentMaxSalary AS (
SELECT E.DepartmentID, MAX(S.Salary) AS MaxSalary
FROM Employees E
JOIN Salaries S ON E.EmployeeID = S.EmployeeID
GROUP BY E.DepartmentID
)
SELECT E.EmployeeName, S.Salary, D.DepartmentName
FROM Employees E
JOIN Salaries S ON E.EmployeeID = S.EmployeeID
JOIN Departments D ON E.DepartmentID = D.DepartmentID
JOIN DepartmentMaxSalary DMS ON E.DepartmentID = DMS.DepartmentID
WHERE S.Salary = DMS.MaxSalary;

• - MySQL solution
WITH DepartmentMaxSalary AS (
SELECT E.DepartmentID, MAX(S.Salary) AS MaxSalary
FROM Employees E
JOIN Salaries S ON E.EmployeeID = S.EmployeeID
GROUP BY E.DepartmentID
)
SELECT E.EmployeeName, S.Salary, D.DepartmentName
FROM Employees E
JOIN Salaries S ON E.EmployeeID = S.EmployeeID
JOIN Departments D ON E.DepartmentID = D.DepartmentID
JOIN DepartmentMaxSalary DMS ON E.DepartmentID = DMS.DepartmentID
WHERE S.Salary = DMS.MaxSalary;

• Q.189

Question
Find Patients Who Have Visited More Than 3 Times.

Explanation
The goal is to identify patients who have had more than 3 appointments. The solution
involves:

• Counting the number of visits for each patient using a Common Table
Expression (CTE).
• Filtering patients whose visit count is greater than 3.

Datasets and SQL Schemas

• - Table creation


CREATE TABLE Patients (
PatientID INT PRIMARY KEY,
PatientName VARCHAR(100),
Age INT,
Gender VARCHAR(10)
);

CREATE TABLE Appointments (
AppointmentID INT PRIMARY KEY,
PatientID INT,
AppointmentDate DATE,
DoctorID INT,
FOREIGN KEY (PatientID) REFERENCES Patients(PatientID)
);

• - Datasets
INSERT INTO Patients VALUES
(1, 'Anita Roy', 35, 'Female'),
(2, 'Sandeep Kumar', 40, 'Male'),
(3, 'Ravi Gupta', 28, 'Male'),
(4, 'Priya Sharma', 55, 'Female');

INSERT INTO Appointments VALUES
(1, 1, '2024-01-01', 101),
(2, 1, '2024-02-01', 102),
(3, 1, '2024-03-01', 101),
(4, 2, '2024-01-10', 103),
(5, 3, '2024-02-15', 101),
(6, 3, '2024-03-10', 102),
(7, 4, '2024-04-01', 104),
(8, 4, '2024-05-10', 101);

Learnings

• CTEs to calculate the total number of appointments per patient.
• Filtering records based on aggregate counts (e.g., COUNT(*) > 3).
• Joins between Patients and a CTE for efficient filtering of results.

Solutions

• - PostgreSQL solution
WITH PatientVisitCount AS (
SELECT PatientID, COUNT(*) AS VisitCount
FROM Appointments
GROUP BY PatientID
)
SELECT P.PatientName
FROM Patients P
JOIN PatientVisitCount PVC ON P.PatientID = PVC.PatientID
WHERE PVC.VisitCount > 3;

• - MySQL solution
WITH PatientVisitCount AS (
SELECT PatientID, COUNT(*) AS VisitCount
FROM Appointments
GROUP BY PatientID
)
SELECT P.PatientName
FROM Patients P
JOIN PatientVisitCount PVC ON P.PatientID = PVC.PatientID
WHERE PVC.VisitCount > 3;


• Q.190

Question
List the Most Frequent Doctor for Each Patient.

Explanation
The task is to identify the most frequent doctor for each patient. This can be achieved
by:

• Counting the number of visits for each patient to each doctor.
• Determining the doctor with the highest visit count for each patient.
• Joining the necessary tables (Patients, Doctors, Appointments) to retrieve the
relevant details.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Patients (
PatientID INT PRIMARY KEY,
PatientName VARCHAR(100)
);

-- Doctors is created before Appointments so the foreign key reference is valid
CREATE TABLE Doctors (
DoctorID INT PRIMARY KEY,
DoctorName VARCHAR(100),
Specialty VARCHAR(100)
);

CREATE TABLE Appointments (
AppointmentID INT PRIMARY KEY,
PatientID INT,
AppointmentDate DATE,
DoctorID INT,
FOREIGN KEY (PatientID) REFERENCES Patients(PatientID),
FOREIGN KEY (DoctorID) REFERENCES Doctors(DoctorID)
);

• - Datasets
INSERT INTO Patients VALUES
(1, 'Anita Roy'),
(2, 'Sandeep Kumar'),
(3, 'Ravi Gupta'),
(4, 'Priya Sharma');

INSERT INTO Doctors VALUES
(101, 'Dr. A Sharma', 'Cardiology'),
(102, 'Dr. B Patel', 'Neurology'),
(103, 'Dr. C Gupta', 'Orthopedics');

INSERT INTO Appointments VALUES
(1, 1, '2024-01-01', 101),
(2, 1, '2024-02-01', 102),
(3, 1, '2024-03-01', 101),
(4, 2, '2024-01-10', 103),
(5, 3, '2024-02-15', 101),
(6, 3, '2024-03-10', 102),
(7, 4, '2024-04-01', 103),
(8, 4, '2024-05-10', 101);


Learnings

• Using CTEs to calculate aggregate counts (COUNT(*)) and to find the
maximum value (MAX()).
• Grouping by multiple columns (e.g., PatientID, DoctorID) to get the count
of visits per patient-doctor combination.
• Joining the results from multiple tables (Patients, Appointments, Doctors) to
retrieve the required details.

Solutions

• - PostgreSQL solution
WITH PatientDoctorCount AS (
SELECT PatientID, DoctorID, COUNT(*) AS VisitCount
FROM Appointments
GROUP BY PatientID, DoctorID
),
MaxDoctorVisit AS (
SELECT PatientID, MAX(VisitCount) AS MaxVisits
FROM PatientDoctorCount
GROUP BY PatientID
)
SELECT P.PatientName, D.DoctorName
FROM Patients P
JOIN PatientDoctorCount PDC ON P.PatientID = PDC.PatientID
JOIN Doctors D ON PDC.DoctorID = D.DoctorID
JOIN MaxDoctorVisit MDV ON PDC.PatientID = MDV.PatientID
WHERE PDC.VisitCount = MDV.MaxVisits;

• - MySQL solution
WITH PatientDoctorCount AS (
SELECT PatientID, DoctorID, COUNT(*) AS VisitCount
FROM Appointments
GROUP BY PatientID, DoctorID
),
MaxDoctorVisit AS (
SELECT PatientID, MAX(VisitCount) AS MaxVisits
FROM PatientDoctorCount
GROUP BY PatientID
)
SELECT P.PatientName, D.DoctorName
FROM Patients P
JOIN PatientDoctorCount PDC ON P.PatientID = PDC.PatientID
JOIN Doctors D ON PDC.DoctorID = D.DoctorID
JOIN MaxDoctorVisit MDV ON PDC.PatientID = MDV.PatientID
WHERE PDC.VisitCount = MDV.MaxVisits;
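One subtlety worth noting: when a patient's visits are split evenly between doctors, the MaxVisits filter keeps every tied doctor, so that patient appears once per tied doctor. You can confirm this against the sample data with SQLite via Python; this harness is illustrative only, not part of the book's solutions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Patients (PatientID INTEGER PRIMARY KEY, PatientName TEXT);
CREATE TABLE Doctors (DoctorID INTEGER PRIMARY KEY, DoctorName TEXT, Specialty TEXT);
CREATE TABLE Appointments (AppointmentID INTEGER PRIMARY KEY, PatientID INTEGER,
                           AppointmentDate TEXT, DoctorID INTEGER);
INSERT INTO Patients VALUES (1,'Anita Roy'),(2,'Sandeep Kumar'),(3,'Ravi Gupta'),(4,'Priya Sharma');
INSERT INTO Doctors VALUES (101,'Dr. A Sharma','Cardiology'),(102,'Dr. B Patel','Neurology'),
                           (103,'Dr. C Gupta','Orthopedics');
INSERT INTO Appointments VALUES
 (1,1,'2024-01-01',101),(2,1,'2024-02-01',102),(3,1,'2024-03-01',101),
 (4,2,'2024-01-10',103),(5,3,'2024-02-15',101),(6,3,'2024-03-10',102),
 (7,4,'2024-04-01',103),(8,4,'2024-05-10',101);
""")

# The two-CTE query from the solution, run as-is.
rows = cur.execute("""
WITH PatientDoctorCount AS (
    SELECT PatientID, DoctorID, COUNT(*) AS VisitCount
    FROM Appointments GROUP BY PatientID, DoctorID
),
MaxDoctorVisit AS (
    SELECT PatientID, MAX(VisitCount) AS MaxVisits
    FROM PatientDoctorCount GROUP BY PatientID
)
SELECT P.PatientName, D.DoctorName
FROM Patients P
JOIN PatientDoctorCount PDC ON P.PatientID = PDC.PatientID
JOIN Doctors D ON PDC.DoctorID = D.DoctorID
JOIN MaxDoctorVisit MDV ON PDC.PatientID = MDV.PatientID
WHERE PDC.VisitCount = MDV.MaxVisits;
""").fetchall()
# Anita Roy resolves to a single doctor (2 visits to Dr. A Sharma); Ravi Gupta
# and Priya Sharma have one visit to each of two doctors, so both doctors appear.
print(len(rows))  # 6
```

If the interviewer wants exactly one doctor per patient, mention a tie-breaker (e.g., the lowest DoctorID) as a follow-up.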

Window Functions

• Q.191

Question
Rank Employees by Salary (Descending).

Explanation


The task is to rank employees based on their salary in descending order using the
RANK() window function. This function assigns a rank to each row within a result set,
with ties receiving the same rank, but skipping subsequent ranks.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100),
Salary DECIMAL(10, 2),
DepartmentID INT
);

CREATE TABLE Departments (
DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(100)
);

• - Datasets
INSERT INTO Employees VALUES
(1, 'John Smith', 55000, 1),
(2, 'Sarah Brown', 60000, 2),
(3, 'James White', 50000, 1),
(4, 'Emma Green', 65000, 3),
(5, 'Michael Clark', 48000, 2);

Learnings

• Window functions, particularly RANK(), allow ranking rows within a partition
of the data.
• Ordering by salary in descending order helps assign the highest rank to the
highest salary.
• Ranking with ties: Employees with the same salary receive the same rank,
and subsequent ranks are skipped.

Solutions

• - PostgreSQL solution
SELECT EmployeeName, Salary,
RANK() OVER (ORDER BY Salary DESC) AS SalaryRank
FROM Employees;

• - MySQL solution
SELECT EmployeeName, Salary,
RANK() OVER (ORDER BY Salary DESC) AS SalaryRank
FROM Employees;
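The sample data above has no ties, so the rank-skipping behaviour never shows up. A minimal sketch below uses SQLite (3.25 or newer supports window functions) with a hypothetical tie at 60000; it is illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions require SQLite >= 3.25
cur = conn.cursor()
cur.execute("CREATE TABLE Employees (EmployeeName TEXT, Salary INTEGER)")
# Hypothetical rows: B and C deliberately share the same salary.
cur.executemany("INSERT INTO Employees VALUES (?, ?)",
                [("A", 70000), ("B", 60000), ("C", 60000), ("D", 50000)])
rows = cur.execute("""
SELECT EmployeeName, Salary,
       RANK() OVER (ORDER BY Salary DESC) AS SalaryRank
FROM Employees
ORDER BY Salary DESC
""").fetchall()
# The two 60000 salaries share rank 2, and rank 3 is skipped entirely.
ranks = [r[2] for r in rows]
print(ranks)  # [1, 2, 2, 4]
```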

• Q.192

Question
Assign a Unique Row Number to Each Employee in a Department.


Explanation
The task is to assign a unique row number to each employee within their respective
department using the ROW_NUMBER() window function. This function provides a
sequential number for each row within a partition, ordered by salary in descending
order.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
Name VARCHAR(100),
Salary DECIMAL(10, 2),
DepartmentID INT
);

CREATE TABLE Departments (
DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(100)
);

• - Datasets
INSERT INTO Employees VALUES
(1, 'Oliver Harris', 45000, 1),
(2, 'Emily Walker', 52000, 2),
(3, 'Charlotte King', 48000, 2),
(4, 'James Thompson', 56000, 1),
(5, 'Liam White', 54000, 1);

Learnings

• ROW_NUMBER(): Assigns a unique number to each row within a specified
partition, starting from 1.
• PARTITION BY: Divides the data into partitions based on department,
ensuring the row numbering resets for each department.
• ORDER BY: The ROW_NUMBER() function assigns numbers following the
salary ordering, here descending.

Solutions

• - PostgreSQL solution
SELECT Name, DepartmentID, Salary,
ROW_NUMBER() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS RowNum
FROM Employees;

• - MySQL solution
SELECT Name, DepartmentID, Salary,
ROW_NUMBER() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS RowNum
FROM Employees;


• Q.193

Question
List Employees with Their Previous Salary (Using LAG).

Explanation
The task is to retrieve the salary of each employee along with their previous salary
using the LAG() window function. The LAG() function returns the value of the
specified column (in this case, Salary) from the previous row in the result set, based
on the given order.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE StaffMembers (
StaffID INT PRIMARY KEY,
Name VARCHAR(100),
Salary DECIMAL(10, 2)
);

• - Datasets
INSERT INTO StaffMembers VALUES
(1, 'William Brown', 49000),
(2, 'Sophia Harris', 54000),
(3, 'Isabella Clark', 46000),
(4, 'Mia Lewis', 55000),
(5, 'Jacob Davis', 50000);

Learnings

• LAG(): The LAG() window function retrieves the value of a column from the
previous row within the specified ordering.
• ORDER BY: The order of salary is used to determine the previous salary. It
assigns a previous salary to each row based on ascending salary order.
• Handling NULLs: For the first row, where there is no previous row, LAG()
returns NULL.

Solutions

• - PostgreSQL solution
SELECT Name, Salary,
LAG(Salary, 1) OVER (ORDER BY Salary) AS PreviousSalary
FROM StaffMembers;

• - MySQL solution
SELECT Name, Salary,
LAG(Salary, 1) OVER (ORDER BY Salary) AS PreviousSalary
FROM StaffMembers;
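You can see the NULL-for-the-first-row behaviour directly by running the query on SQLite from Python (3.25+ for LAG()); this check is illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # LAG() requires SQLite >= 3.25
cur = conn.cursor()
cur.execute("CREATE TABLE StaffMembers (StaffID INTEGER, Name TEXT, Salary INTEGER)")
cur.executemany("INSERT INTO StaffMembers VALUES (?, ?, ?)", [
    (1, 'William Brown', 49000), (2, 'Sophia Harris', 54000),
    (3, 'Isabella Clark', 46000), (4, 'Mia Lewis', 55000),
    (5, 'Jacob Davis', 50000)])
rows = cur.execute("""
SELECT Name, Salary,
       LAG(Salary, 1) OVER (ORDER BY Salary) AS PreviousSalary
FROM StaffMembers
ORDER BY Salary
""").fetchall()
# The lowest salary has no predecessor, so LAG() yields NULL (None in Python).
print(rows[0])  # ('Isabella Clark', 46000, None)
print(rows[1])  # ('William Brown', 49000, 46000)
```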


• Q.194

Question
Find the Top 3 Highest Paid Employees Using ROW_NUMBER.

Explanation
The task is to find the top 3 highest-paid employees by assigning a unique rank using
the ROW_NUMBER() window function. The function assigns a number to each row
based on salary in descending order, and then the result is filtered to retrieve only the
top 3 employees.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
Name VARCHAR(100),
Salary DECIMAL(10, 2)
);

• - Datasets
INSERT INTO Employees VALUES
(1, 'Harry Williams', 55000),
(2, 'Olivia Jackson', 60000),
(3, 'George Smith', 58000),
(4, 'Charlotte Brown', 67000),
(5, 'Amelia Harris', 65000);

Learnings

• ROW_NUMBER(): Assigns a unique number to each row, allowing you to
rank employees based on salary.
• ORDER BY: Used in ROW_NUMBER() to ensure employees are ranked by
salary in descending order.
• Filtering: After assigning row numbers, you can filter results to select only
the top N entries (in this case, the top 3).

Solutions

• - PostgreSQL solution
WITH TopSalaries AS (
SELECT Name, Salary,
ROW_NUMBER() OVER (ORDER BY Salary DESC) AS SalaryRank
FROM Employees
)
SELECT Name, Salary
FROM TopSalaries
WHERE SalaryRank <= 3;

• - MySQL solution


WITH TopSalaries AS (
SELECT Name, Salary,
-- RANK is a reserved word in MySQL 8+, so avoid it as a bare alias
ROW_NUMBER() OVER (ORDER BY Salary DESC) AS SalaryRank
FROM Employees
)
SELECT Name, Salary
FROM TopSalaries
WHERE SalaryRank <= 3;

• Q.195

Question
Rank Employees with Ties Using DENSE_RANK.

Explanation
The task is to rank employees by salary in descending order using the DENSE_RANK()
window function. Unlike RANK(), which skips rank numbers in case of ties,
DENSE_RANK() assigns the same rank to tied values but does not skip subsequent
ranks.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
Name VARCHAR(100),
Salary DECIMAL(10, 2)
);

• - Datasets
INSERT INTO Employees VALUES
(1, 'Lucas Evans', 55000),
(2, 'Ava Johnson', 65000),
(3, 'Ethan Harris', 65000),
(4, 'Ella Walker', 47000),
(5, 'Mason Davis', 48000);

Learnings

• DENSE_RANK(): Ranks rows without gaps in ranking values, even in case
of ties.
• ORDER BY: The DENSE_RANK() function orders employees by salary in
descending order, assigning the highest rank to the highest salary.
• Handling Ties: Tied employees (with the same salary) receive the same rank,
and the next rank is incremented by 1.

Solutions

• - PostgreSQL solution
SELECT Name, Salary,
DENSE_RANK() OVER (ORDER BY Salary DESC) AS SalaryRank
FROM Employees;


• - MySQL solution
SELECT Name, Salary,
DENSE_RANK() OVER (ORDER BY Salary DESC) AS SalaryRank
FROM Employees;
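The sample data above does contain a tie (two 65000 salaries), so it is a good place to contrast RANK() and DENSE_RANK() side by side. The SQLite harness below (3.25+ for window functions) is illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions require SQLite >= 3.25
cur = conn.cursor()
cur.execute("CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT, Salary INTEGER)")
cur.executemany("INSERT INTO Employees VALUES (?, ?, ?)", [
    (1, 'Lucas Evans', 55000), (2, 'Ava Johnson', 65000),
    (3, 'Ethan Harris', 65000), (4, 'Ella Walker', 47000),
    (5, 'Mason Davis', 48000)])
rows = cur.execute("""
SELECT Name, Salary,
       RANK()       OVER (ORDER BY Salary DESC) AS RankWithGaps,
       DENSE_RANK() OVER (ORDER BY Salary DESC) AS DenseRank
FROM Employees
ORDER BY Salary DESC
""").fetchall()
# After the tie at 65000, RANK() jumps to 3 while DENSE_RANK() continues at 2.
pairs = [(r[2], r[3]) for r in rows]
print(pairs)  # [(1, 1), (1, 1), (3, 2), (4, 3), (5, 4)]
```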

• Q.196

Question
Find Employees Who Are 2nd in Their Department by Salary Using
ROW_NUMBER.

Explanation
The task is to find employees who are ranked 2nd in their department by salary using
the ROW_NUMBER() window function. The ROW_NUMBER() function assigns a unique
rank to each employee within their department, ordered by salary in descending order.
We then filter for employees who have a rank of 2.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
Name VARCHAR(100),
DepartmentID INT,
Salary DECIMAL(10, 2)
);

CREATE TABLE Departments (
DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(100)
);

• - Datasets
INSERT INTO Employees VALUES
(1, 'Daniel Miller', 1, 55000),
(2, 'Sophia Allen', 2, 60000),
(3, 'Ethan Jackson', 1, 57000),
(4, 'Olivia White', 2, 62000),
(5, 'Mason Harris', 1, 45000);

Learnings

• ROW_NUMBER(): The function assigns a unique rank to each row based on
the ORDER BY clause, allowing us to rank employees by salary.
• PARTITION BY: Used to rank employees within each department separately.
• Filtering by Rank: After ranking employees, we can filter for those ranked 2nd.

Solutions

• - PostgreSQL solution


WITH RankedEmployees AS (
SELECT Name, DepartmentID, Salary,
ROW_NUMBER() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS SalaryRank
FROM Employees
)
SELECT Name, DepartmentID, Salary
FROM RankedEmployees
WHERE SalaryRank = 2;

• - MySQL solution
WITH RankedEmployees AS (
SELECT Name, DepartmentID, Salary,
-- RANK is a reserved word in MySQL 8+, so avoid it as a bare alias
ROW_NUMBER() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS SalaryRank
FROM Employees
)
SELECT Name, DepartmentID, Salary
FROM RankedEmployees
WHERE SalaryRank = 2;

• Q.197

Question
Use NTILE to Divide Employees into 4 Salary Groups.

Explanation
The task is to divide employees into 4 groups based on their salary using the NTILE()
window function. The NTILE(n) function assigns rows into n approximately equal
groups, ordered by the specified column. Here, employees are divided into 4 salary
groups based on descending salary.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
Name VARCHAR(100),
Salary DECIMAL(10, 2)
);

• - Datasets
INSERT INTO Employees VALUES
(1, 'Lily Martin', 55000),
(2, 'James White', 70000),
(3, 'Benjamin Lewis', 80000),
(4, 'Lucas Walker', 95000),
(5, 'Mia Scott', 40000);

Learnings

• NTILE(): The NTILE(n) function divides the data into n groups based on an
ordered column. If there is a remainder, the extra rows go to the earlier groups.
• ORDER BY: The ORDER BY clause ensures that employees are ranked by
salary in descending order before dividing them into groups.
• Grouping: Employees are distributed into 4 salary groups based on their
relative ranking in terms of salary.

Solutions

• - PostgreSQL solution
SELECT Name, Salary,
NTILE(4) OVER (ORDER BY Salary DESC) AS SalaryGroup
FROM Employees;

• - MySQL solution
SELECT Name, Salary,
NTILE(4) OVER (ORDER BY Salary DESC) AS SalaryGroup
FROM Employees;
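With 5 rows and 4 buckets there is a remainder, so one group ends up larger than the rest. You can check which one with SQLite (3.25+ for NTILE()); this harness is illustrative only:

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")  # NTILE() requires SQLite >= 3.25
cur = conn.cursor()
cur.execute("CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT, Salary INTEGER)")
cur.executemany("INSERT INTO Employees VALUES (?, ?, ?)", [
    (1, 'Lily Martin', 55000), (2, 'James White', 70000),
    (3, 'Benjamin Lewis', 80000), (4, 'Lucas Walker', 95000),
    (5, 'Mia Scott', 40000)])
rows = cur.execute("""
SELECT Name, Salary,
       NTILE(4) OVER (ORDER BY Salary DESC) AS SalaryGroup
FROM Employees
""").fetchall()
# 5 rows into 4 buckets: the single remainder row lands in the first group.
sizes = Counter(r[2] for r in rows)
print(sorted(sizes.items()))  # [(1, 2), (2, 1), (3, 1), (4, 1)]
```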

• Q.198

Question
Find the Salary Difference Between Employees and Their Preceding Employee Using
LAG.

Explanation
The task is to calculate the salary difference between each employee and the
preceding employee in terms of salary using the LAG() window function. The LAG()
function allows you to access the salary of the previous employee in the ordered list,
and then compute the difference.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE EmployeeSalaries (
EmployeeID INT PRIMARY KEY,
Name VARCHAR(100),
Salary DECIMAL(10, 2)
);

• - Datasets
INSERT INTO EmployeeSalaries VALUES
(1, 'Henry Clark', 51000),
(2, 'Ava White', 55000),
(3, 'Isabella Davis', 56000),
(4, 'George Lewis', 58000),
(5, 'Mason Harris', 60000);

Learnings

• LAG(): The LAG() function allows you to reference the previous row's value,
here used to get the previous employee's salary.
• Salary Difference: Subtracting the previous salary from the current one
calculates the difference.
• ORDER BY: The employees are ordered by salary to ensure that the
"preceding employee" refers to the one with the next-lower salary.

Solutions

• - PostgreSQL solution
SELECT Name, Salary,
LAG(Salary, 1) OVER (ORDER BY Salary) AS PreviousSalary,
Salary - LAG(Salary, 1) OVER (ORDER BY Salary) AS SalaryDifference
FROM EmployeeSalaries;

• - MySQL solution
SELECT Name, Salary,
LAG(Salary, 1) OVER (ORDER BY Salary) AS PreviousSalary,
Salary - LAG(Salary, 1) OVER (ORDER BY Salary) AS SalaryDifference
FROM EmployeeSalaries;

• Q.199

Question
Find the Employee with the Highest Salary in Each Department Using
ROW_NUMBER.

Explanation
The task is to identify the employee with the highest salary in each department using
the ROW_NUMBER() window function. The ROW_NUMBER() function assigns a unique
rank to employees within each department, ordered by salary in descending order. The
employee with the highest salary in each department will have a rank of 1.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
Name VARCHAR(100),
DepartmentID INT,
Salary DECIMAL(10, 2)
);

• - Datasets
INSERT INTO Employees VALUES
(1, 'Oliver Phillips', 1, 64000),
(2, 'Sophia Wood', 2, 67000),
(3, 'Liam White', 1, 60000),
(4, 'Charlotte Scott', 2, 72000),
(5, 'Amelia Johnson', 1, 50000);

Learnings


• ROW_NUMBER(): This window function assigns a unique rank to each
employee based on the specified ordering (here, by salary in descending order).
• PARTITION BY: Used to group employees by department, so the ranking is
reset for each department.
• Filtering: After assigning ranks, we filter for the rows where the rank is 1 to
get the highest-paid employee in each department.

Solutions

• - PostgreSQL solution
WITH DepartmentSalaries AS (
SELECT Name, DepartmentID, Salary,
ROW_NUMBER() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS SalaryRank
FROM Employees
)
SELECT Name, DepartmentID, Salary
FROM DepartmentSalaries
WHERE SalaryRank = 1;

• - MySQL solution
WITH DepartmentSalaries AS (
SELECT Name, DepartmentID, Salary,
-- RANK is a reserved word in MySQL 8+, so avoid it as a bare alias
ROW_NUMBER() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS SalaryRank
FROM Employees
)
SELECT Name, DepartmentID, Salary
FROM DepartmentSalaries
WHERE SalaryRank = 1;

• Q.200

Question
Get Employees Who Are 3rd in Terms of Salary in Each Department Using
ROW_NUMBER.

Explanation
The task is to find employees who rank 3rd in terms of salary within their respective
departments using the ROW_NUMBER() window function. The ROW_NUMBER() function
assigns a unique rank to employees within each department, ordered by salary in
descending order. The employees who rank 3rd in their departments will have a rank
of 3.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
Name VARCHAR(100),
DepartmentID INT,
Salary DECIMAL(10, 2)
);

• - Datasets
INSERT INTO Employees VALUES
(1, 'John Black', 1, 48000),
(2, 'Oliver Smith', 2, 55000),
(3, 'Emily Harris', 1, 60000),
(4, 'Daniel Brown', 2, 65000),
(5, 'Sophia King', 1, 48000);

Learnings

• ROW_NUMBER(): This function assigns a unique rank to each employee
based on the ORDER BY clause, ensuring employees are ranked within each
department by their salary.
• PARTITION BY: The PARTITION BY clause ensures the ranking is done per
department.
• Filtering: After ranking the employees, we filter for those whose rank is 3,
identifying the 3rd highest-paid employee in each department.

Solutions

• - PostgreSQL solution
WITH DepartmentRank AS (
SELECT Name, DepartmentID, Salary,
ROW_NUMBER() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS SalaryRank
FROM Employees
)
SELECT Name, DepartmentID, Salary
FROM DepartmentRank
WHERE SalaryRank = 3;

• - MySQL solution
WITH DepartmentRank AS (
SELECT Name, DepartmentID, Salary,
-- RANK is a reserved word in MySQL 8+, so avoid it as a bare alias
ROW_NUMBER() OVER (PARTITION BY DepartmentID ORDER BY Salary DESC) AS SalaryRank
FROM Employees
)
SELECT Name, DepartmentID, Salary
FROM DepartmentRank
WHERE SalaryRank = 3;

String Functions

• Q.201

Question
Concatenate First and Last Names.
Explanation


The task is to concatenate the first and last names of employees into a single full name
using the CONCAT() function. This function combines the values of the first and last
name with a space in between.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50)
);

• - Datasets
INSERT INTO Employees VALUES
(1, 'John', 'Black'),
(2, 'Oliver', 'Smith'),
(3, 'Emily', 'Harris'),
(4, 'Daniel', 'Brown');

Learnings

• CONCAT(): This function concatenates multiple strings together. It can join
columns or values, with the option to add separators like spaces or commas.
• String Manipulation: Concatenating columns is a common way to generate
full names, addresses, or other composite fields from individual pieces of data.

Solutions

• - PostgreSQL solution
SELECT EmployeeID, CONCAT(FirstName, ' ', LastName) AS FullName
FROM Employees;

• - MySQL solution
SELECT EmployeeID, CONCAT(FirstName, ' ', LastName) AS FullName
FROM Employees;

• Q.202

Question
Extract Year from Date of Birth (TEXT).
Explanation
The task is to extract the year from the DateOfBirth column using the EXTRACT()
function, which is commonly used to retrieve specific parts of a date (e.g., year,
month, day).

Datasets and SQL Schemas


• - Table creation
CREATE TABLE People (
PersonID INT PRIMARY KEY,
Name VARCHAR(100),
DateOfBirth DATE
);

• - Datasets
INSERT INTO People VALUES
(1, 'Lucas Green', '1990-04-15'),
(2, 'Charlotte Brown', '1985-09-25'),
(3, 'Ethan White', '1992-03-10'),
(4, 'Mason Harris', '1988-11-05');

Learnings

• EXTRACT(): This function extracts a specified part (like year, month, or
day) from a date or timestamp.
• Date Manipulation: Extracting specific components from a date is common
for analyzing age, time periods, and more.

Solutions

• - PostgreSQL solution
SELECT Name, DateOfBirth,
EXTRACT(YEAR FROM DateOfBirth) AS BirthYear
FROM People;

• - MySQL solution
SELECT Name, DateOfBirth,
YEAR(DateOfBirth) AS BirthYear
FROM People;

Note: In PostgreSQL, the EXTRACT() function is used, whereas MySQL provides the
YEAR() function to directly extract the year from a DATE field.

• Q.203

Question
Remove Extra Spaces in a String (TRIM).
Explanation
The task is to remove any leading or trailing spaces from the ProductName column
using the TRIM() function, which eliminates whitespace characters from both ends of
a string.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Products (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(255)
);

• - Datasets
INSERT INTO Products VALUES
(1, ' Apple iPhone 12 '),
(2, ' Samsung Galaxy S21 '),
(3, ' Sony Xperia 1 II '),
(4, ' OnePlus 8 Pro ');

Learnings

• TRIM(): This function is used to remove leading and trailing spaces from a
string.
• String Cleaning: Removing extra spaces is essential for clean and consistent
data, especially when processing text for search or reporting.

Solutions

• - PostgreSQL solution
SELECT ProductID, TRIM(ProductName) AS TrimmedProductName
FROM Products;

• - MySQL solution
SELECT ProductID, TRIM(ProductName) AS TrimmedProductName
FROM Products;

Note: The TRIM() function works similarly in both PostgreSQL and MySQL for
removing leading and trailing whitespace.

• Q.204

Question
Replace Specific Characters in a Text (REPLACE).

Explanation
The task is to replace all commas with semicolons in the Address field using the
REPLACE() function. This function finds all occurrences of a specified substring (in
this case, commas) and replaces them with a new substring (semicolons).

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
Name VARCHAR(100),
Address VARCHAR(200)
);


• - Datasets
INSERT INTO Customers VALUES
(1, 'John Doe', '1234 Elm St., New York, NY 10001'),
(2, 'Alice Johnson', '5678 Oak Ave, Los Angeles, CA 90001'),
(3, 'Bob Brown', '4321 Pine Rd, Chicago, IL 60001');

Learnings

• REPLACE(): This function replaces occurrences of a substring with another
substring within a string.
• String Manipulation: Replacing characters is useful for formatting or
cleaning up data, such as changing delimiters or correcting characters.

Solutions

• - PostgreSQL solution
SELECT CustomerID, Name, REPLACE(Address, ',', ';') AS UpdatedAddress
FROM Customers;

• - MySQL solution
SELECT CustomerID, Name, REPLACE(Address, ',', ';') AS UpdatedAddress
FROM Customers;

Note: The REPLACE() function works similarly in both PostgreSQL and MySQL for
replacing specific characters or substrings within a string.

• Q.205

Question
Substring Extraction for Specific Position.
Explanation
The task is to extract the first 4 characters from the CompanyName column using the
SUBSTRING() function. The function allows you to extract a portion of a string
starting from a specific position.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Companies (
CompanyID INT PRIMARY KEY,
CompanyName VARCHAR(100),
Industry VARCHAR(100)
);

• - Datasets
INSERT INTO Companies VALUES
(1, 'Tech Corp', 'Technology'),
(2, 'Green Earth Ltd.', 'Environmental'),
(3, 'Global Ventures', 'Investment');


Learnings

• SUBSTRING(): This function extracts a part of a string based on the starting
position and length.
• String Slicing: Substring extraction is useful for getting portions of a string,
such as prefixes, suffixes, or specific positions.

Solutions

• - PostgreSQL solution
SELECT CompanyID, CompanyName, SUBSTRING(CompanyName FROM 1 FOR 4) AS NamePrefix
FROM Companies;

• - MySQL solution
SELECT CompanyID, CompanyName, SUBSTRING(CompanyName, 1, 4) AS NamePrefix
FROM Companies;

Note: In MySQL, the SUBSTRING() function uses a starting position and length, while
in PostgreSQL, it uses a more flexible syntax with the FROM and FOR keywords.
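SQLite spells the same operation SUBSTR(string, start, length), which is handy for quick local checks. An illustrative run through Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Companies (CompanyID INTEGER, CompanyName TEXT)")
cur.executemany("INSERT INTO Companies VALUES (?, ?)", [
    (1, 'Tech Corp'), (2, 'Green Earth Ltd.'), (3, 'Global Ventures')])
# SUBSTR(str, 1, 4) takes the first four characters, mirroring the book's query.
rows = cur.execute(
    "SELECT CompanyName, SUBSTR(CompanyName, 1, 4) AS NamePrefix FROM Companies"
).fetchall()
print(rows[0])  # ('Tech Corp', 'Tech')
```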

• Q.206

Question
Search for a Pattern in a String (LIKE Operator).

Explanation
The task is to find countries with names starting with the word "United" using the
LIKE operator. The LIKE operator is used for pattern matching in strings, where %
represents any sequence of characters.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Countries (
CountryID INT PRIMARY KEY,
CountryName VARCHAR(100),
Continent VARCHAR(50)
);

• - Datasets
INSERT INTO Countries VALUES
(1, 'United States', 'North America'),
(2, 'India', 'Asia'),
(3, 'United Kingdom', 'Europe'),
(4, 'South Korea', 'Asia');

Learnings


• LIKE Operator: This is used for pattern matching in SQL. The % symbol
matches any sequence of characters.
• Pattern Matching: The LIKE operator can be used for more complex
searches, such as finding words that start with, end with, or contain certain
substrings.

Solutions

• - PostgreSQL solution
SELECT CountryID, CountryName
FROM Countries
WHERE CountryName LIKE 'United%';

• - MySQL solution
SELECT CountryID, CountryName
FROM Countries
WHERE CountryName LIKE 'United%';

Note: The LIKE operator works the same way in both PostgreSQL and MySQL for
pattern matching in strings.

• Q.207

Question
Check for Null or Empty Strings.
Explanation
The task is to identify clients with either NULL or empty strings ('') in the Email
field. This can be done using the IS NULL condition to check for NULL values and a
comparison (= '') to check for empty strings.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Clients (
ClientID INT PRIMARY KEY,
ClientName VARCHAR(100),
Email VARCHAR(150)
);

• - Datasets
INSERT INTO Clients VALUES
(1, 'Jane Doe', '[email protected]'),
(2, 'Peter Parker', ''),
(3, 'Clark Kent', '[email protected]'),
(4, 'Bruce Wayne', NULL);

Learnings

• NULL check: Use IS NULL to identify NULL values in a column.


• Empty string check: Use = '' to identify columns that contain empty strings.
• Combined conditions: You can use OR to combine multiple conditions for
filtering.

Solutions

• - PostgreSQL solution
SELECT ClientID, ClientName, Email
FROM Clients
WHERE Email IS NULL OR Email = '';

• - MySQL solution
SELECT ClientID, ClientName, Email
FROM Clients
WHERE Email IS NULL OR Email = '';

Note: Both PostgreSQL and MySQL use the same syntax to check for NULL values
and empty strings.
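It is worth stressing in an interview that NULL and '' are different values: Email = '' never matches a NULL row, which is why both conditions are needed. A small SQLite check with hypothetical email addresses (the book's dataset redacts the real ones):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Clients (ClientID INTEGER, ClientName TEXT, Email TEXT)")
# Hypothetical addresses, for illustration only.
cur.executemany("INSERT INTO Clients VALUES (?, ?, ?)", [
    (1, 'Jane Doe', 'jane@example.com'), (2, 'Peter Parker', ''),
    (3, 'Clark Kent', 'clark@example.com'), (4, 'Bruce Wayne', None)])
rows = cur.execute("""
SELECT ClientName FROM Clients
WHERE Email IS NULL OR Email = ''
ORDER BY ClientID
""").fetchall()
# Only the empty-string row and the NULL row match.
print(rows)  # [('Peter Parker',), ('Bruce Wayne',)]
```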

• Q.208

Question
Find All Companies with a Specific Suffix (REGEX).

Explanation
The task is to find all startups that have the suffix "Inc." or "LLC" in their name using
regular expressions (REGEXP). The regular expression pattern (Inc\.|LLC)$ will
match any startup name ending with either "Inc." or "LLC". The $ symbol asserts that
the match is at the end of the string.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Startups (
StartupID INT PRIMARY KEY,
StartupName VARCHAR(100),
FoundedYear INT
);

• - Datasets
INSERT INTO Startups VALUES
(1, 'TechFusion Inc.', 2015),
(2, 'GreenPlanet LLC', 2018),
(3, 'EduTech Solutions', 2020),
(4, 'MedicaCorp Ltd.', 2017);

Learnings

• REGEXP: Regular expressions can be used in SQL for complex pattern matching.


• $ symbol: The $ asserts that the match must occur at the end of the string,
ensuring that only companies with the specified suffix are selected.
• Pattern Matching: The pipe | is used to match either "Inc." or "LLC".

Solutions

• - PostgreSQL solution
SELECT StartupID, StartupName
FROM Startups
WHERE StartupName ~ '(Inc\.|LLC)$';

• - MySQL solution
SELECT StartupID, StartupName
FROM Startups
WHERE StartupName REGEXP '(Inc\.|LLC)$';

Note: In MySQL, REGEXP is used for pattern matching, while in PostgreSQL, the tilde
~ operator is used to apply regular expressions.
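Outside the database, the same suffix pattern can be verified with any regex engine. A minimal Python sketch, using the startup names from the dataset above:

```python
import re

# Same pattern as the SQL solutions: anchor (Inc\.|LLC) to the end of the name.
SUFFIX = re.compile(r'(Inc\.|LLC)$')

startups = ['TechFusion Inc.', 'GreenPlanet LLC', 'EduTech Solutions', 'MedicaCorp Ltd.']
print([name for name in startups if SUFFIX.search(name)])
# ['TechFusion Inc.', 'GreenPlanet LLC']
```

The `$` anchor is what keeps 'MedicaCorp Ltd.' out of the result even though it also ends in a dot.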

• Q.209

Question
Find the Top 3 Most Frequent Words in a Text Column.
Explanation
This query extracts words from the Content column of the BlogPosts table, then
counts their frequency while excluding common stop words (e.g., "the", "is", "and",
etc.). It uses STRING_TO_ARRAY to split the content into words and UNNEST to flatten
the array into rows. The result is grouped by the word and ordered by frequency,
limiting the result to the top 3 most frequent words.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE BlogPosts (
PostID INT PRIMARY KEY,
Title VARCHAR(255),
Content TEXT
);

• - Datasets
INSERT INTO BlogPosts VALUES
(1, 'SQL Best Practices', 'SQL best practices are important for performance. Learn SQL!'),
(2, 'Learn SQL for Data Analysis', 'Learn SQL for data analysis. SQL is essential for data analysts.'),
(3, 'Advanced SQL Techniques', 'Master advanced SQL techniques and improve your skills.'),
(4, 'Introduction to Databases', 'Databases are the backbone of most modern applications. Learn how to manage databases efficiently.'),
(5, 'SQL vs NoSQL', 'SQL and NoSQL databases serve different purposes. SQL is great for structured data, while NoSQL is good for unstructured data.'),
(6, 'Optimizing SQL Queries', 'Optimizing SQL queries is key to improving performance. Learn how to write efficient queries and index your tables.'),
(7, 'Database Design Best Practices', 'Good database design can save time and effort. Learn how to structure your database for performance and scalability.'),
(8, 'What is Data Science?', 'Data science combines statistical analysis and machine learning to extract insights from data. Learn the fundamentals of data science!'),
(9, 'Introduction to Machine Learning', 'Machine learning algorithms can make predictions based on data. Start learning machine learning concepts today!'),
(10, 'Big Data and Cloud Computing', 'Big data and cloud computing are transforming industries. Learn how to leverage big data and cloud technologies for growth.');

Learnings

• String Manipulation: Using STRING_TO_ARRAY to split text into words and UNNEST to turn the array into rows.
• Excluding Stop Words: Filtering out common words using NOT IN.
• Aggregation: Counting word frequencies with COUNT(*) and grouping by
word.
• Ordering and Limiting: Sorting by frequency and limiting the result to the
top 3.

Solutions

• - PostgreSQL solution
SELECT Word, COUNT(*) AS Frequency
FROM (
SELECT UNNEST(STRING_TO_ARRAY(LOWER(Content), ' ')) AS Word
FROM BlogPosts
) AS Words
WHERE Word NOT IN ('the', 'and', 'is', 'for', 'a', 'an', 'in', 'of', 'to')
GROUP BY Word
ORDER BY Frequency DESC
LIMIT 3;

• - MySQL solution

MySQL has no built-in STRING_TO_ARRAY or UNNEST. On MySQL 8.0+, a recursive CTE
combined with SUBSTRING_INDEX can split the content into one word per row:
-- MySQL 8.0+ solution
WITH RECURSIVE numbers AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 100
),
Words AS (
    SELECT LOWER(SUBSTRING_INDEX(SUBSTRING_INDEX(Content, ' ', n), ' ', -1)) AS Word
    FROM BlogPosts
    JOIN numbers
      ON n <= LENGTH(Content) - LENGTH(REPLACE(Content, ' ', '')) + 1
)
SELECT Word, COUNT(*) AS Frequency
FROM Words
WHERE Word NOT IN ('the', 'and', 'is', 'for', 'a', 'an', 'in', 'of', 'to')
GROUP BY Word
ORDER BY Frequency DESC
LIMIT 3;

Note: The MySQL solution is more complex and would require additional handling to
break the text into individual words since MySQL lacks the direct functionality for
array splitting like PostgreSQL.
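The split–lowercase–filter–count pipeline that the PostgreSQL query performs can be mirrored in plain Python with collections.Counter. The two posts below are toy data, not the full BlogPosts table:

```python
from collections import Counter

STOP_WORDS = {'the', 'and', 'is', 'for', 'a', 'an', 'in', 'of', 'to'}

# Toy data: two short posts, not the full BlogPosts table.
posts = [
    'SQL best practices are important for performance. Learn SQL!',
    'Learn SQL for data analysis. SQL is essential for data analysts.',
]

# Split on spaces, lowercase, strip trailing punctuation, drop stop words.
words = [w.lower().strip('.!,') for post in posts for w in post.split()]
counts = Counter(w for w in words if w and w not in STOP_WORDS)
print(counts.most_common(3))  # [('sql', 4), ('learn', 2), ('data', 2)]
```

Note one difference: this sketch strips trailing punctuation, so 'SQL!' and 'SQL' count together, whereas the raw STRING_TO_ARRAY split on spaces would count them as distinct words.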


• Q.210

Question
Extract Country Code from Email Addresses (REGEX).

Explanation
The task is to extract the country code (or domain name) from the email domain using
a regular expression. The REGEXP_SUBSTR() function is used to capture the part of the
email after the "@" symbol, which typically represents the domain of the email
address.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Users (
UserID INT PRIMARY KEY,
Name VARCHAR(100),
Email VARCHAR(150)
);

• - Datasets
INSERT INTO Users VALUES
(1, 'John Doe', '[email protected]'),
(2, 'Alice Smith', '[email protected]'),
(3, 'Bob Johnson', '[email protected]'),
(4, 'Sophia Green', '[email protected]');

Learnings

• REGEXP_SUBSTR(): This function extracts the substring that matches the provided regular expression.
• Regular Expression: @([a-zA-Z]+) matches the "@" symbol followed by the letters of the domain label. Note that REGEXP_SUBSTR() returns the whole match (including the "@"), not just the captured group.
• Email Parsing: Extracting domain names is useful for categorizing or filtering
users based on their geographical or organizational domain.

Solutions

• - PostgreSQL solution
SELECT Name, Email,
REGEXP_SUBSTR(Email, '@([a-zA-Z]+)') AS CountryCode
FROM Users;

• - MySQL solution
SELECT Name, Email,
REGEXP_SUBSTR(Email, '@([a-zA-Z]+)') AS CountryCode
FROM Users;


Note: REGEXP_SUBSTR() is available in MySQL and in PostgreSQL 15 and later. On
older PostgreSQL versions, use SUBSTRING(Email FROM '@([a-zA-Z]+)'), which
returns the captured group directly (without the "@").
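The capture-group behaviour is easy to inspect with Python's re module. The address below is a made-up example, since the dataset's addresses are redacted:

```python
import re

# The SQL pattern '@([a-zA-Z]+)': '@' plus the letters that follow it.
# re.search exposes the capture group (without the '@') via group(1).
def email_domain(email):
    match = re.search(r'@([a-zA-Z]+)', email)
    return match.group(1) if match else None

print(email_domain('alice@example.com'))  # 'example'  (the match stops at the '.')
```

Because `[a-zA-Z]+` cannot match a dot, only the first domain label is captured, which is why this pattern yields a short token rather than the full domain.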

Date Functions

• Q.211

Question:
Convert Date to Text (Date Format Change)
Tables: Orders
-- Create table
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
OrderDate DATE,
CustomerID INT
);

-- Insert data
INSERT INTO Orders VALUES
(1, '2024-01-15', 101),
(2, '2024-02-20', 102),
(3, '2024-03-18', 103),
(4, '2024-04-25', 104),
(5, '2024-05-05', 105),
(6, '2024-06-10', 106),
(7, '2024-07-22', 107),
(8, '2024-08-13', 108),
(9, '2024-09-09', 109),
(10, '2024-10-30', 110);

Explanation:
Convert the OrderDate from a DATE type to a text string with the format "DD-Mon-
YYYY", where DD is the day, Mon is the abbreviated month name, and YYYY is the year.

Learnings:

• Use of TO_CHAR() function to convert dates to text.


• Formatting of date strings using specific patterns (e.g., DD, Mon, YYYY).

Solutions:

• PostgreSQL Solution:
SELECT OrderID, TO_CHAR(OrderDate, 'DD-Mon-YYYY') AS OrderDateText
FROM Orders;

• MySQL Solution:
SELECT OrderID, DATE_FORMAT(OrderDate, '%d-%b-%Y') AS OrderDateText
FROM Orders;

• Q.212

Question


Convert the JoiningDate (a string) to a date format 'YYYY-MM-DD'.


Explanation
You need to convert the JoiningDate column, which is stored as text, into an actual
DATE type using a specific format ('DD-MM-YYYY').

Datasets and SQL Schemas


-- Table creation
CREATE TABLE EmployeeRecords (
EmployeeID INT PRIMARY KEY,
JoiningDate VARCHAR(20),
Department VARCHAR(100)
);

-- Datasets
INSERT INTO EmployeeRecords (EmployeeID, JoiningDate, Department)
VALUES
(1, '01-12-2020', 'HR'),
(2, '15-02-2021', 'Finance'),
(3, '28-06-2019', 'Engineering'),
(4, '07-08-2018', 'Sales'),
(5, '20-11-2017', 'Marketing'),
(6, '11-05-2022', 'Product'),
(7, '02-09-2020', 'Operations'),
(8, '18-03-2019', 'Legal');

Learnings

• Converting strings to dates using specific date formats


• Handling date formats using database functions like TO_DATE
• Managing data type conversions in SQL

Solutions

• - PostgreSQL solution
SELECT EmployeeID, TO_DATE(JoiningDate, 'DD-MM-YYYY') AS JoiningDate
FROM EmployeeRecords;

• - MySQL solution
SELECT EmployeeID, STR_TO_DATE(JoiningDate, '%d-%m-%Y') AS JoiningDate
FROM EmployeeRecords;
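The same parse can be sketched with Python's strptime, where '%d-%m-%Y' plays the role of the 'DD-MM-YYYY' format mask; the function name is illustrative:

```python
from datetime import datetime

# Mirrors TO_DATE / STR_TO_DATE: parse a 'DD-MM-YYYY' string into a date value.
def parse_joining_date(text):
    return datetime.strptime(text, '%d-%m-%Y').date()

print(parse_joining_date('01-12-2020'))  # 2020-12-01
```

As in SQL, the format mask must match the stored string exactly; a mask of '%m-%d-%Y' would silently read '01-12-2020' as January 12 instead of December 1.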

• Q.213

Question
Calculate the end date by adding the LeaveDuration to the LeaveStartDate.

Explanation
You need to calculate the end date of a leave request by adding the LeaveDuration
(in days) to the LeaveStartDate.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE LeaveRequests (
RequestID INT PRIMARY KEY,
EmployeeID INT,
LeaveStartDate DATE,
LeaveDuration INT
);

-- Datasets
INSERT INTO LeaveRequests (RequestID, EmployeeID, LeaveStartDate, LeaveDuration)
VALUES
(1, 101, '2024-02-01', 5),
(2, 102, '2024-03-05', 3),
(3, 103, '2024-04-10', 7);

Learnings

• Performing date arithmetic (adding days to a date)


• Using INTERVAL to add a specified number of days to a date
• Handling date calculations in SQL

Solutions

• - PostgreSQL solution
SELECT RequestID, EmployeeID, LeaveStartDate,
LeaveStartDate + INTERVAL '1 day' * LeaveDuration AS LeaveEndDate
FROM LeaveRequests;

• - MySQL solution
SELECT RequestID, EmployeeID, LeaveStartDate,
DATE_ADD(LeaveStartDate, INTERVAL LeaveDuration DAY) AS LeaveEndDate
FROM LeaveRequests;
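The date arithmetic is the same in any language; a minimal Python sketch using timedelta, with the first leave request from the dataset:

```python
from datetime import date, timedelta

# Mirrors LeaveStartDate + INTERVAL LeaveDuration DAY.
def leave_end_date(start, duration_days):
    return start + timedelta(days=duration_days)

print(leave_end_date(date(2024, 2, 1), 5))  # 2024-02-06
```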

• Q.214

Question
Extract the year from the TransactionDate.

Explanation
You need to extract the year part from the TransactionDate column.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Transactions (
TransactionID INT PRIMARY KEY,
TransactionDate DATE,
Amount DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Transactions (TransactionID, TransactionDate, Amount)
VALUES
(1, '2023-11-20', 250.50),
(2, '2022-05-15', 180.75),
(3, '2024-01-10', 540.60);

Learnings


• Extracting specific parts of a date (e.g., year, month, day)


• Using the EXTRACT() function to retrieve the year
• Understanding date manipulation functions in SQL

Solutions

• - PostgreSQL solution
SELECT TransactionID, TransactionDate, EXTRACT(YEAR FROM TransactionDate) AS TransactionYear
FROM Transactions;

• - MySQL solution
SELECT TransactionID, TransactionDate, YEAR(TransactionDate) AS TransactionYear
FROM Transactions;

• Q.215

Question
Find the longest streak of consecutive days each user has logged in, based on
their LoginDate.

Explanation
You need to find the longest consecutive streak of days that each user has logged in.
A consecutive streak means there are no gaps (i.e., the difference between consecutive
login dates is 1 day). This requires identifying and grouping consecutive days, then
calculating the longest streak for each user.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE UserLogins (
UserID INT,
LoginDate DATE
);

-- Datasets
INSERT INTO UserLogins (UserID, LoginDate)
VALUES
(1, '2024-01-01'),
(1, '2024-01-02'),
(1, '2024-01-04'),
(1, '2024-01-05'),
(1, '2024-01-06'),
(2, '2024-01-01'),
(2, '2024-01-02'),
(2, '2024-01-04'),
(2, '2024-01-05'),
(3, '2024-01-03'),
(3, '2024-01-04'),
(3, '2024-01-05'),
(3, '2024-01-06'),
(4, '2024-01-01'),
(4, '2024-01-03'),
(4, '2024-01-04');

Learnings


• Using the ROW_NUMBER() window function to identify runs of consecutive dates
• Handling date arithmetic and comparing rows based on date differences
• Grouping consecutive dates and calculating the longest streak for each user

Solutions
PostgreSQL Solution
WITH RankedLogins AS (
    SELECT UserID, LoginDate,
           LoginDate - (ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY LoginDate))::INT AS StreakGroup
    FROM UserLogins
),
ConsecutiveStreaks AS (
    SELECT UserID, MIN(LoginDate) AS StreakStart, MAX(LoginDate) AS StreakEnd,
           COUNT(*) AS StreakLength
    FROM RankedLogins
    GROUP BY UserID, StreakGroup
)
SELECT UserID, MAX(StreakLength) AS LongestStreak
FROM ConsecutiveStreaks
GROUP BY UserID;

MySQL Solution (MySQL 8.0+)
WITH Numbered AS (
    SELECT UserID, LoginDate,
           ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY LoginDate) AS rn
    FROM UserLogins
),
RankedLogins AS (
    SELECT UserID, LoginDate,
           DATE_SUB(LoginDate, INTERVAL rn DAY) AS StreakGroup
    FROM Numbered
),
ConsecutiveStreaks AS (
    SELECT UserID, MIN(LoginDate) AS StreakStart, MAX(LoginDate) AS StreakEnd,
           COUNT(*) AS StreakLength
    FROM RankedLogins
    GROUP BY UserID, StreakGroup
)
SELECT UserID, MAX(StreakLength) AS LongestStreak
FROM ConsecutiveStreaks
GROUP BY UserID;

Explanation of the Solution:

1. Step 1: RankedLogins assigns a streak group to each login. ROW_NUMBER() is
partitioned by UserID and ordered by LoginDate; subtracting that row number (in
days) from the login date produces a constant value within any run of
consecutive dates, so each unbroken run gets its own StreakGroup key.

2. Step 2: The ConsecutiveStreaks CTE (Common Table Expression) groups logins
by UserID and StreakGroup and counts the rows in each group, which is the
length of that streak.

3. Step 3: Finally, we select the longest streak for each user by taking the
maximum streak length (MAX(StreakLength)).
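The gaps-and-islands trick is easy to check outside SQL. In this Python sketch (function name illustrative), subtracting the row index in days from each sorted login date yields a key that is constant within a run of consecutive days; a subset of the sample data is used, for which the longest streaks are 3 (user 1) and 2 (user 2):

```python
from datetime import date, timedelta
from itertools import groupby
from collections import defaultdict

def longest_streaks(logins):
    per_user = defaultdict(list)
    for user, day in logins:
        per_user[user].append(day)
    result = {}
    for user, days in per_user.items():
        days = sorted(set(days))
        # date minus row index: constant within any run of consecutive days.
        keys = [d - timedelta(days=i) for i, d in enumerate(days)]
        result[user] = max(len(list(group)) for _, group in groupby(keys))
    return result

logins = [
    (1, date(2024, 1, 1)), (1, date(2024, 1, 2)), (1, date(2024, 1, 4)),
    (1, date(2024, 1, 5)), (1, date(2024, 1, 6)),
    (2, date(2024, 1, 1)), (2, date(2024, 1, 2)),
]
print(longest_streaks(logins))  # {1: 3, 2: 2}
```

User 1's dates 4, 5, 6 all map to the same key (Jan 2), while 1 and 2 map to Jan 1, so the two islands are counted separately.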


• Q.216

Question
Find the total sales for each day of the week.

Explanation
You need to extract the day of the week from the TransactionDate and calculate the
total sales for each day using the Amount column.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Transactions (
TransactionID INT PRIMARY KEY,
TransactionDate DATE,
Amount DECIMAL(10, 2)
);

-- Datasets
INSERT INTO Transactions (TransactionID, TransactionDate, Amount)
VALUES
(1, '2023-11-20', 250.50),
(2, '2022-05-15', 180.75),
(3, '2024-01-10', 540.60),
(4, '2023-11-21', 320.40),
(5, '2023-11-22', 150.20),
(6, '2023-11-20', 430.30),
(7, '2023-11-23', 210.10),
(8, '2023-11-24', 300.00),
(9, '2023-11-25', 150.00),
(10, '2023-11-26', 500.00),
(11, '2023-11-27', 410.25),
(12, '2023-11-28', 100.40),
(13, '2023-11-29', 750.90),
(14, '2023-11-30', 600.75),
(15, '2023-12-01', 230.15),
(16, '2023-12-02', 185.20),
(17, '2023-12-03', 420.60),
(18, '2023-12-04', 520.45),
(19, '2023-12-05', 310.10),
(20, '2023-12-06', 450.25),
(21, '2023-12-07', 650.80),
(22, '2023-12-08', 370.50),
(23, '2023-12-09', 330.30),
(24, '2023-12-10', 490.40),
(25, '2023-12-11', 210.75),
(26, '2023-12-12', 320.10);

Learnings

• Using date functions to extract the day of the week (e.g., DAYOFWEEK(),
EXTRACT())
• Aggregating data using SUM() to calculate total sales
• Grouping results by day of the week

Solutions

• - PostgreSQL solution
SELECT EXTRACT(DOW FROM TransactionDate) AS DayOfWeek,
       SUM(Amount) AS TotalSales
FROM Transactions
GROUP BY EXTRACT(DOW FROM TransactionDate)
ORDER BY DayOfWeek;

• - MySQL solution
SELECT DAYOFWEEK(TransactionDate) AS DayOfWeek,
SUM(Amount) AS TotalSales
FROM Transactions
GROUP BY DAYOFWEEK(TransactionDate)
ORDER BY DayOfWeek;
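One caveat when comparing the two solutions: the day numbering differs. PostgreSQL's EXTRACT(DOW ...) runs 0 (Sunday) through 6 (Saturday), while MySQL's DAYOFWEEK() runs 1 (Sunday) through 7 (Saturday), so the same Monday is 1 in one system and 2 in the other. A quick Python check of the mapping (2023-11-20, the first transaction date, was a Monday):

```python
from datetime import date

d = date(2023, 11, 20)                 # a Monday
python_weekday = d.weekday()           # Python: 0 = Monday .. 6 = Sunday
postgres_dow = (d.weekday() + 1) % 7   # EXTRACT(DOW): 0 = Sunday .. 6 = Saturday
mysql_dayofweek = postgres_dow + 1     # DAYOFWEEK(): 1 = Sunday .. 7 = Saturday
print(python_weekday, postgres_dow, mysql_dayofweek)  # 0 1 2
```

If the report must label days by name, converting the number to a weekday name in the query (TO_CHAR / DAYNAME) sidesteps the numbering mismatch entirely.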

• Q.217

Question
Calculate the total ride duration for each ride, considering rides that span across
midnight (e.g., from 11:58 PM to 12:15 AM the next day) as well as rides that end on
the same day.

Explanation
You need to calculate the duration between the RideStartTime and RideEndTime,
handling both rides that span across midnight and those that end within the same day.
Ensure that the time difference is calculated correctly.

Datasets and SQL Schemas


-- Table creation (PostgreSQL uses TIMESTAMP where MySQL uses DATETIME)
CREATE TABLE UberRides (
RideID INT PRIMARY KEY,
RideStartTime DATETIME,
RideEndTime DATETIME
);

-- Datasets
INSERT INTO UberRides (RideID, RideStartTime, RideEndTime)
VALUES
(1, '2024-01-01 23:58:00', '2024-01-02 00:15:00'),
(2, '2024-01-02 14:30:00', '2024-01-02 15:10:00'),
(3, '2024-01-03 07:45:00', '2024-01-03 08:25:00'),
(4, '2024-01-03 22:10:00', '2024-01-04 00:05:00'),
(5, '2024-01-04 12:00:00', '2024-01-04 12:45:00'),
(6, '2024-01-02 08:00:00', '2024-01-02 08:45:00'),
(7, '2024-01-02 09:15:00', '2024-01-02 10:00:00'),
(8, '2024-01-02 11:30:00', '2024-01-02 12:15:00'),
(9, '2024-01-02 13:00:00', '2024-01-02 13:50:00'),
(10, '2024-01-03 06:30:00', '2024-01-03 07:00:00'),
(11, '2024-01-03 09:00:00', '2024-01-03 09:30:00'),
(12, '2024-01-03 11:45:00', '2024-01-03 12:30:00'),
(13, '2024-01-03 15:00:00', '2024-01-03 15:40:00'),
(14, '2024-01-03 16:10:00', '2024-01-03 16:55:00'),
(15, '2024-01-03 17:00:00', '2024-01-03 17:30:00'),
(16, '2024-01-03 18:15:00', '2024-01-03 19:00:00'),
(17, '2024-01-03 20:00:00', '2024-01-03 20:45:00'),
(18, '2024-01-03 21:30:00', '2024-01-03 22:15:00'),
(19, '2024-01-03 23:00:00', '2024-01-03 23:40:00'),
(20, '2024-01-04 00:10:00', '2024-01-04 00:40:00'),
(21, '2024-01-04 02:00:00', '2024-01-04 02:45:00'),
(22, '2024-01-04 03:00:00', '2024-01-04 03:35:00'),
(23, '2024-01-04 05:10:00', '2024-01-04 05:50:00'),
(24, '2024-01-04 06:30:00', '2024-01-04 07:15:00'),
(25, '2024-01-04 08:00:00', '2024-01-04 08:50:00');

Learnings


• Calculating time differences between two DATETIME values, including spans across midnight
• Using TIMESTAMPDIFF() or EXTRACT(EPOCH FROM ...) to compute ride durations
• Handling rides that end on the same day versus those that span midnight

Solutions

• - PostgreSQL solution
SELECT RideID,
       RideStartTime,
       RideEndTime,
       EXTRACT(EPOCH FROM (RideEndTime - RideStartTime)) / 60 AS RideDurationInMinutes
FROM UberRides;

• - MySQL solution
SELECT RideID,
RideStartTime,
RideEndTime,
TIMESTAMPDIFF(MINUTE, RideStartTime, RideEndTime) AS RideDurationInMinutes
FROM UberRides;
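Because both columns carry full date and time, plain timestamp subtraction already handles the midnight span; a Python sketch with the first ride from the dataset:

```python
from datetime import datetime

# Mirrors RideEndTime - RideStartTime: subtraction spans midnight correctly.
def ride_minutes(start, end):
    return (end - start).total_seconds() / 60

# Ride 1: 23:58 on Jan 1 to 00:15 on Jan 2.
print(ride_minutes(datetime(2024, 1, 1, 23, 58), datetime(2024, 1, 2, 0, 15)))  # 17.0
```

Midnight only becomes a problem when start and end are stored as time-of-day values without dates; then the difference must be wrapped modulo 24 hours.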

• Q.218

Question
Calculate the difference in months between the first and last purchase dates for
each customer.

Explanation
You need to calculate the number of months between the earliest purchase
(FirstPurchaseDate) and the latest purchase (LastPurchaseDate) for each
customer. You should handle edge cases where the purchases might be in different
years or months.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE CustomerPurchases (
CustomerID INT PRIMARY KEY,
PurchaseID INT,
PurchaseDate DATE
);

-- Datasets
INSERT INTO CustomerPurchases (CustomerID, PurchaseID, PurchaseDate)
VALUES
(1, 101, '2023-03-15'),
(1, 102, '2023-08-20'),
(1, 103, '2024-01-10'),
(2, 104, '2022-02-05'),
(2, 105, '2023-07-15'),
(2, 106, '2024-04-22'),
(3, 107, '2023-06-11'),
(3, 108, '2023-09-25');


Learnings

• Using DATEDIFF() and MONTH() to calculate date differences


• Handling date functions across months and years
• Grouping results by customer ID

Solutions

• - PostgreSQL solution
SELECT CustomerID,
       (DATE_PART('year', MAX(PurchaseDate)) - DATE_PART('year', MIN(PurchaseDate))) * 12
       + (DATE_PART('month', MAX(PurchaseDate)) - DATE_PART('month', MIN(PurchaseDate))) AS MonthsBetween
FROM CustomerPurchases
GROUP BY CustomerID;

• - MySQL solution
SELECT CustomerID,
       TIMESTAMPDIFF(MONTH, MIN(PurchaseDate), MAX(PurchaseDate)) AS MonthsBetween
FROM CustomerPurchases
GROUP BY CustomerID;
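Note that the two approaches do not always agree: the PostgreSQL arithmetic is a pure calendar-month difference (year gap times 12 plus month gap, day of month ignored), while MySQL's TIMESTAMPDIFF(MONTH, ...) counts only complete months, so it also looks at the day. The calendar arithmetic can be sketched in Python with the dataset's customers:

```python
from datetime import date

# Calendar-month difference: (year gap) * 12 + (month gap), day ignored.
def months_between(first, last):
    return (last.year - first.year) * 12 + (last.month - first.month)

# Customer 1: 2023-03-15 to 2024-01-10; Customer 2: 2022-02-05 to 2024-04-22.
print(months_between(date(2023, 3, 15), date(2024, 1, 10)))  # 10
print(months_between(date(2022, 2, 5), date(2024, 4, 22)))   # 26
```

For customer 1, TIMESTAMPDIFF(MONTH, ...) would return 9, because the tenth month is not complete until January 15.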

• Q.219

Question
Find the date of the last Friday of each month for the last 12 months.

Explanation
For each of the last 12 months, find the date of the last Friday. This involves
calculating the last day of the month and then adjusting backward to the previous
Friday if necessary.

Datasets and SQL Schemas


No datasets are needed for this query as it is focused on generating dynamic date
values based on the system’s date.

Learnings

• Using LAST_DAY() to find the last day of the month


• Using date manipulation functions to adjust to the last Friday of the month
• Handling dynamic date calculations for recurring time intervals (e.g., monthly)

Solutions

• - PostgreSQL solution
WITH month_ends AS (
    SELECT (DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month' * series
            + INTERVAL '1 month' - INTERVAL '1 day')::DATE AS LastDayOfMonth
    FROM generate_series(0, 11) AS series
)
SELECT LastDayOfMonth,
       LastDayOfMonth - ((EXTRACT(DOW FROM LastDayOfMonth)::INT + 2) % 7) AS LastFriday
FROM month_ends;

• - MySQL solution
SELECT
    LAST_DAY(CURRENT_DATE - INTERVAL n MONTH) AS LastDayOfMonth,
    DATE_SUB(LAST_DAY(CURRENT_DATE - INTERVAL n MONTH),
             INTERVAL (DAYOFWEEK(LAST_DAY(CURRENT_DATE - INTERVAL n MONTH)) + 1) % 7 DAY) AS LastFriday
FROM (SELECT 0 AS n UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
      UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
      UNION SELECT 10 UNION SELECT 11) AS months;
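The step-back logic is easier to see in isolation. In Python, date.weekday() numbers Friday as 4, so (weekday - 4) % 7 is how many days to walk back from the month's last day (0 when the last day already is a Friday); the function name is illustrative:

```python
import calendar
from datetime import date, timedelta

# Last Friday of a month: take the last day, then step back to the
# previous Friday. (weekday - 4) % 7 is 0 when the last day IS a Friday.
def last_friday(year, month):
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    return last_day - timedelta(days=(last_day.weekday() - 4) % 7)

print(last_friday(2024, 1))  # 2024-01-26 (Jan 31 2024 was a Wednesday)
```

The SQL offsets differ only because each system numbers weekdays differently; the modulo-7 step back is the same idea in all three.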

• Q.220

Question
Find the number of days since each user last logged in.

Explanation
You need to find the number of days since each user last logged in based on the
LastLoginDate column. This should be calculated from the current date, taking into
account the system's current date.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE UserLogins (
UserID INT PRIMARY KEY,
LastLoginDate DATE
);

-- Datasets
INSERT INTO UserLogins (UserID, LastLoginDate)
VALUES
(1, '2023-12-01'),
(2, '2024-01-05'),
(3, '2023-10-15'),
(4, '2023-11-23'),
(5, '2024-01-01');

Learnings

• Using CURRENT_DATE or NOW() to get the current date


• Using DATEDIFF() or DATE_SUB() to calculate the difference between dates
• Handling different time zones and ensuring accurate date calculations

Solutions

• - PostgreSQL solution
SELECT UserID,
CURRENT_DATE - LastLoginDate AS DaysSinceLastLogin
FROM UserLogins;

• - MySQL solution
SELECT UserID,
       DATEDIFF(CURRENT_DATE, LastLoginDate) AS DaysSinceLastLogin
FROM UserLogins;

Case Statements

• Q.221

Question
Use a CASE statement to categorize customers based on their TotalSpent.
Explanation
This question asks you to categorize customers into three spending categories: 'Low
Spender', 'Medium Spender', and 'High Spender', based on the TotalSpent value. Use
a CASE statement to assign these categories.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE CustomerPurchases (
CustomerID INT PRIMARY KEY,
TotalSpent DECIMAL(10, 2)
);

• - Datasets
INSERT INTO CustomerPurchases VALUES
(1, 350.00),
(2, 1500.00),
(3, 75.00),
(4, 230.00),
(5, 1200.00),
(6, 450.00),
(7, 25.00),
(8, 890.00),
(9, 1020.00),
(10, 300.00),
(11, 150.00),
(12, 600.00),
(13, 750.00),
(14, 50.00),
(15, 10.00);

Learnings

• Use of CASE statement for conditional logic.


• Categorization based on ranges in SQL.
• Handling different conditions using WHEN, BETWEEN, and ELSE.

Solutions

• - PostgreSQL solution
SELECT CustomerID, TotalSpent,
CASE
WHEN TotalSpent < 100 THEN 'Low Spender'
WHEN TotalSpent BETWEEN 100 AND 500 THEN 'Medium Spender'
ELSE 'High Spender'
END AS SpendingCategory
FROM CustomerPurchases;

• - MySQL solution
SELECT CustomerID, TotalSpent,
CASE
WHEN TotalSpent < 100 THEN 'Low Spender'
WHEN TotalSpent BETWEEN 100 AND 500 THEN 'Medium Spender'
ELSE 'High Spender'
END AS SpendingCategory
FROM CustomerPurchases;
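The same bucketing can be written as a chain of conditionals; note that BETWEEN 100 AND 500 is inclusive on both ends, which the sketch reproduces:

```python
# Mirrors the CASE statement; boundary values 100 and 500 fall in 'Medium Spender'.
def spending_category(total_spent):
    if total_spent < 100:
        return 'Low Spender'
    if total_spent <= 500:
        return 'Medium Spender'
    return 'High Spender'

print([spending_category(t) for t in (75.0, 100.0, 500.0, 1500.0)])
# ['Low Spender', 'Medium Spender', 'Medium Spender', 'High Spender']
```

Ordering the branches from the lowest threshold up is what lets each later branch assume the earlier conditions failed, exactly as CASE evaluates its WHEN clauses top to bottom.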

• Q.222

Question
Use a CASE statement to assign a salary grade based on Salary.
Explanation
This question asks you to categorize employees into salary grades: 'Grade A', 'Grade
B', and 'Grade C', based on their salary. Use a CASE statement to assign the grades
accordingly.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100),
Salary DECIMAL(10, 2)
);

• - Datasets
INSERT INTO Employees VALUES
(1, 'John Doe', 120000),
(2, 'Alice Smith', 90000),
(3, 'Bob Brown', 60000),
(4, 'Charlie Davis', 135000),
(5, 'Eve Harris', 75000),
(6, 'Frank Black', 110000),
(7, 'Grace White', 85000),
(8, 'Helen Green', 95000),
(9, 'Igor King', 65000),
(10, 'Jackie Lewis', 140000),
(11, 'Kevin Moore', 115000),
(12, 'Liam Wilson', 80000),
(13, 'Mona Clark', 90000),
(14, 'Nancy Adams', 70000),
(15, 'Oscar Scott', 105000);

Learnings


• Use of CASE statement to assign categories based on conditions.


• Salary ranges can be checked using BETWEEN in SQL.
• Categorization into grades or levels is a common use case for CASE.

Solutions

• - PostgreSQL solution
SELECT EmployeeID, EmployeeName, Salary,
CASE
WHEN Salary > 100000 THEN 'Grade A'
WHEN Salary BETWEEN 70000 AND 100000 THEN 'Grade B'
ELSE 'Grade C'
END AS SalaryGrade
FROM Employees;

• - MySQL solution
SELECT EmployeeID, EmployeeName, Salary,
CASE
WHEN Salary > 100000 THEN 'Grade A'
WHEN Salary BETWEEN 70000 AND 100000 THEN 'Grade B'
ELSE 'Grade C'
END AS SalaryGrade
FROM Employees;

• Q.223

Question
Use a CASE statement to determine the delivery status of each order.
Explanation
This question asks you to classify the delivery status of orders into three categories:
'Pending', 'Delivered On Time', and 'Late Delivery', based on whether the
DeliveryDate is NULL or falls before or after the OrderDate. Use a CASE statement to
categorize each order accordingly.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE,
DeliveryDate DATE
);

• - Datasets
INSERT INTO Orders VALUES
(1, 101, '2024-01-10', '2024-01-15'),
(2, 102, '2024-02-05', NULL),
(3, 103, '2024-01-15', '2024-01-20'),
(4, 104, '2024-02-01', '2024-02-03'),

(5, 105, '2024-01-25', '2024-01-30'),
(6, 106, '2024-01-18', '2024-01-20'),
(7, 107, '2024-02-10', NULL),
(8, 108, '2024-02-15', '2024-02-16'),
(9, 109, '2024-01-12', '2024-01-17'),
(10, 110, '2024-02-01', '2024-02-02'),
(11, 111, '2024-01-20', '2024-01-25'),
(12, 112, '2024-02-05', NULL),
(13, 113, '2024-01-28', '2024-02-01'),
(14, 114, '2024-02-08', '2024-02-10'),
(15, 115, '2024-02-12', NULL);

Learnings

• Using CASE for conditional logic based on date comparisons.


• Handling NULL values in SQL with IS NULL.
• Categorizing data based on conditions such as date ranges or missing values.

Solutions

• - PostgreSQL solution
SELECT OrderID, CustomerID, OrderDate, DeliveryDate,
CASE
WHEN DeliveryDate IS NULL THEN 'Pending'
WHEN DeliveryDate > OrderDate THEN 'Delivered On Time'
ELSE 'Late Delivery'
END AS DeliveryStatus
FROM Orders;

• - MySQL solution
SELECT OrderID, CustomerID, OrderDate, DeliveryDate,
CASE
WHEN DeliveryDate IS NULL THEN 'Pending'
WHEN DeliveryDate > OrderDate THEN 'Delivered On Time'
ELSE 'Late Delivery'
END AS DeliveryStatus
FROM Orders;

• Q.224

Question
Use a CASE statement to categorize products based on their Rating.
Explanation
This question requires you to categorize products based on their Rating. The
categories are 'Excellent' for ratings 4.5 and above, 'Good' for ratings between 3.5 and
4.4, and 'Average' for ratings below 3.5. Use a CASE statement to assign the
appropriate category.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE ProductReviews (
ProductID INT PRIMARY KEY,
Rating DECIMAL(3, 2)
);

• - Datasets
INSERT INTO ProductReviews VALUES
(101, 4.7),
(102, 3.5),
(103, 2.9),
(104, 4.0),
(105, 4.8),
(106, 3.2),
(107, 4.4),
(108, 3.8),
(109, 4.1),
(110, 2.5),
(111, 3.6),
(112, 4.9),
(113, 3.0),
(114, 2.0),
(115, 4.2);

Learnings

• Categorizing data using CASE based on numeric conditions.


• Using BETWEEN to check for a range of values.
• Handling various conditions for classification (e.g., >=, BETWEEN, ELSE).

Solutions

• - PostgreSQL solution
SELECT ProductID, Rating,
CASE
WHEN Rating >= 4.5 THEN 'Excellent'
WHEN Rating BETWEEN 3.5 AND 4.4 THEN 'Good'
ELSE 'Average'
END AS RatingCategory
FROM ProductReviews;

• - MySQL solution
SELECT ProductID, Rating,
CASE
WHEN Rating >= 4.5 THEN 'Excellent'
WHEN Rating BETWEEN 3.5 AND 4.4 THEN 'Good'
ELSE 'Average'
END AS RatingCategory
FROM ProductReviews;

• Q.225

Question
Use a CASE statement to mark high-value transactions.
Explanation


This question asks you to categorize transactions as either 'High Value' or 'Standard
Value' based on the TransactionAmount. If the transaction amount is greater than
1000, mark it as 'High Value'; otherwise, it should be 'Standard Value'. Use a CASE
statement for this categorization.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE CryptoTransactions (
TransactionID INT PRIMARY KEY,
TransactionDate DATE,
TransactionAmount DECIMAL(10, 2),
TransactionType VARCHAR(50)
);

• - Datasets
INSERT INTO CryptoTransactions VALUES
(1, '2024-01-10', 1000, 'Deposit'),
(2, '2024-02-05', 500, 'Withdrawal'),
(3, '2024-02-10', 1500, 'Deposit'),
(4, '2024-02-15', 300, 'Deposit'),
(5, '2024-02-20', 100, 'Withdrawal'),
(6, '2024-02-25', 800, 'Deposit'),
(7, '2024-03-01', 1200, 'Withdrawal'),
(8, '2024-03-05', 500, 'Deposit'),
(9, '2024-03-10', 200, 'Withdrawal'),
(10, '2024-03-15', 1500, 'Deposit'),
(11, '2024-03-20', 900, 'Withdrawal'),
(12, '2024-03-25', 1300, 'Deposit'),
(13, '2024-03-30', 700, 'Withdrawal'),
(14, '2024-04-01', 2500, 'Deposit'),
(15, '2024-04-05', 1000, 'Withdrawal');

Learnings

• Using CASE to categorize data based on numeric conditions.


• Handling simple > comparisons for classification.
• Assigning different statuses based on conditions in SQL.

Solutions

• - PostgreSQL solution
SELECT TransactionID, TransactionDate, TransactionAmount, TransactionType,
CASE
WHEN TransactionAmount > 1000 THEN 'High Value'
ELSE 'Standard Value'
END AS TransactionStatus
FROM CryptoTransactions;

• - MySQL solution
SELECT TransactionID, TransactionDate, TransactionAmount, TransactionType,
CASE
WHEN TransactionAmount > 1000 THEN 'High Value'
ELSE 'Standard Value'
END AS TransactionStatus
FROM CryptoTransactions;

• Q.226

Question
Use a CASE statement to mark products as Available or Out of Stock.
Explanation
This question requires you to determine the availability of products based on their
StockQuantity. If the stock quantity is greater than 0, mark the product as
'Available'; otherwise, mark it as 'Out of Stock'. A CASE statement is used for this
categorization.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Products (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(100),
StockQuantity INT
);

• - Datasets
INSERT INTO Products VALUES
(101, 'T-shirt', 50),
(102, 'Jeans', 0),
(103, 'Jacket', 5),
(104, 'Hat', 30),
(105, 'Scarf', 0),
(106, 'Socks', 15),
(107, 'Sweater', 10),
(108, 'Shoes', 20),
(109, 'Gloves', 0),
(110, 'Coat', 8),
(111, 'Dress', 25),
(112, 'Skirt', 12),
(113, 'Blouse', 40),
(114, 'Belt', 5),
(115, 'Trousers', 0);

Learnings

• Using CASE for conditional categorization based on numeric values.


• Checking for available stock with a simple condition (> 0).
• Categorizing products based on stock availability.

Solutions

• - PostgreSQL solution
SELECT ProductID, ProductName, StockQuantity,
CASE
WHEN StockQuantity > 0 THEN 'Available'
ELSE 'Out of Stock'
END AS Availability
FROM Products;

• - MySQL solution
SELECT ProductID, ProductName, StockQuantity,
CASE
WHEN StockQuantity > 0 THEN 'Available'
ELSE 'Out of Stock'
END AS Availability
FROM Products;

• Q.227

Question
Use a CASE statement to categorize customers by their age.
Explanation
This question asks you to categorize customers based on their Age into three groups:
'Young' for ages less than 35, 'Middle-aged' for ages between 35 and 55, and 'Senior'
for ages 56 and above. A CASE statement is used to assign the appropriate age
category.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
CustomerName VARCHAR(100),
Age INT
);

• - Datasets
INSERT INTO Customers VALUES
(1, 'John Doe', 30),
(2, 'Alice Smith', 65),
(3, 'Bob Brown', 45),
(4, 'Charlie Davis', 25),
(5, 'Eve Harris', 55),
(6, 'Frank Black', 40),
(7, 'Grace White', 20),
(8, 'Helen Green', 33),
(9, 'Igor King', 60),
(10, 'Jackie Lewis', 28),
(11, 'Kevin Moore', 50),
(12, 'Liam Wilson', 60),
(13, 'Mona Clark', 35),
(14, 'Nancy Adams', 70),
(15, 'Oscar Scott', 41);

Learnings


• Using CASE to categorize data based on age ranges.


• Using BETWEEN for a range of values in SQL.
• Assigning different categories based on conditions.

Solutions

• - PostgreSQL solution
SELECT CustomerID, CustomerName, Age,
CASE
WHEN Age < 35 THEN 'Young'
WHEN Age BETWEEN 35 AND 55 THEN 'Middle-aged'
ELSE 'Senior'
END AS AgeCategory
FROM Customers;

• - MySQL solution
SELECT CustomerID, CustomerName, Age,
CASE
WHEN Age < 35 THEN 'Young'
WHEN Age BETWEEN 35 AND 55 THEN 'Middle-aged'
ELSE 'Senior'
END AS AgeCategory
FROM Customers;

• Q.228

Question
Sales Performance Categorization
Given a table of sales transactions, categorize sales performance based on the total
amount of sales for each employee. If an employee’s total sales exceed $100,000,
classify them as 'Top Performer'. If the total sales are between $50,000 and $100,000,
classify them as 'Average Performer'. If the total sales are below $50,000, classify
them as 'Low Performer'.
Explanation
You need to group the sales transactions by employee, calculate the total sales per
employee, and then categorize them based on their sales performance.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE SalesTransactions (
EmployeeID INT,
TransactionAmount DECIMAL(10, 2)
);

• - Datasets

INSERT INTO SalesTransactions VALUES
(1, 15000),
(1, 35000),
(1, 40000),
(2, 5000),
(2, 30000),
(2, 20000),
(3, 10000),
(3, 25000),
(3, 4000),
(4, 60000);

Learnings

• Aggregating data using SUM to calculate total sales.


• Categorizing the data using conditions.
• Combining aggregation and conditional logic.

Solution (PostgreSQL / MySQL)


SELECT EmployeeID, SUM(TransactionAmount) AS TotalSales,
CASE
WHEN SUM(TransactionAmount) > 100000 THEN 'Top Performer'
WHEN SUM(TransactionAmount) BETWEEN 50000 AND 100000 THEN 'Average Performer'
ELSE 'Low Performer'
END AS PerformanceCategory
FROM SalesTransactions
GROUP BY EmployeeID;
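The combination of SUM, GROUP BY, and CASE can be verified against the sample data with Python's sqlite3 module (a stand-in for PostgreSQL/MySQL here): employee 1 totals 90,000, employees 2 and 4 total 55,000 and 60,000, and employee 3 totals only 39,000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalesTransactions (EmployeeID INT, TransactionAmount REAL)")
conn.executemany("INSERT INTO SalesTransactions VALUES (?, ?)",
                 [(1, 15000), (1, 35000), (1, 40000),
                  (2, 5000), (2, 30000), (2, 20000),
                  (3, 10000), (3, 25000), (3, 4000),
                  (4, 60000)])

rows = conn.execute("""
    SELECT EmployeeID, SUM(TransactionAmount) AS TotalSales,
           CASE
               WHEN SUM(TransactionAmount) > 100000 THEN 'Top Performer'
               WHEN SUM(TransactionAmount) BETWEEN 50000 AND 100000 THEN 'Average Performer'
               ELSE 'Low Performer'
           END AS PerformanceCategory
    FROM SalesTransactions
    GROUP BY EmployeeID
    ORDER BY EmployeeID
""").fetchall()

print(rows)
# [(1, 90000.0, 'Average Performer'), (2, 55000.0, 'Average Performer'),
#  (3, 39000.0, 'Low Performer'), (4, 60000.0, 'Average Performer')]
```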

• Q.229

Question
Product Pricing Strategy
Given a table of product prices, identify the pricing strategy for each product. If the
price is greater than $100, categorize it as 'Premium'. If the price is between $50 and
$100, categorize it as 'Standard'. If the price is below $50, categorize it as 'Discount'.
Explanation
You need to categorize products based on their prices.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Products (
ProductID INT,
ProductName VARCHAR(100),
Price DECIMAL(10, 2)
);

• - Datasets
INSERT INTO Products VALUES
(1, 'Smartphone', 150),

(2, 'Laptop', 999),
(3, 'Headphones', 40),
(4, 'Tablet', 250),
(5, 'Charger', 25),
(6, 'Smartwatch', 75),
(7, 'Camera', 200),
(8, 'Mouse', 20),
(9, 'Keyboard', 80);

Learnings

• Using CASE to classify products based on price.


• Leveraging numeric conditions for categorization.
• Handling price ranges effectively in SQL.

Solution (PostgreSQL / MySQL)


SELECT ProductID, ProductName, Price,
CASE
WHEN Price > 100 THEN 'Premium'
WHEN Price BETWEEN 50 AND 100 THEN 'Standard'
ELSE 'Discount'
END AS PricingStrategy
FROM Products;

• Q.230

Question
Employee Tenure and Salary Adjustment
For each employee, determine if they are eligible for a salary increase based on their
years of service. If an employee has been with the company for 5 years or more, they
are eligible for a '10% Increase'. If they have been with the company for less than 5
years but more than 2 years, they are eligible for a '5% Increase'. Employees with less
than 2 years are not eligible for any increase.
Explanation
You need to calculate the years of service for each employee and apply different
salary increases based on their tenure. The date of hire is provided, and you need to
compute the years of service by comparing the current date with the hire date.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE Employees (
EmployeeID INT,
EmployeeName VARCHAR(100),
HireDate DATE,
Salary DECIMAL(10, 2)
);

• - Datasets
INSERT INTO Employees VALUES

(1, 'John Doe', '2018-03-01', 50000),
(2, 'Alice Smith', '2021-06-15', 45000),
(3, 'Bob Brown', '2017-09-20', 60000),
(4, 'Charlie Davis', '2019-11-01', 55000),
(5, 'Eve Harris', '2022-01-10', 40000),
(6, 'Frank Black', '2020-05-15', 48000),
(7, 'Grace White', '2015-12-12', 70000);

Learnings

• Calculating the difference between dates using DATEDIFF or date subtraction.


• Using CASE to apply different conditions based on computed tenure.
• Handling conditional logic for salary adjustments.

Solution (PostgreSQL / MySQL)


SELECT EmployeeID, EmployeeName, Salary,
CASE
WHEN EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM HireDate) >= 5 THEN '10% Increase'
WHEN EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM HireDate) > 2 THEN '5% Increase'
ELSE 'No Increase'
END AS SalaryAdjustment
FROM Employees;
-- Note: subtracting year parts only approximates tenure (it ignores month and day).
-- For exact tenure, use AGE(HireDate) in PostgreSQL or TIMESTAMPDIFF(YEAR, HireDate, CURDATE()) in MySQL.
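The tenure logic can be sketched with Python's sqlite3 module. SQLite has no EXTRACT, so this sketch substitutes strftime('%Y', ...) for the year parts, and it pins the reference date to '2024-07-01' (an assumption made so the output is deterministic; swap in 'now' for live data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INT, EmployeeName TEXT, HireDate TEXT, Salary REAL)")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)",
                 [(1, 'John Doe', '2018-03-01', 50000),
                  (2, 'Alice Smith', '2021-06-15', 45000),
                  (5, 'Eve Harris', '2022-01-10', 40000)])

# Year difference against a fixed reference date, then the same tiered CASE
rows = conn.execute("""
    SELECT EmployeeName,
           CASE
               WHEN CAST(strftime('%Y', '2024-07-01') AS INTEGER)
                    - CAST(strftime('%Y', HireDate) AS INTEGER) >= 5 THEN '10% Increase'
               WHEN CAST(strftime('%Y', '2024-07-01') AS INTEGER)
                    - CAST(strftime('%Y', HireDate) AS INTEGER) > 2 THEN '5% Increase'
               ELSE 'No Increase'
           END AS SalaryAdjustment
    FROM Employees
    ORDER BY EmployeeID
""").fetchall()

print(rows)
# [('John Doe', '10% Increase'), ('Alice Smith', '5% Increase'), ('Eve Harris', 'No Increase')]
```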

Set Operations (Union, Union All, Excepts)

• Q.231

Merging Customer Feedback from Different Sources


Question
You are given two tables, WebsiteFeedback and StoreFeedback, where both tables
store customer feedback for a company. Each table has CustomerID and Feedback
columns. The WebsiteFeedback table contains feedback from the website, and the
StoreFeedback table contains feedback from physical stores. Write a query to merge
these tables and return a unified list of feedback, removing any duplicate entries.
Explanation
This task involves merging data from two different sources while ensuring that
duplicates are removed. You'll use the UNION operator to combine both tables while
eliminating duplicates.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE WebsiteFeedback (
CustomerID INT,
Feedback TEXT
);

• - Datasets

INSERT INTO WebsiteFeedback VALUES
(1, 'Great website!'),
(2, 'Easy to navigate'),
(3, 'Loved the product range'),
(4, 'Excellent service');

• - Table creation
CREATE TABLE StoreFeedback (
CustomerID INT,
Feedback TEXT
);

• - Datasets
INSERT INTO StoreFeedback VALUES
(1, 'Great website!'),
(5, 'Friendly staff'),
(6, 'Store was clean and organized'),
(4, 'Excellent service');

Learnings

• Using UNION to merge data from different tables.


• Eliminating duplicate entries using UNION.
• Combining results from multiple sources without duplication.

Solution (PostgreSQL / MySQL)


SELECT CustomerID, Feedback
FROM WebsiteFeedback
UNION
SELECT CustomerID, Feedback
FROM StoreFeedback;
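The deduplicating behavior of UNION is easy to demonstrate with Python's sqlite3 module (a stand-in for PostgreSQL/MySQL): the two shared rows collapse, so UNION returns 6 rows where UNION ALL would return all 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WebsiteFeedback (CustomerID INT, Feedback TEXT)")
conn.execute("CREATE TABLE StoreFeedback (CustomerID INT, Feedback TEXT)")
conn.executemany("INSERT INTO WebsiteFeedback VALUES (?, ?)",
                 [(1, 'Great website!'), (2, 'Easy to navigate'),
                  (3, 'Loved the product range'), (4, 'Excellent service')])
conn.executemany("INSERT INTO StoreFeedback VALUES (?, ?)",
                 [(1, 'Great website!'), (5, 'Friendly staff'),
                  (6, 'Store was clean and organized'), (4, 'Excellent service')])

union_rows = conn.execute(
    "SELECT CustomerID, Feedback FROM WebsiteFeedback "
    "UNION SELECT CustomerID, Feedback FROM StoreFeedback").fetchall()
union_all_rows = conn.execute(
    "SELECT CustomerID, Feedback FROM WebsiteFeedback "
    "UNION ALL SELECT CustomerID, Feedback FROM StoreFeedback").fetchall()

print(len(union_rows), len(union_all_rows))  # 6 8
```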

• Q.232

Question
You are given two product catalogs: CatalogA and CatalogB. Each table contains
columns ProductID and ProductName. Write a query to find the products that appear
in both catalogs (common products), and also find the products that are in either one
but not the other (unique products). Use appropriate set operations to achieve this.
Explanation
This question requires you to:

• Find the intersection (common products) using INTERSECT.


• Find the exclusive products using EXCEPT.

Datasets and SQL Schemas

• - Table creation

CREATE TABLE CatalogA (
ProductID INT,
ProductName VARCHAR(100)
);

• - Datasets
INSERT INTO CatalogA VALUES
(101, 'Laptop'),
(102, 'Smartphone'),
(103, 'Tablet'),
(104, 'Monitor');

• - Table creation
CREATE TABLE CatalogB (
ProductID INT,
ProductName VARCHAR(100)
);

• - Datasets
INSERT INTO CatalogB VALUES
(102, 'Smartphone'),
(103, 'Tablet'),
(105, 'Keyboard'),
(106, 'Mouse');

Learnings

• Using INTERSECT to find common items between two datasets.


• Using EXCEPT to find items that are present in one dataset but not the other.
• Combining set operations for complex data comparisons.

Solution (PostgreSQL / MySQL)


-- Find common products (intersection)
SELECT ProductID, ProductName
FROM CatalogA
INTERSECT
SELECT ProductID, ProductName
FROM CatalogB;

-- Find unique products (items in either CatalogA or CatalogB but not both)
-- Parentheses are required: EXCEPT and UNION otherwise evaluate left to right,
-- which would return only the products unique to CatalogB
(SELECT ProductID, ProductName
FROM CatalogA
EXCEPT
SELECT ProductID, ProductName
FROM CatalogB)
UNION
(SELECT ProductID, ProductName
FROM CatalogB
EXCEPT
SELECT ProductID, ProductName
FROM CatalogA);
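These set operations can be checked with Python's sqlite3 module. SQLite does not accept parentheses around compound-query operands, so this sketch runs the two EXCEPT halves separately and merges them in Python; it confirms that the intersection is {102, 103} and the symmetric difference is {101, 104, 105, 106}:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CatalogA (ProductID INT, ProductName TEXT)")
conn.execute("CREATE TABLE CatalogB (ProductID INT, ProductName TEXT)")
conn.executemany("INSERT INTO CatalogA VALUES (?, ?)",
                 [(101, 'Laptop'), (102, 'Smartphone'), (103, 'Tablet'), (104, 'Monitor')])
conn.executemany("INSERT INTO CatalogB VALUES (?, ?)",
                 [(102, 'Smartphone'), (103, 'Tablet'), (105, 'Keyboard'), (106, 'Mouse')])

common = conn.execute(
    "SELECT ProductID FROM CatalogA INTERSECT "
    "SELECT ProductID FROM CatalogB ORDER BY ProductID").fetchall()
only_in_a = conn.execute(
    "SELECT ProductID FROM CatalogA EXCEPT SELECT ProductID FROM CatalogB").fetchall()
only_in_b = conn.execute(
    "SELECT ProductID FROM CatalogB EXCEPT SELECT ProductID FROM CatalogA").fetchall()
unique = sorted(r[0] for r in only_in_a + only_in_b)

print(common, unique)  # [(102,), (103,)] [101, 104, 105, 106]
```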

• Q.233

Question

You are managing two course enrollment tables: MathEnrollment and
ScienceEnrollment. Both tables have StudentID and CourseName columns. Write a
query to:

1. Find students who are enrolled in either the Math or Science course (or both).

2. Find students who are only enrolled in one course (i.e., students enrolled only
in Math or only in Science, but not both).

Explanation
This involves:

• Using UNION to find all students enrolled in either course.


• Using EXCEPT to find students who are enrolled in only one course and not the
other.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE MathEnrollment (
StudentID INT,
CourseName VARCHAR(100)
);

• - Datasets
INSERT INTO MathEnrollment VALUES
(1, 'Math'),
(2, 'Math'),
(3, 'Math'),
(4, 'Math');

• - Table creation
CREATE TABLE ScienceEnrollment (
StudentID INT,
CourseName VARCHAR(100)
);

• - Datasets
INSERT INTO ScienceEnrollment VALUES
(2, 'Science'),
(3, 'Science'),
(5, 'Science');

Learnings

• Using UNION to combine data from two sources.


• Using EXCEPT to find students enrolled only in one course and not the other.
• Understanding set operations for filtering and combining data from different
sources.


Solution (PostgreSQL / MySQL)


-- Find students enrolled in either Math or Science or both
SELECT StudentID, CourseName
FROM MathEnrollment
UNION
SELECT StudentID, CourseName
FROM ScienceEnrollment;

-- Find students enrolled only in one course (Math or Science but not both)
-- Compare on StudentID only: CourseName always differs between the two tables,
-- so including it would make EXCEPT remove nothing.
-- Parentheses are required because EXCEPT and UNION evaluate left to right.
(SELECT StudentID
FROM MathEnrollment
EXCEPT
SELECT StudentID
FROM ScienceEnrollment)
UNION
(SELECT StudentID
FROM ScienceEnrollment
EXCEPT
SELECT StudentID
FROM MathEnrollment);

• Q.234

Question
You are given two tables, CarModels2023 and CarModels2024, which contain the list
of Tesla car models for the years 2023 and 2024. Both tables have columns ModelID
and ModelName. Write a query to get a unified list of all Tesla car models from both
years, ensuring that duplicates are removed.
Explanation
Use UNION to merge the car models from both years and remove duplicates.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE CarModels2023 (
ModelID INT,
ModelName VARCHAR(100)
);

• - Datasets
INSERT INTO CarModels2023 VALUES
(1, 'Model S'),
(2, 'Model 3'),
(3, 'Model X'),
(4, 'Model Y');

• - Table creation
CREATE TABLE CarModels2024 (
ModelID INT,
ModelName VARCHAR(100)
);


• - Datasets
INSERT INTO CarModels2024 VALUES
(1, 'Model S'),
(2, 'Model 3'),
(5, 'Cybertruck'),
(6, 'Roadster');

Learnings

• Using UNION to remove duplicates while merging data.


• Handling data across different years in the same category.

Solution (PostgreSQL / MySQL)


SELECT ModelID, ModelName
FROM CarModels2023
UNION
SELECT ModelID, ModelName
FROM CarModels2024;

• Q.235

Question
You are given two tables, ElectricTeslaModels and StandardTeslaModels, that
contain lists of Tesla car models. The ElectricTeslaModels table lists models that
are electric, and the StandardTeslaModels table lists standard (non-electric) Tesla
models. Write a query to get a list of all Tesla models, including electric and
non-electric, ensuring that duplicate entries are included when a model appears in both
tables.
Explanation
Use UNION ALL to include all models from both tables, even if some models appear in
both tables.

Datasets and SQL Schemas

• - Table creation
CREATE TABLE ElectricTeslaModels (
ModelID INT,
ModelName VARCHAR(100)
);

• - Datasets
INSERT INTO ElectricTeslaModels VALUES
(1, 'Model S'),
(2, 'Model 3'),
(3, 'Model X'),
(4, 'Model Y');

• - Table creation

CREATE TABLE StandardTeslaModels (
ModelID INT,
ModelName VARCHAR(100)
);

• - Datasets
INSERT INTO StandardTeslaModels VALUES
(2, 'Model 3'),
(5, 'Cybertruck'),
(6, 'Roadster');

Learnings

• Using UNION ALL to combine all records from different tables, including
duplicates.
• Merging different data sets while preserving duplicate entries.

Solution (PostgreSQL / MySQL)


SELECT ModelID, ModelName
FROM ElectricTeslaModels
UNION ALL
SELECT ModelID, ModelName
FROM StandardTeslaModels;
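Running this with Python's sqlite3 module (again, only a stand-in for PostgreSQL/MySQL) shows that UNION ALL preserves the duplicate: 4 electric rows plus 3 standard rows yield 7 rows, with 'Model 3' appearing twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ElectricTeslaModels (ModelID INT, ModelName TEXT)")
conn.execute("CREATE TABLE StandardTeslaModels (ModelID INT, ModelName TEXT)")
conn.executemany("INSERT INTO ElectricTeslaModels VALUES (?, ?)",
                 [(1, 'Model S'), (2, 'Model 3'), (3, 'Model X'), (4, 'Model Y')])
conn.executemany("INSERT INTO StandardTeslaModels VALUES (?, ?)",
                 [(2, 'Model 3'), (5, 'Cybertruck'), (6, 'Roadster')])

names = [r[0] for r in conn.execute(
    "SELECT ModelName FROM ElectricTeslaModels "
    "UNION ALL "
    "SELECT ModelName FROM StandardTeslaModels").fetchall()]

print(len(names), names.count('Model 3'))  # 7 2
```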
• Q.236
Question
You are given two tables, ElectricCars and HybridCars, which contain Tesla car models
available in electric and hybrid variants, respectively. Each table has ModelID and
ModelName. Write a query to find the car models that are common to both electric and hybrid
categories, and the ones that are unique to either electric or hybrid.
Explanation
• Use INTERSECT to find the common models between the two tables.
• Use EXCEPT to find the models that are exclusive to each category.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE ElectricCars (
ModelID INT,
ModelName VARCHAR(100)
);
• - Datasets
INSERT INTO ElectricCars VALUES
(1, 'Model S'),
(2, 'Model 3'),
(3, 'Model X'),
(4, 'Model Y');
• - Table creation
CREATE TABLE HybridCars (
ModelID INT,
ModelName VARCHAR(100)
);
• - Datasets
INSERT INTO HybridCars VALUES

(2, 'Model 3'),
(4, 'Model Y'),
(5, 'Cybertruck');

Learnings
• Using INTERSECT to find common records.
• Using EXCEPT to find unique records between tables.

Solution (PostgreSQL / MySQL)


-- Find common models (electric and hybrid)
SELECT ModelID, ModelName
FROM ElectricCars
INTERSECT
SELECT ModelID, ModelName
FROM HybridCars;

-- Find models exclusive to each category (electric or hybrid)
-- Parentheses are required: EXCEPT and UNION otherwise evaluate left to right
(SELECT ModelID, ModelName
FROM ElectricCars
EXCEPT
SELECT ModelID, ModelName
FROM HybridCars)
UNION
(SELECT ModelID, ModelName
FROM HybridCars
EXCEPT
SELECT ModelID, ModelName
FROM ElectricCars);
• Q.237
Question
You are given two tables: ElectronicsPurchases and ClothingPurchases. Both tables
store CustomerID and ProductID for customers who purchased items from the respective
categories. Write a query to find customers who bought items from only one category (either
Electronics or Clothing, but not both).
Explanation
Use EXCEPT to find customers who bought items from one table and not the other.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE ElectronicsPurchases (
CustomerID INT,
ProductID INT
);
• - Datasets
INSERT INTO ElectronicsPurchases VALUES
(1, 101),
(2, 102),
(3, 103),
(4, 104);
• - Table creation
CREATE TABLE ClothingPurchases (
CustomerID INT,
ProductID INT
);
• - Datasets
INSERT INTO ClothingPurchases VALUES

(2, 201),
(3, 202),
(5, 203),
(6, 204);

Learnings
• Using EXCEPT to find records in one table but not the other.
• Filtering unique customers based on their purchase categories.

Solution (PostgreSQL / MySQL)


-- Customers who bought only from Electronics (not Clothing)
SELECT CustomerID
FROM ElectronicsPurchases
EXCEPT
SELECT CustomerID
FROM ClothingPurchases;

-- Customers who bought only from Clothing (not Electronics)
SELECT CustomerID
FROM ClothingPurchases
EXCEPT
SELECT CustomerID
FROM ElectronicsPurchases;
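Both EXCEPT queries can be verified with Python's sqlite3 module (a stand-in for PostgreSQL/MySQL): customers 2 and 3 appear in both tables, so only customers 1 and 4 are Electronics-only and only customers 5 and 6 are Clothing-only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ElectronicsPurchases (CustomerID INT, ProductID INT)")
conn.execute("CREATE TABLE ClothingPurchases (CustomerID INT, ProductID INT)")
conn.executemany("INSERT INTO ElectronicsPurchases VALUES (?, ?)",
                 [(1, 101), (2, 102), (3, 103), (4, 104)])
conn.executemany("INSERT INTO ClothingPurchases VALUES (?, ?)",
                 [(2, 201), (3, 202), (5, 203), (6, 204)])

electronics_only = conn.execute(
    "SELECT CustomerID FROM ElectronicsPurchases EXCEPT "
    "SELECT CustomerID FROM ClothingPurchases ORDER BY CustomerID").fetchall()
clothing_only = conn.execute(
    "SELECT CustomerID FROM ClothingPurchases EXCEPT "
    "SELECT CustomerID FROM ElectronicsPurchases ORDER BY CustomerID").fetchall()

print(electronics_only, clothing_only)  # [(1,), (4,)] [(5,), (6,)]
```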
• Q.238
Question
You are given two tables: BooksPurchases and GroceryPurchases. Both tables have
columns CustomerID and ProductID. Write a query to find the products that have been
purchased by customers who bought both books and groceries (i.e., products bought by the
same customer from both categories).
Explanation
Use INTERSECT to find products that appear in both tables for the same customer.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE BooksPurchases (
CustomerID INT,
ProductID INT
);
• - Datasets
INSERT INTO BooksPurchases VALUES
(1, 101),
(2, 102),
(3, 103),
(4, 104);
• - Table creation
CREATE TABLE GroceryPurchases (
CustomerID INT,
ProductID INT
);
• - Datasets
INSERT INTO GroceryPurchases VALUES
(1, 201),
(2, 202),
(4, 104),
(5, 203);


Learnings
• Using INTERSECT to find common items purchased by the same customer across two
different categories.
• Identifying customers who purchase across multiple product categories.

Solution (PostgreSQL / MySQL)


-- Find products purchased by customers who bought from both Books and Grocery
-- Pair CustomerID with ProductID so the match is per customer, as the question asks
SELECT CustomerID, ProductID
FROM BooksPurchases
INTERSECT
SELECT CustomerID, ProductID
FROM GroceryPurchases;
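A quick check with Python's sqlite3 module, intersecting on the (CustomerID, ProductID) pair so a product only qualifies when the same customer bought it in both categories: only customer 4's product 104 appears in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BooksPurchases (CustomerID INT, ProductID INT)")
conn.execute("CREATE TABLE GroceryPurchases (CustomerID INT, ProductID INT)")
conn.executemany("INSERT INTO BooksPurchases VALUES (?, ?)",
                 [(1, 101), (2, 102), (3, 103), (4, 104)])
conn.executemany("INSERT INTO GroceryPurchases VALUES (?, ?)",
                 [(1, 201), (2, 202), (4, 104), (5, 203)])

rows = conn.execute(
    "SELECT CustomerID, ProductID FROM BooksPurchases "
    "INTERSECT "
    "SELECT CustomerID, ProductID FROM GroceryPurchases").fetchall()

print(rows)  # [(4, 104)]
```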
• Q.239
Question
You are given two tables: FashionPurchases and BeautyPurchases. Both tables contain
CustomerID and ProductID. Write a query to combine the customers who bought either
Fashion or Beauty products (or both) without eliminating duplicates.
Explanation
Use UNION ALL to combine customers from both categories, ensuring that duplicates are not
removed.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE FashionPurchases (
CustomerID INT,
ProductID INT
);
• - Datasets
INSERT INTO FashionPurchases VALUES
(1, 101),
(2, 102),
(3, 103),
(4, 104);
• - Table creation
CREATE TABLE BeautyPurchases (
CustomerID INT,
ProductID INT
);
• - Datasets
INSERT INTO BeautyPurchases VALUES
(2, 201),
(3, 202),
(4, 203),
(5, 204);

Learnings
• Using UNION ALL to include all records, even duplicates.
• Combining data from different product categories.

Solution (PostgreSQL / MySQL)


-- Combine customers who bought from either Fashion or Beauty, including duplicates
SELECT CustomerID
FROM FashionPurchases
UNION ALL
SELECT CustomerID
FROM BeautyPurchases;
• Q.240
Question
You are given three tables:
• ElectronicsPurchases: Contains CustomerID, ProductID, and PurchaseDate for
electronics purchases.
• ClothingPurchases: Contains CustomerID, ProductID, and PurchaseDate for clothing
purchases.
• GroceryPurchases: Contains CustomerID, ProductID, and PurchaseDate for grocery
purchases.
Write a query to identify customer purchase behavior, categorizing them into three categories
based on their activity:
• "Heavy Shopper": Customers who have purchased items from all three categories
(Electronics, Clothing, and Grocery).
• "Seasonal Shopper": Customers who have purchased items from two of the categories.
• "Category-Specific Shopper": Customers who have only purchased from one category.
Additionally:
• Use CTEs to calculate the number of unique categories each customer has purchased from.
• Use CASE to categorize the customer into the three categories.
• Use Set Operations to eliminate customers who are not in the desired categories.

Explanation
• CTEs will help track the number of unique categories each customer has purchased from.
• CASE will categorize customers into one of the three categories based on the count of
unique categories.
• Set Operations (UNION ALL, EXCEPT, etc.) will be used to combine and filter customers.

Datasets and SQL Schemas


• - Table creation
-- Create table ElectronicsPurchases
CREATE TABLE ElectronicsPurchases (
CustomerID INT,
ProductID INT,
PurchaseDate DATE
);

-- Insert data into ElectronicsPurchases
INSERT INTO ElectronicsPurchases VALUES
(1, 101, '2024-01-10'),
(2, 102, '2024-02-05'),
(3, 103, '2024-03-15'),
(1, 104, '2024-03-01'),
(4, 105, '2024-02-28'),
(6, 106, '2024-04-01'),
(7, 107, '2024-04-12'),
(2, 108, '2024-05-20'),

(8, 109, '2024-05-25'),
(9, 110, '2024-06-10'),
(3, 111, '2024-06-15'),
(10, 112, '2024-06-25'),
(1, 113, '2024-07-05'),
(11, 114, '2024-07-20'),
(5, 115, '2024-07-25');
• - Table creation
-- Create table ClothingPurchases
CREATE TABLE ClothingPurchases (
CustomerID INT,
ProductID INT,
PurchaseDate DATE
);

-- Insert data into ClothingPurchases
INSERT INTO ClothingPurchases VALUES
(1, 201, '2024-01-20'),
(2, 202, '2024-02-15'),
(3, 203, '2024-03-05'),
(4, 204, '2024-04-10'),
(5, 205, '2024-04-18'),
(6, 206, '2024-04-22'),
(7, 207, '2024-05-02'),
(8, 208, '2024-05-12'),
(9, 209, '2024-05-18'),
(10, 210, '2024-05-25'),
(11, 211, '2024-06-05'),
(12, 212, '2024-06-12'),
(13, 213, '2024-06-22'),
(14, 214, '2024-07-02'),
(15, 215, '2024-07-15');
• - Table creation
-- Create table GroceryPurchases
CREATE TABLE GroceryPurchases (
CustomerID INT,
ProductID INT,
PurchaseDate DATE
);

-- Insert data into GroceryPurchases
INSERT INTO GroceryPurchases VALUES
(1, 301, '2024-01-15'),
(2, 302, '2024-02-18'),
(4, 303, '2024-03-25'),
(5, 304, '2024-04-12'),
(6, 305, '2024-05-01'),
(7, 306, '2024-05-15'),
(8, 307, '2024-06-05'),
(9, 308, '2024-06-10'),
(10, 309, '2024-06-20'),
(11, 310, '2024-07-01'),
(12, 311, '2024-07-03'),
(13, 312, '2024-07-10'),
(14, 313, '2024-07-14'),
(15, 314, '2024-07-20'),
(1, 315, '2024-07-25');

Learnings
• CTEs: Used to calculate and track unique values and aggregations across multiple tables.
• Set Operations: Help combine data from different tables.
• CASE Statements: Categorize customers based on their purchase behaviors.
• Combining Concepts: The question demonstrates the use of multiple concepts (CTEs,
CASE, Set Operations) in a complex scenario.


Solution (PostgreSQL / MySQL)


WITH CategoryPurchases AS (
    -- Count the number of distinct categories each customer purchased from
    SELECT CustomerID, COUNT(DISTINCT Category) AS CategoryCount
    FROM (
        SELECT CustomerID, 'Electronics' AS Category FROM ElectronicsPurchases
        UNION ALL
        SELECT CustomerID, 'Clothing' AS Category FROM ClothingPurchases
        UNION ALL
        SELECT CustomerID, 'Grocery' AS Category FROM GroceryPurchases
    ) AS AllPurchases
    GROUP BY CustomerID
)
-- Categorize the customer based on the number of categories they've purchased from
SELECT CustomerID,
    CASE
        WHEN CategoryCount = 3 THEN 'Heavy Shopper'
        WHEN CategoryCount = 2 THEN 'Seasonal Shopper'
        ELSE 'Category-Specific Shopper'
    END AS ShopperCategory
FROM CategoryPurchases;

Explanation of Solution
• CTE (CategoryPurchases):
• Each purchase table is tagged with a literal category label, and the three result sets are
stacked with UNION ALL. COUNT(DISTINCT Category) then yields the number of unique
categories per customer, and GROUP BY CustomerID produces one row per customer.
• Because the stacked set only contains customers who actually made a purchase, every
customer in the CTE has a CategoryCount of at least 1, so no 'No Purchases' branch is
needed.
• CASE Statement:
• The CASE categorizes customers into "Heavy Shopper", "Seasonal Shopper", or
"Category-Specific Shopper" based on how many categories they have purchased from.
• Set Operations:
• UNION ALL is the set operation that merges the three tables into a single list of
(CustomerID, Category) pairs before aggregation. This also avoids FULL OUTER JOIN,
which MySQL does not support.
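One portable way to check the categorization, sketched with Python's sqlite3 module on a trimmed dataset (the UNION ALL labeling plus COUNT(DISTINCT ...) pattern; sqlite3 is only a stand-in engine here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for t in ("ElectronicsPurchases", "ClothingPurchases", "GroceryPurchases"):
    conn.execute(f"CREATE TABLE {t} (CustomerID INT, ProductID INT, PurchaseDate TEXT)")
conn.executemany("INSERT INTO ElectronicsPurchases VALUES (?, ?, ?)",
                 [(1, 101, '2024-01-10'), (2, 102, '2024-02-05'), (4, 105, '2024-02-28')])
conn.executemany("INSERT INTO ClothingPurchases VALUES (?, ?, ?)",
                 [(1, 201, '2024-01-20'), (2, 202, '2024-02-15'), (15, 215, '2024-07-15')])
conn.executemany("INSERT INTO GroceryPurchases VALUES (?, ?, ?)",
                 [(1, 301, '2024-01-15'), (4, 303, '2024-03-25')])

rows = conn.execute("""
    WITH CategoryPurchases AS (
        SELECT CustomerID, COUNT(DISTINCT Category) AS CategoryCount
        FROM (
            SELECT CustomerID, 'Electronics' AS Category FROM ElectronicsPurchases
            UNION ALL
            SELECT CustomerID, 'Clothing' AS Category FROM ClothingPurchases
            UNION ALL
            SELECT CustomerID, 'Grocery' AS Category FROM GroceryPurchases
        ) AS AllPurchases
        GROUP BY CustomerID
    )
    SELECT CustomerID,
           CASE WHEN CategoryCount = 3 THEN 'Heavy Shopper'
                WHEN CategoryCount = 2 THEN 'Seasonal Shopper'
                ELSE 'Category-Specific Shopper'
           END AS ShopperCategory
    FROM CategoryPurchases
    ORDER BY CustomerID
""").fetchall()

print(rows)
# Customer 1 hits all three categories, 2 and 4 hit two, 15 hits only one.
```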

Recursive CTE
• Q.241
Question
You are given a table Employees that contains employee information and their direct
manager. Write a query to generate a report showing each employee's name and their
manager's name, starting from the top-level manager and recursively listing employees under
them.

Explanation


This query uses a Recursive CTE to first identify the top-level manager (who has no
manager), and then recursively find employees under each manager.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100),
ManagerID INT
);
• - Insert data
INSERT INTO Employees VALUES
(1, 'John Doe', NULL),
(2, 'Alice Smith', 1),
(3, 'Bob Brown', 1),
(4, 'Charlie Davis', 2),
(5, 'Eve Harris', 2),
(6, 'Frank Black', 3),
(7, 'Grace White', 3);

MySQL Solution
WITH RECURSIVE EmployeeHierarchy AS (
-- Base case: Start with the top-level manager (EmployeeID 1, John Doe)
SELECT EmployeeID, EmployeeName, ManagerID
FROM Employees
WHERE ManagerID IS NULL
UNION ALL
-- Recursive case: Join to get employees under each manager
SELECT e.EmployeeID, e.EmployeeName, e.ManagerID
FROM Employees e
INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
-- Select the employees and their managers
SELECT e.EmployeeName AS Employee, m.EmployeeName AS Manager
FROM EmployeeHierarchy e
LEFT JOIN Employees m ON e.ManagerID = m.EmployeeID
ORDER BY e.EmployeeName;

Postgres Solution
WITH RECURSIVE EmployeeHierarchy AS (
-- Base case: Start with the top-level manager (EmployeeID 1, John Doe)
SELECT EmployeeID, EmployeeName, ManagerID
FROM Employees
WHERE ManagerID IS NULL
UNION ALL
-- Recursive case: Join to get employees under each manager
SELECT e.EmployeeID, e.EmployeeName, e.ManagerID
FROM Employees e
INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
-- Select the employees and their managers
SELECT e.EmployeeName AS Employee, m.EmployeeName AS Manager
FROM EmployeeHierarchy e
LEFT JOIN Employees m ON e.ManagerID = m.EmployeeID
ORDER BY e.EmployeeName;
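The recursive traversal can be exercised with Python's sqlite3 module, which also supports WITH RECURSIVE (a stand-in for PostgreSQL/MySQL here): all 7 employees come back, each paired with their manager, and the top-level manager's Manager column is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INT PRIMARY KEY, EmployeeName TEXT, ManagerID INT)")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)",
                 [(1, 'John Doe', None), (2, 'Alice Smith', 1), (3, 'Bob Brown', 1),
                  (4, 'Charlie Davis', 2), (5, 'Eve Harris', 2),
                  (6, 'Frank Black', 3), (7, 'Grace White', 3)])

rows = conn.execute("""
    WITH RECURSIVE EmployeeHierarchy AS (
        -- Base case: the top-level manager (no ManagerID)
        SELECT EmployeeID, EmployeeName, ManagerID
        FROM Employees WHERE ManagerID IS NULL
        UNION ALL
        -- Recursive case: employees reporting to anyone already in the hierarchy
        SELECT e.EmployeeID, e.EmployeeName, e.ManagerID
        FROM Employees e
        JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
    )
    SELECT e.EmployeeName, m.EmployeeName
    FROM EmployeeHierarchy e
    LEFT JOIN Employees m ON e.ManagerID = m.EmployeeID
    ORDER BY e.EmployeeName
""").fetchall()

print(rows[0], rows[-1])  # ('Alice Smith', 'John Doe') ('John Doe', None)
```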
• Q.242
Question
Write a query that uses a Recursive CTE to generate the first 10 numbers of the Fibonacci
sequence (0, 1, 1, 2, 3, 5, 8, 13, 21, 34).


Explanation
A Recursive CTE can be used here to generate the Fibonacci sequence by using the previous
two numbers to calculate the next one.

MySQL Solution
WITH RECURSIVE Fibonacci(n, fib_value, next_value) AS (
    -- Base case: the first Fibonacci pair (0, 1).
    -- Carrying the next value in the same row avoids a second anchor,
    -- which would spawn a duplicate recursion chain.
    SELECT 1, 0, 1
    UNION ALL
    -- Recursive case: shift the pair one step forward
    SELECT n + 1, next_value, fib_value + next_value
    FROM Fibonacci
    WHERE n < 10
)
SELECT fib_value
FROM Fibonacci;

Postgres Solution
WITH RECURSIVE Fibonacci(n, fib_value, next_value) AS (
    -- Base case: the first Fibonacci pair (0, 1).
    -- Carrying the next value in the same row avoids a second anchor,
    -- which would spawn a duplicate recursion chain.
    SELECT 1, 0, 1
    UNION ALL
    -- Recursive case: shift the pair one step forward
    SELECT n + 1, next_value, fib_value + next_value
    FROM Fibonacci
    WHERE n < 10
)
SELECT fib_value
FROM Fibonacci;
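A single-anchor formulation that carries the current and next value in one row can be checked directly with Python's sqlite3 module (sqlite3 is only a stand-in engine here), confirming the first 10 Fibonacci numbers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
fibs = [r[0] for r in conn.execute("""
    WITH RECURSIVE Fibonacci(n, fib_value, next_value) AS (
        SELECT 1, 0, 1           -- base case: the pair (0, 1)
        UNION ALL
        SELECT n + 1, next_value, fib_value + next_value  -- shift the pair forward
        FROM Fibonacci
        WHERE n < 10
    )
    SELECT fib_value FROM Fibonacci ORDER BY n
""").fetchall()]

print(fibs)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```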
• Q.243
Question
Write a query using a Recursive CTE to generate a date range from '2024-01-01' to
'2024-01-10'. Return all the dates in this range.

Explanation
A Recursive CTE is used here to generate a sequence of dates starting from a specific date
and incrementing by one day for each iteration.

MySQL Solution
WITH RECURSIVE DateRange AS (
    -- Base case: start with '2024-01-01'.
    -- The column is named dt because CURRENT_DATE is a reserved word.
    SELECT CAST('2024-01-01' AS DATE) AS dt
    UNION ALL
    -- Recursive case: add one day to the previous date
    SELECT DATE_ADD(dt, INTERVAL 1 DAY)
    FROM DateRange
    WHERE dt < '2024-01-10'
)
SELECT dt
FROM DateRange;

Postgres Solution
WITH RECURSIVE DateRange AS (
    -- Base case: start with '2024-01-01'.
    -- The column is named dt because CURRENT_DATE is a reserved word.
    SELECT CAST('2024-01-01' AS DATE) AS dt
    UNION ALL
    -- Recursive case: add one day, casting back to DATE so the column type
    -- matches the non-recursive term
    SELECT CAST(dt + INTERVAL '1 day' AS DATE)
    FROM DateRange
    WHERE dt < '2024-01-10'
)
SELECT dt
FROM DateRange;
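The same pattern works in Python's sqlite3 module, with SQLite's date(..., '+1 day') standing in for DATE_ADD / INTERVAL (the column is named dt since CURRENT_DATE is reserved):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
dates = [r[0] for r in conn.execute("""
    WITH RECURSIVE DateRange(dt) AS (
        SELECT '2024-01-01'                       -- base case
        UNION ALL
        SELECT date(dt, '+1 day') FROM DateRange  -- add one day per iteration
        WHERE dt < '2024-01-10'                   -- termination condition
    )
    SELECT dt FROM DateRange
""").fetchall()]

print(len(dates), dates[0], dates[-1])  # 10 2024-01-01 2024-01-10
```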

Key Points for Recursive CTEs:


• Base Case: Always start with a query that forms the starting point (the anchor or base
case).
• Recursive Case: The recursive part performs the next iteration of the operation.
• Termination Condition: Ensure there’s a condition to stop recursion (e.g., a WHERE
clause).
• Q.244
Question
Write a query that uses a Recursive CTE to calculate the factorial of a given number (e.g., 5!
= 5 × 4 × 3 × 2 × 1 = 120).

Explanation
The Recursive CTE can be used to multiply numbers starting from the given number and
decrementing down to 1. The factorial is the product of all positive integers up to a given
number.

MySQL Solution
WITH RECURSIVE Factorial(n, result) AS (
-- Base case: Start with 1
SELECT 1, 1
UNION ALL
-- Recursive case: Multiply n by the result of the previous iteration
SELECT n + 1, (n + 1) * result
FROM Factorial
WHERE n < 5 -- For factorial of 5, change this number as needed
)
SELECT result
FROM Factorial
WHERE n = 5; -- To get the factorial of 5

Postgres Solution
WITH RECURSIVE Factorial(n, result) AS (
-- Base case: Start with 1
SELECT 1, 1
UNION ALL
-- Recursive case: Multiply n by the result of the previous iteration
SELECT n + 1, (n + 1) * result
FROM Factorial
WHERE n < 5 -- For factorial of 5, change this number as needed
)
SELECT result
FROM Factorial
WHERE n = 5; -- To get the factorial of 5
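Since the recursive accumulator is the interesting part here, it is worth confirming with Python's sqlite3 module (a stand-in engine) that the rows build up as (1,1), (2,2), (3,6), (4,24), (5,120):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("""
    WITH RECURSIVE Factorial(n, result) AS (
        SELECT 1, 1                                       -- base case: 1! = 1
        UNION ALL
        SELECT n + 1, (n + 1) * result FROM Factorial     -- multiply in the next integer
        WHERE n < 5
    )
    SELECT result FROM Factorial WHERE n = 5
""").fetchone()

print(row[0])  # 120
```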
• Q.245
Question
Given a FamilyMembers table where each person has a ParentID (which references the
PersonID of their parent), write a query to list all descendants of a specific person, say
PersonID = 1, including indirect descendants.

Explanation
A Recursive CTE is ideal for traversing hierarchical structures like family trees. It allows us
to recursively find all descendants, starting with the direct children and then moving down
the tree.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE FamilyMembers (
PersonID INT PRIMARY KEY,
PersonName VARCHAR(100),
ParentID INT
);
• - Insert data
INSERT INTO FamilyMembers VALUES
(1, 'John Doe', NULL),
(2, 'Alice Smith', 1),
(3, 'Bob Brown', 1),
(4, 'Charlie Davis', 2),
(5, 'Eve Harris', 2),
(6, 'Frank Black', 3),
(7, 'Grace White', 3);

MySQL Solution
WITH RECURSIVE FamilyTree AS (
-- Base case: Start with the root person (e.g., PersonID = 1)
SELECT PersonID, PersonName, ParentID
FROM FamilyMembers
WHERE PersonID = 1 -- Change to the person you want to start from
UNION ALL
-- Recursive case: Join to find descendants (children)
SELECT fm.PersonID, fm.PersonName, fm.ParentID
FROM FamilyMembers fm
INNER JOIN FamilyTree ft ON fm.ParentID = ft.PersonID
)
SELECT PersonID, PersonName
FROM FamilyTree;

Postgres Solution
WITH RECURSIVE FamilyTree AS (
-- Base case: Start with the root person (e.g., PersonID = 1)
SELECT PersonID, PersonName, ParentID
FROM FamilyMembers
WHERE PersonID = 1 -- Change to the person you want to start from
UNION ALL

-- Recursive case: Join to find descendants (children)
SELECT fm.PersonID, fm.PersonName, fm.ParentID
FROM FamilyMembers fm
INNER JOIN FamilyTree ft ON fm.ParentID = ft.PersonID
)
SELECT PersonID, PersonName
FROM FamilyTree;
• Q.246

Question
Generate an Arithmetic Progression
Question
Write a query to generate an arithmetic progression (e.g., 2, 5, 8, 11, 14, ...) starting from 2
with a common difference of 3 up to the 10th term using a Recursive CTE.

Explanation
A Recursive CTE is well-suited for generating sequences where each term depends on the
previous one. In this case, we can recursively add 3 to the starting number (2) until we reach
the 10th term.

MySQL Solution
WITH RECURSIVE ArithmeticProgression(n, value) AS (
-- Base case: Start with 2
SELECT 1, 2
UNION ALL
-- Recursive case: Add 3 to the previous value
SELECT n + 1, value + 3
FROM ArithmeticProgression
WHERE n < 10 -- Stop at the 10th term
)
-- Select the generated values
SELECT value
FROM ArithmeticProgression;

Postgres Solution
WITH RECURSIVE ArithmeticProgression(n, value) AS (
-- Base case: Start with 2
SELECT 1, 2
UNION ALL
-- Recursive case: Add 3 to the previous value
SELECT n + 1, value + 3
FROM ArithmeticProgression
WHERE n < 10 -- Stop at the 10th term
)
-- Select the generated values
SELECT value
FROM ArithmeticProgression;
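The same sequence can be generated with an ordinary loop; this small Python sketch (illustrative only) mirrors the CTE's base case and recursive step, which helps when reasoning about the WHERE n < 10 termination condition:

```python
# Mirror of the recursive CTE: n counts terms, value carries the progression
terms = []
n, value = 1, 2                   # base case: SELECT 1, 2
while n <= 10:                    # the CTE recurses while n < 10, yielding 10 rows
    terms.append(value)
    n, value = n + 1, value + 3   # recursive case: SELECT n + 1, value + 3
print(terms)  # → [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
```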
• Q.247

Question
Find All Ancestors in a Family Tree
Question


Given a FamilyMembers table, write a query to find all ancestors of a specific person (e.g.,
PersonID = 5), including direct and indirect ancestors.

Explanation
This query uses a Recursive CTE to find all ancestors of a person, starting from their direct
parent and then recursively climbing up the family tree.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE FamilyMembers (
PersonID INT PRIMARY KEY,
PersonName VARCHAR(100),
ParentID INT
);
-- Insert data
INSERT INTO FamilyMembers VALUES
(1, 'John Doe', NULL),
(2, 'Alice Smith', 1),
(3, 'Bob Brown', 1),
(4, 'Charlie Davis', 2),
(5, 'Eve Harris', 4),
(6, 'Frank Black', 3),
(7, 'Grace White', 3);

MySQL Solution
WITH RECURSIVE Ancestors AS (
-- Base case: Start with the person whose ancestors we need to find
SELECT PersonID, PersonName, ParentID
FROM FamilyMembers
WHERE PersonID = 5 -- Change to the person ID of interest
UNION ALL
-- Recursive case: Find each person's parent recursively
SELECT fm.PersonID, fm.PersonName, fm.ParentID
FROM FamilyMembers fm
INNER JOIN Ancestors a ON fm.PersonID = a.ParentID
)
-- Final selection of the ancestors
SELECT PersonID, PersonName
FROM Ancestors;

Postgres Solution
WITH RECURSIVE Ancestors AS (
-- Base case: Start with the person whose ancestors we need to find
SELECT PersonID, PersonName, ParentID
FROM FamilyMembers
WHERE PersonID = 5 -- Change to the person ID of interest
UNION ALL
-- Recursive case: Find each person's parent recursively
SELECT fm.PersonID, fm.PersonName, fm.ParentID
FROM FamilyMembers fm
INNER JOIN Ancestors a ON fm.PersonID = a.ParentID
)
-- Final selection of the ancestors
SELECT PersonID, PersonName
FROM Ancestors;
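Because every person has at most one parent, the ancestor query is just a pointer walk up the tree. A Python sketch over the Q.247 sample data (illustrative only):

```python
# FamilyMembers rows for Q.247: PersonID -> (PersonName, ParentID)
people = {
    1: ('John Doe', None), 2: ('Alice Smith', 1), 3: ('Bob Brown', 1),
    4: ('Charlie Davis', 2), 5: ('Eve Harris', 4),
    6: ('Frank Black', 3), 7: ('Grace White', 3),
}

def ancestors(rows, person_id):
    """Follow ParentID upward, mirroring the CTE: the person themselves
    (base case), then parent, grandparent, and so on up to the root."""
    chain = []
    current = person_id
    while current is not None:
        name, parent = rows[current]
        chain.append((current, name))
        current = parent
    return chain

print(ancestors(people, 5))
```

For PersonID = 5 this yields Eve Harris, Charlie Davis, Alice Smith, and John Doe, matching the rows the recursive CTE returns.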
• Q.248
Question


Given an Employees table where each employee has a ManagerID, write a query to calculate
the maximum depth (or level) of employees under any manager. The depth is defined as the
number of levels below the top manager.

Explanation
A Recursive CTE is used to traverse the hierarchy, starting from the top-level manager
(those with no ManagerID) and calculating the depth for each level.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100),
ManagerID INT
);
-- Insert data
INSERT INTO Employees VALUES
(1, 'John Doe', NULL),
(2, 'Alice Smith', 1),
(3, 'Bob Brown', 1),
(4, 'Charlie Davis', 2),
(5, 'Eve Harris', 2),
(6, 'Frank Black', 3),
(7, 'Grace White', 3),
(8, 'Helen Green', 4);

MySQL Solution
WITH RECURSIVE EmployeeHierarchy AS (
-- Base case: Start with the top-level employees (employees with no manager)
SELECT EmployeeID, EmployeeName, ManagerID, 1 AS Depth
FROM Employees
WHERE ManagerID IS NULL
UNION ALL
-- Recursive case: Join employees with their managers to calculate depth
SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, eh.Depth + 1
FROM Employees e
INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
-- Select the maximum depth in the employee hierarchy
SELECT MAX(Depth) AS MaxDepth
FROM EmployeeHierarchy;

Postgres Solution
WITH RECURSIVE EmployeeHierarchy AS (
-- Base case: Start with the top-level employees (employees with no manager)
SELECT EmployeeID, EmployeeName, ManagerID, 1 AS Depth
FROM Employees
WHERE ManagerID IS NULL
UNION ALL
-- Recursive case: Join employees with their managers to calculate depth
SELECT e.EmployeeID, e.EmployeeName, e.ManagerID, eh.Depth + 1
FROM Employees e
INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
-- Select the maximum depth in the employee hierarchy
SELECT MAX(Depth) AS MaxDepth
FROM EmployeeHierarchy;
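The depth computed by the CTE can be cross-checked by walking each employee's ManagerID chain; this Python sketch (illustrative only) uses the Q.248 sample data:

```python
# Employees rows for Q.248: EmployeeID -> ManagerID
managers = {1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 4}

def depth(emp_id):
    """Level of an employee, with the top manager at level 1 (as in the CTE)."""
    level = 1
    while managers[emp_id] is not None:
        emp_id = managers[emp_id]
        level += 1
    return level

max_depth = max(depth(e) for e in managers)
print(max_depth)  # Helen Green (8) is 4 levels deep: 1 -> 2 -> 4 -> 8
```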
• Q.249


Question
Generate All Possible Parent-Child Pairs in a Tree
Question
Given a Categories table, which stores a hierarchical product category structure (where
CategoryID references the parent category via ParentCategoryID), write a query to
generate all possible parent-child pairs, showing the CategoryID of the parent and the
CategoryID of the child.

Explanation
This query uses a Recursive CTE to traverse the category tree and produce all parent-child
combinations in the hierarchy.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Categories (
CategoryID INT PRIMARY KEY,
CategoryName VARCHAR(100),
ParentCategoryID INT
);
-- Insert data
INSERT INTO Categories VALUES
(1, 'Electronics', NULL),
(2, 'Laptops', 1),
(3, 'Smartphones', 1),
(4, 'Tablets', 1),
(5, 'Gaming Laptops', 2),
(6, 'Android Phones', 3),
(7, 'iPhones', 3);

MySQL Solution
WITH RECURSIVE CategoryPairs AS (
-- Base case: every direct parent-child pair
SELECT ParentCategoryID, CategoryID AS ChildCategoryID
FROM Categories
WHERE ParentCategoryID IS NOT NULL
UNION ALL
-- Recursive case: pair each ancestor with the children of its known descendants
SELECT cp.ParentCategoryID, c.CategoryID
FROM Categories c
INNER JOIN CategoryPairs cp ON c.ParentCategoryID = cp.ChildCategoryID
)
-- All parent-child (ancestor-descendant) pairs in the category tree
SELECT ParentCategoryID, ChildCategoryID
FROM CategoryPairs;

Postgres Solution
WITH RECURSIVE CategoryPairs AS (
-- Base case: every direct parent-child pair
SELECT ParentCategoryID, CategoryID AS ChildCategoryID
FROM Categories
WHERE ParentCategoryID IS NOT NULL
UNION ALL
-- Recursive case: pair each ancestor with the children of its known descendants
SELECT cp.ParentCategoryID, c.CategoryID
FROM Categories c
INNER JOIN CategoryPairs cp ON c.ParentCategoryID = cp.ChildCategoryID
)
-- All parent-child (ancestor-descendant) pairs in the category tree
SELECT ParentCategoryID, ChildCategoryID
FROM CategoryPairs;
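Independently of the SQL, Q.249's pair generation can be checked in Python: seed with the direct parent-child pairs, then repeatedly attach each ancestor to the children of its known descendants (a sketch over the sample data, illustrative only):

```python
# Categories rows for Q.249: (CategoryID, ParentCategoryID)
categories = [(1, None), (2, 1), (3, 1), (4, 1), (5, 2), (6, 3), (7, 3)]

def all_pairs(rows):
    """All (ancestor, descendant) CategoryID pairs in the tree."""
    pairs = {(parent, child) for child, parent in rows if parent is not None}
    frontier = set(pairs)        # base case: direct parent-child pairs
    while frontier:              # each pass is one recursive step
        new = {(anc, child)
               for anc, desc in frontier
               for child, parent in rows if parent == desc}
        new -= pairs
        pairs |= new
        frontier = new
    return pairs

print(sorted(all_pairs(categories)))
```

The sample tree yields nine pairs: the six direct ones plus (1, 5), (1, 6), and (1, 7) for Electronics with its grandchildren.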
• Q.250
Question
Given a Tasks table where each task has a TaskID, TaskName, and a PredecessorID (which
references the TaskID of the task that must be completed before the current task), write a
query using Recursive CTEs to find the longest dependency chain (path) of tasks. The result
should include the TaskID and TaskName in the longest path.

Explanation
In this problem, the Recursive CTE will traverse the task dependencies, and for each task, it
will recursively follow the chain of tasks. We are interested in finding the longest
dependency chain of tasks, so we need to track the depth (number of tasks in the path) as we
go through the dependencies. The longest path will be the one with the maximum depth.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Tasks (
TaskID INT PRIMARY KEY,
TaskName VARCHAR(100),
PredecessorID INT -- references TaskID of the task that needs to be completed first
);
-- Insert data
INSERT INTO Tasks VALUES
(1, 'Task A', NULL),
(2, 'Task B', 1),
(3, 'Task C', 2),
(4, 'Task D', 2),
(5, 'Task E', 3),
(6, 'Task F', 5),
(7, 'Task G', 6),
(8, 'Task H', 4),
(9, 'Task I', 7);

MySQL Solution
WITH RECURSIVE TaskDependencies AS (
-- Base case: Start with tasks that have no predecessor (PredecessorID IS NULL)
SELECT TaskID, TaskName, PredecessorID, 1 AS PathLength, CAST(TaskName AS CHAR) AS Path
FROM Tasks
WHERE PredecessorID IS NULL -- Tasks without any predecessor
UNION ALL
-- Recursive case: Find the next task in the dependency chain
SELECT t.TaskID, t.TaskName, t.PredecessorID, td.PathLength + 1, CONCAT(td.Path, ' -> ', t.TaskName)
FROM Tasks t
INNER JOIN TaskDependencies td ON t.PredecessorID = td.TaskID
)
SELECT TaskID, TaskName, PathLength, Path
FROM TaskDependencies
WHERE PathLength = (
SELECT MAX(PathLength) FROM TaskDependencies -- Find the longest path
)
ORDER BY PathLength DESC;


Postgres Solution
WITH RECURSIVE TaskDependencies AS (
-- Base case: Start with tasks that have no predecessor (PredecessorID IS NULL)
SELECT TaskID, TaskName, PredecessorID, 1 AS PathLength, CAST(TaskName AS VARCHAR(100)) AS Path
FROM Tasks
WHERE PredecessorID IS NULL -- Tasks without any predecessor
UNION ALL
-- Recursive case: Find the next task in the dependency chain
SELECT t.TaskID, t.TaskName, t.PredecessorID, td.PathLength + 1, td.Path || ' -> ' || t.TaskName
FROM Tasks t
INNER JOIN TaskDependencies td ON t.PredecessorID = td.TaskID
)
SELECT TaskID, TaskName, PathLength, Path
FROM TaskDependencies
WHERE PathLength = (
SELECT MAX(PathLength) FROM TaskDependencies -- Find the longest path
)
ORDER BY PathLength DESC;

Explanation of the Solution:


• Base Case: We start with the tasks that have no predecessor (i.e., PredecessorID IS
NULL).
• Recursive Case: We recursively join the Tasks table on PredecessorID, keeping track of
the current task's ID, its name, and its position in the dependency chain (PathLength and
Path).
• Tracking Path: We concatenate the TaskName of each task along the chain (using the || operator in PostgreSQL, or CONCAT() in MySQL) to accumulate the full path.
• Longest Path: We then select the task paths with the longest PathLength using a
subquery to find the maximum PathLength.
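The longest-chain logic is easy to verify outside SQL: follow each task's PredecessorID back to the root and keep the longest name list. A Python sketch over the Q.250 data (illustrative only):

```python
# Tasks rows for Q.250: TaskID -> (TaskName, PredecessorID)
tasks = {
    1: ('Task A', None), 2: ('Task B', 1), 3: ('Task C', 2), 4: ('Task D', 2),
    5: ('Task E', 3), 6: ('Task F', 5), 7: ('Task G', 6),
    8: ('Task H', 4), 9: ('Task I', 7),
}

def path_to(task_id):
    """Names from the root task down to task_id, i.e. the CTE's Path column."""
    names = []
    while task_id is not None:
        name, predecessor = tasks[task_id]
        names.append(name)
        task_id = predecessor
    return list(reversed(names))

longest = max((path_to(t) for t in tasks), key=len)
print(' -> '.join(longest))
```

Here the longest chain has seven tasks, Task A through Task I, matching the PathLength the query reports.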

DDL
• Q.251
Question
Write an SQL query to create a table named Employees with the following columns:
• EmployeeID (integer, primary key, auto-increment)
• FirstName (variable character, maximum 50 characters, not null)
• LastName (variable character, maximum 50 characters, not null)
• HireDate (date)

Explanation
Create a table named Employees where the EmployeeID auto-increments, and other columns
have appropriate data types and constraints.

Datasets and SQL Schemas


-- Table creation
-- Sample Data (optional)

Learnings


• Using AUTO_INCREMENT (MySQL) or SERIAL (PostgreSQL) for automatically incrementing primary key values
• Data type specification (e.g., VARCHAR, INT, DATE)
• Constraints like PRIMARY KEY and NOT NULL
• SQL syntax for table creation with auto-increment functionality

Solutions
-- PostgreSQL solution
CREATE TABLE Employees (
EmployeeID SERIAL PRIMARY KEY,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
HireDate DATE
);
-- MySQL solution
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY AUTO_INCREMENT,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
HireDate DATE
);
• Q.252
Question
Write an SQL query to add a column named Email (variable character, maximum 100
characters) to the Employees table.

Explanation
Alter the existing Employees table by adding a new column Email with the specified data
type and size.

Datasets and SQL Schemas


-- Table creation (for reference)
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY AUTO_INCREMENT,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
HireDate DATE
);
-- Sample Data (optional)
INSERT INTO Employees (FirstName, LastName, HireDate)
VALUES
('John', 'Doe', '2021-05-01'),
('Jane', 'Smith', '2020-07-15');

Learnings
• Using the ALTER TABLE statement to modify an existing table
• Adding a column with specific data type and size
• SQL syntax for altering tables

Solutions
-- PostgreSQL solution
ALTER TABLE Employees
ADD COLUMN Email VARCHAR(100);
-- MySQL solution
ALTER TABLE Employees
ADD COLUMN Email VARCHAR(100);


• Q.253
Question
Write an SQL query to rename the table Employees to Staff.

Explanation
Use the ALTER TABLE statement to rename the Employees table to Staff.

Datasets and SQL Schemas


-- Table creation (for reference)
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY AUTO_INCREMENT,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
HireDate DATE
);
-- Sample Data (optional)
INSERT INTO Employees (FirstName, LastName, HireDate)
VALUES
('John', 'Doe', '2021-05-01'),
('Jane', 'Smith', '2020-07-15');

Learnings
• Using the ALTER TABLE statement to rename a table
• SQL syntax for renaming tables

Solutions
-- PostgreSQL solution
ALTER TABLE Employees
RENAME TO Staff;
-- MySQL solution
ALTER TABLE Employees
RENAME TO Staff;
• Q.254
Question
Write an SQL query to change the data type of the HireDate column in the Employees table
to DATETIME.

Explanation
Use the ALTER TABLE statement to modify the data type of the HireDate column from DATE
to DATETIME.

Datasets and SQL Schemas


-- Table creation (for reference)
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY AUTO_INCREMENT,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
HireDate DATE
);
-- Sample Data (optional)
INSERT INTO Employees (FirstName, LastName, HireDate)
VALUES

('John', 'Doe', '2021-05-01'),
('Jane', 'Smith', '2020-07-15');

Learnings
• Using the ALTER TABLE statement to modify column data types
• Syntax for changing a column’s data type in SQL
• Understanding the difference between DATE and DATETIME data types

Solutions
-- PostgreSQL solution (PostgreSQL has no DATETIME type, so use TIMESTAMP)
ALTER TABLE Employees
ALTER COLUMN HireDate TYPE TIMESTAMP;
-- MySQL solution
ALTER TABLE Employees
MODIFY COLUMN HireDate DATETIME;
• Q.255
Question
Write an SQL query to add a primary key constraint to the EmployeeID column in the
Employees table.

Explanation
Use the ALTER TABLE statement to add a primary key constraint to the EmployeeID column
if it doesn't already have one.

Datasets and SQL Schemas


-- Table creation (for reference)
CREATE TABLE Employees (
EmployeeID INT,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
HireDate DATE
);
-- Sample Data (optional)
INSERT INTO Employees (EmployeeID, FirstName, LastName, HireDate)
VALUES
(1, 'John', 'Doe', '2021-05-01'),
(2, 'Jane', 'Smith', '2020-07-15');

Learnings
• Using the ALTER TABLE statement to add constraints
• Adding a PRIMARY KEY constraint to an existing column
• Understanding the importance of primary key constraints for uniqueness and indexing

Solutions
-- PostgreSQL solution
ALTER TABLE Employees
ADD CONSTRAINT pk_employeeid PRIMARY KEY (EmployeeID);
-- MySQL solution
ALTER TABLE Employees
ADD PRIMARY KEY (EmployeeID);
• Q.256
Question


Write an SQL query to create a view named EmployeeNames that includes EmployeeID,
FirstName, and LastName from the Employees table.

Explanation
Use the CREATE VIEW statement to create a view that selects the EmployeeID, FirstName,
and LastName columns from the Employees table.

Datasets and SQL Schemas


-- Table creation (for reference)
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY AUTO_INCREMENT,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
HireDate DATE
);
-- Sample Data (optional)
INSERT INTO Employees (FirstName, LastName, HireDate)
VALUES
('John', 'Doe', '2021-05-01'),
('Jane', 'Smith', '2020-07-15');

Learnings
• Using the CREATE VIEW statement to create a view in SQL
• Selecting specific columns from a table in a view
• Views provide a way to simplify queries and encapsulate frequently used logic

Solutions
-- PostgreSQL solution
CREATE VIEW EmployeeNames AS
SELECT EmployeeID, FirstName, LastName
FROM Employees;
-- MySQL solution
CREATE VIEW EmployeeNames AS
SELECT EmployeeID, FirstName, LastName
FROM Employees;
• Q.257
Question
Assume there is a table named Departments with a primary key DepartmentID. Write an
SQL query to add a foreign key constraint to the Employees table, linking the
DepartmentID column to the DepartmentID column in the Departments table.

Explanation
Use the ALTER TABLE statement to add a foreign key constraint on the DepartmentID
column in the Employees table, referencing the DepartmentID column in the Departments
table.

Datasets and SQL Schemas


-- Table creation for Departments
CREATE TABLE Departments (
DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(100) NOT NULL
);
-- Table creation for Employees (assuming DepartmentID column doesn't exist yet)

CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY AUTO_INCREMENT,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
HireDate DATE,
DepartmentID INT
);
-- Sample Data for Departments
INSERT INTO Departments (DepartmentID, DepartmentName)
VALUES
(1, 'Human Resources'),
(2, 'Engineering');
-- Sample Data for Employees (assuming the DepartmentID column exists)
INSERT INTO Employees (EmployeeID, FirstName, LastName, HireDate, DepartmentID)
VALUES
(1, 'John', 'Doe', '2021-05-01', 1),
(2, 'Jane', 'Smith', '2020-07-15', 2);

Learnings
• Adding a foreign key constraint to enforce referential integrity
• Understanding how foreign keys link columns between tables
• Using ALTER TABLE to add foreign key constraints

Solutions
-- PostgreSQL solution
ALTER TABLE Employees
ADD CONSTRAINT fk_department
FOREIGN KEY (DepartmentID)
REFERENCES Departments (DepartmentID);
-- MySQL solution
ALTER TABLE Employees
ADD CONSTRAINT fk_department
FOREIGN KEY (DepartmentID)
REFERENCES Departments (DepartmentID);
• Q.258
Question
Write an SQL query to create a non-unique index on the LastName column in the
Employees table to improve search performance.

Explanation
Use the CREATE INDEX statement to create a non-unique index on the LastName column in
the Employees table, which will improve the performance of queries that search by
LastName.

Datasets and SQL Schemas


-- Table creation (for reference)
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY AUTO_INCREMENT,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
HireDate DATE
);
-- Sample Data (optional)
INSERT INTO Employees (FirstName, LastName, HireDate)
VALUES
('John', 'Doe', '2021-05-01'),
('Jane', 'Smith', '2020-07-15'),
('Alice', 'Johnson', '2019-11-20');


Learnings
• Using CREATE INDEX to improve query performance
• Understanding the difference between unique and non-unique indexes
• Indexes speed up SELECT queries but can slow down INSERT/UPDATE operations

Solutions
-- PostgreSQL solution
CREATE INDEX idx_lastname ON Employees (LastName);
-- MySQL solution
CREATE INDEX idx_lastname ON Employees (LastName);
• Q.259
Question
Write an SQL query to remove the Email column from the Employees table.

Explanation
Use the ALTER TABLE statement with the DROP COLUMN clause to remove the Email column
from the Employees table.

Datasets and SQL Schemas


-- Table creation (for reference)
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY AUTO_INCREMENT,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
HireDate DATE,
Email VARCHAR(100)
);
-- Sample Data (optional)
INSERT INTO Employees (FirstName, LastName, HireDate, Email)
VALUES
('John', 'Doe', '2021-05-01', '[email protected]'),
('Jane', 'Smith', '2020-07-15', '[email protected]');

Learnings
• Using ALTER TABLE with DROP COLUMN to remove a column
• SQL syntax for modifying tables
• When removing a column, ensure that the column is not needed for any other constraints or
relationships

Solutions
-- PostgreSQL solution
ALTER TABLE Employees
DROP COLUMN Email;
-- MySQL solution
ALTER TABLE Employees
DROP COLUMN Email;
• Q.260
Question
Write an SQL query to create a table named Projects with the following specifications:
• ProjectID (integer, primary key, auto-increment)
• ProjectName (variable character, maximum 100 characters, not null, unique)

• StartDate (date, not null)
• EndDate (date, must be after StartDate)

Explanation
Create a table named Projects with the specified columns, adding constraints like PRIMARY
KEY, NOT NULL, UNIQUE, and ensuring that EndDate is after StartDate using a check
constraint.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Projects (
ProjectID INT PRIMARY KEY AUTO_INCREMENT,
ProjectName VARCHAR(100) NOT NULL UNIQUE,
StartDate DATE NOT NULL,
EndDate DATE NOT NULL,
CHECK (EndDate > StartDate)
);
-- Sample Data (optional)
INSERT INTO Projects (ProjectName, StartDate, EndDate)
VALUES
('Project A', '2023-01-01', '2023-12-31'),
('Project B', '2022-06-01', '2023-06-01');

Learnings
• Using constraints such as PRIMARY KEY, NOT NULL, UNIQUE, and CHECK
• How to enforce data integrity by ensuring EndDate is after StartDate
• Syntax for creating tables with multiple constraints

Solutions
-- PostgreSQL solution
CREATE TABLE Projects (
ProjectID SERIAL PRIMARY KEY,
ProjectName VARCHAR(100) NOT NULL UNIQUE,
StartDate DATE NOT NULL,
EndDate DATE NOT NULL,
CONSTRAINT check_enddate CHECK (EndDate > StartDate)
);
-- MySQL solution
CREATE TABLE Projects (
ProjectID INT PRIMARY KEY AUTO_INCREMENT,
ProjectName VARCHAR(100) NOT NULL UNIQUE,
StartDate DATE NOT NULL,
EndDate DATE NOT NULL,
CHECK (EndDate > StartDate)
);

DML
• Q.261
Question
Write an SQL query to update the JobTitle of the employee with EmployeeID = 3 to 'Senior
Software Engineer' in the Employees table.

Explanation
Use the UPDATE statement to modify the JobTitle for the employee with EmployeeID = 3.


Datasets and SQL Schemas


-- Table creation (for reference)
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY AUTO_INCREMENT,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
JobTitle VARCHAR(100),
HireDate DATE
);
-- Sample Data (optional)
INSERT INTO Employees (EmployeeID, FirstName, LastName, JobTitle, HireDate)
VALUES
(1, 'John', 'Doe', 'Software Engineer', '2021-05-01'),
(2, 'Jane', 'Smith', 'Data Analyst', '2020-07-15'),
(3, 'Alice', 'Johnson', 'Software Engineer', '2019-11-20');

Learnings
• Using the UPDATE statement to modify data in a table
• Specifying the row to update using the WHERE clause
• Avoiding accidental updates to all rows by ensuring the correct condition in the WHERE
clause

Solutions
-- PostgreSQL solution
UPDATE Employees
SET JobTitle = 'Senior Software Engineer'
WHERE EmployeeID = 3;
-- MySQL solution
UPDATE Employees
SET JobTitle = 'Senior Software Engineer'
WHERE EmployeeID = 3;
• Q.262
Question
Write an SQL query to insert a new employee into the Employees table with the following
details:
• FirstName: 'Michael'
• LastName: 'Taylor'
• JobTitle: 'UX Designer'
• HireDate: '2023-01-15'

Explanation
Use the INSERT INTO statement to add a new record into the Employees table with the
provided values.

Datasets and SQL Schemas


-- Table creation (for reference)
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY AUTO_INCREMENT,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
JobTitle VARCHAR(100),
HireDate DATE
);
-- Sample Data (optional)
INSERT INTO Employees (FirstName, LastName, JobTitle, HireDate)
VALUES
('John', 'Doe', 'Software Engineer', '2021-05-01'),
('Jane', 'Smith', 'Data Analyst', '2020-07-15');

Learnings
• Using the INSERT INTO statement to add new rows into a table
• Inserting multiple columns of data at once
• SQL syntax for specifying values for each column

Solutions
-- PostgreSQL solution
INSERT INTO Employees (FirstName, LastName, JobTitle, HireDate)
VALUES ('Michael', 'Taylor', 'UX Designer', '2023-01-15');
-- MySQL solution
INSERT INTO Employees (FirstName, LastName, JobTitle, HireDate)
VALUES ('Michael', 'Taylor', 'UX Designer', '2023-01-15');
• Q.263
Question
Write a query to delete all customers from the Customers table who have not placed an order
in the last two years. Assume there is an Orders table with a CustomerID and OrderDate.

Explanation
Use a subquery in the DELETE statement to identify CustomerID values that have placed an
order within the last two years. Then delete all customers whose CustomerID does not
appear in the results of that subquery.

Datasets and SQL Schemas


-- Table creation for Customers
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Email VARCHAR(100)
);
-- Table creation for Orders
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE,
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
-- Sample Data for Customers (optional)
INSERT INTO Customers (CustomerID, FirstName, LastName, Email)
VALUES
(1, 'John', 'Doe', '[email protected]'),
(2, 'Jane', 'Smith', '[email protected]'),
(3, 'Alice', 'Johnson', '[email protected]'),
(4, 'Michael', 'Taylor', '[email protected]'),
(5, 'Sarah', 'Williams', '[email protected]'),
(6, 'David', 'Brown', '[email protected]');
-- Sample Data for Orders (optional)
INSERT INTO Orders (OrderID, CustomerID, OrderDate)
VALUES
(1, 1, '2023-01-15'),
(2, 2, '2022-06-10'),
(3, 3, '2021-05-20'),
(4, 4, '2020-07-15'),
(5, 2, '2021-09-22'),
(6, 5, '2020-12-01'),
(7, 6, '2023-03-10'),
(8, 1, '2021-02-25');

Learnings
• Using a subquery to filter rows for deletion
• Deleting rows based on conditions from another table
• SQL syntax for deleting data with complex conditions

Solutions
-- PostgreSQL solution
DELETE FROM Customers
WHERE CustomerID NOT IN (
SELECT DISTINCT CustomerID
FROM Orders
WHERE OrderDate >= CURRENT_DATE - INTERVAL '2 years'
);
-- MySQL solution
DELETE FROM Customers
WHERE CustomerID NOT IN (
SELECT DISTINCT CustomerID
FROM Orders
WHERE OrderDate >= CURDATE() - INTERVAL 2 YEAR
);
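The retention rule can be sanity-checked in Python. Since the SQL compares against CURRENT_DATE, this sketch pins an arbitrary as-of date (2023-06-01, an assumption made purely for the illustration) so the result is deterministic:

```python
from datetime import date, timedelta

# (CustomerID, OrderDate) rows from the Q.263 sample data
orders = [
    (1, date(2023, 1, 15)), (2, date(2022, 6, 10)), (3, date(2021, 5, 20)),
    (4, date(2020, 7, 15)), (2, date(2021, 9, 22)), (5, date(2020, 12, 1)),
    (6, date(2023, 3, 10)), (1, date(2021, 2, 25)),
]
customers = [1, 2, 3, 4, 5, 6]

as_of = date(2023, 6, 1)                    # stand-in for CURRENT_DATE
cutoff = as_of - timedelta(days=2 * 365)    # roughly INTERVAL '2 years'

active = {cid for cid, order_date in orders if order_date >= cutoff}
to_delete = [cid for cid in customers if cid not in active]  # the NOT IN filter
print(to_delete)  # → [3, 4, 5]
```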
• Q.264
Question
Write a query to insert a new employee into the Employees table with the following details:
• EmployeeID = 101
• FirstName = 'John'
• LastName = 'Doe'
• HireDate = '2025-01-14'

Explanation
Use the INSERT INTO statement to add a new employee record with the specified details into
the Employees table.

Datasets and SQL Schemas


-- Table creation for Employees
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
HireDate DATE
);
-- Sample Data (optional)
INSERT INTO Employees (EmployeeID, FirstName, LastName, HireDate)
VALUES
(1, 'John', 'Doe', '2021-05-01'),
(2, 'Jane', 'Smith', '2020-07-15');

Learnings
• Using the INSERT INTO statement to add a specific row to a table
• Providing values for all columns when inserting new records


• Ensuring the correct data types for each column

Solutions
-- PostgreSQL solution
INSERT INTO Employees (EmployeeID, FirstName, LastName, HireDate)
VALUES (101, 'John', 'Doe', '2025-01-14');
-- MySQL solution
INSERT INTO Employees (EmployeeID, FirstName, LastName, HireDate)
VALUES (101, 'John', 'Doe', '2025-01-14');
• Q.265
Question
Insert New Car Model
Write an SQL query to insert a new car model into the Cars table with the following details:
• ModelID = 501
• ModelName = 'Model X'
• ReleaseYear = 2024
• Price = 89999.99
• Status = 'Available'

Explanation
Use the INSERT INTO statement to add a new record into the Cars table with the provided
values.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Cars (
ModelID INT,
ModelName VARCHAR(255),
ReleaseYear INT,
Price DECIMAL(10, 2),
Status VARCHAR(50)
);
-- Datasets
INSERT INTO Cars (ModelID, ModelName, ReleaseYear, Price, Status)
VALUES
(501, 'Model X', 2024, 89999.99, 'Available');

Learnings
• Understanding the INSERT INTO statement for adding records.
• Using proper data types for different fields (e.g., INT, VARCHAR, DECIMAL).
• How to handle string, numeric, and date-based data in SQL.

Solutions
-- PostgreSQL solution
INSERT INTO Cars (ModelID, ModelName, ReleaseYear, Price, Status)
VALUES (501, 'Model X', 2024, 89999.99, 'Available');
-- MySQL solution
INSERT INTO Cars (ModelID, ModelName, ReleaseYear, Price, Status)
VALUES (501, 'Model X', 2024, 89999.99, 'Available');
• Q.266
Question

Update Car Price
Write an SQL query to update the Price of the car model with ModelID = 502 to 74999.99
in the Cars table.

Explanation
Use the UPDATE statement to modify the Price column of the car model where ModelID is
502.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Cars (
ModelID INT,
ModelName VARCHAR(255),
ReleaseYear INT,
Price DECIMAL(10, 2),
Status VARCHAR(50)
);
-- Datasets
INSERT INTO Cars (ModelID, ModelName, ReleaseYear, Price, Status)
VALUES
(502, 'Model Y', 2023, 79999.99, 'Sold Out');

Learnings
• Using the UPDATE statement to modify existing records.
• Filtering records using WHERE clause to target specific rows.
• Understanding how to modify specific columns in a table.

Solutions
-- PostgreSQL solution
UPDATE Cars
SET Price = 74999.99
WHERE ModelID = 502;
-- MySQL solution
UPDATE Cars
SET Price = 74999.99
WHERE ModelID = 502;
• Q.267
Question
Delete Outdated Car Models
Write an SQL query to delete all cars from the Cars table that were released before 2019.

Explanation
Use the DELETE statement with a WHERE clause to remove records from the Cars table where
the ReleaseYear is earlier than 2019.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Cars (
ModelID INT,
ModelName VARCHAR(255),
ReleaseYear INT,
Price DECIMAL(10, 2),
Status VARCHAR(50)

);
-- Datasets
INSERT INTO Cars (ModelID, ModelName, ReleaseYear, Price, Status)
VALUES
(601, 'Model A', 2017, 45999.99, 'Discontinued'),
(602, 'Model B', 2019, 50999.99, 'Available'),
(603, 'Model C', 2018, 47999.99, 'Discontinued');

Learnings
• Using the DELETE statement to remove records from a table.
• Applying conditions with WHERE to target specific rows for deletion.
• Managing data retention based on business rules (e.g., removing outdated records).

Solutions
-- PostgreSQL solution
DELETE FROM Cars
WHERE ReleaseYear < 2019;
-- MySQL solution
DELETE FROM Cars
WHERE ReleaseYear < 2019;
• Q.268
Question
Update Product Stock Based on Sales
Write an SQL query to update the Stock of all products in the Products table by reducing the
stock by the quantity sold in the Sales table. Assume the Sales table contains ProductID and
QuantitySold columns, and the Products table contains ProductID and Stock columns.

Explanation
Use an UPDATE statement combined with a JOIN to update the Stock in the Products table by
subtracting the QuantitySold from the Sales table.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Products (
ProductID INT,
ProductName VARCHAR(255),
Stock INT
);

CREATE TABLE Sales (
SaleID INT,
ProductID INT,
QuantitySold INT
);
-- Datasets
INSERT INTO Products (ProductID, ProductName, Stock)
VALUES
(1, 'Product A', 100),
(2, 'Product B', 150),
(3, 'Product C', 200);

INSERT INTO Sales (SaleID, ProductID, QuantitySold)
VALUES
(101, 1, 10),
(102, 2, 25),
(103, 3, 50);


Learnings
• Using UPDATE with JOIN to modify values based on related data in another table.
• Subtracting values from columns using the - arithmetic operator.
• Handling updates across multiple tables in a relational database.

Solutions
-- PostgreSQL solution
UPDATE Products p
SET Stock = p.Stock - COALESCE(s.QuantitySold, 0)
FROM Sales s
WHERE p.ProductID = s.ProductID;
-- MySQL solution
UPDATE Products p
JOIN Sales s ON p.ProductID = s.ProductID
SET p.Stock = p.Stock - s.QuantitySold;
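In the sample data each product has exactly one sale row, so the join update amounts to one subtraction per product. A Python sketch of the effect (illustrative only; with several sale rows per product you would aggregate them first):

```python
# Q.268 sample data: current stock and recorded sales
stock = {1: 100, 2: 150, 3: 200}                      # ProductID -> Stock
sales = [(101, 1, 10), (102, 2, 25), (103, 3, 50)]    # (SaleID, ProductID, QuantitySold)

# Equivalent of the UPDATE ... JOIN: subtract each sale from its product
for _, product_id, quantity in sales:
    stock[product_id] -= quantity

print(stock)  # → {1: 90, 2: 125, 3: 150}
```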
• Q.269
Question
Delete Out-of-Stock Products Older Than 5 Years
Write an SQL query to delete all products from the Products table that are out of stock (i.e.,
Stock = 0) and have not been sold in the last 5 years. Assume the Products table contains
ProductID, ProductName, Stock, and LastSoldDate columns, and the Sales table contains
ProductID and SaleDate.

Explanation
Use a DELETE statement with a combination of WHERE clauses, checking for products with
Stock = 0 and no sales for the last 5 years by comparing LastSoldDate with the current
date.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Products (
ProductID INT,
ProductName VARCHAR(255),
Stock INT,
LastSoldDate DATE
);

CREATE TABLE Sales (
SaleID INT,
ProductID INT,
SaleDate DATE
);
• - Datasets
INSERT INTO Products (ProductID, ProductName, Stock, LastSoldDate)
VALUES
(1, 'Product A', 0, '2017-06-15'),
(2, 'Product B', 10, '2021-04-22'),
(3, 'Product C', 0, '2015-01-10');

INSERT INTO Sales (SaleID, ProductID, SaleDate)
VALUES
(101, 1, '2017-06-15'),
(102, 2, '2021-04-22');

Learnings
• Combining multiple conditions in a WHERE clause.


• Using DELETE to remove records based on conditions from related tables.


• Working with date functions to filter records based on time intervals (e.g., last 5 years).
• Leveraging LEFT JOIN to check for related records and handle missing sales data.

Solutions
• - PostgreSQL solution
DELETE FROM Products p
WHERE p.Stock = 0
AND (p.LastSoldDate IS NULL OR p.LastSoldDate < CURRENT_DATE - INTERVAL '5 years')
AND NOT EXISTS (
SELECT 1 FROM Sales s
WHERE s.ProductID = p.ProductID AND s.SaleDate >= CURRENT_DATE - INTERVAL '5 years'
);
• - MySQL solution
DELETE p
FROM Products p
LEFT JOIN Sales s ON p.ProductID = s.ProductID
AND s.SaleDate >= CURDATE() - INTERVAL 5 YEAR -- join only to sales within the last 5 years
WHERE p.Stock = 0
AND (p.LastSoldDate IS NULL OR p.LastSoldDate < CURDATE() - INTERVAL 5 YEAR)
AND s.SaleID IS NULL; -- NULL means no recent sale matched the join
• Q.270
Question
Delete Duplicate Employees Based on Email
Write an SQL query to delete all duplicate employee records from the Employees table,
keeping only one record for each unique email. Assume the Employees table contains
EmployeeID, EmployeeName, and Email columns.

Explanation
Use a DELETE statement combined with PostgreSQL's system column ctid (or the EmployeeID key
in MySQL) to identify and remove duplicate records based on the Email column, keeping only
the first occurrence.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Employees (
EmployeeID INT,
EmployeeName VARCHAR(255),
Email VARCHAR(255)
);
• - Datasets

-- Insert data into Employees table with some duplicates
INSERT INTO Employees (EmployeeID, EmployeeName, Email)
VALUES
(1, 'Amit Sharma', '[email protected]'),
(2, 'Priya Patel', '[email protected]'),
(3, 'Amit Sharma', '[email protected]'),
(4, 'Rajesh Kumar', '[email protected]'),
(5, 'Amit Sharma', '[email protected]'),
(6, 'Neha Gupta', '[email protected]'),
(7, 'Ravi Mehra', '[email protected]'),
(8, 'Amit Verma', '[email protected]'),
(9, 'Sandeep Singh', '[email protected]'),
(10, 'Kavita Reddy', '[email protected]'),

(11, 'Ravi Mehra', '[email protected]');

Learnings
• Removing duplicate records based on a unique column like Email.
• Using the CTID in PostgreSQL to uniquely identify rows in a table.
• Leveraging GROUP BY with MIN(ctid) to efficiently identify which duplicate rows to delete.

Solutions
• - PostgreSQL solution
WITH DuplicateEmployees AS (
SELECT MIN(CTID) AS keep_ctid, Email
FROM Employees
GROUP BY Email
)
DELETE FROM Employees
WHERE CTID NOT IN (
SELECT keep_ctid FROM DuplicateEmployees
)
AND Email IN (
SELECT Email FROM DuplicateEmployees
);

This query works as follows:


• The WITH clause (CTE) creates a list of emails and their corresponding minimum CTID,
effectively selecting the first occurrence of each unique email.
• The DELETE statement then deletes all records that have the same Email but a CTID
different from the one selected in the WITH clause.

MySQL solution
DELETE e FROM Employees e
JOIN (
SELECT MIN(EmployeeID) AS EmployeeID, Email
FROM Employees
GROUP BY Email
) AS first_occurrence
ON e.Email = first_occurrence.Email
WHERE e.EmployeeID > first_occurrence.EmployeeID;
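The keep-the-minimum-key pattern behind both solutions can be demonstrated with Python's sqlite3 module. The mini-dataset below is hypothetical (the email addresses in this copy of the book are redacted), but the deletion rule is the same: keep the lowest EmployeeID per email.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Employees (EmployeeID INT, EmployeeName TEXT, Email TEXT);
-- Hypothetical mini-dataset: ids 1/3 and 4/5 share an email
INSERT INTO Employees VALUES
 (1,'Amit','amit@x.test'),
 (2,'Priya','priya@x.test'),
 (3,'Amit','amit@x.test'),
 (4,'Ravi','ravi@x.test'),
 (5,'Ravi','ravi@x.test');

-- Keep the lowest EmployeeID per email, delete every other row
DELETE FROM Employees
WHERE EmployeeID NOT IN (SELECT MIN(EmployeeID) FROM Employees GROUP BY Email);
""")
remaining = [r[0] for r in cur.execute(
    "SELECT EmployeeID FROM Employees ORDER BY EmployeeID")]
print(remaining)  # [1, 2, 4]
```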

Data Cleaning
• Q.271
Question
Handle Missing Values in Customer Data
Write an SQL query to update all NULL values in the Email column of the Customers table
with a default value of '[email protected]'.

Explanation
Use the UPDATE statement combined with the IS NULL condition to find and update rows
where the Email is NULL, and set a default value for those records.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Customers (
CustomerID INT,
CustomerName VARCHAR(255),

Email VARCHAR(255)
);
• - Datasets
INSERT INTO Customers (CustomerID, CustomerName, Email)
VALUES
(1, 'John Doe', '[email protected]'),
(2, 'Jane Smith', NULL),
(3, 'Sam Johnson', '[email protected]'),
(4, 'Mike Brown', NULL),
(5, 'Emily Davis', '[email protected]');

Learnings
• Handling missing or NULL values in SQL.
• Using the IS NULL condition to identify missing data.
• Applying the UPDATE statement to modify data in a specific column.

Solutions
• - PostgreSQL solution
UPDATE Customers
SET Email = '[email protected]'
WHERE Email IS NULL;
• - MySQL solution
UPDATE Customers
SET Email = '[email protected]'
WHERE Email IS NULL;
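The same NULL-filling update can be run end to end with sqlite3. The default address 'unknown@x.test' is a placeholder chosen for this sketch (the default value in this copy of the book is redacted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Customers (CustomerID INT, CustomerName TEXT, Email TEXT);
INSERT INTO Customers VALUES
 (1,'John Doe','john@x.test'), (2,'Jane Smith',NULL),
 (3,'Sam Johnson','sam@x.test'), (4,'Mike Brown',NULL),
 (5,'Emily Davis','emily@x.test');

-- Replace NULL emails with a placeholder default
UPDATE Customers SET Email = 'unknown@x.test' WHERE Email IS NULL;
""")
nulls = cur.execute("SELECT COUNT(*) FROM Customers WHERE Email IS NULL").fetchone()[0]
defaults = cur.execute(
    "SELECT COUNT(*) FROM Customers WHERE Email = 'unknown@x.test'").fetchone()[0]
print(nulls, defaults)  # 0 2
```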
• Q.272
Question
Remove Duplicate Orders
Write an SQL query to delete all duplicate orders in the Orders table where both the
CustomerID and OrderDate are identical. Keep only the first instance of each duplicated
order.

Explanation
Use the DELETE statement combined with a JOIN to identify and remove rows with duplicate
CustomerID and OrderDate, keeping only the first instance of each duplicated order.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Orders (
OrderID INT,
CustomerID INT,
OrderDate DATE,
OrderAmount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO Orders (OrderID, CustomerID, OrderDate, OrderAmount)
VALUES
(1, 101, '2023-01-15', 2500.00),
(2, 102, '2023-01-16', 1500.00),
(3, 101, '2023-01-15', 2500.00),
(4, 103, '2023-01-18', 5000.00),
(5, 104, '2023-01-17', 1200.00),
(6, 102, '2023-01-16', 1500.00),
(7, 101, '2023-01-15', 2500.00),
(8, 105, '2023-01-20', 3200.00);


Learnings
• Removing duplicate rows based on multiple columns (CustomerID, OrderDate).
• Using JOIN to identify duplicates across multiple records.
• Understanding how to keep the first occurrence of a duplicate and remove the rest.

Solutions
• - PostgreSQL solution
WITH DuplicateOrders AS (
SELECT MIN(OrderID) AS keep_orderid, CustomerID, OrderDate
FROM Orders
GROUP BY CustomerID, OrderDate
)
DELETE FROM Orders
WHERE OrderID NOT IN (
SELECT keep_orderid FROM DuplicateOrders
)
AND (CustomerID, OrderDate) IN (
SELECT CustomerID, OrderDate FROM DuplicateOrders
);
• - MySQL solution
DELETE o FROM Orders o
JOIN (
SELECT MIN(OrderID) AS keep_orderid, CustomerID, OrderDate
FROM Orders
GROUP BY CustomerID, OrderDate
) AS first_occurrence
ON o.CustomerID = first_occurrence.CustomerID
AND o.OrderDate = first_occurrence.OrderDate
WHERE o.OrderID > first_occurrence.keep_orderid;
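Using the question's own sample data, a sqlite3 sketch shows which orders survive the multi-column dedup (lowest OrderID per CustomerID/OrderDate pair):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Orders (OrderID INT, CustomerID INT, OrderDate TEXT, OrderAmount REAL);
INSERT INTO Orders VALUES
 (1,101,'2023-01-15',2500.00),(2,102,'2023-01-16',1500.00),
 (3,101,'2023-01-15',2500.00),(4,103,'2023-01-18',5000.00),
 (5,104,'2023-01-17',1200.00),(6,102,'2023-01-16',1500.00),
 (7,101,'2023-01-15',2500.00),(8,105,'2023-01-20',3200.00);

-- Keep the lowest OrderID per (CustomerID, OrderDate) pair
DELETE FROM Orders
WHERE OrderID NOT IN (
    SELECT MIN(OrderID) FROM Orders GROUP BY CustomerID, OrderDate);
""")
kept = [r[0] for r in cur.execute("SELECT OrderID FROM Orders ORDER BY OrderID")]
print(kept)  # [1, 2, 4, 5, 8]
```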
• Q.273
Question
Write an SQL query to find and update all PhoneNumber values in the Customers table that
do not follow the standard UK phone format (i.e., must start with +44 and be 11 digits long)
to 'Invalid'.

Explanation
Use the UPDATE statement with a WHERE clause to identify rows where the PhoneNumber does
not match the standard UK phone format (+44 followed by 9 digits). This can be achieved
using pattern matching or regular expressions, depending on the SQL database being used.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Customers (
CustomerID INT,
CustomerName VARCHAR(255),
PhoneNumber VARCHAR(20)
);
• - Datasets (with phone numbers)
INSERT INTO Customers (CustomerID, CustomerName, PhoneNumber)
VALUES
(1, 'Amit Sharma', '+441234567890'),
(2, 'Priya Patel', '+442345678901'),
(3, 'Rajesh Kumar', '1234567890'),
(4, 'Neha Gupta', '+44345678901'),
(5, 'Ravi Mehra', '9876543210'),
(6, 'Kavita Reddy', '+4412345678');

Learnings


• Using pattern matching (LIKE or regular expressions) to validate data formats.


• Updating records based on matching patterns.
• Correcting invalid data formats using conditional statements.

Solutions

PostgreSQL solution
PostgreSQL supports regular expressions through the SIMILAR TO or ~ operator, which can
be used to match the phone format.
UPDATE Customers
SET PhoneNumber = 'Invalid'
WHERE PhoneNumber !~ '^\+44\d{9}$';
• ^\+44\d{9}$ checks that the phone number starts with +44 and is followed by exactly 9
digits.

MySQL solution
MySQL supports REGEXP for regular expressions.
UPDATE Customers
SET PhoneNumber = 'Invalid'
WHERE PhoneNumber NOT REGEXP '^\\+44[0-9]{9}$';
• ^\\+44[0-9]{9}$ ensures that the phone number starts with +44 and is followed by
exactly 9 digits.
• In MySQL, we need to escape the + symbol with a double backslash (\\).

Notes
• This solution assumes the phone numbers are stored as strings (VARCHAR) in the table.
• Regular expressions can vary in syntax across different databases, so make sure to adjust
based on your SQL platform's capabilities.
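The regular expression itself can be exercised against the question's sample numbers with Python's re module; this sketch mirrors the SQL pattern ('+44' followed by exactly 9 digits) and flags everything else as 'Invalid':

```python
import re

# Same pattern as the SQL solutions: '+44' followed by exactly 9 digits
UK_PHONE = re.compile(r"^\+44\d{9}$")

numbers = ["+441234567890", "+442345678901", "1234567890",
           "+44345678901", "9876543210", "+4412345678"]

cleaned = [n if UK_PHONE.match(n) else "Invalid" for n in numbers]
print(cleaned)
# ['Invalid', 'Invalid', 'Invalid', '+44345678901', 'Invalid', 'Invalid']
```

Only '+44345678901' has exactly 9 digits after the '+44' prefix; the others are too long, too short, or missing the prefix.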
• Q.274
Question
Remove Outdated Products
Write an SQL query to delete all products from the Products table that have not been sold in
the last 2 years. Assume the Products table has a LastSoldDate column.

Explanation
Use the DELETE statement to remove records where the LastSoldDate is older than 2 years
compared to the current date. This can be achieved using a date comparison in the WHERE
clause.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Products (
ProductID INT,
ProductName VARCHAR(255),
LastSoldDate DATE
);
• - Datasets
INSERT INTO Products (ProductID, ProductName, LastSoldDate)
VALUES

(1, 'Product A', '2024-05-10'),
(2, 'Product B', '2024-03-15'),
(3, 'Product C', '2019-11-20'),
(4, 'Product D', '2023-01-25'),
(5, 'Product E', '2021-07-30');

Learnings
• Using date functions (CURRENT_DATE or CURDATE()) to compare dates.
• Deleting records based on specific date conditions.
• Ensuring accurate date comparisons using SQL date operators.

Solutions

PostgreSQL solution
PostgreSQL uses CURRENT_DATE to get the current date.
DELETE FROM Products
WHERE LastSoldDate < CURRENT_DATE - INTERVAL '2 years';
• CURRENT_DATE - INTERVAL '2 years' calculates the date 2 years ago from the current
date and deletes any records where LastSoldDate is earlier than this.

MySQL solution
MySQL uses CURDATE() to get the current date.
DELETE FROM Products
WHERE LastSoldDate < CURDATE() - INTERVAL 2 YEAR;
• CURDATE() - INTERVAL 2 YEAR calculates the date 2 years ago from today and deletes
any records where LastSoldDate is earlier than this.

Notes
• This solution assumes LastSoldDate is stored in the DATE format.
• Date functions like CURRENT_DATE in PostgreSQL and CURDATE() in MySQL help in
comparing dates directly.
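A sqlite3 sketch of the same cutoff logic, using SQLite's date() modifier in place of CURRENT_DATE/CURDATE(). A fixed reference date ('2025-01-01', an assumption of this example) keeps the result deterministic; in production you would pass date('now', '-2 years'):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Products (ProductID INT, ProductName TEXT, LastSoldDate TEXT);
INSERT INTO Products VALUES
 (1,'Product A','2024-05-10'),(2,'Product B','2024-03-15'),
 (3,'Product C','2019-11-20'),(4,'Product D','2023-01-25'),
 (5,'Product E','2021-07-30');
""")
# Fixed "today" so the example is reproducible; cutoff becomes 2023-01-01
today = "2025-01-01"
cur.execute("DELETE FROM Products WHERE LastSoldDate < date(?, '-2 years')", (today,))
kept = [r[0] for r in cur.execute("SELECT ProductID FROM Products ORDER BY ProductID")]
print(kept)  # [1, 2, 4]
```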
• Q.275
Question
Normalize Customer Names
Write an SQL query to convert all FirstName and LastName values in the Customers table
to proper case (e.g., 'john' → 'John').

Explanation
Use the UPDATE statement combined with string functions such as UPPER() and LOWER() to
capitalize the first letter of each name and convert the rest to lowercase, thereby converting
them to proper case.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Customers (
CustomerID INT,
FirstName VARCHAR(255),
LastName VARCHAR(255)
);


• - Datasets
INSERT INTO Customers (CustomerID, FirstName, LastName)
VALUES
(1, 'john', 'doe'),
(2, 'jane', 'smith'),
(3, 'rajesh', 'kumar'),
(4, 'neha', 'gupta'),
(5, 'ravi', 'mehra');

Learnings
• Using string functions such as UPPER(), LOWER(), and INITCAP() to manipulate text case.
• Updating data in a table based on text transformation.

Solutions

PostgreSQL solution
PostgreSQL provides the INITCAP() function to convert text to proper case.
UPDATE Customers
SET FirstName = INITCAP(LOWER(FirstName)),
LastName = INITCAP(LOWER(LastName));
• LOWER(FirstName) converts the first name to lowercase, and INITCAP() capitalizes the
first letter of each word.

MySQL solution
MySQL does not have a direct INITCAP() function, so we can use a combination of
CONCAT(), UPPER(), and LOWER() functions to achieve proper case.
UPDATE Customers
SET FirstName = CONCAT(UPPER(SUBSTRING(FirstName, 1, 1)), LOWER(SUBSTRING(FirstName, 2))),
LastName = CONCAT(UPPER(SUBSTRING(LastName, 1, 1)), LOWER(SUBSTRING(LastName, 2)));
• UPPER(SUBSTRING(..., 1, 1)) capitalizes the first letter, and LOWER(SUBSTRING(...,
2)) ensures the rest are in lowercase.

Notes
• The PostgreSQL solution utilizes the built-in INITCAP() function, which makes it
straightforward.
• In MySQL, we use a combination of SUBSTRING(), UPPER(), and LOWER() to achieve the
same result as INITCAP().
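SQLite, like MySQL, has no INITCAP(), so the MySQL-style expression is a good way to verify the technique end to end (using || instead of CONCAT and SUBSTR instead of SUBSTRING, which is all this sketch changes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Customers (CustomerID INT, FirstName TEXT, LastName TEXT);
INSERT INTO Customers VALUES
 (1,'john','doe'),(2,'jane','smith'),(3,'rajesh','kumar');

-- Proper case the MySQL way: first character uppercased, remainder lowercased
UPDATE Customers
SET FirstName = UPPER(SUBSTR(FirstName,1,1)) || LOWER(SUBSTR(FirstName,2)),
    LastName  = UPPER(SUBSTR(LastName,1,1))  || LOWER(SUBSTR(LastName,2));
""")
names = cur.execute(
    "SELECT FirstName, LastName FROM Customers ORDER BY CustomerID").fetchall()
print(names)  # [('John', 'Doe'), ('Jane', 'Smith'), ('Rajesh', 'Kumar')]
```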
• Q.276
Question
Remove Inconsistent Date Formats
Write an SQL query to standardize the OrderDate column in the Orders table, ensuring all
dates are in the 'YYYY-MM-DD' format. Assume the OrderDate column may contain
inconsistent date formats.

Explanation
Use the UPDATE statement combined with date functions such as STR_TO_DATE() in MySQL
or TO_DATE() in PostgreSQL to convert the OrderDate column values to a consistent date
format ('YYYY-MM-DD').


Datasets and SQL Schemas


• - Table creation
CREATE TABLE Orders (
OrderID INT,
OrderDate VARCHAR(255)
);
• - Datasets (with inconsistent date formats)
INSERT INTO Orders (OrderID, OrderDate)
VALUES
(1, '2023/01/15'),
(2, '15-02-2023'),
(3, '2023-03-20'),
(4, 'April 25, 2023'),
(5, '2023-05-10'),
(6, '03/06/2023');

Learnings
• Handling inconsistent date formats in SQL.
• Using STR_TO_DATE() (MySQL) and TO_DATE() (PostgreSQL) to convert string
representations of dates into standardized date formats.
• Ensuring consistency in date storage and representation.

Solutions

PostgreSQL solution
Because OrderDate is stored as text in several different formats, convert each format
separately, guarding every UPDATE with a pattern so that TO_DATE() never receives a string
it cannot parse. TO_CHAR() then writes the result back as standardized 'YYYY-MM-DD' text.
UPDATE Orders
SET OrderDate = TO_CHAR(TO_DATE(OrderDate, 'YYYY/MM/DD'), 'YYYY-MM-DD')
WHERE OrderDate ~ '^\d{4}/\d{2}/\d{2}$';

UPDATE Orders
SET OrderDate = TO_CHAR(TO_DATE(OrderDate, 'DD-MM-YYYY'), 'YYYY-MM-DD')
WHERE OrderDate ~ '^\d{2}-\d{2}-\d{4}$';

UPDATE Orders
SET OrderDate = TO_CHAR(TO_DATE(OrderDate, 'Month DD, YYYY'), 'YYYY-MM-DD')
WHERE OrderDate ~ '^[A-Za-z]+ \d{1,2}, \d{4}$';

UPDATE Orders
SET OrderDate = TO_CHAR(TO_DATE(OrderDate, 'DD/MM/YYYY'), 'YYYY-MM-DD')
WHERE OrderDate ~ '^\d{2}/\d{2}/\d{4}$';
• Rows already in 'YYYY-MM-DD' format need no update. '03/06/2023' is assumed to be
DD/MM/YYYY; confirm the convention before converting ambiguous values.

MySQL solution
In MySQL, apply STR_TO_DATE() per format, again guarded by a pattern: STR_TO_DATE() returns
NULL when the string does not match the given format, so an unguarded UPDATE would silently
wipe out every non-matching value.
UPDATE Orders
SET OrderDate = DATE_FORMAT(STR_TO_DATE(OrderDate, '%Y/%m/%d'), '%Y-%m-%d')
WHERE OrderDate REGEXP '^[0-9]{4}/[0-9]{2}/[0-9]{2}$';

UPDATE Orders
SET OrderDate = DATE_FORMAT(STR_TO_DATE(OrderDate, '%d-%m-%Y'), '%Y-%m-%d')
WHERE OrderDate REGEXP '^[0-9]{2}-[0-9]{2}-[0-9]{4}$';

UPDATE Orders
SET OrderDate = DATE_FORMAT(STR_TO_DATE(OrderDate, '%M %d, %Y'), '%Y-%m-%d')
WHERE OrderDate REGEXP '^[A-Za-z]+ [0-9]{1,2}, [0-9]{4}$';

UPDATE Orders
SET OrderDate = DATE_FORMAT(STR_TO_DATE(OrderDate, '%d/%m/%Y'), '%Y-%m-%d')
WHERE OrderDate REGEXP '^[0-9]{2}/[0-9]{2}/[0-9]{4}$';


Notes
• In PostgreSQL, TO_DATE() can handle different formats based on the date pattern you
specify, making it flexible to work with various date formats.
• In MySQL, STR_TO_DATE() converts strings into date types, and DATE_FORMAT() ensures
the output is in the desired format.
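The try-each-format idea translates directly to Python's datetime.strptime, which is a convenient way to check what each sample string normalizes to. Treating '03/06/2023' as day/month is an assumption of this sketch, matching the note above:

```python
from datetime import datetime

# Candidate formats mirroring the SQL patterns; '%d/%m/%Y' is an assumed
# convention for ambiguous strings like '03/06/2023'.
FORMATS = ["%Y-%m-%d", "%Y/%m/%d", "%d-%m-%Y", "%B %d, %Y", "%d/%m/%Y"]

def normalize(s: str) -> str:
    """Return the date as 'YYYY-MM-DD', trying each known format in turn."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(s, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {s!r}")

raw = ["2023/01/15", "15-02-2023", "2023-03-20",
       "April 25, 2023", "2023-05-10", "03/06/2023"]
normalized = [normalize(s) for s in raw]
print(normalized)
# ['2023-01-15', '2023-02-15', '2023-03-20', '2023-04-25', '2023-05-10', '2023-06-03']
```

Because %Y requires exactly four digits, a string like '15-02-2023' cannot be mis-parsed by the '%Y-%m-%d' pattern; it simply falls through to '%d-%m-%Y'.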
• Q.277
Question
Handle Missing Product Prices
Write an SQL query to replace NULL values in the Price column of the Products table with
the average price of all products.

Explanation
Use the UPDATE statement combined with a subquery to calculate the average price of all
products, and then replace NULL values in the Price column with this calculated average.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Products (
ProductID INT,
ProductName VARCHAR(255),
Price DECIMAL(10, 2)
);
• - Datasets (with missing prices)
INSERT INTO Products (ProductID, ProductName, Price)
VALUES
(1, 'Product A', 150.00),
(2, 'Product B', NULL),
(3, 'Product C', 200.00),
(4, 'Product D', NULL),
(5, 'Product E', 300.00);

Learnings
• Handling NULL values in SQL.
• Using subqueries to calculate aggregate values like averages.
• Updating data in a table based on calculated values.

Solutions

PostgreSQL solution
PostgreSQL supports the use of subqueries in the UPDATE statement.
UPDATE Products
SET Price = (SELECT AVG(Price) FROM Products WHERE Price IS NOT NULL)
WHERE Price IS NULL;
• The subquery (SELECT AVG(Price) FROM Products WHERE Price IS NOT NULL)
calculates the average price of all products that have a non-NULL price.
• The UPDATE statement then replaces NULL values in the Price column with this average.

MySQL solution
In MySQL, the same approach can be applied using a subquery.
UPDATE Products
SET Price = (SELECT AVG(Price) FROM Products WHERE Price IS NOT NULL)
WHERE Price IS NULL;
• Similar to PostgreSQL, this subquery calculates the average price for products that have a
non-NULL value in the Price column and then updates the NULL values with this average.

Notes
• This solution ensures that only NULL prices are updated, leaving non-NULL values
unchanged.
• Both PostgreSQL and MySQL support subqueries in UPDATE statements, making this
approach consistent across the two databases.
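The same statement runs unchanged in SQLite, which makes for a quick check with the question's data (average of 150, 200 and 300 is about 216.67):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Products (ProductID INT, ProductName TEXT, Price REAL);
INSERT INTO Products VALUES
 (1,'Product A',150.00),(2,'Product B',NULL),(3,'Product C',200.00),
 (4,'Product D',NULL),(5,'Product E',300.00);

-- AVG() already ignores NULLs, so the explicit filter is belt-and-braces
UPDATE Products
SET Price = (SELECT AVG(Price) FROM Products WHERE Price IS NOT NULL)
WHERE Price IS NULL;
""")
filled = cur.execute("SELECT Price FROM Products WHERE ProductID IN (2, 4)").fetchall()
print([round(p[0], 2) for p in filled])  # [216.67, 216.67]
```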
• Q.278
Question
Standardize Country Names
Write an SQL query to standardize the Country column in the Customers table, ensuring all
instances of 'United Kingdom', 'UK', and 'GB' are replaced with 'United Kingdom'.

Explanation
Use the UPDATE statement with a CASE or REPLACE function to standardize the values in the
Country column. The goal is to replace different representations of the same country ('UK',
'GB') with a single standardized name ('United Kingdom').

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Customers (
CustomerID INT,
CustomerName VARCHAR(255),
Country VARCHAR(255)
);
• - Datasets
INSERT INTO Customers (CustomerID, CustomerName, Country)
VALUES
(1, 'John Doe', 'United Kingdom'),
(2, 'Jane Smith', 'UK'),
(3, 'Rajesh Kumar', 'GB'),
(4, 'Neha Gupta', 'India'),
(5, 'Ravi Mehra', 'United Kingdom'),
(6, 'Kavita Reddy', 'GB');

Learnings
• Using UPDATE to modify multiple values in a column.
• Standardizing inconsistent text values in a column.
• Using CASE or REPLACE to handle multiple conditions in SQL.

Solutions

PostgreSQL solution
PostgreSQL allows you to use the CASE expression for conditional updates.
UPDATE Customers
SET Country = CASE
WHEN Country IN ('UK', 'GB') THEN 'United Kingdom'
ELSE Country
END;


• The CASE expression checks if the Country is either 'UK' or 'GB' and replaces it with
'United Kingdom'.
• If the country is neither of these values, it keeps the original value.
Alternatively, you can use the REPLACE() function if you are certain that only the specific
values need to be replaced.
UPDATE Customers
SET Country = REPLACE(REPLACE(Country, 'UK', 'United Kingdom'), 'GB', 'United Kingdom');
• The nested REPLACE() functions first replace 'UK' with 'United Kingdom' and then
replace 'GB' with 'United Kingdom'.

MySQL solution
In MySQL, both the CASE expression and REPLACE() function are supported.
Using the CASE expression:
UPDATE Customers
SET Country = CASE
WHEN Country IN ('UK', 'GB') THEN 'United Kingdom'
ELSE Country
END;

Using the REPLACE() function:


UPDATE Customers
SET Country = REPLACE(REPLACE(Country, 'UK', 'United Kingdom'), 'GB', 'United Kingdom');
• Both methods work in MySQL similarly to PostgreSQL, allowing you to standardize the
country names.

Notes
• The CASE expression is more flexible as it allows you to handle multiple conditions and
ensures you can expand the logic easily in the future.
• The REPLACE() method is a more straightforward approach, though it could potentially
cause unexpected changes if there are partial matches (e.g., 'UK' within a longer string).
• Q.279
Question
Fix Number Format Issues in Text Column
Write an SQL query to fix the number format in the Amount column of the Transactions
table, which is stored as text. The column contains numeric values with commas as thousand
separators (e.g., '1,000', '1,500.50'). Remove the commas and standardize the format to store
the number as a plain text value (e.g., '1000', '1500.50').

Explanation
Use the UPDATE statement combined with the REPLACE() function to remove the commas
from the Amount column, while keeping the column as text. The goal is to standardize the
format without changing the column type.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Transactions (
TransactionID INT,
Amount TEXT
);


• - Datasets (with number format issues)


INSERT INTO Transactions (TransactionID, Amount)
VALUES
(1, '1,000'),
(2, '2,500.75'),
(3, '3,000'),
(4, '1,200.50'),
(5, '4,500');

Learnings
• Using string functions like REPLACE() to manipulate text data.
• Removing non-numeric characters while keeping the original data type as text.
• Handling number format inconsistencies stored in text columns.

Solutions

PostgreSQL solution
PostgreSQL allows the use of the REPLACE() function to remove unwanted characters.
UPDATE Transactions
SET Amount = REPLACE(Amount, ',', '');
• The REPLACE() function removes all commas from the Amount column, leaving the
number in a standardized format.

MySQL solution
MySQL also supports the REPLACE() function for text manipulation.
UPDATE Transactions
SET Amount = REPLACE(Amount, ',', '');
• Similar to PostgreSQL, this REPLACE() function removes commas from the Amount
column, ensuring the number is correctly formatted as text.

Notes
• The Amount column is kept as text, but the number format is corrected by removing
commas.
• This solution does not convert the text to an actual numeric type but ensures that the
formatting is consistent.
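A sqlite3 sketch of the comma-stripping update, plus a follow-up showing why it matters: once the separators are gone, the text can be safely cast for arithmetic (the CAST step is an illustrative extra, not part of the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Transactions (TransactionID INT, Amount TEXT);
INSERT INTO Transactions VALUES
 (1,'1,000'),(2,'2,500.75'),(3,'3,000'),(4,'1,200.50'),(5,'4,500');

-- Strip thousands separators; the column stays TEXT, as in the question
UPDATE Transactions SET Amount = REPLACE(Amount, ',', '');
""")
amounts = [r[0] for r in cur.execute(
    "SELECT Amount FROM Transactions ORDER BY TransactionID")]
print(amounts)  # ['1000', '2500.75', '3000', '1200.50', '4500']

# With clean text, numeric casts now behave correctly
total = cur.execute("SELECT SUM(CAST(Amount AS REAL)) FROM Transactions").fetchone()[0]
print(total)  # 12201.25
```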
• Q.280
Question
Remove Invalid Email Domains
Write an SQL query to delete all customer records from the Customers table where the
Email column contains an invalid domain, such as 'example.com' or 'fake.com'.

Explanation
Use the DELETE statement combined with the WHERE clause and LIKE or regular expressions to
filter out records with specific invalid email domains. You can identify invalid domains by
using pattern matching to check for email addresses ending in the undesired domains
('example.com' or 'fake.com').

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Customers (
CustomerID INT,
CustomerName VARCHAR(255),
Email VARCHAR(255)
);
• - Datasets
INSERT INTO Customers (CustomerID, CustomerName, Email)
VALUES
(1, 'John Doe', '[email protected]'),
(2, 'Jane Smith', '[email protected]'),
(3, 'Rajesh Kumar', '[email protected]'),
(4, 'Neha Gupta', '[email protected]'),
(5, 'Ravi Mehra', '[email protected]');

Learnings
• Deleting records based on string pattern matching in SQL.
• Using LIKE or regular expressions to filter records by specific conditions.
• Understanding how to manage email-related data by identifying invalid or unwanted
domains.

Solutions

PostgreSQL solution
PostgreSQL supports regular expressions with the ~ operator to match patterns.
DELETE FROM Customers
WHERE Email ~* '@(example\.com|fake\.com)$';
• The regular expression @(example\.com|fake\.com)$ matches email addresses that end
with either 'example.com' or 'fake.com'.
• The ~* operator performs a case-insensitive match.
Alternatively, using LIKE:
DELETE FROM Customers
WHERE Email LIKE '%@example.com' OR Email LIKE '%@fake.com';
• The LIKE operator checks if the email address ends with the specified invalid domains.

MySQL solution
In MySQL, you can use the REGEXP operator for regular expressions.
DELETE FROM Customers
WHERE Email REGEXP '@(example\\.com|fake\\.com)$';
• The regular expression @(example\.com|fake\.com)$ matches email addresses ending
with 'example.com' or 'fake.com'.
• The REGEXP operator allows pattern matching in MySQL.
Alternatively, using LIKE:
DELETE FROM Customers
WHERE Email LIKE '%@example.com' OR Email LIKE '%@fake.com';
• The LIKE operator in MySQL works similarly to the PostgreSQL version for matching
specific email domains.

Notes
• Regular expressions provide more flexibility and precision, especially when checking
complex patterns.


• Both REGEXP (MySQL) and ~ (PostgreSQL) are useful for pattern-based filtering, while
LIKE is simpler but may be less flexible for complex patterns.
• Be cautious when using LIKE with % as it may lead to partial matches in unintended places.
Regular expressions allow more control over the pattern matching.

Questions By Company
Amazon
• Q.281
Question
Write an SQL query to find the ids of all dates whose temperature is higher than the
previous day's temperature.

Explanation
We need to compare the temperature of each day with the temperature of the previous day.
This can be done by using a self-join or a window function to get the temperature of the
previous day for each row, then filtering those rows where the temperature is higher than the
previous day's temperature.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Weather (
id INT PRIMARY KEY,
recordDate DATE,
temperature INT
);

-- Sample data insertions


INSERT INTO Weather (id, recordDate, temperature)
VALUES
(1, '2015-01-01', 10),
(2, '2015-01-02', 25),
(3, '2015-01-03', 20),
(4, '2015-01-04', 30);

Learnings
• Self-joins to compare a row with its previous row.
• Use of date comparison to match consecutive records.
• Understanding the use of LAG() or self-join techniques for handling consecutive records.

Solutions
• - PostgreSQL solution (using the LAG() window function)
WITH w AS (
SELECT id, recordDate, temperature,
LAG(temperature) OVER (ORDER BY recordDate) AS prev_temp,
LAG(recordDate) OVER (ORDER BY recordDate) AS prev_date
FROM Weather
)
SELECT id
FROM w
WHERE temperature > prev_temp
AND recordDate - prev_date = 1; -- only compare strictly consecutive days


• - MySQL solution (using a self-join)


SELECT w1.id
FROM Weather w1
JOIN Weather w2 ON w1.recordDate = DATE_ADD(w2.recordDate, INTERVAL 1 DAY)
WHERE w1.temperature > w2.temperature;
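SQLite (3.25+) also supports LAG(), so the window-function approach can be run directly against the sample data; this sketch uses julianday() to enforce the consecutive-day condition:

```python
import sqlite3  # window functions need SQLite 3.25+ (bundled with Python 3.7+)

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Weather (id INT PRIMARY KEY, recordDate TEXT, temperature INT);
INSERT INTO Weather VALUES
 (1,'2015-01-01',10),(2,'2015-01-02',25),(3,'2015-01-03',20),(4,'2015-01-04',30);
""")
rising = [r[0] for r in cur.execute("""
WITH w AS (
    SELECT id, recordDate, temperature,
           LAG(temperature) OVER (ORDER BY recordDate) AS prev_temp,
           LAG(recordDate)  OVER (ORDER BY recordDate) AS prev_date
    FROM Weather
)
SELECT id FROM w
WHERE temperature > prev_temp
  AND julianday(recordDate) - julianday(prev_date) = 1  -- strictly consecutive days
ORDER BY id
""")]
print(rising)  # [2, 4]
```

Day 3 is excluded because 20 is lower than the previous day's 25, even though the dates are consecutive.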
• Q.282
Write an SQL query to find the missing customer IDs. The missing IDs are those that are not
present in the Customers table but are within the range from 1 to the maximum customer_id
in the table.

Explanation
We need to generate a sequence of numbers from 1 to the maximum customer_id, and then
find which numbers do not exist in the Customers table. This can be done by comparing the
generated sequence with the customer_id values in the table. The MAX() function will help
identify the highest customer_id, and then a range of numbers can be generated to identify
the missing IDs.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100)
);

-- Sample data insertions


INSERT INTO Customers (customer_id, customer_name)
VALUES
(1, 'Alice'),
(4, 'Bob'),
(5, 'Charlie');

Learnings
• Generating a range of numbers using JOIN or WITH clause.
• Using NOT IN or LEFT JOIN to filter missing records.
• Understanding how to dynamically calculate ranges and compare them to table values.

Solutions
• - PostgreSQL solution (using generate_series() to create a range of numbers)
SELECT series.ids
FROM generate_series(1, (SELECT MAX(customer_id) FROM Customers)) AS series(ids)
WHERE series.ids NOT IN (SELECT customer_id FROM Customers)
ORDER BY series.ids;
• - MySQL solution (using a JOIN with a sequence of numbers)
WITH RECURSIVE NumberSequence AS (
SELECT 1 AS ids
UNION ALL
SELECT ids + 1
FROM NumberSequence
WHERE ids < (SELECT MAX(customer_id) FROM Customers)
)
SELECT ids
FROM NumberSequence
WHERE ids NOT IN (SELECT customer_id FROM Customers)
ORDER BY ids;
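SQLite supports recursive CTEs too, so the MySQL-style approach can be verified with the sample table (ids 1, 4, 5 present, so 2 and 3 are missing):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Customers (customer_id INT PRIMARY KEY, customer_name TEXT);
INSERT INTO Customers VALUES (1,'Alice'),(4,'Bob'),(5,'Charlie');
""")
missing = [r[0] for r in cur.execute("""
WITH RECURSIVE seq(ids) AS (
    SELECT 1
    UNION ALL
    SELECT ids + 1 FROM seq
    WHERE ids < (SELECT MAX(customer_id) FROM Customers)
)
SELECT ids FROM seq
WHERE ids NOT IN (SELECT customer_id FROM Customers)
ORDER BY ids
""")]
print(missing)  # [2, 3]
```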
• Q.283
Question
Write an SQL query to show the second most recent activity of each user. If the user only has
one activity, return that one.

Explanation
We need to identify the second most recent activity for each user based on the startDate of
each activity. If a user has only one activity, we should return that single activity. To achieve
this, we can use the ROW_NUMBER() window function to rank the activities per user, then filter
for the second most recent one. If there's only one activity, we return that record.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE UserActivity (
username VARCHAR(100),
activity VARCHAR(100),
startDate DATE,
endDate DATE
);

-- Sample data insertions


INSERT INTO UserActivity (username, activity, startDate, endDate)
VALUES
('Alice', 'Travel', '2020-02-12', '2020-02-20'),
('Alice', 'Dancing', '2020-02-21', '2020-02-23'),
('Alice', 'Travel', '2020-02-24', '2020-02-28'),
('Bob', 'Travel', '2020-02-11', '2020-02-18');

Learnings
• Use of window functions (ROW_NUMBER()) to rank records based on a specific order (in this
case, by startDate).
• Handling cases where a user has only one record.
• Using conditional logic to return the correct result when there are fewer than two activities.

Solutions
• - PostgreSQL solution (using ROW_NUMBER() to rank activities per user)
WITH RankedActivities AS (
SELECT username, activity, startDate, endDate,
ROW_NUMBER() OVER (PARTITION BY username ORDER BY startDate DESC) AS rn
FROM UserActivity
)
SELECT username, activity, startDate, endDate
FROM RankedActivities
WHERE rn = 2
UNION ALL
SELECT username, activity, startDate, endDate
FROM RankedActivities
WHERE rn = 1
AND username NOT IN (SELECT username FROM RankedActivities WHERE rn = 2);
• - MySQL solution (using ROW_NUMBER(); requires MySQL 8.0+)
WITH RankedActivities AS (
SELECT username, activity, startDate, endDate,
ROW_NUMBER() OVER (PARTITION BY username ORDER BY startDate DESC) AS rn
FROM UserActivity
)
SELECT username, activity, startDate, endDate
FROM RankedActivities
WHERE rn = 2
UNION
SELECT username, activity, startDate, endDate
FROM RankedActivities
WHERE rn = 1 AND username NOT IN (SELECT username FROM RankedActivities WHERE rn = 2);
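As a runnable variation of the same idea, this sqlite3 sketch swaps the NOT IN fallback for a COUNT(*) OVER window, which expresses "take rank 2, or rank 1 when it is the only activity" in a single WHERE clause:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE UserActivity (username TEXT, activity TEXT, startDate TEXT, endDate TEXT);
INSERT INTO UserActivity VALUES
 ('Alice','Travel','2020-02-12','2020-02-20'),
 ('Alice','Dancing','2020-02-21','2020-02-23'),
 ('Alice','Travel','2020-02-24','2020-02-28'),
 ('Bob','Travel','2020-02-11','2020-02-18');
""")
second_latest = cur.execute("""
WITH Ranked AS (
    SELECT username, activity,
           ROW_NUMBER() OVER (PARTITION BY username ORDER BY startDate DESC) AS rn,
           COUNT(*)     OVER (PARTITION BY username) AS cnt
    FROM UserActivity
)
-- rn = 2 when a second activity exists, otherwise fall back to the only one
SELECT username, activity FROM Ranked
WHERE rn = 2 OR (rn = 1 AND cnt = 1)
ORDER BY username
""").fetchall()
print(second_latest)  # [('Alice', 'Dancing'), ('Bob', 'Travel')]
```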
• Q.284
Question
Write an SQL query to report the distance travelled by each user. The result should be
ordered by the travelled distance in descending order. If two or more users travelled the same
distance, order them by their name in ascending order.

Explanation
We need to calculate the total distance travelled by each user, which requires summing the
distance from the Rides table grouped by user_id. Additionally, we need to include users
who have no rides, which can be handled by performing a LEFT JOIN between the Users and
Rides tables. The result should be ordered first by the total distance in descending order and
then by name in ascending order in case of ties.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Users (
id INT PRIMARY KEY,
name VARCHAR(100)
);

CREATE TABLE Rides (
id INT PRIMARY KEY,
user_id INT,
distance INT
);

-- Sample data insertions


INSERT INTO Users (id, name)
VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Alex'),
(4, 'Donald'),
(7, 'Lee'),
(13, 'Jonathan'),
(19, 'Elvis');

INSERT INTO Rides (id, user_id, distance)
VALUES
(1, 1, 120),
(2, 2, 317),
(3, 3, 222),
(4, 7, 100),
(5, 13, 312),
(6, 19, 50),
(7, 7, 120),
(8, 19, 400),
(9, 7, 230);


Learnings
• Using LEFT JOIN to include all users, even those without a corresponding record in the
Rides table.
• Grouping and aggregating data using SUM() to calculate the total distance travelled by each
user.
• Sorting results based on multiple columns (ORDER BY).

Solutions
• - PostgreSQL / MySQL solution (using LEFT JOIN and GROUP BY)
SELECT u.name, COALESCE(SUM(r.distance), 0) AS travelled_distance
FROM Users u
LEFT JOIN Rides r ON u.id = r.user_id
GROUP BY u.id
ORDER BY travelled_distance DESC, u.name ASC;

In this solution:
• The LEFT JOIN ensures that all users are included, even those without any rides.
• The SUM(r.distance) calculates the total distance each user has travelled.
• The COALESCE() function is used to return 0 for users who have no corresponding ride
data.
• The result is ordered first by travelled_distance in descending order, then by name in
ascending order for tie-breaking.
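As a quick sanity check, the query above can be run against the chapter's sample data with Python's built-in sqlite3 module. This is only an illustrative sketch: column types are simplified, and SQLite happens to accept the same LEFT JOIN / COALESCE pattern.

```python
import sqlite3

# In-memory SQLite database seeded with the sample data above
# (types simplified; SQLite is used purely for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Rides (id INTEGER PRIMARY KEY, user_id INTEGER, distance INTEGER);
INSERT INTO Users VALUES
  (1,'Alice'),(2,'Bob'),(3,'Alex'),(4,'Donald'),
  (7,'Lee'),(13,'Jonathan'),(19,'Elvis');
INSERT INTO Rides VALUES
  (1,1,120),(2,2,317),(3,3,222),(4,7,100),(5,13,312),
  (6,19,50),(7,7,120),(8,19,400),(9,7,230);
""")

rows = conn.execute("""
    SELECT u.name, COALESCE(SUM(r.distance), 0) AS travelled_distance
    FROM Users u
    LEFT JOIN Rides r ON u.id = r.user_id
    GROUP BY u.id
    ORDER BY travelled_distance DESC, u.name ASC
""").fetchall()

for name, dist in rows:
    print(name, dist)
# Donald appears with 0 because the LEFT JOIN keeps users without rides,
# and Elvis sorts before Lee on the 450 tie because of the name tiebreak.
```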
• Q.285
Question
Write an SQL query to report the current balance of each user after performing transactions,
along with a flag indicating whether the user has breached their credit limit (i.e., if their
balance is less than 0). The result should show user_id, user_name, credit, and
credit_limit_breached ("Yes" or "No").

Explanation
To calculate the current balance of each user, we need to account for both incoming and
outgoing transactions:
• For every transaction, we need to subtract the amount from the credit of the user who
paid (paid_by) and add the amount to the credit of the user who received (paid_to).
• After calculating the new balance, check if the balance is less than 0. If so, mark
credit_limit_breached as "Yes"; otherwise, it should be "No".
• The result should show each user's original credit (as credit) and their updated balance
after transactions (as current_balance).

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Users (
user_id INT PRIMARY KEY,
user_name VARCHAR(100),
credit INT
);


CREATE TABLE Transactions (


trans_id INT PRIMARY KEY,
paid_by INT,
paid_to INT,
amount INT,
transacted_on DATE
);

-- Sample data insertions


INSERT INTO Users (user_id, user_name, credit)
VALUES
(1, 'Moustafa', 100),
(2, 'Jonathan', 200),
(3, 'Winston', 10000),
(4, 'Luis', 800);

INSERT INTO Transactions (trans_id, paid_by, paid_to, amount, transacted_on)


VALUES
(1, 1, 3, 400, '2020-08-01'),
(2, 3, 2, 500, '2020-08-02'),
(3, 2, 1, 200, '2020-08-03');

Learnings
• Using LEFT JOIN to include users who have not participated in any transactions.
• Aggregating the changes to user balances based on transactions.
• Using conditional logic (CASE) to check whether the credit limit is breached.

Solutions
• - PostgreSQL / MySQL solution (calculating the balance and checking the credit breach)
SELECT u.user_id,
       u.user_name,
       u.credit,
       u.credit
         - COALESCE(SUM(CASE WHEN t.paid_by = u.user_id THEN t.amount ELSE 0 END), 0)
         + COALESCE(SUM(CASE WHEN t.paid_to = u.user_id THEN t.amount ELSE 0 END), 0)
         AS current_balance,
       CASE
           WHEN u.credit
                - COALESCE(SUM(CASE WHEN t.paid_by = u.user_id THEN t.amount ELSE 0 END), 0)
                + COALESCE(SUM(CASE WHEN t.paid_to = u.user_id THEN t.amount ELSE 0 END), 0) < 0
           THEN 'Yes'
           ELSE 'No'
       END AS credit_limit_breached
FROM Users u
LEFT JOIN Transactions t ON u.user_id = t.paid_by OR u.user_id = t.paid_to
GROUP BY u.user_id, u.user_name, u.credit;

In this solution:
• We calculate the current balance by adjusting the user's credit based on both outgoing
(paid_by) and incoming (paid_to) transactions using SUM() and CASE statements.
• COALESCE() ensures that users with no transactions are still included (i.e., their balance
remains unchanged).
• The CASE expression checks whether the calculated balance is less than 0 to determine if
the credit limit is breached.
• No explicit ORDER BY is specified; add one (for example, ORDER BY u.user_id) if the output must come back in a deterministic order.
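The balance arithmetic can be verified on the sample data with Python's sqlite3 module. This sketch includes the user's original credit column, as the question requests, and adds an ORDER BY so the output order is deterministic.

```python
import sqlite3

# In-memory SQLite database seeded with the sample data (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (user_id INTEGER PRIMARY KEY, user_name TEXT, credit INTEGER);
CREATE TABLE Transactions (trans_id INTEGER PRIMARY KEY, paid_by INTEGER,
                           paid_to INTEGER, amount INTEGER, transacted_on DATE);
INSERT INTO Users VALUES
  (1,'Moustafa',100),(2,'Jonathan',200),(3,'Winston',10000),(4,'Luis',800);
INSERT INTO Transactions VALUES
  (1,1,3,400,'2020-08-01'),(2,3,2,500,'2020-08-02'),(3,2,1,200,'2020-08-03');
""")

rows = conn.execute("""
    SELECT u.user_id, u.user_name, u.credit,
           u.credit
             - COALESCE(SUM(CASE WHEN t.paid_by = u.user_id THEN t.amount ELSE 0 END), 0)
             + COALESCE(SUM(CASE WHEN t.paid_to = u.user_id THEN t.amount ELSE 0 END), 0)
             AS current_balance,
           CASE WHEN u.credit
                  - COALESCE(SUM(CASE WHEN t.paid_by = u.user_id THEN t.amount ELSE 0 END), 0)
                  + COALESCE(SUM(CASE WHEN t.paid_to = u.user_id THEN t.amount ELSE 0 END), 0) < 0
                THEN 'Yes' ELSE 'No' END AS credit_limit_breached
    FROM Users u
    LEFT JOIN Transactions t ON u.user_id = t.paid_by OR u.user_id = t.paid_to
    GROUP BY u.user_id, u.user_name, u.credit
    ORDER BY u.user_id
""").fetchall()

for row in rows:
    print(row)
# Moustafa ends at -100 (paid 400, received 200), breaching the limit.
```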
• Q.286
Question


Write an SQL query to find the most frequently ordered product(s) for each customer. Return
the product_id and product_name for each customer_id who ordered at least one product.
If there are multiple most frequently ordered products for a customer, return all of them. The
result table should be ordered by customer_id.

Explanation
To solve this problem:
• We need to join the Orders table with the Products table to get product details (such as
product_name) for each order.
• Then, we need to count how many times each customer ordered each product using GROUP
BY on customer_id and product_id.
• For each customer, we need to find the most frequently ordered product(s). If there are
multiple products with the same frequency, we need to include all of them.
• The result should be ordered by customer_id to meet the requirement.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100)
);

CREATE TABLE Orders (


order_id INT PRIMARY KEY,
order_date DATE,
customer_id INT,
product_id INT
);

CREATE TABLE Products (


product_id INT PRIMARY KEY,
product_name VARCHAR(100),
price INT
);

-- Sample data insertions


INSERT INTO Customers (customer_id, name)
VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Tom'),
(4, 'Jerry'),
(5, 'John');

INSERT INTO Orders (order_id, order_date, customer_id, product_id)


VALUES
(1, '2020-07-31', 1, 1),
(2, '2020-07-30', 2, 2),
(3, '2020-08-29', 3, 3),
(4, '2020-07-29', 4, 1),
(5, '2020-06-10', 1, 2),
(6, '2020-08-01', 2, 1),
(7, '2020-08-01', 3, 3),
(8, '2020-08-03', 1, 2),
(9, '2020-08-07', 2, 3),
(10, '2020-07-15', 1, 2);

INSERT INTO Products (product_id, product_name, price)
VALUES
(1, 'keyboard', 120),
(2, 'mouse', 80),
(3, 'screen', 600),
(4, 'hard disk', 450);

Learnings
• Using JOIN to combine related information from multiple tables (Orders and Products).
• Using GROUP BY to aggregate orders by customer and product.
• Using HAVING or window functions (ROW_NUMBER(), RANK()) to filter for the most frequent
items per customer.
• Handling cases where there are multiple products with the same frequency.

Solutions
• - PostgreSQL / MySQL solution (finding the most frequently ordered products for each
customer)
WITH ProductFrequency AS (
SELECT o.customer_id,
o.product_id,
p.product_name,
COUNT(*) AS frequency
FROM Orders o
JOIN Products p ON o.product_id = p.product_id
GROUP BY o.customer_id, o.product_id, p.product_name
),
MaxFrequency AS (
SELECT customer_id,
MAX(frequency) AS max_frequency
FROM ProductFrequency
GROUP BY customer_id
)
SELECT pf.customer_id,
pf.product_id,
pf.product_name
FROM ProductFrequency pf
JOIN MaxFrequency mf ON pf.customer_id = mf.customer_id
WHERE pf.frequency = mf.max_frequency
ORDER BY pf.customer_id;

Explanation of the solution:


• ProductFrequency: This CTE counts how many times each product was ordered by each
customer. It joins the Orders and Products tables to get the product_name.
• MaxFrequency: This CTE identifies the maximum frequency of product orders for each
customer.
• We then join ProductFrequency with MaxFrequency to get the product(s) for each
customer that were ordered the most times.
• The WHERE clause ensures we only get the products with the maximum frequency for each
customer.
• The result is ordered by customer_id to meet the required output format.

This solution should efficiently solve the problem of identifying the most frequently ordered
products for each customer, considering cases where multiple products might be equally
frequent for a customer.
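The two-CTE approach can be exercised on the sample data with Python's sqlite3 module. In this sketch the Customers table is omitted (the query never references it), and a secondary sort key on product_id is added so customer 2's three-way tie comes back in a deterministic order.

```python
import sqlite3

# In-memory SQLite database seeded with the sample Orders/Products data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, order_date DATE,
                     customer_id INTEGER, product_id INTEGER);
CREATE TABLE Products (product_id INTEGER PRIMARY KEY, product_name TEXT,
                       price INTEGER);
INSERT INTO Orders VALUES
  (1,'2020-07-31',1,1),(2,'2020-07-30',2,2),(3,'2020-08-29',3,3),
  (4,'2020-07-29',4,1),(5,'2020-06-10',1,2),(6,'2020-08-01',2,1),
  (7,'2020-08-01',3,3),(8,'2020-08-03',1,2),(9,'2020-08-07',2,3),
  (10,'2020-07-15',1,2);
INSERT INTO Products VALUES
  (1,'keyboard',120),(2,'mouse',80),(3,'screen',600),(4,'hard disk',450);
""")

rows = conn.execute("""
    WITH ProductFrequency AS (
        SELECT o.customer_id, o.product_id, p.product_name, COUNT(*) AS frequency
        FROM Orders o
        JOIN Products p ON o.product_id = p.product_id
        GROUP BY o.customer_id, o.product_id, p.product_name
    ),
    MaxFrequency AS (
        SELECT customer_id, MAX(frequency) AS max_frequency
        FROM ProductFrequency
        GROUP BY customer_id
    )
    SELECT pf.customer_id, pf.product_id, pf.product_name
    FROM ProductFrequency pf
    JOIN MaxFrequency mf ON pf.customer_id = mf.customer_id
    WHERE pf.frequency = mf.max_frequency
    ORDER BY pf.customer_id, pf.product_id
""").fetchall()

for row in rows:
    print(row)
# Customer 2 ordered each of the three products exactly once, so all three tie.
```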
• Q.287
Question


Write an SQL query to report the number of bank accounts in each salary category. The
salary categories are:
• "Low Salary": All salaries strictly less than $20,000.
• "Average Salary": All salaries in the inclusive range [$20,000, $50,000].
• "High Salary": All salaries strictly greater than $50,000.
The result table must contain all three categories. If there are no accounts in a category, report
0. Return the result table in any order.

Explanation
To solve this problem, we can categorize the income values into three groups based on the
given salary ranges. We will use CASE expressions to count the number of accounts falling
into each category. If there are no accounts for a given category, the result should be 0 for
that category.
Steps:
• Use CASE expressions to categorize the incomes into the three groups: "Low Salary",
"Average Salary", and "High Salary".
• Use COUNT to count the number of accounts in each category.
• Make sure to include all categories, even if there are no accounts in a particular category
(using UNION ALL to ensure we always get a row for each category).

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Accounts (
account_id INT PRIMARY KEY,
income INT
);

-- Sample data insertions


INSERT INTO Accounts (account_id, income)
VALUES
(3, 108939),
(2, 12747),
(8, 87709),
(6, 91796);

Learnings
• Using CASE to classify data into different categories.
• Using COUNT to count the number of records for each category.
• Handling cases with no records in a category using UNION ALL.

Solutions
• - PostgreSQL / MySQL solution (counting the number of accounts in each salary category)
SELECT 'Low Salary' AS category,
       COUNT(*) AS accounts_count
FROM Accounts
WHERE income < 20000
UNION ALL
SELECT 'Average Salary',
       COUNT(*)
FROM Accounts
WHERE income BETWEEN 20000 AND 50000
UNION ALL
SELECT 'High Salary',
       COUNT(*)
FROM Accounts
WHERE income > 50000;

Explanation of the solution:


• Low Salary: We use the condition income < 20000 to count accounts with low salaries.
• Average Salary: We use the condition income BETWEEN 20000 AND 50000 to count
accounts with average salaries.
• High Salary: We use the condition income > 50000 to count accounts with high salaries.
• Because each category has its own SELECT, COUNT(*) over an empty set still returns a row with 0, so every category appears in the result even when it has no accounts; UNION ALL simply stacks the three rows into one result set.
• Each category is explicitly named using the SELECT statement, ensuring we return the
correct labels ("Low Salary", "Average Salary", and "High Salary").

This solution guarantees that all salary categories will appear in the result, even if some
categories have no accounts.
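The three-branch UNION ALL can be checked against the sample data with Python's sqlite3 module. In this sketch the "Average Salary" branch returns 0 because no income falls in [20000, 50000].

```python
import sqlite3

# In-memory SQLite database seeded with the sample Accounts data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounts (account_id INTEGER PRIMARY KEY, income INTEGER);
INSERT INTO Accounts VALUES (3,108939),(2,12747),(8,87709),(6,91796);
""")

rows = conn.execute("""
    SELECT 'Low Salary' AS category, COUNT(*) AS accounts_count
    FROM Accounts WHERE income < 20000
    UNION ALL
    SELECT 'Average Salary', COUNT(*)
    FROM Accounts WHERE income BETWEEN 20000 AND 50000
    UNION ALL
    SELECT 'High Salary', COUNT(*)
    FROM Accounts WHERE income > 50000
""").fetchall()

print(rows)
# Every category produces a row, even the empty one.
```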
• Q.288

Question
Write an SQL query to find the name of the product with the highest price in each country.
Explanation
You need to find the product with the highest price for each country. This involves joining
the two tables (Product and Supplier) on Supplier_id, then grouping by the Country
column. After that, you should use an aggregation function or window function to select the
highest priced product for each country.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE suppliers (
supplier_id INT PRIMARY KEY,
supplier_name VARCHAR(25),
country VARCHAR(25)
);

-- Sample data insertions for suppliers


INSERT INTO suppliers (supplier_id, supplier_name, country)
VALUES
(501, 'alan', 'India'),
(502, 'rex', 'US'),
(503, 'dodo', 'India'),
(504, 'rahul', 'US'),
(505, 'zara', 'Canada'),
(506, 'max', 'Canada');

CREATE TABLE products (


product_id INT PRIMARY KEY,
product_name VARCHAR(25),
supplier_id INT,
price FLOAT,
FOREIGN KEY (supplier_id) REFERENCES suppliers(supplier_id)


);

-- Sample data insertions for products


INSERT INTO products (product_id, product_name, supplier_id, price)
VALUES
(201, 'iPhone 14', 501, 1299),
(202, 'iPhone 8', 502, 999),
(204, 'iPhone 13', 502, 1199),
(203, 'iPhone 11', 503, 1199),
(205, 'iPhone 12', 502, 1199),
(206, 'iPhone 14', 501, 1399),
(214, 'iPhone 15', 503, 1499),
(207, 'iPhone 15', 505, 1499),
(208, 'iPhone 15', 504, 1499),
(209, 'iPhone 12', 502, 1299),
(210, 'iPhone 13', 502, 1199),
(211, 'iPhone 11', 501, 1099),
(212, 'iPhone 14', 503, 1399),
(213, 'iPhone 8', 502, 1099),
(222, 'Samsung Galaxy S21', 504, 1699),
(223, 'Samsung Galaxy S20', 505, 1899),
(224, 'Google Pixel 6', 501, 899),
(225, 'Google Pixel 5', 502, 799),
(226, 'OnePlus 9 Pro', 503, 1699),
(227, 'OnePlus 9', 502, 1999),
(228, 'Xiaomi Mi 11', 501, 899),
(229, 'Xiaomi Mi 10', 504, 699),
(230, 'Huawei P40 Pro', 505, 1099),
(231, 'Huawei P30', 502, 1299),
(232, 'Sony Xperia 1 III', 503, 1199),
(233, 'Sony Xperia 5 III', 501, 999),
(234, 'LG Velvet', 505, 1899),
(235, 'LG G8 ThinQ', 504, 799),
(236, 'Motorola Edge Plus', 502, 1099),
(237, 'Motorola One 5G', 501, 799),
(238, 'ASUS ROG Phone 5', 503, 1999),
(239, 'ASUS ZenFone 8', 504, 999),
(240, 'Nokia 8.3 5G', 502, 899),
(241, 'Nokia 7.2', 501, 699),
(242, 'BlackBerry Key2', 504, 1899),
(243, 'BlackBerry Motion', 502, 799),
(244, 'HTC U12 Plus', 501, 899),
(245, 'HTC Desire 20 Pro', 505, 699),
(246, 'Lenovo Legion Phone Duel', 503, 1499),
(247, 'Lenovo K12 Note', 504, 1499),
(248, 'ZTE Axon 30 Ultra', 501, 1299),
(249, 'ZTE Blade 20', 502, 1599),
(250, 'Oppo Find X3 Pro', 503, 1999);

Learnings
• How to join tables using a foreign key (supplier_id).
• Use of aggregation (MAX) and grouping by Country to find the highest priced product for
each country.
• Importance of dealing with subqueries or window functions to isolate the highest value.
Solutions
• - PostgreSQL and MySQL solution
WITH RankedProducts AS (
    SELECT
        s.country,
        p.product_name,
        p.price,
        -- "rank" is a reserved word in MySQL 8, so the alias is rnk
        RANK() OVER (PARTITION BY s.country ORDER BY p.price DESC) AS rnk
    FROM products p
    JOIN suppliers s ON p.supplier_id = s.supplier_id
)
SELECT country, product_name
FROM RankedProducts
WHERE rnk = 1
ORDER BY country;

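The ranking pattern can be exercised on a trimmed subset of the sample data with Python's sqlite3 module (window functions need SQLite 3.25+). The alias rnk avoids MySQL 8's reserved word rank, and a secondary sort on product_name makes the Canadian price tie deterministic.

```python
import sqlite3

# In-memory SQLite database with a trimmed subset of the chapter's data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT,
                        country TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT,
                       supplier_id INTEGER, price REAL);
INSERT INTO suppliers VALUES
  (501,'alan','India'),(502,'rex','US'),(505,'zara','Canada');
INSERT INTO products VALUES
  (201,'iPhone 14',501,1299),(206,'iPhone 14',501,1399),
  (202,'iPhone 8',502,999),(227,'OnePlus 9',502,1999),
  (223,'Samsung Galaxy S20',505,1899),(234,'LG Velvet',505,1899);
""")

rows = conn.execute("""
    WITH RankedProducts AS (
        SELECT s.country, p.product_name,
               RANK() OVER (PARTITION BY s.country ORDER BY p.price DESC) AS rnk
        FROM products p
        JOIN suppliers s ON p.supplier_id = s.supplier_id
    )
    SELECT country, product_name
    FROM RankedProducts
    WHERE rnk = 1
    ORDER BY country, product_name
""").fetchall()

print(rows)
# Canada returns two rows: both products tie at 1899, and RANK() gives
# them the same rank, so both survive the rnk = 1 filter.
```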
• Q.289

Question
Write an SQL query to calculate the total transaction amount for each customer for the
current year. The output should contain Customer_Name and the total amount.
Explanation
You need to calculate the total transaction amount for each customer for the current year. To
do this:
• Join the Customer and Transaction tables on Customer_id.
• Filter the transactions to only include those from the current year by using the
EXTRACT(YEAR FROM date) function.
• Group the results by Customer_Name and aggregate the total amount for each customer
using SUM().
Datasets and SQL Schemas
• - Table creation and sample data
-- Create Customer table
CREATE TABLE Customers (
Customer_id INT PRIMARY KEY,
Customer_Name VARCHAR(100),
Registration_Date DATE
);

-- Create Transaction table


CREATE TABLE Transaction (
Transaction_id INT PRIMARY KEY,
Customer_id INT,
Transaction_Date DATE,
Amount DECIMAL(10, 2),
FOREIGN KEY (Customer_id) REFERENCES Customers(Customer_id)
);

-- Insert records into Customer table


INSERT INTO Customers (Customer_id, Customer_Name, Registration_Date)
VALUES
(1, 'John Doe', '2023-01-15'),
(2, 'Jane Smith', '2023-02-20'),
(3, 'Michael Johnson', '2023-03-10');

-- Insert records into Transaction table


INSERT INTO Transaction (Transaction_id, Customer_id, Transaction_Date, Amount)
VALUES
(201, 1, '2024-01-20', 50.00),
(202, 1, '2024-02-05', 75.50),
(203, 2, '2023-02-22', 100.00),
(204, 3, '2022-03-15', 200.00),
(205, 2, '2024-03-20', 120.75),
(301, 1, '2024-01-20', 50.00),
(302, 1, '2024-02-05', 75.50),
(403, 2, '2023-02-22', 100.00),
(304, 3, '2022-03-15', 200.00),
(505, 2, '2024-03-20', 120.75);

Learnings


• How to filter data based on the current year using EXTRACT(YEAR FROM date) or
equivalent functions.
• Using JOIN to combine information from two related tables (Customer and Transaction).
• Aggregating data using SUM() and grouping by customer.
Solutions
• - PostgreSQL solution (using EXTRACT to filter the current year)
SELECT
c.customer_name,
SUM(t.amount) AS total_amt
FROM customers AS c
JOIN transaction AS t ON c.customer_id = t.customer_id
WHERE EXTRACT(YEAR FROM t.transaction_date) = EXTRACT(YEAR FROM CURRENT_DATE)
GROUP BY c.customer_name;
• - MySQL solution (using YEAR() to extract the year)
SELECT
c.customer_name,
SUM(t.amount) AS total_amt
FROM customers AS c
JOIN transaction AS t ON c.customer_id = t.customer_id
WHERE YEAR(t.transaction_date) = YEAR(CURRENT_DATE)
GROUP BY c.customer_name;

Explanation of the Solution:


• PostgreSQL uses EXTRACT(YEAR FROM ...) to extract the year part from the
transaction_date and compare it with the current year using CURRENT_DATE.
• MySQL uses the YEAR() function to achieve the same effect, which directly extracts the
year from the transaction_date.
• The result is grouped by Customer_Name and the transaction amounts are aggregated using
SUM() to give the total transaction amount per customer for the current year.
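The yearly filter can be demonstrated with Python's sqlite3 module. Two adjustments are made in this sketch: SQLite has neither EXTRACT nor YEAR(), so strftime('%Y', ...) stands in, and the year is pinned to '2024' (where the chapter's query compares against CURRENT_DATE) so the output is deterministic. The table is also renamed to Trans here because TRANSACTION is an SQL keyword in SQLite.

```python
import sqlite3

# In-memory SQLite database seeded with the sample data; see the lead-in
# for the SQLite-specific substitutions (strftime, fixed year, table name).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT,
                        registration_date DATE);
CREATE TABLE Trans (transaction_id INTEGER PRIMARY KEY, customer_id INTEGER,
                    transaction_date DATE, amount REAL);
INSERT INTO Customers VALUES
  (1,'John Doe','2023-01-15'),(2,'Jane Smith','2023-02-20'),
  (3,'Michael Johnson','2023-03-10');
INSERT INTO Trans VALUES
  (201,1,'2024-01-20',50.00),(202,1,'2024-02-05',75.50),
  (203,2,'2023-02-22',100.00),(204,3,'2022-03-15',200.00),
  (205,2,'2024-03-20',120.75),(301,1,'2024-01-20',50.00),
  (302,1,'2024-02-05',75.50),(403,2,'2023-02-22',100.00),
  (304,3,'2022-03-15',200.00),(505,2,'2024-03-20',120.75);
""")

rows = conn.execute("""
    SELECT c.customer_name, SUM(t.amount) AS total_amt
    FROM Customers c
    JOIN Trans t ON c.customer_id = t.customer_id
    WHERE strftime('%Y', t.transaction_date) = '2024'
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()

print(rows)
# Michael Johnson drops out: his only transactions are from 2022.
```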
• Q.290

Write a SQL query to get the average review ratings for every product every month. The
output should include the month in numerical value, product id, and average star rating
rounded to two decimal places. Sort the output based on the month followed by the product
id.
Explanation
The goal is to calculate the average star rating for each product on a monthly basis. This
involves:
• Extracting the month and year from the submit_date in the reviews table.
• Grouping the results by product and month.
• Calculating the average rating (stars) for each group.
• Sorting the output by month and product id.
• Rounding the average rating to two decimal places.
Datasets and SQL Schemas
• - Table creation and sample data
-- Create reviews table
CREATE TABLE reviews (
review_id INT PRIMARY KEY,
user_id INT,
submit_date TIMESTAMP,


product_id INT,
stars INT
);

-- Sample data insertions for reviews


INSERT INTO reviews (review_id, user_id, submit_date, product_id, stars)
VALUES
(6171, 123, '2022-06-08 00:00:00', 50001, 4),
(7802, 265, '2022-06-10 00:00:00', 69852, 4),
(5293, 362, '2022-06-18 00:00:00', 50001, 3),
(6352, 192, '2022-07-26 00:00:00', 69852, 3),
(4517, 981, '2022-07-05 00:00:00', 69852, 2);

Learnings
• Extracting the month and year from a TIMESTAMP or DATE column.
• Using GROUP BY to aggregate data by month and product.
• Applying AVG() to calculate the average star rating.
• Rounding the result to two decimal places using ROUND().
• Sorting the output by month and product ID.
Solutions
• - PostgreSQL and MySQL solution
SELECT
EXTRACT(MONTH FROM submit_date) AS mth,
product_id AS product,
ROUND(AVG(stars), 2) AS avg_stars
FROM reviews
GROUP BY mth, product
ORDER BY mth, product;

Explanation of the Solution:


• EXTRACT(MONTH FROM submit_date): This extracts the month part of the submit_date.
This is a PostgreSQL function, but it also works similarly in MySQL.
• AVG(stars): This function calculates the average star rating for each group (each product
per month).
• ROUND(AVG(stars), 2): The ROUND() function rounds the average star rating to two
decimal places.
• GROUP BY mth, product: We group the results by the extracted month (mth) and the
product_id.
• ORDER BY mth, product: Finally, the output is sorted by the month and then by the
product ID in ascending order.
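The monthly-average logic can be checked with Python's sqlite3 module. SQLite lacks EXTRACT, so CAST(strftime('%m', ...) AS INTEGER) stands in for EXTRACT(MONTH FROM ...) in this sketch.

```python
import sqlite3

# In-memory SQLite database seeded with the sample reviews.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reviews (review_id INTEGER PRIMARY KEY, user_id INTEGER,
                      submit_date TIMESTAMP, product_id INTEGER, stars INTEGER);
INSERT INTO reviews VALUES
  (6171,123,'2022-06-08 00:00:00',50001,4),
  (7802,265,'2022-06-10 00:00:00',69852,4),
  (5293,362,'2022-06-18 00:00:00',50001,3),
  (6352,192,'2022-07-26 00:00:00',69852,3),
  (4517,981,'2022-07-05 00:00:00',69852,2);
""")

rows = conn.execute("""
    SELECT CAST(strftime('%m', submit_date) AS INTEGER) AS mth,
           product_id AS product,
           ROUND(AVG(stars), 2) AS avg_stars
    FROM reviews
    GROUP BY mth, product
    ORDER BY mth, product
""").fetchall()

print(rows)
# Product 50001 averages (4 + 3) / 2 = 3.5 in June.
```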
• Q.291

Write a SQL query to find the highest-grossing items. Identify the top two highest-grossing
products within each category in 2022. Output the category, product, and total spend.
Explanation
The task is to find the top two highest-grossing products within each category for the year
2022. To achieve this:
• Filter the transactions that occurred in 2022.
• Aggregate the total spend for each product within each category using SUM().
• Use a window function like RANK() or ROW_NUMBER() to rank products within each
category by total spend.


• Select only the top two highest-grossing products for each category.
• Sort the results by category and the total spend in descending order.
Datasets and SQL Schemas
• - Table creation and sample data
-- Create product_spend table
CREATE TABLE product_spend (
category VARCHAR(50),
product VARCHAR(50),
user_id INT,
spend DECIMAL(10, 2),
transaction_date TIMESTAMP
);

-- Sample data insertions for product_spend


INSERT INTO product_spend (category, product, user_id, spend, transaction_date)
VALUES
('appliance', 'refrigerator', 165, 246.00, '2021-12-26 12:00:00'),
('appliance', 'refrigerator', 123, 299.99, '2022-03-02 12:00:00'),
('appliance', 'washing machine', 123, 219.80, '2022-03-02 12:00:00'),
('electronics', 'vacuum', 178, 152.00, '2022-04-05 12:00:00'),
('electronics', 'wireless headset', 156, 249.90, '2022-07-08 12:00:00'),
('electronics', 'vacuum', 145, 189.00, '2022-07-15 12:00:00');

Learnings
• How to filter data by a specific year using EXTRACT(YEAR FROM ...) or equivalent
functions.
• Aggregating data by category and product using SUM().
• Using RANK() or ROW_NUMBER() to rank the products based on total spend within each
category.
• Sorting the result by total spend and selecting the top two products per category.
Solutions
• - PostgreSQL and MySQL solution (using RANK() to rank products)
WITH RankedProducts AS (
    SELECT
        category,
        product,
        SUM(spend) AS total_spend,
        -- "rank" is a reserved word in MySQL 8, so the alias is rnk
        RANK() OVER (PARTITION BY category ORDER BY SUM(spend) DESC) AS rnk
    FROM product_spend
    WHERE EXTRACT(YEAR FROM transaction_date) = 2022
    GROUP BY category, product
)
SELECT category, product, total_spend
FROM RankedProducts
WHERE rnk <= 2
ORDER BY category, total_spend DESC;

Explanation of the Solution:


• Filtering for the year 2022: EXTRACT(YEAR FROM transaction_date) = 2022 filters
the records to only include those from 2022.
• Aggregating by product: The SUM(spend) aggregates the total spending for each product.
• Ranking the products: RANK() is used with PARTITION BY category to rank products
within each category by total spend, with the highest spend getting the top rank.
• Limiting to the top two: filtering on the computed rank value (keeping ranks of 2 or less) ensures that only the top two highest-grossing products within each category are selected.


• Sorting the result: ORDER BY category, total_spend DESC sorts the output first by
category and then by total spend in descending order.
This query will return the top two products per category with their total spend in 2022, sorted
as required.
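The top-two-per-category ranking can be verified with Python's sqlite3 module (window functions need SQLite 3.25+). In this sketch the aggregation is split into its own CTE before ranking, which avoids nesting SUM() inside the window definition and is accepted more uniformly across engines; the year filter uses strftime since SQLite has no EXTRACT.

```python
import sqlite3

# In-memory SQLite database seeded with the sample product_spend data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_spend (category TEXT, product TEXT, user_id INTEGER,
                            spend REAL, transaction_date TIMESTAMP);
INSERT INTO product_spend VALUES
  ('appliance','refrigerator',165,246.00,'2021-12-26 12:00:00'),
  ('appliance','refrigerator',123,299.99,'2022-03-02 12:00:00'),
  ('appliance','washing machine',123,219.80,'2022-03-02 12:00:00'),
  ('electronics','vacuum',178,152.00,'2022-04-05 12:00:00'),
  ('electronics','wireless headset',156,249.90,'2022-07-08 12:00:00'),
  ('electronics','vacuum',145,189.00,'2022-07-15 12:00:00');
""")

rows = conn.execute("""
    WITH Totals AS (
        SELECT category, product, SUM(spend) AS total_spend
        FROM product_spend
        WHERE strftime('%Y', transaction_date) = '2022'
        GROUP BY category, product
    ),
    RankedProducts AS (
        SELECT category, product, total_spend,
               RANK() OVER (PARTITION BY category ORDER BY total_spend DESC) AS rnk
        FROM Totals
    )
    SELECT category, product, total_spend
    FROM RankedProducts
    WHERE rnk <= 2
    ORDER BY category, total_spend DESC
""").fetchall()

print(rows)
# The 2021 refrigerator purchase (246.00) is excluded by the year filter.
```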
• Q.292

Question
Write a SQL query to identify high-spending customers who have made purchases exceeding
$100. You have two tables called customers (containing customer_id and name) and
orders (containing order_id, customer_id, and order_amount). The task is to join these
tables and calculate the total purchase amount for each customer, selecting customers whose
total purchase amount exceeds $100.
Explanation
The task requires identifying customers who have made purchases exceeding $100 by:
• Joining the customers and orders tables using the customer_id.
• Aggregating the total purchase amount for each customer using SUM().
• Filtering the result to include only customers whose total purchase amount exceeds $100
using the HAVING clause.
• Grouping the result by customer_id and name.
Datasets and SQL Schemas
• - Table creation and sample data
-- Create customers table
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100)
);

-- Create orders table


CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_amount DECIMAL(10, 2),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Insert records into customers table


INSERT INTO customers (customer_id, name)
VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie'),
(4, 'David');

-- Insert records into orders table


INSERT INTO orders (order_id, customer_id, order_amount)
VALUES
(1, 1, 50.00),
(2, 2, 75.00),
(3, 3, 100.00),
(4, 1, 120.00),
(5, 2, 80.00),
(6, 4, 150.00);

Learnings
• How to join two tables using a common key (customer_id).


• How to use SUM() to calculate the total purchase amount for each customer.
• How to filter results based on aggregated values using the HAVING clause.
• The importance of grouping by relevant columns (customer_id and name).
Solutions
• - PostgreSQL and MySQL solution
SELECT c.customer_id, c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
HAVING SUM(o.order_amount) > 100;

Explanation of the Solution:


• Join: The JOIN clause combines data from the customers and orders tables based on
customer_id.
• Aggregation: The SUM(o.order_amount) calculates the total amount spent by each
customer.
• Group By: The GROUP BY clause groups the result by customer_id and name to aggregate
the total spend for each customer.
• Filtering: The HAVING clause filters the grouped results, keeping only those customers
whose total purchase amount exceeds $100.
The result will show customers who have spent more than $100 in total across all their
orders.
Example Output:
customer_id | name
------------|------
1           | Alice
2           | Bob
4           | David

This query correctly identifies high-spending customers based on the total amount spent in
the orders table.
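The HAVING filter can be checked on the sample data with Python's sqlite3 module; an ORDER BY on customer_id is added in this sketch to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite database seeded with the sample customers/orders data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_amount REAL);
INSERT INTO customers VALUES (1,'Alice'),(2,'Bob'),(3,'Charlie'),(4,'David');
INSERT INTO orders VALUES
  (1,1,50.00),(2,2,75.00),(3,3,100.00),(4,1,120.00),(5,2,80.00),(6,4,150.00);
""")

rows = conn.execute("""
    SELECT c.customer_id, c.name
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.name
    HAVING SUM(o.order_amount) > 100
    ORDER BY c.customer_id
""").fetchall()

print(rows)
# Charlie's total is exactly 100, which fails the strict > 100 filter;
# Bob's total of 155 (75 + 80) passes.
```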
• Q.293

Write a SQL query to generate a histogram showing the count of comments made by each
user. You are given two tables, users and comments. The users table contains information
about users, and the comments table contains comments made by users. The task is to
calculate the number of comments each user has made and sort the results in descending
order of comment count to identify the most active users.

Explanation
To calculate the number of comments made by each user and generate a histogram of their
activity:
• Join the users table and the comments table on the user_id.
• Count the number of comments for each user using the COUNT() function.
• Group the results by user_id and name to aggregate the comment counts.
• Sort the results in descending order by the comment count, showing the most active users
first.


Datasets and SQL Schemas


• - Table creation and sample data
-- Create users table
CREATE TABLE users (
user_id INT PRIMARY KEY,
name VARCHAR(100)
);

-- Create comments table


CREATE TABLE comments (
comment_id INT PRIMARY KEY,
user_id INT,
comment_text TEXT,
FOREIGN KEY (user_id) REFERENCES users(user_id)
);

-- Insert records into users table


INSERT INTO users (user_id, name)
VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie'),
(4, 'David');

-- Insert records into comments table


INSERT INTO comments (comment_id, user_id, comment_text)
VALUES
(1, 1, 'Great product!'),
(2, 2, 'I really like this.'),
(3, 1, 'Could be better.'),
(4, 3, 'Not satisfied.'),
(5, 1, 'Excellent service.'),
(6, 2, 'Could use some improvements.'),
(7, 4, 'Fast delivery.'),
(8, 3, 'Highly recommend.'),
(9, 1, 'Will buy again.'),
(10, 2, 'Good value for money.');

Learnings
• How to perform a LEFT JOIN to include all users, even those who haven't made any
comments.
• Using the COUNT() function to count the number of comments for each user.
• Grouping by multiple columns (user_id, name) to aggregate data.
• Sorting the results using ORDER BY to display users with the highest comment count first.

Solution
• - PostgreSQL and MySQL solution
SELECT u.user_id, u.name, COUNT(c.comment_id) AS comment_count
FROM users u
LEFT JOIN comments c ON u.user_id = c.user_id
GROUP BY u.user_id, u.name
ORDER BY comment_count DESC;

Explanation of the Solution:


• LEFT JOIN: The LEFT JOIN ensures that we include all users from the users table, even if
they haven't made any comments. For those users, the comment_id will be NULL, but the
COUNT() function will return 0 for them.
• COUNT(c.comment_id): The COUNT() function counts the number of comments for each user. It ignores NULL values, so users with no comments (whose comment_id is NULL after the LEFT JOIN) get a count of 0.


• GROUP BY u.user_id, u.name: The GROUP BY clause aggregates the results by user, so
that we can calculate the total comment count for each user.
• ORDER BY comment_count DESC: The ORDER BY clause sorts the result by the
comment_count in descending order to show the most active users first.

Expected Output:
user_id | name | comment_count
--------|---------|---------------
1       | Alice   | 4
2       | Bob     | 3
3 | Charlie | 2
4 | David | 1

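Running the histogram query against the sample data with Python's sqlite3 module confirms the per-user counts (Alice has comments 1, 3, 5, and 9, so her count is 4).

```python
import sqlite3

# In-memory SQLite database seeded with the sample users/comments data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, user_id INTEGER,
                       comment_text TEXT);
INSERT INTO users VALUES (1,'Alice'),(2,'Bob'),(3,'Charlie'),(4,'David');
INSERT INTO comments VALUES
  (1,1,'Great product!'),(2,2,'I really like this.'),(3,1,'Could be better.'),
  (4,3,'Not satisfied.'),(5,1,'Excellent service.'),
  (6,2,'Could use some improvements.'),(7,4,'Fast delivery.'),
  (8,3,'Highly recommend.'),(9,1,'Will buy again.'),
  (10,2,'Good value for money.');
""")

rows = conn.execute("""
    SELECT u.user_id, u.name, COUNT(c.comment_id) AS comment_count
    FROM users u
    LEFT JOIN comments c ON u.user_id = c.user_id
    GROUP BY u.user_id, u.name
    ORDER BY comment_count DESC
""").fetchall()

print(rows)
# Most active user first; the LEFT JOIN would also keep users with 0 comments.
```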
• Q.294
Write a SQL query to determine the daily aggregate count of new users and the cumulative
count of users over time. You are given a users table with the columns user_id and
registration_date. The task is to generate a report showing:

• The daily count of new users (new_users).
• The cumulative count of users (cumulative_count) up to each registration date.
The results should be ordered by registration_date.

Explanation
To generate the required report:
• Group by registration_date to count how many users registered on each specific day.
• Use COUNT(user_id) to count the number of new users each day.
• Use a window function (SUM() with OVER clause) to calculate the cumulative count of
users by summing up the number of new users up to the current date.
• Order by registration_date to ensure the result is sorted by the date of registration.

Datasets and SQL Schemas


• - Table creation and sample data
-- Create the 'users' table
CREATE TABLE users (
user_id INT PRIMARY KEY,
registration_date DATE
);

-- Insert sample data into 'users' table


INSERT INTO users (user_id, registration_date)
VALUES
(1, '2024-01-01'),
(2, '2024-01-01'),
(3, '2024-01-02'),
(4, '2024-01-02'),
(5, '2024-01-03'),
(6, '2024-01-03'),
(7, '2024-01-03'),
(8, '2024-01-04'),
(9, '2024-01-04'),
(10, '2024-01-04');

Learnings
• How to use COUNT() for counting new users for each day.


• Using window functions like SUM() OVER() to calculate cumulative counts.


• Grouping results by a specific date and sorting by date to track the progression of new user
registrations.

Solution
• - PostgreSQL and MySQL solution
SELECT
registration_date,
COUNT(user_id) AS new_users,
SUM(COUNT(user_id)) OVER (ORDER BY registration_date) AS cumulative_count
FROM
users
GROUP BY
registration_date
ORDER BY
registration_date;

Explanation of the Solution:


• COUNT(user_id): This counts the number of new users who registered on each day.
• SUM(COUNT(user_id)) OVER (ORDER BY registration_date): This is a window
function that calculates the cumulative sum of new users up to each date. It adds up the daily
new users in an ordered manner (based on registration_date).
• GROUP BY registration_date: This groups the data by registration_date, so we get
the count of users for each specific date.
• ORDER BY registration_date: The result is ordered by registration_date, which is
necessary to calculate the cumulative count in the correct sequence.

Expected Output:
registration_date | new_users | cumulative_count
------------------|-----------|-----------------
2024-01-01 | 2 | 2
2024-01-02 | 2 | 4
2024-01-03 | 3 | 7
2024-01-04 | 3 | 10

Summary:
This query helps track user registration trends by counting the new users per day and
calculating the cumulative number of users over time. The use of window functions (SUM()
OVER()) makes it easy to compute the cumulative count dynamically as the query processes
each date in the dataset.
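The cumulative-count report can be reproduced with Python's sqlite3 module (window functions need SQLite 3.25+). In this sketch the daily COUNT() is computed in a subquery and the running SUM() OVER () applied on top, which avoids nesting an aggregate inside the window function and is accepted more uniformly across engines.

```python
import sqlite3

# In-memory SQLite database seeded with the sample registrations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, registration_date DATE);
INSERT INTO users VALUES
  (1,'2024-01-01'),(2,'2024-01-01'),(3,'2024-01-02'),(4,'2024-01-02'),
  (5,'2024-01-03'),(6,'2024-01-03'),(7,'2024-01-03'),
  (8,'2024-01-04'),(9,'2024-01-04'),(10,'2024-01-04');
""")

rows = conn.execute("""
    SELECT registration_date, new_users,
           SUM(new_users) OVER (ORDER BY registration_date) AS cumulative_count
    FROM (
        SELECT registration_date, COUNT(user_id) AS new_users
        FROM users
        GROUP BY registration_date
    )
    ORDER BY registration_date
""").fetchall()

print(rows)
# The running total grows 2 -> 4 -> 7 -> 10 across the four days.
```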
• Q.295
Write an SQL query to track daily user registrations and calculate the daily count of new
users and the cumulative count of users over time. The given users table contains user_id
and registration_date. Your goal is to generate a report showing:
• The count of new users per day (new_users).
• The cumulative count of users up to each registration date (cumulative_count).
The results should be ordered by registration_date.

Explanation
To generate the report:


• Group the data by registration_date to calculate the daily new user count.
• Use COUNT(user_id) to count the number of new users on each specific day.
• Use the window function SUM() OVER() to calculate the cumulative count of users up to
the current date.
• Order the results by registration_date to show the daily progression of user
registrations.

Datasets and SQL Schemas


• - Table creation and sample data
-- Create the 'users' table
CREATE TABLE users (
user_id INT PRIMARY KEY,
registration_date DATE
);

-- Insert sample data into 'users' table


INSERT INTO users (user_id, registration_date)
VALUES
(1, '2024-01-01'),
(2, '2024-01-01'),
(3, '2024-01-02'),
(4, '2024-01-02'),
(5, '2024-01-03'),
(6, '2024-01-03'),
(7, '2024-01-03'),
(8, '2024-01-04'),
(9, '2024-01-04'),
(10, '2024-01-04');

Learnings
• Using COUNT() to calculate the number of users on each day.
• Using SUM() OVER() to calculate the cumulative count over a range of dates.
• Grouping by registration_date to aggregate user registrations.
• Ordering the data by date to maintain chronological order.

Solution
• - PostgreSQL and MySQL solution
SELECT
registration_date,
COUNT(user_id) AS new_users,
SUM(COUNT(user_id)) OVER (ORDER BY registration_date) AS cumulative_count
FROM
users
GROUP BY
registration_date
ORDER BY
registration_date;

Explanation of the Solution:


• COUNT(user_id): This counts the number of new users who registered on each specific
date.
• SUM(COUNT(user_id)) OVER (ORDER BY registration_date): This window function
calculates the cumulative sum of new users up to each date. By using the OVER() clause with
ORDER BY registration_date, we ensure the cumulative count is calculated correctly for
each date.
• GROUP BY registration_date: This ensures that the count of new users is calculated for
each unique registration_date.

• ORDER BY registration_date: This sorts the results by registration_date, so the report is displayed in chronological order.

Summary:
This SQL query effectively tracks user registrations by showing both daily new user counts
and cumulative totals, helping to analyze user growth over time. The use of window functions
(like SUM() OVER()) ensures a dynamic and efficient way to calculate cumulative values
based on chronological ordering.
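The steps above can be sanity-checked with Python's built-in sqlite3 module (SQLite 3.25+ ships window functions). SQLite is stricter than PostgreSQL about nesting an aggregate inside a window function, so this sketch computes the daily counts in a CTE first; the result is identical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, registration_date TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, '2024-01-01'), (2, '2024-01-01'), (3, '2024-01-02'), (4, '2024-01-02'),
     (5, '2024-01-03'), (6, '2024-01-03'), (7, '2024-01-03'),
     (8, '2024-01-04'), (9, '2024-01-04'), (10, '2024-01-04')])

query = """
WITH daily AS (
    SELECT registration_date, COUNT(user_id) AS new_users
    FROM users
    GROUP BY registration_date
)
SELECT registration_date,
       new_users,
       SUM(new_users) OVER (ORDER BY registration_date) AS cumulative_count
FROM daily
ORDER BY registration_date;
"""
for row in conn.execute(query):
    print(row)
```

Running this prints the same four rows as the expected output table, with the cumulative column climbing 2, 4, 7, 10.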
• Q.296
Write an SQL query to find the second-highest salary of employees in the Engineering
department. The query should retrieve the second-highest salary for employees specifically in
the Engineering department.

Explanation
To solve this problem:
• Join the employees table with the departments table on department_id to get the
relevant salary and department information.
• Filter by the Engineering department.
• Use the RANK() window function to assign a rank to each employee’s salary in descending
order within the Engineering department.
• Select the salary with a rank of 2 to identify the second-highest salary.
• Group the result by department_name to ensure you output the result at the department
level.

Datasets and SQL Schemas


• - Table creation and sample data
-- Create the 'departments' table
CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(50)
);

-- Insert sample data into 'departments' table


INSERT INTO departments (department_id, department_name)
VALUES
(1, 'Engineering'),
(2, 'Sales'),
(3, 'Marketing');

-- Create the 'employees' table


CREATE TABLE employees (
employee_id INT PRIMARY KEY,
department_id INT,
salary DECIMAL(10, 2),
FOREIGN KEY (department_id) REFERENCES departments(department_id)
);

-- Insert sample data into 'employees' table


INSERT INTO employees (employee_id, department_id, salary)
VALUES
(1, 1, 60000.00),
(2, 1, 75000.00),
(3, 1, 80000.00),
(4, 2, 50000.00),
(5, 2, 55000.00),

(6, 3, 45000.00);

Learnings
• Using RANK() to rank employees' salaries in descending order within a specific
department.
• Filtering by rank to retrieve the second-highest salary.
• Window functions allow partitioning and ordering within groups (i.e., departments in this
case).
• JOIN operations to combine data from multiple tables.

Solution
• - PostgreSQL and MySQL solution
SELECT
department_name,
MAX(salary) AS second_highest_salary
FROM
(
SELECT
d.department_name,
e.salary,
RANK() OVER (PARTITION BY d.department_name ORDER BY e.salary DESC) AS salary_rank
FROM
employees e
JOIN departments d ON e.department_id = d.department_id
WHERE
d.department_name = 'Engineering'
) ranked_salaries
WHERE
salary_rank = 2
GROUP BY
department_name;

Explanation of the Solution:


• RANK() OVER (PARTITION BY d.department_name ORDER BY e.salary DESC): This
ranks employees' salaries in descending order within each department. It assigns the rank 1 to
the highest salary, 2 to the second-highest salary, and so on.
• WHERE salary_rank = 2: This filters the results to select the employee(s) with the
second-highest salary in the Engineering department.
• MAX(salary): This ensures that if there are multiple employees with the same second-
highest salary, only one result is returned.
• GROUP BY department_name: Since we are only concerned with the Engineering
department, we group by department_name to output the second-highest salary per
department.

Summary:
This SQL query identifies the second-highest salary in the Engineering department by
ranking the employees' salaries in descending order and filtering for the rank 2. The use of
the RANK() window function allows handling ties in salaries, ensuring that the second-highest
salary is correctly identified. The MAX() function ensures that if multiple employees share the
same salary, only one result is returned.
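The ranking logic ports directly to SQLite (3.25+), so a quick check with Python's sqlite3 module confirms the second-highest Engineering salary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Sales'), (3, 'Marketing');
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, department_id INTEGER, salary REAL);
INSERT INTO employees VALUES
    (1, 1, 60000), (2, 1, 75000), (3, 1, 80000),
    (4, 2, 50000), (5, 2, 55000), (6, 3, 45000);
""")

query = """
SELECT department_name, MAX(salary) AS second_highest_salary
FROM (
    SELECT d.department_name, e.salary,
           RANK() OVER (PARTITION BY d.department_name
                        ORDER BY e.salary DESC) AS salary_rank
    FROM employees e
    JOIN departments d ON e.department_id = d.department_id
    WHERE d.department_name = 'Engineering'
) ranked
WHERE salary_rank = 2
GROUP BY department_name;
"""
print(conn.execute(query).fetchall())
```

With Engineering salaries of 60000, 75000, and 80000, the rank-2 row is 75000.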
• Q.297

Write an SQL query to determine which manager oversees the largest team. The employees
table contains employee_id, manager_id, and department_id. Your task is to find the
manager_id and the team size (the number of employees they manage).

Explanation
To solve this problem:
• Group the data by manager_id to calculate the number of employees they manage.
• Use COUNT(employee_id) to count the number of employees under each manager.
• Order the results by team size in descending order to identify the manager with the
largest team.
• Limit the result to the top manager using LIMIT 1.

Datasets and SQL Schemas


• - Table creation and sample data
-- Create the 'employees' table
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
manager_id INT,
department_id INT
);

-- Insert sample data into 'employees' table


INSERT INTO employees (employee_id, manager_id, department_id)
VALUES
(1, 3, 1),
(2, 3, 1),
(3, NULL, 1),
(4, 3, 1),
(5, 4, 2),
(6, 4, 2),
(7, 4, 2),
(8, 4, 2),
(9, 5, 2),
(10, 5, 2);

Learnings
• COUNT(employee_id): Used to calculate the number of employees managed by each
manager.
• GROUP BY manager_id: Aggregates the employee data based on the manager.
• ORDER BY team_size DESC: Ensures that the manager with the largest team is sorted at
the top.
• LIMIT 1: Restricts the result to the top manager only.

Solution
• - PostgreSQL and MySQL solution
SELECT
manager_id,
COUNT(employee_id) AS team_size
FROM
employees
GROUP BY
manager_id
ORDER BY
team_size DESC
LIMIT 1;

Explanation of the Solution:

• COUNT(employee_id): This counts the number of employees managed by each manager. Note that rows with a NULL manager_id (top-level employees with no manager) still form their own group in the output; add WHERE manager_id IS NOT NULL if they should be excluded.
• GROUP BY manager_id: This groups the employees by their manager, allowing us to
calculate the team size for each manager.
• ORDER BY team_size DESC: Orders the results by team size in descending order, ensuring
the manager with the largest team appears first.
• LIMIT 1: Restricts the output to only the manager with the largest team.

Summary:
This SQL query identifies the manager who oversees the largest team by grouping employees
by manager_id and counting the number of employees under each manager. The result is
sorted in descending order by team size, and the LIMIT 1 clause ensures only the manager
with the largest team is selected.
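This query is plain GROUP BY/ORDER BY/LIMIT, so it runs unchanged in SQLite; the sketch below adds an IS NOT NULL filter (an addition to the book's query) so the unmanaged top-level employee does not form its own group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, manager_id INTEGER, department_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 3, 1), (2, 3, 1), (3, None, 1), (4, 3, 1),
     (5, 4, 2), (6, 4, 2), (7, 4, 2), (8, 4, 2),
     (9, 5, 2), (10, 5, 2)])

query = """
SELECT manager_id, COUNT(employee_id) AS team_size
FROM employees
WHERE manager_id IS NOT NULL   -- keep top-level employees out of the ranking
GROUP BY manager_id
ORDER BY team_size DESC
LIMIT 1;
"""
print(conn.execute(query).fetchone())
```

Manager 4 oversees employees 5 through 8, so the result is a team of four.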
• Q.298

Write an SQL query to generate a report of product names, sale years, and prices for each
sale from the sales table. The output should include the sale ID, product name, the year the
sale was made, and the price.

Explanation
To generate this report:
• Extract the year from the sale_date using the EXTRACT(YEAR FROM sale_date)
function.
• Select the necessary columns: sale_id, product_name, year (extracted from
sale_date), and price.
• Return all rows from the sales table to get a complete list of sales data with these details.

Datasets and SQL Schemas


• - Table creation and sample data
-- Create the 'sales' table
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
product_name VARCHAR(50),
sale_date DATE,
price DECIMAL(10, 2)
);

-- Insert sample data into 'sales' table


INSERT INTO sales (sale_id, product_name, sale_date, price)
VALUES
(101, 'Laptop', '2024-01-15', 1200.00),
(102, 'Smartphone', '2024-02-20', 800.00),
(103, 'Tablet', '2024-03-10', 600.00),
(104, 'Laptop', '2024-04-05', 1100.00),
(105, 'Smartwatch', '2024-05-25', 300.00);

Learnings
• EXTRACT(YEAR FROM date): Used to extract the year from a date field.

• Selecting multiple columns: Ensures the report includes all relevant data, including sale
ID, product name, sale year, and price.
• Simple SELECT statement: Pulls data directly from the table.

Solution
• - PostgreSQL and MySQL solution
SELECT
sale_id,
product_name,
EXTRACT(YEAR FROM sale_date) AS year,
price
FROM
sales;

Explanation of the Solution:


• EXTRACT(YEAR FROM sale_date): This function is used to extract just the year from the
sale_date column, allowing us to display the year the sale occurred.
• Columns selected: The query retrieves the sale_id, product_name, year, and price for
each sale.
• The query returns all rows from the sales table that match the column requirements.

Expected Output:
sale_id | product_name | year | price
--------|--------------|------|------
101 | Laptop | 2024 | 1200.00
102 | Smartphone | 2024 | 800.00
103 | Tablet | 2024 | 600.00
104 | Laptop | 2024 | 1100.00
105 | Smartwatch | 2024 | 300.00

Summary:
This SQL query retrieves detailed sales information from the sales table, including the
product name, sale year (extracted from sale_date), and price for each sale. The EXTRACT()
function is used to extract the year from the sale date, and the output is structured to display
the required fields.
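SQLite has no EXTRACT(), so a quick check substitutes strftime('%Y', ...) — an equivalence worth remembering when moving between engines. This is a sketch using Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_name TEXT, sale_date TEXT, price REAL);
INSERT INTO sales VALUES
    (101, 'Laptop', '2024-01-15', 1200.00),
    (102, 'Smartphone', '2024-02-20', 800.00),
    (103, 'Tablet', '2024-03-10', 600.00),
    (104, 'Laptop', '2024-04-05', 1100.00),
    (105, 'Smartwatch', '2024-05-25', 300.00);
""")

# strftime('%Y', ...) plays the role of EXTRACT(YEAR FROM ...)
query = """
SELECT sale_id, product_name,
       CAST(strftime('%Y', sale_date) AS INTEGER) AS year,
       price
FROM sales;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

All five sample sales fall in 2024, matching the expected output table above.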
• Q.299

Write an SQL query to find the second most recent order date for each customer from the
Orders table. The table contains the following columns:

• OrderID
• CustomerID
• OrderDate
The output should display the CustomerID and the second most recent order date for each
customer.

Explanation
To find the second most recent order date for each customer:
• Order the data by OrderDate for each customer in descending order.

• Rank the rows for each customer using the ROW_NUMBER() or RANK() window function.
• Filter the results to keep only the rows with the second most recent order (i.e., where the
rank is 2).

Datasets and SQL Schemas


• - Table creation and sample data
-- Create the 'Orders' table
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE
);

-- Insert sample data into 'Orders' table


INSERT INTO Orders (OrderID, CustomerID, OrderDate)
VALUES
(1, 101, '2024-01-10'),
(2, 102, '2024-01-12'),
(3, 101, '2024-02-15'),
(4, 103, '2024-03-01'),
(5, 102, '2024-03-05'),
(6, 101, '2024-03-10'),
(7, 103, '2024-04-02');

Learnings
• Window Functions (ROW_NUMBER() or RANK()): Used to assign a rank or number to each
row within a partition (grouped by CustomerID), which helps in identifying the most recent
and second most recent records.
• Using WHERE to filter by rank: Once rows are ranked, filter to get the second most recent
record.

Solution
• - PostgreSQL and MySQL solution
WITH RankedOrders AS (
SELECT
CustomerID,
OrderDate,
ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS OrderRank
FROM Orders
)
SELECT
CustomerID,
OrderDate AS SecondMostRecentOrderDate
FROM RankedOrders
WHERE OrderRank = 2;

Explanation of the Solution:


• ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC):
• The ROW_NUMBER() function assigns a unique rank to each row for each CustomerID based
on the descending order of OrderDate.
• This helps in identifying the most recent and second most recent orders for each customer.
• WITH RankedOrders AS:
• A Common Table Expression (CTE) is used to rank the orders by OrderDate for each
CustomerID.
• WHERE OrderRank = 2:
• Filters the result to include only the rows where the rank is 2, which corresponds to the
second most recent order.
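The CTE-plus-ROW_NUMBER() pattern works as-is in SQLite (3.25+), so the whole solution can be verified in a few lines of Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
INSERT INTO Orders VALUES
    (1, 101, '2024-01-10'), (2, 102, '2024-01-12'), (3, 101, '2024-02-15'),
    (4, 103, '2024-03-01'), (5, 102, '2024-03-05'), (6, 101, '2024-03-10'),
    (7, 103, '2024-04-02');
""")

query = """
WITH RankedOrders AS (
    SELECT CustomerID, OrderDate,
           ROW_NUMBER() OVER (PARTITION BY CustomerID
                              ORDER BY OrderDate DESC) AS OrderRank
    FROM Orders
)
SELECT CustomerID, OrderDate AS SecondMostRecentOrderDate
FROM RankedOrders
WHERE OrderRank = 2
ORDER BY CustomerID;
"""
print(conn.execute(query).fetchall())
```

Customer 101's orders sort to 2024-03-10, 2024-02-15, 2024-01-10, so rank 2 picks 2024-02-15, and similarly for the others.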

• Q.300
Write an SQL query to calculate the running total of sales for each brand, ordered by
sale_date. The brand_sales table contains the following columns:

• sale_id
• brand
• sale_date
• sale_amount
The output should display the brand, sale_id, sale_date, and the running total of sale_amount for each brand.

Explanation
To calculate the running total of sales by each brand:
• Use the SUM() function along with a window function to calculate the cumulative sum of
sale_amount for each brand.
• Partition the data by brand to calculate the running total separately for each brand.
• Order the data by sale_date to ensure the running total is calculated chronologically for
each brand.

Datasets and SQL Schemas


• - Table creation and sample data
-- Create the 'brand_sales' table
CREATE TABLE brand_sales (
sale_id INT PRIMARY KEY,
brand VARCHAR(50),
sale_date DATE,
sale_amount DECIMAL(10, 2)
);

-- Insert sample data into 'brand_sales' table


INSERT INTO brand_sales (sale_id, brand, sale_date, sale_amount)
VALUES
(1, 'Apple', '2024-01-10', 500.00),
(2, 'Samsung', '2024-01-12', 300.00),
(3, 'Apple', '2024-01-15', 600.00),
(4, 'Apple', '2024-01-20', 700.00),
(5, 'Samsung', '2024-01-25', 400.00),
(6, 'Google', '2024-01-30', 250.00),
(7, 'Samsung', '2024-02-05', 350.00),
(8, 'Google', '2024-02-10', 500.00),
(9, 'Apple', '2024-02-15', 750.00),
(10, 'Samsung', '2024-02-20', 450.00),
(11, 'Apple', '2024-02-22', 800.00),
(12, 'Google', '2024-02-25', 650.00),
(13, 'Samsung', '2024-03-01', 550.00),
(14, 'Apple', '2024-03-05', 900.00),
(15, 'Google', '2024-03-10', 300.00),
(16, 'Samsung', '2024-03-15', 600.00),
(17, 'Apple', '2024-03-20', 1200.00),
(18, 'Google', '2024-03-25', 700.00),
(19, 'Samsung', '2024-03-28', 850.00),
(20, 'Apple', '2024-03-30', 1500.00);

Learnings
• Window Functions (SUM() with OVER() clause): The SUM() function is used with the
OVER() clause to calculate the running total of sales. The PARTITION BY clause groups the
data by brand, and the ORDER BY clause ensures the sales are summed in chronological order.

• Cumulative sum: The running total is computed progressively for each brand, which is
useful for tracking sales growth over time.

Solution
• - PostgreSQL and MySQL solution
SELECT
brand,
sale_id,
sale_date,
sale_amount,
SUM(sale_amount) OVER (PARTITION BY brand ORDER BY sale_date) AS running_total
FROM
brand_sales
ORDER BY
brand, sale_date;

Explanation of the Solution:


• SUM(sale_amount) OVER (PARTITION BY brand ORDER BY sale_date):
• The SUM() function calculates the running total of sale_amount for each brand partitioned
by brand and ordered by sale_date.
• This ensures that the sales amounts are accumulated for each brand in chronological order.
• ORDER BY brand, sale_date:
• The final result is ordered first by brand and then by sale_date, ensuring the cumulative
totals are presented in the correct order.
• PARTITION BY brand: Restarts the running total for each brand, so one brand's sales never leak into another's cumulative sum.
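Partitioned running totals also work in SQLite (3.25+), so the per-brand accumulation can be checked directly; this sketch inspects the Apple rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE brand_sales (sale_id INTEGER PRIMARY KEY, brand TEXT, sale_date TEXT, sale_amount REAL)")
conn.executemany(
    "INSERT INTO brand_sales VALUES (?, ?, ?, ?)",
    [(1, 'Apple', '2024-01-10', 500), (2, 'Samsung', '2024-01-12', 300),
     (3, 'Apple', '2024-01-15', 600), (4, 'Apple', '2024-01-20', 700),
     (5, 'Samsung', '2024-01-25', 400), (6, 'Google', '2024-01-30', 250),
     (7, 'Samsung', '2024-02-05', 350), (8, 'Google', '2024-02-10', 500),
     (9, 'Apple', '2024-02-15', 750), (10, 'Samsung', '2024-02-20', 450),
     (11, 'Apple', '2024-02-22', 800), (12, 'Google', '2024-02-25', 650),
     (13, 'Samsung', '2024-03-01', 550), (14, 'Apple', '2024-03-05', 900),
     (15, 'Google', '2024-03-10', 300), (16, 'Samsung', '2024-03-15', 600),
     (17, 'Apple', '2024-03-20', 1200), (18, 'Google', '2024-03-25', 700),
     (19, 'Samsung', '2024-03-28', 850), (20, 'Apple', '2024-03-30', 1500)])

query = """
SELECT brand, sale_id, sale_date, sale_amount,
       SUM(sale_amount) OVER (PARTITION BY brand
                              ORDER BY sale_date) AS running_total
FROM brand_sales
ORDER BY brand, sale_date;
"""
rows = conn.execute(query).fetchall()
apple = [r for r in rows if r[0] == 'Apple']
print(apple[-1])
```

Apple's eight sales accumulate to 6950, with each intermediate row showing the total so far.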

Google
• Q.301
Question
Calculate the 3-month rolling average of total revenue from purchases, excluding returns
(negative amounts), grouped by year-month (YYYY-MM). Sort the result from the earliest to
the latest month.
Explanation
To solve this, you need to:
• Exclude negative purchase amounts (returns).
• Group data by year-month.
• Calculate the total revenue for each month.
• Compute the 3-month rolling average using a window function.
• Sort the results by year-month in ascending order.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE amazon_purchases (
created_at TIMESTAMP,
purchase_amt BIGINT,
user_id BIGINT
);

-- Sample data insertions


INSERT INTO amazon_purchases (created_at, purchase_amt, user_id)
VALUES
('2023-01-05', 1500, 101),
('2023-01-15', -200, 102),

('2023-02-10', 2000, 103),
('2023-02-20', 1200, 101),
('2023-03-01', 1800, 104),
('2023-03-15', -100, 102),
('2023-04-05', 2200, 105),
('2023-04-10', 1400, 103),
('2023-05-01', 2500, 106),
('2023-05-15', 1700, 107),
('2023-06-05', 1300, 108),
('2023-06-15', 1900, 109);

Learnings
• Grouping by month using TO_CHAR() or DATE_FORMAT().
• Filtering out negative purchase values with WHERE purchase_amt > 0.
• Using window functions (AVG()) for rolling averages.
• Sorting by date to maintain chronological order.
Solutions
• - PostgreSQL solution
WITH monthly_revenue AS (
SELECT
TO_CHAR(created_at, 'YYYY-MM') AS month,
SUM(purchase_amt) AS total_revenue
FROM amazon_purchases
WHERE purchase_amt > 0
GROUP BY TO_CHAR(created_at, 'YYYY-MM')
),
rolling_avg AS (
SELECT
month,
AVG(total_revenue) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS three_month_avg
FROM monthly_revenue
)
SELECT month, three_month_avg
FROM rolling_avg
ORDER BY month;
• - MySQL solution
WITH monthly_revenue AS (
SELECT
DATE_FORMAT(created_at, '%Y-%m') AS month,
SUM(purchase_amt) AS total_revenue
FROM amazon_purchases
WHERE purchase_amt > 0
GROUP BY DATE_FORMAT(created_at, '%Y-%m')
),
rolling_avg AS (
SELECT
month,
AVG(total_revenue) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS three_month_avg
FROM monthly_revenue
)
SELECT month, three_month_avg
FROM rolling_avg
ORDER BY month;
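The same rolling-average frame exists in SQLite, with strftime('%Y-%m', ...) standing in for TO_CHAR/DATE_FORMAT; this sketch verifies the monthly totals and the 3-month window:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amazon_purchases (created_at TEXT, purchase_amt INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO amazon_purchases VALUES (?, ?, ?)",
    [('2023-01-05', 1500, 101), ('2023-01-15', -200, 102), ('2023-02-10', 2000, 103),
     ('2023-02-20', 1200, 101), ('2023-03-01', 1800, 104), ('2023-03-15', -100, 102),
     ('2023-04-05', 2200, 105), ('2023-04-10', 1400, 103), ('2023-05-01', 2500, 106),
     ('2023-05-15', 1700, 107), ('2023-06-05', 1300, 108), ('2023-06-15', 1900, 109)])

# strftime('%Y-%m', ...) stands in for TO_CHAR / DATE_FORMAT
query = """
WITH monthly_revenue AS (
    SELECT strftime('%Y-%m', created_at) AS month,
           SUM(purchase_amt) AS total_revenue
    FROM amazon_purchases
    WHERE purchase_amt > 0
    GROUP BY month
)
SELECT month,
       AVG(total_revenue) OVER (ORDER BY month
                                ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS three_month_avg
FROM monthly_revenue
ORDER BY month;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With returns excluded, the monthly totals are 1500, 3200, 1800, 3600, 4200, 3200, and the window averages only the rows available so far (January's "average" is just January).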
• Q.302
Find the fifth highest salary from the com_worker table without using TOP or LIMIT. Note:
Duplicate salaries should not be removed.
Explanation

To solve this, you can use a common table expression (CTE) and window functions such as
RANK() or DENSE_RANK() to rank the salaries in descending order. Then, you can filter for the
salary that has the rank of 5.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE com_worker (
worker_id BIGINT PRIMARY KEY,
department VARCHAR(25),
first_name VARCHAR(25),
last_name VARCHAR(25),
joining_date TIMESTAMP,
salary BIGINT
);

-- Sample data insertions


INSERT INTO com_worker (worker_id, department, first_name, last_name, joining_date, salary)
VALUES
(1, 'HR', 'John', 'Doe', '2020-01-15', 50000),
(2, 'IT', 'Jane', 'Smith', '2019-03-10', 60000),
(3, 'Finance', 'Emily', 'Jones', '2021-06-20', 75000),
(4, 'Sales', 'Michael', 'Brown', '2018-09-05', 60000),
(5, 'Marketing', 'Chris', 'Johnson', '2022-04-12', 70000),
(6, 'IT', 'David', 'Wilson', '2020-11-01', 80000),
(7, 'Finance', 'Sarah', 'Taylor', '2017-05-25', 45000),
(8, 'HR', 'James', 'Anderson', '2023-01-09', 65000),
(9, 'Sales', 'Anna', 'Thomas', '2020-02-18', 55000),
(10, 'Marketing', 'Robert', 'Jackson', '2021-07-14', 60000);

Learnings
• Using window functions like RANK() or DENSE_RANK() to assign ranks based on salary.
• Handling duplicate values in ranking without filtering them out.
• Using CTEs for better query organization and readability.
• Filtering based on rank to find specific salary positions.
Solutions
• - PostgreSQL solution
WITH ranked_salaries AS (
SELECT
salary,
RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM com_worker
)
SELECT salary
FROM ranked_salaries
WHERE salary_rank = 5;
• - MySQL solution
WITH ranked_salaries AS (
SELECT
salary,
RANK() OVER (ORDER BY salary DESC) AS salary_rank -- RANK is a reserved word in MySQL 8, so avoid it as an alias
FROM com_worker
)
SELECT salary
FROM ranked_salaries
WHERE salary_rank = 5;
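Because the question forbids TOP and LIMIT, the RANK() approach ports directly to SQLite (3.25+), which also lets us see the tie handling — three workers share the fifth-highest salary, and all three rows come back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE com_worker (worker_id INTEGER PRIMARY KEY, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO com_worker VALUES (?, ?, ?)",
    [(1, 'HR', 50000), (2, 'IT', 60000), (3, 'Finance', 75000), (4, 'Sales', 60000),
     (5, 'Marketing', 70000), (6, 'IT', 80000), (7, 'Finance', 45000), (8, 'HR', 65000),
     (9, 'Sales', 55000), (10, 'Marketing', 60000)])

query = """
WITH ranked_salaries AS (
    SELECT salary,
           RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM com_worker
)
SELECT salary FROM ranked_salaries WHERE salary_rank = 5;
"""
result = [r[0] for r in conn.execute(query)]
print(result)
```

Descending ranks run 80000, 75000, 70000, 65000, then 60000 at rank 5; since duplicates are kept, the query returns 60000 three times.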
• Q.303
Find the top 3 most common letters across all words in both google_file_store and
google_word_lists tables (ignore the filename column in google_file_store). Output
the letter along with the number of occurrences, ordered in descending order by occurrences.

Explanation
To solve this, you need to:
• Extract all words from both tables.
• Break each word into individual characters.
• Count the frequency of each letter, ignoring spaces and non-alphabetic characters.
• Sort the results based on the number of occurrences, and select the top 3 most common
letters.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE google_file_store (
contents TEXT,
filename VARCHAR(255)
);

INSERT INTO google_file_store (contents, filename)


VALUES
('This is a sample content with some words.', 'file1.txt'),
('Another file with more words and letters.', 'file2.txt'),
('Text for testing purposes with various characters.', 'file3.txt');

CREATE TABLE google_word_lists (


words1 TEXT,
words2 TEXT
);

INSERT INTO google_word_lists (words1, words2)


VALUES
('apple banana cherry', 'dog elephant fox'),
('grape honeydew kiwi', 'lemon mango nectarine'),
('orange papaya quince', 'raspberry strawberry tangerine');

Learnings
• String manipulation and breaking down words into characters using REGEXP_REPLACE() or
similar functions.
• Using UNION ALL to combine results from multiple tables.
• Filtering out non-alphabetic characters to focus on letters only.
• Aggregating data using GROUP BY and ordering the result by frequency.
Solutions
• - PostgreSQL solution
WITH letters AS (
-- Split every string into single characters
SELECT regexp_split_to_table(LOWER(contents), '') AS letter FROM google_file_store
UNION ALL
SELECT regexp_split_to_table(LOWER(words1), '') FROM google_word_lists
UNION ALL
SELECT regexp_split_to_table(LOWER(words2), '') FROM google_word_lists
)
SELECT letter, COUNT(*) AS occurrences
FROM letters
WHERE letter ~ '[a-z]'
GROUP BY letter
ORDER BY occurrences DESC
LIMIT 3;
• - MySQL solution (8.0+, using a recursive CTE to walk each string character by character)
WITH RECURSIVE all_text AS (
SELECT LOWER(contents) AS txt FROM google_file_store
UNION ALL
SELECT LOWER(words1) FROM google_word_lists
UNION ALL
SELECT LOWER(words2) FROM google_word_lists
),
chars AS (
SELECT txt, 1 AS pos, SUBSTRING(txt, 1, 1) AS letter FROM all_text
UNION ALL
SELECT txt, pos + 1, SUBSTRING(txt, pos + 1, 1)
FROM chars
WHERE pos < CHAR_LENGTH(txt)
)
SELECT letter, COUNT(*) AS occurrences
FROM chars
WHERE letter REGEXP '[a-z]'
GROUP BY letter
ORDER BY occurrences DESC
LIMIT 3;
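As a cross-check outside SQL, the same letter tally can be reproduced with collections.Counter, using the sample strings from the tables above:

```python
from collections import Counter

# the same sample strings as in the two tables above
file_store = [
    'This is a sample content with some words.',
    'Another file with more words and letters.',
    'Text for testing purposes with various characters.',
]
word_lists = [
    ('apple banana cherry', 'dog elephant fox'),
    ('grape honeydew kiwi', 'lemon mango nectarine'),
    ('orange papaya quince', 'raspberry strawberry tangerine'),
]

counts = Counter()
for text in file_store + [w for pair in word_lists for w in pair]:
    counts.update(ch for ch in text.lower() if ch.isalpha())

top3 = counts.most_common(3)
print(top3)
```

Whatever the SQL dialect, the top three letters and their counts should match this Python tally exactly.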
• Q.304
Identify the top 5 percentile of claims with the highest fraud scores in each state as potentially
fraudulent. Output the policy number, state, claim cost, and fraud score.
Explanation
To solve this:
• Group the claims by state.
• Calculate the 95th percentile fraud score for each state.
• Filter the claims where the fraud score is higher than or equal to the 95th percentile fraud
score for that state.
• Output the policy number, state, claim cost, and fraud score for these suspicious claims.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE claims (
policy_number VARCHAR(50),
state VARCHAR(50),
claim_cost FLOAT,
fraud_score FLOAT
);

-- Sample data insertions


INSERT INTO claims (policy_number, state, claim_cost, fraud_score)
VALUES
('POL123', 'CA', 10000.00, 85.5),
('POL124', 'CA', 5000.00, 70.2),
('POL125', 'CA', 20000.00, 92.8),
('POL126', 'NY', 15000.00, 88.1),
('POL127', 'NY', 8000.00, 65.4),
('POL128', 'NY', 25000.00, 93.7),
('POL129', 'TX', 12000.00, 75.3),
('POL130', 'TX', 18000.00, 95.2),
('POL131', 'TX', 9000.00, 60.0),
('POL132', 'FL', 11000.00, 82.0),
('POL133', 'FL', 14000.00, 87.5),
('POL134', 'FL', 30000.00, 99.0);

Learnings
• Using PERCENTILE_CONT() to calculate percentiles.

• Filtering data based on a calculated threshold.
• Grouping data by a specific attribute (in this case, state).
• Using window functions for calculating percentiles without subqueries.
Solutions
• - PostgreSQL solution
WITH state_percentiles AS (
SELECT
state,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY fraud_score) AS fraud_score_95
FROM claims
GROUP BY state
)
SELECT c.policy_number, c.state, c.claim_cost, c.fraud_score
FROM claims c
JOIN state_percentiles sp ON c.state = sp.state
WHERE c.fraud_score >= sp.fraud_score_95
ORDER BY c.state, c.fraud_score DESC;
• - MySQL solution (PERCENTILE_CONT is not supported in MySQL; PERCENT_RANK() flags the top 5% within each state instead)
WITH ranked_claims AS (
SELECT
policy_number, state, claim_cost, fraud_score,
PERCENT_RANK() OVER (PARTITION BY state ORDER BY fraud_score) AS pct_rank
FROM claims
)
SELECT policy_number, state, claim_cost, fraud_score
FROM ranked_claims
WHERE pct_rank >= 0.95
ORDER BY state, fraud_score DESC;
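A pure-Python check makes the percentile arithmetic concrete. The helper below applies the same linear interpolation that PERCENTILE_CONT(0.95) uses over scores sorted in ascending order; with only three claims per state in the sample, only each state's highest score clears its threshold:

```python
claims = [
    ('POL123', 'CA', 10000.00, 85.5), ('POL124', 'CA', 5000.00, 70.2), ('POL125', 'CA', 20000.00, 92.8),
    ('POL126', 'NY', 15000.00, 88.1), ('POL127', 'NY', 8000.00, 65.4), ('POL128', 'NY', 25000.00, 93.7),
    ('POL129', 'TX', 12000.00, 75.3), ('POL130', 'TX', 18000.00, 95.2), ('POL131', 'TX', 9000.00, 60.0),
    ('POL132', 'FL', 11000.00, 82.0), ('POL133', 'FL', 14000.00, 87.5), ('POL134', 'FL', 30000.00, 99.0),
]

def percentile_cont(values, p):
    """Linear-interpolated percentile, matching SQL's PERCENTILE_CONT."""
    xs = sorted(values)
    idx = (len(xs) - 1) * p
    lo = int(idx)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (idx - lo) * (xs[hi] - xs[lo])

states = {state for _, state, _, _ in claims}
thresholds = {s: percentile_cont([f for _, st, _, f in claims if st == s], 0.95)
              for s in states}
flagged = [c for c in claims if c[3] >= thresholds[c[1]]]
print(flagged)
```

For CA, for example, the 95th percentile of (70.2, 85.5, 92.8) interpolates to 92.07, so only POL125 (92.8) is flagged.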
• Q.305
Find the most common combination of department and salary in the employees table.
Output the department, salary, and the number of employees in that combination, ordered by
the count in descending order.
Explanation
To solve this:
• Group the data by both department and salary.
• Count the number of employees in each group.
• Order the results by the count in descending order to find the most common combinations.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
department VARCHAR(50),
salary INT
);

-- Sample data insertions


INSERT INTO employees (employee_id, department, salary)
VALUES
(1, 'HR', 50000),
(2, 'IT', 60000),
(3, 'IT', 60000),
(4, 'Finance', 70000),
(5, 'HR', 50000),
(6, 'IT', 75000),
(7, 'HR', 50000),

(8, 'Finance', 70000),
(9, 'Sales', 60000),
(10, 'Finance', 70000);

Learnings
• Using GROUP BY to group data by multiple columns (department and salary).
• Counting occurrences with COUNT().
• Sorting results in descending order using ORDER BY.
Solutions
• - PostgreSQL solution
SELECT department, salary, COUNT(*) AS employee_count
FROM employees
GROUP BY department, salary
ORDER BY employee_count DESC;
• - MySQL solution
SELECT department, salary, COUNT(*) AS employee_count
FROM employees
GROUP BY department, salary
ORDER BY employee_count DESC;
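This two-column GROUP BY runs identically in SQLite; a quick check shows the sample data actually has a tie at the top, which the plain ORDER BY leaves in an unspecified order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 'HR', 50000), (2, 'IT', 60000), (3, 'IT', 60000), (4, 'Finance', 70000),
     (5, 'HR', 50000), (6, 'IT', 75000), (7, 'HR', 50000), (8, 'Finance', 70000),
     (9, 'Sales', 60000), (10, 'Finance', 70000)])

query = """
SELECT department, salary, COUNT(*) AS employee_count
FROM employees
GROUP BY department, salary
ORDER BY employee_count DESC;
"""
rows = conn.execute(query).fetchall()
print(rows[:2])
```

Both (HR, 50000) and (Finance, 70000) occur three times; adding a tiebreaker such as ORDER BY employee_count DESC, department would make the ordering deterministic.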
• Q.306
Find the most common domain (excluding the www. prefix) used in email addresses from the
users table. Output the domain and the number of occurrences, ordered by the count in
descending order.
Explanation
To solve this:
• Extract the domain from the email addresses by splitting the string on the @ symbol.
• Remove the www. prefix (if it exists) from the domain part.
• Group the results by domain and count the occurrences.
• Order the result by the count in descending order to find the most common domains.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE users (
user_id INT PRIMARY KEY,
email VARCHAR(255)
);

-- Sample data insertions


INSERT INTO users (user_id, email)
VALUES
(1, '[email protected]'),
(2, '[email protected]'),
(3, '[email protected]'),
(4, '[email protected]'),
(5, '[email protected]'),
(6, '[email protected]'),
(7, '[email protected]'),
(8, '[email protected]'),
(9, '[email protected]'),
(10, '[email protected]');

Learnings
• Using SUBSTRING_INDEX() to extract parts of a string.
• Using REPLACE() to remove unwanted substrings (www. in this case).
• Grouping data by specific substrings (in this case, the domain of email addresses).

• Counting occurrences using COUNT() and sorting using ORDER BY.
Solutions
• - PostgreSQL solution
SELECT
REPLACE(SUBSTRING(email FROM '@(.*)'), 'www.', '') AS domain,
COUNT(*) AS domain_count
FROM users
GROUP BY domain
ORDER BY domain_count DESC;
• - MySQL solution
SELECT
REPLACE(SUBSTRING_INDEX(email, '@', -1), 'www.', '') AS domain,
COUNT(*) AS domain_count
FROM users
GROUP BY domain
ORDER BY domain_count DESC;
• Q.307
Find the second most expensive product in each category from the products table. Output
the category, product name, and price.
Explanation
To solve this:
• Group the products by category.
• Sort the products within each category by price in descending order.
• Use a window function or subquery to identify the second most expensive product for each
category.
• Output the category, product name, and price.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE products (
product_id INT PRIMARY KEY,
category VARCHAR(50),
product_name VARCHAR(100),
price DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO products (product_id, category, product_name, price)
VALUES
(1, 'Electronics', 'Smartphone', 999.99),
(2, 'Electronics', 'Laptop', 1299.99),
(3, 'Electronics', 'Tablet', 599.99),
(4, 'Furniture', 'Sofa', 799.99),
(5, 'Furniture', 'Coffee Table', 199.99),
(6, 'Furniture', 'Bookshelf', 299.99),
(7, 'Clothing', 'Jacket', 120.00),
(8, 'Clothing', 'T-shirt', 25.00),
(9, 'Clothing', 'Jeans', 80.00),
(10, 'Clothing', 'Sweater', 55.00);

Learnings
• Using RANK() or ROW_NUMBER() to rank items by price within a category.
• Filtering for the second most expensive item by selecting the row with rank = 2.
• Grouping data by category to find the second most expensive item in each.
Solutions
• - PostgreSQL solution

WITH ranked_products AS (
SELECT
category,
product_name,
price,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank
FROM products
)
SELECT category, product_name, price
FROM ranked_products
WHERE price_rank = 2;
• - MySQL solution
WITH ranked_products AS (
SELECT
category,
product_name,
price,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank -- RANK is reserved in MySQL 8, so use a different alias
FROM products
)
SELECT category, product_name, price
FROM ranked_products
WHERE price_rank = 2;
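The partitioned ROW_NUMBER() technique can be verified directly in SQLite (3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, category TEXT, product_name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [(1, 'Electronics', 'Smartphone', 999.99), (2, 'Electronics', 'Laptop', 1299.99),
     (3, 'Electronics', 'Tablet', 599.99), (4, 'Furniture', 'Sofa', 799.99),
     (5, 'Furniture', 'Coffee Table', 199.99), (6, 'Furniture', 'Bookshelf', 299.99),
     (7, 'Clothing', 'Jacket', 120.00), (8, 'Clothing', 'T-shirt', 25.00),
     (9, 'Clothing', 'Jeans', 80.00), (10, 'Clothing', 'Sweater', 55.00)])

query = """
WITH ranked_products AS (
    SELECT category, product_name, price,
           ROW_NUMBER() OVER (PARTITION BY category
                              ORDER BY price DESC) AS price_rank
    FROM products
)
SELECT category, product_name, price
FROM ranked_products
WHERE price_rank = 2
ORDER BY category;
"""
print(conn.execute(query).fetchall())
```

Each category yields exactly one row: Jeans behind Jacket, Smartphone behind Laptop, and Bookshelf behind Sofa.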
• Q.308
Find the number of unique products purchased by each customer in the orders table. Output
the customer ID and the count of distinct product IDs they have purchased.
Explanation
To solve this:
• Group the data by customer_id.
• Count the distinct product_id for each customer.
• Output the customer ID and the number of unique products they have purchased.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
order_date DATE
);

-- Sample data insertions


INSERT INTO orders (order_id, customer_id, product_id, order_date)
VALUES
(1, 101, 1001, '2023-01-01'),
(2, 101, 1002, '2023-01-02'),
(3, 102, 1003, '2023-01-05'),
(4, 102, 1002, '2023-01-07'),
(5, 103, 1001, '2023-01-10'),
(6, 104, 1004, '2023-01-12'),
(7, 101, 1001, '2023-01-15'),
(8, 103, 1002, '2023-01-16'),
(9, 104, 1005, '2023-01-18');

Learnings
• Using COUNT(DISTINCT ...) to count unique occurrences.
• Grouping data by a specific attribute (customer_id in this case).
• Understanding how to aggregate data by grouping and counting distinct values.
Solutions

• - PostgreSQL solution
SELECT customer_id, COUNT(DISTINCT product_id) AS unique_products
FROM orders
GROUP BY customer_id;
• - MySQL solution
SELECT customer_id, COUNT(DISTINCT product_id) AS unique_products
FROM orders
GROUP BY customer_id;
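A quick SQLite run shows why DISTINCT matters here — customer 101 placed three orders but bought only two distinct products:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, product_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 101, 1001, '2023-01-01'), (2, 101, 1002, '2023-01-02'), (3, 102, 1003, '2023-01-05'),
     (4, 102, 1002, '2023-01-07'), (5, 103, 1001, '2023-01-10'), (6, 104, 1004, '2023-01-12'),
     (7, 101, 1001, '2023-01-15'), (8, 103, 1002, '2023-01-16'), (9, 104, 1005, '2023-01-18')])

query = """
SELECT customer_id, COUNT(DISTINCT product_id) AS unique_products
FROM orders
GROUP BY customer_id
ORDER BY customer_id;
"""
print(conn.execute(query).fetchall())
```

Without DISTINCT, customer 101 would show 3 instead of 2 because order 7 repeats product 1001.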
• Q.309
Find the longest streak of consecutive days with a purchase made by the same customer in the
orders table. Output the customer ID, the starting date of the streak, the ending date, and the
number of consecutive days in the streak.
Explanation
To solve this:
• You need to identify consecutive dates for each customer. A sequence of consecutive dates
is defined by the difference between each order date and the previous one being exactly 1
day.
• Group the data by customer_id and create a "group" for each consecutive streak of days.
• Count the length of each streak and select the longest one for each customer.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE
);

-- Sample data insertions


INSERT INTO orders (order_id, customer_id, order_date)
VALUES
(1, 101, '2023-01-01'),
(2, 101, '2023-01-02'),
(3, 101, '2023-01-04'),
(4, 101, '2023-01-05'),
(5, 102, '2023-01-03'),
(6, 102, '2023-01-04'),
(7, 102, '2023-01-05'),
(8, 103, '2023-01-01'),
(9, 103, '2023-01-03'),
(10, 103, '2023-01-04');

Learnings
• Using LEAD() or LAG() window functions to find consecutive rows.
• Using a combination of date comparison and grouping techniques to identify consecutive
streaks.
• Aggregating data using window functions and conditional logic to group consecutive days.
Solutions
• - PostgreSQL solution
WITH consecutive_dates AS (
    SELECT
        customer_id,
        order_date,
        -- Subtracting the row number (as days) from each date yields a value
        -- that is constant within every run of consecutive dates
        order_date - (ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date))::int AS streak_group
    FROM orders
),
streak_lengths AS (
    SELECT
        customer_id,
        MIN(order_date) AS streak_start,
        MAX(order_date) AS streak_end,
        COUNT(*) AS streak_length
    FROM consecutive_dates
    GROUP BY customer_id, streak_group
)
SELECT customer_id, streak_start, streak_end, streak_length
FROM streak_lengths sl
WHERE streak_length = (
    SELECT MAX(streak_length)
    FROM streak_lengths sl2
    WHERE sl2.customer_id = sl.customer_id
)
ORDER BY customer_id;
• - MySQL solution (MySQL 8.0+)
WITH numbered AS (
    SELECT customer_id, order_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS rn
    FROM orders
),
consecutive_dates AS (
    -- constant within each run of consecutive dates
    SELECT customer_id, order_date,
           DATE_SUB(order_date, INTERVAL rn DAY) AS streak_group
    FROM numbered
),
streak_lengths AS (
    SELECT customer_id,
           MIN(order_date) AS streak_start,
           MAX(order_date) AS streak_end,
           COUNT(*) AS streak_length
    FROM consecutive_dates
    GROUP BY customer_id, streak_group
)
SELECT customer_id, streak_start, streak_end, streak_length
FROM streak_lengths sl
WHERE streak_length = (
    SELECT MAX(streak_length)
    FROM streak_lengths sl2
    WHERE sl2.customer_id = sl.customer_id
)
ORDER BY customer_id;

Explanation
• Subtracting a running ROW_NUMBER() (taken as a number of days) from each order date
produces a value that stays constant across a run of consecutive days, so it can serve as
the streak group key.
• We then group by customer and streak group, and aggregate each group into its start
date, end date, and length.
• The final step keeps only the longest consecutive streak per customer.
This is the classic "gaps and islands" pattern, combining window functions, grouping, and
date arithmetic, making it a more advanced solution for identifying streaks.
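The gaps-and-islands logic above can be sanity-checked outside a server with Python's bundled sqlite3 module (a sketch, not part of the original answer; SQLite 3.25+ is required for window functions). SQLite has no date-minus-integer operator, so the row number is applied through the date() modifier instead:

```python
# Verify the longest consecutive-day streak per customer on the sample data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, customer_id INT, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 101, "2023-01-01"), (2, 101, "2023-01-02"), (3, 101, "2023-01-04"),
     (4, 101, "2023-01-05"), (5, 102, "2023-01-03"), (6, 102, "2023-01-04"),
     (7, 102, "2023-01-05"), (8, 103, "2023-01-01"), (9, 103, "2023-01-03"),
     (10, 103, "2023-01-04")],
)
rows = conn.execute("""
    WITH consecutive_dates AS (
        SELECT customer_id, order_date,
               -- date minus row number: constant within each streak
               date(order_date, '-' || ROW_NUMBER() OVER (
                   PARTITION BY customer_id ORDER BY order_date) || ' days') AS streak_group
        FROM orders
    ),
    streak_lengths AS (
        SELECT customer_id, COUNT(*) AS streak_length
        FROM consecutive_dates
        GROUP BY customer_id, streak_group
    )
    SELECT customer_id, MAX(streak_length)
    FROM streak_lengths
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()
print(rows)  # [(101, 2), (102, 3), (103, 2)]
```

Customer 101 has two separate 2-day streaks, 102 a single 3-day streak, and 103 a 2-day streak after an isolated day.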
• Q.310
Find all email addresses from the users table that are from Gmail but have non-standard
domains (i.e., they do not end with "gmail.com", but may have additional subdomains).
Output the user ID, email address, and the domain part of the email (after the '@' symbol).
Explanation

To solve this:
• Use regular expressions to match Gmail email addresses.
• Identify Gmail emails with non-standard domains (those that have additional subdomains).
• Extract the domain part of the email using string functions.
• Output the user ID, email, and the domain part for each matching email.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE users (
user_id INT PRIMARY KEY,
email VARCHAR(255)
);

-- Sample data insertions


INSERT INTO users (user_id, email)
VALUES
(1, '[email protected]'),
(2, '[email protected]'),
(3, '[email protected]'),
(4, '[email protected]'),
(5, '[email protected]'),
(6, '[email protected]'),
(7, '[email protected]'),
(8, '[email protected]'),
(9, '[email protected]'),
(10, '[email protected]');

Learnings
• Using regular expressions to match patterns in email addresses.
• Extracting domain parts from emails using string functions.
• Filtering emails based on specific patterns such as non-standard Gmail domains.
Solutions
• - PostgreSQL solution
SELECT
user_id,
email,
SUBSTRING(email FROM '@(.*)$') AS domain
FROM users
WHERE email ~* '^.*@gmail\..+'
AND email !~* '@gmail\.com$'
ORDER BY user_id;
• - MySQL solution
SELECT
user_id,
email,
SUBSTRING_INDEX(email, '@', -1) AS domain
FROM users
WHERE email REGEXP '^.*@gmail\\..+'
AND email NOT REGEXP '@gmail\\.com$'
ORDER BY user_id;

Explanation
• PostgreSQL:
• email ~* '^.*@gmail\..+': This regular expression matches emails that are from Gmail
but with additional subdomains (i.e., gmail followed by any subdomain).
• email !~* '@gmail\.com$': This ensures the email does not end with "gmail.com",
excluding standard Gmail addresses.
• SUBSTRING(email FROM '@(.*)$'): Extracts the domain part from the email.

• MySQL:
• REGEXP '^.*@gmail\\..+': Matches Gmail emails with subdomains.
• NOT REGEXP '@gmail\\.com$': Ensures the email address does not end with
"gmail.com".
• SUBSTRING_INDEX(email, '@', -1): Extracts the domain part of the email.
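Outside the database, the same filter is easy to express with Python's re module. This is a hypothetical sketch: the sample addresses in this copy of the eBook are redacted, so the emails below are made up to illustrate the pattern:

```python
import re

# Hypothetical user data (the original sample emails are redacted in this copy)
emails = [
    (1, "alice@gmail.com"),       # standard Gmail: excluded
    (2, "bob@mail.gmail.com"),    # '@' not directly before 'gmail.': not matched
    (3, "carol@gmail.co.uk"),     # Gmail with a non-standard domain: matched
    (4, "dan@yahoo.com"),         # not Gmail at all
]

matches = [
    (user_id, email, email.split("@", 1)[1])   # domain part after '@'
    for user_id, email in emails
    if re.search(r"@gmail\..+$", email, re.IGNORECASE)          # domain starts with gmail.
    and not re.search(r"@gmail\.com$", email, re.IGNORECASE)    # but is not gmail.com
]
print(matches)  # [(3, 'carol@gmail.co.uk', 'gmail.co.uk')]
```

As with the SQL regex, a domain such as mail.gmail.com is not matched, because the pattern requires "gmail." to follow the '@' directly.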
• Q.311

Question
Write a SQL query to find employees who have the highest salary in each of the departments.

Explanation
To solve this, you need to join the Employee table with the Department table, group by the
DepartmentId, and select the employee(s) with the highest salary in each department. This
can be achieved using a subquery or JOIN with aggregation.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Employee (
Id INT PRIMARY KEY,
Name VARCHAR(100),
Salary DECIMAL,
DepartmentId INT
);

CREATE TABLE Department (


Id INT PRIMARY KEY,
Name VARCHAR(100)
);

-- Sample data insertions


INSERT INTO Employee (Id, Name, Salary, DepartmentId)
VALUES
(1, 'Joe', 70000, 1),
(2, 'Jim', 90000, 1),
(3, 'Henry', 80000, 2),
(4, 'Sam', 60000, 2),
(5, 'Max', 90000, 1);

INSERT INTO Department (Id, Name)


VALUES
(1, 'IT'),
(2, 'Sales');

Learnings
• Use of JOIN to combine related data from different tables.
• GROUP BY and MAX() to find the highest salary in each department.
• Using subqueries to filter out employees with the highest salary.

Solutions
• - PostgreSQL solution
SELECT d.Name AS Department, e.Name AS Employee, e.Salary
FROM Employee e
JOIN Department d ON e.DepartmentId = d.Id
WHERE e.Salary = (
SELECT MAX(Salary)
FROM Employee
WHERE DepartmentId = e.DepartmentId
);

• - MySQL solution
SELECT d.Name AS Department, e.Name AS Employee, e.Salary
FROM Employee e
JOIN Department d ON e.DepartmentId = d.Id
WHERE e.Salary = (
SELECT MAX(Salary)
FROM Employee
WHERE DepartmentId = e.DepartmentId
);
• Q.312

Question
Write a SQL query to print the node id and the type of the node (Root, Inner, Leaf). Sort the
result by the node id.

Explanation
To classify the nodes into Root, Inner, or Leaf:
• A Root node has p_id as NULL.
• An Inner node has a parent (p_id is not NULL) and at least one child.
• A Leaf node has a parent (p_id is not NULL) and no children.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE tree (
id INT PRIMARY KEY,
p_id INT
);

-- Sample data insertions


INSERT INTO tree (id, p_id)
VALUES
(1, NULL),
(2, 1),
(3, 1),
(4, 2),
(5, 2);

Learnings
• Use of CASE statements to categorize the nodes.
• Identifying child nodes by checking for the absence or presence of a node with a matching
p_id.
• Sorting the output by node id.

Solutions
• - PostgreSQL solution
SELECT id,
CASE
WHEN p_id IS NULL THEN 'Root'
WHEN id IN (SELECT DISTINCT p_id FROM tree WHERE p_id IS NOT NULL) THEN 'Inner'
ELSE 'Leaf'
END AS Type
FROM tree
ORDER BY id;
• - MySQL solution
SELECT id,
       CASE
           WHEN p_id IS NULL THEN 'Root'
           WHEN id IN (SELECT DISTINCT p_id FROM tree WHERE p_id IS NOT NULL) THEN 'Inner'
           ELSE 'Leaf'
       END AS Type
FROM tree
ORDER BY id;
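A quick check with Python's built-in sqlite3 module (a sketch, not part of the original answer) shows how each node is classified on the sample tree:

```python
# Classify each node as Root, Inner, or Leaf.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tree (id INT PRIMARY KEY, p_id INT);
INSERT INTO tree VALUES (1, NULL), (2, 1), (3, 1), (4, 2), (5, 2);
""")
rows = conn.execute("""
    SELECT id,
           CASE
               WHEN p_id IS NULL THEN 'Root'
               WHEN id IN (SELECT DISTINCT p_id FROM tree
                           WHERE p_id IS NOT NULL) THEN 'Inner'
               ELSE 'Leaf'
           END AS Type
    FROM tree
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 'Root'), (2, 'Inner'), (3, 'Leaf'), (4, 'Leaf'), (5, 'Leaf')]
```

Node 3 has a parent but no children, so it is a Leaf even though its sibling 2 is Inner.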
• Q.313

Question
Write a SQL query to report the league statistics, including matches played, points, goals
scored, goals conceded, and goal difference for each team.

Explanation
To calculate the statistics:
• matches_played: Count the number of times the team appears as a home or away team.
• points: 3 points for a win, 0 points for a loss, and 1 point for a draw.
• goal_for: The total goals scored by the team in all matches.
• goal_against: The total goals conceded by the team in all matches.
• goal_diff: Calculated as goal_for - goal_against.
The results need to be sorted by:
• points in descending order.
• If points are tied, by goal_diff in descending order.
• If both points and goal_diff are tied, by team_name lexicographically.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Teams (
team_id INT PRIMARY KEY,
team_name VARCHAR(100)
);

CREATE TABLE Matches (


home_team_id INT,
away_team_id INT,
home_team_goals INT,
away_team_goals INT,
PRIMARY KEY(home_team_id, away_team_id)
);

-- Sample data insertions


INSERT INTO Teams (team_id, team_name)
VALUES
(1, 'Team A'),
(2, 'Team B'),
(3, 'Team C');

INSERT INTO Matches (home_team_id, away_team_id, home_team_goals, away_team_goals)


VALUES
(1, 2, 2, 1),
(1, 3, 1, 1),
(2, 3, 3, 2),
(2, 1, 0, 1),
(3, 1, 1, 2);

Learnings
• Combining data from multiple tables using JOIN.

• Using conditional aggregation to calculate points, goals scored, and goals conceded.
• Sorting with multiple criteria using ORDER BY.

Solutions
• - PostgreSQL solution
SELECT t.team_name,
       -- each joined row is one match, so counting one (non-null) side is enough;
       -- summing both sides would double-count every match
       COUNT(m.home_team_id) AS matches_played,
       SUM(CASE
               WHEN (m.home_team_id = t.team_id AND m.home_team_goals > m.away_team_goals)
                 OR (m.away_team_id = t.team_id AND m.away_team_goals > m.home_team_goals) THEN 3
               WHEN m.home_team_goals = m.away_team_goals THEN 1
               ELSE 0
           END) AS points,
       SUM(CASE
               WHEN m.home_team_id = t.team_id THEN m.home_team_goals
               WHEN m.away_team_id = t.team_id THEN m.away_team_goals
           END) AS goal_for,
       SUM(CASE
               WHEN m.home_team_id = t.team_id THEN m.away_team_goals
               WHEN m.away_team_id = t.team_id THEN m.home_team_goals
           END) AS goal_against,
       SUM(CASE
               WHEN m.home_team_id = t.team_id THEN m.home_team_goals
               WHEN m.away_team_id = t.team_id THEN m.away_team_goals
           END) - SUM(CASE
               WHEN m.home_team_id = t.team_id THEN m.away_team_goals
               WHEN m.away_team_id = t.team_id THEN m.home_team_goals
           END) AS goal_diff
FROM Teams t
LEFT JOIN Matches m ON t.team_id = m.home_team_id OR t.team_id = m.away_team_id
GROUP BY t.team_name
ORDER BY points DESC, goal_diff DESC, t.team_name;
• - MySQL solution
SELECT t.team_name,
       -- each joined row is one match, so counting one (non-null) side is enough
       COUNT(m.home_team_id) AS matches_played,
       SUM(CASE
               WHEN (m.home_team_id = t.team_id AND m.home_team_goals > m.away_team_goals)
                 OR (m.away_team_id = t.team_id AND m.away_team_goals > m.home_team_goals) THEN 3
               WHEN m.home_team_goals = m.away_team_goals THEN 1
               ELSE 0
           END) AS points,
       SUM(CASE
               WHEN m.home_team_id = t.team_id THEN m.home_team_goals
               WHEN m.away_team_id = t.team_id THEN m.away_team_goals
           END) AS goal_for,
       SUM(CASE
               WHEN m.home_team_id = t.team_id THEN m.away_team_goals
               WHEN m.away_team_id = t.team_id THEN m.home_team_goals
           END) AS goal_against,
       SUM(CASE
               WHEN m.home_team_id = t.team_id THEN m.home_team_goals
               WHEN m.away_team_id = t.team_id THEN m.away_team_goals
           END) - SUM(CASE
               WHEN m.home_team_id = t.team_id THEN m.away_team_goals
               WHEN m.away_team_id = t.team_id THEN m.home_team_goals
           END) AS goal_diff
FROM Teams t
LEFT JOIN Matches m ON t.team_id = m.home_team_id OR t.team_id = m.away_team_id
GROUP BY t.team_name
ORDER BY points DESC, goal_diff DESC, t.team_name;
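The conditional-aggregation logic is easy to cross-check in plain Python (a sketch, not part of the original answer): walk each match once for the home side and once for the away side, accumulating the same statistics the CASE expressions compute:

```python
# Pure-Python league table for the sample matches.
teams = {1: "Team A", 2: "Team B", 3: "Team C"}
matches = [(1, 2, 2, 1), (1, 3, 1, 1), (2, 3, 3, 2), (2, 1, 0, 1), (3, 1, 1, 2)]

stats = {name: {"played": 0, "points": 0, "gf": 0, "ga": 0} for name in teams.values()}
for home, away, hg, ag in matches:
    # view the match from each side: (team, goals scored, goals conceded)
    for team_id, scored, conceded in ((home, hg, ag), (away, ag, hg)):
        s = stats[teams[team_id]]
        s["played"] += 1
        s["gf"] += scored
        s["ga"] += conceded
        s["points"] += 3 if scored > conceded else (1 if scored == conceded else 0)

# (team_name, matches_played, points, goal_diff), same sort keys as the SQL
table = sorted(
    ((name, s["played"], s["points"], s["gf"] - s["ga"]) for name, s in stats.items()),
    key=lambda r: (-r[2], -r[3], r[0]),
)
print(table)  # [('Team A', 4, 10, 3), ('Team B', 3, 3, -1), ('Team C', 3, 1, -2)]
```

Team A plays 4 matches here, which is a handy check that matches_played counts each match once rather than twice.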
• Q.314

Question

Write a SQL query to generate a report of the continuous periods where tasks either failed or
succeeded between 2019-01-01 and 2019-12-31. The report should include start_date,
end_date, and period_state (either 'failed' or 'succeeded'). Order the result by start_date.

Explanation
To solve this:
• Combine both the Failed and Succeeded tables and select the dates within the period
2019-01-01 to 2019-12-31.
• Group consecutive days together using a method to identify continuous periods of success
or failure.
• For each period, determine if the task was failed or succeeded.
• Output the start_date, end_date, and period_state for each continuous period,
ordering by start_date.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Failed (
fail_date DATE PRIMARY KEY
);

CREATE TABLE Succeeded (


success_date DATE PRIMARY KEY
);

-- Sample data insertions


INSERT INTO Failed (fail_date)
VALUES
('2018-12-28'),
('2018-12-29'),
('2019-01-04'),
('2019-01-05');

INSERT INTO Succeeded (success_date)


VALUES
('2018-12-30'),
('2018-12-31'),
('2019-01-01'),
('2019-01-02'),
('2019-01-03'),
('2019-01-06');

Learnings
• Combining data from multiple tables using UNION.
• Identifying consecutive days using GROUP BY and ROW_NUMBER() for windowing functions.
• Using CASE and conditional logic to determine if the period is failed or succeeded.

Solutions
• - PostgreSQL solution
WITH Combined AS (
SELECT fail_date AS task_date, 'failed' AS period_state
FROM Failed
WHERE fail_date BETWEEN '2019-01-01' AND '2019-12-31'
UNION ALL
SELECT success_date AS task_date, 'succeeded' AS period_state
FROM Succeeded
WHERE success_date BETWEEN '2019-01-01' AND '2019-12-31'
),
Ranked AS (
SELECT task_date,
period_state,
ROW_NUMBER() OVER (ORDER BY task_date) -
ROW_NUMBER() OVER (PARTITION BY period_state ORDER BY task_date) AS grp
FROM Combined
)
SELECT MIN(task_date) AS start_date,
MAX(task_date) AS end_date,
period_state
FROM Ranked
GROUP BY period_state, grp
ORDER BY start_date;
• - MySQL solution
WITH Combined AS (
SELECT fail_date AS task_date, 'failed' AS period_state
FROM Failed
WHERE fail_date BETWEEN '2019-01-01' AND '2019-12-31'
UNION ALL
SELECT success_date AS task_date, 'succeeded' AS period_state
FROM Succeeded
WHERE success_date BETWEEN '2019-01-01' AND '2019-12-31'
),
Ranked AS (
SELECT task_date,
period_state,
ROW_NUMBER() OVER (ORDER BY task_date) -
ROW_NUMBER() OVER (PARTITION BY period_state ORDER BY task_date) AS grp
FROM Combined
)
SELECT MIN(task_date) AS start_date,
MAX(task_date) AS end_date,
period_state
FROM Ranked
GROUP BY period_state, grp
ORDER BY start_date;
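The row-number-difference grouping can be exercised end to end with Python's built-in sqlite3 module (a sketch, not part of the original answer; SQLite 3.25+ for window functions). Dates from 2018 are filtered out before grouping:

```python
# Continuous failed/succeeded periods within 2019.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Failed (fail_date TEXT);
CREATE TABLE Succeeded (success_date TEXT);
INSERT INTO Failed VALUES ('2018-12-28'),('2018-12-29'),('2019-01-04'),('2019-01-05');
INSERT INTO Succeeded VALUES ('2018-12-30'),('2018-12-31'),('2019-01-01'),
    ('2019-01-02'),('2019-01-03'),('2019-01-06');
""")
rows = conn.execute("""
WITH Combined AS (
    SELECT fail_date AS task_date, 'failed' AS period_state FROM Failed
    WHERE fail_date BETWEEN '2019-01-01' AND '2019-12-31'
    UNION ALL
    SELECT success_date, 'succeeded' FROM Succeeded
    WHERE success_date BETWEEN '2019-01-01' AND '2019-12-31'
),
Ranked AS (
    -- global row number minus per-state row number is constant within a run
    SELECT task_date, period_state,
           ROW_NUMBER() OVER (ORDER BY task_date) -
           ROW_NUMBER() OVER (PARTITION BY period_state ORDER BY task_date) AS grp
    FROM Combined
)
SELECT MIN(task_date) AS start_date, MAX(task_date) AS end_date, period_state
FROM Ranked
GROUP BY period_state, grp
ORDER BY start_date
""").fetchall()
print(rows)
```

The sample data yields three periods: succeeded 01-01 to 01-03, failed 01-04 to 01-05, then a one-day succeeded period on 01-06.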
• Q.315

Question
Write an SQL query to report the current credit balance for each user after processing all
transactions, and check if they have breached their credit limit (i.e., if their credit balance is
less than 0). The result should include user_id, user_name, credit (current balance), and
credit_limit_breached (with values 'Yes' or 'No').

Explanation
To solve this:
• Calculate the total transactions for each user by summing the amounts where the user is
either the paid_by or paid_to in the Transactions table.
• Update the user's credit balance by adjusting it for both money paid and received:
• If the user is the payer (paid_by), subtract the transaction amount.
• If the user is the receiver (paid_to), add the transaction amount.
• Check whether the updated credit balance is below 0 to determine if the user has breached
their credit limit.
• Return the user details along with their current balance and whether they have breached the
credit limit.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Users (
user_id INT PRIMARY KEY,
user_name VARCHAR(100),
credit INT
);

CREATE TABLE Transactions (


trans_id INT PRIMARY KEY,
paid_by INT,
paid_to INT,
amount INT,
transacted_on DATE
);

-- Sample data insertions


INSERT INTO Users (user_id, user_name, credit)
VALUES
(1, 'Moustafa', 100),
(2, 'Jonathan', 200),
(3, 'Winston', 10000),
(4, 'Luis', 800);

INSERT INTO Transactions (trans_id, paid_by, paid_to, amount, transacted_on)


VALUES
(1, 1, 3, 400, '2020-08-01'),
(2, 3, 2, 500, '2020-08-02'),
(3, 2, 1, 200, '2020-08-03');

Learnings
• SUM() and conditional aggregation to calculate net transaction changes for each user.
• JOIN to combine Users and Transactions tables.
• Conditional logic with CASE to check credit breaches.

Solutions
• - PostgreSQL solution
SELECT u.user_id,
       u.user_name,
       u.credit
         - COALESCE(SUM(CASE WHEN t.paid_by = u.user_id THEN t.amount ELSE 0 END), 0)
         + COALESCE(SUM(CASE WHEN t.paid_to = u.user_id THEN t.amount ELSE 0 END), 0) AS credit,
       CASE
           WHEN u.credit
                  - COALESCE(SUM(CASE WHEN t.paid_by = u.user_id THEN t.amount ELSE 0 END), 0)
                  + COALESCE(SUM(CASE WHEN t.paid_to = u.user_id THEN t.amount ELSE 0 END), 0) < 0
           THEN 'Yes'
           ELSE 'No'
       END AS credit_limit_breached
FROM Users u
LEFT JOIN Transactions t ON u.user_id = t.paid_by OR u.user_id = t.paid_to
GROUP BY u.user_id
ORDER BY u.user_id;
• - MySQL solution
SELECT u.user_id,
       u.user_name,
       u.credit
         - COALESCE(SUM(CASE WHEN t.paid_by = u.user_id THEN t.amount ELSE 0 END), 0)
         + COALESCE(SUM(CASE WHEN t.paid_to = u.user_id THEN t.amount ELSE 0 END), 0) AS credit,
       CASE
           WHEN u.credit
                  - COALESCE(SUM(CASE WHEN t.paid_by = u.user_id THEN t.amount ELSE 0 END), 0)
                  + COALESCE(SUM(CASE WHEN t.paid_to = u.user_id THEN t.amount ELSE 0 END), 0) < 0
           THEN 'Yes'
           ELSE 'No'
       END AS credit_limit_breached
FROM Users u
LEFT JOIN Transactions t ON u.user_id = t.paid_by OR u.user_id = t.paid_to
GROUP BY u.user_id
ORDER BY u.user_id;
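The balance arithmetic is simple enough to replay in plain Python (a sketch, not part of the original answer): subtract each payment from the payer and add it to the receiver, then flag negative balances:

```python
# Replay the sample transactions and flag credit-limit breaches.
users = {1: ("Moustafa", 100), 2: ("Jonathan", 200), 3: ("Winston", 10000), 4: ("Luis", 800)}
transactions = [(1, 3, 400), (3, 2, 500), (2, 1, 200)]  # (paid_by, paid_to, amount)

balances = {user_id: credit for user_id, (_, credit) in users.items()}
for paid_by, paid_to, amount in transactions:
    balances[paid_by] -= amount
    balances[paid_to] += amount

report = [
    (user_id, users[user_id][0], balance, "Yes" if balance < 0 else "No")
    for user_id, balance in sorted(balances.items())
]
print(report)
```

Moustafa pays 400 but only receives 200, ending at -100, so he is the only user who breaches the limit on this sample.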
• Q.316

Question
Write an SQL query to find the countries where the telecommunications company can invest.
The company wants to invest in countries where the average call duration is strictly greater
than the global average call duration.

Explanation
To solve this:
• Calculate the global average call duration by averaging the duration of all calls.
• Calculate the average call duration for each country by joining the Calls, Person, and
Country tables. This requires:
• Mapping each caller's and callee's phone number to their country based on the country
code.
• Grouping the calls by country and calculating the average duration for each.
• Compare the average call duration of each country with the global average and return
the countries where the country's average duration is greater than the global average.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Person (
id INT PRIMARY KEY,
name VARCHAR(100),
phone_number VARCHAR(100)
);

CREATE TABLE Country (


name VARCHAR(100),
country_code VARCHAR(3) PRIMARY KEY
);

CREATE TABLE Calls (


caller_id INT,
callee_id INT,
duration INT
);

-- Sample data insertions


INSERT INTO Person (id, name, phone_number)
VALUES
(3, 'Jonathan', '051-1234567'),
(12, 'Elvis', '051-7654321'),
(1, 'Moncef', '212-1234567'),
(2, 'Maroua', '212-6523651'),
(7, 'Meir', '972-1234567'),
(9, 'Rachel', '972-0011100');

INSERT INTO Country (name, country_code)


VALUES
('Peru', '051'),
('Israel', '972'),
('Morocco', '212'),
('Germany', '049'),
('Ethiopia', '251');

INSERT INTO Calls (caller_id, callee_id, duration)


VALUES
(1, 9, 33),
(2, 9, 4),
(1, 2, 59),
(3, 12, 102),
(3, 12, 330),
(12, 3, 5),

397
1000+ SQL Interview Questions & Answers | By Zero Analyst

(7, 9, 13),
(7, 1, 3),
(9, 7, 1),
(1, 7, 7);

Learnings
• Using JOIN to combine multiple tables based on relationships.
• Aggregating with AVG() to calculate averages.
• Filtering results based on comparison of averages (global vs. local).
• Dealing with LEFT JOIN for country code lookups.

Solutions
• - PostgreSQL solution
WITH GlobalAverage AS (
    SELECT AVG(duration) AS global_avg
    FROM Calls
),
CallCountries AS (
    -- one row per call for the caller's country, and one for the callee's
    SELECT co.name AS country, c.duration
    FROM Calls c
    JOIN Person p ON c.caller_id = p.id
    JOIN Country co ON SUBSTRING(p.phone_number FROM 1 FOR 3) = co.country_code
    UNION ALL
    SELECT co.name, c.duration
    FROM Calls c
    JOIN Person p ON c.callee_id = p.id
    JOIN Country co ON SUBSTRING(p.phone_number FROM 1 FOR 3) = co.country_code
),
CountryAverage AS (
    SELECT country, AVG(duration) AS country_avg
    FROM CallCountries
    GROUP BY country
)
SELECT ca.country
FROM CountryAverage ca, GlobalAverage ga
WHERE ca.country_avg > ga.global_avg;
• - MySQL solution
WITH GlobalAverage AS (
    SELECT AVG(duration) AS global_avg
    FROM Calls
),
CallCountries AS (
    -- one row per call for the caller's country, and one for the callee's
    SELECT co.name AS country, c.duration
    FROM Calls c
    JOIN Person p ON c.caller_id = p.id
    JOIN Country co ON SUBSTRING(p.phone_number, 1, 3) = co.country_code
    UNION ALL
    SELECT co.name, c.duration
    FROM Calls c
    JOIN Person p ON c.callee_id = p.id
    JOIN Country co ON SUBSTRING(p.phone_number, 1, 3) = co.country_code
),
CountryAverage AS (
    SELECT country, AVG(duration) AS country_avg
    FROM CallCountries
    GROUP BY country
)
SELECT ca.country
FROM CountryAverage ca, GlobalAverage ga
WHERE ca.country_avg > ga.global_avg;
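The averages can be checked in plain Python (a sketch, not part of the original answer). Each call is counted once toward the caller's country and once toward the callee's, one common interpretation of this problem; on this sample the answer is the same if only the caller's country is used:

```python
# Country call-duration averages vs. the global average.
persons = {3: "051", 12: "051", 1: "212", 2: "212", 7: "972", 9: "972"}  # id -> country code
countries = {"051": "Peru", "972": "Israel", "212": "Morocco"}
calls = [(1, 9, 33), (2, 9, 4), (1, 2, 59), (3, 12, 102), (3, 12, 330),
         (12, 3, 5), (7, 9, 13), (7, 1, 3), (9, 7, 1), (1, 7, 7)]

# Global average over the raw call list (each call counted once)
global_avg = sum(d for _, _, d in calls) / len(calls)

# Per-country durations: each call contributes to both participants' countries
durations = {}
for caller, callee, d in calls:
    for person_id in (caller, callee):
        durations.setdefault(countries[persons[person_id]], []).append(d)

invest = sorted(c for c, ds in durations.items() if sum(ds) / len(ds) > global_avg)
print(invest)  # ['Peru']
```

Peru's long calls (102, 330) push its average far above the global 55.7, while Israel and Morocco stay below it.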
• Q.317
• Q.318

Question
Write an SQL query to find the bank accounts where the total income from deposits exceeds
the max_income for two or more consecutive months. The total income of an account in a
month is the sum of all its 'Creditor' transactions during that month.

Explanation
To solve this:
• Extract the monthly total income for each account:

• We will group the Transactions by account_id and month (using YEAR() and MONTH()
functions).
• We will sum the amount for transactions of type 'Creditor' for each account and month.
• Identify accounts with consecutive months where the total income exceeds
max_income:
• For each account, check if the total income for a month exceeds max_income and then
check if this occurs for two or more consecutive months.
• Return the account_id of suspicious accounts:
• An account is suspicious if its total income exceeds the max_income for two or more
consecutive months.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Accounts (
account_id INT PRIMARY KEY,
max_income INT
);

CREATE TABLE Transactions (


transaction_id INT PRIMARY KEY,
account_id INT,
type ENUM('Creditor', 'Debtor'),
amount INT,
day DATETIME
);

-- Sample data insertions


INSERT INTO Accounts (account_id, max_income)
VALUES
(3, 21000),
(4, 10400);

INSERT INTO Transactions (transaction_id, account_id, type, amount, day)


VALUES
(2, 3, 'Creditor', 107100, '2021-06-02 11:38:14'),
(4, 4, 'Creditor', 10400, '2021-06-20 12:39:18'),
(11, 4, 'Debtor', 58800, '2021-07-23 12:41:55'),
(1, 4, 'Creditor', 49300, '2021-05-03 16:11:04'),
(15, 3, 'Debtor', 75500, '2021-05-23 14:40:20'),
(10, 3, 'Creditor', 102100, '2021-06-15 10:37:16'),
(14, 4, 'Creditor', 56300, '2021-07-21 12:12:25'),
(19, 4, 'Debtor', 101100, '2021-05-09 15:21:49'),
(8, 3, 'Creditor', 64900, '2021-07-26 15:09:56'),
(7, 3, 'Creditor', 90900, '2021-06-14 11:23:07');

Learnings
• Using SUM() to calculate total income for each account in each month.
• Grouping by account_id, year, and month.
• Using JOIN to filter out accounts where the total income exceeds the max_income for
consecutive months.
• Window functions or subqueries to detect consecutive months with suspicious activity.

Solutions
• - PostgreSQL solution
WITH MonthlyIncome AS (
    SELECT account_id,
           EXTRACT(YEAR FROM day) AS year,
           EXTRACT(MONTH FROM day) AS month,
           SUM(amount) AS total_income
    FROM Transactions
    WHERE type = 'Creditor'
    GROUP BY account_id, EXTRACT(YEAR FROM day), EXTRACT(MONTH FROM day)
),
SuspiciousAccounts AS (
    SELECT m1.account_id
    FROM MonthlyIncome m1
    JOIN MonthlyIncome m2
      ON m1.account_id = m2.account_id
     AND ((m1.year = m2.year AND m1.month + 1 = m2.month)
          OR (m1.year + 1 = m2.year AND m1.month = 12 AND m2.month = 1))
    JOIN Accounts a
      ON m1.account_id = a.account_id
    WHERE m1.total_income > a.max_income
      AND m2.total_income > a.max_income
)
SELECT DISTINCT account_id
FROM SuspiciousAccounts
ORDER BY account_id;
• - MySQL solution
WITH MonthlyIncome AS (
SELECT account_id,
YEAR(day) AS year,
MONTH(day) AS month,
SUM(amount) AS total_income
FROM Transactions
WHERE type = 'Creditor'
GROUP BY account_id, YEAR(day), MONTH(day)
),
SuspiciousAccounts AS (
SELECT m1.account_id
FROM MonthlyIncome m1
JOIN MonthlyIncome m2
ON m1.account_id = m2.account_id
AND ((m1.year = m2.year AND m1.month + 1 = m2.month)
OR (m1.year + 1 = m2.year AND m1.month = 12 AND m2.month = 1))
JOIN Accounts a
ON m1.account_id = a.account_id
WHERE m1.total_income > a.max_income
AND m2.total_income > a.max_income
)
SELECT DISTINCT account_id
FROM SuspiciousAccounts
ORDER BY account_id;
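The consecutive-month check can be replayed in plain Python (a sketch, not part of the original answer). Mapping each month to a single integer (year * 12 + month) makes the "next month" test work across year boundaries, just like the year/month conditions in the SQL join:

```python
# Flag accounts whose creditor income exceeds max_income in consecutive months.
from collections import defaultdict

max_income = {3: 21000, 4: 10400}
# (account_id, type, amount, 'YYYY-MM') taken from the sample transactions
txns = [(3, "Creditor", 107100, "2021-06"), (4, "Creditor", 10400, "2021-06"),
        (4, "Debtor", 58800, "2021-07"), (4, "Creditor", 49300, "2021-05"),
        (3, "Debtor", 75500, "2021-05"), (3, "Creditor", 102100, "2021-06"),
        (4, "Creditor", 56300, "2021-07"), (4, "Debtor", 101100, "2021-05"),
        (3, "Creditor", 64900, "2021-07"), (3, "Creditor", 90900, "2021-06")]

# Monthly creditor income per account
income = defaultdict(int)
for acct, typ, amount, month in txns:
    if typ == "Creditor":
        income[(acct, month)] += amount

def month_index(month):
    # 'YYYY-MM' -> single integer so that +1 also crosses December/January
    year, mo = map(int, month.split("-"))
    return year * 12 + mo

over = defaultdict(set)
for (acct, month), total in income.items():
    if total > max_income[acct]:
        over[acct].add(month_index(month))

suspicious = sorted(acct for acct, months in over.items()
                    if any(m + 1 in months for m in months))
print(suspicious)  # [3]
```

Account 4 exceeds its limit in May and July but earns exactly 10400 in June (not strictly greater), so only account 3 is suspicious.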
• Q.319

Question
Write a query to find the number of students majoring in each department. Include all
departments, even those with no students. Sort the result by the number of students in
descending order, and in case of ties, alphabetically by the department name.

Explanation
To solve this:
• Join the tables: We will join the student and department tables using the dept_id
column.
• Use a LEFT JOIN to ensure that all departments are included, even those without
students.
• Count the students per department: After the join, we will count the number of students
for each department.
• Handle sorting: We will order the result by the number of students in descending order,
and by department name alphabetically in case of ties.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE student (
student_id INT PRIMARY KEY,
student_name VARCHAR(255),
gender CHAR(1),
dept_id INT
);

CREATE TABLE department (


dept_id INT PRIMARY KEY,
dept_name VARCHAR(255)
);

-- Sample data insertions


INSERT INTO student (student_id, student_name, gender, dept_id)
VALUES
(1, 'Jack', 'M', 1),
(2, 'Jane', 'F', 1),
(3, 'Mark', 'M', 2);

INSERT INTO department (dept_id, dept_name)


VALUES
(1, 'Engineering'),
(2, 'Science'),
(3, 'Law');

Learnings
• Using LEFT JOIN to include all rows from one table even when there are no matching rows
in the other table.
• Using COUNT() with GROUP BY to aggregate results.
• Sorting by multiple criteria using ORDER BY.

Solutions
• - PostgreSQL solution
SELECT d.dept_name,
COUNT(s.student_id) AS student_number
FROM department d
LEFT JOIN student s ON d.dept_id = s.dept_id
GROUP BY d.dept_name
ORDER BY student_number DESC, d.dept_name;
• - MySQL solution
SELECT d.dept_name,
COUNT(s.student_id) AS student_number
FROM department d
LEFT JOIN student s ON d.dept_id = s.dept_id
GROUP BY d.dept_name
ORDER BY student_number DESC, d.dept_name;
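The LEFT JOIN behavior, in particular the zero count for Law, is easy to confirm with Python's built-in sqlite3 module (a sketch, not part of the original answer):

```python
# Students per department, including departments with no students.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INT, student_name TEXT, gender TEXT, dept_id INT);
CREATE TABLE department (dept_id INT, dept_name TEXT);
INSERT INTO student VALUES (1,'Jack','M',1),(2,'Jane','F',1),(3,'Mark','M',2);
INSERT INTO department VALUES (1,'Engineering'),(2,'Science'),(3,'Law');
""")
rows = conn.execute("""
    SELECT d.dept_name, COUNT(s.student_id) AS student_number
    FROM department d
    LEFT JOIN student s ON d.dept_id = s.dept_id
    GROUP BY d.dept_name
    ORDER BY student_number DESC, d.dept_name
""").fetchall()
print(rows)  # [('Engineering', 2), ('Science', 1), ('Law', 0)]
```

COUNT(s.student_id) rather than COUNT(*) is what makes Law come out as 0: the unmatched row contributes a NULL, which COUNT of a column ignores.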
• Q.320

Question
Write an SQL query to calculate the quality of each query. The quality is defined as the
average of the ratio between the query's rating and its position. The result should be rounded
to two decimal places.

Explanation
• Calculating the ratio: For each row, the ratio is the rating divided by the position:
Ratio = rating / position

• Aggregating by query_name: For each query, we need to calculate the average of these
ratios.
• Filtering poor queries: Since the problem doesn't specifically ask to exclude poor queries
(rating less than 3), we will include all rows in the calculation.
• Rounding the result: The final average ratio (quality) should be rounded to 2 decimal
places.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Queries (
query_name VARCHAR(255),
result VARCHAR(255),
position INT,
rating INT
);

-- Sample data insertions


INSERT INTO Queries (query_name, result, position, rating)
VALUES
('Dog', 'Golden Retriever', 1, 5),
('Dog', 'German Shepherd', 2, 5),
('Dog', 'Mule', 200, 1),
('Cat', 'Shirazi', 5, 2),
('Cat', 'Siamese', 3, 3),
('Cat', 'Sphynx', 7, 4);

Learnings
• Using AVG() to calculate the average of a calculated expression.
• Using ROUND() to round the result to a specified number of decimal places.

Solutions
• - PostgreSQL solution
SELECT query_name,
ROUND(AVG(rating::FLOAT / position), 2) AS quality
FROM Queries
GROUP BY query_name;
• - MySQL solution
SELECT query_name,
ROUND(AVG(rating / position), 2) AS quality
FROM Queries
GROUP BY query_name;
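The ratio-and-average arithmetic can be verified in plain Python (a sketch, not part of the original answer):

```python
# Average of rating/position per query, rounded to 2 decimal places.
queries = [("Dog", 1, 5), ("Dog", 2, 5), ("Dog", 200, 1),
           ("Cat", 5, 2), ("Cat", 3, 3), ("Cat", 7, 4)]  # (query_name, position, rating)

ratios = {}
for name, position, rating in queries:
    ratios.setdefault(name, []).append(rating / position)

quality = {name: round(sum(rs) / len(rs), 2) for name, rs in ratios.items()}
print(quality)  # {'Dog': 2.5, 'Cat': 0.66}
```

For 'Dog' the ratios are 5/1, 5/2, and 1/200, averaging to about 2.50; the low-rated result at position 200 barely moves the average because its ratio is tiny.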

Walmart
• Q.321
Question
Find the longest sequence of consecutive days a Walmart customer has made purchases.
Output the customer ID, the start date, the end date, and the number of consecutive days they
made purchases.
Explanation
To solve this:
• Identify consecutive purchase dates for each customer. A sequence of consecutive dates is
defined by the difference between each order date and the previous one being exactly 1 day.
• Group the purchases by customer and identify streaks of consecutive dates.
• Calculate the length of each streak, and then output the longest streak per customer.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE walmart_orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE
);

-- Sample data insertions


INSERT INTO walmart_orders (order_id, customer_id, order_date)
VALUES
(1, 101, '2023-01-01'),
(2, 101, '2023-01-02'),
(3, 101, '2023-01-04'),
(4, 101, '2023-01-05'),
(5, 102, '2023-01-03'),
(6, 102, '2023-01-04'),
(7, 102, '2023-01-05'),
(8, 103, '2023-01-01'),
(9, 103, '2023-01-03'),
(10, 103, '2023-01-04'),
(11, 104, '2023-01-02'),
(12, 104, '2023-01-03'),
(13, 104, '2023-01-04'),
(14, 104, '2023-01-06');

Learnings
• Using window functions like LEAD() or LAG() to find consecutive dates.
• Using date difference logic to identify consecutive streaks.
• Grouping and aggregating based on streaks and customer IDs.
• Handling edge cases where streaks might have gaps.
Solutions
• - PostgreSQL solution
WITH consecutive_dates AS (
    SELECT
        customer_id,
        order_date,
        -- Subtracting the row number (as days) from each date yields a value
        -- that is constant within every run of consecutive dates
        order_date - (ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date))::int AS streak_group
    FROM walmart_orders
),
streak_lengths AS (
    SELECT
        customer_id,
        MIN(order_date) AS streak_start,
        MAX(order_date) AS streak_end,
        COUNT(*) AS streak_length
    FROM consecutive_dates
    GROUP BY customer_id, streak_group
)
SELECT customer_id, streak_start, streak_end, streak_length
FROM streak_lengths sl
WHERE streak_length = (
    SELECT MAX(streak_length)
    FROM streak_lengths sl2
    WHERE sl2.customer_id = sl.customer_id
)
ORDER BY customer_id;
• - MySQL solution (MySQL 8.0+)
WITH numbered AS (
    SELECT customer_id, order_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS rn
    FROM walmart_orders
),
consecutive_dates AS (
    -- constant within each run of consecutive dates
    SELECT customer_id, order_date,
           DATE_SUB(order_date, INTERVAL rn DAY) AS streak_group
    FROM numbered
),
streak_lengths AS (
    SELECT customer_id,
           MIN(order_date) AS streak_start,
           MAX(order_date) AS streak_end,
           COUNT(*) AS streak_length
    FROM consecutive_dates
    GROUP BY customer_id, streak_group
)
SELECT customer_id, streak_start, streak_end, streak_length
FROM streak_lengths sl
WHERE streak_length = (
    SELECT MAX(streak_length)
    FROM streak_lengths sl2
    WHERE sl2.customer_id = sl.customer_id
)
ORDER BY customer_id;

Explanation
• PostgreSQL:
• Subtracting a running ROW_NUMBER() (as a number of days) from each order date yields a
value that is constant across consecutive days, which identifies the streak groups.
• We then aggregate each streak, calculating its start date, end date, and length.
• The final query filters the longest streak for each customer.
• MySQL:
• MySQL 8.0+ supports the same window-function approach: a ROW_NUMBER() computed in a
first CTE is subtracted from the order date with DATE_SUB() to build the streak group.
• The streaks are then aggregated, and the longest one per customer is kept, exactly as in
the PostgreSQL version.
This solution combines window functions with date arithmetic to identify and group
consecutive days of purchases, making it a more advanced problem.
• Q.322
Question
Find the top 3 most sold products globally (across all stores) in the walmart_inventory
table based on the total quantity sold. Output the product name, total quantity sold, and the
total revenue generated from that product.
Explanation
To solve this:
• Aggregate the data by product_id to get the total quantity sold and total revenue for each
product globally.
• Sort the results by total quantity sold in descending order to identify the top-selling
products.
• Filter to show only the top 3 products.
• Calculate the total revenue by multiplying quantity_sold by price_per_unit.
Datasets and SQL Schemas


• - Table creation and sample data


CREATE TABLE walmart_inventory (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
quantity_sold INT,
price_per_unit DECIMAL(10, 2),
store_id INT,
sale_date DATE
);

-- Sample data insertions


INSERT INTO walmart_inventory (product_id, product_name, quantity_sold, price_per_unit,
store_id, sale_date)
VALUES
(101, 'Laptop', 50, 799.99, 1, '2023-01-01'),
(102, 'Smartphone', 150, 499.99, 1, '2023-01-02'),
(103, 'Tablet', 120, 299.99, 1, '2023-01-03'),
(104, 'Smartwatch', 90, 199.99, 2, '2023-01-04'),
(105, 'Headphones', 200, 99.99, 2, '2023-01-05'),
(106, 'Speaker', 80, 149.99, 2, '2023-01-06'),
(107, 'Laptop', 30, 799.99, 3, '2023-01-07'),
(108, 'Smartphone', 120, 499.99, 3, '2023-01-08'),
(109, 'Tablet', 110, 299.99, 3, '2023-01-09'),
(110, 'Smartwatch', 50, 199.99, 3, '2023-01-10');

Learnings
• Using aggregation (SUM()) to calculate total quantities and revenues.
• Grouping data by product to summarize sales information.
• Sorting the results to find the top 3 products by quantity sold.
• Calculating total revenue using multiplication of quantity_sold and price_per_unit.
Solutions
• - PostgreSQL solution
SELECT
product_name,
SUM(quantity_sold) AS total_quantity_sold,
SUM(quantity_sold * price_per_unit) AS total_revenue
FROM walmart_inventory
GROUP BY product_name
ORDER BY total_quantity_sold DESC
LIMIT 3;
• - MySQL solution
SELECT
product_name,
SUM(quantity_sold) AS total_quantity_sold,
SUM(quantity_sold * price_per_unit) AS total_revenue
FROM walmart_inventory
GROUP BY product_name
ORDER BY total_quantity_sold DESC
LIMIT 3;

Explanation
• Both PostgreSQL and MySQL solutions aggregate sales data by product_name using
SUM(quantity_sold) to calculate the total quantity sold and SUM(quantity_sold *
price_per_unit) to calculate the total revenue.
• We order the products by total_quantity_sold in descending order to identify the most
sold products.
• The query limits the output to the top 3 products using LIMIT 3.
This problem requires basic aggregation, sorting, and filtering techniques, but it challenges
you to think globally about data across multiple stores and to summarize it effectively.
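As a quick sanity check, the totals for the sample rows above can be verified by hand; the query should surface Smartphone, Tablet, and Headphones:

```sql
-- Expected result against the sample data (worked by hand):
-- product_name | total_quantity_sold | total_revenue
-- Smartphone   | 270                 | 134997.30   (150 + 120 units at 499.99)
-- Tablet       | 230                 |  68997.70   (120 + 110 units at 299.99)
-- Headphones   | 200                 |  19998.00   (200 units at 99.99)
```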


• Q.323
Question
Find the second most recent purchase made by each customer in the purchases table.
Output the customer ID, the second most recent purchase date, and the amount spent on that
purchase.
Explanation
To solve this:
• For each customer, you need to determine the second most recent purchase date.
• This requires sorting the purchases by purchase_date for each customer and then
selecting the second entry.
• Handle edge cases where a customer has fewer than two purchases.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE purchases (
purchase_id INT PRIMARY KEY,
customer_id INT,
purchase_date DATE,
amount_spent DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO purchases (purchase_id, customer_id, purchase_date, amount_spent)
VALUES
(1, 101, '2023-01-01', 150.50),
(2, 101, '2023-02-10', 200.75),
(3, 102, '2023-01-15', 99.99),
(4, 103, '2023-01-05', 250.00),
(5, 103, '2023-03-01', 300.50),
(6, 104, '2023-01-20', 500.00),
(7, 104, '2023-02-15', 450.25),
(8, 105, '2023-01-25', 700.00);

Learnings
• Using window functions to rank purchases by date for each customer.
• Using ROW_NUMBER() to determine the ranking of purchases for each customer.
• Handling cases where customers might have fewer than two purchases.
Solutions
• - PostgreSQL solution
WITH ranked_purchases AS (
SELECT
customer_id,
purchase_date,
amount_spent,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date DESC) AS purc
hase_rank
FROM purchases
)
SELECT customer_id, purchase_date, amount_spent
FROM ranked_purchases
WHERE purchase_rank = 2
ORDER BY customer_id;
• - MySQL solution
WITH ranked_purchases AS (
SELECT
customer_id,
purchase_date,
amount_spent,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date DESC) AS purc
hase_rank
FROM purchases
)
SELECT customer_id, purchase_date, amount_spent
FROM ranked_purchases
WHERE purchase_rank = 2
ORDER BY customer_id;

Explanation
• PostgreSQL and MySQL:
• We use ROW_NUMBER() window function to rank purchases by purchase_date for each
customer in descending order.
• The second most recent purchase for each customer is identified where purchase_rank =
2.
• The results are ordered by customer_id to list each customer and their second most recent
purchase.
This problem tests your ability to handle window functions, ranking data, and managing edge
cases like customers with fewer than two purchases.
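One detail worth raising in an interview: if a customer made two purchases on the same date, ROW_NUMBER() breaks the tie arbitrarily. A sketch of a deterministic variant (using purchase_id as the tie-breaker is an assumption about which purchase should win):

```sql
-- Deterministic ranking: higher purchase_id wins ties on the same date
ROW_NUMBER() OVER (
    PARTITION BY customer_id
    ORDER BY purchase_date DESC, purchase_id DESC
) AS purchase_rank
```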
• Q.324
Question
Find the most frequent day of the week on which each customer makes purchases. Output
the customer ID, the most frequent day of the week, and the number of times they made a
purchase on that day.
Explanation
To solve this:
• You need to extract the day of the week from the purchase_date (e.g., Monday, Tuesday,
etc.).
• Count the frequency of purchases for each day of the week for each customer.
• Identify the most frequent day for each customer by selecting the day with the highest
count.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE purchases (
purchase_id INT PRIMARY KEY,
customer_id INT,
purchase_date DATE,
amount_spent DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO purchases (purchase_id, customer_id, purchase_date, amount_spent)
VALUES
(1, 101, '2023-01-01', 150.50),
(2, 101, '2023-01-02', 200.75),
(3, 101, '2023-01-08', 100.00),
(4, 102, '2023-01-03', 99.99),
(5, 102, '2023-01-04', 150.50),
(6, 102, '2023-01-10', 250.00),
(7, 103, '2023-01-05', 300.50),
(8, 103, '2023-01-06', 50.00),
(9, 104, '2023-01-07', 500.00),
(10, 104, '2023-01-07', 450.25);


Learnings
• Using EXTRACT(DOW FROM date) to get the day of the week.
• Grouping and counting the frequency of each day for each customer.
• Using RANK() or similar methods to select the most frequent day of the week for each
customer.
Solutions
• - PostgreSQL solution
WITH day_of_week AS (
SELECT
customer_id,
EXTRACT(DOW FROM purchase_date) AS purchase_day,
COUNT(*) AS purchase_count
FROM purchases
GROUP BY customer_id, purchase_day
),
ranked_days AS (
SELECT
customer_id,
purchase_day,
purchase_count,
RANK() OVER (PARTITION BY customer_id ORDER BY purchase_count DESC) AS day_rank
FROM day_of_week
)
SELECT
customer_id,
CASE
WHEN purchase_day = 0 THEN 'Sunday'
WHEN purchase_day = 1 THEN 'Monday'
WHEN purchase_day = 2 THEN 'Tuesday'
WHEN purchase_day = 3 THEN 'Wednesday'
WHEN purchase_day = 4 THEN 'Thursday'
WHEN purchase_day = 5 THEN 'Friday'
WHEN purchase_day = 6 THEN 'Saturday'
END AS most_frequent_day,
purchase_count
FROM ranked_days
WHERE day_rank = 1
ORDER BY customer_id;
• - MySQL solution
WITH day_of_week AS (
SELECT
customer_id,
DAYOFWEEK(purchase_date) AS purchase_day,
COUNT(*) AS purchase_count
FROM purchases
GROUP BY customer_id, purchase_day
),
ranked_days AS (
SELECT
customer_id,
purchase_day,
purchase_count,
RANK() OVER (PARTITION BY customer_id ORDER BY purchase_count DESC) AS day_rank
FROM day_of_week
)
SELECT
customer_id,
CASE
WHEN purchase_day = 1 THEN 'Sunday'
WHEN purchase_day = 2 THEN 'Monday'
WHEN purchase_day = 3 THEN 'Tuesday'
WHEN purchase_day = 4 THEN 'Wednesday'
WHEN purchase_day = 5 THEN 'Thursday'
WHEN purchase_day = 6 THEN 'Friday'
WHEN purchase_day = 7 THEN 'Saturday'
END AS most_frequent_day,
purchase_count
FROM ranked_days
WHERE day_rank = 1
ORDER BY customer_id;

Explanation
• PostgreSQL and MySQL:
• We use EXTRACT(DOW FROM purchase_date) in PostgreSQL (0 = Sunday through 6 = Saturday) and DAYOFWEEK(purchase_date) in MySQL (1 = Sunday through 7 = Saturday) to get the day of the week; the two CASE expressions map each numbering to day names accordingly.
• We then count how many purchases were made on each day of the week for each
customer.
• Using RANK(), we rank the days based on the count of purchases.
• The final query selects the day with the highest rank (day_rank = 1) and converts the
numeric value of the day into its corresponding name.
This problem helps you practice date manipulation, grouping, and ranking functions to
analyze customer behavior in a creative way.
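Tracing the sample data (2023-01-01 fell on a Sunday) shows why the choice of RANK() matters here: customer 103 has a tie, so both tied days are returned.

```sql
-- Expected result against the sample data (worked by hand):
-- customer_id | most_frequent_day | purchase_count
-- 101         | Sunday            | 2   (Jan 1 and Jan 8)
-- 102         | Tuesday           | 2   (Jan 3 and Jan 10)
-- 103         | Thursday          | 1   -- tied with Friday;
-- 103         | Friday            | 1   -- RANK() keeps both rows
-- 104         | Saturday          | 2   (two purchases on Jan 7)
```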
• Q.325
Question
Find the longest sequence of products purchased in a single session by each customer. A
session is defined as a series of purchases where the time between consecutive purchases is
less than or equal to 2 hours. Output the customer ID, the session start time (earliest
purchase), session end time (latest purchase), and the number of products purchased in that
session.
Explanation
To solve this:
• We need to group purchases into sessions for each customer. A session is defined as
purchases made within 2 hours of each other.
• Once the purchases are grouped by session, we need to count how many products were
purchased in each session and determine the start and end times for each session.
• Select the longest session for each customer based on the number of products purchased.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE customer_purchases (
purchase_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
purchase_time TIMESTAMP
);

-- Sample data insertions


INSERT INTO customer_purchases (purchase_id, customer_id, product_id, purchase_time)
VALUES
(1, 101, 201, '2023-01-01 10:00:00'),
(2, 101, 202, '2023-01-01 10:45:00'),
(3, 101, 203, '2023-01-01 12:00:00'),
(4, 101, 204, '2023-01-01 15:00:00'),
(5, 102, 301, '2023-01-01 11:00:00'),
(6, 102, 302, '2023-01-01 11:30:00'),
(7, 102, 303, '2023-01-01 14:00:00'),
(8, 103, 401, '2023-01-02 09:00:00'),
(9, 103, 402, '2023-01-02 09:45:00'),
(10, 103, 403, '2023-01-02 11:00:00'),
(11, 103, 404, '2023-01-02 11:20:00'),
(12, 104, 501, '2023-01-03 08:00:00'),
(13, 104, 502, '2023-01-03 08:15:00'),
(14, 104, 503, '2023-01-03 10:00:00');

Learnings
• Using LAG() or LEAD() window functions to calculate time differences between
consecutive purchases.
• Grouping purchases into sessions based on time intervals.
• Handling time-based conditions using TIMESTAMP and INTERVAL.
• Aggregating results to identify the longest session.
Solutions
• - PostgreSQL solution
WITH time_gaps AS (
    SELECT
        customer_id,
        purchase_time,
        product_id,
        CASE
            WHEN purchase_time - LAG(purchase_time) OVER (PARTITION BY customer_id ORDER BY purchase_time) <= INTERVAL '2 hours'
            THEN 0
            ELSE 1
        END AS new_session
    FROM customer_purchases
),
session_groups AS (
    SELECT
        customer_id,
        purchase_time,
        product_id,
        SUM(new_session) OVER (PARTITION BY customer_id ORDER BY purchase_time) AS session_id
    FROM time_gaps
),
session_lengths AS (
SELECT
customer_id,
session_id,
MIN(purchase_time) AS session_start,
MAX(purchase_time) AS session_end,
COUNT(product_id) AS products_in_session
FROM session_groups
GROUP BY customer_id, session_id
)
SELECT
customer_id,
session_start,
session_end,
products_in_session
FROM session_lengths
WHERE products_in_session = (
    SELECT MAX(sl2.products_in_session)
    FROM session_lengths sl2
    WHERE sl2.customer_id = session_lengths.customer_id
)
ORDER BY customer_id;
• - MySQL solution
WITH time_gaps AS (
    SELECT
        customer_id,
        purchase_time,
        product_id,
        CASE
            WHEN TIMESTAMPDIFF(MINUTE, LAG(purchase_time) OVER (PARTITION BY customer_id ORDER BY purchase_time), purchase_time) <= 120
            THEN 0
            ELSE 1
        END AS new_session
    FROM customer_purchases
),
session_groups AS (
    SELECT
        customer_id,
        purchase_time,
        product_id,
        SUM(new_session) OVER (PARTITION BY customer_id ORDER BY purchase_time) AS session_id
    FROM time_gaps
),
session_lengths AS (
SELECT
    customer_id,
session_id,
MIN(purchase_time) AS session_start,
MAX(purchase_time) AS session_end,
COUNT(product_id) AS products_in_session
FROM session_groups
GROUP BY customer_id, session_id
)
SELECT
customer_id,
session_start,
session_end,
products_in_session
FROM session_lengths
WHERE products_in_session = (
    SELECT MAX(sl2.products_in_session)
    FROM session_lengths sl2
    WHERE sl2.customer_id = session_lengths.customer_id
)
ORDER BY customer_id;

Explanation
• PostgreSQL and MySQL:
• We use LAG() to fetch each customer's previous purchase time and flag a new session whenever the gap between consecutive purchases exceeds 2 hours; a running SUM() over these flags assigns a session_id to every purchase.
• In the session_lengths CTE, we aggregate by customer_id and session_id to
calculate the session start and end times, and count the number of products purchased in that
session.
• The final query selects the longest session for each customer by finding the session with
the highest number of products purchased.
This problem involves using window functions, date/time manipulation, and aggregating data
in a way that identifies meaningful purchase behavior. It tests your ability to handle time-
based sessions and complex grouping scenarios.
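Hand-tracing customer 101 from the sample data makes the flag-and-running-sum session trick concrete:

```sql
-- Customer 101 (worked by hand):
-- purchase_time | gap to previous | new-session flag | running SUM = session_id
-- 10:00         | (none)          | 1                | 1
-- 10:45         | 45 min          | 0                | 1
-- 12:00         | 75 min          | 0                | 1
-- 15:00         | 180 min         | 1                | 2
-- Longest session for customer 101: 10:00-12:00, 3 products.
```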
• Q.326
Question
Find the top 3 most profitable stores based on the total revenue from their sales in the
store_sales table. Output the store ID, store name, and the total revenue, sorted by revenue
in descending order.
Explanation
To solve this:
• Calculate the total revenue for each store by summing up the sale_amount for each store.
• Sort the stores based on the total revenue in descending order.
• Output the top 3 stores based on total revenue.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE store_sales (
sale_id INT PRIMARY KEY,
store_id INT,
store_name VARCHAR(100),
sale_amount DECIMAL(10, 2),
sale_date DATE
);


-- Sample data insertions


INSERT INTO store_sales (sale_id, store_id, store_name, sale_amount, sale_date)
VALUES
(1, 101, 'Store A', 1500.00, '2023-01-01'),
(2, 102, 'Store B', 2000.00, '2023-01-01'),
(3, 101, 'Store A', 1000.00, '2023-01-02'),
(4, 103, 'Store C', 1800.00, '2023-01-02'),
(5, 104, 'Store D', 1200.00, '2023-01-03'),
(6, 102, 'Store B', 2500.00, '2023-01-03'),
(7, 103, 'Store C', 2100.00, '2023-01-04'),
(8, 104, 'Store D', 1900.00, '2023-01-04'),
(9, 105, 'Store E', 1700.00, '2023-01-05'),
(10, 105, 'Store E', 1600.00, '2023-01-05'),
(11, 101, 'Store A', 2000.00, '2023-01-06'),
(12, 103, 'Store C', 2200.00, '2023-01-06'),
(13, 102, 'Store B', 2400.00, '2023-01-07'),
(14, 104, 'Store D', 1500.00, '2023-01-07'),
(15, 105, 'Store E', 1800.00, '2023-01-08');

Learnings
• Using aggregation (SUM()) to calculate total sales for each store.
• Grouping by store_id and store_name to aggregate sales data at the store level.
• Sorting and limiting the results to get the top N records based on total revenue.
Solutions
• - PostgreSQL solution
SELECT
store_id,
store_name,
SUM(sale_amount) AS total_revenue
FROM store_sales
GROUP BY store_id, store_name
ORDER BY total_revenue DESC
LIMIT 3;
• - MySQL solution
SELECT
store_id,
store_name,
SUM(sale_amount) AS total_revenue
FROM store_sales
GROUP BY store_id, store_name
ORDER BY total_revenue DESC
LIMIT 3;

Explanation
• PostgreSQL and MySQL:
• The query uses SUM(sale_amount) to calculate the total revenue for each store.
• We group the data by store_id and store_name to aggregate the sales by store.
• The results are sorted in descending order based on the total revenue, and only the top 3
stores are selected using LIMIT 3.
This problem helps practice grouping and aggregating data at the store level, as well as
sorting and limiting the result set to focus on the most profitable stores. It tests basic SQL
skills, but also helps refine your ability to analyze sales data effectively.
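Against the sample rows, the per-store totals are easy to verify by hand, which makes this a good query to sanity-check:

```sql
-- Expected result against the sample data (worked by hand):
-- store_id | store_name | total_revenue
-- 102      | Store B    | 6900.00   (2000 + 2500 + 2400)
-- 103      | Store C    | 6100.00   (1800 + 2100 + 2200)
-- 105      | Store E    | 5100.00   (1700 + 1600 + 1800)
```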
• Q.327
Question
Find the most profitable product category for each brand based on the total revenue from
sales in the product_sales table. Revenue for a category is calculated by summing the total
sales of all products within that category for each brand. Output the brand name, product
category, and the total revenue for that category, ordered by brand and revenue.
Explanation
To solve this:
• Calculate the total revenue for each product category within each brand by summing the
sale_amount.
• Group the sales data by brand_id, brand_name, and category_name.
• Identify the most profitable category for each brand, which can be done by finding the
category with the highest total revenue for each brand.
• Output the brand name, category, and revenue, sorted by brand name and revenue.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE product_sales (
sale_id INT PRIMARY KEY,
product_id INT,
product_name VARCHAR(100),
brand_id INT,
brand_name VARCHAR(100),
category_name VARCHAR(100),
sale_amount DECIMAL(10, 2),
sale_date DATE
);

-- Sample data insertions


INSERT INTO product_sales (sale_id, product_id, product_name, brand_id, brand_name, cate
gory_name, sale_amount, sale_date)
VALUES
(1, 201, 'Product A1', 1, 'Brand X', 'Electronics', 500.00, '2023-01-01'),
(2, 202, 'Product A2', 1, 'Brand X', 'Home Appliances', 300.00, '2023-01-02'),
(3, 203, 'Product A3', 2, 'Brand Y', 'Clothing', 200.00, '2023-01-03'),
(4, 204, 'Product A4', 1, 'Brand X', 'Electronics', 150.00, '2023-01-03'),
(5, 205, 'Product A5', 2, 'Brand Y', 'Clothing', 100.00, '2023-01-04'),
(6, 206, 'Product B1', 3, 'Brand Z', 'Electronics', 1200.00, '2023-01-04'),
(7, 207, 'Product B2', 3, 'Brand Z', 'Home Appliances', 900.00, '2023-01-05'),
(8, 208, 'Product B3', 1, 'Brand X', 'Furniture', 700.00, '2023-01-06'),
(9, 209, 'Product B4', 2, 'Brand Y', 'Clothing', 400.00, '2023-01-07'),
(10, 210, 'Product B5', 3, 'Brand Z', 'Home Appliances', 800.00, '2023-01-07'),
(11, 211, 'Product C1', 1, 'Brand X', 'Furniture', 600.00, '2023-01-08'),
(12, 212, 'Product C2', 3, 'Brand Z', 'Electronics', 1100.00, '2023-01-08'),
(13, 213, 'Product C3', 2, 'Brand Y', 'Clothing', 500.00, '2023-01-09'),
(14, 214, 'Product D1', 3, 'Brand Z', 'Furniture', 300.00, '2023-01-09'),
(15, 215, 'Product D2', 1, 'Brand X', 'Electronics', 800.00, '2023-01-10');

Learnings
• Using aggregation (SUM()) to calculate the total revenue per category for each brand.
• Grouping by multiple columns (brand_id, category_name) to aggregate the data.
• Identifying the highest revenue category for each brand using window functions or
subqueries.
• Sorting and selecting top records within groups (brands).
Solutions
• - PostgreSQL solution
WITH category_revenue AS (
SELECT
brand_name,
category_name,
SUM(sale_amount) AS total_revenue
FROM product_sales
    GROUP BY brand_name, category_name
),
ranked_categories AS (
SELECT
brand_name,
category_name,
total_revenue,
RANK() OVER (PARTITION BY brand_name ORDER BY total_revenue DESC) AS category_ra
nk
FROM category_revenue
)
SELECT
brand_name,
category_name,
total_revenue
FROM ranked_categories
WHERE category_rank = 1
ORDER BY brand_name, total_revenue DESC;
• - MySQL solution
WITH category_revenue AS (
SELECT
brand_name,
category_name,
SUM(sale_amount) AS total_revenue
FROM product_sales
GROUP BY brand_name, category_name
),
ranked_categories AS (
SELECT
brand_name,
category_name,
total_revenue,
RANK() OVER (PARTITION BY brand_name ORDER BY total_revenue DESC) AS category_ra
nk
FROM category_revenue
)
SELECT
brand_name,
category_name,
total_revenue
FROM ranked_categories
WHERE category_rank = 1
ORDER BY brand_name, total_revenue DESC;

Explanation
• PostgreSQL and MySQL:
• In the category_revenue CTE, we aggregate the sales data by brand_name and
category_name to calculate the total revenue for each category within each brand.
• In the ranked_categories CTE, we use the RANK() window function to rank the
categories within each brand based on the total_revenue.
• The final query filters for the most profitable category (category_rank = 1) for each
brand and orders the results by brand and total revenue.
This problem requires using aggregation and ranking techniques to find the most profitable
category for each brand. It helps you practice window functions and data partitioning, as well
as applying ranking to identify top values within groups.
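Working the sample data by hand confirms the ranking logic per brand:

```sql
-- Expected result against the sample data (worked by hand):
-- brand_name | category_name | total_revenue
-- Brand X    | Electronics   | 1450.00   (500 + 150 + 800)
-- Brand Y    | Clothing      | 1200.00   (200 + 100 + 400 + 500)
-- Brand Z    | Electronics   | 2300.00   (1200 + 1100)
```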
• Q.328
Question
Find the top 3 customers who have made the most number of purchases in a single month,
across all months. Output the customer ID, customer name, the month (YYYY-MM), and the
number of purchases made by that customer in that month, ordered by number of purchases
(highest to lowest).


Explanation
To solve this:
• Group the purchases by customer_id, customer_name, and the month of purchase.
• Count the number of purchases made by each customer in each month.
• Identify the top 3 customers who made the most purchases in any single month.
• Output the results, sorted by the number of purchases in descending order.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE customer_purchases (
purchase_id INT PRIMARY KEY,
customer_id INT,
customer_name VARCHAR(100),
purchase_date DATE,
product_id INT,
purchase_amount DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO customer_purchases (purchase_id, customer_id, customer_name, purchase_date,
product_id, purchase_amount)
VALUES
(1, 101, 'Alice', '2023-01-05', 1001, 500.00),
(2, 101, 'Alice', '2023-01-10', 1002, 200.00),
(3, 101, 'Alice', '2023-01-15', 1003, 150.00),
(4, 102, 'Bob', '2023-01-05', 1004, 250.00),
(5, 102, 'Bob', '2023-01-06', 1005, 300.00),
(6, 103, 'Charlie', '2023-01-10', 1006, 350.00),
(7, 103, 'Charlie', '2023-01-15', 1007, 400.00),
(8, 103, 'Charlie', '2023-02-01', 1008, 450.00),
(9, 104, 'David', '2023-02-02', 1009, 600.00),
(10, 104, 'David', '2023-02-03', 1010, 700.00),
(11, 104, 'David', '2023-02-10', 1011, 650.00),
(12, 105, 'Eve', '2023-02-10', 1012, 550.00),
(13, 105, 'Eve', '2023-02-15', 1013, 600.00),
(14, 106, 'Frank', '2023-03-05', 1014, 450.00),
(15, 106, 'Frank', '2023-03-07', 1015, 500.00);

Learnings
• Using GROUP BY with date functions (DATE_TRUNC() or DATE_FORMAT()) to group data by
month.
• Counting the number of purchases made by each customer in each month using COUNT().
• Sorting and limiting results to get the top N customers based on purchase counts.
Solutions
• - PostgreSQL solution
WITH monthly_purchases AS (
SELECT
customer_id,
customer_name,
TO_CHAR(purchase_date, 'YYYY-MM') AS purchase_month,
COUNT(purchase_id) AS num_purchases
FROM customer_purchases
GROUP BY customer_id, customer_name, TO_CHAR(purchase_date, 'YYYY-MM')
)
SELECT
customer_id,
customer_name,
purchase_month,
num_purchases
FROM monthly_purchases
ORDER BY num_purchases DESC
LIMIT 3;


• - MySQL solution
WITH monthly_purchases AS (
SELECT
customer_id,
customer_name,
DATE_FORMAT(purchase_date, '%Y-%m') AS purchase_month,
COUNT(purchase_id) AS num_purchases
FROM customer_purchases
GROUP BY customer_id, customer_name, DATE_FORMAT(purchase_date, '%Y-%m')
)
SELECT
customer_id,
customer_name,
purchase_month,
num_purchases
FROM monthly_purchases
ORDER BY num_purchases DESC
LIMIT 3;

Explanation
• PostgreSQL and MySQL:
• The monthly_purchases CTE groups the data by customer_id, customer_name, and the
formatted month (YYYY-MM). It then calculates the number of purchases made by each
customer within each month using COUNT(purchase_id).
• The final query sorts the results by num_purchases in descending order and limits the
output to the top 3 records using LIMIT 3.
This problem helps you practice using date formatting and aggregation to analyze customer
purchase behavior over time. It tests your ability to group data by time intervals (months) and
sort results based on counts or other aggregated metrics.
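A caveat worth raising: in the sample data, Alice and David each have 3 purchases in a month, but several customer-months are tied at 2, so LIMIT 3 picks the third row arbitrarily. A sketch of a deterministic alternative using a ranking function (an option to discuss, not the book's solution):

```sql
-- Keep every customer-month tied for the top 3 purchase counts
SELECT customer_id, customer_name, purchase_month, num_purchases
FROM (
    SELECT mp.*,
           DENSE_RANK() OVER (ORDER BY num_purchases DESC) AS rnk
    FROM monthly_purchases mp
) ranked
WHERE rnk <= 3
ORDER BY num_purchases DESC;
```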
• Q.329
Question
Find all customer names whose email addresses contain a domain from a specific list of
domains (e.g., gmail.com, yahoo.com, outlook.com). Output the customer name and email
address.
Explanation
To solve this:
• Use regular expressions to match email addresses that contain specific domains.
• Filter the emails by the given domains using REGEXP or REGEXP_LIKE (depending on the
DBMS).
• Output the customer name and email address.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100),
email VARCHAR(100)
);

-- Sample data insertions


INSERT INTO customers (customer_id, customer_name, email)
VALUES
(1, 'Alice', '[email protected]'),
(2, 'Bob', '[email protected]'),
(3, 'Charlie', '[email protected]'),
(4, 'David', '[email protected]'),
(5, 'Eve', '[email protected]'),
(6, 'Frank', '[email protected]'),
(7, 'Grace', '[email protected]'),
(8, 'Hannah', '[email protected]'),
(9, 'Ian', '[email protected]'),
(10, 'Jack', '[email protected]');

Learnings
• Using regular expressions (REGEXP or REGEXP_LIKE) to filter email addresses.
• Understanding how to match patterns, specifically domain names within email addresses.
• Filtering results based on pattern matching in a column.
Solutions
• - PostgreSQL solution
SELECT
customer_name,
email
FROM customers
WHERE email ~* '@(gmail\.com|yahoo\.com|outlook\.com)$';
• - MySQL solution
SELECT
customer_name,
email
FROM customers
WHERE email REGEXP '@(gmail\\.com|yahoo\\.com|outlook\\.com)$';

Explanation
• PostgreSQL and MySQL:
• The regular expression checks that the email ends with @ followed by one of the specified domains (gmail.com, yahoo.com, outlook.com); anchoring on @ avoids false matches such as notgmail.com.
• The ~* in PostgreSQL and REGEXP in MySQL perform case-insensitive matching.
• The escaped dot (\. in PostgreSQL, written \\. inside a MySQL string literal) matches a literal dot, since an unescaped . is a regex wildcard.
This question helps you practice using regular expressions to filter data based on patterns,
and it's useful for identifying customers from specific email domains.
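If regex support is unavailable (or the pattern feels risky in an interview), plain LIKE filters are an equivalent, if more verbose, sketch:

```sql
SELECT customer_name, email
FROM customers
WHERE email LIKE '%@gmail.com'
   OR email LIKE '%@yahoo.com'
   OR email LIKE '%@outlook.com';
```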
• Q.330
Question
Based on each user's most recent transaction date, write a query to retrieve the users along
with the number of products they bought. Output the user's most recent transaction date, user
ID, and the number of products bought, sorted in chronological order by the transaction date.
Explanation
To solve this:
• Use the MAX() function to find the most recent transaction date for each user.
• Count the number of products bought by each user in their most recent transaction.
• Sort the results by the transaction date in chronological order.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE walmart_transactions (
transaction_id INT PRIMARY KEY,
user_id INT,
    transaction_date DATE,
    product_id INT,
product_name VARCHAR(100),
quantity INT
);

-- Sample data insertions


INSERT INTO walmart_transactions (transaction_id, user_id, transaction_date, product_id,
product_name, quantity)
VALUES
(1, 101, '2023-01-10', 1001, 'Laptop', 1),
(2, 101, '2023-01-15', 1002, 'Phone', 2),
(3, 102, '2023-02-01', 1003, 'Headphones', 1),
(4, 103, '2023-02-05', 1004, 'TV', 1),
(5, 104, '2023-02-10', 1005, 'Microwave', 1),
(6, 101, '2023-02-20', 1006, 'Tablet', 3),
(7, 102, '2023-02-25', 1007, 'Refrigerator', 2),
(8, 103, '2023-03-01', 1008, 'Washing Machine', 1),
(9, 104, '2023-03-05', 1009, 'Blender', 1),
(10, 101, '2023-03-10', 1010, 'Smart Watch', 2);

Learnings
• Using MAX() to find the most recent transaction date.
• Counting products bought in a specific transaction using SUM().
• Using GROUP BY to aggregate results by user.
• Sorting results based on date using ORDER BY.
Solutions
• - PostgreSQL solution
WITH recent_transactions AS (
SELECT
user_id,
MAX(transaction_date) AS recent_transaction_date
FROM walmart_transactions
GROUP BY user_id
)
SELECT
rt.recent_transaction_date,
wt.user_id,
SUM(wt.quantity) AS num_products
FROM recent_transactions rt
JOIN walmart_transactions wt ON rt.user_id = wt.user_id AND rt.recent_transaction_date =
wt.transaction_date
GROUP BY rt.recent_transaction_date, wt.user_id
ORDER BY rt.recent_transaction_date;
• - MySQL solution
WITH recent_transactions AS (
SELECT
user_id,
MAX(transaction_date) AS recent_transaction_date
FROM walmart_transactions
GROUP BY user_id
)
SELECT
rt.recent_transaction_date,
wt.user_id,
SUM(wt.quantity) AS num_products
FROM recent_transactions rt
JOIN walmart_transactions wt ON rt.user_id = wt.user_id AND rt.recent_transaction_date =
wt.transaction_date
GROUP BY rt.recent_transaction_date, wt.user_id
ORDER BY rt.recent_transaction_date;

Explanation
• PostgreSQL and MySQL:
• The recent_transactions CTE finds the most recent transaction date for each user by
grouping the data by user_id and using MAX() on transaction_date.
• The main query joins this CTE with the walmart_transactions table to get the user_id,
transaction_date, and the total number of products bought in their most recent transaction
(calculated using SUM(wt.quantity)).
• The results are grouped by user_id and recent_transaction_date to aggregate product
quantities.
• The final result is sorted by recent_transaction_date to output the users in
chronological order of their most recent transactions.
This problem helps you practice combining CTEs, joins, and aggregation to analyze transaction data by user. It also improves your ability to work with dates and summarize user activity.
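Tracing the sample data, each user's most recent transaction is a single row here, so the SUM() simply returns that row's quantity:

```sql
-- Expected result against the sample data (worked by hand):
-- recent_transaction_date | user_id | num_products
-- 2023-02-25              | 102     | 2
-- 2023-03-01              | 103     | 1
-- 2023-03-05              | 104     | 1
-- 2023-03-10              | 101     | 2
```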
• Q.331
Question
Given a table of employee sales, write a query to select the Employee_id, Store_id, and a
rank based on their Sale_amount for the year 2023, with 1 being the highest performing
employee.
Explanation
To solve this:
• Filter the data for the year 2023.
• Use a window function to rank employees based on their Sale_amount in descending
order.
• Return the Employee_id, Store_id, and the rank for each employee based on their total
sales.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE employee_sales (
transaction_id INT PRIMARY KEY,
employee_id INT,
store_id INT,
sale_date DATE,
sale_amount DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO employee_sales (transaction_id, employee_id, store_id, sale_date, sale_amoun
t)
VALUES
(1, 101, 1, '2023-01-10', 500.00),
(2, 102, 2, '2023-01-12', 800.00),
(3, 103, 3, '2023-01-15', 200.00),
(4, 101, 1, '2023-02-05', 700.00),
(5, 104, 1, '2023-02-10', 1200.00),
(6, 102, 2, '2023-03-01', 400.00),
(7, 101, 1, '2023-03-12', 600.00),
(8, 105, 2, '2023-03-15', 1000.00),
(9, 103, 3, '2023-04-01', 150.00),
(10, 104, 1, '2023-04-05', 900.00);

Learnings
• Using the RANK() window function to assign rankings based on a specific column.
• Filtering data for a specific year.
• Grouping and aggregating data before applying ranking.


Solutions
• - PostgreSQL solution
SELECT
employee_id,
store_id,
RANK() OVER (PARTITION BY store_id ORDER BY SUM(sale_amount) DESC) AS sales_rank
FROM employee_sales
WHERE EXTRACT(YEAR FROM sale_date) = 2023
GROUP BY employee_id, store_id
ORDER BY store_id, sales_rank;
• - MySQL solution
SELECT
employee_id,
store_id,
RANK() OVER (PARTITION BY store_id ORDER BY SUM(sale_amount) DESC) AS sales_rank
FROM employee_sales
WHERE YEAR(sale_date) = 2023
GROUP BY employee_id, store_id
ORDER BY store_id, sales_rank;

Explanation
• PostgreSQL and MySQL:
• The query filters the employee_sales table for transactions in 2023 using the
EXTRACT(YEAR FROM sale_date) in PostgreSQL and YEAR(sale_date) in MySQL.
• It then calculates the total sales for each employee by grouping the data by employee_id
and store_id, and uses the SUM(sale_amount) function.
• The RANK() window function is used to assign a ranking to each employee based on their
total sales in descending order, with the highest sales receiving a rank of 1. The PARTITION
BY store_id ensures that rankings are calculated separately for each store.
• The final result is ordered by store_id and sales_rank to ensure the rankings are in the
correct order.
This problem helps you practice working with window functions, filtering data by year, and
grouping results to analyze employee performance. It also gives you insight into ranking
methods like RANK() and DENSE_RANK().
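As a quick sanity check (not part of the book's solutions), the ranking query can be run against the sample data with Python's built-in sqlite3 module. The SQL is adapted to SQLite: strftime('%Y', …) replaces EXTRACT/YEAR, and the aggregation is moved into a subquery so the window function ranks pre-computed totals. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE employee_sales (
    transaction_id INTEGER PRIMARY KEY,
    employee_id INTEGER, store_id INTEGER, sale_date TEXT, sale_amount REAL
);
INSERT INTO employee_sales VALUES
    (1, 101, 1, '2023-01-10', 500.00), (2, 102, 2, '2023-01-12', 800.00),
    (3, 103, 3, '2023-01-15', 200.00), (4, 101, 1, '2023-02-05', 700.00),
    (5, 104, 1, '2023-02-10', 1200.00), (6, 102, 2, '2023-03-01', 400.00),
    (7, 101, 1, '2023-03-12', 600.00), (8, 105, 2, '2023-03-15', 1000.00),
    (9, 103, 3, '2023-04-01', 150.00), (10, 104, 1, '2023-04-05', 900.00);
""")
rows = conn.execute("""
    SELECT employee_id, store_id,
           RANK() OVER (PARTITION BY store_id ORDER BY total_sales DESC) AS sales_rank
    FROM (SELECT employee_id, store_id, SUM(sale_amount) AS total_sales
          FROM employee_sales
          WHERE strftime('%Y', sale_date) = '2023'   -- SQLite's year filter
          GROUP BY employee_id, store_id) AS t
    ORDER BY store_id, sales_rank
""").fetchall()
# Employee 104 (2100 total) outranks 101 (1800) in store 1, and so on.
print(rows)  # [(104, 1, 1), (101, 1, 2), (102, 2, 1), (105, 2, 2), (103, 3, 1)]
```

This confirms the per-store rankings on the sample data: 104 leads store 1, 102 leads store 2, and 103 is the only employee in store 3.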
• Q.332
Question
Write a SQL query to select the Supplier_id, Product_id, and start date of the period when
the stock quantity was below 50 units for more than two consecutive days.
Explanation
To solve this:
• Identify periods where the stock quantity is below 50 for consecutive days.
• Use LEAD() or LAG() window functions to compare stock quantities from one day to the
next.
• Detect streaks of days where the stock quantity remains below 50.
• Return the Supplier_id, Product_id, and the start date of these periods.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE inventory (
record_id INT PRIMARY KEY,


supplier_id INT,
product_id INT,
stock_quantity INT,
record_date DATE
);

-- Sample data insertions


INSERT INTO inventory (record_id, supplier_id, product_id, stock_quantity, record_date)
VALUES
(1, 201, 301, 60, '2023-01-01'),
(2, 201, 301, 45, '2023-01-02'),
(3, 201, 301, 30, '2023-01-03'),
(4, 201, 301, 40, '2023-01-04'),
(5, 201, 301, 20, '2023-01-05'),
(6, 202, 302, 80, '2023-01-01'),
(7, 202, 302, 30, '2023-01-02'),
(8, 202, 302, 25, '2023-01-03'),
(9, 202, 302, 50, '2023-01-04'),
(10, 203, 303, 90, '2023-01-01'),
(11, 203, 303, 45, '2023-01-02'),
(12, 203, 303, 70, '2023-01-03'),
(13, 203, 303, 30, '2023-01-04'),
(14, 203, 303, 10, '2023-01-05');

Learnings
• Using window functions (LAG(), LEAD()) to compare adjacent rows.
• Identifying patterns or streaks of data, such as consecutive days of low stock.
• Applying conditions to find specific periods where stock is below a threshold.
Solutions
• - PostgreSQL solution
WITH consecutive_low_stock AS (
    SELECT
        supplier_id,
        product_id,
        record_date,
        stock_quantity,
        LAG(stock_quantity, 1) OVER (PARTITION BY supplier_id, product_id ORDER BY record_date) AS prev_day_stock,
        LAG(stock_quantity, 2) OVER (PARTITION BY supplier_id, product_id ORDER BY record_date) AS prev_2_day_stock
    FROM inventory
)
SELECT
    supplier_id,
    product_id,
    record_date - 2 AS start_date  -- the matched row is day 3 of the streak
FROM consecutive_low_stock
WHERE stock_quantity < 50
  AND prev_day_stock < 50
  AND prev_2_day_stock < 50
ORDER BY start_date;
• - MySQL solution
WITH consecutive_low_stock AS (
    SELECT
        supplier_id,
        product_id,
        record_date,
        stock_quantity,
        LAG(stock_quantity, 1) OVER (PARTITION BY supplier_id, product_id ORDER BY record_date) AS prev_day_stock,
        LAG(stock_quantity, 2) OVER (PARTITION BY supplier_id, product_id ORDER BY record_date) AS prev_2_day_stock
    FROM inventory
)
SELECT
    supplier_id,
    product_id,
    DATE_SUB(record_date, INTERVAL 2 DAY) AS start_date  -- the matched row is day 3 of the streak
FROM consecutive_low_stock
WHERE stock_quantity < 50
  AND prev_day_stock < 50
  AND prev_2_day_stock < 50
ORDER BY start_date;

Explanation
• PostgreSQL and MySQL:
• The LAG() window function retrieves the stock_quantity of the previous day and of the day before that for each product and supplier (this assumes one record per day per supplier/product pair).
• The query checks whether the current day's stock and the previous two days' stock are all below 50.
• A row satisfying this condition is the third day of a low-stock streak, meaning stock has been below 50 for more than two consecutive days; subtracting two days from that row's record_date gives the first day of the streak (the "start date").
• The result includes the supplier_id, product_id, and the computed start date, sorted chronologically.
This problem tests your ability to work with window functions (LAG()) and to identify patterns in sequential data, such as consecutive days of low stock. It also provides practice filtering and grouping data based on conditions applied across multiple rows.
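The LAG() approach can be sanity-checked with Python's sqlite3 module (SQLite 3.25+ for window functions); the start date is computed with SQLite's date(record_date, '-2 days'), which assumes daily records:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (
    record_id INTEGER PRIMARY KEY, supplier_id INTEGER, product_id INTEGER,
    stock_quantity INTEGER, record_date TEXT
);
INSERT INTO inventory VALUES
    (1, 201, 301, 60, '2023-01-01'), (2, 201, 301, 45, '2023-01-02'),
    (3, 201, 301, 30, '2023-01-03'), (4, 201, 301, 40, '2023-01-04'),
    (5, 201, 301, 20, '2023-01-05'), (6, 202, 302, 80, '2023-01-01'),
    (7, 202, 302, 30, '2023-01-02'), (8, 202, 302, 25, '2023-01-03'),
    (9, 202, 302, 50, '2023-01-04'), (10, 203, 303, 90, '2023-01-01'),
    (11, 203, 303, 45, '2023-01-02'), (12, 203, 303, 70, '2023-01-03'),
    (13, 203, 303, 30, '2023-01-04'), (14, 203, 303, 10, '2023-01-05');
""")
rows = conn.execute("""
    WITH c AS (
        SELECT supplier_id, product_id, record_date, stock_quantity,
               LAG(stock_quantity, 1) OVER (PARTITION BY supplier_id, product_id
                                            ORDER BY record_date) AS prev1,
               LAG(stock_quantity, 2) OVER (PARTITION BY supplier_id, product_id
                                            ORDER BY record_date) AS prev2
        FROM inventory
    )
    SELECT supplier_id, product_id,
           date(record_date, '-2 days') AS start_date  -- matched row is day 3
    FROM c
    WHERE stock_quantity < 50 AND prev1 < 50 AND prev2 < 50
    ORDER BY record_date
""").fetchall()
print(rows)  # [(201, 301, '2023-01-02'), (201, 301, '2023-01-03')]
```

Note that supplier 201's four-day streak (Jan 2-5) contains two overlapping three-day windows, hence two start dates; collapsing those into a single streak would need extra gaps-and-islands logic.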
• Q.333
Question
Write a SQL query to select the Customer_id and Store_id for customers with more than
10 purchases from the same store in the past year.
Explanation
To solve this:
• Filter the transactions for the past year using CURRENT_DATE or NOW().
• Group the data by Customer_id and Store_id.
• Count the number of purchases for each customer at each store.
• Return customers who have made more than 10 purchases at the same store.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE customer_purchases (
purchase_id INT PRIMARY KEY,
customer_id INT,
store_id INT,
purchase_date DATE,
amount DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO customer_purchases (purchase_id, customer_id, store_id, purchase_date, amount)
VALUES
(1, 101, 1, '2023-01-05', 50.00),
(2, 101, 1, '2023-02-10', 30.00),
(3, 101, 1, '2023-03-15', 100.00),
(4, 102, 2, '2023-03-01', 200.00),
(5, 102, 2, '2023-03-20', 150.00),
(6, 103, 1, '2023-04-12', 70.00),
(7, 101, 1, '2023-05-10', 90.00),
(8, 101, 1, '2023-06-15', 120.00),
(9, 104, 2, '2023-06-01', 60.00),
(10, 101, 1, '2023-07-20', 80.00),
(11, 101, 1, '2023-08-02', 110.00),
(12, 102, 2, '2023-09-10', 130.00),
(13, 101, 1, '2023-10-05', 150.00),
(14, 101, 1, '2023-11-01', 200.00),
(15, 103, 2, '2023-12-15', 90.00);

Learnings
• Using GROUP BY to group data by multiple columns.
• Counting occurrences of events (purchases) using COUNT().
• Filtering data based on the result of an aggregation function like HAVING.
• Working with date ranges (e.g., "past year") and the CURRENT_DATE function.
Solutions
• - PostgreSQL solution
SELECT
customer_id,
store_id
FROM customer_purchases
WHERE purchase_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY customer_id, store_id
HAVING COUNT(purchase_id) > 10;
• - MySQL solution
SELECT
customer_id,
store_id
FROM customer_purchases
WHERE purchase_date >= CURDATE() - INTERVAL 1 YEAR
GROUP BY customer_id, store_id
HAVING COUNT(purchase_id) > 10;

Explanation
• PostgreSQL and MySQL:
• The query filters the data to include only purchases made in the past year using
CURRENT_DATE - INTERVAL '1 year' in PostgreSQL and CURDATE() - INTERVAL 1
YEAR in MySQL.
• The data is grouped by customer_id and store_id to identify each customer’s purchases
at each store.
• The HAVING COUNT(purchase_id) > 10 condition ensures that only customers with more
than 10 purchases from the same store in the past year are included.
• The result includes the customer_id and store_id of frequent shoppers.
This problem demonstrates how to filter and aggregate data based on time intervals and
customer activity, and it emphasizes the importance of using the HAVING clause for
aggregated conditions.
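Because the query depends on CURRENT_DATE, its result changes over time. A reproducible check with sqlite3 can replace the current date with a fixed reference date (the '2024-01-01' anchor below is an assumption made purely so the sample data falls inside the one-year window):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_purchases (
    purchase_id INTEGER PRIMARY KEY, customer_id INTEGER, store_id INTEGER,
    purchase_date TEXT, amount REAL
);
INSERT INTO customer_purchases VALUES
    (1, 101, 1, '2023-01-05', 50.00), (2, 101, 1, '2023-02-10', 30.00),
    (3, 101, 1, '2023-03-15', 100.00), (4, 102, 2, '2023-03-01', 200.00),
    (5, 102, 2, '2023-03-20', 150.00), (6, 103, 1, '2023-04-12', 70.00),
    (7, 101, 1, '2023-05-10', 90.00), (8, 101, 1, '2023-06-15', 120.00),
    (9, 104, 2, '2023-06-01', 60.00), (10, 101, 1, '2023-07-20', 80.00),
    (11, 101, 1, '2023-08-02', 110.00), (12, 102, 2, '2023-09-10', 130.00),
    (13, 101, 1, '2023-10-05', 150.00), (14, 101, 1, '2023-11-01', 200.00),
    (15, 103, 2, '2023-12-15', 90.00);
""")

def frequent_shoppers(as_of, min_purchases):
    # Same shape as the book's query, with CURRENT_DATE swapped for a fixed
    # reference date so the result is reproducible.
    return conn.execute("""
        SELECT customer_id, store_id, COUNT(*) AS purchases
        FROM customer_purchases
        WHERE purchase_date >= date(?, '-1 year')
        GROUP BY customer_id, store_id
        HAVING COUNT(*) > ?
    """, (as_of, min_purchases)).fetchall()

# With the book's threshold of 10, no sample customer qualifies (the max is 9)...
over_10 = frequent_shoppers('2024-01-01', 10)
# ...but lowering the threshold (illustration only) surfaces customer 101 at store 1.
over_8 = frequent_shoppers('2024-01-01', 8)
print(over_10, over_8)  # [] [(101, 1, 9)]
```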
• Q.334
Question
Write a SQL query to find all orders with a total amount greater than twice the average order
amount.
Explanation
To solve this:
• Calculate the average order amount across all orders.
• Find orders where the total order amount exceeds twice this average.


• Return the relevant order details such as Order_id, Customer_id, and Total_amount.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
total_amount DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO orders (order_id, customer_id, order_date, total_amount)
VALUES
(1, 101, '2023-01-05', 150.00),
(2, 102, '2023-01-07', 200.00),
(3, 103, '2023-02-10', 500.00),
(4, 104, '2023-02-15', 120.00),
(5, 105, '2023-03-01', 750.00),
(6, 106, '2023-03-05', 300.00),
(7, 107, '2023-04-15', 180.00),
(8, 108, '2023-04-20', 220.00),
(9, 109, '2023-05-10', 600.00),
(10, 101, '2023-06-01', 800.00);

Learnings
• Calculating averages using AVG().
• Using subqueries to compare an individual value against an aggregate (in this case,
comparing order totals against twice the average).
• Filtering data based on a condition applied to aggregated results.
Solutions
• - PostgreSQL solution
WITH avg_order_amount AS (
SELECT AVG(total_amount) AS avg_amount FROM orders
)
SELECT
order_id,
customer_id,
total_amount
FROM orders, avg_order_amount
WHERE total_amount > 2 * avg_amount;
• - MySQL solution
WITH avg_order_amount AS (
SELECT AVG(total_amount) AS avg_amount FROM orders
)
SELECT
order_id,
customer_id,
total_amount
FROM orders, avg_order_amount
WHERE total_amount > 2 * avg_amount;

Explanation
• PostgreSQL and MySQL:
• The subquery avg_order_amount calculates the average total_amount from the orders
table.
• The main query compares the total_amount of each order to twice the average order
amount.
• Only orders where the total_amount exceeds twice the average are selected.
• The result includes order_id, customer_id, and total_amount for all qualifying orders.


This problem tests your ability to work with aggregates and subqueries, particularly in the
context of comparing individual rows to overall summary statistics (like averages). It helps
you practice filtering based on calculated metrics.
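A scalar subquery is an equivalent alternative to the CTE, and the whole comparison can be verified with sqlite3 on the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, customer_id INTEGER,
    order_date TEXT, total_amount REAL
);
INSERT INTO orders VALUES
    (1, 101, '2023-01-05', 150.00), (2, 102, '2023-01-07', 200.00),
    (3, 103, '2023-02-10', 500.00), (4, 104, '2023-02-15', 120.00),
    (5, 105, '2023-03-01', 750.00), (6, 106, '2023-03-05', 300.00),
    (7, 107, '2023-04-15', 180.00), (8, 108, '2023-04-20', 220.00),
    (9, 109, '2023-05-10', 600.00), (10, 101, '2023-06-01', 800.00);
""")
# Scalar-subquery form of the same filter: compare each order to 2 * AVG.
rows = conn.execute("""
    SELECT order_id, customer_id, total_amount
    FROM orders
    WHERE total_amount > 2 * (SELECT AVG(total_amount) FROM orders)
""").fetchall()
print(rows)  # [(10, 101, 800.0)] -- average is 382, so the cutoff is 764
```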
• Q.335
Question
Write a SQL query to find the top 5 products with the highest increase in sales compared to
the previous month.
Explanation
To solve this:
• Calculate the sales of each product for the current month and the previous month.
• Subtract the previous month's sales from the current month's sales to calculate the increase.
• Order the results by the highest increase in sales.
• Return the top 5 products with the highest sales increase.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE product_sales (
product_id INT,
product_name VARCHAR(100),
sale_date DATE,
sale_amount DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO product_sales (product_id, product_name, sale_date, sale_amount)
VALUES
(1, 'Product A', '2023-01-15', 200.00),
(1, 'Product A', '2023-02-10', 300.00),
(1, 'Product A', '2023-03-05', 400.00),
(2, 'Product B', '2023-01-12', 150.00),
(2, 'Product B', '2023-02-15', 180.00),
(2, 'Product B', '2023-03-01', 350.00),
(3, 'Product C', '2023-01-20', 250.00),
(3, 'Product C', '2023-02-25', 400.00),
(3, 'Product C', '2023-03-15', 600.00),
(4, 'Product D', '2023-01-10', 500.00),
(4, 'Product D', '2023-02-20', 550.00),
(4, 'Product D', '2023-03-18', 900.00),
(5, 'Product E', '2023-01-25', 100.00),
(5, 'Product E', '2023-02-15', 200.00),
(5, 'Product E', '2023-03-22', 150.00);

Learnings
• Using GROUP BY to aggregate sales by month and product.
• Applying LAG() or self-joins to compare the current month’s sales to the previous month's
sales.
• Sorting data to get the top 5 products based on sales increase.
Solutions
• - PostgreSQL solution
WITH sales_by_month AS (
SELECT
product_id,
product_name,
EXTRACT(YEAR FROM sale_date) AS year,
EXTRACT(MONTH FROM sale_date) AS month,
SUM(sale_amount) AS total_sales
FROM product_sales
GROUP BY product_id, product_name, EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date)
),
sales_diff AS (
SELECT
a.product_id,
a.product_name,
a.year,
a.month,
a.total_sales - COALESCE(b.total_sales, 0) AS sales_increase
FROM sales_by_month a
LEFT JOIN sales_by_month b
ON a.product_id = b.product_id
AND a.year = b.year
AND a.month = b.month + 1
)
SELECT
product_id,
product_name,
sales_increase
FROM sales_diff
ORDER BY sales_increase DESC
LIMIT 5;
• - MySQL solution
WITH sales_by_month AS (
SELECT
product_id,
product_name,
YEAR(sale_date) AS year,
MONTH(sale_date) AS month,
SUM(sale_amount) AS total_sales
FROM product_sales
GROUP BY product_id, product_name, YEAR(sale_date), MONTH(sale_date)
),
sales_diff AS (
SELECT
a.product_id,
a.product_name,
a.year,
a.month,
a.total_sales - COALESCE(b.total_sales, 0) AS sales_increase
FROM sales_by_month a
LEFT JOIN sales_by_month b
ON a.product_id = b.product_id
AND a.year = b.year
AND a.month = b.month + 1
)
SELECT
product_id,
product_name,
sales_increase
FROM sales_diff
ORDER BY sales_increase DESC
LIMIT 5;

Explanation
• PostgreSQL and MySQL:
• First, the sales_by_month CTE calculates the total sales for each product in each month
by extracting the year and month from the sale_date.
• The sales_diff CTE calculates the difference in sales between the current month (a.total_sales) and the previous month (b.total_sales). The COALESCE() function ensures that if there is no previous month's data, the sales increase is treated as the total sales for the current month.
• Note that the join condition a.year = b.year AND a.month = b.month + 1 only pairs months within the same calendar year; to compare January against the previous December, the join would need to handle the year boundary (for example, by joining on a single year * 12 + month index).
• The final result selects the top 5 products with the highest sales increase, ordered by sales_increase in descending order.


This query demonstrates how to compare aggregated data across different time periods
(months) and how to find products with the highest performance improvement over time.
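The month-over-month comparison can be checked with sqlite3, using strftime in place of EXTRACT/YEAR/MONTH (all sample sales fall in 2023, so the year-boundary caveat does not bite here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_sales (
    product_id INTEGER, product_name TEXT, sale_date TEXT, sale_amount REAL
);
INSERT INTO product_sales VALUES
    (1, 'Product A', '2023-01-15', 200.00), (1, 'Product A', '2023-02-10', 300.00),
    (1, 'Product A', '2023-03-05', 400.00), (2, 'Product B', '2023-01-12', 150.00),
    (2, 'Product B', '2023-02-15', 180.00), (2, 'Product B', '2023-03-01', 350.00),
    (3, 'Product C', '2023-01-20', 250.00), (3, 'Product C', '2023-02-25', 400.00),
    (3, 'Product C', '2023-03-15', 600.00), (4, 'Product D', '2023-01-10', 500.00),
    (4, 'Product D', '2023-02-20', 550.00), (4, 'Product D', '2023-03-18', 900.00),
    (5, 'Product E', '2023-01-25', 100.00), (5, 'Product E', '2023-02-15', 200.00),
    (5, 'Product E', '2023-03-22', 150.00);
""")
rows = conn.execute("""
    WITH m AS (
        SELECT product_id, product_name,
               CAST(strftime('%Y', sale_date) AS INTEGER) AS yr,
               CAST(strftime('%m', sale_date) AS INTEGER) AS mo,
               SUM(sale_amount) AS total
        FROM product_sales
        GROUP BY product_id, product_name, yr, mo
    )
    SELECT a.product_id, a.product_name, a.total - COALESCE(b.total, 0) AS increase
    FROM m a
    LEFT JOIN m b ON a.product_id = b.product_id
                 AND a.yr = b.yr AND a.mo = b.mo + 1
    ORDER BY increase DESC
    LIMIT 5
""").fetchall()
print(rows)
```

The top "increase" is Product D's January row (COALESCE treats the missing prior month as 0, so a product's first month counts as a full increase), followed by its 350.00 March jump; the top-5 increase values come out as 500, 350, 250, 200, 200.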
• Q.336
Question
Write a SQL query to calculate the average time taken to fulfill orders, from order placement
to delivery, for each city.
Explanation
To solve this:
• Calculate the difference between the order placement date (order_date) and the delivery
date (delivery_date) for each order.
• Group the data by city and calculate the average time taken to fulfill the order for each
city.
• Output the city and the average fulfillment time (in days).
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
city VARCHAR(100),
order_date DATE,
delivery_date DATE,
total_amount DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO orders (order_id, customer_id, city, order_date, delivery_date, total_amount)
VALUES
(1, 101, 'New York', '2023-01-05', '2023-01-10', 250.00),
(2, 102, 'Los Angeles', '2023-01-10', '2023-01-15', 180.00),
(3, 103, 'Chicago', '2023-01-12', '2023-01-18', 300.00),
(4, 104, 'New York', '2023-01-15', '2023-01-20', 150.00),
(5, 105, 'Los Angeles', '2023-02-05', '2023-02-10', 200.00),
(6, 106, 'Chicago', '2023-02-10', '2023-02-14', 400.00),
(7, 107, 'New York', '2023-02-15', '2023-02-20', 320.00),
(8, 108, 'Chicago', '2023-02-20', '2023-02-25', 350.00),
(9, 109, 'Los Angeles', '2023-03-01', '2023-03-06', 180.00),
(10, 101, 'New York', '2023-03-10', '2023-03-15', 500.00);

Learnings
• Calculating date differences in SQL using DATEDIFF() or direct subtraction.
• Using GROUP BY to group the data by city and calculate aggregates.
• Aggregating with AVG() to find the average fulfillment time per city.
Solutions
• - PostgreSQL solution
SELECT
city,
AVG(CAST(delivery_date - order_date AS INTEGER)) AS avg_fulfillment_time
FROM orders
GROUP BY city;
• - MySQL solution
SELECT
city,
AVG(DATEDIFF(delivery_date, order_date)) AS avg_fulfillment_time
FROM orders
GROUP BY city;

Explanation
• PostgreSQL:
• The difference between delivery_date and order_date is calculated directly. We cast
the result of this subtraction to an integer to get the number of days.
• AVG() is used to calculate the average fulfillment time for each city.
• MySQL:
• The DATEDIFF() function calculates the difference between delivery_date and
order_date in days.
• AVG() is used to calculate the average fulfillment time for each city.
Both solutions group the results by city to show the average time taken to fulfill orders for
each city.
This problem is a good exercise in working with date functions and aggregating time-based
data.
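In SQLite, which has neither DATEDIFF() nor integer date subtraction, julianday() serves the same role; this makes for a quick check of the averages on the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, customer_id INTEGER, city TEXT,
    order_date TEXT, delivery_date TEXT, total_amount REAL
);
INSERT INTO orders VALUES
    (1, 101, 'New York', '2023-01-05', '2023-01-10', 250.00),
    (2, 102, 'Los Angeles', '2023-01-10', '2023-01-15', 180.00),
    (3, 103, 'Chicago', '2023-01-12', '2023-01-18', 300.00),
    (4, 104, 'New York', '2023-01-15', '2023-01-20', 150.00),
    (5, 105, 'Los Angeles', '2023-02-05', '2023-02-10', 200.00),
    (6, 106, 'Chicago', '2023-02-10', '2023-02-14', 400.00),
    (7, 107, 'New York', '2023-02-15', '2023-02-20', 320.00),
    (8, 108, 'Chicago', '2023-02-20', '2023-02-25', 350.00),
    (9, 109, 'Los Angeles', '2023-03-01', '2023-03-06', 180.00),
    (10, 101, 'New York', '2023-03-10', '2023-03-15', 500.00);
""")
# julianday() difference is SQLite's substitute for date subtraction / DATEDIFF.
rows = conn.execute("""
    SELECT city, AVG(julianday(delivery_date) - julianday(order_date)) AS avg_days
    FROM orders
    GROUP BY city
    ORDER BY city
""").fetchall()
print(rows)  # every city in this sample averages exactly 5.0 days
```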
• Q.337
Question
Write a SQL query to identify customers who have not placed any orders in the last 6 months
but had placed more than 5 orders in the 6 months prior.
Explanation
To solve this:
• Identify customers who have not placed any orders in the last 6 months.
• Check the number of orders placed by these customers in the 6 months prior.
• Return customers who meet the condition of placing more than 5 orders in the previous 6
months and none in the last 6 months.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
total_amount DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO orders (order_id, customer_id, order_date, total_amount)
VALUES
(1, 101, '2023-01-05', 250.00),
(2, 101, '2023-02-10', 180.00),
(3, 101, '2023-03-05', 300.00),
(4, 102, '2023-01-12', 150.00),
(5, 102, '2023-02-15', 200.00),
(6, 102, '2023-03-01', 350.00),
(7, 103, '2022-08-20', 250.00),
(8, 103, '2022-09-15', 300.00),
(9, 103, '2022-10-10', 150.00),
(10, 103, '2022-11-05', 180.00),
(11, 103, '2022-12-01', 250.00),
(12, 103, '2023-01-10', 400.00),
(13, 104, '2023-01-25', 100.00),
(14, 104, '2023-02-15', 220.00),
(15, 104, '2023-03-22', 150.00);


Learnings
• Filtering data by date ranges (using CURRENT_DATE or NOW()).
• Using COUNT() to find the number of orders for a customer in a specific time period.
• Combining conditions using HAVING to filter based on aggregated results.
Solutions
• - PostgreSQL solution
WITH orders_in_last_6_months AS (
SELECT
customer_id
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY customer_id
),
orders_in_previous_6_months AS (
SELECT
customer_id
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '12 months'
AND order_date < CURRENT_DATE - INTERVAL '6 months'
GROUP BY customer_id
HAVING COUNT(order_id) > 5
)
SELECT
o.customer_id
FROM orders_in_previous_6_months o
LEFT JOIN orders_in_last_6_months l
ON o.customer_id = l.customer_id
WHERE l.customer_id IS NULL;
• - MySQL solution
WITH orders_in_last_6_months AS (
SELECT
customer_id
FROM orders
WHERE order_date >= CURDATE() - INTERVAL 6 MONTH
GROUP BY customer_id
),
orders_in_previous_6_months AS (
SELECT
customer_id
FROM orders
WHERE order_date >= CURDATE() - INTERVAL 12 MONTH
AND order_date < CURDATE() - INTERVAL 6 MONTH
GROUP BY customer_id
HAVING COUNT(order_id) > 5
)
SELECT
o.customer_id
FROM orders_in_previous_6_months o
LEFT JOIN orders_in_last_6_months l
ON o.customer_id = l.customer_id
WHERE l.customer_id IS NULL;

Explanation
• PostgreSQL and MySQL:
• The first CTE (orders_in_last_6_months) finds all customers who placed an order in
the last 6 months.
• The second CTE (orders_in_previous_6_months) finds customers who placed more
than 5 orders in the 6-month period prior to the last 6 months. This is done using HAVING
COUNT(order_id) > 5.
• A LEFT JOIN is performed between the two CTEs to ensure we identify customers who
have no orders in the last 6 months (l.customer_id IS NULL).


• The final result returns customers who placed more than 5 orders in the previous 6 months
but have not placed any orders in the last 6 months.
This query demonstrates the use of CTEs, filtering based on date ranges, and using COUNT()
with HAVING to aggregate and filter data.
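As with other CURRENT_DATE queries, the result is time-dependent; a reproducible sqlite3 check can pin the "today" to a fixed anchor date (the '2023-07-15' anchor below is an assumption chosen so customer 103's six orders land in the prior six-month window):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, customer_id INTEGER,
    order_date TEXT, total_amount REAL
);
INSERT INTO orders VALUES
    (1, 101, '2023-01-05', 250.00), (2, 101, '2023-02-10', 180.00),
    (3, 101, '2023-03-05', 300.00), (4, 102, '2023-01-12', 150.00),
    (5, 102, '2023-02-15', 200.00), (6, 102, '2023-03-01', 350.00),
    (7, 103, '2022-08-20', 250.00), (8, 103, '2022-09-15', 300.00),
    (9, 103, '2022-10-10', 150.00), (10, 103, '2022-11-05', 180.00),
    (11, 103, '2022-12-01', 250.00), (12, 103, '2023-01-10', 400.00),
    (13, 104, '2023-01-25', 100.00), (14, 104, '2023-02-15', 220.00),
    (15, 104, '2023-03-22', 150.00);
""")
rows = conn.execute("""
    WITH recent AS (
        SELECT DISTINCT customer_id FROM orders
        WHERE order_date >= date(:anchor, '-6 months')
    ),
    prior AS (
        SELECT customer_id FROM orders
        WHERE order_date >= date(:anchor, '-12 months')
          AND order_date <  date(:anchor, '-6 months')
        GROUP BY customer_id
        HAVING COUNT(*) > 5
    )
    SELECT p.customer_id
    FROM prior p
    LEFT JOIN recent r ON p.customer_id = r.customer_id
    WHERE r.customer_id IS NULL
""", {"anchor": "2023-07-15"}).fetchall()
print(rows)  # [(103,)] -- six orders before 2023-01-15, none afterwards
```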
• Q.338
Question
Write a SQL query to find the supplier_id, product_id, and start date (record_date) of each period during which the stock quantity stayed below 50 for two or more consecutive days.
Explanation
To solve this:
• Identify days where stock quantity is less than 50.
• Check for consecutive days where stock remains below 50.
• Return the supplier_id, product_id, and the first date of consecutive days when stock
quantity was below 50.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE inventory (
supplier_id INT,
product_id INT,
record_date DATE,
stock_quantity INT
);

-- Sample data insertions


INSERT INTO inventory (supplier_id, product_id, record_date, stock_quantity)
VALUES
(1, 101, '2023-01-01', 60),
(1, 101, '2023-01-02', 40),
(1, 101, '2023-01-03', 45),
(1, 101, '2023-01-04', 30),
(1, 101, '2023-01-05', 20),
(1, 102, '2023-01-01', 50),
(1, 102, '2023-01-02', 49),
(1, 102, '2023-01-03', 48),
(1, 102, '2023-01-04', 30),
(1, 102, '2023-01-05', 55),
(2, 103, '2023-01-01', 10),
(2, 103, '2023-01-02', 20),
(2, 103, '2023-01-03', 30),
(2, 103, '2023-01-04', 40),
(2, 103, '2023-01-05', 50),
(2, 104, '2023-01-01', 70),
(2, 104, '2023-01-02', 45),
(2, 104, '2023-01-03', 48),
(2, 104, '2023-01-04', 30),
(2, 104, '2023-01-05', 49),
(3, 105, '2023-01-01', 60),
(3, 105, '2023-01-02', 45),
(3, 105, '2023-01-03', 49),
(3, 105, '2023-01-04', 29),
(3, 105, '2023-01-05', 25),
(3, 106, '2023-01-01', 80),
(3, 106, '2023-01-02', 60),
(3, 106, '2023-01-03', 30),
(3, 106, '2023-01-04', 25),
(3, 106, '2023-01-05', 45),
(4, 107, '2023-01-01', 100),
(4, 107, '2023-01-02', 75),
(4, 107, '2023-01-03', 49),
(4, 107, '2023-01-04', 45),
(4, 107, '2023-01-05', 40),
(4, 108, '2023-01-01', 60),
(4, 108, '2023-01-02', 48),
(4, 108, '2023-01-03', 49),
(4, 108, '2023-01-04', 50),
(4, 108, '2023-01-05', 20);

Learnings
• Identifying consecutive rows with specific conditions (in this case, stock quantity less than
50).
• Using LEAD() or LAG() window functions for identifying consecutive data.
• Combining CASE statements or joins to filter consecutive periods with specific conditions.
Solutions
• - PostgreSQL solution
WITH consecutive_low_stock AS (
    SELECT
        supplier_id,
        product_id,
        record_date,
        stock_quantity,
        LEAD(record_date) OVER (PARTITION BY supplier_id, product_id ORDER BY record_date) AS next_day,
        LEAD(stock_quantity) OVER (PARTITION BY supplier_id, product_id ORDER BY record_date) AS next_day_stock
    FROM inventory
    WHERE stock_quantity < 50
)
SELECT
    supplier_id,
    product_id,
    record_date
FROM consecutive_low_stock
WHERE next_day IS NOT NULL
  AND next_day - record_date = 1  -- the next low-stock row must be the very next day
  AND next_day_stock < 50
ORDER BY record_date;
• - MySQL solution
WITH consecutive_low_stock AS (
    SELECT
        supplier_id,
        product_id,
        record_date,
        stock_quantity,
        LEAD(record_date) OVER (PARTITION BY supplier_id, product_id ORDER BY record_date) AS next_day,
        LEAD(stock_quantity) OVER (PARTITION BY supplier_id, product_id ORDER BY record_date) AS next_day_stock
    FROM inventory
    WHERE stock_quantity < 50
)
SELECT
    supplier_id,
    product_id,
    record_date
FROM consecutive_low_stock
WHERE next_day IS NOT NULL
  AND DATEDIFF(next_day, record_date) = 1  -- the next low-stock row must be the very next day
  AND next_day_stock < 50
ORDER BY record_date;

Explanation
• PostgreSQL and MySQL:


• The LEAD() window function retrieves the next low-stock row's record_date and stock_quantity for each row; the CTE's WHERE clause keeps only rows with stock below 50.
• Because LEAD() runs after that filter, the "next" row is the next low-stock row, which is not necessarily the next calendar day. A robust check therefore also requires the next date to be exactly one day after the current one (next_day - record_date = 1 in PostgreSQL, DATEDIFF(next_day, record_date) = 1 in MySQL); without it, two low-stock days separated by a healthy day would be wrongly paired.
• Rows passing these checks mark the first day of at least two consecutive low-stock days, so the result includes supplier_id, product_id, and record_date as the starting date of each consecutive low-stock period.
This solution uses window functions (LEAD()) to examine consecutive days and find periods where stock is consistently low, helping to identify potential inventory issues for suppliers.
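The adjacency pitfall is easy to demonstrate with sqlite3 on a subset of the sample rows: product 108 is low on Jan 2, Jan 3, and Jan 5, and the one-day-gap check correctly keeps Jan 2 but excludes Jan 3 (whose next low-stock day is two days later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (
    supplier_id INTEGER, product_id INTEGER, record_date TEXT, stock_quantity INTEGER
);
INSERT INTO inventory VALUES
    (1, 101, '2023-01-01', 60), (1, 101, '2023-01-02', 40), (1, 101, '2023-01-03', 45),
    (1, 101, '2023-01-04', 30), (1, 101, '2023-01-05', 20), (1, 102, '2023-01-01', 50),
    (1, 102, '2023-01-02', 49), (1, 102, '2023-01-03', 48), (1, 102, '2023-01-04', 30),
    (1, 102, '2023-01-05', 55), (2, 103, '2023-01-01', 10), (2, 103, '2023-01-02', 20),
    (2, 103, '2023-01-03', 30), (2, 103, '2023-01-04', 40), (2, 103, '2023-01-05', 50),
    (4, 108, '2023-01-01', 60), (4, 108, '2023-01-02', 48), (4, 108, '2023-01-03', 49),
    (4, 108, '2023-01-04', 50), (4, 108, '2023-01-05', 20);
""")
rows = conn.execute("""
    WITH low AS (
        SELECT supplier_id, product_id, record_date,
               LEAD(record_date) OVER (PARTITION BY supplier_id, product_id
                                       ORDER BY record_date) AS next_low_day
        FROM inventory
        WHERE stock_quantity < 50
    )
    SELECT supplier_id, product_id, record_date
    FROM low
    WHERE next_low_day = date(record_date, '+1 day')  -- enforce true adjacency
    ORDER BY supplier_id, product_id, record_date
""").fetchall()
print(rows)
```

On this subset the query returns nine start dates, including (4, 108, '2023-01-02') but not (4, 108, '2023-01-03').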
• Q.339
Question
Given a table containing product sales data, write a query to find the top-selling product by
revenue in each product category. Include the category, product name, and total sales for each
product.
Explanation
To solve this:
• Calculate the total sales (total_sales) for each product in each category by multiplying
the quantity sold by the price.
• Group the data by product category and product name, then sum up the total sales for each
product.
• Use a window function (RANK() or ROW_NUMBER()) to rank the products within each
category based on total sales in descending order.
• Retrieve the top-ranked product in each category, which will be the one with the highest
total sales.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE product_sales (
category VARCHAR(100),
product_name VARCHAR(100),
quantity_sold INT,
price DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO product_sales (category, product_name, quantity_sold, price)
VALUES
('Electronics', 'Laptop', 100, 800.00),
('Electronics', 'Smartphone', 150, 600.00),
('Electronics', 'Tablet', 120, 400.00),
('Electronics', 'Headphones', 200, 100.00),
('Furniture', 'Chair', 50, 150.00),
('Furniture', 'Table', 60, 300.00),
('Furniture', 'Sofa', 30, 500.00),
('Furniture', 'Bed', 40, 700.00),
('Clothing', 'T-shirt', 500, 20.00),
('Clothing', 'Jeans', 300, 40.00),
('Clothing', 'Jacket', 150, 60.00),
('Clothing', 'Sweater', 200, 45.00);

Learnings
• Aggregating data based on categories and products.
• Using SUM() to calculate total sales.


• Using window functions like RANK() or ROW_NUMBER() to rank products by their total sales
within each category.
• Using PARTITION BY with window functions to apply the ranking within each category.
Solutions
• - PostgreSQL solution
WITH ranked_sales AS (
SELECT
category,
product_name,
SUM(quantity_sold * price) AS total_sales,
RANK() OVER (PARTITION BY category ORDER BY SUM(quantity_sold * price) DESC) AS
sales_rank
FROM product_sales
GROUP BY category, product_name
)
SELECT
category,
product_name,
total_sales
FROM ranked_sales
WHERE sales_rank = 1
ORDER BY category;
• - MySQL solution
WITH ranked_sales AS (
SELECT
category,
product_name,
SUM(quantity_sold * price) AS total_sales,
RANK() OVER (PARTITION BY category ORDER BY SUM(quantity_sold * price) DESC) AS
sales_rank
FROM product_sales
GROUP BY category, product_name
)
SELECT
category,
product_name,
total_sales
FROM ranked_sales
WHERE sales_rank = 1
ORDER BY category;

Explanation
• PostgreSQL and MySQL:
• The SUM(quantity_sold * price) calculates the total sales for each product.
• RANK() is used to assign a rank to each product within its category, ordering by total sales
in descending order.
• The PARTITION BY category ensures the ranking is calculated separately for each
category.
• Only the top-ranked product in each category is selected by filtering on sales_rank = 1.
• The result shows the category, product name, and total sales for the top-selling product in
each category.
This query demonstrates the use of window functions (RANK()) to rank products within each
category based on their total sales, allowing you to find the top-sellers efficiently.
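The same ranking can be sanity-checked with sqlite3 (SQLite 3.25+); here the aggregation is pre-computed in a CTE and the window function ranks the resulting totals:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_sales (
    category TEXT, product_name TEXT, quantity_sold INTEGER, price REAL
);
INSERT INTO product_sales VALUES
    ('Electronics', 'Laptop', 100, 800.00), ('Electronics', 'Smartphone', 150, 600.00),
    ('Electronics', 'Tablet', 120, 400.00), ('Electronics', 'Headphones', 200, 100.00),
    ('Furniture', 'Chair', 50, 150.00), ('Furniture', 'Table', 60, 300.00),
    ('Furniture', 'Sofa', 30, 500.00), ('Furniture', 'Bed', 40, 700.00),
    ('Clothing', 'T-shirt', 500, 20.00), ('Clothing', 'Jeans', 300, 40.00),
    ('Clothing', 'Jacket', 150, 60.00), ('Clothing', 'Sweater', 200, 45.00);
""")
rows = conn.execute("""
    WITH totals AS (
        SELECT category, product_name, SUM(quantity_sold * price) AS total_sales
        FROM product_sales
        GROUP BY category, product_name
    )
    SELECT category, product_name, total_sales
    FROM (SELECT category, product_name, total_sales,
                 RANK() OVER (PARTITION BY category
                              ORDER BY total_sales DESC) AS rnk
          FROM totals) AS r
    WHERE rnk = 1
    ORDER BY category
""").fetchall()
print(rows)
```

The sample data yields Jeans (12000) for Clothing, Smartphone (90000) for Electronics, and Bed (28000) for Furniture.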
• Q.340
Question


Write a SQL query to find all possible combinations of employees and departments (cross
join). Include the employee's employee_id, employee_name, department_id, and
department_name.

Explanation
To solve this:
• Use a CROSS JOIN to combine all rows from the employees table with all rows from the
departments table.
• Ensure the output includes the employee's ID and name, as well as the department's ID and
name.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE employees (
employee_id INT,
employee_name VARCHAR(100)
);

CREATE TABLE departments (
department_id INT,
department_name VARCHAR(100)
);

-- Sample data insertions


INSERT INTO employees (employee_id, employee_name)
VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Emily Johnson');

INSERT INTO departments (department_id, department_name)
VALUES
(1, 'Sales'),
(2, 'Marketing'),
(3, 'Finance');

Learnings
• Using CROSS JOIN to generate all possible combinations of two tables.
• Understanding how Cartesian products work (each row of the first table is combined with
all rows from the second table).
Solutions
• - PostgreSQL and MySQL solution
SELECT
e.employee_id,
e.employee_name,
d.department_id,
d.department_name
FROM employees e
CROSS JOIN departments d;

Explanation
• The CROSS JOIN creates a Cartesian product of the employees and departments tables.
This means each employee will be paired with every department, resulting in all possible
combinations.
• The query outputs the employee_id, employee_name, department_id, and
department_name for each combination.


This query demonstrates how to use a CROSS JOIN to generate all combinations between two
tables without any filtering or condition.

Flipkart
• Q.341
Question
How would you concatenate two strings in SQL?
Explanation
To solve this:
• Use the string concatenation operator or function specific to the SQL dialect you are
working with.
• The goal is to combine two string values into a single string.
Datasets and SQL Schemas
No table creation is needed for this specific question as we are just focusing on string
concatenation.
Learnings
• Understanding how to concatenate strings in different SQL databases.
• Using the string concatenation operator (|| or +) or the built-in function (CONCAT()).
Solutions
• - PostgreSQL solution
SELECT 'Hello' || ' ' || 'World' AS concatenated_string;
• - MySQL solution
SELECT CONCAT('Hello', ' ', 'World') AS concatenated_string;

Explanation
• PostgreSQL: The || operator is used to concatenate strings.
• MySQL: The CONCAT() function is used to concatenate multiple strings.
In both examples, the query combines "Hello" and "World" with a space between them to
return the concatenated result "Hello World".
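SQLite, like PostgreSQL, follows the SQL-standard || operator, so the PostgreSQL form can be tried directly from Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite supports the standard || concatenation operator, like PostgreSQL.
result = conn.execute("SELECT 'Hello' || ' ' || 'World'").fetchone()[0]
print(result)  # Hello World
```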
• Q.342

Question
Write a query to find all the big countries from the World table. A country is considered big if
it has:
• An area greater than 3 million square kilometers, or
• A population of more than 25 million.
Output the name, population, and area of these countries.

Explanation
To solve this:
• Use a WHERE clause to filter the countries that either:

• Have an area greater than 3 million (area > 3000000), or
• Have a population greater than 25 million (population > 25000000).
• Select the relevant columns: name, population, and area.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE World (
name VARCHAR(255),
continent VARCHAR(255),
area INT,
population INT,
gdp INT
);

-- Sample data insertions


INSERT INTO World (name, continent, area, population, gdp)
VALUES
('Afghanistan', 'Asia', 652230, 25500100, 20343000),
('Albania', 'Europe', 28748, 2831741, 12960000),
('Algeria', 'Africa', 2381741, 37100000, 188681000),
('Andorra', 'Europe', 468, 78115, 3712000),
('Angola', 'Africa', 1246700, 20609294, 100990000);

Learnings
• Using logical conditions (OR) in the WHERE clause to filter rows.
• Filtering numeric data based on multiple criteria.

Solutions
• - PostgreSQL solution
SELECT name, population, area
FROM World
WHERE area > 3000000 OR population > 25000000;
• - MySQL solution
SELECT name, population, area
FROM World
WHERE area > 3000000 OR population > 25000000;
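The OR filter can be verified on the sample rows with sqlite3 (an ORDER BY name is added only to make the output deterministic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE World (
    name TEXT, continent TEXT, area INTEGER, population INTEGER, gdp INTEGER
);
INSERT INTO World VALUES
    ('Afghanistan', 'Asia', 652230, 25500100, 20343000),
    ('Albania', 'Europe', 28748, 2831741, 12960000),
    ('Algeria', 'Africa', 2381741, 37100000, 188681000),
    ('Andorra', 'Europe', 468, 78115, 3712000),
    ('Angola', 'Africa', 1246700, 20609294, 100990000);
""")
rows = conn.execute("""
    SELECT name, population, area
    FROM World
    WHERE area > 3000000 OR population > 25000000
    ORDER BY name
""").fetchall()
print(rows)  # Afghanistan and Algeria qualify by population; none by area
```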
• Q.343

Question
Write a SQL query to find pairs of (actor_id, director_id) where the actor has cooperated with the director at least 3 times.

Explanation
• Group by actor_id and director_id: We need to count how many times each (actor_id,
director_id) pair appears in the table.
• HAVING clause: We filter out the pairs that have less than 3 occurrences.
• Select distinct pairs: After applying the HAVING clause, select the pairs that meet the
condition.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE ActorDirector (
actor_id INT,
director_id INT,
timestamp INT PRIMARY KEY
);


-- Sample data insertions


INSERT INTO ActorDirector (actor_id, director_id, timestamp)
VALUES
(1, 1, 0),
(1, 1, 1),
(1, 1, 2),
(1, 2, 3),
(1, 2, 4),
(2, 1, 5),
(2, 1, 6);

Learnings
• GROUP BY: Useful for aggregating data (like counting occurrences) based on one or
more columns.
• HAVING: Used in combination with GROUP BY to filter groups based on an aggregate
condition.
• COUNT(): The aggregate function that counts the number of rows per group.

Solutions
• - PostgreSQL solution
SELECT actor_id, director_id
FROM ActorDirector
GROUP BY actor_id, director_id
HAVING COUNT(*) >= 3;
• - MySQL solution
SELECT actor_id, director_id
FROM ActorDirector
GROUP BY actor_id, director_id
HAVING COUNT(*) >= 3;
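The PostgreSQL and MySQL solutions are identical here. As a quick sanity check, the same query can be run against the sample data in an in-memory SQLite database (used below only because it ships with Python; the SQL itself is unchanged):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ActorDirector (actor_id INT, director_id INT, timestamp INT PRIMARY KEY);
INSERT INTO ActorDirector VALUES
  (1, 1, 0), (1, 1, 1), (1, 1, 2), (1, 2, 3), (1, 2, 4), (2, 1, 5), (2, 1, 6);
""")

# Only the pair (1, 1) appears three times; (1, 2) and (2, 1) appear twice each.
pairs = conn.execute("""
    SELECT actor_id, director_id
    FROM ActorDirector
    GROUP BY actor_id, director_id
    HAVING COUNT(*) >= 3
""").fetchall()
print(pairs)  # [(1, 1)]
```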
• Q.344

Question
Write a SQL query to retrieve the FirstName, LastName, City, and State for each person in
the Person table, regardless of whether there is an associated address for each person.

Explanation
• LEFT JOIN: Since we want to include all people from the Person table, even those
without an address, we should use a LEFT JOIN to join the Person table with the Address
table. This will include all rows from the Person table and matching rows from the Address
table. If no match is found, the columns from Address will contain NULL.
• Select required columns: After joining the tables, we select the FirstName, LastName,
City, and State from the result.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Person (
PersonId INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50)
);

CREATE TABLE Address (
AddressId INT PRIMARY KEY,
PersonId INT,
City VARCHAR(50),
State VARCHAR(50),
FOREIGN KEY (PersonId) REFERENCES Person(PersonId)
);

-- Sample data insertions


INSERT INTO Person (PersonId, FirstName, LastName)
VALUES
(1, 'John', 'Doe'),
(2, 'Jane', 'Smith'),
(3, 'Mike', 'Johnson');

INSERT INTO Address (AddressId, PersonId, City, State)


VALUES
(1, 1, 'New York', 'NY'),
(2, 2, 'Los Angeles', 'CA');

Learnings
• LEFT JOIN: Ensures that all records from the left table (Person) are included, even if
there is no matching record in the right table (Address).
• Handling NULLs: When there's no matching address, the City and State will be NULL
for that person.

Solutions
• - PostgreSQL solution
SELECT p.FirstName, p.LastName, a.City, a.State
FROM Person p
LEFT JOIN Address a ON p.PersonId = a.PersonId;
• - MySQL solution
SELECT p.FirstName, p.LastName, a.City, a.State
FROM Person p
LEFT JOIN Address a ON p.PersonId = a.PersonId;
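To see the NULL-preserving behaviour of LEFT JOIN on the sample data, the query can be replayed in an in-memory SQLite database (a verification sketch only; an ORDER BY is added for a deterministic row order, and the foreign key clause is omitted since SQLite does not enforce it by default):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (PersonId INT PRIMARY KEY, FirstName VARCHAR(50), LastName VARCHAR(50));
CREATE TABLE Address (AddressId INT PRIMARY KEY, PersonId INT, City VARCHAR(50), State VARCHAR(50));
INSERT INTO Person VALUES (1, 'John', 'Doe'), (2, 'Jane', 'Smith'), (3, 'Mike', 'Johnson');
INSERT INTO Address VALUES (1, 1, 'New York', 'NY'), (2, 2, 'Los Angeles', 'CA');
""")

# Mike has no address row, so his City and State come back as NULL (None in Python).
rows = conn.execute("""
    SELECT p.FirstName, p.LastName, a.City, a.State
    FROM Person p
    LEFT JOIN Address a ON p.PersonId = a.PersonId
    ORDER BY p.PersonId
""").fetchall()
print(rows)
# [('John', 'Doe', 'New York', 'NY'), ('Jane', 'Smith', 'Los Angeles', 'CA'),
#  ('Mike', 'Johnson', None, None)]
```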
• Q.345

Question
Write a SQL query to get the second-highest salary from the Employee table.

Explanation
• Subquery or Ranking Functions: We can use a subquery to find the maximum salary that
is less than the highest salary, which will effectively give us the second-highest salary.
• Handle edge case: If there's no second-highest salary (i.e., all employees have the same
salary), we need to return NULL.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Employee (
Id INT PRIMARY KEY,
Salary INT
);

-- Sample data insertions


INSERT INTO Employee (Id, Salary)
VALUES
(1, 100),
(2, 200),
(3, 300);

Learnings


• Subquery: Using a subquery to select the maximum salary less than the highest salary
gives us the second-highest value.
• Edge case: If there is no second-highest salary (e.g., only one distinct salary value), the
query should return NULL.

Solutions
• - PostgreSQL and MySQL solution
SELECT MAX(Salary) AS salary
FROM Employee
WHERE Salary < (SELECT MAX(Salary) FROM Employee);

Explanation:
• The subquery (SELECT MAX(Salary) FROM Employee) finds the highest salary.
• The outer query finds the maximum salary that is less than the highest salary, which is the
second-highest salary.
• If all salaries are the same, the WHERE clause will filter out all rows, and the result will be
NULL.
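Both the normal case and the NULL edge case can be checked against the sample data with an in-memory SQLite database (a verification sketch; the SQL is the same as above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Id INT PRIMARY KEY, Salary INT);
INSERT INTO Employee VALUES (1, 100), (2, 200), (3, 300);
""")

QUERY = """
    SELECT MAX(Salary) AS salary
    FROM Employee
    WHERE Salary < (SELECT MAX(Salary) FROM Employee)
"""
second = conn.execute(QUERY).fetchone()[0]
print(second)  # 200

# Edge case: with only one distinct salary, the WHERE clause filters out every
# row and MAX() over zero rows yields NULL (None in Python).
conn.execute("DELETE FROM Employee WHERE Id > 1")
edge = conn.execute(QUERY).fetchone()[0]
print(edge)  # None
```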
• Q.346

Question
Write a SQL query to find the employee with the maximum salary for each gender from the
Salary table.

Explanation
• Group By: We need to group the records by gender (sex column).
• Max Salary: For each gender group, we need to select the employee with the maximum
salary.
• Tie-breaker: If there are multiple employees with the same maximum salary in the same
gender, the query should still return all of them.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Salary (
id INT PRIMARY KEY,
name VARCHAR(255),
sex VARCHAR(1) CHECK (sex IN ('m', 'f')), -- ENUM('m', 'f') in MySQL; PostgreSQL needs CHECK or CREATE TYPE
salary INT
);

-- Sample data insertions


INSERT INTO Salary (id, name, sex, salary)
VALUES
(1, 'John', 'm', 5000),
(2, 'Alice', 'f', 7000),
(3, 'Bob', 'm', 8000),
(4, 'Charlie', 'f', 7000),
(5, 'David', 'm', 8000);

Learnings
• Aggregate functions: Using MAX() allows us to fetch the highest salary.
• Grouping: We need to group by sex to separate male and female employees.
• Handling ties: If multiple employees have the same highest salary, the query should return
all such employees.


Solutions
• - PostgreSQL and MySQL solution
SELECT sex, name, salary
FROM Salary
WHERE (sex, salary) IN (
SELECT sex, MAX(salary)
FROM Salary
GROUP BY sex
);

Explanation:
• The subquery (SELECT sex, MAX(salary) FROM Salary GROUP BY sex) finds the
maximum salary for each gender.
• The outer query selects the employees whose sex and salary match those maximum
values, thus returning the employees with the highest salary in each gender.
• If multiple employees share the highest salary within the same gender, they are all
returned.
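The tie-handling can be confirmed against the sample data in an in-memory SQLite database (a verification sketch; SQLite has no ENUM, so sex is declared as VARCHAR, and row-value comparisons like (sex, salary) IN (...) need SQLite 3.15 or later):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Salary (id INT PRIMARY KEY, name VARCHAR(255), sex VARCHAR(1), salary INT);
INSERT INTO Salary VALUES
  (1, 'John', 'm', 5000), (2, 'Alice', 'f', 7000), (3, 'Bob', 'm', 8000),
  (4, 'Charlie', 'f', 7000), (5, 'David', 'm', 8000);
""")

# Ties are kept: Alice and Charlie share the female maximum (7000),
# Bob and David share the male maximum (8000).
rows = conn.execute("""
    SELECT sex, name, salary
    FROM Salary
    WHERE (sex, salary) IN (SELECT sex, MAX(salary) FROM Salary GROUP BY sex)
    ORDER BY name
""").fetchall()
print(rows)
# [('f', 'Alice', 7000), ('m', 'Bob', 8000), ('f', 'Charlie', 7000), ('m', 'David', 8000)]
```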
• Q.347
Question
Find all numbers that appear at least three times consecutively in the "Logs" table.
Explanation
You need to identify numbers that appear in at least three consecutive rows. This can be
done by comparing each row with its two subsequent rows.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Logs (
id INT PRIMARY KEY,
num VARCHAR(10)
);

-- Sample data insertions


INSERT INTO Logs (id, num) VALUES
(1, '1'),
(2, '1'),
(3, '1'),
(4, '2'),
(5, '1'),
(6, '2'),
(7, '2');

Learnings
• Using JOIN to compare rows with their subsequent ones
• Identifying consecutive rows
• Applying aggregation with GROUP BY for uniqueness
Solutions
• - PostgreSQL solution
SELECT DISTINCT l1.num AS ConsecutiveNums
FROM Logs l1, Logs l2, Logs l3
WHERE l1.id = l2.id - 1 AND l2.id = l3.id - 1
AND l1.num = l2.num AND l2.num = l3.num;
• - MySQL solution
SELECT DISTINCT l1.num AS ConsecutiveNums
FROM Logs l1, Logs l2, Logs l3
WHERE l1.id = l2.id - 1 AND l2.id = l3.id - 1
AND l1.num = l2.num AND l2.num = l3.num;
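A quick replay of the triple self-join against the sample data, using an in-memory SQLite database as a stand-in (the SQL is unchanged):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Logs (id INT PRIMARY KEY, num VARCHAR(10));
INSERT INTO Logs VALUES (1,'1'),(2,'1'),(3,'1'),(4,'2'),(5,'1'),(6,'2'),(7,'2');
""")

# Only '1' fills three consecutive rows (ids 1-3); '2' peaks at two in a row.
rows = conn.execute("""
    SELECT DISTINCT l1.num AS ConsecutiveNums
    FROM Logs l1, Logs l2, Logs l3
    WHERE l1.id = l2.id - 1 AND l2.id = l3.id - 1
      AND l1.num = l2.num AND l2.num = l3.num
""").fetchall()
print(rows)  # [('1',)]
```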


• Q.348
Question
Find the number of followers each follower has if they themselves have at least one follower.
Explanation
You need to find each follower's follower count. This can be done by joining the "follow"
table on itself and counting how many distinct followers each user has.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE follow (
followee VARCHAR(10),
follower VARCHAR(10)
);

-- Sample data insertions


INSERT INTO follow (followee, follower) VALUES
('A', 'B'),
('B', 'C'),
('B', 'D'),
('D', 'E');

Learnings
• Self-join to count related records
• Using GROUP BY and COUNT() for aggregation
• Filtering with HAVING to include only those with at least one follower
Solutions
• - PostgreSQL solution
SELECT f1.follower, COUNT(DISTINCT f2.follower) AS num
FROM follow f1
JOIN follow f2 ON f1.follower = f2.followee
GROUP BY f1.follower
HAVING COUNT(DISTINCT f2.follower) > 0;
• - MySQL solution
SELECT f1.follower, COUNT(DISTINCT f2.follower) AS num
FROM follow f1
JOIN follow f2 ON f1.follower = f2.followee
GROUP BY f1.follower
HAVING COUNT(DISTINCT f2.follower) > 0;
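The self-join can be checked against the sample data with an in-memory SQLite database (a verification sketch; an ORDER BY is added for a deterministic result order):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE follow (followee VARCHAR(10), follower VARCHAR(10));
INSERT INTO follow VALUES ('A','B'), ('B','C'), ('B','D'), ('D','E');
""")

# Of the followers (B, C, D, E), only B and D are themselves followed:
# B by C and D (count 2), D by E (count 1).
rows = conn.execute("""
    SELECT f1.follower, COUNT(DISTINCT f2.follower) AS num
    FROM follow f1
    JOIN follow f2 ON f1.follower = f2.followee
    GROUP BY f1.follower
    HAVING COUNT(DISTINCT f2.follower) > 0
    ORDER BY f1.follower
""").fetchall()
print(rows)  # [('B', 2), ('D', 1)]
```

Note that the inner join already drops followers with no followers of their own, so the HAVING clause is a belt-and-braces guard rather than a necessity.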
• Q.349
Question
For each user, find the largest gap in days between consecutive visits or from the last visit to
today's date.
Explanation
You need to calculate the difference in days between each user's visit and the next visit (or
today’s date for the last visit). The largest gap should be selected for each user.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE UserVisits (
user_id INT,
visit_date DATE
);

-- Sample data insertions


INSERT INTO UserVisits (user_id, visit_date) VALUES


(1, '2020-11-28'),
(1, '2020-10-20'),
(1, '2020-12-03'),
(2, '2020-10-05'),
(2, '2020-12-09'),
(3, '2020-11-11');

Learnings
• Using LEAD() to get the next visit date
• Calculating the date difference with DATEDIFF()
• Supplying a default value in LEAD() (or using COALESCE()) so the last visit is compared against '2021-01-01'
• Grouping by user_id and selecting the maximum gap
Solutions
• - PostgreSQL solution
WITH VisitGaps AS (
SELECT user_id,
visit_date,
LEAD(visit_date, 1, '2021-01-01'::date) OVER (PARTITION BY user_id ORDER BY visit_date) AS next_visit_date
FROM UserVisits
)
SELECT user_id, MAX(DATE(next_visit_date) - DATE(visit_date)) AS biggest_window
FROM VisitGaps
GROUP BY user_id
ORDER BY user_id;
• - MySQL solution
WITH VisitGaps AS (
SELECT user_id,
visit_date,
LEAD(visit_date, 1, '2021-01-01') OVER (PARTITION BY user_id ORDER BY visit_date) AS next_visit_date
FROM UserVisits
)
SELECT user_id, MAX(DATEDIFF(next_visit_date, visit_date)) AS biggest_window
FROM VisitGaps
GROUP BY user_id
ORDER BY user_id;
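The window logic can be verified against the sample data in an in-memory SQLite database. SQLite has neither DATEDIFF() nor PostgreSQL-style date subtraction, so the date arithmetic below is adapted to julianday() differences; the LEAD() default argument works the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserVisits (user_id INT, visit_date DATE);
INSERT INTO UserVisits VALUES
  (1, '2020-11-28'), (1, '2020-10-20'), (1, '2020-12-03'),
  (2, '2020-10-05'), (2, '2020-12-09'), (3, '2020-11-11');
""")

# Largest gaps: user 1 -> 39 days (Oct 20 to Nov 28), user 2 -> 65 days
# (Oct 5 to Dec 9), user 3 -> 51 days (Nov 11 to Jan 1).
rows = conn.execute("""
    WITH VisitGaps AS (
        SELECT user_id, visit_date,
               LEAD(visit_date, 1, '2021-01-01') OVER (
                   PARTITION BY user_id ORDER BY visit_date) AS next_visit_date
        FROM UserVisits
    )
    SELECT user_id,
           CAST(MAX(julianday(next_visit_date) - julianday(visit_date)) AS INT) AS biggest_window
    FROM VisitGaps
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
print(rows)  # [(1, 39), (2, 65), (3, 51)]
```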
• Q.350
Question
Find the transaction IDs with the maximum amount for each day. If there are multiple
transactions with the same maximum amount on a day, return all of them.
Explanation
You need to identify the transaction(s) with the highest amount for each day. To achieve this,
you can first find the maximum amount per day and then filter the transactions based on that.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Transactions (
transaction_id INT PRIMARY KEY,
day TIMESTAMP, -- TIMESTAMP in PostgreSQL; use DATETIME in MySQL
amount INT
);

-- Sample data insertions


INSERT INTO Transactions (transaction_id, day, amount) VALUES
(8, '2021-04-03 15:57:28', 57),
(9, '2021-04-28 08:47:25', 21),
(1, '2021-04-29 13:28:30', 58),
(5, '2021-04-28 16:39:59', 40),
(6, '2021-04-29 23:39:28', 58);

Learnings
• Using MAX() with GROUP BY to find the maximum value
• Filtering results based on the maximum value for each day
• Sorting the result by transaction ID
Solutions
• - PostgreSQL solution
WITH MaxAmounts AS (
SELECT DATE(day) AS day, MAX(amount) AS max_amount
FROM Transactions
GROUP BY DATE(day)
)
SELECT t.transaction_id
FROM Transactions t
JOIN MaxAmounts ma ON DATE(t.day) = ma.day
WHERE t.amount = ma.max_amount
ORDER BY t.transaction_id;
• - MySQL solution
WITH MaxAmounts AS (
SELECT DATE(day) AS day, MAX(amount) AS max_amount
FROM Transactions
GROUP BY DATE(day)
)
SELECT t.transaction_id
FROM Transactions t
JOIN MaxAmounts ma ON DATE(t.day) = ma.day
WHERE t.amount = ma.max_amount
ORDER BY t.transaction_id;
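The tie case (two transactions of 58 on 2021-04-29) can be confirmed in an in-memory SQLite database (a verification sketch; the timestamps are zero-padded because SQLite's DATE() function only parses strict ISO-8601 strings):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (transaction_id INT PRIMARY KEY, day TIMESTAMP, amount INT);
INSERT INTO Transactions VALUES
  (8, '2021-04-03 15:57:28', 57), (9, '2021-04-28 08:47:25', 21),
  (1, '2021-04-29 13:28:30', 58), (5, '2021-04-28 16:39:59', 40),
  (6, '2021-04-29 23:39:28', 58);
""")

# Daily maxima: Apr 3 -> 57 (tx 8), Apr 28 -> 40 (tx 5), Apr 29 -> 58 (tx 1 and 6).
ids = [r[0] for r in conn.execute("""
    WITH MaxAmounts AS (
        SELECT DATE(day) AS day, MAX(amount) AS max_amount
        FROM Transactions
        GROUP BY DATE(day)
    )
    SELECT t.transaction_id
    FROM Transactions t
    JOIN MaxAmounts ma ON DATE(t.day) = ma.day
    WHERE t.amount = ma.max_amount
    ORDER BY t.transaction_id
""").fetchall()]
print(ids)  # [1, 5, 6, 8]
```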
• Q.351
Question
Find the records with three or more consecutive rows where the number of people is greater
than or equal to 100 for each row.
Explanation
You need to identify groups of three or more consecutive rows where the number of people is
greater than or equal to 100 for all rows in the group. You can achieve this by checking for
consecutive id values and applying a condition on the people column.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Stadium (
id INT PRIMARY KEY,
visit_date DATE,
people INT
);

-- Sample data insertions


INSERT INTO Stadium (id, visit_date, people) VALUES
(1, '2017-01-01', 10),
(2, '2017-01-02', 109),
(3, '2017-01-03', 150),
(4, '2017-01-04', 99),
(5, '2017-01-05', 145),
(6, '2017-01-06', 1455),
(7, '2017-01-07', 199),
(8, '2017-01-09', 188);

Learnings


• Identifying consecutive rows using id


• Filtering records with people >= 100
• Using LEAD() and LAG() window functions to check consecutive rows
• Ensuring that three or more consecutive rows meet the conditions
Solutions
• - PostgreSQL solution
WITH ConsecutiveStadium AS (
SELECT id, visit_date, people,
LAG(id, 1) OVER (ORDER BY id) AS prev_id,
LAG(id, 2) OVER (ORDER BY id) AS prev2_id,
LEAD(id, 1) OVER (ORDER BY id) AS next_id,
LEAD(id, 2) OVER (ORDER BY id) AS next2_id
FROM Stadium
WHERE people >= 100
)
SELECT id, visit_date, people
FROM ConsecutiveStadium
WHERE (next_id - id = 1 AND next2_id - id = 2)   -- first row of a run
OR (id - prev_id = 1 AND next_id - id = 1)       -- middle row of a run
OR (id - prev_id = 1 AND id - prev2_id = 2)      -- last row of a run
ORDER BY visit_date;
• - MySQL solution
WITH ConsecutiveStadium AS (
SELECT id, visit_date, people,
LAG(id, 1) OVER (ORDER BY id) AS prev_id,
LAG(id, 2) OVER (ORDER BY id) AS prev2_id,
LEAD(id, 1) OVER (ORDER BY id) AS next_id,
LEAD(id, 2) OVER (ORDER BY id) AS next2_id
FROM Stadium
WHERE people >= 100
)
SELECT id, visit_date, people
FROM ConsecutiveStadium
WHERE (next_id - id = 1 AND next2_id - id = 2)
OR (id - prev_id = 1 AND next_id - id = 1)
OR (id - prev_id = 1 AND id - prev2_id = 2)
ORDER BY visit_date;

Explanation:
• Filtering people >= 100 first and then comparing each remaining id with the ids one and two positions away keeps every row of a streak of three or more.
• Checking only the immediate neighbours would return just the middle rows of a streak and drop its first and last rows (ids 5 and 8 in the sample data).
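A window-function check that looks only at the immediate neighbours keeps just the middle rows of a streak; looking two rows ahead and behind as well keeps the endpoints, so every row of a qualifying run survives. Run against the sample data in an in-memory SQLite stand-in, this returns ids 5 through 8:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Stadium (id INT PRIMARY KEY, visit_date DATE, people INT);
INSERT INTO Stadium VALUES
  (1,'2017-01-01',10), (2,'2017-01-02',109), (3,'2017-01-03',150),
  (4,'2017-01-04',99), (5,'2017-01-05',145), (6,'2017-01-06',1455),
  (7,'2017-01-07',199), (8,'2017-01-09',188);
""")

# After the people >= 100 filter the ids are 2, 3, 5, 6, 7, 8; only 5-8 form
# a run of three or more consecutive ids.
rows = conn.execute("""
    WITH ConsecutiveStadium AS (
        SELECT id, visit_date, people,
               LAG(id, 1)  OVER (ORDER BY id) AS prev_id,
               LAG(id, 2)  OVER (ORDER BY id) AS prev2_id,
               LEAD(id, 1) OVER (ORDER BY id) AS next_id,
               LEAD(id, 2) OVER (ORDER BY id) AS next2_id
        FROM Stadium
        WHERE people >= 100
    )
    SELECT id, people
    FROM ConsecutiveStadium
    WHERE (next_id - id = 1 AND next2_id - id = 2)  -- first row of a run
       OR (id - prev_id = 1 AND next_id - id = 1)   -- middle row
       OR (id - prev_id = 1 AND id - prev2_id = 2)  -- last row
    ORDER BY id
""").fetchall()
print(rows)  # [(5, 145), (6, 1455), (7, 199), (8, 188)]
```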
• Q.352
Question
Calculate the total cubic feet of volume occupied by the inventory in each warehouse, based
on the product dimensions and units available in each warehouse.
Explanation
To calculate the cubic feet occupied by the inventory in each warehouse, you need to
multiply the dimensions (Width, Length, Height) of each product with the number of units in
the warehouse. Then, sum the volumes for each warehouse.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Warehouse (
name VARCHAR(100),
product_id INT,
units INT,
PRIMARY KEY (name, product_id)
);

CREATE TABLE Products (


product_id INT PRIMARY KEY,
product_name VARCHAR(100),
Width INT,
Length INT,
Height INT
);

-- Sample data insertions


INSERT INTO Warehouse (name, product_id, units) VALUES


('Warehouse1', 1, 100),
('Warehouse1', 2, 200),
('Warehouse2', 1, 50);

INSERT INTO Products (product_id, product_name, Width, Length, Height) VALUES


(1, 'ProductA', 2, 3, 4),
(2, 'ProductB', 1, 1, 2);

Learnings
• Using JOIN to combine data from multiple tables
• Calculating volume by multiplying product dimensions with the units
• Grouping by warehouse name to get the total volume
Solutions
• - PostgreSQL solution
SELECT w.name AS warehouse_name,
SUM(w.units * p.Width * p.Length * p.Height) AS volume
FROM Warehouse w
JOIN Products p ON w.product_id = p.product_id
GROUP BY w.name;
• - MySQL solution
SELECT w.name AS warehouse_name,
SUM(w.units * p.Width * p.Length * p.Height) AS volume
FROM Warehouse w
JOIN Products p ON w.product_id = p.product_id
GROUP BY w.name;
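The arithmetic can be confirmed against the sample data with an in-memory SQLite database (a verification sketch; an ORDER BY is added for a deterministic order):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Warehouse (name VARCHAR(100), product_id INT, units INT);
CREATE TABLE Products (product_id INT PRIMARY KEY, product_name VARCHAR(100),
                       Width INT, Length INT, Height INT);
INSERT INTO Warehouse VALUES ('Warehouse1', 1, 100), ('Warehouse1', 2, 200), ('Warehouse2', 1, 50);
INSERT INTO Products VALUES (1, 'ProductA', 2, 3, 4), (2, 'ProductB', 1, 1, 2);
""")

# Warehouse1: 100*2*3*4 + 200*1*1*2 = 2400 + 400 = 2800; Warehouse2: 50*2*3*4 = 1200.
rows = conn.execute("""
    SELECT w.name AS warehouse_name,
           SUM(w.units * p.Width * p.Length * p.Height) AS volume
    FROM Warehouse w
    JOIN Products p ON w.product_id = p.product_id
    GROUP BY w.name
    ORDER BY w.name
""").fetchall()
print(rows)  # [('Warehouse1', 2800), ('Warehouse2', 1200)]
```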
• Q.353
Question
Find the patient_id, patient_name, and all conditions for patients who have Type I
Diabetes, where the conditions contain codes that start with the prefix DIAB1.
Explanation
You need to filter patients whose conditions column contains at least one code starting with
DIAB1. Because the codes are space-separated, the pattern must anchor DIAB1 to the start of
a code: either the whole string starts with DIAB1, or DIAB1 follows a space. A bare
'%DIAB1%' would also match a code such as ADIAB100.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Patients (
patient_id INT PRIMARY KEY,
patient_name VARCHAR(100),
conditions VARCHAR(255)
);

-- Sample data insertions


INSERT INTO Patients (patient_id, patient_name, conditions) VALUES
(1, 'Daniel', 'YFEV COUGH'),
(2, 'Alice', ''),
(3, 'Bob', 'DIAB100 MYOP'),
(4, 'George', 'ACNE DIAB100'),
(5, 'Alain', 'DIAB201');

Learnings
• Using the LIKE operator to filter rows based on pattern matching
• Handling conditions stored as space-separated strings
• Querying based on a specific prefix (DIAB1)
Solutions


• - PostgreSQL solution
SELECT patient_id, patient_name, conditions
FROM Patients
WHERE conditions LIKE 'DIAB1%' OR conditions LIKE '% DIAB1%';
• - MySQL solution
SELECT patient_id, patient_name, conditions
FROM Patients
WHERE conditions LIKE 'DIAB1%' OR conditions LIKE '% DIAB1%';
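One subtlety with LIKE here: '%DIAB1%' matches the substring anywhere, so a code that merely contains DIAB1 (such as ADIAB100) would slip through, while anchoring DIAB1 to the start of a code does not. The check below, run in an in-memory SQLite stand-in, adds one hypothetical extra patient (row 6, not part of the original sample data) to expose the difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patients (patient_id INT PRIMARY KEY, patient_name VARCHAR(100),
                       conditions VARCHAR(255));
INSERT INTO Patients VALUES
  (1, 'Daniel', 'YFEV COUGH'), (2, 'Alice', ''), (3, 'Bob', 'DIAB100 MYOP'),
  (4, 'George', 'ACNE DIAB100'), (5, 'Alain', 'DIAB201'),
  -- hypothetical extra row: a code that merely *contains* DIAB1
  (6, 'Eve', 'ADIAB100');
""")

anchored = conn.execute("""
    SELECT patient_id FROM Patients
    WHERE conditions LIKE 'DIAB1%' OR conditions LIKE '% DIAB1%'
    ORDER BY patient_id
""").fetchall()
unanchored = conn.execute("""
    SELECT patient_id FROM Patients
    WHERE conditions LIKE '%DIAB1%'
    ORDER BY patient_id
""").fetchall()
print(anchored)    # [(3,), (4,)]      -- only codes that start with DIAB1
print(unanchored)  # [(3,), (4,), (6,)] -- ADIAB100 slips through
```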
• Q.354
Question
Find the second most recent activity for each user. If a user has only one activity, return that
activity.
Explanation
To find the second most recent activity, you need to order the activities for each user by
startDate in descending order and then fetch the second row (or the first row if only one
exists). You can achieve this using ROW_NUMBER() to rank the activities for each user and then
filter the result.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE UserActivity (
username VARCHAR(100),
activity VARCHAR(100),
startDate DATE,
endDate DATE
);

-- Sample data insertions


INSERT INTO UserActivity (username, activity, startDate, endDate) VALUES
('Alice', 'Travel', '2020-02-12', '2020-02-20'),
('Alice', 'Dancing', '2020-02-21', '2020-02-23'),
('Alice', 'Travel', '2020-02-24', '2020-02-28'),
('Bob', 'Travel', '2020-02-11', '2020-02-18');

Learnings
• Using ROW_NUMBER() window function to rank rows for each user
• Filtering rows based on the rank for the second most recent activity
• Handling cases where there is only one activity for a user
Solutions
• - PostgreSQL solution
WITH RankedActivities AS (
SELECT username, activity, startDate, endDate,
ROW_NUMBER() OVER (PARTITION BY username ORDER BY startDate DESC) AS activity_rank
FROM UserActivity
)
SELECT username, activity, startDate, endDate
FROM RankedActivities
WHERE activity_rank = 2
UNION
SELECT username, activity, startDate, endDate
FROM RankedActivities
WHERE activity_rank = 1
AND username NOT IN (SELECT username FROM RankedActivities WHERE activity_rank = 2)
ORDER BY username;
• - MySQL solution
WITH RankedActivities AS (
SELECT username, activity, startDate, endDate,
ROW_NUMBER() OVER (PARTITION BY username ORDER BY startDate DESC) AS activity_rank
FROM UserActivity
)
SELECT username, activity, startDate, endDate
FROM RankedActivities
WHERE activity_rank = 2
UNION
SELECT username, activity, startDate, endDate
FROM RankedActivities
WHERE activity_rank = 1
AND username NOT IN (SELECT username FROM RankedActivities WHERE activity_rank = 2)
ORDER BY username;
• Q.355
Question
Find the students who did not score the highest or lowest score in any exam they took. These
students are considered "quiet". A student who has never taken an exam should not be
included.
Explanation
To identify "quiet" students, you need to:
• Find the highest and lowest scores for each exam.
• Check which students' scores do not match the highest or lowest score for each exam.
• Return only those students who satisfy this condition for every exam they took.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Student (
student_id INT PRIMARY KEY,
student_name VARCHAR(100)
);

CREATE TABLE Exam (


exam_id INT,
student_id INT,
score INT,
PRIMARY KEY (exam_id, student_id),
FOREIGN KEY (student_id) REFERENCES Student(student_id)
);

-- Sample data insertions


INSERT INTO Student (student_id, student_name) VALUES
(1, 'Daniel'),
(2, 'Jade'),
(3, 'Stella'),
(4, 'Jonathan'),
(5, 'Will');

INSERT INTO Exam (exam_id, student_id, score) VALUES


(10, 1, 70),
(10, 2, 80),
(10, 3, 90),
(20, 1, 80),
(30, 1, 70),
(30, 3, 80),
(30, 4, 90),
(40, 1, 60),
(40, 2, 70),
(40, 4, 80);

Learnings
• Using GROUP BY and HAVING to calculate maximum and minimum scores per exam


• Using NOT IN to filter students who didn't have the highest or lowest score
• Ensuring the condition applies to all exams a student participated in
Solutions
• - PostgreSQL solution
WITH ExamStats AS (
SELECT exam_id, MAX(score) AS max_score, MIN(score) AS min_score
FROM Exam
GROUP BY exam_id
)
SELECT s.student_id, s.student_name
FROM Student s
JOIN Exam e ON s.student_id = e.student_id
JOIN ExamStats es ON e.exam_id = es.exam_id
GROUP BY s.student_id, s.student_name
HAVING SUM(CASE
WHEN e.score = es.max_score OR e.score = es.min_score THEN 1
ELSE 0
END) = 0
ORDER BY s.student_id;
• - MySQL solution
WITH ExamStats AS (
SELECT exam_id, MAX(score) AS max_score, MIN(score) AS min_score
FROM Exam
GROUP BY exam_id
)
SELECT s.student_id, s.student_name
FROM Student s
JOIN Exam e ON s.student_id = e.student_id
JOIN ExamStats es ON e.exam_id = es.exam_id
GROUP BY s.student_id, s.student_name
HAVING SUM(CASE
WHEN e.score = es.max_score OR e.score = es.min_score THEN 1
ELSE 0
END) = 0
ORDER BY s.student_id;
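The full query can be replayed against the sample data in an in-memory SQLite database (a verification sketch; the SQL is unchanged):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (student_id INT PRIMARY KEY, student_name VARCHAR(100));
CREATE TABLE Exam (exam_id INT, student_id INT, score INT, PRIMARY KEY (exam_id, student_id));
INSERT INTO Student VALUES (1,'Daniel'),(2,'Jade'),(3,'Stella'),(4,'Jonathan'),(5,'Will');
INSERT INTO Exam VALUES
  (10,1,70),(10,2,80),(10,3,90),(20,1,80),(30,1,70),
  (30,3,80),(30,4,90),(40,1,60),(40,2,70),(40,4,80);
""")

# Daniel is the minimum in every exam he took, Stella and Jonathan each top an
# exam, and Will took no exams; only Jade is never highest or lowest.
rows = conn.execute("""
    WITH ExamStats AS (
        SELECT exam_id, MAX(score) AS max_score, MIN(score) AS min_score
        FROM Exam
        GROUP BY exam_id
    )
    SELECT s.student_id, s.student_name
    FROM Student s
    JOIN Exam e ON s.student_id = e.student_id
    JOIN ExamStats es ON e.exam_id = es.exam_id
    GROUP BY s.student_id, s.student_name
    HAVING SUM(CASE WHEN e.score = es.max_score OR e.score = es.min_score
                    THEN 1 ELSE 0 END) = 0
    ORDER BY s.student_id
""").fetchall()
print(rows)  # [(2, 'Jade')]
```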
• Q.356
Question
Query all columns for all Marvel cities in the CITY table with populations larger than
100,000. The CountryCode for Marvel is 'Marv'.
Explanation
You need to filter the cities that belong to Marvel (i.e., where CountryCode = 'Marv') and
have a population greater than 100,000. Select all columns from the CITY table that match
these conditions.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE CITY (
ID INT,
Name VARCHAR(100),
CountryCode VARCHAR(3),
Population INT
);

-- Sample data insertions


INSERT INTO CITY (ID, Name, CountryCode, Population) VALUES
(1, 'New York', 'USA', 8175133),
(2, 'Los Angeles', 'USA', 3792621),
(3, 'Gotham', 'Marv', 1500000),
(4, 'Metropolis', 'Marv', 1200000),
(5, 'Central City', 'Marv', 200000),
(6, 'Smallville', 'Marv', 50000);


Learnings
• Filtering data based on conditions with WHERE
• Using specific country code filter for Marvel cities
• Selecting all columns with SELECT * for the result
Solutions
• - PostgreSQL solution
SELECT *
FROM CITY
WHERE CountryCode = 'Marv' AND Population > 100000;
• - MySQL solution
SELECT *
FROM CITY
WHERE CountryCode = 'Marv' AND Population > 100000;
• Q.357
Question
Find the shortest distance between two points on the x-axis from the given list of points.
Explanation
To find the shortest distance between two points, first, you need to calculate the absolute
differences between all pairs of points. The minimum of these differences will be the shortest
distance. You can achieve this by joining the table with itself and comparing all pairs of
points.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE point (
x INT
);

-- Sample data insertions


INSERT INTO point (x) VALUES
(-1),
(0),
(2);

Learnings
• Using self-joins to compare all pairs of points
• Using the ABS() function to calculate the absolute difference
• Filtering and finding the minimum distance
Solutions
• - PostgreSQL solution
SELECT MIN(ABS(p1.x - p2.x)) AS shortest
FROM point p1, point p2
WHERE p1.x < p2.x;
• - MySQL solution
SELECT MIN(ABS(p1.x - p2.x)) AS shortest
FROM point p1, point p2
WHERE p1.x < p2.x;
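A quick check against the sample points, using an in-memory SQLite database as a stand-in (the SQL is unchanged; the p1.x < p2.x condition already makes the difference positive, so ABS() is a safety net rather than a requirement here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE point (x INT);
INSERT INTO point VALUES (-1), (0), (2);
""")

# Pairwise gaps: |-1 - 0| = 1, |-1 - 2| = 3, |0 - 2| = 2; the minimum is 1.
shortest = conn.execute("""
    SELECT MIN(ABS(p1.x - p2.x)) AS shortest
    FROM point p1, point p2
    WHERE p1.x < p2.x
""").fetchone()[0]
print(shortest)  # 1
```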
• Q.358
Question
Find the top 3 brands with the highest average product ratings in each category from a
products table and a reviews table.


Explanation
You need to join the "products" table with the "reviews" table, group the results by category
and brand, calculate the average product rating for each brand in each category, and then
filter to get the top 3 brands for each category.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
brand VARCHAR(50),
category VARCHAR(50)
);

CREATE TABLE reviews (


review_id INT PRIMARY KEY,
product_id INT,
rating DECIMAL(3, 2),
review_date DATE,
FOREIGN KEY (product_id) REFERENCES products(product_id)
);

-- Sample data insertions


INSERT INTO products (product_id, product_name, brand, category)
VALUES
(1, 'Smartphone A', 'BrandX', 'Electronics'),
(2, 'Laptop B', 'BrandY', 'Electronics'),
(3, 'Headphones C', 'BrandX', 'Accessories'),
(4, 'Smartphone D', 'BrandZ', 'Electronics'),
(5, 'Speaker E', 'BrandY', 'Accessories');

INSERT INTO reviews (review_id, product_id, rating, review_date)


VALUES
(1, 1, 4.5, '2025-01-10'),
(2, 2, 3.8, '2025-01-11'),
(3, 3, 4.0, '2025-01-12'),
(4, 4, 4.2, '2025-01-13'),
(5, 5, 4.7, '2025-01-14');

Learnings
• Using JOINs to combine product and review data.
• GROUP BY to group data by category and brand.
• AVG() to calculate average ratings.
• Using ROW_NUMBER() or RANK() for filtering top results.
Solutions
• - PostgreSQL solution
WITH ranked_brands AS (
SELECT p.category,
p.brand,
AVG(r.rating) AS avg_rating,
ROW_NUMBER() OVER (PARTITION BY p.category ORDER BY AVG(r.rating) DESC) AS brand_rank
FROM products p
JOIN reviews r ON p.product_id = r.product_id
GROUP BY p.category, p.brand
)
SELECT category, brand, avg_rating
FROM ranked_brands
WHERE brand_rank <= 3
ORDER BY category, brand_rank;
• - MySQL solution
WITH ranked_brands AS (
SELECT p.category,
p.brand,
AVG(r.rating) AS avg_rating,
RANK() OVER (PARTITION BY p.category ORDER BY AVG(r.rating) DESC) AS brand_rank
FROM products p
JOIN reviews r ON p.product_id = r.product_id
GROUP BY p.category, p.brand
)
SELECT category, brand, avg_rating
FROM ranked_brands
WHERE brand_rank <= 3
ORDER BY category, brand_rank;

Note: RANK is a reserved word in MySQL 8.0, so the alias is named brand_rank rather than rank.
• Q.359
Question
Identify and remove products with customer feedback that contains inappropriate words (e.g.,
nudity or offensive language) from a product review system. Only include reviews that do not
contain flagged words.
Explanation
You need to filter out reviews that contain inappropriate or offensive words using a
predefined list of such words. You will join the "products" and "reviews" tables, then apply a
filter to exclude reviews with flagged words.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50)
);

CREATE TABLE reviews (


review_id INT PRIMARY KEY,
product_id INT,
review_text TEXT,
rating DECIMAL(3, 2),
review_date DATE,
FOREIGN KEY (product_id) REFERENCES products(product_id)
);

-- Sample data insertions


INSERT INTO products (product_id, product_name, category)
VALUES
(1, 'Smartphone A', 'Electronics'),
(2, 'Laptop B', 'Electronics'),
(3, 'Headphones C', 'Accessories');

INSERT INTO reviews (review_id, product_id, review_text, rating, review_date)


VALUES
(1, 1, 'Great phone, but the design could be better. No nudity here!', 4.5, '2025-01
-10'),
(2, 2, 'Amazing laptop! Love it!', 5.0, '2025-01-11'),
(3, 3, 'Horrible product, waste of money. Terrible quality!', 1.0, '2025-01-12'),
(4, 1, 'Perfect phone. No complaints.', 5.0, '2025-01-13');

Learnings
• Filtering text data for specific words using LIKE or REGEXP (regular expressions).
• JOINs to combine product and review data.
• Using WHERE clause to exclude reviews with inappropriate content.
Solutions
• - PostgreSQL solution
SELECT p.product_name, r.review_text, r.rating


FROM products p
JOIN reviews r ON p.product_id = r.product_id
WHERE NOT (r.review_text ILIKE '%nudity%' OR r.review_text ILIKE '%offensive_word%')
ORDER BY r.review_date;
• - MySQL solution
SELECT p.product_name, r.review_text, r.rating
FROM products p
JOIN reviews r ON p.product_id = r.product_id
WHERE NOT (r.review_text LIKE '%nudity%' OR r.review_text LIKE '%offensive_word%')
ORDER BY r.review_date;
• Q.360
Question
Identify customers who have returned products in their last 3 consecutive orders and
categorize them as "suspect" for potential abuse.
Explanation
You need to join the "orders" and "returns" tables, filter for customers who have made 3
consecutive returns, and categorize them as "suspect." The data should be ordered by order
date to identify the sequence of returns.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100)
);

CREATE TABLE orders (


order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
order_status VARCHAR(20),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

CREATE TABLE returns (


return_id INT PRIMARY KEY,
order_id INT,
return_date DATE,
FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

-- Sample data insertions


INSERT INTO customers (customer_id, customer_name)
VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Alice Johnson');

INSERT INTO orders (order_id, customer_id, order_date, order_status)


VALUES
(1, 1, '2025-01-01', 'Completed'),
(2, 1, '2025-01-05', 'Completed'),
(3, 1, '2025-01-10', 'Completed'),
(4, 2, '2025-01-02', 'Completed'),
(5, 2, '2025-01-07', 'Completed'),
(6, 3, '2025-01-04', 'Completed');

INSERT INTO returns (return_id, order_id, return_date)


VALUES
(1, 1, '2025-01-02'),
(2, 2, '2025-01-06'),
(3, 3, '2025-01-11'),
(4, 4, '2025-01-03');


Learnings
• Using JOINs to combine multiple tables.
• Filtering with WHERE clause to identify returns.
• ROW_NUMBER() or LAG() to identify consecutive orders.
• Categorizing customers based on consecutive actions.
Solutions
• - PostgreSQL solution
WITH recent_returns AS (
SELECT o.customer_id, o.order_id, o.order_date, r.return_date,
ROW_NUMBER() OVER (PARTITION BY o.customer_id ORDER BY o.order_date DESC) AS rn
FROM returns r
JOIN orders o ON r.order_id = o.order_id
WHERE o.order_status = 'Completed'
)
SELECT customer_id, COUNT(*) AS return_count
FROM recent_returns
WHERE rn <= 3
GROUP BY customer_id
HAVING COUNT(*) = 3
ORDER BY customer_id;

Note: customer_id is a column of orders, not returns, so it must be read from the orders side of the join.
• - MySQL solution
WITH recent_returns AS (
SELECT o.customer_id, o.order_id, o.order_date, r.return_date,
ROW_NUMBER() OVER (PARTITION BY o.customer_id ORDER BY o.order_date DESC) AS rn
FROM returns r
JOIN orders o ON r.order_id = o.order_id
WHERE o.order_status = 'Completed'
)
SELECT customer_id, COUNT(*) AS return_count
FROM recent_returns
WHERE rn <= 3
GROUP BY customer_id
HAVING COUNT(*) = 3
ORDER BY customer_id;
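Since customer_id lives on the orders table rather than on returns, the CTE must read it from the orders side of the join. A runnable check against the sample data (in-memory SQLite stand-in; o.customer_id throughout):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INT PRIMARY KEY, customer_name VARCHAR(100));
CREATE TABLE orders (order_id INT PRIMARY KEY, customer_id INT,
                     order_date DATE, order_status VARCHAR(20));
CREATE TABLE returns (return_id INT PRIMARY KEY, order_id INT, return_date DATE);
INSERT INTO customers VALUES (1,'John Doe'),(2,'Jane Smith'),(3,'Alice Johnson');
INSERT INTO orders VALUES
  (1,1,'2025-01-01','Completed'),(2,1,'2025-01-05','Completed'),
  (3,1,'2025-01-10','Completed'),(4,2,'2025-01-02','Completed'),
  (5,2,'2025-01-07','Completed'),(6,3,'2025-01-04','Completed');
INSERT INTO returns VALUES
  (1,1,'2025-01-02'),(2,2,'2025-01-06'),(3,3,'2025-01-11'),(4,4,'2025-01-03');
""")

# Customer 1 returned all three of their orders; customer 2 returned only one.
rows = conn.execute("""
    WITH recent_returns AS (
        SELECT o.customer_id,
               ROW_NUMBER() OVER (PARTITION BY o.customer_id
                                  ORDER BY o.order_date DESC) AS rn
        FROM returns r
        JOIN orders o ON r.order_id = o.order_id
        WHERE o.order_status = 'Completed'
    )
    SELECT customer_id, COUNT(*) AS return_count
    FROM recent_returns
    WHERE rn <= 3
    GROUP BY customer_id
    HAVING COUNT(*) = 3
""").fetchall()
print(rows)  # [(1, 3)]
```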

Spotify
• Datasets

Note

Please copy the datasets below into pgAdmin or MySQL, run the queries to create the
tables and insert the data, and then solve the questions in the next sections!

1. Users Table
CREATE TABLE users (
user_id INT PRIMARY KEY,
user_name VARCHAR(100),
email VARCHAR(100),
country VARCHAR(50),
subscription_type VARCHAR(50), -- Free, Premium
sign_up_date DATE,
last_login TIMESTAMP
);

INSERT INTO users (user_id, user_name, email, country, subscription_type, sign_up_date, last_login)
VALUES
(1, 'JohnDoe', '[email protected]', 'USA', 'Premium', '2020-05-01', '2024-12-25'),
(2, 'JaneSmith', '[email protected]', 'UK', 'Free', '2021-02-11', '2024-12-20'),
(3, 'MikeJones', '[email protected]', 'Canada', 'Premium', '2019-08-15', '2024-12-22'),
(4, 'EmilyWhite', '[email protected]', 'Australia', 'Premium', '2022-01-10', '2024-12-10'),
(5, 'DavidClark', '[email protected]', 'India', 'Free', '2020-07-14', '2024-11-15'),
(6, 'SarahDavis', '[email protected]', 'USA', 'Premium', '2019-12-05', '2024-12-28'),
(7, 'KevinWilson', '[email protected]', 'UK', 'Free', '2021-11-20', '2024-12-05'),
(8, 'OliviaBrown', '[email protected]', 'Canada', 'Premium', '2020-08-12', '2024-12-23'),
(9, 'LucasMartin', '[email protected]', 'Germany', 'Premium', '2021-06-14', '2024-12-10'),
(10, 'SophiaTaylor', '[email protected]', 'USA', 'Free', '2018-10-01', '2024-12-18'),
(11, 'DanielLopez', '[email protected]', 'Mexico', 'Premium', '2019-01-22', '2024-12-15'),
(12, 'EmmaMiller', '[email protected]', 'Australia', 'Free', '2022-04-02', '2024-12-09'),
(13, 'HenryMartinez', '[email protected]', 'Spain', 'Free', '2020-03-01', '2024-11-30'),
(14, 'JackMoore', '[email protected]', 'UK', 'Premium', '2019-11-15', '2024-12-26'),
(15, 'AvaMartinez', '[email protected]', 'India', 'Free', '2021-07-05', '2024-12-12'),
(16, 'LiamGarcia', '[email protected]', 'USA', 'Premium', '2021-01-13', '2024-12-14'),
(17, 'CharlotteWilson', '[email protected]', 'Canada', 'Premium', '2021-09-09', '2024-12-22'),
(18, 'JamesKing', '[email protected]', 'Germany', 'Free', '2021-05-25', '2024-12-03'),
(19, 'AmeliaLopez', '[email protected]', 'Mexico', 'Free', '2020-06-17', '2024-11-29'),
(20, 'EthanHernandez', '[email protected]', 'USA', 'Premium', '2019-09-11', '2024-12-30');

2. Artists Table
CREATE TABLE artists (
artist_id INT PRIMARY KEY,
artist_name VARCHAR(100),
genre VARCHAR(50),
country VARCHAR(50),
date_joined TIMESTAMP
);

INSERT INTO artists (artist_id, artist_name, genre, country, date_joined)


VALUES
(1, 'Ariana Grande', 'Pop', 'USA', '2014-06-01'),
(2, 'Drake', 'Hip-Hop', 'Canada', '2012-03-23'),
(3, 'Ed Sheeran', 'Pop', 'UK', '2011-05-14'),
(4, 'BTS', 'K-Pop', 'South Korea', '2013-05-01'),
(5, 'Taylor Swift', 'Pop', 'USA', '2006-10-24'),
(6, 'Billie Eilish', 'Alternative', 'USA', '2015-11-01'),
(7, 'Shakira', 'Pop', 'Colombia', '2001-07-01'),
(8, 'Drake', 'Hip-Hop', 'Canada', '2011-10-30'),
(9, 'Imagine Dragons', 'Rock', 'USA', '2008-09-01'),
(10, 'Dua Lipa', 'Pop', 'UK', '2014-11-01'),
(11, 'Bruno Mars', 'Pop', 'USA', '2009-07-01'),
(12, 'Coldplay', 'Rock', 'UK', '1996-10-01'),
(13, 'Kanye West', 'Hip-Hop', 'USA', '2004-02-11'),
(14, 'Post Malone', 'Hip-Hop', 'USA', '2015-10-10'),
(15, 'Maroon 5', 'Pop', 'USA', '2001-03-01'),
(16, 'The Weeknd', 'R&B', 'Canada', '2010-07-01'),
(17, 'Rihanna', 'Pop', 'Barbados', '2005-08-01'),
(18, 'Lil Nas X', 'Hip-Hop', 'USA', '2018-12-01'),
(19, 'Kendrick Lamar', 'Hip-Hop', 'USA', '2004-10-25'),
(20, 'Miley Cyrus', 'Pop', 'USA', '2006-04-10');

454
1000+ SQL Interview Questions & Answers | By Zero Analyst

3. Albums Table
CREATE TABLE albums (
album_id INT PRIMARY KEY,
album_name VARCHAR(100),
artist_id INT,
release_date DATE,
genre VARCHAR(50),
total_tracks INT,
FOREIGN KEY (artist_id) REFERENCES artists(artist_id)
);

INSERT INTO albums (album_id, album_name, artist_id, release_date, genre, total_tracks)
VALUES
(1, 'Thank U, Next', 1, '2019-02-08', 'Pop', 12),
(2, 'Scorpion', 2, '2018-06-29', 'Hip-Hop', 25),
(3, 'Divide', 3, '2017-03-03', 'Pop', 12),
(4, 'Map of the Soul: Persona', 4, '2019-04-12', 'K-Pop', 7),
(5, '1989', 5, '2014-10-27', 'Pop', 13),
(6, 'When We All Fall Asleep, Where Do We Go?', 6, '2019-03-29', 'Alternative', 14),
(7, 'El Dorado', 7, '2017-05-26', 'Pop', 11),
(8, 'Nothing Was the Same', 8, '2013-09-24', 'Hip-Hop', 13),
(9, 'Evolve', 9, '2017-06-23', 'Rock', 12),
(10, 'Future Nostalgia', 10, '2020-03-27', 'Pop', 11),
(11, '24K Magic', 11, '2016-11-18', 'Pop', 9),
(12, 'Ghost Stories', 12, '2014-05-19', 'Rock', 9),
(13, 'The Life of Pablo', 13, '2016-02-14', 'Hip-Hop', 10),
(14, 'Beer Bongs & Bentleys', 14, '2018-04-27', 'Hip-Hop', 18),
(15, 'Songs About Jane', 15, '2002-06-25', 'Pop', 12),
(16, 'After Hours', 16, '2020-03-20', 'R&B', 14),
(17, 'Good Girl Gone Bad', 17, '2007-05-30', 'Pop', 12),
(18, 'Montero', 18, '2021-09-17', 'Hip-Hop', 15),
(19, 'DAMN.', 19, '2017-04-14', 'Hip-Hop', 14),
(20, 'Bangerz', 20, '2013-10-08', 'Pop', 13);

4. Tracks Table
CREATE TABLE tracks (
track_id INT PRIMARY KEY,
track_name VARCHAR(100),
album_id INT,
artist_id INT,
genre VARCHAR(50),
duration INT, -- in seconds
FOREIGN KEY (album_id) REFERENCES albums(album_id),
FOREIGN KEY (artist_id) REFERENCES artists(artist_id)
);

INSERT INTO tracks (track_id, track_name, album_id, artist_id, genre, duration)
VALUES
(1, '7 Rings', 1, 1, 'Pop', 180),
(2, 'God''s Plan', 2, 2, 'Hip-Hop', 199),
(3, 'Shape of You', 3, 3, 'Pop', 233),
(4, 'Boy with Luv', 4, 4, 'K-Pop', 210),
(5, 'Shake It Off', 5, 5, 'Pop', 241),
(6, 'Bad Guy', 6, 6, 'Alternative', 194),
(7, 'Hips Don''t Lie', 7, 7, 'Pop', 211),
(8, 'In My Feelings', 8, 8, 'Hip-Hop', 242),
(9, 'Thunder', 9, 9, 'Rock', 210),
(10, 'Don''t Start Now', 10, 10, 'Pop', 183),
(11, 'Uptown Funk', 11, 11, 'Pop', 268),
(12, 'Viva la Vida', 12, 12, 'Rock', 242),
(13, 'Stronger', 13, 13, 'Hip-Hop', 231),
(14, 'Rockstar', 14, 14, 'Hip-Hop', 222),
(15, 'This Love', 15, 15, 'Pop', 221),
(16, 'Blinding Lights', 16, 16, 'R&B', 200),
(17, 'Umbrella', 17, 17, 'Pop', 273),
(18, 'Industry Baby', 18, 18, 'Hip-Hop', 220),
(19, 'HUMBLE.', 19, 19, 'Hip-Hop', 192),


(20, 'Wrecking Ball', 20, 20, 'Pop', 262);
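Several track names above ('God's Plan', 'Hips Don't Lie', 'Don't Start Now') contain a single quote, which must be escaped inside a SQL string literal. The portable form doubles the quote; MySQL additionally accepts a backslash escape (unless NO_BACKSLASH_ESCAPES is enabled). A quick sketch:

```sql
-- Standard SQL (works in MySQL, PostgreSQL, SQL Server): double the quote
SELECT 'God''s Plan' AS track_name;

-- MySQL-specific alternative: backslash escape
SELECT 'God\'s Plan' AS track_name;
```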

• Q.361
Find the total number of tracks played by users from the USA in the last 6 months.

💡
Learnings:
• Use JOIN to combine the user_activity, users, and tracks tables.
• Apply date filters and group by user country.

Explanation:
This query gives insights into user activity in the USA, specifically the number of tracks
played by users within the last 6 months.

Expected Outcome:
A single result showing the total number of tracks played by users from the USA in the past 6
months.

Solution (MySQL):

SELECT u.country, COUNT(ua.track_id) AS total_tracks_played
FROM user_activity ua
JOIN users u ON ua.user_id = u.user_id
JOIN tracks t ON ua.track_id = t.track_id
WHERE u.country = 'USA' AND ua.played_at >= NOW() - INTERVAL 6 MONTH
GROUP BY u.country;
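The interval arithmetic above is MySQL syntax; the other dialects this book targets spell the same date filter differently. A sketch of the equivalent WHERE clause (PostgreSQL quotes the interval literal, SQL Server uses DATEADD):

```sql
-- PostgreSQL: the interval literal must be quoted
WHERE u.country = 'USA'
  AND ua.played_at >= NOW() - INTERVAL '6 months'

-- SQL Server: no NOW()/INTERVAL; use DATEADD with GETDATE()
WHERE u.country = 'USA'
  AND ua.played_at >= DATEADD(MONTH, -6, GETDATE())
```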

• Q.362
Identify the top 5 users who have listened to the most distinct tracks in the last 30 days.

💡
Learnings:
• Use COUNT(DISTINCT) for distinct tracks.
• Filter by the last 30 days using NOW() - INTERVAL 30 DAY.

Explanation:
This query helps identify users who have shown the most variety in their listening habits over
the past month.

Expected Outcome:
The top 5 users who have listened to the most distinct tracks in the last 30 days.

Solution (MySQL):

SELECT ua.user_id, COUNT(DISTINCT ua.track_id) AS distinct_tracks_played
FROM user_activity ua
WHERE ua.played_at >= NOW() - INTERVAL 30 DAY
GROUP BY ua.user_id
ORDER BY distinct_tracks_played DESC
LIMIT 5;
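Note that LIMIT 5 cuts ties arbitrarily: if two users share fifth place, one of them is silently dropped. A ties-aware variant using a window function (valid in MySQL 8+ and PostgreSQL; shown here in MySQL interval syntax):

```sql
WITH ranked AS (
    SELECT ua.user_id,
           COUNT(DISTINCT ua.track_id) AS distinct_tracks_played,
           DENSE_RANK() OVER (ORDER BY COUNT(DISTINCT ua.track_id) DESC) AS rnk
    FROM user_activity ua
    WHERE ua.played_at >= NOW() - INTERVAL 30 DAY
    GROUP BY ua.user_id
)
SELECT user_id, distinct_tracks_played
FROM ranked
WHERE rnk <= 5;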

• Q.363
Calculate the average play duration per track in the last 60 days, grouped by genre.

💡
Learnings:
• Use AVG() to calculate the average.
• Group by genre and apply filters for the last 60 days.

Explanation:
This query calculates how long tracks in each genre are played on average, helping identify
user preferences.

Expected Outcome:
The average play duration for each genre in the last 60 days.

Solution (MySQL):

SELECT t.genre, AVG(ua.play_duration) AS avg_play_duration
FROM user_activity ua
JOIN tracks t ON ua.track_id = t.track_id
WHERE ua.played_at >= NOW() - INTERVAL 60 DAY
GROUP BY t.genre;
• Q.364
Find all users who have listened to more than 50 tracks by a specific artist (e.g., Drake).

💡
Learnings:
• Use JOIN to connect user_activity, tracks, and artists.
• Use HAVING to filter results based on track count.

Explanation:
This query identifies users with a strong preference for an artist (e.g., Drake), showing active
engagement with their tracks.

Expected Outcome:
A list of users who have played more than 50 tracks by Drake.

Solution (MySQL):

SELECT ua.user_id, t.artist_id, COUNT(ua.track_id) AS tracks_played
FROM user_activity ua
JOIN tracks t ON ua.track_id = t.track_id
JOIN artists a ON t.artist_id = a.artist_id
WHERE a.artist_name = 'Drake'
GROUP BY ua.user_id, t.artist_id
HAVING COUNT(ua.track_id) > 50;
• Q.365


List the top 3 genres with the highest total play duration in the last 3 months.

💡
Learnings:
• Use SUM() to calculate total play duration.
• Filter results by the last 3 months.

Explanation:
This query identifies the top 3 genres based on the total play time over the past 3 months,
helping to understand genre popularity.

Expected Outcome:
The top 3 genres with the highest play duration.

Solution (MySQL):

SELECT t.genre, SUM(ua.play_duration) AS total_play_duration
FROM user_activity ua
JOIN tracks t ON ua.track_id = t.track_id
WHERE ua.played_at >= NOW() - INTERVAL 3 MONTH
GROUP BY t.genre
ORDER BY total_play_duration DESC
LIMIT 3;

• Q.366
Find users who have listened to the most tracks from the album 'Scorpion' by Drake.

💡
Learnings:
• Use JOIN to link user_activity, tracks, and albums.
• Apply filters for album name and track count.

Explanation:
This query helps identify users who have shown a high preference for Drake’s "Scorpion"
album.

Expected Outcome:
A list of users who have listened to the most tracks from "Scorpion."

Solution (MySQL):

SELECT ua.user_id, COUNT(ua.track_id) AS tracks_played
FROM user_activity ua
JOIN tracks t ON ua.track_id = t.track_id
JOIN albums al ON t.album_id = al.album_id
WHERE al.album_name = 'Scorpion'
GROUP BY ua.user_id
ORDER BY tracks_played DESC;
• Q.367
Identify the top 5 artists with the most number of tracks played in the last 60 days.


💡
Learnings:
• Aggregate by artist and filter by date.
• Use COUNT() to count the number of tracks played.

Explanation:
This query identifies the top 5 artists who have had the most engagement with listeners in the
last 60 days.

Expected Outcome:
The top 5 artists based on the number of tracks played.

Solution (MySQL):

SELECT a.artist_name, COUNT(ua.track_id) AS tracks_played
FROM user_activity ua
JOIN tracks t ON ua.track_id = t.track_id
JOIN artists a ON t.artist_id = a.artist_id
WHERE ua.played_at >= NOW() - INTERVAL 60 DAY
GROUP BY a.artist_name
ORDER BY tracks_played DESC
LIMIT 5;
• Q.368
List all tracks that were played more than 1000 times in the last 30 days.

💡
Learnings:
• Aggregate track plays using COUNT().
• Apply date filters and use HAVING to filter tracks with high play counts.

Explanation:
This query identifies popular tracks that have been played over 1000 times in the past 30
days.

Expected Outcome:
A list of tracks with over 1000 plays in the last month.

Solution (MySQL):

SELECT t.track_name, COUNT(ua.track_id) AS play_count
FROM user_activity ua
JOIN tracks t ON ua.track_id = t.track_id
WHERE ua.played_at >= NOW() - INTERVAL 30 DAY
GROUP BY t.track_name
HAVING COUNT(ua.track_id) > 1000;
• Q.369
Retrieve the top 3 most popular albums based on total plays in the last 90 days.


💡
Learnings:
• Use aggregation functions like COUNT() to measure popularity.
• Apply date filters and ORDER BY to find the top albums.

Explanation:
This query helps to identify the albums with the most engagement in the last 90 days.

Expected Outcome:
The top 3 most played albums.

Solution (MySQL):

SELECT al.album_name, COUNT(ua.track_id) AS total_plays
FROM user_activity ua
JOIN tracks t ON ua.track_id = t.track_id
JOIN albums al ON t.album_id = al.album_id
WHERE ua.played_at >= NOW() - INTERVAL 90 DAY
GROUP BY al.album_name
ORDER BY total_plays DESC
LIMIT 3;

• Q.370
Identify users who have upgraded from Free to Premium subscription in the last 6
months.

💡
Learnings:
• Use JOIN to link the users table with itself.
• Apply WHERE and AND to filter by subscription status change.

Explanation:
This query finds users who have moved from Free to Premium subscriptions, providing
insight into user behavior and potential engagement strategies.

Expected Outcome:
A list of users who upgraded their subscription in the last 6 months.

Solution (MySQL):

SELECT u1.user_name, u1.subscription_type AS previous_subscription, u2.subscription_type AS current_subscription
FROM users u1
JOIN users u2 ON u1.user_id = u2.user_id
WHERE u1.subscription_type = 'Free' AND u2.subscription_type = 'Premium'
AND u2.sign_up_date > u1.sign_up_date
AND u2.sign_up_date >= NOW() - INTERVAL 6 MONTH;
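The self-join above can only surface an upgrade if the database keeps one row per subscription period per user. A minimal sketch under that assumption, using a hypothetical subscription_history table that is not part of this chapter's sample schema (MySQL syntax):

```sql
-- Hypothetical table, one row per subscription period per user:
-- subscription_history(user_id INT, subscription_type VARCHAR(20), start_date DATE)

SELECT DISTINCT h_free.user_id
FROM subscription_history h_free
JOIN subscription_history h_prem
  ON h_free.user_id = h_prem.user_id
WHERE h_free.subscription_type = 'Free'
  AND h_prem.subscription_type = 'Premium'
  AND h_prem.start_date > h_free.start_date
  AND h_prem.start_date >= NOW() - INTERVAL 6 MONTH;
```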
• Q.371
Question


Find the top 3 Indian artists with the most number of tracks that have a rating greater than
4.5, grouped by genre.
Explanation
You need to join the "artists," "tracks," and "ratings" tables. Filter tracks that have ratings
greater than 4.5, then group the data by genre and artist. For each artist in each genre, count
the number of such high-rated tracks and retrieve the top 3 artists.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE artists (
artist_id INT PRIMARY KEY,
artist_name VARCHAR(100),
nationality VARCHAR(50),
genre VARCHAR(50)
);

CREATE TABLE tracks (
track_id INT PRIMARY KEY,
track_name VARCHAR(100),
artist_id INT,
release_date DATE,
FOREIGN KEY (artist_id) REFERENCES artists(artist_id)
);

CREATE TABLE ratings (
rating_id INT PRIMARY KEY,
track_id INT,
rating DECIMAL(3, 2),
rating_date DATE,
FOREIGN KEY (track_id) REFERENCES tracks(track_id)
);

-- Sample data insertions


INSERT INTO artists (artist_id, artist_name, nationality, genre)
VALUES
(1, 'Arijit Singh', 'Indian', 'Bollywood'),
(2, 'Ravi Shankar', 'Indian', 'Classical'),
(3, 'Nina Rao', 'Indian', 'Folk'),
(4, 'A. R. Rahman', 'Indian', 'Bollywood'),
(5, 'Shankar Mahadevan', 'Indian', 'Classical');

INSERT INTO tracks (track_id, track_name, artist_id, release_date)
VALUES
(1, 'Tum Hi Ho', 1, '2025-01-01'),
(2, 'Morning Raga', 2, '2025-01-02'),
(3, 'Madhubala', 3, '2025-01-03'),
(4, 'Jai Ho', 4, '2025-01-04'),
(5, 'Breathless', 5, '2025-01-05');

INSERT INTO ratings (rating_id, track_id, rating, rating_date)
VALUES
(1, 1, 4.6, '2025-01-10'),
(2, 2, 4.8, '2025-01-11'),
(3, 3, 3.9, '2025-01-12'),
(4, 4, 4.7, '2025-01-13'),
(5, 5, 4.9, '2025-01-14');

Learnings
• Filtering data based on specific conditions using the WHERE clause.
• JOINs to combine related tables.
• COUNT() to count the number of high-rated tracks.
• GROUP BY to group the results by artist and genre.
Solutions


• - PostgreSQL solution
WITH high_rated_tracks AS (
SELECT a.artist_name, a.genre, COUNT(DISTINCT t.track_id) AS track_count
FROM artists a
JOIN tracks t ON a.artist_id = t.artist_id
JOIN ratings r ON t.track_id = r.track_id
WHERE a.nationality = 'Indian' AND r.rating > 4.5
GROUP BY a.artist_name, a.genre
)
SELECT artist_name, genre, track_count
FROM high_rated_tracks
ORDER BY track_count DESC
LIMIT 3;
• - MySQL solution
WITH high_rated_tracks AS (
SELECT a.artist_name, a.genre, COUNT(DISTINCT t.track_id) AS track_count
FROM artists a
JOIN tracks t ON a.artist_id = t.artist_id
JOIN ratings r ON t.track_id = r.track_id
WHERE a.nationality = 'Indian' AND r.rating > 4.5
GROUP BY a.artist_name, a.genre
)
SELECT artist_name, genre, track_count
FROM high_rated_tracks
ORDER BY track_count DESC
LIMIT 3;
• Q.372
Question
Identify the top 3 albums in the 'Pop' genre with the highest average track ratings, but only
consider albums released within the last 6 months.
Explanation
You need to join the "albums," "tracks," and "ratings" tables. Filter albums based on the
genre ('Pop') and release date (last 6 months). Then, calculate the average rating of tracks
within each album, rank the albums by average rating, and retrieve the top 3.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE albums (
album_id INT PRIMARY KEY,
album_name VARCHAR(100),
genre VARCHAR(50),
release_date DATE
);

CREATE TABLE tracks (
track_id INT PRIMARY KEY,
album_id INT,
track_name VARCHAR(100),
FOREIGN KEY (album_id) REFERENCES albums(album_id)
);

CREATE TABLE ratings (
rating_id INT PRIMARY KEY,
track_id INT,
rating DECIMAL(3, 2),
rating_date DATE,
FOREIGN KEY (track_id) REFERENCES tracks(track_id)
);

-- Sample data insertions


INSERT INTO albums (album_id, album_name, genre, release_date)
VALUES
(1, 'Pop Hits 2025', 'Pop', '2025-07-01'),
(2, 'Summer Pop', 'Pop', '2025-09-10'),


(3, 'Classic Pop', 'Pop', '2024-12-01'),
(4, 'Indie Pop Vibes', 'Pop', '2025-06-15'),
(5, 'Pop Forever', 'Pop', '2025-08-20');

INSERT INTO tracks (track_id, track_name, album_id)
VALUES
(1, 'Track 1', 1),
(2, 'Track 2', 1),
(3, 'Track 3', 2),
(4, 'Track 4', 2),
(5, 'Track 5', 3),
(6, 'Track 6', 4),
(7, 'Track 7', 4),
(8, 'Track 8', 5);

INSERT INTO ratings (rating_id, track_id, rating, rating_date)
VALUES
(1, 1, 4.7, '2025-01-10'),
(2, 2, 4.5, '2025-01-12'),
(3, 3, 4.9, '2025-01-15'),
(4, 4, 4.6, '2025-01-18'),
(5, 5, 3.9, '2025-01-20'),
(6, 6, 4.8, '2025-01-25'),
(7, 7, 4.7, '2025-01-30'),
(8, 8, 4.5, '2025-02-01');

Learnings
• Using JOINs to combine album, track, and rating data.
• GROUP BY to aggregate ratings by album.
• AVG() to calculate the average rating for each album.
• Filtering by release date and genre.
• Using ORDER BY to rank albums by average rating.
Solutions
• - PostgreSQL solution
WITH album_ratings AS (
SELECT a.album_id, a.album_name, AVG(r.rating) AS avg_rating
FROM albums a
JOIN tracks t ON a.album_id = t.album_id
JOIN ratings r ON t.track_id = r.track_id
WHERE a.genre = 'Pop'
AND a.release_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY a.album_id, a.album_name
)
SELECT album_name, avg_rating
FROM album_ratings
ORDER BY avg_rating DESC
LIMIT 3;
• - MySQL solution
WITH album_ratings AS (
SELECT a.album_id, a.album_name, AVG(r.rating) AS avg_rating
FROM albums a
JOIN tracks t ON a.album_id = t.album_id
JOIN ratings r ON t.track_id = r.track_id
WHERE a.genre = 'Pop'
AND a.release_date >= CURDATE() - INTERVAL 6 MONTH
GROUP BY a.album_id, a.album_name
)
SELECT album_name, avg_rating
FROM album_ratings
ORDER BY avg_rating DESC
LIMIT 3;
• Q.373
Question


Identify artists who have at least 5 tracks with an average rating of 4.5 or higher, but only
count tracks released within the last 12 months. Also, filter out any artists who have more
than 3 tracks with a rating below 3.5.
Explanation
You need to join the "artists," "tracks," and "ratings" tables. Filter for tracks released within the last 12 months and calculate the average rating of each track. Then keep artists that have at least 5 such tracks with an average rating of 4.5 or higher, and exclude any artist with more than 3 tracks rated below 3.5. The result should show artists meeting both conditions.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE artists (
artist_id INT PRIMARY KEY,
artist_name VARCHAR(100),
nationality VARCHAR(50),
genre VARCHAR(50)
);

CREATE TABLE tracks (
track_id INT PRIMARY KEY,
artist_id INT,
track_name VARCHAR(100),
release_date DATE,
FOREIGN KEY (artist_id) REFERENCES artists(artist_id)
);

CREATE TABLE ratings (
rating_id INT PRIMARY KEY,
track_id INT,
rating DECIMAL(3, 2),
rating_date DATE,
FOREIGN KEY (track_id) REFERENCES tracks(track_id)
);

-- Sample data insertions


INSERT INTO artists (artist_id, artist_name, nationality, genre)
VALUES
(1, 'John Doe', 'Indian', 'Pop'),
(2, 'Jane Smith', 'American', 'Rock'),
(3, 'Alice Johnson', 'Indian', 'Pop'),
(4, 'Ravi Shankar', 'Indian', 'Classical');

INSERT INTO tracks (track_id, artist_id, track_name, release_date)
VALUES
(1, 1, 'Track 1', '2025-03-01'),
(2, 1, 'Track 2', '2025-02-10'),
(3, 1, 'Track 3', '2025-01-20'),
(4, 1, 'Track 4', '2024-11-30'),
(5, 2, 'Track 5', '2024-06-15'),
(6, 2, 'Track 6', '2025-04-01'),
(7, 3, 'Track 7', '2025-01-10'),
(8, 3, 'Track 8', '2025-05-15'),
(9, 4, 'Track 9', '2025-01-01');

INSERT INTO ratings (rating_id, track_id, rating, rating_date)
VALUES
(1, 1, 4.6, '2025-01-01'),
(2, 2, 4.9, '2025-01-02'),
(3, 3, 3.8, '2025-01-03'),
(4, 4, 4.7, '2025-01-10'),
(5, 5, 3.0, '2024-07-01'),
(6, 6, 4.5, '2025-04-05'),
(7, 7, 4.8, '2025-01-15'),
(8, 8, 4.9, '2025-06-10'),
(9, 9, 2.9, '2025-01-10');


Learnings
• Using JOINs to combine multiple tables.
• WHERE clause to filter by release dates and ratings.
• HAVING clause to apply conditions after aggregation (e.g., the minimum number of
tracks).
• COUNT() and AVG() to aggregate and compute the total number of tracks and average
ratings.
• GROUP BY to group by artist.
Solutions
• - PostgreSQL solution
WITH track_ratings AS (
SELECT t.artist_id, t.track_id, AVG(r.rating) AS avg_rating
FROM tracks t
JOIN ratings r ON t.track_id = r.track_id
WHERE t.release_date >= CURRENT_DATE - INTERVAL '12 months'
GROUP BY t.artist_id, t.track_id
),
artist_ratings AS (
SELECT artist_id, COUNT(*) AS high_rating_tracks
FROM track_ratings
WHERE avg_rating >= 4.5
GROUP BY artist_id
HAVING COUNT(*) >= 5
)
SELECT a.artist_name
FROM artists a
JOIN artist_ratings ar ON a.artist_id = ar.artist_id
-- Exclude artists with more than 3 low-rated tracks; an inner join on this
-- condition would wrongly drop artists that have no low-rated tracks at all
WHERE a.artist_id NOT IN (
SELECT t.artist_id
FROM tracks t
JOIN ratings r ON t.track_id = r.track_id
WHERE r.rating < 3.5
GROUP BY t.artist_id
HAVING COUNT(DISTINCT t.track_id) > 3
);
• - MySQL solution
WITH track_ratings AS (
SELECT t.artist_id, t.track_id, AVG(r.rating) AS avg_rating
FROM tracks t
JOIN ratings r ON t.track_id = r.track_id
WHERE t.release_date >= CURDATE() - INTERVAL 12 MONTH
GROUP BY t.artist_id, t.track_id
),
artist_ratings AS (
SELECT artist_id, COUNT(*) AS high_rating_tracks
FROM track_ratings
WHERE avg_rating >= 4.5
GROUP BY artist_id
HAVING COUNT(*) >= 5
)
SELECT a.artist_name
FROM artists a
JOIN artist_ratings ar ON a.artist_id = ar.artist_id
-- Exclude artists with more than 3 low-rated tracks; an inner join on this
-- condition would wrongly drop artists that have no low-rated tracks at all
WHERE a.artist_id NOT IN (
SELECT t.artist_id
FROM tracks t
JOIN ratings r ON t.track_id = r.track_id
WHERE r.rating < 3.5
GROUP BY t.artist_id
HAVING COUNT(DISTINCT t.track_id) > 3
);
• Q.374


Question
Identify the top 3 Indian users who have the highest average rating on tracks they have
listened to, but only include tracks from the 'Pop' genre that were released in the last 6
months.
Explanation
You need to join the "users," "tracks," "ratings," and "albums" tables. Filter the tracks by
genre ('Pop') and release date (last 6 months). Then, calculate the average rating of tracks for
each user. Finally, rank the users by their average rating and retrieve the top 3.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE users (
user_id INT PRIMARY KEY,
user_name VARCHAR(100),
nationality VARCHAR(50)
);

CREATE TABLE albums (
album_id INT PRIMARY KEY,
album_name VARCHAR(100),
genre VARCHAR(50),
release_date DATE
);

CREATE TABLE tracks (
track_id INT PRIMARY KEY,
album_id INT,
track_name VARCHAR(100),
FOREIGN KEY (album_id) REFERENCES albums(album_id)
);

CREATE TABLE ratings (
rating_id INT PRIMARY KEY,
track_id INT,
user_id INT,
rating DECIMAL(3, 2),
rating_date DATE,
FOREIGN KEY (track_id) REFERENCES tracks(track_id),
FOREIGN KEY (user_id) REFERENCES users(user_id)
);

-- Sample data insertions


INSERT INTO users (user_id, user_name, nationality)
VALUES
(1, 'Raj Sharma', 'Indian'),
(2, 'Priya Patel', 'Indian'),
(3, 'John Doe', 'American'),
(4, 'Amit Gupta', 'Indian');

INSERT INTO albums (album_id, album_name, genre, release_date)
VALUES
(1, 'Pop Hits 2025', 'Pop', '2025-03-01'),
(2, 'Pop Party', 'Pop', '2025-05-10'),
(3, 'Rock Classics', 'Rock', '2024-12-01'),
(4, 'Indie Pop Vibes', 'Pop', '2025-02-15');

INSERT INTO tracks (track_id, album_id, track_name)
VALUES
(1, 1, 'Track 1'),
(2, 1, 'Track 2'),
(3, 2, 'Track 3'),
(4, 2, 'Track 4'),
(5, 3, 'Track 5'),
(6, 4, 'Track 6'),
(7, 4, 'Track 7');


INSERT INTO ratings (rating_id, track_id, user_id, rating, rating_date)
VALUES
(1, 1, 1, 4.5, '2025-01-10'),
(2, 2, 1, 4.7, '2025-01-12'),
(3, 3, 2, 3.8, '2025-02-05'),
(4, 4, 2, 4.2, '2025-02-10'),
(5, 5, 4, 2.9, '2025-02-01'),
(6, 6, 4, 4.8, '2025-03-05'),
(7, 7, 1, 4.6, '2025-03-10');

Learnings
• Using JOINs to combine data from multiple tables (users, tracks, ratings, albums).
• Filtering data based on genre and release date.
• AVG() to calculate the average rating for each user.
• GROUP BY to group the results by user.
• Using ORDER BY to sort the results and retrieve the top users.
Solutions
• - PostgreSQL solution
WITH user_ratings AS (
SELECT r.user_id, AVG(r.rating) AS avg_rating
FROM ratings r
JOIN tracks t ON r.track_id = t.track_id
JOIN albums a ON t.album_id = a.album_id
WHERE a.genre = 'Pop'
AND a.release_date >= CURRENT_DATE - INTERVAL '6 months'
AND r.user_id IN (SELECT user_id FROM users WHERE nationality = 'Indian')
GROUP BY r.user_id
)
SELECT u.user_name, ur.avg_rating
FROM user_ratings ur
JOIN users u ON ur.user_id = u.user_id
ORDER BY ur.avg_rating DESC
LIMIT 3;
• - MySQL solution
WITH user_ratings AS (
SELECT r.user_id, AVG(r.rating) AS avg_rating
FROM ratings r
JOIN tracks t ON r.track_id = t.track_id
JOIN albums a ON t.album_id = a.album_id
WHERE a.genre = 'Pop'
AND a.release_date >= CURDATE() - INTERVAL 6 MONTH
AND r.user_id IN (SELECT user_id FROM users WHERE nationality = 'Indian')
GROUP BY r.user_id
)
SELECT u.user_name, ur.avg_rating
FROM user_ratings ur
JOIN users u ON ur.user_id = u.user_id
ORDER BY ur.avg_rating DESC
LIMIT 3;
• Q.375
Question
Identify the top 3 Indian albums in the 'Classical' genre that have the highest average track
rating, considering only albums released in the last 12 months. Also, each album must have
at least 3 tracks with ratings of 4.0 or higher.
Explanation
You need to join the "albums," "tracks," and "ratings" tables. Filter albums by genre
('Classical') and release date (last 12 months). Then, calculate the average rating of tracks for


each album, and filter albums that have at least 3 tracks with ratings of 4.0 or higher. Finally,
rank the albums by their average track rating and retrieve the top 3.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE albums (
album_id INT PRIMARY KEY,
album_name VARCHAR(100),
genre VARCHAR(50),
release_date DATE
);

CREATE TABLE tracks (
track_id INT PRIMARY KEY,
album_id INT,
track_name VARCHAR(100),
FOREIGN KEY (album_id) REFERENCES albums(album_id)
);

CREATE TABLE ratings (
rating_id INT PRIMARY KEY,
track_id INT,
rating DECIMAL(3, 2),
rating_date DATE,
FOREIGN KEY (track_id) REFERENCES tracks(track_id)
);

-- Sample data insertions


INSERT INTO albums (album_id, album_name, genre, release_date)
VALUES
(1, 'Classical Masterpieces', 'Classical', '2024-11-01'),
(2, 'Indian Ragas 2025', 'Classical', '2025-02-10'),
(3, 'Soulful Classical', 'Classical', '2025-03-15'),
(4, 'Hindustani Classical', 'Classical', '2024-09-20');

INSERT INTO tracks (track_id, album_id, track_name)
VALUES
(1, 1, 'Raga Yaman'),
(2, 1, 'Raga Bhairav'),
(3, 1, 'Raga Hamsadhwani'),
(4, 2, 'Raga Bageshree'),
(5, 2, 'Raga Darbari Kanada'),
(6, 2, 'Raga Marwa'),
(7, 3, 'Raga Malkauns'),
(8, 3, 'Raga Desh'),
(9, 4, 'Raga Shankarabharanam');

INSERT INTO ratings (rating_id, track_id, rating, rating_date)
VALUES
(1, 1, 4.6, '2025-01-10'),
(2, 2, 4.7, '2025-01-12'),
(3, 3, 3.8, '2025-01-15'),
(4, 4, 4.9, '2025-01-20'),
(5, 5, 4.5, '2025-02-01'),
(6, 6, 3.9, '2025-02-10'),
(7, 7, 4.7, '2025-03-01'),
(8, 8, 4.8, '2025-03-05'),
(9, 9, 4.2, '2025-03-10');

Learnings
• Filtering data based on genre and release date.
• Using JOINs to combine albums, tracks, and ratings.
• HAVING clause to filter albums that meet the track rating threshold.
• AVG() to calculate the average rating for each album.
• GROUP BY to group by album and apply filtering after aggregation.
Solutions


• - PostgreSQL solution
WITH album_ratings AS (
SELECT a.album_id, a.album_name, AVG(r.rating) AS avg_rating, COUNT(CASE WHEN r.rating >= 4.0 THEN 1 END) AS high_rating_count
FROM albums a
JOIN tracks t ON a.album_id = t.album_id
JOIN ratings r ON t.track_id = r.track_id
WHERE a.genre = 'Classical'
AND a.release_date >= CURRENT_DATE - INTERVAL '12 months'
GROUP BY a.album_id, a.album_name
HAVING COUNT(CASE WHEN r.rating >= 4.0 THEN 1 END) >= 3
)
SELECT album_name, avg_rating
FROM album_ratings
ORDER BY avg_rating DESC
LIMIT 3;
• - MySQL solution
WITH album_ratings AS (
SELECT a.album_id, a.album_name, AVG(r.rating) AS avg_rating, COUNT(CASE WHEN r.rating >= 4.0 THEN 1 END) AS high_rating_count
FROM albums a
JOIN tracks t ON a.album_id = t.album_id
JOIN ratings r ON t.track_id = r.track_id
WHERE a.genre = 'Classical'
AND a.release_date >= CURDATE() - INTERVAL 12 MONTH
GROUP BY a.album_id, a.album_name
HAVING COUNT(CASE WHEN r.rating >= 4.0 THEN 1 END) >= 3
)
SELECT album_name, avg_rating
FROM album_ratings
ORDER BY avg_rating DESC
LIMIT 3;
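PostgreSQL also offers a FILTER clause (PostgreSQL 9.4+) that expresses the conditional count more directly than COUNT(CASE ...). A PostgreSQL-only sketch of the same aggregation:

```sql
SELECT a.album_id, a.album_name,
       AVG(r.rating) AS avg_rating,
       COUNT(*) FILTER (WHERE r.rating >= 4.0) AS high_rating_count
FROM albums a
JOIN tracks t ON a.album_id = t.album_id
JOIN ratings r ON t.track_id = r.track_id
WHERE a.genre = 'Classical'
  AND a.release_date >= CURRENT_DATE - INTERVAL '12 months'
GROUP BY a.album_id, a.album_name
HAVING COUNT(*) FILTER (WHERE r.rating >= 4.0) >= 3
ORDER BY avg_rating DESC
LIMIT 3;
```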
• Q.376
Question
Identify the peak hours when the most Indian users are active on Spotify, based on the
timestamps of when they rated tracks. Show the top 3 hours during which the highest
number of ratings are given by users.
Explanation
You need to analyze the ratings table by extracting the hour from the rating_date field.
Then, filter for Indian users and count the number of ratings given by users during each hour.
Finally, retrieve the top 3 hours during which the most ratings are given.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE users (
user_id INT PRIMARY KEY,
user_name VARCHAR(100),
nationality VARCHAR(50)
);

CREATE TABLE ratings (
rating_id INT PRIMARY KEY,
track_id INT,
user_id INT,
rating DECIMAL(3, 2),
rating_date TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(user_id)
);

-- Sample data insertions


INSERT INTO users (user_id, user_name, nationality)
VALUES
(1, 'Raj Sharma', 'Indian'),


(2, 'Priya Patel', 'Indian'),
(3, 'John Doe', 'American'),
(4, 'Amit Gupta', 'Indian');

INSERT INTO ratings (rating_id, track_id, user_id, rating, rating_date)
VALUES
(1, 1, 1, 4.5, '2025-01-01 08:15:00'),
(2, 2, 1, 4.7, '2025-01-01 08:45:00'),
(3, 3, 2, 3.8, '2025-01-01 09:00:00'),
(4, 4, 2, 4.2, '2025-01-01 09:30:00'),
(5, 5, 3, 4.5, '2025-01-01 10:00:00'),
(6, 6, 4, 4.7, '2025-01-01 10:45:00'),
(7, 7, 1, 3.9, '2025-01-01 11:15:00'),
(8, 8, 1, 4.6, '2025-01-01 12:30:00'),
(9, 9, 4, 4.8, '2025-01-01 14:00:00'),
(10, 10, 4, 4.3, '2025-01-01 14:30:00');

Learnings
• Extracting the hour from a timestamp using EXTRACT() or HOUR().
• GROUP BY to group ratings by hour.
• Filtering users by nationality.
• COUNT() to count ratings in each hour.
• Using ORDER BY to rank hours by activity.
Solutions
• - PostgreSQL solution
SELECT EXTRACT(HOUR FROM r.rating_date) AS rating_hour, COUNT(*) AS ratings_count
FROM ratings r
JOIN users u ON r.user_id = u.user_id
WHERE u.nationality = 'Indian'
GROUP BY rating_hour
ORDER BY ratings_count DESC
LIMIT 3;
• - MySQL solution
SELECT HOUR(r.rating_date) AS rating_hour, COUNT(*) AS ratings_count
FROM ratings r
JOIN users u ON r.user_id = u.user_id
WHERE u.nationality = 'Indian'
GROUP BY rating_hour
ORDER BY ratings_count DESC
LIMIT 3;
• Q.377
Question
Identify the top 5 most listened tracks in the 'Pop' genre based on the number of ratings
given by users, considering only tracks released in the last 6 months.
Explanation
You need to join the "tracks," "albums," and "ratings" tables. Filter tracks based on the genre
('Pop') and release date (last 6 months). Then, count the number of ratings each track has
received. Finally, retrieve the top 5 tracks based on the highest number of ratings.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE albums (
album_id INT PRIMARY KEY,
album_name VARCHAR(100),
genre VARCHAR(50),
release_date DATE
);

CREATE TABLE tracks (


track_id INT PRIMARY KEY,
album_id INT,
track_name VARCHAR(100),
FOREIGN KEY (album_id) REFERENCES albums(album_id)
);

CREATE TABLE ratings (
rating_id INT PRIMARY KEY,
track_id INT,
user_id INT,
rating DECIMAL(3, 2),
rating_date TIMESTAMP,
FOREIGN KEY (track_id) REFERENCES tracks(track_id)
);

-- Sample data insertions


INSERT INTO albums (album_id, album_name, genre, release_date)
VALUES
(1, 'Pop Hits 2025', 'Pop', '2025-03-01'),
(2, 'Summer Vibes', 'Pop', '2025-05-10'),
(3, 'Rock Classics', 'Rock', '2024-12-01'),
(4, 'Pop Party 2025', 'Pop', '2025-01-15');

INSERT INTO tracks (track_id, album_id, track_name)
VALUES
(1, 1, 'Track 1'),
(2, 1, 'Track 2'),
(3, 2, 'Track 3'),
(4, 2, 'Track 4'),
(5, 4, 'Track 5');

INSERT INTO ratings (rating_id, track_id, user_id, rating, rating_date)
VALUES
(1, 1, 1, 4.5, '2025-01-10'),
(2, 1, 2, 4.7, '2025-01-12'),
(3, 3, 3, 3.8, '2025-02-05'),
(4, 4, 4, 4.2, '2025-02-10'),
(5, 5, 1, 4.5, '2025-03-01'),
(6, 2, 2, 4.6, '2025-03-15'),
(7, 4, 4, 3.9, '2025-04-01');

Learnings
• Filtering data by genre and release date.
• Using COUNT() to count ratings for each track.
• GROUP BY to aggregate data by track.
• Sorting results with ORDER BY to find the top tracks.
Solutions
• - PostgreSQL solution
SELECT t.track_name, COUNT(r.rating_id) AS ratings_count
FROM tracks t
JOIN albums a ON t.album_id = a.album_id
JOIN ratings r ON t.track_id = r.track_id
WHERE a.genre = 'Pop'
AND a.release_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY t.track_name
ORDER BY ratings_count DESC
LIMIT 5;
• - MySQL solution
SELECT t.track_name, COUNT(r.rating_id) AS ratings_count
FROM tracks t
JOIN albums a ON t.album_id = a.album_id
JOIN ratings r ON t.track_id = r.track_id
WHERE a.genre = 'Pop'
AND a.release_date >= CURDATE() - INTERVAL 6 MONTH
GROUP BY t.track_name
ORDER BY ratings_count DESC
LIMIT 5;
• Q.378
Question
Write a query to calculate the average rating for each track on Spotify, but apply the
following logic using a CASE statement:
• If the average rating is greater than or equal to 4.5, label the track as 'Excellent'.
• If the average rating is at least 3.5 but below 4.5, label the track as 'Good'.
• If the average rating is below 3.5, label the track as 'Poor'.
Explanation
You need to calculate the average rating for each track, and based on that average rating, use
a CASE statement to categorize the track as 'Excellent', 'Good', or 'Poor'. This will involve
joining the "tracks" and "ratings" tables, then applying the AVG() function for each track,
followed by the CASE statement to assign the appropriate label.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE tracks (
track_id INT PRIMARY KEY,
track_name VARCHAR(100)
);

CREATE TABLE ratings (


rating_id INT PRIMARY KEY,
track_id INT,
rating DECIMAL(3, 2),
FOREIGN KEY (track_id) REFERENCES tracks(track_id)
);

-- Sample data insertions


INSERT INTO tracks (track_id, track_name)
VALUES
(1, 'Track 1'),
(2, 'Track 2'),
(3, 'Track 3'),
(4, 'Track 4');

INSERT INTO ratings (rating_id, track_id, rating)


VALUES
(1, 1, 4.6),
(2, 1, 4.8),
(3, 2, 3.5),
(4, 2, 4.0),
(5, 3, 2.9),
(6, 3, 3.2),
(7, 4, 4.1),
(8, 4, 4.5);

Learnings
• Using the AVG() function to calculate the average of a column.
• Applying CASE statements to create conditional logic.
• Using GROUP BY to aggregate results by track.
• Filtering and categorizing data dynamically based on the calculated average.
Solutions
• - PostgreSQL solution
SELECT t.track_name,
AVG(r.rating) AS avg_rating,
CASE
WHEN AVG(r.rating) >= 4.5 THEN 'Excellent'
WHEN AVG(r.rating) >= 3.5 THEN 'Good' -- >= 3.5 closes the gap that BETWEEN 3.5 AND 4.4 leaves for averages such as 4.45
ELSE 'Poor'
END AS rating_category
FROM tracks t
JOIN ratings r ON t.track_id = r.track_id
GROUP BY t.track_name;
• - MySQL solution
SELECT t.track_name,
AVG(r.rating) AS avg_rating,
CASE
WHEN AVG(r.rating) >= 4.5 THEN 'Excellent'
WHEN AVG(r.rating) >= 3.5 THEN 'Good' -- >= 3.5 closes the gap between 4.4 and 4.5
ELSE 'Poor'
END AS rating_category
FROM tracks t
JOIN ratings r ON t.track_id = r.track_id
GROUP BY t.track_name;
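The categorization logic can be checked end-to-end with SQLite; the sketch below follows the chapter's schema but uses illustrative ratings. Note the 'Good' branch uses `>= 3.5` so that an average such as 4.45 is not silently dropped to 'Poor' by a `BETWEEN 3.5 AND 4.4` check:

```python
# Sketch: AVG() + CASE categorization (SQLite; illustrative ratings).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracks (track_id INTEGER PRIMARY KEY, track_name TEXT);
CREATE TABLE ratings (rating_id INTEGER PRIMARY KEY, track_id INTEGER, rating REAL);
INSERT INTO tracks VALUES (1, 'Track 1'), (2, 'Track 2'), (3, 'Track 3');
INSERT INTO ratings VALUES
  (1, 1, 4.6), (2, 1, 4.8),  -- avg 4.7  -> Excellent
  (3, 2, 4.4), (4, 2, 4.5),  -- avg 4.45 -> Good (BETWEEN 3.5 AND 4.4 would mislabel it 'Poor')
  (5, 3, 2.9), (6, 3, 3.2);  -- avg 3.05 -> Poor
""")
rows = conn.execute("""
SELECT t.track_name,
       AVG(r.rating) AS avg_rating,
       CASE
         WHEN AVG(r.rating) >= 4.5 THEN 'Excellent'
         WHEN AVG(r.rating) >= 3.5 THEN 'Good'
         ELSE 'Poor'
       END AS rating_category
FROM tracks t
JOIN ratings r ON t.track_id = r.track_id
GROUP BY t.track_name
ORDER BY t.track_name
""").fetchall()
print(rows)
```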
• Q.379
Question
Identify the top 5 highest-rated tracks from Hollywood music albums based on their
average rating. Show the track name, album name, average rating, and rank based on the
highest rating.
Explanation
You need to join the "tracks," "albums," and "ratings" tables. Filter the albums by the genre
'Hollywood' and then calculate the average rating for each track. Rank the tracks based on
their average rating and return the top 5 highest-rated tracks.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE albums (
album_id INT PRIMARY KEY,
album_name VARCHAR(100),
genre VARCHAR(50)
);

CREATE TABLE tracks (


track_id INT PRIMARY KEY,
album_id INT,
track_name VARCHAR(100),
FOREIGN KEY (album_id) REFERENCES albums(album_id)
);

CREATE TABLE ratings (


rating_id INT PRIMARY KEY,
track_id INT,
rating DECIMAL(3, 2),
FOREIGN KEY (track_id) REFERENCES tracks(track_id)
);

-- Sample data insertions


INSERT INTO albums (album_id, album_name, genre)
VALUES
(1, 'Hollywood Hits 2025', 'Hollywood'),
(2, 'Famous Soundtracks', 'Hollywood'),
(3, 'Pop Classics', 'Pop'),
(4, 'Classic Hollywood', 'Hollywood');

INSERT INTO tracks (track_id, album_id, track_name)


VALUES
(1, 1, 'Epic Tune'),
(2, 1, 'Love Story Theme'),
(3, 2, 'Action Anthem'),
(4, 2, 'Dramatic Prelude'),
(5, 4, 'Golden Years Theme');

INSERT INTO ratings (rating_id, track_id, rating)


VALUES
(1, 1, 4.8),
(2, 1, 4.6),
(3, 2, 4.9),
(4, 2, 4.5),
(5, 3, 4.7),
(6, 4, 5.0),
(7, 4, 4.6);

Learnings
• Using JOINs to combine data from multiple tables.
• Calculating average ratings using AVG().
• RANK() to rank tracks based on average rating.
• Filtering results based on genre.
Solutions
• - PostgreSQL solution
WITH ranked_tracks AS (
SELECT t.track_name,
a.album_name,
AVG(r.rating) AS avg_rating,
RANK() OVER (ORDER BY AVG(r.rating) DESC) AS track_rank
FROM tracks t
JOIN albums a ON t.album_id = a.album_id
JOIN ratings r ON t.track_id = r.track_id
WHERE a.genre = 'Hollywood'
GROUP BY t.track_name, a.album_name
)
SELECT track_name, album_name, avg_rating, track_rank
FROM ranked_tracks
WHERE track_rank <= 5
ORDER BY track_rank;
• - MySQL solution
WITH ranked_tracks AS (
SELECT t.track_name,
a.album_name,
AVG(r.rating) AS avg_rating,
RANK() OVER (ORDER BY AVG(r.rating) DESC) AS track_rank
FROM tracks t
JOIN albums a ON t.album_id = a.album_id
JOIN ratings r ON t.track_id = r.track_id
WHERE a.genre = 'Hollywood'
GROUP BY t.track_name, a.album_name
)
SELECT track_name, album_name, avg_rating, track_rank
FROM ranked_tracks
WHERE track_rank <= 5
ORDER BY track_rank;
• Q.380
Question
Identify the users who canceled their subscriptions in the last month. For each of these
users, also calculate their listen time growth (difference in total listening time between the
last month before cancellation and the month prior to that).
Explanation
You need to:
• Identify users who canceled their subscriptions in the last month.
• For these users, calculate their total listening time for the month immediately before the
cancellation and the month before that.
• Calculate the listen time growth as the difference in listening time between these two
months.
This will involve joining the "users," "subscriptions," and "listen_logs" tables, using date
manipulation to get the correct months, and then calculating the growth in listening time.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE users (
user_id INT PRIMARY KEY,
user_name VARCHAR(100),
subscription_status VARCHAR(50),
subscription_end_date DATE
);

CREATE TABLE subscriptions (


subscription_id INT PRIMARY KEY,
user_id INT,
start_date DATE,
end_date DATE,
status VARCHAR(50),
FOREIGN KEY (user_id) REFERENCES users(user_id)
);

CREATE TABLE listen_logs (


log_id INT PRIMARY KEY,
user_id INT,
listen_time INT, -- in minutes
log_date DATE,
FOREIGN KEY (user_id) REFERENCES users(user_id)
);

-- Sample data insertions


INSERT INTO users (user_id, user_name, subscription_status, subscription_end_date)
VALUES
(1, 'Raj Sharma', 'Active', '2025-06-15'),
(2, 'Priya Patel', 'Canceled', '2025-04-30'),
(3, 'John Doe', 'Active', NULL),
(4, 'Amit Gupta', 'Canceled', '2025-05-31');

INSERT INTO subscriptions (subscription_id, user_id, start_date, end_date, status)


VALUES
(1, 1, '2024-01-01', '2025-06-15', 'Active'),
(2, 2, '2024-01-01', '2025-04-30', 'Canceled'),
(3, 3, '2024-01-01', NULL, 'Active'),
(4, 4, '2024-01-01', '2025-05-31', 'Canceled');

INSERT INTO listen_logs (log_id, user_id, listen_time, log_date)


VALUES
(1, 1, 120, '2025-05-01'),
(2, 1, 150, '2025-05-10'),
(3, 1, 200, '2025-06-05'),
(4, 2, 180, '2025-04-01'),
(5, 2, 210, '2025-04-10'),
(6, 2, 220, '2025-04-15'),
(7, 4, 100, '2025-05-01'),
(8, 4, 120, '2025-05-10'),
(9, 4, 150, '2025-05-15');

Learnings
• Using date manipulation functions like EXTRACT() or DATE_TRUNC() to extract months
and years.
• JOINs to link users, subscriptions, and listen logs.
• Calculating total listen time using SUM() and filtering by months.

• Using GROUP BY to aggregate results by user and month.
• Calculating growth by comparing listen time across two months.
Solutions
• - PostgreSQL solution
WITH canceled_users AS (
SELECT u.user_id, u.user_name, u.subscription_end_date
FROM users u
JOIN subscriptions s ON u.user_id = s.user_id
WHERE s.status = 'Canceled'
AND s.end_date BETWEEN CURRENT_DATE - INTERVAL '1 month' AND CURRENT_DATE
), listen_time_growth AS (
SELECT l.user_id,
DATE_TRUNC('month', l.log_date) AS log_month, -- month truncation avoids the wrap-around of MONTH() - 1 in January
SUM(l.listen_time) AS total_listen_time
FROM listen_logs l
GROUP BY l.user_id, DATE_TRUNC('month', l.log_date)
)
SELECT cu.user_name,
lt1.total_listen_time AS last_month_listen_time,
lt2.total_listen_time AS previous_month_listen_time,
(lt1.total_listen_time - lt2.total_listen_time) AS listen_time_growth
FROM canceled_users cu
JOIN listen_time_growth lt1 ON cu.user_id = lt1.user_id
JOIN listen_time_growth lt2 ON cu.user_id = lt2.user_id
WHERE lt1.log_month = DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
AND lt2.log_month = DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '2 months'
ORDER BY listen_time_growth DESC;
• - MySQL solution
WITH canceled_users AS (
SELECT u.user_id, u.user_name, u.subscription_end_date
FROM users u
JOIN subscriptions s ON u.user_id = s.user_id
WHERE s.status = 'Canceled'
AND s.end_date BETWEEN CURDATE() - INTERVAL 1 MONTH AND CURDATE()
), listen_time_growth AS (
SELECT l.user_id,
DATE_FORMAT(l.log_date, '%Y-%m-01') AS log_month, -- a year-month key avoids the wrap-around of MONTH() - 1 in January
SUM(l.listen_time) AS total_listen_time
FROM listen_logs l
GROUP BY l.user_id, DATE_FORMAT(l.log_date, '%Y-%m-01')
)
SELECT cu.user_name,
lt1.total_listen_time AS last_month_listen_time,
lt2.total_listen_time AS previous_month_listen_time,
(lt1.total_listen_time - lt2.total_listen_time) AS listen_time_growth
FROM canceled_users cu
JOIN listen_time_growth lt1 ON cu.user_id = lt1.user_id
JOIN listen_time_growth lt2 ON cu.user_id = lt2.user_id
WHERE lt1.log_month = DATE_FORMAT(CURDATE() - INTERVAL 1 MONTH, '%Y-%m-01')
AND lt2.log_month = DATE_FORMAT(CURDATE() - INTERVAL 2 MONTH, '%Y-%m-01')
ORDER BY listen_time_growth DESC;
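The monthly aggregation step can be sanity-checked by keying each month with its 'YYYY-MM' string, which also sidesteps the January wrap-around that bare MONTH() - 1 arithmetic suffers. A minimal SQLite sketch with invented listen-log rows for one user:

```python
# Sketch: monthly listen-time totals keyed by 'YYYY-MM' (illustrative rows).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listen_logs (log_id INTEGER PRIMARY KEY, user_id INTEGER,
                          listen_time INTEGER, log_date TEXT);
INSERT INTO listen_logs VALUES
  (1, 2, 150, '2025-03-05'), (2, 2, 100, '2025-03-20'),
  (3, 2, 180, '2025-04-01'), (4, 2, 210, '2025-04-10'), (5, 2, 220, '2025-04-15');
""")
rows = conn.execute("""
SELECT user_id, strftime('%Y-%m', log_date) AS log_month,
       SUM(listen_time) AS total_listen_time
FROM listen_logs
GROUP BY user_id, log_month
ORDER BY log_month
""").fetchall()
# Growth = last month's total minus the month before it.
growth = rows[-1][2] - rows[-2][2]
print(rows, growth)
```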

AirBnB
• Q.381
Problem statement
Write an SQL query to find the total number of listings available in each city on Airbnb.
Explanation
• Group the data by city.
• Count the number of listings in each city.
• Return the result sorted by city.


Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Listings (
ListingID INT PRIMARY KEY,
City VARCHAR(100),
Price DECIMAL(10, 2),
Available BOOLEAN
);

-- Sample data insertions


INSERT INTO Listings (ListingID, City, Price, Available)
VALUES
(1, 'New York', 150.00, TRUE),
(2, 'Los Angeles', 200.00, TRUE),
(3, 'New York', 180.00, FALSE),
(4, 'San Francisco', 220.00, TRUE),
(5, 'Los Angeles', 170.00, TRUE),
(6, 'New York', 250.00, TRUE);

Learnings
• COUNT(): To count the number of listings in each city.
• GROUP BY: To group by City to calculate the total per city.
Solutions
• - PostgreSQL and MySQL solution
SELECT City, COUNT(ListingID) AS TotalListings
FROM Listings
GROUP BY City
ORDER BY City;

• Q.382
Problem statement
Write an SQL query to find the average price of listings in each city where the listings are
available (i.e., Available = TRUE) on Airbnb.
Explanation
• Filter the listings where Available = TRUE.
• Group the data by City.
• Calculate the average price for each city.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Listings (
ListingID INT PRIMARY KEY,
City VARCHAR(100),
Price DECIMAL(10, 2),
Available BOOLEAN
);

-- Sample data insertions


INSERT INTO Listings (ListingID, City, Price, Available)
VALUES
(1, 'New York', 150.00, TRUE),
(2, 'Los Angeles', 200.00, TRUE),
(3, 'New York', 180.00, FALSE),
(4, 'San Francisco', 220.00, TRUE),
(5, 'Los Angeles', 170.00, TRUE),
(6, 'San Francisco', 250.00, TRUE);

Learnings
• AVG(): To calculate the average price.
• WHERE: To filter the records based on availability.
• GROUP BY: To group data by City.
Solutions
• - PostgreSQL and MySQL solution
SELECT City, AVG(Price) AS AveragePrice
FROM Listings
WHERE Available = TRUE
GROUP BY City;
• Q.383

Problem statement
Write an SQL query to find the top 3 highest-priced listings for each city on Airbnb. The
result should return the ListingID, City, and Price, sorted in descending order by price
within each city.
Explanation
• Sort the listings within each city by Price in descending order.
• Use a window function to assign a rank (1 to N) for the price within each city.
• Filter the results to return only the top 3 listings for each city.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Listings (
ListingID INT PRIMARY KEY,
City VARCHAR(100),
Price DECIMAL(10, 2),
Available BOOLEAN
);

-- Sample data insertions


INSERT INTO Listings (ListingID, City, Price, Available)
VALUES
(1, 'New York', 150.00, TRUE),
(2, 'Los Angeles', 200.00, TRUE),
(3, 'New York', 300.00, TRUE),
(4, 'San Francisco', 500.00, TRUE),
(5, 'Los Angeles', 450.00, TRUE),
(6, 'San Francisco', 550.00, TRUE),
(7, 'New York', 400.00, TRUE),
(8, 'San Francisco', 350.00, TRUE);

Learnings
• ROW_NUMBER(): To rank listings within each city by price.
• PARTITION BY: To restart ranking for each city.
• ORDER BY: To order listings by price in descending order.
Solutions
• - PostgreSQL and MySQL solution
WITH RankedListings AS (
SELECT ListingID,
City,
Price,
ROW_NUMBER() OVER (PARTITION BY City ORDER BY Price DESC) AS price_rank -- RANK is a reserved word in MySQL 8+, so use a different alias
FROM Listings
)
SELECT ListingID, City, Price
FROM RankedListings
WHERE price_rank <= 3
ORDER BY City, Price DESC;

Explanation:
• ROW_NUMBER() OVER (PARTITION BY City ORDER BY Price DESC): This window function ranks the listings by Price in descending order for each City.
• WITH RankedListings: This common table expression (CTE) ranks all listings within their respective cities.
• WHERE price_rank <= 3: Filters the results to return only the top 3 listings for each city.
• ORDER BY City, Price DESC: Orders the final results by city name and price in descending order.

• Q.384

Problem statement
Write an SQL query to find the number of available listings in each city on Airbnb.
Explanation
• Filter the data to include only listings where Available = TRUE.
• Group the results by city.
• Count the number of available listings for each city.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Listings (
ListingID INT PRIMARY KEY,
City VARCHAR(100),
Price DECIMAL(10, 2),
Available BOOLEAN
);

-- Sample data insertions


INSERT INTO Listings (ListingID, City, Price, Available)
VALUES
(1, 'New York', 150.00, TRUE),
(2, 'Los Angeles', 200.00, FALSE),
(3, 'New York', 180.00, TRUE),
(4, 'San Francisco', 220.00, TRUE),
(5, 'Los Angeles', 170.00, TRUE),
(6, 'San Francisco', 250.00, FALSE);

Learnings
• COUNT(): To count the number of available listings.
• WHERE: To filter only the available listings.
• GROUP BY: To group the data by City.
Solutions
• - PostgreSQL and MySQL solution
SELECT City, COUNT(ListingID) AS AvailableListings
FROM Listings
WHERE Available = TRUE
GROUP BY City;

• Q.385
Problem statement
Write an SQL query to find the top 2 highest-priced listings in each city on Airbnb.
Explanation
• Sort listings by Price in descending order for each city.
• Use a window function to assign ranks to listings based on price within each city.
• Filter to only return the top 2 highest-priced listings.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Listings (
ListingID INT PRIMARY KEY,
City VARCHAR(100),
Price DECIMAL(10, 2),
Available BOOLEAN
);

-- Sample data insertions


INSERT INTO Listings (ListingID, City, Price, Available)
VALUES
(1, 'New York', 150.00, TRUE),
(2, 'Los Angeles', 200.00, TRUE),
(3, 'New York', 300.00, TRUE),
(4, 'San Francisco', 500.00, TRUE),
(5, 'Los Angeles', 450.00, TRUE),
(6, 'San Francisco', 550.00, TRUE),
(7, 'New York', 400.00, TRUE),
(8, 'San Francisco', 350.00, TRUE);

Learnings
• ROW_NUMBER(): To rank the listings within each city based on price.
• PARTITION BY: To partition the data by city.
• ORDER BY: To order by price in descending order.
Solutions
• - PostgreSQL and MySQL solution
WITH RankedListings AS (
SELECT ListingID,
City,
Price,
ROW_NUMBER() OVER (PARTITION BY City ORDER BY Price DESC) AS price_rank -- RANK is a reserved word in MySQL 8+, so use a different alias
FROM Listings
)
SELECT ListingID, City, Price
FROM RankedListings
WHERE price_rank <= 2
ORDER BY City, Price DESC;
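The top-N-per-group pattern is easy to verify locally with SQLite (3.25+ for window functions). A sketch using the chapter's sample listings; the alias price_rank is chosen because RANK is a reserved word in MySQL 8+:

```python
# Sketch: top-2 listings per city with ROW_NUMBER() (requires SQLite 3.25+).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Listings (ListingID INTEGER PRIMARY KEY, City TEXT, Price REAL);
INSERT INTO Listings VALUES
  (1, 'New York', 150), (2, 'Los Angeles', 200), (3, 'New York', 300),
  (4, 'San Francisco', 500), (5, 'Los Angeles', 450), (6, 'San Francisco', 550),
  (7, 'New York', 400), (8, 'San Francisco', 350);
""")
top2 = conn.execute("""
WITH RankedListings AS (
    SELECT ListingID, City, Price,
           ROW_NUMBER() OVER (PARTITION BY City ORDER BY Price DESC) AS price_rank
    FROM Listings
)
SELECT ListingID, City, Price
FROM RankedListings
WHERE price_rank <= 2
ORDER BY City, Price DESC
""").fetchall()
print(top2)
```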
• Q.386
Problem statement
Write an SQL query to find the average price of listings by city and availability status
(Available vs. Not Available) on Airbnb.
Explanation
• Group the data by City and Availability (Available or Not Available).

• Calculate the average price for each group.
• Return the result showing the city, availability status, and average price.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Listings (
ListingID INT PRIMARY KEY,
City VARCHAR(100),
Price DECIMAL(10, 2),
Available BOOLEAN
);

-- Sample data insertions


INSERT INTO Listings (ListingID, City, Price, Available)
VALUES
(1, 'New York', 150.00, TRUE),
(2, 'Los Angeles', 200.00, FALSE),
(3, 'New York', 300.00, TRUE),
(4, 'San Francisco', 500.00, TRUE),
(5, 'Los Angeles', 450.00, TRUE),
(6, 'San Francisco', 550.00, FALSE),
(7, 'New York', 400.00, FALSE),
(8, 'San Francisco', 350.00, TRUE);

Learnings
• GROUP BY: To group the data by both City and Available status.
• AVG(): To calculate the average price for each group.
Solutions
• - PostgreSQL and MySQL solution
SELECT City,
CASE
WHEN Available = TRUE THEN 'Available'
ELSE 'Not Available'
END AS AvailabilityStatus,
AVG(Price) AS AveragePrice
FROM Listings
GROUP BY City, AvailabilityStatus
ORDER BY City, AvailabilityStatus;

Explanation for Advanced Question:


• CASE WHEN Available = TRUE THEN 'Available' ELSE 'Not Available' END: This
converts the boolean value of the Available column into a more readable string, either
'Available' or 'Not Available'.
• AVG(Price): Calculates the average price for listings within each group.
• GROUP BY City, AvailabilityStatus: Groups the data by both the City and
AvailabilityStatus to calculate the average for each combination.
• ORDER BY: Orders the results by city and availability status for better readability.

• Q.387

Problem statement
Write an SQL query to find the total revenue generated by Airbnb listings on each day of
the week. Use the OrderDate from the Orders table.
Explanation

• DAYOFWEEK() function to extract the day of the week from the OrderDate.
• SUM() to calculate the total revenue for each day of the week.
• Use a CASE statement to ensure that days are labeled with their corresponding names
(e.g., Monday, Tuesday).
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
ListingID INT,
TotalAmount DECIMAL(10, 2),
OrderDate DATE
);

-- Sample data insertions


INSERT INTO Orders (OrderID, ListingID, TotalAmount, OrderDate)
VALUES
(1, 101, 150.00, '2024-06-01'),
(2, 102, 200.00, '2024-06-02'),
(3, 103, 300.00, '2024-06-03'),
(4, 101, 250.00, '2024-06-01'),
(5, 102, 180.00, '2024-06-02');

Learnings
• DAYOFWEEK(): Extracts the day of the week from a date.
• SUM(): Aggregates the total revenue.
• CASE: Used to label the days of the week.
Solutions
• - MySQL solution (DAYOFWEEK() and FIELD() are MySQL-specific; in PostgreSQL use EXTRACT(DOW FROM OrderDate) and a CASE expression for ordering)
SELECT
CASE
WHEN DAYOFWEEK(OrderDate) = 1 THEN 'Sunday'
WHEN DAYOFWEEK(OrderDate) = 2 THEN 'Monday'
WHEN DAYOFWEEK(OrderDate) = 3 THEN 'Tuesday'
WHEN DAYOFWEEK(OrderDate) = 4 THEN 'Wednesday'
WHEN DAYOFWEEK(OrderDate) = 5 THEN 'Thursday'
WHEN DAYOFWEEK(OrderDate) = 6 THEN 'Friday'
WHEN DAYOFWEEK(OrderDate) = 7 THEN 'Saturday'
END AS DayOfWeek,
SUM(TotalAmount) AS TotalRevenue
FROM Orders
GROUP BY DayOfWeek
ORDER BY FIELD(DayOfWeek, 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday',
'Friday', 'Saturday');
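SQLite has no DAYOFWEEK(), but strftime('%w', ...) returns '0' (Sunday) through '6' (Saturday), so the same grouping can be sketched and checked locally with the chapter's sample orders:

```python
# Sketch: revenue per weekday; strftime('%w') yields 0 (Sunday) .. 6 (Saturday).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, ListingID INTEGER,
                     TotalAmount REAL, OrderDate TEXT);
INSERT INTO Orders VALUES
  (1, 101, 150.00, '2024-06-01'), (2, 102, 200.00, '2024-06-02'),
  (3, 103, 300.00, '2024-06-03'), (4, 101, 250.00, '2024-06-01'),
  (5, 102, 180.00, '2024-06-02');
""")
rows = conn.execute("""
SELECT CASE strftime('%w', OrderDate)
         WHEN '0' THEN 'Sunday'   WHEN '1' THEN 'Monday'
         WHEN '2' THEN 'Tuesday'  WHEN '3' THEN 'Wednesday'
         WHEN '4' THEN 'Thursday' WHEN '5' THEN 'Friday'
         ELSE 'Saturday'
       END AS DayOfWeek,
       SUM(TotalAmount) AS TotalRevenue
FROM Orders
GROUP BY DayOfWeek
ORDER BY MIN(strftime('%w', OrderDate))
""").fetchall()
print(rows)
```

2024-06-01 was a Saturday, so the two June 1 orders (150 + 250) land there.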

• Q.388
Problem statement
Write an SQL query to calculate the average price of listings that were booked more than 3
times in the past month. Return the listing ID and average price.
Explanation
• DATE_SUB() and CURDATE() (or CURRENT_DATE()) to filter orders in the past
month.
• COUNT() to filter listings that have more than 3 bookings.
• CASE statement to calculate average price only for listings that meet the booking
condition.

Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Listings (
ListingID INT PRIMARY KEY,
Price DECIMAL(10, 2),
Available BOOLEAN
);

CREATE TABLE Orders (


OrderID INT PRIMARY KEY,
ListingID INT,
OrderDate DATE,
TotalAmount DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO Listings (ListingID, Price, Available)
VALUES
(101, 150.00, TRUE),
(102, 200.00, TRUE),
(103, 180.00, TRUE),
(104, 250.00, TRUE);

INSERT INTO Orders (OrderID, ListingID, OrderDate, TotalAmount)


VALUES
(1, 101, '2024-12-10', 150.00),
(2, 101, '2024-12-12', 150.00),
(3, 101, '2024-12-14', 150.00),
(4, 102, '2024-12-01', 200.00),
(5, 102, '2024-12-02', 200.00),
(6, 103, '2024-12-04', 180.00),
(7, 104, '2024-12-05', 250.00);

Learnings
• DATE_SUB(): Subtracts a specified interval from a date.
• COUNT(): Filters based on the number of bookings.
• CASE: Conditionally calculates the average price only for listings with more than 3
bookings.
Solutions
• - MySQL solution (in PostgreSQL, replace DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH) with CURRENT_DATE - INTERVAL '1 month')
SELECT ListingID,
AVG(Price) AS AvgPrice
FROM Listings
WHERE ListingID IN (
SELECT ListingID
FROM Orders
WHERE OrderDate >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH)
GROUP BY ListingID
HAVING COUNT(OrderID) > 3
)
GROUP BY ListingID;
• Q.389
Problem statement
Write an SQL query to calculate the average length of stay for each customer. Use the
BookingDate and CheckoutDate from the Bookings table. The result should also include the
total spending by each customer, and categorize the total spending using a CASE statement
into "Low", "Medium", and "High" based on the following criteria:
• Low: Total spending <= 500
• Medium: Total spending > 500 and <= 1000
• High: Total spending > 1000


Explanation
• DATEDIFF() to calculate the length of stay for each booking.
• CASE statement to categorize total spending into "Low", "Medium", and "High".
• Group by CustomerID and aggregate the results.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Bookings (
BookingID INT PRIMARY KEY,
CustomerID INT,
ListingID INT,
BookingDate DATE,
CheckoutDate DATE,
TotalAmount DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO Bookings (BookingID, CustomerID, ListingID, BookingDate, CheckoutDate, Total
Amount)
VALUES
(1, 101, 1001, '2024-12-01', '2024-12-05', 600.00),
(2, 102, 1002, '2024-12-02', '2024-12-04', 400.00),
(3, 101, 1003, '2024-12-10', '2024-12-15', 150.00),
(4, 103, 1004, '2024-12-01', '2024-12-08', 1200.00),
(5, 103, 1005, '2024-12-12', '2024-12-14', 300.00);

Learnings
• DATEDIFF(): To calculate the difference between two dates (length of stay).
• CASE: To categorize total spending into "Low", "Medium", and "High".
• AVG(): To calculate the average length of stay.
• SUM(): To calculate total spending.
Solutions
• - MySQL solution (in PostgreSQL, DATEDIFF(CheckoutDate, BookingDate) becomes CheckoutDate - BookingDate)
SELECT CustomerID,
AVG(DATEDIFF(CheckoutDate, BookingDate)) AS AvgLengthOfStay,
SUM(TotalAmount) AS TotalSpending,
CASE
WHEN SUM(TotalAmount) <= 500 THEN 'Low'
WHEN SUM(TotalAmount) BETWEEN 500 AND 1000 THEN 'Medium'
ELSE 'High'
END AS SpendingCategory
FROM Bookings
GROUP BY CustomerID;

Explanation for Advanced Question:


• DATEDIFF(CheckoutDate, BookingDate): This function calculates the length of stay for
each booking (in days).
• SUM(TotalAmount): Calculates the total spending by each customer.
• CASE WHEN: Categorizes the total spending into "Low", "Medium", or "High" based on
the specified criteria.
• AVG(DATEDIFF()): Calculates the average length of stay across all bookings for each
customer.
• GROUP BY: Groups the data by CustomerID to calculate the required statistics per
customer.
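SQLite lacks DATEDIFF(), but differences of julianday() values give the same length-of-stay arithmetic, so the whole grouping and CASE categorization can be checked locally. A sketch using the chapter's sample bookings:

```python
# Sketch: length of stay via julianday() differences + spending categories.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bookings (BookingID INTEGER PRIMARY KEY, CustomerID INTEGER,
                       BookingDate TEXT, CheckoutDate TEXT, TotalAmount REAL);
INSERT INTO Bookings VALUES
  (1, 101, '2024-12-01', '2024-12-05', 600.00),
  (2, 102, '2024-12-02', '2024-12-04', 400.00),
  (3, 101, '2024-12-10', '2024-12-15', 150.00),
  (4, 103, '2024-12-01', '2024-12-08', 1200.00),
  (5, 103, '2024-12-12', '2024-12-14', 300.00);
""")
rows = conn.execute("""
SELECT CustomerID,
       AVG(julianday(CheckoutDate) - julianday(BookingDate)) AS AvgLengthOfStay,
       SUM(TotalAmount) AS TotalSpending,
       CASE
         WHEN SUM(TotalAmount) <= 500 THEN 'Low'
         WHEN SUM(TotalAmount) <= 1000 THEN 'Medium'
         ELSE 'High'
       END AS SpendingCategory
FROM Bookings
GROUP BY CustomerID
ORDER BY CustomerID
""").fetchall()
print(rows)
```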

• Q.390
Problem statement
Write an SQL query to find the unique domain names from the Email column in the Users
table. The domain name should be everything after the "@" symbol.
Explanation
• Use SUBSTRING_INDEX() or REGEXP_SUBSTR() (depending on your SQL dialect)
to extract the domain part of the email.
• Use DISTINCT to return unique domain names.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Users (
UserID INT PRIMARY KEY,
Name VARCHAR(100),
Email VARCHAR(100)
);

-- Sample data insertions


INSERT INTO Users (UserID, Name, Email)
VALUES
(1, 'John Doe', '[email protected]'),
(2, 'Jane Smith', '[email protected]'),
(3, 'Sam Brown', '[email protected]'),
(4, 'Emily White', '[email protected]');

Learnings
• SUBSTRING_INDEX(): To extract the part of a string before or after a specified
delimiter.
• DISTINCT: To get unique values from a result set.
Solutions
• - MySQL solution
SELECT DISTINCT
SUBSTRING_INDEX(Email, '@', -1) AS Domain
FROM Users;
• - PostgreSQL solution
SELECT DISTINCT
SPLIT_PART(Email, '@', 2) AS Domain
FROM Users;
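SQLite ships neither SUBSTRING_INDEX nor SPLIT_PART, but substr() with instr() expresses the same idea. A sketch with invented email addresses (the sample addresses above are redacted in this copy of the book):

```python
# Sketch: extract the domain after '@' with substr()/instr() (emails are invented).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Email TEXT);
INSERT INTO Users VALUES
  (1, 'john@example.com'), (2, 'jane@example.com'),
  (3, 'sam@mail.test'), (4, 'emily@mail.test');
""")
domains = conn.execute("""
SELECT DISTINCT substr(Email, instr(Email, '@') + 1) AS Domain
FROM Users
ORDER BY Domain
""").fetchall()
print(domains)
```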

• Q.391
Problem statement
Write an SQL query to extract the first 5 characters of the description column from the
Listings table and return only the rows where the description starts with "Lux".

Explanation
• Use SUBSTRING() or REGEXP_SUBSTR() to extract the first 5 characters.
• Use REGEXP or LIKE to filter descriptions that start with "Lux".
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Listings (
ListingID INT PRIMARY KEY,
Description VARCHAR(255),
Price DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO Listings (ListingID, Description, Price)
VALUES
(1, 'Luxurious apartment in downtown', 150.00),
(2, 'Cozy cottage in the woods', 120.00),
(3, 'Luxury villa with pool', 250.00),
(4, 'Charming studio by the beach', 80.00);

Learnings
• SUBSTRING(): To extract part of a string.
• LIKE: To filter strings based on a pattern.
• REGEXP: To filter strings using a regular expression.
Solutions
• - PostgreSQL and MySQL solution
SELECT ListingID,
SUBSTRING(Description, 1, 5) AS ShortDescription
FROM Listings
WHERE Description LIKE 'Lux%';
• Q.392
Problem statement
Write an SQL query to find all listings whose Description column contains a valid phone
number in the format "(xxx) xxx-xxxx" (where "x" is a digit). Return the ListingID and
Description.

Explanation
• Use REGEXP to match the phone number pattern in the description.
• The phone number should be in the format (xxx) xxx-xxxx, where x is a digit.
• Use REGEXP_LIKE() (in PostgreSQL) or REGEXP (in MySQL) to filter descriptions
that match this pattern.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Listings (
ListingID INT PRIMARY KEY,
Description VARCHAR(255),
Price DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO Listings (ListingID, Description, Price)
VALUES
(1, 'Luxurious apartment with phone number (123) 456-7890', 150.00),
(2, 'Beautiful home, contact at 123-456-7890', 200.00),
(3, 'Spacious villa, call (987) 654-3210 for details', 250.00),
(4, 'Contact (555) 555-5555 for booking', 300.00);

Learnings
• REGEXP: To match a pattern in a string.
• Pattern Matching: Using regex to find a phone number in a specific format.
Solutions
• - PostgreSQL solution
SELECT ListingID,
Description
FROM Listings
WHERE Description ~ '\(\d{3}\) \d{3}-\d{4}'; -- with standard_conforming_strings on, PostgreSQL strings do not double backslashes

• - MySQL solution
SELECT ListingID,
Description
FROM Listings
WHERE Description REGEXP '\\([0-9]{3}\\) [0-9]{3}-[0-9]{4}';

Explanation for the Advanced Question:


• REGEXP: In MySQL, the REGEXP operator is used to match regular expressions.
• Pattern: The regular expression \\(\\d{3}\\) \\d{3}-\\d{4} is used to match the
phone number format (xxx) xxx-xxxx.
• \\( and \\) are used to escape the parentheses since they are special characters in regex.
• \\d{3} represents 3 digits.
• [0-9]{3} is the same as \\d{3}, used to match 3 digits for the second part of the phone
number.
• ~ (PostgreSQL): The ~ operator is used for regex matching in PostgreSQL.
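The same pattern can be exercised in Python's re module, whose syntax for this expression matches what the queries use; the descriptions below are invented for illustration:

```python
# Sketch: the chapter's phone-number pattern in Python's re (descriptions invented).
import re

descriptions = {
    1: 'Luxurious apartment with phone number (123) 456-7890',
    2: 'Beautiful home, contact at 123-456-7890',   # no parentheses -> no match
    3: 'Spacious villa, call (987) 654-3210 for details',
}
pattern = re.compile(r'\(\d{3}\) \d{3}-\d{4}')
matches = {lid: bool(pattern.search(text)) for lid, text in descriptions.items()}
print(matches)
```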

• Q.393
Problem statement
Write an SQL query to calculate the rank of each listing based on its price within the
Listings table, sorted from highest to lowest price. The query should return the ListingID,
Price, and Rank.
Explanation
• Use the RANK() window function to assign ranks to the listings based on the Price in
descending order.
• The RANK() function assigns the same rank to listings with the same price.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Listings (
ListingID INT PRIMARY KEY,
Description VARCHAR(255),
Price DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO Listings (ListingID, Description, Price)
VALUES
(1, 'Luxurious apartment in downtown', 150.00),
(2, 'Cozy cottage in the woods', 120.00),
(3, 'Luxury villa with pool', 250.00),
(4, 'Charming studio by the beach', 80.00),
(5, 'Modern penthouse with city view', 300.00);

Learnings
• RANK(): To assign ranks based on the sorting order of a column.
• PARTITION BY (optional): Not used here but can be added if we want to partition the
data by some column.
Solutions
• - PostgreSQL and MySQL solution
SELECT ListingID, Price,
RANK() OVER (ORDER BY Price DESC) AS PriceRank -- RANK is a reserved word in MySQL 8+, so use a different alias
FROM Listings;

• Q.394
Problem statement
Write an SQL query to calculate the average price of listings in each price range quartile.
The price ranges should be divided into 4 equal quartiles. Return the Quartile (1 to 4) and
the average price in each quartile.
Explanation
• Use NTILE(4) to divide the listings into 4 quartiles based on the Price.
• Use AVG() to calculate the average price within each quartile.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Listings (
ListingID INT PRIMARY KEY,
Description VARCHAR(255),
Price DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO Listings (ListingID, Description, Price)
VALUES
(1, 'Luxurious apartment in downtown', 150.00),
(2, 'Cozy cottage in the woods', 120.00),
(3, 'Luxury villa with pool', 250.00),
(4, 'Charming studio by the beach', 80.00),
(5, 'Modern penthouse with city view', 300.00);

Learnings
• NTILE(): To divide the data into equal groups (quartiles, deciles, etc.).
• AVG(): To calculate the average of a set of values within each group.
Solutions
• - PostgreSQL and MySQL solution
-- A window function result cannot be referenced directly in GROUP BY, so compute the quartile in a CTE first
WITH PriceQuartiles AS (
SELECT Price,
NTILE(4) OVER (ORDER BY Price) AS Quartile
FROM Listings
)
SELECT Quartile, AVG(Price) AS AvgPrice
FROM PriceQuartiles
GROUP BY Quartile
ORDER BY Quartile;
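A quick SQLite check (3.25+ ships NTILE()) of the CTE-then-aggregate approach. Note that when the rows don't divide evenly, NTILE puts the larger groups first, so with the chapter's 5 listings the first quartile holds 2 rows:

```python
# Sketch: quartile averages via a CTE (requires SQLite 3.25+ for NTILE()).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Listings (ListingID INTEGER PRIMARY KEY, Price REAL);
INSERT INTO Listings VALUES (1, 150), (2, 120), (3, 250), (4, 80), (5, 300);
""")
rows = conn.execute("""
WITH PriceQuartiles AS (
    SELECT Price, NTILE(4) OVER (ORDER BY Price) AS Quartile
    FROM Listings
)
SELECT Quartile, AVG(Price) AS AvgPrice
FROM PriceQuartiles
GROUP BY Quartile
ORDER BY Quartile
""").fetchall()
print(rows)
```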
• Q.395

Advanced Question:
Problem statement
Write an SQL query to find the moving average of prices over a 3-listing window based on
the Price column in the Listings table. Return the ListingID, Price, and 3-listing
moving average. The result should be sorted by ListingID in ascending order.
Explanation
• Use the AVG() window function with a 3-row sliding window to calculate the moving
average of the prices.
• Use ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING to create a 3-listing
window for calculating the moving average.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Listings (
ListingID INT PRIMARY KEY,
Description VARCHAR(255),
Price DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO Listings (ListingID, Description, Price)
VALUES
(1, 'Luxurious apartment in downtown', 150.00),
(2, 'Cozy cottage in the woods', 120.00),
(3, 'Luxury villa with pool', 250.00),
(4, 'Charming studio by the beach', 80.00),
(5, 'Modern penthouse with city view', 300.00);

Learnings
• AVG(): To calculate the average of a window of rows.
• ROWS BETWEEN: To define the sliding window for the moving average.
• Sliding Window: Using ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING to
include 3 rows for the moving average.
Solutions
• - PostgreSQL and MySQL solution
SELECT ListingID, Price,
       AVG(Price) OVER (ORDER BY ListingID
                        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS MovingAvgPrice
FROM Listings
ORDER BY ListingID;

Explanation for the Advanced Question:


• ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: This defines the window to
include the current row, the previous row, and the next row.
• It effectively creates a moving window of 3 rows around the current row.
• AVG(Price) OVER: The AVG() window function calculates the average of the Price
values over the defined window.
• The window slides over the rows based on the ORDER BY ListingID.
• The results are ordered by ListingID to ensure the moving average is calculated in the
correct sequence.
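The sliding window described above can be verified with a small self-contained sketch (assumed, not part of the book) in Python's sqlite3, which shares the ROWS BETWEEN frame syntax:

```python
import sqlite3

# Illustrative check of the 3-listing moving average on the sample prices.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Listings (ListingID INTEGER PRIMARY KEY, Price REAL);
INSERT INTO Listings VALUES (1, 150.0), (2, 120.0), (3, 250.0), (4, 80.0), (5, 300.0);
""")
rows = conn.execute("""
SELECT ListingID, Price,
       AVG(Price) OVER (ORDER BY ListingID
                        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS MovingAvgPrice
FROM Listings
ORDER BY ListingID;
""").fetchall()
# Edge rows average only two values: listing 1 averages (150, 120) and
# listing 5 averages (80, 300); the middle rows average three values.
moving = [round(r[2], 2) for r in rows]
```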

Key Concepts:
• NTILE(): To divide data into equal groups based on a numeric column (used for quartiles,
deciles, etc.).
• RANK(): Used to assign a rank to rows based on a specified ordering, with ties receiving
the same rank.
• Sliding Window (ROWS BETWEEN): A method to calculate running averages or other
aggregate functions over a sliding window of rows.
• Q.396
Problem statement
Write an SQL query to list all orders along with the customer name who placed the order. If
an order does not have a corresponding customer, still include the order with NULL for the
customer name.
Explanation


To solve this:
• Use a LEFT JOIN to include all orders, even if there is no matching customer.
• Match Order.CustomerID with Customer.CustomerID to get the corresponding customer
name.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE
);

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    CustomerName VARCHAR(100)
);

-- Sample data insertions


INSERT INTO Orders (OrderID, CustomerID, OrderDate)
VALUES
(1, 101, '2024-01-01'),
(2, 102, '2024-01-03'),
(3, 103, '2024-01-05'),
(4, 104, '2024-01-07'),
(5, 105, '2024-01-09'),
(6, NULL, '2024-01-11'),
(7, 106, '2024-01-12'),
(8, 107, '2024-01-14'),
(9, 108, '2024-01-16'),
(10, NULL, '2024-01-18');

INSERT INTO Customers (CustomerID, CustomerName)
VALUES
(101, 'Alice'),
(102, 'Bob'),
(103, 'Charlie'),
(104, 'David'),
(105, 'Eve'),
(106, 'Frank'),
(107, 'Grace'),
(108, 'Hannah');

Learnings
• LEFT JOIN: Includes all rows from the left table and matching rows from the right table,
returning NULL if no match is found.
• Join condition: The condition for matching CustomerID in both tables.
Solutions
• - PostgreSQL and MySQL solution
SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
LEFT JOIN Customers
ON Orders.CustomerID = Customers.CustomerID;
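A quick self-contained check (a sketch with trimmed sample data, not the book's code) shows how the LEFT JOIN surfaces orders with no matching customer as NULL (Python's None):

```python
import sqlite3

# Trimmed data: three orders, two of which have a NULL CustomerID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INT, OrderDate TEXT);
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
INSERT INTO Orders VALUES (1, 101, '2024-01-01'), (6, NULL, '2024-01-11'), (10, NULL, '2024-01-18');
INSERT INTO Customers VALUES (101, 'Alice');
""")
rows = conn.execute("""
SELECT o.OrderID, c.CustomerName
FROM Orders o
LEFT JOIN Customers c ON o.CustomerID = c.CustomerID
ORDER BY o.OrderID;
""").fetchall()
```

An INNER JOIN on the same data would return only the Alice row; the LEFT JOIN keeps all three orders.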
• Q.397
Problem statement
Write an SQL query to find all customers who have placed more than one order. List the
customer name and the total number of orders they have placed.
Explanation
To solve this:


• Use an INNER JOIN between Customers and Orders on CustomerID.


• Use GROUP BY and HAVING to filter customers who have more than one order.
• Count the orders for each customer.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE
);

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    CustomerName VARCHAR(100)
);

-- Sample data insertions


INSERT INTO Orders (OrderID, CustomerID, OrderDate)
VALUES
(1, 101, '2024-01-01'),
(2, 102, '2024-01-03'),
(3, 101, '2024-01-05'),
(4, 103, '2024-01-07'),
(5, 104, '2024-01-09'),
(6, 101, '2024-01-11'),
(7, 106, '2024-01-12'),
(8, 107, '2024-01-14'),
(9, 108, '2024-01-16'),
(10, 101, '2024-01-18');

INSERT INTO Customers (CustomerID, CustomerName)
VALUES
(101, 'Alice'),
(102, 'Bob'),
(103, 'Charlie'),
(104, 'David'),
(105, 'Eve'),
(106, 'Frank'),
(107, 'Grace'),
(108, 'Hannah');

Learnings
• INNER JOIN: Returns only rows with matching values from both tables.
• GROUP BY: Groups results based on a specific column, useful for aggregation.
• HAVING: Filters groups after aggregation (e.g., count > 1).
Solutions
• - PostgreSQL and MySQL solution
SELECT Customers.CustomerName, COUNT(Orders.OrderID) AS TotalOrders
FROM Orders
INNER JOIN Customers
ON Orders.CustomerID = Customers.CustomerID
GROUP BY Customers.CustomerName
HAVING COUNT(Orders.OrderID) > 1;
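The GROUP BY plus HAVING filter can be exercised end to end with a short sqlite3 sketch (assumed, not from the book; customers without orders are trimmed, which does not affect the inner join):

```python
import sqlite3

# In the sample orders, only customer 101 (Alice) appears more than once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INT);
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
INSERT INTO Orders VALUES (1,101),(2,102),(3,101),(4,103),(5,104),(6,101),(7,106),(8,107),(9,108),(10,101);
INSERT INTO Customers VALUES (101,'Alice'),(102,'Bob'),(103,'Charlie'),(104,'David'),
 (106,'Frank'),(107,'Grace'),(108,'Hannah');
""")
rows = conn.execute("""
SELECT c.CustomerName, COUNT(o.OrderID) AS TotalOrders
FROM Orders o
INNER JOIN Customers c ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerName
HAVING COUNT(o.OrderID) > 1;
""").fetchall()
```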
• Q.398
Problem statement
Write an SQL query to find the top 3 customers who have placed the most number of orders
in January 2024. Return the customer name, number of orders, and their rank based on the
total number of orders.
Explanation


To solve this:
• Use an INNER JOIN between Customers and Orders.
• Filter orders placed in January 2024 using the WHERE clause.
• Use COUNT() to count the number of orders per customer.
• Use RANK() to rank customers based on the number of orders placed.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE
);

CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    CustomerName VARCHAR(100)
);

-- Sample data insertions


INSERT INTO Orders (OrderID, CustomerID, OrderDate)
VALUES
(1, 101, '2024-01-01'),
(2, 102, '2024-01-03'),
(3, 101, '2024-01-05'),
(4, 103, '2024-01-07'),
(5, 104, '2024-01-09'),
(6, 101, '2024-01-11'),
(7, 106, '2024-01-12'),
(8, 107, '2024-01-14'),
(9, 108, '2024-01-16'),
(10, 101, '2024-01-18'),
(11, 102, '2024-01-19'),
(12, 104, '2024-01-20'),
(13, 101, '2024-01-21'),
(14, 105, '2024-01-22'),
(15, 101, '2024-01-23');

INSERT INTO Customers (CustomerID, CustomerName)
VALUES
(101, 'Alice'),
(102, 'Bob'),
(103, 'Charlie'),
(104, 'David'),
(105, 'Eve'),
(106, 'Frank'),
(107, 'Grace'),
(108, 'Hannah');

Learnings
• INNER JOIN: To join data from both tables based on matching customer IDs.
• COUNT(): To count the number of orders per customer.
• RANK(): To assign ranks based on the number of orders placed.
• WHERE clause: To filter orders based on a specific date range.
Solutions
• - PostgreSQL and MySQL solution
WITH OrderCount AS (
    SELECT Customers.CustomerName, COUNT(Orders.OrderID) AS TotalOrders
    FROM Orders
    INNER JOIN Customers
        ON Orders.CustomerID = Customers.CustomerID
    WHERE Orders.OrderDate BETWEEN '2024-01-01' AND '2024-01-31'
    GROUP BY Customers.CustomerName
),
RankedCustomers AS (
    SELECT CustomerName, TotalOrders,
           RANK() OVER (ORDER BY TotalOrders DESC) AS OrderRank
    FROM OrderCount
)
SELECT CustomerName, TotalOrders, OrderRank
FROM RankedCustomers
WHERE OrderRank <= 3;
Note: the rank is computed in its own CTE because a window-function alias cannot be referenced in the WHERE clause of the query that defines it; the alias OrderRank also avoids RANK, a reserved word in MySQL 8.
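The rank-then-filter pattern can be run as a self-contained sqlite3 sketch (assumed, not the book's code; a secondary sort is added so tie order is deterministic):

```python
import sqlite3

# The rank is computed in a CTE and filtered in the outer query, since a
# window alias cannot appear in the WHERE clause of its own SELECT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INT, OrderDate TEXT);
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
INSERT INTO Orders VALUES
 (1,101,'2024-01-01'),(2,102,'2024-01-03'),(3,101,'2024-01-05'),(4,103,'2024-01-07'),
 (5,104,'2024-01-09'),(6,101,'2024-01-11'),(7,106,'2024-01-12'),(8,107,'2024-01-14'),
 (9,108,'2024-01-16'),(10,101,'2024-01-18'),(11,102,'2024-01-19'),(12,104,'2024-01-20'),
 (13,101,'2024-01-21'),(14,105,'2024-01-22'),(15,101,'2024-01-23');
INSERT INTO Customers VALUES (101,'Alice'),(102,'Bob'),(103,'Charlie'),(104,'David'),
 (105,'Eve'),(106,'Frank'),(107,'Grace'),(108,'Hannah');
""")
rows = conn.execute("""
WITH OrderCount AS (
    SELECT c.CustomerName, COUNT(o.OrderID) AS TotalOrders
    FROM Orders o JOIN Customers c ON o.CustomerID = c.CustomerID
    WHERE o.OrderDate BETWEEN '2024-01-01' AND '2024-01-31'
    GROUP BY c.CustomerName
),
Ranked AS (
    SELECT CustomerName, TotalOrders,
           RANK() OVER (ORDER BY TotalOrders DESC) AS OrderRank
    FROM OrderCount
)
SELECT CustomerName, TotalOrders, OrderRank
FROM Ranked
WHERE OrderRank <= 3
ORDER BY OrderRank, CustomerName;
""").fetchall()
```

Bob and David tie at 2 orders, so both receive rank 2 and no customer is ranked 3.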

Key Concepts:
• JOIN types (INNER, LEFT): Use to combine rows from multiple tables.
• GROUP BY: Group data based on specific column(s) for aggregation.
• COUNT(): Aggregate function to count rows.
• RANK(): Assigns a rank to each row based on an ordering condition.
• Q.399
Problem Statement
You are tasked with analyzing the bookings for an online platform across multiple countries.
The platform tracks bookings for properties, and you need to find the top 3 countries based
on the total booking amount for the year 2024, along with the average booking amount per
property in each country.
Additionally, you need to calculate the percentage contribution of each country to the total
booking amount for 2024.
Explanation
To solve this problem:
• Sum the total booking amount for each country for the year 2024.
• Calculate the average booking amount per property for each country.
• Calculate the percentage contribution of each country to the total booking amount for
2024.
• Rank the countries based on the total booking amount and return the top 3 countries.
You will use JOINs to link the booking data with country information, GROUP BY to
aggregate the results by country, and WINDOW FUNCTIONS to calculate percentage
contribution and rankings.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Bookings (
BookingID INT PRIMARY KEY,
PropertyID INT,
CountryID INT,
BookingDate DATE,
TotalAmount DECIMAL(10, 2)
);

CREATE TABLE Properties (
    PropertyID INT PRIMARY KEY,
    PropertyName VARCHAR(100),
    CountryID INT
);

CREATE TABLE Countries (
    CountryID INT PRIMARY KEY,
    CountryName VARCHAR(100)
);

--
INSERT INTO Bookings (BookingID, PropertyID, CountryID, BookingDate, TotalAmount)
VALUES

(1, 101, 1, '2024-01-01', 200.00),
(2, 102, 1, '2024-01-05', 300.00),
(3, 103, 2, '2024-02-10', 450.00),
(4, 104, 2, '2024-03-15', 500.00),
(5, 105, 3, '2024-04-20', 600.00),
(6, 106, 3, '2024-05-25', 700.00),
(7, 107, 1, '2024-06-30', 250.00),
(8, 108, 4, '2024-07-10', 800.00),
(9, 109, 4, '2024-08-05', 900.00),
(10, 110, 2, '2024-09-18', 350.00),
(11, 111, 1, '2024-10-10', 450.00),
(12, 112, 3, '2024-11-12', 800.00),
(13, 113, 4, '2024-12-25', 1000.00),
(14, 114, 3, '2024-12-29', 1100.00),
(15, 115, 1, '2024-12-31', 1500.00);

--
INSERT INTO Properties (PropertyID, PropertyName, CountryID)
VALUES
(101, 'Beach House', 1),
(102, 'Mountain Cabin', 1),
(103, 'City Apartment', 2),
(104, 'Lakefront Villa', 2),
(105, 'Country Inn', 3),
(106, 'Luxury Condo', 3),
(107, 'Eco Retreat', 1),
(108, 'Grand Resort', 4),
(109, 'Beachfront Villa', 4),
(110, 'Downtown Loft', 2),
(111, 'Mountain Lodge', 1),
(112, 'Seaside Cottage', 3),
(113, 'Vineyard Estate', 4),
(114, 'Forest Bungalow', 3),
(115, 'Skyline Penthouse', 1);

--
INSERT INTO Countries (CountryID, CountryName)
VALUES
(1, 'USA'),
(2, 'Canada'),
(3, 'Mexico'),
(4, 'France');

Learnings
• JOIN: You need to join the Bookings, Properties, and Countries tables on CountryID
and PropertyID.
• GROUP BY: To aggregate the total booking amount and average booking amount per
country.
• SUM(): To calculate the total booking amount per country.
• AVG(): To calculate the average booking amount per property in each country.
• RANK(): To rank countries based on the total booking amount.
• WINDOW FUNCTION: Use SUM() OVER to calculate the total bookings for percentage
contribution.

Solutions
PostgreSQL and MySQL solution
WITH CountryBookings AS (
    SELECT
        c.CountryName,
        SUM(b.TotalAmount) AS TotalBookingAmount,
        COUNT(DISTINCT p.PropertyID) AS TotalProperties,
        AVG(b.TotalAmount) AS AvgBookingAmountPerProperty
    FROM Bookings b
    JOIN Properties p ON b.PropertyID = p.PropertyID
    JOIN Countries c ON p.CountryID = c.CountryID
    WHERE EXTRACT(YEAR FROM b.BookingDate) = 2024
    GROUP BY c.CountryName
),
TotalRevenue AS (
    SELECT SUM(TotalBookingAmount) AS TotalRevenueFor2024
    FROM CountryBookings
),
RankedCountries AS (
    SELECT
        cb.CountryName,
        cb.TotalBookingAmount,
        cb.AvgBookingAmountPerProperty,
        ROUND((cb.TotalBookingAmount / tr.TotalRevenueFor2024) * 100, 2) AS PercentageContribution,
        RANK() OVER (ORDER BY cb.TotalBookingAmount DESC) AS CountryRank
    FROM CountryBookings cb
    JOIN TotalRevenue tr ON 1=1
)
SELECT CountryName, TotalBookingAmount, AvgBookingAmountPerProperty,
       PercentageContribution, CountryRank
FROM RankedCountries
WHERE CountryRank <= 3
ORDER BY CountryRank;

Explanation:
• CountryBookings CTE (Common Table Expression):
• Joins Bookings, Properties, and Countries tables to get the total booking amount for
each country.
• It calculates the TotalBookingAmount (sum of TotalAmount) and the
AvgBookingAmountPerProperty (average of TotalAmount per property).
• Filters bookings for the year 2024 using EXTRACT(YEAR FROM b.BookingDate) = 2024.
• Groups the result by CountryName.
• TotalRevenue CTE:
• This calculates the TotalRevenueFor2024 (sum of all booking amounts for the year 2024)
to compute the percentage contribution of each country.
• Main Query:
• Joins the CountryBookings CTE with TotalRevenue to compute each country's
PercentageContribution to the total revenue for 2024.
• RANK() OVER: Ranks the countries by total booking amount in descending order.
• The top-3 filter on the rank must sit in an outer query, because a window-function
alias cannot be referenced in the WHERE clause of the SELECT that defines it.
• ROUND(): Rounds the percentage contribution to two decimal places.
• ORDER BY: Returns the results in ascending order of rank.

Expected Output Example:

CountryName   TotalBookingAmount   AvgBookingAmountPerProperty   PercentageContribution   CountryRank
Mexico        3200.00              800.00                        32.32                    1
USA           2700.00              540.00                        27.27                    2
France        2700.00              900.00                        27.27                    2

USA and France tie at 2700.00, so both receive rank 2; the grand total for 2024 is 9900.00.
Key Concepts:
• JOINs: Combining data from multiple tables (Bookings, Properties, Countries).
• SUM() and AVG(): Aggregate functions to calculate total revenue and average booking
amounts.
• RANK(): Ranking countries based on total booking amounts.
• EXTRACT(): Function to extract the year from a date for filtering.
• Window functions: Used for ranking and percentage calculations.
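A compact verification sketch (assumed, not from the book) runs the same aggregation in sqlite3. SQLite has no EXTRACT(), so strftime() filters the year, and the join goes straight from Bookings.CountryID to Countries (equivalent here, since the sample data's booking and property country IDs agree); SUM() OVER () supplies the grand total for the percentage in a single pass:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bookings (BookingID INTEGER PRIMARY KEY, CountryID INT, BookingDate TEXT, TotalAmount REAL);
CREATE TABLE Countries (CountryID INTEGER PRIMARY KEY, CountryName TEXT);
INSERT INTO Bookings VALUES
 (1,1,'2024-01-01',200),(2,1,'2024-01-05',300),(3,2,'2024-02-10',450),(4,2,'2024-03-15',500),
 (5,3,'2024-04-20',600),(6,3,'2024-05-25',700),(7,1,'2024-06-30',250),(8,4,'2024-07-10',800),
 (9,4,'2024-08-05',900),(10,2,'2024-09-18',350),(11,1,'2024-10-10',450),(12,3,'2024-11-12',800),
 (13,4,'2024-12-25',1000),(14,3,'2024-12-29',1100),(15,1,'2024-12-31',1500);
INSERT INTO Countries VALUES (1,'USA'),(2,'Canada'),(3,'Mexico'),(4,'France');
""")
rows = conn.execute("""
WITH CountryBookings AS (
    SELECT c.CountryName, SUM(b.TotalAmount) AS TotalBookingAmount
    FROM Bookings b JOIN Countries c ON b.CountryID = c.CountryID
    WHERE strftime('%Y', b.BookingDate) = '2024'
    GROUP BY c.CountryName
),
Ranked AS (
    SELECT CountryName, TotalBookingAmount,
           ROUND(TotalBookingAmount * 100.0 / SUM(TotalBookingAmount) OVER (), 2) AS Pct,
           RANK() OVER (ORDER BY TotalBookingAmount DESC) AS CountryRank
    FROM CountryBookings
)
SELECT CountryName, TotalBookingAmount, Pct, CountryRank
FROM Ranked
WHERE CountryRank <= 3
ORDER BY CountryRank, CountryName;
""").fetchall()
```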
• Q.400
Problem Statement
You are tasked with analyzing the cancellations of bookings in an online property platform.
The platform tracks bookings and cancellations, and you need to find the top 3 countries
where cancellations are most frequent. Along with this, you need to calculate the
cancellation rate for each country in the year 2024, based on the ratio of cancelled bookings
to total bookings for each country.
Additionally, the result should include the total cancellations, total bookings, and
cancellation rate for each country, and rank the countries by the cancellation rate in
descending order.
Explanation
To solve this:
• Identify the cancellations: Track bookings where cancellations are marked (there will be
a flag or status indicating cancellation).
• Count the total bookings and cancellations for each country in 2024.
• Calculate the cancellation rate as the ratio of cancellations to total bookings.
• Rank the countries by their cancellation rate and return the top 3.
You will need to use JOINs to link the booking data with cancellation information and
GROUP BY to aggregate results by country. Window functions will be used for ranking
countries based on cancellation rates.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Bookings (
BookingID INT PRIMARY KEY,
PropertyID INT,
CountryID INT,
BookingDate DATE,
TotalAmount DECIMAL(10, 2),
IsCancelled BOOLEAN
);

CREATE TABLE Properties (
    PropertyID INT PRIMARY KEY,
    PropertyName VARCHAR(100),
    CountryID INT
);

CREATE TABLE Countries (
    CountryID INT PRIMARY KEY,
    CountryName VARCHAR(100)
);

--
INSERT INTO Bookings (BookingID, PropertyID, CountryID, BookingDate, TotalAmount, IsCancelled)
VALUES
(1, 101, 1, '2024-01-01', 200.00, FALSE),
(2, 102, 1, '2024-01-05', 300.00, TRUE),
(3, 103, 2, '2024-02-10', 450.00, FALSE),
(4, 104, 2, '2024-03-15', 500.00, TRUE),
(5, 105, 3, '2024-04-20', 600.00, FALSE),
(6, 106, 3, '2024-05-25', 700.00, FALSE),
(7, 107, 1, '2024-06-30', 250.00, TRUE),
(8, 108, 4, '2024-07-10', 800.00, FALSE),
(9, 109, 4, '2024-08-05', 900.00, TRUE),
(10, 110, 2, '2024-09-18', 350.00, TRUE),
(11, 111, 1, '2024-10-10', 450.00, FALSE),
(12, 112, 3, '2024-11-12', 800.00, FALSE),
(13, 113, 4, '2024-12-25', 1000.00, TRUE),
(14, 114, 3, '2024-12-29', 1100.00, FALSE),
(15, 115, 1, '2024-12-31', 1500.00, TRUE);

--
INSERT INTO Properties (PropertyID, PropertyName, CountryID)
VALUES
(101, 'Beach House', 1),
(102, 'Mountain Cabin', 1),
(103, 'City Apartment', 2),
(104, 'Lakefront Villa', 2),
(105, 'Country Inn', 3),
(106, 'Luxury Condo', 3),
(107, 'Eco Retreat', 1),
(108, 'Grand Resort', 4),
(109, 'Beachfront Villa', 4),
(110, 'Downtown Loft', 2),
(111, 'Mountain Lodge', 1),
(112, 'Seaside Cottage', 3),
(113, 'Vineyard Estate', 4),
(114, 'Forest Bungalow', 3),
(115, 'Skyline Penthouse', 1);

--
INSERT INTO Countries (CountryID, CountryName)
VALUES
(1, 'USA'),
(2, 'Canada'),
(3, 'Mexico'),
(4, 'France');

Learnings
• JOIN: Joining multiple tables (Bookings, Properties, and Countries) based on CountryID
and PropertyID.
• COUNT(): Used to count the total number of bookings and cancellations.
• SUM(): To calculate the total cancellations in each country.
• GROUP BY: To aggregate the results by country.
• CASE WHEN: To count cancellations and non-cancellations.
• RANK(): Window function to rank countries by cancellation rate.

Solutions
PostgreSQL and MySQL Solution
WITH CountryCancellationStats AS (
    SELECT
        c.CountryName,
        COUNT(b.BookingID) AS TotalBookings,
        SUM(CASE WHEN b.IsCancelled = TRUE THEN 1 ELSE 0 END) AS TotalCancellations,
        ROUND(SUM(CASE WHEN b.IsCancelled = TRUE THEN 1 ELSE 0 END) * 100.0
              / COUNT(b.BookingID), 2) AS CancellationRate
    FROM Bookings b
    JOIN Properties p ON b.PropertyID = p.PropertyID
    JOIN Countries c ON p.CountryID = c.CountryID
    WHERE EXTRACT(YEAR FROM b.BookingDate) = 2024
    GROUP BY c.CountryName
),
TotalBookings AS (
    SELECT SUM(TotalBookings) AS TotalBookingsFor2024
    FROM CountryCancellationStats
)
SELECT
    ccs.CountryName,
    ccs.TotalBookings,
    ccs.TotalCancellations,
    ccs.CancellationRate,
    RANK() OVER (ORDER BY ccs.CancellationRate DESC) AS CountryRank
FROM CountryCancellationStats ccs
JOIN TotalBookings tb ON 1=1
WHERE ccs.CancellationRate > 0 -- Filter countries with non-zero cancellation rate
ORDER BY ccs.CancellationRate DESC
LIMIT 3;

Explanation:
• CountryCancellationStats CTE:
• Joins Bookings, Properties, and Countries tables to get the cancellation statistics for
each country in the year 2024.
• COUNT(b.BookingID) counts the total number of bookings for each country.
• SUM(CASE WHEN b.IsCancelled = TRUE THEN 1 ELSE 0 END) calculates the total
cancellations for each country.
• ROUND(SUM(CASE WHEN b.IsCancelled = TRUE THEN 1 ELSE 0 END) * 100.0 /
COUNT(b.BookingID), 2) calculates the cancellation rate (percentage of cancellations).
• TotalBookings CTE:
• This is used to get the total number of bookings for 2024 across all countries (though not
used in the final output, it could be used for a more detailed analysis of total bookings).
• Main Query:
• The RANK() window function is used to rank countries based on their cancellation rate in
descending order.
• Filters countries with cancellation rates greater than 0 to focus only on countries with
actual cancellations.
• The LIMIT 3 clause is used to return the top 3 countries with the highest cancellation
rates.
• Output:
• The output will include the CountryName, TotalBookings, TotalCancellations,
CancellationRate, and the Rank based on the cancellation rate.

Expected Output Example:

CountryName   TotalBookings   TotalCancellations   CancellationRate   CountryRank
Canada        3               2                    66.67              1
France        3               2                    66.67              1
USA           5               3                    60.00              3

This query returns the top 3 countries with the highest cancellation rates for the year
2024, along with their total bookings, total cancellations, and cancellation rates; Canada
and France tie at 66.67%, so both receive rank 1.

Key Concepts:
• JOIN: To combine data from multiple tables (Bookings, Properties, Countries).
• COUNT() and SUM(): Aggregate functions to calculate total bookings and cancellations.
• CASE WHEN: For conditional counting (for cancellations).
• RANK(): To rank countries by their cancellation rate.
• EXTRACT(): Used to filter bookings from the year 2024.
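The cancellation-rate arithmetic can be checked with an assumed sqlite3 sketch (not the book's code). SQLite stores booleans as 0/1, which lets SUM(IsCancelled) replace the CASE WHEN counting; strftime() stands in for EXTRACT(), and the join uses Bookings.CountryID directly, which matches the Properties route in this sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bookings (BookingID INTEGER PRIMARY KEY, CountryID INT, BookingDate TEXT,
                       TotalAmount REAL, IsCancelled INT);
CREATE TABLE Countries (CountryID INTEGER PRIMARY KEY, CountryName TEXT);
INSERT INTO Bookings VALUES
 (1,1,'2024-01-01',200,0),(2,1,'2024-01-05',300,1),(3,2,'2024-02-10',450,0),
 (4,2,'2024-03-15',500,1),(5,3,'2024-04-20',600,0),(6,3,'2024-05-25',700,0),
 (7,1,'2024-06-30',250,1),(8,4,'2024-07-10',800,0),(9,4,'2024-08-05',900,1),
 (10,2,'2024-09-18',350,1),(11,1,'2024-10-10',450,0),(12,3,'2024-11-12',800,0),
 (13,4,'2024-12-25',1000,1),(14,3,'2024-12-29',1100,0),(15,1,'2024-12-31',1500,1);
INSERT INTO Countries VALUES (1,'USA'),(2,'Canada'),(3,'Mexico'),(4,'France');
""")
rows = conn.execute("""
SELECT c.CountryName,
       COUNT(*) AS TotalBookings,
       SUM(b.IsCancelled) AS TotalCancellations,
       ROUND(SUM(b.IsCancelled) * 100.0 / COUNT(*), 2) AS CancellationRate
FROM Bookings b JOIN Countries c ON b.CountryID = c.CountryID
WHERE strftime('%Y', b.BookingDate) = '2024'
GROUP BY c.CountryName
HAVING SUM(b.IsCancelled) > 0
ORDER BY CancellationRate DESC, c.CountryName
LIMIT 3;
""").fetchall()
```

Mexico drops out at the HAVING step because it has no cancellations in the sample data.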

Microsoft
• Q.401
Question
Find the top 3 customers who have spent the most on products from the 'Electronics'
category, but only count purchases made within the last 30 days. Consider both the price of
the product and the quantity purchased.
Explanation
You need to join the "customers," "orders," and "products" tables, filter for the "Electronics"
category, and calculate the total spend for each customer within the last 30 days. Rank the
customers based on their total spend and retrieve the top 3.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100)
);

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10, 2)
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE
);

CREATE TABLE order_items (
    order_item_id INT PRIMARY KEY,
    order_id INT,
    product_id INT,
    quantity INT,
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

-- Sample data insertions


INSERT INTO customers (customer_id, customer_name)
VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Alice Johnson');

INSERT INTO products (product_id, product_name, category, price)
VALUES
(1, 'Smartphone', 'Electronics', 500.00),
(2, 'Laptop', 'Electronics', 1000.00),
(3, 'Chair', 'Furniture', 150.00),
(4, 'Headphones', 'Electronics', 150.00);

INSERT INTO orders (order_id, customer_id, order_date)
VALUES
(1, 1, '2025-01-01'),
(2, 2, '2025-01-05'),
(3, 1, '2025-01-10'),
(4, 3, '2025-01-12');

INSERT INTO order_items (order_item_id, order_id, product_id, quantity)
VALUES
(1, 1, 1, 2),
(2, 2, 2, 1),
(3, 3, 1, 1),
(4, 3, 4, 2),
(5, 4, 4, 3);

Learnings
• Filtering data based on DATE conditions (last 30 days).
• Using JOINs to combine multiple tables.
• SUM() and GROUP BY to calculate the total spend for each customer.
• Using ROW_NUMBER() or RANK() to rank customers by their total spend.
Solutions
• - PostgreSQL solution
WITH recent_purchases AS (
SELECT o.customer_id, SUM(oi.quantity * p.price) AS total_spend
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE p.category = 'Electronics'
AND o.order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY o.customer_id
)
SELECT customer_id, total_spend
FROM recent_purchases
ORDER BY total_spend DESC
LIMIT 3;
• - MySQL solution
WITH recent_purchases AS (
SELECT o.customer_id, SUM(oi.quantity * p.price) AS total_spend
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE p.category = 'Electronics'
AND o.order_date >= CURDATE() - INTERVAL 30 DAY
GROUP BY o.customer_id
)
SELECT customer_id, total_spend
FROM recent_purchases
ORDER BY total_spend DESC
LIMIT 3;
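Because CURRENT_DATE makes the result depend on when the query runs, this assumed sqlite3 sketch (not from the book) pins a hypothetical reference date of '2025-01-15' so the 30-day window covers all of the sample orders; SQLite's date(..., '-30 days') plays the role of the INTERVAL arithmetic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, category TEXT, price REAL);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INT, order_date TEXT);
CREATE TABLE order_items (order_item_id INTEGER PRIMARY KEY, order_id INT, product_id INT, quantity INT);
INSERT INTO products VALUES (1,'Electronics',500.0),(2,'Electronics',1000.0),(3,'Furniture',150.0),(4,'Electronics',150.0);
INSERT INTO orders VALUES (1,1,'2025-01-01'),(2,2,'2025-01-05'),(3,1,'2025-01-10'),(4,3,'2025-01-12');
INSERT INTO order_items VALUES (1,1,1,2),(2,2,2,1),(3,3,1,1),(4,3,4,2),(5,4,4,3);
""")
rows = conn.execute("""
SELECT o.customer_id, SUM(oi.quantity * p.price) AS total_spend
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE p.category = 'Electronics'
  AND o.order_date >= date('2025-01-15', '-30 days')  -- fixed reference date (assumption)
GROUP BY o.customer_id
ORDER BY total_spend DESC
LIMIT 3;
""").fetchall()
```

Customer 1's spend combines order 1 (2 smartphones) with order 3 (1 smartphone and 2 headphones).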
• Q.402
Question
Write an SQL query to find the team size of each employee from the Employee table.


Explanation
The task is to count the number of employees in each team and return that count for every
employee. This can be achieved by using a COUNT() aggregate function with a JOIN or a
GROUP BY clause.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Employee (
employee_id INT PRIMARY KEY,
team_id INT
);

-- Sample data insertions


INSERT INTO Employee (employee_id, team_id)
VALUES
(1, 8),
(2, 8),
(3, 8),
(4, 7),
(5, 9),
(6, 9);

Learnings
• Use of COUNT() to calculate team size.
• Understanding of JOIN or GROUP BY to aggregate data.
• Familiarity with how to display aggregated results for each row.
Solutions
• - PostgreSQL solution
SELECT e.employee_id, COUNT(*) AS team_size
FROM Employee e
JOIN Employee e2 ON e.team_id = e2.team_id
GROUP BY e.employee_id;
• - MySQL solution
SELECT e.employee_id, COUNT(*) AS team_size
FROM Employee e
JOIN Employee e2 ON e.team_id = e2.team_id
GROUP BY e.employee_id;
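A brief sqlite3 check (assumed, not the book's code) of the self-join, alongside an equivalent COUNT(*) OVER (PARTITION BY team_id) window form that avoids joining the table to itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (employee_id INTEGER PRIMARY KEY, team_id INT);
INSERT INTO Employee VALUES (1,8),(2,8),(3,8),(4,7),(5,9),(6,9);
""")
# Self-join: each employee row pairs with every teammate, and the count of
# pairs per employee is exactly the team size.
join_rows = conn.execute("""
SELECT e.employee_id, COUNT(*) AS team_size
FROM Employee e
JOIN Employee e2 ON e.team_id = e2.team_id
GROUP BY e.employee_id
ORDER BY e.employee_id;
""").fetchall()
# Window alternative: count rows within each team partition.
window_rows = conn.execute("""
SELECT employee_id, COUNT(*) OVER (PARTITION BY team_id) AS team_size
FROM Employee
ORDER BY employee_id;
""").fetchall()
```

The window form scans the table once, whereas the self-join materializes one row per teammate pair before grouping.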
• Q.403
Question
Write an SQL query to find the customer_id from the Customer table who bought all the
products in the Product table.
Explanation
The query needs to find customers who have purchased every product listed in the Product
table. This can be done by counting the products each customer has bought and comparing it
with the total number of products in the Product table.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Customer (
customer_id INT,
product_key INT,
FOREIGN KEY (product_key) REFERENCES Product(product_key)
);

CREATE TABLE Product (
    product_key INT PRIMARY KEY
);

-- Sample data insertions


INSERT INTO Product (product_key) VALUES (5), (6);

INSERT INTO Customer (customer_id, product_key)
VALUES
(1, 5),
(2, 6),
(3, 5),
(3, 6),
(1, 6);

Learnings
• Use of COUNT() to check the number of products each customer bought.
• Understanding of GROUP BY and HAVING to filter customers who meet a certain condition.
• Subquery to compare counts and ensure all products are bought by the customer.
Solutions
• - PostgreSQL solution
SELECT customer_id
FROM Customer
GROUP BY customer_id
HAVING COUNT(DISTINCT product_key) = (SELECT COUNT(*) FROM Product);
• - MySQL solution
SELECT customer_id
FROM Customer
GROUP BY customer_id
HAVING COUNT(DISTINCT product_key) = (SELECT COUNT(*) FROM Product);
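This relational-division idiom, comparing each customer's distinct product count with the size of the catalogue, runs unchanged in a small sqlite3 sketch (assumed, not from the book):

```python
import sqlite3

# Customers 1 and 3 bought both products (5 and 6); customer 2 bought only 6.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (product_key INTEGER PRIMARY KEY);
CREATE TABLE Customer (customer_id INT, product_key INT);
INSERT INTO Product VALUES (5),(6);
INSERT INTO Customer VALUES (1,5),(2,6),(3,5),(3,6),(1,6);
""")
rows = conn.execute("""
SELECT customer_id
FROM Customer
GROUP BY customer_id
HAVING COUNT(DISTINCT product_key) = (SELECT COUNT(*) FROM Product)
ORDER BY customer_id;
""").fetchall()
```

COUNT(DISTINCT ...) matters: without DISTINCT, a customer who bought the same product twice could be counted as owning the full catalogue.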
• Q.404
Question
Write an SQL query to find the person_name of the last person who can board the bus
without exceeding the weight limit of 1000 kilograms.
Explanation
The goal is to compute the cumulative weight of people boarding the bus in their given order
(defined by the turn column). The first person to exceed the weight limit should not be
included. To solve this, we can use a window function or a running total to check the
cumulative weight.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Queue (
person_id INT PRIMARY KEY,
person_name VARCHAR(100),
weight INT,
turn INT
);

INSERT INTO Queue (person_id, person_name, weight, turn)
VALUES
(5, 'Alice', 250, 1),
(4, 'Bob', 175, 5),
(3, 'Alex', 350, 2),
(6, 'John Cena', 400, 3),
(1, 'Winston', 500, 6),
(2, 'Marie', 200, 4);

Learnings
• Use of window functions like SUM() to calculate running totals.


• The importance of filtering based on cumulative sums.


• Handling conditional logic for aggregate functions.
Solutions
• - PostgreSQL solution
WITH running_total AS (
SELECT person_name,
SUM(weight) OVER (ORDER BY turn) AS total_weight,
turn
FROM Queue
)
SELECT person_name
FROM running_total
WHERE total_weight <= 1000
ORDER BY turn DESC
LIMIT 1;
• - MySQL solution
WITH running_total AS (
SELECT person_name,
SUM(weight) OVER (ORDER BY turn) AS total_weight,
turn
FROM Queue
)
SELECT person_name
FROM running_total
WHERE total_weight <= 1000
ORDER BY turn DESC
LIMIT 1;
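The running total can be traced with an assumed sqlite3 sketch (not the book's code): the cumulative weight reaches exactly 1000 at John Cena (turn 3) and crosses the limit at Marie (turn 4):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Queue (person_id INTEGER PRIMARY KEY, person_name TEXT, weight INT, turn INT);
INSERT INTO Queue VALUES (5,'Alice',250,1),(4,'Bob',175,5),(3,'Alex',350,2),
 (6,'John Cena',400,3),(1,'Winston',500,6),(2,'Marie',200,4);
""")
rows = conn.execute("""
WITH running_total AS (
    SELECT person_name, turn,
           SUM(weight) OVER (ORDER BY turn) AS total_weight
    FROM Queue
)
SELECT person_name
FROM running_total
WHERE total_weight <= 1000
ORDER BY turn DESC   -- latest boarder still within the limit
LIMIT 1;
""").fetchall()
```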
• Q.405
Question
Write an SQL query to find the person_name of the last person who can board the bus
without exceeding the weight limit of 1000 kilograms.
Explanation
The task is to compute the cumulative weight of people boarding the bus in the order of the
turn column. The goal is to identify the last person whose cumulative weight does not
exceed the 1000 kg limit. This can be achieved using a window function to calculate the
running total and then filtering based on that total.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Queue (
person_id INT PRIMARY KEY,
person_name VARCHAR(100),
weight INT,
turn INT
);

-- Sample data insertions


INSERT INTO Queue (person_id, person_name, weight, turn)
VALUES
(5, 'Alice', 250, 1),
(4, 'Bob', 175, 5),
(3, 'Alex', 350, 2),
(6, 'John Cena', 400, 3),
(1, 'Winston', 500, 6),
(2, 'Marie', 200, 4);

Learnings
• Use of window functions like SUM() to calculate a running total across rows.
• Understanding how to order rows with ORDER BY within window functions.


• Use of LIMIT to restrict the result to the last valid person.


Solutions
• - PostgreSQL solution
WITH running_total AS (
SELECT person_name,
SUM(weight) OVER (ORDER BY turn) AS total_weight,
turn
FROM Queue
)
SELECT person_name
FROM running_total
WHERE total_weight <= 1000
ORDER BY turn DESC
LIMIT 1;
• - MySQL solution
WITH running_total AS (
SELECT person_name,
SUM(weight) OVER (ORDER BY turn) AS total_weight,
turn
FROM Queue
)
SELECT person_name
FROM running_total
WHERE total_weight <= 1000
ORDER BY turn DESC
LIMIT 1;
• Q.406
Question
Write an SQL query to find the npv for each query in the Queries table based on the NPV
table.
Explanation
For each record in the Queries table, the corresponding npv from the NPV table should be
fetched based on matching id and year. A simple JOIN operation between the two tables will
achieve this, ensuring the correct npv value is returned for each query.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE NPV (
id INT,
year INT,
npv INT,
PRIMARY KEY (id, year)
);

CREATE TABLE Queries (
    id INT,
    year INT,
    PRIMARY KEY (id, year)
);

-- Sample data insertions


INSERT INTO NPV (id, year, npv)
VALUES
(1, 2018, 100),
(7, 2020, 30),
(13, 2019, 40),
(1, 2019, 113),
(2, 2008, 121),
(3, 2009, 12),
(11, 2020, 99),
(7, 2019, 0);

INSERT INTO Queries (id, year)
VALUES
(1, 2019),
(2, 2008),
(3, 2009),
(7, 2018),
(7, 2019),
(7, 2020),
(13, 2019);

Learnings
• Use of JOIN to match rows from two tables based on common columns (id and year).
• Understanding of primary keys and how they help in efficiently joining tables.
• Basic knowledge of how to filter and return results from multiple tables.
Solutions
• - PostgreSQL solution
SELECT q.id, q.year, COALESCE(n.npv, 0) AS npv
FROM Queries q
LEFT JOIN NPV n ON q.id = n.id AND q.year = n.year;
• - MySQL solution
SELECT q.id, q.year, COALESCE(n.npv, 0) AS npv
FROM Queries q
LEFT JOIN NPV n ON q.id = n.id AND q.year = n.year;
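An assumed sqlite3 sketch (not from the book) confirms the LEFT JOIN plus COALESCE behavior: every (id, year) pair from Queries survives, and the missing NPV for (7, 2018) becomes 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE NPV (id INT, year INT, npv INT, PRIMARY KEY (id, year));
CREATE TABLE Queries (id INT, year INT, PRIMARY KEY (id, year));
INSERT INTO NPV VALUES (1,2018,100),(7,2020,30),(13,2019,40),(1,2019,113),
 (2,2008,121),(3,2009,12),(11,2020,99),(7,2019,0);
INSERT INTO Queries VALUES (1,2019),(2,2008),(3,2009),(7,2018),(7,2019),(7,2020),(13,2019);
""")
rows = conn.execute("""
SELECT q.id, q.year, COALESCE(n.npv, 0) AS npv
FROM Queries q
LEFT JOIN NPV n ON q.id = n.id AND q.year = n.year
ORDER BY q.id, q.year;
""").fetchall()
```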
• Q.407
Question
Write an SQL query to find the names of all activities that have neither the maximum nor the
minimum number of participants.
Explanation
The task is to identify activities that do not have the highest or lowest number of participants.
This can be achieved by first counting the number of participants for each activity, and then
filtering out those with the maximum and minimum participant counts.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Friends (
id INT PRIMARY KEY,
name VARCHAR(100),
activity VARCHAR(100)
);

CREATE TABLE Activities (
    id INT PRIMARY KEY,
    name VARCHAR(100)
);

-- Sample data insertions


INSERT INTO Friends (id, name, activity)
VALUES
(1, 'Jonathan D.', 'Eating'),
(2, 'Jade W.', 'Singing'),
(3, 'Victor J.', 'Singing'),
(4, 'Elvis Q.', 'Eating'),
(5, 'Daniel A.', 'Eating'),
(6, 'Bob B.', 'Horse Riding');

INSERT INTO Activities (id, name)
VALUES
(1, 'Eating'),
(2, 'Singing'),
(3, 'Horse Riding');

Learnings
• Using COUNT() to count participants per activity.
• Filtering results based on aggregate values (maximum and minimum).
• Understanding how to exclude specific conditions with NOT IN.
Solutions
• - PostgreSQL solution
WITH activity_counts AS (
SELECT activity, COUNT(*) AS participant_count
FROM Friends
GROUP BY activity
),
min_max_counts AS (
SELECT MIN(participant_count) AS min_participants,
MAX(participant_count) AS max_participants
FROM activity_counts
)
SELECT ac.activity
FROM activity_counts ac
JOIN min_max_counts mm ON ac.participant_count NOT IN (mm.min_participants,
mm.max_participants);
• - MySQL solution
WITH activity_counts AS (
SELECT activity, COUNT(*) AS participant_count
FROM Friends
GROUP BY activity
),
min_max_counts AS (
SELECT MIN(participant_count) AS min_participants,
MAX(participant_count) AS max_participants
FROM activity_counts
)
SELECT ac.activity
FROM activity_counts ac
JOIN min_max_counts mm ON ac.participant_count NOT IN (mm.min_participants,
mm.max_participants);
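A quick assumed check in sqlite3 (not the book's code): Eating has 3 participants (the maximum), Horse Riding 1 (the minimum), leaving Singing as the only activity strictly between the extremes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Friends (id INTEGER PRIMARY KEY, name TEXT, activity TEXT);
INSERT INTO Friends VALUES (1,'Jonathan D.','Eating'),(2,'Jade W.','Singing'),
 (3,'Victor J.','Singing'),(4,'Elvis Q.','Eating'),(5,'Daniel A.','Eating'),
 (6,'Bob B.','Horse Riding');
""")
rows = conn.execute("""
WITH activity_counts AS (
    SELECT activity, COUNT(*) AS participant_count
    FROM Friends
    GROUP BY activity
),
min_max_counts AS (
    SELECT MIN(participant_count) AS min_participants,
           MAX(participant_count) AS max_participants
    FROM activity_counts
)
SELECT ac.activity
FROM activity_counts ac, min_max_counts mm
WHERE ac.participant_count NOT IN (mm.min_participants, mm.max_participants);
""").fetchall()
```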
• Q.408
Question
Write an SQL query to find the countries where the average call duration is strictly greater
than the global average call duration.
Explanation
To solve this problem:
• Calculate the global average call duration from the Calls table.
• Calculate the average call duration for each country, using the country code derived from
the caller's phone number.
• Filter out the countries where the average call duration is greater than the global average.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Person (
id INT PRIMARY KEY,
name VARCHAR(100),
phone_number VARCHAR(15)
);

CREATE TABLE Country (
    name VARCHAR(100),
    country_code VARCHAR(3) PRIMARY KEY
);

CREATE TABLE Calls (
    caller_id INT,
    callee_id INT,
    duration INT
);

INSERT INTO Person (id, name, phone_number)
VALUES
(1, 'Alice', '001-2345678'),
(2, 'Bob', '002-3456789'),
(3, 'Charlie', '001-3456789'),
(4, 'David', '003-4567890'),
(5, 'Eva', '004-5678901'),
(6, 'Frank', '002-6789012'),
(7, 'Grace', '001-2340000'),
(8, 'Hannah', '003-4567000'),
(9, 'Ivy', '004-5670000'),
(10, 'Jack', '001-2341111');

INSERT INTO Country (name, country_code)
VALUES
('United States', '001'),
('United Kingdom', '002'),
('Germany', '003'),
('France', '004');

INSERT INTO Calls (caller_id, callee_id, duration)
VALUES
(1, 2, 10),
(1, 3, 20),
(2, 1, 15),
(3, 1, 25),
(4, 1, 30),
(5, 6, 12),
(6, 5, 18),
(7, 1, 22),
(8, 1, 28),
(9, 10, 16),
(10, 9, 14),
(1, 7, 11),
(3, 8, 24),
(2, 6, 19),
(6, 4, 15);

Learnings
• The use of JOIN to combine multiple tables for related information.
• Aggregating data with AVG() to compute the average call duration for each country.
• Filtering results using subqueries and comparison operators (>) to compare with the global
average.
Solutions
• - PostgreSQL solution
WITH country_avg AS (
SELECT c.name AS country_name,
AVG(call.duration) AS avg_duration
FROM Calls call
JOIN Person p1 ON call.caller_id = p1.id
JOIN Person p2 ON call.callee_id = p2.id
JOIN Country c ON SUBSTRING(p1.phone_number FROM 1 FOR 3) = c.country_code
GROUP BY c.name
),
global_avg AS (
SELECT AVG(duration) AS avg_duration
FROM Calls
)
SELECT ca.country_name
FROM country_avg ca, global_avg ga
WHERE ca.avg_duration > ga.avg_duration;
• - MySQL solution
WITH country_avg AS (
SELECT c.name AS country_name,
AVG(call.duration) AS avg_duration
FROM Calls call
JOIN Person p1 ON call.caller_id = p1.id
JOIN Person p2 ON call.callee_id = p2.id
JOIN Country c ON SUBSTRING(p1.phone_number, 1, 3) = c.country_code
GROUP BY c.name
),
global_avg AS (
SELECT AVG(duration) AS avg_duration
FROM Calls
)
SELECT ca.country_name
FROM country_avg ca, global_avg ga
WHERE ca.avg_duration > ga.avg_duration;

Explanation of the Sample Data


• Person Table: Contains a list of individuals with their corresponding phone numbers and
IDs.
• Country Table: Contains the country names and their respective country codes.
• Calls Table: Logs phone call durations between individuals, identified by their caller_id
and callee_id. This data is used to calculate the average call duration per country.
The phone number's country code (the first 3 digits) is used to determine the country for each
call.
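The solution can be verified in SQLite (bundled with Python). SQLite's SUBSTR() replaces the PostgreSQL/MySQL SUBSTRING() forms, and the join on the callee is omitted since only the caller's code determines the country. With this data the global average is 18.6, and only Germany (code 003, average 29) exceeds it.

```python
import sqlite3

# Rebuild the sample tables and run the country-vs-global average query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT, phone_number TEXT);
CREATE TABLE Country (name TEXT, country_code TEXT PRIMARY KEY);
CREATE TABLE Calls (caller_id INTEGER, callee_id INTEGER, duration INTEGER);
INSERT INTO Person VALUES
    (1,'Alice','001-2345678'),(2,'Bob','002-3456789'),(3,'Charlie','001-3456789'),
    (4,'David','003-4567890'),(5,'Eva','004-5678901'),(6,'Frank','002-6789012'),
    (7,'Grace','001-2340000'),(8,'Hannah','003-4567000'),(9,'Ivy','004-5670000'),
    (10,'Jack','001-2341111');
INSERT INTO Country VALUES
    ('United States','001'),('United Kingdom','002'),('Germany','003'),('France','004');
INSERT INTO Calls VALUES
    (1,2,10),(1,3,20),(2,1,15),(3,1,25),(4,1,30),(5,6,12),(6,5,18),(7,1,22),
    (8,1,28),(9,10,16),(10,9,14),(1,7,11),(3,8,24),(2,6,19),(6,4,15);
""")
result = conn.execute("""
WITH country_avg AS (
    SELECT c.name AS country_name, AVG(cl.duration) AS avg_duration
    FROM Calls cl
    JOIN Person p1 ON cl.caller_id = p1.id
    JOIN Country c ON SUBSTR(p1.phone_number, 1, 3) = c.country_code
    GROUP BY c.name
),
global_avg AS (SELECT AVG(duration) AS avg_duration FROM Calls)
SELECT ca.country_name
FROM country_avg ca, global_avg ga
WHERE ca.avg_duration > ga.avg_duration
""").fetchall()
print(result)  # [('Germany',)]
```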
• Q.409
Question
Write an SQL query to find the median population of the countries from the Country table.
Explanation
To find the median population of countries:
• Sort the populations in ascending order.
• If the number of rows is odd, the median is the middle value.
• If the number of rows is even, the median is the average of the two middle values.
• You can use ROW_NUMBER() or RANK() to assign row numbers, and then identify the middle
row(s).
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Country (
id INT PRIMARY KEY,
name VARCHAR(100),
population INT
);

-- Sample data insertions


INSERT INTO Country (id, name, population)
VALUES
(1, 'United States', 331002651),
(2, 'India', 1380004385),
(3, 'China', 1439323776),
(4, 'Brazil', 212559417),
(5, 'Russia', 145934462),
(6, 'Mexico', 128933395),
(7, 'Japan', 126476461),
(8, 'Germany', 83783942),
(9, 'United Kingdom', 67886011),
(10, 'France', 65273511);

Learnings
• Using ROW_NUMBER() or RANK() window functions to assign ranks to rows.
• Calculating median by identifying the middle row(s) based on row numbers.
• Handling even vs. odd cases for median calculation.
Solutions
• - PostgreSQL solution
WITH ranked_countries AS (
SELECT population,
ROW_NUMBER() OVER (ORDER BY population) AS row_num,
COUNT(*) OVER () AS total_rows
FROM Country
)
SELECT AVG(population) AS median_population
FROM ranked_countries
WHERE row_num IN (
(total_rows + 1) / 2,
(total_rows + 2) / 2
);
• - MySQL solution
WITH ranked_countries AS (
    SELECT population,
           ROW_NUMBER() OVER (ORDER BY population) AS row_num,
           COUNT(*) OVER () AS total_rows
    FROM Country
)
SELECT AVG(population) AS median_population
FROM ranked_countries
WHERE row_num IN (
    FLOOR((total_rows + 1) / 2),
    FLOOR((total_rows + 2) / 2)
);
Note: in MySQL, / performs real division ((10 + 1) / 2 = 5.5), so FLOOR() is needed to land on whole row numbers; PostgreSQL's integer division already truncates, which is why the PostgreSQL solution omits it.

Explanation of the Solution


• The ROW_NUMBER() function is used to assign a sequential number to each row based on the
population in ascending order.
• The COUNT() function provides the total number of rows in the dataset.
• Depending on whether the total number of rows is even or odd, the query selects the
appropriate row(s) to compute the median:
• If the total number of rows is odd, it selects the middle row.
• If even, it averages the two middle rows.
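The median logic can be checked in SQLite (window functions require SQLite >= 3.25, which recent Python builds include; SQLite truncates integer division like PostgreSQL). With the 10 sample countries, the two middle populations are Mexico (128933395) and Russia (145934462), so the median is 137433928.5.

```python
import sqlite3

# Load the sample countries and compute the median population.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT, population INTEGER)")
conn.executemany("INSERT INTO Country VALUES (?, ?, ?)", [
    (1,'United States',331002651),(2,'India',1380004385),(3,'China',1439323776),
    (4,'Brazil',212559417),(5,'Russia',145934462),(6,'Mexico',128933395),
    (7,'Japan',126476461),(8,'Germany',83783942),(9,'United Kingdom',67886011),
    (10,'France',65273511)])
median = conn.execute("""
WITH ranked_countries AS (
    SELECT population,
           ROW_NUMBER() OVER (ORDER BY population) AS row_num,
           COUNT(*) OVER () AS total_rows
    FROM Country
)
SELECT AVG(population) FROM ranked_countries
WHERE row_num IN ((total_rows + 1) / 2, (total_rows + 2) / 2)
""").fetchone()[0]
print(median)  # 137433928.5
```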
• Q.410

Question
Write an SQL query to convert the number of users into millions, rounding the result to two decimal places, and append "M" to the result.
Example Output:

Country          Users (in Millions)
India            2.54M
China            13.90M
United States    3.31M
Indonesia        2.71M
Brazil           2.12M

Explanation
You need to convert the "Number of Users" to millions by dividing by 1,000,000, then round the result to two decimal places, and finally append "M" to indicate millions.

Datasets
Table creation and sample data:
CREATE TABLE Users (
Country VARCHAR(100),
NumberOfUsers INT
);

-- Sample data insertions


INSERT INTO Users (Country, NumberOfUsers)
VALUES
('India', 2543500),
('China', 13900000),
('United States', 3310000),
('Indonesia', 2710000),
('Brazil', 2120000),
('Pakistan', 1670000),
('Nigeria', 2060000),
('Bangladesh', 1680000),
('Russia', 1450000),
('Mexico', 1280000),
('Japan', 1260000),
('Ethiopia', 1150000),
('Philippines', 1120000),
('Egypt', 1040000),
('Vietnam', 980000),
('DR Congo', 900000),
('Turkey', 850000),
('Iran', 770000),
('Germany', 830000),
('Thailand', 750000),
('United Kingdom', 680000),
('France', 650000),
('South Africa', 550000),
('Italy', 450000),
('South Korea', 420000);

Learnings
• Use of SQL functions for mathematical calculations (division, rounding).
• String concatenation to append units (e.g., "M").
• Rounding numbers to a specified decimal place in SQL.


Solutions
MySQL solution:
SELECT Country,
       CONCAT(FORMAT(NumberOfUsers / 1000000, 2), 'M') AS UsersInMillions
FROM Users;
(FORMAT() keeps both decimal places, e.g. 13.90M, where ROUND() would drop the trailing zero.)

PostgreSQL solution:
SELECT Country,
ROUND(NumberOfUsers / 1000000.0, 2) || 'M' AS UsersInMillions
FROM Users;
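In SQLite the same idea can be checked with printf(), which gives a fixed two-decimal string and so avoids the trailing-zero difference between ROUND() results (13.9 vs 13.90). A few of the sample rows suffice:

```python
import sqlite3

# printf('%.2fM', ...) formats the millions value with exactly two decimals.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Country TEXT, NumberOfUsers INTEGER)")
conn.executemany("INSERT INTO Users VALUES (?, ?)",
                 [('India', 2543500), ('China', 13900000), ('Brazil', 2120000)])
rows = dict(conn.execute(
    "SELECT Country, printf('%.2fM', NumberOfUsers / 1000000.0) FROM Users"))
print(rows)  # {'India': '2.54M', 'China': '13.90M', 'Brazil': '2.12M'}
```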

• Q.411
Question
Find the top 3 highest-paid employees in each department from the Employee table.

Explanation
To solve this, you need to rank employees within each department based on their salary and
return the top 3 for each department. This can be achieved using window functions like
ROW_NUMBER() or RANK().

Datasets
Table creation and sample data:
CREATE TABLE Employee (
id INT,
name VARCHAR(50),
department VARCHAR(50),
salary INT
);

-- Sample data insertions


INSERT INTO Employee (id, name, department, salary)
VALUES
(1, 'Alice', 'Engineering', 90000),
(2, 'Bob', 'Engineering', 85000),
(3, 'Charlie', 'Engineering', 95000),
(4, 'David', 'Marketing', 80000),
(5, 'Eva', 'Marketing', 75000),
(6, 'Frank', 'Marketing', 78000),
(7, 'Grace', 'Sales', 70000),
(8, 'Hank', 'Sales', 72000),
(9, 'Ivy', 'Sales', 75000);

Learnings
• Use of window functions (ROW_NUMBER() or RANK()) to rank results.
• Partitioning results based on a group (e.g., department).
• Filtering top N results using window functions.


Solutions
MySQL solution:
WITH RankedEmployees AS (
    SELECT id, name, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
    FROM Employee
)
SELECT id, name, department, salary
FROM RankedEmployees
WHERE salary_rank <= 3;
(RANK is a reserved word in MySQL 8.0, so the alias is named salary_rank.)

PostgreSQL solution:
WITH RankedEmployees AS (
    SELECT id, name, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
    FROM Employee
)
SELECT id, name, department, salary
FROM RankedEmployees
WHERE salary_rank <= 3;
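A quick check in SQLite (>= 3.25 for window functions): every department in the sample data has exactly three employees, so all nine rows come back, and Charlie tops Engineering.

```python
import sqlite3

# Rank employees per department and keep the top 3 of each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER, name TEXT, department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    (1,'Alice','Engineering',90000),(2,'Bob','Engineering',85000),
    (3,'Charlie','Engineering',95000),(4,'David','Marketing',80000),
    (5,'Eva','Marketing',75000),(6,'Frank','Marketing',78000),
    (7,'Grace','Sales',70000),(8,'Hank','Sales',72000),(9,'Ivy','Sales',75000)])
rows = conn.execute("""
WITH RankedEmployees AS (
    SELECT id, name, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
    FROM Employee
)
SELECT name, department FROM RankedEmployees
WHERE salary_rank <= 3
ORDER BY department, salary_rank
""").fetchall()
print(len(rows), rows[0])  # 9 ('Charlie', 'Engineering')
```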
• Q.412
Question
Write an SQL query to calculate the cumulative salary total for each department,
ordered by employee salary.

Explanation
You need to calculate the running total of salary for each department, ordered by the salary of
employees. This can be done using the window function SUM() with the OVER() clause.

Datasets
Table creation and sample data:
CREATE TABLE Employee (
id INT,
name VARCHAR(50),
department VARCHAR(50),
salary INT
);

-- Sample data insertions


INSERT INTO Employee (id, name, department, salary)
VALUES
(1, 'Alice', 'Engineering', 90000),
(2, 'Bob', 'Engineering', 85000),
(3, 'Charlie', 'Engineering', 95000),
(4, 'David', 'Marketing', 80000),
(5, 'Eva', 'Marketing', 75000),
(6, 'Frank', 'Marketing', 78000);

Learnings
• Use of window functions to calculate cumulative totals (SUM()).
• Ordering and partitioning results to apply calculations across different groups.


Solutions
MySQL solution:
SELECT id, name, department, salary,
SUM(salary) OVER (PARTITION BY department ORDER BY salary) AS cumulative_salary
FROM Employee;

PostgreSQL solution:
SELECT id, name, department, salary,
SUM(salary) OVER (PARTITION BY department ORDER BY salary) AS cumulative_salary
FROM Employee;
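The running total can be checked in SQLite: within Engineering, ordering by salary gives Bob (85000) -> 85000, then Alice (90000) -> 175000, then Charlie (95000) -> 270000.

```python
import sqlite3

# Cumulative salary per department, ordered by salary ascending.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER, name TEXT, department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    (1,'Alice','Engineering',90000),(2,'Bob','Engineering',85000),
    (3,'Charlie','Engineering',95000),(4,'David','Marketing',80000),
    (5,'Eva','Marketing',75000),(6,'Frank','Marketing',78000)])
totals = dict(conn.execute("""
SELECT name,
       SUM(salary) OVER (PARTITION BY department ORDER BY salary) AS cumulative_salary
FROM Employee
""").fetchall())
print(totals['Charlie'])  # 270000
```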
• Q.413
Question
Write an SQL query to find the number of employees in each department who have a
salary greater than the average salary of their department.

Explanation
You need to compare each employee's salary with the average salary of their department.
This can be done using a subquery to first calculate the average salary per department and
then comparing each employee’s salary to this average.

Datasets
Table creation and sample data:
CREATE TABLE Employee (
id INT,
name VARCHAR(50),
department VARCHAR(50),
salary INT
);

-- Sample data insertions


INSERT INTO Employee (id, name, department, salary)
VALUES
(1, 'Alice', 'Engineering', 90000),
(2, 'Bob', 'Engineering', 85000),
(3, 'Charlie', 'Engineering', 95000),
(4, 'David', 'Marketing', 80000),
(5, 'Eva', 'Marketing', 75000),
(6, 'Frank', 'Marketing', 78000);

Learnings
• Use of subqueries to calculate aggregates (like average salary).
• Filtering based on comparison with aggregates.
• Grouping results by department.

Solutions
MySQL solution:
SELECT department, COUNT(*) AS num_employees_above_avg
FROM Employee e
WHERE salary > (SELECT AVG(salary) FROM Employee WHERE department = e.department)
GROUP BY department;

PostgreSQL solution:
SELECT department, COUNT(*) AS num_employees_above_avg
FROM Employee e
WHERE salary > (SELECT AVG(salary) FROM Employee WHERE department = e.department)
GROUP BY department;
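The correlated subquery behaves the same way in SQLite: Engineering's average is 90000 (only Charlie is above it), while Marketing's is about 77667 (David and Frank are above it).

```python
import sqlite3

# Count employees earning more than their department's average.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER, name TEXT, department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    (1,'Alice','Engineering',90000),(2,'Bob','Engineering',85000),
    (3,'Charlie','Engineering',95000),(4,'David','Marketing',80000),
    (5,'Eva','Marketing',75000),(6,'Frank','Marketing',78000)])
counts = dict(conn.execute("""
SELECT department, COUNT(*) AS num_employees_above_avg
FROM Employee e
WHERE salary > (SELECT AVG(salary) FROM Employee WHERE department = e.department)
GROUP BY department
""").fetchall())
print(counts)  # {'Engineering': 1, 'Marketing': 2}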
• Q.414
Question
Find the second most expensive product in each category from the Products table.

Explanation
To find the second most expensive product in each category, you can use a subquery or
window functions like RANK() to rank products within each category by price and then filter
out the highest price.

Datasets
Table creation and sample data:
CREATE TABLE Products (
product_id INT,
category VARCHAR(50),
product_name VARCHAR(100),
price DECIMAL(10, 2)
);

INSERT INTO Products (product_id, category, product_name, price)
VALUES
(1, 'Electronics', 'Smartphone', 500),
(2, 'Electronics', 'Laptop', 1500),
(3, 'Electronics', 'Tablet', 800),
(4, 'Home Appliances', 'Blender', 120),
(5, 'Home Appliances', 'Microwave', 200),
(6, 'Home Appliances', 'Toaster', 70),
(7, 'Furniture', 'Sofa', 800),
(8, 'Furniture', 'Coffee Table', 150),
(9, 'Furniture', 'Chair', 120);

Learnings
• Use of window functions like RANK() or ROW_NUMBER() to rank products within categories.
• Subqueries to filter based on rank or price comparison.

Solutions
MySQL solution:
WITH RankedProducts AS (
    SELECT product_id, category, product_name, price,
           RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank
    FROM Products
)
SELECT product_id, category, product_name, price
FROM RankedProducts
WHERE price_rank = 2;
(The alias price_rank avoids MySQL's reserved word RANK.)

PostgreSQL solution:
WITH RankedProducts AS (
    SELECT product_id, category, product_name, price,
           RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank
    FROM Products
)
SELECT product_id, category, product_name, price
FROM RankedProducts
WHERE price_rank = 2;
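Checking the RANK() approach in SQLite: the second most expensive items in the sample data are the Tablet (Electronics), Blender (Home Appliances) and Coffee Table (Furniture).

```python
import sqlite3

# Rank products by price within each category and keep rank 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (product_id INTEGER, category TEXT, product_name TEXT, price REAL)")
conn.executemany("INSERT INTO Products VALUES (?, ?, ?, ?)", [
    (1,'Electronics','Smartphone',500),(2,'Electronics','Laptop',1500),
    (3,'Electronics','Tablet',800),(4,'Home Appliances','Blender',120),
    (5,'Home Appliances','Microwave',200),(6,'Home Appliances','Toaster',70),
    (7,'Furniture','Sofa',800),(8,'Furniture','Coffee Table',150),
    (9,'Furniture','Chair',120)])
names = {row[0] for row in conn.execute("""
WITH RankedProducts AS (
    SELECT product_name,
           RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank
    FROM Products
)
SELECT product_name FROM RankedProducts WHERE price_rank = 2
""")}
print(names)  # {'Tablet', 'Blender', 'Coffee Table'}
```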

• Q.415

Question
Write an SQL query to count the number of orders placed by each customer, including
those who haven't placed any orders.

Explanation
To count the number of orders for each customer, including customers without any orders,
you will need to perform a LEFT JOIN between the Customers and Orders tables and then
count the orders per customer.

Datasets
Table creation and sample data:
CREATE TABLE Customers (
customer_id INT,
customer_name VARCHAR(100)
);

CREATE TABLE Orders (
    order_id INT,
    customer_id INT,
    order_date DATE
);

-- Sample data insertions


INSERT INTO Customers (customer_id, customer_name)
VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie');

INSERT INTO Orders (order_id, customer_id, order_date)
VALUES
(1, 1, '2022-01-10'),
(2, 1, '2022-02-10'),
(3, 2, '2022-03-15');

Learnings
• LEFT JOIN to include rows even when there’s no matching data.
• Using COUNT() with GROUP BY to aggregate data.


Solutions
MySQL solution:
SELECT c.customer_id, c.customer_name, COUNT(o.order_id) AS order_count
FROM Customers c
LEFT JOIN Orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id;

PostgreSQL solution:
SELECT c.customer_id, c.customer_name, COUNT(o.order_id) AS order_count
FROM Customers c
LEFT JOIN Orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id;
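The LEFT JOIN behavior is easy to confirm in SQLite: Charlie has no orders but still appears with a count of 0, because COUNT(o.order_id) ignores the NULLs the join produces.

```python
import sqlite3

# Count orders per customer, keeping customers with no orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER, customer_name TEXT);
CREATE TABLE Orders (order_id INTEGER, customer_id INTEGER, order_date TEXT);
INSERT INTO Customers VALUES (1,'Alice'),(2,'Bob'),(3,'Charlie');
INSERT INTO Orders VALUES (1,1,'2022-01-10'),(2,1,'2022-02-10'),(3,2,'2022-03-15');
""")
counts = dict(conn.execute("""
SELECT c.customer_name, COUNT(o.order_id) AS order_count
FROM Customers c
LEFT JOIN Orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
""").fetchall())
print(counts)  # {'Alice': 2, 'Bob': 1, 'Charlie': 0}
```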
• Q.416
Question
Write an SQL query to find the employees who have the highest salary in their
department but are not the highest paid overall.

Explanation
To solve this, you need to find the highest salary in each department, then compare it to the
highest salary overall to filter out the top earner.

Datasets
Table creation and sample data:
CREATE TABLE Employees (
employee_id INT,
department VARCHAR(50),
employee_name VARCHAR(100),
salary DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO Employees (employee_id, department, employee_name, salary)
VALUES
(1, 'Engineering', 'Alice', 120000),
(2, 'Engineering', 'Bob', 100000),
(3, 'Marketing', 'Charlie', 80000),
(4, 'Marketing', 'David', 95000),
(5, 'Sales', 'Eva', 110000),
(6, 'Sales', 'Frank', 115000);

Learnings
• Subqueries to find the maximum salary in a department.
• Using MAX() to find the top salary overall and then applying conditions.

Solutions
MySQL solution:
SELECT employee_id, department, employee_name, salary
FROM Employees e
WHERE salary = (SELECT MAX(salary) FROM Employees WHERE department = e.department)
  AND salary < (SELECT MAX(salary) FROM Employees);

PostgreSQL solution:
SELECT employee_id, department, employee_name, salary
FROM Employees e
WHERE salary = (SELECT MAX(salary) FROM Employees WHERE department = e.department)
AND salary < (SELECT MAX(salary) FROM Employees);
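Checking in SQLite: Alice holds both the Engineering maximum and the overall maximum, so only David (Marketing) and Frank (Sales) qualify.

```python
import sqlite3

# Department top earners, excluding the single highest-paid employee overall.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (employee_id INTEGER, department TEXT, employee_name TEXT, salary REAL)")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)", [
    (1,'Engineering','Alice',120000),(2,'Engineering','Bob',100000),
    (3,'Marketing','Charlie',80000),(4,'Marketing','David',95000),
    (5,'Sales','Eva',110000),(6,'Sales','Frank',115000)])
names = {r[0] for r in conn.execute("""
SELECT employee_name
FROM Employees e
WHERE salary = (SELECT MAX(salary) FROM Employees WHERE department = e.department)
  AND salary < (SELECT MAX(salary) FROM Employees)
""")}
print(names)  # {'David', 'Frank'}
```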
• Q.417
Question
Write an SQL query to report the latest login for all users in the year 2020. Do not include users who did not log in during 2020. Return each user_id with its latest 2020 login timestamp.

Explanation
You need to filter out only the users who logged in during the year 2020, then for each of
these users, find the latest login timestamp within that year. The result should be formatted in
the desired output format, with a Z appended to the timestamp for UTC timezone.

Datasets
Table creation and sample data:
CREATE TABLE Logins (
user_id INT,
time_stamp DATETIME,
PRIMARY KEY (user_id, time_stamp)
);

INSERT INTO Logins (user_id, time_stamp)
VALUES
(6, '2020-06-30 15:06:07'),
(6, '2021-04-21 14:06:06'),
(6, '2019-03-07 00:18:15'),
(8, '2020-02-01 05:10:53'),
(8, '2020-12-30 00:46:50'),
(2, '2020-01-16 02:49:50'),
(2, '2019-08-25 07:59:08'),
(14, '2019-07-14 09:00:00'),
(14, '2021-01-06 11:59:59');

Learnings
• Using YEAR() function to filter rows by year.
• Using GROUP BY to group data by user_id.
• Using MAX() to select the latest timestamp.
• Formatting the datetime output.

Solutions
MySQL solution:
SELECT user_id,
       DATE_FORMAT(MAX(time_stamp), '%Y-%m-%dT%H:%i:%sZ') AS last_stamp
FROM Logins
WHERE YEAR(time_stamp) = 2020
GROUP BY user_id;

PostgreSQL solution:
SELECT user_id,
TO_CHAR(MAX(time_stamp), 'YYYY-MM-DD"T"HH24:MI:SS"Z"') AS last_stamp
FROM Logins
WHERE EXTRACT(YEAR FROM time_stamp) = 2020
GROUP BY user_id;
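An SQLite version of the same idea uses strftime() both to filter on the year and to format the output; user 14 never logged in during 2020 and is excluded.

```python
import sqlite3

# Latest 2020 login per user, formatted as an ISO-8601 UTC timestamp.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Logins (user_id INTEGER, time_stamp TEXT)")
conn.executemany("INSERT INTO Logins VALUES (?, ?)", [
    (6,'2020-06-30 15:06:07'),(6,'2021-04-21 14:06:06'),(6,'2019-03-07 00:18:15'),
    (8,'2020-02-01 05:10:53'),(8,'2020-12-30 00:46:50'),
    (2,'2020-01-16 02:49:50'),(2,'2019-08-25 07:59:08'),
    (14,'2019-07-14 09:00:00'),(14,'2021-01-06 11:59:59')])
rows = dict(conn.execute("""
SELECT user_id, strftime('%Y-%m-%dT%H:%M:%SZ', MAX(time_stamp)) AS last_stamp
FROM Logins
WHERE strftime('%Y', time_stamp) = '2020'
GROUP BY user_id
""").fetchall())
print(rows[8])  # 2020-12-30T00:46:50Z
```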
• Q.418
Question
Write an SQL query to find managers with at least five direct reports.
Return the result table in any order.

Explanation
To solve this, we need to count how many direct reports each manager has. This can be done
by grouping by the managerId and counting how many employees are associated with each
managerId. Then, we filter out managers who have fewer than five direct reports.

Datasets
Table creation and sample data:
CREATE TABLE Employee (
id INT,
name VARCHAR(100),
department VARCHAR(50),
managerId INT,
PRIMARY KEY (id)
);

INSERT INTO Employee (id, name, department, managerId)
VALUES
(101, 'John', 'A', NULL),
(102, 'Dan', 'A', 101),
(103, 'James', 'A', 101),
(104, 'Amy', 'A', 101),
(105, 'Anne', 'A', 101),
(106, 'Ron', 'B', 101);

Learnings
• Grouping data by manager using managerId.
• Counting direct reports using COUNT().
• Filtering based on conditions using HAVING.

Solutions
MySQL solution:
SELECT e.name
FROM Employee e
JOIN Employee sub ON e.id = sub.managerId
GROUP BY e.id
HAVING COUNT(sub.id) >= 5;

PostgreSQL solution:
SELECT e.name
FROM Employee e
JOIN Employee sub ON e.id = sub.managerId
GROUP BY e.id
HAVING COUNT(sub.id) >= 5;
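The self-join can be checked in SQLite: manager 101 (John) has exactly five direct reports (102 through 106), so only John is returned.

```python
import sqlite3

# Managers with at least five direct reports via a self-join.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT, department TEXT, managerId INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    (101,'John','A',None),(102,'Dan','A',101),(103,'James','A',101),
    (104,'Amy','A',101),(105,'Anne','A',101),(106,'Ron','B',101)])
managers = [r[0] for r in conn.execute("""
SELECT e.name
FROM Employee e
JOIN Employee sub ON e.id = sub.managerId
GROUP BY e.id
HAVING COUNT(sub.id) >= 5
""")]
print(managers)  # ['John']
```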
• Q.419
Question
Write an SQL query to find the confirmation rate of each user. The confirmation rate of a
user is calculated as the number of 'confirmed' actions divided by the total number of
confirmation requests (both 'confirmed' and 'timeout') for that user. If the user did not request
any confirmation messages, their confirmation rate should be 0. Round the confirmation rate
to two decimal places.

Explanation
To calculate the confirmation rate for each user, we need to:
• Count the total number of confirmation requests for each user (both 'confirmed' and
'timeout').
• Count the number of 'confirmed' requests for each user.
• Divide the number of 'confirmed' requests by the total number of confirmation requests.
• For users with no confirmation requests, the rate will be 0.
• Round the confirmation rate to two decimal places.
We'll join the Signups and Confirmations tables based on user_id, and use aggregate
functions to calculate the total and confirmed counts for each user.

Datasets
Table creation and sample data:
CREATE TABLE Signups (
user_id INT PRIMARY KEY,
time_stamp DATETIME
);

CREATE TABLE Confirmations (
    user_id INT,
    time_stamp DATETIME,
    action ENUM('confirmed', 'timeout'),  -- MySQL; in PostgreSQL use a CHECK constraint or a custom type
    PRIMARY KEY (user_id, time_stamp),
    FOREIGN KEY (user_id) REFERENCES Signups(user_id)
);

-- Sample data insertions


INSERT INTO Signups (user_id, time_stamp)
VALUES
(3, '2020-03-21 10:16:13'),
(7, '2020-01-04 13:57:59'),
(2, '2020-07-29 23:09:44'),
(6, '2020-12-09 10:39:37');

INSERT INTO Confirmations (user_id, time_stamp, action)
VALUES
(3, '2021-01-06 03:30:46', 'timeout'),
(3, '2021-07-14 14:00:00', 'timeout'),
(7, '2021-06-12 11:57:29', 'confirmed'),
(7, '2021-06-13 12:58:28', 'confirmed'),
(7, '2021-06-14 13:59:27', 'confirmed'),
(2, '2021-01-22 00:00:00', 'confirmed'),
(2, '2021-02-28 23:59:59', 'timeout');

Learnings
• Use of JOIN to combine data from multiple tables.
• Use of conditional aggregation with COUNT() to filter the confirmation actions.
• Handling division and conditional logic (e.g., dividing by zero when there are no requests).
• Rounding to a specified number of decimal places using SQL functions.

Solutions
MySQL solution:
SELECT s.user_id,
       ROUND(IFNULL(
           SUM(CASE WHEN c.action = 'confirmed' THEN 1 ELSE 0 END)
           / NULLIF(COUNT(c.action), 0), 0), 2) AS confirmation_rate
FROM Signups s
LEFT JOIN Confirmations c ON s.user_id = c.user_id
GROUP BY s.user_id;

Explanation:
• We join the Signups table with the Confirmations table using a LEFT JOIN to include users who didn't request any confirmation messages.
• We count the number of 'confirmed' actions using a CASE statement inside the SUM() function.
• COUNT(c.action) gives the total number of confirmation messages (both 'confirmed' and 'timeout').
• NULLIF() turns a zero total into NULL so the division cannot fail, and IFNULL() then converts the resulting NULL rate of users with no requests into 0.
• Finally, the ROUND() function is used to round the result to two decimal places.
PostgreSQL solution:
SELECT s.user_id,
       ROUND(COALESCE(
           SUM(CASE WHEN c.action = 'confirmed' THEN 1 ELSE 0 END)::numeric
           / NULLIF(COUNT(c.action), 0), 0), 2) AS confirmation_rate
FROM Signups s
LEFT JOIN Confirmations c ON s.user_id = c.user_id
GROUP BY s.user_id;

Explanation:
• Similar to the MySQL solution, but uses COALESCE() instead of IFNULL() and casts the confirmed count to numeric so that PostgreSQL does not perform integer division.
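The confirmation-rate logic, with NULLIF() guarding against division by zero, can be run in SQLite (CAST stands in for the ::numeric cast, since SQLite also truncates integer division). Expected rates: user 2 -> 0.5, user 3 -> 0.0, user 6 -> 0.0 (no requests), user 7 -> 1.0.

```python
import sqlite3

# Confirmation rate per signup, with a LEFT JOIN and zero-safe division.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Signups (user_id INTEGER PRIMARY KEY, time_stamp TEXT);
CREATE TABLE Confirmations (user_id INTEGER, time_stamp TEXT, action TEXT);
INSERT INTO Signups VALUES (3,'2020-03-21 10:16:13'),(7,'2020-01-04 13:57:59'),
    (2,'2020-07-29 23:09:44'),(6,'2020-12-09 10:39:37');
INSERT INTO Confirmations VALUES
    (3,'2021-01-06 03:30:46','timeout'),(3,'2021-07-14 14:00:00','timeout'),
    (7,'2021-06-12 11:57:29','confirmed'),(7,'2021-06-13 12:58:28','confirmed'),
    (7,'2021-06-14 13:59:27','confirmed'),
    (2,'2021-01-22 00:00:00','confirmed'),(2,'2021-02-28 23:59:59','timeout');
""")
rates = dict(conn.execute("""
SELECT s.user_id,
       ROUND(COALESCE(
           CAST(SUM(CASE WHEN c.action = 'confirmed' THEN 1 ELSE 0 END) AS REAL)
           / NULLIF(COUNT(c.action), 0), 0), 2) AS confirmation_rate
FROM Signups s
LEFT JOIN Confirmations c ON s.user_id = c.user_id
GROUP BY s.user_id
""").fetchall())
print(rates)  # {2: 0.5, 3: 0.0, 6: 0.0, 7: 1.0}
```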

• Q.420
Question


Write an SQL query to find all possible pairs of users from the Signups table where the
user_id of the first user is less than the user_id of the second user. For each pair, return the
user_id of both users along with the difference in their signup timestamps in seconds. Only
include pairs where the time difference between their signups is less than 1 year.

Explanation
This problem requires:
• Cross Join: To generate all possible combinations of users from the Signups table.
• Time Difference Calculation: To calculate the time difference between the two users'
signup timestamps in seconds.
• Filtering: To include only those pairs where the time difference is less than 1 year.
• Condition on user_id: Ensure that the first user_id is less than the second to avoid
duplicate pairs (e.g., (user_1, user_2) and (user_2, user_1)).

Datasets
Table creation and sample data:
CREATE TABLE Signups (
user_id INT PRIMARY KEY,
time_stamp DATETIME
);

INSERT INTO Signups (user_id, time_stamp)
VALUES
(1, '2020-01-01 10:00:00'),
(2, '2020-06-15 15:30:00'),
(3, '2021-02-01 09:00:00'),
(4, '2021-10-10 12:00:00'),
(5, '2022-04-25 14:45:00');

Learnings
• Cross Join to generate all possible pairs of rows.
• Date Difference: Using TIMESTAMPDIFF() or equivalent functions to calculate the
difference between timestamps.
• Filtering pairs based on conditions.
• Order of user_id: Ensuring we only return pairs where user_id_1 < user_id_2.

Solutions
MySQL solution:
SELECT s1.user_id AS user_id_1,
       s2.user_id AS user_id_2,
       ABS(TIMESTAMPDIFF(SECOND, s1.time_stamp, s2.time_stamp)) AS time_diff_seconds
FROM Signups s1
CROSS JOIN Signups s2
WHERE s1.user_id < s2.user_id
  AND ABS(TIMESTAMPDIFF(SECOND, s1.time_stamp, s2.time_stamp)) < 60 * 60 * 24 * 365
ORDER BY user_id_1, user_id_2;

Explanation:
• We use a CROSS JOIN to generate all possible pairs of users from the Signups table.
• The TIMESTAMPDIFF() function calculates the difference between the two time_stamp values in seconds; ABS() makes the difference non-negative, since a lower user_id does not guarantee an earlier signup.
• The WHERE clause ensures:
• Only pairs where user_id_1 < user_id_2 are considered.
• Only pairs where the time difference is less than 1 year (365 days, expressed in seconds).
• The result is ordered by user_id_1 and user_id_2 to ensure consistent pairing.
PostgreSQL solution:
SELECT s1.user_id AS user_id_1,
       s2.user_id AS user_id_2,
       ABS(EXTRACT(EPOCH FROM s2.time_stamp - s1.time_stamp)) AS time_diff_seconds
FROM Signups s1
CROSS JOIN Signups s2
WHERE s1.user_id < s2.user_id
  AND ABS(EXTRACT(EPOCH FROM s2.time_stamp - s1.time_stamp)) < 60 * 60 * 24 * 365
ORDER BY user_id_1, user_id_2;

Explanation:
• In PostgreSQL, we use EXTRACT(EPOCH FROM ...) to calculate the time difference between the two timestamps in seconds, again wrapped in ABS() so the sign of the difference does not matter.
• The rest of the query structure is similar to MySQL, ensuring the correct ordering and filtering.
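SQLite has neither TIMESTAMPDIFF() nor EXTRACT(EPOCH ...), so strftime('%s', ...) converts each timestamp to Unix seconds instead; ABS() keeps the difference non-negative in case a lower user_id signed up later. With the sample data, the pairs (1,2), (2,3), (3,4) and (4,5) signed up less than a year apart.

```python
import sqlite3

# All user pairs whose signups are less than 365 days apart.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Signups (user_id INTEGER PRIMARY KEY, time_stamp TEXT)")
conn.executemany("INSERT INTO Signups VALUES (?, ?)", [
    (1,'2020-01-01 10:00:00'),(2,'2020-06-15 15:30:00'),(3,'2021-02-01 09:00:00'),
    (4,'2021-10-10 12:00:00'),(5,'2022-04-25 14:45:00')])
pairs = conn.execute("""
SELECT s1.user_id, s2.user_id
FROM Signups s1
CROSS JOIN Signups s2
WHERE s1.user_id < s2.user_id
  AND ABS(strftime('%s', s2.time_stamp) - strftime('%s', s1.time_stamp))
      < 60 * 60 * 24 * 365
ORDER BY s1.user_id, s2.user_id
""").fetchall()
print(pairs)  # [(1, 2), (2, 3), (3, 4), (4, 5)]
```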

Meta
• Q.421
Question
Write an SQL query to calculate the overall acceptance rate of friend requests. The
acceptance rate is the number of accepted requests divided by the total number of requests.
Return the result rounded to two decimal places.
• The FriendRequest table contains the ID of the user who sent the request, the ID of the
user who received the request, and the date of the request.
• The RequestAccepted table contains the ID of the user who sent the request, the ID of the
user who received the request, and the date when the request was accepted.
• If there are no requests at all, return an acceptance rate of 0.00.
Explanation
To solve this:
• We need to count the total number of unique requests in the FriendRequest table. This
involves counting distinct pairs of sender and receiver (i.e., sender_id and send_to_id).
• We need to count the total number of unique accepted requests from the RequestAccepted
table, which is again based on distinct sender and receiver pairs (i.e., requester_id and
accepter_id).
• The acceptance rate is the ratio of accepted requests to total requests, and we return this
value rounded to two decimal places.
• If no requests exist, we return 0.00.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE FriendRequest (
    sender_id INT,
    send_to_id INT,
    request_date DATE
);

CREATE TABLE RequestAccepted (
    requester_id INT,
    accepter_id INT,
    accept_date DATE
);

-- Sample data insertions for FriendRequest


INSERT INTO FriendRequest (sender_id, send_to_id, request_date)
VALUES
(1, 2, '2016-06-01'),
(1, 3, '2016-06-01'),
(1, 4, '2016-06-01'),
(2, 3, '2016-06-02'),
(3, 4, '2016-06-09');

-- Sample data insertions for RequestAccepted


INSERT INTO RequestAccepted (requester_id, accepter_id, accept_date)
VALUES
(1, 2, '2016-06-03'),
(1, 3, '2016-06-08'),
(2, 3, '2016-06-08'),
(3, 4, '2016-06-09'),
(3, 4, '2016-06-10');

Learnings
• COUNT(DISTINCT) to count unique combinations of sender and receiver, ensuring that
duplicates are not counted.
• JOIN between tables may be needed to match accepted requests and sent requests.
• ROUND() to round the final acceptance rate to two decimal places.
• Handling cases where there may be no requests at all by returning a default value of 0.00.
Solutions
• - MySQL solution
SELECT
    ROUND(
        IFNULL(
            (SELECT COUNT(DISTINCT requester_id, accepter_id)
             FROM RequestAccepted)
            /
            NULLIF(
                (SELECT COUNT(DISTINCT sender_id, send_to_id)
                 FROM FriendRequest), 0),
            0),
        2) AS accept_rate;

Explanation:
• COUNT(DISTINCT requester_id, accepter_id): This counts the number of unique accepted requests, considering only distinct pairs of requesters and accepters.
• COUNT(DISTINCT sender_id, send_to_id): This counts the number of unique requests in the FriendRequest table.
• NULLIF(..., 0) turns an empty FriendRequest count into NULL so the division cannot fail, and IFNULL() converts the resulting NULL into the required 0.00.
• The multi-column COUNT(DISTINCT ...) and IFNULL() are MySQL syntax; in PostgreSQL, use COUNT(DISTINCT (requester_id, accepter_id)) with the pair in parentheses and COALESCE() instead of IFNULL().
• ROUND(): The result is rounded to two decimal places to match the required output format.
This query returns the acceptance rate of requests, rounded to two decimal places, and handles the edge case where no requests or acceptances exist.
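The same logic can be run in SQLite, whose COUNT(DISTINCT ...) takes a single expression, so each pair is concatenated into one string here. With the sample data, 4 distinct accepted pairs divided by 5 distinct requests gives 0.8.

```python
import sqlite3

# Acceptance rate: distinct accepted pairs / distinct requested pairs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FriendRequest (sender_id INTEGER, send_to_id INTEGER, request_date TEXT);
CREATE TABLE RequestAccepted (requester_id INTEGER, accepter_id INTEGER, accept_date TEXT);
INSERT INTO FriendRequest VALUES (1,2,'2016-06-01'),(1,3,'2016-06-01'),
    (1,4,'2016-06-01'),(2,3,'2016-06-02'),(3,4,'2016-06-09');
INSERT INTO RequestAccepted VALUES (1,2,'2016-06-03'),(1,3,'2016-06-08'),
    (2,3,'2016-06-08'),(3,4,'2016-06-09'),(3,4,'2016-06-10');
""")
rate = conn.execute("""
SELECT ROUND(COALESCE(
    (SELECT COUNT(DISTINCT requester_id || ',' || accepter_id) FROM RequestAccepted) * 1.0
    / NULLIF((SELECT COUNT(DISTINCT sender_id || ',' || send_to_id) FROM FriendRequest), 0),
    0), 2) AS accept_rate
""").fetchone()[0]
print(rate)  # 0.8
```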


• Q.422
Question
Write an SQL query to find the number of users who signed up each month in 2020. Return
the result with the month and the corresponding count of users.

Explanation
You need to group the users by their signup month and count the number of signups in each
month for the year 2020. Use the MONTH() function to extract the month from the
time_stamp field and filter only for the year 2020.

Datasets
Table creation and sample data:
CREATE TABLE Signups (
user_id INT PRIMARY KEY,
time_stamp DATETIME
);

INSERT INTO Signups (user_id, time_stamp)
VALUES
(1, '2020-01-10 09:00:00'),
(2, '2020-02-15 10:30:00'),
(3, '2020-02-20 12:45:00'),
(4, '2020-03-03 14:00:00'),
(5, '2020-03-12 16:20:00'),
(6, '2020-05-25 11:30:00'),
(7, '2020-07-01 13:10:00'),
(8, '2020-09-05 08:20:00');

Learnings
• Use of MONTH() and YEAR() functions to extract date parts.
• Grouping by extracted date parts to aggregate data.
• Filtering data by specific year.

Solutions
MySQL solution:
SELECT MONTH(time_stamp) AS month, COUNT(user_id) AS user_count
FROM Signups
WHERE YEAR(time_stamp) = 2020
GROUP BY MONTH(time_stamp)
ORDER BY month;

PostgreSQL solution:
SELECT EXTRACT(MONTH FROM time_stamp) AS month, COUNT(user_id) AS user_count
FROM Signups
WHERE EXTRACT(YEAR FROM time_stamp) = 2020
GROUP BY month
ORDER BY month;
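An SQLite equivalent uses strftime() in place of MONTH()/YEAR(): the sample signups per month in 2020 are January 1, February 2, March 2, May 1, July 1 and September 1.

```python
import sqlite3

# Monthly signup counts for 2020.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Signups (user_id INTEGER PRIMARY KEY, time_stamp TEXT)")
conn.executemany("INSERT INTO Signups VALUES (?, ?)", [
    (1,'2020-01-10 09:00:00'),(2,'2020-02-15 10:30:00'),(3,'2020-02-20 12:45:00'),
    (4,'2020-03-03 14:00:00'),(5,'2020-03-12 16:20:00'),(6,'2020-05-25 11:30:00'),
    (7,'2020-07-01 13:10:00'),(8,'2020-09-05 08:20:00')])
per_month = dict(conn.execute("""
SELECT CAST(strftime('%m', time_stamp) AS INTEGER) AS month, COUNT(user_id)
FROM Signups
WHERE strftime('%Y', time_stamp) = '2020'
GROUP BY month
ORDER BY month
""").fetchall())
print(per_month)  # {1: 1, 2: 2, 3: 2, 5: 1, 7: 1, 9: 1}
```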
• Q.423


Question
Write an SQL query to find the top 3 departments with the highest average salary in the
Employees table. If two departments have the same average salary, return both in the same
rank.

Explanation
You need to calculate the average salary for each department and rank them accordingly. In
the case of ties (same average salary), you should return both departments with the same
rank.

Datasets
Table creation and sample data:
CREATE TABLE Employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO Employees (employee_id, name, department, salary)
VALUES
(1, 'John', 'Engineering', 100000),
(2, 'Jane', 'Engineering', 120000),
(3, 'Alice', 'Sales', 80000),
(4, 'Bob', 'Sales', 85000),
(5, 'Charlie', 'Engineering', 110000),
(6, 'David', 'HR', 75000),
(7, 'Eva', 'HR', 78000);

Learnings
• Using AVG() for aggregation and ranking.
• Use of RANK() or DENSE_RANK() for handling ties.
• Ordering the results based on the calculated average salary.

Solutions
MySQL solution:
WITH dept_avg AS (
    SELECT department, AVG(salary) AS avg_salary
    FROM Employees
    GROUP BY department
),
ranked AS (
    SELECT department, avg_salary,
           DENSE_RANK() OVER (ORDER BY avg_salary DESC) AS salary_rank
    FROM dept_avg
)
SELECT department, avg_salary
FROM ranked
WHERE salary_rank <= 3;

PostgreSQL solution:
WITH dept_avg AS (
    SELECT department, AVG(salary) AS avg_salary
    FROM Employees
    GROUP BY department
),
ranked AS (
    SELECT department, avg_salary,
           DENSE_RANK() OVER (ORDER BY avg_salary DESC) AS salary_rank
    FROM dept_avg
)
SELECT department, avg_salary
FROM ranked
WHERE salary_rank <= 3;

Unlike ORDER BY ... LIMIT 3, DENSE_RANK() keeps departments that tie on average salary in the same rank, as the question requires.
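A tie-safe DENSE_RANK() variant can be checked in SQLite (>= 3.25): the department averages are Engineering 110000, Sales 82500 and HR 76500, ranked 1 to 3 with no ties in this sample.

```python
import sqlite3

# Rank departments by average salary, keeping ties in the same rank.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (employee_id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL)")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)", [
    (1,'John','Engineering',100000),(2,'Jane','Engineering',120000),
    (3,'Alice','Sales',80000),(4,'Bob','Sales',85000),
    (5,'Charlie','Engineering',110000),(6,'David','HR',75000),(7,'Eva','HR',78000)])
rows = conn.execute("""
WITH dept_avg AS (
    SELECT department, AVG(salary) AS avg_salary
    FROM Employees
    GROUP BY department
),
ranked AS (
    SELECT department, avg_salary,
           DENSE_RANK() OVER (ORDER BY avg_salary DESC) AS salary_rank
    FROM dept_avg
)
SELECT department, avg_salary FROM ranked
WHERE salary_rank <= 3
ORDER BY avg_salary DESC
""").fetchall()
print(rows)  # [('Engineering', 110000.0), ('Sales', 82500.0), ('HR', 76500.0)]
```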
• Q.424


Question
Write an SQL query to find the top 5 users who logged in the most times in the last 30 days,
excluding users who did not log in during this period. Return the user_id and the total
number of logins for each user.

Explanation
To solve this:
• You need to count the number of logins for each user in the last 30 days.
• Exclude users who haven't logged in during the past 30 days.
• Sort by the number of logins and return the top 5 users.

Datasets
Table creation and sample data:
CREATE TABLE Logins (
user_id INT,
time_stamp DATETIME
);

INSERT INTO Logins (user_id, time_stamp)
VALUES
(1, '2023-12-01 10:00:00'),
(1, '2023-12-05 15:00:00'),
(2, '2023-12-10 08:00:00'),
(3, '2023-11-30 14:00:00'),
(3, '2023-12-10 13:30:00'),
(3, '2023-12-12 09:00:00'),
(4, '2023-12-15 11:45:00'),
(5, '2023-12-01 17:30:00'),
(6, '2023-12-02 14:00:00');

Learnings
• Filtering data based on a date range using CURDATE() and INTERVAL.
• Using COUNT() to aggregate login data.
• Sorting and limiting the result set to get the top N users.

Solutions
MySQL solution:
SELECT user_id, COUNT(*) AS login_count
FROM Logins
WHERE time_stamp >= CURDATE() - INTERVAL 30 DAY
GROUP BY user_id
ORDER BY login_count DESC
LIMIT 5;

PostgreSQL solution:
SELECT user_id, COUNT(*) AS login_count
FROM Logins
WHERE time_stamp >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id
ORDER BY login_count DESC
LIMIT 5;
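For a reproducible SQLite check, a fixed reference date ('2023-12-16', chosen here purely for illustration) replaces CURDATE(), which would make the result depend on when the query runs. In the 30 days before that date, user 3 has the most logins (3), followed by user 1 (2).

```python
import sqlite3

# Top-5 login counts in a fixed 30-day window ending 2023-12-16.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Logins (user_id INTEGER, time_stamp TEXT)")
conn.executemany("INSERT INTO Logins VALUES (?, ?)", [
    (1,'2023-12-01 10:00:00'),(1,'2023-12-05 15:00:00'),(2,'2023-12-10 08:00:00'),
    (3,'2023-11-30 14:00:00'),(3,'2023-12-10 13:30:00'),(3,'2023-12-12 09:00:00'),
    (4,'2023-12-15 11:45:00'),(5,'2023-12-01 17:30:00'),(6,'2023-12-02 14:00:00')])
rows = conn.execute("""
SELECT user_id, COUNT(*) AS login_count
FROM Logins
WHERE time_stamp >= date('2023-12-16', '-30 days')
GROUP BY user_id
ORDER BY login_count DESC
LIMIT 5
""").fetchall()
print(rows[0], rows[1])  # (3, 3) (1, 2)
```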
• Q.425
Question
Write an SQL query to find the total number of employees in each department. Return the
department name and the corresponding count of employees.

Explanation
You need to group the employees by their department and count the number of employees in
each department. This can be achieved using the GROUP BY clause.

Datasets
Table creation and sample data:
CREATE TABLE Employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50)
);

-- Sample data insertions


INSERT INTO Employees (employee_id, name, department)
VALUES
(1, 'John', 'HR'),
(2, 'Jane', 'Engineering'),
(3, 'Alice', 'HR'),
(4, 'Bob', 'Sales'),
(5, 'Charlie', 'Engineering'),
(6, 'David', 'HR'),
(7, 'Eva', 'Sales');

Learnings
• Use of GROUP BY to aggregate data by department.
• Counting rows for each group using COUNT().

Solutions
MySQL solution:
SELECT department, COUNT(*) AS employee_count
FROM Employees
GROUP BY department;

PostgreSQL solution:
SELECT department, COUNT(*) AS employee_count
FROM Employees
GROUP BY department;

Medium Question

Question
Write an SQL query to find all employees who have the same manager as at least one other
employee. Return their employee_id, name, and managerId.

Explanation
You need to identify employees who share the same managerId. This requires joining the
table with itself to compare employees’ managerId and filtering out those who don’t share it
with anyone.

Datasets
Table creation and sample data:
CREATE TABLE Employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
managerId INT
);

-- Sample data insertions


INSERT INTO Employees (employee_id, name, department, managerId)
VALUES
(1, 'John', 'HR', NULL),
(2, 'Jane', 'Engineering', 1),
(3, 'Alice', 'HR', 1),
(4, 'Bob', 'Sales', 1),
(5, 'Charlie', 'Engineering', 2),
(6, 'David', 'Sales', 3),
(7, 'Eva', 'HR', 1);

Learnings
• Self-joins to compare rows within the same table.
• Filtering results using conditions on joined tables.
• Handling NULL values for employees without managers.

Solutions
MySQL solution:
SELECT DISTINCT e1.employee_id, e1.name, e1.managerId
FROM Employees e1
JOIN Employees e2
ON e1.managerId = e2.managerId
WHERE e1.employee_id != e2.employee_id;

PostgreSQL solution:
SELECT DISTINCT e1.employee_id, e1.name, e1.managerId
FROM Employees e1
JOIN Employees e2
ON e1.managerId = e2.managerId
WHERE e1.employee_id != e2.employee_id;

Hard Question

Question
Write an SQL query to find the employees who have been in the company the longest and the
shortest. Return their employee_id, name, department, and hire_date.

Explanation
• You need to calculate the employees with the maximum and minimum hire_date (earliest
and latest hires).
• Use MIN() and MAX() to find the corresponding dates and then join them with the
employees table to get their details.

Datasets
Table creation and sample data:
CREATE TABLE Employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
hire_date DATE
);

-- Sample data insertions


INSERT INTO Employees (employee_id, name, department, hire_date)
VALUES
(1, 'John', 'HR', '2015-02-12'),
(2, 'Jane', 'Engineering', '2020-03-01'),
(3, 'Alice', 'Sales', '2018-08-15'),
(4, 'Bob', 'Engineering', '2021-05-20'),
(5, 'Charlie', 'HR', '2014-06-10'),
(6, 'David', 'Sales', '2017-01-25');

Learnings
• Use of MIN() and MAX() functions to find the earliest and latest dates.
• Joining the table with the result of MIN() and MAX() to get full details.
• Handling hire_date as a date type for comparison.

Solutions
MySQL solution:
SELECT employee_id, name, department, hire_date
FROM Employees
WHERE hire_date = (SELECT MIN(hire_date) FROM Employees)
OR hire_date = (SELECT MAX(hire_date) FROM Employees);

PostgreSQL solution:
SELECT employee_id, name, department, hire_date
FROM Employees
WHERE hire_date = (SELECT MIN(hire_date) FROM Employees)
OR hire_date = (SELECT MAX(hire_date) FROM Employees);

• Q.426
Question
Write an SQL query to find the department with the highest average salary. If multiple
departments have the same highest average salary, return all of them.

Explanation
You need to calculate the average salary for each department, and then select the
department(s) with the highest average salary. To handle ties, you can use a subquery or a
HAVING clause.

Datasets
Table creation and sample data:
CREATE TABLE Employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10, 2)
);

INSERT INTO Employees (employee_id, name, department, salary)
VALUES
(1, 'John', 'HR', 50000),
(2, 'Jane', 'Engineering', 120000),
(3, 'Alice', 'HR', 55000),
(4, 'Bob', 'Sales', 70000),
(5, 'Charlie', 'Engineering', 115000),
(6, 'David', 'Sales', 80000),
(7, 'Eva', 'HR', 53000);

Learnings
• Use of AVG() for aggregating salary data by department.
• Sorting and filtering to find the department with the highest average salary.
• Handling ties in aggregate functions.

Solutions
MySQL solution:
SELECT department, AVG(salary) AS avg_salary
FROM Employees
GROUP BY department
HAVING AVG(salary) = (
    SELECT MAX(avg_sal)
    FROM (SELECT AVG(salary) AS avg_sal FROM Employees GROUP BY department) AS dept_avgs
);

PostgreSQL solution:
SELECT department, AVG(salary) AS avg_salary
FROM Employees
GROUP BY department
HAVING AVG(salary) = (
    SELECT MAX(avg_sal)
    FROM (SELECT AVG(salary) AS avg_sal FROM Employees GROUP BY department) AS dept_avgs
);
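Note that aggregate functions cannot be nested directly (MAX(AVG(...)) is invalid), which is why the maximum of the per-department averages comes from a derived table. A quick illustrative check of this pattern in SQLite via Python's sqlite3 module (an adaptation for testing, not part of the book's solution):

```python
import sqlite3

# In-memory database with the sample Employees data from this question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (employee_id INT, name TEXT, department TEXT, salary REAL);
INSERT INTO Employees VALUES
  (1,'John','HR',50000),(2,'Jane','Engineering',120000),(3,'Alice','HR',55000),
  (4,'Bob','Sales',70000),(5,'Charlie','Engineering',115000),(6,'David','Sales',80000),
  (7,'Eva','HR',53000);
""")
# HAVING compares each department's average against the max of all
# per-department averages, computed in a derived table.
rows = conn.execute("""
SELECT department, AVG(salary) AS avg_salary
FROM Employees
GROUP BY department
HAVING AVG(salary) = (
    SELECT MAX(avg_sal)
    FROM (SELECT AVG(salary) AS avg_sal FROM Employees GROUP BY department) AS dept_avgs
);
""").fetchall()
print(rows)  # [('Engineering', 117500.0)]
```

If two departments tied at the top average, both would appear, which is exactly the behavior the question asks for.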


• Q.427
Question
Write an SQL query to find the top 3 most frequent departments among employees who earn
more than $80,000. Return the department and the count of employees in each department.

Explanation
You need to filter employees earning more than $80,000, group them by department, and
count the number of employees in each department. Afterward, you need to sort the result
and limit it to the top 3 departments based on the number of employees.

Datasets
Table creation and sample data:
CREATE TABLE Employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10, 2)
);

-- Sample data insertions


INSERT INTO Employees (employee_id, name, department, salary)
VALUES
(1, 'John', 'HR', 50000),
(2, 'Jane', 'Engineering', 120000),
(3, 'Alice', 'HR', 55000),
(4, 'Bob', 'Sales', 70000),
(5, 'Charlie', 'Engineering', 115000),
(6, 'David', 'Sales', 85000),
(7, 'Eva', 'HR', 53000),
(8, 'Grace', 'Engineering', 95000),
(9, 'Henry', 'Sales', 90000);

Learnings
• Filtering data based on salary condition.
• Grouping and counting the number of employees within each department.
• Sorting results to find the top 3 departments.

Solutions
MySQL solution:
SELECT department, COUNT(*) AS department_count
FROM Employees
WHERE salary > 80000
GROUP BY department
ORDER BY department_count DESC
LIMIT 3;

PostgreSQL solution:
SELECT department, COUNT(*) AS department_count
FROM Employees
WHERE salary > 80000
GROUP BY department
ORDER BY department_count DESC
LIMIT 3;
• Q.428
Question
Write an SQL query to find the average salary of employees for each department, excluding
the top 1 highest salary in each department from the calculation.

Explanation
To solve this, you need to:
• Find the highest salary in each department.
• Exclude that salary and calculate the average salary for the remaining employees in the
same department.

Datasets
Table creation and sample data:
CREATE TABLE Employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10, 2)
);

INSERT INTO Employees (employee_id, name, department, salary)
VALUES
(1, 'John', 'HR', 50000),
(2, 'Jane', 'Engineering', 120000),
(3, 'Alice', 'HR', 55000),
(4, 'Bob', 'Sales', 70000),
(5, 'Charlie', 'Engineering', 115000),
(6, 'David', 'Sales', 80000),
(7, 'Eva', 'HR', 53000),
(8, 'Grace', 'Engineering', 95000),
(9, 'Henry', 'Sales', 90000);

Learnings
• Use of MAX() to find the highest salary in each department.
• Subquery filtering to exclude top salaries.
• Calculating the average salary for the remaining employees in each department.

Solutions
MySQL solution:
SELECT department, AVG(salary) AS avg_salary
FROM Employees
WHERE salary < (SELECT MAX(salary) FROM Employees e2 WHERE e2.department = Employees.department)
GROUP BY department;

PostgreSQL solution:
SELECT department, AVG(salary) AS avg_salary
FROM Employees
WHERE salary < (SELECT MAX(salary) FROM Employees e2 WHERE e2.department = Employees.department)
GROUP BY department;
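The correlated subquery pattern can be sanity-checked in SQLite via Python's sqlite3 module (an illustrative adaptation: an explicit alias e1 is used for the outer table, which also works in MySQL and PostgreSQL):

```python
import sqlite3

# In-memory database with the sample Employees data from this question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (employee_id INT, name TEXT, department TEXT, salary REAL);
INSERT INTO Employees VALUES
  (1,'John','HR',50000),(2,'Jane','Engineering',120000),(3,'Alice','HR',55000),
  (4,'Bob','Sales',70000),(5,'Charlie','Engineering',115000),(6,'David','Sales',80000),
  (7,'Eva','HR',53000),(8,'Grace','Engineering',95000),(9,'Henry','Sales',90000);
""")
# Each row is kept only if its salary is strictly below its department's max,
# so the top earner per department is excluded from the average.
rows = conn.execute("""
SELECT e1.department, AVG(e1.salary) AS avg_salary
FROM Employees e1
WHERE e1.salary < (SELECT MAX(salary) FROM Employees e2
                   WHERE e2.department = e1.department)
GROUP BY e1.department
ORDER BY e1.department;
""").fetchall()
print(rows)  # [('Engineering', 105000.0), ('HR', 51500.0), ('Sales', 75000.0)]
```

One edge case worth knowing: if two employees tie for a department's top salary, the strict `<` comparison excludes both of them from the average.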
• Q.429

Question
Write an SQL query to classify employees into three categories based on their salary:
• "Low" for salary less than $60,000,
• "Medium" for salary between $60,000 and $100,000, and
• "High" for salary greater than $100,000.
Return the employee_id, name, and the salary classification.

Explanation
You need to use the CASE statement to create a salary classification for each employee. The
CASE statement helps to create conditional logic directly in the SQL query.

Datasets
Table creation and sample data:
CREATE TABLE Employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
salary DECIMAL(10, 2)
);

INSERT INTO Employees (employee_id, name, salary)
VALUES
(1, 'John', 55000),
(2, 'Jane', 120000),
(3, 'Alice', 95000),
(4, 'Bob', 45000),
(5, 'Charlie', 110000);

Learnings
• Use of CASE to implement conditional logic in SQL.
• Categorizing data into groups based on numeric conditions.

Solutions
MySQL solution:
SELECT employee_id, name,
CASE
WHEN salary < 60000 THEN 'Low'
WHEN salary BETWEEN 60000 AND 100000 THEN 'Medium'
ELSE 'High'
END AS salary_classification
FROM Employees;

PostgreSQL solution:
SELECT employee_id, name,
CASE
WHEN salary < 60000 THEN 'Low'
WHEN salary BETWEEN 60000 AND 100000 THEN 'Medium'
ELSE 'High'
END AS salary_classification
FROM Employees;
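The CASE classification can be checked quickly in SQLite via Python's sqlite3 module (an illustrative adaptation of the query above to the sample data):

```python
import sqlite3

# In-memory database with this question's sample employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (employee_id INT, name TEXT, salary REAL);
INSERT INTO Employees VALUES
  (1,'John',55000),(2,'Jane',120000),(3,'Alice',95000),
  (4,'Bob',45000),(5,'Charlie',110000);
""")
# CASE branches are evaluated in order; BETWEEN is inclusive on both ends,
# so exactly 60000 and exactly 100000 both fall into 'Medium'.
rows = conn.execute("""
SELECT employee_id, name,
       CASE WHEN salary < 60000 THEN 'Low'
            WHEN salary BETWEEN 60000 AND 100000 THEN 'Medium'
            ELSE 'High' END AS salary_classification
FROM Employees
ORDER BY employee_id;
""").fetchall()
print(rows)
```

Because the branches are tried top to bottom, the order of the WHEN clauses matters whenever the ranges could overlap.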

• Q.430
Question
Write an SQL query to calculate the total number of "likes" and classify them based on how
many likes each post has received. Classify as:
• "Low Engagement" for less than 50 likes,
• "Medium Engagement" for between 50 and 200 likes,
• "High Engagement" for more than 200 likes.
Return the post_id, likes_count, and the engagement classification.

Explanation
This query involves the use of the CASE statement to categorize the number of likes a post
received into "Low", "Medium", or "High" based on predefined thresholds.

Datasets
Table creation and sample data:
CREATE TABLE InstagramPosts (
post_id INT PRIMARY KEY,
likes_count INT
);

INSERT INTO InstagramPosts (post_id, likes_count)
VALUES
(1, 30),
(2, 150),
(3, 300),
(4, 75),
(5, 10);

Learnings
• Applying CASE to categorize data into ranges.
• Aggregation using simple classification.

Solutions
MySQL solution:
SELECT post_id, likes_count,
CASE
WHEN likes_count < 50 THEN 'Low Engagement'
WHEN likes_count BETWEEN 50 AND 200 THEN 'Medium Engagement'
ELSE 'High Engagement'
END AS engagement_classification
FROM InstagramPosts;

PostgreSQL solution:
SELECT post_id, likes_count,
CASE
WHEN likes_count < 50 THEN 'Low Engagement'
WHEN likes_count BETWEEN 50 AND 200 THEN 'Medium Engagement'
ELSE 'High Engagement'
END AS engagement_classification
FROM InstagramPosts;
• Q.431
Question
Write an SQL query to calculate the number of users who either:
• "Followed" a post, based on the action follow,
• "Unfollowed" a post, based on the action unfollow.
Use the action field in the InstagramActions table to classify each action. Additionally, if
a user follows a post but hasn't unfollowed it, mark it as a "Net Follower" in a new column.
Return the post_id, follow_count, unfollow_count, and net_follower_count.

Explanation
• You need to classify the actions based on the action field.
• Then, you calculate the count of follows and unfollows.
• Use the CASE statement to classify users who followed a post but haven't unfollowed it, and
calculate the net number of followers.

Datasets
Table creation and sample data:
CREATE TABLE InstagramActions (
action_id INT PRIMARY KEY,
user_id INT,
post_id INT,
action ENUM('follow', 'unfollow'),
time_stamp DATETIME
);

-- Sample data insertions


INSERT INTO InstagramActions (action_id, user_id, post_id, action, time_stamp)
VALUES
(1, 101, 1, 'follow', '2021-08-01 10:00:00'),
(2, 102, 1, 'follow', '2021-08-01 11:00:00'),
(3, 103, 1, 'unfollow', '2021-08-02 12:00:00'),
(4, 101, 2, 'follow', '2021-08-01 13:00:00'),
(5, 102, 2, 'unfollow', '2021-08-02 14:00:00'),
(6, 104, 1, 'follow', '2021-08-03 15:00:00');

Learnings
• Using CASE to classify and filter based on action types.
• Aggregation and conditional counts.
• Handling net values for follow/unfollow actions.

Solutions
MySQL solution:
SELECT post_id,
SUM(CASE WHEN action = 'follow' THEN 1 ELSE 0 END) AS follow_count,
SUM(CASE WHEN action = 'unfollow' THEN 1 ELSE 0 END) AS unfollow_count,
SUM(CASE WHEN action = 'follow' THEN 1 ELSE 0 END) -
SUM(CASE WHEN action = 'unfollow' THEN 1 ELSE 0 END) AS net_follower_count
FROM InstagramActions
GROUP BY post_id;

PostgreSQL solution:
SELECT post_id,
SUM(CASE WHEN action = 'follow' THEN 1 ELSE 0 END) AS follow_count,
SUM(CASE WHEN action = 'unfollow' THEN 1 ELSE 0 END) AS unfollow_count,
SUM(CASE WHEN action = 'follow' THEN 1 ELSE 0 END) -
SUM(CASE WHEN action = 'unfollow' THEN 1 ELSE 0 END) AS net_follower_count
FROM InstagramActions
GROUP BY post_id;
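This SUM(CASE ...) conditional-aggregation pattern can be verified in SQLite via Python's sqlite3 module (an illustrative adaptation: SQLite has no ENUM type, so the action column is plain TEXT here):

```python
import sqlite3

# In-memory database with this question's sample actions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE InstagramActions (
    action_id INT PRIMARY KEY, user_id INT, post_id INT,
    action TEXT, time_stamp TEXT);
INSERT INTO InstagramActions VALUES
  (1, 101, 1, 'follow',   '2021-08-01 10:00:00'),
  (2, 102, 1, 'follow',   '2021-08-01 11:00:00'),
  (3, 103, 1, 'unfollow', '2021-08-02 12:00:00'),
  (4, 101, 2, 'follow',   '2021-08-01 13:00:00'),
  (5, 102, 2, 'unfollow', '2021-08-02 14:00:00'),
  (6, 104, 1, 'follow',   '2021-08-03 15:00:00');
""")
# Each SUM(CASE ...) counts one action type; their difference is the net.
rows = conn.execute("""
SELECT post_id,
       SUM(CASE WHEN action = 'follow' THEN 1 ELSE 0 END) AS follow_count,
       SUM(CASE WHEN action = 'unfollow' THEN 1 ELSE 0 END) AS unfollow_count,
       SUM(CASE WHEN action = 'follow' THEN 1 ELSE 0 END) -
       SUM(CASE WHEN action = 'unfollow' THEN 1 ELSE 0 END) AS net_follower_count
FROM InstagramActions
GROUP BY post_id
ORDER BY post_id;
""").fetchall()
print(rows)  # [(1, 3, 1, 2), (2, 1, 1, 0)]
```

Conditional aggregation like this is the standard way to pivot a single categorical column into several counters in one pass over the table.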
• Q.432
Question
Write an SQL query to identify the users who have commented about "violence" in their last
3 comments, based on the presence of specific keywords related to violence such as
'violence', 'attack', 'fight', 'war', or 'blood'. The comment table contains a column
comment_text where comments are stored.
Return the user_id and the comment_id of the relevant comments.

Explanation
You need to:
• Filter the comments that contain certain violence-related keywords using LIKE or REGEXP
in SQL.
• Identify the last 3 comments for each user.
• Check if any of the last 3 comments contain violence-related words.
• Return the user_id and comment_id of the comments that match the criteria.

Datasets
Table creation and sample data:
CREATE TABLE Comments (
comment_id INT PRIMARY KEY,
user_id INT,
comment_text TEXT,
time_stamp DATETIME
);

-- Sample data insertions


INSERT INTO Comments (comment_id, user_id, comment_text, time_stamp)
VALUES
(1, 101, 'The attack on the city was brutal and violent', '2021-08-01 10:00:00'),
(2, 101, 'I watched a bloody war on the news', '2021-08-02 12:00:00'),
(3, 101, 'This fight is going too far, people are getting hurt', '2021-08-03 14:00:00'),
(4, 102, 'The peaceful march was inspiring', '2021-08-01 15:00:00'),
(5, 102, 'Such a violent outburst in the city center', '2021-08-02 16:00:00'),
(6, 103, 'The new movie was amazing, no violence', '2021-08-02 17:00:00'),
(7, 103, 'Blood spilled in the streets during the riots', '2021-08-03 18:00:00'),
(8, 103, 'Fighting over the issue is senseless', '2021-08-04 19:00:00'),
(9, 104, 'Nothing violent about this situation', '2021-08-04 20:00:00');

Learnings
• Use of LIKE or REGEXP for pattern matching in comments.
• Handling multiple conditions to check for keywords related to violence.
• Ordering and filtering to select the last 3 comments per user.
• Combining text analysis with SQL filtering.

Solutions
MySQL solution:
SELECT user_id, comment_id
FROM (
    SELECT user_id, comment_id, comment_text,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY time_stamp DESC) AS rn
    FROM Comments
) AS ranked
WHERE rn <= 3
  AND (comment_text LIKE '%violence%' OR comment_text LIKE '%attack%'
       OR comment_text LIKE '%fight%' OR comment_text LIKE '%war%'
       OR comment_text LIKE '%blood%');

PostgreSQL solution:
SELECT user_id, comment_id
FROM (
    SELECT user_id, comment_id, comment_text,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY time_stamp DESC) AS rn
    FROM Comments
) AS ranked
WHERE rn <= 3
  AND comment_text ~* 'violence|attack|fight|war|blood';

Notes
• The ROW_NUMBER() function is used to order the comments per user by timestamp and then
select the last 3 comments for each user.
• LIKE or REGEXP is used to match specific words related to violence in the comments. For
MySQL, LIKE is used, and for PostgreSQL, ~* (case-insensitive regular expression match) is
used.
• This query assumes that the comments table has enough data and that the comment_text
contains the necessary information to identify violence-related content.
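A small SQLite check of the rank-then-filter logic via Python's sqlite3 module (an illustrative adaptation: SQLite has no built-in REGEXP, so the LIKE version is used; SQLite's LIKE is case-insensitive for ASCII, matching MySQL's default behavior):

```python
import sqlite3

# In-memory database with this question's sample comments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Comments (comment_id INT PRIMARY KEY, user_id INT,
                       comment_text TEXT, time_stamp TEXT);
INSERT INTO Comments VALUES
  (1, 101, 'The attack on the city was brutal and violent', '2021-08-01 10:00:00'),
  (2, 101, 'I watched a bloody war on the news', '2021-08-02 12:00:00'),
  (3, 101, 'This fight is going too far, people are getting hurt', '2021-08-03 14:00:00'),
  (4, 102, 'The peaceful march was inspiring', '2021-08-01 15:00:00'),
  (5, 102, 'Such a violent outburst in the city center', '2021-08-02 16:00:00'),
  (6, 103, 'The new movie was amazing, no violence', '2021-08-02 17:00:00'),
  (7, 103, 'Blood spilled in the streets during the riots', '2021-08-03 18:00:00'),
  (8, 103, 'Fighting over the issue is senseless', '2021-08-04 19:00:00'),
  (9, 104, 'Nothing violent about this situation', '2021-08-04 20:00:00');
""")
# Rank every comment per user first, then filter: "last 3 comments" means
# the 3 most recent comments overall, not the 3 most recent matches.
rows = conn.execute("""
SELECT user_id, comment_id
FROM (
    SELECT user_id, comment_id, comment_text,
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY time_stamp DESC) AS rn
    FROM Comments
) AS ranked
WHERE rn <= 3
  AND (comment_text LIKE '%violence%' OR comment_text LIKE '%attack%'
       OR comment_text LIKE '%fight%' OR comment_text LIKE '%war%'
       OR comment_text LIKE '%blood%')
ORDER BY user_id, comment_id;
""").fetchall()
print(rows)  # [(101, 1), (101, 2), (101, 3), (103, 6), (103, 7), (103, 8)]
```

The output also illustrates the limits of naive keyword matching: user 102's "violent outburst" is missed because 'violent' is not a substring match for '%violence%', while user 103's "no violence" is a false positive.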
• Q.433
Question
Write an SQL query to combine WhatsApp group chat messages and individual chat
messages into a single result set. The GroupChats table stores messages from group chats,
and the IndividualChats table stores messages from one-on-one conversations.

• Return the user_id, chat_type, and message for each message.
• The chat_type should be labeled as "Group" for group messages and "Individual" for
individual messages.
• Combine both tables using UNION to create a unified list of messages.

Explanation
You need to use UNION to combine results from two different tables, ensuring you select the
necessary columns from both tables and label them accordingly. This query will return all
messages with a label to identify if they are from a group chat or individual chat.

Datasets
Table creation and sample data:
CREATE TABLE GroupChats (
message_id INT PRIMARY KEY,
group_id INT,
user_id INT,
message TEXT,
time_stamp DATETIME
);

CREATE TABLE IndividualChats (
message_id INT PRIMARY KEY,
sender_id INT,
receiver_id INT,
message TEXT,
time_stamp DATETIME
);

INSERT INTO GroupChats (message_id, group_id, user_id, message, time_stamp)
VALUES
(1, 1001, 101, 'Hello everyone, how are you?', '2021-08-01 10:00:00'),
(2, 1001, 102, 'I am doing well, thanks!', '2021-08-01 10:05:00'),
(3, 1002, 103, 'Is anyone free for a meeting?', '2021-08-01 11:00:00');

INSERT INTO IndividualChats (message_id, sender_id, receiver_id, message, time_stamp)
VALUES
(1, 101, 102, 'Hi, are you available for a call?', '2021-08-01 09:00:00'),
(2, 102, 101, 'Yes, I am free now!', '2021-08-01 09:05:00'),
(3, 103, 101, 'Hey, when can we meet?', '2021-08-01 12:00:00');

Learnings
• Using UNION to combine data from multiple tables.
• Labeling the type of chat by adding a static value in the SELECT statement.
• Handling both group and individual messages in a single query.

Solutions
MySQL solution:
SELECT user_id, 'Group' AS chat_type, message, time_stamp
FROM GroupChats
UNION
SELECT sender_id AS user_id, 'Individual' AS chat_type, message, time_stamp
FROM IndividualChats
ORDER BY time_stamp;

PostgreSQL solution:
SELECT user_id, 'Group' AS chat_type, message, time_stamp
FROM GroupChats
UNION
SELECT sender_id AS user_id, 'Individual' AS chat_type, message, time_stamp
FROM IndividualChats
ORDER BY time_stamp;
• Q.434
Question
Write an SQL query to combine incoming and outgoing messages in WhatsApp chat logs into
one result set. The IncomingMessages table stores messages received by users, and the
OutgoingMessages table stores messages sent by users.

• Return the user_id, message_type (either "Incoming" or "Outgoing"), and the message.
• Combine both tables using UNION and display the result ordered by timestamp.

Explanation
You need to combine the incoming and outgoing messages using UNION. Additionally, label
the messages as either "Incoming" or "Outgoing" using a SELECT with static values.

Datasets
Table creation and sample data:
CREATE TABLE IncomingMessages (
message_id INT PRIMARY KEY,
user_id INT,
message TEXT,
time_stamp DATETIME
);

CREATE TABLE OutgoingMessages (
message_id INT PRIMARY KEY,
user_id INT,
message TEXT,
time_stamp DATETIME
);

-- Sample data insertions


INSERT INTO IncomingMessages (message_id, user_id, message, time_stamp)
VALUES
(1, 101, 'Hey, how are you?', '2021-08-01 10:00:00'),
(2, 102, 'Can we schedule a meeting for tomorrow?', '2021-08-01 10:15:00');

INSERT INTO OutgoingMessages (message_id, user_id, message, time_stamp)
VALUES
(1, 101, 'I am good, thanks!', '2021-08-01 10:05:00'),
(2, 103, 'Sure, I will send a calendar invite', '2021-08-01 10:20:00');

Learnings
• Using UNION to merge data from different sources (incoming vs. outgoing messages).
• Adding static labels to each query to differentiate between message types.
• Ensuring the correct order of the messages by timestamp.

Solutions
MySQL solution:
SELECT user_id, 'Incoming' AS message_type, message, time_stamp
FROM IncomingMessages
UNION
SELECT user_id, 'Outgoing' AS message_type, message, time_stamp
FROM OutgoingMessages
ORDER BY time_stamp;

PostgreSQL solution:
SELECT user_id, 'Incoming' AS message_type, message, time_stamp
FROM IncomingMessages
UNION
SELECT user_id, 'Outgoing' AS message_type, message, time_stamp
FROM OutgoingMessages
ORDER BY time_stamp;

Notes
• Both queries use UNION to combine messages from two different sources.
• The message_type column is added to label whether the message is incoming or outgoing.
• The query results are ordered by the time_stamp to ensure the messages are returned in
chronological order.
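One detail worth stressing: when ordering a compound (UNION) query, the sort column must appear in each SELECT list, because the ORDER BY applies to the combined result set, not to the individual tables. A quick illustrative check in SQLite via Python's sqlite3 module:

```python
import sqlite3

# In-memory database with this question's sample messages.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE IncomingMessages (message_id INT PRIMARY KEY, user_id INT,
                               message TEXT, time_stamp TEXT);
CREATE TABLE OutgoingMessages (message_id INT PRIMARY KEY, user_id INT,
                               message TEXT, time_stamp TEXT);
INSERT INTO IncomingMessages VALUES
  (1, 101, 'Hey, how are you?', '2021-08-01 10:00:00'),
  (2, 102, 'Can we schedule a meeting for tomorrow?', '2021-08-01 10:15:00');
INSERT INTO OutgoingMessages VALUES
  (1, 101, 'I am good, thanks!', '2021-08-01 10:05:00'),
  (2, 103, 'Sure, I will send a calendar invite', '2021-08-01 10:20:00');
""")
# time_stamp is selected in both branches so ORDER BY can see it.
rows = conn.execute("""
SELECT user_id, 'Incoming' AS message_type, message, time_stamp
FROM IncomingMessages
UNION
SELECT user_id, 'Outgoing' AS message_type, message, time_stamp
FROM OutgoingMessages
ORDER BY time_stamp;
""").fetchall()
print([(r[0], r[1]) for r in rows])
```

Dropping time_stamp from the SELECT lists makes the ORDER BY fail in SQLite, MySQL, and PostgreSQL alike, since the sort column is no longer part of the combined result.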
• Q.435
Question
Write an SQL query to find the prices of all products on 2019-08-16. Assume the price of all
products before any price change is 10.
Return the product_id and the price of each product as of 2019-08-16.

Explanation
You need to:
• For each product, find the most recent price change that occurred on or before 2019-08-
16.
• If no price change is found for a product before or on this date, the price will be assumed to
be the default value of 10.

Datasets
Table creation and sample data:
CREATE TABLE Products (
product_id INT,
new_price INT,
change_date DATE,
PRIMARY KEY (product_id, change_date)
);

INSERT INTO Products (product_id, new_price, change_date)
VALUES
(1, 20, '2019-08-14'),
(2, 50, '2019-08-14'),
(1, 30, '2019-08-15'),
(1, 35, '2019-08-16'),
(2, 65, '2019-08-17'),
(3, 20, '2019-08-18');

Learnings
• Using LEFT JOIN to handle products with no price change before the specified date.
• Using MAX() and GROUP BY to find the most recent price change before or on the given
date.
• Handling default values when no data is available for certain conditions.

Solutions
MySQL solution:
SELECT p.product_id,
       COALESCE((SELECT pr.new_price
                 FROM Products pr
                 WHERE pr.product_id = p.product_id
                   AND pr.change_date <= '2019-08-16'
                 ORDER BY pr.change_date DESC
                 LIMIT 1), 10) AS price
FROM (SELECT DISTINCT product_id FROM Products) p;

PostgreSQL solution:
SELECT p.product_id,
       COALESCE((SELECT pr.new_price
                 FROM Products pr
                 WHERE pr.product_id = p.product_id
                   AND pr.change_date <= '2019-08-16'
                 ORDER BY pr.change_date DESC
                 LIMIT 1), 10) AS price
FROM (SELECT DISTINCT product_id FROM Products) p;

Notes
• The derived table of DISTINCT product_id values ensures every product appears in the result, even products with no price change by 2019-08-16.
• The correlated subquery returns the price from the latest change_date on or before 2019-08-16 (ORDER BY change_date DESC LIMIT 1). Taking MAX(new_price) instead would return the highest price up to that date, which is not necessarily the most recent one.
• COALESCE supplies the default price of 10 when a product has no price change on or before the target date; in MySQL, IFNULL works the same way here.
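One way to sanity-check the as-of-date lookup is with SQLite via Python's sqlite3 module, using a correlated subquery that orders by change_date and takes the top row (an illustrative adaptation, not the book's original listing):

```python
import sqlite3

# In-memory database with this question's sample price changes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (product_id INT, new_price INT, change_date TEXT,
                       PRIMARY KEY (product_id, change_date));
INSERT INTO Products VALUES
  (1, 20, '2019-08-14'), (2, 50, '2019-08-14'), (1, 30, '2019-08-15'),
  (1, 35, '2019-08-16'), (2, 65, '2019-08-17'), (3, 20, '2019-08-18');
""")
# For each product, take the price at its latest change on or before the
# target date; COALESCE falls back to the default price of 10.
rows = conn.execute("""
SELECT p.product_id,
       COALESCE((SELECT pr.new_price
                 FROM Products pr
                 WHERE pr.product_id = p.product_id
                   AND pr.change_date <= '2019-08-16'
                 ORDER BY pr.change_date DESC
                 LIMIT 1), 10) AS price
FROM (SELECT DISTINCT product_id FROM Products) p
ORDER BY p.product_id;
""").fetchall()
print(rows)  # [(1, 35), (2, 50), (3, 10)]
```

Product 2's change to 65 on 2019-08-17 is correctly ignored because it happens after the target date, and product 3 falls back to the default of 10.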
• Q.436
Question
Write an SQL query to find the name of the candidate who received the most votes. In case of
a tie, return the candidate with the lowest id among the tied candidates.

Explanation

• You need to count the votes for each candidate from the Vote table, grouped by
CandidateId.
• You should then join this result with the Candidate table to get the name of the candidate.
• Finally, return the candidate name who received the most votes.

Datasets
Table creation and sample data:
CREATE TABLE Candidate (
id INT PRIMARY KEY AUTO_INCREMENT,
Name VARCHAR(100)
);

-- Sample data insertions for Candidate table


INSERT INTO Candidate (id, Name)
VALUES
(1, 'A'),
(2, 'B'),
(3, 'C'),
(4, 'D'),
(5, 'E');

CREATE TABLE Vote (
id INT PRIMARY KEY AUTO_INCREMENT,
CandidateId INT,
FOREIGN KEY (CandidateId) REFERENCES Candidate(id)
);

-- Sample data insertions for Vote table


INSERT INTO Vote (id, CandidateId)
VALUES
(1, 2),
(2, 4),
(3, 3),
(4, 2),
(5, 5);

Learnings
• Counting rows in a table using COUNT().
• Grouping results using GROUP BY.
• Handling ties and returning the candidate with the lowest id using ORDER BY and LIMIT.

Solutions
MySQL solution:
SELECT c.Name
FROM Candidate c
JOIN (
SELECT CandidateId, COUNT(*) AS vote_count
FROM Vote
GROUP BY CandidateId
) v ON c.id = v.CandidateId
ORDER BY v.vote_count DESC, c.id ASC
LIMIT 1;

PostgreSQL solution:
SELECT c.Name
FROM Candidate c
JOIN (
SELECT CandidateId, COUNT(*) AS vote_count
FROM Vote
GROUP BY CandidateId
) v ON c.id = v.CandidateId
ORDER BY v.vote_count DESC, c.id ASC
LIMIT 1;

Notes
• The subquery calculates the total number of votes each candidate has received by using
COUNT(*) and grouping by CandidateId.
• The outer query joins the Candidate table with the result of the subquery to fetch the
candidate's name.
• The query orders by vote_count in descending order to find the candidate with the most
votes.
• In case of a tie (multiple candidates with the same highest vote count), the query orders by
id in ascending order to select the candidate with the lowest id.
• Finally, LIMIT 1 ensures that only the top candidate is returned.
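The vote-count-with-tie-break pattern can be checked in SQLite via Python's sqlite3 module (an illustrative adaptation: AUTO_INCREMENT becomes SQLite's INTEGER PRIMARY KEY):

```python
import sqlite3

# In-memory database with this question's sample candidates and votes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Candidate (id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Vote (id INTEGER PRIMARY KEY, CandidateId INT);
INSERT INTO Candidate VALUES (1,'A'),(2,'B'),(3,'C'),(4,'D'),(5,'E');
INSERT INTO Vote VALUES (1,2),(2,4),(3,3),(4,2),(5,5);
""")
# Votes are aggregated per candidate; the secondary sort on c.id breaks ties.
rows = conn.execute("""
SELECT c.Name
FROM Candidate c
JOIN (
    SELECT CandidateId, COUNT(*) AS vote_count
    FROM Vote
    GROUP BY CandidateId
) v ON c.id = v.CandidateId
ORDER BY v.vote_count DESC, c.id ASC
LIMIT 1;
""").fetchall()
print(rows)  # [('B',)]
```

Candidate B wins with 2 votes; if another candidate also had 2 votes, the ascending sort on id would decide between them.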
• Q.437
Question
Write an SQL query to find the names of all candidates who have received more votes than
the average number of votes per candidate.

Explanation
• First, calculate the average number of votes per candidate.
• Then, count the number of votes for each candidate.
• Finally, find all candidates whose vote count is greater than the average.

Datasets
Table creation and sample data:
CREATE TABLE Candidate (
id INT PRIMARY KEY AUTO_INCREMENT,
Name VARCHAR(100)
);

INSERT INTO Candidate (id, Name)
VALUES
(1, 'A'),
(2, 'B'),
(3, 'C'),
(4, 'D'),
(5, 'E');

CREATE TABLE Vote (
id INT PRIMARY KEY AUTO_INCREMENT,
CandidateId INT,
FOREIGN KEY (CandidateId) REFERENCES Candidate(id)
);

INSERT INTO Vote (id, CandidateId)
VALUES
(1, 2),
(2, 4),
(3, 3),
(4, 2),
(5, 5);

Learnings
• Using COUNT() to calculate the number of votes for each candidate.
• Using AVG() to calculate the average votes across all candidates.
• Using a HAVING clause to filter the results based on aggregated conditions.

Solutions
MySQL solution:
SELECT c.Name
FROM Candidate c
JOIN (
SELECT CandidateId, COUNT(*) AS vote_count
FROM Vote
GROUP BY CandidateId
) v ON c.id = v.CandidateId
WHERE v.vote_count > (
SELECT AVG(vote_count)
FROM (
SELECT COUNT(*) AS vote_count
FROM Vote
GROUP BY CandidateId
) avg_votes
);

PostgreSQL solution:
SELECT c.Name
FROM Candidate c
JOIN (
SELECT CandidateId, COUNT(*) AS vote_count
FROM Vote
GROUP BY CandidateId
) v ON c.id = v.CandidateId
WHERE v.vote_count > (
SELECT AVG(vote_count)
FROM (
SELECT COUNT(*) AS vote_count
FROM Vote
GROUP BY CandidateId
) avg_votes
);

Notes
• The subquery inside the WHERE clause calculates the average number of votes per candidate
using AVG().
• The outer query counts the votes for each candidate using COUNT(*), then filters the results
by comparing each candidate's vote count with the calculated average.
• The JOIN ensures that we can access both the candidate name and their vote count in the
final result.
• Q.438
Question
Write an SQL query to find the product_id of products that are both low fat and recyclable.
Return the result table in any order.

Explanation
You need to:
• Filter products where the low_fats column is 'Y' (low fat) and the recyclable column is
'Y' (recyclable).
• Return the product_id of those products that satisfy both conditions.

Datasets
Table creation and sample data:
CREATE TABLE Products (
product_id INT PRIMARY KEY,
low_fats ENUM('Y', 'N'),
recyclable ENUM('Y', 'N')
);

-- Sample data insertions


INSERT INTO Products (product_id, low_fats, recyclable)
VALUES
(0, 'Y', 'N'),
(1, 'Y', 'Y'),
(2, 'N', 'Y'),
(3, 'Y', 'Y'),
(4, 'N', 'N');

Learnings
• Filtering results based on multiple conditions in the WHERE clause.
• Using the ENUM type for handling specific values (e.g., 'Y' and 'N').
• Selecting specific columns in the result table.

Solutions
MySQL solution:
SELECT product_id
FROM Products
WHERE low_fats = 'Y' AND recyclable = 'Y';

PostgreSQL solution:
SELECT product_id
FROM Products
WHERE low_fats = 'Y' AND recyclable = 'Y';

Notes
• The WHERE clause filters rows where both low_fats and recyclable are 'Y', meaning the
product is both low fat and recyclable.
• The result returns only the product_id of those products.
• Q.439
Question

Write an SQL query to find the product_id and launch_year of products that were
launched in the last 3 years but have not been sold in the last year.

Explanation
You need to:
• Filter products that were launched in the last 3 years.
• Check that these products have not been sold in the last year.
• Return the product_id and the launch_year of those products.

Datasets
Table creation and sample data:
CREATE TABLE Products (
product_id INT PRIMARY KEY,
launch_date DATE
);

INSERT INTO Products (product_id, launch_date)
VALUES
(1, '2021-03-15'),
(2, '2022-07-01'),
(3, '2023-01-10'),
(4, '2020-05-20'),
(5, '2019-08-25');

CREATE TABLE Sales (
sale_id INT PRIMARY KEY,
product_id INT,
sale_date DATE,
FOREIGN KEY (product_id) REFERENCES Products(product_id)
);

INSERT INTO Sales (sale_id, product_id, sale_date)
VALUES
(1, 1, '2021-08-01'),
(2, 2, '2023-06-15'),
(3, 3, '2023-02-28'),
(4, 1, '2023-04-05');

Learnings
• Using DATE functions and INTERVAL to filter records by specific time periods.
• Comparing dates across two tables.
• Using NOT EXISTS to exclude products that have a matching sale in a given date range.
• Handling date calculations to determine a range (e.g., "last 3 years", "last year").

Solutions
MySQL solution:
SELECT p.product_id, YEAR(p.launch_date) AS launch_year
FROM Products p
WHERE p.launch_date >= CURDATE() - INTERVAL 3 YEAR
AND NOT EXISTS (
    SELECT 1
    FROM Sales s
    WHERE s.product_id = p.product_id
    AND s.sale_date >= CURDATE() - INTERVAL 1 YEAR
);

PostgreSQL solution:
SELECT p.product_id, EXTRACT(YEAR FROM p.launch_date) AS launch_year
FROM Products p
WHERE p.launch_date >= CURRENT_DATE - INTERVAL '3 years'
AND NOT EXISTS (
    SELECT 1
    FROM Sales s
    WHERE s.product_id = p.product_id
    AND s.sale_date >= CURRENT_DATE - INTERVAL '1 year'
);

Notes
• The outer WHERE clause keeps only products whose launch_date falls within the last 3 years, using the INTERVAL keyword.
• NOT EXISTS excludes any product with at least one sale in the last year. A LEFT JOIN filtered only on sale_date would wrongly keep a product that has both an old sale and a recent one, because the old sale row still satisfies the filter.
• Products with no sales at all pass the NOT EXISTS check automatically, so they are included.
• The result includes the product_id and the launch_year of those products.
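A NOT EXISTS variant of this query can be checked in SQLite via Python's sqlite3 module. This is an illustrative adaptation: SQLite uses date('...', '-3 years') instead of INTERVAL, and a fixed reference date of 2024-06-01 stands in for "today" so the result is deterministic against the sample data.

```python
import sqlite3

# In-memory database with this question's sample products and sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (product_id INT PRIMARY KEY, launch_date TEXT);
CREATE TABLE Sales (sale_id INT PRIMARY KEY, product_id INT, sale_date TEXT);
INSERT INTO Products VALUES
  (1,'2021-03-15'),(2,'2022-07-01'),(3,'2023-01-10'),
  (4,'2020-05-20'),(5,'2019-08-25');
INSERT INTO Sales VALUES
  (1,1,'2021-08-01'),(2,2,'2023-06-15'),(3,3,'2023-02-28'),(4,1,'2023-04-05');
""")
# Launched within 3 years of the (fixed) reference date, and no sale
# within 1 year of it.
rows = conn.execute("""
SELECT p.product_id,
       CAST(strftime('%Y', p.launch_date) AS INTEGER) AS launch_year
FROM Products p
WHERE p.launch_date >= date('2024-06-01', '-3 years')
  AND NOT EXISTS (
      SELECT 1 FROM Sales s
      WHERE s.product_id = p.product_id
        AND s.sale_date >= date('2024-06-01', '-1 year')
  );
""").fetchall()
print(rows)  # [(3, 2023)]
```

Product 2 is launched recently enough but is excluded by its 2023-06-15 sale, which shows the anti-join doing its job.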
• Q.440
Question
Write an SQL query to find the user_id who, over a period of time, has shown the largest
fluctuation in their engagement on a platform. Specifically, find the user whose engagement
(activity_score) has varied the most between any two consecutive days, based on the
activity data stored in the UserEngagement table.
The query should return:
• user_id
• The maximum difference in activity score between two consecutive days for that user
(calculated as the absolute difference between the scores).
• The start date of the period when this fluctuation started (the first day of the two
consecutive days with the largest difference).
• The end date of the period when this fluctuation ended (the second day of the two
consecutive days with the largest difference).

Explanation
You need to:
• Calculate the activity_score difference between consecutive days for each user.
• Find the user with the highest fluctuation in activity score between two consecutive days.
• Return the user_id, the maximum fluctuation, and the dates of the fluctuation.
The query should handle:
• Users who have multiple fluctuations and return the one with the largest absolute
difference.
• Ensure that the fluctuation calculation is based on consecutive days for each user.

Datasets
Table creation and sample data:


CREATE TABLE UserEngagement (
user_id INT,
activity_score INT,
activity_date DATE,
PRIMARY KEY (user_id, activity_date)
);

-- Sample data insertions
INSERT INTO UserEngagement (user_id, activity_score, activity_date)
VALUES
(1, 10, '2023-01-01'),
(1, 25, '2023-01-02'),
(1, 50, '2023-01-03'),
(1, 5, '2023-01-04'),
(1, 40, '2023-01-05'),
(2, 100, '2023-01-01'),
(2, 90, '2023-01-02'),
(2, 120, '2023-01-03'),
(2, 80, '2023-01-04'),
(2, 200, '2023-01-05'),
(3, 50, '2023-01-01'),
(3, 70, '2023-01-02'),
(3, 30, '2023-01-03'),
(3, 100, '2023-01-04');

Learnings
• Using LAG() window function to calculate the previous day's activity score for each user.
• Calculating the absolute difference in scores between consecutive days.
• Finding the maximum difference and selecting the appropriate dates using window
functions and ORDER BY.
• Handling gaps in data and ensuring consecutive days are correctly identified.

Solutions
MySQL solution:
WITH ScoreDifferences AS (
    SELECT
        user_id,
        activity_date,
        activity_score,
        LAG(activity_score) OVER (PARTITION BY user_id ORDER BY activity_date) AS prev_score,
        LAG(activity_date) OVER (PARTITION BY user_id ORDER BY activity_date) AS prev_date
    FROM UserEngagement
)
SELECT
    user_id,
    ABS(activity_score - prev_score) AS max_fluctuation,
    prev_date AS start_date,
    activity_date AS end_date
FROM ScoreDifferences
WHERE prev_score IS NOT NULL
ORDER BY max_fluctuation DESC
LIMIT 1;


PostgreSQL solution:
WITH ScoreDifferences AS (
    SELECT
        user_id,
        activity_date,
        activity_score,
        LAG(activity_score) OVER (PARTITION BY user_id ORDER BY activity_date) AS prev_score,
        LAG(activity_date) OVER (PARTITION BY user_id ORDER BY activity_date) AS prev_date
    FROM UserEngagement
)
SELECT
    user_id,
    ABS(activity_score - prev_score) AS max_fluctuation,
    prev_date AS start_date,
    activity_date AS end_date
FROM ScoreDifferences
WHERE prev_score IS NOT NULL
ORDER BY max_fluctuation DESC
LIMIT 1;

Notes
• The LAG() window function retrieves the previous day's activity_score and
activity_date for each user, partitioned by user_id and ordered by activity_date.
• The absolute difference is calculated with ABS() to measure the fluctuation.
• Rows with no previous score (each user's first recorded day) are filtered out.
• Ordering by the fluctuation in descending order and applying LIMIT 1 returns the
single largest day-to-day swing across all users, together with its dates.
• The query returns the user_id, max_fluctuation, start_date, and end_date for the
user with the largest fluctuation in activity scores.
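The LAG()-based logic above can be verified end to end with a tiny harness. The sketch below replays it in SQLite through Python's sqlite3 module (window functions need SQLite 3.25+, bundled with modern Python builds); it is an illustration against the sample data, not one of the book's MySQL/PostgreSQL solutions.

```python
import sqlite3

# Illustrative check of the LAG()-based fluctuation logic using SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserEngagement (
    user_id INT, activity_score INT, activity_date DATE,
    PRIMARY KEY (user_id, activity_date)
);
INSERT INTO UserEngagement VALUES
    (1, 10, '2023-01-01'), (1, 25, '2023-01-02'), (1, 50, '2023-01-03'),
    (1, 5,  '2023-01-04'), (1, 40, '2023-01-05'),
    (2, 100, '2023-01-01'), (2, 90, '2023-01-02'), (2, 120, '2023-01-03'),
    (2, 80, '2023-01-04'), (2, 200, '2023-01-05');
""")
top = conn.execute("""
WITH ScoreDifferences AS (
    SELECT user_id, activity_date, activity_score,
           LAG(activity_score) OVER (PARTITION BY user_id ORDER BY activity_date) AS prev_score,
           LAG(activity_date)  OVER (PARTITION BY user_id ORDER BY activity_date) AS prev_date
    FROM UserEngagement
)
SELECT user_id, ABS(activity_score - prev_score) AS max_fluctuation,
       prev_date AS start_date, activity_date AS end_date
FROM ScoreDifferences
WHERE prev_score IS NOT NULL
ORDER BY max_fluctuation DESC
LIMIT 1
""").fetchone()
```

User 2's swing from 80 on 2023-01-04 to 200 on 2023-01-05 (a difference of 120) is the largest in the sample, so `top` holds that row.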

Netflix
• Q.441
Question
Write an SQL query to find the top 5 most-watched pieces of content in each region based
on total watch duration for the last month (December 2024).
Return the following columns:
• Region
• ContentID
• Total Watch Duration (sum of WatchDuration for each content in that region)

Explanation
You need to:


• Filter the ViewingHistory table to only include records from December 2024.
• Calculate the total watch duration for each content piece in each region.
• Rank the content within each region based on the total watch duration, and select the top 5
content for each region.
• Return the results ordered by region and total watch duration.

Datasets
Table creation and sample data:
CREATE TABLE ViewingHistory (
UserID INT,
ContentID INT,
WatchDate DATE,
WatchDuration INT,
Region VARCHAR(100)
);

-- Sample data insertions
INSERT INTO ViewingHistory (UserID, ContentID, WatchDate, WatchDuration, Region)
VALUES
(1, 101, '2024-12-01', 120, 'North America'),
(2, 102, '2024-12-05', 80, 'Europe'),
(3, 101, '2024-12-10', 150, 'North America'),
(4, 103, '2024-12-15', 100, 'Asia'),
(5, 101, '2024-12-20', 200, 'North America'),
(6, 104, '2024-12-25', 300, 'Europe'),
(7, 105, '2024-12-01', 100, 'North America'),
(8, 106, '2024-12-07', 50, 'Asia'),
(9, 101, '2024-12-30', 180, 'North America'),
(10, 107, '2024-12-05', 90, 'Europe'),
(11, 108, '2024-12-18', 120, 'Asia'),
(12, 109, '2024-12-19', 60, 'Europe'),
(13, 101, '2024-12-25', 220, 'North America'),
(14, 110, '2024-12-01', 150, 'South America'),
(15, 111, '2024-12-30', 180, 'South America');

Learnings
• Using the SUM() function to aggregate total watch duration.
• Using ROW_NUMBER() or RANK() to rank the top 5 content in each region.
• Filtering data based on date conditions (WHERE clause) for the last month.
• Window Functions (like ROW_NUMBER()) to rank content in each region.

Solutions

MySQL solution:
WITH RankedContent AS (
    SELECT
        Region,
        ContentID,
        SUM(WatchDuration) AS TotalWatchDuration,
        ROW_NUMBER() OVER (PARTITION BY Region ORDER BY SUM(WatchDuration) DESC) AS rnk
    FROM ViewingHistory
    WHERE WatchDate BETWEEN '2024-12-01' AND '2024-12-31'
    GROUP BY Region, ContentID
)
SELECT Region, ContentID, TotalWatchDuration
FROM RankedContent
WHERE rnk <= 5
ORDER BY Region, TotalWatchDuration DESC;

PostgreSQL solution:
WITH RankedContent AS (
    SELECT
        Region,
        ContentID,
        SUM(WatchDuration) AS TotalWatchDuration,
        ROW_NUMBER() OVER (PARTITION BY Region ORDER BY SUM(WatchDuration) DESC) AS rnk
    FROM ViewingHistory
    WHERE WatchDate BETWEEN '2024-12-01' AND '2024-12-31'
    GROUP BY Region, ContentID
)
SELECT Region, ContentID, TotalWatchDuration
FROM RankedContent
WHERE rnk <= 5
ORDER BY Region, TotalWatchDuration DESC;

Notes
• The query filters the data for December 2024 using the WHERE clause.
• The SUM() function calculates the total watch duration for each content piece.
• The ROW_NUMBER() window function ranks the content within each region by total watch
duration in descending order. The alias rnk is used because RANK is a reserved word in
MySQL 8.0+.
• Only the top 5 ranked content pieces for each region are kept by filtering on the row
number (rnk <= 5).
• The final result is ordered by Region and TotalWatchDuration in descending order to
show the most-watched content at the top.
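The per-region ranking pattern is easy to check in miniature. The sketch below runs the same query shape in SQLite via Python's sqlite3 module, with a smaller made-up dataset and a top-2 cutoff so the output is easy to follow; it is illustrative only, not part of the book's solutions.

```python
import sqlite3

# Top-N-per-group via ROW_NUMBER() over an aggregated query, in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ViewingHistory (UserID INT, ContentID INT, WatchDate DATE,
                             WatchDuration INT, Region TEXT);
INSERT INTO ViewingHistory VALUES
    (1, 101, '2024-12-01', 120, 'NA'), (2, 102, '2024-12-05',  80, 'NA'),
    (3, 101, '2024-12-10', 150, 'NA'), (4, 103, '2024-12-15', 100, 'NA'),
    (5, 104, '2024-12-20',  60, 'EU'), (6, 105, '2024-12-25',  90, 'EU');
""")
rows = conn.execute("""
WITH RankedContent AS (
    SELECT Region, ContentID, SUM(WatchDuration) AS TotalWatchDuration,
           ROW_NUMBER() OVER (PARTITION BY Region
                              ORDER BY SUM(WatchDuration) DESC) AS rnk
    FROM ViewingHistory
    WHERE WatchDate BETWEEN '2024-12-01' AND '2024-12-31'
    GROUP BY Region, ContentID
)
SELECT Region, ContentID, TotalWatchDuration
FROM RankedContent
WHERE rnk <= 2
ORDER BY Region, TotalWatchDuration DESC
""").fetchall()
```

Content 101 accumulates 120 + 150 = 270 minutes in NA, so it tops that region; the two EU rows rank on their single sums.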
• Q.442
Question
Write an SQL query to find, for each month and country, the following:
• The number of transactions.
• The total amount of all transactions.
• The number of approved transactions.
• The total amount of approved transactions.
Return the results in any order.

Explanation
You need to:
• Extract the month and year from the trans_date to group the data by month.
• Count the total number of transactions and calculate the total amount of transactions for
each country and month.
• For transactions with the state 'approved', calculate the count and total amount
separately.
• Return the results with the following columns:
• month (in YYYY-MM format)
• country
• trans_count (total number of transactions)
• approved_count (number of approved transactions)
• trans_total_amount (total amount of all transactions)


• approved_total_amount (total amount of approved transactions)

Datasets
Table creation and sample data:
CREATE TABLE Transactions (
id INT PRIMARY KEY,
country VARCHAR(50),
state ENUM('approved', 'declined'),
amount INT,
trans_date DATE
);

-- Sample data insertions
INSERT INTO Transactions (id, country, state, amount, trans_date)
VALUES
(121, 'US', 'approved', 1000, '2018-12-18'),
(122, 'US', 'declined', 2000, '2018-12-19'),
(123, 'US', 'approved', 2000, '2019-01-01'),
(124, 'DE', 'approved', 2000, '2019-01-07');

Learnings
• Using DATE_FORMAT (MySQL) or TO_CHAR (PostgreSQL) to extract the year and month
from the transaction date.
• Using GROUP BY to aggregate data by month and country.
• Using COUNT() to count the number of transactions and SUM() to calculate total amounts.
• Filtering approved transactions using the WHERE clause or CASE statements.

Solutions

MySQL solution:
SELECT
DATE_FORMAT(trans_date, '%Y-%m') AS month,
country,
COUNT(*) AS trans_count,
SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
SUM(amount) AS trans_total_amount,
SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
FROM Transactions
GROUP BY month, country
ORDER BY month, country;

PostgreSQL solution:
SELECT
TO_CHAR(trans_date, 'YYYY-MM') AS month,
country,
COUNT(*) AS trans_count,
SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
SUM(amount) AS trans_total_amount,
SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
FROM Transactions
GROUP BY month, country
ORDER BY month, country;

Notes


• DATE_FORMAT in MySQL and TO_CHAR in PostgreSQL are used to extract the month and
year from the trans_date column in YYYY-MM format.
• COUNT(*) counts the total number of transactions in each group.
• SUM(amount) calculates the total amount for each group of transactions.
• SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) calculates the
total amount of approved transactions by summing the amount only for those transactions
where state = 'approved'.
• The query uses GROUP BY to group results by month and country.
• ORDER BY is used to order the results by month and country for clarity.
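As a quick sanity check of the conditional-aggregation pattern, here is a self-contained sketch that replays the query in SQLite through Python's sqlite3 module. It is illustrative only: SQLite has no ENUM type (TEXT stands in) and no DATE_FORMAT/TO_CHAR, so strftime('%Y-%m', ...) is used instead.

```python
import sqlite3

# Per-month, per-country transaction summary with CASE-based conditional sums.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (
    id INT PRIMARY KEY, country TEXT, state TEXT, amount INT, trans_date DATE
);
INSERT INTO Transactions VALUES
    (121, 'US', 'approved', 1000, '2018-12-18'),
    (122, 'US', 'declined', 2000, '2018-12-19'),
    (123, 'US', 'approved', 2000, '2019-01-01'),
    (124, 'DE', 'approved', 2000, '2019-01-07');
""")
rows = conn.execute("""
SELECT strftime('%Y-%m', trans_date) AS month,
       country,
       COUNT(*) AS trans_count,
       SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
       SUM(amount) AS trans_total_amount,
       SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
FROM Transactions
GROUP BY month, country
ORDER BY month, country
""").fetchall()
```

December 2018 in the US has two transactions totaling 3000, of which one (1000) is approved; each January 2019 group has a single approved transaction.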

• Q.443
Question
Write an SQL query to calculate the monthly retention rate of users over the past 6 months,
given the following table:
Where:
• Each row in the table represents an activity performed by a user on a specific date.
• The table contains data for all users over multiple months.
The monthly retention rate for a given month is defined as the percentage of users who
logged in during that month and were also active in the previous month.

Explanation
To calculate the monthly retention rate for each month:
• Identify the active users for each month (those who logged in that month).
• Find the users who were active in the previous month (i.e., logged in during the previous
month).
• Calculate the retention rate for each month as:

Datasets
Table creation and sample data:
CREATE TABLE UserActivity (
UserID INT,
LoginDate DATE,
ActivityType VARCHAR(50)
);

INSERT INTO UserActivity (UserID, LoginDate, ActivityType)
VALUES
(1, '2024-06-15', 'login'),
(1, '2024-07-10', 'login'),
(2, '2024-07-25', 'login'),
(3, '2024-07-20', 'login'),
(1, '2024-08-05', 'login'),
(2, '2024-08-15', 'login'),

(4, '2024-08-20', 'login'),
(5, '2024-08-30', 'login'),
(3, '2024-09-01', 'login'),
(1, '2024-09-05', 'login'),
(2, '2024-09-15', 'login'),
(5, '2024-09-25', 'login'),
(6, '2024-10-05', 'login'),
(7, '2024-10-15', 'login'),
(8, '2024-10-25', 'login');

Learnings
• Using DATE_TRUNC (PostgreSQL) or DATE_FORMAT (MySQL) to extract the month and year
from the login date.
• Using JOIN to link users who logged in both in the current month and the previous month.
• COUNTing distinct users to calculate retention.
• Calculating percentage retention by dividing the number of retained users by total active
users.

Solutions

MySQL solution:
WITH MonthlyActiveUsers AS (
SELECT DISTINCT
UserID,
DATE_FORMAT(LoginDate, '%Y-%m') AS month
FROM UserActivity
WHERE LoginDate BETWEEN '2024-07-01' AND '2024-12-31'
),
Retention AS (
SELECT
curr.UserID,
curr.month AS current_month,
prev.month AS previous_month
FROM MonthlyActiveUsers curr
LEFT JOIN MonthlyActiveUsers prev
ON curr.UserID = prev.UserID
AND prev.month = DATE_FORMAT(DATE_SUB(STR_TO_DATE(curr.month, '%Y-%m'), INTERVAL
1 MONTH), '%Y-%m')
)
SELECT
current_month AS month,
COUNT(DISTINCT current_month.UserID) AS active_users,
COUNT(DISTINCT CASE WHEN previous_month IS NOT NULL THEN current_month.UserID END) A
S retained_users,
ROUND(COUNT(DISTINCT CASE WHEN previous_month IS NOT NULL THEN current_month.UserID
END) /
COUNT(DISTINCT current_month.UserID) * 100, 2) AS retention_rate
FROM Retention
GROUP BY current_month
ORDER BY current_month DESC;

PostgreSQL solution:
WITH MonthlyActiveUsers AS (
SELECT DISTINCT
UserID,
TO_CHAR(LoginDate, 'YYYY-MM') AS month
FROM UserActivity
WHERE LoginDate BETWEEN '2024-07-01' AND '2024-12-31'
),
Retention AS (
SELECT
curr.UserID,

554
1000+ SQL Interview Questions & Answers | By Zero Analyst

curr.month AS current_month,
prev.month AS previous_month
FROM MonthlyActiveUsers curr
LEFT JOIN MonthlyActiveUsers prev
ON curr.UserID = prev.UserID
AND prev.month = TO_CHAR(TO_DATE(curr.month, 'YYYY-MM') - INTERVAL '1 month', 'Y
YYY-MM')
)
SELECT
current_month AS month,
COUNT(DISTINCT current_month.UserID) AS active_users,
COUNT(DISTINCT CASE WHEN previous_month IS NOT NULL THEN current_month.UserID END) A
S retained_users,
ROUND(COUNT(DISTINCT CASE WHEN previous_month IS NOT NULL THEN current_month.UserID
END) /
COUNT(DISTINCT current_month.UserID) * 100, 2) AS retention_rate
FROM Retention
GROUP BY current_month
ORDER BY current_month DESC;

Notes
• WITH Clauses are used to create two intermediate tables:
• MonthlyActiveUsers: Retrieves all distinct users who were active in each month between
July 2024 and December 2024.
• Retention: Joins the MonthlyActiveUsers table on itself to find users who were active in
both the current month and the previous month.
• COUNT(DISTINCT) is used to count distinct users to avoid duplicate counts for the same
user.
• The retention rate is calculated by dividing the number of retained users by the total
number of users active in the current month, multiplied by 100.
• DATE_FORMAT in MySQL and TO_CHAR in PostgreSQL are used to extract the
month and year from the LoginDate.
• The result is ordered by the current month to show the most recent month first.

Notes on the Output:
• The Retention Rate is the percentage of active users in the current month who also logged
in the previous month.
• For each month, we compute:
• active_users: The total number of distinct users who logged in that month.
• retained_users: The number of users who also logged in the previous month.
• retention_rate: The percentage of retained users.
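The previous-month self-join is the subtle part of this query, so here is a minimal sketch of the same pattern in SQLite via Python's sqlite3 module, using a deliberately tiny two-month dataset. SQLite's date(x, '-1 month') modifier stands in for DATE_SUB/TO_DATE arithmetic; this is an illustration, not one of the book's solutions.

```python
import sqlite3

# Monthly retention: join each month's active users to the previous month's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserActivity (UserID INT, LoginDate DATE, ActivityType TEXT);
INSERT INTO UserActivity VALUES
    (1, '2024-07-05', 'login'), (2, '2024-07-10', 'login'),
    (1, '2024-08-03', 'login'), (3, '2024-08-20', 'login');
""")
rows = conn.execute("""
WITH MonthlyActiveUsers AS (
    SELECT DISTINCT UserID, strftime('%Y-%m', LoginDate) AS month
    FROM UserActivity
)
SELECT curr.month,
       COUNT(DISTINCT curr.UserID) AS active_users,
       COUNT(DISTINCT CASE WHEN prev.UserID IS NOT NULL
                           THEN curr.UserID END) AS retained_users,
       ROUND(100.0 * COUNT(DISTINCT CASE WHEN prev.UserID IS NOT NULL
                                         THEN curr.UserID END)
             / COUNT(DISTINCT curr.UserID), 2) AS retention_rate
FROM MonthlyActiveUsers curr
LEFT JOIN MonthlyActiveUsers prev
       ON curr.UserID = prev.UserID
      AND prev.month = strftime('%Y-%m', date(curr.month || '-01', '-1 month'))
GROUP BY curr.month
ORDER BY curr.month
""").fetchall()
```

Only user 1 appears in both July and August, so August's retention over its two active users is 50%.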
• Q.444
Question
Write a query to find the top 3 most-watched genres in the USA, India, and the UK in the
past quarter from the ContentGenres table.

Explanation
You need to filter the data by country (USA, India, UK) and limit the result to the top 3
genres in each of these countries based on their popularity. The query will involve:
• Filtering by country.
• Grouping by genre and country.
• Ordering the results by the number of occurrences of each genre in the past quarter.


• Using ROW_NUMBER() or RANK() to rank genres by popularity for each country.

Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE ContentGenres (
ContentID INT,
Genre VARCHAR(100),
Country VARCHAR(100),
WatchDate DATE
);

INSERT INTO ContentGenres (ContentID, Genre, Country, WatchDate)
VALUES
(1, 'Action', 'USA', '2023-10-10'),
(2, 'Drama', 'USA', '2023-10-05'),
(3, 'Comedy', 'USA', '2023-09-30'),
(4, 'Action', 'India', '2023-10-01'),
(5, 'Comedy', 'India', '2023-10-15'),
(6, 'Action', 'UK', '2023-10-12'),
(7, 'Drama', 'UK', '2023-09-25'),
(8, 'Action', 'USA', '2023-10-02'),
(9, 'Drama', 'India', '2023-10-20'),
(10, 'Comedy', 'UK', '2023-09-29');

Learnings
• Use GROUP BY and COUNT() to aggregate data by genre and country.
• Use ROW_NUMBER() or RANK() to limit the result to top N genres per country.
• Understand how to filter data by date (the past quarter in this case).
• Use PARTITION BY with window functions for ranking.

Solutions
• - PostgreSQL solution
WITH GenreRank AS (
    SELECT Genre, Country, COUNT(*) AS GenreCount,
           ROW_NUMBER() OVER (PARTITION BY Country ORDER BY COUNT(*) DESC) AS rnk
    FROM ContentGenres
    WHERE Country IN ('USA', 'India', 'UK')
      AND WatchDate >= CURRENT_DATE - INTERVAL '3 months'
    GROUP BY Genre, Country
)
SELECT Genre, Country, GenreCount
FROM GenreRank
WHERE rnk <= 3;
• - MySQL solution
WITH GenreRank AS (
    SELECT Genre, Country, COUNT(*) AS GenreCount,
           ROW_NUMBER() OVER (PARTITION BY Country ORDER BY COUNT(*) DESC) AS rnk
    FROM ContentGenres
    WHERE Country IN ('USA', 'India', 'UK')
      AND WatchDate >= CURDATE() - INTERVAL 3 MONTH
    GROUP BY Genre, Country
)
SELECT Genre, Country, GenreCount
FROM GenreRank
WHERE rnk <= 3;
• Q.445
Question
Write a query to identify users who watched at least 5 episodes consecutively in a single
sitting in the past week from the ViewingPatterns table.


Explanation
You need to identify users who watched multiple episodes consecutively in a single sitting,
meaning the EndTime of one episode is very close to the StartTime of the next. The query
will:
• Filter data from the past week.
• Group by UserID and ContentID to identify consecutive episodes.
• Use window functions (e.g., LEAD or LAG) to determine consecutive episodes.
• Count how many consecutive episodes each user watched in a single sitting and return
users who watched at least 5.

Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE ViewingPatterns (
UserID INT,
ContentID INT,
EpisodeID INT,
StartTime DATETIME,
EndTime DATETIME
);

-- Sample data insertions
INSERT INTO ViewingPatterns (UserID, ContentID, EpisodeID, StartTime, EndTime)
VALUES
(1, 101, 1, '2023-10-05 19:00', '2023-10-05 19:30'),
(1, 101, 2, '2023-10-05 19:30', '2023-10-05 20:00'),
(1, 101, 3, '2023-10-05 20:00', '2023-10-05 20:30'),
(1, 101, 4, '2023-10-05 20:30', '2023-10-05 21:00'),
(1, 101, 5, '2023-10-05 21:00', '2023-10-05 21:30'),
(2, 102, 1, '2023-10-06 18:00', '2023-10-06 18:45'),
(2, 102, 2, '2023-10-06 18:45', '2023-10-06 19:30'),
(3, 103, 1, '2023-10-05 21:00', '2023-10-05 21:45'),
(3, 103, 2, '2023-10-05 21:45', '2023-10-05 22:30');

Learnings
• Using window functions like LEAD() and LAG() to check for consecutive events.
• Filtering by date range (last week).
• Grouping data to aggregate viewing behavior.
• Detecting consecutive events based on time difference.

Solutions
• - PostgreSQL solution
WITH ConsecutiveEpisodes AS (
    SELECT UserID, ContentID, EpisodeID, StartTime, EndTime,
           LEAD(StartTime) OVER (PARTITION BY UserID, ContentID ORDER BY StartTime) AS NextStartTime
    FROM ViewingPatterns
    WHERE StartTime >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT UserID, ContentID
FROM ConsecutiveEpisodes
WHERE EXTRACT(EPOCH FROM NextStartTime - EndTime) <= 900 -- at most a 15-minute gap between episodes
GROUP BY UserID, ContentID
HAVING COUNT(*) >= 4; -- 4 back-to-back gaps = 5 consecutive episodes
• - MySQL solution
WITH ConsecutiveEpisodes AS (
    SELECT UserID, ContentID, EpisodeID, StartTime, EndTime,
           LEAD(StartTime) OVER (PARTITION BY UserID, ContentID ORDER BY StartTime) AS NextStartTime
    FROM ViewingPatterns
    WHERE StartTime >= CURDATE() - INTERVAL 7 DAY
)
SELECT UserID, ContentID
FROM ConsecutiveEpisodes
WHERE TIMESTAMPDIFF(SECOND, EndTime, NextStartTime) <= 900 -- at most a 15-minute gap between episodes
GROUP BY UserID, ContentID
HAVING COUNT(*) >= 4; -- 4 back-to-back gaps = 5 consecutive episodes
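The gap-counting idea (5 consecutive episodes produce 4 qualifying gaps) can be checked with a small SQLite harness via Python's sqlite3 module. The relative 7-day filter is dropped so the example stays deterministic, and a julianday difference in seconds stands in for TIMESTAMPDIFF/EXTRACT(EPOCH); this is an illustration, not one of the book's solutions.

```python
import sqlite3

# Detect binge sessions: LEAD() finds the next episode's start per user/content.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ViewingPatterns (UserID INT, ContentID INT, EpisodeID INT,
                              StartTime TEXT, EndTime TEXT);
INSERT INTO ViewingPatterns VALUES
    (1, 101, 1, '2023-10-05 19:00', '2023-10-05 19:30'),
    (1, 101, 2, '2023-10-05 19:30', '2023-10-05 20:00'),
    (1, 101, 3, '2023-10-05 20:00', '2023-10-05 20:30'),
    (1, 101, 4, '2023-10-05 20:30', '2023-10-05 21:00'),
    (1, 101, 5, '2023-10-05 21:00', '2023-10-05 21:30'),
    (2, 102, 1, '2023-10-06 18:00', '2023-10-06 18:45'),
    (2, 102, 2, '2023-10-06 18:45', '2023-10-06 19:30');
""")
rows = conn.execute("""
WITH ConsecutiveEpisodes AS (
    SELECT UserID, ContentID, EndTime,
           LEAD(StartTime) OVER (PARTITION BY UserID, ContentID
                                 ORDER BY StartTime) AS NextStartTime
    FROM ViewingPatterns
)
SELECT UserID, ContentID
FROM ConsecutiveEpisodes
WHERE (julianday(NextStartTime) - julianday(EndTime)) * 86400 <= 900
GROUP BY UserID, ContentID
HAVING COUNT(*) >= 4   -- 4 back-to-back gaps = 5 consecutive episodes
""").fetchall()
```

User 1 watched 5 back-to-back episodes (4 zero-second gaps) and qualifies; user 2's 2 episodes yield only 1 gap.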
• Q.446
Question
Write a query to calculate the percentage of users who canceled their subscription within 30
days of a billing cycle in the past year from the SubscriptionLogs table.

Explanation
To solve this, you need to:
• Filter the data to include only cancellations in the past year.
• Identify users who canceled within 30 days of a renewal action by comparing the
ActionDate of the cancellation to the ActionDate of the renewal.
• Calculate the percentage of these users relative to the total number of users who had a
renewal action within the past year.

Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE SubscriptionLogs (
UserID INT,
ActionType VARCHAR(50),
ActionDate DATE
);

-- Sample data insertions
INSERT INTO SubscriptionLogs (UserID, ActionType, ActionDate)
VALUES
(1, 'Renewal', '2023-01-15'),
(1, 'Cancellation', '2023-02-10'),
(2, 'Renewal', '2023-03-01'),
(2, 'Cancellation', '2023-03-25'),
(3, 'Renewal', '2023-04-10'),
(3, 'Cancellation', '2023-05-05'),
(4, 'Renewal', '2023-06-15'),
(5, 'Cancellation', '2023-07-20'),
(6, 'Renewal', '2023-09-01'),
(6, 'Cancellation', '2023-09-05'),
(7, 'Renewal', '2023-10-01');

Learnings
• Using JOIN to compare actions for the same user (e.g., renewal and cancellation).
• Filtering data by date range (the past year).
• Using date functions like DATEDIFF() or EXTRACT() to calculate date differences.
• Aggregating data to compute percentages.

Solutions
• - PostgreSQL solution
WITH RenewalAndCancellation AS (
    SELECT r.UserID, r.ActionDate AS RenewalDate, c.ActionDate AS CancellationDate
    FROM SubscriptionLogs r
    LEFT JOIN SubscriptionLogs c
        ON r.UserID = c.UserID AND c.ActionType = 'Cancellation'
    WHERE r.ActionType = 'Renewal'
      AND r.ActionDate >= CURRENT_DATE - INTERVAL '1 year'
)
SELECT
    ROUND(100.0 * COUNT(DISTINCT UserID) /
          (SELECT COUNT(DISTINCT UserID)
           FROM SubscriptionLogs
           WHERE ActionType = 'Renewal'
             AND ActionDate >= CURRENT_DATE - INTERVAL '1 year'), 2) AS CancellationRate
FROM RenewalAndCancellation
WHERE CancellationDate IS NOT NULL
  AND CancellationDate - RenewalDate BETWEEN 0 AND 30; -- subtracting dates yields whole days in PostgreSQL
• - MySQL solution
WITH RenewalAndCancellation AS (
    SELECT r.UserID, r.ActionDate AS RenewalDate, c.ActionDate AS CancellationDate
    FROM SubscriptionLogs r
    LEFT JOIN SubscriptionLogs c
        ON r.UserID = c.UserID AND c.ActionType = 'Cancellation'
    WHERE r.ActionType = 'Renewal'
      AND r.ActionDate >= CURDATE() - INTERVAL 1 YEAR
)
SELECT
    ROUND(100.0 * COUNT(DISTINCT UserID) /
          (SELECT COUNT(DISTINCT UserID)
           FROM SubscriptionLogs
           WHERE ActionType = 'Renewal'
             AND ActionDate >= CURDATE() - INTERVAL 1 YEAR), 2) AS CancellationRate
FROM RenewalAndCancellation
WHERE CancellationDate IS NOT NULL
  AND TIMESTAMPDIFF(DAY, RenewalDate, CancellationDate) BETWEEN 0 AND 30;
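The within-30-days pairing can be exercised with a deterministic SQLite sketch via Python's sqlite3 module. The one-year filter is omitted and julianday differences replace TIMESTAMPDIFF; the made-up rows (one cancellation inside 30 days of a renewal, one outside, one renewal with no cancellation) are assumptions for illustration only.

```python
import sqlite3

# Percentage of renewing users who cancel within 30 days of the renewal.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SubscriptionLogs (UserID INT, ActionType TEXT, ActionDate DATE);
INSERT INTO SubscriptionLogs VALUES
    (1, 'Renewal', '2023-01-01'), (1, 'Cancellation', '2023-01-20'),  -- 19 days
    (2, 'Renewal', '2023-02-01'), (2, 'Cancellation', '2023-03-20'),  -- 47 days
    (3, 'Renewal', '2023-03-01');                                     -- never canceled
""")
rate = conn.execute("""
WITH RenewalAndCancellation AS (
    SELECT r.UserID,
           julianday(c.ActionDate) - julianday(r.ActionDate) AS gap_days
    FROM SubscriptionLogs r
    LEFT JOIN SubscriptionLogs c
           ON r.UserID = c.UserID AND c.ActionType = 'Cancellation'
    WHERE r.ActionType = 'Renewal'
)
SELECT ROUND(100.0 * COUNT(DISTINCT CASE WHEN gap_days BETWEEN 0 AND 30
                                         THEN UserID END)
             / (SELECT COUNT(DISTINCT UserID) FROM SubscriptionLogs
                WHERE ActionType = 'Renewal'), 2)
FROM RenewalAndCancellation
""").fetchone()[0]
```

One of the three renewing users cancels within 30 days, giving a rate of 33.33%.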
• Q.447
Question
Write a query to generate a user-to-content watch matrix from a table ViewingHistory with
columns UserID, ContentID, and WatchDuration, showing the total watch duration for each
user-content pair.

Explanation
To solve this, you need to:
• Aggregate the data by UserID and ContentID to get the total WatchDuration for each
combination of user and content.
• The result should have UserID as rows, ContentID as columns, and the total
WatchDuration as the value in the matrix.

Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE ViewingHistory (
UserID INT,
ContentID INT,
WatchDuration INT -- Duration in minutes
);

INSERT INTO ViewingHistory (UserID, ContentID, WatchDuration)
VALUES
(1, 101, 30),
(1, 102, 45),
(1, 103, 60),
(2, 101, 20),
(2, 103, 25),
(3, 102, 30),
(3, 101, 40),
(3, 104, 50);

Learnings
• Using GROUP BY to aggregate data.
• Pivoting data in SQL using conditional aggregation (CASE).
• Handling user-content matrix representation.
• Summing values for each user-content pair.

Solutions
• - PostgreSQL solution
SELECT
UserID,
SUM(CASE WHEN ContentID = 101 THEN WatchDuration ELSE 0 END) AS Content_101,
SUM(CASE WHEN ContentID = 102 THEN WatchDuration ELSE 0 END) AS Content_102,
SUM(CASE WHEN ContentID = 103 THEN WatchDuration ELSE 0 END) AS Content_103,
SUM(CASE WHEN ContentID = 104 THEN WatchDuration ELSE 0 END) AS Content_104
FROM ViewingHistory
GROUP BY UserID;
• - MySQL solution
SELECT
UserID,
SUM(CASE WHEN ContentID = 101 THEN WatchDuration ELSE 0 END) AS Content_101,
SUM(CASE WHEN ContentID = 102 THEN WatchDuration ELSE 0 END) AS Content_102,
SUM(CASE WHEN ContentID = 103 THEN WatchDuration ELSE 0 END) AS Content_103,
SUM(CASE WHEN ContentID = 104 THEN WatchDuration ELSE 0 END) AS Content_104
FROM ViewingHistory
GROUP BY UserID;
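The CASE-based pivot runs unchanged in SQLite, so the watch matrix can be verified directly with Python's sqlite3 module against the sample data above; this harness is illustrative, not part of the book's solutions.

```python
import sqlite3

# Pivot user/content watch durations into a user-by-content matrix via CASE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ViewingHistory (UserID INT, ContentID INT, WatchDuration INT);
INSERT INTO ViewingHistory VALUES
    (1, 101, 30), (1, 102, 45), (1, 103, 60),
    (2, 101, 20), (2, 103, 25),
    (3, 102, 30), (3, 101, 40), (3, 104, 50);
""")
rows = conn.execute("""
SELECT UserID,
       SUM(CASE WHEN ContentID = 101 THEN WatchDuration ELSE 0 END) AS Content_101,
       SUM(CASE WHEN ContentID = 102 THEN WatchDuration ELSE 0 END) AS Content_102,
       SUM(CASE WHEN ContentID = 103 THEN WatchDuration ELSE 0 END) AS Content_103,
       SUM(CASE WHEN ContentID = 104 THEN WatchDuration ELSE 0 END) AS Content_104
FROM ViewingHistory
GROUP BY UserID
ORDER BY UserID
""").fetchall()
```

Each row is one user; missing user-content pairs fall through the CASE to 0.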
• Q.448
Question
Write a query to calculate the accuracy of recommendations by country from the
Recommendations table, where UserID, RecommendedContentID, WatchStatus ('Watched',
'Skipped'), and Country are columns.

Explanation
To calculate the accuracy of recommendations, you need to:
• Group the data by Country and UserID.
• Count the number of Watched recommendations for each country.
• Calculate the accuracy as the ratio of Watched recommendations to the total number of
recommendations (Watched + Skipped) for each country.

Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Recommendations (
UserID INT,
RecommendedContentID INT,
WatchStatus VARCHAR(10),
Country VARCHAR(50)
);
INSERT INTO Recommendations (UserID, RecommendedContentID, WatchStatus, Country)
VALUES
(1, 101, 'Watched', 'USA'),
(1, 102, 'Skipped', 'USA'),
(2, 103, 'Watched', 'USA'),
(2, 104, 'Skipped', 'USA'),
(3, 105, 'Watched', 'India'),
(3, 106, 'Skipped', 'India'),
(4, 107, 'Watched', 'India'),
(4, 108, 'Watched', 'India'),
(5, 109, 'Skipped', 'UK'),
(5, 110, 'Watched', 'UK');


Learnings
• Grouping data by Country to calculate aggregated statistics.
• Using COUNT() and SUM() to calculate the number of Watched recommendations.
• Calculating accuracy as a ratio of favorable outcomes (Watched) to total outcomes
(Watched + Skipped).

Solutions
• - PostgreSQL solution
SELECT
Country,
ROUND(100.0 * SUM(CASE WHEN WatchStatus = 'Watched' THEN 1 ELSE 0 END) / COUNT(*), 2
) AS Accuracy
FROM Recommendations
GROUP BY Country;
• - MySQL solution
SELECT
Country,
ROUND(100.0 * SUM(CASE WHEN WatchStatus = 'Watched' THEN 1 ELSE 0 END) / COUNT(*), 2
) AS Accuracy
FROM Recommendations
GROUP BY Country;
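The accuracy ratio is easy to confirm against the sample data: the sketch below runs essentially the same query in SQLite via Python's sqlite3 module (an ORDER BY Country is added for a deterministic result). It is an illustrative check, not one of the book's solutions.

```python
import sqlite3

# Recommendation accuracy per country: watched / (watched + skipped).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Recommendations (UserID INT, RecommendedContentID INT,
                              WatchStatus TEXT, Country TEXT);
INSERT INTO Recommendations VALUES
    (1, 101, 'Watched', 'USA'), (1, 102, 'Skipped', 'USA'),
    (2, 103, 'Watched', 'USA'), (2, 104, 'Skipped', 'USA'),
    (3, 105, 'Watched', 'India'), (3, 106, 'Skipped', 'India'),
    (4, 107, 'Watched', 'India'), (4, 108, 'Watched', 'India'),
    (5, 109, 'Skipped', 'UK'), (5, 110, 'Watched', 'UK');
""")
rows = conn.execute("""
SELECT Country,
       ROUND(100.0 * SUM(CASE WHEN WatchStatus = 'Watched' THEN 1 ELSE 0 END)
             / COUNT(*), 2) AS Accuracy
FROM Recommendations
GROUP BY Country
ORDER BY Country
""").fetchall()
```

India has 3 of 4 recommendations watched (75%), while the UK and USA each sit at 50%.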
• Q.449
Question
Write a query to find pairs of content with the same genre and similar average watch
durations (+/- 10 minutes) using tables ContentDetails and ViewingHistory.

Explanation
To solve this:
• Join the ContentDetails table with ViewingHistory on ContentID to get the genre and
watch duration.
• Calculate the average watch duration for each content.
• Find content pairs that belong to the same genre and have similar average watch durations
(within a 10-minute difference).
• Return the ContentID pairs along with their genre.

Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE ContentDetails (
ContentID INT PRIMARY KEY,
Genre VARCHAR(100)
);

CREATE TABLE ViewingHistory (
UserID INT,
ContentID INT,
WatchDuration INT -- Duration in minutes
);

-- Sample data insertions
INSERT INTO ContentDetails (ContentID, Genre)
VALUES
(101, 'Action'),
(102, 'Action'),
(103, 'Comedy'),
(104, 'Comedy'),
(105, 'Action'),
(106, 'Drama');


INSERT INTO ViewingHistory (UserID, ContentID, WatchDuration)
VALUES
(1, 101, 30),
(2, 101, 45),
(3, 102, 40),
(4, 103, 25),
(5, 103, 35),
(6, 104, 60),
(7, 105, 50),
(8, 105, 40);

Learnings
• Using JOIN to combine data from multiple tables.
• Aggregating data with AVG() to calculate average watch durations.
• Using conditional filtering (e.g., ABS() function) to find similar durations.
• Grouping by genre to find content within the same genre.

Solutions
• - PostgreSQL solution
WITH AverageDurations AS (
SELECT v.ContentID, c.Genre, AVG(v.WatchDuration) AS AvgWatchDuration
FROM ViewingHistory v
JOIN ContentDetails c ON v.ContentID = c.ContentID
GROUP BY v.ContentID, c.Genre
)
SELECT a.ContentID AS ContentID1, b.ContentID AS ContentID2, a.Genre, a.AvgWatchDuration
AS AvgDuration1, b.AvgWatchDuration AS AvgDuration2
FROM AverageDurations a
JOIN AverageDurations b ON a.Genre = b.Genre AND a.ContentID < b.ContentID
WHERE ABS(a.AvgWatchDuration - b.AvgWatchDuration) <= 10;
• - MySQL solution
WITH AverageDurations AS (
SELECT v.ContentID, c.Genre, AVG(v.WatchDuration) AS AvgWatchDuration
FROM ViewingHistory v
JOIN ContentDetails c ON v.ContentID = c.ContentID
GROUP BY v.ContentID, c.Genre
)
SELECT a.ContentID AS ContentID1, b.ContentID AS ContentID2, a.Genre, a.AvgWatchDuration
AS AvgDuration1, b.AvgWatchDuration AS AvgDuration2
FROM AverageDurations a
JOIN AverageDurations b ON a.Genre = b.Genre AND a.ContentID < b.ContentID
WHERE ABS(a.AvgWatchDuration - b.AvgWatchDuration) <= 10;
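The self-join over aggregated averages can be replayed in SQLite via Python's sqlite3 module against the sample data above; an ORDER BY is added for deterministic output. This is an illustrative harness, not one of the book's solutions.

```python
import sqlite3

# Pair same-genre content whose average watch durations differ by <= 10 minutes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ContentDetails (ContentID INT PRIMARY KEY, Genre TEXT);
CREATE TABLE ViewingHistory (UserID INT, ContentID INT, WatchDuration INT);
INSERT INTO ContentDetails VALUES
    (101, 'Action'), (102, 'Action'), (103, 'Comedy'),
    (104, 'Comedy'), (105, 'Action'), (106, 'Drama');
INSERT INTO ViewingHistory VALUES
    (1, 101, 30), (2, 101, 45), (3, 102, 40), (4, 103, 25),
    (5, 103, 35), (6, 104, 60), (7, 105, 50), (8, 105, 40);
""")
pairs = conn.execute("""
WITH AverageDurations AS (
    SELECT v.ContentID, c.Genre, AVG(v.WatchDuration) AS AvgWatchDuration
    FROM ViewingHistory v
    JOIN ContentDetails c ON v.ContentID = c.ContentID
    GROUP BY v.ContentID, c.Genre
)
SELECT a.ContentID AS ContentID1, b.ContentID AS ContentID2, a.Genre
FROM AverageDurations a
JOIN AverageDurations b
  ON a.Genre = b.Genre AND a.ContentID < b.ContentID
WHERE ABS(a.AvgWatchDuration - b.AvgWatchDuration) <= 10
ORDER BY ContentID1, ContentID2
""").fetchall()
```

Averages are 37.5, 40, and 45 minutes for the three Action titles, so all three Action pairs qualify, while the Comedy pair (30 vs. 60) does not. The `a.ContentID < b.ContentID` condition prevents self-pairs and mirrored duplicates.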
• Q.450
Question
Write a query to identify content that has become popular in India in the past 30 days but was
not in the top 100 a month ago from the ContentViewLogs table.

Explanation
To solve this:
• Get the content that has high view counts in India in the past 30 days.
• Check the view counts from a month ago for the same content.
• Compare the recent view counts with the previous month's view counts to determine if the
content wasn't in the top 100 a month ago.
• The result should list the content that has gained popularity in the last 30 days but wasn't
among the top 100 content a month ago.


Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE ContentViewLogs (
ContentID INT,
Country VARCHAR(50),
ViewCount INT,
ViewDate DATE
);

-- Sample data insertions
INSERT INTO ContentViewLogs (ContentID, Country, ViewCount, ViewDate)
VALUES
(101, 'India', 2000, '2023-12-01'),
(102, 'India', 1500, '2023-12-02'),
(103, 'India', 1200, '2023-12-10'),
(104, 'India', 1000, '2023-12-05'),
(105, 'India', 2500, '2023-12-12'),
(106, 'India', 300, '2023-11-01'),
(107, 'India', 600, '2023-11-15'),
(108, 'India', 500, '2023-11-20'),
(109, 'India', 700, '2023-11-25'),
(110, 'India', 800, '2023-11-30');

Learnings
• Using JOIN or SUBQUERY to compare recent data with historical data.
• Aggregating data using GROUP BY to identify the top content based on view counts.
• Using date filtering to focus on content viewed in specific time ranges (last 30 days vs.
previous month).
• Using NOT IN or NOT EXISTS to exclude content from the previous month's top content.

Solutions
• - PostgreSQL solution
WITH RecentViews AS (
    SELECT ContentID, SUM(ViewCount) AS TotalViewCount
    FROM ContentViewLogs
    WHERE Country = 'India' AND ViewDate >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY ContentID
),
PreviousTopViews AS (
    SELECT ContentID
    FROM ContentViewLogs
    WHERE Country = 'India'
      AND ViewDate BETWEEN CURRENT_DATE - INTERVAL '60 days' AND CURRENT_DATE - INTERVAL '30 days'
    GROUP BY ContentID
    ORDER BY SUM(ViewCount) DESC
    LIMIT 100
)
SELECT r.ContentID, r.TotalViewCount
FROM RecentViews r
LEFT JOIN PreviousTopViews p ON r.ContentID = p.ContentID
WHERE p.ContentID IS NULL
ORDER BY r.TotalViewCount DESC;
• - MySQL solution
WITH RecentViews AS (
    SELECT ContentID, SUM(ViewCount) AS TotalViewCount
    FROM ContentViewLogs
    WHERE Country = 'India' AND ViewDate >= CURDATE() - INTERVAL 30 DAY
    GROUP BY ContentID
),
PreviousTopViews AS (
    SELECT ContentID
    FROM ContentViewLogs
    WHERE Country = 'India'
      AND ViewDate BETWEEN CURDATE() - INTERVAL 60 DAY AND CURDATE() - INTERVAL 30 DAY
    GROUP BY ContentID
    ORDER BY SUM(ViewCount) DESC
    LIMIT 100
)
SELECT r.ContentID, r.TotalViewCount
FROM RecentViews r
LEFT JOIN PreviousTopViews p ON r.ContentID = p.ContentID
WHERE p.ContentID IS NULL
ORDER BY r.TotalViewCount DESC;
• Q.451
Question
Write a query to group users into clusters based on their average watch time per genre using
tables UserGenres(UserID, Genre, TotalWatchTime) and Genres.

Explanation
To solve this:
• Calculate the average watch time for each user per genre.
• Group users based on their watch patterns. A common approach to clustering in SQL
involves using basic similarity measures or aggregations (e.g., using CASE WHEN or GROUP BY
for clustering).
• For more sophisticated clustering (e.g., k-means), a database might not be ideal; however,
this can be approximated with categorical grouping based on watch time thresholds.
This query will return users grouped by their average watch time across different genres,
which can be treated as a simple form of clustering.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE UserGenres (
UserID INT,
Genre VARCHAR(100),
TotalWatchTime INT -- Watch time in minutes
);

CREATE TABLE Genres (
    Genre VARCHAR(100)
);

-- Sample data insertions


INSERT INTO UserGenres (UserID, Genre, TotalWatchTime)
VALUES
(1, 'Action', 120),
(1, 'Comedy', 90),
(1, 'Drama', 150),
(2, 'Action', 200),
(2, 'Comedy', 60),
(3, 'Drama', 180),
(3, 'Action', 80),
(4, 'Comedy', 110),
(4, 'Action', 95);

INSERT INTO Genres (Genre)


VALUES
('Action'),
('Comedy'),
('Drama');

Learnings
• Using GROUP BY and AVG() to aggregate watch time by user and genre.
• Understanding how to group users based on their watch patterns.

• Applying conditional logic to classify users into broad "clusters" based on watch time.

Solutions
• - PostgreSQL solution
WITH AvgWatchTime AS (
SELECT UserID, Genre, AVG(TotalWatchTime) AS AvgWatchTime
FROM UserGenres
GROUP BY UserID, Genre
)
SELECT UserID,
CASE
WHEN AVG(AvgWatchTime) >= 150 THEN 'Heavy Watchers'
           WHEN AVG(AvgWatchTime) >= 100 THEN 'Moderate Watchers'
ELSE 'Light Watchers'
END AS WatchCluster
FROM AvgWatchTime
GROUP BY UserID
ORDER BY UserID;
• - MySQL solution
WITH AvgWatchTime AS (
SELECT UserID, Genre, AVG(TotalWatchTime) AS AvgWatchTime
FROM UserGenres
GROUP BY UserID, Genre
)
SELECT UserID,
CASE
WHEN AVG(AvgWatchTime) >= 150 THEN 'Heavy Watchers'
           WHEN AVG(AvgWatchTime) >= 100 THEN 'Moderate Watchers'
ELSE 'Light Watchers'
END AS WatchCluster
FROM AvgWatchTime
GROUP BY UserID
ORDER BY UserID;
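The bucketing logic above can be checked in plain Python against the sample rows. This is only an illustrative sketch (the data is hardcoded from the sample inserts, and the 150/100 cut-offs mirror the CASE expression):

```python
# Illustrative re-run of the watch-time clustering over the sample inserts.
from collections import defaultdict

rows = [  # (UserID, Genre, TotalWatchTime) from the sample data
    (1, 'Action', 120), (1, 'Comedy', 90), (1, 'Drama', 150),
    (2, 'Action', 200), (2, 'Comedy', 60),
    (3, 'Drama', 180), (3, 'Action', 80),
    (4, 'Comedy', 110), (4, 'Action', 95),
]

per_user = defaultdict(list)
for user_id, _genre, minutes in rows:
    per_user[user_id].append(minutes)

def cluster(avg_minutes):
    # Mirrors: CASE WHEN avg >= 150 ... WHEN avg >= 100 ... ELSE ... END
    if avg_minutes >= 150:
        return 'Heavy Watchers'
    if avg_minutes >= 100:
        return 'Moderate Watchers'
    return 'Light Watchers'

clusters = {u: cluster(sum(v) / len(v)) for u, v in sorted(per_user.items())}
print(clusters)
```

On this sample every user averages between 100 and 150 minutes, so all four land in the middle bucket.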
• Q.452
Question
Write a query to calculate the total revenue generated per region over the past 6 months from
the Revenue table.

Explanation
To solve this:
• Filter the Revenue table to include only data from the past 6 months using the
PaymentDate.
• Group the data by Region to calculate the total revenue per region.
• Sum the Amount for each Region and return the result.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE Revenue (
UserID INT,
SubscriptionPlan VARCHAR(100),
Amount DECIMAL(10, 2),
Region VARCHAR(100),
PaymentDate DATE
);

-- Sample data insertions


INSERT INTO Revenue (UserID, SubscriptionPlan, Amount, Region, PaymentDate)
VALUES
(1, 'Basic', 9.99, 'North America', '2023-07-10'),
(2, 'Premium', 19.99, 'Europe', '2023-08-15'),
(3, 'Standard', 14.99, 'Asia', '2023-09-05'),
(4, 'Premium', 19.99, 'North America', '2023-10-20'),
(5, 'Basic', 9.99, 'Europe', '2023-11-01'),
(6, 'Standard', 14.99, 'Asia', '2023-12-10'),
(7, 'Premium', 19.99, 'North America', '2023-12-15'),
(8, 'Basic', 9.99, 'Europe', '2023-12-18'),
(9, 'Standard', 14.99, 'Asia', '2023-12-20'),
(10, 'Premium', 19.99, 'North America', '2023-12-25');

Learnings
• Filtering data by date using WHERE clause and date functions.
• Using SUM() to calculate the total revenue for each region.
• Grouping data by Region to get regional insights.

Solutions
• - PostgreSQL solution
SELECT Region,
SUM(Amount) AS TotalRevenue
FROM Revenue
WHERE PaymentDate >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY Region
ORDER BY TotalRevenue DESC;
• - MySQL solution
SELECT Region,
SUM(Amount) AS TotalRevenue
FROM Revenue
WHERE PaymentDate >= CURDATE() - INTERVAL 6 MONTH
GROUP BY Region
ORDER BY TotalRevenue DESC;
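The same rollup can be sketched in plain Python. A fixed cutoff of 2023-07-01 stands in for `CURRENT_DATE - INTERVAL '6 months'` (an assumption, since the SQL is relative to the run date), and the rows are copied from the sample inserts:

```python
# Six-month regional revenue rollup over the sample rows with a fixed cutoff.
from datetime import date
from collections import defaultdict

rows = [  # (Region, Amount, PaymentDate) from the sample data
    ('North America', 9.99, date(2023, 7, 10)),
    ('Europe', 19.99, date(2023, 8, 15)),
    ('Asia', 14.99, date(2023, 9, 5)),
    ('North America', 19.99, date(2023, 10, 20)),
    ('Europe', 9.99, date(2023, 11, 1)),
    ('Asia', 14.99, date(2023, 12, 10)),
    ('North America', 19.99, date(2023, 12, 15)),
    ('Europe', 9.99, date(2023, 12, 18)),
    ('Asia', 14.99, date(2023, 12, 20)),
    ('North America', 19.99, date(2023, 12, 25)),
]

cutoff = date(2023, 7, 1)  # assumption: fixed stand-in for CURRENT_DATE - 6 months

totals = defaultdict(float)
for region, amount, paid_on in rows:
    if paid_on >= cutoff:       # WHERE PaymentDate >= cutoff
        totals[region] += amount  # SUM(Amount) ... GROUP BY Region

totals = {r: round(t, 2) for r, t in totals.items()}
print(totals)
```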
• Q.453

Question
Write a query to find the percentage of users upgrading to higher-tier plans in the last year
from the SubscriptionChanges table, which has columns UserID, OldPlan, NewPlan, and
ChangeDate.

Explanation
To solve this:
• Filter the SubscriptionChanges table to include only records from the last year using
ChangeDate.
• Identify users who have upgraded to a higher-tier plan by comparing OldPlan and
NewPlan.
• Calculate the percentage of users who upgraded, relative to the total number of users who
had any plan change in the last year.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE SubscriptionChanges (
UserID INT,
OldPlan VARCHAR(50),
NewPlan VARCHAR(50),
ChangeDate DATE
);

-- Sample data insertions


INSERT INTO SubscriptionChanges (UserID, OldPlan, NewPlan, ChangeDate)

VALUES
(1, 'Basic', 'Premium', '2024-06-15'),
(2, 'Standard', 'Premium', '2024-07-01'),
(3, 'Premium', 'Standard', '2024-08-10'),
(4, 'Basic', 'Standard', '2024-09-20'),
(5, 'Premium', 'Premium', '2024-10-05'),
(6, 'Basic', 'Premium', '2023-11-18'),
(7, 'Standard', 'Premium', '2023-12-12'),
(8, 'Standard', 'Basic', '2024-12-15');

Learnings
• Using CASE expressions to identify upgrades.
• Calculating the percentage by comparing counts of upgraded users versus total users.
• Using date functions (CURRENT_DATE, INTERVAL) to filter records for the past year.

Solutions
• - PostgreSQL solution
WITH UserChanges AS (
SELECT UserID, OldPlan, NewPlan
FROM SubscriptionChanges
WHERE ChangeDate >= CURRENT_DATE - INTERVAL '1 year'
)
SELECT
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN (OldPlan = 'Basic' AND NewPlan IN ('Standard', 'Premium'))
                                        OR (OldPlan = 'Standard' AND NewPlan = 'Premium')
                                      THEN UserID END)
        / COUNT(DISTINCT UserID), 2) AS UpgradePercentage
FROM UserChanges;
• - MySQL solution
WITH UserChanges AS (
SELECT UserID, OldPlan, NewPlan
FROM SubscriptionChanges
WHERE ChangeDate >= CURDATE() - INTERVAL 1 YEAR
)
SELECT
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN (OldPlan = 'Basic' AND NewPlan IN ('Standard', 'Premium'))
                                        OR (OldPlan = 'Standard' AND NewPlan = 'Premium')
                                      THEN UserID END)
        / COUNT(DISTINCT UserID), 2) AS UpgradePercentage
FROM UserChanges;
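Instead of enumerating plan pairs, a tier map makes the upgrade test robust and easy to audit. This Python sketch applies that idea to the sample changes (the date filter is omitted to keep the focus on the comparison, so all eight sample rows are counted):

```python
# Tier-map upgrade detection over the sample plan changes.
TIER = {'Basic': 1, 'Standard': 2, 'Premium': 3}

changes = [  # (OldPlan, NewPlan), one row per user from the sample data
    ('Basic', 'Premium'), ('Standard', 'Premium'), ('Premium', 'Standard'),
    ('Basic', 'Standard'), ('Premium', 'Premium'), ('Basic', 'Premium'),
    ('Standard', 'Premium'), ('Standard', 'Basic'),
]

# An upgrade is any change to a strictly higher tier.
upgrades = sum(1 for old, new in changes if TIER[new] > TIER[old])
pct = round(100.0 * upgrades / len(changes), 2)
print(pct)
```

Five of the eight changes move to a higher tier, so the upgrade rate here is 62.5%.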
• Q.454
Question
Write a query to calculate the average revenue per user for those who used discount codes
versus those who didn’t from the DiscountUsage table, which has columns UserID,
DiscountCode, UsageDate, and Plan.

Explanation
To solve this:
• Identify users who used discount codes by checking for the presence of DiscountCode.
• Separate the users into two groups: those who used a discount code and those who did not.
• Calculate the average revenue per user for both groups.
• Join with the SubscriptionPlans table to map Plan to a revenue value (assuming the
Plan corresponds to specific revenue).

Datasets and SQL Schemas

• - Table creation and sample data


CREATE TABLE DiscountUsage (
UserID INT,
DiscountCode VARCHAR(50),
UsageDate DATE,
Plan VARCHAR(50)
);

-- Sample data insertions


INSERT INTO DiscountUsage (UserID, DiscountCode, UsageDate, Plan)
VALUES
(1, 'DISCOUNT10', '2023-06-01', 'Basic'),
(2, 'DISCOUNT15', '2023-06-15', 'Premium'),
(3, NULL, '2023-07-10', 'Standard'),
(4, NULL, '2023-07-20', 'Basic'),
(5, 'DISCOUNT20', '2023-08-05', 'Premium'),
(6, NULL, '2023-08-10', 'Standard'),
(7, 'DISCOUNT10', '2023-09-01', 'Basic');

Assume the SubscriptionPlans table is like this:


CREATE TABLE SubscriptionPlans (
Plan VARCHAR(50),
Revenue DECIMAL(10, 2)
);

-- Sample plan revenue insertions


INSERT INTO SubscriptionPlans (Plan, Revenue)
VALUES
('Basic', 9.99),
('Standard', 14.99),
('Premium', 19.99);

Learnings
• Using CASE statements to differentiate users who used a discount code and those who
didn’t.
• Calculating average revenue using AVG().
• Joins to map plans to revenue values.

Solutions
• - PostgreSQL solution
WITH RevenueData AS (
SELECT d.UserID,
d.DiscountCode,
sp.Revenue
FROM DiscountUsage d
JOIN SubscriptionPlans sp ON d.Plan = sp.Plan
)
SELECT
CASE
WHEN DiscountCode IS NOT NULL THEN 'With Discount'
ELSE 'Without Discount'
END AS DiscountGroup,
AVG(Revenue) AS AvgRevenuePerUser
FROM RevenueData
GROUP BY DiscountGroup;
• - MySQL solution
WITH RevenueData AS (
SELECT d.UserID,
d.DiscountCode,
sp.Revenue
FROM DiscountUsage d
JOIN SubscriptionPlans sp ON d.Plan = sp.Plan
)
SELECT
    CASE
        WHEN DiscountCode IS NOT NULL THEN 'With Discount'
        ELSE 'Without Discount'
    END AS DiscountGroup,
    AVG(Revenue) AS AvgRevenuePerUser
FROM RevenueData
GROUP BY DiscountGroup;
• Q.455

Question
Write a query to calculate the average lifetime value (LTV) of a Netflix subscriber using
tables UserRevenue (UserID, SubscriptionAmount, PaymentDate) and UserRetention
(UserID, RetentionDays).

Explanation
To calculate the average LTV:
• First, calculate the total revenue generated by each user. This is done by summing up the
SubscriptionAmount from UserRevenue for each UserID.
• Multiply the total revenue by the average retention period for that user from the
UserRetention table (in days).
• Calculate the average LTV across all users.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE UserRevenue (
UserID INT,
SubscriptionAmount DECIMAL(10, 2),
PaymentDate DATE
);

CREATE TABLE UserRetention (
    UserID INT,
    RetentionDays INT
);

-- Sample data insertions


INSERT INTO UserRevenue (UserID, SubscriptionAmount, PaymentDate)
VALUES
(1, 9.99, '2023-01-01'),
(1, 9.99, '2023-02-01'),
(2, 14.99, '2023-03-01'),
(2, 14.99, '2023-04-01'),
(3, 19.99, '2023-01-15'),
(3, 19.99, '2023-02-15');

INSERT INTO UserRetention (UserID, RetentionDays)


VALUES
(1, 180), -- User 1 stayed for 180 days
(2, 120), -- User 2 stayed for 120 days
(3, 90); -- User 3 stayed for 90 days

Learnings
• Using SUM() to aggregate subscription amounts per user.
• Calculating total revenue per user by multiplying SubscriptionAmount with
RetentionDays.
• Using AVG() to find the average LTV across all users.

Solutions

• - PostgreSQL solution
WITH UserTotalRevenue AS (
SELECT ur.UserID,
SUM(ur.SubscriptionAmount) AS TotalRevenue,
uret.RetentionDays
FROM UserRevenue ur
JOIN UserRetention uret ON ur.UserID = uret.UserID
GROUP BY ur.UserID, uret.RetentionDays
)
SELECT AVG(TotalRevenue * RetentionDays) AS AverageLTV
FROM UserTotalRevenue;
• - MySQL solution
WITH UserTotalRevenue AS (
SELECT ur.UserID,
SUM(ur.SubscriptionAmount) AS TotalRevenue,
uret.RetentionDays
FROM UserRevenue ur
JOIN UserRetention uret ON ur.UserID = uret.UserID
GROUP BY ur.UserID, uret.RetentionDays
)
SELECT AVG(TotalRevenue * RetentionDays) AS AverageLTV
FROM UserTotalRevenue;
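The book's LTV formula (total revenue per user multiplied by that user's retention days, averaged over users) can be reproduced on the sample rows in a few lines of Python. This is only a sketch of the arithmetic, with the sample data hardcoded:

```python
# Average LTV = AVG(total revenue per user * retention days), per the formula above.
from collections import defaultdict

payments = [  # (UserID, SubscriptionAmount) from the sample data
    (1, 9.99), (1, 9.99),
    (2, 14.99), (2, 14.99),
    (3, 19.99), (3, 19.99),
]
retention = {1: 180, 2: 120, 3: 90}  # RetentionDays per user

revenue = defaultdict(float)
for user_id, amount in payments:
    revenue[user_id] += amount  # SUM(SubscriptionAmount) per user

ltvs = [revenue[u] * days for u, days in retention.items()]
average_ltv = round(sum(ltvs) / len(ltvs), 2)
print(average_ltv)
```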
• Q.456
Question
Write a query to calculate the percentage of churned users who resubscribed within 6 months
from the ChurnLogs table, which has columns UserID, ChurnDate, ResubscriptionDate,
and Region.

Explanation
To solve this:
• Identify the users who have a ResubscriptionDate within 6 months of their ChurnDate.
• Calculate the total number of users who churned (i.e., the total number of unique UserIDs
with a ChurnDate).
• Calculate the number of users who churned and then resubscribed within 6 months.
• Compute the percentage of users who resubscribed within 6 months relative to the total
number of churned users.

Datasets and SQL Schemas


• - Table creation and sample data
CREATE TABLE ChurnLogs (
UserID INT,
ChurnDate DATE,
ResubscriptionDate DATE,
Region VARCHAR(50)
);

-- Sample data insertions


INSERT INTO ChurnLogs (UserID, ChurnDate, ResubscriptionDate, Region)
VALUES
(1, '2023-05-01', '2023-07-01', 'North America'),
(2, '2023-06-15', '2023-12-01', 'Europe'),
(3, '2023-07-10', NULL, 'Asia'),
(4, '2023-08-20', '2023-09-15', 'North America'),
(5, '2023-09-05', '2024-01-10', 'Europe'),
(6, '2023-10-01', NULL, 'Asia');

Learnings

• Using DATEDIFF() (or equivalent) to calculate the difference in days between ChurnDate
and ResubscriptionDate.
• Identifying users who resubscribed within 6 months of churn using conditional logic.
• Calculating percentage values using COUNT() for the numerator and denominator.

Solutions
• - PostgreSQL solution
WITH ChurnedUsers AS (
SELECT UserID,
           CASE
               WHEN ResubscriptionDate IS NOT NULL AND ResubscriptionDate <= ChurnDate + INTERVAL '6 months' THEN 1
               ELSE 0
           END AS ResubscribedWithin6Months
FROM ChurnLogs
)
SELECT
    ROUND(100.0 * COUNT(*) FILTER (WHERE ResubscribedWithin6Months = 1)
        / COUNT(UserID), 2) AS ResubscriptionPercentage
FROM ChurnedUsers;
• - MySQL solution
WITH ChurnedUsers AS (
SELECT UserID,
           CASE
               WHEN ResubscriptionDate IS NOT NULL AND DATEDIFF(ResubscriptionDate, ChurnDate) <= 180 THEN 1
               ELSE 0
           END AS ResubscribedWithin6Months
FROM ChurnLogs
)
SELECT
ROUND(100.0 * COUNT(CASE WHEN ResubscribedWithin6Months = 1 THEN 1 END)
/ COUNT(UserID), 2) AS ResubscriptionPercentage
FROM ChurnedUsers;
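The day-count version of the check (180 days, matching the MySQL `DATEDIFF` branch) is easy to verify in Python over the sample rows. A sketch, with the churn log hardcoded from the sample inserts:

```python
# Share of churned users who resubscribed within 180 days.
from datetime import date

churn_logs = [  # (ChurnDate, ResubscriptionDate or None) from the sample data
    (date(2023, 5, 1), date(2023, 7, 1)),
    (date(2023, 6, 15), date(2023, 12, 1)),
    (date(2023, 7, 10), None),
    (date(2023, 8, 20), date(2023, 9, 15)),
    (date(2023, 9, 5), date(2024, 1, 10)),
    (date(2023, 10, 1), None),
]

within_180 = sum(
    1 for churned, resub in churn_logs
    if resub is not None and (resub - churned).days <= 180
)
pct = round(100.0 * within_180 / len(churn_logs), 2)
print(pct)
```

Four of the six churned users come back within 180 days, giving 66.67%.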
• Q.457

Question
Write a query to find content licenses expiring in the next 30 days and their total view count.

Explanation
The task is to identify content licenses that are expiring in the next 30 days. The query should
join the Licenses table with a Views table (assuming it exists) to calculate the total view
count for each content item. Use the CURRENT_DATE function to filter the licenses expiring
within the next 30 days.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Licenses (
ContentID INT PRIMARY KEY,
LicenseStartDate DATE,
LicenseEndDate DATE,
Region VARCHAR(50)
);

CREATE TABLE Views (
    ViewID INT PRIMARY KEY,
    ContentID INT,
    ViewCount INT,
    ViewDate DATE,
    FOREIGN KEY (ContentID) REFERENCES Licenses(ContentID)
);
• - Datasets
-- Licenses
INSERT INTO Licenses (ContentID, LicenseStartDate, LicenseEndDate, Region)
VALUES
(1, '2024-01-01', '2025-01-15', 'US'),
(2, '2024-03-01', '2025-02-15', 'EU'),
(3, '2024-05-01', '2025-02-01', 'APAC');

-- Views
INSERT INTO Views (ViewID, ContentID, ViewCount, ViewDate)
VALUES
(1, 1, 50, '2024-12-01'),
(2, 1, 30, '2024-12-02'),
(3, 2, 100, '2024-12-01'),
(4, 3, 150, '2024-12-05');

Learnings
• Using CURRENT_DATE to filter based on a relative date range
• Joining tables to combine license data with view counts
• Using SUM() to aggregate view counts for each content

Solutions
• - PostgreSQL solution
SELECT l.ContentID,
COALESCE(SUM(v.ViewCount), 0) AS TotalViewCount
FROM Licenses l
LEFT JOIN Views v ON l.ContentID = v.ContentID
WHERE l.LicenseEndDate BETWEEN CURRENT_DATE AND CURRENT_DATE + INTERVAL '30 days'
GROUP BY l.ContentID;
• - MySQL solution
SELECT l.ContentID,
COALESCE(SUM(v.ViewCount), 0) AS TotalViewCount
FROM Licenses l
LEFT JOIN Views v ON l.ContentID = v.ContentID
WHERE l.LicenseEndDate BETWEEN CURDATE() AND CURDATE() + INTERVAL 30 DAY
GROUP BY l.ContentID;
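The left-join-plus-COALESCE shape can be mirrored in Python. Here a fixed "today" of 2025-01-10 is an assumption standing in for `CURRENT_DATE`, and the licenses and views are copied from the sample inserts:

```python
# Licenses expiring within 30 days of a fixed reference date, with view totals
# (missing views default to 0, like COALESCE(SUM(...), 0) after a LEFT JOIN).
from datetime import date, timedelta
from collections import defaultdict

licenses = {1: date(2025, 1, 15), 2: date(2025, 2, 15), 3: date(2025, 2, 1)}
views = [(1, 50), (1, 30), (2, 100), (3, 150)]  # (ContentID, ViewCount)

today = date(2025, 1, 10)  # assumption: fixed stand-in for CURRENT_DATE
window_end = today + timedelta(days=30)

view_totals = defaultdict(int)
for content_id, count in views:
    view_totals[content_id] += count

expiring = {
    cid: view_totals.get(cid, 0)
    for cid, end in licenses.items()
    if today <= end <= window_end
}
print(expiring)
```

Content 2's license ends outside the 30-day window, so only contents 1 and 3 appear.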
• Q.458

Question
Write a query to find the top-performing content in each genre based on a combined metric of
total views and average watch duration.

Explanation
The goal is to identify the top-performing content in each genre, where the performance is
determined by a combination of total views and average watch duration. The query should
group the data by genre and calculate a weighted or combined score based on total views and
average watch duration, then return the content with the highest score in each genre.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE ContentPerformance (
ContentID INT PRIMARY KEY,
Genre VARCHAR(50),
TotalViews INT,
WatchDuration DECIMAL(10, 2) -- Assuming average watch duration in minutes
);

• - Datasets
-- ContentPerformance
INSERT INTO ContentPerformance (ContentID, Genre, TotalViews, WatchDuration)
VALUES
(1, 'Action', 1000, 45.5),
(2, 'Action', 1200, 40.0),
(3, 'Comedy', 800, 25.0),
(4, 'Comedy', 950, 30.0),
(5, 'Drama', 600, 50.0),
(6, 'Drama', 1100, 48.0);

Learnings
• Grouping data by genre
• Calculating combined performance metric (e.g., TotalViews * WatchDuration)
• Using aggregation functions (SUM(), AVG()) for performance metrics
• Using ROW_NUMBER() or similar window functions to rank content per genre

Solutions
• - PostgreSQL solution
WITH RankedContent AS (
SELECT ContentID,
Genre,
TotalViews,
WatchDuration,
           (TotalViews * WatchDuration) AS PerformanceScore,
           ROW_NUMBER() OVER (PARTITION BY Genre ORDER BY (TotalViews * WatchDuration) DESC) AS rn
    FROM ContentPerformance
)
SELECT ContentID, Genre, TotalViews, WatchDuration
FROM RankedContent
WHERE rn = 1;
• - MySQL solution
WITH RankedContent AS (
SELECT ContentID,
Genre,
TotalViews,
WatchDuration,
           (TotalViews * WatchDuration) AS PerformanceScore,
           ROW_NUMBER() OVER (PARTITION BY Genre ORDER BY (TotalViews * WatchDuration) DESC) AS rn
    FROM ContentPerformance
)
SELECT ContentID, Genre, TotalViews, WatchDuration
FROM RankedContent
WHERE rn = 1;
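The per-genre "top 1" pattern is just an argmax per group. A Python sketch over the sample rows, using the same views-times-duration score:

```python
# Pick the highest-scoring content per genre (argmax per group).
rows = [  # (ContentID, Genre, TotalViews, WatchDuration) from the sample data
    (1, 'Action', 1000, 45.5), (2, 'Action', 1200, 40.0),
    (3, 'Comedy', 800, 25.0), (4, 'Comedy', 950, 30.0),
    (5, 'Drama', 600, 50.0), (6, 'Drama', 1100, 48.0),
]

best = {}  # genre -> (content_id, best score seen so far)
for content_id, genre, views, duration in rows:
    score = views * duration
    if genre not in best or score > best[genre][1]:
        best[genre] = (content_id, score)

top_per_genre = {g: cid for g, (cid, _score) in best.items()}
print(top_per_genre)
```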
• Q.459

Question
Write a SQL query to identify the top 10 VIP users for Netflix, based on the most watched
hours of content in the last month.

Explanation
The goal is to identify the top 10 users with the highest total watch time in the past month.
This involves:
• Joining the users table with the watching_activity table.
• Filtering the activity data for the past month.
• Summing the hours_watched for each user.

• Sorting the users by the total watch time in descending order.
• Limiting the result to the top 10 users.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE users (
user_id INT PRIMARY KEY,
sign_up_date DATE,
subscription_type VARCHAR(50)
);

CREATE TABLE watching_activity (
    activity_id INT PRIMARY KEY,
user_id INT,
date_time TIMESTAMP,
show_id INT,
hours_watched DECIMAL(5, 2),
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
• - Datasets
-- Users
INSERT INTO users (user_id, sign_up_date, subscription_type)
VALUES
(435, '2020-08-20', 'Standard'),
(278, '2021-01-01', 'Premium'),
(529, '2021-09-15', 'Basic'),
(692, '2021-12-28', 'Standard'),
(729, '2022-01-06', 'Premium');

-- Watching Activity
INSERT INTO watching_activity (activity_id, user_id, date_time, show_id, hours_watched)
VALUES
(10355, 435, '2022-02-09 12:30:00', 12001, 2.5),
(14872, 278, '2022-02-10 14:15:00', 17285, 1.2),
(12293, 529, '2022-02-18 21:10:00', 12001, 4.3),
(16352, 692, '2022-02-20 19:00:00', 17285, 3.7),
(17485, 729, '2022-02-25 16:45:00', 17285, 1.9);

Learnings
• Using JOIN to combine data from multiple tables.
• Filtering data based on date ranges (last month).
• Using aggregation (SUM()) to calculate total hours watched.
• Sorting the result in descending order and limiting the number of results.

Solutions
• - PostgreSQL solution
SELECT users.user_id, SUM(watching_activity.hours_watched) AS total_hours_watched
FROM users
JOIN watching_activity ON users.user_id = watching_activity.user_id
WHERE watching_activity.date_time >= date_trunc('month', CURRENT_DATE - INTERVAL '1 month')
  AND watching_activity.date_time < date_trunc('month', CURRENT_DATE)
GROUP BY users.user_id
ORDER BY total_hours_watched DESC
LIMIT 10;
• - MySQL solution
SELECT users.user_id, SUM(watching_activity.hours_watched) AS total_hours_watched
FROM users
JOIN watching_activity ON users.user_id = watching_activity.user_id
WHERE watching_activity.date_time >= DATE_FORMAT(CURDATE() - INTERVAL 1 MONTH, '%Y-%m-01')
  AND watching_activity.date_time < DATE_FORMAT(CURDATE(), '%Y-%m-01')
GROUP BY users.user_id
ORDER BY total_hours_watched DESC
LIMIT 10;
• Q.460

Question
Write a SQL query to calculate the average rating for each show within a given month. The
results should be ordered by month and then by average rating in descending order.

Explanation
The task is to calculate the average rating for each show in each month. This involves:
• Extracting the month from the review_date.
• Grouping the data by show_id and month.
• Calculating the average rating (AVG(stars)).
• Sorting the results first by month and then by the average rating in descending order.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE show_reviews (
review_id INT PRIMARY KEY,
user_id INT,
review_date TIMESTAMP,
show_id INT,
stars INT
);
• - Datasets
-- show_reviews
INSERT INTO show_reviews (review_id, user_id, review_date, show_id, stars)
VALUES
(6171, 123, '2022-06-08 00:00:00', 50001, 4),
(7802, 265, '2022-06-10 00:00:00', 69852, 4),
(5293, 362, '2022-06-18 00:00:00', 50001, 3),
(6352, 192, '2022-07-26 00:00:00', 69852, 3),
(4517, 981, '2022-07-05 00:00:00', 69852, 2);

Learnings
• Using EXTRACT(MONTH FROM date) to extract the month part from a TIMESTAMP or DATE.
• Aggregating data with AVG() to calculate the average rating.
• Grouping data by multiple columns (show_id and month).
• Sorting results using ORDER BY.

Solutions
• - PostgreSQL solution
SELECT
EXTRACT(MONTH FROM review_date) AS mth,
show_id,
AVG(stars) AS avg_stars
FROM
show_reviews
GROUP BY
mth,
show_id
ORDER BY
mth,
avg_stars DESC;
• - MySQL solution
SELECT
MONTH(review_date) AS mth,
show_id,
AVG(stars) AS avg_stars
FROM
show_reviews

GROUP BY
mth,
show_id
ORDER BY
mth,
avg_stars DESC;
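The group-by-(month, show) average is straightforward to replay in Python. A sketch with the months pre-extracted from the sample review dates:

```python
# Average stars per (month, show_id) over the sample reviews.
from collections import defaultdict

reviews = [  # (month, show_id, stars), months taken from the sample review_date values
    (6, 50001, 4), (6, 69852, 4), (6, 50001, 3),
    (7, 69852, 3), (7, 69852, 2),
]

stars = defaultdict(list)
for month, show_id, rating in reviews:
    stars[(month, show_id)].append(rating)

avg_stars = {key: sum(v) / len(v) for key, v in stars.items()}
print(avg_stars)
```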

Uber
• Q.461
Question
Write a query to find the total number of rides each user has taken.
Explanation
This query should count the number of rides per user. It involves using the COUNT() function
to aggregate the rides and grouping by user_id.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE rides (
ride_id INT PRIMARY KEY,
user_id INT,
ride_date TIMESTAMP,
ride_distance DECIMAL(10, 2),
fare DECIMAL(10, 2)
);
• - Datasets
-- Rides
INSERT INTO rides (ride_id, user_id, ride_date, ride_distance, fare)
VALUES
(1, 101, '2023-01-01 08:00:00', 5.0, 15.50),
(2, 102, '2023-01-01 09:00:00', 10.0, 25.00),
(3, 101, '2023-01-02 10:00:00', 3.5, 10.00),
(4, 103, '2023-01-02 11:00:00', 7.0, 18.50),
(5, 101, '2023-01-03 12:00:00', 4.0, 12.00);

Learnings
• Using COUNT() for aggregation.
• Grouping by user_id to get counts per user.
• Basic JOIN operations (if needed).
Solutions
• - PostgreSQL solution
SELECT user_id, COUNT(ride_id) AS total_rides
FROM rides
GROUP BY user_id;
• - MySQL solution
SELECT user_id, COUNT(ride_id) AS total_rides
FROM rides
GROUP BY user_id;

• Q.462
Question
Write a query to find the total number of active users who have taken at least one ride in the
last 7 days.
Explanation

This query should filter users who have taken at least one ride in the last 7 days. It involves
using COUNT(DISTINCT user_id) to get the number of unique active users.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE rides (
ride_id INT PRIMARY KEY,
user_id INT,
ride_date TIMESTAMP,
ride_distance DECIMAL(10, 2),
fare DECIMAL(10, 2)
);
• - Datasets
-- Rides
INSERT INTO rides (ride_id, user_id, ride_date, ride_distance, fare)
VALUES
(1, 101, '2023-01-01 08:00:00', 5.0, 15.50),
(2, 102, '2023-01-02 09:00:00', 10.0, 25.00),
(3, 103, '2023-01-07 10:00:00', 3.5, 12.00),
(4, 101, '2023-01-08 12:00:00', 7.0, 18.50),
(5, 104, '2023-01-09 14:00:00', 4.0, 14.00);

Learnings
• Using date functions to filter by the last 7 days (NOW() - INTERVAL '7 days').
• Aggregating with COUNT(DISTINCT user_id) for unique active users.
• Basic JOIN operations (if needed).
Solutions
• - PostgreSQL solution
SELECT COUNT(DISTINCT user_id) AS active_users
FROM rides
WHERE ride_date >= NOW() - INTERVAL '7 days';
• - MySQL solution
SELECT COUNT(DISTINCT user_id) AS active_users
FROM rides
WHERE ride_date >= CURDATE() - INTERVAL 7 DAY;
• Q.463
Question
Write a query to find the average fare for Uber rides taken during each hour of the day.
Explanation
This query should calculate the average fare per hour of the day, which requires extracting
the hour from the ride_date and grouping by that hour.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE rides (
ride_id INT PRIMARY KEY,
user_id INT,
ride_date TIMESTAMP,
ride_distance DECIMAL(10, 2),
fare DECIMAL(10, 2)
);
• - Datasets
-- Rides
INSERT INTO rides (ride_id, user_id, ride_date, ride_distance, fare)
VALUES
(1, 101, '2023-01-01 08:15:00', 5.0, 15.50),
(2, 102, '2023-01-01 09:30:00', 10.0, 25.00),
(3, 101, '2023-01-01 10:45:00', 3.5, 12.00),
(4, 103, '2023-01-01 08:30:00', 7.0, 18.50),
(5, 101, '2023-01-01 13:00:00', 4.0, 14.00);

Learnings
• Using EXTRACT(HOUR FROM ride_date) to get the hour part.
• Aggregating with AVG(fare) for average fare.
• Grouping by hour for analysis.
Solutions
• - PostgreSQL solution
SELECT EXTRACT(HOUR FROM ride_date) AS hour_of_day, AVG(fare) AS average_fare
FROM rides
GROUP BY hour_of_day
ORDER BY hour_of_day;
• - MySQL solution
SELECT HOUR(ride_date) AS hour_of_day, AVG(fare) AS average_fare
FROM rides
GROUP BY hour_of_day
ORDER BY hour_of_day;
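Extracting the hour and averaging fares is easy to mirror in Python, which also makes the expected result on the sample data concrete. A sketch with the rows hardcoded from the sample inserts:

```python
# Average fare per hour of day, mirroring EXTRACT(HOUR FROM ride_date).
from collections import defaultdict
from datetime import datetime

rides = [  # (ride_date, fare) from the sample data
    ('2023-01-01 08:15:00', 15.50), ('2023-01-01 09:30:00', 25.00),
    ('2023-01-01 10:45:00', 12.00), ('2023-01-01 08:30:00', 18.50),
    ('2023-01-01 13:00:00', 14.00),
]

fares = defaultdict(list)
for stamp, fare in rides:
    hour = datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S').hour
    fares[hour].append(fare)

average_fare = {h: round(sum(v) / len(v), 2) for h, v in sorted(fares.items())}
print(average_fare)
```

The two 8 a.m. rides average to 17.00; every other hour has a single ride.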
• Q.464

Question
Write a query to calculate the total earnings for each driver in the past month. The total
earnings are the sum of the fare for all rides completed by each driver.
Explanation
This query involves filtering rides from the last month, grouping by driver_id, and
calculating the total earnings using the SUM() function.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE rides (
ride_id INT PRIMARY KEY,
user_id INT,
driver_id INT,
ride_date TIMESTAMP,
fare DECIMAL(10, 2)
);
• - Datasets
-- Rides
INSERT INTO rides (ride_id, user_id, driver_id, ride_date, fare)
VALUES
(1, 101, 201, '2023-02-10 08:00:00', 15.50),
(2, 102, 202, '2023-02-15 09:00:00', 25.00),
(3, 103, 201, '2023-02-20 10:00:00', 18.50),
(4, 104, 201, '2023-03-01 11:00:00', 12.00),
(5, 105, 202, '2023-02-25 12:00:00', 14.00);

Learnings
• Using SUM() for aggregation.
• Filtering data for the last month.
• Grouping by driver_id.
Solutions
• - PostgreSQL solution
SELECT driver_id, SUM(fare) AS total_earnings
FROM rides
WHERE ride_date >= NOW() - INTERVAL '1 month'
GROUP BY driver_id;
• - MySQL solution
SELECT driver_id, SUM(fare) AS total_earnings
FROM rides
WHERE ride_date >= CURDATE() - INTERVAL 1 MONTH
GROUP BY driver_id;
• Q.465
Question
Write a query to find the top 5 users who have traveled the most total distance in Uber rides.
Explanation
This query should sum up the total distance traveled by each user and return the top 5 users
who traveled the most. Use the SUM() function to calculate the total distance and order the
results by the sum in descending order.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE rides (
ride_id INT PRIMARY KEY,
user_id INT,
ride_date TIMESTAMP,
ride_distance DECIMAL(10, 2),
fare DECIMAL(10, 2)
);
• - Datasets
-- Rides
INSERT INTO rides (ride_id, user_id, ride_date, ride_distance, fare)
VALUES
(1, 101, '2023-01-01 08:00:00', 5.0, 15.50),
(2, 102, '2023-01-02 09:00:00', 10.0, 25.00),
(3, 103, '2023-01-03 10:00:00', 7.0, 18.50),
(4, 104, '2023-01-04 11:00:00', 15.0, 35.00),
(5, 101, '2023-01-05 12:00:00', 4.0, 12.00),
(6, 105, '2023-01-06 13:00:00', 3.5, 14.00);

Learnings
• Using SUM() to calculate the total distance traveled by each user.
• Sorting the results in descending order.
• Using LIMIT to get the top 5 users.
Solutions
• - PostgreSQL solution
SELECT user_id, SUM(ride_distance) AS total_distance
FROM rides
GROUP BY user_id
ORDER BY total_distance DESC
LIMIT 5;
• - MySQL solution
SELECT user_id, SUM(ride_distance) AS total_distance
FROM rides
GROUP BY user_id
ORDER BY total_distance DESC
LIMIT 5;
• Q.466
Question
Write a query to find the number of Uber rides taken on each day of the week. Display the
day of the week and the total number of rides taken on that day.

Explanation
This query should extract the day of the week from the ride_date and then count the number
of rides taken on each day of the week. The EXTRACT() function can be used to get the day of
the week.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE rides (
ride_id INT PRIMARY KEY,
user_id INT,
ride_date TIMESTAMP,
ride_distance DECIMAL(10, 2),
fare DECIMAL(10, 2)
);
• - Datasets
-- Rides
INSERT INTO rides (ride_id, user_id, ride_date, ride_distance, fare)
VALUES
(1, 101, '2023-02-01 08:00:00', 5.0, 15.50),
(2, 102, '2023-02-02 09:00:00', 10.0, 25.00),
(3, 103, '2023-02-03 10:00:00', 7.0, 18.50),
(4, 104, '2023-02-03 11:00:00', 3.5, 12.00),
(5, 101, '2023-02-04 12:00:00', 6.0, 16.00),
(6, 105, '2023-02-05 13:00:00', 4.0, 14.00);

Learnings
• Using EXTRACT(DOW FROM ride_date) or DAYOFWEEK(ride_date) to get the day of the
week.
• Aggregating using COUNT() to find the number of rides per day.
• Grouping by day of the week.
Solutions
• - PostgreSQL solution
SELECT EXTRACT(DOW FROM ride_date) AS day_of_week, COUNT(ride_id) AS total_rides
FROM rides
GROUP BY day_of_week
ORDER BY day_of_week;
• - MySQL solution
SELECT DAYOFWEEK(ride_date) AS day_of_week, COUNT(ride_id) AS total_rides
FROM rides
GROUP BY day_of_week
ORDER BY day_of_week;
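One subtlety worth remembering: PostgreSQL's `EXTRACT(DOW ...)` numbers Sunday as 0 while MySQL's `DAYOFWEEK()` numbers Sunday as 1, so the two solutions label days differently. Counting by day name sidesteps the numbering entirely, as this Python sketch over the sample dates shows:

```python
# Rides per day of week, keyed by weekday name to avoid DOW numbering quirks.
from collections import Counter
from datetime import date

ride_dates = [  # dates from the sample data
    date(2023, 2, 1), date(2023, 2, 2), date(2023, 2, 3),
    date(2023, 2, 3), date(2023, 2, 4), date(2023, 2, 5),
]

rides_per_day = Counter(d.strftime('%A') for d in ride_dates)
print(rides_per_day)
```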
• Q.467
Question
Assume you are given the table below on Uber transactions made by users. Write a query to
obtain the third transaction of every user. Output the user id, spend, and transaction date.
Explanation
To get the third transaction for each user, we need to:
• Rank the transactions of each user based on the transaction_date.
• Filter to only include the third transaction for each user.
We can use the
ROW_NUMBER() window function to rank transactions for each user and then filter out those
that are ranked 3.
Datasets and SQL Schemas

• - Table creation
CREATE TABLE transactions (
user_id INT,
spend DECIMAL(10, 2),
transaction_date TIMESTAMP
);
• - Datasets
-- Transactions
INSERT INTO transactions (user_id, spend, transaction_date)
VALUES
(111, 100.50, '2022-01-08 12:00:00'),
(111, 55.00, '2022-01-10 12:00:00'),
(121, 36.00, '2022-01-18 12:00:00'),
(145, 24.99, '2022-01-26 12:00:00'),
(111, 89.60, '2022-02-05 12:00:00');

Learnings
• Using window functions (ROW_NUMBER()) to rank data within partitions.
• Filtering data based on the row number to get specific entries, such as the third transaction.
Solutions
• - PostgreSQL solution
WITH ranked_transactions AS (
SELECT
user_id,
spend,
transaction_date,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY transaction_date) AS transaction_rank
FROM transactions
)
SELECT user_id, spend, transaction_date
FROM ranked_transactions
WHERE transaction_rank = 3;
• - MySQL solution
WITH ranked_transactions AS (
SELECT
user_id,
spend,
transaction_date,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY transaction_date) AS transaction_rank
FROM transactions
)
SELECT user_id, spend, transaction_date
FROM ranked_transactions
WHERE transaction_rank = 3;

Key Points:
• The ROW_NUMBER() function is used to assign a sequential rank to each transaction for each
user.
• PARTITION BY user_id ensures the ranking is reset for each user.
• WHERE transaction_rank = 3 filters for the third transaction in the sequence.
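The `ROW_NUMBER()` idea can be emulated in plain Python: sort each user's transactions by date and take the third, if it exists. A sketch over the sample rows:

```python
# Third transaction per user: sort within each partition, take index 2.
from collections import defaultdict

transactions = [  # (user_id, spend, transaction_date) from the sample data
    (111, 100.50, '2022-01-08 12:00:00'),
    (111, 55.00, '2022-01-10 12:00:00'),
    (121, 36.00, '2022-01-18 12:00:00'),
    (145, 24.99, '2022-01-26 12:00:00'),
    (111, 89.60, '2022-02-05 12:00:00'),
]

by_user = defaultdict(list)
for user_id, spend, stamp in transactions:
    by_user[user_id].append((stamp, spend))

third = []
for user_id in sorted(by_user):
    rows = sorted(by_user[user_id])  # ISO timestamps sort chronologically
    if len(rows) >= 3:               # only users with a third transaction
        stamp, spend = rows[2]
        third.append((user_id, spend, stamp))
print(third)
```

Only user 111 has three transactions in the sample, so only that user appears in the output.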
• Q.468
Question
As a data analyst at Uber, it's your job to report the latest metrics for specific groups of Uber
users. Some riders create their Uber account the same day they book their first ride; the rider
engagement team calls them "in-the-moment" users.

Uber wants to know the average delay between the day of user sign-up and the day of their
2nd ride. Write a query to pull the average 2nd ride delay for "in-the-moment" Uber users.
Round the answer to 2-decimal places.
Explanation
"In-the-moment" users are those whose registration date matches the date of their first ride.
To find the average delay between the day of sign-up and the day of their 2nd ride:
• We first identify "in-the-moment" users (where the sign-up date equals the first ride date).
• For each "in-the-moment" user, we calculate the delay between the registration date and
the date of their second ride.
• Finally, we compute the average delay across all such users.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE users (
user_id INT PRIMARY KEY,
registration_date DATE
);

CREATE TABLE rides (
ride_id INT PRIMARY KEY,
user_id INT,
ride_date DATE
);
• - Datasets
-- Users
INSERT INTO users (user_id, registration_date)
VALUES
(1, '2022-08-15'),
(2, '2022-08-21');

-- Rides
INSERT INTO rides (ride_id, user_id, ride_date)
VALUES
(1, 1, '2022-08-15'),
(2, 1, '2022-08-16'),
(3, 2, '2022-09-20'),
(4, 2, '2022-09-23');

Learnings
• Using window functions to rank rides by date.
• Filtering for users who meet the "in-the-moment" criteria.
• Using DATEDIFF() or equivalent functions to calculate date differences.
Solutions
• - PostgreSQL solution
WITH in_the_moment_users AS (
SELECT u.user_id, u.registration_date,
(SELECT ride_date FROM rides r WHERE r.user_id = u.user_id ORDER BY ride_date LIMIT 1) AS first_ride_date,
(SELECT ride_date FROM rides r WHERE r.user_id = u.user_id ORDER BY ride_date LIMIT 1 OFFSET 1) AS second_ride_date
FROM users u
)
SELECT ROUND(AVG(second_ride_date - registration_date), 2) AS avg_2nd_ride_delay
FROM in_the_moment_users
WHERE registration_date = first_ride_date;
• - MySQL solution
WITH in_the_moment_users AS (
SELECT u.user_id, u.registration_date,
(SELECT ride_date FROM rides r WHERE r.user_id = u.user_id ORDER BY ride_date LIMIT 1) AS first_ride_date,
(SELECT ride_date FROM rides r WHERE r.user_id = u.user_id ORDER BY ride_date LIMIT 1 OFFSET 1) AS second_ride_date
FROM users u
)
SELECT ROUND(AVG(DATEDIFF(second_ride_date, registration_date)), 2) AS avg_2nd_ride_delay
FROM in_the_moment_users
WHERE registration_date = first_ride_date;

Key Points:
• Identifying "In-the-moment" Users: These are users whose registration date matches the
date of their first ride.
• Ranking Rides: Using ORDER BY to select the first and second rides for each user.
• Date Difference: Calculating the delay between registration and the second ride via date
subtraction (in PostgreSQL, subtracting two DATE values yields an integer number of days) or
DATEDIFF (MySQL).
• Rounding the Result: The ROUND() function ensures the delay is shown to 2 decimal
places.
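The same logic can be sanity-checked with Python's sqlite3 module. SQLite has neither DATEDIFF nor date subtraction, so julianday() is used here as a SQLite-specific substitute for the day difference:

```python
import sqlite3

# In-memory database mirroring the users/rides sample data above
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INT PRIMARY KEY, registration_date TEXT);
CREATE TABLE rides (ride_id INT PRIMARY KEY, user_id INT, ride_date TEXT);
INSERT INTO users VALUES (1, '2022-08-15'), (2, '2022-08-21');
INSERT INTO rides VALUES (1, 1, '2022-08-15'), (2, 1, '2022-08-16'),
                         (3, 2, '2022-09-20'), (4, 2, '2022-09-23');
""")
row = conn.execute("""
    WITH itm AS (
        SELECT u.user_id, u.registration_date,
               (SELECT ride_date FROM rides r WHERE r.user_id = u.user_id
                ORDER BY ride_date LIMIT 1) AS first_ride,
               (SELECT ride_date FROM rides r WHERE r.user_id = u.user_id
                ORDER BY ride_date LIMIT 1 OFFSET 1) AS second_ride
        FROM users u
    )
    SELECT ROUND(AVG(julianday(second_ride) - julianday(registration_date)), 2)
    FROM itm
    WHERE registration_date = first_ride
""").fetchone()
print(row[0])  # 1.0 -- user 1 signed up and rode the same day; 2nd ride came a day later
```

User 2 registered on 2022-08-21 but first rode on 2022-09-20, so they are not "in-the-moment" and do not affect the average.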
• Q.469
Question
Uber has a diverse range of vehicles, from bikes and scooters to premium luxury cars. In order
to tailor its services better, Uber wants to understand its customers' preferences. The task is
to write a SQL query that identifies the most used vehicle type among Uber's customers in the
past year. To provide a more holistic view, the results should also exclude rides that were
cancelled by either the driver or the user.
Explanation
This query will:
• Filter out cancelled rides (cancelled = false).
• Limit the data to rides that occurred within the last year.
• Join the rides table with the vehicle_types table to fetch the vehicle type name.
• Count the number of rides for each vehicle type.
• Order the results by the total number of rides and return the vehicle type with the highest
usage.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE rides (
ride_id INT PRIMARY KEY,
user_id INT,
vehicle_type_id INT,
start_time TIMESTAMP,
end_time TIMESTAMP,
cancelled BOOLEAN
);

CREATE TABLE vehicle_types (
type_id INT PRIMARY KEY,
vehicle_type VARCHAR(50)
);
• - Datasets
-- Rides
INSERT INTO rides (ride_id, user_id, vehicle_type_id, start_time, end_time, cancelled)
VALUES
(88031, 61023, 5, '2021-07-01 08:15:00', '2021-07-01 08:45:00', false),
(88032, 61024, 1, '2021-07-01 09:15:00', '2021-07-01 09:45:00', false),
(88033, 61025, 2, '2021-07-01 10:15:00', '2021-07-01 10:45:00', true),
(88034, 61026, 5, '2021-07-01 11:15:00', '2021-07-01 11:45:00', false),
(88035, 61027, 3, '2021-07-01 12:15:00', '2021-07-01 12:45:00', false);

-- Vehicle Types
INSERT INTO vehicle_types (type_id, vehicle_type)
VALUES
(1, 'Bike'),
(2, 'Car'),
(3, 'SUV'),
(4, 'Luxury Car'),
(5, 'Scooter');

Learnings
• Using JOIN to combine data from multiple tables.
• Filtering data based on conditions such as cancelled = false and start_time >=
NOW() - INTERVAL '1 year'.
• Grouping data by vehicle type and using COUNT() to find the most popular vehicle type.
• Sorting the result in descending order and limiting to the top result with LIMIT 1.
Solutions
• - PostgreSQL solution
SELECT v.vehicle_type, COUNT(*) AS total_rides
FROM rides r
JOIN vehicle_types v ON r.vehicle_type_id = v.type_id
WHERE r.cancelled = false
AND r.start_time >= (NOW() - INTERVAL '1 year')
GROUP BY v.vehicle_type
ORDER BY total_rides DESC
LIMIT 1;
• - MySQL solution
SELECT v.vehicle_type, COUNT(*) AS total_rides
FROM rides r
JOIN vehicle_types v ON r.vehicle_type_id = v.type_id
WHERE r.cancelled = false
AND r.start_time >= CURDATE() - INTERVAL 1 YEAR
GROUP BY v.vehicle_type
ORDER BY total_rides DESC
LIMIT 1;

Key Points:
• Filtering Out Cancelled Rides: We exclude cancelled rides by filtering with
r.cancelled = false.
• Limiting by Date: We restrict the data to rides that occurred in the last year using NOW() -
INTERVAL '1 year' for PostgreSQL or CURDATE() - INTERVAL 1 YEAR for MySQL.
• Using COUNT: We count the number of rides for each vehicle type.
• Sorting and Limiting: The query sorts by the number of rides (total_rides) and returns
the vehicle type with the most rides.
• Q.470
Question
Uber has a diverse range of vehicles, from bikes and scooters to premium luxury cars. In order
to tailor its services better, Uber wants to understand its customers' preferences. The task is
to write a SQL query that identifies the most used vehicle type among Uber's customers in the
past year. To provide a more holistic view, the results should also exclude rides that were
cancelled by either the driver or the user.


Explanation
To find the most used vehicle type in the past year:
• Filter out the cancelled rides (cancelled = false).
• Limit the data to rides that occurred within the past year.
• Join the rides table with the vehicle_types table to fetch vehicle names.
• Group by vehicle type and count the number of rides for each type.
• Sort the results by the total number of rides and return the most used vehicle type.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE rides (
ride_id INT PRIMARY KEY,
user_id INT,
vehicle_type_id INT,
start_time TIMESTAMP,
end_time TIMESTAMP,
cancelled BOOLEAN
);

CREATE TABLE vehicle_types (
type_id INT PRIMARY KEY,
vehicle_type VARCHAR(50)
);
• - Datasets
-- Rides
INSERT INTO rides (ride_id, user_id, vehicle_type_id, start_time, end_time, cancelled)
VALUES
(88031, 61023, 5, '2021-07-01 08:15:00', '2021-07-01 08:45:00', false),
(88032, 61024, 1, '2021-07-01 09:15:00', '2021-07-01 09:45:00', false),
(88033, 61025, 2, '2021-07-01 10:15:00', '2021-07-01 10:45:00', true),
(88034, 61026, 5, '2021-07-01 11:15:00', '2021-07-01 11:45:00', false),
(88035, 61027, 3, '2021-07-01 12:15:00', '2021-07-01 12:45:00', false);

-- Vehicle Types
INSERT INTO vehicle_types (type_id, vehicle_type)
VALUES
(1, 'Bike'),
(2, 'Car'),
(3, 'SUV'),
(4, 'Luxury Car'),
(5, 'Scooter');

Learnings
• Joining Tables: The query uses JOIN to combine the rides and vehicle_types tables.
• Filtering Cancelled Rides: We exclude cancelled rides by checking r.cancelled =
false.
• Date Filtering: The query limits results to the past year by using NOW() - INTERVAL '1
year' for PostgreSQL or CURDATE() - INTERVAL 1 YEAR for MySQL.
• Counting Rides: The number of rides for each vehicle type is counted using COUNT(*).
• Sorting and Limiting: The results are sorted in descending order by the ride count, and
only the top result (most used vehicle type) is returned.
Solutions
• - PostgreSQL solution
SELECT v.vehicle_type, COUNT(*) AS total_rides
FROM rides r
JOIN vehicle_types v ON r.vehicle_type_id = v.type_id
WHERE r.cancelled = false
AND r.start_time >= (NOW() - INTERVAL '1 year')
GROUP BY v.vehicle_type
ORDER BY total_rides DESC
LIMIT 1;
• - MySQL solution
SELECT v.vehicle_type, COUNT(*) AS total_rides
FROM rides r
JOIN vehicle_types v ON r.vehicle_type_id = v.type_id
WHERE r.cancelled = false
AND r.start_time >= CURDATE() - INTERVAL 1 YEAR
GROUP BY v.vehicle_type
ORDER BY total_rides DESC
LIMIT 1;

Key Points:
• Filtering Out Cancelled Rides: By using r.cancelled = false, we ensure that only
completed rides are considered.
• Limiting by Date: We restrict the data to the past year using NOW() - INTERVAL '1
year' (PostgreSQL) or CURDATE() - INTERVAL 1 YEAR (MySQL).
• Counting Rides: The COUNT() function is used to calculate the number of rides for each
vehicle type.
• Sorting and Limiting: The results are ordered by the number of rides (total_rides), and
we use LIMIT 1 to get the vehicle type with the most rides.
• Q.471
Question
As a data analyst for Uber, you are asked to determine each driver's average ratings for each
city. This will help Uber monitor performance and perhaps highlight any problems that might
be arising in any specific city.
We have two tables, rides and ratings.
Explanation
This query will:
• Join the rides table with the ratings table using ride_id to link the corresponding ride
and its rating.
• Group the results by driver_id and city to calculate the average rating per driver per
city.
• Use the AVG() function to compute the average rating for each group.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE rides (
ride_id INT PRIMARY KEY,
driver_id INT,
city VARCHAR(100),
fare_amount DECIMAL
);

CREATE TABLE ratings (
ride_id INT PRIMARY KEY,
rating DECIMAL(3, 1)
);
• - Datasets
-- Rides
INSERT INTO rides (ride_id, driver_id, city, fare_amount)
VALUES
(101, 201, 'New York', 25.50),
(102, 202, 'San Francisco', 18.00),
(103, 203, 'Chicago', 22.75),
(104, 201, 'San Francisco', 30.00),
(105, 202, 'New York', 20.00);

-- Ratings
INSERT INTO ratings (ride_id, rating)
VALUES
(101, 4.3),
(102, 4.1),
(103, 4.8),
(104, 4.7),
(105, 3.9);

Learnings
• Joining Tables: Use INNER JOIN to combine data from the rides and ratings tables
based on the ride_id.
• Grouping Data: Use GROUP BY to aggregate the data by driver_id and city.
• Aggregating with AVG(): Use AVG() to calculate the average rating for each group of
driver_id and city.

Solutions
• - PostgreSQL solution
SELECT r.driver_id, r.city, AVG(rt.rating) AS avg_rating
FROM rides r
INNER JOIN ratings rt ON r.ride_id = rt.ride_id
GROUP BY r.driver_id, r.city;
• - MySQL solution
SELECT r.driver_id, r.city, AVG(rt.rating) AS avg_rating
FROM rides r
INNER JOIN ratings rt ON r.ride_id = rt.ride_id
GROUP BY r.driver_id, r.city;

Key Points:
• INNER JOIN: Combines rides and ratings based on the matching ride_id.
• Grouping: Aggregates data by both driver_id and city using GROUP BY.
• AVG(): Computes the average rating for each group.
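The join-and-aggregate pattern can be verified with a quick sqlite3 sketch (SQLite here stands in for the PostgreSQL/MySQL dialects above; the ORDER BY is added only to make the output deterministic):

```python
import sqlite3

# In-memory copy of the rides/ratings sample data above
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rides (ride_id INT PRIMARY KEY, driver_id INT, city TEXT, fare_amount REAL);
CREATE TABLE ratings (ride_id INT PRIMARY KEY, rating REAL);
INSERT INTO rides VALUES
  (101, 201, 'New York', 25.50), (102, 202, 'San Francisco', 18.00),
  (103, 203, 'Chicago', 22.75), (104, 201, 'San Francisco', 30.00),
  (105, 202, 'New York', 20.00);
INSERT INTO ratings VALUES (101, 4.3), (102, 4.1), (103, 4.8), (104, 4.7), (105, 3.9);
""")
rows = conn.execute("""
    SELECT r.driver_id, r.city, AVG(rt.rating) AS avg_rating
    FROM rides r
    INNER JOIN ratings rt ON r.ride_id = rt.ride_id
    GROUP BY r.driver_id, r.city
    ORDER BY r.driver_id, r.city
""").fetchall()
print(rows)  # one row per (driver, city) pair
```

With this sample data each (driver, city) group contains a single ride, so each average equals that ride's rating; with more data per group, AVG() would blend them.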
• Q.472
Question
As an SQL analyst at Uber, you are assigned to find the customers who registered using their
Gmail IDs. You are given a table named 'users'. The records in this table contain multiple
email domains. You need to write an SQL query that returns only those records where the
'email' field ends with 'gmail.com'.
Explanation
To retrieve users with Gmail IDs:
• Use the LIKE operator to match emails containing 'gmail.com'.
• The % symbol is a wildcard that matches any number of characters before 'gmail.com'.
• Keep only the records where the email domain is 'gmail.com'.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE users (
user_id INT PRIMARY KEY,
full_name VARCHAR(100),
registration_date DATE,
email VARCHAR(255)
);
• - Datasets
-- Users
INSERT INTO users (user_id, full_name, registration_date, email)
VALUES
(7162, 'John Doe', '2019-05-04', '[email protected]'),
(7625, 'Jane Smith', '2020-11-09', '[email protected]'),
(5273, 'Steve Johnson', '2018-06-20', '[email protected]'),
(6322, 'Emily Davis', '2021-08-14', '[email protected]'),
(4812, 'Olivia Brown', '2019-09-30', '[email protected]');

Learnings
• Using LIKE for Pattern Matching: The LIKE operator allows us to filter records based
on a pattern. Using %gmail.com ensures that only emails containing 'gmail.com' are returned.
• Wildcard Matching: % is a wildcard that matches zero or more characters, allowing
flexible pattern matching.
Solutions
• - PostgreSQL solution
SELECT *
FROM users
WHERE email LIKE '%gmail.com';
• - MySQL solution
SELECT *
FROM users
WHERE email LIKE '%gmail.com';

Key Points:
• LIKE Operator: Used for matching email addresses containing specific patterns.
• Wildcard %: Matches any sequence of characters, ensuring flexibility in the search.
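The pattern match can be tried directly in SQLite via Python. The addresses here are made up for illustration (the sample emails above are redacted). Note one subtlety: '%gmail.com' also matches a domain like 'notgmail.com', so anchoring the pattern as '%@gmail.com' is slightly stricter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INT PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, 'john.doe@gmail.com'),    # hypothetical addresses
                  (2, 'jane@yahoo.com'),
                  (3, 'steve@notgmail.com')])

loose = conn.execute(
    "SELECT user_id FROM users WHERE email LIKE '%gmail.com' ORDER BY user_id").fetchall()
strict = conn.execute(
    "SELECT user_id FROM users WHERE email LIKE '%@gmail.com' ORDER BY user_id").fetchall()
print(loose)   # [(1,), (3,)] -- 'notgmail.com' slips through the loose pattern
print(strict)  # [(1,)]
```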
• Q.473
Question
Uber is conducting an analysis of its driver performance across various cities. Your task is to
develop a SQL query to identify the top-performing drivers based on their average rating.
Only drivers who have completed at least 6 trips should be considered for this analysis. The
query should provide the driver's name, city, and their average rating, sorted in descending
order of average rating.
Note: Round the average rating to 2 decimal points.
Explanation
This query will:
• Join the Drivers and Trips tables using DRIVER_ID to fetch the relevant driver
information and their ratings.
• Filter out drivers who have completed fewer than 6 trips.
• Calculate the average rating for each driver.
• Round the average rating to 2 decimal places.
• Sort the results in descending order of the average rating.
Datasets and SQL Schemas
• - Table creation


CREATE TABLE Drivers (
DRIVER_ID INT PRIMARY KEY,
DRIVER_NAME VARCHAR(100),
CITY VARCHAR(100)
);

CREATE TABLE Trips (
TRIP_ID INT PRIMARY KEY,
DRIVER_ID INT,
RATING DECIMAL(3, 1)
);
• - Datasets
-- Drivers
INSERT INTO Drivers (DRIVER_ID, DRIVER_NAME, CITY)
VALUES
(4, 'Emily Davis', 'San Francisco'),
(5, 'Christopher Wilson', 'Miami'),
(6, 'Jessica Martinez', 'Seattle');

-- Trips
INSERT INTO Trips (TRIP_ID, DRIVER_ID, RATING)
VALUES
(21, 4, 5),
(22, 4, 4),
(23, 4, 5);

Learnings
• JOIN: The JOIN operation is used to combine driver information with trip ratings based on
DRIVER_ID.
• HAVING: The HAVING clause filters out drivers with fewer than 6 trips after grouping by
DRIVER_ID.
• AVG(): The AVG() function calculates the average rating for each driver.
• ROUND(): The ROUND() function rounds the average rating to 2 decimal places.
• Sorting: The ORDER BY clause is used to sort the results in descending order of average
rating.
Solutions
• - PostgreSQL solution
SELECT d.DRIVER_NAME, d.CITY, ROUND(AVG(t.RATING), 2) AS avg_rating
FROM Drivers d
JOIN Trips t ON d.DRIVER_ID = t.DRIVER_ID
GROUP BY d.DRIVER_ID, d.DRIVER_NAME, d.CITY
HAVING COUNT(t.TRIP_ID) >= 6
ORDER BY avg_rating DESC;
• - MySQL solution
SELECT d.DRIVER_NAME, d.CITY, ROUND(AVG(t.RATING), 2) AS avg_rating
FROM Drivers d
JOIN Trips t ON d.DRIVER_ID = t.DRIVER_ID
GROUP BY d.DRIVER_ID, d.DRIVER_NAME, d.CITY
HAVING COUNT(t.TRIP_ID) >= 6
ORDER BY avg_rating DESC;

Key Points:
• JOIN: Combines data from Drivers and Trips based on DRIVER_ID.
• HAVING: Filters drivers who have completed at least 6 trips.
• AVG(): Calculates the average rating per driver.
• ROUND(): Rounds the average rating to 2 decimal points.
• ORDER BY: Sorts the drivers based on their average rating in descending order.
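A quick sqlite3 check illustrates why this query returns no rows for the sample data above (no driver reaches 6 trips), and how the same query behaves with a lower threshold, passed here as a parameter for convenience:

```python
import sqlite3

# In-memory copy of the Drivers/Trips sample data above
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Drivers (DRIVER_ID INT PRIMARY KEY, DRIVER_NAME TEXT, CITY TEXT);
CREATE TABLE Trips (TRIP_ID INT PRIMARY KEY, DRIVER_ID INT, RATING REAL);
INSERT INTO Drivers VALUES (4, 'Emily Davis', 'San Francisco');
INSERT INTO Trips VALUES (21, 4, 5), (22, 4, 4), (23, 4, 5);
""")
query = """
    SELECT d.DRIVER_NAME, d.CITY, ROUND(AVG(t.RATING), 2) AS avg_rating
    FROM Drivers d
    JOIN Trips t ON d.DRIVER_ID = t.DRIVER_ID
    GROUP BY d.DRIVER_ID, d.DRIVER_NAME, d.CITY
    HAVING COUNT(t.TRIP_ID) >= ?
    ORDER BY avg_rating DESC
"""
empty = conn.execute(query, (6,)).fetchall()
top = conn.execute(query, (3,)).fetchall()
print(empty)  # [] -- no driver has 6+ trips in the sample data
print(top)    # Emily Davis qualifies at a threshold of 3, with avg (5+4+5)/3 rounded to 4.67
```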
• Q.474


Question
Write a query to find, for each user that has taken at least two trips with Uber, the time that
elapsed between the first trip and the second trip.
Explanation
This query:
• Identifies users who have taken at least two trips by counting the number of trips per rider.
• Uses the LAG() function to retrieve the timestamp of the previous trip for each rider,
ordered by the trip_timestamp.
• Calculates the time difference between each trip and the previous one (i.e., the time elapsed
between the first and second trip for each rider).
• Filters out riders who have fewer than two trips.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE sign_ups (
rider_id INT PRIMARY KEY,
signup_timestamp DATE
);

CREATE TABLE trips (
trip_id INT PRIMARY KEY,
rider_id INT,
driver_id INT,
trip_timestamp TIMESTAMP
);
• - Datasets
-- Sign-ups
INSERT INTO sign_ups (rider_id, signup_timestamp)
VALUES
(1, '2022-03-01'),
(2, '2022-03-01'),
(3, '2022-03-01'),
(4, '2022-03-01'),
(5, '2022-03-01');

-- Trips
INSERT INTO trips (trip_id, rider_id, driver_id, trip_timestamp)
VALUES
(1, 1, 2, '2022-02-01'),
(2, 2, 2, '2022-03-11'),
(3, 1, 2, '2022-04-01'),
(4, 1, 2, '2022-05-21'),
(5, 2, 2, '2022-06-01'),
(6, 3, 2, '2022-07-31');

Learnings
• LAG() Window Function: The LAG() function allows us to access the previous row's
value, enabling us to calculate the time difference between consecutive trips.
• PARTITION BY: This ensures that the time difference is calculated separately for each
rider_id.
• Time Calculation: The query calculates the difference between timestamps and returns the
result in days.
• Filtering with HAVING: We filter out riders who have taken fewer than two trips using
HAVING COUNT(*) > 1.

Solutions
• - PostgreSQL solution
SELECT rider_id,
trip_timestamp,
lag(trip_timestamp, 1) OVER (PARTITION BY rider_id
ORDER BY trip_timestamp DESC) - trip_timestamp
AS time_between_two_trip
FROM trips
WHERE rider_id IN
(SELECT rider_id
FROM trips
GROUP BY rider_id
HAVING COUNT(*) > 1);
• - MySQL solution
SELECT rider_id,
       trip_timestamp,
       TIMESTAMPDIFF(DAY,
                     trip_timestamp,
                     LAG(trip_timestamp, 1) OVER (PARTITION BY rider_id ORDER BY trip_timestamp DESC)) AS time_between_two_trip
FROM trips
WHERE rider_id IN
    (SELECT rider_id
     FROM trips
     GROUP BY rider_id
     HAVING COUNT(*) > 1);

Key Points:
• LAG() Function: Retrieves the timestamp of the previous trip in the ordered sequence for
each rider.
• TIMESTAMPDIFF(): In MySQL, TIMESTAMPDIFF() is used to calculate the time
difference in days.
• PARTITION BY: Ensures calculations are done per rider.
• HAVING COUNT(*) > 1: Filters out riders with fewer than two trips.
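The gap calculation can be reproduced with sqlite3 (a stand-in for the dialects above). This sketch orders the window ascending, so LAG() returns the previous, earlier trip and the subtraction is naturally positive; it also skips the HAVING filter, so each rider's first trip simply shows NULL (None). The day difference uses julianday(), SQLite's substitute for TIMESTAMPDIFF:

```python
import sqlite3

# In-memory copy of the trips sample data above
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (trip_id INT PRIMARY KEY, rider_id INT, trip_timestamp TEXT);
INSERT INTO trips VALUES
  (1, 1, '2022-02-01'), (2, 2, '2022-03-11'), (3, 1, '2022-04-01'),
  (4, 1, '2022-05-21'), (5, 2, '2022-06-01'), (6, 3, '2022-07-31');
""")
rows = conn.execute("""
    SELECT rider_id, trip_timestamp,
           julianday(trip_timestamp) -
           julianday(LAG(trip_timestamp) OVER
                     (PARTITION BY rider_id ORDER BY trip_timestamp)) AS days_since_prev
    FROM trips
    ORDER BY rider_id, trip_timestamp
""").fetchall()
print(rows)  # rider 1 waited 59 then 50 days between trips; rider 3 has no previous trip
```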
• Q.475
Question
Write a query to find how many users placed their third order containing a product owned by
the ATG (holding company) on or after 9/21/22. Only consider orders containing at least one
ATG holding company product.
Explanation
This query:
• Filters for orders placed on or after 9/21/22.
• Identifies users who placed at least three orders containing products owned by ATG (the
holding company).
• Ensures that we only consider orders that contain a product owned by the ATG holding
company.
• Returns the user_id of users who meet these criteria.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE brands (
brand_id INT PRIMARY KEY,
brand_name VARCHAR(50),
holding_company_id INT,
holding_company_name VARCHAR(50)
);

CREATE TABLE orders (
order_id INT PRIMARY KEY,
user_id INT,
product_id INT,
brand_id INT,
price DECIMAL,
quantity INT,
date DATE,
store_id INT
);
• - Datasets
-- Brands
INSERT INTO brands (brand_id, brand_name, holding_company_id, holding_company_name)
VALUES
(1, 'A5', 10, 'Beam Suntory'),
(2, 'A4', 9, 'CDE'),
(3, 'A3', 8, 'ATG'),
(4, 'A2', 7, 'Chivas'),
(5, 'A1', 6, 'MMT');

-- Orders
INSERT INTO orders (order_id, user_id, product_id, brand_id, price, quantity, date, store_id)
VALUES
(1, 111, 3, 1, 100, 2, '2022-09-21', 12),
(2, 222, 7, 2, 123, 30, '2022-12-09', 34),
(3, 222, 9, 1, 14, 1, '2020-05-02', 435),
(4, 222, 11, 3, 140, 40, '2019-11-03', 23),
(5, 333, 13, 5, 120, 15, '2019-10-01', 45);

Learnings
• CTE (Common Table Expression): Using WITH to create an intermediate result
(count_orders) that counts the total orders per user after 9/21/22.
• EXISTS: Filters only users who have placed orders with products owned by ATG by using
a subquery with EXISTS.
• Filtering by Date: We ensure that we only count orders after 9/21/22 by filtering using
the date column.
Solution
WITH count_orders AS (
SELECT user_id,
COUNT(order_id) AS total_orders
FROM orders
WHERE date >= '2022-09-21'
GROUP BY user_id
)
SELECT user_id
FROM count_orders
WHERE total_orders >= 3
AND EXISTS (
SELECT 1
FROM orders o
JOIN brands b ON o.brand_id = b.brand_id
WHERE o.user_id = count_orders.user_id
AND b.holding_company_name = 'ATG'
);

Explanation of the Solution:


• CTE - count_orders:
• This part of the query counts how many orders each user placed after 9/21/22 (WHERE
date >= '2022-09-21').
• The GROUP BY user_id ensures we get the total number of orders for each user.
• Main Query:


• The main query uses the count_orders CTE to filter users who have placed at least three
orders (WHERE total_orders >= 3).
• EXISTS Subquery:
• The EXISTS subquery ensures that the user has ordered at least one product owned by the
ATG holding company (AND b.holding_company_name = 'ATG').
• The JOIN between the orders table and the brands table allows us to filter for products
belonging to ATG.
• Final Output:
• The final result returns user_id of users who placed at least three orders containing
products owned by ATG on or after 9/21/22.

Key Points:
• WITH (CTE): Used to calculate the number of orders for each user.
• EXISTS: Efficiently filters users who placed orders with ATG products.
• Date Filter: Ensures only orders after 9/21/22 are considered.
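The CTE-plus-EXISTS pattern can be exercised in sqlite3. The dates of user 222's orders are shifted here (a hypothetical variation on the sample data) so that one user actually meets the three-order threshold:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE brands (brand_id INT PRIMARY KEY, brand_name TEXT,
                     holding_company_id INT, holding_company_name TEXT);
CREATE TABLE orders (order_id INT PRIMARY KEY, user_id INT, brand_id INT, date TEXT);
INSERT INTO brands VALUES (1, 'A5', 10, 'Beam Suntory'), (3, 'A3', 8, 'ATG');
INSERT INTO orders VALUES
  (1, 111, 1, '2022-09-21'),
  (2, 222, 3, '2022-12-09'),
  (3, 222, 3, '2022-11-03'),  -- dates moved on/after 9/21/22 for illustration
  (4, 222, 3, '2022-10-01');
""")
rows = conn.execute("""
    WITH count_orders AS (
        SELECT user_id, COUNT(order_id) AS total_orders
        FROM orders
        WHERE date >= '2022-09-21'
        GROUP BY user_id
    )
    SELECT user_id
    FROM count_orders
    WHERE total_orders >= 3
      AND EXISTS (SELECT 1
                  FROM orders o
                  JOIN brands b ON o.brand_id = b.brand_id
                  WHERE o.user_id = count_orders.user_id
                    AND b.holding_company_name = 'ATG')
""").fetchall()
print(rows)  # [(222,)] -- three qualifying orders, all for an ATG brand
```

User 111 has only one order in the window, so the `total_orders >= 3` filter eliminates them before the EXISTS check even runs.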
• Q.476
Question
Write a query to find the latest trip timestamp for each user who took at least one trip.
Explanation
The query should:
• Find the latest trip timestamp for each user from the trips table.
• Only consider users who have taken at least one trip.
• Sort the results by rider_id in ascending order.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE sign_ups (
rider_id INT PRIMARY KEY,
signup_timestamp DATE
);

CREATE TABLE trips (
trip_id INT PRIMARY KEY,
rider_id INT,
driver_id INT,
trip_timestamp TIMESTAMP
);
• - Datasets
-- Sign-ups
INSERT INTO sign_ups (rider_id, signup_timestamp)
VALUES
(1, '2022-03-01'),
(2, '2022-03-01'),
(3, '2022-03-01'),
(4, '2022-03-01'),
(5, '2022-03-01');

-- Trips
INSERT INTO trips (trip_id, rider_id, driver_id, trip_timestamp)
VALUES
(1, 1, 2, '2022-02-01 00:00:00'),
(2, 2, 2, '2022-03-11 00:00:00'),
(3, 1, 2, '2022-04-01 00:00:00'),
(4, 1, 2, '2022-05-21 00:00:00'),
(5, 2, 2, '2022-06-01 00:00:00'),
(6, 3, 2, '2022-07-31 00:00:00');

Learnings
• MAX() Function: The MAX() function is used to find the latest timestamp for each user.
• GROUP BY Clause: This groups the data by rider_id to get the latest trip for each user.
• ORDER BY Clause: Sorting the results by rider_id ensures that the output is ordered as
requested.
Solution
SELECT rider_id,
MAX(trip_timestamp) AS latest_trip_timestamp
FROM trips
GROUP BY rider_id
ORDER BY rider_id ASC;

Explanation of the Solution:


• MAX(trip_timestamp):
• This function retrieves the latest trip timestamp for each rider by selecting the maximum
trip_timestamp for each rider_id.
• GROUP BY rider_id:
• The GROUP BY clause groups the records by rider_id, ensuring that we get one row per
rider with the latest trip timestamp.
• ORDER BY rider_id ASC:
• Sorting the results by rider_id in ascending order ensures that the result is returned as
requested.

Key Points:
• MAX(): Helps in getting the most recent trip timestamp.
• GROUP BY: Used to aggregate results by each unique user (rider_id).
• Sorting: The result is ordered by rider_id to comply with the prompt requirements.
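The aggregation can be verified with sqlite3 (timestamps stored as ISO-8601 text, so MAX()'s lexicographic comparison coincides with chronological order):

```python
import sqlite3

# In-memory copy of the trips sample data above
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (trip_id INT PRIMARY KEY, rider_id INT, trip_timestamp TEXT);
INSERT INTO trips VALUES
  (1, 1, '2022-02-01 00:00:00'), (2, 2, '2022-03-11 00:00:00'),
  (3, 1, '2022-04-01 00:00:00'), (4, 1, '2022-05-21 00:00:00'),
  (5, 2, '2022-06-01 00:00:00'), (6, 3, '2022-07-31 00:00:00');
""")
rows = conn.execute("""
    SELECT rider_id, MAX(trip_timestamp) AS latest_trip_timestamp
    FROM trips
    GROUP BY rider_id
    ORDER BY rider_id ASC
""").fetchall()
print(rows)  # one row per rider, carrying that rider's latest timestamp
```

Riders with no trips never appear in `trips`, so the "at least one trip" condition is satisfied automatically by grouping over that table.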
• Q.477
Question
You are given a table of Uber rides that contains the mileage and the purpose for the business
expense. Your task is to find the top 3 business purposes by total mileage driven for
passengers that use Uber for their business transportation.
The query should:
• Calculate the total miles driven for each business purpose.
• Only include trips where the business purpose is categorized as "Business."
• Return the top 3 business purposes by total mileage.

Datasets and Table Creation

Table Creation for my_uber_drives


-- Table Creation for my_uber_drives
CREATE TABLE my_uber_drives (
ride_id INT PRIMARY KEY,
user_id INT,
miles_driven DECIMAL(10, 2),
business_purpose VARCHAR(50),
ride_date DATE
);

Sample Data for my_uber_drives


-- Sample Data Insertion for my_uber_drives
INSERT INTO my_uber_drives (ride_id, user_id, miles_driven, business_purpose, ride_date)
VALUES
(1, 101, 12.5, 'Business', '2023-06-01'),
(2, 102, 8.0, 'Business', '2023-06-02'),
(3, 103, 15.0, 'Business', '2023-06-03'),
(4, 104, 10.0, 'Leisure', '2023-06-04'),
(5, 105, 5.5, 'Business', '2023-06-05'),
(6, 106, 7.5, 'Business', '2023-06-06'),
(7, 107, 20.0, 'Leisure', '2023-06-07'),
(8, 108, 25.0, 'Business', '2023-06-08'),
(9, 109, 18.0, 'Business', '2023-06-09'),
(10, 110, 22.0, 'Business', '2023-06-10');

Learnings from this Query:


• Filtering Data: Learn how to filter data based on specific conditions (WHERE clause), such
as focusing on the 'Business' purpose in this case.
• Aggregation: Using aggregate functions like SUM() to calculate the total mileage for each
business purpose.
• Grouping: Grouping data by a specific column (business_purpose) to calculate total
mileage for each category.
• Sorting and Limiting: Sorting the results to identify the top categories and limiting the
results to the top 3.
• Using LIMIT: Using the LIMIT clause to restrict the result set to the top N records.

Explanation:
• Table Structure:
• The my_uber_drives table contains the columns:
• ride_id: Unique identifier for each ride.
• user_id: The ID of the user who took the ride.
• miles_driven: The total miles driven for the ride.
• business_purpose: The purpose of the ride (e.g., 'Business', 'Leisure').
• ride_date: The date when the ride occurred.
• Query Breakdown:
• We filter for rides where the business_purpose is 'Business'.
• We then aggregate the data by business_purpose using SUM(miles_driven) to calculate
the total miles driven for each business purpose.
• We use the ORDER BY clause to sort the results in descending order by the total miles
driven.
• Finally, the LIMIT 3 ensures we only return the top 3 business purposes.

SQL Solutions for MySQL and PostgreSQL:


MySQL Solution:
SELECT business_purpose,
SUM(miles_driven) AS total_miles
FROM my_uber_drives
WHERE business_purpose = 'Business'
GROUP BY business_purpose
ORDER BY total_miles DESC
LIMIT 3;

PostgreSQL Solution:
SELECT business_purpose,
SUM(miles_driven) AS total_miles
FROM my_uber_drives
WHERE business_purpose = 'Business'
GROUP BY business_purpose
ORDER BY total_miles DESC
LIMIT 3;

Output:
Assuming the sample data provided in the my_uber_drives table, the output will show the
total mileage driven for each business purpose, ordered by total miles in descending order.

business_purpose    total_miles
Business            113.5

In this case, only one business purpose category ('Business') appears in the data, so the
query returns a single row with the sum of miles across all rides categorized as 'Business'
(12.5 + 8.0 + 15.0 + 5.5 + 7.5 + 25.0 + 18.0 + 22.0 = 113.5).

Key Points:
• SUM() Function: The SUM() function is used to calculate the total miles driven for each
business purpose.
• GROUP BY Clause: This is essential to group the data by business_purpose to aggregate
mileage for each category.
• Filtering Data: The query only considers rides where the business purpose is 'Business',
filtering out leisure-related rides.
• Limiting Results: We use the LIMIT 3 clause to restrict the output to the top 3 business
purposes (if there were more than one).
• Sorting: The query sorts the results by total_miles in descending order to ensure the
highest mileage is at the top.
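The filter-aggregate-sort-limit pipeline can be run as-is in sqlite3 against the sample data, confirming the single-category total:

```python
import sqlite3

# In-memory copy of the my_uber_drives sample data above
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE my_uber_drives (ride_id INT PRIMARY KEY, user_id INT,
                miles_driven REAL, business_purpose TEXT, ride_date TEXT)""")
conn.executemany("INSERT INTO my_uber_drives VALUES (?, ?, ?, ?, ?)",
    [(1, 101, 12.5, 'Business', '2023-06-01'),
     (2, 102, 8.0, 'Business', '2023-06-02'),
     (3, 103, 15.0, 'Business', '2023-06-03'),
     (4, 104, 10.0, 'Leisure', '2023-06-04'),
     (5, 105, 5.5, 'Business', '2023-06-05'),
     (6, 106, 7.5, 'Business', '2023-06-06'),
     (7, 107, 20.0, 'Leisure', '2023-06-07'),
     (8, 108, 25.0, 'Business', '2023-06-08'),
     (9, 109, 18.0, 'Business', '2023-06-09'),
     (10, 110, 22.0, 'Business', '2023-06-10')])
rows = conn.execute("""
    SELECT business_purpose, SUM(miles_driven) AS total_miles
    FROM my_uber_drives
    WHERE business_purpose = 'Business'
    GROUP BY business_purpose
    ORDER BY total_miles DESC
    LIMIT 3
""").fetchall()
print(rows)  # [('Business', 113.5)] -- the Leisure rides are filtered out before aggregation
```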
• Q.478

Identify Users with Highest Spending in a Month


Uber wants to know which users spent the most in a given month. You are asked to write a
query that identifies the top 5 users who spent the highest amount on Uber rides in the month
of July 2023. The database contains a rides table with details of each ride, including the ride
user_id and fare_amount. Only include users who took at least one ride in that month.

Datasets and Table Creation


Table Creation:
CREATE TABLE rides (
ride_id INT PRIMARY KEY,
user_id INT,
fare_amount DECIMAL(10, 2),
ride_date DATE
);

Insert Data:
INSERT INTO rides (ride_id, user_id, fare_amount, ride_date)
VALUES
(1, 101, 20.50, '2023-07-01'),
(2, 102, 30.75, '2023-07-02'),
(3, 103, 25.00, '2023-07-03'),
(4, 101, 15.00, '2023-07-10'),
(5, 104, 18.00, '2023-07-11'),
(6, 102, 28.25, '2023-07-12'),
(7, 105, 12.50, '2023-07-15'),
(8, 103, 40.00, '2023-07-18');

Solution
PostgreSQL and MySQL:
SELECT user_id, SUM(fare_amount) AS total_spent
FROM rides
WHERE ride_date BETWEEN '2023-07-01' AND '2023-07-31'
GROUP BY user_id
HAVING COUNT(ride_id) > 0
ORDER BY total_spent DESC
LIMIT 5;

Explanation:
• SUM(fare_amount): Calculates the total fare amount spent by each user.
• WHERE ride_date BETWEEN '2023-07-01' AND '2023-07-31': Filters the rides for the
month of July 2023.
• HAVING COUNT(ride_id) > 0: Ensures that only users who have at least one ride are
included.
• ORDER BY total_spent DESC: Orders the results by the total amount spent in descending
order.
• LIMIT 5: Limits the result to the top 5 users.

Learnings:
• Using Aggregation Functions: SUM() to calculate the total spending.
• Filtering by Date: Date operations using BETWEEN for a specific range.
• Grouping and Ordering: Using GROUP BY to group by user_id and ORDER BY to sort the
results.
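The same query runs unchanged in sqlite3 against the sample data, which makes the ranking easy to inspect:

```python
import sqlite3

# In-memory copy of the July 2023 rides sample data above
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE rides (ride_id INT PRIMARY KEY, user_id INT,
                fare_amount REAL, ride_date TEXT)""")
conn.executemany("INSERT INTO rides VALUES (?, ?, ?, ?)",
    [(1, 101, 20.50, '2023-07-01'), (2, 102, 30.75, '2023-07-02'),
     (3, 103, 25.00, '2023-07-03'), (4, 101, 15.00, '2023-07-10'),
     (5, 104, 18.00, '2023-07-11'), (6, 102, 28.25, '2023-07-12'),
     (7, 105, 12.50, '2023-07-15'), (8, 103, 40.00, '2023-07-18')])
rows = conn.execute("""
    SELECT user_id, SUM(fare_amount) AS total_spent
    FROM rides
    WHERE ride_date BETWEEN '2023-07-01' AND '2023-07-31'
    GROUP BY user_id
    ORDER BY total_spent DESC
    LIMIT 5
""").fetchall()
print(rows)  # user 103 leads with 25.00 + 40.00 = 65.0
```

Only five users appear in the data, so LIMIT 5 happens to return all of them; with more users it would cut the list to the top spenders.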

• Q.479
Track Users Who Frequently Cancel Rides
Uber wants to understand which users are frequently canceling their rides. Write a query to
identify the top 3 users who have canceled the most rides in the last 30 days. The database
contains the rides table with the ride status (cancelled column) and user_id information.

Datasets and Table Creation


Table Creation:
CREATE TABLE rides (
ride_id INT PRIMARY KEY,
user_id INT,
cancelled BOOLEAN,
ride_date DATE
);

Insert Data:
INSERT INTO rides (ride_id, user_id, cancelled, ride_date)
VALUES
(1, 101, TRUE, '2023-06-01'),
(2, 102, FALSE, '2023-06-02'),
(3, 103, TRUE, '2023-06-05'),
(4, 101, FALSE, '2023-06-07'),
(5, 104, TRUE, '2023-06-10'),
(6, 105, FALSE, '2023-06-12'),
(7, 101, TRUE, '2023-06-15'),
(8, 103, TRUE, '2023-06-20');

Solution
MySQL:
SELECT user_id, COUNT(*) AS cancelled_rides
FROM rides
WHERE cancelled = TRUE
AND ride_date >= CURDATE() - INTERVAL 30 DAY
GROUP BY user_id
ORDER BY cancelled_rides DESC
LIMIT 3;

PostgreSQL:
SELECT user_id, COUNT(*) AS cancelled_rides
FROM rides
WHERE cancelled = TRUE
AND ride_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id
ORDER BY cancelled_rides DESC
LIMIT 3;

Explanation:
• COUNT(*) AS cancelled_rides: Counts the number of canceled rides for each user.
• WHERE cancelled = TRUE: Filters to include only canceled rides.
• AND ride_date >= CURDATE() - INTERVAL 30 DAY: Filters for rides canceled within
the last 30 days; CURDATE() is MySQL-specific, and PostgreSQL uses CURRENT_DATE -
INTERVAL '30 days' instead.
• GROUP BY user_id: Groups the data by user_id to aggregate canceled rides per user.
• ORDER BY cancelled_rides DESC: Orders the users by the number of canceled rides in
descending order.
• LIMIT 3: Limits the results to the top 3 users.

Learnings:
• Filtering by Boolean: Handling boolean columns (cancelled = TRUE).
• Date Calculations: Using CURDATE() with INTERVAL for dynamic date filtering.
• Grouping and Sorting: Combining COUNT() with GROUP BY and ORDER BY.
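The count-group-order-limit pattern above can be sanity-checked with SQLite from Python. This is a minimal sketch, not the book's MySQL/PostgreSQL setup: SQLite has no BOOLEAN type (cancelled is stored as 0/1), and the 30-day date filter is omitted so the output does not depend on today's date; user_id is added to ORDER BY to break the tie deterministically.

```python
import sqlite3

# In-memory copy of the sample rides data (SQLite dialect).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rides (ride_id INT, user_id INT, cancelled INT, ride_date TEXT);
INSERT INTO rides VALUES
 (1, 101, 1, '2023-06-01'), (2, 102, 0, '2023-06-02'),
 (3, 103, 1, '2023-06-05'), (4, 101, 0, '2023-06-07'),
 (5, 104, 1, '2023-06-10'), (6, 105, 0, '2023-06-12'),
 (7, 101, 1, '2023-06-15'), (8, 103, 1, '2023-06-20');
""")

# Same aggregation pattern as the book's solution, minus the date filter.
top3 = conn.execute("""
SELECT user_id, COUNT(*) AS cancelled_rides
FROM rides
WHERE cancelled = 1
GROUP BY user_id
ORDER BY cancelled_rides DESC, user_id
LIMIT 3
""").fetchall()
print(top3)  # [(101, 2), (103, 2), (104, 1)]
```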
• Q.480
Find the Average Ride Time for Each Driver
Write a query to find the average ride duration for each driver, measured in minutes. The
rides table contains ride start and end times for each ride. Return the driver ID, their name
(from the drivers table), and their average ride duration (rounded to 2 decimal places). Only
include drivers who have completed at least 5 rides.

Datasets and Table Creation


Table Creation:


CREATE TABLE rides (
ride_id INT PRIMARY KEY,
driver_id INT,
start_time TIMESTAMP,
end_time TIMESTAMP
);

CREATE TABLE drivers (
driver_id INT PRIMARY KEY,
driver_name VARCHAR(100)
);

Insert Data:
INSERT INTO rides (ride_id, driver_id, start_time, end_time)
VALUES
(1, 201, '2023-06-01 08:00:00', '2023-06-01 08:30:00'),
(2, 202, '2023-06-02 09:00:00', '2023-06-02 09:15:00'),
(3, 201, '2023-06-03 10:00:00', '2023-06-03 10:45:00'),
(4, 202, '2023-06-04 12:00:00', '2023-06-04 12:30:00'),
(5, 201, '2023-06-05 13:00:00', '2023-06-05 13:25:00'),
(6, 203, '2023-06-06 14:00:00', '2023-06-06 14:20:00');

INSERT INTO drivers (driver_id, driver_name)
VALUES
(201, 'Emily Davis'),
(202, 'Christopher Lee'),
(203, 'Jessica Brown');

Solution
MySQL:
SELECT r.driver_id,
d.driver_name,
ROUND(AVG(TIMESTAMPDIFF(MINUTE, r.start_time, r.end_time)), 2) AS avg_ride_duration
FROM rides r
JOIN drivers d ON r.driver_id = d.driver_id
GROUP BY r.driver_id, d.driver_name
HAVING COUNT(r.ride_id) >= 5;

PostgreSQL:
SELECT r.driver_id,
d.driver_name,
ROUND(AVG(EXTRACT(EPOCH FROM (r.end_time - r.start_time)) / 60)::numeric, 2) AS avg_ride_duration
FROM rides r
JOIN drivers d ON r.driver_id = d.driver_id
GROUP BY r.driver_id, d.driver_name
HAVING COUNT(r.ride_id) >= 5;

Explanation:
• TIMESTAMPDIFF(MINUTE, r.start_time, r.end_time) (MySQL): Calculates the difference in
minutes between start_time and end_time; PostgreSQL gets the same value by subtracting
the timestamps and dividing the epoch seconds by 60.
• ROUND(..., 2): Rounds the average duration to 2 decimal places.
• JOIN drivers d ON r.driver_id = d.driver_id: Joins the rides table with the
drivers table to fetch the driver's name.
• GROUP BY r.driver_id, d.driver_name: Groups the results by driver to calculate the
average ride duration per driver (every non-aggregated column in the SELECT list must
appear in the GROUP BY).
• HAVING COUNT(r.ride_id) >= 5: Filters the results to only include drivers with at least 5
rides.

Learnings:
• Calculating Time Differences: Using TIMESTAMPDIFF() (MySQL) or timestamp subtraction
with EXTRACT(EPOCH ...) (PostgreSQL) to calculate the time difference in minutes.
• Rounding Results: Using ROUND() to limit the precision of floating-point values.
• Joining Multiple Tables: Combining data from the rides and drivers tables.
• Filtering with HAVING: Using HAVING to filter results after aggregation (when using
COUNT()).
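The minute-difference-then-average pattern can be checked with SQLite from Python. This is a hedged sketch, not the book's MySQL setup: SQLite has no TIMESTAMPDIFF, so strftime('%s', ...) converts each timestamp to Unix seconds first, and only driver 201's three sample rides are loaded (the HAVING threshold is dropped so the small sample produces output).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rides (ride_id INT, driver_id INT, start_time TEXT, end_time TEXT);
INSERT INTO rides VALUES
 (1, 201, '2023-06-01 08:00:00', '2023-06-01 08:30:00'),
 (3, 201, '2023-06-03 10:00:00', '2023-06-03 10:45:00'),
 (5, 201, '2023-06-05 13:00:00', '2023-06-05 13:25:00');
""")

# strftime('%s', ...) -> Unix seconds; dividing the difference by 60.0 yields minutes.
rows = conn.execute("""
SELECT driver_id,
       ROUND(AVG((strftime('%s', end_time) - strftime('%s', start_time)) / 60.0), 2)
           AS avg_ride_duration
FROM rides
GROUP BY driver_id
""").fetchall()
print(rows)  # ride durations 30, 45 and 25 minutes average to 33.33
```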


Summary of Key Concepts:
• Filtering by Date and Time: Using WHERE and BETWEEN to restrict data based on date
ranges.
• Aggregation Functions: Using SUM(), COUNT(), AVG() for aggregation and analysis.
• Grouping Data: Applying GROUP BY to perform analysis on groups of data.
• Joins: Combining data from multiple tables using JOIN to enhance results.
• Time Calculations: Working with timestamps and calculating time differences using
TIMESTAMPDIFF().

PayPal
• Q.481

Question:
Write an SQL query to report the fraction of players that logged in again on the day after the
day they first logged in, rounded to 2 decimal places.
In other words, count the number of players that logged in for at least two consecutive days
starting from their first login date, then divide that number by the total number of players.

Explanation:
• Identify first login date for each player.
• Check for consecutive logins: For each player, verify if they logged in on the day after
their first login date.
• Count the players who logged in consecutively starting from their first login date.
• Calculate the fraction of players who logged in on consecutive days by dividing the count
of players with consecutive logins by the total number of distinct players.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Activity (
player_id INT,
device_id INT,
event_date DATE,
games_played INT,
PRIMARY KEY (player_id, event_date)
);
• - Sample data for Activity table
INSERT INTO Activity (player_id, device_id, event_date, games_played)
VALUES
(1, 2, '2016-03-01', 5),
(1, 2, '2016-03-02', 6),
(2, 3, '2017-06-25', 1),
(3, 1, '2016-03-02', 0),
(3, 4, '2018-07-03', 5);

Learnings:
• CTEs and Aggregation: Using MIN() inside a CTE to capture each player's first login
date (a window function such as LEAD() would be an alternative approach).
• Date Arithmetic: Using date functions to compare dates and check for consecutive login
days.
• Aggregation: Using COUNT() and DISTINCT to count players who meet the condition.


• Rounding: Using the ROUND() function to round the result to 2 decimal places.

Solutions
• - PostgreSQL solution
WITH FirstLogin AS (
SELECT player_id, MIN(event_date) AS first_login_date
FROM Activity
GROUP BY player_id
), ConsecutiveLogins AS (
SELECT a.player_id
FROM Activity a
JOIN FirstLogin f ON a.player_id = f.player_id
WHERE a.event_date = f.first_login_date + INTERVAL '1 day'
)
SELECT ROUND(COUNT(DISTINCT c.player_id) * 1.0 / (SELECT COUNT(DISTINCT player_id) FROM
Activity), 2) AS fraction
FROM ConsecutiveLogins c;
• - MySQL solution
WITH FirstLogin AS (
SELECT player_id, MIN(event_date) AS first_login_date
FROM Activity
GROUP BY player_id
), ConsecutiveLogins AS (
SELECT a.player_id
FROM Activity a
JOIN FirstLogin f ON a.player_id = f.player_id
WHERE a.event_date = DATE_ADD(f.first_login_date, INTERVAL 1 DAY)
)
SELECT ROUND(COUNT(DISTINCT c.player_id) / COUNT(DISTINCT a.player_id), 2) AS fraction
FROM Activity a
LEFT JOIN ConsecutiveLogins c ON a.player_id = c.player_id;
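The same first-login CTE plus one-day join can be run against SQLite from Python as a quick check — a sketch under the assumption that SQLite's date(d, '+1 day') stands in for PostgreSQL's first_login_date + INTERVAL '1 day':

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Activity (player_id INT, device_id INT, event_date TEXT, games_played INT);
INSERT INTO Activity VALUES
 (1, 2, '2016-03-01', 5), (1, 2, '2016-03-02', 6), (2, 3, '2017-06-25', 1),
 (3, 1, '2016-03-02', 0), (3, 4, '2018-07-03', 5);
""")

# Only player 1 logged in on the day after their first login; 1 of 3 players -> 0.33.
fraction = conn.execute("""
WITH FirstLogin AS (
    SELECT player_id, MIN(event_date) AS first_login_date
    FROM Activity
    GROUP BY player_id
)
SELECT ROUND(COUNT(DISTINCT a.player_id) * 1.0 /
             (SELECT COUNT(DISTINCT player_id) FROM Activity), 2)
FROM Activity a
JOIN FirstLogin f
  ON a.player_id = f.player_id
 AND a.event_date = date(f.first_login_date, '+1 day')
""").fetchone()[0]
print(fraction)  # 0.33
```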
• Q.482

Question:
Write an SQL query to select the product ID, year, quantity, and price for the first year of
every product sold.

Explanation:
• Identify the first year for each product by selecting the minimum year from the sales
records of each product.
• Join the result with the Sales table to get the details for that first year (product_id, year,
quantity, price).
• Return the results for all products in any order.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Product (
product_id INT,
product_name VARCHAR(255),
PRIMARY KEY (product_id)
);

CREATE TABLE Sales (
sale_id INT,
product_id INT,
year INT,
quantity INT,
price INT,
PRIMARY KEY (sale_id, year),
FOREIGN KEY (product_id) REFERENCES Product(product_id)
);


• - Sample data for Sales and Product tables
INSERT INTO Product (product_id, product_name)
VALUES
(100, 'Nokia'),
(200, 'Apple'),
(300, 'Samsung');

INSERT INTO Sales (sale_id, product_id, year, quantity, price)
VALUES
(1, 100, 2008, 10, 5000),
(2, 100, 2009, 12, 5000),
(7, 200, 2011, 15, 9000),
(3, 200, 2010, 8, 9000),
(4, 300, 2012, 20, 3000);

Learnings:
• Aggregation: Use of MIN() to identify the first year of sales for each product.
• Joins: How to join tables to combine related information from Product and Sales.
• Subquery: Using subquery to find the minimum year of sale for each product.

Solutions
• - PostgreSQL solution
SELECT s.product_id, s.year, s.quantity, s.price
FROM Sales s
JOIN (
SELECT product_id, MIN(year) AS first_year
FROM Sales
GROUP BY product_id
) first_sales ON s.product_id = first_sales.product_id
AND s.year = first_sales.first_year;
• - MySQL solution
SELECT s.product_id, s.year, s.quantity, s.price
FROM Sales s
JOIN (
SELECT product_id, MIN(year) AS first_year
FROM Sales
GROUP BY product_id
) first_sales ON s.product_id = first_sales.product_id
AND s.year = first_sales.first_year;
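The derived-table join above runs unchanged in SQLite, so it can be verified from Python — a sanity-check sketch using an in-memory copy of the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (sale_id INT, product_id INT, year INT, quantity INT, price INT);
INSERT INTO Sales VALUES
 (1, 100, 2008, 10, 5000), (2, 100, 2009, 12, 5000),
 (7, 200, 2011, 15, 9000), (3, 200, 2010, 8, 9000),
 (4, 300, 2012, 20, 3000);
""")

# Derived table of each product's first sale year, joined back for the full rows.
rows = conn.execute("""
SELECT s.product_id, s.year, s.quantity, s.price
FROM Sales s
JOIN (SELECT product_id, MIN(year) AS first_year
      FROM Sales
      GROUP BY product_id) f
  ON s.product_id = f.product_id AND s.year = f.first_year
ORDER BY s.product_id
""").fetchall()
print(rows)  # [(100, 2008, 10, 5000), (200, 2010, 8, 9000), (300, 2012, 20, 3000)]
```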
• Q.483

Question:
Write an SQL query to report the customer IDs from the Customer table that bought all the
products in the Product table.

Explanation:
• We need to find customers who have purchased every product listed in the Product table.
• To achieve this, we can count the distinct product_key for each customer and compare it
with the total number of distinct products in the Product table.
• If the customer has bought all the products, the count of distinct products they bought will
match the total number of distinct products in the Product table.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Product (
product_key INT,
PRIMARY KEY (product_key)
);

CREATE TABLE Customer (
customer_id INT,
product_key INT,
FOREIGN KEY (product_key) REFERENCES Product(product_key)
);
• - Sample data for Customer and Product tables
INSERT INTO Product (product_key)
VALUES (5), (6);

INSERT INTO Customer (customer_id, product_key)
VALUES
(1, 5),
(2, 6),
(3, 5),
(3, 6),
(1, 6);

Learnings:
• Aggregation: Use of COUNT(DISTINCT ...) to count unique products bought by each
customer.
• Subquery: Using a subquery to get the total number of distinct products.
• Grouping: Grouping by customer_id to evaluate each customer individually.

Solutions
• - PostgreSQL solution
SELECT customer_id
FROM Customer
GROUP BY customer_id
HAVING COUNT(DISTINCT product_key) = (SELECT COUNT(DISTINCT product_key) FROM Product);
• - MySQL solution
SELECT customer_id
FROM Customer
GROUP BY customer_id
HAVING COUNT(DISTINCT product_key) = (SELECT COUNT(DISTINCT product_key) FROM Product);
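This count-matching trick is the classic "relational division" pattern, and it works the same way in SQLite, so it can be checked from Python — a minimal sketch on the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (product_key INT);
CREATE TABLE Customer (customer_id INT, product_key INT);
INSERT INTO Product VALUES (5), (6);
INSERT INTO Customer VALUES (1, 5), (2, 6), (3, 5), (3, 6), (1, 6);
""")

# A customer "bought all products" when their distinct product count
# equals the catalogue size.
buyers = conn.execute("""
SELECT customer_id
FROM Customer
GROUP BY customer_id
HAVING COUNT(DISTINCT product_key) = (SELECT COUNT(DISTINCT product_key) FROM Product)
ORDER BY customer_id
""").fetchall()
print(buyers)  # [(1,), (3,)] -- customer 2 bought only product 6
```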
• Q.485

Question:
Given a table containing information about bank deposits and withdrawals made using
PayPal, write a query to retrieve the final account balance for each account, taking into
account all the transactions recorded in the table, with the assumption that there are no
missing transactions.

Explanation:
• For each account, we need to calculate the final balance by considering both deposits and
withdrawals.
• If the transaction type is 'Deposit', the amount is added to the balance, and if the transaction
type is 'Withdrawal', the amount is subtracted from the balance.
• The final balance for each account is the sum of the amounts, adjusted by the type of
transaction.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE transactions (
transaction_id INT,
account_id INT,
amount DECIMAL(10, 2),
transaction_type VARCHAR(20)
);
• - Sample data for transactions table
INSERT INTO transactions (transaction_id, account_id, amount, transaction_type)
VALUES
(123, 101, 10.00, 'Deposit'),
(124, 101, 20.00, 'Deposit'),
(125, 101, 5.00, 'Withdrawal'),
(126, 201, 20.00, 'Deposit'),
(128, 201, 10.00, 'Withdrawal');

Learnings:
• Conditional Aggregation: Using a CASE statement to adjust the sign of the amount based
on the transaction type.
• Grouping: Grouping by account_id to aggregate all transactions for each account.
• Arithmetic Operations: Calculating the final balance by considering deposits as positive
and withdrawals as negative values.

Solutions
• - PostgreSQL solution
SELECT
account_id,
SUM(CASE
WHEN transaction_type = 'Deposit' THEN amount
ELSE -amount
END) AS final_balance
FROM transactions
GROUP BY account_id;
• - MySQL solution
SELECT
account_id,
SUM(CASE
WHEN transaction_type = 'Deposit' THEN amount
ELSE -amount
END) AS final_balance
FROM transactions
GROUP BY account_id;
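The CASE-inside-SUM (conditional aggregation) pattern can be verified end to end with SQLite from Python — a sketch of the same query, with the DECIMAL amounts stored as REAL in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    transaction_id INT, account_id INT, amount REAL, transaction_type TEXT);
INSERT INTO transactions VALUES
 (123, 101, 10.00, 'Deposit'), (124, 101, 20.00, 'Deposit'),
 (125, 101, 5.00, 'Withdrawal'), (126, 201, 20.00, 'Deposit'),
 (128, 201, 10.00, 'Withdrawal');
""")

# CASE flips withdrawals to negative amounts before summing.
balances = conn.execute("""
SELECT account_id,
       SUM(CASE WHEN transaction_type = 'Deposit' THEN amount ELSE -amount END)
           AS final_balance
FROM transactions
GROUP BY account_id
ORDER BY account_id
""").fetchall()
print(balances)  # [(101, 25.0), (201, 10.0)]
```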
• Q.486
Question
Calculate the Average Transaction Amount per User
As a data scientist at PayPal, you have been asked to write a SQL query to analyze the
transaction history of PayPal users. Specifically, management wants to know the average
transaction amount for each user, and how they rank based on their averages. For this task:
• Calculate the average transaction amount for every user
• Rank the users by their average transaction amount in descending order
Note: When the same average transaction amount is found for multiple users, they should
have the same rank. The next rank should be consecutive.

Explanation
• Calculate the average transaction amount per user using the AVG() function.
• Rank users by their average transaction amounts using the DENSE_RANK() window function,
ordering in descending order.
• DENSE_RANK() gives users with the same average the same rank and keeps the next rank
consecutive; RANK() would skip rank values after a tie.


Datasets and SQL Schemas


• - Table creation
CREATE TABLE transactions (
transaction_id INT,
user_id INT,
transaction_date DATE,
amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO transactions (transaction_id, user_id, transaction_date, amount)
VALUES
(1, 1000, '2021-01-25', 50),
(2, 1000, '2021-03-02', 150),
(3, 2000, '2021-03-04', 300),
(4, 3000, '2021-04-15', 100),
(5, 2000, '2021-04-18', 200),
(6, 3000, '2021-05-05', 100),
(7, 4000, '2021-05-10', 500);

Learnings
• Using AVG() to calculate the average transaction amount.
• The DENSE_RANK() window function ranks rows by a chosen criterion (e.g., descending
average).
• DENSE_RANK() assigns equal values the same rank and keeps subsequent ranks consecutive
(RANK() would leave a gap).

Solutions
• - PostgreSQL solution
WITH user_average AS (
SELECT
user_id,
AVG(amount) AS avg_transaction
FROM transactions
GROUP BY user_id)

SELECT
user_id,
avg_transaction,
DENSE_RANK() OVER (ORDER BY avg_transaction DESC) AS user_rank
FROM user_average
ORDER BY user_rank;
• - MySQL solution
WITH user_average AS (
SELECT
user_id,
AVG(amount) AS avg_transaction
FROM transactions
GROUP BY user_id)

SELECT
user_id,
avg_transaction,
DENSE_RANK() OVER (ORDER BY avg_transaction DESC) AS user_rank
FROM user_average
ORDER BY user_rank;
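The question requires tied users to share a rank while the next rank stays consecutive, which is DENSE_RANK() behavior; RANK() skips values after a tie. A SQLite sketch makes the difference visible (window functions need SQLite 3.25+; user 5000 is an extra, hypothetical row appended to the sample data so a rank follows the tie):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (transaction_id INT, user_id INT,
                           transaction_date TEXT, amount REAL);
INSERT INTO transactions VALUES
 (1, 1000, '2021-01-25', 50), (2, 1000, '2021-03-02', 150),
 (3, 2000, '2021-03-04', 300), (4, 3000, '2021-04-15', 100),
 (5, 2000, '2021-04-18', 200), (6, 3000, '2021-05-05', 100),
 (7, 4000, '2021-05-10', 500), (8, 5000, '2021-05-12', 50);
""")

# Users 1000 and 3000 tie at avg 100: both rank functions give them 3,
# but the next user gets 5 under RANK() and 4 under DENSE_RANK().
rows = conn.execute("""
WITH user_average AS (
    SELECT user_id, AVG(amount) AS avg_transaction
    FROM transactions
    GROUP BY user_id
)
SELECT user_id,
       RANK()       OVER (ORDER BY avg_transaction DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY avg_transaction DESC) AS dense_rnk
FROM user_average
ORDER BY rnk, user_id
""").fetchall()
print(rows)
# [(4000, 1, 1), (2000, 2, 2), (1000, 3, 3), (3000, 3, 3), (5000, 5, 4)]
```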
• Q.487
Question
Unique Money Transfer Relationships


You are given a table of PayPal payments showing the payer, the recipient, and the amount
paid. A two-way unique relationship is established when two people send money back and
forth. Write a query to find the number of two-way unique relationships in this data.

Explanation
• A unique relationship occurs when two people send money to each other, i.e., there is an
inverse pair of transactions.
• Use INTERSECT to find mutual payment pairs where the payer and recipient are reversed
between two records.
• Divide the count by 2 to avoid double-counting each relationship.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE payments (
payer_id INT,
recipient_id INT,
amount INT
);
• - Datasets
INSERT INTO payments (payer_id, recipient_id, amount)
VALUES
(101, 201, 30),
(201, 101, 10),
(101, 301, 20),
(301, 101, 80),
(201, 301, 70);

Learnings
• Using INTERSECT to find mutual records in two tables.
• Avoiding double counting by dividing by 2 when counting relationships.
• Identifying unique bidirectional relationships between pairs.

Solutions
• - PostgreSQL solution
SELECT COUNT(payer_id) / 2 AS unique_relationships
FROM (
SELECT payer_id, recipient_id
FROM payments
INTERSECT
SELECT recipient_id, payer_id
FROM payments) AS relationships;
• - MySQL solution
SELECT COUNT(*) / 2 AS unique_relationships
FROM (
    SELECT DISTINCT payer_id, recipient_id
    FROM payments
    WHERE (payer_id, recipient_id) IN (
        SELECT recipient_id, payer_id
        FROM payments)) AS relationships;
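SQLite supports INTERSECT, so the PostgreSQL approach can be replayed from Python as a quick check — a sketch on the sample payments data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (payer_id INT, recipient_id INT, amount INT);
INSERT INTO payments VALUES
 (101, 201, 30), (201, 101, 10), (101, 301, 20), (301, 101, 80), (201, 301, 70);
""")

# INTERSECT keeps (payer, recipient) pairs whose reverse pair also exists;
# each two-way relationship then appears twice, hence the division by 2.
n = conn.execute("""
SELECT COUNT(*) / 2
FROM (SELECT payer_id, recipient_id FROM payments
      INTERSECT
      SELECT recipient_id, payer_id FROM payments)
""").fetchone()[0]
print(n)  # 2  (101<->201 and 101<->301; 201->301 is one-way)
```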
• Q.488
Question
Determining High-Value Customers


You are a data analyst at PayPal, and you have been asked to create a report that identifies all
users who have sent payments of more than 1000 or have received payments of more than
5000 in the last month. Additionally, you must filter out any users whose account is flagged
as "fraudulent".

Explanation
• Join the Transactions table with the User table on user_id.
• Filter for transactions within the last month using the transaction_date.
• Filter for users who sent payments greater than 1000 or received payments greater than
5000.
• Exclude users marked as "fraudulent".
• Group by user_id and username to avoid duplicate records.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Transactions (
transaction_id INT,
user_id INT,
transaction_date TIMESTAMP,
transaction_type VARCHAR(10),
amount INT
);

CREATE TABLE User (
user_id INT,
username VARCHAR(50),
is_fraudulent BOOLEAN
);
• - Datasets
INSERT INTO Transactions (transaction_id, user_id, transaction_date, transaction_type, amount)
VALUES
(101, 123, '2022-07-08 00:00:00', 'Sent', 750),
(102, 265, '2022-07-10 00:00:00', 'Received', 6000),
(103, 265, '2022-07-18 00:00:00', 'Sent', 1500),
(104, 362, '2022-07-26 00:00:00', 'Received', 6000),
(105, 981, '2022-07-05 00:00:00', 'Sent', 3000);

INSERT INTO User (user_id, username, is_fraudulent)
VALUES
(123, 'Jessica', false),
(265, 'Daniel', true),
(362, 'Michael', false),
(981, 'Sophia', false);

Learnings
• Using JOIN to combine data from multiple tables based on a common column (user_id).
• Filtering data based on date ranges using CURRENT_DATE - INTERVAL.
• Applying multiple conditions with AND and OR for complex filtering.
• Excluding specific users using boolean flags.

Solutions
• - PostgreSQL solution
-- USER is a reserved word in PostgreSQL, so the table name must be double-quoted
SELECT u.user_id, u.username
FROM Transactions t
JOIN "User" u ON t.user_id = u.user_id
WHERE t.transaction_date > (CURRENT_DATE - INTERVAL '1 month')
AND ((t.transaction_type = 'Sent' AND t.amount > 1000)
OR (t.transaction_type = 'Received' AND t.amount > 5000))
AND u.is_fraudulent = false
GROUP BY u.user_id, u.username;
• - MySQL solution
SELECT u.user_id, u.username
FROM Transactions t
JOIN User u ON t.user_id = u.user_id
WHERE t.transaction_date > (CURDATE() - INTERVAL 1 MONTH)
AND ((t.transaction_type = 'Sent' AND t.amount > 1000)
OR (t.transaction_type = 'Received' AND t.amount > 5000))
AND u.is_fraudulent = false
GROUP BY u.user_id, u.username;
• Q.489
Question
Calculate Click-Through Conversion Rate For PayPal
Given a hypothetical situation where PayPal runs several online marketing campaigns, they
want to monitor the click-through conversion rate for their campaigns. The click-through
conversion rate is the number of users who click on the advertisement and proceed to add a
product (in this case, setting up a new PayPal account) divided by the total number of users
who have clicked the ad.
Calculate the daily click-through conversion rate for the first week of September 2022.

Explanation
• Use the ad_clicks table to identify users who clicked on ads.
• Use the account_setup table to track users who successfully set up accounts after clicking
on ads.
• Calculate the daily click-through conversion rate as:
click-through conversion rate = (total users who set up accounts) / (total users who clicked the ad)
• Apply a LEFT JOIN to combine the two tables and filter the data for the first week of
September 2022.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE ad_clicks (
click_id INT,
user_id INT,
click_time TIMESTAMP,
ad_id INT
);

CREATE TABLE account_setup (
setup_id INT,
user_id INT,
setup_time TIMESTAMP
);
• - Datasets


INSERT INTO ad_clicks (click_id, user_id, click_time, ad_id)
VALUES
(1, 200, '2022-09-01 10:14:00', 4001),
(2, 534, '2022-09-01 11:30:00', 4003),
(3, 120, '2022-09-02 14:43:00', 4001),
(4, 534, '2022-09-03 16:15:00', 4002),
(5, 287, '2022-09-04 17:20:00', 4001);

INSERT INTO account_setup (setup_id, user_id, setup_time)
VALUES
(1, 200, '2022-09-01 10:30:00'),
(2, 287, '2022-09-04 17:40:00'),
(3, 534, '2022-09-01 11:45:00');

Learnings
• Using LEFT JOIN to combine tables while preserving all records from the ad_clicks
table.
• Using COUNT(DISTINCT ...) to ensure unique user counts for clicks and account setups.
• Filtering by date ranges using DATE() and BETWEEN.
• Calculating ratios for conversion rates.

Solutions
• - PostgreSQL solution
SELECT
    ac.click_time::date AS day,
    COUNT(DISTINCT ac.user_id) AS total_clicks,
    COUNT(DISTINCT s.user_id) AS total_setups,
    COUNT(DISTINCT s.user_id)::float / COUNT(DISTINCT ac.user_id) AS click_through_conversion_rate
FROM ad_clicks AS ac
LEFT JOIN account_setup AS s ON ac.user_id = s.user_id
WHERE ac.click_time::date BETWEEN '2022-09-01' AND '2022-09-07'
GROUP BY ac.click_time::date
ORDER BY day;
• - MySQL solution
SELECT
    DATE(ac.click_time) AS day,
    COUNT(DISTINCT ac.user_id) AS total_clicks,
    COUNT(DISTINCT s.user_id) AS total_setups,
    COUNT(DISTINCT s.user_id) / COUNT(DISTINCT ac.user_id) AS click_through_conversion_rate
FROM ad_clicks AS ac
LEFT JOIN account_setup AS s ON ac.user_id = s.user_id
WHERE DATE(ac.click_time) BETWEEN '2022-09-01' AND '2022-09-07'
GROUP BY DATE(ac.click_time)
ORDER BY day;
• Q.490
Calculate the Total Revenue per User
You are working as a data analyst at PayPal and have been tasked with calculating the total
revenue generated by each user. The transactions table contains details of all transactions,


and you need to calculate the total revenue per user, summing the amount of all their
transactions.

Explanation
• Use the SUM() function to calculate the total revenue for each user.
• Group the results by user_id to calculate the sum for each individual user.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE transactions (
transaction_id INT,
user_id INT,
transaction_date DATE,
amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO transactions (transaction_id, user_id, transaction_date, amount)
VALUES
(1, 101, '2022-07-01', 100.00),
(2, 102, '2022-07-02', 250.00),
(3, 101, '2022-07-03', 300.00),
(4, 103, '2022-07-04', 150.00),
(5, 102, '2022-07-05', 50.00);

Learnings
• Using SUM() to calculate the total revenue.
• Grouping results by user_id to aggregate transaction amounts for each user.
• Working with date and amount data types.

Solutions
• - PostgreSQL solution
SELECT
user_id,
SUM(amount) AS total_revenue
FROM
transactions
GROUP BY
user_id
ORDER BY
total_revenue DESC;
• - MySQL solution
SELECT
user_id,
SUM(amount) AS total_revenue
FROM
transactions
GROUP BY
user_id
ORDER BY
total_revenue DESC;
• Q.491

Question
Find the Top 3 Users by Total Transaction Amount


You have been asked to identify the top 3 users who have spent the most money on PayPal
transactions over the last month. The transactions table contains the details of all
transactions made by users. Calculate the total amount spent by each user and list the top 3
users.

Explanation
• Sum the transaction amounts for each user to calculate the total spent.
• Sort the results by total amount in descending order.
• Use LIMIT to return only the top 3 users.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE transactions (
transaction_id INT,
user_id INT,
transaction_date DATE,
amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO transactions (transaction_id, user_id, transaction_date, amount)
VALUES
(1, 101, '2022-08-01', 500.00),
(2, 102, '2022-08-02', 300.00),
(3, 103, '2022-08-03', 700.00),
(4, 101, '2022-08-04', 200.00),
(5, 102, '2022-08-05', 400.00),
(6, 104, '2022-08-06', 100.00);

Learnings
• Using SUM() to calculate the total amount spent.
• Sorting results using ORDER BY in descending order.
• Using LIMIT to restrict the output to top N records.

Solutions
• - PostgreSQL solution
SELECT
user_id,
SUM(amount) AS total_spent
FROM
transactions
WHERE
transaction_date > CURRENT_DATE - INTERVAL '1 month'
GROUP BY
user_id
ORDER BY
total_spent DESC
LIMIT 3;
• - MySQL solution
SELECT
    user_id,
    SUM(amount) AS total_spent
FROM
    transactions
WHERE
    transaction_date > CURDATE() - INTERVAL 1 MONTH
GROUP BY
    user_id
ORDER BY
    total_spent DESC
LIMIT 3;
• Q.492
Identify Users with Inactive Accounts (No Transactions in 3 Months)
You need to identify users who have not made any transactions in the last 3 months. The
transactions table tracks all user transactions, and you need to find those who have been
inactive for 3 months or longer.

Explanation
• Use the MAX() function to find the most recent transaction date for each user.
• Filter users whose most recent transaction is older than 3 months.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE transactions (
transaction_id INT,
user_id INT,
transaction_date DATE,
amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO transactions (transaction_id, user_id, transaction_date, amount)
VALUES
(1, 101, '2022-05-01', 50.00),
(2, 102, '2022-06-01', 100.00),
(3, 103, '2022-07-01', 150.00),
(4, 104, '2022-07-01', 200.00),
(5, 101, '2022-08-01', 75.00);

Learnings
• Using MAX() to find the latest transaction date.
• Filtering data based on time intervals using CURRENT_DATE and INTERVAL.
• Identifying inactive users based on transaction history.

Solutions
• - PostgreSQL solution
SELECT
user_id
FROM
transactions
GROUP BY
user_id
HAVING
MAX(transaction_date) < CURRENT_DATE - INTERVAL '3 months';
• - MySQL solution
SELECT
user_id
FROM
transactions
GROUP BY
user_id
HAVING
MAX(transaction_date) < CURDATE() - INTERVAL 3 MONTH;
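The MAX-plus-HAVING pattern above can be verified with SQLite from Python. To keep the output reproducible, this sketch replaces CURRENT_DATE with a fixed reference date of '2022-10-01' (an assumption for the demo), so the 3-month cutoff works out to '2022-07-01':

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (transaction_id INT, user_id INT,
                           transaction_date TEXT, amount REAL);
INSERT INTO transactions VALUES
 (1, 101, '2022-05-01', 50.00), (2, 102, '2022-06-01', 100.00),
 (3, 103, '2022-07-01', 150.00), (4, 104, '2022-07-01', 200.00),
 (5, 101, '2022-08-01', 75.00);
""")

# Strict < means users whose last transaction falls exactly on the
# cutoff (103, 104) are not flagged as inactive.
inactive = conn.execute("""
SELECT user_id
FROM transactions
GROUP BY user_id
HAVING MAX(transaction_date) < date('2022-10-01', '-3 months')
ORDER BY user_id
""").fetchall()
print(inactive)  # [(102,)]
```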


• Q.493
Question
Identify the Highest Revenue-Generating Products of PayPal
As a data analyst at PayPal, your task is to identify the products which generate the highest
total revenue for each month. Assume that each transaction on PayPal relates to a product
purchased, and the revenue generated is the transaction amount. Each transaction is
timestamped, and the product ID is also recorded.

Explanation
• Use EXTRACT(MONTH FROM transaction_date) to extract the month from the transaction
date.
• Calculate the total revenue for each product by summing the transaction_amount.
• Group by month and product_id to aggregate the total revenue for each product per
month.
• Sort the result by total_revenue in descending order to identify the highest revenue-
generating products.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE transactions (
transaction_id INT,
user_id INT,
transaction_date TIMESTAMP,
product_id INT,
transaction_amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO transactions (transaction_id, user_id, transaction_date, product_id, transaction_amount)
VALUES
(218, 123, '2022-06-08 00:00:00', 50001, 150.00),
(320, 265, '2022-06-12 00:00:00', 69852, 200.00),
(475, 362, '2022-06-21 00:00:00', 50001, 300.00),
(650, 192, '2022-07-06 00:00:00', 69852, 100.00),
(789, 981, '2022-07-05 00:00:00', 69852, 250.00);

Learnings
• Using EXTRACT() to extract parts of a date, such as the month.
• Using SUM() to aggregate transaction amounts for each product.
• Grouping data by multiple fields (month and product).
• Sorting results in descending order to identify top-performing products.

Solutions
• - PostgreSQL solution
SELECT
    EXTRACT(MONTH FROM transaction_date) AS month,
    product_id AS product,
    SUM(transaction_amount) AS total_revenue
FROM
    transactions
GROUP BY
    EXTRACT(MONTH FROM transaction_date),
    product_id
ORDER BY
    total_revenue DESC;
• - MySQL solution
SELECT
MONTH(transaction_date) AS month,
product_id AS product,
SUM(transaction_amount) AS total_revenue
FROM
transactions
GROUP BY
MONTH(transaction_date),
product_id
ORDER BY
total_revenue DESC;
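The month-bucketing aggregation can be checked with SQLite from Python — a sketch under the assumption that strftime('%m', ...) stands in for EXTRACT(MONTH ...) / MONTH(); note it returns the month as a zero-padded string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (transaction_id INT, user_id INT, transaction_date TEXT,
                           product_id INT, transaction_amount REAL);
INSERT INTO transactions VALUES
 (218, 123, '2022-06-08 00:00:00', 50001, 150.00),
 (320, 265, '2022-06-12 00:00:00', 69852, 200.00),
 (475, 362, '2022-06-21 00:00:00', 50001, 300.00),
 (650, 192, '2022-07-06 00:00:00', 69852, 100.00),
 (789, 981, '2022-07-05 00:00:00', 69852, 250.00);
""")

# Group by (month, product) and rank the buckets by revenue.
rows = conn.execute("""
SELECT strftime('%m', transaction_date) AS month,
       product_id,
       SUM(transaction_amount) AS total_revenue
FROM transactions
GROUP BY month, product_id
ORDER BY total_revenue DESC
""").fetchall()
print(rows)  # [('06', 50001, 450.0), ('07', 69852, 350.0), ('06', 69852, 200.0)]
```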
• Q.494
Question
Analyzing User Transaction Data
You're given two tables - "Users" and "Transactions". The "Users" table records PayPal's user
base. Each row represents a different user, and includes fields for the user_id and
signup_date. The "Transactions" table records transactions made by these users. Each row
represents a different transaction and includes fields for transaction_id, user_id,
transaction_date, and transaction_amount.
Write a SQL query that calculates the total and average transaction amount for all
transactions for each user. Include only users who have made at least two transactions.

Explanation
• Use the SUM() function to calculate the total transaction amount for each user.
• Use the AVG() function to calculate the average transaction amount for each user.
• Use COUNT() in the HAVING clause to filter users who have made at least two transactions.
• Group the results by user_id to aggregate the data for each user.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Users (
user_id INT,
signup_date DATE
);

CREATE TABLE Transactions (
transaction_id INT,
user_id INT,
transaction_date DATE,
transaction_amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO Users (user_id, signup_date)
VALUES
(1, '2020-01-30'),
(2, '2020-02-15'),
(3, '2020-03-20'),
(4, '2020-04-01');

INSERT INTO Transactions (transaction_id, user_id, transaction_date, transaction_amount)
VALUES
(101, 1, '2020-02-01', 50.00),
(102, 1, '2020-02-02', 100.00),
(103, 2, '2020-02-20', 200.00),
(104, 2, '2020-02-25', 500.00),
(105, 3, '2020-03-25', 100.00),
(106, 4, '2020-05-05', 300.00);

Learnings
• Using SUM() and AVG() to calculate total and average values.
• Filtering results with HAVING to ensure only users with at least two transactions are
included.
• Grouping data by user_id for aggregation.

Solutions
• - PostgreSQL solution
SELECT
t.user_id,
SUM(t.transaction_amount) AS total_amount,
AVG(t.transaction_amount) AS average_amount
FROM
Transactions t
GROUP BY
t.user_id
HAVING
COUNT(t.transaction_id) >= 2;
• - MySQL solution
SELECT
t.user_id,
SUM(t.transaction_amount) AS total_amount,
AVG(t.transaction_amount) AS average_amount
FROM
Transactions t
GROUP BY
t.user_id
HAVING
COUNT(t.transaction_id) >= 2;
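The key point above — HAVING runs after aggregation, so it can filter on COUNT() — can be demonstrated with SQLite from Python (a sketch on the sample data; users 3 and 4 have one transaction each and are dropped):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (transaction_id INT, user_id INT,
                           transaction_date TEXT, transaction_amount REAL);
INSERT INTO Transactions VALUES
 (101, 1, '2020-02-01', 50.00), (102, 1, '2020-02-02', 100.00),
 (103, 2, '2020-02-20', 200.00), (104, 2, '2020-02-25', 500.00),
 (105, 3, '2020-03-25', 100.00), (106, 4, '2020-05-05', 300.00);
""")

rows = conn.execute("""
SELECT user_id,
       SUM(transaction_amount) AS total_amount,
       AVG(transaction_amount) AS average_amount
FROM Transactions
GROUP BY user_id
HAVING COUNT(transaction_id) >= 2
ORDER BY user_id
""").fetchall()
print(rows)  # [(1, 150.0, 75.0), (2, 700.0, 350.0)]
```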
• Q.495
Question
Finding Unused Coupons:
Using a table of transactions (transaction_id, user_id, coupon_code, amount) and a table of
coupons (coupon_code, discount_amount, expiration_date), write a query to identify the
coupons that have been created but have never been used in a transaction.

Explanation
• The goal is to find coupons that are in the coupons table but have not appeared in the
transactions table.
• Use a LEFT JOIN to combine the two tables and identify coupons without matching entries
in the transactions table (i.e., where the coupon_code is NULL in the transactions table).
• Only include coupons that exist in the coupons table but not in any transaction.

Datasets and SQL Schemas


• - Table creation

616
1000+ SQL Interview Questions & Answers | By Zero Analyst

CREATE TABLE transactions (


transaction_id INT,
user_id INT,
coupon_code VARCHAR(50),
amount DECIMAL(10, 2)
);

CREATE TABLE coupons (
coupon_code VARCHAR(50),
discount_amount DECIMAL(10, 2),
expiration_date DATE
);
• - Datasets
INSERT INTO transactions (transaction_id, user_id, coupon_code, amount)
VALUES
(1, 101, 'COUPON123', 150.00),
(2, 102, 'COUPON456', 200.00),
(3, 103, 'COUPON789', 50.00);

INSERT INTO coupons (coupon_code, discount_amount, expiration_date)
VALUES
('COUPON123', 10.00, '2022-12-31'),
('COUPON456', 20.00, '2022-12-31'),
('COUPON999', 15.00, '2023-06-30');

Learnings
• Using LEFT JOIN to identify records with no matching data from the right table.
• Filtering for NULL values in a LEFT JOIN to find unused coupons.
• Identifying records that exist in one table but not the other.

Solutions
• - PostgreSQL solution
SELECT
c.coupon_code
FROM
coupons c
LEFT JOIN
transactions t ON c.coupon_code = t.coupon_code
WHERE
t.transaction_id IS NULL;
• - MySQL solution
SELECT
c.coupon_code
FROM
coupons c
LEFT JOIN
transactions t ON c.coupon_code = t.coupon_code
WHERE
t.transaction_id IS NULL;
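This LEFT JOIN anti-join pattern is easy to verify on the sample data. The sketch below (a demo assumption using SQLite in place of PostgreSQL/MySQL) shows that only the coupon absent from transactions comes back; note that 'COUPON789', which appears in transactions but not in coupons, is correctly ignored because the join is driven from the coupons side.

```python
import sqlite3

# Anti-join demo: coupons with no matching transaction row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        transaction_id INT, user_id INT,
        coupon_code VARCHAR(50), amount DECIMAL(10, 2));
    CREATE TABLE coupons (
        coupon_code VARCHAR(50), discount_amount DECIMAL(10, 2),
        expiration_date DATE);
    INSERT INTO transactions VALUES
        (1, 101, 'COUPON123', 150.00),
        (2, 102, 'COUPON456', 200.00),
        (3, 103, 'COUPON789', 50.00);
    INSERT INTO coupons VALUES
        ('COUPON123', 10.00, '2022-12-31'),
        ('COUPON456', 20.00, '2022-12-31'),
        ('COUPON999', 15.00, '2023-06-30');
""")
rows = conn.execute("""
    SELECT c.coupon_code
    FROM coupons c
    LEFT JOIN transactions t ON c.coupon_code = t.coupon_code
    WHERE t.transaction_id IS NULL;
""").fetchall()
print(rows)  # only the coupon that never appears in any transaction
```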
• Q.496
Question
Analyzing Payment Failures:
Using a table of payments (payment_id, user_id, payment_method, amount, payment_status,
payment_date), write a query to calculate the failure rate of payments, broken down by
payment method (e.g., PayPal, credit card), for the last month. payment_status can be
'success', 'failed', or 'pending'.


Explanation
• The failure rate is calculated as the number of failed payments divided by the total number
of payments for each payment method.
• We will filter the payments for the last month using the payment_date field.
• Use COUNT() to count the total and failed payments, and then calculate the failure rate by
dividing the count of failed payments by the total count of payments for each method.
• Group the results by payment_method to break down the failure rate by method.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE payments (
payment_id INT,
user_id INT,
payment_method VARCHAR(50),
amount DECIMAL(10, 2),
payment_status VARCHAR(20),
payment_date DATE
);
• - Datasets
INSERT INTO payments (payment_id, user_id, payment_method, amount, payment_status, payment_date)
VALUES
(1, 101, 'PayPal', 50.00, 'success', '2022-08-01'),
(2, 102, 'credit card', 100.00, 'failed', '2022-08-02'),
(3, 103, 'PayPal', 150.00, 'failed', '2022-08-05'),
(4, 104, 'credit card', 200.00, 'success', '2022-08-06'),
(5, 105, 'PayPal', 75.00, 'success', '2022-08-10'),
(6, 106, 'credit card', 80.00, 'failed', '2022-08-15');

Learnings
• Using COUNT() for counting occurrences of specific conditions (e.g., success and failed
payments).
• Filtering results by date ranges using WHERE and CURRENT_DATE - INTERVAL '1 month'.
• Grouping results by payment_method to calculate the failure rate for each method.

Solutions
• - PostgreSQL solution
SELECT
payment_method,
COUNT(CASE WHEN payment_status = 'failed' THEN 1 END) * 1.0 / COUNT(payment_id) AS failure_rate
FROM
payments
WHERE
payment_date > CURRENT_DATE - INTERVAL '1 month'
GROUP BY
payment_method;
• - MySQL solution
SELECT
payment_method,
COUNT(CASE WHEN payment_status = 'failed' THEN 1 END) * 1.0 / COUNT(payment_id) AS failure_rate
FROM
payments
WHERE
payment_date > CURDATE() - INTERVAL 1 MONTH
GROUP BY
payment_method;
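The CASE-inside-COUNT trick can be checked locally. SQLite has no INTERVAL arithmetic, so this sketch (a demo assumption, not the book's solution) pins "last month" down to a hardcoded August 2022 range; the failure-rate expression itself is unchanged.

```python
import sqlite3

# Failure rate per payment method: failed count divided by total count.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (
        payment_id INT, user_id INT, payment_method VARCHAR(50),
        amount DECIMAL(10, 2), payment_status VARCHAR(20), payment_date DATE);
    INSERT INTO payments VALUES
        (1, 101, 'PayPal', 50.00, 'success', '2022-08-01'),
        (2, 102, 'credit card', 100.00, 'failed', '2022-08-02'),
        (3, 103, 'PayPal', 150.00, 'failed', '2022-08-05'),
        (4, 104, 'credit card', 200.00, 'success', '2022-08-06'),
        (5, 105, 'PayPal', 75.00, 'success', '2022-08-10'),
        (6, 106, 'credit card', 80.00, 'failed', '2022-08-15');
""")
rows = conn.execute("""
    SELECT payment_method,
           COUNT(CASE WHEN payment_status = 'failed' THEN 1 END) * 1.0
               / COUNT(payment_id) AS failure_rate
    FROM payments
    WHERE payment_date BETWEEN '2022-08-01' AND '2022-08-31'  -- stand-in for "last month"
    GROUP BY payment_method;
""").fetchall()
print(dict(rows))  # 1 of 3 PayPal payments failed; 2 of 3 credit card payments failed
```

The `* 1.0` forces floating-point division; without it, integer division would round every rate down to 0.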
• Q.497
Question
Identifying Users Who Have Never Made a Payment:
Given a table of users (user_id, registration_date) and a table of transactions (transaction_id,
user_id, amount, transaction_date), write a query to find all users who have registered but
have never made a payment.

Explanation
• To find users who have never made a payment, we need to identify users in the users table
who do not have a corresponding record in the transactions table.
• We can achieve this by using a LEFT JOIN between the users table and the transactions
table, and then filtering for rows where no transaction exists (i.e., the transaction_id is
NULL).
• This will give us a list of users who have registered but have not made any payments.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE users (
user_id INT,
registration_date DATE
);

CREATE TABLE transactions (


transaction_id INT,
user_id INT,
amount DECIMAL(10, 2),
transaction_date DATE
);
• - Datasets
INSERT INTO users (user_id, registration_date)
VALUES
(1, '2022-01-15'),
(2, '2022-02-20'),
(3, '2022-03-05'),
(4, '2022-04-10');

INSERT INTO transactions (transaction_id, user_id, amount, transaction_date)


VALUES
(101, 1, 50.00, '2022-02-10'),
(102, 2, 100.00, '2022-02-15');

Learnings
• Using LEFT JOIN to include all records from the users table, and matching records from
the transactions table.
• Filtering results with WHERE transaction_id IS NULL to identify users with no
transactions.
• Understanding how LEFT JOIN works when looking for non-matching records.

Solutions
• - PostgreSQL solution


SELECT
u.user_id
FROM
users u
LEFT JOIN
transactions t ON u.user_id = t.user_id
WHERE
t.transaction_id IS NULL;
• - MySQL solution
SELECT
u.user_id
FROM
users u
LEFT JOIN
transactions t ON u.user_id = t.user_id
WHERE
t.transaction_id IS NULL;
• Q.498
Question
Detecting Fraudulent Transactions:
Using a table of transactions (transaction_id, user_id, amount, transaction_date), write a
query to detect users who have made more than 5 transactions in a single day, each exceeding
$500.

Explanation
• To detect potentially fraudulent transactions, we need to identify users who have made
more than 5 transactions on the same day where each transaction exceeds $500.
• Use COUNT() to count the number of transactions for each user per day where the
transaction amount is greater than $500.
• Use GROUP BY to group the data by user_id and transaction_date.
• Filter the results with a HAVING clause to include only users who made more than 5
transactions on the same day.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE transactions (
transaction_id INT,
user_id INT,
amount DECIMAL(10, 2),
transaction_date DATE
);
• - Datasets
INSERT INTO transactions (transaction_id, user_id, amount, transaction_date)
VALUES
(1, 101, 600.00, '2022-08-01'),
(2, 101, 700.00, '2022-08-01'),
(3, 101, 550.00, '2022-08-01'),
(4, 101, 800.00, '2022-08-01'),
(5, 101, 900.00, '2022-08-01'),
(6, 101, 650.00, '2022-08-01'),
(7, 102, 300.00, '2022-08-01'),
(8, 102, 200.00, '2022-08-02'),
(9, 103, 700.00, '2022-08-02'),
(10, 103, 800.00, '2022-08-02');

Learnings


• Using COUNT() and HAVING to filter groups based on the count of qualifying records (in
this case, transactions greater than $500).
• Grouping data by both user_id and transaction_date to analyze transactions by day for
each user.
• Understanding how to filter for users who meet specific conditions (e.g., more than 5
qualifying transactions).

Solutions
• - PostgreSQL solution
SELECT
user_id,
transaction_date,
COUNT(transaction_id) AS num_transactions
FROM
transactions
WHERE
amount > 500
GROUP BY
user_id, transaction_date
HAVING
COUNT(transaction_id) > 5;
• - MySQL solution
SELECT
user_id,
transaction_date,
COUNT(transaction_id) AS num_transactions
FROM
transactions
WHERE
amount > 500
GROUP BY
user_id, transaction_date
HAVING
COUNT(transaction_id) > 5;
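A point worth internalizing here is the order of operations: WHERE removes the sub-$500 rows before grouping, and HAVING then filters the per-user-per-day groups. The sketch below (a demo assumption using SQLite) confirms that only user 101 trips the rule on the sample data.

```python
import sqlite3

# Fraud-flagging demo: >5 transactions over $500 by one user on one day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        transaction_id INT, user_id INT,
        amount DECIMAL(10, 2), transaction_date DATE);
    INSERT INTO transactions VALUES
        (1, 101, 600.00, '2022-08-01'), (2, 101, 700.00, '2022-08-01'),
        (3, 101, 550.00, '2022-08-01'), (4, 101, 800.00, '2022-08-01'),
        (5, 101, 900.00, '2022-08-01'), (6, 101, 650.00, '2022-08-01'),
        (7, 102, 300.00, '2022-08-01'), (8, 102, 200.00, '2022-08-02'),
        (9, 103, 700.00, '2022-08-02'), (10, 103, 800.00, '2022-08-02');
""")
rows = conn.execute("""
    SELECT user_id, transaction_date, COUNT(transaction_id) AS num_transactions
    FROM transactions
    WHERE amount > 500
    GROUP BY user_id, transaction_date
    HAVING COUNT(transaction_id) > 5;
""").fetchall()
print(rows)  # only user 101 made six >$500 transactions on a single day
```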
• Q.500
Question
Detecting Payment Spikes
Given a table of payments (payment_id, user_id, payment_method, amount, payment_date),
write a query to detect users who have made more than 3 payments in a single day with a
cumulative amount greater than $10,000.

Explanation
• The goal is to identify users who made multiple payments in a single day, and the total
amount of those payments exceeds $10,000.
• Use SUM() to calculate the total payment amount for each user per day.
• Use COUNT() to ensure that more than 3 payments are made.
• Group by user_id and payment_date and filter for groups where the total payment
amount exceeds $10,000 and the count of payments exceeds 3.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE payments (
payment_id INT,
user_id INT,
payment_method VARCHAR(50),
amount DECIMAL(10, 2),
payment_date DATE
);
• - Datasets
INSERT INTO payments (payment_id, user_id, payment_method, amount, payment_date)
VALUES
(1, 101, 'credit card', 3000.00, '2022-08-01'),
(2, 101, 'PayPal', 4000.00, '2022-08-01'),
(3, 101, 'credit card', 5000.00, '2022-08-01'),
(4, 101, 'PayPal', 2000.00, '2022-08-01'),
(5, 102, 'credit card', 1000.00, '2022-08-01'),
(6, 103, 'credit card', 15000.00, '2022-08-01'),
(7, 103, 'PayPal', 2000.00, '2022-08-02');

Learnings
• Using SUM() to calculate the total amount of payments.
• Using COUNT() to filter for users who made more than 3 payments.
• Grouping by both user_id and payment_date to aggregate payments by day.

Solutions
• - PostgreSQL solution
SELECT
user_id,
payment_date,
COUNT(payment_id) AS num_payments,
SUM(amount) AS total_amount
FROM
payments
GROUP BY
user_id, payment_date
HAVING
COUNT(payment_id) > 3 AND SUM(amount) > 10000;
• - MySQL solution
SELECT
user_id,
payment_date,
COUNT(payment_id) AS num_payments,
SUM(amount) AS total_amount
FROM
payments
GROUP BY
user_id, payment_date
HAVING
COUNT(payment_id) > 3 AND SUM(amount) > 10000;
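Because both conditions live in HAVING, a user must satisfy them simultaneously: user 103's single $15,000 payment clears the amount threshold but not the count. The sketch below (a demo assumption using SQLite) shows only user 101 qualifying.

```python
import sqlite3

# Payment-spike demo: more than 3 payments AND more than $10,000 in one day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (
        payment_id INT, user_id INT, payment_method VARCHAR(50),
        amount DECIMAL(10, 2), payment_date DATE);
    INSERT INTO payments VALUES
        (1, 101, 'credit card', 3000.00, '2022-08-01'),
        (2, 101, 'PayPal', 4000.00, '2022-08-01'),
        (3, 101, 'credit card', 5000.00, '2022-08-01'),
        (4, 101, 'PayPal', 2000.00, '2022-08-01'),
        (5, 102, 'credit card', 1000.00, '2022-08-01'),
        (6, 103, 'credit card', 15000.00, '2022-08-01'),
        (7, 103, 'PayPal', 2000.00, '2022-08-02');
""")
rows = conn.execute("""
    SELECT user_id, payment_date,
           COUNT(payment_id) AS num_payments,
           SUM(amount) AS total_amount
    FROM payments
    GROUP BY user_id, payment_date
    HAVING COUNT(payment_id) > 3 AND SUM(amount) > 10000;
""").fetchall()
print(rows)  # user 103's one huge payment fails the count condition
```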

PwC
• Q.501
Question
Given a table of products where each row indicates a price change for a product on a specific
date, write a SQL query to find the prices of all products on 2019-08-16. Assume that the
price of all products before any change is 10.

Explanation


• We need to track the price changes for each product and determine the price on the specific
date 2019-08-16.
• If a product had multiple price changes before 2019-08-16, the most recent change before
that date will determine the price.
• If a product had no price change before that date, it should have the default price of 10.
• The task requires filtering the products based on the change date and joining the records to
get the price for the given date.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Products (
product_id INT,
new_price INT,
change_date DATE,
PRIMARY KEY (product_id, change_date)
);
• - Datasets
INSERT INTO Products (product_id, new_price, change_date)
VALUES
(1, 20, '2019-08-14'),
(2, 50, '2019-08-14'),
(1, 30, '2019-08-15'),
(1, 35, '2019-08-16'),
(2, 65, '2019-08-17'),
(3, 20, '2019-08-18');

Learnings
• Using a correlated subquery with ORDER BY change_date DESC LIMIT 1 to pick the most
recent price change on or before the target date.
• Applying COALESCE() to fall back to the default price (10) for products with no change
on or before that date.
• Avoiding MAX(new_price): it returns the highest price ever set before the date, not the
latest one, and the two only coincide when prices never decrease.

Solutions
• - PostgreSQL / MySQL solution
SELECT
    p.product_id,
    COALESCE(
        (SELECT pr.new_price
         FROM Products pr
         WHERE pr.product_id = p.product_id
           AND pr.change_date <= '2019-08-16'
         ORDER BY pr.change_date DESC
         LIMIT 1),
        10
    ) AS price_on_2019_08_16
FROM
    (SELECT DISTINCT product_id FROM Products) p;
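The pitfall in this problem is confusing "largest price so far" with "latest price so far": a correlated subquery ordered by change_date (newest first, LIMIT 1) returns the latest change, while MAX(new_price) would return the highest. The sketch below (a demo assumption using SQLite in place of PostgreSQL/MySQL) runs the correlated-subquery approach against the sample data.

```python
import sqlite3

# Price-as-of-date demo: latest change on or before 2019-08-16, default 10.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (
        product_id INT, new_price INT, change_date DATE,
        PRIMARY KEY (product_id, change_date));
    INSERT INTO Products VALUES
        (1, 20, '2019-08-14'), (2, 50, '2019-08-14'),
        (1, 30, '2019-08-15'), (1, 35, '2019-08-16'),
        (2, 65, '2019-08-17'), (3, 20, '2019-08-18');
""")
rows = conn.execute("""
    SELECT p.product_id,
           COALESCE(
               (SELECT pr.new_price
                FROM Products pr
                WHERE pr.product_id = p.product_id
                  AND pr.change_date <= '2019-08-16'
                ORDER BY pr.change_date DESC
                LIMIT 1),
               10) AS price_on_2019_08_16
    FROM (SELECT DISTINCT product_id FROM Products) p
    ORDER BY p.product_id;
""").fetchall()
print(rows)  # product 3's first change is on 2019-08-18, so it keeps the default 10
```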
• Q.502
Question


Given a table Queue where each row represents a person waiting to board a bus, and the bus
has a weight limit of 1000 kilograms, write a SQL query to find the name of the last person
that can board the bus without exceeding the weight limit. The turn column determines the
boarding order, and the weight column contains the weight of each person. The test cases are
generated such that the first person does not exceed the weight limit.

Explanation
• We need to iterate through the people in the queue and calculate the cumulative weight as
each person boards the bus.
• We stop once adding the next person would exceed the bus weight limit of 1000 kilograms.
• We must identify the last person who can board the bus without exceeding the weight
limit.
• This requires keeping track of the cumulative weight and finding the last valid person in
the sequence.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Queue (
person_id INT,
person_name VARCHAR(255),
weight INT,
turn INT
);
• - Datasets
INSERT INTO Queue (person_id, person_name, weight, turn)
VALUES
(1, 'Alice', 200, 1),
(2, 'Bob', 300, 2),
(3, 'Charlie', 500, 3),
(4, 'David', 400, 4),
(5, 'Eva', 250, 5);

Learnings
• Using SUM() with OVER() to calculate the cumulative sum of weights for each person.
• Filtering based on the cumulative sum to ensure that we don't exceed the weight limit of
1000 kg.
• Identifying the last person who can board using window functions or conditional
aggregation.

Solutions
• - PostgreSQL solution
WITH Cumulative_Weight AS (
SELECT
person_name,
weight,
turn,
SUM(weight) OVER (ORDER BY turn) AS total_weight
FROM Queue
)
SELECT
person_name
FROM
Cumulative_Weight
WHERE
total_weight <= 1000
ORDER BY
turn DESC
LIMIT 1;
• - MySQL solution
WITH Cumulative_Weight AS (
SELECT
person_name,
weight,
turn,
SUM(weight) OVER (ORDER BY turn) AS total_weight
FROM Queue
)
SELECT
person_name
FROM
Cumulative_Weight
WHERE
total_weight <= 1000
ORDER BY
turn DESC
LIMIT 1;
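The running total produced by SUM(weight) OVER (ORDER BY turn) is 200, 500, 1000, 1400, 1650 for the sample queue, so boarding stops exactly at Charlie. The sketch below is a demo assumption: it runs the same CTE in SQLite, which supports window functions from version 3.25 (the build bundled with modern Python releases).

```python
import sqlite3

# Cumulative-sum window demo: last person whose running total stays <= 1000.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Queue (
        person_id INT, person_name VARCHAR(255), weight INT, turn INT);
    INSERT INTO Queue VALUES
        (1, 'Alice', 200, 1), (2, 'Bob', 300, 2), (3, 'Charlie', 500, 3),
        (4, 'David', 400, 4), (5, 'Eva', 250, 5);
""")
last_person = conn.execute("""
    WITH Cumulative_Weight AS (
        SELECT person_name, turn,
               SUM(weight) OVER (ORDER BY turn) AS total_weight
        FROM Queue)
    SELECT person_name
    FROM Cumulative_Weight
    WHERE total_weight <= 1000
    ORDER BY turn DESC
    LIMIT 1;
""").fetchone()[0]
print(last_person)
```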
• Q.503
Question
Given a table Accounts with the columns account_id and income, write a SQL query to
calculate the number of bank accounts for each salary category. The salary categories are as
follows:
• "Low Salary": All salaries strictly less than $20,000.
• "Average Salary": All salaries in the inclusive range [$20,000, $50,000].
• "High Salary": All salaries strictly greater than $50,000.
The result table must contain all three categories, even if some categories have zero accounts.

Explanation
• We need to categorize the bank accounts into three salary ranges.
• We count how many accounts fall into each category and return the count for each.
• If a category has no accounts, we should still return 0 for that category.
• This problem involves conditional aggregation based on income ranges.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Accounts (
account_id INT PRIMARY KEY,
income INT
);
• - Datasets
INSERT INTO Accounts (account_id, income)
VALUES
(3, 108939),
(2, 12747),
(8, 87709),
(6, 91796);

Learnings


• Using CASE WHEN statements to categorize the data.


• Using COUNT() for conditional aggregation to count the number of records in each
category.
• Ensuring all categories are represented even if there are no records in a category (achieved
using UNION ALL).

Solutions
• - PostgreSQL solution
SELECT 'Low Salary' AS category, COUNT(*) AS accounts_count
FROM Accounts
WHERE income < 20000
UNION ALL
SELECT 'Average Salary' AS category, COUNT(*) AS accounts_count
FROM Accounts
WHERE income BETWEEN 20000 AND 50000
UNION ALL
SELECT 'High Salary' AS category, COUNT(*) AS accounts_count
FROM Accounts
WHERE income > 50000;
• - MySQL solution
SELECT 'Low Salary' AS category, COUNT(*) AS accounts_count
FROM Accounts
WHERE income < 20000
UNION ALL
SELECT 'Average Salary' AS category, COUNT(*) AS accounts_count
FROM Accounts
WHERE income BETWEEN 20000 AND 50000
UNION ALL
SELECT 'High Salary' AS category, COUNT(*) AS accounts_count
FROM Accounts
WHERE income > 50000;
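The reason for UNION ALL rather than a plain GROUP BY over a CASE expression is exactly the empty bucket: a GROUP BY would silently omit 'Average Salary' here, while the three fixed SELECTs always emit all three rows. The sketch below (a demo assumption using SQLite) shows the zero-count category surviving.

```python
import sqlite3

# Salary-bucket demo: every category appears, even when its count is 0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Accounts (account_id INT PRIMARY KEY, income INT);
    INSERT INTO Accounts VALUES (3, 108939), (2, 12747), (8, 87709), (6, 91796);
""")
rows = conn.execute("""
    SELECT 'Low Salary' AS category, COUNT(*) FROM Accounts WHERE income < 20000
    UNION ALL
    SELECT 'Average Salary', COUNT(*) FROM Accounts WHERE income BETWEEN 20000 AND 50000
    UNION ALL
    SELECT 'High Salary', COUNT(*) FROM Accounts WHERE income > 50000;
""").fetchall()
print(dict(rows))  # the empty 'Average Salary' bucket still shows up
```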
• Q.504
Question
Given a table employee with the columns employee_id, first_name, last_name,
department, and salary, write a PostgreSQL query to rank employees within their
respective departments based on their salary, in descending order. The employee with the
highest salary in a department should have a rank of 1.

Explanation
• We need to rank employees within each department based on their salary, where the
highest salary in each department should get rank 1.
• This can be accomplished using the RANK() window function, partitioned by department
and ordered by salary in descending order.
• Employees within the same department who have the same salary should receive the same
rank.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE employee (
employee_id INT,
first_name VARCHAR(255),
last_name VARCHAR(255),
department VARCHAR(255),
salary INT
);
• - Datasets
INSERT INTO employee (employee_id, first_name, last_name, department, salary)
VALUES
(101, 'John', 'Doe', 'IT', 50000),
(102, 'Jane', 'Smith', 'Accounting', 60000),
(103, 'Mary', 'Johnson', 'Marketing', 55000),
(104, 'James', 'Brown', 'IT', 70000),
(105, 'Patricia', 'Jones', 'Accounting', 65000);

Learnings
• Using the RANK() window function to assign ranks based on sorting criteria.
• Using PARTITION BY to calculate ranks within each department separately.
• Handling ties in salary with RANK(), ensuring employees with equal salaries receive the
same rank.

Solutions
• - PostgreSQL & MySQL solution
SELECT
employee_id,
first_name,
last_name,
department,
salary,
RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS salary_rank -- "rank" is a reserved word in MySQL 8+, so use a different alias
FROM employee;
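RANK() gives equal salaries the same rank and skips the following rank numbers, unlike DENSE_RANK(). The sketch below is a demo assumption: it runs the same window query in SQLite (3.25+), aliasing the rank column as salary_rank since bare "rank" is reserved in MySQL 8+.

```python
import sqlite3

# Per-department salary ranking with RANK() OVER (PARTITION BY ...).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_id INT, first_name VARCHAR(255), last_name VARCHAR(255),
        department VARCHAR(255), salary INT);
    INSERT INTO employee VALUES
        (101, 'John', 'Doe', 'IT', 50000),
        (102, 'Jane', 'Smith', 'Accounting', 60000),
        (103, 'Mary', 'Johnson', 'Marketing', 55000),
        (104, 'James', 'Brown', 'IT', 70000),
        (105, 'Patricia', 'Jones', 'Accounting', 65000);
""")
rows = conn.execute("""
    SELECT department, first_name,
           RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
    FROM employee
    ORDER BY department, salary_rank;
""").fetchall()
print(rows)  # each department's top earner gets rank 1
```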

• Q.505
Question
Given a table of PwC employee salary information, write a SQL query to find the top 3
highest-paid employees in each department.

Explanation
• We need to rank employees within each department based on their salary, selecting the top
3 highest-paid employees for each department.
• This can be achieved using the ROW_NUMBER() window function, which will assign a
unique rank to each employee based on their salary within each department.
• By filtering out employees with a rank greater than 3, we can retrieve only the top 3
employees for each department.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE employee (
employee_id INT,
name VARCHAR(255),
salary INT,
department_id INT
);
• - Table creation for department
CREATE TABLE department (
department_id INT,
department_name VARCHAR(255)
);
• - Datasets for employee table
INSERT INTO employee (employee_id, name, salary, department_id)
VALUES
(1, 'Emma Thompson', 3800, 1),
(2, 'Daniel Rodriguez', 2230, 1),
(3, 'Olivia Smith', 2000, 1),
(4, 'Noah Johnson', 6800, 2),
(5, 'Sophia Martinez', 1750, 1),
(8, 'William Davis', 6800, 2),
(10, 'James Anderson', 4000, 1);
• - Datasets for department table
INSERT INTO department (department_id, department_name)
VALUES
(1, 'Data Analytics'),
(2, 'Data Science');

Learnings
• Using the ROW_NUMBER() window function to rank employees within each department
based on salary.
• Using PARTITION BY to create ranks within each department.
• Filtering results to show only the top 3 employees by using the rank condition.

Solutions
• - PostgreSQL solution
WITH RankedEmployees AS (
SELECT
e.employee_id,
e.name,
e.salary,
e.department_id,
ROW_NUMBER() OVER (
PARTITION BY e.department_id
ORDER BY e.salary DESC
) AS rank
FROM employee e
)
SELECT
re.employee_id,
re.name,
re.salary,
d.department_name
FROM RankedEmployees re
JOIN department d ON re.department_id = d.department_id
WHERE re.rank <= 3
ORDER BY re.department_id, re.rank;
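One subtlety with ROW_NUMBER() is that ties (here, the two 6800 salaries in Data Science) are broken arbitrarily unless the ORDER BY is made deterministic. The sketch below is a demo assumption: it runs the same top-3 query in SQLite and adds an employee_id tiebreak, which is not part of the book's solution, purely so the output order is reproducible.

```python
import sqlite3

# Top-3 per department via ROW_NUMBER(), with a deterministic tiebreak.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_id INT, name VARCHAR(255), salary INT, department_id INT);
    CREATE TABLE department (department_id INT, department_name VARCHAR(255));
    INSERT INTO employee VALUES
        (1, 'Emma Thompson', 3800, 1), (2, 'Daniel Rodriguez', 2230, 1),
        (3, 'Olivia Smith', 2000, 1), (4, 'Noah Johnson', 6800, 2),
        (5, 'Sophia Martinez', 1750, 1), (8, 'William Davis', 6800, 2),
        (10, 'James Anderson', 4000, 1);
    INSERT INTO department VALUES (1, 'Data Analytics'), (2, 'Data Science');
""")
rows = conn.execute("""
    WITH RankedEmployees AS (
        SELECT e.*, ROW_NUMBER() OVER (
                   PARTITION BY e.department_id
                   ORDER BY e.salary DESC, e.employee_id) AS rn
        FROM employee e)
    SELECT d.department_name, re.name, re.salary
    FROM RankedEmployees re
    JOIN department d ON re.department_id = d.department_id
    WHERE re.rn <= 3
    ORDER BY re.department_id, re.rn;
""").fetchall()
print(rows)  # three Data Analytics rows, then both Data Science employees
```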
• Q.506
Question
How can you determine which records in one table are not present in another table?

Explanation
To find records in one table that are not present in another, there are a few SQL methods you
can use:


• Using LEFT JOIN: You can perform a LEFT JOIN between the two tables and check for
NULL values in the right-side table. If a record in the left table does not have a match in the
right table, it will show NULL for the right-side columns.
• Using EXCEPT: The EXCEPT operator returns the rows that are in the first query result but
not in the second. It is supported by PostgreSQL, SQL Server, and some other databases.

Datasets and SQL Schemas


• - Table creation for PwC Employees
CREATE TABLE pwc_employees (
id INT,
first_name VARCHAR(255),
last_name VARCHAR(255)
);
• - Table creation for PwC Managers
CREATE TABLE pwc_managers (
id INT,
first_name VARCHAR(255),
last_name VARCHAR(255)
);
• - Datasets for PwC Employees
INSERT INTO pwc_employees (id, first_name, last_name)
VALUES
(1, 'John', 'Doe'),
(2, 'Jane', 'Smith'),
(3, 'Mary', 'Johnson'),
(4, 'James', 'Brown');
• - Datasets for PwC Managers
INSERT INTO pwc_managers (id, first_name, last_name)
VALUES
(1, 'John', 'Doe'),
(4, 'James', 'Brown');

Learnings
• Using LEFT JOIN to find unmatched records from one table.
• Using EXCEPT to return rows that exist in one result set but not the other.
• Understanding that LEFT JOIN requires checking for NULL values to identify missing
records.
• EXCEPT is a set operation that directly compares two result sets and filters out duplicates.

Solutions
• - Using LEFT JOIN (Works in most databases)
SELECT *
FROM pwc_employees e
LEFT JOIN pwc_managers m
ON e.id = m.id
WHERE m.id IS NULL;

This query will return all records from the pwc_employees table where there is no matching
record in the pwc_managers table based on the id column.

• - Using EXCEPT (Works in PostgreSQL, SQL Server, etc.)


SELECT *
FROM pwc_employees
EXCEPT
SELECT *
FROM pwc_managers;
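Both techniques should return the same rows when the tables share a schema; the sketch below (a demo assumption using SQLite, which supports EXCEPT like PostgreSQL) runs both and checks that they agree. An ORDER BY is added so the comparison is deterministic.

```python
import sqlite3

# Compare the LEFT JOIN anti-join with the EXCEPT set operation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pwc_employees (id INT, first_name VARCHAR(255), last_name VARCHAR(255));
    CREATE TABLE pwc_managers (id INT, first_name VARCHAR(255), last_name VARCHAR(255));
    INSERT INTO pwc_employees VALUES
        (1, 'John', 'Doe'), (2, 'Jane', 'Smith'),
        (3, 'Mary', 'Johnson'), (4, 'James', 'Brown');
    INSERT INTO pwc_managers VALUES (1, 'John', 'Doe'), (4, 'James', 'Brown');
""")
via_join = conn.execute("""
    SELECT e.* FROM pwc_employees e
    LEFT JOIN pwc_managers m ON e.id = m.id
    WHERE m.id IS NULL
    ORDER BY e.id;
""").fetchall()
via_except = conn.execute("""
    SELECT * FROM pwc_employees
    EXCEPT
    SELECT * FROM pwc_managers
    ORDER BY id;
""").fetchall()
print(via_join, via_except)  # both approaches agree
```

Note one practical difference: the LEFT JOIN compares only the join key (id), while EXCEPT compares entire rows, so a manager whose name was spelled differently in the two tables would survive the EXCEPT but not the anti-join.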


• Q.507
Question
You have two tables: orders (order_id, order_date, customer_id, total_amount) and
payments (payment_id, order_id, payment_date, payment_amount). Write a query to find
orders that have not been fully paid, i.e., where the total_amount in the orders table does
not match the sum of payment_amount in the payments table.

Explanation
• To find orders that have not been fully paid, we need to compare the total_amount from
the orders table with the sum of payment_amount from the payments table.
• We can achieve this by performing an INNER JOIN between the orders and payments
tables on the order_id, and then using a GROUP BY to aggregate payments by order_id.
• Finally, we filter out the orders where the sum of payment_amount is less than the
total_amount.

Datasets and SQL Schemas


• - Table creation for orders
CREATE TABLE orders (
order_id INT,
order_date DATE,
customer_id INT,
total_amount DECIMAL(10, 2)
);
• - Table creation for payments
CREATE TABLE payments (
payment_id INT,
order_id INT,
payment_date DATE,
payment_amount DECIMAL(10, 2)
);
• - Datasets for orders
INSERT INTO orders (order_id, order_date, customer_id, total_amount)
VALUES
(1, '2022-10-01', 101, 500.00),
(2, '2022-10-03', 102, 750.00),
(3, '2022-10-05', 103, 300.00);
• - Datasets for payments
INSERT INTO payments (payment_id, order_id, payment_date, payment_amount)
VALUES
(1, 1, '2022-10-02', 200.00),
(2, 1, '2022-10-04', 200.00),
(3, 2, '2022-10-04', 300.00);

Learnings
• Using JOIN to combine data from two tables based on a common column (order_id).
• Using SUM() with GROUP BY to aggregate the payment amounts.
• Filtering data to identify where the sum of payments does not match the total order
amount.

Solutions
• - PostgreSQL / MySQL Solution
SELECT o.order_id, o.total_amount, COALESCE(SUM(p.payment_amount), 0) AS total_paid
FROM orders o
LEFT JOIN payments p ON o.order_id = p.order_id
LEFT JOIN payments p ON o.order_id = p.order_id
GROUP BY o.order_id, o.total_amount
HAVING o.total_amount > COALESCE(SUM(p.payment_amount), 0);

This query performs the following:


• Uses a LEFT JOIN to include all orders, even those without matching payments.
• Aggregates the payment_amount for each order using SUM().
• Filters out orders where the total_amount is greater than the sum of payment_amount,
indicating that the order has not been fully paid.
• The COALESCE() function ensures that if there are no payments for an order, the sum will
default to 0.
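The COALESCE() is what keeps order 3 in the result: with no payment rows, SUM(p.payment_amount) is NULL, and NULL comparisons would otherwise silently drop the row. The sketch below (a demo assumption using SQLite) runs the query on the sample data, where every order turns out to be underpaid.

```python
import sqlite3

# Underpaid-orders demo: total_amount vs. COALESCE'd sum of payments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INT, order_date DATE, customer_id INT, total_amount DECIMAL(10, 2));
    CREATE TABLE payments (
        payment_id INT, order_id INT, payment_date DATE, payment_amount DECIMAL(10, 2));
    INSERT INTO orders VALUES
        (1, '2022-10-01', 101, 500.00),
        (2, '2022-10-03', 102, 750.00),
        (3, '2022-10-05', 103, 300.00);
    INSERT INTO payments VALUES
        (1, 1, '2022-10-02', 200.00),
        (2, 1, '2022-10-04', 200.00),
        (3, 2, '2022-10-04', 300.00);
""")
rows = conn.execute("""
    SELECT o.order_id, o.total_amount, COALESCE(SUM(p.payment_amount), 0) AS total_paid
    FROM orders o
    LEFT JOIN payments p ON o.order_id = p.order_id
    GROUP BY o.order_id, o.total_amount
    HAVING o.total_amount > COALESCE(SUM(p.payment_amount), 0)
    ORDER BY o.order_id;
""").fetchall()
print(rows)  # order 3 appears with total_paid 0 thanks to COALESCE
```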
• Q.508
Question
Using a table of customer orders (order_id, customer_id, product_id, order_date), write
a query to find duplicate orders. An order is considered a duplicate if the same customer has
placed the same order for the same product on the same day.

Explanation
To find duplicate orders, we need to identify rows in the orders table where the same
customer has placed the same order for the same product on the same day more than once.
This can be achieved by:
• Grouping the records by customer_id, product_id, and order_date.
• Counting how many times each combination occurs using COUNT().
• Filtering the results to show only those combinations where the count is greater than 1,
indicating a duplicate order.

Datasets and SQL Schemas


• - Table creation for orders
CREATE TABLE orders (
order_id INT,
customer_id INT,
product_id INT,
order_date DATE
);
• - Sample dataset for orders
INSERT INTO orders (order_id, customer_id, product_id, order_date)
VALUES
(1, 101, 2001, '2023-10-01'),
(2, 101, 2001, '2023-10-01'),
(3, 102, 2002, '2023-10-02'),
(4, 101, 2001, '2023-10-01'),
(5, 103, 2003, '2023-10-03'),
(6, 101, 2001, '2023-10-01');

Learnings
• Using GROUP BY to aggregate data by multiple columns (customer_id, product_id, and
order_date).
• Using HAVING to filter groups based on a condition (e.g., count > 1 for duplicates).


• Identifying duplicate records by counting the occurrences of specific combinations.

Solutions
• - PostgreSQL / MySQL Solution
SELECT customer_id, product_id, order_date, COUNT(*) AS duplicate_count
FROM orders
GROUP BY customer_id, product_id, order_date
HAVING COUNT(*) > 1;

This query works as follows:


• It groups the orders by customer_id, product_id, and order_date.
• It counts the number of occurrences for each group using COUNT(*).
• It filters the groups to only return those where the count is greater than 1, indicating that
the customer has placed the same order for the same product on the same day more than once.
• Q.509
Question
Using two tables: employees (employee_id, manager_id, salary) and departments
(department_id, department_name, manager_id), write a query to find the highest-paid
employee in each department and the difference in salary between the highest-paid employee
and their department manager.

Explanation
To solve this problem, we need to:
• Join the employees table with the departments table to get the department manager
information.
• Identify the highest-paid employee within each department using the MAX() function along
with GROUP BY.
• Calculate the salary difference between the highest-paid employee and their department
manager.
• Ensure that the employee with the highest salary is associated with their department
manager's salary to compute the difference.

Datasets and SQL Schemas


• - Table creation for employees
CREATE TABLE employees (
employee_id INT,
manager_id INT,
salary DECIMAL(10, 2)
);
• - Table creation for departments
CREATE TABLE departments (
department_id INT,
department_name VARCHAR(255),
manager_id INT
);
• - Sample datasets for employees
-- Note: rows for the managers themselves (employee_ids 101-103) are included so
-- that the join on d.manager_id = m.employee_id has rows to match.
INSERT INTO employees (employee_id, manager_id, salary)
VALUES
(1, 101, 60000),
(2, 101, 50000),
(3, 102, 80000),
(4, 102, 75000),
(5, 103, 90000),
(101, NULL, 70000),
(102, NULL, 85000),
(103, NULL, 80000);
• - Sample datasets for departments
INSERT INTO departments (department_id, department_name, manager_id)
VALUES
(1, 'IT', 101),
(2, 'Finance', 102),
(3, 'HR', 103);

Learnings
• Using JOIN to combine related data from two tables based on common columns
(manager_id).
• Using GROUP BY with aggregation functions like MAX() to find the highest-paid employee
in each department.
• Calculating differences in salary using simple arithmetic.

Solutions
• - PostgreSQL / MySQL Solution
SELECT d.department_name,
e.employee_id AS highest_paid_employee_id,
e.salary AS highest_salary,
m.salary AS manager_salary,
e.salary - m.salary AS salary_difference
FROM departments d
JOIN employees e ON d.manager_id = e.manager_id
JOIN employees m ON d.manager_id = m.employee_id
WHERE e.salary = (
SELECT MAX(salary)
FROM employees
WHERE manager_id = d.manager_id)
ORDER BY d.department_name;

Explanation:
• We perform two JOIN operations:
• The first JOIN connects the departments table with the employees table to associate each
department with its manager.
• The second JOIN brings the salary of the manager for the calculation of the salary
difference.
• We use a WHERE clause with a subquery to select only the highest-paid employee for each
department:
• The subquery SELECT MAX(salary) finds the employee with the highest salary within the
department.
• The difference between the highest-paid employee's salary and the manager's salary is
calculated using simple arithmetic: e.salary - m.salary.
• Finally, the results are ordered by the department name.
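One trap in this schema: the sample employees data must contain rows for the managers themselves (employee_ids 101-103), otherwise the join on d.manager_id = m.employee_id matches nothing and the query returns an empty set. The sketch below is a demo assumption: it adds hypothetical manager rows (salaries 70000, 85000, 80000, not given in the original sample) and runs the book's query in SQLite.

```python
import sqlite3

# Highest-paid employee per department vs. that department's manager.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INT, manager_id INT, salary DECIMAL(10, 2));
    CREATE TABLE departments (
        department_id INT, department_name VARCHAR(255), manager_id INT);
    INSERT INTO employees VALUES
        (1, 101, 60000), (2, 101, 50000), (3, 102, 80000),
        (4, 102, 75000), (5, 103, 90000),
        -- hypothetical manager rows added for the demo:
        (101, NULL, 70000), (102, NULL, 85000), (103, NULL, 80000);
    INSERT INTO departments VALUES
        (1, 'IT', 101), (2, 'Finance', 102), (3, 'HR', 103);
""")
rows = conn.execute("""
    SELECT d.department_name, e.employee_id, e.salary, m.salary,
           e.salary - m.salary AS salary_difference
    FROM departments d
    JOIN employees e ON d.manager_id = e.manager_id
    JOIN employees m ON d.manager_id = m.employee_id
    WHERE e.salary = (SELECT MAX(salary) FROM employees
                      WHERE manager_id = d.manager_id)
    ORDER BY d.department_name;
""").fetchall()
print(rows)  # HR's top employee out-earns the manager; IT's and Finance's do not
```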

• Q.510
Question
Given a table of customer purchases (purchase_id, customer_id, purchase_amount,
purchase_date), write a query to rank customers based on their total purchases over the last
6 months. Exclude customers who made fewer than 5 purchases during this period.


Additionally, calculate the rank within each segment of customers who made purchases
above $500.

Explanation
To solve this problem, we need to:
• Filter the purchases made within the last 6 months.
• Count the number of purchases for each customer within this period and exclude customers
with fewer than 5 purchases.
• Calculate the total purchase amount for each customer over the last 6 months.
• Rank the customers based on their total purchase amount, and also rank the customers who
made purchases greater than $500 within their segment.
• Use RANK() or ROW_NUMBER() for the rankings and PARTITION BY to calculate the rank for
customers who made purchases over $500.

Datasets and SQL Schemas


• - Table creation for customer_purchases
CREATE TABLE customer_purchases (
purchase_id INT,
customer_id INT,
purchase_amount DECIMAL(10, 2),
purchase_date DATE
);
• - Sample dataset for customer_purchases
INSERT INTO customer_purchases (purchase_id, customer_id, purchase_amount, purchase_date
)
VALUES
(1, 101, 200, '2023-07-15'),
(2, 101, 300, '2023-08-05'),
(3, 101, 150, '2023-09-10'),
(4, 101, 400, '2023-10-01'),
(5, 101, 600, '2023-10-10'),
(6, 102, 100, '2023-07-25'),
(7, 102, 200, '2023-08-15'),
(8, 102, 250, '2023-09-05'),
(9, 102, 400, '2023-09-30'),
(10, 102, 700, '2023-10-02'),
(11, 103, 50, '2023-05-12'),
(12, 103, 100, '2023-06-25'),
(13, 103, 300, '2023-08-05');

Learnings
• Using DATE_SUB() or CURRENT_DATE to filter records within a specific time range (last 6
months).
• Using COUNT() with GROUP BY to filter customers with fewer than 5 purchases.
• Using RANK() to rank customers based on their total purchases.
• Using PARTITION BY to rank within specific segments (those with purchases above $500).

Solutions
• - PostgreSQL / MySQL Solution
WITH filtered_purchases AS (
SELECT
customer_id,
SUM(purchase_amount) AS total_spent,
COUNT(purchase_id) AS num_purchases
FROM
customer_purchases
WHERE
purchase_date >= CURRENT_DATE - INTERVAL '6 months' -- MySQL: CURDATE() - INTERVAL 6 MONTH
GROUP BY
customer_id
HAVING
COUNT(purchase_id) >= 5
),
ranked_customers AS (
SELECT
customer_id,
total_spent,
RANK() OVER (ORDER BY total_spent DESC) AS overall_rank,
RANK() OVER (PARTITION BY CASE WHEN total_spent > 500 THEN 1 ELSE 0 END ORDER BY
total_spent DESC) AS segment_rank
FROM
filtered_purchases
)
SELECT
customer_id,
total_spent,
overall_rank,
segment_rank
FROM
ranked_customers;

Explanation:
• filtered_purchases CTE:
• This Common Table Expression (CTE) filters purchases made in the last 6 months using
the condition purchase_date >= CURRENT_DATE - INTERVAL '6 months'.
• It aggregates the total purchase amount (SUM(purchase_amount)) and counts the number
of purchases (COUNT(purchase_id)) for each customer.
• The HAVING COUNT(purchase_id) >= 5 clause ensures that only customers with 5 or
more purchases are included.
• ranked_customers CTE:
• This CTE uses RANK() to rank the customers based on their total spending (total_spent)
in descending order.
• The RANK() function calculates the overall_rank for all customers.
• For the segment_rank, customers who have spent more than $500 are grouped using the
PARTITION BY clause. This allows ranking customers based on their spending within this
segment, with those who have spent above $500 being ranked separately from others.
• Final SELECT:
• The final query selects the customer_id, total_spent, overall_rank, and
segment_rank for each customer.
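The filter-then-rank pattern above can be sanity-checked in miniature with SQLite (via Python's `sqlite3` module, SQLite 3.25+ for window functions). Note this is a sketch, not the book's exact query: the fixed cutoff date `'2023-04-15'` stands in for `CURRENT_DATE - INTERVAL '6 months'` because SQLite has no INTERVAL arithmetic. With the sample data, both qualifying customers total 1650, so `RANK()` assigns both of them rank 1:

```python
import sqlite3

# Minimal sanity check of the filtering + ranking CTE using SQLite.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customer_purchases "
            "(purchase_id INT, customer_id INT, purchase_amount INT, purchase_date TEXT)")
cur.executemany("INSERT INTO customer_purchases VALUES (?, ?, ?, ?)", [
    (1, 101, 200, '2023-07-15'), (2, 101, 300, '2023-08-05'),
    (3, 101, 150, '2023-09-10'), (4, 101, 400, '2023-10-01'),
    (5, 101, 600, '2023-10-10'), (6, 102, 100, '2023-07-25'),
    (7, 102, 200, '2023-08-15'), (8, 102, 250, '2023-09-05'),
    (9, 102, 400, '2023-09-30'), (10, 102, 700, '2023-10-02'),
])
result = cur.execute("""
    WITH filtered_purchases AS (
        SELECT customer_id, SUM(purchase_amount) AS total_spent
        FROM customer_purchases
        WHERE purchase_date >= '2023-04-15'   -- fixed cutoff instead of CURRENT_DATE
        GROUP BY customer_id
        HAVING COUNT(purchase_id) >= 5
    )
    SELECT customer_id, total_spent,
           RANK() OVER (ORDER BY total_spent DESC) AS overall_rank
    FROM filtered_purchases
    ORDER BY customer_id
""").fetchall()
print(result)
```

The tie is a useful reminder of how `RANK()` differs from `ROW_NUMBER()`: equal totals receive equal ranks.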
• Q.511
Calculate the Average Number of Clients per Consultant
You are given two tables: consultants (consultant_id, consultant_name) and clients
(client_id, client_name, consultant_id). Write a SQL query to calculate the average number
of clients assigned to each consultant.

Explanation
• We need to count the number of clients assigned to each consultant and calculate the
average number of clients for all consultants.
• Use COUNT() and GROUP BY to count clients per consultant.

Datasets and SQL Schemas


-- Table creation for `consultants`
CREATE TABLE consultants (
consultant_id INT PRIMARY KEY,
consultant_name VARCHAR(100)
);

-- Table creation for `clients`


CREATE TABLE clients (
client_id INT PRIMARY KEY,
client_name VARCHAR(100),
consultant_id INT,
FOREIGN KEY (consultant_id) REFERENCES consultants(consultant_id)
);

-- Sample data for `consultants`


INSERT INTO consultants (consultant_id, consultant_name)
VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Alex Johnson');

-- Sample data for `clients`


INSERT INTO clients (client_id, client_name, consultant_id)
VALUES
(1, 'Client A', 1),
(2, 'Client B', 1),
(3, 'Client C', 2),
(4, 'Client D', 3),
(5, 'Client E', 3);

Learnings
• Using COUNT() to count rows within a GROUP BY clause.
• Calculating average values using AVG().

Solutions
• - PostgreSQL / MySQL Solution
SELECT
AVG(client_count) AS average_clients_per_consultant
FROM (
SELECT
consultant_id,
COUNT(client_id) AS client_count
FROM clients
GROUP BY consultant_id
) AS consultant_clients;
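The nested-aggregate pattern (AVG over a grouped COUNT) is easy to verify with SQLite via Python's `sqlite3` module; this is a sketch using the sample data, where the per-consultant client counts are 2, 1 and 2, so the average should be 5/3:

```python
import sqlite3

# Verify the AVG-over-COUNT derived table with the book's sample data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE clients (client_id INT, client_name TEXT, consultant_id INT);
INSERT INTO clients VALUES
 (1, 'Client A', 1), (2, 'Client B', 1), (3, 'Client C', 2),
 (4, 'Client D', 3), (5, 'Client E', 3);
""")
avg_clients = cur.execute("""
    SELECT AVG(client_count)
    FROM (SELECT consultant_id, COUNT(client_id) AS client_count
          FROM clients
          GROUP BY consultant_id) AS consultant_clients
""").fetchone()[0]
print(avg_clients)
```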
• Q.512
Find the Department with the Most Clients
You are given two tables: consultants (consultant_id, department_id) and clients
(client_id, consultant_id). Write a query to find the department with the most clients. If there
is a tie, return all departments with the same number of clients.

Explanation
• Join the two tables to get the number of clients for each department.
• Use COUNT() and GROUP BY to calculate the number of clients per department.
• Use ORDER BY to get the department with the most clients.

Datasets and SQL Schemas


-- Table creation for `consultants`
CREATE TABLE consultants (
consultant_id INT PRIMARY KEY,
department_id INT
);

-- Table creation for `clients`


CREATE TABLE clients (
client_id INT PRIMARY KEY,
consultant_id INT,
FOREIGN KEY (consultant_id) REFERENCES consultants(consultant_id)
);

-- Sample data for `consultants`


INSERT INTO consultants (consultant_id, department_id)
VALUES
(1, 101),
(2, 102),
(3, 101),
(4, 103);

-- Sample data for `clients`


INSERT INTO clients (client_id, consultant_id)
VALUES
(1, 1),
(2, 2),
(3, 3),
(4, 3),
(5, 4);

Learnings
• Using JOIN to combine data from multiple tables.
• Using COUNT() to aggregate data per group.
• Sorting results to find the maximum value.

Solutions
• - PostgreSQL / MySQL Solution
SELECT department_id, COUNT(client_id) AS num_clients
FROM clients
JOIN consultants ON clients.consultant_id = consultants.consultant_id
GROUP BY department_id
HAVING COUNT(client_id) = (
SELECT MAX(client_count)
FROM (
SELECT department_id, COUNT(client_id) AS client_count
FROM clients
JOIN consultants ON clients.consultant_id = consultants.consultant_id
GROUP BY department_id
) AS department_counts
)
ORDER BY num_clients DESC;
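The tie-safe HAVING-against-MAX pattern can be reproduced in SQLite (a sketch, standing in for PostgreSQL/MySQL): department 101 has two consultants (1 and 3) serving three clients in total, so it should be the only row returned:

```python
import sqlite3

# Department with the most clients, keeping all ties via HAVING = MAX(...).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE consultants (consultant_id INT, department_id INT);
CREATE TABLE clients (client_id INT, consultant_id INT);
INSERT INTO consultants VALUES (1, 101), (2, 102), (3, 101), (4, 103);
INSERT INTO clients VALUES (1, 1), (2, 2), (3, 3), (4, 3), (5, 4);
""")
top_departments = cur.execute("""
    SELECT department_id, COUNT(client_id) AS num_clients
    FROM clients
    JOIN consultants USING (consultant_id)
    GROUP BY department_id
    HAVING COUNT(client_id) = (
        SELECT MAX(client_count)
        FROM (SELECT COUNT(client_id) AS client_count
              FROM clients
              JOIN consultants USING (consultant_id)
              GROUP BY department_id) AS department_counts)
""").fetchall()
print(top_departments)
```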
• Q.513
Identify Clients Without Consultants
You are given two tables: consultants (consultant_id, consultant_name) and clients
(client_id, consultant_id). Write a SQL query to identify clients who do not have a consultant
assigned.

Explanation
• Use a LEFT JOIN between clients and consultants on consultant_id.
• Filter the results to show only clients without a consultant assigned by checking for NULL in
the consultant_id from the consultants table.

Datasets and SQL Schemas


-- Table creation for `consultants`
CREATE TABLE consultants (
consultant_id INT PRIMARY KEY,
consultant_name VARCHAR(100)
);

-- Table creation for `clients`


CREATE TABLE clients (
client_id INT PRIMARY KEY,
client_name VARCHAR(100),
consultant_id INT,
FOREIGN KEY (consultant_id) REFERENCES consultants(consultant_id)
);

-- Sample data for `consultants`


INSERT INTO consultants (consultant_id, consultant_name)
VALUES
(1, 'John Doe'),
(2, 'Jane Smith');

-- Sample data for `clients`


INSERT INTO clients (client_id, client_name, consultant_id)
VALUES
(1, 'Client A', 1),
(2, 'Client B', NULL),
(3, 'Client C', 2);

Learnings
• Using LEFT JOIN to get all records from the left table and matching records from the right.
• Filtering for NULL values to identify unmatched records.

Solutions
• - PostgreSQL / MySQL Solution
SELECT client_id, client_name
FROM clients
LEFT JOIN consultants ON clients.consultant_id = consultants.consultant_id
WHERE consultants.consultant_id IS NULL;
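This anti-join can be checked with SQLite (a sketch of the same LEFT JOIN + IS NULL pattern): Client B has a NULL consultant_id, so it finds no match and the unmatched consultants.consultant_id stays NULL, surviving the filter:

```python
import sqlite3

# LEFT JOIN anti-join: keep only the rows with no match on the right side.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE consultants (consultant_id INT, consultant_name TEXT);
CREATE TABLE clients (client_id INT, client_name TEXT, consultant_id INT);
INSERT INTO consultants VALUES (1, 'John Doe'), (2, 'Jane Smith');
INSERT INTO clients VALUES
 (1, 'Client A', 1), (2, 'Client B', NULL), (3, 'Client C', 2);
""")
unassigned = cur.execute("""
    SELECT client_id, client_name
    FROM clients
    LEFT JOIN consultants ON clients.consultant_id = consultants.consultant_id
    WHERE consultants.consultant_id IS NULL
""").fetchall()
print(unassigned)
```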
• Q.514
Count Clients Assigned to Multiple Consultants
You are given two tables: consultants (consultant_id, consultant_name) and clients
(client_id, consultant_id). Write a SQL query to count how many clients are assigned to more
than one consultant.

Explanation
• Use GROUP BY on client_id with HAVING COUNT(DISTINCT consultant_id) > 1 to find clients
assigned to multiple consultants, then count those clients in an outer query.

Datasets and SQL Schemas


-- Table creation for `consultants`
CREATE TABLE consultants (
consultant_id INT PRIMARY KEY,
consultant_name VARCHAR(100)
);

-- Table creation for `clients`


CREATE TABLE clients (
client_id INT,
consultant_id INT,
FOREIGN KEY (consultant_id) REFERENCES consultants(consultant_id)
);

-- Sample data for `consultants`


INSERT INTO consultants (consultant_id, consultant_name)
VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Alex Johnson');

-- Sample data for `clients`


INSERT INTO clients (client_id, consultant_id)
VALUES
(1, 1),
(2, 1),
(2, 2),
(3, 1),
(3, 3);

Learnings
• Using GROUP BY to group results by specific columns.
• Using HAVING to filter groups based on a condition.
• Using COUNT(DISTINCT) to count unique values.

Solutions
• - PostgreSQL / MySQL Solution
SELECT COUNT(*) AS clients_assigned_to_multiple_consultants
FROM (
SELECT client_id
FROM clients
GROUP BY client_id
HAVING COUNT(DISTINCT consultant_id) > 1
) AS multi_consultant_clients;
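Because GROUP BY collapses each client to one row, the final tally has to happen one level up; wrapping the HAVING filter in a derived table and counting its rows gives a single number. A quick SQLite sketch with the sample data (clients 2 and 3 each have two consultants):

```python
import sqlite3

# Count clients with more than one consultant: the HAVING filter runs per
# client group, and an outer COUNT(*) tallies the qualifying clients.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE clients (client_id INT, consultant_id INT);
INSERT INTO clients VALUES (1, 1), (2, 1), (2, 2), (3, 1), (3, 3);
""")
num_multi = cur.execute("""
    SELECT COUNT(*)
    FROM (SELECT client_id
          FROM clients
          GROUP BY client_id
          HAVING COUNT(DISTINCT consultant_id) > 1) AS multi_consultant_clients
""").fetchone()[0]
print(num_multi)
```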
• Q.515
Find the Most Recent Client Assignment
You are given two tables: consultants (consultant_id, consultant_name) and clients
(client_id, consultant_id, assignment_date). Write a SQL query to find the most recent
assignment for each client.

Explanation
• Use ROW_NUMBER() to assign a rank to each client's assignment based on the
assignment_date.
• Filter the result to select only the most recent assignment for each client.

Datasets and SQL Schemas


-- Table creation for `consultants`
CREATE TABLE consultants (
consultant_id INT PRIMARY KEY,
consultant_name VARCHAR(100)
);

-- Table creation for `clients`


CREATE TABLE clients (
client_id INT,
consultant_id INT,
assignment_date DATE,
FOREIGN KEY (consultant_id) REFERENCES consultants(consultant_id)
);

-- Sample data for `consultants`


INSERT INTO consultants (consultant_id, consultant_name)
VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Alex Johnson');

-- Sample data for `clients`


INSERT INTO clients (client_id, consultant_id, assignment_date)
VALUES
(1, 1, '2023-05-01'),
(1, 2, '2023-08-01'),
(2, 1, '2023-07-15'),
(3, 3, '2023-06-20');

Learnings
• Using ROW_NUMBER() to rank records within partitions.
• Filtering for the most recent record per client.

Solutions
• - PostgreSQL / MySQL Solution
WITH ranked_assignments AS (
SELECT
client_id,
consultant_id,
assignment_date,
ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY assignment_date DESC) AS rn
FROM clients
)
SELECT client_id, consultant_id, assignment_date
FROM ranked_assignments
WHERE rn = 1;
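The latest-row-per-group idiom runs unchanged in SQLite (3.25+ for window functions); a sketch with the sample data, where client 1's 2023-08-01 assignment should be the only one of its two rows to survive `rn = 1`:

```python
import sqlite3

# ROW_NUMBER() latest-per-group: rank assignments newest-first within each
# client, then keep only rank 1.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE clients (client_id INT, consultant_id INT, assignment_date TEXT);
INSERT INTO clients VALUES
 (1, 1, '2023-05-01'), (1, 2, '2023-08-01'),
 (2, 1, '2023-07-15'), (3, 3, '2023-06-20');
""")
latest = cur.execute("""
    WITH ranked_assignments AS (
        SELECT client_id, consultant_id, assignment_date,
               ROW_NUMBER() OVER (PARTITION BY client_id
                                  ORDER BY assignment_date DESC) AS rn
        FROM clients)
    SELECT client_id, consultant_id, assignment_date
    FROM ranked_assignments
    WHERE rn = 1
    ORDER BY client_id
""").fetchall()
print(latest)
```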

These SQL questions are tailored for the PwC domain and focus on concepts like joins,
aggregation, ranking, and filtering. They offer a mix of challenges involving client
relationships, assignments, and consultant data.
• Q.516
Calculate the Total Project Budget by Department
You are given two tables: projects (project_id, project_name, department_id, budget) and
departments (department_id, department_name). Write a SQL query to calculate the total
budget for each department, including departments that have no projects.

Explanation
• Use LEFT JOIN to include all departments, even if they have no projects.
• Use SUM() to calculate the total budget for each department.

Datasets and SQL Schemas

CREATE TABLE projects (
project_id INT PRIMARY KEY,
project_name VARCHAR(100),
department_id INT,
budget INT
);

CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(100)
);

INSERT INTO projects (project_id, project_name, department_id, budget)
VALUES
(1, 'Project Alpha', 1, 100000),
(2, 'Project Beta', 1, 200000),
(3, 'Project Gamma', 2, 150000),
(4, 'Project Delta', 3, 50000),
(5, 'Project Epsilon', 3, 120000);

INSERT INTO departments (department_id, department_name)
VALUES
(1, 'IT'),
(2, 'Marketing'),
(3, 'Finance'),
(4, 'Operations');

Learnings
• Using LEFT JOIN to ensure all departments are included.
• Aggregating with SUM().

Solutions
• - PostgreSQL / MySQL Solution
SELECT d.department_name, COALESCE(SUM(p.budget), 0) AS total_budget
FROM departments d
LEFT JOIN projects p ON d.department_id = p.department_id
GROUP BY d.department_name;
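Since the LEFT JOIN keeps project-less departments with all-NULL budget rows, `SUM()` would return NULL for them; `COALESCE` turns that into 0. A SQLite sketch of the same query with the sample data (Operations has no projects):

```python
import sqlite3

# LEFT JOIN + COALESCE: departments without projects get a total of 0,
# not NULL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE projects (project_id INT, project_name TEXT,
                       department_id INT, budget INT);
CREATE TABLE departments (department_id INT, department_name TEXT);
INSERT INTO projects VALUES
 (1, 'Project Alpha', 1, 100000), (2, 'Project Beta', 1, 200000),
 (3, 'Project Gamma', 2, 150000), (4, 'Project Delta', 3, 50000),
 (5, 'Project Epsilon', 3, 120000);
INSERT INTO departments VALUES
 (1, 'IT'), (2, 'Marketing'), (3, 'Finance'), (4, 'Operations');
""")
totals = dict(cur.execute("""
    SELECT d.department_name, COALESCE(SUM(p.budget), 0)
    FROM departments d
    LEFT JOIN projects p ON d.department_id = p.department_id
    GROUP BY d.department_name
""").fetchall())
print(totals)
```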
• Q.517
Identify the Clients Who Have Not Been Assigned a Consultant
You are given two tables: clients (client_id, client_name, consultant_id) and consultants
(consultant_id, consultant_name). Write a SQL query to find clients who are not currently
assigned to any consultant.

Explanation
• Use a LEFT JOIN between clients and consultants, and filter for NULL in the
consultant_id column to find clients without an assigned consultant.

Datasets and SQL Schemas


-- Table creation for `clients`
CREATE TABLE clients (
client_id INT PRIMARY KEY,
client_name VARCHAR(100),
consultant_id INT
);

-- Table creation for `consultants`
CREATE TABLE consultants (
consultant_id INT PRIMARY KEY,
consultant_name VARCHAR(100)
);

-- Sample data for `clients`
INSERT INTO clients (client_id, client_name, consultant_id)
VALUES
(1, 'Client A', 1),
(2, 'Client B', 2),
(3, 'Client C', NULL),
(4, 'Client D', NULL),
(5, 'Client E', 1);

-- Sample data for `consultants`
INSERT INTO consultants (consultant_id, consultant_name)
VALUES
(1, 'John Doe'),
(2, 'Jane Smith');

Learnings
• Using LEFT JOIN and filtering for NULL values.

Solutions
• - PostgreSQL / MySQL Solution
SELECT c.client_id, c.client_name
FROM clients c
LEFT JOIN consultants con ON c.consultant_id = con.consultant_id
WHERE con.consultant_id IS NULL;
• Q.518
Find Projects That Exceed the Average Budget of Their Department
You are given two tables: projects (project_id, project_name, department_id, budget) and
departments (department_id, department_name). Write a SQL query to find all projects that
exceed the average budget for their respective department.

Explanation
• Calculate the average budget per department using a subquery or a JOIN.
• Compare each project's budget against the department's average budget.

Datasets and SQL Schemas


-- Table creation for `projects`
CREATE TABLE projects (
project_id INT PRIMARY KEY,
project_name VARCHAR(100),
department_id INT,
budget INT
);

-- Table creation for `departments`


CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(100)
);

-- Sample data for `projects`


INSERT INTO projects (project_id, project_name, department_id, budget)
VALUES
(1, 'Project Alpha', 1, 120000),
(2, 'Project Beta', 1, 250000),
(3, 'Project Gamma', 2, 150000),
(4, 'Project Delta', 3, 200000),
(5, 'Project Epsilon', 3, 100000),
(6, 'Project Zeta', 2, 180000);

-- Sample data for `departments`


INSERT INTO departments (department_id, department_name)
VALUES
(1, 'IT'),
(2, 'Marketing'),
(3, 'Finance');

Learnings
• Using a subquery to calculate the average budget per department.
• Filtering based on comparison.

Solutions
• - PostgreSQL / MySQL Solution
SELECT p.project_name, p.budget, d.department_name
FROM projects p
JOIN departments d ON p.department_id = d.department_id
WHERE p.budget > (
SELECT AVG(budget)
FROM projects
WHERE department_id = p.department_id
);
• Q.519
Calculate the Total Spending by Clients in Each Project
You are given two tables: clients (client_id, client_name) and purchases (purchase_id,
client_id, project_id, amount_spent). Write a SQL query to calculate the total amount spent
by clients in each project.

Explanation
• Group the purchases by project_id and sum the spending with SUM(); joining the clients
table is only needed if client names must be displayed.

Datasets and SQL Schemas


-- Table creation for `clients`
CREATE TABLE clients (
client_id INT PRIMARY KEY,
client_name VARCHAR(100)
);

-- Table creation for `purchases`


CREATE TABLE purchases (
purchase_id INT PRIMARY KEY,
client_id INT,
project_id INT,
amount_spent INT,
FOREIGN KEY (client_id) REFERENCES clients(client_id)
);

-- Sample data for `clients`


INSERT INTO clients (client_id, client_name)
VALUES
(1, 'Client A'),
(2, 'Client B'),
(3, 'Client C');

-- Sample data for `purchases`


INSERT INTO purchases (purchase_id, client_id, project_id, amount_spent)
VALUES
(1, 1, 101, 5000),
(2, 1, 102, 10000),
(3, 2, 101, 3000),
(4, 3, 103, 7000),
(5, 2, 103, 8000),
(6, 3, 102, 4000);

Learnings
• Using JOIN to combine data from multiple tables.
• Using SUM() and GROUP BY to calculate the total spending per project.

Solutions
• - PostgreSQL / MySQL Solution
SELECT p.project_id, SUM(p.amount_spent) AS total_spending
FROM purchases p
GROUP BY p.project_id;
• Q.520
Find the Consultants with the Highest Number of Clients
You are given two tables: consultants (consultant_id, consultant_name) and clients
(client_id, consultant_id). Write a SQL query to find the consultant(s) with the highest
number of assigned clients.

Explanation
• Use GROUP BY to count the number of clients per consultant.
• Use ORDER BY with LIMIT to find the top consultant; to return all tied consultants, compare each count against the maximum instead.

Datasets and SQL Schemas


-- Table creation for `consultants`
CREATE TABLE consultants (
consultant_id INT PRIMARY KEY,
consultant_name VARCHAR(100)
);

-- Table creation for `clients`
CREATE TABLE clients (
client_id INT PRIMARY KEY,
consultant_id INT,
FOREIGN KEY (consultant_id) REFERENCES consultants(consultant_id)
);

-- Sample data for `consultants`


INSERT INTO consultants (consultant_id, consultant_name)
VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Alex Johnson');

-- Sample data for `clients`


INSERT INTO clients (client_id, consultant_id)
VALUES
(1, 1),
(2, 1),
(3, 1),
(4, 2),
(5, 2),
(6, 3);

Learnings
• Using GROUP BY to aggregate data by consultant.
• Sorting and using LIMIT to find the top consultant(s).

Solutions
• - PostgreSQL / MySQL Solution
SELECT c.consultant_name, COUNT(cl.client_id) AS num_clients
FROM consultants c
JOIN clients cl ON c.consultant_id = cl.consultant_id
GROUP BY c.consultant_name
ORDER BY num_clients DESC
LIMIT 1;

Cisco
• Q.521
Question
Calculate customer product scores over time
Cisco Systems wants to know how the quality ratings of its network components vary over
time across clients. Write a SQL query to calculate the average star rating for each product
per month.
Explanation
You need to calculate the average star ratings for each product by month. Use date_trunc()
to truncate the submit_date to the month level and AVG() to calculate the average stars.
Group the results by the truncated month and product.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE product_reviews (
review_id INT,
client_id INT,
submit_date DATE,
product_id VARCHAR(50),
stars INT
);
• - Datasets
INSERT INTO product_reviews (review_id, client_id, submit_date, product_id, stars)
VALUES
(6171, 123, '2022-06-08', 'RTR-901', 4),
(7802, 265, '2022-06-10', 'SWT-1050', 4),
(5293, 362, '2022-06-18', 'RTR-901', 3),
(6352, 192, '2022-07-26', 'SWT-1050', 3),
(4517, 981, '2022-07-05', 'SWT-1050', 2);

Learnings
• date_trunc() function to extract the month from a date
• AVG() function to calculate the average
• Grouping results using GROUP BY
• Ordering results with ORDER BY
Solutions
• - PostgreSQL solution
SELECT
date_trunc('month', submit_date) AS month,
product_id AS product,
AVG(stars) AS avg_stars
FROM
product_reviews
GROUP BY
month,
product
ORDER BY
month,
product;
• - MySQL solution
SELECT
DATE_FORMAT(submit_date, '%Y-%m-01') AS month,
product_id AS product,
AVG(stars) AS avg_stars
FROM
product_reviews
GROUP BY
month,
product
ORDER BY
month,
product;
• Q.522
Question
Find all patients who consulted both Apollo and Fortis hospitals in the past year.
Explanation
To find patients who have visited both Apollo and Fortis hospitals in the past year, use a JOIN
between the two tables on the patient_id field and filter the results to include only records
from the past year using CURRENT_DATE. Ensure that the patients appear in both tables.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE apollo_patients (
patient_id INT,
patient_name VARCHAR(100),
consultation_date DATE
);

CREATE TABLE fortis_patients (
patient_id INT,
patient_name VARCHAR(100),
consultation_date DATE
);
• - Datasets
INSERT INTO apollo_patients VALUES
(1, 'Ravi Joshi', '2023-05-15'),
(2, 'Sanya Kapoor', '2023-03-12'),
(3, 'Anuj Sethi', '2023-02-10');

INSERT INTO fortis_patients VALUES
(2, 'Sanya Kapoor', '2023-07-10'),
(3, 'Anuj Sethi', '2023-06-05'),
(4, 'Kunal Singh', '2023-04-20');

Learnings
• Using JOIN to combine data from two tables based on a common field
• Filtering records with CURRENT_DATE for the past year using DATE_SUB() or equivalent
• Ensuring the same patient exists in both tables
Solutions
• - PostgreSQL solution
SELECT DISTINCT ap.patient_id, ap.patient_name
FROM apollo_patients ap
JOIN fortis_patients fp
ON ap.patient_id = fp.patient_id
WHERE ap.consultation_date >= CURRENT_DATE - INTERVAL '1 year'
AND fp.consultation_date >= CURRENT_DATE - INTERVAL '1 year';
• - MySQL solution
SELECT DISTINCT ap.patient_id, ap.patient_name
FROM apollo_patients ap
JOIN fortis_patients fp
ON ap.patient_id = fp.patient_id
WHERE ap.consultation_date >= CURDATE() - INTERVAL 1 YEAR
AND fp.consultation_date >= CURDATE() - INTERVAL 1 YEAR;
• Q.523
Question
Find the top 5 cities with the highest number of tech startups that have received funding from
both angel investors and venture capitalists.
Explanation
To solve this, you need to identify startups that have received investments from both angel
investors and venture capitalists. Then, count the number of startups per city and select the
top 5 cities based on this count. Use JOIN operations to match startups with investments from
both sources, group by city_id, and sort by the number of startups.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE cities (
city_id INT,
city_name VARCHAR(100)
);

CREATE TABLE startups (
startup_id INT,
startup_name VARCHAR(100),
city_id INT
);

CREATE TABLE angel_investments (
investment_id INT,
startup_id INT,
amount DECIMAL(10, 2)
);

CREATE TABLE venture_capital_investments (
investment_id INT,
startup_id INT,
amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO cities VALUES
(1, 'Bangalore'),
(2, 'Delhi'),
(3, 'Mumbai'),
(4, 'Hyderabad'),
(5, 'Pune');

INSERT INTO startups VALUES
(1, 'Flipkart', 1),
(2, 'Ola', 1),
(3, 'Paytm', 2),
(4, 'Zomato', 3),
(5, 'Cure.fit', 4);

INSERT INTO angel_investments VALUES
(1, 1, 500000),
(2, 2, 300000),
(3, 3, 200000),
(4, 4, 400000),
(5, 5, 250000);

INSERT INTO venture_capital_investments VALUES
(1, 1, 1000000),
(2, 2, 700000),
(3, 3, 800000),
(4, 4, 600000),
(5, 5, 900000);

Learnings
• Using JOIN to combine data from multiple tables based on common keys
• Filtering startups that have investments from both angel investors and venture capitalists
• Counting the number of startups per city and sorting the results
• Using GROUP BY and ORDER BY to aggregate and rank the results
Solutions
• - PostgreSQL solution
SELECT c.city_name, COUNT(DISTINCT s.startup_id) AS startup_count
FROM cities c
JOIN startups s ON c.city_id = s.city_id
JOIN angel_investments ai ON s.startup_id = ai.startup_id
JOIN venture_capital_investments vci ON s.startup_id = vci.startup_id
GROUP BY c.city_name
ORDER BY startup_count DESC
LIMIT 5;
• - MySQL solution
SELECT c.city_name, COUNT(DISTINCT s.startup_id) AS startup_count
FROM cities c
JOIN startups s ON c.city_id = s.city_id
JOIN angel_investments ai ON s.startup_id = ai.startup_id
JOIN venture_capital_investments vci ON s.startup_id = vci.startup_id
GROUP BY c.city_name
ORDER BY startup_count DESC
LIMIT 5;
• Q.524
Question
List all products sold in different categories that have received reviews from at least 3 unique
customers across both online and physical stores.
Explanation
You need to identify products that have received reviews from at least 3 unique customers
across both online and physical stores. First, join the online_reviews and
physical_reviews tables with the products table. Then, count the distinct customers per
product and filter out those with fewer than 3 unique customers. Finally, list the products
along with their categories.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE products (
product_id INT,
product_name VARCHAR(100),
category_id INT
);

CREATE TABLE categories (
category_id INT,
category_name VARCHAR(100)
);

CREATE TABLE online_reviews (
review_id INT,
product_id INT,
customer_id INT
);

CREATE TABLE physical_reviews (
review_id INT,
product_id INT,
customer_id INT
);
• - Datasets
INSERT INTO categories VALUES
(1, 'Electronics'),
(2, 'Books'),
(3, 'Clothing'),
(4, 'Home Appliances');

INSERT INTO products VALUES
(1, 'Smartphone', 1),
(2, 'Laptop', 1),
(3, 'Novel', 2),
(4, 'Shirt', 3),
(5, 'Washing Machine', 4);

INSERT INTO online_reviews VALUES
(1, 1, 1),
(2, 1, 2),
(3, 2, 1),
(4, 3, 3);

INSERT INTO physical_reviews VALUES
(1, 1, 3),
(2, 2, 1),
(3, 3, 2),
(4, 4, 4);

Learnings
• Combining data from multiple tables using JOIN
• Counting unique customers using COUNT(DISTINCT ...)
• Filtering products based on a condition (HAVING clause)
• Grouping data with GROUP BY and joining with categories to display the category name
Solutions
• - PostgreSQL solution
SELECT p.product_name, c.category_name
FROM products p
JOIN categories c ON p.category_id = c.category_id
LEFT JOIN (
SELECT product_id, customer_id
FROM online_reviews
UNION
SELECT product_id, customer_id
FROM physical_reviews
) r ON p.product_id = r.product_id
GROUP BY p.product_name, c.category_name
HAVING COUNT(DISTINCT r.customer_id) >= 3;
• - MySQL solution
SELECT p.product_name, c.category_name
FROM products p
JOIN categories c ON p.category_id = c.category_id
LEFT JOIN (
SELECT product_id, customer_id
FROM online_reviews
UNION
SELECT product_id, customer_id
FROM physical_reviews
) r ON p.product_id = r.product_id
GROUP BY p.product_name, c.category_name
HAVING COUNT(DISTINCT r.customer_id) >= 3;
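The UNION (as opposed to UNION ALL) deduplicates (product_id, customer_id) pairs across the two channels, so a customer who reviewed the same product both online and in store counts once. A SQLite sketch with the sample data, where only the Smartphone reaches 3 distinct reviewers:

```python
import sqlite3

# UNION dedupes reviewer pairs across channels before the distinct count.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE products (product_id INT, product_name TEXT, category_id INT);
CREATE TABLE online_reviews (review_id INT, product_id INT, customer_id INT);
CREATE TABLE physical_reviews (review_id INT, product_id INT, customer_id INT);
INSERT INTO products VALUES
 (1, 'Smartphone', 1), (2, 'Laptop', 1), (3, 'Novel', 2),
 (4, 'Shirt', 3), (5, 'Washing Machine', 4);
INSERT INTO online_reviews VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 3);
INSERT INTO physical_reviews VALUES (1, 1, 3), (2, 2, 1), (3, 3, 2), (4, 4, 4);
""")
qualified = [row[0] for row in cur.execute("""
    SELECT p.product_name
    FROM products p
    JOIN (SELECT product_id, customer_id FROM online_reviews
          UNION
          SELECT product_id, customer_id FROM physical_reviews) r
      ON p.product_id = r.product_id
    GROUP BY p.product_name
    HAVING COUNT(DISTINCT r.customer_id) >= 3
""").fetchall()]
print(qualified)
```

Note the Laptop is excluded: customer 1 reviewed it in both channels, but the UNION collapses those two rows into one distinct reviewer.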
• Q.525
Question
Identify the products with sales above the average for their category.
Explanation
You need to calculate the average sales for each product category and then find the products
whose sales are above the average for their respective categories. This can be done by first
calculating the average sales per category and then comparing each product's sales with that
average.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE products (
product_id INT,
product_name VARCHAR(100),
category_name VARCHAR(100)
);

CREATE TABLE sales (
sale_id INT,
product_id INT,
sales_amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO products VALUES
(1, 'Laptop', 'Electronics'),
(2, 'Smartphone', 'Electronics'),
(3, 'Tablet', 'Electronics'),
(4, 'Office Chair', 'Furniture');

INSERT INTO sales VALUES
(1, 1, 50000.00),
(2, 2, 30000.00),
(3, 1, 70000.00),
(4, 4, 20000.00);

Learnings
• Using JOIN to combine data from multiple tables
• Using a subquery to calculate average sales by category
• Filtering products with sales above the category average using a correlated subquery in the WHERE clause
Solutions
• - PostgreSQL solution
SELECT p.product_name, p.category_name, s.sales_amount
FROM products p
JOIN sales s ON p.product_id = s.product_id
WHERE s.sales_amount > (
SELECT AVG(sales_amount)
FROM sales s2
JOIN products p2 ON s2.product_id = p2.product_id
WHERE p2.category_name = p.category_name
)
ORDER BY p.category_name, s.sales_amount DESC;
• - MySQL solution
SELECT p.product_name, p.category_name, s.sales_amount
FROM products p
JOIN sales s ON p.product_id = s.product_id
WHERE s.sales_amount > (
SELECT AVG(sales_amount)
FROM sales s2
JOIN products p2 ON s2.product_id = p2.product_id
WHERE p2.category_name = p.category_name
)
ORDER BY p.category_name, s.sales_amount DESC;
• Q.526
Question
Identify all customers who made purchases in both January and February.
Explanation
You need to find customers who have made purchases in both January and February. This
can be achieved by filtering the sales data for each customer and checking if there are
purchases in both months. You can use GROUP BY to group by customer_id and HAVING to
ensure that both January and February sales are present for each customer.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
customer_id INT,
sale_date DATE
);
• - Datasets
INSERT INTO sales VALUES
(1, 1, '2023-01-15'),
(2, 2, '2023-02-20'),
(3, 1, '2023-02-28'),
(4, 3, '2023-01-10'),
(5, 3, '2023-02-05'),
(6, 4, '2023-01-22'),
(7, 5, '2023-02-18'),
(8, 6, '2023-01-05'),
(9, 6, '2023-02-10'),
(10, 7, '2023-01-14'),
(11, 7, '2023-02-25'),
(12, 8, '2023-01-30'),
(13, 9, '2023-02-01'),
(14, 10, '2023-01-05'),
(15, 11, '2023-02-15'),
(16, 11, '2023-01-28');

Learnings
• Filtering data by specific months using EXTRACT(MONTH FROM ...) or MONTH()
• Grouping data by customer_id
• Using HAVING to filter customers who have purchases in both months
Solutions
• - PostgreSQL solution
SELECT customer_id
FROM sales
WHERE EXTRACT(YEAR FROM sale_date) = 2023
AND EXTRACT(MONTH FROM sale_date) IN (1, 2)
GROUP BY customer_id
HAVING COUNT(DISTINCT EXTRACT(MONTH FROM sale_date)) = 2;
• - MySQL solution
SELECT customer_id
FROM sales
WHERE YEAR(sale_date) = 2023
AND MONTH(sale_date) IN (1, 2)
GROUP BY customer_id
HAVING COUNT(DISTINCT MONTH(sale_date)) = 2;
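The same month-membership check can be sketched in SQLite, where `strftime('%m', ...)` plays the role of EXTRACT/MONTH (an assumption of the sketch, not the book's exact syntax). A customer qualifies only when both '01' and '02' appear among their 2023 sale months:

```python
import sqlite3

# Customers with sales in both January and February: the HAVING clause
# demands two distinct month values per customer.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE sales (sale_id INT, customer_id INT, sale_date TEXT);
INSERT INTO sales VALUES
 (1, 1, '2023-01-15'), (2, 2, '2023-02-20'), (3, 1, '2023-02-28'),
 (4, 3, '2023-01-10'), (5, 3, '2023-02-05'), (6, 4, '2023-01-22'),
 (7, 5, '2023-02-18'), (8, 6, '2023-01-05'), (9, 6, '2023-02-10'),
 (10, 7, '2023-01-14'), (11, 7, '2023-02-25'), (12, 8, '2023-01-30'),
 (13, 9, '2023-02-01'), (14, 10, '2023-01-05'), (15, 11, '2023-02-15'),
 (16, 11, '2023-01-28');
""")
both_months = [row[0] for row in cur.execute("""
    SELECT customer_id
    FROM sales
    WHERE strftime('%Y', sale_date) = '2023'
      AND strftime('%m', sale_date) IN ('01', '02')
    GROUP BY customer_id
    HAVING COUNT(DISTINCT strftime('%m', sale_date)) = 2
    ORDER BY customer_id
""").fetchall()]
print(both_months)
```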
• Q.527
Question
List the airlines British Airways customers traveled with in addition to British Airways.
Explanation
You need to find customers who have traveled with British Airways and then identify the
other airlines they have traveled with. Use a JOIN to link the customers and flights tables,
filtering for flights where the airline is not British Airways and where the customer has also
traveled with British Airways.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100)
);

CREATE TABLE flights (
flight_id INT PRIMARY KEY,
customer_id INT,
airline VARCHAR(50),
flight_date DATE
);
• - Datasets
INSERT INTO customers VALUES
(1, 'Sarah Wilson'),
(2, 'Liam Evans');

INSERT INTO flights VALUES
(1, 1, 'British Airways', '2023-03-15'),
(2, 1, 'EasyJet', '2023-04-20');

Learnings
• Using JOIN to combine customer and flight data
• Filtering data with WHERE to exclude British Airways
• Identifying customers who traveled with British Airways and other airlines

Solutions
• - PostgreSQL solution
SELECT DISTINCT f.airline
FROM flights f
JOIN customers c ON f.customer_id = c.customer_id
WHERE c.customer_id IN (
SELECT customer_id
FROM flights
WHERE airline = 'British Airways'
) AND f.airline != 'British Airways';
• - MySQL solution
SELECT DISTINCT f.airline
FROM flights f
JOIN customers c ON f.customer_id = c.customer_id
WHERE c.customer_id IN (
SELECT customer_id
FROM flights
WHERE airline = 'British Airways'
) AND f.airline != 'British Airways';
• Q.528
Question
Identify the top 5 products that have been returned the most and the reason for their return.
Explanation
To solve this, you need to count the number of returns for each product and then list the top 5
products with the highest return count. You should also include the reason for the returns.
This can be achieved using GROUP BY and COUNT() to aggregate the data, and ORDER BY to
rank the products by their return frequency.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE products (
product_id INT,
product_name VARCHAR(100)
);

CREATE TABLE returns (
return_id INT,
product_id INT,
return_reason VARCHAR(100)
);
• - Datasets
INSERT INTO products VALUES
(1, 'Laptop'),
(2, 'Smartphone'),
(3, 'Tablet'),
(4, 'Headphones'),
(5, 'Smartwatch');

INSERT INTO returns VALUES
(1, 1, 'Defective'),
(2, 1, 'Wrong item sent'),
(3, 2, 'Defective'),
(4, 3, 'Not as described'),
(5, 2, 'Defective'),
(6, 2, 'Customer changed mind'),
(7, 4, 'Defective'),
(8, 5, 'Not working'),
(9, 1, 'Defective');

Learnings
• Using JOIN to combine product and return data
• Grouping by product to count returns
• Using ORDER BY to get the top 5 products based on return counts
• Aggregating return reasons for each product
Solutions
• - PostgreSQL solution
SELECT p.product_name,
COUNT(*) AS return_count,
STRING_AGG(DISTINCT r.return_reason, ', ') AS return_reasons
FROM returns r
JOIN products p ON r.product_id = p.product_id
GROUP BY p.product_name
ORDER BY return_count DESC
LIMIT 5;
• - MySQL solution
SELECT p.product_name,
COUNT(*) AS return_count,
GROUP_CONCAT(DISTINCT r.return_reason SEPARATOR ', ') AS return_reasons
FROM returns r
JOIN products p ON r.product_id = p.product_id
GROUP BY p.product_name
ORDER BY return_count DESC
LIMIT 5;
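To get one row per product (rather than one per product-reason pair), the distinct reasons can be collapsed into a single column; SQLite's `group_concat()` is the analogue of MySQL's GROUP_CONCAT and PostgreSQL's STRING_AGG, so the pattern can be sketched there. With the sample data, Laptop and Smartphone tie at 3 returns each:

```python
import sqlite3

# Rank products by return count, aggregating their distinct reasons.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE products (product_id INT, product_name TEXT);
CREATE TABLE returns (return_id INT, product_id INT, return_reason TEXT);
INSERT INTO products VALUES
 (1, 'Laptop'), (2, 'Smartphone'), (3, 'Tablet'),
 (4, 'Headphones'), (5, 'Smartwatch');
INSERT INTO returns VALUES
 (1, 1, 'Defective'), (2, 1, 'Wrong item sent'), (3, 2, 'Defective'),
 (4, 3, 'Not as described'), (5, 2, 'Defective'),
 (6, 2, 'Customer changed mind'), (7, 4, 'Defective'),
 (8, 5, 'Not working'), (9, 1, 'Defective');
""")
top5 = cur.execute("""
    SELECT p.product_name, COUNT(*) AS return_count,
           GROUP_CONCAT(DISTINCT r.return_reason) AS return_reasons
    FROM returns r
    JOIN products p ON r.product_id = p.product_id
    GROUP BY p.product_name
    ORDER BY return_count DESC, p.product_name
    LIMIT 5
""").fetchall()
print(top5)
```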
• Q.529
Question
Find all customers who booked flights through British Airways but stayed at hotels partnered
with Booking.com.
Explanation
You need to identify customers who have flown with British Airways and also stayed at
hotels booked via Booking.com. This can be achieved by using JOIN operations to combine
data from the flights and hotel_bookings tables, then filtering for records where the
airline is 'British Airways' and the platform is 'Booking.com'.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100)
);

CREATE TABLE flights (
flight_id INT PRIMARY KEY,
customer_id INT,
airline VARCHAR(50),
flight_date DATE
);

CREATE TABLE hotel_bookings (
booking_id INT PRIMARY KEY,
customer_id INT,
hotel_name VARCHAR(100),
platform VARCHAR(50)
);
• - Datasets
INSERT INTO customers VALUES
(1, 'Emily Taylor'),
(2, 'Jack Moore');

INSERT INTO flights VALUES
(1, 1, 'British Airways', '2023-04-15');

INSERT INTO hotel_bookings VALUES
(1, 1, 'Hilton', 'Booking.com');


Learnings
• Using JOIN to combine data from flights and hotel_bookings tables
• Filtering data using WHERE for specific conditions (e.g., airline and platform)
• Identifying customers who meet both criteria (flights with British Airways and bookings
with Booking.com)
Solutions
• - PostgreSQL solution
SELECT c.customer_name
FROM customers c
JOIN flights f ON c.customer_id = f.customer_id
JOIN hotel_bookings hb ON c.customer_id = hb.customer_id
WHERE f.airline = 'British Airways'
AND hb.platform = 'Booking.com';
• - MySQL solution
SELECT c.customer_name
FROM customers c
JOIN flights f ON c.customer_id = f.customer_id
JOIN hotel_bookings hb ON c.customer_id = hb.customer_id
WHERE f.airline = 'British Airways'
AND hb.platform = 'Booking.com';
• Q.530
Question
Write a SQL query to find the customer who made the most recent order.
Explanation
You need to identify the customer who placed the most recent order based on the
order_date. You can achieve this by selecting the order with the latest order_date and then
joining it with the customers table to retrieve the corresponding customer information.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100)
);

CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
amount DECIMAL(10, 2),
order_date DATE,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
• - Datasets
INSERT INTO customers (customer_id, customer_name) VALUES
(1, 'Anjali'),
(2, 'Rohan'),
(3, 'Suresh'),
(4, 'Priya'),
(5, 'Rahul');

INSERT INTO orders (order_id, customer_id, amount, order_date) VALUES
(1, 1, 2500, '2023-01-01'),
(2, 2, 3000, '2023-01-02'),
(3, 1, 1500, '2023-02-03'),
(4, 3, 4000, '2023-02-12'),
(5, 1, 3000, '2023-01-05'),
(6, 2, 4500, '2023-01-06'),
(7, 4, 5000, '2023-01-07'),
(8, 5, 2000, '2023-01-08');


Learnings
• Selecting the most recent record using MAX() or ORDER BY with LIMIT
• Joining orders with customers to get customer details
• Using ORDER BY and LIMIT to pick the latest order

Solutions
• - PostgreSQL Solution
SELECT c.customer_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
ORDER BY o.order_date DESC
LIMIT 1;
• - MySQL Solution
SELECT c.customer_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
ORDER BY o.order_date DESC
LIMIT 1;

Explanation of the Query:


• The JOIN between customers and orders connects the customer information with their
orders.
• The ORDER BY o.order_date DESC orders the orders by date, with the most recent first.
• LIMIT 1 ensures that only the most recent order is returned.
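The pattern can be verified end to end with a quick in-memory run. The sketch below uses Python's built-in sqlite3 module purely as a scratch engine (an assumption of this example — the query itself runs unchanged on PostgreSQL and MySQL), loading a subset of the sample rows:

```python
import sqlite3

# Scratch in-memory database with a subset of the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INT PRIMARY KEY, customer_name TEXT);
CREATE TABLE orders (order_id INT PRIMARY KEY, customer_id INT,
                     amount REAL, order_date TEXT);
INSERT INTO customers VALUES (1, 'Anjali'), (2, 'Rohan'), (3, 'Suresh');
INSERT INTO orders VALUES
    (1, 1, 2500, '2023-01-01'),
    (3, 1, 1500, '2023-02-03'),
    (4, 3, 4000, '2023-02-12');
""")

# Latest order wins: sort by order_date descending, keep one row.
row = conn.execute("""
SELECT c.customer_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
ORDER BY o.order_date DESC
LIMIT 1;
""").fetchone()
print(row[0])  # Suresh placed the latest order (2023-02-12)
```

If two orders share the latest date, LIMIT 1 picks one of them arbitrarily; a RANK() window over order_date would return all tied customers instead.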
• Q.531
Question
Identify the sales percentage contribution of each product for each month.
Explanation
To calculate the sales percentage contribution of each product for each month, you need to:
• Aggregate the sales by product and month.
• Calculate the total sales for each month.
• Divide the product's sales by the total sales for that month to get the percentage
contribution.
You can achieve this by using GROUP BY to aggregate sales by product and month, and a JOIN
or subquery to calculate the total sales for each month. Then, the percentage can be computed
by dividing individual product sales by total sales for that month.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE sales (
sale_id INT,
product_id INT,
sale_date DATE,
sales_amount DECIMAL(10, 2),
customer_id INT
);
• - Datasets
INSERT INTO sales VALUES
(1, 1, '2024-01-10', 50000.00, 1001),
(2, 2, '2024-01-20', 70000.00, 1002),
(3, 1, '2024-02-10', 60000.00, 1001),
(4, 1, '2024-02-15', 40000.00, 1003),
(5, 3, '2024-01-25', 12000.00, 1004),


(6, 4, '2024-01-28', 22000.00, 1005),
(7, 2, '2024-02-02', 55000.00, 1006),
(8, 3, '2024-02-03', 40000.00, 1007),
(9, 1, '2024-02-05', 15000.00, 1008),
(10, 2, '2024-02-10', 35000.00, 1009),
(11, 4, '2024-02-12', 25000.00, 1010),
(12, 1, '2024-03-01', 30000.00, 1011),
(13, 3, '2024-03-05', 18000.00, 1012),
(14, 2, '2024-03-10', 40000.00, 1013),
(15, 4, '2024-03-15', 50000.00, 1014),
(16, 1, '2024-03-20', 27000.00, 1015),
(17, 3, '2024-03-22', 22000.00, 1016),
(18, 2, '2024-03-25', 28000.00, 1017),
(19, 1, '2024-03-30', 34000.00, 1018);

Learnings
• Aggregating sales by product and month using GROUP BY
• Calculating total sales for each month using a subquery or window function
• Calculating percentage by dividing product sales by total sales for each month

Solutions
• - PostgreSQL Solution
SELECT
    EXTRACT(MONTH FROM s.sale_date) AS month,
    s.product_id,
    SUM(s.sales_amount) AS product_sales,
    (SUM(s.sales_amount) / t.total_sales) * 100 AS sales_percentage
FROM sales s
JOIN (
    SELECT EXTRACT(MONTH FROM sale_date) AS month,
           SUM(sales_amount) AS total_sales
    FROM sales
    GROUP BY EXTRACT(MONTH FROM sale_date)
) t ON EXTRACT(MONTH FROM s.sale_date) = t.month
GROUP BY EXTRACT(MONTH FROM s.sale_date), s.product_id, t.total_sales
ORDER BY month, product_id;
• - MySQL Solution
SELECT
    MONTH(s.sale_date) AS month,
    s.product_id,
    SUM(s.sales_amount) AS product_sales,
    (SUM(s.sales_amount) / t.total_sales) * 100 AS sales_percentage
FROM sales s
JOIN (
    SELECT MONTH(sale_date) AS month,
           SUM(sales_amount) AS total_sales
    FROM sales
    GROUP BY MONTH(sale_date)
) t ON MONTH(s.sale_date) = t.month
GROUP BY MONTH(s.sale_date), s.product_id, t.total_sales
ORDER BY month, product_id;

Explanation
• The subquery computes the total sales per month.
• The outer query aggregates each product's sales per month and joins to the subquery on the month value, so every product row carries that month's total.
• The sales percentage is calculated by dividing the product's sales by the total sales for that month, then multiplying by 100.
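On engines with window functions (PostgreSQL, MySQL 8+), the same percentage can be computed without joining back to a monthly-totals subquery. The sketch below is an illustration using Python's sqlite3 as the scratch engine (an assumption of this example; SQLite 3.25+ supports window functions, and strftime stands in for EXTRACT/MONTH):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INT, product_id INT, sale_date TEXT,
                    sales_amount REAL, customer_id INT);
INSERT INTO sales VALUES
    (1, 1, '2024-01-10', 50000, 1001),
    (2, 2, '2024-01-20', 70000, 1002),
    (3, 1, '2024-02-10', 60000, 1001);
""")

# Aggregate per product per month, then let a window SUM supply the
# month total that each row is divided by.
rows = conn.execute("""
WITH monthly AS (
    SELECT strftime('%m', sale_date) AS month,
           product_id,
           SUM(sales_amount) AS product_sales
    FROM sales
    GROUP BY month, product_id
)
SELECT month, product_id, product_sales,
       product_sales * 100.0
           / SUM(product_sales) OVER (PARTITION BY month) AS sales_percentage
FROM monthly
ORDER BY month, product_id;
""").fetchall()
for r in rows:
    print(r)
```

January splits 50000 vs 70000 of a 120000 total, i.e. roughly 41.67% and 58.33%; February's single product takes 100%.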
• Q.532
Question
Find the most popular product in each category based on total sales.


Explanation
To identify the most popular product in each category based on total sales, you need to:
• Aggregate the total sales for each product.
• Group the products by category.
• Select the product with the highest total sales for each category.
You can achieve this by using GROUP BY to calculate the total sales for each product and then
applying a JOIN with the products table to group by category. Finally, you can use a
subquery or window function to select the product with the highest total sales in each
category.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
product_id INT,
revenue DECIMAL(10, 2)
);

CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50)
);
• - Datasets
INSERT INTO products VALUES
(1, 'Smartphone', 'Electronics'),
(2, 'Laptop', 'Electronics'),
(3, 'Washing Machine', 'Appliances');

INSERT INTO sales VALUES
(1, 1, 50000.00),
(2, 2, 75000.00),
(3, 3, 30000.00),
(4, 1, 70000.00),
(5, 1, 80000.00),
(6, 2, 95000.00),
(7, 3, 35000.00),
(8, 1, 60000.00),
(9, 2, 100000.00),
(10, 3, 40000.00),
(11, 1, 75000.00),
(12, 2, 82000.00),
(13, 3, 25000.00),
(14, 1, 95000.00),
(15, 2, 110000.00),
(16, 3, 32000.00),
(17, 1, 68000.00),
(18, 2, 99000.00),
(19, 3, 28000.00);

Learnings
• Using GROUP BY to aggregate data and calculate total sales per product
• Using JOIN to bring product details into the query
• Selecting the highest sales product per category using aggregation or window functions

Solutions
• - PostgreSQL Solution
WITH total_sales AS (
SELECT
s.product_id,
SUM(s.revenue) AS total_revenue
FROM sales s
GROUP BY s.product_id
)
SELECT p.category,
p.product_name,
ts.total_revenue
FROM total_sales ts
JOIN products p ON ts.product_id = p.product_id
WHERE (p.category, ts.total_revenue) IN (
SELECT category, MAX(total_revenue)
FROM total_sales ts
JOIN products p ON ts.product_id = p.product_id
GROUP BY category
)
ORDER BY p.category;
• - MySQL Solution
WITH total_sales AS (
SELECT
s.product_id,
SUM(s.revenue) AS total_revenue
FROM sales s
GROUP BY s.product_id
)
SELECT p.category,
p.product_name,
ts.total_revenue
FROM total_sales ts
JOIN products p ON ts.product_id = p.product_id
WHERE (p.category, ts.total_revenue) IN (
SELECT category, MAX(total_revenue)
FROM total_sales ts
JOIN products p ON ts.product_id = p.product_id
GROUP BY category
)
ORDER BY p.category;

Explanation of the Query:


• The total_sales CTE aggregates the total revenue per product by summing the revenue
values from the sales table.
• The main query joins the total_sales CTE with the products table to retrieve the
product's name and category.
• The subquery in the WHERE clause identifies the product with the highest total revenue for
each category by using MAX(total_revenue) and grouping by category.
• The result is filtered to only include the most popular product for each category based on
total sales.
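An equivalent (and often clearer) formulation uses RANK() instead of the row-constructor IN filter. The following is a sketch against SQLite via Python's sqlite3 (an assumption of this example; the same SQL works on PostgreSQL and MySQL 8+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INT PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE sales (sale_id INT PRIMARY KEY, product_id INT, revenue REAL);
INSERT INTO products VALUES
    (1, 'Smartphone', 'Electronics'),
    (2, 'Laptop', 'Electronics'),
    (3, 'Washing Machine', 'Appliances');
INSERT INTO sales VALUES
    (1, 1, 50000), (2, 2, 75000), (3, 3, 30000),
    (4, 1, 70000), (5, 2, 95000);
""")

# Rank products within each category by total revenue; rank 1 is the winner.
rows = conn.execute("""
WITH totals AS (
    SELECT p.category, p.product_name, SUM(s.revenue) AS total_revenue
    FROM sales s
    JOIN products p ON s.product_id = p.product_id
    GROUP BY p.category, p.product_name
),
ranked AS (
    SELECT category, product_name, total_revenue,
           RANK() OVER (PARTITION BY category
                        ORDER BY total_revenue DESC) AS rnk
    FROM totals
)
SELECT category, product_name, total_revenue
FROM ranked
WHERE rnk = 1
ORDER BY category;
""").fetchall()
print(rows)
```

RANK() also surfaces ties: if two products share a category's top revenue, both rows are returned.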
• Q.533
Question
Find all patients who have been treated at both Apollo and Max Healthcare in the last year.
Explanation
To find patients treated at both Apollo and Max Healthcare, we need to:
• Identify patients who exist in both the apollo_patients and max_patients tables.
• Ensure the treatment date falls within the last year (you can use a condition to filter for
treatments within the last 12 months).
• Use an INNER JOIN or INTERSECT to combine data from both tables and find patients
who appear in both.
Datasets and SQL Schemas


• - Table creation
CREATE TABLE apollo_patients (
patient_id INT,
patient_name VARCHAR(100),
treatment_date DATE
);

CREATE TABLE max_patients (
patient_id INT,
patient_name VARCHAR(100),
treatment_date DATE
);
• - Datasets
INSERT INTO apollo_patients VALUES
(1, 'Karan Kapoor', '2023-07-20'),
(2, 'Sita Mehta', '2023-01-10'),
(3, 'Amit Kumar', '2023-02-15');

INSERT INTO max_patients VALUES
(2, 'Sita Mehta', '2023-05-20'),
(3, 'Amit Kumar', '2023-04-10'),
(4, 'Rajesh Yadav', '2023-06-18');

Learnings
• Using JOIN to combine records from multiple tables based on common fields (e.g.,
patient_id).
• Filtering records for a date range (last year) to identify recent treatments.
• Identifying common records in both tables using INTERSECT or a combination of INNER
JOIN and filtering.

Solutions
• - PostgreSQL Solution
SELECT DISTINCT a.patient_name
FROM apollo_patients a
JOIN max_patients m ON a.patient_id = m.patient_id
WHERE a.treatment_date >= CURRENT_DATE - INTERVAL '1 year'
AND m.treatment_date >= CURRENT_DATE - INTERVAL '1 year';
• - MySQL Solution
SELECT DISTINCT a.patient_name
FROM apollo_patients a
JOIN max_patients m ON a.patient_id = m.patient_id
WHERE a.treatment_date >= CURDATE() - INTERVAL 1 YEAR
AND m.treatment_date >= CURDATE() - INTERVAL 1 YEAR;

Explanation of the Query:


• JOIN: The query joins the apollo_patients and max_patients tables on the
patient_id to find patients who appear in both tables.
• Filtering by Date: The WHERE clause ensures that only records within the last year are
included (CURRENT_DATE - INTERVAL '1 year' for PostgreSQL and CURDATE() -
INTERVAL 1 YEAR for MySQL).
• DISTINCT: The query uses DISTINCT to ensure that the same patient is not listed multiple
times if they have multiple treatments within the year.
• Result: The result will return the names of the patients who have been treated at both
Apollo and Max Healthcare within the last year.
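The INTERSECT route mentioned in the Learnings can be sketched the same way; the date filter is omitted here for brevity, and Python's sqlite3 serves as the scratch engine (assumptions of this example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE apollo_patients (patient_id INT, patient_name TEXT, treatment_date TEXT);
CREATE TABLE max_patients (patient_id INT, patient_name TEXT, treatment_date TEXT);
INSERT INTO apollo_patients VALUES
    (1, 'Karan Kapoor', '2023-07-20'),
    (2, 'Sita Mehta', '2023-01-10'),
    (3, 'Amit Kumar', '2023-02-15');
INSERT INTO max_patients VALUES
    (2, 'Sita Mehta', '2023-05-20'),
    (3, 'Amit Kumar', '2023-04-10'),
    (4, 'Rajesh Yadav', '2023-06-18');
""")

# INTERSECT keeps only rows present in both result sets.
rows = conn.execute("""
SELECT patient_id, patient_name FROM apollo_patients
INTERSECT
SELECT patient_id, patient_name FROM max_patients
ORDER BY patient_name;
""").fetchall()
print([r[1] for r in rows])  # patients treated at both chains
```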
• Q.534
Question
Get customer details along with their last order date.


Explanation
To retrieve the customer details along with their most recent order date, you need to:
• Join the customers table with the orders table based on customer_id.
• Use the MAX function on the order_date to get the most recent order for each customer.
• Group the result by customer_id and customer_name to get one row per customer.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE customers (
customer_id INT,
customer_name VARCHAR(100)
);

CREATE TABLE orders (
order_id INT,
customer_id INT,
order_date DATE
);
• - Datasets
INSERT INTO customers VALUES
(1, 'Rajesh Kumar'),
(2, 'Priya Singh'),
(3, 'Anjali Mehta'),
(4, 'Ravi Shankar'),
(5, 'Neha Sharma'),
(6, 'Vikram Roy'),
(7, 'Alok Verma');

INSERT INTO orders VALUES
(101, 1, '2024-01-10'),
(102, 2, '2024-01-15'),
(103, 1, '2024-02-20'),
(104, 1, '2024-02-25'),
(105, 2, '2024-03-05'),
(106, 3, '2024-01-28'),
(107, 4, '2024-03-10'),
(108, 5, '2024-03-15'),
(109, 1, '2024-03-01'),
(110, 2, '2024-02-28'),
(111, 3, '2024-03-20'),
(112, 4, '2024-02-05'),
(113, 5, '2024-02-25'),
(114, 6, '2024-03-22'),
(115, 7, '2024-03-18'),
(116, 6, '2024-02-15'),
(117, 7, '2024-01-10');

Learnings
• Using JOIN to combine data from multiple tables.
• Using MAX() to identify the most recent order date.
• Grouping by customer to ensure each customer is listed only once.

Solutions
• - PostgreSQL Solution
SELECT c.customer_id, c.customer_name, MAX(o.order_date) AS last_order_date
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name;
• - MySQL Solution
SELECT c.customer_id, c.customer_name, MAX(o.order_date) AS last_order_date
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id


GROUP BY c.customer_id, c.customer_name;

Explanation of the Query:


• JOIN: The customers table is joined with the orders table on the customer_id field.
• MAX(o.order_date): The MAX function is used to get the most recent order_date for each
customer.
• GROUP BY: The query groups the results by customer_id and customer_name to ensure
only one row per customer.
• Result: The result will return each customer's details along with their most recent order
date.
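The MAX()-per-group pattern can be exercised in a few lines; this sketch uses Python's sqlite3 with a trimmed-down copy of the sample data (assumptions of this example — the SQL itself is portable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INT, customer_name TEXT);
CREATE TABLE orders (order_id INT, customer_id INT, order_date TEXT);
INSERT INTO customers VALUES (1, 'Rajesh Kumar'), (2, 'Priya Singh');
INSERT INTO orders VALUES
    (101, 1, '2024-01-10'), (103, 1, '2024-02-20'),
    (102, 2, '2024-01-15'), (105, 2, '2024-03-05');
""")

# MAX(order_date) per customer collapses each group to its latest order.
rows = conn.execute("""
SELECT c.customer_id, c.customer_name, MAX(o.order_date) AS last_order_date
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
ORDER BY c.customer_id;
""").fetchall()
print(rows)
```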
• Q.535
Question
List all startups that have at least 2 co-founders and operate in the fintech space.
Explanation
You need to retrieve all startups that meet the following conditions:
• The startup operates in the 'Fintech' sector.
• The startup has at least 2 co-founders.
You can achieve this by filtering the startups table with a WHERE clause for the sector and the number of co-founders.
Datasets and SQL Schemas


• - Table creation
CREATE TABLE startups (
startup_id INT PRIMARY KEY,
startup_name VARCHAR(100),
sector VARCHAR(50),
co_founders INT
);
• - Datasets
INSERT INTO startups VALUES
(1, 'Paytm', 'Fintech', 3),
(2, 'PhonePe', 'Fintech', 1),
(3, 'Razorpay', 'Fintech', 2),
(4, 'Cred', 'Fintech', 3),
(5, 'ZestMoney', 'Fintech', 1),
(6, 'Rupeek', 'Fintech', 4),
(7, 'Indifi', 'Fintech', 2),
(8, 'Lendingkart', 'Fintech', 2),
(9, 'BharatPe', 'Fintech', 5),
(10, 'Groww', 'Fintech', 3),
(11, 'Upstox', 'Fintech', 2),
(12, 'KreditBee', 'Fintech', 1),
(13, 'MobiKwik', 'Fintech', 3),
(14, 'Slice', 'Fintech', 2),
(15, 'TrueBalance', 'Fintech', 1);

Learnings
• Filtering data with WHERE to check conditions on specific columns (e.g., sector and
co_founders).
• Using the AND operator to combine multiple conditions in a WHERE clause.

Solutions
• - PostgreSQL Solution
SELECT startup_name


FROM startups
WHERE sector = 'Fintech'
AND co_founders >= 2;
• - MySQL Solution
SELECT startup_name
FROM startups
WHERE sector = 'Fintech'
AND co_founders >= 2;

Explanation of the Query:


• WHERE: Filters the results to only include startups in the 'Fintech' sector.
• co_founders >= 2: Ensures that the startup has at least 2 co-founders.
• The result will return only those startups that match both conditions.
• Q.536
Question
Write a SQL query to count how many customers have placed more than 2 orders.
Explanation
To solve this, we need to count the number of orders placed by each customer, and then filter
customers who have placed more than 2 orders. This can be done using the GROUP BY clause
to aggregate by customer_id, and using the HAVING clause to filter customers with more than
2 orders.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100)
);

CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
• - Datasets
INSERT INTO customers (customer_id, customer_name) VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie'),
(4, 'David');

INSERT INTO orders (order_id, customer_id, order_date) VALUES
(1, 1, '2023-01-01'),
(2, 2, '2023-01-02'),
(3, 1, '2023-01-03'),
(4, 3, '2023-01-04'),
(5, 1, '2023-01-05'),
(6, 2, '2023-01-06'),
(7, 4, '2023-01-07');

Learnings
• Using GROUP BY to aggregate data by customer.
• Using HAVING to filter results based on aggregate values.
• Counting the number of orders for each customer.

Solutions


• - PostgreSQL / MySQL Solution
SELECT COUNT(*) AS customer_count
FROM (
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(order_id) > 2
) AS frequent_customers;

Explanation of the Query:


• The inner query groups orders by customer_id and, via HAVING COUNT(order_id) > 2, keeps only customers with more than 2 orders.
• The outer SELECT COUNT(*) counts how many customers survived that filter, returning a single total.
• Note: combining COUNT(DISTINCT customer_id) with GROUP BY customer_id in one query level would return one row per qualifying customer rather than one overall count, which is why the derived table is required.
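The key subtlety is that the HAVING filter runs per customer, so the final count has to happen one level up, in an outer query over the filtered result. A runnable sketch via Python's sqlite3 (an assumption of this example; the same SQL works on PostgreSQL/MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INT PRIMARY KEY, customer_id INT, order_date TEXT);
INSERT INTO orders VALUES
    (1, 1, '2023-01-01'), (2, 2, '2023-01-02'), (3, 1, '2023-01-03'),
    (4, 3, '2023-01-04'), (5, 1, '2023-01-05'), (6, 2, '2023-01-06'),
    (7, 4, '2023-01-07');
""")

# Inner query: customers with more than 2 orders; outer query: how many such customers.
count = conn.execute("""
SELECT COUNT(*) AS customer_count
FROM (
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(order_id) > 2
) AS frequent_customers;
""").fetchone()[0]
print(count)  # only customer 1 has 3 orders, so the answer is 1
```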
• Q.537
Question
Find all hotels that have received ratings from both OYO and MakeMyTrip customers.
Explanation
To solve this, we need to identify hotels that have reviews from both the oyo_reviews and
makemytrip_reviews tables. This can be done by using an INNER JOIN to find hotels that
appear in both tables, based on the hotel_id.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE oyo_reviews (
review_id INT,
hotel_id INT,
hotel_name VARCHAR(100),
rating INT,
review_date DATE
);

CREATE TABLE makemytrip_reviews (
review_id INT,
hotel_id INT,
hotel_name VARCHAR(100),
rating INT,
review_date DATE
);
• - Datasets
INSERT INTO oyo_reviews VALUES
(1, 401, 'Hotel Bliss', 4, '2023-09-01'),
(2, 402, 'Sea View Resort', 5, '2023-09-15'),
(3, 401, 'Hotel Bliss', 3, '2023-10-02');

INSERT INTO makemytrip_reviews VALUES
(1, 401, 'Hotel Bliss', 5, '2023-09-05'),
(2, 403, 'Mountain Retreat', 4, '2023-09-20'),
(3, 402, 'Sea View Resort', 4, '2023-10-01');

Learnings
• Using INNER JOIN to combine data from two tables based on a common column (in this
case, hotel_id).
• Filtering hotels that have reviews from both sources.

Solutions
• - PostgreSQL / MySQL Solution


SELECT DISTINCT o.hotel_name
FROM oyo_reviews o
JOIN makemytrip_reviews m ON o.hotel_id = m.hotel_id;

Explanation of the Query:


• JOIN makemytrip_reviews m ON o.hotel_id = m.hotel_id: joins the
oyo_reviews table with the makemytrip_reviews table on hotel_id, so only hotels reviewed on both platforms survive; the join condition alone enforces the match, and no extra WHERE clause is needed.
• DISTINCT o.hotel_name: collapses the duplicate rows produced when a hotel has multiple reviews on either platform, returning each qualifying hotel name once.
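To see why DISTINCT matters here, note that 'Hotel Bliss' has two OYO reviews and one MakeMyTrip review, so the join alone yields it twice. A sketch using Python's sqlite3 as the scratch engine (an assumption of this example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE oyo_reviews (review_id INT, hotel_id INT, hotel_name TEXT,
                          rating INT, review_date TEXT);
CREATE TABLE makemytrip_reviews (review_id INT, hotel_id INT, hotel_name TEXT,
                                 rating INT, review_date TEXT);
INSERT INTO oyo_reviews VALUES
    (1, 401, 'Hotel Bliss', 4, '2023-09-01'),
    (2, 402, 'Sea View Resort', 5, '2023-09-15'),
    (3, 401, 'Hotel Bliss', 3, '2023-10-02');
INSERT INTO makemytrip_reviews VALUES
    (1, 401, 'Hotel Bliss', 5, '2023-09-05'),
    (2, 403, 'Mountain Retreat', 4, '2023-09-20'),
    (3, 402, 'Sea View Resort', 4, '2023-10-01');
""")

# DISTINCT collapses the two 'Hotel Bliss' join rows into one.
rows = conn.execute("""
SELECT DISTINCT o.hotel_name
FROM oyo_reviews o
JOIN makemytrip_reviews m ON o.hotel_id = m.hotel_id
ORDER BY o.hotel_name;
""").fetchall()
print([r[0] for r in rows])
```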
• Q.538
Question
Generate a list of customers who have not made any purchases in 2023.
Explanation
To solve this, we need to find customers who do not have any associated sales in the year
2023. This can be done using a LEFT JOIN to join the customers table with the sales table,
and then filtering out customers who made purchases in 2023 using a WHERE clause. The
result will include customers with no sales in 2023.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100)
);

CREATE TABLE sales (
sale_id INT PRIMARY KEY,
customer_id INT,
sale_date DATE
);
• - Datasets
INSERT INTO customers VALUES
(1, 'Ankita Roy'),
(2, 'Rohit Jain'),
(3, 'Sneha Das'),
(4, 'Amit Kumar'),
(5, 'Priya Mehta'),
(6, 'Ravi Patel'),
(7, 'Anita Sharma');

INSERT INTO sales VALUES
(1, 2, '2022-11-15'),
(2, 1, '2022-05-10'),
(3, 1, '2023-03-12'),
(4, 2, '2021-08-15'),
(5, 3, '2024-01-20'),
(6, 5, '2024-06-22'),
(7, 1, '2024-11-30'),
(8, 2, '2024-09-01');

Learnings
• Using LEFT JOIN to find unmatched rows between two tables.
• Filtering results based on a specific condition (sale_date not in 2023).
• Identifying customers without sales in a given year.


Solutions
• - PostgreSQL / MySQL Solution
SELECT c.customer_name
FROM customers c
LEFT JOIN sales s ON c.customer_id = s.customer_id AND EXTRACT(YEAR FROM s.sale_date) =
2023
WHERE s.sale_id IS NULL;

Explanation of the Query:


• LEFT JOIN sales s ON c.customer_id = s.customer_id AND EXTRACT(YEAR FROM
s.sale_date) = 2023: Joins the customers table with the sales table, filtering the sales to
only those made in 2023.
• WHERE s.sale_id IS NULL: Filters out customers who have any sales in 2023 (because
sale_id will be NULL for customers who have no sales in 2023). This ensures only
customers without purchases in 2023 are returned.
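This LEFT JOIN anti-join pattern can be checked quickly in memory. The sketch below uses Python's sqlite3, where strftime('%Y', ...) stands in for EXTRACT(YEAR FROM ...) (assumptions of this example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INT PRIMARY KEY, customer_name TEXT);
CREATE TABLE sales (sale_id INT PRIMARY KEY, customer_id INT, sale_date TEXT);
INSERT INTO customers VALUES (1, 'Ankita Roy'), (2, 'Rohit Jain'), (3, 'Sneha Das');
INSERT INTO sales VALUES
    (1, 2, '2022-11-15'),
    (3, 1, '2023-03-12'),
    (5, 3, '2024-01-20');
""")

# Anti-join: keep customers whose 2023-filtered join produced no match.
rows = conn.execute("""
SELECT c.customer_name
FROM customers c
LEFT JOIN sales s ON c.customer_id = s.customer_id
                 AND strftime('%Y', s.sale_date) = '2023'
WHERE s.sale_id IS NULL
ORDER BY c.customer_name;
""").fetchall()
names = [r[0] for r in rows]
print(names)  # Ankita Roy is excluded: she has a 2023 purchase
```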
• Q.539
Question
Find the difference in sales between the current month and the previous month for each
product.
Explanation
To solve this, we need to compare the sales for each product in the current month and the
previous month. This can be done by:
• Extracting the month and year from the sale_date.
• Using window functions or self-joins to compare the sales for the current month and the
previous month for each product.
• Calculating the difference in sales between the two months.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE sales (
sale_id INT,
product_id INT,
sale_date DATE,
sales_amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO sales VALUES
(1, 1, '2024-01-10', 50000.00),
(2, 1, '2024-02-20', 70000.00),
(3, 2, '2024-01-15', 30000.00),
(4, 1, '2024-03-10', 60000.00),
(5, 1, '2024-12-01', 55000.00),
(6, 1, '2024-12-05', 75000.00),
(7, 1, '2024-12-10', 80000.00),
(8, 1, '2024-12-15', 95000.00),
(9, 1, '2024-12-20', 50000.00),
(10, 1, '2024-12-25', 65000.00),
(11, 2, '2024-12-01', 30000.00),
(12, 2, '2024-12-07', 35000.00),
(13, 2, '2024-12-10', 40000.00),
(14, 2, '2024-12-15', 45000.00),
(15, 2, '2024-12-20', 42000.00),
(16, 2, '2024-12-25', 47000.00),
(17, 3, '2024-12-02', 60000.00),
(18, 3, '2024-12-05', 70000.00),
(19, 3, '2024-12-10', 65000.00),
(20, 3, '2024-12-15', 72000.00),


(21, 3, '2024-12-20', 75000.00),
(22, 3, '2024-12-25', 68000.00),
(23, 4, '2024-12-01', 40000.00),
(24, 4, '2024-12-07', 43000.00),
(25, 4, '2024-12-10', 45000.00),
(26, 4, '2024-12-15', 47000.00),
(27, 4, '2024-12-20', 48000.00),
(28, 4, '2024-12-25', 50000.00),
(29, 5, '2024-12-01', 50000.00),
(30, 5, '2024-12-05', 55000.00),
(31, 5, '2024-12-10', 60000.00),
(32, 5, '2024-12-15', 65000.00),
(33, 5, '2024-12-20', 70000.00),
(34, 5, '2024-12-25', 75000.00),
(35, 6, '2025-01-01', 80000.00),
(36, 6, '2025-01-05', 85000.00),
(37, 6, '2025-01-10', 90000.00),
(38, 6, '2025-01-13', 95000.00);

Learnings
• Using EXTRACT(YEAR FROM sale_date) and EXTRACT(MONTH FROM sale_date) to break
down the date.
• Using JOIN to match sales data from the current and previous months.
• Calculating differences in sales for each product using subtraction.
Solutions
• - MySQL Solution
SELECT
    s.product_id,
    SUM(CASE
            WHEN MONTH(s.sale_date) = MONTH(CURRENT_DATE)
             AND YEAR(s.sale_date) = YEAR(CURRENT_DATE)
            THEN s.sales_amount ELSE 0
        END) AS current_month_sales,
    SUM(CASE
            WHEN MONTH(s.sale_date) = MONTH(CURRENT_DATE) - 1
             AND YEAR(s.sale_date) = YEAR(CURRENT_DATE)
            THEN s.sales_amount ELSE 0
        END) AS previous_month_sales,
    SUM(CASE
            WHEN MONTH(s.sale_date) = MONTH(CURRENT_DATE)
             AND YEAR(s.sale_date) = YEAR(CURRENT_DATE)
            THEN s.sales_amount ELSE 0
        END)
    - SUM(CASE
            WHEN MONTH(s.sale_date) = MONTH(CURRENT_DATE) - 1
             AND YEAR(s.sale_date) = YEAR(CURRENT_DATE)
            THEN s.sales_amount ELSE 0
        END) AS sales_difference
FROM sales s
GROUP BY s.product_id;
• - PostgreSQL Solution
SELECT
    s.product_id,
    SUM(CASE
            WHEN EXTRACT(MONTH FROM s.sale_date) = EXTRACT(MONTH FROM CURRENT_DATE)
             AND EXTRACT(YEAR FROM s.sale_date) = EXTRACT(YEAR FROM CURRENT_DATE)
            THEN s.sales_amount ELSE 0
        END) AS current_month_sales,
    SUM(CASE
            WHEN EXTRACT(MONTH FROM s.sale_date) = EXTRACT(MONTH FROM CURRENT_DATE) - 1
             AND EXTRACT(YEAR FROM s.sale_date) = EXTRACT(YEAR FROM CURRENT_DATE)
            THEN s.sales_amount ELSE 0
        END) AS previous_month_sales,
    SUM(CASE
            WHEN EXTRACT(MONTH FROM s.sale_date) = EXTRACT(MONTH FROM CURRENT_DATE)
             AND EXTRACT(YEAR FROM s.sale_date) = EXTRACT(YEAR FROM CURRENT_DATE)
            THEN s.sales_amount ELSE 0
        END)
    - SUM(CASE
            WHEN EXTRACT(MONTH FROM s.sale_date) = EXTRACT(MONTH FROM CURRENT_DATE) - 1
             AND EXTRACT(YEAR FROM s.sale_date) = EXTRACT(YEAR FROM CURRENT_DATE)
            THEN s.sales_amount ELSE 0
        END) AS sales_difference
FROM sales s
GROUP BY s.product_id;

Explanation:
• Conditional aggregation: each SUM(CASE ... END) totals sales_amount only for rows whose sale_date falls in the current month (or the previous month) of the current year.
• Grouping by product_id yields one row per product with both monthly totals side by side; this question's dataset defines no products table, so products are identified by product_id.
• sales_difference is current_month_sales minus previous_month_sales.
• Edge case: MONTH(CURRENT_DATE) - 1 evaluates to 0 in January, so a production query should compute the previous month with date arithmetic (e.g. CURRENT_DATE - INTERVAL '1 month') to cross the year boundary correctly.
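When the comparison should cover every month, not just the current and previous one, a LAG() window over monthly totals is the more general tool. A sketch using Python's sqlite3 (an assumption of this example; SQLite 3.25+, with strftime replacing the EXTRACT/MONTH calls):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INT, product_id INT, sale_date TEXT, sales_amount REAL);
INSERT INTO sales VALUES
    (1, 1, '2024-01-10', 50000),
    (2, 1, '2024-02-20', 70000),
    (4, 1, '2024-03-10', 60000),
    (3, 2, '2024-01-15', 30000);
""")

# LAG() pulls the previous month's total within each product's timeline.
rows = conn.execute("""
WITH monthly AS (
    SELECT product_id,
           strftime('%Y-%m', sale_date) AS ym,
           SUM(sales_amount) AS total_sales
    FROM sales
    GROUP BY product_id, ym
)
SELECT product_id, ym, total_sales,
       total_sales - LAG(total_sales)
           OVER (PARTITION BY product_id ORDER BY ym) AS sales_diff
FROM monthly
ORDER BY product_id, ym;
""").fetchall()
for r in rows:
    print(r)  # sales_diff is NULL for each product's first month
```

Note that LAG() compares against the previous row that exists: a month with no sales simply does not appear, so the difference is taken against the last month that had sales.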
• Q.540
Question
List all projects that started in 2024 using a date function.
Explanation
The task requires extracting projects that started in the year 2024. This can be done by using a
date function to filter records based on the year of the start_date column.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE projects (
project_id INT PRIMARY KEY,
project_name VARCHAR(100),
start_date DATE
);
• - Datasets
INSERT INTO projects VALUES
(1, 'Project Alpha', '2024-02-15'),
(2, 'Project Beta', '2023-11-30'),
(3, 'Project Gamma', '2024-05-10'),
(4, 'Project Delta', '2024-01-20'),
(5, 'Project Epsilon', '2024-08-01'),
(6, 'Project Zeta', '2023-12-25'),
(7, 'Project Eta', '2024-04-18'),
(8, 'Project Theta', '2024-07-12');

Learnings
• Use of date functions to extract year from a DATE type column.
• Filtering records based on year in SQL queries.


Solutions
• - PostgreSQL solution
SELECT project_id, project_name, start_date
FROM projects
WHERE EXTRACT(YEAR FROM start_date) = 2024;
• - MySQL solution
SELECT project_id, project_name, start_date
FROM projects
WHERE YEAR(start_date) = 2024;
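For completeness, SQLite has neither EXTRACT nor YEAR(); there, strftime('%Y', ...) does the job. A minimal sketch through Python's sqlite3 (an assumption of this example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (project_id INT PRIMARY KEY, project_name TEXT, start_date TEXT);
INSERT INTO projects VALUES
    (1, 'Project Alpha', '2024-02-15'),
    (2, 'Project Beta', '2023-11-30'),
    (3, 'Project Gamma', '2024-05-10');
""")

# strftime('%Y', ...) returns the year as a 4-character string.
rows = conn.execute("""
SELECT project_name
FROM projects
WHERE strftime('%Y', start_date) = '2024'
ORDER BY project_id;
""").fetchall()
print([r[0] for r in rows])
```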

Zomato
• Q.541
Question
Write a SQL query to find all customers who never ordered anything from the Customers
and Orders tables. Return the customer names who do not appear in the Orders table.
Explanation
To solve this problem, we need to identify customers who are present in the Customers table
but do not have any corresponding entries in the Orders table. This can be achieved by
performing a LEFT JOIN or using a NOT EXISTS or NOT IN condition to check for
customers without matching order records.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Customers (
Id INT PRIMARY KEY,
NameCust VARCHAR(100)
);

CREATE TABLE Orders (
Id INT PRIMARY KEY,
CustomerId INT,
FOREIGN KEY (CustomerId) REFERENCES Customers(Id)
);

-- Sample data insertions


INSERT INTO Customers (Id, NameCust)
VALUES
(1, 'Joe'),
(2, 'Henry'),
(3, 'Sam'),
(4, 'Max');

INSERT INTO Orders (Id, CustomerId)
VALUES
(1, 3),
(2, 1);

Learnings
• Identifying customers without orders using LEFT JOIN and IS NULL condition.
• Using NOT EXISTS or NOT IN to filter out customers with existing orders.
• JOIN operations to combine data from multiple tables.
Solutions
• - PostgreSQL and MySQL solution
SELECT c.NameCust AS Customers
FROM Customers c
LEFT JOIN Orders o ON c.Id = o.CustomerId
WHERE o.CustomerId IS NULL;


This query works as follows:


• LEFT JOIN combines all customers with the orders (if they exist).
• The WHERE o.CustomerId IS NULL condition filters out customers who have at least
one order, returning only those who have no matching orders.
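The NOT EXISTS alternative mentioned in the Learnings can be tried out the same way; this sketch runs it through Python's sqlite3 (an assumption of this example; the SQL is portable to PostgreSQL/MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Id INT PRIMARY KEY, NameCust TEXT);
CREATE TABLE Orders (Id INT PRIMARY KEY, CustomerId INT);
INSERT INTO Customers VALUES (1, 'Joe'), (2, 'Henry'), (3, 'Sam'), (4, 'Max');
INSERT INTO Orders VALUES (1, 3), (2, 1);
""")

# NOT EXISTS: keep customers with no matching row in Orders.
rows = conn.execute("""
SELECT c.NameCust AS Customers
FROM Customers c
WHERE NOT EXISTS (
    SELECT 1 FROM Orders o WHERE o.CustomerId = c.Id
)
ORDER BY c.Id;
""").fetchall()
print([r[0] for r in rows])  # Henry and Max never ordered
```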
• Q.542
Question
Write a SQL query to find the top 5 most ordered items for Blinkit in Gurgaon in the last
month. Consider the current month to be Jan 2025, so the last month is Dec 2024. The tables
involved are Orders and OrderDetails.
Explanation
We need to:
• Filter the Orders table for Gurgaon and last month orders.
• Join the Orders table with the OrderDetails table on OrderID.
• Group by ItemID to get the total quantity ordered for each item.
• Order the results by total quantity in descending order and limit the output to the top 5
items.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
OrderType VARCHAR(50),
City VARCHAR(50),
OrderDate DATE
);

CREATE TABLE OrderDetails (
OrderID INT,
ItemID INT,
Quantity INT,
FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
);

-- Sample data insertions


INSERT INTO Orders (OrderID, OrderType, City, OrderDate)
VALUES
(1, 'Online', 'Gurgaon', '2024-12-05'),
(2, 'Offline', 'Gurgaon', '2024-12-10'),
(3, 'Online', 'Gurgaon', '2024-12-15'),
(4, 'Online', 'Delhi', '2024-12-20'),
(5, 'Offline', 'Gurgaon', '2024-11-10');

INSERT INTO OrderDetails (OrderID, ItemID, Quantity)
VALUES
(1, 101, 5),
(1, 102, 2),
(2, 101, 4),
(3, 103, 3),
(3, 102, 6),
(3, 101, 2),
(5, 101, 1);

Learnings
• Filtering records by City and date range (last month).
• Using JOINs to combine data from Orders and OrderDetails.
• GROUP BY to aggregate the total quantities per item.
• ORDER BY to sort the items by quantity in descending order.


• LIMIT to fetch the top 5 items.


Solutions
• - PostgreSQL and MySQL solution
SELECT od.ItemID, SUM(od.Quantity) AS total_quantity
FROM Orders o
JOIN OrderDetails od ON o.OrderID = od.OrderID
WHERE o.City = 'Gurgaon'
  AND o.OrderDate >= '2024-12-01'
  AND o.OrderDate < '2025-01-01'
GROUP BY od.ItemID
ORDER BY total_quantity DESC
LIMIT 5;

This query works as follows:


• The JOIN combines the Orders and OrderDetails tables using the OrderID.
• WHERE filters the records for Gurgaon and the last month (December 2024, per the question's assumption).
• GROUP BY groups the results by ItemID, summing the quantities for each item.
• The query orders by total quantity in descending order and limits the result to the top 5
items.
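With the question pinning the current month to Jan 2025, a fixed December 2024 window keeps the result reproducible against the sample data. A sketch via Python's sqlite3 (an assumption of this example; half-open date ranges work identically on PostgreSQL/MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INT PRIMARY KEY, OrderType TEXT, City TEXT, OrderDate TEXT);
CREATE TABLE OrderDetails (OrderID INT, ItemID INT, Quantity INT);
INSERT INTO Orders VALUES
    (1, 'Online', 'Gurgaon', '2024-12-05'),
    (2, 'Offline', 'Gurgaon', '2024-12-10'),
    (3, 'Online', 'Gurgaon', '2024-12-15'),
    (4, 'Online', 'Delhi', '2024-12-20'),
    (5, 'Offline', 'Gurgaon', '2024-11-10');
INSERT INTO OrderDetails VALUES
    (1, 101, 5), (1, 102, 2), (2, 101, 4),
    (3, 103, 3), (3, 102, 6), (3, 101, 2), (5, 101, 1);
""")

# Half-open window [2024-12-01, 2025-01-01) selects December 2024 orders only.
rows = conn.execute("""
SELECT od.ItemID, SUM(od.Quantity) AS total_quantity
FROM Orders o
JOIN OrderDetails od ON o.OrderID = od.OrderID
WHERE o.City = 'Gurgaon'
  AND o.OrderDate >= '2024-12-01'
  AND o.OrderDate < '2025-01-01'
GROUP BY od.ItemID
ORDER BY total_quantity DESC
LIMIT 5;
""").fetchall()
print(rows)
```

Note that the November order (OrderID 5) and the Delhi order (OrderID 4) are filtered out before aggregation.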
• Q.543
Question
Write a SQL query to identify the number of inactive customers (those who haven’t placed
an order in the last 30 days) for Zomato in each city. The tables involved are Customers and
Orders.
Explanation
To find inactive customers:
• We need to check if the Customer has any order placed in the last 30 days by referencing
the Orders table.
• Inactive customers are those who have no order records in the last 30 days.
• We will join the Customers table with the Orders table on CustomerID.
• Use CURRENT_DATE or CURDATE() to filter orders placed in the last 30 days.
• GROUP BY the city to count the inactive customers per city.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
City VARCHAR(50),
TotalOrders INT
);

CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE,
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

-- Sample data insertions


INSERT INTO Customers (CustomerID, City, TotalOrders)
VALUES
(1, 'Delhi', 10),
(2, 'Mumbai', 5),
(3, 'Gurgaon', 8),
(4, 'Bangalore', 12),
(5, 'Delhi', 7);


INSERT INTO Orders (OrderID, CustomerID, OrderDate)


VALUES
(1, 1, '2025-01-01'),
(2, 2, '2024-12-15'),
(3, 3, '2024-11-25'),
(4, 5, '2024-11-10');

Learnings
• Using LEFT JOIN to identify customers who have no orders in the last 30 days.
• Filtering records using CURRENT_DATE or CURDATE() for accurate date comparison.
• GROUP BY to aggregate inactive customers by City.
• Using COUNT to calculate the number of inactive customers.
Solutions
• - PostgreSQL and MySQL solution
SELECT c.City, COUNT(DISTINCT c.CustomerID) AS InactiveCustomers
FROM Customers c
LEFT JOIN Orders o ON c.CustomerID = o.CustomerID
    AND o.OrderDate > CURRENT_DATE - INTERVAL '30 days'
WHERE o.OrderID IS NULL
GROUP BY c.City;

Explanation:
• LEFT JOIN: We join Customers with Orders, but the join condition matches only orders placed in the last 30 days (o.OrderDate > CURRENT_DATE - INTERVAL '30 days'; in MySQL, write INTERVAL 30 DAY).
• WHERE o.OrderID IS NULL: A customer with no order in the last 30 days has no matching row, so the joined Orders columns are NULL. This anti-join filter keeps only inactive customers, and it must be applied per customer row, before the city-level aggregation.
• GROUP BY c.City: We group the remaining (inactive) customers by city.
• COUNT(DISTINCT c.CustomerID): Counts the unique customers in each city who have not placed an order in the last 30 days.
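A reliable way to find rows with no match in another table is the anti-join: a LEFT JOIN whose unmatched rows are kept with an IS NULL test. Below is an optional sketch that replays this pattern in SQLite through Python's sqlite3 module; SQLite has no INTERVAL syntax, so a fixed cutoff date stands in for CURRENT_DATE - INTERVAL '30 days', and the rows are invented for illustration.

```python
import sqlite3

# Sketch of the anti-join pattern (LEFT JOIN ... IS NULL) in SQLite.
# A hard-coded cutoff replaces CURRENT_DATE - INTERVAL '30 days', which
# SQLite does not support; data is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
INSERT INTO Customers VALUES (1, 'Delhi'), (2, 'Mumbai'), (3, 'Delhi');
INSERT INTO Orders VALUES (1, 1, '2025-01-01'), (2, 2, '2024-11-10');
""")
cutoff = '2024-12-05'  # assume the "last 30 days" window starts here
rows = conn.execute("""
    SELECT c.City, COUNT(c.CustomerID) AS InactiveCustomers
    FROM Customers c
    LEFT JOIN Orders o
        ON c.CustomerID = o.CustomerID AND o.OrderDate > ?
    WHERE o.OrderID IS NULL        -- no recent order matched: inactive
    GROUP BY c.City
""", (cutoff,)).fetchall()
print(rows)
```

Customer 1 has a recent order and drops out; the other two have an old order or none at all, so each city reports one inactive customer.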
• Q.544
Question
Write a SQL query to calculate the percentage of total orders from each city for food and
grocery deliveries in the past month. The table involved is Orders.
Explanation
To solve this:
• Filter Orders to include only food and grocery deliveries in the past month.
• Calculate the total number of orders and the number of orders from each city.
• Calculate the percentage of orders from each city by dividing the city-specific orders by
the total orders.
• Use GROUP BY to get results per city and filter for the last month using the OrderDate.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Cities (
CityID INT PRIMARY KEY,
CityName VARCHAR(50)
);

CREATE TABLE Orders (


OrderID INT PRIMARY KEY,
OrderType VARCHAR(50),
City VARCHAR(50),
OrderDate DATE


);

-- Sample data insertions for Cities


INSERT INTO Cities (CityID, CityName)
VALUES
(1, 'Delhi'),
(2, 'Mumbai'),
(3, 'Bangalore'),
(4, 'Gurgaon'),
(5, 'Chennai');

-- Sample data insertions for Orders (15+ rows)


INSERT INTO Orders (OrderID, OrderType, City, OrderDate)
VALUES
(1, 'Food', 'Delhi', '2024-12-01'),
(2, 'Grocery', 'Delhi', '2024-12-03'),
(3, 'Food', 'Mumbai', '2024-12-05'),
(4, 'Food', 'Bangalore', '2024-12-06'),
(5, 'Grocery', 'Gurgaon', '2024-12-07'),
(6, 'Food', 'Chennai', '2024-11-09'),
(7, 'Food', 'Delhi', '2024-12-10'),
(8, 'Grocery', 'Bangalore', '2024-12-12'),
(9, 'Food', 'Chennai', '2024-12-15'),
(10, 'Grocery', 'Mumbai', '2024-12-17'),
(11, 'Food', 'Delhi', '2024-12-18'),
(12, 'Food', 'Mumbai', '2024-12-20'),
(13, 'Grocery', 'Bangalore', '2024-12-22'),
(14, 'Food', 'Gurgaon', '2024-12-24'),
(15, 'Grocery', 'Chennai', '2024-10-28');

Learnings
• Filtering by date range to consider only the last month's orders.
• Using GROUP BY to aggregate orders by city.
• COUNT() to calculate the number of orders from each city.
• Calculating the percentage using simple arithmetic.
• JOINs (if necessary) to reference the Cities table, though in this case, the City is directly
in the Orders table.
Solutions
• - PostgreSQL and MySQL solution
SELECT
    o.City,
    COUNT(o.OrderID) AS CityOrders,
    (COUNT(o.OrderID) * 100.0 / (
        SELECT COUNT(*)
        FROM Orders
        WHERE OrderDate >= CURRENT_DATE - INTERVAL '1 month'
          AND (OrderType = 'Food' OR OrderType = 'Grocery')
    )) AS PercentageOfTotalOrders
FROM Orders o
WHERE o.OrderDate >= CURRENT_DATE - INTERVAL '1 month'
  AND (o.OrderType = 'Food' OR o.OrderType = 'Grocery')
GROUP BY o.City
ORDER BY PercentageOfTotalOrders DESC;

Explanation:
• WHERE: Filters orders for the last month (OrderDate >= CURRENT_DATE - INTERVAL
'1 month') and for either food or grocery (OrderType = 'Food' OR OrderType =
'Grocery').
• COUNT(o.OrderID): Counts the total orders from each city in the last month.
• Percentage Calculation: The formula (COUNT(o.OrderID) * 100.0) / total_orders
calculates the percentage of orders from each city. The subquery (SELECT COUNT(*) FROM
Orders WHERE OrderDate >= CURRENT_DATE - INTERVAL '1 month' AND (OrderType
= 'Food' OR OrderType = 'Grocery')) computes the total orders of food and grocery
in the last month.


• GROUP BY o.City: Groups the results by City to calculate the number of orders per city.
• ORDER BY PercentageOfTotalOrders DESC: Orders the cities by the percentage of
orders in descending order.

Notes:
• Ensure that the CURRENT_DATE is considered correctly for the query depending on the
database system (it typically works for both MySQL and PostgreSQL).
• Adjust the INTERVAL '1 month' depending on the database if needed (e.g., for some
databases, INTERVAL 1 MONTH or INTERVAL '1' MONTH might be used).
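The percentage-of-total shape — a per-group COUNT divided by a scalar subquery over the same filtered rows — is easy to check on toy data. Here is an optional sketch in SQLite via Python's sqlite3; the date filter is left out so the focus stays on the arithmetic, and the rows are invented.

```python
import sqlite3

# Sketch of the percentage-of-total pattern: per-city counts divided by a
# scalar subquery that applies the same type filter. Dates omitted; data
# is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderType TEXT, City TEXT);
INSERT INTO Orders VALUES
    (1, 'Food',    'Delhi'),
    (2, 'Grocery', 'Delhi'),
    (3, 'Food',    'Mumbai'),
    (4, 'Courier', 'Delhi');   -- excluded by the type filter
""")
rows = conn.execute("""
    SELECT City,
           COUNT(OrderID) AS CityOrders,
           COUNT(OrderID) * 100.0 /
               (SELECT COUNT(*) FROM Orders
                WHERE OrderType IN ('Food', 'Grocery')) AS Pct
    FROM Orders
    WHERE OrderType IN ('Food', 'Grocery')
    GROUP BY City
    ORDER BY Pct DESC
""").fetchall()
print(rows)
```

Note the multiplication by 100.0: an integer literal would trigger integer division in some engines and silently round every percentage down.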
• Q.545
Question
Write a SQL query to find the most searched item on Blinkit in 2024 and its search volume
(i.e., how many times it was searched). The table involved is SearchRecords and Items.
Explanation
To find the most searched item:
• We need to join the SearchRecords table (which logs item searches) with the Items table
(which contains details of the items).
• Filter the records to consider only searches made in 2024.
• Group the results by ItemID and calculate the search volume for each item.
• Order the results by search volume in descending order and return the top item.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Items (
ItemID INT PRIMARY KEY,
ItemName VARCHAR(100)
);

CREATE TABLE SearchRecords (


SearchID INT PRIMARY KEY,
ItemID INT,
SearchDate DATE,
FOREIGN KEY (ItemID) REFERENCES Items(ItemID)
);

-- Sample data insertions for Items (10+ rows)


INSERT INTO Items (ItemID, ItemName)
VALUES
(1, 'Milk'),
(2, 'Bread'),
(3, 'Eggs'),
(4, 'Cheese'),
(5, 'Rice'),
(6, 'Oil'),
(7, 'Sugar'),
(8, 'Tea'),
(9, 'Coffee'),
(10, 'Flour');

-- Sample data insertions for SearchRecords (35+ rows)


INSERT INTO SearchRecords (SearchID, ItemID, SearchDate)
VALUES
(1, 1, '2024-01-10'),
(2, 2, '2024-01-15'),
(3, 3, '2024-01-20'),
(4, 1, '2024-02-02'),
(5, 4, '2024-02-05'),
(6, 5, '2024-02-07'),


(7, 2, '2024-02-10'),
(8, 6, '2024-02-11'),
(9, 3, '2024-02-15'),
(10, 7, '2024-02-18'),
(11, 8, '2024-03-01'),
(12, 9, '2024-03-03'),
(13, 4, '2024-03-07'),
(14, 1, '2024-03-10'),
(15, 2, '2024-03-12'),
(16, 5, '2024-03-15'),
(17, 6, '2024-03-17'),
(18, 7, '2024-03-20'),
(19, 8, '2024-03-22'),
(20, 9, '2024-03-25'),
(21, 10, '2024-03-28'),
(22, 1, '2024-04-01'),
(23, 2, '2024-04-02'),
(24, 3, '2024-04-03'),
(25, 4, '2024-04-04'),
(26, 5, '2024-04-05'),
(27, 6, '2024-04-06'),
(28, 7, '2024-04-07'),
(29, 8, '2024-04-08'),
(30, 9, '2024-04-09'),
(31, 10, '2024-04-10'),
(32, 1, '2024-04-11'),
(33, 2, '2024-04-12'),
(34, 3, '2024-04-13'),
(35, 4, '2024-04-14');

Learnings
• Filtering records based on date range (2024 in this case).
• Using JOIN to combine SearchRecords with Items to get item names.
• GROUP BY to aggregate search volume by ItemID.
• COUNT() function to calculate the search volume for each item.
• Sorting the result by search volume to identify the most searched item.
Solutions
• - PostgreSQL and MySQL solution
SELECT i.ItemName, COUNT(sr.SearchID) AS SearchVolume
FROM SearchRecords sr
JOIN Items i ON sr.ItemID = i.ItemID
WHERE sr.SearchDate BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY i.ItemName
ORDER BY SearchVolume DESC
LIMIT 1;

Explanation:
• JOIN: The SearchRecords table is joined with the Items table on ItemID to get the name
of the item.
• WHERE: The SearchDate is filtered for the year 2024 using the BETWEEN '2024-01-01'
AND '2024-12-31' condition.
• COUNT(sr.SearchID): The COUNT() function is used to calculate how many times each
item was searched in 2024.
• GROUP BY i.ItemName: We group by ItemName to aggregate the search counts for
each item.
• ORDER BY SearchVolume DESC: The results are ordered by the search volume in
descending order to get the most searched item at the top.
• LIMIT 1: Only the top 1 item (most searched) is returned.


Notes:
• You can modify the LIMIT 1 if you want the top N most searched items.
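The JOIN + GROUP BY + ORDER BY ... DESC + LIMIT shape can be verified on a handful of rows. Below is an optional sketch in SQLite via Python's sqlite3 with invented data; string dates compare correctly in BETWEEN because they use ISO format.

```python
import sqlite3

# Sketch of JOIN + GROUP BY + ORDER BY ... DESC + LIMIT 1 to pick the most
# searched item; data is illustrative, not the full sample set above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (ItemID INTEGER PRIMARY KEY, ItemName TEXT);
CREATE TABLE SearchRecords (SearchID INTEGER PRIMARY KEY, ItemID INTEGER, SearchDate TEXT);
INSERT INTO Items VALUES (1, 'Milk'), (2, 'Bread');
INSERT INTO SearchRecords VALUES
    (1, 1, '2024-01-10'),
    (2, 1, '2024-02-02'),
    (3, 2, '2024-01-15'),
    (4, 1, '2023-12-31');   -- outside 2024: filtered out
""")
row = conn.execute("""
    SELECT i.ItemName, COUNT(sr.SearchID) AS SearchVolume
    FROM SearchRecords sr
    JOIN Items i ON sr.ItemID = i.ItemID
    WHERE sr.SearchDate BETWEEN '2024-01-01' AND '2024-12-31'
    GROUP BY i.ItemName
    ORDER BY SearchVolume DESC
    LIMIT 1
""").fetchone()
print(row)
```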
• Q.546
Question
Write a SQL query to find the top 3 most searched items for Blinkit in 2024, along with
their search volumes. The tables involved are SearchRecords and Items.
Explanation
To find the top 3 most searched items:
• Join the SearchRecords table with the Items table to get item names.
• Filter the records for the year 2024.
• Group by ItemID to calculate the search volume for each item.
• Order by the search volume in descending order to get the most searched items.
• Limit the results to top 3 items.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Items (
ItemID INT PRIMARY KEY,
ItemName VARCHAR(100)
);

CREATE TABLE SearchRecords (


SearchID INT PRIMARY KEY,
ItemID INT,
SearchDate DATE,
FOREIGN KEY (ItemID) REFERENCES Items(ItemID)
);

-- Sample data insertions for Items (10+ rows)


INSERT INTO Items (ItemID, ItemName)
VALUES
(1, 'Milk'),
(2, 'Bread'),
(3, 'Eggs'),
(4, 'Cheese'),
(5, 'Rice'),
(6, 'Oil'),
(7, 'Sugar'),
(8, 'Tea'),
(9, 'Coffee'),
(10, 'Flour');

-- Sample data insertions for SearchRecords (35+ rows)


INSERT INTO SearchRecords (SearchID, ItemID, SearchDate)
VALUES
(1, 1, '2024-01-10'),
(2, 2, '2024-01-15'),
(3, 3, '2024-01-20'),
(4, 1, '2024-02-02'),
(5, 4, '2024-02-05'),
(6, 5, '2024-02-07'),
(7, 2, '2024-02-10'),
(8, 6, '2024-02-11'),
(9, 3, '2024-02-15'),
(10, 7, '2024-02-18'),
(11, 8, '2024-03-01'),
(12, 9, '2024-03-03'),
(13, 4, '2024-03-07'),
(14, 1, '2024-03-10'),
(15, 2, '2024-03-12'),
(16, 5, '2024-03-15'),


(17, 6, '2024-03-17'),
(18, 7, '2024-03-20'),
(19, 8, '2024-03-22'),
(20, 9, '2024-03-25'),
(21, 10, '2024-03-28'),
(22, 1, '2024-04-01'),
(23, 2, '2024-04-02'),
(24, 3, '2024-04-03'),
(25, 4, '2024-04-04'),
(26, 5, '2024-04-05'),
(27, 6, '2024-04-06'),
(28, 7, '2024-04-07'),
(29, 8, '2024-04-08'),
(30, 9, '2024-04-09'),
(31, 10, '2024-04-10'),
(32, 1, '2024-04-11'),
(33, 2, '2024-04-12'),
(34, 3, '2024-04-13'),
(35, 4, '2024-04-14');

Learnings
• Filtering by date range for 2024.
• JOIN to combine the SearchRecords and Items tables based on ItemID.
• COUNT() to calculate search volume for each item.
• GROUP BY to aggregate by item.
• LIMIT to return only the top N results.
Solutions
• - PostgreSQL and MySQL solution
SELECT
i.ItemName,
COUNT(sr.SearchID) AS SearchVolume
FROM SearchRecords sr
JOIN Items i ON sr.ItemID = i.ItemID
WHERE sr.SearchDate BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY i.ItemName
ORDER BY SearchVolume DESC
LIMIT 3;

Explanation:
• JOIN: Combines SearchRecords with Items on ItemID to get the item names.
• WHERE: Filters search records to only include those from 2024.
• COUNT(sr.SearchID): Counts how many times each item was searched.
• GROUP BY i.ItemName: Aggregates results by ItemName.
• ORDER BY SearchVolume DESC: Orders the items by the number of searches in
descending order.
• LIMIT 3: Returns the top 3 most searched items.

Notes:
• You can adjust the LIMIT value to return more or fewer items based on your needs.
• Q.547
Question
Write a SQL query to find the top 5 most frequently ordered items from Blinkit in 2024,
along with their total order quantity, considering both food and grocery categories, and
excluding items ordered less than 10 times in total. The tables involved are Orders and
OrderDetails.


Explanation
To solve this:
• Join the Orders table with the OrderDetails table to get the details of items ordered.
• Filter for orders in 2024 and for food and grocery categories.
• Calculate the total order quantity for each item in the year.
• Exclude items with a total order quantity of less than 10.
• Order the items by their total order quantity in descending order and return the top 5.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
OrderType VARCHAR(50),
OrderDate DATE
);

CREATE TABLE OrderDetails (


OrderDetailID INT PRIMARY KEY,
OrderID INT,
ItemID INT,
Quantity INT,
FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
);

CREATE TABLE Items (


ItemID INT PRIMARY KEY,
ItemName VARCHAR(100)
);

-- Sample data insertions for Orders (18 rows)


INSERT INTO Orders (OrderID, OrderType, OrderDate)
VALUES
(1, 'Food', '2024-01-10'),
(2, 'Grocery', '2024-01-12'),
(3, 'Food', '2024-02-01'),
(4, 'Grocery', '2024-02-15'),
(5, 'Food', '2024-03-05'),
(6, 'Grocery', '2024-03-10'),
(7, 'Food', '2024-04-01'),
(8, 'Food', '2024-04-05'),
(9, 'Grocery', '2024-04-15'),
(10, 'Food', '2024-05-10'),
(11, 'Grocery', '2024-06-01'),
(12, 'Food', '2024-07-10'),
(13, 'Food', '2024-08-10'),
(14, 'Grocery', '2024-09-15'),
(15, 'Food', '2024-10-20'),
(16, 'Food', '2024-11-25'),
(17, 'Grocery', '2024-12-01'),
(18, 'Food', '2024-12-05');

-- Sample data insertions for OrderDetails (29 rows)


INSERT INTO OrderDetails (OrderDetailID, OrderID, ItemID, Quantity)
VALUES
(1, 1, 1, 5),
(2, 1, 2, 2),
(3, 2, 3, 3),
(4, 2, 4, 1),
(5, 3, 1, 6),
(6, 3, 4, 2),
(7, 4, 3, 4),
(8, 4, 2, 5),
(9, 5, 1, 8),
(10, 5, 3, 3),
(11, 6, 4, 2),
(12, 6, 1, 7),


(13, 7, 2, 5),
(14, 8, 1, 4),
(15, 9, 2, 3),
(16, 9, 3, 5),
(17, 10, 1, 9),
(18, 10, 2, 3),
(19, 11, 3, 6),
(20, 12, 4, 2),
(21, 13, 1, 7),
(22, 14, 3, 4),
(23, 14, 1, 3),
(24, 15, 2, 6),
(25, 15, 1, 2),
(26, 16, 3, 4),
(27, 17, 4, 5),
(28, 17, 2, 7),
(29, 18, 1, 6);

Learnings
• JOIN operation between Orders and OrderDetails to gather order details and item
quantities.
• Using GROUP BY to aggregate the total quantity of each item.
• HAVING clause to filter out items with a total order quantity less than 10.
• Calculating the total order quantity for each item over 2024.
• Sorting the results in descending order of quantity and limiting the results to the top 5
items.
Solutions
• - PostgreSQL and MySQL solution
SELECT i.ItemName, SUM(od.Quantity) AS TotalQuantity
FROM OrderDetails od
JOIN Orders o ON od.OrderID = o.OrderID
JOIN Items i ON od.ItemID = i.ItemID
WHERE o.OrderDate BETWEEN '2024-01-01' AND '2024-12-31'
AND (o.OrderType = 'Food' OR o.OrderType = 'Grocery')
GROUP BY i.ItemName
HAVING SUM(od.Quantity) >= 10
ORDER BY TotalQuantity DESC
LIMIT 5;

Explanation:
• JOIN: We join OrderDetails with Orders on OrderID and Items on ItemID to get the
item names.
• WHERE: Filters orders for food and grocery and limits the date range to 2024.
• SUM(od.Quantity): This calculates the total quantity ordered for each item across all
orders.
• GROUP BY i.ItemName: We group by ItemName to aggregate the results for each item.
• HAVING: Filters out items with a total quantity less than 10.
• ORDER BY TotalQuantity DESC: Sorts the items in descending order of total quantity
ordered.
• LIMIT 5: Returns only the top 5 most ordered items.

Notes:
• Ensure that all necessary JOINs are performed to combine the OrderDetails, Orders, and
Items tables.
• Adjust the HAVING clause to ensure items with a total quantity of at least 10 are included.
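The key distinction in this question is WHERE versus HAVING: WHERE filters rows before grouping, while HAVING filters whole groups after aggregation. An optional sketch in SQLite via Python's sqlite3 (invented rows, with the threshold lowered to 5 so the effect shows with little data):

```python
import sqlite3

# Sketch of HAVING on an aggregate: groups whose SUM falls below the
# threshold are dropped after aggregation. Threshold lowered to 5 for
# this tiny illustrative dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderDetails (OrderDetailID INTEGER PRIMARY KEY, ItemID INTEGER, Quantity INTEGER);
INSERT INTO OrderDetails VALUES
    (1, 1, 4), (2, 1, 3),   -- item 1 totals 7: kept
    (3, 2, 2);              -- item 2 totals 2: dropped by HAVING
""")
rows = conn.execute("""
    SELECT ItemID, SUM(Quantity) AS TotalQuantity
    FROM OrderDetails
    GROUP BY ItemID
    HAVING SUM(Quantity) >= 5
    ORDER BY TotalQuantity DESC
""").fetchall()
print(rows)
```

Putting `SUM(Quantity) >= 5` in a WHERE clause would be an error, because WHERE runs before the groups (and their sums) exist.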
• Q.548


Question
Write a SQL query to find the top 5 most ordered food items on Zomato in 2024, along
with their total order quantities for each item. The tables involved are Orders,
OrderDetails, and FoodItems.
Explanation
To find the top 5 most ordered food items:
• Join the Orders table with the OrderDetails table to get item-level details.
• Filter the orders to include only food items (i.e., OrderType = 'Food').
• Filter the records to only include orders from 2024.
• Group the results by ItemID to calculate the total order quantity for each food item.
• Order the results by the total order quantity in descending order and return the top 5 most
ordered items.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
OrderType VARCHAR(50),
OrderDate DATE
);

CREATE TABLE OrderDetails (


OrderDetailID INT PRIMARY KEY,
OrderID INT,
ItemID INT,
Quantity INT,
FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
);

CREATE TABLE FoodItems (


ItemID INT PRIMARY KEY,
ItemName VARCHAR(100)
);

-- Sample data insertions for Orders
INSERT INTO Orders (OrderID, OrderType, OrderDate)
VALUES
(1, 'Food', '2024-01-10'),
(2, 'Food', '2024-01-12'),
(3, 'Food', '2024-02-01'),
(4, 'Food', '2024-02-15'),
(5, 'Food', '2024-03-05'),
(6, 'Food', '2024-03-10'),
(7, 'Food', '2024-04-01'),
(8, 'Food', '2024-04-05'),
(9, 'Food', '2024-04-15'),
(10, 'Food', '2024-05-10');

-- Sample data insertions for OrderDetails
INSERT INTO OrderDetails (OrderDetailID, OrderID, ItemID, Quantity)
VALUES
(1, 1, 1, 5),
(2, 1, 2, 2),
(3, 2, 3, 3),
(4, 2, 4, 1),
(5, 3, 1, 6),
(6, 3, 4, 2),
(7, 4, 3, 4),
(8, 4, 2, 5),
(9, 5, 1, 8),
(10, 5, 3, 3),
(11, 6, 4, 2),
(12, 6, 1, 7),


(13, 7, 2, 5),
(14, 8, 1, 4),
(15, 9, 2, 3),
(16, 9, 3, 5),
(17, 10, 1, 9),
(18, 10, 2, 3);

-- Sample data insertions for FoodItems
INSERT INTO FoodItems (ItemID, ItemName)
VALUES
(1, 'Pizza'),
(2, 'Burger'),
(3, 'Pasta'),
(4, 'Fries'),
(5, 'Sandwich');

Learnings
• JOIN to combine Orders, OrderDetails, and FoodItems based on ItemID.
• Filtering records for food orders only.
• Using GROUP BY to aggregate the total order quantity for each food item.
• ORDER BY to sort the results by total order quantity.
• LIMIT to get the top N items.
Solutions
• - PostgreSQL and MySQL solution
SELECT fi.ItemName, SUM(od.Quantity) AS TotalQuantity
FROM OrderDetails od
JOIN Orders o ON od.OrderID = o.OrderID
JOIN FoodItems fi ON od.ItemID = fi.ItemID
WHERE o.OrderType = 'Food'
AND o.OrderDate BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY fi.ItemName
ORDER BY TotalQuantity DESC
LIMIT 5;

Explanation:
• JOIN: Joins OrderDetails with Orders on OrderID and FoodItems on ItemID to get
item names.
• WHERE: Filters only food orders and restricts to the year 2024 using the BETWEEN clause.
• SUM(od.Quantity): Computes the total quantity ordered for each food item.
• GROUP BY fi.ItemName: Aggregates the data by ItemName.
• ORDER BY TotalQuantity DESC: Sorts by the total quantity ordered in descending
order.
• LIMIT 5: Returns only the top 5 most ordered food items.

Notes:
• Adjust the LIMIT if you need more or fewer top items.
• The WHERE clause filters to ensure only food items are considered for analysis.
• Q.549
Question
Write a SQL query to find the best-selling category on Zomato in 2024, based on the total
order quantity of food items in each category. The tables involved are Orders,
OrderDetails, FoodItems, and Categories.
Explanation
To find the best-selling category:


• Join the Orders table with the OrderDetails table to get the details of items ordered.
• Join the FoodItems table with the Categories table to categorize the items.
• Filter the records to include only food orders in 2024.
• Group the results by CategoryName and calculate the total order quantity for each
category.
• Order the results by total order quantity in descending order and return the top category.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
OrderType VARCHAR(50),
OrderDate DATE
);

CREATE TABLE OrderDetails (


OrderDetailID INT PRIMARY KEY,
OrderID INT,
ItemID INT,
Quantity INT,
FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
);

CREATE TABLE FoodItems (


ItemID INT PRIMARY KEY,
ItemName VARCHAR(100),
CategoryID INT
);

CREATE TABLE Categories (


CategoryID INT PRIMARY KEY,
CategoryName VARCHAR(50)
);

-- Sample data insertions for Orders
INSERT INTO Orders (OrderID, OrderType, OrderDate)
VALUES
(1, 'Food', '2024-01-10'),
(2, 'Food', '2024-01-12'),
(3, 'Food', '2024-02-01'),
(4, 'Food', '2024-02-15'),
(5, 'Food', '2024-03-05'),
(6, 'Food', '2024-03-10'),
(7, 'Food', '2024-04-01'),
(8, 'Food', '2024-04-05'),
(9, 'Food', '2024-04-15'),
(10, 'Food', '2024-05-10');

-- Sample data insertions for OrderDetails
INSERT INTO OrderDetails (OrderDetailID, OrderID, ItemID, Quantity)
VALUES
(1, 1, 1, 5),
(2, 1, 2, 2),
(3, 2, 3, 3),
(4, 2, 4, 1),
(5, 3, 1, 6),
(6, 3, 4, 2),
(7, 4, 3, 4),
(8, 4, 2, 5),
(9, 5, 1, 8),
(10, 5, 3, 3),
(11, 6, 4, 2),
(12, 6, 1, 7),
(13, 7, 2, 5),
(14, 8, 1, 4),
(15, 9, 2, 3),
(16, 9, 3, 5),
(17, 10, 1, 9),


(18, 10, 2, 3);

-- Sample data insertions for Categories
INSERT INTO Categories (CategoryID, CategoryName)
VALUES
(1, 'Pizza'),
(2, 'Burgers'),
(3, 'Pasta'),
(4, 'Fries'),
(5, 'Sandwiches');

-- Sample data insertions for FoodItems
INSERT INTO FoodItems (ItemID, ItemName, CategoryID)
VALUES
(1, 'Margherita Pizza', 1),
(2, 'Cheese Burger', 2),
(3, 'Spaghetti Carbonara', 3),
(4, 'French Fries', 4),
(5, 'Veg Sandwich', 5),
(6, 'Pepperoni Pizza', 1),
(7, 'Chicken Burger', 2),
(8, 'Penne Alfredo', 3),
(9, 'Curly Fries', 4),
(10, 'Grilled Cheese Sandwich', 5);

Learnings
• JOIN operation between OrderDetails, Orders, FoodItems, and Categories to combine
item and category information.
• Filtering orders for food types only.
• GROUP BY to calculate total order quantities for each category.
• ORDER BY to sort categories by the total order quantity in descending order.
• LIMIT 1 to find the best-selling category.
Solutions
• - PostgreSQL and MySQL solution
SELECT c.CategoryName, SUM(od.Quantity) AS TotalQuantity
FROM OrderDetails od
JOIN Orders o ON od.OrderID = o.OrderID
JOIN FoodItems fi ON od.ItemID = fi.ItemID
JOIN Categories c ON fi.CategoryID = c.CategoryID
WHERE o.OrderType = 'Food'
AND o.OrderDate BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY c.CategoryName
ORDER BY TotalQuantity DESC
LIMIT 1;

Explanation:
• JOIN: We join OrderDetails with Orders on OrderID, FoodItems on ItemID, and
Categories on CategoryID to get category names.
• WHERE: Filters to include only food orders within the year 2024.
• SUM(od.Quantity): Calculates the total quantity of items ordered in each category.
• GROUP BY c.CategoryName: Aggregates the total quantity by CategoryName.
• ORDER BY TotalQuantity DESC: Orders the categories by the total quantity in
descending order.
• LIMIT 1: Returns the category with the highest total order quantity.

Notes:
• This query returns the best-selling category by total order quantity. Adjust the LIMIT if
you want the top N categories.


• The JOIN operation is critical to combine the information from the multiple tables.
• Q.550
Question
Write a SQL query to identify the highest ordered restaurant in each city on Zomato in
2024, based on the total quantity of items ordered. The tables involved are Orders,
OrderDetails, Restaurants, and Cities.
Explanation
To find the highest ordered restaurant in each city:
• Join the Orders table with the OrderDetails table to get the details of items ordered.
• Join the Restaurants table to link each order to a specific restaurant.
• Join the Cities table to associate each restaurant with a city.
• Filter the records to only include orders from 2024.
• Group the results by CityName and RestaurantID to calculate the total order quantity
for each restaurant.
• Identify the restaurant with the highest total order quantity in each city.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Cities (
CityID INT PRIMARY KEY,
CityName VARCHAR(100)
);

CREATE TABLE Restaurants (


RestaurantID INT PRIMARY KEY,
RestaurantName VARCHAR(100),
CityID INT,
FOREIGN KEY (CityID) REFERENCES Cities(CityID)
);

CREATE TABLE Orders (


OrderID INT PRIMARY KEY,
OrderDate DATE,
RestaurantID INT,
OrderType VARCHAR(50),
FOREIGN KEY (RestaurantID) REFERENCES Restaurants(RestaurantID)
);

CREATE TABLE OrderDetails (


OrderDetailID INT PRIMARY KEY,
OrderID INT,
ItemID INT,
Quantity INT,
FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
);

-- Sample data insertions for Cities
INSERT INTO Cities (CityID, CityName)
VALUES
(1, 'Delhi'),
(2, 'Mumbai'),
(3, 'Bangalore'),
(4, 'Chennai');

-- Sample data insertions for Restaurants
INSERT INTO Restaurants (RestaurantID, RestaurantName, CityID)
VALUES
(1, 'Restaurant A', 1),
(2, 'Restaurant B', 1),
(3, 'Restaurant C', 1),


(4, 'Restaurant D', 1),
(5, 'Restaurant E', 1),
(6, 'Restaurant F', 2),
(7, 'Restaurant G', 2),
(8, 'Restaurant H', 2),
(9, 'Restaurant I', 2),
(10, 'Restaurant J', 2),
(11, 'Restaurant K', 3),
(12, 'Restaurant L', 3),
(13, 'Restaurant M', 3),
(14, 'Restaurant N', 3),
(15, 'Restaurant O', 3),
(16, 'Restaurant P', 4),
(17, 'Restaurant Q', 4),
(18, 'Restaurant R', 4),
(19, 'Restaurant S', 4),
(20, 'Restaurant T', 4);

-- Sample data insertions for Orders
INSERT INTO Orders (OrderID, OrderDate, RestaurantID, OrderType)
VALUES
(1, '2024-01-10', 1, 'Food'),
(2, '2024-01-12', 2, 'Food'),
(3, '2024-02-01', 3, 'Food'),
(4, '2024-02-15', 4, 'Food'),
(5, '2024-03-05', 5, 'Food'),
(6, '2024-03-10', 6, 'Food'),
(7, '2024-04-01', 7, 'Food'),
(8, '2024-04-05', 8, 'Food'),
(9, '2024-04-15', 9, 'Food'),
(10, '2024-05-10', 10, 'Food'),
(11, '2024-05-12', 11, 'Food'),
(12, '2024-06-01', 12, 'Food'),
(13, '2024-06-05', 13, 'Food'),
(14, '2024-07-01', 14, 'Food'),
(15, '2024-07-10', 15, 'Food'),
(16, '2024-08-01', 16, 'Food'),
(17, '2024-09-01', 17, 'Food'),
(18, '2024-09-15', 18, 'Food'),
(19, '2024-10-05', 19, 'Food'),
(20, '2024-10-15', 20, 'Food');

-- Sample data insertions for OrderDetails
INSERT INTO OrderDetails (OrderDetailID, OrderID, ItemID, Quantity)
VALUES
(1, 1, 1, 5),
(2, 1, 2, 2),
(3, 2, 3, 3),
(4, 2, 4, 1),
(5, 3, 1, 6),
(6, 3, 4, 2),
(7, 4, 3, 4),
(8, 4, 2, 5),
(9, 5, 1, 8),
(10, 5, 3, 3),
(11, 6, 4, 2),
(12, 6, 1, 7),
(13, 7, 2, 5),
(14, 8, 1, 4),
(15, 9, 2, 3),
(16, 9, 3, 5),
(17, 10, 1, 9),
(18, 10, 2, 3),
(19, 11, 3, 6),
(20, 12, 4, 2),
(21, 13, 1, 7),
(22, 14, 3, 4),
(23, 14, 1, 3),
(24, 15, 2, 6),
(25, 15, 1, 2),
(26, 16, 3, 4),
(27, 17, 4, 5),


(28, 17, 2, 7),
(29, 18, 1, 6),
(30, 19, 4, 3),
(31, 19, 2, 6),
(32, 20, 3, 4);

Learnings
• JOIN operations between Orders, OrderDetails, Restaurants, and Cities to link order
data with restaurant and city information.
• GROUP BY to aggregate total order quantities for each restaurant within each city.
• Using ORDER BY to sort by total quantity ordered for each restaurant in each city.
• Pre-aggregating in a CTE and matching each city's MAX (a window function such as ROW_NUMBER() is an alternative) to identify the highest ordered restaurant in each city.
Solutions
• - PostgreSQL and MySQL solution
WITH RestaurantTotalOrders AS (
SELECT r.CityID, r.RestaurantID, r.RestaurantName, SUM(od.Quantity) AS TotalQuantity
FROM OrderDetails od
JOIN Orders o ON od.OrderID = o.OrderID
JOIN Restaurants r ON o.RestaurantID = r.RestaurantID
WHERE o.OrderDate BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY r.CityID, r.RestaurantID, r.RestaurantName
)
SELECT c.CityName, r.RestaurantName, r.TotalQuantity
FROM RestaurantTotalOrders r
JOIN Cities c ON r.CityID = c.CityID
WHERE (r.CityID, r.TotalQuantity) IN (
SELECT CityID, MAX(TotalQuantity)
FROM RestaurantTotalOrders
GROUP BY CityID
)
ORDER BY c.CityName;

Explanation:
• WITH: The RestaurantTotalOrders Common Table Expression (CTE) aggregates the
total quantity ordered by each restaurant in each city in 2024.
• JOIN: Joins RestaurantTotalOrders with Cities to get the city name.
• WHERE: Filters the results to get only the restaurant with the highest total quantity in
each city.
• MAX(TotalQuantity): Identifies the restaurant with the highest total quantity in each city
using the MAX function.
• ORDER BY: Orders the final result by CityName for clarity.

Notes:
• This query returns the highest ordered restaurant in each city by total quantity of items
ordered.
• The WITH clause helps to pre-aggregate the total order quantities for restaurants in each city.
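The "top row per group" technique used in this solution — pre-aggregate in a CTE, then keep the rows equal to each group's MAX — can be replayed on toy data. The sketch below uses SQLite via Python's sqlite3 (row-value IN requires SQLite 3.15 or newer); the table and rows are invented.

```python
import sqlite3

# Sketch of the greatest-per-group pattern: aggregate in a CTE, then keep
# rows matching each group's MAX via a row-value IN. Needs SQLite >= 3.15.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (City TEXT, Restaurant TEXT, Quantity INTEGER);
INSERT INTO Sales VALUES
    ('Delhi',  'A', 5), ('Delhi',  'A', 3),
    ('Delhi',  'B', 4),
    ('Mumbai', 'C', 2), ('Mumbai', 'D', 6);
""")
rows = conn.execute("""
    WITH Totals AS (
        SELECT City, Restaurant, SUM(Quantity) AS TotalQuantity
        FROM Sales
        GROUP BY City, Restaurant
    )
    SELECT City, Restaurant, TotalQuantity
    FROM Totals
    WHERE (City, TotalQuantity) IN (
        SELECT City, MAX(TotalQuantity) FROM Totals GROUP BY City
    )
    ORDER BY City
""").fetchall()
print(rows)
```

One caveat of the MAX-matching approach: if two restaurants tie for the top total in a city, both rows are returned, whereas ROW_NUMBER() would pick exactly one.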
• Q.551
Question
Write a SQL query to select the name and bonus of all employees whose bonus is less than
1000. If an employee does not have a bonus, the result should show NULL for the bonus.
Explanation


To solve this:
• Perform a LEFT JOIN between the Employee and Bonus tables, using empId as the
common key.
• Filter the results to only include employees whose bonus is less than 1000 or where the
bonus is NULL.
• Select the name and bonus of the employees.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Employee (
empId INT PRIMARY KEY,
name VARCHAR(50),
supervisor INT,
salary INT
);

CREATE TABLE Bonus (


empId INT PRIMARY KEY,
bonus INT
);

-- Sample data insertions for Employee


INSERT INTO Employee (empId, name, supervisor, salary)
VALUES
(1, 'John', 3, 1000),
(2, 'Dan', 3, 2000),
(3, 'Brad', NULL, 4000),
(4, 'Thomas', 3, 4000);

-- Sample data insertions for Bonus


INSERT INTO Bonus (empId, bonus)
VALUES
(2, 500),
(4, 2000);

Learnings
• LEFT JOIN to include all employees from the Employee table, even if they don't have a
bonus in the Bonus table.
• Use of WHERE clause to filter employees with a bonus less than 1000 or NULL.
• NULL handling to ensure employees with no bonus are included with a NULL value for
bonus.
Solutions
• - PostgreSQL and MySQL solution
SELECT e.name, b.bonus
FROM Employee e
LEFT JOIN Bonus b ON e.empId = b.empId
WHERE (b.bonus < 1000 OR b.bonus IS NULL);

Explanation:
• LEFT JOIN ensures all employees are included even if they don't have a corresponding
record in the Bonus table.
• The WHERE clause filters the employees whose bonus is either less than 1000 or NULL.
• The query returns the name and the bonus of those employees, including NULL for those
who don't have a bonus or have no matching record in the Bonus table.
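The reason the IS NULL branch is required is that NULL < 1000 evaluates to NULL, not true, so employees without a bonus would silently disappear from a plain b.bonus < 1000 filter. An optional SQLite sketch via Python's sqlite3, reusing the question's sample rows (the supervisor and salary columns are omitted here since the query does not use them):

```python
import sqlite3

# Sketch showing NULL handling in a LEFT JOIN filter: NULL < 1000 is NULL
# (not true), so the explicit IS NULL branch keeps bonus-less employees.
# Data mirrors the question's sample; unused columns omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (empId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Bonus (empId INTEGER PRIMARY KEY, bonus INTEGER);
INSERT INTO Employee VALUES (1, 'John'), (2, 'Dan'), (3, 'Brad'), (4, 'Thomas');
INSERT INTO Bonus VALUES (2, 500), (4, 2000);
""")
rows = conn.execute("""
    SELECT e.name, b.bonus
    FROM Employee e
    LEFT JOIN Bonus b ON e.empId = b.empId
    WHERE b.bonus < 1000 OR b.bonus IS NULL
    ORDER BY e.empId
""").fetchall()
print(rows)  # Thomas (bonus 2000) is the only employee filtered out
```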
• Q.552
Question


Write an SQL query to find the customer_number for the customer who has placed the
largest number of orders in the Orders table. It is guaranteed that exactly one customer will
have placed more orders than any other customer.
Explanation
To solve this:
• Group the Orders table by customer_number and count the number of orders placed by
each customer.
• Sort the customers by the number of orders in descending order.
• Select the customer with the highest order count.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Orders (
order_number INT PRIMARY KEY,
customer_number INT
);

-- Sample data insertions for Orders


INSERT INTO Orders (order_number, customer_number)
VALUES
(1, 1),
(2, 2),
(3, 3),
(4, 3);

Learnings
• GROUP BY to aggregate the number of orders for each customer.
• COUNT() function to count the number of orders per customer.
• ORDER BY to sort the customers by order count in descending order.
• LIMIT 1 to select the customer with the most orders.
Solutions
• - PostgreSQL and MySQL solution
SELECT customer_number
FROM Orders
GROUP BY customer_number
ORDER BY COUNT(order_number) DESC
LIMIT 1;

Explanation:
• GROUP BY: This groups the orders by customer_number.
• COUNT(order_number): This counts the number of orders for each customer.
• ORDER BY COUNT(order_number) DESC: Sorts the customers by the total number of
orders in descending order.
• LIMIT 1: Ensures only the customer with the most orders is returned.
This query returns the customer_number of the customer who has placed the most orders.
• Q.553
Question
Write an SQL query to report the movies from the Cinema table that have an odd-numbered
id and a description that is not "boring". Return the result sorted by rating in descending
order.


Explanation
To solve this:
• Filter movies where the id is odd using the modulo operation (id % 2 != 0).
• Ensure the description is not "boring" by using a WHERE clause (description !=
'boring').
• Sort the results in descending order of the rating.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Cinema (
id INT PRIMARY KEY,
movie VARCHAR(100),
description VARCHAR(255),
rating FLOAT
);

-- Sample data insertions for Cinema


INSERT INTO Cinema (id, movie, description, rating)
VALUES
(1, 'War', 'great 3D', 8.9),
(2, 'Science', 'fiction', 8.5),
(3, 'Irish', 'boring', 6.2),
(4, 'Ice Song', 'Fantasy', 8.6),
(5, 'House Card', 'Interesting', 9.1);

Learnings
• Use the MOD operator (%) to check for odd numbers (id % 2 != 0).
• Filter rows using the WHERE clause for conditions like description not being "boring".
• Sort the result with ORDER BY rating DESC to get the movies with the highest rating
first.
Solutions
• - PostgreSQL and MySQL solution
SELECT id, movie, description, rating
FROM Cinema
WHERE id % 2 != 0
AND description != 'boring'
ORDER BY rating DESC;

Explanation:
• WHERE id % 2 != 0: This filters out movies with an even id, leaving only those with odd
id values.
• AND description != 'boring': Ensures that the description is not "boring".
• ORDER BY rating DESC: Sorts the movies by rating in descending order, so movies
with higher ratings appear first.
This query will return the list of movies that meet the criteria, sorted by their rating in
descending order.
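The same filter-and-sort query can be verified end to end with SQLite from Python (a quick check on the sample data, not part of the book's solution); only odd-id, non-boring movies should come back, highest rating first:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Cinema (id INT PRIMARY KEY, movie TEXT, description TEXT, rating REAL)")
cur.executemany("INSERT INTO Cinema VALUES (?, ?, ?, ?)", [
    (1, 'War', 'great 3D', 8.9),
    (2, 'Science', 'fiction', 8.5),
    (3, 'Irish', 'boring', 6.2),
    (4, 'Ice Song', 'Fantasy', 8.6),
    (5, 'House Card', 'Interesting', 9.1),
])

# Odd id, description not 'boring', best-rated first
rows = cur.execute("""
    SELECT id, movie, rating
    FROM Cinema
    WHERE id % 2 != 0 AND description != 'boring'
    ORDER BY rating DESC
""").fetchall()
print(rows)  # [(5, 'House Card', 9.1), (1, 'War', 8.9)]
```

Movie 3 is excluded by the description filter even though its id is odd, which is exactly the case the AND condition exists for.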
• Q.554
Question
Write an SQL query to find the top-performing rider(s) for Zomato, based on the number of
deliveries completed in the last 30 days. The rider(s) who made the most deliveries during
this period should be considered top performers. Return the rider_id and the total_deliveries
for each top-performing rider.
Explanation
To solve this:
• Calculate the total number of deliveries made by each rider in the last 30 days. This
requires filtering the data for the last 30 days based on the delivery_date.
• Count the number of deliveries per rider using the COUNT() function.
• Find the maximum number of deliveries made by any rider in this period.
• Filter out the riders who made the maximum number of deliveries.
• Return the rider_id and total_deliveries for the top-performing rider(s).
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Deliveries (
delivery_id INT PRIMARY KEY,
rider_id INT,
delivery_date DATE
);

-- Sample data insertions for Deliveries


INSERT INTO Deliveries (delivery_id, rider_id, delivery_date)
VALUES
(1, 101, '2024-12-15'),
(2, 102, '2024-12-16'),
(3, 101, '2024-12-17'),
(4, 103, '2024-12-17'),
(5, 102, '2024-12-18'),
(6, 101, '2024-12-19'),
(7, 103, '2024-12-19'),
(8, 104, '2024-12-20'),
(9, 101, '2024-12-21');

Learnings
• Use the WHERE clause to filter deliveries within the last 30 days.
• Use COUNT() to count the number of deliveries per rider.
• MAX() to find the highest number of deliveries.
• Use GROUP BY to group the results by rider_id and HAVING to filter out riders who
made the maximum deliveries.
Solutions
• - MySQL solution (in PostgreSQL, write the cutoff as CURRENT_DATE - INTERVAL '30 days')
SELECT rider_id, COUNT(delivery_id) AS total_deliveries
FROM Deliveries
WHERE delivery_date >= CURRENT_DATE - INTERVAL 30 DAY
GROUP BY rider_id
HAVING COUNT(delivery_id) = (
SELECT MAX(delivery_count)
FROM (
SELECT rider_id, COUNT(delivery_id) AS delivery_count
FROM Deliveries
WHERE delivery_date >= CURRENT_DATE - INTERVAL 30 DAY
GROUP BY rider_id
) AS rider_counts
);

Explanation:
• WHERE delivery_date >= CURRENT_DATE - INTERVAL 30 DAY: Filters
deliveries in the last 30 days.
• COUNT(delivery_id): Counts the number of deliveries made by each rider.
• GROUP BY rider_id: Groups the results by rider_id to count deliveries per rider.
• HAVING COUNT(delivery_id) = (subquery): Keeps only the riders whose delivery count equals the maximum number of deliveries in the last 30 days.
• Subquery: The inner query calculates the maximum number of deliveries made by any
rider in the last 30 days. The outer query uses this value to filter for the top-performing
rider(s).
This query returns the rider_id and their total_deliveries for the top-performing rider(s) in
the last 30 days.
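The HAVING-equals-MAX pattern can be exercised with SQLite from Python. Because the sample data is static, a fixed cutoff date stands in for `CURRENT_DATE - 30 days` (an assumption made purely so the demo is reproducible):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Deliveries (delivery_id INT PRIMARY KEY, rider_id INT, delivery_date TEXT)")
cur.executemany("INSERT INTO Deliveries VALUES (?, ?, ?)", [
    (1, 101, '2024-12-15'), (2, 102, '2024-12-16'), (3, 101, '2024-12-17'),
    (4, 103, '2024-12-17'), (5, 102, '2024-12-18'), (6, 101, '2024-12-19'),
    (7, 103, '2024-12-19'), (8, 104, '2024-12-20'), (9, 101, '2024-12-21'),
])

# Fixed cutoff standing in for CURRENT_DATE - 30 days (the sample data is static)
cutoff = '2024-12-01'
rows = cur.execute("""
    SELECT rider_id, COUNT(delivery_id) AS total_deliveries
    FROM Deliveries
    WHERE delivery_date >= ?
    GROUP BY rider_id
    HAVING COUNT(delivery_id) = (
        SELECT MAX(delivery_count) FROM (
            SELECT COUNT(delivery_id) AS delivery_count
            FROM Deliveries
            WHERE delivery_date >= ?
            GROUP BY rider_id
        ) AS rider_counts
    )
""", (cutoff, cutoff)).fetchall()
print(rows)  # [(101, 4)] -- rider 101 made the most deliveries
```

Note that, unlike `LIMIT 1`, the HAVING-equals-MAX form would return every rider tied for the maximum.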
• Q.556
Question
Write an SQL query to find the top 3 riders who earned the most based on the total
successful deliveries in the last 30 days. Each successful delivery earns the rider $5, and we
also need to count the unsuccessful deliveries (where the status is "unsuccessful"). Return
the rider_id, total_successful_deliveries, and total_earnings for the top 3 riders.
Explanation
To solve this:
• Calculate the number of successful deliveries per rider in the last 30 days.
• Calculate the total earnings for each rider, which is the number of successful deliveries
multiplied by $5.
• Also count the number of unsuccessful deliveries for each rider in the same period.
• Sort the riders based on the total earnings in descending order and select the top 3 riders.
• Return the rider_id, total_successful_deliveries, and total_earnings for each of the top 3
riders.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Deliveries (
delivery_id INT PRIMARY KEY,
rider_id INT,
delivery_date DATE,
status VARCHAR(20)
);

-- Sample data insertions for Deliveries


INSERT INTO Deliveries (delivery_id, rider_id, delivery_date, status)
VALUES
(1, 101, '2024-12-15', 'successful'),
(2, 102, '2024-12-16', 'unsuccessful'),
(3, 101, '2024-12-17', 'successful'),
(4, 103, '2024-12-17', 'successful'),
(5, 102, '2024-12-18', 'unsuccessful'),
(6, 101, '2024-12-19', 'successful'),
(7, 103, '2024-12-19', 'successful'),
(8, 104, '2024-12-20', 'successful'),
(9, 101, '2024-12-21', 'successful'),
(10, 102, '2024-12-22', 'unsuccessful'),
(11, 103, '2024-12-22', 'successful'),
(12, 104, '2024-12-22', 'unsuccessful'),
(13, 105, '2024-12-23', 'successful'),
(14, 101, '2024-12-23', 'unsuccessful'),
(15, 106, '2024-12-24', 'successful'),
(16, 103, '2024-12-24', 'unsuccessful'),
(17, 104, '2024-12-25', 'successful'),
(18, 105, '2024-12-25', 'successful'),
(19, 106, '2024-12-26', 'successful'),
(20, 101, '2024-12-26', 'unsuccessful'),
(21, 106, '2024-12-27', 'successful'),
(22, 104, '2024-12-27', 'unsuccessful'),
(23, 103, '2024-12-28', 'successful'),
(24, 102, '2024-12-28', 'successful'),
(25, 101, '2024-12-29', 'successful');

Learnings
• COUNT() function to count successful and unsuccessful deliveries.
• CASE WHEN to differentiate between successful and unsuccessful deliveries.
• SUM() function to calculate total earnings.
• ORDER BY to sort based on earnings in descending order.
• LIMIT 3 to select only the top 3 riders.
Solutions
• - MySQL solution (in PostgreSQL, write the cutoff as CURRENT_DATE - INTERVAL '30 days')
SELECT rider_id,
COUNT(CASE WHEN status = 'successful' THEN 1 END) AS total_successful_deliveries,
COUNT(CASE WHEN status = 'unsuccessful' THEN 1 END) AS total_unsuccessful_deliveries,
COUNT(CASE WHEN status = 'successful' THEN 1 END) * 5 AS total_earnings
FROM Deliveries
WHERE delivery_date >= CURRENT_DATE - INTERVAL 30 DAY
GROUP BY rider_id
ORDER BY total_earnings DESC
LIMIT 3;

Explanation:
• COUNT(CASE WHEN status = 'successful' THEN 1 END): This counts the number of
successful deliveries for each rider.
• COUNT(CASE WHEN status = 'unsuccessful' THEN 1 END): This counts the number
of unsuccessful deliveries for each rider.
• COUNT(CASE WHEN status = 'successful' THEN 1 END) * 5: This calculates the
total earnings, where each successful delivery earns $5.
• WHERE delivery_date >= CURRENT_DATE - INTERVAL 30 DAY: Filters
deliveries within the last 30 days.
• GROUP BY rider_id: Groups the results by rider to count the deliveries and earnings per
rider.
• ORDER BY total_earnings DESC: Sorts riders by total earnings in descending order.
• LIMIT 3: Selects only the top 3 riders based on their earnings.
This query will return the rider_id, total_successful_deliveries,
total_unsuccessful_deliveries, and total_earnings for the top 3 riders who made the most
successful deliveries in the last 30 days.
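The conditional COUNT and earnings arithmetic can be checked against the sample data with SQLite from Python; again a fixed cutoff replaces `CURRENT_DATE - 30 days` since the rows are static (an assumption for reproducibility):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE Deliveries (
    delivery_id INT PRIMARY KEY, rider_id INT, delivery_date TEXT, status TEXT)""")
cur.executemany("INSERT INTO Deliveries VALUES (?, ?, ?, ?)", [
    (1, 101, '2024-12-15', 'successful'), (2, 102, '2024-12-16', 'unsuccessful'),
    (3, 101, '2024-12-17', 'successful'), (4, 103, '2024-12-17', 'successful'),
    (5, 102, '2024-12-18', 'unsuccessful'), (6, 101, '2024-12-19', 'successful'),
    (7, 103, '2024-12-19', 'successful'), (8, 104, '2024-12-20', 'successful'),
    (9, 101, '2024-12-21', 'successful'), (10, 102, '2024-12-22', 'unsuccessful'),
    (11, 103, '2024-12-22', 'successful'), (12, 104, '2024-12-22', 'unsuccessful'),
    (13, 105, '2024-12-23', 'successful'), (14, 101, '2024-12-23', 'unsuccessful'),
    (15, 106, '2024-12-24', 'successful'), (16, 103, '2024-12-24', 'unsuccessful'),
    (17, 104, '2024-12-25', 'successful'), (18, 105, '2024-12-25', 'successful'),
    (19, 106, '2024-12-26', 'successful'), (20, 101, '2024-12-26', 'unsuccessful'),
    (21, 106, '2024-12-27', 'successful'), (22, 104, '2024-12-27', 'unsuccessful'),
    (23, 103, '2024-12-28', 'successful'), (24, 102, '2024-12-28', 'successful'),
    (25, 101, '2024-12-29', 'successful'),
])

# Fixed cutoff in place of CURRENT_DATE - 30 days (sample data is static)
rows = cur.execute("""
    SELECT rider_id,
           COUNT(CASE WHEN status = 'successful' THEN 1 END) AS successful,
           COUNT(CASE WHEN status = 'unsuccessful' THEN 1 END) AS unsuccessful,
           COUNT(CASE WHEN status = 'successful' THEN 1 END) * 5 AS earnings
    FROM Deliveries
    WHERE delivery_date >= '2024-12-01'
    GROUP BY rider_id
    ORDER BY earnings DESC
    LIMIT 3
""").fetchall()
print(rows)  # riders 101 ($25), 103 ($20), 106 ($15)
```

COUNT ignores NULLs, so `COUNT(CASE WHEN ... THEN 1 END)` counts only the rows matching the condition; that is the whole trick behind the conditional tallies.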
• Q.557
Question
Write an SQL query to find the average rider working hours per day based on the order
delivery times from the OrderDelivery table. The table contains the rider_id, order_id,
and delivery_time (in timestamp format). The goal is to calculate the average working hours
each rider spends delivering orders per day.
Explanation
To solve this:
• Extract the working hours for each rider per day from the delivery_time.
• Group the data by rider_id and date (extracted from delivery_time).
• Calculate the total working hours for each rider on each day by subtracting the start time
from the end time (if available).
• Calculate the average working hours per day for each rider across all days.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE OrderDelivery (
order_id INT PRIMARY KEY,
rider_id INT,
delivery_time TIMESTAMP
);

-- Sample data insertions for OrderDelivery


INSERT INTO OrderDelivery (order_id, rider_id, delivery_time)
VALUES
(1, 101, '2024-12-01 08:00:00'),
(2, 101, '2024-12-01 09:30:00'),
(3, 102, '2024-12-01 10:00:00'),
(4, 101, '2024-12-02 08:00:00'),
(5, 103, '2024-12-02 11:00:00'),
(6, 102, '2024-12-02 12:00:00'),
(7, 104, '2024-12-02 15:30:00'),
(8, 101, '2024-12-03 09:00:00'),
(9, 103, '2024-12-03 10:30:00'),
(10, 104, '2024-12-03 14:00:00');

Learnings
• DATE() or CAST() to extract the date part from a timestamp.
• COUNT() to calculate the number of delivery days for each rider.
• TIMESTAMPDIFF() or EXTRACT() to calculate the difference between two
timestamps (start and end times) to get working hours.
• AVG() to calculate the average working hours per day for each rider.
Solutions
• - MySQL solution (TIMESTAMPDIFF is MySQL-only; PostgreSQL would use EXTRACT(EPOCH FROM ...) / 3600)
SELECT rider_id,
AVG(TIMESTAMPDIFF(HOUR, delivery_time, NOW())) AS avg_working_hours_per_day
FROM OrderDelivery
GROUP BY rider_id
ORDER BY avg_working_hours_per_day DESC;

Explanation:
• TIMESTAMPDIFF(HOUR, delivery_time, NOW()): This calculates the difference
between the delivery_time and the current time in hours. For a proper "working hour per
day," you might need two timestamps (e.g., start and end time) for deliveries, but if it's a
simple timestamp column representing delivery time, this will give the hours from the
delivery time to the current time.
• AVG(): To calculate the average of these hourly differences for each rider.
• GROUP BY rider_id: This groups the results by rider_id to calculate the working hours
per rider.
• ORDER BY avg_working_hours_per_day DESC: This orders the results by average
working hours per day in descending order.
This query will return the rider_id along with their average working hours per day. You
can adjust the logic if there are explicit start and end times for each delivery.
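Since the table stores only a single delivery_time per order, one common workaround (an interpretation, not something the schema guarantees) is to treat the span between a rider's first and last delivery of each day as that day's working hours, then average those daily spans per rider. A SQLite sketch of that interpretation, run from Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE OrderDelivery (order_id INT PRIMARY KEY, rider_id INT, delivery_time TEXT)")
cur.executemany("INSERT INTO OrderDelivery VALUES (?, ?, ?)", [
    (1, 101, '2024-12-01 08:00:00'), (2, 101, '2024-12-01 09:30:00'),
    (3, 102, '2024-12-01 10:00:00'), (4, 101, '2024-12-02 08:00:00'),
    (5, 103, '2024-12-02 11:00:00'), (6, 102, '2024-12-02 12:00:00'),
    (7, 104, '2024-12-02 15:30:00'), (8, 101, '2024-12-03 09:00:00'),
    (9, 103, '2024-12-03 10:30:00'), (10, 104, '2024-12-03 14:00:00'),
])

# Working hours per day = last delivery minus first delivery (0 for a single order),
# then averaged across each rider's active days
rows = cur.execute("""
    WITH daily AS (
        SELECT rider_id, DATE(delivery_time) AS day,
               (strftime('%s', MAX(delivery_time)) - strftime('%s', MIN(delivery_time))) / 3600.0 AS hours
        FROM OrderDelivery
        GROUP BY rider_id, DATE(delivery_time)
    )
    SELECT rider_id, ROUND(AVG(hours), 2) AS avg_working_hours_per_day
    FROM daily
    GROUP BY rider_id
    ORDER BY avg_working_hours_per_day DESC, rider_id
""").fetchall()
print(rows)  # [(101, 0.5), (102, 0.0), (103, 0.0), (104, 0.0)]
```

Rider 101 is the only one with two deliveries on the same day (08:00 to 09:30 on 1st December, a 1.5-hour span), so every other rider's average collapses to zero under this interpretation.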
• Q.558
Question
Write an SQL query to calculate the overall idle time for each rider on 31st December 2024.
The idle time for each rider is defined as the time between the first order's delivery time
and the next order's receive time. Only consider the orders that occurred on 31st December
2024. Return the rider_id along with their overall idle time in hours.
Explanation
To solve this:
• Order each rider's orders chronologically: consider every order a rider handled on 31st December 2024, sorted by order_time.
• Find the next order: for each order, look up the rider's immediately following order (this is what LEAD() provides).
• Calculate idle time: the gap between an order's time and the next order's time.
• Sum the idle times: add up these gaps per rider to get the overall idle time for the day.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE OrderDelivery (
order_id INT PRIMARY KEY,
rider_id INT,
order_time TIMESTAMP
);

-- Sample data insertions for OrderDelivery (31st December 2024)


INSERT INTO OrderDelivery (order_id, rider_id, order_time)
VALUES
(1, 101, '2024-12-31 08:00:00'),
(2, 101, '2024-12-31 09:00:00'),
(3, 101, '2024-12-31 11:00:00'),
(4, 102, '2024-12-31 09:00:00'),
(5, 102, '2024-12-31 10:30:00'),
(6, 103, '2024-12-31 10:00:00'),
(7, 103, '2024-12-31 12:00:00'),
(8, 104, '2024-12-31 08:30:00'),
(9, 104, '2024-12-31 10:00:00'),
(10, 101, '2024-12-31 12:30:00'),
(11, 102, '2024-12-31 11:00:00'),
(12, 103, '2024-12-31 13:00:00'),
(13, 104, '2024-12-31 13:00:00');

Learnings
• LEAD(): To get the next order's delivery time after the current order.
• DATE(): To filter only orders that occurred on 31st December 2024.
• TIMESTAMPDIFF(): To calculate the time difference between two timestamps in hours.
• SUM(): To get the total idle time for each rider.
Solutions
• - MySQL solution (in PostgreSQL, replace TIMESTAMPDIFF with EXTRACT(EPOCH FROM (next_order_time - order_time)) / 3600)
WITH OrderWithNext AS (
SELECT rider_id,
order_time,
LEAD(order_time) OVER (PARTITION BY rider_id ORDER BY order_time) AS next_order_time
FROM OrderDelivery
WHERE DATE(order_time) = '2024-12-31'
)
SELECT rider_id,
SUM(TIMESTAMPDIFF(HOUR, order_time, next_order_time)) AS overall_idle_time
FROM OrderWithNext
WHERE next_order_time IS NOT NULL
GROUP BY rider_id
ORDER BY overall_idle_time DESC;

Explanation:
• LEAD(order_time) OVER (PARTITION BY rider_id ORDER BY order_time): This
window function returns the next order time for each rider based on the order_time ordered
in ascending order.
• It partitions the data by rider_id, meaning that for each rider, it will look at their orders
and get the time of the next order for the rider.
• WHERE DATE(order_time) = '2024-12-31': Filters the data to only consider orders on
31st December 2024.
• TIMESTAMPDIFF(HOUR, order_time, next_order_time): This function calculates the
time difference (in hours) between the order_time and the next_order_time (i.e., idle time
between orders).
• SUM(TIMESTAMPDIFF(HOUR, order_time, next_order_time)): This sums up the
idle time for all orders for each rider, i.e., the total idle time for each rider on 31st December
2024.
• GROUP BY rider_id: Groups the results by rider_id to calculate the total idle time for
each rider.
• WHERE next_order_time IS NOT NULL: Ensures that only records with a next order
(i.e., there is a subsequent order to calculate idle time) are included.
• ORDER BY overall_idle_time DESC: Orders the results by the total idle time for each
rider in descending order.
This query will return the rider_id and their overall idle time in hours for 31st December
2024, sorted in descending order by idle time.
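SQLite (3.25+, bundled with modern Python) supports LEAD(), so the idle-time logic can be replayed from Python. Dividing the difference of `strftime('%s', ...)` Unix-second values by 3600 with integer division reproduces the truncating behaviour of MySQL's `TIMESTAMPDIFF(HOUR, ...)`; a rider_id tiebreak is added to the ordering only to make the output deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE OrderDelivery (order_id INT PRIMARY KEY, rider_id INT, order_time TEXT)")
cur.executemany("INSERT INTO OrderDelivery VALUES (?, ?, ?)", [
    (1, 101, '2024-12-31 08:00:00'), (2, 101, '2024-12-31 09:00:00'),
    (3, 101, '2024-12-31 11:00:00'), (4, 102, '2024-12-31 09:00:00'),
    (5, 102, '2024-12-31 10:30:00'), (6, 103, '2024-12-31 10:00:00'),
    (7, 103, '2024-12-31 12:00:00'), (8, 104, '2024-12-31 08:30:00'),
    (9, 104, '2024-12-31 10:00:00'), (10, 101, '2024-12-31 12:30:00'),
    (11, 102, '2024-12-31 11:00:00'), (12, 103, '2024-12-31 13:00:00'),
    (13, 104, '2024-12-31 13:00:00'),
])

# Integer division by 3600 truncates partial hours, like TIMESTAMPDIFF(HOUR, ...)
rows = cur.execute("""
    WITH OrderWithNext AS (
        SELECT rider_id, order_time,
               LEAD(order_time) OVER (PARTITION BY rider_id ORDER BY order_time) AS next_order_time
        FROM OrderDelivery
        WHERE DATE(order_time) = '2024-12-31'
    )
    SELECT rider_id,
           SUM((strftime('%s', next_order_time) - strftime('%s', order_time)) / 3600) AS overall_idle_time
    FROM OrderWithNext
    WHERE next_order_time IS NOT NULL
    GROUP BY rider_id
    ORDER BY overall_idle_time DESC, rider_id
""").fetchall()
print(rows)  # [(101, 4), (104, 4), (103, 3), (102, 1)]
```

Rider 101's gaps are 1h, 2h, and 1.5h; the last one truncates to 1, giving a total of 4.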
• Q.559
Problem statement
Write an SQL query to calculate the revenue contribution of food delivery vs. grocery
delivery for the top 5 revenue-generating cities in the last quarter.
Explanation
To solve this:
• Filter the data to consider only the orders from the last quarter.
• Sum the revenue for food delivery and grocery delivery orders separately.
• Identify the top 5 cities based on total revenue.
• Calculate the contribution percentage for food delivery and grocery delivery in these
cities.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Revenue (
OrderID INT PRIMARY KEY,
OrderType VARCHAR(50), -- 'food' or 'grocery'
City VARCHAR(100),
TotalAmount DECIMAL(10, 2),
OrderDate DATE
);

-- Sample data insertions for Revenue (for the last quarter)


INSERT INTO Revenue (OrderID, OrderType, City, TotalAmount, OrderDate)
VALUES
(1, 'food', 'New York', 100.00, '2024-10-01'),
(2, 'grocery', 'New York', 50.00, '2024-10-05'),
(3, 'food', 'Los Angeles', 120.00, '2024-11-15'),
(4, 'grocery', 'Los Angeles', 75.00, '2024-11-17'),
(5, 'food', 'San Francisco', 150.00, '2024-12-20'),
(6, 'grocery', 'San Francisco', 40.00, '2024-12-22'),
(7, 'food', 'Chicago', 80.00, '2024-10-10'),
(8, 'grocery', 'Chicago', 60.00, '2024-10-12'),
(9, 'food', 'Houston', 90.00, '2024-11-05'),
(10, 'grocery', 'Houston', 100.00, '2024-11-08'),
(11, 'food', 'Phoenix', 110.00, '2024-12-01'),
(12, 'grocery', 'Phoenix', 120.00, '2024-12-05');

Learnings
• Filtering by date: Use the WHERE clause to filter data for the last quarter.
• Aggregating by order type: Use SUM() to get total revenue for food and grocery.
• Ranking cities: Use GROUP BY and ORDER BY to get top cities based on total revenue.
• Calculating percentages: The revenue contribution is calculated as (revenue from food
/ total revenue) * 100 and similarly for grocery.

Solutions
• - PostgreSQL and MySQL solution
WITH CityRevenue AS (
SELECT City,
SUM(CASE WHEN OrderType = 'food' THEN TotalAmount ELSE 0 END) AS FoodRevenue,
SUM(CASE WHEN OrderType = 'grocery' THEN TotalAmount ELSE 0 END) AS GroceryRevenue,
SUM(TotalAmount) AS TotalRevenue
FROM Revenue
WHERE OrderDate >= '2024-10-01' AND OrderDate <= '2024-12-31' -- Last Quarter (Q4 2024)
GROUP BY City
ORDER BY TotalRevenue DESC
LIMIT 5
)

SELECT City,
FoodRevenue,
GroceryRevenue,
TotalRevenue,
(FoodRevenue / TotalRevenue) * 100 AS FoodRevenueContribution,
(GroceryRevenue / TotalRevenue) * 100 AS GroceryRevenueContribution
FROM CityRevenue;

Explanation:
• CityRevenue CTE:
• This common table expression (CTE) filters the data to the last quarter (from October 1,
2024, to December 31, 2024).
• It calculates the FoodRevenue, GroceryRevenue, and TotalRevenue for each city using
the SUM() function.
• We use CASE inside SUM() to sum the amounts conditionally based on OrderType (food or
grocery).
• After grouping by city, we order the cities by TotalRevenue in descending order and limit
the result to the top 5 cities.
• Final SELECT:
• The outer query selects the city and calculates the percentage contributions of food and
grocery revenues by dividing the respective revenues by the total revenue for each city.
• We multiply by 100 to express the contributions as percentages.
• WHERE OrderDate >= '2024-10-01' AND OrderDate <= '2024-12-31': Filters the
orders that are placed during the last quarter of 2024 (October, November, December).
• LIMIT 5: Restricts the result to only the top 5 cities based on total revenue.

Expected Output:

City          | FoodRevenue | GroceryRevenue | TotalRevenue | FoodRevenueContribution | GroceryRevenueContribution
Phoenix       | 110.00      | 120.00         | 230.00       | 47.83%                  | 52.17%
Los Angeles   | 120.00      | 75.00          | 195.00       | 61.54%                  | 38.46%
San Francisco | 150.00      | 40.00          | 190.00       | 78.95%                  | 21.05%
Houston       | 90.00       | 100.00         | 190.00       | 47.37%                  | 52.63%
New York      | 100.00      | 50.00          | 150.00       | 66.67%                  | 33.33%
This query calculates the revenue contribution of food vs. grocery orders in the top 5
revenue-generating cities in the last quarter of 2024.
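Running the CTE against the sample rows with SQLite from Python shows which cities actually make the top 5: Phoenix (230.00) leads and Chicago (140.00) drops out. ROUND is applied to the percentages for readability, and the 190.00 tie between San Francisco and Houston is broken alphabetically here purely so the result is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE Revenue (
    OrderID INT PRIMARY KEY, OrderType TEXT, City TEXT,
    TotalAmount REAL, OrderDate TEXT)""")
cur.executemany("INSERT INTO Revenue VALUES (?, ?, ?, ?, ?)", [
    (1, 'food', 'New York', 100.00, '2024-10-01'),
    (2, 'grocery', 'New York', 50.00, '2024-10-05'),
    (3, 'food', 'Los Angeles', 120.00, '2024-11-15'),
    (4, 'grocery', 'Los Angeles', 75.00, '2024-11-17'),
    (5, 'food', 'San Francisco', 150.00, '2024-12-20'),
    (6, 'grocery', 'San Francisco', 40.00, '2024-12-22'),
    (7, 'food', 'Chicago', 80.00, '2024-10-10'),
    (8, 'grocery', 'Chicago', 60.00, '2024-10-12'),
    (9, 'food', 'Houston', 90.00, '2024-11-05'),
    (10, 'grocery', 'Houston', 100.00, '2024-11-08'),
    (11, 'food', 'Phoenix', 110.00, '2024-12-01'),
    (12, 'grocery', 'Phoenix', 120.00, '2024-12-05'),
])

rows = cur.execute("""
    WITH CityRevenue AS (
        SELECT City,
               SUM(CASE WHEN OrderType = 'food' THEN TotalAmount ELSE 0 END) AS FoodRevenue,
               SUM(CASE WHEN OrderType = 'grocery' THEN TotalAmount ELSE 0 END) AS GroceryRevenue,
               SUM(TotalAmount) AS TotalRevenue
        FROM Revenue
        WHERE OrderDate BETWEEN '2024-10-01' AND '2024-12-31'
        GROUP BY City
        ORDER BY TotalRevenue DESC
        LIMIT 5
    )
    SELECT City, TotalRevenue,
           ROUND(FoodRevenue * 100.0 / TotalRevenue, 2) AS FoodPct,
           ROUND(GroceryRevenue * 100.0 / TotalRevenue, 2) AS GroceryPct
    FROM CityRevenue
    ORDER BY TotalRevenue DESC, City
""").fetchall()
print(rows)
```

The LIMIT inside the CTE is what restricts the percentage calculation to the top 5 cities; the outer SELECT only formats what the CTE kept.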
• Q.560
Write an SQL query to find the average order amount for each OrderType (food or
grocery) in the last 30 days for all cities.
Explanation
To solve this:
• Filter the orders from the last 30 days.
• Group by OrderType to calculate the average order amount separately for food and
grocery.
• Return the results with the average order amount for each order type.
Datasets and SQL Schemas
• - Table creation and sample data
CREATE TABLE Revenue (
OrderID INT PRIMARY KEY,
OrderType VARCHAR(50), -- 'food' or 'grocery'
City VARCHAR(100),
TotalAmount DECIMAL(10, 2),
OrderDate DATE
);

-- Sample data insertions for Revenue (last 30 days)


INSERT INTO Revenue (OrderID, OrderType, City, TotalAmount, OrderDate)
VALUES
(1, 'food', 'New York', 100.00, '2024-12-10'),
(2, 'grocery', 'New York', 50.00, '2024-12-12'),
(3, 'food', 'Los Angeles', 120.00, '2024-12-15'),
(4, 'grocery', 'Los Angeles', 75.00, '2024-12-16'),
(5, 'food', 'San Francisco', 150.00, '2024-12-18'),
(6, 'grocery', 'San Francisco', 40.00, '2024-12-22'),
(7, 'food', 'Chicago', 80.00, '2024-12-25'),
(8, 'grocery', 'Chicago', 60.00, '2024-12-26'),
(9, 'food', 'Houston', 90.00, '2024-12-28'),
(10, 'grocery', 'Houston', 100.00, '2024-12-29');

Learnings
• DATE subtraction: Use the CURRENT_DATE function to subtract 30 days and filter the
orders in the last 30 days.
• GROUP BY and AVG(): Use GROUP BY to group the data by OrderType and AVG() to
calculate the average order amount for each type.
Solutions
• - MySQL solution (in PostgreSQL, write the cutoff as CURRENT_DATE - INTERVAL '30 days')
SELECT OrderType,
AVG(TotalAmount) AS AverageOrderAmount
FROM Revenue
WHERE OrderDate >= CURRENT_DATE - INTERVAL 30 DAY
GROUP BY OrderType;

Explanation:
• WHERE OrderDate >= CURRENT_DATE - INTERVAL 30 DAY: Filters the orders
to include only those that have been placed in the last 30 days from the current date.
• GROUP BY OrderType: Groups the data by OrderType (food or grocery) so we can
calculate the average separately for each type.
• AVG(TotalAmount): Calculates the average of the TotalAmount for each group of orders
(food or grocery).
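The averaging can be confirmed with SQLite from Python; a fixed cutoff date replaces `CURRENT_DATE - INTERVAL 30 DAY` because the sample rows are static (an assumption for reproducibility):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE Revenue (
    OrderID INT PRIMARY KEY, OrderType TEXT, City TEXT,
    TotalAmount REAL, OrderDate TEXT)""")
cur.executemany("INSERT INTO Revenue VALUES (?, ?, ?, ?, ?)", [
    (1, 'food', 'New York', 100.00, '2024-12-10'),
    (2, 'grocery', 'New York', 50.00, '2024-12-12'),
    (3, 'food', 'Los Angeles', 120.00, '2024-12-15'),
    (4, 'grocery', 'Los Angeles', 75.00, '2024-12-16'),
    (5, 'food', 'San Francisco', 150.00, '2024-12-18'),
    (6, 'grocery', 'San Francisco', 40.00, '2024-12-22'),
    (7, 'food', 'Chicago', 80.00, '2024-12-25'),
    (8, 'grocery', 'Chicago', 60.00, '2024-12-26'),
    (9, 'food', 'Houston', 90.00, '2024-12-28'),
    (10, 'grocery', 'Houston', 100.00, '2024-12-29'),
])

# Fixed cutoff in place of CURRENT_DATE - INTERVAL 30 DAY (sample data is static)
rows = cur.execute("""
    SELECT OrderType, AVG(TotalAmount) AS AverageOrderAmount
    FROM Revenue
    WHERE OrderDate >= '2024-12-01'
    GROUP BY OrderType
    ORDER BY OrderType
""").fetchall()
print(rows)  # [('food', 108.0), ('grocery', 65.0)]
```

Food averages 540.00 over 5 orders (108.00) and grocery 325.00 over 5 orders (65.00), matching a hand calculation.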

Swiggy
• Q.561
Question
Find city-wise customer count who have placed more than three orders in November 2023.
Explanation
To solve this, we need to:
• Filter orders placed in November 2023.
• Group by customer_id to count the number of orders for each customer.
• Filter out customers who have placed more than three orders.
• Group by city to count the number of customers in each city.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE orders(
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
price FLOAT,
city VARCHAR(25)
);
• - Datasets
INSERT INTO orders (order_id, customer_id, order_date, price, city) VALUES
(1, 101, '2023-11-01', 150.50, 'Mumbai'),
(2, 102, '2023-11-05', 200.75, 'Delhi'),
(3, 103, '2023-11-10', 180.25, 'Mumbai'),
(4, 104, '2023-11-15', 120.90, 'Delhi'),
(5, 105, '2023-11-20', 250.00, 'Mumbai'),
(6, 108, '2023-11-25', 180.75, 'Gurgoan'),
(7, 107, '2023-12-30', 300.25, 'Delhi'),
(8, 108, '2023-12-02', 220.50, 'Gurgoan'),
(9, 109, '2023-11-08', 170.00, 'Mumbai'),
(10, 110, '2023-10-12', 190.75, 'Delhi'),
(11, 108, '2023-10-18', 210.25, 'Gurgoan'),
(12, 112, '2023-11-24', 280.50, 'Mumbai'),
(13, 113, '2023-10-29', 150.00, 'Mumbai'),
(14, 103, '2023-11-03', 200.75, 'Mumbai'),
(15, 115, '2023-10-07', 230.90, 'Delhi'),
(16, 116, '2023-11-11', 260.00, 'Mumbai'),
(17, 117, '2023-11-16', 180.75, 'Mumbai'),
(18, 102, '2023-11-22', 320.25, 'Delhi'),
(19, 103, '2023-11-27', 170.50, 'Mumbai'),
(20, 102, '2023-11-05', 220.75, 'Delhi'),
(21, 103, '2023-11-09', 300.25, 'Mumbai'),
(22, 101, '2023-11-15', 180.50, 'Mumbai'),
(23, 104, '2023-11-18', 250.75, 'Delhi'),
(24, 102, '2023-11-20', 280.25, 'Delhi'),
(25, 117, '2023-11-16', 180.75, 'Mumbai'),
(26, 117, '2023-11-16', 180.75, 'Mumbai'),
(27, 117, '2023-11-16', 180.75, 'Mumbai'),
(28, 117, '2023-11-16', 180.75, 'Mumbai');

Learnings
• Use of GROUP BY to aggregate data.
• Using HAVING to filter on aggregated results.
• Date filtering with WHERE clause.
Solutions
• - PostgreSQL solution
SELECT city, COUNT(*) AS customer_count
FROM (
    SELECT city, customer_id
    FROM orders
    WHERE order_date >= '2023-11-01' AND order_date <= '2023-11-30'
    GROUP BY city, customer_id
    HAVING COUNT(order_id) > 3
) AS per_customer
GROUP BY city;
• - MySQL solution
SELECT city, COUNT(*) AS customer_count
FROM (
    SELECT city, customer_id
    FROM orders
    WHERE order_date BETWEEN '2023-11-01' AND '2023-11-30'
    GROUP BY city, customer_id
    HAVING COUNT(order_id) > 3
) AS per_customer
GROUP BY city;
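The per-customer-then-per-city logic can be checked on the sample data with SQLite from Python: only customers 102 (Delhi), 103 and 117 (both Mumbai) clear the three-order bar in November 2023, so the expected output is Delhi 1, Mumbai 2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE orders (
    order_id INT PRIMARY KEY, customer_id INT, order_date TEXT, price REAL, city TEXT)""")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
    (1, 101, '2023-11-01', 150.50, 'Mumbai'), (2, 102, '2023-11-05', 200.75, 'Delhi'),
    (3, 103, '2023-11-10', 180.25, 'Mumbai'), (4, 104, '2023-11-15', 120.90, 'Delhi'),
    (5, 105, '2023-11-20', 250.00, 'Mumbai'), (6, 108, '2023-11-25', 180.75, 'Gurgoan'),
    (7, 107, '2023-12-30', 300.25, 'Delhi'), (8, 108, '2023-12-02', 220.50, 'Gurgoan'),
    (9, 109, '2023-11-08', 170.00, 'Mumbai'), (10, 110, '2023-10-12', 190.75, 'Delhi'),
    (11, 108, '2023-10-18', 210.25, 'Gurgoan'), (12, 112, '2023-11-24', 280.50, 'Mumbai'),
    (13, 113, '2023-10-29', 150.00, 'Mumbai'), (14, 103, '2023-11-03', 200.75, 'Mumbai'),
    (15, 115, '2023-10-07', 230.90, 'Delhi'), (16, 116, '2023-11-11', 260.00, 'Mumbai'),
    (17, 117, '2023-11-16', 180.75, 'Mumbai'), (18, 102, '2023-11-22', 320.25, 'Delhi'),
    (19, 103, '2023-11-27', 170.50, 'Mumbai'), (20, 102, '2023-11-05', 220.75, 'Delhi'),
    (21, 103, '2023-11-09', 300.25, 'Mumbai'), (22, 101, '2023-11-15', 180.50, 'Mumbai'),
    (23, 104, '2023-11-18', 250.75, 'Delhi'), (24, 102, '2023-11-20', 280.25, 'Delhi'),
    (25, 117, '2023-11-16', 180.75, 'Mumbai'), (26, 117, '2023-11-16', 180.75, 'Mumbai'),
    (27, 117, '2023-11-16', 180.75, 'Mumbai'), (28, 117, '2023-11-16', 180.75, 'Mumbai'),
])

# First find customers with > 3 November orders, then count them per city
rows = cur.execute("""
    SELECT city, COUNT(*) AS customer_count
    FROM (
        SELECT city, customer_id
        FROM orders
        WHERE order_date BETWEEN '2023-11-01' AND '2023-11-30'
        GROUP BY city, customer_id
        HAVING COUNT(order_id) > 3
    ) AS per_customer
    GROUP BY city
    ORDER BY city
""").fetchall()
print(rows)  # [('Delhi', 1), ('Mumbai', 2)]
```

The inner GROUP BY on (city, customer_id) applies the more-than-three filter per customer; the outer GROUP BY then counts qualifying customers per city, which is the two-level aggregation this question is really testing.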
• Q.562
Question
Count the delayed orders for each delivery partner based on predicted and actual delivery
times.
Explanation
The task is to count the number of delayed orders for each delivery partner. An order is
considered delayed if the actual delivery time is later than the predicted delivery time. We
need to:
• Compare the predicted_time with delivery_time.
• Filter orders where the actual delivery time is later than the predicted time.
• Group by del_partner to count the delayed orders for each partner.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE order_details (
order_id INT,
del_partner VARCHAR(255),
predicted_time TIMESTAMP,
delivery_time TIMESTAMP
);
• - Datasets
INSERT INTO order_details (order_id, del_partner, predicted_time, delivery_time) VALUES
(11, 'Partner C', '2024-02-29 11:30:00', '2024-02-29 12:00:00'),
(12, 'Partner A', '2024-02-29 10:45:00', '2024-02-29 11:30:00'),
(13, 'Partner B', '2024-02-29 09:00:00', '2024-02-29 09:45:00'),
(14, 'Partner A', '2024-02-29 12:15:00', '2024-02-29 13:00:00'),
(15, 'Partner C', '2024-02-29 13:30:00', '2024-02-29 14:15:00'),
(16, 'Partner B', '2024-02-29 14:45:00', '2024-02-29 15:30:00'),
(17, 'Partner A', '2024-02-29 16:00:00', '2024-02-29 16:45:00'),
(18, 'Partner B', '2024-02-29 17:15:00', '2024-02-29 18:00:00'),
(19, 'Partner C', '2024-02-29 18:30:00', '2024-02-29 19:15:00');

Learnings
• Use of WHERE clause for filtering data based on time comparison.
• GROUP BY to aggregate data by each delivery partner.
• Counting filtered results with COUNT().
Solutions
• - PostgreSQL solution
SELECT del_partner, COUNT(order_id) AS delayed_orders
FROM order_details
WHERE delivery_time > predicted_time
GROUP BY del_partner;
• - MySQL solution
SELECT del_partner, COUNT(order_id) AS delayed_orders
FROM order_details
WHERE delivery_time > predicted_time
GROUP BY del_partner;
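Because ISO-8601 timestamp strings compare correctly as text, the same predicate runs unchanged in SQLite; in this sample every order is late, so each partner shows three delayed orders (a verification sketch, not part of the book's solution):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE order_details (
    order_id INT, del_partner TEXT, predicted_time TEXT, delivery_time TEXT)""")
cur.executemany("INSERT INTO order_details VALUES (?, ?, ?, ?)", [
    (11, 'Partner C', '2024-02-29 11:30:00', '2024-02-29 12:00:00'),
    (12, 'Partner A', '2024-02-29 10:45:00', '2024-02-29 11:30:00'),
    (13, 'Partner B', '2024-02-29 09:00:00', '2024-02-29 09:45:00'),
    (14, 'Partner A', '2024-02-29 12:15:00', '2024-02-29 13:00:00'),
    (15, 'Partner C', '2024-02-29 13:30:00', '2024-02-29 14:15:00'),
    (16, 'Partner B', '2024-02-29 14:45:00', '2024-02-29 15:30:00'),
    (17, 'Partner A', '2024-02-29 16:00:00', '2024-02-29 16:45:00'),
    (18, 'Partner B', '2024-02-29 17:15:00', '2024-02-29 18:00:00'),
    (19, 'Partner C', '2024-02-29 18:30:00', '2024-02-29 19:15:00'),
])

# An order is delayed when the actual delivery time is after the predicted time
rows = cur.execute("""
    SELECT del_partner, COUNT(order_id) AS delayed_orders
    FROM order_details
    WHERE delivery_time > predicted_time
    GROUP BY del_partner
    ORDER BY del_partner
""").fetchall()
print(rows)  # every sample order is late: 3 per partner
```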
• Q.563
Question
Calculate the bad experience rate for new users who signed up in June 2022 during their first
14 days on the platform. The output should include the percentage of bad experiences,
rounded to 2 decimal places. A bad experience is defined as orders that were either completed
incorrectly, never received, or delivered late (i.e., delivery was more than 30 minutes later
than the estimated delivery time).
Explanation
To calculate the bad experience rate, follow these steps:
• Identify the customers who signed up in June 2022.
• Filter the orders placed by these customers within their first 14 days.
• Join the orders with the trips table to check for late deliveries (actual delivery time >
estimated delivery time by more than 30 minutes).
• Count the number of "bad experiences" (completed incorrectly, never received, or late
deliveries).
• Calculate the percentage of bad experiences based on the total number of orders within the
first 14 days.
Datasets and SQL Schemas
• - customers Table
CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY,
signup_timestamp TIMESTAMP
);
• - orders Table
CREATE TABLE orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER,
trip_id INTEGER,
status VARCHAR(255),
order_timestamp TIMESTAMP
);
• - trips Table
CREATE TABLE trips (
dasher_id INTEGER,
trip_id INTEGER PRIMARY KEY,
estimated_delivery_timestamp TIMESTAMP,
actual_delivery_timestamp TIMESTAMP
);
• - Datasets
INSERT INTO customers (customer_id, signup_timestamp) VALUES
(8472, '2022-05-30 00:00:00'),
(2341, '2022-06-01 00:00:00'),
(1314, '2022-06-03 00:00:00'),
(1435, '2022-06-05 00:00:00'),
(5421, '2022-06-07 00:00:00');
INSERT INTO orders (order_id, customer_id, trip_id, status, order_timestamp) VALUES
(727424, 8472, 100463, 'completed successfully', '2022-06-05 09:12:00'),
(242513, 2341, 100482, 'completed incorrectly', '2022-06-05 14:40:00'),
(141367, 1314, 100362, 'completed incorrectly', '2022-06-07 15:03:00'),
(582193, 5421, 100657, 'never_received', '2022-07-07 15:22:00'),
(253613, 1314, 100213, 'completed successfully', '2022-06-12 13:43:00');
INSERT INTO trips (dasher_id, trip_id, estimated_delivery_timestamp, actual_delivery_timestamp) VALUES
(101, 100463, '2022-06-05 09:42:00', '2022-06-05 09:38:00'),
(102, 100482, '2022-06-05 15:10:00', '2022-06-05 15:46:00'),
(101, 100362, '2022-06-07 15:33:00', '2022-06-07 16:45:00'),
(102, 100657, '2022-07-07 15:52:00', NULL),
(103, 100213, '2022-06-12 14:13:00', '2022-06-12 14:10:00');

Learnings
• Use of EXTRACT function to filter users by signup date.
• Filtering orders based on order timestamps to ensure they are within the first 14 days of
signup.
• Understanding the use of INTERVAL to calculate date ranges.
• Use of JOIN between multiple tables to aggregate data from different sources.
• Applying aggregation and conditional counting with COUNT() and WHERE clauses.
Solutions
• - PostgreSQL solution
WITH june22_cte AS (
SELECT
orders.order_id,
orders.trip_id,
orders.status
FROM customers
INNER JOIN orders
ON customers.customer_id = orders.customer_id
WHERE EXTRACT(MONTH FROM customers.signup_timestamp) = 6
AND EXTRACT(YEAR FROM customers.signup_timestamp) = 2022
AND orders.order_timestamp BETWEEN customers.signup_timestamp
AND customers.signup_timestamp + INTERVAL '14 DAYS'
)
SELECT
ROUND(
100.0 * COUNT(june22.order_id) / (SELECT COUNT(order_id) FROM june22_cte),
2) AS bad_experience_pct
FROM june22_cte AS june22
INNER JOIN trips
ON june22.trip_id = trips.trip_id
WHERE june22.status IN ('completed incorrectly', 'never_received')
OR trips.actual_delivery_timestamp > trips.estimated_delivery_timestamp + INTERVAL
'30 MINUTE';
• - MySQL solution
WITH june22_cte AS (
SELECT
orders.order_id,
orders.trip_id,
orders.status
FROM customers
INNER JOIN orders
ON customers.customer_id = orders.customer_id
WHERE MONTH(customers.signup_timestamp) = 6
AND YEAR(customers.signup_timestamp) = 2022
AND orders.order_timestamp BETWEEN customers.signup_timestamp
AND DATE_ADD(customers.signup_timestamp, INTERVAL 14 DAY)
)
SELECT
ROUND(
100.0 * COUNT(june22.order_id) / (SELECT COUNT(order_id) FROM june22_cte),
2) AS bad_experience_pct
FROM june22_cte AS june22
INNER JOIN trips
ON june22.trip_id = trips.trip_id
WHERE june22.status IN ('completed incorrectly', 'never_received')
OR trips.actual_delivery_timestamp > DATE_ADD(trips.estimated_delivery_timestamp,
INTERVAL 30 MINUTE);
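The date windows and the 30-minute lateness rule can be traced step by step in plain Python, which makes it easy to see why the sample data yields 66.67% (three qualifying orders, two of them bad); this is a hand-check of the same logic, not a replacement for the SQL:

```python
from datetime import datetime, timedelta

def ts(s):
    return datetime.strptime(s, '%Y-%m-%d %H:%M:%S')

customers = {8472: ts('2022-05-30 00:00:00'), 2341: ts('2022-06-01 00:00:00'),
             1314: ts('2022-06-03 00:00:00'), 1435: ts('2022-06-05 00:00:00'),
             5421: ts('2022-06-07 00:00:00')}
orders = [  # (order_id, customer_id, trip_id, status, order_timestamp)
    (727424, 8472, 100463, 'completed successfully', '2022-06-05 09:12:00'),
    (242513, 2341, 100482, 'completed incorrectly', '2022-06-05 14:40:00'),
    (141367, 1314, 100362, 'completed incorrectly', '2022-06-07 15:03:00'),
    (582193, 5421, 100657, 'never_received', '2022-07-07 15:22:00'),
    (253613, 1314, 100213, 'completed successfully', '2022-06-12 13:43:00'),
]
trips = {  # trip_id: (estimated_delivery, actual_delivery)
    100463: (ts('2022-06-05 09:42:00'), ts('2022-06-05 09:38:00')),
    100482: (ts('2022-06-05 15:10:00'), ts('2022-06-05 15:46:00')),
    100362: (ts('2022-06-07 15:33:00'), ts('2022-06-07 16:45:00')),
    100657: (ts('2022-07-07 15:52:00'), None),
    100213: (ts('2022-06-12 14:13:00'), ts('2022-06-12 14:10:00')),
}

# June-2022 signups, orders within the first 14 days of signup
cohort = [(tid, status) for _, cid, tid, status, otime in orders
          if customers[cid].year == 2022 and customers[cid].month == 6
          and customers[cid] <= ts(otime) <= customers[cid] + timedelta(days=14)]

def is_bad(tid, status):
    est, act = trips[tid]
    late = act is not None and act > est + timedelta(minutes=30)
    return status in ('completed incorrectly', 'never_received') or late

pct = round(100.0 * sum(is_bad(t, s) for t, s in cohort) / len(cohort), 2)
print(pct)  # 66.67
```

Order 582193 drops out of the cohort (placed a month after its customer's signup), and order 253613 arrived three minutes early, so only the two 'completed incorrectly' orders count as bad.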
• Q.564
Question
As a Data Analyst at Swiggy, compute the average delivery duration for each driver on each
day, the rank of each driver's daily average delivery duration, and the overall average
delivery duration per driver.
Explanation
The task requires calculating:
• The average delivery duration for each driver on a specific day.
• Ranking drivers by their daily average delivery duration.
• The overall average delivery duration for each driver across all their deliveries.
To achieve this, we need to:
• Use EXTRACT(EPOCH FROM ...) to calculate the delivery duration in minutes for each
delivery.
• Use window functions (RANK() and AVG()) to rank the drivers and compute the overall
average.
Datasets and SQL Schemas
• - deliveries Table
CREATE TABLE deliveries (
delivery_id INTEGER PRIMARY KEY,
driver_id INTEGER,
delivery_start_time TIMESTAMP,
delivery_end_time TIMESTAMP
);
• - Datasets
INSERT INTO deliveries (delivery_id, driver_id, delivery_start_time, delivery_end_time)
VALUES
(1, 123, '2022-08-01 14:00:00', '2022-08-01 14:40:00'),
(2, 123, '2022-08-01 15:15:00', '2022-08-01 16:10:00'),
(3, 265, '2022-08-01 14:00:00', '2022-08-01 15:30:00'),
(4, 265, '2022-08-01 16:00:00', '2022-08-01 16:50:00'),
(5, 123, '2022-08-02 11:00:00', '2022-08-02 11:35:00');

Learnings
• Use of EXTRACT(EPOCH FROM ...) to calculate delivery durations in minutes.
• Use of RANK() to assign ranks based on daily average delivery duration.
• Use of AVG() window function to calculate the overall average delivery duration for each
driver.
• Use of PARTITION BY to perform calculations within partitions of data, such as per driver
and per day.
Solutions
• - PostgreSQL solution
SELECT
driver_id,
day,
avg_delivery_duration,
RANK() OVER (PARTITION BY driver_id ORDER BY avg_delivery_duration) AS rank,
AVG(avg_delivery_duration) OVER (PARTITION BY driver_id) AS overall_avg_delivery_dur
ation
FROM
(SELECT
driver_id,
DATE(delivery_start_time) AS day,
AVG(EXTRACT(EPOCH FROM (delivery_end_time - delivery_start_time)) / 60)
OVER (PARTITION BY driver_id, DATE(delivery_start_time)) AS avg_delivery_duration
FROM
deliveries) subquery;
• - MySQL solution
SELECT
driver_id,
day,
avg_delivery_duration,
RANK() OVER (PARTITION BY driver_id ORDER BY avg_delivery_duration) AS `rank`, -- rank is a reserved word in MySQL 8+, so it must be quoted
AVG(avg_delivery_duration) OVER (PARTITION BY driver_id) AS overall_avg_delivery_dur
ation
FROM
(SELECT
driver_id,
DATE(delivery_start_time) AS day,
AVG(TIMESTAMPDIFF(SECOND, delivery_start_time, delivery_end_time) / 60)
OVER (PARTITION BY driver_id, DATE(delivery_start_time)) AS avg_delivery_duration
FROM
deliveries) subquery;
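A variant that aggregates with GROUP BY before applying the window functions avoids the duplicate rows the windowed subquery emits (one per delivery rather than one per day) and weights each day equally in the overall average. Sketched with SQLite from Python, using Unix-second differences for the durations:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE deliveries (
    delivery_id INTEGER PRIMARY KEY, driver_id INTEGER,
    delivery_start_time TEXT, delivery_end_time TEXT)""")
cur.executemany("INSERT INTO deliveries VALUES (?, ?, ?, ?)", [
    (1, 123, '2022-08-01 14:00:00', '2022-08-01 14:40:00'),
    (2, 123, '2022-08-01 15:15:00', '2022-08-01 16:10:00'),
    (3, 265, '2022-08-01 14:00:00', '2022-08-01 15:30:00'),
    (4, 265, '2022-08-01 16:00:00', '2022-08-01 16:50:00'),
    (5, 123, '2022-08-02 11:00:00', '2022-08-02 11:35:00'),
])

# Aggregate to one row per driver per day, then rank and average over those rows
rows = cur.execute("""
    WITH daily AS (
        SELECT driver_id,
               DATE(delivery_start_time) AS day,
               AVG((strftime('%s', delivery_end_time) - strftime('%s', delivery_start_time)) / 60.0)
                   AS avg_delivery_duration
        FROM deliveries
        GROUP BY driver_id, DATE(delivery_start_time)
    )
    SELECT driver_id, day, avg_delivery_duration,
           RANK() OVER (PARTITION BY driver_id ORDER BY avg_delivery_duration) AS day_rank,
           AVG(avg_delivery_duration) OVER (PARTITION BY driver_id) AS overall_avg
    FROM daily
    ORDER BY driver_id, day
""").fetchall()
print(rows)
```

Driver 123 averages 47.5 minutes on 1st August and 35 on 2nd August, for a day-weighted overall average of 41.25; the delivery-weighted version in the book's query would give 43.33 instead, which is the design choice to be aware of.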
• Q.565
Question
Identify the top 5 restaurants with the most orders in the last month from a database that
contains restaurants, users, and orders tables.

Explanation
To solve this, we need to:
• Join the orders table with the restaurants table on the restaurant_id.
• Filter the orders to only include those within the last month using NOW() - INTERVAL '1
month'.
• Group the results by the restaurant name to count the number of orders for each restaurant.
• Sort the results by the order count in descending order to identify the top 5 restaurants.
• Use LIMIT 5 to restrict the results to the top 5 restaurants.
Datasets and SQL Schemas
• - restaurants Table
CREATE TABLE restaurants (
restaurant_id VARCHAR(10) PRIMARY KEY,
restaurant_name VARCHAR(100)
);
• - users Table
CREATE TABLE users (
user_id INT PRIMARY KEY,
user_name VARCHAR(100)
);
• - orders Table
CREATE TABLE orders (
order_id INT PRIMARY KEY,
user_id INT,
restaurant_id VARCHAR(10),
order_date TIMESTAMP
);
• - Datasets
INSERT INTO restaurants (restaurant_id, restaurant_name) VALUES
('001', 'Burger King'),
('002', 'KFC'),
('003', 'McDonald''s'),
('004', 'Pizza Hut'),
('005', 'Starbucks');
INSERT INTO users (user_id, user_name) VALUES
(101, 'John Doe'),
(102, 'Jane Smith'),
(103, 'Bob Johnson'),
(104, 'Alice Anderson'),
(105, 'Emma Wilson');
INSERT INTO orders (order_id, user_id, restaurant_id, order_date) VALUES
(2001, 101, '001', '2022-10-01'),
(2002, 102, '002', '2022-10-02'),
(2003, 101, '003', '2022-10-03'),
(2004, 103, '002', '2022-10-04'),
(2005, 102, '001', '2022-10-05'),
(2006, 104, '004', '2022-10-06'),
(2007, 105, '005', '2022-10-07'),
(2008, 101, '001', '2022-10-08'),
(2009, 102, '002', '2022-10-09'),
(2010, 104, '005', '2022-10-10');

Learnings
• Use of JOIN to combine data from multiple tables based on a common key.
• Filtering data by date range with NOW() and INTERVAL.
• Grouping and aggregation using COUNT() to calculate the number of orders per restaurant.
• Sorting results with ORDER BY and limiting the output with LIMIT.
Solutions
• - PostgreSQL solution

SELECT r.restaurant_name, COUNT(o.order_id) AS order_count
FROM orders o
JOIN restaurants r ON o.restaurant_id = r.restaurant_id
WHERE o.order_date >= NOW() - INTERVAL '1 month'
GROUP BY r.restaurant_name
ORDER BY order_count DESC
LIMIT 5;
• - MySQL solution
SELECT r.restaurant_name, COUNT(o.order_id) AS order_count
FROM orders o
JOIN restaurants r ON o.restaurant_id = r.restaurant_id
WHERE o.order_date >= NOW() - INTERVAL 1 MONTH
GROUP BY r.restaurant_name
ORDER BY order_count DESC
LIMIT 5;
• Q.566
Question
Swap the seat ID of every two consecutive students in the Seat table. If the number of
students is odd, the last student's ID should not be swapped. Return the result table ordered
by id in ascending order.
Explanation
To solve this problem:
• You need to swap the seat id of every two consecutive students.
• If the number of students is odd, the last student should remain in their original seat.
• Ensure the results are returned in ascending order of id.
The solution can be approached by:
• Using window functions or row numbering to create pairs of consecutive students.
• Swapping the values of id between these pairs.
• Ensuring that if the number of students is odd, the last student remains in the same seat.
• Sorting the final result by id.
Datasets and SQL Schemas
• - Seat Table
CREATE TABLE Seat (
id INT PRIMARY KEY,
student VARCHAR(100)
);
• - Datasets
INSERT INTO Seat (id, student) VALUES
(1, 'Abbot'),
(2, 'Doris'),
(3, 'Emerson'),
(4, 'Green'),
(5, 'Jeames');

Learnings
• You can use ROW_NUMBER() or MOD() functions to handle alternating rows and determine
which rows to swap.
• Use CASE or conditional logic to handle the odd number of rows.
• Sorting by id ensures the final result is in ascending order.
Solutions
• - PostgreSQL/MySQL solution
WITH swapped_seats AS (
    SELECT
        id,
        student,
        ROW_NUMBER() OVER (ORDER BY id) AS row_num,
        COUNT(*) OVER () AS total_students
    FROM Seat
)
SELECT
    id,
    CASE
        WHEN row_num % 2 = 1 AND row_num = total_students THEN student
        WHEN row_num % 2 = 1 THEN LEAD(student) OVER (ORDER BY row_num)
        ELSE LAG(student) OVER (ORDER BY row_num)
    END AS student
FROM swapped_seats
ORDER BY id;

Explanation of the solution:
• ROW_NUMBER() assigns a sequential number to each row based on the id, and COUNT(*) OVER () records the total number of students.
• The CASE expression swaps neighbours: an odd-numbered row takes the next student's name via LEAD(), and an even-numbered row takes the previous student's name via LAG().
• If the total number of students is odd, the last row keeps its original student.
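The swap can be checked end to end. A minimal sketch on SQLite (window functions require SQLite 3.25+), pairing rows via `ROW_NUMBER()` so it also works when ids have gaps:

```python
# Hedged sketch: swap consecutive students with LEAD()/LAG(); the last row
# of an odd-length table keeps its original student.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Seat (id INT PRIMARY KEY, student TEXT);
INSERT INTO Seat (id, student) VALUES
(1, 'Abbot'), (2, 'Doris'), (3, 'Emerson'), (4, 'Green'), (5, 'Jeames');
""")
rows = conn.execute("""
WITH numbered AS (
    SELECT id, student,
           ROW_NUMBER() OVER (ORDER BY id) AS row_num,
           COUNT(*) OVER () AS total_students
    FROM Seat
)
SELECT id,
       CASE
           WHEN row_num % 2 = 1 AND row_num = total_students THEN student  -- odd total: last stays
           WHEN row_num % 2 = 1 THEN LEAD(student) OVER (ORDER BY row_num)
           ELSE LAG(student) OVER (ORDER BY row_num)
       END AS student
FROM numbered
ORDER BY id
""").fetchall()
print(rows)
```

With the sample data the pairs (Abbot, Doris) and (Emerson, Green) are swapped, while Jeames, the fifth student, stays put.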
• Q.567
Question
Find the name of the user who has rated the greatest number of movies. In case of a tie, return
the lexicographically smaller user name.
Explanation
To solve these problems:
• For the first task:
• We need to count how many movies each user has rated, and then identify the user who
has rated the greatest number of movies. If there is a tie, we return the lexicographically
smaller user name.
Datasets and SQL Schemas
• - Movies Table
CREATE TABLE Movies (
movie_id INT PRIMARY KEY,
title VARCHAR(100)
);
• - Users Table
CREATE TABLE Users (
user_id INT PRIMARY KEY,
name VARCHAR(100)
);
• - MovieRating Table
CREATE TABLE MovieRating (
movie_id INT,
user_id INT,
rating INT,
created_at DATE,
PRIMARY KEY (movie_id, user_id)
);
• - Example Data
-- Insert into Movies table
INSERT INTO Movies (movie_id, title) VALUES

(1, 'Inception'),
(2, 'Titanic'),
(3, 'Avatar'),
(4, 'The Dark Knight');

-- Insert into Users table


INSERT INTO Users (user_id, name) VALUES
(1, 'John'),
(2, 'Jane'),
(3, 'Alice'),
(4, 'Bob');

-- Insert into MovieRating table


INSERT INTO MovieRating (movie_id, user_id, rating, created_at) VALUES
(1, 1, 5, '2020-02-01'),
(1, 2, 4, '2020-02-05'),
(2, 3, 3, '2020-02-10'),
(3, 4, 5, '2020-02-15'),
(4, 1, 5, '2020-02-20'),
(2, 4, 4, '2020-02-25'),
(3, 2, 4, '2020-02-28');

Learnings
• Use aggregation (COUNT(), AVG()) to summarize data.
• Handle ties by sorting the result lexicographically (ORDER BY with LIMIT).
• Filtering for a specific date range (WHERE clause with date conditions).
Solutions
1. Find the user who has rated the greatest number of movies
SELECT u.name
FROM Users u
JOIN MovieRating mr ON u.user_id = mr.user_id
GROUP BY u.name
ORDER BY COUNT(mr.movie_id) DESC, u.name ASC
LIMIT 1;

Explanation:
• JOIN the Users table with the MovieRating table on user_id.
• GROUP BY the user_name to count the number of movies rated by each user.
• ORDER BY first by the count of ratings in descending order, then by the name
lexicographically in ascending order to break ties.
• Q.568
Question
Find the movie name with the highest average rating in February 2020. In case of a tie, return
the lexicographically smaller movie name.
Explanation
To solve these problems:
For the second task:
• We need to find the movie with the highest average rating in February 2020. If multiple
movies have the same average rating, we return the movie with the lexicographically smallest
name.
Datasets and SQL Schemas
• - Movies Table
CREATE TABLE Movies (
movie_id INT PRIMARY KEY,

title VARCHAR(100)
);
• - Users Table
CREATE TABLE Users (
user_id INT PRIMARY KEY,
name VARCHAR(100)
);
• - MovieRating Table
CREATE TABLE MovieRating (
movie_id INT,
user_id INT,
rating INT,
created_at DATE,
PRIMARY KEY (movie_id, user_id)
);
• - Example Data
-- Insert into Movies table
INSERT INTO Movies (movie_id, title) VALUES
(1, 'Inception'),
(2, 'Titanic'),
(3, 'Avatar'),
(4, 'The Dark Knight');

-- Insert into Users table


INSERT INTO Users (user_id, name) VALUES
(1, 'John'),
(2, 'Jane'),
(3, 'Alice'),
(4, 'Bob');

-- Insert into MovieRating table


INSERT INTO MovieRating (movie_id, user_id, rating, created_at) VALUES
(1, 1, 5, '2020-02-01'),
(1, 2, 4, '2020-02-05'),
(2, 3, 3, '2020-02-10'),
(3, 4, 5, '2020-02-15'),
(4, 1, 5, '2020-02-20'),
(2, 4, 4, '2020-02-25'),
(3, 2, 4, '2020-02-28');

Learnings
• Use aggregation (COUNT(), AVG()) to summarize data.
• Handle ties by sorting the result lexicographically (ORDER BY with LIMIT).
• Filtering for a specific date range (WHERE clause with date conditions).
Solutions
Find the movie with the highest average rating in February 2020
SELECT m.title
FROM Movies m
JOIN MovieRating mr ON m.movie_id = mr.movie_id
WHERE mr.created_at BETWEEN '2020-02-01' AND '2020-02-29'
GROUP BY m.title
ORDER BY AVG(mr.rating) DESC, m.title ASC
LIMIT 1;

Explanation:
• JOIN the Movies table with the MovieRating table on movie_id.
• WHERE filter the ratings to only include those created in February 2020 (from 2020-02-01
to 2020-02-29).
• GROUP BY movie title to calculate the average rating for each movie.
• ORDER BY first by the average rating in descending order, then by the movie title
lexicographically in ascending order to break ties.

• LIMIT 1 to return the movie with the highest average rating (and lexicographically
smallest title in case of a tie).
• Q.569
Question
Find the top 3 delivery partners who have made the highest number of deliveries in the last
30 days (consider today as ‘5th Feb 2024’). In case of a tie, return the delivery partners in
lexicographical order.
Explanation
• You need to count the number of deliveries made by each delivery partner in the last 30
days.
• Sort the result by the delivery count in descending order. In case of ties, order the delivery
partners lexicographically.
• Return only the top 3 delivery partners.
• Today’s date is given as ‘5th February 2024’ to ensure consistent results.
Datasets and SQL Schemas
• Deliveries Table
CREATE TABLE Deliveries (
delivery_id INT PRIMARY KEY,
delivery_partner VARCHAR(100),
delivery_date DATE
);
• Datasets
INSERT INTO Deliveries (delivery_id, delivery_partner, delivery_date) VALUES
(1, 'Partner A', '2024-01-01'),
(2, 'Partner B', '2024-01-02'),
(3, 'Partner A', '2024-01-05'),
(4, 'Partner C', '2024-01-10'),
(5, 'Partner B', '2024-01-12'),
(6, 'Partner A', '2024-01-15'),
(7, 'Partner C', '2024-01-20'),
(8, 'Partner B', '2024-01-25'),
(9, 'Partner D', '2024-10-29'),
(10, 'Partner A','2024-02-01');

Learnings
• Using COUNT() for aggregation.
• Sorting by both numerical and alphabetical order.
• Handling time-based filtering with DATE and INTERVAL.
• Using an explicit date for filtering the last 30 days.
Solution
SELECT delivery_partner, COUNT(delivery_id) AS delivery_count
FROM Deliveries
WHERE delivery_date >= '2024-02-05' - INTERVAL 30 DAY
  AND delivery_date <= '2024-02-05'  -- treat '2024-02-05' as today; ignore later-dated rows
GROUP BY delivery_partner
ORDER BY delivery_count DESC, delivery_partner ASC
LIMIT 3;

Explanation of Changes:
• Explicit Date Handling:
• In the original solution, CURDATE() was used, which automatically considers the current
date. However, since the question specifically asks to consider ‘5th February 2024’ as today's
date, I have explicitly used the date '2024-02-05' in the query.

• INTERVAL 30 DAY:
• We subtract 30 days from '2024-02-05' to get the date range for the last 30 days (i.e.,
from 2024-01-06 to 2024-02-05).
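The same filter-count-sort pipeline can be replayed on SQLite, where `DATE('2024-02-05', '-30 day')` is assumed as the stand-in for MySQL's `INTERVAL` arithmetic; an upper bound on "today" keeps the stray future-dated row (delivery 9, '2024-10-29') out of the count:

```python
# Hedged sketch: top 3 delivery partners in the 30 days up to 2024-02-05.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Deliveries (
    delivery_id INT PRIMARY KEY,
    delivery_partner TEXT,
    delivery_date TEXT
);
INSERT INTO Deliveries (delivery_id, delivery_partner, delivery_date) VALUES
(1, 'Partner A', '2024-01-01'), (2, 'Partner B', '2024-01-02'),
(3, 'Partner A', '2024-01-05'), (4, 'Partner C', '2024-01-10'),
(5, 'Partner B', '2024-01-12'), (6, 'Partner A', '2024-01-15'),
(7, 'Partner C', '2024-01-20'), (8, 'Partner B', '2024-01-25'),
(9, 'Partner D', '2024-10-29'), (10, 'Partner A', '2024-02-01');
""")
rows = conn.execute("""
SELECT delivery_partner, COUNT(delivery_id) AS delivery_count
FROM Deliveries
WHERE delivery_date >= DATE('2024-02-05', '-30 day')  -- 2024-01-06
  AND delivery_date <= '2024-02-05'                   -- exclude future-dated rows
GROUP BY delivery_partner
ORDER BY delivery_count DESC, delivery_partner ASC
LIMIT 3
""").fetchall()
print(rows)
```

Partners A, B, and C each have 2 deliveries inside the window, so the tie is broken lexicographically.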

• Q.570
Question
List all the restaurants that have received more than 10 orders in the month of December
2023.
Explanation
• You need to count how many orders each restaurant received in December 2023.
• Return only those restaurants that have more than 10 orders.
Datasets and SQL Schemas
• - Orders Table
CREATE TABLE Orders (
order_id INT PRIMARY KEY,
restaurant_id INT,
order_date DATE
);
• - Datasets
INSERT INTO Orders (order_id, restaurant_id, order_date) VALUES
(1, 101, '2023-12-01'),
(2, 101, '2023-12-03'),
(3, 102, '2023-12-04'),
(4, 102, '2023-12-05'),
(5, 101, '2023-12-06'),
(6, 103, '2023-12-07'),
(7, 101, '2023-12-08'),
(8, 102, '2023-12-09'),
(9, 101, '2023-12-10'),
(10, 104, '2023-12-11'),
(11, 101, '2023-12-12'),
(12, 101, '2023-12-13'),
(13, 102, '2023-12-14'),
(14, 103, '2023-12-15'),
(15, 101, '2023-12-16'),
(16, 104, '2023-12-17'),
(17, 101, '2023-12-18');

Learnings
• Counting rows with a GROUP BY.
• Filtering based on date conditions.
• Using HAVING for conditions after GROUP BY.
Solution
SELECT restaurant_id
FROM Orders
WHERE order_date BETWEEN '2023-12-01' AND '2023-12-31'
GROUP BY restaurant_id
HAVING COUNT(order_id) > 10;
• Q.571
Question
Calculate the average order value (price per order) for each restaurant for the month of
December 2023. In case a restaurant did not receive any orders in December, show the
restaurant's name as well, but with a NULL for the average order value.

Explanation
• You need to join the Orders and Restaurant tables to get the average order value for each
restaurant.
• If the restaurant did not receive any orders in December, return NULL for the average order
value.
• Handle restaurants with zero orders in December as well.
Datasets and SQL Schemas
• - Orders Table
CREATE TABLE Orders (
order_id INT PRIMARY KEY,
restaurant_id INT,
order_date DATE,
price DECIMAL(10,2)
);
• - Restaurants Table
CREATE TABLE Restaurants (
restaurant_id INT PRIMARY KEY,
restaurant_name VARCHAR(100)
);
• - Datasets
INSERT INTO Orders (order_id, restaurant_id, order_date, price) VALUES
(1, 101, '2023-12-01', 250.00),
(2, 101, '2023-12-03', 300.00),
(3, 102, '2023-12-04', 450.00),
(4, 101, '2023-12-06', 275.00),
(5, 103, '2023-12-07', 150.00),
(6, 101, '2023-12-08', 400.00),
(7, 104, '2023-12-09', 350.00),
(8, 101, '2023-12-10', 500.00),
(9, 102, '2023-12-15', 400.00),
(10, 101, '2023-12-16', 350.00),
(11, 101, '2023-12-18', 450.00);
• - Restaurants Table
INSERT INTO Restaurants (restaurant_id, restaurant_name) VALUES
(101, 'Burger King'),
(102, 'McDonalds'),
(103, 'Pizza Hut'),
(104, 'Starbucks');

Learnings
• Handling missing data (using LEFT JOIN to include all restaurants).
• Grouping and calculating averages with AVG().
• Using conditional aggregation with COALESCE() to handle NULL values.
Solution
SELECT r.restaurant_name,
AVG(o.price) AS avg_order_value
FROM Restaurants r
LEFT JOIN Orders o
ON r.restaurant_id = o.restaurant_id
AND o.order_date BETWEEN '2023-12-01' AND '2023-12-31'
GROUP BY r.restaurant_name;
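Because the date condition sits in the `ON` clause rather than a `WHERE`, restaurants without December orders survive the `LEFT JOIN` and `AVG()` returns NULL for them. A minimal SQLite sketch, with a hypothetical fifth restaurant ('Dominos', not in the book's dataset) added purely to exercise the NULL branch:

```python
# Hedged sketch: LEFT JOIN keeps order-less restaurants; AVG over no rows is NULL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (order_id INT PRIMARY KEY, restaurant_id INT, order_date TEXT, price REAL);
CREATE TABLE Restaurants (restaurant_id INT PRIMARY KEY, restaurant_name TEXT);
INSERT INTO Orders VALUES
(1, 101, '2023-12-01', 250), (2, 101, '2023-12-03', 300), (3, 102, '2023-12-04', 450),
(4, 101, '2023-12-06', 275), (5, 103, '2023-12-07', 150), (6, 101, '2023-12-08', 400),
(7, 104, '2023-12-09', 350), (8, 101, '2023-12-10', 500), (9, 102, '2023-12-15', 400),
(10, 101, '2023-12-16', 350), (11, 101, '2023-12-18', 450);
INSERT INTO Restaurants VALUES
(101, 'Burger King'), (102, 'McDonalds'), (103, 'Pizza Hut'), (104, 'Starbucks'),
(105, 'Dominos');  -- hypothetical: no December orders
""")
rows = dict(conn.execute("""
SELECT r.restaurant_name, AVG(o.price) AS avg_order_value
FROM Restaurants r
LEFT JOIN Orders o
       ON r.restaurant_id = o.restaurant_id
      AND o.order_date BETWEEN '2023-12-01' AND '2023-12-31'
GROUP BY r.restaurant_name
"""))
print(rows)
```

McDonalds averages (450 + 400) / 2 = 425, while Dominos comes back with NULL (Python `None`).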
• Q.572
Question
Find the number of orders placed by each customer, and categorize them into three groups:
• 'Frequent' (more than 10 orders),
• 'Regular' (between 5 and 10 orders),

• 'Occasional' (less than 5 orders).


Explanation
• You need to count the number of orders placed by each customer.
• Categorize each customer based on the number of orders they placed using a CASE
statement.
• Use GROUP BY to group the results by customer.
Datasets and SQL Schemas
• Orders Table
CREATE TABLE Orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE
);
• Datasets
INSERT INTO Orders (order_id, customer_id, order_date) VALUES
(1, 101, '2024-01-01'),
(2, 102, '2024-01-03'),
(3, 101, '2024-01-05'),
(4, 103, '2024-01-07'),
(5, 101, '2024-01-09'),
(6, 101, '2024-01-10'),
(7, 102, '2024-01-12'),
(8, 103, '2024-01-13'),
(9, 104, '2024-01-15'),
(10, 101, '2024-01-18');

Learnings
• Using COUNT() to aggregate data.
• Using CASE to categorize data based on conditions.
• Using GROUP BY for grouping by customer.
Solution
SELECT customer_id,
COUNT(order_id) AS total_orders,
CASE
WHEN COUNT(order_id) > 10 THEN 'Frequent'
WHEN COUNT(order_id) BETWEEN 5 AND 10 THEN 'Regular'
ELSE 'Occasional'
END AS order_category
FROM Orders
GROUP BY customer_id;
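The `CASE`-over-aggregate pattern runs unchanged on SQLite, so the categorization is easy to verify with the sample data:

```python
# Hedged sketch: bucket customers by order count with CASE over COUNT().
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (order_id INT PRIMARY KEY, customer_id INT, order_date TEXT);
INSERT INTO Orders VALUES
(1, 101, '2024-01-01'), (2, 102, '2024-01-03'), (3, 101, '2024-01-05'),
(4, 103, '2024-01-07'), (5, 101, '2024-01-09'), (6, 101, '2024-01-10'),
(7, 102, '2024-01-12'), (8, 103, '2024-01-13'), (9, 104, '2024-01-15'),
(10, 101, '2024-01-18');
""")
rows = dict(
    (cid, (n, cat)) for cid, n, cat in conn.execute("""
SELECT customer_id,
       COUNT(order_id) AS total_orders,
       CASE
           WHEN COUNT(order_id) > 10 THEN 'Frequent'
           WHEN COUNT(order_id) BETWEEN 5 AND 10 THEN 'Regular'
           ELSE 'Occasional'
       END AS order_category
FROM Orders
GROUP BY customer_id
""")
)
print(rows)
```

Customer 101 has 5 orders and lands in 'Regular'; the others have fewer than 5 and are 'Occasional'.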

These three questions cover different aspects of SQL:


• CASE Statements for categorizing data.
• Window Functions (RANK) to rank items based on a calculated field.
• GROUP BY to calculate aggregate values with additional complexity (percentages and
ordering).
• Q.573
Question
Find the average delivery time (in minutes) for each delivery partner. Additionally, rank the
delivery partners based on their average delivery time, with the fastest ranked first.
Explanation

• You need to calculate the average delivery time for each delivery partner using AVG() and
the TIMESTAMPDIFF function.
• Use RANK() as a window function to rank delivery partners based on their average delivery
time.
Datasets and SQL Schemas
• Deliveries Table
CREATE TABLE Deliveries (
delivery_id INT PRIMARY KEY,
delivery_partner VARCHAR(100),
delivery_start_time TIMESTAMP,
delivery_end_time TIMESTAMP
);
• Datasets
INSERT INTO Deliveries (delivery_id, delivery_partner, delivery_start_time, delivery_end
_time) VALUES
(1, 'Partner A', '2024-01-01 10:00:00', '2024-01-01 10:45:00'),
(2, 'Partner B', '2024-01-01 10:30:00', '2024-01-01 11:00:00'),
(3, 'Partner A', '2024-01-02 11:00:00', '2024-01-02 11:40:00'),
(4, 'Partner B', '2024-01-02 14:00:00', '2024-01-02 14:30:00'),
(5, 'Partner C', '2024-01-03 08:30:00', '2024-01-03 09:00:00'),
(6, 'Partner C', '2024-01-03 10:00:00', '2024-01-03 10:30:00');

Learnings
• Using AVG() to calculate the average delivery time.
• Using TIMESTAMPDIFF to calculate the delivery duration in minutes.
• Using RANK() to rank the delivery partners based on their performance.
Solution
-- RANK is a reserved word in MySQL 8, hence the alias delivery_rank.
SELECT delivery_partner,
       AVG(TIMESTAMPDIFF(MINUTE, delivery_start_time, delivery_end_time)) AS avg_delivery_time,
       RANK() OVER (ORDER BY AVG(TIMESTAMPDIFF(MINUTE, delivery_start_time, delivery_end_time))) AS delivery_rank
FROM Deliveries
GROUP BY delivery_partner;
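A minimal SQLite sketch of the same idea, where `strftime('%s', ...)` epoch seconds are assumed as the stand-in for MySQL's `TIMESTAMPDIFF(MINUTE, ...)`:

```python
# Hedged sketch: rank delivery partners by average delivery minutes;
# tied averages receive the same RANK.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Deliveries (
    delivery_id INT PRIMARY KEY,
    delivery_partner TEXT,
    delivery_start_time TEXT,
    delivery_end_time TEXT
);
INSERT INTO Deliveries VALUES
(1, 'Partner A', '2024-01-01 10:00:00', '2024-01-01 10:45:00'),
(2, 'Partner B', '2024-01-01 10:30:00', '2024-01-01 11:00:00'),
(3, 'Partner A', '2024-01-02 11:00:00', '2024-01-02 11:40:00'),
(4, 'Partner B', '2024-01-02 14:00:00', '2024-01-02 14:30:00'),
(5, 'Partner C', '2024-01-03 08:30:00', '2024-01-03 09:00:00'),
(6, 'Partner C', '2024-01-03 10:00:00', '2024-01-03 10:30:00');
""")
rows = conn.execute("""
SELECT delivery_partner,
       AVG((strftime('%s', delivery_end_time) - strftime('%s', delivery_start_time)) / 60.0)
           AS avg_delivery_time,
       RANK() OVER (
           ORDER BY AVG((strftime('%s', delivery_end_time) - strftime('%s', delivery_start_time)) / 60.0)
       ) AS delivery_rank
FROM Deliveries
GROUP BY delivery_partner
ORDER BY delivery_rank, delivery_partner
""").fetchall()
print(rows)
```

Partners B and C both average 30 minutes and share rank 1; Partner A averages 42.5 minutes and, because of the tie, gets rank 3.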
• Q.574
Question
Identify the top 5 customers with the highest total spending in the last 30 days, and calculate
their rank based on the total amount spent. Additionally, show the percentage of their total
spending relative to the entire spending of all customers in the last 30 days.
Explanation
• You need to calculate the total spending of each customer in the last 30 days.
• Rank customers based on the total amount spent.
• Calculate the percentage of each customer’s spending relative to the total spending of all
customers in the last 30 days.
Datasets and SQL Schemas
• Orders Table
CREATE TABLE Orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
total_amount DECIMAL(10, 2)
);
• Datasets

INSERT INTO Orders (order_id, customer_id, order_date, total_amount) VALUES
(1, 101, '2024-01-01', 100.00),
(2, 102, '2024-01-03', 150.00),
(3, 103, '2024-01-05', 200.00),
(4, 101, '2024-01-06', 50.00),
(5, 104, '2024-01-07', 250.00),
(6, 102, '2024-01-09', 80.00),
(7, 103, '2024-01-10', 300.00),
(8, 101, '2024-01-15', 120.00),
(9, 104, '2024-01-17', 180.00),
(10, 105, '2024-01-20', 200.00);

Learnings
• Using SUM() to calculate the total spending per customer.
• Using RANK() to rank the customers based on total spending.
• Calculating the percentage of each customer’s spending relative to the total spending of all
customers.
Solution
WITH TotalSpending AS (
SELECT customer_id,
SUM(total_amount) AS customer_spending
FROM Orders
WHERE order_date >= CURDATE() - INTERVAL 30 DAY
GROUP BY customer_id
),
TotalSpendingAll AS (
SELECT SUM(total_amount) AS total_spending
FROM Orders
WHERE order_date >= CURDATE() - INTERVAL 30 DAY
)

SELECT t.customer_id,
       t.customer_spending,
       RANK() OVER (ORDER BY t.customer_spending DESC) AS spending_rank,
       ROUND(100.0 * t.customer_spending / tsa.total_spending, 2) AS spending_percentage
FROM TotalSpending t
CROSS JOIN TotalSpendingAll tsa
ORDER BY t.customer_spending DESC
LIMIT 5;
• Q.575

Question
Compute the moving average of how much the customer paid in a seven-day window, where
the moving average is calculated for the current day and the 6 days before. The result should
be rounded to two decimal places and ordered by visited_on.

Explanation
You need to calculate a rolling average for each day over the past 7 days (current day + 6
previous days) for the amount column. This involves using window functions to compute the
moving average for each day and ensuring the result is ordered by visited_on.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Customer (
customer_id INT,
    name VARCHAR(100),
visited_on DATE,
amount INT,
PRIMARY KEY (customer_id, visited_on)
);

-- Insert sample data


INSERT INTO Customer (customer_id, name, visited_on, amount)
VALUES
(1, 'Jhon', '2019-01-01', 100),
(2, 'Daniel', '2019-01-02', 110),
(3, 'Jade', '2019-01-03', 120),
(4, 'Khaled', '2019-01-04', 130),
(5, 'Winston', '2019-01-05', 110),
(6, 'Elvis', '2019-01-06', 140),
(7, 'Anna', '2019-01-07', 150),
(8, 'Maria', '2019-01-08', 80),
(9, 'Jaze', '2019-01-09', 110),
(1, 'Jhon', '2019-01-10', 130),
(3, 'Jade', '2019-01-10', 150);

Learnings
• Using window functions like AVG() with OVER clause for moving averages.
• Understanding date ranges and rolling windows for aggregate calculations.
• How to use ORDER BY and date filters in SQL.

Solutions
PostgreSQL solution
-- After GROUP BY visited_on, the window functions must operate on the
-- aggregate SUM(amount) (the daily total), not on the bare column.
SELECT
    visited_on,
    SUM(SUM(amount)) OVER (ORDER BY visited_on ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS amount,
    ROUND(AVG(SUM(amount)) OVER (ORDER BY visited_on ROWS BETWEEN 6 PRECEDING AND CURRENT ROW), 2) AS average_amount
FROM
    Customer
GROUP BY
    visited_on
ORDER BY
    visited_on;

MySQL solution
-- Same query works in MySQL 8: the window runs over the grouped daily totals.
SELECT
    visited_on,
    SUM(SUM(amount)) OVER (ORDER BY visited_on ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS amount,
    ROUND(AVG(SUM(amount)) OVER (ORDER BY visited_on ROWS BETWEEN 6 PRECEDING AND CURRENT ROW), 2) AS average_amount
FROM
    Customer
GROUP BY
    visited_on
ORDER BY
    visited_on;
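The rolling-window mechanics can be checked on SQLite by first collapsing to daily totals in a CTE, then running `SUM`/`AVG` over `ROWS BETWEEN 6 PRECEDING AND CURRENT ROW` (in this sketch the early days simply average over fewer than 7 rows):

```python
# Hedged sketch: 7-day rolling sum and rolling average over daily totals.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    customer_id INT, name TEXT, visited_on TEXT, amount INT,
    PRIMARY KEY (customer_id, visited_on)
);
INSERT INTO Customer VALUES
(1, 'Jhon', '2019-01-01', 100), (2, 'Daniel', '2019-01-02', 110),
(3, 'Jade', '2019-01-03', 120), (4, 'Khaled', '2019-01-04', 130),
(5, 'Winston', '2019-01-05', 110), (6, 'Elvis', '2019-01-06', 140),
(7, 'Anna', '2019-01-07', 150), (8, 'Maria', '2019-01-08', 80),
(9, 'Jaze', '2019-01-09', 110), (1, 'Jhon', '2019-01-10', 130),
(3, 'Jade', '2019-01-10', 150);
""")
rows = {r[0]: (r[1], r[2]) for r in conn.execute("""
WITH daily AS (
    SELECT visited_on, SUM(amount) AS day_total
    FROM Customer
    GROUP BY visited_on
)
SELECT visited_on,
       SUM(day_total) OVER (ORDER BY visited_on ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS amount,
       ROUND(AVG(day_total) OVER (ORDER BY visited_on ROWS BETWEEN 6 PRECEDING AND CURRENT ROW), 2) AS average_amount
FROM daily
ORDER BY visited_on
""")}
print(rows)
```

On 2019-01-07 the window covers the first seven days (sum 860, average 122.86); on 2019-01-10 the two same-day visits contribute 280 to a window sum of 1000 (average 142.86).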
• Q.576
Question
Calculate the total amount of food wasted due to order cancellations for each day. The food
waste is equal to the total value of the canceled orders. Return the result ordered by
order_date in ascending order.

Explanation
You need to compute the total amount of food waste per day based on the canceled orders.
This involves summing up the amount of canceled orders for each order_date. Ensure the
result is ordered by the order_date in ascending order.

Datasets and SQL Schemas

-- Table creation
CREATE TABLE Orders (
order_id INT,
customer_id INT,
order_date DATE,
amount INT,
status VARCHAR(20),
PRIMARY KEY (order_id)
);

-- Insert sample data


INSERT INTO Orders (order_id, customer_id, order_date, amount, status)
VALUES
(1, 101, '2023-01-01', 50, 'completed'),
(2, 102, '2023-01-01', 40, 'canceled'),
(3, 103, '2023-01-02', 60, 'completed'),
(4, 104, '2023-01-02', 30, 'canceled'),
(5, 105, '2023-01-03', 70, 'completed'),
(6, 106, '2023-01-03', 20, 'canceled'),
(7, 107, '2023-01-04', 80, 'completed'),
(8, 108, '2023-01-04', 100, 'canceled'),
(9, 109, '2023-01-05', 90, 'completed'),
(10, 110, '2023-01-05', 50, 'canceled');

Learnings
• Using conditional aggregation (SUM()) with a WHERE clause to filter specific order statuses.
• Understanding how to aggregate data for each date.
• Ensuring data is properly filtered and ordered.

Solutions
PostgreSQL solution
SELECT
order_date,
SUM(amount) AS total_food_waste
FROM
Orders
WHERE
status = 'canceled'
GROUP BY
order_date
ORDER BY
order_date;

MySQL solution
SELECT
order_date,
SUM(amount) AS total_food_waste
FROM
Orders
WHERE
status = 'canceled'
GROUP BY
order_date
ORDER BY
order_date;

• Q.577
Customer Retention Analysis
Question

Calculate the customer retention rate for each month. The retention rate is defined as the
percentage of customers who made at least one order in a given month and also made an
order in the previous month.
Explanation
To calculate customer retention:
• Identify customers who placed orders in a given month.
• Identify customers who placed orders in the previous month.
• Calculate the percentage of customers who ordered in both the given month and the
previous month.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id VARCHAR(20),
order_date DATE,
total_amount DECIMAL(10, 2)
);

-- Insert sample data


INSERT INTO orders (order_id, customer_id, order_date, total_amount)
VALUES
(1, 'C001', '2023-01-10', 500),
(2, 'C002', '2023-01-15', 200),
(3, 'C003', '2023-02-05', 700),
(4, 'C001', '2023-02-15', 400),
(5, 'C004', '2023-02-18', 150),
(6, 'C003', '2023-03-01', 250),
(7, 'C005', '2023-03-15', 300),
(8, 'C001', '2023-03-20', 600),
(9, 'C002', '2023-03-25', 450);

Learnings
• Using EXTRACT(MONTH FROM date) to group by month.
• Using JOIN to find customers who ordered in two consecutive months.
• Calculating percentages and aggregating data.
Solutions
PostgreSQL solution
WITH Retention AS (
    SELECT
        EXTRACT(YEAR FROM order_date) AS order_year,
        EXTRACT(MONTH FROM order_date) AS order_month,
        customer_id
    FROM orders
    GROUP BY customer_id, EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
)
SELECT
    a.order_year,
    a.order_month,
    COUNT(DISTINCT a.customer_id) AS customers_ordered_in_month,
    COUNT(DISTINCT b.customer_id) AS customers_retained,
    ROUND((COUNT(DISTINCT b.customer_id)::DECIMAL / COUNT(DISTINCT a.customer_id)) * 100, 2) AS retention_rate
FROM
    Retention a
LEFT JOIN
    Retention b ON a.customer_id = b.customer_id
    -- b is the previous calendar month; the year*12+month arithmetic also
    -- handles the December-to-January boundary.
    AND (a.order_year * 12 + a.order_month) = (b.order_year * 12 + b.order_month) + 1
GROUP BY
    a.order_year, a.order_month
ORDER BY
    a.order_year, a.order_month;

MySQL solution
WITH Retention AS (
    SELECT
        YEAR(order_date) AS order_year,
        MONTH(order_date) AS order_month,
        customer_id
    FROM orders
    GROUP BY customer_id, YEAR(order_date), MONTH(order_date)
)
SELECT
    a.order_year,
    a.order_month,
    COUNT(DISTINCT a.customer_id) AS customers_ordered_in_month,
    COUNT(DISTINCT b.customer_id) AS customers_retained,
    ROUND((COUNT(DISTINCT b.customer_id) / COUNT(DISTINCT a.customer_id)) * 100, 2) AS retention_rate
FROM
    Retention a
LEFT JOIN
    Retention b ON a.customer_id = b.customer_id
    -- b is the previous calendar month; the year*12+month arithmetic also
    -- handles the December-to-January boundary.
    AND (a.order_year * 12 + a.order_month) = (b.order_year * 12 + b.order_month) + 1
GROUP BY
    a.order_year, a.order_month
ORDER BY
    a.order_year, a.order_month;
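A minimal SQLite sketch of the retention join, assuming `strftime('%Y', ...)` / `strftime('%m', ...)` as the stand-ins for `EXTRACT`/`YEAR`/`MONTH`; a single `year*12 + month` index makes "previous month" a plain `+ 1` comparison:

```python
# Hedged sketch: month-over-month retention from distinct (month, customer) pairs.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INT PRIMARY KEY, customer_id TEXT, order_date TEXT, total_amount REAL);
INSERT INTO orders VALUES
(1, 'C001', '2023-01-10', 500), (2, 'C002', '2023-01-15', 200),
(3, 'C003', '2023-02-05', 700), (4, 'C001', '2023-02-15', 400),
(5, 'C004', '2023-02-18', 150), (6, 'C003', '2023-03-01', 250),
(7, 'C005', '2023-03-15', 300), (8, 'C001', '2023-03-20', 600),
(9, 'C002', '2023-03-25', 450);
""")
rows = conn.execute("""
WITH Retention AS (
    SELECT DISTINCT
        CAST(strftime('%Y', order_date) AS INT) * 12
          + CAST(strftime('%m', order_date) AS INT) AS month_index,
        strftime('%Y-%m', order_date) AS ym,
        customer_id
    FROM orders
)
SELECT a.ym,
       COUNT(DISTINCT a.customer_id) AS customers_ordered_in_month,
       COUNT(DISTINCT b.customer_id) AS customers_retained,
       ROUND(100.0 * COUNT(DISTINCT b.customer_id) / COUNT(DISTINCT a.customer_id), 2) AS retention_rate
FROM Retention a
LEFT JOIN Retention b
       ON a.customer_id = b.customer_id
      AND a.month_index = b.month_index + 1
GROUP BY a.ym
ORDER BY a.ym
""").fetchall()
print(rows)
```

February retains 1 of 3 active customers (C001, 33.33%); March retains 2 of 4 (C001 and C003, 50%).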
• Q.578
Top-selling Products in Each Region
Question
Identify the top 3 selling products in each region for the last quarter of 2023 (October,
November, December). The total sales are calculated by summing up the total_amount for
each product. Return the region, product_id, and total sales for the top 3 products in each
region, ordered by sales in descending order.
Explanation
• Filter orders from the last quarter (October, November, December) of 2023.
• Sum the total_amount for each product_id by region.
• Rank the products within each region by total sales.
• Select the top 3 products per region.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id VARCHAR(20),
product_id VARCHAR(20),
total_amount DECIMAL(10, 2),
order_date DATE,
region VARCHAR(50)
);

-- Insert sample data


INSERT INTO orders (order_id, customer_id, product_id, total_amount, order_date, region)
VALUES
(1, 'C001', 'P001', 500, '2023-10-05', 'North'),
(2, 'C002', 'P002', 200, '2023-10-15', 'North'),
(3, 'C003', 'P001', 300, '2023-11-10', 'South'),
(4, 'C004', 'P003', 700, '2023-11-18', 'North'),
(5, 'C001', 'P002', 400, '2023-12-01', 'West'),
(6, 'C002', 'P003', 250, '2023-12-10', 'South'),
(7, 'C003', 'P001', 600, '2023-12-20', 'North'),
(8, 'C004', 'P004', 150, '2023-12-22', 'West'),
(9, 'C001', 'P005', 350, '2023-12-25', 'South');

Learnings
• Using SUM() for aggregating sales.
• Filtering based on a specific quarter and using EXTRACT(MONTH FROM date) for date
filtering.
• Using RANK() or ROW_NUMBER() to rank products by sales within each region.
Solutions
PostgreSQL solution
WITH ProductSales AS (
SELECT
region,
product_id,
SUM(total_amount) AS total_sales
FROM orders
WHERE order_date BETWEEN '2023-10-01' AND '2023-12-31'
GROUP BY region, product_id
),
RankedProducts AS (
SELECT
region,
product_id,
total_sales,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_sales DESC) AS rank
FROM ProductSales
)
SELECT
region,
product_id,
total_sales
FROM RankedProducts
WHERE rank <= 3
ORDER BY region, total_sales DESC;

MySQL solution
-- MySQL 8.0+ supports window functions, which replace the old
-- session-variable (@rank) ranking trick; RANK is a reserved word there,
-- hence the alias product_rank.
WITH ProductSales AS (
    SELECT
        region,
        product_id,
        SUM(total_amount) AS total_sales
    FROM orders
    WHERE order_date BETWEEN '2023-10-01' AND '2023-12-31'
    GROUP BY region, product_id
),
RankedProducts AS (
    SELECT
        region,
        product_id,
        total_sales,
        ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_sales DESC) AS product_rank
    FROM ProductSales
)
SELECT
    region,
    product_id,
    total_sales
FROM RankedProducts
WHERE product_rank <= 3
ORDER BY region, total_sales DESC;
• Q.579

Question
Calculate the year-over-year growth rate of the amount spent by each customer. The growth
rate is calculated as the difference in the total amount spent between two consecutive years
divided by the amount spent in the earlier year.

Explanation
To calculate the year-over-year growth rate:
• First, aggregate the total amount spent by each customer for each year.
• Then, calculate the difference in the total amount spent between two consecutive years for
each customer.
• Finally, compute the growth rate as the difference divided by the amount spent in the
earlier year.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE customer_purchases (
Customer_id VARCHAR(20),
    Product_id VARCHAR(20),
Purchase_amount DECIMAL(10,2),
Purchase_date DATE
);

-- Insert sample data


INSERT INTO customer_purchases (Customer_id, Product_id, Purchase_amount, Purchase_date)
VALUES
('C001', 'P001', 100.00, '2023-01-15'),
('C001', 'P002', 150.00, '2023-05-20'),
('C001', 'P003', 200.00, '2022-06-10'),
('C001', 'P004', 250.00, '2022-09-01'),
('C002', 'P001', 300.00, '2022-02-05'),
('C002', 'P003', 400.00, '2022-07-20'),
('C002', 'P004', 500.00, '2023-03-15');

Learnings
• Aggregating data by year using YEAR() function.
• Calculating the year-over-year growth rate.
• Using JOIN to align data from consecutive years.

Solutions
PostgreSQL solution
WITH YearlySpend AS (
SELECT
Customer_id,
EXTRACT(YEAR FROM Purchase_date) AS Purchase_year,
SUM(Purchase_amount) AS total_spent
FROM
customer_purchases
GROUP BY
Customer_id, EXTRACT(YEAR FROM Purchase_date)
)
SELECT
a.Customer_id,
a.Purchase_year AS year,
ROUND(((a.total_spent - b.total_spent) / b.total_spent) * 100, 2) AS yoy_growth_rate
FROM
YearlySpend a
JOIN
    YearlySpend b ON a.Customer_id = b.Customer_id
    AND a.Purchase_year = b.Purchase_year + 1
ORDER BY
a.Customer_id, a.Purchase_year;

MySQL solution
WITH YearlySpend AS (
SELECT
Customer_id,
YEAR(Purchase_date) AS Purchase_year,
SUM(Purchase_amount) AS total_spent
FROM
customer_purchases
GROUP BY
Customer_id, YEAR(Purchase_date)
)
SELECT
a.Customer_id,
a.Purchase_year AS year,
ROUND(((a.total_spent - b.total_spent) / b.total_spent) * 100, 2) AS yoy_growth_rate
FROM
YearlySpend a
JOIN
YearlySpend b ON a.Customer_id = b.Customer_id
AND a.Purchase_year = b.Purchase_year + 1
ORDER BY
a.Customer_id, a.Purchase_year;
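The self-join on consecutive years can be replayed on SQLite, with `strftime('%Y', ...)` assumed as the stand-in for `YEAR()`/`EXTRACT(YEAR ...)`:

```python
# Hedged sketch: year-over-year growth = (this year - last year) / last year * 100.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_purchases (
    Customer_id TEXT, Product_id TEXT, Purchase_amount REAL, Purchase_date TEXT
);
INSERT INTO customer_purchases VALUES
('C001', 'P001', 100.00, '2023-01-15'), ('C001', 'P002', 150.00, '2023-05-20'),
('C001', 'P003', 200.00, '2022-06-10'), ('C001', 'P004', 250.00, '2022-09-01'),
('C002', 'P001', 300.00, '2022-02-05'), ('C002', 'P003', 400.00, '2022-07-20'),
('C002', 'P004', 500.00, '2023-03-15');
""")
rows = conn.execute("""
WITH YearlySpend AS (
    SELECT Customer_id,
           CAST(strftime('%Y', Purchase_date) AS INT) AS Purchase_year,
           SUM(Purchase_amount) AS total_spent
    FROM customer_purchases
    GROUP BY Customer_id, Purchase_year
)
SELECT a.Customer_id,
       a.Purchase_year AS year,
       ROUND((a.total_spent - b.total_spent) / b.total_spent * 100, 2) AS yoy_growth_rate
FROM YearlySpend a
JOIN YearlySpend b
  ON a.Customer_id = b.Customer_id
 AND a.Purchase_year = b.Purchase_year + 1
ORDER BY a.Customer_id, a.Purchase_year
""").fetchall()
print(rows)
```

C001 drops from 450 (2022) to 250 (2023), a -44.44% change; C002 drops from 700 to 500, -28.57%.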
• Q.580
Question
Select the Supplier_id, Product_id, and the start date of the period when the stock quantity
was below 50 units for more than two consecutive days.

Explanation
To solve this:
• We need to identify periods where the Stock_quantity was below 50 for more than two
consecutive days.
• For this, we can use a window function or self-join to check if a product's stock remained
below 50 units for at least three consecutive days.
• Return the Supplier_id, Product_id, and the start date of that period.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE supplier_inventory (
Supplier_id VARCHAR(20),
Product_id VARCHAR(20),
Stock_quantity INT,
Record_date DATE
);

-- Insert sample data


INSERT INTO supplier_inventory (Supplier_id, Product_id, Stock_quantity, Record_date)
VALUES
('S001', 'P001', 40, '2023-01-01'),
('S001', 'P001', 45, '2023-01-02'),
('S001', 'P001', 30, '2023-01-03'),
('S001', 'P001', 20, '2023-01-04'),
('S001', 'P001', 10, '2023-01-05'),
('S001', 'P002', 55, '2023-01-01'),
('S001', 'P002', 50, '2023-01-02'),
('S001', 'P002', 60, '2023-01-03'),
('S002', 'P003', 40, '2023-01-01'),
('S002', 'P003', 30, '2023-01-02'),
('S002', 'P003', 25, '2023-01-03');


Learnings
• Using self-joins or window functions to identify consecutive dates.
• Filtering on Stock_quantity values that are below 50 for more than two consecutive
days.
• Handling date sequences for consecutive periods.

Solutions
PostgreSQL solution
WITH ConsecutiveDays AS (
    SELECT
        Supplier_id,
        Product_id,
        Record_date,
        Stock_quantity,
        LAG(Record_date) OVER (PARTITION BY Supplier_id, Product_id ORDER BY Record_date) AS prev_date,
        LEAD(Record_date) OVER (PARTITION BY Supplier_id, Product_id ORDER BY Record_date) AS next_date,
        LAG(Stock_quantity) OVER (PARTITION BY Supplier_id, Product_id ORDER BY Record_date) AS prev_stock,
        LEAD(Stock_quantity) OVER (PARTITION BY Supplier_id, Product_id ORDER BY Record_date) AS next_stock
    FROM
        supplier_inventory
)
SELECT
    Supplier_id,
    Product_id,
    MIN(prev_date) AS start_date
FROM
    ConsecutiveDays
WHERE
    Stock_quantity < 50
    AND prev_stock < 50
    AND next_stock < 50
    -- In PostgreSQL, subtracting two DATE values yields an integer number of days
    AND Record_date - prev_date = 1
    AND next_date - Record_date = 1
GROUP BY
    Supplier_id, Product_id
ORDER BY
    Supplier_id, Product_id, start_date;

MySQL solution
WITH ConsecutiveDays AS (
    SELECT
        Supplier_id,
        Product_id,
        Record_date,
        Stock_quantity,
        LAG(Record_date) OVER (PARTITION BY Supplier_id, Product_id ORDER BY Record_date) AS prev_date,
        LEAD(Record_date) OVER (PARTITION BY Supplier_id, Product_id ORDER BY Record_date) AS next_date,
        LAG(Stock_quantity) OVER (PARTITION BY Supplier_id, Product_id ORDER BY Record_date) AS prev_stock,
        LEAD(Stock_quantity) OVER (PARTITION BY Supplier_id, Product_id ORDER BY Record_date) AS next_stock
    FROM
        supplier_inventory
)
SELECT
    Supplier_id,
    Product_id,
    MIN(prev_date) AS start_date
FROM
    ConsecutiveDays
WHERE
    Stock_quantity < 50
    AND prev_stock < 50
    AND next_stock < 50
    AND DATEDIFF(Record_date, prev_date) = 1
    AND DATEDIFF(next_date, Record_date) = 1
GROUP BY
    Supplier_id, Product_id
ORDER BY
    Supplier_id, Product_id, start_date;

Key Concepts
• LAG() and LEAD(): These window functions help identify the previous and next dates to
detect consecutive days.
• MIN(): Used to find the start date of the qualifying period.
• DATEDIFF(): Used in MySQL to calculate the difference between two dates to check if the
dates are consecutive.
• Consecutive period logic: Ensure the gap between the current date and the previous and
next dates is exactly one day to confirm the consecutive sequence.
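The consecutive-days logic can also be checked procedurally. Below is a short Python sketch (my own, not from the book) that scans date-ordered stock records and reports the start of every run of at least three consecutive below-threshold days.

```python
from datetime import date, timedelta

def low_stock_periods(records, threshold=50, min_days=3):
    """Return start dates of runs where stock stayed below `threshold`
    for at least `min_days` consecutive calendar days."""
    starts, run_start, prev_day = [], None, None
    for day, qty in sorted(records):
        consecutive = prev_day is not None and day - prev_day == timedelta(days=1)
        if qty < threshold:
            if run_start is None or not consecutive:
                run_start = day  # a new run begins here
        else:
            run_start = None  # run broken by healthy stock
        if run_start and (day - run_start).days + 1 >= min_days and run_start not in starts:
            starts.append(run_start)
        prev_day = day
    return starts

p001 = [(date(2023, 1, d), q) for d, q in [(1, 40), (2, 45), (3, 30), (4, 20), (5, 10)]]
print(low_stock_periods(p001))  # [datetime.date(2023, 1, 1)]
```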

Tesla
• Q.581
Question
Write a query to identify the average charging duration at different Tesla charging stations,
and rank the stations by the duration of charging time for each model of Tesla vehicle.

Explanation
To solve this:
• Group the data by charging_station_id and vehicle_model.
• Calculate the average charging duration for each charging station and model combination.
• Rank the stations for each model based on the average charging duration.
• Return the charging_station_id, vehicle_model, average charging duration, and rank.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE tesla_charging_data (
charging_station_id VARCHAR(20),
vehicle_model VARCHAR(50),
charging_start_time DATETIME, -- use TIMESTAMP in PostgreSQL
charging_end_time DATETIME -- use TIMESTAMP in PostgreSQL
);

-- Insert sample data


INSERT INTO tesla_charging_data (charging_station_id, vehicle_model, charging_start_time, charging_end_time)
VALUES
('S001', 'Model S', '2023-01-01 08:00:00', '2023-01-01 08:30:00'),
('S001', 'Model 3', '2023-01-01 09:00:00', '2023-01-01 09:40:00'),
('S002', 'Model X', '2023-01-01 10:00:00', '2023-01-01 10:50:00'),
('S002', 'Model S', '2023-01-01 11:00:00', '2023-01-01 11:25:00'),
('S003', 'Model 3', '2023-01-01 12:00:00', '2023-01-01 12:30:00'),
('S003', 'Model Y', '2023-01-01 13:00:00', '2023-01-01 13:45:00'),
('S001', 'Model 3', '2023-01-02 08:00:00', '2023-01-02 08:30:00'),
('S002', 'Model X', '2023-01-02 10:00:00', '2023-01-02 10:55:00');

Learnings
• Use of TIMESTAMPDIFF() or EXTRACT() to calculate charging duration.
• Grouping by charging_station_id and vehicle_model for aggregation.
• Using RANK() to rank stations by the charging duration.
• Handling date and time calculations for durations.


Solutions
PostgreSQL solution
WITH ChargingDurations AS (
SELECT
charging_station_id,
vehicle_model,
        EXTRACT(EPOCH FROM (charging_end_time - charging_start_time)) / 60 AS charging_duration_minutes
FROM tesla_charging_data
),
AverageChargingDurations AS (
SELECT
charging_station_id,
vehicle_model,
AVG(charging_duration_minutes) AS avg_charging_duration
FROM ChargingDurations
GROUP BY charging_station_id, vehicle_model
)
SELECT
charging_station_id,
vehicle_model,
avg_charging_duration,
    RANK() OVER (PARTITION BY vehicle_model ORDER BY avg_charging_duration DESC) AS station_rank
FROM AverageChargingDurations
ORDER BY vehicle_model, station_rank;

MySQL solution
WITH ChargingDurations AS (
SELECT
charging_station_id,
vehicle_model,
        TIMESTAMPDIFF(MINUTE, charging_start_time, charging_end_time) AS charging_duration_minutes
FROM tesla_charging_data
),
AverageChargingDurations AS (
SELECT
charging_station_id,
vehicle_model,
AVG(charging_duration_minutes) AS avg_charging_duration
FROM ChargingDurations
GROUP BY charging_station_id, vehicle_model
)
SELECT
charging_station_id,
vehicle_model,
avg_charging_duration,
    RANK() OVER (PARTITION BY vehicle_model ORDER BY avg_charging_duration DESC) AS station_rank
FROM AverageChargingDurations
ORDER BY vehicle_model, station_rank;

Key Concepts
• Charging Duration: Calculated as the difference between charging_end_time and
charging_start_time, expressed in minutes.
• AVG(): Aggregates the total charging time for each station and model combination.
• RANK(): Ranks the stations within each model category based on the average charging
duration in descending order.
• Grouping: Use of GROUP BY to aggregate data by charging_station_id and
vehicle_model.
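To make the aggregate-then-rank pattern concrete, here is a small Python sketch of the same two steps (average per station/model, then rank stations within each model). The sessions below are hypothetical, not the book's dataset.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"
# Hypothetical sessions: (station, model, start, end)
sessions = [
    ("S001", "Model S", "2023-01-01 08:00", "2023-01-01 08:30"),
    ("S002", "Model S", "2023-01-01 11:00", "2023-01-01 11:25"),
    ("S001", "Model 3", "2023-01-01 09:00", "2023-01-01 09:40"),
]

# Step 1: duration in minutes per (station, model), then the average.
durations = defaultdict(list)
for station, model, start, end in sessions:
    minutes = (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds() / 60
    durations[(station, model)].append(minutes)
avg_duration = {key: sum(v) / len(v) for key, v in durations.items()}

# Step 2: RANK() analogue -- per model, sort stations by average duration, descending.
ranking = defaultdict(list)
for (station, model), avg in avg_duration.items():
    ranking[model].append((station, avg))
for model in ranking:
    ranking[model].sort(key=lambda pair: -pair[1])

print(ranking["Model S"])  # [('S001', 30.0), ('S002', 25.0)]
```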
• Q.582
Question


Calculate the sum of tiv_2016 for all policyholders who have the same tiv_2015 value as
one or more other policyholders, and are not located in the same city as any other
policyholder (i.e., the (lat, lon) pairs must be unique). Round the tiv_2016 values to two
decimal places.

Explanation
• First, identify the policyholders who have the same tiv_2015 value as one or more other
policyholders.
• Then, ensure that the policyholders are located in unique cities by ensuring that (lat,
lon) pairs are not repeated.
• Finally, sum the tiv_2016 values for those policyholders and round the result to two
decimal places.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Insurance (
pid INT PRIMARY KEY,
tiv_2015 FLOAT,
tiv_2016 FLOAT,
lat FLOAT,
lon FLOAT
);

-- Insert sample data


INSERT INTO Insurance (pid, tiv_2015, tiv_2016, lat, lon)
VALUES
(1, 10, 5, 10, 10),
(2, 20, 20, 20, 20),
(3, 10, 30, 20, 20),
(4, 10, 40, 40, 40);

Learnings
• Use of JOIN or GROUP BY to identify common values (tiv_2015).
• Use of HAVING or filtering conditions to ensure unique cities based on (lat, lon).
• Conditional aggregation with SUM() and rounding in SQL.
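Because this query uses only portable constructs, it can be verified end-to-end with Python's built-in sqlite3 module (SQLite supports the row-value `(lat, lon) IN (...)` syntax from version 3.15 on). This is an optional cross-check, not part of the book's solutions; the expected answer for the sample data is 45.00 (policyholders 1 and 4).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Insurance (pid INT PRIMARY KEY, tiv_2015 REAL, tiv_2016 REAL, lat REAL, lon REAL);
INSERT INTO Insurance VALUES
  (1, 10, 5, 10, 10), (2, 20, 20, 20, 20), (3, 10, 30, 20, 20), (4, 10, 40, 40, 40);
""")
result = conn.execute("""
SELECT ROUND(SUM(tiv_2016), 2)
FROM Insurance
WHERE tiv_2015 IN (SELECT tiv_2015 FROM Insurance GROUP BY tiv_2015 HAVING COUNT(pid) > 1)
  AND (lat, lon) IN (SELECT lat, lon FROM Insurance GROUP BY lat, lon HAVING COUNT(pid) = 1)
""").fetchone()[0]
print(result)  # 45.0 -- policyholders 1 and 4 qualify
```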

Solutions
PostgreSQL solution
SELECT
    ROUND(SUM(tiv_2016)::numeric, 2) AS tiv_2016 -- cast needed: ROUND(float, int) is not defined in PostgreSQL
FROM
Insurance i
WHERE
tiv_2015 IN (
SELECT tiv_2015
FROM Insurance
GROUP BY tiv_2015
HAVING COUNT(pid) > 1
)
AND (lat, lon) IN (
SELECT lat, lon
FROM Insurance
GROUP BY lat, lon
HAVING COUNT(pid) = 1
);

MySQL solution
SELECT
    ROUND(SUM(tiv_2016), 2) AS tiv_2016
FROM
Insurance i
WHERE
tiv_2015 IN (
SELECT tiv_2015
FROM Insurance
GROUP BY tiv_2015
HAVING COUNT(pid) > 1
)
AND (lat, lon) IN (
SELECT lat, lon
FROM Insurance
GROUP BY lat, lon
HAVING COUNT(pid) = 1
);

• Q.583
Question
Given two tables, one for Tesla vehicle production (with columns model_id,
production_date) and one for vehicle deliveries (with columns delivery_id, model_id,
delivery_date), write a query to calculate the average time it takes from production to
delivery for each vehicle model.

Explanation
To solve this:
• We need to join the production table with the deliveries table on model_id.
• For each vehicle model, calculate the difference between the delivery_date and
production_date.
• Calculate the average time between production and delivery for each model.
• Return the model ID and the average delivery time (in days).

Datasets and SQL Schemas


-- Table creation for production
CREATE TABLE vehicle_production (
model_id VARCHAR(50),
production_date DATE
);

-- Table creation for deliveries


CREATE TABLE vehicle_deliveries (
delivery_id INT PRIMARY KEY,
model_id VARCHAR(50),
delivery_date DATE
);

-- Insert sample data for production


INSERT INTO vehicle_production (model_id, production_date)
VALUES
('Model S', '2023-01-01'),
('Model 3', '2023-02-15'),
('Model X', '2023-03-10'),
('Model Y', '2023-04-01');

-- Insert sample data for deliveries


INSERT INTO vehicle_deliveries (delivery_id, model_id, delivery_date)
VALUES
(1, 'Model S', '2023-01-10'),
(2, 'Model 3', '2023-02-20'),
(3, 'Model X', '2023-03-15'),
(4, 'Model Y', '2023-04-10'),
(5, 'Model S', '2023-01-20'),
(6, 'Model 3', '2023-02-25');

Learnings
• Joining tables: The query requires a JOIN between two tables using the model_id as the
common key.
• Date difference calculation: We calculate the difference between delivery_date and
production_date using date functions.
• Aggregation: Using AVG() to calculate the average time between production and delivery.
• Grouping: Group the results by model_id to calculate the average time per vehicle model.

Solutions
PostgreSQL solution
SELECT
p.model_id,
    ROUND(AVG(d.delivery_date - p.production_date), 2) AS avg_delivery_time_days -- DATE - DATE yields days directly in PostgreSQL
FROM
vehicle_production p
JOIN
vehicle_deliveries d
ON
p.model_id = d.model_id
GROUP BY
p.model_id
ORDER BY
p.model_id;

MySQL solution
SELECT
p.model_id,
    ROUND(AVG(DATEDIFF(d.delivery_date, p.production_date)), 2) AS avg_delivery_time_days
FROM
vehicle_production p
JOIN
vehicle_deliveries d
ON
p.model_id = d.model_id
GROUP BY
p.model_id
ORDER BY
p.model_id;

Key Concepts
• JOIN: Used to combine data from the vehicle_production and vehicle_deliveries
tables.
• Date difference: In PostgreSQL, subtracting two DATE values yields the number of days directly, and in MySQL, we use DATEDIFF().
• AVG(): Aggregates the delivery times for each vehicle model to calculate the average.
• ROUND(): Rounds the result to two decimal places for readability.
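The production-to-delivery averaging can be mirrored in a few lines of Python. This is a hypothetical sketch using a subset of the sample data (one production date per model, as in the table above).

```python
from datetime import date

# Production date per model and (model, delivery date) pairs from the sample data.
production = {"Model S": date(2023, 1, 1), "Model 3": date(2023, 2, 15)}
deliveries = [
    ("Model S", date(2023, 1, 10)), ("Model S", date(2023, 1, 20)),
    ("Model 3", date(2023, 2, 20)), ("Model 3", date(2023, 2, 25)),
]

# Join on model, compute the gap in days, then average per model.
gaps = {}
for model, delivered in deliveries:
    gaps.setdefault(model, []).append((delivered - production[model]).days)
avg_days = {model: round(sum(d) / len(d), 2) for model, d in gaps.items()}

print(avg_days)  # {'Model S': 14.0, 'Model 3': 7.5}
```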
• Q.584
Question
Write a query to calculate the month-on-month sales growth percentage for a specific Tesla
model, and display the sales figures, revenue, and growth percentages over the last 12
months.


Explanation
To solve this:
• Filter the sales data for the specific Tesla model.
• Group the data by month, calculate the total number of units sold (sales) and the total
revenue (revenue) for each month.
• Calculate the month-on-month growth in sales and revenue.
• Display the sales, revenue, and growth percentage for each of the last 12 months.
Steps:
• Use GROUP BY to aggregate sales and revenue for each month.
• Use LAG() function to calculate the sales and revenue for the previous month to compute
growth percentages.
• Calculate growth percentages for both sales and revenue.
• Filter data to show only the last 12 months (use date filtering).

Datasets and SQL Schemas


-- Table creation for sales data
CREATE TABLE tesla_sales (
sale_id INT PRIMARY KEY,
model_id VARCHAR(50),
sale_date DATE,
units_sold INT,
revenue DECIMAL(10, 2)
);

-- Insert sample data


INSERT INTO tesla_sales (sale_id, model_id, sale_date, units_sold, revenue)
VALUES
(1, 'Model S', '2023-01-10', 50, 2500000),
(2, 'Model S', '2023-02-15', 60, 3000000),
(3, 'Model S', '2023-03-10', 55, 2750000),
(4, 'Model S', '2023-04-05', 70, 3500000),
(5, 'Model S', '2023-05-15', 80, 4000000),
(6, 'Model S', '2023-06-10', 75, 3750000),
(7, 'Model S', '2023-07-05', 85, 4250000),
(8, 'Model S', '2023-08-10', 90, 4500000),
(9, 'Model S', '2023-09-10', 95, 4750000),
(10, 'Model S', '2023-10-05', 100, 5000000),
(11, 'Model S', '2023-11-10', 110, 5500000),
(12, 'Model S', '2023-12-01', 120, 6000000);

Learnings
• GROUP BY: Group sales and revenue by month to aggregate data.
• LAG(): To calculate previous month's sales/revenue for growth calculation.
• Date filtering: Using WHERE to filter the data for the last 12 months.
• Growth calculation: The percentage growth formula:
Growth Percentage = ((Current Month − Previous Month) / Previous Month) × 100
• Date formatting: Using DATE_TRUNC() or MONTH() for extracting the month part of the
date.

Solutions
PostgreSQL solution


WITH MonthlySales AS (
SELECT
model_id,
DATE_TRUNC('month', sale_date) AS sale_month,
SUM(units_sold) AS total_units_sold,
SUM(revenue) AS total_revenue
FROM tesla_sales
WHERE model_id = 'Model S'
AND sale_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY model_id, DATE_TRUNC('month', sale_date)
),
GrowthCalculation AS (
SELECT
model_id,
sale_month,
total_units_sold,
total_revenue,
LAG(total_units_sold) OVER (ORDER BY sale_month) AS previous_month_units,
LAG(total_revenue) OVER (ORDER BY sale_month) AS previous_month_revenue
FROM MonthlySales
)
SELECT
model_id,
sale_month,
total_units_sold AS sales,
total_revenue AS revenue,
    ROUND(100.0 * (total_units_sold - previous_month_units) / NULLIF(previous_month_units, 0), 2) AS sales_growth_percentage, -- 100.0 avoids integer division on the unit counts
    ROUND(100.0 * (total_revenue - previous_month_revenue) / NULLIF(previous_month_revenue, 0), 2) AS revenue_growth_percentage
FROM GrowthCalculation
ORDER BY sale_month DESC
LIMIT 12;

MySQL solution
WITH MonthlySales AS (
SELECT
model_id,
DATE_FORMAT(sale_date, '%Y-%m-01') AS sale_month,
SUM(units_sold) AS total_units_sold,
SUM(revenue) AS total_revenue
FROM tesla_sales
WHERE model_id = 'Model S'
AND sale_date >= CURDATE() - INTERVAL 1 YEAR
GROUP BY model_id, DATE_FORMAT(sale_date, '%Y-%m-01')
),
GrowthCalculation AS (
SELECT
model_id,
sale_month,
total_units_sold,
total_revenue,
LAG(total_units_sold) OVER (ORDER BY sale_month) AS previous_month_units,
LAG(total_revenue) OVER (ORDER BY sale_month) AS previous_month_revenue
FROM MonthlySales
)
SELECT
model_id,
sale_month,
total_units_sold AS sales,
total_revenue AS revenue,
    ROUND(((total_units_sold - previous_month_units) / NULLIF(previous_month_units, 0)) * 100, 2) AS sales_growth_percentage,
    ROUND(((total_revenue - previous_month_revenue) / NULLIF(previous_month_revenue, 0)) * 100, 2) AS revenue_growth_percentage
FROM GrowthCalculation
ORDER BY sale_month DESC
LIMIT 12;

Key Concepts


• Date truncation: In PostgreSQL, DATE_TRUNC('month', sale_date) is used to group by the start of the month. In MySQL, DATE_FORMAT(sale_date, '%Y-%m-01') is used for the same purpose.
• Growth calculation: LAG() allows us to get the sales/revenue of the previous month for
comparison.
• NULLIF(): Prevents division by zero by returning NULL if the previous month's value is
zero.
• Filtering: Only includes sales from the last 12 months using date filtering (CURRENT_DATE
- INTERVAL '1 year' or CURDATE() - INTERVAL 1 YEAR).
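The NULLIF guard and the growth formula are easy to sanity-check in plain Python. This sketch is illustrative only; the monthly unit counts are hypothetical.

```python
def growth_pct(current, previous):
    # Analogue of NULLIF(previous, 0): return None instead of dividing by zero.
    if not previous:
        return None
    return round((current - previous) / previous * 100, 2)

monthly_units = [50, 60, 55]  # hypothetical units sold in three consecutive months
growth = [growth_pct(cur, prev) for prev, cur in zip(monthly_units, monthly_units[1:])]
print(growth)  # [20.0, -8.33]
```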
• Q.585
Question
Write an SQL query that calculates the efficiency of electric vehicle charging stations by
dividing the total kWh of energy dispensed at each station by the total charging time (in
hours), and then order the stations based on their efficiency from highest to lowest.

Explanation
To solve this:
• Group the data by charging_station_id to aggregate the total energy dispensed (in
kWh) and the total charging time (in hours).
• Calculate the efficiency by dividing the total kWh by the total charging time (in hours).
• Order the stations based on efficiency in descending order.
Steps:
• Sum of Energy Dispensed: Calculate the total kWh dispensed at each station.
• Sum of Charging Time: Calculate the total charging time in hours.
• Efficiency Calculation: The formula for efficiency is:
Efficiency = Total kWh / Total Charging Time (hours)
• Sorting: Sort the results by efficiency in descending order.

Datasets and SQL Schemas


-- Table creation for charging station data
CREATE TABLE ev_charging_sessions (
charging_station_id VARCHAR(50),
charging_start_time DATETIME, -- use TIMESTAMP in PostgreSQL
charging_end_time DATETIME, -- use TIMESTAMP in PostgreSQL
energy_dispensed_kWh DECIMAL(10, 2) -- energy dispensed in kWh
);

-- Insert sample data


INSERT INTO ev_charging_sessions (charging_station_id, charging_start_time, charging_end_time, energy_dispensed_kWh)
VALUES
('Station_A', '2023-01-01 08:00:00', '2023-01-01 09:00:00', 50.00),
('Station_B', '2023-01-01 08:30:00', '2023-01-01 09:30:00', 60.00),
('Station_A', '2023-01-01 10:00:00', '2023-01-01 11:00:00', 55.00),
('Station_C', '2023-01-02 12:00:00', '2023-01-02 14:00:00', 100.00),
('Station_B', '2023-01-02 15:00:00', '2023-01-02 16:30:00', 70.00),
('Station_A', '2023-01-02 17:00:00', '2023-01-02 18:30:00', 80.00),
('Station_C', '2023-01-03 09:00:00', '2023-01-03 11:00:00', 90.00);

Learnings


• Date/Time calculations: We need to calculate the charging time (in hours) by finding the
difference between charging_end_time and charging_start_time.
• Aggregation: Use SUM() to calculate total energy dispensed and total charging time for
each station.
• Efficiency formula: The efficiency of a station is calculated as the ratio of energy
dispensed (kWh) to charging time (hours).
• Ordering: The results are ordered by efficiency from highest to lowest.

Solutions
PostgreSQL solution
SELECT
charging_station_id,
    ROUND(SUM(energy_dispensed_kWh) / SUM(EXTRACT(EPOCH FROM (charging_end_time - charging_start_time)) / 3600), 2) AS efficiency
FROM
ev_charging_sessions
GROUP BY
charging_station_id
ORDER BY
efficiency DESC;

MySQL solution
SELECT
charging_station_id,
    ROUND(SUM(energy_dispensed_kWh) / SUM(TIMESTAMPDIFF(SECOND, charging_start_time, charging_end_time) / 3600), 2) AS efficiency
FROM
ev_charging_sessions
GROUP BY
charging_station_id
ORDER BY
efficiency DESC;

Key Concepts
• Energy Dispensed: The total energy dispensed at each station is aggregated using
SUM(energy_dispensed_kWh).
• Charging Time Calculation: In PostgreSQL, EXTRACT(EPOCH FROM
(charging_end_time - charging_start_time)) / 3600 gives the time in hours, while
in MySQL, TIMESTAMPDIFF(SECOND, charging_start_time, charging_end_time) /
3600 calculates the time in hours.
• Efficiency Calculation: The efficiency is calculated by dividing the total kWh by the total
charging time (in hours).
• ROUND(): Used to round the efficiency to two decimal places for better readability.
• Ordering: The stations are ordered by efficiency in descending order, showing the most
efficient stations first.
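The efficiency ratio (total kWh divided by total charging hours per station) can be replicated directly in Python. A minimal sketch with hypothetical sessions:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"
# Hypothetical sessions: (station, start, end, kWh dispensed).
sessions = [
    ("Station_A", "2023-01-01 08:00", "2023-01-01 09:00", 50.0),
    ("Station_A", "2023-01-01 10:00", "2023-01-01 11:00", 55.0),
]

kwh, hours = {}, {}
for station, start, end, energy in sessions:
    kwh[station] = kwh.get(station, 0) + energy
    hours[station] = hours.get(station, 0) + (
        datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds() / 3600

# Efficiency = total kWh / total charging hours, rounded to 2 decimals.
efficiency = {s: round(kwh[s] / hours[s], 2) for s in kwh}
print(efficiency)  # {'Station_A': 52.5}
```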
• Q.586
Question
Given a dataset of Tesla charging stations, we'd like to analyze the usage pattern. The dataset
captures when a Tesla car starts charging, finishes charging, and the charging station used.
Calculate the total charging time at each station and compare it with the previous day.

Explanation
To solve this:


• Calculate the Total Charging Time: For each charging session, compute the difference
between end_time and start_time and convert this to hours. We then sum these values to
get the total charging time per day for each station.
• Compare with Previous Day: Use LAG() to calculate the difference in charging time
between the current day and the previous day for each station.
• Group and Order Data: Group the data by station_id and the truncated date of
start_time (to get the day), then order the results by station_id and date.

Datasets and SQL Schemas


-- Table creation for charging sessions
CREATE TABLE charging_data (
charge_id INT PRIMARY KEY,
start_time TIMESTAMP,
end_time TIMESTAMP,
station_id INT,
car_id INT
);

-- Insert sample data


INSERT INTO charging_data (charge_id, start_time, end_time, station_id, car_id)
VALUES
(1001, '2022-07-01 08:00:00', '2022-07-01 09:00:00', 2001, 3001),
(1002, '2022-07-01 12:00:00', '2022-07-01 13:00:00', 2001, 3002),
(1003, '2022-07-02 10:00:00', '2022-07-02 11:00:00', 2002, 3003),
(1004, '2022-07-02 11:30:00', '2022-07-02 12:30:00', 2001, 3001);

Learnings
• Date Truncation: Using date_trunc() to truncate the timestamp to the day ensures that
we aggregate the data by day.
• Time Calculation: Use EXTRACT(EPOCH FROM (end_time - start_time)) / 3600 to
calculate the difference between end_time and start_time in hours.
• LAG() Window Function: The LAG() function is used to access the value from the
previous day, allowing us to compute the difference in total charging hours between
consecutive days.
• Grouping and Ordering: The query groups by station_id and the truncated date
(charge_day), then orders by station_id and charge_day to ensure the correct sequence.

Solutions
PostgreSQL solution
SELECT
station_id,
date_trunc('day', start_time) AS charge_day,
SUM(EXTRACT(EPOCH FROM (end_time - start_time))/3600) AS total_charge_hours,
(SUM(EXTRACT(EPOCH FROM (end_time - start_time))/3600)
- LAG(SUM(EXTRACT(EPOCH FROM (end_time - start_time))/3600), 1, 0)
OVER (PARTITION BY station_id ORDER BY date_trunc('day', start_time))
) AS diff_prev_day_hours
FROM
charging_data
GROUP BY
station_id, charge_day
ORDER BY
station_id, charge_day;

MySQL solution
SELECT
    station_id,
    DATE(start_time) AS charge_day,
    ROUND(SUM(TIMESTAMPDIFF(SECOND, start_time, end_time)) / 3600, 2) AS total_charge_hours,
    ROUND(
        SUM(TIMESTAMPDIFF(SECOND, start_time, end_time)) / 3600
        - COALESCE(LAG(SUM(TIMESTAMPDIFF(SECOND, start_time, end_time)) / 3600, 1)
              OVER (PARTITION BY station_id ORDER BY DATE(start_time)), 0),
    2) AS diff_prev_day_hours
FROM
    charging_data
GROUP BY
    station_id, charge_day
ORDER BY
    station_id, charge_day;

Key Concepts
• LAG(): This window function allows us to retrieve the value from the previous row (in this
case, the previous day's total charging hours) in a partitioned and ordered dataset.
• Time Difference: We calculate the difference between start_time and end_time using
EXTRACT(EPOCH FROM (end_time - start_time)) for PostgreSQL and TIMESTAMPDIFF()
in MySQL.
• Grouping and Aggregation: By grouping by station_id and charge_day, we calculate
the total charging time for each station on each day.
• Handling NULLs: The COALESCE() function in MySQL ensures that if there is no
previous day's data (i.e., the first day), we substitute NULL with 0.

Output
The query will output the following columns:
• station_id: The ID of the charging station.
• charge_day: The day when the charging took place (truncated from start_time).
• total_charge_hours: The total charging hours for that station on that day.
• diff_prev_day_hours: The difference in charging hours compared to the previous day for
that station.
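The LAG-with-default pattern (compare each day with the station's previous day, treating a missing previous day as 0) can be mimicked in Python. A hypothetical sketch over pre-aggregated daily totals:

```python
# Hypothetical per-day totals keyed by (station_id, day), already aggregated.
daily_hours = {
    ("S2001", "2022-07-01"): 2.0,
    ("S2001", "2022-07-02"): 1.0,
    ("S2002", "2022-07-02"): 1.0,
}

# LAG(..., 1, 0) analogue: subtract the station's previous day, defaulting to 0.
diffs, prev_by_station = {}, {}
for (station, day), hours in sorted(daily_hours.items()):
    diffs[(station, day)] = round(hours - prev_by_station.get(station, 0), 2)
    prev_by_station[station] = hours

print(diffs[("S2001", "2022-07-02")])  # -1.0
```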
• Q.587
Question
Write a SQL query that determines which parts have begun the assembly process but are not
yet finished.

Explanation
To solve this problem:
• Identify unfinished parts: We assume that the finish_date being NULL indicates that the
part is still in the assembly process.
• Query the table: We need to filter the records where finish_date is NULL to find the
parts that haven't been finished yet.
• Return the part and assembly step: The query should return the part name (or ID) and
the corresponding assembly step.

Datasets and SQL Schemas


-- Table creation for parts assembly
CREATE TABLE parts_assembly (
    part VARCHAR(50),
    assembly_step VARCHAR(50),
start_date DATE,
finish_date DATE
);

-- Insert sample data


INSERT INTO parts_assembly (part, assembly_step, start_date, finish_date)
VALUES
('Battery', 'Step 1', '2022-07-01', NULL),
('Motor', 'Step 2', '2022-07-02', '2022-07-05'),
('Door', 'Step 3', '2022-07-03', NULL),
('Window', 'Step 4', '2022-07-04', NULL),
('Seat', 'Step 5', '2022-07-05', '2022-07-06');

Learnings
• Filtering with NULL: We use the condition finish_date IS NULL to find parts that are
unfinished.
• Basic SELECT and WHERE: This problem involves basic SQL queries that use SELECT,
FROM, and WHERE to filter data.
• No JOINs: This problem does not require any joins or complex operations since it only
involves filtering one table based on a condition.

Solutions
PostgreSQL solution
SELECT part, assembly_step
FROM parts_assembly
WHERE finish_date IS NULL;

MySQL solution
SELECT part, assembly_step
FROM parts_assembly
WHERE finish_date IS NULL;

Key Concepts
• NULL Handling: In SQL, NULL represents missing or undefined values. Here,
finish_date IS NULL helps identify rows where the part is still in the assembly process.
• Simple Query: The query involves a straightforward SELECT with a WHERE clause to filter
out the finished parts.
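Since this query is plain standard SQL, it can be run as-is against an in-memory SQLite database from Python — a quick optional check, not part of the book's solutions (the data below is a subset of the sample rows).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts_assembly (part TEXT, assembly_step TEXT, start_date TEXT, finish_date TEXT);
INSERT INTO parts_assembly VALUES
  ('Battery', 'Step 1', '2022-07-01', NULL),
  ('Motor',   'Step 2', '2022-07-02', '2022-07-05'),
  ('Door',    'Step 3', '2022-07-03', NULL);
""")
rows = conn.execute(
    "SELECT part, assembly_step FROM parts_assembly WHERE finish_date IS NULL ORDER BY part"
).fetchall()
print(rows)  # [('Battery', 'Step 1'), ('Door', 'Step 3')]
```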
• Q.588
Question
Write a query to calculate the 3-day weighted moving average of sales for each product using
weights: 0.5 (current day), 0.3 (previous day), 0.2 (two days ago).

Explanation
To solve this:
• Identify the Weighted Moving Average (WMA): The formula for the 3-day weighted moving average is:
WMA = 0.5 × (current day's sales) + 0.3 × (previous day's sales) + 0.2 × (sales two days ago)
• Use Window Functions: We can use LAG() to get sales from the previous day and two
days ago.


• Calculate Weighted Average: Apply the weights (0.5, 0.3, 0.2) on the respective sales
figures for the current day, previous day, and two days ago.

Datasets and SQL Schemas


-- Table creation for product sales
CREATE TABLE product_sales (
product_id INT,
sale_date DATE,
sales_amount DECIMAL(10, 2)
);

-- Insert sample data


INSERT INTO product_sales (product_id, sale_date, sales_amount)
VALUES
(101, '2022-07-01', 500.00),
(101, '2022-07-02', 600.00),
(101, '2022-07-03', 550.00),
(101, '2022-07-04', 700.00),
(102, '2022-07-01', 300.00),
(102, '2022-07-02', 350.00),
(102, '2022-07-03', 400.00),
(102, '2022-07-04', 450.00);

Learnings
• Window Functions: The LAG() function is useful to get previous values (sales amounts
from previous days).
• Weighted Calculations: Use basic arithmetic to calculate the weighted moving average.
• Handling NULLs: For the first two days, the LAG() function will return NULL for missing
values, which can be handled appropriately.

Solutions
PostgreSQL solution
SELECT
product_id,
sale_date,
ROUND(
0.5 * sales_amount +
        0.3 * COALESCE(LAG(sales_amount, 1) OVER (PARTITION BY product_id ORDER BY sale_date), 0) +
        0.2 * COALESCE(LAG(sales_amount, 2) OVER (PARTITION BY product_id ORDER BY sale_date), 0),
2) AS weighted_moving_avg
FROM
product_sales
ORDER BY
product_id, sale_date;

MySQL solution
SELECT
product_id,
sale_date,
ROUND(
0.5 * sales_amount +
        0.3 * COALESCE(LAG(sales_amount, 1) OVER (PARTITION BY product_id ORDER BY sale_date), 0) +
        0.2 * COALESCE(LAG(sales_amount, 2) OVER (PARTITION BY product_id ORDER BY sale_date), 0),
2) AS weighted_moving_avg
FROM
product_sales
ORDER BY
product_id, sale_date;


Key Concepts
• LAG() Window Function: LAG() allows us to access data from the previous rows within
the same partition, which is essential for calculating the moving average.
• COALESCE(): This function handles NULL values that may arise from the LAG() function,
substituting them with 0 for missing sales data on the first two days.
• Weighted Moving Average: The moving average is calculated by applying specific
weights to the sales amounts of the current day and the previous two days.
• Rounding: We use ROUND() to format the result to two decimal places, as required.

Output
This query will return the product_id, sale_date, and the 3-day weighted moving average
of sales (weighted_moving_avg) for each product, ordered by product_id and sale_date.
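The weighted-moving-average arithmetic, including the COALESCE-to-zero behavior for the first two days, can be cross-checked in Python. The sales series below is product 101's sample data.

```python
def weighted_moving_avg(sales, weights=(0.5, 0.3, 0.2)):
    """3-day WMA; missing prior days count as 0, matching COALESCE(..., 0)."""
    out = []
    for i, today in enumerate(sales):
        prev1 = sales[i - 1] if i >= 1 else 0
        prev2 = sales[i - 2] if i >= 2 else 0
        out.append(round(weights[0] * today + weights[1] * prev1 + weights[2] * prev2, 2))
    return out

print(weighted_moving_avg([500.0, 600.0, 550.0, 700.0]))
# [250.0, 450.0, 555.0, 635.0]
```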
• Q.589
Question
Write an SQL query to identify the top 10 states with the highest Tesla sales for the past year,
grouped by vehicle model.

Explanation
To solve this:
• Filter Sales for the Last Year: We need to filter the data to include only sales that
occurred in the past year. This can be done by comparing the sale_date with the current
date minus one year.
• Group by State and Vehicle Model: We group the results by state and vehicle_model
to aggregate sales for each combination.
• Sum the Sales: For each state and vehicle model, we need to calculate the total sales,
typically by summing up a sales amount column (e.g., sales_amount).
• Rank the States: Using the RANK() or ROW_NUMBER() function, we can rank states based
on their sales, and then select the top 10 based on these ranks.

Datasets and SQL Schemas


-- Table creation for Tesla sales
CREATE TABLE tesla_sales (
sale_id INT PRIMARY KEY,
vehicle_model VARCHAR(50),
state VARCHAR(50),
sales_amount DECIMAL(10, 2),
sale_date DATE
);

-- Insert sample data


INSERT INTO tesla_sales (sale_id, vehicle_model, state, sales_amount, sale_date)
VALUES
(1, 'Model S', 'California', 70000, '2023-05-01'),
(2, 'Model 3', 'California', 40000, '2023-06-15'),
(3, 'Model X', 'Texas', 80000, '2023-04-10'),
(4, 'Model Y', 'New York', 50000, '2023-07-25'),
(5, 'Model S', 'California', 75000, '2023-08-11'),
(6, 'Model 3', 'Texas', 40000, '2023-09-05'),
(7, 'Model X', 'Florida', 70000, '2023-03-18'),
(8, 'Model Y', 'California', 55000, '2023-11-05'),
(9, 'Model 3', 'New York', 35000, '2023-05-14'),
(10, 'Model S', 'Texas', 70000, '2023-06-19'),
(11, 'Model 3', 'California', 60000, '2023-09-10'),

736
1000+ SQL Interview Questions & Answers | By Zero Analyst

(12, 'Model X', 'Nevada', 65000, '2023-08-15'),


(13, 'Model Y', 'Texas', 70000, '2023-05-03'),
(14, 'Model S', 'Florida', 75000, '2023-07-22'),
(15, 'Model 3', 'Washington', 40000, '2023-06-10'),
(16, 'Model X', 'Oregon', 72000, '2023-04-20'),
(17, 'Model S', 'New York', 85000, '2023-03-25'),
(18, 'Model Y', 'Illinois', 78000, '2023-06-30'),
(19, 'Model 3', 'Nevada', 65000, '2023-09-01'),
(20, 'Model X', 'New Jersey', 60000, '2023-05-20'),
(21, 'Model S', 'North Carolina', 70000, '2023-02-10'),
(22, 'Model 3', 'Colorado', 55000, '2023-06-01'),
(23, 'Model Y', 'Georgia', 60000, '2023-07-05'),
(24, 'Model X', 'Arizona', 50000, '2023-08-12'),
(25, 'Model S', 'Nevada', 80000, '2023-09-12'),
(26, 'Model 3', 'Utah', 60000, '2023-07-15'),
(27, 'Model S', 'Washington', 65000, '2023-03-18'),
(28, 'Model 3', 'Virginia', 40000, '2023-05-18'),
(29, 'Model Y', 'California', 70000, '2023-07-20'),
(30, 'Model X', 'Ohio', 55000, '2023-06-25'),
(31, 'Model 3', 'Florida', 75000, '2023-04-30'),
(32, 'Model Y', 'Texas', 65000, '2023-03-12'),
(33, 'Model X', 'Georgia', 60000, '2023-08-05'),
(34, 'Model S', 'South Carolina', 80000, '2023-07-18'),
(35, 'Model 3', 'Minnesota', 60000, '2023-04-05'),
(36, 'Model S', 'Tennessee', 55000, '2023-06-05'),
(37, 'Model X', 'Michigan', 45000, '2023-05-25'),
(38, 'Model Y', 'Missouri', 50000, '2023-09-01'),
(39, 'Model 3', 'Indiana', 55000, '2023-08-25'),
(40, 'Model X', 'Louisiana', 60000, '2023-07-28'),
(41, 'Model Y', 'Kentucky', 65000, '2023-05-10'),
(42, 'Model 3', 'Montana', 70000, '2023-04-18'),
(43, 'Model S', 'New Hampshire', 75000, '2023-09-05'),
(44, 'Model Y', 'Oregon', 60000, '2023-08-20'),
(45, 'Model 3', 'Delaware', 65000, '2023-07-12'),
(46, 'Model X', 'Wisconsin', 50000, '2023-05-06'),
(47, 'Model S', 'Nebraska', 70000, '2023-03-30'),
(48, 'Model Y', 'Idaho', 60000, '2023-02-15'),
(49, 'Model 3', 'Rhode Island', 40000, '2023-06-12'),
(50, 'Model S', 'Alabama', 85000, '2023-09-01');

Learnings
• Date Filtering: Use CURRENT_DATE or NOW() to filter records for the past year.
• Aggregating Data: Group the data by state and vehicle_model to calculate the total
sales.
• Window Functions: Use RANK() or ROW_NUMBER() to rank states by their total sales for
each model.
• Handling Sales: The sales_amount column is summed to get the total sales for each state
and model.

Solutions
PostgreSQL solution
WITH ranked_sales AS (
SELECT
state,
vehicle_model,
SUM(sales_amount) AS total_sales,
RANK() OVER (PARTITION BY vehicle_model ORDER BY SUM(sales_amount) DESC) AS sales_rank
FROM tesla_sales
WHERE sale_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY state, vehicle_model
)
SELECT state, vehicle_model, total_sales
FROM ranked_sales
WHERE sales_rank <= 10
ORDER BY vehicle_model, total_sales DESC;

MySQL solution
WITH ranked_sales AS (
SELECT
state,
vehicle_model,
SUM(sales_amount) AS total_sales,
RANK() OVER (PARTITION BY vehicle_model ORDER BY SUM(sales_amount) DESC) AS sales_rank
FROM tesla_sales
WHERE sale_date >= CURDATE() - INTERVAL 1 YEAR
GROUP BY state, vehicle_model
)
SELECT state, vehicle_model, total_sales
FROM ranked_sales
WHERE sales_rank <= 10
ORDER BY vehicle_model, total_sales DESC;

Key Concepts
• RANK() Window Function: This function ranks each state by the total sales for each
vehicle model. States with the highest sales for a model get a rank of 1, 2, etc. We filter the
results to only include the top 10 states based on sales for each model.
• Date Filtering: We filter the sales data for the last year using CURRENT_DATE - INTERVAL
'1 year' (PostgreSQL) or CURDATE() - INTERVAL 1 YEAR (MySQL).
• SUM() Aggregation: The SUM(sales_amount) calculates the total sales for each
combination of state and vehicle model.
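The grouped-sum-plus-RANK() pattern can be sanity-checked locally. Below is a small sketch using Python's built-in sqlite3 module (window functions require SQLite 3.25+); it uses an equivalent CTE that materializes the per-state totals before ranking, and a toy subset of the data above.

```python
import sqlite3

# Sketch only: verifies the PARTITION BY + RANK() pattern on a toy
# tesla_sales table (window functions need SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tesla_sales (state TEXT, vehicle_model TEXT, sales_amount INT);
INSERT INTO tesla_sales VALUES
  ('California', 'Model 3', 60000),
  ('Texas',      'Model 3', 70000),
  ('Nevada',     'Model 3', 65000),
  ('California', 'Model S', 90000),
  ('Texas',      'Model S', 70000);
""")
rows = conn.execute("""
    WITH totals AS (
        SELECT state, vehicle_model, SUM(sales_amount) AS total_sales
        FROM tesla_sales
        GROUP BY state, vehicle_model
    )
    SELECT state, vehicle_model, total_sales,
           RANK() OVER (PARTITION BY vehicle_model
                        ORDER BY total_sales DESC) AS sales_rank
    FROM totals
""").fetchall()
# The rank-1 state per model should be Texas (Model 3) and California (Model S).
print(rows)
```

Ranking a pre-aggregated CTE, as here, behaves the same as ranking over SUM() directly in the outer query, and is a touch easier to debug step by step.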
• Q.590
Question
Write an SQL query to calculate the click-through conversion rates for Tesla's digital ads,
from viewing a digital ad to adding a product (vehicle model) to the cart.

Explanation
To calculate the click-through conversion rate:
• Group Click Data: First, we need to count how many times each product was clicked
within each ad campaign by grouping the ad_clicks table by ad_campaign and
product_model.
• Group Add Data: Next, count how many times a product was added to the cart from the
add_to_carts table, grouping by product_model.
• Calculate Conversion Rate: For each ad campaign and product, we calculate the
conversion rate as:
Conversion Rate = (Number of Adds / Number of Clicks) × 100

where "Number of Adds" refers to the number of times a product was added to the cart, and
"Number of Clicks" refers to the number of times a user clicked the ad.
• Join the Tables: The ad_clicks and add_to_carts tables are joined on product_model
to get the relevant click and add data for each product within each campaign.

Datasets and SQL Schemas


-- Table creation for ad_clicks (Digital Ad Clicks)
CREATE TABLE ad_clicks (
click_id INT PRIMARY KEY,
user_id INT,
click_date DATETIME,
ad_campaign VARCHAR(50),
product_model VARCHAR(50)
);

-- Table creation for add_to_carts (Products Added to Cart)


CREATE TABLE add_to_carts (
cart_id INT PRIMARY KEY,
user_id INT,
add_date DATETIME,
product_model VARCHAR(50)
);

-- Sample Data for ad_clicks


INSERT INTO ad_clicks (click_id, user_id, click_date, ad_campaign, product_model)
VALUES
(1256, 867, '2022-06-08 00:00:00', 'Campaign1', 'Model S'),
(2453, 345, '2022-06-08 00:00:00', 'Campaign2', 'Model X'),
(4869, 543, '2022-06-10 00:00:00', 'Campaign1', 'Model 3'),
(7853, 543, '2022-06-18 00:00:00', 'Campaign3', 'Model Y'),
(3248, 865, '2022-07-26 00:00:00', 'Campaign2', 'Model S');

-- Sample Data for add_to_carts


INSERT INTO add_to_carts (cart_id, user_id, add_date, product_model)
VALUES
(1234, 867, '2022-06-08 00:00:00', 'Model S'),
(7324, 345, '2022-06-10 00:00:00', 'Model X'),
(6271, 543, '2022-06-11 00:00:00', 'Model 3');

Learnings
• Aggregation: Use COUNT() with GROUP BY to aggregate the number of clicks and adds for
each product.
• Joining: Join the tables on the product_model to combine the data from both tables (click
and add actions).
• Conversion Rate Calculation: The conversion rate is calculated by dividing the number
of adds by the number of clicks, then multiplying by 100 to get the percentage.

Solutions
PostgreSQL Solution
WITH clicks AS (
SELECT ad_campaign, product_model, COUNT(*) AS num_clicks
FROM ad_clicks
GROUP BY ad_campaign, product_model
),
adds AS (
SELECT product_model, COUNT(*) AS num_adds
FROM add_to_carts
GROUP BY product_model
)
SELECT clicks.ad_campaign,
clicks.product_model,
clicks.num_clicks,
adds.num_adds,
(adds.num_adds::DECIMAL / clicks.num_clicks) * 100 AS conversion_rate
FROM clicks
JOIN adds ON clicks.product_model = adds.product_model;

MySQL Solution
WITH clicks AS (
SELECT ad_campaign, product_model, COUNT(*) AS num_clicks
FROM ad_clicks
GROUP BY ad_campaign, product_model
),
adds AS (
SELECT product_model, COUNT(*) AS num_adds
FROM add_to_carts
GROUP BY product_model
)
SELECT clicks.ad_campaign,
clicks.product_model,
clicks.num_clicks,
adds.num_adds,
(adds.num_adds / clicks.num_clicks) * 100 AS conversion_rate
FROM clicks
JOIN adds ON clicks.product_model = adds.product_model;

Output
The result will return the following columns:
• ad_campaign: The name of the ad campaign (e.g., 'Campaign1').
• product_model: The Tesla vehicle model (e.g., 'Model S').
• num_clicks: The total number of clicks for the product in the ad campaign.
• num_adds: The total number of times the product was added to the cart.
• conversion_rate: The click-through conversion rate for the ad campaign and product.
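The conversion-rate arithmetic can be replayed in plain Python. The click and add counts below are tallied by hand from the two sample INSERT statements above; as in the SQL join, only models present in both tables survive.

```python
# Click counts per (ad_campaign, product_model) and cart-add counts per
# product_model, tallied from the sample data above.
clicks = {("Campaign1", "Model S"): 1, ("Campaign2", "Model X"): 1,
          ("Campaign1", "Model 3"): 1, ("Campaign3", "Model Y"): 1,
          ("Campaign2", "Model S"): 1}
adds = {"Model S": 1, "Model X": 1, "Model 3": 1}

# Mirror of the SQL: join on product_model, then adds / clicks * 100.
conversion = {
    (campaign, model): adds[model] / n_clicks * 100
    for (campaign, model), n_clicks in clicks.items()
    if model in adds  # inner join drops Model Y (no cart adds)
}
print(conversion[("Campaign1", "Model S")])  # 100.0
```

Note that Model Y never appears in the result, exactly as the inner join in the SQL solutions would drop it, because it has no add_to_carts rows.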
• Q.591
Question
Write an SQL query to identify the top 5 Tesla models with the highest average revenue per
sale in 2024. The query should return the model name, total number of sales, and the average
sale price for that model.
Explanation
• Filter for 2024 Sales: The sales data should be filtered for the year 2024.
• Aggregate by Model: For each model, calculate the total sales and average revenue.
• Sort and Limit: Sort the models by average revenue per sale in descending order and
return only the top 5 models.

Datasets and SQL Schemas


-- Table creation for sales data
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
model_name VARCHAR(50),
sale_date DATE,
price DECIMAL(10, 2)
);

-- Sample Data for sales


INSERT INTO sales (sale_id, model_name, sale_date, price)
VALUES
(1, 'Model S', '2024-01-01', 80000),
(2, 'Model 3', '2024-01-05', 35000),
(3, 'Model X', '2024-02-10', 90000),
(4, 'Model Y', '2024-03-12', 50000),
(5, 'Model S', '2024-04-18', 79000),
(6, 'Model 3', '2024-05-20', 34000),
(7, 'Model X', '2024-06-10', 95000),
(8, 'Model Y', '2024-07-15', 52000),
(9, 'Cybertruck', '2024-08-21', 40000),
(10, 'Model 3', '2024-09-03', 36000),
(11, 'Model Y', '2024-10-10', 54000),
(12, 'Model S', '2024-11-12', 78000),
(13, 'Model X', '2024-12-01', 91000),
(14, 'Cybertruck', '2024-12-04', 41000);

Learnings
• Date Filtering: Use YEAR() or EXTRACT() to filter sales data for the year 2024.
• Aggregation: Use COUNT() for total sales and AVG() for average price per sale.
• Sorting and Limiting: Use ORDER BY and LIMIT to sort the results and restrict it to the top
5.

Solutions
PostgreSQL Solution
SELECT model_name, COUNT(*) AS total_sales, AVG(price) AS avg_price
FROM sales
WHERE EXTRACT(YEAR FROM sale_date) = 2024
GROUP BY model_name
ORDER BY avg_price DESC
LIMIT 5;

MySQL Solution
SELECT model_name, COUNT(*) AS total_sales, AVG(price) AS avg_price
FROM sales
WHERE YEAR(sale_date) = 2024
GROUP BY model_name
ORDER BY avg_price DESC
LIMIT 5;

Output

model_name total_sales avg_price

Model X 3 92000

Model S 3 79000

Model Y 3 52000

Cybertruck 2 40500

Model 3 3 35000

• Q.592
Write an SQL query to calculate the total miles driven and average power consumption for
each Tesla model in 2024. The result should return the model name, total miles driven, and
the average power consumed for that model.
Explanation
• Filter for 2024 Service Data: The service data should be filtered for the year 2024.
• Aggregation: Calculate the total distance driven and average power consumption for each
model.
• Group by Model: Group the results by model_name.

Datasets and SQL Schemas

-- Table creation for service data
CREATE TABLE service_data (
record_id INT PRIMARY KEY,
vehicle_id VARCHAR(20),
distance_driven INT,
power_consumed INT,
service_date DATE
);

-- Sample Data for service data

INSERT INTO service_data (record_id, vehicle_id, distance_driven, power_consumed, service_date)
VALUES
(1, '001', 1200, 400, '2024-01-10'),
(2, '002', 1000, 300, '2024-02-15'),
(3, '003', 1500, 500, '2024-03-12'),
(4, '004', 1300, 450, '2024-04-20'),
(5, '005', 1100, 350, '2024-05-15'),
(6, '001', 1100, 420, '2024-06-10'),
(7, '002', 1050, 320, '2024-07-22'),
(8, '003', 1600, 530, '2024-08-05'),
(9, '004', 1200, 400, '2024-09-10'),
(10, '005', 1150, 360, '2024-10-15'),
(11, '003', 1450, 480, '2024-11-05'),
(12, '002', 900, 290, '2024-12-01');

Learnings
• Date Filtering: Use EXTRACT() or YEAR() to filter records for the year 2024.
• Aggregation: Use SUM() for total distance driven and AVG() for average power
consumption.
• Grouping: Group the results by model_name to summarize data by Tesla model.
• Vehicle Lookup: service_data stores only vehicle_id, so both solutions join a vehicles
table (see Q.597 for its schema) to map each vehicle to its model_name.

Solutions
PostgreSQL Solution
SELECT model_name,
SUM(distance_driven) AS total_miles,
AVG(power_consumed) AS avg_power_consumed
FROM service_data s
JOIN vehicles v ON s.vehicle_id = v.vehicle_id
WHERE EXTRACT(YEAR FROM service_date) = 2024
GROUP BY model_name
ORDER BY total_miles DESC;

MySQL Solution
SELECT model_name,
SUM(distance_driven) AS total_miles,
AVG(power_consumed) AS avg_power_consumed
FROM service_data s
JOIN vehicles v ON s.vehicle_id = v.vehicle_id
WHERE YEAR(service_date) = 2024
GROUP BY model_name
ORDER BY total_miles DESC;

Output

model_name total_miles avg_power_consumed

Model 3 10550 408.33

Model X 4500 430

Model S 2300 410

Model Y 1800 365
• Q.593
Identify Tesla Models with the Highest Number of Service Records in 2024
Problem Statement
You are tasked with identifying the Tesla models that have had the highest number of service
records in 2024. The result should return the model name and the total number of service
records for each model. The query should be sorted in descending order based on the number
of service records, and the model with the most service records should be displayed.

Datasets and SQL Schemas


Here is the schema and sample data for the service_data table that records the vehicle
service information, including the type of service performed and the service date.

service_data Table
-- Table definition for service data
CREATE TABLE service_data (
record_id INT PRIMARY KEY,
vehicle_id VARCHAR(20),
service_type VARCHAR(50),
service_date DATE
);

-- Sample data inserted into service_data table


INSERT INTO service_data (record_id, vehicle_id, service_type, service_date)
VALUES
(1, '001', 'Battery Check', '2024-01-10'),
(2, '002', 'Tire Replacement', '2024-02-15'),
(3, '003', 'Battery Check', '2024-03-12'),
(4, '004', 'Battery Check', '2024-04-20'),
(5, '001', 'Wheel Alignment', '2024-05-18'),
(6, '003', 'Battery Check', '2024-06-15'),
(7, '003', 'Tire Replacement', '2024-07-22'),
(8, '004', 'Battery Check', '2024-08-25'),
(9, '002', 'Tire Replacement', '2024-09-10'),
(10, '001', 'Battery Check', '2024-10-05'),
(11, '004', 'Tire Replacement', '2024-11-12'),
(12, '001', 'Battery Check', '2024-12-01');

Key SQL Concepts and Operations


• Filtering by Year: Use EXTRACT(YEAR FROM service_date) or YEAR(service_date) to
filter the service data for the year 2024.
• Counting Service Records: The COUNT() function will be used to count the number of
records per model.
• Grouping by Model: Use the GROUP BY clause to group the results by the Tesla vehicle
model.
• Sorting the Results: The ORDER BY clause will help sort the models by the number of
service records in descending order.
• Limiting Results: In case you want only the model with the highest service records, LIMIT
1 will restrict the output.
• Vehicle Lookup: service_data stores only vehicle_id, so the solutions join a vehicles
table (see Q.597 for its schema) to resolve each vehicle's model_name.

Solution Query
The following query retrieves the Tesla model with the highest number of service records in
2024:

PostgreSQL Solution
SELECT v.model_name, COUNT(s.record_id) AS total_services
FROM service_data s
JOIN vehicles v ON s.vehicle_id = v.vehicle_id
WHERE EXTRACT(YEAR FROM service_date) = 2024
GROUP BY v.model_name
ORDER BY total_services DESC
LIMIT 1;

MySQL Solution
SELECT v.model_name, COUNT(s.record_id) AS total_services
FROM service_data s
JOIN vehicles v ON s.vehicle_id = v.vehicle_id
WHERE YEAR(service_date) = 2024
GROUP BY v.model_name
ORDER BY total_services DESC
LIMIT 1;
• Q.594
Question
Write an SQL query to identify the Tesla model with the highest number of customer
complaints in the UK for the year 2024. The query should return the model name and the
total number of complaints for that model.

Explanation
• Filter Complaints for the Year 2024: The complaints should be filtered for the year 2024.
• Filter for UK: The complaints should be specifically filtered for customers located in the
UK (you will likely need to filter by the country or similar field).
• Aggregation: The number of complaints for each model in 2024 should be aggregated.
• Identify the Model with Most Complaints: Use ORDER BY to sort the models by the
number of complaints in descending order, and use LIMIT to get the model with the highest
number of complaints.

Datasets and SQL Schemas


-- Table creation for customer_complaints
CREATE TABLE customer_complaints (
complaint_id INT PRIMARY KEY,
model_name VARCHAR(50),
complaint_date DATE,
country VARCHAR(50)
);

-- Sample Data for customer_complaints


INSERT INTO customer_complaints (complaint_id, model_name, complaint_date, country)
VALUES
(1, 'Model S', '2024-01-15', 'UK'),
(2, 'Model 3', '2024-02-20', 'UK'),
(3, 'Model Y', '2024-03-12', 'UK'),
(4, 'Model 3', '2024-05-25', 'UK'),
(5, 'Model X', '2024-06-10', 'UK'),
(6, 'Model 3', '2024-07-15', 'UK'),
(7, 'Model Y', '2024-09-20', 'UK'),
(8, 'Cybertruck', '2024-10-05', 'US'),
(9, 'Model 3', '2024-11-18', 'UK'),
(10, 'Model X', '2024-12-01', 'UK'),
(11, 'Model 3', '2024-01-15', 'UK'),
(12, 'Model S', '2024-05-18', 'UK');

Learnings
• Filtering by Year: Use YEAR() or EXTRACT(YEAR FROM complaint_date) to filter
complaints for the year 2024.
• Filtering by Country: Filter complaints based on the country column (specifically 'UK').
• Aggregation: Use COUNT() to count the number of complaints for each model.
• Sorting: Sort the results by the number of complaints in descending order and limit the
result to one row.

Solutions
PostgreSQL Solution
SELECT model_name, COUNT(*) AS total_complaints
FROM customer_complaints
WHERE EXTRACT(YEAR FROM complaint_date) = 2024
AND country = 'UK'
GROUP BY model_name
ORDER BY total_complaints DESC
LIMIT 1;

MySQL Solution
SELECT model_name, COUNT(*) AS total_complaints
FROM customer_complaints
WHERE YEAR(complaint_date) = 2024
AND country = 'UK'
GROUP BY model_name
ORDER BY total_complaints DESC
LIMIT 1;

Output
The output will return the Tesla model with the highest number of complaints in the UK for
2024, along with the total number of complaints.

model_name total_complaints

Model 3 5
• Q.595

Question
Write an SQL query to identify the Tesla model with the highest number of sales in the year
2024. The result should return the model name and the total number of units sold in that year.

Explanation
• Filter Sales for 2024: The sales data should be filtered for the year 2024.
• Aggregation: You need to count the total number of sales for each model in 2024.
• Identify Highest Sales: Use the ORDER BY clause and LIMIT to identify the model with the
highest sales.

Datasets and SQL Schemas


-- Table creation for sales
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
model_name VARCHAR(50),
sale_date DATE,
price DECIMAL(10, 2)
);

-- Sample Data for sales


INSERT INTO sales (sale_id, model_name, sale_date, price)
VALUES
(1, 'Model S', '2024-01-15', 80000),
(2, 'Model 3', '2024-02-20', 35000),
(3, 'Model Y', '2024-03-12', 50000),
(4, 'Model 3', '2024-05-25', 35000),
(5, 'Model X', '2024-06-10', 90000),
(6, 'Model 3', '2024-07-15', 35000),
(7, 'Model Y', '2024-09-20', 50000),
(8, 'Cybertruck', '2024-10-05', 39999),
(9, 'Model Y', '2024-11-18', 50000),
(10, 'Model 3', '2024-12-01', 35000);

Learnings
• Filtering by Year: Use YEAR() or EXTRACT(YEAR FROM sale_date) to filter sales for the
year 2024.
• Aggregation: Use COUNT() to aggregate the number of sales for each model.
• Sorting: Use ORDER BY to sort by the number of sales in descending order and retrieve the
top model.

Solutions
PostgreSQL Solution
SELECT model_name, COUNT(*) AS total_sales
FROM sales
WHERE EXTRACT(YEAR FROM sale_date) = 2024
GROUP BY model_name
ORDER BY total_sales DESC
LIMIT 1;

MySQL Solution
SELECT model_name, COUNT(*) AS total_sales
FROM sales
WHERE YEAR(sale_date) = 2024
GROUP BY model_name
ORDER BY total_sales DESC
LIMIT 1;

Output

The output will return the Tesla model with the highest sales in 2024, along with the number
of units sold.

model_name total_sales

Model 3 4
• Q.596

Question
Write an SQL query to identify the total sales of the Tesla Cybertruck in the month it was
launched. The query should calculate the total sales in terms of the number of units sold and
the total revenue generated (assuming sales price is included in the sales data).

Explanation
• Identifying the Launch Month: First, you need to identify the launch month of the Tesla
Cybertruck. This can be done using the sale_date column in the sales table.
• Filter Sales for Cybertruck: The model_name will be used to filter for Cybertruck sales.
• Calculate Total Sales: Use aggregation to calculate the total number of units sold and total
revenue in that launch month.

Datasets and SQL Schemas


-- Table creation for sales
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
model_name VARCHAR(50),
sale_date DATE,
price DECIMAL(10, 2)
);

-- Sample Data for sales


INSERT INTO sales (sale_id, model_name, sale_date, price)
VALUES
(1, 'Cybertruck', '2021-11-20', 39999),
(2, 'Cybertruck', '2021-11-22', 39999),
(3, 'Cybertruck', '2021-11-23', 39999),
(4, 'Model S', '2021-11-25', 80000),
(5, 'Cybertruck', '2021-11-30', 39999),
(6, 'Model 3', '2021-11-27', 35000);

Learnings
• Date Filtering: Use MONTH() and YEAR() to filter by the specific launch month.
• Aggregation: Use COUNT() for total units sold and SUM() for total revenue.
• Filtering by Model: Use the model_name column to filter the data for Cybertruck sales
only.

Solutions
PostgreSQL Solution
SELECT
model_name,
EXTRACT(MONTH FROM sale_date) AS launch_month,
EXTRACT(YEAR FROM sale_date) AS launch_year,
COUNT(*) AS total_sales,
SUM(price) AS total_revenue
FROM
sales
WHERE
model_name = 'Cybertruck'
AND EXTRACT(MONTH FROM sale_date) = 11 -- Month of launch (November)
AND EXTRACT(YEAR FROM sale_date) = 2021 -- Launch year
GROUP BY
model_name, launch_month, launch_year;

MySQL Solution
SELECT
model_name,
MONTH(sale_date) AS launch_month,
YEAR(sale_date) AS launch_year,
COUNT(*) AS total_sales,
SUM(price) AS total_revenue
FROM
sales
WHERE
model_name = 'Cybertruck'
AND MONTH(sale_date) = 11 -- Month of launch (November)
AND YEAR(sale_date) = 2021 -- Launch year
GROUP BY
model_name, launch_month, launch_year;

Output
The output will return the total number of Cybertruck units sold and the total revenue for the
launch month (November 2021).

model_name launch_month launch_year total_sales total_revenue

Cybertruck 11 2021 4 159996


• Q.597

Question
Write an SQL query to produce a report summarizing the average distance driven and
average power consumed by each Tesla vehicle model, grouped by the year of manufacture.
The report should include the model name, manufacture year, average distance driven (in
miles), and average power consumed (in kilowatt-hour).

Explanation
• Join Operation: You need to join the vehicles and service_data tables on the
vehicle_id to combine the data of each vehicle with its service records.
• Aggregation: Use the AVG() function to calculate the average distance driven and average
power consumed for each vehicle model grouped by the year of manufacture.
• Group By: The results should be grouped by model_name and manufacture_year to get
the average statistics for each model and year.
• Ordering: Order the result by model_name and manufacture_year to maintain a logical
order in the output.

Datasets and SQL Schemas


-- Table creation for vehicles
CREATE TABLE vehicles (
vehicle_id VARCHAR(10) PRIMARY KEY,
model_name VARCHAR(50),
manufacture_year INT,
owner_id INT
);

-- Sample Data for vehicles


INSERT INTO vehicles (vehicle_id, model_name, manufacture_year, owner_id)
VALUES
('001', 'Model S', 2018, 1001),
('002', 'Model 3', 2019, 1002),
('003', 'Model X', 2020, 1003),
('004', 'Model S', 2019, 1004),
('005', 'Model 3', 2018, 1005);

-- Table creation for service_data


CREATE TABLE service_data (
record_id VARCHAR(10) PRIMARY KEY,
vehicle_id VARCHAR(10),
distance_driven INT,
power_consumed DECIMAL(10, 2)
);

-- Sample Data for service_data


INSERT INTO service_data (record_id, vehicle_id, distance_driven, power_consumed)
VALUES
('a001', '001', 1200, 400),
('a002', '002', 1000, 250),
('a003', '003', 1500, 500),
('a004', '001', 1300, 450),
('a005', '004', 1100, 420);

Learnings
• Join: You need to join multiple tables on a common column (e.g., vehicle_id).
• Aggregation: Use AVG() to calculate averages for a set of values.
• Grouping: Group by columns to aggregate data at a higher level (in this case, by
model_name and manufacture_year).
• Ordering: Use ORDER BY to sort the results in a readable manner.

Solutions
PostgreSQL Solution
SELECT
v.model_name,
v.manufacture_year,
AVG(s.distance_driven) AS average_distance,
AVG(s.power_consumed) AS average_power
FROM
vehicles v
JOIN
service_data s
ON
v.vehicle_id = s.vehicle_id
GROUP BY
v.model_name,
v.manufacture_year
ORDER BY
v.model_name,
v.manufacture_year;

MySQL Solution
SELECT
v.model_name,
v.manufacture_year,
AVG(s.distance_driven) AS average_distance,
AVG(s.power_consumed) AS average_power
FROM
vehicles v
JOIN
service_data s
ON
v.vehicle_id = s.vehicle_id
GROUP BY
v.model_name,
v.manufacture_year
ORDER BY
v.model_name,
v.manufacture_year;

Output
The output will return a table summarizing the average distance driven and average power
consumed for each Tesla model in each year.
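The join-then-average logic can be checked end-to-end with a minimal sqlite3 sketch, using a subset of the sample data above (vehicles '001' and '004', both Model S, from different manufacture years):

```python
import sqlite3

# Minimal check of the JOIN + AVG() + GROUP BY pattern from the solutions above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicles (vehicle_id TEXT, model_name TEXT, manufacture_year INT);
CREATE TABLE service_data (vehicle_id TEXT, distance_driven INT, power_consumed REAL);
INSERT INTO vehicles VALUES ('001', 'Model S', 2018), ('004', 'Model S', 2019);
INSERT INTO service_data VALUES ('001', 1200, 400), ('001', 1300, 450), ('004', 1100, 420);
""")
rows = conn.execute("""
    SELECT v.model_name, v.manufacture_year,
           AVG(s.distance_driven) AS average_distance,
           AVG(s.power_consumed) AS average_power
    FROM vehicles v
    JOIN service_data s ON v.vehicle_id = s.vehicle_id
    GROUP BY v.model_name, v.manufacture_year
    ORDER BY v.manufacture_year
""").fetchall()
print(rows)  # [('Model S', 2018, 1250.0, 425.0), ('Model S', 2019, 1100.0, 420.0)]
```

Vehicle '001' contributes two service rows, so its averages ((1200+1300)/2 and (400+450)/2) confirm the grouping happens per model and year, not per service record.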
• Q.598
Question
Write an SQL query to compute the battery performance index for each test run based on the
formula:
performance_index = ABS(CHARGE - DISCHARGE) / SQRT(DAYS)

Where:
• CHARGE is the energy used to fully charge the battery (in kWh).
• DISCHARGE is the energy recovered from the battery (in kWh).
• DAYS is the runtime of the test, which is calculated as the difference between end_date
and start_date in days, inclusive of both dates.
You need to round off the performance index to two decimal places.

Explanation
• Date Calculation: The number of days (DAYS) is computed by subtracting start_date
from end_date and adding 1 to account for both start and end dates.
• Performance Index Formula: The formula provided for performance_index involves
calculating the absolute difference between charge_energy and discharge_energy, and
then dividing it by the square root of the runtime in days.
• Rounding: The result of the formula is rounded to two decimal places.

Datasets and SQL Schemas


-- Table creation for battery runs
CREATE TABLE battery_runs (
run_id INT PRIMARY KEY,
battery_model VARCHAR(50),
start_date DATE,
end_date DATE,
charge_energy DECIMAL(10, 2),
discharge_energy DECIMAL(10, 2)
);

-- Sample Data for battery_runs


INSERT INTO battery_runs (run_id, battery_model, start_date, end_date, charge_energy, discharge_energy)
VALUES
(1, 'Model S', '2021-07-31', '2021-08-05', 100, 98),
(2, 'Model S', '2021-08-10', '2021-08-12', 102, 99),
(3, 'Model 3', '2021-09-01', '2021-09-04', 105, 103),
(4, 'Model X', '2021-10-01', '2021-10-10', 110, 107),
(5, 'Model 3', '2021-11-01', '2021-11-03', 100, 95);

Learnings
• Date Difference: Use date subtraction to calculate the number of days between
start_date and end_date.
• Mathematical Operations: Use ABS() to compute the absolute value and SQRT() to
compute the square root.
• Rounding: Use ROUND() to round the result to two decimal places.

Solutions
PostgreSQL Solution
SELECT
run_id,
battery_model,
ROUND(ABS(charge_energy - discharge_energy) / SQRT(end_date - start_date + 1), 2) AS performance_index
FROM
battery_runs;

MySQL Solution
SELECT
run_id,
battery_model,
ROUND(ABS(charge_energy - discharge_energy) / SQRT(DATEDIFF(end_date, start_date) + 1), 2) AS performance_index
FROM
battery_runs;

Output
The result will return the following columns:
• run_id: The unique identifier for each battery test run.
• battery_model: The model of the battery being tested (e.g., 'Model S', 'Model 3').
• performance_index: The calculated battery performance index, rounded to two decimal
places.

Example Output

run_id battery_model performance_index

1 Model S 0.82

2 Model S 1.73

3 Model 3 1.00

4 Model X 0.95

5 Model 3 2.89
This output shows the performance index for each battery model in each test run, helping to
evaluate the efficiency of the batteries.
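The formula itself is easy to spot-check. A short Python sketch plugging in run 1's numbers from the sample data (charge 100 kWh, discharge 98 kWh, 2021-07-31 through 2021-08-05, counting both end dates as the solutions do):

```python
from datetime import date
from math import sqrt

# Run 1 from the sample data: DAYS counts both the start and end dates,
# matching the "+ 1" in the SQL solutions.
days = (date(2021, 8, 5) - date(2021, 7, 31)).days + 1
performance_index = round(abs(100 - 98) / sqrt(days), 2)
print(days, performance_index)  # 6 0.82
```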
• Q.599

Question
Write an SQL query to calculate the average selling price per Tesla car model for each year.
The result should show the year, model, and average price.

Explanation
To calculate the average selling price per model for each year:
• Extract Year: We will extract the year from the sale_date column using the EXTRACT()
function.
• Group by Year and Model: We will group the data by model_id and the extracted year
to compute the average price for each model within each year.
• Average Calculation: We will use the AVG() function to calculate the average price of
each model for the respective year.

Datasets and SQL Schemas


-- Table creation for sales data
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
model_id VARCHAR(50),
sale_date DATE,
price DECIMAL(10, 2)
);

-- Sample Data for sales


INSERT INTO sales (sale_id, model_id, sale_date, price)
VALUES
(1, 'ModelS', '2018-06-08', 80000),
(2, 'ModelS', '2018-10-12', 79000),
(3, 'ModelX', '2019-09-18', 100000),
(4, 'Model3', '2020-07-26', 38000),
(5, 'Model3', '2020-12-05', 40000),
(6, 'ModelY', '2021-06-08', 50000),
(7, 'ModelY', '2021-10-10', 52000);

Learnings
• Date Extraction: Use the EXTRACT() function to get specific parts of a date (in this case,
the year).
• Aggregation: Use AVG() to calculate the average value of a column.
• Grouping: Group by both the year and the car model to get the average price for each
model per year.

Solutions
PostgreSQL Solution
SELECT EXTRACT(YEAR FROM sale_date) AS year,
model_id AS model,
AVG(price) AS average_price
FROM sales
GROUP BY year, model
ORDER BY year, model;

MySQL Solution
SELECT YEAR(sale_date) AS year,
model_id AS model,
AVG(price) AS average_price
FROM sales
GROUP BY year, model
ORDER BY year, model;

Output
The result will return the following columns:
• year: The year of the sale.
• model: The car model (e.g., 'ModelS', 'ModelX').
• average_price: The average selling price of that model for that year.
• Q.600
Question
Find the Second-Highest Price for Each Tesla Car Model
Write an SQL query to find the second-highest price for each Tesla car model from the Cars
table. If there is no second-highest price (e.g., only one car model in the table), return NULL
for that model.

Explanation
Use the ROW_NUMBER() or RANK() window function to rank car prices within each car model.
Then, filter the results to get the second-highest price for each model. If there is only one
price, return NULL for that car model.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Cars (
CarID INT,
ModelName VARCHAR(255),
Price DECIMAL(10, 2)
);
-- Datasets
INSERT INTO Cars (CarID, ModelName, Price)
VALUES
(1, 'Model S', 79999.99),
(2, 'Model S', 84999.99),
(3, 'Model 3', 39999.99),
(4, 'Model 3', 42999.99),
(5, 'Model X', 99999.99),
(6, 'Model X', 109999.99),
(7, 'Model Y', 52999.99);

Learnings
• Using window functions like ROW_NUMBER() and RANK() to rank items within each group.
• Handling cases where there may be no second-highest value by filtering based on rank.
• Partitioning results by a specific column (e.g., car model) while applying ranking.

Solutions

PostgreSQL solution
In PostgreSQL, the RANK() window function can be used to rank car prices within each
model. We will filter the second-highest ranked price.
WITH RankedPrices AS (
SELECT CarID, ModelName, Price,
RANK() OVER (PARTITION BY ModelName ORDER BY Price DESC) AS price_rank
FROM Cars
)
SELECT CarID, ModelName, Price
FROM RankedPrices
WHERE price_rank = 2;
• RANK() assigns a rank to each car based on the price within each car model (PARTITION BY
ModelName).
• The query then filters only those with a price_rank of 2 to get the second-highest price for each
model.

MySQL solution
In MySQL, you can use the same approach with the RANK() or ROW_NUMBER() window
function to rank the car prices and get the second-highest.
WITH RankedPrices AS (
SELECT CarID, ModelName, Price,
RANK() OVER (PARTITION BY ModelName ORDER BY Price DESC) AS price_rank
FROM Cars
)
SELECT CarID, ModelName, Price
FROM RankedPrices
WHERE price_rank = 2;
• The logic is the same as in PostgreSQL, but the alias is price_rank because RANK is a reserved
word in MySQL 8.0 and cannot be used as a bare column alias. We partition by ModelName, order by
Price in descending order, and then filter for rank 2.
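Note that neither query actually returns NULL for a model with only one price (such as Model Y); such models simply drop out of the result. One way to meet that part of the requirement is to LEFT JOIN the distinct model list onto the rank-2 rows. A hedged sqlite3 sketch (window functions need SQLite 3.25+; the cut-down data and the price_rank alias are illustrative):

```python
import sqlite3

# Sketch (SQLite >= 3.25): models with no second-highest price get NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cars (CarID INT, ModelName TEXT, Price REAL);
INSERT INTO Cars VALUES
  (1, 'Model S', 79999.99),
  (2, 'Model S', 84999.99),
  (7, 'Model Y', 52999.99);
""")
rows = conn.execute("""
    WITH RankedPrices AS (
        SELECT ModelName, Price,
               RANK() OVER (PARTITION BY ModelName ORDER BY Price DESC) AS price_rank
        FROM Cars
    )
    SELECT m.ModelName, r.Price AS second_highest
    FROM (SELECT DISTINCT ModelName FROM Cars) m
    LEFT JOIN RankedPrices r
           ON r.ModelName = m.ModelName AND r.price_rank = 2
    ORDER BY m.ModelName
""").fetchall()
print(rows)  # [('Model S', 79999.99), ('Model Y', None)]
```

The LEFT JOIN keeps every model from the driving list, so Model Y survives with a NULL second-highest price instead of vanishing.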

Tik Tok
• Q.601
Calculate the Total Number of Likes for Each TikTok Video
Problem Statement
Write an SQL query to calculate the total number of likes for each TikTok video. The result
should return the video_id and the total_likes for each video, sorted by the video_id.

Explanation
• Aggregation: Count the total number of likes for each video.
• Grouping: Group the result by video_id to get the total likes for each individual video.
• Sorting: Sort the result by video_id in ascending order.

Datasets and SQL Schemas


-- Table definition for video likes
CREATE TABLE video_likes (
like_id INT PRIMARY KEY,
video_id INT,
user_id INT,
like_date DATE
);

-- Sample data for video_likes


INSERT INTO video_likes (like_id, video_id, user_id, like_date)
VALUES
(1, 101, 201, '2023-01-01'),
(2, 101, 202, '2023-01-02'),
(3, 102, 203, '2023-02-01'),
(4, 103, 204, '2023-03-01'),
(5, 101, 205, '2023-04-01'),
(6, 102, 206, '2023-05-01'),
(7, 104, 207, '2023-06-01'),
(8, 103, 208, '2023-07-01'),
(9, 101, 209, '2023-08-01'),
(10, 105, 210, '2023-09-01');

Solutions
PostgreSQL Solution
SELECT video_id, COUNT(like_id) AS total_likes
FROM video_likes
GROUP BY video_id
ORDER BY video_id;

MySQL Solution
SELECT video_id, COUNT(like_id) AS total_likes
FROM video_likes
GROUP BY video_id
ORDER BY video_id;

Learnings
• Aggregation: The COUNT() function is used to count the total number of likes for each
video.
• Grouping: GROUP BY helps in grouping the results by video_id so that the count is per
video.
• Sorting: ORDER BY video_id ensures the result is sorted by video IDs.
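As a sanity check, the whole pipeline — schema, sample rows, and the query above — can be run as-is through an embedded database. This sketch uses Python's built-in sqlite3 module as a stand-in for PostgreSQL/MySQL (an assumption; this particular query happens to be portable across all three):

```python
import sqlite3

# In-memory SQLite database loaded with the sample video_likes rows from above.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE video_likes (
    like_id INT PRIMARY KEY, video_id INT, user_id INT, like_date TEXT)""")
conn.executemany("INSERT INTO video_likes VALUES (?, ?, ?, ?)", [
    (1, 101, 201, '2023-01-01'), (2, 101, 202, '2023-01-02'),
    (3, 102, 203, '2023-02-01'), (4, 103, 204, '2023-03-01'),
    (5, 101, 205, '2023-04-01'), (6, 102, 206, '2023-05-01'),
    (7, 104, 207, '2023-06-01'), (8, 103, 208, '2023-07-01'),
    (9, 101, 209, '2023-08-01'), (10, 105, 210, '2023-09-01'),
])

# The exact query from the solution: one row per video with its like count.
result = conn.execute("""
    SELECT video_id, COUNT(like_id) AS total_likes
    FROM video_likes
    GROUP BY video_id
    ORDER BY video_id
""").fetchall()
print(result)  # [(101, 4), (102, 2), (103, 2), (104, 1), (105, 1)]
```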

• Q.602
Find the Most Popular TikTok Hashtags
Problem Statement
Write an SQL query to find the top 5 most popular hashtags on TikTok. The result should
return the hashtag and the total_mentions, sorted by the total number of mentions in
descending order.

Explanation
• Count Mentions: Count the number of times each hashtag has been mentioned.
• Top 5: Limit the result to the top 5 most mentioned hashtags.
• Sorting: Sort by the total number of mentions in descending order to show the most
popular hashtags.


Datasets and SQL Schemas


-- Table definition for hashtag mentions
CREATE TABLE hashtag_mentions (
mention_id INT PRIMARY KEY,
hashtag VARCHAR(50),
video_id INT,
mention_date DATE
);

-- Sample data for hashtag_mentions


INSERT INTO hashtag_mentions (mention_id, hashtag, video_id, mention_date)
VALUES
(1, '#dance', 101, '2023-01-01'),
(2, '#fun', 101, '2023-01-02'),
(3, '#dance', 102, '2023-02-01'),
(4, '#fun', 103, '2023-03-01'),
(5, '#challenge', 101, '2023-04-01'),
(6, '#fun', 102, '2023-05-01'),
(7, '#dance', 104, '2023-06-01'),
(8, '#challenge', 103, '2023-07-01'),
(9, '#fun', 104, '2023-08-01'),
(10, '#viral', 105, '2023-09-01');

Solutions
PostgreSQL Solution
SELECT hashtag, COUNT(mention_id) AS total_mentions
FROM hashtag_mentions
GROUP BY hashtag
ORDER BY total_mentions DESC
LIMIT 5;

MySQL Solution
SELECT hashtag, COUNT(mention_id) AS total_mentions
FROM hashtag_mentions
GROUP BY hashtag
ORDER BY total_mentions DESC
LIMIT 5;

Learnings
• Counting: COUNT() is used to count how many times each hashtag has been mentioned.
• Grouping: GROUP BY groups the hashtags so we can aggregate the counts.
• Limiting: LIMIT 5 ensures that only the top 5 hashtags are returned.
• Sorting: Sorting by total_mentions in descending order to find the most popular
hashtags.
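One detail worth noticing: the sample data contains only four distinct hashtags, so LIMIT 5 simply returns all of them. The sketch below makes that visible, with Python's sqlite3 module standing in for PostgreSQL/MySQL (an assumption; the query text is unchanged):

```python
import sqlite3

# Load the sample hashtag_mentions rows from above into in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE hashtag_mentions (
    mention_id INT PRIMARY KEY, hashtag TEXT, video_id INT, mention_date TEXT)""")
conn.executemany("INSERT INTO hashtag_mentions VALUES (?, ?, ?, ?)", [
    (1, '#dance', 101, '2023-01-01'), (2, '#fun', 101, '2023-01-02'),
    (3, '#dance', 102, '2023-02-01'), (4, '#fun', 103, '2023-03-01'),
    (5, '#challenge', 101, '2023-04-01'), (6, '#fun', 102, '2023-05-01'),
    (7, '#dance', 104, '2023-06-01'), (8, '#challenge', 103, '2023-07-01'),
    (9, '#fun', 104, '2023-08-01'), (10, '#viral', 105, '2023-09-01'),
])

top5 = conn.execute("""
    SELECT hashtag, COUNT(mention_id) AS total_mentions
    FROM hashtag_mentions
    GROUP BY hashtag
    ORDER BY total_mentions DESC
    LIMIT 5
""").fetchall()
# Only four distinct hashtags exist, so LIMIT 5 returns all of them.
print(top5)  # [('#fun', 4), ('#dance', 3), ('#challenge', 2), ('#viral', 1)]
```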
• Q.603
Identify TikTok Users with the Most Posts
Problem Statement
Write an SQL query to identify the top 3 TikTok users with the most posts. The result should
return the user_id and the total_posts, sorted by the total number of posts in descending
order.

Explanation
• Count Posts: Count the number of posts each user has made.


• Top 3: Limit the result to the top 3 users with the most posts.
• Sorting: Sort the results by the number of posts in descending order to identify the most
active users.

Datasets and SQL Schemas


-- Table definition for user posts
CREATE TABLE user_posts (
post_id INT PRIMARY KEY,
user_id INT,
post_date DATE,
content TEXT
);

-- Sample data for user_posts


INSERT INTO user_posts (post_id, user_id, post_date, content)
VALUES
(1, 101, '2023-01-01', 'Dancing to a new song'),
(2, 102, '2023-01-05', 'Check out this new trend'),
(3, 101, '2023-01-10', 'Another dance video'),
(4, 103, '2023-02-01', 'Tutorial on TikTok'),
(5, 102, '2023-02-10', 'Reacting to a funny video'),
(6, 104, '2023-03-01', 'Making a funny meme'),
(7, 101, '2023-03-05', 'Singing my favorite song'),
(8, 105, '2023-04-01', 'TikTok cooking recipe'),
(9, 103, '2023-04-05', 'Sharing life tips'),
(10, 102, '2023-05-01', 'Dance choreography video');

Solutions
PostgreSQL Solution
SELECT user_id, COUNT(post_id) AS total_posts
FROM user_posts
GROUP BY user_id
ORDER BY total_posts DESC
LIMIT 3;

MySQL Solution
SELECT user_id, COUNT(post_id) AS total_posts
FROM user_posts
GROUP BY user_id
ORDER BY total_posts DESC
LIMIT 3;

Learnings
• Counting: COUNT(post_id) counts the number of posts per user.
• Grouping: The GROUP BY user_id groups the posts by user, enabling the calculation of
total posts per user.
• Limiting: LIMIT 3 ensures the query returns the top 3 users.
• Sorting: Sorting the results by total_posts in descending order gives the users with the
most posts.
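Here the sample data hides a tie: users 101 and 102 both have three posts, so ORDER BY total_posts DESC alone leaves their relative order engine-dependent. The sketch below (Python's sqlite3 as a stand-in engine, an assumption) adds user_id as a secondary sort key so the result is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE user_posts (
    post_id INT PRIMARY KEY, user_id INT, post_date TEXT, content TEXT)""")
conn.executemany("INSERT INTO user_posts VALUES (?, ?, ?, ?)", [
    (1, 101, '2023-01-01', 'Dancing to a new song'),
    (2, 102, '2023-01-05', 'Check out this new trend'),
    (3, 101, '2023-01-10', 'Another dance video'),
    (4, 103, '2023-02-01', 'Tutorial on TikTok'),
    (5, 102, '2023-02-10', 'Reacting to a funny video'),
    (6, 104, '2023-03-01', 'Making a funny meme'),
    (7, 101, '2023-03-05', 'Singing my favorite song'),
    (8, 105, '2023-04-01', 'TikTok cooking recipe'),
    (9, 103, '2023-04-05', 'Sharing life tips'),
    (10, 102, '2023-05-01', 'Dance choreography video'),
])

# Users 101 and 102 are tied at 3 posts; the extra user_id sort key makes
# the tie-break deterministic (the plain query leaves it engine-dependent).
top3 = conn.execute("""
    SELECT user_id, COUNT(post_id) AS total_posts
    FROM user_posts
    GROUP BY user_id
    ORDER BY total_posts DESC, user_id
    LIMIT 3
""").fetchall()
print(top3)  # [(101, 3), (102, 3), (103, 2)]
```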

• Q.604

Calculate the Average Video Duration Based on Likes


Problem Statement


Write an SQL query to calculate the average duration of TikTok videos (in seconds) for each
video category. The result should return the category, the average_duration of the
videos, and the total number of likes for that category.

Explanation
• Group by Category: Group the result by category to calculate the average duration and
likes for each category.
• Aggregate Functions: Use AVG() to calculate the average video duration and count the
likes for each category.
• Join Tables: Join the video_details table (which includes video metadata) with the
video_likes table; aggregate the likes per video before averaging, so that a video's
duration is not weighted by how many likes it received.

Datasets and SQL Schemas


-- Table definition for video details
CREATE TABLE video_details (
video_id INT PRIMARY KEY,
category VARCHAR(50),
video_duration INT -- Duration in seconds
);

-- Table definition for video likes


CREATE TABLE video_likes (
like_id INT PRIMARY KEY,
video_id INT,
user_id INT
);

-- Sample data for video_details


INSERT INTO video_details (video_id, category, video_duration)
VALUES
(1, 'Dance', 120),
(2, 'Cooking', 150),
(3, 'Dance', 130),
(4, 'Travel', 180),
(5, 'Cooking', 200),
(6, 'Dance', 125);

-- Sample data for video_likes


INSERT INTO video_likes (like_id, video_id, user_id)
VALUES
(1, 1, 101),
(2, 1, 102),
(3, 2, 103),
(4, 3, 104),
(5, 3, 105),
(6, 4, 106),
(7, 5, 107);

Solutions
PostgreSQL Solution
SELECT category,
       AVG(video_duration) AS average_duration,
       SUM(like_count) AS total_likes
FROM (
    SELECT v.video_id, v.category, v.video_duration, COUNT(l.like_id) AS like_count
    FROM video_details v
    LEFT JOIN video_likes l ON v.video_id = l.video_id
    GROUP BY v.video_id, v.category, v.video_duration
) AS per_video
GROUP BY category;

MySQL Solution


SELECT category,
       AVG(video_duration) AS average_duration,
       SUM(like_count) AS total_likes
FROM (
    SELECT v.video_id, v.category, v.video_duration, COUNT(l.like_id) AS like_count
    FROM video_details v
    LEFT JOIN video_likes l ON v.video_id = l.video_id
    GROUP BY v.video_id, v.category, v.video_duration
) AS per_video
GROUP BY category;

Learnings
• Aggregating by Category: Using GROUP BY allows you to calculate the average duration
and total likes for each video category.
• Join Operations: By joining two tables, you can gather data from multiple sources (e.g.,
video metadata and likes).
• Use of Aggregate Functions: AVG() and COUNT() allow for calculating averages and totals
over groups of records.
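A subtlety worth knowing with this pattern: taking AVG() directly over the joined rows weights each video's duration by its like count, because a video with three likes appears three times in the join. Aggregating per video first avoids that. The sketch below shows the gap on a tiny synthetic two-video dataset (not the sample data above), using Python's sqlite3 as a stand-in engine:

```python
import sqlite3

# Synthetic data: video 1 (100s) has three likes, video 2 (200s) has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE video_details (video_id INT PRIMARY KEY, category TEXT, video_duration INT);
CREATE TABLE video_likes (like_id INT PRIMARY KEY, video_id INT, user_id INT);
INSERT INTO video_details VALUES (1, 'Dance', 100), (2, 'Dance', 200);
INSERT INTO video_likes VALUES (1, 1, 901), (2, 1, 902), (3, 1, 903), (4, 2, 904);
""")

# Joined AVG: video 1 contributes three rows, video 2 one row -> weighted.
joined = conn.execute("""
    SELECT AVG(v.video_duration)
    FROM video_details v
    JOIN video_likes l ON v.video_id = l.video_id
""").fetchone()[0]

# Per-video aggregation first, then average -> each video counted once.
per_video = conn.execute("""
    SELECT AVG(video_duration)
    FROM (
        SELECT v.video_id, v.video_duration
        FROM video_details v
        LEFT JOIN video_likes l ON v.video_id = l.video_id
        GROUP BY v.video_id, v.video_duration
    )
""").fetchone()[0]

print(joined, per_video)  # 125.0 150.0
```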
• Q.605
Identify the Top 3 Most Active TikTok Users in Terms of Content Creation
Problem Statement
Write an SQL query to identify the top 3 most active TikTok users based on the number of
videos they have uploaded. The result should return the user_id, total_videos_uploaded,
and the user_name (if available) sorted by the number of videos uploaded in descending
order.

Explanation
• Count Videos: Count the number of videos each user has uploaded.
• Limit the Results: Return the top 3 users with the most videos.
• Sort the Results: Sort the users by the number of videos uploaded, in descending order.

Datasets and SQL Schemas


-- Table definition for user profiles
CREATE TABLE user_profiles (
user_id INT PRIMARY KEY,
user_name VARCHAR(100)
);

-- Table definition for user videos


CREATE TABLE user_videos (
video_id INT PRIMARY KEY,
user_id INT,
upload_date DATE,
video_title VARCHAR(100)
);

-- Sample data for user_profiles


INSERT INTO user_profiles (user_id, user_name)
VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie'),
(4, 'David'),
(5, 'Eva');

-- Sample data for user_videos


INSERT INTO user_videos (video_id, user_id, upload_date, video_title)
VALUES

(1, 1, '2023-01-01', 'Video 1'),
(2, 2, '2023-01-02', 'Video 2'),
(3, 1, '2023-01-05', 'Video 3'),
(4, 1, '2023-01-10', 'Video 4'),
(5, 3, '2023-01-12', 'Video 5'),
(6, 2, '2023-01-15', 'Video 6'),
(7, 2, '2023-01-18', 'Video 7'),
(8, 3, '2023-02-01', 'Video 8'),
(9, 4, '2023-02-02', 'Video 9'),
(10, 5, '2023-02-05', 'Video 10');

Solutions
PostgreSQL Solution
SELECT u.user_id,
COUNT(v.video_id) AS total_videos_uploaded,
u.user_name
FROM user_videos v
JOIN user_profiles u ON v.user_id = u.user_id
GROUP BY u.user_id, u.user_name
ORDER BY total_videos_uploaded DESC
LIMIT 3;

MySQL Solution
SELECT u.user_id,
COUNT(v.video_id) AS total_videos_uploaded,
u.user_name
FROM user_videos v
JOIN user_profiles u ON v.user_id = u.user_id
GROUP BY u.user_id, u.user_name
ORDER BY total_videos_uploaded DESC
LIMIT 3;

Learnings
• Counting User Contributions: COUNT() helps in identifying how many videos each user
has uploaded.
• Sorting and Limiting: Sorting by total_videos_uploaded and limiting the result helps
to identify the most active users.
• Join Operations: Joining the user_profiles and user_videos tables allows retrieving
user details alongside the count of videos.
• Q.606
Calculate the Average Number of Comments per Video Category
Problem Statement
Write an SQL query to calculate the average number of comments per video category for
TikTok videos. The result should return the category, the average_comments, and the
total_comments for each category.

Explanation
• Join Tables: Join the video_details table (containing video metadata) with the
video_comments table (containing comments for videos).
• Group by Category: Group the result by category to calculate the average number of
comments per category.


• Aggregation: Use COUNT() to get the total number of comments and AVG() to get the
average number of comments.

Datasets and SQL Schemas


-- Table definition for video categories and details
CREATE TABLE video_details (
video_id INT PRIMARY KEY,
category VARCHAR(50),
video_duration INT
);

-- Table definition for video comments


CREATE TABLE video_comments (
comment_id INT PRIMARY KEY,
video_id INT,
comment_text TEXT,
comment_date DATE
);

-- Sample data for video_details


INSERT INTO video_details (video_id, category, video_duration)
VALUES
(1, 'Dance', 120),
(2, 'Cooking', 150),
(3, 'Dance', 130),
(4, 'Travel', 180),
(5, 'Cooking', 200),
(6, 'Dance', 125);

-- Sample data for video_comments


INSERT INTO video_comments (comment_id, video_id, comment_text, comment_date)
VALUES
(1, 1, 'Nice moves!', '2023-01-01'),
(2, 1, 'Amazing!', '2023-01-02'),
(3, 2, 'Looks delicious!', '2023-02-01'),
(4, 3, 'Great choreography!', '2023-03-01'),
(5, 3, 'Love this dance!', '2023-03-05'),
(6, 4, 'Nice trip!', '2023-04-01'),
(7, 4, 'Beautiful view!', '2023-04-02');

Solutions
PostgreSQL Solution
SELECT category,
       AVG(comment_count) AS average_comments,
       SUM(comment_count) AS total_comments
FROM (
    SELECT v.video_id, v.category, COUNT(c.comment_id) AS comment_count
    FROM video_details v
    LEFT JOIN video_comments c ON v.video_id = c.video_id
    GROUP BY v.video_id, v.category
) AS subquery
GROUP BY category;

MySQL Solution
SELECT category,
       AVG(comment_count) AS average_comments,
       SUM(comment_count) AS total_comments
FROM (
    SELECT v.video_id, v.category, COUNT(c.comment_id) AS comment_count
    FROM video_details v
    LEFT JOIN video_comments c ON v.video_id = c.video_id
    GROUP BY v.video_id, v.category
) AS subquery
GROUP BY category;

Learnings
• Left Join: Using LEFT JOIN ensures that videos without comments are still included in the
result with a count of 0.
• Aggregating Counts: COUNT() is used to get the number of comments per video, and
AVG() and SUM() help with calculating averages and totals at the category level.
• Subqueries: A subquery is used to aggregate the comment counts for each video, which is
then aggregated again by category.
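The effect of the LEFT JOIN is easiest to see at the per-video level: in the sample data, videos 5 and 6 have no comments at all. The sketch below (Python's sqlite3 standing in for PostgreSQL/MySQL, an assumption) shows they still surface with a count of 0, which keeps the category averages honest:

```python
import sqlite3

# Sample video_details and video_comments data from above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE video_details (video_id INT PRIMARY KEY, category TEXT, video_duration INT);
CREATE TABLE video_comments (comment_id INT PRIMARY KEY, video_id INT,
                             comment_text TEXT, comment_date TEXT);
INSERT INTO video_details VALUES
    (1, 'Dance', 120), (2, 'Cooking', 150), (3, 'Dance', 130),
    (4, 'Travel', 180), (5, 'Cooking', 200), (6, 'Dance', 125);
INSERT INTO video_comments VALUES
    (1, 1, 'Nice moves!', '2023-01-01'), (2, 1, 'Amazing!', '2023-01-02'),
    (3, 2, 'Looks delicious!', '2023-02-01'), (4, 3, 'Great choreography!', '2023-03-01'),
    (5, 3, 'Love this dance!', '2023-03-05'), (6, 4, 'Nice trip!', '2023-04-01'),
    (7, 4, 'Beautiful view!', '2023-04-02');
""")

# Videos 5 and 6 have no comments yet still appear with a count of 0;
# an INNER JOIN would drop them and inflate the per-category averages.
per_video = conn.execute("""
    SELECT v.video_id, COUNT(c.comment_id) AS comment_count
    FROM video_details v
    LEFT JOIN video_comments c ON v.video_id = c.video_id
    GROUP BY v.video_id
    ORDER BY v.video_id
""").fetchall()
print(per_video)  # [(1, 2), (2, 1), (3, 2), (4, 2), (5, 0), (6, 0)]
```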
• Q.607

Identify the Most Popular Time of Day for TikTok Video Uploads
Problem Statement
Write an SQL query to identify the most popular time of day (hour of the day) for video
uploads on TikTok. The result should return the hour of the day (upload_hour), the number
of videos uploaded during that hour (total_uploads), and the percentage of total uploads
that occurred during that hour.

Explanation
• Extract Hour: Use EXTRACT() or HOUR() to extract the hour from the upload_date.
• Count Uploads: Count the number of video uploads for each hour.
• Calculate Percentage: Calculate the percentage of uploads for each hour based on the
total number of uploads.

Datasets and SQL Schemas


-- Table definition for video uploads
CREATE TABLE video_uploads (
video_id INT PRIMARY KEY,
user_id INT,
upload_date DATETIME,
video_title VARCHAR(100)
);

-- Sample data for video_uploads


INSERT INTO video_uploads (video_id, user_id, upload_date, video_title)
VALUES
(1, 101, '2024-01-01 08:30:00', 'Morning Dance'),
(2, 102, '2024-01-01 09:15:00', 'Cooking Tutorial'),
(3, 103, '2024-01-01 11:45:00', 'Afternoon Adventure'),
(4, 104, '2024-01-01 12:10:00', 'Lunch Break Fun'),
(5, 105, '2024-01-01 14:00:00', 'Fitness Routine'),
(6, 106, '2024-01-01 16:30:00', 'Evening Ride'),
(7, 107, '2024-01-01 17:00:00', 'Dance Challenge'),
(8, 108, '2024-01-01 18:30:00', 'Music Jam'),
(9, 109, '2024-01-01 20:00:00', 'Night Party'),
(10, 110, '2024-01-01 23:00:00', 'Late Night Routine');

Solutions
PostgreSQL Solution


WITH hourly_uploads AS (
SELECT EXTRACT(HOUR FROM upload_date) AS upload_hour,
COUNT(video_id) AS total_uploads
FROM video_uploads
GROUP BY upload_hour
)
SELECT upload_hour,
total_uploads,
       ROUND((total_uploads::DECIMAL / (SELECT COUNT(*) FROM video_uploads)) * 100, 2) AS upload_percentage
FROM hourly_uploads
ORDER BY total_uploads DESC;

MySQL Solution
WITH hourly_uploads AS (
SELECT HOUR(upload_date) AS upload_hour,
COUNT(video_id) AS total_uploads
FROM video_uploads
GROUP BY upload_hour
)
SELECT upload_hour,
total_uploads,
       ROUND((total_uploads / (SELECT COUNT(*) FROM video_uploads)) * 100, 2) AS upload_percentage
FROM hourly_uploads
ORDER BY total_uploads DESC;

Learnings
• Extracting Date Components: Using EXTRACT() or HOUR() helps to break down the
timestamp into usable parts like the hour.
• Aggregation and Grouping: Aggregating the number of uploads per hour helps to
determine peak times.
• Calculating Percentages: Using a subquery to get the total number of uploads enables the
calculation of upload percentages.
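The same mechanics can be exercised locally. SQLite offers neither EXTRACT(HOUR FROM ...) nor HOUR(), so the sketch below swaps in strftime('%H', ...) — that substitution, plus a trimmed synthetic sample chosen so one hour clearly peaks, are the only departures from the solution above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE video_uploads (
    video_id INT PRIMARY KEY, user_id INT, upload_date TEXT, video_title TEXT)""")
# A trimmed, synthetic sample (not the table above) so one hour clearly peaks.
conn.executemany("INSERT INTO video_uploads VALUES (?, ?, ?, ?)", [
    (1, 101, '2024-01-01 08:30:00', 'Morning Dance'),
    (2, 102, '2024-01-01 08:45:00', 'Sunrise Run'),
    (3, 103, '2024-01-01 09:15:00', 'Cooking Tutorial'),
    (4, 104, '2024-01-01 20:00:00', 'Night Party'),
])

# strftime('%H', ...) is SQLite's stand-in for EXTRACT(HOUR FROM ...) / HOUR().
hourly = conn.execute("""
    SELECT CAST(strftime('%H', upload_date) AS INTEGER) AS upload_hour,
           COUNT(*) AS total_uploads,
           ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM video_uploads), 2) AS pct
    FROM video_uploads
    GROUP BY upload_hour
    ORDER BY total_uploads DESC, upload_hour
""").fetchall()
print(hourly)  # [(8, 2, 50.0), (9, 1, 25.0), (20, 1, 25.0)]
```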
• Q.608

Track the Growth of TikTok Influencers by Follower Count


Problem Statement
Write an SQL query to calculate the change in follower count for TikTok influencers over the
last 3 months. The query should return the influencer’s user_id, their user_name, the
follower count at the start of the period, the follower count at the end of the period, and the
percentage change in their follower count.

Explanation
• Track Follower Count: Use user_id to track the follower count for each influencer over
the last three months.
• Calculate Percentage Change: Calculate the percentage change in follower count
between the start and end of the 3-month period.
• Consider Date Filtering: Filter the records to only consider data within the last 3 months.

Datasets and SQL Schemas


-- Table definition for user profiles
-- One row per user per snapshot date, so the key must include the date
-- (a single-column PRIMARY KEY on user_id would reject the sample inserts below)
CREATE TABLE user_profiles (
    user_id INT,
    user_name VARCHAR(100),
    follower_count INT,
    profile_update_date DATE,
    PRIMARY KEY (user_id, profile_update_date)
);

-- Sample data for user_profiles


INSERT INTO user_profiles (user_id, user_name, follower_count, profile_update_date)
VALUES
(1, 'Alice', 15000, '2024-01-05'),
(2, 'Bob', 12000, '2024-01-10'),
(3, 'Charlie', 18000, '2024-01-15'),
(1, 'Alice', 16000, '2024-02-05'),
(2, 'Bob', 12500, '2024-02-10'),
(3, 'Charlie', 18500, '2024-02-15'),
(1, 'Alice', 17000, '2024-03-05'),
(2, 'Bob', 13000, '2024-03-10'),
(3, 'Charlie', 19000, '2024-03-15');

Solutions
PostgreSQL Solution
WITH follower_changes AS (
SELECT user_id, user_name,
           FIRST_VALUE(follower_count) OVER (PARTITION BY user_id ORDER BY profile_update_date) AS start_follower_count,
           LAST_VALUE(follower_count) OVER (PARTITION BY user_id ORDER BY profile_update_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS end_follower_count
FROM user_profiles
WHERE profile_update_date BETWEEN '2024-01-01' AND '2024-03-31'
)
SELECT user_id, user_name, start_follower_count, end_follower_count,
       ROUND(((end_follower_count - start_follower_count)::DECIMAL / start_follower_count) * 100, 2) AS percentage_change
FROM follower_changes;

MySQL Solution
WITH follower_changes AS (
SELECT user_id, user_name,
           FIRST_VALUE(follower_count) OVER (PARTITION BY user_id ORDER BY profile_update_date) AS start_follower_count,
           LAST_VALUE(follower_count) OVER (PARTITION BY user_id ORDER BY profile_update_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS end_follower_count
FROM user_profiles
WHERE profile_update_date BETWEEN '2024-01-01' AND '2024-03-31'
)
SELECT user_id, user_name, start_follower_count, end_follower_count,
       ROUND(((end_follower_count - start_follower_count) / start_follower_count) * 100, 2) AS percentage_change
FROM follower_changes;

Learnings
• Date Filtering: By using the WHERE clause to filter the records based on date ranges, you
can track changes over a specific period.
• Window Functions: The use of FIRST_VALUE() and LAST_VALUE() allows you to fetch
the start and end follower counts for each user.
• Percentage Calculation: The percentage change formula helps in understanding the
growth of each influencer’s followers.
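The window-function mechanics can also be traced on an embedded engine. This sketch uses Python's sqlite3 (window functions require SQLite 3.25+, an assumption about the runtime); note that DISTINCT is what collapses the per-row window output down to one summary row per user:

```python
import sqlite3  # bundled SQLite must be >= 3.25 for window functions

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE user_profiles (
    user_id INT, user_name TEXT, follower_count INT, profile_update_date TEXT)""")
conn.executemany("INSERT INTO user_profiles VALUES (?, ?, ?, ?)", [
    (1, 'Alice', 15000, '2024-01-05'), (2, 'Bob', 12000, '2024-01-10'),
    (3, 'Charlie', 18000, '2024-01-15'), (1, 'Alice', 16000, '2024-02-05'),
    (2, 'Bob', 12500, '2024-02-10'), (3, 'Charlie', 18500, '2024-02-15'),
    (1, 'Alice', 17000, '2024-03-05'), (2, 'Bob', 13000, '2024-03-10'),
    (3, 'Charlie', 19000, '2024-03-15'),
])

# FIRST_VALUE/LAST_VALUE emit one result per input row, so DISTINCT collapses
# each user's three monthly rows into a single summary row.
growth = conn.execute("""
    SELECT DISTINCT user_id, user_name,
           FIRST_VALUE(follower_count) OVER w AS start_count,
           LAST_VALUE(follower_count)  OVER w AS end_count,
           ROUND((LAST_VALUE(follower_count) OVER w
                  - FIRST_VALUE(follower_count) OVER w) * 100.0
                 / FIRST_VALUE(follower_count) OVER w, 2) AS pct_change
    FROM user_profiles
    WINDOW w AS (PARTITION BY user_id ORDER BY profile_update_date
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ORDER BY user_id
""").fetchall()
print(growth)
# [(1, 'Alice', 15000, 17000, 13.33), (2, 'Bob', 12000, 13000, 8.33),
#  (3, 'Charlie', 18000, 19000, 5.56)]
```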
• Q.609
Identify the Most Popular Hashtags for TikTok Videos in 2024
Problem Statement


Write an SQL query to identify the most popular hashtags used in TikTok videos in 2024.
The query should return the hashtag, the number of times it was used, and the percentage of
total hashtag uses for that year.

Explanation
• Count Hashtags: Count the number of times each hashtag appears in the videos uploaded
in 2024.
• Calculate Percentage: Calculate the percentage of total hashtag usage for each hashtag.
• Filter by Year: Ensure that only hashtags used in 2024 are considered.

Datasets and SQL Schemas


-- Table definition for video details
CREATE TABLE video_details (
video_id INT PRIMARY KEY,
upload_date DATE,
video_title VARCHAR(100)
);

-- Table definition for video hashtags


CREATE TABLE video_hashtags (
hashtag_id INT PRIMARY KEY,
video_id INT,
hashtag VARCHAR(50)
);

-- Sample data for video_details


INSERT INTO video_details (video_id, upload_date, video_title)
VALUES
(1, '2024-01-01', 'Dance Challenge'),
(2, '2024-02-05', 'Cooking Recipe'),
(3, '2024-03-10', 'Fitness Routine'),
(4, '2024-03-20', 'Travel Vlog'),
(5, '2024-04-15', 'Dance Moves'),
(6, '2024-05-20', 'Night Routine');

-- Sample data for video_hashtags


INSERT INTO video_hashtags (hashtag_id, video_id, hashtag)
VALUES
(1, 1, '#dance'),
(2, 2, '#cooking'),
(3, 3, '#fitness'),
(4, 4, '#travel'),
(5, 5, '#dance'),
(6, 6, '#night');

Postgres Solution
WITH hashtag_counts AS (
    SELECT h.hashtag, COUNT(h.hashtag_id) AS hashtag_count
    FROM video_hashtags h
    JOIN video_details v ON h.video_id = v.video_id
    WHERE EXTRACT(YEAR FROM v.upload_date) = 2024
    GROUP BY h.hashtag
)
SELECT hashtag, hashtag_count,
       ROUND((hashtag_count::DECIMAL / (SELECT SUM(hashtag_count) FROM hashtag_counts)) * 100, 2) AS percentage
FROM hashtag_counts
ORDER BY hashtag_count DESC;

MySQL Solution
WITH hashtag_counts AS (
    SELECT h.hashtag, COUNT(h.hashtag_id) AS hashtag_count
    FROM video_hashtags h
    JOIN video_details v ON h.video_id = v.video_id
    WHERE YEAR(v.upload_date) = 2024
    GROUP BY h.hashtag
)
SELECT hashtag, hashtag_count,
       ROUND((hashtag_count / (SELECT SUM(hashtag_count) FROM hashtag_counts)) * 100, 2) AS percentage
FROM hashtag_counts
ORDER BY hashtag_count DESC;

Learnings
• Using JOIN: Joining video_hashtags and video_details allows us to count hashtags
from videos uploaded in a specific year.
• Date Filtering: The EXTRACT(YEAR FROM ...) or YEAR() function is used to ensure we
only consider videos uploaded in 2024.
• Percentage Calculation: The percentage shows how popular each hashtag was relative to
all hashtag uses in that year.
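A sketch of the same idea on an embedded engine (Python's sqlite3, with strftime('%Y', ...) standing in for EXTRACT(YEAR ...)/YEAR() — both substitutions are assumptions about the demo engine). Summing over the CTE keeps the percentage denominator restricted to the same year as the counts, and a secondary sort key puts the tied one-mention hashtags in a stable order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE video_details (video_id INT PRIMARY KEY, upload_date TEXT, video_title TEXT);
CREATE TABLE video_hashtags (hashtag_id INT PRIMARY KEY, video_id INT, hashtag TEXT);
INSERT INTO video_details VALUES
    (1, '2024-01-01', 'Dance Challenge'), (2, '2024-02-05', 'Cooking Recipe'),
    (3, '2024-03-10', 'Fitness Routine'), (4, '2024-03-20', 'Travel Vlog'),
    (5, '2024-04-15', 'Dance Moves'), (6, '2024-05-20', 'Night Routine');
INSERT INTO video_hashtags VALUES
    (1, 1, '#dance'), (2, 2, '#cooking'), (3, 3, '#fitness'),
    (4, 4, '#travel'), (5, 5, '#dance'), (6, 6, '#night');
""")

# strftime('%Y', ...) replaces EXTRACT(YEAR ...) / YEAR() in SQLite; the
# denominator sums over the CTE so it is restricted to 2024 as well.
pct = conn.execute("""
    WITH hashtag_counts AS (
        SELECT h.hashtag, COUNT(h.hashtag_id) AS hashtag_count
        FROM video_hashtags h
        JOIN video_details v ON h.video_id = v.video_id
        WHERE strftime('%Y', v.upload_date) = '2024'
        GROUP BY h.hashtag
    )
    SELECT hashtag, hashtag_count,
           ROUND(hashtag_count * 100.0 / (SELECT SUM(hashtag_count) FROM hashtag_counts), 2)
    FROM hashtag_counts
    ORDER BY hashtag_count DESC, hashtag
""").fetchall()
print(pct)
# [('#dance', 2, 33.33), ('#cooking', 1, 16.67), ('#fitness', 1, 16.67),
#  ('#night', 1, 16.67), ('#travel', 1, 16.67)]
```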
• Q.610
Analyze Video Engagement by Week
Problem Statement
Write an SQL query to analyze the total number of views and likes for TikTok videos
uploaded each week in 2024. The query should return the week number (week_of_year), the
total views, total likes, and the average views per like for each week.

Explanation
• Extract Week Information: Use EXTRACT(WEEK FROM ...) or WEEK() to extract the
week number of the year from the video upload date.
• Aggregate Data: Use SUM() to calculate total views and likes for each week.
• Calculate Views per Like: Calculate the average views per like by dividing the total
views by the total likes.

Datasets and SQL Schemas


-- Table definition for video details
CREATE TABLE video_details (
video_id INT PRIMARY KEY,
user_id INT,
upload_date DATETIME,
views INT,
likes INT,
video_title VARCHAR(100)
);

-- Sample data for video_details


INSERT INTO video_details (video_id, user_id, upload_date, views, likes, video_title)
VALUES
(1, 101, '2024-01-01 10:00:00', 1000, 200, 'Dance Challenge'),
(2, 102, '2024-01-03 12:00:00', 2000, 300, 'Cooking Recipe'),
(3, 103, '2024-01-08 14:00:00', 3000, 500, 'Fitness Routine'),
(4, 104, '2024-01-12 09:00:00', 1500, 250, 'Travel Vlog'),
(5, 105, '2024-01-20 16:00:00', 2500, 400, 'Dance Moves'),
(6, 106, '2024-01-22 11:00:00', 3500, 600, 'Night Routine'),
(7, 107, '2024-02-01 08:00:00', 5000, 700, 'Morning Motivation'),
(8, 108, '2024-02-05 17:00:00', 4500, 650, 'Cooking Tips'),
(9, 109, '2024-02-10 19:00:00', 2200, 300, 'Fitness Hacks'),
(10, 110, '2024-02-15 13:00:00', 3200, 400, 'Dance Party');

Solutions
PostgreSQL Solution
SELECT EXTRACT(WEEK FROM upload_date) AS week_of_year,
SUM(views) AS total_views,
SUM(likes) AS total_likes,
ROUND(SUM(views)::DECIMAL / NULLIF(SUM(likes), 0), 2) AS views_per_like
FROM video_details
WHERE EXTRACT(YEAR FROM upload_date) = 2024
GROUP BY week_of_year
ORDER BY week_of_year;

MySQL Solution
SELECT WEEK(upload_date) AS week_of_year,
SUM(views) AS total_views,
SUM(likes) AS total_likes,
ROUND(SUM(views) / NULLIF(SUM(likes), 0), 2) AS views_per_like
FROM video_details
WHERE YEAR(upload_date) = 2024
GROUP BY week_of_year
ORDER BY week_of_year;

Learnings
• Aggregating Data by Week: Using EXTRACT(WEEK FROM ...) or WEEK() helps group
data by week, allowing analysis of trends over time.
• Handling Division by Zero: The NULLIF function ensures that division by zero does not
occur when calculating views per like.
• Date Filtering: The WHERE clause ensures that only videos uploaded in 2024 are included
in the result.
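The NULLIF guard is worth seeing in action. In the synthetic rows below (not the sample data above), week 2 has views but zero likes: NULLIF(SUM(likes), 0) turns the divisor into NULL, so the ratio comes back NULL instead of failing. (PostgreSQL would raise a division-by-zero error without it; SQLite, used here as the demo engine, would silently return NULL either way, but NULLIF makes the intent explicit and portable.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weekly (week_of_year INT, views INT, likes INT)")
# Synthetic rows: week 2 received views but zero likes.
conn.executemany("INSERT INTO weekly VALUES (?, ?, ?)",
                 [(1, 1000, 200), (1, 2000, 300), (2, 500, 0)])

# NULLIF(SUM(likes), 0) turns a zero divisor into NULL, so the ratio
# becomes NULL (None in Python) instead of a division-by-zero failure.
rows = conn.execute("""
    SELECT week_of_year, SUM(views), SUM(likes),
           ROUND(SUM(views) * 1.0 / NULLIF(SUM(likes), 0), 2) AS views_per_like
    FROM weekly
    GROUP BY week_of_year
    ORDER BY week_of_year
""").fetchall()
print(rows)  # [(1, 3000, 500, 6.0), (2, 500, 0, None)]
```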
• Q.611
Identify Top Performing TikTok Influencers by Total Likes
Problem Statement
Write an SQL query to identify the top 5 TikTok influencers based on the total number of
likes received on their videos in 2024. The query should return the influencer's user_id,
user_name, and the total likes they have received, sorted by total likes in descending order.

Explanation
• Sum of Likes: Use SUM(likes) to calculate the total likes for each influencer.
• Sort and Limit: Sort the results by total likes in descending order and return only the top 5
influencers.

Datasets and SQL Schemas


-- Table definition for influencer profiles
CREATE TABLE influencer_profiles (
user_id INT PRIMARY KEY,
user_name VARCHAR(100)
);

-- Table definition for video details


CREATE TABLE video_details (


video_id INT PRIMARY KEY,
user_id INT,
upload_date DATETIME,
views INT,
likes INT,
video_title VARCHAR(100)
);

-- Sample data for influencer_profiles


INSERT INTO influencer_profiles (user_id, user_name)
VALUES
(101, 'Alice'),
(102, 'Bob'),
(103, 'Charlie'),
(104, 'David'),
(105, 'Emma'),
(106, 'Frank'),
(107, 'Grace'),
(108, 'Hank'),
(109, 'Ivy'),
(110, 'Jack');

-- Sample data for video_details


INSERT INTO video_details (video_id, user_id, upload_date, views, likes, video_title)
VALUES
(1, 101, '2024-01-01 10:00:00', 1000, 200, 'Dance Challenge'),
(2, 102, '2024-01-03 12:00:00', 2000, 300, 'Cooking Recipe'),
(3, 103, '2024-01-08 14:00:00', 3000, 500, 'Fitness Routine'),
(4, 101, '2024-01-12 09:00:00', 1500, 250, 'Travel Vlog'),
(5, 104, '2024-01-20 16:00:00', 2500, 400, 'Dance Moves'),
(6, 105, '2024-01-22 11:00:00', 3500, 600, 'Night Routine'),
(7, 106, '2024-02-01 08:00:00', 5000, 700, 'Morning Motivation'),
(8, 107, '2024-02-05 17:00:00', 4500, 650, 'Cooking Tips'),
(9, 108, '2024-02-10 19:00:00', 2200, 300, 'Fitness Hacks'),
(10, 109, '2024-02-15 13:00:00', 3200, 400, 'Dance Party');

Solutions
PostgreSQL Solution
SELECT i.user_id, i.user_name, SUM(v.likes) AS total_likes
FROM influencer_profiles i
JOIN video_details v ON i.user_id = v.user_id
WHERE EXTRACT(YEAR FROM v.upload_date) = 2024
GROUP BY i.user_id, i.user_name
ORDER BY total_likes DESC
LIMIT 5;

MySQL Solution
SELECT i.user_id, i.user_name, SUM(v.likes) AS total_likes
FROM influencer_profiles i
JOIN video_details v ON i.user_id = v.user_id
WHERE YEAR(v.upload_date) = 2024
GROUP BY i.user_id, i.user_name
ORDER BY total_likes DESC
LIMIT 5;

Learnings
• JOIN Operation: Joining the influencer_profiles table with the video_details table
allows you to aggregate likes by influencer.
• LIMIT: Using LIMIT 5 ensures that only the top 5 influencers are returned.
• Date Filtering: The WHERE clause filters the data to include only videos uploaded in 2024.
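An end-to-end check of the join-aggregate-limit pipeline on the sample data, with Python's sqlite3 standing in for the server engines (strftime('%Y', ...) replaces EXTRACT(YEAR ...)/YEAR(); both the engine and that substitution are assumptions, everything else follows the solution):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE influencer_profiles (user_id INT PRIMARY KEY, user_name TEXT);
CREATE TABLE video_details (video_id INT PRIMARY KEY, user_id INT,
                            upload_date TEXT, views INT, likes INT, video_title TEXT);
INSERT INTO influencer_profiles VALUES
    (101, 'Alice'), (102, 'Bob'), (103, 'Charlie'), (104, 'David'), (105, 'Emma'),
    (106, 'Frank'), (107, 'Grace'), (108, 'Hank'), (109, 'Ivy'), (110, 'Jack');
INSERT INTO video_details VALUES
    (1, 101, '2024-01-01 10:00:00', 1000, 200, 'Dance Challenge'),
    (2, 102, '2024-01-03 12:00:00', 2000, 300, 'Cooking Recipe'),
    (3, 103, '2024-01-08 14:00:00', 3000, 500, 'Fitness Routine'),
    (4, 101, '2024-01-12 09:00:00', 1500, 250, 'Travel Vlog'),
    (5, 104, '2024-01-20 16:00:00', 2500, 400, 'Dance Moves'),
    (6, 105, '2024-01-22 11:00:00', 3500, 600, 'Night Routine'),
    (7, 106, '2024-02-01 08:00:00', 5000, 700, 'Morning Motivation'),
    (8, 107, '2024-02-05 17:00:00', 4500, 650, 'Cooking Tips'),
    (9, 108, '2024-02-10 19:00:00', 2200, 300, 'Fitness Hacks'),
    (10, 109, '2024-02-15 13:00:00', 3200, 400, 'Dance Party');
""")

# Alice is the only user with two videos (200 + 250 likes), which is what
# lifts her into the top 5 despite having no single big hit.
top5 = conn.execute("""
    SELECT i.user_id, i.user_name, SUM(v.likes) AS total_likes
    FROM influencer_profiles i
    JOIN video_details v ON i.user_id = v.user_id
    WHERE strftime('%Y', v.upload_date) = '2024'
    GROUP BY i.user_id, i.user_name
    ORDER BY total_likes DESC
    LIMIT 5
""").fetchall()
print(top5)
# [(106, 'Frank', 700), (107, 'Grace', 650), (105, 'Emma', 600),
#  (103, 'Charlie', 500), (101, 'Alice', 450)]
```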
• Q.612


Determine Most Active Video Genres by Views


Problem Statement
Write an SQL query to determine the most active video genres in terms of views. The query
should return the video_genre, the total views for that genre, and the percentage of total
views that genre represents in 2024.

Explanation
• Group by Genre: Group the data by video genre to aggregate total views for each genre.
• Calculate Percentage: Calculate the percentage of total views each genre represents.
• Filter by Year: Only consider videos uploaded in 2024.

Datasets and SQL Schemas


-- Table definition for video details
CREATE TABLE video_details (
video_id INT PRIMARY KEY,
user_id INT,
upload_date DATETIME,
views INT,
likes INT,
video_title VARCHAR(100),
video_genre VARCHAR(50)
);

-- Sample data for video_details


INSERT INTO video_details (video_id, user_id, upload_date, views, likes, video_title, video_genre)
VALUES
(1, 101, '2024-01-01 10:00:00', 1000, 200, 'Dance Challenge', 'Dance'),
(2, 102, '2024-01-03 12:00:00', 2000, 300, 'Cooking Recipe', 'Cooking'),
(3, 103, '2024-01-08 14:00:00', 3000, 500, 'Fitness Routine', 'Fitness'),
(4, 101, '2024-01-12 09:00:00', 1500, 250, 'Travel Vlog', 'Travel'),
(5, 104, '2024-01-20 16:00:00', 2500, 400, 'Dance Moves', 'Dance'),
(6, 105, '2024-01-22 11:00:00', 3500, 600, 'Night Routine', 'Lifestyle'),
(7, 106, '2024-02-01 08:00:00', 5000, 700, 'Morning Motivation', 'Motivation'),
(8, 107, '2024-02-05 17:00:00', 4500, 650, 'Cooking Tips', 'Cooking'),
(9, 108, '2024-02-10 19:00:00', 2200, 300, 'Fitness Hacks', 'Fitness'),
(10, 109, '2024-02-15 13:00:00', 3200, 400, 'Dance Party', 'Dance');

PostgreSQL Solution
WITH genre_views AS (
SELECT video_genre, SUM(views) AS total_views
FROM video_details
WHERE EXTRACT(YEAR FROM upload_date) = 2024
GROUP BY video_genre
)
SELECT video_genre, total_views,
       ROUND((total_views::DECIMAL / (SELECT SUM(views) FROM video_details WHERE EXTRACT(YEAR FROM upload_date) = 2024)) * 100, 2) AS percentage
FROM genre_views
ORDER BY total_views DESC;

MySQL Solution
WITH genre_views AS (
SELECT video_genre, SUM(views) AS total_views
FROM video_details
WHERE YEAR(upload_date) = 2024
GROUP BY video_genre
)
SELECT video_genre, total_views,
       ROUND((total_views / (SELECT SUM(views) FROM video_details WHERE YEAR(upload_date) = 2024)) * 100, 2) AS percentage
FROM genre_views
ORDER BY total_views DESC;

Learnings
• Aggregating Data by Genre: Grouping by video genre allows for easy calculation of total
views per genre.
• Percentage Calculation: Calculating the percentage of views per genre provides insight
into which genres are the most popular.
• Using WITH Clause: The WITH clause simplifies the logic by creating intermediate views to
calculate total views by genre.
• Q.613
Calculate the Retention Rate of Users After Watching Ads
Problem Statement
Write an SQL query to calculate the retention rate of users who watched an ad campaign in
2024. The retention rate is defined as the percentage of users who performed an action (like
liking, commenting, or sharing a video) after interacting with an ad.
• A user is considered to have "retained" if they performed any action (like, comment, or
share) on a video after viewing an ad.
• You need to calculate the retention rate for each ad campaign in 2024.

Explanation
• Data Sources: You have two tables: ad_interactions (which records when a user clicks
on an ad) and user_actions (which records when a user performs an action like liking,
commenting, or sharing a video).
• Joins: You'll need to join these tables on user_id and calculate the retention rate for each
ad campaign.
• Retention Calculation: Retention rate is calculated as COUNT(DISTINCT
retained_users) / COUNT(DISTINCT total_users) * 100.

Datasets and SQL Schemas


-- Table definition for ad_interactions
CREATE TABLE ad_interactions (
interaction_id INT PRIMARY KEY,
user_id INT,
ad_campaign VARCHAR(50),
interaction_date DATETIME
);

-- Table definition for user_actions


CREATE TABLE user_actions (
action_id INT PRIMARY KEY,
user_id INT,
action_type VARCHAR(50),
action_date DATETIME
);

-- Sample data for ad_interactions


INSERT INTO ad_interactions (interaction_id, user_id, ad_campaign, interaction_date)
VALUES

(1, 101, 'Campaign A', '2024-01-01 10:00:00'),
(2, 102, 'Campaign B', '2024-02-01 12:00:00'),
(3, 103, 'Campaign A', '2024-03-01 14:00:00'),
(4, 104, 'Campaign B', '2024-04-01 16:00:00'),
(5, 105, 'Campaign A', '2024-05-01 18:00:00');

-- Sample data for user_actions


INSERT INTO user_actions (action_id, user_id, action_type, action_date)
VALUES
(1, 101, 'like', '2024-01-05 11:00:00'),
(2, 102, 'share', '2024-02-05 13:00:00'),
(3, 103, 'comment', '2024-03-05 15:00:00'),
(4, 105, 'like', '2024-06-01 10:00:00');

Solutions
PostgreSQL Solution
WITH retained_users AS (
SELECT DISTINCT ai.user_id, ai.ad_campaign
FROM ad_interactions ai
JOIN user_actions ua
ON ai.user_id = ua.user_id
WHERE ai.interaction_date <= ua.action_date
AND EXTRACT(YEAR FROM ai.interaction_date) = 2024
),
total_users AS (
SELECT DISTINCT user_id, ad_campaign
FROM ad_interactions
WHERE EXTRACT(YEAR FROM interaction_date) = 2024
)
SELECT tu.ad_campaign,
       ROUND(COUNT(DISTINCT ru.user_id) * 100.0 / COUNT(DISTINCT tu.user_id), 2) AS retention_rate
FROM total_users tu
LEFT JOIN retained_users ru ON tu.user_id = ru.user_id AND tu.ad_campaign = ru.ad_campaign
GROUP BY tu.ad_campaign;

MySQL Solution
WITH retained_users AS (
SELECT DISTINCT ai.user_id, ai.ad_campaign
FROM ad_interactions ai
JOIN user_actions ua
ON ai.user_id = ua.user_id
WHERE ai.interaction_date <= ua.action_date
AND YEAR(ai.interaction_date) = 2024
),
total_users AS (
SELECT DISTINCT user_id, ad_campaign
FROM ad_interactions
WHERE YEAR(interaction_date) = 2024
)
SELECT tu.ad_campaign,
       ROUND(COUNT(DISTINCT ru.user_id) * 100.0 / COUNT(DISTINCT tu.user_id), 2) AS retention_rate
FROM total_users tu
LEFT JOIN retained_users ru ON tu.user_id = ru.user_id AND tu.ad_campaign = ru.ad_campaign
GROUP BY tu.ad_campaign;

Learnings
• Complex Joins: Joining on multiple conditions and using subqueries to calculate retention
based on specific date constraints.
• Data Aggregation: Using COUNT(DISTINCT ...) to count unique users for each ad
campaign.


• Retention Calculation: Ensuring the retention rate is calculated as a percentage.
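The retention logic can be traced concretely on the sample data. In the sketch below (Python's sqlite3 as a stand-in engine; the 2024 filter is dropped because every sample row is already from 2024 — both are assumptions of the demo), user 104 clicked a Campaign B ad but never acted afterwards, which is exactly what pulls Campaign B's rate down to 50%. Note the LEFT JOIN from the full user set: an inner join between the two sets would silently report 100% for every campaign.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ad_interactions (interaction_id INT PRIMARY KEY, user_id INT,
                              ad_campaign TEXT, interaction_date TEXT);
CREATE TABLE user_actions (action_id INT PRIMARY KEY, user_id INT,
                           action_type TEXT, action_date TEXT);
INSERT INTO ad_interactions VALUES
    (1, 101, 'Campaign A', '2024-01-01 10:00:00'),
    (2, 102, 'Campaign B', '2024-02-01 12:00:00'),
    (3, 103, 'Campaign A', '2024-03-01 14:00:00'),
    (4, 104, 'Campaign B', '2024-04-01 16:00:00'),
    (5, 105, 'Campaign A', '2024-05-01 18:00:00');
INSERT INTO user_actions VALUES
    (1, 101, 'like', '2024-01-05 11:00:00'),
    (2, 102, 'share', '2024-02-05 13:00:00'),
    (3, 103, 'comment', '2024-03-05 15:00:00'),
    (4, 105, 'like', '2024-06-01 10:00:00');
""")

# User 104 saw a Campaign B ad but never acted, so the LEFT JOIN leaves a
# NULL that COUNT(DISTINCT r.user_id) ignores -> Campaign B retains 1 of 2.
rates = conn.execute("""
    WITH retained AS (
        SELECT DISTINCT ai.user_id, ai.ad_campaign
        FROM ad_interactions ai
        JOIN user_actions ua ON ai.user_id = ua.user_id
        WHERE ai.interaction_date <= ua.action_date
    ),
    total AS (
        SELECT DISTINCT user_id, ad_campaign FROM ad_interactions
    )
    SELECT t.ad_campaign,
           ROUND(COUNT(DISTINCT r.user_id) * 100.0 / COUNT(DISTINCT t.user_id), 2)
    FROM total t
    LEFT JOIN retained r ON t.user_id = r.user_id AND t.ad_campaign = r.ad_campaign
    GROUP BY t.ad_campaign
    ORDER BY t.ad_campaign
""").fetchall()
print(rates)  # [('Campaign A', 100.0), ('Campaign B', 50.0)]
```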


• Q.614
Analyze Most Engaging Content by Video Genre and User Activity
Problem Statement
Write an SQL query to analyze the most engaging TikTok video genres based on the total
number of actions (likes, comments, and shares). The query should return the video genre, the
total number of actions, and the number of unique users that interacted with each genre in
2024.

Explanation
• Multiple Actions: Actions can be likes, comments, or shares, and need to be aggregated.
• Group by Genre: Calculate the total actions and the number of unique users per genre.
• Filter by Year: Only include data from 2024.

Datasets and SQL Schemas


-- Table definition for video_details
CREATE TABLE video_details (
video_id INT PRIMARY KEY,
user_id INT,
video_genre VARCHAR(50),
upload_date DATETIME
);

-- Table definition for user_actions


CREATE TABLE user_actions (
action_id INT PRIMARY KEY,
user_id INT,
video_id INT,
action_type VARCHAR(50),
action_date DATETIME
);

-- Sample data for video_details


INSERT INTO video_details (video_id, user_id, video_genre, upload_date)
VALUES
(1, 101, 'Dance', '2024-01-01 10:00:00'),
(2, 102, 'Fitness', '2024-02-01 12:00:00'),
(3, 103, 'Cooking', '2024-03-01 14:00:00'),
(4, 104, 'Travel', '2024-04-01 16:00:00'),
(5, 105, 'Dance', '2024-05-01 18:00:00');

-- Sample data for user_actions


INSERT INTO user_actions (action_id, user_id, video_id, action_type, action_date)
VALUES
(1, 101, 1, 'like', '2024-01-02 10:00:00'),
(2, 102, 1, 'share', '2024-01-02 11:00:00'),
(3, 103, 2, 'comment', '2024-02-02 13:00:00'),
(4, 104, 3, 'like', '2024-03-03 15:00:00'),
(5, 105, 4, 'share', '2024-04-04 16:00:00'),
(6, 101, 4, 'comment', '2024-04-04 17:00:00');

Solutions
PostgreSQL Solution
SELECT vd.video_genre,
COUNT(ua.action_id) AS total_actions,
COUNT(DISTINCT ua.user_id) AS unique_users
FROM video_details vd
JOIN user_actions ua ON vd.video_id = ua.video_id
WHERE EXTRACT(YEAR FROM ua.action_date) = 2024
GROUP BY vd.video_genre
ORDER BY total_actions DESC;

MySQL Solution
SELECT vd.video_genre,
COUNT(ua.action_id) AS total_actions,
COUNT(DISTINCT ua.user_id) AS unique_users
FROM video_details vd
JOIN user_actions ua ON vd.video_id = ua.video_id
WHERE YEAR(ua.action_date) = 2024
GROUP BY vd.video_genre
ORDER BY total_actions DESC;

Learnings
• Multi-Table Join: Combining data from both the video_details and user_actions
tables.
• Complex Aggregation: Calculating both the total number of actions and the number of
unique users for each genre.
• Grouping and Sorting: Grouping data by video genre and sorting by total actions to
identify the most engaging genres.
• Q.615
Identify Top 3 Users with the Highest Video Engagement Rate
Problem Statement
Write an SQL query to identify the top 3 users with the highest video engagement rate in
2024. The engagement rate is defined as the total number of actions (likes, comments, and
shares) divided by the total number of videos uploaded by the user.
• Only consider actions performed on videos uploaded in 2024.
• Sort the users by engagement rate in descending order.

Explanation
• Engagement Rate: For each user, calculate the total number of actions on their videos and
divide it by the number of videos they uploaded.
• Ranking: Sort the users by their engagement rate and return the top 3.

Datasets and SQL Schemas


-- Table definition for video_details
CREATE TABLE video_details (
video_id INT PRIMARY KEY,
user_id INT,
video_genre VARCHAR(50),
upload_date DATETIME
);

-- Table definition for user_actions


CREATE TABLE user_actions (
action_id INT PRIMARY KEY,
user_id INT,
video_id INT,
action_type VARCHAR(50),


action_date DATETIME
);

-- Sample data for video_details


INSERT INTO video_details (video_id, user_id, video_genre, upload_date)
VALUES
(1, 101, 'Dance', '2024-01-01 10:00:00'),
(2, 102, 'Fitness', '2024-02-01 12:00:00'),
(3, 103, 'Cooking', '2024-03-01 14:00:00'),
(4, 104, 'Travel', '2024-04-01 16:00:00'),
(5, 105, 'Dance', '2024-05-01 18:00:00');

-- Sample data for user_actions


INSERT INTO user_actions (action_id, user_id, video_id, action_type, action_date)
VALUES
(1, 101, 1, 'like', '2024-01-02 10:00:00'),
(2, 102, 1, 'share', '2024-01-02 11:00:00'),
(3, 103, 2, 'comment', '2024-02-02 13:00:00'),
(4, 104, 3, 'like', '2024-03-03 15:00:00'),
(5, 105, 4, 'share', '2024-04-04 16:00:00'),
(6, 101, 4, 'comment', '2024-04-04 17:00:00');

Solutions
PostgreSQL Solution
WITH user_engagement AS (
SELECT vd.user_id,
COUNT(ua.action_id) AS total_actions,
COUNT(DISTINCT vd.video_id) AS total_videos
FROM video_details vd
LEFT JOIN user_actions ua ON vd.video_id = ua.video_id
WHERE EXTRACT(YEAR FROM vd.upload_date) = 2024
GROUP BY vd.user_id
)
SELECT user_id,
ROUND(total_actions::numeric / NULLIF(total_videos, 0), 2) AS engagement_rate
FROM user_engagement
ORDER BY engagement_rate DESC
LIMIT 3;

MySQL Solution
WITH user_engagement AS (
SELECT vd.user_id,
COUNT(ua.action_id) AS total_actions,
COUNT(DISTINCT vd.video_id) AS total_videos
FROM video_details vd
LEFT JOIN user_actions ua ON vd.video_id = ua.video_id
WHERE YEAR(vd.upload_date) = 2024
GROUP BY vd.user_id
)
SELECT user_id,
total_actions / NULLIF(total_videos, 0) AS engagement_rate
FROM user_engagement
ORDER BY engagement_rate DESC
LIMIT 3;

Learnings
• NULLIF: This function ensures that division by zero does not occur when there are no
videos uploaded by a user.
• Engagement Rate Calculation: Understanding how to compute an engagement metric by
dividing total actions by the number of videos uploaded.
• Ranking: Sorting the results to get the top users based on engagement.
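The NULLIF guard can be seen in isolation. SQLite also implements NULLIF, so a one-line check through Python's sqlite3 shows the behavior: dividing by NULLIF(0, 0) yields NULL instead of raising a division-by-zero error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NULLIF(x, 0) returns NULL when x = 0, so the division propagates NULL
safe, normal = conn.execute(
    "SELECT 10.0 / NULLIF(0, 0), 10.0 / NULLIF(4, 0)"
).fetchone()
print(safe, normal)  # None 2.5
```

In a report you would typically wrap the result in COALESCE(..., 0) if a zero rate is preferred over NULL for users with no videos.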
• Q.616


Problem Statement
Write an SQL query to calculate the average watch time per user for each video genre in
2024. Watch time is recorded in seconds, and you need to calculate it for each genre, grouped
by the user's interactions with videos of that genre. Only include data for videos uploaded in
2024.

Explanation
• Data Sources: You have two tables: video_details (which holds information about
videos, including the genre and upload date) and video_interactions (which records the
watch time for each user).
• Watch Time Aggregation: For each genre, calculate the average watch time per user.
• Filter by Year: Only include interactions with videos uploaded in 2024.

Datasets and SQL Schemas


-- Table definition for video_details
CREATE TABLE video_details (
video_id INT PRIMARY KEY,
video_genre VARCHAR(50),
upload_date DATETIME
);

-- Table definition for video_interactions


CREATE TABLE video_interactions (
interaction_id INT PRIMARY KEY,
user_id INT,
video_id INT,
watch_time_seconds INT, -- Watch time in seconds
interaction_date DATETIME
);

-- Sample data for video_details


INSERT INTO video_details (video_id, video_genre, upload_date)
VALUES
(1, 'Dance', '2024-01-01 10:00:00'),
(2, 'Fitness', '2024-02-01 12:00:00'),
(3, 'Cooking', '2024-03-01 14:00:00'),
(4, 'Travel', '2024-04-01 16:00:00'),
(5, 'Dance', '2024-05-01 18:00:00');

-- Sample data for video_interactions


INSERT INTO video_interactions (interaction_id, user_id, video_id, watch_time_seconds, interaction_date)
VALUES
(1, 101, 1, 300, '2024-01-02 10:00:00'),
(2, 102, 1, 150, '2024-01-02 11:00:00'),
(3, 103, 2, 200, '2024-02-02 13:00:00'),
(4, 104, 3, 450, '2024-03-03 15:00:00'),
(5, 101, 4, 350, '2024-04-04 16:00:00'),
(6, 105, 5, 500, '2024-05-05 17:00:00');

Solutions
PostgreSQL Solution
SELECT vd.video_genre,
AVG(vi.watch_time_seconds) AS avg_watch_time
FROM video_details vd
JOIN video_interactions vi ON vd.video_id = vi.video_id
WHERE EXTRACT(YEAR FROM vi.interaction_date) = 2024


GROUP BY vd.video_genre
ORDER BY avg_watch_time DESC;

MySQL Solution
SELECT vd.video_genre,
AVG(vi.watch_time_seconds) AS avg_watch_time
FROM video_details vd
JOIN video_interactions vi ON vd.video_id = vi.video_id
WHERE YEAR(vi.interaction_date) = 2024
GROUP BY vd.video_genre
ORDER BY avg_watch_time DESC;

Learnings
• JOIN with Aggregation: Combining data from the video_details and
video_interactions tables to calculate an average.
• Time-based Filtering: Using EXTRACT() or YEAR() to filter data by year.
• Aggregation: Calculating average watch time per genre.
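Year extraction is one of the most dialect-sensitive parts of these queries: PostgreSQL uses EXTRACT(YEAR FROM col), MySQL uses YEAR(col), and SQLite uses strftime('%Y', col). A quick check of the SQLite form through Python's sqlite3 (toy rows invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vi (video_id INT, watch_time_seconds INT, interaction_date TEXT);
INSERT INTO vi VALUES (1, 300, '2024-01-02 10:00:00'),
                      (2, 150, '2024-01-02 11:00:00'),
                      (3, 200, '2023-12-31 23:00:00');  -- outside 2024, filtered out
""")
# strftime('%Y', ...) returns the year as a string, hence the '2024' comparison
avg = conn.execute("""
SELECT AVG(watch_time_seconds)
FROM vi
WHERE strftime('%Y', interaction_date) = '2024'
""").fetchone()[0]
print(avg)  # 225.0
```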
• Q.617
Problem Statement
Write an SQL query to identify users who have shared videos more than 3 times within a
single week in 2024. The query should return the user_id, share_count, and the week
number for the share activity.

Explanation
• Data Sources: You have the user_actions table, which tracks user activities (like, share,
comment).
• Week Number Calculation: You will need to use WEEK() or DATE_TRUNC() to calculate
the week number of the year.
• Group by Week and User: Count how many shares each user has performed in each week
and filter for users who shared more than 3 times.

Datasets and SQL Schemas


-- Table definition for user_actions
CREATE TABLE user_actions (
action_id INT PRIMARY KEY,
user_id INT,
action_type VARCHAR(50),
action_date DATETIME
);

-- Sample data for user_actions


INSERT INTO user_actions (action_id, user_id, action_type, action_date)
VALUES
(1, 101, 'share', '2024-01-01 10:00:00'),
(2, 101, 'like', '2024-01-02 11:00:00'),
(3, 102, 'share', '2024-02-03 14:00:00'),
(4, 101, 'share', '2024-01-03 15:00:00'),
(5, 103, 'comment', '2024-03-01 16:00:00'),
(6, 101, 'share', '2024-01-04 17:00:00'),
(7, 102, 'share', '2024-02-05 18:00:00'),
(8, 104, 'like', '2024-02-06 19:00:00');


Solutions
PostgreSQL Solution
SELECT user_id,
COUNT(*) AS share_count,
EXTRACT(WEEK FROM action_date) AS week_number
FROM user_actions
WHERE action_type = 'share'
AND EXTRACT(YEAR FROM action_date) = 2024
GROUP BY user_id, week_number
HAVING COUNT(*) > 3
ORDER BY share_count DESC;

MySQL Solution
SELECT user_id,
COUNT(*) AS share_count,
WEEK(action_date) AS week_number
FROM user_actions
WHERE action_type = 'share'
AND YEAR(action_date) = 2024
GROUP BY user_id, week_number
HAVING COUNT(*) > 3
ORDER BY share_count DESC;

Learnings
• Filtering Specific Actions: Using WHERE to focus only on 'share' actions.
• Week Calculation: Using EXTRACT(WEEK) or WEEK() to group by week numbers.
• HAVING: Applying filters after the aggregation to only include users with more than 3
shares in a week.
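The WHERE-before-grouping, HAVING-after-aggregation split can be demonstrated on a tiny table. The sketch below runs in SQLite through Python's sqlite3; to keep it dialect-neutral, a precomputed week column stands in for WEEK()/EXTRACT(WEEK), and the rows are toy data invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_actions (user_id INT, action_type TEXT, week INT);
INSERT INTO user_actions VALUES
  (101,'share',1),(101,'share',1),(101,'share',1),(101,'share',1),  -- 4 shares in week 1
  (102,'share',1),(102,'share',1),                                  -- only 2 shares
  (101,'like',1);                                                   -- removed by WHERE
""")
rows = conn.execute("""
SELECT user_id, COUNT(*) AS share_count, week
FROM user_actions
WHERE action_type = 'share'   -- row filter, applied before grouping
GROUP BY user_id, week
HAVING COUNT(*) > 3           -- group filter, applied after aggregation
""").fetchall()
print(rows)  # [(101, 4, 1)]
```

Swapping the two filters would be an error: HAVING cannot see individual rows, and WHERE cannot see aggregates.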
• Q.618
Problem Statement
Write an SQL query to identify the top 3 most commented TikTok videos in 2024. The query
should return the video ID, the number of comments, and the video genre. Sort the results by
the number of comments in descending order.

Explanation
• Data Sources: You have two tables: video_details (which includes the genre and video
ID) and user_actions (which tracks the comments).
• Grouping by Video: You need to count how many comments each video has in 2024 and
sort them.
• Filter by Year: Only consider comments made in 2024.

Datasets and SQL Schemas


-- Table definition for video_details
CREATE TABLE video_details (
video_id INT PRIMARY KEY,
video_genre VARCHAR(50)
);

-- Table definition for user_actions


CREATE TABLE user_actions (
action_id INT PRIMARY KEY,
video_id INT,
action_type VARCHAR(50),


action_date DATETIME
);

-- Sample data for video_details


INSERT INTO video_details (video_id, video_genre)
VALUES
(1, 'Dance'),
(2, 'Fitness'),
(3, 'Cooking'),
(4, 'Travel'),
(5, 'Dance');

-- Sample data for user_actions


INSERT INTO user_actions (action_id, video_id, action_type, action_date)
VALUES
(1, 1, 'comment', '2024-01-01 10:00:00'),
(2, 1, 'comment', '2024-01-02 11:00:00'),
(3, 2, 'comment', '2024-02-03 14:00:00'),
(4, 2, 'comment', '2024-02-04 15:00:00'),
(5, 3, 'comment', '2024-03-01 16:00:00'),
(6, 3, 'comment', '2024-03-02 17:00:00'),
(7, 4, 'comment', '2024-04-01 18:00:00'),
(8, 4, 'comment', '2024-04-02 19:00:00');

Solutions
PostgreSQL Solution
SELECT vd.video_id,
COUNT(ua.action_id) AS comment_count,
vd.video_genre
FROM user_actions ua
JOIN video_details vd ON ua.video_id = vd.video_id
WHERE ua.action_type = 'comment'
AND EXTRACT(YEAR FROM ua.action_date) = 2024
GROUP BY vd.video_id, vd.video_genre
ORDER BY comment_count DESC
LIMIT 3;

MySQL Solution
SELECT vd.video_id,
COUNT(ua.action_id) AS comment_count,
vd.video_genre
FROM user_actions ua
JOIN video_details vd ON ua.video_id = vd.video_id
WHERE ua.action_type = 'comment'
AND YEAR(ua.action_date) = 2024
GROUP BY vd.video_id, vd.video_genre
ORDER BY comment_count DESC
LIMIT 3;

Learnings
• JOIN: Combining two tables (user_actions and video_details) to get the genre and
comment counts.
• Filtering Actions: Using WHERE to focus only on 'comment' actions.
• Top N Results: Using LIMIT to get the top 3 videos based on the comment count.
• Q.619
Problem Statement
You are tasked with identifying the user IDs of those who did not confirm their sign-up on
the first day but confirmed their sign-up on the second day after signing up.


• The emails table holds the signup_date of users.


• The texts table records when users confirmed their sign-up via text (with signup_action
as 'Confirmed' or 'Not Confirmed') and the date of the confirmation (action_date).

Explanation
• Non-Confirmation on Day 1: We need to check if a user did not confirm on the same day
as their signup_date (i.e., where signup_action = 'Not Confirmed' and action_date is
the same as signup_date).
• Confirmation on Day 2: After a user has failed to confirm on Day 1, we need to check if
they confirmed on the next day, which is one day after their signup_date (i.e., where
signup_action = 'Confirmed' and action_date is exactly one day after the
signup_date).

Datasets and SQL Schemas


-- Table for emails (signup information)
CREATE TABLE emails (
email_id INT PRIMARY KEY,
user_id INT,
signup_date DATETIME
);

-- Table for texts (confirmation actions)


CREATE TABLE texts (
text_id INT PRIMARY KEY,
email_id INT,
signup_action VARCHAR(50),
action_date DATETIME
);

-- Sample data for emails


INSERT INTO emails (email_id, user_id, signup_date)
VALUES
(125, 7771, '2022-06-14 00:00:00'),
(433, 1052, '2022-07-09 00:00:00');

-- Sample data for texts


INSERT INTO texts (text_id, email_id, signup_action, action_date)
VALUES
(6878, 125, 'Confirmed', '2022-06-14 00:00:00'),
(6997, 433, 'Not Confirmed', '2022-07-09 00:00:00'),
(7000, 433, 'Confirmed', '2022-07-10 00:00:00');

Query Solution
The goal is to identify users who did not confirm on the signup date but confirmed the
next day.
SELECT e.user_id
FROM emails e
JOIN texts t1 ON e.email_id = t1.email_id
JOIN texts t2 ON e.email_id = t2.email_id
WHERE t1.signup_action = 'Not Confirmed'
AND t2.signup_action = 'Confirmed'
AND t1.action_date = e.signup_date
AND t2.action_date = DATE_ADD(e.signup_date, INTERVAL 1 DAY); -- MySQL; in PostgreSQL use e.signup_date + INTERVAL '1 day'

Explanation of the Query:


• Join the emails table with the texts table twice (as t1 and t2) to get both the "Not
Confirmed" and "Confirmed" actions.
• Condition 1: Ensure that on t1, the action is "Not Confirmed" and the action_date
matches the signup_date from the emails table.
• Condition 2: Ensure that on t2, the action is "Confirmed" and the action_date is exactly
one day after the signup_date.
• Final Output: The query returns the user_id of those who confirmed their account on the
second day after signing up.
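The "one day after" arithmetic is another dialect-sensitive spot: MySQL writes DATE_ADD(col, INTERVAL 1 DAY), PostgreSQL writes col + INTERVAL '1 day', and SQLite writes date(col, '+1 day'). The SQLite form, checked through Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite equivalent of MySQL's DATE_ADD(signup_date, INTERVAL 1 DAY)
next_day = conn.execute(
    "SELECT date('2022-07-09', '+1 day')"
).fetchone()[0]
print(next_day)  # 2022-07-10
```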
• Q.620
Analyzing User Behavior and Content Interactions on TikTok
Problem Statement
You are tasked with identifying the top 5 users who have uploaded videos that have received
the most likes on TikTok. The output should display the following:
• User ID
• The total number of videos uploaded by the user
• The total number of likes received by the videos uploaded by the user
Tables:
• Users Table:
• This table contains information about the TikTok users.
CREATE TABLE Users (
user_id INT PRIMARY KEY,
username VARCHAR(100),
country VARCHAR(100),
join_date DATE
);
• Videos Table:
• This table contains information about the videos uploaded by users, including the number
of likes each video has received.
CREATE TABLE Videos (
video_id INT PRIMARY KEY,
upload_date DATE,
user_id INT,
video_likes INT
);

Sample Data:
-- Sample data for Users
INSERT INTO Users (user_id, username, country, join_date)
VALUES
(1, 'user1', 'USA', '2021-01-01'),
(2, 'user2', 'Canada', '2021-02-01'),
(3, 'user3', 'UK', '2021-01-31'),
(4, 'user4', 'USA', '2021-01-30'),
(5, 'user5', 'Canada', '2021-01-15');

-- Sample data for Videos


INSERT INTO Videos (video_id, upload_date, user_id, video_likes)
VALUES
(101, '2021-01-01', 1, 500),
(102, '2021-02-01', 2, 1000),
(103, '2021-02-01', 1, 1500),
(104, '2021-03-01', 3, 2000),
(105, '2021-03-01', 4, 250),
(106, '2021-04-01', 5, 5000);


Requirements
• Join the Users and Videos tables on user_id to link each video to its respective user.
• Aggregate the data by user to calculate the total number of videos uploaded and the total
number of likes their videos received.
• Sort the result by the total likes in descending order to get the top users.
• Limit the result to the top 5 users.

SQL Query Solution


SELECT
Users.user_id,
COUNT(Videos.video_id) AS total_videos,
SUM(Videos.video_likes) AS total_likes
FROM
Users
JOIN
Videos ON Users.user_id = Videos.user_id
GROUP BY
Users.user_id
ORDER BY
total_likes DESC
LIMIT 5;

Explanation of the Query:


• SELECT Clause:
• We select user_id from the Users table to identify the user.
• COUNT(Videos.video_id) calculates the total number of videos uploaded by each user.
• SUM(Videos.video_likes) calculates the total number of likes all videos of a user have
received.
• FROM Clause:
• We join the Users table with the Videos table on user_id, linking the videos to their
respective users.
• GROUP BY Clause:
• The query groups the results by user_id to aggregate data for each user.
• ORDER BY Clause:
• We order the results by total_likes in descending order, so that the users with the most
likes come first.
• LIMIT Clause:
• The query limits the result to the top 5 users with the highest total likes.
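The Q.620 query can be run as-is against its own sample data in SQLite (through Python's sqlite3). One small addition here: a user_id tie-break in ORDER BY, added only to make the output deterministic, since user1 and user3 both total 2000 likes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (user_id INT PRIMARY KEY, username TEXT, country TEXT, join_date TEXT);
CREATE TABLE Videos (video_id INT PRIMARY KEY, upload_date TEXT, user_id INT, video_likes INT);
INSERT INTO Users VALUES (1,'user1','USA','2021-01-01'),(2,'user2','Canada','2021-02-01'),
  (3,'user3','UK','2021-01-31'),(4,'user4','USA','2021-01-30'),(5,'user5','Canada','2021-01-15');
INSERT INTO Videos VALUES (101,'2021-01-01',1,500),(102,'2021-02-01',2,1000),
  (103,'2021-02-01',1,1500),(104,'2021-03-01',3,2000),(105,'2021-03-01',4,250),
  (106,'2021-04-01',5,5000);
""")
rows = conn.execute("""
SELECT Users.user_id,
       COUNT(Videos.video_id) AS total_videos,
       SUM(Videos.video_likes) AS total_likes
FROM Users
JOIN Videos ON Users.user_id = Videos.user_id
GROUP BY Users.user_id
ORDER BY total_likes DESC, Users.user_id   -- tie-break added for determinism
LIMIT 5
""").fetchall()
print(rows)  # [(5, 1, 5000), (1, 2, 2000), (3, 1, 2000), (2, 1, 1000), (4, 1, 250)]
```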

Apple
• Q.621
Identify Top Customers by Total Purchase Amount in a Given Year
Problem Statement:
Apple needs to track its top customers based on the total purchase amount in a given year.
You need to write a SQL query that returns the top 5 customers who made the highest total
purchases in 2023. The output should show the customer ID and the total amount they spent.


Datasets and SQL Schemas


-- Table creation for customers' purchase data
CREATE TABLE customer_purchases (
customer_id INT,
purchase_date DATE,
amount DECIMAL(10, 2)
);

-- Sample Data for customer purchases


INSERT INTO customer_purchases (customer_id, purchase_date, amount)
VALUES
(1, '2023-01-05', 1000.50),
(2, '2023-02-15', 1500.75),
(3, '2023-03-20', 1200.00),
(1, '2023-04-10', 500.00),
(4, '2023-05-05', 800.25),
(5, '2023-06-22', 600.00),
(2, '2023-07-18', 1300.50),
(3, '2023-08-10', 700.00),
(1, '2023-09-05', 1500.00),
(4, '2023-10-02', 1200.00);

Requirements:
• Filter the data for the year 2023 using YEAR(purchase_date).
• Sum the total amount spent by each customer during the year.
• Sort the results by total purchase amount in descending order.
• Limit the output to the top 5 customers.

Postgres & MySQL Query Solution


SELECT customer_id,
SUM(amount) AS total_spent
FROM customer_purchases
WHERE EXTRACT(YEAR FROM purchase_date) = 2023 -- works in both; MySQL also accepts YEAR(purchase_date)
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 5;
• Q.622
Calculate the Average Purchase Amount by Product Category
Problem Statement:
Apple needs to calculate the average purchase amount for each product category in the year
2023. You need to write a SQL query that returns the average amount spent in each category.

Datasets and SQL Schemas


-- Table creation for products' purchase data
CREATE TABLE product_purchases (
product_id INT,
product_category VARCHAR(50),
purchase_date DATE,
amount DECIMAL(10, 2)
);

-- Sample Data for product purchases


INSERT INTO product_purchases (product_id, product_category, purchase_date, amount)
VALUES
(1, 'iPhone', '2023-01-10', 1000.00),
(2, 'MacBook', '2023-02-20', 2000.00),
(3, 'iPhone', '2023-03-25', 1200.00),

(4, 'AirPods', '2023-04-15', 250.00),
(5, 'MacBook', '2023-05-10', 1800.00),
(6, 'iPhone', '2023-06-05', 999.99),
(7, 'iPad', '2023-07-10', 450.00),
(8, 'AirPods', '2023-08-22', 200.00);

Requirements:
• Group by product category to calculate the average purchase amount for each category.
• Calculate the average using the AVG() function.
• Filter for the year 2023 using YEAR(purchase_date).

Postgres & MySQL Query Solution


SELECT product_category,
AVG(amount) AS average_amount
FROM product_purchases
WHERE EXTRACT(YEAR FROM purchase_date) = 2023 -- works in both; MySQL also accepts YEAR(purchase_date)
GROUP BY product_category;
• Q.623
Find the Number of Orders for Each Customer
Problem Statement:
Apple wants to track the number of orders placed by each customer. Write a SQL query to
count the number of orders each customer made and show the total number of orders for each
customer.

Datasets and SQL Schemas


-- Table creation for customer orders
CREATE TABLE customer_orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
order_amount DECIMAL(10, 2)
);

-- Sample Data for customer orders


INSERT INTO customer_orders (order_id, customer_id, order_date, order_amount)
VALUES
(101, 1, '2023-01-10', 500.00),
(102, 2, '2023-02-15', 1500.00),
(103, 1, '2023-03-20', 1000.00),
(104, 3, '2023-04-05', 800.00),
(105, 4, '2023-05-22', 1200.00),
(106, 1, '2023-06-10', 600.00),
(107, 2, '2023-07-18', 1300.00),
(108, 3, '2023-08-05', 500.00),
(109, 4, '2023-09-15', 2000.00),
(110, 1, '2023-10-05', 1500.00);

Requirements:
• Group by customer_id to count the number of orders placed by each customer.
• Use the COUNT() function to calculate the number of orders.
• Show customer_id and order count.


Postgres & MySQL Query Solution


SELECT customer_id,
COUNT(order_id) AS total_orders
FROM customer_orders
GROUP BY customer_id;
• Q.624

Question
Apple has a trade-in program where customers can return their old iPhone device and receive
a trade-in payout in cash. For each store, write a query to calculate the total revenue from the
trade-in payouts. Order the result by total revenue in descending order.

Explanation
To solve this, you need to join the trade_in_transactions table with the
trade_in_payouts table using the model_id column. Then, for each store, calculate the
total trade-in revenue by multiplying the number of transactions for each model by its
respective payout amount. Finally, order the results by total revenue in descending order.

Datasets and SQL Schemas


Table creation
-- Creating trade_in_transactions table
CREATE TABLE trade_in_transactions (
transaction_id INTEGER,
model_id INTEGER,
store_id INTEGER,
transaction_date DATE
);

-- Creating trade_in_payouts table


CREATE TABLE trade_in_payouts (
model_id INTEGER,
model_name VARCHAR(100),
payout_amount INTEGER
);

Datasets
-- Inserting data into trade_in_transactions
INSERT INTO trade_in_transactions (transaction_id, model_id, store_id, transaction_date)
VALUES
(1, 112, 512, '2022-01-01'),
(2, 113, 512, '2022-01-01');

-- Inserting data into trade_in_payouts


INSERT INTO trade_in_payouts (model_id, model_name, payout_amount)
VALUES
(111, 'iPhone 11', 200),
(112, 'iPhone 12', 350),
(113, 'iPhone 13', 450),
(114, 'iPhone 13 Pro Max', 650);

Learnings
• Using JOIN to combine data from multiple tables based on common columns (model_id).
• Aggregating data with COUNT or SUM for calculating total revenues.
• Ordering results with ORDER BY in descending order to get the stores with the highest
trade-in payouts first.

Solutions


PostgreSQL Solution
SELECT t.store_id, SUM(p.payout_amount) AS total_revenue
FROM trade_in_transactions t
JOIN trade_in_payouts p ON t.model_id = p.model_id
GROUP BY t.store_id
ORDER BY total_revenue DESC;

MySQL Solution
SELECT t.store_id, SUM(p.payout_amount) AS total_revenue
FROM trade_in_transactions t
JOIN trade_in_payouts p ON t.model_id = p.model_id
GROUP BY t.store_id
ORDER BY total_revenue DESC;
• Q.625

Question
Write a query to determine the percentage of buyers who bought AirPods directly after they
bought iPhones, with no intermediate purchases in between. Round the answer to a whole
percentage (e.g., 20 for 20%, 50 for 50%).

Explanation
To solve this, you need to:
• Identify customers who bought iPhones and later bought AirPods.
• Ensure no intermediate purchases (e.g., iPads, etc.) occurred between buying an iPhone
and AirPods.
• Calculate the percentage of customers who bought AirPods after iPhones relative to the
total number of customers who bought iPhones.

Datasets and SQL Schemas


Table creation
-- Creating transactions table
CREATE TABLE transactions (
transaction_id INTEGER,
customer_id INTEGER,
product_name VARCHAR(100),
transaction_timestamp DATETIME
);

Datasets
-- Inserting data into transactions
INSERT INTO transactions (transaction_id, customer_id, product_name, transaction_timestamp)
VALUES
(1, 101, 'iPhone', '2022-08-08 00:00:00'),
(2, 101, 'AirPods', '2022-08-08 00:00:00'),
(5, 301, 'iPhone', '2022-09-05 00:00:00'),
(6, 301, 'iPad', '2022-09-06 00:00:00'),
(7, 301, 'AirPods', '2022-09-07 00:00:00');

Learnings
• Using JOIN or SELF JOIN to track sequences of events for each customer.
• Filtering with conditions on timestamps to ensure correct order of purchases.
• Calculating percentage by dividing the count of desired events by the total and multiplying
by 100 for percentage.


Solutions
PostgreSQL Solution
WITH iPhone_buyers AS (
SELECT DISTINCT customer_id
FROM transactions
WHERE product_name = 'iPhone'
),
airpod_followers AS (
SELECT DISTINCT t1.customer_id
FROM transactions t1
JOIN transactions t2
ON t1.customer_id = t2.customer_id
AND t1.product_name = 'iPhone'
AND t2.product_name = 'AirPods'
AND t1.transaction_timestamp < t2.transaction_timestamp
WHERE NOT EXISTS (
SELECT 1
FROM transactions t3
WHERE t3.customer_id = t1.customer_id
AND t3.transaction_timestamp > t1.transaction_timestamp
AND t3.transaction_timestamp < t2.transaction_timestamp
AND t3.product_name != 'AirPods'
)
)
SELECT ROUND(100.0 * COUNT(DISTINCT af.customer_id) / COUNT(DISTINCT ib.customer_id)) AS percentage
FROM iPhone_buyers ib
LEFT JOIN airpod_followers af
ON ib.customer_id = af.customer_id;

MySQL Solution
WITH iPhone_buyers AS (
SELECT DISTINCT customer_id
FROM transactions
WHERE product_name = 'iPhone'
),
airpod_followers AS (
SELECT DISTINCT t1.customer_id
FROM transactions t1
JOIN transactions t2
ON t1.customer_id = t2.customer_id
AND t1.product_name = 'iPhone'
AND t2.product_name = 'AirPods'
AND t1.transaction_timestamp < t2.transaction_timestamp
WHERE NOT EXISTS (
SELECT 1
FROM transactions t3
WHERE t3.customer_id = t1.customer_id
AND t3.transaction_timestamp > t1.transaction_timestamp
AND t3.transaction_timestamp < t2.transaction_timestamp
AND t3.product_name != 'AirPods'
)
)
SELECT ROUND(100 * COUNT(DISTINCT af.customer_id) / COUNT(DISTINCT ib.customer_id)) AS percentage
FROM iPhone_buyers ib
LEFT JOIN airpod_followers af
ON ib.customer_id = af.customer_id;
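The NOT EXISTS "no intermediate event" pattern is the core of this solution and is worth checking on a reduced table. The sketch below runs in SQLite through Python's sqlite3 with toy rows invented for the demo; for simplicity it disqualifies a pair on any purchase between the iPhone and the AirPods, a slightly stricter condition than the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (customer_id INT, product_name TEXT, ts TEXT);
INSERT INTO transactions VALUES
  (101, 'iPhone',  '2022-08-01'),
  (101, 'AirPods', '2022-08-02'),   -- direct follow-up: qualifies
  (301, 'iPhone',  '2022-09-05'),
  (301, 'iPad',    '2022-09-06'),   -- intermediate purchase
  (301, 'AirPods', '2022-09-07');   -- so 301 does not qualify
""")
rows = conn.execute("""
SELECT DISTINCT t1.customer_id
FROM transactions t1
JOIN transactions t2
  ON t1.customer_id = t2.customer_id
 AND t1.product_name = 'iPhone'
 AND t2.product_name = 'AirPods'
 AND t1.ts < t2.ts
WHERE NOT EXISTS (
    SELECT 1 FROM transactions t3
    WHERE t3.customer_id = t1.customer_id
      AND t3.ts > t1.ts AND t3.ts < t2.ts   -- anything strictly between disqualifies
)
""").fetchall()
print(rows)  # [(101,)]
```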
• Q.626

Question
Write a SQL query to calculate the monthly average rating for each Apple product based on
reviews submitted by users. The review table contains the following columns: review_id,
user_id, submit_date, product_id, and stars. For the purposes of this problem, assume
that the product_id corresponds to an Apple product.


Explanation
To solve this problem, you need to:
• Extract the month and year from the submit_date to group reviews by month.
• Calculate the average stars for each product within each month.
• Group by the extracted month and product to get the average rating for each Apple
product.
• Order the results by month and product.

Datasets and SQL Schemas


Table creation
-- Creating the reviews table
CREATE TABLE reviews (
review_id INTEGER,
user_id INTEGER,
submit_date DATETIME,
product_id INTEGER,
stars INTEGER
);

Datasets
-- Inserting sample data into reviews
INSERT INTO reviews (review_id, user_id, submit_date, product_id, stars)
VALUES
(6171, 123, '2022-06-08 00:00:00', 50001, 4),
(7802, 265, '2022-06-10 00:00:00', 69852, 4),
(5293, 362, '2022-06-18 00:00:00', 50001, 3),
(6352, 192, '2022-07-26 00:00:00', 69852, 3),
(4517, 981, '2022-07-05 00:00:00', 69852, 2);

Learnings
• Using EXTRACT() or DATE_TRUNC() to isolate parts of a date (month, year).
• Aggregating data using AVG() to compute average ratings.
• Grouping data using GROUP BY and sorting results with ORDER BY.

Solutions
PostgreSQL Solution
SELECT EXTRACT(MONTH FROM submit_date) AS mth,
product_id AS product,
AVG(stars) AS avg_stars
FROM reviews
GROUP BY mth, product
ORDER BY mth, product;

MySQL Solution
SELECT MONTH(submit_date) AS mth,
product_id AS product,
AVG(stars) AS avg_stars
FROM reviews
GROUP BY mth, product
ORDER BY mth, product;
• Q.627

Question


Write a SQL query to compute the average quantity of each product sold per month for the
year 2021. You are given two tables: products and sales. The products table contains
information about Apple products, and the sales table contains data about product sales,
including quantity sold and the sale date.

Explanation
To solve this problem:
• Join the products and sales tables on the product_id column.
• Filter the sales for the year 2021 using the YEAR() function.
• Extract the month from the date_of_sale to group the sales by month.
• Calculate the average quantity sold for each product per month.
• Group by the extracted month and the product name to get the monthly average sales.
• Order the results by month and product.

Datasets and SQL Schemas


Table creation
-- Creating products table
CREATE TABLE products (
product_id INTEGER,
product_name VARCHAR(100)
);

-- Creating sales table


CREATE TABLE sales (
sales_id INTEGER,
product_id INTEGER,
date_of_sale DATE,
quantity_sold INTEGER
);

Datasets
-- Inserting data into products table
INSERT INTO products (product_id, product_name)
VALUES
(1, 'iPhone 12'),
(2, 'Apple Watch'),
(3, 'MacBook Pro');

-- Inserting data into sales table


INSERT INTO sales (sales_id, product_id, date_of_sale, quantity_sold)
VALUES
(1, 1, '2021-01-10', 100),
(2, 1, '2021-01-15', 200),
(3, 2, '2021-01-20', 50),
(4, 2, '2021-02-15', 75),
(5, 3, '2021-02-10', 20);

Learnings
• Using JOIN to combine data from multiple tables based on a common column
(product_id).
• Using YEAR() to filter data for a specific year.
• Using MONTH() to extract the month from a date for grouping purposes.
• Using AVG() to calculate the average of a numeric column.
• Grouping and ordering results with GROUP BY and ORDER BY.


Solutions
PostgreSQL Solution
SELECT EXTRACT(MONTH FROM s.date_of_sale) AS "Month",
p.product_name,
AVG(s.quantity_sold) AS "Average_Sold"
FROM sales s
JOIN products p ON s.product_id = p.product_id
WHERE EXTRACT(YEAR FROM s.date_of_sale) = 2021
GROUP BY EXTRACT(MONTH FROM s.date_of_sale), p.product_name
ORDER BY "Month", p.product_name;

MySQL Solution
SELECT MONTH(s.date_of_sale) AS `Month`,
p.product_name,
AVG(s.quantity_sold) AS `Average_Sold`
FROM sales s
JOIN products p ON s.product_id = p.product_id
WHERE YEAR(s.date_of_sale) = 2021
GROUP BY MONTH(s.date_of_sale), p.product_name
ORDER BY `Month`, p.product_name;
-- Note: aliases use backticks, not single quotes; ORDER BY 'Month' would sort by a constant string.
• Q.628

Question
Write a SQL query to calculate the Add-to-Bag Conversion Rate for each product in the
Apple Store. The conversion rate is defined as the number of users who add a product to their
shopping bag (cart) after clicking on the product listing, divided by the total number of clicks
for that product. The result should be broken down by product_id.

Explanation
To calculate the conversion rate:
• LEFT JOIN the clicks table with the bag_adds table on product_id and user_id, so that
every click is kept and the clicks that led to an add-to-bag are matched.
• Count the total clicks for each product and the total number of successful adds-to-bag
(where add_id is not null).
• Calculate the conversion rate as the ratio of adds-to-bag to total clicks for each product.
• Group the result by product_id to get the conversion rate for each product.

Datasets and SQL Schemas


Table creation
-- Creating clicks table
CREATE TABLE clicks (
click_id INTEGER,
product_id INTEGER,
user_id INTEGER,
click_time TIMESTAMP -- use DATETIME in MySQL
);

-- Creating bag_adds table


CREATE TABLE bag_adds (
add_id INTEGER,
product_id INTEGER,
user_id INTEGER,
add_time TIMESTAMP -- use DATETIME in MySQL
);

Datasets


-- Inserting data into clicks table


INSERT INTO clicks (click_id, product_id, user_id, click_time)
VALUES
(1, 5001, 123, '2022-06-08 00:00:00'),
(2, 6001, 456, '2022-06-10 00:00:00'),
(3, 5001, 789, '2022-06-18 00:00:00'),
(4, 7001, 321, '2022-07-26 00:00:00'),
(5, 5001, 654, '2022-07-05 00:00:00');

-- Inserting data into bag_adds table


INSERT INTO bag_adds (add_id, product_id, user_id, add_time)
VALUES
(1, 5001, 123, '2022-06-08 00:02:00'),
(2, 6001, 456, '2022-06-10 00:01:00'),
(3, 5001, 789, '2022-06-18 00:03:00'),
(4, 7001, 321, '2022-07-26 00:04:00'),
(5, 5001, 985, '2022-07-05 00:05:00');

Learnings
• LEFT JOIN is used to ensure we include all clicks, even if there was no add-to-bag
action.
• Conditional aggregation (CASE WHEN) helps count only those records where an add
occurred.
• Aggregation with GROUP BY allows us to compute the conversion rate per product.

Solutions
PostgreSQL Solution
SELECT
    c.product_id,
    -- cast before dividing: integer / integer truncates to 0 or 1 in PostgreSQL
    SUM(CASE WHEN a.add_id IS NOT NULL THEN 1 ELSE 0 END)::DECIMAL / COUNT(c.click_id) AS conversion_rate
FROM
    clicks c
LEFT JOIN bag_adds a ON a.product_id = c.product_id AND a.user_id = c.user_id
GROUP BY c.product_id;

MySQL Solution
SELECT
c.product_id,
    SUM(CASE WHEN a.add_id IS NOT NULL THEN 1 ELSE 0 END) / COUNT(c.click_id) AS conversion_rate
FROM
clicks c
LEFT JOIN bag_adds a ON a.product_id = c.product_id AND a.user_id = c.user_id
GROUP BY c.product_id;
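In PostgreSQL (and in SQLite), dividing one integer aggregate by another truncates, which would round every conversion rate down to 0 or 1; multiplying by 1.0 or casting to DECIMAL forces real division. A sqlite3 sketch of the query against the sample rows (not part of the book's solutions):

```python
# Check the per-product conversion rate, forcing float division with 1.0 *.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clicks (click_id INT, product_id INT, user_id INT, click_time TEXT);
CREATE TABLE bag_adds (add_id INT, product_id INT, user_id INT, add_time TEXT);
INSERT INTO clicks VALUES
 (1,5001,123,'2022-06-08 00:00:00'),(2,6001,456,'2022-06-10 00:00:00'),
 (3,5001,789,'2022-06-18 00:00:00'),(4,7001,321,'2022-07-26 00:00:00'),
 (5,5001,654,'2022-07-05 00:00:00');
INSERT INTO bag_adds VALUES
 (1,5001,123,'2022-06-08 00:02:00'),(2,6001,456,'2022-06-10 00:01:00'),
 (3,5001,789,'2022-06-18 00:03:00'),(4,7001,321,'2022-07-26 00:04:00'),
 (5,5001,985,'2022-07-05 00:05:00');
""")
rows = conn.execute("""
SELECT c.product_id,
       1.0 * SUM(CASE WHEN a.add_id IS NOT NULL THEN 1 ELSE 0 END)
           / COUNT(c.click_id) AS conversion_rate
FROM clicks c
LEFT JOIN bag_adds a ON a.product_id = c.product_id AND a.user_id = c.user_id
GROUP BY c.product_id
ORDER BY c.product_id   -- ordered only to make the check deterministic
""").fetchall()
print(rows)
```

Product 5001 has three clicks (users 123, 789, 654) but only two matching adds (user 985's add never had a click), so its rate is 2/3; with integer division it would have come out as 0.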
• Q.629
Question
Write a SQL query to find all users who have more than one type of device (e.g., both an
iPhone and a MacBook) and are using more than 50GB of total iCloud storage across all their
devices. The result should include the UserID, UserName, total number of devices, and total
storage used. Order the results by the total storage used in descending order.

Explanation
To solve this:
• Join the Users, Devices, and StorageUsage tables:
• Users table contains user details.
• Devices table contains device details, including the type of device.


• StorageUsage table contains the amount of iCloud storage used for each device.
• Count the distinct device types for each user to check if they have more than one type of
device.
• Sum the total storage used for each user across all their devices.
• Filter results using HAVING to ensure:
• The user has more than one device type.
• The total storage usage is greater than 50GB.
• Group the results by UserID and UserName to get the summary for each user.
• Order the results by total storage used in descending order.

Datasets and SQL Schemas


Table creation
-- Creating Users table
CREATE TABLE Users (
UserID INT,
UserName VARCHAR(100),
Email VARCHAR(100),
Country VARCHAR(100)
);

-- Creating Devices table


CREATE TABLE Devices (
DeviceID INT,
UserID INT,
DeviceType VARCHAR(50),
PurchaseDate DATE
);

-- Creating StorageUsage table


CREATE TABLE StorageUsage (
DeviceID INT,
StorageUsed INT, -- Storage used in GB
LastUpdate DATE
);

Datasets
-- Inserting data into Users table
INSERT INTO Users (UserID, UserName, Email, Country)
VALUES
(1, 'John Doe', '[email protected]', 'USA'),
(2, 'Jane Smith', '[email protected]', 'Canada'),
(3, 'Alice Johnson', '[email protected]', 'UK');

-- Inserting data into Devices table


INSERT INTO Devices (DeviceID, UserID, DeviceType, PurchaseDate)
VALUES
(1, 1, 'iPhone', '2021-01-10'),
(2, 1, 'MacBook', '2022-05-20'),
(3, 2, 'iPhone', '2021-02-15'),
(4, 2, 'iPad', '2022-04-18'),
(5, 3, 'iPhone', '2021-03-25');

-- Inserting data into StorageUsage table


INSERT INTO StorageUsage (DeviceID, StorageUsed, LastUpdate)
VALUES
(1, 30, '2022-06-01'),
(2, 40, '2022-06-01'),
(3, 20, '2022-06-01'),
(4, 60, '2022-06-01'),
(5, 15, '2022-06-01');

Learnings


• JOIN: Combining data from multiple tables based on shared keys (UserID and DeviceID).
• GROUP BY: Grouping data by UserID and UserName to aggregate information at the user
level.
• HAVING: Filtering the grouped results to ensure users have multiple device types and
exceed the storage usage threshold.
• COUNT(DISTINCT): Counting distinct device types to ensure the user has more than one
type of device.
• SUM(): Summing the total storage used by each user across their devices.

Solutions
PostgreSQL Solution
SELECT
u.UserID,
u.UserName,
COUNT(DISTINCT d.DeviceType) AS TotalDevices,
SUM(s.StorageUsed) AS TotalStorageUsed
FROM
Users u
JOIN
Devices d ON u.UserID = d.UserID
JOIN
StorageUsage s ON d.DeviceID = s.DeviceID
GROUP BY
u.UserID,
u.UserName
HAVING
COUNT(DISTINCT d.DeviceType) > 1
AND SUM(s.StorageUsed) > 50
ORDER BY
TotalStorageUsed DESC;

MySQL Solution
SELECT
u.UserID,
u.UserName,
COUNT(DISTINCT d.DeviceType) AS TotalDevices,
SUM(s.StorageUsed) AS TotalStorageUsed
FROM
Users u
JOIN
Devices d ON u.UserID = d.UserID
JOIN
StorageUsage s ON d.DeviceID = s.DeviceID
GROUP BY
u.UserID,
u.UserName
HAVING
COUNT(DISTINCT d.DeviceType) > 1
AND SUM(s.StorageUsed) > 50
ORDER BY
TotalStorageUsed DESC;
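On the sample rows both John (30 + 40 = 70GB across two device types) and Jane (20 + 60 = 80GB) clear the filters, while Alice (one device, 15GB) does not. A quick sqlite3 check of the solution, which is portable as written (a sketch, not part of the book's solutions):

```python
# Run the multi-device / >50GB query against the printed sample data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserID INT, UserName TEXT, Email TEXT, Country TEXT);
CREATE TABLE Devices (DeviceID INT, UserID INT, DeviceType TEXT, PurchaseDate TEXT);
CREATE TABLE StorageUsage (DeviceID INT, StorageUsed INT, LastUpdate TEXT);
INSERT INTO Users VALUES (1,'John Doe','[email protected]','USA'),
 (2,'Jane Smith','[email protected]','Canada'),(3,'Alice Johnson','[email protected]','UK');
INSERT INTO Devices VALUES (1,1,'iPhone','2021-01-10'),(2,1,'MacBook','2022-05-20'),
 (3,2,'iPhone','2021-02-15'),(4,2,'iPad','2022-04-18'),(5,3,'iPhone','2021-03-25');
INSERT INTO StorageUsage VALUES (1,30,'2022-06-01'),(2,40,'2022-06-01'),
 (3,20,'2022-06-01'),(4,60,'2022-06-01'),(5,15,'2022-06-01');
""")
rows = conn.execute("""
SELECT u.UserID, u.UserName,
       COUNT(DISTINCT d.DeviceType) AS TotalDevices,
       SUM(s.StorageUsed) AS TotalStorageUsed
FROM Users u
JOIN Devices d ON u.UserID = d.UserID
JOIN StorageUsage s ON d.DeviceID = s.DeviceID
GROUP BY u.UserID, u.UserName
HAVING COUNT(DISTINCT d.DeviceType) > 1 AND SUM(s.StorageUsed) > 50
ORDER BY TotalStorageUsed DESC
""").fetchall()
print(rows)
```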
• Q.630
Device Upgrade Frequency
Write a SQL query to calculate the average number of months between each user's device
purchases. Only consider users who have more than one device. The result should include the
UserID, UserName, and the average number of months between their device purchases. Order
the results by the average number of months in descending order.
Explanation


To solve this:
• Join the Users and Devices tables to get device purchase information for each user.
• Filter to include only users who have more than one device.
• Calculate the months between consecutive device purchases for each user.
• Aggregate the results to calculate the average number of months between device purchases
for each user.
• Group by UserID and UserName and order by the average number of months in
descending order.
Datasets and SQL Schemas
-- Creating Users table
CREATE TABLE Users (
UserID INT,
UserName VARCHAR(100),
Email VARCHAR(100),
Country VARCHAR(100)
);

-- Creating Devices table


CREATE TABLE Devices (
DeviceID INT,
UserID INT,
DeviceType VARCHAR(50),
PurchaseDate DATE
);

Datasets
-- Inserting data into Users table
INSERT INTO Users (UserID, UserName, Email, Country)
VALUES
(1, 'John Doe', '[email protected]', 'USA'),
(2, 'Jane Smith', '[email protected]', 'Canada'),
(3, 'Alice Johnson', '[email protected]', 'UK');

-- Inserting data into Devices table


INSERT INTO Devices (DeviceID, UserID, DeviceType, PurchaseDate)
VALUES
(1, 1, 'iPhone', '2021-01-10'),
(2, 1, 'MacBook', '2022-05-20'),
(3, 2, 'iPhone', '2021-02-15'),
(4, 2, 'iPad', '2022-04-18'),
(5, 3, 'iPhone', '2021-03-25'),
(6, 3, 'MacBook', '2022-02-10');

Solutions
PostgreSQL Solution
SELECT
    u.UserID,
    u.UserName,
    -- d2.PurchaseDate - d1.PurchaseDate yields an integer day count in PostgreSQL, which
    -- EXTRACT cannot read; AGE() returns an interval whose year/month parts give the gap
    AVG(EXTRACT(YEAR FROM AGE(d2.PurchaseDate, d1.PurchaseDate)) * 12
        + EXTRACT(MONTH FROM AGE(d2.PurchaseDate, d1.PurchaseDate))) AS avg_months_between_purchases
FROM
    Users u
JOIN
    Devices d1 ON u.UserID = d1.UserID
JOIN
    Devices d2 ON u.UserID = d2.UserID
WHERE
    d1.PurchaseDate < d2.PurchaseDate -- single-device users produce no pair, so no HAVING filter is needed
GROUP BY
    u.UserID, u.UserName
ORDER BY
    avg_months_between_purchases DESC;
-- note: with more than two devices per user this averages every earlier/later pair;
-- a LAG() window would measure strictly consecutive purchases

MySQL Solution
SELECT
    u.UserID,
    u.UserName,
    AVG(TIMESTAMPDIFF(MONTH, d1.PurchaseDate, d2.PurchaseDate)) AS avg_months_between_purchases
FROM
    Users u
JOIN
    Devices d1 ON u.UserID = d1.UserID
JOIN
    Devices d2 ON u.UserID = d2.UserID
WHERE
    d1.PurchaseDate < d2.PurchaseDate -- single-device users produce no pair, so no HAVING filter is needed
GROUP BY
    u.UserID, u.UserName
ORDER BY
    avg_months_between_purchases DESC;
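MySQL's TIMESTAMPDIFF(MONTH, start, end) counts complete months, so it helps to pre-compute the expected output by hand. A small pure-Python helper with the same complete-month semantics (an illustration of the counting rule, not an exact reimplementation of MySQL's internals):

```python
# Complete months between two dates, TIMESTAMPDIFF(MONTH, ...)-style.
from datetime import date

def months_between(start: date, end: date) -> int:
    """Whole months from start to end; a partially elapsed month does not count."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:  # the final month is not yet complete
        months -= 1
    return months

# One purchase gap per sample user:
john = months_between(date(2021, 1, 10), date(2022, 5, 20))   # → 16
jane = months_between(date(2021, 2, 15), date(2022, 4, 18))   # → 14
alice = months_between(date(2021, 3, 25), date(2022, 2, 10))  # → 10
print(john, jane, alice)
```

Note Alice's gap: 2021-03-25 to 2022-02-10 spans 11 calendar months, but the last one is incomplete (the 10th is before the 25th), so the count is 10.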
• Q.631
Device Types per User
Write a SQL query to calculate the number of distinct device types each user owns and filter
the results to show only those users who have purchased at least three different device types.
The output should include UserID, UserName, and the total number of distinct device types
they own, ordered by the total device types in descending order.
Explanation
To solve this:
• Join the Users and Devices tables to get the device type for each user.
• Count the distinct device types for each user using COUNT(DISTINCT).
• Filter users who have purchased at least three different device types.
• Group by UserID and UserName and order by the number of distinct device types in
descending order.
Datasets and SQL Schemas
-- Creating Users table
CREATE TABLE Users (
UserID INT,
UserName VARCHAR(100),
Email VARCHAR(100),
Country VARCHAR(100)
);

-- Creating Devices table


CREATE TABLE Devices (
DeviceID INT,
UserID INT,
DeviceType VARCHAR(50),
PurchaseDate DATE
);

Datasets
-- Inserting data into Users table
INSERT INTO Users (UserID, UserName, Email, Country)
VALUES
(1, 'John Doe', '[email protected]', 'USA'),
(2, 'Jane Smith', '[email protected]', 'Canada'),
(3, 'Alice Johnson', '[email protected]', 'UK');


-- Inserting data into Devices table


INSERT INTO Devices (DeviceID, UserID, DeviceType, PurchaseDate)
VALUES
(1, 1, 'iPhone', '2021-01-10'),
(2, 1, 'MacBook', '2022-05-20'),
(3, 1, 'iPad', '2022-06-25'),
(4, 2, 'iPhone', '2021-02-15'),
(5, 2, 'iPad', '2022-04-18'),
(6, 3, 'iPhone', '2021-03-25'),
(7, 3, 'MacBook', '2022-02-10'),
(8, 3, 'Apple Watch', '2022-05-30');

Solutions
PostgreSQL Solution
SELECT
u.UserID,
u.UserName,
COUNT(DISTINCT d.DeviceType) AS TotalDeviceTypes
FROM
Users u
JOIN
Devices d ON u.UserID = d.UserID
GROUP BY
u.UserID, u.UserName
HAVING
COUNT(DISTINCT d.DeviceType) >= 3
ORDER BY
TotalDeviceTypes DESC;

MySQL Solution
SELECT
u.UserID,
u.UserName,
COUNT(DISTINCT d.DeviceType) AS TotalDeviceTypes
FROM
Users u
JOIN
Devices d ON u.UserID = d.UserID
GROUP BY
u.UserID, u.UserName
HAVING
COUNT(DISTINCT d.DeviceType) >= 3
ORDER BY
TotalDeviceTypes DESC;
• Q.632
Total Device Storage per User
Write a SQL query to calculate the total iCloud storage used by each user across all their
devices. Only include users who have more than one device and filter for users who are using
more than 100GB of total storage. The result should include UserID, UserName, and
TotalStorageUsed, ordered by TotalStorageUsed in descending order.
Explanation
To solve this:
• Join the Users, Devices, and StorageUsage tables.
• Sum the total storage used for each user across all their devices.
• Filter to include only users who have more than one device and are using more than
100GB of storage.
• Group by UserID and UserName.
• Order the results by TotalStorageUsed in descending order.
Datasets and SQL Schemas


-- Creating Users table


CREATE TABLE Users (
UserID INT,
UserName VARCHAR(100),
Email VARCHAR(100),
Country VARCHAR(100)
);

-- Creating Devices table


CREATE TABLE Devices (
DeviceID INT,
UserID INT,
DeviceType VARCHAR(50),
PurchaseDate DATE
);

-- Creating StorageUsage table


CREATE TABLE StorageUsage (
DeviceID INT,
StorageUsed INT, -- Storage used in GB
LastUpdate DATE
);

Datasets
-- Inserting data into Users table
INSERT INTO Users (UserID, UserName, Email, Country)
VALUES
(1, 'John Doe', '[email protected]', 'USA'),
(2, 'Jane Smith', '[email protected]', 'Canada'),
(3, 'Alice Johnson', '[email protected]', 'UK');

-- Inserting data into Devices table


INSERT INTO Devices (DeviceID, UserID, DeviceType, PurchaseDate)
VALUES
(1, 1, 'iPhone', '2021-01-10'),
(2, 1, 'MacBook', '2022-05-20'),
(3, 2, 'iPhone', '2021-02-15'),
(4, 2, 'iPad', '2022-04-18'),
(5, 3, 'iPhone', '2021-03-25'),
(6, 3, 'MacBook', '2022-02-10');

-- Inserting data into StorageUsage table


INSERT INTO StorageUsage (DeviceID, StorageUsed, LastUpdate)
VALUES
(1, 30, '2022-06-01'),
(2, 60, '2022-06-01'),
(3, 20, '2022-06-01'),
(4, 50, '2022-06-01'),
(5, 15, '2022-06-01'),
(6, 80, '2022-06-01');

Solutions

PostgreSQL Solution
SELECT
u.UserID,
u.UserName,
SUM(s.StorageUsed) AS TotalStorageUsed
FROM
Users u
JOIN
Devices d ON u.UserID = d.UserID
JOIN
StorageUsage s ON d.DeviceID = s.DeviceID
GROUP BY
u.UserID, u.UserName
HAVING
COUNT(d.DeviceID) > 1
AND SUM(s.StorageUsed) > 100
ORDER BY
TotalStorageUsed DESC;

MySQL Solution
SELECT
u.UserID,
u.UserName,
SUM(s.StorageUsed) AS TotalStorageUsed
FROM
Users u
JOIN
Devices d ON u.UserID = d.UserID
JOIN
StorageUsage s ON d.DeviceID = s.DeviceID
GROUP BY
u.UserID, u.UserName
HAVING
COUNT(d.DeviceID) > 1
AND SUM(s.StorageUsed) > 100
ORDER BY
TotalStorageUsed DESC;
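A detail worth calling out in an interview: on this sample data the query legitimately returns zero rows, because no user crosses 100GB (John 30 + 60 = 90GB, Jane 20 + 50 = 70GB, Alice 15 + 80 = 95GB). A sqlite3 check confirms the empty result (a sketch, not part of the book's solutions):

```python
# Confirm that no sample user exceeds 100GB, so the result set is empty.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserID INT, UserName TEXT, Email TEXT, Country TEXT);
CREATE TABLE Devices (DeviceID INT, UserID INT, DeviceType TEXT, PurchaseDate TEXT);
CREATE TABLE StorageUsage (DeviceID INT, StorageUsed INT, LastUpdate TEXT);
INSERT INTO Users VALUES (1,'John Doe','[email protected]','USA'),
 (2,'Jane Smith','[email protected]','Canada'),(3,'Alice Johnson','[email protected]','UK');
INSERT INTO Devices VALUES (1,1,'iPhone','2021-01-10'),(2,1,'MacBook','2022-05-20'),
 (3,2,'iPhone','2021-02-15'),(4,2,'iPad','2022-04-18'),
 (5,3,'iPhone','2021-03-25'),(6,3,'MacBook','2022-02-10');
INSERT INTO StorageUsage VALUES (1,30,'2022-06-01'),(2,60,'2022-06-01'),
 (3,20,'2022-06-01'),(4,50,'2022-06-01'),(5,15,'2022-06-01'),(6,80,'2022-06-01');
""")
rows = conn.execute("""
SELECT u.UserID, u.UserName, SUM(s.StorageUsed) AS TotalStorageUsed
FROM Users u
JOIN Devices d ON u.UserID = d.UserID
JOIN StorageUsage s ON d.DeviceID = s.DeviceID
GROUP BY u.UserID, u.UserName
HAVING COUNT(d.DeviceID) > 1 AND SUM(s.StorageUsed) > 100
ORDER BY TotalStorageUsed DESC
""").fetchall()
print(rows)
```

Being able to explain why a correct query returns nothing is often as valuable as the query itself.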
• Q.633
Most Popular Device in a Given Month
Write a SQL query to find the most popular device purchased in each month of 2022. The
popularity of a device is determined by the number of purchases (i.e., the total quantity sold)
in that month. The result should include the Month, DeviceType, and the
TotalQuantitySold, ordered by the Month and TotalQuantitySold in descending order.
Explanation
To solve this:
• Join the Devices and Sales tables to get the device type and the quantity sold for each
sale.
• Extract the month and year from the PurchaseDate to group the data by month.
• Sum the quantity_sold for each device in each month.
• Group the results by month and device type, and order by the Month and
TotalQuantitySold in descending order to show the most popular devices.

Datasets and SQL Schemas


-- Creating Devices table
CREATE TABLE Devices (
DeviceID INT,
DeviceType VARCHAR(50),
PurchaseDate DATE
);

-- Creating Sales table


CREATE TABLE Sales (
SaleID INT,
DeviceID INT,
QuantitySold INT,
SaleDate DATE
);

Datasets
-- Inserting data into Devices table
INSERT INTO Devices (DeviceID, DeviceType, PurchaseDate)
VALUES
(1, 'iPhone', '2021-01-10'),
(2, 'MacBook', '2022-05-20'),
(3, 'iPad', '2022-06-15'),
(4, 'AirPods', '2022-06-20'),
(5, 'Apple Watch', '2022-07-05');


-- Inserting data into Sales table


INSERT INTO Sales (SaleID, DeviceID, QuantitySold, SaleDate)
VALUES
(1, 1, 150, '2022-01-15'),
(2, 2, 200, '2022-05-18'),
(3, 3, 50, '2022-06-10'),
(4, 4, 80, '2022-06-20'),
(5, 1, 120, '2022-06-30'),
(6, 5, 180, '2022-07-05');

Solutions
PostgreSQL Solution
SELECT
EXTRACT(MONTH FROM s.SaleDate) AS Month,
d.DeviceType,
SUM(s.QuantitySold) AS TotalQuantitySold
FROM
Sales s
JOIN
Devices d ON s.DeviceID = d.DeviceID
WHERE
EXTRACT(YEAR FROM s.SaleDate) = 2022
GROUP BY
Month, d.DeviceType
ORDER BY
Month, TotalQuantitySold DESC;

MySQL Solution
SELECT
MONTH(s.SaleDate) AS Month,
d.DeviceType,
SUM(s.QuantitySold) AS TotalQuantitySold
FROM
Sales s
JOIN
Devices d ON s.DeviceID = d.DeviceID
WHERE
YEAR(s.SaleDate) = 2022
GROUP BY
Month, d.DeviceType
ORDER BY
Month, TotalQuantitySold DESC;
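As written, the solutions rank every device within each month; if the interviewer wants strictly one row per month, a ROW_NUMBER() window (PostgreSQL, MySQL 8+, SQLite 3.25+) can pick the top seller. A sqlite3 sketch over the sample rows (not part of the book's solutions):

```python
# Top-selling device per month via ROW_NUMBER() over the monthly totals.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Devices (DeviceID INT, DeviceType TEXT, PurchaseDate TEXT);
CREATE TABLE Sales (SaleID INT, DeviceID INT, QuantitySold INT, SaleDate TEXT);
INSERT INTO Devices VALUES (1,'iPhone','2021-01-10'),(2,'MacBook','2022-05-20'),
 (3,'iPad','2022-06-15'),(4,'AirPods','2022-06-20'),(5,'Apple Watch','2022-07-05');
INSERT INTO Sales VALUES (1,1,150,'2022-01-15'),(2,2,200,'2022-05-18'),
 (3,3,50,'2022-06-10'),(4,4,80,'2022-06-20'),(5,1,120,'2022-06-30'),
 (6,5,180,'2022-07-05');
""")
rows = conn.execute("""
WITH monthly AS (
    SELECT strftime('%m', s.SaleDate) AS month,
           d.DeviceType,
           SUM(s.QuantitySold) AS total_qty
    FROM Sales s
    JOIN Devices d ON s.DeviceID = d.DeviceID
    WHERE strftime('%Y', s.SaleDate) = '2022'
    GROUP BY month, d.DeviceType
),
ranked AS (
    SELECT month, DeviceType, total_qty,
           ROW_NUMBER() OVER (PARTITION BY month ORDER BY total_qty DESC) AS rn
    FROM monthly
)
SELECT month, DeviceType, total_qty FROM ranked WHERE rn = 1 ORDER BY month
""").fetchall()
print(rows)
```

In June three devices sold (iPad 50, AirPods 80, iPhone 120), and only the iPhone row survives the rn = 1 filter.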
• Q.634
Users with the Most Devices
Write a SQL query to find the top 3 users who own the most devices. The result should
include UserID, UserName, and the TotalDevices, ordered by TotalDevices in descending
order.
Explanation
To solve this:
• Join the Users and Devices tables to get device ownership details for each user.
• Count the total number of devices for each user.
• Order the result by the number of devices in descending order.
• Limit the result to show only the top 3 users with the most devices.
Datasets and SQL Schemas
-- Creating Users table
CREATE TABLE Users (
UserID INT,
UserName VARCHAR(100),
Email VARCHAR(100)
);

-- Creating Devices table


CREATE TABLE Devices (
DeviceID INT,
UserID INT,
DeviceType VARCHAR(50),
PurchaseDate DATE
);

Datasets
-- Inserting data into Users table
INSERT INTO Users (UserID, UserName, Email)
VALUES
(1, 'John Doe', '[email protected]'),
(2, 'Jane Smith', '[email protected]'),
(3, 'Alice Johnson', '[email protected]'),
(4, 'Robert Brown', '[email protected]');

-- Inserting data into Devices table


INSERT INTO Devices (DeviceID, UserID, DeviceType, PurchaseDate)
VALUES
(1, 1, 'iPhone', '2021-01-10'),
(2, 1, 'MacBook', '2022-05-20'),
(3, 1, 'Apple Watch', '2022-06-25'),
(4, 2, 'iPad', '2021-02-15'),
(5, 2, 'MacBook', '2022-04-18'),
(6, 3, 'iPhone', '2021-03-25'),
(7, 3, 'MacBook', '2022-02-10'),
(8, 3, 'AirPods', '2022-07-05'),
(9, 4, 'iPhone', '2021-01-05');

Solutions
PostgreSQL Solution
SELECT
u.UserID,
u.UserName,
COUNT(d.DeviceID) AS TotalDevices
FROM
Users u
JOIN
Devices d ON u.UserID = d.UserID
GROUP BY
u.UserID, u.UserName
ORDER BY
TotalDevices DESC
LIMIT 3;

MySQL Solution
SELECT
u.UserID,
u.UserName,
COUNT(d.DeviceID) AS TotalDevices
FROM
Users u
JOIN
Devices d ON u.UserID = d.UserID
GROUP BY
u.UserID, u.UserName
ORDER BY
TotalDevices DESC
LIMIT 3;
• Q.635
Average Device Age by Device Type


Write a SQL query to calculate the average age of each device type as of January 1, 2023.
The result should include the DeviceType and the AverageDeviceAge, ordered by the
DeviceType.

Explanation
To solve this:
• Join the Devices table to calculate the age of each device as of January 1, 2023.
• Calculate the age of each device by subtracting the PurchaseDate from the fixed date
(January 1, 2023).
• Group the results by DeviceType.
• Calculate the average age for each device type.
Datasets and SQL Schemas
-- Creating Devices table
CREATE TABLE Devices (
DeviceID INT,
DeviceType VARCHAR(50),
PurchaseDate DATE
);

Datasets
-- Inserting data into Devices table
INSERT INTO Devices (DeviceID, DeviceType, PurchaseDate)
VALUES
(1, 'iPhone', '2021-01-10'),
(2, 'MacBook', '2022-05-20'),
(3, 'iPad', '2021-02-15'),
(4, 'AirPods', '2022-06-25'),
(5, 'Apple Watch', '2021-03-25');

Solutions
PostgreSQL Solution
SELECT
    DeviceType,
    -- DATE - DATE yields an integer day count in PostgreSQL, which EXTRACT cannot read;
    -- AGE() returns an interval with a usable year component
    AVG(EXTRACT(YEAR FROM AGE(DATE '2023-01-01', PurchaseDate))) AS AverageDeviceAge
FROM
    Devices
GROUP BY
    DeviceType
ORDER BY
    DeviceType;

MySQL Solution
SELECT
DeviceType,
AVG(TIMESTAMPDIFF(YEAR, PurchaseDate, '2023-01-01')) AS AverageDeviceAge
FROM
Devices
GROUP BY
DeviceType
ORDER BY
DeviceType;
• Q.636
Most Profitable Device by Region
Write a SQL query to find the most profitable device by region. Profitability is determined by
the total revenue generated from sales of each device, where revenue is calculated by
multiplying the quantity sold by the sale price. The result should include Region,
DeviceType, and TotalRevenue, and it should be ordered by Region and TotalRevenue in
descending order.
Explanation
To solve this:
• Join the Sales, Devices, and Regions tables to link sales, devices, and the region they
were sold in.
• Calculate the revenue for each sale by multiplying the quantity sold by the device price.
• Sum the total revenue for each device in each region.
• Group by Region and DeviceType, and order the rows so that the highest-revenue device
appears first within each region.
Datasets and SQL Schemas
-- Creating Devices table
CREATE TABLE Devices (
DeviceID INT,
DeviceType VARCHAR(50),
SalePrice DECIMAL(10, 2)
);

-- Creating Sales table


CREATE TABLE Sales (
SaleID INT,
DeviceID INT,
QuantitySold INT,
SaleDate DATE
);

-- Creating Regions table


CREATE TABLE Regions (
SaleID INT,
Region VARCHAR(50)
);

Datasets
-- Inserting data into Devices table
INSERT INTO Devices (DeviceID, DeviceType, SalePrice)
VALUES
(1, 'iPhone', 999.99),
(2, 'MacBook', 1999.99),
(3, 'iPad', 799.99),
(4, 'AirPods', 249.99),
(5, 'Apple Watch', 399.99);

-- Inserting data into Sales table


INSERT INTO Sales (SaleID, DeviceID, QuantitySold, SaleDate)
VALUES
(1, 1, 150, '2022-01-10'),
(2, 2, 200, '2022-05-20'),
(3, 3, 50, '2022-06-15'),
(4, 4, 80, '2022-06-20'),
(5, 5, 120, '2022-07-05');

-- Inserting data into Regions table


INSERT INTO Regions (SaleID, Region)
VALUES
(1, 'North America'),
(2, 'Europe'),
(3, 'North America'),
(4, 'Europe'),
(5, 'Asia');

Solutions
PostgreSQL Solution


SELECT
r.Region,
d.DeviceType,
SUM(s.QuantitySold * d.SalePrice) AS TotalRevenue
FROM
Sales s
JOIN
Devices d ON s.DeviceID = d.DeviceID
JOIN
Regions r ON s.SaleID = r.SaleID
GROUP BY
r.Region, d.DeviceType
ORDER BY
r.Region, TotalRevenue DESC;

MySQL Solution
SELECT
r.Region,
d.DeviceType,
SUM(s.QuantitySold * d.SalePrice) AS TotalRevenue
FROM
Sales s
JOIN
Devices d ON s.DeviceID = d.DeviceID
JOIN
Regions r ON s.SaleID = r.SaleID
GROUP BY
r.Region, d.DeviceType
ORDER BY
r.Region, TotalRevenue DESC;
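To return only the single most profitable device per region, rather than the full ranking, the same ROW_NUMBER() pattern used elsewhere in this chapter applies. A sqlite3 sketch over the sample rows, selecting just the winning (Region, DeviceType) pairs to sidestep floating-point revenue comparisons (not part of the book's solutions):

```python
# Pick the single highest-revenue device per region with ROW_NUMBER().
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Devices (DeviceID INT, DeviceType TEXT, SalePrice REAL);
CREATE TABLE Sales (SaleID INT, DeviceID INT, QuantitySold INT, SaleDate TEXT);
CREATE TABLE Regions (SaleID INT, Region TEXT);
INSERT INTO Devices VALUES (1,'iPhone',999.99),(2,'MacBook',1999.99),
 (3,'iPad',799.99),(4,'AirPods',249.99),(5,'Apple Watch',399.99);
INSERT INTO Sales VALUES (1,1,150,'2022-01-10'),(2,2,200,'2022-05-20'),
 (3,3,50,'2022-06-15'),(4,4,80,'2022-06-20'),(5,5,120,'2022-07-05');
INSERT INTO Regions VALUES (1,'North America'),(2,'Europe'),
 (3,'North America'),(4,'Europe'),(5,'Asia');
""")
rows = conn.execute("""
WITH revenue AS (
    SELECT r.Region, d.DeviceType,
           SUM(s.QuantitySold * d.SalePrice) AS TotalRevenue
    FROM Sales s
    JOIN Devices d ON s.DeviceID = d.DeviceID
    JOIN Regions r ON s.SaleID = r.SaleID
    GROUP BY r.Region, d.DeviceType
),
ranked AS (
    SELECT Region, DeviceType,
           ROW_NUMBER() OVER (PARTITION BY Region ORDER BY TotalRevenue DESC) AS rn
    FROM revenue
)
SELECT Region, DeviceType FROM ranked WHERE rn = 1 ORDER BY Region
""").fetchall()
print(rows)
```

In North America the iPhone (150 × 999.99) beats the iPad (50 × 799.99), and in Europe the MacBook beats the AirPods.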

• Q.637
Device Compatibility Check
Write a SQL query to find all pairs of users who own compatible devices. Two users form a
compatible pair when one of them owns an iPhone and the other owns an iPad. The query
should return UserID1, UserID2, and the DevicePair as 'iPhone + iPad'. Ensure that the
result only contains unique pairs of users (i.e., (1, 2) and (2, 1) should not appear twice).
Explanation
To solve this:
• Join the Devices table with itself to compare the devices owned by two users.
• Filter the devices to include only 'iPhone' and 'iPad'.
• Pair users in both directions (UserID <> UserID), then collapse duplicates with
LEAST/GREATEST and DISTINCT so that (1, 2) and (2, 1) appear as a single row.
Datasets and SQL Schemas
-- Creating Users table
CREATE TABLE Users (
UserID INT,
UserName VARCHAR(100),
Email VARCHAR(100)
);

-- Creating Devices table


CREATE TABLE Devices (
DeviceID INT,
UserID INT,
DeviceType VARCHAR(50),
PurchaseDate DATE
);


Datasets
-- Inserting data into Users table
INSERT INTO Users (UserID, UserName, Email)
VALUES
(1, 'John Doe', '[email protected]'),
(2, 'Jane Smith', '[email protected]'),
(3, 'Alice Johnson', '[email protected]');

-- Inserting data into Devices table


INSERT INTO Devices (DeviceID, UserID, DeviceType, PurchaseDate)
VALUES
(1, 1, 'iPhone', '2021-01-10'),
(2, 1, 'iPad', '2022-05-20'),
(3, 2, 'iPhone', '2021-06-15'),
(4, 2, 'MacBook', '2022-07-10'),
(5, 3, 'iPad', '2022-06-25');

Solutions
PostgreSQL Solution
SELECT
    DISTINCT LEAST(d1.UserID, d2.UserID) AS UserID1,
    GREATEST(d1.UserID, d2.UserID) AS UserID2,
    'iPhone + iPad' AS DevicePair
FROM
    Devices d1
JOIN
    Devices d2 ON d1.UserID <> d2.UserID -- pair in both directions; LEAST/GREATEST + DISTINCT dedupes
WHERE
    d1.DeviceType = 'iPhone' AND d2.DeviceType = 'iPad';

MySQL Solution
SELECT
    DISTINCT LEAST(d1.UserID, d2.UserID) AS UserID1,
    GREATEST(d1.UserID, d2.UserID) AS UserID2,
    'iPhone + iPad' AS DevicePair
FROM
    Devices d1
JOIN
    Devices d2 ON d1.UserID <> d2.UserID -- pair in both directions; LEAST/GREATEST + DISTINCT dedupes
WHERE
    d1.DeviceType = 'iPhone' AND d2.DeviceType = 'iPad';
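Because either member of a pair may be the iPhone owner, the self-join needs to pair users in both directions (UserID <> UserID) and then canonicalise with LEAST/GREATEST plus DISTINCT; joining only on UserID1 < UserID2 silently drops pairs where the iPad owner has the smaller ID. A sqlite3 check (not part of the book's solutions; SQLite spells the two-argument LEAST/GREATEST as MIN/MAX):

```python
# Enumerate unique iPhone/iPad user pairs in both join directions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Devices (DeviceID INT, UserID INT, DeviceType TEXT, PurchaseDate TEXT);
INSERT INTO Devices VALUES (1,1,'iPhone','2021-01-10'),(2,1,'iPad','2022-05-20'),
 (3,2,'iPhone','2021-06-15'),(4,2,'MacBook','2022-07-10'),(5,3,'iPad','2022-06-25');
""")
rows = conn.execute("""
SELECT DISTINCT MIN(d1.UserID, d2.UserID) AS UserID1,
       MAX(d1.UserID, d2.UserID) AS UserID2,
       'iPhone + iPad' AS DevicePair
FROM Devices d1
JOIN Devices d2 ON d1.UserID <> d2.UserID
WHERE d1.DeviceType = 'iPhone' AND d2.DeviceType = 'iPad'
ORDER BY UserID1, UserID2
""").fetchall()
print(rows)
```

The pair (1, 2) comes from user 2's iPhone matching user 1's iPad; a one-directional `<` join would have missed it.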
• Q.638

Devices with Shared Apps


Write a SQL query to identify users who have at least two devices (e.g., an iPhone and an
iPad) that share the same app (by AppName; in this schema AppID identifies an individual
install, not the app itself). The query should return the UserID, DeviceType1,
DeviceType2, and the AppName shared between these devices.
Explanation
To solve this:
• Join the Devices table with the Apps table to track which apps are installed on which
devices.
• Filter for users who have at least two devices of different types (e.g., iPhone and iPad).
• Identify the shared AppName between the two devices of different types.
• Group by UserID and AppName to ensure uniqueness.
Datasets and SQL Schemas
-- Creating Devices table
CREATE TABLE Devices (
DeviceID INT,
UserID INT,
DeviceType VARCHAR(50),
PurchaseDate DATE
);

-- Creating Apps table


CREATE TABLE Apps (
AppID INT,
DeviceID INT,
AppName VARCHAR(100)
);

Datasets
-- Inserting data into Devices table
INSERT INTO Devices (DeviceID, UserID, DeviceType, PurchaseDate)
VALUES
(1, 1, 'iPhone', '2021-01-10'),
(2, 1, 'iPad', '2022-05-20'),
(3, 2, 'iPhone', '2021-06-15'),
(4, 2, 'MacBook', '2022-07-10'),
(5, 3, 'iPad', '2022-06-25');

-- Inserting data into Apps table


INSERT INTO Apps (AppID, DeviceID, AppName)
VALUES
(1, 1, 'Apple Music'),
(2, 2, 'Apple Music'),
(3, 2, 'iMessage'),
(4, 3, 'Apple Music'),
(5, 5, 'Safari');

Solutions
PostgreSQL Solution
SELECT
    d1.UserID,
    d1.DeviceType AS DeviceType1,
    d2.DeviceType AS DeviceType2,
    a.AppName
FROM
    Devices d1
JOIN
    Devices d2 ON d1.UserID = d2.UserID AND d1.DeviceID != d2.DeviceID
JOIN
    Apps a ON a.DeviceID = d1.DeviceID
WHERE
    d1.DeviceType != d2.DeviceType
    AND EXISTS (
        SELECT 1
        FROM Apps a2
        -- AppID is unique per install in this schema, so match installs by AppName
        WHERE a2.DeviceID = d2.DeviceID AND a2.AppName = a.AppName
    )
GROUP BY
    d1.UserID, d1.DeviceType, d2.DeviceType, a.AppName;

MySQL Solution
SELECT
    d1.UserID,
    d1.DeviceType AS DeviceType1,
    d2.DeviceType AS DeviceType2,
    a.AppName
FROM
    Devices d1
JOIN
    Devices d2 ON d1.UserID = d2.UserID AND d1.DeviceID != d2.DeviceID
JOIN
    Apps a ON a.DeviceID = d1.DeviceID
WHERE
    d1.DeviceType != d2.DeviceType
    AND EXISTS (
        SELECT 1
        FROM Apps a2
        -- AppID is unique per install in this schema, so match installs by AppName
        WHERE a2.DeviceID = d2.DeviceID AND a2.AppName = a.AppName
    )
GROUP BY
    d1.UserID, d1.DeviceType, d2.DeviceType, a.AppName;
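A subtlety in the sample data: AppID is unique per install (Apple Music is AppID 1 on one device and AppID 2 on another), so installs of the same app have to be matched by AppName; matching on AppID would return nothing here. A sqlite3 check, where only John's iPhone/iPad pair shares an app and the pair shows up once per direction (not part of the book's solutions):

```python
# Find same-user device pairs of different types that share an app by name.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Devices (DeviceID INT, UserID INT, DeviceType TEXT, PurchaseDate TEXT);
CREATE TABLE Apps (AppID INT, DeviceID INT, AppName TEXT);
INSERT INTO Devices VALUES (1,1,'iPhone','2021-01-10'),(2,1,'iPad','2022-05-20'),
 (3,2,'iPhone','2021-06-15'),(4,2,'MacBook','2022-07-10'),(5,3,'iPad','2022-06-25');
INSERT INTO Apps VALUES (1,1,'Apple Music'),(2,2,'Apple Music'),
 (3,2,'iMessage'),(4,3,'Apple Music'),(5,5,'Safari');
""")
rows = conn.execute("""
SELECT d1.UserID, d1.DeviceType AS DeviceType1,
       d2.DeviceType AS DeviceType2, a.AppName
FROM Devices d1
JOIN Devices d2 ON d1.UserID = d2.UserID AND d1.DeviceID != d2.DeviceID
JOIN Apps a ON a.DeviceID = d1.DeviceID
WHERE d1.DeviceType != d2.DeviceType
  AND EXISTS (SELECT 1 FROM Apps a2
              WHERE a2.DeviceID = d2.DeviceID AND a2.AppName = a.AppName)
GROUP BY d1.UserID, d1.DeviceType, d2.DeviceType, a.AppName
ORDER BY d1.UserID, DeviceType1
""").fetchall()
print(rows)
```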
• Q.639
Device Battery Health Check
Write a SQL query to find users who own multiple devices and have at least one device with
a battery health percentage of less than 30%. Return the UserID, UserName, the DeviceID,
DeviceType, and the BatteryHealth of the device with poor battery health.
Explanation
To solve this:
• Join the Devices table with the BatteryHealth table to track battery health percentages.
• Filter the users who own more than one device.
• Check if any device has a battery health of less than 30%.
• Return the UserID, DeviceID, DeviceType, and BatteryHealth of the devices with poor
battery health.
Datasets and SQL Schemas
-- Creating Users table (referenced by the solutions below)
CREATE TABLE Users (
UserID INT,
UserName VARCHAR(100),
Email VARCHAR(100)
);

-- Creating Devices table
CREATE TABLE Devices (
DeviceID INT,
UserID INT,
DeviceType VARCHAR(50),
PurchaseDate DATE
);

-- Creating BatteryHealth table


CREATE TABLE BatteryHealth (
DeviceID INT,
BatteryHealth INT
);

Datasets
-- Inserting data into Users table
INSERT INTO Users (UserID, UserName, Email)
VALUES
(1, 'John Doe', '[email protected]'),
(2, 'Jane Smith', '[email protected]'),
(3, 'Alice Johnson', '[email protected]');

-- Inserting data into Devices table
INSERT INTO Devices (DeviceID, UserID, DeviceType, PurchaseDate)
VALUES
(1, 1, 'iPhone', '2021-01-10'),
(2, 1, 'iPad', '2022-05-20'),
(3, 2, 'iPhone', '2021-06-15'),
(4, 2, 'MacBook', '2022-07-10'),
(5, 3, 'iPad', '2022-06-25');

-- Inserting data into BatteryHealth table


INSERT INTO BatteryHealth (DeviceID, BatteryHealth)
VALUES
(1, 85),
(2, 75),
(3, 20),
(4, 45),
(5, 29);

Solutions
PostgreSQL Solution
SELECT
d.UserID,
u.UserName,
d.DeviceID,
d.DeviceType,
b.BatteryHealth
FROM
Devices d
JOIN
BatteryHealth b ON d.DeviceID = b.DeviceID
JOIN
Users u ON d.UserID = u.UserID
WHERE
b.BatteryHealth < 30
AND EXISTS (
SELECT 1
FROM Devices d2
WHERE d2.UserID = d.UserID AND d2.DeviceID != d.DeviceID
)
ORDER BY
d.UserID, d.DeviceID;

MySQL Solution
SELECT
d.UserID,
u.UserName,
d.DeviceID,
d.DeviceType,
b.BatteryHealth
FROM
Devices d
JOIN
BatteryHealth b ON d.DeviceID = b.DeviceID
JOIN
Users u ON d.UserID = u.UserID
WHERE
b.BatteryHealth < 30
AND EXISTS (
SELECT 1
FROM Devices d2
WHERE d2.UserID = d.UserID AND d2.DeviceID != d.DeviceID
)
ORDER BY
d.UserID, d.DeviceID;
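On the sample rows only Jane qualifies: her iPhone is at 20% and she owns a second device, while Alice's 29% iPad is excluded because it is her only device. A sqlite3 check, assuming the same three Users rows used throughout this chapter (a sketch, not part of the book's solutions):

```python
# Multi-device users with at least one device under 30% battery health.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (UserID INT, UserName TEXT, Email TEXT);
CREATE TABLE Devices (DeviceID INT, UserID INT, DeviceType TEXT, PurchaseDate TEXT);
CREATE TABLE BatteryHealth (DeviceID INT, BatteryHealth INT);
INSERT INTO Users VALUES (1,'John Doe','[email protected]'),
 (2,'Jane Smith','[email protected]'),(3,'Alice Johnson','[email protected]');
INSERT INTO Devices VALUES (1,1,'iPhone','2021-01-10'),(2,1,'iPad','2022-05-20'),
 (3,2,'iPhone','2021-06-15'),(4,2,'MacBook','2022-07-10'),(5,3,'iPad','2022-06-25');
INSERT INTO BatteryHealth VALUES (1,85),(2,75),(3,20),(4,45),(5,29);
""")
rows = conn.execute("""
SELECT d.UserID, u.UserName, d.DeviceID, d.DeviceType, b.BatteryHealth
FROM Devices d
JOIN BatteryHealth b ON d.DeviceID = b.DeviceID
JOIN Users u ON d.UserID = u.UserID
WHERE b.BatteryHealth < 30
  AND EXISTS (SELECT 1 FROM Devices d2
              WHERE d2.UserID = d.UserID AND d2.DeviceID != d.DeviceID)
ORDER BY d.UserID, d.DeviceID
""").fetchall()
print(rows)
```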
• Q.640
Device Interaction Log
Write a SQL query to find the most frequent device interaction (e.g., pairing an iPhone with
an Apple Watch) that occurs for each user. An interaction is defined as a user pairing two
devices of different types (e.g., iPhone and Apple Watch). Return the UserID, DeviceType1,
DeviceType2, and the InteractionCount (number of pairings).
Explanation
To solve this:
• Join the Devices table with itself to track pairings between devices of different types for
the same user.
• Group by the UserID and the device types involved in the interaction.
• Count the interactions and return the most frequent interaction for each user.
Datasets and SQL Schemas
-- Creating Devices table
CREATE TABLE Devices (
DeviceID INT,
UserID INT,
DeviceType VARCHAR(50),
PurchaseDate DATE
);

Datasets


-- Inserting data into Devices table


INSERT INTO Devices (DeviceID, UserID, DeviceType, PurchaseDate)
VALUES
(1, 1, 'iPhone', '2021-01-10'),
(2, 1, 'Apple Watch', '2021-05-20'),
(3, 1, 'MacBook', '2021-06-25'),
(4, 2, 'iPhone', '2021-06-15'),
(5, 2, 'Apple Watch', '2021-08-01'),
(6, 3, 'iPhone', '2022-01-10'),
(7, 3, 'Apple Watch', '2022-03-15');

Solutions
PostgreSQL Solution
SELECT
    d1.UserID,
    d1.DeviceType AS DeviceType1,
    d2.DeviceType AS DeviceType2,
    COUNT(*) AS InteractionCount
FROM
    Devices d1
JOIN
    Devices d2 ON d1.UserID = d2.UserID
WHERE
    d1.DeviceType < d2.DeviceType -- '<' keeps one row per unordered pair, so pairings are not double-counted
GROUP BY
    d1.UserID, d1.DeviceType, d2.DeviceType
ORDER BY
    InteractionCount DESC;

MySQL Solution
SELECT
    d1.UserID,
    d1.DeviceType AS DeviceType1,
    d2.DeviceType AS DeviceType2,
    COUNT(*) AS InteractionCount
FROM
    Devices d1
JOIN
    Devices d2 ON d1.UserID = d2.UserID
WHERE
    d1.DeviceType < d2.DeviceType -- '<' keeps one row per unordered pair, so pairings are not double-counted
GROUP BY
    d1.UserID, d1.DeviceType, d2.DeviceType
ORDER BY
    InteractionCount DESC;
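A self-join on device type with a plain != comparison counts every pairing twice, since (iPhone, Apple Watch) and (Apple Watch, iPhone) come out as separate rows. Comparing the types with < keeps one canonical row per unordered pair (string ordering is collation-dependent, but any consistent order works for deduplication). A sqlite3 sketch (not part of the book's solutions):

```python
# Count each unordered device-type pairing exactly once per user.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Devices (DeviceID INT, UserID INT, DeviceType TEXT, PurchaseDate TEXT);
INSERT INTO Devices VALUES (1,1,'iPhone','2021-01-10'),(2,1,'Apple Watch','2021-05-20'),
 (3,1,'MacBook','2021-06-25'),(4,2,'iPhone','2021-06-15'),(5,2,'Apple Watch','2021-08-01'),
 (6,3,'iPhone','2022-01-10'),(7,3,'Apple Watch','2022-03-15');
""")
rows = conn.execute("""
SELECT d1.UserID,
       d1.DeviceType AS DeviceType1,
       d2.DeviceType AS DeviceType2,
       COUNT(*) AS InteractionCount
FROM Devices d1
JOIN Devices d2 ON d1.UserID = d2.UserID
WHERE d1.DeviceType < d2.DeviceType   -- canonical order: each pair counted once
GROUP BY d1.UserID, d1.DeviceType, d2.DeviceType
ORDER BY d1.UserID, DeviceType1, DeviceType2
""").fetchall()
print(rows)
```

User 1's three devices yield exactly three pairings; with != each would have been counted twice.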

Adobe
• Q.641
Find the total number of visitors (distinct users) to the website for each day.
Explanation
This query requires counting the number of distinct users (using DISTINCT) who visited the
website on each specific day. The data is stored in a website_visits table with user visits,
and the goal is to group the data by date.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE website_visits (
visit_id INT,
user_id INT,
visit_date DATE
);
• - Datasets


INSERT INTO website_visits (visit_id, user_id, visit_date)


VALUES
(1, 101, '2022-01-01'),
(2, 102, '2022-01-01'),
(3, 101, '2022-01-02'),
(4, 103, '2022-01-02'),
(5, 104, '2022-01-02');

Learnings
• Using COUNT(DISTINCT ...) to count unique visitors
• Grouping data by date using GROUP BY
Solutions
• - PostgreSQL solution
SELECT visit_date, COUNT(DISTINCT user_id) AS total_visitors
FROM website_visits
GROUP BY visit_date
ORDER BY visit_date;
• - MySQL solution
SELECT visit_date, COUNT(DISTINCT user_id) AS total_visitors
FROM website_visits
GROUP BY visit_date
ORDER BY visit_date;
• Q.642
Find the average time spent on the website per session for each user in the last 30 days.
Explanation
This question asks to calculate the average time spent per session for each user. We will need
a session_duration column that contains the duration (in seconds or minutes) of each
session. The query will filter sessions within the last 30 days, group by user, and calculate the
average session duration.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE user_sessions (
session_id INT,
user_id INT,
session_duration INT, -- Duration in seconds
session_date DATE
);
• - Datasets
INSERT INTO user_sessions (session_id, user_id, session_duration, session_date)
VALUES
(1, 101, 300, '2022-12-01'),
(2, 102, 400, '2022-12-02'),
(3, 101, 500, '2022-12-05'),
(4, 103, 250, '2022-12-05'),
(5, 101, 350, '2022-12-10');

Learnings
• Using AVG() to calculate the average session duration
• Filtering data using date ranges
• Grouping data by user to calculate individual averages
Solutions
• - PostgreSQL solution
SELECT user_id, AVG(session_duration) AS avg_session_duration
FROM user_sessions
WHERE session_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id
ORDER BY avg_session_duration DESC;
• - MySQL solution
SELECT user_id, AVG(session_duration) AS avg_session_duration
FROM user_sessions
WHERE session_date >= CURDATE() - INTERVAL 30 DAY
GROUP BY user_id
ORDER BY avg_session_duration DESC;
• Q.643

Identify the top 3 pages most visited in the last 7 days.
Explanation
This question asks you to find the most popular pages on the website based on the number of
visits. The page_visits table records visits to various pages, and we need to filter data by
the last 7 days and count the number of visits per page.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE page_visits (
visit_id INT,
page_id INT,
visit_date DATE
);
• - Datasets
INSERT INTO page_visits (visit_id, page_id, visit_date)
VALUES
(1, 201, '2022-12-05'),
(2, 202, '2022-12-06'),
(3, 201, '2022-12-06'),
(4, 203, '2022-12-06'),
(5, 202, '2022-12-07');

Learnings
• Using COUNT() to count the number of visits
• Filtering data by date to consider only the last 7 days
• Using LIMIT to get the top results
Solutions
• - PostgreSQL solution
SELECT page_id, COUNT(*) AS total_visits
FROM page_visits
WHERE visit_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY page_id
ORDER BY total_visits DESC
LIMIT 3;
• - MySQL solution
SELECT page_id, COUNT(*) AS total_visits
FROM page_visits
WHERE visit_date >= CURDATE() - INTERVAL 7 DAY
GROUP BY page_id
ORDER BY total_visits DESC
LIMIT 3;

• Q.644


Calculate the conversion rate of visitors who clicked on a campaign and then made a
purchase within 3 days.
Explanation
This query calculates the conversion rate for a specific campaign. It checks for users who
clicked on the campaign and then made a purchase within a 3-day window. The conversion
rate is calculated as the ratio of users who made a purchase to those who clicked on the
campaign.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE campaign_clicks (
click_id INT,
user_id INT,
campaign_id INT,
click_date DATE
);
• - Datasets
INSERT INTO campaign_clicks (click_id, user_id, campaign_id, click_date)
VALUES
(1, 101, 1, '2022-12-01'),
(2, 102, 1, '2022-12-02'),
(3, 103, 2, '2022-12-03'),
(4, 101, 1, '2022-12-05'),
(5, 104, 1, '2022-12-07');
• - Table creation
CREATE TABLE purchases (
purchase_id INT,
user_id INT,
purchase_date DATE
);
• - Datasets
INSERT INTO purchases (purchase_id, user_id, purchase_date)
VALUES
(1, 101, '2022-12-03'),
(2, 102, '2022-12-10'),
(3, 103, '2022-12-07'),
(4, 105, '2022-12-02');

Learnings
• Using JOIN to link clicks and purchases
• Filtering data based on time windows (3-day period)
• Calculating conversion rates
Solutions
• - PostgreSQL solution
WITH click_to_purchase AS (
    SELECT DISTINCT c.user_id, c.campaign_id
    FROM campaign_clicks c
    JOIN purchases p ON c.user_id = p.user_id
    WHERE p.purchase_date BETWEEN c.click_date AND c.click_date + INTERVAL '3 days'
)
SELECT ctp.campaign_id,
       COUNT(DISTINCT ctp.user_id) AS total_conversions,
       COUNT(DISTINCT ctp.user_id)::numeric * 100
           / (SELECT COUNT(DISTINCT user_id)
              FROM campaign_clicks cc
              WHERE cc.campaign_id = ctp.campaign_id) AS conversion_rate
FROM click_to_purchase ctp
GROUP BY ctp.campaign_id;
• - MySQL solution
WITH click_to_purchase AS (
    SELECT DISTINCT c.user_id, c.campaign_id
    FROM campaign_clicks c
    JOIN purchases p ON c.user_id = p.user_id
    WHERE p.purchase_date BETWEEN c.click_date AND DATE_ADD(c.click_date, INTERVAL 3 DAY)
)
SELECT ctp.campaign_id,
       COUNT(DISTINCT ctp.user_id) AS total_conversions,
       COUNT(DISTINCT ctp.user_id)
           / (SELECT COUNT(DISTINCT user_id)
              FROM campaign_clicks cc
              WHERE cc.campaign_id = ctp.campaign_id) * 100 AS conversion_rate
FROM click_to_purchase ctp
GROUP BY ctp.campaign_id;
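The 3-day window join is the part worth testing. The sketch below replays the datasets above in SQLite (note SQLite spells the offset date(col, '+3 days') instead of INTERVAL; also beware that dividing two COUNT() values in PostgreSQL performs integer division, so a cast to numeric is needed when computing the rate):

```python
import sqlite3

# Mirror the campaign_clicks / purchases datasets above in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE campaign_clicks (click_id INT, user_id INT, campaign_id INT, click_date TEXT)")
conn.execute("CREATE TABLE purchases (purchase_id INT, user_id INT, purchase_date TEXT)")
conn.executemany("INSERT INTO campaign_clicks VALUES (?, ?, ?, ?)",
    [(1, 101, 1, "2022-12-01"), (2, 102, 1, "2022-12-02"), (3, 103, 2, "2022-12-03"),
     (4, 101, 1, "2022-12-05"), (5, 104, 1, "2022-12-07")])
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, 101, "2022-12-03"), (2, 102, "2022-12-10"), (3, 103, "2022-12-07"), (4, 105, "2022-12-02")])

converted = conn.execute(
    """SELECT c.campaign_id, COUNT(DISTINCT c.user_id) AS total_conversions
       FROM campaign_clicks c
       JOIN purchases p ON p.user_id = c.user_id
        AND p.purchase_date BETWEEN c.click_date AND date(c.click_date, '+3 days')
       GROUP BY c.campaign_id"""
).fetchall()
print(converted)  # campaign 1: only user 101's 2022-12-01 click converts in the window
```

User 103 clicked campaign 2 on 2022-12-03 but purchased on 2022-12-07, one day past the inclusive 3-day window, so campaign 2 produces no conversions at all.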
• Q.645
Identify the users who made purchases in at least 2 different campaigns within the same
month.
Explanation
This query identifies users who were involved in at least two distinct campaigns in the same
month. It requires filtering purchases by campaign and grouping them by user and month,
with a condition that the user must appear in multiple campaigns in the same month.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE campaign_purchases (
purchase_id INT,
user_id INT,
campaign_id INT,
purchase_date DATE
);
• - Datasets
INSERT INTO campaign_purchases (purchase_id, user_id, campaign_id, purchase_date)
VALUES
(1, 101, 1, '2022-01-05'),
(2, 101, 2, '2022-01-10'),
(3, 102, 1, '2022-01-12'),
(4, 102, 3, '2022-01-18'),
(5, 103, 1, '2022-02-05'),
(6, 104, 2, '2022-02-15');

Learnings
• Using GROUP BY to group by user and month
• Using HAVING to filter users with purchases in multiple campaigns
• Date-based filtering using MONTH() and YEAR()
Solutions
• - PostgreSQL solution
SELECT user_id,
EXTRACT(MONTH FROM purchase_date) AS month,
EXTRACT(YEAR FROM purchase_date) AS year,
COUNT(DISTINCT campaign_id) AS campaign_count
FROM campaign_purchases
GROUP BY user_id, year, month
HAVING COUNT(DISTINCT campaign_id) >= 2;
• - MySQL solution
SELECT user_id,
MONTH(purchase_date) AS month,
YEAR(purchase_date) AS year,
COUNT(DISTINCT campaign_id) AS campaign_count
FROM campaign_purchases
GROUP BY user_id, year, month
HAVING COUNT(DISTINCT campaign_id) >= 2;
• Q.646
Find the average spend per user in each product category for the year 2022.


Explanation
This query calculates the average amount spent per user in each product category during the
year 2022. We need to aggregate the total spend per user per category and then compute the
average for each category.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE product_purchases (
purchase_id INT,
user_id INT,
category_id INT,
purchase_date DATE,
purchase_amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO product_purchases (purchase_id, user_id, category_id, purchase_date, purchase_amount)
VALUES
(1, 101, 1, '2022-01-01', 100.00),
(2, 101, 2, '2022-02-01', 200.00),
(3, 102, 1, '2022-03-01', 150.00),
(4, 103, 2, '2022-04-01', 250.00),
(5, 101, 1, '2022-05-01', 120.00),
(6, 102, 1, '2022-06-01', 300.00);

Learnings
• Using AVG() to calculate average spend per user
• Filtering data for a specific year
• Grouping by category to calculate averages per category
Solutions
• - PostgreSQL solution
SELECT category_id,
AVG(purchase_amount) AS avg_spend_per_user
FROM product_purchases
WHERE EXTRACT(YEAR FROM purchase_date) = 2022
GROUP BY category_id;
• - MySQL solution
SELECT category_id,
AVG(purchase_amount) AS avg_spend_per_user
FROM product_purchases
WHERE YEAR(purchase_date) = 2022
GROUP BY category_id;
• Q.647
Find the top 3 products with the highest total sales in the last quarter (3 months) of the
year.
Explanation
This query involves calculating the total sales for each product in the last quarter of the year
(October to December), then identifying the top 3 products with the highest sales. You will
need to filter the data for the relevant months and aggregate the sales by product.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE product_sales (
sale_id INT,
product_id INT,
sale_amount DECIMAL(10, 2),
    sale_date DATE
);
• - Datasets
INSERT INTO product_sales (sale_id, product_id, sale_amount, sale_date)
VALUES
(1, 101, 300.00, '2022-10-15'),
(2, 102, 500.00, '2022-11-05'),
(3, 101, 200.00, '2022-11-10'),
(4, 103, 400.00, '2022-12-01'),
(5, 101, 150.00, '2022-12-20'),
(6, 104, 600.00, '2022-12-25');

Learnings
• Filtering data for a specific time period (last quarter)
• Aggregating sales by product
• Using ORDER BY and LIMIT to find the top products
Solutions
• - PostgreSQL solution
SELECT product_id,
SUM(sale_amount) AS total_sales
FROM product_sales
WHERE sale_date BETWEEN '2022-10-01' AND '2022-12-31'
GROUP BY product_id
ORDER BY total_sales DESC
LIMIT 3;
• - MySQL solution
SELECT product_id,
SUM(sale_amount) AS total_sales
FROM product_sales
WHERE sale_date BETWEEN '2022-10-01' AND '2022-12-31'
GROUP BY product_id
ORDER BY total_sales DESC
LIMIT 3;
• Q.648
Find the number of distinct users who purchased a specific product in each month of
2022.
Explanation
This query involves counting the distinct users who purchased a specific product (e.g.,
product_id = 101) in each month of 2022. The goal is to show the number of distinct users
by month for the chosen product. This is a useful metric for tracking engagement with a
product over time.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE product_purchases (
purchase_id INT,
user_id INT,
product_id INT,
purchase_date DATE
);
• - Datasets
INSERT INTO product_purchases (purchase_id, user_id, product_id, purchase_date)
VALUES
(1, 101, 101, '2022-01-15'),
(2, 102, 101, '2022-02-10'),
(3, 103, 101, '2022-02-20'),
(4, 104, 101, '2022-03-05'),
(5, 105, 101, '2022-04-10'),
(6, 101, 101, '2022-04-15');


Learnings
• Using COUNT(DISTINCT ...) to count unique users
• Grouping by month and year using EXTRACT() or MONTH()
• Filtering for a specific product
Solutions
• - PostgreSQL solution
SELECT EXTRACT(MONTH FROM purchase_date) AS month,
COUNT(DISTINCT user_id) AS distinct_users
FROM product_purchases
WHERE product_id = 101 AND EXTRACT(YEAR FROM purchase_date) = 2022
GROUP BY month
ORDER BY month;
• - MySQL solution
SELECT MONTH(purchase_date) AS month,
COUNT(DISTINCT user_id) AS distinct_users
FROM product_purchases
WHERE product_id = 101 AND YEAR(purchase_date) = 2022
GROUP BY month
ORDER BY month;
• Q.649
Identify the top 5 products that have the highest average sales amount per transaction
in 2022.
Explanation
In this query, we calculate the average sale amount for each product in 2022. The goal is to
identify the top 5 products that generated the highest average sales per transaction. The AVG()
function will help calculate the average sale amount for each product.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE product_sales (
sale_id INT,
product_id INT,
sale_amount DECIMAL(10, 2),
sale_date DATE
);
• - Datasets
INSERT INTO product_sales (sale_id, product_id, sale_amount, sale_date)
VALUES
(1, 101, 200.00, '2022-01-10'),
(2, 102, 500.00, '2022-03-15'),
(3, 103, 700.00, '2022-05-20'),
(4, 101, 300.00, '2022-07-12'),
(5, 104, 450.00, '2022-08-25'),
(6, 101, 350.00, '2022-10-30'),
(7, 102, 600.00, '2022-11-01');

Learnings
• Using AVG() to calculate the average sales amount
• Filtering data for a specific year
• Sorting the results to find the top products based on average sales
Solutions
• - PostgreSQL solution
SELECT product_id,
AVG(sale_amount) AS avg_sale_amount
FROM product_sales
WHERE EXTRACT(YEAR FROM sale_date) = 2022
GROUP BY product_id
ORDER BY avg_sale_amount DESC
LIMIT 5;
• - MySQL solution
SELECT product_id,
AVG(sale_amount) AS avg_sale_amount
FROM product_sales
WHERE YEAR(sale_date) = 2022
GROUP BY product_id
ORDER BY avg_sale_amount DESC
LIMIT 5;
• Q.650
Identify the top 3 campaigns that had the highest number of conversions within the first
7 days after a user interacted with them, based on click-through and conversion actions.
Explanation
This query requires identifying the campaigns that had the highest conversion rate (users who
clicked and then converted within 7 days). You need to calculate the number of conversions
for each campaign and then identify the top 3 campaigns with the most conversions. The time
window between click and conversion is essential to accurately filter the results.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE campaign_interactions (
interaction_id INT,
user_id INT,
campaign_id INT,
interaction_type VARCHAR(50), -- 'Clicked' or 'Viewed'
interaction_date DATE
);
• - Datasets
INSERT INTO campaign_interactions (interaction_id, user_id, campaign_id, interaction_type, interaction_date)
VALUES
(1, 101, 1, 'Clicked', '2022-10-01'),
(2, 101, 1, 'Converted', '2022-10-03'),
(3, 102, 2, 'Clicked', '2022-10-05'),
(4, 102, 2, 'Converted', '2022-10-07'),
(5, 103, 3, 'Clicked', '2022-10-06'),
(6, 104, 3, 'Converted', '2022-10-14'),
(7, 105, 1, 'Clicked', '2022-10-07');

Learnings
• Using JOIN to link clicks with conversions
• Filtering actions based on a specific time window (7 days)
• Aggregating data based on campaign performance
• Using COUNT(DISTINCT ...) to count unique users
Solutions
• - PostgreSQL solution
WITH campaign_conversions AS (
SELECT ci.campaign_id, ci.user_id
FROM campaign_interactions ci
JOIN campaign_interactions cii
ON ci.user_id = cii.user_id AND ci.campaign_id = cii.campaign_id
WHERE ci.interaction_type = 'Clicked'
AND cii.interaction_type = 'Converted'
    AND cii.interaction_date BETWEEN ci.interaction_date AND ci.interaction_date + INTERVAL '7 days'
)
SELECT campaign_id, COUNT(DISTINCT user_id) AS total_conversions
FROM campaign_conversions
GROUP BY campaign_id
ORDER BY total_conversions DESC
LIMIT 3;
• - MySQL solution
WITH campaign_conversions AS (
SELECT ci.campaign_id, ci.user_id
FROM campaign_interactions ci
JOIN campaign_interactions cii
ON ci.user_id = cii.user_id AND ci.campaign_id = cii.campaign_id
WHERE ci.interaction_type = 'Clicked'
AND cii.interaction_type = 'Converted'
    AND cii.interaction_date BETWEEN ci.interaction_date AND DATE_ADD(ci.interaction_date, INTERVAL 7 DAY)
)
SELECT campaign_id, COUNT(DISTINCT user_id) AS total_conversions
FROM campaign_conversions
GROUP BY campaign_id
ORDER BY total_conversions DESC
LIMIT 3;
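The self-join can be replayed in SQLite to confirm that campaign 3 drops out, since its click and its conversion come from different users. date(col, '+7 days') is SQLite's spelling of the 7-day offset:

```python
import sqlite3

# Mirror the campaign_interactions dataset above.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE campaign_interactions (
    interaction_id INT, user_id INT, campaign_id INT,
    interaction_type TEXT, interaction_date TEXT)""")
conn.executemany("INSERT INTO campaign_interactions VALUES (?, ?, ?, ?, ?)",
    [(1, 101, 1, "Clicked", "2022-10-01"), (2, 101, 1, "Converted", "2022-10-03"),
     (3, 102, 2, "Clicked", "2022-10-05"), (4, 102, 2, "Converted", "2022-10-07"),
     (5, 103, 3, "Clicked", "2022-10-06"), (6, 104, 3, "Converted", "2022-10-14"),
     (7, 105, 1, "Clicked", "2022-10-07")])

rows = conn.execute(
    """SELECT ci.campaign_id, COUNT(DISTINCT ci.user_id) AS total_conversions
       FROM campaign_interactions ci
       JOIN campaign_interactions cii
         ON ci.user_id = cii.user_id AND ci.campaign_id = cii.campaign_id
       WHERE ci.interaction_type = 'Clicked'
         AND cii.interaction_type = 'Converted'
         AND cii.interaction_date BETWEEN ci.interaction_date
                                      AND date(ci.interaction_date, '+7 days')
       GROUP BY ci.campaign_id
       ORDER BY total_conversions DESC
       LIMIT 3"""
).fetchall()
print(sorted(rows))  # campaign 3 is absent: user 103 clicked but user 104 converted
```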
• Q.651
Determine the monthly user engagement (average number of sessions per user) for each
Adobe Experience Cloud product in 2022.
Explanation
This query calculates the average number of sessions per user for each Adobe Experience
Cloud product on a monthly basis in 2022. The goal is to understand user engagement with
different products. You will need to aggregate session data, group it by user and product, and
calculate the average number of sessions per user for each product.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE product_sessions (
session_id INT,
user_id INT,
product_id INT,
session_date DATE
);
• - Datasets
INSERT INTO product_sessions (session_id, user_id, product_id, session_date)
VALUES
(1, 101, 1, '2022-01-05'),
(2, 101, 2, '2022-01-10'),
(3, 102, 1, '2022-02-07'),
(4, 103, 3, '2022-02-12'),
(5, 101, 1, '2022-03-08'),
(6, 101, 1, '2022-03-10'),
(7, 104, 2, '2022-04-05');

Learnings
• Using COUNT(*) to calculate the number of sessions
• Filtering data based on the year (2022)
• Grouping by product and month
• Using AVG() to calculate the average number of sessions per user
Solutions
• - PostgreSQL solution
SELECT product_id,
EXTRACT(MONTH FROM session_date) AS month,
       AVG(session_count) AS avg_sessions_per_user
FROM (
    SELECT user_id, product_id, EXTRACT(MONTH FROM session_date) AS month, COUNT(*) AS session_count
FROM product_sessions
WHERE EXTRACT(YEAR FROM session_date) = 2022
GROUP BY user_id, product_id, month
) AS monthly_sessions
GROUP BY product_id, month
ORDER BY product_id, month;
• - MySQL solution
SELECT product_id,
MONTH(session_date) AS month,
AVG(session_count) AS avg_sessions_per_user
FROM (
SELECT user_id, product_id, MONTH(session_date) AS month, COUNT(*) AS session_count
FROM product_sessions
WHERE YEAR(session_date) = 2022
GROUP BY user_id, product_id, month
) AS monthly_sessions
GROUP BY product_id, month
ORDER BY product_id, month;
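The two-level aggregation (count per user first, then average those counts) is easy to get wrong, so here is a SQLite replay of the dataset above; strftime() stands in for EXTRACT()/MONTH(), which SQLite does not have:

```python
import sqlite3

# Inner query: sessions per user/product/month. Outer query: average of
# those per-user counts for each product and month.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_sessions (session_id INT, user_id INT, product_id INT, session_date TEXT)")
conn.executemany(
    "INSERT INTO product_sessions VALUES (?, ?, ?, ?)",
    [(1, 101, 1, "2022-01-05"), (2, 101, 2, "2022-01-10"),
     (3, 102, 1, "2022-02-07"), (4, 103, 3, "2022-02-12"),
     (5, 101, 1, "2022-03-08"), (6, 101, 1, "2022-03-10"),
     (7, 104, 2, "2022-04-05")],
)

rows = conn.execute(
    """SELECT product_id, month, AVG(session_count) AS avg_sessions_per_user
       FROM (
           SELECT user_id, product_id,
                  strftime('%m', session_date) AS month,
                  COUNT(*) AS session_count
           FROM product_sessions
           WHERE strftime('%Y', session_date) = '2022'
           GROUP BY user_id, product_id, month
       ) AS monthly_sessions
       GROUP BY product_id, month
       ORDER BY product_id, month"""
).fetchall()
print(rows)  # user 101 has two product-1 sessions in March, hence the 2.0
```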
• Q.652
Find the correlation between user activity (clicks and views) and product adoption rates
for Adobe Creative Cloud (e.g., Photoshop, Illustrator) for a given campaign.
Explanation
This question involves calculating the relationship between user activities (clicks and views)
and product adoption rates (the number of users who subscribed or started using the product)
for a specific campaign. The correlation metric is needed to understand the effect of user
engagement on product adoption. You will need to join activity data with product adoption
data and calculate the correlation.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE campaign_activity (
activity_id INT,
user_id INT,
campaign_id INT,
activity_type VARCHAR(50), -- 'Clicked' or 'Viewed'
activity_date DATE
);
• - Datasets
INSERT INTO campaign_activity (activity_id, user_id, campaign_id, activity_type, activity_date)
VALUES
(1, 101, 1, 'Clicked', '2022-01-01'),
(2, 101, 1, 'Viewed', '2022-01-02'),
(3, 102, 1, 'Clicked', '2022-01-05'),
(4, 103, 1, 'Viewed', '2022-01-06'),
(5, 104, 1, 'Clicked', '2022-01-07');
• - Table creation
CREATE TABLE product_adoption (
adoption_id INT,
user_id INT,
product_id INT,
campaign_id INT,
adoption_date DATE
);
• - Datasets
INSERT INTO product_adoption (adoption_id, user_id, product_id, campaign_id, adoption_date)
VALUES
(1, 101, 1, 1, '2022-01-10'),
(2, 102, 2, 1, '2022-01-15'),
(3, 103, 1, 1, '2022-01-20');

Learnings
• Joining activity data with adoption data
• Counting different activity types (clicks, views)
• Analyzing user engagement and adoption for correlation
• Correlation analysis techniques (though SQL itself won't calculate correlation directly, we
can prepare the data for such analysis)
Solutions
• - PostgreSQL solution
SELECT ac.campaign_id,
       COUNT(DISTINCT CASE WHEN ac.activity_type = 'Clicked' THEN ac.user_id END) AS clicks,
       COUNT(DISTINCT CASE WHEN ac.activity_type = 'Viewed' THEN ac.user_id END) AS views,
       COUNT(DISTINCT pa.user_id) AS product_adoptions
FROM campaign_activity ac
LEFT JOIN product_adoption pa ON ac.user_id = pa.user_id AND ac.campaign_id = pa.campaign_id
WHERE ac.campaign_id = 1
GROUP BY ac.campaign_id;
• - MySQL solution
SELECT ac.campaign_id,
       COUNT(DISTINCT CASE WHEN ac.activity_type = 'Clicked' THEN ac.user_id END) AS clicks,
       COUNT(DISTINCT CASE WHEN ac.activity_type = 'Viewed' THEN ac.user_id END) AS views,
       COUNT(DISTINCT pa.user_id) AS product_adoptions
FROM campaign_activity ac
LEFT JOIN product_adoption pa ON ac.user_id = pa.user_id AND ac.campaign_id = pa.campaign_id
WHERE ac.campaign_id = 1
GROUP BY ac.campaign_id;
• Q.653

Photoshop Revenue Analysis


Explanation
The task is to identify customers who purchased Photoshop and calculate the total revenue
spent on other products (excluding Photoshop). The result should include customer IDs and
the total spent on non-Photoshop products. The output should be sorted by customer IDs in
ascending order.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE adobe_transactions (
customer_id INT,
product VARCHAR(50),
revenue INT
);
• - Datasets
INSERT INTO adobe_transactions (customer_id, product, revenue)
VALUES
(123, 'Photoshop', 50),
(123, 'Premier Pro', 100),
(123, 'After Effects', 50),
(234, 'Illustrator', 200),
(234, 'Premier Pro', 100);

Learnings


• Using SUM() to calculate total revenue
• Using GROUP BY to aggregate results by customer
• Filtering with != and an IN subquery to exclude Photoshop itself while keeping Photoshop buyers
• Sorting the result by customer_id

Solutions
• - PostgreSQL solution
SELECT customer_id,
SUM(revenue) AS total_spent
FROM adobe_transactions
WHERE product != 'Photoshop'
AND customer_id IN (
SELECT DISTINCT customer_id
FROM adobe_transactions
WHERE product = 'Photoshop'
)
GROUP BY customer_id
ORDER BY customer_id;
• - MySQL solution
SELECT customer_id,
SUM(revenue) AS total_spent
FROM adobe_transactions
WHERE product != 'Photoshop'
AND customer_id IN (
SELECT DISTINCT customer_id
FROM adobe_transactions
WHERE product = 'Photoshop'
)
GROUP BY customer_id
ORDER BY customer_id;
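A quick SQLite replay of the dataset confirms that customer 234 is filtered out by the IN subquery because they never bought Photoshop, while customer 123's Photoshop row itself is excluded from the sum:

```python
import sqlite3

# Mirror the adobe_transactions dataset above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE adobe_transactions (customer_id INT, product TEXT, revenue INT)")
conn.executemany("INSERT INTO adobe_transactions VALUES (?, ?, ?)",
    [(123, "Photoshop", 50), (123, "Premier Pro", 100), (123, "After Effects", 50),
     (234, "Illustrator", 200), (234, "Premier Pro", 100)])

rows = conn.execute(
    """SELECT customer_id, SUM(revenue) AS total_spent
       FROM adobe_transactions
       WHERE product != 'Photoshop'
         AND customer_id IN (SELECT DISTINCT customer_id
                             FROM adobe_transactions
                             WHERE product = 'Photoshop')
       GROUP BY customer_id
       ORDER BY customer_id"""
).fetchall()
print(rows)  # [(123, 150)] -- Premier Pro (100) + After Effects (50)
```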
• Q.654
Adobe User Behavior
Explanation
The task is to identify active users of Adobe products who have used any product more than
4 times in a month and have provided reviews with a rating of 4 stars or higher. The
solution should return the user_id and name of those users who meet both conditions.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE users (
user_id INT,
name VARCHAR(100),
sign_up_date DATE
);
• - Datasets
INSERT INTO users (user_id, name, sign_up_date)
VALUES
(123, 'John Doe', '2022-05-01'),
(265, 'Jane Smith', '2022-05-10'),
(362, 'Alice Johnson', '2022-06-15'),
(192, 'Bob Brown', '2022-06-20'),
(981, 'Charlie Davis', '2022-07-01');
• - Table creation
CREATE TABLE usage (
user_id INT,
product VARCHAR(100),
usage_date DATE
);


• - Datasets
INSERT INTO usage (user_id, product, usage_date)
VALUES
(123, 'Photoshop', '2022-06-05'),
(123, 'Photoshop', '2022-06-06'),
(123, 'Photoshop', '2022-06-07'),
(123, 'Photoshop', '2022-06-08'),
(123, 'Photoshop', '2022-06-09'),
(265, 'Lightroom', '2022-07-10'),
(265, 'Lightroom', '2022-07-11'),
(265, 'Lightroom', '2022-07-12'),
(362, 'Illustrator', '2022-07-17'),
(362, 'Illustrator', '2022-07-18');
• - Table creation
CREATE TABLE reviews (
review_id INT,
user_id INT,
submit_date DATE,
product_id VARCHAR(100),
stars INT
);
• - Datasets
INSERT INTO reviews (review_id, user_id, submit_date, product_id, stars)
VALUES
(6171, 123, '2022-06-08', 'Photoshop', 4),
(5293, 362, '2022-07-18', 'Illustrator', 5),
(7802, 265, '2022-07-12', 'Lightroom', 4);

Learnings
• Using INNER JOIN to combine users and their activity data
• Using HAVING COUNT() to filter users based on activity frequency
• Using EXISTS to check for users who have relevant reviews
• Filtering data based on a dynamic month condition
• Aggregating usage data per user within a specific time frame

Solutions
• - PostgreSQL solution
SELECT u.user_id, u.name
FROM users u
INNER JOIN (
SELECT user_id
FROM usage
WHERE EXTRACT(MONTH FROM usage_date) = EXTRACT(MONTH FROM CURRENT_DATE)
GROUP BY user_id
HAVING COUNT(DISTINCT usage_date) > 4
) act ON u.user_id = act.user_id
WHERE EXISTS (
SELECT 1
FROM reviews r
WHERE r.user_id = u.user_id AND r.stars >= 4
);
• - MySQL solution
SELECT u.user_id, u.name
FROM users u
INNER JOIN (
SELECT user_id
FROM usage
WHERE MONTH(usage_date) = MONTH(CURRENT_DATE)
GROUP BY user_id
HAVING COUNT(DISTINCT usage_date) > 4
) act ON u.user_id = act.user_id
WHERE EXISTS (
    SELECT 1
FROM reviews r
WHERE r.user_id = u.user_id AND r.stars >= 4
);
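Because the solutions filter on the current month dynamically, they return nothing against the 2022 sample rows when run today. The sketch below pins the month to '2022-06' (an assumption made so the demo is reproducible) and otherwise mirrors the join / HAVING / EXISTS structure:

```python
import sqlite3

# Mirror the users / usage / reviews datasets (abridged to the rows that
# matter for the June filter).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INT, name TEXT, sign_up_date TEXT)")
conn.execute("CREATE TABLE usage (user_id INT, product TEXT, usage_date TEXT)")
conn.execute("CREATE TABLE reviews (review_id INT, user_id INT, submit_date TEXT, product_id TEXT, stars INT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
    [(123, "John Doe", "2022-05-01"), (265, "Jane Smith", "2022-05-10"),
     (362, "Alice Johnson", "2022-06-15")])
conn.executemany("INSERT INTO usage VALUES (?, ?, ?)",
    [(123, "Photoshop", f"2022-06-0{d}") for d in range(5, 10)] +
    [(265, "Lightroom", "2022-07-10"), (362, "Illustrator", "2022-07-17")])
conn.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?, ?)",
    [(6171, 123, "2022-06-08", "Photoshop", 4),
     (5293, 362, "2022-07-18", "Illustrator", 5)])

rows = conn.execute(
    """SELECT u.user_id, u.name
       FROM users u
       JOIN (
           SELECT user_id
           FROM usage
           WHERE strftime('%Y-%m', usage_date) = '2022-06'
           GROUP BY user_id
           HAVING COUNT(DISTINCT usage_date) > 4
       ) act ON u.user_id = act.user_id
       WHERE EXISTS (
           SELECT 1 FROM reviews r
           WHERE r.user_id = u.user_id AND r.stars >= 4
       )"""
).fetchall()
print(rows)  # only user 123 has >4 distinct June usage days AND a 4+ star review
```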
• Q.655
Average Monthly Subscription Duration
Explanation
The task is to calculate the average subscription duration for each Adobe product in terms
of days. The subscription duration is the difference between the start_date and end_date
for each subscription. After calculating the duration for each subscription, the average
duration for each product should be computed.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE subscriptions (
subscription_id INT,
user_id INT,
product_id INT,
start_date DATE,
end_date DATE
);
• - Datasets
INSERT INTO subscriptions (subscription_id, user_id, product_id, start_date, end_date)
VALUES
(1001, 50, 3001, '2022-01-01', '2022-04-01'),
(1002, 70, 3002, '2022-01-02', '2022-03-01'),
(1003, 90, 3001, '2022-01-03', '2022-01-04'),
(1004, 120, 3002, '2022-01-04', '2022-10-01'),
(1005, 200, 3003, '2022-01-05', '2022-08-01');
• - Table creation
CREATE TABLE products (
product_id INT,
product_name VARCHAR(100)
);
• - Datasets
INSERT INTO products (product_id, product_name)
VALUES
(3001, 'Adobe Photoshop'),
(3002, 'Adobe Acrobat'),
(3003, 'Adobe Illustrator');

Learnings
• Calculating the duration between two dates using DATEDIFF()
• Using AVG() to compute the average of the durations
• JOIN operation between subscriptions and products tables
• Grouping results by product name

Solutions
• - PostgreSQL solution
SELECT
P.product_name,
AVG(DATE_PART('day', S.end_date - S.start_date)) AS avg_subscription_days
FROM
subscriptions S
JOIN
    products P ON S.product_id = P.product_id
GROUP BY
P.product_name;
• - MySQL solution
SELECT
P.product_name,
AVG(DATEDIFF(S.end_date, S.start_date)) AS avg_subscription_days
FROM
subscriptions S
JOIN
products P ON S.product_id = P.product_id
GROUP BY
P.product_name;
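The day arithmetic can be checked without a database: MySQL's DATEDIFF(end, start) and PostgreSQL's end_date - start_date both return the whole-day difference, which Python's datetime.date subtraction reproduces exactly:

```python
from datetime import date

# Reproduce the per-subscription durations for the rows above, grouped by
# product_id, then average them the way the SQL does.
durations = {
    3001: [(date(2022, 4, 1) - date(2022, 1, 1)).days,   # subscription 1001: 90
           (date(2022, 1, 4) - date(2022, 1, 3)).days],  # subscription 1003: 1
    3002: [(date(2022, 3, 1) - date(2022, 1, 2)).days,   # subscription 1002: 58
           (date(2022, 10, 1) - date(2022, 1, 4)).days], # subscription 1004: 270
    3003: [(date(2022, 8, 1) - date(2022, 1, 5)).days],  # subscription 1005: 208
}
averages = {pid: sum(d) / len(d) for pid, d in durations.items()}
print(averages)  # {3001: 45.5, 3002: 164.0, 3003: 208.0}
```

Note the one-day subscription (1003) counts as 1 day, not 2: the difference is exclusive of the start day, which matches DATEDIFF semantics.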

• Q.656
Calculate the Average Spending of Customers
Explanation
The task is to calculate the average order value for each customer by joining the Customers
and Orders tables. The average order value for a customer is the average of the
total_amount from all their orders. The result should display each customer's customer_id,
first_name, last_name, and their corresponding average order value.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Customers (
customer_id INT,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100),
create_date DATE
);
• - Datasets
INSERT INTO Customers (customer_id, first_name, last_name, email, create_date)
VALUES
(101, 'John', 'Doe', '[email protected]', '2020-01-01'),
(102, 'Jane', 'Doe', '[email protected]', '2020-02-02'),
(103, 'Bob', 'Smith', '[email protected]', '2020-03-03'),
(104, 'Alice', 'Johnson', '[email protected]', '2020-04-04');
• - Table creation
CREATE TABLE Orders (
order_id INT,
customer_id INT,
order_date DATE,
total_amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO Orders (order_id, customer_id, order_date, total_amount)
VALUES
(201, 101, '2020-05-05', 200.00),
(202, 101, '2020-06-06', 300.00),
(203, 102, '2020-07-07', 400.00),
(204, 103, '2020-08-08', 500.00),
(205, 104, '2020-09-09', 600.00);

Learnings
• Using AVG() to calculate the average of a numeric column
• Joining two tables with JOIN based on a common column (customer_id)
• Grouping the result by customer details (customer_id, first_name, last_name)
• Calculating aggregate values per group in SQL

Solutions
• - PostgreSQL solution
SELECT c.customer_id, c.first_name, c.last_name, AVG(o.total_amount) AS avg_order_value
FROM Customers c
JOIN Orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name;
• - MySQL solution
SELECT c.customer_id, c.first_name, c.last_name, AVG(o.total_amount) AS avg_order_value
FROM Customers c
JOIN Orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name;
• Q.657
Calculate Product Rating and Popularity Based on Review Data
Explanation
The task is to calculate the popularity and average rating for each product based on review
data. The popularity is defined as the square root of the number of reviews for a product,
rounded to the nearest whole number. However, if a product has less than 10 reviews, it
should not be included in the popularity calculation, but its average rating should still be
computed. Products should be ranked by popularity and then by average rating in descending
order.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE reviews (
review_id INT,
user_id INT,
submit_date TIMESTAMP,
product_id INT,
stars INT
);
• - Datasets
INSERT INTO reviews (review_id, user_id, submit_date, product_id, stars)
VALUES
(6171, 123, '2022-06-08 00:00:00', 50001, 4),
(7802, 265, '2022-06-10 00:00:00', 69852, 4),
(5293, 362, '2022-06-18 00:00:00', 50001, 3),
(6352, 192, '2022-07-26 00:00:00', 69852, 3),
(5455, 891, '2022-07-12 00:00:00', 50001, 5),
(5831, 654, '2022-08-23 00:00:00', 69852, 2),
(6661, 892, '2022-06-21 00:00:00', 50001, 1);

Learnings
• Using COUNT(*) to count the number of reviews for each product
• Calculating popularity with the square root of the review count, using ROUND()
• Using AVG() to calculate the average rating for each product
• Applying conditions to handle products with fewer than 10 reviews
• Sorting the result by popularity and average rating

Solutions


• - PostgreSQL solution
SELECT product_id,
CASE
WHEN COUNT(*) >= 10 THEN ROUND(SQRT(COUNT(*)))
ELSE NULL
END AS popularity,
ROUND(AVG(stars)::numeric, 2) AS avg_rating
FROM reviews
GROUP BY product_id
ORDER BY popularity DESC NULLS LAST, avg_rating DESC;
• - MySQL solution
-- MySQL has no NULLS LAST; "popularity IS NULL" sorts NULL popularity last
SELECT product_id,
       CASE
           WHEN COUNT(*) >= 10 THEN ROUND(SQRT(COUNT(*)))
           ELSE NULL
       END AS popularity,
       ROUND(AVG(stars), 2) AS avg_rating
FROM reviews
GROUP BY product_id
ORDER BY popularity IS NULL, popularity DESC, avg_rating DESC;

Explanation of Solution
• COUNT(*) is used to calculate the number of reviews for each product.
• SQRT(COUNT(*)) calculates the square root of the number of reviews, and ROUND() rounds
it to the nearest integer.
• AVG(stars) calculates the average rating for each product, rounded to two decimal places.
• The CASE condition ensures that popularity is only calculated if the product has at least 10
reviews; otherwise, it is set to NULL.
• The result is sorted first by popularity (in descending order), and then by average rating
in descending order for products with the same popularity.
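The CASE / SQRT / ROUND rule is worth isolating. A small Python function expressing the same popularity rule (note Python's round() uses banker's rounding, which only differs from SQL's ROUND() at exact .5 values):

```python
import math

# Popularity as defined above: round(sqrt(review_count)), but only when a
# product has at least 10 reviews; otherwise no popularity score.
def popularity(review_count):
    return round(math.sqrt(review_count)) if review_count >= 10 else None

print(popularity(7))    # None -- below the 10-review threshold
print(popularity(10))   # 3   (sqrt(10) is about 3.16)
print(popularity(144))  # 12
```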
• Q.658
Analyzing AI-Based Product Usage Patterns
Explanation
Adobe uses AI to monitor and analyze how users engage with AI-driven tools across its
products. The task is to identify users who have used AI tools more than 5 times within a
given month and then categorize them by the product they used. Additionally, calculate the
total usage count and average session duration for these users.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE ai_usage (
usage_id INT,
user_id INT,
product_id INT,
ai_tool VARCHAR(100),
session_duration INT,
usage_date DATE
);
• - Datasets
INSERT INTO ai_usage (usage_id, user_id, product_id, ai_tool, session_duration, usage_date)
VALUES
(1, 101, 3001, 'AI-powered Filter', 120, '2023-10-05'),
(2, 102, 3002, 'AI Image Enhancement', 150, '2023-10-06'),
(3, 103, 3001, 'AI-powered Filter', 100, '2023-10-07'),
(4, 104, 3003, 'AI Object Detection', 180, '2023-10-10'),
(5, 105, 3001, 'AI-powered Filter', 110, '2023-10-11'),
(6, 101, 3002, 'AI Image Enhancement', 130, '2023-10-12'),
(7, 101, 3001, 'AI-powered Filter', 115, '2023-10-15'),
(8, 102, 3003, 'AI Object Detection', 140, '2023-10-16'),
(9, 105, 3003, 'AI Object Detection', 160, '2023-10-17');

Learnings
• Using COUNT() to calculate the number of times a tool has been used by a user
• Calculating the total usage count and average session duration for each user
• Grouping data by product and filtering for users with more than 5 tool uses within a month
• Sorting results to identify the most frequent users of AI tools

Solutions
• - PostgreSQL solution
SELECT u.user_id, p.product_id, COUNT(*) AS usage_count, AVG(session_duration) AS avg_session_duration
FROM ai_usage u
JOIN products p ON u.product_id = p.product_id
WHERE u.usage_date BETWEEN '2023-10-01' AND '2023-10-31'
GROUP BY u.user_id, p.product_id
HAVING COUNT(*) > 5
ORDER BY usage_count DESC, avg_session_duration DESC;
• - MySQL solution
SELECT u.user_id, p.product_id, COUNT(*) AS usage_count, AVG(session_duration) AS avg_session_duration
FROM ai_usage u
JOIN products p ON u.product_id = p.product_id
WHERE u.usage_date BETWEEN '2023-10-01' AND '2023-10-31'
GROUP BY u.user_id, p.product_id
HAVING COUNT(*) > 5
ORDER BY usage_count DESC, avg_session_duration DESC;
• Q.659
AI Tool Effectiveness in Adobe Products
Explanation
Adobe wants to evaluate how AI tools are impacting user performance across different
products. For this, we need to calculate the average rating of users who interacted with AI
tools and compare it against users who did not. The query should exclude users with fewer
than 3 AI interactions. Results should be ordered by the difference in ratings between AI and
non-AI users.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE user_ratings (
rating_id INT,
user_id INT,
product_id INT,
rating INT,
rating_date DATE
);
• - Datasets
INSERT INTO user_ratings (rating_id, user_id, product_id, rating, rating_date)
VALUES
(1, 101, 3001, 4, '2023-09-01'),
(2, 102, 3002, 5, '2023-09-03'),
(3, 103, 3001, 3, '2023-09-07'),
(4, 104, 3003, 4, '2023-09-10'),
(5, 105, 3001, 2, '2023-09-12'),
(6, 101, 3002, 5, '2023-09-14'),
(7, 102, 3003, 3, '2023-09-15');
• - Table creation for AI tool usage
CREATE TABLE ai_usage (
usage_id INT,
user_id INT,
ai_tool VARCHAR(100),
usage_date DATE
);
• - Datasets
INSERT INTO ai_usage (usage_id, user_id, ai_tool, usage_date)
VALUES
(1, 101, 'AI Image Enhancement', '2023-09-01'),
(2, 101, 'AI Object Detection', '2023-09-05'),
(3, 102, 'AI Image Enhancement', '2023-09-08'),
(4, 102, 'AI Object Detection', '2023-09-09'),
(5, 105, 'AI Image Enhancement', '2023-09-11');

Learnings
• Differentiating users based on whether they used AI tools or not
• Counting AI tool interactions with COUNT()
• Filtering users with at least 3 AI interactions
• Using AVG() to calculate ratings for AI and non-AI users
• Comparing average ratings for both groups and sorting results

Solutions
• - PostgreSQL solution
SELECT
    ur.product_id,
    AVG(CASE WHEN ai.usage_id IS NOT NULL THEN ur.rating ELSE NULL END) AS ai_avg_rating,
    AVG(CASE WHEN ai.usage_id IS NULL THEN ur.rating ELSE NULL END) AS non_ai_avg_rating,
    (AVG(CASE WHEN ai.usage_id IS NOT NULL THEN ur.rating ELSE NULL END) -
     AVG(CASE WHEN ai.usage_id IS NULL THEN ur.rating ELSE NULL END)) AS rating_difference
FROM user_ratings ur
LEFT JOIN ai_usage ai ON ur.user_id = ai.user_id
GROUP BY ur.product_id
HAVING COUNT(ai.usage_id) >= 3
ORDER BY rating_difference DESC;
• - MySQL solution
SELECT
    ur.product_id,
    AVG(CASE WHEN ai.usage_id IS NOT NULL THEN ur.rating ELSE NULL END) AS ai_avg_rating,
    AVG(CASE WHEN ai.usage_id IS NULL THEN ur.rating ELSE NULL END) AS non_ai_avg_rating,
    (AVG(CASE WHEN ai.usage_id IS NOT NULL THEN ur.rating ELSE NULL END) -
     AVG(CASE WHEN ai.usage_id IS NULL THEN ur.rating ELSE NULL END)) AS rating_difference
FROM user_ratings ur
LEFT JOIN ai_usage ai ON ur.user_id = ai.user_id
GROUP BY ur.product_id
HAVING COUNT(ai.usage_id) >= 3
ORDER BY rating_difference DESC;
• Q.660
Find the user(s) who spent the most money in consecutive transactions (in a single month).
Return the user ID, the month, and the sum of the consecutive transactions, where the
consecutive transactions are defined as a period of two or more transactions with no gaps of
more than 1 day in between.
Explanation
This complex query requires identifying "consecutive" transactions, which means
transactions where the gap between them is no more than 1 day. We then need to find the
highest spend for a user in such consecutive transaction periods within each month. The
challenge is to identify groups of consecutive transactions and calculate their total spend.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE transactions (
transaction_id INT,
user_id INT,
transaction_date DATE,
transaction_amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO transactions (transaction_id, user_id, transaction_date, transaction_amount)
VALUES
(1, 101, '2022-01-01', 100.00),
(2, 101, '2022-01-02', 150.00),
(3, 101, '2022-01-05', 200.00),
(4, 101, '2022-01-06', 50.00),
(5, 102, '2022-01-01', 300.00),
(6, 102, '2022-01-02', 400.00),
(7, 102, '2022-01-04', 500.00),
(8, 103, '2022-01-03', 250.00),
(9, 103, '2022-01-05', 350.00);

Learnings
• Use of window functions like LAG() and LEAD() to identify consecutive rows
• Use of date difference (DATEDIFF() or similar) to identify gaps
• Grouping consecutive transactions and calculating sums
• Dealing with transaction data that spans across multiple time periods
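The gaps-and-islands idea above can be checked end-to-end with Python's built-in sqlite3 module as a stand-in engine (LAG() and window SUM() need SQLite 3.25+, which ships with recent Python builds), using the dataset from this question:

```python
import sqlite3

# In-memory database with the transactions dataset from this question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    transaction_id INT, user_id INT,
    transaction_date DATE, transaction_amount DECIMAL(10, 2)
);
INSERT INTO transactions VALUES
 (1, 101, '2022-01-01', 100.00), (2, 101, '2022-01-02', 150.00),
 (3, 101, '2022-01-05', 200.00), (4, 101, '2022-01-06', 50.00),
 (5, 102, '2022-01-01', 300.00), (6, 102, '2022-01-02', 400.00),
 (7, 102, '2022-01-04', 500.00), (8, 103, '2022-01-03', 250.00),
 (9, 103, '2022-01-05', 350.00);
""")

# Gaps-and-islands: flag a new group whenever the gap to the user's
# previous transaction exceeds 1 day, turn the flags into group ids
# with a running SUM, then total each island of 2+ transactions.
top = conn.execute("""
WITH flagged AS (
    SELECT user_id, transaction_date, transaction_amount,
           CASE WHEN julianday(transaction_date)
                   - julianday(LAG(transaction_date) OVER
                       (PARTITION BY user_id ORDER BY transaction_date)) <= 1
                THEN 0 ELSE 1 END AS new_group
    FROM transactions
),
grouped AS (
    SELECT user_id, transaction_date, transaction_amount,
           SUM(new_group) OVER (PARTITION BY user_id
                                ORDER BY transaction_date) AS grp
    FROM flagged
)
SELECT user_id,
       strftime('%m', transaction_date) AS month,
       SUM(transaction_amount) AS total_spent
FROM grouped
GROUP BY user_id, month, grp
HAVING COUNT(*) >= 2
ORDER BY total_spent DESC
LIMIT 1;
""").fetchone()
print(top)  # (102, '01', 700.0)
```

User 102's January 1–2 run (300 + 400 = 700) beats every other island, while isolated transactions such as user 103's are filtered out by HAVING COUNT(*) >= 2.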
Solutions
• - PostgreSQL solution
WITH flagged AS (
    SELECT user_id, transaction_date, transaction_amount,
           CASE WHEN transaction_date - LAG(transaction_date)
                    OVER (PARTITION BY user_id ORDER BY transaction_date) <= 1
                THEN 0 ELSE 1 END AS new_group
    FROM transactions
),
grouped AS (
    SELECT user_id, transaction_date, transaction_amount,
           SUM(new_group) OVER (PARTITION BY user_id ORDER BY transaction_date) AS grp
    FROM flagged
)
SELECT user_id,
       EXTRACT(MONTH FROM transaction_date) AS month,
       SUM(transaction_amount) AS total_spent
FROM grouped
GROUP BY user_id, month, grp
HAVING COUNT(*) >= 2
ORDER BY total_spent DESC
LIMIT 1;
• - MySQL solution (8.0+)
WITH flagged AS (
    SELECT user_id, transaction_date, transaction_amount,
           CASE WHEN DATEDIFF(transaction_date, LAG(transaction_date)
                    OVER (PARTITION BY user_id ORDER BY transaction_date)) <= 1
                THEN 0 ELSE 1 END AS new_group
    FROM transactions
),
grouped AS (
    SELECT user_id, transaction_date, transaction_amount,
           SUM(new_group) OVER (PARTITION BY user_id ORDER BY transaction_date) AS grp
    FROM flagged
)
SELECT user_id,
       MONTH(transaction_date) AS month,
       SUM(transaction_amount) AS total_spent
FROM grouped
GROUP BY user_id, month, grp
HAVING COUNT(*) >= 2
ORDER BY total_spent DESC
LIMIT 1;

Samsung
• Q.661
Question
Write a SQL query to calculate the average rating (stars) for each product per month. The
submit_date is saved in the 'YYYY-MM-DD HH:MI:SS' format.

Explanation
You need to:
• Extract the month and year from the submit_date.
• Calculate the average rating (stars) for each product per month.
• Group the results by product_id and the extracted month/year combination.
• Order the result by month/year and product_id to make it easy to interpret.
Datasets and SQL Schemas
• - Reviews table creation
CREATE TABLE reviews (
review_id INT,
user_id INT,
submit_date TIMESTAMP,
product_id INT,
stars INT
);
• - Insert sample data into reviews table
INSERT INTO reviews (review_id, user_id, submit_date, product_id, stars)
VALUES
(6171, 123, '2022-06-08 00:00:00', 50001, 4),
(7802, 265, '2022-06-10 00:00:00', 69852, 4),
(5293, 362, '2022-06-18 00:00:00', 50001, 3),
(6352, 192, '2022-07-26 00:00:00', 69852, 3),
(4517, 981, '2022-07-05 00:00:00', 69852, 2);

Learnings
• date_part() to extract parts of a timestamp (e.g., month, year)
• AVG() to calculate the average of the ratings
• GROUP BY to group by product and time period
• ORDER BY to sort the results in chronological order by month and product
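As a quick sanity check, the same grouping can be run on SQLite via Python's sqlite3 module, with strftime() standing in for EXTRACT() / YEAR() / MONTH():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reviews (
    review_id INT, user_id INT, submit_date TIMESTAMP,
    product_id INT, stars INT
);
INSERT INTO reviews VALUES
 (6171, 123, '2022-06-08 00:00:00', 50001, 4),
 (7802, 265, '2022-06-10 00:00:00', 69852, 4),
 (5293, 362, '2022-06-18 00:00:00', 50001, 3),
 (6352, 192, '2022-07-26 00:00:00', 69852, 3),
 (4517, 981, '2022-07-05 00:00:00', 69852, 2);
""")

# strftime() plays the role of EXTRACT() / YEAR() / MONTH() here.
rows = conn.execute("""
SELECT strftime('%Y', submit_date) AS year,
       strftime('%m', submit_date) AS month,
       product_id,
       AVG(stars) AS avg_stars
FROM reviews
GROUP BY year, month, product_id
ORDER BY year, month, product_id;
""").fetchall()
for row in rows:
    print(row)
```

Product 50001 averages 3.5 in June, while product 69852 drops from 4.0 in June to 2.5 in July.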
Solutions
• - PostgreSQL solution
SELECT
EXTRACT(YEAR FROM submit_date) AS year,
EXTRACT(MONTH FROM submit_date) AS month,
product_id,
AVG(stars) AS avg_stars
FROM
reviews
GROUP BY
year, month, product_id
ORDER BY
year, month, product_id;


• - MySQL solution
SELECT
YEAR(submit_date) AS year,
MONTH(submit_date) AS month,
product_id,
AVG(stars) AS avg_stars
FROM
reviews
GROUP BY
year, month, product_id
ORDER BY
year, month, product_id;

Explanation of the Query


• EXTRACT(YEAR/MONTH) (PostgreSQL) or YEAR()/MONTH() (MySQL) is used to
extract the year and month from the submit_date.
• AVG(stars) calculates the average rating for each product.
• GROUP BY groups the data by both product_id and the extracted year/month.
• ORDER BY ensures that the data is ordered by year, month, and product_id to maintain
chronological order.
• Q.662
Question
Write a SQL query to identify the most popular smartphone model by region, based on the
total number of units sold. The data is stored across three tables: regions,
smartphone_models, and sales.
Explanation
You need to:
• Join the sales, regions, and smartphone_models tables on their respective keys
(region_id and model_id).
• Sum the units sold for each smartphone model by region.
• Group the data by region_name and model_name to get the total sales per model per
region.
• Order the results by region_name and total units sold in descending order to get the most
popular model at the top.
Datasets and SQL Schemas
• - Regions table creation
CREATE TABLE regions (
region_id INT,
region_name VARCHAR(255)
);
• - Smartphone models table creation
CREATE TABLE smartphone_models (
model_id INT,
model_name VARCHAR(255)
);
• - Sales table creation
CREATE TABLE sales (
sale_id INT,
region_id INT,
model_id INT,
units_sold INT
);
• - Insert sample data into regions table
INSERT INTO regions (region_id, region_name)
VALUES
(1, 'North America'),
(2, 'South America'),
(3, 'Europe'),
(4, 'Asia');
• - Insert sample data into smartphone_models table
INSERT INTO smartphone_models (model_id, model_name)
VALUES
(1, 'Galaxy S21'),
(2, 'Galaxy S21 Ultra'),
(3, 'Galaxy Note20'),
(4, 'Galaxy Z Fold2');
• - Insert sample data into sales table
INSERT INTO sales (sale_id, region_id, model_id, units_sold)
VALUES
(1, 1, 1, 500),
(2, 1, 2, 700),
(3, 1, 3, 300),
(4, 2, 1, 600),
(5, 2, 3, 800),
(6, 3, 2, 1000),
(7, 4, 4, 2000);

Learnings
• JOIN operations to link tables together based on related keys
• SUM() to calculate the total number of units sold
• GROUP BY to aggregate data by region and model
• ORDER BY to sort the results based on the total units sold per region
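The join-then-aggregate shape can be verified on this question's data with Python's sqlite3 module as a lightweight stand-in engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regions (region_id INT, region_name VARCHAR(255));
CREATE TABLE smartphone_models (model_id INT, model_name VARCHAR(255));
CREATE TABLE sales (sale_id INT, region_id INT, model_id INT, units_sold INT);
INSERT INTO regions VALUES
 (1, 'North America'), (2, 'South America'), (3, 'Europe'), (4, 'Asia');
INSERT INTO smartphone_models VALUES
 (1, 'Galaxy S21'), (2, 'Galaxy S21 Ultra'),
 (3, 'Galaxy Note20'), (4, 'Galaxy Z Fold2');
INSERT INTO sales VALUES
 (1, 1, 1, 500), (2, 1, 2, 700), (3, 1, 3, 300), (4, 2, 1, 600),
 (5, 2, 3, 800), (6, 3, 2, 1000), (7, 4, 4, 2000);
""")

# Two joins link the fact table to both dimensions; SUM() aggregates
# per (region, model) and the sort puts each region's best seller first.
rows = conn.execute("""
SELECT r.region_name, m.model_name, SUM(s.units_sold) AS total_sold
FROM sales s
JOIN regions r ON s.region_id = r.region_id
JOIN smartphone_models m ON s.model_id = m.model_id
GROUP BY r.region_name, m.model_name
ORDER BY r.region_name, total_sold DESC;
""").fetchall()
print(rows[0])  # ('Asia', 'Galaxy Z Fold2', 2000)
```

Each region's top seller appears first within its group; on this data the Galaxy Z Fold2 leads Asia with 2000 units.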
Solutions
• - PostgreSQL and MySQL solution
SELECT r.region_name,
m.model_name,
SUM(s.units_sold) AS total_sold
FROM sales s
JOIN regions r ON s.region_id = r.region_id
JOIN smartphone_models m ON s.model_id = m.model_id
GROUP BY r.region_name, m.model_name
ORDER BY r.region_name, total_sold DESC;

Explanation of the Query


• JOIN: The sales table is joined with the regions table using the region_id field, and
with the smartphone_models table using the model_id field to bring together the necessary
information.
• SUM(): The total units sold for each smartphone model in each region are calculated using
the SUM() aggregate function.
• GROUP BY: The query groups the results by region_name and model_name, so the sales
are aggregated by these two columns.
• ORDER BY: The result is ordered first by region_name, and within each region, the
models are ordered by the total units sold in descending order.
• Q.663
Question
Calculate the Average Purchase Price of Galaxy Models in 2022
Write a SQL query to calculate the average purchase price for all Galaxy models in 2022.


Explanation
You need to:
• Join the sales, smartphone_models, and products tables.
• Filter for sales in 2022.
• Calculate the average price for each Galaxy model by using the AVG() function.
• Make sure that only models containing "Galaxy" in their name are considered.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE sales (
sale_id INT,
region_id INT,
model_id INT,
units_sold INT,
sale_date DATE
);

CREATE TABLE smartphone_models (
    model_id INT,
    model_name VARCHAR(100)
);

CREATE TABLE products (
    product_id INT,
    model_id INT,
    price_usd DECIMAL(10, 2)
);
• - Datasets
-- Smartphone Models Data
INSERT INTO smartphone_models (model_id, model_name)
VALUES
(101, 'Galaxy S21'),
(102, 'Galaxy Note20'),
(103, 'Galaxy Z Fold3'),
(104, 'Galaxy S20'),
(105, 'Galaxy A52'),
(106, 'Galaxy Z Flip3'),
(107, 'Galaxy A72'),
(108, 'Galaxy S22'),
(109, 'Galaxy Note10');

-- Products Data (Price Info)
INSERT INTO products (product_id, model_id, price_usd)
VALUES
(1, 101, 799.99),
(2, 102, 999.99),
(3, 103, 1799.99),
(4, 104, 899.99),
(5, 105, 499.99),
(6, 106, 999.99),
(7, 107, 549.99),
(8, 108, 899.99),
(9, 109, 899.99);

-- Sales Data for 2022
INSERT INTO sales (sale_id, region_id, model_id, units_sold, sale_date)
VALUES
(1, 1, 101, 1500, '2022-01-21'),
(2, 1, 102, 1200, '2022-03-10'),
(3, 1, 104, 1800, '2022-05-05'),
(4, 1, 101, 2200, '2022-06-01'),
(5, 1, 105, 2200, '2022-07-15'),
(6, 1, 106, 1400, '2022-08-20'),
(7, 2, 101, 800, '2022-01-15'),
(8, 2, 103, 1500, '2022-02-25'),
(9, 2, 104, 900, '2022-04-10'),
(10, 2, 108, 1800, '2022-06-05'),
(11, 2, 109, 1300, '2022-07-25'),
(12, 3, 101, 2000, '2022-03-05'),
(13, 3, 103, 1200, '2022-05-15'),
(14, 3, 105, 1600, '2022-06-30'),
(15, 3, 106, 1900, '2022-08-10'),
(16, 3, 108, 1700, '2022-09-15'),
(17, 4, 102, 1100, '2022-04-01'),
(18, 4, 104, 950, '2022-05-20'),
(19, 4, 106, 800, '2022-07-05'),
(20, 4, 107, 1400, '2022-06-10'),
(21, 4, 108, 1300, '2022-08-25');

Learnings
• Joins: The query involves joining the sales, smartphone_models, and products tables to
link model information and product prices with sales data.
• Filtering by Year: Use the EXTRACT(YEAR FROM sale_date) = 2022 or
YEAR(sale_date) = 2022 condition to focus on sales that happened in 2022.
• Average Price Calculation: Use the AVG(price_usd) function to compute the average
purchase price for each Galaxy model.
• Filtering by Product Name: You can filter the smartphone_models table for models that
contain "Galaxy" in the model_name column.
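The whole pipeline — three-way join, year filter, LIKE filter, and AVG() — can be replayed on this question's data with Python's sqlite3 module, where strftime('%Y', ...) stands in for EXTRACT(YEAR ...) / YEAR():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INT, region_id INT, model_id INT,
                    units_sold INT, sale_date DATE);
CREATE TABLE smartphone_models (model_id INT, model_name VARCHAR(100));
CREATE TABLE products (product_id INT, model_id INT, price_usd DECIMAL(10, 2));
INSERT INTO smartphone_models VALUES
 (101, 'Galaxy S21'), (102, 'Galaxy Note20'), (103, 'Galaxy Z Fold3'),
 (104, 'Galaxy S20'), (105, 'Galaxy A52'), (106, 'Galaxy Z Flip3'),
 (107, 'Galaxy A72'), (108, 'Galaxy S22'), (109, 'Galaxy Note10');
INSERT INTO products VALUES
 (1, 101, 799.99), (2, 102, 999.99), (3, 103, 1799.99), (4, 104, 899.99),
 (5, 105, 499.99), (6, 106, 999.99), (7, 107, 549.99), (8, 108, 899.99),
 (9, 109, 899.99);
INSERT INTO sales VALUES
 (1, 1, 101, 1500, '2022-01-21'), (2, 1, 102, 1200, '2022-03-10'),
 (3, 1, 104, 1800, '2022-05-05'), (4, 1, 101, 2200, '2022-06-01'),
 (5, 1, 105, 2200, '2022-07-15'), (6, 1, 106, 1400, '2022-08-20'),
 (7, 2, 101, 800, '2022-01-15'), (8, 2, 103, 1500, '2022-02-25'),
 (9, 2, 104, 900, '2022-04-10'), (10, 2, 108, 1800, '2022-06-05'),
 (11, 2, 109, 1300, '2022-07-25'), (12, 3, 101, 2000, '2022-03-05'),
 (13, 3, 103, 1200, '2022-05-15'), (14, 3, 105, 1600, '2022-06-30'),
 (15, 3, 106, 1900, '2022-08-10'), (16, 3, 108, 1700, '2022-09-15'),
 (17, 4, 102, 1100, '2022-04-01'), (18, 4, 104, 950, '2022-05-20'),
 (19, 4, 106, 800, '2022-07-05'), (20, 4, 107, 1400, '2022-06-10'),
 (21, 4, 108, 1300, '2022-08-25');
""")

# strftime('%Y', ...) stands in for EXTRACT(YEAR ...) / YEAR().
rows = conn.execute("""
SELECT sm.model_name, AVG(p.price_usd) AS avg_purchase_price
FROM sales s
JOIN smartphone_models sm ON s.model_id = sm.model_id
JOIN products p ON sm.model_id = p.model_id
WHERE strftime('%Y', s.sale_date) = '2022'
  AND sm.model_name LIKE '%Galaxy%'
GROUP BY sm.model_name
ORDER BY avg_purchase_price DESC;
""").fetchall()
print(rows[0][0])  # Galaxy Z Fold3
```

Since each model has a single list price here, every average collapses to that price, and the Galaxy Z Fold3 (1799.99) tops the ordering.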

Solutions
• - PostgreSQL solution
SELECT
sm.model_name,
AVG(p.price_usd) AS avg_purchase_price
FROM
sales s
JOIN
smartphone_models sm ON s.model_id = sm.model_id
JOIN
products p ON sm.model_id = p.model_id
WHERE
EXTRACT(YEAR FROM s.sale_date) = 2022
AND sm.model_name LIKE '%Galaxy%'
GROUP BY
sm.model_name
ORDER BY
avg_purchase_price DESC;
• - MySQL solution
SELECT
sm.model_name,
AVG(p.price_usd) AS avg_purchase_price
FROM
sales s
JOIN
smartphone_models sm ON s.model_id = sm.model_id
JOIN
products p ON sm.model_id = p.model_id
WHERE
YEAR(s.sale_date) = 2022
AND sm.model_name LIKE '%Galaxy%'
GROUP BY
sm.model_name
ORDER BY
avg_purchase_price DESC;
• Q.664
Question
Find the Top 3 Samsung Products Based on Customer Ratings


Write a SQL query to find the top 3 Samsung products based on average customer ratings.

Explanation
You need to:
• Join the reviews and products tables to link product names with customer ratings.
• Group the results by product_id to calculate the average rating (AVG(stars)).
• Sort the results by average rating in descending order to get the top-rated products.
• Limit the results to the top 3 products.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE reviews (
review_id INT,
customer_id INT,
product_id INT,
stars INT
);

CREATE TABLE products (
    product_id INT,
    product_name VARCHAR(100)
);
• - Datasets
-- Products Data
INSERT INTO products (product_id, product_name)
VALUES
(101, 'Galaxy S21'),
(102, 'Galaxy Note20'),
(103, 'Galaxy Z Fold3'),
(104, 'Galaxy S20'),
(105, 'Galaxy A52'),
(106, 'Galaxy Z Flip3'),
(107, 'Galaxy A72'),
(108, 'Galaxy S22'),
(109, 'Galaxy Note10');

-- Reviews Data
INSERT INTO reviews (review_id, customer_id, product_id, stars)
VALUES
(1, 1, 101, 5), -- Galaxy S21 - 5 stars
(2, 2, 102, 4), -- Galaxy Note20 - 4 stars
(3, 3, 103, 5), -- Galaxy Z Fold3 - 5 stars
(4, 4, 104, 3), -- Galaxy S20 - 3 stars
(5, 5, 105, 4), -- Galaxy A52 - 4 stars
(6, 6, 106, 5), -- Galaxy Z Flip3 - 5 stars
(7, 7, 107, 4), -- Galaxy A72 - 4 stars
(8, 8, 108, 4), -- Galaxy S22 - 4 stars
(9, 9, 109, 2), -- Galaxy Note10 - 2 stars
(10, 10, 101, 5), -- Galaxy S21 - 5 stars
(11, 11, 102, 3), -- Galaxy Note20 - 3 stars
(12, 12, 103, 5), -- Galaxy Z Fold3 - 5 stars
(13, 13, 104, 4), -- Galaxy S20 - 4 stars
(14, 14, 105, 5), -- Galaxy A52 - 5 stars
(15, 15, 106, 4); -- Galaxy Z Flip3 - 4 stars

Learnings
• Joining Tables: The query involves joining the reviews and products tables on
product_id to link the product names with ratings.
• Grouping Data: The GROUP BY clause is used to group by product_id so that the average
rating can be calculated for each product.


• Aggregating Data: The AVG(stars) function is used to calculate the average rating for
each product.
• Sorting and Limiting Results: The ORDER BY clause sorts the products by their average
rating in descending order, and the LIMIT clause restricts the results to the top 3 products.
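The aggregate-sort-limit pattern can be tried on this question's data with Python's sqlite3 module as a stand-in engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reviews (review_id INT, customer_id INT, product_id INT, stars INT);
CREATE TABLE products (product_id INT, product_name VARCHAR(100));
INSERT INTO products VALUES
 (101, 'Galaxy S21'), (102, 'Galaxy Note20'), (103, 'Galaxy Z Fold3'),
 (104, 'Galaxy S20'), (105, 'Galaxy A52'), (106, 'Galaxy Z Flip3'),
 (107, 'Galaxy A72'), (108, 'Galaxy S22'), (109, 'Galaxy Note10');
INSERT INTO reviews VALUES
 (1, 1, 101, 5), (2, 2, 102, 4), (3, 3, 103, 5), (4, 4, 104, 3),
 (5, 5, 105, 4), (6, 6, 106, 5), (7, 7, 107, 4), (8, 8, 108, 4),
 (9, 9, 109, 2), (10, 10, 101, 5), (11, 11, 102, 3), (12, 12, 103, 5),
 (13, 13, 104, 4), (14, 14, 105, 5), (15, 15, 106, 4);
""")

# Average each product's stars, sort descending, and keep the top 3.
top3 = conn.execute("""
SELECT p.product_name, AVG(r.stars) AS avg_rating
FROM reviews r
JOIN products p ON r.product_id = p.product_id
GROUP BY p.product_name
ORDER BY avg_rating DESC
LIMIT 3;
""").fetchall()
for row in top3:
    print(row)
```

Galaxy S21 and Galaxy Z Fold3 both average 5.0, and the third slot is a 4.5 tie between Galaxy A52 and Galaxy Z Flip3 — with a plain LIMIT, which tied row survives is arbitrary, so in practice it is worth adding a deterministic tie-breaker such as ORDER BY avg_rating DESC, product_name.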

Solutions
• - PostgreSQL solution
SELECT
p.product_name,
AVG(r.stars) AS avg_rating
FROM
reviews r
JOIN
products p ON r.product_id = p.product_id
GROUP BY
p.product_name
ORDER BY
avg_rating DESC
LIMIT 3;
• - MySQL solution
SELECT
p.product_name,
AVG(r.stars) AS avg_rating
FROM
reviews r
JOIN
products p ON r.product_id = p.product_id
GROUP BY
p.product_name
ORDER BY
avg_rating DESC
LIMIT 3;
• Q.665

Question
Identify Customers Who Have Purchased More Than One Model in the Last Year
Write a SQL query to identify customers who have purchased more than one Samsung
Galaxy model in the last year.

Explanation
You need to:
• Join the customers and purchases tables to get customer information and their purchase
history.
• Filter for purchases made in the last year (i.e., the past 12 months).
• Use the HAVING clause to identify customers who have purchased more than one distinct
model.
• Return the customer's name, email, and the count of distinct Galaxy models purchased.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE customers (
customer_id INT,
name VARCHAR(100),
email VARCHAR(100)
);

CREATE TABLE purchases (
purchase_id INT,
customer_id INT,
product VARCHAR(100),
purchase_date DATE
);
• - Datasets
-- Customers Data
INSERT INTO customers (customer_id, name, email)
VALUES
(1, 'John Doe', '[email protected]'),
(2, 'Sophia Brown', '[email protected]'),
(3, 'Liam Smith', '[email protected]'),
(4, 'Ava Johnson', '[email protected]'),
(5, 'Noah Thompson', '[email protected]');

-- Purchases Data
INSERT INTO purchases (purchase_id, customer_id, product, purchase_date)
VALUES
(1, 1, 'Galaxy S21', '2023-01-15'),
(2, 1, 'Galaxy Z Flip3', '2023-02-10'),
(3, 2, 'Galaxy Note20', '2022-12-05'),
(4, 2, 'Galaxy A72', '2022-10-30'),
(5, 3, 'Galaxy S21', '2023-03-05'),
(6, 3, 'Galaxy Note20', '2023-04-01'),
(7, 4, 'Galaxy S20', '2023-06-25'),
(8, 4, 'Galaxy Z Fold3', '2023-07-15'),
(9, 5, 'Galaxy A52', '2022-09-10'),
(10, 5, 'Galaxy A72', '2022-11-20');

Learnings
• Joining Tables: The query involves joining the customers and purchases tables to get
the purchase history for each customer.
• Filtering by Date: The WHERE clause filters the purchases to include only those made in
the last year. You can use the CURRENT_DATE function and subtract INTERVAL 1 YEAR to get
the last 12 months.
• Grouping and Aggregating: The GROUP BY clause groups purchases by customer_id to
calculate the number of distinct products purchased per customer.
• Using HAVING: The HAVING clause ensures that only customers who purchased more
than one distinct product are selected.
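The filter-group-HAVING pipeline can be replayed on this question's data with Python's sqlite3 module; the "today" used for the 12-month window is pinned to 2023-07-31 (an assumption made only so the result is reproducible — in production you would use CURRENT_DATE / CURDATE()):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INT, name VARCHAR(100), email VARCHAR(100));
CREATE TABLE purchases (purchase_id INT, customer_id INT,
                        product VARCHAR(100), purchase_date DATE);
-- email addresses are redacted in the source data, so a placeholder is kept
INSERT INTO customers VALUES
 (1, 'John Doe', '[email protected]'), (2, 'Sophia Brown', '[email protected]'),
 (3, 'Liam Smith', '[email protected]'), (4, 'Ava Johnson', '[email protected]'),
 (5, 'Noah Thompson', '[email protected]');
INSERT INTO purchases VALUES
 (1, 1, 'Galaxy S21', '2023-01-15'), (2, 1, 'Galaxy Z Flip3', '2023-02-10'),
 (3, 2, 'Galaxy Note20', '2022-12-05'), (4, 2, 'Galaxy A72', '2022-10-30'),
 (5, 3, 'Galaxy S21', '2023-03-05'), (6, 3, 'Galaxy Note20', '2023-04-01'),
 (7, 4, 'Galaxy S20', '2023-06-25'), (8, 4, 'Galaxy Z Fold3', '2023-07-15'),
 (9, 5, 'Galaxy A52', '2022-09-10'), (10, 5, 'Galaxy A72', '2022-11-20');
""")

# date('2023-07-31', '-1 year') pins the cut-off at 2022-07-31.
rows = conn.execute("""
SELECT c.name, COUNT(DISTINCT p.product) AS distinct_products_purchased
FROM customers c
JOIN purchases p ON c.customer_id = p.customer_id
WHERE p.purchase_date >= date('2023-07-31', '-1 year')
GROUP BY c.customer_id, c.name
HAVING COUNT(DISTINCT p.product) > 1;
""").fetchall()
print(len(rows))  # 5 -- every customer bought 2 distinct models in the window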

Solutions
• - PostgreSQL solution
SELECT
c.name,
c.email,
COUNT(DISTINCT p.product) AS distinct_products_purchased
FROM
customers c
JOIN
purchases p ON c.customer_id = p.customer_id
WHERE
p.purchase_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY
c.customer_id
HAVING
COUNT(DISTINCT p.product) > 1;
• - MySQL solution
SELECT
c.name,
c.email,
COUNT(DISTINCT p.product) AS distinct_products_purchased
FROM
customers c

835
1000+ SQL Interview Questions & Answers | By Zero Analyst

JOIN
purchases p ON c.customer_id = p.customer_id
WHERE
p.purchase_date >= CURDATE() - INTERVAL 1 YEAR
GROUP BY
c.customer_id
HAVING
COUNT(DISTINCT p.product) > 1;
• Q.666
Question
Calculate the Average Sale Price for Each Samsung Product
As a Data Analyst at Samsung, you are asked to analyze the sale data. For each product,
calculate the average sale price per month, for the year 2022. Assume today is July 31st,
2022.

Explanation
Calculate the average sale price for each product per month, filtering for sales from the year
2022. Group the results by month and product ID, then order by month and product ID.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE sales (
sale_id INT,
product_id INT,
sale_date DATE,
price DECIMAL(10, 2)
);
• - Datasets
INSERT INTO sales (sale_id, product_id, sale_date, price)
VALUES
(31121, 90123, '2022-01-21', 899.99),
(42351, 60532, '2022-02-14', 349.00),
(57831, 90123, '2022-03-13', 879.99),
(64689, 53821, '2022-04-25', 699.99),
(77652, 53821, '2022-06-20', 689.00),
(89712, 60532, '2022-07-05', 359.99);

Learnings
• Use of EXTRACT function to extract month and year from sale_date.
• Aggregation with AVG to calculate the average price.
• Grouping data by multiple columns (month and product_id).
• Filtering data using WHERE based on the year.

Solutions
• - PostgreSQL solution
SELECT
EXTRACT(MONTH FROM sale_date) AS month,
product_id,
AVG(price) AS avg_price
FROM
sales
WHERE
EXTRACT(YEAR FROM sale_date) = 2022
GROUP BY
month, product_id
ORDER BY 1, 2;
• - MySQL solution

836
1000+ SQL Interview Questions & Answers | By Zero Analyst

SELECT
MONTH(sale_date) AS month,
product_id,
AVG(price) AS avg_price
FROM
sales
WHERE
YEAR(sale_date) = 2022
GROUP BY
month, product_id
ORDER BY 1, 2;
• Q.667
Question
Write a SQL query to filter the customers who purchased the Samsung Galaxy S21 in the
year 2022 and are signed up for the Samsung Members program. The query should return
their contact details for a promotional email campaign.
Explanation
You need to:
• Join the customers table with the purchases table on the customer_id field.
• Filter the data to include only customers who purchased the Galaxy S21 in the year 2022.
• Check if the customer is signed up for the Samsung Members program (where
user_signed_up = 1).
• Select the relevant contact details (name, email) from the customers table.
Datasets and SQL Schemas
• - Customers table creation
CREATE TABLE customers (
customer_id INT,
name VARCHAR(255),
email VARCHAR(255),
user_signed_up INT -- 1 for signed up, 0 for not signed up
);
• - Purchases table creation
CREATE TABLE purchases (
purchase_id INT,
customer_id INT,
product VARCHAR(255),
year INT
);
• - Insert sample data into customers table
INSERT INTO customers (customer_id, name, email, user_signed_up)
VALUES
(9615, 'James Smith', '[email protected]', 1),
(7021, 'Samantha Brown', '[email protected]', 1),
(8523, 'John Doe', '[email protected]', 0),
(6405, 'Anna Johnson', '[email protected]', 1),
(9347, 'Emma Black', '[email protected]', 0);
• - Insert sample data into purchases table
INSERT INTO purchases (purchase_id, customer_id, product, year)
VALUES
(5171, 9615, 'S21', 2022),
(7802, 7021, 'S21', 2022),
(8235, 8523, 'Note20', 2022),
(6320, 6405, 'S21', 2021),
(7395, 9347, 'S21', 2022);

Learnings
• JOIN operations to merge customer and purchase data
• WHERE clause for filtering based on multiple conditions

837
1000+ SQL Interview Questions & Answers | By Zero Analyst

• INNER JOIN to only get customers who have made a purchase


• Selection of specific fields (name, email) for the output
Solutions
• - PostgreSQL and MySQL solution
SELECT c.name, c.email
FROM customers c
INNER JOIN purchases p ON c.customer_id = p.customer_id
WHERE p.product = 'S21' AND p.year = 2022 AND c.user_signed_up = 1;

Explanation of the Query


• INNER JOIN: The customers table is joined with the purchases table on the
customer_id field. This ensures only customers who have made a purchase are included.
• WHERE: The query filters for:
• Customers who purchased the 'S21' product.
• Purchases made in the year 2022.
• Customers who are signed up for the Samsung Members program (user_signed_up = 1).
• SELECT: The query returns only the name and email fields of customers who meet all the
conditions.
• Q.668

Question
Find the Most Popular Samsung Smartphone Model in Each Region
Write a SQL query to find the most popular Samsung smartphone model in each region based
on the total number of units sold. The sales table contains data on region_id, model_id, and
units_sold.

Explanation
You need to:
• Join the sales table with the regions and smartphone_models tables.
• Aggregate the total sales for each model by region.
• Use ORDER BY to sort the models by the total number of units sold in each region and
LIMIT to select only the most popular model per region.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE sales (
sale_id INT,
region_id INT,
model_id INT,
units_sold INT
);

CREATE TABLE regions (


region_id INT,
region_name VARCHAR(100)
);

CREATE TABLE smartphone_models (


model_id INT,
model_name VARCHAR(100)
);

838
1000+ SQL Interview Questions & Answers | By Zero Analyst

• - Datasets
-- Regions Data
INSERT INTO regions (region_id, region_name)
VALUES
(1, 'North America'),
(2, 'Europe'),
(3, 'Asia'),
(4, 'Australia');

-- Smartphone Models Data


INSERT INTO smartphone_models (model_id, model_name)
VALUES
(101, 'Galaxy S21'),
(102, 'Galaxy Note20'),
(103, 'Galaxy Z Fold3'),
(104, 'Galaxy S20'),
(105, 'Galaxy A52'),
(106, 'Galaxy Z Flip3'),
(107, 'Galaxy A72'),
(108, 'Galaxy S22'),
(109, 'Galaxy Note10');

-- Sales Data
INSERT INTO sales (sale_id, region_id, model_id, units_sold)
VALUES
-- North America
(1, 1, 101, 1500), -- Galaxy S21
(2, 1, 102, 1200), -- Galaxy Note20
(3, 1, 104, 1800), -- Galaxy S20
(4, 1, 105, 2200), -- Galaxy A52
(5, 1, 106, 1400), -- Galaxy Z Flip3
(6, 1, 107, 1000), -- Galaxy A72

-- Europe
(7, 2, 101, 800), -- Galaxy S21
(8, 2, 103, 1500), -- Galaxy Z Fold3
(9, 2, 104, 900), -- Galaxy S20
(10, 2, 108, 1800), -- Galaxy S22
(11, 2, 109, 1300), -- Galaxy Note10

-- Asia
(12, 3, 101, 2000), -- Galaxy S21
(13, 3, 103, 1200), -- Galaxy Z Fold3
(14, 3, 105, 1600), -- Galaxy A52
(15, 3, 106, 1900), -- Galaxy Z Flip3
(16, 3, 108, 1700), -- Galaxy S22

-- Australia
(17, 4, 102, 1100), -- Galaxy Note20
(18, 4, 104, 950), -- Galaxy S20
(19, 4, 106, 800), -- Galaxy Z Flip3
(20, 4, 107, 1400), -- Galaxy A72
(21, 4, 108, 1300); -- Galaxy S22

Learnings
• Using JOIN to combine multiple tables based on common columns (region_id,
model_id).
• Aggregating sales data using SUM to calculate the total units sold.
• Sorting results with ORDER BY and limiting the output with LIMIT to select the top result
per region.
• The use of grouping (GROUP BY) to calculate the total units sold per model and region.

Solutions
• - PostgreSQL solution
WITH RankedModels AS (
SELECT

839
1000+ SQL Interview Questions & Answers | By Zero Analyst

r.region_name,
m.model_name,
SUM(s.units_sold) AS total_units_sold,
ROW_NUMBER() OVER (PARTITION BY r.region_name ORDER BY SUM(s.units_sold) DESC) AS ra
nk
FROM
sales s
JOIN
regions r ON s.region_id = r.region_id
JOIN
smartphone_models m ON s.model_id = m.model_id
GROUP BY
r.region_name, m.model_name
)
SELECT
region_name,
model_name,
total_units_sold
FROM
RankedModels
WHERE
rank = 1;
• - MySQL solution
WITH RankedModels AS (
SELECT
r.region_name,
m.model_name,
SUM(s.units_sold) AS total_units_sold,
RANK() OVER (PARTITION BY r.region_name ORDER BY SUM(s.units_sold) DESC) AS rank
FROM
sales s
JOIN
regions r ON s.region_id = r.region_id
JOIN
smartphone_models m ON s.model_id = m.model_id
GROUP BY
r.region_name, m.model_name
)
SELECT
region_name,
model_name,
total_units_sold
FROM
RankedModels
WHERE
rank = 1;
• Q.669

Question
Finding all customers who bought 'Galaxy' series
You are the Data Analyst at Samsung and your manager has asked you to find all customers
who have purchased from the 'Galaxy' series. Samsung has multiple product lines, but you
are only interested in customers who purchased any product with 'Galaxy' in its product
name. For this task, use the 'customers' and 'products' tables. The 'products' table has a
column called 'product_name' where all the product names are stored.

Explanation
To find customers who bought products from the 'Galaxy' series, use the SQL LIKE operator
with the % wildcard to match any product names containing 'Galaxy'. Join the 'customers' and
'products' tables on customer_id.

Datasets and SQL Schemas

840
1000+ SQL Interview Questions & Answers | By Zero Analyst

• - Table creation
CREATE TABLE customers (
customer_id INT,
first_name VARCHAR(50),
last_name VARCHAR(50)
);

CREATE TABLE products (


product_id INT,
product_name VARCHAR(100),
customer_id INT
);
• - Datasets
INSERT INTO customers (customer_id, first_name, last_name)
VALUES
(123, 'John', 'Doe'),
(265, 'Sophia', 'Brown'),
(362, 'Liam', 'Smith'),
(192, 'Ava', 'Johnson'),
(981, 'Noah', 'Thompson');

INSERT INTO products (product_id, product_name, customer_id)


VALUES
(50001, 'Galaxy S21', 123),
(69852, 'IPhone 13', 265),
(25035, 'Galaxy Note10', 362),
(54603, 'Z Flip3', 192),
(56310, 'Galaxy Z Fold3', 981);

Learnings
• Use of LIKE with wildcard % to search for partial matches in a string.
• SQL JOIN operation to combine data from multiple tables based on a related column
(customer_id).
• Filtering data based on conditions using WHERE.

Solutions
• - PostgreSQL solution
SELECT c.first_name, c.last_name, p.product_name
FROM customers c
JOIN products p ON c.customer_id = p.customer_id
WHERE p.product_name LIKE '%Galaxy%';
• - MySQL solution
SELECT c.first_name, c.last_name, p.product_name
FROM customers c
JOIN products p ON c.customer_id = p.customer_id
WHERE p.product_name LIKE '%Galaxy%';
• Q.670
Question
Calculate the Average Price of Samsung Products Sold in 2022 by Region
Write a SQL query to calculate the average price of Samsung products sold in 2022, grouped
by region.

Explanation
You need to:
• Join the sales, regions, and smartphone_models tables.
• Filter the sales data to only include records from 2022.

841
1000+ SQL Interview Questions & Answers | By Zero Analyst

• Join the smartphone_models table to include product pricing information (assuming there
is a price column in the smartphone_models table).
• Calculate the average price of Samsung products sold per region by using the AVG()
function.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE sales (
sale_id INT,
region_id INT,
model_id INT,
units_sold INT,
sale_date DATE
);

CREATE TABLE regions (


region_id INT,
region_name VARCHAR(100)
);

CREATE TABLE smartphone_models (


model_id INT,
model_name VARCHAR(100),
price DECIMAL(10, 2) -- Added price column to store product price
);
• - Datasets
-- Regions Data
INSERT INTO regions (region_id, region_name)
VALUES
(1, 'North America'),
(2, 'Europe'),
(3, 'Asia'),
(4, 'Australia');

-- Smartphone Models Data with Prices


INSERT INTO smartphone_models (model_id, model_name, price)
VALUES
(101, 'Galaxy S21', 799.99),
(102, 'Galaxy Note20', 999.99),
(103, 'Galaxy Z Fold3', 1799.99),
(104, 'Galaxy S20', 899.99),
(105, 'Galaxy A52', 499.99),
(106, 'Galaxy Z Flip3', 999.99),
(107, 'Galaxy A72', 549.99),
(108, 'Galaxy S22', 899.99),
(109, 'Galaxy Note10', 899.99);

-- Sales Data for 2022


INSERT INTO sales (sale_id, region_id, model_id, units_sold, sale_date)
VALUES
-- North America
(1, 1, 101, 1500, '2022-01-21'),
(2, 1, 102, 1200, '2022-03-10'),
(3, 1, 104, 1800, '2022-05-05'),
(4, 1, 101, 2200, '2022-06-01'),
(5, 1, 105, 2200, '2022-07-15'),
(6, 1, 106, 1400, '2022-08-20'),

-- Europe
(7, 2, 101, 800, '2022-01-15'),
(8, 2, 103, 1500, '2022-02-25'),
(9, 2, 104, 900, '2022-04-10'),
(10, 2, 108, 1800, '2022-06-05'),
(11, 2, 109, 1300, '2022-07-25'),

-- Asia
(12, 3, 101, 2000, '2022-03-05'),
(13, 3, 103, 1200, '2022-05-15'),

842
1000+ SQL Interview Questions & Answers | By Zero Analyst

(14, 3, 105, 1600, '2022-06-30'),


(15, 3, 106, 1900, '2022-08-10'),
(16, 3, 108, 1700, '2022-09-15'),

-- Australia
(17, 4, 102, 1100, '2022-04-01'),
(18, 4, 104, 950, '2022-05-20'),
(19, 4, 106, 800, '2022-07-05'),
(20, 4, 107, 1400, '2022-06-10'),
(21, 4, 108, 1300, '2022-08-25');

Learnings
• Joining Multiple Tables: This query involves joining three tables: sales, regions, and
smartphone_models to get the relevant data.
• Filtering Data by Year: You filter the sales data to only include records for the year 2022.
• Aggregating Data: The AVG() function is used to calculate the average price of Samsung
products sold by region.
• Handling Price Data: The price information is retrieved from the smartphone_models
table, which is then used in the aggregation.

Solutions
• - PostgreSQL solution
SELECT
r.region_name,
AVG(m.price) AS avg_price
FROM
sales s
JOIN
regions r ON s.region_id = r.region_id
JOIN
smartphone_models m ON s.model_id = m.model_id
WHERE
EXTRACT(YEAR FROM s.sale_date) = 2022
GROUP BY
r.region_name
ORDER BY
r.region_name;
• - MySQL solution
SELECT
r.region_name,
AVG(m.price) AS avg_price
FROM
sales s
JOIN
regions r ON s.region_id = r.region_id
JOIN
smartphone_models m ON s.model_id = m.model_id
WHERE
YEAR(s.sale_date) = 2022
GROUP BY
r.region_name
ORDER BY
r.region_name;

This query calculates the average price of Samsung products sold in 2022, grouped by region.
• Q.671
Question
Calculate the Total Sales of Galaxy S21 in 2022 by Region
Write a SQL query to calculate the total units sold for the Galaxy S21 model across different
regions in 2022.

843
1000+ SQL Interview Questions & Answers | By Zero Analyst

Explanation
You need to:
• Join the sales, regions, and smartphone_models tables.
• Filter the sales data for the Galaxy S21 model (identified by its model_name).
• Filter the sales by the year 2022.
• Aggregate the total units sold by region.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE sales (
sale_id INT,
region_id INT,
model_id INT,
units_sold INT,
sale_date DATE
);

CREATE TABLE regions (


region_id INT,
region_name VARCHAR(100)
);

CREATE TABLE smartphone_models (


model_id INT,
model_name VARCHAR(100)
);
• - Datasets
-- Regions Data
INSERT INTO regions (region_id, region_name)
VALUES
(1, 'North America'),
(2, 'Europe'),
(3, 'Asia'),
(4, 'Australia');

-- Smartphone Models Data


INSERT INTO smartphone_models (model_id, model_name)
VALUES
(101, 'Galaxy S21'),
(102, 'Galaxy Note20'),
(103, 'Galaxy Z Fold3'),
(104, 'Galaxy S20'),
(105, 'Galaxy A52'),
(106, 'Galaxy Z Flip3'),
(107, 'Galaxy A72'),
(108, 'Galaxy S22'),
(109, 'Galaxy Note10');

-- Sales Data for 2022


INSERT INTO sales (sale_id, region_id, model_id, units_sold, sale_date)
VALUES
-- North America
(1, 1, 101, 1500, '2022-01-21'), -- Galaxy S21
(2, 1, 102, 1200, '2022-03-10'), -- Galaxy Note20
(3, 1, 104, 1800, '2022-05-05'), -- Galaxy S20
(4, 1, 101, 2200, '2022-06-01'), -- Galaxy S21
(5, 1, 105, 2200, '2022-07-15'), -- Galaxy A52
(6, 1, 106, 1400, '2022-08-20'), -- Galaxy Z Flip3

-- Europe
(7, 2, 101, 800, '2022-01-15'), -- Galaxy S21
(8, 2, 103, 1500, '2022-02-25'), -- Galaxy Z Fold3
(9, 2, 104, 900, '2022-04-10'), -- Galaxy S20
(10, 2, 108, 1800, '2022-06-05'), -- Galaxy S22
(11, 2, 109, 1300, '2022-07-25'), -- Galaxy Note10


-- Asia
(12, 3, 101, 2000, '2022-03-05'), -- Galaxy S21
(13, 3, 103, 1200, '2022-05-15'), -- Galaxy Z Fold3
(14, 3, 105, 1600, '2022-06-30'), -- Galaxy A52
(15, 3, 106, 1900, '2022-08-10'), -- Galaxy Z Flip3
(16, 3, 108, 1700, '2022-09-15'), -- Galaxy S22

-- Australia
(17, 4, 102, 1100, '2022-04-01'), -- Galaxy Note20
(18, 4, 104, 950, '2022-05-20'), -- Galaxy S20
(19, 4, 106, 800, '2022-07-05'), -- Galaxy Z Flip3
(20, 4, 107, 1400, '2022-06-10'), -- Galaxy A72
(21, 4, 108, 1300, '2022-08-25'); -- Galaxy S22

Learnings
• Filtering sales based on the product model name (Galaxy S21).
• Filtering by year using YEAR(sale_date) = 2022.
• Aggregating the total units sold by region using SUM.
• Using JOIN to combine relevant data from the sales, regions, and smartphone_models
tables.

Solutions
• - PostgreSQL solution
SELECT
r.region_name,
SUM(s.units_sold) AS total_units_sold
FROM
sales s
JOIN
regions r ON s.region_id = r.region_id
JOIN
smartphone_models m ON s.model_id = m.model_id
WHERE
m.model_name = 'Galaxy S21'
AND EXTRACT(YEAR FROM s.sale_date) = 2022
GROUP BY
r.region_name
ORDER BY
r.region_name;
• - MySQL solution
SELECT
r.region_name,
SUM(s.units_sold) AS total_units_sold
FROM
sales s
JOIN
regions r ON s.region_id = r.region_id
JOIN
smartphone_models m ON s.model_id = m.model_id
WHERE
m.model_name = 'Galaxy S21'
AND YEAR(s.sale_date) = 2022
GROUP BY
r.region_name
ORDER BY
r.region_name;
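The two solutions differ only in how the year is extracted. As a quick way to sanity-check the logic outside a server, the same query can be run on an in-memory SQLite database (SQLite is an assumption of convenience here, not part of the question; it has neither YEAR() nor EXTRACT, so strftime('%Y', ...) stands in), using a representative subset of the sample rows:

```python
import sqlite3

# Minimal sketch: verify the per-region Galaxy S21 totals on SQLite.
# Only a subset of the book's sample rows is loaded; enough to exercise
# the join, the model filter, and the year filter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regions (region_id INT, region_name TEXT);
CREATE TABLE smartphone_models (model_id INT, model_name TEXT);
CREATE TABLE sales (sale_id INT, region_id INT, model_id INT,
                    units_sold INT, sale_date TEXT);
INSERT INTO regions VALUES (1,'North America'),(2,'Europe'),(3,'Asia'),(4,'Australia');
INSERT INTO smartphone_models VALUES (101,'Galaxy S21'),(102,'Galaxy Note20');
INSERT INTO sales VALUES
  (1, 1, 101, 1500, '2022-01-21'),  -- North America, Galaxy S21
  (4, 1, 101, 2200, '2022-06-01'),  -- North America, Galaxy S21
  (7, 2, 101,  800, '2022-01-15'),  -- Europe, Galaxy S21
  (12, 3, 101, 2000, '2022-03-05'), -- Asia, Galaxy S21
  (2, 1, 102, 1200, '2022-03-10');  -- other model: must be excluded
""")
rows = conn.execute("""
    SELECT r.region_name, SUM(s.units_sold) AS total_units_sold
    FROM sales s
    JOIN regions r ON s.region_id = r.region_id
    JOIN smartphone_models m ON s.model_id = m.model_id
    WHERE m.model_name = 'Galaxy S21'
      AND strftime('%Y', s.sale_date) = '2022'
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
print(rows)  # [('Asia', 2000), ('Europe', 800), ('North America', 3700)]
```

The Note20 row is filtered out by the model-name predicate, and Australia drops out because the inner join finds no Galaxy S21 sales there.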
• Q.672

Question
Identify the Customers Who Bought the Galaxy Note20 and Rated it Below 3


Write a SQL query to find customers who purchased the Galaxy Note20 and gave a rating of
less than 3. Return their name, email, and rating.

Explanation
You need to:
• Join the customers, purchases, and reviews tables.
• Filter the records to include only those customers who purchased the Galaxy Note20.
• Filter for reviews where the rating (stars) is less than 3.
• Return the customer's name, email, and the review rating.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE customers (
customer_id INT,
name VARCHAR(100),
email VARCHAR(100),
user_signed_up DATE
);

CREATE TABLE purchases (


purchase_id INT,
customer_id INT,
product VARCHAR(100),
year INT,
units_sold INT
);

CREATE TABLE reviews (


review_id INT,
customer_id INT,
product_id INT,
stars INT
);
• - Datasets
-- Customers Data
INSERT INTO customers (customer_id, name, email, user_signed_up)
VALUES
(1, 'John Doe', '[email protected]', '2021-05-01'),
(2, 'Sophia Brown', '[email protected]', '2020-03-14'),
(3, 'Liam Smith', '[email protected]', '2021-09-09'),
(4, 'Ava Johnson', '[email protected]', '2022-01-10'),
(5, 'Noah Thompson', '[email protected]', '2021-06-30');

-- Purchases Data
INSERT INTO purchases (purchase_id, customer_id, product, year, units_sold)
VALUES
(1, 1, 'Galaxy Note20', 2022, 1),
(2, 2, 'Galaxy Note20', 2022, 2),
(3, 3, 'Galaxy S21', 2022, 1),
(4, 4, 'Galaxy Note20', 2022, 1),
(5, 5, 'Galaxy Note20', 2022, 1);

-- Reviews Data
INSERT INTO reviews (review_id, customer_id, product_id, stars)
VALUES
(1, 1, 102, 2), -- John Doe rated Galaxy Note20 with 2 stars
(2, 2, 102, 1), -- Sophia Brown rated Galaxy Note20 with 1 star
(3, 3, 101, 4), -- Liam Smith rated Galaxy S21 with 4 stars
(4, 4, 102, 3), -- Ava Johnson rated Galaxy Note20 with 3 stars
(5, 5, 102, 1); -- Noah Thompson rated Galaxy Note20 with 1 star

Learnings


• Joining Multiple Tables: You will need to join customers, purchases, and reviews to
get the relevant customer information and their corresponding reviews.
• Filtering Data: The filter conditions will include both the product ('Galaxy Note20')
and the review rating (stars < 3).
• Return Specific Columns: The output needs to return the customer's name, email, and
their rating for the Galaxy Note20.

Solutions
• - PostgreSQL solution
SELECT
    c.name,
    c.email,
    r.stars AS rating
FROM
    customers c
JOIN
    purchases p ON c.customer_id = p.customer_id
JOIN
    reviews r ON c.customer_id = r.customer_id
             AND r.product_id = 102 -- Galaxy Note20 (per the reviews data); without this, a low rating for any other product would also match
WHERE
    p.product = 'Galaxy Note20'
    AND r.stars < 3;
• - MySQL solution
SELECT
    c.name,
    c.email,
    r.stars AS rating
FROM
    customers c
JOIN
    purchases p ON c.customer_id = p.customer_id
JOIN
    reviews r ON c.customer_id = r.customer_id
             AND r.product_id = 102 -- Galaxy Note20 (per the reviews data)
WHERE
    p.product = 'Galaxy Note20'
    AND r.stars < 3;
• Q.673
Question
Identify the Customers Who Bought Galaxy Buds After Buying Samsung Galaxy S23
Ultra
Write a SQL query to identify customers who purchased Galaxy Buds after purchasing the
Samsung Galaxy S23 Ultra. Return the customer's name, email, and the purchase details for
both products.

Explanation
You need to:
• Join the customers and purchases tables to link customers with their purchase history.
• Use a self-join on the purchases table to match customers who bought both products,
ensuring that the purchase of Galaxy Buds occurs after the purchase of the Galaxy S23
Ultra.
• Filter based on the product names: Galaxy S23 Ultra first, and Galaxy Buds second.
• Return the relevant customer details, along with the product names and purchase dates.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);

INSERT INTO customers (customer_id, name, email) VALUES


(1, 'John Doe', '[email protected]'),
(2, 'Sophia Brown', '[email protected]'),
(3, 'Liam Smith', '[email protected]'),
(4, 'Ava Johnson', '[email protected]'),
(5, 'Noah Thompson', '[email protected]'),
(6, 'Emma White', '[email protected]'),
(7, 'Mason Black', '[email protected]'),
(8, 'Olivia Green', '[email protected]'),
(9, 'Lucas Blue', '[email protected]'),
(10, 'Isabella King', '[email protected]');
• - Datasets
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
price_usd DECIMAL(10, 2)
);

INSERT INTO products (product_id, product_name, price_usd) VALUES


(101, 'Galaxy S23 Ultra', 1299.99),
(102, 'Galaxy Buds Pro', 199.99),
(103, 'Galaxy Buds Live', 169.99),
(104, 'Galaxy Watch 4', 249.99),
(105, 'Galaxy S21', 799.99),
(106, 'Galaxy Note20', 999.99);

CREATE TABLE purchases (


purchase_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
purchase_date DATE,
units_sold INT
);

INSERT INTO purchases (purchase_id, customer_id, product_id, purchase_date, units_sold)


VALUES
(1, 1, 101, '2022-01-10', 1),
(2, 1, 102, '2022-02-05', 2),
(3, 2, 101, '2022-03-15', 1),
(4, 2, 102, '2022-03-20', 1),
(5, 3, 105, '2022-01-01', 1),
(6, 3, 102, '2022-04-15', 1),
(7, 4, 101, '2022-05-05', 1),
(8, 4, 103, '2022-06-10', 2),
(9, 5, 104, '2022-07-07', 1),
(10, 5, 101, '2022-08-10', 1),
(11, 6, 101, '2022-09-09', 1),
(12, 7, 106, '2022-10-11', 2),
(13, 7, 101, '2022-11-10', 1),
(14, 8, 101, '2022-12-12', 1),
(15, 9, 102, '2022-12-13', 1);

Learnings
• Self-Join: A self-join is required to match two different purchases of the same customer,
with one product bought before the other.
• Filtering Products: The query involves filtering for Galaxy S23 Ultra and Galaxy Buds.
• Ensuring Order of Purchases: The query ensures that Galaxy Buds are purchased after
the Galaxy S23 Ultra by comparing the purchase dates.


Solutions
• - PostgreSQL solution
-- Note: purchases stores product_id, not a product name, so products is joined
-- twice; the catalog lists 'Galaxy Buds Pro' and 'Galaxy Buds Live' rather than
-- a plain 'Galaxy Buds', hence the LIKE prefix match.
SELECT
    c.name,
    c.email,
    pr1.product_name AS first_product,
    p1.purchase_date AS first_purchase_date,
    pr2.product_name AS second_product,
    p2.purchase_date AS second_purchase_date
FROM
    customers c
JOIN
    purchases p1 ON c.customer_id = p1.customer_id
JOIN
    products pr1 ON p1.product_id = pr1.product_id
JOIN
    purchases p2 ON c.customer_id = p2.customer_id
JOIN
    products pr2 ON p2.product_id = pr2.product_id
WHERE
    pr1.product_name = 'Galaxy S23 Ultra'
    AND pr2.product_name LIKE 'Galaxy Buds%'
    AND p1.purchase_date < p2.purchase_date;
• - MySQL solution
-- Identical to the PostgreSQL query; no vendor-specific functions are needed.
SELECT
    c.name,
    c.email,
    pr1.product_name AS first_product,
    p1.purchase_date AS first_purchase_date,
    pr2.product_name AS second_product,
    p2.purchase_date AS second_purchase_date
FROM
    customers c
JOIN
    purchases p1 ON c.customer_id = p1.customer_id
JOIN
    products pr1 ON p1.product_id = pr1.product_id
JOIN
    purchases p2 ON c.customer_id = p2.customer_id
JOIN
    products pr2 ON p2.product_id = pr2.product_id
WHERE
    pr1.product_name = 'Galaxy S23 Ultra'
    AND pr2.product_name LIKE 'Galaxy Buds%'
    AND p1.purchase_date < p2.purchase_date;

This query identifies the customers who bought Galaxy Buds after purchasing the Galaxy
S23 Ultra, along with the purchase details of both products.
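One caution worth testing: the purchases table stores product_id, and the catalog has 'Galaxy Buds Pro' and 'Galaxy Buds Live' rather than a literal 'Galaxy Buds', so the self-join must go through products and match the Buds by prefix. A runnable sketch of the pattern (SQLite assumed purely for convenience; the email column is omitted since the sample emails are redacted):

```python
import sqlite3

# Self-join sketch: customers who bought a Galaxy Buds model strictly after
# buying the Galaxy S23 Ultra.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INT PRIMARY KEY, name TEXT);
CREATE TABLE products (product_id INT PRIMARY KEY, product_name TEXT);
CREATE TABLE purchases (purchase_id INT PRIMARY KEY, customer_id INT,
                        product_id INT, purchase_date TEXT);
INSERT INTO customers VALUES
  (1,'John Doe'), (2,'Sophia Brown'), (3,'Liam Smith'),
  (4,'Ava Johnson'), (5,'Noah Thompson');
INSERT INTO products VALUES
  (101,'Galaxy S23 Ultra'), (102,'Galaxy Buds Pro'),
  (103,'Galaxy Buds Live'), (104,'Galaxy Watch 4'), (105,'Galaxy S21');
INSERT INTO purchases VALUES
  (1,1,101,'2022-01-10'), (2,1,102,'2022-02-05'),
  (3,2,101,'2022-03-15'), (4,2,102,'2022-03-20'),
  (5,3,105,'2022-01-01'), (6,3,102,'2022-04-15'),
  (7,4,101,'2022-05-05'), (8,4,103,'2022-06-10'),
  (9,5,104,'2022-07-07'), (10,5,101,'2022-08-10');
""")
rows = conn.execute("""
    SELECT c.name, pr2.product_name, p2.purchase_date
    FROM customers c
    JOIN purchases p1 ON c.customer_id = p1.customer_id
    JOIN products pr1 ON p1.product_id = pr1.product_id
    JOIN purchases p2 ON c.customer_id = p2.customer_id
    JOIN products pr2 ON p2.product_id = pr2.product_id
    WHERE pr1.product_name = 'Galaxy S23 Ultra'
      AND pr2.product_name LIKE 'Galaxy Buds%'
      AND p1.purchase_date < p2.purchase_date
    ORDER BY c.customer_id
""").fetchall()
print(rows)
```

Liam Smith is excluded (he bought Buds but never the S23 Ultra) and Noah Thompson is excluded (his S23 Ultra came after his Watch, with no Buds at all), leaving John Doe, Sophia Brown, and Ava Johnson.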
• Q.674
Identify the Region with the Highest Total Sales of Samsung Galaxy S21 in 2022

Question
Write a SQL query to identify the region with the highest total sales (in units) of the
Samsung Galaxy S21 in 2022. Return the region name, total units sold, and the region's total
sales.

Explanation
• You need to join the sales, regions, and smartphone_models tables.
• Filter for the Samsung Galaxy S21 model and sales in 2022.
• Aggregate the total units sold per region.
• Return the region with the highest total units sold, along with the total units and sales.

Datasets and SQL Schemas


CREATE TABLE regions (
region_id INT PRIMARY KEY,
region_name VARCHAR(100)
);


INSERT INTO regions (region_id, region_name) VALUES


(1, 'North America'),
(2, 'Europe'),
(3, 'Asia'),
(4, 'Middle East'),
(5, 'Africa'),
(6, 'South America');

CREATE TABLE smartphone_models (


model_id INT PRIMARY KEY,
model_name VARCHAR(100)
);

INSERT INTO smartphone_models (model_id, model_name) VALUES


(101, 'Galaxy S21'),
(102, 'Galaxy Note20'),
(103, 'Galaxy Z Fold3'),
(104, 'Galaxy Z Flip3'),
(105, 'Galaxy S22');
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
region_id INT,
model_id INT,
units_sold INT,
sale_date DATE
);

INSERT INTO sales (sale_id, region_id, model_id, units_sold, sale_date) VALUES


(1, 1, 101, 1500, '2022-01-15'),
(2, 2, 101, 2000, '2022-03-20'),
(3, 3, 101, 1200, '2022-05-10'),
(4, 4, 101, 2500, '2022-07-10'),
(5, 5, 101, 1000, '2022-09-15'),
(6, 6, 101, 1800, '2022-11-05'),
(7, 1, 102, 2200, '2022-02-15'),
(8, 3, 102, 800, '2022-06-20'),
(9, 2, 103, 3000, '2022-08-05'),
(10, 4, 103, 1500, '2022-10-15'),
(11, 5, 104, 2000, '2022-04-05'),
(12, 1, 105, 1300, '2022-12-20'),
(13, 2, 105, 900, '2022-11-10'),
(14, 3, 101, 500, '2022-07-01'),
(15, 4, 101, 2000, '2022-09-10');

Solutions
• - PostgreSQL solution
-- Note: this question's schema defines no price column, so total_sales is
-- computed against a products table (product_id, price_usd) assumed from the
-- earlier questions; with only the tables above, drop that join and the
-- total_sales column and return total_units_sold alone.
SELECT
    r.region_name,
    SUM(s.units_sold) AS total_units_sold,
    SUM(s.units_sold * p.price_usd) AS total_sales
FROM
    sales s
JOIN
    regions r ON s.region_id = r.region_id
JOIN
    smartphone_models sm ON s.model_id = sm.model_id
JOIN
    products p ON sm.model_id = p.product_id
WHERE
    sm.model_name = 'Galaxy S21'
    AND EXTRACT(YEAR FROM s.sale_date) = 2022
GROUP BY
    r.region_name
ORDER BY
    total_units_sold DESC
LIMIT 1;
• - MySQL solution
SELECT
    r.region_name,
    SUM(s.units_sold) AS total_units_sold,
    SUM(s.units_sold * p.price_usd) AS total_sales
FROM
    sales s
JOIN
    regions r ON s.region_id = r.region_id
JOIN
    smartphone_models sm ON s.model_id = sm.model_id
JOIN
    products p ON sm.model_id = p.product_id
WHERE
    sm.model_name = 'Galaxy S21'
    AND YEAR(s.sale_date) = 2022
GROUP BY
    r.region_name
ORDER BY
    total_units_sold DESC
LIMIT 1;
• Q.675
Find the Customers Who Have Given More Than One Review for the Same Product

Question
Write a SQL query to find all customers who have given more than one review for the same
Samsung product. Return the customer name, product name, and the count of reviews.

Explanation
• You need to join the customers, reviews, and products tables.
• Use GROUP BY to identify customers who have provided multiple reviews for the same
product.
• Filter for customers who have given more than one review for the same product.

Datasets and SQL Schemas


CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);

INSERT INTO customers (customer_id, name, email) VALUES


(1, 'John Doe', '[email protected]'),
(2, 'Sophia Brown', '[email protected]'),
(3, 'Liam Smith', '[email protected]'),
(4, 'Ava Johnson', '[email protected]'),
(5, 'Noah Thompson', '[email protected]'),
(6, 'Emma White', '[email protected]'),
(7, 'Mason Black', '[email protected]'),
(8, 'Olivia Green', '[email protected]'),
(9, 'Lucas Blue', '[email protected]'),
(10, 'Isabella King', '[email protected]');
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100)
);

INSERT INTO products (product_id, product_name) VALUES


(101, 'Galaxy S21'),
(102, 'Galaxy Note20'),
(103, 'Galaxy Z Fold3'),
(104, 'Galaxy Z Flip3'),
(105, 'Galaxy S22'),
(106, 'Galaxy A52');

CREATE TABLE reviews (


review_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
stars INT,
review_date DATE
);

INSERT INTO reviews (review_id, customer_id, product_id, stars, review_date) VALUES


(1, 1, 101, 4, '2023-01-01'),
(2, 1, 101, 5, '2023-02-01'),
(3, 2, 102, 3, '2022-12-01'),
(4, 2, 102, 2, '2022-12-05'),
(5, 3, 103, 4, '2023-03-10'),
(6, 3, 103, 5, '2023-03-15'),
(7, 4, 104, 3, '2022-11-01'),
(8, 5, 101, 4, '2022-10-10'),
(9, 5, 101, 2, '2022-10-15'),
(10, 6, 105, 5, '2023-01-05'),
(11, 7, 106, 3, '2022-09-10'),
(12, 7, 106, 4, '2022-09-12'),
(13, 8, 101, 4, '2023-02-01'),
(14, 9, 103, 5, '2023-04-01'),
(15, 10, 101, 3, '2023-05-10');

Solutions
• - PostgreSQL solution
SELECT
c.name,
p.product_name,
COUNT(r.review_id) AS review_count
FROM
reviews r
JOIN
customers c ON r.customer_id = c.customer_id
JOIN
products p ON r.product_id = p.product_id
GROUP BY
c.name, p.product_name
HAVING
COUNT(r.review_id) > 1;
• - MySQL solution
SELECT
c.name,
p.product_name,
COUNT(r.review_id) AS review_count
FROM
reviews r
JOIN
customers c ON r.customer_id = c.customer_id
JOIN
products p ON r.product_id = p.product_id
GROUP BY
c.name, p.product_name
HAVING
COUNT(r.review_id) > 1;
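The GROUP BY ... HAVING COUNT(...) > 1 pattern is portable, so it can be sanity-checked against the full review dataset on an in-memory SQLite database (SQLite is assumed here only as a convenient test harness):

```python
import sqlite3

# Sketch: customers with more than one review for the same product.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INT PRIMARY KEY, name TEXT);
CREATE TABLE products (product_id INT PRIMARY KEY, product_name TEXT);
CREATE TABLE reviews (review_id INT PRIMARY KEY, customer_id INT,
                      product_id INT, stars INT, review_date TEXT);
INSERT INTO customers VALUES (1,'John Doe'),(2,'Sophia Brown'),(3,'Liam Smith'),
  (4,'Ava Johnson'),(5,'Noah Thompson'),(6,'Emma White'),(7,'Mason Black'),
  (8,'Olivia Green'),(9,'Lucas Blue'),(10,'Isabella King');
INSERT INTO products VALUES (101,'Galaxy S21'),(102,'Galaxy Note20'),
  (103,'Galaxy Z Fold3'),(104,'Galaxy Z Flip3'),(105,'Galaxy S22'),(106,'Galaxy A52');
INSERT INTO reviews VALUES
  (1,1,101,4,'2023-01-01'),(2,1,101,5,'2023-02-01'),(3,2,102,3,'2022-12-01'),
  (4,2,102,2,'2022-12-05'),(5,3,103,4,'2023-03-10'),(6,3,103,5,'2023-03-15'),
  (7,4,104,3,'2022-11-01'),(8,5,101,4,'2022-10-10'),(9,5,101,2,'2022-10-15'),
  (10,6,105,5,'2023-01-05'),(11,7,106,3,'2022-09-10'),(12,7,106,4,'2022-09-12'),
  (13,8,101,4,'2023-02-01'),(14,9,103,5,'2023-04-01'),(15,10,101,3,'2023-05-10');
""")
rows = conn.execute("""
    SELECT c.name, p.product_name, COUNT(r.review_id) AS review_count
    FROM reviews r
    JOIN customers c ON r.customer_id = c.customer_id
    JOIN products p ON r.product_id = p.product_id
    GROUP BY c.name, p.product_name
    HAVING COUNT(r.review_id) > 1
    ORDER BY c.name
""").fetchall()
print(rows)
```

Five customer/product pairs have two reviews each; single-review pairs such as Olivia Green on the Galaxy S21 are filtered out by the HAVING clause.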
• Q.676
Calculate the Total Revenue for Each Product in 2022

Question
Write a SQL query to calculate the total revenue for each Samsung product sold in 2022.
Revenue is calculated as the number of units sold multiplied by the product price.


Explanation
• Join the sales, products, and smartphone_models tables.
• Filter the sales data for 2022.
• Multiply the units sold by the product price to calculate total revenue.
• Return the total revenue for each product.

Datasets and SQL Schemas


CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
price_usd DECIMAL(10, 2)
);

INSERT INTO products (product_id, product_name, price_usd) VALUES


(1, 'Galaxy S21', 799.99),
(2, 'Galaxy Note20', 999.99),
(3, 'Galaxy Z Fold3', 1799.99),
(4, 'Galaxy Z Flip3', 999.99),
(5, 'Galaxy S22', 999.99),
(6, 'Galaxy A52', 349.99),
(7, 'Galaxy A72', 449.99);
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
region_id INT,
model_id INT,
units_sold INT,
sale_date DATE
);

INSERT INTO sales (sale_id, region_id, model_id, units_sold, sale_date) VALUES


(1, 1, 1, 1500, '2022-01-15'),
(2, 2, 1, 2000, '2022-03-20'),
(3, 3, 1, 1200, '2022-05-10'),
(4, 4, 1, 2500, '2022-07-10'),
(5, 5, 1, 1000, '2022-09-15'),
(6, 6, 1, 1800, '2022-11-05'),
(7, 1, 2, 2200, '2022-02-15'),
(8, 3, 2, 800, '2022-06-20'),
(9, 2, 3, 3000, '2022-08-05'),
(10, 4, 3, 1500, '2022-10-15'),
(11, 5, 4, 2000, '2022-04-05'),
(12, 1, 5, 1300, '2022-12-20'),
(13, 2, 5, 900, '2022-11-10'),
(14, 3, 1, 500, '2022-07-01'),
(15, 4, 1, 2000, '2022-09-10');

Solutions
• - PostgreSQL solution
-- Note: this question defines only products and sales; sales.model_id lines up
-- with products.product_id, so no smartphone_models join is needed.
SELECT
    p.product_name,
    SUM(s.units_sold * p.price_usd) AS total_revenue
FROM
    sales s
JOIN
    products p ON s.model_id = p.product_id
WHERE
    EXTRACT(YEAR FROM s.sale_date) = 2022
GROUP BY
    p.product_name
ORDER BY
    total_revenue DESC;
• - MySQL solution
SELECT
    p.product_name,
    SUM(s.units_sold * p.price_usd) AS total_revenue
FROM
    sales s
JOIN
    products p ON s.model_id = p.product_id
WHERE
    YEAR(s.sale_date) = 2022
GROUP BY
    p.product_name
ORDER BY
    total_revenue DESC;
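Because sales.model_id in this dataset lines up one-to-one with products.product_id, the revenue calculation can be checked end-to-end on an in-memory SQLite database (an assumption of convenience; strftime('%Y', ...) replaces EXTRACT/YEAR):

```python
import sqlite3

# Sketch: revenue per product for 2022 = SUM(units_sold * price_usd).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INT PRIMARY KEY, product_name TEXT, price_usd REAL);
CREATE TABLE sales (sale_id INT PRIMARY KEY, region_id INT, model_id INT,
                    units_sold INT, sale_date TEXT);
INSERT INTO products VALUES
  (1,'Galaxy S21',799.99),(2,'Galaxy Note20',999.99),(3,'Galaxy Z Fold3',1799.99),
  (4,'Galaxy Z Flip3',999.99),(5,'Galaxy S22',999.99),(6,'Galaxy A52',349.99),
  (7,'Galaxy A72',449.99);
INSERT INTO sales VALUES
  (1,1,1,1500,'2022-01-15'),(2,2,1,2000,'2022-03-20'),(3,3,1,1200,'2022-05-10'),
  (4,4,1,2500,'2022-07-10'),(5,5,1,1000,'2022-09-15'),(6,6,1,1800,'2022-11-05'),
  (7,1,2,2200,'2022-02-15'),(8,3,2,800,'2022-06-20'),(9,2,3,3000,'2022-08-05'),
  (10,4,3,1500,'2022-10-15'),(11,5,4,2000,'2022-04-05'),(12,1,5,1300,'2022-12-20'),
  (13,2,5,900,'2022-11-10'),(14,3,1,500,'2022-07-01'),(15,4,1,2000,'2022-09-10');
""")
rows = conn.execute("""
    SELECT p.product_name, SUM(s.units_sold * p.price_usd) AS total_revenue
    FROM sales s
    JOIN products p ON s.model_id = p.product_id
    WHERE strftime('%Y', s.sale_date) = '2022'
    GROUP BY p.product_name
    ORDER BY total_revenue DESC
""").fetchall()
print(rows[0])  # Galaxy S21 leads: 12,500 units * 799.99 ~ 9,999,875
```

The A52 and A72 have no sales rows, so the inner join drops them; only five products appear in the result.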
• Q.677

Find the Number of Products in Each Category


Question:
Write a SQL query to find the total number of products in each category.
Explanation:
You need to:
• Group products by category.
• Count the number of products in each category.

Datasets and SQL Schemas

Products Table
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50)
);

INSERT INTO products (product_id, product_name, category) VALUES


(1, 'Galaxy S23 Ultra', 'Smartphone'),
(2, 'Galaxy Buds Pro', 'Accessories'),
(3, 'Galaxy Watch 4', 'Smartwatch'),
(4, 'Galaxy S21', 'Smartphone'),
(5, 'Galaxy Z Fold 3', 'Smartphone'),
(6, 'Galaxy S22', 'Smartphone'),
(7, 'Galaxy Buds Live', 'Accessories'),
(8, 'Galaxy Watch 4 Classic', 'Smartwatch');

SQL Solution:
SELECT category, COUNT(*) AS total_products
FROM products
GROUP BY category;
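This single-table aggregation runs unchanged on any engine; a quick check on in-memory SQLite (assumed here as a convenient harness, with ORDER BY added for a deterministic result):

```python
import sqlite3

# Sketch: product count per category.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INT PRIMARY KEY, product_name TEXT, category TEXT);
INSERT INTO products VALUES
  (1,'Galaxy S23 Ultra','Smartphone'),(2,'Galaxy Buds Pro','Accessories'),
  (3,'Galaxy Watch 4','Smartwatch'),(4,'Galaxy S21','Smartphone'),
  (5,'Galaxy Z Fold 3','Smartphone'),(6,'Galaxy S22','Smartphone'),
  (7,'Galaxy Buds Live','Accessories'),(8,'Galaxy Watch 4 Classic','Smartwatch');
""")
rows = conn.execute("""
    SELECT category, COUNT(*) AS total_products
    FROM products
    GROUP BY category
    ORDER BY category
""").fetchall()
print(rows)  # [('Accessories', 2), ('Smartphone', 4), ('Smartwatch', 2)]
```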

• Q.678
Find Customers Who Have Bought More Than One Product
Question:
Write a SQL query to find customers who have purchased more than one product.


Explanation:
You need to:
• Join the customers and purchases tables.
• Group by customer and count the number of products they have bought.
• Filter customers who have purchased more than one product.

Datasets and SQL Schemas

Customers Table
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);

INSERT INTO customers (customer_id, name, email) VALUES


(1, 'John Doe', '[email protected]'),
(2, 'Sophia Brown', '[email protected]'),
(3, 'Liam Smith', '[email protected]'),
(4, 'Ava Johnson', '[email protected]'),
(5, 'Noah Thompson', '[email protected]');

Purchases Table
CREATE TABLE purchases (
purchase_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
purchase_date DATE
);

INSERT INTO purchases (purchase_id, customer_id, product_id, purchase_date) VALUES


(1, 1, 101, '2022-01-10'),
(2, 1, 102, '2022-02-05'),
(3, 2, 101, '2022-03-15'),
(4, 2, 103, '2022-04-20'),
(5, 3, 104, '2022-05-25'),
(6, 4, 105, '2022-06-05'),
(7, 5, 106, '2022-07-10'),
(8, 5, 107, '2022-08-12');

SQL Solution:
SELECT c.name, c.email
FROM customers c
JOIN purchases p ON c.customer_id = p.customer_id
GROUP BY c.customer_id, c.name, c.email
HAVING COUNT(DISTINCT p.product_id) > 1; -- DISTINCT so repeat purchases of the same product don't count as two products

Key Concepts for Q.677 and Q.678:


• GROUP BY: Used to aggregate data based on specific columns.
• COUNT: Used to count occurrences.
• HAVING: Used to filter results after aggregation.
• JOIN: Used to combine data from multiple tables.
• Q.679
Identify Products That Have a Rating Below 3 for More Than 50% of Their Reviews

Explanation:


To identify products that have received a rating below 3 for more than 50% of their reviews:
• Join the reviews and products tables on the product_id to get product details for each
review.
• Calculate the percentage of reviews with a rating below 3 for each product using
conditional aggregation: a CASE expression inside COUNT, or PostgreSQL's FILTER clause.
• Filter products where the percentage of reviews with a rating below 3 is greater than 50%.

Datasets and SQL Schemas

Reviews Table
CREATE TABLE reviews (
review_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
stars INT,
review_date DATE
);

INSERT INTO reviews (review_id, customer_id, product_id, stars, review_date) VALUES


(1, 1, 101, 5, '2023-01-01'),
(2, 1, 101, 3, '2023-02-01'),
(3, 2, 102, 1, '2022-12-01'),
(4, 2, 102, 2, '2022-12-05'),
(5, 3, 103, 4, '2023-03-10'),
(6, 3, 103, 2, '2023-03-15'),
(7, 4, 104, 5, '2023-01-10'),
(8, 5, 101, 1, '2022-10-20'),
(9, 5, 101, 2, '2022-10-25'),
(10, 6, 105, 4, '2023-01-05'),
(11, 7, 106, 3, '2022-09-10'),
(12, 7, 101, 2, '2022-09-12'),
(13, 8, 103, 5, '2022-08-15'),
(14, 9, 102, 1, '2022-07-10'),
(15, 10, 104, 3, '2022-06-05');

Products Table
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100)
);

INSERT INTO products (product_id, product_name) VALUES


(101, 'Galaxy S21'),
(102, 'Galaxy Note20'),
(103, 'Galaxy Z Flip3'),
(104, 'Galaxy Watch 4'),
(105, 'Galaxy S22'),
(106, 'Galaxy Buds Pro');

SQL Solution:
SELECT p.product_name
FROM products p
JOIN reviews r ON p.product_id = r.product_id
GROUP BY p.product_id, p.product_name
HAVING COUNT(CASE WHEN r.stars < 3 THEN 1 END) * 1.0 / COUNT(r.review_id) > 0.5;

Explanation of the Query:


• JOIN: We join the reviews table with the products table using the product_id.
• COUNT with CASE:


• The COUNT(CASE WHEN r.stars < 3 THEN 1 END) counts the number of reviews with a
rating below 3 for each product.
• The COUNT(r.review_id) counts the total number of reviews for each product.
• HAVING Clause:
• We calculate the ratio of reviews with a rating below 3 by dividing the count of low-star
reviews by the total number of reviews.
• The HAVING clause filters products where the ratio of low-star reviews is greater than 50%
(i.e., more than 0.5).

Key Takeaways:
• CASE WHEN is used to selectively count reviews that meet the condition of having a
rating below 3.
• HAVING is used after aggregation (GROUP BY) to filter based on the calculated ratio.
• JOIN ensures we link products with their corresponding reviews to analyze the ratings.
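The CASE form is fully portable (PostgreSQL users can swap in COUNT(*) FILTER (WHERE stars < 3)), so the ratio logic can be verified on an in-memory SQLite database, which is assumed here purely as a test harness:

```python
import sqlite3

# Sketch: products where more than half of the reviews rate below 3 stars.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INT PRIMARY KEY, product_name TEXT);
CREATE TABLE reviews (review_id INT PRIMARY KEY, customer_id INT,
                      product_id INT, stars INT, review_date TEXT);
INSERT INTO products VALUES (101,'Galaxy S21'),(102,'Galaxy Note20'),
  (103,'Galaxy Z Flip3'),(104,'Galaxy Watch 4'),(105,'Galaxy S22'),(106,'Galaxy Buds Pro');
INSERT INTO reviews VALUES
  (1,1,101,5,'2023-01-01'),(2,1,101,3,'2023-02-01'),(3,2,102,1,'2022-12-01'),
  (4,2,102,2,'2022-12-05'),(5,3,103,4,'2023-03-10'),(6,3,103,2,'2023-03-15'),
  (7,4,104,5,'2023-01-10'),(8,5,101,1,'2022-10-20'),(9,5,101,2,'2022-10-25'),
  (10,6,105,4,'2023-01-05'),(11,7,106,3,'2022-09-10'),(12,7,101,2,'2022-09-12'),
  (13,8,103,5,'2022-08-15'),(14,9,102,1,'2022-07-10'),(15,10,104,3,'2022-06-05');
""")
rows = conn.execute("""
    SELECT p.product_name
    FROM products p
    JOIN reviews r ON p.product_id = r.product_id
    GROUP BY p.product_id, p.product_name
    HAVING COUNT(CASE WHEN r.stars < 3 THEN 1 END) * 1.0
           / COUNT(r.review_id) > 0.5
    ORDER BY p.product_name
""").fetchall()
print(rows)  # [('Galaxy Note20',), ('Galaxy S21',)]
```

The Galaxy S21 qualifies at 3 low ratings out of 5 (60%) and the Note20 at 3 out of 3; a 3-star review counts as acceptable, since the condition is strictly stars < 3.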

• Q.680
Find Customers Who Have Never Bought a Galaxy Model

Explanation:
To identify customers who have never bought a Samsung Galaxy model:
• Use a LEFT JOIN between the customers and purchases tables.
• Filter for customers who do not have any purchase records for Galaxy models by using the
WHERE condition with a NULL check on the product_id or product_name column for Galaxy
products.

Datasets and SQL Schemas

Customers Table
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);

INSERT INTO customers (customer_id, name, email) VALUES


(1, 'John Doe', '[email protected]'),
(2, 'Sophia Brown', '[email protected]'),
(3, 'Liam Smith', '[email protected]'),
(4, 'Ava Johnson', '[email protected]'),
(5, 'Noah Thompson', '[email protected]'),
(6, 'Emma White', '[email protected]'),
(7, 'Mason Black', '[email protected]'),
(8, 'Olivia Green', '[email protected]'),
(9, 'Lucas Blue', '[email protected]'),
(10, 'Isabella King', '[email protected]');

Purchases Table
CREATE TABLE purchases (
purchase_id INT PRIMARY KEY,
customer_id INT,
product VARCHAR(100),
purchase_date DATE


);

INSERT INTO purchases (purchase_id, customer_id, product, purchase_date) VALUES


(1, 1, 'Galaxy S21', '2022-01-10'),
(2, 2, 'Galaxy Note20', '2022-02-05'),
(3, 3, 'IPhone 13', '2022-03-15'),
(4, 4, 'Samsung Smartwatch', '2022-04-10'),
(5, 5, 'Galaxy Z Flip3', '2022-05-05'),
(6, 6, 'IPhone 14', '2022-06-20'),
(7, 7, 'Galaxy S22', '2022-07-15'),
(8, 8, 'MacBook Pro', '2022-08-10'),
(9, 9, 'Galaxy Buds Pro', '2022-09-01'),
(10, 10, 'IPhone 13', '2022-10-05');

SQL Solution:
SELECT c.customer_id, c.name, c.email
FROM customers c
LEFT JOIN purchases p ON c.customer_id = p.customer_id AND p.product LIKE 'Galaxy%'
WHERE p.purchase_id IS NULL;

Explanation of the Query:


• LEFT JOIN is used to include all customers from the customers table and match them
with purchase records from the purchases table.
• p.product LIKE 'Galaxy%' ensures that the join only considers purchases of Galaxy
models.
• The WHERE p.purchase_id IS NULL condition filters out customers who have made any
purchases of Galaxy models, leaving only those who have never purchased any Galaxy
products.

Key Takeaways:
• LEFT JOIN ensures that all customers are included in the result, even if they don't have
matching records in the purchases table.
• LIKE 'Galaxy%' is used to filter only Galaxy model purchases.
• NULL check (WHERE p.purchase_id IS NULL) identifies customers who have no
records for Galaxy products.
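This LEFT JOIN anti-join pattern is easy to verify on an in-memory SQLite database (assumed here as a convenient harness; the email column is omitted since the sample emails are redacted):

```python
import sqlite3

# Sketch: customers with no Galaxy purchase, via LEFT JOIN ... IS NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INT PRIMARY KEY, name TEXT);
CREATE TABLE purchases (purchase_id INT PRIMARY KEY, customer_id INT,
                        product TEXT, purchase_date TEXT);
INSERT INTO customers VALUES
  (1,'John Doe'),(2,'Sophia Brown'),(3,'Liam Smith'),(4,'Ava Johnson'),
  (5,'Noah Thompson'),(6,'Emma White'),(7,'Mason Black'),(8,'Olivia Green'),
  (9,'Lucas Blue'),(10,'Isabella King');
INSERT INTO purchases VALUES
  (1,1,'Galaxy S21','2022-01-10'),(2,2,'Galaxy Note20','2022-02-05'),
  (3,3,'IPhone 13','2022-03-15'),(4,4,'Samsung Smartwatch','2022-04-10'),
  (5,5,'Galaxy Z Flip3','2022-05-05'),(6,6,'IPhone 14','2022-06-20'),
  (7,7,'Galaxy S22','2022-07-15'),(8,8,'MacBook Pro','2022-08-10'),
  (9,9,'Galaxy Buds Pro','2022-09-01'),(10,10,'IPhone 13','2022-10-05');
""")
rows = conn.execute("""
    SELECT c.customer_id, c.name
    FROM customers c
    LEFT JOIN purchases p
           ON c.customer_id = p.customer_id AND p.product LIKE 'Galaxy%'
    WHERE p.purchase_id IS NULL
    ORDER BY c.customer_id
""").fetchall()
print(rows)
```

Note that 'Samsung Smartwatch' does not start with 'Galaxy', so Ava Johnson correctly appears in the result alongside the iPhone and MacBook buyers.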

IBM
• Q.681
Calculate Average Sales per Region
Explanation
IBM is analyzing sales data across multiple regions. The task is to calculate the average sales
for each region. You need to group the data by region and return the average sales for each
region.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE sales (
sale_id INT,
region VARCHAR(50),
sale_amount DECIMAL(10, 2),


sale_date DATE
);
• - Datasets
INSERT INTO sales (sale_id, region, sale_amount, sale_date)
VALUES
(1, 'North', 200.00, '2023-01-01'),
(2, 'South', 300.00, '2023-01-05'),
(3, 'East', 150.00, '2023-01-07'),
(4, 'North', 400.00, '2023-01-10'),
(5, 'South', 250.00, '2023-01-12');

Learnings
• Grouping data by region
• Calculating the average of a column using AVG()
• Summarizing sales data by region

Solutions
• - PostgreSQL solution
SELECT region, AVG(sale_amount) AS avg_sales
FROM sales
GROUP BY region
ORDER BY region;
• - MySQL solution
SELECT region, AVG(sale_amount) AS avg_sales
FROM sales
GROUP BY region
ORDER BY region;
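Since the two solutions are identical here, one portable check suffices; running the query on in-memory SQLite (assumed as a convenient harness) against the five sample rows:

```python
import sqlite3

# Sketch: average sale amount per region.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INT, region TEXT, sale_amount REAL, sale_date TEXT);
INSERT INTO sales VALUES
  (1,'North',200.00,'2023-01-01'),(2,'South',300.00,'2023-01-05'),
  (3,'East',150.00,'2023-01-07'),(4,'North',400.00,'2023-01-10'),
  (5,'South',250.00,'2023-01-12');
""")
rows = conn.execute("""
    SELECT region, AVG(sale_amount) AS avg_sales
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()
print(rows)  # [('East', 150.0), ('North', 300.0), ('South', 275.0)]
```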
• Q.682
Total Sales by Product
Explanation
IBM wants to know how much total sales were made for each product. The task is to sum the
sales for each product and return the total sales amount for each product.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE product_sales (
product_id INT,
product_name VARCHAR(100),
sale_amount DECIMAL(10, 2),
sale_date DATE
);
• - Datasets
INSERT INTO product_sales (product_id, product_name, sale_amount, sale_date)
VALUES
(1, 'Laptop', 1000.00, '2023-01-01'),
(2, 'Smartphone', 500.00, '2023-01-05'),
(1, 'Laptop', 1500.00, '2023-01-10'),
(3, 'Tablet', 300.00, '2023-01-12'),
(2, 'Smartphone', 600.00, '2023-01-14');

Learnings
• Grouping data by product
• Calculating the sum of a column using SUM()


• Aggregating sales data by product

Solutions
• - PostgreSQL solution
SELECT product_name, SUM(sale_amount) AS total_sales
FROM product_sales
GROUP BY product_name
ORDER BY total_sales DESC;
• - MySQL solution
SELECT product_name, SUM(sale_amount) AS total_sales
FROM product_sales
GROUP BY product_name
ORDER BY total_sales DESC;
• Q.683
Counting Orders Above a Threshold
Explanation
IBM is analyzing customer order data and wants to count how many orders exceed a
specified amount (e.g., $500). The task is to count the number of orders above the given
threshold.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE customer_orders (
order_id INT,
customer_id INT,
order_amount DECIMAL(10, 2),
order_date DATE
);
• - Datasets
INSERT INTO customer_orders (order_id, customer_id, order_amount, order_date)
VALUES
(1, 101, 450.00, '2023-01-01'),
(2, 102, 600.00, '2023-01-05'),
(3, 103, 700.00, '2023-01-10'),
(4, 104, 350.00, '2023-01-12'),
(5, 105, 800.00, '2023-01-14');

Learnings
• Using the COUNT() function to count rows
• Filtering data using WHERE clause with a threshold
• Aggregating data based on a condition

Solutions
• - PostgreSQL solution
SELECT COUNT(*) AS orders_above_500
FROM customer_orders
WHERE order_amount > 500;
• - MySQL solution
SELECT COUNT(*) AS orders_above_500
FROM customer_orders
WHERE order_amount > 500;
• Q.684


Calculate the Monthly Revenue for Each Region


Explanation
IBM wants to analyze the monthly revenue generated from each region. The task is to
calculate the total revenue for each region for each month and return the results sorted by the
month and region.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE sales (
sale_id INT,
region VARCHAR(50),
sale_amount DECIMAL(10, 2),
sale_date DATE
);
• - Datasets
INSERT INTO sales (sale_id, region, sale_amount, sale_date)
VALUES
(1, 'North', 1000.00, '2023-01-05'),
(2, 'South', 2000.00, '2023-01-15'),
(3, 'East', 1500.00, '2023-01-20'),
(4, 'North', 1200.00, '2023-02-10'),
(5, 'South', 1800.00, '2023-02-15'),
(6, 'East', 1300.00, '2023-02-18');

Learnings
• Using EXTRACT(MONTH FROM date) to get the month from a date
• Aggregating data by region and month
• Calculating the total revenue for each region per month
• Sorting the results by month and region

Solutions
• - PostgreSQL solution
SELECT region,
EXTRACT(MONTH FROM sale_date) AS month,
SUM(sale_amount) AS total_revenue
FROM sales
GROUP BY region, EXTRACT(MONTH FROM sale_date)
ORDER BY month, region;
• - MySQL solution
SELECT region,
MONTH(sale_date) AS month,
SUM(sale_amount) AS total_revenue
FROM sales
GROUP BY region, MONTH(sale_date)
ORDER BY month, region;
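The month-extraction function is the only vendor-specific piece; on an in-memory SQLite database (assumed here as a test harness) strftime('%m', ...) plays that role, returning the month as a two-character string:

```python
import sqlite3

# Sketch: total revenue per region per month.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INT, region TEXT, sale_amount REAL, sale_date TEXT);
INSERT INTO sales VALUES
  (1,'North',1000.00,'2023-01-05'),(2,'South',2000.00,'2023-01-15'),
  (3,'East',1500.00,'2023-01-20'),(4,'North',1200.00,'2023-02-10'),
  (5,'South',1800.00,'2023-02-15'),(6,'East',1300.00,'2023-02-18');
""")
rows = conn.execute("""
    SELECT region,
           strftime('%m', sale_date) AS month,
           SUM(sale_amount) AS total_revenue
    FROM sales
    GROUP BY region, strftime('%m', sale_date)
    ORDER BY month, region
""").fetchall()
print(rows)
```

Each of the three regions contributes one row per month, giving six rows sorted by month, then region.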
• Q.685
Identify Customers Who Have Made More Than 3 Orders in a Month
Explanation
IBM wants to identify customers who are highly engaged by placing more than 3 orders
within a given month. The task is to return the list of customer IDs who have made more than
3 orders in any given month.


Datasets and SQL Schemas


• - Table creation
CREATE TABLE customer_orders (
order_id INT,
customer_id INT,
order_amount DECIMAL(10, 2),
order_date DATE
);
• - Datasets
INSERT INTO customer_orders (order_id, customer_id, order_amount, order_date)
VALUES
(1, 101, 200.00, '2023-01-05'),
(2, 102, 300.00, '2023-01-10'),
(3, 101, 150.00, '2023-01-15'),
(4, 103, 250.00, '2023-01-20'),
(5, 101, 100.00, '2023-01-25'),
(6, 104, 500.00, '2023-02-01'),
(7, 101, 450.00, '2023-02-03');

Learnings
• Using COUNT() to count the number of orders per customer
• Grouping by customer_id and month
• Filtering customers who have made more than 3 orders in a month
• Using HAVING to apply conditions to aggregated results

Solutions
• - PostgreSQL solution
SELECT customer_id,
EXTRACT(MONTH FROM order_date) AS month,
COUNT(*) AS order_count
FROM customer_orders
GROUP BY customer_id, EXTRACT(MONTH FROM order_date)
HAVING COUNT(*) > 3
ORDER BY customer_id, month;
• - MySQL solution
SELECT customer_id,
MONTH(order_date) AS month,
COUNT(*) AS order_count
FROM customer_orders
GROUP BY customer_id, MONTH(order_date)
HAVING COUNT(*) > 3
ORDER BY customer_id, month;
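Worth noting: with this sample data no customer actually crosses the strictly-greater-than-3 threshold (customer 101 places exactly three January orders), so the query legitimately returns zero rows. A quick check on in-memory SQLite (assumed here as a harness) confirms the HAVING boundary:

```python
import sqlite3

# Sketch: customers with more than 3 orders in any month.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_orders (order_id INT, customer_id INT,
                              order_amount REAL, order_date TEXT);
INSERT INTO customer_orders VALUES
  (1,101,200.00,'2023-01-05'),(2,102,300.00,'2023-01-10'),
  (3,101,150.00,'2023-01-15'),(4,103,250.00,'2023-01-20'),
  (5,101,100.00,'2023-01-25'),(6,104,500.00,'2023-02-01'),
  (7,101,450.00,'2023-02-03');
""")
rows = conn.execute("""
    SELECT customer_id,
           strftime('%m', order_date) AS month,
           COUNT(*) AS order_count
    FROM customer_orders
    GROUP BY customer_id, strftime('%m', order_date)
    HAVING COUNT(*) > 3
""").fetchall()
print(rows)  # [] -- customer 101 has exactly 3 January orders, not more than 3
```

Changing the condition to COUNT(*) >= 3 would surface customer 101; the strict inequality is exactly what the question asks for.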
• Q.686
Product Sales Performance Based on Region and Category
Explanation
IBM wants to assess product performance by region and category. The task is to calculate the
total sales for each product category and region for the last quarter, ensuring to exclude
products with sales below a threshold (e.g., $500).

Datasets and SQL Schemas


• - Table creation
CREATE TABLE product_sales (
sale_id INT,
product_id INT,
product_name VARCHAR(100),


category VARCHAR(50),
region VARCHAR(50),
sale_amount DECIMAL(10, 2),
sale_date DATE
);
• - Datasets
INSERT INTO product_sales (sale_id, product_id, product_name, category, region, sale_amount, sale_date)
VALUES
(1, 101, 'Laptop', 'Electronics', 'North', 1500.00, '2023-03-05'),
(2, 102, 'Smartphone', 'Electronics', 'South', 600.00, '2023-03-10'),
(3, 103, 'Tablet', 'Electronics', 'East', 200.00, '2023-03-15'),
(4, 104, 'Monitor', 'Electronics', 'North', 350.00, '2023-03-20'),
(5, 105, 'Headphones', 'Accessories', 'South', 700.00, '2023-03-25'),
(6, 106, 'Charger', 'Accessories', 'East', 100.00, '2023-03-30');

Learnings
• Aggregating sales by category and region
• Filtering products with sales above a threshold
• Using GROUP BY to calculate total sales per category and region
• Working with date ranges for the last quarter

Solutions
• - PostgreSQL solution
SELECT category, region, SUM(sale_amount) AS total_sales
FROM product_sales
WHERE sale_date BETWEEN '2023-01-01' AND '2023-03-31'
GROUP BY category, region
HAVING SUM(sale_amount) > 500
ORDER BY total_sales DESC;
• - MySQL solution
SELECT category, region, SUM(sale_amount) AS total_sales
FROM product_sales
WHERE sale_date BETWEEN '2023-01-01' AND '2023-03-31'
GROUP BY category, region
HAVING SUM(sale_amount) > 500
ORDER BY total_sales DESC;
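The WHERE-then-HAVING pipeline above can be verified with a quick sketch using Python's built-in sqlite3 module (an assumption; the book's solutions target PostgreSQL and MySQL). WHERE prunes rows before aggregation; HAVING prunes the aggregated groups afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE product_sales (
    sale_id INT, product_id INT, product_name TEXT,
    category TEXT, region TEXT, sale_amount REAL, sale_date TEXT)""")
conn.executemany(
    "INSERT INTO product_sales VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(1, 101, 'Laptop', 'Electronics', 'North', 1500.00, '2023-03-05'),
     (2, 102, 'Smartphone', 'Electronics', 'South', 600.00, '2023-03-10'),
     (3, 103, 'Tablet', 'Electronics', 'East', 200.00, '2023-03-15'),
     (4, 104, 'Monitor', 'Electronics', 'North', 350.00, '2023-03-20'),
     (5, 105, 'Headphones', 'Accessories', 'South', 700.00, '2023-03-25'),
     (6, 106, 'Charger', 'Accessories', 'East', 100.00, '2023-03-30')])

# WHERE filters individual sales; HAVING filters the grouped totals
rows = conn.execute("""
    SELECT category, region, SUM(sale_amount) AS total_sales
    FROM product_sales
    WHERE sale_date BETWEEN '2023-01-01' AND '2023-03-31'
    GROUP BY category, region
    HAVING SUM(sale_amount) > 500
    ORDER BY total_sales DESC
""").fetchall()
print(rows)  # Electronics/North 1850.0, Accessories/South 700.0, Electronics/South 600.0
```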
• Q.687
Extract Product Code from Product Name
Explanation
IBM wants to clean up product data by extracting product codes from the product names,
which are formatted as "Product Name - [ProductCode]". The task is to extract only the
ProductCode from the product name, ensuring the product code is a 6-character
alphanumeric string.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE products (
product_id INT,
product_name VARCHAR(255)
);
• - Datasets
INSERT INTO products (product_id, product_name)
VALUES
(1, 'Laptop - ABC123'),
(2, 'Smartphone - XYX456'),
(3, 'Tablet - ZZZ789'),
(4, 'Monitor - QQW111');

Learnings
• Using REGEXP (or REGEXP_REPLACE) for pattern matching
• Extracting a substring with regular expressions
• Handling alphanumeric patterns with wildcards in regex

Solutions
• - PostgreSQL solution
SELECT product_id,
REGEXP_REPLACE(product_name, '^.* - ([A-Za-z0-9]{6})$', '\1') AS product_code
FROM products;
• - MySQL solution
SELECT product_id,
REGEXP_SUBSTR(product_name, '([A-Za-z0-9]{6})$') AS product_code
FROM products;
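The same anchored pattern can be exercised with Python's re module (an assumption of convenience, since SQLite lacks regex functions by default): a 6-character alphanumeric code anchored at the end of the string.

```python
import re

# Sample product names from the dataset
names = ['Laptop - ABC123', 'Smartphone - XYX456', 'Tablet - ZZZ789', 'Monitor - QQW111']

# Same pattern the SQL solutions use: six trailing alphanumeric characters
codes = []
for name in names:
    m = re.search(r'([A-Za-z0-9]{6})$', name)
    codes.append(m.group(1) if m else None)
print(codes)  # ['ABC123', 'XYX456', 'ZZZ789', 'QQW111']
```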
• Q.688
Find All Email Addresses with Specific Domain
Explanation
IBM wants to identify all the email addresses from a specific domain (e.g., @ibm.com). Write
a query that uses wildcards to find email addresses from the customers table that end with
@ibm.com.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE customers (
customer_id INT,
customer_name VARCHAR(100),
email VARCHAR(100)
);
• - Datasets
INSERT INTO customers (customer_id, customer_name, email)
VALUES
(1, 'John Doe', '[email protected]'),
(2, 'Jane Smith', '[email protected]'),
(3, 'Alice Johnson', '[email protected]'),
(4, 'Bob Brown', '[email protected]');

Learnings
• Using wildcards (% in LIKE clause) for pattern matching
• Matching email addresses based on a domain name
• Using LIKE for simple text pattern matching

Solutions
• - PostgreSQL solution
SELECT customer_id, customer_name, email
FROM customers
WHERE email LIKE '%@ibm.com';
• - MySQL solution
SELECT customer_id, customer_name, email
FROM customers
WHERE email LIKE '%@ibm.com';
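A runnable sketch of the LIKE pattern follows, using Python's built-in sqlite3 module (an assumption; the book targets PostgreSQL and MySQL). The email addresses here are made up, since the sample values were redacted in this copy of the dataset; '%' matches any prefix, so the predicate keeps only addresses ending in @ibm.com.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INT, customer_name TEXT, email TEXT)")
# Hypothetical emails standing in for the redacted sample data
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 'John Doe', 'john.doe@ibm.com'),
     (2, 'Jane Smith', 'jane.smith@ibm.com'),
     (3, 'Alice Johnson', 'alice.j@gmail.com'),
     (4, 'Bob Brown', 'bob.brown@ibm.com')])

# '%' is the LIKE wildcard for "any sequence of characters"
rows = conn.execute(
    "SELECT customer_id FROM customers WHERE email LIKE '%@ibm.com' ORDER BY customer_id"
).fetchall()
print(rows)  # [(1,), (2,), (4,)]
```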
• Q.689
Masking Part of Credit Card Numbers
Explanation
IBM needs to ensure that only part of the credit card number is visible in reports. The goal is
to mask the middle digits of the credit card number, leaving the first 4 and last 4 digits
visible (e.g., 1234-****-****-5678).
Write a query to achieve this using text functions and regular expressions.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE payments (
payment_id INT,
customer_id INT,
credit_card_number VARCHAR(16)
);
• - Datasets
INSERT INTO payments (payment_id, customer_id, credit_card_number)
VALUES
(1, 101, '1234567812345678'),
(2, 102, '9876543298765432'),
(3, 103, '5678123456781234');

Learnings
• Using REGEXP_REPLACE to mask parts of a string
• Regular expressions to replace a range of characters
• Working with text manipulation functions in SQL

Solutions
• - PostgreSQL solution
SELECT payment_id, customer_id,
REGEXP_REPLACE(credit_card_number, '(\d{4})\d{8}(\d{4})', '\1-****-****-\2') AS masked_card
FROM payments;
• - MySQL solution
SELECT payment_id, customer_id,
CONCAT(SUBSTRING(credit_card_number, 1, 4), '-****-****-', SUBSTRING(credit_card_number, 13, 4)) AS masked_card
FROM payments;
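The masking regex can be checked directly with Python's re module (an assumption of convenience; the substitution pattern is the same one the PostgreSQL solution uses): capture the first and last 4 digits, replace the middle 8 with asterisks.

```python
import re

def mask_card(number: str) -> str:
    # Keep the first and last 4 digits, replace the middle 8 with asterisks
    return re.sub(r'(\d{4})\d{8}(\d{4})', r'\1-****-****-\2', number)

masked = [mask_card(n) for n in ['1234567812345678', '9876543298765432', '5678123456781234']]
print(masked)  # ['1234-****-****-5678', '9876-****-****-5432', '5678-****-****-1234']
```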

Key Learnings from These Questions:


• Text functions such as REGEXP_REPLACE, REGEXP_SUBSTR, SUBSTRING and CONCAT for
pattern matching and string manipulation.
• How to use wildcards (e.g., %) in LIKE statements for text matching.
• Understanding regular expressions (regex) to manipulate and extract specific patterns
from strings.

865
1000+ SQL Interview Questions & Answers | By Zero Analyst

• Masking sensitive data (e.g., credit card numbers) using regular expressions.
• Q.690
Calculate Discounted Price Based on Product Category
Explanation
IBM wants to calculate the discounted price for each product. The discount is applied based
on the product category as follows:
• Electronics: 10% discount
• Clothing: 20% discount
• Home Goods: 15% discount
• All other categories: no discount
Write a SQL query that calculates the discounted price for each product, using a CASE
statement based on the category.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE products (
product_id INT,
product_name VARCHAR(255),
category VARCHAR(50),
price DECIMAL(10, 2)
);
• - Datasets
INSERT INTO products (product_id, product_name, category, price)
VALUES
(1, 'Laptop', 'Electronics', 1200.00),
(2, 'Shirt', 'Clothing', 50.00),
(3, 'Coffee Maker', 'Home Goods', 100.00),
(4, 'Sofa', 'Furniture', 500.00);

Learnings
• Using the CASE statement for conditional logic
• Performing calculations based on categories
• Applying discounts based on different conditions

Solutions
• - PostgreSQL solution
SELECT product_id, product_name, category, price,
CASE
WHEN category = 'Electronics' THEN price * 0.90
WHEN category = 'Clothing' THEN price * 0.80
WHEN category = 'Home Goods' THEN price * 0.85
ELSE price
END AS discounted_price
FROM products;
• - MySQL solution
SELECT product_id, product_name, category, price,
CASE
WHEN category = 'Electronics' THEN price * 0.90
WHEN category = 'Clothing' THEN price * 0.80
WHEN category = 'Home Goods' THEN price * 0.85
ELSE price
END AS discounted_price
FROM products;
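A runnable sketch of the CASE logic, using Python's built-in sqlite3 module (an assumption; the book targets PostgreSQL and MySQL). CASE evaluates its WHEN branches top to bottom and falls through to ELSE for any other category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products (
    product_id INT, product_name TEXT, category TEXT, price REAL)""")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [(1, 'Laptop', 'Electronics', 1200.00),
     (2, 'Shirt', 'Clothing', 50.00),
     (3, 'Coffee Maker', 'Home Goods', 100.00),
     (4, 'Sofa', 'Furniture', 500.00)])

# Each WHEN branch applies one discount rate; ELSE leaves the price unchanged
rows = conn.execute("""
    SELECT product_name,
           CASE
               WHEN category = 'Electronics' THEN price * 0.90
               WHEN category = 'Clothing' THEN price * 0.80
               WHEN category = 'Home Goods' THEN price * 0.85
               ELSE price
           END AS discounted_price
    FROM products
    ORDER BY product_id
""").fetchall()
print(rows)
```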
• Q.691

Find Customers Who Have Purchased More Than One Product in a Single Transaction
Explanation
IBM wants to identify customers who have purchased more than one product in a single
transaction. The task is to join the orders and order_items tables and use a CASE statement
to flag whether a customer bought multiple products in the same transaction.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE orders (
order_id INT,
customer_id INT,
order_date DATE
);
• - Table creation
CREATE TABLE order_items (
order_item_id INT,
order_id INT,
product_id INT,
quantity INT
);
• - Datasets
INSERT INTO orders (order_id, customer_id, order_date)
VALUES
(1, 101, '2023-01-10'),
(2, 102, '2023-01-12'),
(3, 103, '2023-01-14');
INSERT INTO order_items (order_item_id, order_id, product_id, quantity)
VALUES
(1, 1, 1001, 1),
(2, 1, 1002, 2),
(3, 2, 1003, 1),
(4, 3, 1004, 3);

Learnings
• Using JOIN to combine data from related tables
• Using COUNT() and GROUP BY to count the number of products in each order
• Using CASE to flag when a customer buys more than one product in a transaction

Solutions
• - PostgreSQL solution
SELECT o.customer_id, o.order_id,
CASE
WHEN COUNT(oi.product_id) > 1 THEN 'Multiple Products'
ELSE 'Single Product'
END AS product_type
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY o.customer_id, o.order_id;
• - MySQL solution
SELECT o.customer_id, o.order_id,
CASE
WHEN COUNT(oi.product_id) > 1 THEN 'Multiple Products'
ELSE 'Single Product'
END AS product_type
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY o.customer_id, o.order_id;
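The JOIN + GROUP BY + CASE combination can be sketched as a runnable example using Python's built-in sqlite3 module (an assumption; the book targets PostgreSQL and MySQL). COUNT() runs once per (customer, order) group, and CASE turns the count into a label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, customer_id INT, order_date TEXT)")
conn.execute("CREATE TABLE order_items (order_item_id INT, order_id INT, product_id INT, quantity INT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 101, '2023-01-10'), (2, 102, '2023-01-12'), (3, 103, '2023-01-14')])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?)",
                 [(1, 1, 1001, 1), (2, 1, 1002, 2), (3, 2, 1003, 1), (4, 3, 1004, 3)])

# Order 1 carries two items, so its group count is 2 and it is flagged
rows = conn.execute("""
    SELECT o.customer_id, o.order_id,
           CASE WHEN COUNT(oi.product_id) > 1 THEN 'Multiple Products'
                ELSE 'Single Product' END AS product_type
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    GROUP BY o.customer_id, o.order_id
    ORDER BY o.order_id
""").fetchall()
print(rows)  # [(101, 1, 'Multiple Products'), (102, 2, 'Single Product'), (103, 3, 'Single Product')]
```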
• Q.692
Calculate Employee Performance Based on Target Achievements
Explanation
IBM wants to evaluate employee performance by comparing their sales against predefined
targets. The target performance is categorized as:
• Exceeded: Sales greater than 120% of the target
• Met: Sales between 100% and 120% of the target
• Below Target: Sales below 100% of the target
Write a query using a CASE statement to categorize each employee based on their sales and
target.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE employee_sales (
employee_id INT,
employee_name VARCHAR(100),
sales DECIMAL(10, 2),
target DECIMAL(10, 2)
);
• - Datasets
INSERT INTO employee_sales (employee_id, employee_name, sales, target)
VALUES
(1, 'John Doe', 150000.00, 120000.00),
(2, 'Jane Smith', 100000.00, 120000.00),
(3, 'Alice Johnson', 80000.00, 100000.00),
(4, 'Bob Brown', 110000.00, 110000.00);

Learnings
• Using CASE to create conditional categories based on sales vs target
• Performing comparison operations (>, <, >=, <=) within CASE statements
• Labeling employees based on their performance

Solutions
• - PostgreSQL solution
SELECT employee_id, employee_name, sales, target,
CASE
WHEN sales > target * 1.2 THEN 'Exceeded'
WHEN sales >= target THEN 'Met'
ELSE 'Below Target'
END AS performance
FROM employee_sales;
• - MySQL solution
SELECT employee_id, employee_name, sales, target,
CASE
WHEN sales > target * 1.2 THEN 'Exceeded'
WHEN sales >= target THEN 'Met'
ELSE 'Below Target'
END AS performance
FROM employee_sales;
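A runnable sketch of the categorization, using Python's built-in sqlite3 module (an assumption; the book targets PostgreSQL and MySQL). Branch order matters here: the > 120% test must come before the >= 100% test, or every strong performer would be labeled merely 'Met'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee_sales (
    employee_id INT, employee_name TEXT, sales REAL, target REAL)""")
conn.executemany(
    "INSERT INTO employee_sales VALUES (?, ?, ?, ?)",
    [(1, 'John Doe', 150000.00, 120000.00),
     (2, 'Jane Smith', 100000.00, 120000.00),
     (3, 'Alice Johnson', 80000.00, 100000.00),
     (4, 'Bob Brown', 110000.00, 110000.00)])

# First matching WHEN wins, so test the stricter threshold first
rows = conn.execute("""
    SELECT employee_name,
           CASE
               WHEN sales > target * 1.2 THEN 'Exceeded'
               WHEN sales >= target THEN 'Met'
               ELSE 'Below Target'
           END AS performance
    FROM employee_sales
    ORDER BY employee_id
""").fetchall()
print(rows)
```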

Key Learnings from These Questions:


• CASE statements allow for conditional logic to handle different scenarios directly within
SQL.
• JOIN operations help link related tables together, allowing for more complex data
analysis.
• Aggregations (like COUNT() or SUM()) are useful when combined with GROUP BY to
categorize or flag data.
• Data categorization (e.g., performance levels, multiple product purchases) is common
when analyzing business metrics.
• Q.693
Identify Top Investors of IBM from Each Country
Explanation
IBM wants to identify the top investors in its company, grouped by country. The task is to
return a list of the highest investment value for each country, ensuring that for each country,
the investor with the highest amount invested is returned.
The data is stored in two tables:
• investors - Contains investor details.
• investments - Contains records of investments made by each investor.
Write a SQL query to find the top investors in each country based on the total investment
amount.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE investors (
investor_id INT PRIMARY KEY,
investor_name VARCHAR(255),
country VARCHAR(100)
);
• - Table creation
CREATE TABLE investments (
investment_id INT PRIMARY KEY,
investor_id INT,
investment_amount DECIMAL(15, 2),
investment_date DATE,
FOREIGN KEY (investor_id) REFERENCES investors(investor_id)
);
• - Datasets
INSERT INTO investors (investor_id, investor_name, country)
VALUES
(1, 'John Smith', 'USA'),
(2, 'Alice Johnson', 'Canada'),
(3, 'Bob Brown', 'USA'),
(4, 'Charlie Davis', 'Canada'),
(5, 'Sarah Lee', 'USA');
INSERT INTO investments (investment_id, investor_id, investment_amount, investment_date)
VALUES
(101, 1, 1000000, '2023-05-01'),
(102, 2, 500000, '2023-06-15'),
(103, 3, 2000000, '2023-07-01'),
(104, 4, 800000, '2023-06-01'),
(105, 5, 1500000, '2023-08-01');

Learnings
• Using aggregate functions (SUM) to calculate the total investment per investor.
• Using GROUP BY to group the data by country and get the highest investment.
• Using window functions like ROW_NUMBER() to rank investors and select the top investor
for each country.

Solutions
• - PostgreSQL solution
WITH ranked_investors AS (
SELECT
i.investor_id,
i.investor_name,
i.country,
SUM(inv.investment_amount) AS total_investment,
ROW_NUMBER() OVER (PARTITION BY i.country ORDER BY SUM(inv.investment_amount) DESC) AS rank
FROM
investors i
JOIN investments inv ON i.investor_id = inv.investor_id
GROUP BY
i.investor_id, i.investor_name, i.country
)
SELECT
investor_id,
investor_name,
country,
total_investment
FROM ranked_investors
WHERE rank = 1
ORDER BY country;
• - MySQL solution
WITH ranked_investors AS (
    SELECT
        i.investor_id,
        i.investor_name,
        i.country,
        SUM(inv.investment_amount) AS total_investment,
        -- RANK is a reserved word in MySQL 8.0+, so alias the column rn instead
        ROW_NUMBER() OVER (PARTITION BY i.country ORDER BY SUM(inv.investment_amount) DESC) AS rn
    FROM
        investors i
    JOIN investments inv ON i.investor_id = inv.investor_id
    GROUP BY
        i.investor_id, i.investor_name, i.country
)
SELECT
    investor_id,
    investor_name,
    country,
    total_investment
FROM ranked_investors
WHERE rn = 1
ORDER BY country;
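The top-per-group pattern can be checked end to end with Python's built-in sqlite3 module, which also supports window functions (an assumption of convenience; the book targets PostgreSQL and MySQL). Totals are computed first, then ROW_NUMBER() restarts at 1 inside each country partition so rn = 1 picks the top investor per country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE investors (investor_id INT, investor_name TEXT, country TEXT)")
conn.execute("""CREATE TABLE investments (
    investment_id INT, investor_id INT, investment_amount REAL, investment_date TEXT)""")
conn.executemany("INSERT INTO investors VALUES (?, ?, ?)",
                 [(1, 'John Smith', 'USA'), (2, 'Alice Johnson', 'Canada'),
                  (3, 'Bob Brown', 'USA'), (4, 'Charlie Davis', 'Canada'),
                  (5, 'Sarah Lee', 'USA')])
conn.executemany("INSERT INTO investments VALUES (?, ?, ?, ?)",
                 [(101, 1, 1000000, '2023-05-01'), (102, 2, 500000, '2023-06-15'),
                  (103, 3, 2000000, '2023-07-01'), (104, 4, 800000, '2023-06-01'),
                  (105, 5, 1500000, '2023-08-01')])

# Stage 1: totals per investor; stage 2: rank within each country
rows = conn.execute("""
    WITH totals AS (
        SELECT i.country, i.investor_name, SUM(inv.investment_amount) AS total_investment
        FROM investors i
        JOIN investments inv ON i.investor_id = inv.investor_id
        GROUP BY i.investor_id, i.investor_name, i.country
    ),
    ranked AS (
        SELECT country, investor_name, total_investment,
               ROW_NUMBER() OVER (PARTITION BY country ORDER BY total_investment DESC) AS rn
        FROM totals
    )
    SELECT country, investor_name, total_investment
    FROM ranked
    WHERE rn = 1
    ORDER BY country
""").fetchall()
print(rows)  # [('Canada', 'Charlie Davis', 800000.0), ('USA', 'Bob Brown', 2000000.0)]
```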
• Q.694
Find the Most Active Investors in IBM by Investment Growth Rate
Explanation
IBM wants to track investors who have shown the highest growth rate in their investments
over the past year. The growth rate is calculated as the percentage change in the total
investment made in the last 12 months compared to the previous 12 months.
The task is to calculate the investment growth rate for each investor and identify the top
investors by growth. The solution should take into account the following:
• The data needs to be grouped by each investor.
• For each investor, compare their investment in the last 12 months to the investment in the
previous 12 months.
• Return investors with a positive growth rate, sorted in descending order by growth.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE investors (
investor_id INT PRIMARY KEY,
investor_name VARCHAR(255),
country VARCHAR(100)
);
• - Table creation
CREATE TABLE investments (
investment_id INT PRIMARY KEY,
investor_id INT,
investment_amount DECIMAL(15, 2),
investment_date DATE,
FOREIGN KEY (investor_id) REFERENCES investors(investor_id)
);
• - Datasets
INSERT INTO investors (investor_id, investor_name, country)
VALUES
(1, 'John Smith', 'USA'),
(2, 'Alice Johnson', 'Canada'),
(3, 'Bob Brown', 'USA'),
(4, 'Charlie Davis', 'Canada'),
(5, 'Sarah Lee', 'USA');
INSERT INTO investments (investment_id, investor_id, investment_amount, investment_date)
VALUES
(101, 1, 1000000, '2023-01-01'),
(102, 1, 1500000, '2023-06-01'),
(103, 2, 500000, '2022-06-01'),
(104, 2, 700000, '2023-06-15'),
(105, 3, 2000000, '2023-07-01'),
(106, 4, 800000, '2023-06-01'),
(107, 4, 1200000, '2023-07-01'),
(108, 5, 1500000, '2023-05-01');

Learnings
• Using conditional aggregation (SUM over CASE) to compare totals across time frames.
• Calculating percentage change in investment over time.
• Combining aggregation and time-based filtering (last 12 months, previous 12 months).

Solutions
• - PostgreSQL solution
WITH investment_growth AS (
SELECT
i.investor_id,
i.investor_name,
SUM(CASE WHEN inv.investment_date >= CURRENT_DATE - INTERVAL '1 year' THEN inv.investment_amount ELSE 0 END) AS last_year_investment,
SUM(CASE WHEN inv.investment_date >= CURRENT_DATE - INTERVAL '2 year' AND inv.investment_date < CURRENT_DATE - INTERVAL '1 year' THEN inv.investment_amount ELSE 0 END) AS previous_year_investment
FROM
investors i
JOIN investments inv ON i.investor_id = inv.investor_id
GROUP BY
i.investor_id, i.investor_name
)
SELECT
investor_id,
investor_name,
last_year_investment,
previous_year_investment,
ROUND(((last_year_investment - previous_year_investment) / previous_year_investment) * 100, 2) AS growth_rate
FROM
investment_growth
WHERE
previous_year_investment > 0
ORDER BY
growth_rate DESC;
• - MySQL solution
WITH investment_growth AS (
SELECT
i.investor_id,
i.investor_name,
SUM(CASE WHEN inv.investment_date >= CURDATE() - INTERVAL 1 YEAR THEN inv.investment_amount ELSE 0 END) AS last_year_investment,
SUM(CASE WHEN inv.investment_date >= CURDATE() - INTERVAL 2 YEAR AND inv.investment_date < CURDATE() - INTERVAL 1 YEAR THEN inv.investment_amount ELSE 0 END) AS previous_year_investment
FROM
investors i
JOIN investments inv ON i.investor_id = inv.investor_id
GROUP BY
i.investor_id, i.investor_name
)
SELECT
investor_id,
investor_name,
last_year_investment,
previous_year_investment,
ROUND(((last_year_investment - previous_year_investment) / previous_year_investment) * 100, 2) AS growth_rate
FROM
investment_growth
WHERE
previous_year_investment > 0
ORDER BY
growth_rate DESC;
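The conditional-aggregation idea (one SUM per time window, driven by a CASE filter) can be sketched deterministically with Python's built-in sqlite3 module; '2023-09-01' is a fixed stand-in for CURRENT_DATE (an assumption, so the result does not drift as real time passes).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE investments (
    investment_id INT, investor_id INT, investment_amount REAL, investment_date TEXT)""")
conn.executemany("INSERT INTO investments VALUES (?, ?, ?, ?)",
                 [(101, 1, 1000000, '2023-01-01'), (102, 1, 1500000, '2023-06-01'),
                  (103, 2, 500000, '2022-06-01'), (104, 2, 700000, '2023-06-15'),
                  (105, 3, 2000000, '2023-07-01'), (106, 4, 800000, '2023-06-01'),
                  (107, 4, 1200000, '2023-07-01'), (108, 5, 1500000, '2023-05-01')])

# One SUM per window; date(..., '-1 year') computes the window boundaries
rows = conn.execute("""
    SELECT investor_id,
           SUM(CASE WHEN investment_date >= date('2023-09-01', '-1 year')
                    THEN investment_amount ELSE 0 END) AS last_year,
           SUM(CASE WHEN investment_date >= date('2023-09-01', '-2 years')
                     AND investment_date < date('2023-09-01', '-1 year')
                    THEN investment_amount ELSE 0 END) AS prev_year
    FROM investments
    GROUP BY investor_id
""").fetchall()

# Only investors with activity in the earlier window have a defined growth rate
growth = [(inv, round((last - prev) / prev * 100, 2)) for inv, last, prev in rows if prev > 0]
print(growth)  # [(2, 40.0)] -- only investor 2 invested in both windows
```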
• Q.695
Identify the Most Profitable Countries for IBM Investors
Explanation
IBM is interested in analyzing which countries bring the highest total investment profit. In
this case, the profit is calculated by comparing the total investments made by the investors
from each country and sorting by the total amount invested.
The task is to write a SQL query that calculates the total investment amount per country
and ranks countries by their total investment in IBM.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE investors (
investor_id INT PRIMARY KEY,
investor_name VARCHAR(255),
country VARCHAR(100)
);
• - Table creation
CREATE TABLE investments (
investment_id INT PRIMARY KEY,
investor_id INT,
investment_amount DECIMAL(15, 2),
investment_date DATE,
FOREIGN KEY (investor_id) REFERENCES investors(investor_id)
);
• - Datasets
INSERT INTO investors (investor_id, investor_name, country)
VALUES
(1, 'John Smith', 'USA'),
(2, 'Alice Johnson', 'Canada'),
(3, 'Bob Brown', 'USA'),
(4, 'Charlie Davis', 'Canada'),
(5, 'Sarah Lee', 'USA');
INSERT INTO investments (investment_id, investor_id, investment_amount, investment_date)
VALUES
(101, 1, 1000000, '2023-01-01'),
(102, 2, 500000, '2023-06-01'),
(103, 3, 2000000, '2023-07-01'),
(104, 4, 800000, '2023-06-01'),
(105, 5, 1500000, '2023-08-01');

Learnings
• Aggregating data by country using SUM.
• Using GROUP BY and ORDER BY to rank countries by total investment.
• Combining JOINs with aggregate functions to calculate total investments.

Solutions
• - PostgreSQL solution
SELECT
i.country,
SUM(inv.investment_amount) AS total_investment
FROM
investors i
JOIN investments inv ON i.investor_id = inv.investor_id
GROUP BY
i.country
ORDER BY
total_investment DESC;
• - MySQL solution
SELECT
i.country,
SUM(inv.investment_amount) AS total_investment
FROM
investors i
JOIN investments inv ON i.investor_id = inv.investor_id
GROUP BY
i.country
ORDER BY
total_investment DESC;
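The join-then-aggregate shape above can be sketched as a runnable example with Python's built-in sqlite3 module (an assumption; the book targets PostgreSQL and MySQL): join investors to their investments first, then sum per country and sort by the aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE investors (investor_id INT, investor_name TEXT, country TEXT)")
conn.execute("""CREATE TABLE investments (
    investment_id INT, investor_id INT, investment_amount REAL, investment_date TEXT)""")
conn.executemany("INSERT INTO investors VALUES (?, ?, ?)",
                 [(1, 'John Smith', 'USA'), (2, 'Alice Johnson', 'Canada'),
                  (3, 'Bob Brown', 'USA'), (4, 'Charlie Davis', 'Canada'),
                  (5, 'Sarah Lee', 'USA')])
conn.executemany("INSERT INTO investments VALUES (?, ?, ?, ?)",
                 [(101, 1, 1000000, '2023-01-01'), (102, 2, 500000, '2023-06-01'),
                  (103, 3, 2000000, '2023-07-01'), (104, 4, 800000, '2023-06-01'),
                  (105, 5, 1500000, '2023-08-01')])

# Grouping by the investor's country rolls every investment up per country
rows = conn.execute("""
    SELECT i.country, SUM(inv.investment_amount) AS total_investment
    FROM investors i
    JOIN investments inv ON i.investor_id = inv.investor_id
    GROUP BY i.country
    ORDER BY total_investment DESC
""").fetchall()
print(rows)  # [('USA', 4500000.0), ('Canada', 1300000.0)]
```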
• Q.696
Track Investors Who Have Increased Investment in AI Products Over Time
Explanation
IBM wants to identify investors who have shown increased interest in AI-related products
over time. The goal is to track whether an investor has increased their investment in
products like IBM AI or IBM Watson by comparing the current investment amount to the
investment amount from the same period last year.
The solution should:
• Identify investments in AI-related products.
• Calculate the year-over-year investment growth for each investor.
• Return investors who have increased their investments in AI products over the past year.

Datasets and SQL Schemas


• - Table creation (note: the investments table from the earlier questions is assumed here to also carry a product_id column linking each investment to an ai_products row)
CREATE TABLE ai_products (
product_id INT PRIMARY KEY,
product_name VARCHAR(255)
);
• - Datasets
INSERT INTO ai_products (product_id, product_name)
VALUES
(101, 'IBM Cloud'),
(102, 'IBM Watson'),
(104, 'IBM AI');

Learnings
• Using DATE functions to calculate year-over-year changes.
• Filtering specific product categories (AI-related products).
• Using self-joins to compare investment amounts over time.

Solutions
• - PostgreSQL solution
WITH current_year_investments AS (
SELECT
inv.investor_id,
inv.product_id,
SUM(inv.investment_amount) AS current_investment
FROM
investments inv
JOIN ai_products aip ON inv.product_id = aip.product_id
WHERE
inv.investment_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY
inv.investor_id, inv.product_id
),
previous_year_investments AS (
SELECT
inv.investor_id,
inv.product_id,
SUM(inv.investment_amount) AS previous_investment
FROM
investments inv
JOIN ai_products aip ON inv.product_id = aip.product_id
WHERE
        inv.investment_date >= CURRENT_DATE - INTERVAL '2 year' AND inv.investment_date < CURRENT_DATE - INTERVAL '1 year'
GROUP BY
inv.investor_id, inv.product_id
)
SELECT
cyi.investor_id,
SUM(cyi.current_investment - pyi.previous_investment) AS investment_growth
FROM
current_year_investments cyi
JOIN
previous_year_investments pyi ON cyi.investor_id = pyi.investor_id AND cyi.product_id = pyi.product_id
GROUP BY
cyi.investor_id
HAVING
SUM(cyi.current_investment - pyi.previous_investment) > 0
ORDER BY
investment_growth DESC;
• - MySQL solution
WITH current_year_investments AS (
SELECT
inv.investor_id,
inv.product_id,
SUM(inv.investment_amount) AS current_investment
FROM
investments inv
JOIN ai_products aip ON inv.product_id = aip.product_id
WHERE
inv.investment_date >= CURDATE() - INTERVAL 1 YEAR
GROUP BY
inv.investor_id, inv.product_id
),
previous_year_investments AS (
SELECT
inv.investor_id,
inv.product_id,
SUM(inv.investment_amount) AS previous_investment
FROM
investments inv
JOIN ai_products aip ON inv.product_id = aip.product_id
WHERE
inv.investment_date >= CURDATE() - INTERVAL 2 YEAR AND inv.investment_date < CURDATE() - INTERVAL 1 YEAR
GROUP BY
inv.investor_id, inv.product_id
)
SELECT
cyi.investor_id,
SUM(cyi.current_investment - pyi.previous_investment) AS investment_growth
FROM
current_year_investments cyi
JOIN
previous_year_investments pyi ON cyi.investor_id = pyi.investor_id AND cyi.product_id = pyi.product_id
GROUP BY
cyi.investor_id
HAVING
SUM(cyi.current_investment - pyi.previous_investment) > 0
ORDER BY
investment_growth DESC;
• Q.697
Identify Investors Who Invested in Products with the Highest ROI
Explanation
IBM wants to identify the products that provide the highest Return on Investment (ROI).
ROI is calculated by taking the total investment in a product and dividing it by the revenue
generated by that product. The task is to calculate the ROI for each product and identify the
investors who have invested in the top 3 products with the highest ROI.
The solution should:
• Calculate the ROI for each product.
• Identify the top 3 products with the highest ROI.
• List the investors who have invested in these top 3 products, along with their total
investment amount.

Datasets and SQL Schemas


• - Table creation (note: this question likewise assumes the investments table carries a product_id column so it can be joined against revenue)
CREATE TABLE revenue (
product_id INT PRIMARY KEY,
product_name VARCHAR(255),
revenue DECIMAL(15, 2)
);
• - Datasets
INSERT INTO revenue (product_id, product_name, revenue)
VALUES
(101, 'IBM Cloud', 1500000),
(102, 'IBM Watson', 1200000),
(103, 'IBM Blockchain', 800000),
(104, 'IBM AI', 2000000),
(105, 'IBM Security', 1000000);

Learnings
• Calculating ROI as the ratio of investment to revenue.
• Using JOINs to combine revenue and investment data.
• Filtering products based on top N values (highest ROI).

Solutions
• - PostgreSQL solution
WITH product_roi AS (
SELECT
inv.product_id,
SUM(inv.investment_amount) AS total_investment,
r.revenue,
SUM(inv.investment_amount) / r.revenue AS roi
FROM
investments inv
JOIN revenue r ON inv.product_id = r.product_id
GROUP BY
inv.product_id, r.revenue
ORDER BY
roi DESC
LIMIT 3
)
SELECT
i.investor_id,
i.investor_name,
SUM(inv.investment_amount) AS total_investment
FROM
investors i
JOIN investments inv ON i.investor_id = inv.investor_id
JOIN product_roi pr ON inv.product_id = pr.product_id
GROUP BY
i.investor_id, i.investor_name
ORDER BY
total_investment DESC;
• - MySQL solution
WITH product_roi AS (
SELECT
inv.product_id,
SUM(inv.investment_amount) AS total_investment,
r.revenue,
SUM(inv.investment_amount) / r.revenue AS roi
FROM
investments inv
JOIN revenue r ON inv.product_id = r.product_id
GROUP BY
inv.product_id, r.revenue
ORDER BY
roi DESC
LIMIT 3
)
SELECT
i.investor_id,
i.investor_name,
SUM(inv.investment_amount) AS total_investment
FROM
investors i
JOIN investments inv ON i.investor_id = inv.investor_id
JOIN product_roi pr ON inv.product_id = pr.product_id
GROUP BY
i.investor_id, i.investor_name
ORDER BY
total_investment DESC;
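The ratio-plus-LIMIT part of the solution can be sketched with Python's built-in sqlite3 module. Since the chapter's investments schema lacks a product_id column, the per-product investment totals below are hypothetical figures invented for the demo (an assumption, labeled in the code).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (product_id INT, product_name TEXT, revenue REAL)")
# Hypothetical per-product investment totals; the real schema would need a
# product_id column on investments for this join to work
conn.execute("CREATE TABLE product_investments (product_id INT, total_investment REAL)")
conn.executemany("INSERT INTO revenue VALUES (?, ?, ?)",
                 [(101, 'IBM Cloud', 1500000), (102, 'IBM Watson', 1200000),
                  (103, 'IBM Blockchain', 800000), (104, 'IBM AI', 2000000),
                  (105, 'IBM Security', 1000000)])
conn.executemany("INSERT INTO product_investments VALUES (?, ?)",
                 [(101, 300000), (102, 600000), (103, 400000),
                  (104, 100000), (105, 250000)])

# ROI here is investment / revenue, as the question defines it;
# ORDER BY ... LIMIT 3 keeps only the three highest ratios
rows = conn.execute("""
    SELECT r.product_name, pi.total_investment / r.revenue AS roi
    FROM product_investments pi
    JOIN revenue r ON pi.product_id = r.product_id
    ORDER BY roi DESC, r.product_id
    LIMIT 3
""").fetchall()
print(rows)  # [('IBM Watson', 0.5), ('IBM Blockchain', 0.5), ('IBM Security', 0.25)]
```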
• Q.698
Calculate Project Hierarchy with Recursive CTE
Explanation
IBM has a project management database where each project can have sub-projects (child
projects). The task is to return the project hierarchy, showing each project along with its
level in the hierarchy and its parent project (if any).
For this, you’ll need to use a recursive CTE to find the hierarchy of projects and display
them with their parent-child relationships.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE projects (
project_id INT,
project_name VARCHAR(255),
parent_project_id INT
);
• - Datasets
INSERT INTO projects (project_id, project_name, parent_project_id)
VALUES
(1, 'Project A', NULL),
(2, 'Project B', 1),
(3, 'Project C', 1),
(4, 'Project D', 2),
(5, 'Project E', 2),
(6, 'Project F', 3);

Learnings
• Using recursive CTE to navigate hierarchical structures
• Handling parent-child relationships in relational databases
• Using CTE for depth-first traversal of hierarchical data

Solutions
• - PostgreSQL solution
WITH RECURSIVE project_hierarchy AS (
-- Base case: Select the root projects
SELECT project_id, project_name, parent_project_id, 1 AS level
FROM projects
WHERE parent_project_id IS NULL
UNION ALL
-- Recursive case: Select child projects and increment the level
SELECT p.project_id, p.project_name, p.parent_project_id, ph.level + 1
FROM projects p
INNER JOIN project_hierarchy ph ON p.parent_project_id = ph.project_id
)
SELECT project_id, project_name, parent_project_id, level
FROM project_hierarchy
ORDER BY level, parent_project_id;
• - MySQL solution
WITH RECURSIVE project_hierarchy AS (
-- Base case: Select the root projects
SELECT project_id, project_name, parent_project_id, 1 AS level
FROM projects
WHERE parent_project_id IS NULL
UNION ALL
-- Recursive case: Select child projects and increment the level
SELECT p.project_id, p.project_name, p.parent_project_id, ph.level + 1
FROM projects p
INNER JOIN project_hierarchy ph ON p.parent_project_id = ph.project_id
)
SELECT project_id, project_name, parent_project_id, level
FROM project_hierarchy
ORDER BY level, parent_project_id;
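SQLite also supports WITH RECURSIVE, so the hierarchy walk can be run end to end with Python's built-in sqlite3 module (an assumption; the book targets PostgreSQL and MySQL). The base case seeds the roots at level 1; each recursive pass joins in the children of the rows found so far and increments the level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (project_id INT, project_name TEXT, parent_project_id INT)")
conn.executemany("INSERT INTO projects VALUES (?, ?, ?)",
                 [(1, 'Project A', None), (2, 'Project B', 1), (3, 'Project C', 1),
                  (4, 'Project D', 2), (5, 'Project E', 2), (6, 'Project F', 3)])

# Roots (parent IS NULL) enter at level 1; children inherit level + 1
rows = conn.execute("""
    WITH RECURSIVE project_hierarchy AS (
        SELECT project_id, project_name, parent_project_id, 1 AS level
        FROM projects
        WHERE parent_project_id IS NULL
        UNION ALL
        SELECT p.project_id, p.project_name, p.parent_project_id, ph.level + 1
        FROM projects p
        JOIN project_hierarchy ph ON p.parent_project_id = ph.project_id
    )
    SELECT project_name, level
    FROM project_hierarchy
    ORDER BY level, project_id
""").fetchall()
print(rows)
```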
• Q.699
Find Projects with Circular Dependencies
Explanation
IBM’s project database has a circular dependency problem, where projects might
mistakenly reference each other as parent-child in a cycle. The goal is to write a recursive
query that can detect and return projects that are part of a circular reference, i.e., projects
where one project is a descendant of itself.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE projects (
project_id INT,
project_name VARCHAR(255),
parent_project_id INT
);
• - Datasets
INSERT INTO projects (project_id, project_name, parent_project_id)
VALUES
(1, 'Project A', 2),
(2, 'Project B', 3),
(3, 'Project C', 1); -- Circular reference: Project C is a child of Project A

Learnings
• Using recursive CTEs to detect cycles or circular dependencies
• Handling self-referencing data structures
• Using cycle detection in graph-based queries

Solutions
• - PostgreSQL solution
WITH RECURSIVE project_hierarchy AS (
    -- Base case: start a path from every project
    SELECT project_id, project_name, parent_project_id,
           ARRAY[project_id] AS path, false AS is_cycle
    FROM projects
    UNION ALL
    -- Recursive case: walk to child projects, flagging a cycle when a
    -- project re-appears in its own path (and stopping that branch so
    -- the recursion terminates even on cyclic data)
    SELECT p.project_id, p.project_name, p.parent_project_id,
           ph.path || p.project_id, p.project_id = ANY(ph.path)
    FROM projects p
    INNER JOIN project_hierarchy ph ON p.parent_project_id = ph.project_id
    WHERE NOT ph.is_cycle
)
SELECT DISTINCT project_id, project_name
FROM project_hierarchy
WHERE is_cycle;
• - MySQL solution
WITH RECURSIVE project_hierarchy AS (
    -- Base case: start a path from every project
    SELECT project_id, project_name, parent_project_id,
           CAST(project_id AS CHAR(1000)) AS path, 0 AS is_cycle
    FROM projects
    UNION ALL
    -- Recursive case: walk to child projects, flagging a cycle when a
    -- project re-appears in its own path (and stopping that branch so
    -- the recursion terminates even on cyclic data)
    SELECT p.project_id, p.project_name, p.parent_project_id,
           CONCAT(ph.path, ',', p.project_id),
           FIND_IN_SET(p.project_id, ph.path) > 0
    FROM projects p
    INNER JOIN project_hierarchy ph ON p.parent_project_id = ph.project_id
    WHERE ph.is_cycle = 0
)
SELECT DISTINCT project_id, project_name
FROM project_hierarchy
WHERE is_cycle = 1;
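Path-based cycle detection can be exercised on the sample cyclic data with Python's built-in sqlite3 module (an assumption of convenience; the path-tracking column is the standard way to keep a recursive CTE from looping forever on cyclic input). Each row carries the path walked so far; a project that re-appears in its own path marks a cycle, and that branch stops.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (project_id INT, project_name TEXT, parent_project_id INT)")
# Circular data: Project A -> B -> C -> A
conn.executemany("INSERT INTO projects VALUES (?, ?, ?)",
                 [(1, 'Project A', 2), (2, 'Project B', 3), (3, 'Project C', 1)])

# ',' delimiters around every id let instr() match whole ids only
rows = conn.execute("""
    WITH RECURSIVE walk AS (
        SELECT project_id, project_name,
               ',' || project_id || ',' AS path, 0 AS is_cycle
        FROM projects
        UNION ALL
        SELECT p.project_id, p.project_name,
               w.path || p.project_id || ',',
               CASE WHEN instr(w.path, ',' || p.project_id || ',') > 0 THEN 1 ELSE 0 END
        FROM projects p
        JOIN walk w ON p.parent_project_id = w.project_id
        WHERE w.is_cycle = 0
    )
    SELECT DISTINCT project_id, project_name
    FROM walk
    WHERE is_cycle = 1
    ORDER BY project_id
""").fetchall()
print(rows)  # all three projects are part of the cycle
```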
• Q.700
Find Project Milestones Over Time
Explanation
IBM wants to analyze project milestones over time. Each project has a set of milestones, and
the task is to find all milestones that occurred in the last three months for each project,
ordered by the milestone date. This involves a recursive CTE to calculate dates relative to
the current date and joining with the milestones table.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE projects (
project_id INT,
project_name VARCHAR(255)
);
• - Table creation
CREATE TABLE milestones (
milestone_id INT,
project_id INT,
milestone_name VARCHAR(255),
milestone_date DATE
);
• - Datasets
INSERT INTO projects (project_id, project_name)
VALUES
(1, 'Project Alpha'),
(2, 'Project Beta');
INSERT INTO milestones (milestone_id, project_id, milestone_name, milestone_date)
VALUES
(1, 1, 'Milestone 1', '2023-02-15'),
(2, 1, 'Milestone 2', '2023-04-05'),
(3, 1, 'Milestone 3', '2023-07-01'),
(4, 2, 'Milestone 1', '2023-03-10'),
(5, 2, 'Milestone 2', '2023-06-15');

Learnings
• Filtering rows by a relative date range with date functions and intervals
• Using a CTE to stage time-filtered data before the final SELECT
• Recognizing when recursion is unnecessary (a date filter alone answers the question)

Solutions
• - PostgreSQL solution
-- A recursive CTE is not needed here: the recursive member would keep
-- re-matching the same milestone rows and never terminate. A plain
-- date-filtered CTE returns the milestones of the last three months.
WITH recent_milestones AS (
    SELECT milestone_id, project_id, milestone_name, milestone_date
    FROM milestones
    WHERE milestone_date > CURRENT_DATE - INTERVAL '3 months'
)
SELECT project_id, milestone_name, milestone_date
FROM recent_milestones
ORDER BY milestone_date;
• - MySQL solution
WITH recent_milestones AS (
    SELECT milestone_id, project_id, milestone_name, milestone_date
    FROM milestones
    WHERE milestone_date > CURDATE() - INTERVAL 3 MONTH
)
SELECT project_id, milestone_name, milestone_date
FROM recent_milestones
ORDER BY milestone_date;
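The date-window filter can be sketched deterministically with Python's built-in sqlite3 module; '2023-07-15' is a fixed stand-in for CURRENT_DATE (an assumption, so the expected rows do not change as real time passes), and date(..., '-3 months') computes the lower bound of the window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE milestones (
    milestone_id INT, project_id INT, milestone_name TEXT, milestone_date TEXT)""")
conn.executemany("INSERT INTO milestones VALUES (?, ?, ?, ?)",
                 [(1, 1, 'Milestone 1', '2023-02-15'), (2, 1, 'Milestone 2', '2023-04-05'),
                  (3, 1, 'Milestone 3', '2023-07-01'), (4, 2, 'Milestone 1', '2023-03-10'),
                  (5, 2, 'Milestone 2', '2023-06-15')])

# Milestones after 2023-04-15 (three months before the fixed "today")
rows = conn.execute("""
    SELECT project_id, milestone_name, milestone_date
    FROM milestones
    WHERE milestone_date > date('2023-07-15', '-3 months')
    ORDER BY milestone_date
""").fetchall()
print(rows)  # [(2, 'Milestone 2', '2023-06-15'), (1, 'Milestone 3', '2023-07-01')]
```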

Key Learnings from These Questions:


• Recursive CTEs are extremely useful for navigating hierarchical data structures, cycle
detection, and even for date manipulations.
• Handling project hierarchies and self-referencing data (like parent-child relationships)
with recursive queries.
• Date-based filtering and date range manipulation in recursive queries for time-sensitive
data analysis, such as project milestones.

Dell
• Q.701
Question
Identifying Top Purchasing Customers
Dell needs to identify the top customers who have spent the most on their products over the
past year. This analysis will help the company identify its "power users" for special offers
and future product development.
Explanation
The task requires calculating the total amount spent by each customer over the past year. This
involves joining the orders and products tables, filtering orders made in the last year,
summing the total spent by each user, and retrieving the top 5 users based on the total spent.
Datasets and SQL Schemas
• - Orders table creation
CREATE TABLE orders (
order_id INT,
user_id INT,
order_date DATE,
product_id INT
);
• - Products table creation
CREATE TABLE products (
product_id INT,
product_name VARCHAR(255),
price_usd DECIMAL(10, 2)
);
• - Insert sample data into orders table
INSERT INTO orders (order_id, user_id, order_date, product_id)
VALUES
(2001, 456, '2021-04-09', 106),
(2002, 789, '2021-05-10', 105),
(2003, 456, '2021-06-15', 107),
(2004, 123, '2021-07-20', 105),
(2005, 789, '2021-08-25', 106);
• - Insert sample data into products table
INSERT INTO products (product_id, product_name, price_usd)
VALUES
(105, 'Inspiron Laptop', 500),
(106, 'Alienware Gaming Desktop', 2000),
(107, 'Dell Monitor', 200);

Learnings
• JOIN operation to combine data from multiple tables
• DATE functions for filtering records based on time (e.g., the last year)
• Aggregation with SUM() to calculate total spending
• Grouping by user to get a total for each customer
• Ordering and LIMIT for retrieving top results
Solutions
• - PostgreSQL solution
SELECT o.user_id, SUM(p.price_usd) AS total_spent
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 year' -- rolling last year, matching the MySQL version
GROUP BY o.user_id
ORDER BY total_spent DESC
LIMIT 5;
• - MySQL solution
SELECT o.user_id, SUM(p.price_usd) AS total_spent
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE o.order_date >= CURDATE() - INTERVAL 1 YEAR
GROUP BY o.user_id
ORDER BY total_spent DESC
LIMIT 5;
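As a quick sanity check, the join/aggregate logic of this answer can be reproduced on an in-memory SQLite database from Python. This is a sketch: SQLite's dialect differs from PostgreSQL/MySQL, and the one-year date filter is dropped here because the sample orders are all from 2021.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INT, user_id INT, order_date TEXT, product_id INT);
    CREATE TABLE products (product_id INT, product_name TEXT, price_usd REAL);
    INSERT INTO orders VALUES
        (2001, 456, '2021-04-09', 106), (2002, 789, '2021-05-10', 105),
        (2003, 456, '2021-06-15', 107), (2004, 123, '2021-07-20', 105),
        (2005, 789, '2021-08-25', 106);
    INSERT INTO products VALUES
        (105, 'Inspiron Laptop', 500), (106, 'Alienware Gaming Desktop', 2000),
        (107, 'Dell Monitor', 200);
""")

# Same join, SUM, GROUP BY, ORDER BY and LIMIT as the answer, minus the date filter.
top = conn.execute("""
    SELECT o.user_id, SUM(p.price_usd) AS total_spent
    FROM orders o
    JOIN products p ON o.product_id = p.product_id
    GROUP BY o.user_id
    ORDER BY total_spent DESC
    LIMIT 5
""").fetchall()
print(top)  # [(789, 2500.0), (456, 2200.0), (123, 500.0)]
```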
• Q.702
Question
Average Rating Per Month for Each Product
As a data analyst at Dell, you are tasked with analyzing customer reviews on each product.
Write a SQL query to calculate the average rating ("stars") that each product received per
month.
Explanation
This task requires extracting the month from the review submission date and calculating the
average rating for each product in each month. The query groups the data by both month and
product ID to calculate the average for each combination.
Datasets and SQL Schemas
• - Reviews table creation
CREATE TABLE reviews (
review_id INT,
user_id INT,
submit_date DATE,
product_id INT,
stars INT
);
• - Insert sample data into reviews table
INSERT INTO reviews (review_id, user_id, submit_date, product_id, stars)
VALUES
(6171, 123, '2020-06-08', 50001, 4),
(7802, 265, '2020-06-10', 69852, 4),
(5293, 362, '2020-06-18', 50001, 3),
(6352, 192, '2020-07-26', 69852, 3),
(4517, 981, '2020-07-05', 69852, 2);

Learnings
• EXTRACT() function to extract specific date parts (e.g., month)
• AVG() function to calculate the average of numerical values
• GROUP BY to aggregate data by specific fields (month and product_id)
• ORDER BY for sorting results
Solutions
• - PostgreSQL solution
SELECT
EXTRACT(MONTH FROM submit_date) AS month,
product_id,
AVG(stars) AS avg_stars
FROM
reviews
GROUP BY
month, product_id
ORDER BY
product_id, month;
• - MySQL solution
SELECT
MONTH(submit_date) AS month,
product_id,
AVG(stars) AS avg_stars
FROM
reviews
GROUP BY
month, product_id
ORDER BY
product_id, month;
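The same monthly grouping can be tried on SQLite, where strftime('%m', ...) plays the role of EXTRACT(MONTH FROM ...) / MONTH(...). A sketch using the sample rows above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reviews (review_id INT, user_id INT, submit_date TEXT,
                          product_id INT, stars INT);
    INSERT INTO reviews VALUES
        (6171, 123, '2020-06-08', 50001, 4), (7802, 265, '2020-06-10', 69852, 4),
        (5293, 362, '2020-06-18', 50001, 3), (6352, 192, '2020-07-26', 69852, 3),
        (4517, 981, '2020-07-05', 69852, 2);
""")

# Group by (month, product) and average the star ratings per bucket.
rows = conn.execute("""
    SELECT strftime('%m', submit_date) AS month, product_id, AVG(stars) AS avg_stars
    FROM reviews
    GROUP BY month, product_id
    ORDER BY product_id, month
""").fetchall()
print(rows)  # [('06', 50001, 3.5), ('06', 69852, 4.0), ('07', 69852, 2.5)]
```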
• Q.703
Product Sales by Category
Question
Dell wants to analyze the sales performance of its products by category. Write a SQL query
to calculate the total sales for each product category in the past quarter, including the total
revenue generated.
Explanation
You need to join the orders, products, and categories tables to calculate the total sales
for each category over the past quarter. The query should filter orders made in the last
quarter, group the results by category, and sum up the total revenue based on product prices.
Datasets and SQL Schemas
• - Orders table creation
CREATE TABLE orders (
order_id INT,
user_id INT,
order_date DATE,
product_id INT,
quantity INT
);
• - Products table creation
CREATE TABLE products (
product_id INT,
product_name VARCHAR(255),
price_usd DECIMAL(10, 2),
category_id INT
);
• - Categories table creation
CREATE TABLE categories (
category_id INT,
category_name VARCHAR(255)
);
• - Insert sample data into orders table
INSERT INTO orders (order_id, user_id, order_date, product_id, quantity)
VALUES
(1001, 123, '2023-09-12', 20001, 2),
(1002, 456, '2023-09-15', 20002, 1),
(1003, 789, '2023-09-25', 20001, 3);
• - Insert sample data into products table
INSERT INTO products (product_id, product_name, price_usd, category_id)
VALUES
(20001, 'Alienware Laptop', 1500, 1),
(20002, 'XPS Desktop', 1200, 2);
• - Insert sample data into categories table
INSERT INTO categories (category_id, category_name)
VALUES
(1, 'Laptops'),
(2, 'Desktops');

Learnings
• JOIN between multiple tables (orders, products, categories)
• DATE functions for filtering the past quarter's data
• Aggregation with SUM() and GROUP BY
• Filtering based on date and product category


Solutions
• - PostgreSQL solution
SELECT c.category_name, SUM(p.price_usd * o.quantity) AS total_revenue
FROM orders o
JOIN products p ON o.product_id = p.product_id
JOIN categories c ON p.category_id = c.category_id
WHERE o.order_date >= DATE_TRUNC('quarter', CURRENT_DATE) - INTERVAL '1 quarter'
GROUP BY c.category_name
ORDER BY total_revenue DESC;
• - MySQL solution
SELECT c.category_name, SUM(p.price_usd * o.quantity) AS total_revenue
FROM orders o
JOIN products p ON o.product_id = p.product_id
JOIN categories c ON p.category_id = c.category_id
WHERE o.order_date >= CURDATE() - INTERVAL 3 MONTH -- rolling 3 months; MySQL has no DATE_TRUNC('quarter')
GROUP BY c.category_name
ORDER BY total_revenue DESC;
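The three-table join and revenue aggregation can be verified on SQLite with the sample rows above. A sketch, with the past-quarter date filter dropped since all sample orders fall in September 2023:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INT, user_id INT, order_date TEXT,
                         product_id INT, quantity INT);
    CREATE TABLE products (product_id INT, product_name TEXT,
                           price_usd REAL, category_id INT);
    CREATE TABLE categories (category_id INT, category_name TEXT);
    INSERT INTO orders VALUES
        (1001, 123, '2023-09-12', 20001, 2), (1002, 456, '2023-09-15', 20002, 1),
        (1003, 789, '2023-09-25', 20001, 3);
    INSERT INTO products VALUES
        (20001, 'Alienware Laptop', 1500, 1), (20002, 'XPS Desktop', 1200, 2);
    INSERT INTO categories VALUES (1, 'Laptops'), (2, 'Desktops');
""")

# Revenue per category = SUM(price * quantity) over the joined rows.
revenue = conn.execute("""
    SELECT c.category_name, SUM(p.price_usd * o.quantity) AS total_revenue
    FROM orders o
    JOIN products p ON o.product_id = p.product_id
    JOIN categories c ON p.category_id = c.category_id
    GROUP BY c.category_name
    ORDER BY total_revenue DESC
""").fetchall()
print(revenue)  # [('Laptops', 7500.0), ('Desktops', 1200.0)]
```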
• Q.704
Customer Order Frequency
Question
Dell wants to understand how often its customers place orders for products. Write a SQL
query to calculate the number of orders placed by each customer in the last 6 months.
Explanation
You need to count the number of orders placed by each customer in the last 6 months by
filtering orders based on the date and grouping by customer ID.
Datasets and SQL Schemas
• - Orders table creation
CREATE TABLE orders (
order_id INT,
user_id INT,
order_date DATE,
product_id INT
);
• - Insert sample data into orders table
INSERT INTO orders (order_id, user_id, order_date, product_id)
VALUES
(2001, 123, '2023-06-10', 1001),
(2002, 456, '2023-07-14', 1002),
(2003, 123, '2023-08-05', 1003),
(2004, 789, '2023-09-01', 1004),
(2005, 123, '2023-09-20', 1005);

Learnings
• COUNT() function for counting orders
• GROUP BY for summarizing results by customer
• DATE filtering for the last 6 months
Solutions
• - PostgreSQL solution
SELECT user_id, COUNT(order_id) AS order_count
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY user_id
ORDER BY order_count DESC;
• - MySQL solution
SELECT user_id, COUNT(order_id) AS order_count
FROM orders
WHERE order_date >= CURDATE() - INTERVAL 6 MONTH
GROUP BY user_id
ORDER BY order_count DESC;
• Q.705
Product Popularity by Region
Question
Dell wants to track the popularity of its products by region. Write a SQL query to calculate
the number of units sold for each product by region in the past month.
Explanation
You need to join the orders, products, and regions tables to calculate the number of units
sold per product by region for the past month. The query should filter orders by the last
month and group by product and region.
Datasets and SQL Schemas
• - Orders table creation
CREATE TABLE orders (
order_id INT,
user_id INT,
order_date DATE,
product_id INT,
quantity INT,
region_id INT
);
• - Products table creation
CREATE TABLE products (
product_id INT,
product_name VARCHAR(255)
);
• - Regions table creation
CREATE TABLE regions (
region_id INT,
region_name VARCHAR(255)
);
• - Insert sample data into orders table
INSERT INTO orders (order_id, user_id, order_date, product_id, quantity, region_id)
VALUES
(3001, 123, '2023-12-05', 1001, 2, 1),
(3002, 456, '2023-12-10', 1002, 3, 2),
(3003, 789, '2023-12-15', 1001, 1, 1),
(3004, 123, '2023-12-20', 1003, 4, 3);
• - Insert sample data into products table
INSERT INTO products (product_id, product_name)
VALUES
(1001, 'Alienware Laptop'),
(1002, 'XPS Desktop'),
(1003, 'Inspiron Laptop');
• - Insert sample data into regions table
INSERT INTO regions (region_id, region_name)
VALUES
(1, 'North America'),
(2, 'Europe'),
(3, 'Asia');

Learnings
• JOIN across multiple tables (orders, products, regions)
• Aggregation for counting units sold
• DATE filtering for tracking sales in the last month
• GROUP BY and COUNT() to summarize data
Solutions
• - PostgreSQL solution
SELECT p.product_name, r.region_name, SUM(o.quantity) AS total_units_sold
FROM orders o
JOIN products p ON o.product_id = p.product_id
JOIN regions r ON o.region_id = r.region_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 month' -- rolling last month, matching the MySQL version
GROUP BY p.product_name, r.region_name
ORDER BY total_units_sold DESC;
• - MySQL solution
SELECT p.product_name, r.region_name, SUM(o.quantity) AS total_units_sold
FROM orders o
JOIN products p ON o.product_id = p.product_id
JOIN regions r ON o.region_id = r.region_id
WHERE o.order_date >= CURDATE() - INTERVAL 1 MONTH
GROUP BY p.product_name, r.region_name
ORDER BY total_units_sold DESC;
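A SQLite sketch of the units-by-region aggregation, using the sample rows above. The date filter is dropped (the sample orders are from December 2023), and a second sort key on product_name is added so tied totals come back in a deterministic order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INT, user_id INT, order_date TEXT,
                         product_id INT, quantity INT, region_id INT);
    CREATE TABLE products (product_id INT, product_name TEXT);
    CREATE TABLE regions (region_id INT, region_name TEXT);
    INSERT INTO orders VALUES
        (3001, 123, '2023-12-05', 1001, 2, 1), (3002, 456, '2023-12-10', 1002, 3, 2),
        (3003, 789, '2023-12-15', 1001, 1, 1), (3004, 123, '2023-12-20', 1003, 4, 3);
    INSERT INTO products VALUES
        (1001, 'Alienware Laptop'), (1002, 'XPS Desktop'), (1003, 'Inspiron Laptop');
    INSERT INTO regions VALUES (1, 'North America'), (2, 'Europe'), (3, 'Asia');
""")

# Units sold per (product, region), highest totals first.
rows = conn.execute("""
    SELECT p.product_name, r.region_name, SUM(o.quantity) AS total_units_sold
    FROM orders o
    JOIN products p ON o.product_id = p.product_id
    JOIN regions r ON o.region_id = r.region_id
    GROUP BY p.product_name, r.region_name
    ORDER BY total_units_sold DESC, p.product_name
""").fetchall()
print(rows)
```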

• Q.706
Count of Complaints by Product
Question
Write a SQL query to count the number of complaints for each product.
Explanation
This task involves calculating the total number of complaints for each product by grouping
the data by product_id and counting the occurrences of complaints.
Datasets and SQL Schemas
• - Complaints table creation
CREATE TABLE complaints (
complaint_id INT,
user_id INT,
product_id INT,
complaint_date DATE,
complaint_text VARCHAR(255)
);
• - Insert sample data into complaints table
INSERT INTO complaints (complaint_id, user_id, product_id, complaint_date, complaint_text)
VALUES
(101, 123, 50001, '2023-08-10', 'Broken screen'),
(102, 456, 50002, '2023-09-05', 'Not working as expected'),
(103, 789, 50001, '2023-09-12', 'Battery draining too fast'),
(104, 101, 50001, '2023-09-15', 'Overheating issue');

Learnings
• COUNT() for counting occurrences
• GROUP BY to aggregate data by product
• Basic filtering without advanced logic
Solutions
• - PostgreSQL solution
SELECT product_id, COUNT(*) AS complaint_count
FROM complaints
GROUP BY product_id
ORDER BY complaint_count DESC;


• - MySQL solution
SELECT product_id, COUNT(*) AS complaint_count
FROM complaints
GROUP BY product_id
ORDER BY complaint_count DESC;

• Q.707
Average Rating of Customer Feedback
Question
Write a SQL query to calculate the average rating given by customers for their feedback.
Explanation
You need to calculate the average of the rating column for each feedback entry. The query
should use the AVG() function to compute the average.
Datasets and SQL Schemas
• - Feedback table creation
CREATE TABLE feedback (
feedback_id INT,
user_id INT,
product_id INT,
feedback_date DATE,
rating INT
);
• - Insert sample data into feedback table
INSERT INTO feedback (feedback_id, user_id, product_id, feedback_date, rating)
VALUES
(201, 123, 50001, '2023-09-10', 4),
(202, 456, 50002, '2023-09-15', 5),
(203, 789, 50001, '2023-09-20', 3),
(204, 101, 50001, '2023-09-25', 2);

Learnings
• AVG() to calculate the average value of a column
• GROUP BY to compute averages by product or customer
• Filtering based on feedback dates or other criteria
Solutions
• - PostgreSQL solution
SELECT product_id, AVG(rating) AS avg_rating
FROM feedback
GROUP BY product_id
ORDER BY avg_rating DESC;
• - MySQL solution
SELECT product_id, AVG(rating) AS avg_rating
FROM feedback
GROUP BY product_id
ORDER BY avg_rating DESC;
• Q.708
Number of Service Requests in the Last 30 Days
Question
Write a SQL query to calculate the number of service requests made in the last 30 days.
Explanation
You need to filter the service_requests table to count how many requests were made in the
last 30 days, using the current date.
Datasets and SQL Schemas
• - Service Requests table creation
CREATE TABLE service_requests (
request_id INT,
user_id INT,
service_type VARCHAR(100),
request_date DATE,
request_status VARCHAR(50)
);
• - Insert sample data into service_requests table
INSERT INTO service_requests (request_id, user_id, service_type, request_date, request_status)
VALUES
(301, 123, 'Warranty', '2023-09-10', 'Completed'),
(302, 456, 'Technical Support', '2023-09-05', 'Pending'),
(303, 789, 'Return', '2023-08-28', 'Completed'),
(304, 101, 'Warranty', '2023-09-20', 'Pending');

Learnings
• COUNT() to count rows
• DATE filtering using CURRENT_DATE
• Basic date arithmetic for filtering recent entries
Solutions
• - PostgreSQL solution
SELECT COUNT(*) AS service_request_count
FROM service_requests
WHERE request_date >= CURRENT_DATE - INTERVAL '30 days';
• - MySQL solution
SELECT COUNT(*) AS service_request_count
FROM service_requests
WHERE request_date >= CURDATE() - INTERVAL 30 DAY;
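The 30-day window can be checked on SQLite, whose date() function takes modifiers like '-30 day'. In this sketch "today" is pinned to a fixed as-of date so the result is reproducible against the 2023 sample rows; against a live table you would use date('now', '-30 day') instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE service_requests (request_id INT, user_id INT, service_type TEXT,
                                   request_date TEXT, request_status TEXT);
    INSERT INTO service_requests VALUES
        (301, 123, 'Warranty', '2023-09-10', 'Completed'),
        (302, 456, 'Technical Support', '2023-09-05', 'Pending'),
        (303, 789, 'Return', '2023-08-28', 'Completed'),
        (304, 101, 'Warranty', '2023-09-20', 'Pending');
""")

# Count requests filed in the 30 days ending on the pinned as-of date.
asof = '2023-09-25'
count = conn.execute(
    "SELECT COUNT(*) FROM service_requests WHERE request_date >= date(?, '-30 day')",
    (asof,),
).fetchone()[0]
print(count)  # 4 -- every sample request falls on or after 2023-08-26
```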
• Q.709
Identify Dell Products with Replacement Requests Within 6 Months of Purchase
Question
Write a SQL query to identify the Dell products where customers requested a replacement
within 6 months of their purchase.
Explanation
You need to find the products for which replacement requests were made within 6 months of
the purchase date. This requires joining the orders table with the service_requests table,
filtering the data based on the request_type (assuming "Replacement" is a type) and
ensuring the request was made within 6 months of the order date.
Datasets and SQL Schemas
• - Orders table creation
CREATE TABLE orders (
order_id INT,
user_id INT,
order_date DATE,
product_id INT,
quantity INT
);
• - Service Requests table creation
CREATE TABLE service_requests (
request_id INT,
user_id INT,
product_id INT,
request_date DATE,
request_type VARCHAR(100) -- 'Replacement', 'Warranty', etc.
);
• - Insert sample data into orders table
INSERT INTO orders (order_id, user_id, order_date, product_id, quantity)
VALUES
(1001, 123, '2023-03-15', 50001, 2),
(1002, 456, '2023-04-10', 50002, 1),
(1003, 789, '2023-06-01', 50003, 3),
(1004, 101, '2023-07-20', 50001, 1);
• - Insert sample data into service_requests table
INSERT INTO service_requests (request_id, user_id, product_id, request_date, request_type)
VALUES
(2001, 123, 50001, '2023-09-10', 'Replacement'),
(2002, 456, 50002, '2023-06-05', 'Replacement'),
(2003, 789, 50003, '2023-07-10', 'Technical Support'),
(2004, 101, 50001, '2023-08-15', 'Replacement');

Learnings
• JOIN operation to combine data from multiple tables
• DATE filtering to compare dates (e.g., 6 months from purchase date)
• Conditional filtering based on the request type (Replacement)
Solutions
• - PostgreSQL solution
SELECT DISTINCT o.product_id, p.product_name
FROM orders o
JOIN service_requests sr ON o.user_id = sr.user_id AND o.product_id = sr.product_id
JOIN products p ON o.product_id = p.product_id
WHERE sr.request_type = 'Replacement'
AND sr.request_date BETWEEN o.order_date AND o.order_date + INTERVAL '6 months'
ORDER BY o.product_id;
• - MySQL solution
SELECT DISTINCT o.product_id, p.product_name
FROM orders o
JOIN service_requests sr ON o.user_id = sr.user_id AND o.product_id = sr.product_id
JOIN products p ON o.product_id = p.product_id
WHERE sr.request_type = 'Replacement'
AND sr.request_date BETWEEN o.order_date AND DATE_ADD(o.order_date, INTERVAL 6 MONTH)
ORDER BY o.product_id;

Explanation of the Query


• JOINs: We're joining the orders table with the service_requests table to correlate
orders with requests, and with the products table to get the product name.
• Filtering: We filter for request_type = 'Replacement' to ensure only replacement
requests are considered. We also check that the request date is within 6 months of the order
date.
• DATE Functions: We use INTERVAL in PostgreSQL and DATE_ADD() in MySQL to
calculate the 6-month window for the request date.
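A SQLite sketch of the same idea, using the sample rows above. It joins on both user_id and product_id so a request is matched to the requesting customer's own purchase of that product, and uses BETWEEN so only requests filed on or after the order date (and within 6 months of it) count; date(..., '+6 months') is SQLite's equivalent of the INTERVAL arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INT, user_id INT, order_date TEXT,
                         product_id INT, quantity INT);
    CREATE TABLE service_requests (request_id INT, user_id INT, product_id INT,
                                   request_date TEXT, request_type TEXT);
    INSERT INTO orders VALUES
        (1001, 123, '2023-03-15', 50001, 2), (1002, 456, '2023-04-10', 50002, 1),
        (1003, 789, '2023-06-01', 50003, 3), (1004, 101, '2023-07-20', 50001, 1);
    INSERT INTO service_requests VALUES
        (2001, 123, 50001, '2023-09-10', 'Replacement'),
        (2002, 456, 50002, '2023-06-05', 'Replacement'),
        (2003, 789, 50003, '2023-07-10', 'Technical Support'),
        (2004, 101, 50001, '2023-08-15', 'Replacement');
""")

# Products with a replacement requested within 6 months of that customer's order.
products = conn.execute("""
    SELECT DISTINCT o.product_id
    FROM orders o
    JOIN service_requests sr
      ON o.user_id = sr.user_id AND o.product_id = sr.product_id
    WHERE sr.request_type = 'Replacement'
      AND sr.request_date BETWEEN o.order_date AND date(o.order_date, '+6 months')
    ORDER BY o.product_id
""").fetchall()
print(products)  # [(50001,), (50002,)]
```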
• Q.710
Total Sales for Each Product in Each Region
Question
Write a SQL query to find out the total sales (in terms of the number of units) for each
product in each region.
Explanation
This task requires joining the Sales, Products, and Regions tables. You need to group the
data by product name and region name, and then calculate the total number of units sold for
each product in each region. The SUM() function will be used to get the total number of units
sold.
Datasets and SQL Schemas
• - Products table creation
CREATE TABLE Products (
product_id INT,
name VARCHAR(255),
price DECIMAL(10, 2),
release_date DATE
);
• - Insert sample data into Products table
INSERT INTO Products (product_id, name, price, release_date)
VALUES
(1, 'Laptop A', 1200, '2020-01-01'),
(2, 'Laptop B', 1500, '2020-07-01'),
(3, 'Desktop A', 1700, '2021-02-01');
• - Regions table creation
CREATE TABLE Regions (
region_id INT,
name VARCHAR(255)
);
• - Insert sample data into Regions table
INSERT INTO Regions (region_id, name)
VALUES
(1, 'North America'),
(2, 'Europe'),
(3, 'Asia-Pacific');
• - Sales table creation
CREATE TABLE Sales (
sales_id INT,
product_id INT,
region_id INT,
sale_date DATE,
units_sold INT
);
• - Insert sample data into Sales table
INSERT INTO Sales (sales_id, product_id, region_id, sale_date, units_sold)
VALUES
(1, 1, 1, '2021-02-01', 3),
(2, 2, 2, '2021-03-01', 5),
(3, 3, 3, '2021-04-01', 2),
(4, 1, 2, '2021-05-01', 4),
(5, 2, 3, '2021-06-01', 1);

Learnings
• JOIN between multiple tables (Sales, Products, and Regions)
• GROUP BY to aggregate data by product and region
• SUM() to calculate the total units sold
• ORDER BY for sorting results by product and region
Solutions
• - PostgreSQL solution
SELECT P.name AS "Product", R.name AS "Region", SUM(S.units_sold) AS "Total Units Sold"
FROM Sales S
JOIN Products P ON S.product_id = P.product_id
JOIN Regions R ON S.region_id = R.region_id
GROUP BY P.name, R.name
ORDER BY P.name, R.name;
• - MySQL solution
SELECT P.name AS "Product", R.name AS "Region", SUM(S.units_sold) AS "Total Units Sold"
FROM Sales S
JOIN Products P ON S.product_id = P.product_id
JOIN Regions R ON S.region_id = R.region_id
GROUP BY P.name, R.name
ORDER BY P.name, R.name;
• Q.711
Filter Dell Customers Based on Their Purchase and Support Experiences
Question
Write a SQL query to identify customers who have purchased laptops in the last year, rated
their purchase experience 4 or higher, and have not had any major tech support issues
(severity level 4 or 5) in the last six months.
Explanation
To identify the target customers, you need to:
• Filter customers who purchased laptops within the last year.
• Ensure that their purchase rating is 4 or higher.
• Ensure that there are no major tech support issues (i.e., issue level 4 or 5) in the last 6
months.
• Use JOIN operations between the purchase and tech_support tables to link the customer
data with their support history.
• Apply proper date filtering using SQL functions to restrict purchases to the last year and
support issues to the last six months.
Datasets and SQL Schemas
• - Purchase table creation
CREATE TABLE purchase (
purchase_id INT,
customer_id INT,
purchase_date DATE,
product_id VARCHAR(50),
rating INT
);
• - Insert sample data into purchase table
INSERT INTO purchase (purchase_id, customer_id, purchase_date, product_id, rating)
VALUES
(1001, 123, '2022-02-15', 'D50001', 5),
(1002, 456, '2022-05-20', 'D60052', 3),
(1003, 789, '2022-03-30', 'D70051', 4),
(1004, 123, '2022-06-18', 'D80050', 4),
(1005, 789, '2022-07-15', 'D90052', 4);
• - Tech Support table creation
CREATE TABLE tech_support (
support_id INT,
customer_id INT,
support_date DATE,
issue_level INT
);
• - Insert sample data into tech_support table
INSERT INTO tech_support (support_id, customer_id, support_date, issue_level)
VALUES
(2001, 123, '2022-05-18', 3),
(2002, 456, '2022-06-21', 5),
(2003, 789, '2022-02-20', 2),
(2004, 123, '2022-03-25', 1),
(2005, 789, '2022-07-16', 4);

Learnings
• DATE filtering to get recent records (last year and last 6 months)
• JOINs between purchase and tech_support based on customer_id
• GROUP BY and HAVING to exclude customers with major support issues
• Filtering based on ratings and issue levels
Solutions
• - PostgreSQL solution
SELECT p.customer_id
FROM purchase p
LEFT JOIN tech_support ts ON p.customer_id = ts.customer_id
AND ts.support_date >= CURRENT_DATE - INTERVAL '6 months'
AND ts.issue_level >= 4 -- filtering for major issues
WHERE p.purchase_date >= CURRENT_DATE - INTERVAL '1 year'
AND p.rating >= 4 -- filtering for good ratings
GROUP BY p.customer_id
HAVING COUNT(ts.support_id) = 0; -- ensuring no major tech support issues
• - MySQL solution
SELECT p.customer_id
FROM purchase p
LEFT JOIN tech_support ts ON p.customer_id = ts.customer_id
AND ts.support_date >= CURDATE() - INTERVAL 6 MONTH
AND ts.issue_level >= 4 -- filtering for major issues
WHERE p.purchase_date >= CURDATE() - INTERVAL 1 YEAR
AND p.rating >= 4 -- filtering for good ratings
GROUP BY p.customer_id
HAVING COUNT(ts.support_id) = 0; -- ensuring no major tech support issues

Explanation of the Query


• LEFT JOIN: We use LEFT JOIN to include customers even if they haven't had any
support requests. We filter the support requests within the last 6 months and for major issues
(issue level >= 4).
• Filtering on purchase_date: We ensure the purchase was made in the last year using
WHERE p.purchase_date >= CURRENT_DATE - INTERVAL '1 year'.
• Filtering on rating: We only include customers who rated their purchase experience 4 or
higher using AND p.rating >= 4.
• HAVING COUNT(ts.support_id) = 0: This condition ensures that only customers who
have no major support issues in the last 6 months are included. It counts the support requests
for each customer and filters out customers who have major issues.
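The LEFT JOIN + HAVING COUNT(...) = 0 pattern described above can be demonstrated on SQLite with the sample rows. This sketch omits the two date windows, since all sample data is from 2022; only the anti-join logic is exercised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchase (purchase_id INT, customer_id INT, purchase_date TEXT,
                           product_id TEXT, rating INT);
    CREATE TABLE tech_support (support_id INT, customer_id INT,
                               support_date TEXT, issue_level INT);
    INSERT INTO purchase VALUES
        (1001, 123, '2022-02-15', 'D50001', 5), (1002, 456, '2022-05-20', 'D60052', 3),
        (1003, 789, '2022-03-30', 'D70051', 4), (1004, 123, '2022-06-18', 'D80050', 4),
        (1005, 789, '2022-07-15', 'D90052', 4);
    INSERT INTO tech_support VALUES
        (2001, 123, '2022-05-18', 3), (2002, 456, '2022-06-21', 5),
        (2003, 789, '2022-02-20', 2), (2004, 123, '2022-03-25', 1),
        (2005, 789, '2022-07-16', 4);
""")

# LEFT JOIN only onto *major* issues; customers with no such match keep NULLs,
# so COUNT(ts.support_id) = 0 identifies the issue-free customers.
good = conn.execute("""
    SELECT p.customer_id
    FROM purchase p
    LEFT JOIN tech_support ts
      ON p.customer_id = ts.customer_id AND ts.issue_level >= 4
    WHERE p.rating >= 4
    GROUP BY p.customer_id
    HAVING COUNT(ts.support_id) = 0
""").fetchall()
print(good)  # [(123,)] -- 789 is excluded by a level-4 issue, 456 by the low rating
```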
• Q.712
Calculate Average Sales of Dell Laptops per Month
Question
Write a SQL query to calculate the average number of units sold each month for each laptop
model.
Explanation
You need to calculate the monthly average sales for each laptop model. To achieve this, you
will:
• Group the data by the month and laptop model.
• Use the AVG() function to compute the average units sold per month for each model.
• Ensure the month is extracted correctly from the sale date using the date_trunc()
function (in PostgreSQL) or other date extraction methods in MySQL.
Datasets and SQL Schemas
• - Sales table creation
CREATE TABLE sales (
id INT,
sale_date DATE,
laptop_model VARCHAR(255),
units_sold INT
);
• - Insert sample data into sales table
INSERT INTO sales (id, sale_date, laptop_model, units_sold)
VALUES
(1, '2022-05-01', 'Dell Inspiron', 100),
(2, '2022-05-01', 'Dell XPS', 150),
(3, '2022-05-02', 'Dell Inspiron', 120),
(4, '2022-05-02', 'Dell XPS', 140),
(5, '2022-05-03', 'Dell Inspiron', 110),
(6, '2022-05-03', 'Dell XPS', 160);

Learnings
• DATE extraction to group by month
• AVG() to calculate the average sales per group
• GROUP BY to group data by month and laptop model
• ORDER BY to sort the results by month and laptop model
Solutions
• - PostgreSQL solution
SELECT date_trunc('month', sale_date)::date AS Month,
laptop_model,
AVG(units_sold) AS average_units_sold
FROM sales
GROUP BY date_trunc('month', sale_date)::date, laptop_model
ORDER BY Month, laptop_model;
• - MySQL solution
SELECT DATE_FORMAT(sale_date, '%Y-%m-01') AS Month,
laptop_model,
AVG(units_sold) AS average_units_sold
FROM sales
GROUP BY DATE_FORMAT(sale_date, '%Y-%m-01'), laptop_model
ORDER BY Month, laptop_model;
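The month bucketing can be reproduced on SQLite, where strftime('%Y-%m', ...) stands in for date_trunc / DATE_FORMAT. A sketch with the sample rows (all from May 2022):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INT, sale_date TEXT, laptop_model TEXT, units_sold INT);
    INSERT INTO sales VALUES
        (1, '2022-05-01', 'Dell Inspiron', 100), (2, '2022-05-01', 'Dell XPS', 150),
        (3, '2022-05-02', 'Dell Inspiron', 120), (4, '2022-05-02', 'Dell XPS', 140),
        (5, '2022-05-03', 'Dell Inspiron', 110), (6, '2022-05-03', 'Dell XPS', 160);
""")

# Average daily units sold per (month, model).
rows = conn.execute("""
    SELECT strftime('%Y-%m', sale_date) AS month, laptop_model,
           AVG(units_sold) AS average_units_sold
    FROM sales
    GROUP BY month, laptop_model
    ORDER BY month, laptop_model
""").fetchall()
print(rows)  # [('2022-05', 'Dell Inspiron', 110.0), ('2022-05', 'Dell XPS', 150.0)]
```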

Explanation of the Query


• date_trunc('month', sale_date)::date (PostgreSQL): This function truncates the
sale_date to the first day of the month, effectively grouping the sales data by month.
• DATE_FORMAT(sale_date, '%Y-%m-01') (MySQL): This function formats the
sale_date to the first day of the month (in YYYY-MM-01 format), grouping sales by month in
MySQL.
• AVG(units_sold): The AVG() function calculates the average number of units sold for
each laptop model per month.
• GROUP BY: The query groups data by the truncated sale_date (month) and laptop_model
to calculate the monthly averages for each model.
• ORDER BY: The results are sorted by the Month and laptop_model to make the output easy
to read.
• Q.713
Find the Average Customer Rating for Each Product
Question
Write a SQL query to find the average customer rating for each product based on customer
reviews.
Explanation
You need to calculate the average rating for each product by grouping the data by
product_id and then using the AVG() function on the rating column.
Datasets and SQL Schemas
• - Reviews table creation
CREATE TABLE reviews (
review_id INT,
product_id INT,
customer_id INT,
rating INT,
review_date DATE
);
• - Insert sample data into reviews table
INSERT INTO reviews (review_id, product_id, customer_id, rating, review_date)
VALUES
(1, 101, 1001, 4, '2023-01-15'),
(2, 101, 1002, 5, '2023-02-10'),
(3, 102, 1003, 3, '2023-01-20'),
(4, 103, 1004, 4, '2023-03-01'),
(5, 102, 1005, 2, '2023-02-25');

Learnings
• AVG() for calculating the average rating
• GROUP BY to aggregate by product
Solutions
• - PostgreSQL and MySQL solution
SELECT product_id, AVG(rating) AS avg_rating
FROM reviews
GROUP BY product_id
ORDER BY product_id;
• Q.714
Find Products with a Consistent Rating Over the Past Year
Question
Write a SQL query to identify products that have received a consistent customer rating over
the past year. The consistency is defined as products where the standard deviation of ratings
over the last year is less than or equal to 0.5.
Explanation
You need to:
• Calculate the standard deviation of ratings for each product within the last year.
• Filter products where the standard deviation is less than or equal to 0.5, indicating that
ratings for the product are relatively consistent.
Datasets and SQL Schemas
• - Reviews table creation
CREATE TABLE reviews (
review_id INT,
product_id INT,
customer_id INT,
rating INT,
review_date DATE
);
• - Insert sample data into reviews table
INSERT INTO reviews (review_id, product_id, customer_id, rating, review_date)
VALUES
(1, 101, 1001, 4, '2023-01-15'),
(2, 101, 1002, 4, '2023-02-10'),
(3, 101, 1003, 4, '2023-03-01'),
(4, 102, 1004, 5, '2023-01-10'),
(5, 102, 1005, 5, '2023-02-15'),
(6, 103, 1006, 3, '2023-01-22'),
(7, 103, 1007, 4, '2023-03-12');

Learnings
• STDDEV() for calculating standard deviation
• Filtering based on conditions using HAVING
• DATE filtering to get records from the last year
Solutions
• - PostgreSQL solution
SELECT product_id, STDDEV(rating) AS rating_stddev
FROM reviews
WHERE review_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY product_id
HAVING STDDEV(rating) <= 0.5
ORDER BY product_id;
• - MySQL solution (MySQL's STDDEV() is the population SD; STDDEV_SAMP() matches PostgreSQL's STDDEV)
SELECT product_id, STDDEV_SAMP(rating) AS rating_stddev
FROM reviews
WHERE review_date >= CURDATE() - INTERVAL 1 YEAR
GROUP BY product_id
HAVING STDDEV_SAMP(rating) <= 0.5
ORDER BY product_id;
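SQLite ships no STDDEV aggregate, so a Python sketch of this question pulls the ratings per product and applies statistics.stdev (the sample standard deviation, like PostgreSQL's STDDEV) instead. The one-year filter is omitted since all sample reviews are from 2023.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reviews (review_id INT, product_id INT, customer_id INT,
                          rating INT, review_date TEXT);
    INSERT INTO reviews VALUES
        (1, 101, 1001, 4, '2023-01-15'), (2, 101, 1002, 4, '2023-02-10'),
        (3, 101, 1003, 4, '2023-03-01'), (4, 102, 1004, 5, '2023-01-10'),
        (5, 102, 1005, 5, '2023-02-15'), (6, 103, 1006, 3, '2023-01-22'),
        (7, 103, 1007, 4, '2023-03-12');
""")

# Collect each product's ratings, then keep products whose sample SD <= 0.5.
ratings = {}
for product_id, rating in conn.execute("SELECT product_id, rating FROM reviews"):
    ratings.setdefault(product_id, []).append(rating)

consistent = sorted(
    pid for pid, rs in ratings.items()
    if len(rs) > 1 and statistics.stdev(rs) <= 0.5
)
print(consistent)  # [101, 102] -- product 103's ratings (3, 4) spread too wide
```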
• Q.715
Count the Number of Products with Each Rating
Question
Write a SQL query to count how many products have received each rating (1 to 5) from
customers.
Explanation
You need to count the occurrences of each rating for products in the reviews table. Use the
GROUP BY clause to group by rating and then use the COUNT() function to count how many
times each rating has been given.
Datasets and SQL Schemas
• - Reviews table creation
CREATE TABLE reviews (
review_id INT,
product_id INT,
customer_id INT,
rating INT,
review_date DATE
);
• - Insert sample data into reviews table
INSERT INTO reviews (review_id, product_id, customer_id, rating, review_date)
VALUES
(1, 101, 1001, 4, '2023-01-15'),
(2, 101, 1002, 5, '2023-02-10'),
(3, 102, 1003, 3, '2023-01-20'),
(4, 103, 1004, 5, '2023-03-01'),
(5, 102, 1005, 2, '2023-02-25'),
(6, 103, 1006, 4, '2023-03-15');

Learnings
• COUNT() to count occurrences
• GROUP BY to group by product rating
Solutions
• - PostgreSQL and MySQL solution
SELECT rating, COUNT(*) AS product_count
FROM reviews
GROUP BY rating
ORDER BY rating;
• Q.716
Identify Products with the Lowest Average Rating
Question
Write a SQL query to find the products with the lowest average customer rating (rating less
than 3).
Explanation
You need to calculate the average rating for each product, and then filter products that have
an average rating less than 3.
Datasets and SQL Schemas
• - Reviews table creation
CREATE TABLE reviews (
review_id INT,
product_id INT,
customer_id INT,
rating INT,
review_date DATE
);
• - Insert sample data into reviews table
INSERT INTO reviews (review_id, product_id, customer_id, rating, review_date)
VALUES
(1, 101, 1001, 4, '2023-01-15'),
(2, 101, 1002, 5, '2023-02-10'),
(3, 102, 1003, 2, '2023-01-20'),
(4, 103, 1004, 3, '2023-03-01'),
(5, 102, 1005, 1, '2023-02-25');

Learnings
• AVG() for calculating average ratings
• HAVING clause to filter products with a rating less than 3
Solutions
• - PostgreSQL and MySQL solution
SELECT product_id, AVG(rating) AS avg_rating
FROM reviews
GROUP BY product_id
HAVING AVG(rating) < 3
ORDER BY avg_rating;
• Q.717
Find the Product with the Highest Variance in Customer Ratings Over Time
Question
Write a SQL query to find the product that has the highest variance in customer ratings over
time. Variance measures how spread out the ratings are, and a high variance means that
customer ratings for the product are highly inconsistent.
Explanation
You need to:
• Calculate the variance of ratings for each product.
• Use the VAR_POP() or VAR_SAMP() function (depending on the SQL dialect) to calculate
the variance of ratings for each product.
• Identify the product with the highest variance.
Datasets and SQL Schemas
• - Reviews table creation
CREATE TABLE reviews (
review_id INT,
product_id INT,
customer_id INT,
rating INT,
review_date DATE
);
• - Insert sample data into reviews table
INSERT INTO reviews (review_id, product_id, customer_id, rating, review_date)
VALUES
(1, 101, 1001, 4, '2023-01-15'),
(2, 101, 1002, 5, '2023-02-10'),
(3, 101, 1003, 3, '2023-03-01'),
(4, 102, 1004, 4, '2023-01-10'),
(5, 102, 1005, 2, '2023-02-15'),
(6, 103, 1006, 5, '2023-01-22'),
(7, 103, 1007, 1, '2023-03-12');

Learnings
• VAR_POP() or VAR_SAMP() for calculating variance
• GROUP BY to group by product
• ORDER BY to sort by the highest variance
Solutions
• - PostgreSQL and MySQL solution
SELECT product_id, VAR_POP(rating) AS rating_variance
FROM reviews
GROUP BY product_id
ORDER BY rating_variance DESC
LIMIT 1;
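As with STDDEV, SQLite has no VAR_POP aggregate, so a Python sketch fetches the ratings and uses statistics.pvariance, which computes the same population variance as VAR_POP. Run against the sample rows above:

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reviews (review_id INT, product_id INT, customer_id INT,
                          rating INT, review_date TEXT);
    INSERT INTO reviews VALUES
        (1, 101, 1001, 4, '2023-01-15'), (2, 101, 1002, 5, '2023-02-10'),
        (3, 101, 1003, 3, '2023-03-01'), (4, 102, 1004, 4, '2023-01-10'),
        (5, 102, 1005, 2, '2023-02-15'), (6, 103, 1006, 5, '2023-01-22'),
        (7, 103, 1007, 1, '2023-03-12');
""")

# Population variance per product; the max identifies the most inconsistent one.
by_product = {}
for product_id, rating in conn.execute("SELECT product_id, rating FROM reviews"):
    by_product.setdefault(product_id, []).append(rating)

variances = {pid: statistics.pvariance(rs) for pid, rs in by_product.items()}
most_inconsistent = max(variances, key=variances.get)
print(most_inconsistent, variances[most_inconsistent])  # 103, whose ratings 5 and 1 give variance 4.0
```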
• Q.718
Find the Correlation Between Product Price and Customer Rating
Question
Write a SQL query to find the correlation between the price of the products and the customer
ratings. Assume there is a products table that contains the product_id and price, and a
reviews table that contains the product_id and rating. Calculate the Pearson correlation
coefficient between the product price and the customer rating.
Explanation
You need to:
• Join the reviews and products tables on product_id.
• Calculate the Pearson correlation coefficient between price and rating. The formula for
the Pearson correlation coefficient is:

r = (n * Σxy - Σx * Σy) / sqrt((n * Σx² - (Σx)²) * (n * Σy² - (Σy)²))

Where:

• x is the product price
• y is the rating
• n is the number of pairs (product and rating)
Datasets and SQL Schemas
• - Products table creation
CREATE TABLE products (
product_id INT,
price DECIMAL(10, 2),
product_name VARCHAR(255)
);
• - Reviews table creation
CREATE TABLE reviews (
review_id INT,
product_id INT,
customer_id INT,
rating INT,
review_date DATE
);
• - Insert sample data into products table
INSERT INTO products (product_id, price, product_name)
VALUES
(101, 1000, 'Dell Inspiron'),
(102, 1500, 'Dell XPS'),
(103, 1200, 'Dell Latitude');
• - Insert sample data into reviews table
INSERT INTO reviews (review_id, product_id, customer_id, rating, review_date)
VALUES
(1, 101, 1001, 4, '2023-01-15'),
(2, 101, 1002, 5, '2023-02-10'),
(3, 102, 1003, 3, '2023-03-01'),
(4, 102, 1004, 4, '2023-01-10'),
(5, 103, 1005, 2, '2023-02-15'),
(6, 103, 1006, 3, '2023-03-01');

Learnings
• JOIN between reviews and products tables
• Pearson Correlation formula
• Aggregating and Calculating sums using SQL functions
Solutions
• - PostgreSQL solution
WITH correlation_data AS (
    SELECT p.price, r.rating
    FROM reviews r
    JOIN products p ON r.product_id = p.product_id
)
SELECT
    (COUNT(*) * SUM(price * rating) - SUM(price) * SUM(rating)) /
    (SQRT((COUNT(*) * SUM(price * price) - SUM(price) * SUM(price)) * (COUNT(*) * SUM(rating * rating) - SUM(rating) * SUM(rating)))) AS pearson_correlation
FROM correlation_data;
• - MySQL solution
WITH correlation_data AS (
SELECT p.price, r.rating
FROM reviews r
JOIN products p ON r.product_id = p.product_id
)
SELECT
(COUNT(*) * SUM(price * rating) - SUM(price) * SUM(rating)) /
    (SQRT((COUNT(*) * SUM(price * price) - SUM(price) * SUM(price)) * (COUNT(*) * SUM(rating * rating) - SUM(rating) * SUM(rating)))) AS pearson_correlation
FROM correlation_data;

Explanation of the Queries


• Pearson Correlation: This query calculates the Pearson correlation coefficient between
product price and customer ratings using an intermediate CTE (correlation_data) that
joins the reviews and products tables. The Pearson correlation is a measure of the linear
relationship between two variables.
• STDDEV(): The standard deviation function is used to find how consistent the ratings are
for each product over the past year.
• VAR_POP(): The population variance is used to calculate how much the ratings vary for
each product over time.
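The Pearson formula can also be sanity-checked in plain Python against the six joined (price, rating) pairs from the sample data; this sketch mirrors the SQL term by term:

```python
# Pure-Python check of the Q.718 Pearson formula; no database needed.
from math import sqrt

pairs = [  # (price, rating) after joining products with reviews
    (1000, 4), (1000, 5),   # product 101
    (1500, 3), (1500, 4),   # product 102
    (1200, 2), (1200, 3),   # product 103
]

n = len(pairs)
sx = sum(p for p, _ in pairs)
sy = sum(r for _, r in pairs)
sxy = sum(p * r for p, r in pairs)
sxx = sum(p * p for p, _ in pairs)
syy = sum(r * r for _, r in pairs)

r = (n * sxy - sx * sy) / sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
print(round(r, 4))  # weak negative correlation on this tiny sample
```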
• Q.719
Identify Products with the Most Frequent Negative Feedback in the Last 6 Months
Question
Write a SQL query to identify the products that have received the most frequent negative
feedback (ratings of 1 or 2) in the last 6 months.
Explanation
You need to:
• Filter the reviews with ratings of 1 or 2 (considered negative feedback).
• Filter reviews from the last 6 months.
• Group the data by product_id and count the occurrences of negative ratings for each
product.
• Order the products by the highest number of negative feedback.
Datasets and SQL Schemas
• - Reviews table creation
CREATE TABLE reviews (
review_id INT,
product_id INT,
customer_id INT,
rating INT,
review_date DATE
);
• - Insert sample data into reviews table
INSERT INTO reviews (review_id, product_id, customer_id, rating, review_date)
VALUES
(1, 101, 1001, 5, '2023-01-15'),
(2, 101, 1002, 2, '2023-02-10'),
(3, 102, 1003, 1, '2023-03-01'),
(4, 103, 1004, 4, '2023-01-10'),
(5, 102, 1005, 2, '2023-04-15'),
(6, 101, 1006, 2, '2023-05-20');

Learnings
• Filtering based on rating using WHERE
• DATE functions to filter data from the last 6 months
• COUNT() for counting occurrences
• GROUP BY to group by product_id
Solutions
• - PostgreSQL solution
SELECT product_id, COUNT(*) AS negative_feedback_count
FROM reviews
WHERE rating IN (1, 2)
  AND review_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY product_id
ORDER BY negative_feedback_count DESC;
• - MySQL solution
SELECT product_id, COUNT(*) AS negative_feedback_count
FROM reviews
WHERE rating IN (1, 2)
  AND review_date >= CURDATE() - INTERVAL 6 MONTH
GROUP BY product_id
ORDER BY negative_feedback_count DESC;
• Q.720
Track the Trend of Customer Ratings Over Time for a Specific Product
Question
Write a SQL query to track the trend of customer ratings (average rating per month) for a
specific product over the last year.
Explanation
You need to:
• Group the reviews by month for a specific product_id.
• Calculate the average rating per month.
• Filter reviews from the last year.
• Display the results in chronological order.
Datasets and SQL Schemas
• - Reviews table creation
CREATE TABLE reviews (
review_id INT,
product_id INT,
customer_id INT,
rating INT,
review_date DATE
);
• - Insert sample data into reviews table
INSERT INTO reviews (review_id, product_id, customer_id, rating, review_date)
VALUES
(1, 101, 1001, 5, '2023-01-10'),
(2, 101, 1002, 4, '2023-02-15'),
(3, 101, 1003, 3, '2023-03-01'),
(4, 101, 1004, 4, '2023-04-20'),
(5, 101, 1005, 5, '2023-05-10'),
(6, 101, 1006, 4, '2023-06-15');

Learnings
• EXTRACT(MONTH FROM date) to extract the month
• AVG() to calculate the average rating per month
• GROUP BY for monthly grouping


• DATE filtering to focus on the last year


Solutions
• - PostgreSQL solution
SELECT EXTRACT(MONTH FROM review_date) AS month,
       AVG(rating) AS avg_rating
FROM reviews
WHERE product_id = 101
  AND review_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY month
ORDER BY month;
• - MySQL solution
SELECT EXTRACT(MONTH FROM review_date) AS month,
       AVG(rating) AS avg_rating
FROM reviews
WHERE product_id = 101
  AND review_date >= CURDATE() - INTERVAL 1 YEAR
GROUP BY month
ORDER BY month;

American Express
• Q.721
Calculate Average Transaction Value per Customer

Explanation
You are asked to calculate the average transaction value for each customer in the American
Express database. The transaction data is stored in a transactions table. Calculate the
average transaction amount for each customer and order the results by the highest average
transaction value.

Datasets and SQL Schemas


Table creation:
-- transactions table creation
CREATE TABLE transactions (
transaction_id INT,
customer_id INT,
transaction_date TIMESTAMP,
amount DECIMAL(10, 2)
);

Datasets:
-- Insert data into transactions
INSERT INTO transactions (transaction_id, customer_id, transaction_date, amount)
VALUES
(1, 101, '2022-10-01 00:00:00', 150.00),
(2, 102, '2022-10-01 00:00:00', 200.00),
(3, 101, '2022-10-02 00:00:00', 50.00),
(4, 103, '2022-10-02 00:00:00', 300.00),
(5, 102, '2022-10-03 00:00:00', 75.00);

Learnings
• Aggregations: Using AVG() to calculate average values.
• GROUP BY: Grouping by customer ID to calculate per-customer statistics.
• Ordering: Sorting results by average transaction value.

Solutions
PostgreSQL Solution:
SELECT
    customer_id,
    AVG(amount) AS average_transaction_value
FROM
    transactions
GROUP BY
    customer_id
ORDER BY
    average_transaction_value DESC;

MySQL Solution:
SELECT
customer_id,
AVG(amount) AS average_transaction_value
FROM
transactions
GROUP BY
customer_id
ORDER BY
average_transaction_value DESC;
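Because this query uses only standard aggregation, it also runs unchanged on SQLite, which makes a handy local scratchpad before targeting PostgreSQL or MySQL. A sketch using Python's built-in sqlite3:

```python
# Run the Q.721 solution against the sample data in an in-memory SQLite db.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    transaction_id INT,
    customer_id INT,
    transaction_date TEXT,
    amount REAL
);
INSERT INTO transactions VALUES
 (1, 101, '2022-10-01 00:00:00', 150.00),
 (2, 102, '2022-10-01 00:00:00', 200.00),
 (3, 101, '2022-10-02 00:00:00', 50.00),
 (4, 103, '2022-10-02 00:00:00', 300.00),
 (5, 102, '2022-10-03 00:00:00', 75.00);
""")

rows = conn.execute("""
SELECT customer_id, AVG(amount) AS average_transaction_value
FROM transactions
GROUP BY customer_id
ORDER BY average_transaction_value DESC
""").fetchall()
print(rows)  # [(103, 300.0), (102, 137.5), (101, 100.0)]
```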
• Q.722
Find Top 5 Customers by Total Spending

Explanation
You are tasked with identifying the top 5 customers with the highest total spending. The
spending is recorded in the transactions table. Calculate the total amount spent by each
customer and display the top 5 customers ordered by total spending.

Datasets and SQL Schemas


Table creation:
-- transactions table creation
CREATE TABLE transactions (
transaction_id INT,
customer_id INT,
transaction_date TIMESTAMP,
amount DECIMAL(10, 2)
);

Datasets:
-- Insert data into transactions
INSERT INTO transactions (transaction_id, customer_id, transaction_date, amount)
VALUES
(1, 101, '2022-10-01 00:00:00', 250.00),
(2, 102, '2022-10-01 00:00:00', 400.00),
(3, 101, '2022-10-02 00:00:00', 150.00),
(4, 103, '2022-10-02 00:00:00', 500.00),
(5, 102, '2022-10-03 00:00:00', 200.00);

Learnings
• SUM(): Aggregating the total amount spent by each customer.
• GROUP BY: Grouping by customer to calculate total spending.
• LIMIT: Using LIMIT to restrict the result to top N records.

Solutions
PostgreSQL Solution:
SELECT
    customer_id,
    SUM(amount) AS total_spending
FROM
    transactions
GROUP BY
    customer_id
ORDER BY
    total_spending DESC
LIMIT 5;

MySQL Solution:
SELECT
customer_id,
SUM(amount) AS total_spending
FROM
transactions
GROUP BY
customer_id
ORDER BY
total_spending DESC
LIMIT 5;
• Q.723
Identify Customers Who Made More Than 3 Transactions in a Month

Explanation
You need to identify customers who have made more than three transactions in any given
month. The data is stored in the transactions table. Count the number of transactions per
customer per month and return customers who have made more than 3 transactions in a
month.

Datasets and SQL Schemas


Table creation:
-- transactions table creation
CREATE TABLE transactions (
transaction_id INT,
customer_id INT,
transaction_date TIMESTAMP,
amount DECIMAL(10, 2)
);

Datasets:
-- Insert data into transactions
INSERT INTO transactions (transaction_id, customer_id, transaction_date, amount)
VALUES
(1, 101, '2022-10-01 00:00:00', 150.00),
(2, 101, '2022-10-02 00:00:00', 200.00),
(3, 101, '2022-10-03 00:00:00', 100.00),
(4, 102, '2022-10-01 00:00:00', 50.00),
(5, 102, '2022-10-01 00:00:00', 75.00),
(6, 102, '2022-10-03 00:00:00', 100.00);

Learnings
• COUNT(): Counting transactions for each customer per month.
• EXTRACT(): Extracting the month from the transaction date.
• HAVING: Filtering groups based on aggregate conditions (transactions > 3).

Solutions
PostgreSQL Solution:
SELECT
    customer_id,
    EXTRACT(MONTH FROM transaction_date) AS month,
    COUNT(*) AS transaction_count
FROM
    transactions
GROUP BY
    customer_id, EXTRACT(MONTH FROM transaction_date)
HAVING
    COUNT(*) > 3;

MySQL Solution:
SELECT
customer_id,
MONTH(transaction_date) AS month,
COUNT(*) AS transaction_count
FROM
transactions
GROUP BY
customer_id, MONTH(transaction_date)
HAVING
COUNT(*) > 3;
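A sketch of the same grouping in SQLite (via Python's sqlite3): SQLite lacks `EXTRACT()`, but `strftime('%m', ...)` supplies the same month key. Note that with the sample data no customer exceeds three transactions in a month, so the query correctly returns no rows; lowering the threshold shows the grouping works:

```python
# Q.723 on SQLite, with a parameterized HAVING threshold.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    transaction_id INT, customer_id INT,
    transaction_date TEXT, amount REAL
);
INSERT INTO transactions VALUES
 (1, 101, '2022-10-01 00:00:00', 150.00),
 (2, 101, '2022-10-02 00:00:00', 200.00),
 (3, 101, '2022-10-03 00:00:00', 100.00),
 (4, 102, '2022-10-01 00:00:00', 50.00),
 (5, 102, '2022-10-01 00:00:00', 75.00),
 (6, 102, '2022-10-03 00:00:00', 100.00);
""")

query = """
SELECT customer_id, strftime('%m', transaction_date) AS month, COUNT(*) AS n
FROM transactions
GROUP BY customer_id, month
HAVING COUNT(*) > ?
"""
print(conn.execute(query, (3,)).fetchall())  # []
print(conn.execute(query, (2,)).fetchall())  # both customers made 3 in October
```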
• Q.724
Calculate the Monthly Average Spend Per Customer (Excluding Refunds)

Explanation
In the transactions table, some transactions are refunds. You are required to calculate the
monthly average spend per customer, but excluding refunds. A refund transaction is
indicated by a negative amount. For each customer, calculate the total spend for each month,
then find the average spend across all months for each customer.

Datasets and SQL Schemas


Table creation:
-- transactions table creation
CREATE TABLE transactions (
transaction_id INT,
customer_id INT,
transaction_date TIMESTAMP,
amount DECIMAL(10, 2)
);

Datasets:
-- Insert data into transactions
INSERT INTO transactions (transaction_id, customer_id, transaction_date, amount)
VALUES
(1, 101, '2022-10-01 00:00:00', 200.00),
(2, 101, '2022-10-02 00:00:00', -50.00), -- Refund
(3, 101, '2022-11-01 00:00:00', 100.00),
(4, 102, '2022-10-01 00:00:00', 150.00),
(5, 102, '2022-11-01 00:00:00', 300.00),
(6, 103, '2022-10-01 00:00:00', 500.00),
(7, 103, '2022-10-15 00:00:00', -100.00); -- Refund

Learnings
• Filtering: Excluding transactions with negative amounts.
• Date functions: Extracting month and year from transaction_date.
• Aggregation: Calculating total spend per month, then averaging it over time.
• Handling Refunds: Ensuring refunds are excluded from the total spend.

Solutions


PostgreSQL Solution:
SELECT
customer_id,
AVG(monthly_spend) AS average_monthly_spend
FROM (
SELECT
customer_id,
EXTRACT(MONTH FROM transaction_date) AS month,
EXTRACT(YEAR FROM transaction_date) AS year,
SUM(CASE WHEN amount > 0 THEN amount ELSE 0 END) AS monthly_spend
FROM
transactions
GROUP BY
    customer_id, EXTRACT(MONTH FROM transaction_date), EXTRACT(YEAR FROM transaction_date)
) AS monthly_data
GROUP BY
customer_id;

MySQL Solution:
SELECT
customer_id,
AVG(monthly_spend) AS average_monthly_spend
FROM (
SELECT
customer_id,
MONTH(transaction_date) AS month,
YEAR(transaction_date) AS year,
SUM(CASE WHEN amount > 0 THEN amount ELSE 0 END) AS monthly_spend
FROM
transactions
GROUP BY
customer_id, MONTH(transaction_date), YEAR(transaction_date)
) AS monthly_data
GROUP BY
customer_id;
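The refund-exclusion logic is easy to restate in plain Python as a cross-check on the sample data (a sketch, not part of the solution): negative amounts add nothing to a month's spend, and the average is taken over the months in which the customer transacted.

```python
# Pure-Python restatement of Q.724 on the sample data.
from collections import defaultdict

txns = [  # (customer_id, 'YYYY-MM', amount)
    (101, '2022-10', 200.00), (101, '2022-10', -50.00),   # refund
    (101, '2022-11', 100.00),
    (102, '2022-10', 150.00), (102, '2022-11', 300.00),
    (103, '2022-10', 500.00), (103, '2022-10', -100.00),  # refund
]

monthly = defaultdict(float)
for cust, month, amount in txns:
    monthly[(cust, month)] += max(amount, 0)  # CASE WHEN amount > 0 ...

spends = defaultdict(list)
for (cust, _), spend in monthly.items():
    spends[cust].append(spend)

result = {cust: sum(s) / len(s) for cust, s in spends.items()}
print(result)  # {101: 150.0, 102: 225.0, 103: 500.0}
```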
• Q.725
Find Customers Who Increased Their Spending by More Than 50% Between Consecutive
Months

Explanation
You need to identify customers who increased their total spending by more than 50%
between two consecutive months. Use the transactions table to calculate monthly spending
and compare the current month's total to the previous month's total for each customer.

Datasets and SQL Schemas


Table creation:
-- transactions table creation
CREATE TABLE transactions (
transaction_id INT,
customer_id INT,
transaction_date TIMESTAMP,
amount DECIMAL(10, 2)
);

Datasets:
-- Insert data into transactions
INSERT INTO transactions (transaction_id, customer_id, transaction_date, amount)
VALUES
(1, 101, '2022-10-01 00:00:00', 200.00),
(2, 101, '2022-10-15 00:00:00', 100.00),
(3, 101, '2022-11-01 00:00:00', 300.00),
(4, 102, '2022-10-01 00:00:00', 50.00),
(5, 102, '2022-11-01 00:00:00', 120.00),
(6, 103, '2022-10-01 00:00:00', 500.00),
(7, 103, '2022-11-01 00:00:00', 600.00);

Learnings
• Self-joins (or, alternatively, the LAG() window function) to fetch the previous month's spending.
• Filtering: Identifying customers with more than a 50% increase in spending.
• Date Functions: Grouping by month and year.

Solutions
PostgreSQL Solution:
WITH monthly_spend AS (
SELECT
customer_id,
EXTRACT(MONTH FROM transaction_date) AS month,
EXTRACT(YEAR FROM transaction_date) AS year,
SUM(amount) AS total_spend
FROM
transactions
GROUP BY
    customer_id, EXTRACT(MONTH FROM transaction_date), EXTRACT(YEAR FROM transaction_date)
)
SELECT
current.customer_id,
current.year,
current.month,
current.total_spend AS current_month_spend,
previous.total_spend AS previous_month_spend
FROM
monthly_spend current
JOIN
monthly_spend previous ON current.customer_id = previous.customer_id
AND current.year = previous.year
AND current.month = previous.month + 1
WHERE
current.total_spend > previous.total_spend * 1.5;

MySQL Solution:
WITH monthly_spend AS (
SELECT
customer_id,
MONTH(transaction_date) AS month,
YEAR(transaction_date) AS year,
SUM(amount) AS total_spend
FROM
transactions
GROUP BY
customer_id, MONTH(transaction_date), YEAR(transaction_date)
)
SELECT
current.customer_id,
current.year,
current.month,
current.total_spend AS current_month_spend,
previous.total_spend AS previous_month_spend
FROM
monthly_spend current
JOIN
monthly_spend previous ON current.customer_id = previous.customer_id
AND current.year = previous.year
AND current.month = previous.month + 1
WHERE
current.total_spend > previous.total_spend * 1.5;
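A plain-Python restatement of the comparison, using the per-month totals implied by the sample data. One caveat worth raising in an interview: the join condition `month = month + 1` (like this sketch) misses December-to-January transitions, which need year-aware handling.

```python
# Q.725 check: who more than 1.5x'd their spend from Oct to Nov 2022?
monthly_spend = {  # (customer_id, month): total spend, from the sample rows
    (101, 10): 300.00, (101, 11): 300.00,
    (102, 10): 50.00,  (102, 11): 120.00,
    (103, 10): 500.00, (103, 11): 600.00,
}

increased = [
    cust
    for (cust, month), spend in monthly_spend.items()
    if month == 11 and spend > monthly_spend.get((cust, 10), float("inf")) * 1.5
]
print(increased)  # [102]: 120 > 50 * 1.5
```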


• Q.726
Identify Customers Who Have Made Purchases in 3 Different Years

Explanation
You need to identify customers who have made at least one purchase in 3 different years.
This involves counting the distinct years in which a customer made a transaction.

Datasets and SQL Schemas


Table creation:
-- transactions table creation
CREATE TABLE transactions (
transaction_id INT,
customer_id INT,
transaction_date TIMESTAMP,
amount DECIMAL(10, 2)
);

Datasets:
-- Insert data into transactions
INSERT INTO transactions (transaction_id, customer_id, transaction_date, amount)
VALUES
(1, 101, '2020-01-01 00:00:00', 100.00),
(2, 101, '2021-05-01 00:00:00', 200.00),
(3, 101, '2022-07-01 00:00:00', 300.00),
(4, 102, '2020-03-01 00:00:00', 150.00),
(5, 102, '2021-09-01 00:00:00', 180.00),
(6, 103, '2021-06-01 00:00:00', 500.00);

Learnings
• Distinct: Using DISTINCT to count distinct years.
• GROUP BY: Grouping by customer and year.
• HAVING: Filtering customers who have made purchases in three distinct years.

Solutions
PostgreSQL Solution:
SELECT
customer_id
FROM
(SELECT
customer_id,
EXTRACT(YEAR FROM transaction_date) AS year
FROM
transactions
GROUP BY
customer_id, EXTRACT(YEAR FROM transaction_date)) AS customer_years
GROUP BY
customer_id
HAVING
    COUNT(DISTINCT year) >= 3;

MySQL Solution:
SELECT
customer_id
FROM
(SELECT
customer_id,
YEAR(transaction_date) AS year
FROM
    transactions
GROUP BY
    customer_id, YEAR(transaction_date)) AS customer_years
GROUP BY
    customer_id
HAVING
    COUNT(DISTINCT year) >= 3;
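On SQLite (via Python's sqlite3) the same check can be written without the inner subquery, since `COUNT(DISTINCT ...)` can take the derived year directly; `strftime('%Y', ...)` stands in for `EXTRACT(YEAR ...)`. A sketch:

```python
# Q.726 on SQLite: only customer 101 bought in three distinct years.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    transaction_id INT, customer_id INT,
    transaction_date TEXT, amount REAL
);
INSERT INTO transactions VALUES
 (1, 101, '2020-01-01 00:00:00', 100.00),
 (2, 101, '2021-05-01 00:00:00', 200.00),
 (3, 101, '2022-07-01 00:00:00', 300.00),
 (4, 102, '2020-03-01 00:00:00', 150.00),
 (5, 102, '2021-09-01 00:00:00', 180.00),
 (6, 103, '2021-06-01 00:00:00', 500.00);
""")

rows = conn.execute("""
SELECT customer_id
FROM transactions
GROUP BY customer_id
HAVING COUNT(DISTINCT strftime('%Y', transaction_date)) >= 3
""").fetchall()
print(rows)  # [(101,)]
```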



• Q.727
Identify Credit Cards with High Fraudulent Transaction Risk

Explanation
Credit card transactions are tracked in a transactions table, and you need to identify credit
cards that are at high risk of fraudulent activity. A credit card is considered high-risk if it has
made transactions exceeding $1000 in a single day three or more times within the last 30
days. You need to calculate how many high-risk days each card has, and return the cards with
three or more high-risk days.

Datasets and SQL Schemas


Table creation:
-- transactions table creation
CREATE TABLE transactions (
transaction_id INT,
card_id INT,
transaction_date TIMESTAMP,
amount DECIMAL(10, 2)
);

Datasets:
-- Insert data into transactions
INSERT INTO transactions (transaction_id, card_id, transaction_date, amount)
VALUES
(1, 201, '2022-12-01 00:00:00', 500.00),
(2, 201, '2022-12-01 00:00:00', 600.00),
(3, 201, '2022-12-02 00:00:00', 1200.00),
(4, 202, '2022-12-02 00:00:00', 800.00),
(5, 202, '2022-12-02 00:00:00', 200.00),
(6, 203, '2022-12-03 00:00:00', 2500.00),
(7, 203, '2022-12-03 00:00:00', 300.00),
(8, 203, '2022-12-05 00:00:00', 1200.00),
(9, 203, '2022-12-06 00:00:00', 500.00),
(10, 204, '2022-12-01 00:00:00', 150.00);

Learnings
• Date filtering: Using date arithmetic with INTERVAL (CURRENT_DATE in PostgreSQL, CURDATE() in MySQL) to restrict data to the last 30 days.
• GROUP BY: Grouping by card_id and transaction date.
• COUNT: Counting the number of days a card has high-risk transactions (greater than
$1000).
• HAVING: Filtering cards that have three or more high-risk days.

Solutions
PostgreSQL Solution:
WITH high_risk_days AS (
SELECT


card_id,
DATE(transaction_date) AS transaction_day,
SUM(amount) AS total_amount
FROM
transactions
WHERE
transaction_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY
card_id, DATE(transaction_date)
HAVING
SUM(amount) > 1000
)
SELECT
card_id,
COUNT(transaction_day) AS high_risk_day_count
FROM
high_risk_days
GROUP BY
card_id
HAVING
COUNT(transaction_day) >= 3;

MySQL Solution:
WITH high_risk_days AS (
SELECT
card_id,
DATE(transaction_date) AS transaction_day,
SUM(amount) AS total_amount
FROM
transactions
WHERE
transaction_date >= CURDATE() - INTERVAL 30 DAY
GROUP BY
card_id, DATE(transaction_date)
HAVING
SUM(amount) > 1000
)
SELECT
card_id,
COUNT(transaction_day) AS high_risk_day_count
FROM
high_risk_days
GROUP BY
card_id
HAVING
COUNT(transaction_day) >= 3;
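A plain-Python restatement of the two-stage aggregation (per-day totals first, then a count of flagged days per card), ignoring the 30-day recency filter for brevity. With the sample data no card accumulates three high-risk days; sweeping the threshold shows the logic:

```python
# Q.727 logic: sum per (card, day), flag days over $1000, count flagged days.
from collections import defaultdict

txns = [  # (card_id, 'YYYY-MM-DD', amount) from the sample data
    (201, '2022-12-01', 500.00), (201, '2022-12-01', 600.00),
    (201, '2022-12-02', 1200.00),
    (202, '2022-12-02', 800.00), (202, '2022-12-02', 200.00),
    (203, '2022-12-03', 2500.00), (203, '2022-12-03', 300.00),
    (203, '2022-12-05', 1200.00), (203, '2022-12-06', 500.00),
    (204, '2022-12-01', 150.00),
]

daily = defaultdict(float)
for card, day, amount in txns:
    daily[(card, day)] += amount

high_risk_days = defaultdict(int)
for (card, _), total in daily.items():
    if total > 1000:
        high_risk_days[card] += 1

def risky(min_days):
    return sorted(c for c, n in high_risk_days.items() if n >= min_days)

print(risky(3), risky(2))  # [] [201, 203]
```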
• Q.728
Identify Top 3 Cards with the Highest Monthly Spend

Explanation
You need to identify the top 3 credit cards with the highest total spending each month. The
transactions table records all the card transactions. For each card, calculate the total
monthly spending and return the top 3 cards with the highest monthly spending for a given
month (e.g., 2022-10).

Datasets and SQL Schemas


Table creation:
-- transactions table creation
CREATE TABLE transactions (
transaction_id INT,
card_id INT,
transaction_date TIMESTAMP,
amount DECIMAL(10, 2)


);

Datasets:
-- Insert data into transactions
INSERT INTO transactions (transaction_id, card_id, transaction_date, amount)
VALUES
(1, 201, '2022-10-01 00:00:00', 250.00),
(2, 202, '2022-10-02 00:00:00', 500.00),
(3, 203, '2022-10-03 00:00:00', 1000.00),
(4, 201, '2022-10-10 00:00:00', 750.00),
(5, 202, '2022-10-15 00:00:00', 300.00),
(6, 203, '2022-10-18 00:00:00', 1200.00),
(7, 204, '2022-10-20 00:00:00', 100.00);

Learnings
• Grouping by Month: Extracting month and year from transaction_date.
• SUM(): Aggregating total amount spent for each card.
• LIMIT: Limiting the result to top N cards based on the total spend.

Solutions
PostgreSQL Solution:
SELECT
card_id,
SUM(amount) AS total_spent
FROM
transactions
WHERE
transaction_date >= '2022-10-01' AND transaction_date < '2022-11-01'
GROUP BY
card_id
ORDER BY
total_spent DESC
LIMIT 3;

MySQL Solution:
SELECT
card_id,
SUM(amount) AS total_spent
FROM
transactions
WHERE
transaction_date >= '2022-10-01' AND transaction_date < '2022-11-01'
GROUP BY
card_id
ORDER BY
total_spent DESC
LIMIT 3;
• Q.729
Detect Credit Cards with Suspicious Spending Patterns (High Frequency in Short Time)

Explanation
You are tasked with detecting suspicious spending patterns in the transactions table. A
suspicious pattern is defined as a credit card making more than 5 transactions in a 1-hour
period with a total amount greater than $500. You need to identify all card IDs that have
exhibited this behavior in the last 7 days.

Datasets and SQL Schemas


Table creation:

-- transactions table creation
CREATE TABLE transactions (
transaction_id INT,
card_id INT,
transaction_date TIMESTAMP,
amount DECIMAL(10, 2)
);

Datasets:
-- Insert data into transactions
INSERT INTO transactions (transaction_id, card_id, transaction_date, amount)
VALUES
(1, 201, '2022-12-01 10:00:00', 100.00),
(2, 201, '2022-12-01 10:30:00', 200.00),
(3, 201, '2022-12-01 11:00:00', 150.00),
(4, 201, '2022-12-01 11:15:00', 60.00),
(5, 201, '2022-12-01 11:30:00', 80.00),
(6, 202, '2022-12-01 10:00:00', 500.00),
(7, 202, '2022-12-01 10:40:00', 60.00),
(8, 203, '2022-12-01 14:00:00', 200.00),
(9, 203, '2022-12-01 14:30:00', 100.00),
(10, 203, '2022-12-01 14:40:00', 150.00);

Learnings
• Window Functions: Using COUNT() and SUM() with an OVER (... RANGE ...) frame to measure
activity inside a sliding time window.
• Time intervals: Defining the window frame with an INTERVAL (e.g., the 1 hour preceding
the current row).
• Filtering Suspicious Patterns: Checking for patterns where the total transaction amount is
greater than a threshold within a given time period.

Solutions
PostgreSQL Solution:
WITH suspicious_transactions AS (
    SELECT
        card_id,
        transaction_date,
        amount,
        COUNT(*) OVER (PARTITION BY card_id ORDER BY transaction_date
                       RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW) AS transactions_in_hour,
        SUM(amount) OVER (PARTITION BY card_id ORDER BY transaction_date
                          RANGE BETWEEN INTERVAL '1 hour' PRECEDING AND CURRENT ROW) AS total_in_hour
    FROM
        transactions
    WHERE
        transaction_date >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT DISTINCT card_id
FROM
    suspicious_transactions
WHERE
    transactions_in_hour > 5 AND total_in_hour > 500;

MySQL Solution:
WITH suspicious_transactions AS (
    SELECT
        card_id,
        transaction_date,
        amount,
        -- ROWS UNBOUNDED PRECEDING would count every earlier transaction,
        -- not just the last hour; MySQL 8.0 supports RANGE frames with a
        -- time interval, which is what the one-hour rule requires.
        COUNT(*) OVER (PARTITION BY card_id ORDER BY transaction_date
                       RANGE BETWEEN INTERVAL 1 HOUR PRECEDING AND CURRENT ROW) AS transactions_in_hour,
        SUM(amount) OVER (PARTITION BY card_id ORDER BY transaction_date
                          RANGE BETWEEN INTERVAL 1 HOUR PRECEDING AND CURRENT ROW) AS total_in_hour
    FROM
        transactions
    WHERE
        transaction_date >= CURDATE() - INTERVAL 7 DAY
)
SELECT DISTINCT
    card_id
FROM
    suspicious_transactions
WHERE
    transactions_in_hour > 5 AND total_in_hour > 500;
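The trailing one-hour window is worth restating outside SQL to see exactly which rows it covers. This sketch uses the same inclusive bounds as `RANGE ... PRECEDING AND CURRENT ROW`; with the sample data no card makes more than five transactions in any hour, so nothing is flagged:

```python
# Q.729's trailing one-hour window in plain Python (O(n^2), fine for a demo).
from datetime import datetime, timedelta

txns = [  # (card_id, timestamp, amount) from the sample data
    (201, '2022-12-01 10:00:00', 100.00), (201, '2022-12-01 10:30:00', 200.00),
    (201, '2022-12-01 11:00:00', 150.00), (201, '2022-12-01 11:15:00', 60.00),
    (201, '2022-12-01 11:30:00', 80.00),
    (202, '2022-12-01 10:00:00', 500.00), (202, '2022-12-01 10:40:00', 60.00),
    (203, '2022-12-01 14:00:00', 200.00), (203, '2022-12-01 14:30:00', 100.00),
    (203, '2022-12-01 14:40:00', 150.00),
]
parsed = [(c, datetime.fromisoformat(t), a) for c, t, a in txns]

flagged = set()
for card, t, _ in parsed:
    # All of this card's transactions in the hour ending at t, inclusive.
    window = [a for c, t2, a in parsed
              if c == card and t - timedelta(hours=1) <= t2 <= t]
    if len(window) > 5 and sum(window) > 500:
        flagged.add(card)

print(sorted(flagged))  # no card makes >5 transactions in any hour here
```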

• Q.730
Question
Identify the VIP Customers for American Express
For American Express, identify the 'Whale' (VIP) customers: those who have made more than
one transaction of $5000 or more.
Explanation
The task is to find customers who have made multiple transactions with amounts greater than
or equal to $5000. This can be achieved by counting the transactions per customer and
applying a filter for the transaction count.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE transactions (
transaction_id VARCHAR(50),
customer_id VARCHAR(50),
transaction_date TIMESTAMP,
transaction_amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO transactions (transaction_id, customer_id, transaction_date, transaction_amo
unt)
VALUES
('001', 'C001', '2022-08-01 00:00:00', 4800),
('002', 'C002', '2022-08-02 00:00:00', 6800),
('003', 'C003', '2022-08-03 00:00:00', 5000),
('004', 'C001', '2022-08-10 00:00:00', 5200),
('005', 'C002', '2022-08-22 00:00:00', 7000);

Learnings
• Use of COUNT() function to count transactions
• Filtering with HAVING for aggregate conditions
• Grouping by customer to aggregate data
• Transaction filtering with WHERE
Solutions
• - PostgreSQL solution
SELECT customer_id, COUNT(*) AS transaction_count
FROM transactions
WHERE transaction_amount >= 5000
GROUP BY customer_id
HAVING COUNT(*) > 1;
• - MySQL solution
SELECT customer_id, COUNT(*) AS transaction_count


FROM transactions
WHERE transaction_amount >= 5000
GROUP BY customer_id
HAVING COUNT(*) > 1;
• Q.731
Question
Identify all different types of product and revenue for American Express
Given a table with product information and transaction data, write a SQL query to identify all
different product types along with the total revenue generated for each product.
Explanation
The task is to group the transactions by product type and sum the transaction amounts for
each product. This will give the total revenue per product.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE transactions (
transaction_id VARCHAR(50),
customer_id VARCHAR(50),
transaction_date TIMESTAMP,
transaction_amount DECIMAL(10, 2),
product_type VARCHAR(100)
);
• - Datasets
INSERT INTO transactions (transaction_id, customer_id, transaction_date, transaction_amo
unt, product_type)
VALUES
('001', 'C001', '2022-08-01 00:00:00', 4800, 'Credit Card'),
('002', 'C002', '2022-08-02 00:00:00', 6800, 'Loan'),
('003', 'C003', '2022-08-03 00:00:00', 5000, 'Credit Card'),
('004', 'C001', '2022-08-10 00:00:00', 5200, 'Insurance'),
('005', 'C002', '2022-08-22 00:00:00', 7000, 'Loan');

Learnings
• Use of SUM() to calculate revenue
• Grouping by product type to aggregate data
• Use of GROUP BY and SELECT to identify distinct values
Solutions
• - PostgreSQL solution
SELECT product_type, SUM(transaction_amount) AS total_revenue
FROM transactions
GROUP BY product_type;
• - MySQL solution
SELECT product_type, SUM(transaction_amount) AS total_revenue
FROM transactions
GROUP BY product_type;
• Q.732
Question
Find customers who are based in New York, have a credit score above 700, and have a
total transaction amount in the last year exceeding $5000. Return the customer IDs in
ascending order.
Explanation


The query needs to filter customers based on their location (New York), credit score (>700),
and their total transaction amount in the last year (greater than $5000). We need to join the
customer and transaction tables, aggregate the transactions for each customer, and apply
the required filters. Finally, sort the customer IDs in ascending order.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE customer (
customer_id INT,
name VARCHAR(100),
location VARCHAR(50),
credit_score INT
);

CREATE TABLE transaction (
transaction_id INT,
customer_id INT,
transaction_amount DECIMAL(10, 2),
transaction_date DATE
);
• - Datasets
INSERT INTO customer (customer_id, name, location, credit_score)
VALUES
(101, 'John Doe', 'New York', 720),
(102, 'Jane Doe', 'California', 680),
(103, 'Michael Smith', 'New York', 750),
(104, 'Emily Johnson', 'New York', 690),
(105, 'William Brown', 'Texas', 710);

INSERT INTO transaction (transaction_id, customer_id, transaction_amount, transaction_date)
VALUES
(201, 101, 6500, '2021-10-22'),
(202, 102, 4000, '2021-07-15'),
(203, 103, 5500, '2021-08-31'),
(204, 104, 4800, '2021-11-05'),
(205, 105, 5100, '2021-09-20');

Learnings
• Using JOIN to combine multiple tables
• Applying WHERE filters for location and credit score
• Filtering transactions based on date with WHERE and DATE
• Using GROUP BY for aggregating transaction totals per customer
• Sorting results with ORDER BY
Solutions
• - PostgreSQL solution
SELECT c.customer_id
FROM customer c
JOIN transaction t ON c.customer_id = t.customer_id
WHERE c.location = 'New York'
AND c.credit_score > 700
AND t.transaction_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY c.customer_id
HAVING SUM(t.transaction_amount) > 5000
ORDER BY c.customer_id;
• - MySQL solution
SELECT c.customer_id
FROM customer c
JOIN transaction t ON c.customer_id = t.customer_id
WHERE c.location = 'New York'
AND c.credit_score > 700
AND t.transaction_date >= CURDATE() - INTERVAL 1 YEAR


GROUP BY c.customer_id
HAVING SUM(t.transaction_amount) > 5000
ORDER BY c.customer_id;
• Q.733
Question
Calculate the click-through-rate (CTR) for each campaign. The CTR is the ratio of the
number of customers who clicked on a campaign to the number of customers who viewed the
campaign. It is calculated as:
CTR = (Number of 'Clicked' Actions / Number of 'Viewed' Actions) * 100
Explanation
To calculate the CTR, we need to count how many times each campaign was viewed and
clicked. Then, calculate the ratio of clicks to views and multiply by 100 to get the percentage.
We use a LEFT JOIN to ensure we include all campaigns, even those with no clicks or views,
and then aggregate the actions using SUM and CASE statements.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE campaigns (
campaign_id INT,
channel VARCHAR(50),
date DATE
);

CREATE TABLE clicks (
campaign_id INT,
customer_id INT,
action VARCHAR(50)
);
• - Datasets
INSERT INTO campaigns (campaign_id, channel, date)
VALUES
(1, 'Email', '2022-07-01'),
(2, 'Web', '2022-07-02'),
(3, 'App', '2022-07-03');

INSERT INTO clicks (campaign_id, customer_id, action)
VALUES
(1, 3452, 'Viewed'),
(1, 4563, 'Clicked'),
(2, 3765, 'Viewed'),
(2, 8765, 'Clicked'),
(2, 1234, 'Viewed'),
(3, 3452, 'Clicked'),
(3, 5632, 'Viewed');

Learnings
• Using LEFT JOIN to join tables and preserve all campaigns
• Using CASE statements within aggregation to count specific actions
• Calculating percentages with simple arithmetic after aggregation
• Grouping by campaign to calculate CTR for each campaign
Solutions
• - PostgreSQL solution
SELECT
    c.campaign_id,
    c.channel,
    -- Multiply by 100.0 before dividing: SUM() of integers is bigint, and
    -- bigint / bigint in PostgreSQL truncates, which would collapse every
    -- CTR to 0 or 100.
    100.0 * SUM(CASE WHEN ck.action = 'Clicked' THEN 1 ELSE 0 END) /
        NULLIF(SUM(CASE WHEN ck.action = 'Viewed' THEN 1 ELSE 0 END), 0) AS click_through_rate
FROM
    campaigns c
LEFT JOIN
    clicks ck
ON
    c.campaign_id = ck.campaign_id
GROUP BY
    c.campaign_id,
    c.channel;
• - MySQL solution
SELECT
c.campaign_id,
c.channel,
    (SUM(CASE WHEN ck.action = 'Clicked' THEN 1 ELSE 0 END) /
     NULLIF(SUM(CASE WHEN ck.action = 'Viewed' THEN 1 ELSE 0 END), 0)) * 100 AS click_through_rate
FROM
campaigns c
LEFT JOIN
clicks ck
ON
c.campaign_id = ck.campaign_id
GROUP BY
c.campaign_id,
c.channel;
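A plain-Python cross-check of the CTR numbers on the sample data. Python 3's `/` always produces a float, so the integer-division truncation that can bite in SQL is not a concern here:

```python
# CTR per campaign from the Q.733 sample clicks data.
from collections import Counter

clicks = [  # (campaign_id, action)
    (1, 'Viewed'), (1, 'Clicked'),
    (2, 'Viewed'), (2, 'Clicked'), (2, 'Viewed'),
    (3, 'Clicked'), (3, 'Viewed'),
]

counts = Counter(clicks)  # keyed by (campaign_id, action)
ctr = {
    cid: 100.0 * counts[(cid, 'Clicked')] / counts[(cid, 'Viewed')]
    for cid in {c for c, _ in clicks}
    if counts[(cid, 'Viewed')]  # mirror NULLIF(..., 0)
}
print(ctr)  # campaign 2: 1 click / 2 views = 50%
```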
• Q.734
Calculate the total spend of each user in the past month, counting only successful
transactions.
Explanation
To calculate the total spend, sum the transaction_amount for each user for transactions
that occurred in the past month, filtering by status so that only successful transactions
are included (failed ones are excluded). Group by user to get the total spend per user.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE transactions (
transaction_id INT,
user_id INT,
transaction_date DATE,
transaction_amount DECIMAL(10, 2),
status VARCHAR(20)
);
• - Datasets
INSERT INTO transactions (transaction_id, user_id, transaction_date, transaction_amount,
status)
VALUES
(1, 101, '2022-12-01', 150.00, 'Successful'),
(2, 102, '2022-12-02', 200.00, 'Failed'),
(3, 103, '2022-12-03', 250.00, 'Successful'),
(4, 101, '2022-12-04', 100.00, 'Successful'),
(5, 102, '2022-12-05', 300.00, 'Successful');

Learnings
• Use of SUM() for aggregating total spend
• Filtering by status to include only successful transactions
• Use of date filtering with WHERE for the past month
• Grouping by user to calculate individual spend


Solutions
• - PostgreSQL solution
SELECT user_id, SUM(transaction_amount) AS total_spend
FROM transactions
WHERE status = 'Successful'
AND transaction_date >= CURRENT_DATE - INTERVAL '1 month'
GROUP BY user_id;
• - MySQL solution
SELECT user_id, SUM(transaction_amount) AS total_spend
FROM transactions
WHERE status = 'Successful'
AND transaction_date >= CURDATE() - INTERVAL 1 MONTH
GROUP BY user_id;
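Both solutions above depend on the current date, so they return nothing against the December 2022 sample data once that month has passed. As a sanity check, here is a minimal SQLite sketch of the same logic, assuming a fixed cutoff date ('2022-12-01') as a stand-in for `CURRENT_DATE - INTERVAL '1 month'`:

```python
import sqlite3

# Hedged SQLite check of the Q.734 logic; the fixed cutoff date is an
# assumption standing in for the rolling one-month window.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    transaction_id INT,
    user_id INT,
    transaction_date DATE,
    transaction_amount DECIMAL(10, 2),
    status VARCHAR(20)
);
INSERT INTO transactions VALUES
    (1, 101, '2022-12-01', 150.00, 'Successful'),
    (2, 102, '2022-12-02', 200.00, 'Failed'),
    (3, 103, '2022-12-03', 250.00, 'Successful'),
    (4, 101, '2022-12-04', 100.00, 'Successful'),
    (5, 102, '2022-12-05', 300.00, 'Successful');
""")
rows = conn.execute("""
    SELECT user_id, SUM(transaction_amount) AS total_spend
    FROM transactions
    WHERE status = 'Successful'
      AND transaction_date >= '2022-12-01'  -- fixed stand-in for the rolling month
    GROUP BY user_id
    ORDER BY user_id;
""").fetchall()
print(rows)  # [(101, 250.0), (102, 300.0), (103, 250.0)]
```

Note that user 102's failed 200.00 transaction is excluded, which is what the `status = 'Successful'` filter is for.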
• Q.735
Calculate the total charges incurred by each user in a given year, where charges are defined
as any transaction over $500.
Explanation
The task is to identify transactions with amounts over $500, and calculate the total charges
for each user. This should be filtered by transaction amount and grouped by user.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE transactions (
transaction_id INT,
user_id INT,
transaction_date DATE,
transaction_amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO transactions (transaction_id, user_id, transaction_date, transaction_amount)
VALUES
(1, 101, '2022-01-15', 550.00),
(2, 102, '2022-03-10', 200.00),
(3, 103, '2022-07-23', 600.00),
(4, 101, '2022-08-12', 450.00),
(5, 102, '2022-11-05', 700.00);

Learnings
• Filtering transactions by amount using WHERE
• Aggregating charges using SUM()
• Grouping by user to calculate the total charges for each
Solutions
• - PostgreSQL solution
SELECT user_id, SUM(transaction_amount) AS total_charges
FROM transactions
WHERE transaction_amount > 500
AND EXTRACT(YEAR FROM transaction_date) = 2022
GROUP BY user_id;
• - MySQL solution
SELECT user_id, SUM(transaction_amount) AS total_charges
FROM transactions
WHERE transaction_amount > 500
AND YEAR(transaction_date) = 2022
GROUP BY user_id;
• Q.736


Calculate the total rewards points accumulated by each user based on their transaction
history, assuming 1 point is awarded for every $10 spent.
Explanation
We need to calculate the total points for each user by dividing the transaction amount by 10
(as 1 point = $10 spent). This should be done for all transactions, and the result should be
aggregated by user.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE transactions (
transaction_id INT,
user_id INT,
transaction_date DATE,
transaction_amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO transactions (transaction_id, user_id, transaction_date, transaction_amount)
VALUES
(1, 101, '2022-12-01', 150.00),
(2, 102, '2022-12-02', 200.00),
(3, 103, '2022-12-03', 250.00),
(4, 101, '2022-12-04', 100.00),
(5, 102, '2022-12-05', 300.00);

Learnings
• Use of arithmetic to calculate rewards points
• Aggregation with SUM() to calculate total points per user
• Grouping by user to calculate individual totals
Solutions
• - PostgreSQL solution
SELECT user_id, SUM(transaction_amount / 10) AS total_rewards_points
FROM transactions
GROUP BY user_id;
• - MySQL solution
SELECT user_id, SUM(transaction_amount / 10) AS total_rewards_points
FROM transactions
GROUP BY user_id;
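Since the points formula is pure arithmetic inside `SUM()`, it can be verified against the sample data in any engine. A quick SQLite sketch (dialect-neutral here, as no date functions are involved):

```python
import sqlite3

# Hedged SQLite check of the Q.736 rewards calculation (1 point per $10 spent).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    transaction_id INT,
    user_id INT,
    transaction_date DATE,
    transaction_amount DECIMAL(10, 2)
);
INSERT INTO transactions VALUES
    (1, 101, '2022-12-01', 150.00),
    (2, 102, '2022-12-02', 200.00),
    (3, 103, '2022-12-03', 250.00),
    (4, 101, '2022-12-04', 100.00),
    (5, 102, '2022-12-05', 300.00);
""")
rows = conn.execute("""
    SELECT user_id, SUM(transaction_amount / 10) AS total_rewards_points
    FROM transactions
    GROUP BY user_id
    ORDER BY user_id;
""").fetchall()
print(rows)  # [(101, 25.0), (102, 50.0), (103, 25.0)]
```

User 101 earns 15 + 10 = 25 points across two transactions, matching the expected aggregation.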
• Q.737
Identify the top 3 users who spent the most money on transactions in the last 6 months.
Explanation
To find the top 3 users with the highest spend in the last 6 months, we need to sum the
transaction_amount for each user, filter the transactions by date (last 6 months), and then
order the users by the total spend in descending order. The LIMIT clause will be used to
restrict the result to the top 3.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE transactions (
transaction_id INT,
user_id INT,
transaction_date DATE,
transaction_amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO transactions (transaction_id, user_id, transaction_date, transaction_amount)


VALUES
(1, 101, '2022-06-01', 150.00),
(2, 102, '2022-07-15', 300.00),
(3, 103, '2022-08-20', 400.00),
(4, 101, '2022-09-10', 500.00),
(5, 102, '2022-10-01', 200.00),
(6, 104, '2022-11-15', 700.00),
(7, 105, '2022-12-05', 250.00);

Learnings
• Aggregating total spend per user
• Filtering by date using WHERE and INTERVAL
• Ordering by sum and limiting results with LIMIT
• Working with date functions to calculate date ranges
Solutions
• - PostgreSQL solution
SELECT user_id, SUM(transaction_amount) AS total_spent
FROM transactions
WHERE transaction_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY user_id
ORDER BY total_spent DESC
LIMIT 3;
• - MySQL solution
SELECT user_id, SUM(transaction_amount) AS total_spent
FROM transactions
WHERE transaction_date >= CURDATE() - INTERVAL 6 MONTH
GROUP BY user_id
ORDER BY total_spent DESC
LIMIT 3;
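One subtlety worth checking is tie handling: `LIMIT 3` keeps three rows even when two users have the same total, and which tied user sorts first is unspecified unless you add a tiebreaker. A hedged SQLite sketch (fixed cutoff '2022-07-01' assumed in place of the rolling six-month window):

```python
import sqlite3

# Hedged SQLite check of the Q.737 top-3 logic; note users 101 and 102
# tie at 500.00, so their relative order is engine-dependent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    transaction_id INT,
    user_id INT,
    transaction_date DATE,
    transaction_amount DECIMAL(10, 2)
);
INSERT INTO transactions VALUES
    (1, 101, '2022-06-01', 150.00),
    (2, 102, '2022-07-15', 300.00),
    (3, 103, '2022-08-20', 400.00),
    (4, 101, '2022-09-10', 500.00),
    (5, 102, '2022-10-01', 200.00),
    (6, 104, '2022-11-15', 700.00),
    (7, 105, '2022-12-05', 250.00);
""")
rows = conn.execute("""
    SELECT user_id, SUM(transaction_amount) AS total_spent
    FROM transactions
    WHERE transaction_date >= '2022-07-01'  -- fixed stand-in for the 6-month window
    GROUP BY user_id
    ORDER BY total_spent DESC
    LIMIT 3;
""").fetchall()
print(rows)  # 104 leads with 700.0; 101 and 102 tie at 500.0
```

Adding `, user_id` to the `ORDER BY` makes the tied ordering deterministic if the interviewer asks.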
• Q.738
Calculate the average transaction amount for each user during the month of December.
Explanation
To find the average transaction amount for each user in December, we need to filter
transactions based on the month and year (December 2022), group by user_id, and then
calculate the average transaction amount using the AVG() function.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE transactions (
transaction_id INT,
user_id INT,
transaction_date DATE,
transaction_amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO transactions (transaction_id, user_id, transaction_date, transaction_amount)
VALUES
(1, 101, '2022-12-01', 150.00),
(2, 102, '2022-12-02', 200.00),
(3, 101, '2022-12-05', 350.00),
(4, 103, '2022-12-10', 400.00),
(5, 104, '2022-12-20', 500.00);

Learnings
• Filtering transactions by month using MONTH() and YEAR()
• Calculating average using AVG()
• Grouping by user_id to get average spend per user


Solutions
• - PostgreSQL solution
SELECT user_id, AVG(transaction_amount) AS average_transaction_amount
FROM transactions
WHERE EXTRACT(MONTH FROM transaction_date) = 12
AND EXTRACT(YEAR FROM transaction_date) = 2022
GROUP BY user_id;
• - MySQL solution
SELECT user_id, AVG(transaction_amount) AS average_transaction_amount
FROM transactions
WHERE MONTH(transaction_date) = 12
AND YEAR(transaction_date) = 2022
GROUP BY user_id;
• Q.739
Determine the total number of transactions and total amount spent for each user in the last
year.
Explanation
This task requires calculating the total number of transactions and the total transaction
amount for each user over the past year. We need to filter transactions within the last 12
months and use COUNT() to get the number of transactions and SUM() to calculate the total
amount spent.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE transactions (
transaction_id INT,
user_id INT,
transaction_date DATE,
transaction_amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO transactions (transaction_id, user_id, transaction_date, transaction_amount)
VALUES
(1, 101, '2022-01-15', 150.00),
(2, 102, '2022-02-20', 200.00),
(3, 101, '2022-05-25', 250.00),
(4, 103, '2022-06-30', 300.00),
(5, 101, '2022-09-10', 100.00),
(6, 102, '2022-10-15', 350.00),
(7, 103, '2022-11-01', 500.00);

Learnings
• Using COUNT() to find the total number of transactions
• Using SUM() to find the total amount spent
• Filtering transactions by date (last 12 months)
• Grouping results by user for aggregated statistics
Solutions
• - PostgreSQL solution
SELECT user_id, COUNT(*) AS total_transactions, SUM(transaction_amount) AS total_spent
FROM transactions
WHERE transaction_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY user_id;
• - MySQL solution
SELECT user_id, COUNT(*) AS total_transactions, SUM(transaction_amount) AS total_spent
FROM transactions
WHERE transaction_date >= CURDATE() - INTERVAL 1 YEAR
GROUP BY user_id;


• Q.740
Find the users who spent more in the first half of the year (January to June) compared to the
second half (July to December) of the same year. Return their user IDs, the total amount
spent in both periods, and the difference.
Explanation
This question involves breaking down transactions into two distinct time periods (first half
and second half of the year) and comparing the total spend in each period for each user. We
need to calculate the total spend for each user in both periods and return users who spent
more in the first half. The query should also return the difference in the amounts spent
between the two periods.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE transactions (
transaction_id INT,
user_id INT,
transaction_date DATE,
transaction_amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO transactions (transaction_id, user_id, transaction_date, transaction_amount)
VALUES
(1, 101, '2022-01-15', 150.00),
(2, 101, '2022-03-01', 200.00),
(3, 101, '2022-06-20', 300.00),
(4, 101, '2022-07-10', 400.00),
(5, 101, '2022-11-05', 500.00),
(6, 102, '2022-01-01', 250.00),
(7, 102, '2022-05-15', 300.00),
(8, 102, '2022-08-12', 200.00),
(9, 102, '2022-12-25', 150.00);

Learnings
• Use of conditional aggregation with CASE statements
• Date-based filtering using MONTH() or EXTRACT()
• Comparing two different groups of aggregated data
• Understanding how to calculate differences across two periods
Solutions
• - PostgreSQL solution
SELECT user_id,
       SUM(CASE WHEN EXTRACT(MONTH FROM transaction_date) BETWEEN 1 AND 6 THEN transaction_amount ELSE 0 END) AS first_half_spend,
       SUM(CASE WHEN EXTRACT(MONTH FROM transaction_date) BETWEEN 7 AND 12 THEN transaction_amount ELSE 0 END) AS second_half_spend,
       SUM(CASE WHEN EXTRACT(MONTH FROM transaction_date) BETWEEN 1 AND 6 THEN transaction_amount ELSE 0 END) -
       SUM(CASE WHEN EXTRACT(MONTH FROM transaction_date) BETWEEN 7 AND 12 THEN transaction_amount ELSE 0 END) AS spend_difference
FROM transactions
GROUP BY user_id
HAVING SUM(CASE WHEN EXTRACT(MONTH FROM transaction_date) BETWEEN 1 AND 6 THEN transaction_amount ELSE 0 END) >
       SUM(CASE WHEN EXTRACT(MONTH FROM transaction_date) BETWEEN 7 AND 12 THEN transaction_amount ELSE 0 END);
• - MySQL solution
SELECT user_id,
       SUM(CASE WHEN MONTH(transaction_date) BETWEEN 1 AND 6 THEN transaction_amount ELSE 0 END) AS first_half_spend,
       SUM(CASE WHEN MONTH(transaction_date) BETWEEN 7 AND 12 THEN transaction_amount ELSE 0 END) AS second_half_spend,
       SUM(CASE WHEN MONTH(transaction_date) BETWEEN 1 AND 6 THEN transaction_amount ELSE 0 END) -
       SUM(CASE WHEN MONTH(transaction_date) BETWEEN 7 AND 12 THEN transaction_amount ELSE 0 END) AS spend_difference
FROM transactions
GROUP BY user_id
HAVING SUM(CASE WHEN MONTH(transaction_date) BETWEEN 1 AND 6 THEN transaction_amount ELSE 0 END) >
       SUM(CASE WHEN MONTH(transaction_date) BETWEEN 7 AND 12 THEN transaction_amount ELSE 0 END);
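The conditional-aggregation pattern translates directly to SQLite, where month extraction uses `strftime('%m', ...)`. A hedged sketch against the Q.740 sample data:

```python
import sqlite3

# Hedged SQLite check of the Q.740 first-half vs second-half comparison;
# strftime('%m', ...) replaces MONTH()/EXTRACT(MONTH ...).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    transaction_id INT,
    user_id INT,
    transaction_date DATE,
    transaction_amount DECIMAL(10, 2)
);
INSERT INTO transactions VALUES
    (1, 101, '2022-01-15', 150.00),
    (2, 101, '2022-03-01', 200.00),
    (3, 101, '2022-06-20', 300.00),
    (4, 101, '2022-07-10', 400.00),
    (5, 101, '2022-11-05', 500.00),
    (6, 102, '2022-01-01', 250.00),
    (7, 102, '2022-05-15', 300.00),
    (8, 102, '2022-08-12', 200.00),
    (9, 102, '2022-12-25', 150.00);
""")
rows = conn.execute("""
    SELECT user_id,
           SUM(CASE WHEN CAST(strftime('%m', transaction_date) AS INTEGER) BETWEEN 1 AND 6
                    THEN transaction_amount ELSE 0 END) AS first_half_spend,
           SUM(CASE WHEN CAST(strftime('%m', transaction_date) AS INTEGER) BETWEEN 7 AND 12
                    THEN transaction_amount ELSE 0 END) AS second_half_spend
    FROM transactions
    GROUP BY user_id
    HAVING first_half_spend > second_half_spend;
""").fetchall()
print(rows)  # [(102, 550.0, 350.0)]
```

User 101 (650 vs 900) drops out; user 102 (550 vs 350) survives with a difference of 200.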

• Q.741

Question: Identifying Duplicate Records


Using a table of customer orders (order_id, customer_id, product_id, order_date), write
a query to find duplicate orders. An order is considered a duplicate if the same customer has
placed the same order for the same product on the same day.

Explanation
To identify duplicate orders:
• Group the records by customer_id, product_id, and order_date.
• Use the HAVING clause to filter groups that appear more than once, which indicates
duplicates.
• Return the duplicate orders' details such as customer_id, product_id, and order_date.

Datasets and SQL Schemas


-- Creating the Orders table
CREATE TABLE customer_orders (
order_id INT,
customer_id INT,
product_id INT,
order_date DATE
);

-- Inserting sample data into the Orders table


INSERT INTO customer_orders (order_id, customer_id, product_id, order_date)
VALUES
(1, 1001, 2001, '2023-01-10'),
(2, 1002, 2002, '2023-01-10'),
(3, 1001, 2001, '2023-01-10'),
(4, 1003, 2003, '2023-01-12'),
(5, 1001, 2001, '2023-01-10'),
(6, 1002, 2001, '2023-01-10');

Learnings
• Grouping records to identify duplicates.
• Using HAVING with COUNT to filter out duplicates.
• Self-join techniques to find specific records.


Solutions

PostgreSQL Solution
SELECT
customer_id,
product_id,
order_date,
COUNT(*) AS duplicate_count
FROM
customer_orders
GROUP BY
customer_id, product_id, order_date
HAVING
COUNT(*) > 1
ORDER BY
customer_id, product_id, order_date;

MySQL Solution
SELECT
customer_id,
product_id,
order_date,
COUNT(*) AS duplicate_count
FROM
customer_orders
GROUP BY
customer_id, product_id, order_date
HAVING
COUNT(*) > 1
ORDER BY
customer_id, product_id, order_date;
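The duplicate-detection query is standard SQL, so it behaves identically in SQLite. Running it against the sample data above (orders 1, 3, and 5 repeat the same customer/product/date triple):

```python
import sqlite3

# Hedged SQLite check of the Q.741 GROUP BY / HAVING duplicate finder.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_orders (
    order_id INT,
    customer_id INT,
    product_id INT,
    order_date DATE
);
INSERT INTO customer_orders VALUES
    (1, 1001, 2001, '2023-01-10'),
    (2, 1002, 2002, '2023-01-10'),
    (3, 1001, 2001, '2023-01-10'),
    (4, 1003, 2003, '2023-01-12'),
    (5, 1001, 2001, '2023-01-10'),
    (6, 1002, 2001, '2023-01-10');
""")
rows = conn.execute("""
    SELECT customer_id, product_id, order_date, COUNT(*) AS duplicate_count
    FROM customer_orders
    GROUP BY customer_id, product_id, order_date
    HAVING COUNT(*) > 1
    ORDER BY customer_id, product_id, order_date;
""").fetchall()
print(rows)  # [(1001, 2001, '2023-01-10', 3)]
```

Only the (1001, 2001, '2023-01-10') group appears more than once, with three copies.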
• Q.742
Question: Identifying Unmatched Employee Logins and Logouts
Using a table of employee login and logout events (event_id, employee_id, event_type,
event_timestamp), write a query to find employees who have a login event without a
corresponding logout event. Each event_type can either be login or logout, and each
employee should have only one login and one logout event at any given time. The output
should list the employee_id and the event_timestamp of their unmatched login event.

Explanation
To identify employees who have logged in but not logged out:
• Join the table with itself to match each login event with a logout event for the same
employee.
• Use a LEFT JOIN to include all login events, even those without a matching logout
event.
• Filter for records where the logout event is missing (NULL).
• Return the employee_id and event_timestamp of the unmatched login events.

Datasets and SQL Schemas


-- Creating the Employee Events table
CREATE TABLE employee_events (
event_id INT,
employee_id INT,
event_type VARCHAR(10), -- 'login' or 'logout'


event_timestamp DATETIME
);

-- Inserting sample data into the Employee Events table


INSERT INTO employee_events (event_id, employee_id, event_type, event_timestamp)
VALUES
(1, 101, 'login', '2023-01-10 08:00:00'),
(2, 102, 'login', '2023-01-10 09:00:00'),
(3, 101, 'logout', '2023-01-10 17:00:00'),
(4, 103, 'login', '2023-01-10 10:00:00'),
(5, 104, 'login', '2023-01-10 11:00:00');

Learnings
• Using JOIN to match related events in the same table.
• The use of LEFT JOIN to capture missing logout events.
• Filtering with IS NULL to detect unmatched events.

Solutions

PostgreSQL Solution
SELECT
e1.employee_id,
e1.event_timestamp AS unmatched_login_timestamp
FROM
employee_events e1
LEFT JOIN
employee_events e2 ON e1.employee_id = e2.employee_id
AND e1.event_type = 'login'
AND e2.event_type = 'logout'
AND e1.event_timestamp < e2.event_timestamp
WHERE
e1.event_type = 'login'
AND e2.event_id IS NULL
ORDER BY
e1.employee_id;

MySQL Solution
SELECT
e1.employee_id,
e1.event_timestamp AS unmatched_login_timestamp
FROM
employee_events e1
LEFT JOIN
employee_events e2 ON e1.employee_id = e2.employee_id
AND e1.event_type = 'login'
AND e2.event_type = 'logout'
AND e1.event_timestamp < e2.event_timestamp
WHERE
e1.event_type = 'login'
AND e2.event_id IS NULL
ORDER BY
e1.employee_id;
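The anti-join (`LEFT JOIN ... WHERE e2.event_id IS NULL`) also runs unchanged in SQLite, since ISO-formatted timestamp strings compare correctly as text. Against the sample data, employees 102, 103, and 104 all have logins with no later logout:

```python
import sqlite3

# Hedged SQLite check of the Q.742 unmatched-login anti-join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_events (
    event_id INT,
    employee_id INT,
    event_type VARCHAR(10),
    event_timestamp DATETIME
);
INSERT INTO employee_events VALUES
    (1, 101, 'login',  '2023-01-10 08:00:00'),
    (2, 102, 'login',  '2023-01-10 09:00:00'),
    (3, 101, 'logout', '2023-01-10 17:00:00'),
    (4, 103, 'login',  '2023-01-10 10:00:00'),
    (5, 104, 'login',  '2023-01-10 11:00:00');
""")
rows = conn.execute("""
    SELECT e1.employee_id, e1.event_timestamp AS unmatched_login_timestamp
    FROM employee_events e1
    LEFT JOIN employee_events e2
        ON e1.employee_id = e2.employee_id
       AND e2.event_type = 'logout'
       AND e1.event_timestamp < e2.event_timestamp
    WHERE e1.event_type = 'login'
      AND e2.event_id IS NULL
    ORDER BY e1.employee_id;
""").fetchall()
print(rows)  # employees 102, 103, 104
```

Employee 101's 08:00 login pairs with the 17:00 logout, so it is correctly excluded.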
• Q.743

Question: Identifying Employees with Shorter Login Durations for Consecutive Days
Write a SQL query to find employees who have logged in for less than 8 hours for two
consecutive days. The table employee_events contains event_id, employee_id,


event_type, and event_timestamp. The events are either login or logout, and each
employee logs in and out once per day. The result should list the employee_id, the
first_day where the login duration was shorter than 8 hours, and the second_day where the
login duration was also shorter than 8 hours.

Explanation
To identify employees with shorter login durations for consecutive days:
• Pair up login and logout events for each employee on the same day.
• Calculate the duration between login and logout for each event.
• Filter the results to include only those where the duration is less than 8 hours.
• Use a self-join to find consecutive days where both days have a login duration of less than
8 hours.
• Return the employee_id, first_day, and second_day.

Datasets and SQL Schemas


-- Creating the Employee Events table
CREATE TABLE employee_events (
event_id INT,
employee_id INT,
event_type VARCHAR(10), -- 'login' or 'logout'
event_timestamp DATETIME
);

-- Inserting sample data into the Employee Events table


INSERT INTO employee_events (event_id, employee_id, event_type, event_timestamp)
VALUES
(1, 101, 'login', '2023-01-01 08:00:00'),
(2, 101, 'logout', '2023-01-01 15:30:00'), -- 7.5 hrs login duration
(3, 101, 'login', '2023-01-02 09:00:00'),
(4, 101, 'logout', '2023-01-02 16:30:00'), -- 7.5 hrs login duration
(5, 102, 'login', '2023-01-01 07:00:00'),
(6, 102, 'logout', '2023-01-01 15:00:00'), -- 8 hrs login duration
(7, 102, 'login', '2023-01-02 08:00:00'),
(8, 102, 'logout', '2023-01-02 16:00:00'); -- 8 hrs login duration

Learnings
• Calculating time differences between login and logout events.
• Using JOINs and self-joins to find consecutive days.
• Filtering on time durations (less than 8 hours).
• Date manipulation to find consecutive events.

Solutions

PostgreSQL Solution
WITH LoginDurations AS (
SELECT
employee_id,
DATE(event_timestamp) AS login_date,
        EXTRACT(EPOCH FROM (MAX(CASE WHEN event_type = 'logout' THEN event_timestamp END)
                          - MIN(CASE WHEN event_type = 'login' THEN event_timestamp END))) / 3600 AS duration_hours

FROM
employee_events
WHERE
event_type IN ('login', 'logout')
GROUP BY
employee_id, DATE(event_timestamp)
)
SELECT
l1.employee_id,
l1.login_date AS first_day,
l2.login_date AS second_day
FROM
LoginDurations l1
JOIN
LoginDurations l2 ON l1.employee_id = l2.employee_id
AND l1.login_date = l2.login_date - INTERVAL '1 day'
WHERE
l1.duration_hours < 8
AND l2.duration_hours < 8
ORDER BY
l1.employee_id, l1.login_date;

MySQL Solution
WITH LoginDurations AS (
SELECT
employee_id,
DATE(event_timestamp) AS login_date,
        TIMESTAMPDIFF(SECOND,
                      MIN(CASE WHEN event_type = 'login' THEN event_timestamp END),
                      MAX(CASE WHEN event_type = 'logout' THEN event_timestamp END)) / 3600 AS duration_hours
FROM
employee_events
WHERE
event_type IN ('login', 'logout')
GROUP BY
employee_id, DATE(event_timestamp)
)
SELECT
l1.employee_id,
l1.login_date AS first_day,
l2.login_date AS second_day
FROM
LoginDurations l1
JOIN
LoginDurations l2 ON l1.employee_id = l2.employee_id
AND l1.login_date = DATE_SUB(l2.login_date, INTERVAL 1 DAY)
WHERE
l1.duration_hours < 8
AND l2.duration_hours < 8
ORDER BY
l1.employee_id, l1.login_date;

Summary of the Solution


• Step 1: Use a CTE (LoginDurations) to calculate the login duration for each employee on
each day by calculating the difference between login and logout times.
• Step 2: Use a self-join to pair consecutive days for the same employee.
• Step 3: Filter for cases where both days have login durations of less than 8 hours.
• Step 4: Return the employee_id, first_day, and second_day where the criteria are met.
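The CTE-plus-self-join pattern can be verified end to end in SQLite, assuming unix-epoch seconds via `strftime('%s', ...)` in place of `EXTRACT(EPOCH ...)` / `TIMESTAMPDIFF`:

```python
import sqlite3

# Hedged SQLite check of the Q.743 consecutive-short-days logic;
# strftime('%s', ...) yields exact integer seconds, avoiding float drift.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_events (
    event_id INT,
    employee_id INT,
    event_type VARCHAR(10),
    event_timestamp DATETIME
);
INSERT INTO employee_events VALUES
    (1, 101, 'login',  '2023-01-01 08:00:00'),
    (2, 101, 'logout', '2023-01-01 15:30:00'),
    (3, 101, 'login',  '2023-01-02 09:00:00'),
    (4, 101, 'logout', '2023-01-02 16:30:00'),
    (5, 102, 'login',  '2023-01-01 07:00:00'),
    (6, 102, 'logout', '2023-01-01 15:00:00'),
    (7, 102, 'login',  '2023-01-02 08:00:00'),
    (8, 102, 'logout', '2023-01-02 16:00:00');
""")
rows = conn.execute("""
    WITH login_durations AS (
        SELECT employee_id,
               DATE(event_timestamp) AS login_date,
               (MAX(CASE WHEN event_type = 'logout'
                         THEN CAST(strftime('%s', event_timestamp) AS INTEGER) END)
              - MIN(CASE WHEN event_type = 'login'
                         THEN CAST(strftime('%s', event_timestamp) AS INTEGER) END)) / 3600.0
                   AS duration_hours
        FROM employee_events
        GROUP BY employee_id, DATE(event_timestamp)
    )
    SELECT d1.employee_id, d1.login_date AS first_day, d2.login_date AS second_day
    FROM login_durations d1
    JOIN login_durations d2
      ON d1.employee_id = d2.employee_id
     AND d2.login_date = DATE(d1.login_date, '+1 day')
    WHERE d1.duration_hours < 8
      AND d2.duration_hours < 8
    ORDER BY d1.employee_id, d1.login_date;
""").fetchall()
print(rows)  # [(101, '2023-01-01', '2023-01-02')]
```

Employee 101 logs 7.5 hours on both days and is flagged; employee 102 logs exactly 8 hours each day, which fails the strict `< 8` test.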
• Q.744


Question: Identifying Employees Working More Than 1 Hour Consecutively for 2 Days
Write a SQL query to identify employees who worked for more than 1 hour on two
consecutive days. For each pair of consecutive days where the total working time (calculated
as the difference between login and logout times) is more than 1 hour, flag the employee.
Return the employee_id, first_day, second_day, and a flag indicating if the condition is
met.

Explanation
To identify employees working for more than 1 hour consecutively for two days:
• Calculate the duration for each login/logout pair for each employee on each day.
• Join the table with itself to find consecutive days for the same employee.
• Filter the records where the total working time on both days exceeds 1 hour.
• Raise a flag for employees meeting the condition.

Datasets and SQL Schemas


-- Creating the Employee Events table
CREATE TABLE employee_events (
event_id INT,
employee_id INT,
event_type VARCHAR(10), -- 'login' or 'logout'
event_timestamp DATETIME
);

-- Inserting sample data into the Employee Events table


INSERT INTO employee_events (event_id, employee_id, event_type, event_timestamp)
VALUES
(1, 101, 'login', '2023-01-01 08:00:00'),
(2, 101, 'logout', '2023-01-01 09:30:00'), -- 1.5 hrs
(3, 101, 'login', '2023-01-02 08:00:00'),
(4, 101, 'logout', '2023-01-02 10:00:00'), -- 2 hrs
(5, 102, 'login', '2023-01-01 08:00:00'),
(6, 102, 'logout', '2023-01-01 09:30:00'), -- 1.5 hrs
(7, 102, 'login', '2023-01-02 08:00:00'),
(8, 102, 'logout', '2023-01-02 09:00:00'); -- 1 hour

Learnings
• Calculating time differences between login and logout.
• Self-joining the table to compare consecutive days.
• Filtering based on working duration (more than 1 hour).
• Using flags to mark employees who meet the criteria.

Solutions

PostgreSQL Solution
WITH WorkingDurations AS (
SELECT
employee_id,
DATE(event_timestamp) AS work_day,

        EXTRACT(EPOCH FROM (MAX(CASE WHEN event_type = 'logout' THEN event_timestamp END)
                          - MIN(CASE WHEN event_type = 'login' THEN event_timestamp END))) / 3600 AS working_hours
FROM
employee_events
WHERE
event_type IN ('login', 'logout')
GROUP BY
employee_id, DATE(event_timestamp)
)
SELECT
w1.employee_id,
w1.work_day AS first_day,
w2.work_day AS second_day,
'Flag' AS flag
FROM
WorkingDurations w1
JOIN
WorkingDurations w2 ON w1.employee_id = w2.employee_id
AND w1.work_day = w2.work_day - INTERVAL '1 day'
WHERE
w1.working_hours > 1
AND w2.working_hours > 1
ORDER BY
w1.employee_id, w1.work_day;

MySQL Solution
WITH WorkingDurations AS (
SELECT
employee_id,
DATE(event_timestamp) AS work_day,
        TIMESTAMPDIFF(SECOND,
                      MIN(CASE WHEN event_type = 'login' THEN event_timestamp END),
                      MAX(CASE WHEN event_type = 'logout' THEN event_timestamp END)) / 3600 AS working_hours
FROM
employee_events
WHERE
event_type IN ('login', 'logout')
GROUP BY
employee_id, DATE(event_timestamp)
)
SELECT
w1.employee_id,
w1.work_day AS first_day,
w2.work_day AS second_day,
'Flag' AS flag
FROM
WorkingDurations w1
JOIN
WorkingDurations w2 ON w1.employee_id = w2.employee_id
AND w1.work_day = DATE_SUB(w2.work_day, INTERVAL 1 DAY)
WHERE
w1.working_hours > 1
AND w2.working_hours > 1
ORDER BY
w1.employee_id, w1.work_day;

Summary of the Solution


• Step 1: Calculate the working duration for each employee on each day by finding the time
difference between login and logout events.
• Step 2: Use a self-join to pair consecutive days for the same employee.
• Step 3: Filter for cases where the total working hours on both days are more than 1 hour.
• Step 4: Flag employees who meet the condition.


This query helps identify employees who are consistently working for more than 1 hour on
two consecutive days, which could be useful for monitoring employee engagement or
workload.
• Q.745
Employee Work-Life Balance Analysis
Question: Write a SQL query to identify employees who are working more than 10 hours on
average per day in the last 7 days. These employees should be flagged as potentially at risk
for burnout. The data is provided in the employee_work_hours table, which tracks the daily
work hours for employees. The result should show the employee_id, average_work_hours,
and a flag indicating that they are at risk.

Explanation
• Calculate the average work hours for each employee over the last 7 days.
• Flag those with an average work hours greater than 10.
• Return the employee ID, their average_work_hours, and a flag indicating they are at risk.

Datasets and SQL Schemas


-- Employee Work Hours Table
CREATE TABLE employee_work_hours (
record_id INT,
employee_id INT,
work_date DATE,
hours_worked INT
);

-- Inserting sample data


INSERT INTO employee_work_hours (record_id, employee_id, work_date, hours_worked)
VALUES
(1, 101, '2023-01-01', 9),
(2, 101, '2023-01-02', 12),
(3, 101, '2023-01-03', 11),
(4, 101, '2023-01-04', 10),
(5, 101, '2023-01-05', 9),
(6, 101, '2023-01-06', 13),
(7, 101, '2023-01-07', 14),
(8, 102, '2023-01-01', 8),
(9, 102, '2023-01-02', 7),
(10, 102, '2023-01-03', 6),
(11, 102, '2023-01-04', 9),
(12, 102, '2023-01-05', 8),
(13, 102, '2023-01-06', 7),
(14, 102, '2023-01-07', 9);

Solution
SELECT
employee_id,
AVG(hours_worked) AS average_work_hours,
'At Risk' AS flag
FROM
employee_work_hours
WHERE
work_date >= CURDATE() - INTERVAL 7 DAY
GROUP BY
employee_id
HAVING
AVG(hours_worked) > 10
ORDER BY
average_work_hours DESC;
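The solution above uses MySQL's `CURDATE()`; the same logic can be sanity-checked in SQLite with a fixed window start ('2023-01-01', an assumption standing in for the rolling 7-day window, since the sample data covers exactly that week):

```python
import sqlite3

# Hedged SQLite check of the Q.745 burnout flag; the fixed date replaces
# CURDATE() - INTERVAL 7 DAY for reproducibility against the sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_work_hours (
    record_id INT,
    employee_id INT,
    work_date DATE,
    hours_worked INT
);
INSERT INTO employee_work_hours VALUES
    (1, 101, '2023-01-01', 9),  (2, 101, '2023-01-02', 12),
    (3, 101, '2023-01-03', 11), (4, 101, '2023-01-04', 10),
    (5, 101, '2023-01-05', 9),  (6, 101, '2023-01-06', 13),
    (7, 101, '2023-01-07', 14),
    (8, 102, '2023-01-01', 8),  (9, 102, '2023-01-02', 7),
    (10, 102, '2023-01-03', 6), (11, 102, '2023-01-04', 9),
    (12, 102, '2023-01-05', 8), (13, 102, '2023-01-06', 7),
    (14, 102, '2023-01-07', 9);
""")
rows = conn.execute("""
    SELECT employee_id, AVG(hours_worked) AS average_work_hours, 'At Risk' AS flag
    FROM employee_work_hours
    WHERE work_date >= '2023-01-01'  -- fixed stand-in for the 7-day window
    GROUP BY employee_id
    HAVING AVG(hours_worked) > 10
    ORDER BY average_work_hours DESC;
""").fetchall()
print(rows)  # employee 101 averages 78/7 ≈ 11.14 hours and is flagged
```

Employee 102 averages 54/7 ≈ 7.71 hours, comfortably under the threshold.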
• Q.746


Identifying Employees with Insufficient Breaks


Question: Write a SQL query to find employees who have taken less than 30 minutes of
break time in total during their workday on two consecutive days. This can indicate
that they are not getting enough rest. You are provided with the employee_breaks table,
which logs break start and end times. For each employee, the query should return the
employee_id, start_date, end_date, and a flag indicating insufficient break duration.

Explanation
• Calculate the total break time for each employee on each day.
• Identify employees who have less than 30 minutes of break on consecutive days.
• Return employee_id, start_date, end_date, and flag them as needing more breaks.

Datasets and SQL Schemas


-- Employee Breaks Table
CREATE TABLE employee_breaks (
break_id INT,
employee_id INT,
break_start DATETIME,
break_end DATETIME
);

-- Inserting sample data


INSERT INTO employee_breaks (break_id, employee_id, break_start, break_end)
VALUES
(1, 101, '2023-01-01 12:00:00', '2023-01-01 12:15:00'),
(2, 101, '2023-01-02 12:10:00', '2023-01-02 12:20:00'),
(3, 101, '2023-01-03 12:00:00', '2023-01-03 12:10:00'),
(4, 102, '2023-01-01 12:00:00', '2023-01-01 12:20:00'),
(5, 102, '2023-01-02 12:05:00', '2023-01-02 12:30:00'),
(6, 102, '2023-01-03 12:00:00', '2023-01-03 12:25:00'),
(7, 103, '2023-01-01 12:00:00', '2023-01-01 12:15:00'),
(8, 103, '2023-01-02 12:10:00', '2023-01-02 12:40:00'),
(9, 103, '2023-01-03 12:00:00', '2023-01-03 12:30:00');

Solution
WITH BreakDurations AS (
SELECT
employee_id,
DATE(break_start) AS break_date,
SUM(TIMESTAMPDIFF(MINUTE, break_start, break_end)) AS total_break_time
FROM
employee_breaks
GROUP BY
employee_id, DATE(break_start)
)
SELECT
b1.employee_id,
b1.break_date AS start_date,
b2.break_date AS end_date,
'Insufficient Breaks' AS flag
FROM
BreakDurations b1
JOIN
BreakDurations b2 ON b1.employee_id = b2.employee_id
AND DATEDIFF(b2.break_date, b1.break_date) = 1
WHERE
b1.total_break_time < 30
AND b2.total_break_time < 30
ORDER BY
b1.employee_id, b1.break_date;
• Q.747

930
1000+ SQL Interview Questions & Answers | By Zero Analyst

Tracking Employee Sick Days


Question: Write a SQL query to find employees who have taken more than 3 sick days in the
past month. Employees with excessive sick days may need extra support. The
employee_sick_days table logs sick leave taken by employees. The query should return the
employee_id, sick_days_taken, and flag them as requiring follow-up.

Explanation
• Calculate the total sick days taken by each employee in the last month.
• Identify employees who have taken more than 3 sick days.
• Return employee_id, sick_days_taken, and flag them as requiring follow-up.

Datasets and SQL Schemas


-- Employee Sick Days Table
CREATE TABLE employee_sick_days (
sick_day_id INT,
employee_id INT,
sick_day_date DATE
);

-- Inserting sample data


INSERT INTO employee_sick_days (sick_day_id, employee_id, sick_day_date)
VALUES
(1, 101, '2023-01-01'),
(2, 101, '2023-01-05'),
(3, 101, '2023-01-10'),
(4, 101, '2023-01-15'),
(5, 102, '2023-01-01'),
(6, 102, '2023-01-03'),
(7, 103, '2023-01-01'),
(8, 103, '2023-01-02');

Solution
SELECT
employee_id,
COUNT(sick_day_date) AS sick_days_taken,
'Follow-up Needed' AS flag
FROM
employee_sick_days
WHERE
sick_day_date >= CURDATE() - INTERVAL 1 MONTH
GROUP BY
employee_id
HAVING
COUNT(sick_day_date) > 3
ORDER BY
sick_days_taken DESC;
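This query also translates directly to SQLite; the sketch below assumes a fixed window start ('2023-01-01') in place of `CURDATE() - INTERVAL 1 MONTH`, since the sample data is from January 2023:

```python
import sqlite3

# Hedged SQLite check of the Q.747 sick-day follow-up flag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_sick_days (
    sick_day_id INT,
    employee_id INT,
    sick_day_date DATE
);
INSERT INTO employee_sick_days VALUES
    (1, 101, '2023-01-01'),
    (2, 101, '2023-01-05'),
    (3, 101, '2023-01-10'),
    (4, 101, '2023-01-15'),
    (5, 102, '2023-01-01'),
    (6, 102, '2023-01-03'),
    (7, 103, '2023-01-01'),
    (8, 103, '2023-01-02');
""")
rows = conn.execute("""
    SELECT employee_id, COUNT(sick_day_date) AS sick_days_taken,
           'Follow-up Needed' AS flag
    FROM employee_sick_days
    WHERE sick_day_date >= '2023-01-01'  -- fixed stand-in for the rolling month
    GROUP BY employee_id
    HAVING COUNT(sick_day_date) > 3
    ORDER BY sick_days_taken DESC;
""").fetchall()
print(rows)  # only employee 101, with 4 sick days, crosses the threshold
```

Employees 102 and 103 each took 2 sick days and are not flagged.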

Key Learnings
• Identifying patterns in work hours to ensure employees maintain a healthy work-life
balance.
• Tracking break durations to ensure employees are taking enough rest.
• Monitoring sick leave patterns to ensure that employees are not over-utilizing sick days,
which could indicate burnout or other issues.
• Q.748
Tracking Employees with Consecutive Late Logins


Question: Write a SQL query to identify employees who have logged in late (after 9 AM) for
3 consecutive days in the past month. These employees might be exhibiting signs of poor
time management or work-life imbalance. The result should show the employee_id,
consecutive_late_days, and flag them as "Time Management Concern".

Explanation
• Track employees who log in after 9 AM.
• Identify employees who have consecutive late logins for 3 or more days.
• Flag these employees as having a time management concern.

Datasets and SQL Schemas


-- Employee Login Table
CREATE TABLE employee_logins (
login_id INT,
employee_id INT,
login_time DATETIME
);

-- Inserting sample data


INSERT INTO employee_logins (login_id, employee_id, login_time)
VALUES
(1, 101, '2023-01-01 09:15:00'),
(2, 101, '2023-01-02 09:20:00'),
(3, 101, '2023-01-03 09:10:00'),
(4, 102, '2023-01-01 09:00:00'),
(5, 102, '2023-01-02 09:05:00'),
(6, 102, '2023-01-03 08:50:00'),
(7, 103, '2023-01-01 09:05:00'),
(8, 103, '2023-01-02 09:30:00'),
(9, 103, '2023-01-03 09:00:00');

Solution
WITH LateLogins AS (
SELECT
employee_id,
DATE(login_time) AS login_date,
IF(TIME(login_time) > '09:00:00', 1, 0) AS is_late
FROM
employee_logins
WHERE
login_time >= CURDATE() - INTERVAL 1 MONTH
)
SELECT
l1.employee_id,
COUNT(*) AS consecutive_late_days,
'Time Management Concern' AS flag
FROM
LateLogins l1
JOIN
LateLogins l2 ON l1.employee_id = l2.employee_id
AND DATEDIFF(l2.login_date, l1.login_date) = 1
JOIN
LateLogins l3 ON l2.employee_id = l3.employee_id
AND DATEDIFF(l3.login_date, l2.login_date) = 1
WHERE
l1.is_late = 1 AND l2.is_late = 1 AND l3.is_late = 1
GROUP BY
l1.employee_id
ORDER BY
consecutive_late_days DESC;
• Q.749
Flagging Employees for Long Working Hours Without Breaks


Question: Write a SQL query to flag employees who have worked for more than 9 hours
without taking any break. You are given the employee_work_hours table (with login and
logout timestamps) and the employee_breaks table (with break start and end times). Identify
employees who have worked continuously for more than 9 hours in a day without any break.

Explanation
• Calculate the total work time for each employee each day.
• Check if there is any break during the working hours.
• Flag employees who worked more than 9 hours without taking a break.

Datasets and SQL Schemas


-- Employee Work Hours Table
CREATE TABLE employee_work_hours (
work_id INT,
employee_id INT,
login_time DATETIME,
logout_time DATETIME
);

-- Employee Breaks Table


CREATE TABLE employee_breaks (
break_id INT,
employee_id INT,
break_start DATETIME,
break_end DATETIME
);

-- Inserting sample data


INSERT INTO employee_work_hours (work_id, employee_id, login_time, logout_time)
VALUES
(1, 101, '2023-01-01 08:00:00', '2023-01-01 18:00:00'),
(2, 101, '2023-01-02 09:00:00', '2023-01-02 19:00:00'),
(3, 102, '2023-01-01 08:00:00', '2023-01-01 17:30:00'),
(4, 103, '2023-01-01 09:00:00', '2023-01-01 18:00:00');

INSERT INTO employee_breaks (break_id, employee_id, break_start, break_end)


VALUES
(1, 101, '2023-01-01 12:00:00', '2023-01-01 12:30:00'),
(2, 102, '2023-01-01 12:00:00', '2023-01-01 12:15:00');

Solution
WITH WorkDurations AS (
SELECT
employee_id,
DATE(login_time) AS work_date,
TIMESTAMPDIFF(HOUR, login_time, logout_time) AS total_work_hours
FROM
employee_work_hours
),
Breaks AS (
SELECT
employee_id,
DATE(break_start) AS break_date,
SUM(TIMESTAMPDIFF(MINUTE, break_start, break_end)) / 60 AS total_break_hours
FROM
employee_breaks
GROUP BY
employee_id, DATE(break_start)
)
SELECT
w.employee_id,
w.work_date,
'No Breaks Taken' AS flag
FROM
WorkDurations w


LEFT JOIN
Breaks b ON w.employee_id = b.employee_id AND w.work_date = b.break_date
WHERE
w.total_work_hours > 9
AND (b.total_break_hours IS NULL OR b.total_break_hours = 0)
ORDER BY
w.employee_id, w.work_date;
• Q.750
Identifying Employees Working on Weekends
Question: Write a SQL query to identify employees who have worked on both Saturday and
Sunday during the last month. Working on weekends can indicate high stress or workload.
The result should show employee_id, weekend_work_days, and a flag indicating they
worked on weekends.

Explanation
• Track all the weekend days (Saturday and Sunday) worked by employees.
• Identify employees who have worked on both Saturday and Sunday in the last month.
• Flag those employees as weekend workers.

Datasets and SQL Schemas


-- Employee Work Hours Table
CREATE TABLE employee_work_hours (
work_id INT,
employee_id INT,
login_time DATETIME,
logout_time DATETIME
);

-- Inserting sample data


INSERT INTO employee_work_hours (work_id, employee_id, login_time, logout_time)
VALUES
(1, 101, '2023-01-01 10:00:00', '2023-01-01 18:00:00'),
(2, 101, '2023-01-07 10:00:00', '2023-01-07 18:00:00'),
(3, 101, '2023-01-08 10:00:00', '2023-01-08 18:00:00'),
(4, 102, '2023-01-07 10:00:00', '2023-01-07 18:00:00'),
(5, 103, '2023-01-07 10:00:00', '2023-01-07 18:00:00'),
(6, 103, '2023-01-08 10:00:00', '2023-01-08 18:00:00');

Solution
WITH WeekendWork AS (
SELECT
employee_id,
DAYOFWEEK(login_time) AS work_day,
DATE(login_time) AS work_date
FROM
employee_work_hours
WHERE
login_time >= CURDATE() - INTERVAL 1 MONTH
)
SELECT
employee_id,
COUNT(DISTINCT work_date) AS weekend_work_days,
'Weekend Worker' AS flag
FROM
WeekendWork
WHERE
work_day IN (1, 7) -- 1 = Sunday, 7 = Saturday
GROUP BY
employee_id
HAVING
COUNT(DISTINCT work_day) = 2 -- both Saturday and Sunday must appear
ORDER BY
employee_id;

934
1000+ SQL Interview Questions & Answers | By Zero Analyst

Key Learnings
• Tracking consecutive behaviors such as late logins or long working hours can help
identify patterns related to employee wellbeing.
• Weekend working can indicate potential work overload or unhealthy work-life balance.
• Breaks and overtime tracking are crucial for identifying employees who might be
overworked or experiencing burnout.
These queries will help HR and managers focus on employees who may need additional
support to ensure their wellbeing and work-life balance.
• Q.751
Question: Write a SQL query to identify employees who have been assigned more than 5
tasks per day for 3 consecutive days in the past month. This may indicate high stress or
workload imbalance. Return the employee_id, consecutive_days, and flag them as "High
Workload".

Explanation
• Track the number of tasks assigned to each employee each day.
• Identify employees who have been assigned more than 5 tasks per day for 3 consecutive
days.
• Flag these employees as having a high workload.

Datasets and SQL Schemas


-- Employee Task Assignments Table
CREATE TABLE employee_tasks (
task_id INT,
employee_id INT,
task_date DATE,
task_description VARCHAR(255)
);

-- Inserting sample data


INSERT INTO employee_tasks (task_id, employee_id, task_date, task_description)
VALUES
(1, 101, '2023-01-01', 'Task A'),
(2, 101, '2023-01-02', 'Task B'),
(3, 101, '2023-01-03', 'Task C'),
(4, 101, '2023-01-04', 'Task D'),
(5, 101, '2023-01-05', 'Task E'),
(6, 102, '2023-01-01', 'Task A'),
(7, 102, '2023-01-02', 'Task B'),
(8, 102, '2023-01-03', 'Task C'),
(9, 103, '2023-01-01', 'Task A'),
(10, 103, '2023-01-02', 'Task B'),
(11, 103, '2023-01-02', 'Task C'),
(12, 103, '2023-01-03', 'Task D');

Solution
WITH TaskCounts AS (
SELECT
employee_id,
task_date,
COUNT(*) AS task_count
FROM
employee_tasks
WHERE
task_date >= CURDATE() - INTERVAL 1 MONTH
GROUP BY
employee_id, task_date
)
-- Each matching t1 row marks the first day of a 3-day streak of more than 5 tasks per day
SELECT DISTINCT
t1.employee_id,
3 AS consecutive_days,
'High Workload' AS flag
FROM
TaskCounts t1
JOIN
TaskCounts t2 ON t1.employee_id = t2.employee_id
AND DATEDIFF(t2.task_date, t1.task_date) = 1
JOIN
TaskCounts t3 ON t2.employee_id = t3.employee_id
AND DATEDIFF(t3.task_date, t2.task_date) = 1
WHERE
t1.task_count > 5 AND t2.task_count > 5 AND t3.task_count > 5
ORDER BY
t1.employee_id;
• Q.752
Flagging Employees with Excessive Overtime (More Than 12 Hours) on Weekdays
Question: Write a SQL query to identify employees who have worked more than 12 hours
on any weekdays in the past month. These employees may be facing work overload or
burnout. Return the employee_id, work_date, and flag them as "Excessive Overtime".

Explanation
• Calculate the work duration for each employee per day.
• Identify days when employees have worked more than 12 hours on weekdays (Monday
to Friday).
• Flag employees who have worked excessive hours as at risk of burnout.

Datasets and SQL Schemas


-- Employee Work Hours Table
CREATE TABLE employee_work_hours (
work_id INT,
employee_id INT,
login_time DATETIME,
logout_time DATETIME
);

-- Inserting sample data


INSERT INTO employee_work_hours (work_id, employee_id, login_time, logout_time)
VALUES
(1, 101, '2023-01-01 08:00:00', '2023-01-01 20:00:00'),
(2, 102, '2023-01-02 08:00:00', '2023-01-02 21:00:00'),
(3, 103, '2023-01-03 09:00:00', '2023-01-03 21:00:00'),
(4, 101, '2023-01-05 09:00:00', '2023-01-05 21:00:00');

Solution
WITH WorkDurations AS (
SELECT
employee_id,
DATE(login_time) AS work_date,
TIMESTAMPDIFF(HOUR, login_time, logout_time) AS work_hours,
DAYOFWEEK(login_time) AS day_of_week
FROM
employee_work_hours
WHERE
login_time >= CURDATE() - INTERVAL 1 MONTH
)
SELECT
employee_id,
work_date,
'Excessive Overtime' AS flag
FROM
WorkDurations
WHERE
work_hours > 12
AND day_of_week BETWEEN 2 AND 6 -- Weekdays: Monday (2) to Friday (6)
ORDER BY
work_date DESC;
• Q.753
Detecting Employees with Consistently Low Engagement (No Activity for 7+ Days)
Question: Write a SQL query to detect employees who have not logged in or performed any
work-related activities (e.g., task assignments or project updates) for 7 or more consecutive
days. These employees may be experiencing disengagement, burnout, or personal issues.
Return the employee_id, first_inactive_date, and flag them as "Low Engagement".

Explanation
• Track employees who have not logged in or performed any work activities for 7+
consecutive days.
• Detect employees who might be disengaged or facing work-related burnout.
• Flag these employees as having low engagement.

Datasets and SQL Schemas


-- Employee Activities Table
CREATE TABLE employee_activities (
activity_id INT,
employee_id INT,
activity_type VARCHAR(255),
activity_date DATE
);

-- Inserting sample data


INSERT INTO employee_activities (activity_id, employee_id, activity_type, activity_date)
VALUES
(1, 101, 'Task Completed', '2023-01-01'),
(2, 101, 'Task Assigned', '2023-01-02'),
(3, 102, 'Task Completed', '2023-01-01'),
(4, 103, 'Task Assigned', '2023-01-04'),
(5, 103, 'Project Update', '2023-01-10');

Solution
WITH ActivityGaps AS (
SELECT
employee_id,
activity_date,
LEAD(activity_date) OVER (PARTITION BY employee_id ORDER BY activity_date) AS next_activity_date
FROM
employee_activities
)
SELECT
employee_id,
MIN(DATE_ADD(activity_date, INTERVAL 1 DAY)) AS first_inactive_date,
'Low Engagement' AS flag
FROM
ActivityGaps
WHERE
DATEDIFF(next_activity_date, activity_date) >= 7 -- a gap of 7+ days with no activity
GROUP BY
employee_id
ORDER BY
first_inactive_date DESC;

937
1000+ SQL Interview Questions & Answers | By Zero Analyst

Key Learnings
• High task load or excessive overtime may indicate employee burnout or poor work-life
balance.
• Identifying disengagement or lack of activity can help address potential workforce
wellbeing issues early.
• Workload tracking and activity analysis help in identifying employees who may need
additional support to maintain optimal wellbeing.
• Q.754

Tracking Employees with Consistently High Task Completion Rate


Question: Write a SQL query to identify employees who have completed more than 90% of
their assigned tasks for 3 consecutive months. These employees are performing at a
consistently high level. Return the employee_id, consecutive_months, and flag them as
"High Performer".

Explanation
• Track task completion rate for each employee on a monthly basis.
• Identify employees who have completed more than 90% of their assigned tasks for 3
consecutive months.
• Flag these employees as "High Performers".

Datasets and SQL Schemas


-- Employee Task Assignments Table
CREATE TABLE employee_tasks (
task_id INT,
employee_id INT,
assigned_date DATE,
task_status VARCHAR(50) -- 'Completed', 'Pending'
);

-- Inserting sample data


INSERT INTO employee_tasks (task_id, employee_id, assigned_date, task_status)
VALUES
(1, 101, '2023-01-01', 'Completed'),
(2, 101, '2023-01-02', 'Completed'),
(3, 101, '2023-01-03', 'Pending'),
(4, 101, '2023-02-01', 'Completed'),
(5, 101, '2023-02-02', 'Completed'),
(6, 101, '2023-03-01', 'Completed'),
(7, 102, '2023-01-01', 'Completed'),
(8, 102, '2023-02-01', 'Pending'),
(9, 102, '2023-03-01', 'Completed');

Solution
WITH TaskCompletionRate AS (
SELECT
employee_id,
EXTRACT(YEAR FROM assigned_date) * 12 + EXTRACT(MONTH FROM assigned_date) AS task_month,
COUNT(*) AS total_tasks,
SUM(CASE WHEN task_status = 'Completed' THEN 1 ELSE 0 END) AS completed_tasks
FROM
employee_tasks
GROUP BY
employee_id, EXTRACT(YEAR FROM assigned_date) * 12 + EXTRACT(MONTH FROM assigned_date)
),
HighPerformers AS (
-- Gaps-and-islands: months in an unbroken run share the same grp value,
-- so counting rows per (employee_id, grp) gives the length of each run
SELECT
employee_id,
COUNT(*) AS consecutive_months
FROM (
SELECT
employee_id,
task_month,
task_month - ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY task_month) AS grp
FROM
TaskCompletionRate
WHERE
completed_tasks * 1.0 / total_tasks > 0.9 -- * 1.0 avoids integer division
) qualifying_months
GROUP BY
employee_id, grp
)
SELECT
employee_id,
MAX(consecutive_months) AS consecutive_months,
'High Performer' AS flag
FROM
HighPerformers
GROUP BY
employee_id
HAVING
MAX(consecutive_months) >= 3
ORDER BY
consecutive_months DESC;

Tracking Employees with Above-Average Performance in a Given Quarter


Question: Write a SQL query to find employees whose task completion rate is above the
average completion rate for their department in a given quarter (e.g., Q1 of 2023). Return
the employee_id, department_id, and flag them as "Above Average Performer".

Explanation
• Calculate the average task completion rate for each department in a given quarter.
• Identify employees whose task completion rate exceeds the department's average for that
quarter.
• Flag these employees as "Above Average Performers".

Datasets and SQL Schemas


-- Employee Task Assignments Table
CREATE TABLE employee_tasks (
task_id INT,
employee_id INT,
assigned_date DATE,
task_status VARCHAR(50) -- 'Completed', 'Pending'
);

-- Inserting sample data


INSERT INTO employee_tasks (task_id, employee_id, assigned_date, task_status)
VALUES
(1, 101, '2023-01-01', 'Completed'),
(2, 101, '2023-01-02', 'Completed'),
(3, 101, '2023-01-03', 'Pending'),
(4, 101, '2023-02-01', 'Completed'),
(5, 101, '2023-02-02', 'Completed'),
(6, 101, '2023-03-01', 'Completed'),
(7, 102, '2023-01-01', 'Completed'),
(8, 102, '2023-02-01', 'Pending'),
(9, 102, '2023-03-01', 'Completed');

Solution
WITH DepartmentCompletionRate AS (
SELECT
e.department_id,
EXTRACT(QUARTER FROM et.assigned_date) AS task_quarter,
AVG(CASE WHEN et.task_status = 'Completed' THEN 1 ELSE 0 END) AS avg_completion_rate
FROM
employees e
JOIN
employee_tasks et ON e.employee_id = et.employee_id
WHERE
et.assigned_date BETWEEN '2023-01-01' AND '2023-03-31' -- Q1 2023
GROUP BY
e.department_id, EXTRACT(QUARTER FROM et.assigned_date)
),
EmployeePerformance AS (
SELECT
e.employee_id,
e.department_id,
COUNT(*) AS total_tasks,
SUM(CASE WHEN et.task_status = 'Completed' THEN 1 ELSE 0 END) AS completed_tasks,
(SUM(CASE WHEN et.task_status = 'Completed' THEN 1 ELSE 0 END) * 1.0 / COUNT(*)) AS task_completion_rate -- * 1.0 avoids integer division
FROM
employees e
JOIN
employee_tasks et ON e.employee_id = et.employee_id
WHERE
et.assigned_date BETWEEN '2023-01-01' AND '2023-03-31'
GROUP BY
e.employee_id, e.department_id
)
SELECT
ep.employee_id,
ep.department_id,
'Above Average Performer' AS flag
FROM
EmployeePerformance ep
JOIN
DepartmentCompletionRate dcr ON ep.department_id = dcr.department_id
WHERE
ep.task_completion_rate > dcr.avg_completion_rate
ORDER BY
ep.task_completion_rate DESC;

Key Learnings
• Task completion rates are a useful metric for tracking employee performance,
highlighting high performers and underperformers.
• Consecutive months of high task completion or a decline in performance can be critical
indicators of employee motivation or burnout.
• Understanding employees' relative performance within their department helps in fostering
a competitive and motivating work environment.
These queries help to monitor employee productivity and engagement, ultimately aiding in
the improvement of overall organizational performance and ensuring employee wellbeing.
• Q.755
Identifying Employees with Declining Performance Over the Past 6 Months
Question: Write a SQL query to identify employees whose task completion rate has
declined by more than 20% over the past 6 months. Return the employee_id,
previous_rate, current_rate, and flag them as "Declining Performance".

Explanation
• Track task completion rates for employees over the past 6 months.
• Calculate the percentage change in task completion rate.
• Flag employees whose performance has declined by more than 20%.


Datasets and SQL Schemas


-- Employee Task Assignments Table
CREATE TABLE employee_tasks (
task_id INT,
employee_id INT,
assigned_date DATE,
task_status VARCHAR(50) -- 'Completed', 'Pending'
);

-- Inserting sample data


INSERT INTO employee_tasks (task_id, employee_id, assigned_date, task_status)
VALUES
(1, 101, '2023-01-01', 'Completed'),
(2, 101, '2023-01-02', 'Completed'),
(3, 101, '2023-01-03', 'Pending'),
(4, 101, '2023-02-01', 'Completed'),
(5, 101, '2023-02-02', 'Completed'),
(6, 101, '2023-03-01', 'Completed'),
(7, 102, '2023-01-01', 'Completed'),
(8, 102, '2023-02-01', 'Pending'),
(9, 102, '2023-03-01', 'Completed');

Solution
WITH MonthlyCompletionRates AS (
SELECT
employee_id,
EXTRACT(MONTH FROM assigned_date) AS task_month,
COUNT(*) AS total_tasks,
SUM(CASE WHEN task_status = 'Completed' THEN 1 ELSE 0 END) AS completed_tasks
FROM
employee_tasks
WHERE
assigned_date >= CURDATE() - INTERVAL 6 MONTH
GROUP BY
employee_id, EXTRACT(MONTH FROM assigned_date)
),
PerformanceChange AS (
SELECT
employee_id,
MAX(CASE WHEN task_month = EXTRACT(MONTH FROM CURDATE() - INTERVAL 1 MONTH)
    THEN completed_tasks / total_tasks ELSE 0 END) AS previous_rate,
MAX(CASE WHEN task_month = EXTRACT(MONTH FROM CURDATE())
    THEN completed_tasks / total_tasks ELSE 0 END) AS current_rate
FROM
MonthlyCompletionRates
GROUP BY
employee_id
)
SELECT
employee_id,
previous_rate,
current_rate,
'Declining Performance' AS flag
FROM
PerformanceChange
WHERE
(previous_rate - current_rate) > 0.2
ORDER BY
current_rate ASC;
• Q.756

Question
Calculate the Click-Through Rate (CTR) for Ernst & Young Webinars and identify the
webinar with the highest CTR.

Explanation


You need to calculate the click-through rate (CTR), which is the ratio of users who clicked on
an ad to the users who saw the ad. Then, determine which webinar has the highest CTR. The
relevant data is stored across three tables: ad_impressions, ad_clicks, and
webinar_registrations. You need to join these tables and calculate the CTR for each webinar.

Datasets and SQL Schemas


Table creation:
-- ad_impressions table creation
CREATE TABLE ad_impressions (
impression_id INT,
webinar_id INT,
date TIMESTAMP,
user_id INT
);

-- ad_clicks table creation


CREATE TABLE ad_clicks (
click_id INT,
webinar_id INT,
date TIMESTAMP,
user_id INT
);

-- webinar_registrations table creation


CREATE TABLE webinar_registrations (
registration_id INT,
webinar_id INT,
date TIMESTAMP,
user_id INT
);

Datasets:
-- Insert data into ad_impressions
INSERT INTO ad_impressions (impression_id, webinar_id, date, user_id)
VALUES
(5621, 1011, '2022-10-01 00:00:00', 124),
(5622, 1920, '2022-10-01 00:00:00', 278),
(5623, 1011, '2022-10-01 00:00:00', 345),
(5624, 1011, '2022-10-01 00:00:00', 234),
(5625, 1920, '2022-10-01 00:00:00', 678);

-- Insert data into ad_clicks


INSERT INTO ad_clicks (click_id, webinar_id, date, user_id)
VALUES
(3952, 1011, '2022-10-01 00:00:00', 124),
(3953, 1920, '2022-10-01 00:00:00', 278),
(3954, 1011, '2022-10-01 00:00:00', 345);

-- Insert data into webinar_registrations


INSERT INTO webinar_registrations (registration_id, webinar_id, date, user_id)
VALUES
(182, 1011, '2022-10-01 00:00:00', 124),
(276, 1920, '2022-10-01 00:00:00', 278),
(930, 1011, '2022-10-02 00:00:00', 345),
(431, 1011, '2022-10-02 00:00:00', 234),
(590, 1920, '2022-10-02 00:00:00', 678);

Learnings
• Joins: Using JOIN to combine data from different tables based on common fields
(webinar_id and user_id).
• DISTINCT: Ensuring unique user counts for impressions, clicks, and registrations.
• Aggregations: Using COUNT() to aggregate impressions, clicks, and registrations.
• Subqueries: Using a WITH clause for intermediate calculations before the final result.


• Calculations: Performing arithmetic operations to calculate the CTR.

Solutions
PostgreSQL Solution:
WITH CTR AS (
SELECT
I.webinar_id,
COUNT(DISTINCT I.user_id) AS impressions,
COUNT(DISTINCT C.user_id) AS clicks,
COUNT(DISTINCT R.user_id) AS registrations
FROM
ad_impressions I
-- LEFT JOINs keep users who saw the ad but never clicked or registered;
-- inner joins would make impressions = clicks and force the CTR to 100%
LEFT JOIN
ad_clicks C ON I.webinar_id = C.webinar_id AND I.user_id = C.user_id AND I.date <= C.date
LEFT JOIN
webinar_registrations R ON C.webinar_id = R.webinar_id AND C.user_id = R.user_id AND C.date <= R.date
GROUP BY
I.webinar_id
)

SELECT
webinar_id,
impressions,
clicks,
registrations,
(clicks::float / impressions::float * 100) AS click_through_rate
FROM
CTR
ORDER BY
click_through_rate DESC;

MySQL Solution:
WITH CTR AS (
SELECT
I.webinar_id,
COUNT(DISTINCT I.user_id) AS impressions,
COUNT(DISTINCT C.user_id) AS clicks,
COUNT(DISTINCT R.user_id) AS registrations
FROM
ad_impressions I
-- LEFT JOINs keep users who saw the ad but never clicked or registered;
-- inner joins would make impressions = clicks and force the CTR to 100%
LEFT JOIN
ad_clicks C ON I.webinar_id = C.webinar_id AND I.user_id = C.user_id AND I.date <= C.date
LEFT JOIN
webinar_registrations R ON C.webinar_id = R.webinar_id AND C.user_id = R.user_id AND C.date <= R.date
GROUP BY
I.webinar_id
)

SELECT
webinar_id,
impressions,
clicks,
registrations,
(clicks / impressions * 100) AS click_through_rate
FROM
CTR
ORDER BY
click_through_rate DESC;
• Q.757
Identifying Project Delays and Resource Allocation Issues


Question: Write a SQL query to identify projects that have experienced a delay of more
than 15% in their original estimated duration. For each delayed project, calculate the
additional time it took beyond the estimated duration, and identify whether the delay was
due to under- or over-allocation of resources. Return the project_id, estimated_duration,
actual_duration, over_or_under_allocation, and flag them as "Delayed Project".

Explanation
• Track the estimated duration and actual duration of each project.
• Calculate the percentage of delay for each project.
• Determine if the delay is due to under-allocation (fewer resources than expected) or over-allocation (more resources than expected).
• Flag projects that have experienced delays of more than 15%.

Datasets and SQL Schemas


-- Projects Table
CREATE TABLE projects (
project_id INT,
project_name VARCHAR(255),
estimated_duration INT, -- in days
actual_duration INT -- in days
);

-- Resource Allocation Table


CREATE TABLE resource_allocation (
project_id INT,
employee_id INT,
allocated_hours INT,
actual_hours INT
);

-- Inserting sample data


INSERT INTO projects (project_id, project_name, estimated_duration, actual_duration)
VALUES
(1, 'Project A', 30, 35),
(2, 'Project B', 45, 50),
(3, 'Project C', 60, 55),
(4, 'Project D', 20, 25);

INSERT INTO resource_allocation (project_id, employee_id, allocated_hours, actual_hours)


VALUES
(1, 101, 160, 150),
(2, 102, 180, 200),
(3, 103, 240, 220),
(4, 104, 120, 130);

Solution
WITH DelayAnalysis AS (
SELECT
p.project_id,
p.estimated_duration,
p.actual_duration,
(p.actual_duration - p.estimated_duration) * 100.0 / p.estimated_duration AS delay_percentage, -- * 100.0 avoids integer division
SUM(CASE WHEN ra.allocated_hours < ra.actual_hours THEN 1 ELSE 0 END) AS under_allocation,
SUM(CASE WHEN ra.allocated_hours > ra.actual_hours THEN 1 ELSE 0 END) AS over_allocation
FROM
projects p
LEFT JOIN
resource_allocation ra ON p.project_id = ra.project_id
GROUP BY
p.project_id, p.estimated_duration, p.actual_duration
)
SELECT
da.project_id,
da.estimated_duration,
da.actual_duration,
CASE
WHEN da.under_allocation > da.over_allocation THEN 'Under-Allocation'
WHEN da.over_allocation > da.under_allocation THEN 'Over-Allocation'
ELSE 'Balanced'
END AS over_or_under_allocation,
'Delayed Project' AS flag
FROM
DelayAnalysis da
WHERE
da.delay_percentage > 15
ORDER BY
da.delay_percentage DESC;

Key Learnings
• Project delays and resource allocation issues often correlate with employee burnout and
performance declines. Tracking these issues is critical for maintaining a healthy work
environment.
• Monitoring employee efficiency helps identify potential overwork issues, ensuring that
employees are not being overloaded with tasks.
• Skewed resource allocation can lead to unbalanced workloads and performance
inefficiencies. Identifying and addressing this helps in redistributing work more evenly across
teams.
• Q.758
Tracking Employee Project Assignment Efficiency
Question: Write a SQL query to calculate the efficiency of each employee on projects,
defined as the ratio of actual hours worked to allocated hours on all their projects. Identify
employees whose efficiency is greater than 1.2 (i.e., they worked more than 120% of their
allocated hours). Return the employee_id, total_allocated_hours,
total_actual_hours, and flag them as "Overworked".

Explanation
• Calculate the total allocated hours and actual hours for each employee across all projects.
• Calculate the efficiency as the ratio of actual hours to allocated hours.
• Flag employees whose efficiency is greater than 1.2 as "Overworked".

Datasets and SQL Schemas


-- Resource Allocation Table (same as above)
-- Inserting sample data
INSERT INTO resource_allocation (project_id, employee_id, allocated_hours, actual_hours)
VALUES
(1, 101, 160, 170),
(1, 102, 180, 160),
(2, 101, 200, 250),
(3, 103, 240, 260),
(4, 104, 120, 150);

Solution
SELECT
employee_id,
SUM(allocated_hours) AS total_allocated_hours,
SUM(actual_hours) AS total_actual_hours,
(SUM(actual_hours) * 1.0 / SUM(allocated_hours)) AS efficiency, -- * 1.0 avoids integer division
CASE
WHEN SUM(actual_hours) * 1.0 / SUM(allocated_hours) > 1.2 THEN 'Overworked'
ELSE 'Balanced'
END AS flag
FROM
resource_allocation
GROUP BY
employee_id
HAVING
(SUM(actual_hours) * 1.0 / SUM(allocated_hours)) > 1.2
ORDER BY
efficiency DESC;
• Q.759

Question
Analyzing Consulting Project Performance
Ernst & Young (EY) is a multinational professional services firm and is one of the "Big
Four" accounting firms. EY is involved with a number of consulting projects with different
clients and the stakeholder wants to understand the consulting project performance. You are
provided with the following two tables:
• projects - Each row records information of a project involving a specific client.
• billing - Each row records information of billing to the clients for each project.
The stakeholder would like to know:
• The total billing amount for each project.
• The average monthly billing amount for each project.
Write a SQL PostgreSQL query to help answer the stakeholder's questions.

Explanation
• Join the projects table with the billing table based on project_id to retrieve billing
information for each project.
• Calculate the total billing amount for each project using SUM().
• Calculate the average monthly billing for each project by dividing the total billing amount
by the number of months between the project_start_date and project_end_date.
• Use EXTRACT() to get the year and month part of the dates for calculating the total number
of months between the start and end date of each project.
• Group the results by project_id and project_name to ensure you get the summary for
each project.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE projects (
project_id INT,
client_id INT,
project_name VARCHAR(255),
project_start_date DATE,
project_end_date DATE
);

CREATE TABLE billing (
billing_id INT,
project_id INT,
billing_date DATE,
billing_amount DECIMAL(10, 2)
);

-- Sample data for projects


INSERT INTO projects (project_id, client_id, project_name, project_start_date, project_end_date)
VALUES
(1, 101, 'Project A', '2019-01-01', '2019-06-01'),
(2, 102, 'Project B', '2019-05-01', '2019-11-01'),
(3, 103, 'Project C', '2019-10-01', '2020-03-01');

-- Sample data for billing


INSERT INTO billing (billing_id, project_id, billing_date, billing_amount)
VALUES
(1, 1, '2019-02-01', 5000),
(2, 1, '2019-05-01', 5000),
(3, 2, '2019-06-01', 4000),
(4, 2, '2019-08-01', 4000),
(5, 3, '2019-12-01', 6000);

Solutions

Solution (PostgreSQL)
SELECT
projects.project_id,
projects.project_name,
SUM(billing.billing_amount) AS total_billing,
(SUM(billing.billing_amount) /
((EXTRACT(YEAR FROM projects.project_end_date) - EXTRACT(YEAR FROM projects.project_start_date)) * 12 +
(EXTRACT(MONTH FROM projects.project_end_date) - EXTRACT(MONTH FROM projects.project_start_date)))) AS avg_monthly_billing
FROM
projects
INNER JOIN
billing ON projects.project_id = billing.project_id
GROUP BY
projects.project_id,
projects.project_name,
projects.project_start_date,
projects.project_end_date;

Learnings
• Joins: This problem involves an INNER JOIN between the projects and billing tables to
get the combined data for each project.
• Aggregation: The use of SUM() aggregates the total billing amount for each project.
• Date Calculations: The EXTRACT() function is used to calculate the number of months
between project_start_date and project_end_date. This allows the calculation of the
average monthly billing.
• Grouping: The query groups by project_id and project_name to get the results per
project.

• Q.760

Question
Identify Ernst & Young's Top Billing Clients


Ernst & Young (EY) is a multinational professional services network. It primarily provides
assurance (including financial audit), tax, consulting, and advisory services to its clients. For
EY, a VIP user or "whale" would be a client that has substantial expenditures with EY in
terms of billing amounts. You are given a database that contains two tables: a "clients" table
and a "billings" table.
Write a SQL query to identify EY's top 10 clients in terms of total billed amount in the past
year.

Explanation
• The task requires identifying the top 10 clients based on the total amount billed in the past
year.
• You will need to join the clients table with the billings table using the client_id
field.
• Filter the billings records to only include those within the past year.
• Sum the total billed amount for each client and order the results by the total billed amount
in descending order.
• Return the top 10 clients with the highest billed amounts.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE clients (
client_id INT,
client_name VARCHAR(255),
industry VARCHAR(255)
);

CREATE TABLE billings (
bill_id INT,
client_id INT,
billing_date DATE,
amount DECIMAL(10, 2)
);

-- Sample data for clients


INSERT INTO clients (client_id, client_name, industry)
VALUES
(100, 'ABC Corp', 'Technology'),
(101, 'DEF Inc', 'Healthcare'),
(102, 'GHI LLC', 'Finance');

-- Sample data for billings


INSERT INTO billings (bill_id, client_id, billing_date, amount)
VALUES
(200, 100, '2022-06-22', 50000.00),
(201, 101, '2022-07-13', 75000.00),
(202, 102, '2022-05-08', 36000.00),
(203, 100, '2022-11-15', 25000.00),
(204, 102, '2022-09-05', 40000.00);

Solutions

Solution (PostgreSQL, MySQL)


SELECT c.client_name, SUM(b.amount) AS total_billed
FROM clients AS c
JOIN billings AS b
ON c.client_id = b.client_id
WHERE b.billing_date BETWEEN (CURRENT_DATE - INTERVAL '1 year') AND CURRENT_DATE -- MySQL: CURDATE() - INTERVAL 1 YEAR
GROUP BY c.client_name
ORDER BY total_billed DESC
LIMIT 10;

Learnings
• Joins: This problem involves using a JOIN to combine data from two tables (clients and
billings) based on the client_id.
• Filtering: The query uses a date filter (WHERE b.billing_date BETWEEN ...) to ensure
only billing records from the last year are considered.
• Aggregation: The query uses SUM() to aggregate the billing amounts for each client.
• Ordering and Limiting: Sorting by total_billed in descending order helps identify the
top clients, and LIMIT 10 ensures we get the top 10 results.

Capgemini
• Q.761
Find Employees Who Report to More Than One Manager
Question:
Write a SQL query to find employees who report to more than one manager.
Explanation:
You need to:
• Identify employees with multiple manager IDs.
• Group the results by employee and use the HAVING clause to filter out employees with only
one manager.

Datasets and SQL Schemas

Employees Table
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
manager_id INT
);

INSERT INTO employees (employee_id, name, manager_id) VALUES


(1, 'Alice', 100),
(2, 'Bob', 101),
(3, 'Charlie', 100),
(4, 'David', 101),
(5, 'Eva', 102),
(6, 'Frank', 100),
(7, 'Grace', 102);

SQL Solution:
SELECT employee_id, name
FROM employees
GROUP BY employee_id, name
HAVING COUNT(DISTINCT manager_id) > 1;
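One caveat: in the schema above, employee_id is the primary key and each row stores a single manager_id, so the query can never return a row. The question normally assumes a many-to-many reporting relationship; the sketch below uses a hypothetical employee_managers table (not part of the original dataset) to show the intended pattern:

```sql
-- Hypothetical mapping table (assumption): one row per (employee, manager) pair
CREATE TABLE employee_managers (
    employee_id INT,
    manager_id INT
);

INSERT INTO employee_managers (employee_id, manager_id) VALUES
(1, 100),
(1, 101), -- employee 1 reports to two managers
(2, 101),
(3, 100),
(3, 102); -- employee 3 reports to two managers

-- Same GROUP BY / HAVING pattern as above, applied to the mapping table
SELECT employee_id
FROM employee_managers
GROUP BY employee_id
HAVING COUNT(DISTINCT manager_id) > 1;
```

With this sample data, the query returns employees 1 and 3.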

• Q.762


Find the Employees Who Have No Colleagues in the Same Department


Question:
Write a SQL query to find the employees who do not have any colleagues in the same
department (i.e., they are the only ones in that department).
Explanation:
To solve this, you need to:
• Identify employees who belong to departments that only have one employee.
• Use a GROUP BY clause to count the number of employees in each department and filter for
those with a count of one.

Datasets and SQL Schemas

Employees Table
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(100)
);

INSERT INTO employees (employee_id, name, department) VALUES


(1, 'Alice', 'HR'),
(2, 'Bob', 'Engineering'),
(3, 'Charlie', 'Engineering'),
(4, 'David', 'HR'),
(5, 'Eva', 'Marketing'),
(6, 'Frank', 'Finance');

SQL Solution:
SELECT name, department
FROM employees
WHERE department NOT IN (
SELECT department
FROM employees
GROUP BY department
HAVING COUNT(*) > 1
);

Key Takeaways for the Questions:


• Subqueries: Useful for filtering based on a condition derived from another query.
• Window Functions: ROW_NUMBER() helps rank rows and is often used for "N-th highest"
type problems.
• GROUP BY and HAVING: Essential for aggregation and filtering based on group
conditions.
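To make the ROW_NUMBER() point concrete, here is the classic "N-th highest" pattern as a sketch; an employees table with a salary column is assumed purely for illustration:

```sql
-- Rank rows by salary; rn = 2 selects the 2nd-highest earner
SELECT employee_id, salary
FROM (
    SELECT
        employee_id,
        salary,
        ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn
    FROM employees
) ranked
WHERE rn = 2;
```

Use DENSE_RANK() instead of ROW_NUMBER() if tied salaries should share a rank.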
• Q.763
Question
Identify the "Power Users" from Capgemini's database. Power Users are defined as clients
who use Capgemini's software more than 5 times per week on average.
Explanation


To solve this, you need to join the employees and usage_logs tables. For each client,
calculate the average usage per week, and filter the results to show only those clients with an
average usage greater than 5 times per week.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE employees (
employee_id INT,
client_name VARCHAR(255)
);

CREATE TABLE usage_logs (


log_id INT,
employee_id INT,
login_date DATE
);
• - Datasets
-- employees data
INSERT INTO employees (employee_id, client_name)
VALUES
(2431, 'ABC Consultants'),
(3716, 'Zeta Corporation'),
(4293, 'Gamma Enterprises'),
(5632, 'Delta Services'),
(6427, 'Beta Technologies');

-- usage_logs data
INSERT INTO usage_logs (log_id, employee_id, login_date)
VALUES
(7381, 2431, '2022-07-05'),
(8127, 3716, '2022-07-06'),
(6743, 4293, '2022-07-06'),
(9823, 5632, '2022-07-07'),
(6257, 6427, '2022-07-07');

Learnings
• Joins: Combine data from multiple tables.
• Aggregation: Calculate average usage.
• Filtering: Use WHERE and HAVING for conditions.
• Date functions: Use date ranges to limit the data.
Solutions
• - PostgreSQL solution
SELECT e.client_name, AVG(u.usage_count) AS avg_usage
FROM employees e
LEFT JOIN (
SELECT employee_id, COUNT(*) AS usage_count
FROM usage_logs
WHERE login_date >= CURRENT_DATE - INTERVAL '1 week'
GROUP BY employee_id
) u ON e.employee_id = u.employee_id
GROUP BY e.client_name
HAVING AVG(u.usage_count) > 5;
• - MySQL solution
SELECT e.client_name, AVG(u.usage_count) AS avg_usage
FROM employees e
LEFT JOIN (
SELECT employee_id, COUNT(*) AS usage_count
FROM usage_logs
WHERE login_date >= CURDATE() - INTERVAL 1 WEEK
GROUP BY employee_id
) u ON e.employee_id = u.employee_id
GROUP BY e.client_name
HAVING AVG(u.usage_count) > 5;


• Q.764

Question
Write a SQL query to identify employees who have been absent for three or more
consecutive days from the "attendance" table.

Explanation
The query uses the ROW_NUMBER() function to assign sequential numbers to attendance
records, and then calculates a grp value to identify consecutive days. It groups by this grp
value and filters those groups where the count of absences is three or more.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE attendance (
employee_id INT,
attendance_date DATE,
status VARCHAR(10)
);
• - Datasets
INSERT INTO attendance (employee_id, attendance_date, status)
VALUES
(1, '2025-01-01', 'absent'),
(1, '2025-01-02', 'absent'),
(1, '2025-01-03', 'absent'),
(2, '2025-01-01', 'present'),
(2, '2025-01-02', 'absent'),
(2, '2025-01-03', 'absent'),
(2, '2025-01-04', 'absent');

Learnings
• Using ROW_NUMBER() to assign sequential numbers.
• Identifying consecutive runs (the gaps-and-islands pattern) by pairing ROW_NUMBER() with date arithmetic.
• Grouping by calculated values and applying conditions to filter results.

Solutions
• - PostgreSQL solution
SELECT employee_id, MIN(attendance_date) AS start_date, MAX(attendance_date) AS end_date
FROM (
    SELECT
        employee_id,
        attendance_date,
        -- subtracting the row number from the date yields a constant value for each
        -- run of consecutive days, and works across month boundaries
        attendance_date - (ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY attendance_date))::int AS grp
    FROM attendance
    WHERE status = 'absent'
) AS sub
GROUP BY employee_id, grp
HAVING COUNT(*) >= 3;
• - MySQL solution
SELECT employee_id, MIN(attendance_date) AS start_date, MAX(attendance_date) AS end_date
FROM (
    SELECT
        employee_id,
        attendance_date,
        ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY attendance_date)
            - DATEDIFF(attendance_date, '1970-01-01') AS grp
    FROM attendance
    WHERE status = 'absent'
) AS sub
GROUP BY employee_id, grp
HAVING COUNT(*) >= 3;
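The gaps-and-islands pattern above is easy to sanity-check locally. The sketch below runs the same query against SQLite through Python's sqlite3 module (an illustrative assumption, not the book's target dialects; SQLite 3.25+ is needed for window functions, and julianday() stands in for DATEDIFF against a fixed epoch):

```python
import sqlite3

# In-memory database loaded with the Q.764 sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attendance (employee_id INT, attendance_date DATE, status VARCHAR(10));
INSERT INTO attendance VALUES
 (1,'2025-01-01','absent'),(1,'2025-01-02','absent'),(1,'2025-01-03','absent'),
 (2,'2025-01-01','present'),(2,'2025-01-02','absent'),
 (2,'2025-01-03','absent'),(2,'2025-01-04','absent');
""")

# Row number minus day number is constant within a run of consecutive dates.
rows = conn.execute("""
SELECT employee_id, MIN(attendance_date) AS start_date, MAX(attendance_date) AS end_date
FROM (
    SELECT employee_id, attendance_date,
           ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY attendance_date)
             - CAST(julianday(attendance_date) AS INT) AS grp
    FROM attendance
    WHERE status = 'absent'
) sub
GROUP BY employee_id, grp
HAVING COUNT(*) >= 3
ORDER BY employee_id
""").fetchall()
print(rows)  # [(1, '2025-01-01', '2025-01-03'), (2, '2025-01-02', '2025-01-04')]
```

Both absence runs of three or more days are reported, including employee 2's run that begins after a 'present' day.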
• Q.765
Find the Product with the Most Uniquely Sold Items
Question:
Write a SQL query to find the product that has the most unique items sold (based on distinct
customer_id per product) in the past month.
Explanation:
• Join the sales table with the products table to get the product names.
• Group by product and calculate the count of distinct customers who bought each product.
• Return the product with the maximum number of unique buyers.

Datasets and SQL Schemas

Sales Table
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
product_id INT,
customer_id INT,
sale_date DATE
);

INSERT INTO sales (sale_id, product_id, customer_id, sale_date) VALUES


(1, 101, 1, '2023-09-01'),
(2, 102, 2, '2023-09-02'),
(3, 101, 3, '2023-09-03'),
(4, 101, 1, '2023-09-10'),
(5, 103, 4, '2023-09-15'),
(6, 101, 2, '2023-09-20'),
(7, 102, 3, '2023-09-21'),
(8, 101, 5, '2023-09-25'),
(9, 102, 4, '2023-09-26');

Products Table
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100)
);

INSERT INTO products (product_id, product_name) VALUES


(101, 'Galaxy S23'),
(102, 'Galaxy Buds'),
(103, 'Galaxy Watch');

SQL Solution:
SELECT p.product_name
FROM products p
JOIN sales s ON p.product_id = s.product_id
WHERE s.sale_date BETWEEN '2023-09-01' AND '2023-09-30'
GROUP BY p.product_id, p.product_name
ORDER BY COUNT(DISTINCT s.customer_id) DESC
LIMIT 1;
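A quick local check of the query and its expected winner, via Python's sqlite3 (an illustrative assumption; the distinct-buyer count is added to the SELECT here so the margin is visible):

```python
import sqlite3

# Load the Q.765 sample data into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INT PRIMARY KEY, product_id INT, customer_id INT, sale_date DATE);
INSERT INTO sales VALUES
 (1,101,1,'2023-09-01'),(2,102,2,'2023-09-02'),(3,101,3,'2023-09-03'),
 (4,101,1,'2023-09-10'),(5,103,4,'2023-09-15'),(6,101,2,'2023-09-20'),
 (7,102,3,'2023-09-21'),(8,101,5,'2023-09-25'),(9,102,4,'2023-09-26');
CREATE TABLE products (product_id INT PRIMARY KEY, product_name VARCHAR(100));
INSERT INTO products VALUES (101,'Galaxy S23'),(102,'Galaxy Buds'),(103,'Galaxy Watch');
""")

row = conn.execute("""
SELECT p.product_name, COUNT(DISTINCT s.customer_id) AS unique_buyers
FROM products p
JOIN sales s ON p.product_id = s.product_id
WHERE s.sale_date BETWEEN '2023-09-01' AND '2023-09-30'
GROUP BY p.product_id, p.product_name
ORDER BY unique_buyers DESC
LIMIT 1
""").fetchone()
print(row)  # ('Galaxy S23', 4)
```

Customer 1 bought the Galaxy S23 twice but is counted once, which is exactly what COUNT(DISTINCT customer_id) is for.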
• Q.766
Find Employees Who Have Worked in All Departments


Question:
Write a SQL query to find employees who have worked in all the departments at least once.
Assume there is a table that tracks which employees have worked in which department.
Explanation:
• You need to check if an employee has worked in all departments.
• This involves using a COUNT on the department column, comparing it with the total
number of distinct departments.

Datasets and SQL Schemas

Employees Table
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100)
);

INSERT INTO employees (employee_id, name) VALUES


(1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie'),
(4, 'David');

Employee_Departments Table
CREATE TABLE employee_departments (
employee_id INT,
department VARCHAR(100)
);

INSERT INTO employee_departments (employee_id, department) VALUES


(1, 'HR'),
(1, 'Engineering'),
(1, 'Marketing'),
(2, 'HR'),
(2, 'Engineering'),
(3, 'HR'),
(3, 'Engineering'),
(3, 'Marketing'),
(4, 'Finance');

SQL Solution:
SELECT e.name
FROM employees e
JOIN employee_departments ed ON e.employee_id = ed.employee_id
GROUP BY e.employee_id, e.name
HAVING COUNT(DISTINCT ed.department) = (SELECT COUNT(DISTINCT department) FROM employee_departments);
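Running the solution on the sample data is instructive: David's 'Finance' row raises the company-wide distinct-department count to four, and nobody covers all four, so the result is empty. A hedged sqlite3 sketch (illustrative only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INT PRIMARY KEY, name VARCHAR(100));
INSERT INTO employees VALUES (1,'Alice'),(2,'Bob'),(3,'Charlie'),(4,'David');
CREATE TABLE employee_departments (employee_id INT, department VARCHAR(100));
INSERT INTO employee_departments VALUES
 (1,'HR'),(1,'Engineering'),(1,'Marketing'),(2,'HR'),(2,'Engineering'),
 (3,'HR'),(3,'Engineering'),(3,'Marketing'),(4,'Finance');
""")

# Relational division: per-employee distinct departments vs. the global count.
names = [r[0] for r in conn.execute("""
SELECT e.name
FROM employees e
JOIN employee_departments ed ON e.employee_id = ed.employee_id
GROUP BY e.employee_id, e.name
HAVING COUNT(DISTINCT ed.department) =
       (SELECT COUNT(DISTINCT department) FROM employee_departments)
""")]
print(names)  # [] because no one covers all four distinct departments
```

Remove David's 'Finance' row (or compare against a fixed department list instead of the subquery) and Alice and Charlie qualify.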
• Q.767
Find Employees Who Have Not Worked in Consecutive Months
Question:
Write a SQL query to find employees who have not worked in two consecutive months.
Explanation:
• Use a self-join on the employee_attendance table to find employees who do have at least one pair of consecutive months, comparing year and month together so a December to January pair is handled.
• Exclude those employees; the remainder never worked in two consecutive months.


Datasets and SQL Schemas

Employee_Attendance Table
CREATE TABLE employee_attendance (
employee_id INT,
attendance_date DATE
);

INSERT INTO employee_attendance (employee_id, attendance_date) VALUES


(1, '2023-01-05'),
(1, '2023-02-10'),
(1, '2023-03-01'),
(2, '2023-01-15'),
(2, '2023-03-20'),
(3, '2023-02-12'),
(3, '2023-04-01'),
(4, '2023-01-20');

SQL Solution:
SELECT DISTINCT employee_id
FROM employee_attendance
WHERE employee_id NOT IN (
    -- employees who DO have at least one pair of consecutive months;
    -- PERIOD_DIFF compares YYYYMM values, so year boundaries are safe
    SELECT e1.employee_id
    FROM employee_attendance e1
    JOIN employee_attendance e2
      ON e1.employee_id = e2.employee_id
     AND PERIOD_DIFF(EXTRACT(YEAR_MONTH FROM e2.attendance_date),
                     EXTRACT(YEAR_MONTH FROM e1.attendance_date)) = 1
);
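The month comparison is safest on a linear month number (year * 12 + month), which, unlike comparing MONTH() values alone, survives December to January. A hedged sqlite3 sketch of the same logic on the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_attendance (employee_id INT, attendance_date DATE);
INSERT INTO employee_attendance VALUES
 (1,'2023-01-05'),(1,'2023-02-10'),(1,'2023-03-01'),
 (2,'2023-01-15'),(2,'2023-03-20'),
 (3,'2023-02-12'),(3,'2023-04-01'),(4,'2023-01-20');
""")

rows = conn.execute("""
WITH months AS (
    -- collapse each attendance date to a linear month number
    SELECT DISTINCT employee_id,
           CAST(strftime('%Y', attendance_date) AS INT) * 12
         + CAST(strftime('%m', attendance_date) AS INT) AS month_num
    FROM employee_attendance
)
SELECT DISTINCT employee_id
FROM months
WHERE employee_id NOT IN (
    SELECT m1.employee_id
    FROM months m1
    JOIN months m2 ON m1.employee_id = m2.employee_id
                  AND m2.month_num = m1.month_num + 1
)
ORDER BY employee_id
""").fetchall()
print(rows)  # [(2,), (3,), (4,)]
```

Employee 1 worked January, February, and March, so the consecutive pair excludes them; everyone else has only gapped (or single) months.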

Key Takeaways:
• Self-Joins and Subqueries: Often used when comparing values within the same table or
when looking for relative differences.
• Window Functions: Helpful for ranking, counting, and partitioning data without needing
multiple joins.
• Advanced Aggregation: Sometimes, you need to aggregate data by comparing against
other aggregates (like counting the distinct departments or comparing counts across rows).
• Date and Time Functions: These can be tricky but are necessary when comparing dates or
months (e.g., MONTH() or DATEPART()).
• Q.768
Find the Employees Who Have Worked the Most Hours in Each Department

Question:
Write a SQL query to find the employee who has worked the most hours in each department.
Explanation:
You need to:
• Join the employee_hours table with the employees and departments tables.
• Group by department and employee to calculate the total hours worked.
• Use ROW_NUMBER() to rank employees within each department by their total hours and
return the employee with the most hours worked in each department.

Datasets and SQL Schemas

Employee Table


CREATE TABLE employees (


employee_id INT PRIMARY KEY,
name VARCHAR(100)
);

INSERT INTO employees (employee_id, name) VALUES


(1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie'),
(4, 'David');

Departments Table
CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(100)
);

INSERT INTO departments (department_id, department_name) VALUES


(1, 'HR'),
(2, 'Engineering'),
(3, 'Sales');

Employee_Hours Table
CREATE TABLE employee_hours (
employee_id INT,
department_id INT,
hours_worked INT,
work_date DATE
);

INSERT INTO employee_hours (employee_id, department_id, hours_worked, work_date) VALUES


(1, 1, 40, '2023-03-01'),
(2, 1, 35, '2023-03-02'),
(3, 2, 50, '2023-03-03'),
(4, 2, 45, '2023-03-04'),
(1, 3, 30, '2023-03-05'),
(3, 3, 40, '2023-03-06'),
(2, 1, 38, '2023-03-07'),
(4, 2, 55, '2023-03-08');

SQL Solution:
WITH RankedEmployees AS (
    SELECT e.name, d.department_name,
           SUM(eh.hours_worked) AS total_hours,
           -- "rn" avoids the reserved word RANK (MySQL 8); every non-aggregated
           -- column appears in GROUP BY so the query is valid in strict modes
           ROW_NUMBER() OVER (PARTITION BY eh.department_id
                              ORDER BY SUM(eh.hours_worked) DESC) AS rn
    FROM employee_hours eh
    JOIN employees e ON eh.employee_id = e.employee_id
    JOIN departments d ON eh.department_id = d.department_id
    GROUP BY e.employee_id, e.name, eh.department_id, d.department_name
)
SELECT name, department_name, total_hours
FROM RankedEmployees
WHERE rn = 1;
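The same ranking can be verified with SQLite through Python's sqlite3. Splitting the aggregation and the ranking into two CTEs (a stylistic assumption, logically equivalent to ranking over the grouped result) makes each step easy to inspect:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INT PRIMARY KEY, name VARCHAR(100));
INSERT INTO employees VALUES (1,'Alice'),(2,'Bob'),(3,'Charlie'),(4,'David');
CREATE TABLE departments (department_id INT PRIMARY KEY, department_name VARCHAR(100));
INSERT INTO departments VALUES (1,'HR'),(2,'Engineering'),(3,'Sales');
CREATE TABLE employee_hours (employee_id INT, department_id INT, hours_worked INT, work_date DATE);
INSERT INTO employee_hours VALUES
 (1,1,40,'2023-03-01'),(2,1,35,'2023-03-02'),(3,2,50,'2023-03-03'),(4,2,45,'2023-03-04'),
 (1,3,30,'2023-03-05'),(3,3,40,'2023-03-06'),(2,1,38,'2023-03-07'),(4,2,55,'2023-03-08');
""")

rows = conn.execute("""
WITH totals AS (
    SELECT e.name, d.department_name, SUM(eh.hours_worked) AS total_hours
    FROM employee_hours eh
    JOIN employees e ON eh.employee_id = e.employee_id
    JOIN departments d ON eh.department_id = d.department_id
    GROUP BY e.employee_id, e.name, d.department_id, d.department_name
),
ranked AS (
    SELECT name, department_name, total_hours,
           ROW_NUMBER() OVER (PARTITION BY department_name ORDER BY total_hours DESC) AS rn
    FROM totals
)
SELECT name, department_name, total_hours
FROM ranked
WHERE rn = 1
ORDER BY department_name
""").fetchall()
print(rows)  # [('David', 'Engineering', 100), ('Bob', 'HR', 73), ('Charlie', 'Sales', 40)]
```

Note that Bob wins HR on 35 + 38 = 73 hours across two rows, which is why the SUM must happen before the ranking.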

• Q.769
Question
Find all employees who make more money than their direct boss.
Explanation
To solve this, you need to compare the salary of each employee with that of their direct
manager. This can be done by self-joining the employees table, using the manager_id to
match employees with their managers, and filtering where the employee's salary is greater
than the manager's salary.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE employees (
employee_id INT,
name VARCHAR(255),
salary DECIMAL(10, 2),
department_id INT,
manager_id INT
);
• - Datasets
-- employees data
INSERT INTO employees (employee_id, name, salary, department_id, manager_id)
VALUES
(1, 'Emma Thompson', 3800, 1, NULL),
(2, 'Daniel Rodriguez', 2230, 1, 10),
(3, 'Olivia Smith', 8000, 1, 8),
(4, 'Noah Johnson', 6800, 2, 8),
(5, 'Sophia Martinez', 1750, 1, 10),
(8, 'William Davis', 7000, 2, NULL),
(10, 'James Anderson', 4000, 1, NULL);

Learnings
• Self-joins: Join a table with itself to compare records.
• Comparison: Compare employee and manager salaries.
• Filtering: Use conditions to filter employees who earn more than their managers.
Solutions
• - PostgreSQL solution
SELECT e.employee_id, e.name AS employee_name
FROM employees e
JOIN employees m ON e.manager_id = m.employee_id
WHERE e.salary > m.salary;
• - MySQL solution
SELECT e.employee_id, e.name AS employee_name
FROM employees e
JOIN employees m ON e.manager_id = m.employee_id
WHERE e.salary > m.salary;
• Q.770
Question
Find the quantity details of the first and last orders for each customer.
Explanation
To solve this, you need to identify each customer's first and last order by order_date and
then retrieve the quantities of those orders. This can be done by using a subquery or window
functions to get the first and last orders per customer, and then joining the results to retrieve
the order quantities.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE orders (
order_id INT,
customer_id INT,
order_date DATE,
quantity INT
);
• - Datasets


-- orders data
INSERT INTO orders (order_id, customer_id, order_date, quantity)
VALUES
(1, 101, '2022-01-05', 3),
(2, 101, '2022-02-15', 5),
(3, 102, '2022-03-10', 2),
(4, 102, '2022-04-01', 4),
(5, 103, '2022-05-15', 7);

Learnings
• Window functions or subqueries: To find first and last order per customer.
• Aggregation: Grouping data by customer to find first and last orders.
• Sorting: Use ORDER BY to determine the first and last order date.
Solutions
• - PostgreSQL solution
WITH ranked_orders AS (
    SELECT
        customer_id,
        order_id,
        quantity,
        order_date,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date ASC)  AS row_num_asc,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS row_num_desc
    FROM orders
)
SELECT
    customer_id,
    MAX(CASE WHEN row_num_asc = 1 THEN quantity END) AS first_order_qty,
    MAX(CASE WHEN row_num_desc = 1 THEN quantity END) AS last_order_qty
FROM ranked_orders
GROUP BY customer_id;
• - MySQL solution
SELECT
    customer_id,
    MAX(CASE WHEN order_date = (SELECT MIN(order_date) FROM orders o2
                                WHERE o2.customer_id = o1.customer_id) THEN quantity END) AS first_order_qty,
    MAX(CASE WHEN order_date = (SELECT MAX(order_date) FROM orders o2
                                WHERE o2.customer_id = o1.customer_id) THEN quantity END) AS last_order_qty
FROM orders o1
GROUP BY customer_id;
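The double-ROW_NUMBER() trick (one ascending, one descending) can be checked with SQLite via Python's sqlite3 (an illustrative assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INT, customer_id INT, order_date DATE, quantity INT);
INSERT INTO orders VALUES
 (1,101,'2022-01-05',3),(2,101,'2022-02-15',5),(3,102,'2022-03-10',2),
 (4,102,'2022-04-01',4),(5,103,'2022-05-15',7);
""")

rows = conn.execute("""
WITH ranked_orders AS (
    SELECT customer_id, quantity,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date ASC)  AS rn_asc,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn_desc
    FROM orders
)
SELECT customer_id,
       MAX(CASE WHEN rn_asc  = 1 THEN quantity END) AS first_order_qty,
       MAX(CASE WHEN rn_desc = 1 THEN quantity END) AS last_order_qty
FROM ranked_orders
GROUP BY customer_id
ORDER BY customer_id
""").fetchall()
print(rows)  # [(101, 3, 5), (102, 2, 4), (103, 7, 7)]
```

Customer 103 has a single order, so its first and last quantities coincide, a useful edge case to remember in interviews.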
• Q.771

Question
Capgemini has a large customer base, and you are required to filter the customers whose
names start with 'CAP'. Write a SQL query to find all records where CustomerName starts
with 'CAP'.

Explanation
The query filters records from the customers table where the CustomerName begins with
'CAP'. The % wildcard in the LIKE clause matches any characters following 'CAP'.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE customers (
CustomerID INT,
CustomerName VARCHAR(100)
);


• - Datasets
INSERT INTO customers (CustomerID, CustomerName)
VALUES
(1, 'CAPITAL'),
(2, 'CAPRICE'),
(3, 'APPLE'),
(4, 'CAPITALIZE');

Learnings
• Usage of LIKE with wildcards (%) to filter string patterns.
• Filtering data based on specific string matches.

Solutions
• - PostgreSQL solution
SELECT CustomerID, CustomerName
FROM customers
WHERE CustomerName LIKE 'CAP%';
• - MySQL solution
SELECT CustomerID, CustomerName
FROM customers
WHERE CustomerName LIKE 'CAP%';
• Q.772

Find Employees Who Have Worked in All Departments in the Last 6 Months
Question:
Write a SQL query to find employees who have worked in every department in the last 6
months.
Explanation:
You need to:
• Identify the departments that exist in the organization.
• Check if employees have worked in all those departments in the last 6 months.
• Use HAVING to ensure that the number of distinct departments worked by the employee
matches the total number of departments.

Datasets and SQL Schemas

Employee Table
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100)
);

INSERT INTO employees (employee_id, name) VALUES


(1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie'),
(4, 'David');

Departments Table
CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(100)
);


INSERT INTO departments (department_id, department_name) VALUES


(1, 'HR'),
(2, 'Engineering'),
(3, 'Sales'),
(4, 'Marketing');

Employee_Hours Table
CREATE TABLE employee_hours (
employee_id INT,
department_id INT,
hours_worked INT,
work_date DATE
);

INSERT INTO employee_hours (employee_id, department_id, hours_worked, work_date) VALUES


(1, 1, 40, '2023-06-01'),
(1, 2, 45, '2023-06-02'),
(1, 3, 35, '2023-06-03'),
(1, 4, 40, '2023-06-04'),
(2, 1, 40, '2023-07-01'),
(2, 2, 50, '2023-07-02'),
(3, 1, 30, '2023-05-10'),
(3, 2, 40, '2023-05-11'),
(3, 3, 40, '2023-05-12'),
(4, 3, 25, '2023-07-03');

SQL Solution:
SELECT e.name
FROM employees e
JOIN employee_hours eh ON e.employee_id = eh.employee_id
WHERE eh.work_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY e.employee_id
HAVING COUNT(DISTINCT eh.department_id) = (SELECT COUNT(*) FROM departments);
• Q.773
Question
Find the customers who have not made a purchase in the last year (2023) but have made a
purchase in the current year (2024).
Explanation
To solve this, you need to filter customers who made purchases in 2024 but did not make any
purchases in 2023. Use the YEAR() function to filter orders by year, then apply a condition to
exclude customers who made purchases in 2023.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE orders (
order_id INT,
customer_id INT,
order_date DATE,
quantity INT
);
• - Datasets
-- orders data
INSERT INTO orders (order_id, customer_id, order_date, quantity)
VALUES
(1, 101, '2024-01-05', 3),
(2, 102, '2023-06-15', 5),
(3, 103, '2024-02-10', 2),
(4, 101, '2024-08-15', 4),
(5, 102, '2023-03-01', 7);

Learnings


• Date functions: Use YEAR() to filter based on specific years.


• Filtering: Exclude customers who have purchased in the previous year (2023).
• Subqueries: Use a subquery to identify customers who bought in 2023.
Solutions
• - PostgreSQL solution
SELECT DISTINCT customer_id
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024
AND customer_id NOT IN (
SELECT DISTINCT customer_id
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2023
);
• - MySQL solution
SELECT DISTINCT customer_id
FROM orders
WHERE YEAR(order_date) = 2024
AND customer_id NOT IN (
SELECT DISTINCT customer_id
FROM orders
WHERE YEAR(order_date) = 2023
);
• Q.774
Question
Write a SQL query to recommend friends to a user based on mutual friends, excluding direct
friends.

Explanation
This query first identifies the direct friends of the user, then it finds potential friends by
looking for users who share mutual friends with the given user. It excludes direct friends
from the potential friends list and ranks them based on the number of mutual friends.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE friendships (
user_id INT,
friend_id INT
);
• - Datasets
INSERT INTO friendships (user_id, friend_id)
VALUES
(1, 2),
(1, 3),
(2, 3),
(2, 4),
(3, 4),
(4, 5);

Learnings
• Using JOIN to find mutual relationships.
• Using WITH clauses (Common Table Expressions) to structure complex queries.
• Excluding direct friends using NOT IN or similar filtering.
• Aggregating mutual relationships using COUNT().

Solutions


• - PostgreSQL solution
WITH DirectFriends AS (
SELECT user_id, friend_id
FROM friendships
WHERE user_id = :user_id
),
MutualFriends AS (
SELECT f1.friend_id AS mutual_friend, f2.friend_id AS potential_friend
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
WHERE f1.user_id = :user_id AND f2.friend_id != :user_id
)
SELECT potential_friend, COUNT(*) AS mutual_count
FROM MutualFriends
WHERE potential_friend NOT IN (SELECT friend_id FROM DirectFriends)
GROUP BY potential_friend
ORDER BY mutual_count DESC;
• - MySQL solution
WITH DirectFriends AS (
SELECT user_id, friend_id
FROM friendships
WHERE user_id = :user_id
),
MutualFriends AS (
SELECT f1.friend_id AS mutual_friend, f2.friend_id AS potential_friend
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
WHERE f1.user_id = :user_id AND f2.friend_id != :user_id
)
SELECT potential_friend, COUNT(*) AS mutual_count
FROM MutualFriends
WHERE potential_friend NOT IN (SELECT friend_id FROM DirectFriends)
GROUP BY potential_friend
ORDER BY mutual_count DESC;
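The recommendation logic can be exercised end to end with SQLite via Python's sqlite3 (an illustrative assumption; note the sample stores each friendship as a single directed row, which the query inherits):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friendships (user_id INT, friend_id INT);
INSERT INTO friendships VALUES (1,2),(1,3),(2,3),(2,4),(3,4),(4,5);
""")

# :user_id is bound through sqlite3's named-parameter dict.
rows = conn.execute("""
WITH DirectFriends AS (
    SELECT friend_id FROM friendships WHERE user_id = :user_id
),
MutualFriends AS (
    SELECT f2.friend_id AS potential_friend
    FROM friendships f1
    JOIN friendships f2 ON f1.friend_id = f2.user_id
    WHERE f1.user_id = :user_id AND f2.friend_id != :user_id
)
SELECT potential_friend, COUNT(*) AS mutual_count
FROM MutualFriends
WHERE potential_friend NOT IN (SELECT friend_id FROM DirectFriends)
GROUP BY potential_friend
ORDER BY mutual_count DESC
""", {"user_id": 1}).fetchall()
print(rows)  # [(4, 2)]
```

For user 1, user 4 is reachable through both mutual friends 2 and 3, while users 2 and 3 are filtered out as direct friends.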
• Q.775
Question
Write a SQL query to find the number of customers who have called more than three times
between 3 PM and 6 PM.

Explanation
This query filters calls whose hour, extracted with EXTRACT(HOUR FROM call_time), falls between 15 and 18, groups the results by customer, counts the calls per customer, and keeps customers with more than three calls. Note that BETWEEN 15 AND 18 is inclusive, so any call during the 6 PM hour also matches; use >= 15 AND < 18 if 6 PM should be a strict cut-off.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE calls (
customer_id INT,
call_time TIMESTAMP
);
• - Datasets
INSERT INTO calls (customer_id, call_time)
VALUES
(1, '2025-01-01 15:30:00'),
(1, '2025-01-01 16:00:00'),
(1, '2025-01-01 17:15:00'),
(1, '2025-01-01 18:00:00'),
(2, '2025-01-01 14:00:00'),
(2, '2025-01-01 15:30:00'),
(2, '2025-01-01 16:45:00'),
(3, '2025-01-01 15:00:00'),
(3, '2025-01-01 15:30:00'),
(3, '2025-01-01 16:15:00'),


(3, '2025-01-01 17:45:00');

Learnings
• Using EXTRACT() to filter based on specific hours in timestamps.
• Grouping results with GROUP BY to aggregate data by customer.
• Applying HAVING to filter the groups after aggregation.

Solutions
• - PostgreSQL solution
SELECT customer_id, COUNT(*) AS call_count
FROM calls
WHERE EXTRACT(HOUR FROM call_time) BETWEEN 15 AND 18
GROUP BY customer_id
HAVING COUNT(*) > 3;
• - MySQL solution
SELECT customer_id, COUNT(*) AS call_count
FROM calls
WHERE HOUR(call_time) BETWEEN 15 AND 18
GROUP BY customer_id
HAVING COUNT(*) > 3;
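SQLite has neither EXTRACT() nor HOUR(); strftime('%H', ...) plays the same role, which makes the query easy to test locally through Python's sqlite3 (an illustrative assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (customer_id INT, call_time TIMESTAMP);
INSERT INTO calls VALUES
 (1,'2025-01-01 15:30:00'),(1,'2025-01-01 16:00:00'),(1,'2025-01-01 17:15:00'),
 (1,'2025-01-01 18:00:00'),(2,'2025-01-01 14:00:00'),(2,'2025-01-01 15:30:00'),
 (2,'2025-01-01 16:45:00'),(3,'2025-01-01 15:00:00'),(3,'2025-01-01 15:30:00'),
 (3,'2025-01-01 16:15:00'),(3,'2025-01-01 17:45:00');
""")

rows = conn.execute("""
SELECT customer_id, COUNT(*) AS call_count
FROM calls
WHERE CAST(strftime('%H', call_time) AS INT) BETWEEN 15 AND 18
GROUP BY customer_id
HAVING COUNT(*) > 3
ORDER BY customer_id
""").fetchall()
print(rows)  # [(1, 4), (3, 4)]
```

Customer 2's 14:00 call falls outside the window, leaving only two qualifying calls, so customer 2 is dropped by the HAVING clause.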
• Q.776
Question
Write a SQL query to calculate the median search frequency from a table of search logs.

Explanation
This query uses the PERCENTILE_CONT(0.5) function to calculate the median search
frequency for each search term. The function computes the 50th percentile (median) by
ordering the frequencies within each search term group.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE search_logs (
search_term VARCHAR(100),
frequency INT
);
• - Datasets
INSERT INTO search_logs (search_term, frequency)
VALUES
('apple', 5),
('apple', 10),
('apple', 15),
('banana', 3),
('banana', 7),
('banana', 8),
('cherry', 2),
('cherry', 6);

Learnings
• Using PERCENTILE_CONT() to calculate percentiles or medians.
• Grouping data by a column to calculate aggregation per group.
• Working with ordered data using WITHIN GROUP in percentile functions.

Solutions
• - PostgreSQL solution
SELECT
    search_term,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY frequency) AS median_frequency
FROM search_logs
GROUP BY search_term;
• - MySQL solution
MySQL has no built-in PERCENTILE_CONT. In MySQL 8+, the median can be emulated with window functions: number the rows within each search term, count them, and average the middle one or two values.
SELECT search_term,
       AVG(frequency) AS median_frequency
FROM (
    SELECT search_term, frequency,
           ROW_NUMBER() OVER (PARTITION BY search_term ORDER BY frequency) AS rn,
           COUNT(*) OVER (PARTITION BY search_term) AS cnt
    FROM search_logs
) AS ranked
WHERE rn IN (FLOOR((cnt + 1) / 2), FLOOR((cnt + 2) / 2))
GROUP BY search_term;
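The window-function route to a median works in any engine with ROW_NUMBER() and COUNT() OVER, not just MySQL. A hedged sqlite3 check (SQLite's / on integers truncates, so no FLOOR() is needed there):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE search_logs (search_term VARCHAR(100), frequency INT);
INSERT INTO search_logs VALUES
 ('apple',5),('apple',10),('apple',15),
 ('banana',3),('banana',7),('banana',8),
 ('cherry',2),('cherry',6);
""")

rows = conn.execute("""
SELECT search_term, AVG(frequency) AS median_frequency
FROM (
    SELECT search_term, frequency,
           ROW_NUMBER() OVER (PARTITION BY search_term ORDER BY frequency) AS rn,
           COUNT(*)     OVER (PARTITION BY search_term)                    AS cnt
    FROM search_logs
) ranked
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)   -- integer division floors here
GROUP BY search_term
ORDER BY search_term
""").fetchall()
print(rows)  # [('apple', 10.0), ('banana', 7.0), ('cherry', 4.0)]
```

With an odd group ('apple', 3 rows) both expressions pick the same middle row; with an even group ('cherry', 2 rows) they pick the two middle rows and AVG blends them into 4.0.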
• Q.777
Question
Calculate the monthly average review score for each product. The output should be sorted in
ascending order of the month and then by product_id.
Explanation
To solve this, you need to group the reviews by month and product, calculate the average
review score for each group, and then order the results first by month and then by
product_id. The submit_date should be converted to extract the year and month.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE reviews (
review_id INT,
user_id INT,
submit_date TIMESTAMP,
product_id INT,
stars INT
);
• - Datasets
-- reviews data
INSERT INTO reviews (review_id, user_id, submit_date, product_id, stars)
VALUES
(6171, 123, '2022-06-08 00:00:00', 50001, 4),
(7802, 265, '2022-06-10 00:00:00', 69852, 4),
(5293, 362, '2022-06-18 00:00:00', 50001, 3),
(6352, 192, '2022-07-26 00:00:00', 69852, 3),
(4517, 981, '2022-07-05 00:00:00', 69852, 2);

Learnings
• Date functions: Use DATE_TRUNC() or EXTRACT() to group by month and year.
• Aggregation: Use AVG() to calculate the average score for each group.
• Sorting: Order the results by month and product.
Solutions
• - PostgreSQL solution
SELECT
EXTRACT(YEAR FROM submit_date) AS year,
EXTRACT(MONTH FROM submit_date) AS month,
    product_id,
AVG(stars) AS avg_review_score
FROM reviews
GROUP BY year, month, product_id
ORDER BY year, month, product_id;
• - MySQL solution
SELECT
YEAR(submit_date) AS year,
MONTH(submit_date) AS month,
product_id,
AVG(stars) AS avg_review_score
FROM reviews
GROUP BY year, month, product_id
ORDER BY year, month, product_id;
• Q.778
Question
Given a table of transactions, write a SQL query to identify customers who have made the
same payment amount more than once.

Explanation
This query groups the transactions by both customer_id and amount, counts how many
times each combination occurs, and then filters out those with more than one occurrence,
indicating repeated payments.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE transactions (
customer_id INT,
amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO transactions (customer_id, amount)
VALUES
(1, 100.00),
(1, 200.00),
(1, 100.00),
(2, 50.00),
(2, 50.00),
(3, 150.00),
(3, 150.00),
(3, 200.00);

Learnings
• Using GROUP BY to aggregate data based on multiple columns.
• Using COUNT() to identify duplicates or repeated entries.
• Filtering grouped data with HAVING to apply conditions on aggregated results.

Solutions
• - PostgreSQL solution
SELECT customer_id, amount, COUNT(*)
FROM transactions
GROUP BY customer_id, amount
HAVING COUNT(*) > 1;
• - MySQL solution
SELECT customer_id, amount, COUNT(*)
FROM transactions
GROUP BY customer_id, amount
HAVING COUNT(*) > 1;


• Q.779
Identify the Top 3 Customers Who Have Spent the Most on Products
Question:
Write a SQL query to find the top 3 customers who have spent the most on products,
including the total amount spent by each.
Explanation:
You need to:
• Join the customers, purchases, and products tables.
• Multiply the product price by the number of units purchased to calculate the total amount
spent by each customer.
• Use ORDER BY to sort customers based on the total amount spent and return the top 3.

Datasets and SQL Schemas

Customers Table
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);

INSERT INTO customers (customer_id, name, email) VALUES


(1, 'Alice', '[email protected]'),
(2, 'Bob', '[email protected]'),
(3, 'Charlie', '[email protected]'),
(4, 'David', '[email protected]');

Products Table
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
price DECIMAL(10, 2)
);

INSERT INTO products (product_id, product_name, price) VALUES


(101, 'Galaxy S23', 899.99),
(102, 'Galaxy Buds', 199.99),
(103, 'Galaxy Watch', 299.99);

Purchases Table
CREATE TABLE purchases (
purchase_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
units INT
);

INSERT INTO purchases (purchase_id, customer_id, product_id, units) VALUES


(1, 1, 101, 1),
(2, 2, 102, 2),
(3, 3, 101, 3),
(4, 4, 103, 1),
(5, 1, 103, 1),
(6, 3, 102, 1),
(7, 2, 101, 2);

SQL Solution:


SELECT c.name, SUM(p.price * pur.units) AS total_spent
FROM customers c
JOIN purchases pur ON c.customer_id = pur.customer_id
JOIN products p ON pur.product_id = p.product_id
GROUP BY c.customer_id, c.name
ORDER BY total_spent DESC
LIMIT 3;

Key Concepts Tested:


• Aggregations with Window Functions: The use of ROW_NUMBER() and COUNT to rank
employees or products.
• Date Filtering and Grouping: Working with CURRENT_DATE, INTERVAL, and date-based
grouping for time-based analysis.
• Complex Joins and Subqueries: Joining multiple tables and working with subqueries or
aggregation to find the desired result.
• HAVING Clause: Filtering results based on aggregates, such as finding employees who
have worked in all departments.
• Q.780
Cumulative Sales with Missing Dates
Question
Write a SQL query to calculate cumulative sales for each day, even when some dates are
missing in the sales table.
Explanation
This problem is tricky because you need to handle gaps in dates and still calculate cumulative
totals. It requires using JOIN or LEFT JOIN with a sequence of all dates to fill in the gaps.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE sales (
sale_date DATE,
amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO sales (sale_date, amount)
VALUES
('2025-01-01', 100),
('2025-01-03', 200),
('2025-01-05', 150);

Solutions
• - PostgreSQL solution
WITH date_series AS (
    SELECT generate_series('2025-01-01'::DATE, '2025-01-05'::DATE, '1 day'::INTERVAL) AS sale_date
)
SELECT ds.sale_date,
       COALESCE(SUM(s.amount) OVER (ORDER BY ds.sale_date), 0) AS cumulative_sales
FROM date_series ds
LEFT JOIN sales s ON s.sale_date = ds.sale_date
ORDER BY ds.sale_date;
• - MySQL solution
WITH RECURSIVE date_series AS (
    SELECT DATE('2025-01-01') AS sale_date
    UNION ALL
    SELECT DATE_ADD(sale_date, INTERVAL 1 DAY)
    FROM date_series
    WHERE sale_date < '2025-01-05'
)
SELECT ds.sale_date,
       COALESCE(SUM(s.amount) OVER (ORDER BY ds.sale_date), 0) AS cumulative_sales
FROM date_series ds
LEFT JOIN sales s ON s.sale_date = ds.sale_date
ORDER BY ds.sale_date;
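The same recursive-CTE idea runs almost unchanged in SQLite, which spells the date step as date(x, '+1 day'). A hedged sqlite3 check showing the gap days carrying the running total forward:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_date DATE, amount DECIMAL(10,2));
INSERT INTO sales VALUES ('2025-01-01',100),('2025-01-03',200),('2025-01-05',150);
""")

rows = conn.execute("""
WITH RECURSIVE date_series AS (
    SELECT '2025-01-01' AS sale_date
    UNION ALL
    SELECT date(sale_date, '+1 day') FROM date_series WHERE sale_date < '2025-01-05'
)
SELECT ds.sale_date,
       COALESCE(SUM(s.amount) OVER (ORDER BY ds.sale_date), 0) AS cumulative_sales
FROM date_series ds
LEFT JOIN sales s ON s.sale_date = ds.sale_date
ORDER BY ds.sale_date
""").fetchall()
print(rows)
```

Jan 2 and Jan 4 appear with no sale of their own but repeat the running total (100 and 300), because the windowed SUM skips the NULL amounts produced by the LEFT JOIN.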

Questions By Job Roles


Data Analyst
• Q.781

Question:
Assign ranks to employees based on their salaries in descending order.

Explanation:
Use the RANK() window function to assign a rank to each employee based on their salary,
ordered from highest to lowest. The RANK() function will handle ties by giving the same rank
to employees with equal salaries, but will leave gaps in ranking.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE employees (
employee_id INT,
employee_name VARCHAR(100),
salary DECIMAL(10, 2)
);
• - Datasets
INSERT INTO employees (employee_id, employee_name, salary)
VALUES
(1, 'Amit', 75000),
(2, 'Priya', 70000),
(3, 'Ravi', 75000),
(4, 'Neha', 68000),
(5, 'Vikram', 70000);

Learnings:
• Using the RANK() window function to assign a rank based on an ordered set.
• Handling ties in ranking using RANK(), which leaves gaps between ranks.
• Window functions allow for more advanced analysis within a result set.

Solutions
• - PostgreSQL solution
SELECT employee_name, salary, RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
• - MySQL solution
SELECT employee_name, salary, RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
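Because the explanation hinges on how RANK() treats ties (equal rank, then a gap), a concrete run helps. A hedged sqlite3 sketch; the outer ORDER BY adds a name tiebreak purely so the output order is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INT, employee_name VARCHAR(100), salary DECIMAL(10,2));
INSERT INTO employees VALUES
 (1,'Amit',75000),(2,'Priya',70000),(3,'Ravi',75000),(4,'Neha',68000),(5,'Vikram',70000);
""")

rows = conn.execute("""
SELECT employee_name, salary,
       RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees
ORDER BY salary DESC, employee_name
""").fetchall()
print(rows)
```

Amit and Ravi share rank 1, rank 2 is skipped, Priya and Vikram share rank 3, and Neha lands at rank 5: exactly the gapped behavior that distinguishes RANK() from DENSE_RANK().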
• Q.782

Question:


Find employees who attended work for three consecutive days.

Explanation:
To identify employees who attended for three consecutive days, you can use the LEAD() and
LAG() window functions to access previous and next rows. By comparing the dates of
consecutive rows, you can check if the difference between attendance dates is exactly one
day, and then filter those who meet the condition for three consecutive days.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE attendance (
employee_id INT,
attendance_date DATE
);
• - Datasets
INSERT INTO attendance (employee_id, attendance_date)
VALUES
(1, '2025-01-10'),
(1, '2025-01-11'),
(1, '2025-01-12'),
(2, '2025-01-10'),
(2, '2025-01-12'),
(2, '2025-01-13'),
(3, '2025-01-10'),
(3, '2025-01-11'),
(3, '2025-01-12'),
(3, '2025-01-13');

Learnings:
• Using the LEAD() and LAG() window functions to access values from previous and next
rows.
• Calculating date differences to detect consecutive patterns.
• Filtering results based on specific conditions in a windowed result set.

Solutions
• - PostgreSQL solution
SELECT DISTINCT a.employee_id
FROM (
    SELECT employee_id,
           attendance_date,
           LEAD(attendance_date) OVER (PARTITION BY employee_id ORDER BY attendance_date) AS next_date,
           LAG(attendance_date)  OVER (PARTITION BY employee_id ORDER BY attendance_date) AS prev_date
    FROM attendance
) a
WHERE a.next_date = a.attendance_date + INTERVAL '1 day'
  AND a.prev_date = a.attendance_date - INTERVAL '1 day';
• - MySQL solution
SELECT DISTINCT a.employee_id
FROM (
    SELECT employee_id,
           attendance_date,
           LEAD(attendance_date) OVER (PARTITION BY employee_id ORDER BY attendance_date) AS next_date,
           LAG(attendance_date)  OVER (PARTITION BY employee_id ORDER BY attendance_date) AS prev_date
    FROM attendance
) a
WHERE DATEDIFF(a.next_date, a.attendance_date) = 1
  AND DATEDIFF(a.attendance_date, a.prev_date) = 1;
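The key observation is that the middle day of any three-day run has both an adjacent predecessor and an adjacent successor. A hedged sqlite3 check (illustrative; SQLite spells the date offsets as date(x, '+1 day')):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attendance (employee_id INT, attendance_date DATE);
INSERT INTO attendance VALUES
 (1,'2025-01-10'),(1,'2025-01-11'),(1,'2025-01-12'),
 (2,'2025-01-10'),(2,'2025-01-12'),(2,'2025-01-13'),
 (3,'2025-01-10'),(3,'2025-01-11'),(3,'2025-01-12'),(3,'2025-01-13');
""")

rows = conn.execute("""
SELECT DISTINCT a.employee_id
FROM (
    SELECT employee_id, attendance_date,
           LEAD(attendance_date) OVER (PARTITION BY employee_id ORDER BY attendance_date) AS next_date,
           LAG(attendance_date)  OVER (PARTITION BY employee_id ORDER BY attendance_date) AS prev_date
    FROM attendance
) a
WHERE a.next_date = date(a.attendance_date, '+1 day')
  AND a.prev_date = date(a.attendance_date, '-1 day')
ORDER BY a.employee_id
""").fetchall()
print(rows)  # [(1,), (3,)]
```

Employee 2's missing Jan 11 breaks the chain, so only employees 1 and 3 qualify.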


• Q.783

Question:
Write a query to detect duplicate orders in a table.

Explanation:
To identify duplicate orders, group by the fields that define a duplicate: here customer_id, order_date, and total_amount (order_id is unique per row, so grouping by it would never surface a duplicate). Then filter the groups using HAVING to keep those with a count greater than 1.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE orders (
order_id INT,
customer_id INT,
order_date DATE,
total_amount DECIMAL(10, 2)
);
• - Datasets
INSERT INTO orders (order_id, customer_id, order_date, total_amount)
VALUES
(1, 101, '2025-01-10', 250.00),
(2, 102, '2025-01-11', 150.00),
(3, 101, '2025-01-12', 250.00),
(4, 103, '2025-01-10', 300.00),
(5, 101, '2025-01-10', 250.00),
(6, 104, '2025-01-13', 200.00);

Learnings:
• Using GROUP BY to group records on the columns that define a duplicate (e.g., customer_id, order_date, total_amount).
• Using HAVING to filter aggregated data and find duplicates.
• Identifying duplicate entries in a dataset based on specific criteria.

Solutions
• - PostgreSQL solution
SELECT customer_id, order_date, total_amount, COUNT(*) AS duplicate_count
FROM orders
GROUP BY customer_id, order_date, total_amount
HAVING COUNT(*) > 1;
• - MySQL solution
SELECT customer_id, order_date, total_amount, COUNT(*) AS duplicate_count
FROM orders
GROUP BY customer_id, order_date, total_amount
HAVING COUNT(*) > 1;
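A quick way to sanity-check the duplicate logic is to run it against the sample rows with Python's sqlite3 module (a sketch; grouping is on customer_id, order_date, and total_amount, since order_id is unique per row):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INT, customer_id INT, order_date DATE, total_amount REAL);
INSERT INTO orders VALUES
 (1,101,'2025-01-10',250.00),(2,102,'2025-01-11',150.00),
 (3,101,'2025-01-12',250.00),(4,103,'2025-01-10',300.00),
 (5,101,'2025-01-10',250.00),(6,104,'2025-01-13',200.00);
""")
# Orders 1 and 5 are identical apart from their order_id
dupes = conn.execute("""
SELECT customer_id, order_date, total_amount, COUNT(*) AS duplicate_count
FROM orders
GROUP BY customer_id, order_date, total_amount
HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [(101, '2025-01-10', 250.0, 2)]
```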
• Q.784

Question:
How do you implement auto-incrementing fields in SQL?

Explanation:
Auto-incrementing fields are used to automatically generate unique values for a primary key
column, typically for identifiers like ID. In SQL, different databases use different


mechanisms for this feature. For SQL Server, the IDENTITY property is used, whereas
MySQL uses AUTO_INCREMENT.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE employees (
employee_id INT IDENTITY(1,1), -- For SQL Server
employee_name VARCHAR(100),
salary DECIMAL(10, 2)
);

For MySQL, the syntax would be:


CREATE TABLE employees (
employee_id INT AUTO_INCREMENT, -- For MySQL
employee_name VARCHAR(100),
salary DECIMAL(10, 2),
PRIMARY KEY (employee_id)
);
• - Datasets
INSERT INTO employees (employee_name, salary)
VALUES
('Amit', 60000),
('Priya', 70000),
('Ravi', 75000);

Learnings:
• SQL Server uses the IDENTITY property for auto-incrementing fields, where you can
specify a starting value and increment.
• MySQL uses the AUTO_INCREMENT keyword to automatically increment field values with
each new row insertion.
• Both methods allow you to automatically generate unique values for primary key columns.

Solutions
• - SQL Server solution
CREATE TABLE employees (
employee_id INT IDENTITY(1,1) PRIMARY KEY,
employee_name VARCHAR(100),
salary DECIMAL(10, 2)
);
• - MySQL solution
CREATE TABLE employees (
employee_id INT AUTO_INCREMENT PRIMARY KEY,
employee_name VARCHAR(100),
salary DECIMAL(10, 2)
);
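For a runnable illustration, SQLite (usable from Python's sqlite3 module) offers its own analogue, INTEGER PRIMARY KEY AUTOINCREMENT; this sketch shows that omitting the id column lets the engine generate it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- SQLite's auto-increment
    employee_name TEXT,
    salary REAL)""")
# employee_id is omitted from the INSERT, so it is generated automatically
conn.executemany("INSERT INTO employees (employee_name, salary) VALUES (?, ?)",
                 [('Amit', 60000), ('Priya', 70000), ('Ravi', 75000)])
ids = [r[0] for r in conn.execute("SELECT employee_id FROM employees ORDER BY employee_id")]
print(ids)  # [1, 2, 3]
```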
• Q.785

Question:
How can you insert NULL values into a column during data insertion?

Explanation:
To insert NULL values into a column, ensure the column allows NULL by not specifying NOT
NULL when creating the table. Then, use the NULL keyword in the INSERT statement where
you want to insert a NULL value.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE employees (
employee_id INT,
employee_name VARCHAR(100),
department_id INT NULL, -- Allowing NULL values in department_id
salary DECIMAL(10, 2)
);
• - Datasets
INSERT INTO employees (employee_id, employee_name, department_id, salary)
VALUES
(1, 'Amit', NULL, 60000), -- Inserting NULL for department_id
(2, 'Priya', 101, 70000),
(3, 'Ravi', NULL, 75000); -- Inserting NULL for department_id

Learnings:
• A column must be defined to allow NULL values (i.e., no NOT NULL constraint).
• NULL can be explicitly inserted using the NULL keyword in the INSERT statement.
• NULL is different from an empty string or zero — it represents the absence of a value.

Solutions
• - PostgreSQL and MySQL solution
INSERT INTO employees (employee_id, employee_name, department_id, salary)
VALUES
(1, 'Amit', NULL, 60000), -- Insert NULL for department_id
(2, 'Priya', 101, 70000),
(3, 'Ravi', NULL, 75000); -- Insert NULL for department_id
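The point that NULL is the absence of a value is easy to demonstrate with sqlite3: Python's None maps to SQL NULL, and matching it requires IS NULL rather than = NULL (a sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INT, employee_name TEXT, department_id INT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)",
                 [(1, 'Amit', None, 60000),   # None becomes SQL NULL
                  (2, 'Priya', 101, 70000),
                  (3, 'Ravi', None, 75000)])
no_dept = [r[0] for r in conn.execute(
    "SELECT employee_name FROM employees WHERE department_id IS NULL ORDER BY employee_id")]
eq_null = conn.execute("SELECT COUNT(*) FROM employees WHERE department_id = NULL").fetchone()[0]
print(no_dept)  # ['Amit', 'Ravi']
print(eq_null)  # 0 -- '= NULL' never matches; comparisons with NULL are unknown
```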
• Q.786

Question:
What are temporary tables, and when would you use them?

Explanation:
Temporary tables are tables that exist only for the duration of a session or transaction. They
can store intermediate results and are automatically dropped when the session ends or when
explicitly dropped. You would use them when you need to perform multiple complex
operations and want to store intermediate results temporarily to simplify queries or improve
performance.

Datasets and SQL Schemas


• - Table creation (Temporary Table)
CREATE TEMPORARY TABLE temp_employees (
employee_id INT,
employee_name VARCHAR(100),
salary DECIMAL(10, 2)
);
• - Inserting data into a Temporary Table
INSERT INTO temp_employees (employee_id, employee_name, salary)
VALUES
(1, 'Amit', 60000),
(2, 'Priya', 70000),
(3, 'Ravi', 75000);
• - Using the Temporary Table in a Query
SELECT * FROM temp_employees WHERE salary > 65000;

Learnings:


• Temporary Tables are useful for storing intermediate results that are needed only during
the session or transaction.
• They are automatically dropped at the end of the session or when the connection is closed,
saving storage and avoiding clutter.
• Use cases include complex reporting, batch processing, and multi-step data
transformations where intermediate data is needed but doesn’t need to be persisted.

Solutions
• - PostgreSQL / MySQL solution
-- Creating a temporary table
CREATE TEMPORARY TABLE temp_employees (
employee_id INT,
employee_name VARCHAR(100),
salary DECIMAL(10, 2)
);

-- Inserting data into the temporary table
INSERT INTO temp_employees (employee_id, employee_name, salary)
VALUES
(1, 'Amit', 60000),
(2, 'Priya', 70000),
(3, 'Ravi', 75000);

-- Querying the temporary table
SELECT * FROM temp_employees WHERE salary > 65000;

-- Dropping the temporary table manually (optional, usually not needed)
DROP TABLE IF EXISTS temp_employees;              -- PostgreSQL
-- DROP TEMPORARY TABLE IF EXISTS temp_employees; -- MySQL
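The temporary-table lifecycle can be exercised with sqlite3, which also supports CREATE TEMPORARY TABLE (an illustrative sketch; the table vanishes when the connection closes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TEMPORARY TABLE temp_employees (employee_id INT, employee_name TEXT, salary REAL);
INSERT INTO temp_employees VALUES (1,'Amit',60000),(2,'Priya',70000),(3,'Ravi',75000);
""")
# Intermediate result: employees earning more than 65000
names = [r[0] for r in conn.execute(
    "SELECT employee_name FROM temp_employees WHERE salary > 65000 ORDER BY salary")]
print(names)  # ['Priya', 'Ravi']
conn.close()  # the temporary table disappears with the session
```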
• Q.787

Question:
Differentiate between clustered and non-clustered indexes.

Explanation:
Clustered and non-clustered indexes are two types of database indexing techniques used to
improve the performance of data retrieval. The key difference lies in how the data is
physically stored and accessed.
• Clustered Index: The table's data is physically organized in the order of the clustered
index. There can only be one clustered index per table because the data can be stored in only
one order. In MySQL (InnoDB), the table is clustered by its primary key, and SQL Server also
supports clustered indexes. PostgreSQL, by contrast, stores tables as heaps and has no
persistent clustered index (its CLUSTER command reorders a table once but does not maintain
that order).
• Non-Clustered Index: A non-clustered index does not alter the physical order of the data.
Instead, it creates a separate structure that holds the indexed values and a pointer to the actual
data rows. Multiple non-clustered indexes can be created on a table.

Datasets and SQL Schemas


• - Table creation with primary key (Clustered Index)
CREATE TABLE employees (
employee_id INT PRIMARY KEY, -- Clustered index (default for primary key)
employee_name VARCHAR(100),
salary DECIMAL(10, 2)
);
• - Creating a non-clustered index
CREATE INDEX idx_salary ON employees (salary); -- Non-clustered index


Learnings:
• Clustered Index:
• Data is physically sorted based on the index.
• Only one clustered index can exist per table.
• Typically created by default on primary keys.
• Non-Clustered Index:
• Does not alter the physical data order.
• Can have multiple non-clustered indexes on a table.
• Indexes are stored separately with pointers to data rows.

Solutions
MySQL Solution:
-- Clustered Index (Default for PRIMARY KEY)
CREATE TABLE employees (
employee_id INT PRIMARY KEY, -- Clustered index automatically created
employee_name VARCHAR(100),
salary DECIMAL(10, 2)
);

-- Non-Clustered Index
CREATE INDEX idx_salary ON employees (salary); -- Non-clustered index

PostgreSQL Solution:
-- In PostgreSQL, PRIMARY KEY creates a unique B-tree index (not a clustered index)
CREATE TABLE employees (
employee_id INT PRIMARY KEY, -- backed by a unique B-tree index; the table itself is a heap
employee_name VARCHAR(100),
salary DECIMAL(10, 2)
);

-- Non-Clustered Index
CREATE INDEX idx_salary ON employees (salary); -- Non-clustered index

Key Differences:
• Clustered Index:
• Directly affects the physical storage order of data.
• Only one per table.
• Fast access to rows in the order of the index.
• Non-Clustered Index:
• Does not affect physical storage.
• Can have multiple non-clustered indexes.
• Slower access than clustered, but can be more flexible for queries involving multiple
columns or conditions.
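A small sqlite3 sketch shows the non-clustered pattern in practice: SQLite rowid tables are effectively clustered by their integer key, while CREATE INDEX builds a separate (non-clustered) structure the planner can use for the salary filter. EXPLAIN QUERY PLAN output is engine-specific, so it is printed rather than relied upon:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, employee_name TEXT, salary REAL);
INSERT INTO employees VALUES (1,'Amit',60000),(2,'Priya',70000),(3,'Ravi',75000);
CREATE INDEX idx_salary ON employees (salary);  -- separate structure, non-clustered
""")
# Engine-specific plan; typically a SEARCH ... USING INDEX idx_salary line
for row in conn.execute("EXPLAIN QUERY PLAN SELECT employee_name FROM employees WHERE salary > 65000"):
    print(row)
rows = [r[0] for r in conn.execute(
    "SELECT employee_name FROM employees WHERE salary > 65000 ORDER BY salary")]
print(rows)  # ['Priya', 'Ravi']
```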
• Q.788

Question:
Explain the concept of transactions in SQL.

Explanation:
A transaction in SQL is a sequence of one or more SQL operations that are executed as a
single unit of work. The transaction ensures data integrity and consistency, following the


ACID properties. These properties ensure that transactions are processed reliably and
concurrently without causing data corruption.
• ACID Properties:
• Atomicity: A transaction is atomic; it either completes entirely or not at all. If an error
occurs, the changes are rolled back.
• Consistency: The database must transition from one consistent state to another after the
transaction.
• Isolation: Transactions are isolated from each other; changes made by one transaction are
not visible to others until committed.
• Durability: Once a transaction is committed, its changes are permanent, even if there’s a
system failure.
SQL transactions are controlled using the following commands:
• BEGIN: Starts a transaction.
• COMMIT: Commits the changes made by the transaction to the database.
• ROLLBACK: Rolls back (undoes) all changes made in the current transaction.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE accounts (
account_id INT PRIMARY KEY,
account_name VARCHAR(100),
balance DECIMAL(10, 2)
);
• - Inserting initial data
INSERT INTO accounts (account_id, account_name, balance)
VALUES
(1, 'Alice', 1000.00),
(2, 'Bob', 500.00);

Learnings:
• Transactions ensure data consistency and integrity.
• The ACID properties are critical for maintaining the reliability of database operations.
• BEGIN, COMMIT, and ROLLBACK are essential SQL commands for controlling
transactions.

Solutions
• - Example of a Transaction
BEGIN;

-- Deducting money from Alice's account
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;

-- Adding money to Bob's account
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

-- If everything is fine, commit the transaction
COMMIT;

-- In case of an error, issue ROLLBACK instead of COMMIT:
-- ROLLBACK;

Key Points:
• BEGIN starts a new transaction.
• COMMIT saves changes made during the transaction.


• ROLLBACK undoes changes if an error occurs or if you want to discard the transaction.
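The BEGIN/COMMIT/ROLLBACK flow maps directly onto Python's sqlite3 API, where commit() and rollback() end the implicit transaction (a sketch of the Alice-to-Bob transfer, not tied to any particular server):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, account_name TEXT, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", [(1, 'Alice', 1000.0), (2, 'Bob', 500.0)])
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE account_id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE account_id = 2")
    conn.commit()        # COMMIT: both updates become permanent together
except sqlite3.Error:
    conn.rollback()      # ROLLBACK: on error, neither update survives

balances = conn.execute("SELECT account_name, balance FROM accounts ORDER BY account_id").fetchall()
print(balances)  # [('Alice', 900.0), ('Bob', 600.0)]
```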
• Q.789

Question:
Describe the different normalization forms and their purposes.

Explanation:
Normalization is the process of organizing a database to reduce redundancy and dependency.
The goal is to ensure that the data is logically stored, efficient, and easy to maintain. There
are several normal forms (NF), each addressing specific types of issues in database design.
• First Normal Form (1NF):
• Purpose: Ensures that the table only contains atomic (indivisible) values and that each
column has a unique name.
• Requirement: Each record must be unique, and each column must contain atomic values
(i.e., no repeating groups or arrays).
Example (1NF):
CREATE TABLE orders (
order_id INT,
customer_name VARCHAR(100),
product_names VARCHAR(255) -- This violates 1NF, as multiple product names are stored in one column.
);

Corrected for 1NF:


CREATE TABLE orders (
order_id INT,
customer_name VARCHAR(100),
product_name VARCHAR(100) -- Now each product name is stored in a separate row
);
• Second Normal Form (2NF):
• Purpose: Eliminates partial dependencies, i.e., all non-key attributes must depend on the
entire primary key.
• Requirement: The table must already be in 1NF. If the table has a composite primary key,
all non-key attributes must depend on the full primary key, not just part of it.
Example (2NF):
CREATE TABLE order_details (
order_id INT,
product_id INT,
product_name VARCHAR(100), -- This is partially dependent on product_id
quantity INT,
PRIMARY KEY (order_id, product_id)
);

Corrected for 2NF:


CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100)
);

CREATE TABLE order_details (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (product_id) REFERENCES products (product_id)
);


• Third Normal Form (3NF):


• Purpose: Eliminates transitive dependencies, i.e., non-key attributes should not depend on
other non-key attributes.
• Requirement: The table must already be in 2NF. All non-key attributes must depend only
on the primary key, and not on other non-key attributes.
Example (3NF):
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
department_id INT,
department_name VARCHAR(100) -- This violates 3NF as department_name depends on department_id, not on employee_id
);

Corrected for 3NF:


CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(100)
);

CREATE TABLE employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
department_id INT,
FOREIGN KEY (department_id) REFERENCES departments (department_id)
);
• Boyce-Codd Normal Form (BCNF):
• Purpose: Strengthens 3NF by addressing situations where a table has a non-trivial
dependency that violates the key rules.
• Requirement: The table must already be in 3NF. In BCNF, for every functional
dependency, the left-hand side must be a superkey (a candidate key or a combination of
columns that uniquely identifies a row).
Example (BCNF):
CREATE TABLE courses (
course_id INT,
instructor_id INT,
instructor_name VARCHAR(100),
PRIMARY KEY (course_id, instructor_id)
);

In this table, instructor_name is dependent on instructor_id, but instructor_id is not a


superkey. This violates BCNF.
Corrected for BCNF:
CREATE TABLE instructors (
instructor_id INT PRIMARY KEY,
instructor_name VARCHAR(100)
);

CREATE TABLE courses (
course_id INT PRIMARY KEY,
instructor_id INT,
FOREIGN KEY (instructor_id) REFERENCES instructors (instructor_id)
);

Learnings:
• 1NF: Ensures atomicity of columns (no repeating groups).
• 2NF: Eliminates partial dependencies on a composite primary key.


• 3NF: Removes transitive dependencies, ensuring no non-key attributes depend on other non-key attributes.
• BCNF: A stricter version of 3NF, where every determinant is a superkey.

Solutions:
• - SQL for Normalization Forms
-- 1NF Example (Corrected)
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_name VARCHAR(100),
product_name VARCHAR(100)
);

-- 2NF Example (Corrected)
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100)
);

CREATE TABLE order_details (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (product_id) REFERENCES products (product_id)
);

-- 3NF Example (Corrected)
CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(100)
);

CREATE TABLE employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
department_id INT,
FOREIGN KEY (department_id) REFERENCES departments (department_id)
);

-- BCNF Example (Corrected)
CREATE TABLE instructors (
instructor_id INT PRIMARY KEY,
instructor_name VARCHAR(100)
);

CREATE TABLE courses (
course_id INT PRIMARY KEY,
instructor_id INT,
FOREIGN KEY (instructor_id) REFERENCES instructors (instructor_id)
);
• Q.790

Question:
What is a view in SQL, and how is it different from a table? Provide an example of how to
create and use a view.

Explanation:
A view in SQL is a virtual table that consists of a stored query result. It does not store data
physically but retrieves data from underlying tables when queried. Views can simplify
complex queries, provide a layer of security (by restricting access to specific columns or
rows), and present a customized perspective of the data.


• Difference from a Table: A table stores data physically, while a view stores the SQL
query to retrieve data.
• Use of Views: Views are typically used to simplify complex joins, aggregate data, or
present a specific subset of data to the user.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
department_id INT,
salary DECIMAL(10, 2)
);

CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(100)
);
• - Sample data insertion
INSERT INTO employees (employee_id, employee_name, department_id, salary)
VALUES
(1, 'Amit', 1, 60000),
(2, 'Priya', 2, 70000),
(3, 'Ravi', 1, 75000);

INSERT INTO departments (department_id, department_name)
VALUES
(1, 'Engineering'),
(2, 'HR');

Learnings:
• A view is a stored query and does not store data itself.
• Views are useful for simplifying complex queries and enhancing security.
• Unlike tables, views can represent data from multiple tables or provide a filtered subset.

Solutions
• - Creating a View
CREATE VIEW employee_salary_view AS
SELECT e.employee_id, e.employee_name, d.department_name, e.salary
FROM employees e
JOIN departments d ON e.department_id = d.department_id;
• - Querying a View
SELECT * FROM employee_salary_view;
• - Dropping a View
DROP VIEW IF EXISTS employee_salary_view;
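That a view stores a query rather than data can be verified with sqlite3: updating a base table immediately changes what the view returns (an illustrative sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INT, employee_name TEXT, department_id INT, salary REAL);
CREATE TABLE departments (department_id INT, department_name TEXT);
INSERT INTO employees VALUES (1,'Amit',1,60000),(2,'Priya',2,70000),(3,'Ravi',1,75000);
INSERT INTO departments VALUES (1,'Engineering'),(2,'HR');
CREATE VIEW employee_salary_view AS
SELECT e.employee_name, d.department_name, e.salary
FROM employees e JOIN departments d ON e.department_id = d.department_id;
""")
rows = conn.execute(
    "SELECT employee_name, department_name FROM employee_salary_view ORDER BY employee_name").fetchall()
print(rows)  # [('Amit', 'Engineering'), ('Priya', 'HR'), ('Ravi', 'Engineering')]

# No data was copied: changing the base table changes the view's result
conn.execute("UPDATE employees SET department_id = 2 WHERE employee_name = 'Ravi'")
ravi_dept = conn.execute(
    "SELECT department_name FROM employee_salary_view WHERE employee_name = 'Ravi'").fetchone()[0]
print(ravi_dept)  # HR
```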

Data Engineer
• Q.791
Question
Design a database schema for a stand-alone fast food restaurant.
Follow-up: Write a query to find the top three items by revenue and the percentage of
customers who order drinks with their meals.
Explanation
The task is to design a database schema for a fast food restaurant with tables that include
customers, orders, items, and transactions. After the schema is designed, write a query that


identifies the top three items by revenue and calculates the percentage of customers who
order drinks along with their meals.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE Customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100),
email VARCHAR(100)
);

CREATE TABLE MenuItems (
item_id INT PRIMARY KEY,
item_name VARCHAR(100),
price DECIMAL(10, 2),
category VARCHAR(50) -- e.g., 'meal', 'drink', 'side'
);

CREATE TABLE Orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);

CREATE TABLE OrderItems (
order_item_id INT PRIMARY KEY,
order_id INT,
item_id INT,
quantity INT,
FOREIGN KEY (order_id) REFERENCES Orders(order_id),
FOREIGN KEY (item_id) REFERENCES MenuItems(item_id)
);
• - Sample datasets
-- Inserting sample data into Customers table
INSERT INTO Customers (customer_id, customer_name, email)
VALUES
(1, 'John Doe', '[email protected]'),
(2, 'Jane Smith', '[email protected]');

-- Inserting sample data into MenuItems table
INSERT INTO MenuItems (item_id, item_name, price, category)
VALUES
(1, 'Burger', 5.99, 'meal'),
(2, 'Fries', 2.49, 'side'),
(3, 'Cola', 1.99, 'drink'),
(4, 'Water', 0.99, 'drink');

-- Inserting sample data into Orders table
INSERT INTO Orders (order_id, customer_id, order_date)
VALUES
(1, 1, '2025-01-10'),
(2, 2, '2025-01-11');

-- Inserting sample data into OrderItems table
INSERT INTO OrderItems (order_item_id, order_id, item_id, quantity)
VALUES
(1, 1, 1, 2),
(2, 1, 3, 1),
(3, 2, 4, 1),
(4, 2, 1, 1);

Learnings
• Understanding database normalization for restaurant-related entities (customers, orders,
menu items).
• Using aggregate functions like SUM() for revenue calculations.
• Utilizing JOIN operations to combine data from multiple tables.


• Using COUNT() and GROUP BY for customer-based statistics.


Solutions
• - PostgreSQL solution
-- Top 3 items by revenue
SELECT mi.item_name, SUM(oi.quantity * mi.price) AS total_revenue
FROM OrderItems oi
JOIN MenuItems mi ON oi.item_id = mi.item_id
GROUP BY mi.item_name
ORDER BY total_revenue DESC
LIMIT 3;

-- Percentage of customers who order drinks
SELECT
(COUNT(DISTINCT CASE WHEN mi.category = 'drink' THEN o.customer_id END) * 100.0) / COUNT(DISTINCT o.customer_id) AS percentage_of_customers_with_drinks
FROM Orders o
JOIN OrderItems oi ON o.order_id = oi.order_id
JOIN MenuItems mi ON oi.item_id = mi.item_id;
• - MySQL solution
-- Top 3 items by revenue
SELECT mi.item_name, SUM(oi.quantity * mi.price) AS total_revenue
FROM OrderItems oi
JOIN MenuItems mi ON oi.item_id = mi.item_id
GROUP BY mi.item_name
ORDER BY total_revenue DESC
LIMIT 3;

-- Percentage of customers who order drinks
SELECT
(COUNT(DISTINCT CASE WHEN mi.category = 'drink' THEN o.customer_id END) * 100.0) / COUNT(DISTINCT o.customer_id) AS percentage_of_customers_with_drinks
FROM Orders o
JOIN OrderItems oi ON o.order_id = oi.order_id
JOIN MenuItems mi ON oi.item_id = mi.item_id;
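Running the two follow-up queries on the sample data with sqlite3 confirms the expected numbers (a sketch; ROUND() is added so floating-point revenue compares cleanly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MenuItems (item_id INT, item_name TEXT, price REAL, category TEXT);
CREATE TABLE Orders (order_id INT, customer_id INT, order_date DATE);
CREATE TABLE OrderItems (order_item_id INT, order_id INT, item_id INT, quantity INT);
INSERT INTO MenuItems VALUES (1,'Burger',5.99,'meal'),(2,'Fries',2.49,'side'),
                             (3,'Cola',1.99,'drink'),(4,'Water',0.99,'drink');
INSERT INTO Orders VALUES (1,1,'2025-01-10'),(2,2,'2025-01-11');
INSERT INTO OrderItems VALUES (1,1,1,2),(2,1,3,1),(3,2,4,1),(4,2,1,1);
""")
top = conn.execute("""
SELECT mi.item_name, ROUND(SUM(oi.quantity * mi.price), 2) AS total_revenue
FROM OrderItems oi JOIN MenuItems mi ON oi.item_id = mi.item_id
GROUP BY mi.item_name ORDER BY total_revenue DESC LIMIT 3
""").fetchall()
print(top)  # [('Burger', 17.97), ('Cola', 1.99), ('Water', 0.99)]

pct = conn.execute("""
SELECT (COUNT(DISTINCT CASE WHEN mi.category = 'drink' THEN o.customer_id END) * 100.0)
       / COUNT(DISTINCT o.customer_id)
FROM Orders o
JOIN OrderItems oi ON o.order_id = oi.order_id
JOIN MenuItems mi ON oi.item_id = mi.item_id
""").fetchone()[0]
print(pct)  # 100.0 -- both sample customers ordered a drink
```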
• Q.792
Question
Design a database schema to track how long each car takes to enter and exit the Golden Gate
Bridge.
Follow-up: Write a query to get the time of the fastest car on the current day.
Explanation
The task is to design a schema to monitor car entry and exit times on the Golden Gate Bridge.
You need to store details about vehicles, their entry and exit timestamps, and then write a
query to find the fastest car on a given day.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE Vehicles (
vehicle_id INT PRIMARY KEY,
license_plate VARCHAR(50),
vehicle_type VARCHAR(50)
);

CREATE TABLE BridgeCrossing (
crossing_id INT PRIMARY KEY,
vehicle_id INT,
entry_time TIMESTAMP,
exit_time TIMESTAMP,
FOREIGN KEY (vehicle_id) REFERENCES Vehicles(vehicle_id)
);
• - Sample datasets


-- Inserting sample data into Vehicles table
INSERT INTO Vehicles (vehicle_id, license_plate, vehicle_type)
VALUES
(1, 'ABC123', 'car'),
(2, 'XYZ456', 'truck'),
(3, 'LMN789', 'car');

-- Inserting sample data into BridgeCrossing table
INSERT INTO BridgeCrossing (crossing_id, vehicle_id, entry_time, exit_time)
VALUES
(1, 1, '2025-01-10 08:00:00', '2025-01-10 08:15:00'),
(2, 2, '2025-01-10 09:00:00', '2025-01-10 09:30:00'),
(3, 3, '2025-01-10 10:00:00', '2025-01-10 10:05:00');

Learnings
• Using TIMESTAMP for tracking precise entry and exit times.
• Calculating time differences using EXTRACT(EPOCH FROM ...) in PostgreSQL or
TIMESTAMPDIFF() in MySQL.
• Using aggregate functions and ORDER BY to find the fastest car.
Solutions
• - PostgreSQL solution
-- Query to get the time of the fastest car on the current day
SELECT v.license_plate,
(EXTRACT(EPOCH FROM (bc.exit_time - bc.entry_time)) / 60) AS time_minutes
FROM BridgeCrossing bc
JOIN Vehicles v ON bc.vehicle_id = v.vehicle_id
WHERE bc.entry_time::DATE = CURRENT_DATE
ORDER BY time_minutes ASC
LIMIT 1;
• - MySQL solution
-- Query to get the time of the fastest car on the current day
SELECT v.license_plate,
(TIMESTAMPDIFF(SECOND, bc.entry_time, bc.exit_time) / 60) AS time_minutes
FROM BridgeCrossing bc
JOIN Vehicles v ON bc.vehicle_id = v.vehicle_id
WHERE DATE(bc.entry_time) = CURDATE()
ORDER BY time_minutes ASC
LIMIT 1;
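The crossing-time calculation can be reproduced with sqlite3, where julianday() differences substitute for EXTRACT(EPOCH ...) and TIMESTAMPDIFF(); the date is fixed instead of CURRENT_DATE so the sketch is reproducible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Vehicles (vehicle_id INT, license_plate TEXT, vehicle_type TEXT);
CREATE TABLE BridgeCrossing (crossing_id INT, vehicle_id INT, entry_time TEXT, exit_time TEXT);
INSERT INTO Vehicles VALUES (1,'ABC123','car'),(2,'XYZ456','truck'),(3,'LMN789','car');
INSERT INTO BridgeCrossing VALUES
 (1,1,'2025-01-10 08:00:00','2025-01-10 08:15:00'),
 (2,2,'2025-01-10 09:00:00','2025-01-10 09:30:00'),
 (3,3,'2025-01-10 10:00:00','2025-01-10 10:05:00');
""")
fastest = conn.execute("""
SELECT v.license_plate,
       (julianday(bc.exit_time) - julianday(bc.entry_time)) * 24 * 60 AS time_minutes
FROM BridgeCrossing bc JOIN Vehicles v ON bc.vehicle_id = v.vehicle_id
WHERE DATE(bc.entry_time) = '2025-01-10'   -- stands in for CURRENT_DATE
ORDER BY time_minutes ASC
LIMIT 1
""").fetchone()
print(fastest[0], round(fastest[1]))  # LMN789 5
```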
• Q.793

Question:
Explain SQL injection and how to prevent it.

Explanation:
SQL injection is a security vulnerability that occurs when an attacker can manipulate a SQL
query by injecting malicious SQL code into user inputs. This can allow unauthorized access,
modification, or deletion of data in the database. To prevent SQL injection, always use
parameterized queries (prepared statements) to safely bind user inputs and validate inputs to
ensure they conform to expected formats.

Learnings:
• SQL injection exploits user inputs to manipulate SQL queries.
• Using parameterized queries ensures user inputs are treated as data, not executable code.
• Input validation helps ensure that only valid data is processed, preventing harmful inputs.

Solutions:


• Parameterized Queries (Prepared Statements):


• PostgreSQL (using pg library in Node.js as an example):
SELECT * FROM users WHERE username = $1 AND password = $2;
• The $1 and $2 are placeholders for user inputs, and the values are bound at execution time,
preventing injection.
• MySQL (using mysqli library in PHP as an example):
$stmt = $mysqli->prepare("SELECT * FROM users WHERE username = ? AND password = ?");
$stmt->bind_param("ss", $username, $password);
$stmt->execute();
• Input Validation:
• Ensure inputs conform to expected types (e.g., email addresses, dates, integers).
• Use regex or specific functions to validate inputs:
• Email validation:
import re
email = "[email protected]"
if re.match(r"[^@]+@[^@]+\.[^@]+", email):
print("Valid email!")
else:
print("Invalid email!")
• Integer validation:
if isinstance(user_input, int):
print("Valid integer!")
else:
print("Invalid input!")
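A side-by-side sketch in Python's sqlite3 makes the difference concrete: concatenating input into the SQL string lets a classic payload rewrite the query, while a ? placeholder treats the same payload as plain data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

malicious = "' OR '1'='1"   # classic injection payload

# UNSAFE: the payload becomes part of the SQL text and changes its logic
unsafe_sql = "SELECT COUNT(*) FROM users WHERE username = '" + malicious + "'"
unsafe_count = conn.execute(unsafe_sql).fetchone()[0]
print(unsafe_count)  # 1 -- the injected OR '1'='1' matched every row

# SAFE: a parameterized query binds the payload as a value, not as SQL
safe_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE username = ?", (malicious,)).fetchone()[0]
print(safe_count)  # 0 -- no user is literally named "' OR '1'='1"
```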
• Q.794
Question
How would you create a schema to represent client click data on the web?
Explanation
The task is to design a schema to store client click data, which tracks user interactions on a
website. You need tables to store information about users, pages visited, clicks, and relevant
session data.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE Users (
user_id INT PRIMARY KEY,
user_agent VARCHAR(255),
ip_address VARCHAR(50)
);

CREATE TABLE Pages (
page_id INT PRIMARY KEY,
page_url VARCHAR(255),
page_title VARCHAR(255)
);

CREATE TABLE Clicks (
click_id INT PRIMARY KEY,
user_id INT,
page_id INT,
click_time TIMESTAMP,
click_position VARCHAR(50), -- e.g., "header", "footer", "sidebar"
FOREIGN KEY (user_id) REFERENCES Users(user_id),
FOREIGN KEY (page_id) REFERENCES Pages(page_id)
);

CREATE TABLE Sessions (
session_id INT PRIMARY KEY,
user_id INT,
session_start TIMESTAMP,


session_end TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
• - Sample datasets
-- Inserting sample data into Users table
INSERT INTO Users (user_id, user_agent, ip_address)
VALUES
(1, 'Mozilla/5.0', '192.168.1.1'),
(2, 'Chrome/91.0', '192.168.1.2');

-- Inserting sample data into Pages table
INSERT INTO Pages (page_id, page_url, page_title)
VALUES
(1, '/home', 'Home Page'),
(2, '/product', 'Product Page');

-- Inserting sample data into Clicks table
INSERT INTO Clicks (click_id, user_id, page_id, click_time, click_position)
VALUES
(1, 1, 1, '2025-01-10 10:00:00', 'header'),
(2, 2, 2, '2025-01-10 11:30:00', 'footer');

-- Inserting sample data into Sessions table
INSERT INTO Sessions (session_id, user_id, session_start, session_end)
VALUES
(1, 1, '2025-01-10 09:50:00', '2025-01-10 10:30:00'),
(2, 2, '2025-01-10 11:00:00', '2025-01-10 12:00:00');

Learnings
• Using foreign keys to establish relationships between users, pages, clicks, and sessions.
• Tracking detailed click data including the position of clicks on a page.
• Storing user agent and IP address information for session tracking.
• Utilizing TIMESTAMP for accurate recording of event times.
Solutions
• - PostgreSQL solution
-- Query to get the total number of clicks per page
SELECT p.page_url, COUNT(c.click_id) AS total_clicks
FROM Clicks c
JOIN Pages p ON c.page_id = p.page_id
GROUP BY p.page_url;
• - MySQL solution
-- Query to get the total number of clicks per page
SELECT p.page_url, COUNT(c.click_id) AS total_clicks
FROM Clicks c
JOIN Pages p ON c.page_id = p.page_id
GROUP BY p.page_url;
• Q.795
Question
Explain how you would perform an ETL (Extract, Transform, Load) process using SQL.
Explanation
The task is to describe how to implement an ETL process using SQL. The process involves
extracting data from a source, transforming it into the desired format, and then loading it into
the destination database. You will use SQL queries for each step to manipulate the data and
ensure it’s correctly loaded into the target system.
Datasets and SQL Schemas
• - Table creation
-- Source table (raw data)
CREATE TABLE source_data (
id INT PRIMARY KEY,
customer_name VARCHAR(255),
purchase_amount DECIMAL(10, 2),
purchase_date DATE
);

-- Destination table (cleaned data)
CREATE TABLE target_data (
id INT PRIMARY KEY,
customer_name VARCHAR(255),
total_purchase DECIMAL(10, 2),
purchase_count INT
);
• - Sample datasets
-- Inserting sample data into source_data table
INSERT INTO source_data (id, customer_name, purchase_amount, purchase_date)
VALUES
(1, 'John Doe', 100.50, '2025-01-01'),
(2, 'Jane Smith', 200.75, '2025-01-02'),
(3, 'John Doe', 150.25, '2025-01-03');

-- Inserting sample data into target_data table (initially empty)
INSERT INTO target_data (id, customer_name, total_purchase, purchase_count)
VALUES
(1, 'John Doe', 250.75, 2),
(2, 'Jane Smith', 200.75, 1);

Learnings
• The Extract phase involves fetching data from a source, often a raw or transactional
system.
• The Transform phase includes cleaning and aggregating data, such as calculating totals or
grouping by categories.
• The Load phase moves the transformed data into a target table, ensuring the structure
matches the target schema.
• Using JOIN, GROUP BY, and INSERT INTO statements in SQL to manipulate and load data.
Solutions
• - PostgreSQL solution
-- ETL Process in SQL

-- Extract: Fetch data from the source
WITH extracted_data AS (
SELECT customer_name, SUM(purchase_amount) AS total_purchase, COUNT(id) AS purchase_count
FROM source_data
GROUP BY customer_name
)

-- Transform: Clean and aggregate the data (already done in the CTE above)

-- Load: Insert transformed data into the target table
INSERT INTO target_data (customer_name, total_purchase, purchase_count)
SELECT customer_name, total_purchase, purchase_count
FROM extracted_data;
• - MySQL solution
-- ETL Process in SQL

-- Extract: Fetch data from the source
WITH extracted_data AS (
SELECT customer_name, SUM(purchase_amount) AS total_purchase, COUNT(id) AS purchase_count
FROM source_data
GROUP BY customer_name
)

-- Transform: Clean and aggregate the data (already done in the CTE above)


-- Load: Insert transformed data into the target table
INSERT INTO target_data (customer_name, total_purchase, purchase_count)
SELECT customer_name, total_purchase, purchase_count
FROM extracted_data;
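The whole extract-transform-load round trip can be exercised with sqlite3; an INSERT ... SELECT plays the role of the CTE-based load (a sketch on the sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_data (id INT PRIMARY KEY, customer_name TEXT,
                          purchase_amount REAL, purchase_date DATE);
CREATE TABLE target_data (customer_name TEXT, total_purchase REAL, purchase_count INT);
INSERT INTO source_data VALUES
 (1,'John Doe',100.50,'2025-01-01'),
 (2,'Jane Smith',200.75,'2025-01-02'),
 (3,'John Doe',150.25,'2025-01-03');
""")
# Extract + Transform (aggregate per customer) + Load into the target table
conn.execute("""
INSERT INTO target_data (customer_name, total_purchase, purchase_count)
SELECT customer_name, ROUND(SUM(purchase_amount), 2), COUNT(id)
FROM source_data
GROUP BY customer_name
""")
loaded = conn.execute("SELECT * FROM target_data ORDER BY customer_name").fetchall()
print(loaded)  # [('Jane Smith', 200.75, 1), ('John Doe', 250.75, 2)]
```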
• Q.796
Question
What are the differences between the DELETE and TRUNCATE commands in SQL?
Explanation
The task is to highlight the key differences between the DELETE and TRUNCATE commands in
SQL. Both are used to remove data from a table, but they operate in different ways, with
different performance characteristics and behaviors.
Datasets and SQL Schemas
(Example of a simple table for illustration)
• - Table creation
CREATE TABLE Employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
department VARCHAR(50)
);
• - Sample datasets
INSERT INTO Employees (employee_id, employee_name, department)
VALUES
(1, 'John Doe', 'HR'),
(2, 'Jane Smith', 'Finance'),
(3, 'Michael Brown', 'IT');

Learnings
• DELETE: Removes rows one at a time and logs each deletion in the transaction log. Can
be rolled back, slower for large datasets.
• TRUNCATE: Removes all rows in the table without logging individual row deletions.
Faster but cannot be rolled back (unless in a transaction).
• DELETE can have conditions (WHERE clause), whereas TRUNCATE removes all rows
from the table.
• TRUNCATE may reset identity columns, while DELETE does not.
• TRUNCATE is a DDL command, whereas DELETE is a DML command.
Solutions
• - PostgreSQL solution
-- DELETE example (removes specific rows)
DELETE FROM Employees WHERE department = 'HR';

-- TRUNCATE example (removes all rows from the table)


TRUNCATE TABLE Employees;
• - MySQL solution
-- DELETE example (removes specific rows)
DELETE FROM Employees WHERE department = 'HR';

-- TRUNCATE example (removes all rows from the table)


TRUNCATE TABLE Employees;
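Two of the listed differences, that DELETE is filterable and transactional, can be demonstrated with sqlite3. SQLite itself has no TRUNCATE command, so an unqualified DELETE stands in for it here; this is an illustration of the DELETE side only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (employee_id INT, employee_name TEXT, department TEXT)")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)",
                 [(1, 'John Doe', 'HR'), (2, 'Jane Smith', 'Finance'), (3, 'Michael Brown', 'IT')])
conn.commit()

# DELETE is DML: it runs inside a transaction and can be rolled back
conn.execute("DELETE FROM Employees WHERE department = 'HR'")
conn.rollback()
after_rollback = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
print(after_rollback)  # 3 -- the deletion was undone

# Unqualified DELETE removes every row (SQLite's stand-in for TRUNCATE)
conn.execute("DELETE FROM Employees")
conn.commit()
after_delete = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
print(after_delete)  # 0
```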
• Q.797
Question


What are the differences between OLAP (Online Analytical Processing) and OLTP (Online
Transaction Processing)?
Explanation
The task is to compare OLAP and OLTP systems, highlighting their primary use cases, data
models, and performance characteristics.
Learnings
• OLAP: Used for complex queries and analytics, typically in data warehouses. It supports
multidimensional analysis and is optimized for read-heavy operations (e.g., aggregations,
reporting).
• OLTP: Focused on transaction processing, handling day-to-day operations such as order
processing, inventory management, and user activities. It is optimized for fast, reliable insert,
update, and delete operations.
• OLAP systems typically involve star or snowflake schemas and deal with large amounts of
historical data, whereas OLTP systems use normalized schemas to ensure data integrity and
fast transaction processing.
• OLAP databases are usually read-heavy, while OLTP databases are write-heavy.
Key Differences

• Purpose: OLAP is used for analytical processing, reporting, and querying large data sets; OLTP handles daily operations (transactions).
• Data Volume: OLAP works on large, historical datasets for analysis; OLTP works on smaller, current datasets with frequent updates.
• Query Complexity: OLAP runs complex queries with aggregations and joins; OLTP runs simple queries with frequent inserts and updates.
• Data Modeling: OLAP uses star or snowflake schemas (denormalized); OLTP uses entity-relationship (normalized) schemas.
• Transactions: OLAP generally has no transactional consistency (read-only); OLTP enforces strong transactional consistency (ACID properties).
• Performance: OLAP is optimized for read-heavy operations (large scans); OLTP is optimized for high-speed insert/update/delete.
• Examples: OLAP covers data warehousing and business intelligence applications; OLTP covers e-commerce platforms, banking systems, and CRM.
Solutions
• - OLAP query example (aggregating data)
SELECT department, SUM(sales_amount) AS total_sales


FROM sales_data
GROUP BY department
ORDER BY total_sales DESC;
• - OLTP query example (inserting a transaction)
INSERT INTO orders (order_id, customer_id, product_id, quantity, order_date)
VALUES (101, 5, 3, 2, '2025-01-15');
• Q.798
Question
Explain the ACID properties in database systems.
Explanation
The ACID properties ensure that database transactions are processed reliably and adhere to
specific rules that guarantee data integrity and consistency, even in the face of system failures
or errors. ACID stands for Atomicity, Consistency, Isolation, and Durability.
Learnings
• Atomicity: Guarantees that all operations within a transaction are completed successfully.
If any operation fails, the entire transaction is rolled back.
• Consistency: Ensures that a transaction brings the database from one valid state to another,
maintaining all rules, constraints, and triggers.
• Isolation: Ensures that transactions are executed independently of one another. Even if
multiple transactions are occurring concurrently, each one will be isolated from others until it
is complete.
• Durability: Once a transaction is committed, it is permanent. Even in the event of a system
crash, the results of the transaction are preserved.
Solutions
• Atomicity:
Example: In a banking system, if a transfer involves debiting one account and crediting
another, both operations must succeed. If one fails, the whole transaction is rolled back (i.e.,
neither account is modified).
• Consistency:
Example: A transaction that deducts money from one account and adds it to another must
maintain the integrity of the account balances. If the transaction violates a constraint (e.g.,
negative balance), it is rolled back.
• Isolation:
Example: If two customers try to withdraw money from the same account simultaneously, the
transactions are isolated to prevent one from affecting the other. One transaction may have to
wait until the other finishes, ensuring correctness.
• Durability:
Example: Once a transaction that updates an order status is committed, even if the system
crashes right after, the status update will persist once the system is restored.
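Atomicity in the banking example can be demonstrated concretely. The sketch below is an illustration only (the accounts table and amounts are made up): a CHECK constraint makes the debit fail, and the rollback leaves both balances untouched, using SQLite through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical accounts table; the CHECK constraint forbids negative balances
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

try:
    # Transfer of 200: the debit would make account 1 negative and fails
    conn.execute("UPDATE accounts SET balance = balance - 200 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 200 WHERE id = 2")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()  # atomicity: neither account is modified

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
assert balances == {1: 100, 2: 50}
```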
• Q.799
Question
What is Partitioning in databases, and why is it important for query performance?
Explanation


The task is to explain database partitioning, which involves splitting a large table into
smaller, more manageable pieces (partitions). Partitioning helps improve query performance,
data management, and scalability by allowing operations to target specific subsets of data.
Learnings
• Partitioning: Splitting a table into smaller, distinct pieces (partitions) based on certain key
columns (e.g., date, region, etc.).
• It improves query performance by limiting the amount of data to scan during queries (e.g.,
partitioning by date allows querying only the relevant time period).
• Partitioning enhances data management, enabling better load balancing, parallel
processing, and more efficient data access.
• Types of Partitioning:
• Range Partitioning: Partitioned by a range of values (e.g., date ranges).
• List Partitioning: Partitioned by a specific list of values (e.g., regions, countries).
• Hash Partitioning: Data is divided based on a hash function applied to a column value.
Example
A large sales table can be partitioned by year, so each year’s data is stored in a separate
partition, improving query performance when filtering by date.
Solutions
• - Example of Range Partitioning (PostgreSQL)
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
sale_date DATE,
amount DECIMAL
)
PARTITION BY RANGE (sale_date);

-- Note: in PostgreSQL, the upper (TO) bound of a range partition is exclusive,
-- so use the first day of the next year to include December 31.
CREATE TABLE sales_2021 PARTITION OF sales FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
CREATE TABLE sales_2022 PARTITION OF sales FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
• - Example of List Partitioning (MySQL)
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
department VARCHAR(50),
salary DECIMAL(10, 2)
)
PARTITION BY LIST (department) (
PARTITION hr VALUES IN ('HR'),
PARTITION engineering VALUES IN ('Engineering'),
PARTITION finance VALUES IN ('Finance')
);
• Q.800
Question
How do you optimize SQL queries for better performance?
Explanation
The task is to explain how to optimize SQL queries to improve their execution speed, reduce
resource usage, and enhance overall database performance. Query optimization is crucial for
handling large datasets, complex queries, and ensuring faster response times.
Learnings


• Indexing: Create indexes on columns that are frequently queried (e.g., in WHERE, JOIN, or
ORDER BY clauses) to speed up data retrieval.
• Avoiding SELECT *: Select only the necessary columns rather than using SELECT * to
reduce the amount of data returned.
• Using Joins Efficiently: Use appropriate JOIN types (e.g., INNER JOIN, LEFT JOIN) and
ensure the join condition is indexed to minimize the number of rows processed.
• Query Refactoring: Rewrite complex queries to break them into smaller, more efficient
parts, using temporary tables or subqueries when necessary.
• Using LIMIT: When dealing with large result sets, use LIMIT to restrict the number of
rows returned.
• Proper Data Types: Use the smallest possible data types for your columns (e.g., using INT
instead of BIGINT when possible).
• Query Execution Plan: Use EXPLAIN (MySQL, PostgreSQL) or EXPLAIN PLAN (Oracle) to
analyze the query execution plan and identify bottlenecks; PostgreSQL's EXPLAIN ANALYZE
also reports actual run times.
• Avoiding N+1 Problem: Use JOIN or IN to avoid making multiple queries in loops (N+1
queries).
Solutions
• - Example of Using Indexing (PostgreSQL/MySQL)
-- Create an index on a frequently queried column
CREATE INDEX idx_employee_name ON employees (employee_name);
• - Refactoring a Complex Query to Use JOIN Instead of Subqueries
-- Inefficient query with subquery
SELECT name, (SELECT department FROM departments WHERE employee_id = employees.id) AS de
pt
FROM employees;

-- Optimized query with JOIN


SELECT e.name, d.department
FROM employees e
JOIN departments d ON e.id = d.employee_id;
• - Using EXPLAIN to Analyze Query Performance (PostgreSQL)
EXPLAIN ANALYZE
SELECT employee_name, salary
FROM employees
WHERE department = 'HR';
• - Using LIMIT to Restrict Result Set (PostgreSQL/MySQL)
SELECT * FROM employees
ORDER BY hire_date DESC
LIMIT 100;

Key Takeaways
• Indexing improves data retrieval speed.
• Use JOIN instead of subqueries when possible.
• Limit the amount of data retrieved with LIMIT and specific column selection.
• Analyze the query execution plan to identify inefficiencies.
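The effect of an index on the execution plan can be observed directly. The sketch below is an assumption-laden illustration: it uses SQLite's EXPLAIN QUERY PLAN (playing the role of MySQL/PostgreSQL EXPLAIN) via Python's sqlite3 module, and the table is made up.

```python
import sqlite3

# Hypothetical employees table, initially without any secondary index
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, employee_name TEXT, department TEXT)")

query = "EXPLAIN QUERY PLAN SELECT * FROM employees WHERE employee_name = 'Ann'"
before = conn.execute(query).fetchone()[3]  # plan detail: a full table scan

conn.execute("CREATE INDEX idx_employee_name ON employees (employee_name)")
after = conn.execute(query).fetchone()[3]   # plan detail: index lookup

assert "SCAN" in before
assert "USING INDEX idx_employee_name" in after
```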

Business Analyst
• Q.801
Question
What is the difference between DELETE TABLE and TRUNCATE TABLE in SQL?
Explanation


DELETE (strictly, DELETE FROM table_name; there is no DELETE TABLE command) and
TRUNCATE TABLE are both used to remove data from a table, but they differ in how they
operate, their impact on the transaction log, and their ability to be rolled back.
• DELETE removes specific rows based on a condition and can be rolled back if wrapped in
a transaction. It logs individual row deletions, which can make it slower for large datasets.
• TRUNCATE removes all rows from a table, does not log individual row deletions, and is
generally faster. It does not allow for row-specific conditions and may not be rolled back in
some database systems unless in a transaction.
Learnings
• DELETE can be rolled back (if not committed), supports WHERE clauses, and operates row-
by-row, generating a more extensive transaction log.
• TRUNCATE cannot selectively delete rows (no WHERE clause), removes all rows quickly,
and has minimal logging, making it faster but less flexible.
• DELETE maintains the table structure and any associated constraints like foreign keys,
while TRUNCATE can reset identity columns, and may not enforce certain constraints
depending on the DBMS.
Key Differences

• Data Deletion: DELETE removes rows one at a time and supports a WHERE clause; TRUNCATE removes all rows and cannot use WHERE.
• Transaction Logging: DELETE logs each row deletion; TRUNCATE logs minimal information.
• Rollback: DELETE can be rolled back if wrapped in a transaction; TRUNCATE cannot be rolled back in many databases unless run inside a transaction.
• Performance: DELETE is slower, especially with large datasets; TRUNCATE is faster.
• Foreign Key Constraints: DELETE does not affect foreign keys unless explicitly handled; TRUNCATE cannot be used if foreign keys reference the table unless they are temporarily removed.
• Reset Identity: DELETE does not reset auto-increment/identity columns; TRUNCATE resets them.
• Trigger Activation: DELETE triggers are activated; TRUNCATE does not activate triggers.
Solutions
• - DELETE example
-- Delete specific rows based on a condition
DELETE FROM employees WHERE department = 'HR';
• - TRUNCATE example


-- Remove all rows from the table


TRUNCATE TABLE employees;
• Q.802
Question
Write an SQL query to select all records of employees with last names between 'Bailey' and
'Frederick'.
Explanation
This task requires selecting records where the last name of employees falls within a specified
alphabetical range. To achieve this, we can use the BETWEEN operator, which helps in filtering
values within a certain range, inclusive of the boundary values.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
first_name VARCHAR(100),
last_name VARCHAR(100),
department VARCHAR(50)
);
• - Sample datasets
INSERT INTO employees (employee_id, first_name, last_name, department)
VALUES
(1, 'John', 'Adams', 'HR'),
(2, 'Jane', 'Bailey', 'Finance'),
(3, 'Michael', 'Frederick', 'IT'),
(4, 'Emily', 'Cook', 'Marketing'),
(5, 'Anna', 'Davis', 'Finance');

Learnings
• The BETWEEN operator is used to filter data within a range (inclusive of the boundary
values).
• Alphabetical ranges with strings are based on the lexicographical (dictionary) order.
• In this case, the query will select all employees with last names starting from 'Bailey' to
'Frederick', including both 'Bailey' and 'Frederick'.
Solution
• - SQL query to select records within the specified range
SELECT * FROM employees
WHERE last_name BETWEEN 'Bailey' AND 'Frederick';

Key Takeaways
• The BETWEEN operator can be used for both numerical and string ranges.
• When used with strings, BETWEEN compares lexicographical order (alphabetical order).
• This query will return all records where the last_name is between 'Bailey' and 'Frederick',
inclusive.
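The inclusive, lexicographic behaviour of BETWEEN can be checked against the sample data from this question. The sketch below runs it in SQLite via Python's sqlite3 module (the driver choice is mine, not the book's).

```python
import sqlite3

# Sample employees from this question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, department TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, 'John', 'Adams', 'HR'),
    (2, 'Jane', 'Bailey', 'Finance'),
    (3, 'Michael', 'Frederick', 'IT'),
    (4, 'Emily', 'Cook', 'Marketing'),
    (5, 'Anna', 'Davis', 'Finance'),
])

rows = conn.execute(
    "SELECT last_name FROM employees WHERE last_name BETWEEN 'Bailey' AND 'Frederick' ORDER BY last_name"
).fetchall()
# 'Adams' sorts before 'Bailey' and is excluded; both boundary names are included
assert [r[0] for r in rows] == ['Bailey', 'Cook', 'Davis', 'Frederick']
```

Note that a longer name such as 'Fredericks' would sort after 'Frederick' and fall outside the range.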
• Q.803
Question
Write an SQL query to find the year from a YYYY-MM-DD date.
Explanation


This task involves extracting the year part from a DATE column stored in the YYYY-MM-DD
format. You can use built-in SQL date functions to extract specific parts of a date, such as
YEAR() (MySQL, SQL Server) or the standard EXTRACT(YEAR FROM ...) (PostgreSQL, Oracle).

Datasets and SQL Schemas


• - Table creation
CREATE TABLE orders (
order_id INT PRIMARY KEY,
order_date DATE,
total_amount DECIMAL(10, 2)
);
• - Sample datasets
INSERT INTO orders (order_id, order_date, total_amount)
VALUES
(1, '2023-06-15', 250.75),
(2, '2024-01-10', 120.50),
(3, '2025-03-21', 300.00);

Learnings
• SQL provides date functions like YEAR(), MONTH(), and DAY() to extract specific parts of a
date.
• Extracting the year is useful for time-based analysis, like grouping orders by year or
filtering by a specific year.
• The YEAR() function works for DATE, DATETIME, and TIMESTAMP columns.
Solution
• - SQL query to extract the year from a DATE (MySQL / SQL Server)
SELECT order_id, order_date, YEAR(order_date) AS order_year
FROM orders;
• - PostgreSQL / standard SQL equivalent
SELECT order_id, order_date, EXTRACT(YEAR FROM order_date) AS order_year
FROM orders;

Key Takeaways
• The YEAR() function extracts the year from a DATE value.
• The query will return the year from each order_date in the orders table.
• The output column order_year will contain only the year part of the date (e.g., 2023,
2024, 2025).
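A runnable version of this extraction, on the sample orders above, is sketched below. SQLite (used here through Python's sqlite3 module) has no YEAR() function, so its strftime('%Y', ...) stands in for it; the portability point is the same.

```python
import sqlite3

# Sample orders from this question; dates stored as ISO-8601 text
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, total_amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, '2023-06-15', 250.75),
    (2, '2024-01-10', 120.50),
    (3, '2025-03-21', 300.00),
])

# strftime('%Y', ...) is SQLite's equivalent of YEAR() / EXTRACT(YEAR FROM ...)
years = [r[0] for r in conn.execute(
    "SELECT CAST(strftime('%Y', order_date) AS INTEGER) FROM orders ORDER BY order_id"
)]
assert years == [2023, 2024, 2025]
```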
• Q.804
Question
Write an SQL query to select the second highest salary in the engineering department.
Explanation
This task involves selecting the second highest salary from the employees table, specifically
for the "Engineering" department. To achieve this, you can use different approaches, such as
using the LIMIT with ORDER BY, or using subqueries with DISTINCT and MAX(). The most
efficient solution often involves using ROW_NUMBER() or a similar window function (if
supported by the database).
Datasets and SQL Schemas
• - Table creation
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
first_name VARCHAR(100),
last_name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10, 2)
);


• - Sample datasets
INSERT INTO employees (employee_id, first_name, last_name, department, salary)
VALUES
(1, 'John', 'Doe', 'Engineering', 80000),
(2, 'Jane', 'Smith', 'Engineering', 95000),
(3, 'Alice', 'Johnson', 'Marketing', 70000),
(4, 'Bob', 'Brown', 'Engineering', 85000),
(5, 'Charlie', 'Davis', 'Engineering', 90000);

Learnings
• You can find the second highest salary using a subquery, DISTINCT, LIMIT, or window
functions (e.g., ROW_NUMBER()).
• The common method involves selecting the maximum salary from the set of salaries that
are less than the highest salary.
Solutions

Using Subquery to Find the Second Highest Salary (Without Window


Functions)
SELECT MAX(salary) AS second_highest_salary
FROM employees
WHERE department = 'Engineering'
  AND salary < (SELECT MAX(salary) FROM employees WHERE department = 'Engineering');

Using ROW_NUMBER() (Preferred for Larger Data or Advanced SQL Databases)


WITH RankedSalaries AS (
SELECT salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees
WHERE department = 'Engineering'
)
SELECT salary AS second_highest_salary
FROM RankedSalaries
WHERE salary_rank = 2;
-- Note: "rank" is avoided as an alias because RANK is a reserved word in MySQL 8.0.

Using LIMIT (For MySQL and PostgreSQL)


SELECT salary AS second_highest_salary
FROM employees
WHERE department = 'Engineering'
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

Key Takeaways
• The subquery approach is simple and effective but can be less efficient with larger datasets.
• Using ROW_NUMBER() is ideal for more complex use cases or large datasets, where you
need to rank rows.
• LIMIT with OFFSET is an easy solution for finding the second highest value in
MySQL/PostgreSQL.
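The LIMIT ... OFFSET approach can be verified on the sample data above. The sketch below runs it in SQLite via Python's sqlite3 module, with a DISTINCT added so tied salaries cannot shift the offset.

```python
import sqlite3

# Sample employees from this question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, "
             "last_name TEXT, department TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", [
    (1, 'John', 'Doe', 'Engineering', 80000),
    (2, 'Jane', 'Smith', 'Engineering', 95000),
    (3, 'Alice', 'Johnson', 'Marketing', 70000),
    (4, 'Bob', 'Brown', 'Engineering', 85000),
    (5, 'Charlie', 'Davis', 'Engineering', 90000),
])

# Skip the highest distinct salary, take the next one
second = conn.execute("""
    SELECT DISTINCT salary FROM employees
    WHERE department = 'Engineering'
    ORDER BY salary DESC
    LIMIT 1 OFFSET 1
""").fetchone()[0]
assert second == 90000
```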
• Q.805
Question
What is the PRIMARY KEY in SQL?
Explanation
A PRIMARY KEY in SQL is a constraint that uniquely identifies each record in a table. It
ensures that no two rows have the same value for the primary key columns and that the
primary key columns cannot contain NULL values. Each table can have only one PRIMARY
KEY, and this key can consist of a single column or a combination of multiple columns
(composite primary key).
Learnings
• A PRIMARY KEY must have unique values for each record in the table.
• It does not allow NULL values in any of the columns that are part of the primary key.
• Each table can only have one PRIMARY KEY, but the key can span multiple columns
(composite key).
• A PRIMARY KEY automatically creates a unique index on the column(s) to enforce
uniqueness and speed up queries.
• The primary key is often used in relationships (e.g., as a foreign key in other tables).
Solution
• - Table creation with a PRIMARY KEY on a single column
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
first_name VARCHAR(100),
last_name VARCHAR(100),
department VARCHAR(50)
);
• - Table creation with a composite PRIMARY KEY (multiple columns)
CREATE TABLE project_assignments (
project_id INT,
employee_id INT,
PRIMARY KEY (project_id, employee_id)
);

Key Takeaways
• The PRIMARY KEY constraint enforces uniqueness and prevents NULL values.
• A table can only have one PRIMARY KEY, but it can be made up of one or more columns.
• It is crucial for maintaining data integrity and is often used to establish relationships
between tables.
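The uniqueness enforcement described above can be observed directly. The sketch below (SQLite via Python's sqlite3 module; table is made up) shows a duplicate key value being rejected by the database.

```python
import sqlite3

# Hypothetical table with a single-column PRIMARY KEY
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT)")
conn.execute("INSERT INTO employees VALUES (1, 'John')")

try:
    conn.execute("INSERT INTO employees VALUES (1, 'Jane')")  # duplicate key value
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # PRIMARY KEY constraint blocked the insert

assert duplicate_rejected
assert conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0] == 1
```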
• Q.806
Question
What is the difference between INNER JOIN and OUTER JOIN?
Explanation
The difference between INNER JOIN and OUTER JOIN lies in the way they handle unmatched
rows between the tables being joined.
• INNER JOIN: Returns only the rows where there is a match in both tables. If a row from
one table does not have a corresponding match in the other table, it is excluded from the
result set.
• OUTER JOIN: Returns all rows from one table, and the matching rows from the other
table. If there is no match, NULL values are returned for columns of the table without a match.
There are three types of OUTER JOIN:
• LEFT OUTER JOIN (LEFT JOIN): Returns all rows from the left table and the matched
rows from the right table.
• RIGHT OUTER JOIN (RIGHT JOIN): Returns all rows from the right table and the
matched rows from the left table.
• FULL OUTER JOIN: Returns all rows from both tables, with matching rows where
available. If there is no match, NULL values are returned for the columns of the table without a
match.
Learnings
• INNER JOIN returns only the intersecting data between two tables.
• OUTER JOIN returns all data from one table, and for the unmatched rows from the other,
it fills in NULL.
• LEFT OUTER JOIN includes all rows from the left table, and only matching rows from
the right table.
• RIGHT OUTER JOIN includes all rows from the right table, and only matching rows
from the left table.
• FULL OUTER JOIN includes all rows from both tables, filling in NULL where no match
exists.
Key Differences

• Match Criteria: INNER JOIN returns only rows with matching values in both tables; OUTER JOIN returns all rows from one table plus the matched rows from the other.
• Null Values: INNER JOIN does not include rows without a match; OUTER JOIN includes them, filling in NULL for unmatched columns.
• Types: INNER JOIN has one type only; OUTER JOIN comes in LEFT, RIGHT, and FULL variants.

Solutions
• - INNER JOIN example
SELECT employees.employee_id, employees.first_name, departments.department_name
FROM employees
INNER JOIN departments ON employees.department_id = departments.department_id;
• - LEFT OUTER JOIN example (returns all employees, including those without a
department)
SELECT employees.employee_id, employees.first_name, departments.department_name
FROM employees
LEFT OUTER JOIN departments ON employees.department_id = departments.department_id;
• - RIGHT OUTER JOIN example (returns all departments, including those with no
employees)
SELECT employees.employee_id, employees.first_name, departments.department_name
FROM employees
RIGHT OUTER JOIN departments ON employees.department_id = departments.department_id;
• - FULL OUTER JOIN example (returns all employees and all departments, with NULL
where there is no match)
SELECT employees.employee_id, employees.first_name, departments.department_name
FROM employees
FULL OUTER JOIN departments ON employees.department_id = departments.department_id;

Key Takeaways
• INNER JOIN is used when you only want the rows that exist in both tables.
• OUTER JOIN is used when you want to include all rows from one table, and matching
rows from the other table, with NULL for unmatched rows.


• The type of OUTER JOIN determines which table’s rows are always included (LEFT,
RIGHT, or FULL).
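The contrast between the two join families can be shown on two small tables. The sketch below (SQLite via Python's sqlite3 module; data is made up) gives one employee no department: the INNER JOIN drops him, the LEFT JOIN keeps him with NULL.

```python
import sqlite3

# Hypothetical tables: Bob has no department
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT)")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, department_id INTEGER)")
conn.execute("INSERT INTO departments VALUES (10, 'HR')")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [(1, 'Ann', 10), (2, 'Bob', None)])

inner = conn.execute("""
    SELECT e.first_name, d.department_name
    FROM employees e INNER JOIN departments d ON e.department_id = d.department_id
""").fetchall()
left = conn.execute("""
    SELECT e.first_name, d.department_name
    FROM employees e LEFT JOIN departments d ON e.department_id = d.department_id
    ORDER BY e.employee_id
""").fetchall()

assert inner == [('Ann', 'HR')]                # Bob excluded: no match
assert left == [('Ann', 'HR'), ('Bob', None)]  # Bob kept, department is NULL
```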
• Q.807
Question
How would you find the total sales for each product category?
Explanation
To find the total sales for each product category, you can use the SUM() aggregate function to
sum up the sales for each category. The GROUP BY clause is used to group the results by
product category, so the total sales are calculated for each individual category.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50),
price DECIMAL(10, 2)
);

CREATE TABLE sales (


sale_id INT PRIMARY KEY,
product_id INT,
quantity INT,
sale_date DATE,
FOREIGN KEY (product_id) REFERENCES products (product_id)
);
• - Sample datasets
-- Inserting sample data into products table
INSERT INTO products (product_id, product_name, category, price)
VALUES
(1, 'Laptop', 'Electronics', 1200.00),
(2, 'Smartphone', 'Electronics', 800.00),
(3, 'Blender', 'Home Appliances', 150.00),
(4, 'Washing Machine', 'Home Appliances', 500.00);

-- Inserting sample data into sales table


INSERT INTO sales (sale_id, product_id, quantity, sale_date)
VALUES
(1, 1, 5, '2023-01-10'),
(2, 2, 3, '2023-01-12'),
(3, 3, 10, '2023-02-20'),
(4, 4, 7, '2023-02-22'),
(5, 1, 2, '2023-03-05');

Learnings
• The SUM() function is used to calculate the total sales for each category.
• The GROUP BY clause groups the data by category, so that the sum is computed for each
category individually.
• You can combine multiple tables with JOIN if sales data and product details are in different
tables.
Solution
SELECT p.category, SUM(s.quantity * p.price) AS total_sales
FROM sales s
JOIN products p ON s.product_id = p.product_id
GROUP BY p.category;

Key Takeaways


• Use SUM() to calculate the total sales.


• Use GROUP BY to group the data by a specific column, such as category.
• The query joins the sales table with the products table to get the price for each product
and then calculates the total sales (quantity * price) for each category.
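The aggregation query from this answer can be run end to end on the sample rows above. The sketch below does so in SQLite via Python's sqlite3 module (the driver is my addition; the SQL and data are from the question).

```python
import sqlite3

# Sample products and sales from this question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT, price REAL)")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER, sale_date TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", [
    (1, 'Laptop', 'Electronics', 1200.00),
    (2, 'Smartphone', 'Electronics', 800.00),
    (3, 'Blender', 'Home Appliances', 150.00),
    (4, 'Washing Machine', 'Home Appliances', 500.00),
])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, 1, 5, '2023-01-10'), (2, 2, 3, '2023-01-12'), (3, 3, 10, '2023-02-20'),
    (4, 4, 7, '2023-02-22'), (5, 1, 2, '2023-03-05'),
])

totals = dict(conn.execute("""
    SELECT p.category, SUM(s.quantity * p.price)
    FROM sales s JOIN products p ON s.product_id = p.product_id
    GROUP BY p.category
"""))
# Electronics: 5*1200 + 3*800 + 2*1200; Home Appliances: 10*150 + 7*500
assert totals == {'Electronics': 10800.0, 'Home Appliances': 5000.0}
```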
• Q.808
Question
How would you use a CASE statement in SQL?
Explanation
The CASE statement in SQL is used to create conditional logic within a query, allowing you to
return different values based on specific conditions. It can be used in SELECT, UPDATE,
DELETE, and ORDER BY clauses. The CASE statement operates like an "IF-ELSE" logic in SQL.

• Simple CASE: Evaluates an expression and compares it to various conditions.


• Searched CASE: Evaluates a series of conditions (logical expressions) and returns a result
based on the first condition that is true.
Learnings
• CASE allows for conditional logic in queries, making it powerful for data transformation.
• The CASE statement can handle multiple conditions and return different results based on
them.
• It can be used in both aggregate and non-aggregate queries.
Solution

Example 1: Using CASE in a SELECT Statement


• - Example: Classifying employees based on salary ranges
SELECT employee_id, first_name, salary,
CASE
WHEN salary < 50000 THEN 'Low'
WHEN salary BETWEEN 50000 AND 100000 THEN 'Medium'
WHEN salary > 100000 THEN 'High'
ELSE 'Unknown'
END AS salary_classification
FROM employees;

In this example, the CASE statement classifies employees into 'Low', 'Medium', or 'High'
salary categories based on their salary.

Example 2: Using CASE in an UPDATE Statement


• - Example: Updating employee bonus based on performance rating
UPDATE employees
SET bonus = CASE
WHEN performance_rating = 'Excellent' THEN 1000
WHEN performance_rating = 'Good' THEN 500
ELSE 0
END
WHERE department = 'Engineering';

Here, the CASE statement updates the bonus for employees in the "Engineering" department
based on their performance rating.

Example 3: Using CASE in an ORDER BY Clause


• - Example: Sorting employees by salary range (Low to High, Medium, High)
SELECT employee_id, first_name, salary
FROM employees
ORDER BY
CASE
WHEN salary < 50000 THEN 1
WHEN salary BETWEEN 50000 AND 100000 THEN 2
WHEN salary > 100000 THEN 3
END;

In this case, the CASE statement is used in the ORDER BY clause to prioritize rows based on
salary ranges.
Key Takeaways
• The CASE statement provides conditional logic within SQL queries, helping transform or
classify data based on specific conditions.
• The CASE expression can be used in SELECT, UPDATE, and ORDER BY clauses.
• Simple CASE compares a column or expression with fixed values, while Searched CASE
evaluates multiple conditions.
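The salary-classification CASE from Example 1 can be exercised directly. The sketch below runs it in SQLite via Python's sqlite3 module on a few made-up rows.

```python
import sqlite3

# Made-up employees spanning the three salary bands
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, 'Ann', 45000), (2, 'Bob', 75000), (3, 'Cara', 120000)])

rows = conn.execute("""
    SELECT first_name,
           CASE
               WHEN salary < 50000 THEN 'Low'
               WHEN salary BETWEEN 50000 AND 100000 THEN 'Medium'
               WHEN salary > 100000 THEN 'High'
               ELSE 'Unknown'
           END AS salary_classification
    FROM employees ORDER BY employee_id
""").fetchall()
assert rows == [('Ann', 'Low'), ('Bob', 'Medium'), ('Cara', 'High')]
```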
• Q.809
Question
What is the difference between WHERE and HAVING clauses in SQL?
Explanation
The WHERE and HAVING clauses are both used to filter data, but they are used in different
contexts:
• WHERE: Filters rows before any grouping or aggregation is applied. It is used to filter
data at the individual row level and can be used with all types of queries, including those
without aggregation.
• HAVING: Filters data after aggregation or grouping has been performed. It is used to filter
groups of rows (i.e., after GROUP BY), based on aggregated values like COUNT(), SUM(),
AVG(), etc.
Learnings
• WHERE filters data before aggregation and can be used on individual rows.
• HAVING filters data after aggregation and is used to filter groups created by GROUP BY.
• The WHERE clause cannot be used with aggregate functions (like SUM(), AVG()), but HAVING
can.
Key Differences

• Usage: WHERE filters rows before aggregation (row-level); HAVING filters groups after aggregation (group-level).
• Aggregate Functions: WHERE cannot be used with aggregate functions; HAVING can be used with aggregate functions like COUNT() and SUM().
• Applies To: WHERE applies to the entire dataset or individual rows; HAVING applies to groups of rows (used with GROUP BY).
• When Used: WHERE runs before GROUP BY and aggregation; HAVING runs after GROUP BY and aggregation.
Solutions

Example 1: Using WHERE to Filter Rows Before Aggregation


SELECT product_id, price
FROM products
WHERE price > 100;

In this example, the WHERE clause filters products with a price greater than 100 before any
aggregation.

Example 2: Using HAVING to Filter Groups After Aggregation


SELECT category, AVG(price) AS average_price
FROM products
GROUP BY category
HAVING AVG(price) > 50;

Here, the HAVING clause filters categories where the average price is greater than 50, after
grouping the products by category.

Key Takeaways
• WHERE filters individual rows based on conditions and is applied before any aggregation.
• HAVING filters the result of aggregations and is used after GROUP BY.
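The ordering of the two filters can be made concrete. The sketch below (SQLite via Python's sqlite3 module; prices are made up) applies WHERE to rows and HAVING to the aggregated groups.

```python
import sqlite3

# Made-up products: Electronics averages 110, Toys averages 35
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, category TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    (1, 'Electronics', 200.0), (2, 'Electronics', 20.0),
    (3, 'Toys', 30.0), (4, 'Toys', 40.0),
])

# WHERE: row-level filter, applied before any grouping
expensive = conn.execute("SELECT COUNT(*) FROM products WHERE price > 100").fetchone()[0]
assert expensive == 1

# HAVING: group-level filter on the aggregate, applied after GROUP BY
groups = conn.execute("""
    SELECT category, AVG(price) FROM products
    GROUP BY category HAVING AVG(price) > 50
""").fetchall()
assert groups == [('Electronics', 110.0)]  # Toys (avg 35) is filtered out
```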
• Q.810
Question
How can subqueries be used in SQL?
Explanation
A subquery is a query nested inside another query. It can be used to:
• Retrieve a single value or multiple values.
• Provide results to be used by the main query.
• Be used in various clauses, including SELECT, FROM, WHERE, and HAVING.
Subqueries can be categorized into:
• Scalar Subquery: Returns a single value (used in SELECT, WHERE, etc.).
• Correlated Subquery: Depends on values from the outer query and is evaluated once for
each row.
• Non-correlated Subquery: Independent of the outer query and is executed only once.
• Inline Subquery: Used in the FROM clause to act as a derived table or view.
Learnings
• Subqueries can be used in SELECT, WHERE, FROM, and HAVING clauses.
• Scalar subqueries return single values, while multi-row subqueries can return multiple
values for use in comparison.
• Correlated subqueries reference the outer query, whereas non-correlated subqueries do
not.
• Subqueries can simplify complex queries by breaking them down into smaller parts.
Solutions


Example 1: Using a Subquery in the WHERE Clause


• - Find employees who earn more than the average salary
SELECT employee_id, first_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

In this case, the subquery (SELECT AVG(salary) FROM employees) calculates the average
salary, and the main query returns employees whose salary is greater than the calculated
average.

Example 2: Using a Subquery in the FROM Clause (Inline View)


• - Find the average salary for each department using a subquery in the FROM clause
SELECT department, AVG(salary) AS avg_salary
FROM (SELECT department, salary FROM employees) AS dept_salary
GROUP BY department;

Here, the subquery (SELECT department, salary FROM employees) is used to filter data
before calculating the average salary for each department.

Example 3: Using a Correlated Subquery


• - Find employees whose salary is higher than the average salary for their department
SELECT employee_id, first_name, salary, department
FROM employees e
WHERE salary > (SELECT AVG(salary)
FROM employees
WHERE department = e.department);

In this correlated subquery, the inner query references the outer query’s department value
(e.department) to calculate the average salary for that department.

Key Takeaways
• Subqueries are queries embedded within another query and can be used in various clauses
like SELECT, WHERE, FROM, and HAVING.
• Scalar subqueries return a single value, whereas multi-row subqueries return multiple
values.
• Correlated subqueries depend on the outer query and are evaluated for each row, while
non-correlated subqueries are independent and run once.
• Subqueries can make complex queries more readable by breaking them into smaller logical
parts.
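The correlated subquery from Example 3 can be checked on a small dataset. The sketch below (SQLite via Python's sqlite3 module; salaries are made up) finds employees paid above their own department's average.

```python
import sqlite3

# Made-up departments: Eng averages 80000, HR averages 50000
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, salary REAL, department TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, 'Ann', 90000, 'Eng'), (2, 'Bob', 70000, 'Eng'),
    (3, 'Cara', 60000, 'HR'), (4, 'Dan', 40000, 'HR'),
])

# The inner query re-runs for each outer row, referencing e.department
rows = conn.execute("""
    SELECT first_name FROM employees e
    WHERE salary > (SELECT AVG(salary) FROM employees WHERE department = e.department)
    ORDER BY employee_id
""").fetchall()
assert [r[0] for r in rows] == ['Ann', 'Cara']
```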

SQL Developer
• Q.811
Question
How do you find duplicate records in a table?
Answer
To find duplicate records in a table, you can use the GROUP BY clause to group rows based on
the columns that might have duplicates, and the HAVING clause to filter groups that appear
more than once.

Example: Find duplicate rows based on a single column


To find employees with duplicate first names:


SELECT first_name, COUNT(*) AS duplicate_count


FROM employees
GROUP BY first_name
HAVING COUNT(*) > 1;

Example: Find duplicate rows based on multiple columns


To find duplicate records where both first_name and last_name are the same:
SELECT first_name, last_name, COUNT(*) AS duplicate_count
FROM employees
GROUP BY first_name, last_name
HAVING COUNT(*) > 1;

Key Points:
• GROUP BY groups records based on the columns specified.
• HAVING COUNT(*) > 1 filters groups that have more than one occurrence, identifying
duplicates.
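The multi-column duplicate check can be run end to end. The sketch below (SQLite via Python's sqlite3 module; names are made up) flags the one (first_name, last_name) pair that occurs twice.

```python
import sqlite3

# Made-up rows: only ('John', 'Doe') appears more than once
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, 'John', 'Doe'), (2, 'John', 'Smith'), (3, 'Jane', 'Doe'), (4, 'John', 'Doe'),
])

dupes = conn.execute("""
    SELECT first_name, last_name, COUNT(*) FROM employees
    GROUP BY first_name, last_name HAVING COUNT(*) > 1
""").fetchall()
assert dupes == [('John', 'Doe', 2)]
```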
• Q.812
Question
What are indexes, and how do they improve query performance?
Answer
Indexes are database objects that improve the speed of data retrieval operations on a table at
the cost of additional storage space and slower write operations. They work similarly to an
index in a book, allowing the database to quickly locate rows based on a specified column or
set of columns.

How Indexes Improve Query Performance:


• Faster Data Retrieval: Indexes speed up SELECT queries by providing quick access to
rows, especially for large tables, by reducing the need for full table scans.
• Efficient Sorting and Filtering: They optimize operations like WHERE, ORDER BY, and
JOIN by allowing quick lookup of values.
• Reduced Search Time: Instead of scanning the entire table, indexes allow the database to
use the indexed structure (like a B-tree or hash) to quickly find the desired rows.

Trade-offs:
• Increased Storage: Indexes take up additional disk space because they maintain a separate
data structure.
• Slower Write Operations: INSERT, UPDATE, and DELETE operations are slower because the
index must also be updated every time data is modified. This increases the overhead.

Example: Creating an Index


CREATE INDEX idx_employee_name ON employees(first_name, last_name);

Key Points:
• Indexes are used to improve read performance (SELECT queries) by speeding up data
retrieval.
• They can slow down write operations (INSERT, UPDATE, DELETE) because the index needs
to be updated.


• It's important to balance indexing based on the specific workload and query patterns of the
application.
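The effect of an index on the access path can be observed with SQLite's EXPLAIN QUERY PLAN. A sketch with invented data (the exact plan wording varies slightly between SQLite versions, so only the presence of the index name is checked):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (employee_id INTEGER, first_name TEXT, last_name TEXT)")
con.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                [(i, f"fn{i}", f"ln{i}") for i in range(1000)])

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM employees WHERE first_name = 'fn42'"
before = plan(query)  # full table scan: the plan mentions SCAN
con.execute("CREATE INDEX idx_employee_name ON employees(first_name, last_name)")
after = plan(query)   # index lookup: the plan now mentions idx_employee_name
print(before)
print(after)
```

The same query goes from scanning every row to a direct index search once the index exists.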
• Q.813
Question
What is a subquery, and how is it different from a join?
Answer
A subquery is a query nested inside another query. It can return a single value, a set of
values, or a table of results, and is often used in the SELECT, WHERE, or FROM clauses. A join,
on the other hand, is used to combine data from two or more tables based on a related
column.

Differences between Subqueries and Joins:


• Subqueries:
• Can be used in SELECT, WHERE, or FROM clauses.
• Often used for filtering data based on the result of another query (e.g., finding records
based on aggregate values).
• Can be correlated (referencing columns from the outer query) or non-correlated
(independent of the outer query).
• Joins:
• Combine rows from two or more tables based on a related column (e.g., matching foreign
keys).
• They typically return a larger result set by matching rows from each table.
• Joins are usually more efficient than subqueries for combining large sets of data because
they avoid repeated query execution.

Scenarios Where Subqueries Can Be More Efficient:


• When you need a single value or a result for comparison (e.g., using WHERE clause):
Subqueries can be more efficient when you only need to filter data based on a single result or
a set of aggregated values.

Example: Subquery for Filtering Data


Find employees who earn more than the average salary:
SELECT employee_id, first_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Example: Join for Combining Data


Find employees and their department names:
SELECT e.employee_id, e.first_name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

Key Points:
• Subqueries are useful for filtering or computing values that will be used by the outer
query, especially when a direct relationship isn't needed.
• Joins are generally more efficient when you need to combine data from multiple tables and
work with large datasets.


• Subqueries can be more readable and useful for certain operations (like filtering based on
aggregated results), while joins are better suited for combining rows across multiple tables.
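Both forms can be run side by side with sqlite3 to confirm they return the same rows. A sketch with invented employees/departments data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, department_id INTEGER);
INSERT INTO departments VALUES (1, 'Sales'), (2, 'HR');
INSERT INTO employees VALUES (1, 'Ann', 1), (2, 'Bob', 2), (3, 'Cal', 1);
""")

# Subquery: filter on a value computed by an inner query
via_subquery = con.execute("""
    SELECT employee_id, first_name FROM employees
    WHERE department_id = (SELECT department_id FROM departments WHERE name = 'Sales')
    ORDER BY employee_id
""").fetchall()

# Join: combine the two tables and filter on the joined column
via_join = con.execute("""
    SELECT e.employee_id, e.first_name
    FROM employees e JOIN departments d ON e.department_id = d.department_id
    WHERE d.name = 'Sales'
    ORDER BY e.employee_id
""").fetchall()

print(via_subquery)  # [(1, 'Ann'), (3, 'Cal')]
```

Both queries return the same Sales employees; the choice between them is a matter of readability and of whether columns from the second table are needed in the output.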
• Q.814
Question
What is a stored procedure, and how does it differ from a function?
Answer
A stored procedure is a precompiled collection of SQL statements stored in the database
that can be executed by the database engine. It can perform operations like querying,
updating, and deleting data, as well as executing complex business logic. A function is
similar but typically designed to return a value and is often used in expressions or queries.

Key Differences between Stored Procedures and Functions:


• Stored Procedure:
• Purpose: Primarily used to execute a series of SQL statements that perform operations
such as inserting, updating, or deleting data, or executing complex logic.
• Return Type: Does not return a value directly. It can use output parameters or return
status codes, but it is not designed to return a result that can be used in a SQL query.
• Usage: Can be called using EXEC or CALL. It can include complex logic like loops,
conditionals, and error handling.
• Function:
• Purpose: Primarily used to compute and return a single value or result (e.g., a scalar value
or table) that can be used in SQL queries, expressions, or other functions.
• Return Type: Must return a single value (such as INT, VARCHAR, DECIMAL, etc.) or a table.
• Usage: Can be used in SELECT, WHERE, ORDER BY, or other SQL clauses directly in a query.

Example: Stored Procedure


A stored procedure to update an employee's salary:
CREATE PROCEDURE UpdateSalary(IN emp_id INT, IN new_salary DECIMAL)
BEGIN
UPDATE employees SET salary = new_salary WHERE employee_id = emp_id;
END;

You can execute it like this:


CALL UpdateSalary(1, 50000);

Example: Function
A function to calculate the total salary of employees in a department:
-- The parameter is named dept_id so it does not shadow the department_id column;
-- a parameter named department_id would make the WHERE condition always true.
CREATE FUNCTION GetDepartmentSalary(dept_id INT)
RETURNS DECIMAL(10, 2)
DETERMINISTIC
BEGIN
    DECLARE total_salary DECIMAL(10, 2);
    SELECT SUM(salary) INTO total_salary FROM employees WHERE department_id = dept_id;
    RETURN total_salary;
END;

You can use it in a query like this:


SELECT department_name, GetDepartmentSalary(department_id) AS total_salary
FROM departments;


Key Points:
• Stored Procedures: Used for executing operations that may or may not return data; they
don’t return values directly in queries.
• Functions: Used to calculate and return a value that can be directly used in SQL queries or
expressions.
• Stored Procedures allow for more complex operations, including multiple SQL
statements, control flow, and error handling, while Functions are generally simpler and
designed to return a single value.
• Q.815
Question
What is the purpose of the GROUP BY clause? Provide an example.
Answer
The GROUP BY clause in SQL is used to group rows that have the same values in specified
columns into summary rows, typically for the purpose of aggregation. It allows you to
aggregate data based on one or more columns using aggregate functions like COUNT(), SUM(),
AVG(), MIN(), and MAX().

Purpose of GROUP BY:


• It groups data based on one or more columns.
• It allows you to perform aggregate calculations (e.g., sums, averages, counts) on each
group of rows.
• It is used in conjunction with aggregate functions to get summarized data from detailed
records.

Example:
Find the total sales for each product category in a sales table.
SELECT category, SUM(sales_amount) AS total_sales
FROM sales
GROUP BY category;

In this example:
• category is the column we group by.
• SUM(sales_amount) calculates the total sales for each product category.
• The query returns one row for each unique product category, showing the total sales
amount for that category.

Key Points:
• GROUP BY groups rows based on specified columns.
• It is used to perform aggregate functions on groups of rows.
• It simplifies data analysis by summarizing data at different levels (e.g., by department,
product, or date).
• Q.816
Question
What are window functions in SQL? Provide examples.
Answer


Window functions in SQL are special types of functions that perform calculations across a
set of table rows related to the current row, but unlike aggregate functions, they do not group
the result set into a single output row. Window functions allow you to perform calculations
like ranking, running totals, and moving averages without collapsing the result set.

Purpose of Window Functions:


• They perform calculations across rows related to the current row (called a "window").
• They allow you to keep the result set intact while providing valuable insights like ranking
or cumulative sums.
• They are typically used with the OVER() clause, which defines the window or partition of
data the function will operate on.

Common Window Functions:


• ROW_NUMBER(): Assigns a unique row number to each row in the result set.
• RANK(): Assigns a rank to each row, with gaps between ranks for tied rows.
• LEAD(): Provides access to a row's data that follows the current row in the result set.
• LAG(): Provides access to a row's data that precedes the current row.

Examples:

1. ROW_NUMBER(): Assign a unique row number to each employee ordered by salary
SELECT employee_id, first_name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees;
• This assigns a unique number (row_num) to each employee based on their salary in
descending order.

2. RANK(): Rank employees based on their salary, with gaps for ties
SELECT employee_id, first_name, salary,
RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
• This ranks employees by salary in descending order. If two employees have the same
salary, they will share the same rank, and the next rank will be skipped (e.g., two employees
ranked 1, then the next employee will be ranked 3).

3. LEAD(): Get the salary of the next employee in the result set
SELECT employee_id, first_name, salary,
LEAD(salary) OVER (ORDER BY salary DESC) AS next_salary
FROM employees;
• This provides the salary of the next employee in the list, ordered by salary in descending
order. If there is no next row, it returns NULL.

4. LAG(): Get the salary of the previous employee in the result set
SELECT employee_id, first_name, salary,
LAG(salary) OVER (ORDER BY salary DESC) AS previous_salary
FROM employees;
• This provides the salary of the previous employee in the list, ordered by salary in
descending order. If there is no previous row, it returns NULL.

Key Points:


• Window functions allow you to perform complex calculations across rows while keeping
the individual row data intact.
• They are used with the OVER() clause, which defines how the window or partition is
created (e.g., by ordering or partitioning data).
• Window functions like ROW_NUMBER(), RANK(), LEAD(), and LAG() are especially useful
for tasks like ranking, calculating running totals, or comparing values across rows in a result
set.
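All four functions can be exercised in one query with sqlite3 (SQLite 3.25+ ships window functions). In this sketch Bob and Cal tie on salary, so RANK() repeats 2 and then skips to 4:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employees (employee_id INTEGER, first_name TEXT, salary INTEGER);
INSERT INTO employees VALUES (1, 'Ann', 90), (2, 'Bob', 80), (3, 'Cal', 80), (4, 'Dee', 70);
""")

rows = con.execute("""
    SELECT first_name,
           ROW_NUMBER() OVER (ORDER BY salary DESC, employee_id) AS row_num,
           RANK()       OVER (ORDER BY salary DESC)              AS salary_rank,
           LAG(salary)  OVER (ORDER BY salary DESC, employee_id) AS previous_salary
    FROM employees
    ORDER BY row_num
""").fetchall()
for r in rows:
    print(r)
```

ROW_NUMBER() stays unique even across the tie, RANK() shares 2 for both tied rows, and LAG() returns NULL (None) for the first row because it has no predecessor.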
• Q.817
Question:
What are normalization and denormalization in database design?
Answer:
• Normalization: The process of organizing a database to reduce redundancy and
dependency by dividing large tables into smaller, manageable ones and using relationships
between them. The goal is to minimize data duplication and ensure data integrity.
• Types: Includes several normal forms (1NF, 2NF, 3NF, BCNF, etc.), with each successive
form removing specific types of redundancy and dependency.
• Denormalization: The process of combining tables or introducing redundancy back into a
database to improve read performance. Denormalization is used when performance is
prioritized over data integrity, often in cases like data warehousing.
Key Points:
• Normalization improves data integrity and reduces redundancy.
• Denormalization improves read performance by reducing the complexity of joins but
increases redundancy and the potential for anomalies during updates.
• Q.818
Question:
What is the difference between CHAR and VARCHAR data types?
Answer:
• CHAR (Fixed-Length String): Stores a string of fixed length. If the string is shorter than
the defined length, the remaining space is padded with spaces.
• Example: CHAR(10) will always take up 10 bytes, even if the string stored is only 5
characters long.
• VARCHAR (Variable-Length String): Stores a string with a length that can vary. It only
uses as much space as needed to store the string, plus a small amount for length information.
• Example: VARCHAR(10) will store a string of up to 10 characters, but it will only use as
many bytes as needed for the actual string length.
Key Points:
• CHAR is useful when you know the string length will always be fixed (e.g., country codes,
phone numbers).
• VARCHAR is more efficient when string length varies because it saves storage space.
• Q.819
What is the purpose of using foreign keys in database design?
Answer:


A foreign key is a column (or combination of columns) in a table that refers to the primary
key of another table. It establishes a relationship between the two tables, ensuring referential
integrity.
Purpose:
• Referential Integrity: Ensures that a foreign key value must match a value in the
referenced table’s primary key or be NULL.
• Relationship Representation: Used to define relationships between tables (e.g., one-to-
many, many-to-many).
Example: In an orders table, the customer_id can be a foreign key that references the id
field in the customers table, ensuring that every order corresponds to an existing customer.
Key Points:
• Foreign keys prevent invalid data by ensuring that relationships between tables are valid.
• They help maintain consistent and accurate data across multiple tables.
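Referential integrity can be demonstrated with sqlite3; note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. A sketch with the customers/orders relationship from the example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id)
);
INSERT INTO customers VALUES (1, 'Ann');
""")

con.execute("INSERT INTO orders VALUES (100, 1)")  # valid: customer 1 exists
try:
    con.execute("INSERT INTO orders VALUES (101, 99)")  # no customer 99
    violated = False
except sqlite3.IntegrityError:
    violated = True
print(violated)  # True
```

The second insert is rejected because there is no matching customer, which is exactly the invalid-data scenario a foreign key prevents.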
• Q.820
Question:
What is the difference between UNION and UNION ALL in SQL?
Answer:
• UNION: Combines the results of two or more SELECT queries and removes duplicate rows
from the final result set. The result will only include unique rows.
• Example: SELECT column1 FROM table1 UNION SELECT column1 FROM table2;
• UNION ALL: Combines the results of two or more SELECT queries but does not remove
duplicates. It returns all rows, including duplicates.
• Example: SELECT column1 FROM table1 UNION ALL SELECT column1 FROM table2;
Key Points:
• UNION removes duplicates and may be slower due to the extra overhead of eliminating
duplicates.
• UNION ALL is faster because it does not perform duplicate removal, but it may return
duplicate rows.
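The difference shows up immediately on overlapping data. A sqlite3 sketch with a duplicate inside t1 and an overlap between t1 and t2:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t1 (v INTEGER);
CREATE TABLE t2 (v INTEGER);
INSERT INTO t1 VALUES (1), (2), (2);
INSERT INTO t2 VALUES (2), (3);
""")

# UNION de-duplicates across (and within) the inputs
union = con.execute("SELECT v FROM t1 UNION SELECT v FROM t2 ORDER BY v").fetchall()
# UNION ALL keeps every row from both inputs
union_all = con.execute("SELECT v FROM t1 UNION ALL SELECT v FROM t2 ORDER BY v").fetchall()

print(union)      # [(1,), (2,), (3,)]
print(union_all)  # [(1,), (2,), (2,), (2,), (3,)]
```

UNION collapses all three copies of 2 into one, while UNION ALL preserves them.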

Full Stack Developer


• Q.821
Question:
What is the difference between DELETE, TRUNCATE, and DROP in SQL?
Answer:
• DELETE: Removes rows from a table based on a condition (WHERE clause). It is a DML
operation and can be rolled back if wrapped in a transaction. It does not remove the table
structure.
• Example: DELETE FROM employees WHERE employee_id = 5;
• TRUNCATE: Removes all rows from a table without logging individual row deletions. It
is faster than DELETE but cannot be rolled back in most databases (non-transactional). It resets
any auto-increment counters.
• Example: TRUNCATE TABLE employees;


• DROP: Completely removes a table from the database, including its structure, data, and
any associated indexes or constraints. This action cannot be rolled back.
• Example: DROP TABLE employees;
Key Points:
• DELETE: Selective row deletion, can be rolled back.
• TRUNCATE: Removes all rows quickly, but cannot be rolled back in most systems.
• DROP: Removes the table entirely, including structure and data.
• Q.822

Question:
What is the difference between INNER JOIN and LEFT JOIN?
Answer:
• INNER JOIN: Returns only the rows where there is a match in both tables. If there is no
match, those rows are excluded.
• LEFT JOIN (or LEFT OUTER JOIN): Returns all rows from the left table, and the
matching rows from the right table. If no match exists, the result is NULL on the right side.
Example:
-- INNER JOIN
SELECT a.id, b.name
FROM users a
INNER JOIN orders b ON a.id = b.user_id;

-- LEFT JOIN
SELECT a.id, b.name
FROM users a
LEFT JOIN orders b ON a.id = b.user_id;
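Running both joins against a user with no orders makes the difference concrete. A sqlite3 sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);
INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1, 'book');  -- Bob has no orders
""")

inner = con.execute("""
    SELECT u.name, o.item FROM users u
    INNER JOIN orders o ON u.id = o.user_id ORDER BY u.id
""").fetchall()
left = con.execute("""
    SELECT u.name, o.item FROM users u
    LEFT JOIN orders o ON u.id = o.user_id ORDER BY u.id
""").fetchall()

print(inner)  # [('Ann', 'book')]
print(left)   # [('Ann', 'book'), ('Bob', None)]
```

The INNER JOIN drops Bob entirely; the LEFT JOIN keeps him with NULL (None) on the orders side.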
• Q.823

Question:
How do you optimize a slow SQL query?
Answer:
To optimize a slow SQL query:
• Indexing: Create indexes on columns used in WHERE, JOIN, and ORDER BY clauses.
• Query Refactoring: Rewrite complex subqueries as joins or vice versa.
• Limit Data: Use LIMIT or TOP to return only necessary rows.
• Analyze Execution Plan: Use EXPLAIN (PostgreSQL) or EXPLAIN PLAN (Oracle) to check
how SQL is executed and identify bottlenecks.
• Avoid SELECT *: Select only the columns you need.
• Q.824

Question:
What is a JOIN in SQL, and can you name different types of joins?
Answer:
A JOIN is used to combine rows from two or more tables based on a related column between
them.


Types of Joins:
• INNER JOIN: Returns rows when there is a match in both tables.
• LEFT JOIN (OUTER JOIN): Returns all rows from the left table and matched rows from
the right table.
• RIGHT JOIN (OUTER JOIN): Returns all rows from the right table and matched rows
from the left table.
• FULL JOIN (OUTER JOIN): Returns rows when there is a match in one of the tables.
• CROSS JOIN: Returns the Cartesian product of two tables (all combinations of rows).
• Q.825

Question:
What is the purpose of the GROUP BY clause in SQL?
Answer:
The GROUP BY clause is used to group rows that have the same values in specified columns
into summary rows, typically for aggregation. It is often used with aggregate functions like
COUNT(), SUM(), AVG(), MIN(), and MAX() to perform calculations on each group.

Example:
SELECT department, COUNT(*) AS total_employees
FROM employees
GROUP BY department;
• Q.826

Question:
What is the difference between HAVING and WHERE clauses in SQL?
Answer:
• WHERE: Filters rows before any grouping is done. It is used for filtering individual rows
based on conditions.
• HAVING: Filters groups after the GROUP BY operation. It is used to filter the results of
aggregated data.
Example:
-- WHERE (before grouping)
SELECT employee_id, salary
FROM employees
WHERE salary > 50000;

-- HAVING (after grouping)
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000;
• Q.827
Question:
What is a subquery in SQL, and how is it different from a join?
Answer:
• A subquery is a query nested inside another query. It can return a single value, multiple
values, or even a table.


• A JOIN combines data from two or more tables based on a related column, returning a
new result set with columns from both tables.
Difference:
• Subqueries are useful for filtering or performing calculations, while joins are used to
combine data from multiple tables into a single result set.
Example:
-- Subquery Example
SELECT employee_id, first_name
FROM employees
WHERE department_id = (SELECT department_id FROM departments WHERE name = 'Sales');

-- JOIN Example
SELECT e.employee_id, e.first_name, d.name
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE d.name = 'Sales';
• Q.828
What is the purpose of INDEX in SQL?
Answer:
An index is used to improve the speed of data retrieval operations on a table. It provides a
quick way to look up rows based on the values of one or more columns. However, indexes
can slow down write operations like INSERT, UPDATE, and DELETE since the index must also
be updated.
Key Points:
• Indexes are used to speed up queries that use WHERE, ORDER BY, or JOIN operations.
• They use additional disk space and affect performance for INSERT, UPDATE, and DELETE.
• Q.829
Question
What are transactions in SQL, and why are they important?
Answer:
A transaction is a sequence of one or more SQL operations that are executed as a single unit.
It ensures that the database remains in a consistent state, even in the case of errors.
ACID Properties of Transactions:
• Atomicity: Ensures that all operations within the transaction are completed; if not, the
transaction is rolled back.
• Consistency: Ensures the database transitions from one consistent state to another.
• Isolation: Ensures that transactions are isolated from each other.
• Durability: Ensures that once a transaction is committed, it is permanently stored.
Example:
BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

COMMIT;
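Atomicity can be demonstrated with sqlite3 by simulating a failure between the two legs of the transfer and rolling back. A sketch with invented balances:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance INTEGER);
INSERT INTO accounts VALUES (1, 500), (2, 100);
""")

try:
    # First leg of the transfer (sqlite3 opens a transaction implicitly)
    con.execute("UPDATE accounts SET balance = balance - 100 WHERE account_id = 1")
    # Simulate a crash before the second leg can run
    raise RuntimeError("failure mid-transfer")
    # On success, the credit to account 2 would run here, followed by con.commit()
except RuntimeError:
    con.rollback()  # atomicity: the debit above is undone

balances = con.execute(
    "SELECT account_id, balance FROM accounts ORDER BY account_id").fetchall()
print(balances)  # [(1, 500), (2, 100)]
```

Because the transaction was rolled back, the partial debit never becomes visible: both balances are unchanged.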
• Q.830


Question:
What are the differences between TRUNCATE, DELETE, and DROP?
Answer:
• DELETE: Removes specific rows from a table based on a condition. It can be rolled back
and does not remove the table structure.
• Example: DELETE FROM employees WHERE id = 5;
• TRUNCATE: Removes all rows from a table and cannot be rolled back in most databases.
It is faster than DELETE as it does not log individual row deletions.
• Example: TRUNCATE TABLE employees;
• DROP: Completely removes a table, including its data, structure, and any associated
constraints and indexes. This operation cannot be rolled back.
• Example: DROP TABLE employees;

Cloud Data Engineer


• Q.831

You are given a table sales with information about sales transactions. Write an SQL query
to identify the top 3 products sold by revenue (i.e., quantity * price) for each region. Display
the region, product name, total revenue, and rank of the product.

Explanation
To solve this, first calculate the total revenue for each product by multiplying quantity and
price. Then, use window functions like RANK() or ROW_NUMBER() to rank the products
within each region based on the total revenue. Finally, filter the results to show only the top 3
products per region.

Datasets and SQL Schemas


Table creation and sample data:
-- Create the sales table
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
region VARCHAR(50),
product_name VARCHAR(255),
quantity INT,
price DECIMAL(10, 2),
sale_date DATE
);

-- Insert sample data into the sales table
INSERT INTO sales (sale_id, region, product_name, quantity, price, sale_date)
VALUES
(1, 'North', 'Laptop', 5, 1000, '2024-01-05'),
(2, 'South', 'Phone', 3, 500, '2024-01-06'),
(3, 'North', 'Tablet', 10, 300, '2024-01-10'),
(4, 'South', 'Headphone', 4, 150, '2024-01-12'),
(5, 'North', 'Phone', 2, 500, '2024-01-15'),
(6, 'South', 'Tablet', 6, 300, '2024-01-18'),
(7, 'North', 'Headphone', 3, 150, '2024-01-20');


Learnings
• Window functions: Using RANK() or ROW_NUMBER() allows ranking of items within
groups (here, regions).
• Aggregation: SUM(quantity * price) is used to calculate total revenue for each product.
• Partitioning: The PARTITION BY clause in window functions helps in calculating ranks
separately for each region.
• Filtering: Filtering on the computed rank (rank <= 3) keeps only the top 3 products for
each region; because a window function cannot be referenced in the WHERE clause of the query
that computes it, this filter belongs in an outer query or a second CTE.

Solutions

PostgreSQL solution:
WITH product_sales AS (
    SELECT region,
           product_name,
           SUM(quantity * price) AS total_revenue
    FROM sales
    GROUP BY region, product_name
),
ranked AS (
    SELECT region,
           product_name,
           total_revenue,
           RANK() OVER (PARTITION BY region ORDER BY total_revenue DESC) AS revenue_rank
    FROM product_sales
)
-- A window function cannot be referenced in the WHERE clause of the query
-- that computes it, so the rank is filtered in an outer query.
SELECT region,
       product_name,
       total_revenue,
       revenue_rank
FROM ranked
WHERE revenue_rank <= 3
ORDER BY region, revenue_rank;

MySQL solution:
WITH product_sales AS (
    SELECT region,
           product_name,
           SUM(quantity * price) AS total_revenue
    FROM sales
    GROUP BY region, product_name
),
ranked AS (
    SELECT region,
           product_name,
           total_revenue,
           RANK() OVER (PARTITION BY region ORDER BY total_revenue DESC) AS revenue_rank
    FROM product_sales
)
-- A window function cannot be referenced in the WHERE clause of the query
-- that computes it, so the rank is filtered in an outer query.
SELECT region,
       product_name,
       total_revenue,
       revenue_rank
FROM ranked
WHERE revenue_rank <= 3
ORDER BY region, revenue_rank;

Both PostgreSQL and MySQL solutions are identical in this case, as they both support the
RANK() window function and CTEs (Common Table Expressions).
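The full solution can be executed against the sample data with sqlite3, which supports the same CTE and RANK() syntax. North's Headphone (revenue 450) is the only product that falls outside the top 3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, product_name TEXT,
                    quantity INTEGER, price REAL, sale_date TEXT);
INSERT INTO sales VALUES
 (1, 'North', 'Laptop', 5, 1000, '2024-01-05'),
 (2, 'South', 'Phone', 3, 500, '2024-01-06'),
 (3, 'North', 'Tablet', 10, 300, '2024-01-10'),
 (4, 'South', 'Headphone', 4, 150, '2024-01-12'),
 (5, 'North', 'Phone', 2, 500, '2024-01-15'),
 (6, 'South', 'Tablet', 6, 300, '2024-01-18'),
 (7, 'North', 'Headphone', 3, 150, '2024-01-20');
""")

rows = con.execute("""
    WITH product_sales AS (
        SELECT region, product_name, SUM(quantity * price) AS total_revenue
        FROM sales GROUP BY region, product_name
    ),
    ranked AS (
        SELECT region, product_name, total_revenue,
               RANK() OVER (PARTITION BY region ORDER BY total_revenue DESC) AS revenue_rank
        FROM product_sales
    )
    SELECT region, product_name, total_revenue, revenue_rank
    FROM ranked WHERE revenue_rank <= 3
    ORDER BY region, revenue_rank
""").fetchall()
for r in rows:
    print(r)
```

The ranking is computed in one CTE and filtered in the outer query, since a window function result cannot be filtered in the WHERE clause of the query that produces it.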

• Q.832
You are given a sales table with product sales data. Write an SQL query to calculate the 7-
day moving average of sales (quantity * price) for each product.

Explanation
To solve this, calculate the 7-day moving average of sales for each product. The moving
average should be based on the total revenue (quantity * price) over the last 7 days, including
the current day. Use window functions such as AVG() with the ROWS BETWEEN clause to
define the moving window.

Datasets and SQL Schemas


Table creation and sample data:
-- Create the sales table
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
product_id INT,
quantity INT,
price DECIMAL(10, 2),
sale_date DATE
);

-- Insert sample data into the sales table
INSERT INTO sales (sale_id, product_id, quantity, price, sale_date)
VALUES
(1, 101, 10, 500, '2024-01-01'),
(2, 102, 5, 200, '2024-01-02'),
(3, 101, 15, 500, '2024-01-03'),
(4, 103, 8, 300, '2024-01-04'),
(5, 102, 10, 200, '2024-01-05'),
(6, 101, 10, 500, '2024-01-06'),
(7, 104, 12, 150, '2024-01-07'),
(8, 101, 20, 500, '2024-01-08');

Learnings
• Window functions: Using AVG() with ROWS BETWEEN enables calculating moving
averages within a defined window.
• ROWS BETWEEN: Defines the frame for each row; ROWS BETWEEN 6 PRECEDING AND CURRENT
ROW covers the current row plus the 6 previous rows, totaling 7 rows.
• PARTITION BY: Ensures that the calculation is done separately for each product.
• Date ordering: The window is ordered by sale_date to ensure the moving average
calculation respects the time series.

Solutions

PostgreSQL solution:
SELECT product_id,
sale_date,
AVG(quantity * price) OVER (
PARTITION BY product_id
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_sales
FROM sales
ORDER BY product_id, sale_date;

MySQL solution:
SELECT product_id,
sale_date,
AVG(quantity * price) OVER (
PARTITION BY product_id
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_sales
FROM sales
ORDER BY product_id, sale_date;

Both PostgreSQL and MySQL solutions are identical in this case, as they both support
window functions like AVG() with ROWS BETWEEN.
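A sqlite3 sketch using only product 101's rows from the sample data. Note that ROWS BETWEEN 6 PRECEDING AND CURRENT ROW is a 7-row frame, which equals a 7-calendar-day window only when there is exactly one sale per day:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                    quantity INTEGER, price REAL, sale_date TEXT);
INSERT INTO sales VALUES
 (1, 101, 10, 500, '2024-01-01'),
 (3, 101, 15, 500, '2024-01-03'),
 (6, 101, 10, 500, '2024-01-06'),
 (8, 101, 20, 500, '2024-01-08');
""")

rows = con.execute("""
    SELECT sale_date,
           AVG(quantity * price) OVER (
               PARTITION BY product_id ORDER BY sale_date
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS moving_avg_sales
    FROM sales ORDER BY sale_date
""").fetchall()
for r in rows:
    print(r)
```

With fewer than 7 prior rows available, the frame simply shrinks: the first row averages only itself, the second averages two rows, and so on.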

• Q.833
You are given a table order_details with predicted and actual delivery times for orders.
Write a query to identify the delivery partners who have the most delayed orders (orders
where actual delivery time is later than the predicted delivery time).

Explanation
To solve this, first identify delayed orders by comparing actual_time with predicted_time
(i.e., actual_time > predicted_time). Then, group the results by del_partner and count
the number of delayed orders for each delivery partner. Finally, order the results to show the
delivery partners with the most delayed orders.

Datasets and SQL Schemas


Table creation and sample data:
-- Create the order_details table
CREATE TABLE order_details (
order_id INT,
del_partner VARCHAR(255),
predicted_time TIMESTAMP,
actual_time TIMESTAMP
);

-- Insert sample data into the order_details table
INSERT INTO order_details (order_id, del_partner, predicted_time, actual_time)
VALUES
(1, 'Partner A', '2024-01-01 10:00:00', '2024-01-01 10:30:00'),
(2, 'Partner B', '2024-01-02 14:00:00', '2024-01-02 14:20:00'),
(3, 'Partner A', '2024-01-03 16:00:00', '2024-01-03 16:10:00'),
(4, 'Partner B', '2024-01-04 09:00:00', '2024-01-04 09:45:00');

Learnings
• Simple comparisons: Using actual_time > predicted_time allows filtering delayed
orders directly.
• Aggregation: The use of COUNT(*) helps to count the number of delayed orders per
delivery partner.
• GROUP BY: Ensures results are grouped by delivery partner, so the delayed order count is
calculated for each.
• Ordering: ORDER BY delayed_orders DESC sorts the delivery partners based on the
number of delayed orders, from most to least.

Solutions

PostgreSQL solution:


SELECT del_partner,
COUNT(*) AS delayed_orders
FROM order_details
WHERE actual_time > predicted_time
GROUP BY del_partner
ORDER BY delayed_orders DESC;

MySQL solution:
SELECT del_partner,
COUNT(*) AS delayed_orders
FROM order_details
WHERE actual_time > predicted_time
GROUP BY del_partner
ORDER BY delayed_orders DESC;

Both PostgreSQL and MySQL solutions are identical, as they both support the basic SQL
syntax for aggregation and comparison.

• Q.834
Write a query to calculate the median order value for each customer. If there is an even
number of orders, return the average of the two middle values.

Explanation
To calculate the median for each customer, you need to:
• Sort the orders by order_value for each customer.
• Assign row numbers using the ROW_NUMBER() window function to position each order.
• Count the total number of orders per customer using COUNT(*) OVER (PARTITION BY
customer_id).
• For odd-numbered orders, select the middle value.
• For even-numbered orders, calculate the average of the two middle values.

Datasets and SQL Schemas


Table creation and sample data:
-- Create the orders table
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_value DECIMAL(10, 2)
);

-- Insert sample data into the orders table
INSERT INTO orders (order_id, customer_id, order_value)
VALUES
(1, 101, 250.00),
(2, 101, 150.00),
(3, 102, 100.00),
(4, 102, 200.00),
(5, 103, 300.00),
(6, 103, 400.00);

Learnings


• Window functions: ROW_NUMBER() is used to assign a unique number to each row, and
COUNT() calculates the total number of orders for each customer.
• Partitioning: The query partitions the data by customer_id so that each customer’s orders
are handled independently.
• Median calculation: The median is computed by identifying the middle value(s), with
special handling for even numbers of rows using AVG().
• Order handling: The query sorts the orders for each customer by order_value to
calculate the median in a properly ordered sequence.

Solutions

PostgreSQL solution:
WITH ordered_orders AS (
    SELECT customer_id, order_value,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_value) AS row_num,
           COUNT(*) OVER (PARTITION BY customer_id) AS total_orders
    FROM orders
)
SELECT customer_id,
       AVG(order_value) AS median_order_value
FROM ordered_orders
-- FLOOR/CEIL of (n + 1) / 2.0 pick the single middle row for an odd count
-- and the two middle rows for an even count
WHERE row_num IN (FLOOR((total_orders + 1) / 2.0), CEIL((total_orders + 1) / 2.0))
GROUP BY customer_id;

MySQL solution:
WITH ordered_orders AS (
    SELECT customer_id, order_value,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_value) AS row_num,
           COUNT(*) OVER (PARTITION BY customer_id) AS total_orders
    FROM orders
)
SELECT customer_id,
       AVG(order_value) AS median_order_value
FROM ordered_orders
-- FLOOR/CEIL of (n + 1) / 2.0 pick the single middle row for an odd count
-- and the two middle rows for an even count
WHERE row_num IN (FLOOR((total_orders + 1) / 2.0), CEIL((total_orders + 1) / 2.0))
GROUP BY customer_id;

Both PostgreSQL and MySQL solutions are identical, as they both support window functions
like ROW_NUMBER() and COUNT().
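The logic can be checked with sqlite3. Here the midpoint rows are picked with integer division, (n + 1) / 2 and (n + 2) / 2, which SQLite performs natively on integers, and a customer with an odd order count (103) is added to cover both branches:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_value REAL);
INSERT INTO orders VALUES
 (1, 101, 250.0), (2, 101, 150.0), (3, 102, 100.0), (4, 102, 200.0),
 (5, 103, 300.0), (6, 103, 400.0), (7, 103, 100.0);  -- customer 103: odd count
""")

rows = con.execute("""
    WITH ordered_orders AS (
        SELECT customer_id, order_value,
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_value) AS row_num,
               COUNT(*) OVER (PARTITION BY customer_id) AS total_orders
        FROM orders
    )
    SELECT customer_id, AVG(order_value) AS median_order_value
    FROM ordered_orders
    -- (n + 1) / 2 and (n + 2) / 2 (integer division) pick the middle row for odd n
    -- and the two middle rows for even n
    WHERE row_num IN ((total_orders + 1) / 2, (total_orders + 2) / 2)
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()
print(rows)  # [(101, 200.0), (102, 150.0), (103, 300.0)]
```

Customers 101 and 102 (even counts) get the average of their two middle orders; customer 103 (odd count) gets the single middle value 300.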

• Q.835
You are given a purchase_history table with customer purchases. Write a query to identify
customers who have made purchases in the last 30 days but not in the last 7 days.

Explanation
To solve this, you need to:
• Filter customers who have made purchases in the last 30 days but not in the last 7 days.
• You can use a combination of NOT EXISTS or LEFT JOIN to check if there are any
purchases in the last 7 days for each customer.
• Ensure that customers who made purchases within the 30-day window but not in the last 7
days are selected.


Datasets and SQL Schemas


Table creation and sample data:
-- Create the purchase_history table
CREATE TABLE purchase_history (
customer_id INT,
purchase_date DATE,
amount DECIMAL(10, 2)
);

-- Insert sample data into the purchase_history table
INSERT INTO purchase_history (customer_id, purchase_date, amount)
VALUES
(1, '2024-03-01', 150),
(1, '2024-03-15', 250),
(2, '2024-02-20', 100),
(2, '2024-02-28', 200),
(3, '2024-03-05', 300);

Learnings
• Date manipulation: CURRENT_DATE - INTERVAL is used to calculate the last 30 days and
7 days from the current date.
• Filtering with NOT EXISTS: Used to exclude customers who made purchases in the last 7
days.
• Subqueries: The NOT EXISTS clause ensures that there are no records for the same
customer in the last 7 days.
• De-duplication: Returning distinct customers ensures each qualifying customer appears
only once in the result, regardless of how many purchases they made in the window.

Solutions

PostgreSQL solution:
-- Aliases (ph vs recent) keep the correlated subquery pointing at the outer row;
-- without them, customer_id = purchase_history.customer_id compares the inner table to itself.
SELECT DISTINCT ph.customer_id
FROM purchase_history ph
WHERE ph.purchase_date >= CURRENT_DATE - INTERVAL '30 days'
  AND NOT EXISTS (
      SELECT 1
      FROM purchase_history recent
      WHERE recent.customer_id = ph.customer_id
        AND recent.purchase_date >= CURRENT_DATE - INTERVAL '7 days'
  );

MySQL solution:
SELECT DISTINCT ph.customer_id
FROM purchase_history ph
WHERE ph.purchase_date >= CURDATE() - INTERVAL 30 DAY
  AND ph.purchase_date < CURDATE() - INTERVAL 7 DAY
  AND NOT EXISTS (
      SELECT 1
      FROM purchase_history recent
      WHERE recent.customer_id = ph.customer_id
        AND recent.purchase_date >= CURDATE() - INTERVAL 7 DAY
  );


The syntax for date manipulation differs slightly between PostgreSQL (CURRENT_DATE -
INTERVAL '30 days') and MySQL (CURDATE() - INTERVAL 30 DAY), but the logic
remains the same.
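The 30-days-but-not-7 filter can be sanity-checked with Python's sqlite3, pinning the reference date so the result is reproducible (SQLite's date(:today, '-30 days') stands in for CURRENT_DATE / CURDATE() arithmetic; the data mirrors the sample rows above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase_history (customer_id INT, purchase_date TEXT, amount REAL);
INSERT INTO purchase_history VALUES
  (1, '2024-03-01', 150),   -- inside the 30-day window
  (1, '2024-03-15', 250),   -- inside the 7-day window: disqualifies customer 1
  (2, '2024-02-20', 100),
  (2, '2024-02-28', 200),   -- 30-day window only: qualifies
  (3, '2024-03-05', 300);   -- 30-day window only: qualifies
""")

rows = conn.execute("""
SELECT DISTINCT ph.customer_id
FROM purchase_history ph
WHERE ph.purchase_date >= date(:today, '-30 days')
  AND ph.purchase_date <  date(:today, '-7 days')
  AND NOT EXISTS (
      SELECT 1
      FROM purchase_history recent
      WHERE recent.customer_id = ph.customer_id
        AND recent.purchase_date >= date(:today, '-7 days')
  )
ORDER BY ph.customer_id
""", {"today": "2024-03-20"}).fetchall()
print(rows)  # [(2,), (3,)]
```

ISO-formatted date strings compare correctly as text, which is why the plain >= / < comparisons work here.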

• Q.836
Given the sales table, calculate the percentage change in sales for each product between Q1
(January to March) and Q2 (April to June) of a given year.

Explanation
To solve this, you need to:
• Calculate sales for Q1 and Q2: Use conditional aggregation to sum the sales for Q1
(January to March) and Q2 (April to June).
• Formula for percentage change: (Q2_sales - Q1_sales) / Q1_sales * 100.

Ensure that the aggregation is handled correctly for each product.

Datasets and SQL Schemas


Table creation and sample data:
-- Create the sales table
CREATE TABLE sales (
order_id INT PRIMARY KEY,
product_id INT,
quantity INT,
price DECIMAL(10, 2),
sale_date DATE
);

-- Insert sample data into the sales table


INSERT INTO sales (order_id, product_id, quantity, price, sale_date)
VALUES
(1, 101, 10, 500, '2024-01-10'),
(2, 101, 5, 500, '2024-04-15'),
(3, 102, 8, 300, '2024-02-20'),
(4, 102, 3, 300, '2024-05-05');

Learnings
• Conditional Aggregation: Using CASE WHEN within SUM() allows you to conditionally
aggregate values for different periods (Q1 and Q2).
• Date functions: The EXTRACT(MONTH FROM sale_date) function is used to filter data
based on the month part of sale_date.
• Percentage Change Calculation: Proper handling of the formula (Q2_sales -
Q1_sales) / Q1_sales * 100 is essential to get the correct result.
• GROUP BY: Grouping by product_id ensures that the calculation is done for each
individual product.


Solutions

PostgreSQL solution:
SELECT product_id,
       SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) BETWEEN 1 AND 3 THEN quantity * price ELSE 0 END) AS Q1_sales,
       SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) BETWEEN 4 AND 6 THEN quantity * price ELSE 0 END) AS Q2_sales,
       100 * (SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) BETWEEN 4 AND 6 THEN quantity * price ELSE 0 END) -
              SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) BETWEEN 1 AND 3 THEN quantity * price ELSE 0 END)) /
       NULLIF(SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) BETWEEN 1 AND 3 THEN quantity * price ELSE 0 END), 0) AS percentage_change
FROM sales
GROUP BY product_id;

MySQL solution:
SELECT product_id,
       SUM(CASE WHEN MONTH(sale_date) BETWEEN 1 AND 3 THEN quantity * price ELSE 0 END) AS Q1_sales,
       SUM(CASE WHEN MONTH(sale_date) BETWEEN 4 AND 6 THEN quantity * price ELSE 0 END) AS Q2_sales,
       100 * (SUM(CASE WHEN MONTH(sale_date) BETWEEN 4 AND 6 THEN quantity * price ELSE 0 END) -
              SUM(CASE WHEN MONTH(sale_date) BETWEEN 1 AND 3 THEN quantity * price ELSE 0 END)) /
       NULLIF(SUM(CASE WHEN MONTH(sale_date) BETWEEN 1 AND 3 THEN quantity * price ELSE 0 END), 0) AS percentage_change
FROM sales
GROUP BY product_id;
• NULLIF: This ensures that division by zero is avoided in case Q1 sales is zero, preventing a
division by zero error.
• Date function differences: PostgreSQL uses EXTRACT(MONTH FROM sale_date) while
MySQL uses MONTH(sale_date) to extract the month.

This question tests the candidate’s ability to:


• Aggregate data conditionally based on a time period.
• Calculate percentage change between two periods.
• Handle potential errors (like division by zero) in SQL calculations.
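The conditional-aggregation logic can also be verified in SQLite, which has neither EXTRACT() nor MONTH(); strftime('%m', ...) is its closest equivalent (data mirrors the sample rows above, column names are kept from the book):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (order_id INT, product_id INT, quantity INT, price REAL, sale_date TEXT);
INSERT INTO sales VALUES
  (1, 101, 10, 500, '2024-01-10'),
  (2, 101,  5, 500, '2024-04-15'),
  (3, 102,  8, 300, '2024-02-20'),
  (4, 102,  3, 300, '2024-05-05');
""")

# strftime('%m', ...) yields a zero-padded month string, so the quarter
# filters compare against '01'..'03' and '04'..'06'.
rows = conn.execute("""
SELECT product_id,
       SUM(CASE WHEN strftime('%m', sale_date) BETWEEN '01' AND '03'
                THEN quantity * price ELSE 0 END) AS q1_sales,
       SUM(CASE WHEN strftime('%m', sale_date) BETWEEN '04' AND '06'
                THEN quantity * price ELSE 0 END) AS q2_sales,
       100.0 * (SUM(CASE WHEN strftime('%m', sale_date) BETWEEN '04' AND '06'
                         THEN quantity * price ELSE 0 END)
              - SUM(CASE WHEN strftime('%m', sale_date) BETWEEN '01' AND '03'
                         THEN quantity * price ELSE 0 END))
       / NULLIF(SUM(CASE WHEN strftime('%m', sale_date) BETWEEN '01' AND '03'
                         THEN quantity * price ELSE 0 END), 0) AS pct_change
FROM sales
GROUP BY product_id
ORDER BY product_id
""").fetchall()
print(rows)  # [(101, 5000.0, 2500.0, -50.0), (102, 2400.0, 900.0, -62.5)]
```

Both products sold less in Q2, so the percentage change comes out negative.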
• Q.837
Explain the Difference Between Data Warehousing and Data Lakes, and When Would
You Use Each?

Answer:
• Data Warehousing:
• Purpose: Primarily designed for structured data and optimized for fast querying and
analysis.
• Data Structure: Highly structured (i.e., relational data models), with data being cleansed,
transformed, and organized into schemas (e.g., star schema, snowflake schema).
• Use Case: Best for analytical applications where you need to perform complex queries and
reporting, often on historical data. Examples include business intelligence and dashboarding.


• Data Lakes:
• Purpose: Designed to store vast amounts of raw, unstructured, semi-structured, and
structured data.
• Data Structure: Can handle unstructured data (e.g., text files, images) and semi-structured
data (e.g., JSON, Parquet), and the schema is often applied at the time of reading (schema-on-
read).
• Use Case: Best for storing large volumes of diverse data types that may not yet be fully
understood, including big data, logs, or data from IoT devices. Commonly used in machine
learning, deep learning, and predictive analytics where data might need extensive
transformation and processing.
• Key Differences:
• Structure: Data warehouses use a predefined schema (schema-on-write) while data lakes
store raw data and apply schema later (schema-on-read).
• Flexibility: Data lakes are more flexible in terms of the types of data they store, while data
warehouses are rigid but optimized for fast querying.
• Speed: Data warehouses are faster for querying structured data, while data lakes can
handle much larger data sets but may require more time for transformation and analysis.
• Q.838
What Are the Advantages and Challenges of Using Partitioning in SQL Databases?

Answer:
Advantages of Partitioning:
• Improved Query Performance: Partitioning can significantly improve query performance
by limiting the amount of data scanned during query execution, especially when queries filter
on partitioned columns (e.g., date).
• Example: A large table partitioned by order_date can make querying recent data faster
because only the relevant partitions are scanned.
• Manageability: Partitioning allows for easier management of large tables. You can drop or
archive older partitions without affecting the rest of the table, helping with data retention
policies.
• Parallel Processing: Partitioning allows parallel query execution by processing different
partitions simultaneously, which speeds up query processing.
• Faster Backup and Restore: Since data is stored in separate partitions, you can backup or
restore individual partitions instead of the entire table, which can be more efficient for very
large datasets.
Challenges of Partitioning:
• Overhead of Partition Management: Deciding how to partition the data, such as by range
or list, and managing partitioned tables can require additional administrative overhead.
• Example: Deciding between RANGE (e.g., by date) or LIST (e.g., by region) partitioning can
have significant impacts on performance and maintenance.
• Suboptimal Querying: If queries do not filter on partitioned columns, partitioning can
result in poor performance because the database may have to scan every partition, which is
effectively a full-table scan.
• Increased Complexity: As partitions grow, the logic required for loading, querying, and
maintaining them becomes more complex, especially when dealing with partition boundaries
and managing stale partitions.


• Write Amplification: In some scenarios, partitioning can increase the overhead of
inserting data into specific partitions, especially if a partition is too small or too large.
Use Cases for Partitioning:
• Partitioning is ideal for large tables with a lot of data that can be logically divided, such as
logs, time-series data, or transaction data.
• Q.839
How to Optimize a Query for Faster Performance When Joining Large Tables?
Answer:
• Use indexes on the columns involved in joins and filters.
• Avoid SELECT * and select only the necessary columns.
• Use EXPLAIN to analyze query execution plans.
• Consider partitioning large tables to improve data retrieval speeds.
• Optimize joins: prefer INNER JOIN over OUTER JOIN when possible to reduce
unnecessary data.
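EXPLAIN output varies by engine; as a quick illustration, SQLite's EXPLAIN QUERY PLAN shows the optimizer switching from a full scan to an index search once an index exists (the table, data, and index name are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INT)")
conn.executemany("INSERT INTO employees (name, department_id) VALUES (?, ?)",
                 [(f"emp{i}", i % 10) for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN is SQLite's rough analogue of EXPLAIN in
    # MySQL/PostgreSQL; the last column describes the chosen strategy.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan("SELECT name FROM employees WHERE department_id = 3")
conn.execute("CREATE INDEX idx_emp_dept ON employees (department_id)")
after = plan("SELECT name FROM employees WHERE department_id = 3")
print(before)  # a full table scan
print(after)   # a search using idx_emp_dept
```

The same habit applies to the larger engines: inspect the plan before and after adding an index, rather than assuming the index is used.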
• Q.840
What is the Difference Between OLTP and OLAP?
Answer:
• OLTP (Online Transaction Processing): Designed for managing transactional data. It
involves frequent insertions, updates, and deletions (e.g., banking, order processing). OLTP
databases are optimized for high-volume and quick transactions.
• OLAP (Online Analytical Processing): Used for complex querying and analysis of large
datasets. OLAP systems are optimized for read-heavy operations and are typically used for
business intelligence applications (e.g., data warehousing, trend analysis).

Machine Learning Engineer


• Q.841
How would you calculate the mean and standard deviation for each feature in a dataset using
SQL?
Answer:
You can use the AVG() and STDDEV() functions in SQL to calculate the mean and standard
deviation of a feature.
Example:
SELECT
feature_name,
AVG(feature_value) AS mean,
STDDEV(feature_value) AS stddev
FROM features_data
GROUP BY feature_name;
• Q.842
Question:
What is the purpose of the GROUP BY clause in SQL? How would you use it to aggregate data
for machine learning?
Answer:


The GROUP BY clause is used to group rows that have the same values into summary rows. It
is often used in conjunction with aggregate functions like COUNT(), SUM(), AVG(), and MAX()
to compute aggregate statistics for each group.
For machine learning, GROUP BY can be used to summarize data, such as calculating the
average value of a feature for each class label.
Example:
SELECT label, AVG(feature1), AVG(feature2)
FROM data_points
GROUP BY label;
• Q.843
Question
How would you handle missing values in a dataset using SQL?
Answer:
There are several approaches to handle missing values in a dataset using SQL:
• Remove rows with missing values using WHERE clause.
• Replace missing values with a default value (e.g., mean, median, or mode) using
COALESCE().
Examples:
• Remove rows with missing values:
SELECT * FROM data_points WHERE feature1_value IS NOT NULL;
• Replace missing values with the mean:
SELECT
COALESCE(feature1_value, (SELECT AVG(feature1_value) FROM data_points)) AS feature1_value
FROM data_points;
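A minimal runnable check of the mean-imputation idea, using Python's sqlite3 with made-up values (AVG ignores NULLs, so the mean of the present values fills the gap):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_points (id INT, feature1_value REAL);
INSERT INTO data_points VALUES (1, 10.0), (2, NULL), (3, 30.0);
""")

# COALESCE returns its first non-NULL argument, so the NULL in row 2 is
# replaced by the column mean computed over rows 1 and 3.
rows = conn.execute("""
SELECT id,
       COALESCE(feature1_value,
                (SELECT AVG(feature1_value) FROM data_points)) AS feature1_filled
FROM data_points
ORDER BY id
""").fetchall()
print(rows)  # [(1, 10.0), (2, 20.0), (3, 30.0)]
```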
• Q.844

Question:
How would you perform a join between multiple tables in SQL to combine training features
and labels?
Answer:
To combine multiple tables (such as features and labels) for training, you would typically use
an INNER JOIN or LEFT JOIN based on a common key (e.g., dataset_id).
Example:
SELECT f.feature1_value, f.feature2_value, l.label_value
FROM features_data f
JOIN labels l ON f.dataset_id = l.dataset_id
WHERE f.dataset_id = 1;
• Q.845
Question:
How would you calculate the correlation coefficient between two features in a dataset using
SQL?
Answer:
You can calculate the Pearson correlation coefficient using SQL by computing the
covariance of two features divided by the product of their standard deviations.


Example:
SELECT
    (SUM(feature1_value * feature2_value) - SUM(feature1_value) * SUM(feature2_value) / COUNT(*)) /
    (SQRT((SUM(POWER(feature1_value, 2)) - POWER(SUM(feature1_value), 2) / COUNT(*)) *
          (SUM(POWER(feature2_value, 2)) - POWER(SUM(feature2_value), 2) / COUNT(*)))) AS correlation
FROM data_points;
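To verify the formula, the sketch below gathers the raw sums in SQL and finishes the arithmetic in Python, since stock SQLite builds may lack SQRT() and POWER(); the sample data is chosen so feature2 is exactly twice feature1 and the coefficient must come out 1:

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_points (feature1_value REAL, feature2_value REAL);
INSERT INTO data_points VALUES (1, 2), (2, 4), (3, 6), (4, 8), (5, 10);
""")

# Gather the building blocks of the Pearson formula in one pass.
sx, sy, sxy, sxx, syy, n = conn.execute("""
SELECT SUM(feature1_value), SUM(feature2_value),
       SUM(feature1_value * feature2_value),
       SUM(feature1_value * feature1_value),
       SUM(feature2_value * feature2_value),
       COUNT(*)
FROM data_points
""").fetchone()

# Same expression as the SQL version, with the square root done in Python.
r = (sxy - sx * sy / n) / math.sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))
print(r)  # feature2 is exactly 2 * feature1, so r == 1.0
```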
• Q.846

Question:
How would you implement an INSERT operation in SQL to add new training data?
Answer:
You would use the INSERT INTO statement to add new data to a table.
Example:
INSERT INTO data_points (dataset_id, feature1_value, feature2_value, label_value)
VALUES (1, 3.4, 5.1, 0);
• Q.847

Question:
How would you efficiently handle large datasets in SQL for machine learning purposes?
Answer:
To handle large datasets efficiently:
• Indexing: Create indexes on frequently queried columns, such as feature_id, label_id,
or any foreign key.
• Partitioning: Split large tables into smaller, manageable partitions based on certain
column values (e.g., DATE).
• Batch Processing: Query data in smaller batches instead of loading the entire dataset at
once.
• Use Aggregate Functions: Instead of returning all rows, aggregate data using functions
like AVG(), SUM(), etc., to reduce the dataset size.
• Q.848

Question:
How would you calculate the performance metrics (e.g., accuracy, precision, recall) using
SQL on a model's predictions?
Answer:
You can calculate performance metrics by comparing the predicted values against the actual
values from the test set.
Example (Accuracy):
SELECT
    COUNT(CASE WHEN predicted_label = actual_label THEN 1 END) * 1.0 / COUNT(*) AS accuracy
FROM predictions;
Multiplying by 1.0 forces decimal division; in engines such as PostgreSQL, dividing one COUNT by another would otherwise truncate to an integer.
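A runnable sketch of accuracy, precision, and recall against a tiny made-up predictions table (the 1.0 literals guard against integer division, which some engines would otherwise perform):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE predictions (predicted_label INT, actual_label INT);
INSERT INTO predictions VALUES
  (1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0);
""")

# accuracy  = correct / total
# precision = true positives / predicted positives
# recall    = true positives / actual positives
accuracy, precision, recall = conn.execute("""
SELECT COUNT(CASE WHEN predicted_label = actual_label THEN 1 END) * 1.0 / COUNT(*),
       SUM(CASE WHEN predicted_label = 1 AND actual_label = 1 THEN 1.0 ELSE 0 END)
         / SUM(CASE WHEN predicted_label = 1 THEN 1.0 ELSE 0 END),
       SUM(CASE WHEN predicted_label = 1 AND actual_label = 1 THEN 1.0 ELSE 0 END)
         / SUM(CASE WHEN actual_label = 1 THEN 1.0 ELSE 0 END)
FROM predictions
""").fetchone()
print(round(accuracy, 3), round(precision, 3), round(recall, 3))  # 0.667 0.667 0.667
```

With 2 true positives, 1 false positive, and 1 false negative out of 6 rows, all three metrics land on 2/3.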
• Q.849

Question:


How do you create a schema for storing machine learning model metadata, such as training
parameters and performance metrics?
Answer:
To store model metadata, you would design a schema with tables for:
• models: Stores information about the model (e.g., model_id, name, type).
• training_data: Stores training details like dataset used, hyperparameters, and timestamp.
• metrics: Stores performance metrics for each model (e.g., accuracy, precision, recall).
Example:
CREATE TABLE models (
    model_id INT PRIMARY KEY,
    name VARCHAR(100),
    model_type VARCHAR(50),
    created_at TIMESTAMP
);

CREATE TABLE training_data (
    training_id INT PRIMARY KEY,
    model_id INT,
    dataset_id INT,
    hyperparameters JSON,
    FOREIGN KEY (model_id) REFERENCES models(model_id)
);

CREATE TABLE metrics (
    metric_id INT PRIMARY KEY,
    model_id INT,
    accuracy DECIMAL,
    precision DECIMAL,
    recall DECIMAL,
    FOREIGN KEY (model_id) REFERENCES models(model_id)
);
• Q.850
What is normalization in the context of SQL and how is it related to feature scaling in
machine learning?
Answer:
Normalization in SQL refers to organizing the database schema to eliminate redundancy and
dependency. It involves breaking down large tables into smaller ones and establishing
relationships between them.
In machine learning, feature scaling (e.g., Min-Max scaling, Z-score normalization) is used
to adjust the range of features, which can help improve model performance, especially in
algorithms sensitive to the scale of input features (e.g., SVM, KNN). While SQL
normalization deals with data structure, feature scaling deals with preparing data for
models.

Backend Developer
• Q.851
What is the difference between JOIN and UNION in SQL?
Answer:
• JOIN: Combines rows from two or more tables based on a related column (e.g., primary
key/foreign key). It can be an INNER JOIN, LEFT JOIN, RIGHT JOIN, or FULL
OUTER JOIN.


• UNION: Combines the result sets of two or more SELECT queries. It removes duplicate
rows by default, whereas UNION ALL includes all rows.
Example:
-- JOIN Example
SELECT a.name, b.department
FROM employees a
INNER JOIN departments b ON a.department_id = b.department_id;

-- UNION Example
SELECT name FROM employees
UNION
SELECT name FROM contractors;
• Q.852
How do you handle database migrations in a production environment?
Answer:
• Use migration tools like Flyway, Liquibase, or built-in database migration frameworks in
backend frameworks (e.g., Django's migrations, Rails ActiveRecord migrations).
• Version control: Ensure each migration is versioned and can be applied in sequence.
• Backups: Always take a backup of the database before running migrations.
• Testing: Test migrations on a staging environment before applying them in production.
• Rollback: Ensure that migrations are reversible, or create a manual rollback plan if needed.
• Q.853
What are transactions in SQL, and why are they important for backend development?
Answer:
A transaction is a sequence of SQL operations executed as a single unit, ensuring database
consistency. Transactions are crucial for maintaining data integrity in the event of errors or
system crashes.
ACID Properties:
• Atomicity: Ensures all operations in a transaction are completed, or none are.
• Consistency: The database starts and ends in a consistent state.
• Isolation: Ensures transactions do not interfere with each other.
• Durability: Once committed, the changes are permanent.
Example:
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
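The rollback behaviour is easy to demonstrate with Python's sqlite3, using a CHECK constraint to force a mid-transfer failure (the schema and amounts are illustrative): the failed transfer leaves both balances exactly as they were, which is atomicity in action.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id INT PRIMARY KEY, balance INT CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 100)])
conn.commit()

# Transfer more than account 1 holds: the CHECK constraint fails on the
# first UPDATE, and the rollback leaves both balances untouched.
try:
    with conn:  # sqlite3 commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE account_id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE account_id = 2")
except sqlite3.IntegrityError:
    pass

balances = conn.execute("SELECT account_id, balance FROM accounts ORDER BY account_id").fetchall()
print(balances)  # [(1, 100), (2, 100)]
```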
• Q.854
How would you create an index on a table, and when should you use it?
Answer:
An index is created to speed up queries by reducing the time taken to search for specific
rows.
Example:
CREATE INDEX idx_employee_name ON employees (name);

When to use:


• For columns frequently used in WHERE, JOIN, ORDER BY, or GROUP BY clauses.
• On primary/foreign keys for faster lookup.
When NOT to use:
• Avoid indexing columns that are updated frequently as it can slow down INSERT, UPDATE,
and DELETE operations.
• Q.855
What is Normalization in SQL, and why is it important?
Answer:
Normalization is the process of organizing a database to reduce redundancy and dependency
by dividing large tables into smaller tables and establishing relationships. The goal is to avoid
data anomalies.
Normal Forms:
• 1NF: Eliminate duplicate columns, and ensure that each field contains only atomic values.
• 2NF: Eliminate partial dependency (each non-key attribute must depend on the entire
primary key).
• 3NF: Eliminate transitive dependency (non-key attributes should depend only on the
primary key).
Why Important:
• Reduces data redundancy.
• Ensures data integrity.
• Simplifies updates, deletions, and insertions.
• Q.856
Question:
How would you prevent SQL Injection in a backend system?
Answer:
• Use Prepared Statements: Ensure that user input is treated as data, not executable code.
• Input Validation: Validate and sanitize all user inputs.
• Stored Procedures: Use parameterized queries and stored procedures that don’t
concatenate strings directly.
• Least Privilege: Ensure that the database user account has the least privileges necessary
for the task.
Example (Prepared Statement):
SELECT * FROM users WHERE username = ? AND password = ?;
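A small demonstration of why placeholders matter, contrasting string concatenation with a parameterized query in Python's sqlite3 (the users table and credentials are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

malicious = "' OR '1'='1"

# Unsafe: string concatenation turns the input into executable SQL, and
# the injected tautology matches every row in the table.
unsafe = conn.execute(
    "SELECT COUNT(*) FROM users WHERE username = '" + malicious + "'"
).fetchone()[0]

# Safe: the placeholder keeps the input as plain data, so no row matches.
safe = conn.execute(
    "SELECT COUNT(*) FROM users WHERE username = ?", (malicious,)
).fetchone()[0]

print(unsafe, safe)  # 1 0
```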
• Q.857
Question:
What are foreign keys, and how do they ensure referential integrity?
Answer:
A foreign key is a column (or set of columns) in one table that refers to the primary key in
another table. Foreign keys ensure that relationships between tables are valid by enforcing
referential integrity.
Example:


CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
• Q.858

Question:
How would you implement pagination in SQL for large datasets?
Answer:
Pagination is implemented using LIMIT and OFFSET in SQL to retrieve a subset of rows.
Example (for MySQL/PostgreSQL):
SELECT * FROM employees LIMIT 10 OFFSET 20;
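Both LIMIT/OFFSET and the keyset (seek) alternative can be tried in SQLite; keyset pagination is a common optimization for deep pages, though the book's examples stick to OFFSET (the table and data here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO employees (name) VALUES (?)",
                 [(f"emp{i:03d}",) for i in range(1, 101)])

# Page 3 with a page size of 10: skip 20 rows, take 10.
page3 = conn.execute(
    "SELECT employee_id FROM employees ORDER BY employee_id LIMIT 10 OFFSET 20"
).fetchall()
print(page3[0], page3[-1])  # (21,) (30,)

# Keyset pagination: resume after the last id seen instead of re-counting
# skipped rows, which avoids the cost of large OFFSETs on big tables.
after_id = page3[-1][0]
page4 = conn.execute(
    "SELECT employee_id FROM employees WHERE employee_id > ? ORDER BY employee_id LIMIT 10",
    (after_id,)
).fetchall()
print(page4[0], page4[-1])  # (31,) (40,)
```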
• Q.859
What is the difference between a clustered and non-clustered index?
Answer:
• Clustered Index: The data rows are stored in the table in the order of the index. A table
can have only one clustered index.
• Non-clustered Index: The index is stored separately from the table, and it contains a
pointer to the actual data. A table can have multiple non-clustered indexes.
• Q.860
Question:
How do you implement data integrity in SQL?
Answer:
Data integrity is ensured by using:
• Primary Keys: Ensures uniqueness for each record.
• Foreign Keys: Ensures valid relationships between tables.
• Check Constraints: Ensures that values in a column meet certain conditions.
• Not Null Constraints: Ensures that a column cannot have NULL values.
• Unique Constraints: Ensures all values in a column are unique.

50+ Most Asked Questions & Answers


• Q.861

Question:
Write a query to find the 3rd highest salary in the employees table.

Explanation:
You need to write a query that identifies the 3rd highest salary from the employees table.
Use a subquery along with the DISTINCT keyword to eliminate duplicate salaries, and use
LIMIT to fetch the 3rd highest value.


Datasets and SQL Schemas


• - Table creation
CREATE TABLE employees (
employee_id INT,
employee_name VARCHAR(100),
salary DECIMAL(10, 2)
);
• - Datasets
INSERT INTO employees (employee_id, employee_name, salary)
VALUES
(1, 'Alice', 50000),
(2, 'Bob', 70000),
(3, 'Charlie', 75000),
(4, 'David', 70000),
(5, 'Eve', 90000);

Learnings:
• Use of DISTINCT to eliminate duplicates.
• Usage of subqueries to rank or filter records.
• Limiting results with LIMIT in SQL.

Solutions
• - PostgreSQL solution
SELECT MIN(salary) AS third_highest_salary
FROM (
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 3
) AS subquery;
• - MySQL solution
SELECT MIN(salary) AS third_highest_salary
FROM (
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 3
) AS subquery;
• Q.862

Question:
Identify employees earning above the average salary within their department.

Explanation:
Write a query to find employees whose salaries are greater than the average salary in their
respective department. A correlated subquery can be used to compare each employee's salary
against the average salary of their department.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE employees (
employee_id INT,
employee_name VARCHAR(100),
department_id INT,
salary DECIMAL(10, 2)
);
• - Datasets
INSERT INTO employees (employee_id, employee_name, department_id, salary)


VALUES
(1, 'Alice', 1, 60000),
(2, 'Bob', 1, 55000),
(3, 'Charlie', 2, 70000),
(4, 'David', 2, 65000),
(5, 'Eve', 1, 75000),
(6, 'Frank', 2, 60000);

Learnings:
• Use of correlated subqueries to compare row values with aggregated results.
• Use of AVG() to calculate the average salary within a department.
• Filtering records based on the result of the subquery.

Solutions
• - PostgreSQL solution
SELECT employee_name, salary, department_id
FROM employees e1
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e1.department_id = e2.department_id
);
• - MySQL solution
SELECT employee_name, salary, department_id
FROM employees e1
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e1.department_id = e2.department_id
);
• Q.863
Write a query to find employees who have worked in more than 2 departments.
Explanation
To solve this, you'll need to count the number of departments for each employee and filter out
those who worked in more than 2.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE EmployeeDepartment (
employee_id INT,
department_id INT
);
• - Datasets
INSERT INTO EmployeeDepartment (employee_id, department_id)
VALUES
(1, 101),
(1, 102),
(1, 103),
(2, 101),
(2, 104),
(3, 102);

Learnings
• Using GROUP BY and HAVING clauses for counting occurrences
• Aggregation in SQL
Solutions
• - PostgreSQL solution


SELECT employee_id
FROM EmployeeDepartment
GROUP BY employee_id
HAVING COUNT(DISTINCT department_id) > 2;
• - MySQL solution
SELECT employee_id
FROM EmployeeDepartment
GROUP BY employee_id
HAVING COUNT(DISTINCT department_id) > 2;
• Q.864
Write a query to get the total sales per product for the last 30 days.
Explanation
You will need to filter sales by date and aggregate the sales for each product.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE Sales (
sale_id INT,
product_id INT,
sale_date DATE,
sale_amount DECIMAL
);
• - Datasets
INSERT INTO Sales (sale_id, product_id, sale_date, sale_amount)
VALUES
(1, 101, '2025-01-01', 250),
(2, 102, '2025-01-05', 300),
(3, 101, '2025-01-10', 150),
(4, 103, '2025-01-15', 450),
(5, 101, '2025-01-20', 200);

Learnings
• Using WHERE to filter by date
• Aggregation and grouping data in SQL
Solutions
• - PostgreSQL solution
SELECT product_id, SUM(sale_amount) AS total_sales
FROM Sales
WHERE sale_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY product_id;
• - MySQL solution
SELECT product_id, SUM(sale_amount) AS total_sales
FROM Sales
WHERE sale_date >= CURDATE() - INTERVAL 30 DAY
GROUP BY product_id;
• Q.865
Find the employees who do not have any sales in the Sales table.
Explanation
Use a LEFT JOIN to get all employees and then filter for those with no corresponding sales.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE Employee (
employee_id INT,
name VARCHAR(100)
);


• - Datasets
INSERT INTO Employee (employee_id, name)
VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie');
• - Sales dataset (some employees may not have sales)
CREATE TABLE Sales (
sale_id INT,
employee_id INT,
sale_amount DECIMAL
);
INSERT INTO Sales (sale_id, employee_id, sale_amount)
VALUES
(1, 1, 500),
(2, 2, 700);

Learnings
• Using LEFT JOIN to identify missing data
• Filtering NULL values
Solutions
• - PostgreSQL solution
SELECT e.name
FROM Employee e
LEFT JOIN Sales s ON e.employee_id = s.employee_id
WHERE s.sale_id IS NULL;
• - MySQL solution
SELECT e.name
FROM Employee e
LEFT JOIN Sales s ON e.employee_id = s.employee_id
WHERE s.sale_id IS NULL;
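The anti-join shape is easy to confirm in SQLite with the sample rows above (re-created here so the snippet is self-contained):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (employee_id INT, name TEXT);
INSERT INTO Employee VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
CREATE TABLE Sales (sale_id INT, employee_id INT, sale_amount REAL);
INSERT INTO Sales VALUES (1, 1, 500), (2, 2, 700);
""")

# The LEFT JOIN keeps every employee; rows with no matching sale carry
# NULLs in the Sales columns, and the IS NULL filter keeps only those.
rows = conn.execute("""
SELECT e.name
FROM Employee e
LEFT JOIN Sales s ON e.employee_id = s.employee_id
WHERE s.sale_id IS NULL
""").fetchall()
print(rows)  # [('Charlie',)]
```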
• Q.866
Write a query to find the total revenue for each day.
Explanation
Group by the date and aggregate the sales amount to get the total revenue per day.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE Sales (
sale_id INT,
sale_date DATE,
sale_amount DECIMAL
);
• - Datasets
INSERT INTO Sales (sale_id, sale_date, sale_amount)
VALUES
(1, '2024-01-01', 100),
(2, '2024-01-01', 150),
(3, '2024-01-02', 200);

Learnings
• Date-based aggregation
• Using GROUP BY for daily totals
Solutions
• - PostgreSQL solution
SELECT sale_date, SUM(sale_amount) AS total_revenue
FROM Sales


GROUP BY sale_date;
• - MySQL solution
SELECT sale_date, SUM(sale_amount) AS total_revenue
FROM Sales
GROUP BY sale_date;
• Q.867
Find all customers who have not placed any orders.
Explanation
Use a LEFT JOIN on the customers table and the orders table, filtering for customers with no
corresponding orders.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE Customer (
customer_id INT,
customer_name VARCHAR(100)
);
• - Datasets
INSERT INTO Customer (customer_id, customer_name)
VALUES
(1, 'Alice'),
(2, 'Bob');
• - Orders dataset
CREATE TABLE Orders (
order_id INT,
customer_id INT,
order_amount DECIMAL
);
INSERT INTO Orders (order_id, customer_id, order_amount)
VALUES
(1, 1, 500);

Learnings
• Identifying missing relationships using LEFT JOIN
Solutions
• - PostgreSQL solution
SELECT c.customer_name
FROM Customer c
LEFT JOIN Orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
• - MySQL solution
SELECT c.customer_name
FROM Customer c
LEFT JOIN Orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
• Q.868
Find the employees who joined after a specific date.
Explanation
You need to filter employees based on their joining date.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE Employee (
employee_id INT,
name VARCHAR(100),
join_date DATE


);
• - Datasets
INSERT INTO Employee (employee_id, name, join_date)
VALUES
(1, 'Alice', '2023-01-10'),
(2, 'Bob', '2024-01-05');

Learnings
• Filtering by date condition
Solutions
• - PostgreSQL solution
SELECT name
FROM Employee
WHERE join_date > '2023-01-01';
• - MySQL solution
SELECT name
FROM Employee
WHERE join_date > '2023-01-01';
• Q.869
Find the most recent order placed by each customer.
Explanation
You need to find the most recent order for each customer by grouping the orders and using
MAX() on the order date.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE Orders (
order_id INT,
customer_id INT,
order_date DATE,
order_amount DECIMAL
);
• - Datasets
INSERT INTO Orders (order_id, customer_id, order_date, order_amount)
VALUES
(1, 1, '2024-01-01', 300),
(2, 1, '2024-02-10', 500),
(3, 2, '2024-03-05', 700),
(4, 2, '2024-02-15', 200),
(5, 3, '2024-04-01', 400);

Solutions
• - PostgreSQL solution
SELECT customer_id, MAX(order_date) AS most_recent_order_date
FROM Orders
GROUP BY customer_id;
• - MySQL solution
SELECT customer_id, MAX(order_date) AS most_recent_order_date
FROM Orders
GROUP BY customer_id;
• Q.870
Find the employees who have the same salary.
Explanation
You need to identify employees who share the same salary. This involves grouping by the
salary column and filtering with HAVING to only include salaries that appear more than once.


Datasets and SQL Schemas


• - Table creation
CREATE TABLE Employee (
employee_id INT,
name VARCHAR(100),
salary INT
);
• - Datasets
INSERT INTO Employee (employee_id, name, salary)
VALUES
(1, 'Alice', 50000),
(2, 'Bob', 60000),
(3, 'Charlie', 50000),
(4, 'David', 70000),
(5, 'Eve', 60000);

Learnings
• Using GROUP BY to group rows based on salary
• Filtering with HAVING to get only those salaries that have more than one employee
• Using aggregation functions like COUNT() to identify duplicate values
Solutions
• - PostgreSQL solution
SELECT salary, ARRAY_AGG(name) AS employees
FROM Employee
GROUP BY salary
HAVING COUNT(salary) > 1;
• - MySQL solution
SELECT salary, GROUP_CONCAT(name) AS employees
FROM Employee
GROUP BY salary
HAVING COUNT(salary) > 1;
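A runnable check of the duplicate-salary grouping; SQLite's group_concat() stands in for MySQL's GROUP_CONCAT() (its concatenation order is unspecified, so the names are sorted before printing):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (employee_id INT, name TEXT, salary INT);
INSERT INTO Employee VALUES
  (1, 'Alice', 50000), (2, 'Bob', 60000), (3, 'Charlie', 50000),
  (4, 'David', 70000), (5, 'Eve', 60000);
""")

rows = conn.execute("""
SELECT salary, group_concat(name) AS employees
FROM Employee
GROUP BY salary
HAVING COUNT(*) > 1
ORDER BY salary
""").fetchall()
for salary, names in rows:
    print(salary, sorted(names.split(",")))
# 50000 ['Alice', 'Charlie']
# 60000 ['Bob', 'Eve']
```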

• Q.871
Find the Nth Highest Salary
Explanation
The task is to find the nth highest salary in the Employees table. There are multiple ways to
approach this problem, such as using OFFSET and LIMIT for simpler queries, or using window
functions like DENSE_RANK() to handle ties in salary rankings. Below are a few approaches to
solve this.

Datasets and SQL Schemas


• - Table creation
DROP TABLE IF EXISTS Employees;
CREATE TABLE Employees (
id INT,
name VARCHAR(100) NOT NULL,
salary NUMERIC(10, 2),
department_id INT,
manager_id INT,
hire_date DATE NOT NULL
);
• - Insert data into Employees table
INSERT INTO Employees (id, name, salary, department_id, manager_id, hire_date)
VALUES
(1, 'Alice', 90000, 1, NULL, '2022-01-15'),
(2, 'Micheal', 80000, 2, 1, '2022-02-20'),
(3, 'Bob', 80000, 2, 1, '2022-02-20'),


(4, 'Charlie', 75000, 2, 1, '2022-03-12'),
(5, 'David', 85000, 2, 1, '2022-03-25'),
(6, 'Eve', 95000, 2, 2, '2022-04-01'),
(7, 'Frank', 78000, 2, 2, '2022-04-20'),
(8, 'Grace', 60000, 3, 3, '2022-05-12'),
(9, 'Heidi', 88000, 3, 1, '2022-06-15');

Learnings
• Using ORDER BY to sort the records by salary.
• Using OFFSET and LIMIT to directly access the nth row.
• Using DENSE_RANK() for more flexible ranking in the case of salary ties.
• The importance of understanding window functions for advanced SQL queries.

Solutions

1. Approach 1: Using OFFSET and LIMIT


This method is simple and works well when you need to retrieve a specific record (e.g., 4th
highest, 6th highest).
Solution:
SELECT *
FROM Employees
ORDER BY salary DESC
OFFSET 4 - 1 LIMIT 1; -- For 4th highest salary, use OFFSET 4-1

Explanation:
• ORDER BY salary DESC sorts the employees by salary in descending order.
• OFFSET 4-1 LIMIT 1 skips the first 3 rows and retrieves the 4th highest salary (change 4
to any other value for different nth salaries).

2. Approach 2: Using DENSE_RANK() for Ranking


DENSE_RANK() can be used to assign a unique rank to each salary value while handling ties
(e.g., multiple employees having the same salary).
Solution:
SELECT *
FROM (
SELECT *,
DENSE_RANK() OVER (ORDER BY salary DESC) AS d_rank
FROM Employees
) AS ranked_employees
WHERE d_rank = 4; -- For 4th highest salary, change 4 to n

Explanation:
• DENSE_RANK() assigns a rank to each employee based on their salary in descending order,
without gaps in ranks (if multiple employees have the same salary).
• We then filter the result to get the 4th highest salary by checking where the rank equals 4.
You can change 4 to any desired value for the nth highest salary.

3. Approach 3: Using ROW_NUMBER() for Unique Ranks

1036
1000+ SQL Interview Questions & Answers | By Zero Analyst

If you want a strictly unique ranking for each salary (even if some salaries are the same), use
ROW_NUMBER(). This approach ensures that no two employees get the same rank, even if their
salaries are identical.
Solution:
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM Employees
) AS ranked_employees
WHERE row_num = 4; -- For 4th highest salary, change 4 to n

Explanation:
• ROW_NUMBER() generates a unique sequential number for each row, starting at 1, without
any gaps.
• Similar to the DENSE_RANK() method, we filter the result to get the 4th row based on the
generated row_num.

Comparison of Approaches

• OFFSET/LIMIT — Pros: simple and fast; works well for fetching a specific row. Cons: doesn't handle ties properly (if multiple employees share the same salary).
• DENSE_RANK() — Pros: handles ties in salary correctly. Cons: more complex, especially for large datasets.
• ROW_NUMBER() — Pros: strictly unique ranking even with ties. Cons: assigns different ranks to identical salaries, which might not be useful in some cases.
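The difference between the approaches is easiest to see on data that contains a salary tie. Below is a minimal sketch using Python's bundled sqlite3 driver (window functions need SQLite 3.25 or newer); the table is a trimmed-down version of the chapter's Employees example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (id INT, name TEXT, salary INT);
INSERT INTO Employees VALUES
  (1, 'Alice', 90000), (2, 'Michael', 80000), (3, 'Bob', 80000),
  (4, 'Charlie', 75000), (5, 'Eve', 95000);
""")

# Approach 1: OFFSET/LIMIT skips rows blindly, so a tie occupies two "slots".
offset_row = conn.execute(
    "SELECT name, salary FROM Employees ORDER BY salary DESC LIMIT 1 OFFSET 2"
).fetchone()

# Approach 2: DENSE_RANK() gives both 80000 earners the same rank (3).
dense_rows = conn.execute("""
    SELECT name, salary FROM (
        SELECT *, DENSE_RANK() OVER (ORDER BY salary DESC) AS d_rank
        FROM Employees
    ) WHERE d_rank = 3
""").fetchall()

print(offset_row)   # the single 3rd row after sorting (one of the 80000 earners)
print(dense_rows)   # every employee at the 3rd distinct salary
```

OFFSET returns exactly one of the tied rows (which one is not guaranteed), while DENSE_RANK() returns both — which is usually what "nth highest salary" is taken to mean.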
• Q.872
Find Median Salary
Explanation
The median salary is the middle value in a sorted list of salaries. If there is an odd number of
records, the median is the middle salary. If there is an even number of records, the median is
the average of the two middle salaries.

Datasets and SQL Schemas


-- Table creation
DROP TABLE IF EXISTS Employees;
CREATE TABLE Employees (
id INT,
name VARCHAR(100) NOT NULL,
salary NUMERIC(10, 2),
department_id INT,
manager_id INT,
hire_date DATE NOT NULL
);
-- Insert data into Employees table
INSERT INTO Employees (id, name, salary, department_id, manager_id, hire_date)
VALUES

(1, 'Alice', 90000, 1, NULL, '2022-01-15'),
(2, 'Bob', 80000, 2, 1, '2022-02-20'),
(3, 'Charlie', 75000, 2, 1, '2022-03-12'),
(4, 'David', 85000, 2, 1, '2022-03-25'),
(5, 'Eve', 95000, 2, 2, '2022-04-01'),
(6, 'Frank', 78000, 2, 2, '2022-04-20'),
(7, 'Grace', 60000, 3, 3, '2022-05-12'),
(8, 'Heidi', 88000, 3, 1, '2022-06-15'),
(9, 'Sam', 89000, 1, 2, '2022-06-15');

Solution to Find the Median Salary


To calculate the median salary, you can use the following approach:
• For Odd Number of Records:
The median is the middle salary.
• For Even Number of Records:
The median is the average of the two middle salaries.

Approach 1: Using Window Functions


We can use window functions (ROW_NUMBER(), COUNT(), or PERCENT_RANK()) to find the
middle value or the average of the two middle values.

1. Query for Median when the Number of Employees is Odd (Current Data)
WITH SortedSalaries AS (
    SELECT salary, ROW_NUMBER() OVER (ORDER BY salary) AS row_num, COUNT(*) OVER () AS total_count
FROM Employees
)
SELECT AVG(salary) AS median_salary
FROM SortedSalaries
WHERE row_num IN ((total_count + 1) / 2, total_count / 2 + 1);

Explanation:
• ROW_NUMBER() assigns a unique row number to each salary in ascending order.
• COUNT(*) OVER () calculates the total number of records.
• The AVG(salary) computes the median:
• If the total count is odd, the middle row will be returned.
• If the total count is even, the two middle rows are averaged.

2. Query for Median when the Number of Employees is Even (After Adding
New Record)
Let's add one more record with a salary of 91,000:
INSERT INTO Employees (id, name, salary, department_id, manager_id, hire_date)
VALUES
(10, 'John', 91000, 1, 2, '2022-07-01');

Now the total number of employees is 10, an even number. The median will be the average of
the 5th and 6th highest salaries.
The same query from above will now return the correct median for the updated dataset:
WITH SortedSalaries AS (
    SELECT salary, ROW_NUMBER() OVER (ORDER BY salary) AS row_num, COUNT(*) OVER () AS total_count
    FROM Employees
)
SELECT AVG(salary) AS median_salary
FROM SortedSalaries
WHERE row_num IN ((total_count + 1) / 2, total_count / 2 + 1);

Explanation of Changes:
• After adding the 10th employee, the number of records becomes even.
• The query will return the average of the 5th and 6th highest salaries as the median.

Approach 2: Using PERCENTILE_CONT() (for PostgreSQL)


PostgreSQL offers a built-in function PERCENTILE_CONT() that calculates the median
directly.

For Odd Number of Employees:


SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM Employees;

For Even Number of Employees (After Adding the Record):


SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM Employees;

Explanation:
• PERCENTILE_CONT(0.5) computes the 50th percentile (the median).
• WITHIN GROUP (ORDER BY salary) orders the salaries before calculating the percentile.

Expected Median Before and After Adding New Record


• Before Adding the New Record:
• The sorted salaries are: 60000, 75000, 78000, 80000, 85000, 88000, 89000, 90000, 95000 (9 values).
• The median salary is the 5th value: 85000.
• After Adding the New Record (Salary = 91,000):
• The sorted salaries become: 60000, 75000, 78000, 80000, 85000, 88000, 89000, 90000, 91000, 95000 (10 values).
• The median salary is the average of the 5th and 6th values: (85000 + 88000) / 2 = 86500.
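To double-check the arithmetic, here is a hedged sketch that runs the ROW_NUMBER()-based median query with Python's bundled sqlite3 (window functions require SQLite 3.25+), using the salaries from this question's dataset:

```python
import sqlite3
import statistics

salaries = [90000, 80000, 75000, 85000, 95000, 78000, 60000, 88000, 89000]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (salary INT)")
conn.executemany("INSERT INTO Employees VALUES (?)", [(s,) for s in salaries])

# Same shape as the window-function query above; integer division picks
# the middle row (odd count) or the two middle rows (even count).
MEDIAN_SQL = """
WITH SortedSalaries AS (
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary) AS row_num,
           COUNT(*) OVER () AS total_count
    FROM Employees
)
SELECT AVG(salary) FROM SortedSalaries
WHERE row_num IN ((total_count + 1) / 2, total_count / 2 + 1)
"""

odd_median = conn.execute(MEDIAN_SQL).fetchone()[0]    # 9 rows -> middle value

conn.execute("INSERT INTO Employees VALUES (91000)")   # 10th employee
even_median = conn.execute(MEDIAN_SQL).fetchone()[0]   # avg of 5th and 6th values

print(odd_median, even_median)  # 85000.0 86500.0, matching statistics.median
```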

• Q.873
Find Employee Details Who Have Salary Greater Than Their Manager's Salary
Explanation
This query aims to retrieve employees whose salary is greater than their manager's salary. To
achieve this, you need to perform a self-join on the Employees table, where the employee's
manager_id is matched to the id of the manager. Then, filter the results where the
employee's salary is greater than the manager's salary.

Datasets and SQL Schemas


-- Table creation
DROP TABLE IF EXISTS Employees;
CREATE TABLE Employees (


id INT,
name VARCHAR(100) NOT NULL,
salary NUMERIC(10, 2),
department_id INT,
manager_id INT,
hire_date DATE NOT NULL
);
-- Insert data into Employees table
INSERT INTO Employees (id, name, salary, department_id, manager_id, hire_date)
VALUES
(1, 'Alice', 90000, 1, NULL, '2022-01-15'),
(2, 'Bob', 80000, 2, 1, '2022-02-20'),
(3, 'Charlie', 75000, 2, 1, '2022-03-12'),
(4, 'David', 85000, 2, 1, '2022-03-25'),
(5, 'Eve', 95000, 2, 2, '2022-04-01'),
(6, 'Frank', 78000, 2, 2, '2022-04-20'),
(7, 'Grace', 60000, 3, 3, '2022-05-12'),
(8, 'Heidi', 88000, 3, 1, '2022-06-15'),
(9, 'Sam', 89000, 3, 2, '2022-05-01');

Learnings
• Self-join: Join a table with itself to compare an employee’s data with their manager’s data.
• Filtering: Use a condition to check which employees have a higher salary than their
managers.
• NULL handling: Handle employees who do not have managers (i.e., manager_id IS
NULL).

Solutions

1. Approach 1: Self-Join
The simplest way to solve this is to use a self-join. The Employees table is joined with itself
by matching employee.manager_id to manager.id, and we filter employees whose salary is
greater than their manager's salary.
SELECT e.id, e.name, e.salary, m.name AS manager_name, m.salary AS manager_salary
FROM Employees e
JOIN Employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

Explanation:
• e and m are aliases for the employee and manager respectively.
• We join the Employees table on e.manager_id = m.id to link each employee with their
respective manager.
• The WHERE e.salary > m.salary condition filters the employees whose salary is greater
than their manager’s.

2. Approach 2: Using a Subquery


Another approach is to use a subquery to find the manager's salary for each employee, and
then compare it with the employee's salary.
SELECT id, name, salary, manager_id
FROM Employees e
WHERE salary > (
SELECT salary


FROM Employees m
WHERE m.id = e.manager_id
);

Explanation:
• The subquery (SELECT salary FROM Employees m WHERE m.id = e.manager_id)
finds the manager's salary.
• The outer query compares the employee's salary (e.salary) with the manager's salary
retrieved by the subquery.

3. Approach 3: Why LAG() Does Not Work Here


Window functions such as LAG() look tempting, but LAG(salary) OVER (PARTITION BY manager_id ORDER BY id) returns the previous employee's salary within the same manager's group — not the manager's own salary, since the manager's row sits in a different partition. A query written this way compares co-workers with each other and yields incorrect results:
WITH ManagerSalaries AS (
    SELECT id, name, salary, manager_id,
           LAG(salary) OVER (PARTITION BY manager_id ORDER BY id) AS prev_coworker_salary
    FROM Employees
)
SELECT id, name, salary, manager_id, prev_coworker_salary
FROM ManagerSalaries
WHERE salary > prev_coworker_salary;

Explanation:
• LAG(salary) reads the preceding row inside the partition, so prev_coworker_salary belongs to another subordinate of the same manager, not to the manager.
• Because a window function cannot reach a row outside the current partition, use the self-join (Approach 1) or the correlated subquery (Approach 2) for this problem.

4. Approach 4: Using LEFT JOIN for Handling Null Managers


If you want to include employees who don’t have managers (e.g., top-level employees), you
can use a LEFT JOIN and handle NULL values appropriately.
SELECT e.id, e.name, e.salary, m.name AS manager_name, m.salary AS manager_salary
FROM Employees e
LEFT JOIN Employees m ON e.manager_id = m.id
WHERE e.salary > COALESCE(m.salary, 0); -- If manager salary is NULL, consider it as 0

Explanation:
• LEFT JOIN ensures that employees without managers (i.e., manager_id is NULL) are
included in the result.
• COALESCE(m.salary, 0) substitutes 0 for the missing manager salary, so employees without a manager are included whenever their own salary is greater than 0. Keep the plain INNER JOIN of Approach 1 if top-level employees should be excluded instead.

Expected Output for Current Data


For the current dataset, the self-join returns only the employees who earn more than their manager:
• Alice has no manager, so she does not appear in the results.
• Eve (95,000) and Sam (89,000) both report to Bob (80,000), so they are returned.
• Bob, Charlie, David, Frank, Grace, and Heidi earn less than their respective managers and are filtered out.

Key Takeaways
• Self-join is a straightforward approach to compare employees with their managers.
• Subqueries are helpful for breaking down the logic into smaller parts.
• Window functions such as LAG() compare rows within a partition, so always verify that the value you need (here, the manager's salary) is actually reachable from the current row's partition.
• LEFT JOIN allows handling cases where employees don’t have managers.
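As a quick illustration of the self-join (Approach 1), the sketch below uses Python's bundled sqlite3 with a trimmed subset of the data — just Alice, Bob, and Bob's two highly paid reports:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (id INT, name TEXT, salary INT, manager_id INT);
INSERT INTO Employees VALUES
  (1, 'Alice', 90000, NULL),   -- top level, excluded by the inner join
  (2, 'Bob',   80000, 1),      -- earns less than Alice
  (5, 'Eve',   95000, 2),      -- earns more than Bob
  (9, 'Sam',   89000, 2);      -- earns more than Bob
""")

rows = conn.execute("""
    SELECT e.name, e.salary, m.name, m.salary
    FROM Employees e
    JOIN Employees m ON e.manager_id = m.id
    WHERE e.salary > m.salary
    ORDER BY e.id
""").fetchall()

print(rows)  # only Eve and Sam out-earn their manager
```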
• Q.874
Find Employee's Hierarchy Level
Explanation
The task is to determine the level of each employee in the company hierarchy. The level
should start at 1 for employees without managers, and increment by 1 for each level below
the top-level manager. This can be done using a recursive Common Table Expression (CTE)
that will traverse the hierarchy.

Datasets and SQL Schemas


-- Table creation
DROP TABLE IF EXISTS Employees;
CREATE TABLE Employees (
id INT PRIMARY KEY,
name VARCHAR(50),
manager_id INT
);
-- Insert data into Employees table
INSERT INTO Employees (id, name, manager_id)
VALUES
(1, 'Alice', NULL),
(2, 'Bob', 1),
(3, 'Charlie', 2),
(4, 'David', 2),
(5, 'Eve', 3),
(6, 'Frank', 3),
(7, 'Grace', 4);

Solution
To find the hierarchy level, we can use a recursive CTE. This approach works by starting
with the employees who don't have managers (level 1), and then recursively finding each
subsequent level.

Query: Using Recursive CTE


WITH RECURSIVE EmployeeHierarchy AS (
-- Base case: employees with no manager (top level)
SELECT id, name, manager_id, 1 AS level
FROM Employees
WHERE manager_id IS NULL

UNION ALL

-- Recursive case: find employees managed by someone at a higher level


SELECT e.id, e.name, e.manager_id, eh.level + 1 AS level
FROM Employees e

JOIN EmployeeHierarchy eh ON e.manager_id = eh.id
)
SELECT id, name, level
FROM EmployeeHierarchy
ORDER BY level, id;

Explanation:
• Base Case: The query first selects employees with manager_id IS NULL (the top-level
employees, in this case, Alice), and assigns them a level of 1.
• Recursive Case: It then recursively joins the Employees table with the
EmployeeHierarchy CTE on manager_id = id, incrementing the level for each subsequent
level in the hierarchy.
• The UNION ALL combines the base case and recursive case, continuing the recursion until
all employees are processed.
• Finally, the query selects the id, name, and level for each employee, ordered by level
and id.

Expected Output
The query will return the following result:
id | name | level
----------------------
1 | Alice | 1
2 | Bob | 2
3 | Charlie | 3
4 | David | 3
5 | Eve | 4
6 | Frank | 4
7 | Grace | 4

Key Takeaways
• Recursive CTEs are useful for hierarchical data and allow for efficient querying of parent-
child relationships.
• The base case defines the starting point (employees with no manager), and the recursive
case calculates the level for employees reporting to those in the previous levels.
• The UNION ALL combines the base and recursive results to generate the hierarchy.
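The recursive CTE runs essentially unchanged on any engine that supports WITH RECURSIVE; here is a sketch verifying the levels with Python's bundled sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (id INT, name TEXT, manager_id INT);
INSERT INTO Employees VALUES
  (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Charlie', 2), (4, 'David', 2),
  (5, 'Eve', 3), (6, 'Frank', 3), (7, 'Grace', 4);
""")

levels = dict(conn.execute("""
WITH RECURSIVE EmployeeHierarchy AS (
    SELECT id, name, 1 AS level           -- base case: no manager
    FROM Employees WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, eh.level + 1     -- recursive case: one level down
    FROM Employees e
    JOIN EmployeeHierarchy eh ON e.manager_id = eh.id
)
SELECT name, level FROM EmployeeHierarchy
""").fetchall())

print(levels)
```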
• Q.875
Find Each Customer's Latest and Second Latest Order Amount
Explanation
This query is designed to retrieve the latest and the second latest order amount for each
customer. To achieve this:
• We first find the latest order for each customer using a subquery to get the most recent
order date.
• Then, for each latest order, we find the second latest order by comparing the order_date
of previous orders for the same customer.

Datasets and SQL Schemas


-- Table creation
DROP TABLE IF EXISTS orders;
CREATE TABLE orders (
order_id INT,
customer_id INT,


order_date DATE,
order_amount DECIMAL(10, 2)
);
-- Insert data into orders table
INSERT INTO orders (order_id, customer_id, order_date, order_amount) VALUES
(1, 101, '2024-01-10', 150.00),
(2, 101, '2024-02-15', 200.00),
(3, 101, '2024-03-20', 180.00),
(4, 102, '2024-01-12', 200.00),
(5, 102, '2024-02-25', 250.00),
(6, 102, '2024-03-10', 320.00),
(7, 103, '2024-01-25', 400.00),
(8, 103, '2024-02-15', 420.00);

Solution
The query can be optimized using a self-join or subqueries to find the latest and second
latest order amounts for each customer. Here’s the optimized query:

Query: Using Subqueries for the Latest and Second Latest Order


SELECT
    o1.customer_id,
    o1.order_amount AS latest_order_amount,
    (SELECT o3.order_amount
     FROM orders o3
     WHERE o3.customer_id = o1.customer_id
       AND o3.order_date < o1.order_date
     ORDER BY o3.order_date DESC
     LIMIT 1
    ) AS second_latest_order_amount
FROM orders o1
WHERE o1.order_date = (SELECT MAX(order_date)
                       FROM orders o2
                       WHERE o2.customer_id = o1.customer_id);

Explanation:
• Subquery for Latest Order: the outer WHERE keeps only each customer's most recent order by comparing its order_date with the maximum order_date for that customer_id.
• Subquery for Second Latest Order: in the SELECT list, a correlated subquery fetches the amount of the most recent order strictly earlier than the latest one (ORDER BY order_date DESC LIMIT 1). Beware of using MAX(order_amount) here: that returns the largest earlier amount, which matches the second-latest order only when amounts happen to increase over time.
• For each customer the query therefore returns the latest and second latest order amounts (the latter is NULL if the customer has only one order).
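One subtlety worth testing: the second-latest order should be picked by date, not by amount. The hedged sqlite3 sketch below uses data in which an older order carries the largest amount, so ordering the earlier orders by date (ORDER BY order_date DESC LIMIT 1) and taking MAX(order_amount) over them would disagree:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INT, customer_id INT, order_date TEXT, order_amount REAL);
INSERT INTO orders VALUES
  (1, 101, '2024-01-10', 300.00),  -- largest of the earlier amounts
  (2, 101, '2024-02-15', 200.00),  -- the actual second-latest order
  (3, 101, '2024-03-20', 180.00);  -- latest order
""")

row = conn.execute("""
SELECT o1.order_amount,
       (SELECT o3.order_amount
        FROM orders o3
        WHERE o3.customer_id = o1.customer_id
          AND o3.order_date < o1.order_date
        ORDER BY o3.order_date DESC
        LIMIT 1) AS second_latest
FROM orders o1
WHERE o1.order_date = (SELECT MAX(order_date) FROM orders o2
                       WHERE o2.customer_id = o1.customer_id)
""").fetchone()

print(row)  # (180.0, 200.0): second-latest by date, not the 300.0 maximum
```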

• Q.876
Find Average Processing Time by Each Machine
Explanation
To calculate the average processing time for each machine, we need to:
• Pair the "start" and "end" activities for each process (same process_id for each machine).
• Calculate the difference between the "end" timestamp and the "start" timestamp to get the
processing time for each process.
• Group the results by machine_id and compute the average processing time for each
machine.


Datasets and SQL Schemas


-- Table creation
DROP TABLE IF EXISTS Activity;
CREATE TABLE Activity (
machine_id INT,
process_id INT,
activity_type VARCHAR(10),
timestamp FLOAT
);
-- Insert data into Activity table
INSERT INTO Activity (machine_id, process_id, activity_type, timestamp)
VALUES
(1, 1, 'start', 10.5),
(1, 1, 'end', 15.0),
(1, 2, 'start', 20.0),
(1, 2, 'end', 25.5),
(2, 1, 'start', 5.0),
(2, 1, 'end', 12.5),
(2, 2, 'start', 15.0),
(2, 2, 'end', 20.0);

Solution
To calculate the average processing time, we'll use the following approach:
• Join the start and end activities for each process_id and machine_id. We can use a
self-join on the Activity table by matching process_id and machine_id.
• Compute the processing time for each process by subtracting the start timestamp from
the end timestamp.
• Group by machine_id to calculate the average processing time per machine.

Query: Using Self-Join to Calculate Processing Time


SELECT
a1.machine_id,
AVG(a2.timestamp - a1.timestamp) AS avg_processing_time
FROM Activity a1
JOIN Activity a2 ON a1.machine_id = a2.machine_id
AND a1.process_id = a2.process_id
AND a1.activity_type = 'start'
AND a2.activity_type = 'end'
GROUP BY a1.machine_id;

Explanation:
• Self-Join: We join the Activity table with itself by matching the machine_id and
process_id. We also ensure that the first part of the process (start activity) is joined with
the second part (end activity).
• Processing Time Calculation: The processing time for each process is calculated as
a2.timestamp - a1.timestamp where a1 represents the "start" activity and a2 represents
the "end" activity.
• AVG Function: We use AVG() to calculate the average processing time for each machine.
• Group By: The results are grouped by machine_id to compute the average processing
time for each machine.
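The self-join can be checked end to end with Python's bundled sqlite3; this sketch reuses the Activity rows from above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Activity (machine_id INT, process_id INT, activity_type TEXT, timestamp REAL);
INSERT INTO Activity VALUES
  (1, 1, 'start', 10.5), (1, 1, 'end', 15.0),
  (1, 2, 'start', 20.0), (1, 2, 'end', 25.5),
  (2, 1, 'start',  5.0), (2, 1, 'end', 12.5),
  (2, 2, 'start', 15.0), (2, 2, 'end', 20.0);
""")

rows = conn.execute("""
    SELECT a1.machine_id, AVG(a2.timestamp - a1.timestamp) AS avg_processing_time
    FROM Activity a1
    JOIN Activity a2 ON a1.machine_id = a2.machine_id
                    AND a1.process_id = a2.process_id
                    AND a1.activity_type = 'start'
                    AND a2.activity_type = 'end'
    GROUP BY a1.machine_id
""").fetchall()

print(dict(rows))  # machine 1: (4.5 + 5.5) / 2 = 5.0; machine 2: (7.5 + 5.0) / 2 = 6.25
```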
• Q.877
Retrieve All Ids of a Person Whose Rating is Greater Than Their Friend's Rating
Explanation


The task is to retrieve the IDs of people based on the following conditions:
• If the person has friends, check if their rating is greater than that of their friend's rating.
• If the person doesn't have any friends (i.e., friend_id is NULL), retrieve the ID only if
their rating is greater than 85.
The solution involves:
• Using a LEFT JOIN to link each person with their rating and their friend's rating.
• Applying a WHERE clause to check the conditions for both cases: when the person has
friends and when they don't.
• DISTINCT is used to avoid duplicate entries in case a person has multiple friends who
meet the condition.

Datasets and SQL Schemas


-- Table creation for Friends
CREATE TABLE Friends (
id INT,
friend_id INT
);
-- Table creation for Ratings
DROP TABLE IF EXISTS Ratings;
CREATE TABLE Ratings (
id INT PRIMARY KEY,
rating INT
);
-- Insert data into Friends table
INSERT INTO Friends (id, friend_id)
VALUES
(1, 2),
(1, 3),
(2, 3),
(3, 4),
(4, 1),
(4, 2),
(5,NULL),
(6,NULL);
-- Insert data into Ratings table
INSERT INTO Ratings (id, rating)
VALUES
(1, 85),
(2, 90),
(3, 75),
(4, 88),
(5, 82),
(6, 91);

Solution
To achieve the required result, we:
• Perform a LEFT JOIN between the Friends table and the Ratings table for both the
person (f.id) and their friend (f.friend_id).
• In the WHERE clause, apply the following conditions:
• If the person has a friend (f.friend_id IS NOT NULL), check if their rating is greater
than the friend's rating.
• If the person doesn't have a friend (f.friend_id IS NULL), check if their rating is greater
than 85.


• DISTINCT ensures that the result does not contain duplicate IDs.

Query:
SELECT DISTINCT(f.id)
FROM Friends as f
LEFT JOIN Ratings as r ON r.id = f.id
LEFT JOIN Ratings as r2 ON f.friend_id = r2.id
WHERE
(f.friend_id IS NOT NULL AND r.rating > r2.rating) -- Person's rating is greater
than friend's rating
OR
(f.friend_id IS NULL AND r.rating > 85); -- Person doesn't have a
friend, but rating is greater than 85
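A hedged sqlite3 sketch of the double LEFT JOIN, using the same Friends/Ratings data, makes the two branches of the WHERE clause easy to trace:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Friends (id INT, friend_id INT);
CREATE TABLE Ratings (id INT PRIMARY KEY, rating INT);
INSERT INTO Friends VALUES (1,2),(1,3),(2,3),(3,4),(4,1),(4,2),(5,NULL),(6,NULL);
INSERT INTO Ratings VALUES (1,85),(2,90),(3,75),(4,88),(5,82),(6,91);
""")

ids = sorted(r[0] for r in conn.execute("""
    SELECT DISTINCT f.id
    FROM Friends f
    LEFT JOIN Ratings r  ON r.id  = f.id         -- the person's own rating
    LEFT JOIN Ratings r2 ON r2.id = f.friend_id  -- the friend's rating
    WHERE (f.friend_id IS NOT NULL AND r.rating > r2.rating)
       OR (f.friend_id IS NULL AND r.rating > 85)
""").fetchall())

print(ids)  # 1 beats friend 3, 2 beats friend 3, 4 beats friend 1, 6 has no friend but > 85
```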
• Q.878
Find the ID Where the Seat is Empty and Both the Seats Before and After It are Also
Empty
Explanation
To find the rows where the seat is empty (represented by 1), and both the previous seat
(prev_seat_id) and the next seat (next_seat_id) are also empty:
• We will use window functions LAG() and LEAD() to get the values of the previous and
next seats based on the id order.
• After retrieving the previous and next seats, we filter the result to include only the rows
where the current seat is empty (seat_id = 1), and both the previous and next seats are also
empty (prev_seat_id = 1 and next_seat_id = 1).

Datasets and SQL Schemas


-- Table creation
DROP TABLE IF EXISTS cinemas;
CREATE TABLE cinemas
(id SERIAL, seat_id INT);
-- Insert data into cinemas table
INSERT INTO cinemas(seat_id)
VALUES
(1),
(0),
(0),
(1),
(1),
(1),
(0);

Solution
To achieve the desired result, we use:
• LAG(seat_id): Retrieves the seat status of the row before the current one (previous seat).
• LEAD(seat_id): Retrieves the seat status of the row after the current one (next seat).
• WHERE clause to check that the current seat is empty (seat_id = 1), and both the previous
and next seats are also empty (prev_seat_id = 1 and next_seat_id = 1).

Query:
SELECT
id,
seat_id
FROM


(
SELECT
*,
LAG(seat_id) OVER(ORDER BY id) AS prev_seat_id,
LEAD(seat_id) OVER(ORDER BY id) AS next_seat_id
FROM cinemas
) AS t1
WHERE
seat_id = 1
AND prev_seat_id = 1
AND next_seat_id = 1;

Explanation:
• LAG() and LEAD(): These window functions are used to retrieve the values of the
previous and next row's seat_id. They are ordered by the id field to ensure we look at
consecutive rows.
• WHERE clause: We filter the result to only include rows where:
• The current seat (seat_id) is empty (1).
• The previous (prev_seat_id) and next (next_seat_id) seats are also empty (1).

Result:
• Only seat id = 5 qualifies: it is empty (seat_id = 1), and both its neighbors (id = 4 and id = 6) are also empty. Seat id = 4 does not qualify because the seat before it (id = 3) is occupied (seat_id = 0).

Key Takeaways
• Window Functions: LAG() and LEAD() are powerful for comparing a row with its
neighbors (previous or next row).
• Filtering with Window Functions: You can filter based on the results of window
functions in a subquery to get complex conditions involving neighboring rows.
• Order of Rows: In this case, we use ORDER BY id to ensure we are looking at consecutive
rows based on their id values.
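Running the LAG()/LEAD() query against the sample data confirms the result; a minimal sketch with Python's bundled sqlite3 (window functions need SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cinemas (id INTEGER PRIMARY KEY, seat_id INT);
INSERT INTO cinemas (seat_id) VALUES (1),(0),(0),(1),(1),(1),(0);
""")

rows = conn.execute("""
    SELECT id FROM (
        SELECT id, seat_id,
               LAG(seat_id)  OVER (ORDER BY id) AS prev_seat_id,
               LEAD(seat_id) OVER (ORDER BY id) AS next_seat_id
        FROM cinemas
    )
    WHERE seat_id = 1 AND prev_seat_id = 1 AND next_seat_id = 1
""").fetchall()

print(rows)  # only id 5 has empty seats on both sides
```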
• Q.879
Find Users Who Have Logged in on at Least 3 Consecutive Days
Explanation
The task is to find users who have logged in on at least 3 consecutive days. This can be
achieved by:
• Using Window Functions: LAG() compares the current login date with the previous one to flag consecutive-day logins.
• Detecting the Run: two consecutive "continuation" flags in a row mean the user logged in on three consecutive days.
• Filtering: each qualifying user is returned once.

Datasets and SQL Schemas


-- Table creation for user_activity
DROP TABLE IF EXISTS user_activity;
CREATE TABLE user_activity (
user_id INT,
login_date DATE
);
-- Insert data into user_activity table
INSERT INTO user_activity (user_id, login_date) VALUES
(1, '2024-08-01'),
(1, '2024-08-02'),
(1, '2024-08-05'),
(1, '2024-08-07'),
(2, '2024-08-01'),
(2, '2024-08-02'),
(2, '2024-08-03'),
(2, '2024-08-04'),
(2, '2024-08-06'),
(3, '2024-08-01'),
(3, '2024-08-02'),
(3, '2024-08-03'),
(3, '2024-08-07'),
(4, '2024-08-02'),
(4, '2024-08-05'),
(4, '2024-08-07');

Solution
We use two Common Table Expressions (CTEs) to detect consecutive login days:
• First CTE (streak_table): for each login, LAG() compares the current login date with the previous one. If the dates are exactly one day apart, the row is flagged with 1; otherwise with 0.
• Second CTE (streak2): a second LAG() carries each row's previous flag alongside the current one.
• Final Query: a row whose flag and previous flag are both 1 marks the third day of a run, so those users are returned. (A plain running SUM() over the flags would not work here: it never resets after a gap, so two separate 2-day streaks would be miscounted as one 3-day streak.)

Query:
WITH streak_table AS
(
    SELECT
        user_id,
        login_date,
        CASE
            WHEN login_date = LAG(login_date) OVER(PARTITION BY user_id ORDER BY login_date) + INTERVAL '1 day' THEN 1
            ELSE 0
        END AS streak
    FROM user_activity
),
streak2 AS
(
    SELECT
        user_id,
        streak,
        LAG(streak) OVER(PARTITION BY user_id ORDER BY login_date) AS prev_streak
    FROM streak_table
)
SELECT DISTINCT user_id
FROM streak2
WHERE streak = 1
  AND prev_streak = 1;

Explanation:
• LAG(login_date): looks at the previous row's login_date for each user to check whether the current login_date is exactly 1 day after it.
• CASE WHEN: if the dates are consecutive, the streak flag is 1; otherwise 0.
• LAG(streak): pairs each flag with the previous one; streak = 1 AND prev_streak = 1 means the user has just logged in on the third consecutive day.
• DISTINCT user_id: ensures each qualifying user appears only once in the result set.
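The streak logic deserves a regression test: a user with two separate 2-day streaks must not be reported. The sketch below (Python's bundled sqlite3; julianday() stands in for PostgreSQL's date arithmetic) checks exactly that case:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_activity (user_id INT, login_date TEXT)")
conn.executemany("INSERT INTO user_activity VALUES (?, ?)", [
    (2, '2024-08-01'), (2, '2024-08-02'), (2, '2024-08-03'),  # one 3-day streak
    (4, '2024-08-01'), (4, '2024-08-02'),                     # 2-day streak...
    (4, '2024-08-04'), (4, '2024-08-05'),                     # ...then another
])

users = sorted(r[0] for r in conn.execute("""
WITH streak_table AS (
    SELECT user_id, login_date,
           CASE WHEN julianday(login_date) - julianday(
                    LAG(login_date) OVER (PARTITION BY user_id ORDER BY login_date)) = 1
                THEN 1 ELSE 0 END AS streak
    FROM user_activity
)
SELECT DISTINCT user_id FROM (
    SELECT user_id, streak,
           LAG(streak) OVER (PARTITION BY user_id ORDER BY login_date) AS prev_streak
    FROM streak_table
)
WHERE streak = 1 AND prev_streak = 1
""").fetchall())

print(users)  # user 4's two separate 2-day streaks are correctly rejected
```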

• Q.880
SQL Query to Find 3 Consecutive Available Seats in a Cinema Hall
Explanation
We need to find 3 consecutive available seats for three friends to sit together in a cinema hall.
We assume that:
• status = 0 indicates the seat is available.
• status = 1 indicates the seat is booked.
• We need to check for a sequence of three consecutive seats that are all available (i.e.,
status = 0 for three consecutive rows).
Solution Steps:
• Identify Consecutive Seats: We will use window functions like LEAD() or LAG() to
compare each seat with its next and previous seat.
• Filter Available Seats: We only want rows where the current seat and its two adjacent
seats are all available.
• Return Seat Numbers: We'll return the first and last seat numbers from the sequence of 3
available seats.

Datasets
-- Step 1: Create the cinema hall table
CREATE TABLE cinema_hall (
id INT PRIMARY KEY,
status INT CHECK (status IN (0, 1)) -- 0: Available, 1: Booked
);

-- Step 2: Insert data into the table


INSERT INTO cinema_hall (id, status) VALUES
(1, 0),
(2, 1),
(3, 1),
(4, 0),
(5, 0),
(6, 0),
(7, 1);

SQL Query
WITH ConsecutiveSeats AS (
    SELECT
        id AS seat_id,
        status,
        LEAD(status, 1) OVER(ORDER BY id) AS next_seat_status,
        LEAD(status, 2) OVER(ORDER BY id) AS next_to_next_seat_status
    FROM cinema_hall
)
SELECT
    seat_id AS first_seat,
    (seat_id + 2) AS last_seat
FROM ConsecutiveSeats
WHERE status = 0
  AND next_seat_status = 0
  AND next_to_next_seat_status = 0;


Explanation:
• Window Functions (LEAD()):
• LEAD(status, 1) gives the status of the next seat.
• LEAD(status, 2) gives the status of the seat after the next one.
• The CTE must also select status itself; otherwise the outer WHERE status = 0 would reference a column that does not exist in ConsecutiveSeats.
• Filtering the Seats:
• We check that the current seat and the next two seats all have status = 0 (available).
• Output:
• If the condition is met, the first and last seat numbers of the block of 3 consecutive available seats are returned.

Expected Output (Based on the Sample Data)


Given Cinema Hall Data:

id | status
1  | 0
2  | 1
3  | 1
4  | 0
5  | 0
6  | 0
7  | 1

Query Result:

first_seat | last_seat
4          | 6

Key Points to Note:


• Window Functions (LEAD() and LAG()): These functions allow us to look ahead (or
behind) in the dataset, which is essential for checking consecutive rows.
• Efficient Filtering: By filtering on the adjacent seats' statuses, we efficiently identify
blocks of 3 consecutive available seats.
• Sequence Matching: The LEAD() function helps us match adjacent rows and find the valid
blocks of available seats.
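As a sanity check (and to show that the CTE must carry the status column through for the outer filter), here is the query run with Python's bundled sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cinema_hall (id INT PRIMARY KEY, status INT);
INSERT INTO cinema_hall VALUES (1,0),(2,1),(3,1),(4,0),(5,0),(6,0),(7,1);
""")

rows = conn.execute("""
WITH ConsecutiveSeats AS (
    SELECT id AS seat_id,
           status,  -- must be selected here so the outer WHERE can use it
           LEAD(status, 1) OVER (ORDER BY id) AS next_seat_status,
           LEAD(status, 2) OVER (ORDER BY id) AS next_to_next_seat_status
    FROM cinema_hall
)
SELECT seat_id AS first_seat, seat_id + 2 AS last_seat
FROM ConsecutiveSeats
WHERE status = 0
  AND next_seat_status = 0
  AND next_to_next_seat_status = 0
""").fetchall()

print(rows)  # seats 4-6 are the only block of three free seats
```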
• Q.881


SQL Query to Order By Each State


Explanation:
To order the orders by each state, we need to display the orders sorted by the states column.
Additionally, we will also join the customers table to display the customer details for each
order.

Datasets
-- orders & customers

-- Create the ORDER table


CREATE TABLE orders (
order_id INT PRIMARY KEY, -- Order ID
customer_id INT, -- Customer ID
order_date DATE, -- Order Date
total_amount DECIMAL(10, 2), -- Order Total Amount
states VARCHAR(50), -- Region
category VARCHAR(50) -- Category
);

INSERT INTO orders (order_id, customer_id, order_date, total_amount, states, category) VALUES
(1, 1, '2024-11-15', 500.00, 'Maharashtra', 'Electronics'),
(2, 2, '2024-10-10', 1200.00, 'Karnataka', 'Furniture'),
(3, 3, '2024-09-25', 300.00, 'Tamil Nadu', 'Books'),
(4, 4, '2024-06-05', 1500.00, 'Delhi', 'Clothing'),
(5, 1, '2023-12-12', 700.00, 'Kerala', 'Electronics'),
(6, 2, '2024-11-20', 800.00, 'West Bengal', 'Home Appliances'),
(7, 3, '2023-05-10', 600.00, 'Rajasthan', 'Furniture'),
(8, 5, '2024-07-15', 450.00, 'Gujarat', 'Books'),
(9, 6, '2024-01-25', 1000.00, 'Punjab', 'Electronics'),
(10, 7, '2024-03-10', 550.00, 'Uttar Pradesh', 'Clothing');

-- customer table
CREATE TABLE customers (
customer_id INT PRIMARY KEY, -- Customer ID
customer_name VARCHAR(100), -- Customer Name
email VARCHAR(100), -- Email Address
phone VARCHAR(15) -- Phone Number
);

-- Insert sample customers


INSERT INTO customers (customer_id, customer_name, email, phone) VALUES
(1, 'Alice Johnson', '[email protected]', '1234567890'),
(2, 'Bob Smith', '[email protected]', '2345678901'),
(3, 'Charlie Davis', '[email protected]', '3456789012'),
(4, 'Diana Prince', '[email protected]', '4567890123'),
(5, 'Eve Adams', '[email protected]', '5678901234'),
(6, 'Frank Taylor', '[email protected]', '6789012345'),
(7, 'Grace Lee', '[email protected]', '7890133456'),
(8, 'Sam', '[email protected]', '9890123456'),
(9, 'Alex', '[email protected]', '8890123456');

SELECT * FROM customers;


SELECT * FROM orders;

SQL Query:
SELECT
o.order_id,
o.customer_id,
c.customer_name,
o.order_date,
o.total_amount,
    o.states,
    o.category
FROM orders o
JOIN customers c
ON o.customer_id = c.customer_id
ORDER BY o.states, o.order_date;

Explanation:
• JOIN: We are joining the orders table with the customers table on the customer_id to
retrieve the customer details along with their orders.
• ORDER BY: We are ordering the result set first by the states column and then by the
order_date within each state, ensuring that orders are grouped and ordered by both state and
date.
• Selected Columns: The query selects the order ID, customer ID, customer name, order
date, total amount, state, and category for each order.

Output Example:
Assuming the sample data above, the query produces (sorted alphabetically by state, then by order date within each state):

order_id | customer_id | customer_name | order_date | total_amount | states        | category
4        | 4           | Diana Prince  | 2024-06-05 | 1500.00      | Delhi         | Clothing
8        | 5           | Eve Adams     | 2024-07-15 | 450.00       | Gujarat       | Books
2        | 2           | Bob Smith     | 2024-10-10 | 1200.00      | Karnataka     | Furniture
5        | 1           | Alice Johnson | 2023-12-12 | 700.00       | Kerala        | Electronics
1        | 1           | Alice Johnson | 2024-11-15 | 500.00       | Maharashtra   | Electronics
9        | 6           | Frank Taylor  | 2024-01-25 | 1000.00      | Punjab        | Electronics
7        | 3           | Charlie Davis | 2023-05-10 | 600.00       | Rajasthan     | Furniture
3        | 3           | Charlie Davis | 2024-09-25 | 300.00       | Tamil Nadu    | Books
10       | 7           | Grace Lee     | 2024-03-10 | 550.00       | Uttar Pradesh | Clothing
6        | 2           | Bob Smith     | 2024-11-20 | 800.00       | West Bengal   | Home Appliances

Key Takeaways:
• Ordering by Multiple Columns: The ORDER BY clause can be used to order by multiple
columns. In this case, first by states, then by order_date.
• Join Operations: The JOIN between orders and customers allows us to display
customer-specific details for each order.
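A compact sqlite3 sketch over a few of the sample rows confirms the alphabetical state ordering of the join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INT, customer_id INT, order_date TEXT, states TEXT);
CREATE TABLE customers (customer_id INT, customer_name TEXT);
INSERT INTO orders VALUES
  (1, 1, '2024-11-15', 'Maharashtra'), (2, 2, '2024-10-10', 'Karnataka'),
  (4, 4, '2024-06-05', 'Delhi'),       (5, 1, '2023-12-12', 'Kerala');
INSERT INTO customers VALUES (1, 'Alice Johnson'), (2, 'Bob Smith'), (4, 'Diana Prince');
""")

rows = conn.execute("""
    SELECT o.states, c.customer_name
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    ORDER BY o.states, o.order_date
""").fetchall()

print([r[0] for r in rows])  # states come back in alphabetical order
```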
• Q.882

Question
Write an SQL query to find the highest salary from the employees table where the salary
value occurs only once.

Explanation
To find the highest salary that appears only once in the table, we need to:
• Count the occurrences of each salary using GROUP BY.
• Filter out salaries that appear more than once using the HAVING clause.
• Select the maximum salary from the filtered results.

Datasets and SQL Schemas

Table Creation
CREATE TABLE employees (
id INT,
first_name VARCHAR(50),
last_name VARCHAR(50),
age INT,
sex VARCHAR(1),
employee_title VARCHAR(50),
department VARCHAR(50),
salary INT,
target INT,
bonus INT,
email VARCHAR(100),
city VARCHAR(50),
address VARCHAR(100),
manager_id INT
);

Sample Data Insertion


INSERT INTO employees
(id, first_name, last_name, age, sex, employee_title, department, salary, target, bonus, email, city, address, manager_id)
VALUES
(5, 'Max', 'George', 26, 'M', 'Sales', 'Sales', 1300, 200, 150, '[email protected]', 'California', '2638 Richards Avenue', 1),
(13, 'Katty', 'Bond', 56, 'F', 'Manager', 'Management', 150000, 0, 300, '[email protected]', 'Arizona', NULL, 1),
(11, 'Richerd', 'Gear', 57, 'M', 'Manager', 'Management', 250000, 0, 300, 'Richerd@company.com', 'Alabama', NULL, 1),
(10, 'Jennifer', 'Dion', 34, 'F', 'Sales', 'Sales', 1000, 200, 150, '[email protected]', 'Alabama', NULL, 13),
(19, 'George', 'Joe', 50, 'M', 'Manager', 'Management', 250000, 0, 300, 'George@company.com', 'Florida', '1003 Wyatt Street', 1),
(18, 'Laila', 'Mark', 26, 'F', 'Sales', 'Sales', 1000, 200, 150, '[email protected]', 'Florida', '3655 Spirit Drive', 11),
(20, 'Sarrah', 'Bicky', 31, 'F', 'Senior Sales', 'Sales', 2000, 200, 150, 'Sarrah@company.com', 'Florida', '1176 Tyler Avenue', 19);

Learnings
• GROUP BY and HAVING: Grouping data by salary and filtering out salaries that appear
more than once using HAVING COUNT(salary) = 1.
• MAX(): Using the MAX() function to find the highest salary from the filtered results.

Solutions

PostgreSQL Solution
SELECT MAX(salary) AS highest_unique_salary
FROM (
    SELECT salary
    FROM employees
    GROUP BY salary
    HAVING COUNT(*) = 1
) AS unique_salaries;

MySQL Solution
SELECT MAX(salary) AS highest_unique_salary
FROM (
    SELECT salary
    FROM employees
    GROUP BY salary
    HAVING COUNT(*) = 1
) AS unique_salaries;

Note: applying MAX(salary) directly with GROUP BY salary would return one row per
unique-occurrence salary, not a single value; the derived table collapses the filtered
salaries first so MAX() returns only the highest one.
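As a quick sanity check, the logic can be run end-to-end with Python's built-in sqlite3 module. This is a minimal sketch (in-memory database, trimmed to the two columns the query needs); the grouped salaries are wrapped in a derived table so MAX() sees exactly one row per unique salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INT, salary INT)")
# Salaries from the sample data: 250000 and 1000 each occur twice.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(5, 1300), (13, 150000), (11, 250000), (10, 1000),
     (19, 250000), (18, 1000), (20, 2000)],
)

# Highest salary among those that occur exactly once.
row = conn.execute("""
    SELECT MAX(salary) AS highest_unique_salary
    FROM (
        SELECT salary
        FROM employees
        GROUP BY salary
        HAVING COUNT(*) = 1
    ) AS unique_salaries
""").fetchone()
print(row[0])  # 150000 -- 250000 is skipped because it appears twice
```

With this data the duplicated salaries (250000 and 1000) are filtered out, leaving 1300, 150000, and 2000, of which 150000 is the highest.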
• Q.883

Question
Write an SQL query to find employees who have a salary higher than their managers.

Explanation
To find employees with a salary greater than their managers, we need to:
• Join the employees table with itself (self-join) where one instance represents the employee
and the other represents the manager.
• Compare the salary of the employee with that of their manager.
• Select the employees whose salary is greater than the manager’s salary.

Datasets and SQL Schemas

Table Creation
CREATE TABLE employees (
id INT,
first_name VARCHAR(50),
last_name VARCHAR(50),
age INT,
sex VARCHAR(1),
employee_title VARCHAR(50),
department VARCHAR(50),
salary INT,
target INT,
bonus INT,
email VARCHAR(100),
city VARCHAR(50),
address VARCHAR(100),
manager_id INT
);

Sample Data Insertion


INSERT INTO employees
(id, first_name, last_name, age, sex, employee_title, department, salary, target, bonus, email, city, address, manager_id)
VALUES
(5, 'Max', 'George', 26, 'M', 'Sales', 'Sales', 1300, 200, 150, '[email protected]', 'California', '2638 Richards Avenue', 1),
(13, 'Katty', 'Bond', 56, 'F', 'Manager', 'Management', 150000, 0, 300, '[email protected]', 'Arizona', NULL, 1),
(11, 'Richerd', 'Gear', 57, 'M', 'Manager', 'Management', 250000, 0, 300, 'Richerd@company.com', 'Alabama', NULL, 1),
(10, 'Jennifer', 'Dion', 34, 'F', 'Sales', 'Sales', 1000, 200, 150, '[email protected]', 'Alabama', NULL, 13),
(19, 'George', 'Joe', 50, 'M', 'Manager', 'Management', 250000, 0, 300, 'George@company.com', 'Florida', '1003 Wyatt Street', 1),
(18, 'Laila', 'Mark', 26, 'F', 'Sales', 'Sales', 1000, 200, 150, '[email protected]', 'Florida', '3655 Spirit Drive', 11),
(20, 'Sarrah', 'Bicky', 31, 'F', 'Senior Sales', 'Sales', 2000, 200, 150, 'Sarrah@company.com', 'Florida', '1176 Tyler Avenue', 19);

Learnings
• Self-Join: Joining the employees table with itself to compare employees and their
managers.
• Condition: Comparing the salary of the employee and their manager_id to find the
employees who earn more than their managers.

Solutions

PostgreSQL Solution
SELECT e.first_name AS employee_first_name, e.last_name AS employee_last_name,
       e.salary AS employee_salary,
       m.first_name AS manager_first_name, m.last_name AS manager_last_name,
       m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

MySQL Solution
SELECT e.first_name AS employee_first_name, e.last_name AS employee_last_name,
       e.salary AS employee_salary,
       m.first_name AS manager_first_name, m.last_name AS manager_last_name,
       m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;
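The self-join pattern is easy to verify on a toy dataset. The sketch below uses Python's sqlite3 with three invented rows (names and salaries are hypothetical, not the sample data above) so that exactly one employee out-earns her manager:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INT, first_name TEXT, salary INT, manager_id INT)"
)
# Hypothetical rows: Bea reports to Abe but earns more than he does.
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, "Abe", 5000, None),   # top-level manager, no manager_id
    (2, "Bea", 7000, 1),      # earns more than Abe
    (3, "Cal", 4000, 1),
])

rows = conn.execute("""
    SELECT e.first_name AS employee, m.first_name AS manager
    FROM employees e
    JOIN employees m ON e.manager_id = m.id   -- self-join: e = employee, m = manager
    WHERE e.salary > m.salary
""").fetchall()
print(rows)  # [('Bea', 'Abe')]
```

Note that rows whose manager_id matches no existing employee (or is NULL) simply drop out of the inner join, which is why the top-level manager never appears on the employee side.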
• Q.884

Question
Write an SQL query to categorize customers into new or returning based on the number of
returns they have done. If the number of returns is greater than or equal to 1, they are
categorized as new; otherwise, they are categorized as returning. Use the sales and
returns tables.

Explanation
We need to count the number of returns each customer has made. If a customer has made one
or more returns, they are categorized as new; otherwise, they are returning. This can be
achieved by joining the sales and returns tables, counting the number of returns for each
customer, and then using a CASE statement to categorize them accordingly.

Datasets and SQL Schemas

Table Creation
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
customer_id INT,
sale_date DATE,
sale_amount DECIMAL(10, 2)
);

CREATE TABLE returns (


return_id INT PRIMARY KEY,
customer_id INT,
return_date DATE,
return_amount DECIMAL(10, 2)
);

Sample Data Insertion


-- Inserting data into sales table
INSERT INTO sales (sale_id, customer_id, sale_date, sale_amount)
VALUES
(1, 101, '2025-01-10', 150.00),
(2, 102, '2025-01-12', 200.00),
(3, 103, '2025-01-15', 250.00),
(4, 104, '2025-01-18', 300.00),
(5, 101, '2025-02-20', 100.00),
(6, 105, '2025-02-22', 400.00),
(7, 106, '2025-03-01', 500.00),
(8, 107, '2025-03-05', 120.00),
(9, 108, '2025-03-10', 200.00),
(10, 101, '2025-03-12', 300.00);

-- Inserting data into returns table


INSERT INTO returns (return_id, customer_id, return_date, return_amount)
VALUES
(1, 101, '2025-01-15', 50.00),
(2, 102, '2025-01-20', 100.00),
(3, 103, '2025-02-01', 100.00),
(4, 101, '2025-02-18', 30.00),
(5, 104, '2025-03-02', 50.00),
(6, 105, '2025-03-10', 150.00);

Learnings
• COUNT() with GROUP BY: Counting the number of returns for each customer.
• JOINs: Joining sales and returns tables using customer_id.
• CASE: Using a CASE statement to categorize customers based on their return count.
• Aggregating: Aggregating returns data to calculate the number of returns per customer.

Solutions

PostgreSQL Solution
SELECT s.customer_id,
CASE
WHEN COUNT(r.return_id) >= 1 THEN 'New'
ELSE 'Returning'
END AS customer_category
FROM sales s
LEFT JOIN returns r ON s.customer_id = r.customer_id
GROUP BY s.customer_id;

MySQL Solution
SELECT s.customer_id,
CASE
WHEN COUNT(r.return_id) >= 1 THEN 'New'
ELSE 'Returning'
END AS customer_category
FROM sales s
LEFT JOIN returns r ON s.customer_id = r.customer_id
GROUP BY s.customer_id;
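The LEFT JOIN is what keeps customers with zero returns in the result: for them r.return_id comes back NULL, so COUNT(r.return_id) is 0 and the CASE falls through to the ELSE branch. A minimal sqlite3 sketch with a two-customer subset of the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales   (sale_id INT, customer_id INT);
    CREATE TABLE returns (return_id INT, customer_id INT);
    INSERT INTO sales VALUES (1, 101), (7, 106);   -- subset of the sample data
    INSERT INTO returns VALUES (1, 101);           -- only customer 101 made a return
""")

rows = conn.execute("""
    SELECT s.customer_id,
           CASE WHEN COUNT(r.return_id) >= 1 THEN 'New'
                ELSE 'Returning'
           END AS customer_category
    FROM sales s
    LEFT JOIN returns r ON s.customer_id = r.customer_id
    GROUP BY s.customer_id
""").fetchall()
print(dict(rows))
```

Customer 101 (one matching return row) is labeled 'New' and customer 106 (no match, NULL return_id) is labeled 'Returning', following the categorization the question specifies.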
• Q.885

Question
Write an SQL query to find the driver who cancelled the highest number of rides.

Explanation
We need to count the number of cancelled rides for each driver, and then identify the driver
with the highest count. This can be done by grouping the rides by driver and filtering for
cancelled rides, then sorting by the number of cancellations in descending order and limiting
the result to one.

Datasets and SQL Schemas

Table Creation
CREATE TABLE rides (
ride_id INT PRIMARY KEY,
driver_id INT,
ride_status VARCHAR(20) -- 'completed', 'cancelled', etc.
);

CREATE TABLE drivers (


driver_id INT PRIMARY KEY,
driver_name VARCHAR(100)
);

Sample Data Insertion


-- Inserting data into drivers table
INSERT INTO drivers (driver_id, driver_name)
VALUES
(1, 'John'),
(2, 'Emma'),
(3, 'David'),
(4, 'Sophia');

-- Inserting data into rides table


INSERT INTO rides (ride_id, driver_id, ride_status)
VALUES
(1, 1, 'completed'),
(2, 1, 'cancelled'),
(3, 2, 'completed'),
(4, 3, 'cancelled'),
(5, 3, 'completed'),
(6, 1, 'cancelled'),
(7, 4, 'cancelled'),
(8, 2, 'completed'),
(9, 2, 'cancelled'),
(10, 3, 'cancelled');

Learnings
• Counting with GROUP BY: Using COUNT() to calculate the number of cancelled rides
per driver.
• Filtering: Filtering only the cancelled rides using WHERE ride_status = 'cancelled'.
• Sorting and Limiting: Sorting by cancellation count in descending order and using LIMIT
1 to get the driver with the highest number of cancellations.

Solutions

PostgreSQL Solution
SELECT d.driver_name, COUNT(r.ride_id) AS cancelled_rides
FROM rides r
JOIN drivers d ON r.driver_id = d.driver_id
WHERE r.ride_status = 'cancelled'
GROUP BY d.driver_name
ORDER BY cancelled_rides DESC
LIMIT 1;

MySQL Solution
SELECT d.driver_name, COUNT(r.ride_id) AS cancelled_rides
FROM rides r
JOIN drivers d ON r.driver_id = d.driver_id
WHERE r.ride_status = 'cancelled'
GROUP BY d.driver_name
ORDER BY cancelled_rides DESC
LIMIT 1;
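One caveat worth knowing: with the sample data, John and David are tied at two cancellations each, and LIMIT 1 silently returns only one of them. The sqlite3 sketch below fetches the full ranking so the tie is visible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drivers (driver_id INT, driver_name TEXT);
    CREATE TABLE rides   (ride_id INT, driver_id INT, ride_status TEXT);
    INSERT INTO drivers VALUES (1,'John'),(2,'Emma'),(3,'David'),(4,'Sophia');
    INSERT INTO rides VALUES
        (1,1,'completed'),(2,1,'cancelled'),(3,2,'completed'),(4,3,'cancelled'),
        (5,3,'completed'),(6,1,'cancelled'),(7,4,'cancelled'),(8,2,'completed'),
        (9,2,'cancelled'),(10,3,'cancelled');
""")

# Same query as the solution, minus LIMIT 1, so every driver's count is shown.
ranking = conn.execute("""
    SELECT d.driver_name, COUNT(r.ride_id) AS cancelled_rides
    FROM rides r
    JOIN drivers d ON r.driver_id = d.driver_id
    WHERE r.ride_status = 'cancelled'
    GROUP BY d.driver_name
    ORDER BY cancelled_rides DESC
""").fetchall()
print(ranking)  # John and David are tied at 2 -- LIMIT 1 would keep only one of them
```

If an interviewer asks about ties, a RANK()-based filter (keep every driver whose rank is 1) returns all tied drivers instead of an arbitrary one.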
• Q.886

Question
Write an SQL query to find out which restaurant had the highest number of orders in each
quarter.

Explanation
We need to group the orders by the quarter of the year, then count the number of orders for
each restaurant in each quarter. For each quarter, we will find the restaurant with the highest
number of orders.

Datasets and SQL Schemas

Table Creation
CREATE TABLE orders (
order_id INT PRIMARY KEY,
restaurant_id INT,
order_date DATE
);

CREATE TABLE restaurants (


restaurant_id INT PRIMARY KEY,
restaurant_name VARCHAR(100)
);

Sample Data Insertion


-- Inserting data into restaurants table
INSERT INTO restaurants (restaurant_id, restaurant_name)
VALUES
(1, 'Restaurant A'),
(2, 'Restaurant B'),
(3, 'Restaurant C'),
(4, 'Restaurant D');

-- Inserting data into orders table

INSERT INTO orders (order_id, restaurant_id, order_date)
VALUES
(1, 1, '2025-01-10'),
(2, 1, '2025-02-15'),
(3, 2, '2025-03-20'),
(4, 3, '2025-04-05'),
(5, 4, '2025-04-25'),
(6, 1, '2025-05-10'),
(7, 3, '2025-06-10'),
(8, 1, '2025-07-15'),
(9, 2, '2025-08-30'),
(10, 4, '2025-09-05'),
(11, 1, '2025-10-15'),
(12, 2, '2025-11-01');

Learnings
• Date Extraction: Using functions like EXTRACT(QUARTER FROM date) (PostgreSQL) or
QUARTER(date) (MySQL) to extract the quarter from the order date.
• Aggregation: Using COUNT() to calculate the number of orders per restaurant.
• Subquery or Window Functions: Using a subquery or window function to identify the
restaurant with the highest number of orders for each quarter.
• GROUP BY and ORDER BY: Grouping by quarter and restaurant, then ordering to
identify the restaurant with the most orders.

Solutions

PostgreSQL Solution
WITH quarter_orders AS (
SELECT
EXTRACT(QUARTER FROM order_date) AS quarter,
restaurant_id,
COUNT(order_id) AS order_count
FROM orders
GROUP BY quarter, restaurant_id
)
SELECT
q.quarter,
r.restaurant_name,
q.order_count
FROM quarter_orders q
JOIN restaurants r ON q.restaurant_id = r.restaurant_id
WHERE (q.quarter, q.order_count) IN (
SELECT quarter, MAX(order_count)
FROM quarter_orders
GROUP BY quarter
)
ORDER BY q.quarter;

MySQL Solution
WITH quarter_orders AS (
SELECT
QUARTER(order_date) AS quarter,
restaurant_id,
COUNT(order_id) AS order_count
FROM orders
GROUP BY quarter, restaurant_id
)
SELECT
q.quarter,
r.restaurant_name,
q.order_count
FROM quarter_orders q
JOIN restaurants r ON q.restaurant_id = r.restaurant_id
WHERE (q.quarter, q.order_count) IN (
SELECT quarter, MAX(order_count)
FROM quarter_orders
GROUP BY quarter
)
ORDER BY q.quarter;
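The same plan can be exercised in SQLite, which has no QUARTER() function; the sketch below derives the quarter from the month with integer arithmetic ((month + 2) / 3), an adjustment specific to this demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE restaurants (restaurant_id INT, restaurant_name TEXT);
    CREATE TABLE orders (order_id INT, restaurant_id INT, order_date TEXT);
    INSERT INTO restaurants VALUES (1,'Restaurant A'),(2,'Restaurant B'),
                                   (3,'Restaurant C'),(4,'Restaurant D');
    INSERT INTO orders VALUES
        (1,1,'2025-01-10'),(2,1,'2025-02-15'),(3,2,'2025-03-20'),(4,3,'2025-04-05'),
        (5,4,'2025-04-25'),(6,1,'2025-05-10'),(7,3,'2025-06-10'),(8,1,'2025-07-15'),
        (9,2,'2025-08-30'),(10,4,'2025-09-05'),(11,1,'2025-10-15'),(12,2,'2025-11-01');
""")

# SQLite lacks QUARTER(); derive it from the month with integer division.
rows = conn.execute("""
    WITH quarter_orders AS (
        SELECT (CAST(strftime('%m', order_date) AS INTEGER) + 2) / 3 AS quarter,
               restaurant_id,
               COUNT(order_id) AS order_count
        FROM orders
        GROUP BY quarter, restaurant_id
    )
    SELECT q.quarter, r.restaurant_name, q.order_count
    FROM quarter_orders q
    JOIN restaurants r ON q.restaurant_id = r.restaurant_id
    WHERE (q.quarter, q.order_count) IN (
        SELECT quarter, MAX(order_count) FROM quarter_orders GROUP BY quarter
    )
    ORDER BY q.quarter
""").fetchall()
print(rows)
```

Restaurant A wins Q1 (2 orders) and Restaurant C wins Q2 (2 orders); in quarters where several restaurants tie for the most orders (Q3 and Q4 in this data), the row-value IN filter returns every tied restaurant, which is usually the desired behavior.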
• Q.887

Question
Write an SQL query to find the best day of the week based on the highest total sales.

Explanation
To solve this, we need to group the sales by day of the week and calculate the total sales for
each day. Afterward, we can use the ORDER BY clause to sort the days by total sales in
descending order, and use LIMIT 1 to retrieve the best day.

Datasets and SQL Schemas

Table Creation
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
sale_date DATE,
sale_amount DECIMAL(10, 2)
);

Sample Data Insertion


INSERT INTO sales (sale_id, sale_date, sale_amount)
VALUES
(1, '2025-01-01', 200.50),
(2, '2025-01-02', 150.75),
(3, '2025-01-03', 300.00),
(4, '2025-01-04', 250.00),
(5, '2025-01-05', 450.00),
(6, '2025-01-06', 500.00),
(7, '2025-01-07', 100.00),
(8, '2025-01-08', 350.00),
(9, '2025-01-09', 200.00),
(10, '2025-01-10', 150.00);

Learnings
• Grouping by Date: Using DAYNAME() (MySQL) or TO_CHAR(sale_date, 'Day')
(PostgreSQL) to extract the day of the week from the sale_date.
• Aggregation: Using SUM() to calculate the total sales for each day.
• Sorting: Sorting the results to identify the day with the highest total sales.
• Limiting Results: Using LIMIT 1 to return only the day with the highest sales.

Solutions

PostgreSQL Solution
SELECT TO_CHAR(sale_date, 'Day') AS day_of_week, SUM(sale_amount) AS total_sales
FROM sales
GROUP BY day_of_week
ORDER BY total_sales DESC
LIMIT 1;

MySQL Solution
SELECT DAYNAME(sale_date) AS day_of_week, SUM(sale_amount) AS total_sales
FROM sales
GROUP BY day_of_week
ORDER BY total_sales DESC
LIMIT 1;
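SQLite has neither DAYNAME() nor TO_CHAR(), so a sketch there groups on strftime('%w') (which yields '0' for Sunday through '6' for Saturday) and maps the digit back to a name in Python; the grouping-plus-LIMIT logic is otherwise identical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INT, sale_date TEXT, sale_amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, '2025-01-01', 200.50), (2, '2025-01-02', 150.75), (3, '2025-01-03', 300.00),
    (4, '2025-01-04', 250.00), (5, '2025-01-05', 450.00), (6, '2025-01-06', 500.00),
    (7, '2025-01-07', 100.00), (8, '2025-01-08', 350.00), (9, '2025-01-09', 200.00),
    (10, '2025-01-10', 150.00),
])

# strftime('%w') returns '0' (Sunday) .. '6' (Saturday).
best = conn.execute("""
    SELECT strftime('%w', sale_date) AS dow, SUM(sale_amount) AS total_sales
    FROM sales
    GROUP BY dow
    ORDER BY total_sales DESC
    LIMIT 1
""").fetchone()

day_names = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
             'Thursday', 'Friday', 'Saturday']
print(day_names[int(best[0])], best[1])  # Wednesday 550.5
```

January 1, 2025 falls on a Wednesday, so the two Wednesdays in the sample (200.50 + 350.00 = 550.50) beat Monday's single 500.00 sale.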
• Q.888

Question
Write an SQL query to find the department with the highest average salary in the company.

Explanation
To find the department with the highest average salary, we need to first group employees by
their department and calculate the average salary for each department using the AVG function.
Then, we can use the ORDER BY clause to sort the results by average salary in descending
order and limit the result to the top department.

Datasets and SQL Schemas

Table Creation
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(50),
department_id INT,
salary INT
);

CREATE TABLE departments (


department_id INT PRIMARY KEY,
department_name VARCHAR(50)
);

Sample Data Insertion


-- Inserting data into departments table
INSERT INTO departments (department_id, department_name)
VALUES
(1, 'HR'),
(2, 'Engineering'),
(3, 'Marketing');

-- Inserting data into employees table


INSERT INTO employees (employee_id, employee_name, department_id, salary)
VALUES
(1, 'John', 1, 5000),
(2, 'Jane', 2, 6000),
(3, 'David', 2, 8000),
(4, 'Emma', 3, 7000),
(5, 'James', 2, 7500);

Learnings
• Grouping and Aggregation: Using GROUP BY with AVG to calculate the average salary per
department.
• Sorting: Sorting the results by calculated average salary to identify the department with the
highest value.
• Limiting Results: Using LIMIT to get only the department with the highest average salary.

Solutions

PostgreSQL Solution

SELECT d.department_name, AVG(e.salary) AS average_salary
FROM employees e
JOIN departments d ON e.department_id = d.department_id
GROUP BY d.department_name
ORDER BY average_salary DESC
LIMIT 1;

MySQL Solution
SELECT d.department_name, AVG(e.salary) AS average_salary
FROM employees e
JOIN departments d ON e.department_id = d.department_id
GROUP BY d.department_name
ORDER BY average_salary DESC
LIMIT 1;
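The join-aggregate-sort pipeline can be checked directly against the sample data with sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INT, department_name TEXT);
    CREATE TABLE employees (employee_id INT, employee_name TEXT,
                            department_id INT, salary INT);
    INSERT INTO departments VALUES (1,'HR'),(2,'Engineering'),(3,'Marketing');
    INSERT INTO employees VALUES
        (1,'John',1,5000),(2,'Jane',2,6000),(3,'David',2,8000),
        (4,'Emma',3,7000),(5,'James',2,7500);
""")

row = conn.execute("""
    SELECT d.department_name, AVG(e.salary) AS average_salary
    FROM employees e
    JOIN departments d ON e.department_id = d.department_id
    GROUP BY d.department_name
    ORDER BY average_salary DESC
    LIMIT 1
""").fetchone()
# Engineering averages (6000 + 8000 + 7500) / 3, beating HR (5000) and Marketing (7000).
print(row)
```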
• Q.889

Question
Write an SQL query to find all employees who earn more than the average salary.

Explanation
To solve this, we need to calculate the average salary using the AVG function and then filter
out the employees who earn more than this average. This can be done using a subquery to
calculate the average salary and then a WHERE clause to compare employee salaries to that
average.

Datasets and SQL Schemas

Table Creation
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(50),
salary INT
);

Sample Data Insertion


INSERT INTO employees (employee_id, employee_name, salary)
VALUES
(1, 'John', 5000),
(2, 'Jane', 6000),
(3, 'David', 8000),
(4, 'Emma', 7000),
(5, 'James', 7500);

Learnings
• Aggregation: Using AVG() to calculate the average salary.
• Subqueries: Using a subquery to filter based on a calculated value (average salary).
• Comparison: Using the WHERE clause to compare individual values against a calculated
metric.

Solutions

PostgreSQL Solution
SELECT employee_id, employee_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

MySQL Solution
SELECT employee_id, employee_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
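Because the subquery is uncorrelated, the engine evaluates it once; with the sample data the average is 33500 / 5 = 6700. A quick sqlite3 check:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INT, employee_name TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, 'John', 5000), (2, 'Jane', 6000), (3, 'David', 8000),
                  (4, 'Emma', 7000), (5, 'James', 7500)])

# The scalar subquery (average = 6700) is computed once, then used as the filter.
rows = conn.execute("""
    SELECT employee_name, salary
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
""").fetchall()
print(sorted(r[0] for r in rows))  # ['David', 'Emma', 'James']
```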
• Q.890

Question
Write an SQL query to find the name of the product with the highest price in each country.

Explanation
We need to identify the product with the highest price for each country. To do this, we can
use a combination of JOIN between the suppliers and products tables, and the GROUP BY
clause to group by country. For each group, we will use the MAX function to get the product
with the highest price, and then match the product name to that highest price.

Datasets and SQL Schemas

Table Creation
CREATE TABLE suppliers (
supplier_id INT PRIMARY KEY,
supplier_name VARCHAR(25),
country VARCHAR(25)
);

CREATE TABLE products (


product_id INT PRIMARY KEY,
product_name VARCHAR(25),
supplier_id INT,
price FLOAT,
FOREIGN KEY (supplier_id) REFERENCES suppliers(supplier_id)
);

Sample Data Insertion


-- Inserting data into suppliers table
INSERT INTO suppliers (supplier_id, supplier_name, country)
VALUES
(501, 'alan', 'India'),
(502, 'rex', 'US'),
(503, 'dodo', 'India'),
(504, 'rahul', 'US'),
(505, 'zara', 'Canada'),
(506, 'max', 'Canada');

-- Inserting data into products table


INSERT INTO products (product_id, product_name, supplier_id, price)
VALUES
(201, 'iPhone 14', 501, 1299),
(202, 'iPhone 8', 502, 999),
(203, 'iPhone 11', 503, 1199),
(204, 'iPhone 13', 502, 1199),
(205, 'iPhone 12', 502, 1199),
(206, 'iPhone 14', 501, 1399),
(207, 'iPhone 15', 503, 1499),
(208, 'iPhone 15', 505, 1499),
(209, 'iPhone 12', 502, 1299),
(210, 'iPhone 13', 502, 1199),
(211, 'iPhone 11', 501, 1099),
(212, 'iPhone 14', 503, 1399),
(213, 'iPhone 8', 502, 1099),
(222, 'Samsung Galaxy S21', 504, 1699),
(223, 'Samsung Galaxy S20', 505, 1899),
(224, 'Google Pixel 6', 501, 899),
(225, 'Google Pixel 5', 502, 799),
(226, 'OnePlus 9 Pro', 503, 1699),
(227, 'OnePlus 9', 502, 1999),
(228, 'Xiaomi Mi 11', 501, 899),
(229, 'Xiaomi Mi 10', 504, 699),
(230, 'Huawei P40 Pro', 505, 1099),
(231, 'Huawei P30', 502, 1299),
(232, 'Sony Xperia 1 III', 503, 1199),
(233, 'Sony Xperia 5 III', 501, 999),
(234, 'LG Velvet', 505, 1899),
(235, 'LG G8 ThinQ', 504, 799),
(236, 'Motorola Edge Plus', 502, 1099),
(237, 'Motorola One 5G', 501, 799),
(238, 'ASUS ROG Phone 5', 503, 1999),
(239, 'ASUS ZenFone 8', 504, 999),
(240, 'Nokia 8.3 5G', 502, 899),
(241, 'Nokia 7.2', 501, 699),
(242, 'BlackBerry Key2', 504, 1899),
(243, 'BlackBerry Motion', 502, 799),
(244, 'HTC U12 Plus', 501, 899),
(245, 'HTC Desire 20 Pro', 505, 699),
(246, 'Lenovo Legion Phone Duel', 503, 1499),
(247, 'Lenovo K12 Note', 504, 1499),
(248, 'ZTE Axon 30 Ultra', 501, 1299),
(249, 'ZTE Blade 20', 502, 1599),
(250, 'Oppo Find X3 Pro', 503, 1999);

Learnings
• JOIN operation: Combining data from two tables using JOIN on supplier_id.
• GROUP BY: Grouping the results by country.
• MAX function: Finding the maximum value of a column (price).
• Subquery or Window Functions: Using a subquery or window function to find the
highest-priced product for each country.

Solutions

PostgreSQL Solution
SELECT s.country, p.product_name
FROM suppliers s
JOIN products p ON s.supplier_id = p.supplier_id
WHERE p.price = (
SELECT MAX(price)
FROM products
JOIN suppliers ON suppliers.supplier_id = products.supplier_id
WHERE suppliers.country = s.country
)
ORDER BY s.country;

MySQL Solution
SELECT s.country, p.product_name
FROM suppliers s
JOIN products p ON s.supplier_id = p.supplier_id
WHERE p.price = (
SELECT MAX(price)
FROM products
JOIN suppliers ON suppliers.supplier_id = products.supplier_id
WHERE suppliers.country = s.country
)
ORDER BY s.country;
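The correlated subquery re-evaluates MAX(price) for the country of each outer row. The sketch below uses a deliberately tiny invented dataset (two countries, three products; all names hypothetical, not the sample data) so the expected output is obvious:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suppliers (supplier_id INT, supplier_name TEXT, country TEXT);
    CREATE TABLE products (product_id INT, product_name TEXT,
                           supplier_id INT, price REAL);
    -- Invented mini-dataset: two countries, three products.
    INSERT INTO suppliers VALUES (1,'alpha','India'),(2,'beta','US');
    INSERT INTO products VALUES
        (1,'Phone X',1,999),(2,'Phone Y',1,1299),(3,'Phone Z',2,799);
""")

# The inner query is correlated on s.country: it recomputes MAX(price)
# for the country of the current outer row.
rows = conn.execute("""
    SELECT s.country, p.product_name
    FROM suppliers s
    JOIN products p ON s.supplier_id = p.supplier_id
    WHERE p.price = (
        SELECT MAX(price)
        FROM products
        JOIN suppliers ON suppliers.supplier_id = products.supplier_id
        WHERE suppliers.country = s.country
    )
    ORDER BY s.country
""").fetchall()
print(rows)  # [('India', 'Phone Y'), ('US', 'Phone Z')]
```

One behavior to be aware of: if two products share a country's top price (which happens in the full sample data), this pattern returns both rows; use a ranking window function instead when exactly one row per country is required.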
• Q.891

Question

Write an SQL query to get the details of the employee with the second-highest salary from
each department.

Explanation
To find the employee with the second-highest salary in each department, we can:
• Use a subquery to rank employees within each department by salary in descending order.
• Use the ROW_NUMBER() window function to assign a rank to each employee within their
department.
• Filter out the employees with the first-highest salary and get the employee with the second-
highest salary by selecting where the rank is 2.

Datasets and SQL Schemas

Table Creation
CREATE TABLE employees (
id INT,
first_name VARCHAR(50),
last_name VARCHAR(50),
age INT,
sex VARCHAR(1),
employee_title VARCHAR(50),
department VARCHAR(50),
salary INT,
target INT,
bonus INT,
email VARCHAR(100),
city VARCHAR(50),
address VARCHAR(100),
manager_id INT
);

Sample Data Insertion


INSERT INTO employees
(id, first_name, last_name, age, sex, employee_title, department, salary, target, bonus, email, city, address, manager_id)
VALUES
(5, 'Max', 'George', 26, 'M', 'Sales', 'Sales', 1300, 200, 150, '[email protected]', 'California', '2638 Richards Avenue', 1),
(13, 'Katty', 'Bond', 56, 'F', 'Manager', 'Management', 150000, 0, 300, '[email protected]', 'Arizona', NULL, 1),
(11, 'Richerd', 'Gear', 57, 'M', 'Manager', 'Management', 250000, 0, 300, 'Richerd@company.com', 'Alabama', NULL, 1),
(10, 'Jennifer', 'Dion', 34, 'F', 'Sales', 'Sales', 1000, 200, 150, '[email protected]', 'Alabama', NULL, 13),
(19, 'George', 'Joe', 50, 'M', 'Manager', 'Management', 250000, 0, 300, 'George@company.com', 'Florida', '1003 Wyatt Street', 1),
(18, 'Laila', 'Mark', 26, 'F', 'Sales', 'Sales', 1000, 200, 150, '[email protected]', 'Florida', '3655 Spirit Drive', 11),
(20, 'Sarrah', 'Bicky', 31, 'F', 'Senior Sales', 'Sales', 2000, 200, 150, 'Sarrah@company.com', 'Florida', '1176 Tyler Avenue', 19);

Learnings
• Window Functions: Using ROW_NUMBER() to rank employees by salary within each
department.
• Subquery/CTE: Using a subquery or CTE (Common Table Expression) to get the second-
highest salary by filtering the row with rank 2.

Solutions

PostgreSQL Solution
WITH RankedEmployees AS (
SELECT id, first_name, last_name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
SELECT id, first_name, last_name, department, salary
FROM RankedEmployees
WHERE rn = 2;

MySQL Solution
WITH RankedEmployees AS (
SELECT id, first_name, last_name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
SELECT id, first_name, last_name, department, salary
FROM RankedEmployees
WHERE rn = 2;

Note: the alias is named rn rather than rank because RANK is a reserved word in
MySQL 8.0 and would need backticks.

Key Points
• ROW_NUMBER(): This function is used to assign a unique rank to each employee within their
department based on salary.
• PARTITION BY: Divides the data into partitions (in this case, by department).
• Filtering for Rank 2: We filter the results to get only the employee with the second-
highest salary in each department.
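The window-function version runs directly in SQLite 3.25+. The sketch below loads the sample employees (non-essential columns dropped); note that in Management two employees tie at 250000, so ROW_NUMBER() picks the "second" row arbitrarily, whereas DENSE_RANK() would treat both 250000 rows as rank 1 and return 150000 as the second-highest salary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # requires SQLite >= 3.25 for window functions
conn.execute(
    "CREATE TABLE employees (id INT, first_name TEXT, department TEXT, salary INT)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (5, 'Max', 'Sales', 1300), (13, 'Katty', 'Management', 150000),
    (11, 'Richerd', 'Management', 250000), (10, 'Jennifer', 'Sales', 1000),
    (19, 'George', 'Management', 250000), (18, 'Laila', 'Sales', 1000),
    (20, 'Sarrah', 'Sales', 2000),
])

rows = conn.execute("""
    WITH ranked AS (
        SELECT first_name, department, salary,
               ROW_NUMBER() OVER (PARTITION BY department
                                  ORDER BY salary DESC) AS rn
        FROM employees
    )
    SELECT department, first_name, salary FROM ranked WHERE rn = 2
""").fetchall()
print(sorted(rows))
```

Sales yields Max (1300, behind Sarrah's 2000); Management yields one of the two 250000 earners, which is exactly the tie-handling subtlety interviewers like to probe.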
• Q.892

Question
Write an SQL query to calculate the total number of returned orders for each month, based on
the Orders and Returns tables.

Explanation
To calculate the total number of returned orders for each month, we need to:
• Join the Returns table with the Orders table based on the OrderID.
• Extract the month and year from the OrderDate in the Orders table.
• Count the number of returns for each month.
• Group the results by year and month to get the total returns for each month.

Datasets and SQL Schemas

Table Creation
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
OrderDate DATE,
TotalAmount DECIMAL(10, 2)
);

DROP TABLE IF EXISTS Returns;


CREATE TABLE Returns (
ReturnID INT PRIMARY KEY,
OrderID INT,
FOREIGN KEY (OrderID) REFERENCES Orders(OrderID)
);

Sample Data Insertion


-- Inserting data into Orders table
INSERT INTO Orders (OrderID, OrderDate, TotalAmount) VALUES
(1, '2023-01-15', 150.50),
(2, '2023-02-20', 200.75),
(3, '2023-02-28', 300.25),
(4, '2023-03-10', 180.00),
(5, '2023-04-05', 250.80);

-- Inserting data into Returns table


INSERT INTO Returns (ReturnID, OrderID) VALUES
(101, 2),
(102, 4),
(103, 5),
(104, 1),
(105, 3);

Learnings
• JOINs: Using an inner join to combine Orders and Returns based on the OrderID.
• DATE Functions: Extracting the year and month from the OrderDate using functions like
YEAR() and MONTH().
• GROUP BY and COUNT(): Grouping by the year and month of the OrderDate and
counting the number of returned orders.

Solutions

PostgreSQL Solution
SELECT
TO_CHAR(o.OrderDate, 'YYYY-MM') AS Month,
COUNT(r.ReturnID) AS TotalReturns
FROM Returns r
JOIN Orders o ON r.OrderID = o.OrderID
GROUP BY TO_CHAR(o.OrderDate, 'YYYY-MM')
ORDER BY Month;

MySQL Solution
SELECT
DATE_FORMAT(o.OrderDate, '%Y-%m') AS Month,
COUNT(r.ReturnID) AS TotalReturns
FROM Returns r
JOIN Orders o ON r.OrderID = o.OrderID
GROUP BY DATE_FORMAT(o.OrderDate, '%Y-%m')
ORDER BY Month;

Key Points
• TO_CHAR() / DATE_FORMAT(): Functions used to extract the year and month from the
OrderDate column.
• COUNT(): Used to count the number of returns for each group.
• GROUP BY: Groups the results by month and year to get the total returned orders per month.
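In SQLite, strftime('%Y-%m', …) stands in for TO_CHAR / DATE_FORMAT; otherwise the query is unchanged, which makes a quick verification easy:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INT, OrderDate TEXT);
    CREATE TABLE Returns (ReturnID INT, OrderID INT);
    INSERT INTO Orders VALUES (1,'2023-01-15'),(2,'2023-02-20'),(3,'2023-02-28'),
                              (4,'2023-03-10'),(5,'2023-04-05');
    INSERT INTO Returns VALUES (101,2),(102,4),(103,5),(104,1),(105,3);
""")

# strftime('%Y-%m', ...) plays the role of TO_CHAR / DATE_FORMAT here.
rows = conn.execute("""
    SELECT strftime('%Y-%m', o.OrderDate) AS Month,
           COUNT(r.ReturnID) AS TotalReturns
    FROM Returns r
    JOIN Orders o ON r.OrderID = o.OrderID
    GROUP BY Month
    ORDER BY Month
""").fetchall()
print(rows)  # [('2023-01', 1), ('2023-02', 2), ('2023-03', 1), ('2023-04', 1)]
```

Every order in the sample data was returned, so the counts simply mirror how many orders were placed in each month.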
• Q.893

Question
Write an SQL query to find the top 2 products in the top 2 categories based on the total spend
amount.

Explanation

To solve this, we need to:
• Aggregate the total spend for each product within each category using GROUP BY and
SUM().
• Rank the products within each category by the total spend using ROW_NUMBER() or
RANK().
• Filter to get the top 2 products in the top 2 categories based on total spend.
• Order the results to ensure that we select the top categories and top products within each
category.

Datasets and SQL Schemas

Table Creation
CREATE TABLE orders (
category VARCHAR(20),
product VARCHAR(20),
user_id INT,
spend DECIMAL(10, 2),
transaction_date DATE
);

Sample Data Insertion


INSERT INTO orders VALUES
('appliance','refrigerator',165,246.00,'2021/12/26'),
('appliance','refrigerator',123,299.99,'2022/03/02'),
('appliance','washingmachine',123,219.80,'2022/03/02'),
('electronics','vacuum',178,152.00,'2022/04/05'),
('electronics','wirelessheadset',156,249.90,'2022/07/08'),
('electronics','TV',145,189.00,'2022/07/15'),
('Television','TV',165,129.00,'2022/07/15'),
('Television','TV',163,129.00,'2022/07/15'),
('Television','TV',141,129.00,'2022/07/15'),
('toys','Ben10',145,189.00,'2022/07/15'),
('toys','Ben10',145,189.00,'2022/07/15'),
('toys','yoyo',165,129.00,'2022/07/15'),
('toys','yoyo',163,129.00,'2022/07/15'),
('toys','yoyo',141,129.00,'2022/07/15'),
('toys','yoyo',145,189.00,'2022/07/15'),
('electronics','vacuum',145,189.00,'2022/07/15');

Learnings
• SUM(): Used to calculate the total spend for each product within each category.
• RANK() or ROW_NUMBER(): Used to rank products within each category based on the total
spend.
• ORDER BY: Sorting categories and products based on spend to pick the top 2 categories and
top 2 products within each category.

Solutions

PostgreSQL Solution
WITH RankedProducts AS (
SELECT category, product, SUM(spend) AS total_spend,
RANK() OVER (PARTITION BY category ORDER BY SUM(spend) DESC) AS product_rank
FROM orders
GROUP BY category, product
),
RankedCategories AS (
SELECT category, SUM(spend) AS category_spend,
RANK() OVER (ORDER BY SUM(spend) DESC) AS category_rank
FROM orders
GROUP BY category
)
SELECT rp.category, rp.product, rp.total_spend
FROM RankedProducts rp
JOIN RankedCategories rc ON rp.category = rc.category
WHERE rp.product_rank <= 2 AND rc.category_rank <= 2
ORDER BY rc.category_rank, rp.product_rank;

MySQL Solution
WITH RankedProducts AS (
SELECT category, product, SUM(spend) AS total_spend,
RANK() OVER (PARTITION BY category ORDER BY SUM(spend) DESC) AS product_rank
FROM orders
GROUP BY category, product
),
RankedCategories AS (
SELECT category, SUM(spend) AS category_spend,
RANK() OVER (ORDER BY SUM(spend) DESC) AS category_rank
FROM orders
GROUP BY category
)
SELECT rp.category, rp.product, rp.total_spend
FROM RankedProducts rp
JOIN RankedCategories rc ON rp.category = rc.category
WHERE rp.product_rank <= 2 AND rc.category_rank <= 2
ORDER BY rc.category_rank, rp.product_rank;

Key Points
• SUM(spend): Used to calculate the total spend for each product and each category.
• RANK(): This window function ranks the products within each category by the total spend
and ranks the categories by their total spend.
• CTEs (Common Table Expressions): Used to first rank the categories and products and
then filter the top 2 from each.
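The two-CTE plan runs unchanged on SQLite 3.25+; the sketch below drops the date and user columns since the ranking only needs category, product, and spend:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (category TEXT, product TEXT, spend REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ('appliance','refrigerator',246.00),('appliance','refrigerator',299.99),
    ('appliance','washingmachine',219.80),('electronics','vacuum',152.00),
    ('electronics','wirelessheadset',249.90),('electronics','TV',189.00),
    ('Television','TV',129.00),('Television','TV',129.00),('Television','TV',129.00),
    ('toys','Ben10',189.00),('toys','Ben10',189.00),('toys','yoyo',129.00),
    ('toys','yoyo',129.00),('toys','yoyo',129.00),('toys','yoyo',189.00),
    ('electronics','vacuum',189.00),
])

rows = conn.execute("""
    WITH RankedProducts AS (
        SELECT category, product, SUM(spend) AS total_spend,
               RANK() OVER (PARTITION BY category
                            ORDER BY SUM(spend) DESC) AS product_rank
        FROM orders
        GROUP BY category, product
    ),
    RankedCategories AS (
        SELECT category,
               RANK() OVER (ORDER BY SUM(spend) DESC) AS category_rank
        FROM orders
        GROUP BY category
    )
    SELECT rp.category, rp.product, ROUND(rp.total_spend, 2) AS total_spend
    FROM RankedProducts rp
    JOIN RankedCategories rc ON rp.category = rc.category
    WHERE rp.product_rank <= 2 AND rc.category_rank <= 2
    ORDER BY rc.category_rank, rp.product_rank
""").fetchall()
print(rows)
```

The top two categories by total spend are toys (954.00) and electronics (779.90), and within them the top products are yoyo/Ben10 and vacuum/wirelessheadset respectively.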
• Q.894

Question
Write an SQL query to retrieve the third-highest salary from the Employee table.

Explanation
To retrieve the third-highest salary:
• Use a window function like DENSE_RANK() to assign ranks to the salaries in descending
order.
• Filter the result to get the row with rank 3, which corresponds to the third-highest salary.

Datasets and SQL Schemas

Table Creation
DROP TABLE IF EXISTS Employees;

CREATE TABLE Employees (


EmployeeID INT PRIMARY KEY,
Name VARCHAR(50),
Department VARCHAR(50),
Salary DECIMAL(10, 2),
HireDate DATE
);

Sample Data Insertion


INSERT INTO Employees (EmployeeID, Name, Department, Salary, HireDate) VALUES
(101, 'John Smith', 'Sales', 60000.00, '2022-01-15'),
(102, 'Jane Doe', 'Marketing', 55000.00, '2022-02-20'),
(103, 'Michael Johnson', 'Finance', 70000.00, '2021-12-10'),
(104, 'Emily Brown', 'Sales', 62000.00, '2022-03-05'),
(106, 'Sam Brown', 'IT', 62000.00, '2022-03-05'),
(105, 'Chris Wilson', 'Marketing', 58000.00, '2022-01-30');

Learnings
• DENSE_RANK(): A window function that assigns a rank to each row within the partition of a
result set, with no gaps in ranking when there are ties.
• Filtering by Rank: We filter out the third-highest salary by selecting the row with rank =
3.

Solutions

PostgreSQL and MySQL Solution


SELECT salary AS third_highest_salary
FROM (
SELECT salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS drn
FROM Employees
) AS subquery
WHERE drn = 3;

Key Points
• DENSE_RANK(): Assigns ranks to rows in the result set, allowing us to handle ties correctly.
• Filtering with WHERE: We use the rank value to filter and return the third-highest salary.
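A sqlite3 run makes the tie handling concrete: two employees earn 62000, and DENSE_RANK() assigns them the same rank without leaving a gap, so rank 3 lands on 60000 rather than skipping it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INT, Salary REAL)")
conn.executemany("INSERT INTO Employees VALUES (?, ?)", [
    (101, 60000), (102, 55000), (103, 70000),
    (104, 62000), (106, 62000), (105, 58000),
])

# DENSE_RANK leaves no gap after the tie at 62000:
# 70000 -> 1, 62000 -> 2 (twice), 60000 -> 3, ...
row = conn.execute("""
    SELECT Salary AS third_highest_salary
    FROM (SELECT Salary,
                 DENSE_RANK() OVER (ORDER BY Salary DESC) AS drn
          FROM Employees)
    WHERE drn = 3
""").fetchone()
print(row[0])  # 60000.0
```

With ROW_NUMBER() instead, the third row would be one of the 62000 salaries, which is usually not what "third-highest salary" means.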
• Q.895

Question
Write an SQL query to find all products that haven't been sold in the last six months. Return
the product_id, product_name, category, and price of these products.

Explanation
To solve this problem:
• Identify the current date and calculate the date six months ago.
• Join the Products table with the Sales table on product_id.
• Check for products that have no sales records in the last six months using a LEFT
JOIN and a filter based on the sale date.
• Exclude products that have sales in the last six months by checking for NULL in the
sales date.

Datasets and SQL Schemas

Table Creation
DROP TABLE IF EXISTS Products;
CREATE TABLE Products (
product_id SERIAL PRIMARY KEY,
product_name VARCHAR(100),
category VARCHAR(50),
price DECIMAL(10, 2)


);

DROP TABLE IF EXISTS Sales;


CREATE TABLE Sales (
sale_id SERIAL PRIMARY KEY,
product_id INT,
sale_date DATE,
quantity INT,
FOREIGN KEY (product_id) REFERENCES Products(product_id)
);

Sample Data Insertion


-- Inserting sample data into Products table
INSERT INTO Products (product_name, category, price) VALUES
('Product A', 'Category 1', 10.00),
('Product B', 'Category 2', 15.00),
('Product C', 'Category 1', 20.00),
('Product D', 'Category 3', 25.00);

-- Inserting sample data into Sales table


INSERT INTO Sales (product_id, sale_date, quantity) VALUES
(1, '2023-09-15', 5),
(2, '2023-10-20', 3),
(1, '2024-01-05', 2),
(3, '2024-02-10', 4),
(4, '2023-12-03', 1);

Learnings
• LEFT JOIN: Used to include all products, even those without sales records in the last six
months.
• Date functions: Used to calculate the date six months ago and compare with sales data.
• IS NULL: Used to filter out products that have sales records in the last six months.

Solutions

PostgreSQL Solution


SELECT p.product_id, p.product_name, p.category, p.price
FROM Products p
LEFT JOIN Sales s ON p.product_id = s.product_id
AND s.sale_date > CURRENT_DATE - INTERVAL '6 months'
WHERE s.sale_date IS NULL;

MySQL Solution


-- MySQL writes the interval without quotes and with a singular unit
SELECT p.product_id, p.product_name, p.category, p.price
FROM Products p
LEFT JOIN Sales s ON p.product_id = s.product_id
AND s.sale_date > CURRENT_DATE - INTERVAL 6 MONTH
WHERE s.sale_date IS NULL;

Key Points
• LEFT JOIN: Ensures that even products with no sales in the last six months are included.
• CURRENT_DATE - INTERVAL '6 months': Computes the date six months ago in PostgreSQL (MySQL uses CURRENT_DATE - INTERVAL 6 MONTH).
• WHERE s.sale_date IS NULL: Filters products that have no sales in the last six months.
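To make the date filter reproducible for a local check, the sketch below pins "today" to a hypothetical 2024-06-01 using SQLite's date() function instead of CURRENT_DATE - INTERVAL (an adaptation, not part of the original solution). With the sample data, only Product B then has no sale in the trailing six months:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (product_id INTEGER PRIMARY KEY,
                       product_name TEXT, category TEXT, price REAL);
CREATE TABLE Sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                    sale_date TEXT, quantity INTEGER);
INSERT INTO Products (product_name, category, price) VALUES
('Product A', 'Category 1', 10.00), ('Product B', 'Category 2', 15.00),
('Product C', 'Category 1', 20.00), ('Product D', 'Category 3', 25.00);
INSERT INTO Sales (product_id, sale_date, quantity) VALUES
(1, '2023-09-15', 5), (2, '2023-10-20', 3), (1, '2024-01-05', 2),
(3, '2024-02-10', 4), (4, '2023-12-03', 1);
""")
unsold = conn.execute("""
SELECT p.product_id, p.product_name, p.category, p.price
FROM Products p
LEFT JOIN Sales s ON p.product_id = s.product_id
                 AND s.sale_date > date('2024-06-01', '-6 months')
WHERE s.sale_date IS NULL;
""").fetchall()
# Product B's only sale (2023-10-20) falls outside the window, so it is the
# lone row with no match and therefore the lone "unsold" product
print(unsold)  # [(2, 'Product B', 'Category 2', 15.0)]
```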
• Q.896

Question
Write an SQL query to find customers who bought Airpods after purchasing an iPhone.

Explanation
To solve this:
• We need to identify customers who bought an iPhone and then bought Airpods later.
• This requires comparing the purchase dates of the two products for the same customer.


• We can achieve this by joining the Customers and Purchases tables, filtering the
purchases where the product is iPhone and Airpods, and ensuring that the Airpods purchase
happens after the iPhone purchase.

Datasets and SQL Schemas

Table Creation
DROP TABLE IF EXISTS customers;
CREATE TABLE Customers (
CustomerID INT,
CustomerName VARCHAR(50)
);

DROP TABLE IF EXISTS purchases;


CREATE TABLE Purchases (
PurchaseID INT,
CustomerID INT,
ProductName VARCHAR(50),
PurchaseDate DATE
);

Sample Data Insertion


-- Inserting sample data into Customers table
INSERT INTO Customers (CustomerID, CustomerName) VALUES
(1, 'John'),
(2, 'Emma'),
(3, 'Michael'),
(4, 'Ben'),
(5, 'John');

-- Inserting sample data into Purchases table


INSERT INTO Purchases (PurchaseID, CustomerID, ProductName, PurchaseDate) VALUES
(100, 1, 'iPhone', '2024-01-01'),
(101, 1, 'MacBook', '2024-01-20'),
(102, 1, 'Airpods', '2024-03-10'),
(103, 2, 'iPad', '2024-03-05'),
(104, 2, 'iPhone', '2024-03-15'),
(105, 3, 'MacBook', '2024-03-20'),
(106, 3, 'Airpods', '2024-03-25'),
(107, 4, 'iPhone', '2024-03-22'),
(108, 4, 'Airpods', '2024-03-29'),
(110, 5, 'Airpods', '2024-02-29'),
(109, 5, 'iPhone', '2024-03-22');

Learnings
• JOIN: Used to link customers to their purchase history.
• Date Comparison: We compare the purchase dates to check if the purchase of Airpods
was after the iPhone.
• Subquery/Correlated Subquery: A correlated subquery can help filter for customers who
bought Airpods after buying iPhones.

Solutions

PostgreSQL and MySQL Solution


SELECT DISTINCT c.CustomerID, c.CustomerName
FROM Customers c
JOIN Purchases p1 ON c.CustomerID = p1.CustomerID
JOIN Purchases p2 ON c.CustomerID = p2.CustomerID
WHERE p1.ProductName = 'iPhone'
AND p2.ProductName = 'Airpods'
AND p1.PurchaseDate < p2.PurchaseDate;


Key Points
• JOIN: We join the Purchases table twice, once for iPhone purchases and once for Airpods
purchases.
• WHERE clause: Ensures that only customers who bought Airpods after iPhones are selected.
• DISTINCT: Ensures that each customer is listed only once, even if they made multiple
qualifying purchases.
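The self-join drops straight into SQLite for a quick local check (sample data as above; an ORDER BY is added only to make the output order stable, since DISTINCT alone guarantees no order):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT);
CREATE TABLE Purchases (PurchaseID INTEGER, CustomerID INTEGER,
                        ProductName TEXT, PurchaseDate TEXT);
INSERT INTO Customers VALUES (1,'John'),(2,'Emma'),(3,'Michael'),(4,'Ben'),(5,'John');
INSERT INTO Purchases VALUES
(100, 1, 'iPhone', '2024-01-01'), (101, 1, 'MacBook', '2024-01-20'),
(102, 1, 'Airpods', '2024-03-10'), (103, 2, 'iPad', '2024-03-05'),
(104, 2, 'iPhone', '2024-03-15'), (105, 3, 'MacBook', '2024-03-20'),
(106, 3, 'Airpods', '2024-03-25'), (107, 4, 'iPhone', '2024-03-22'),
(108, 4, 'Airpods', '2024-03-29'), (110, 5, 'Airpods', '2024-02-29'),
(109, 5, 'iPhone', '2024-03-22');
""")
rows = conn.execute("""
SELECT DISTINCT c.CustomerID, c.CustomerName
FROM Customers c
JOIN Purchases p1 ON c.CustomerID = p1.CustomerID
JOIN Purchases p2 ON c.CustomerID = p2.CustomerID
WHERE p1.ProductName = 'iPhone'
  AND p2.ProductName = 'Airpods'
  AND p1.PurchaseDate < p2.PurchaseDate
ORDER BY c.CustomerID;
""").fetchall()
# Customer 5 bought Airpods BEFORE the iPhone, so only 1 and 4 qualify
print(rows)  # [(1, 'John'), (4, 'Ben')]
```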
• Q.897

Question
Write a SQL query to classify employees into three categories based on their salary:
• "High" - Salary greater than $70,000
• "Medium" - Salary between $50,000 and $70,000 (inclusive)
• "Low" - Salary less than $50,000
Your query should return the EmployeeID, FirstName, LastName, Department, Salary, and
a new column SalaryCategory indicating the category to which each employee belongs.

Explanation
To solve this:
• Use CASE WHEN logic to create a new column SalaryCategory that classifies
employees based on their salary.
• Define the conditions for "High", "Medium", and "Low" salary categories.
• Select all required columns (EmployeeID, FirstName, LastName, Department, Salary,
and the new SalaryCategory).

Datasets and SQL Schemas

Table Creation
DROP TABLE IF EXISTS employees;

CREATE TABLE employees (
EmployeeID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Department VARCHAR(50),
Salary NUMERIC(10, 2)
);

Sample Data Insertion


INSERT INTO employees (EmployeeID, FirstName, LastName, Department, Salary) VALUES
(1, 'John', 'Doe', 'Finance', 75000.00),
(2, 'Jane', 'Smith', 'HR', 60000.00),
(3, 'Michael', 'Johnson', 'IT', 45000.00),
(4, 'Emily', 'Brown', 'Marketing', 55000.00),
(5, 'David', 'Williams', 'Finance', 80000.00),
(6, 'Sarah', 'Jones', 'HR', 48000.00),
(7, 'Chris', 'Taylor', 'IT', 72000.00),
(8, 'Jessica', 'Wilson', 'Marketing', 49000.00);

Learnings
• CASE WHEN: A conditional expression used to create categories based on salary ranges.
• NUMERIC type: Used to store precise decimal values for salary amounts.


Solutions

PostgreSQL and MySQL Solution


SELECT
EmployeeID,
FirstName,
LastName,
Department,
Salary,
CASE
WHEN Salary > 70000 THEN 'High'
WHEN Salary BETWEEN 50000 AND 70000 THEN 'Medium'
ELSE 'Low'
END AS SalaryCategory
FROM employees;

Key Points
• CASE WHEN: The CASE statement is used to assign a value based on conditional checks.
• Salary Ranges: Using BETWEEN for the "Medium" category ensures we handle both the
lower and upper bounds inclusively.
• Classifying Data: The query returns a new column (SalaryCategory) that classifies
employees into salary ranges based on their salary value.
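A quick sqlite3 check of the CASE classification (only FirstName and the derived category are pulled out here to keep the output short; the harness is an illustration, not part of the solution):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT,
                        LastName TEXT, Department TEXT, Salary REAL);
INSERT INTO employees VALUES
(1,'John','Doe','Finance',75000),(2,'Jane','Smith','HR',60000),
(3,'Michael','Johnson','IT',45000),(4,'Emily','Brown','Marketing',55000),
(5,'David','Williams','Finance',80000),(6,'Sarah','Jones','HR',48000),
(7,'Chris','Taylor','IT',72000),(8,'Jessica','Wilson','Marketing',49000);
""")
cats = dict(conn.execute("""
SELECT FirstName,
       CASE WHEN Salary > 70000 THEN 'High'
            WHEN Salary BETWEEN 50000 AND 70000 THEN 'Medium'
            ELSE 'Low' END AS SalaryCategory
FROM employees;
"""))
# Jane at exactly 60000 lands in Medium; the BETWEEN bounds are inclusive
print(cats)
```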
• Q.898

Question
Write a SQL query to show the unique_id of each employee from the Employees table. If an
employee does not have a corresponding unique ID in the EmployeeUNI table, return NULL for
that employee.

Explanation
To solve this:
• We need to join the Employees table with the EmployeeUNI table based on the employee
id.
• We will use a LEFT JOIN to include all employees from the Employees table, and show
the unique_id from the EmployeeUNI table where it exists. If a unique_id does not exist for
an employee, the result should return NULL for that employee's unique_id.
• Return the employee id, name, and unique_id (or NULL if not available).

Datasets and SQL Schemas

Table Creation
DROP TABLE IF EXISTS Employees;
CREATE TABLE Employees (
id INT PRIMARY KEY,
name VARCHAR(255)
);

DROP TABLE IF EXISTS EmployeeUNI;


CREATE TABLE EmployeeUNI (
id INT PRIMARY KEY,
unique_id INT
);

Sample Data Insertion


-- Insert sample data into Employees table


INSERT INTO Employees (id, name) VALUES
(1, 'Alice'),
(7, 'Bob'),
(11, 'Meir'),
(90, 'Winston'),
(3, 'Jonathan');

-- Insert sample data into EmployeeUNI table


INSERT INTO EmployeeUNI (id, unique_id) VALUES
(3, 1),
(11, 2),
(90, 3);

Learnings
• LEFT JOIN: Used to ensure all records from the Employees table are included, and only
matching records from the EmployeeUNI table.
• NULL Handling: The LEFT JOIN ensures that if there is no match in the EmployeeUNI
table, NULL is returned for the unique_id.

Solutions

PostgreSQL and MySQL Solution


SELECT e.id, e.name, eu.unique_id
FROM Employees e
LEFT JOIN EmployeeUNI eu ON e.id = eu.id;

Key Points
• LEFT JOIN: Ensures all employees are shown, even if there is no corresponding
unique_id in the EmployeeUNI table.
• NULL for Missing Data: If an employee does not have a unique ID, the query returns
NULL in the unique_id column for that employee.
• Readable Output: The output will include the employee's id, name, and either the
unique_id (if available) or NULL.
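The NULL behaviour is easy to see locally: in Python's sqlite3, SQL NULL comes back as None. An ORDER BY is added below purely for a stable result order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE EmployeeUNI (id INTEGER PRIMARY KEY, unique_id INTEGER);
INSERT INTO Employees VALUES (1,'Alice'),(7,'Bob'),(11,'Meir'),(90,'Winston'),(3,'Jonathan');
INSERT INTO EmployeeUNI VALUES (3,1),(11,2),(90,3);
""")
rows = conn.execute("""
SELECT e.id, e.name, eu.unique_id
FROM Employees e
LEFT JOIN EmployeeUNI eu ON e.id = eu.id
ORDER BY e.id;
""").fetchall()
# Alice and Bob have no row in EmployeeUNI, so their unique_id is NULL (None)
print(rows)
```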
• Q.899

Question
Write a SQL query to find all the employees who do not manage anyone. Return their
emp_id, name, and manager_id.

Explanation
To solve this:
• We need to identify employees who are not managers.
• An employee is a manager if their emp_id appears in the manager_id column for other
employees.
• Employees who do not manage anyone will not have their emp_id listed as a manager_id
in any other record.
• We can achieve this by using a LEFT JOIN to check if their emp_id appears in the
manager_id column of other employees. If it doesn't, that means they don't manage anyone.

Datasets and SQL Schemas


Table Creation
DROP TABLE IF EXISTS employees;

CREATE TABLE employees (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
manager_id INT,
FOREIGN KEY (manager_id) REFERENCES employees(emp_id)
);

Sample Data Insertion


INSERT INTO employees (emp_id, name, manager_id) VALUES
(1, 'John Doe', NULL),
(2, 'Jane Smith', 1),
(3, 'Alice Johnson', 1),
(4, 'Bob Brown', 3),
(5, 'Emily White', NULL),
(6, 'Michael Lee', 3),
(7, 'David Clark', NULL),
(8, 'Sarah Davis', 2),
(9, 'Kevin Wilson', 2),
(10, 'Laura Martinez', 4);

Learnings
• LEFT JOIN: Used to find all employees, including those without managers.
• NOT EXISTS or NOT IN: These can be used to filter employees who don't have their
emp_id as a manager_id in any record.

Solutions

PostgreSQL and MySQL Solution


SELECT e.emp_id, e.name, e.manager_id
FROM employees e
LEFT JOIN employees m ON e.emp_id = m.manager_id
WHERE m.manager_id IS NULL;

Key Points
• LEFT JOIN: Ensures all employees are considered, even those who don't have a
corresponding manager.
• m.manager_id IS NULL: Keeps only the employees who are never referenced as a
manager_id by any other employee, i.e., those who do not manage anyone.
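The anti-join can be verified in seconds with sqlite3. With the sample data, employees 1–4 each appear as someone's manager_id, so only employees 5–10 survive the filter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
(1,'John Doe',NULL),(2,'Jane Smith',1),(3,'Alice Johnson',1),(4,'Bob Brown',3),
(5,'Emily White',NULL),(6,'Michael Lee',3),(7,'David Clark',NULL),
(8,'Sarah Davis',2),(9,'Kevin Wilson',2),(10,'Laura Martinez',4);
""")
non_managers = [r[0] for r in conn.execute("""
SELECT e.emp_id, e.name, e.manager_id
FROM employees e
LEFT JOIN employees m ON e.emp_id = m.manager_id
WHERE m.manager_id IS NULL
ORDER BY e.emp_id;
""")]
# Unmatched rows on the LEFT JOIN are exactly the non-managers
print(non_managers)  # [5, 6, 7, 8, 9, 10]
```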
• Q.900

Question
Find the top 2 customers who have spent the most money across all their orders. Return their
names, emails, and total amounts spent.

Explanation
To solve this problem:
• We need to calculate the total amount spent by each customer across all their orders.
• We'll use the SUM() function to calculate the total order amount for each customer.
• Then, we'll order the results by the total amount spent in descending order and limit the
result to the top 2 customers.


Datasets and SQL Schemas

Table Creation
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100),
customer_email VARCHAR(100)
);

DROP TABLE IF EXISTS orders;


CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
order_amount DECIMAL(10, 2),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Sample Data Insertion


INSERT INTO customers (customer_id, customer_name, customer_email) VALUES
(1, 'John Doe', '[email protected]'),
(2, 'Jane Smith', '[email protected]'),
(3, 'Alice Johnson', '[email protected]'),
(4, 'Bob Brown', '[email protected]');

INSERT INTO orders (order_id, customer_id, order_date, order_amount) VALUES
(1, 1, '2024-01-03', 50.00),
(2, 2, '2024-01-05', 75.00),
(3, 1, '2024-01-10', 25.00),
(4, 3, '2024-01-15', 60.00),
(5, 2, '2024-01-20', 50.00),
(6, 1, '2024-02-01', 100.00),
(7, 2, '2024-02-05', 25.00),
(8, 3, '2024-02-10', 90.00),
(9, 1, '2024-02-15', 50.00),
(10, 2, '2024-02-20', 75.00);

Learnings
• SUM(): This is an aggregation function used to calculate the total amount spent by each
customer.
• JOIN: Used to combine data from the customers and orders tables based on the
customer_id.
• ORDER BY: This is used to sort the result in descending order of total amounts spent.
• LIMIT: Restricts the result to only the top 2 customers.

Solutions

PostgreSQL and MySQL Solution


SELECT
c.customer_name,
c.customer_email,
SUM(o.order_amount) AS total_spent
FROM
customers c
JOIN
orders o ON c.customer_id = o.customer_id
GROUP BY
c.customer_name, c.customer_email
ORDER BY
total_spent DESC
LIMIT 2;


Key Takeaways
• JOIN: Used to combine related data from different tables (customers and orders).
• SUM(): Aggregates the total amount spent by each customer.
• LIMIT: Restricts the result to the top N records, in this case, the top 2 customers based on
spending.
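A local check is instructive here (the placeholder emails in the sample data are omitted, so the GROUP BY is on name alone): John Doe and Jane Smith both total 225.00, which is a useful reminder that ties at the LIMIT boundary are resolved arbitrarily unless you add a tiebreaker:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                        customer_name TEXT, customer_email TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT, order_amount REAL);
INSERT INTO customers (customer_id, customer_name) VALUES
(1,'John Doe'),(2,'Jane Smith'),(3,'Alice Johnson'),(4,'Bob Brown');
INSERT INTO orders VALUES
(1,1,'2024-01-03',50),(2,2,'2024-01-05',75),(3,1,'2024-01-10',25),
(4,3,'2024-01-15',60),(5,2,'2024-01-20',50),(6,1,'2024-02-01',100),
(7,2,'2024-02-05',25),(8,3,'2024-02-10',90),(9,1,'2024-02-15',50),
(10,2,'2024-02-20',75);
""")
top2 = conn.execute("""
SELECT c.customer_name, SUM(o.order_amount) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name
ORDER BY total_spent DESC
LIMIT 2;
""").fetchall()
# Bob Brown has no orders, so the inner JOIN drops him entirely
print(top2)
```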
• Q.901
Question
Write an SQL query to find customers who have made purchases in all product categories.
Explanation
To solve this problem, we need to identify customers who have purchased at least one item
from each distinct product category. We can do this by using a GROUP BY clause to count the
distinct categories per customer and comparing it to the total number of unique categories
available in the Purchases table.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE Customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(50)
);
• - Table creation
CREATE TABLE Purchases (
purchase_id INT PRIMARY KEY,
customer_id INT,
product_category VARCHAR(50),
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
• - Datasets
INSERT INTO Customers (customer_id, customer_name) VALUES
(1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie'),
(4, 'David'),
(5, 'Emma');
INSERT INTO Purchases (purchase_id, customer_id, product_category) VALUES
(101, 1, 'Electronics'),
(102, 1, 'Books'),
(103, 1, 'Clothing'),
(104, 1, 'Electronics'),
(105, 2, 'Clothing'),
(106, 1, 'Beauty'),
(107, 3, 'Electronics'),
(108, 3, 'Books'),
(109, 4, 'Books'),
(110, 4, 'Clothing'),
(111, 4, 'Beauty'),
(112, 5, 'Electronics'),
(113, 5, 'Books');

Learnings
• Use of JOIN to connect two tables.
• Use of COUNT(DISTINCT) to count unique categories per customer.
• Comparison between the total number of unique categories and the number of categories
per customer.
Solutions
• - PostgreSQL solution


SELECT c.customer_id, c.customer_name
FROM Customers c
JOIN Purchases p ON c.customer_id = p.customer_id
GROUP BY c.customer_id, c.customer_name
HAVING COUNT(DISTINCT p.product_category) = (SELECT COUNT(DISTINCT product_category) FROM Purchases);
• - MySQL solution
SELECT c.customer_id, c.customer_name
FROM Customers c
JOIN Purchases p ON c.customer_id = p.customer_id
GROUP BY c.customer_id, c.customer_name
HAVING COUNT(DISTINCT p.product_category) = (SELECT COUNT(DISTINCT product_category) FROM Purchases);
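This "relational division" pattern checks out quickly in SQLite. With the sample data there are four distinct categories overall, and only Alice covers all of them (David has three):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE Purchases (purchase_id INTEGER PRIMARY KEY, customer_id INTEGER,
                        product_category TEXT);
INSERT INTO Customers VALUES (1,'Alice'),(2,'Bob'),(3,'Charlie'),(4,'David'),(5,'Emma');
INSERT INTO Purchases VALUES
(101,1,'Electronics'),(102,1,'Books'),(103,1,'Clothing'),(104,1,'Electronics'),
(105,2,'Clothing'),(106,1,'Beauty'),(107,3,'Electronics'),(108,3,'Books'),
(109,4,'Books'),(110,4,'Clothing'),(111,4,'Beauty'),(112,5,'Electronics'),
(113,5,'Books');
""")
rows = conn.execute("""
SELECT c.customer_id, c.customer_name
FROM Customers c
JOIN Purchases p ON c.customer_id = p.customer_id
GROUP BY c.customer_id, c.customer_name
HAVING COUNT(DISTINCT p.product_category) =
       (SELECT COUNT(DISTINCT product_category) FROM Purchases);
""").fetchall()
# COUNT(DISTINCT ...) absorbs Alice's repeated 'Electronics' purchase
print(rows)  # [(1, 'Alice')]
```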
• Q.902
Question
Write a SQL query to find out each hotel's best performing month based on revenue.
Explanation
To solve this problem, we need to calculate the total revenue for each hotel for each month,
then rank the months based on revenue for each hotel. We can use a window function
(RANK()) to assign rankings to the months, and then filter for the best performing month for
each hotel.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE hotel_bookings (
booking_id SERIAL PRIMARY KEY,
booking_date DATE,
hotel_name VARCHAR(100),
total_guests INT,
total_nights INT,
total_price DECIMAL(10, 2)
);
• - Datasets
INSERT INTO hotel_bookings (booking_date, hotel_name, total_guests, total_nights, total_price) VALUES
('2023-01-05', 'Hotel A', 2, 3, 300.00),
('2023-02-10', 'Hotel B', 3, 5, 600.00),
('2023-03-15', 'Hotel A', 4, 2, 400.00),
('2023-04-20', 'Hotel B', 2, 4, 500.00),
('2023-05-25', 'Hotel A', 3, 3, 450.00),
('2023-06-30', 'Hotel B', 5, 2, 350.00),
('2023-07-05', 'Hotel A', 2, 5, 550.00),
('2023-08-10', 'Hotel B', 3, 3, 450.00),
('2023-09-15', 'Hotel A', 4, 4, 500.00),
('2023-10-20', 'Hotel B', 2, 3, 300.00),
('2023-11-25', 'Hotel A', 3, 2, 350.00),
('2023-12-30', 'Hotel B', 5, 4, 600.00),
('2022-01-05', 'Hotel A', 2, 3, 300.00),
('2022-02-10', 'Hotel B', 3, 5, 600.00),
('2022-03-15', 'Hotel A', 4, 2, 400.00),
('2022-04-20', 'Hotel B', 2, 4, 500.00),
('2022-05-25', 'Hotel A', 3, 3, 450.00),
('2022-06-30', 'Hotel B', 5, 2, 350.00),
('2022-07-05', 'Hotel A', 2, 5, 550.00),
('2022-08-10', 'Hotel B', 3, 3, 450.00),
('2022-09-15', 'Hotel A', 4, 4, 500.00),
('2022-10-20', 'Hotel B', 2, 3, 300.00),
('2022-11-25', 'Hotel A', 3, 2, 350.00),
('2022-12-30', 'Hotel B', 5, 4, 600.00);

Learnings
• Use of window functions like RANK() to assign rankings based on aggregated data.


• Grouping by multiple columns (hotel_name, year, and month) for aggregating revenue.
• Use of DATE_TRUNC() (or equivalent) to extract the month and year from a DATE column.
• Filtering top-ranked rows using RANK() to get the best performing month.
Solutions
• - PostgreSQL solution
WITH monthly_revenue AS (
SELECT
hotel_name,
EXTRACT(YEAR FROM booking_date) AS year,
EXTRACT(MONTH FROM booking_date) AS month,
SUM(total_price) AS revenue
FROM hotel_bookings
    GROUP BY hotel_name, EXTRACT(YEAR FROM booking_date), EXTRACT(MONTH FROM booking_date)
), ranked_months AS (
SELECT
hotel_name,
year,
month,
revenue,
RANK() OVER (PARTITION BY hotel_name ORDER BY revenue DESC) AS rank
FROM monthly_revenue
)
SELECT hotel_name, year, month, revenue
FROM ranked_months
WHERE rank = 1;
• - MySQL solution
-- RANK is a reserved word in MySQL 8.0, so the ranking column is aliased as rnk
WITH monthly_revenue AS (
    SELECT
        hotel_name,
        YEAR(booking_date) AS year,
        MONTH(booking_date) AS month,
        SUM(total_price) AS revenue
    FROM hotel_bookings
    GROUP BY hotel_name, YEAR(booking_date), MONTH(booking_date)
), ranked_months AS (
    SELECT
        hotel_name,
        year,
        month,
        revenue,
        RANK() OVER (PARTITION BY hotel_name ORDER BY revenue DESC) AS rnk
    FROM monthly_revenue
)
SELECT hotel_name, year, month, revenue
FROM ranked_months
WHERE rnk = 1;
• Q.903
Question
Find the details of employees whose salary is greater than the average salary across the entire
company.
Explanation
To solve this problem, we need to calculate the average salary for all employees in the
company and then filter the employees whose salary is above this average. We can use a
subquery to calculate the average salary and then compare each employee's salary to that
value.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE employees (


employee_id SERIAL PRIMARY KEY,
employee_name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10, 2)
);
• - Datasets
INSERT INTO employees (employee_name, department, salary)
VALUES
('John Doe', 'HR', 50000.00),
('Jane Smith', 'HR', 55000.00),
('Michael Johnson', 'HR', 60000.00),
('Emily Davis', 'IT', 60000.00),
('David Brown', 'IT', 65000.00),
('Sarah Wilson', 'Finance', 70000.00),
('Robert Taylor', 'Finance', 75000.00),
('Jennifer Martinez', 'Finance', 80000.00);

Learnings
• Use of subqueries to calculate aggregate values like average salary.
• Filtering results based on comparison with aggregate values.
• Basic understanding of how to use WHERE clauses for comparisons.
Solutions
• - PostgreSQL solution
SELECT employee_id, employee_name, department, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
• - MySQL solution
SELECT employee_id, employee_name, department, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
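Worth noting: the company-wide average here is (50000 + 55000 + 60000 + 60000 + 65000 + 70000 + 75000 + 80000) / 8 = 64375, so exactly four employees clear the bar. A quick sqlite3 sketch confirms it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY,
                        employee_name TEXT, department TEXT, salary REAL);
INSERT INTO employees (employee_name, department, salary) VALUES
('John Doe','HR',50000),('Jane Smith','HR',55000),('Michael Johnson','HR',60000),
('Emily Davis','IT',60000),('David Brown','IT',65000),('Sarah Wilson','Finance',70000),
('Robert Taylor','Finance',75000),('Jennifer Martinez','Finance',80000);
""")
avg_salary = conn.execute("SELECT AVG(salary) FROM employees").fetchone()[0]
above = [r[0] for r in conn.execute("""
SELECT employee_name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY salary;
""")]
# The subquery is evaluated once against the whole table, not per row
print(avg_salary, above)
```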
• Q.904
Question
You have a table called products with below columns: product_id, product_name, price,
quantity_sold. Calculate the percentage contribution of each product to total revenue.
Round the result to 2 decimal places.
Explanation
To solve this, first, calculate the total revenue by multiplying price and quantity_sold for
each product. Then, calculate the percentage contribution of each product by dividing its
revenue by the total revenue of all products and multiplying by 100. The result should be
rounded to 2 decimal places.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
price DECIMAL(10, 2),
quantity_sold INT
);
• - Datasets
INSERT INTO products (product_id, product_name, price, quantity_sold) VALUES
(1, 'iPhone', 899.00, 600),
(2, 'iMac', 1299.00, 150),
(3, 'MacBook Pro', 1499.00, 500),
(4, 'AirPods', 499.00, 800),
(5, 'Accessories', 199.00, 300);

Learnings


• Calculating total revenue for each product.
• Summing up the total revenue of all products.
• Calculating the percentage contribution using basic arithmetic.
• Rounding results using the ROUND() function.
Solutions
• - PostgreSQL solution
SELECT product_id, product_name, price, quantity_sold,
ROUND((price * quantity_sold) / (SELECT SUM(price * quantity_sold) FROM products)
* 100, 2) AS percentage_contribution
FROM products;
• - MySQL solution
SELECT product_id, product_name, price, quantity_sold,
ROUND((price * quantity_sold) / (SELECT SUM(price * quantity_sold) FROM products)
* 100, 2) AS percentage_contribution
FROM products;
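A sanity check in sqlite3: total revenue over the sample data is 1,942,650, and with this particular data the five rounded shares happen to sum back to 100.00 (rounding can leave the sum a hair off 100 in general):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT,
                       price REAL, quantity_sold INTEGER);
INSERT INTO products VALUES
(1,'iPhone',899.00,600),(2,'iMac',1299.00,150),(3,'MacBook Pro',1499.00,500),
(4,'AirPods',499.00,800),(5,'Accessories',199.00,300);
""")
pct = dict(conn.execute("""
SELECT product_name,
       ROUND((price * quantity_sold) /
             (SELECT SUM(price * quantity_sold) FROM products) * 100, 2)
FROM products;
"""))
# price is REAL, so the division is floating-point; with integer columns you
# would need a 100.0 multiplier to avoid integer division in some engines
print(pct)
```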
• Q.905
Question
You have a dataset of a food delivery company with columns order_id, customer_id,
order_date, and customer_pref_delivery_date. If the customer's preferred delivery date is the same
as the order date, then the order is called immediate; otherwise, it is called scheduled. Write a
solution to find the percentage of immediate orders in the first orders of all customers,
rounded to 2 decimal places.
Explanation
To solve this, first, we need to identify the first order for each customer. We can do this by
finding the earliest order_date for each customer_id. After that, we check if the
order_date matches the customer_pref_delivery_date to classify the order as
immediate. Finally, calculate the percentage of immediate orders among all the first orders
for all customers.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE Delivery (
delivery_id SERIAL PRIMARY KEY,
customer_id INT,
order_date DATE,
customer_pref_delivery_date DATE
);
• - Datasets
INSERT INTO Delivery (customer_id, order_date, customer_pref_delivery_date) VALUES
(1, '2019-08-01', '2019-08-02'),
(2, '2019-08-02', '2019-08-02'),
(1, '2019-08-11', '2019-08-12'),
(3, '2019-08-24', '2019-08-24'),
(3, '2019-08-21', '2019-08-22'),
(2, '2019-08-11', '2019-08-13'),
(4, '2019-08-09', '2019-08-09'),
(5, '2019-08-09', '2019-08-10'),
(4, '2019-08-10', '2019-08-12'),
(6, '2019-08-09', '2019-08-11'),
(7, '2019-08-12', '2019-08-13'),
(8, '2019-08-13', '2019-08-13'),
(9, '2019-08-11', '2019-08-12');

Learnings
• Use of ROW_NUMBER() or RANK() to identify the first order for each customer.


• Filtering to identify immediate orders by comparing order_date and customer_pref_delivery_date.
• Calculating the percentage of a specific condition (immediate orders) over a total (first
orders).
Solutions
• - PostgreSQL solution
WITH first_orders AS (
SELECT customer_id, order_date, customer_pref_delivery_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS rn
FROM Delivery
), immediate_orders AS (
SELECT customer_id, order_date, customer_pref_delivery_date
FROM first_orders
WHERE rn = 1 AND order_date = customer_pref_delivery_date
)
SELECT
    ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM first_orders WHERE rn = 1), 2) AS immediate_order_percentage
FROM immediate_orders;
• - MySQL solution
WITH first_orders AS (
SELECT customer_id, order_date, customer_pref_delivery_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS rn
FROM Delivery
), immediate_orders AS (
SELECT customer_id, order_date, customer_pref_delivery_date
FROM first_orders
WHERE rn = 1 AND order_date = customer_pref_delivery_date
)
SELECT
    ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM first_orders WHERE rn = 1), 2) AS immediate_order_percentage
FROM immediate_orders;
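SQLite supports the same CTE-plus-ROW_NUMBER() structure (SQLite 3.25+, bundled with recent Python), so the whole pipeline can be exercised locally. With the sample data there are 9 first orders, of which customers 2, 4, and 8 ordered for same-day delivery, giving 3/9 = 33.33:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Delivery (delivery_id INTEGER PRIMARY KEY, customer_id INTEGER,
                       order_date TEXT, customer_pref_delivery_date TEXT);
INSERT INTO Delivery (customer_id, order_date, customer_pref_delivery_date) VALUES
(1,'2019-08-01','2019-08-02'),(2,'2019-08-02','2019-08-02'),
(1,'2019-08-11','2019-08-12'),(3,'2019-08-24','2019-08-24'),
(3,'2019-08-21','2019-08-22'),(2,'2019-08-11','2019-08-13'),
(4,'2019-08-09','2019-08-09'),(5,'2019-08-09','2019-08-10'),
(4,'2019-08-10','2019-08-12'),(6,'2019-08-09','2019-08-11'),
(7,'2019-08-12','2019-08-13'),(8,'2019-08-13','2019-08-13'),
(9,'2019-08-11','2019-08-12');
""")
pct = conn.execute("""
WITH first_orders AS (
    SELECT customer_id, order_date, customer_pref_delivery_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS rn
    FROM Delivery
), immediate_orders AS (
    SELECT customer_id
    FROM first_orders
    WHERE rn = 1 AND order_date = customer_pref_delivery_date
)
SELECT ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM first_orders WHERE rn = 1), 2)
FROM immediate_orders;
""").fetchone()[0]
print(pct)  # 33.33
```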
• Q.906
Question
Write a query that'll identify returning active users. A returning active user is a user that has
made a second purchase within 7 days of their first purchase. Output a list of user_ids of
these returning active users.
Explanation
To solve this, we need to:
• Identify the user's first purchase by selecting the earliest purchase_date for each
user_id.
• Then, for the same user, identify a second purchase made within 7 days of the first
purchase.
• Use a self-join on the amazon_transactions table to compare the first and second
purchases for each user.
• Return only distinct user_ids of users who made the second purchase within 7 days of the
first.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE amazon_transactions (
id SERIAL PRIMARY KEY,
user_id INT,
item VARCHAR(255),
purchase_date DATE,


revenue NUMERIC
);
• - Datasets
INSERT INTO amazon_transactions (user_id, item, purchase_date, revenue) VALUES
(109, 'milk', '2020-03-03', 123),
(139, 'biscuit', '2020-03-18', 421),
(120, 'milk', '2020-03-18', 176),
(108, 'banana', '2020-03-18', 862),
(130, 'milk', '2020-03-28', 333),
(103, 'bread', '2020-03-29', 862),
(122, 'banana', '2020-03-07', 952),
(125, 'bread', '2020-03-13', 317),
(139, 'bread', '2020-03-30', 929),
(141, 'banana', '2020-03-17', 812),
(116, 'bread', '2020-03-31', 226),
(128, 'bread', '2020-03-04', 112),
(146, 'biscuit', '2020-03-04', 362),
(119, 'banana', '2020-03-28', 127),
(142, 'bread', '2020-03-09', 503),
(122, 'bread', '2020-03-06', 593),
(128, 'biscuit', '2020-03-24', 160),
(112, 'banana', '2020-03-24', 262),
(149, 'banana', '2020-03-29', 382),
(100, 'banana', '2020-03-18', 599),
(130, 'milk', '2020-03-16', 604),
(103, 'milk', '2020-03-31', 290),
(112, 'banana', '2020-03-23', 523),
(102, 'bread', '2020-03-25', 325),
(120, 'biscuit', '2020-03-21', 858),
(109, 'bread', '2020-03-22', 432),
(101, 'milk', '2020-03-01', 449),
(138, 'milk', '2020-03-19', 961),
(100, 'milk', '2020-03-29', 410),
(129, 'milk', '2020-03-02', 771),
(123, 'milk', '2020-03-31', 434),
(104, 'biscuit', '2020-03-31', 957),
(110, 'bread', '2020-03-13', 210),
(143, 'bread', '2020-03-27', 870),
(130, 'milk', '2020-03-12', 176),
(128, 'milk', '2020-03-28', 498),
(133, 'banana', '2020-03-21', 837),
(150, 'banana', '2020-03-20', 927),
(120, 'milk', '2020-03-27', 793),
(109, 'bread', '2020-03-02', 362),
(110, 'bread', '2020-03-13', 262),
(140, 'milk', '2020-03-09', 468),
(112, 'banana', '2020-03-04', 381),
(117, 'biscuit', '2020-03-19', 831),
(137, 'banana', '2020-03-23', 490),
(130, 'bread', '2020-03-09', 149),
(133, 'bread', '2020-03-08', 658),
(143, 'milk', '2020-03-11', 317),
(111, 'biscuit', '2020-03-23', 204),
(150, 'banana', '2020-03-04', 299),
(131, 'bread', '2020-03-10', 155),
(140, 'biscuit', '2020-03-17', 810),
(147, 'banana', '2020-03-22', 702),
(119, 'biscuit', '2020-03-15', 355),
(116, 'milk', '2020-03-12', 468),
(141, 'milk', '2020-03-14', 254),
(143, 'bread', '2020-03-16', 647),
(105, 'bread', '2020-03-21', 562),
(149, 'biscuit', '2020-03-11', 827),
(117, 'banana', '2020-03-22', 249),
(150, 'banana', '2020-03-21', 450),
(134, 'bread', '2020-03-08', 981),
(133, 'banana', '2020-03-26', 353),
(127, 'milk', '2020-03-27', 300),
(101, 'milk', '2020-03-26', 740),
(137, 'biscuit', '2020-03-12', 473),
(113, 'biscuit', '2020-03-21', 278),


(141, 'bread', '2020-03-21', 118),
(112, 'biscuit', '2020-03-14', 334),
(118, 'milk', '2020-03-30', 603),
(111, 'milk', '2020-03-19', 205),
(146, 'biscuit', '2020-03-13', 599),
(148, 'banana', '2020-03-14', 530),
(100, 'banana', '2020-03-13', 175),
(105, 'banana', '2020-03-05', 815),
(129, 'milk', '2020-03-02', 489),
(121, 'milk', '2020-03-16', 476),
(117, 'bread', '2020-03-11', 270),
(133, 'milk', '2020-03-12', 446),
(124, 'bread', '2020-03-31', 937),
(145, 'bread', '2020-03-07', 821),
(105, 'banana', '2020-03-09', 972),
(131, 'milk', '2020-03-09', 808),
(114, 'biscuit', '2020-03-31', 202),
(120, 'milk', '2020-03-06', 898),
(130, 'milk', '2020-03-06', 581),
(141, 'biscuit', '2020-03-11', 749),
(147, 'bread', '2020-03-14', 262),
(118, 'milk', '2020-03-15', 735),
(136, 'biscuit', '2020-03-22', 410),
(132, 'bread', '2020-03-06', 161),
(137, 'biscuit', '2020-03-31', 427),
(107, 'bread', '2020-03-01', 701),
(111, 'biscuit', '2020-03-18', 218),
(100, 'bread', '2020-03-07', 410),
(106, 'milk', '2020-03-21', 379),
(114, 'banana', '2020-03-25', 705),
(110, 'bread', '2020-03-27', 225),
(130, 'milk', '2020-03-16', 494),
(117, 'bread', '2020-03-10', 209);

Learnings
• Using self-joins to compare data from the same table.
• Identifying first and second purchases with date filters.
• Using DISTINCT to return unique user_ids.
• Working with date arithmetic (e.g., purchase_date - first_purchase_date <= 7).
Solutions
• - PostgreSQL solution
SELECT DISTINCT a2.user_id AS active_users
FROM (
    SELECT user_id, MIN(purchase_date) AS first_purchase
    FROM amazon_transactions
    GROUP BY user_id
) a1 -- First purchase per user
JOIN amazon_transactions a2 -- Candidate second purchase
ON a1.user_id = a2.user_id
AND a2.purchase_date > a1.first_purchase -- Strictly after the first purchase
AND a2.purchase_date - a1.first_purchase <= 7 -- Within 7 days of the first purchase
ORDER BY a2.user_id;
• - MySQL solution
SELECT DISTINCT a2.user_id AS active_users
FROM (
    SELECT user_id, MIN(purchase_date) AS first_purchase
    FROM amazon_transactions
    GROUP BY user_id
) a1 -- First purchase per user
JOIN amazon_transactions a2 -- Candidate second purchase
ON a1.user_id = a2.user_id
AND a2.purchase_date > a1.first_purchase -- Strictly after the first purchase
AND DATEDIFF(a2.purchase_date, a1.first_purchase) <= 7 -- Within 7 days of the first purchase
ORDER BY a2.user_id;
• Q.907
Question
For each week, find the total number of orders. Include only the orders that are from the first
quarter of 2023.
Explanation


To solve this:
• Filter the orders table to include only the data from the first quarter of 2023 (January to
March).
• Group the results by week. We can derive the week of the year using the DATE_TRUNC()
function in PostgreSQL or WEEK() in MySQL.
• Calculate the total number of orders for each week by summing the quantity for each
week.
• Return the week and the total number of orders.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE orders (
order_id INT PRIMARY KEY,
order_date DATE,
quantity INT
);
• - Datasets
INSERT INTO orders (order_id, order_date, quantity) VALUES
(1, '2023-01-02', 5),
(2, '2023-02-05', 3),
(3, '2023-02-07', 2),
(4, '2023-03-10', 6),
(5, '2023-02-15', 4),
(6, '2023-04-21', 8),
(7, '2023-05-28', 7),
(8, '2023-05-05', 3),
(9, '2023-08-10', 5),
(10, '2023-05-02', 6),
(11, '2023-02-07', 4),
(12, '2023-04-15', 9),
(13, '2023-03-22', 7),
(14, '2023-04-30', 8),
(15, '2023-04-05', 6),
(16, '2023-02-02', 6),
(17, '2023-01-07', 4),
(18, '2023-05-15', 9),
(19, '2023-05-22', 7),
(20, '2023-06-30', 8),
(21, '2023-07-05', 6);

Learnings
• Filtering data by a specific date range (first quarter of 2023).
• Grouping results by week using date functions.
• Aggregating data with SUM() for total orders per week.
Solutions
• - PostgreSQL solution
SELECT
DATE_TRUNC('week', order_date) AS week_start_date,
SUM(quantity) AS total_orders
FROM orders
WHERE order_date >= '2023-01-01' AND order_date <= '2023-03-31' -- Filter for the first quarter
GROUP BY week_start_date
ORDER BY week_start_date;
• - MySQL solution
SELECT
    YEARWEEK(order_date, 1) AS order_week, -- YYYYWW number; mode 1 treats Monday as the start of the week
    SUM(quantity) AS total_orders
FROM orders
WHERE order_date >= '2023-01-01' AND order_date <= '2023-03-31' -- Filter for the first quarter
GROUP BY order_week
ORDER BY order_week;
• Q.908
Question
Write a query to find the starting and ending transaction amounts for each customer. Return
customer_id, their first transaction amount, last transaction amount, and the respective
transaction dates.
Explanation
To solve this:
• For each customer, identify the first and last transaction.
• To get the first transaction, order the records by transaction_date in ascending order.
• To get the last transaction, order the records by transaction_date in descending order.
• Use ROW_NUMBER() (or RANK()) window function to rank the transactions for each
customer.
• Join the two results (first and last transactions) on customer_id.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE bank_transactions (
transaction_id SERIAL PRIMARY KEY,
bank_id INT,
customer_id INT,
transaction_amount DECIMAL(10, 2),
transaction_type VARCHAR(10),
transaction_date DATE
);
• - Datasets
INSERT INTO bank_transactions (bank_id, customer_id, transaction_amount, transaction_type, transaction_date) VALUES
(1, 101, 500.00, 'credit', '2024-01-01'),
(1, 101, 200.00, 'debit', '2024-01-02'),
(1, 101, 300.00, 'credit', '2024-01-05'),
(1, 101, 150.00, 'debit', '2024-01-08'),
(1, 102, 1000.00, 'credit', '2024-01-01'),
(1, 102, 400.00, 'debit', '2024-01-03'),
(1, 102, 600.00, 'credit', '2024-01-05'),
(1, 102, 200.00, 'debit', '2024-01-09');

Learnings
• Using window functions (ROW_NUMBER(), RANK()) to get first and last transactions.
• Filtering the dataset by first and last rows for each customer.
• Sorting data by transaction date to identify the first and last transactions.
Solutions
• - PostgreSQL solution
WITH ranked_transactions AS (
    SELECT
        customer_id,
        transaction_amount,
        transaction_date,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date ASC) AS first_rank,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date DESC) AS last_rank
    FROM bank_transactions
)
SELECT
    customer_id,
    MAX(CASE WHEN first_rank = 1 THEN transaction_amount END) AS first_transaction_amt,
    MAX(CASE WHEN last_rank = 1 THEN transaction_amount END) AS last_transaction_amt,
    MIN(transaction_date) AS first_transaction_date,
    MAX(transaction_date) AS last_transaction_date
FROM ranked_transactions
WHERE first_rank = 1 OR last_rank = 1
GROUP BY customer_id;
• - MySQL solution
WITH ranked_transactions AS (
SELECT
customer_id,
transaction_amount,
transaction_date,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date ASC) AS first_rank,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date DESC) AS last_rank
FROM bank_transactions
)
SELECT
customer_id,
MAX(CASE WHEN first_rank = 1 THEN transaction_amount END) AS first_transaction_amt,
MAX(CASE WHEN last_rank = 1 THEN transaction_amount END) AS last_transaction_amt,
MIN(CASE WHEN first_rank = 1 THEN transaction_date END) AS first_transaction_date,
MAX(CASE WHEN last_rank = 1 THEN transaction_date END) AS last_transaction_date
FROM ranked_transactions
GROUP BY customer_id;
• Q.909
Question
Write a query to fetch students with the minimum and maximum marks from the "Students"
table.

Explanation
You need to identify the students who have the lowest and highest marks. This can be done
by using aggregate functions such as MIN() and MAX(), and then joining these results with the
original table to retrieve the corresponding student details.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE Students (
student_id INT PRIMARY KEY,
student_name VARCHAR(50),
marks INT,
class VARCHAR(10)
);
• - Datasets
INSERT INTO Students (student_id, student_name, marks, class) VALUES
(1, 'John Doe', 85, 'A'),
(2, 'Jane Smith', 92, 'B'),
(3, 'Michael Johnson', 78, 'A'),
(4, 'Emily Brown', 59, 'C'),
(5, 'David Lee', 88, 'B'),
(6, 'Sarah Wilson', 59, 'A'),
(7, 'Daniel Taylor', 90, 'C'),
(8, 'Emma Martinez', 79, 'B'),
(9, 'Christopher Anderson', 87, 'A'),
(10, 'Olivia Garcia', 91, 'C'),
(11, 'James Rodriguez', 83, 'B'),

(12, 'Sophia Hernandez', 94, 'A'),
(13, 'Matthew Martinez', 76, 'C'),
(14, 'Isabella Lopez', 89, 'B'),
(15, 'Ethan Gonzalez', 80, 'A'),
(16, 'Amelia Perez', 93, 'C'),
(17, 'Alexander Torres', 77, 'B'),
(18, 'Mia Flores', 86, 'A'),
(19, 'William Sanchez', 84, 'C'),
(20, 'Ava Ramirez', 97, 'B'),
(21, 'Daniel Taylor', 75, 'A'),
(22, 'Chloe Cruz', 98, 'C'),
(23, 'Benjamin Ortiz', 89, 'B'),
(24, 'Harper Reyes', 99, 'A'),
(25, 'Ryan Stewart', 99, 'C');

Learnings
• Use of MIN() and MAX() aggregate functions to find the minimum and maximum values.
• Applying JOIN to retrieve full student details based on aggregate results.
• Filtering rows using WHERE to match the minimum and maximum marks.

Solutions
• - PostgreSQL solution
SELECT student_id, student_name, marks, class
FROM Students
WHERE marks = (SELECT MIN(marks) FROM Students)
OR marks = (SELECT MAX(marks) FROM Students);
• - MySQL solution
SELECT student_id, student_name, marks, class
FROM Students
WHERE marks = (SELECT MIN(marks) FROM Students)
OR marks = (SELECT MAX(marks) FROM Students);
• Q.910
Question
Write a query to find products that are sold by both Supplier A and Supplier B, excluding
products sold by only one supplier.
Explanation
To solve this problem, we need to identify products that are sold by both Supplier A and
Supplier B. We can achieve this by grouping the data by product_id and filtering the groups
to include only those where both suppliers are present. We will use the HAVING clause to
ensure that the product has records for both suppliers.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE products (
product_id INT,
product_name VARCHAR(100),
supplier_name VARCHAR(50)
);
• - Datasets
INSERT INTO products (product_id, product_name, supplier_name) VALUES
(1, 'Product 1', 'Supplier A'),
(1, 'Product 1', 'Supplier B'),
(3, 'Product 3', 'Supplier A'),
(3, 'Product 3', 'Supplier A'),
(5, 'Product 5', 'Supplier A'),
(5, 'Product 5', 'Supplier B'),
(7, 'Product 7', 'Supplier C'),
(8, 'Product 8', 'Supplier A'),
(7, 'Product 7', 'Supplier B'),

(7, 'Product 7', 'Supplier A'),
(9, 'Product 9', 'Supplier B'),
(9, 'Product 9', 'Supplier C'),
(10, 'Product 10', 'Supplier C'),
(11, 'Product 11', 'Supplier C'),
(10, 'Product 10', 'Supplier A');

Learnings
• Using GROUP BY to aggregate data based on common fields.
• Filtering groups with HAVING based on conditions applied to aggregated results.
• Understanding how to check for multiple conditions across rows in the same group (i.e.,
different suppliers for the same product).
Solutions
• - PostgreSQL solution
SELECT product_id, product_name
FROM products
WHERE supplier_name IN ('Supplier A', 'Supplier B')
GROUP BY product_id, product_name
HAVING COUNT(DISTINCT supplier_name) = 2;
• - MySQL solution
SELECT product_id, product_name
FROM products
WHERE supplier_name IN ('Supplier A', 'Supplier B')
GROUP BY product_id, product_name
HAVING COUNT(DISTINCT supplier_name) = 2;

100+ Theoretical Questions & Answers


• Q.911
Question
What is the role of the WHERE clause in SQL queries, and how is it used?

Answer
The WHERE clause is used to filter records in SQL queries. It specifies the conditions that
must be met for a row to be included in the result set. It can be used with comparison
operators (like =, >, <), logical operators (like AND, OR), pattern matching (LIKE), and to
handle NULL values (IS NULL).
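As a quick illustration, the following query (against a hypothetical Employees table) combines several of these operators:

```sql
-- Hypothetical Employees table: filter rows using comparison,
-- logical, pattern-matching, and NULL-handling operators
SELECT EmployeeID, Name, Salary
FROM Employees
WHERE Salary > 50000          -- comparison operator
  AND Department = 'Sales'    -- AND combines conditions
  AND Name LIKE 'A%'          -- pattern matching
  AND ManagerID IS NOT NULL;  -- NULL handling
```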
• Q.912

Question
What is the purpose of the GROUP BY clause in SQL? Provide an example.

Answer

The GROUP BY clause in SQL is used to group rows that have the same values in specified
columns into summary rows. It is often used with aggregate functions like COUNT(), SUM(),
AVG(), MAX(), and MIN() to perform calculations on each group of rows.
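For example, assuming a hypothetical Employees table with Department and Salary columns, the following groups rows by department and aggregates within each group:

```sql
-- Count employees and average salary per department
SELECT Department,
       COUNT(*)    AS employee_count,
       AVG(Salary) AS avg_salary
FROM Employees
GROUP BY Department;
```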

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• Deloitte

• Q.913
Question
Explain the ORDER OF EXECUTION in SQL.

Answer
The ORDER OF EXECUTION in SQL determines the sequence in which SQL clauses are
processed. Here's the typical order:
• FROM – Retrieves the data from the tables.
• JOIN – Combines rows from different tables based on a related column.
• WHERE – Filters rows based on the given condition.
• GROUP BY – Groups rows based on column values.
• HAVING – Filters groups after the GROUP BY operation.
• SELECT – Chooses the columns to display.
• DISTINCT – Removes duplicate rows (if any).
• ORDER BY – Sorts the result based on specified columns.
• LIMIT – Limits the number of rows returned.
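The sequence above can be seen in a single annotated query (table and column names are illustrative):

```sql
SELECT d.DepartmentName,                    -- 6. SELECT (7. DISTINCT would apply here)
       COUNT(*) AS headcount
FROM Employees e                            -- 1. FROM
JOIN Departments d                          -- 2. JOIN
  ON e.DepartmentID = d.DepartmentID
WHERE e.Salary > 30000                      -- 3. WHERE (filters rows)
GROUP BY d.DepartmentName                   -- 4. GROUP BY
HAVING COUNT(*) > 5                         -- 5. HAVING (filters groups)
ORDER BY headcount DESC                     -- 8. ORDER BY
LIMIT 10;                                   -- 9. LIMIT
```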

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• IBM
• Accenture
• Capgemini
• Q.914
Question
Difference between UNIQUE and PRIMARY KEY constraints.

Answer

• The PRIMARY KEY constraint ensures that the column has unique values and cannot contain NULL. It is used to uniquely identify each record in a table.
• The UNIQUE constraint also ensures that the column has unique values, but it allows NULL values (how many NULLs are permitted depends on the database system).
Key differences:
• A table can have only one PRIMARY key, but it can have multiple UNIQUE
constraints.
• PRIMARY key columns cannot contain NULL values, but UNIQUE columns can.
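A minimal sketch showing both constraints on one (hypothetical) table:

```sql
CREATE TABLE Users (
    user_id INT PRIMARY KEY,      -- unique + NOT NULL; only one per table
    email   VARCHAR(100) UNIQUE,  -- unique, but NULL is allowed
    phone   VARCHAR(20) UNIQUE    -- multiple UNIQUE constraints are fine
);
```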

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• Deloitte
• Q.915
Question
What is the difference between NULL and "null"?

Answer
• NULL represents the absence of a value or unknown data in a database. It is a special
marker used to indicate that no data exists for that field.
• "null" (with quotes) is just a string, a value like any other text. It is not the same as NULL
and represents the literal word "null".
Key difference:
• NULL is used to indicate missing or undefined data, while "null" is just a string
containing the characters "n", "u", "l", and "l".
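The difference is easy to see in a query (Comments is a hypothetical table):

```sql
-- Rows where the column truly has no value
SELECT * FROM Comments WHERE body IS NULL;

-- Rows where the column contains the four-character string 'null'
SELECT * FROM Comments WHERE body = 'null';

-- Note: body = NULL is never true; comparisons with NULL yield UNKNOWN
```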

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.916
Question
Discuss the differences between the CHAR and VARCHAR data types in SQL.

Answer

• CHAR (Character) is a fixed-length data type. It stores data with a predefined length,
padding with spaces if the value is shorter than the specified length.
• VARCHAR (Variable Character) is a variable-length data type. It only uses the amount of
space required to store the actual value, without padding.
Key differences:
• CHAR is faster for fixed-length data, but it may waste storage if the data is shorter than
the defined length.
• VARCHAR is more efficient for variable-length data, as it only uses as much space as
needed for the value.
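A short sketch of when each type fits (column choices are illustrative):

```sql
CREATE TABLE locations (
    country_code CHAR(2),      -- always exactly 2 characters, e.g. 'US', 'IN'
    city_name    VARCHAR(50)   -- length varies; uses only the space needed
);
-- Storing 'A' in CHAR(2) is padded to 'A '; VARCHAR(50) stores 'A' as-is
```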

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• Capgemini
• Q.917
Question
Difference Between VARIABLE and PARAMETER in Stored Procedure.

Answer
• VARIABLE in a stored procedure is used to store temporary data or intermediate results.
It can only be accessed and modified within the scope of the stored procedure.
• PARAMETER is a value passed into a stored procedure when it is called. It can be used
to accept input from the caller or return a value to the caller.
Key differences:
• VARIABLE is local to the stored procedure and is used for internal calculations or data
manipulation.
• PARAMETER is used to pass data into or out of the stored procedure.
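A minimal MySQL-style sketch showing both (procedure and column names are illustrative):

```sql
DELIMITER //
CREATE PROCEDURE GetBonus(IN emp_id INT, OUT bonus DECIMAL(10,2))  -- parameters
BEGIN
    DECLARE base_salary DECIMAL(10,2);             -- local variable
    SELECT Salary INTO base_salary
    FROM Employees
    WHERE EmployeeID = emp_id;
    SET bonus = base_salary * 0.10;                -- result computed via the variable
END //
DELIMITER ;
```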

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• Cognizant
• Q.918
Question
Explain the purpose of the ORDER BY clause in SQL queries and provide examples.

Answer
The ORDER BY clause in SQL is used to sort the result set of a query in ascending (ASC) or
descending (DESC) order based on one or more columns.
• ASC (ascending) is the default and sorts data from lowest to highest (e.g., A to Z, 1 to 10).
• DESC (descending) sorts data from highest to lowest (e.g., Z to A, 10 to 1).
Example 1: Sorting by one column (ascending by default)
SELECT * FROM Employees
ORDER BY LastName;

Example 2: Sorting by one column (descending order)


SELECT * FROM Employees
ORDER BY Salary DESC;

Example 3: Sorting by multiple columns


SELECT * FROM Employees
ORDER BY Department, LastName DESC;

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• Deloitte
• Q.919
Question
How can you handle NULL values in SQL?

Answer
In SQL, NULL represents missing or undefined data. To handle NULL values, you can use the
following methods:
• IS NULL / IS NOT NULL
Check if a column contains NULL or not.
SELECT * FROM Employees WHERE Department IS NULL;
• COALESCE()
Replace NULL with a default value.
SELECT COALESCE(Salary, 0) FROM Employees;

This replaces any NULL value in the Salary column with 0.


• IFNULL() (MySQL) / NVL() (Oracle)
Another way to replace NULL values with a specified value.
SELECT IFNULL(Salary, 0) FROM Employees; -- MySQL
• NULLIF()
Returns NULL if two expressions are equal, otherwise returns the first expression.

SELECT NULLIF(Salary, 0) FROM Employees;


• CASE Statement
Handle NULL in a more complex way.
SELECT CASE WHEN Salary IS NULL THEN 'No Salary' ELSE Salary END FROM Employees;

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.920
Question
Describe the concept of normalization forms (1NF, 2NF, 3NF) and why they are important in
database design.

Answer
Normalization is the process of organizing data in a database to reduce redundancy and
improve data integrity. There are several "normal forms" (1NF, 2NF, and 3NF) that guide the
normalization process.
• First Normal Form (1NF):
• A table is in 1NF if all its columns contain atomic values (i.e., each column contains only
one value per row).
• No repeating groups or arrays.
• Example: Instead of storing multiple phone numbers in one column, each phone number
should be stored in a separate row.
-- Example of 1NF
| CustomerID | Name | Phone |
|------------|----------|------------|
| 1 | John | 1234567890 |
| 1 | John | 9876543210 |
• Second Normal Form (2NF):
• A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent
on the primary key (i.e., no partial dependency).
• This removes partial dependencies where non-key columns depend only on part of a
composite key.
• Example: Split a table with a composite key into multiple tables.
-- Example of 2NF (before and after)
-- Before 2NF (composite key: (OrderID, ProductID))
| OrderID | ProductID | ProductName | Quantity |
|---------|-----------|-------------|----------|
| 1 | 101 | Apple | 10 |
| 1 | 102 | Banana | 5 |

-- After 2NF
-- Orders table
| OrderID | CustomerID |
|---------|------------|
| 1 | 1001 |

-- OrderDetails table
| OrderID | ProductID | Quantity |
|---------|-----------|----------|
| 1 | 101 | 10 |
| 1 | 102 | 5 |

-- Products table
| ProductID | ProductName |
|-----------|-------------|
| 101 | Apple |
| 102 | Banana |
• Third Normal Form (3NF):
• A table is in 3NF if it is in 2NF and there are no transitive dependencies (i.e., non-key
attributes should not depend on other non-key attributes).
• Every non-key attribute should depend only on the primary key.
• Example: Separate a column that depends on another non-key column into a new table.
-- Example of 3NF (before and after)
-- Before 3NF
| StudentID | StudentName | Department | DepartmentHead |
|-----------|-------------|------------|----------------|
| 1 | Alice | Physics | Dr. Smith |

-- After 3NF
-- Students table
| StudentID | StudentName | DepartmentID |
|-----------|-------------|--------------|
| 1 | Alice | 101 |

-- Departments table
| DepartmentID | Department | DepartmentHead |
|--------------|-------------|----------------|
| 101 | Physics | Dr. Smith |

Why Normalization is Important:


• Data Integrity: Reduces redundancy and ensures consistent and accurate data.
• Storage Efficiency: Minimizes wasted space by eliminating duplicate data.
• Easier Updates: Makes updates, inserts, and deletes more efficient by centralizing data
and reducing anomalies.
• Scalability: More manageable and adaptable to changes in the data model.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• Deloitte
• Q.921
Question
Describe the benefits of using database triggers and provide examples of their usage.

Answer

A database trigger is a set of SQL statements that are automatically executed (or
"triggered") when certain events occur on a specific table or view. Triggers can be used to
enforce business rules, automate tasks, and maintain data integrity.

Benefits of Using Database Triggers:


• Automated Data Validation: Triggers can enforce rules such as checking data integrity
before data is inserted, updated, or deleted.
• Enforcing Business Rules: Triggers can enforce complex business logic that cannot be
easily handled by application code alone. For example, ensuring a value is within a specific
range or calculating totals on updates.
• Auditing: Triggers are useful for logging changes to data, such as creating an audit trail for
updates, deletes, and inserts.
• Data Consistency: Triggers can automatically update related tables when data in a parent
table changes, ensuring data consistency across the database.
• Performance Optimization: Some operations can be optimized using triggers by
executing complex tasks automatically instead of in the application code.

Types of Triggers:
• BEFORE Trigger: Executes before the operation (INSERT, UPDATE, DELETE) takes
place.
• AFTER Trigger: Executes after the operation (INSERT, UPDATE, DELETE) has
occurred.
• INSTEAD OF Trigger: Executes instead of the operation, used mainly for views.

Example 1: Audit Trail for DELETE Operation


Create a trigger to log deleted records into an audit table.
CREATE TRIGGER log_deletions
AFTER DELETE ON Employees
FOR EACH ROW
BEGIN
INSERT INTO Employee_Audit (EmployeeID, Action, Timestamp)
VALUES (OLD.EmployeeID, 'DELETE', NOW());
END;

This trigger logs every deletion from the Employees table into an Employee_Audit table
with a timestamp.

Example 2: Preventing Invalid Data Insertion (Data Validation)


Create a trigger that prevents inserting employees with a negative salary.
CREATE TRIGGER check_salary
BEFORE INSERT ON Employees
FOR EACH ROW
BEGIN
IF NEW.Salary < 0 THEN
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Salary cannot be negative';
END IF;
END;

This trigger prevents insertion of records into the Employees table if the Salary is negative.

Example 3: Updating Related Tables

When a new order is placed, a trigger can automatically update the Inventory table to reduce
stock.
CREATE TRIGGER update_inventory
AFTER INSERT ON Orders
FOR EACH ROW
BEGIN
UPDATE Products
SET Stock = Stock - NEW.Quantity
WHERE ProductID = NEW.ProductID;
END;

This trigger ensures that stock levels in the Products table are updated when a new order is
placed in the Orders table.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.922
Question
Difference between VIEW and MATERIALIZED VIEW.

Answer
• VIEW:
• A VIEW is a virtual table that represents the result of a query. It does not store data
physically; instead, it generates the data dynamically when queried.
• Every time you access a VIEW, the underlying query is executed to fetch the latest data.
• It is useful for simplifying complex queries, hiding sensitive data, or presenting data in a
particular format.
• MATERIALIZED VIEW:
• A MATERIALIZED VIEW is similar to a VIEW, but it stores the result of the query physically,
like a snapshot of the data at the time the view was created or last refreshed.
• It improves performance by allowing the data to be precomputed and stored, but it can
become outdated if the underlying data changes.
• You can refresh a MATERIALIZED VIEW manually or at set intervals to ensure it reflects the
most current data.

Key Differences:
• Storage:
• VIEW: Does not store data; it's virtual.
• MATERIALIZED VIEW: Stores the query result physically.
• Performance:
• VIEW: Query performance can be slower because data is fetched dynamically each time.

• MATERIALIZED VIEW: Generally faster for large datasets, as data is precomputed and
stored.
• Data Freshness:
• VIEW: Always shows the most up-to-date data.
• MATERIALIZED VIEW: May show stale data until it is refreshed.
• Usage:
• VIEW: Useful for complex queries that need to be executed repeatedly or for hiding
complexity.
• MATERIALIZED VIEW: Useful for improving performance where the query involves large or
complex aggregations and doesn't require real-time data.
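A PostgreSQL-style sketch of both (the orders table and its columns are illustrative):

```sql
-- Virtual: the query re-runs on every access
CREATE VIEW recent_orders AS
SELECT * FROM orders
WHERE order_date > CURRENT_DATE - 30;

-- Physical snapshot: stored, must be refreshed to pick up new data
CREATE MATERIALIZED VIEW sales_summary AS
SELECT product_id, SUM(quantity) AS total_qty
FROM orders
GROUP BY product_id;

REFRESH MATERIALIZED VIEW sales_summary;
```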

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.923
Question
Difference Between "AND" and "OR" operator.

Answer
• AND:
• The AND operator is used to combine multiple conditions, and all conditions must be true
for the overall expression to be true.
• If one condition is false, the entire expression evaluates to false.
Example:
SELECT * FROM Employees
WHERE Age > 30 AND Department = 'Sales';

This returns employees who are both older than 30 and belong to the 'Sales' department.
• OR:
• The OR operator is used to combine multiple conditions, and if any of the conditions is
true, the overall expression will be true.
• If one condition is true, the entire expression evaluates to true, even if the other condition
is false.
Example:
SELECT * FROM Employees
WHERE Age > 30 OR Department = 'Sales';

This returns employees who are either older than 30 or belong to the 'Sales' department.

Key Differences:
• Condition Requirement:

• AND: All conditions must be true for the result to be true.


• OR: Only one condition needs to be true for the result to be true.
• Result:
• AND: More restrictive, filters data more tightly.
• OR: Less restrictive, includes more data.
• Logical Behavior:
• AND: Returns true if all conditions are true.
• OR: Returns true if any condition is true.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• Capgemini
• Q.924
Question
Describe the difference between UNION and UNION ALL in SQL.

Answer
• UNION:
• The UNION operator combines the results of two or more queries and removes duplicate
rows from the final result.
• It ensures that each row in the result set is unique.
• Performance: Slightly slower because it needs to check and remove duplicates.
Example:
SELECT Name FROM Employees WHERE Department = 'HR'
UNION
SELECT Name FROM Employees WHERE Department = 'Finance';

This query returns a list of unique employee names from both the HR and Finance
departments, removing any duplicates.
• UNION ALL:
• The UNION ALL operator combines the results of two or more queries and includes all
rows, even duplicates.
• It does not perform any duplicate removal, so it's faster than UNION because it simply
concatenates the result sets.
Example:
SELECT Name FROM Employees WHERE Department = 'HR'
UNION ALL
SELECT Name FROM Employees WHERE Department = 'Finance';

This query returns a list of employee names from both HR and Finance departments,
including duplicates.

Key Differences:
• Duplicate Rows:
• UNION: Removes duplicate rows.
• UNION ALL: Includes all rows, even duplicates.
• Performance:
• UNION: Slightly slower due to the process of eliminating duplicates.
• UNION ALL: Faster because it does not remove duplicates.
• Use Cases:
• UNION: Use when you want a distinct list of results.
• UNION ALL: Use when duplicates are acceptable or when performance is a priority.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• Deloitte
• Q.925
Question
Difference between FULL JOIN vs LEFT JOIN.

Answer
• LEFT JOIN (or LEFT OUTER JOIN):
• A LEFT JOIN returns all rows from the left table and the matching rows from the right
table.
• If there is no match in the right table, the result will contain NULL values for columns from
the right table.
Example:
SELECT Employees.Name, Departments.DepartmentName
FROM Employees
LEFT JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;

This query returns all employees, even those who do not belong to any department (with
NULL for the department name in such cases).

• FULL JOIN (or FULL OUTER JOIN):


• A FULL JOIN returns all rows from both the left and right tables.
• If there is no match between the tables, NULL values are returned for the columns from the
table with no match.
Example:
SELECT Employees.Name, Departments.DepartmentName
FROM Employees
FULL JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;

This query returns all employees and all departments, with NULL values for either employee
or department where there is no match.

Key Differences:
• Returned Rows:
• LEFT JOIN: Returns all rows from the left table, and matched rows from the right table. If
no match, returns NULL from the right table.
• FULL JOIN: Returns all rows from both the left and right tables. If no match, returns NULL
for missing data from the respective table.
• Use Cases:
• LEFT JOIN: Use when you want all records from the left table and any matching records
from the right table.
• FULL JOIN: Use when you want all records from both tables, regardless of whether they
have matching rows.
• NULL Values:
• LEFT JOIN: Can result in NULL values only for columns of the right table when there's no
match.
• FULL JOIN: Can result in NULL values for columns from either the left or the right table
when no match exists.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• Capgemini
• Q.926
Question
SELF JOIN vs CROSS JOIN.

Answer
• SELF JOIN:
• A SELF JOIN is a join where a table is joined with itself. It is used when you need to
compare rows within the same table.
• Typically, a table is aliased to differentiate between the instances of the same table in the
query.
Example:
SELECT A.EmployeeName, B.EmployeeName AS ManagerName
FROM Employees A
LEFT JOIN Employees B ON A.ManagerID = B.EmployeeID;

This query joins the Employees table with itself to get each employee's manager's name. The
A and B are aliases for two instances of the same Employees table.

• CROSS JOIN:
• A CROSS JOIN produces the Cartesian product of two tables, meaning it returns every
combination of rows from both tables.

• There is no condition or relationship between the tables. If Table1 has m rows and Table2
has n rows, the result will contain m * n rows.
Example:
SELECT Products.ProductName, Colors.Color
FROM Products
CROSS JOIN Colors;

This query returns all combinations of products and colors, creating a product-color
combination for every row in both tables.

Key Differences:
• Purpose:
• SELF JOIN: Used to join a table with itself, typically to compare or relate rows within the
same table.
• CROSS JOIN: Used to produce all possible combinations of rows between two tables.
• Result:
• SELF JOIN: Produces a result based on logical relationships between rows in the same
table (e.g., employees and managers).
• CROSS JOIN: Produces every possible pair of rows from both tables, resulting in a large
result set.
• Conditions:
• SELF JOIN: Involves a condition to specify how the rows are related (usually via
primary/foreign key relationships).
• CROSS JOIN: No condition; it simply combines each row from the first table with every
row from the second table.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.927
Question
Can we ALTER VIEW?

Answer
It depends on the database system. MySQL and SQL Server support ALTER VIEW to change a view's defining query, while in PostgreSQL ALTER VIEW only changes properties such as the owner, not the query itself. The portable way to modify a view is to drop it and recreate it with the new definition.

Steps to Modify a View:


• Drop the Existing View:
DROP VIEW IF EXISTS view_name;
• Create the New View:

CREATE VIEW view_name AS
SELECT column1, column2
FROM table_name
WHERE condition;

Where ALTER VIEW is unavailable or limited, some databases (such as MySQL and PostgreSQL) offer CREATE OR REPLACE VIEW to recreate a view without dropping it explicitly.
Example:
CREATE OR REPLACE VIEW view_name AS
SELECT column1, column2
FROM table_name
WHERE condition;

This approach allows you to modify the view definition without manually dropping and
creating it.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• Deloitte
• Q.928
Question
Difference Between CTE and Recursive CTE.

Answer
• CTE (Common Table Expression):
• A CTE is a temporary result set defined within the execution scope of a SELECT, INSERT,
UPDATE, or DELETE statement. It simplifies complex queries by providing a way to break them
into modular subqueries.
• A CTE is defined using the WITH clause and can be referenced multiple times within the
main query.
Example:
WITH EmployeeCTE AS (
SELECT EmployeeID, EmployeeName, Department
FROM Employees
)
SELECT * FROM EmployeeCTE;

In this example, the EmployeeCTE is a simple result set that is defined and used in the main
query.
• Recursive CTE:
• A recursive CTE is a type of CTE that refers to itself within its definition. It is used to
handle hierarchical or recursive data, such as organizational charts, folder structures, or
family trees.
• A recursive CTE typically has two parts:

• Anchor member: The base query that provides the starting point of the recursion.
• Recursive member: The query that references the CTE itself and iteratively fetches
related data.
Example (e.g., hierarchical employee structure):
WITH RECURSIVE EmployeeHierarchy AS (
-- Anchor member (base case)
SELECT EmployeeID, EmployeeName, ManagerID
FROM Employees
WHERE ManagerID IS NULL

UNION ALL

-- Recursive member
SELECT e.EmployeeID, e.EmployeeName, e.ManagerID
FROM Employees e
JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT * FROM EmployeeHierarchy;

This recursive CTE retrieves all employees and their managers in a hierarchical structure.

Key Differences:
• Purpose:
• CTE: Used for simplifying complex queries by breaking them into smaller, reusable
subqueries.
• Recursive CTE: Specifically used to handle hierarchical or recursive relationships in data
(e.g., finding a manager’s hierarchy).
• Structure:
• CTE: A non-recursive result set that can be referenced within a single query.
• Recursive CTE: Includes two parts (anchor and recursive) and references itself to handle
recursion.
• Use Case:
• CTE: Used for modularizing queries, performing joins, and aggregating data.
• Recursive CTE: Used to process hierarchical or recursive data, like calculating a path,
parent-child relationships, etc.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.929
Question
What is the difference between a correlated and a non-correlated subquery?

Answer

A subquery is a query nested inside another query (usually within the SELECT, INSERT,
UPDATE, or DELETE statement). Subqueries can be classified as correlated or non-correlated
based on how they relate to the outer query.

Correlated Subquery:
• A correlated subquery is a subquery that depends on the outer query for its values. For
each row processed by the outer query, the subquery is executed, and the values from the
outer query are passed into the subquery.
• The subquery references columns from the outer query (usually via correlation), and its
result depends on those values.
Example:
SELECT EmployeeID, Name
FROM Employees E
WHERE Salary > (
    SELECT AVG(Salary)
    FROM Employees
    WHERE DepartmentID = E.DepartmentID
);

In this example:
• The subquery uses E.DepartmentID from the outer query, making it correlated. For each
employee in the outer query, the subquery recalculates the average salary for that particular
department.

Non-Correlated Subquery:
• A non-correlated subquery is a subquery that does not depend on the outer query. It is
executed once, and its result is used by the outer query. The subquery runs independently of
the outer query.
• The subquery does not reference any columns from the outer query.
Example:
SELECT EmployeeID, Name
FROM Employees
WHERE Salary > (SELECT AVG(Salary) FROM Employees);

In this example:
• The subquery calculates the average salary of all employees once and is independent of the
outer query. It does not refer to any column from the outer Employees table.

Key Differences:
• Dependence on Outer Query:
• Correlated Subquery: Depends on the outer query for its values. It is evaluated for each
row in the outer query.
• Non-Correlated Subquery: Does not depend on the outer query. It is executed once and
its result is used by the outer query.
• Execution:
• Correlated Subquery: Evaluated once for each row of the outer query.
• Non-Correlated Subquery: Evaluated only once, independent of the outer query.
• Performance:
• Correlated Subquery: Can be slower, as the subquery is executed repeatedly for each row
in the outer query.

• Non-Correlated Subquery: Usually more efficient since the subquery is executed only
once.
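Both behaviors can be compared side by side with Python's sqlite3; the employees, salaries, and departments below are invented sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT, Salary INTEGER, DepartmentID INTEGER);
INSERT INTO Employees VALUES
  (1, 'Asha',  100, 1),
  (2, 'Bilal', 200, 1),   -- dept 1 average: 150
  (3, 'Chen',  300, 2),
  (4, 'Dina',  500, 2);   -- dept 2 average: 400; overall average: 275
""")

# Correlated: the inner query re-runs for each outer row, using E.DepartmentID
correlated = conn.execute("""
SELECT Name FROM Employees E
WHERE Salary > (SELECT AVG(Salary) FROM Employees WHERE DepartmentID = E.DepartmentID)
ORDER BY EmployeeID
""").fetchall()

# Non-correlated: the inner query runs once (overall average = 275)
non_correlated = conn.execute("""
SELECT Name FROM Employees
WHERE Salary > (SELECT AVG(Salary) FROM Employees)
ORDER BY EmployeeID
""").fetchall()

print(correlated)      # [('Bilal',), ('Dina',)] -- above their own department's average
print(non_correlated)  # [('Chen',), ('Dina',)]  -- above the company-wide average
```

The two queries return different employees from the same data, which is the clearest way to see that the correlated version evaluates a per-row average while the non-correlated version uses a single global value.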

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.930
Question
Describe the steps you would take to troubleshoot a slow-running SQL query.

Answer
When troubleshooting a slow-running SQL query, the goal is to identify the bottlenecks and
optimize performance. Here are the steps you would typically follow:

1. Check the Query Plan (Execution Plan)


• Why: The execution plan shows how the database engine is processing the query. It
highlights the operations (e.g., full table scans, joins, index usage) that could be causing
delays.
• How:
• In most RDBMSs you can use EXPLAIN (PostgreSQL, MySQL) to view the execution plan; in SQL Server, use the graphical execution plan or SET SHOWPLAN_ALL ON (SET STATISTICS IO ON reports I/O statistics rather than the plan itself).
• Look for expensive operations like full table scans, nested loops, or high-cost joins.
• Action: Optimize the query by creating necessary indexes or rewriting inefficient joins.
Example (PostgreSQL/MySQL):
EXPLAIN SELECT * FROM Employees WHERE Department = 'Sales';

2. Analyze Index Usage


• Why: Queries without appropriate indexes may perform full table scans, significantly
slowing down performance, especially with large datasets.
• How:
• Review whether the columns used in WHERE, JOIN, and ORDER BY clauses have proper
indexes.
• Use the execution plan to check if indexes are being used.
• Action: Create or modify indexes on frequently queried columns.
Example:
CREATE INDEX idx_department ON Employees(Department);

3. Optimize Joins
• Why: Inefficient joins can cause performance problems, especially with large tables or if
using cartesian products (unintended cross joins).

• How:
• Review join conditions and ensure that you're using the most efficient type of join (e.g.,
INNER JOIN vs LEFT JOIN).
• Avoid joining unnecessary tables or using DISTINCT when it's not needed.
• Action: Rewrite complex queries, use appropriate join conditions, or limit the dataset with
WHERE before joining.

4. Limit the Data Retrieved


• Why: Querying unnecessary data or returning too many rows can reduce query
performance.
• How:
• Ensure the SELECT statement only returns the necessary columns (use SELECT
column_name instead of SELECT *).
• Apply proper filters (WHERE clauses) to limit the dataset.
• Action: Refine the query to return only the data that’s needed.

5. Check for Subquery Optimization


• Why: Subqueries can often be inefficient, especially if they are correlated subqueries or
nested inside joins.
• How:
• Replace subqueries with joins when possible. Sometimes, subqueries in the SELECT or
WHERE clause can be rewritten to improve performance.
• Use EXISTS instead of IN when checking for existence, as IN can sometimes be slower.
• Action: Refactor subqueries into joins or use more efficient alternatives like EXISTS/JOIN
instead of IN.

6. Check Server and Database Configuration


• Why: Sometimes performance issues are due to resource limitations, such as insufficient
memory, CPU, or database configuration settings.
• How:
• Check server load, available resources (CPU, memory), and database configuration settings
like buffer sizes, cache, and parallelism.
• Review database parameters like max_connections, query_cache_size, and work_mem
(for PostgreSQL) or innodb_buffer_pool_size (for MySQL).
• Action: Increase memory allocation, optimize database configuration, or upgrade hardware
if necessary.

7. Analyze Table Statistics


• Why: Outdated or missing table statistics can lead to poor query optimization by the
database engine.
• How:
• Check if table statistics are up-to-date. Most databases automatically update statistics, but
sometimes manual intervention is needed.
• For example, in SQL Server, use UPDATE STATISTICS to refresh statistics.
• Action: Update statistics to ensure the query optimizer has accurate data distribution
information.
Example (SQL Server):

UPDATE STATISTICS Employees;

8. Consider Query Caching


• Why: Repeated execution of the same query can be optimized using cached query results,
reducing query execution time.
• How:
• Some databases offer result caching (MySQL's query cache existed before version 8.0, when it was removed; PostgreSQL relies on its shared buffer cache rather than caching result sets). Review the caching options your server actually provides.
• Check if the query results can be cached at the application level (e.g., using a caching
mechanism like Redis or Memcached).
• Action: Enable query caching if not already enabled or consider caching results in the
application layer.

9. Break Down Complex Queries


• Why: Very complex queries may require too many resources or may be difficult for the
database engine to optimize effectively.
• How:
• Break down large, complex queries into smaller intermediate steps using temporary tables
or common table expressions (CTEs).
• This helps with debugging and optimization as the database engine can process smaller
chunks more efficiently.
• Action: Refactor the query into simpler, more manageable components.

10. Check for Locks or Contention


• Why: Sometimes slow queries are caused by locks or contention for database resources.
• How:
• Check for locks in the database using commands like SHOW ENGINE INNODB STATUS
(MySQL) or sp_who2 (SQL Server).
• Look for long-running transactions or deadlocks that might be blocking your query.
• Action: Resolve locking issues by optimizing transaction handling or adjusting isolation
levels.
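As a hands-on illustration of steps 1 and 2, the sketch below uses Python's sqlite3 and SQLite's EXPLAIN QUERY PLAN (SQLite's counterpart to EXPLAIN). The table and index names mirror the earlier examples; the exact plan wording varies by SQLite version, so the comments describe it only loosely:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT, Department TEXT)")

def plan(sql):
    # Each plan row's last column is a human-readable description of the step
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM Employees WHERE Department = 'Sales'"
before = plan(query)  # without an index: a full table scan (contains "SCAN")
conn.execute("CREATE INDEX idx_department ON Employees(Department)")
after = plan(query)   # with the index: an index search mentioning idx_department

print(before)
print(after)
```

Running the same query before and after CREATE INDEX and diffing the plans is the quickest feedback loop for step 2, and the same pattern works with EXPLAIN in PostgreSQL or MySQL.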

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.931
Question
Difference between ROW_NUMBER and DENSE_RANK.

Answer

Both ROW_NUMBER() and DENSE_RANK() are window functions in SQL that assign a unique
number to rows within a result set, but they behave differently when handling ties (duplicate
values).

ROW_NUMBER():
• The ROW_NUMBER() function assigns a unique sequential integer to each row, starting from
1 for the first row. When there are duplicate values, ROW_NUMBER() still assigns a unique
number to each row.
• There is no gap between the assigned numbers, even if there are duplicate values in the
result set.
Example:
SELECT EmployeeID, Salary, ROW_NUMBER() OVER (ORDER BY Salary DESC) AS RowNum
FROM Employees;

If the result is:

EmployeeID  Salary  RowNum
1           50000   1
2           50000   2
3           45000   3
4           40000   4
In this case, even though EmployeeID 1 and EmployeeID 2 have the same salary, they still
get unique RowNum values (1 and 2).

DENSE_RANK():
• The DENSE_RANK() function also assigns a rank to each row, but rows with the same value receive the same rank, and the next distinct value gets the immediately following rank.
• This means there are no gaps in the ranking, unlike RANK(), which skips ranks after a tie (e.g., 1, 1, 3).
Example:
SELECT EmployeeID, Salary, DENSE_RANK() OVER (ORDER BY Salary DESC) AS Rank
FROM Employees;

If the result is:

EmployeeID  Salary  Rank
1           50000   1
2           50000   1
3           45000   2
4           40000   3
Here, EmployeeID 1 and EmployeeID 2 have the same salary and receive the same rank (1),
but EmployeeID 3 gets rank 2, and EmployeeID 4 gets rank 3. Notice that there are no gaps
in the ranking.

Key Differences:
• Handling Ties (Duplicate Values):
• ROW_NUMBER(): Assigns a unique number to every row, even if the rows have the
same value.
• DENSE_RANK(): Assigns the same rank to rows with the same value and does not skip
ranks.
• Gap in Values:
• ROW_NUMBER(): Never has gaps, because every row gets its own sequential number.
• DENSE_RANK(): Never has gaps either; tied rows share a rank and the next distinct value takes the next rank (e.g., 1, 1, 2, 3). Contrast this with RANK(), which would skip to 1, 1, 3, 4 after a two-way tie.
• Use Cases:
• ROW_NUMBER(): Useful when you need to generate a unique, sequential identifier for
each row, regardless of ties.
• DENSE_RANK(): Useful when you want to assign ranks and ensure no ranks are skipped,
particularly in scenarios like competition ranking.
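The three numbering functions can be compared on the same data with Python's sqlite3 (window functions require SQLite 3.25 or later, which modern Python bundles). The salaries are the sample values from the tables above; an EmployeeID tie-breaker is added to ROW_NUMBER's ordering so the result is deterministic for the tied rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER, Salary INTEGER);
INSERT INTO Employees VALUES (1, 50000), (2, 50000), (3, 45000), (4, 40000);
""")

rows = conn.execute("""
SELECT EmployeeID,
       -- EmployeeID tie-breaker makes ROW_NUMBER deterministic for tied salaries
       ROW_NUMBER() OVER (ORDER BY Salary DESC, EmployeeID) AS row_num,
       DENSE_RANK() OVER (ORDER BY Salary DESC) AS dense_rank,
       RANK()       OVER (ORDER BY Salary DESC) AS plain_rank
FROM Employees
ORDER BY EmployeeID
""").fetchall()

for r in rows:
    print(r)
# (1, 1, 1, 1)
# (2, 2, 1, 1)  <- tie: unique row_num, shared ranks
# (3, 3, 2, 3)  <- DENSE_RANK continues at 2; RANK skips to 3
# (4, 4, 3, 4)
```

Seeing RANK alongside the other two makes the distinction in the bullets above concrete: only RANK leaves a gap after the tie.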

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• Deloitte
• Q.932
Question
Explain types of SQL commands (One or two statements from each command).

Answer
SQL commands are categorized into different types based on their functionality. The main
types are:

1. DML (Data Manipulation Language):


• Purpose: These commands are used to manipulate and query data in the database.
• Commands:
• SELECT: Retrieves data from a database.
SELECT * FROM Employees;
• INSERT: Adds new records to a table.
INSERT INTO Employees (EmployeeID, Name, Department)
VALUES (101, 'John Doe', 'Sales');

• UPDATE: Modifies existing records in a table.
UPDATE Employees SET Department = 'Marketing' WHERE EmployeeID = 101;
• DELETE: Removes records from a table.
DELETE FROM Employees WHERE EmployeeID = 101;

2. DDL (Data Definition Language):


• Purpose: These commands define or modify the structure of the database, tables, and
objects.
• Commands:
• CREATE: Creates a new database or table.
CREATE TABLE Employees (
    EmployeeID INT,
    Name VARCHAR(100),
    Department VARCHAR(50)
);
• ALTER: Modifies an existing database object (e.g., table structure).
ALTER TABLE Employees ADD COLUMN Salary DECIMAL(10, 2);
• DROP: Deletes an existing database object (e.g., table, view).
DROP TABLE Employees;

3. DCL (Data Control Language):


• Purpose: These commands deal with permissions and access control.
• Commands:
• GRANT: Grants specific privileges to a user.
GRANT SELECT, INSERT ON Employees TO User1;
• REVOKE: Removes specific privileges from a user.
REVOKE SELECT ON Employees FROM User1;

4. TCL (Transaction Control Language):


• Purpose: These commands manage transactions within a database, ensuring data integrity
and consistency.
• Commands:
• COMMIT: Saves the current transaction permanently.
COMMIT;
• ROLLBACK: Reverts changes made during the current transaction.
ROLLBACK;
• SAVEPOINT: Sets a point within a transaction to which you can later roll back.
SAVEPOINT Savepoint1;

5. DQL (Data Query Language):


• Purpose: This is used to query or retrieve data from the database.
• Commands:
• SELECT: Retrieves data based on specified conditions.
SELECT * FROM Employees WHERE Department = 'Sales';
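Most of these categories can be exercised in one short script using Python's sqlite3 (DCL is omitted because SQLite has no user accounts to GRANT to). The table and values follow the earlier examples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: CREATE and ALTER define the structure
conn.execute("CREATE TABLE Employees (EmployeeID INT, Name TEXT, Department TEXT)")
conn.execute("ALTER TABLE Employees ADD COLUMN Salary REAL")

# DML: INSERT and UPDATE manipulate the rows
conn.execute("INSERT INTO Employees VALUES (101, 'John Doe', 'Sales', 50000)")
conn.execute("UPDATE Employees SET Department = 'Marketing' WHERE EmployeeID = 101")
conn.commit()  # TCL: COMMIT makes the changes permanent

# DQL: SELECT retrieves the data
row = conn.execute("SELECT Name, Department FROM Employees WHERE EmployeeID = 101").fetchone()
print(row)  # ('John Doe', 'Marketing')

# TCL: ROLLBACK reverts an uncommitted change
conn.execute("DELETE FROM Employees WHERE EmployeeID = 101")
conn.rollback()
count = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
print(count)  # 1 -- the DELETE was undone
```

Note that Python's sqlite3 manages transactions through the connection (commit/rollback methods) rather than literal COMMIT/ROLLBACK statements, but the effect is the same TCL behavior described above.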

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• Capgemini

• Q.933
Question
Difference between OLAP (Online Analytical Processing) and OLTP (Online Transaction
Processing).

Answer
OLAP and OLTP are two types of data processing systems designed for different purposes.
Here's a breakdown of the key differences between them:

OLAP (Online Analytical Processing):


• Purpose: OLAP is used for analyzing large volumes of data, often in a multidimensional
format, and is mainly used for decision-making and business intelligence. It's designed to
support complex queries, aggregations, and analysis of historical data.
• Characteristics:
• Focuses on analysis rather than daily transactions.
• Typically used for generating reports, dashboards, and complex queries.
• Handles large volumes of historical data and is optimized for querying and reading data.
• Data is often organized in multidimensional structures (like cubes), which allow for fast
retrieval of aggregated data.
• Example Use Case: A company using OLAP to analyze sales trends over the last five
years or generate quarterly financial reports.
Example:
SELECT Region, SUM(Sales)
FROM SalesData
GROUP BY Region;
• Tools: Examples include Microsoft SQL Server Analysis Services (SSAS), Oracle
Essbase, and IBM Cognos.

OLTP (Online Transaction Processing):


• Purpose: OLTP is used for managing day-to-day operations and transactions of a
business, such as inserting, updating, and deleting records in real-time. It's optimized for
handling high volumes of simple, transactional queries.
• Characteristics:
• Focuses on transactions (inserts, updates, and deletes).
• Designed for real-time operations with frequent read and write operations.
• Optimized for fast query processing with a large number of users performing basic
CRUD (Create, Read, Update, Delete) operations.
• Data is typically organized in normalized tables to minimize redundancy and improve
efficiency for transaction processing.
• Example Use Case: A retail store using OLTP to process customer purchases and update
inventory in real-time.
Example:
INSERT INTO Orders (OrderID, CustomerID, ProductID, Quantity)
VALUES (1001, 3005, 452, 2);
• Tools: Examples include MySQL, Oracle Database, and Microsoft SQL Server.

Key Differences:

• Purpose: OLAP supports data analysis and reporting; OLTP handles daily transactional processing.
• Data size: OLAP works over large volumes of historical data; OLTP works over small, current transaction records.
• Data structure: OLAP uses multidimensional (cube) or denormalized structures; OLTP uses relational (normalized) tables.
• Query type: OLAP runs complex queries and aggregations; OLTP runs simple queries with frequent inserts and updates.
• Examples: OLAP powers business intelligence and data mining; OLTP powers banking systems, e-commerce, and reservation systems.
• Performance focus: OLAP is read-heavy and optimized for large scans; OLTP is write-heavy and optimized for fast insert/update/delete.
• Transactions: OLAP is not focused on real-time transactions; OLTP is built around them.

Summary:
• OLAP is designed for data analysis, where the data is read-heavy, often aggregated, and
used for decision-making. It's used in scenarios like reporting, forecasting, and trend analysis.
• OLTP is designed for real-time transactions, focusing on speed and efficiency for day-
to-day operations like order processing, inventory management, and customer transactions.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.934
Question
Explain the difference between SQL and NoSQL.

Answer

SQL (Structured Query Language) and NoSQL (Not Only SQL) represent two different types
of database management systems, each with its own structure, use cases, and advantages.
Here's a breakdown of the key differences between them:

SQL (Structured Query Language):


• Purpose: SQL databases are used for structured data where relationships between data
points are well-defined. They are relational databases that store data in tables (rows and
columns).
• Characteristics:
• Data Model: Relational (tables with fixed schemas).
• Schema: Fixed schema — tables and their structure (column types, constraints) must be
defined ahead of time.
• Query Language: Uses SQL for querying, which supports complex joins, transactions,
and multi-table operations.
• Transactions: Supports ACID (Atomicity, Consistency, Isolation, Durability) properties,
ensuring reliability and consistency for transaction-based applications.
• Scalability: Primarily vertical scaling (adding more power to a single server).
• Examples: MySQL, PostgreSQL, Microsoft SQL Server, Oracle Database.
Example of SQL Query:
SELECT name, age FROM employees WHERE department = 'Sales';

NoSQL (Not Only SQL):


• Purpose: NoSQL databases are designed for unstructured or semi-structured data and
provide greater flexibility in terms of schema and scaling. They are commonly used for
handling large volumes of diverse, distributed, or evolving data types.
• Characteristics:
• Data Model: Non-relational; includes document, key-value, column-family, or graph
stores.
• Schema: Dynamic schema — Data doesn't require a predefined schema. Each document
or key-value pair can have different fields.
• Query Language: No standard query language, but each database uses its own set of APIs
or query methods. Queries tend to be simpler and optimized for horizontal scalability.
• Transactions: Traditionally limited or no ACID support (though some newer NoSQL databases, such as MongoDB, now offer multi-document ACID transactions). Many NoSQL systems instead follow BASE (Basically Available, Soft state, Eventually consistent) for better performance at scale.
• Scalability: Primarily horizontal scaling (scaling across many servers, ideal for large-
scale applications).
• Examples: MongoDB, Cassandra, Redis, Couchbase, Neo4j (Graph database).
Example of NoSQL Query (MongoDB):
db.employees.find({ department: 'Sales' }, { name: 1, age: 1 });

Key Differences:

• Data model: SQL is relational (tables with rows and columns); NoSQL is non-relational (document, key-value, column-family, graph, etc.).
• Schema: SQL uses a fixed, predefined schema; NoSQL uses a flexible, dynamic schema.
• Query language: SQL databases use SQL; NoSQL query methods vary by database (e.g., MongoDB queries, key-value lookups).
• Transactions: SQL supports ACID properties (reliable transactions); NoSQL typically follows BASE (eventual consistency, without full ACID support).
• Scalability: SQL scales vertically (scale-up); NoSQL scales horizontally (scale-out, distributed systems).
• Use case: SQL suits structured data and complex queries (e.g., banking, ERP); NoSQL suits unstructured data, high-speed reads/writes, and big data (e.g., social media, IoT).
• Examples: SQL includes MySQL, PostgreSQL, Oracle, and Microsoft SQL Server; NoSQL includes MongoDB, Cassandra, Redis, Couchbase, and Neo4j.

Summary:
• SQL databases are ideal for structured, transaction-heavy applications where data
consistency and relationships between data entities are crucial (e.g., banking systems,
inventory management).
• NoSQL databases are better suited for handling large volumes of unstructured or semi-
structured data, flexible schema requirements, and horizontal scaling (e.g., social
networks, real-time analytics, content management).

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.935
Question
Difference between TEXT and VARCHAR in SQL.

Answer
Both TEXT and VARCHAR are used to store variable-length strings in SQL, but there are
key differences in terms of storage, performance, and use cases.

VARCHAR (Variable Character):

• Purpose: VARCHAR is used to store variable-length character strings, meaning it only takes
up as much space as the string requires, plus some overhead for length storage.
• Characteristics:
• Storage: Uses only the amount of space required to store the string plus a small overhead
(usually 1 or 2 bytes for length information).
• Length: You must specify a maximum length for the column when defining it (e.g.,
VARCHAR(255)).
• Performance: Faster for most operations since it is optimized for variable-length storage.
• Use Case: Ideal for fields where you know the maximum length of the string, like names,
email addresses, or titles.
Example:
CREATE TABLE Employees (
    EmployeeID INT,
    Name VARCHAR(100)
);

TEXT:
• Purpose: TEXT is used to store large amounts of text (such as long paragraphs or articles)
and can store data of variable length without a predefined size limit.
• Characteristics:
• Storage: Internally, TEXT may use a different storage mechanism (like a separate memory
area for large data), and it often has more overhead compared to VARCHAR.
• Length: There is no maximum length for TEXT (or it's significantly larger than VARCHAR),
but in some databases, the maximum size can still be constrained (e.g., in PostgreSQL, TEXT
can store up to 1GB of data).
• Performance: May not be as performant as VARCHAR for smaller strings because it is
optimized for large text data and can introduce additional overhead in certain operations.
• Use Case: Suitable for storing large blocks of text, such as descriptions, articles, or long-
form content.
Example:
CREATE TABLE Articles (
    ArticleID INT,
    Content TEXT
);

Key Differences:

• Length: VARCHAR requires a specified maximum length (e.g., VARCHAR(255)); TEXT has no fixed length and can store very large amounts of text.
• Storage: VARCHAR uses only the space the string requires plus a small overhead; TEXT may carry more storage overhead because it handles large data.
• Performance: VARCHAR is generally faster for short strings, especially when a length is specified; TEXT may add overhead for larger data but is optimized for large text storage.
• Use case: VARCHAR suits short to moderately long strings (e.g., names, emails, titles); TEXT suits large text data (e.g., descriptions, articles).
• Maximum size: VARCHAR is limited by the specified length (e.g., VARCHAR(255) holds at most 255 characters); TEXT can store very large amounts of text (e.g., up to 1GB in PostgreSQL).
• Database support: VARCHAR is widely supported with a defined length; TEXT is available in most DBMSs but with differing implementations and limitations.

Summary:
• VARCHAR is best used when you know the maximum size of the text and want to
optimize storage and performance.
• TEXT is used for larger blocks of text where the size can vary greatly, and there is no
predefined length constraint.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.936
Question
Explain the concept of database transactions and the ACID properties.

Answer

Database Transactions:
A database transaction is a sequence of one or more operations (such as insert, update,
delete, or select) that are executed as a single unit of work. These operations are executed in
a way that they either all succeed or all fail. If any operation within a transaction fails, the
entire transaction is rolled back to ensure the database remains in a consistent state.
Key Aspects of Transactions:
• Atomicity: A transaction is an atomic unit of work. This means that the entire transaction
is treated as a single operation, and either all changes are committed, or none are. If one part
of the transaction fails, the entire transaction is rolled back.

• Consistency: A transaction must transition the database from one valid state to another.
The database must remain in a valid state before and after the transaction.
• Isolation: Transactions are isolated from one another. The intermediate steps of a
transaction are not visible to other transactions. The final result is only visible once the
transaction is completed.
• Durability: Once a transaction is committed, its changes are permanent, even if there is a
system crash.

ACID Properties:
The ACID properties define the essential characteristics that ensure reliable processing of
database transactions. They are:

1. Atomicity:
• Definition: Atomicity ensures that a transaction is all or nothing. If any part of the
transaction fails, the entire transaction is aborted, and the database remains unchanged.
• Example: If a bank transfer involves withdrawing money from one account and depositing
it into another, both actions must succeed. If either the withdrawal or the deposit fails, both
operations are rolled back.
Example:
BEGIN TRANSACTION;
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
COMMIT;

2. Consistency:
• Definition: The consistency property ensures that a transaction brings the database from
one valid state to another. The database must satisfy all integrity constraints (like foreign
keys, unique constraints, etc.) before and after the transaction.
• Example: A database enforces that an account balance cannot be negative. If a transaction
tries to withdraw more than the available balance, the transaction will fail, keeping the
database in a consistent state.

3. Isolation:
• Definition: Isolation ensures that transactions are executed independently of each other.
Even though multiple transactions might be happening concurrently, each transaction will
execute as if it is the only one.
• Levels of Isolation: SQL databases provide different levels of isolation to manage
concurrency:
• Read Uncommitted: Transactions can read data that is not yet committed.
• Read Committed: A transaction can only read committed data.
• Repeatable Read: Ensures that if a transaction reads a record, no other transaction can
modify it until the transaction is complete.
• Serializable: Ensures that transactions are executed in a way that they seem to be executed
sequentially, even though they may be running concurrently.
Example:
Two transactions trying to update the same record concurrently could lead to inconsistent
results, but isolation prevents this from happening by ensuring that one transaction's changes
are invisible to others until the transaction is complete.

4. Durability:
• Definition: Durability guarantees that once a transaction is committed, its effects are
permanent, even in the event of a system failure (like a power loss or crash).
• Example: Once a bank transfer transaction is committed, even if the system crashes
immediately after, the money will still have been transferred successfully when the system
recovers.
Example:
COMMIT; -- After this, the changes are permanent even if the system crashes.

Why ACID Properties Matter:


• Reliability: ACID ensures that a database system is reliable and can handle unexpected
situations, like crashes or concurrent transactions, without losing data or corrupting the
database.
• Consistency: It guarantees that a database is always in a valid state, which is crucial for
applications like banking, e-commerce, or any system that requires high data integrity.
• Concurrency Control: By ensuring isolation, ACID helps manage multiple transactions
occurring at the same time without causing conflicts or inconsistent data.

Summary:
A database transaction is a sequence of operations that are executed as a single unit of
work. The ACID properties (Atomicity, Consistency, Isolation, Durability) ensure that
transactions are processed reliably, maintaining data integrity and consistency even in cases
of errors, crashes, or concurrent access.
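Atomicity can be observed directly with Python's sqlite3. In this sketch, a CHECK constraint stands in for the "no negative balance" rule, and the deposit is applied before the withdrawal so that the failing withdrawal demonstrably rolls the deposit back too. The account IDs and amounts are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounts (AccountID INTEGER PRIMARY KEY,
                       Balance   INTEGER CHECK (Balance >= 0));
INSERT INTO Accounts VALUES (1, 500), (2, 0);
""")

def transfer(amount):
    try:
        # The connection context manager commits on success and rolls back on error
        with conn:
            # Deposit first: if the withdrawal below fails, this must be undone too
            conn.execute("UPDATE Accounts SET Balance = Balance + ? WHERE AccountID = 2", (amount,))
            conn.execute("UPDATE Accounts SET Balance = Balance - ? WHERE AccountID = 1", (amount,))
        return "committed"
    except sqlite3.IntegrityError:
        return "rolled back"  # atomicity: neither UPDATE takes effect

ok  = transfer(100)   # 500 covers it: both updates commit
bad = transfer(1000)  # would drive account 1 negative: whole transfer rolled back
balances = conn.execute("SELECT AccountID, Balance FROM Accounts ORDER BY AccountID").fetchall()
print(ok, bad, balances)  # committed rolled back [(1, 400), (2, 100)]
```

The failed transfer leaves both balances exactly as the successful one did, even though its deposit had already executed: the transaction either happens entirely or not at all.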

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.937
Question
Describe the importance of data integrity constraints such as NOT NULL, UNIQUE, and
CHECK constraints in SQL databases.

Answer
Data integrity constraints are essential in maintaining the accuracy, consistency, and
reliability of data in SQL databases. They ensure that only valid data is stored in the
database, enforcing business rules and preventing data anomalies. Let's look at the NOT
NULL, UNIQUE, and CHECK constraints and their importance:

1. NOT NULL Constraint:

• Purpose: The NOT NULL constraint ensures that a column cannot have a NULL value,
meaning that a value must be provided when inserting or updating records in the table.
• Importance:
• Data Completeness: Ensures that critical fields (e.g., user ID, order date) are always filled
with meaningful data.
• Prevents Missing Data: Guarantees that no rows are left with incomplete data, which
could lead to errors or inconsistencies in business logic.
• Improves Data Quality: Ensures that the application or system always has the necessary
information to operate properly.
Example:
CREATE TABLE Employees (
    EmployeeID INT NOT NULL,
    Name VARCHAR(100) NOT NULL,
    DateOfBirth DATE NOT NULL
);

In this example, the NOT NULL constraint ensures that EmployeeID, Name, and DateOfBirth
are always provided for each employee record.

2. UNIQUE Constraint:
• Purpose: The UNIQUE constraint ensures that all values in a column or a set of columns are
distinct. No two rows can have the same value in the specified column(s).
• Importance:
• Prevents Duplicate Data: Helps in preventing redundancy by ensuring that there are no
duplicate values in important fields like email addresses, usernames, or customer IDs.
• Improves Data Quality: Ensures uniqueness, which is critical for identifying records
uniquely (e.g., an email address should be unique for each user).
• Key for Referential Integrity: When used in conjunction with primary keys or foreign
keys, it helps enforce relationships between tables.
Example:
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Email VARCHAR(255) UNIQUE
);

Here, the UNIQUE constraint on Email ensures that no two customers can have the same email
address.

3. CHECK Constraint:
• Purpose: The CHECK constraint is used to enforce a condition on the values in a column. It
ensures that only values satisfying a specified condition are allowed in a column.
• Importance:
• Enforces Business Rules: The CHECK constraint allows you to enforce rules directly at the
database level. For example, ensuring that an employee's salary is always greater than 0 or
that a product's price is within a valid range.
• Data Validation: It provides a way to validate data before it’s inserted into the database,
preventing invalid or out-of-bound values.

• Improves Data Consistency: By enforcing rules on data, it ensures that the data complies
with predefined conditions and constraints, improving data integrity.
Example:
CREATE TABLE Products (
    ProductID INT PRIMARY KEY,
    Price DECIMAL(10, 2),
    CHECK (Price > 0)
);

In this example, the CHECK constraint ensures that no product can have a price less than or
equal to 0.

Why Data Integrity Constraints Are Important:


• Ensures Data Accuracy: Constraints prevent the entry of invalid or incorrect data into the
database, ensuring that only valid, meaningful data is stored.
• Improves Data Consistency: By enforcing rules at the database level, constraints help
maintain consistency across the database, ensuring that the data follows the business logic
and rules of the organization.
• Simplifies Data Validation: Constraints help with data validation directly in the database,
reducing the need for repetitive validation checks in the application code.
• Prevents Anomalies: Constraints like NOT NULL, UNIQUE, and CHECK protect the database
from data anomalies, such as duplicate records, missing critical values, or invalid data
ranges.
• Helps in Data Integrity and Reliability: By maintaining accurate and valid data, integrity
constraints contribute to the reliability of the database, which is crucial for reporting,
decision-making, and data analytics.

Summary:
• NOT NULL ensures that fields always have a value, preventing incomplete data.
• UNIQUE ensures that important fields (like email or user IDs) contain distinct values,
preventing duplicates.
• CHECK enforces specific conditions or business rules on data, ensuring that only valid
data is entered into the database.
Together, these constraints help maintain high-quality, reliable, and consistent data, which is
essential for the smooth functioning of applications, reporting, and business processes.
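The three constraints described above can be combined in a single table definition. A minimal sketch (the table and column names here are illustrative, not from a specific system):

```sql
CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    Email  VARCHAR(255) NOT NULL UNIQUE,  -- must be present and must be distinct
    Age    INT CHECK (Age >= 18)          -- business rule enforced at the database level
);

-- This insert would be rejected by the CHECK constraint (Age < 18):
-- INSERT INTO Users (UserID, Email, Age) VALUES (1, 'a@example.com', 15);
```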

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
Q.938
Question
Discuss the advantages and disadvantages of using stored procedures.


Answer
A stored procedure is a precompiled collection of SQL statements and logic stored in the
database, which can be executed with a single call. They are commonly used to encapsulate
business logic, manage repetitive tasks, and improve performance. Like any tool, stored
procedures come with both advantages and disadvantages.
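For context, a minimal stored procedure might look like the following (MySQL syntax is assumed; the Employees table and its columns are illustrative):

```sql
DELIMITER //

CREATE PROCEDURE GetEmployeesByDept(IN dept_name VARCHAR(50))
BEGIN
    -- The query is stored and precompiled in the database;
    -- callers send only the procedure name and its parameter.
    SELECT EmployeeID, Name, Salary
    FROM Employees
    WHERE Department = dept_name;
END //

DELIMITER ;

-- Executed with a single call:
CALL GetEmployeesByDept('Sales');
```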

Advantages of Using Stored Procedures:


• Improved Performance:
• Precompilation: Stored procedures are precompiled and stored in the database, so they are
optimized for execution. This results in faster execution compared to dynamic SQL, as there
is no need for the SQL query to be parsed and compiled every time it's executed.
• Reduced Network Traffic: Since the logic is executed on the database server, only the
procedure call and necessary parameters are sent over the network, reducing the amount of
data transmitted.
Example: A stored procedure that performs multiple operations (e.g., querying, updating)
can be called once, reducing the need to send several individual queries.
• Code Reusability:
• Centralized Logic: Business logic encapsulated in stored procedures can be reused across
various applications and clients. This reduces redundancy and ensures that changes to logic
need to be made in just one place.
• Consistency: Reusing stored procedures ensures that the same operations are performed
consistently across different parts of an application or even across different applications.
• Security:
• Access Control: Stored procedures provide a layer of security. Users can be granted
permission to execute the stored procedure without giving them direct access to the
underlying tables or data. This can help to prevent unauthorized access or accidental
modifications.
• SQL Injection Prevention: Because stored procedures are precompiled, they can help
mitigate the risk of SQL injection attacks when proper parameters are used, unlike dynamic
SQL.
• Maintainability:
• Easier to Manage: When the logic is encapsulated in stored procedures, it’s easier to
maintain and modify. Changes to logic can be made in one central location (the stored
procedure), rather than modifying each SQL query in the application code.
• Version Control: Stored procedures can be versioned separately from the application
code, allowing changes and updates to be tracked more effectively.
• Reduced Client-Side Complexity:
• Encapsulation of Business Logic: Stored procedures can encapsulate complex business
logic, reducing the complexity of the client-side code. The client application can simply call
the stored procedure, without needing to handle complex SQL queries.

Disadvantages of Using Stored Procedures:


• Database Dependency:
• Tight Coupling to DBMS: Stored procedures are usually written in the procedural
language of a specific DBMS (e.g., PL/SQL for Oracle, T-SQL for SQL Server), which can
make them DBMS-dependent. This can cause issues if you need to switch databases or use
multiple database systems.
• Portability Issues: Code written for one database system (e.g., SQL Server) may not be
compatible with another system (e.g., MySQL or PostgreSQL), making it harder to migrate to
a different platform.
• Performance Overhead:
• Complexity in Execution: While stored procedures are precompiled, if they contain
complex logic or many operations, they can still introduce performance overhead. Misusing
them (e.g., for tasks better handled by application code) can lead to slow performance.
• Resource Consumption: Long-running or resource-heavy stored procedures can place
strain on the database server, affecting the performance of other queries and operations.
• Debugging and Testing Challenges:
• Difficult to Debug: Debugging stored procedures can be more difficult compared to
regular application code, as many database systems do not offer robust debugging tools or a
simple way to step through stored procedure execution.
• Limited Testing: Stored procedures are often tested in isolation, but in real-world
applications, they interact with other components, which can make testing and integration
more challenging.
• Version Control and Change Management:
• Harder to Integrate into CI/CD Pipelines: Managing stored procedure versions and
integrating them into modern CI/CD (Continuous Integration/Continuous Deployment)
pipelines can be more complex compared to application code. Manual intervention is often
required to deploy changes to stored procedures.
• Lack of Good Version Control: Unlike regular application code, versioning and
maintaining changes to stored procedures can be harder without proper tools or processes,
especially in large systems.
• Vendor Lock-In:
• Proprietary SQL Extensions: Each DBMS has its own proprietary extensions for stored
procedures, making it harder to port your code to another database. For example, a stored
procedure written in T-SQL for SQL Server might not work in MySQL or PostgreSQL due to
differences in the procedural languages used by each database.
• Dependency on Database Vendors: Since stored procedures are highly dependent on the
underlying database system, this could lead to lock-in with a particular vendor (e.g., Oracle,
Microsoft SQL Server), which might limit flexibility in choosing or switching databases.
• Difficulty in Scaling:
• Centralized Logic: Since stored procedures execute on the database server, they can
become a bottleneck if not properly optimized, especially in high-traffic applications. This
can hinder scalability, particularly in distributed systems where logic might need to be
decentralized.
• Limited Parallelism: Complex stored procedures might not take full advantage of
database optimizations for parallelism, limiting performance when handling very large
volumes of data.

Summary:
Advantages:
• Improved Performance: Precompiled, reducing query compilation time.
• Code Reusability: Centralized business logic.


• Security: Limits direct access to data, helps prevent SQL injection.


• Maintainability: Easier to manage and modify code.
• Reduced Client Complexity: Encapsulates complex logic away from application code.
Disadvantages:
• Database Dependency: Tied to a specific DBMS.
• Performance Overhead: Can become inefficient with complex operations.
• Debugging Challenges: Difficult to debug and test.
• Version Control Issues: Harder to manage in modern CI/CD workflows.
• Vendor Lock-In: Harder to switch databases due to proprietary languages.
• Scalability Issues: Can be a bottleneck if not properly optimized.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
Q.939
Question
Discuss the role of the COMMIT and ROLLBACK statements in SQL transactions.

Answer
In SQL, COMMIT and ROLLBACK are crucial statements that control the end of a
transaction and determine whether the changes made during the transaction should be saved
or discarded. These statements ensure that a series of operations can be treated as a single
unit of work, helping maintain data integrity and consistency. Here’s a deeper dive into their
roles:

COMMIT Statement:
• Purpose: The COMMIT statement is used to permanently save all the changes made during a
transaction to the database. Once a transaction is committed, the changes become visible to
other transactions, and they are permanent, even in the case of a system crash.
• When is it used?:
• You use COMMIT after performing a set of operations (like INSERT, UPDATE, or DELETE) that
you want to make permanent.
• Typically, COMMIT is used at the end of a transaction when you are sure the transaction has
been completed successfully.
• Importance:
• Data Persistence: Ensures that all changes to the database are saved and made permanent.
• Concurrency: Makes changes visible to other users or transactions. Once committed, other
transactions can read or modify the data.


• Transaction Completion: Marks the successful completion of a transaction. Once a
COMMIT is issued, the transaction is considered finished, and no further changes can be rolled
back.
• Example:
BEGIN TRANSACTION;
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
COMMIT; -- Make the changes permanent

In this example, after transferring money from one account to another, the COMMIT statement
ensures that both the withdrawal and deposit are permanently saved.

ROLLBACK Statement:
• Purpose: The ROLLBACK statement is used to undo all changes made during a transaction.
If an error occurs or you decide that you do not want to save the changes, you can use
ROLLBACK to revert the database to its state before the transaction began.
• When is it used?:
• You use ROLLBACK when you want to discard the changes made during a transaction. This
is typically done if an error occurs, or if the business logic or validation fails during the
transaction.
• You can also use ROLLBACK if you determine that a transaction was unnecessary or
incorrect.
• Importance:
• Error Handling: Allows you to cancel a transaction and maintain the consistency and
integrity of the database in the case of an error or failure.
• Atomicity: Guarantees that if part of a transaction fails, all changes made during the
transaction are rolled back, and the database remains in a valid state.
• Data Integrity: Ensures that partial changes to data are not left in an inconsistent or
corrupt state.
• Example:
BEGIN TRANSACTION;
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
-- If an error occurs, roll back the transaction
ROLLBACK; -- Undo all changes

In this example, if there’s an issue (e.g., insufficient funds, database error), ROLLBACK ensures
that neither account is affected, and the database returns to its state before the transaction
started.

How COMMIT and ROLLBACK Work Together:


• Transaction Lifecycle: A transaction begins with some database operations and can be
ended either with a COMMIT or a ROLLBACK.
• If everything goes as expected and you want the changes to be permanent, use COMMIT.
• If something goes wrong or you want to cancel the changes, use ROLLBACK.
• Atomicity: Both COMMIT and ROLLBACK ensure atomicity, meaning that all operations in a
transaction either succeed together or fail together. This is fundamental to the concept of
transactions: if one part fails, the whole transaction fails, and the database is left unchanged
(thanks to ROLLBACK).
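Many databases also support partial rollbacks via SAVEPOINT, which is part of standard SQL. A sketch (the savepoint and table names are illustrative):

```sql
BEGIN TRANSACTION;

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
SAVEPOINT after_debit;

UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;

-- Undo only the work done after the savepoint; the first update survives:
ROLLBACK TO SAVEPOINT after_debit;

COMMIT; -- makes the remaining (first) update permanent
```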


Real-Life Scenario:
Imagine you're transferring money between two bank accounts:
• Step 1: You withdraw $100 from Account A.
• Step 2: You deposit $100 into Account B.
If the deposit to Account B fails for any reason (e.g., database crash or validation issue), you
don’t want Account A to have been debited. So, you ROLLBACK the entire transaction,
ensuring that neither account is modified. However, if both operations succeed, you would
issue a COMMIT, making the transaction permanent.
BEGIN TRANSACTION;
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;

-- If both updates succeed, COMMIT the transaction:
COMMIT;

-- If something goes wrong instead, end the transaction with ROLLBACK
-- (a transaction ends with either COMMIT or ROLLBACK, never both):
-- ROLLBACK;

Key Differences Between COMMIT and ROLLBACK:

• Purpose: COMMIT makes changes permanent and visible; ROLLBACK undoes all changes and
restores the database to its state before the transaction.
• Effect on Data: COMMIT makes the changes permanent; ROLLBACK cancels the changes and
restores the original data.
• Visibility: After COMMIT, changes are visible to other transactions; after ROLLBACK,
changes are discarded and never become visible.
• When to Use: COMMIT after a transaction completes successfully; ROLLBACK if an error
occurs or the changes are not needed.

Summary:
• COMMIT makes all changes within a transaction permanent and visible to others.
• ROLLBACK undoes all changes made during a transaction, ensuring that no partial or
inconsistent data is saved.
These two statements are essential for maintaining data integrity, consistency, and ensuring
that transactions are either fully completed or fully discarded, in line with the ACID
properties of database transactions.

Companies Where This Question was Asked


• Google


• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
Q.940
Question
Explain the purpose of the LIKE operator in SQL and provide examples of its usage.

Answer
The LIKE operator in SQL is used to search for a specified pattern within a column's
values. It is primarily used in the WHERE clause to filter results based on pattern matching,
especially when you need to search for values that match a particular sequence of characters.

Purpose of the LIKE Operator:


• Pattern Matching: The LIKE operator allows you to search for values that match a
specific pattern, rather than exact matches.
• Flexible Searches: It is helpful when you are unsure of the exact value but know part of it,
like searching for a name or part of an address.
• Text-based Filtering: It’s commonly used with string columns to search for substrings,
partial matches, or similar text.

Wildcard Characters Used with LIKE:


• % (percent sign): Represents zero, one, or multiple characters.
• _ (underscore): Represents exactly one character.

Examples of LIKE Operator Usage:

1. Basic Example Using %:


• To find all records where a column contains a certain substring (e.g., "John"):
SELECT * FROM Employees
WHERE Name LIKE '%John%';
• Explanation: The % before and after John means that any characters can appear before or
after "John". It will match names like "Johnny", "Jonathan", "John Smith", etc.

2. Using % to Start or End with a Pattern:


• Names that start with "John":
SELECT * FROM Employees
WHERE Name LIKE 'John%';
• Explanation: The John% pattern will match any name starting with "John", such as "John",
"Johnny", "Johnathan", etc.
• Names that end with "son":
SELECT * FROM Employees
WHERE Name LIKE '%son';
• Explanation: The %son pattern matches names ending in "son", like "Wilson", "Jason",
"Harrison", etc.


3. Using _ (Underscore) for Single Character Matching:


• To find employees with names that have exactly 5 characters and start with "J":
SELECT * FROM Employees
WHERE Name LIKE 'J____';
• Explanation: The J____ pattern ensures that the name starts with "J" and is followed by
exactly 4 characters. It would match names like "James", "Jared", etc., but not "John" or
"Jonathan".

4. Combining % and _:
• To find names where the first character is "J", the second character is any single
character, and the third character is "n", followed by any number of characters:
SELECT * FROM Employees
WHERE Name LIKE 'J_n%';
• Explanation: The J_n% pattern matches names that start with "J", have any character in the
second position (represented by _), and an "n" in the third position, followed by any
characters. It would match "Jan", "Jane", "Jonny", etc., but not "Jean", whose third
character is "a".

5. Case Sensitivity:
• In some SQL databases, the LIKE operator is case-insensitive by default (e.g., in MySQL),
but in others (e.g., PostgreSQL), it is case-sensitive.
• To perform a case-insensitive search, some systems offer functions like ILIKE (in
PostgreSQL) or use LOWER()/UPPER() functions:
SELECT * FROM Employees
WHERE LOWER(Name) LIKE 'john%';
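One related detail: because % and _ are wildcards, matching them as literal characters requires the ESCAPE clause, which is part of standard SQL (the Discounts table here is illustrative):

```sql
-- Find codes that contain a literal '%' character,
-- using '!' as the escape character:
SELECT * FROM Discounts
WHERE Code LIKE '%!%%' ESCAPE '!';
```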

Summary of the LIKE Operator:


• % matches any sequence of characters (including no characters).
• _ matches exactly one character.
• Used in WHERE clauses to filter data based on partial string matching.
• Commonly used when you don’t know the exact value you are searching for but can define
a pattern.

Advantages:
• Provides flexible and powerful pattern matching in SQL queries.
• Helps in filtering data based on incomplete or approximate text.

Disadvantages:
• Can be slower on large datasets because it requires pattern matching.
• Wildcard searches starting with % (e.g., %John) may not use indexes effectively, leading to
slower performance.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM


Q.941
Question
Explain the concept of data warehousing and how it differs from traditional relational
databases.

Answer

Data Warehousing:
A data warehouse is a specialized type of database designed to support business
intelligence (BI) activities, such as reporting, data analysis, and decision-making. It is an
integrated, subject-oriented, time-variant, and non-volatile collection of data that helps
organizations make strategic business decisions.

Key Characteristics of a Data Warehouse:


• Subject-Oriented:
• A data warehouse is designed to analyze data by key subjects (e.g., sales, inventory,
finance) rather than focusing on day-to-day transactions.
• Data is organized and optimized for business analysis, not for transactional processing.
• Integrated:
• Data from different sources (e.g., operational databases, external sources) is cleaned,
transformed, and integrated into the warehouse, providing a unified view of data across the
organization.
• Time-Variant:
• Data in a data warehouse is often historical, meaning it stores data over time (e.g., months
or years). This allows for trend analysis and reporting based on historical information.
• Non-Volatile:
• Once data is loaded into a data warehouse, it is rarely updated or deleted. This ensures the
data remains stable and consistent for analytical purposes.
• Optimized for Analytical Queries:
• Data warehouses are structured to support complex queries and reporting. They are often
optimized for read-heavy workloads rather than frequent updates or inserts.

How Data Warehousing Differs from Traditional Relational Databases:

• Purpose: A data warehouse is designed for analysis, reporting, and decision-making; a
traditional relational database (OLTP) is designed for transaction processing and daily
operations.
• Data Structure: A data warehouse is organized by subject (sales, finance, etc.) and
optimized for complex queries; an OLTP database is structured around operational processes
with a focus on efficiency and consistency.
• Data Type: A data warehouse stores historical data for analysis and trends over time; an
OLTP database stores real-time, current transactional data (e.g., customer orders,
inventory).
• Data Operations: A data warehouse is primarily used for read operations (e.g., complex
queries, reports); an OLTP database is optimized for write operations (e.g., inserts,
updates, deletes).
• Schema Design: In a data warehouse, data is often stored in a star schema or snowflake
schema for easier querying and reporting; in an OLTP database, data is normalized (often to
third normal form) to avoid redundancy and ensure data consistency.
• Query Complexity: A data warehouse supports complex analytical queries, aggregations, and
data mining; an OLTP database handles simpler, transactional queries (e.g., CRUD
operations).
• Performance Optimization: A data warehouse is optimized for large-scale data retrieval
(e.g., OLAP cubes, pre-aggregated data); an OLTP database is optimized for fast inserts,
updates, and deletes.
• Data Freshness: A data warehouse is periodically updated (batch loads) from operational
systems; an OLTP database is constantly updated in real time with every transaction.
• Data Consistency: In a data warehouse, data is consistent across the organization, but not
necessarily in real time; in an OLTP database, data consistency is critical for every
transaction in real time.

Differences Between Data Warehousing and Traditional Relational Databases:


• Purpose:
• Data Warehouses: Primarily used for analytical purposes such as reporting, data mining,
and trend analysis.
• Relational Databases: Designed for transactional purposes where data is frequently
inserted, updated, and deleted (e.g., handling customer orders, inventory).
• Data Organization:
• Data Warehouses: Use schemas like star or snowflake, which are optimized for complex
queries and analysis. These schemas organize data into facts (e.g., sales, revenue) and
dimensions (e.g., time, region).
• Relational Databases: Typically normalized into multiple tables to reduce redundancy and
ensure data integrity. Data is usually organized based on entities and their relationships (e.g.,
customers, products).
• Data Handling:
• Data Warehouses: Store historical data, often including aggregated data for faster
reporting. Data is non-volatile, meaning it’s rarely modified after it’s loaded.
• Relational Databases: Store real-time operational data, and updates occur frequently
(inserts, updates, deletes).


• Performance:
• Data Warehouses: Optimized for read-heavy workloads and complex queries, with larger
data volumes. Queries are often aggregations or multi-table joins that analyze trends or past
performance.
• Relational Databases: Optimized for write-heavy workloads and transactional integrity.
The focus is on speed and accuracy for daily business operations.
• Data Refresh:
• Data Warehouses: Data is updated at regular intervals (e.g., daily, weekly) via batch
processing. It’s not real-time data, but a historical snapshot that enables trend analysis.
• Relational Databases: Data is updated in real-time with every transaction (e.g., when a
user places an order or updates their profile).

Example Use Case for Data Warehousing:


• A retail company could use a data warehouse to aggregate data from various operational
systems (e.g., sales transactions, inventory data, customer information) to generate reports
and analyze trends over the past year. For example, they could generate monthly sales
reports, track customer purchasing patterns, or analyze regional performance. The data is
loaded into the warehouse in batches and then used for high-level reporting, decision-making,
and strategic planning.
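The star schema mentioned above can be sketched as a central fact table referencing dimension tables (all table and column names are illustrative):

```sql
-- Dimension tables: descriptive attributes used for analysis
CREATE TABLE DimDate (
    DateKey  INT PRIMARY KEY,
    FullDate DATE,
    Month    INT,
    Year     INT
);

CREATE TABLE DimProduct (
    ProductKey  INT PRIMARY KEY,
    ProductName VARCHAR(100),
    Category    VARCHAR(50)
);

-- Fact table: one row per sale, keyed to the dimensions
CREATE TABLE FactSales (
    DateKey    INT REFERENCES DimDate(DateKey),
    ProductKey INT REFERENCES DimProduct(ProductKey),
    Quantity   INT,
    Revenue    DECIMAL(12, 2)
);

-- A typical analytical query: monthly revenue by category
SELECT d.Year, d.Month, p.Category, SUM(f.Revenue) AS TotalRevenue
FROM FactSales f
JOIN DimDate d    ON f.DateKey = d.DateKey
JOIN DimProduct p ON f.ProductKey = p.ProductKey
GROUP BY d.Year, d.Month, p.Category;
```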

Summary:
• Data warehousing is designed for the storage and analysis of large volumes of
historical data, often from multiple sources, and is optimized for read-heavy, complex
queries and business intelligence tasks.
• Traditional relational databases are used for real-time transactional operations,
focusing on consistency, speed, and integrity of daily business processes.
Data warehousing helps organizations extract meaningful insights from vast amounts of
data, while traditional relational databases support the core transactional activities that keep
a business running.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
Q.942
Question
Describe the benefits of using database triggers and provide examples of their usage.

Answer


What is a Database Trigger?


A database trigger is a set of SQL statements that are automatically executed (or
"triggered") in response to specific events on a particular table or view in a database. Triggers
are often used to enforce business rules, data integrity, and automate tasks within the
database.
Triggers are commonly used in relational databases (like MySQL, Oracle, SQL Server,
PostgreSQL) and can be fired by insertions, updates, or deletions of data.

Benefits of Using Database Triggers:


• Automatic Enforcement of Business Rules:
• Triggers can enforce business rules or constraints automatically when certain conditions
are met, without needing to modify application code.
• For example, if you want to prevent a customer from having a negative balance, you can
create a trigger that automatically rejects updates or inserts where the balance is negative.
Example: Prevent negative balances.
CREATE TRIGGER prevent_negative_balance
BEFORE UPDATE ON Accounts
FOR EACH ROW
BEGIN
IF NEW.balance < 0 THEN
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Balance cannot be negative';
END IF;
END;
• Auditing and Logging:
• Triggers are frequently used for auditing and logging purposes, where they automatically
log changes made to data (such as updates, deletions, or inserts) in an audit table. This can
help track changes for compliance or historical tracking.
Example: Auditing changes to an employee's salary.
CREATE TRIGGER salary_audit
AFTER UPDATE ON Employees
FOR EACH ROW
BEGIN
INSERT INTO Salary_Audit (EmployeeID, OldSalary, NewSalary, ChangeDate)
VALUES (OLD.EmployeeID, OLD.Salary, NEW.Salary, NOW());
END;
• Maintaining Data Integrity:
• Triggers help maintain data consistency and integrity. They can automatically enforce
referential integrity by checking or updating related data when records are inserted, updated,
or deleted.
• For instance, a trigger could ensure that when an order is deleted, all associated order items
are also removed from the database.
Example: Cascading deletes in an order system.
CREATE TRIGGER cascade_order_delete
AFTER DELETE ON Orders
FOR EACH ROW
BEGIN
DELETE FROM OrderItems WHERE OrderID = OLD.OrderID;
END;
• Preventing Invalid Data Entry:
• Triggers can prevent invalid data from being entered into a database by checking values
before or after an insert/update occurs. This helps to ensure that data adheres to predefined
rules or constraints.


Example: Prevent insertions of records with invalid email formats.


CREATE TRIGGER validate_email_format
BEFORE INSERT ON Customers
FOR EACH ROW
BEGIN
IF NOT NEW.Email LIKE '%_@__%.__%' THEN
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Invalid email format';
END IF;
END;
• Complex Default Values or Computations:
• Triggers can be used to automatically compute values or set default values for certain
columns based on the values of other columns. For example, setting a timestamp whenever a
record is updated.
Example: Automatically set the "LastModified" timestamp on update.
CREATE TRIGGER set_last_modified
BEFORE UPDATE ON Products
FOR EACH ROW
BEGIN
SET NEW.LastModified = NOW();
END;
• Enforcing Referential Integrity (Cascading Actions):
• Triggers can be used for enforcing cascading actions (such as cascading updates and
deletes) that are not supported directly by foreign key constraints in certain DBMS.
Example: Automatically update an employee’s department when the department name is
changed.
CREATE TRIGGER update_department_name
AFTER UPDATE ON Departments
FOR EACH ROW
BEGIN
    UPDATE Employees
    SET DepartmentName = NEW.DepartmentName
    WHERE DepartmentName = OLD.DepartmentName;
END;
• Improved Performance in Some Use Cases:
• Triggers can sometimes improve performance by reducing the amount of work the
application must do. For example, instead of performing complex logic in the application
layer, a trigger can automatically perform that logic directly in the database when necessary.

Examples of Different Types of Triggers:


• BEFORE Trigger: Fires before a modification (insert, update, delete) is made to a table.
• Used for validation or changing data before it is committed.
Example: Set the “status” field to “active” before inserting a new user.
CREATE TRIGGER before_insert_user
BEFORE INSERT ON Users
FOR EACH ROW
BEGIN
SET NEW.Status = 'active';
END;
• AFTER Trigger: Fires after a modification is made to a table.
• Useful for actions like logging, updating other tables, or sending notifications.
Example: After deleting a product, log the deletion.
CREATE TRIGGER after_delete_product
AFTER DELETE ON Products
FOR EACH ROW
BEGIN
    INSERT INTO Product_Deletion_Log (ProductID, DeletedAt)
    VALUES (OLD.ProductID, NOW());
END;
• INSTEAD OF Trigger: Used in views to replace the default action (insert, update, delete).
• Commonly used with views to handle updates to a view instead of a table.
Example: Instead of deleting from a view, delete from an underlying table.
CREATE TRIGGER instead_of_delete
INSTEAD OF DELETE ON EmployeeView
FOR EACH ROW
BEGIN
DELETE FROM Employees WHERE EmployeeID = OLD.EmployeeID;
END;

Disadvantages of Using Triggers:


• Complexity:
• Triggers can introduce complexity into your database design and logic, making it harder to
understand how and when certain data changes occur.
• Performance Overhead:
• Triggers add additional overhead to your database operations, especially if they involve
complex logic or are fired frequently. This could lead to performance issues, particularly with
large datasets.
• Hidden Logic:
• Triggers run automatically in the background, which can make the database behavior less
transparent. Developers and database administrators may not always be aware of triggers that
impact data modification.
• Difficult to Debug:
• Debugging triggers can be challenging because they execute automatically and may not be
directly visible in the application code. This can make tracking down errors or understanding
why data was modified more difficult.

Summary:
Database Triggers offer powerful ways to automate tasks like enforcing business rules,
logging changes, maintaining data integrity, and performing calculations. They are
particularly useful for ensuring data consistency and automating repetitive tasks directly in
the database layer. However, they can also add complexity and performance overhead, so
they should be used judiciously.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
Q.943
Question


Discuss the concept of database concurrency control and how it is achieved in SQL
databases.

Answer

What is Database Concurrency Control?


Database concurrency control refers to the mechanisms that ensure multiple transactions
accessing the database at the same time (concurrently) do so in a way that maintains data
consistency, integrity, and isolation. Concurrency control is crucial in multi-user
environments where many transactions may attempt to access or modify the same data
simultaneously.
Concurrency control helps prevent issues such as:
• Lost updates: When two transactions try to update the same data at the same time, and one
update is overwritten.
• Temporary inconsistency: When one transaction reads data that is being modified by
another transaction.
• Dirty reads: When one transaction reads data that is uncommitted and later rolled back by
another transaction.
• Phantom reads: When the result set of a query changes due to another transaction
inserting or deleting data.

Goals of Concurrency Control:


• Atomicity: Each transaction is treated as a single unit, meaning either all operations
succeed, or none do.
• Consistency: The database should remain in a consistent state before and after a
transaction, even if multiple transactions are happening concurrently.
• Isolation: The operations of one transaction should not be visible to others until the
transaction is committed.
• Durability: Once a transaction is committed, its effects are permanent, even in the event of
a system failure.

How is Concurrency Control Achieved in SQL Databases?


SQL databases use several techniques to achieve concurrency control. These techniques are
based on locking mechanisms, isolation levels, and transaction management.

1. Locking Mechanisms:
Locks are used to prevent other transactions from accessing data in conflicting ways while
one transaction is performing operations.
• Shared Locks (Read Locks): A transaction can acquire a shared lock on a piece of data if
it intends to read the data. Multiple transactions can acquire shared locks on the same data
simultaneously.
Example: If Transaction 1 reads a record, other transactions can also read the same record,
but none can modify it until the lock is released.


• Exclusive Locks (Write Locks): A transaction acquiring an exclusive lock on data ensures
that no other transaction can read or write that data until the lock is released. This is used for
operations that modify data.
Example: If Transaction 1 is updating a record, no other transactions can read or update that
record until Transaction 1 is finished.
• Deadlock: A situation where two or more transactions are waiting for each other to release
locks, causing a cycle. Most databases detect deadlocks and automatically terminate one of
the transactions to break the deadlock.
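The lock-cycle above can be sketched as two concurrent sessions, using a hypothetical accounts table (illustrative only; the exact error message and which transaction is rolled back depend on the database engine):

```sql
-- Session 1:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;  -- locks row 1

-- Session 2 (running concurrently):
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 2;  -- locks row 2
UPDATE accounts SET balance = balance + 100 WHERE account_id = 1;  -- waits for Session 1

-- Session 1 (continues):
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;  -- waits for Session 2: deadlock
-- The engine detects the cycle and rolls back one of the transactions with a
-- deadlock error; the surviving transaction can then COMMIT.
```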

2. Isolation Levels:
The isolation level of a transaction determines the level of visibility other transactions have
to its uncommitted changes. SQL databases support several isolation levels, which control the
degree of locking and the kinds of concurrency phenomena (dirty reads, non-repeatable reads,
phantom reads) allowed.
Common isolation levels include:
• Read Uncommitted:
• Allows: Dirty reads, non-repeatable reads, and phantom reads.
• Explanation: Transactions can read data that is not committed yet (i.e., changes that might
later be rolled back). This level provides the least protection and the highest concurrency but
can lead to inconsistent results.
• Read Committed:
• Allows: Non-repeatable reads, but not dirty reads.
• Explanation: Transactions can only read committed data, but if a transaction reads data
and another transaction updates it, the first transaction’s result will differ when it reads the
same data again. It prevents dirty reads but still allows for other concurrency issues.
• Repeatable Read:
• Allows: Phantom reads, but prevents dirty reads and non-repeatable reads.
• Explanation: Once a transaction reads data, it will see the same data if it reads it again,
even if another transaction modifies it in the meantime. However, phantom reads may still
occur if other transactions insert or delete records.
• Serializable:
• Prevents: Dirty reads, non-repeatable reads, and phantom reads.
• Explanation: This is the highest level of isolation, where transactions are executed serially
(one at a time), ensuring no interference between them. However, this level can significantly
impact performance due to reduced concurrency.
Each database engine may have slight variations in how it implements isolation levels, but
these four are the most common.
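Most engines let you choose the isolation level per transaction. A minimal sketch, assuming a hypothetical accounts table (MySQL/SQL Server place the SET statement before the transaction; in PostgreSQL it goes just after BEGIN):

```sql
-- Apply REPEATABLE READ to the next transaction
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN;
SELECT balance FROM accounts WHERE account_id = 1;
-- A second read inside the same transaction returns the same value,
-- even if another session commits an update in between.
SELECT balance FROM accounts WHERE account_id = 1;
COMMIT;
```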

3. Optimistic Concurrency Control:


In optimistic concurrency control, transactions are allowed to execute without acquiring
locks, but before they commit, the system checks whether any other transactions have
modified the data in the meantime. If any changes are detected, the transaction is rolled back.
• Benefit: Optimistic concurrency control reduces the overhead of lock contention in
scenarios where conflicts are rare.


• Common Use: This is useful when reading is far more common than writing, and conflicts
are expected to be infrequent.
Example: A banking system where multiple users might view account balances
simultaneously, but updates (such as deposits) are relatively infrequent.
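A common way to implement the commit-time check is a version column: the update succeeds only if the row is unchanged since it was read. A hedged sketch, assuming a hypothetical accounts table with a version column (the literal 7 stands for the version read earlier):

```sql
-- Read the row and remember its version
SELECT balance, version FROM accounts WHERE account_id = 1;

-- Update only if nobody else changed the row in the meantime
UPDATE accounts
SET balance = balance - 100,
    version = version + 1
WHERE account_id = 1
  AND version = 7;  -- the version value read above

-- If the UPDATE reports 0 affected rows, another transaction won the race:
-- re-read the row and retry instead of overwriting its change.
```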

4. Pessimistic Concurrency Control:


Pessimistic concurrency control locks the data as soon as a transaction starts modifying it,
preventing other transactions from accessing the same data. This is the more traditional
approach, where the database uses locks to ensure that conflicting operations do not occur.
• Benefit: Ensures data integrity but may lead to lock contention and reduced performance.
• Common Use: This is typically used in environments where conflicts are frequent, and
data consistency is critical.
Example: A real-time booking system for flights where only one user can book a specific
seat at a time.
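The seat-booking example can be sketched with a locking read (SELECT ... FOR UPDATE, supported in MySQL, PostgreSQL, and Oracle; the seats table is hypothetical):

```sql
BEGIN;
-- Lock the seat row; other sessions' locking reads and updates block here
SELECT status FROM seats WHERE seat_id = '12A' FOR UPDATE;

-- Safe to check and book: no other session can change this row until COMMIT
UPDATE seats SET status = 'booked' WHERE seat_id = '12A' AND status = 'free';
COMMIT;
```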

Example Scenario:
Imagine two users, User A and User B, trying to transfer money from the same bank
account.
• Without Concurrency Control: If both users attempt to withdraw money from the
account simultaneously, they might both see the same balance and each withdraw more than
what is available. This leads to overdrafts.
• With Concurrency Control: Using locks or serializable isolation, one of the users would
be forced to wait until the other finishes its transaction. This ensures that the balance is
consistent and prevents issues like overdrafts or inconsistent results.
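The protected withdrawal can be sketched as follows (a minimal example assuming a hypothetical accounts table; FOR UPDATE syntax as in MySQL/PostgreSQL):

```sql
BEGIN;
-- Lock the account row before reading the balance; the other user's
-- identical SELECT ... FOR UPDATE waits here until this COMMIT.
SELECT balance FROM accounts WHERE account_id = 1 FOR UPDATE;

-- The guard keeps the balance from going negative even under concurrency
UPDATE accounts
SET balance = balance - 100
WHERE account_id = 1 AND balance >= 100;
COMMIT;
```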

Conclusion:
Database concurrency control is essential for ensuring that multiple transactions can occur
simultaneously without compromising the consistency and integrity of the data. It is achieved
through:
• Locking mechanisms (shared and exclusive locks),
• Isolation levels (Read Uncommitted, Read Committed, Repeatable Read, Serializable),
• Optimistic and pessimistic concurrency control, each providing different trade-offs
between data consistency and system performance.
By properly managing concurrency, databases can ensure that they handle multi-user
environments efficiently while maintaining data correctness.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle


• Accenture
• IBM
• Q.944
Question
Explain the role of the SELECT INTO statement in SQL and provide examples of its usage.

Answer

What is the SELECT INTO Statement?


The SELECT INTO statement in SQL is used to create a new table and insert the result of a
query into that table. It combines two operations:
• Selecting data from one or more tables.
• Inserting that data into a newly created table.
It is often used when you want to create a copy of an existing table's data or generate a new
table based on a query's result without first creating the table manually.

Syntax:
SELECT column1, column2, ...
INTO new_table
FROM existing_table
WHERE condition;
• column1, column2, ...: The columns you want to select from the existing table.
• new_table: The name of the new table that will be created.
• existing_table: The name of the table from which data is selected.
• condition: An optional condition to filter the rows you want to insert into the new table.

Key Points:
• Creates a new table: The SELECT INTO statement creates a new table with the columns
and data types inferred from the source table(s).
• Data insertion: The data retrieved by the SELECT query is inserted into the newly created
table.
• No pre-existing table: The target table (new_table) is created automatically if it does not
already exist. If the table exists, you cannot use SELECT INTO; instead, you would need to use
INSERT INTO.

Examples of SELECT INTO Usage:

1. Creating a New Table from an Existing Table


This example creates a new table called EmployeeCopy that contains all the data from the
Employees table.
SELECT *
INTO EmployeeCopy
FROM Employees;


• Result: A new table EmployeeCopy will be created with the same structure (columns and
data types) as the Employees table, and all the rows from Employees will be copied into
EmployeeCopy.

2. Creating a New Table with Specific Columns


You can also specify particular columns to be included in the new table. For example, to
create a new table with only the EmployeeID and Name columns:
SELECT EmployeeID, Name
INTO EmployeeNameList
FROM Employees;
• Result: A new table EmployeeNameList will be created with only the EmployeeID and
Name columns, and the data from these columns will be copied into the new table.

3. Creating a New Table with Filtered Data


You can apply a WHERE clause to filter the rows before they are copied into the new table. For
instance, if you want to create a new table with employees who have a salary greater than
$50,000:
SELECT EmployeeID, Name, Salary
INTO HighSalaryEmployees
FROM Employees
WHERE Salary > 50000;
• Result: A new table HighSalaryEmployees will be created containing only the employees
with salaries greater than $50,000.

4. Creating a New Table with Aggregate Data


You can use aggregate functions to create summary tables. For example, if you want to create
a table with the total sales by each salesperson:
SELECT SalespersonID, SUM(SalesAmount) AS TotalSales
INTO SalesSummary
FROM Sales
GROUP BY SalespersonID;
• Result: A new table SalesSummary will be created with SalespersonID and their
corresponding TotalSales.

5. Creating a New Table with a Join


You can use the SELECT INTO statement to join multiple tables and create a new table based
on that result. For example, creating a new table with employee details along with their
department names:
SELECT e.EmployeeID, e.Name, d.DepartmentName
INTO EmployeeDepartment
FROM Employees e
JOIN Departments d ON e.DepartmentID = d.DepartmentID;
• Result: A new table EmployeeDepartment will be created with the employee EmployeeID,
Name, and their corresponding DepartmentName.

Important Considerations:
• Table Structure:
• The structure (columns and data types) of the new table is automatically derived from the
result of the SELECT statement.


• You cannot specify column data types in the SELECT INTO statement; they are inferred
from the source columns.
• No Constraints or Indexes:
• The new table created by SELECT INTO will not have any indexes, primary keys, or foreign
key constraints from the original table. You would need to manually add these after the table
is created if necessary.
• Cannot Use SELECT INTO with an Existing Table:
• If the target table (new_table) already exists, you cannot use SELECT INTO. Instead, you
would use INSERT INTO.
• Performance Considerations:
• SELECT INTO can be resource-intensive, especially when copying large datasets, because it
involves both creating a new table and inserting data into it.
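When the target table already exists, the same data movement is done with INSERT INTO ... SELECT instead; a minimal sketch reusing the earlier example tables:

```sql
-- HighSalaryEmployees already exists with a compatible structure
INSERT INTO HighSalaryEmployees (EmployeeID, Name, Salary)
SELECT EmployeeID, Name, Salary
FROM Employees
WHERE Salary > 50000;

-- Note: some engines (notably MySQL) do not support SELECT INTO for table
-- creation; the equivalent there is CREATE TABLE ... AS SELECT:
-- CREATE TABLE HighSalaryEmployees AS
-- SELECT EmployeeID, Name, Salary FROM Employees WHERE Salary > 50000;
```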

Summary:
The SELECT INTO statement in SQL is a powerful tool to create a new table and insert data
into it based on the results of a query. It is often used for creating temporary tables, backups,
or summary reports. However, it creates tables without any constraints or indexes, so they
must be added separately if needed.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.945
Question
What is SQL and its significance in data analysis?

Answer
SQL (Structured Query Language) is a standard programming language used to manage
and manipulate relational databases. It allows you to query, insert, update, and delete data
stored in relational database systems (RDBMS).

Significance in Data Analysis:


• Data Extraction: SQL enables the extraction of large volumes of structured data from
databases quickly and efficiently using queries.
• Data Cleaning: SQL is used to filter, aggregate, and transform data, essential for preparing
data for analysis.
• Data Manipulation: SQL helps in manipulating data (e.g., updating records, joining
tables) to uncover trends, patterns, and insights.
• Performance: SQL queries can be optimized for faster data retrieval, making it essential
for handling large datasets in real-time analysis.
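For instance, a single query can extract, filter, and aggregate in one pass (assuming a hypothetical sales table with region, amount, and order_date columns):

```sql
-- Total and average revenue per region for 2024, largest first
SELECT region,
       SUM(amount) AS total_revenue,
       AVG(amount) AS avg_order_value
FROM sales
WHERE order_date >= '2024-01-01'
GROUP BY region
ORDER BY total_revenue DESC;
```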


Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.946
Question
Differentiate between SQL and MySQL.

Answer
• SQL (Structured Query Language):
• Definition: A standardized language used to query, manipulate, and manage data in
relational databases.
• Purpose: Used for writing queries to interact with databases (e.g., SELECT, INSERT,
UPDATE).
• Independence: SQL is not a database system; it is a language used across different
databases.
• MySQL:
• Definition: A relational database management system (RDBMS) that uses SQL as its
query language.
• Purpose: MySQL stores, manages, and organizes data in tables. It implements SQL to
perform operations like data retrieval, manipulation, etc.
• Independence: MySQL is a software/database system, not a language.

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.947
Question
What are the primary components of a SQL query?

Answer
The primary components of a SQL query are:
• SELECT: Specifies the columns to retrieve.
• FROM: Identifies the table from which to fetch data.


• WHERE: Filters records based on specified conditions.


• GROUP BY: Groups rows that have the same values into summary rows (often used with
aggregate functions).
• HAVING: Filters groups based on conditions (used with GROUP BY).
• ORDER BY: Sorts the result set based on one or more columns.

Example:
SELECT Name, Age
FROM Employees
WHERE Age > 30
ORDER BY Age DESC;

Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Oracle
• Accenture
• IBM
• Q.948
Question
Define the terms: table, row, and column in SQL.

Answer
• Table: A collection of related data organized into rows and columns. It is the basic unit of
data storage in a database.
• Example: Employees table stores employee-related data.
• Row: A single, horizontal record in a table, representing one data entry.
• Example: A row in the Employees table represents one employee.
• Column: A vertical set of values in a table, representing a specific attribute of data.
• Example: Name, Age, and Salary are columns in the Employees table.

Companies Where This Question was Asked


• Tesla
• Apple
• Microsoft
• Amazon
• Nike
• Q.949
Question
How do you comment out lines in SQL?

Answer


In SQL, you can comment out lines using two types of comment syntax:
• Single-line comment: Use -- to comment out the rest of a line.
-- This is a single-line comment
SELECT * FROM Employees;
• Multi-line comment: Use /* to begin and */ to end a comment block.
/* This is a
multi-line comment */
SELECT * FROM Employees;

Companies Where This Question was Asked


• Intel
• Facebook
• Oracle
• Spotify
• Uber
• Q.950
Question
What is the purpose of the SELECT statement in SQL?

Answer
The SELECT statement is used to retrieve data from one or more tables in a database. It
allows you to specify which columns to return, and you can filter, sort, and group the data as
needed.

Basic Syntax:
SELECT column1, column2, ...
FROM table_name
WHERE condition;
• Purpose: Extracts data for analysis, reporting, or further processing.

Companies Where This Question was Asked


• Walmart
• Apple
• Google
• Microsoft
• Tesla
• Q.951
Question
How do you retrieve all columns from a table using SQL?

Answer
To retrieve all columns from a table, use the * wildcard with the SELECT statement:
SELECT *
FROM table_name;
• Purpose: Fetches all columns and their data from the specified table.


Companies Where This Question was Asked


• Ford
• Adobe
• Tesla
• Coca-Cola
• LinkedIn
• Q.952
Question
How do you eliminate duplicate records in a SQL query?

Answer
To eliminate duplicate records, use the DISTINCT keyword in your SELECT query:
SELECT DISTINCT column1, column2, ...
FROM table_name;
• Purpose: Returns only unique rows for the specified columns.

Companies Where This Question was Asked


• Intel
• Adobe
• Twitter
• Netflix
• Uber
• Q.953
Question
Explain the difference between GROUP BY and ORDER BY clauses.

Answer
• GROUP BY:
• Purpose: Groups rows that have the same values in specified columns, often used with
aggregate functions (e.g., COUNT(), SUM()).
• Example:
SELECT department, COUNT(*)
FROM employees
GROUP BY department;
• ORDER BY:
• Purpose: Sorts the result set based on one or more columns, in ascending (ASC) or
descending (DESC) order.
• Example:
SELECT * FROM employees
ORDER BY salary DESC;

Companies Where This Question was Asked


• Netflix
• Tesla
• Amazon
• Facebook
• Salesforce
• Q.954
Question
How do you limit the number of records returned by a SQL query?

Answer
To limit the number of records, use the LIMIT clause (in MySQL, PostgreSQL, SQLite) or
TOP keyword (in SQL Server):

• MySQL/PostgreSQL/SQLite:
SELECT *
FROM table_name
LIMIT 10;
• SQL Server:
SELECT TOP 10 *
FROM table_name;
• Purpose: Restricts the result set to a specified number of rows.

Companies Where This Question was Asked


• Tesla
• Microsoft
• IBM
• Oracle
• Walmart
• Q.955
Question
How do you perform arithmetic operations in SQL?

Answer
In SQL, you can perform arithmetic operations using standard operators:
• Addition (+): Adds two values.
SELECT price + tax AS total_price FROM products;
• Subtraction (-): Subtracts one value from another.
SELECT salary - deductions AS net_salary FROM employees;
• Multiplication (*): Multiplies two values.
SELECT quantity * unit_price AS total_cost FROM orders;
• Division (/): Divides one value by another.
SELECT total_amount / quantity AS unit_price FROM sales;
• Modulus (%): Returns the remainder of a division.
SELECT salary % 2 AS remainder FROM employees;

Companies Where This Question was Asked


• Nike
• Apple
• Facebook
• Google
• Amazon
• Q.956
Question
Explain the purpose of the IN operator in SQL.

Answer
The IN operator is used to check if a value matches any value in a list or subquery. It
simplifies multiple OR conditions.

Syntax:
SELECT *
FROM table_name
WHERE column_name IN (value1, value2, value3, ...);

Example:
SELECT *
FROM employees
WHERE department IN ('Sales', 'Marketing', 'HR');
• Purpose: Matches a column’s value against a set of possible values or a subquery.

Companies Where This Question was Asked


• Tesla
• Microsoft
• Amazon
• Facebook
• Intel
• Q.957
Question
Explain the difference between EXISTS and IN in SQL.

Answer
• EXISTS:
• Purpose: Checks if a subquery returns any rows. It returns TRUE if the subquery has at least
one row, otherwise FALSE.
• Usage: Typically used with correlated subqueries.
• Example:
SELECT *
FROM employees e
WHERE EXISTS (
SELECT 1 FROM departments d WHERE d.manager_id = e.employee_id
);
• IN:
• Purpose: Checks if a value matches any value in a list or subquery.


• Usage: Can be used with both correlated and non-correlated subqueries.


• Example:
SELECT *
FROM employees
WHERE department_id IN (1, 2, 3);

Key Differences:
• EXISTS checks for the presence of rows, while IN checks for a match to a list of values.
• EXISTS is more efficient when checking for existence, especially in subqueries, while IN
is better for comparing against fixed sets of values.

Companies Where This Question was Asked


• Nike
• Google
• Tesla
• Walmart
• Apple
• Q.958
Question
What are aggregate functions in SQL? Provide examples.

Answer
Aggregate functions perform a calculation on a set of values and return a single result. They
are commonly used with GROUP BY to summarize data.

Common Aggregate Functions:


• COUNT(): Returns the number of rows.
SELECT COUNT(*) FROM employees;
• SUM(): Returns the sum of a numeric column.
SELECT SUM(salary) FROM employees;
• AVG(): Returns the average value of a numeric column.
SELECT AVG(salary) FROM employees;
• MAX(): Returns the maximum value in a column.
SELECT MAX(salary) FROM employees;
• MIN(): Returns the minimum value in a column.
SELECT MIN(salary) FROM employees;

Companies Where This Question was Asked


• Facebook
• Amazon
• Microsoft
• Google
• Tesla
• Q.959
Question
What is a self-join in SQL? Provide an example.


Answer
A self-join is a join where a table is joined with itself. It’s useful for comparing rows within
the same table.

Syntax:
SELECT A.column1, B.column2
FROM table_name A, table_name B
WHERE A.common_column = B.common_column;

Example:
Suppose you have an Employees table with EmployeeID, ManagerID, and Name columns, and
you want to find each employee's manager.
SELECT E.Name AS Employee, M.Name AS Manager
FROM Employees E
LEFT JOIN Employees M ON E.ManagerID = M.EmployeeID;
• Purpose: This query joins the Employees table with itself, comparing each employee's
ManagerID with the EmployeeID of other employees.

Companies Where This Question was Asked


• Microsoft
• Apple
• Amazon
• Oracle
• Walmart
• Q.960
Question
Explain the difference between a view and a table in SQL.

Answer
• Table:
• Definition: A table is a physical storage structure that holds data in rows and columns.
• Purpose: Stores actual data in a database.
• Example: employees, orders.
• View:
• Definition: A view is a virtual table created by a query that retrieves data from one or more
tables.
• Purpose: Provides a simplified or customized way to view data without storing it
physically.
• Example: CREATE VIEW EmployeeDetails AS SELECT Name, Age FROM employees
WHERE Age > 30;

Key Differences:
• Storage: Tables store data; views do not store data, only the query definition.
• Usage: Tables are used for permanent data storage, while views are used for convenient
data access.


Companies Where This Question was Asked


• Google
• Amazon
• Microsoft
• Apple
• Tesla
• Q.961
Question
How do you perform a case-insensitive search in SQL?

Answer
To perform a case-insensitive search, you can use:
• UPPER() or LOWER() functions:
• Convert both the column and search term to the same case.
SELECT *
FROM employees
WHERE UPPER(name) = UPPER('john');
• COLLATE clause (in databases such as SQL Server and MySQL; collation names differ by engine):
• Use a case-insensitive collation (the example below uses a SQL Server collation).
SELECT *
FROM employees
WHERE name COLLATE Latin1_General_CI_AS = 'john';

Key Points:
• UPPER() / LOWER(): Changes both the column and search term to the same case for
comparison.
• COLLATE: Specifies case-insensitive collation for the query.

• Q.962
Question
Explain the purpose of the EXPLAIN statement in SQL.

Answer
The EXPLAIN statement is used to display the execution plan of a query. It provides insights
into how the SQL query will be executed by the database engine, showing details like:
• Which indexes are used
• The order in which tables are accessed
• The type of join used
• Estimated row counts and cost

Example:
EXPLAIN SELECT * FROM employees WHERE department = 'Sales';
• Purpose: Helps optimize queries by understanding how the database processes them,
allowing for better performance tuning.


• Q.963
Question
How do you perform data migration in SQL?

Answer
Data migration in SQL involves transferring data from one database or table to another.
Common methods include:
• Using INSERT INTO with SELECT:
• Move data from one table to another within the same database or across databases.
INSERT INTO new_table (column1, column2)
SELECT column1, column2
FROM old_table;
• Export/Import:
• Export data from the source database to a file (CSV, SQL dump) and import it into the
destination database.
• Tools like mysqldump (MySQL) or pg_dump (PostgreSQL) are used for this purpose.
• ETL (Extract, Transform, Load):
• Use an ETL tool or SQL scripts to extract data from the source, transform it as needed, and
load it into the target system.
• Purpose: Helps in transferring, updating, or restructuring data when moving between
different environments or systems.

• Q.964
Question
What is a Database?

Answer
A database is a structured collection of data stored and managed electronically. It allows for
efficient storage, retrieval, modification, and management of data. Databases are used to store
data in a way that is easily accessible, searchable, and manageable.

Key Characteristics:
• Tables: Organized into rows (records) and columns (attributes).
• Queries: Data is accessed and manipulated using query languages like SQL.
• Management: Managed by a DBMS (Database Management System) like MySQL,
PostgreSQL, or Oracle.
• Purpose: Used to store and manage large volumes of data, ensuring consistency, security,
and easy access.

• Q.965
Question
What is DBMS?


Answer
A DBMS (Database Management System) is software that enables users to create, manage,
and interact with databases. It provides an interface between users and the database, handling
data storage, retrieval, and manipulation while ensuring data integrity, security, and
concurrency control.

Key Functions of DBMS:


• Data Definition: Defines the structure of the database (schemas, tables).
• Data Manipulation: Handles querying, updating, inserting, and deleting data.
• Data Security: Ensures data access control and user permissions.
• Data Integrity: Ensures data consistency and accuracy.
• Concurrency Control: Manages simultaneous data access by multiple users.

Examples of DBMS:
• Relational DBMS (RDBMS): MySQL, PostgreSQL, Oracle
• Non-relational DBMS: MongoDB, Cassandra
• Purpose: Helps organize and manage data efficiently, enabling seamless data access and
manipulation.

• Q.966
Question
What is RDBMS? How is it different from DBMS?

Answer
• RDBMS (Relational Database Management System):
• An RDBMS is a type of DBMS that stores data in tables (relations) and uses SQL for data
manipulation. It supports relationships between tables and ensures data integrity through
keys (primary, foreign).
• Examples: MySQL, PostgreSQL, Oracle, SQL Server.
• DBMS (Database Management System):
• A DBMS is a broader system that manages databases in general, but it doesn't necessarily
enforce relational data structures or relationships. It could store data in various formats
(hierarchical, network, object-based, etc.).
• Examples: File systems, Hierarchical DBMS like IBM's IMS.

Key Differences:
• Data Structure:
• RDBMS: Stores data in tables with rows and columns.
• DBMS: Can store data in various formats (e.g., hierarchical, network).
• Relationships:
• RDBMS: Supports relationships between tables using keys.
• DBMS: May not support relational data models.
• ACID Properties:
• RDBMS: Supports full ACID compliance (Atomicity, Consistency, Isolation, Durability).
• DBMS: May not fully support ACID properties.


• Q.967
Question
What is a Foreign Key?

Answer
A Foreign Key is a column (or a set of columns) in a table that establishes a link between the
data in two tables. It refers to the primary key of another table, enforcing referential integrity
between the two tables.

Key Points:
• Purpose: Ensures that values in the foreign key column(s) correspond to valid entries in
the referenced table.
• Integrity: Prevents orphaned records by enforcing valid relationships.

Example:
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerID INT,
FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);
• Purpose: The CustomerID in the Orders table is a foreign key that links to the
CustomerID in the Customers table.

• Q.968
Question
Difference between Unique Key and Candidate Key.

Answer
• Unique Key:
• Purpose: Ensures that all values in a column (or a set of columns) are distinct across rows.
• Characteristics: A table can have multiple unique keys. It allows NULL values; how many NULLs a unique column may hold varies by database (SQL Server permits only one, while MySQL and PostgreSQL allow multiple).
• Example:
CREATE TABLE employees (
EmployeeID INT UNIQUE,
Email VARCHAR(255) UNIQUE
);
• Candidate Key:
• Purpose: A minimal set of one or more columns that uniquely identifies a row in a table. It is a
potential candidate for being chosen as the Primary Key.
• Characteristics: A table can have multiple candidate keys, but only one is selected as the
Primary Key. Candidate keys cannot have NULL values.
• Example:
CREATE TABLE students (
StudentID INT PRIMARY KEY,
RollNo INT UNIQUE,
Email VARCHAR(255) UNIQUE
);


• Here, StudentID, RollNo, and Email are candidate keys, but StudentID is chosen as the
primary key.

Key Differences:
• Uniqueness: Both enforce uniqueness, but candidate keys are potential primary keys,
while unique keys are constraints to ensure no duplicate values.
• NULL values: Unique keys can allow one NULL, while candidate keys cannot.

• Q.969
Question
Difference between Composite Key and Super Key.

Answer
• Composite Key:
• Definition: A composite key is a primary key that consists of two or more columns
combined to uniquely identify a record in a table.
• Purpose: Used when a single column is not sufficient to uniquely identify a row.
• Example:
CREATE TABLE enrollment (
    student_id INT,
    course_id INT,
    PRIMARY KEY (student_id, course_id)
);
Here, (student_id, course_id) together form a composite key.
• Super Key:
• Definition: A super key is any combination of columns that uniquely identifies a row in a
table. It can include additional columns that are not necessary for uniqueness.
• Purpose: A super key can contain extra attributes, making it a broader concept than a
candidate key.
• Example: In the enrollment table, (student_id, course_id) is a super key.
(student_id, course_id, name) is also a super key, but it's not minimal.

Key Differences:
• Composition: A composite key is a specific type of key formed by multiple columns,
whereas a super key can be any combination of columns (including extra, unnecessary ones).
• Uniqueness: All super keys are unique, but they can be non-minimal, meaning they may
contain extra attributes. Composite keys are minimal by definition.

• Q.970
Question
What is the difference between NULL and NOT NULL?

Answer
• NULL:


• Definition: Represents the absence of a value or an unknown value in a column.


• Usage: A column with a NULL value means that no data has been assigned to that field.
• Example:
CREATE TABLE employees (
employee_id INT,
name VARCHAR(100),
department VARCHAR(50) NULL
);
• NOT NULL:
• Definition: Ensures that a column cannot have a NULL value. The column must always
contain a valid value.
• Usage: Used as a constraint to ensure that a field is mandatory and cannot be left empty.
• Example:
CREATE TABLE employees (
employee_id INT NOT NULL,
name VARCHAR(100) NOT NULL
);

Key Differences:
• NULL allows for missing or undefined values, while NOT NULL enforces that a column
always has a value.
• NULL is used to represent unknown or undefined data, whereas NOT NULL ensures data
integrity by requiring values in the column.
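Because NULL is not equal to anything (including another NULL), rows with missing values are found with IS NULL rather than =. A minimal sketch, reusing the employees table from the example above:

```sql
-- Rows where department was never assigned
SELECT employee_id, name
FROM employees
WHERE department IS NULL;

-- Note: "WHERE department = NULL" matches no rows, because any
-- comparison with NULL evaluates to UNKNOWN rather than TRUE.
```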

• Q.971
Question
Difference between Default Constraint and Check Constraint.

Answer
• Default Constraint:
• Definition: A Default Constraint automatically assigns a default value to a column when
no value is provided during an insert operation.
• Purpose: Ensures that a column has a predefined value when no explicit value is supplied.
• Example:
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    status VARCHAR(10) DEFAULT 'Active'
);
Here, if status is not provided during insertion, it will automatically default to 'Active'.
• Check Constraint:
• Definition: A Check Constraint ensures that the values entered into a column satisfy a
specified condition or rule.
• Purpose: Restricts the data that can be entered into a column by enforcing a condition.
• Example:
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    age INT CHECK (age >= 18)
);
Here, the age column only allows values greater than or equal to 18.

Key Differences:
• Default Constraint provides a default value when no value is supplied, while Check
Constraint ensures that the values meet specific criteria or conditions.
• Default Constraint applies to missing values, whereas Check Constraint applies to all
values entered.
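Both constraints can be exercised in one small sketch, using Python's sqlite3 module (names are illustrative, not from the book):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        employee_id INT PRIMARY KEY,
        name        VARCHAR(100),
        status      VARCHAR(10) DEFAULT 'Active',  -- default constraint
        age         INT CHECK (age >= 18)          -- check constraint
    )
""")

# status is omitted, so the DEFAULT value is applied.
conn.execute("INSERT INTO employees (employee_id, name, age) VALUES (1, 'Alice', 30)")
print(conn.execute("SELECT status FROM employees").fetchone())  # ('Active',)

# age = 16 fails the CHECK condition and the row is rejected.
try:
    conn.execute("INSERT INTO employees (employee_id, name, age) VALUES (2, 'Bob', 16)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```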

• Q.972
Question
What is the difference between Natural Join and Cross Join?

Answer
• Natural Join:
• Definition: A Natural Join automatically joins tables based on all columns with the same
name and compatible data types in both tables.
• Purpose: Simplifies the join by matching columns with the same name, eliminating
duplicates in the result set.
• Example:
SELECT *
FROM employees
NATURAL JOIN departments;
In this example, employees and departments are joined based on all columns with the same name (e.g., department_id).
• Cross Join:
• Definition: A Cross Join returns the Cartesian product of two tables, meaning each row
from the first table is paired with every row from the second table.
• Purpose: Generates all possible combinations of rows from both tables.
• Example:
SELECT *
FROM employees
CROSS JOIN departments;
This will return every combination of rows from employees and departments.

Key Differences:
• Natural Join automatically joins tables based on matching column names and eliminates
duplicates, whereas Cross Join returns every combination of rows from both tables
(Cartesian product).
• Natural Join is used for related data, while Cross Join is used for generating
combinations or testing purposes.
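The row counts make the difference concrete. Here is a minimal sketch using Python's sqlite3 module (sample data is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INT, department_id INT)")
conn.execute("CREATE TABLE departments (department_id INT, dept_name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, 10), (2, 20)])
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [(10, 'HR'), (20, 'IT'), (30, 'Ops')])

# NATURAL JOIN matches on the shared column name (department_id): 2 rows.
natural = conn.execute("SELECT * FROM employees NATURAL JOIN departments").fetchall()
print(len(natural))  # 2

# CROSS JOIN pairs every employee with every department: 2 x 3 = 6 rows.
cross = conn.execute("SELECT * FROM employees CROSS JOIN departments").fetchall()
print(len(cross))    # 6
```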

• Q.973


Question
What is the difference between INT and BIGINT?

Answer
• INT:
• Definition: The INT data type is used to store integer values within a specified range.
• Range: Typically from -2,147,483,648 to 2,147,483,647 (for signed integers).
• Storage: Requires 4 bytes of storage.
• Example:
CREATE TABLE employees (
employee_id INT
);
• BIGINT:
• Definition: The BIGINT data type is used to store larger integer values than INT.
• Range: Typically from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (for
signed integers).
• Storage: Requires 8 bytes of storage.
• Example:
CREATE TABLE transactions (
transaction_id BIGINT
);

Key Differences:
• Size: BIGINT can store much larger numbers compared to INT.
• Storage: BIGINT requires more storage (8 bytes) than INT (4 bytes).
• Use Case: Use INT for smaller numbers and BIGINT when handling large values, such as
for IDs or counters that might exceed the range of INT.

• Q.974
Question
What is the difference between DATE and DATETIME?

Answer
• DATE:
• Definition: The DATE data type is used to store only the date (year, month, and day)
without time information.
• Format: YYYY-MM-DD
• Range: Typically from '1000-01-01' to '9999-12-31'.
• Example:
CREATE TABLE events (
event_date DATE
);
• DATETIME:
• Definition: The DATETIME data type is used to store both the date and time (hours,
minutes, seconds) information.
• Format: YYYY-MM-DD HH:MM:SS
• Range: Typically from '1000-01-01 00:00:00' to '9999-12-31 23:59:59'.


• Example:
CREATE TABLE events (
event_datetime DATETIME
);

Key Differences:
• Time Information: DATE stores only the date, while DATETIME stores both date and time.
• Precision: DATETIME provides more detailed information with time, making it suitable for
timestamps.
• Use Case: Use DATE when you only need the date (e.g., birthdates), and DATETIME when
both date and time are required (e.g., event timestamps).

• Q.975
Question
What is the difference between FLOAT and DECIMAL?

Answer
• FLOAT:
• Definition: FLOAT is a data type used to store approximate numeric values with floating
decimal points. It is used for storing values that require a wide range but do not require
precise accuracy.
• Precision: Stores numbers in an approximate way, which may lead to rounding errors in
some cases.
• Usage: Suitable for scientific calculations or when the precision of the fractional part is not
crucial.
• Example:
CREATE TABLE products (
price FLOAT
);
• DECIMAL (or NUMERIC):
• Definition: DECIMAL stores exact numeric values with fixed precision and scale. It is used
when precise calculations are necessary, especially for financial data.
• Precision: The DECIMAL(p, s) type allows you to define the total number of digits (p) and
the number of digits after the decimal point (s), ensuring exact values.
• Usage: Ideal for storing monetary values or any data where exact precision is critical.
• Example:
CREATE TABLE transactions (
amount DECIMAL(10, 2)
);

Key Differences:
• Precision: DECIMAL stores exact values with a defined precision, while FLOAT is
approximate and may lead to rounding errors.
• Use Case: Use DECIMAL for financial data or when precision is critical, and FLOAT for
scientific calculations where some approximation is acceptable.
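The rounding-error risk mentioned above is easy to see in plain Python, where float behaves like SQL's FLOAT (binary floating point) and decimal.Decimal behaves like SQL's DECIMAL (exact fixed-point):

```python
from decimal import Decimal

# FLOAT-style binary arithmetic accumulates rounding error:
print(0.1 + 0.2)                    # 0.30000000000000004
print(0.1 + 0.2 == 0.3)             # False

# DECIMAL-style exact arithmetic does not:
print(Decimal('0.1') + Decimal('0.2'))                    # 0.3
print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))  # True
```

This is why monetary amounts belong in DECIMAL columns: a sum of many prices stored as FLOAT can drift by fractions of a cent.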

• Q.976
Question


What is the difference between ENUM and SET?

Answer
• ENUM:
• Definition: ENUM is a data type in SQL used to store a single value from a predefined list of
values.
• Purpose: Limits the column to one value from a list of possible options.
• Example:
CREATE TABLE employees (
    status ENUM('Active', 'Inactive', 'On Leave')
);
Here, status can only be one of the three values: 'Active', 'Inactive', or 'On Leave'.
• SET:
• Definition: SET is a data type used to store multiple values from a predefined list. It allows
the selection of zero or more values from the list.
• Purpose: Useful when you want to store multiple values in a single column.
• Example:
CREATE TABLE employees (
    skills SET('Java', 'SQL', 'Python', 'JavaScript')
);
Here, skills can store one or more values like 'Java', 'SQL', or both.

Key Differences:
• Value Storage: ENUM stores only one value from the list, whereas SET can store one or
more values from the list.
• Use Case: Use ENUM for single-choice fields and SET when a column needs to store
multiple choices.
• Portability: Both ENUM and SET are MySQL-specific types; other databases typically model
the same idea with CHECK constraints or lookup tables.

• Q.977
Question
What is the difference between DELETE and TRUNCATE?

Answer
• DELETE:
• Definition: DELETE is a DML (Data Manipulation Language) command used to remove
rows from a table based on a condition. It can delete specific rows or all rows.
• Behavior:
• Can delete specific rows based on a WHERE clause.
• It is logged in the transaction log, which makes it slower for large datasets.
• Triggers associated with the table are fired when DELETE is used.
• Example:
DELETE FROM employees WHERE status = 'Inactive';
• TRUNCATE:


• Definition: TRUNCATE is a DDL (Data Definition Language) command used to remove all
rows from a table. It resets the table to its empty state.
• Behavior:
• Removes all rows from the table (cannot be used with a WHERE clause).
• It is faster than DELETE because it does not log individual row deletions.
• Does not fire triggers.
• Does not reset the table’s structure (like constraints or indexes), but often resets identity
columns.
• Example:
TRUNCATE TABLE employees;

Key Differences:
• Scope: DELETE can remove specific rows, whereas TRUNCATE removes all rows.
• Performance: TRUNCATE is faster because it is minimally logged.
• Transaction Log: DELETE is fully logged, while TRUNCATE has minimal logging.
• Triggers: DELETE activates triggers, but TRUNCATE does not.
• Rollback: DELETE can always be rolled back within a transaction. TRUNCATE can be rolled
back in some DBMS (e.g., SQL Server), but in others (e.g., MySQL, Oracle) it commits
implicitly and cannot be rolled back. Note also that SQL Server disallows TRUNCATE on a
table referenced by a foreign key constraint.
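As a small runnable sketch of the DELETE side (SQLite, used here via Python's sqlite3 module, has no TRUNCATE statement; an unqualified DELETE plays that role):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INT, status TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1, 'Active'), (2, 'Inactive'), (3, 'Inactive')])

# DELETE removes only the rows matching the WHERE clause.
conn.execute("DELETE FROM employees WHERE status = 'Inactive'")
print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # 1

# An unqualified DELETE empties the table (TRUNCATE-like, but row by row).
conn.execute("DELETE FROM employees")
print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # 0
```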

• Q.978
Question
What is a scalar function?

Answer
A scalar function is a function in SQL that operates on a single value (or set of values) and
returns a single value. These functions perform operations like mathematical calculations,
string manipulations, or data type conversions.

Key Characteristics:
• Input: Takes one or more input values (arguments).
• Output: Returns a single value (scalar result).
• Example: Mathematical, string, or date functions.

Common Scalar Functions:


• Mathematical: ABS(), ROUND(), SQRT()
• String: CONCAT(), UPPER(), LENGTH()
• Date/Time: NOW(), DATEADD(), DATEDIFF()
• Conversion: CAST(), CONVERT()

Example:
SELECT UPPER('hello') AS UppercaseString;
• This uses the UPPER() scalar function to convert the string 'hello' to 'HELLO'.

Purpose: Scalar functions are used to process or transform individual values within SQL
queries, returning a single result for each input.


• Q.979
Question
What is the difference between COALESCE and IFNULL?

Answer
• COALESCE:
• Definition: COALESCE returns the first non-NULL value from a list of arguments.
• Syntax: COALESCE(value1, value2, ..., valueN)
• Behavior: It can take multiple arguments and returns the first non-NULL value. If all
values are NULL, it returns NULL.
• Example:
SELECT COALESCE(NULL, NULL, 'Hello', 'World');
Result: 'Hello' (returns the first non-NULL value).
• IFNULL:
• Definition: IFNULL checks if the first argument is NULL; if it is, it returns the second
argument; otherwise, it returns the first argument.
• Syntax: IFNULL(expression, replacement)
• Behavior: It only takes two arguments, returning the second argument if the first is NULL.
• Example:
SELECT IFNULL(NULL, 'Default');
Result: 'Default' (since the first argument is NULL).

Key Differences:
• Number of Arguments: COALESCE can take multiple arguments, whereas IFNULL only
takes two arguments.
• Flexibility: COALESCE is more flexible and can handle multiple possible fallback values,
while IFNULL is more limited to one fallback value.
• Database Compatibility: COALESCE is standard SQL, while IFNULL is typically used in
MySQL and SQLite.
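Since SQLite supports both functions, the two examples above can be run directly via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# COALESCE accepts any number of arguments and returns the first non-NULL one.
first = conn.execute("SELECT COALESCE(NULL, NULL, 'Hello', 'World')").fetchone()[0]
print(first)     # Hello

# IFNULL accepts exactly two arguments (MySQL/SQLite only).
fallback = conn.execute("SELECT IFNULL(NULL, 'Default')").fetchone()[0]
print(fallback)  # Default
```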

• Q.980
Question
What is the difference between CASE and IF?

Answer
• CASE:
• Definition: CASE is an expression used to perform conditional logic inside SQL queries,
similar to an IF-THEN-ELSE statement.
• Usage: Can be used in both SELECT statements and other SQL clauses (e.g., WHERE, ORDER
BY).
• Syntax:


CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
ELSE result3
END
• Example:
SELECT employee_id,
CASE
WHEN salary > 50000 THEN 'High'
ELSE 'Low'
END AS salary_category
FROM employees;
• Behavior: Returns a value based on the condition(s) specified.
• IF:
• Definition: IF is used for conditional logic in SQL, but it's typically more suited for
procedural code or flow control in stored procedures or functions (not in regular SQL
queries).
• Usage: Primarily used in stored procedures, triggers, or functions in SQL.
• Syntax:
IF condition THEN
-- Do something
ELSE
-- Do something else
END IF;
• Example:
DELIMITER //
CREATE PROCEDURE check_salary(IN salary INT)
BEGIN
IF salary > 50000 THEN
SELECT 'High Salary';
ELSE
SELECT 'Low Salary';
END IF;
END //
DELIMITER ;

Key Differences:
• Usage Context:
• CASE is used in SQL queries for conditional column values or expressions.
• IF is used in SQL procedures, functions, or flow control statements (not in regular SELECT
queries).
• Flexibility:
• CASE is more versatile for use in SELECT, WHERE, ORDER BY, etc.
• IF is used for more complex, procedural logic and flow control.
• Return Type:
• CASE returns a value based on conditions, while IF is more about executing specific
statements based on a condition.
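The CASE expression in a SELECT can be demonstrated with a short sqlite3 sketch (sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, 70000), (2, 40000)])

# CASE computes a per-row value inside the SELECT list.
rows = conn.execute("""
    SELECT employee_id,
           CASE WHEN salary > 50000 THEN 'High' ELSE 'Low' END AS salary_category
    FROM employees
    ORDER BY employee_id
""").fetchall()
print(rows)  # [(1, 'High'), (2, 'Low')]
```

Procedural IF blocks, by contrast, only exist inside stored programs (e.g., MySQL procedures) and cannot appear in a plain query like this.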

• Q.981
Question
What is the difference between CAST and CONVERT?

Answer
• CAST:


• Definition: CAST is a standard SQL function used to convert one data type to another.
• Usage: Works in most SQL databases and is part of the SQL standard.
• Syntax:
CAST(expression AS target_data_type)
• Example:
SELECT CAST('123' AS INT);
• Behavior: Simple and universal for type conversion across different databases.
• CONVERT:
• Definition: CONVERT is specific to SQL Server and is used to convert data from one type to
another, with optional formatting for date/time conversions.
• Usage: Primarily used in SQL Server, but not part of the SQL standard. Allows additional
style formatting for date and time conversions.
• Syntax:
CONVERT(target_data_type, expression, [style])
• Example:
SELECT CONVERT(INT, '123');
SELECT CONVERT(VARCHAR, GETDATE(), 1); -- Date format style
• Behavior: Offers more flexibility, especially for date and time formatting, but is not as
portable as CAST.

Key Differences:
• Portability: CAST is standard SQL, whereas CONVERT is specific to SQL Server.
• Flexibility: CONVERT supports additional options, such as specifying styles for date/time
formatting, while CAST is simpler and more straightforward.
• Use Case: Use CAST for general data type conversion across most SQL databases, and
CONVERT in SQL Server when additional date formatting is required.
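Because CAST is standard SQL, it works even in SQLite, as this minimal sqlite3 sketch shows (CONVERT, being SQL Server-specific, is not available here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CAST converts the string '123' to an integer.
value = conn.execute("SELECT CAST('123' AS INTEGER)").fetchone()[0]
print(value, type(value).__name__)  # 123 int
```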

• Q.982
Question
What is the difference between a multi-column subquery and a nested subquery?

Answer
• Multi-Column Subquery:
• Definition: A multi-column subquery returns multiple columns (more than one) and is
used in situations where more than one column of data is needed to be compared.
• Usage: It is typically used with the IN, ANY, or ALL operators in the WHERE clause.
• Example:
SELECT employee_id, department_id
FROM employees
WHERE (employee_id, department_id) IN (
SELECT employee_id, department_id
FROM employees
WHERE salary > 50000
);
• Behavior: The subquery returns more than one column (employee_id and
department_id), and the outer query compares these values against the corresponding
columns in the main query.
• Nested Subquery:
• Definition: A nested subquery is a subquery placed inside another subquery or SQL
query, generally used to return a single value or set of values for comparison.


• Usage: It can be used in SELECT, FROM, WHERE, or other SQL clauses, often nested within
another query.
• Example:
SELECT employee_id, salary
FROM employees
WHERE salary > (
SELECT AVG(salary)
FROM employees
);
• Behavior: The inner subquery computes an average salary, and the outer query compares
each employee's salary against this average.

Key Differences:
• Columns Returned:
• A multi-column subquery returns multiple columns, often used for comparisons
involving multiple fields.
• A nested subquery typically returns a single value or a set of values but is placed inside
another query.
• Usage:
• A multi-column subquery is commonly used with the IN or ANY operators, comparing
multiple columns.
• A nested subquery is more flexible, used in different parts of a query like WHERE, SELECT,
etc.
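Both subquery styles can be tried in one sketch via Python's sqlite3 module (row-value comparisons like `(a, b) IN (...)` require SQLite 3.15+; the data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INT, department_id INT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, 10, 40000), (2, 10, 60000), (3, 20, 80000)])

# Nested subquery: compare each salary to the overall average (60000 here).
above_avg = conn.execute("""
    SELECT employee_id FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
""").fetchall()
print(above_avg)  # [(3,)]

# Multi-column subquery: the outer query compares two columns at once.
rows = conn.execute("""
    SELECT employee_id FROM employees
    WHERE (employee_id, department_id) IN (
        SELECT employee_id, department_id FROM employees WHERE salary > 50000)
    ORDER BY employee_id
""").fetchall()
print(rows)  # [(2,), (3,)]
```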

• Q.983
Question
What is a bitmap index?

Answer
A bitmap index is a type of index in SQL that uses bitmap vectors (bit arrays) to represent
the presence or absence of values in a column. It is especially effective for columns with a
low cardinality, meaning columns that have a limited number of distinct values.

Key Features:
• Bitmap Representation: Each distinct value in the column is represented by a bitmap (a
sequence of 0s and 1s). Each bit corresponds to a row in the table, where 1 means the value is
present in that row, and 0 means it is absent.
• Efficient for Low Cardinality: Bitmap indexes are most useful when the column has a
small number of unique values (e.g., Gender, Status, or Yes/No columns).
• Storage: The index is very compact and efficient for columns with a small set of distinct
values, but can be inefficient with high-cardinality columns (e.g., Name or Email).
• Bitwise Operations: Bitmap indexes allow for fast bitwise operations (AND, OR, NOT),
which can speed up complex queries, particularly when multiple conditions are involved.

Example:
For a Gender column with values like 'Male', 'Female', 'Other', a bitmap index would create a
bitmap for each distinct gender:

+--------+-----------------------+
| Gender | Bitmap Representation |
+--------+-----------------------+
| Male   | 10101010              |
| Female | 01010101              |
| Other  | 00010000              |
+--------+-----------------------+
(each 1 bit marks a row where that gender value is present)


This allows fast filtering and joins on the Gender column.

Key Benefits:
• Faster Query Performance: Particularly for analytical queries involving multiple
conditions (e.g., WHERE Gender = 'Male' AND Status = 'Active').
• Efficient Storage: Especially for columns with a limited number of distinct values.

Limitations:
• Inefficient for High Cardinality Columns: Bitmap indexes are not suitable for columns
with many unique values as they consume more space and can be slower.
• Update Overhead: If the column being indexed is frequently updated, the bitmap index
may require more resources to maintain.

• Q.984
Question
What is the purpose of the COALESCE function in SQL?

Answer
The COALESCE function in SQL is used to return the first non-NULL value from a list of
arguments. It is useful for handling NULL values in queries and providing default values when
NULL is encountered.

Key Points:
• Purpose: To handle NULL values by replacing them with a specified default value or the
first non-NULL value in a list.
• Syntax:
COALESCE(value1, value2, ..., valueN)
It returns the first argument that is not NULL. If all arguments are NULL, it returns NULL.

Example:
SELECT COALESCE(NULL, NULL, 'Hello', 'World');

Result: 'Hello'
Explanation: The first two values are NULL, but 'Hello' is the first non-NULL value, so it is
returned.


Use Cases:
• Default Value: Replacing NULL with a default value.
SELECT COALESCE(salary, 0) FROM employees;

This returns 0 if salary is NULL.


• Fallback Logic: Returning the first available non-NULL value in a sequence of columns
or expressions.

Benefits:
• Simplifies Logic: Avoids complex CASE statements for NULL handling.
• Improves Readability: Makes SQL queries more readable and concise when dealing with
NULL values.

• Q.985
Question
What is the NTILE function in SQL?

Answer
The NTILE function in SQL is a window function that distributes rows of a result set into a
specified number of buckets or groups, based on the order defined by the ORDER BY clause.
Each bucket gets an approximately equal number of rows. It is commonly used for data
analysis to categorize data into quantiles (e.g., quartiles, deciles).

Syntax:
NTILE(number_of_buckets) OVER (ORDER BY column_name)
• number_of_buckets: The number of groups (or buckets) you want to divide the rows into.
• ORDER BY: Specifies the column(s) that define the order of rows before dividing them into
buckets.

Example:
Suppose we have the following sales table:
+-------------+---------+
| employee_id | sales |
+-------------+---------+
| 1 | 5000 |
| 2 | 3000 |
| 3 | 7000 |
| 4 | 4000 |
| 5 | 6000 |
| 6 | 2000 |
+-------------+---------+

To divide the employees into 3 groups (buckets) based on their sales, we use the NTILE
function:
SELECT employee_id, sales, NTILE(3) OVER (ORDER BY sales DESC) AS sales_bucket
FROM sales;

Result:
+-------------+---------+-------------+
| employee_id | sales | sales_bucket|
+-------------+---------+-------------+


| 3 | 7000 | 1 |
| 5 | 6000 | 1 |
| 1 | 5000 | 2 |
| 4 | 4000 | 2 |
| 2 | 3000 | 3 |
| 6 | 2000 | 3 |
+-------------+---------+-------------+

Key Points:
• The NTILE function assigns each row to a bucket numbered from 1 to n (where n is the
number of buckets).
• The result is ordered by the specified column(s) in the ORDER BY clause before the rows are
distributed into buckets.
• Rows are distributed as evenly as possible, but if the number of rows is not divisible by the
number of buckets, some buckets may contain one more row than others.

Use Cases:
• Dividing data into quartiles, deciles, or any other type of quantile.
• Ranking or segmenting data for comparative analysis.

Summary: The NTILE function is useful for distributing rows into a specific number of
groups based on their order, which is ideal for analysis tasks that require dividing data into
segments (like percentiles or rankings).
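The worked example above can be reproduced with SQLite's window-function support (SQLite 3.25+, bundled with recent Python builds), via the sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (employee_id INT, sales INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 5000), (2, 3000), (3, 7000), (4, 4000), (5, 6000), (6, 2000)])

# NTILE(3) splits the 6 rows into 3 buckets of 2, ordered by sales descending.
rows = conn.execute("""
    SELECT employee_id, sales,
           NTILE(3) OVER (ORDER BY sales DESC) AS sales_bucket
    FROM sales
    ORDER BY sales DESC
""").fetchall()
for row in rows:
    print(row)
```

The output matches the table in the answer: employees 3 and 5 land in bucket 1, 1 and 4 in bucket 2, 2 and 6 in bucket 3.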
• Q.986
Question
How do you update existing records in a table?

Answer
To update existing records in a table, you use the UPDATE statement in SQL. This statement
modifies the values of one or more columns in existing rows of a table.

Syntax:
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
• table_name: The name of the table where the data will be updated.
• SET: Specifies the columns to be updated and their new values.
• WHERE: Defines the condition to specify which rows should be updated (if omitted, all rows
are updated).

Example:
UPDATE employees
SET salary = 60000, department = 'HR'
WHERE employee_id = 101;

This updates the salary and department for the employee with employee_id = 101.

Key Points:
• WHERE Clause: Always use the WHERE clause to avoid updating all rows in the table.
• Multiple Columns: You can update multiple columns in a single UPDATE statement.


• Rollback: If you accidentally update the wrong records, you can roll back the transaction
(if using transactions).

Summary: The UPDATE statement is used to modify existing records in a table by specifying
new values for one or more columns based on a condition provided by the WHERE clause.
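The UPDATE example above, run end to end via Python's sqlite3 module (sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INT, salary INT, department TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(101, 50000, 'IT'), (102, 55000, 'IT')])

# The WHERE clause limits the update to a single row.
cur = conn.execute(
    "UPDATE employees SET salary = 60000, department = 'HR' WHERE employee_id = 101")
print(cur.rowcount)  # 1 row affected
print(conn.execute(
    "SELECT salary, department FROM employees WHERE employee_id = 101").fetchone())
# (60000, 'HR')
```

Employee 102 is untouched, which is exactly why the WHERE clause matters: without it, both rows would have been rewritten.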
• Q.987
Question
What is the MERGE statement in SQL?

Answer
The MERGE statement in SQL is used to perform insert, update, or delete operations on a
target table based on matching conditions with a source table. It allows for conditional
updates or inserts in a single statement, making it especially useful for handling "upsert"
operations (i.e., updating existing rows or inserting new ones based on certain conditions).

Syntax:
MERGE INTO target_table AS target
USING source_table AS source
ON target.column = source.column
WHEN MATCHED THEN
UPDATE SET target.column = source.column
WHEN NOT MATCHED THEN
INSERT (column1, column2, ...) VALUES (value1, value2, ...)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;

Explanation:
• target_table: The table that will be updated, inserted into, or deleted from.
• source_table: The table that provides the data for comparison.
• ON: Specifies the condition for matching rows between the target and source tables.
• WHEN MATCHED: Defines the action to take when a match is found (e.g., update).
• WHEN NOT MATCHED: Defines the action to take when no match is found (e.g., insert).
• WHEN NOT MATCHED BY SOURCE: Defines the action when a row exists in the target table
but has no corresponding row in the source table (e.g., delete).

Example:
Assume we have a products table and a new_products table. We want to update the price
of existing products and insert new products if they don't already exist.
MERGE INTO products AS p
USING new_products AS np
ON p.product_id = np.product_id
WHEN MATCHED THEN
UPDATE SET p.price = np.price
WHEN NOT MATCHED THEN
INSERT (product_id, product_name, price)
VALUES (np.product_id, np.product_name, np.price);

Key Points:
• Single Operation: MERGE allows performing multiple operations (insert, update, delete) in
one query.


• Efficient for Upserts: Commonly used for upsert operations (inserting or updating
records based on whether a match exists).
• Deletion Option: You can also delete records that no longer have a corresponding match
in the source table.

Summary: The MERGE statement in SQL combines INSERT, UPDATE, and DELETE operations
into one powerful statement, making it ideal for synchronizing data between two tables. It
compares rows from a source table with rows in a target table and performs actions based on
whether a match is found.
• Q.988
Question
What is PIVOT in SQL, and how is it different from GROUP BY?

Answer

PIVOT in SQL:
The PIVOT operation in SQL is used to rotate data, converting unique values from one
column into multiple columns in the result set. It is often used for summarizing or
transforming row data into columnar data, making it easier to analyze in a tabular format.
• Purpose: To aggregate data and display it in a cross-tabular format, where rows become
columns.
• Syntax (in SQL Server):
SELECT <non-aggregated_column>, [value1], [value2], [valueN]
FROM
(SELECT <columns> FROM <table>) AS source
PIVOT
(AGGREGATE_FUNCTION(<value_column>) FOR <pivot_column> IN ([value1], [value2], [valueN])) AS pvt;

Example:
Suppose you have a sales table:
+-------------+--------+---------+------------+
| product_id | month | sales | region |
+-------------+--------+---------+------------+
| 1 | Jan | 100 | East |
| 1 | Feb | 120 | East |
| 2 | Jan | 150 | West |
| 2 | Feb | 180 | West |
+-------------+--------+---------+------------+

To display total sales by product and month (pivoting on month):


SELECT product_id, [Jan], [Feb]
FROM
(SELECT product_id, month, sales FROM sales) AS source
PIVOT
(SUM(sales) FOR month IN ([Jan], [Feb])) AS pvt;

Result:
+-------------+-----+-----+
| product_id | Jan | Feb |
+-------------+-----+-----+
| 1 | 100 | 120 |


| 2 | 150 | 180 |
+-------------+-----+-----+

GROUP BY in SQL:
The GROUP BY clause is used to group rows that have the same values into summary rows,
like calculating aggregates (e.g., COUNT(), SUM(), AVG(), etc.) for each group. It is commonly
used when you want to perform aggregation on data.
• Purpose: To aggregate data based on common values and provide summarized results for
each group.
• Syntax:
SELECT column, AGGREGATE_FUNCTION(column)
FROM table
GROUP BY column;

Example:
To get total sales per product and month:
SELECT product_id, month, SUM(sales) AS total_sales
FROM sales
GROUP BY product_id, month;

Result:
+-------------+--------+------------+
| product_id | month | total_sales|
+-------------+--------+------------+
| 1 | Jan | 100 |
| 1 | Feb | 120 |
| 2 | Jan | 150 |
| 2 | Feb | 180 |
+-------------+--------+------------+

Key Differences:
• Purpose:
• PIVOT: Rotates rows into columns to create a summary table with different columns for
each distinct value.
• GROUP BY: Groups rows based on a specific column or set of columns and applies
aggregate functions to each group.
• Output:
• PIVOT: Transforms data into a new format where distinct values from a column become
column headers.
• GROUP BY: Summarizes data in a grouped format with one row per group, showing
aggregated results.
• Complexity:
• PIVOT: Typically requires more complex syntax and is often used when you need to
create a cross-tab report.
• GROUP BY: Easier to use for simple aggregation but doesn’t transform the data structure
into a wide format like PIVOT.
• Flexibility:
• PIVOT: Useful for specific cases like creating dynamic column headers (e.g., months as
columns for sales).


• GROUP BY: More flexible for general aggregation tasks and works in a wider variety of
scenarios.

Summary: The PIVOT function in SQL is used to convert unique values into columns for
creating a cross-tab view, whereas GROUP BY is used to aggregate data by grouping rows with
common values and applying functions to summarize them.
• Q.989
Question
How do you perform a conditional update in SQL?

Answer
A conditional update in SQL is performed using the UPDATE statement in combination with
a WHERE clause that specifies the conditions under which the records should be updated. You
can also use conditional expressions like CASE or IF to update based on specific criteria.

Basic Syntax:
UPDATE table_name
SET column_name = new_value
WHERE condition;
• column_name: The column to update.
• new_value: The new value you want to set for the column.
• condition: Specifies which rows should be updated.

Example 1: Simple Conditional Update


UPDATE employees
SET salary = 70000
WHERE employee_id = 101;

This updates the salary of the employee with employee_id = 101 to 70000.

Example 2: Using CASE for Conditional Update


You can use the CASE expression to apply different updates based on a condition.
UPDATE employees
SET salary =
CASE
WHEN department = 'HR' THEN 60000
WHEN department = 'IT' THEN 80000
ELSE salary
END
WHERE department IN ('HR', 'IT');

This updates the salary based on the department:


• For 'HR', set salary to 60000.
• For 'IT', set salary to 80000.
• If not in 'HR' or 'IT', the salary remains unchanged.

Example 3: Update Using Multiple Conditions


UPDATE products
SET price = price * 1.1
WHERE category = 'Electronics' AND stock > 100;


This increases the price by 10% for products in the 'Electronics' category where the stock is
greater than 100.

Key Points:
• The WHERE clause is crucial to limit which rows will be updated; without it, all rows will
be updated.
• CASE can be used for conditional logic in a single update statement to set different values
based on conditions.
• The AND, OR, and other logical operators can be used within the WHERE clause for more
complex conditions.

Summary: A conditional update in SQL is done using the UPDATE statement with a WHERE
clause. You can also use CASE to set different values based on conditions within the same
query.
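The CASE-based conditional update (Example 2 above) can be verified with a small sqlite3 sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INT, department TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, 'HR', 50000), (2, 'IT', 50000), (3, 'Sales', 50000)])

# One UPDATE, different new values per department.
conn.execute("""
    UPDATE employees
    SET salary = CASE
                     WHEN department = 'HR' THEN 60000
                     WHEN department = 'IT' THEN 80000
                     ELSE salary
                 END
    WHERE department IN ('HR', 'IT')
""")
result = conn.execute(
    "SELECT department, salary FROM employees ORDER BY employee_id").fetchall()
print(result)  # [('HR', 60000), ('IT', 80000), ('Sales', 50000)]
```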
• Q.990
Question
What is query optimization in SQL?

Answer
Query optimization is the process of improving the performance of an SQL query by
reducing its execution time and resource consumption (like CPU, memory, and disk I/O). It
involves analyzing and modifying queries to ensure they execute in the most efficient way
possible, based on the database structure, indexes, and execution plans.

Key Aspects of Query Optimization:


• Choosing Efficient Execution Plans:
• The database query optimizer decides the most efficient execution plan by analyzing
available indexes, join methods, and query structure.
• It compares different ways to execute the query and picks the one with the least estimated
cost.
• Index Usage:
• Proper use of indexes can speed up query performance by reducing the number of rows
that need to be scanned.
• Optimizing queries to take advantage of existing indexes is a major part of query
optimization.
• Avoiding Full Table Scans:
• Instead of scanning the entire table, optimized queries aim to limit the number of rows read
using indexes or filtering conditions.
• Efficient Joins:
• The optimizer decides on the best join type (e.g., INNER JOIN, LEFT JOIN, HASH JOIN,
MERGE JOIN) and the order of joins to minimize resource usage.
• Reducing Subqueries and Unnecessary Operations:
• In some cases, subqueries can be replaced with joins, or redundant operations can be
eliminated to improve performance.


• Using Proper Aggregations:


• Queries with aggregation functions like SUM(), COUNT(), and AVG() should be optimized
by ensuring they are applied to the right subset of data.

Techniques for Query Optimization:


• Indexing: Creating appropriate indexes on columns frequently used in WHERE, JOIN, or
ORDER BY clauses.
• Analyzing Execution Plans: Using tools like EXPLAIN (or EXPLAIN ANALYZE) to see the
database’s query execution plan.
• Limiting Result Set: Using LIMIT or TOP to restrict the number of rows returned when
only a subset of data is needed.
• Avoiding SELECT *: Avoid SELECT * and specify only the necessary columns to
reduce I/O and processing overhead.
• Rewriting Queries: Restructuring queries for better performance, like replacing
subqueries with joins or using IN instead of multiple OR conditions.

Example:
For a query that performs a full table scan:
SELECT * FROM employees WHERE salary > 50000;

You could add an index on the salary column to optimize it:


CREATE INDEX idx_salary ON employees(salary);

After the index is created, the query optimizer may use the index to quickly find rows with a
salary > 50000, avoiding a full table scan.

Summary: Query optimization in SQL is the process of enhancing the performance of queries by selecting efficient execution paths, utilizing indexes, minimizing resource consumption, and restructuring queries when necessary. It ensures faster execution and reduces the load on the database system.
• Q.991
Question
How do you analyze query performance in SQL?

Answer
To analyze query performance in SQL, you can use the following methods:
• EXPLAIN: Shows the execution plan of a query, helping identify bottlenecks (like full
table scans, missing indexes, etc.).
EXPLAIN SELECT * FROM employees WHERE salary > 50000;
• EXPLAIN ANALYZE: Provides both the execution plan and actual runtime statistics.
EXPLAIN ANALYZE SELECT * FROM employees WHERE salary > 50000;
• Query Profiling: Tools like SHOW PROFILE (in MySQL) provide detailed time breakdowns
of query execution.
• Index Analysis: Check if indexes are being used effectively with the query, especially for
columns in WHERE, JOIN, or ORDER BY.
• Database Logs: Review slow query logs to identify queries that take longer to execute.


• Database Performance Tools: Use tools like SQL Server Profiler, MySQL Workbench,
or Oracle SQL Developer for detailed performance metrics.
• Q.992
Question
What is an execution plan in SQL?

Answer
An execution plan in SQL is a detailed roadmap that the database engine follows to execute
a query. It shows how the database will retrieve data, including the use of indexes, join
methods, and access paths.

Key Elements:
• Scan: Full table scan or index scan.
• Join Type: Inner join, outer join, hash join, etc.
• Sort: How rows are sorted (e.g., ORDER BY).
• Filters: Conditions applied during data retrieval.

How to View:
• MySQL: EXPLAIN SELECT * FROM table;
• SQL Server: SET SHOWPLAN_ALL ON; or use Execution Plan in SSMS.
• PostgreSQL: EXPLAIN ANALYZE SELECT * FROM table;
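
As an illustration, the plan below is the general shape PostgreSQL prints for the earlier salary query. The exact costs, row estimates, and node names vary by database, version, and table statistics, so treat it as a sketch rather than exact output:

```sql
EXPLAIN SELECT * FROM employees WHERE salary > 50000;

-- Illustrative output shape (PostgreSQL; the numbers are made up):
--   Seq Scan on employees  (cost=0.00..35.50 rows=12 width=244)
--     Filter: (salary > 50000)
--
-- With an index on salary, the planner may switch to:
--   Index Scan using idx_salary on employees  (cost=0.29..8.45 rows=12 width=244)
--     Index Cond: (salary > 50000)
```

Reading the plan from the innermost node outward shows which access path the optimizer chose and roughly how expensive it expects each step to be.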
• Q.993
Question
What is the difference between Sequence Scan and Bitmap Scan in SQL?

Answer
• Sequence Scan: A sequential scan reads the entire table row by row. It is used when no
indexes are available or when scanning all rows is faster than using an index (e.g., for small
tables).
• Bitmap Scan: A bitmap scan builds an in-memory bitmap of matching row locations from one or more ordinary indexes, then fetches those rows from the table. It is typically used for queries with multiple conditions (e.g., AND, OR) on indexed columns, since the per-index bitmaps can be combined efficiently.

Key Differences:
• Sequence Scan: Reads all rows sequentially; uses no index.
• Bitmap Scan: Combines the results of one or more index scans through an in-memory bitmap, making it more efficient for large tables and queries with multiple conditions.
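
A minimal PostgreSQL sketch showing when a bitmap scan typically appears. The table, indexes, and plan shape below are illustrative; on very small tables the planner may still prefer a sequential scan:

```sql
CREATE TABLE orders_demo (
    id     INT PRIMARY KEY,
    status VARCHAR(20),
    region VARCHAR(20)
);
CREATE INDEX idx_status ON orders_demo(status);
CREATE INDEX idx_region ON orders_demo(region);

-- OR across two separately indexed columns is a classic
-- bitmap-scan candidate
EXPLAIN SELECT * FROM orders_demo
WHERE status = 'shipped' OR region = 'EU';

-- Illustrative plan shape (costs omitted):
--   Bitmap Heap Scan on orders_demo
--     Recheck Cond: ((status = 'shipped') OR (region = 'EU'))
--     ->  BitmapOr
--           ->  Bitmap Index Scan on idx_status
--           ->  Bitmap Index Scan on idx_region
```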
• Q.994
Question
How do you manage user permissions in SQL?

Answer


User permissions in SQL are managed using GRANT, REVOKE, and DENY commands.
• GRANT: Assigns specific privileges to users or roles on database objects (tables, views,
etc.).
GRANT SELECT, INSERT ON employees TO user_name;
• REVOKE: Removes previously granted permissions from a user or role.
REVOKE SELECT, INSERT ON employees FROM user_name;
• DENY (SQL Server-specific): Explicitly denies a permission to a user, overriding any grant the user would otherwise inherit from a role.
DENY DELETE ON employees TO user_name;
• SHOW GRANTS (MySQL): Displays the current privileges for a user.
SHOW GRANTS FOR user_name;

Permissions can be granted at various levels (e.g., database, schema, table) and can be based
on roles to simplify management.
• Q.995
Question
What are the types of data integrity in SQL?

Answer
The main types of data integrity are:
• Entity Integrity: Ensures that each row in a table has a unique identifier (primary key) and
that no part of the primary key can be NULL.
• Referential Integrity: Ensures that foreign keys correctly reference valid rows in other
tables. It maintains relationships between tables.
• Domain Integrity: Ensures that all column values are within a defined domain (i.e., valid
data types, ranges, or specific values). This is enforced by constraints like CHECK, NOT NULL,
and DEFAULT.
• User-Defined Integrity: Custom rules defined by users that do not fall under other types,
ensuring the data follows business rules or logic.
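
The four types can be seen side by side in a small schema sketch (table and column names are hypothetical; the inline REFERENCES syntax shown is PostgreSQL/standard SQL style):

```sql
CREATE TABLE departments (
    dept_id INT PRIMARY KEY                          -- entity integrity
);

CREATE TABLE staff (
    staff_id  INT PRIMARY KEY,                       -- entity integrity
    dept_id   INT REFERENCES departments(dept_id),   -- referential integrity
    email     VARCHAR(100) NOT NULL,                 -- domain integrity
    salary    DECIMAL(10, 2) CHECK (salary >= 0),    -- domain integrity
    hire_date DATE DEFAULT CURRENT_DATE              -- domain integrity
);
-- User-defined integrity (business rules such as "a raise may not
-- exceed 20% per year") is usually enforced with triggers or
-- application logic rather than declarative constraints.
```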
• Q.996
Question
What is role-based access control (RBAC)?

Answer
Role-based access control (RBAC) is a security model where access to resources is granted
based on a user's role within an organization. Users are assigned roles, and each role has
specific permissions to perform actions on the database.

Key Elements:
• Roles: A set of permissions assigned to a group (e.g., Admin, User, Manager).
• Users: Individuals assigned to one or more roles.
• Permissions: Rights granted to roles, such as SELECT, INSERT, UPDATE, DELETE.

Example:
CREATE ROLE manager;
GRANT SELECT, UPDATE ON employees TO manager;


CREATE USER john;


GRANT manager TO john;

Here, john inherits the permissions of the manager role.


• Q.997
Question
What is the difference between COMMIT and ROLLBACK in SQL?

Answer
• COMMIT: Finalizes the changes made in a transaction and makes them permanent in the
database. Once committed, changes cannot be undone.
COMMIT;
• ROLLBACK: Undoes all changes made during the current transaction, reverting the
database to its state before the transaction began.
ROLLBACK;

Key Difference:
• COMMIT makes changes permanent.
• ROLLBACK cancels changes made in the transaction, restoring the previous state.
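
A minimal sketch of both outcomes, assuming a hypothetical accounts table (use START TRANSACTION in MySQL; many clients auto-commit each statement unless a transaction is opened explicitly):

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;     -- both updates are now permanent

BEGIN;
DELETE FROM accounts WHERE id = 1;
ROLLBACK;   -- the delete is undone; the row is still there
```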
• Q.998
Question
What is a covering index in SQL?

Answer
A covering index is an index that includes all the columns needed for a query, allowing the
database to satisfy the query entirely using the index, without needing to access the actual
table data.

Example:
If a query selects columns A, B, and C:
SELECT A, B, C FROM table WHERE A = 'value';

You can create a covering index like this:


CREATE INDEX idx_covering ON table(A, B, C);

With this index, the database can return the query result directly from the index, avoiding the
need to access the table.

Benefits:
• Improves performance by reducing I/O operations (no need to fetch data from the table).
• Fast retrieval for queries that need specific columns indexed.
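
In SQL Server and PostgreSQL 11+, the same effect can be achieved while keeping the searchable key small by attaching the extra columns as non-key INCLUDE columns (syntax support varies by database and version; table_name and columns here continue the example above):

```sql
-- Only A participates in index lookups; B and C are stored in the
-- index leaf pages purely so the query can be answered from the index
CREATE INDEX idx_covering_incl ON table_name (A) INCLUDE (B, C);
```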
• Q.999
Question
How do you avoid full table scans in SQL?


Answer
To avoid full table scans:
• Use Indexes: Ensure columns used in WHERE, JOIN, or ORDER BY have appropriate indexes.
• Example: Index on employee_id for fast lookups.
• Optimize Queries: Write selective queries with WHERE clauses to limit the number of rows
scanned.
• Analyze Execution Plans: Use EXPLAIN to check if indexes are being used or if full table
scans are occurring.
• Use Partitioning: Partition large tables to split data into smaller, manageable chunks,
reducing the number of rows scanned.
• Limit Results: Use LIMIT or TOP to restrict the number of rows returned when only a
subset of data is needed.
• Avoid SELECT *: Specify only the necessary columns to reduce unnecessary data retrieval.
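
The partitioning point can be sketched as follows (PostgreSQL 10+ declarative partitioning; the table and date ranges are hypothetical):

```sql
CREATE TABLE sales (
    sale_id   INT,
    sale_date DATE,
    amount    DECIMAL(10, 2)
) PARTITION BY RANGE (sale_date);

CREATE TABLE sales_2023 PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE sales_2024 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- A query filtered on sale_date can now scan a single partition
-- instead of the whole table:
SELECT SUM(amount) FROM sales
WHERE sale_date >= '2024-01-01' AND sale_date < '2024-04-01';
```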
• Q.1000
Question
What is a filtered index in SQL?

Answer
A filtered index is an index that is created with a filter condition, indexing only a subset of
rows that meet the specified criteria. This reduces the index size and improves query
performance when queries frequently reference specific rows.

Syntax:
CREATE INDEX index_name
ON table_name (column_name)
WHERE condition;

Example:
CREATE INDEX idx_active_employees
ON employees (status)
WHERE status = 'Active';

This index will only include rows where status = 'Active', improving performance for
queries filtering on active employees.
• Q.1001
Question
What is data encryption in SQL?

Answer
Data encryption in SQL is the process of converting sensitive data into an unreadable format
to prevent unauthorized access. It ensures that data stored in the database is secure, even if
someone gains access to the database files.

Types of Encryption in SQL:


• Transparent Data Encryption (TDE): Encrypts entire databases, making data on disk
unreadable without the decryption key.
• Example: Used in SQL Server and Oracle.
• Column-level Encryption: Encrypts specific columns (e.g., personal information) while
leaving other data unencrypted.
• Example: AES_ENCRYPT() and AES_DECRYPT() in MySQL.
• Backup Encryption: Encrypts database backups to protect sensitive data during storage or
transfer.
• SSL/TLS Encryption: Secures the connection between client applications and the
database server, ensuring data is encrypted during transmission.

Example (MySQL):
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100),
password VARBINARY(255)
);

-- Insert encrypted data (using AES encryption)


INSERT INTO users (name, email, password)
VALUES ('John Doe', '[email protected]', AES_ENCRYPT('mypassword', 'encryption_key'));
• Q.1002
Question
What is data masking in SQL?

Answer
Data masking in SQL is the process of hiding sensitive data by replacing it with fictional or
obfuscated values, while retaining its original format. It allows developers, testers, or other
users to work with realistic data without exposing confidential information.

Types of Data Masking:


• Static Data Masking: Data is permanently obfuscated in the database for non-production
environments (e.g., in test databases).
• Dynamic Data Masking: Data is obfuscated on the fly when queried, without altering the
actual data in the database.

Example (SQL Server - Dynamic Masking):


CREATE TABLE employees (
emp_id INT,
name VARCHAR(100),
salary DECIMAL(10, 2) MASKED WITH (FUNCTION = 'default()')
);

SELECT emp_id, name, salary FROM employees;

In this case, the salary column is masked when queried by users without the UNMASK permission, returning the default masked value (0 for numeric types) instead of the actual data.

Benefits:
• Protects sensitive data in non-production environments.
• Reduces exposure of confidential information while maintaining data usability.


• Q.1003
Question
What is data modeling?

Answer
Data modeling is the process of designing the structure of a database, including its tables,
columns, relationships, and constraints. It helps define how data is stored, accessed, and
manipulated, ensuring the database supports the organization's needs effectively.

Types of Data Models:


• Conceptual Data Model: High-level model focusing on business requirements and key
entities without specifying details.
• Logical Data Model: Specifies the structure of data elements and relationships, without
concern for physical storage.
• Physical Data Model: Describes how the data is physically stored in the database,
including indexes, partitions, and file storage.

Key Components:
• Entities: Objects or concepts that store data (e.g., Customer, Order).
• Attributes: Data fields within entities (e.g., Customer Name, Order Date).
• Relationships: Associations between entities (e.g., Customer places Order).
Data modeling ensures efficient data storage, retrieval, and consistency.
• Q.1004
Question
What is an Entity-Relationship Diagram (ERD)?

Answer
An Entity-Relationship Diagram (ERD) is a visual representation of the entities in a
database and their relationships. It helps in designing and understanding the structure of a
database.

Key Components:
• Entities: Represented by rectangles; they are objects or concepts (e.g., Customer,
Product).
• Attributes: Represented by ovals; they are properties or details of entities (e.g., Customer
Name, Product Price).
• Relationships: Represented by diamonds; they show how entities are related (e.g.,
Customer places Order).
• Primary Key: Underlined attribute that uniquely identifies each entity.

Example:
An ERD for a Customer placing an Order might show:


• Customer (Entity) → CustomerID (Attribute, Primary Key)


• Order (Entity) → OrderID (Attribute, Primary Key)
• Relationship: A Customer places an Order.
ERDs are crucial for visualizing database structure and relationships.
• Q.1005
Question
What is normalization in database design?

Answer
Normalization is the process of organizing a database to reduce data redundancy and
improve data integrity by dividing large tables into smaller ones and defining relationships
between them.

Key Normal Forms:


• 1NF (First Normal Form): Ensures all columns contain atomic (indivisible) values and
each record is unique.
• 2NF (Second Normal Form): Achieved by removing partial dependencies, ensuring non-
key columns depend on the whole primary key.
• 3NF (Third Normal Form): Removes transitive dependencies, ensuring non-key columns
are dependent only on the primary key.

Benefits:
• Reduces data duplication.
• Ensures consistency and integrity.
• Improves query performance by reducing redundant data.
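
A small decomposition sketch (hypothetical tables): in the unnormalized form, customer details repeat on every order row; in 3NF they live once in their own table and are referenced by key:

```sql
-- Before: orders(order_id, customer_name, customer_city, product, price)
-- customer_city depends on customer_name, not on the key order_id:
-- a transitive dependency, and the customer's city is duplicated
-- on every one of their orders.

-- After (3NF):
CREATE TABLE customers (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(100),
    customer_city VARCHAR(100)
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    product     VARCHAR(100),
    price       DECIMAL(10, 2)
);
```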
• Q.1006
Question
What is the difference between Star Schema and Snowflake Schema in data warehousing?

Answer
• Star Schema:
In a Star Schema, the fact table is at the center and is directly connected to dimension tables.
The structure is simple, with no normalization. It is easy to query and performs faster for read
operations but may have data redundancy.
• Snowflake Schema:
In a Snowflake Schema, the dimension tables are normalized, meaning they are broken into
multiple related tables. This reduces redundancy but increases the complexity of queries and
joins. It is space-efficient but can be slower for querying due to multiple joins.
Key Difference:
• Star Schema has a denormalized structure, while Snowflake Schema is normalized.


• Star Schema is simpler and faster, whereas Snowflake Schema is more complex and space-
efficient.
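
The structural difference can be sketched with a single product dimension (hypothetical tables; a fact table would reference product_id in either design):

```sql
-- Star: denormalized dimension; category_name repeats per product
CREATE TABLE dim_product_star (
    product_id    INT PRIMARY KEY,
    product_name  VARCHAR(100),
    category_name VARCHAR(100)
);

-- Snowflake: the same dimension normalized into two tables,
-- at the cost of an extra join at query time
CREATE TABLE dim_category (
    category_id   INT PRIMARY KEY,
    category_name VARCHAR(100)
);

CREATE TABLE dim_product_snow (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(100),
    category_id  INT REFERENCES dim_category(category_id)
);
```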

Company it was asked: Amazon
• Q.1007
Question
What is a Fact Table and a Dimension Table in data warehousing?

Answer
• Fact Table:
A Fact Table contains the quantitative data (facts) used for analysis, such as sales revenue,
quantity sold, or profit. It is the central table in a schema and typically includes foreign keys
that link it to dimension tables.
• Dimension Table:
A Dimension Table contains descriptive attributes (dimensions) related to the facts, such as
product details, customer information, or time periods. It provides context to the data in the
fact table and is used for filtering, grouping, or categorizing the facts.
Example:
In a sales schema:
• Fact Table: Sales with columns like Sale_ID, Product_ID, Date_ID, Amount.
• Dimension Table: Product with columns like Product_ID, Product_Name, Category.

Company it was asked: Microsoft
• Q.1008
Question
What is ELT in data processing?

Answer
ELT (Extract, Load, Transform) is a data integration process where:
• Extract: Data is pulled from multiple sources.
• Load: The raw data is loaded directly into a target system, such as a data warehouse or
data lake.
• Transform: The data is then transformed within the target system using its computational
power.
Key Features:
• Suitable for modern cloud-based data warehouses like Snowflake, BigQuery, or Redshift.


• Allows for faster data loading since transformations happen after loading.
• Enables handling large-scale data processing.
Example:
• Extract data from APIs and databases.
• Load it into a data lake (e.g., Amazon S3).
• Transform it using SQL or tools like dbt in a cloud data warehouse.

Company it was asked: Google
• Q.1009
Question
How is SQL better than other DBMS systems?

Answer
SQL (Structured Query Language), together with the relational database systems built on it, compares favorably with other DBMS approaches for the following reasons:
• Standardized Language:
SQL is a globally recognized and standardized language for managing relational databases.
• Ease of Use:
It has a simple, declarative syntax, making it easy to learn and use for querying, updating, and
managing data.
• Versatility:
SQL is supported by almost all relational database management systems (e.g., MySQL,
PostgreSQL, Oracle, MS SQL Server), ensuring compatibility and portability.
• Powerful Querying:
SQL allows complex queries using joins, aggregations, and subqueries to retrieve meaningful
insights from data.
• Scalability:
It handles large-scale data efficiently with indexing and partitioning, making it suitable for
enterprise applications.
• Integration:
SQL integrates seamlessly with analytics, reporting tools, and programming languages like
Python and R.
Why SQL Over Others:
While some NoSQL databases like MongoDB or Cassandra are better for unstructured data
and scalability, SQL excels in structured data management, ensuring data integrity,
consistency, and powerful analytics.


Company it was asked: Meta
• Q.1010
Question
What are the different types of Relational Database Management Systems (RDBMS)?

Answer
There are several types of RDBMS based on features, architecture, and usage. Here are the
main categories:
• Open Source RDBMS:
• Free to use and often customizable.
• Examples:
• MySQL: Popular for web applications and small-medium businesses.
• PostgreSQL: Known for advanced features like JSON support and extensibility.
• MariaDB: A fork of MySQL with additional features.
• Commercial RDBMS:
• Proprietary software with enterprise-grade support and advanced features.
• Examples:
• Oracle Database: Known for its robustness and high scalability.
• Microsoft SQL Server: Commonly used in Windows-based environments.
• IBM Db2: Used in large-scale enterprise applications.
• Cloud-Based RDBMS:
• Managed services on the cloud, reducing operational overhead.
• Examples:
• Amazon RDS (Relational Database Service): Supports MySQL, PostgreSQL, SQL
Server, and more.
• Google Cloud SQL: Managed MySQL, PostgreSQL, and SQL Server.
• Azure SQL Database: A cloud-based version of Microsoft SQL Server.
• Embedded RDBMS:
• Used for applications that require a lightweight, embedded database.
• Examples:
• SQLite: Lightweight, serverless, and widely used in mobile apps.
• H2: A fast, in-memory RDBMS often used in Java applications.
• Distributed RDBMS:
• Designed to run on multiple servers to handle large-scale distributed data.
• Examples:
• CockroachDB: Horizontally scalable and fault-tolerant.
• Google Spanner: A globally distributed RDBMS.
• In-Memory RDBMS:
• Stores data in memory for faster processing.
• Examples:
• SAP HANA: Known for real-time analytics.
• MemSQL (now SingleStore): Optimized for high-speed data handling.
• Object-Relational RDBMS (ORDBMS):


• Extends RDBMS by supporting object-oriented features like inheritance.


• Examples:
• PostgreSQL: Offers extensive object-relational capabilities.
• Informix: Supports hybrid data types.

Company it was asked: Amazon

50+ Super Hard Questions & Answers


• Q.1011
You are tasked with identifying the top 3 customers who have spent the most in the last
two quarters. To find these customers, you need to calculate the total spend for each
customer in each quarter and ensure that they have made at least 5 orders in each of the last
two quarters. Then, you need to rank the customers based on their total spend and return the
top 3 customers. The output should include the customer name, total spend over the last two
quarters, and the average order amount over the last two quarters.

Create the tables:


-- Create Customers Table
CREATE TABLE Customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(255),
signup_date DATE,
region VARCHAR(50)
);

-- Insert sample records into Customers table


INSERT INTO Customers (customer_id, customer_name, signup_date, region) VALUES
(101, 'John Doe', '2020-01-15', 'North America'),
(102, 'Jane Smith', '2021-03-22', 'Europe'),
(103, 'Alice Johnson', '2019-07-08', 'North America'),
(104, 'Bob Brown', '2022-11-19', 'Asia'),
(105, 'Charlie Davis', '2018-02-03', 'Europe'),
(106, 'David Wilson', '2020-06-17', 'North America'),
(107, 'Emma Harris', '2017-05-28', 'Africa'),
(108, 'Frank Miller', '2021-10-09', 'Asia'),
(109, 'Grace Lee', '2020-12-25', 'South America'),
(110, 'Henry Clark', '2019-04-12', 'Europe');

-- Create Orders Table


CREATE TABLE Orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
amount DECIMAL(10, 2),
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);

-- Insert sample records into Orders table


INSERT INTO Orders (order_id, customer_id, order_date, amount) VALUES
(1001, 101, '2022-01-10', 150.00),
(1002, 102, '2022-02-05', 200.00),
(1003, 103, '2022-03-15', 180.00),
(1004, 101, '2022-04-20', 220.00),
(1005, 104, '2022-01-30', 250.00),
(1006, 105, '2022-02-25', 300.00),
(1007, 106, '2022-03-10', 130.00),


(1008, 107, '2022-04-05', 350.00),


(1009, 108, '2022-03-28', 170.00),
(1010, 109, '2022-02-14', 140.00),
(1011, 110, '2022-01-18', 220.00),
(1012, 101, '2022-02-22', 190.00),
(1013, 102, '2022-04-30', 280.00),
(1014, 103, '2022-03-10', 300.00),
(1015, 104, '2022-02-15', 230.00),
(1016, 105, '2022-01-05', 160.00),
(1017, 106, '2022-04-25', 170.00),
(1018, 107, '2022-03-05', 400.00),
(1019, 108, '2022-04-12', 190.00),
(1020, 109, '2022-01-23', 180.00),
(1021, 110, '2022-02-17', 210.00),
(1022, 101, '2022-03-30', 160.00),
(1023, 102, '2022-01-10', 270.00),
(1024, 103, '2022-04-01', 240.00),
(1025, 104, '2022-03-20', 320.00),
(1026, 105, '2022-03-25', 350.00),
(1027, 106, '2022-02-15', 230.00),
(1028, 107, '2022-01-07', 360.00),
(1029, 108, '2022-03-05', 210.00),
(1030, 109, '2022-04-10', 190.00),
(1031, 110, '2022-02-22', 180.00);

Question Explanation:
• Data Structure:
• The Customers table holds information about each customer, including their
customer_id, customer_name, signup_date, and region.
• The Orders table holds information about each order placed by a customer, including the
order_id, customer_id, order_date, and amount spent.
• Logic:
• For each customer, we need to calculate the total spending over the last two quarters.
• Ensure the customer has placed at least 5 orders in each of the last two quarters.
• The result should include the total spending over the last two quarters, the average order
amount, and the customer name.
• Finally, we rank the customers by total spend and return the top 3.
• Challenges:
• Calculating the total spend for the last two quarters and determining which months fall
within those quarters.
• Filtering customers who made at least 5 orders per quarter.
• Ranking customers by their total spending in the last two quarters.

SQL Solution:
WITH quarterly_orders AS (
SELECT
o.customer_id,
EXTRACT(YEAR FROM o.order_date) AS order_year,
CASE
WHEN EXTRACT(MONTH FROM o.order_date) BETWEEN 1 AND 3 THEN 'Q1'
WHEN EXTRACT(MONTH FROM o.order_date) BETWEEN 4 AND 6 THEN 'Q2'
WHEN EXTRACT(MONTH FROM o.order_date) BETWEEN 7 AND 9 THEN 'Q3'
WHEN EXTRACT(MONTH FROM o.order_date) BETWEEN 10 AND 12 THEN 'Q4'
END AS quarter,
COUNT(o.order_id) AS num_orders,
SUM(o.amount) AS total_spend,
AVG(o.amount) AS avg_order_amount
FROM Orders o
WHERE o.order_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY o.customer_id, EXTRACT(YEAR FROM o.order_date), quarter


),
filtered_customers AS (
SELECT
q.customer_id,
q.order_year,
q.quarter,
q.num_orders,
q.total_spend,
q.avg_order_amount
FROM quarterly_orders q
WHERE q.num_orders >= 5 -- Filter customers with at least 5 orders in a quarter
),
customer_spending AS (
SELECT
fc.customer_id,
SUM(fc.total_spend) AS total_spend_last_two_quarters,
AVG(fc.avg_order_amount) AS avg_order_amount_last_two_quarters
FROM filtered_customers fc
WHERE fc.quarter IN ('Q3', 'Q4') -- Last two quarters
GROUP BY fc.customer_id
),
ranked_customers AS (
SELECT
cs.customer_id,
cs.total_spend_last_two_quarters,
cs.avg_order_amount_last_two_quarters,
ROW_NUMBER() OVER (ORDER BY cs.total_spend_last_two_quarters DESC) AS rank
FROM customer_spending cs
)
SELECT
c.customer_name,
rc.total_spend_last_two_quarters,
rc.avg_order_amount_last_two_quarters
FROM ranked_customers rc
JOIN Customers c ON c.customer_id = rc.customer_id
WHERE rc.rank <= 3; -- Top 3 customers by total spending

Expected Output (rows ordered by total spend, descending):

customer_name     total_spend_last_two_quarters   avg_order_amount_last_two_quarters
Alice Johnson     480.00                          200.00
Jane Smith        470.00                          195.00
John Doe          330.00                          165.00

Explanation of the SQL Query:


• Step 1 (Quarterly Breakdown): We extract each customer's spending data on a quarterly
basis using EXTRACT(MONTH FROM o.order_date) and categorize it into quarters (Q1, Q2,
Q3, Q4).
• Step 2 (Filter by Order Count): We ensure that the customer has made at least 5 orders
in each quarter using the WHERE q.num_orders >= 5 condition.
• Step 3 (Sum and Average Calculation): We calculate the total spend and average order amount for customers in the last two quarters (Q3 and Q4).


• Step 4 (Ranking): The ROW_NUMBER() function is used to rank customers by total spend in
descending order.
• Final Output: We return the top 3 customers based on their total spending over the last
two quarters, along with their average order amount.

This problem involves multiple steps like filtering, aggregation, date manipulation, and
ranking. It challenges you to apply SQL features such as EXTRACT(), CASE, GROUP BY,
ROW_NUMBER(), and conditional aggregation for real-world business logic.

• Q.1012
Find the customer who made the most number of unique purchases from the same
product category within the last year.

Explanation of the Question:


This question is designed to test your understanding of:
• Filtering data based on a time period (last year).
• Counting distinct purchases made by a customer in a particular product category.
• Using joins and grouping to aggregate data efficiently.
The goal is to identify customers who show a strong preference for certain product
categories, considering only distinct purchases within the last year.

Learnings:
• Time Period Filtering:
• You will learn how to filter data for a specific time range using date arithmetic (e.g., CURDATE() - INTERVAL 1 YEAR in MySQL, CURRENT_DATE - INTERVAL '1 year' in PostgreSQL).
• Distinct Count:
• You will need to count the distinct products purchased by each customer from the same
category. This will require understanding how to use COUNT(DISTINCT column_name).
• Joins and Grouping:
• You'll gain experience using INNER JOINs to combine customer, orders, and product
tables.
• Using GROUP BY in SQL to group data by both customer and product category.
• Subqueries:
• To narrow down the most frequent purchaser, you may need to use a subquery to filter the
top customer per category.

SQL Schemas and Datasets

Create and Insert Statements


-- Create the customers table
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(255),
registration_date DATE
);


-- Create the products table


CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(255),
category_id INT
);

-- Create the orders table


CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
order_date DATE,
amount DECIMAL(10, 2),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);

-- Insert sample data into customers table


INSERT INTO customers (customer_id, customer_name, registration_date)
VALUES
(1, 'John Doe', '2022-01-01'),
(2, 'Jane Smith', '2021-03-15'),
(3, 'Alice Johnson', '2020-05-10');

-- Insert sample data into products table


INSERT INTO products (product_id, product_name, category_id)
VALUES
(101, 'Laptop', 1),
(102, 'Smartphone', 1),
(103, 'Tablet', 2),
(104, 'Headphones', 2),
(105, 'Smartwatch', 3);

-- Insert sample data into orders table


INSERT INTO orders (order_id, customer_id, product_id, order_date, amount)
VALUES
(1, 1, 101, '2023-06-01', 1200.00),
(2, 1, 102, '2023-07-15', 800.00),
(3, 1, 103, '2023-08-10', 400.00),
(4, 2, 101, '2023-02-20', 1500.00),
(5, 2, 102, '2023-03-05', 700.00),
(6, 2, 101, '2023-05-18', 1300.00),
(7, 3, 103, '2023-02-28', 300.00),
(8, 3, 104, '2023-04-15', 150.00),
(9, 1, 105, '2023-01-20', 250.00),
(10, 3, 102, '2023-06-10', 600.00),
(11, 2, 105, '2023-09-10', 200.00);

Solutions

MySQL Solution
SELECT c.customer_id, c.customer_name, COUNT(DISTINCT o.product_id) AS unique_purchases
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE o.order_date >= CURDATE() - INTERVAL 1 YEAR
GROUP BY c.customer_id, c.customer_name, p.category_id
ORDER BY unique_purchases DESC
LIMIT 1;

Explanation:
• The query joins the customers, orders, and products tables to get all necessary
information.


• Filters data for the last year using WHERE order_date >= CURDATE() - INTERVAL 1
YEAR.
• Groups the data by customer_id and category_id to calculate distinct purchases per
customer in the same category.
• Orders the result by the count of distinct products (COUNT(DISTINCT product_id)) and
limits the result to the top 1 customer.

PostgreSQL Solution
SELECT c.customer_id, c.customer_name, COUNT(DISTINCT o.product_id) AS unique_purchases
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 YEAR'
GROUP BY c.customer_id, c.customer_name, p.category_id
ORDER BY unique_purchases DESC
LIMIT 1;

Explanation:
• In PostgreSQL, CURRENT_DATE - INTERVAL '1 YEAR' is used to filter the data from the
last year.
• The rest of the query structure is identical to MySQL.

Key Takeaways:
• Handling Time Intervals:
• You will understand how to filter data based on dynamic time intervals using INTERVAL
in both MySQL and PostgreSQL.
• Counting Distinct Values:
• The query demonstrates how to count distinct values in a specific field (i.e., distinct
products purchased by a customer).
• Joins and Aggregation:
• A deep dive into using multiple joins and grouping data to calculate aggregated metrics.
• Subquery for Top Results:
• The query can be modified to use subqueries if necessary to extract the "top" results,
teaching the importance of subqueries in data ranking.
• Q.1013
• Q.1014
Find the top 3 industries with the highest total earnings for employees who are under 30
years old. Include the industry name, total earnings, and the average age of employees
in that industry.

Explanation of the Question:


This question aims to test the following concepts:
• Age Filtering: You will need to filter employees based on their age, specifically those
under 30.
• Grouping by Industry: You will have to aggregate the data based on industries and
calculate total earnings for each industry.


• Aggregate Functions: You will be required to use aggregate functions such as SUM() and
AVG() to calculate total earnings and the average age.
• Ranking: The result should be ordered by total earnings to identify the top 3 industries.
The dataset represents employee information and their respective earnings in different
industries. By applying proper filtering and aggregation, you'll find which industries are the
highest earners for employees below 30 years old.

Learnings:
• Filtering Data by Age:
• You will learn how to filter employees based on their age, using comparison operators
like < to get employees under a certain age.
• Aggregate Functions:
• The query will help you practice using SUM() to calculate total earnings and AVG() to get
the average age of employees within each industry.
• GROUP BY and Sorting:
• You will be grouping data by industry and sorting by the total earnings to rank the
industries.
• Limiting Results:
• You will use LIMIT to restrict the output to the top 3 industries by total earnings.

SQL Schemas and Datasets

Create and Insert Statements (based on UK Gov data)


-- Create the industries table
CREATE TABLE industries (
industry_id INT PRIMARY KEY,
industry_name VARCHAR(255)
);

-- Create the employees table


CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(255),
age INT,
industry_id INT,
earnings DECIMAL(10, 2),
FOREIGN KEY (industry_id) REFERENCES industries(industry_id)
);

-- Insert sample data into industries table


INSERT INTO industries (industry_id, industry_name)
VALUES
(1, 'Technology'),
(2, 'Healthcare'),
(3, 'Finance'),
(4, 'Construction'),
(5, 'Education');

-- Insert sample data into employees table


INSERT INTO employees (employee_id, name, age, industry_id, earnings)
VALUES
(1, 'John Smith', 28, 1, 55000.00),
(2, 'Jane Doe', 24, 1, 60000.00),
(3, 'Alice Brown', 32, 2, 45000.00),
(4, 'Bob Johnson', 29, 3, 70000.00),
(5, 'Charlie Davis', 23, 3, 65000.00),
(6, 'David Wilson', 26, 4, 40000.00),
(7, 'Eve White', 27, 2, 47000.00),
(8, 'Frank Black', 22, 5, 32000.00),
(9, 'Grace Green', 25, 1, 52000.00),
(10, 'Helen Blue', 24, 4, 42000.00),
(11, 'Isabel Gold', 29, 3, 72000.00),
(12, 'James Red', 27, 5, 31000.00);

Solutions

MySQL Solution
SELECT
i.industry_name,
SUM(e.earnings) AS total_earnings,
AVG(e.age) AS average_age
FROM
employees e
JOIN
industries i ON e.industry_id = i.industry_id
WHERE
e.age < 30
GROUP BY
i.industry_name
ORDER BY
total_earnings DESC
LIMIT 3;

Explanation:
• Filtering by Age: The WHERE e.age < 30 condition ensures that only employees under
the age of 30 are considered.
• Joining Tables: We use JOIN to combine the employees table and the industries table
on industry_id.
• Aggregation: The SUM(e.earnings) computes the total earnings for each industry, while
AVG(e.age) calculates the average age of employees in that industry.
• Grouping and Sorting: The GROUP BY i.industry_name groups the results by industry,
and the results are ordered by total_earnings DESC to get the industries with the highest
earnings.
• Limiting Results: LIMIT 3 restricts the result to the top 3 industries.

PostgreSQL Solution
SELECT
i.industry_name,
SUM(e.earnings) AS total_earnings,
AVG(e.age) AS average_age
FROM
employees e
JOIN
industries i ON e.industry_id = i.industry_id
WHERE
e.age < 30
GROUP BY
i.industry_name
ORDER BY
total_earnings DESC
LIMIT 3;

Explanation:
The query for PostgreSQL is the same as for MySQL. The logic and syntax are identical. The
key operations are filtering by age, joining the tables, grouping by industry, and ordering the
results to get the top 3 industries with the highest earnings.
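As a quick self-check (this harness is not part of the book's MySQL/PostgreSQL solutions), the same SELECT can be run against the sample data with Python's built-in sqlite3 module, which accepts identical syntax for this query:

```python
import sqlite3

# Load the question's sample data into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE industries (industry_id INTEGER PRIMARY KEY, industry_name TEXT);
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
    industry_id INTEGER REFERENCES industries(industry_id), earnings REAL);
INSERT INTO industries VALUES
    (1,'Technology'),(2,'Healthcare'),(3,'Finance'),(4,'Construction'),(5,'Education');
INSERT INTO employees VALUES
    (1,'John Smith',28,1,55000),(2,'Jane Doe',24,1,60000),
    (3,'Alice Brown',32,2,45000),(4,'Bob Johnson',29,3,70000),
    (5,'Charlie Davis',23,3,65000),(6,'David Wilson',26,4,40000),
    (7,'Eve White',27,2,47000),(8,'Frank Black',22,5,32000),
    (9,'Grace Green',25,1,52000),(10,'Helen Blue',24,4,42000),
    (11,'Isabel Gold',29,3,72000),(12,'James Red',27,5,31000);
""")

# The solution query runs unchanged on SQLite.
rows = conn.execute("""
SELECT i.industry_name, SUM(e.earnings) AS total_earnings, AVG(e.age) AS average_age
FROM employees e
JOIN industries i ON e.industry_id = i.industry_id
WHERE e.age < 30
GROUP BY i.industry_name
ORDER BY total_earnings DESC
LIMIT 3
""").fetchall()

for industry, total, avg_age in rows:
    print(industry, total, round(avg_age, 2))
# Finance tops the list: 70000 + 65000 + 72000 = 207000 from its three under-30 employees.
```

On this dataset the top three industries for under-30 employees are Finance, Technology, and Construction; working the sums by hand is a good way to confirm the WHERE filter excludes Alice Brown (age 32) before aggregation.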

Key Takeaways:
• Handling Age Filtering:
• Learn how to filter employees based on age using comparison operators like < 30 to get
those under 30 years old.
• Aggregation:
• Practice using aggregate functions like SUM() and AVG() to calculate total earnings and
average age, respectively, across grouped data.
• Joins and Grouping:
• Understand how to join tables based on a common column (industry_id) and how to
group data using GROUP BY to get aggregated results per group (industry in this case).
• Ranking with Limits:
• Learn how to limit the number of rows returned using LIMIT and sort the result based
on aggregated values to get the top results (top 3 industries).
• Real-Life Data Use:
• This type of query could be used in real-world scenarios such as identifying top-earning
industries for young professionals, salary analysis, or workforce distribution within a specific
age group.
• Q.1015
Find the top 3 Indian states with the highest population density in 2023. Output the
state name, population in 2023, area of the state, and population density (calculated as
population/area).

Explanation of the Question:


This question tests your ability to:
• Calculate derived fields: In this case, population density is a calculated field derived from
population and area (population / area).
• Filter and Aggregate Data: You will need to compute the population density for each
state and sort the results.
• Ranking: You'll rank the states based on their population density and select the top 3.
• Use of Arithmetic Operations in SQL: You'll apply division to compute population
density, and the use of ORDER BY helps you rank the states.
By using this dataset, you can simulate how to calculate important metrics like population
density and determine which states are most densely populated.

Learnings:
• Calculated Columns:
• You will learn how to calculate derived metrics such as population density using
arithmetic operations in SQL.
• Sorting and Ranking:
• By using ORDER BY, you'll learn to sort results in descending order to get the top states
based on a metric.
• Data Aggregation and Filtering:
• You'll practice aggregating data by computing population density at the state level, and
filtering results to retrieve the top 3 states.
• Handling Large Datasets:
• In real-world applications, this type of query can help in processing large datasets and
calculating statistics like population density for data analysis, policy-making, or urban
planning.

SQL Schemas and Datasets

Create and Insert Statements (based on Indian States and Populations)


-- Create the states table
CREATE TABLE states (
state_id INT PRIMARY KEY,
state_name VARCHAR(255),
population INT,
area DECIMAL(10, 2) -- Area in square kilometers
);

-- Insert sample data into states table


INSERT INTO states (state_id, state_name, population, area)
VALUES
(1, 'Uttar Pradesh', 220000000, 243286.00),
(2, 'Maharashtra', 123000000, 307713.00),
(3, 'Bihar', 126700000, 94163.00),
(4, 'West Bengal', 91000000, 88752.00),
(5, 'Madhya Pradesh', 85000000, 308350.00),
(6, 'Tamil Nadu', 76000000, 130058.00),
(7, 'Rajasthan', 81000000, 342239.00),
(8, 'Karnataka', 67000000, 191791.00),
(9, 'Gujarat', 68000000, 196024.00),
(10, 'Andhra Pradesh', 54000000, 162968.00),
(11, 'Odisha', 46000000, 155707.00),
(12, 'Kerala', 35000000, 38852.00);

Solutions

MySQL Solution
SELECT
state_name,
population,
area,
(population / area) AS population_density
FROM
states
ORDER BY
population_density DESC
LIMIT 3;

Explanation:
• Derived Field (Population Density): We calculate population density by dividing the
population by the area (population / area).
• Sorting by Population Density: The ORDER BY population_density DESC sorts the
states based on the calculated population density in descending order.
• Limiting Results: The LIMIT 3 ensures that only the top 3 states with the highest
population density are returned.

PostgreSQL Solution

SELECT
state_name,
population,
area,
(population / area) AS population_density
FROM
states
ORDER BY
population_density DESC
LIMIT 3;

Explanation:
• The query for PostgreSQL is identical to MySQL in terms of logic. We calculate the
population density and sort by it in descending order to get the top 3 states.
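The calculation can also be verified against the sample data with a small Python sqlite3 harness (a self-check sketch, not part of the book's solutions). Because area is stored as a decimal type, population / area is true decimal division in SQLite just as in MySQL and PostgreSQL:

```python
import sqlite3

# Sample data from the question; area is REAL, so population / area
# performs decimal division rather than integer division.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (
    state_id INTEGER PRIMARY KEY, state_name TEXT,
    population INTEGER, area REAL);
INSERT INTO states VALUES
    (1,'Uttar Pradesh',220000000,243286.0),(2,'Maharashtra',123000000,307713.0),
    (3,'Bihar',126700000,94163.0),(4,'West Bengal',91000000,88752.0),
    (5,'Madhya Pradesh',85000000,308350.0),(6,'Tamil Nadu',76000000,130058.0),
    (7,'Rajasthan',81000000,342239.0),(8,'Karnataka',67000000,191791.0),
    (9,'Gujarat',68000000,196024.0),(10,'Andhra Pradesh',54000000,162968.0),
    (11,'Odisha',46000000,155707.0),(12,'Kerala',35000000,38852.0);
""")

rows = conn.execute("""
SELECT state_name, population, area,
       (population / area) AS population_density
FROM states
ORDER BY population_density DESC
LIMIT 3
""").fetchall()

for name, pop, area, density in rows:
    print(name, round(density, 1))
# Bihar (~1345.5 people per sq km) is densest, ahead of West Bengal (~1025.3).
```

Note how close the third place is on this sample: Uttar Pradesh (~904.3) edges out Kerala (~900.9), which shows why the derived column, not raw population, must drive the ORDER BY.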

Key Takeaways:
• Calculated Metrics:
• Learn how to calculate population density by dividing one column by another
(population / area).
• Sorting and Ranking:
• Understand how to use ORDER BY to rank data based on calculated metrics and apply
LIMIT to retrieve the top 3 results.
• Handling Numerical Data:
• Practice working with large numbers (population and area) and ensure your calculations
are accurate by using appropriate data types like DECIMAL and INT.
• Real-World Use Cases:
• This query can be used in real-world applications for urban planning, resource
allocation, and government policies, where calculating population density is crucial for
determining state priorities.
• Data Quality Considerations:
• The question demonstrates the importance of working with accurate and consistent
datasets to ensure that metrics like population density are meaningful.

Why this Question is Useful:


• Practical Relevance: Population density is an important metric for urban planning,
infrastructure development, and socio-economic analysis. It is crucial for understanding how
people are distributed across vast geographical regions.
• SQL Proficiency: This question tests your understanding of SQL aggregation, arithmetic
operations, and ranking, which are essential for data analysis tasks in real-life situations.
• Q.1016
Find the top 3 USA companies by total revenue in 2023. Output the company name,
revenue in 2023, and the percentage change in revenue compared to 2022.

Explanation of the Question:


This question involves:
• Calculating revenue change: You will need to compute the percentage change in revenue
between two consecutive years (2023 and 2022).

• Sorting by total revenue: You are asked to identify the top 3 companies based on their
revenue in 2023.
• Percentage calculation: The percentage change is calculated as:
Percentage Change = ((Revenue 2023 - Revenue 2022) / Revenue 2022) * 100

The question evaluates your ability to handle multi-year data, perform basic arithmetic
operations, and filter out the top companies based on revenue.

Learnings:
• Percentage Calculations:
• You'll learn how to calculate the percentage change in data over multiple years, which is
common in financial and business data analysis.
• Sorting and Ranking:
• The use of ORDER BY allows you to rank companies based on their 2023 revenue. With
LIMIT 3, you can efficiently get the top companies.
• Handling Multiple Years of Data:
• Working with multiple years of data will help you practice combining and comparing
datasets from different time periods, which is useful for financial forecasting or historical
trend analysis.
• Data Aggregation:
• The query will involve GROUP BY to aggregate data by company and compute totals,
which is a fundamental concept in SQL.

SQL Schemas and Datasets

Create and Insert Statements (based on Top USA Companies and Revenues)
-- Create the companies table
CREATE TABLE companies (
company_id INT PRIMARY KEY,
company_name VARCHAR(255)
);

-- Insert sample companies data


INSERT INTO companies (company_id, company_name) VALUES
(1, 'Apple'),
(2, 'Microsoft'),
(3, 'Amazon'),
(4, 'Google'),
(5, 'Tesla'),
(6, 'Meta'),
(7, 'Nvidia'),
(8, 'Intel');

-- Create the revenues table


CREATE TABLE company_revenue (
company_id INT,
year INT,
revenue DECIMAL(15, 2),
FOREIGN KEY (company_id) REFERENCES companies(company_id)
);

-- Insert sample revenue data for two years (2022 and 2023)
INSERT INTO company_revenue (company_id, year, revenue) VALUES
(1, 2022, 365817.00), -- Apple 2022 revenue in millions
(1, 2023, 389000.00), -- Apple 2023 revenue in millions
(2, 2022, 168000.00), -- Microsoft 2022 revenue in millions
(2, 2023, 183000.00), -- Microsoft 2023 revenue in millions
(3, 2022, 469800.00), -- Amazon 2022 revenue in millions
(3, 2023, 510000.00), -- Amazon 2023 revenue in millions
(4, 2022, 257600.00), -- Google 2022 revenue in millions
(4, 2023, 274000.00), -- Google 2023 revenue in millions
(5, 2022, 53500.00), -- Tesla 2022 revenue in millions
(5, 2023, 64000.00), -- Tesla 2023 revenue in millions
(6, 2022, 117930.00), -- Meta 2022 revenue in millions
(6, 2023, 119500.00), -- Meta 2023 revenue in millions
(7, 2022, 26100.00), -- Nvidia 2022 revenue in millions
(7, 2023, 38000.00), -- Nvidia 2023 revenue in millions
(8, 2022, 73000.00), -- Intel 2022 revenue in millions
(8, 2023, 75000.00); -- Intel 2023 revenue in millions

Solutions

MySQL Solution
SELECT
c.company_name,
r2023.revenue AS revenue_2023,
((r2023.revenue - r2022.revenue) / r2022.revenue) * 100 AS percentage_change
FROM
companies c
JOIN
company_revenue r2023 ON c.company_id = r2023.company_id AND r2023.year = 2023
JOIN
company_revenue r2022 ON c.company_id = r2022.company_id AND r2022.year = 2022
ORDER BY
r2023.revenue DESC
LIMIT 3;

Explanation:
• Join on Multiple Tables: The query performs a self-join on the company_revenue table to
get both 2022 and 2023 revenue for the same company.
• Calculate Percentage Change: The percentage change is calculated using the formula
((2023 revenue - 2022 revenue) / 2022 revenue) * 100.
• Sorting and Ranking: The companies are ordered by 2023 revenue in descending order,
and the LIMIT 3 clause ensures only the top 3 companies are returned.

PostgreSQL Solution
SELECT
c.company_name,
r2023.revenue AS revenue_2023,
((r2023.revenue - r2022.revenue) / r2022.revenue) * 100 AS percentage_change
FROM
companies c
JOIN
company_revenue r2023 ON c.company_id = r2023.company_id AND r2023.year = 2023
JOIN
company_revenue r2022 ON c.company_id = r2022.company_id AND r2022.year = 2022
ORDER BY
r2023.revenue DESC
LIMIT 3;

Explanation:

• The PostgreSQL query is identical to the MySQL query, as the syntax for JOINs and
calculations is very similar between MySQL and PostgreSQL.
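The self-join pattern is worth tracing by hand, and it can be exercised end-to-end with Python's sqlite3 module (a self-check sketch, not part of the book's solutions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE company_revenue (
    company_id INTEGER REFERENCES companies(company_id),
    year INTEGER, revenue REAL);
INSERT INTO companies VALUES
    (1,'Apple'),(2,'Microsoft'),(3,'Amazon'),(4,'Google'),
    (5,'Tesla'),(6,'Meta'),(7,'Nvidia'),(8,'Intel');
INSERT INTO company_revenue VALUES
    (1,2022,365817),(1,2023,389000),(2,2022,168000),(2,2023,183000),
    (3,2022,469800),(3,2023,510000),(4,2022,257600),(4,2023,274000),
    (5,2022,53500),(5,2023,64000),(6,2022,117930),(6,2023,119500),
    (7,2022,26100),(7,2023,38000),(8,2022,73000),(8,2023,75000);
""")

# Join company_revenue to itself, one alias per year, so each output row
# carries both the 2022 and the 2023 figure for the same company.
rows = conn.execute("""
SELECT c.company_name,
       r2023.revenue AS revenue_2023,
       ((r2023.revenue - r2022.revenue) / r2022.revenue) * 100 AS percentage_change
FROM companies c
JOIN company_revenue r2023 ON c.company_id = r2023.company_id AND r2023.year = 2023
JOIN company_revenue r2022 ON c.company_id = r2022.company_id AND r2022.year = 2022
ORDER BY r2023.revenue DESC
LIMIT 3
""").fetchall()

for name, rev, pct in rows:
    print(name, rev, round(pct, 2))
# Amazon leads on 2023 revenue (510000), up about 8.56% on 2022.
```

The key detail is that the two joins filter on different years in their ON conditions; putting the year filters in WHERE instead would work here too, but keeping them in the join conditions makes each alias's role explicit.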

Key Takeaways:
• Percentage Change Calculation:
• Learn how to calculate the percentage change between two years using basic arithmetic
operations in SQL.
• Multi-Year Data Aggregation:
• This query requires you to work with multiple years of data and JOIN the same table on
itself to compare values for two different years.
• Sorting and Ranking:
• You'll learn how to sort data based on a calculated metric, in this case, revenue in 2023,
and filter it to get the top 3 companies.
• Real-World Business Application:
• This query simulates a real-world use case for financial analysis, where businesses need to
assess their growth or decline by comparing revenues year-over-year.

Why This Question is Useful:


• Business Intelligence and Analysis: Understanding how a company's revenue is
performing year-over-year is critical for business decision-making, investments, and
forecasting.
• Advanced SQL Skills: This question combines data aggregation, arithmetic operations,
and self-joins to compare data over multiple years, which is a common and powerful
technique in SQL.
• Q.1017
Find the top 5 Indian states by GDP growth rate from 2022 to 2023. Output the state
name, GDP in 2022, GDP in 2023, and the GDP growth rate (percentage change).

Explanation of the Question:


This question evaluates:
• GDP Growth Rate Calculation: You need to compute the percentage change in GDP
between two years (2022 and 2023). The formula for the GDP growth rate is:
GDP Growth Rate = ((GDP 2023 - GDP 2022) / GDP 2022) * 100

• Sorting States by Growth Rate: You need to find the states with the highest growth in
GDP from 2022 to 2023 and output the top 5 states.
• Ranking: You will use SQL's ORDER BY and LIMIT clauses to get the top 5 states
based on the highest growth rates.
This question will test your ability to work with time-series data (year-over-year
comparison), calculate growth rates, and filter results.

Learnings:
• Percentage Change Calculations:
• You will learn how to compute the percentage change between two years (or two data
points) in SQL, a common requirement in economic and financial analysis.
• SQL Sorting and Ranking:
• This question requires you to sort the results based on calculated values (GDP growth)
and limit the results to top values, an essential skill in data analysis and reporting.
• Data Comparison Across Time:
• Handling time-series data (2022 vs. 2023) to compare GDP will improve your ability to
work with historical data and derive insights over time.
• Advanced SQL Joins:
• You will practice joining two years of data from the same table and performing operations
on those values, a critical concept for business and financial analysis.

SQL Schemas and Datasets

Create and Insert Statements (based on Indian Economy and GDP Growth)
-- Create the states table
CREATE TABLE states (
state_id INT PRIMARY KEY,
state_name VARCHAR(255)
);

-- Insert sample states data


INSERT INTO states (state_id, state_name) VALUES
(1, 'Uttar Pradesh'),
(2, 'Maharashtra'),
(3, 'Bihar'),
(4, 'West Bengal'),
(5, 'Tamil Nadu'),
(6, 'Rajasthan'),
(7, 'Karnataka'),
(8, 'Gujarat'),
(9, 'Andhra Pradesh'),
(10, 'Madhya Pradesh');

-- Create the gdp table


CREATE TABLE state_gdp (
state_id INT,
year INT,
gdp DECIMAL(15, 2),
FOREIGN KEY (state_id) REFERENCES states(state_id)
);

-- Insert GDP data for 2022 and 2023


INSERT INTO state_gdp (state_id, year, gdp) VALUES
(1, 2022, 2192000.00), -- Uttar Pradesh GDP in 2022 (in INR crores)
(1, 2023, 2345000.00), -- Uttar Pradesh GDP in 2023
(2, 2022, 2105000.00), -- Maharashtra GDP in 2022
(2, 2023, 2180000.00), -- Maharashtra GDP in 2023
(3, 2022, 370000.00), -- Bihar GDP in 2022
(3, 2023, 390500.00), -- Bihar GDP in 2023
(4, 2022, 1505000.00), -- West Bengal GDP in 2022
(4, 2023, 1570000.00), -- West Bengal GDP in 2023
(5, 2022, 1700000.00), -- Tamil Nadu GDP in 2022
(5, 2023, 1750000.00), -- Tamil Nadu GDP in 2023
(6, 2022, 1055000.00), -- Rajasthan GDP in 2022
(6, 2023, 1085000.00), -- Rajasthan GDP in 2023
(7, 2022, 1500000.00), -- Karnataka GDP in 2022
(7, 2023, 1605000.00), -- Karnataka GDP in 2023
(8, 2022, 1300000.00), -- Gujarat GDP in 2022
(8, 2023, 1370000.00), -- Gujarat GDP in 2023
(9, 2022, 1350000.00), -- Andhra Pradesh GDP in 2022
(9, 2023, 1405000.00), -- Andhra Pradesh GDP in 2023
(10, 2022, 1000000.00), -- Madhya Pradesh GDP in 2022
(10, 2023, 1030000.00); -- Madhya Pradesh GDP in 2023

Solutions

MySQL Solution
SELECT
s.state_name,
g2023.gdp AS gdp_2023,
g2022.gdp AS gdp_2022,
((g2023.gdp - g2022.gdp) / g2022.gdp) * 100 AS gdp_growth_rate
FROM
states s
JOIN
state_gdp g2023 ON s.state_id = g2023.state_id AND g2023.year = 2023
JOIN
state_gdp g2022 ON s.state_id = g2022.state_id AND g2022.year = 2022
ORDER BY
gdp_growth_rate DESC
LIMIT 5;

Explanation:
• Join: The state_gdp table is joined twice on itself: once for 2023 and once for 2022 data.
• GDP Growth Rate Calculation: The GDP growth rate is calculated using the formula
((GDP 2023 - GDP 2022) / GDP 2022) * 100.
• Sorting: The results are sorted in descending order based on the GDP growth rate.
• Top 5 States: The LIMIT 5 clause returns the top 5 states with the highest growth rates.

PostgreSQL Solution
SELECT
s.state_name,
g2023.gdp AS gdp_2023,
g2022.gdp AS gdp_2022,
((g2023.gdp - g2022.gdp) / g2022.gdp) * 100 AS gdp_growth_rate
FROM
states s
JOIN
state_gdp g2023 ON s.state_id = g2023.state_id AND g2023.year = 2023
JOIN
state_gdp g2022 ON s.state_id = g2022.state_id AND g2022.year = 2022
ORDER BY
gdp_growth_rate DESC
LIMIT 5;

Explanation:
• The PostgreSQL solution is the same as the MySQL one. Both SQL engines support
similar syntax for joins, sorting, and calculations.
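Running the query against the sample data (a self-check sketch with Python's sqlite3 module, not part of the book's solutions) also surfaces how tight the rankings can be:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (state_id INTEGER PRIMARY KEY, state_name TEXT);
CREATE TABLE state_gdp (
    state_id INTEGER REFERENCES states(state_id), year INTEGER, gdp REAL);
INSERT INTO states VALUES
    (1,'Uttar Pradesh'),(2,'Maharashtra'),(3,'Bihar'),(4,'West Bengal'),
    (5,'Tamil Nadu'),(6,'Rajasthan'),(7,'Karnataka'),(8,'Gujarat'),
    (9,'Andhra Pradesh'),(10,'Madhya Pradesh');
INSERT INTO state_gdp VALUES
    (1,2022,2192000),(1,2023,2345000),(2,2022,2105000),(2,2023,2180000),
    (3,2022,370000),(3,2023,390500),(4,2022,1505000),(4,2023,1570000),
    (5,2022,1700000),(5,2023,1750000),(6,2022,1055000),(6,2023,1085000),
    (7,2022,1500000),(7,2023,1605000),(8,2022,1300000),(8,2023,1370000),
    (9,2022,1350000),(9,2023,1405000),(10,2022,1000000),(10,2023,1030000);
""")

rows = conn.execute("""
SELECT s.state_name,
       g2023.gdp AS gdp_2023,
       g2022.gdp AS gdp_2022,
       ((g2023.gdp - g2022.gdp) / g2022.gdp) * 100 AS gdp_growth_rate
FROM states s
JOIN state_gdp g2023 ON s.state_id = g2023.state_id AND g2023.year = 2023
JOIN state_gdp g2022 ON s.state_id = g2022.state_id AND g2022.year = 2022
ORDER BY gdp_growth_rate DESC
LIMIT 5
""").fetchall()

for name, g23, g22, rate in rows:
    print(name, round(rate, 2))
# Karnataka (7.00%) narrowly beats Uttar Pradesh (6.98%) on this sample data.
```

Uttar Pradesh has the largest absolute GDP increase (153000 crores) but only the second-highest growth rate, which illustrates why ranking by a percentage can produce a different order than ranking by the raw difference.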

Key Takeaways:
• GDP Growth Rate Calculation:
• You'll learn how to calculate growth rates based on previous and current values (2022 and
2023 GDP), a common economic analysis task.
• Joins Across Different Time Periods:
• The ability to join data from the same table for two different years (2022 and 2023) is
crucial for time-series data analysis.
• SQL Ranking and Sorting:
• Sorting the data by growth rate and filtering the top 5 states shows the power of using
ORDER BY and LIMIT together to rank results.
• Economic Analysis Using SQL:
• This exercise is similar to real-world economic analysis where you need to evaluate the
performance of different states (or regions) over time.

Why This Question is Useful:


• Economic Data Analysis: This type of analysis is used by economists, policy-makers, and
businesses to measure growth over time and make decisions based on economic trends.
• SQL Skills: This problem reinforces key SQL concepts like joins, arithmetic operations,
sorting, and ranking. These are essential skills for anyone working with business or
economic data.
• Q.1018
Find the top 5 Nifty 50 companies with the highest year-on-year growth in market
capitalization from 2022 to 2023. Output the company name, market capitalization in
2022, market capitalization in 2023, and the market cap growth (percentage change).

Explanation of the Question:


This question focuses on the year-on-year market cap growth for companies listed in the
Nifty 50 index, which is a stock market index representing the top 50 companies listed on the
National Stock Exchange of India (NSE).
Key Concepts to Focus On:
• Market Cap Growth Calculation: You need to calculate the percentage growth in the
market capitalization of each company from 2022 to 2023. The formula to calculate the
percentage growth is:
Growth Rate = ((Market Cap 2023 - Market Cap 2022) / Market Cap 2022) * 100

• Sorting by Growth Rate: After calculating the growth rate, you need to sort the results by
the growth rate in descending order, to identify the top-performing companies.
• Limit the Results: The question asks for the top 5 companies with the highest growth
rate. This is achieved using LIMIT 5 in SQL.

Learnings:
• Understanding Financial Data:
• This question is great for learning how to handle financial data, particularly how to analyze
market capitalization growth—a common task in stock market analysis.

• Growth Rate Calculation in SQL:
• You'll learn how to calculate the percentage change between two years using SQL, which
is useful in various business and financial analyses.
• SQL Sorting and Filtering:
• You'll practice sorting and filtering data, essential skills for working with large datasets in
the financial sector.
• Handling Time-Series Data:
• Working with two years of market cap data will help you understand how to compare data
points over time, a crucial skill in financial analysis.

SQL Schemas and Datasets

Create and Insert Statements (based on Nifty 50 Companies and Market Cap
Growth)
-- Create the companies table
CREATE TABLE companies (
company_id INT PRIMARY KEY,
company_name VARCHAR(255)
);

-- Insert sample company data


INSERT INTO companies (company_id, company_name) VALUES
(1, 'Reliance Industries'),
(2, 'Tata Consultancy Services'),
(3, 'HDFC Bank'),
(4, 'Infosys'),
(5, 'ICICI Bank'),
(6, 'Hindustan Unilever'),
(7, 'Larsen & Toubro'),
(8, 'Kotak Mahindra Bank'),
(9, 'Bajaj Finance'),
(10, 'Wipro');

-- Create the market_cap table


CREATE TABLE market_cap (
company_id INT,
year INT,
market_cap DECIMAL(15, 2), -- in INR Crores
FOREIGN KEY (company_id) REFERENCES companies(company_id)
);

-- Insert market capitalization data for 2022 and 2023


INSERT INTO market_cap (company_id, year, market_cap) VALUES
(1, 2022, 1750000.00), -- Reliance Industries market cap in 2022
(1, 2023, 1850000.00), -- Reliance Industries market cap in 2023
(2, 2022, 1600000.00), -- TCS market cap in 2022
(2, 2023, 1700000.00), -- TCS market cap in 2023
(3, 2022, 800000.00), -- HDFC Bank market cap in 2022
(3, 2023, 850000.00), -- HDFC Bank market cap in 2023
(4, 2022, 650000.00), -- Infosys market cap in 2022
(4, 2023, 700000.00), -- Infosys market cap in 2023
(5, 2022, 600000.00), -- ICICI Bank market cap in 2022
(5, 2023, 620000.00), -- ICICI Bank market cap in 2023
(6, 2022, 520000.00), -- Hindustan Unilever market cap in 2022
(6, 2023, 530000.00), -- Hindustan Unilever market cap in 2023
(7, 2022, 380000.00), -- L&T market cap in 2022
(7, 2023, 400000.00), -- L&T market cap in 2023
(8, 2022, 700000.00), -- Kotak Mahindra Bank market cap in 2022
(8, 2023, 710000.00), -- Kotak Mahindra Bank market cap in 2023
(9, 2022, 1300000.00), -- Bajaj Finance market cap in 2022
(9, 2023, 1350000.00), -- Bajaj Finance market cap in 2023
(10, 2022, 430000.00), -- Wipro market cap in 2022
(10, 2023, 450000.00); -- Wipro market cap in 2023

Solutions

MySQL Solution
SELECT
c.company_name,
m2023.market_cap AS market_cap_2023,
m2022.market_cap AS market_cap_2022,
((m2023.market_cap - m2022.market_cap) / m2022.market_cap) * 100 AS market_cap_growth
FROM
companies c
JOIN
market_cap m2023 ON c.company_id = m2023.company_id AND m2023.year = 2023
JOIN
market_cap m2022 ON c.company_id = m2022.company_id AND m2022.year = 2022
ORDER BY
market_cap_growth DESC
LIMIT 5;

Explanation:
• Joins: The query uses two joins on the market_cap table to retrieve both 2023 and 2022
data for the companies.
• Market Cap Growth Calculation: The percentage growth of market capitalization is
calculated using the formula ((market cap 2023 - market cap 2022) / market cap 2022) * 100.
• Sorting: The results are sorted in descending order by the market cap growth.
• Top 5 Companies: The LIMIT 5 clause ensures only the top 5 companies with the highest
growth are returned.

PostgreSQL Solution
SELECT
c.company_name,
m2023.market_cap AS market_cap_2023,
m2022.market_cap AS market_cap_2022,
((m2023.market_cap - m2022.market_cap) / m2022.market_cap) * 100 AS market_cap_growth
FROM
companies c
JOIN
market_cap m2023 ON c.company_id = m2023.company_id AND m2023.year = 2023
JOIN
market_cap m2022 ON c.company_id = m2022.company_id AND m2022.year = 2022
ORDER BY
market_cap_growth DESC
LIMIT 5;

Explanation:
• The PostgreSQL query is the same as the MySQL one. The SQL syntax for JOIN, ORDER
BY, and LIMIT works the same in both databases.
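This dataset also contains an exact tie, which a quick Python sqlite3 self-check (a sketch, not part of the book's solutions) makes visible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE market_cap (
    company_id INTEGER REFERENCES companies(company_id),
    year INTEGER, market_cap REAL);
INSERT INTO companies VALUES
    (1,'Reliance Industries'),(2,'Tata Consultancy Services'),(3,'HDFC Bank'),
    (4,'Infosys'),(5,'ICICI Bank'),(6,'Hindustan Unilever'),(7,'Larsen & Toubro'),
    (8,'Kotak Mahindra Bank'),(9,'Bajaj Finance'),(10,'Wipro');
INSERT INTO market_cap VALUES
    (1,2022,1750000),(1,2023,1850000),(2,2022,1600000),(2,2023,1700000),
    (3,2022,800000),(3,2023,850000),(4,2022,650000),(4,2023,700000),
    (5,2022,600000),(5,2023,620000),(6,2022,520000),(6,2023,530000),
    (7,2022,380000),(7,2023,400000),(8,2022,700000),(8,2023,710000),
    (9,2022,1300000),(9,2023,1350000),(10,2022,430000),(10,2023,450000);
""")

rows = conn.execute("""
SELECT c.company_name,
       m2023.market_cap AS market_cap_2023,
       m2022.market_cap AS market_cap_2022,
       ((m2023.market_cap - m2022.market_cap) / m2022.market_cap) * 100 AS market_cap_growth
FROM companies c
JOIN market_cap m2023 ON c.company_id = m2023.company_id AND m2023.year = 2023
JOIN market_cap m2022 ON c.company_id = m2022.company_id AND m2022.year = 2022
ORDER BY market_cap_growth DESC
LIMIT 5
""").fetchall()

for name, m23, m22, growth in rows:
    print(name, round(growth, 2))
# Infosys grew fastest (~7.69%); TCS and HDFC Bank tie at exactly 6.25%,
# so their relative order is not guaranteed without a tie-breaker in ORDER BY.
```

Adding a secondary sort key (for example, ORDER BY market_cap_growth DESC, c.company_name) would make the output deterministic when growth rates tie, which is a detail interviewers often probe.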

Key Takeaways:
• Financial Analysis Using SQL:
• This problem teaches how to calculate the year-on-year growth in market capitalization—a
key financial metric for evaluating company performance.
• Using SQL Joins for Time Series Data:
• The question involves joining data from the same table for two different years (2022 and
2023), which is a common task in financial and business analysis.
• Sorting and Ranking Companies:
• You'll learn how to sort and filter the top companies based on specific criteria like market
cap growth using ORDER BY and LIMIT.
• Practical Financial SQL Skills:
• This exercise simulates real-world scenarios, where investors, analysts, and financial
institutions need to evaluate and rank companies based on financial performance over time.

Why This Question is Useful:


• Stock Market Analysis: Understanding how to calculate and interpret financial growth is
crucial for anyone working in finance, investment, or business analysis.
• SQL for Business Intelligence: This problem strengthens your ability to use SQL for
business intelligence tasks like calculating growth, ranking companies, and making data-
driven decisions.
• Q.1019
Find the top 5 countries with the highest increase in population from 2022 to 2023.
Output the country name, population in 2022, population in 2023, population increase,
and the percentage increase in population.

Explanation of the Question:


This problem is based on population data for countries in the world, where you need to
calculate and rank the countries based on their population increase over the course of a year
(from 2022 to 2023).
Key Concepts to Focus On:
• Population Increase Calculation: You need to calculate the increase in population
between 2022 and 2023 for each country. The formula is:

• Percentage Increase: After calculating the increase, you also need to calculate the
percentage increase in population. This can be calculated as:

• Sorting by Population Increase: You need to sort the countries by their population
increase in descending order to get the top 5.
• Limiting the Results: The question asks for the top 5 countries with the highest
population increase, which requires you to use LIMIT 5.

Learnings:
• Understanding Population Growth:
• This question teaches how to analyze population growth, a key demographic metric used
by governments, economists, and policy makers for planning resources, infrastructure, and
governance.
• Performing Calculations in SQL:
• You'll learn how to calculate the difference between two years and the percentage change
within SQL—common operations in many fields including economics and business analysis.
• Ranking and Sorting Data:
• This exercise helps you understand how to rank data based on calculated fields (such as
population increase), and how to filter out the top results using LIMIT.
• Practical Use of Joins:
• Working with data from two different years and calculating the differences is a common
scenario in SQL queries, especially when analyzing trends or financial growth.

SQL Schemas and Datasets

Create and Insert Statements (based on World Population Data)


-- Create the countries table
CREATE TABLE countries (
country_id INT PRIMARY KEY,
country_name VARCHAR(255)
);

-- Insert sample country data


INSERT INTO countries (country_id, country_name) VALUES
(1, 'China'),
(2, 'India'),
(3, 'United States'),
(4, 'Indonesia'),
(5, 'Pakistan'),
(6, 'Brazil'),
(7, 'Nigeria'),
(8, 'Bangladesh'),
(9, 'Russia'),
(10, 'Mexico');

-- Create the population table


CREATE TABLE population (
country_id INT,
year INT,
population BIGINT, -- in actual data, population is usually in billions or millions
FOREIGN KEY (country_id) REFERENCES countries(country_id)
);

-- Insert population data for 2022 and 2023


INSERT INTO population (country_id, year, population) VALUES
(1, 2022, 1412600000), -- China population in 2022
(1, 2023, 1415000000), -- China population in 2023
(2, 2022, 1393409038), -- India population in 2022
(2, 2023, 1401000000), -- India population in 2023
(3, 2022, 332915073), -- United States population in 2022
(3, 2023, 334000000), -- United States population in 2023
(4, 2022, 276361783), -- Indonesia population in 2022
(4, 2023, 278000000), -- Indonesia population in 2023
(5, 2022, 225199937), -- Pakistan population in 2022
(5, 2023, 226500000), -- Pakistan population in 2023
(6, 2022, 213993437), -- Brazil population in 2022
(6, 2023, 214700000), -- Brazil population in 2023
(7, 2022, 211400708), -- Nigeria population in 2022
(7, 2023, 214000000), -- Nigeria population in 2023
(8, 2022, 166303498), -- Bangladesh population in 2022
(8, 2023, 167000000), -- Bangladesh population in 2023
(9, 2022, 145805947), -- Russia population in 2022
(9, 2023, 146000000), -- Russia population in 2023
(10, 2022, 126190788), -- Mexico population in 2022
(10, 2023, 127000000); -- Mexico population in 2023

Solutions

MySQL Solution
SELECT
c.country_name,
p2023.population AS population_2023,
p2022.population AS population_2022,
(p2023.population - p2022.population) AS population_increase,
((p2023.population - p2022.population) / p2022.population) * 100 AS percentage_increase
FROM
countries c
JOIN
population p2023 ON c.country_id = p2023.country_id AND p2023.year = 2023
JOIN
population p2022 ON c.country_id = p2022.country_id AND p2022.year = 2022
ORDER BY
population_increase DESC
LIMIT 5;

Explanation:
• Joins: The query uses two joins on the population table to retrieve both the 2022 and
2023 population data for each country.
• Population Increase Calculation: The increase in population is calculated by subtracting
the population in 2022 from the population in 2023.
• Percentage Increase: The percentage increase is then calculated by dividing the
population increase by the 2022 population, and multiplying by 100.
• Sorting: The query sorts the countries by the population increase in descending order,
ensuring that the top countries with the highest growth appear first.
• Limit: The LIMIT 5 ensures that only the top 5 countries with the highest population
increase are returned.

PostgreSQL Solution
SELECT
c.country_name,
p2023.population AS population_2023,
p2022.population AS population_2022,
(p2023.population - p2022.population) AS population_increase,
((p2023.population - p2022.population) * 100.0 / p2022.population) AS percentage_increase
FROM
countries c
JOIN
population p2023 ON c.country_id = p2023.country_id AND p2023.year = 2023
JOIN
population p2022 ON c.country_id = p2022.country_id AND p2022.year = 2022
ORDER BY
population_increase DESC
LIMIT 5;

Explanation:
• The PostgreSQL solution uses the same joins, ordering, and LIMIT as the MySQL one, with one adjustment: because population is an integer column, the difference is multiplied by 100.0 before dividing, so PostgreSQL performs numeric rather than integer division (which would truncate the percentage to 0).
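The integer-division pitfall behind this percentage formula can be reproduced with Python's built-in sqlite3 module (SQLite truncates integer division the same way PostgreSQL does). The single-country table below is a deliberately tiny stand-in for the population schema, not the book's dataset:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE population (country_id INT, year INT, population INT)")
cur.executemany("INSERT INTO population VALUES (?, ?, ?)",
                [(1, 2022, 100), (1, 2023, 125)])

# Dividing two integers first truncates the ratio to 0 before the * 100.
truncated = cur.execute("""
    SELECT (p23.population - p22.population) / p22.population * 100
    FROM population p23 JOIN population p22
      ON p23.country_id = p22.country_id
    WHERE p23.year = 2023 AND p22.year = 2022
""").fetchone()[0]

# Multiplying by 100.0 first forces floating-point arithmetic.
exact = cur.execute("""
    SELECT (p23.population - p22.population) * 100.0 / p22.population
    FROM population p23 JOIN population p22
      ON p23.country_id = p22.country_id
    WHERE p23.year = 2023 AND p22.year = 2022
""").fetchone()[0]
conn.close()
```

A 25% growth comes out as 0 in the first query and 25.0 in the second, which is why the numeric cast (or the 100.0 multiplier) matters.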


Key Takeaways:
• Understanding Population Growth:
• The question teaches how to measure the increase in population between two years and
how to calculate and interpret percentage growth, which is a key skill in demographics,
economics, and social sciences.
• Handling Time-Series Data:
• You'll learn how to work with time-series data, specifically comparing population data
across two different years using SQL joins.
• SQL Calculations:
• This problem demonstrates how to perform arithmetic operations directly in SQL queries,
which is a common task in data analysis.
• Practical Use Cases:
• This question is applicable in various real-world situations, including governmental policy
planning, economic forecasting, and international comparisons on demographic growth.

Why This Question is Useful:


• Demographic Analysis:
• This exercise mimics tasks often faced by governmental agencies, economic research
institutions, or NGOs when analyzing population data to plan for future growth, resource
allocation, or policy changes.
• SQL Data Transformation Skills:
• The ability to calculate percentage changes and growth rates directly in SQL is a vital skill
in many business intelligence roles and financial analyses.
• Real-World Data Handling:
• Working with datasets based on global or national population statistics provides a deeper
understanding of how real-world data is structured and analyzed in various industries.

This problem is designed to test a mix of arithmetic skills, data transformation, and
proficiency in SQL joins and sorting, providing you with a well-rounded challenge in data
analysis.
• Q.1020
Identify the top 5 airlines with the highest number of accidents over the past 10 years.
Output the airline name, number of accidents, and the percentage of total accidents.

Explanation of the Question:


This problem requires you to analyze flight accident data over a 10-year period and identify
the airlines with the highest number of accidents. You will need to calculate the total
number of accidents and percentage contribution of each airline's accidents to the overall
total. The solution involves the following steps:
• Accident Count per Airline: For each airline, count the number of accidents they have
had in the past 10 years.
• Calculate Total Accidents: Sum the accidents of all airlines for the last 10 years.


• Calculate Percentage of Total Accidents: Calculate each airline's share of the total
accidents as a percentage.
• Sorting: Rank the airlines by the number of accidents, and return only the top 5.

Learnings:
• SQL Grouping and Aggregation:
• You'll learn how to group data using GROUP BY and use aggregation functions like COUNT()
to calculate the number of accidents per airline.
• Percentage Calculations in SQL:
• This problem helps you practice how to calculate percentages in SQL by dividing the
airline's accident count by the total number of accidents.
• Filtering and Ranking:
• This question tests your ability to filter and rank data, specifically sorting by the accident
count and limiting the results to the top 5 using LIMIT.
• Real-World Use Case:
• This question mimics a real-world scenario that could be useful for aviation safety
analysts, government agencies, or insurance companies that deal with aviation safety and
accident analysis.

SQL Schemas and Datasets

Create and Insert Statements (based on Flight Accident Data)


-- Create table for airlines
CREATE TABLE airlines (
airline_id INT PRIMARY KEY,
airline_name VARCHAR(255)
);

-- Insert sample airline data


INSERT INTO airlines (airline_id, airline_name) VALUES
(1, 'Airline A'),
(2, 'Airline B'),
(3, 'Airline C'),
(4, 'Airline D'),
(5, 'Airline E'),
(6, 'Airline F'),
(7, 'Airline G');

-- Create table for accidents


CREATE TABLE flight_accidents (
accident_id INT PRIMARY KEY,
airline_id INT,
accident_date DATE,
fatalities INT,
FOREIGN KEY (airline_id) REFERENCES airlines(airline_id)
);

-- Insert sample flight accident data


INSERT INTO flight_accidents (accident_id, airline_id, accident_date, fatalities) VALUES
(1, 1, '2023-06-12', 50),
(2, 1, '2021-08-15', 30),
(3, 2, '2022-09-23', 10),
(4, 2, '2021-05-05', 5),
(5, 3, '2020-11-30', 15),
(6, 3, '2019-03-13', 20),
(7, 4, '2022-01-25', 40),
(8, 5, '2021-07-18', 60),
(9, 5, '2020-06-07', 45),


(10, 5, '2022-04-20', 30),


(11, 6, '2021-12-08', 0),
(12, 6, '2023-03-10', 2),
(13, 7, '2020-05-05', 5),
(14, 7, '2023-01-10', 3),
(15, 2, '2021-09-29', 20),
(16, 4, '2023-07-01', 10),
(17, 1, '2020-10-15', 25),
(18, 1, '2022-12-30', 15),
(19, 2, '2023-02-18', 4),
(20, 3, '2023-04-11', 8);

Solutions

MySQL Solution
SELECT
a.airline_name,
COUNT(f.accident_id) AS number_of_accidents,
ROUND((COUNT(f.accident_id) / total_accidents) * 100, 2) AS percentage_of_total_accidents
FROM
airlines a
JOIN
flight_accidents f ON a.airline_id = f.airline_id
JOIN
(SELECT COUNT(accident_id) AS total_accidents FROM flight_accidents WHERE accident_date BETWEEN '2013-01-01' AND '2023-01-01') AS total
ON 1=1
WHERE
f.accident_date BETWEEN '2013-01-01' AND '2023-01-01'
GROUP BY
a.airline_name
ORDER BY
number_of_accidents DESC
LIMIT 5;

Explanation:
• Joins: The query joins the airlines and flight_accidents tables using the
airline_id.
• Date Filter: The query filters accidents that occurred between 2013-01-01 and 2023-01-
01, which ensures we're analyzing the last 10 years.
• Subquery: The subquery calculates the total number of accidents in the last 10 years. This
total is then used in the percentage calculation.
• Percentage Calculation: The percentage of total accidents for each airline is calculated by
dividing the number of accidents by the total accidents, and then multiplying by 100.
• Sorting and Limiting: The query orders the results by the number of accidents in
descending order, and limits the output to the top 5 airlines.

PostgreSQL Solution
SELECT
a.airline_name,
COUNT(f.accident_id) AS number_of_accidents,
ROUND((COUNT(f.accident_id) * 100.0 / total_accidents), 2) AS percentage_of_total_accidents
FROM
airlines a
JOIN
flight_accidents f ON a.airline_id = f.airline_id
JOIN
(SELECT COUNT(accident_id) AS total_accidents FROM flight_accidents WHERE accident_date BETWEEN '2013-01-01' AND '2023-01-01') AS total
ON 1=1


WHERE
f.accident_date BETWEEN '2013-01-01' AND '2023-01-01'
GROUP BY
a.airline_name
ORDER BY
number_of_accidents DESC
LIMIT 5;

Explanation:
• The PostgreSQL solution is almost identical to the MySQL solution, with a small
difference in how division is handled for the percentage calculation. PostgreSQL uses 100.0
to ensure the division results in a float.
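The percentage-of-total pattern (cross-joining a one-row total subquery with ON 1=1, then dividing each group's count by it) can be sanity-checked with a minimal sqlite3 sketch. The two-column table is a simplified stand-in for the flight_accidents schema, with the date filter omitted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE flight_accidents (accident_id INT, airline_id INT)")
cur.executemany("INSERT INTO flight_accidents VALUES (?, ?)",
                [(1, 1), (2, 1), (3, 1), (4, 2)])

# The single-row subquery carries the grand total; joining it with
# ON 1=1 makes that total visible to every group.
rows = cur.execute("""
    SELECT airline_id,
           COUNT(*) AS n,
           ROUND(COUNT(*) * 100.0 / total.t, 2) AS pct
    FROM flight_accidents
    JOIN (SELECT COUNT(*) AS t FROM flight_accidents) AS total ON 1=1
    GROUP BY airline_id
    ORDER BY n DESC
""").fetchall()
conn.close()
```

With 3 of 4 accidents belonging to airline 1, the shares come out as 75.0 and 25.0, confirming the per-group percentages sum to 100.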

Key Takeaways:
• Using Joins Across Multiple Tables:
• The problem helps you practice joining multiple tables (airlines and flight accidents) and
using aggregations (like COUNT()) to summarize the data.
• Percentage Calculations:
• The task of calculating percentages based on grouped data is very common in data analysis
and business intelligence.
• Data Filtering by Date:
• You'll learn how to filter data by a time range, which is especially useful in time-series
analysis and reporting.
• Handling Large Datasets:
• Flight accident data is usually quite large, so this exercise helps you understand how to
handle large datasets by aggregating and limiting results efficiently.

Why This Question is Useful:


• Aviation Safety and Policy:
• This question is especially relevant for aviation safety analysts or government bodies
that track the performance of airlines over time to identify trends and risks in air travel safety.
• Practical SQL Use Case:
• This question mirrors a common scenario where data scientists or business analysts need to
analyze historical accident data for decision-making and risk management.
• SQL Aggregation and Ranking:
• Understanding how to aggregate and rank results is crucial for anyone working with large
datasets in business analysis or decision support.

This problem provides a solid challenge, combining aggregation, percentages, date filtering, and ranking in SQL, making it a good exercise for medium-hard interview practice.
• Q.1021
• Q.1022
• Q.1023
• Q.1024
• Q.1025
Question


Retrieve all employees with more than 10 absences in 2024 and department equal to
"Engineering" from the EmployeeAttendance table.
Explanation
You need to select all records where the Absences are greater than 10, the Year is 2024, and
the Department is 'Engineering'. Use the WHERE clause to filter based on these conditions.
Datasets and SQL Schemas
-- Table creation
CREATE TABLE EmployeeAttendance (
EmployeeID INT,
EmployeeName VARCHAR(50),
Department VARCHAR(50),
Absences INT,
Year INT
);

-- Datasets
INSERT INTO EmployeeAttendance (EmployeeID, EmployeeName, Department, Absences, Year) VA
LUES
(1, 'John', 'Engineering', 12, 2024),
(2, 'Emma', 'Marketing', 8, 2024),
(3, 'Liam', 'Engineering', 15, 2024),
(4, 'Sophia', 'HR', 9, 2023),
(5, 'Noah', 'Engineering', 7, 2024);

Learnings
• Using the WHERE clause to filter based on multiple conditions (numeric values and text).
• Combining conditions with AND to retrieve data based on several criteria.
Solutions
• - PostgreSQL solution
SELECT EmployeeID, EmployeeName, Department, Absences, Year
FROM EmployeeAttendance
WHERE Absences > 10 AND Year = 2024 AND Department = 'Engineering';
• - MySQL solution
SELECT EmployeeID, EmployeeName, Department, Absences, Year
FROM EmployeeAttendance
WHERE Absences > 10 AND Year = 2024 AND Department = 'Engineering';
• Q.1026
• Q.1027
Question
Find the top 3 products with the highest cancellation rates for the month of January 2023.
Include only products that had at least 30 orders in that month, and where the cancellation
rate (number of cancellations / total orders) is greater than 20%.
Explanation
For each product, calculate the cancellation rate as the percentage of cancellations relative to
total orders. Filter out products with fewer than 30 orders in January 2023 and those with a
cancellation rate greater than 20%. Return the top 3 products with the highest cancellation
rates.
Datasets and SQL Schemas
• - Table creation
CREATE TABLE ProductOrders (
OrderID INT,
ProductID INT,
ProductName VARCHAR(100),

OrderAmount DECIMAL(10, 2),
OrderDate DATE
);

CREATE TABLE OrderCancellations (
CancellationID INT,
OrderID INT,
CancellationAmount DECIMAL(10, 2),
CancellationDate DATE
);
• - Datasets
INSERT INTO ProductOrders (OrderID, ProductID, ProductName, OrderAmount, OrderDate) VALUES
(1, 201, 'Laptop', 1500.00, '2023-01-01'),
(2, 202, 'Phone', 800.00, '2023-01-05'),
(3, 201, 'Laptop', 1200.00, '2023-01-15'),
(4, 203, 'Headphones', 100.00, '2023-01-10'),
(5, 202, 'Phone', 700.00, '2023-01-20'),
(6, 201, 'Laptop', 1300.00, '2023-01-25'),
(7, 204, 'Smartwatch', 300.00, '2023-01-02'),
(8, 203, 'Headphones', 150.00, '2023-01-12'),
(9, 202, 'Phone', 900.00, '2023-01-15'),
(10, 201, 'Laptop', 1800.00, '2023-01-18');

INSERT INTO OrderCancellations (CancellationID, OrderID, CancellationAmount, CancellationDate) VALUES
(1, 201, 1500.00, '2023-01-03'),
(2, 202, 800.00, '2023-01-07'),
(3, 201, 1200.00, '2023-01-17'),
(4, 203, 100.00, '2023-01-13'),
(5, 202, 700.00, '2023-01-21'),
(6, 204, 300.00, '2023-01-06'),
(7, 202, 900.00, '2023-01-25');

Learnings
• Using COUNT() to count both orders and cancellations
• Calculating cancellation rates
• Filtering with HAVING based on conditions involving aggregated data
• Sorting results by aggregated values
Solutions
• - PostgreSQL solution
WITH CancellationRates AS (
    SELECT po.ProductID, po.ProductName,
           COUNT(co.CancellationID) AS TotalCancellations,
           COUNT(po.OrderID) AS TotalOrders,
           (COUNT(co.CancellationID) * 1.0 / COUNT(po.OrderID)) * 100 AS CancellationRate
    FROM ProductOrders po
    LEFT JOIN OrderCancellations co ON po.OrderID = co.OrderID
    WHERE po.OrderDate BETWEEN '2023-01-01' AND '2023-01-31'
    GROUP BY po.ProductID, po.ProductName
    HAVING COUNT(po.OrderID) >= 30
       AND (COUNT(co.CancellationID) * 1.0 / COUNT(po.OrderID)) * 100 > 20
)
SELECT ProductID, ProductName, CancellationRate
FROM CancellationRates
ORDER BY CancellationRate DESC
LIMIT 3;

MySQL Solution
WITH CancellationRates AS (
SELECT po.ProductID, po.ProductName,
COUNT(co.CancellationID) AS TotalCancellations,
COUNT(po.OrderID) AS TotalOrders,
(COUNT(co.CancellationID) * 1.0 / COUNT(po.OrderID)) * 100 AS CancellationRate
FROM ProductOrders po
LEFT JOIN OrderCancellations co ON po.OrderID = co.OrderID
WHERE po.OrderDate BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY po.ProductID, po.ProductName

HAVING COUNT(po.OrderID) >= 30 AND (COUNT(co.CancellationID) * 1.0 / COUNT(po.OrderID)) * 100 > 20
)
SELECT ProductID, ProductName, CancellationRate
FROM CancellationRates
ORDER BY CancellationRate DESC
LIMIT 3;
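The LEFT JOIN plus HAVING rate calculation can be demonstrated with a minimal sqlite3 sketch. The toy tables below are simplified stand-ins for the question's schema, and the 30-order minimum is dropped so the tiny sample actually returns rows; the 20% threshold is kept:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (order_id INT, product TEXT)")
cur.execute("CREATE TABLE cancels (order_id INT)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 'Laptop'), (2, 'Laptop'), (3, 'Laptop'), (4, 'Phone')])
cur.executemany("INSERT INTO cancels VALUES (?)", [(1,), (2,)])

# COUNT(c.order_id) ignores the NULLs the LEFT JOIN produces for
# never-cancelled orders, so the ratio is cancellations / orders.
rows = cur.execute("""
    SELECT o.product,
           COUNT(c.order_id) * 100.0 / COUNT(DISTINCT o.order_id) AS cancel_rate
    FROM orders o
    LEFT JOIN cancels c ON o.order_id = c.order_id
    GROUP BY o.product
    HAVING cancel_rate > 20
    ORDER BY cancel_rate DESC
""").fetchall()
conn.close()
```

'Laptop' (2 of 3 orders cancelled, about 66.7%) passes the 20% bar; 'Phone' (0%) is filtered out by HAVING.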
• Q.1028

Question
Write a SQL query to find all active customers who watched more than 10 episodes of a
show called "Stranger Things" in the last 30 days.

Explanation
The task is to identify active users who have watched more than 10 distinct episodes of the
show "Stranger Things" within the last 30 days. The query should:
• Join the users, viewing_history, and shows tables.
• Filter for active users (u.active = TRUE).
• Filter for the show "Stranger Things" (s.show_name = 'Stranger Things').
• Ensure the viewing history is from the last 30 days.
• Count the distinct episodes watched and only include users who watched more than 10
episodes.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE users (
user_id INT PRIMARY KEY,
active BOOLEAN
);

CREATE TABLE viewing_history (


user_id INT,
show_id INT,
episode_id INT,
watch_date DATE,
FOREIGN KEY (user_id) REFERENCES users(user_id),
FOREIGN KEY (show_id) REFERENCES shows(show_id)
);

CREATE TABLE shows (


show_id INT PRIMARY KEY,
show_name VARCHAR(100)
);
• - Datasets
-- Users
INSERT INTO users (user_id, active)
VALUES
(1001, TRUE),
(1002, FALSE),
(1003, TRUE),
(1004, TRUE),
(1005, FALSE);

-- Shows
INSERT INTO shows (show_id, show_name)
VALUES
(2001, 'Stranger Things'),
(2002, 'Money Heist');

-- Viewing History

INSERT INTO viewing_history (user_id, show_id, episode_id, watch_date)
VALUES
(1001, 2001, 3001, '2022-10-01'),
(1001, 2001, 3002, '2022-10-02'),
(1001, 2001, 3003, '2022-10-03'),
(1002, 2001, 3001, '2022-10-01'),
(1002, 2001, 3002, '2022-10-02'),
(1003, 2001, 3001, '2022-10-01'),
(1003, 2001, 3002, '2022-11-01'),
(1003, 2001, 3003, '2022-11-02'),
(1004, 2002, 3004, '2022-11-03');

Learnings
• Filtering data using date ranges (NOW() - INTERVAL '30 days').
• Using JOIN to combine data from multiple tables.
• Using COUNT(DISTINCT ...) to count unique episodes watched by each user.
• Filtering based on aggregated counts using HAVING.
• Sorting results based on user activity and ensuring distinct episodes are counted.

Solutions
• - PostgreSQL solution
SELECT DISTINCT v.user_id
FROM users u
JOIN viewing_history v ON v.user_id = u.user_id
JOIN shows s ON s.show_id = v.show_id
WHERE u.active = TRUE
AND s.show_name = 'Stranger Things'
AND v.watch_date >= NOW() - INTERVAL '30 days'
GROUP BY v.user_id
HAVING COUNT(DISTINCT v.episode_id) > 10;
• - MySQL solution
SELECT DISTINCT v.user_id
FROM users u
JOIN viewing_history v ON v.user_id = u.user_id
JOIN shows s ON s.show_id = v.show_id
WHERE u.active = TRUE
AND s.show_name = 'Stranger Things'
AND v.watch_date >= CURDATE() - INTERVAL 30 DAY
GROUP BY v.user_id
HAVING COUNT(DISTINCT v.episode_id) > 10;
• Q.1029

Question:
Uber wants to analyze driver performance by giving a special Diwali bonus!
Write an SQL query to find the top drivers based on the highest average rating in each city,
ensuring they have completed at least 5 rides in the last 3 months.
Ignore incomplete rides (where end_time is missing).
Return city_name, driver_name, total_completed_rides, and avg_rating.

Explanation:
• Join the Drivers and Rides tables based on driver_id.
• Filter rides that have completed (end_time IS NOT NULL) and are within the last 3
months (start_time >= CURRENT_DATE - INTERVAL '3 month').
• Group the results by city and driver_id to calculate the total completed rides and
average rating for each driver.


• Use HAVING to ensure each driver has completed at least 5 rides in the last 3 months.
• Use the RANK() window function to rank the drivers by their average rating in each city.
• Filter to return only the top-ranked driver (rank = 1) in each city.

Datasets and SQL Schemas


• - Table creation
-- Create Rides Table
CREATE TABLE Rides (
ride_id INT PRIMARY KEY,
driver_id INT,
customer_id INT,
start_time TIMESTAMP,
end_time TIMESTAMP,
distance DECIMAL(5, 2),
price DECIMAL(8, 2),
rating DECIMAL(2, 1)
);

-- Create Drivers Table


CREATE TABLE Drivers (
driver_id INT PRIMARY KEY,
driver_name VARCHAR(100),
city VARCHAR(50),
join_date DATE
);
• - Insert Data
INSERT INTO Drivers (driver_id, driver_name, city, join_date)
VALUES
(104, 'Rajesh Kumar', 'Mumbai', '2022-03-10'),
(105, 'Anita Sharma', 'Delhi', '2022-08-25'),
(106, 'Vikram Singh', 'Bengaluru', '2023-02-18'),
(107, 'Priya Reddy', 'Mumbai', '2023-05-02'),
(108, 'Rohit Verma', 'Bengaluru', '2023-04-12'),
(109, 'Sanjay Patel', 'Delhi', '2023-06-30'),
(110, 'Deepa Nair', 'Mumbai', '2022-11-15'),
(111, 'Mohammed Ali', 'Hyderabad', '2023-01-20'),
(112, 'Sneha Joshi', 'Pune', '2023-03-15'),
(113, 'Arvind Menon', 'Chennai', '2023-06-05');

INSERT INTO Rides (ride_id, driver_id, customer_id, start_time, end_time, distance, price, rating)
VALUES
(6, 104, 301, '2024-08-10 13:00:00', '2024-08-10 13:40:00', 8.2, 220.00, 4.5),
(7, 104, 302, '2024-09-12 14:20:00', '2024-09-12 14:55:00', 5.0, 150.25, 4.3),
(8, 104, 303, '2024-10-02 09:15:00', '2024-10-02 09:45:00', 6.5, 175.50, 4.8),
(9, 105, 304, '2024-08-15 16:30:00', '2024-08-15 17:00:00', 7.1, 187.50, 4.7),
(10, 105, 305, '2024-09-10 08:10:00', '2024-09-10 08:45:00', 9.2, 245.00, 4.6),
(11, 105, 306, '2024-10-20 19:05:00', '2024-10-20 19:35:00', 5.9, 160.00, 5.0),
(12, 106, 307, '2024-07-22 18:20:00', null, null, null, null),
(13, 106, 308, '2024-08-08 11:30:00', '2024-08-08 12:00:00', 3.6, 100.00, 4.4),
(14, 106, 309, '2024-09-15 09:00:00', '2024-09-15 09:35:00', 5.0, 132.50, 4.8),
(15, 107, 310, '2024-08-25 08:00:00', '2024-08-25 08:30:00', 6.2, 157.50, 4.2),
(16, 107, 311, '2024-09-22 13:20:00', '2024-09-22 13:50:00', 5.3, 140.00, 4.3),
(17, 107, 312, '2024-10-05 10:05:00', '2024-10-05 10:30:00', 4.8, 125.00, 4.5),
(18, 108, 313, '2024-08-02 15:30:00', '2024-08-02 16:00:00', 7.0, 190.00, 4.6),
(19, 108, 314, '2024-09-17 14:10:00', '2024-09-17 14:40:00', 8.2, 210.00, 4.7),
(20, 108, 315, '2024-10-12 17:30:00', '2024-10-12 17:55:00', 6.3, 165.00, 4.8),
(21, 109, 316, '2024-08-18 09:30:00', '2024-08-18 10:00:00', 6.0, 180.00, 4.2),
(22, 109, 317, '2024-09-20 11:45:00', '2024-09-20 12:15:00', 5.9, 175.00, 4.1),
(23, 109, 318, '2024-10-15 13:00:00', '2024-10-15 13:30:00', 4.7, 130.00, 4.5),
(27, 104, 322, '2024-10-15 11:10:00', '2024-10-15 11:40:00', 4.5, 120.00, 4.3),
(31, 105, 326, '2024-10-12 10:10:00', null, null, null, null),
(32, 105, 327, '2024-10-14 12:45:00', '2024-10-14 13:15:00', 5.8, 155.00, 4.5),
(35, 106, 330, '2024-10-11 08:30:00', '2024-10-11 09:00:00', 5.5, 140.00, 4.7),
(36, 106, 331, '2024-10-13 13:50:00', '2024-10-13 14:20:00', 6.4, 165.50, 4.5),
(39, 107, 334, '2024-10-14 09:00:00', '2024-10-14 09:30:00', 5.3, 150.00, 4.3),
(40, 107, 335, '2024-10-15 19:00:00', '2024-10-15 19:30:00', 6.1, 160.00, 4.4),


(41, 107, 336, '2024-10-17 21:20:00', null, null, null, null),


(44, 108, 339, '2024-10-14 18:00:00', '2024-10-14 18:30:00', 6.2, 170.00, 4.7),
(45, 108, 340, '2024-10-16 07:30:00', '2024-10-16 08:00:00', 5.8, 145.00, 4.5),
(46, 108, 341, '2024-10-18 12:30:00', '2024-10-18 13:00:00', 6.9, 185.75, 4.8),
(53, 107, 348, '2024-10-11 16:30:00', '2024-10-11 17:00:00', 5.2, 135.00, 4.6),
(54, 107, 349, '2024-10-12 07:45:00', '2024-10-12 08:15:00', 6.0, 160.00, 4.4),
(49, 105, 344, '2024-10-07 14:20:00', null, null, null, null),
(50, 105, 345, '2024-10-08 11:10:00', '2024-10-08 11:35:00', 5.6, 150.00, 4.3),
(55, 108, 350, '2024-10-13 12:00:00', '2024-10-13 12:30:00', 5.9, 155.00, 4.7),
(56, 108, 351, '2024-10-14 15:20:00', null, null, null, null),
(51, 106, 346, '2024-10-09 19:00:00', '2024-10-09 19:30:00', 7.1, 180.00, 4.6),
(52, 106, 347, '2024-10-10 09:00:00', '2024-10-10 09:25:00', 4.7, 125.00, 4.5),
(59, 110, 354, '2024-10-17 08:00:00', '2024-10-17 08:30:00', 4.0, 100.00, 4.1),
(60, 110, 355, '2024-10-18 14:15:00', '2024-10-18 14:45:00', 6.5, 150.00, 4.5),
(61, 110, 356, '2024-10-19 18:30:00', '2024-10-19 19:00:00', 3.8, 90.00, 4.2),
(62, 111, 357, '2024-10-15 09:30:00', '2024-10-15 10:00:00', 5.1, 130.00, 4.6),
(63, 111, 358, '2024-10-16 16:00:00', '2024-10-16 16:30:00', 4.3, 115.00, 4.4),
(64, 112, 359, '2024-10-14 11:00:00', '2024-10-14 11:30:00', 6.0, 155.00, 4.3),
(65, 112, 360, '2024-10-15 15:15:00', '2024-10-15 15:45:00', 7.2, 180.00, 4.5),
(66, 113, 361, '2024-10-10 09:00:00', '2024-10-10 09:30:00', 5.7, 140.00, 4.6),
(67, 113, 362, '2024-10-12 13:45:00', null, null, null, null),
(68, 113, 363, '2024-10-14 19:30:00', '2024-10-14 20:00:00', 5.9, 150.00, 4.7),
(69, 110, 364, '2024-10-20 10:00:00', '2024-10-20 10:30:00', 4.5, 110.00, 4.3),
(70, 110, 365, '2024-10-21 12:15:00', '2024-10-21 12:45:00', 6.2, 145.00, 4.4),
(71, 111, 366, '2024-10-22 14:30:00', '2024-10-22 15:00:00', 3.9, 95.00, 4.5),
(72, 111, 367, '2024-10-23 17:00:00', '2024-10-23 17:30:00', 4.6, 120.00, 4.2),
(73, 112, 368, '2024-10-24 08:00:00', '2024-10-24 08:30:00', 5.5, 145.00, 4.1),
(74, 112, 369, '2024-10-25 13:15:00', '2024-10-25 13:45:00', 6.8, 175.00, 4.4),
(78, 111, 373, '2024-10-29 10:15:00', '2024-10-29 10:45:00', 6.0, 160.00, 4.3),
(80, 113, 375, '2024-10-31 11:30:00', '2024-10-31 12:00:00', 5.8, 140.00, 4.6),
(77, 110, 372, '2024-10-28 15:30:00', '2024-10-28 16:00:00', 5.3, 135.00, 4.4);

Learnings:
• Joins: Using JOIN to combine tables based on driver_id.
• Filtering: Using conditions like end_time IS NOT NULL and date filtering.
• Aggregation: Using COUNT() to count completed rides and AVG() to calculate average
ratings.
• Window Functions: Using RANK() to rank drivers within each city based on their average
rating.
• Grouping: Using GROUP BY to aggregate data at a city and driver level.
• Having Clause: Ensuring drivers have completed a minimum of 5 rides.

Solutions
• - PostgreSQL solution
SELECT city, driver_name, total_completed_rides, avg_rating
FROM
(SELECT
d.city,
d.driver_id,
d.driver_name,
COUNT(r.ride_id) as total_completed_rides,
AVG(r.rating) as avg_rating,
RANK() OVER(PARTITION BY d.city ORDER BY AVG(r.rating) DESC) as rank
FROM rides as r
JOIN drivers as d ON d.driver_id = r.driver_id
WHERE
r.end_time IS NOT NULL
AND r.start_time >= CURRENT_DATE - INTERVAL '3 month'
GROUP BY d.city, d.driver_id, d.driver_name
HAVING COUNT(r.ride_id) >= 5) as subquery
WHERE rank = 1;
• - MySQL solution
SELECT city, driver_name, total_completed_rides, avg_rating
FROM
    (SELECT
        d.city,
        d.driver_id,
        d.driver_name,
        COUNT(r.ride_id) AS total_completed_rides,
        AVG(r.rating) AS avg_rating,
        RANK() OVER(PARTITION BY d.city ORDER BY AVG(r.rating) DESC) AS rnk
    FROM Rides AS r
    JOIN Drivers AS d ON d.driver_id = r.driver_id
    WHERE
        r.end_time IS NOT NULL
        AND r.start_time >= CURDATE() - INTERVAL 3 MONTH
    GROUP BY d.city, d.driver_id, d.driver_name
    HAVING COUNT(r.ride_id) >= 5) AS subquery
WHERE rnk = 1;

Note: the alias rnk is used instead of rank because RANK is a reserved keyword in MySQL 8.0 and would otherwise need backtick quoting.
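The rank-then-filter shape of this solution (aggregate per driver, RANK() OVER (PARTITION BY city ...), then keep rank 1) can be checked with sqlite3, which supports window functions from SQLite 3.25 onward. The schema is a pared-down stand-in for Rides/Drivers, without the date or minimum-ride filters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE rides (city TEXT, driver TEXT, rating REAL)")
cur.executemany("INSERT INTO rides VALUES (?, ?, ?)", [
    ("Mumbai", "Rajesh", 4.5), ("Mumbai", "Rajesh", 4.7),
    ("Mumbai", "Priya", 4.3),
    ("Delhi", "Anita", 4.8),
])

# Aggregate per driver first, rank within each city partition,
# then keep only the top-ranked driver per city.
rows = cur.execute("""
    WITH avgd AS (
        SELECT city, driver, AVG(rating) AS avg_rating
        FROM rides
        GROUP BY city, driver
    )
    SELECT city, driver, avg_rating
    FROM (SELECT city, driver, avg_rating,
                 RANK() OVER (PARTITION BY city
                              ORDER BY avg_rating DESC) AS rnk
          FROM avgd) AS ranked
    WHERE rnk = 1
    ORDER BY city
""").fetchall()
conn.close()
```

Rajesh (average 4.6) outranks Priya (4.3) in Mumbai, and Anita wins Delhi by default, so exactly one driver per city comes back.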
• Q.1030
Question
Tracking Refunds and Chargebacks
Given a table of payments (payment_id, user_id, payment_method, amount, payment_date,
transaction_type) and a table of refunds (refund_id, payment_id, refund_amount,
refund_date), write a query to calculate the net payment amount (payment amount minus
refund) for each user in the last 30 days. Include only users who have a net payment amount
greater than $0.

Explanation
• The goal is to track the net payment amount for each user by subtracting any refunds from
the original payments.
• For each payment, we will calculate the total refund (if any), and subtract it from the
payment amount.
• We will filter users whose net payment amount is greater than $0 in the last 30 days.
• This involves joining the payments table with the refunds table and calculating the net
amount.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE payments (
payment_id INT,
user_id INT,
payment_method VARCHAR(50),
amount DECIMAL(10, 2),
payment_date DATE,
transaction_type VARCHAR(20)
);

CREATE TABLE refunds (


refund_id INT,
payment_id INT,
refund_amount DECIMAL(10, 2),
refund_date DATE
);
• - Datasets
INSERT INTO payments (payment_id, user_id, payment_method, amount, payment_date, transaction_type)
VALUES
(1, 101, 'credit card', 500.00, '2022-07-10', 'payment'),
(2, 101, 'PayPal', 1000.00, '2022-07-12', 'payment'),
(3, 102, 'credit card', 1200.00, '2022-07-15', 'payment'),
(4, 103, 'credit card', 500.00, '2022-07-16', 'payment'),


(5, 101, 'credit card', 300.00, '2022-07-17', 'payment');

INSERT INTO refunds (refund_id, payment_id, refund_amount, refund_date)
VALUES
(1, 1, 200.00, '2022-07-12'),
(2, 2, 100.00, '2022-07-14'),
(3, 4, 250.00, '2022-07-18');

Learnings
• Using JOIN to combine data from payments and refunds tables.
• Using COALESCE() or IFNULL() to handle null values when no refund exists for a payment.
• Filtering data for the last 30 days using CURRENT_DATE - INTERVAL '30 days'.
• Calculating the net payment by subtracting the refund amount from the payment amount.

Solutions
• - PostgreSQL solution
SELECT
p.user_id,
SUM(p.amount) - COALESCE(SUM(r.refund_amount), 0) AS net_payment
FROM
payments p
LEFT JOIN
refunds r ON p.payment_id = r.payment_id
WHERE
p.payment_date > CURRENT_DATE - INTERVAL '30 days'
GROUP BY
p.user_id
HAVING
SUM(p.amount) - COALESCE(SUM(r.refund_amount), 0) > 0;
• - MySQL solution
SELECT
p.user_id,
SUM(p.amount) - IFNULL(SUM(r.refund_amount), 0) AS net_payment
FROM
payments p
LEFT JOIN
refunds r ON p.payment_id = r.payment_id
WHERE
p.payment_date > CURDATE() - INTERVAL 30 DAY
GROUP BY
p.user_id
HAVING
SUM(p.amount) - IFNULL(SUM(r.refund_amount), 0) > 0;
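The COALESCE (or IFNULL) step is what keeps refund-free users from producing NULL net payments, since SUM over an all-NULL column is NULL. A minimal sqlite3 sketch, using simplified payments/refunds tables and omitting the 30-day filter, demonstrates it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE payments (payment_id INT, user_id INT, amount REAL)")
cur.execute("CREATE TABLE refunds (payment_id INT, refund_amount REAL)")
cur.executemany("INSERT INTO payments VALUES (?, ?, ?)",
                [(1, 101, 500.0), (2, 101, 1000.0), (3, 102, 100.0)])
cur.executemany("INSERT INTO refunds VALUES (?, ?)", [(1, 200.0), (3, 100.0)])

# COALESCE turns the NULL refund SUM into 0, and HAVING drops
# users whose refunds fully cancel their payments.
rows = cur.execute("""
    SELECT p.user_id,
           SUM(p.amount) - COALESCE(SUM(r.refund_amount), 0) AS net_payment
    FROM payments p
    LEFT JOIN refunds r ON p.payment_id = r.payment_id
    GROUP BY p.user_id
    HAVING net_payment > 0
""").fetchall()
conn.close()
```

User 101 nets 1300; user 102's 100 payment is fully refunded (net 0) and is filtered out by HAVING.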
• Q.1031

Question
Identify employees who were absent for two consecutive days.

Explanation
You need to identify employees who have two consecutive absent records in the attendance
table. You will likely use a self-join or window functions to compare attendance records for
consecutive days.

Datasets and SQL Schemas


Table: employees
CREATE TABLE employees (


employee_id INT,
employee_name VARCHAR(100)
);

-- datasets
INSERT INTO employees (employee_id, employee_name)
VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Alice Brown');

Table: attendance
CREATE TABLE attendance (
employee_id INT,
attendance_date DATE,
status VARCHAR(20)
);

-- datasets
INSERT INTO attendance (employee_id, attendance_date, status)
VALUES
(1, '2025-01-01', 'Absent'),
(1, '2025-01-02', 'Absent'),
(1, '2025-01-03', 'Present'),
(2, '2025-01-01', 'Present'),
(2, '2025-01-02', 'Absent'),
(2, '2025-01-03', 'Absent'),
(3, '2025-01-01', 'Present');

Learnings
• Self-joins or window functions for comparing consecutive rows
• Date handling for consecutive days
• Filtering based on conditions (e.g., 'Absent' status)

Solutions
PostgreSQL Solution
SELECT e.employee_name
FROM attendance a1
JOIN attendance a2 ON a1.employee_id = a2.employee_id
    AND a1.attendance_date = a2.attendance_date - INTERVAL '1 day'
JOIN employees e ON e.employee_id = a1.employee_id
WHERE a1.status = 'Absent' AND a2.status = 'Absent'
GROUP BY e.employee_name;

MySQL Solution
SELECT e.employee_name
FROM attendance a1
JOIN attendance a2 ON a1.employee_id = a2.employee_id
    AND a1.attendance_date = DATE_SUB(a2.attendance_date, INTERVAL 1 DAY)
JOIN employees e ON e.employee_id = a1.employee_id
WHERE a1.status = 'Absent' AND a2.status = 'Absent'
GROUP BY e.employee_name;
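The consecutive-day self-join can be exercised with sqlite3, where date(x, '-1 day') plays the role of INTERVAL '1 day' / DATE_SUB. The data below is the question's attendance sample, slightly trimmed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE attendance (employee_id INT, attendance_date TEXT, status TEXT)")
cur.executemany("INSERT INTO attendance VALUES (?, ?, ?)", [
    (1, '2025-01-01', 'Absent'), (1, '2025-01-02', 'Absent'),
    (2, '2025-01-01', 'Present'), (2, '2025-01-02', 'Absent'),
    (2, '2025-01-03', 'Absent'),
    (3, '2025-01-01', 'Present'),
])

# Pair each 'Absent' row with the same employee's row one day later;
# a matched pair means two consecutive absent days.
ids = sorted({r[0] for r in cur.execute("""
    SELECT a1.employee_id
    FROM attendance a1
    JOIN attendance a2
      ON a1.employee_id = a2.employee_id
     AND a1.attendance_date = date(a2.attendance_date, '-1 day')
    WHERE a1.status = 'Absent' AND a2.status = 'Absent'
""")})
conn.close()
```

Employees 1 and 2 each have back-to-back absences; employee 3 never does, so only the first two ids come back.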
• Q.1032

Question
Find employees whose total overtime hours across all present days exceed 5 hours. Return
their employee_id and total overtime hours.

Explanation


You need to calculate the total overtime hours for each employee across days when they were
present. If the total overtime exceeds 5 hours, return their employee_id and the total
overtime hours. You will use aggregation and filtering to achieve this.

Datasets and SQL Schemas


Table: employees
CREATE TABLE employees (
employee_id INT,
employee_name VARCHAR(100)
);

-- datasets
INSERT INTO employees (employee_id, employee_name)
VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Alice Brown');

Table: attendance
CREATE TABLE attendance (
employee_id INT,
attendance_date DATE,
status VARCHAR(20),
overtime_hours DECIMAL(5,2) -- number of overtime hours
);

-- datasets
INSERT INTO attendance (employee_id, attendance_date, status, overtime_hours)
VALUES
(1, '2025-01-01', 'Present', 2.5),
(1, '2025-01-02', 'Present', 3.0),
(1, '2025-01-03', 'Absent', 0.0),
(2, '2025-01-01', 'Present', 1.5),
(2, '2025-01-02', 'Present', 2.5),
(2, '2025-01-03', 'Present', 2.0),
(3, '2025-01-01', 'Present', 6.0),
(3, '2025-01-02', 'Present', 0.5);

Learnings
• Aggregating values (total overtime hours).
• Filtering based on conditions (e.g., presence on specific days).
• Summing up overtime across multiple records.

Solutions
PostgreSQL Solution
SELECT employee_id, SUM(overtime_hours) AS total_overtime
FROM attendance
WHERE status = 'Present'
GROUP BY employee_id
HAVING SUM(overtime_hours) > 5;

MySQL Solution
SELECT employee_id, SUM(overtime_hours) AS total_overtime
FROM attendance
WHERE status = 'Present'
GROUP BY employee_id
HAVING SUM(overtime_hours) > 5;
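The WHERE-before-grouping versus HAVING-after-grouping distinction in this solution can be verified with sqlite3. The sample below tweaks employee 2's hours (relative to the question's data) so the 5-hour cutoff actually filters someone out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE attendance (
    employee_id INT, status TEXT, overtime_hours REAL)""")
cur.executemany("INSERT INTO attendance VALUES (?, ?, ?)", [
    (1, 'Present', 2.5), (1, 'Present', 3.0), (1, 'Absent', 0.0),
    (2, 'Present', 1.5), (2, 'Present', 2.5),
    (3, 'Present', 6.0), (3, 'Present', 0.5),
])

# WHERE trims non-present days before grouping; HAVING then filters
# whole groups on the aggregated sum.
rows = cur.execute("""
    SELECT employee_id, SUM(overtime_hours) AS total_overtime
    FROM attendance
    WHERE status = 'Present'
    GROUP BY employee_id
    HAVING total_overtime > 5
    ORDER BY employee_id
""").fetchall()
conn.close()
```

Employees 1 (5.5h) and 3 (6.5h) clear the bar; employee 2 (4.0h) is dropped by HAVING.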
• Q.1033

Question


For employees who were late, calculate their average overtime hours on those days. Exclude
employees who were never late.

Explanation
You need to calculate the average overtime hours for each employee on the days they were
late. Exclude employees who never had a "Late" status. This involves filtering for "Late"
days and then calculating the average overtime hours for those specific days.

Datasets and SQL Schemas


Table: employees
CREATE TABLE employees (
employee_id INT,
employee_name VARCHAR(100)
);

-- datasets
INSERT INTO employees (employee_id, employee_name)
VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Alice Brown');

Table: attendance
CREATE TABLE attendance (
employee_id INT,
attendance_date DATE,
status VARCHAR(20),
overtime_hours DECIMAL(5,2) -- number of overtime hours
);

-- datasets
INSERT INTO attendance (employee_id, attendance_date, status, overtime_hours)
VALUES
(1, '2025-01-01', 'Late', 2.5),
(1, '2025-01-02', 'Present', 0.0),
(1, '2025-01-03', 'Late', 1.5),
(2, '2025-01-01', 'Present', 0.0),
(2, '2025-01-02', 'Late', 3.0),
(2, '2025-01-03', 'Present', 0.0),
(3, '2025-01-01', 'Present', 0.0);

Learnings
• Filtering for specific statuses (e.g., "Late").
• Calculating averages with AVG() function.
• Excluding records based on conditions (e.g., employees who were never late).

Solutions
PostgreSQL Solution
SELECT employee_id, AVG(overtime_hours) AS avg_overtime
FROM attendance
WHERE status = 'Late'
GROUP BY employee_id;

MySQL Solution
SELECT employee_id, AVG(overtime_hours) AS avg_overtime
FROM attendance
WHERE status = 'Late'
GROUP BY employee_id;
-- no HAVING is needed: the WHERE filter keeps only 'Late' rows, so employees
-- who were never late have no rows left and never form a group
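A quick way to sanity-check this query without a running PostgreSQL or MySQL server is an in-memory SQLite database via Python's sqlite3 module (SQLite is an illustration choice here, not one of the book's target engines; the query itself is portable):

```python
import sqlite3

# Load the sample attendance data into an in-memory SQLite database
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE attendance (
    employee_id INT,
    attendance_date DATE,
    status VARCHAR(20),
    overtime_hours DECIMAL(5,2)
);
INSERT INTO attendance VALUES
(1, '2025-01-01', 'Late', 2.5),
(1, '2025-01-02', 'Present', 0.0),
(1, '2025-01-03', 'Late', 1.5),
(2, '2025-01-01', 'Present', 0.0),
(2, '2025-01-02', 'Late', 3.0),
(2, '2025-01-03', 'Present', 0.0),
(3, '2025-01-01', 'Present', 0.0);
""")

# Average overtime on 'Late' days only; employee 3, who was never late,
# has no qualifying rows and therefore never forms a group
rows = cur.execute("""
    SELECT employee_id, AVG(overtime_hours) AS avg_overtime
    FROM attendance
    WHERE status = 'Late'
    GROUP BY employee_id
    ORDER BY employee_id
""").fetchall()
print(rows)  # [(1, 2.0), (2, 3.0)]
```

Employee 1 averages (2.5 + 1.5) / 2 = 2.0 hours on late days, employee 2 averages 3.0, and employee 3 is absent from the result.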


• Q.1034

Question
Rank employees by their overall attendance consistency, defined as the total number of
Present days divided by the total number of attendance records. Return their employee_id,
consistency percentage, and rank.

Explanation
You need to calculate the attendance consistency for each employee. The consistency is
defined as the ratio of Present days to the total number of attendance records (including both
Present and Absent days). You will then rank employees based on their consistency. This
requires the use of aggregation, division, and ranking functions.

Datasets and SQL Schemas


Table: employees
CREATE TABLE employees (
employee_id INT,
employee_name VARCHAR(100)
);

-- datasets
INSERT INTO employees (employee_id, employee_name)
VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Alice Brown');

Table: attendance
CREATE TABLE attendance (
employee_id INT,
attendance_date DATE,
status VARCHAR(20)
);

-- datasets
INSERT INTO attendance (employee_id, attendance_date, status)
VALUES
(1, '2025-01-01', 'Present'),
(1, '2025-01-02', 'Absent'),
(1, '2025-01-03', 'Present'),
(2, '2025-01-01', 'Present'),
(2, '2025-01-02', 'Present'),
(2, '2025-01-03', 'Absent'),
(3, '2025-01-01', 'Absent'),
(3, '2025-01-02', 'Absent'),
(3, '2025-01-03', 'Present');

Learnings
• Calculating ratios (Present days / Total days).
• Ranking with window functions (RANK(), DENSE_RANK()).
• Using aggregation and conditional counting.

Solutions
PostgreSQL Solution
WITH attendance_summary AS (
SELECT employee_id,
COUNT(*) AS total_records,
COUNT(CASE WHEN status = 'Present' THEN 1 END) AS present_days
FROM attendance
GROUP BY employee_id
)
SELECT employee_id,
(present_days * 100.0 / total_records) AS consistency_percentage,
RANK() OVER (ORDER BY (present_days * 100.0 / total_records) DESC) AS rank
FROM attendance_summary;

MySQL Solution
WITH attendance_summary AS (
SELECT employee_id,
COUNT(*) AS total_records,
COUNT(CASE WHEN status = 'Present' THEN 1 END) AS present_days
FROM attendance
GROUP BY employee_id
)
SELECT employee_id,
(present_days * 100.0 / total_records) AS consistency_percentage,
RANK() OVER (ORDER BY (present_days * 100.0 / total_records) DESC) AS `rank` -- quoted: RANK is a reserved word in MySQL 8.0+
FROM attendance_summary;
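The same CTE-plus-RANK() shape runs unchanged on SQLite (3.25+), so a small Python harness can confirm the tie between employees 1 and 2 (sqlite3 is used here only as a convenient test bed; ROUND() and an explicit ORDER BY are added to make the output deterministic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE attendance (employee_id INT, attendance_date DATE, status VARCHAR(20));
INSERT INTO attendance VALUES
(1, '2025-01-01', 'Present'), (1, '2025-01-02', 'Absent'),  (1, '2025-01-03', 'Present'),
(2, '2025-01-01', 'Present'), (2, '2025-01-02', 'Present'), (2, '2025-01-03', 'Absent'),
(3, '2025-01-01', 'Absent'),  (3, '2025-01-02', 'Absent'),  (3, '2025-01-03', 'Present');
""")

# Present days / total records per employee, ranked descending;
# ties share a rank and the next rank is skipped (RANK semantics)
rows = cur.execute("""
WITH attendance_summary AS (
    SELECT employee_id,
           COUNT(*) AS total_records,
           COUNT(CASE WHEN status = 'Present' THEN 1 END) AS present_days
    FROM attendance
    GROUP BY employee_id
)
SELECT employee_id,
       ROUND(present_days * 100.0 / total_records, 2) AS consistency_percentage,
       RANK() OVER (ORDER BY present_days * 100.0 / total_records DESC) AS rnk
FROM attendance_summary
ORDER BY rnk, employee_id
""").fetchall()
print(rows)  # [(1, 66.67, 1), (2, 66.67, 1), (3, 33.33, 3)]
```

Employees 1 and 2 both attended 2 of 3 days and tie at rank 1, so employee 3 lands at rank 3, not 2.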
• Q.1035

Question
For each student, calculate their total grade across all subjects and rank them in descending
order of total grades. Return student_id, total_grade, and rank.

Explanation
You need to calculate the total grade for each student by summing their grades across all
subjects. Then, rank the students based on their total grade in descending order. This involves
using aggregation and ranking functions.

Datasets and SQL Schemas


Table: students
CREATE TABLE students (
student_id INT,
student_name VARCHAR(100)
);

-- datasets
INSERT INTO students (student_id, student_name)
VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Alice Brown');

Table: grades
CREATE TABLE grades (
student_id INT,
subject VARCHAR(100),
grade DECIMAL(5,2)
);

-- datasets
INSERT INTO grades (student_id, subject, grade)
VALUES
(1, 'Math', 85.5),
(1, 'Science', 90.0),
(1, 'History', 78.0),
(2, 'Math', 92.0),
(2, 'Science', 85.5),
(2, 'History', 88.0),
(3, 'Math', 75.0),
(3, 'Science', 80.0),
(3, 'History', 70.0);

Learnings
• Using SUM() to calculate total grades.
• Ranking results using RANK() or DENSE_RANK().
• Aggregation with grouping and ordering by total grades.

Solutions
PostgreSQL Solution
WITH total_grades AS (
SELECT student_id, SUM(grade) AS total_grade
FROM grades
GROUP BY student_id
)
SELECT student_id, total_grade,
RANK() OVER (ORDER BY total_grade DESC) AS rank
FROM total_grades;

MySQL Solution
WITH total_grades AS (
SELECT student_id, SUM(grade) AS total_grade
FROM grades
GROUP BY student_id
)
SELECT student_id, total_grade,
RANK() OVER (ORDER BY total_grade DESC) AS `rank` -- quoted: RANK is a reserved word in MySQL 8.0+
FROM total_grades;
• Q.1036

Question
Identify the top scorer in each subject. If multiple students have the same top score in a
subject, return all of them. Return subject, student_id, and grade.

Explanation
You need to find the highest score for each subject, and if multiple students share the same
highest score, return all of them. This involves using aggregation to find the maximum grade
per subject, and then filtering to get students who match the maximum grade.

Datasets and SQL Schemas


Table: students
CREATE TABLE students (
student_id INT,
student_name VARCHAR(100)
);

-- datasets
INSERT INTO students (student_id, student_name)
VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Alice Brown');

Table: grades
CREATE TABLE grades (
student_id INT,
subject VARCHAR(100),
grade DECIMAL(5,2)
);

-- datasets
INSERT INTO grades (student_id, subject, grade)
VALUES
(1, 'Math', 85.5),
(1, 'Science', 90.0),
(1, 'History', 78.0),
(2, 'Math', 92.0),
(2, 'Science', 85.5),
(2, 'History', 88.0),
(3, 'Math', 92.0),
(3, 'Science', 80.0),
(3, 'History', 70.0);

Learnings
• Using MAX() to find the highest grade in a subject.
• Filtering to return all students who match the highest grade.
• Grouping by subject to calculate the top scorer.

Solutions
PostgreSQL Solution
WITH max_grades AS (
SELECT subject, MAX(grade) AS max_grade
FROM grades
GROUP BY subject
)
SELECT g.subject, g.student_id, g.grade
FROM grades g
JOIN max_grades m ON g.subject = m.subject AND g.grade = m.max_grade;

MySQL Solution
WITH max_grades AS (
SELECT subject, MAX(grade) AS max_grade
FROM grades
GROUP BY subject
)
SELECT g.subject, g.student_id, g.grade
FROM grades g
JOIN max_grades m ON g.subject = m.subject AND g.grade = m.max_grade;
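Because the Math scores contain a tie (students 2 and 3 both at 92.0), this question is handy to verify mechanically. The following Python sketch runs the same CTE-plus-join on an in-memory SQLite database (an assumption for illustration; the SQL is identical to the solutions above apart from a deterministic ORDER BY):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE grades (student_id INT, subject VARCHAR(100), grade DECIMAL(5,2));
INSERT INTO grades VALUES
(1, 'Math', 85.5), (1, 'Science', 90.0), (1, 'History', 78.0),
(2, 'Math', 92.0), (2, 'Science', 85.5), (2, 'History', 88.0),
(3, 'Math', 92.0), (3, 'Science', 80.0), (3, 'History', 70.0);
""")

# Join back to the per-subject maximum so that every student matching
# the top grade is kept -- both Math students tied at 92.0 come back
rows = cur.execute("""
WITH max_grades AS (
    SELECT subject, MAX(grade) AS max_grade
    FROM grades
    GROUP BY subject
)
SELECT g.subject, g.student_id, g.grade
FROM grades g
JOIN max_grades m ON g.subject = m.subject AND g.grade = m.max_grade
ORDER BY g.subject, g.student_id
""").fetchall()
print(rows)
```

The result keeps one row per subject except Math, which returns two rows for the tied score.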
• Q.1037

Question
Identify products whose sales performance improved each day over the 4-day period. This
means the units sold on a given day must be greater than the previous day. Return
product_id and the total_improvement_days (number of consecutive days with increasing
sales).

Explanation
You need to identify products where sales increased every day over a 4-day period. For each
product, you will compare sales for each day with the previous day. If the sales were greater
on a given day, the streak of improvement continues. At the end, count how many days in the
4-day period saw this improvement for each product.


Datasets and SQL Schemas


Table: products
CREATE TABLE products (
product_id INT,
product_name VARCHAR(100)
);

-- datasets
INSERT INTO products (product_id, product_name)
VALUES
(1, 'Product A'),
(2, 'Product B'),
(3, 'Product C');

Table: sales
CREATE TABLE sales (
product_id INT,
sale_date DATE,
units_sold INT
);

-- datasets
INSERT INTO sales (product_id, sale_date, units_sold)
VALUES
(1, '2025-01-01', 10),
(1, '2025-01-02', 15),
(1, '2025-01-03', 20),
(1, '2025-01-04', 25),
(2, '2025-01-01', 5),
(2, '2025-01-02', 7),
(2, '2025-01-03', 8),
(2, '2025-01-04', 7),
(3, '2025-01-01', 3),
(3, '2025-01-02', 4),
(3, '2025-01-03', 5),
(3, '2025-01-04', 6);

Learnings
• Using LAG() or self-joins to compare consecutive rows.
• Filtering based on a condition where sales increase over multiple consecutive days.
• Aggregating data to count the number of improvement days for each product.

Solutions
PostgreSQL Solution
WITH sales_comparison AS (
SELECT product_id, sale_date, units_sold,
LAG(units_sold) OVER (PARTITION BY product_id ORDER BY sale_date) AS prev_units_sold
FROM sales
)
SELECT product_id,
COUNT(*) AS total_improvement_days
FROM sales_comparison
WHERE units_sold > prev_units_sold
GROUP BY product_id
HAVING COUNT(*) = 3; -- all 3 day-over-day comparisons in the 4-day window must show an increase

MySQL Solution
WITH sales_comparison AS (
SELECT product_id, sale_date, units_sold,
LAG(units_sold) OVER (PARTITION BY product_id ORDER BY sale_date) AS prev_units_sold
FROM sales
)
SELECT product_id,
COUNT(*) AS total_improvement_days
FROM sales_comparison
WHERE units_sold > prev_units_sold
GROUP BY product_id
HAVING COUNT(*) = 3; -- all 3 day-over-day comparisons in the 4-day window must show an increase
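The LAG() comparison can also be mirrored in plain Python to confirm which products qualify against the sample data; this sketch is only a cross-check of the logic, not part of the SQL solution:

```python
from collections import defaultdict

# Sale rows as (product_id, sale_date, units_sold), already in date order per product
sales = [
    (1, '2025-01-01', 10), (1, '2025-01-02', 15), (1, '2025-01-03', 20), (1, '2025-01-04', 25),
    (2, '2025-01-01', 5),  (2, '2025-01-02', 7),  (2, '2025-01-03', 8),  (2, '2025-01-04', 7),
    (3, '2025-01-01', 3),  (3, '2025-01-02', 4),  (3, '2025-01-03', 5),  (3, '2025-01-04', 6),
]

by_product = defaultdict(list)
for product_id, _, units in sales:
    by_product[product_id].append(units)

# Mirror LAG(): compare each day's units with the previous day's for the same product
improvement_days = {
    pid: sum(1 for prev, cur in zip(units[:-1], units[1:]) if cur > prev)
    for pid, units in by_product.items()
}
# Products improving on all 3 comparisons, like the HAVING COUNT(*) = 3 filter
steady_improvers = sorted(pid for pid, days in improvement_days.items() if days == 3)
print(improvement_days, steady_improvers)
```

Product 2 dips on the final day (8 to 7), so only products 1 and 3 qualify, matching what the SQL returns.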
• Q.1038

Question
For each product, calculate the maximum sales in a single day and return the product_id and
the max_sales. Only include products where the maximum sales were greater than 45 units.

Explanation
You need to calculate the maximum sales for each product on any single day. Then, filter the
results to only include products where their maximum sales exceeded 45 units. This involves
using the MAX() function to calculate the highest sales and applying a condition to filter based
on the value.

Datasets and SQL Schemas


Table: products
CREATE TABLE products (
product_id INT,
product_name VARCHAR(100)
);

-- datasets
INSERT INTO products (product_id, product_name)
VALUES
(1, 'Product A'),
(2, 'Product B'),
(3, 'Product C');

Table: sales
CREATE TABLE sales (
product_id INT,
sale_date DATE,
units_sold INT
);

-- datasets
INSERT INTO sales (product_id, sale_date, units_sold)
VALUES
(1, '2025-01-01', 10),
(1, '2025-01-02', 55),
(1, '2025-01-03', 20),
(1, '2025-01-04', 35),
(2, '2025-01-01', 20),
(2, '2025-01-02', 45),
(2, '2025-01-03', 30),
(2, '2025-01-04', 50),
(3, '2025-01-01', 5),
(3, '2025-01-02', 10),
(3, '2025-01-03', 15),
(3, '2025-01-04', 30);

Learnings
• Using the MAX() function to calculate the highest sales.
• Filtering results with HAVING to include only products with sales greater than a threshold.


• Grouping data by product_id to perform the aggregation.

Solutions
PostgreSQL Solution
SELECT product_id, MAX(units_sold) AS max_sales
FROM sales
GROUP BY product_id
HAVING MAX(units_sold) > 45;

MySQL Solution
SELECT product_id, MAX(units_sold) AS max_sales
FROM sales
GROUP BY product_id
HAVING MAX(units_sold) > 45;
• Q.1039

Question
Find all the available listings in the location 'Miami' or 'New York' with an availability date
within the next 7 days.

Explanation
You need to filter the listings based on two conditions:
• The location must be either 'Miami' or 'New York'.
• The availability date should be within the next 7 days from the current date.
You will use WHERE clauses to filter based on the location and availability date. Additionally,
you can use date functions to ensure the availability date is within the next 7 days.

Datasets and SQL Schemas


Table: listings
CREATE TABLE listings (
listing_id INT,
location VARCHAR(100),
available_from DATE,
available_to DATE
);

-- datasets
INSERT INTO listings (listing_id, location, available_from, available_to)
VALUES
(1, 'Miami', '2025-01-15', '2025-01-20'),
(2, 'New York', '2025-01-18', '2025-01-22'),
(3, 'Los Angeles', '2025-01-10', '2025-01-15'),
(4, 'Miami', '2025-01-17', '2025-01-21'),
(5, 'New York', '2025-01-12', '2025-01-16');

Learnings
• Using the WHERE clause to filter by location.
• Using CURRENT_DATE or NOW() to filter dates relative to the current day.
• Date comparison to filter availability within the next 7 days.

Solutions
PostgreSQL Solution
SELECT listing_id, location, available_from, available_to
FROM listings
WHERE location IN ('Miami', 'New York')
AND available_from BETWEEN CURRENT_DATE AND CURRENT_DATE + INTERVAL '7 days';

MySQL Solution
SELECT listing_id, location, available_from, available_to
FROM listings
WHERE location IN ('Miami', 'New York')
AND available_from BETWEEN CURDATE() AND CURDATE() + INTERVAL 7 DAY;
• Q.1040

Question
Calculate the total potential revenue for listings available between '2024-12-20' and
'2024-12-25'.
(Revenue is defined as the price_per_night multiplied by the number of days the property
is available within this period.)

Explanation
You need to calculate the potential revenue for each listing based on the number of nights the
listing is available within the specified date range (2024-12-20 to 2024-12-25).
• The revenue for each listing is calculated by multiplying the price_per_night by the
number of nights the listing is available within the given date range.
• If the listing's availability period overlaps with this date range, you need to calculate how
many days fall within that range.

Datasets and SQL Schemas


Table: listings
CREATE TABLE listings (
listing_id INT,
location VARCHAR(100),
available_from DATE,
available_to DATE,
price_per_night DECIMAL(10, 2) -- Price per night for the listing
);

-- datasets
INSERT INTO listings (listing_id, location, available_from, available_to, price_per_night)
VALUES
(1, 'Miami', '2024-12-15', '2024-12-22', 150.00),
(2, 'New York', '2024-12-20', '2024-12-25', 250.00),
(3, 'Los Angeles', '2024-12-18', '2024-12-23', 200.00),
(4, 'Miami', '2024-12-10', '2024-12-18', 180.00),
(5, 'New York', '2024-12-21', '2024-12-25', 300.00);

Learnings
• Calculating the overlap between two date ranges.
• Using GREATEST() and LEAST() functions to determine the overlapping dates.
• Multiplying price_per_night by the number of overlapping days to calculate revenue.

Solutions
PostgreSQL Solution


SELECT listing_id,
location,
price_per_night,
-- Calculate the number of overlapping days
GREATEST(LEAST(available_to, '2024-12-25'::DATE) - GREATEST(available_from, '2024-12-20'::DATE), 0) AS available_days,
price_per_night * GREATEST(LEAST(available_to, '2024-12-25'::DATE) - GREATEST(available_from, '2024-12-20'::DATE), 0) AS potential_revenue
FROM listings
WHERE available_to >= '2024-12-20' AND available_from <= '2024-12-25';

MySQL Solution
SELECT listing_id,
location,
price_per_night,
-- DATEDIFF() returns the day difference; subtracting DATE values directly does not yield days in MySQL
GREATEST(DATEDIFF(LEAST(available_to, '2024-12-25'), GREATEST(available_from, '2024-12-20')), 0) AS available_days,
price_per_night * GREATEST(DATEDIFF(LEAST(available_to, '2024-12-25'), GREATEST(available_from, '2024-12-20')), 0) AS potential_revenue
FROM listings
WHERE available_to >= '2024-12-20' AND available_from <= '2024-12-25';

Explanation of the Calculation:


• GREATEST(LEAST(available_to, '2024-12-25') - GREATEST(available_from,
'2024-12-20'), 0) calculates the number of overlapping days between the listing’s
availability and the specified period (2024-12-20 to 2024-12-25).
• price_per_night * available_days calculates the potential revenue for each listing
based on the overlapping days. If there's no overlap, the result will be zero due to the
GREATEST() function ensuring no negative days are calculated.
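The overlap arithmetic is easy to verify in isolation. This Python sketch mirrors GREATEST(LEAST(...) - GREATEST(...), 0) with min()/max() over date objects (the helper name overlap_days is hypothetical, introduced only for this illustration):

```python
from datetime import date

def overlap_days(avail_from, avail_to, window_start, window_end):
    """Days of overlap between [avail_from, avail_to] and the window, clamped at 0."""
    return max((min(avail_to, window_end) - max(avail_from, window_start)).days, 0)

window = (date(2024, 12, 20), date(2024, 12, 25))
# (listing_id, available_from, available_to, price_per_night) from the sample data
listings = [
    (1, date(2024, 12, 15), date(2024, 12, 22), 150.00),
    (2, date(2024, 12, 20), date(2024, 12, 25), 250.00),
    (4, date(2024, 12, 10), date(2024, 12, 18), 180.00),  # ends before the window opens
]

revenue = {
    lid: price * overlap_days(a_from, a_to, *window)
    for lid, a_from, a_to, price in listings
}
print(revenue)  # {1: 300.0, 2: 1250.0, 4: 0.0}
```

Listing 1 overlaps the window for 2 days (Dec 20-22) at 150.00 per night, listing 2 for the full 5 days, and listing 4's negative raw difference is clamped to 0. In the SQL solutions the WHERE clause would drop listing 4 before any revenue is computed.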
• Q.1041

Question
Write a query to find the most expensive listing (price_per_night) in each city.

Explanation
You need to find the highest price per night (price_per_night) for listings in each city. This
can be achieved by grouping the data by city and using the MAX() function to determine the
highest price. You will then return the listing that has the maximum price for each city.

Datasets and SQL Schemas


Table: listings
CREATE TABLE listings (
listing_id INT,
location VARCHAR(100),
available_from DATE,
available_to DATE,
price_per_night DECIMAL(10, 2) -- Price per night for the listing
);

-- datasets
INSERT INTO listings (listing_id, location, available_from, available_to, price_per_nigh
t)
VALUES
(1, 'Miami', '2024-12-15', '2024-12-22', 150.00),
(2, 'New York', '2024-12-20', '2024-12-25', 250.00),
(3, 'Los Angeles', '2024-12-18', '2024-12-23', 200.00),
(4, 'Miami', '2024-12-10', '2024-12-18', 180.00),
(5, 'New York', '2024-12-21', '2024-12-25', 300.00),
(6, 'Los Angeles', '2024-12-22', '2024-12-30', 220.00);

Learnings
• Using the MAX() function to find the highest value in a group.
• Grouping data by a specific column (location) to find maximum values for each city.
• Using JOIN (if needed) to retrieve the full record for the highest-priced listings.

Solutions
PostgreSQL Solution
WITH max_prices AS (
SELECT location, MAX(price_per_night) AS max_price
FROM listings
GROUP BY location
)
SELECT l.listing_id, l.location, l.price_per_night
FROM listings l
JOIN max_prices m ON l.location = m.location
AND l.price_per_night = m.max_price;

MySQL Solution
WITH max_prices AS (
SELECT location, MAX(price_per_night) AS max_price
FROM listings
GROUP BY location
)
SELECT l.listing_id, l.location, l.price_per_night
FROM listings l
JOIN max_prices m ON l.location = m.location
AND l.price_per_night = m.max_price;

Explanation of the Solution:


• The max_prices CTE (Common Table Expression) calculates the highest price
(MAX(price_per_night)) for each location.
• The main query joins the listings table with the max_prices CTE to find the listing with
the price_per_night equal to the maximum price for each location.
• This ensures that the query returns the full details of the most expensive listing for each
city.
• Q.1042

Question
Write a query to count how many listings are available and unavailable, grouped by
location.

Explanation
You need to count the number of listings in each location that are either available or
unavailable. To determine availability, you can use the available_from and available_to
dates. If the current date is between available_from and available_to, the listing is
considered available. Otherwise, it is unavailable.

Datasets and SQL Schemas


Table: listings
CREATE TABLE listings (
listing_id INT,
location VARCHAR(100),
available_from DATE,
available_to DATE,
price_per_night DECIMAL(10, 2) -- Price per night for the listing
);

-- datasets
INSERT INTO listings (listing_id, location, available_from, available_to, price_per_night)
VALUES
(1, 'Miami', '2024-12-15', '2024-12-22', 150.00),
(2, 'New York', '2024-12-20', '2024-12-25', 250.00),
(3, 'Los Angeles', '2024-12-18', '2024-12-23', 200.00),
(4, 'Miami', '2024-12-10', '2024-12-18', 180.00),
(5, 'New York', '2024-12-21', '2024-12-25', 300.00),
(6, 'Los Angeles', '2024-12-22', '2024-12-30', 220.00);

Learnings
• Using CASE statements to categorize records as available or unavailable based on date
comparison.
• Grouping data by location and counting the records for each category.
• Using CURRENT_DATE or NOW() to determine the current date.

Solutions
PostgreSQL Solution
SELECT location,
COUNT(CASE WHEN available_from <= CURRENT_DATE AND available_to >= CURRENT_DATE THEN 1 END) AS available_count,
COUNT(CASE WHEN available_from > CURRENT_DATE OR available_to < CURRENT_DATE THEN 1 END) AS unavailable_count
FROM listings
GROUP BY location;

MySQL Solution
SELECT location,
COUNT(CASE WHEN available_from <= CURDATE() AND available_to >= CURDATE() THEN 1 END) AS available_count,
COUNT(CASE WHEN available_from > CURDATE() OR available_to < CURDATE() THEN 1 END) AS unavailable_count
FROM listings
GROUP BY location;

Explanation of the Solution:


• The CASE statement is used to categorize each listing:
• If the listing's available_from date is less than or equal to the current date
(CURRENT_DATE/CURDATE()) and its available_to date is greater than or equal to the current
date, it's considered available.
• If the listing is not available in this range (either available_from is in the future or
available_to is in the past), it's considered unavailable.
• COUNT is used to count the number of listings in each category, grouped by location. The
GROUP BY clause ensures the count is calculated for each location.
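Because the SQL result shifts with CURRENT_DATE, a reproducible cross-check needs a pinned "today". This Python sketch applies the same CASE logic with today fixed at 2024-12-21 (an arbitrary date chosen so the sample data splits both ways):

```python
from datetime import date
from collections import Counter

# Fixed "today" so the example is reproducible; the SQL uses CURRENT_DATE/CURDATE()
today = date(2024, 12, 21)
# (location, available_from, available_to) from the sample data
listings = [
    ('Miami',       date(2024, 12, 15), date(2024, 12, 22)),
    ('New York',    date(2024, 12, 20), date(2024, 12, 25)),
    ('Los Angeles', date(2024, 12, 18), date(2024, 12, 23)),
    ('Miami',       date(2024, 12, 10), date(2024, 12, 18)),
    ('New York',    date(2024, 12, 21), date(2024, 12, 25)),
    ('Los Angeles', date(2024, 12, 22), date(2024, 12, 30)),
]

# Same test as the CASE expression: available iff today falls inside the range
counts = Counter(
    (loc, 'available' if a_from <= today <= a_to else 'unavailable')
    for loc, a_from, a_to in listings
)
print(dict(counts))
```

With that pinned date, New York has 2 available listings, while Miami and Los Angeles each split 1 available / 1 unavailable (listing 4 ended Dec 18, listing 6 does not open until Dec 22).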
• Q.1043

Question
Write an SQL query to find for each month and country, the number of transactions and their
total amount, the number of approved transactions and their total amount.


Return the result table in the following order:

month    country  trans_count  approved_count  trans_total_amount  approved_total_amount
2018-12  US       2            1               3000                1000
2019-01  US       1            1               2000                2000
2019-01  DE       1            1               2000                2000

Explanation
In this query, you need to:
• Group the transactions by month and country.
• Count the total number of transactions (trans_count).
• Count the number of approved transactions (approved_count).
• Calculate the total amount of all transactions (trans_total_amount).
• Calculate the total amount of approved transactions (approved_total_amount).
• Use CASE statements to conditionally sum values based on whether the transaction is
approved or not.
You can extract the month from the trans_date column using the appropriate date function
(TO_CHAR in PostgreSQL, DATE_FORMAT in MySQL) and group by both month and country.

Solution
PostgreSQL Solution
SELECT
TO_CHAR(trans_date, 'YYYY-MM') AS month,
country,
COUNT(*) AS trans_count,
SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
SUM(amount) AS trans_total_amount,
SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
FROM Transactions
GROUP BY TO_CHAR(trans_date, 'YYYY-MM'), country
ORDER BY month, country;

MySQL Solution
SELECT
DATE_FORMAT(trans_date, '%Y-%m') AS month,
country,
COUNT(*) AS trans_count,
SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
SUM(amount) AS trans_total_amount,
SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
FROM Transactions
GROUP BY DATE_FORMAT(trans_date, '%Y-%m'), country
ORDER BY month, country;


Explanation of the Query:


• TO_CHAR (PostgreSQL) / DATE_FORMAT (MySQL):
These functions are used to extract the year and month in the format YYYY-MM from the
trans_date.

• COUNT(*):
This counts the total number of transactions for each group (i.e., month and country).
• SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END):
This counts how many transactions were approved. It adds 1 for approved transactions, and
0 for others, effectively counting only the approved ones.

• SUM(amount):
This calculates the total amount for all transactions in that group.
• SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END):
This sums the amounts only for approved transactions, calculating the total amount of
approved transactions.
• GROUP BY:
The data is grouped by month (which is derived from trans_date) and country.
• ORDER BY:
The result is ordered first by month and then by country to match the desired output format.

Expected Output:

month    country  trans_count  approved_count  trans_total_amount  approved_total_amount
2018-12  US       2            1               3000                1000
2019-01  US       1            1               2000                2000
2019-01  DE       1            1               2000                2000
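The book does not list the Transactions dataset for this question, so the schema and rows below are assumptions, chosen so the query reproduces the expected figures. SQLite (via Python's sqlite3) stands in for PostgreSQL/MySQL, with strftime('%Y-%m', ...) replacing TO_CHAR/DATE_FORMAT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Assumed schema and sample rows -- the original dataset is not shown in the book
cur.executescript("""
CREATE TABLE Transactions (id INT, country TEXT, state TEXT, amount INT, trans_date DATE);
INSERT INTO Transactions VALUES
(121, 'US', 'approved', 1000, '2018-12-18'),
(122, 'US', 'declined', 2000, '2018-12-19'),
(123, 'US', 'approved', 2000, '2019-01-01'),
(124, 'DE', 'approved', 2000, '2019-01-07');
""")

# Conditional aggregation: CASE inside SUM counts/sums only approved rows
rows = cur.execute("""
SELECT strftime('%Y-%m', trans_date) AS month,
       country,
       COUNT(*) AS trans_count,
       SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
       SUM(amount) AS trans_total_amount,
       SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
FROM Transactions
GROUP BY month, country
ORDER BY month, country
""").fetchall()
print(rows)
```

Note that ORDER BY month, country sorts DE before US within 2019-01; the ordering shown in the expected table follows insertion order instead, so the two can legitimately differ on tied months.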
• Q.1044

Question
Given the reviews table, write a query to retrieve the average star rating for each product,
grouped by month.
The output should display the month (formatted as YYYY-MM), the product ID, and the average
star rating rounded to two decimal places.
Sort the output first by month and then by product ID.


Explanation
To solve this, you need to:
• Extract the month from the submit_date field.
• Group the data by product_id and month.
• Calculate the average star rating for each group.
• Round the average star rating to two decimal places.
• Sort the results by month and then by product_id.

Datasets and SQL Schemas


Table: reviews
CREATE TABLE reviews (
review_id INTEGER,
user_id INTEGER,
submit_date TIMESTAMP,
product_id INTEGER,
stars INTEGER
);

-- Sample data (provided in the prompt)


INSERT INTO reviews (review_id, user_id, submit_date, product_id, stars) VALUES
(6171, 123, '2022-06-08 00:00:00', 50001, 4),
(7802, 265, '2022-06-10 00:00:00', 69852, 4),
(5293, 362, '2022-06-18 00:00:00', 50001, 3),
(6352, 192, '2022-07-26 00:00:00', 69852, 3),
(4517, 981, '2022-07-05 00:00:00', 69852, 2),
(8654, 753, '2022-08-15 00:00:00', 50001, 5),
(9743, 642, '2022-08-22 00:00:00', 69852, 3),
(1025, 874, '2022-08-05 00:00:00', 50001, 4),
(2089, 512, '2022-09-10 00:00:00', 69852, 2),
(3078, 369, '2022-09-18 00:00:00', 50001, 5),
(4056, 785, '2022-09-25 00:00:00', 69852, 4),
(5034, 641, '2022-10-12 00:00:00', 50001, 3),
(6023, 829, '2022-10-18 00:00:00', 69852, 5),
(7012, 957, '2022-10-25 00:00:00', 50001, 2),
(8001, 413, '2022-11-05 00:00:00', 69852, 4),
(8990, 268, '2022-11-15 00:00:00', 50001, 3),
(9967, 518, '2022-11-22 00:00:00', 69852, 3),
(1086, 753, '2022-12-10 00:00:00', 50001, 5),
(1175, 642, '2022-12-18 00:00:00', 69852, 4),
(1264, 874, '2022-12-25 00:00:00', 50001, 3),
(1353, 512, '2022-12-31 00:00:00', 69852, 2),
(1442, 369, '2023-01-05 00:00:00', 50001, 4),
(1531, 785, '2023-01-15 00:00:00', 69852, 5),
(1620, 641, '2023-01-22 00:00:00', 50001, 3),
(1709, 829, '2023-01-30 00:00:00', 69852, 4);

Learnings
• Extracting the month and year from a timestamp or date field.
• Using GROUP BY to aggregate data.
• Calculating average values with AVG() and rounding the result using ROUND().
• Sorting results with ORDER BY.

Solutions
PostgreSQL Solution
SELECT
TO_CHAR(submit_date, 'YYYY-MM') AS month,
product_id,
ROUND(AVG(stars)::numeric, 2) AS average_star_rating
FROM reviews
GROUP BY TO_CHAR(submit_date, 'YYYY-MM'), product_id
ORDER BY month, product_id;

MySQL Solution
SELECT
DATE_FORMAT(submit_date, '%Y-%m') AS month,
product_id,
ROUND(AVG(stars), 2) AS average_star_rating
FROM reviews
GROUP BY DATE_FORMAT(submit_date, '%Y-%m'), product_id
ORDER BY month, product_id;

Explanation:
• TO_CHAR(submit_date, 'YYYY-MM') (PostgreSQL) or
DATE_FORMAT(submit_date, '%Y-%m') (MySQL): Extracts the year and month from
the submit_date as a string in the format YYYY-MM.
• AVG(stars): Calculates the average of the stars column for each group.
• ROUND(..., 2): Rounds the average star rating to 2 decimal places.
• GROUP BY: Groups the data by month and product_id to calculate the average per
product per month.
• ORDER BY: Orders the results first by month and then by product ID.

Expected Output:

month    product_id  average_star_rating
2022-06  50001       3.50
2022-06  69852       4.00
2022-07  69852       2.50
2022-08  50001       4.50
2022-08  69852       3.00
2022-09  50001       5.00
2022-09  69852       3.00
2022-10  50001       2.50
2022-10  69852       5.00
2022-11  50001       3.00
2022-11  69852       3.50
2022-12  50001       4.00
2022-12  69852       3.00
2023-01  50001       3.50
2023-01  69852       4.50

Summary:
This query calculates the average star ratings for each product in each month, rounds the
average to two decimal places, and sorts the results by month and product ID.
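A slice of the sample data is enough to check the grouping and rounding end to end. This Python/SQLite sketch (an illustration only; SQLite's strftime replaces TO_CHAR/DATE_FORMAT) covers the June and July reviews:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# A small slice of the sample reviews is enough to check grouping and rounding
cur.executescript("""
CREATE TABLE reviews (review_id INT, user_id INT, submit_date TIMESTAMP,
                      product_id INT, stars INT);
INSERT INTO reviews VALUES
(6171, 123, '2022-06-08 00:00:00', 50001, 4),
(7802, 265, '2022-06-10 00:00:00', 69852, 4),
(5293, 362, '2022-06-18 00:00:00', 50001, 3),
(6352, 192, '2022-07-26 00:00:00', 69852, 3),
(4517, 981, '2022-07-05 00:00:00', 69852, 2);
""")

rows = cur.execute("""
SELECT strftime('%Y-%m', submit_date) AS month,
       product_id,
       ROUND(AVG(stars), 2) AS average_star_rating
FROM reviews
GROUP BY month, product_id
ORDER BY month, product_id
""").fetchall()
print(rows)  # June: 50001 -> 3.5, 69852 -> 4.0; July: 69852 -> 2.5
```

Product 50001 averages (4 + 3) / 2 = 3.5 in June, and product 69852 averages (3 + 2) / 2 = 2.5 in July, matching the first rows of the expected output.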
• Q.1045

Question
Identify users who have made purchases totaling more than $10,000 in the last month from
the purchases table.
The table contains information about purchases, including the user ID, date of purchase,
product ID, and amount spent.

Explanation
To solve this:
• Filter the records to include only those within the last month.
• This can be done by comparing the date_of_purchase with the current date
(CURRENT_DATE) and checking if it falls within the last month.
• Group the records by user_id to aggregate the total amount spent by each user.
• Use the SUM() function to calculate the total amount spent by each user.
• Filter the users whose total spending is greater than $10,000.

Datasets and SQL Schemas


Table: purchases
CREATE TABLE purchases (
purchase_id INT PRIMARY KEY,
user_id INT,
date_of_purchase TIMESTAMP,
product_id INT,
amount_spent DECIMAL(10, 2)
);

-- Sample data (provided in the prompt)


INSERT INTO purchases (purchase_id, user_id, date_of_purchase, product_id, amount_spent)
VALUES
(2171, 145, '2024-02-22 00:00:00', 43001, 1000),
(3022, 578, '2024-02-24 00:00:00', 25852, 4000),
(4933, 145, '2024-02-28 00:00:00', 43001, 7000),
(6322, 248, '2024-02-19 00:00:00', 25852, 2000),
(4717, 578, '2024-02-12 00:00:00', 25852, 7000),
(2172, 145, '2024-01-15 00:00:00', 43001, 8000),
(3023, 578, '2024-01-18 00:00:00', 25852, 3000),
(4934, 145, '2024-01-28 00:00:00', 43001, 9000),
(6323, 248, '2024-02-20 00:00:00', 25852, 1500),
(4718, 578, '2024-02-25 00:00:00', 25852, 6000);

Learnings
• Date comparison to filter records from the last month.
• SUM() aggregation to calculate the total purchase amount.
• GROUP BY to aggregate data by user_id.
• HAVING clause to filter groups based on aggregated results.

Solutions
PostgreSQL Solution
SELECT user_id, SUM(amount_spent) AS total_spent
FROM purchases
WHERE date_of_purchase >= CURRENT_DATE - INTERVAL '1 month'
GROUP BY user_id
HAVING SUM(amount_spent) > 10000;

MySQL Solution
SELECT user_id, SUM(amount_spent) AS total_spent
FROM purchases
WHERE date_of_purchase >= CURDATE() - INTERVAL 1 MONTH
GROUP BY user_id
HAVING SUM(amount_spent) > 10000;

Explanation of the Query:


• Filtering by Date:
• CURRENT_DATE - INTERVAL '1 month' (PostgreSQL) or CURDATE() - INTERVAL 1
MONTH (MySQL) ensures that only records within the last month are considered.
• SUM(amount_spent):
• The SUM() function is used to calculate the total amount spent by each user within the last
month.
• GROUP BY user_id:
• This groups the purchases by user_id so that we can calculate the total spending for each
user.
• HAVING SUM(amount_spent) > 10000:
• This filters out users who have spent $10,000 or less.

Expected Output:

The result depends on the day the query is run, because the date filter is relative to
CURRENT_DATE/CURDATE(). For example, if the query runs on 2024-03-01, "the last month"
covers all February 2024 purchases and only user 578 qualifies:

user_id total_spent

578     17000.00

Summary:
This query identifies users who made purchases totaling more than $10,000 in the last month.
The results display the user_id and their total spending during this period.
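Because the SQL filter is relative to the run date, a reproducible check needs the "current date" pinned. This Python/SQLite sketch fixes it at 2024-03-01 via SQLite's date() function (an illustration-only substitute for CURRENT_DATE/CURDATE()):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE purchases (purchase_id INT, user_id INT, date_of_purchase TIMESTAMP,
                        product_id INT, amount_spent DECIMAL(10, 2));
INSERT INTO purchases VALUES
(2171, 145, '2024-02-22 00:00:00', 43001, 1000),
(3022, 578, '2024-02-24 00:00:00', 25852, 4000),
(4933, 145, '2024-02-28 00:00:00', 43001, 7000),
(6322, 248, '2024-02-19 00:00:00', 25852, 2000),
(4717, 578, '2024-02-12 00:00:00', 25852, 7000),
(2172, 145, '2024-01-15 00:00:00', 43001, 8000),
(4718, 578, '2024-02-25 00:00:00', 25852, 6000);
""")

# "Today" is pinned to 2024-03-01, so the one-month window starts at 2024-02-01;
# January purchases fall outside it and are ignored
rows = cur.execute("""
SELECT user_id, SUM(amount_spent) AS total_spent
FROM purchases
WHERE date_of_purchase >= date('2024-03-01', '-1 month')
GROUP BY user_id
HAVING SUM(amount_spent) > 10000
ORDER BY user_id
""").fetchall()
print(rows)  # only user 578 clears the $10,000 bar in February
```

User 578 spends 4000 + 7000 + 6000 = 17000 in February; user 145's February total is only 8000 (the 8000 and 9000 January purchases are outside the window), so 578 is the only row returned.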
• Q.1046


Question
Identify the top 3 posts with the highest engagement (likes + comments) for each user on a
Facebook page. Display the user ID, post ID, engagement count, and rank for each post.

Explanation
You need to calculate the total engagement for each post (sum of likes and comments). Then,
rank the posts for each user based on this engagement in descending order. Finally, filter the
results to only show the top 3 posts for each user.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE fb_posts (
post_id INT PRIMARY KEY,
user_id INT,
likes INT,
comments INT,
post_date DATE
);
-- Datasets
INSERT INTO fb_posts (post_id, user_id, likes, comments, post_date) VALUES
(1, 101, 50, 20, '2024-02-27'),
(2, 102, 30, 15, '2024-02-28'),
(3, 103, 70, 25, '2024-02-29'),
(4, 101, 80, 30, '2024-03-01'),
(5, 102, 40, 10, '2024-03-02'),
(6, 103, 60, 20, '2024-03-03'),
(7, 101, 90, 35, '2024-03-04'),
(8, 101, 90, 35, '2024-03-05'),
(9, 102, 50, 15, '2024-03-06'),
(10, 103, 30, 10, '2024-03-07'),
(11, 101, 60, 25, '2024-03-08'),
(12, 102, 70, 30, '2024-03-09'),
(13, 103, 80, 35, '2024-03-10'),
(14, 101, 40, 20, '2024-03-11'),
(15, 102, 90, 40, '2024-03-12'),
(16, 103, 20, 5, '2024-03-13'),
(17, 101, 70, 25, '2024-03-14'),
(18, 102, 50, 15, '2024-03-15'),
(19, 103, 30, 10, '2024-03-16'),
(20, 101, 60, 20, '2024-03-17');

Learnings
• Use of ROW_NUMBER() for ranking.
• Window functions to partition data by user.
• Summing values for calculating engagement.
• Filtering based on rank.

Solutions
PostgreSQL Solution
WITH RankedPosts AS (
SELECT user_id, post_id, (likes + comments) AS engagement,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY (likes + comments) DESC) AS rank
FROM fb_posts
)
SELECT user_id, post_id, engagement, rank
FROM RankedPosts
WHERE rank <= 3
ORDER BY user_id, rank;
MySQL Solution
WITH RankedPosts AS (
SELECT user_id, post_id, (likes + comments) AS engagement,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY (likes + comments) DESC) AS `rank`
FROM fb_posts
)
SELECT user_id, post_id, engagement, `rank`
FROM RankedPosts
WHERE `rank` <= 3
ORDER BY user_id, `rank`;
-- `rank` is backtick-quoted because RANK is a reserved word in MySQL 8.0+
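The window-function pattern can be verified on SQLite (3.25+) with a subset of the sample posts. A post_id tie-breaker is added to the ORDER BY inside ROW_NUMBER() so that the tied posts 7 and 8 rank deterministically; the book's version leaves tie order to the engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# A subset of the fb_posts sample: user 101 has five posts, so two must be cut
cur.executescript("""
CREATE TABLE fb_posts (post_id INT, user_id INT, likes INT, comments INT);
INSERT INTO fb_posts VALUES
(1, 101, 50, 20), (4, 101, 80, 30), (7, 101, 90, 35),
(8, 101, 90, 35), (14, 101, 40, 20),
(2, 102, 30, 15), (5, 102, 40, 10);
""")

rows = cur.execute("""
WITH RankedPosts AS (
    SELECT user_id, post_id, (likes + comments) AS engagement,
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY (likes + comments) DESC, post_id) AS rn
    FROM fb_posts
)
SELECT user_id, post_id, engagement, rn
FROM RankedPosts
WHERE rn <= 3
ORDER BY user_id, rn
""").fetchall()
print(rows)
```

Posts 7 and 8 tie at 125 engagement; ROW_NUMBER() still assigns distinct ranks 1 and 2 (here broken by post_id), post 4 takes rank 3, and posts 1 and 14 are cut. User 102 has only two posts, so both survive the rn <= 3 filter.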
• Q.1047
Question
Write a query to retrieve the count of companies that have posted duplicate job listings.
Duplicate job listings are defined as two job listings within the same company that share
identical titles and descriptions.

Explanation
You need to group the job listings by company, title, and description, and then count how
many times each combination appears. If a combination appears more than once, it indicates
a duplicate listing. Finally, count how many companies have at least one duplicate job listing.

Datasets and SQL Schemas


• - Table creation
CREATE TABLE job_listings (
job_id INTEGER PRIMARY KEY,
company_id INTEGER,
title TEXT,
description TEXT
);
• - Datasets
INSERT INTO job_listings (job_id, company_id, title, description)
VALUES
(248, 827, 'Business Analyst', 'Business analyst evaluates past and current business data with the primary goal of improving decision-making processes within organizations.'),
(149, 845, 'Business Analyst', 'Business analyst evaluates past and current business data with the primary goal of improving decision-making processes within organizations.'),
(945, 345, 'Data Analyst', 'Data analyst reviews data to identify key insights into a business''s customers and ways the data can be used to solve problems.'),
(164, 345, 'Data Analyst', 'Data analyst reviews data to identify key insights into a business''s customers and ways the data can be used to solve problems.'),
(172, 244, 'Data Engineer', 'Data engineer works in a variety of settings to build systems that collect, manage, and convert raw data into usable information for data scientists and business analysts to interpret.'),
(573, 456, 'Software Engineer', 'Software engineer designs, develops, tests, and maintains software applications.'),
(324, 789, 'Software Engineer', 'Software engineer designs, develops, tests, and maintains software applications.'),
(890, 123, 'Data Scientist', 'Data scientist analyzes and interprets complex data to help organizations make informed decisions.'),
(753, 123, 'Data Scientist', 'Data scientist analyzes and interprets complex data to help organizations make informed decisions.');


Learnings
• Use GROUP BY to group rows based on shared attributes (company, title, description).
• Use HAVING to filter groups with more than one occurrence.
• Use COUNT for aggregating and filtering duplicate entries.

Solutions
• - PostgreSQL Solution
SELECT COUNT(DISTINCT company_id) AS cnt_company
FROM (
SELECT company_id, title, description, COUNT(*) AS total_job
FROM job_listings
GROUP BY company_id, title, description
HAVING COUNT(*) > 1
) x1;
• - MySQL Solution
SELECT COUNT(DISTINCT company_id) AS cnt_company
FROM (
SELECT company_id, title, description, COUNT(*) AS total_job
FROM job_listings
GROUP BY company_id, title, description
HAVING COUNT(*) > 1
) x1;
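The grouping logic can be verified quickly with Python's sqlite3 module standing in for PostgreSQL/MySQL; the shortened descriptions ('d1', 'd2', ...) are placeholders for the full text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_listings (job_id INT, company_id INT, title TEXT, description TEXT);
INSERT INTO job_listings VALUES
 (248, 827, 'Business Analyst', 'd1'),  -- same title/description but
 (149, 845, 'Business Analyst', 'd1'),  -- DIFFERENT companies: not duplicates
 (945, 345, 'Data Analyst', 'd2'),      -- company 345 posts the same
 (164, 345, 'Data Analyst', 'd2'),      -- listing twice: a duplicate
 (890, 123, 'Data Scientist', 'd3'),
 (753, 123, 'Data Scientist', 'd3');
""")
cnt = conn.execute("""
SELECT COUNT(DISTINCT company_id)
FROM (
    SELECT company_id
    FROM job_listings
    GROUP BY company_id, title, description
    HAVING COUNT(*) > 1
) AS x
""").fetchall()[0][0]
print(cnt)  # 2  (companies 345 and 123)
```

Note that companies 827 and 845 share an identical listing with each other but not within the same company, so they are correctly excluded.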
• Q.1048
Question
Identify the region with the lowest sales amount for the previous month. Return the region
name and total sales amount.

Explanation
The task is to identify the region with the lowest sales for the previous month. The solution
involves filtering the sales data for the previous month, grouping it by region, calculating the
total sales for each region, and then selecting the region with the lowest total sales.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Sales (
SaleID SERIAL PRIMARY KEY,
Region VARCHAR(50),
Amount DECIMAL(10, 2),
SaleDate DATE
);

-- datasets
INSERT INTO Sales (Region, Amount, SaleDate) VALUES
('North', 5000.00, '2024-02-01'),
('South', 6000.00, '2024-02-02'),
('East', 4500.00, '2024-02-03'),
('West', 7000.00, '2024-02-04'),
('North', 5500.00, '2024-02-05'),
('South', 6500.00, '2024-02-06'),
('East', 4800.00, '2024-02-07'),
('West', 7200.00, '2024-02-08'),
('North', 5200.00, '2024-02-09'),
('South', 6200.00, '2024-02-10'),
('East', 4700.00, '2024-02-11'),
('West', 7100.00, '2024-02-12'),
('North', 5300.00, '2024-02-13'),

('South', 6300.00, '2024-02-14'),
('East', 4600.00, '2024-02-15'),
('West', 7300.00, '2024-02-16'),
('North', 5400.00, '2024-02-17'),
('South', 6400.00, '2024-02-18'),
('East', 4900.00, '2024-02-19'),
('West', 7400.00, '2024-02-20'),
('North', 5600.00, '2024-02-21'),
('South', 6600.00, '2024-02-22'),
('East', 5000.00, '2024-02-23'),
('West', 7500.00, '2024-02-24'),
('North', 5700.00, '2024-02-25'),
('South', 6700.00, '2024-02-26'),
('East', 5100.00, '2024-02-27'),
('West', 7600.00, '2024-02-28');

Learnings
• Date Functions: Using EXTRACT() to filter records for the previous month.
• Aggregation: Summing values using SUM() and grouping by region with GROUP BY.
• Sorting: Sorting by total sales in ascending order to get the region with the lowest sales.
• Limiting: Using LIMIT 1 to get only the region with the lowest sales.

Solutions
• PostgreSQL solution
SELECT
region,
SUM(amount) as total_sales
FROM sales
WHERE EXTRACT(MONTH FROM saledate) = EXTRACT(MONTH FROM CURRENT_DATE - INTERVAL '1 month')
  AND EXTRACT(YEAR FROM saledate) = EXTRACT(YEAR FROM CURRENT_DATE - INTERVAL '1 month')
GROUP BY region
ORDER BY total_sales ASC
LIMIT 1;
• MySQL solution
SELECT
region,
SUM(amount) AS total_sales
FROM sales
WHERE MONTH(saledate) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH)
  AND YEAR(saledate) = YEAR(CURRENT_DATE - INTERVAL 1 MONTH)
GROUP BY region
ORDER BY total_sales ASC
LIMIT 1;
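The year filter is the part of this query that is easy to get wrong: in January, the previous month belongs to the previous year, so the year must also be taken from the shifted date. A small Python check of that boundary logic (the helper name is ours, for illustration only):

```python
from datetime import date, timedelta

def previous_month(today):
    """Return (year, month) of the month before `today`.
    Stepping back through the 1st of the current month handles the
    January -> December rollover that a naive YEAR(CURRENT_DATE)
    comparison would miss."""
    last_of_prev = today.replace(day=1) - timedelta(days=1)
    return last_of_prev.year, last_of_prev.month

print(previous_month(date(2024, 3, 15)))  # (2024, 2)
print(previous_month(date(2024, 1, 10)))  # (2023, 12) -- year changes!
```

This is why both SQL solutions compare the year against the shifted date rather than against the current date.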
• Q.1049
Question
Find the median within a series of numbers in SQL.

Explanation
To calculate the median, you need to:
• Order the data in ascending order and assign a rank to each record with ROW_NUMBER().
• For an odd number of records, the median is the middle value; for an even number, it is the average of the two middle values.
• Filter the rows down to the middle-ranked value(s) and average them.


Datasets and SQL Schemas


-- Table creation
CREATE TABLE tiktok (
views INT
);

-- datasets
INSERT INTO tiktok (views)
VALUES
(100), (800), (350),
(150), (600),
(700), (700), (950);

Learnings
• Window Functions: Using ROW_NUMBER() to rank rows and COUNT(*) OVER () to get the total row count.
• Median Calculation: Selecting the middle-ranked row (odd count) or averaging the two middle rows (even count).
• CTE: Using a Common Table Expression to hold the ranked rows.
• Handling Odd/Even Data: Ranks FLOOR((n + 1) / 2) and FLOOR((n + 2) / 2) coincide when n is odd and pick the two middle rows when n is even, so AVG() over the matching rows returns the median in both cases.

Solutions
• PostgreSQL solution
WITH CTE AS (
    SELECT
        views,
        ROW_NUMBER() OVER (ORDER BY views) AS rn,
        COUNT(*) OVER () AS cnt
    FROM tiktok
)
SELECT
    AVG(views) AS median
FROM CTE
WHERE rn IN (FLOOR((cnt + 1) / 2.0), FLOOR((cnt + 2) / 2.0));
• MySQL solution
WITH CTE AS (
    SELECT
        views,
        ROW_NUMBER() OVER (ORDER BY views) AS rn,
        COUNT(*) OVER () AS cnt
    FROM tiktok
)
SELECT
    AVG(views) AS median
FROM CTE
WHERE rn IN (FLOOR((cnt + 1) / 2.0), FLOOR((cnt + 2) / 2.0));
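The odd/even handling is easy to get wrong, so here is a quick check of the middle-rank approach with Python's sqlite3 module using the chapter's dataset. Note that integer division with / truncates in SQLite and PostgreSQL, while MySQL's / returns a decimal and needs FLOOR() or DIV.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tiktok (views INT);
INSERT INTO tiktok (views) VALUES (100),(800),(350),(150),(600),(700),(700),(950);
""")
median = conn.execute("""
WITH cte AS (
    SELECT views,
           ROW_NUMBER() OVER (ORDER BY views) AS rn,
           COUNT(*)     OVER ()               AS cnt
    FROM tiktok
)
SELECT AVG(views) FROM cte
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)  -- integer division: ranks 4 and 5
""").fetchone()[0]
print(median)  # 650.0
```

Sorted, the eight values are 100, 150, 350, 600, 700, 700, 800, 950; the 4th and 5th values are 600 and 700, so the median is 650.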
• Q.1050
Question
Which metro city had the highest number of restaurant orders in September 2021?
Write the SQL query to retrieve the city name and the total count of orders, ordered by the
total count of orders in descending order.


(Note: Metro cities are 'Delhi', 'Mumbai', 'Bangalore', 'Hyderabad')

Explanation
To solve this:
• Filter the records for September 2021.
• Restrict the cities to 'Delhi', 'Mumbai', 'Bangalore', and 'Hyderabad'.
• Group the data by city and count the number of orders per city.
• Order the results by the count of orders in descending order to find the city with the highest
number of orders.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE restaurant_orders (
city VARCHAR(50),
restaurant_id INT,
order_id INT,
order_date DATE
);

-- datasets
INSERT INTO restaurant_orders (city, restaurant_id, order_id, order_date)
VALUES
('Delhi', 101, 1, '2021-09-05'),
('Bangalore', 102, 12, '2021-09-08'),
('Bangalore', 102, 13, '2021-09-08'),
('Bangalore', 102, 14, '2021-09-08'),
('Mumbai', 103, 3, '2021-09-10'),
('Mumbai', 103, 30, '2021-09-10'),
('Chennai', 104, 4, '2021-09-15'),
('Delhi', 105, 5, '2021-09-20'),
('Bangalore', 106, 6, '2021-09-25'),
('Mumbai', 107, 7, '2021-09-28'),
('Chennai', 108, 8, '2021-09-30'),
('Delhi', 109, 9, '2021-10-05'),
('Bangalore', 110, 10, '2021-10-08'),
('Mumbai', 111, 11, '2021-10-10'),
('Chennai', 112, 12, '2021-10-15'),
('Kolkata', 113, 13, '2021-10-20'),
('Hyderabad', 114, 14, '2021-10-25'),
('Pune', 115, 15, '2021-10-28'),
('Jaipur', 116, 16, '2021-10-30');

Learnings
• Filtering by Date: Using the WHERE clause to filter data for a specific month and year.
• City Restriction: Filtering cities by a predefined list (Metro cities).
• Aggregation: Using COUNT() to count the number of orders per city.
• Sorting: Ordering the results by the count of orders in descending order to get the city with
the highest orders.

Solutions
• PostgreSQL solution
SELECT
city,
COUNT(order_id) AS total_orders
FROM restaurant_orders
WHERE city IN ('Delhi', 'Mumbai', 'Bangalore', 'Hyderabad')

  AND EXTRACT(MONTH FROM order_date) = 9
  AND EXTRACT(YEAR FROM order_date) = 2021
GROUP BY city
ORDER BY total_orders DESC;
• MySQL solution
SELECT
city,
COUNT(order_id) AS total_orders
FROM restaurant_orders
WHERE city IN ('Delhi', 'Mumbai', 'Bangalore', 'Hyderabad')
AND MONTH(order_date) = 9
AND YEAR(order_date) = 2021
GROUP BY city
ORDER BY total_orders DESC;
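A quick check with Python's sqlite3 module (SQLite's strftime('%Y-%m', ...) standing in for EXTRACT/MONTH, since the dialects differ here); the dataset is a reduced version of the chapter's insert that preserves the September counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE restaurant_orders (city TEXT, order_id INT, order_date TEXT);
INSERT INTO restaurant_orders VALUES
 ('Delhi', 1, '2021-09-05'), ('Delhi', 5, '2021-09-20'),
 ('Bangalore', 12, '2021-09-08'), ('Bangalore', 13, '2021-09-08'),
 ('Bangalore', 14, '2021-09-08'), ('Bangalore', 6, '2021-09-25'),
 ('Mumbai', 3, '2021-09-10'), ('Mumbai', 30, '2021-09-10'),
 ('Mumbai', 7, '2021-09-28'),
 ('Chennai', 4, '2021-09-15'),   -- not a metro city: filtered out
 ('Delhi', 9, '2021-10-05');     -- October: filtered out
""")
rows = conn.execute("""
SELECT city, COUNT(order_id) AS total_orders
FROM restaurant_orders
WHERE city IN ('Delhi', 'Mumbai', 'Bangalore', 'Hyderabad')
  AND strftime('%Y-%m', order_date) = '2021-09'
GROUP BY city
ORDER BY total_orders DESC
""").fetchall()
print(rows)  # [('Bangalore', 4), ('Mumbai', 3), ('Delhi', 2)]
```

Hyderabad is in the allowed list but has no September orders in this sample, so it simply produces no group.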
• Q.1051
Question
Write a SQL query to calculate the running total revenue for each combination of date and
product ID.
The output should include: date, product_id, product_name, revenue, and
running_total.

The result should be ordered by product_id and date in ascending order.

Explanation
To calculate the running total:
• Use the SUM() function with OVER() to compute the cumulative sum of revenue.
• The PARTITION BY clause is used to compute the running total separately for each
product_id.
• The ORDER BY clause ensures that the running total is calculated in the correct sequence
(by product_id and date).
• Ensure the result is ordered by product_id and date as per the requirement.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE orders (
date DATE,
product_id INT,
product_name VARCHAR(255),
revenue DECIMAL(10, 2)
);

-- datasets
INSERT INTO orders (date, product_id, product_name, revenue) VALUES
('2024-01-01', 101, 'iPhone 13 Pro', 1000.00),
('2024-01-01', 102, 'iPhone 13 Pro Max', 1200.00),
('2024-01-02', 101, 'iPhone 13 Pro', 950.00),
('2024-01-02', 103, 'iPhone 12 Pro', 1100.00),
('2024-01-03', 102, 'iPhone 13 Pro Max', 1250.00),
('2024-01-03', 104, 'iPhone 11', 1400.00),
('2024-01-04', 101, 'iPhone 13 Pro', 800.00),
('2024-01-04', 102, 'iPhone 13 Pro Max', 1350.00),
('2024-01-05', 103, 'iPhone 12 Pro', 1000.00),
('2024-01-05', 104, 'iPhone 11', 700.00),
('2024-01-06', 101, 'iPhone 13 Pro', 600.00),
('2024-01-06', 102, 'iPhone 13 Pro Max', 550.00),
('2024-01-07', 101, 'iPhone 13 Pro', 400.00),
('2024-01-07', 103, 'iPhone 12 Pro', 250.00),
('2024-01-08', 102, 'iPhone 13 Pro Max', 200.00),

('2024-01-08', 104, 'iPhone 11', 150.00),
('2024-01-09', 101, 'iPhone 13 Pro', 100.00),
('2024-01-09', 102, 'iPhone 13 Pro Max', 50.00),
('2024-01-10', 101, 'iPhone 13 Pro', 1000.00),
('2024-01-10', 102, 'iPhone 13 Pro Max', 1200.00),
('2024-01-11', 101, 'iPhone 13 Pro', 950.00),
('2024-01-11', 103, 'iPhone 12 Pro', 1100.00),
('2024-01-12', 102, 'iPhone 13 Pro Max', 1250.00),
('2024-01-12', 104, 'iPhone 11', 1400.00);

Learnings
• Window Functions: Using SUM() with OVER() to compute the running total.
• Partitioning: Using PARTITION BY to calculate running totals separately for each
product_id.
• Ordering: Ensuring the correct sequence of calculations using ORDER BY within the
OVER() clause.
• Date Handling: Managing the date ordering for running totals.

Solutions
• PostgreSQL solution
SELECT
date,
product_id,
product_name,
revenue,
SUM(revenue) OVER (PARTITION BY product_id ORDER BY date) AS running_total
FROM orders
ORDER BY product_id, date;
• MySQL solution
SELECT
date,
product_id,
product_name,
revenue,
SUM(revenue) OVER (PARTITION BY product_id ORDER BY date) AS running_total
FROM orders
ORDER BY product_id, date;
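Running totals are a good case for a quick self-check, since the cumulative sum must reset per partition. A sketch with Python's sqlite3 module (SQLite 3.25+ supports the same SUM() OVER syntax), on a trimmed dataset with integer revenues for easy verification:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (date TEXT, product_id INT, revenue INT);
INSERT INTO orders VALUES
 ('2024-01-01', 101, 1000), ('2024-01-02', 101, 950), ('2024-01-04', 101, 800),
 ('2024-01-01', 102, 1200), ('2024-01-03', 102, 1250);
""")
rows = conn.execute("""
SELECT date, product_id, revenue,
       SUM(revenue) OVER (PARTITION BY product_id ORDER BY date) AS running_total
FROM orders
ORDER BY product_id, date
""").fetchall()
for r in rows:
    print(r)
# ('2024-01-01', 101, 1000, 1000)
# ('2024-01-02', 101, 950, 1950)
# ('2024-01-04', 101, 800, 2750)
# ('2024-01-01', 102, 1200, 1200)
# ('2024-01-03', 102, 1250, 2450)
```

The total restarts at 1200 when the partition switches from product 101 to 102, which is exactly what PARTITION BY guarantees.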
• Q.1052
Question
Suppose you are given two tables - Orders and Returns.
The Orders table contains information about orders placed by customers, and the Returns
table contains information about returned items.
Design a SQL query to find the top 5 customers with the highest percentage of returned items
out of their total orders.
Return the customer ID and the percentage of returned items rounded to two decimal places.

Explanation
To solve this:
• Join the orders and returns tables on order_id.
• Calculate the total items ordered and the total returned items for each customer.

• Compute the percentage of returned items as (returned_items / total_items_ordered) * 100.
• Use ORDER BY to sort customers by the highest percentage of returned items.
• Use LIMIT 5 to get the top 5 customers.
• Round the percentage to two decimal places.

Datasets and SQL Schemas


-- Table creation for orders
CREATE TABLE orders (
order_id INT,
customer_id INT,
order_date DATE,
total_items_ordered INT
);

-- Insert data into orders


INSERT INTO orders VALUES
(1, 101, '2022-01-01', 5),
(2, 102, '2022-01-02', 10),
(3, 103, '2022-01-03', 8),
(4, 104, '2022-01-04', 12),
(5, 105, '2022-01-05', 15),
(6, 106, '2022-01-06', 20),
(7, 107, '2022-01-07', 25),
(8, 108, '2022-01-08', 30),
(9, 109, '2022-01-09', 35),
(10, 110, '2022-01-10', 40),
(11, 111, '2022-01-11', 45),
(12, 112, '2022-01-12', 50),
(13, 113, '2022-01-13', 55),
(14, 114, '2022-01-14', 60),
(15, 115, '2022-01-15', 65);

-- Table creation for returns


CREATE TABLE returns (
return_id INT,
order_id INT,
return_date DATE,
returned_items INT
);

-- Insert data into returns


INSERT INTO returns VALUES
(1, 1, '2022-01-03', 2),
(2, 2, '2022-01-05', 3),
(3, 3, '2022-01-07', 1),
(4, 5, '2022-01-08', 4),
(5, 6, '2022-01-08', 6),
(6, 7, '2022-01-09', 7),
(7, 8, '2022-01-10', 8),
(8, 9, '2022-01-11', 9),
(9, 10, '2022-01-12', 10),
(10, 11, '2022-01-13', 11),
(11, 12, '2022-01-14', 12),
(12, 13, '2022-01-15', 13),
(13, 14, '2022-01-16', 14),
(14, 15, '2022-01-17', 15);

Learnings
• Joining Tables: Using JOIN to combine data from two tables based on common order_id.
• Aggregating Data: Using SUM() to calculate the total items ordered and returned for each
customer.

• Calculating Percentages: Computing the percentage of returned items relative to total items ordered.
• Sorting and Limiting Results: Using ORDER BY and LIMIT to find the top N results.
• Rounding: Using ROUND() to ensure the percentage is rounded to two decimal places.

Solutions
• PostgreSQL solution
SELECT
    o.customer_id,
    ROUND(COALESCE(SUM(r.returned_items), 0) * 100.0 / SUM(o.total_items_ordered), 2) AS return_percentage
FROM orders o
LEFT JOIN returns r ON o.order_id = r.order_id
GROUP BY o.customer_id
ORDER BY return_percentage DESC
LIMIT 5;
• MySQL solution
SELECT
    o.customer_id,
    ROUND((IFNULL(SUM(r.returned_items), 0) / SUM(o.total_items_ordered)) * 100, 2) AS return_percentage
FROM orders o
LEFT JOIN returns r ON o.order_id = r.order_id
GROUP BY o.customer_id
ORDER BY return_percentage DESC
LIMIT 5;
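A compact check with Python's sqlite3 module. Like PostgreSQL, SQLite truncates integer division, so the * 100.0 factor is what keeps the percentage from collapsing to 0; the three-customer dataset below is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders  (order_id INT, customer_id INT, total_items_ordered INT);
CREATE TABLE returns (return_id INT, order_id INT, returned_items INT);
INSERT INTO orders  VALUES (1, 101, 5), (2, 102, 10), (3, 103, 8);
INSERT INTO returns VALUES (1, 1, 2), (2, 2, 3);   -- customer 103 returned nothing
""")
rows = conn.execute("""
SELECT o.customer_id,
       ROUND(COALESCE(SUM(r.returned_items), 0) * 100.0
             / SUM(o.total_items_ordered), 2) AS return_percentage
FROM orders o
LEFT JOIN returns r ON o.order_id = r.order_id
GROUP BY o.customer_id
ORDER BY return_percentage DESC
LIMIT 5
""").fetchall()
print(rows)  # [(101, 40.0), (102, 30.0), (103, 0.0)]
```

The LEFT JOIN plus COALESCE is what keeps customer 103 in the result with 0.0 instead of dropping them or producing NULL.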
• Q.1053
Question
Write a SQL query to report the sum of all total investment values in 2016 (tiv_2016) for all
policyholders who:
• Have the same tiv_2015 value as one or more other policyholders.
• Are not located in the same city as any other policyholder (i.e., the (lat, lon) attribute
pairs must be unique).
The result should be rounded to two decimal places.

Explanation
• Step 1: Identify policyholders with the same tiv_2015 value as other policyholders. This
can be done by using a GROUP BY on tiv_2015 and filtering with HAVING COUNT(*) > 1 to
only include tiv_2015 values that appear more than once.
• Step 2: Ensure that the policyholders are not located in the same city. This means that the
(lat, lon) pair must be unique. We can use GROUP BY on (lat, lon) and filter with
HAVING COUNT(*) = 1.
• Step 3: Join these two criteria and calculate the sum of tiv_2016 for each pid that
satisfies both conditions.
• Step 4: Use ROUND() to round the tiv_2016 values to two decimal places.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Insurance (
pid INT PRIMARY KEY,
tiv_2015 FLOAT,
tiv_2016 FLOAT,
lat FLOAT,
lon FLOAT
);

-- Sample data
INSERT INTO Insurance (pid, tiv_2015, tiv_2016, lat, lon) VALUES
(1, 10, 5, 10, 10),
(2, 20, 20, 20, 20),
(3, 10, 30, 20, 20),
(4, 10, 40, 40, 40);

Learnings
• Using GROUP BY and HAVING: We use GROUP BY to group by columns like tiv_2015 and
(lat, lon), and HAVING to apply conditions like COUNT(*) > 1 for duplicates.
• Joins: In this case, we need to check both the same tiv_2015 value and the unique (lat,
lon) pair, which may require self-joins or filtering after grouping.
• Rounding: We use the ROUND() function to round the tiv_2016 to two decimal places.

Solutions
• PostgreSQL solution
SELECT ROUND(SUM(i.tiv_2016), 2) AS total_tiv_2016
FROM Insurance i
JOIN (
SELECT tiv_2015
FROM Insurance
GROUP BY tiv_2015
HAVING COUNT(*) > 1
) dup_tiv ON i.tiv_2015 = dup_tiv.tiv_2015
JOIN (
SELECT lat, lon
FROM Insurance
GROUP BY lat, lon
HAVING COUNT(*) = 1
) unique_loc ON i.lat = unique_loc.lat AND i.lon = unique_loc.lon;
• MySQL solution
SELECT ROUND(SUM(i.tiv_2016), 2) AS total_tiv_2016
FROM Insurance i
JOIN (
SELECT tiv_2015
FROM Insurance
GROUP BY tiv_2015
HAVING COUNT(*) > 1
) dup_tiv ON i.tiv_2015 = dup_tiv.tiv_2015
JOIN (
SELECT lat, lon
FROM Insurance
GROUP BY lat, lon
HAVING COUNT(*) = 1
) unique_loc ON i.lat = unique_loc.lat AND i.lon = unique_loc.lon;
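Running the double-join against the chapter's four sample rows with Python's sqlite3 module makes the filtering visible: pids 1 and 4 share tiv_2015 = 10 and sit at unique locations, pid 3 shares a location with pid 2, and pid 2 has a unique tiv_2015.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Insurance (pid INT, tiv_2015 REAL, tiv_2016 REAL, lat REAL, lon REAL);
INSERT INTO Insurance VALUES
 (1, 10, 5, 10, 10), (2, 20, 20, 20, 20), (3, 10, 30, 20, 20), (4, 10, 40, 40, 40);
""")
total = conn.execute("""
SELECT ROUND(SUM(i.tiv_2016), 2)
FROM Insurance i
JOIN (SELECT tiv_2015 FROM Insurance GROUP BY tiv_2015 HAVING COUNT(*) > 1) dup_tiv
  ON i.tiv_2015 = dup_tiv.tiv_2015
JOIN (SELECT lat, lon FROM Insurance GROUP BY lat, lon HAVING COUNT(*) = 1) unique_loc
  ON i.lat = unique_loc.lat AND i.lon = unique_loc.lon
""").fetchone()[0]
print(total)  # 45.0  (tiv_2016 of pid 1 and pid 4: 5 + 40)
```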
• Q.1054
Question
Write a SQL query to fix the names in the Users table so that only the first character is
uppercase and the rest are lowercase.
Return the result table ordered by user_id.

Explanation

To fix the names:
• Use the SQL function UPPER() to convert the first character of the name to uppercase.
• Use the SQL function LOWER() to convert the rest of the name to lowercase.
• The final name can be constructed by concatenating the uppercase first character with the
rest of the lowercase string.
• Order the result by user_id to meet the requirement.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Users (
user_id INT PRIMARY KEY,
name VARCHAR(255)
);

-- Sample data
INSERT INTO Users (user_id, name) VALUES
(1, 'aLice'),
(2, 'bOB');

Learnings
• String Manipulation: Using UPPER() and LOWER() to manipulate string data in SQL.
• Concatenation: Using the CONCAT() function (or a similar method) to merge strings, in
this case, for fixing the name format.
• Ordering Results: Using ORDER BY to sort the result by user_id.

Solutions
• PostgreSQL solution
SELECT
user_id,
INITCAP(name) AS name
FROM Users
ORDER BY user_id;
• MySQL solution
SELECT
user_id,
CONCAT(UPPER(SUBSTRING(name, 1, 1)), LOWER(SUBSTRING(name, 2))) AS name
FROM Users
ORDER BY user_id;

Explanation of Key Functions:


• INITCAP() (PostgreSQL): Automatically capitalizes the first letter and makes the rest
lowercase.
• CONCAT() and UPPER(), LOWER() (MySQL): Manually concatenate the uppercase first
letter with the rest of the lowercase name.
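SQLite has neither INITCAP nor CONCAT, but its || concatenation operator expresses the same MySQL-style construction, which makes for a quick check with Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (user_id INT, name TEXT);
INSERT INTO Users VALUES (1, 'aLice'), (2, 'bOB');
""")
rows = conn.execute("""
SELECT user_id,
       UPPER(SUBSTR(name, 1, 1)) || LOWER(SUBSTR(name, 2)) AS name
FROM Users
ORDER BY user_id
""").fetchall()
print(rows)  # [(1, 'Alice'), (2, 'Bob')]
```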
• Q.1055
Question
Write a SQL query to find the patient_id, patient_name, and conditions of the patients
who have Type I Diabetes. Type I Diabetes always starts with the DIAB1 prefix.
Return the result table in any order.


Explanation
• Condition Matching: We need to filter the rows where the conditions column contains a
code starting with the prefix DIAB1.
• String Matching: The LIKE operator can be used to check if the conditions column
contains DIAB1 at the beginning of any condition code.
• Handling Multiple Conditions: Since the conditions column can contain multiple codes
separated by spaces, we need to ensure that we correctly identify whether DIAB1 appears as
part of any condition.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Patients (
patient_id INT PRIMARY KEY,
patient_name VARCHAR(255),
conditions VARCHAR(255)
);

-- Sample data
INSERT INTO Patients (patient_id, patient_name, conditions) VALUES
(1, 'Daniel', 'YFEV COUGH'),
(2, 'Alice', ''),
(3, 'Bob', 'DIAB100 MYOP'),
(4, 'George', 'ACNE DIAB100'),
(5, 'Alain', 'DIAB201');

Learnings
• String Matching: Using LIKE to search for specific patterns in text data.
• Handling Multiple Words in a String: Since conditions are space-separated, LIKE can
match a prefix within the string.

Solutions
• PostgreSQL solution
SELECT
patient_id,
patient_name,
conditions
FROM Patients
WHERE conditions LIKE '%DIAB1%'
;
• MySQL solution
SELECT
patient_id,
patient_name,
conditions
FROM Patients
WHERE conditions LIKE '%DIAB1%'
;

Explanation:
• LIKE 'DIAB1%' matches when the first condition code starts with DIAB1, and LIKE '% DIAB1%' matches when a later, space-separated code starts with DIAB1. A plain '%DIAB1%' would be incorrect, because it also matches codes that merely contain DIAB1 in the middle (for example, XDIAB100).
Both solutions are identical since the query works the same way in PostgreSQL and MySQL. The result will contain all the patients with any condition code starting with DIAB1.
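The difference between prefix matching and plain substring matching is easy to demonstrate with Python's sqlite3 module; the extra row with condition 'YDIAB100' is ours, added to expose the false positive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patients (patient_id INT, conditions TEXT);
INSERT INTO Patients VALUES
 (3, 'DIAB100 MYOP'), (4, 'ACNE DIAB100'), (6, 'YDIAB100');
""")
# Prefix-aware pattern: code must START with DIAB1
prefix = conn.execute("""
SELECT patient_id FROM Patients
WHERE conditions LIKE 'DIAB1%' OR conditions LIKE '% DIAB1%'
ORDER BY patient_id
""").fetchall()
# Naive substring pattern: matches DIAB1 anywhere
naive = conn.execute("""
SELECT patient_id FROM Patients
WHERE conditions LIKE '%DIAB1%'
ORDER BY patient_id
""").fetchall()
print(prefix)  # [(3,), (4,)]
print(naive)   # [(3,), (4,), (6,)] -- false positive on 'YDIAB100'
```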


• Q.1056
Question
Write a SQL query to delete all duplicate emails from the Person table, keeping only the one
with the smallest id.

Explanation
• Identifying Duplicates: Duplicate emails can be identified by grouping the email column
and selecting the minimum id for each email.
• Deleting Duplicates: After identifying the smallest id for each email, we need to delete
rows where the id is greater than the minimum id for that email.
To do this, we can use a DELETE statement with a subquery. The subquery will select the
id of the rows that are not the smallest id for each email, and those rows will be deleted.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE Person (
id INT PRIMARY KEY,
email VARCHAR(255)
);

-- Sample data
INSERT INTO Person (id, email) VALUES
(1, 'john@example.com'),
(2, 'bob@example.com'),
(3, 'john@example.com');

Learnings
• DELETE with a Subquery: A common approach to deleting rows based on conditions
related to other rows in the same table.
• GROUP BY and MIN(): To find the smallest id for each duplicate email, GROUP BY is
used with the MIN() function.
• Subquery: Used to find all id values that should be retained, and delete all others.

Solutions
• PostgreSQL solution
DELETE FROM Person
WHERE id NOT IN (
    SELECT MIN(id)
    FROM Person
    GROUP BY email
);
• MySQL solution
-- MySQL (error 1093) does not allow the DELETE target table to appear
-- in the subquery directly; wrapping it in a derived table works around this.
DELETE FROM Person
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MIN(id) AS keep_id
        FROM Person
        GROUP BY email
    ) AS t
);

Explanation:
• Subquery:
• SELECT MIN(id) FROM Person GROUP BY email finds the smallest id for each unique
email.
• Main Query:


• DELETE FROM Person WHERE id NOT IN (...) deletes rows whose id is not the
smallest for their email.
This query will keep only the row with the smallest id for each email and delete all other
rows with the same email.
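Because DELETE statements are destructive, it helps to rehearse the logic on a throwaway database first. SQLite, like PostgreSQL, accepts the self-referencing subquery directly, so Python's sqlite3 module works as a stand-in; the email values here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INT PRIMARY KEY, email TEXT);
INSERT INTO Person VALUES (1, 'a@x.com'), (2, 'b@x.com'), (3, 'a@x.com');
-- Keep only the smallest id per email
DELETE FROM Person
WHERE id NOT IN (SELECT MIN(id) FROM Person GROUP BY email);
""")
rows = conn.execute("SELECT id, email FROM Person ORDER BY id").fetchall()
print(rows)  # [(1, 'a@x.com'), (2, 'b@x.com')]
```

Row 3 is gone because id 1 already holds the same email with a smaller id; row 2 survives as the only holder of its email.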
• Q.1057
Question
Write a SQL query to convert the first letter of each word in content_text to uppercase,
while keeping the rest of the letters lowercase. Return the original text and the modified text
in two columns.

Explanation
• Splitting the Text into Words: We need to split the content_text into individual words,
process each word to capitalize the first letter, and then reassemble the sentence.
• Uppercase and Lowercase: The function UPPER(LEFT(word, 1)) is used to capitalize
the first letter of each word, while LOWER(RIGHT(word, LENGTH(word)-1)) ensures that the
rest of the word is in lowercase.
• Reassemble the Sentence: After modifying each word, the words should be concatenated
back together with a space separator.
• Final Output: The result will have two columns—one with the original content and the
other with the modified content.

Datasets and SQL Schemas


-- Table creation
CREATE TABLE user_content
(
content_id INT PRIMARY KEY,
customer_id INT,
content_type VARCHAR(50),
content_text VARCHAR(255)
);

-- Sample data
INSERT INTO user_content
(
content_id,
customer_id,
content_type,
content_text
)
VALUES
(1, 2, 'comment', 'hello world! this is a TEST.'),
(2, 8, 'comment', 'what a great day'),
(3, 4, 'comment', 'WELCOME to the event.'),
(4, 2, 'comment', 'e-commerce is booming.'),
(5, 6, 'comment', 'Python is fun!!'),
(6, 6, 'review', '123 numbers in text.'),
(7, 10, 'review', 'special chars: @#$$%^&*()'),
(8, 4, 'comment', 'multiple CAPITALS here.'),
(9, 6, 'review', 'sentence. and ANOTHER sentence!'),
(10, 2, 'post', 'goodBYE!');

Learnings
• String Manipulation: Using UPPER() and LOWER() functions to adjust case.


• Text Processing: Handling text split and reassembly using functions like
STRING_TO_ARRAY(), UNNEST(), and STRING_AGG().
• Aggregating Data: Using STRING_AGG() to concatenate processed words back together.

Solutions
• PostgreSQL Solution
WITH t1 AS (
SELECT
content_id,
content_text as original_content,
UNNEST(STRING_TO_ARRAY(content_text, ' ')) as word
FROM user_content
),
t2 AS (
SELECT
content_id,
original_content,
STRING_AGG(
CONCAT(UPPER(LEFT(word, 1)), LOWER(RIGHT(word, LENGTH(word)-1))),
' '
) as modified_content
FROM t1
GROUP BY content_id, original_content
ORDER BY content_id
)
SELECT
original_content,
modified_content
FROM t2;
• MySQL Solution
-- MySQL has no INITCAP and no direct STRING_TO_ARRAY / STRING_AGG
-- equivalents; a faithful word-by-word version needs a stored function
-- or a recursive CTE. This simplified form capitalizes only the first
-- letter of the whole string:
SELECT
    content_text AS original_content,
    CONCAT(
        UPPER(SUBSTRING(content_text, 1, 1)),
        LOWER(SUBSTRING(content_text, 2))
    ) AS modified_content
FROM user_content;
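The split / transform / rejoin pipeline from the PostgreSQL solution maps directly onto a few lines of Python, which is a handy way to verify the expected output. The helper name below is ours; note that it differs from str.title(), which would also capitalize after hyphens and digits.

```python
def initcap_words(text):
    # Per-word version of the UPPER(LEFT(...)) / LOWER(RIGHT(...)) logic:
    # split on spaces, uppercase the first character of each word,
    # lowercase the rest, then join back with spaces (STRING_AGG).
    return ' '.join(w[:1].upper() + w[1:].lower() for w in text.split(' '))

print(initcap_words('hello world! this is a TEST.'))  # Hello World! This Is A Test.
print(initcap_words('e-commerce is booming.'))        # E-commerce Is Booming.
```

Compare 'e-commerce'.title(), which yields 'E-Commerce': only the space-separated split matches the SQL pipeline's behavior.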
• Q.1058
Question
You are given two tables:
• Transactions – Stores transaction details (ID, customer ID, date, and amount).
• Customers – Stores customer details (ID and name).
Find the average transaction amount for each customer who made more than 5 transactions in
September 2023.

Explanation
To solve this, we need to:
• Filter transactions that occurred in September 2023.
• Count the number of transactions per customer during that period.
• Find customers with more than 5 transactions in September 2023.
• Calculate the average transaction amount for those customers.


Datasets and SQL Schemas


Table Creation:
CREATE TABLE Transactions (
transaction_id INT PRIMARY KEY,
customer_id INT,
transaction_date DATE,
amount DECIMAL(10, 2)
);

CREATE TABLE Customers (


customer_id INT PRIMARY KEY,
customer_name VARCHAR(100)
);

Sample Data Insertion:


INSERT INTO Transactions (transaction_id, customer_id, transaction_date, amount) VALUES
(1, 1, '2023-09-01', 100.50),
(2, 2, '2023-09-01', 200.75),
(3, 1, '2023-09-05', 150.00),
(4, 3, '2023-09-10', 300.20),
(5, 2, '2023-09-15', 250.30),
(6, 1, '2023-09-20', 180.00),
(7, 3, '2023-09-21', 400.00),
(8, 1, '2023-09-25', 170.75),
(9, 1, '2023-09-28', 160.25),
(10, 1, '2023-09-30', 190.60);

INSERT INTO Customers (customer_id, customer_name) VALUES


(1, 'Alice'),
(2, 'Bob'),
(3, 'Charlie');

Learnings
• Use COUNT() with GROUP BY to count the number of transactions per customer.
• Filter transactions within the month of September 2023 using WHERE and BETWEEN.
• Use HAVING to filter customers with more than 5 transactions.
• Join the Transactions and Customers tables to get the customer's name along with their
average transaction amount.

Solutions
PostgreSQL Solution:
SELECT c.customer_name, AVG(t.amount) AS average_amount
FROM Transactions t
JOIN Customers c ON t.customer_id = c.customer_id
WHERE t.transaction_date BETWEEN '2023-09-01' AND '2023-09-30'
GROUP BY c.customer_name
HAVING COUNT(t.transaction_id) > 5;

MySQL Solution:
SELECT c.customer_name, AVG(t.amount) AS average_amount
FROM Transactions t
JOIN Customers c ON t.customer_id = c.customer_id
WHERE t.transaction_date BETWEEN '2023-09-01' AND '2023-09-30'
GROUP BY c.customer_name
HAVING COUNT(t.transaction_id) > 5;
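The WHERE-then-HAVING sequence (filter rows first, then filter groups) can be confirmed with Python's sqlite3 module; the amounts below are simplified so the average is easy to verify by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (transaction_id INT, customer_id INT,
                           transaction_date TEXT, amount REAL);
CREATE TABLE Customers (customer_id INT, customer_name TEXT);
INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO Transactions VALUES
 (1, 1, '2023-09-01', 120), (2, 1, '2023-09-05', 180), (3, 1, '2023-09-10', 150),
 (4, 1, '2023-09-15', 150), (5, 1, '2023-09-20', 200), (6, 1, '2023-09-28', 100),
 (7, 2, '2023-09-02', 500), (8, 2, '2023-09-09', 700);
""")
rows = conn.execute("""
SELECT c.customer_name, AVG(t.amount) AS average_amount
FROM Transactions t
JOIN Customers c ON t.customer_id = c.customer_id
WHERE t.transaction_date BETWEEN '2023-09-01' AND '2023-09-30'
GROUP BY c.customer_name
HAVING COUNT(t.transaction_id) > 5
""").fetchall()
print(rows)  # [('Alice', 150.0)]
```

Alice's six transactions sum to 900 (average 150.0); Bob is dropped by the HAVING clause because two transactions do not clear the > 5 threshold.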
• Q.1059
Question
Write an SQL query to find all dates' ID with a higher temperature compared to its previous
date (yesterday).


Explanation
To solve this:
• Use the LAG() window function to get the temperature from the previous day.
• Compare the current day's temperature with the previous day's temperature.
• Filter the results where the current day's temperature is greater than the previous day's
temperature.

Datasets and SQL Schemas


Table Creation:
CREATE TABLE IF NOT EXISTS Weather (
Id INT,
RecordDate DATE,
Temperature INT
);

Sample Data Insertion:


INSERT INTO Weather (Id, RecordDate, Temperature) VALUES
(1, '2015-01-01', 10),
(2, '2015-01-02', 25),
(3, '2015-01-03', 20),
(4, '2015-01-04', 30);

Learnings
• LAG() window function allows you to access the previous row's value.
• WHERE clause is used to filter for cases where today's temperature is greater than
yesterday's.
• ORDER BY within OVER() ensures the data is ordered by date to properly compare
consecutive rows.

Solutions
PostgreSQL Solution:
WITH weather_data AS (
SELECT *,
LAG(temperature, 1) OVER(ORDER BY recorddate) AS prev_day_temp
FROM weather
)
SELECT id
FROM weather_data
WHERE temperature > prev_day_temp;

MySQL Solution:
WITH weather_data AS (
SELECT *,
LAG(temperature, 1) OVER(ORDER BY recorddate) AS prev_day_temp
FROM weather
)
SELECT id
FROM weather_data
WHERE temperature > prev_day_temp;
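Since the query is identical in both dialects, a run through Python's sqlite3 module (SQLite 3.25+ supports LAG()) confirms the expected answer on the chapter's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Weather (Id INT, RecordDate TEXT, Temperature INT);
INSERT INTO Weather VALUES
 (1, '2015-01-01', 10), (2, '2015-01-02', 25),
 (3, '2015-01-03', 20), (4, '2015-01-04', 30);
""")
rows = conn.execute("""
WITH weather_data AS (
    SELECT Id, Temperature,
           LAG(Temperature) OVER (ORDER BY RecordDate) AS prev_day_temp
    FROM Weather
)
SELECT Id FROM weather_data
WHERE Temperature > prev_day_temp
ORDER BY Id
""").fetchall()
print(rows)  # [(2,), (4,)]
```

Day 1 has no previous row, so its prev_day_temp is NULL and the comparison filters it out automatically; day 3 (20 after 25) is correctly excluded.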
• Q.1060

Question

1256
1000+ SQL Interview Questions & Answers | By Zero Analyst

Write an SQL query that reports for every date within at most 90 days from today, the
number of users that logged in for the first time on that date. Assume today is 2019-06-30.
Note that we only care about dates with non-zero user count.

Explanation
We need to identify users who logged in for the first time on a given date within the last 90
days from today (2019-06-30). This can be achieved by checking the earliest "login" date for
each user and counting users who logged in on each distinct date.

Datasets and SQL Schemas


Table Creation:
CREATE TABLE IF NOT EXISTS traffic (
user_id INT,
activity VARCHAR(25),
activity_date DATE
);

Sample Data Insertion:


INSERT INTO traffic (user_id, activity, activity_date) VALUES
(1, 'login', '2019-05-01'),
(1, 'homepage', '2019-05-01'),
(1, 'logout', '2019-05-01'),
(1, 'groups', '2019-05-02'),
(1, 'login', '2019-06-15'),
(1, 'logout', '2019-06-15'),
(1, 'login', '2019-06-25'),
(1, 'homepage', '2019-06-25'),
(2, 'login', '2019-06-21'),
(2, 'logout', '2019-06-21'),
(2, 'jobs', '2019-06-22'),
(3, 'login', '2019-01-01'),
(3, 'jobs', '2019-01-01'),
(3, 'logout', '2019-01-01'),
(3, 'homepage', '2019-06-25'),
(4, 'login', '2019-06-21'),
(4, 'groups', '2019-06-21'),
(4, 'logout', '2019-06-21'),
(4, 'homepage', '2019-06-23'),
(5, 'login', '2019-03-01'),
(5, 'logout', '2019-03-01'),
(5, 'login', '2019-06-21'),
(5, 'logout', '2019-06-21'),
(5, 'jobs', '2019-06-24');

Learnings
• Identifying first-time logins requires filtering out older activities and only considering the
earliest "login" event.
• Use GROUP BY and MIN() to extract the first login date for each user.
• Use COUNT() and HAVING to count non-zero results.


Solutions
PostgreSQL Solution:
WITH first_logins AS (
SELECT user_id, MIN(activity_date) AS first_login_date
FROM traffic
WHERE activity = 'login'
GROUP BY user_id
)
SELECT first_login_date, COUNT(user_id) AS user_count
FROM first_logins
WHERE first_login_date BETWEEN '2019-04-01' AND '2019-06-30' -- at most 90 days before 2019-06-30
GROUP BY first_login_date
HAVING COUNT(user_id) > 0
ORDER BY first_login_date;

MySQL Solution:
WITH first_logins AS (
    SELECT user_id, MIN(activity_date) AS first_login_date
    FROM traffic
    WHERE activity = 'login'
    GROUP BY user_id
)
SELECT first_login_date, COUNT(user_id) AS user_count
FROM first_logins
-- 90-day window ending on "today" (2019-06-30)
WHERE first_login_date BETWEEN DATE_SUB('2019-06-30', INTERVAL 90 DAY) AND '2019-06-30'
GROUP BY first_login_date
-- no HAVING needed: dates with zero first-time logins produce no group
ORDER BY first_login_date;
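As a quick sanity check, the same logic can be run against the sample data in an in-memory
SQLite database from Python. This is an illustrative sketch (not one of the book's solutions);
SQLite's date arithmetic (`DATE('2019-06-30', '-90 days')`) differs slightly from the
PostgreSQL and MySQL syntax but expresses the same 90-day window:

```python
import sqlite3

# Build the sample traffic table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (user_id INT, activity TEXT, activity_date TEXT)")
rows = [
    (1, 'login', '2019-05-01'), (1, 'homepage', '2019-05-01'),
    (1, 'logout', '2019-05-01'), (1, 'groups', '2019-05-02'),
    (1, 'login', '2019-06-15'), (1, 'logout', '2019-06-15'),
    (1, 'login', '2019-06-25'), (1, 'homepage', '2019-06-25'),
    (2, 'login', '2019-06-21'), (2, 'logout', '2019-06-21'),
    (2, 'jobs', '2019-06-22'),
    (3, 'login', '2019-01-01'), (3, 'jobs', '2019-01-01'),
    (3, 'logout', '2019-01-01'), (3, 'homepage', '2019-06-25'),
    (4, 'login', '2019-06-21'), (4, 'groups', '2019-06-21'),
    (4, 'logout', '2019-06-21'), (4, 'homepage', '2019-06-23'),
    (5, 'login', '2019-03-01'), (5, 'logout', '2019-03-01'),
    (5, 'login', '2019-06-21'), (5, 'logout', '2019-06-21'),
    (5, 'jobs', '2019-06-24'),
]
conn.executemany("INSERT INTO traffic VALUES (?, ?, ?)", rows)

# Same CTE as the solutions; the 90-day window before 2019-06-30
# starts at 2019-04-01 in SQLite's date arithmetic.
query = """
WITH first_logins AS (
    SELECT user_id, MIN(activity_date) AS first_login_date
    FROM traffic
    WHERE activity = 'login'
    GROUP BY user_id
)
SELECT first_login_date, COUNT(user_id) AS user_count
FROM first_logins
WHERE first_login_date BETWEEN DATE('2019-06-30', '-90 days') AND '2019-06-30'
GROUP BY first_login_date
ORDER BY first_login_date
"""
result = conn.execute(query).fetchall()
print(result)  # [('2019-05-01', 1), ('2019-06-21', 2)]
```

User 1's first login (2019-05-01) and the shared first login of users 2 and 4 (2019-06-21)
fall inside the window; users 3 and 5 first logged in before 2019-04-01 and are excluded.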


SQL Cheat Sheet: 100+ Most Used SQL Commands

🔗 Access here: https://tinyurl.com/sql-cheatsheet100


Scan the QR code below to get the SQL Cheat Sheet.


Thank You for Reading!


Congratulations on completing "1000+ SQL Interview Questions and Answers"! I hope
this book has been a valuable stepping stone in your journey toward mastering SQL and
excelling in your data career.

Stay Connected:
• LinkedIn: N. H.
• GitHub: 1000 SQL Questions & Answers
• Instagram: @Zero_Analyst
• YouTube: Zero Analyst

About the Author:


N.H.
Founder of Zero Analyst, a vibrant data community with over 75K followers on Instagram
and 21K subscribers on YouTube.
With 5+ years of experience in the data domain, I have had the privilege of mentoring over
2000 students worldwide, helping them land their dream data jobs and thrive in their
careers.

Your Next Steps:


• Practice More: Dive into real-world datasets and projects.
• Stay Curious: Continue exploring advanced topics and solving SQL challenges.
• Engage with the Community: Share your progress and learnings on social platforms.

Acknowledgments:
A heartfelt thank you to every reader, student, and follower who has been part of my journey.
Your dedication to learning and growing fuels my passion for creating resources like this
book.

Final Words of Advice:


Consistency and curiosity are the keys to success. Keep challenging yourself, stay connected
with the data community, and never stop learning. Your dream data job is closer than you
think—go for it!
With gratitude,
N.H.

© 2025 Zero Analyst(N.H.)


ISBN: 9798306737812 (Independently published)

