Question 1
/* 1. You have two tables: Product and Supplier.
- Product Table Columns: Product_id, Product_Name, Supplier_id, Price
- Supplier Table Columns: Supplier_id, Supplier_Name, Country
*/
-- Write an SQL query to find the name of the product with the highest
-- price in each country
-- Using a window function and a subquery
SELECT
product_name,
price,
country
FROM
(SELECT
s.country,
p.product_name,
p.price,
ROW_NUMBER() OVER(PARTITION BY s.country ORDER BY p.price
DESC) as rn
FROM products as p
JOIN suppliers as s
ON s.supplier_id = p.supplier_id) x1 -- using alias for the query
WHERE rn = 1
Question 2
Write a SQL query to retrieve the IDs of the Facebook pages that have zero likes.
The output should be sorted in ascending order based on the page IDs.
-- Question 2 link :: https://datalemur.com/questions/sql-page-with-no-likes
-- My Solution
SELECT p.page_id
FROM pages p
LEFT JOIN page_likes pl ON p.page_id = pl.page_id
GROUP BY p.page_id
HAVING COUNT(pl.page_id) = 0
ORDER BY p.page_id ASC
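The LEFT JOIN / HAVING pattern above is one way to write an anti-join; NOT EXISTS is a common alternative. The sketch below compares the two on a tiny sample. It runs on SQLite through Python's sqlite3 module purely so that it is self-contained; the sample rows and the page_name and user_id columns are assumptions, not part of the original question.

```python
import sqlite3

# In-memory database with made-up sample data (assumed schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (page_id INT, page_name TEXT);
CREATE TABLE page_likes (user_id INT, page_id INT);
INSERT INTO pages VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO page_likes VALUES (10, 1), (11, 1), (12, 3);
""")

# Approach from the text: LEFT JOIN, then keep groups with no matching like.
left_join = conn.execute("""
    SELECT p.page_id
    FROM pages p
    LEFT JOIN page_likes pl ON p.page_id = pl.page_id
    GROUP BY p.page_id
    HAVING COUNT(pl.page_id) = 0
    ORDER BY p.page_id
""").fetchall()

# Alternative: NOT EXISTS keeps pages for which no like row exists.
not_exists = conn.execute("""
    SELECT p.page_id
    FROM pages p
    WHERE NOT EXISTS (SELECT 1 FROM page_likes pl WHERE pl.page_id = p.page_id)
    ORDER BY p.page_id
""").fetchall()

print(left_join, not_exists)
```

Only page 2 has no likes in this sample, so both queries return the same single row.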
Write a query to calculate the click-through rate (CTR) for the app in 2022 and round the results
to 2 decimal places.
Definition and note:
Percentage of click-through rate (CTR) = 100.0 * Number of clicks / Number of impressions
To avoid integer division, multiply the CTR by 100.0, not 100.
Expected Output Columns: app_id, ctr
Question 2 Link :: https://datalemur.com/questions/click-through-rate
-- SQL query to calculate the click-through rate (CTR)
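SELECT
app_id,
-- clicks divided by impressions, as in the definition above (not by all events)
ROUND(100.0 * SUM(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END) /
SUM(CASE WHEN event_type = 'impression' THEN 1 ELSE 0 END), 2) AS ctr
FROM
events
WHERE
EXTRACT(YEAR FROM timestamp) = 2022 -- YEAR(timestamp) is MySQL syntax
GROUP BY
app_id;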
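The CTR definition above (100.0 * clicks / impressions) and the integer-division warning can be checked on a small sample. This is a minimal sketch on SQLite through Python's sqlite3 module; the sample rows are made up, and the year filter is written as a text range because SQLite has no EXTRACT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (app_id INT, event_type TEXT, timestamp TEXT);
INSERT INTO events VALUES
    (123, 'impression', '2022-07-18 11:36:12'),
    (123, 'impression', '2022-07-18 11:37:12'),
    (123, 'click',      '2022-07-18 11:37:42'),
    (234, 'impression', '2022-07-18 14:15:12'),
    (234, 'click',      '2022-07-18 14:16:12'),
    (123, 'click',      '2021-12-31 23:59:59');  -- outside 2022, must be ignored
""")

# Integer arithmetic truncates; multiplying by 100.0 forces a decimal result.
int_div, real_div = conn.execute("SELECT 100 * 1 / 3, 100.0 * 1 / 3").fetchone()

# CTR = 100.0 * clicks / impressions, restricted to 2022.
rows = conn.execute("""
    SELECT app_id,
           ROUND(100.0 * SUM(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END)
                 / SUM(CASE WHEN event_type = 'impression' THEN 1 ELSE 0 END), 2) AS ctr
    FROM events
    WHERE timestamp >= '2022-01-01' AND timestamp < '2023-01-01'
    GROUP BY app_id
    ORDER BY app_id
""").fetchall()

print(int_div, round(real_div, 2))
print(rows)
```

App 123 has one click for two impressions in 2022 and app 234 has one click for one impression, so dividing by the impression count (rather than by all events) gives 50.0 and 100.0.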
Question 3
Write an SQL query to calculate the difference
between the highest salaries
in the marketing and engineering department.
Output the absolute difference in salaries.
SELECT
ABS(MAX(CASE WHEN department = 'Marketing' THEN salary END) -
MAX(CASE WHEN department = 'Engineering' THEN salary END)) as salary_diff
FROM Salaries;
Question 4
Customer Segmentation Problem:
You have two tables: customers and orders.
customers table has columns:
customer_id, customer_name, age, gender.
orders table has columns:
order_id, customer_id, order_date, total_amount.
Write an SQL query to find the average order amount
for male and female customers separately.
Return the results rounded to 2 decimal places.
SELECT
c.gender as gender_name,
ROUND(avg(o.total_amount), 2) as avg_spent
FROM customers as c
JOIN orders as o
ON c.customer_id = o.customer_id
GROUP BY gender_name
Question 5
Write a SQL query to obtain the third transaction of every user.
Output the user id, spend, and transaction date.
SELECT
user_id,
spend,
transaction_date
FROM
(
SELECT
user_id,
spend,
transaction_date,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY transaction_date) as rn
FROM transactions) x1
WHERE rn = 3
Question 6
Write a query that calculates the total viewership for laptops and mobile devices,
where mobile is defined as the sum of tablet and phone viewership. Output the total
viewership for laptops as laptop_views and the total viewership for mobile devices as
mobile_views.
SELECT
SUM(CASE WHEN device_type = 'laptop' THEN viewership_count ELSE 0 END) AS
laptop_views,
SUM(CASE WHEN device_type IN ('tablet', 'phone') THEN viewership_count ELSE 0 END)
AS mobile_views
FROM viewership
Question 7
Write a query to identify the top two highest-grossing products
within each category in the year 2022. Output should include the category,
product, and total spend.
SELECT
category,
product,
total_spend
FROM (
SELECT
category,
product,
SUM(spend) as total_spend,
RANK() OVER(PARTITION BY category ORDER BY SUM(spend) DESC) rn
FROM product_spend
WHERE EXTRACT(YEAR FROM transaction_date) = 2022
GROUP BY 1, 2 ) x1
WHERE rn <= 2
Question 8
Write a query to obtain a histogram of tweets posted per user in 2022.
Output the tweet count per user as the bucket and the number of Twitter users who fall into that
bucket.
SELECT
num_post,
COUNT(user_id) as num_user
FROM
(
SELECT
user_id,
COUNT(tweet_id) as num_post
FROM tweets
WHERE EXTRACT(YEAR FROM tweet_date) = 2022
GROUP BY user_id
)x1
GROUP BY num_post
Question 9
Find the top three
salaries in each department.
SELECT
department_name,
emp_name,
salary
FROM (
SELECT
d.name as department_name,
e.name as emp_name,
e.salary as salary,
DENSE_RANK() OVER(PARTITION BY d.name ORDER BY e.salary
DESC) drn
FROM employee as e
JOIN
department as d
ON e.departmentid = d.id) x1
WHERE drn <= 3
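The solutions so far use ROW_NUMBER, RANK and DENSE_RANK, which only differ when values tie. The sketch below puts the three side by side on a made-up table (the pay table and its rows are assumptions for illustration). It runs on SQLite through Python's sqlite3 module, which needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pay (emp TEXT, salary INT);  -- hypothetical sample table
INSERT INTO pay VALUES ('A', 100), ('B', 90), ('C', 90), ('D', 80);
""")

rows = conn.execute("""
    SELECT emp, salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC, emp) AS row_num,   -- always 1, 2, 3, ...
           RANK()       OVER (ORDER BY salary DESC)      AS rnk,       -- ties share a rank, then a gap
           DENSE_RANK() OVER (ORDER BY salary DESC)      AS dense_rnk  -- ties share a rank, no gap
    FROM pay
    ORDER BY salary DESC, emp
""").fetchall()

for row in rows:
    print(row)
```

With the two 90s tied, filtering on `<= 3` keeps three rows under ROW_NUMBER and RANK but all four rows under DENSE_RANK, which is why DENSE_RANK fits "top three salaries" while ROW_NUMBER fits "third transaction".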
Question 10
Write an SQL query to find for each month and country,
the number of transactions and their total amount,
the number of approved transactions and their total amount.
Return the result table in the order below.
SELECT
TO_CHAR(trans_date, 'YYYY-MM') as month,
country,
COUNT(1) as trans_count,
SUM(CASE WHEN state='approved' THEN 1 ELSE 0 END) as approved_count,
SUM(amount) as trans_total_amount,
SUM(CASE WHEN state= 'approved' THEN amount ELSE 0 END) as
approved_total_amount
FROM transactions
GROUP BY 1, 2
Question 11
Question:: Given the reviews table, write a query to retrieve
the average star rating for each product, grouped by month.
The output should display the month as a numerical value,
product ID, and average star rating rounded to two decimal places.
Sort the output first by month and then by product ID.
SELECT * FROM reviews;
-- month by each product and their avg rating
SELECT
EXTRACT(MONTH FROM submit_date) as month,
product_id,
ROUND(AVG(stars), 2) as avg_rating
FROM reviews
GROUP BY month, product_id
ORDER BY month, product_id
Question 12
SQL Question 1: Identify IBM's High Capacity Users
CREATE TABLE purchases (
purchase_id INT PRIMARY KEY,
user_id INT,
date_of_purchase TIMESTAMP,
product_id INT,
amount_spent DECIMAL(10, 2)
);
SQL Question:
Identify users who have made purchases
totaling more than $10,000 in the last month
from the purchases table.
The table contains information about purchases,
including the user ID, date of purchase, product ID,
and amount spent.
SELECT user_id, SUM(amount_spent) AS total_spent
FROM purchases
WHERE date_of_purchase >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
AND date_of_purchase < DATE_TRUNC('month', CURRENT_DATE)
GROUP BY user_id
HAVING SUM(amount_spent) > 10000;
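The filter above selects the previous calendar month as a half-open range: from the first day of last month (inclusive) to the first day of this month (exclusive). The same boundaries can be sketched in Python to see how the range behaves in January; the function name previous_month_window is hypothetical.

```python
from datetime import date
from typing import Tuple


def previous_month_window(today: date) -> Tuple[date, date]:
    """Return (start, end) with start inclusive and end exclusive."""
    first_of_this_month = today.replace(day=1)
    if first_of_this_month.month == 1:
        # January looks back to December of the previous year.
        first_of_prev_month = date(first_of_this_month.year - 1, 12, 1)
    else:
        first_of_prev_month = first_of_this_month.replace(
            month=first_of_this_month.month - 1
        )
    return first_of_prev_month, first_of_this_month


print(previous_month_window(date(2024, 1, 15)))
print(previous_month_window(date(2024, 3, 31)))
```

Because the upper bound is exclusive, timestamps late on the last day of the month are still included, which a BETWEEN on dates could miss.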
Average Duration of Employee's Service
Given the data on IBM employees, can you find the average duration
of service for employees across different departments?
The Duration of service is represented as end_date - start_date.
If end_date is NULL, consider it as the current date.
CREATE TABLE employee_service (
employee_id INT PRIMARY KEY,
name VARCHAR(50),
start_date DATE,
end_date DATE,
department VARCHAR(50)
);
SELECT department,
AVG(COALESCE(end_date, CURRENT_DATE) - start_date) AS avg_service_duration
FROM employee_service
GROUP BY department;
Question 13
Question: Identify the top 3 posts with the highest engagement
(likes + comments) for each user on a Facebook page. Display
the user ID, post ID, engagement count, and rank for each post.
CREATE TABLE fb_posts (
post_id INT PRIMARY KEY,
user_id INT,
likes INT,
comments INT,
post_date DATE
);
WITH rank_posts
AS (
SELECT
user_id,
post_id,
SUM(likes + comments) as engagement_count,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY SUM(likes +
comments) DESC) rn,
DENSE_RANK() OVER(PARTITION BY user_id ORDER BY SUM(likes +
comments) DESC) ranks
FROM fb_posts
GROUP BY user_id, post_id
ORDER BY user_id, engagement_count DESC
)
SELECT
user_id,
post_id,
engagement_count,
ranks
FROM rank_posts
WHERE rn <=3
-- Q.2
Determine the users who have posted more than 2 times
in the past week and calculate the total number of likes
they have received. Return user_id, the number of posts, and the number of likes.
CREATE TABLE posts (
post_id INT PRIMARY KEY,
user_id INT,
likes INT,
post_date DATE
);
SELECT
user_id,
SUM(likes) as total_likes,
COUNT(post_id) as cnt_post
FROM posts
WHERE post_date >= CURRENT_DATE - 7 AND
post_date < CURRENT_DATE
GROUP BY user_id
HAVING COUNT(post_id) > 2
Question 14
-- Q.1 LinkedIn Data Analyst Interview question
Assume you're given a table containing job postings
from various companies on the LinkedIn platform.
Write a query to retrieve the count of companies
that have posted duplicate job listings.
Definition:
Duplicate job listings are defined as two job listings
within the same company that share identical titles and descriptions.
CREATE TABLE job_listings (
job_id INTEGER PRIMARY KEY,
company_id INTEGER,
title TEXT,
description TEXT
);
SELECT
COUNT(DISTINCT company_id) as cnt_company -- count each company once, even with several duplicated listings
FROM
(SELECT
company_id,
title,
description,
COUNT(1) as total_job
FROM job_listings
GROUP BY 1, 2, 3
HAVING COUNT(1) > 1
)x1
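The question asks for the number of companies, while the inner query returns one row per duplicated (company, title, description) group. The sketch below, on made-up rows, shows how the two counts can differ when one company duplicates more than one listing. It runs on SQLite through Python's sqlite3 module so that it is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_listings (
    job_id INTEGER PRIMARY KEY, company_id INTEGER, title TEXT, description TEXT);
-- Assumed sample: company 10 duplicates two different listings, company 30 one.
INSERT INTO job_listings VALUES
    (1, 10, 'Analyst',  'd1'), (2, 10, 'Analyst',  'd1'),
    (3, 10, 'Engineer', 'd2'), (4, 10, 'Engineer', 'd2'),
    (5, 20, 'Analyst',  'd1'),
    (6, 30, 'PM',       'd3'), (7, 30, 'PM',       'd3');
""")

DUPLICATES = """
    SELECT company_id, title, description
    FROM job_listings
    GROUP BY company_id, title, description
    HAVING COUNT(*) > 1
"""

# One row per duplicated listing group.
(group_count,) = conn.execute(f"SELECT COUNT(*) FROM ({DUPLICATES}) x1").fetchone()
# One row per company that has at least one duplicated listing.
(company_count,) = conn.execute(
    f"SELECT COUNT(DISTINCT company_id) FROM ({DUPLICATES}) x1"
).fetchone()

print(group_count, company_count)
```

Here there are three duplicated groups but only two companies, so COUNT(DISTINCT company_id) is the count that matches the wording of the question.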
Question 15
Identify the region with the lowest sales amount for the previous month.
return region name and total_sale amount.
-- region and sum sale
-- filter last month
-- lowest sale region
CREATE TABLE Sales (
SaleID SERIAL PRIMARY KEY,
Region VARCHAR(50),
Amount DECIMAL(10, 2),
SaleDate DATE
);
SELECT
region,
SUM(amount) as total_sales
FROM sales
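WHERE EXTRACT(MONTH FROM saledate) = EXTRACT(MONTH FROM CURRENT_DATE -
INTERVAL '1 month')
-- shift the year by the same interval so that January looks back to December of the prior year
AND EXTRACT(YEAR FROM saledate) = EXTRACT(YEAR FROM CURRENT_DATE -
INTERVAL '1 month')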
GROUP BY region
ORDER BY total_sales ASC
LIMIT 1
Question 16
Find the median within a series of numbers in SQL
WITH CTE
AS (
SELECT
views,
ROW_NUMBER() OVER( ORDER BY views ASC) rn_asc,
ROW_NUMBER() OVER( ORDER BY views DESC) rn_desc
FROM tiktok
WHERE views < 900
)
SELECT
AVG(views) as median
FROM CTE
WHERE ABS(rn_asc - rn_desc) <= 1
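The ascending/descending ROW_NUMBER trick keeps the middle row for an odd count and the two middle rows for an even count. A quick check against Python's statistics.median is sketched below. It runs on SQLite through the sqlite3 module (window functions need SQLite 3.25 or newer), leaves out the `views < 900` filter, and uses made-up, tie-free values; the helper name sql_median is hypothetical.

```python
import sqlite3
import statistics


def sql_median(values):
    """Median via two opposite row numberings (assumes distinct values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tiktok (views INT)")
    conn.executemany("INSERT INTO tiktok VALUES (?)", [(v,) for v in values])
    (median,) = conn.execute("""
        WITH cte AS (
            SELECT views,
                   ROW_NUMBER() OVER (ORDER BY views ASC)  AS rn_asc,
                   ROW_NUMBER() OVER (ORDER BY views DESC) AS rn_desc
            FROM tiktok
        )
        SELECT AVG(views) FROM cte WHERE ABS(rn_asc - rn_desc) <= 1
    """).fetchone()
    conn.close()
    return median


odd = [10, 20, 30, 40, 50]
even = [10, 20, 30, 40]
print(sql_median(odd), statistics.median(odd))
print(sql_median(even), statistics.median(even))
```

With tied values the two numberings may order the ties differently, so the distinct-values assumption matters for this approach.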
Question 17
-- How many delayed orders does each delivery partner have,
-- considering the predicted delivery time and the actual delivery time?
CREATE TABLE order_details (
order_id INT,
del_partner VARCHAR(255),
predicted_time TIMESTAMP,
delivery_time TIMESTAMP
);
-- My solution
-- del_partner delayed orders cnt
-- delayed order means del_time > pred_del_time
SELECT
del_partner,
COUNT(order_id) as cnt_delayed_orders
FROM order_details
WHERE
predicted_time < delivery_time
GROUP BY del_partner
Question 18
Which metro city had the highest number of restaurant orders in September 2021?
Write the SQL query to retrieve the city name and the total count of orders,
ordered by the total count of orders in descending order.
-- Note: metro cities are 'Delhi', 'Mumbai', 'Bangalore', 'Hyderabad'
-- Create the Table
CREATE TABLE restaurant_orders (
city VARCHAR(50),
restaurant_id INT,
order_id INT,
order_date DATE
);
-- city name
-- total orders
-- filter metro
-- group by city
-- 1
SELECT
city,
count(order_id) as total_orders
FROM restaurant_orders
WHERE city IN ('Delhi', 'Mumbai', 'Bangalore', 'Hyderabad')
AND order_date BETWEEN '2021-09-01' AND '2021-09-30'
GROUP BY city
ORDER BY total_orders DESC
LIMIT 1
Question 19
-- Get the count of distinct student names that are not unique
CREATE TABLE student_names(
student_id INT,
name VARCHAR(50)
);
-- Insert the records
INSERT INTO student_names (student_id, name) VALUES
(1, 'RAM'),
(2, 'ROBERT'),
(3, 'ROHIM'),
(4, 'RAM'),
(5, 'ROBERT');
SELECT
COUNT(*) as distinct_student_cnt
FROM
(
SELECT name,
COUNT(name)
FROM student_names
GROUP BY name
HAVING COUNT(name) > 1 -- names that appear more than once, i.e. not unique
) as subquery
Question 20
Find city wise customers count who have placed
more than three orders in November 2023.
CREATE TABLE zomato_orders(
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
price FLOAT,
city VARCHAR(25)
);
SELECT
city,
COUNT(1) total_customer_count
FROM (
SELECT
city,
customer_id as customer,
COUNT(1) as total_orders
FROM zomato_orders
WHERE order_date BETWEEN '2023-11-01' AND '2023-11-30'
GROUP BY city, customer_id
HAVING COUNT(1)> 3
) sub_query
GROUP BY city
Question 21
Find the top-performing two months
by revenue for each hotel for each year.
return hotel_id, year, month, revenue
CREATE TABLE hotel_revenue (
hotel_id INT,
month VARCHAR(10),
year INT,
revenue DECIMAL(10, 2)
);
-- hotel_id, year, month, revenue
-- ranking based on revenue
-- filter top 2 month for each hotel in each year
WITH CTE1
AS
(
SELECT
hotel_id,
year,
month,
revenue,
DENSE_RANK() OVER(PARTITION BY hotel_id, year ORDER BY revenue
DESC) drn
FROM hotel_revenue
)
SELECT
hotel_id,
year,
month,
revenue
FROM CTE1
WHERE drn <= 2
Question 22
Write a SQL query to retrieve the emp_id, emp_name, and manager_name
from the given employee table.
It's important to note that managers are also employees in the table.
The employees table has 3 columns:
-- emp_id, emp_name, manager_id
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
emp_name VARCHAR(255),
manager_id INT,
FOREIGN KEY (manager_id) REFERENCES employees(emp_id)
);
-- Inserting records into the employees table
INSERT INTO employees (emp_id, emp_name, manager_id) VALUES
(1, 'John Doe', NULL), -- John Doe is the manager
(2, 'Jane Smith', 1), -- Jane Smith reports to John Doe
(3, 'Alice Johnson', 1), -- Alice Johnson reports to John Doe
(4, 'Bob Williams', 2), -- Bob Williams reports to Jane Smith
(5, 'Charlie Brown', 2), -- Charlie Brown reports to Jane Smith
(6, 'David Lee', 3), -- David Lee reports to Alice Johnson
(7, 'Emily Davis', 3), -- Emily Davis reports to Alice Johnson
(8, 'Fiona Clark', 4), -- Fiona Clark reports to Bob Williams
(9, 'George Turner', 4), -- George Turner reports to Bob Williams
(10, 'Hannah Baker', 5), -- Hannah Baker reports to Charlie Brown
(11, 'Isaac White', 5), -- Isaac White reports to Charlie Brown
(12, 'Jessica Adams', 6), -- Jessica Adams reports to David Lee
(13, 'Kevin Harris', 6); -- Kevin Harris reports to David Lee
-- -----------------------
-- My Solution
-- -----------------------
-- emp_id,
-- emp_name,
-- manager_name based on manager id
SELECT
e1.emp_id,
e1.emp_name,
e1.manager_id,
e2.emp_name as manager_name
FROM employees as e1
CROSS JOIN
employees as e2
WHERE e1.manager_id = e2.emp_id;
-- approach 2
SELECT
e1.emp_id,
e1.emp_name,
e1.manager_id,
e2.emp_name as manager_name
FROM employees as e1
LEFT JOIN
employees as e2
ON e1.manager_id = e2.emp_id
WHERE e1.manager_id IS NOT NULL;
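Both approaches for the manager lookup drop the employee who has no manager. The sketch below contrasts a plain self-join with a LEFT JOIN that has no extra filter, which keeps that employee with a NULL manager_name. It runs on SQLite through Python's sqlite3 module and uses only the first four sample employees; the foreign key is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INT PRIMARY KEY, emp_name TEXT, manager_id INT);
INSERT INTO employees VALUES
    (1, 'John Doe', NULL),
    (2, 'Jane Smith', 1),
    (3, 'Alice Johnson', 1),
    (4, 'Bob Williams', 2);
""")

# Inner self-join: only employees whose manager_id matches another row.
inner_rows = conn.execute("""
    SELECT e1.emp_id, e1.emp_name, e2.emp_name AS manager_name
    FROM employees AS e1
    JOIN employees AS e2 ON e1.manager_id = e2.emp_id
    ORDER BY e1.emp_id
""").fetchall()

# LEFT JOIN without a filter: the top-level manager stays, with NULL as manager.
left_rows = conn.execute("""
    SELECT e1.emp_id, e1.emp_name, e2.emp_name AS manager_name
    FROM employees AS e1
    LEFT JOIN employees AS e2 ON e1.manager_id = e2.emp_id
    ORDER BY e1.emp_id
""").fetchall()

print(inner_rows)
print(left_rows)
```

Which variant is wanted depends on whether the report should list the top-level manager at all.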
Question 23
Given the employee table with columns EMP_ID and SALARY,
write an SQL query to find all salaries greater than the average salary.
return emp_id and salary
CREATE TABLE employee (
EMP_ID INT PRIMARY KEY,
SALARY DECIMAL(10, 2)
);
SELECT
emp_id,
salary
FROM employee
WHERE salary > (SELECT AVG(salary) FROM employee)
Question 24
Consider a table named customers with the following columns:
customer_id, first_name, last_name, and email.
Write an SQL query to find all the duplicate email addresses
in the customers table.
SELECT
email
-- COUNT(email) as cnt_frequency
FROM customers
GROUP BY email
HAVING COUNT(email) > 1
Question 25
Question:
Write a SQL query to calculate the running
total revenue for each combination of date and product ID.
Expected Output Columns:
date, product_id, product_name, revenue, running_total
ORDER BY product_id, date ascending
CREATE TABLE orders (
date DATE,
product_id INT,
product_name VARCHAR(255),
revenue DECIMAL(10, 2)
);
SELECT
o1.date,
o1.product_id,
o1.product_name,
o1.revenue,
SUM(o2.revenue) as running_total
FROM orders as o1
JOIN
orders as o2
ON
o1.product_id = o2.product_id
AND
o1.date >= o2.date
GROUP BY
o1.date,
o1.product_id,
o1.product_name,
o1.revenue
ORDER BY
o1.product_id, o1.date
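The self-join above computes the running total by summing every earlier-or-equal row of the same product. A window function, SUM(...) OVER (PARTITION BY ... ORDER BY ...), is a common alternative. The sketch below compares the two on made-up rows with one row per product and date. It runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer), and dates are stored as ISO text there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (date TEXT, product_id INT, product_name TEXT, revenue REAL);
INSERT INTO orders VALUES
    ('2024-01-01', 1, 'A', 100), ('2024-01-02', 1, 'A', 50), ('2024-01-03', 1, 'A', 25),
    ('2024-01-01', 2, 'B', 10),  ('2024-01-02', 2, 'B', 20);
""")

# Self-join version, as in the solution above.
self_join = conn.execute("""
    SELECT o1.date, o1.product_id, o1.product_name, o1.revenue,
           SUM(o2.revenue) AS running_total
    FROM orders AS o1
    JOIN orders AS o2
      ON o1.product_id = o2.product_id AND o1.date >= o2.date
    GROUP BY o1.date, o1.product_id, o1.product_name, o1.revenue
    ORDER BY o1.product_id, o1.date
""").fetchall()

# Window-function version: one pass, no join.
window = conn.execute("""
    SELECT date, product_id, product_name, revenue,
           SUM(revenue) OVER (PARTITION BY product_id ORDER BY date) AS running_total
    FROM orders
    ORDER BY product_id, date
""").fetchall()

print(self_join == window)
for row in window:
    print(row)
```

The two agree on this sample; with several rows for the same product and date, the GROUP BY in the self-join version can collapse rows, so the results may differ.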
Question 26
/*
Suppose you are given two tables - Orders and Returns.
The Orders table contains information about orders placed by customers,
and the Returns table contains information about returned items.
Design a SQL query to
find the top 5 customers with the highest percentage
of returned items out of their total orders.
Return the customer ID
and the percentage of returned items rounded to two decimal places.
*/
-- customer_id,
-- total_items_ordered by each cx
-- total_items_returned by each cx
-- return percentage = total_items_returned / total_items_ordered * 100 (e.g. 2 / 4 * 100 = 50%)
CREATE TABLE orders (
order_id INT,
customer_id INT,
order_date DATE,
total_items_ordered INT
);
CREATE TABLE returns (
return_id INT,
order_id INT,
return_date DATE,
returned_items INT
);
WITH orders_cte
AS
(
SELECT
customer_id,
SUM(total_items_ordered) as total_items_ordered
FROM orders
GROUP BY customer_id
),
return_cte
AS
(
SELECT
o.customer_id,
SUM(r.returned_items) as total_items_returned
FROM returns as r
JOIN
orders as o
ON r.order_id = o.order_id
GROUP BY
o.customer_id
)
SELECT
oc.customer_id,
oc.total_items_ordered,
rc.total_items_returned,
ROUND(CASE
WHEN oc.total_items_ordered > 0 THEN
(rc.total_items_returned::float/oc.total_items_ordered::float)*100
ELSE 0 END::numeric ,2) as return_percentage
FROM orders_cte as oc
JOIN
return_cte rc
ON oc.customer_id = rc.customer_id
ORDER BY return_percentage DESC
LIMIT 5;
Question 27
-- Question: Write an SQL query to fetch the user IDs that have bought only
-- 'Burger' and 'Cold Drink' items (both of them, and nothing else).
-- Expected Output Columns: user_id
CREATE TABLE orders (
user_id INT,
item_ordered VARCHAR(512)
);
SELECT
user_id
-- COUNT(DISTINCT item_ordered)
FROM orders
GROUP BY user_id
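HAVING COUNT(DISTINCT item_ordered) = 2 -- exactly two different items overall
AND
-- count distinct items, not rows, so repeat purchases do not distort the check
COUNT(DISTINCT CASE WHEN item_ordered IN ('Burger', 'Cold Drink')
THEN item_ordered
END
) = 2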
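A row-based SUM over the CASE can be thrown off by repeat purchases: two Burger rows plus one Pizza row give two distinct items and two matching rows. One way to guard against that is to count distinct matching items instead. The sketch below checks this on made-up users. It runs on SQLite through Python's sqlite3 module so that it is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (user_id INT, item_ordered TEXT);
INSERT INTO orders VALUES
    (1, 'Burger'), (1, 'Cold Drink'),                 -- exactly the two items
    (2, 'Burger'), (2, 'Burger'), (2, 'Pizza'),       -- repeat Burger plus Pizza
    (3, 'Burger'), (3, 'Cold Drink'), (3, 'Burger'),  -- both items, one repeated
    (4, 'Burger'), (4, 'Cold Drink'), (4, 'Pizza');   -- both items plus a third
""")

rows = conn.execute("""
    SELECT user_id
    FROM orders
    GROUP BY user_id
    HAVING COUNT(DISTINCT item_ordered) = 2
       AND COUNT(DISTINCT CASE WHEN item_ordered IN ('Burger', 'Cold Drink')
                               THEN item_ordered END) = 2
    ORDER BY user_id
""").fetchall()

print(rows)
```

Users 1 and 3 qualify; user 2 is rejected because only one of the two target items appears, and user 4 because a third item was bought.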
Question 28
-- Given two tables, orders and returns, containing Amazon's sales and returns data,
-- write a SQL query to find the top 3 sellers with the highest sales amount
-- but the lowest return quantity.
CREATE TABLE orders (
order_id INT PRIMARY KEY,
seller_id INT,
sale_amount DECIMAL(10, 2)
);
WITH orders_cte
AS
(
SELECT
seller_id,
SUM(sale_amount) as total_sales
FROM orders
GROUP BY seller_id
),
returns_cte
AS
(
SELECT
seller_id,
SUM(return_quantity) as total_return_qty
FROM returns
GROUP BY seller_id
)
SELECT
orders_cte.seller_id as seller_id,
orders_cte.total_sales as total_sale_amt,
COALESCE(returns_cte.total_return_qty, 0) as total_return_qty
FROM orders_cte
LEFT JOIN
returns_cte
ON orders_cte.seller_id = returns_cte.seller_id
ORDER BY total_sale_amt DESC, total_return_qty ASC
LIMIT 3
Question 29
Write a solution to select the product id, year, quantity,
and price for the first year of every product sold.
CREATE TABLE Sales (
sale_id INT,
product_id INT,
year INT,
quantity INT,
price INT
);
SELECT
product_id,
first_year,
quantity,
price
FROM (
SELECT
product_id,
year as first_year,
quantity,
price,
RANK() OVER(PARTITION BY product_id ORDER BY year) as rn
FROM sales
) subquery
WHERE rn = 1
Question 30
Write a SQL query to find the top 10 most popular songs by total number of listens.
You have two tables: Songs (containing song_id, song_name,
and artist_name) and Listens (containing listen_id, user_id, song_id, and listen_date).
CREATE TABLE Songs (
song_id INT PRIMARY KEY,
song_name VARCHAR(255),
artist_name VARCHAR(255)
);
CREATE TABLE Listens (
listen_id INT PRIMARY KEY,
user_id INT,
song_id INT,
listen_date DATE,
FOREIGN KEY (song_id) REFERENCES Songs(song_id)
);
SELECT
song_name,
times_of_listens,
DENSE_RANK() OVER (ORDER BY times_of_listens DESC) AS rank
FROM
(SELECT
s.song_name,
COUNT(l.listen_id) AS times_of_listens
FROM
Songs s
JOIN
Listens l ON s.song_id = l.song_id
GROUP BY
s.song_name) AS sub
ORDER BY
times_of_listens DESC
LIMIT 10;