GDS_SQL_Class_2_Assignment
Q51.
Write an SQL query to report the name, population, and area of the big countries.
Return the result table in any order.
The query result format is in the following example.
Input:
World table:
name continent area population gdp
Afghanistan Asia 652230 25500100 20343000000
Albania Europe 28748 2831741 12960000000
Algeria Africa 2381741 37100000 188681000000
Andorra Europe 468 78115 3712000000
Angola Africa 1246700 20609294 100990000000
Output:
name population area
Afghanistan 25500100 652230
Algeria 37100000 2381741
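The statement above does not define "big". A minimal sketch, assuming the usual definition for this exercise (area of at least 3,000,000 or population of at least 25,000,000, a threshold that is an assumption and not given in the text), run through Python's sqlite3 module against the example rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE World (name TEXT, continent TEXT, area INTEGER,
                    population INTEGER, gdp INTEGER);
INSERT INTO World VALUES
  ('Afghanistan', 'Asia',   652230,  25500100, 20343000000),
  ('Albania',     'Europe', 28748,   2831741,  12960000000),
  ('Algeria',     'Africa', 2381741, 37100000, 188681000000),
  ('Andorra',     'Europe', 468,     78115,    3712000000),
  ('Angola',      'Africa', 1246700, 20609294, 100990000000);
""")

# Assumed thresholds: these two numbers are not stated in the question text.
big_countries = conn.execute("""
SELECT name, population, area
FROM World
WHERE area >= 3000000 OR population >= 25000000
ORDER BY name
""").fetchall()

print(big_countries)
```

With these thresholds the query returns Afghanistan and Algeria, matching the expected output.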
Q52.
Table: Customer
Column Name Type
id int
name varchar
referee_id int
Write an SQL query to report the names of the customers who are not referred by the customer with id = 2.
Return the result table in any order.
The query result format is in the following example.
Input:
Customer table:
id name referee_id
1 Will null
2 Jane null
3 Alex 2
4 Bill null
5 Zack 1
6 Mark 2
Output:
name
Will
Jane
Bill
Zack
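The expected output includes customers whose referee_id is null, which a plain inequality test would drop because a comparison with NULL is never true. A small sketch of both variants, using sqlite3 from Python on the example rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (id INTEGER, name TEXT, referee_id INTEGER);
INSERT INTO Customer VALUES
  (1, 'Will', NULL), (2, 'Jane', NULL), (3, 'Alex', 2),
  (4, 'Bill', NULL), (5, 'Zack', 1),    (6, 'Mark', 2);
""")

# NULL <> 2 evaluates to NULL, so rows without a referee are filtered out.
naive = [row[0] for row in conn.execute(
    "SELECT name FROM Customer WHERE referee_id <> 2 ORDER BY id")]

# Handling NULL explicitly keeps customers that nobody referred.
correct = [row[0] for row in conn.execute(
    "SELECT name FROM Customer "
    "WHERE referee_id IS NULL OR referee_id <> 2 ORDER BY id")]

print(naive)
print(correct)
```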
Q53.
Table: Customers
Column Name Type
id int
name varchar
Table: Orders
Column Name Type
id int
customerId int
Write an SQL query to report all customers who never order anything.
Return the result table in any order.
The query result format is in the following example.
Input:
Customers table:
id name
1 Joe
2 Henry
3 Sam
4 Max
Orders table:
id customerId
1 3
2 1
Output:
Customers
Henry
Max
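One possible way to express "never ordered" is an anti-join: left-join Customers to Orders and keep the rows with no match. The sketch below runs that against the example data with sqlite3; NOT EXISTS or NOT IN would be equally valid choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (id INTEGER, name TEXT);
CREATE TABLE Orders (id INTEGER, customerId INTEGER);
INSERT INTO Customers VALUES (1, 'Joe'), (2, 'Henry'), (3, 'Sam'), (4, 'Max');
INSERT INTO Orders VALUES (1, 3), (2, 1);
""")

# Customers with no matching order come back with o.id = NULL.
never_ordered = [row[0] for row in conn.execute("""
SELECT c.name AS Customers
FROM Customers c
LEFT JOIN Orders o ON o.customerId = c.id
WHERE o.id IS NULL
ORDER BY c.id
""")]

print(never_ordered)
```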
Q54.
Table: Employee
Column Name Type
employee_id int
team_id int
Write an SQL query to find the team size of each of the employees.
Return the result table in any order.
The query result format is in the following example.
Input:
Employee Table:
employee_id team_id
1 8
2 8
3 8
4 7
5 9
6 9
Output:
employee_id team_size
1 3
2 3
3 3
4 1
5 2
6 2
Explanation:
Employees with Id 1,2,3 are part of a team with team_id = 8.
Employee with Id 4 is part of a team with team_id = 7.
Employees with Id 5,6 are part of a team with team_id = 9.
Q55.
Table Person:
Column Name Type
id int
name varchar
phone_number varchar
Table Country:
Column Name Type
name varchar
country_code varchar
Table Calls:
Column Name Type
caller_id int
callee_id int
duration int
A telecommunications company wants to invest in new countries. The company intends to invest in
the countries where the average call duration of the calls in this country is strictly greater than the
global average call duration.
Write an SQL query to find the countries where this company can invest.
Return the result table in any order.
The query result format is in the following example.
Input:
Person table:
id name phone_number
3 Jonathan 051-1234567
12 Elvis 051-7654321
1 Moncef 212-1234567
2 Maroua 212-6523651
7 Meir 972-1234567
9 Rachel 972-0011100
Country table:
name country_code
Peru 051
Israel 972
Morocco 212
Germany 049
Ethiopia 251
Calls table:
caller_id callee_id duration
1 9 33
2 9 4
1 2 59
3 12 102
3 12 330
12 3 5
7 9 13
7 1 3
9 7 1
1 7 7
Output:
country
Peru
Explanation:
The average call duration for Peru is (102 + 102 + 330 + 330 + 5 + 5) / 6 = 145.666667
The average call duration for Israel is (33 + 4 + 13 + 13 + 3 + 1 + 1 + 7) / 8 = 9.37500
The average call duration for Morocco is (33 + 4 + 59 + 59 + 3 + 7) / 6 = 27.5000
Global call duration average = (2 * (33 + 4 + 59 + 102 + 330 + 5 + 13 + 3 + 1 + 7)) / 20 = 55.70000
Since Peru is the only country where the average call duration is greater than the global average, it is
the only recommended country.
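The arithmetic above counts every call twice, once for the caller's country and once for the callee's. A sketch of that idea with sqlite3, assuming the country code is stored as a three-character string (for example 051) so that it matches the first three characters of phone_number:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER, name TEXT, phone_number TEXT);
CREATE TABLE Country (name TEXT, country_code TEXT);
CREATE TABLE Calls (caller_id INTEGER, callee_id INTEGER, duration INTEGER);
INSERT INTO Person VALUES
  (3, 'Jonathan', '051-1234567'), (12, 'Elvis', '051-7654321'),
  (1, 'Moncef', '212-1234567'),   (2, 'Maroua', '212-6523651'),
  (7, 'Meir', '972-1234567'),     (9, 'Rachel', '972-0011100');
-- Assumption: codes are zero-padded to three characters.
INSERT INTO Country VALUES
  ('Peru', '051'), ('Israel', '972'), ('Morocco', '212'),
  ('Germany', '049'), ('Ethiopia', '251');
INSERT INTO Calls VALUES
  (1, 9, 33), (2, 9, 4), (1, 2, 59), (3, 12, 102), (3, 12, 330),
  (12, 3, 5), (7, 9, 13), (7, 1, 3), (9, 7, 1), (1, 7, 7);
""")

# Each person is joined to every call they took part in, on either side.
PER_COUNTRY_SQL = """
SELECT co.name, AVG(c.duration)
FROM Person p
JOIN Country co ON substr(p.phone_number, 1, 3) = co.country_code
JOIN Calls c ON p.id IN (c.caller_id, c.callee_id)
GROUP BY co.name
"""

per_country = dict(conn.execute(PER_COUNTRY_SQL).fetchall())

# Keep only countries whose average beats the global per-call average.
invest = [row[0] for row in conn.execute(
    PER_COUNTRY_SQL + "HAVING AVG(c.duration) > (SELECT AVG(duration) FROM Calls)")]

print(per_country)
print(invest)
```

The global average over the Calls table is the same 55.7 as in the explanation, because doubling both the sum and the count leaves the ratio unchanged.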
Q56.
Table: Activity
Column Name Type
player_id int
device_id int
event_date date
games_played int
Write an SQL query to report the device that each player first logged in with.
Return the result table in any order.
The query result format is in the following example.
Input:
Activity table:
player_id device_id event_date games_played
1 2 2016-03-01 5
1 2 2016-05-02 6
2 3 2017-06-25 1
3 1 2016-03-02 0
3 4 2018-07-03 5
Output:
player_id device_id
1 2
2 3
3 1
Q57.
Table: Orders
Column Name Type
order_number int
customer_number int
Write an SQL query to find the customer_number for the customer who has placed the largest
number of orders.
The test cases are generated so that exactly one customer will have placed more orders than any
other customer.
The query result format is in the following example.
Input:
Orders table:
order_number customer_number
1 1
2 2
3 3
4 3
Output:
customer_number
3
Explanation:
The customer with number 3 has two orders, which is greater than either customer 1 or 2 because
each of them only has one order.
So the result is customer_number 3.
Follow up: What if more than one customer has the largest number of orders, can you find all the
customer_number in this case?
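For the follow-up, one option is to compare each customer's order count with the maximum count instead of sorting and taking a single row. The sketch below uses sqlite3; the extra order that creates a tie is hypothetical and not part of the example.

```python
import sqlite3

TOP_CUSTOMERS_SQL = """
SELECT customer_number
FROM Orders
GROUP BY customer_number
HAVING COUNT(*) = (
    SELECT MAX(order_count)
    FROM (SELECT COUNT(*) AS order_count
          FROM Orders
          GROUP BY customer_number) AS per_customer
)
ORDER BY customer_number
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (order_number INTEGER, customer_number INTEGER)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [(1, 1), (2, 2), (3, 3), (4, 3)])
single_winner = [row[0] for row in conn.execute(TOP_CUSTOMERS_SQL)]

# Hypothetical extra order: customers 1 and 3 now both have two orders.
conn.execute("INSERT INTO Orders VALUES (5, 1)")
tied_winners = [row[0] for row in conn.execute(TOP_CUSTOMERS_SQL)]

print(single_winner)
print(tied_winners)
```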
Q58.
Table: Cinema
Column Name Type
seat_id int
free bool
Write an SQL query to report all the consecutive available seats in the cinema.
Return the result table ordered by seat_id in ascending order.
The test cases are generated so that more than two seats are consecutively available.
The query result format is in the following example.
Input:
Cinema table:
seat_id free
1 1
2 0
3 1
4 1
5 1
Output:
seat_id
3
4
5
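A seat belongs to the answer when it is free and at least one direct neighbour is also free. A sketch of that check as a self-join, run with sqlite3 on the example rows (a window-function version with LAG and LEAD would be another reasonable choice):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cinema (seat_id INTEGER, free INTEGER);
INSERT INTO Cinema VALUES (1, 1), (2, 0), (3, 1), (4, 1), (5, 1);
""")

# Pair each seat with its immediate neighbours and keep pairs that are both free.
consecutive_free = [row[0] for row in conn.execute("""
SELECT DISTINCT a.seat_id
FROM Cinema a
JOIN Cinema b ON ABS(a.seat_id - b.seat_id) = 1
WHERE a.free = 1 AND b.free = 1
ORDER BY a.seat_id
""")]

print(consecutive_free)
```

Seat 1 is free but its only neighbour, seat 2, is taken, so it is excluded.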
Q59.
Table: SalesPerson
Column Name Type
sales_id int
name varchar
salary int
commission_rate int
hire_date date
Table: Company
Column Name Type
com_id int
name varchar
city varchar
Table: Orders
Column Name Type
order_id int
order_date date
com_id int
sales_id int
amount int
Write an SQL query to report the names of all the salespersons who did not have any orders related to
the company with the name "RED".
Return the result table in any order.
The query result format is in the following example.
Input:
SalesPerson table:
sales_id name salary commission_rate hire_date
1 John 100000 6 4/1/2006
2 Amy 12000 5 5/1/2010
3 Mark 65000 12 12/25/2008
4 Pam 25000 25 1/1/2005
5 Alex 5000 10 2/3/2007
Company table:
com_id name city
1 RED Boston
2 ORANGE New York
3 YELLOW Boston
4 GREEN Austin
Orders table:
order_id order_date com_id sales_id amount
1 1/1/2014 3 4 10000
2 2/1/2014 4 5 5000
3 3/1/2014 1 1 50000
4 4/1/2014 1 4 25000
Output:
name
Amy
Mark
Alex
Explanation:
According to orders 3 and 4 in the Orders table, only salespersons John and Pam have sales to
company RED, so we report all the other names in the SalesPerson table.
Q60.
Table: Triangle
Column Name Type
x int
y int
z int
Write an SQL query to report for every three line segments whether they can form a triangle.
Return the result table in any order.
The query result format is in the following example.
Input:
Triangle table:
x y z
13 15 30
10 20 15
Output:
x y z triangle
13 15 30 No
10 20 15 Yes
Q61.
Table: Point
Column Name Type
x int
Write an SQL query to report the shortest distance between any two points from the Point table.
The query result format is in the following example.
Input:
Point table:
x
-1
0
2
Output:
shortest
1
Explanation:
The shortest distance is between points -1 and 0, which is |(-1) - 0| = 1.
Follow up: How could you optimise your query if the Point table is ordered in ascending order?
Q62.
Table: ActorDirector
Column Name Type
actor_id int
director_id int
timestamp int
Write a SQL query for a report that provides the pairs (actor_id, director_id) where the actor has
cooperated with the director at least three times.
Return the result table in any order.
The query result format is in the following example.
Input:
ActorDirector table:
actor_id director_id timestamp
1 1 0
1 1 1
1 1 2
1 2 3
1 2 4
2 1 5
2 1 6
Output:
actor_id director_id
1 1
Explanation:
The only pair is (1, 1) where they cooperated exactly 3 times.
Q63.
Table: Sales
Column Name Type
sale_id int
product_id int
year int
quantity int
price int
Table: Product
Column Name Type
product_id int
product_name varchar
Write an SQL query that reports the product_name, year, and price for each sale_id in the Sales table.
Return the resulting table in any order.
The query result format is in the following example.
Input:
Sales table:
sale_id product_id year quantity price
1 100 2008 10 5000
2 100 2009 12 5000
7 200 2011 15 9000
Product table:
product_id product_name
100 Nokia
200 Apple
300 Samsung
Output:
product_name year price
Nokia 2008 5000
Nokia 2009 5000
Apple 2011 9000
Explanation:
From sale_id = 1, we can conclude that Nokia was sold for 5000 in the year 2008.
From sale_id = 2, we can conclude that Nokia was sold for 5000 in the year 2009.
From sale_id = 7, we can conclude that Apple was sold for 9000 in the year 2011.
Q64.
Table: Project
Column Name Type
project_id int
employee_id int
Table: Employee
Column Name Type
employee_id int
name varchar
experience_years int
Write an SQL query that reports the average experience years of all the employees for each project,
rounded to 2 digits.
Return the result table in any order.
The query result format is in the following example.
Input:
Project table:
project_id employee_id
1 1
1 2
1 3
2 1
2 4
Employee table:
employee_id name experience_years
1 Khaled 3
2 Ali 2
3 John 1
4 Doe 2
Output:
project_id average_years
1 2.00
2 2.50
Explanation:
The average experience years for the first project is (3 + 2 + 1) / 3 = 2.00 and for the second project is
(3 + 2) / 2 = 2.50
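The averages in the explanation can be reproduced with a join followed by a grouped AVG. A minimal sketch using sqlite3 on the example data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Project (project_id INTEGER, employee_id INTEGER);
CREATE TABLE Employee (employee_id INTEGER, name TEXT, experience_years INTEGER);
INSERT INTO Project VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 4);
INSERT INTO Employee VALUES (1, 'Khaled', 3), (2, 'Ali', 2), (3, 'John', 1), (4, 'Doe', 2);
""")

# AVG returns a floating-point value, so no explicit cast is needed before ROUND.
average_years = conn.execute("""
SELECT p.project_id, ROUND(AVG(e.experience_years), 2) AS average_years
FROM Project p
JOIN Employee e ON e.employee_id = p.employee_id
GROUP BY p.project_id
ORDER BY p.project_id
""").fetchall()

print(average_years)
```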
Q65.
Table: Product
Column Name Type
product_id int
product_name varchar
unit_price int
Table: Sales
Column Name Type
seller_id int
product_id int
buyer_id int
sale_date date
quantity int
price int
Write an SQL query that reports the best seller by total sales price. If there is a tie, report them all.
Return the result table in any order.
The query result format is in the following example.
Input:
Product table:
product_id product_name unit_price
1 S8 1000
2 G4 800
3 iPhone 1400
Sales table:
seller_id product_id buyer_id sale_date quantity price
1 1 1 2019-01-21 2 2000
1 2 2 2019-02-17 1 800
2 2 3 2019-06-02 1 800
3 3 4 2019-05-13 2 2800
Output:
seller_id
1
3
Explanation: Both sellers with id 1 and 3 sold products with the most total price of 2800.
Q66.
Table: Product
Column Name Type
product_id int
product_name varchar
unit_price int
Table: Sales
Column Name Type
seller_id int
product_id int
buyer_id int
sale_date date
quantity int
price int
Write an SQL query that reports the buyers who have bought S8 but not iPhone. Note that S8 and
iPhone are products present in the Product table.
Return the result table in any order.
The query result format is in the following example.
Input:
Product table:
product_id product_name unit_price
1 S8 1000
2 G4 800
3 iPhone 1400
Sales table:
seller_id product_id buyer_id sale_date quantity price
1 1 1 2019-01-21 2 2000
1 2 2 2019-02-17 1 800
2 1 3 2019-06-02 1 800
3 3 3 2019-05-13 2 2800
Output:
buyer_id
1
Explanation:
The buyer with id 1 bought an S8 but did not buy an iPhone. The buyer with id 3 bought both.
Orders table:
order_id book_id quantity dispatch_date
1 1 2 2018-07-26
2 1 1 2018-11-05
3 3 8 2019-06-11
4 4 6 2019-06-05
5 4 5 2019-06-20
6 5 9 2009-02-02
7 5 8 2010-04-13
Output:
book_id name
1 "Kalila And Demna"
2 "28 Letters"
5 "The Hunger Games"
Q67.
Table: Customer
Column Name Type
customer_id int
name varchar
visited_on date
amount int
You are the restaurant owner and you want to analyse a possible expansion (there will be at least one
customer every day).
Write an SQL query to compute the moving average of how much the customers paid in a seven-day
window (i.e., current day + 6 days before). average_amount should be rounded to two decimal places.
Return the result table ordered by visited_on in ascending order.
The query result format is in the following example.
Input:
Customer table:
customer_id name visited_on amount
1 Jhon 2019-01-01 100
2 Daniel 2019-01-02 110
3 Jade 2019-01-03 120
4 Khaled 2019-01-04 130
5 Winston 2019-01-05 110
6 Elvis 2019-01-06 140
7 Anna 2019-01-07 150
8 Maria 2019-01-08 80
9 Jaze 2019-01-09 110
1 Jhon 2019-01-10 130
3 Jade 2019-01-10 150
Output:
visited_on amount average_amount
2019-01-07 860 122.86
2019-01-08 840 120.00
2019-01-09 840 120.00
2019-01-10 1000 142.86
Explanation:
1st moving average from 2019-01-01 to 2019-01-07 has an average_amount of (100 + 110 + 120 +
130 + 110 + 140 + 150)/7 = 122.86
2nd moving average from 2019-01-02 to 2019-01-08 has an average_amount of (110 + 120 + 130 +
110 + 140 + 150 + 80)/7 = 120
3rd moving average from 2019-01-03 to 2019-01-09 has an average_amount of (120 + 130 + 110 +
140 + 150 + 80 + 110)/7 = 120
4th moving average from 2019-01-04 to 2019-01-10 has an average_amount of (130 + 110 + 140 +
150 + 80 + 110 + 130 + 150)/7 = 142.86
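The four windows above can be reproduced by first summing amounts per day and then joining each day to the seven days ending on it. The sketch below does this with sqlite3 and a self-join on a daily-totals CTE; it relies on the stated guarantee of at least one customer every day, and a window function with a 6-row frame would be an alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (customer_id INTEGER, name TEXT, "
             "visited_on TEXT, amount INTEGER)")
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?)", [
    (1, "Jhon", "2019-01-01", 100), (2, "Daniel", "2019-01-02", 110),
    (3, "Jade", "2019-01-03", 120), (4, "Khaled", "2019-01-04", 130),
    (5, "Winston", "2019-01-05", 110), (6, "Elvis", "2019-01-06", 140),
    (7, "Anna", "2019-01-07", 150), (8, "Maria", "2019-01-08", 80),
    (9, "Jaze", "2019-01-09", 110), (1, "Jhon", "2019-01-10", 130),
    (3, "Jade", "2019-01-10", 150),
])

moving_average = conn.execute("""
WITH daily AS (
    SELECT visited_on, SUM(amount) AS amount
    FROM Customer
    GROUP BY visited_on
)
SELECT d.visited_on,
       SUM(w.amount) AS amount,
       ROUND(SUM(w.amount) / 7.0, 2) AS average_amount
FROM daily d
JOIN daily w
  ON w.visited_on BETWEEN date(d.visited_on, '-6 days') AND d.visited_on
-- Only days that have six earlier days of history start a full window.
WHERE d.visited_on >= (SELECT date(MIN(visited_on), '+6 days') FROM daily)
GROUP BY d.visited_on
ORDER BY d.visited_on
""").fetchall()

for row in moving_average:
    print(row)
```

Dates are stored as ISO-8601 text here, which is why plain string comparison orders them correctly.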
Q68.
Table: Scores
Column Name Type
player_name varchar
gender varchar
day date
score_points int
Write an SQL query to find the total score for each gender on each day.
Return the result table ordered by gender and day in ascending order.
The query result format is in the following example.
Input:
Scores table:
player_name gender day score_points
Aron F 2020-01-01 17
Alice F 2020-01-07 23
Bajrang M 2020-01-07 7
Khali M 2019-12-25 11
Slaman M 2019-12-30 13
Joe M 2019-12-31 3
Jose M 2019-12-18 2
Priya F 2019-12-31 23
Priyanka F 2019-12-30 17
Output:
gender day total
F 2019-12-30 17
F 2019-12-31 40
F 2020-01-01 57
F 2020-01-07 80
M 2019-12-18 2
M 2019-12-25 13
M 2019-12-30 26
M 2019-12-31 29
M 2020-01-07 36
Explanation:
For the female team:
The first day is 2019-12-30, Priyanka scored 17 points and the total score for the team is 17.
The second day is 2019-12-31, Priya scored 23 points and the total score for the team is 40.
The third day is 2020-01-01, Aron scored 17 points and the total score for the team is 57.
The fourth day is 2020-01-07, Alice scored 23 points and the total score for the team is 80.
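As the explanation shows, the "total" is a running total per gender, accumulated in date order. A sketch using sqlite3 with a correlated subquery (SUM(...) OVER (PARTITION BY gender ORDER BY day) would be the window-function equivalent):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (player_name TEXT, gender TEXT, "
             "day TEXT, score_points INTEGER)")
conn.executemany("INSERT INTO Scores VALUES (?, ?, ?, ?)", [
    ("Aron", "F", "2020-01-01", 17), ("Alice", "F", "2020-01-07", 23),
    ("Bajrang", "M", "2020-01-07", 7), ("Khali", "M", "2019-12-25", 11),
    ("Slaman", "M", "2019-12-30", 13), ("Joe", "M", "2019-12-31", 3),
    ("Jose", "M", "2019-12-18", 2), ("Priya", "F", "2019-12-31", 23),
    ("Priyanka", "F", "2019-12-30", 17),
])

# For each (gender, day), add up every score of that gender up to that day.
running_totals = conn.execute("""
SELECT s.gender, s.day,
       (SELECT SUM(t.score_points)
        FROM Scores t
        WHERE t.gender = s.gender AND t.day <= s.day) AS total
FROM Scores s
GROUP BY s.gender, s.day
ORDER BY s.gender, s.day
""").fetchall()

for row in running_totals:
    print(row)
```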
Q69.
Table: Logs
Column Name Type
log_id int
Write an SQL query to find the start and end number of continuous ranges in the table Logs.
Return the result table ordered by start_id.
The query result format is in the following example.
Input:
Logs table:
log_id
1
2
3
7
8
10
Output:
start_id end_id
1 3
7 8
10 10
Explanation:
The result table should contain all ranges in table Logs.
From 1 to 3 is contained in the table.
From 4 to 6 is missing in the table
From 7 to 8 is contained in the table.
Number 9 is missing from the table.
Number 10 is contained in the table.
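The explanation suggests a direct reading: a range starts at an id whose predecessor is missing and ends at the nearest id whose successor is missing. The sketch below encodes exactly that with sqlite3; grouping by log_id minus ROW_NUMBER() is the more common alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Logs (log_id INTEGER)")
conn.executemany("INSERT INTO Logs VALUES (?)", [(1,), (2,), (3,), (7,), (8,), (10,)])

ranges = conn.execute("""
SELECT s.log_id AS start_id,
       -- The end of the range is the first id at or after the start
       -- that has no successor in the table.
       (SELECT MIN(e.log_id)
        FROM Logs e
        WHERE e.log_id >= s.log_id
          AND NOT EXISTS (SELECT 1 FROM Logs n WHERE n.log_id = e.log_id + 1)
       ) AS end_id
FROM Logs s
-- A range starts wherever the previous id is missing.
WHERE NOT EXISTS (SELECT 1 FROM Logs p WHERE p.log_id = s.log_id - 1)
ORDER BY start_id
""").fetchall()

print(ranges)
```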
Q70.
Table: Students
Column Name Type
student_id int
student_name varchar
Table: Subjects
Column Name Type
subject_name varchar
Table: Examinations
Column Name Type
student_id int
subject_name varchar
Write an SQL query to find the number of times each student attended each exam.
Return the result table ordered by student_id and subject_name.
The query result format is in the following example.
Input:
Students table:
student_id student_name
1 Alice
2 Bob
13 John
6 Alex
Subjects table:
subject_name
Math
Physics
Programming
Examinations table:
student_id subject_name
1 Math
1 Physics
1 Programming
2 Programming
1 Physics
1 Math
13 Math
13 Programming
13 Physics
2 Math
1 Math
Output:
student_id student_name subject_name attended_exams
1 Alice Math 3
1 Alice Physics 2
1 Alice Programming 1
2 Bob Math 1
2 Bob Physics 0
2 Bob Programming 1
6 Alex Math 0
6 Alex Physics 0
6 Alex Programming 0
13 John Math 1
13 John Physics 1
13 John Programming 1
Explanation:
The result table should contain all students and all subjects.
Alice attended the Math exam 3 times, the Physics exam 2 times, and the Programming exam 1 time.
Bob attended the Math exam 1 time, the Programming exam 1 time, and did not attend the Physics
exam.
Alex did not attend any exams.
John attended the Math exam 1 time, the Physics exam 1 time, and the Programming exam 1 time.
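Because the result must list every student against every subject, including pairs with zero attendance, one approach is to build all pairs with a cross join and then left-join the examinations. A sketch with sqlite3 on the example data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (student_id INTEGER, student_name TEXT);
CREATE TABLE Subjects (subject_name TEXT);
CREATE TABLE Examinations (student_id INTEGER, subject_name TEXT);
INSERT INTO Students VALUES (1, 'Alice'), (2, 'Bob'), (13, 'John'), (6, 'Alex');
INSERT INTO Subjects VALUES ('Math'), ('Physics'), ('Programming');
INSERT INTO Examinations VALUES
  (1, 'Math'), (1, 'Physics'), (1, 'Programming'), (2, 'Programming'),
  (1, 'Physics'), (1, 'Math'), (13, 'Math'), (13, 'Programming'),
  (13, 'Physics'), (2, 'Math'), (1, 'Math');
""")

# COUNT(e.student_id) ignores the NULLs produced by unmatched pairs, giving 0.
attendance = conn.execute("""
SELECT s.student_id, s.student_name, sub.subject_name,
       COUNT(e.student_id) AS attended_exams
FROM Students s
CROSS JOIN Subjects sub
LEFT JOIN Examinations e
  ON e.student_id = s.student_id AND e.subject_name = sub.subject_name
GROUP BY s.student_id, s.student_name, sub.subject_name
ORDER BY s.student_id, sub.subject_name
""").fetchall()

for row in attendance:
    print(row)
```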
Q71.
Table: Employees
Column Name Type
employee_id int
employee_name varchar
manager_id int
Write an SQL query to find employee_id of all employees that directly or indirectly report their work to
the head of the company.
The indirect relation between managers will not exceed three managers as the company is small.
Return the result table in any order.
The query result format is in the following example.
Input:
Employees table:
employee_id employee_name manager_id
1 Boss 1
3 Alice 3
2 Bob 1
4 Daniel 2
7 Luis 4
8 Jhon 3
9 Angela 8
77 Robert 1
Output:
employee_id
2
77
4
7
Explanation:
The head of the company is the employee with employee_id 1.
The employees with employee_id 2 and 77 report their work directly to the head of the company.
The employee with employee_id 4 reports their work indirectly to the head of the company 4 --> 2 --> 1.
The employee with employee_id 7 reports their work indirectly to the head of the company 7 --> 4 --> 2
--> 1.
The employees with employee_id 3, 8, and 9 do not report their work to the head of the company
directly or indirectly.
Q72.
Table: Transactions
Column Name Type
id int
country varchar
state enum
amount int
trans_date date
Write an SQL query to find for each month and country, the number of transactions and their total
amount, the number of approved transactions and their total amount.
Return the result table in any order.
The query result format is in the following example.
Input:
Transactions table:
id country state amount trans_date
121 US approved 1000 2018-12-18
122 US declined 2000 2018-12-19
123 US approved 2000 2019-01-01
124 DE approved 2000 2019-01-07
Output:
month country trans_count approved_count trans_total_amount approved_total_amount
2018-12 US 2 1 3000 1000
2019-01 US 1 1 2000 2000
2019-01 DE 1 1 2000 2000
Q73.
Table: Actions
Column Name Type
user_id int
post_id int
action_date date
action enum
extra varchar
There is no primary key for this table, it may have duplicate rows.
The action column is an ENUM type of ('view', 'like', 'reaction', 'comment', 'report', 'share').
The extra column has optional information about the action, such as a reason for the report or a type
of reaction.
Table: Removals
Column Name Type
post_id int
remove_date date
Write an SQL query to find the average daily percentage of posts that got removed after being
reported as spam, rounded to 2 decimal places.
The query result format is in the following example.
Input:
Actions table:
user_id post_id action_date action extra
1 1 2019-07-01 view null
1 1 2019-07-01 like null
1 1 2019-07-01 share null
2 2 2019-07-04 view null
2 2 2019-07-04 report spam
3 4 2019-07-04 view null
3 4 2019-07-04 report spam
4 3 2019-07-02 view null
4 3 2019-07-02 report spam
5 2 2019-07-03 view null
5 2 2019-07-03 report racism
5 5 2019-07-03 view null
5 5 2019-07-03 report racism
Removals table:
post_id remove_date
2 2019-07-20
3 2019-07-18
Output:
average_daily_percent
75.00
Explanation:
The percentage for 2019-07-04 is 50% because only one of the two posts reported as spam was
removed.
The percentage for 2019-07-02 is 100% because one post was reported as spam and it was removed.
The other days had no spam reports so the average is (50 + 100) / 2 = 75%
Note that the output is only one number and that we do not care about the remove dates.
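The calculation above has two stages: a per-day percentage of spam-reported posts that were removed, then the average of those daily percentages. A sketch with sqlite3 that counts distinct posts, since the table may contain duplicate rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Actions (user_id INTEGER, post_id INTEGER, "
             "action_date TEXT, action TEXT, extra TEXT)")
conn.execute("CREATE TABLE Removals (post_id INTEGER, remove_date TEXT)")
conn.executemany("INSERT INTO Actions VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "2019-07-01", "view", None), (1, 1, "2019-07-01", "like", None),
    (1, 1, "2019-07-01", "share", None), (2, 2, "2019-07-04", "view", None),
    (2, 2, "2019-07-04", "report", "spam"), (3, 4, "2019-07-04", "view", None),
    (3, 4, "2019-07-04", "report", "spam"), (4, 3, "2019-07-02", "view", None),
    (4, 3, "2019-07-02", "report", "spam"), (5, 2, "2019-07-03", "view", None),
    (5, 2, "2019-07-03", "report", "racism"), (5, 5, "2019-07-03", "view", None),
    (5, 5, "2019-07-03", "report", "racism"),
])
conn.executemany("INSERT INTO Removals VALUES (?, ?)",
                 [(2, "2019-07-20"), (3, "2019-07-18")])

average_daily_percent = conn.execute("""
SELECT ROUND(AVG(daily_pct), 2) AS average_daily_percent
FROM (
    -- Per day: removed spam-reported posts / all spam-reported posts.
    SELECT a.action_date,
           100.0 * COUNT(DISTINCT r.post_id) / COUNT(DISTINCT a.post_id) AS daily_pct
    FROM Actions a
    LEFT JOIN Removals r ON r.post_id = a.post_id
    WHERE a.action = 'report' AND a.extra = 'spam'
    GROUP BY a.action_date
) AS per_day
""").fetchone()[0]

print(average_daily_percent)
```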
Q74.
Table: Activity
Column Name Type
player_id int
device_id int
event_date date
games_played int
Write an SQL query to report the fraction of players that logged in again on the day after the day they
first logged in, rounded to 2 decimal places. In other words, you need to count the number of players
that logged in for at least two consecutive days starting from their first login date, then divide that
number by the total number of players.
The query result format is in the following example.
Input:
Activity table:
player_id device_id event_date games_played
1 2 2016-03-01 5
1 2 2016-03-02 6
2 3 2017-06-25 1
3 1 2016-03-02 0
3 4 2018-07-03 5
Output:
fraction
0.33
Explanation:
Only the player with id 1 logged back in after the first day he had logged in so the answer is 1/3 = 0.33
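The fraction can be computed by finding each player's first login date and checking for a login exactly one day later. A sketch with sqlite3, using its date() function to add one day to the ISO-formatted date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Activity (player_id INTEGER, device_id INTEGER, "
             "event_date TEXT, games_played INTEGER)")
conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", [
    (1, 2, "2016-03-01", 5), (1, 2, "2016-03-02", 6), (2, 3, "2017-06-25", 1),
    (3, 1, "2016-03-02", 0), (3, 4, "2018-07-03", 5),
])

fraction = conn.execute("""
WITH first_login AS (
    SELECT player_id, MIN(event_date) AS first_day
    FROM Activity
    GROUP BY player_id
)
SELECT ROUND(
    -- Players with a login on first_day + 1, divided by all players.
    1.0 * COUNT(DISTINCT a.player_id) / COUNT(DISTINCT f.player_id), 2
) AS fraction
FROM first_login f
LEFT JOIN Activity a
  ON a.player_id = f.player_id
 AND a.event_date = date(f.first_day, '+1 day')
""").fetchone()[0]

print(fraction)
```

Player 3 logs in twice, but the second login is years after the first, so only player 1 counts.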
Q75.
Table: Activity
Column Name Type
player_id int
device_id int
event_date date
games_played int
Write an SQL query to report the fraction of players that logged in again on the day after the day they
first logged in, rounded to 2 decimal places. In other words, you need to count the number of players
that logged in for at least two consecutive days starting from their first login date, then divide that
number by the total number of players.
The query result format is in the following example.
Input:
Activity table:
player_id device_id event_date games_played
1 2 2016-03-01 5
1 2 2016-03-02 6
2 3 2017-06-25 1
3 1 2016-03-02 0
3 4 2018-07-03 5
Output:
fraction
0.33
Explanation:
Only the player with id 1 logged back in after the first day he had logged in so the answer is 1/3 = 0.33
Q76.
Table Salaries:
Column Name Type
company_id int
employee_id int
employee_name varchar
salary int
Write an SQL query to find the salaries of the employees after applying taxes. Round the salary to the
nearest integer.
The tax rate is calculated for each company based on the following criteria:
● 0% If the max salary of any employee in the company is less than $1000.
● 24% If the max salary of any employee in the company is in the range [1000, 10000] inclusive.
● 49% If the max salary of any employee in the company is greater than $10000.
Return the result table in any order.
The query result format is in the following example.
Input:
Salaries table:
company_id employee_id employee_name salary
1 1 Tony 2000
1 2 Pronub 21300
1 3 Tyrrox 10800
2 1 Pam 300
2 7 Bassem 450
2 9 Hermione 700
3 7 Bocaben 100
3 2 Ognjen 2200
3 13 Nyan Cat 3300
3 15 Morning Cat 7777
Output:
company_id employee_id employee_name salary
1 1 Tony 1020
1 2 Pronub 10863
1 3 Tyrrox 5508
2 1 Pam 300
2 7 Bassem 450
2 9 Hermione 700
3 7 Bocaben 76
3 2 Ognjen 1672
3 13 Nyan Cat 2508
3 15 Morning Cat 5911
Explanation:
For company 1, Max salary is 21300. Employees in company 1 have taxes = 49%
For company 2, Max salary is 700. Employees in company 2 have taxes = 0%
For company 3, Max salary is 7777. Employees in company 3 have taxes = 24%
The salary after taxes = salary - (taxes percentage / 100) * salary
For example, Salary for Morning Cat (3, 15) after taxes = 7777 - 7777 * (24 / 100) = 7777 - 1866.48 =
5910.52, which is rounded to 5911.
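The rule above can be applied by deriving one tax percentage per company from its maximum salary and joining it back to every employee. A sketch with sqlite3; keeping the percentage as an integer and dividing by 100.0 at the end is a choice made here to limit floating-point noise before rounding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Salaries (company_id INTEGER, employee_id INTEGER, "
             "employee_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Salaries VALUES (?, ?, ?, ?)", [
    (1, 1, "Tony", 2000), (1, 2, "Pronub", 21300), (1, 3, "Tyrrox", 10800),
    (2, 1, "Pam", 300), (2, 7, "Bassem", 450), (2, 9, "Hermione", 700),
    (3, 7, "Bocaben", 100), (3, 2, "Ognjen", 2200),
    (3, 13, "Nyan Cat", 3300), (3, 15, "Morning Cat", 7777),
])

after_tax = conn.execute("""
SELECT s.company_id, s.employee_id, s.employee_name,
       CAST(ROUND(s.salary * (100 - t.tax_pct) / 100.0) AS INTEGER) AS salary
FROM Salaries s
JOIN (
    -- One tax bracket per company, chosen from its highest salary.
    SELECT company_id,
           CASE WHEN MAX(salary) < 1000 THEN 0
                WHEN MAX(salary) <= 10000 THEN 24
                ELSE 49 END AS tax_pct
    FROM Salaries
    GROUP BY company_id
) AS t ON t.company_id = s.company_id
ORDER BY s.company_id, s.employee_id
""").fetchall()

for row in after_tax:
    print(row)
```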
Q77.
Table Variables:
Column Name Type
name varchar
value int
Table Expressions:
Column Name Type
left_operand varchar
operator enum
right_operand varchar
Write an SQL query to evaluate the boolean expressions in the Expressions table.
Return the result table in any order.
The query result format is in the following example.
Input:
Variables table:
name value
x 66
y 77
Expressions table:
left_operand operator right_operand
x > y
x < y
x = y
y > x
y < x
x = x
Output:
left_operand operator right_operand value
x > y false
x < y true
x = y false
y > x true
y < x false
x = x true
Explanation:
As shown, you need to find the value of each boolean expression in the table using the variables table.
Q78.
Table Person:
Column Name Type
id int
name varchar
phone_number varchar
Table Country:
Column Name Type
name varchar
country_code varchar
Table Calls:
Column Name Type
caller_id int
callee_id int
duration int
There is no primary key for this table, it may contain duplicates.
Each row of this table contains the caller id, callee id and the duration of the call in minutes. caller_id
!= callee_id
A telecommunications company wants to invest in new countries. The company intends to invest in
the countries where the average call duration of the calls in this country is strictly greater than the
global average call duration.
Write an SQL query to find the countries where this company can invest.
Return the result table in any order.
The query result format is in the following example.
Input:
Person table:
id name phone_number
3 Jonathan 051-1234567
12 Elvis 051-7654321
1 Moncef 212-1234567
2 Maroua 212-6523651
7 Meir 972-1234567
9 Rachel 972-0011100
Country table:
name country_code
Peru 051
Israel 972
Morocco 212
Germany 049
Ethiopia 251
Calls table:
caller_id callee_id duration
1 9 33
2 9 4
1 2 59
3 12 102
3 12 330
12 3 5
7 9 13
7 1 3
9 7 1
1 7 7
Output:
country
Peru
Explanation:
The average call duration for Peru is (102 + 102 + 330 + 330 + 5 + 5) / 6 = 145.666667
The average call duration for Israel is (33 + 4 + 13 + 13 + 3 + 1 + 1 + 7) / 8 = 9.37500
The average call duration for Morocco is (33 + 4 + 59 + 59 + 3 + 7) / 6 = 27.5000
Global call duration average = (2 * (33 + 4 + 59 + 102 + 330 + 5 + 13 + 3 + 1 + 7)) / 20 = 55.70000
Since Peru is the only country where the average call duration is greater than the global average, it is
the only recommended country.
Q79.
Write a query that prints a list of employee names (i.e., the name attribute) from the Employee table in
alphabetical order.
Level - Easy
Hint - Use ORDER BY
Input Format
The Employee table containing employee data for a company is described as follows:
where employee_id is an employee's ID number, name is their name, months is the total number of
months they've been working for the company, and salary is their monthly salary.
Sample Input
Sample Output
Angela
Bonnie
Frank
Joe
Kimberly
Lisa
Michael
Patrick
Rose
Todd
Q80.
Assume you are given the table below containing information on user transactions for particular
products. Write a query to obtain the year-on-year growth rate for the total spend of each product for
each year.
Output the year (in ascending order) partitioned by product id, current year's spend, previous year's
spend and year-on-year growth rate (percentage rounded to 2 decimal places).
Level - Hard
Hint - Use extract function
user_transactions Table:
transaction_id integer
product_id integer
spend decimal
transaction_date datetime
Example Output:
2 123424 1500.60
0
Q81.
Amazon wants to maximise the number of items it can stock in a 500,000 square feet warehouse. It
wants to stock as many prime items as possible, and afterwards use the remaining square footage to
stock the most number of non-prime items.
Write a SQL query to find the number of prime and non-prime items that can be stored in the 500,000
square feet warehouse. Output the item type and number of items to be stocked.
Hint - create a table containing a summary of the necessary fields such as item type ('prime_eligible',
'not_prime'), SUM of square footage, and COUNT of items grouped by the item type.
inventory table:
item_id integer
item_type string
item_category string
square_footage decimal
Example Output:
item_type item_count
prime_eligible 9285
not_prime 6
Q82.
Assume you have the table below containing information on Facebook user actions. Write a query to
obtain the active user retention in July 2022. Output the month (in numerical format 1, 2, 3) and the
number of monthly active users (MAUs).
Hint: An active user is a user who has user action ("sign-in", "like", or "comment") in the current month
and last month.
user_actions Table:
user_id integer
event_id integer
event_date datetime
user_actions Example Input:
6 1
Q83.
Google's marketing team is making a Superbowl commercial and needs a simple statistic to put on
their TV ad: the median number of searches a person made last year.
However, at Google scale, querying the 2 trillion searches is too costly. Luckily, you have access to the
summary table which tells you the number of searches made last year and how many Google users
fall into that bucket.
Write a query to report the median of searches made by a user. Round the median to one decimal
point.
Hint- Write a subquery or common table expression (CTE) to generate a series of data (that's keyword
for column) starting at the first search and ending at some point with an optional incremental value.
search_frequency Table:
searches integer
num_users integer
searches num_users
1 2
2 2
3 3
4 1
Example Output:
median
2.5
Q84.
Write a query to update the Facebook advertiser's status using the daily_pay table. Advertiser is a
two-column table containing the user id and their payment status based on the last payment and
daily_pay table has current information about their payment. Only advertisers who paid will show up in
this table.
Output the user id and current payment status sorted by the user id.
Hint- Query the daily_pay table and check through the advertisers in this table.
advertiser Table:
user_id string
status string
user_id status
bing NEW
yahoo NEW
alibaba EXISTING
daily_pay Table:
user_id string
paid decimal
user_id paid
yahoo 45.00
alibaba 100.00
target 13.00
Example Output:
user_id new_status
bing CHURN
yahoo EXISTING
alibaba EXISTING
Bing's updated status is CHURN because no payment was made in the daily_pay table whereas Yahoo
which made a payment is updated as EXISTING.
The dataset you are querying against may have different input & output - this is just an example!
For better understanding of the advertiser's status, we're sharing with you a table of possible
transitions based on the payment status.
1. Row 2, 4, 6, 8: As long as the user has not paid on day T, the end status is updated to CHURN
regardless of the previous status.
2. Row 1, 3, 5, 7: When the user paid on day T, the end status is updated to either EXISTING or
RESURRECT, depending on their previous state. RESURRECT is only possible when the
previous state is CHURN. When the previous state is anything else, the status is updated to
EXISTING.
Q85.
Amazon Web Services (AWS) is powered by fleets of servers. Senior management has requested
data-driven solutions to optimise server usage.
Write a query that calculates the total time that the fleet of servers was running. The output should be
in units of full days.
Level - Hard
Hint-
2. Sum those up to obtain the uptime of the whole fleet, keeping in mind that the result must be
output in units of full days
Assumptions:
server_utilization Table:
server_id integer
status_time timestamp
session_status string
Example Output:
total_uptime_days
21
Q86.
Sometimes, payment transactions are repeated by accident; it could be due to user error, API failure or
a retry error that causes a credit card to be charged twice.
Using the transactions table, identify any payments made at the same merchant with the same credit
card for the same amount within 10 minutes of each other. Count such repeated payments.
Level - Hard
Hint- Use Partition and order by
Assumptions:
● The first transaction of such payments should not be counted as a repeated payment. This
means, if there are two transactions performed by a merchant with the same credit card and
for the same amount within 10 minutes, there will only be 1 repeated payment.
transactions Table:
transaction_id integer
merchant_id integer
credit_card_id integer
amount integer
transaction_timestamp datetime
Example Output:
payment_count
Q87.
DoorDash's Growth Team is trying to make sure new users (those who are making orders in their first
14 days) have a great experience on all their orders in their 2 weeks on the platform.
Unfortunately, many deliveries are being messed up because:
● the orders are being completed incorrectly (missing items, wrong order, etc.)
● the orders aren't being received (wrong address, wrong drop off spot)
● the orders are being delivered late (the actual delivery time is 30 minutes later than when the
order was placed). Note that the estimated_delivery_timestamp is automatically set to 30
minutes after the order_timestamp.
Write a query to find the bad experience rate in the first 14 days for new users who signed up in June
2022. Output the percentage of bad experience rounded to 2 decimal places.
orders Table:
order_id integer
customer_id integer
trip_id integer
order_timestamp timestamp
trips Table:
dasher_id integer
trip_id integer
estimated_delivery_timestamp timestamp
actual_delivery_timestamp timestamp
customers Table:
customer_id integer
signup_timestamp timestamp
customer_id signup_timestamp
Example Output:
bad_experience_pct
75.00
Q88.
Table: Scores
Column Name Type
player_name varchar
gender varchar
day date
score_points int
Write an SQL query to find the total score for each gender on each day.
Return the result table ordered by gender and day in ascending order.
The query result format is in the following example.
Input:
Scores table:
player_name gender day score_points
Aron F 2020-01-01 17
Alice F 2020-01-07 23
Bajrang M 2020-01-07 7
Khali M 2019-12-25 11
Slaman M 2019-12-30 13
Joe M 2019-12-31 3
Jose M 2019-12-18 2
Priya F 2019-12-31 23
Priyanka F 2019-12-30 17
Output:
gender day total
F 2019-12-30 17
F 2019-12-31 40
F 2020-01-01 57
F 2020-01-07 80
M 2019-12-18 2
M 2019-12-25 13
M 2019-12-30 26
M 2019-12-31 29
M 2020-01-07 36
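Explanation:
For the female team:
The first day is 2019-12-30, Priyanka scored 17 points and the total score for the team is 17.
The second day is 2019-12-31, Priya scored 23 points and the total score for the team is 40.
The third day is 2020-01-01, Aron scored 17 points and the total score for the team is 57.
The fourth day is 2020-01-07, Alice scored 23 points and the total score for the team is 80.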
Q89.
Table Person:
Column Name Type
id int
name varchar
phone_number varchar
Table Country:
Column Name Type
name varchar
country_code varchar
Table Calls:
Column Name Type
caller_id int
callee_id int
duration int
A telecommunications company wants to invest in new countries. The company intends to invest in
the countries where the average call duration of the calls in this country is strictly greater than the
global average call duration.
Write an SQL query to find the countries where this company can invest.
Return the result table in any order.
The query result format is in the following example.
Input:
Person table:
id name phone_number
3 Jonathan 051-1234567
12 Elvis 051-7654321
1 Moncef 212-1234567
2 Maroua 212-6523651
7 Meir 972-1234567
9 Rachel 972-0011100
Country table:
name country_code
Peru 51
Israel 972
Morocco 212
Germany 49
Ethiopia 251
Calls table:
caller_id callee_id duration
1 9 33
2 9 4
1 2 59
3 12 102
3 12 330
12 3 5
7 9 13
7 1 3
9 7 1
1 7 7
Output:
country
Peru
Explanation:
The average call duration for Peru is (102 + 102 + 330 + 330 + 5 + 5) / 6 = 145.666667
The average call duration for Israel is (33 + 4 + 13 + 13 + 3 + 1 + 1 + 7) / 8 = 9.37500
The average call duration for Morocco is (33 + 4 + 59 + 59 + 3 + 7) / 6 = 27.5000
Global call duration average = (2 * (33 + 4 + 59 + 102 + 330 + 5 + 13 + 3 + 1 + 7)) / 20 = 55.70000
Since Peru is the only country where the average call duration is greater than the global average, it is
the only recommended country.
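A minimal sketch of one possible solution, using Python's built-in sqlite3 module and the example rows. It assumes that the first three characters of phone_number are the country code (possibly zero-padded, as in 051 for Peru), which the example data suggests but the text does not state. Each call is counted once for the caller's country and once for the callee's country, matching the arithmetic in the explanation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER, name TEXT, phone_number TEXT);
CREATE TABLE Country (name TEXT, country_code TEXT);
CREATE TABLE Calls (caller_id INTEGER, callee_id INTEGER, duration INTEGER);
""")
conn.executemany("INSERT INTO Person VALUES (?, ?, ?)", [
    (3, "Jonathan", "051-1234567"), (12, "Elvis", "051-7654321"),
    (1, "Moncef", "212-1234567"), (2, "Maroua", "212-6523651"),
    (7, "Meir", "972-1234567"), (9, "Rachel", "972-0011100"),
])
conn.executemany("INSERT INTO Country VALUES (?, ?)", [
    ("Peru", "51"), ("Israel", "972"), ("Morocco", "212"),
    ("Germany", "49"), ("Ethiopia", "251"),
])
conn.executemany("INSERT INTO Calls VALUES (?, ?, ?)", [
    (1, 9, 33), (2, 9, 4), (1, 2, 59), (3, 12, 102), (3, 12, 330),
    (12, 3, 5), (7, 9, 13), (7, 1, 3), (9, 7, 1), (1, 7, 7),
])

# Assumption: the phone prefix before the dash is the country code.
# Casting both sides to INTEGER makes '051' match '51'.
query = """
SELECT co.name AS country
FROM Country AS co
JOIN Person AS p
  ON CAST(substr(p.phone_number, 1, 3) AS INTEGER) = CAST(co.country_code AS INTEGER)
JOIN Calls AS c
  ON p.id IN (c.caller_id, c.callee_id)
GROUP BY co.name
HAVING AVG(c.duration) > (SELECT AVG(duration) FROM Calls)
"""
countries = [row[0] for row in conn.execute(query)]
print(countries)
```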
Q90.
Table: Numbers
Column Name Type
num int
frequency int
The median is the value separating the higher half from the lower half of a data sample.
Write an SQL query to report the median of all the numbers in the database after decompressing the
Numbers table. Round the median to one decimal place.
The query result format is in the following example.
Input:
Numbers table:
num frequency
0 7
1 1
2 3
3 1
Output:
median
0
Explanation:
If we decompress the Numbers table, we will get [0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3], so the median is (0 + 0) /
2 = 0.
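A minimal sketch of one way to find the median without decompressing the table, using Python's built-in sqlite3 module. It assumes each num appears in at most one row. A number is a median candidate when the count of values strictly before it is at most half the total and the count of values up to and including it is at least half the total. Averaging the candidates covers both the odd and the even case. The names cum, cum_freq, total, and cnt are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Numbers (num INTEGER, frequency INTEGER)")
conn.executemany("INSERT INTO Numbers VALUES (?, ?)", [(0, 7), (1, 1), (2, 3), (3, 1)])

query = """
WITH cum AS (
    -- cum_freq: how many decompressed values are <= this num
    SELECT n.num,
           n.frequency,
           (SELECT SUM(m.frequency) FROM Numbers AS m WHERE m.num <= n.num) AS cum_freq
    FROM Numbers AS n
),
total AS (
    SELECT SUM(frequency) AS cnt FROM Numbers
)
SELECT ROUND(AVG(cum.num), 1) AS median
FROM cum
CROSS JOIN total
WHERE cum.cum_freq - cum.frequency <= total.cnt / 2.0
  AND cum.cum_freq >= total.cnt / 2.0
"""
median = conn.execute(query).fetchone()[0]
print(median)
```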
Q91.
Table: Salary
Column Name Type
id int
employee_id int
amount int
pay_date date
Table: Employee
Column Name Type
employee_id int
department_id int
Write an SQL query to report the comparison result (higher/lower/same) of the average salary of
employees in a department to the company's average salary.
Return the result table in any order.
The query result format is in the following example.
Input:
Salary table:
id employee_id amount pay_date
1 1 9000 2017/03/31
2 2 6000 2017/03/31
3 3 10000 2017/03/31
4 1 7000 2017/02/28
5 2 6000 2017/02/28
6 3 8000 2017/02/28
Employee table:
employee_id department_id
1 1
2 2
3 2
Output:
pay_month department_id comparison
2017-02 1 same
2017-03 1 higher
2017-02 2 same
2017-03 2 lower
Explanation:
In March, the company's average salary is (9000+6000+10000)/3 = 8333.33...
The average salary for department '1' is 9000, which is the salary of employee_id '1' since there is only
one employee in this department. So the comparison result is 'higher' since 9000 > 8333.33.
The average salary of department '2' is (6000 + 10000)/2 = 8000, which is the average of employee_id
'2' and '3'. So the comparison result is 'lower' since 8000 < 8333.33.
With the same formula for the average salary comparison in February, the result is 'same' since both
departments '1' and '2' have the same average salary as the company, which is 7000.
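A minimal sketch of one possible solution, using Python's built-in sqlite3 module. As an assumption for this sketch, pay_date is stored in ISO form (2017-03-31 rather than 2017/03/31) so that the first seven characters give the pay month directly. The CTE names company and department are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Salary (id INTEGER, employee_id INTEGER, amount INTEGER, pay_date TEXT);
CREATE TABLE Employee (employee_id INTEGER, department_id INTEGER);
""")
conn.executemany("INSERT INTO Salary VALUES (?, ?, ?, ?)", [
    (1, 1, 9000, "2017-03-31"), (2, 2, 6000, "2017-03-31"), (3, 3, 10000, "2017-03-31"),
    (4, 1, 7000, "2017-02-28"), (5, 2, 6000, "2017-02-28"), (6, 3, 8000, "2017-02-28"),
])
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [(1, 1), (2, 2), (3, 2)])

query = """
WITH company AS (
    -- company-wide average per month
    SELECT substr(pay_date, 1, 7) AS pay_month, AVG(amount) AS avg_amount
    FROM Salary
    GROUP BY substr(pay_date, 1, 7)
),
department AS (
    -- per-department average per month
    SELECT substr(s.pay_date, 1, 7) AS pay_month,
           e.department_id,
           AVG(s.amount) AS avg_amount
    FROM Salary AS s
    JOIN Employee AS e ON e.employee_id = s.employee_id
    GROUP BY substr(s.pay_date, 1, 7), e.department_id
)
SELECT d.pay_month,
       d.department_id,
       CASE WHEN d.avg_amount > c.avg_amount THEN 'higher'
            WHEN d.avg_amount < c.avg_amount THEN 'lower'
            ELSE 'same'
       END AS comparison
FROM department AS d
JOIN company AS c ON c.pay_month = d.pay_month
ORDER BY d.department_id, d.pay_month
"""
comparison_rows = conn.execute(query).fetchall()
for row in comparison_rows:
    print(row)
```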
Q92.
Table: Activity
Column Name Type
player_id int
device_id int
event_date date
games_played int
The install date of a player is the first login day of that player.
We define day one retention of some date x to be the number of players whose install date is x and
who logged back in on the day right after x, divided by the number of players whose install date is x,
rounded to 2 decimal places.
Write an SQL query to report for each install date, the number of players that installed the game on
that day, and the day one retention.
Return the result table in any order.
The query result format is in the following example.
Input:
Activity table:
player_id device_id event_date games_played
1 2 2016-03-01 5
1 2 2016-03-02 6
2 3 2017-06-25 1
3 1 2016-03-01 0
3 4 2016-07-03 5
Output:
install_dt installs Day1_retention
2016-03-01 2 0.5
2017-06-25 1 0
Explanation:
Players 1 and 3 installed the game on 2016-03-01, but only player 1 logged back in on 2016-03-02, so
the day one retention of 2016-03-01 is 1 / 2 = 0.50.
Player 2 installed the game on 2017-06-25 but didn't log back in on 2017-06-26, so the day one retention
of 2017-06-25 is 0 / 1 = 0.00.
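A minimal sketch of one possible solution, using Python's built-in sqlite3 module and the example rows. The CTE name first_login is made up for this sketch. Counting distinct player ids keeps the ratio correct even if a player logs in from several devices on the same day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity (player_id INTEGER, device_id INTEGER, "
    "event_date TEXT, games_played INTEGER)"
)
conn.executemany("INSERT INTO Activity VALUES (?, ?, ?, ?)", [
    (1, 2, "2016-03-01", 5),
    (1, 2, "2016-03-02", 6),
    (2, 3, "2017-06-25", 1),
    (3, 1, "2016-03-01", 0),
    (3, 4, "2016-07-03", 5),
])

query = """
WITH first_login AS (
    -- install date = earliest login date per player
    SELECT player_id, MIN(event_date) AS install_dt
    FROM Activity
    GROUP BY player_id
)
SELECT f.install_dt,
       COUNT(DISTINCT f.player_id) AS installs,
       ROUND(1.0 * COUNT(DISTINCT a.player_id) / COUNT(DISTINCT f.player_id), 2)
           AS Day1_retention
FROM first_login AS f
LEFT JOIN Activity AS a
  ON a.player_id = f.player_id
 AND a.event_date = date(f.install_dt, '+1 day')  -- logged in the next day
GROUP BY f.install_dt
ORDER BY f.install_dt
"""
retention_rows = conn.execute(query).fetchall()
for row in retention_rows:
    print(row)
```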
Q93.
Table: Players
Column Name Type
player_id int
group_id int
Table: Matches
Column Name Type
match_id int
first_player int
second_player int
first_score int
second_score int
The winner in each group is the player who scored the maximum total points within the group. In the
case of a tie, the lowest player_id wins.
Write an SQL query to find the winner in each group.
Return the result table in any order.
The query result format is in the following example.
Input:
Players table:
player_id group_id
15 1
25 1
30 1
45 1
10 2
35 2
50 2
20 3
40 3
Matches table:
match_id first_player second_player first_score second_score
1 15 45 3 0
2 30 25 1 2
3 30 15 2 0
4 40 20 5 2
5 35 50 1 1
Output:
group_id player_id
1 15
2 35
3 40
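A minimal sketch of one possible solution, using Python's built-in sqlite3 module and the example rows. The CTE names points and totals are made up for this sketch. Each match contributes one score row per participant, totals are summed per player, and a player wins the group when no other player in the group has a higher total or an equal total with a lower player_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Players (player_id INTEGER, group_id INTEGER);
CREATE TABLE Matches (match_id INTEGER, first_player INTEGER, second_player INTEGER,
                      first_score INTEGER, second_score INTEGER);
""")
conn.executemany("INSERT INTO Players VALUES (?, ?)", [
    (15, 1), (25, 1), (30, 1), (45, 1), (10, 2), (35, 2), (50, 2), (20, 3), (40, 3),
])
conn.executemany("INSERT INTO Matches VALUES (?, ?, ?, ?, ?)", [
    (1, 15, 45, 3, 0), (2, 30, 25, 1, 2), (3, 30, 15, 2, 0),
    (4, 40, 20, 5, 2), (5, 35, 50, 1, 1),
])

query = """
WITH points AS (
    SELECT first_player AS player_id, first_score AS score FROM Matches
    UNION ALL
    SELECT second_player, second_score FROM Matches
),
totals AS (
    -- players without any match get a total of 0
    SELECT p.group_id, p.player_id, COALESCE(SUM(pt.score), 0) AS total
    FROM Players AS p
    LEFT JOIN points AS pt ON pt.player_id = p.player_id
    GROUP BY p.group_id, p.player_id
)
SELECT t.group_id, t.player_id
FROM totals AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM totals AS o
    WHERE o.group_id = t.group_id
      AND (o.total > t.total
           OR (o.total = t.total AND o.player_id < t.player_id))
)
ORDER BY t.group_id
"""
winners = conn.execute(query).fetchall()
print(winners)
```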
Q94.
Table: Student
Column Name Type
student_id int
student_name varchar
Table: Exam
Column Name Type
exam_id int
student_id int
score int
A quiet student is one who took at least one exam and scored neither the highest nor the lowest score
in any exam they took.
Write an SQL query to report the students (student_id, student_name) who are quiet in all exams. Do not
return students who have never taken any exam.
Return the result table ordered by student_id.
The query result format is in the following example.
Input:
Student table:
student_id student_name
1 Daniel
2 Jade
3 Stella
4 Jonathan
5 Will
Exam table:
exam_id student_id score
10 1 70
10 2 80
10 3 90
20 1 80
30 1 70
30 3 80
30 4 90
40 1 60
40 2 70
40 4 80
Output:
student_id student_name
2 Jade
Explanation:
For exam 10: Students 1 and 3 hold the lowest and highest scores respectively.
For exam 20: Student 1 holds both the highest and lowest score.
For exams 30 and 40: Students 1 and 4 hold the lowest and highest scores respectively.
Students 2 and 5 have never scored the highest or lowest in any of the exams.
Since student 5 has not taken any exam, he is excluded from the result.
So, we only return the information of Student 2.
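A minimal sketch of one possible solution, using Python's built-in sqlite3 module and the example rows. The CTE names extremes and loud are made up for this sketch. It first finds the highest and lowest score per exam, then collects every student who matched either one in any exam, and finally keeps the students who took an exam but are not in that set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (student_id INTEGER, student_name TEXT);
CREATE TABLE Exam (exam_id INTEGER, student_id INTEGER, score INTEGER);
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [
    (1, "Daniel"), (2, "Jade"), (3, "Stella"), (4, "Jonathan"), (5, "Will"),
])
conn.executemany("INSERT INTO Exam VALUES (?, ?, ?)", [
    (10, 1, 70), (10, 2, 80), (10, 3, 90), (20, 1, 80), (30, 1, 70),
    (30, 3, 80), (30, 4, 90), (40, 1, 60), (40, 2, 70), (40, 4, 80),
])

query = """
WITH extremes AS (
    SELECT exam_id, MAX(score) AS hi, MIN(score) AS lo
    FROM Exam
    GROUP BY exam_id
),
loud AS (
    -- students who hit the highest or lowest score in at least one exam
    SELECT DISTINCT e.student_id
    FROM Exam AS e
    JOIN extremes AS x ON x.exam_id = e.exam_id
    WHERE e.score = x.hi OR e.score = x.lo
)
SELECT s.student_id, s.student_name
FROM Student AS s
WHERE s.student_id IN (SELECT student_id FROM Exam)
  AND s.student_id NOT IN (SELECT student_id FROM loud)
ORDER BY s.student_id
"""
quiet_students = conn.execute(query).fetchall()
print(quiet_students)
```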
Q95.
Table: Student
Column Name Type
student_id int
student_name varchar
Table: Exam
Column Name Type
exam_id int
student_id int
score int
A quiet student is one who took at least one exam and scored neither the highest nor the lowest score
in any exam they took.
Write an SQL query to report the students (student_id, student_name) who are quiet in all exams. Do not
return students who have never taken any exam.
Return the result table ordered by student_id.
The query result format is in the following example.
Input:
Student table:
student_id student_name
1 Daniel
2 Jade
3 Stella
4 Jonathan
5 Will
Exam table:
exam_id student_id score
10 1 70
10 2 80
10 3 90
20 1 80
30 1 70
30 3 80
30 4 90
40 1 60
40 2 70
40 4 80
Output:
student_id student_name
2 Jade
Explanation:
For exam 10: Students 1 and 3 hold the lowest and highest scores respectively.
For exam 20: Student 1 holds both the highest and lowest score.
For exams 30 and 40: Students 1 and 4 hold the lowest and highest scores respectively.
Students 2 and 5 have never scored the highest or lowest in any of the exams.
Since student 5 has not taken any exam, he is excluded from the result.
So, we only return the information of Student 2.
Q96.
You're given two tables on Spotify users' streaming data. songs_history table contains the historical
streaming data and songs_weekly table contains the current week's streaming data.
Write a query to output the user id, song id, and cumulative count of song plays as of 4 August 2022,
sorted by the cumulative count in descending order.
Definitions:
● songs_weekly table currently holds data from 1 August 2022 to 7 August 2022.
● songs_history table currently holds data up to 31 July 2022. The output should include the
historical data in this table.
Assumption:
● There may be a new user or song in the songs_weekly table not present in the songs_history
table.
songs_history Table:
history_id integer
user_id integer
song_id integer
song_plays integer
song_plays: Refers to the historical count of streaming or song plays by the user.
songs_weekly Table:
user_id integer
song_id integer
listen_time datetime
Example Output:
777 1238 12
695 4520 2
125 9630 1
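A minimal sketch of one possible solution, using Python's built-in sqlite3 module. The input rows below are made up for illustration, chosen so that the result matches the example output. The alias total_plays and the tie-breaking sort on user_id and song_id are also assumptions. Weekly plays are counted up to and including 4 August 2022, then added to the historical counts, so users or songs that appear only in songs_weekly are still included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE songs_history (history_id INTEGER, user_id INTEGER,
                            song_id INTEGER, song_plays INTEGER);
CREATE TABLE songs_weekly (user_id INTEGER, song_id INTEGER, listen_time TEXT);
""")
# Hypothetical rows, not taken from the assignment.
conn.executemany("INSERT INTO songs_history VALUES (?, ?, ?, ?)", [
    (10011, 777, 1238, 11),
    (12452, 695, 4520, 1),
])
conn.executemany("INSERT INTO songs_weekly VALUES (?, ?, ?)", [
    (777, 1238, "2022-08-01 12:00:00"),
    (695, 4520, "2022-08-04 08:00:00"),
    (125, 9630, "2022-08-04 16:00:00"),
    (695, 9852, "2022-08-07 12:00:00"),  # after the cutoff, so not counted
])

query = """
SELECT user_id, song_id, SUM(plays) AS total_plays
FROM (
    SELECT user_id, song_id, song_plays AS plays
    FROM songs_history
    UNION ALL
    -- one play per weekly row, up to and including 4 August 2022
    SELECT user_id, song_id, COUNT(*) AS plays
    FROM songs_weekly
    WHERE listen_time < '2022-08-05'
    GROUP BY user_id, song_id
) AS combined
GROUP BY user_id, song_id
ORDER BY total_plays DESC, user_id, song_id
"""
cumulative_plays = conn.execute(query).fetchall()
for row in cumulative_plays:
    print(row)
```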
Q97.
New TikTok users sign up with their emails, so each signup requires a text confirmation to activate the
new user's account.
Write a query to find the confirmation rate of users who confirmed their signups with text messages.
Round the result to 2 decimal places.
Assumptions:
● A user may fail to confirm several times with text. Once the signup is confirmed for a user,
they will not be able to initiate the signup again.
● A user may not initiate the signup confirmation process at all.
emails Table:
email_id integer
user_id integer
signup_date datetime
texts Table:
text_id integer
email_id integer
signup_action varchar
Example Output:
confirm_rate
0.67
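A minimal sketch of one possible solution, using Python's built-in sqlite3 module. The input rows are made up for illustration, and the signup_action values 'Confirmed' and 'Not Confirmed' are an assumption, since the text does not list the allowed values. The rate is taken as the number of signups with at least one confirming text divided by the number of all signups, so users who never started the confirmation still count in the denominator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emails (email_id INTEGER, user_id INTEGER, signup_date TEXT);
CREATE TABLE texts (text_id INTEGER, email_id INTEGER, signup_action TEXT);
""")
# Hypothetical rows, not taken from the assignment.
conn.executemany("INSERT INTO emails VALUES (?, ?, ?)", [
    (125, 7771, "2022-06-14 00:00:00"),
    (236, 6950, "2022-07-01 00:00:00"),
    (433, 1052, "2022-07-09 00:00:00"),  # never started the text confirmation
])
conn.executemany("INSERT INTO texts VALUES (?, ?, ?)", [
    (6878, 125, "Confirmed"),
    (6920, 236, "Not Confirmed"),  # first attempt failed
    (6994, 236, "Confirmed"),
])

# Assumption: a successful confirmation is stored as signup_action = 'Confirmed'.
query = """
SELECT ROUND(1.0 * COUNT(DISTINCT t.email_id) / COUNT(DISTINCT e.email_id), 2)
           AS confirm_rate
FROM emails AS e
LEFT JOIN texts AS t
  ON t.email_id = e.email_id
 AND t.signup_action = 'Confirmed'
"""
confirm_rate = conn.execute(query).fetchone()[0]
print(confirm_rate)
```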
Q98.
The table below contains information about tweets over a given period of time. Calculate the 3-day
rolling average of tweets published by each user for each date that a tweet was posted. Output the
user id, tweet date, and rolling averages rounded to 2 decimal places.
Hint: Use COUNT and GROUP BY.
Important Assumptions:
tweets Table:
tweet_id integer
user_id integer
tweet_date timestamp
Example Output:
Q99.
Assume you are given the tables below containing information on Snapchat users, their ages, and
their time spent sending and opening snaps. Write a query to obtain a breakdown of the time spent
sending vs. opening snaps (as a percentage of total time spent on these activities) for each age
group.
Notes:
activities Table:
activity_id integer
user_id integer
time_spent float
activity_date datetime
age_breakdown Table:
user_id integer
123 31-35
456 26-30
789 21-25
Example Output:
Q100.
The LinkedIn Creator team is looking for power creators who use their personal profile as a company
or influencer page. This means that if someone's LinkedIn page has more followers than all the
companies they work for, we can safely assume that person is a Power Creator. Keep in mind that if a
person works at multiple companies, we should take into account the company with the most
followers.
Level - Medium
Hint: Use JOIN and GROUP BY.
Write a query to return the IDs of these LinkedIn power creators in ascending order.
Assumptions:
personal_profiles Table:
profile_id integer
name string
followers integer
employee_company Table:
personal_profile_id integer
company_id integer
personal_profile_id company_id
1 4
1 9
2 2
3 1
4 3
5 6
6 5
company_pages Table:
company_id integer
name string
followers integer
2 Airbnb 700,000
4 DataLemur 200
5 YouTube 16,000,000
6 DataScience.Vin 4,500
Example Output:
profile_id
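A minimal sketch of one possible solution, using Python's built-in sqlite3 module. The employee_company rows and three of the company_pages rows come from the example above. The personal_profiles rows and the company 9 row are made up for illustration, because the example does not show them. Grouping by profile and comparing against MAX(followers) handles people who work at several companies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE personal_profiles (profile_id INTEGER, name TEXT, followers INTEGER);
CREATE TABLE employee_company (personal_profile_id INTEGER, company_id INTEGER);
CREATE TABLE company_pages (company_id INTEGER, name TEXT, followers INTEGER);
""")
# Hypothetical profiles, not taken from the assignment.
conn.executemany("INSERT INTO personal_profiles VALUES (?, ?, ?)", [
    (1, "Profile A", 92000),
    (2, "Profile B", 134),
    (5, "Profile C", 5000),
])
conn.executemany("INSERT INTO employee_company VALUES (?, ?)", [
    (1, 4), (1, 9), (2, 2), (3, 1), (4, 3), (5, 6), (6, 5),
])
conn.executemany("INSERT INTO company_pages VALUES (?, ?, ?)", [
    (2, "Airbnb", 700000),
    (4, "DataLemur", 200),
    (6, "DataScience.Vin", 4500),
    (9, "Hypothetical Co", 100000),  # made-up row for illustration
])

query = """
SELECT p.profile_id
FROM personal_profiles AS p
JOIN employee_company AS ec ON ec.personal_profile_id = p.profile_id
JOIN company_pages AS c ON c.company_id = ec.company_id
GROUP BY p.profile_id, p.followers
-- compare against the largest company the person works for
HAVING p.followers > MAX(c.followers)
ORDER BY p.profile_id
"""
power_creators = [row[0] for row in conn.execute(query)]
print(power_creators)
```

In this made-up data, profile 1 has more followers than DataLemur but fewer than the hypothetical company 9, so it is not returned. Only profile 5 exceeds every company it is linked to.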