Extra Credit Quiz Sample Questions - Week 5

The document provides SQL and Python coding challenges related to data analysis, including calculating total drug sales by manufacturer, retrieving specific user transaction data, and computing rolling averages of tweets. It includes sample data tables and SQL queries for each problem, along with Python functions for calculating inequity and maximum product of numbers. The solutions are formatted to be easily understandable and applicable for data science interviews.

Uploaded by

sunjay199518

From Huo, Kevin, and Singh, Nick. Ace the Data Science Interview (DataLemur.com). 2022.

Question 1 – SQL

CVS Health wants to gain a clearer understanding of its pharmacy sales and the
performance of various products.
Write a query to calculate the total drug sales for each manufacturer. Round the
answer to the nearest million and report your results in descending order of total
sales. In case of any duplicates, sort them alphabetically by the manufacturer
name.

Since this data will be displayed on a dashboard viewed by business stakeholders,
please format your results as follows: "$36 million".

pharmacy_sales_Table:
------------------------------
Column Name Type
------------------------------
product_id integer
units_sold integer
total_sales decimal
cogs decimal
manufacturer varchar
drug varchar

pharmacy_sales Table Data:


-----------------------------------------------------------------------------------------------
product_id  units_sold  total_sales   cogs         manufacturer       drug
-----------------------------------------------------------------------------------------------
156         89514       3130097.00    3427421.73   Biogen             Acyclovir
25          222331      2753546.00    2974975.36   AbbVie             Lamivudine and Zidovudine
50          90484       2521023.73    2742445.90   Eli Lilly          Dermasorb TA
41          189925      3499574.92    3692136.66   AbbVie             Clarithromycin
63          93513       2104765.00    2462370.76   Johnson & Johnson  Pepcid AC Acid Reducer
8           177270      2930134.52    3035522.06   Johnson & Johnson  Nicorobin Clean and Clear
189         99858       84759462.01   3243809.46   AbbVie             Humira

SOLUTION
SELECT manufacturer,
       '$' || ROUND(SUM(total_sales) / 1000000, 0) || ' million' AS total_sales_manu
FROM pharmacy_sales
GROUP BY manufacturer
ORDER BY SUM(total_sales) DESC, manufacturer;

Output:
-----------------------------------------
manufacturer total_sales_manu
-----------------------------------------
AbbVie $114 million
Eli Lilly $77 million
Biogen $70 million
Johnson & Johnson $43 million
Bayer $34 million
AstraZeneca $32 million
Pfizer $28 million
Novartis $26 million
Sanofi $25 million
Merck $25 million
Roche $16 million
GlaxoSmithKline $4 million
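
The grouping, rounding, and tie-breaking steps can also be sketched in plain Python. This is an illustration only: it uses just the seven sample rows listed in the table data, so its totals differ from the Output table, which appears to reflect the full dataset. The function name `sales_by_manufacturer` is hypothetical, and sorting on the rounded value (so that equal "$N million" labels fall back to alphabetical order) is one reading of the "duplicates" rule, not necessarily the one the query uses.

```python
from collections import defaultdict

# (manufacturer, total_sales) pairs copied from the sample rows in the question.
SAMPLE_ROWS = [
    ("Biogen", 3130097.00),
    ("AbbVie", 2753546.00),
    ("Eli Lilly", 2521023.73),
    ("AbbVie", 3499574.92),
    ("Johnson & Johnson", 2104765.00),
    ("Johnson & Johnson", 2930134.52),
    ("AbbVie", 84759462.01),
]


def sales_by_manufacturer(rows):
    """Return (manufacturer, '$N million') pairs, largest total first."""
    totals = defaultdict(float)
    for manufacturer, total_sales in rows:
        totals[manufacturer] += total_sales

    # Round to whole millions first, so equal rounded values count as ties.
    millions = {name: round(total / 1_000_000) for name, total in totals.items()}

    # Descending by rounded millions, then alphabetically by manufacturer.
    ordered = sorted(millions.items(), key=lambda item: (-item[1], item[0]))
    return [(name, f"${value} million") for name, value in ordered]


report = sales_by_manufacturer(SAMPLE_ROWS)
for name, label in report:
    print(name, label)
```

On the sample rows, Biogen and Eli Lilly both round to $3 million, so the alphabetical tie-break places Biogen first.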

Question 2 – SQL
Assume you are given the table below on Uber transactions made by users. Write a
query to obtain the third transaction of every user. Output the user id, spend and
transaction date.

transactions_Table:
----------------------------------
Column Name Type
----------------------------------
user_id integer
spend decimal
transaction_date timestamp

transactions Table Data:


----------------------------------------------
user_id spend transaction_date
----------------------------------------------
111 100.50 01/08/2022 12:00:00
111 55.00 01/10/2022 12:00:00
121 36.00 01/18/2022 12:00:00
145 24.99 01/26/2022 12:00:00
111 89.60 02/05/2022 12:00:00
145 45.30 02/28/2022 12:00:00
121 22.20 04/01/2022 12:00:00
121 67.90 04/03/2022 12:00:00
263 156.00 04/11/2022 12:00:00
230 78.30 06/14/2022 12:00:00
263 68.12 07/11/2022 12:00:00
263 100.00 07/12/2022 12:00:00

SOLUTION
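WITH transaction_order_table AS (
    SELECT user_id, spend, transaction_date,
           -- ROW_NUMBER() gives every row a distinct position, so each user has
           -- exactly one third row even when two timestamps tie (RANK() would not).
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY transaction_date) AS transaction_order
    FROM transactions
)
SELECT user_id, spend, transaction_date
FROM transaction_order_table
WHERE transaction_order = 3;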

Output:
----------------------------------------------
user_id spend transaction_date
----------------------------------------------
111 89.60 02/05/2022 12:00:00
121 67.90 04/03/2022 12:00:00
263 100.00 07/12/2022 12:00:00
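
For readers who prefer to check the result outside a database, the same "third transaction per user" logic can be sketched in Python. The sketch assumes the timestamps are in MM/DD/YYYY format, as the sample data suggests; the function name `third_transactions` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# (user_id, spend, transaction_date) rows copied from the sample data.
TRANSACTIONS = [
    (111, 100.50, "01/08/2022 12:00:00"),
    (111, 55.00, "01/10/2022 12:00:00"),
    (121, 36.00, "01/18/2022 12:00:00"),
    (145, 24.99, "01/26/2022 12:00:00"),
    (111, 89.60, "02/05/2022 12:00:00"),
    (145, 45.30, "02/28/2022 12:00:00"),
    (121, 22.20, "04/01/2022 12:00:00"),
    (121, 67.90, "04/03/2022 12:00:00"),
    (263, 156.00, "04/11/2022 12:00:00"),
    (230, 78.30, "06/14/2022 12:00:00"),
    (263, 68.12, "07/11/2022 12:00:00"),
    (263, 100.00, "07/12/2022 12:00:00"),
]


def third_transactions(rows):
    """Return each user's third transaction in chronological order."""
    by_user = defaultdict(list)
    for user_id, spend, stamp in rows:
        # Assumed format: month/day/year followed by a 24-hour time.
        parsed = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        by_user[user_id].append((parsed, spend, stamp))

    result = []
    for user_id in sorted(by_user):
        history = sorted(by_user[user_id])  # earliest transaction first
        if len(history) >= 3:
            _, spend, stamp = history[2]  # index 2 is the third transaction
            result.append((user_id, spend, stamp))
    return result


thirds = third_transactions(TRANSACTIONS)
for row in thirds:
    print(row)
```

Users 145 and 230 have fewer than three transactions, so they do not appear in the result.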

Question 3 – SQL
This is the same question as problem #10 in the SQL Chapter of Ace the Data Science
Interview!
Given a table of tweet data over a specified time period, calculate the 3-day
rolling average of tweets for each user. Output the user ID, tweet date, and
rolling averages rounded to 2 decimal places.

tweets_Table:
----------------------------------
Column Name Type
----------------------------------
user_id integer
tweet_date timestamp
tweet_count integer

tweets Table Data:
----------------------------------------------
user_id tweet_date tweet_count
----------------------------------------------
111 06/01/2022 00:00:00 2
111 06/02/2022 00:00:00 1
111 06/03/2022 00:00:00 3
111 06/04/2022 00:00:00 4
111 06/05/2022 00:00:00 5
111 06/06/2022 00:00:00 4
111 06/07/2022 00:00:00 6
199 06/01/2022 00:00:00 7
199 06/02/2022 00:00:00 5
199 06/03/2022 00:00:00 9
199 06/04/2022 00:00:00 1
199 06/05/2022 00:00:00 8
199 06/06/2022 00:00:00 2
199 06/07/2022 00:00:00 2
254 06/01/2022 00:00:00 1
254 06/02/2022 00:00:00 1
254 06/03/2022 00:00:00 2
254 06/04/2022 00:00:00 1
254 06/05/2022 00:00:00 3
254 06/06/2022 00:00:00 1
254 06/07/2022 00:00:00 3

SOLUTION
SELECT user_id, tweet_date,
       ROUND(AVG(tweet_count) OVER (
           PARTITION BY user_id
           ORDER BY tweet_date
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ), 2) AS rolling_avg_3d
FROM tweets;

Output:
---------------------------------------------------------
user_id tweet_date rolling_avg_3d
---------------------------------------------------------
111 06/01/2022 00:00:00 2.00
111 06/02/2022 00:00:00 1.50
111 06/03/2022 00:00:00 2.00
111 06/04/2022 00:00:00 2.67
111 06/05/2022 00:00:00 4.00
111 06/06/2022 00:00:00 4.33
111 06/07/2022 00:00:00 5.00
199 06/01/2022 00:00:00 7.00
199 06/02/2022 00:00:00 6.00
199 06/03/2022 00:00:00 7.00
199 06/04/2022 00:00:00 5.00
199 06/05/2022 00:00:00 6.00
199 06/06/2022 00:00:00 3.67
199 06/07/2022 00:00:00 4.00
254 06/01/2022 00:00:00 1.00
254 06/02/2022 00:00:00 1.00
254 06/03/2022 00:00:00 1.33
254 06/04/2022 00:00:00 1.33
254 06/05/2022 00:00:00 2.00
254 06/06/2022 00:00:00 1.67
254 06/07/2022 00:00:00 2.33
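
The window frame "2 PRECEDING AND CURRENT ROW" means each average covers the current day plus up to two earlier days, so the first two rows for a user average over fewer than three values. A minimal Python sketch of that behavior, assuming one row per user per day already sorted by date (the function name `rolling_average` is hypothetical):

```python
def rolling_average(counts, window=3):
    """Average each value with up to window - 1 values before it."""
    averages = []
    for i in range(len(counts)):
        start = max(0, i - window + 1)  # the frame shrinks at the start
        chunk = counts[start:i + 1]
        averages.append(round(sum(chunk) / len(chunk), 2))
    return averages


# Daily tweet counts for users 111 and 199, copied from the sample data.
user_111 = rolling_average([2, 1, 3, 4, 5, 4, 6])
user_199 = rolling_average([7, 5, 9, 1, 8, 2, 2])
print(user_111)
print(user_199)
```

The values printed for both users match the rolling_avg_3d column in the Output table.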

Question 4 – Python
Given a list of salaries, we'll define a metric called inequity, which is the
difference between the max and min salary seen in the list:
inequity = max(input_list) - min(input_list).
Write a function called min_inequity which takes in a list of salaries and a value
n, and returns the minimum inequity possible when taking n salaries from the full
salary list.

SOLUTION
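def min_inequity(salaries, n):
    # After sorting, the n salaries with the smallest spread are always adjacent.
    ordered = sorted(salaries)
    # Slide a window of n salaries and keep the smallest max - min difference.
    return min(ordered[i + n - 1] - ordered[i] for i in range(len(ordered) - n + 1))

a = [60000, 80000, 120000, 70000]
n = 4
print(min_inequity(a, n))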

Output
60000
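
With n equal to the length of the list there is only one possible selection, so smaller values of n are more informative. As a hedged cross-check, the definition can be applied directly by trying every group of n salaries. This brute-force sketch (the name `min_inequity_brute_force` is hypothetical) is only practical for short lists, since the number of groups grows combinatorially.

```python
from itertools import combinations


def min_inequity_brute_force(salaries, n):
    """Try every group of n salaries and return the smallest max - min."""
    return min(max(group) - min(group) for group in combinations(salaries, n))


salaries = [60000, 80000, 120000, 70000]
results = {n: min_inequity_brute_force(salaries, n) for n in (2, 3, 4)}
for n, value in results.items():
    print(n, value)
```

For n = 2 the best pair is 10000 apart (for example 60000 and 70000), and for n = 3 the best group is 60000, 70000, 80000 with an inequity of 20000.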

Question 5 – Python
Given a list of integers, return the maximum product of any three numbers in the
array.
For example, for A = [1, 2, 3, 4, 5], you should return 60, since 3 * 4 * 5 = 60.

SOLUTION
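def max_three(nums):
    # Sort a copy so the caller's list is left unchanged.
    s = sorted(nums)
    # Either the three largest values, or the two smallest (possibly both
    # negative) times the largest, gives the maximum product.
    return max(s[-1] * s[-2] * s[-3], s[0] * s[1] * s[-1])

a = [1, 2, 3, 4, 5]
print(max_three(a))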

Output
60
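
Because the input is a list of integers, negative values are possible, and two negatives multiplied by a large positive can beat the three largest values. The sketch below is one way to handle that case without a full sort, using `heapq.nlargest` and `heapq.nsmallest` from the standard library; the function names are hypothetical, and the brute-force helper is included only as a cross-check.

```python
import heapq
from itertools import combinations


def max_product_of_three(nums):
    """Return the largest product of any three numbers in nums."""
    if len(nums) < 3:
        raise ValueError("need at least three numbers")
    largest = heapq.nlargest(3, nums)    # three largest, descending
    smallest = heapq.nsmallest(2, nums)  # two smallest, ascending
    return max(
        largest[0] * largest[1] * largest[2],     # three largest values
        smallest[0] * smallest[1] * largest[0],   # two smallest times largest
    )


def brute_force(nums):
    """Check every triple; only suitable for short lists."""
    return max(x * y * z for x, y, z in combinations(nums, 3))


examples = [[1, 2, 3, 4, 5], [-10, -10, 1, 3, 2], [-5, -4, -3, -2, -1]]
answers = [max_product_of_three(nums) for nums in examples]
print(answers)
```

For [-10, -10, 1, 3, 2] the best product is (-10) * (-10) * 3 = 300, which an approach that only takes the three largest values would miss.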
