Data Analytics Case Study Guide (Updated For 2024)
Data Analytics Case Study Guide (Updated For 2024)
2024)
Written by IQ Team
Table of contents
One reason they’re so challenging is that case studies don’t typically have a right or wrong answer.
Instead, case study interviews require you to come up with a hypothesis for an analytics question and then
produce data to support or validate your hypothesis. In other words, it’s not just about your technical skills;
you’re also being tested on creative problem-solving and your ability to communicate with stakeholders.
This article provides an overview of how to answer data analytics case study interview questions. You can
nd an in-depth course in the data analytics learning path.
In short, to ace a data analytics case interview, you not only need to brush up on case questions, but you also
should be adept at writing all types of SQL queries and have strong data sense.
These questions are especially challenging to answer if you don’t have a framework or know how to answer
them. To help you prepare, we created this step-by-step guide to answering data analytics case questions.
We show you how to use a framework to answer case questions, provide example analytics questions, and
help you understand the di erence between analytics case studies and product metrics case studies.
Ultimately, the type of case question you are asked will depend on the role. For example, product analysts will
likely face more product-oriented questions.
Product metrics cases tend to focus on a hypothetical situation. You might be asked to:
yp g
Investigate Metrics - One of the most common types will ask you to investigate a metric, usually one
that’s going up or down. For example, “Why are Facebook friend requests falling by 10 percent?”
Measure Product/Feature Success - A lot of analytics cases revolve around the measurement of product
success and feature changes. For example, “We want to add X feature to product Y. What metrics would
you track to make sure that’s a good idea?”
With product data cases, the key di erence is that you may or may not be required to write the SQL query to
nd the metric.
Instead, these interviews are more theoretical and are designed to assess your product sense and ability to
think about analytics problems from a product perspective. Product metrics questions may also show up in
the data analyst interview, but likely only for product data analyst roles.
Question:
You’re given a table that represents search results from searches on Facebook. The query column is the
search term, the position column represents each position the search result came in, and the rating column
represents the human rating from 1 to 5, where 5 is high relevance, and 1 is low relevance.
Each row in the search_events table represents a single search, with the has_clicked column representing
if a user clicked on a result or not. We have a hypothesis that the CTR is dependent on the search result
rating.
search_results table:
Column Type
query VARCHAR
result_id INTEGER
position INTEGER
rating INTEGER
search_events table
Column Type
search_id INTEGER
query VARCHAR
has_clicked BOOLEAN
Hint: Start by making assumptions and thinking out loud. With this question, focus on coming up with a
metric to support the hypothesis. If the question is unclear or if you think you need more information, be
sure to ask.
Answer. The hypothesis is that CTR is dependent on search result rating. Therefore, we want to focus on the
CTR metric, and we can assume:
If CTR is high when search result ratings are high, and CTR is low when the search result ratings are low,
then the hypothesis is correct.
If CTR is low when the search ratings are high, or there is no proven correlation between the two, then our
hypothesis is not proven.
Hint: Walk the interviewer through your reasoning. Talking about the decisions you make and why you’re
making them shows o your problem-solving approach.
Answer. One way we can investigate the hypothesis is to look at the results split into di erent search rating
buckets. For example, if we measure the CTR for results rated at 1, then those rated at 2, and so on, we can
identify if an increase in rating is correlated with an increase in CTR.
First, I’d write a query to get the number of results for each query in each bucket. We want to look at the
distribution of results that are less than a rating threshold, which will help us see the relationship between
search rating and CTR.
WITH ratings AS (
SELECT query
, SUM(CASE WHEN
rating <= 1 THEN 1 ELSE 0
END) AS num_results_rating_one
, SUM(CASE WHEN
rating <= 2 THEN 1 ELSE 0
END) AS num_results_rating_two
, SUM(CASE WHEN
rating <= 3 THEN 1 ELSE 0
END) AS num_results_rating_three
, COUNT(*) AS total_results
FROM search_results
GROUP BY 1
)
This CTE aggregates the number of results that are less than a certain rating threshold. Later, we can use
this to see the percentage that are in each bucket. If we re-join to the search_events table, we can calculate
the CTR by then grouping by each bucket.
WITH ratings AS (
SELECT query
, SUM(CASE WHEN
rating <= 1 THEN 1 ELSE 0
END) AS num_results_rating_one
, SUM(CASE WHEN
rating <= 2 THEN 1 ELSE 0
END) AS num_results_rating_two
, SUM(CASE WHEN
rating <= 3 THEN 1 ELSE 0
END) AS num_results_rating_three
, COUNT(*) AS total_results
FROM search_results
GROUP BY 1
)
SELECT
CASE
WHEN total_results - num_results_rating_one = 0
THEN 'results_one'
WHEN total_results - num_results_rating_two = 0
THEN 'results_two'
WHEN total_results - num_results_rating_three = 0
THEN 'results_three'
END AS ratings_bucket
, SUM(has_clicked)/COUNT(*) AS ctr
FROM search_events AS se
LEFT JOIN ratings AS r
ON se.query = r.query
GROUP BY 1
Hint: Be prepared to justify your solution. Interviewers will follow up with questions about your reasoning,
and ask why you make certain assumptions.
Answer. By using the CASE WHEN statement, I calculated each ratings bucket by checking to see if all the
search results were less than 1, 2, or 3 by subtracting the total from the number within the bucket and seeing
if it equates to 0.
I did that to get away from averages in our bucketing system. Outliers would make it more di cult to
measure the e ect of bad ratings. For example, if a query had a 1 rating and another had a 5 rating, that
would equate to an average of 3. Whereas in my solution, a query with all of the results under 1, 2, or 3 lets us
know that it actually has bad ratings.
Question:
Let’s say you work for a social media company that has just done a launch in a new city. Looking at weekly
metrics, you see a slow decrease in the average number of comments per user from January to March in this
city.
The company has been consistently growing new users in the city from January to March.
What are some reasons why the average number of comments per user would be decreasing and what
metrics would you look into?
Hint: This question is very vague. It’s all hypothetical, so we don’t know very much about users, what the
product is, and how people might be interacting. Be sure you ask questions upfront about the product.
Answer: Before I jump into an answer, I’d like to ask a few questions:
Who uses this social network? How do they interact with each other?
Has there been any performance issues that might be causing the problem?
What are the goals of this particular launch?
Has there been any changes to the comment features in recent weeks?
For the sake of this example, let’s say we learn that it’s a social network similar to Facebook with a young
audience, and the goals of the launch are to grow the user base. Also, there have been no performance
issues and the commenting feature hasn’t been changed since launch.
Hint: Look for clues in the question. For example, this case gives you a metric, “average number of comments
per user.” Consider if the clue might be helpful in your solution. But be careful, sometimes questions are
designed to throw you o track.
Answer: From the question, we can hypothesize a little bit. For example, we know that user count is
increasing linearly. That means two things:
We can also model out the data to help us get a better picture of the average number of comments per user
metric:
One thing to note: Although this is an interesting metric, I’m not sure if it will help us solve this question. For
one, average comments per user doesn’t account for churn. We might assume that during the three-month
period users are churning o the platform. Let’s say the churn rate is 25% in January, 20% in February and
15% in March.
Hint: Don’t worry too much about making a correct hypothesis. Instead, interviewers want to get a sense of
your product initiation and that you’re on the right track. Also, be prepared to measure your hypothesis.
Answer. I would say that average comments per user isn’t a great metric to use, because it doesn’t reveal
insights into what’s really causing this issue.
That’s because it doesn’t account for active users, which are the users who are actually commenting. A
better metric to investigate would be retained users and monthly active users.
What I suspect is causing the issue is that active users are commenting frequently and are responsible for
the increase in comments month-to-month. New users, on the other hand, aren’t as engaged and aren’t
commenting as often.
Hint: Within your solution, include key metrics that you’d like to investigate that will help you measure
success.
Answer: I’d say there are a few ways we could investigate the cause of this problem, but the one I’d be most
interested in would be the engagement of monthly active users.
If the growth in comments is coming from active users, that would help us understand how we’re doing at
retaining users. Plus, it will also show if new users are less engaged and commenting less frequently.
One way that we could dig into this would be to segment users by their onboarding date, which would help us
to visualize engagement and see how engaged some of our longest-retained users are.
If engagement of new users is the issue, that will give us some options in terms of strategies for addressing
the problem. For example, we could test new onboarding or commenting features designed to generate
engagement.
Answer. If the cause wasn’t due to a lack of engagement among new users, then I’d want to investigate
active users. One potential cause would be active users commenting less. In that case, we’d know that our
earliest users were churning out, and that engagement among new users was potentially growing.
Again, I think we’d want to focus on user engagement since the onboarding date. That would help us
understand if we were seeing higher levels of churn among active users, and we could start to identify some
solutions there.
Once you have the framework down, what’s the best way to practice? Mock interviews with our coaches are
very e ective, as you’ll get feedback and helpful tips as you answer. You can also learn a lot by practicing
P2P mock interviews with other Interview Query students. No data analytics background? Check out how to
become a data analyst without a degree.
Finally, if you’re looking for sample data analytics case questions and other types of interview questions, see
our guide on the top data analyst interview questions.
Related articles
How he transitioned from Financial Analyst to Data Scientist at Credit Expert: An IQ Success Story
Go Premium