9 Spotify SQL Interview Questions
9 Spotify SQL Interview Questions
playlist creation and managing databases to optimize server performance during peak streaming
hours. Unsurprisingly this is why Spotify asks SQL query questions during interviews for Data
In case you're preparing for a SQL Assessment, here’s 9 Spotify SQL interview questions to
practice, which are similar to recently asked questions at Spotify – can you solve them?
month. They are potential customers to involve in user feedback sessions. Given the database
tables users and user_listen_history , write a SQL query to identify the top 5 users who have
listened to the most unique tracks in the last 30 days. Assume today's date is 2023-03-22 .
Answer:
FROM users u
LIMIT 5;
Here, we're joining the users table and user_listen_history table on user_id . The WHERE
clause is used to specify the date range for the last 30 days. We then group the results
by user_id and username to calculate the total number of unique tracks each user has listened to
within the specified time range. This total number is counted using COUNT(DISTINCT
reveal the top 5 users who have listened to the most unique tracks in the last 30 days.
named artist_listens containing daily listening counts for different artists. The table has three
You are required to write a SQL query to calculate the 7-day rolling average of daily listens for
each artist. The rolling average should be calculated for each day for each artist based on the
1 2022-06-01 15000
1 2022-06-02 21000
1 2022-06-03 17000
2 2022-06-01 25000
2 2022-06-02 27000
2 2022-06-03 29000
Example Output:
1 2022-06-01 15000.00
artist_id listen_date rolling_avg_listens
1 2022-06-02 18000.00
1 2022-06-03 17666.67
2 2022-06-01 25000.00
2 2022-06-02 26000.00
2 2022-06-03 27000.00
Answer:
SELECT
artist_id,
listen_date,
AVG(daily_listens) OVER (
PARTITION BY artist_id
ORDER BY listen_date
RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
) AS rolling_avg_listens
FROM artist_listens
ORDER BY artist_id, listen_date;
Explanation:
This SQL query computes the rolling average by using a window function AVG with a window
The PARTITION BY clause ensures the rolling average is calculated separately for each artist.
The ORDER BY clause is used to order the rows in each partition by the listen_date .
The RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW clause defines the window
frame for the window function. It states that for a given row, consider all rows from 6 days before
Finally, the AVG() function calculates the average of daily_listens over the defined window frame.
The ORDER BY artist_id, listen_date; at the end just keeps the result set ordered by artist and
date.
join only returns rows that match the join condition between the two tables.
For an example of each one, say you had sales data exported from Spotify's Salesforce CRM stored
INNER JOIN : retrieves rows from both tables where there is a match in the shared key or keys.
SELECT *
FROM sales
INNER JOIN spotify_customers
ON sales.customer_id = spotify_customers.id
This query will return rows from the sales and spotify_customers tables that have matching
customer id values. Only rows with matching customer_id values will be included in the results.
FULL OUTER JOIN : retrieves all rows from both tables, regardless of whether there is a match in the
shared key or keys. If there is no match, NULL values will be returned for the columns of the non-
matching table.
Here is an example of a SQL full outer join using the sales and spotify_customers tables:
SELECT *
FROM sales
FULL OUTER JOIN spotify_customers
ON sales.customer_id = spotify_customers.id
SQL Question 4: Music Streaming Statistics
As a Data Analyst of Spotify, suppose your team is interested in understanding the listening habits
• Every song has one and only one artist, but an artist can have multiple songs.
• Every song can be listened to by multiple users, and every user can listen to multiple songs.
Your goal is to write a SQL query that returns each user's favourite artist, based on the number of
1 user101 USA
2 user202 UK
3 user303 Brazil
artist_id artist_name
1001 artist1001
1002 artist1002
1 101 5:00
1 102 5:30
1 103 6:00
2 101 8:00
2 103 9:00
3 102 10:00
Answer:
SELECT u.username, a.artist_name FROM (
LIMIT 1;
The above query will find the count of songs each user has listened to by each artist, grouping by
both user_id and artist_id. This result is sorted in descending order of the count, and the top
record for each user represents their favorite artist. The outer query then joins this result back to
By adding redundant copies of data or grouping data together in a way that does not follow
eliminating costly join operations, which is important for OLAP use cases that are read-heavy and
have listened to at least 15 different artists in the current month. Active users are those who have
|**user_id**|**subscription_status**|**last_login**|
|:----|:----|:----|
|1|Premium|2022-08-20|
|2|Free|2022-08-01|
|3|Premium|2022-07-30|
|4|Premium|2022-08-21|
activity Example Input:
|**user_id**|**artist_name**|**month**|
|:----|:----|:----|
|1|Artist 1|August|
|1|Artist 2|August|
|1|Artist 3|August|
|2|Artist 4|August|
|1|Artist 5|August|
|1|Artist 6|August|
|1|Artist 7|August|
|1|Artist 8|August|
|1|Artist 9|August|
|1|Artist 10|August|
|1|Artist 11|August|
|1|Artist 12|August|
|1|Artist 13|August|
|1|Artist 14|August|
|1|Artist 15|August|
|3|Artist 1|July|
|4|Artist 2|August|
Answer:
INNER JOIN (
GROUP BY user_id
) a
ON u.user_id = a.user_id
This query first groups the activity table by user_id and calculates the count of distinct artists
each user interacted with in August. It then joins this table with the users table on user_id . The
WHERE clause filters for Premium users who have logged in within the last 30 days and have
set. For an example of the difference between a left vs. right join, suppose you had a table of
A LEFT JOIN retrieves all rows from the left table (in this case, the Orders table) and any matching
rows from the right table (the Customers table). If there is no match in the right table, NULL values
A RIGHT JOIN combines all rows from the right table (in this case, the Customers table) and any
matching rows from the left table (the Orders table). If there is no match in the left table, NULL
of music on their platform. As a data scientist, your task is to write a SQL query that calculates the
Assume you have access to a user_activity table and a songs table with the following schema:
song_id genre
5001 Rock
6985 Pop
Example Output:
3 Rock 255
3 Pop 120
4 Rock 240
mth genre avg_listening_duration_sec
4 Pop 180
Answer:
In order to get to the answer, we need to join the two tables on song_id and then use the GROUP
BY clause. The AVG() function can be used with GROUP BY to find the average listening duration
• Joins the user_activity table (as ua) and the songs table (as s) on the song_id field.
• Uses the EXTRACT function to get the month from the timestamp in user_activity.
• Uses AVG to calculate the average listening_duration_sec for each genre in each month. The
result will be average listening duration per genre for every month.
To solve another question about calculating rates, solve this SQL interview question from
artist "Adele". Assume you have access to a users table that keeps track of user information and
an album_listens table that keeps track of all instances where a user listened to an album. Here
users table
user_id user_name
1 John
2 Jane
3 Alice
album_listens table
101 1 Adele 19
102 1 Adele 21
103 1 Adele 25
104 2 Adele 21
105 3 Adele 21
106 3 Adele 25
Using SQL, write a query to find all users who have listened to all Adele's albums (19, 21, 25).
Answer:
SELECT u.user_id, u.user_name
FROM users u
WHERE NOT EXISTS (SELECT 1
FROM (SELECT DISTINCT album_name
FROM album_listens
WHERE artist_name = 'Adele') a
WHERE NOT EXISTS (SELECT 1
FROM album_listens al
WHERE al.user_id = u.user_id
AND al.album_name = a.album_name
AND al.artist_name = 'Adele'))
This query works by first identifying all distinct Adele's albums and then checking whether there
are any of these albums that a certain user has not listened to. If a user has listened to all Adele's
albums, they will not have any of Adele's albums that they have not listened to. Such users are