Machine Learning &
Big Data @
Andy Sloane
@a1k0n
http://a1k0n.net
Madison Big Data Meetup
Jan 27, 2015
Big data?
60M Monthly Active Users (MAU)
50M tracks in our catalog
...But many are identical copies from different
releases (e.g. US and UK releases of the same
album)
...and only 4M unique songs have been listened to
>500 times
Big data?
Raw material: application logs, delivered via Apache
Kafka
Wake Me Up by Avicii has been played 330M times, by
~6M different users
"EndSong": 500GB / day
...But aggregated per-user play counts for a whole
year fit in ~60GB ("medium data")
Hadoop @ Spotify
900 nodes (all in London datacenter)
34 TB RAM total
~16000 typical concurrent tasks (mappers/reducers)
2GB RAM per mapper/reducer slot
What do we need ML for?
Recommendations
Related Artists
Radio
Recommendations
The Discover page
4M tracks x 60M active users, rebuilt daily
The Discover page
Okay, but how do we come up with recommendations?
Collaborative filtering!
Collaborative filtering
Great, but how does that actually work?
Each time a user plays something, add it to a matrix
Compute similarity, somehow, between items based on
who played what
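A toy sketch of that accumulation (my illustration, not the production pipeline):

from collections import defaultdict

plays = defaultdict(int)  # sparse user x item matrix as a dict

def record_play(user_id, track_id):
    # each play bumps one cell of the (user, item) matrix
    plays[(user_id, track_id)] += 1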
Collaborative filtering
So compute some distance between every pair of rows
and columns
That's just $O(60\mathrm{M}^2/2) = O(1.8 \times 10^{15})$ operations... O_O
We need a better way...
(BTW: Twitter has a decent approximation that can actually make this work, called DIMSUM:
https://blog.twitter.com/2014/all-pairs-similarity-via-dimsum)
I've tried it but don't have results to report here yet :(
Collaborative filtering
Latent factor models
Instead, we use a "small" representation for each user &
item: $f$-dimensional vectors (here, $f = 2$)
and approximate the big matrix with it.
Why vectors?
Very compact representation of musical style or user's
taste
Only like 40-200 elements (2 shown above for
illustration)
Why vectors?
Dot product between items = similarity between items
Dot product between user and item vectors = good/bad
recommendation

user · item:
   2 × 4  =  8
  −4 × 0  =  0
   2 × −2 = −4
  −1 × 5  = −5
  sum     = −1
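The same arithmetic as a numpy one-liner (same made-up vectors as above):

import numpy as np

user = np.array([2, -4, 2, -1])  # made-up user taste vector
item = np.array([4, 0, -2, 5])   # made-up item style vector
print(user @ item)               # 8 + 0 - 4 - 5 = -1: a poor match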
Recommendations via dot products
Another example of tracks in two
dimensions
Implicit Matrix Factorization
Hu, Koren, Volinsky - Collaborative Filtering for Implicit
Feedback Datasets
Tries to predict whether user $u$ listens to item $i$:

$$P = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix} \approx X\,Y^T$$

$Y$ is all item vectors, $X$ is all user vectors
"implicit" because users don't tell us what they like, we
only observe what they do/don't listen to
Implicit Matrix Factorization
Goal: make $x_u \cdot y_i$ close to 1 for things each user has
listened to, 0 for everything else.
$x_u$ — user $u$'s vector
$y_i$ — item $i$'s vector
$p_{ui}$ — 1 if user $u$ played item $i$, 0 otherwise
$c_{ui}$ — "confidence", ad-hoc weight based on number of
times user $u$ played item $i$; e.g., $1 + \alpha \cdot \mathrm{count}$
$\lambda$ — regularization penalty to avoid overfitting

Minimize:

$$\sum_{u,i} c_{ui} \left( p_{ui} - x_u^T y_i \right)^2 + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)$$
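As a sanity check, the objective is one numpy expression on dense toy matrices (names are mine; the real data is far too large and sparse for this):

import numpy as np

def implicit_mf_loss(C, P, X, Y, lam):
    # C, P: (n_users, n_items) confidences and 0/1 play indicators
    # X: (n_users, f) user vectors; Y: (n_items, f) item vectors
    err = (C * (P - X @ Y.T) ** 2).sum()
    reg = lam * ((X ** 2).sum() + (Y ** 2).sum())
    return err + reg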
Alternating Least Squares
Solution: alternate solving for all users $x_u$:

$$x_u = \left( Y^T Y + Y^T (C^u - I) Y + \lambda I \right)^{-1} Y^T C^u p_{u\cdot}$$

and all items $y_i$:

$$y_i = \left( X^T X + X^T (C^i - I) X + \lambda I \right)^{-1} X^T C^i p_{\cdot i}$$

$Y^T Y$ = $f \times f$ matrix, sum of outer products of all items
$Y^T (C^u - I) Y$ = same, except only over items the user played
$Y^T C^u p_{u\cdot}$ = weighted $f$-dimensional sum of the items the
user played
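A minimal dense numpy sketch of the user half of one iteration (items fixed), directly transcribing the update above; toy sizes and variable names are mine:

import numpy as np

n_users, n_items, f, lam, alpha = 5, 8, 2, 0.1, 40.0
rng = np.random.default_rng(0)

counts = rng.integers(0, 3, size=(n_users, n_items))  # toy play counts
P = (counts > 0).astype(float)      # p_ui: 1 if played, else 0
C = 1.0 + alpha * counts            # c_ui: confidence weights
X = rng.normal(size=(n_users, f))   # user vectors
Y = rng.normal(size=(n_items, f))   # item vectors

YTY = Y.T @ Y                       # f x f, shared by every user
for u in range(n_users):
    Cu = np.diag(C[u])              # C^u as a diagonal matrix
    A = YTY + Y.T @ (Cu - np.eye(n_items)) @ Y + lam * np.eye(f)
    b = Y.T @ Cu @ P[u]
    X[u] = np.linalg.solve(A, b)    # one f x f solve per user
# then hold X fixed and update Y the same way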
Alternating Least Squares
Key point: each iteration is linear in the size of the input, even
though we are solving for all users x all items, and needs
only $f^2$ memory to solve
No learning rates, just a few tunable parameters ($f$, $\lambda$, $\alpha$)
All you do is add stuff up, solve an $f \times f$ matrix problem,
and repeat!
We use $f = 40$ dimensional vectors for
recommendations
Matrix/vector math using numpy in Python, Breeze in
Scala
Alternating Least Squares
Adding lots of stuff up
Problem: any user (60M) can play any item (4M)
thus we may need to add any user's vector to any
item's vector
If we put user vectors in memory, it takes a lot of RAM!
Worst case: 60M users * 40 dimensions * sizeof(float) =
9.6GB of user vectors
...too big to fit in a mapper slot on our cluster
Adding lots of stuff up
Solution: Split the data into a K × L grid of blocks
Most recent run made a 14 x 112 grid
One map shard
Input is a bunch of (user, item, count) tuples
user is the same modulo K for all tuples in the shard
item is the same modulo L for all tuples in the shard
e.g., if K = 4, mapper #1 gets users 1, 5, 9, 13, ...
Add up vectors from every data point
Then flip users ↔items and repeat!
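A sketch of the grid assignment, assuming integer user and item IDs (K and L from the run above):

K, L = 14, 112  # grid shape from the most recent run

def grid_cell(user_id, item_id):
    # every (user, item, count) tuple lands in exactly one cell, so the
    # mapper for cell (k, l) only needs user vectors with user_id % K == k
    return (user_id % K, item_id % L)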
Adding stuff up
Input: (user, item, count) tuples

def mapper(self, input):  # Luigi-style python job
    user, item, count = parse(input)
    conf = AdHocConfidenceFunction(count)  # e.g. 1 + alpha*count
    # add up user vectors from previous iteration
    term1 = conf * self.user_vectors[user]
    term2 = np.outer(self.user_vectors[user],
                     self.user_vectors[user]) * (conf - 1)
    yield item, (term1, term2)

def reducer(self, item, terms):
    # element-wise sums of term1 (f-vector) and term2 (f x f matrix)
    term1, term2 = map(sum, zip(*terms))
    item_vector = np.linalg.solve(
        self.YTY + term2 + self.l2penalty * np.identity(self.dim),
        term1)
    yield item, item_vector
Alternating Least Squares
Implemented in a Java map-reduce framework which
runs other models, too
After about 20 iterations, we converge
Each iteration takes about 20 minutes, so about 7-8
hours total
Recomputed from scratch weekly
User vectors recomputed daily, keeping items fixed
So we have vectors, now what?
60M users x 4M recommendable items
Finding Recommendations
For each user, how do we find the best items given
their vector?
Brute force is O(60M x 4M x 40) ≈ 9.6 peta-operations!
Instead, use an approximation based on locality-sensitive
hashing (LSH)
Approximate Nearest Neighbors /
Locality-Sensitive Hashing
Annoy - github.com/spotify/annoy
Pre-built read-only database of item vectors
Internally, recursively splits the space with random hyperplanes
Nearby points are likely to land on the same side of each random split
Builds several random trees (a forest) for better
approximation
Given an $f$-dimensional query vector, finds similar items
in the database
Index loads via mmap, so all processes on the same
machine share RAM
Queries are very, very fast, but approximate
Python implementation available, Java forthcoming
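A minimal sketch of the Python API (the 'angular' metric argument follows current Annoy; item_vectors is random stand-in data, not real item vectors):

import numpy as np
from annoy import AnnoyIndex

f = 40
item_vectors = np.random.normal(size=(1000, f))  # stand-in item vectors

index = AnnoyIndex(f, 'angular')       # angular distance ~ cosine
for i, vec in enumerate(item_vectors):
    index.add_item(i, vec)
index.build(10)                        # build 10 random trees
index.save('items.ann')                # read-only, mmap-able file

lookup = AnnoyIndex(f, 'angular')
lookup.load('items.ann')               # loads via mmap
neighbors = lookup.get_nns_by_vector(item_vectors[0], 10)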
Generating recommendations
Annoy index for all items is only 1.2GB
I have one on my laptop... Live demo!
Could serve up nearest neighbors at load time, but we
precompute Discover on Hadoop
Generating recommendations in parallel
Ship the Annoy index in the distributed cache and load it via
mmap in each map-reduce process
Reducer loads vectors + user stats, looks up approximate
nearest neighbors, generates recommendations.
Related Artists
Great for music discovery
Essential for finding believable reasons for latent
factor-based recommendations
When generating recommendations, run through a list
of related artists to find potential reasons
Similar items use cosine distance
Cosine is similar to dot product; just add a
normalization step
Helps "factor out" popularity from similarity
Related Artists
How we build it
Similar to user recommendations, but with more
models, not necessarily collaborative filtering based
Implicit Matrix Factorization (shown previously)
"Vector-Exp", similar model but probabilistic in
nature, trained with gradient descent
Google word2vec on playlists
Echo Nest "cultural similarity" — based on scraping
web pages about music!
Query ANNs to generate candidates
Score candidates from all models, combine and rank
Pre-build table of 20 nearest artists to each artist
Radio
ML-wise, exactly the same as Related Artists!
For each track, generate candidates with ANN from
each model
Score w/ all models, rank with ensemble
Store top 250 nearest neighbors in a database
(Cassandra)
User plays radio → load 250 tracks and shuffle
Thumbs up → load more tracks from the thumbed-up
song
Thumbs down → remove that song / re-weight tracks
Upcoming work
Deep learning based item similarity
http://benanne.github.io/2014/08/05/spotify-cnns.html
Upcoming work
Audio fingerprint based
content deduplication
~1500 Echo Nest Musical Fingerprints per track
Min-Hash based matching to accelerate all-pairs
similarity
Fast connected components using the Hash-to-Min
algorithm: O(log d) map-reduce steps
http://arxiv.org/pdf/1203.5387.pdf
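A toy MinHash sketch (my illustration of the general technique, not Spotify's pipeline): tracks whose signatures mostly agree are duplicate candidates, so only those pairs need a full comparison:

import numpy as np

PRIME = 2_147_483_647  # 2^31 - 1

def minhash_signature(fingerprints, n_hashes=32, seed=0):
    rng = np.random.default_rng(seed)
    a = rng.integers(1, PRIME, size=n_hashes, dtype=np.int64)
    b = rng.integers(0, PRIME, size=n_hashes, dtype=np.int64)
    fp = np.array(sorted(fingerprints), dtype=np.int64) % PRIME
    # apply every hash function to every fingerprint; keep the min per row
    return ((a[:, None] * fp[None, :] + b[:, None]) % PRIME).min(axis=1)

sig1 = minhash_signature({101, 205, 377, 905})
sig2 = minhash_signature({101, 205, 377, 906})
jaccard_estimate = (sig1 == sig2).mean()  # fraction of agreeing min-hashes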
Thanks!
I can be reached here:
Andy Sloane
Email: andy@a1k0n.net
Twitter: @a1k0n
http://a1k0n.net
Special thanks to Erik Bernhardsson, whose slides I
plagiarized mercilessly
More Related Content

Collaborative Filtering at Spotify
Algorithmic Music Recommendations at Spotify
Machine Learning and Big Data for Music Discovery at Spotify
Interactive Recommender Systems with Netflix and Spotify
Music recommendations @ MLConf 2014
ML+Hadoop at NYC Predictive Analytics
Scala Data Pipelines for Music Recommendations
CF Models for Music Recommendations At Spotify
What's hot

Recommending and Searching (Research @ Spotify)
From Idea to Execution: Spotify's Discover Weekly
Homepage Personalization at Spotify
Music Personalization At Spotify
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
Music Recommendations at Scale with Spark
Personalizing the listening experience
Recommending and searching @ Spotify
Personalized Playlists at Spotify
The Evolution of Hadoop at Spotify - Through Failures and Pain
Building Data Pipelines for Music Recommendations at Spotify
Introduction to Recommendation Systems
Big data and machine learning @ Spotify
Past, present, and future of Recommender Systems: an industry perspective
Personalized Page Generation for Browsing Recommendations
Approximate nearest neighbor methods and vector models – NYC ML meetup
Music Personalization : Real time Platforms.
Spotify Discover Weekly: The machine learning behind your music recommendations
Big Data At Spotify
Artwork Personalization at Netflix
Similar to Machine learning @ Spotify - Madison Big Data Meetup

Introduction to recommender systems
Recsys 2018 overview and highlights
Machine learning
Collaborative filtering at scale
TypeScript and Deep Learning
Scala Data Pipelines @ Spotify
Collaborative Filtering with Spark
D3, TypeScript, and Deep Learning
Standardizing arrays -- Microsoft Presentation
New zealand bloom filter
How to make intelligent web apps
MS CS - Selecting Machine Learning Algorithm
Cite References.Classification in Discriminant Analysis Discussi.docx
Introduction to Python_for_machine_learning.pdf
Project - Deep Locality Sensitive Hashing
Tg noh jeju_workshop
Monads and Monoids by Oleksiy Dyagilev
Recommendation Engine Powered by Hadoop
Recommendation Engine Powered by Hadoop - Pranab Ghosh