MMDS 2014 Talk: Distributing ML Algorithms from GPUs to the Cloud
MMDS 2014
June, 2014
Xavier Amatriain
Director - Algorithms Engineering @xamat
Outline
■ Introduction
■ Emmy-winning Algorithms
■ Distributing ML Algorithms in Practice
■ An example: ANN over GPUs & AWS Cloud
What we were interested in:
■ High quality recommendations
Proxy question:
■ Accuracy in predicted rating
■ Improve by 10% = $1 million!
Data size:
■ 100M ratings (back then “almost massive”)
2006 2014
Netflix Scale
▪ > 44M members
▪ > 40 countries
▪ > 1000 device types
▪ > 5B hours in Q3 2013
▪ Plays: > 50M/day
▪ Searches: > 3M/day
▪ Ratings: > 5M/day
▪ Log 100B events/day
▪ 31.62% of peak US downstream traffic
Smart Models
■ Regression models (Logistic, Linear, Elastic nets)
■ GBDT/RF
■ SVD & other MF models
■ Factorization Machines
■ Restricted Boltzmann Machines
■ Markov Chains & other graphical models
■ Clustering (from k-means to modern non-parametric models)
■ Deep ANN
■ LDA
■ Association Rules
■ …
“Emmy Winning” Netflix Algorithms
Rating Prediction
2007 Progress Prize
▪ Top 2 algorithms
▪ MF/SVD - Prize RMSE: 0.8914
▪ RBM - Prize RMSE: 0.8990
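The RMSE figures above, and the 10% improvement target mentioned earlier, can be made concrete with a short sketch. The ratings below are made-up toy values, not Netflix data, and the helper names `rmse` and `relative_improvement` are illustrative assumptions rather than anything from the talk.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))


def relative_improvement(baseline_rmse, model_rmse):
    """Fractional RMSE reduction of a model relative to a baseline."""
    return (baseline_rmse - model_rmse) / baseline_rmse


# Toy ratings on a 1-5 scale (hypothetical values for illustration only).
actual = [4, 3, 5, 2, 1]
baseline = [3, 3, 3, 3, 3]           # always predict the middle rating
model = [3.5, 3.0, 4.5, 2.5, 1.5]    # a model that tracks the true ratings better

baseline_rmse = rmse(baseline, actual)
model_rmse = rmse(model, actual)
print(f"baseline RMSE: {baseline_rmse:.4f}")
print(f"model RMSE:    {model_rmse:.4f}")
print(f"improvement:   {relative_improvement(baseline_rmse, model_rmse):.1%}")
```

Under this definition, a "10% improvement" means the model's RMSE is 0.9 times the baseline's, which is a much harder target than the toy numbers here suggest.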
Many features/low-bias models
Sometimes, it’s not about more data
At what level should I parallelize?
The three levels of Distribution/Parallelization
▪ Multi-layered Machine Learning
Matrix Factorization Example
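As a reference point for the matrix factorization example, here is a minimal single-machine sketch of rating prediction by matrix factorization trained with stochastic gradient descent. It is not the distributed or GPU implementation discussed in the talk: the function names, the toy ratings, and the hyperparameters (`n_factors`, `lr`, `reg`, `n_epochs`) are all assumptions chosen for illustration.

```python
import math
import random


def train_mf(ratings, n_users, n_items, n_factors=2, lr=0.01, reg=0.02,
             n_epochs=200, seed=0):
    """Fit user factors P and item factors Q so that P[u] . Q[i] approximates r.

    ratings is a list of (user_index, item_index, rating) triples.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    data = list(ratings)  # copy so the caller's list is not reordered
    for _ in range(n_epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def train_rmse(ratings, P, Q):
    """RMSE of the factor model on the given (user, item, rating) triples."""
    total = 0.0
    for u, i, r in ratings:
        total += (r - sum(pf * qf for pf, qf in zip(P[u], Q[i]))) ** 2
    return math.sqrt(total / len(ratings))


# Hypothetical toy data: 4 users, 4 items, 11 observed ratings.
toy_ratings = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 3, 1),
               (2, 0, 1), (2, 1, 1), (2, 3, 5), (3, 0, 1), (3, 2, 5),
               (3, 3, 4)]

P0, Q0 = train_mf(toy_ratings, 4, 4, n_epochs=0)   # untrained factors
P, Q = train_mf(toy_ratings, 4, 4)                 # trained factors
print("RMSE before training:", round(train_rmse(toy_ratings, P0, Q0), 4))
print("RMSE after training: ", round(train_rmse(toy_ratings, P, Q), 4))
```

The inner loop updates only the factors of one user and one item per rating. That locality is one reason this family of models is a common candidate for splitting work across ratings, users, or hyperparameter settings.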
Xavier Amatriain (@xamat)
Thanks!
(and yes, we are hiring)