Introduction To Data Science Unsupervised Learning: CS 194 Fall 2015 John Canny
Lecture 8
Unsupervised Learning
Iterate:
• For every remaining data point x, compute D(x), the distance
from x to the closest cluster center.
• Choose a remaining point x randomly with probability
proportional to D(x)², and make it a new cluster center.
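A minimal NumPy sketch of this seeding rule (the function name, the Euclidean metric, and the uniform choice of the first center are assumptions, not from the slide):

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """K-means++ seeding: after the first center, each new center is a
    data point sampled with probability proportional to D(x)^2."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]           # first center: uniform at random
    for _ in range(k - 1):
        C = np.array(centers)                # (c, d) current centers
        # D(x): distance from each point to its closest existing center
        D = np.min(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), axis=1)
        p = D**2 / np.sum(D**2)              # weights proportional to D(x)^2
        centers.append(X[rng.choice(n, p=p)])
    return np.array(centers)
```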
[Figure: matrix factorization schematic — a sparse M × features matrix S is approximated by the product A · B of an M × K factor A and a K × features factor B.]
Matrix Factorization
is a popular method for sparse, linear data (ratings, keyword/URL combinations, click-through rates (CTR) on ads).
$$\frac{dl}{dB} = -2A^T (S - AB)_{*S} + 2\,w_B B$$
where the subscript *S means the residual is evaluated only at the nonzero entries of S, and w_B is the weight of the regularizer on B.
The loss can then be minimized with SGD over the nonzero entries. This method is quite fast, but is prone to getting stuck in weak local optima.
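A sketch of that SGD loop, assuming the squared loss above with L2 regularizers on both factors (the function name, learning rate, and initialization scale are illustrative choices, not from the slide):

```python
import numpy as np

def sgd_mf(rows, cols, vals, m, n, k=16, lr=0.01, w_A=0.1, w_B=0.1,
           epochs=10, seed=0):
    """SGD on sum over nonzeros (S_ij - A_i . B_j)^2 + w_A|A|^2 + w_B|B|^2,
    where (rows[t], cols[t], vals[t]) enumerate the nonzero entries of S."""
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((m, k))
    B = 0.1 * rng.standard_normal((k, n))
    for _ in range(epochs):
        for i, j, s in zip(rows, cols, vals):
            err = s - A[i] @ B[:, j]         # residual at one nonzero of S
            a_i = A[i].copy()
            # step along -dl/dA_i and -dl/dB_j (the factor of 2 folded into lr)
            A[i]    += lr * (err * B[:, j] - w_A * a_i)
            B[:, j] += lr * (err * a_i - w_B * B[:, j])
    return A, B
```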
Matrix Factorization with MCMC
The local optima problem can be mitigated by using MCMC methods (Markov-Chain Monte-Carlo), which resample the factors rather than following the gradient. Each column B_i is updated around the ridge-regression solution
$$B_i = \left(w_B I + A^T D_i A\right)^{-1} A^T S_i$$
where D_i is a diagonal matrix selecting the observed (nonzero) entries of column i of S.
From Koren, Bell, and Volinsky, “Matrix Factorization Techniques for Recommender Systems,”
IEEE Computer, 2009.
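A small NumPy sketch of this per-column solve, treating the nonzeros of S as the observed entries (the function name is mine; a full MCMC sampler would add posterior noise around this mean rather than using it directly):

```python
import numpy as np

def solve_B_column(A, s_col, w_B):
    """One ridge solve B_i = (w_B I + A^T D_i A)^{-1} A^T S_i, where D_i
    selects the observed (here: nonzero) entries of column i of S."""
    obs = np.nonzero(s_col)[0]               # indices where column i is observed
    A_o = A[obs]                             # A^T D_i A == A_o.T @ A_o
    lhs = w_B * np.eye(A.shape[1]) + A_o.T @ A_o
    rhs = A_o.T @ s_col[obs]                 # A^T D_i S_i
    return np.linalg.solve(lhs, rhs)
```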
Outline
• Unsupervised Learning
• K-Means clustering
• DBSCAN
• Matrix Factorization
• Performance
Performance
There are several reasons to design for performance:
• Dataset size (assuming model quality improves with size)
• Model size (bigger models generally perform better)
Both of the above hold (though this is not obvious) for power-law
datasets. Because the data “tail” is long, models also need to be
long-tailed: a long-tailed model can capture less-frequent users and
features.