Machine Learning Course - Matrix Factorization

Matrix Factorizations
X ≈ WZᵀ.
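Concretely, a D × N matrix X is approximated by the product of a tall matrix W (D × K) and a wide matrix Zᵀ (K × N) with K small. A minimal numpy sketch (the sizes, seed, and variable names here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

D, N, K = 6, 5, 2                  # D rows, N columns, K latent factors (toy sizes)
W = rng.standard_normal((D, K))    # row factors
Z = rng.standard_normal((N, K))    # column factors

X_hat = W @ Z.T                    # rank-K approximation: X ≈ W Zᵀ
print(X_hat.shape)                 # (6, 5)
print(np.linalg.matrix_rank(X_hat) <= K)   # True: the product has rank at most K
```

Any matrix built this way has rank at most K, which is what makes the factorization a compressed representation of X.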
Matrix factorization methods have become popular in recent years by combining good scalability with predictive accuracy. In addition, they offer much flexibility for modeling various real-life situations.

Recommender systems rely on different types of input data, which are often placed in a matrix with one dimension representing users and the other dimension representing items of interest. The most convenient data is high-quality explicit feedback, which includes explicit input by users regarding their interest in products. When explicit feedback is not available, recommender systems can infer user preferences from implicit feedback, which indirectly reflects user opinion through behavior such as purchase history, browsing history, search patterns, or even mouse movements. Implicit feedback usually denotes the presence or absence of an event, so it is typically represented by a densely filled matrix.

Each item i is associated with a vector q_i ∈ ℝ^f, and each user u with a vector p_u ∈ ℝ^f. For a given item i, the elements of q_i measure the extent to which the item possesses those factors, positive or negative. For a given user u, the elements of p_u measure the extent of interest the user has in items that are high on the corresponding factors, again positive or negative. The resulting dot product, q_iᵀ p_u, captures the interaction between user u and item i: the user's overall interest in the item's characteristics. This approximates user u's rating of item i, which is denoted by r_ui, leading to the estimate

    r̂_ui = q_iᵀ p_u.    (1)

The major challenge is computing the mapping of each item and user to factor vectors q_i, p_u ∈ ℝ^f. After the recommender system completes this mapping, it can easily estimate the rating a user will give to any item by using Equation 1.

Such a model is closely related to singular value decomposition (SVD), a well-established technique for identifying latent semantic factors in information retrieval. Applying SVD in the collaborative filtering domain requires factoring the user-item rating matrix. This often raises difficulties due to the high portion of missing values caused by sparseness in the user-item ratings matrix. Conventional SVD is undefined when knowledge about the matrix is incomplete. Moreover, carelessly addressing only the relatively few known entries is highly prone to overfitting.

Regularization

To learn the factor vectors, the system minimizes the regularized squared error on the set of known ratings:

    min_{q*, p*} Σ_{(u,i)∈κ} (r_ui − q_iᵀ p_u)² + λ (‖q_i‖² + ‖p_u‖²)    (2)

Here, κ is the set of the (u, i) pairs for which r_ui is known (the training set).

The matrix factorization model can readily accept varying confidence levels, which let it give less weight to less meaningful observations. If confidence in observing r_ui is denoted as c_ui, then the model enhances the cost function (Equation 5) to account for confidence as follows:

    min_{p*, q*, b*} Σ_{(u,i)∈κ} c_ui (r_ui − µ − b_u − b_i − p_uᵀ q_i)² + λ (‖p_u‖² + ‖q_i‖² + b_u² + b_i²)    (8)

For information on a real-life application involving such schemes, refer to "Collaborative Filtering for Implicit Feedback Datasets." [10]

Figure 2. A simplified illustration of the latent factor approach, which characterizes both users and movies using two axes: male versus female, and serious versus escapist.

Figure 3. The first two vectors (axes "Factor vector 1" and "Factor vector 2") from a matrix decomposition of the Netflix Prize data. Selected movies are placed at the appropriate spot based on their factor vectors in two dimensions. The plot reveals distinct genres, including clusters of movies with strong female leads, fraternity humor, and quirky independent films.

Factorizing the Netflix user-movie matrix allows us to discover the most descriptive dimensions for predicting movie preferences. We can identify the first few most important dimensions from a matrix decomposition and place the movies in the resulting factor space (Figure 3). There are interesting intersections between these boundaries: on the top left corner, where indie meets lowbrow, are Kill Bill and Natural Born Killers, both arty movies that play off violent themes. On the bottom right, the serious female-driven movies meet mainstream formulaic films such as Runaway Bride.

NETFLIX PRIZE COMPETITION

In 2006, the online DVD rental company Netflix announced a contest to improve the state of its recommender system. [12] To enable this, the company released a training set of more than 100 million ratings spanning about 500,000 anonymous customers and their ratings on more than 17,000 movies, each movie being rated on a scale of 1 to 5 stars. Participating teams submit predicted ratings for a test set of approximately 3 million ratings, and Netflix calculates a root-mean-square error (RMSE) based on the held-out truth. The first team that can improve on the Netflix algorithm's RMSE performance by 10 percent or more wins a $1 million prize.

Our team won the 2007 Progress Prize with the best score at the time: 8.43 percent better than Netflix. Later, we aligned with team Big Chaos to win the 2008 Progress Prize with a score of 9.46 percent. At the time of this writing, we are still in first place, inching toward the 10 percent landmark.

Our winning entries consist of more than 100 different predictor sets, the majority of which are factorization models using some variants of the methods described here. Our discussions with other top teams and postings on the public contest forum indicate that these are the most popular and successful methods for predicting ratings.

Recall that for K-means, K was the number of clusters; similarly, for GMMs, K was the number of mixture components.
A BASIC MATRIX FACTORIZATION MODEL

Matrix factorization models map both users and items to a joint latent factor space of dimensionality f, such that user-item interactions are modeled as inner products in that space. Accordingly, each item i is associated with a vector q_i ∈ ℝ^f, and each user u with a vector p_u ∈ ℝ^f.

The system learns the model by fitting the previously observed ratings. However, the goal is to generalize those previous ratings in a way that predicts future, unknown ratings. Thus, the system should avoid overfitting the observed data by regularizing the learned parameters, whose magnitudes are penalized. The constant λ controls the extent of regularization.

In our notation, we can add a regularizer and minimize the following cost:
    (1/2) Σ_{(d,n)∈Ω} (x_dn − (WZᵀ)_dn)² + (λ_w/2) ‖W‖²_Frob + (λ_z/2) ‖Z‖²_Frob

Authorized licensed use limited to: ETH BIBLIOTHEK ZURICH. Downloaded on October 1, 2009 at 12:51 from IEEE Xplore. Restrictions apply.
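This cost transcribes directly into numpy; the data matrix, the observation mask Ω, and the weights below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

D, N, K = 5, 4, 2
lam_w, lam_z = 0.1, 0.1

X = rng.standard_normal((D, N))
Omega = rng.random((D, N)) < 0.6        # boolean mask of observed entries (d, n) ∈ Ω
W = rng.standard_normal((D, K))
Z = rng.standard_normal((N, K))

def cost(W, Z):
    """(1/2) Σ_{(d,n)∈Ω} (x_dn − (WZᵀ)_dn)² + (λ_w/2)‖W‖²_Frob + (λ_z/2)‖Z‖²_Frob"""
    resid = X - W @ Z.T
    return (0.5 * np.sum(resid[Omega] ** 2)     # squared error on observed entries only
            + 0.5 * lam_w * np.sum(W ** 2)
            + 0.5 * lam_z * np.sum(Z ** 2))

print(cost(W, Z) >= 0)                  # True: every term is nonnegative
```

The boolean mask restricts the squared error to observed entries, while the Frobenius penalties always sum over all of W and Z.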
∂f_{d,n}(W, Z) / ∂z_{n′,k} =
    −(x_dn − (WZᵀ)_dn) w_{d,k}    if n′ = n,
    0                             otherwise,

where f_{d,n}(W, Z) := (1/2) (x_dn − (WZᵀ)_dn)² denotes the cost contributed by a single observed entry (d, n) ∈ Ω.
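A quick way to check a derivative like this is to compare it against a finite difference. Everything below (sizes, the probed indices, the step eps) is an arbitrary test setup:

```python
import numpy as np

rng = np.random.default_rng(3)

D, N, K = 4, 3, 2
X = rng.standard_normal((D, N))
W = rng.standard_normal((D, K))
Z = rng.standard_normal((N, K))

def f(d, n, Z):
    """Single-entry cost f_{d,n}(W, Z) = (1/2) (x_dn − (W Zᵀ)_dn)²."""
    return 0.5 * (X[d, n] - W[d] @ Z[n]) ** 2

d, n, k = 1, 2, 0
analytic = -(X[d, n] - W[d] @ Z[n]) * W[d, k]   # the formula above, case n′ = n

eps = 1e-6                                       # finite-difference step
Zp = Z.copy(); Zp[n, k] += eps
numeric = (f(d, n, Zp) - f(d, n, Z)) / eps
print(abs(analytic - numeric) < 1e-4)            # True: the two gradients agree

Zq = Z.copy(); Zq[(n + 1) % N, k] += eps         # perturb a different column n′ ≠ n
print(f(d, n, Zq) == f(d, n, Z))                 # True: f_{d,n} does not depend on z_{n′}
```

The second check confirms the "0 otherwise" branch: perturbing any column of Z other than z_n leaves f_{d,n} exactly unchanged.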
Alternating Least-Squares (ALS)

For simplicity, let us first assume that there are no missing ratings, that is, Ω = [D] × [N]. Then

    (1/2) Σ_{d=1}^{D} Σ_{n=1}^{N} (x_dn − (WZᵀ)_dn)² = (1/2) ‖X − WZᵀ‖²_Frob .
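For fixed Z the Frobenius objective is an ordinary ridge regression in W, and vice versa, which is what ALS alternates between. The closed-form updates below are the standard ridge solutions, stated here as an assumption since the excerpt stops before deriving them:

```python
import numpy as np

rng = np.random.default_rng(4)

D, N, K = 8, 6, 3
lam_w, lam_z = 0.1, 0.1

X = rng.standard_normal((D, N))
W = rng.standard_normal((D, K))
Z = rng.standard_normal((N, K))

def cost(W, Z):
    """Fully observed regularized cost: (1/2)‖X − WZᵀ‖²_F plus Frobenius penalties."""
    return (0.5 * np.linalg.norm(X - W @ Z.T, "fro") ** 2
            + 0.5 * lam_w * np.linalg.norm(W, "fro") ** 2
            + 0.5 * lam_z * np.linalg.norm(Z, "fro") ** 2)

history = [cost(W, Z)]
for _ in range(10):
    # Fix Z, solve the ridge problem for W:  W = X Z (ZᵀZ + λ_w I)⁻¹
    W = X @ Z @ np.linalg.inv(Z.T @ Z + lam_w * np.eye(K))
    # Fix W, solve for Z:  Z = Xᵀ W (WᵀW + λ_z I)⁻¹
    Z = X.T @ W @ np.linalg.inv(W.T @ W + lam_z * np.eye(K))
    history.append(cost(W, Z))

print(all(b <= a + 1e-9 for a, b in zip(history, history[1:])))  # True: cost never increases
```

Each half-step solves its subproblem exactly, so the regularized cost can never increase from one sweep to the next.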