Fast Maximum Margin Matrix Factorization

This document summarizes collaborative filtering methods for predicting user preferences in recommender systems. It describes low-rank matrix factorization approaches that represent users and items as vectors of factors inferred from existing ratings. A maximum margin matrix factorization (MMMF) method is proposed to directly optimize the hinge loss between predicted and observed ratings. MMMF finds a local minimum of the non-convex objective function using gradient descent on factor matrices U and V. An experimental study on movie rating datasets shows MMMF outperforms other factorization models in terms of normalized mean absolute error.


Jason D. M. Rennie (MIT)
Nathan Srebro (Univ. of Toronto)

Presented by:
RAHUL BATRA (MT13049)
SHILPA GARG (MT12049)
COLLABORATIVE PREDICTION

LOW RANK MATRIX FACTORIZATION

MAXIMUM MARGIN MATRIX FACTORIZATION (MMMF)

OPTIMIZATION METHOD

EXPERIMENTAL STUDY OF VARIOUS FACTOR MODELS ON THE
1 MILLION MOVIELENS DATASET
Collaborative Prediction
Based on a partially observed matrix:
Predict the unobserved entries: "Will user i like movie j?"
[Figure: partially observed users x movies rating matrix; "?" marks the unobserved entries to be predicted]
Matrix Factorization

[Figure: ordinal regression view of matrix factorization. X = U V’ approximates the rating matrix Y; each movie has a feature vector (v1, ..., v10 shown), each user a preference-weight vector (w1 shown), and the resulting preference scores are mapped to ratings 1-5 through thresholds q.]
Low Rank Matrix Factorization
[Figure: the rating matrix Y is approximated by X = U V’ with U, V of rank k]

• Sum-squared loss, fully observed Y: the SVD finds the global optimum (see the sketch below)
• Classification-error loss, or partially observed Y: non-convex, no explicit solution
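To make the fully observed case concrete, here is a minimal NumPy sketch (function and variable names are illustrative) of the truncated-SVD solution, which is the global minimizer of the sum-squared loss over rank-k matrices when every entry of Y is observed:

```python
import numpy as np

def best_rank_k_approximation(Y, k):
    """Truncated SVD: the global minimizer of ||Y - X||_F^2 over rank-k X
    when Y is fully observed (Eckart-Young theorem)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Toy fully observed rating matrix (users x movies)
Y = np.array([[5., 4., 1., 1.],
              [4., 5., 1., 2.],
              [1., 1., 5., 4.],
              [2., 1., 4., 5.]])
X = best_rank_k_approximation(Y, k=2)
print(np.round(X, 2))
```

With missing entries this shortcut is no longer available, which motivates the discussion below.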
Problems with other factor models and low-rank approximations:
• In a collaborative prediction setting, only some of the entries of Y are observed, and the low-rank matrix X minimizing the sum-squared distance to the observed entries can no longer be computed via a singular value decomposition.
• Finding a low-rank approximation to a partially observed matrix is a difficult non-convex optimization problem with many local minima.
• Low-rank approximations constrain the dimensionality of the factorization X = UV’, i.e. the number of allowed factors.
Factor Model MMMF

Example user preference weights over movie features:
  +1.5 x v1 (comic value)
  +2.4 x v2 (dramatic value)
  -1.9 x v3 (violence)

Each movie's feature values combine with these weights to give a preference score, which is then thresholded into a rating (a small sketch of this step follows below):

Preference scores (movies): 1.4  -5.9  2.2  -0.8  -3.7  4.6  -1.8  3.5
Ratings (movies):             3     1    4     3     2    5     2    4
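As a small illustration of the thresholding step, the sketch below maps the slide's preference scores to its ratings; the threshold values used here are hypothetical, chosen only so that the mapping reproduces the numbers above.

```python
import numpy as np

# Preference scores and ratings taken from the slide; thresholds are hypothetical.
scores = np.array([1.4, -5.9, 2.2, -0.8, -3.7, 4.6, -1.8, 3.5])
thetas = np.array([-4.5, -1.0, 2.0, 4.0])   # R-1 = 4 thresholds for R = 5 ratings

def score_to_rating(score, thetas):
    """Rating r = number of thresholds the score exceeds, plus one."""
    return int(np.sum(score > thetas)) + 1

ratings = [score_to_rating(s, thetas) for s in scores]
print(ratings)   # [3, 1, 4, 3, 2, 5, 2, 4]
```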
Matrix Factorization

[Figure: full example with 15 users. Each user i has a 10-dimensional preference-weight vector wi and each movie a feature vector vj; the preference scores wi · vj are converted into the observed 1-5 ratings through the thresholds q. Stacking the wi as rows of U and the vj as rows of V gives X = U V’.]
Norm Constrained Factorization

Instead of bounding the rank, bound the norm of a low-norm factorization X = U V’:

  ||X||tr = min{U,V : X = UV’} (||U||Fro² + ||V||Fro²) / 2,   where ||U||Fro² = ∑i,j Uij²
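A minimal NumPy sketch (all names illustrative) of this identity: the trace norm of X equals the sum of its singular values, and the balanced factorization taken from the SVD attains (||U||Fro² + ||V||Fro²)/2 = ||X||tr.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))  # a low-rank matrix

Usvd, s, Vt = np.linalg.svd(X, full_matrices=False)
trace_norm = s.sum()                      # trace norm = sum of singular values

# Balanced factorization X = U V' with U = Usvd*sqrt(s), V = Vt'*sqrt(s)
U = Usvd * np.sqrt(s)
V = Vt.T * np.sqrt(s)
frob_bound = 0.5 * (np.linalg.norm(U, 'fro')**2 + np.linalg.norm(V, 'fro')**2)

print(np.allclose(X, U @ V.T))             # True: it is a valid factorization
print(np.isclose(trace_norm, frob_bound))  # True: the bound is attained
```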
MMMF Objective

Original objective (with the all-thresholds loss):
  minX  ||X||tr + c · loss(X, Y)

Factorized objective:
  minU,V  (||U||Fro² + ||V||Fro²)/2 + c · loss(U V’, Y)
LOW NORM FACTORIZATION

First we consider binary labels Y ∈ {±1}^(n×m).

We seek a minimum trace norm X that matches the observed labels with a margin of one.

Using the hinge loss h(z) = max(0, 1 - z), the optimization problem can be written as

  minX  ||X||tr + c ∑(i,j)∈S h(Yij · Xij)

where S is the set of observed entries.
OPTIMIZATION PROBLEM

To handle ratings rather than binary labels, we relate the real-valued Xij to the discrete Yij ∈ {1, ..., R} using R - 1 thresholds Θ1, Θ2, ..., ΘR-1.

Substituting X = U V’, we then minimize the factorized all-thresholds objective

  J(U, V, Θ) = (||U||Fro² + ||V||Fro²)/2 + c ∑(i,j)∈S ∑r=1..R-1 h( Tij^r (Θir - Ui Vj’) )

where Tij^r = +1 for r ≥ Yij and -1 for r < Yij.
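The equations on this slide were figures in the original; the sketch below simply evaluates the all-thresholds objective under the formulation written above. The per-user thresholds Theta[i, r] and all function names are assumptions for illustration, not the authors' code.

```python
import numpy as np

def hinge(z):
    """Standard hinge h(z) = max(0, 1 - z)."""
    return max(0.0, 1.0 - z)

def all_thresholds_objective(U, V, Theta, Y, observed, c):
    """Evaluate J(U, V, Theta): trace-norm surrogate plus all-thresholds hinge loss.
    Y holds ratings 1..R, `observed` is a boolean mask, Theta is an
    (n_users, R-1) array of per-user thresholds (an assumption for illustration)."""
    X = U @ V.T
    reg = 0.5 * (np.sum(U**2) + np.sum(V**2))
    R = Theta.shape[1] + 1
    loss = 0.0
    for i, j in zip(*np.nonzero(observed)):
        for r in range(1, R):                       # thresholds r = 1 .. R-1
            T = 1.0 if r >= Y[i, j] else -1.0       # +1 at/above the rating, -1 below
            loss += hinge(T * (Theta[i, r - 1] - X[i, j]))
    return reg + c * loss

# Tiny usage example with made-up shapes
rng = np.random.default_rng(0)
U, V = rng.normal(size=(4, 2)), rng.normal(size=(3, 2))
Theta = np.tile(np.array([-1.0, 0.0, 1.0, 2.0]), (4, 1))   # R = 5 ratings
Y = rng.integers(1, 6, size=(4, 3))
observed = rng.random((4, 3)) < 0.5
print(all_thresholds_objective(U, V, Theta, Y, observed, c=1.0))
```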


OPTIMIZATION PROBLEM

Gradient descent on U, V and Θ is used to locally optimize J(U, V, Θ).

The hinge h(z) is not differentiable at z = 1, so we replace it with a smoothed version, the smooth hinge, which gives well-defined gradients everywhere.

The remaining difficulty is that J(U, V, Θ) is not a convex function of U and V, even though the original trace-norm problem was convex in X and Θ; we therefore settle for a local minimum found by gradient descent.
Smooth Hinge

[Figure: loss values (left) and gradients (right) for the Hinge and the Smooth Hinge. The gradients are identical outside the region z ∈ (0, 1).]
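Below is a short NumPy sketch of a smooth hinge and its gradient; the exact piecewise-quadratic form is an assumption consistent with the caption (the gradients match the hinge outside (0, 1)), not taken verbatim from the slide.

```python
import numpy as np

def smooth_hinge(z):
    """Smooth hinge: quadratically interpolates the hinge on (0, 1) so the
    loss is differentiable everywhere (assumed piecewise-quadratic form)."""
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, 0.5 - z,
           np.where(z >= 1, 0.0, 0.5 * (1.0 - z)**2))

def smooth_hinge_grad(z):
    """Derivative: -1 for z <= 0, z - 1 on (0, 1), 0 for z >= 1 --
    identical to the hinge gradient outside (0, 1)."""
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, -1.0,
           np.where(z >= 1, 0.0, z - 1.0))

print(smooth_hinge([-1.0, 0.5, 2.0]))       # [1.5   0.125  0.]
print(smooth_hinge_grad([-1.0, 0.5, 2.0]))  # [-1.  -0.5    0.]
```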
Local Minima

Factorized objective:
  minU,V  (||U||Fro² + ||V||Fro²)/2 + c · loss(U V’, Y)

• Optimize U, V with gradient descent (may stop at a local minimum; see the sketch below)
• Optimize X with an SDP (global optimum of the convex problem)

[Figure: the two optimizers compared on a 100 x 100 MovieLens submatrix, 65% sparse]
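A minimal sketch of the gradient-descent strategy on the factorized objective. For brevity it uses a squared loss on the observed entries in place of the paper's smooth-hinge all-thresholds loss, and the problem sizes, learning rate, and step count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k, c, lr = 30, 20, 5, 1.0, 0.01

Y = rng.integers(1, 6, size=(n, m)).astype(float)   # toy 1-5 ratings
observed = rng.random((n, m)) < 0.35                 # ~35% of entries observed

U = 0.1 * rng.standard_normal((n, k))
V = 0.1 * rng.standard_normal((m, k))

for step in range(500):
    E = (U @ V.T - Y) * observed    # residuals on observed entries only
    grad_U = U + c * (E @ V)        # gradient of regularizer + squared loss
    grad_V = V + c * (E.T @ U)
    U -= lr * grad_U
    V -= lr * grad_V

E = (U @ V.T - Y) * observed
obj = 0.5 * (np.sum(U**2) + np.sum(V**2)) + 0.5 * c * np.sum(E**2)
print(f"objective after 500 steps: {obj:.2f}")
```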


Collaborative Prediction Results

Dataset (size, sparsity):   EachMovie (36656 x 1648, 96%)    MovieLens (6040 x 3952, 96%)

Algorithm     Weak NMAE   Strong NMAE     Weak NMAE   Strong NMAE
URP           .4422       .4557           .4341       .4444
Attitude      .4520       .4550           .4320       .4375
MMMF          .4397       .4341           .4156       .4203

(NMAE: normalized mean absolute error; lower is better.)
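For reference, NMAE divides the mean absolute error by the expected error of random guessing; for 1-5 ratings that constant is commonly taken to be 1.6, which is an assumption about the normalization behind the table above rather than something stated on this slide.

```python
import numpy as np

def nmae(y_true, y_pred, norm=1.6):
    """Normalized MAE. The constant 1.6 is the expected MAE when both true
    ratings and predictions are uniform over {1,...,5} (assumed normalization)."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))) / norm

print(round(nmae([5, 3, 1, 4], [4, 3, 2, 5]), 4))  # 0.4688
```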
Summary
• We scaled MMMF to large problems by optimizing the
Factorized Objective
• Empirical tests indicate that local minima issues are rare or
absent
• We compare against results obtained by Marlin (2004) and
find that MMMF substantially outperforms all nine methods
he tested.
THANK YOU
