26 Matrix Factorization

This document discusses matrix factorization for machine-learning-based recommendation. It begins by introducing the problem of predicting missing ratings in a sparse user-item matrix. It then presents matrix factorization as a machine learning approach in which latent factors derived from user and item matrices interact to produce ratings, and these latent factors are learned through optimization to predict the missing ratings accurately. The document explains how matrix factorization is trained with either stochastic gradient descent, which learns both factor matrices simultaneously, or alternating least squares, which solves for one factor matrix while holding the other fixed. It concludes by noting that autoencoders can also be used for matrix-factorization-style recommendation.


Machine Learning

Matrix Factorization

Dong-Kyu Chae

PI of the Data Intelligence Lab @HYU


Department of Computer Science
Hanyang University
Problem Setting
❑ Suppose we have a sparse user-item rating matrix
❑ Which rating will be given on each empty cell?
Problem Setting
❑ This is a typical recommendation scenario
❑ Several ways to predict future ratings
❑ Simple approach: popularity sorting, highest average rating, … (see the sketch after this list)
▪ Simple and easy-to-implement
▪ Global recommendation (non-personalized)
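
As a quick illustration, here is a minimal NumPy sketch of these non-personalized baselines; the toy matrix `R` and the 0-means-missing convention are assumptions for the example:

```python
import numpy as np

# Hypothetical 4x5 user-item matrix; 0 marks a missing rating.
R = np.array([[5, 3, 0, 1, 0],
              [4, 0, 0, 1, 2],
              [1, 1, 0, 5, 0],
              [0, 1, 5, 4, 0]], dtype=float)

observed = R > 0
popularity = observed.sum(axis=0)                       # number of ratings per item
avg_rating = R.sum(axis=0) / np.maximum(popularity, 1)  # mean over observed ratings

# Non-personalized: every user receives the same ranking.
print(np.argsort(-popularity))   # items sorted by popularity
print(np.argsort(-avg_rating))   # items sorted by average rating
```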
Problem Setting
❑ This is a typical recommendation scenario
❑ Several ways to predict future ratings
❑ Heuristic approach: Neighborhood-based
▪ Look for the K most similar users (neighbors)
▪ Aggregate their mean-centered ratings on the target movie (sketched below):

$$\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{u' \in N} \mathrm{sim}(u, u')\,\left(r_{u',i} - \bar{r}_{u'}\right)}{\sum_{u' \in N} \left|\mathrm{sim}(u, u')\right|}$$

• $N$: set of neighbors of user $u$
• $\mathrm{sim}(u, u')$: similarity between users $u$ and $u'$
• $\bar{r}_u$: average of the ratings of user $u$
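
A minimal sketch of this neighborhood-based predictor, assuming cosine similarity over co-rated items and the 0-means-missing toy matrix from the previous sketch; the function name and `K` are illustrative choices:

```python
import numpy as np

def predict_user_based(R, u, i, K=2):
    """Predict r(u, i) by aggregating the mean-centered ratings of
    the K users most similar to u among those who rated item i."""
    observed = R > 0
    # Per-user average over observed ratings only.
    means = R.sum(axis=1) / np.maximum(observed.sum(axis=1), 1)

    def sim(a, b):  # cosine similarity on co-rated items
        mask = observed[a] & observed[b]
        if not mask.any():
            return 0.0
        x, y = R[a, mask], R[b, mask]
        d = np.linalg.norm(x) * np.linalg.norm(y)
        return float(x @ y / d) if d else 0.0

    cand = [v for v in range(R.shape[0]) if v != u and observed[v, i]]
    N = sorted(cand, key=lambda v: sim(u, v), reverse=True)[:K]
    num = sum(sim(u, v) * (R[v, i] - means[v]) for v in N)
    den = sum(abs(sim(u, v)) for v in N)
    # Fall back to the user's own average if no usable neighbor exists.
    return means[u] + num / den if den else means[u]

# e.g., with R from the previous sketch: predict_user_based(R, u=0, i=2)
```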
Problem Setting
❑ This is a typical recommendation scenario
❑ Several ways to predict future ratings
❑ Machine learning approach?
▪ What is the ground truth? The observed ratings
▪ What is our model? Matrix factorization! (a.k.a. latent factor models)
Matrix Factorization
❑ Ratings come from interaction between latent factors
❑ Assumption: there are user latent factors and item latent factors that
result in the rating score.

<figure: a user's latent factors [a, b, c] and an item's latent factors [a′, b′, c′] combine through a linear interaction (dot product) to produce the ratings, e.g., 5 and 3>
Matrix Factorization
❑ Latent factors
❑ Features that describe characteristics of users and items hidden in data
❑ We don't know the explicit meaning of each factor in the latent matrices, but there might be "some" meaning

$$R \approx U V^{\top}$$

• $U$: a low-rank matrix of user latent factors (e.g., $U_{2,1}$: how much user 2 cares about the first factor)
• $V$: a low-rank matrix of item latent factors (e.g., $V_{3,1}$: how much item 3 satisfies the first factor)
Matrix Factorization
❑ Perspective of latent space
Matrix Factorization
❑ How can I know such latent factor values?
❑ "Learn" the latent factors $U$ and $V$ that may have generated the ratings
❑ All the latent factors in 𝑈 and 𝑉 are model parameters
❑ Randomly initialized, and learned through some optimization
techniques.
Objective:
$$\min_{U,V} \sum_{(u,i)\in\Omega} \left(r_{ui} - U_u V_i^{\top}\right)^2 \qquad \left(\Omega: \text{the set of observed } (u,i) \text{ pairs}\right)$$
Matrix Factorization
❑ Rating prediction with the fully learned $U$ and $V$
❑ We can reconstruct the "dense" rating matrix through $U \cdot V^{\top}$ (sketched below)
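
Once $U$ and $V$ are learned, filling in every empty cell is a single matrix product. A tiny NumPy sketch (random factors stand in for learned ones; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 4, 5, 3                        # users, items, latent factors

U = rng.normal(scale=0.1, size=(m, k))   # user latent factors (one row per user)
V = rng.normal(scale=0.1, size=(n, k))   # item latent factors (one row per item)

R_hat = U @ V.T                          # dense m x n predicted rating matrix
print(R_hat[1, 2])                       # predicted rating of user 1 on item 2
```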
Matrix Factorization
❑ Objective function: minimizing the sum of squared errors

$$\min_{U,V} \sum_{(u,i)\in\Omega} \left(r_{ui} - U_u V_i^{\top}\right)^2$$
❑ Practically: there is an overfitting issue

(e.g., an unregularized model can produce extreme predictions such as 100 or −81 on a 1–5 rating scale)
Matrix Factorization
❑ Objective function with the regularization term:

$$\min_{U,V} \sum_{(u,i)\in\Omega} \left(r_{ui} - U_u V_i^{\top}\right)^2 + \lambda \left(\lVert U \rVert_F^2 + \lVert V \rVert_F^2\right)$$
❑ The goodness-of-fit term reduces the prediction error


❑ The regularization term is used to alleviate the overfitting problem

❑ Two ways to train the MF model


❑ Stochastic gradient descent (SGD)
▪ Training U and V simultaneously
❑ Alternating least squares (ALS)
▪ Fix one of U and V, and then optimize the other
Stochastic Gradient Descent
❑ Objective:
$$\min_{U,V} \sum_{(u,i)\in\Omega} \left(r_{ui} - U_u V_i^{\top}\right)^2 + \lambda \left(\lVert U \rVert_F^2 + \lVert V \rVert_F^2\right)$$

❑ Process:

1. Initialize U and V randomly


2. Repeat:
   1. Choose an observed rating (r, u, i) at random
   2. The loss on the chosen (u, i) is:
      $$L = \frac{1}{2}\left(e_{ui}^2 + \lambda \lVert U_u \rVert^2 + \lambda \lVert V_i \rVert^2\right), \qquad e_{ui} = r_{ui} - U_u V_i^{\top}$$
   3. Update via gradient descent with learning rate $\eta$:
      $$U_u \leftarrow U_u - \eta\,\frac{\partial L}{\partial U_u}, \qquad V_i \leftarrow V_i - \eta\,\frac{\partial L}{\partial V_i}$$
Stochastic Gradient Descent
❑ Partial derivative of the loss $L = \frac{1}{2}\left(e_{ui}^2 + \lambda \lVert U_u \rVert^2 + \lambda \lVert V_i \rVert^2\right)$:

$$\frac{\partial L}{\partial U_u} = \frac{1}{2}\left(2\,e_{ui}\,e_{ui}' + 2\lambda U_u\right) = e_{ui}\,e_{ui}' + \lambda U_u = e_{ui} \cdot \left(-V_i\right) + \lambda U_u$$

$$\frac{\partial L}{\partial V_i} = \frac{1}{2}\left(2\,e_{ui}\,e_{ui}' + 2\lambda V_i\right) = e_{ui}\,e_{ui}' + \lambda V_i = e_{ui} \cdot \left(-U_u\right) + \lambda V_i$$

(here $e_{ui}'$ denotes the derivative of $e_{ui}$ with respect to the factor being differentiated)
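
Putting the derived gradients into the update rule gives the following minimal NumPy sketch; the toy (user, item, rating) triples, the hyperparameters, and the helper name `mf_sgd` are assumptions for illustration:

```python
import numpy as np

def mf_sgd(ratings, m, n, k=3, lam=0.1, lr=0.01, epochs=200, seed=0):
    """SGD for MF with the updates U_u += lr*(e*V_i - lam*U_u) and
    V_i += lr*(e*U_u - lam*V_i), where e = r - U_u . V_i."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    for _ in range(epochs):
        for t in rng.permutation(len(ratings)):  # visit ratings in random order
            u, i, r = ratings[t]
            e = r - U[u] @ V[i]                  # prediction error e_ui
            Uu = U[u].copy()                     # both gradients use the old U_u
            U[u] += lr * (e * V[i] - lam * U[u])
            V[i] += lr * (e * Uu - lam * V[i])
    return U, V

# Toy observed ratings as (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 3, 1.0), (1, 0, 4.0), (1, 4, 2.0),
           (2, 3, 5.0), (3, 2, 5.0), (3, 3, 4.0)]
U, V = mf_sgd(ratings, m=4, n=5)
print(U[0] @ V[0])   # approaches 5.0 (regularization shrinks it slightly)
```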


Alternating Least Squares
❑ Fix one of U and V; the loss surface then becomes simple (quadratic in the remaining factor), so setting the derivative to zero yields a closed-form solution:

$$\frac{\partial L}{\partial U_u} = 0$$
Alternating Least Squares
❑ Objective:
$$\min_{U,V} \sum_{(u,i)\in\Omega} \left(r_{ui} - U_u V_i^{\top}\right)^2 + \lambda \left(\lVert U \rVert_F^2 + \lVert V \rVert_F^2\right)$$

❑ Procedure

1. Initialize the user latent factors U randomly


2. For each item $i$, let $r_i$ be the vector of ratings of that item. Compute (here, $U$ is constant):

   $$V_i^{\top} = \left(U^{\top} U + \lambda I\right)^{-1} U^{\top} r_i$$

3. For each user $u$, let $r_u$ be the vector of ratings of that user. Compute (here, $V$ is constant):

   $$U_u^{\top} = \left(V^{\top} V + \lambda I\right)^{-1} V^{\top} r_u$$

4. Repeat steps 2 and 3 until convergence


Alternating Least Squares
❑ Partial derivative of the loss
❑ When $U$ is constant, the loss for item $i$ is:

$$L = \left(r_i - U V_i^{\top}\right)^{\top} \left(r_i - U V_i^{\top}\right) + \lambda V_i V_i^{\top}$$

Let $\beta = V_i^{\top}$. Then,

$$L = \left(r_i - U\beta\right)^{\top}\left(r_i - U\beta\right) + \lambda \beta^{\top} \beta$$

Set $\frac{\partial L}{\partial \beta} = -2U^{\top}\left(r_i - U\beta\right) + 2\lambda\beta = 0$. Then,

$$\beta = V_i^{\top} = \left(U^{\top} U + \lambda I\right)^{-1} U^{\top} r_i$$

(repeat for $i = 1$ to $n$)
Alternating Least Squares
❑ Partial derivative of the loss
❑ When $V$ is constant, the loss for user $u$ is derived analogously.

Let $r_u = R_u^{\top}$ (the ratings of user $u$ as a column vector). Then we can re-write the loss:

$$L = \left(r_u - V U_u^{\top}\right)^{\top}\left(r_u - V U_u^{\top}\right) + \lambda U_u U_u^{\top}$$

Let $\beta = U_u^{\top}$. Then,

$$L = \left(r_u - V\beta\right)^{\top}\left(r_u - V\beta\right) + \lambda \beta^{\top} \beta$$

Set $\frac{\partial L}{\partial \beta} = -2V^{\top}\left(r_u - V\beta\right) + 2\lambda\beta = 0$. Then,

$$\beta = U_u^{\top} = \left(V^{\top} V + \lambda I\right)^{-1} V^{\top} r_u$$

(repeat for $u = 1$ to $m$)
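
A minimal NumPy sketch of these alternating closed-form solves. One practical detail beyond the slides (an assumption here): each solve restricts $U$, $V$, and the rating vector to the observed entries of the corresponding row or column:

```python
import numpy as np

def mf_als(R, mask, k=3, lam=0.1, iters=20, seed=0):
    """ALS: alternate the two closed-form solutions derived above.
    R: m x n ratings; mask: boolean matrix of observed entries."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    I = np.eye(k)
    for _ in range(iters):
        # Fix U: V_i = (U^T U + lam*I)^{-1} U^T r_i for each item i.
        for i in range(n):
            obs = mask[:, i]
            if obs.any():
                A = U[obs]                   # rows of U for users who rated i
                V[i] = np.linalg.solve(A.T @ A + lam * I, A.T @ R[obs, i])
        # Fix V: U_u = (V^T V + lam*I)^{-1} V^T r_u for each user u.
        for u in range(m):
            obs = mask[u]
            if obs.any():
                B = V[obs]                   # rows of V for items rated by u
                U[u] = np.linalg.solve(B.T @ B + lam * I, B.T @ R[u, obs])
    return U, V

# Toy usage with the 0-means-missing convention.
R = np.array([[5., 3., 0., 1., 0.],
              [4., 0., 0., 1., 2.],
              [1., 1., 0., 5., 0.],
              [0., 1., 5., 4., 0.]])
U, V = mf_als(R, mask=R > 0)
print((U @ V.T).round(1))   # dense reconstruction of R
```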
Autoencoder for MF
❑ Same objective function, different model
❑ Matrix factorization => Autoencoder
❑ Two types of Autoencoders for recommendation
❑ user-based and item-based
❑ Users and items are represented as vectors

Learned embeddings
[ x,
3,




x,
5]
reconstruction loss
2
ℒ𝑚𝑠𝑒 =σ𝑖 (𝐫𝒊 − 𝐫ො𝒊 ) ∙ 𝐲𝒊
Actual Predicted Indicator
ratings ratings {0, 1}
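
A minimal PyTorch sketch of an item-based autoencoder trained with this masked loss, in the spirit of AutoRec; the architecture, sizes, and toy data are assumptions for illustration:

```python
import torch
import torch.nn as nn

class AutoRec(nn.Module):
    """Item-based autoencoder: encode an item's partially observed rating
    vector into a latent embedding, then decode it back to predictions."""
    def __init__(self, num_users, hidden=16):
        super().__init__()
        self.encoder = nn.Linear(num_users, hidden)
        self.decoder = nn.Linear(hidden, num_users)

    def forward(self, r):
        return self.decoder(torch.sigmoid(self.encoder(r)))

# Toy data: rows are items, columns are users; 0 marks a missing rating.
R = torch.tensor([[5., 4., 1., 0.],
                  [3., 0., 1., 1.],
                  [0., 0., 0., 5.],
                  [1., 1., 5., 4.],
                  [0., 2., 0., 0.]])
y = (R > 0).float()                           # indicator y_i of observed entries

model = AutoRec(num_users=R.shape[1])
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(300):
    opt.zero_grad()
    loss = (((R - model(R)) ** 2) * y).sum()  # masked reconstruction loss
    loss.backward()
    opt.step()

print(model(R).detach().round())              # predictions, incl. missing cells
```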
Thank You
