
Factorization Machines

- Introduction
Bartłomiej Twardowski
18.10.2016
Warsaw Data Science Meetup
Polish English?
• Support Vector Machines => “maszyna wektorów nośnych”

• Matrix Factorization => “faktoryzacja macierzy”

• Factorization Machines => “maszyna faktoryzująca”?

• LMGTFY :-) Let's stick to the English name then!
Motivation
• one of the most successful models, with a great deal of expressiveness

• great for getting started with context-aware recommendations

• considered a base toolbox for advertisers/Kagglers

• an FFM presentation was given at RecSys 2016 (still, almost nothing new in it :-( )

• considered a fun and original subject for the meetup

2015.10.6 - meetup about recommender systems
Not motivated enough?
Success stories.
1. competitions

2. appears more and more often in DS job offers
Factorization Machines
• S. Rendle 2010 [1]

• combines the advantages of Support Vector Machines (SVM) with factorization models

• generic (real-valued features)

• incredibly good for sparse data

• model expressiveness
MF - quick recap
Simplest problem formulation [3]:

• U - user set, I - item set

• the matrix $R \in \mathbb{R}^{|U| \times |I|}$ contains the user ratings

• find the best representation in a k-dimensional latent space for users, P (|U| × k), and items, Q (|I| × k), so that the matrix $\hat{R}$ is defined as:

$\hat{R} = P Q^T$

• to predict a rating:

$\hat{r}_{ui} = p_u^T q_i$
MF - quick recap

with regularization [4]:

$\min_{P, Q} \sum_{(u,i) \in S} \left( r_{ui} - p_u^T q_i \right)^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 \right)$
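A minimal sketch of minimizing this objective with SGD, on hypothetical toy data; the learning rate, regularization strength, and k below are illustrative values, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 5, 4, 3
P = 0.1 * rng.standard_normal((n_users, k))   # user latent factors
Q = 0.1 * rng.standard_normal((n_items, k))   # item latent factors
S = [(0, 1, 5.0), (0, 2, 3.0), (1, 1, 4.0)]   # observed (user, item, rating)

lr, lam = 0.01, 0.05                          # illustrative hyperparameters
for epoch in range(50):
    for u, i, r in S:
        e = r - P[u] @ Q[i]                   # prediction error
        pu = P[u].copy()                      # keep old p_u for Q's update
        P[u] += lr * (e * Q[i] - lam * P[u])  # gradient step with L2 penalty
        Q[i] += lr * (e * pu - lam * Q[i])
```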
Linear & Poly2 models
simple linear regression model:
n
X
ŷ(x) = w0 + w i xi
i=1

adding two-way interactions:


n
X n
X n
X
ŷ(x) = w0 + w i xi + vi,j xi xj
i=1 i=1 j=i+1
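As a sketch (dense numpy for clarity, names illustrative), the two predictors look like this; note that Poly2 needs one independent weight per feature pair, i.e. O(n²) parameters, and $v_{i,j}$ can only be learned for pairs that actually co-occur in the training data - exactly what fails under sparsity:

```python
import numpy as np

def linear_predict(x, w0, w):
    """Linear model: w0 + sum_i w_i x_i."""
    return w0 + w @ x

def poly2_predict(x, w0, w, V2):
    """Poly2 model: adds an independent weight V2[i, j] per feature pair."""
    y = w0 + w @ x
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            y += V2[i, j] * x[i] * x[j]
    return y
```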
FM Model
for two-way interactions:

$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j$

model parameters:

$w_0 \in \mathbb{R}, \quad \mathbf{w} \in \mathbb{R}^n, \quad V \in \mathbb{R}^{n \times k}$

For each $x_i$ we have a dedicated vector $v_i$ with k features. Then, instead of a weight $w_{ij}$ for each feature interaction, we have the dot product:

$\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{i,f} \, v_{j,f}$

Wait, it's O(kn²)! Not linear!
Making it O(kn)

$\begin{bmatrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1n} \\ x_{21} & x_{22} & x_{23} & \cdots & x_{2n} \\ x_{31} & x_{32} & x_{33} & \cdots & x_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{d1} & x_{d2} & x_{d3} & \cdots & x_{dn} \end{bmatrix}$
Simplified version
from the k = 1, n = 2 perspective, let:

$v_1 x_1 = a, \quad v_2 x_2 = b$

then:

$(a + b)^2 = a^2 + 2ab + b^2$

$ab = \frac{1}{2} \left[ (a + b)^2 - a^2 - b^2 \right]$

And now it looks very familiar :-)
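In the general case the same rearrangement yields (Lemma 3.1 in [1]):

$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j = \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} \, x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 \, x_i^2 \right]$

which costs O(kn). A minimal numpy sketch of the resulting prediction (dense for clarity; with sparse x only the non-zero entries need to be touched):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """x: (n,) features, w0: bias, w: (n,) weights, V: (n, k) factors."""
    s = V.T @ x                   # (k,) per-factor weighted sums
    s2 = (V ** 2).T @ (x ** 2)    # (k,) per-factor sums of squares
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)
```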


FM vs SVM
• FM combines the advantages of SVMs and factorization models

• general prediction working on real values (like SVM)

• estimates interactions well under huge sparsity, where SVMs fail (e.g. recommender systems)

• the model equation of FMs can be calculated in linear time

• comparable to a polynomial kernel in SVM, but works for very sparse data and is fast
Use case: Context-Aware Recommender Systems
• U = {Alice (A), Bob (B), Charlie (C), . . .}

• I = {Titanic (TI), Notting Hill (NH), Star Wars (SW), Star Trek (ST), . . .}

• S = {(A, TI, 2010-1, 5), (A, NH, 2010-2, 3), (A, SW, 2010-4, 1), (B, SW, 2009-5, 4), (B, ST, 2009-8, 5), (C, TI, 2009-9, 1), (C, SW, 2009-12, 5)}

• Example from [1]


Example of input data preparation
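A hedged sketch of how the transactions in S could be turned into the sparse feature vectors an FM consumes: a one-hot block for the user, a one-hot block for the item, plus real-valued context such as time. The index layout and time normalization here are illustrative choices, not the exact encoding from [1]:

```python
users = ["A", "B", "C"]
items = ["TI", "NH", "SW", "ST"]

def encode(user, item, months_since_start, n_months=60):
    """Build one feature vector x for a (user, item, time) event."""
    x = [0.0] * (len(users) + len(items) + 1)
    x[users.index(user)] = 1.0                # one-hot user block
    x[len(users) + items.index(item)] = 1.0   # one-hot item block
    x[-1] = months_since_start / n_months     # real-valued time feature
    return x

x = encode("A", "TI", 1)   # (Alice, Titanic, 2010-1) from the example above
```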
Why use FM for this?
The drawback of tensor factorization models, and even more so of specialized factorization models, is that [1]:

(1) they are not applicable to standard prediction data (e.g. a real-valued feature vector)

(2) specialized models are usually derived individually for a specific task, requiring effort in modeling and in the design of a learning algorithm.
How about ranking?

http://www.tongji.edu.cn/~qiliu/lor_vs.html

Go for the pairwise approach!
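One common pairwise formulation is a BPR-style logistic loss over the difference of model scores for a preferred and a non-preferred example; a minimal sketch under that assumption (the `score` argument could be, e.g., the `fm_predict` sketch above):

```python
import numpy as np

def pairwise_loss(score, x_pos, x_neg):
    """-log sigmoid(score(x_pos) - score(x_neg)); minimizing it pushes
    preferred examples above non-preferred ones - only the order matters."""
    diff = score(x_pos) - score(x_neg)
    return np.logaddexp(0.0, -diff)   # = -log(sigmoid(diff)), computed stably
```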


Model expressiveness
FM ~ MF
given a feature vector x with only two non-zero entries, indicators for the active user u and item i (i.e. $x_u = 1$, $x_i = 1$, all other $x_j = 0$):

the model will then mimic a biased MF:

$\hat{y}(x) = w_0 + w_u + w_i + \langle v_u, v_i \rangle$
FM ~ PITF
given user × item × tag interactions encoded as three one-hot indicators (u, i, t), and pairwise ranking optimization over tags (the terms that do not depend on the tag drop out), FM will mimic a pairwise interaction tensor factorization model (PITF) [7]:

$\hat{y}(u, i, t) = \langle v_u, v_t \rangle + \langle v_i, v_t \rangle$
And others

(e.g. factorized NN, KNN++, SVD++, …)

presented in [2].
Field-aware FM
• Has been used to win two CTR competitions [5].

• Introduces grouped features - fields, e.g. user, color, time.

• Learns a different set of latent factors for every pair of fields:

$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_{i,f(j)}, v_{j,f(i)} \rangle \, x_i x_j$

where f(i) is the field of feature i.
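A minimal sketch of the interaction term above (shapes and names are illustrative, not the libffm API): each feature i keeps one latent vector per field, and an interaction between i and j looks up i's vector for j's field and vice versa:

```python
import numpy as np

def ffm_predict(x, field, w0, w, V):
    """x: (n,) features, field: (n,) field index of each feature,
    V: (n, n_fields, k) latent vectors, one per (feature, field) pair."""
    y = w0 + w @ x
    nz = np.nonzero(x)[0]                 # only non-zero features interact
    for a, i in enumerate(nz):
        for j in nz[a + 1:]:
            y += (V[i, field[j]] @ V[j, field[i]]) * x[i] * x[j]
    return y
```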


Available implementations
• libfm (http://www.libfm.org/), SGD/ALS/MCMC

• FM for Julia (https://github.com/btwardow/FactorizationMachines.jl)

• fastFM (https://github.com/ibayer/fastFM)

• DiFacto (https://github.com/dmlc/difacto)

• lightfm

• spark-libFM, libffm
My experiments with FM on GPU

The same implementation moved from numpy to Theano was ~7x faster, without using any special GPU tricks!
Going for click prediction?

• feature engineering (counting features, like historical CTR)

• hashing trick (see the sketch below)

• L1, FTRL, using e.g. vw

• making new features - e.g. decision tree encoding
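A sketch of the hashing trick mentioned above: hash raw "field=value" strings into a fixed-size index space, so no feature dictionary has to be kept and unseen values still map to a column; collisions are simply accepted as noise. The 2^20 buckets and md5 are illustrative choices (vw uses murmurhash):

```python
import hashlib

D = 2 ** 20   # fixed number of feature buckets (illustrative)

def hash_feature(field, value):
    """Map an arbitrary categorical value to a stable column index in [0, D)."""
    h = hashlib.md5(f"{field}={value}".encode()).digest()
    return int.from_bytes(h[:8], "little") % D

idx = hash_feature("site_id", "abc123")   # column index in a sparse vector
```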


How about now? :-)
References
[1] Rendle, Steffen. "Factorization machines." 2010 IEEE International Conference on Data Mining. IEEE, 2010.

[2] Rendle, Steffen. "Factorization machines with libFM." ACM Transactions on Intelligent Systems and Technology (TIST) 3.3 (2012): 57.

[3] Takács, Gábor, et al. "Matrix factorization and neighbor based algorithms for the Netflix prize problem." Proceedings of the 2008 ACM Conference on Recommender Systems. ACM, 2008.

[4] Paterek, Arkadiusz. "Improving regularized singular value decomposition for collaborative filtering." Proceedings of KDD Cup and Workshop. Vol. 2007. 2007.
References
[5] http://www.csie.ntu.edu.tw/~r01922136/slides/ffm.pdf

[6] Srebro, Nathan, Jason D. M. Rennie, and Tommi S. Jaakkola. "Maximum-margin matrix factorization." Advances in Neural Information Processing Systems 17. MIT Press, 2005: 1329–1336.

[7] Rendle, Steffen, and Lars Schmidt-Thieme. "Pairwise interaction tensor factorization for personalized tag recommendation." Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM '10). ACM, New York, NY, 2010: 81–90.
Q&A
@btwardow, Bartłomiej Twardowski
[email protected]
