Rec Sys
Pawan Goyal
CSE, IITKGP
What is given?
User model: ratings, preferences, demographics, situational context
Items: with or without description of item characteristics
Find
Relevance score: used for ranking
Final Goal
Recommend items that are assumed to be relevant
But
Remember that relevance might be context-dependent
Characteristics of the list as a whole might also be important (e.g., diversity)
Approach
Use the “wisdom of the crowd” to recommend items
Given an active user Alice and an item i not yet seen by Alice
The goal is to estimate Alice’s rating for this item, e.g., by
Find a set of users who liked the same items as Alice in the past and who have rated item i
Use, e.g., the average of their ratings to predict whether Alice will like item i
Do this for all items Alice has not seen and recommend the best-rated ones
Pearson Correlation
$$sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\ \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}$$
a, b: users
$r_{a,p}$: rating of user a for item p
P: set of items rated by both a and b
$\bar{r}_a$, $\bar{r}_b$: the users' average ratings
Possible similarity values lie between -1 and 1
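The Pearson similarity can be sketched in a few lines of Python. This is a minimal sketch that assumes ratings are stored in a hypothetical dict `ratings`, mapping each user to a `{item: rating}` dict. It takes $\bar{r}_a$ over all items user a has rated. Some variants instead average only over the co-rated items P.

```python
import math


def pearson_sim(ratings, a, b):
    """Pearson correlation between users a and b over their co-rated items P.

    ratings: dict mapping user -> {item: rating} (assumed layout).
    Means are taken over all items each user rated (one common convention).
    """
    common = set(ratings[a]) & set(ratings[b])  # P: items rated by both a and b
    if not common:
        return 0.0
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_b = sum(ratings[b].values()) / len(ratings[b])
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
    den_a = math.sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in common))
    den_b = math.sqrt(sum((ratings[b][p] - mean_b) ** 2 for p in common))
    if den_a == 0 or den_b == 0:
        return 0.0  # undefined when one user's deviations are all zero; treat as no signal
    return num / (den_a * den_b)


ratings = {
    "a": {"i1": 1, "i2": 2, "i3": 3},
    "b": {"i1": 2, "i2": 4, "i3": 6},  # same pattern, larger scale
    "c": {"i1": 3, "i2": 2, "i3": 1},  # opposite pattern
}
print(round(pearson_sim(ratings, "a", "b"), 3), round(pearson_sim(ratings, "a", "c"), 3))  # prints 1.0 -1.0
```

Users "a" and "b" get similarity 1 even though "b" rates on a larger scale. Pearson compares deviations from each user's own mean, not absolute values.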
Calculate whether the neighbors' ratings for the unseen item i are higher or lower than their own average
Combine the rating differences, using similarity as a weight
Add/subtract the neighbors' bias to/from the active user's average and use this as a prediction
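These three steps correspond to the usual neighborhood prediction:

$pred(a, i) = \bar{r}_a + \frac{\sum_{b \in N} sim(a,b)(r_{b,i} - \bar{r}_b)}{\sum_{b \in N} sim(a,b)}$

The sketch below reuses the hypothetical `ratings` layout and Pearson similarity. It takes N to be the k most similar users who rated i with positive similarity. The value of k and the positive-only filter are assumptions; the slides do not prescribe them.

```python
import math


def mean(d):
    return sum(d.values()) / len(d)


def pearson_sim(ratings, a, b):
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    ma, mb = mean(ratings[a]), mean(ratings[b])
    num = sum((ratings[a][p] - ma) * (ratings[b][p] - mb) for p in common)
    den = math.sqrt(sum((ratings[a][p] - ma) ** 2 for p in common)) * math.sqrt(
        sum((ratings[b][p] - mb) ** 2 for p in common)
    )
    return num / den if den else 0.0


def predict_user_based(ratings, a, item, k=2):
    """Predict user a's rating for item from the k most similar users who rated it."""
    mean_a = mean(ratings[a])
    candidates = []
    for b, items in ratings.items():
        if b != a and item in items:
            s = pearson_sim(ratings, a, b)
            if s > 0:  # keep positively correlated neighbours only (assumption)
                candidates.append((s, b))
    candidates.sort(reverse=True)
    neighbours = candidates[:k]
    if not neighbours:
        return mean_a  # no usable neighbours: fall back to a's own average
    # Similarity-weighted deviations of each neighbour's rating from their own mean
    num = sum(s * (ratings[b][item] - mean(ratings[b])) for s, b in neighbours)
    den = sum(s for s, _ in neighbours)
    return mean_a + num / den


ratings = {
    "alice": {"i1": 1, "i2": 2, "i3": 3},
    "u1": {"i1": 2, "i2": 4, "i3": 6, "i4": 5},
}
print(predict_user_based(ratings, "alice", "i4"))
```

With a single neighbor the similarity cancels out. Here u1 rated i4 0.75 above u1's mean (4.25), so Alice's prediction is her mean plus 0.75, approximately 2.75.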
Basic Idea
Use the similarity between items to make predictions
For Instance
Look for items that are similar to Item5
Take Alice’s ratings for these items to predict the rating for Item5
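$$sim(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}| \cdot |\vec{b}|}$$
Adjusted cosine similarity: takes average user ratings into account by subtracting each user's average rating from their ratings before computing the cosine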
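Item-based prediction, such as predicting Alice's rating for Item5, can be sketched as follows. This is a minimal sketch that assumes the hypothetical `ratings` dict-of-dicts layout. It implements the adjusted cosine: each rating is centered on the rating user's mean, and the cosine is taken over users who rated both items. It then predicts with a similarity-weighted average of the user's own ratings, keeping only positively similar items (an assumption).

```python
import math


def adjusted_cosine(ratings, i, j):
    """Adjusted cosine between items i and j over users who rated both."""
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    users = [u for u, r in ratings.items() if i in r and j in r]
    num = sum((ratings[u][i] - means[u]) * (ratings[u][j] - means[u]) for u in users)
    den_i = math.sqrt(sum((ratings[u][i] - means[u]) ** 2 for u in users))
    den_j = math.sqrt(sum((ratings[u][j] - means[u]) ** 2 for u in users))
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (den_i * den_j)


def predict_item_based(ratings, user, target):
    """Weighted average of the user's ratings for items similar to target."""
    pairs = [(adjusted_cosine(ratings, j, target), r)
             for j, r in ratings[user].items() if j != target]
    pairs = [(s, r) for s, r in pairs if s > 0]  # keep positively similar items (assumption)
    if not pairs:
        return None  # no similar items rated by this user
    return sum(s * r for s, r in pairs) / sum(s for s, _ in pairs)


ratings = {
    "u1": {"A": 5, "B": 1, "T": 5},
    "u2": {"A": 1, "B": 5, "T": 1},
    "alice": {"A": 4, "B": 2},
}
print(predict_item_based(ratings, "alice", "T"))
```

Item A moves with T and B moves against it, so only A is used. The prediction is therefore close to Alice's rating of A (4).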
Explicit ratings
Most commonly used (1 to 5, 1 to 10 response scales)
Research topic: what about multi-dimensional ratings?
Challenge: sparse rating matrices; how can users be encouraged to rate more items?
Implicit ratings
Clicks, page views, time spent on a page, demo downloads, ...
Can be used in addition to explicit ratings, but it is unclear whether they are interpreted correctly
Straightforward approach
Use another method (e.g., content-based, demographic or simply
non-personalized) in the initial phase
Alternatives
Use better algorithms (beyond nearest-neighbor approaches)
Example: Assume “transitivity” of neighborhoods
Recursive CF
Assume there is a very close neighbor n of u who, however, has not yet rated the target item i.
Apply the CF method recursively to predict a rating of item i for the neighbor n
Use this predicted rating instead of the rating of a more distant direct neighbor
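Recursive CF can be sketched as a depth-limited recursion on top of user-based prediction. This is a minimal sketch under stated assumptions: `depth` bounds the recursion, and a neighbor's recursively predicted rating is down-weighted by a hypothetical damping factor `w`. Similarity is Pearson, as above. The slides do not fix these details.

```python
import math


def mean(d):
    return sum(d.values()) / len(d)


def pearson_sim(ratings, a, b):
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    ma, mb = mean(ratings[a]), mean(ratings[b])
    num = sum((ratings[a][p] - ma) * (ratings[b][p] - mb) for p in common)
    den = math.sqrt(sum((ratings[a][p] - ma) ** 2 for p in common)) * math.sqrt(
        sum((ratings[b][p] - mb) ** 2 for p in common)
    )
    return num / den if den else 0.0


def recursive_predict(ratings, u, item, depth=1, k=3, w=0.5):
    """Depth-limited recursive user-based CF (depth, k, w are hypothetical parameters)."""
    mean_u = mean(ratings[u])
    scored = []  # (weight, rating used, neighbour)
    for n in ratings:
        if n == u:
            continue
        s = pearson_sim(ratings, u, n)
        if s <= 0:
            continue
        if item in ratings[n]:
            scored.append((s, ratings[n][item], n))  # neighbour with a real rating
        elif depth > 0:
            # Close neighbour without a rating: predict it recursively, damped by w
            r = recursive_predict(ratings, n, item, depth - 1, k, w)
            scored.append((w * s, r, n))
    scored.sort(key=lambda t: t[0], reverse=True)
    top = scored[:k]
    if not top:
        return mean_u
    num = sum(s * (r - mean(ratings[n])) for s, r, n in top)
    den = sum(s for s, _, _ in top)
    return mean_u + num / den


ratings = {
    "alice": {"i1": 1, "i2": 2, "i3": 3},
    "bob": {"i1": 1, "i2": 2, "i3": 3, "i4": 1, "i5": 3},  # very close to alice, no i9
    "carol": {"i4": 1, "i5": 3, "i9": 4},  # shares no items with alice
}
print(recursive_predict(ratings, "alice", "i9", depth=0))
print(recursive_predict(ratings, "alice", "i9", depth=1))
```

With `depth=0` Alice has no usable neighbor for i9, so the sketch falls back to her mean (2.0). With `depth=1`, Bob's rating is first predicted from Carol, and that prediction is then used for Alice, giving about 3.33.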
Basic Idea
Both users and items are characterized by vectors of factors, inferred
from item rating patterns
High correspondence between item and user factors leads to a
recommendation.
$$M_k = U_k \times \Sigma_k \times V_k^T$$
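The rank-k approximation $M_k = U_k \Sigma_k V_k^T$ can be computed with NumPy's `numpy.linalg.svd`. This is a minimal sketch that assumes a fully observed (or already imputed) matrix `M`. Plain SVD is not defined when entries are missing, which is one motivation for learning factor vectors from the observed ratings only.

```python
import numpy as np


def rank_k_approx(M, k):
    """Best rank-k approximation of M (in Frobenius norm) via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)  # s: singular values, descending
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


# Toy, fully observed user x item matrix (hypothetical values)
M = np.array([[5.0, 4.0, 1.0], [4.0, 5.0, 1.0], [1.0, 1.0, 5.0]])
M2 = rank_k_approx(M, 2)
print(np.round(M2, 2))
```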
Both users and items are mapped to a joint latent factor space of dimensionality f
User-item interactions are modeled as inner products in that space
Each item i is associated with a vector $q_i \in \mathbb{R}^f$, and each user u is associated with a vector $p_u \in \mathbb{R}^f$
$q_i$ measures the extent to which the item possesses the factors, positive or negative
$p_u$ measures the extent of interest the user has in items that are high on the corresponding factors, positive or negative
$q_i^T p_u$ captures the interaction between user u and item i
This approximates user u's rating of item i, denoted by $r_{ui}$:
$$\hat{r}_{ui} = q_i^T p_u$$
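In code, the prediction $\hat{r}_{ui} = q_i^T p_u$ is simply the dot product of two length-f vectors. The factor values below are made up for illustration (f = 2). In practice they are learned from data.

```python
def predict(q_i, p_u):
    """Approximate r_ui as the inner product q_i^T p_u of two f-dimensional vectors."""
    if len(q_i) != len(p_u):
        raise ValueError("q_i and p_u must both have length f")
    return sum(q * p for q, p in zip(q_i, p_u))


q_item = [1.0, -0.5]  # hypothetical item factors
p_user = [2.0, 1.0]   # hypothetical user factors
print(predict(q_item, p_user))  # 1.0*2.0 + (-0.5)*1.0 = 1.5
```

A negative factor in $q_i$ combined with a positive interest in $p_u$ lowers the predicted rating, which matches the "positive or negative" reading of the factors.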
Major Challenge
Computing the mapping of each item and user to factor vectors $q_i, p_u \in \mathbb{R}^f$
Matrix factorization is quite flexible in dealing with various data aspects and
other application-specific requirements.
Adding Biases
Some users systematically give higher ratings than others, and some items are widely perceived as better than others.
Hence the full rating value may not be explained solely by $q_i^T p_u$
Identify the portion that individual user or item biases can explain:
$$b_{ui} = \mu + b_i + b_u$$
An Example
You want a first-order estimate for user Joe’s rating of the movie Titanic.
Let the average rating over all movies, µ, be 3.7 stars
Titanic tends to be rated 0.5 stars above the average
Joe is a critical user who tends to rate 0.3 stars lower than the average
Thus, the estimate (bias) for Titanic's rating by Joe would be 3.7 + 0.5 − 0.3 = 3.9 stars
$$\hat{r}_{ui} = \mu + b_i + b_u + q_i^T p_u$$
Four components: global average, item bias, user bias, and user-item interaction
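The biased prediction $\hat{r}_{ui} = \mu + b_i + b_u + q_i^T p_u$ can be sketched as a small function. Leaving out the factor vectors gives the first-order estimate $b_{ui}$. The Joe/Titanic numbers come from the example above. Any factor vectors passed in would be hypothetical.

```python
def biased_prediction(mu, b_i, b_u, q_i=None, p_u=None):
    """mu + b_i + b_u, plus q_i^T p_u when both factor vectors are given."""
    baseline = mu + b_i + b_u  # first-order estimate b_ui
    if q_i is None or p_u is None:
        return baseline
    return baseline + sum(q * p for q, p in zip(q_i, p_u))


# Joe / Titanic example: mu = 3.7, b_i = +0.5, b_u = -0.3
print(round(biased_prediction(3.7, 0.5, -0.3), 2))  # prints 3.9
```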
The squared error function:
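The slide's formula did not survive extraction. A common regularized form is assumed here:

$\sum_{(u,i)\ \text{known}} (r_{ui} - \mu - b_i - b_u - q_i^T p_u)^2 + \lambda(\|q_i\|^2 + \|p_u\|^2 + b_u^2 + b_i^2)$

The sketch below minimizes it with stochastic gradient descent, one standard way to compute the factor vectors. The learning rate `lr`, regularization `reg` (λ), epoch count and initialization are hypothetical choices.

```python
import math
import random


def train_biased_mf(triples, n_users, n_items, f=2, lr=0.01, reg=0.02, epochs=100, seed=0):
    """SGD for r_hat = mu + b_i + b_u + q_i^T p_u on (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in triples) / len(triples)  # global average
    b_u = [0.0] * n_users
    b_i = [0.0] * n_items
    P = [[rng.gauss(0.0, 0.1) for _ in range(f)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(f)] for _ in range(n_items)]
    data = list(triples)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            e = r - (mu + b_u[u] + b_i[i] + sum(q * p for q, p in zip(Q[i], P[u])))
            # Gradient step on the regularized squared error of this single rating
            b_u[u] += lr * (e - reg * b_u[u])
            b_i[i] += lr * (e - reg * b_i[i])
            for k in range(f):
                p_old, q_old = P[u][k], Q[i][k]
                P[u][k] += lr * (e * q_old - reg * p_old)
                Q[i][k] += lr * (e * p_old - reg * q_old)
    return mu, b_u, b_i, P, Q


def predict(model, u, i):
    mu, b_u, b_i, P, Q = model
    return mu + b_u[u] + b_i[i] + sum(q * p for q, p in zip(Q[i], P[u]))


def rmse(model, triples):
    return math.sqrt(sum((r - predict(model, u, i)) ** 2 for u, i, r in triples) / len(triples))


# Toy data: (user index, item index, rating), hypothetical values
train = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
model = train_biased_mf(train, n_users=3, n_items=3, lr=0.05, epochs=300)
print(round(rmse(model, train), 3))
```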
$\sum_{i \in N(u)} x_i$
Normalizing the sum: $\frac{1}{\sqrt{|N(u)|}} \sum_{i \in N(u)} x_i$
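The normalized sum can be sketched as an adjustment to the user vector. This is a minimal sketch that assumes the SVD++-style reading. Under that reading, N(u) is the set of items for which user u gave implicit feedback, and each such item j has a factor vector $x_j \in \mathbb{R}^f$ that is added to $p_u$:

$\hat{r}_{ui} = \mu + b_i + b_u + q_i^T\left(p_u + |N(u)|^{-1/2}\sum_{j \in N(u)} x_j\right)$

All vectors below are hypothetical.

```python
import math


def implicit_user_vector(p_u, x, n_u):
    """p_u + |N(u)|^(-1/2) * sum of x_j over items j in N(u)."""
    if not n_u:
        return list(p_u)
    scale = 1.0 / math.sqrt(len(n_u))  # damps the growth of the sum as |N(u)| grows
    summed = [sum(x[j][k] for j in n_u) for k in range(len(p_u))]
    return [p + scale * s for p, s in zip(p_u, summed)]


def predict(mu, b_i, b_u, q_i, p_u, x, n_u):
    v = implicit_user_vector(p_u, x, n_u)
    return mu + b_i + b_u + sum(q * w for q, w in zip(q_i, v))


x = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [1.0, 0.0], "d": [1.0, 0.0]}  # hypothetical x_j
print(implicit_user_vector([0.5, 0.5], x, ["a", "b", "c", "d"]))  # [2.5, 0.5]
```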
Social Influence
Ratings are influenced by friends' ratings, i.e., friends are more likely to have similar ratings than strangers
Benefits
Can deal with cold-start users, as long as they are connected to the
social network
Exploit social influence, correlational influence, and transitivity
Explore the network to find raters in the neighborhood of the target user
Aggregate the ratings of these raters to predict the rating of the target
user
Different methods to calculate the “trusted neighborhood” of users
Predicted Rating
$$\hat{r}_{u,i} = \frac{\sum_{v \in raters} t_{u,v}\, r_{v,i}}{\sum_{v \in raters} t_{u,v}}$$
where $t_{u,v}$ is the trust of user u in rater v
Shortest distance?
Efficient
Considering only raters at a short distance gives high precision but low recall
One can consider raters up to a maximum depth d, a trade-off between precision (and efficiency) and recall
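Trust-based prediction with a maximum depth d can be sketched in two steps. A breadth-first search over the directed trust network finds the raters, and the trust-weighted average above combines their ratings. This is a minimal sketch. The graph and ratings layouts are assumptions. So is the trust value $t_{u,v} = 1/\text{distance}(u, v)$, which is hypothetical because the slides leave the trust computation open.

```python
from collections import deque


def trust_based_predict(graph, ratings, u, item, max_depth=2):
    """Trust-weighted average of ratings from raters within max_depth hops of u.

    graph: {user: [trusted users]} (directed); ratings: {user: {item: rating}}.
    Trust t_{u,v} = 1 / hop distance is a hypothetical choice for illustration.
    """
    dist = {u: 0}
    queue = deque([u])
    raters = []
    while queue:
        v = queue.popleft()
        if v != u and item in ratings.get(v, {}):
            raters.append(v)
        if dist[v] == max_depth:
            continue  # do not expand beyond the maximum depth d
        for w in graph.get(v, []):
            if w not in dist:  # BFS: the first visit gives the shortest distance
                dist[w] = dist[v] + 1
                queue.append(w)
    if not raters:
        return None
    trust = {v: 1.0 / dist[v] for v in raters}
    return sum(trust[v] * ratings[v][item] for v in raters) / sum(trust.values())


graph = {"u": ["a", "b"], "a": ["c"], "b": [], "c": []}
ratings = {"a": {"i": 4}, "c": {"i": 1}}
print(trust_based_predict(graph, ratings, "u", "i", max_depth=1))  # 4.0 (only a in range)
print(trust_based_predict(graph, ratings, "u", "i", max_depth=2))  # (1*4 + 0.5*1) / 1.5 = 3.0
```

Raising d from 1 to 2 brings rater c into the neighborhood. This improves recall, at the price of trusting a more distant and possibly less reliable rater.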
At step k, at node u:
If u has rated i, return $r_{u,i}$; otherwise
With probability $\phi_{u,i,k}$, stop the random walk, randomly select an item j rated by u and return $r_{u,j}$
With probability $1 - \phi_{u,i,k}$, continue the random walk to a direct neighbor of u
Pawan Goyal (IIT Kharagpur) Recommendation Systems October 29-30, 2015 51 / 61
Selecting $\phi_{u,i,k}$
$\phi_{u,i,k}$ gives the probability of staying at u to select one of its items at step k, while we are looking for a prediction on target item i
This probability should be related to the similarities between the items rated by u and the target item i; we consider the maximum similarity
The deeper we go into the network, the lower the probability of continuing the random walk should be, so $\phi_{u,i,k}$ should increase with k
$$\phi_{u,i,k} = \max_{j \in RI_u} sim(i, j) \times \frac{1}{1 + e^{-k/2}}$$
where $RI_u$ denotes the set of items rated by user u
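The stopping probability can be computed directly from the formula. This is a minimal sketch that takes the similarities sim(i, j) for the items j in $RI_u$ as input. Clipping the result to [0, 1] is an added assumption that covers negative similarities.

```python
import math


def phi(item_sims, k):
    """phi_{u,i,k} = max_{j in RI_u} sim(i, j) * 1 / (1 + e^(-k/2)).

    item_sims: sim(i, j) for every item j rated by u (i = target item).
    """
    if not item_sims:
        return 0.0  # u rated nothing, so the walk cannot stop here
    p = max(item_sims) / (1.0 + math.exp(-k / 2.0))
    return min(1.0, max(0.0, p))  # clip into [0, 1] (assumption for negative similarities)


for k in range(4):
    print(k, round(phi([0.2, 0.8], k), 3))
```

At k = 0 the sigmoid factor is 1/2, so the walk stops with probability 0.4 here. The factor grows toward 1 with k, so stopping becomes more likely deeper in the network.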
Selecting sim(i, j)
Let $UC_{i,j}$ be the set of common users who have rated both items i and j. We can define the correlation between items i and j as:
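The slide's formula for this correlation did not survive extraction. The sketch below therefore assumes the Pearson correlation of the two items' ratings over $UC_{i,j}$, with item means taken over those common users. Treat it as one plausible reading rather than the exact definition.

```python
import math


def item_correlation(ratings, i, j):
    """Pearson correlation of items i and j over UC_{i,j} (users who rated both)."""
    uc = [u for u, r in ratings.items() if i in r and j in r]
    if len(uc) < 2:
        return 0.0  # too few common users to estimate a correlation
    mean_i = sum(ratings[u][i] for u in uc) / len(uc)
    mean_j = sum(ratings[u][j] for u in uc) / len(uc)
    num = sum((ratings[u][i] - mean_i) * (ratings[u][j] - mean_j) for u in uc)
    den = math.sqrt(sum((ratings[u][i] - mean_i) ** 2 for u in uc)) * math.sqrt(
        sum((ratings[u][j] - mean_j) ** 2 for u in uc)
    )
    return num / den if den else 0.0


ratings = {
    "u1": {"item1": 1, "item5": 2},
    "u2": {"item1": 3, "item5": 6},
    "u3": {"item1": 5},  # not in UC, since u3 did not rate item5
}
print(item_correlation(ratings, "item1", "item5"))
```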
Three alternatives
Reaching a node which has expressed a rating on the target item i
At some user node u, decide to stay at the node and select one of the
items rated by u and return the rating for that item as result of the random
walk
The random walk might continue forever, so terminate it when it is very far from the source user (k > max-depth). What value should max-depth take?
“Six degrees of separation” suggests max-depth = 6
Perform several random walks as described above; the aggregate of all ratings returned by the different random walks is taken as the predicted rating $\hat{r}_{u_0,i}$.
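Estimated rating for source user u on target item i:
$$\hat{r}_{u,i} = \sum_{(v, j)} P(XY_{u,i} = (v, j))\, r_{v,j}$$
where $XY_{u,i}$ is the random variable for stopping the random walk at node v and selecting item j rated by v, and the sum runs over all such pairs (v, j)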
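Putting the pieces together, the estimate can be approximated by simulating many random walks and averaging the ratings they return. When every walk returns a rating, this average estimates the expectation over $XY_{u,i}$. This is a minimal sketch under the following assumptions:

- `graph` is a directed adjacency dict and `ratings` is a dict of dicts.
- `item_sim(i, j)` is a hypothetical function supplied by the caller.
- The neighbor and the stopping item are chosen uniformly, as on the slides.
- Walks that end without a rating are discarded.

```python
import math
import random


def random_walk(graph, ratings, source, target, item_sim, rng, max_depth=6):
    """One TrustWalker-style walk; returns a rating, or None if it ends empty-handed."""
    u = source
    for k in range(max_depth + 1):
        rated = ratings.get(u, {})
        if target in rated:
            return rated[target]  # reached a user who rated the target item
        if rated:
            # phi_{u,i,k}: stay at u with a probability that grows with k
            best = max(item_sim(target, j) for j in rated)
            phi = min(1.0, max(0.0, best)) / (1.0 + math.exp(-k / 2.0))
            if rng.random() < phi:
                j = rng.choice(list(rated))  # randomly pick an item rated by u
                return rated[j]
        neighbours = graph.get(u, [])
        if not neighbours:
            return None  # dead end
        u = rng.choice(neighbours)  # continue to a direct neighbour
    return None  # went deeper than max_depth


def trustwalker_predict(graph, ratings, source, target, item_sim, walks=2000, max_depth=6, seed=0):
    """Average the ratings returned by repeated random walks."""
    rng = random.Random(seed)
    results = []
    for _ in range(walks):
        r = random_walk(graph, ratings, source, target, item_sim, rng, max_depth)
        if r is not None:
            results.append(r)
    return sum(results) / len(results) if results else None


graph = {"alice": ["u1", "u2"], "u1": [], "u2": []}
ratings = {"u1": {"item5": 2}, "u2": {"item5": 4}}
print(trustwalker_predict(graph, ratings, "alice", "item5", lambda i, j: 0.0))
```

Alice has rated nothing, so every walk moves to u1 or u2 with equal probability. The average is therefore close to 3.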
Intuition
Can we incorporate social information into matrix factorization methods?
where $r_{ui}$ is the actual rating given by user u to item i and $\hat{r}_{ui}$ approximates user u's rating of item i; its simplest form is $q_i^T p_u$, though bias terms can also be incorporated.
Example Categories
Videos and DVDs
Books
Music
Toys
Software
Cars
...
Using the normalized trust matrix $S^{(c)*}$, a separate matrix factorization model is trained for each category c.
$$+\beta \sum_{\text{all } u} \left\| p_u^{(c)} - \sum_v S^{(c)*}_{u,v}\, p_v^{(c)} \right\|^2 + \lambda\left(\|q_i^{(c)}\|^2 + \|p_u^{(c)}\|^2\right)$$
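For one category c, the objective can be evaluated as follows. This is a minimal sketch under several assumptions:

- The missing first term is the squared error $\sum (r_{ui} - q_i^T p_u)^2$ over the category's known ratings.
- $S^{(c)*}$ is the trust matrix row-normalized so that each user's trust values sum to 1.
- Users with no trusted neighbors in c are skipped in the social term.
- The regularizer is summed over all factor vectors.
- The dict layouts are hypothetical.

```python
def row_normalize(trust):
    """S*: scale each user's outgoing trust values so they sum to 1 (assumption)."""
    S = {}
    for u, row in trust.items():
        total = sum(row.values())
        S[u] = {v: w / total for v, w in row.items()} if total > 0 else {}
    return S


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def category_objective(ratings_c, trust_c, P, Q, beta=1.0, lam=0.1):
    """Loss of one category-specific MF model (category c is implicit in the inputs).

    ratings_c: {(user, item): rating}; trust_c: {user: {friend: trust}};
    P: {user: p_u}; Q: {item: q_i}.
    """
    S = row_normalize(trust_c)
    # Squared error on the known ratings of this category
    loss = sum((r - dot(Q[i], P[u])) ** 2 for (u, i), r in ratings_c.items())
    # Social term: pull p_u towards the trust-weighted average of its friends' vectors
    for u, p_u in P.items():
        friends = S.get(u, {})
        if not friends:
            continue  # no trusted neighbours in this category (assumption: no penalty)
        avg = [sum(w * P[v][k] for v, w in friends.items()) for k in range(len(p_u))]
        loss += beta * sum((a - b) ** 2 for a, b in zip(p_u, avg))
    # L2 regularization on all item and user factor vectors
    loss += lam * (sum(dot(q, q) for q in Q.values()) + sum(dot(p, p) for p in P.values()))
    return loss


P = {"a": [1.0, 0.0], "b": [1.0, 0.0]}
Q = {"x": [2.0, 1.0]}
print(category_objective({("a", "x"): 2.0}, {"a": {"b": 3.0}}, P, Q))
```

In this example the rating is fit exactly, and user a's vector equals that of the only user a trusts. Only the λ term remains: 0.1 × (5 + 1 + 1) = 0.7.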
Consider the following ratings provided by 5 users, Alice and User1 to User4, to 5 items, Item1 to Item5.
Assume that there is an underlying social network between these 5 users, which is given by the following adjacency list. The network is directed.
Also, assume that the ratings given by the users to the various items are the same as in the above matrix, except that we no longer have the ratings
provided by User1 and User2 to Item5. Suppose you are using the TrustWalker method to predict the rating of Item5 by the user
‘Alice’. Assuming that at each step you can choose any of the direct neighbors with equal probability, find the probability that the random