Lecture 2 Part1
Systems
Lecture 2: Neighborhood-Based Collaborative Filtering
Imranuddin
Part 1
Neighborhood-Based Collaborative Filtering Algorithms
• Note: An important distinction between user-based collaborative filtering and item-based collaborative filtering algorithms is that
the ratings in the former case are predicted using the ratings of neighboring users, whereas the ratings in the latter case are
predicted using the user’s own ratings on neighboring (i.e., closely related) items. In the former case, neighborhoods are defined by
similarities among users (rows of ratings matrix), whereas in the latter case, neighborhoods are defined by similarities among items
(columns of ratings matrix).
Collaborative Filtering Problem Formulation
• We assume that the user-item ratings matrix is an incomplete m × n matrix R = [r_uj]
containing m users and n items. It is assumed that only a small subset of the ratings matrix
is specified or observed. Neighborhood-based collaborative filtering algorithms can be
formulated in one of two ways:
• 1. Predicting the rating value of a user-item combination: This is the simplest and most
primitive formulation of a recommender system. In this case, the missing rating r_uj of the
user u for item j is predicted.
• 2. Determining the top-k items or top-k users: In most practical settings, the merchant is not
necessarily looking for specific ratings values of user-item combinations. Rather, it is more
interesting to learn the top-k most relevant items for a particular user, or the top-k most
relevant users for a particular item. The problem of determining the top-k items is more
common than that of finding the top-k users. This is because the former formulation is used
to present lists of recommended items to users. In traditional recommender algorithms, the
“top-k problem” almost always refers to the process of finding the top-k items, rather than
the top-k users. However, the latter formulation is also useful to the merchant because it can
be used to determine the best users to target with marketing efforts.
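The top-k items formulation can be sketched on top of the rating-prediction formulation: predict a rating for every item, drop the items the user has already rated, and keep the k highest. The function name top_k_items, the dictionary layout, and the scores below are illustrative assumptions, not part of the lecture.

```python
def top_k_items(predicted, rated, k):
    """Return the k unrated item indices with the highest predicted rating.

    predicted: dict mapping item index -> predicted rating for one user
               (hypothetical values; any prediction method could supply them)
    rated:     set of item indices the user has already rated
    """
    candidates = [(score, j) for j, score in predicted.items() if j not in rated]
    # Highest score first; ties broken by smaller item index for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [j for _, j in candidates[:k]]


# Made-up predictions for one user who has already rated item 1.
predicted = {1: 4.2, 2: 3.1, 3: 4.8, 4: 2.0, 5: 4.5}
print(top_k_items(predicted, rated={1}, k=2))  # [3, 5]
```

Finding the top-k users for an item is the same computation with the roles of users and items swapped.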
Key Properties of Ratings Matrices
1. Continuous ratings
2. Interval-based ratings
3. Ordinal ratings
4. Binary ratings
5. Unary ratings
For the m × n ratings matrix R = [r_uj] with m users and n items, let I_u denote the set of item indices for
which ratings have been specified by user (row) u.
For example, if the ratings of the first, third, and fifth items (columns) of user (row) u are specified
(observed) and the remaining are missing, then we have I_u = {1, 3, 5}. The set of items
rated by both users u and v is then given by I_u ∩ I_v. For example, if user v has rated the first four items,
then I_v = {1, 2, 3, 4}, and I_u ∩ I_v = {1, 3, 5} ∩ {1, 2, 3, 4} = {1, 3}. It is possible (and quite common)
for I_u ∩ I_v to be an empty set because ratings matrices are generally sparse. The set I_u ∩ I_v defines the
mutually observed ratings, which are used to compute the similarity between the uth and vth users for
neighborhood computation.
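The index sets from the example above can be reproduced with ordinary Python sets. This is a minimal sketch that assumes a sparse dictionary representation; the rating values and the helper names rated_items and common_items are made up for illustration.

```python
# Sparse ratings: user -> {item index: rating}. Only observed entries are stored.
# The rating values are hypothetical; only the item indices follow the example.
ratings = {
    "u": {1: 5.0, 3: 3.0, 5: 4.0},
    "v": {1: 4.0, 2: 2.0, 3: 5.0, 4: 1.0},
}


def rated_items(ratings, user):
    """I_u: the set of item indices rated by the given user."""
    return set(ratings[user])


def common_items(ratings, u, v):
    """I_u ∩ I_v: items rated by both users (may be empty in sparse data)."""
    return rated_items(ratings, u) & rated_items(ratings, v)


print(rated_items(ratings, "u"))        # {1, 3, 5}
print(common_items(ratings, "u", "v"))  # {1, 3}
```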
Strictly speaking, the traditional definition of Pearson(u, v) mandates that the values of μ_u and μ_v
should be computed only over the items that are rated by both users u and v.
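The effect of this choice can be sketched in code. The function below assumes the usual Pearson correlation over the mutually rated items I_u ∩ I_v. The flag means_over_common switches between the strict definition and the simpler variant that averages each user's ratings over all of their rated items. The function name, the flag, the rating values, and the convention of returning 0.0 when there is no overlap or no variance are assumptions made for illustration.

```python
from math import sqrt


def pearson(r_u, r_v, means_over_common=True):
    """Pearson similarity between two users given as {item: rating} dicts."""
    common = set(r_u) & set(r_v)
    if not common:
        return 0.0  # assumption: no mutually rated items -> zero similarity

    if means_over_common:
        # Strict definition: means over the mutually rated items only.
        mu_u = sum(r_u[k] for k in common) / len(common)
        mu_v = sum(r_v[k] for k in common) / len(common)
    else:
        # Simpler variant: each mean over all items that user has rated.
        mu_u = sum(r_u.values()) / len(r_u)
        mu_v = sum(r_v.values()) / len(r_v)

    num = sum((r_u[k] - mu_u) * (r_v[k] - mu_v) for k in common)
    den = (sqrt(sum((r_u[k] - mu_u) ** 2 for k in common))
           * sqrt(sum((r_v[k] - mu_v) ** 2 for k in common)))
    return num / den if den else 0.0  # assumption: zero variance -> 0.0


# Hypothetical ratings with I_u = {1, 3, 5} and I_v = {1, 2, 3, 4}.
r_u = {1: 5.0, 3: 3.0, 5: 4.0}
r_v = {1: 4.0, 2: 2.0, 3: 5.0, 4: 1.0}
print(pearson(r_u, r_v, means_over_common=True))
print(pearson(r_u, r_v, means_over_common=False))
```

On this toy data the two conventions give noticeably different values (about -1 versus about -0.32), which shows that the choice of mean matters when the overlap is small.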
Example
Overall neighborhood-based prediction function
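A commonly used form of the user-based prediction function, assuming mean-centered ratings weighted by Pearson similarity, is given below. The symbols P_u(j) (the set of users most similar to u who have rated item j) and Sim(u, v) = Pearson(u, v) are notation assumed here for illustration.

```latex
\mu_u = \frac{\sum_{k \in I_u} r_{uk}}{|I_u|}
\qquad
\hat{r}_{uj} = \mu_u +
  \frac{\sum_{v \in P_u(j)} \mathrm{Sim}(u, v)\,\bigl(r_{vj} - \mu_v\bigr)}
       {\sum_{v \in P_u(j)} \bigl|\mathrm{Sim}(u, v)\bigr|}
```

Subtracting μ_v before averaging removes each neighbor's rating bias (generous versus strict raters), and adding μ_u back places the prediction on the target user's own scale.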
Example