
KCE-CSE –RS 2024

UNIT – III COLLABORATIVE FILTERING


Topic No.   Topic
12.         A systematic approach, Nearest-neighbor collaborative filtering (CF)
13.         User-based and item-based CF
14.         Components of neighborhood methods
15.         Rating normalization, similarity weight computation, and neighborhood selection


12. A systematic approach, Nearest-neighbor collaborative filtering (CF)


Collaborative filtering is a technique used in recommendation systems to predict a
user's preferences or interests by collecting and analyzing information from many
users. It works based on the idea that if two users have similar preferences in the past,
they are likely to have similar preferences in the future.

There are two main approaches to collaborative filtering:

User-based collaborative filtering: This method identifies users who have similar
preferences or behavior to the target user and recommends items that they have
liked or interacted with. For example, if User A and User B have both liked similar
movies in the past, and User A likes a new movie, the system might recommend that
movie to User B.

Item-based collaborative filtering: Instead of comparing users, this method identifies items that are similar to the items the user has liked or interacted with in the past. It recommends items that are similar to those the user has already shown interest in. For example, if a user likes a particular book, the system might recommend other books that have been liked by users who also liked that book.

Collaborative filtering is widely used in recommendation systems in various domains such as e-commerce, social media, and entertainment platforms to provide personalized recommendations to users.

Working of collaborative filtering

Collaborative filtering works by leveraging the collective wisdom of a group of users to make recommendations to individuals.

∙ Data Collection: The system collects data on user interactions with items, such as
ratings, purchases, clicks, likes, etc. This data is typically represented in a matrix
where rows represent users and columns represent items. The entries in the matrix
represent the users' interactions with the items (e.g., ratings, binary indicators of
likes, etc.).
∙ Similarity Calculation: The system then calculates the similarity between users or
items based on their past interactions. This is typically done using similarity metrics
such as cosine similarity, Pearson correlation, or Jaccard similarity.
∙ User-based Collaborative Filtering: For user-based collaborative filtering, the
system calculates the similarity between users. Users who have similar interaction
patterns are considered similar to each other.

∙ Item-based Collaborative Filtering: For item-based collaborative filtering, the
system calculates the similarity between items. Items that are frequently interacted
with by the same users are considered similar to each other.
∙ Neighborhood Selection: Based on the calculated similarities, the system selects a neighborhood of users or items that are most similar to the target user or item.
∙ Prediction: Once the neighborhood is identified, the system predicts the user's preference for items they have not yet interacted with. This prediction is typically based on a weighted average of the ratings or interactions of the neighbors, where the weights are the similarities between the target user or item and the neighbors.
∙ Recommendation Generation: Finally, the system generates recommendations by selecting the top-rated items from the predicted preferences for the target user.
∙ Feedback Incorporation: As users interact with the recommended items, their feedback is incorporated back into the system to update the recommendations for future users.

This process iterates continuously, with the system refining its recommendations as
more data becomes available and as users' preferences evolve over time. Collaborative
filtering is powerful because it does not rely on explicit knowledge about items or
users but instead learns from the implicit feedback provided by user interactions.
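
To make the similarity step concrete, here is a minimal sketch (illustrative Python/NumPy, not tied to any particular library) of cosine and Pearson similarity computed over the items two users have both rated:

    import numpy as np

    def cosine_sim(a, b):
        # Cosine similarity over commonly rated items (0 marks "not rated").
        m = (a > 0) & (b > 0)
        if not m.any():
            return 0.0
        d = np.linalg.norm(a[m]) * np.linalg.norm(b[m])
        return float(np.dot(a[m], b[m]) / d) if d else 0.0

    def pearson_sim(a, b):
        # Pearson correlation over commonly rated items.
        m = (a > 0) & (b > 0)
        if m.sum() < 2:
            return 0.0
        ac, bc = a[m] - a[m].mean(), b[m] - b[m].mean()
        d = np.linalg.norm(ac) * np.linalg.norm(bc)
        return float(np.dot(ac, bc) / d) if d else 0.0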

Applications of collaborative filtering

Collaborative filtering has numerous applications across various industries and domains. Here are some of the key applications:

∙ E-commerce Recommendations: One of the most common applications of collaborative filtering is in e-commerce platforms like Amazon, where it's used to recommend products to users based on their past purchases, browsing history, and preferences.
∙ Movie and Music Recommendations: Services like Netflix and Spotify utilize
collaborative filtering to suggest movies, TV shows, or songs to users based on their
viewing or listening history and the behavior of similar users.
∙ Social Media Feeds: Social media platforms like Facebook, Instagram, and Twitter
use collaborative filtering to personalize users' feeds by recommending posts,
friends, or accounts to follow based on their interactions and the behavior of similar
users.
∙ News and Content Recommendations: News websites and content aggregators
employ collaborative filtering to suggest articles, videos, or other content to users
based on their reading history, interests, and the preferences of similar users.
∙ Job Recommendations: Job search platforms like LinkedIn use collaborative filtering
to recommend job postings to users based on their skills, experience, and the job
preferences of similar users.


∙ Travel Recommendations: Travel websites and apps utilize collaborative filtering to recommend destinations, hotels, and activities to users based on their past bookings, searches, and the preferences of similar travelers.
∙ Restaurant Recommendations: Restaurant review platforms like Yelp employ
collaborative filtering to suggest restaurants to users based on their dining history,
reviews, and the preferences of similar users.
∙ Healthcare Recommendations: Healthcare systems can use collaborative filtering to
recommend treatments, healthcare providers, or wellness programs to patients
based on their medical history, symptoms, and the preferences of similar patients.

Collaborative filtering algorithms


Collaborative filtering algorithms are used to make predictions about the interests of
a user by collecting preferences from many users. Here are some common
collaborative filtering algorithms:

Memory-Based Collaborative Filtering:

∙ User-Based Collaborative Filtering: This algorithm recommends items based on the similarity between users. It identifies users who have similar preferences to the target user and recommends items that those similar users have liked or interacted with.
∙ Item-Based Collaborative Filtering: This algorithm recommends items based on
the similarity between items. It identifies items that are similar to those the
target user has interacted with positively and recommends them.

Model-Based Collaborative Filtering:


∙ Matrix Factorization: Matrix factorization techniques, such as Singular Value Decomposition (SVD) and Alternating Least Squares (ALS), decompose the user-item interaction matrix into lower-dimensional matrices representing user and item latent factors. These latent factors capture the underlying patterns in the data and are used to make predictions about user preferences.
∙ Factorization Machines: Factorization machines generalize matrix factorization techniques by incorporating additional user and item features into the model. They model interactions between user and item features to make more accurate predictions.

Deep Learning-Based Collaborative Filtering:

∙ Neural Collaborative Filtering: This algorithm employs neural networks to learn user-item interactions directly from the data. It uses embeddings to represent users and items in a low-dimensional space and learns to predict user preferences based on these embeddings.
∙ Autoencoders: Autoencoder-based models learn to reconstruct the input data by
passing it through a bottleneck layer with a lower-dimensional representation.
They can be used to learn user and item embeddings from the user-item
interaction data and make predictions based on these embeddings.

Hybrid Collaborative Filtering:

∙ Content-Based Collaborative Filtering: Hybrid models combine collaborative filtering with content-based filtering techniques to leverage the strengths of both approaches. They incorporate additional features such as item attributes or user demographics into the recommendation process to improve prediction accuracy.

∙ Ensemble Methods: Ensemble methods combine multiple collaborative filtering models to generate more accurate predictions. They aggregate the predictions of individual models using techniques such as weighted averaging or stacking.

The choice of algorithm depends on factors such as the characteristics of the data,
the scalability requirements, and the specific recommendation problem being
addressed. Experimentation and evaluation are essential to determine the most
effective algorithm for a given application.
13. User-based and item-based CF

Collaborative filtering techniques can be broadly categorized into two main types:
user-based and item-based collaborative filtering. Additionally, hybrid approaches
that combine elements of both user-based and item-based methods are also common.
Here's an overview of each technique:


User-Based Collaborative Filtering:

∙ Nearest Neighbor: This approach identifies a set of users similar to the target user
based on their past interactions with items. Recommendations are then generated
by aggregating the preferences of these similar users. Common similarity metrics
include cosine similarity and Pearson correlation.
∙ User-User Collaborative Filtering: In this technique, the system calculates the
similarity between users and uses this similarity to predict the preferences of the
target user for items they have not yet interacted with. The predictions are typically
generated by averaging or weighted averaging the ratings of similar users for the
target items.

Item-Based Collaborative Filtering:

∙ Nearest Neighbor: Similar to user-based collaborative filtering, this approach identifies a set of items similar to the target item based on their interactions with users. Recommendations are then generated by selecting items that are similar to those the user has interacted with positively.
∙ Item-Item Collaborative Filtering: In this technique, the system calculates the
similarity between items based on the patterns of user interactions.
Recommendations are generated by identifying items similar to those the user has
interacted with positively and selecting the top-rated items from this set.

Hybrid Collaborative Filtering:

∙ Model Combination: Hybrid approaches combine user-based and item-based collaborative filtering models to leverage the strengths of both techniques. For example, predictions from both user-based and item-based models can be combined using weighted averaging or other ensemble techniques to improve recommendation accuracy.


∙ Feature Combination: Hybrid models can also incorporate additional features such
as content-based features or demographic information to enhance recommendation
quality. These features are combined with collaborative filtering models to generate
more personalized recommendations.

Each collaborative filtering technique has its strengths and weaknesses, and the choice
of technique depends on factors such as the characteristics of the data, the scalability
requirements, and the specific recommendation problem being addressed.
Experimentation and evaluation are essential to determine the most effective approach
for a given application.

Example for user-based collaborative filtering

Let's consider an example of user-based collaborative filtering in the context of movie recommendations.

Suppose we have a small dataset containing information about user ratings for a few movies:

User      Movie A   Movie B   Movie C   Movie D
User 1       5         4         -         3
User 2       -         3         4         -
User 3       4         -         5         2
User 4       2         5         3         -

In this dataset, users have rated movies on a scale of 1 to 5, where "-" indicates that the
user has not rated that particular movie.

Now, let's say we want to recommend movies to User 1, who has rated "Movie A" with 5
stars and "Movie B" with 4 stars. We can use user-based collaborative filtering to find
other users similar to User 1 and recommend movies that those similar users have rated
highly.

Similarity Calculation:

We calculate the similarity between User 1 and each of the other users using a similarity
metric such as cosine similarity or Pearson correlation coefficient. For example:

Similarity(User 1, User 2) = ? (since User 1 and User 2 have rated one movie in
common)

Similarity(User 1, User 3) = ? (since User 1 and User 3 have rated two movies in
common)

Similarity(User 1, User 4) = ? (since User 1 and User 4 have rated two movies in
common)


Neighborhood Selection:

Based on the calculated similarities, we select a subset of similar users (the neighborhood) for User 1. For example, if we decide to consider the top 2 most similar users, our neighborhood might include User 3 and User 4.

Prediction:

We predict User 1's rating for "Movie C" by averaging the ratings of Users 3 and 4 for
that movie. Let's say User 3 rated "Movie C" with 5 stars and User 4 rated it with 3 stars.
So, our predicted rating for "Movie C" for User 1 would be (5 + 3) / 2 = 4 stars.

Recommendation Generation:

Finally, we recommend "Movie C" to User 1 since it has the highest predicted rating
among the movies User 1 has not yet rated.
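
A compact sketch of this worked example (illustrative Python/NumPy; note that the text simply assumes Users 3 and 4 as the neighborhood, whereas the code picks whichever k raters of the target movie are most similar):

    import numpy as np

    # Rows: Users 1-4; columns: Movies A-D; 0 marks "not rated".
    R = np.array([[5, 4, 0, 3],
                  [0, 3, 4, 0],
                  [4, 0, 5, 2],
                  [2, 5, 3, 0]], dtype=float)

    def sim(a, b):
        # Cosine similarity over commonly rated movies.
        m = (a > 0) & (b > 0)
        if not m.any():
            return 0.0
        d = np.linalg.norm(a[m]) * np.linalg.norm(b[m])
        return float(np.dot(a[m], b[m]) / d) if d else 0.0

    def predict(user, item, k=2):
        # Average the ratings given to `item` by the k most similar users who rated it.
        cands = [(sim(R[user], R[v]), v) for v in range(len(R))
                 if v != user and R[v, item] > 0]
        top = sorted(cands, reverse=True)[:k]
        return float(np.mean([R[v, item] for _, v in top]))

    print(round(predict(user=0, item=2), 2))  # estimate of User 1's rating for Movie C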

Item-based collaborative filtering

Example of item-based collaborative filtering in the e-commerce domain, specifically for recommending products to users on an online retail platform.

Suppose we have a dataset containing information about user purchases or interactions with various products:
User      Product A   Product B   Product C   Product D
User 1        1           0           1           0
User 2        0           1           1           1
User 3        1           1           0           0
User 4        0           1           0           1

In this dataset, each row represents a user, and each column represents a product. The entries of the matrix indicate whether a user has purchased or interacted with a particular product (1 indicates interaction, 0 indicates no interaction).

Now, let's say we want to recommend products similar to "Product A" to User 1.

Similarity Calculation:

We calculate the similarity between "Product A" and each of the other products based on the interactions of users who have interacted with both products. For example:

Similarity(Product A, Product B) = ? (only User 3 has interacted with both products)

Similarity(Product A, Product C) = ? (only User 1 has interacted with both products)


Similarity(Product A, Product D) = ? (no user has interacted with both products)

Neighborhood Selection:

Based on the calculated similarities, we select a subset of similar products (the neighborhood) for "Product A". For example, if we decide to consider the top 2 most similar products, our neighborhood might include "Product B" and "Product C".

Prediction:

We predict User 1's likelihood of interacting with "Product D" by combining the
interactions of "Product D" from Users 1 and 4, along with the similarities between
"Product D" and the products in the neighborhood. The prediction could be computed
using a weighted average or another aggregation method.

Recommendation Generation:

Finally, we recommend "Product D" to User 1 since it has the highest predicted
likelihood of interaction among the products similar to those already interacted with by
User 1.

This example demonstrates how item-based collaborative filtering can be applied in the
e-commerce domain to recommend products to users based on the similarity of their
interactions with other products.
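
A minimal sketch of the item-similarity step on this binary matrix (illustrative Python/NumPy):

    import numpy as np

    # Rows: Users 1-4; columns: Products A-D (1 = interacted, 0 = not).
    M = np.array([[1, 0, 1, 0],
                  [0, 1, 1, 1],
                  [1, 1, 0, 0],
                  [0, 1, 0, 1]], dtype=float)

    def item_cosine(i, j):
        # Cosine similarity between two product (column) vectors.
        d = np.linalg.norm(M[:, i]) * np.linalg.norm(M[:, j])
        return float(M[:, i] @ M[:, j] / d) if d else 0.0

    for j in range(1, 4):
        print(f"Similarity(Product A, Product {'ABCD'[j]}) = {item_cosine(0, j):.2f}")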
Suppose we have a small dataset representing user ratings for movies.

Each row represents a user, and each column represents a movie. A rating of 0 indicates that the user has not rated the movie.

Now, let's say we want to recommend movies to User E based on collaborative filtering,
i.e., by finding users with similar tastes and recommending movies that they liked but
User E hasn't seen yet.
Steps to follow:
Compute Similarity: Calculate the similarity between User E and other users using a
similarity metric such as cosine similarity.


Find Neighbors: Select the top-k most similar users to User E as neighbors.

Generate Recommendations: Recommend movies to User E based on the movies liked by their neighbors that User E hasn't seen yet.

Let's perform these steps:


Compute Similarity:

Calculate the cosine similarity between User E and each of the other users; for example, the cosine similarity between User E and User A. Similarly, calculate the similarity with the other users.

Find Neighbors:
Suppose we choose the top 2 most similar users as neighbors. Let's say User A and User
D are the closest neighbors.

Generate Recommendations:
Recommend movies liked by User A and User D that User E hasn't seen yet. For example,
User A liked Movie 1 and Movie 2, and User D liked Movie 5 and Movie 4. So, we can
recommend Movie 1, Movie 2, Movie 5, and Movie 4 to User E.

These recommended movies are based on the assumption that users with similar tastes
tend to like similar items. Collaborative filtering leverages this idea to provide
personalized recommendations to users.
Predicting Ratings with Neighborhood-Based Methods

The basic idea in neighborhood-based methods is to use either user-user similarity or item-item similarity to make recommendations from a ratings matrix. The concept of a neighborhood implies that we need to determine either similar users or similar items in order to make predictions.

1. User-Based Neighborhood Models

In this approach, user-based neighborhoods are defined in order to identify similar users to the target user for whom the rating predictions are being computed. In order to determine the neighborhood of the target user, her similarity to all the other users is computed. Therefore, a similarity function needs to be defined between the ratings specified by users.


For the m × n ratings matrix R = [ruj] with m users and n items, let Iu denote the set of item indices for which ratings have been specified by user (row) u. For example, if the ratings of the first, third, and fifth items (columns) of user (row) u are specified (observed) and the remaining are missing, then we have Iu = {1, 3, 5}. Therefore, the set of items rated by both users u and v is given by Iu ∩ Iv. For example, if user v has rated the first four items, then Iv = {1, 2, 3, 4}, and Iu ∩ Iv = {1, 3, 5} ∩ {1, 2, 3, 4} = {1, 3}. It is possible (and quite common) for Iu ∩ Iv to be an empty set, because ratings matrices are generally sparse. The set Iu ∩ Iv defines the mutually observed ratings, which are used to compute the similarity between the uth and vth users for neighborhood computation.
In this case, the ratings of five users 1 . . . 5 are indicated for six items denoted by 1 . . . 6.
Each rating is drawn from the range {1 . . . 7}. Consider the case where the target user
index is 3, and we want to make item predictions on the basis of the ratings in Table.
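
For reference, the ratings matrix this example uses (the standard worked example from Aggarwal's Recommender Systems text, consistent with all the values quoted below, including the 0.86 prediction) is:

User     Item 1   Item 2   Item 3   Item 4   Item 5   Item 6    Mean   Cosine(u, 3)   Pearson(u, 3)
1           7        6        7        4        5        4       5.5      0.956           0.894
2           6        7        ?        4        3        4       4.8      0.981           0.939
3           ?        3        3        1        1        ?       2.0      1.000           1.000
4           1        2        2        3        3        4       2.5      0.789          -1.000
5           1        ?        1        2        3        3       2.0      0.645          -0.817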

We need to compute the predictions r̂31 and r̂36 of user 3 for items 1 and 6 in order to determine the top recommended item.

The first step is to compute the similarity between user 3 and all the other users. Two possible ways of computing similarity are shown in the last two columns of the table: the second-last column shows the similarity based on the raw cosine between the ratings, and the last column shows the similarity based on the Pearson correlation coefficient.

For example, the values of Cosine(1, 3) and Pearson(1, 3) are computed as follows:
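
Using the mutually observed items Iu ∩ Iv (a reconstruction of the computation, in standard form):

Cosine(u, v) = Σ_{k ∈ Iu∩Iv} ruk · rvk / ( √(Σ_{k ∈ Iu∩Iv} ruk²) · √(Σ_{k ∈ Iu∩Iv} rvk²) )

Pearson(u, v) = Σ_{k ∈ Iu∩Iv} (ruk − r̄u)(rvk − r̄v) / ( √(Σ_{k ∈ Iu∩Iv} (ruk − r̄u)²) · √(Σ_{k ∈ Iu∩Iv} (rvk − r̄v)²) )

For users 1 and 3 the common items are {2, 3, 4, 5}, so

Cosine(1, 3) = (6·3 + 7·3 + 4·1 + 5·1) / (√(6² + 7² + 4² + 5²) · √(3² + 3² + 1² + 1²)) = 48 / (√126 · √20) ≈ 0.956

Pearson(1, 3) = ((6−5.5)(3−2) + (7−5.5)(3−2) + (4−5.5)(1−2) + (5−5.5)(1−2)) / (√5 · √4) = 4 / 4.472 ≈ 0.894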


The Pearson and raw cosine similarities of user 3 with all other users are shown in the final two columns of the table. The Pearson correlation coefficient is much more discriminative, and its sign provides information about similarity and dissimilarity.

The top-2 closest users to user 3 are users 1 and 2 according to both measures. By using
the Pearson-weighted average of the raw ratings of users 1 and 2, the following
predictions are obtained for user 3 with respect to her unrated items 1 and 6:
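
(Reconstructed computation, consistent with the conclusions that follow:)

r̂31 = (7 · 0.894 + 6 · 0.939) / (0.894 + 0.939) ≈ 6.49
r̂36 = (4 · 0.894 + 4 · 0.939) / (0.894 + 0.939) = 4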

Thus, item 1 should be prioritized over item 6 as a recommendation to user 3. Furthermore, the prediction suggests that user 3 is likely to be interested in both movies 1 and 6 to a greater degree than any of the movies she has already rated. This is, however, a result of the bias caused by the fact that the peer group {1, 2} is a far more optimistic group with positive ratings, as compared to the target user 3.

Let us now examine the impact of mean-centered ratings on the prediction. The mean-centered ratings are shown in the table below.
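
(Reconstructed by subtracting each user's mean rating:)

User     Item 1   Item 2   Item 3   Item 4   Item 5   Item 6
1          1.5      0.5      1.5     −1.5     −0.5     −1.5
2          1.2      2.2       ?      −0.8     −1.8     −0.8
3           ?       1.0      1.0     −1.0     −1.0      ?
4         −1.5     −0.5     −0.5      0.5      0.5      1.5
5         −1.0      ?       −1.0      0.0      1.0      1.0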
The corresponding predictions with the mean-centered equation are as follows:
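
r̂31 = 2 + (1.5 · 0.894 + 1.2 · 0.939) / (0.894 + 0.939) ≈ 3.35
r̂36 = 2 + ((−1.5) · 0.894 + (−0.8) · 0.939) / (0.894 + 0.939) ≈ 0.86

(a reconstruction; the user-3 mean of 2 is added back after the weighted average of the mean-centered ratings)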


Thus, the mean-centered computation also provides the prediction that item 1 should
be prioritized over item 6 as a recommendation to user 3. There is, however, one crucial
difference from the previous recommendation.

In this case, the predicted rating of item 6 is only 0.86, which is less than all the other
items that user 3 has rated. This is a drastically different result than in the previous
case, where the predicted rating for item 6 was greater than all the other items that user
3 had rated.

Upon visually inspecting Table 1 (or Table 2), it is indeed evident that item 6 ought to be rated very low by user 3 (compared to her other items), because her closest peers (users 1 and 2) have also rated it lower than their other items. Thus, the mean-centering process enables a much better relative prediction with respect to the ratings that have already been observed.

Example 2

Suppose we have a small dataset representing user ratings for movies.


Let's say we want to select the top-k users most similar to User E to form the user-based
neighborhood. We'll use cosine similarity as the similarity metric.

1. Calculate Similarity: Compute the cosine similarity between User E and each of the other users.

For example, compute the cosine similarity between User E and User A; similarly, calculate the similarity with the other users.

2. Sort Users: Sort the users based on their similarity with User E in descending order.

For example, after sorting, the similarities between User E and the other users might be:

∙ User A: 1.467

∙ User B: 0.965

∙ User D: 0.802

∙ User C: 0.576

3. Select Neighbors: Select the top-k users with the highest similarity scores as neighbors.

For example, if we choose k = 2, then the selected neighbors for User E would be User A and User D.

4. Generate Recommendations: Recommend items to User E based on the items liked by their neighbors that User E hasn't seen yet.

For example, User A liked Movie 1 and Movie 2, and User D liked Movie 5 and Movie 4. So, we can recommend Movie 1, Movie 2, Movie 5, and Movie 4 to User E.

These recommended movies are based on the assumption that users with similar
tastes tend to like similar items. User-based neighborhood selection leverages this
idea to provide personalized recommendations to users.

2. Item-Based Neighborhood Models

In item-based models, peer groups are constructed in terms of items rather than users.
Therefore, similarities need to be computed between items (or columns in the ratings
matrix).

Before computing the similarities between the columns, each row of the ratings matrix
is centered to a mean of zero. As in the case of user-based ratings, the average rating of
each item in the ratings matrix is subtracted from each rating to create a mean-centered
matrix.

First, the similarities between items are computed after adjusting for mean-centering. The mean-centered ratings matrix was shown in Table 2 above. The corresponding adjusted cosine similarities of each item to items 1 and 6, respectively, are indicated in the final two rows of that table.

For example, the value of the adjusted cosine between items 1 and 3, denoted by AdjustedCosine(1, 3), is as follows:
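
Using the mean-centered ratings sui = rui − r̄u and the set Uij of users who rated both items (standard adjusted-cosine form):

AdjustedCosine(i, j) = Σ_{u ∈ Uij} sui · suj / ( √(Σ_{u ∈ Uij} sui²) · √(Σ_{u ∈ Uij} suj²) )

For items 1 and 3 the common users are {1, 4, 5}, so

AdjustedCosine(1, 3) = (1.5 · 1.5 + (−1.5)(−0.5) + (−1)(−1)) / (√(1.5² + 1.5² + 1²) · √(1.5² + 0.5² + 1²)) = 4 / (√5.5 · √3.5) ≈ 0.912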



Other item-item similarities are computed in an exactly analogous way, and are
illustrated in the final two rows of Table 2. It is evident that items 2 and 3 are most
similar to item 1, whereas items 4 and 5 are most similar to item 6.

Therefore, the weighted average of the raw ratings of user 3 for items 2 and 3 is used to
predict the rating ˆr31 of item 1, whereas the weighted average of the raw ratings of
user 3 for items 4 and 5 is used to predict the rating ˆr36 of item 6:
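
(Reconstructed, with the adjusted-cosine weights implied by the table, AdjustedCosine(1, 2) ≈ 0.735, AdjustedCosine(1, 3) ≈ 0.912, AdjustedCosine(6, 4) ≈ 0.829, and AdjustedCosine(6, 5) ≈ 0.730:)

r̂31 = (3 · 0.735 + 3 · 0.912) / (0.735 + 0.912) = 3
r̂36 = (1 · 0.829 + 1 · 0.730) / (0.829 + 0.730) = 1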

Thus, the item-based method also suggests that item 1 is more likely to be preferred by
user 3 than item 6.

However, in this case, because the ratings are predicted using the ratings of user 3
herself, the predicted ratings tend to be much more consistent with the other ratings of
this user.

As a specific example, it is noteworthy that the predicted rating of item 6 is no longer outside the range of allowed ratings, as it was in the case of the user-based method. The greater prediction accuracy of the item-based method is its main advantage. In some cases, the item-based method might provide a different set of top-k recommendations, even though the recommended lists will generally be roughly similar.

Suppose we have a small dataset representing user ratings for movies. Let's say we want to select the top-k items most similar to Movie 1 to form the item-based neighborhood. We'll again use cosine similarity as the similarity metric.

1. Transpose the Ratings Matrix: Transpose the ratings matrix so that rows represent items and columns represent users.



2. Calculate Similarity: Compute the cosine similarity between Movie 1 and each of the other movies; for example, the cosine similarity between Movie 1 and Movie 2. Similarly, calculate the similarity with the other items.

3. Sort Items: Sort the items based on their similarity with Movie 1 in descending order.

For example, after sorting, the similarities between Movie 1 and the other movies might be:

∙ Movie 2: 0.71

∙ Movie 4: 0.56

∙ Movie 3: 0.48

∙ Movie 5: 0.48

4. Select Neighbors: Select the top-k items with the highest similarity scores as neighbors.

For example, if we choose k = 2, then the selected neighbors for Movie 1 would be Movie 2 and Movie 4.

5. Generate Recommendations: Recommend items to the user based on the selected neighbors.

For example, if a user has interacted with Movie 1, we can recommend Movie 2 and Movie 4 to the user based on the item-based neighborhood model.

This approach leverages the idea that similar items are likely to be preferred by users
who have interacted with the same item. Item-based neighborhood selection provides
personalized recommendations to users based on the similarity between items.

Comparing User-Based and Item-Based Methods

∙ Item-based methods often provide more relevant recommendations because a user's own ratings are used to perform the recommendation. In item-based methods, items similar to a target item are identified, and the user's own ratings on those items are used to extrapolate the rating of the target item.


∙ For example, similar items to a target historical movie might be a set of other
historical movies. In such cases, the user’s own recommendations for the similar set
might be highly indicative of her preference for the target. This is not the case for
user-based methods in which the ratings are extrapolated from other users, who
might have overlapping but different interests. As a result, item-based methods
often exhibit better accuracy.

∙ Although item-based recommendations are often more likely to be accurate, the relative accuracy between item-based and user-based methods also depends on the data set at hand.

∙ Item-based methods are also more robust to shilling attacks in recommender systems. On the other hand, it is precisely these differences that can lead to greater diversity in the recommendation process for user-based methods over item-based methods.

∙ Diversity refers to the fact that the items in the ranked list tend to be somewhat
different. If the items are not diverse, then if the user does not like the first item, she
might not also like any of the other items in the list.

∙ Greater diversity also encourages serendipity, through which somewhat surprising and interesting items are discovered. Item-based methods might sometimes recommend obvious items, or items which are not novel given the user's previous experiences.
Let's compare the user-based and item-based collaborative filtering models based on several factors:

1. Data Representation:
∙ User-Based Model: Represents the ratings matrix where rows correspond to users and columns correspond to items.
∙ Item-Based Model: Represents the transposed ratings matrix where rows correspond to items and columns correspond to users.

2. Similarity Calculation:
∙ User-Based Model: Computes similarity between users based on their ratings for items.
∙ Item-Based Model: Computes similarity between items based on the ratings they received from users.

3. Neighbor Selection:
∙ User-Based Model: Selects a subset of users most similar to the target user as neighbors.
∙ Item-Based Model: Selects a subset of items most similar to the target item as neighbors.

4. Recommendation Generation:
∙ User-Based Model: Recommends items that neighbors liked but the target user hasn't interacted with yet.
∙ Item-Based Model: Recommends items similar to those the target user has already interacted with.

5. Computation Complexity:
∙ User-Based Model: Complexity depends on the number of users and the number of items rated by each user.
∙ Item-Based Model: Complexity depends on the number of items and the number of users who rated each item.

6. Sparsity Handling:
∙ User-Based Model: Prone to sparsity issues when users have rated only a few items.
∙ Item-Based Model: Better at handling sparsity since items are usually rated by multiple users.

7. Cold Start Problem:
∙ User-Based Model: Faces a cold start problem when a new user joins the system, since there's no user similarity information available.
∙ Item-Based Model: Faces a cold start problem when a new item is added to the system, since there's no item similarity information available.

8. Scalability:
∙ User-Based Model: Scales better with a large number of users but may suffer from scalability issues with a large number of items.
∙ Item-Based Model: Scales better with a large number of items but may suffer from scalability issues with a large number of users.

9. Performance:
∙ User-Based Model: Tends to perform better when users have similar tastes.
∙ Item-Based Model: Tends to perform better when items have consistent quality or characteristics.

In summary, both user-based and item-based collaborative filtering models have their
strengths and weaknesses. The choice between them depends on factors such as the
nature of the dataset, the sparsity of the data, the scalability requirements, and the
characteristics of the recommendation problem. It's common for recommendation
systems to employ a hybrid approach that combines both user-based and item-based
models to leverage their respective strengths and mitigate their weaknesses.

14. Components of neighborhood methods

Neighborhood-based collaborative filtering algorithms can be formulated in one of two ways:

1. Predicting the rating value of a user-item combination: This is the simplest and
most primitive formulation of a recommender system. In this case, the missing rating
ruj of the user u for item j is predicted.

2. Determining the top-k items or top-k users: In most practical settings, the
merchant is not necessarily looking for specific ratings values of user-item
combinations.


Rather, it is more interesting to learn the top-k most relevant items for a
particular user, or the top-k most relevant users for a particular item.

The problem of determining the top-k items is more common than that of finding the top-k users. This is because the former formulation is used to present lists of recommended items to users in Web-centric scenarios.

∙ In traditional recommender algorithms, the "top-k problem" almost always refers to the process of finding the top-k items, rather than the top-k users. However, the latter formulation is also useful to the merchant because it can be used to determine the best users to target with marketing efforts.
∙ The two aforementioned problems are closely related. For example, in order to
determine the top-k items for a particular user, one can predict the ratings of each
item for that user.
∙ The top-k items can be selected on the basis of the predicted rating. In order to
improve efficiency, neighborhood-based methods pre-compute some of the data
needed for prediction in an offline phase. This pre-computed data can be used in
order to perform the ranking in a more efficient way.

The main advantages of neighborhood-based methods are:

∙ Simplicity: Neighborhood-based methods are intuitive and relatively simple to implement. In their simplest form, only one parameter (the number of neighbors used in the prediction) requires tuning.
∙ Justifiability: Such methods also provide a concise and intuitive justification for the computed predictions. For example, in item-based recommendation, the list of neighbor items, as well as the ratings given by the user to these items, can be presented to the user as a justification for the recommendation.
∙ Efficiency: One of the strong points of neighborhood-based systems is their
efficiency. Unlike most model-based systems, they require no costly training
phases, which need to be carried out at frequent intervals in large commercial
applications.
o While the recommendation phase is usually more expensive than for
model-based methods, the nearest-neighbors can be pre-computed in an
offline step, providing near instantaneous recommendations.
o Moreover, storing these nearest neighbors requires very little memory,
making such approaches scalable to applications having millions of users
and items.
∙ Stability: Another useful property of recommender systems based on this
approach is that they are little affected by the constant addition of users, items
and ratings, which are typically observed in large commercial applications.
o For instance, once item similarities have been computed, an item-based
system can readily make recommendations to new users, without having
to re-train the system.


o Moreover, once a few ratings have been entered for a new item, only the
similarities between this item and the ones already in the system need to
be computed.

Key properties of rating matrices

The ratings matrix is denoted by R, and it is an m × n matrix containing m users and n items. Therefore, the rating of user u for item j is denoted by ruj. Only a small subset of the entries in the ratings matrix is typically specified. The specified entries of the matrix are referred to as the training data, whereas the unspecified entries are referred to as the test data. This definition has a direct analog in classification, regression, and semi-supervised learning algorithms.

Types of ratings

Continuous Ratings: Continuous ratings are provided on a continuous scale, meaning they can take any value within a specified range. For example, a user might rate a product on a scale from 1 to 100 or provide a numerical rating with decimal points (e.g., 4.5 out of 5 stars). Continuous ratings offer fine-grained feedback and allow users to express nuanced opinions about items.

Interval-Based Ratings: Interval-based ratings are similar to continuous ratings but are provided within a specific interval or range. For example, a user might rate a movie on a scale from 1 to 10 or a product on a scale from 0 to 100. Interval-based ratings are commonly used in rating systems to provide a structured way for users to express their preferences.

Ordinal Ratings: Ordinal ratings involve ranking items based on their perceived
quality or preference, without specifying the magnitude of the difference
between ranks. For example, users might rank products from best to worst or rate
them on a scale such as "excellent," "good," "average," "poor," etc. Ordinal ratings
capture the relative ordering of preferences but do not provide information about the
magnitude of differences between items.

Binary Ratings: Binary ratings are ratings that indicate whether a user likes or
dislikes an item. They are typically represented as binary values (e.g., 1 for like, 0
for dislike) or as true/false values. Binary ratings are simple to collect and interpret
but provide limited information about the strength or intensity of user preferences.

Unary Ratings: Unary ratings involve users providing a single rating or feedback
without specifying any alternative options. For example, users might rate a movie
with a thumbs-up or thumbs-down, indicating whether they enjoyed it or not. Unary
ratings are straightforward to collect and can be useful for quick feedback but lack
granularity compared to multi-level rating systems.


Each type of rating has its advantages and limitations, and the choice of rating type
depends on factors such as the complexity of the recommendation task, user
preferences, and the specific goals of the recommendation system. Effective
recommendation systems often incorporate multiple types of ratings to capture
diverse aspects of user preferences and behavior.
Neighborhood methods in collaborative filtering recommendation systems involve
several components that work together to provide personalized recommendations
based on the preferences of users or the characteristics of items. Here are the key
components of neighborhood methods:
1. Rating Matrix:
∙ The rating matrix represents the interactions between users and items, where each cell contains the rating given by a user to an item. It forms the basis for similarity calculations in neighborhood methods.

2. Similarity Metric:
∙ A similarity metric is used to quantify the similarity between users or items based on their ratings. Common similarity metrics include cosine similarity, the Pearson correlation coefficient, and Jaccard similarity. The choice of similarity metric depends on the characteristics of the data and the recommendation problem.

3. Neighborhood Selection:
∙ Neighborhood selection involves identifying a subset of users or items that are most similar to a target user or item. This subset, known as the neighborhood, is used to make personalized recommendations. Neighborhood selection can be based on fixed-size k-nearest neighbors or a threshold-based approach.

4. Prediction or Recommendation Generation:
∙ Once the neighborhood is selected, predictions or recommendations are generated for the target user or item. For user-based methods, predictions are typically generated by aggregating the ratings given by the users in the neighborhood, weighted by their similarity to the target user. For item-based methods, recommendations are generated by selecting items from the neighborhood that the user has not interacted with yet.

5. Aggregation Function:
∙ In user-based neighborhood methods, an aggregation function is used to combine the ratings of the neighborhood to generate a prediction for the target user. Common aggregation functions include the weighted sum, weighted average, and weighted median, where the weights are typically the similarities between the target and the members of the neighborhood.

6. Normalization:
∙ Normalization is often performed to address biases in the ratings matrix. It involves subtracting the mean rating of a user or an item from each rating to center the ratings around zero. Normalization helps in improving the accuracy of similarity calculations and predictions.

7. Scalability Techniques:
∙ Scalability techniques are employed to handle large-scale datasets efficiently. This may include dimensionality reduction techniques like Singular Value Decomposition (SVD), memory-based caching, or distributed computing frameworks.

8. Cold Start Handling:
∙ Cold start handling addresses the challenges posed by new users or items that have limited or no interaction history. Techniques such as item-popularity-based recommendations, content-based recommendations, or hybrid approaches are used to provide recommendations in cold start scenarios.

By integrating these components effectively, neighborhood methods can provide personalized and accurate recommendations to users based on their historical interactions or item preferences.

15. Rating normalization, similarity weight computation, and neighborhood selection

1. Rating Normalization

When it comes to assigning a rating to an item, each user has his or her own personal scale. Even if an explicit definition of each of the possible ratings is supplied (e.g., 1 = "strongly disagree", 2 = "disagree", 3 = "neutral", etc.), some users might be reluctant to give high/low scores to items they liked/disliked. Two of the most popular rating normalization schemes that have been proposed to convert individual ratings to a more universal scale are mean-centering and the Z-score.

Mean-centering

The idea of mean-centering is to determine whether a rating is positive or negative by comparing it to the mean rating. In user-based recommendation, a raw rating rui is transformed into a mean-centered one h(rui) by subtracting from rui the average r̄u of the ratings given by user u to the items in Iu:
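
h(rui) = rui − r̄u (standard form)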

Using this approach, the user-based prediction of a rating rui is obtained as:
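
r̂ui = r̄u + [ Σ_{v ∈ Ni(u)} wuv · (rvi − r̄v) ] / [ Σ_{v ∈ Ni(u)} |wuv| ]

where Ni(u) denotes the neighbors of user u that have rated item i, and wuv is the similarity weight between users u and v (standard form).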
In the same way, the item-mean-centered normalization of rui is given by:
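
h(rui) = rui − r̄i (standard form)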


where r̄i corresponds to the mean rating given to item i by the users in Ui. This normalization technique is most often used in item-based recommendation, where a rating rui is predicted as:
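
r̂ui = r̄i + [ Σ_{j ∈ Nu(i)} wij · (ruj − r̄j) ] / [ Σ_{j ∈ Nu(i)} |wij| ]

where Nu(i) denotes the items rated by user u that are most similar to item i (standard form).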

User mean-centering

Assume we have a dataset of ratings given by five users for three different items:

User     Item 1   Item 2   Item 3
1           4        3        5
2           2        5        4
3           3        2        3
4           5        4        2
5           1        2        3

1. Calculate the mean rating for each user:

Mean rating for User 1 = (4 + 3 + 5) / 3 = 12 / 3 = 4
Mean rating for User 2 = (2 + 5 + 4) / 3 = 11 / 3 ≈ 3.67
Mean rating for User 3 = (3 + 2 + 3) / 3 = 8 / 3 ≈ 2.67
Mean rating for User 4 = (5 + 4 + 2) / 3 = 11 / 3 ≈ 3.67
Mean rating for User 5 = (1 + 2 + 3) / 3 = 6 / 3 = 2

2. Subtract each user's mean rating from that user's ratings:

New rating (User 1, Item 1) = 4 − 4 = 0
New rating (User 1, Item 2) = 3 − 4 = −1
New rating (User 1, Item 3) = 5 − 4 = 1

So, after mean-centering, the dataset would look like this:

User     Item 1   Item 2   Item 3
1           0       −1        1
2        −1.67     1.33     0.33
3         0.33    −0.67     0.33
4         1.33     0.33    −1.67
5          −1        0        1

Item mean-centering

1. Calculate the mean rating for each item:

Mean rating for Item 1 = (4 + 2 + 3 + 5 + 1) / 5 = 15 / 5 = 3
Mean rating for Item 2 = (3 + 5 + 2 + 4 + 2) / 5 = 16 / 5 = 3.2
Mean rating for Item 3 = (5 + 4 + 3 + 2 + 3) / 5 = 17 / 5 = 3.4

2. Subtract each item's mean rating from the corresponding ratings.

Starting from the original ratings:

User     Item 1   Item 2   Item 3
1           4        3        5
2           2        5        4
3           3        2        3
4           5        4        2
5           1        2        3

New rating (Item 1, User 1) = 4 − 3 = 1
New rating (Item 1, User 2) = 2 − 3 = −1
New rating (Item 1, User 3) = 3 − 3 = 0
New rating (Item 1, User 4) = 5 − 3 = 2
New rating (Item 1, User 5) = 1 − 3 = −2

After mean-centering, the dataset would look like this:

User     Item 1   Item 2   Item 3
1           1      −0.2      1.6
2          −1       1.8      0.6
3           0      −1.2     −0.4
4           2       0.8     −1.4
5          −2      −1.2     −0.4
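
Both centerings can be computed in a few lines (illustrative Python/NumPy; with missing ratings, the means would be taken over the observed entries only):

    import numpy as np

    R = np.array([[4, 3, 5],
                  [2, 5, 4],
                  [3, 2, 3],
                  [5, 4, 2],
                  [1, 2, 3]], dtype=float)

    user_centered = R - R.mean(axis=1, keepdims=True)  # subtract each row (user) mean
    item_centered = R - R.mean(axis=0, keepdims=True)  # subtract each column (item) mean

    print(np.round(user_centered, 2))
    print(np.round(item_centered, 2))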

Z-score normalization

Z-score normalization, also known as standardization, is a method used to transform data into a standard normal distribution with a mean of 0 and a standard deviation of 1. This normalization technique is widely used in statistics and machine learning to compare different datasets and to prepare data for certain algorithms that require standardized inputs. The formula for z-score normalization is:
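
z = (x − μ) / σ, where μ is the mean and σ the standard deviation of the data (standard form).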

In user-based methods, the normalization of a rating rui divides the user-mean-centered rating by the standard deviation σu of the ratings given by user u:
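
h(rui) = (rui − r̄u) / σu (standard form)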

A user-based prediction of rating rui using this normalization approach would therefore be obtained as:
Likewise, the z-score normalization of rui in item-based methods divides the item-mean-centered rating by the standard deviation σi of the ratings given to item i:
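
h(rui) = (rui − r̄i) / σi (standard form)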

The item-based prediction of rating rui would then be:
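
r̂ui = r̄i + σi · [ Σ_{j ∈ Nu(i)} wij · (ruj − r̄j) / σj ] / [ Σ_{j ∈ Nu(i)} |wij| ] (standard form)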

Choosing a normalization scheme


∙ In some cases, rating normalization can have undesirable effects. For instance, imagine the case of a user who gave only the highest ratings to the items he has purchased.
∙ Mean-centering would consider this user as “easy to please” and any rating below
this highest rating (whether it is a positive or negative rating) would be considered
as negative.
∙ However, it is possible that this user is in fact “hard to please” and carefully selects
only items that he will like for sure. Furthermore, normalizing on a few ratings can
produce unexpected results.
∙ For example, if a user has entered a single rating or a few identical ratings, his rating
standard deviation will be 0, leading to undefined prediction values. Nevertheless, if
the rating data is not overly sparse, normalizing ratings has been found to
consistently improve the predictions
∙ Comparing mean-centering with Z-score, as mentioned, the second one has the
additional benefit of considering the variance in the ratings of individual users or
items. This is particularly useful if the rating scale has a wide range of discrete
values or if it is continuous.
∙ On the other hand, because the ratings are divided and multiplied by possibly very different standard deviation values, the Z-score can be more sensitive than mean-centering and, more often, predict ratings that are outside the rating scale.
∙ Finally, if rating normalization is not possible or does not improve the results, another
possible approach to remove the problems caused by the individual rating scale is
preference-based filtering.
∙ The particularity of this approach is that it focuses on predicting the relative
preferences of users instead of absolute rating values. Since the rating scale does not
change the preference order for items, predicting relative preferences removes the
need to normalize the ratings

2. Similarity Weight Computation

Correlation-based similarity
A measure of the similarity between two objects a and b, often used in information retrieval, consists in representing these objects in the form of two vectors xa and xb and computing the Cosine Vector (CV) (or Vector Space) similarity between these vectors:
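
CV(xa, xb) = cos(xa, xb) = (xa · xb) / (∥xa∥ · ∥xb∥) (standard form)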

In the context of item recommendation, this measure can be employed to compute user similarities by representing a user u as a vector xu, where xui = rui if user u has rated item i, and 0 otherwise. The similarity between two users u and v would then be computed as

CV(u, v) = Σ_{i ∈ Iuv} rui · rvi / √( Σ_{i ∈ Iu} rui² · Σ_{j ∈ Iv} rvj² )

where Iuv once more denotes the items rated by both u and v. A problem with this measure is that it does not consider the differences in the mean and variance of the ratings made by users u and v.

A popular measure that compares ratings where the effects of mean and variance have
been removed is the Pearson Correlation (PC) similarity:
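
PC(u, v) = Σ_{i ∈ Iuv} (rui − r̄u)(rvi − r̄v) / √( Σ_{i ∈ Iuv} (rui − r̄u)² · Σ_{i ∈ Iuv} (rvi − r̄v)² ) (standard form)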

Note that this is different from computing the CV similarity on the Z-score normalized
ratings, since the standard deviation of the ratings is evaluated only on the common
items Iuv, not on the entire set of items rated by u and v, i.e. Iu and Iv. The same idea can
be used to obtain similarities between two items i and j, this time by comparing the
ratings made by users that have rated both of these items:
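
PC(i, j) = Σ_{u ∈ Uij} (rui − r̄i)(ruj − r̄j) / √( Σ_{u ∈ Uij} (rui − r̄i)² · Σ_{u ∈ Uij} (ruj − r̄j)² )

where Uij denotes the set of users that have rated both items i and j (standard form).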

While the sign of a similarity weight indicates whether the correlation is direct or
inverse, its magnitude (ranging from 0 to 1) represents the strength of the correlation.

Let's consider a dataset where we have ratings for three movies (Movie A, Movie B, and
Movie C) from three users (User 1, User 2, and User 3):


            User 1   User 2   User 3
Movie A        5        3        0
Movie B        4        0        2
Movie C        0        2        5

We compute the cosine similarity between each pair of movies using the formula:

cos(u, v) = (u · v) / (∥u∥ · ∥v∥)

Where:

∙ u and v are the rating vectors for two movies.

∙ u · v is the dot product of the rating vectors.

∙ ∥u∥ and ∥v∥ are the Euclidean norms of the rating vectors.

Let's compute the cosine similarity manually:

Movie A vs. Movie B:

∙ Rating vectors: u = [5, 3, 0] and v = [4, 0, 2]

∙ Dot product: 5×4 + 3×0 + 0×2 = 20

∙ Euclidean norms: ∥u∥ = √(5² + 3² + 0²) = √34 and ∥v∥ = √(4² + 0² + 2²) = √20

∙ Cosine similarity: 20 / (√34 × √20) ≈ 0.767

Movie A vs. Movie C:

∙ Rating vectors: u = [5, 3, 0] and v = [0, 2, 5]

∙ Dot product: 5×0 + 3×2 + 0×5 = 6

∙ Euclidean norms: ∥u∥ = √34 and ∥v∥ = √(0² + 2² + 5²) = √29

∙ Cosine similarity: 6 / (√34 × √29) ≈ 0.191

Movie B vs. Movie C:

∙ Rating vectors: u = [4, 0, 2] and v = [0, 2, 5]

∙ Dot product: 4×0 + 0×2 + 2×5 = 10

∙ Euclidean norms: ∥u∥ = √20 and ∥v∥ = √29

∙ Cosine similarity: 10 / (√20 × √29) ≈ 0.415

So, the cosine similarity matrix would be:

            Movie A   Movie B   Movie C
Movie A      1.000     0.767     0.191
Movie B      0.767     1.000     0.415
Movie C      0.191     0.415     1.000

This matrix quantifies the similarity between each pair of movies based on user
ratings. Higher values indicate greater similarity.
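
A quick way to check these values (illustrative Python/NumPy):

    import numpy as np

    M = np.array([[5, 3, 0],   # Movie A
                  [4, 0, 2],   # Movie B
                  [0, 2, 5]],  # Movie C
                 dtype=float)

    U = M / np.linalg.norm(M, axis=1, keepdims=True)  # row-normalize
    print(np.round(U @ U.T, 3))  # pairwise cosine similarity matrix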


3. Neighborhood selection

The number of nearest neighbors to select and the criteria used for this selection can also have a serious impact on the quality of the recommender system. The selection of the neighbors used in the recommendation of items is normally done in two steps: 1) a global filtering step where only the most likely candidates are kept, and 2) a per-prediction step which chooses the best candidates for this prediction.

Pre-filtering of neighbors
∙ In large recommender systems that can have millions of users and items, it is usually
not possible to store the (non-zero) similarities between each pair of users or items,
due to memory limitations.
∙ Moreover, doing so would be extremely wasteful as only the most significant of these
values are used in the predictions.
∙ The pre-filtering of neighbors is an essential step that makes neighborhood-based
approaches practicable by reducing the amount of similarity weights to store, and
limiting the number of candidate neighbors to consider in the predictions. There are
several ways in which this can be accomplished:

• Top-N filtering:
For each user or item, only a list of the N nearest neighbors and their respective similarity weights is kept. To avoid problems with efficiency or accuracy, N should be chosen carefully: if N is too large, an excessive amount of memory will be required to store the neighborhood lists and predicting ratings will be slow. On the other hand, selecting too small a value for N may reduce the coverage of the recommendation method, which causes some items to never be recommended.

• Threshold filtering:
Instead of keeping a fixed number of nearest-neighbors, this approach keeps all the
neighbors whose similarity weight has a magnitude greater than a given threshold
wmin. While this is more flexible than the previous filtering technique, as only the most
significant neighbors are kept, the right value of wmin may be difficult to determine.

• Negative filtering:
In general, negative rating correlations are less reliable than positive ones. Intuitively,
this is because strong positive correlation between two users is a good indicator of their
belonging to a common group (e.g., teenagers, science-fiction fans, etc.). However,
although negative correlation may indicate membership to different groups, it does not
tell how different these groups are, or whether these groups are compatible for other
categories of items.

Neighbors in the predictions


Once a list of candidate neighbors has been computed for each user or item, the
prediction of new ratings is normally made with the k-nearest-neighbors, that is, the k
neighbors whose similarity weight has the greatest magnitude. The important question
is which value to use for k.
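
A compact sketch of the two-step selection (illustrative Python/NumPy; it assumes a precomputed user-user similarity matrix S with a zero diagonal and uses the mean-centered weighted average from the normalization section):

    import numpy as np

    def top_n_neighbors(S, n=30):
        # Global pre-filtering: for each user, keep the indices of the N
        # neighbors with the largest similarity magnitude.
        return np.argsort(-np.abs(S), axis=1)[:, :n]

    def predict(R, S, neighbors, u, i, k=5):
        # Per-prediction step: among the pre-filtered neighbors of u, take
        # the k strongest that actually rated item i, then combine their
        # mean-centered ratings using the similarity weights.
        means = np.array([r[r > 0].mean() if (r > 0).any() else 0.0 for r in R])
        cands = [v for v in neighbors[u] if R[v, i] > 0]
        cands = sorted(cands, key=lambda v: -abs(S[u, v]))[:k]
        num = sum(S[u, v] * (R[v, i] - means[v]) for v in cands)
        den = sum(abs(S[u, v]) for v in cands)
        return means[u] + num / den if den else means[u]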
