
A Study of Mathematical Model for Collaborative Filtering

The document discusses collaborative filtering techniques for making recommendations. It describes user-based collaborative filtering and item-based collaborative filtering. User-based collaborative filtering finds similar users to a target user and recommends items liked by similar users. Item-based collaborative filtering finds similar items to those liked by the target user and recommends those similar items. The document provides a mathematical model for user-based collaborative filtering to predict a rating for a target user and item. It calculates similarity between users and makes a prediction based on similar users' ratings, weighted by similarity.

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/366902172

Study of Mathematical Model for User-based Collaborative Filtering and Item-based Collaborative Filtering

Research · January 2023


1 author:

Prakash Upadhyaya
Tribhuvan University

All content following this page was uploaded by Prakash Upadhyaya on 06 January 2023.



Study of Mathematical Model for User-based Collaborative Filtering and
Item-based Collaborative Filtering
______________________________________________________________

Collaborative Filtering

Collaborative filtering has two senses: a narrow one and a more general one.

In the newer, narrower sense, collaborative filtering is a method of making automatic
predictions (filtering) about the interests of a user by collecting preference or taste
information from many users (collaborating). The underlying assumption of the
collaborative approach is that if person A has the same opinion as person B on
an issue, A is more likely to share B's opinion on a different issue than that of a
randomly chosen person. Note that these predictions are specific to the user, but use
information gleaned from many users. This differs from the simpler approach of
giving an average (non-specific) score for each item of interest, for example based
on its number of votes. In the more general sense, collaborative filtering is the process of
filtering for information or patterns using techniques involving collaboration among
multiple agents, viewpoints, data sources, etc. Applications of collaborative filtering
typically involve very large data sets. Collaborative filtering methods have been
applied to many different kinds of data where the focus is on user data.

The growth of the internet has made it much more difficult to effectively extract
useful information from all the available online information. The overwhelming
amount of data necessitates mechanisms for efficient information filtering.
Collaborative filtering is one of the techniques used for dealing with this problem.

Recommendation engines analyze information about users with similar tastes to
assess the probability that a target individual will enjoy something, such as a video, a
book, or a product. Collaborative filtering is also known as social filtering.
Collaborative filtering uses algorithms to filter data from user reviews to make
personalized recommendations to users with similar preferences. Collaborative
filtering is also used to select content and advertising for individuals on social media.

Two types of collaborative filtering commonly used by recommendation systems are
user-based (neighbour-based) and item-based.

078MSCSK011 Prakash Upadhyaya


Figure 1: Recommendation Techniques

User-based collaborative filtering leverages the behaviour of other users to infer
what the target user might enjoy. It may find people similar to the target user and
recommend items they liked, or recommend items that other people bought after
buying what the target user has bought. The same can be done for items as well.
Item-based collaborative filtering measures the similarity between the items that
the target user rates or interacts with and other items. It uses a matrix to determine the
likeness of each pair of items. Item-based similarity processes then compare the current
user's preferences to the items in the matrix for similarities upon which to base the
recommendations.

User-based Similarity Collaborative Filtering


User-based collaborative filtering, sometimes called neighbour-based
collaborative filtering, is also known as the memory-based approach to collaborative
filtering. It uses the rating matrix directly to find similarities and/or make predictions.
However, it does not scale to most real-world scenarios: large e-commerce
sites have tens of millions of customers and millions of items.

User-based similarity collaborative filtering is based on the following basic assumptions:

- If users had similar tastes in the past, they will have similar tastes in the future.
- Users' preferences remain stable and consistent over time.



The basic technique:
- Given an "Active User" (PAKKU) and an item i not yet seen by PAKKU
- Find a set of users (peers/nearest neighbours) who liked the same items as
PAKKU in the past and who have rated item i.
- Use, e.g., the average of their ratings to predict whether PAKKU will like item i.
- Do this for all items PAKKU has not seen and recommend the best-rated ones.
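
The steps above can be sketched in Python as follows. This is only a minimal illustration of the simplest variant (a plain average of the peers' ratings), not a definitive implementation. The function name `recommend_by_peer_average`, the dictionary layout, and the toy ratings are all assumptions made for this sketch; they do not come from the text.

```python
def recommend_by_peer_average(active_ratings, peer_ratings):
    """Return (item, predicted_rating) for the best unseen item, or None.

    active_ratings: dict item -> rating given by the active user
                    (items the user has not seen are simply absent).
    peer_ratings:   list of dicts item -> rating, one dict per peer.
    """
    # Collect every item that at least one peer has rated.
    all_items = set()
    for peer in peer_ratings:
        all_items.update(peer)

    predictions = {}
    # Only items the active user has not rated are candidates.
    for item in all_items - set(active_ratings):
        votes = [peer[item] for peer in peer_ratings if item in peer]
        # Plain (unweighted) average of the peers' ratings.
        predictions[item] = sum(votes) / len(votes)

    if not predictions:
        return None
    # Recommend the unseen item with the highest predicted rating.
    return max(predictions.items(), key=lambda kv: kv[1])


# Hypothetical toy data: the active user has rated items A and B only.
active = {"A": 5, "B": 3}
peers = [
    {"A": 4, "C": 5, "D": 2},
    {"B": 3, "C": 4, "D": 3},
]
print(recommend_by_peer_average(active, peers))
```

A plain average treats every peer equally; weighting each peer by how similar they are to the active user is the refinement developed in the rest of this section.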

#Example:
A database of ratings of the current user "PAKKU" and other users is given:
Users   Item 1  Item 2  Item 3  Item 4  Item 5
PAKKU   5       3       4       4       ?
U1      3       1       2       3       3
U2      4       3       4       3       4
U3      3       3       1       5       4
U4      1       5       5       2       1
- Determine whether PAKKU will like or dislike Item 5, which PAKKU has not yet
seen or rated.

- How do we measure similarity?
- How many users should we consider?
- How do we generate a prediction from the other users' ratings?

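A popular similarity measure in user-based collaborative filtering is the Pearson
correlation:

$$\mathrm{sim}(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}$$

$a, b$ : users
$r_{a,p}$ : rating of user $a$ for item $p$
$\bar{r}_a$ : average rating of user $a$
$P$ : set of items rated by both $a$ and $b$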
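
The Pearson correlation defined above can be written as a short Python function. This is a minimal sketch: the name `pearson` and the convention of passing two aligned lists (one rating per co-rated item) are assumptions of this example. Returning 0.0 when a user's ratings have no variance is also an assumption, since the formula is undefined in that case.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation between two users over their co-rated items.

    The two lists must be aligned: position k holds each user's rating
    for the same item, and only items rated by both users are included.
    """
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("rating lists must be non-empty and of equal length")

    # Each user's average over the co-rated items.
    mean_a = sum(ratings_a) / n
    mean_b = sum(ratings_b) / n

    # Deviations of each rating from the user's own average.
    dev_a = [r - mean_a for r in ratings_a]
    dev_b = [r - mean_b for r in ratings_b]

    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    denominator = sqrt(sum(x * x for x in dev_a)) * sqrt(sum(y * y for y in dev_b))

    # Assumption: treat a zero-variance user as uncorrelated with everyone.
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Ratings of PAKKU and U1 for Item 1 to Item 4, taken from the example table.
pakku = [5, 3, 4, 4]
u1 = [3, 1, 2, 3]
print(round(pearson(pakku, u1), 2))  # 0.85
```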

A common prediction function is:

$$\mathrm{prediction}(a, p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a, b) \cdot (r_{b,p} - \bar{r}_b)}{\sum_{b \in N} |\mathrm{sim}(a, b)|}$$

where $N$ is the set of neighbours of $a$ who have rated item $p$.

→ Calculate whether the neighbours' ratings for the unseen item p are higher or
lower than their averages.
→ Combine the rating differences, using the similarity with $a$ as a weight.
→ Add/subtract the neighbours' bias from the active user's average and use this
as a prediction.
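
The prediction function can be sketched in Python as below. The name `predict` and the representation of each neighbour as a `(similarity, rating, mean)` tuple are assumptions of this illustration, as is the fallback to the active user's own average when no neighbour carries any weight. The numbers in the usage example are hypothetical.

```python
def predict(active_mean, neighbours):
    """Similarity-weighted prediction for one active user and one item.

    active_mean: average rating of the active user.
    neighbours:  iterable of (similarity, rating_for_item, neighbour_mean)
                 tuples, one per neighbour who has rated the item.
    """
    numerator = 0.0
    denominator = 0.0
    for sim, rating, mean in neighbours:
        # How far above or below their own average the neighbour rated
        # the item, weighted by the similarity to the active user.
        numerator += sim * (rating - mean)
        denominator += abs(sim)

    # Assumption: with no usable neighbours, fall back to the user's average.
    if denominator == 0:
        return active_mean
    return active_mean + numerator / denominator


# Hypothetical neighbours: both average 3.0 and rated the item 4 and 5.
print(predict(3.0, [(0.5, 4, 3.0), (0.5, 5, 3.0)]))  # 4.5
```

Because the deviations are not bounded by the rating scale, the result can fall outside it; clipping to the scale afterwards is a common practical choice.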

Step 1: Calculating the similarity between PAKKU and all the other users, excluding
Item 5 as it is not rated by PAKKU

We calculate each user's average over the $n = 4$ items rated by everyone (Item 1 to Item 4) as:

$$\bar{r}_i = \frac{1}{n} \sum_{p} r_{i,p}$$

Therefore, we have
$\bar{r}_{PAKKU} = 4$
$\bar{r}_{U1} = 2.25$
$\bar{r}_{U2} = 3.5$
$\bar{r}_{U3} = 3$
$\bar{r}_{U4} = 3.25$

We then calculate the mean-centred rating (i.e., the deviation of each rating from the user's average) as

$$r'_{i,p} = r_{i,p} - \bar{r}_i$$

which gives the following matrix:

Users   Item 1  Item 2  Item 3  Item 4
PAKKU    1      -1       0       0
U1       0.75   -1.25   -0.25    0.75
U2       0.5    -0.5     0.5    -0.5
U3       0       0      -2       2
U4      -2.25    1.75    1.75   -1.25
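
The averages and mean-centred ratings of Step 1 can be reproduced with a few lines of Python. This is a sketch only: the dictionary `ratings` holds the Item 1 to Item 4 columns of the example table, and the helper name `mean_centre` is an assumption of this illustration.

```python
# Ratings for Item 1 to Item 4 from the example table (Item 5 is excluded
# because PAKKU has not rated it).
ratings = {
    "PAKKU": [5, 3, 4, 4],
    "U1": [3, 1, 2, 3],
    "U2": [4, 3, 4, 3],
    "U3": [3, 3, 1, 5],
    "U4": [1, 5, 5, 2],
}


def mean_centre(row):
    """Return (average, deviations) for one user's list of ratings."""
    mean = sum(row) / len(row)
    return mean, [r - mean for r in row]


centred = {user: mean_centre(row) for user, row in ratings.items()}

for user, (mean, deviations) in centred.items():
    print(user, mean, deviations)
```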

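Now, we calculate the similarity between PAKKU and each of the other users. PAKKU's
mean-centred ratings are (1, -1, 0, 0); those of U1 to U4 follow from the averages above.

$$\mathrm{sim}(PAKKU, U1) = \frac{(1 \times 0.75) + (-1 \times -1.25) + (0 \times -0.25) + (0 \times 0.75)}{\sqrt{1^2 + (-1)^2 + 0^2 + 0^2}\,\sqrt{0.75^2 + (-1.25)^2 + (-0.25)^2 + 0.75^2}} = \frac{2}{\sqrt{2}\,\sqrt{2.75}} \approx 0.85$$

$$\mathrm{sim}(PAKKU, U2) = \frac{(1 \times 0.5) + (-1 \times -0.5) + (0 \times 0.5) + (0 \times -0.5)}{\sqrt{1^2 + (-1)^2 + 0^2 + 0^2}\,\sqrt{0.5^2 + (-0.5)^2 + 0.5^2 + (-0.5)^2}} = \frac{1}{\sqrt{2}\,\sqrt{1}} \approx 0.71$$

$$\mathrm{sim}(PAKKU, U3) = \frac{(1 \times 0) + (-1 \times 0) + (0 \times -2) + (0 \times 2)}{\sqrt{1^2 + (-1)^2 + 0^2 + 0^2}\,\sqrt{0^2 + 0^2 + (-2)^2 + 2^2}} = 0$$

$$\mathrm{sim}(PAKKU, U4) = \frac{(1 \times -2.25) + (-1 \times 1.75) + (0 \times 1.75) + (0 \times -1.25)}{\sqrt{1^2 + (-1)^2 + 0^2 + 0^2}\,\sqrt{(-2.25)^2 + 1.75^2 + 1.75^2 + (-1.25)^2}} = \frac{-4}{\sqrt{2}\,\sqrt{12.75}} \approx -0.79$$

Step 2: Now we predict the rating for Item 5, which has not been seen by PAKKU:

$$\mathrm{rating}(PAKKU, I_5) = \bar{r}_{PAKKU} + \frac{\sum_{b \in \{U1, U2, U3, U4\}} \mathrm{sim}(PAKKU, b) \cdot (r_{b,I_5} - \bar{r}_b)}{\sum_{b \in \{U1, U2, U3, U4\}} |\mathrm{sim}(PAKKU, b)|}$$

With $r_{U1,I_5} - \bar{r}_{U1} = 0.75$, $r_{U2,I_5} - \bar{r}_{U2} = 0.5$, $r_{U3,I_5} - \bar{r}_{U3} = 1$ and $r_{U4,I_5} - \bar{r}_{U4} = -2.25$:

$$= 4 + \frac{(0.85 \times 0.75) + (0.71 \times 0.5) + (0 \times 1) + (-0.79 \times -2.25)}{|0.85| + |0.71| + |0| + |-0.79|}$$

$$= 4 + \frac{2.77}{2.35} \approx 5.18$$

The predicted rating (about 5.18) lies above the highest rating that appears in the table,
so it is capped at 5. Hence, we can conclude that Item 5 is likely to be liked by the
current user PAKKU.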

Figure 2: User-based and Item-based Collaborative Filtering

Item-based Similarity Collaborative Filtering


Item-based similarity collaborative filtering is a model-based approach to
collaborative filtering, based on offline pre-processing. When a rating is
predicted, only the learned model is used to make the decision. The models
used to calculate the similarities and the predictions are updated
and/or re-trained periodically.

Here the basic idea is to explore the relationship between pairs of items (e.g., a user
who bought Item_1 also bought Item_4). We find the missing ratings with the help of the ratings
given to the other items by the user. Rather than matching the user to similar
customers, item-to-item collaborative filtering matches each of the user's purchased
and rated items to similar items, then combines those similar items into a
recommendation list.

#Example:
Users   Item 1  Item 2  Item 3  Item 4
U1      ?       1       2       3
U2      4       3       4       ?
U3      3       3       ?       5
U4      1       5       5       2
Determine the missing ratings in the table:

The first step is to build the model by finding the similarity between all the item
pairs. The similarity between item pairs can be found using cosine similarity.

→ The formula for cosine similarity is:

$$\mathrm{similarity}(\vec{A}, \vec{B}) = \frac{\vec{A} \cdot \vec{B}}{|\vec{A}|\,|\vec{B}|}$$

The second stage involves executing the recommendation system. It uses the items
(already rated by the user) that are most similar to the missing item to generate a rating.
We hence try to generate predictions based on the ratings of similar products.

→ To compute the prediction, we use the formula:

$$\mathrm{rating}(U, I_i) = \frac{\sum_j \mathrm{rating}(U, I_j) \times sim_{i,j}}{\sum_j sim_{i,j}}$$

$\mathrm{rating}(U, I_i)$ = rating of user U for Item_i
$sim_{i,j}$ = similarity between Item_i and Item_j
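
The item-based prediction formula is a similarity-weighted average of the user's own ratings, and can be sketched in Python as follows. The function name `predict_item_based`, the dictionary layout, and the toy numbers are assumptions of this example, as is returning `None` when no similar rated item is available.

```python
def predict_item_based(user_ratings, similarities):
    """Predict a user's rating for a target item.

    user_ratings: dict item -> rating, for items the user has rated.
    similarities: dict item -> similarity between the target item
                  and that item.
    """
    numerator = 0.0
    denominator = 0.0
    for item, rating in user_ratings.items():
        sim = similarities.get(item)
        if sim is None:
            # No similarity known for this item, so it cannot contribute.
            continue
        numerator += rating * sim
        denominator += sim

    # Assumption: signal "no prediction" instead of dividing by zero.
    if denominator == 0:
        return None
    return numerator / denominator


# Hypothetical values: two rated items, equally similar to the target item.
print(predict_item_based({"X": 2, "Y": 4}, {"X": 0.5, "Y": 0.5}))  # 3.0
```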

Step 1: Finding the similarity of all item pairs.

- Forming the item pairs: (I1, I2), (I1, I3), (I1, I4), (I2, I3), (I2, I4), (I3, I4)
- We select each pair one by one. For each pair, we consider all the users
who have rated both items in the pair. We then form a vector for each
item from those users' ratings and calculate the similarity between the two items
using the cosine similarity formula stated above:



a) Similarity (I1, I2)
In the table we can see that U2, U3, and U4 have rated both Item_1
and Item_2. Thus, let I1 be the vector for Item_1 and I2 be the vector for
Item_2. Then
I1 = 4U2 + 3U3 + U4
I2 = 3U2 + 3U3 + 5U4

$$\mathrm{similarity}(I_1, I_2) = \frac{(4 \times 3) + (3 \times 3) + (1 \times 5)}{\sqrt{4^2 + 3^2 + 1^2}\,\sqrt{3^2 + 3^2 + 5^2}} \approx 0.7776$$

b) Similarity (I1, I3)
In the table we can see that U2 and U4 have rated both Item_1 and
Item_3. Thus, let I1 be the vector for Item_1 and I3 be the vector for
Item_3. Then
I1 = 4U2 + U4
I3 = 4U2 + 5U4

$$\mathrm{similarity}(I_1, I_3) = \frac{(4 \times 4) + (1 \times 5)}{\sqrt{4^2 + 1^2}\,\sqrt{4^2 + 5^2}} \approx 0.7954$$

c) Similarity (I1, I4)
In the table we can see that U3 and U4 have rated both Item_1 and
Item_4. Thus, let I1 be the vector for Item_1 and I4 be the vector for
Item_4. Then
I1 = 3U3 + U4
I4 = 5U3 + 2U4

$$\mathrm{similarity}(I_1, I_4) = \frac{(3 \times 5) + (1 \times 2)}{\sqrt{3^2 + 1^2}\,\sqrt{5^2 + 2^2}} \approx 0.9983$$

d) Similarity (I2, I3)
In the table we can see that U1, U2, and U4 have rated both Item_2
and Item_3. Thus, let I2 be the vector for Item_2 and I3 be the vector for
Item_3. Then
I2 = U1 + 3U2 + 5U4
I3 = 2U1 + 4U2 + 5U4

$$\mathrm{similarity}(I_2, I_3) = \frac{(1 \times 2) + (3 \times 4) + (5 \times 5)}{\sqrt{1^2 + 3^2 + 5^2}\,\sqrt{2^2 + 4^2 + 5^2}} \approx 0.9827$$



e) Similarity (I2, I4)
In the table we can see that U1, U3, and U4 have rated both Item_2
and Item_4. Thus, let I2 be the vector for Item_2 and I4 be the vector for
Item_4. Then
I2 = U1 + 3U3 + 5U4
I4 = 3U1 + 5U3 + 2U4

$$\mathrm{similarity}(I_2, I_4) = \frac{(1 \times 3) + (3 \times 5) + (5 \times 2)}{\sqrt{1^2 + 3^2 + 5^2}\,\sqrt{3^2 + 5^2 + 2^2}} \approx 0.7678$$

f) Similarity (I3, I4)
In the table we can see that U1 and U4 have rated both Item_3 and
Item_4. Thus, let I3 be the vector for Item_3 and I4 be the vector for
Item_4. Then
I3 = 2U1 + 5U4
I4 = 3U1 + 2U4

$$\mathrm{similarity}(I_3, I_4) = \frac{(2 \times 3) + (5 \times 2)}{\sqrt{2^2 + 5^2}\,\sqrt{3^2 + 2^2}} \approx 0.8240$$
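
The pairwise similarities of Step 1 can be checked with the short Python sketch below. It restricts each item vector to the users who rated both items, as described above. The nested-dictionary layout of `ratings` and the name `item_cosine` are assumptions of this illustration; missing ratings are simply left out of each user's dictionary.

```python
from itertools import combinations
from math import sqrt

# Example table; a missing rating ("?") is represented by an absent key.
ratings = {
    "U1": {"I2": 1, "I3": 2, "I4": 3},
    "U2": {"I1": 4, "I2": 3, "I3": 4},
    "U3": {"I1": 3, "I2": 3, "I4": 5},
    "U4": {"I1": 1, "I2": 5, "I3": 5, "I4": 2},
}


def item_cosine(ratings, item_i, item_j):
    """Cosine similarity of two items over the users who rated both."""
    pairs = [
        (user[item_i], user[item_j])
        for user in ratings.values()
        if item_i in user and item_j in user
    ]
    if not pairs:
        return 0.0  # assumption: no co-rating users means no similarity
    dot = sum(x * y for x, y in pairs)
    norm_i = sqrt(sum(x * x for x, _ in pairs))
    norm_j = sqrt(sum(y * y for _, y in pairs))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)


items = ["I1", "I2", "I3", "I4"]
sims = {
    (i, j): item_cosine(ratings, i, j)
    for i, j in combinations(items, 2)
}

for pair, value in sims.items():
    print(pair, round(value, 4))
```

For the pair (I1, I2), for instance, this gives about 0.7776, matching the hand calculation in a).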

Step 2: Generating the missing ratings in the table


a) Rating of Item_1 for User_1

$$\mathrm{rating}(U_1, I_1) = \frac{r(U_1, I_2) \times sim(I_1, I_2) + r(U_1, I_3) \times sim(I_1, I_3) + r(U_1, I_4) \times sim(I_1, I_4)}{sim(I_1, I_2) + sim(I_1, I_3) + sim(I_1, I_4)}$$

$$= \frac{1 \times 0.7776 + 2 \times 0.7954 + 3 \times 0.9983}{0.7776 + 0.7954 + 0.9983} \approx 2.09$$

b) Rating of Item_4 for User_2

$$\mathrm{rating}(U_2, I_4) = \frac{r(U_2, I_1) \times sim(I_1, I_4) + r(U_2, I_2) \times sim(I_2, I_4) + r(U_2, I_3) \times sim(I_3, I_4)}{sim(I_1, I_4) + sim(I_2, I_4) + sim(I_3, I_4)}$$

$$= \frac{4 \times 0.9983 + 3 \times 0.7678 + 4 \times 0.8240}{0.9983 + 0.7678 + 0.8240} \approx 3.70$$

c) Rating of Item_3 for User_3

$$\mathrm{rating}(U_3, I_3) = \frac{r(U_3, I_1) \times sim(I_1, I_3) + r(U_3, I_2) \times sim(I_2, I_3) + r(U_3, I_4) \times sim(I_3, I_4)}{sim(I_1, I_3) + sim(I_2, I_3) + sim(I_3, I_4)}$$

$$= \frac{3 \times 0.7954 + 3 \times 0.9827 + 5 \times 0.8240}{0.7954 + 0.9827 + 0.8240} \approx 3.63$$

Thus, we can now obtain the new table by inserting the predicted ratings for the
respective users, as follows:

Users   Item 1  Item 2  Item 3  Item 4
U1      2.09    1       2       3
U2      4       3       4       3.70
U3      3       3       3.63    5
U4      1       5       5       2
