
Jokes Recommendation Systems

Gustavo V.N. Luizon, Frederico Cardoso


Informatics Engineering Department
University of Coimbra, UC
Coimbra, Portugal

Abstract

This paper implements a joke recommendation system through collaborative filtering using the Singular Value Decomposition (SVD) method, and evaluates its performance using the Pearson correlation and a text content-based method based on the cosine similarity of text features.

1 Introduction

A recommender system's main goal is to estimate ratings for items and provide a personalized list of recommended items to a given user. There are many different approaches to this task, such as the comparison of ratings, item or user attributes, and demographic relations.

Every basic recommender system contains some type of background information, consisting of a set of users, a set of items and the relation between those, which in many cases is the rating a user gives to an item. Recommender system filtering can be categorized into three main approaches: content-based filtering, collaborative filtering and hybrid filtering [Najafi, S., 2016].

In collaborative filtering, the idea is to make recommendations to users on the basis of the buying behavior of other similar users. In this context, localized pattern mining with text mining is particularly useful. In localized pattern mining, the idea is to cluster the data into segments and then determine the patterns in these segments. The patterns from each segment are typically more resistant to noise from the global data distribution and provide a clearer idea of the patterns within like-minded customers.

Development of recommender systems is a multi-disciplinary effort which involves experts from various fields such as Artificial Intelligence, Human-Computer Interaction, Information Technology, Data Mining, Statistics, Adaptive User Interfaces, Decision Support Systems, Marketing, and Consumer Behavior [Ricci, F., 2010].

1.1 Jester Dataset

The dataset used to implement the recommendation system was the Jester dataset [Goldberg, K., 2001]. It contains 150 jokes with ratings ranging between -10 and +10 from 73,421 users, collected between April 1999 and May 2003.

The data are available in two tables, as described:

• Item Table: composed of one identification number field and one text field containing the joke in HTML encoding, see Table 1;
• User Rating Table: contains the user identification number, the item identification number and the corresponding user evaluation rating, see Table 2.

Item_ID   Joke
1         A man visits the doctor…
2         This couple had an excellent…
3         What's 200 feet long and…
…         …
150       In an interview with David…

Table 1: Item Table

User_ID   Item_ID   Rating
1         5          0.291
1         7         -9.281
…         …          …
2         5         -9.688
2         7          9.938
…         …          …
63978     58        -8.656
63978     44        -8.438

Table 2: User Rating Table
1.2 Collaborative Filtering

A collaborative filtering system recommends items to users by comparing ratings. Opinions of real-life related users play a role in personal decision making. A collaborative system is based on the same principle, but instead of considering geographically adjacent user opinions, the system searches globally for similar users. However, basing recommendations on all users may not result in accurate predictions. In collaborative filtering, this issue is addressed by matching users that share rating similarities. As opposed to content-based filtering, collaborative filtering provides recommendations by examining user-item rating relations [Najafi, S., 2016].

[Figure 1: Collaborative Filtering]

1.2.1 Singular Value Decomposition

Matrix Factorization methods transform both items and users to the same latent factor space. The latent space is then used to explain ratings by characterizing both products and users in terms of factors automatically inferred from user feedback.

The Singular Value Decomposition (SVD) is a powerful technique for dimensionality reduction. It is a particular realization of the Matrix Factorization approach and is therefore also related to Principal Component Analysis (PCA). The key issue in an SVD decomposition is to find a lower dimensional feature space where the new features represent "concepts" and the strength of each concept in the context of the collection is computable. Because SVD makes it possible to automatically derive semantic "concepts" in a low dimensional space, it can be used as the basis of latent semantic analysis, a very popular technique for text classification in Information Retrieval.

The core of the SVD algorithm lies in the following theorem: it is always possible to decompose a given matrix A into A = UλV^T. Given the n×m data matrix A (n items, m features), we obtain an n×r matrix U (n items, r concepts), an r×r diagonal matrix λ (strength of each concept), and an m×r matrix V (m features, r concepts). See Figure 2.

[Figure 2: SVD Theorem]

The λ diagonal matrix contains the singular values, which will always be positive and sorted in decreasing order. The U matrix is interpreted as the "item-to-concept" similarity matrix, while the V matrix is the "term-to-concept" similarity matrix.

In order to compute the SVD of a rectangular matrix A, we consider AA^T and A^TA. The columns of U are the eigenvectors of AA^T, and the columns of V are the eigenvectors of A^TA. The singular values on the diagonal of λ are the positive square roots of the nonzero eigenvalues of both AA^T and A^TA. Therefore, in order to compute the SVD of matrix A, we first compute T as AA^T and D as A^TA, and then compute the eigenvectors and eigenvalues of T and D.
The r eigenvalues in λ are ordered in decreasing magnitude, so the original matrix A can be approximated by simply truncating the eigenvalues at a given k. The truncated SVD creates a rank-k approximation to A, A_k = U_k λ_k V_k^T, and A_k is the closest rank-k matrix to A. The term "closest" means that A_k minimizes the sum of the squares of the differences of the elements of A and A_k. The truncated SVD is a representation of the underlying latent structure in a reduced k-dimensional space, which generally means that the noise in the features is reduced [Ricci, F., 2010].
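To make the truncation concrete, the following minimal NumPy sketch builds a rank-k approximation of a small random matrix; it is illustrative only, since the paper's own model is trained with the Surprise library (Section 2).

import numpy as np

# Toy "ratings" matrix standing in for the real user-item data.
A = np.random.default_rng(0).uniform(-10, 10, size=(20, 15))

# Full SVD: A = U @ diag(s) @ Vt, with s sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 5                                        # latent concepts kept
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k approximation

# The Frobenius error of A_k equals the energy of the dropped values,
# which is exactly the "closest rank-k" property stated above.
print(np.linalg.norm(A - A_k), np.sqrt(np.sum(s[k:] ** 2)))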
1.3 Content Based Filtering

In content-based recommender systems, a recommendation is based on the relation between the attributes of items that a given user has previously rated and items which the user has not yet rated. The content-based approach utilizes the concept of monotonic personal interest, meaning a person's interest will most likely not change in the near future.

The process of how a basic content-based approach recommends an item involves having a set of items and a set of users. Assuming every user has rated a subset of the items, a user profile can be constructed for each user containing their preferences. User preferences are computed by analyzing the similarities between the items that the user has previously rated. Recommending an item when a user profile is available is done by matching the profile attributes with items not yet rated by the user [Najafi, S., 2016].
The classical problem of association pattern mining is to determine associations between groups of items bought by customers, which can intuitively be viewed as k-way correlations between items. The most popular model for association pattern mining uses the frequencies of sets of items as the quantification of the level of association.

The discovered sets of items are referred to as large item sets, frequent item sets, or frequent patterns. The association pattern mining problem has a wide variety of applications, one of them being text mining.

In text mining, data is often represented in the bag-of-words model, and frequent pattern mining can help in identifying co-occurring terms and keywords. Such co-occurring terms have numerous text-mining applications [Aggarwal, C., 2015].

1.3.1 Pre-Processing

As the text is not directly available in a multidimensional representation, the first step is to convert raw text documents to the multidimensional format. In cases where the documents are retrieved from the Web, additional steps are needed, as described (a minimal sketch follows the list):

• Stop Word Removal: frequently occurring words in a language that are not very discriminative for mining applications are removed;
• Stemming: variations of the same word need to be consolidated;
• Punctuation marks: after stemming has been performed, punctuation marks, such as commas and semicolons, are removed.
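A minimal Python sketch of these three steps; the stop-word list here is a tiny illustrative subset, and NLTK's Porter stemmer stands in for whichever stemmer the authors actually used, since the paper does not name one.

import re
from nltk.stem.porter import PorterStemmer

STOP_WORDS = {"a", "an", "the", "and", "or", "is", "are", "to", "of"}
stemmer = PorterStemmer()

def preprocess(text: str) -> list[str]:
    text = re.sub(r"<[^>]+>", " ", text)          # strip HTML tags
    tokens = re.findall(r"[a-z]+", text.lower())  # lowercase; drops punctuation
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    return [stemmer.stem(t) for t in tokens]      # stemming

print(preprocess("<p>A man visits the doctor...</p>"))  # ['man', 'visit', 'doctor']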

1.3.2 Cosine Similarity

Many data mining applications require the determination of similar or dissimilar objects, patterns, attributes, and events in the data. In other words, a methodical way of quantifying similarity between data objects is required. Virtually all data mining problems, such as clustering, outlier detection, and classification, require the computation of similarity.

The cosine measure computes the angle between two documents, which is insensitive to the absolute length of the documents. Let X = (x_1, ..., x_d) and Y = (y_1, ..., y_d) be two documents on a lexicon of size d. Then the cosine measure cos(X, Y) between X and Y can be defined as follows:

$$\cos(X, Y) = \frac{\sum_{i=1}^{d} x_i \, y_i}{\sqrt{\sum_{i=1}^{d} x_i^2} \; \sqrt{\sum_{i=1}^{d} y_i^2}}$$
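A direct, self-contained implementation of this measure on two toy term-count vectors; the paper's own computation relies on scikit-learn instead (Section 3).

import math

def cosine(x: list[float], y: list[float]) -> float:
    dot = sum(xi * yi for xi, yi in zip(x, y))  # sum of x_i * y_i
    nx = math.sqrt(sum(xi * xi for xi in x))    # length of X
    ny = math.sqrt(sum(yi * yi for yi in y))    # length of Y
    return dot / (nx * ny)

# Two bag-of-words count vectors over a shared 4-term lexicon:
print(cosine([1, 2, 0, 1], [2, 1, 1, 0]))  # 0.666...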
2 Joke Recommendation System

The procedure for recommending jokes was implemented using the Surprise [2] library for the Python programming language. The cross-validation method is used to assess the generalization ability of predictive models and to prevent overfitting of the SVD model, see Table 3:

Measure    Fold 1   Fold 2   Fold 3   Mean    Std
RMSE       4.431    4.441    4.428    4.433   0.005
MAE        3.331    3.345    3.333    3.336   0.005
Fit (s)    56.9     56.39    56.62    56.66   0.23
Test (s)   5.55     4.93     5.98     5.49    0.43

Table 3: SVD Cross Validation

The result of the cross-validation procedure is a root mean square error (RMSE) of 4.43 and a mean absolute error (MAE) of 3.34, with a standard deviation of 0.005 for both. Considering a rating range between -10 and 10, the value of 4.43 represents a variation of approximately 20% above or below the real rating value, and the low standard deviation shows good consistency in the estimated ratings.

The cross-validation result shows the expected precision values for recommendation systems, considering the uncertainty in the fidelity of the ratings attributed by users and the low number of rated jokes for many users of the dataset.
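A minimal sketch of this validation step with Surprise, using its built-in copy of the Jester data; the 3 folds match Table 3, but the exact SVD hyperparameters and dataset variant are assumptions, as the paper does not state them.

from surprise import SVD, Dataset
from surprise.model_selection import cross_validate

data = Dataset.load_builtin("jester")  # offers to download the data on first use
algo = SVD()                           # default hyperparameters assumed

# 3-fold cross-validation reporting RMSE and MAE, as in Table 3.
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=3, verbose=True)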
Using the trained model, it is possible to estimate ratings for unknown items and provide a personalized list of recommended items to a given user, see Table 4:

User_ID   Item_ID   Estimate
1         140         9.9
1         145         9.3
1         143         7.8
1         117         7.5
1         114         6.7
…         …           …
1         75         -5.3
1         84         -5.4
1         122        -5.4
1         58         -9.3
1         43        -10

Table 4: User Estimate List Example
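One way such a per-user list can be produced, continuing from the previous snippet; this ranking helper is illustrative, not the authors' code.

trainset = data.build_full_trainset()
algo.fit(trainset)

uid = "1"  # Surprise uses raw ids as strings
inner_uid = trainset.to_inner_uid(uid)
rated = {iid for (iid, _) in trainset.ur[inner_uid]}
unrated = [trainset.to_raw_iid(i) for i in trainset.all_items() if i not in rated]

# Estimate a rating for every unseen joke and sort, best first.
ranked = sorted(((iid, algo.predict(uid, iid).est) for iid in unrated),
                key=lambda pair: pair[1], reverse=True)
print(ranked[:10])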

3 Joke Similarity

To implement the procedure for calculating the similarity between the jokes, the SciKit Learn [Bruchier, M., 2007] and SciPy [Virtanen, P., 2021] libraries for Python were used.

The Jester dataset is available in web format, so it has many terms that make it difficult to analyze its content, such as HTML tags, mixed lowercase and uppercase letters, accentuation, punctuation and many words that are not relevant to the context. So before calculating the similarity between the joke texts, a pre-processing step was necessary to clean the data.

The text features were obtained using SciKit Learn's class "CountVectorizer", and the cosine similarities between all jokes contained in the Jester dataset were calculated, resulting in a cross-similarity joke table, see Table 5.

ID    1      2      …    149    150
1     1.00   0.13   …    0.00   0.28
2     0.13   1.00   …    0.14   0.21
…     …      …      …    …      …
149   0.00   0.14   …    1.00   0.07
150   0.28   0.21   …    0.07   1.00

Table 5: Cross-Similarity Joke Table

With the similarity table it is possible to rank the user's unknown jokes by their similarity to the user's preferred jokes.
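A sketch of that feature-extraction step, with a two-joke toy corpus standing in for the 150 pre-processed texts.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

jokes = ["a man visits the doctor", "this couple had an excellent doctor"]

counts = CountVectorizer().fit_transform(jokes)  # (n_jokes, n_terms) counts
similarity = cosine_similarity(counts)           # (n_jokes, n_jokes) table

print(similarity.round(2))  # cross-similarity joke table, as in Table 5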

4 Results

In order to determine the recommendation results, approximately 10000 users were selected for the test set; each selected user has at least 50 joke evaluations, to avoid the cold start effect. For each of the users on the list, 3 metrics were calculated, as described below.

4.1 Hit Rate

The hit rate was obtained through the intersection between the collaborative filtering recommendation list and the list of user preferences according to the real ratings given by the user. The rate was computed by comparing the set of the 10 jokes best rated by the user with recommendation lists at 3 different levels: first the set of the 10 best collaborative filtering recommendations, then the set of the 20 best recommendations, and finally the top 30, see Graph 1, Graph 2 and Graph 3. A minimal sketch of this computation follows the graphs.

[Graph 1: Hit Rate 10 – Avg: 0.53]

[Graph 2: Hit Rate 20 – Avg: 0.77]

[Graph 3: Hit Rate 30 – Avg: 0.87]
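A minimal sketch of this hit-rate computation; both ID lists below are toy placeholders rather than data from the paper.

def hit_rate(true_top10: set[str], recommended: list[str], n: int) -> float:
    """Fraction of the user's 10 best-rated jokes found in the top-n list."""
    return len(true_top10 & set(recommended[:n])) / len(true_top10)

true_top10 = {"140", "145", "143", "117", "114", "7", "29", "35", "50", "89"}
recommended = ["140", "7", "145", "12", "3", "143", "29", "58", "117", "84",
               "35", "50", "89", "114", "43", "75", "122", "2", "5", "9",
               "11", "13", "15", "17", "19", "21", "23", "25", "27", "31"]

for n in (10, 20, 30):
    print(n, hit_rate(true_top10, recommended, n))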


4.2 Pearson Correlation

Pearson's correlation was obtained using all joke ratings for each user, compared against the list of ratings provided by the recommendation system.

[Graph 4: Pearson Correlation – Avg: 0.91]
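A sketch of the per-user correlation using SciPy; the two rating vectors are toy placeholders.

from scipy.stats import pearsonr

true_ratings = [9.9, 7.5, -2.1, 4.4, -9.3, 0.5, 6.7, -5.4]  # user's ratings
estimated    = [9.1, 6.8, -1.0, 5.0, -8.2, 1.9, 5.9, -4.7]  # model estimates

r, _ = pearsonr(true_ratings, estimated)
print(round(r, 2))  # a value near 1 means the estimates track the user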

4.3 Text Similarity

The text similarity was obtained as the average of the similarities between each joke from the set of the 30 jokes best rated by the user and the 30 best recommendations made by the recommendation system. This analysis aims to obtain a measure of content similarity between the recommended set and the user's preferences. A sketch of this averaging follows.
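A sketch of this averaging, with a random matrix standing in for the Section 3 cross-similarity table so the snippet runs on its own; the pairwise-mean interpretation is an assumption, and the index lists are placeholders.

import numpy as np

rng = np.random.default_rng(0)
similarity = rng.uniform(0, 1, size=(150, 150))  # placeholder for Table 5

best_rated  = [139, 144, 142, 116, 113]  # indices of the user's top jokes
recommended = [139, 6, 144, 11, 2]       # indices of the top recommendations

# Mean similarity over every (preferred, recommended) pair of jokes.
score = similarity[np.ix_(best_rated, recommended)].mean()
print(round(score, 2))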
[Graph 5: Text Similarity – Avg: 0.32]

The obtained results point to a good accuracy for recommending jokes based on collaborative filtering: a large share of the recommendation lists contain jokes approved by the user, especially when we extend the investigation domain to 30 jokes, in which case the rate reaches an average value of 0.87. Additionally, we can see in Graph 3 a large number of users with 100% of their approved jokes contained in the recommended set.

The Pearson correlation between the user's joke ratings and the recommended joke ratings presents an average of 0.91, which indicates a very high correlation between them, confirming the hit rate results and the model's predictive capability.

The text similarity test has an average of 0.32; this shows that the content of the recommended jokes is, in most cases, not very similar to the content of the jokes preferred by the user. However, the cosine similarity values come close to 0.5 for some users, which shows some influence of content on preference.

5 Conclusion

In general, we can conclude that the joke recommendation system performs well: very accurate results are presented both in the model validation step, made through cross-validation, and in the final recommendation analysis. Therefore, we can conclude that recommendation techniques based on collaborative filtering can be applied to recommending jokes quite efficiently.

On the text similarity tests, the results were not as satisfying; apparently there is not a very close relationship between the content of jokes and the user's preferences. This can happen for several reasons. One of them is that a user may not have a well-defined preference for a certain genre of jokes: he evaluates a joke according to the degree of satisfaction he gets from reading it, but does not care whether the content is about drunks, blondes or lawyers. In addition, the dataset has 150 jokes from very varied genres, which can create a certain scarcity of jokes in some genres, making it impossible to recommend another joke with content similar to the user's preferences.

In future experiments, another approach can be carried out using a recommendation based entirely on content, verifying whether the results obtained are consistent with the user's preferences.


References

[Aggarwal, C., 2015] Charu C. Aggarwal. Data Mining: The Textbook, 2015.
[Ricci, F., 2010] Ricci, F., Rokach, L., Shapira, B., Kantor, P. Recommender Systems Handbook, 2010.
[Najafi, S., 2016] Najafi, S., Salam, Z. Evaluating Prediction Accuracy for Collaborative Filtering Algorithms in Recommender Systems, 2016.
[Goldberg, K., 2001] Goldberg, K. Jester Dataset, 2001.
[Bruchier, M., 2007] Scikit-learn Python Library, 2007.
[Virtanen, P., 2021] SciPy Python Library, 2021.
