2020-Enhanced Collaborative Filtering-Based Approach For Recommender Systems
3 authors:
Hamdy M. Mousa
Menoufia University - Faculty of Computers and Information
All content following this page was uploaded by Rouhia Sallam on 16 July 2020.
ABSTRACT
Recommender systems are software applications that provide product recommendations for users based on their purchase history or ratings of items. The product recommendations are likely to be of interest to the users and encompass items such as books, music CDs, movies, restaurants, documents (news articles, medical texts, and Wikipedia articles), and other services. In this paper, we propose a framework for collaborative filtering to enhance recommendation accuracy. The proposed approach is summarized in two steps: (1) item-based collaborative filtering and (2) singular-value-decomposition-based collaborative filtering. In item-based collaborative filtering, the similarity between the target item and any other item is calculated. Then, the most similar items are recommended. The singular-value-decomposition-based approach handles the problems of scalability and sparsity posed by collaborative filtering and improves the performance of item-based collaborative filtering. We tested the proposed approach on the Large-Scale Arabic Book Reviews (LABR) dataset. We used four different datasets to compare our approach with existing work. The proposed approach was evaluated using the most common metrics in collaborative filtering: the mean absolute error (MAE) and the root mean squared error (RMSE). The proposed approach achieved high performance and obtained minimum errors in terms of RMSE and MAE values.

Keywords
Collaborative filtering (CF), k-Nearest Neighbors (KNN), Item-based collaborative filtering, Matrix Factorization (MF), Singular Value Decomposition (SVD), mean absolute error (MAE), root mean squared error (RMSE).

1. INTRODUCTION
Recommender systems are one of the important techniques in machine learning and data mining; they are used to search for similarities between items and customer preferences. Recommendation techniques are categorized into three types: collaborative filtering, content-based techniques, and hybrid techniques [1].

Collaborative filtering (CF) is the most successful technique in recommender systems. It recommends items by identifying other users with similar tastes and using their opinions to recommend items to the active user. CF systems have two main approaches: memory-based and model-based [1]. Memory-based approaches use user rating data to compute the similarity between users or items. They are divided into two categories: user-based and item-based CF [2]. User-based CF identifies users similar to the target user for whom the rating predictions are being computed. In item-based CF, the similarities are computed between items rather than users. In the model-based approaches, machine learning and data mining methods are used to build predictive models [2]. Model-based approaches use the ratings to learn a model in order to improve the performance of CF. Examples of these techniques include dimensionality reduction techniques such as singular value decomposition (SVD), the matrix completion technique, latent semantic methods, and regression and clustering [1].

In the proposed approach, we have combined the best methods in collaborative filtering. Item-based CF provides better performance than user-based CF. The SVD-based approach handles the problems of scalability and sparsity in CF and improves the performance of recommender systems.

In this paper, we analyze the user-item matrix to identify relationships between different items, as in item-based CF. Then we use these relationships to indirectly compute recommendations for users [3]. The item-based approach has two key processes: (a) computing the similarity between each pair of items using similarity measures such as the cosine and Pearson metrics [2] and (b) computing the prediction. Item-based CF provides better performance and quality than user-based algorithms in most published research [3, 4, 5].

This work uses the model-based technique by applying the matrix factorization algorithm via SVD. Matrix-factorization-based CF aims to reduce the dimensions of the rating matrix and discover latent features underlying the rating matrix for recommendation [6, 7]. The SVD technique produces high-quality recommendations and successfully handles the problems of scalability and sparsity posed by CF [8, 9, 10, 11, 12].

The experimental results showed that SVD-based CF achieved better recommendations than item-based CF: the two achieved RMSE values of 1.0187 and 1.1969, respectively, and MAE values of 0.8077 and 0.922, respectively. Our proposed approach is also more accurate than existing work on different datasets.

This paper is organized as follows. Section 2 briefly describes related works in the area of CF. The proposed approach is
International Journal of Computer Applications (0975 – 8887)
Volume 176 – No. 41, July 2020
described in Section 3. Sections 4 and 5 outline the experimental setting and the results, respectively. Conclusions and suggestions for future work are presented in Section 6.

2. RELATED WORK
This section reviews related work on memory-based and model-based CF.

Jianfang and Pengfei [13] introduced a CF algorithm that combines Singular Value Decomposition (SVD) with trust factors (CFSVD-TF). For similarity computation, they used the cosine distance metric. The dataset was MovieLens 100k, which contains 100,000 ratings (1-5) from 943 users for 1,682 movies, with each user having rated at least 20 movies. The technique was evaluated using the root mean square error (RMSE). It was reported to improve prediction accuracy, obtaining an RMSE of 0.9762 with 10 neighbors.

Another method [14] applied item-based CF to produce movie recommendations. The dataset was the Group Lens M1, consisting of around one million ratings from 6,040 users for 4,000 movies. Adjusted cosine similarity was used to calculate the similarities between movies. The approach was evaluated using MAE and achieved 0.938 with 20 neighbors.

The approach in [15] introduced a book recommendation system using item-based collaborative filtering. The cosine distance metric was used to calculate the similarity between books. The dataset was goodbooks10k, which contains ratings of 10,000 popular books by 53,424 users. The method was evaluated using MAE and achieved 0.72.

The work in [16] presented a book recommendation algorithm using deep learning. The dataset was goodbooks10k, which contains 6 million ratings for 10,000 of the most popular books. The experiment randomly divided the data set into an 80% training set, a 10% validation set, and a 10% test set. The technique was evaluated using RMSE and obtained 1.1426.

Mala et al. [17] proposed a web-based movie recommender system that recommends movies to users based on their profile using different recommendation algorithms such as K-Nearest Neighbor (KNN), singular value decomposition, Alternating Least Squares (ALS), and Restricted Boltzmann Machines (RBM). Experimental results showed that SVD achieved better recommendations than KNN, ALS, and RBM. SVD, KNN, and ALS achieved 0.9002, 0.9375, and 1.069 in terms of RMSE, respectively, and 0.6925, 0.7263, and 0.9935 in terms of MAE, respectively.

Sandeep and Rajesh [18] proposed a method called Accelerated Singular Value Decomposition (ASVD), which uses momentum-based gradient descent optimization. They used real-world datasets (MovieLens100k, Film Trust, and Yahoo Movie). The technique was evaluated using the mean absolute error (MAE) and the root mean square error (RMSE). The experimental results showed that the proposed ASVD outperformed the other SVD models, RSVD and SSVD.

3. PROPOSED APPROACH
The main stages of the enhanced collaborative filtering-based approach for recommender systems are shown in Figure 1. The proposed approach consists of two steps: memory-based and model-based CF.

Fig 1: An Enhanced Collaborative Filtering-based Approach for Recommender Systems

3.1 Memory-based Collaborative Filtering
Memory-based methods are also referred to as neighborhood-based CF. The ratings of user-item combinations are predicted based on their neighborhoods. These neighborhoods can be defined in one of two ways: user-based collaborative filtering and item-based collaborative filtering. User-based collaborative filtering calculates the similarity between users by comparing their ratings on the same item, then computes the predicted rating for an item. This method was initially quite popular, but it is not easily scalable and is sometimes inaccurate [2]. The advantages of memory-based techniques are that they are simple to implement and that the resulting recommendations are often easy to explain. On the other hand, memory-based algorithms do not work very well with sparse rating matrices [2].

The K-nearest neighbors (KNN) item-based CF finds the similarity between items and selects the k most similar items; the corresponding similarities are determined using the cosine similarity measure. The prediction of the unknown rating is computed from the item-item similarities, and the top items are returned as recommendations. Figure 2 shows the collaborative filtering process.

In the first step, the similarity between two items is measured by computing the cosine of the angle between their rating vectors in the m × n rating matrix, where m is the number of users and n is the number of items. The similarity between items i and j is given in [3]. Then the similar neighbor items are
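The KNN item-based procedure of Section 3.1 (cosine similarity between item rating vectors, then a prediction from the k most similar items) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the toy matrix are hypothetical, unrated entries are stored as 0, and the prediction is the common similarity-weighted average of the user's own ratings, which the text above does not spell out.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two item rating vectors (0 means unrated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def item_columns(ratings):
    """Transpose the m x n user-item matrix into n item vectors of length m."""
    return [list(col) for col in zip(*ratings)]

def predict_rating(ratings, user, item, k=2):
    """Predict ratings[user][item] from the k most similar items the user rated.

    Assumption: similarity-weighted average of the user's ratings on the
    neighbour items; other prediction formulas are possible.
    """
    columns = item_columns(ratings)
    target = columns[item]
    neighbours = []
    for j, col in enumerate(columns):
        if j != item and ratings[user][j] > 0:
            neighbours.append((cosine_similarity(target, col), ratings[user][j]))
    # Keep the k neighbours with the highest similarity to the target item.
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbours[:k]
    weight = sum(abs(sim) for sim, _ in top)
    if weight == 0:
        return 0.0
    return sum(sim * rating for sim, rating in top) / weight

# Hypothetical 4 users x 4 items matrix, ratings 1-5, 0 = not rated.
toy_ratings = [
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
]
prediction = predict_rating(toy_ratings, user=0, item=2, k=2)
```

In the toy matrix, item 2 is most similar to item 3, which user 0 rated low, so the predicted rating for user 0 on item 2 is pulled toward the low end of the scale.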
Despite the success of the item-based CF technique, it has some problems such as sparsity and scalability [19]. To solve these problems, we use a model-based approach via matrix factorization techniques, which deal with these problems successfully and efficiently.

3.2 Model-based Collaborative Filtering
Model-based CF requires a learning phase in advance to learn a model that improves the performance of collaborative filtering. It includes techniques such as clustering, classification, latent models, Markov decision processes (MDP), and matrix factorization.

Matrix factorization is among the most successful latent factor models. It has become popular recently by combining good scalability with predictive accuracy. It maps both users and items to a joint latent factor space of dimensionality f. User-item interactions are modeled as inner products in that space [20]. There are various matrix factorization models: Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Probabilistic Matrix Factorization (PMF), and Non-Negative Matrix Factorization (NMF).

We use SVD as it is one of the most common and successful matrix factorization techniques used in collaborative filtering.

Singular Value Decomposition (SVD) is a powerful technique for dimensionality reduction. It is a specific implementation of the MF approach and is also related to PCA. The main issue in SVD is to find a lower-dimensional feature space [20]. The SVD of an m × n matrix A is of the form:

$\mathrm{SVD}(A) = U \Sigma V^{T}$

where U and V are m × m and n × n orthogonal matrices, respectively. Σ is an m × n singular orthogonal matrix with

$\mathrm{SVD}(A_k) = U_k \Sigma_k V_k^{T}$

where $U_k$ and $V_k$ are m × k and n × k matrices composed of the first k columns of U and the first k columns of V, respectively, and $\Sigma_k$ is the k × k principal diagonal sub-matrix of Σ. $A_k$ represents the closest linear approximation of the original matrix A with reduced rank k.

4. EXPERIMENTAL WORK
In the experimental work, we used the Large-scale Arabic Book Review (LABR) dataset. It has over 63K book reviews, each with a rating of 1 to 5 stars [22]. Table 1 describes the dataset used for testing the proposed approach. Figure 3 shows the number of books for each rating.

Table 1: Dataset used in proposed approach evaluation

Number of ratings                      63257
Number of unique book id's             2131
Number of unique users                 16486
Number of unique reviews               60152
Average number of ratings per user     3.65
Average number of ratings per book     28.23
Average number of reviews per user     3.65
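Returning to the rank-k truncation of Section 3.2, the block below spells out how a single predicted rating can be read off $A_k$ as an inner product in the k-dimensional latent space. The symbols $\hat{r}_{ui}$, $\sigma_\ell$, $u_{u\ell}$ and $v_{i\ell}$ are notation assumed here for illustration and do not appear in the original text. The optimality statement is the standard Eckart-Young result for the Frobenius norm, not something derived in this paper.

```latex
% Entry (u, i) of the rank-k approximation: user u's latent vector (row u of U_k)
% and item i's latent vector (row i of V_k), weighted by the singular values.
\hat{r}_{ui} = \left( U_k \Sigma_k V_k^{T} \right)_{ui}
             = \sum_{\ell=1}^{k} \sigma_\ell \, u_{u\ell} \, v_{i\ell}

% "Closest linear approximation" made precise in the Frobenius norm:
\lVert A - A_k \rVert_F
    = \min_{\operatorname{rank}(B) \le k} \lVert A - B \rVert_F
    = \sqrt{ \sum_{\ell > k} \sigma_\ell^{2} }
```

Under this reading, a smaller k discards more of the trailing singular values, which trades reconstruction error for a more compact latent space.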
5. RESULTS
This section outlines the experimental results by presenting the
obtained MAE and RMSE values using the cross-validation
technique. Three experiments are carried out. In the first
experiment, we evaluated KNN Item-based CF. In the second
one, SVD-based CF is evaluated. In the third experiment,
performance comparisons with different methods are
performed.
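The evaluation protocol described above (MAE and RMSE averaged over cross-validation folds) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and sample ratings are hypothetical, five folds are assumed, and a global-mean predictor stands in for the KNN item-based and SVD-based models so that the example stays self-contained.

```python
import math

def mae(actual, predicted):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error between true and predicted ratings."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def k_fold_indices(n_samples, n_folds=5):
    """Yield (train, test) index lists; fold f takes every n_folds-th sample."""
    indices = list(range(n_samples))
    for f in range(n_folds):
        test = indices[f::n_folds]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

def cross_validate(ratings, n_folds=5):
    """Return the mean (MAE, RMSE) over folds for a global-mean baseline.

    Assumption: the baseline predicts the training-fold mean for every
    held-out rating; a real experiment would plug in the CF model here.
    """
    fold_scores = []
    for train, test in k_fold_indices(len(ratings), n_folds):
        mean_rating = sum(ratings[i] for i in train) / len(train)
        actual = [ratings[i] for i in test]
        predicted = [mean_rating] * len(test)
        fold_scores.append((mae(actual, predicted), rmse(actual, predicted)))
    mean_mae = sum(score[0] for score in fold_scores) / n_folds
    mean_rmse = sum(score[1] for score in fold_scores) / n_folds
    return mean_mae, mean_rmse

# Hypothetical ratings on the 1-5 scale.
sample_ratings = [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]
mean_mae, mean_rmse = cross_validate(sample_ratings, n_folds=5)
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions and penalizes large individual errors more heavily.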
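[Figure: per-fold results for Fold 1 to Fold 5 and their mean; plot not recoverable.]
[Table fragment, MovieLens 100k: approach proposed in [13], 0.9762; the proposed approach, 0.9365; the remaining column is marked "-" for both.]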
IJCATM : www.ijcaonline.org