Deep Learning Based Recommendation Systems
Winter 2018
Recommended Citation
Pinnapareddy, Nishanth Reddy, "Deep Learning based Recommendation Systems" (2018). Master's Projects. 644.
http://scholarworks.sjsu.edu/etd_projects/644
Deep Learning based Recommendation Systems

A Project
Presented to
San Jose State University

In Partial Fulfillment
of the Requirements for the Degree
Master of Science

by
Nishanth Reddy Pinnapareddy
May 2018

© 2018
Nishanth Reddy Pinnapareddy
ABSTRACT

Recommendation systems help users filter out relevant content from a large pool of available content, and they play a vital role in today's internet applications. Collaborative Filtering (CF) is one of the most popular techniques used to design recommendation systems. This technique recommends new content to users based on the preferences that the user and similar users share. However, there are some shortcomings to current CF techniques.

In recent years, deep learning has achieved great success in natural language processing, computer vision, and speech recognition. However, the use of deep learning in recommendation systems has so far been limited. Although some recent work has employed deep learning for recommendation, it has mostly used deep learning to model auxiliary information, such as textual descriptions of items and acoustic features of audio. Moreover, these models ignore the important factor of collaborative filtering, that is, the user-item interaction function, and still employ matrix factorization, using an inner product on the latent features of users and items. In this project, we use a neural network architecture which learns the user-item interaction function from data. To handle any non-linearities in this function, we use a multi-layer neural network. Empirical evidence shows that deep learning based recommendation models have better performance.
ACKNOWLEDGMENTS
Her unwavering enthusiasm for the study of social networks kept me engaged with the project, and she guided me in the right direction towards its completion. I would like to thank her for her support and guidance.

My deep gratitude also goes to Prof. Sami Khuri and my co-workers. I thank them for their time and effort. Lastly, I would like to thank my friends and family. They supported me through the stress of this project and did not let me give up.
TABLE OF CONTENTS
CHAPTER
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
3 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 22
6.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
6.1.1 MovieLens . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
6.1.2 Pinterest . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
6.3.2 User-KNN . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
LIST OF REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
LIST OF TABLES
1 Characteristics of Datasets . . . . . . . . . . . . . . . . . . . . . . 22
2 System Configuration . . . . . . . . . . . . . . . . . . . . . . . . . 25
3 Parameter variations . . . . . . . . . . . . . . . . . . . . . . . . . 25
LIST OF FIGURES
2 Representation of neuron . . . . . . . . . . . . . . . . . . . . . . . 7
CHAPTER 1
Introduction
Recommender systems are intelligent systems which exploit users' past ratings on items to recommend similar items to other users. These systems play a crucial role in on-line businesses by pro-actively narrowing the navigation items for users based on their preferences. The main problem solved by recommender systems is information overload. A good recommender system has a positive impact on revenue, and the personalized content provided by a recommender system improves the experience of the user.
Collaborative filtering (CF) [1], Content-based filtering (CB), and Hybrid Methods are the popular recommendation techniques in the recommender system domain. These techniques use different criteria to suggest items tailored for users. For example, CF-based methods make use of the history of user ratings on products, whereas hybrid methods combine both collaborative and content-based filtering. Unlike content-based methods, which collect user profile information, collaborative filtering based methods rely only on past user-item interactions. Matrix Factorization is the most popular of all CF based methods. This method relies on user-item latent vectors.
After the Netflix Prize [3] competition, Matrix Factorization became the default choice for collaborative filtering, and a lot of research work was put in to enhance Matrix Factorization, essentially to integrate MF along with neighbor-based models [4], merge MF with topic models of the item description [5], and extend it to factorization machines [6] for a general way of modeling features. Though MF is very effective for collaborative filtering, it is also a known fact that its capabilities can be negatively affected by its simple choice of interaction function, the inner product. For the scenario of rating prediction on explicit feedback (EF), it is well known that incorporating user and item bias terms into the interaction function improves the performance of the MF model; that is, latent feature interactions between items and users can be modeled more effectively by just a minor tweak to the inner product. However, combining latent features linearly is not sufficient for capturing the complex structure of user interaction data.
Many previous works rely on handcrafted interaction functions; instead, we explore learning the interaction function from data by using deep neural networks [7]. Several domains, like speech recognition, text processing, and computer vision [8], are prominent areas where deep neural networks (DNNs) have already proven their effectiveness. There is plenty of literature available on MF models, but considerably less work has been done on recommendation by applying DNNs. Some recent advancements [10] have used DNNs for recommendations and have reported great results. However, DNNs have been mostly used to model auxiliary information, such as textual descriptions of items and audio features of music. To model the effect of collaborative filtering, MF is still used, calculating the inner product of the user and item latent features.
In this project, we address the drawbacks of Matrix Factorization by replacing the inner product with a neural network architecture. Rather than using explicit ratings, we focus on implicit ratings, which inherently show user preference through behaviors like clicking or buying items and watching videos, and which can be tracked automatically. Thus, it is easy for content providers to collect such data. However, there is one problem with this feedback: we cannot clearly differentiate between positive feedback and negative feedback. To reduce the effect of this ambiguity, we utilize deep neural networks to model the recommendation task. The main contribution of our work is that we learn the user-item interaction function from implicit feedback data using deep neural networks.
CHAPTER 2
Background
A recommender system is an information filtering system that seeks to predict the "rating" or "preference" a user would give to an item. The popular techniques fall into the following categories:

∙ Collaborative filtering techniques rely on users' past behavior, such as ratings and buying patterns.

∙ Content based filtering techniques rely on user activity attributes, like keywords used during search, or their profiles.
In the following sections, we will cover each of these techniques and their limitations.

Content based filtering techniques work well for new items in the system, since they find similar items based on the descriptive attributes of items that the active user has rated. However, these techniques restrict models to predicting particular types of items, as they rely on keywords and the content of items. They also don't work well when making predictions for new users.
Models based on collaborative filtering rely on the collaborative power of ratings provided by users. The key challenge in building these models is that the ratings matrices are sparse. Collaborative filtering methods predict unspecified ratings based on the observed ratings, since ratings are often highly correlated across various users and items. Memory based models are among the earliest collaborative filtering algorithms. Here, the predictions are based on neighborhoods, which can be defined in two ways:
∙ User-based collaborative filtering: This filtering technique gives recommendations based on similar users. For example, if you want to provide suggestions to user A, you determine all the users who are similar to user A and predict the missing ratings of A from the ratings of those similar users.

∙ Item-based collaborative filtering: To predict the rating of item B by user A, we need to calculate the set of items which are similar to item B. The ratings in this set provided by A help us determine if user A will like item B.

Memory based models are simple to implement and easy to understand. However, they don't perform well with sparse rating matrices; that is, they lack full coverage of rating predictions over all user-item pairs.
Model based methods use predictive data mining and machine learning techniques to build a model of the rating data; decision trees and latent factor models are some examples. These models have a high level of prediction coverage even for sparse ratings. However, these methods tend to be heuristic.

Hybrid methods work best when we have a diverse set of input categories that can be used to achieve the best results. They use the power of many machine learning models together.
Deep learning is a subset of the Machine Learning family which learns data representations, as opposed to task-specific algorithms. Deep learning models use a cascade of multi-layered non-linear processing units, called neurons, which can perform feature extraction and transformation. These models are inspired by the way biological neural networks in the human brain process information. The smallest unit of such a network is a neuron, which receives inputs from other neurons and computes an output. Each input to the node has an associated weight (w), which signifies its relative importance to other inputs. The node applies a non-linear activation function to the weighted sum of its inputs; this activation function is useful to learn complex patterns in data.
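As a minimal sketch of the neuron just described, the node computes a weighted sum of its inputs plus a bias and passes it through a non-linear activation (sigmoid here; the input and weight values are illustrative):

```python
import math

def neuron(inputs, weights, bias):
    """Compute a neuron's output: a non-linear activation
    applied to the weighted sum of its inputs."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

# Illustrative values: two inputs, the first weighted more heavily.
out = neuron([1.0, 0.5], [0.8, -0.2], bias=0.1)
```

The weight vector is what training adjusts; the activation is what lets stacked neurons model non-linear patterns.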
A neural network contains multiple neurons arranged in layers. Neurons from adjacent layers have connections between them, and each of these connections has a weight associated with it. There are three types of nodes in a network.

∙ Input Nodes - These nodes provide information from external sources to the network, and together they are referred to as the "Input Layer". These nodes don't perform any computation; they simply pass information on to the hidden nodes.

∙ Hidden Nodes - These nodes do not have any connection with the external world. They perform computations and transfer information from the input nodes to the output nodes.

∙ Output Nodes - These nodes transfer information from the network to the external world.

A feedforward network has a single input layer and a single output layer, and the information moves in only one direction - forward - from the input nodes, through the hidden nodes, and finally to the output nodes. It does not contain any cycles or loops. A Single Layer Perceptron is a type of feedforward neural network that does not contain any hidden layer. A Multi Layer Perceptron has one or more hidden layers.
CHAPTER 3
Problem Definition
In this chapter, we introduce the problem and then discuss existing collaborative filtering techniques based on implicit data. We then discuss the most popularly used Matrix Factorization method and its limitation due to the inner product of user and item latent vectors.

Let M and N represent the number of users and items, respectively. We denote the user-item interaction matrix by Y, with entries y_ui. If the value of y_ui is 1, then there is an interaction between the user and the item; otherwise there is no interaction. Since these interactions do not specify whether the user actually likes the item or not, there is a possibility of noisy signals. The recommendation problem is formulated as learning ŷ_ui = f(u, i | Θ), where ŷ_ui represents the predicted score for y_ui, Θ represents the model parameters, and f is the function that maps the model parameters to the calculated score.
Existing techniques optimize an objective function to calculate the model parameters Θ. These techniques commonly use two types of objective functions - pointwise loss [12] and pairwise loss. Most models use a pointwise loss objective function and learn by following a regression framework, minimizing the error between ŷ_ui and y_ui. These models handle negative feedback either by sampling negative entries from the unobserved entries [13] or by treating all unobserved entries as negative feedback. Models based on a pairwise loss function assume that observed entries should be ranked higher than unobserved entries. These models increase the difference between an observed entry ŷ_ui and an unobserved entry ŷ_uj. Our proposed model based on neural networks uses pointwise learning, but it can also be extended to pairwise learning.
Matrix Factorization (MF) is one of the most popular collaborative filtering techniques used in industry for recommender systems. It associates each user and each item with a real-valued vector of latent features. Let p_u and q_i represent the latent vectors of user u and item i, respectively. MF computes ŷ_ui as the dot product of p_u and q_i:

ŷ_ui = f(u, i | p_u, q_i) = p_u^T q_i = Σ_{k=1}^{K} p_uk q_ik        (2)

where K is the dimension of the latent space. MF models the two-way interaction of user and item latent factors, assuming each dimension of the latent space is independent of the others and linearly combining them with the same weight. Therefore, MF can be considered a linear model of latent factors.
To understand the following illustration, two settings need to be stated clearly beforehand. First, since users and items are mapped to the same latent space, the dot product (or the cosine of the angle between latent vectors) gives us the similarity between two users. Second, without loss of generality, the Jaccard coefficient [14] is used as the ground-truth similarity between two users that MF needs to recover.

Figure 3: An example explaining MF's limitation [14]. From the user-item matrix (a), u4 is most similar to u1, followed by u3, and lastly u2. However, in the user latent space (b), placing p4 closest to p1 makes p4 closer to p2 than to p3, resulting in a greater ranking loss.
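For reference, the Jaccard coefficient of two users is the size of the intersection of their interacted-item sets divided by the size of the union. A small sketch, with illustrative interaction vectors (not the exact rows of Figure 3):

```python
def jaccard(a, b):
    """Jaccard coefficient of two binary interaction vectors:
    |intersection| / |union| of the items each user interacted with."""
    items_a = {i for i, v in enumerate(a) if v}
    items_b = {i for i, v in enumerate(b) if v}
    if not (items_a | items_b):
        return 0.0
    return len(items_a & items_b) / len(items_a | items_b)

# Illustrative users: they share one item out of three distinct items.
sim = jaccard([1, 1, 0, 0], [0, 1, 1, 0])  # intersection {1}, union {0, 1, 2}
```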
From the first three rows of the user-item matrix in Fig. 3a, the similarity scores are s23 (0.66) > s12 (0.5) > s13 (0.4). As such, the geometric relations of p1, p2, and p3 in the latent space can be plotted as in Figure 3b. Now, let us consider a new user u4, whose input is represented by the dashed line in Fig. 3a. We have s41 (0.6) > s43 (0.4) > s42 (0.2), meaning that u4 is most similar to u1, followed by u3, and lastly u2. However, if p4 is placed closest to p1 by this model, then p4 ends up closer to p2 than to p3, which unfortunately results in a greater ranking loss.
From this illustration, we can see the negative impact that the simple, fixed inner product has on model performance. Our models address this drawback by learning the user-item interaction function using deep neural networks, which is covered in later sections.
CHAPTER 4
Related Work
Past models relied on explicit feedback as the primary data source for recommendation tasks [15], but attention is slowly moving towards implicit data. The task here is item recommendation, which focuses on recommending a ranked item list to users. The problem of rating prediction is broadly solved by work on explicit feedback (EF), but it is more practical to solve the problem of item recommendation based on implicit feedback (IF). For item recommendation based on implicit feedback, recent works proposed two strategies for handling missing data with uniform weighting: treating all missing data as negative instances, or sampling negative instances from the missing data. Dedicated models for weighting the missing data have been proposed by He et al. [2] and Liang et al. [16]. For models based on feature-based implicit data, an implicit coordinate descent (iCD) method [17] achieved the state-of-the-art performance for item recommendation. Neural network usage for recommendation is discussed in depth below.
The work done by Salakhutdinov et al. [15] involves a two-layer Restricted Boltzmann Machine for modeling users' explicit ratings on items. This work was later extended to model the ordinal nature of ratings [ref]. In recent times, a popular choice for building recommendation systems is the autoencoder. User-based AutoRec [18] learns hidden structures that can reconstruct a user's ratings given their historical ratings as inputs. Rather than personalizing the user data, this approach shares a similarity with the item-item model [ref], where a user is represented by their rated items. To keep autoencoders from learning the identity function and failing to generalize to unseen data, denoising autoencoders (DAEs) have been introduced, which learn from intentionally corrupted inputs [ref]. A neural autoregressive method for collaborative filtering (CF) has been recently proposed by Zheng et al. [19]. These previous efforts provide strong support for the success of neural networks (NNs) in addressing the collaborative filtering problem. However, their focus was on explicit ratings, modeling only the observed data. Accordingly, they can fail to learn users' preferences from implicit data, which is positive-only.
While some recent works [20] have analyzed recommendation based on implicit feedback (IF) using deep learning models, they have mainly used deep neural networks (DNNs) to model additional information, like text descriptions of items, acoustic properties of music, behavior of users across multiple domains, and rich content in knowledge bases. The features derived from the deep neural networks are then combined with Matrix Factorization for collaborative filtering. The work most similar to ours is [21], which applies a collaborative denoising autoencoder (CDAE) to collaborative filtering with implicit feedback (IF). Contrary to the plain denoising autoencoder based approach, CDAE resembles the SVD++ model [20] in the way the hidden structures of the collaborative denoising autoencoder are activated. Though CDAE is a neural modeling method for collaborative filtering, it still applies the inner product to model the user and item interactions (UII). This explains why using deeper layers in the collaborative denoising autoencoder does not enhance its performance (cf. Section 6 of [21]). Noticing this typical behavior of the collaborative denoising autoencoder, NCF portrays a two-way architecture where the user and item interactions are modeled with a multi-layer network, learning an arbitrary interaction function from the data, which is more expressive and self-explanatory.
Similarly, learning the relationship between two objects has been studied extensively in previous work on knowledge base graphs [22], and a lot of development has taken place in relational machine learning models [13]. Another method, the Neural Tensor Network (NTN), has shown robust performance, as it uses neural networks to model the relationship between objects; this resembles our proposal but targets a different aspect than collaborative filtering. Our Neural MF model appears similar to NTN, but it is more dynamic and general.

Recently, Google published the deep neural network models they are using for product recommendations [23]. These models used a Multi-Layer Perceptron architecture, showed promising results, and made the model generic. Although their work targets a broad range of features, ours is aimed at analyzing deep neural networks for pure CF based recommender systems. In this project, we explored the use of deep neural networks to model complex user-item interactions.
CHAPTER 5
The Model
In this chapter, we first present a general framework that learns the user-item interaction function using neural networks, with a probabilistic model which emphasizes implicit feedback data. We then express matrix factorization (MF) [11] as a neural network model. To explore deep neural networks for collaborative filtering, a multi-layer perceptron (MLP) [11] model is used to learn the user-item interaction function. Finally, we present our neural network matrix factorization model, which is a fusion of the MF and MLP models. This model gets the strengths of the linearity of MF and the non-linearity of the MLP.
The general framework is a multi-layer architecture, as shown in Figure 4, where the output of one layer serves as the input to the next layer. The first input layer has two input vectors, v_u^U and v_i^I, that represent user u and item i. These are sparse binary vectors with one-hot encoding. After the input layer, there is an embedding layer. This layer is fully connected and projects the sparse vector to a dense vector; the obtained dense embedding can be seen as the latent vector for the user/item in the context of a latent factor model. These embeddings are then fed into a multi-layer neural architecture to map the latent vectors to prediction scores. We can also customize each hidden layer to discover new latent structures from user-item interactions. The final layer gives the predicted score ŷ_ui, and the dimension of the last hidden layer determines the model's capability. We perform training by minimizing the pointwise loss between ŷ_ui and its actual value y_ui.
Figure 4: Generalized Neural Network Framework [14]
The NCF predictive model can be formulated as

ŷ_ui = f(P^T v_u^U, Q^T v_i^I | P, Q, Θ_f)        (3)

where P ∈ ℝ^{M×K} and Q ∈ ℝ^{N×K} denote the latent factor matrices for users and items, and Θ_f represents the model parameters of the interaction function. Since f is defined as a multi-layer neural network, it can be formulated as

f(P^T v_u^U, Q^T v_i^I | P, Q, Θ_f) = φ_out(φ_X(...φ_2(φ_1(P^T v_u^U, Q^T v_i^I))...))        (4)

where φ_out and φ_X represent the mapping functions for the output layer and the X-th neural collaborative filtering layer, respectively.
5.1.1 Learning Model Parameters
Existing pointwise methods largely perform a regression with the squared loss between predictions and observations. While the squared loss works well on data drawn from a Gaussian distribution [24], it fails to perform well on binary data {0, 1}. So, to learn the model parameters, we define the likelihood of the observed interactions Y and the sampled negative instances Y⁻ as

p(Y, Y⁻ | P, Q, Θ_f) = ∏_{(u,i)∈Y} ŷ_ui · ∏_{(u,j)∈Y⁻} (1 − ŷ_uj)        (6)

Taking the negative logarithm of the likelihood gives

L = − Σ_{(u,i)∈Y} log ŷ_ui − Σ_{(u,j)∈Y⁻} log(1 − ŷ_uj)        (7)

Equation 7 is known as the binary cross-entropy loss, or log loss. We used this as our objective function and minimized it with stochastic gradient descent (SGD).
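A minimal sketch of this log loss over predicted scores (the score values below are illustrative, not model outputs):

```python
import math

def log_loss(pos_scores, neg_scores):
    """Binary cross-entropy over observed interactions (label 1)
    and sampled negative instances (label 0), as in Equation 7."""
    loss = -sum(math.log(p) for p in pos_scores)        # observed entries
    loss -= sum(math.log(1.0 - p) for p in neg_scores)  # negative samples
    return loss

# Illustrative predicted scores in (0, 1).
loss = log_loss(pos_scores=[0.9, 0.8], neg_scores=[0.2, 0.1])
```

The loss shrinks as observed interactions score near 1 and sampled negatives score near 0, which is exactly what minimizing Equation 7 pushes the model towards.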
We refer to the above framework as neural collaborative filtering (NCF). By expressing matrix factorization under NCF, we can cover a large family of factorization methods.

The input to this model is the one-hot encoding of the user/item vectors, and the following embedding layer's output can be viewed as the latent vector of the user/item. Let us denote
the user latent vector as p_u and the item latent vector as q_i, respectively. We define the first neural collaborative filtering layer's mapping function as

φ_1(p_u, q_i) = p_u ⊙ q_i

where ⊙ denotes the element-wise product of vectors. We then project the vector to the output layer as

ŷ_ui = a_out(h^T (p_u ⊙ q_i))

where a_out and h represent the activation function and the edge weights of the output layer, respectively. We call this model Generalized Matrix Factorization (GMF): with an identity activation and a uniform vector of ones for h, it exactly recovers the MF model. Our implementation uses the sigmoid function as the activation function and learns the model parameters with the log loss objective function.
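A minimal numpy sketch of the GMF computation just described (the embedding values and output weights are illustrative, not trained):

```python
import numpy as np

def gmf_predict(p_u, q_i, h):
    """GMF: element-wise product of user/item latent vectors,
    projected to a score by output weights h, with sigmoid activation."""
    phi = p_u * q_i                   # element-wise product (the ⊙ operator)
    z = h @ phi                       # weighted sum by output-layer weights
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid output in (0, 1)

# Illustrative latent vectors (K = 4); with h = ones this reduces to
# a sigmoid over the plain MF inner product.
p_u = np.array([0.5, -0.1, 0.3, 0.8])
q_i = np.array([0.4, 0.2, -0.6, 0.1])
score = gmf_predict(p_u, q_i, h=np.ones(4))
```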
The NCF framework uses two pathways to model users and items. It is intuitive to concatenate these two pathways [11]. However, simple vector concatenation is not enough to capture the interactions between the user and item latent features. To overcome this issue, we add hidden layers on the concatenated vector, using an MLP to learn the interaction between the user and item latent vectors. We implemented this model with ReLU [25] as the activation function, and to design the neural network architecture we followed a tower pattern, where the bottom layer is the widest and each successive layer has a smaller number of neurons, as shown in Figure 3.
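A sketch of this MLP pathway with the tower pattern (the 64→32→16 layer sizes follow the example used in the experiments chapter; the weights here are random and untrained, so only the shapes are meaningful):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def mlp_forward(p_u, q_i, layer_sizes=(64, 32, 16)):
    """MLP pathway: concatenate user/item latent vectors, then pass
    them through a tower of ReLU layers, each narrower than the last."""
    x = np.concatenate([p_u, q_i])
    for size in layer_sizes:
        w = rng.normal(scale=0.1, size=(size, x.shape[0]))  # untrained weights
        x = relu(w @ x)
    return x  # last hidden layer: the "predictive factors"

# Illustrative 32-dimensional embeddings, matching the 64->32->16 tower.
out = mlp_forward(rng.normal(size=32), rng.normal(size=32))
```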
So far we have seen two neural network based models - GMF, which applies a linear kernel, and the MLP, which uses a non-linear kernel - to learn the interaction function from data. Now, we present a hybrid model that fuses GMF and MLP so they can mutually reinforce each other and learn the complex user-item interactions.

An obvious approach to fuse these models is to have GMF and MLP share the same embedding layer and then combine the outputs of their interaction functions. However, sharing the embeddings of GMF and MLP may limit the performance and flexibility of the fused model. So, we allowed GMF and MLP to learn separate embeddings, and we combine the two models by concatenating their last hidden layers, as shown in Figure 4. This model combines the linearity of MF and the non-linearity of neural networks for modeling user-item interactions.
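Putting the pieces together, a minimal sketch of the fused model's forward pass (each pathway gets its own illustrative, untrained embeddings and weights):

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda x: np.maximum(0.0, x)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def neumf_predict(p_gmf, q_gmf, p_mlp, q_mlp, mlp_sizes=(64, 32, 16)):
    """Fusion: GMF's element-wise product and the MLP's last hidden
    layer are concatenated, then projected to a single score."""
    gmf_vec = p_gmf * q_gmf                   # GMF pathway
    x = np.concatenate([p_mlp, q_mlp])        # MLP pathway input
    for size in mlp_sizes:
        w = rng.normal(scale=0.1, size=(size, x.shape[0]))
        x = relu(w @ x)
    fused = np.concatenate([gmf_vec, x])      # concatenate last hidden layers
    h = rng.normal(scale=0.1, size=fused.shape[0])  # output edge weights
    return sigmoid(h @ fused)

score = neumf_predict(rng.normal(size=16), rng.normal(size=16),
                      rng.normal(size=32), rng.normal(size=32))
```

Keeping the embeddings separate, as discussed above, is what lets each pathway choose its own embedding size.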
CHAPTER 6
Experimental Results
In this chapter, we cover the experiments that aim to answer the following research questions.
Research Question 1 - Do our proposed models outperform the existing state-of-the-art methods?

6.1 Datasets
6.1.1 MovieLens
MovieLens is one of the most widely used datasets for evaluating collaborative filtering algorithms. There are different versions of this dataset available; we used the one which contains 1,000,000 (one million) movie ratings, where every user has given more than 20 ratings. These ratings are explicit; we chose this particular dataset deliberately, to evaluate the learning of implicit feedback from explicit ratings. We converted it to implicit data by transforming each entry to 1 or 0, denoting whether the user has rated the item.
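This binarization step can be sketched in a few lines (the triples below are illustrative):

```python
def to_implicit(ratings):
    """Convert explicit (user, item, rating) triples to implicit
    feedback: every observed interaction becomes a 1; pairs absent
    from the result are implicitly 0."""
    return {(user, item): 1 for user, item, _rating in ratings}

# Illustrative explicit ratings on a 1-5 scale.
explicit = [(0, 10, 4.0), (0, 11, 2.0), (1, 10, 5.0)]
implicit = to_implicit(explicit)
```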
6.1.2 Pinterest
This implicit feedback dataset [26] was originally used for analyzing the performance of content-based image recommendation. The original data is very sparse, and more than 25% of users have only a single pin, which makes it harder to analyze the performance of collaborative filtering techniques. So, we modified the dataset to be similar to the MovieLens dataset by ignoring users who don't have at least 20 pins (interactions). Each interaction represents whether a user has pinned the image to their own board.
We adopted the widely used leave-one-out evaluation strategy. According to this protocol, for each user we leave out the last user-item interaction, which is used for testing; the remaining user-item interactions are used for training. This protocol is widely used by other implicit feedback recommendation models. Next, we need to rank the items for each user. Since it is very time consuming to rank all items for every user, we used a popular strategy [23] that randomly draws items which the user has not interacted with, and then ranks the left-out item within this sampled item list. We used Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) [25] as evaluation metrics to measure the performance. Hit Ratio measures whether the left-out item is present in the top-K ranked list, and Normalized Discounted Cumulative Gain measures the position of the left-out item in the ranked list, assigning higher scores to hits at higher ranks. We performed our experiments on top-10 ranked item lists for all users.
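These two metrics for a single test user can be sketched as follows, matching the definitions above (the ranked list and item ids are illustrative):

```python
import math

def hit_ratio(ranked_items, held_out, k=10):
    """HR@K: 1 if the held-out item appears in the top-K list, else 0."""
    return int(held_out in ranked_items[:k])

def ndcg(ranked_items, held_out, k=10):
    """NDCG@K for leave-one-out: log-discounted by hit position, so a
    hit at rank 1 scores 1.0 and lower-ranked hits score less."""
    for pos, item in enumerate(ranked_items[:k]):
        if item == held_out:
            return math.log(2) / math.log(pos + 2)
    return 0.0

ranked = [42, 7, 13, 99, 5, 1, 8, 77, 21, 64]  # illustrative top-10 list
hr, score = hit_ratio(ranked, held_out=13), ndcg(ranked, held_out=13)
```

Averaging both metrics over all test users gives the numbers reported in the figures.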
We compared our proposed models with the following collaborative filtering techniques. Since our neural network models evaluate user-item interactions, we compared our models with user-item collaborative filtering models rather than item-item models.

6.3.1 Item Popularity
Items are ranked by the number of times they appear in user-item interactions. This is a non-personalized baseline.

6.3.2 User-KNN
This is one of the popular user-based neighborhood collaborative filtering techniques [27]. We adapted this model to learn from implicit user-item interaction data.

6.3.3 BPR
This method optimizes the matrix factorization model presented in Section 3.2 using a pairwise ranking loss to evaluate user-item interactions from implicit feedback. It is one of the best models for item recommendations. We varied the learning rate and then reported the best performance.
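The item-popularity baseline can be sketched in a few lines (the interaction pairs are illustrative):

```python
from collections import Counter

def rank_by_popularity(interactions):
    """Rank items by how many user-item interactions they appear in."""
    counts = Counter(item for _user, item in interactions)
    return [item for item, _count in counts.most_common()]

# Illustrative (user, item) interactions: item 7 appears most often.
ranking = rank_by_popularity([(0, 7), (1, 7), (2, 7), (0, 3), (1, 3), (2, 5)])
```

Every user receives this same ranking, which is why it serves as the non-personalized reference point.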
6.4 System Configuration
All our experiments were performed on a system with the configuration shown in Table 2.

Table 2: System Configuration
Property            Value
Operating System    MacOS High Sierra
Processor           2.8 GHz Intel Core i7
Memory              16 GB 2133 MHz LPDDR3
Disk                512 GB SSD
Keras               version 1.0.8
Theano              version 0.9.0
Our proposed models are parametric; to determine their hyper-parameters, we randomly sampled one user-item interaction for each user and used this random sample as validation data to tune the hyper-parameters. These models learn by optimizing the log loss objective function. The neural networks are trained from scratch and optimized using mini-batch Adam [29]. We performed experiments by varying batch sizes and learning rates as shown in Table 3.

Table 3: Parameter variations
Property        Values
Batch sizes     128, 256, 512, and 1024
Learning rates  0.0001, 0.0005, 0.005, and 0.001
We termed the size of the last hidden layer in a neural network model the number of predictive factors, as it determines the model's capability, and we evaluated factors of [8, 16, 32, 64]. We used three hidden layers for the Multi Layer Perceptron models; for example, if the number of predictive factors is 16, then the architecture of the neural collaborative filtering layers is 64->32->16 and the embedding layer size is 32. For our fused neural network model with pre-training, α was set to 0.5, so that the pre-trained Generalized Matrix Factorization and Multi Layer Perceptron models contribute equally. The following sections present the results that answer our research questions.
Figures 6 and 7 show the performance of our models and the baselines on the MovieLens and Pinterest datasets. These plots use the performance metrics Hit Ratio@10 and Normalized Discounted Cumulative Gain@10 along the y-axes and the size of predictive factors along the x-axes. For the MF based Bayesian Personalized Ranking (BPR) model, the size of predictive factors matches the number of user and item latent factors. In the case of the UserKNN model, we evaluated it with different neighborhood sizes and picked the best performing one. The Item Popularity baseline is included to highlight the benefit of personalized recommendation.
We can see clearly from Figures 6 and 7 that Neural-MF is the winner among all models by a good margin: it has better performance on both datasets, significantly outperforming the state of the art. On the Pinterest dataset, Neural-MF with even a small number of predictive factors outperforms the baselines with a large number of predictive factors (64). This shows the expressiveness of our model, which is obtained by fusing the Generalized Matrix Factorization and Multi Layer Perceptron models. We can also see that the other neural models, Generalized MF and the Multi-Layer Perceptron, have very good performance; among them, the Multi-Layer Perceptron is slightly behind. This demonstrates the effectiveness of neural network models for recommendation problems.

Figure 6: Performance of HitRatio@10 and NDCG@10 on the MovieLens Dataset

Figure 7: Performance of HitRatio@10 and NDCG@10 on the Pinterest Dataset
We also evaluated the performance of top-K recommendations, where the ranking position ranges from 1 to 10. To highlight the power of neural networks, we compared Neural-MF with the non-neural methods rather than with all the neural network based methods. We can see that Neural-MF shows gradual improvement as the ranking position grows, and performs better than the model-based methods. Finally, we can see that the Most Popular Item baseline performs the worst, underlining the importance of personalized recommender systems.

Figure 9: Top-K item recommendation on the Pinterest Dataset
Table 4 shows the comparison between model performance with pre-training and with random initialization. For Neural-MF with random initialization, we used Adam to learn the model parameters. In most cases, Neural-MF with pre-training achieves better performance than the one with random initialization, showing the usefulness of pre-training for the Neural-MF model.
It is important to know whether deep neural networks are really beneficial to the recommendation task. To investigate this, we varied the number of hidden layers in the MLP. The results of these experiments are shown in Tables 5 and 6, where MLP@K indicates the MLP model with K hidden layers.

We varied the number of hidden layers in the MLP model from 0 to 4, with predictive factors of 8, 16, 32, and 64. We can see that increasing the number of layers is beneficial to performance, showing the importance of deep layers in neural collaborative filtering models. For the MLP model with no hidden layers, the performance is far worse than that of the non-personalized item popularity model. This supports our argument that simply concatenating the user and item latent vectors is not enough to capture their interactions; hidden layers on the concatenated vector are needed.
CHAPTER 7
Conclusion and Future Work
In this project, we used deep neural networks to learn the user-item interaction function for collaborative filtering. Our models performed better than existing state-of-the-art models on real-world datasets. Our models are simple and generic, and they can be applied or extended to different types of recommendation tasks. This work complements shallow models for collaborative filtering, opening up a new avenue of research possibilities for deep learning based recommendation.

As future work, we want to use pairwise learners for the Neural Matrix Factorization models and broaden them by using auxiliary information, such as user reviews, knowledge bases, and temporal signals, as an integral part. We also want to research personalization models which target groups of users rather than individuals; these models will be helpful for social group recommendations [30]. Apart from these models, we want to develop neural net recommender systems for multi-media products [31], which are less explored and contain rich visual elements that capture users' interest. To add another dimension to deep neural network based models, we want to explore recurrent neural networks and hashing methods [32] for efficient online recommendation.
LIST OF REFERENCES
[1] J. Wei, “Collaborative filtering and deep learning based recommendation system
for cold start items,” Expert Systems with Applications, vol. 69, 2017.
[2] X. He, H. Zhang, M. Kan, and T. Chua, “Fast matrix factorization for online
recommendation with implicit feedback,” in SIGIR, 2016, pp. 549–558.
[3] "Netflix Prize," Wikipedia, the free encyclopedia, 2006. [Online]. Available: https://en.wikipedia.org/wiki/Netflix_Prize
[5] H. Wang, N. Wang, and D. Yeung, “Collaborative deep learning for recommender
systems.” in KDD, 2015, pp. 1235–1244.
[7] L. Hu, "Your neighbors affect your ratings: On geographical neighborhood influence to rating prediction."
[11] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. Chua, "Neural collaborative filtering," in Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017, pp. 173–182.
[12] Y. Hu, Y. Koren, and C. Volinsky, “Collaborative filtering for implicit feedback
datasets.” in ICDM, 2008, pp. 263–272.
[13] R. Socher, D. Chen, C. Manning, and A. Ng, “Reasoning with neural tensor
networks for knowledge base completion.” in NIPS, 2013, pp. 926–934.
[14] L. He, L. Liao, H. Zhang, H. Nie, X. Hu, and T. Chua, “Discrete collaborative
filtering.” in SIGIR, 2016, pp. 325–334.
[15] R. Salakhutdinov, A. Mnih, and G. Hinton, "Restricted Boltzmann machines for collaborative filtering," in ICML, 2007, pp. 791–798.
[16] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, pp. 359–366, 1989.
[17] I. Bayer, X. He, B. Kanagal, and S. Rendle, “A generic coordinate descent frame-
work for learning from implicit feedback.” in WWW, 2017.
[18] S. Sedhain, A. Menon, S. Sanner, and L. Xie, “Autorec: Autoencoders meet
collaborative filtering.” in WWW, 2015, pp. 111–112.
[19] Y. Zheng, B. Tang, W. Ding, and H. Zhou, “A neural autoregressive approach
to collaborative filtering.” in ICML, 2016, pp. 764–773.
[20] A. Elkahky, Y. Song, and X. He, “A multi-view deep learning approach for cross
domain user modeling in recommendation systems.” in WWW, 2015, pp. 278–
288.
[21] F. Strub and J. Mary, “Collaborative filtering with stacked denoising autoen-
coders and sparse inputs.” in NIPS Workshop on Machine Learning for eCom-
merce, 2015.
[22] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko, “Trans-
lating embeddings for modeling multi-relational data.” in NIPS, 2013, pp. 2787–
2795.
[23] H. Cheng, L. Koc, J. Harmsen, T. Shaked, et al., "Wide & deep learning for recommender systems," in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, 2016, pp. 7–10.
[24] R. Salakhutdinov and A. Mnih, “Probabilistic matrix factorization,” in NIPS,
2008, pp. 1–8.
[25] X. He, T. Chen, M. Kan, and X. Chen, "TriRank: Review-aware explainable recommendation by modeling aspects," in CIKM, 2015, pp. 1661–1670.
[26] X. Geng, H. Zhang, J. Bian, and T. Chua, “Learning image and user features for
recommendation in social networks,” in ICCV, 2015, pp. 4274–4282.
[27] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Item-based collaborative filtering recommendation algorithms," in WWW, 2001, pp. 285–295.
[28] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, "BPR: Bayesian personalized ranking from implicit feedback," in UAI, 2009, pp. 452–461.
[29] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in ICLR,
2014, pp. 1–15.
[30] X. Wang, L. Nie, X. Song, D. Zhang, and T. Chua, “Unifying virtual and physical
worlds: Learning towards local and global consistency,” ACM Transactions on
Information Systems, 2017.
[32] I. Bayer, X. He, B. Kanagal, and S. Rendle, “A generic coordinate descent frame-
work for learning from implicit feedback.” in WWW, 2017.