Performance Analysis of Various Activation Functions Using LSTM Neural Network For Movie Recommendation Systems
ANDRÉ BROGÄRD
PHILIP SONG
Abstract
Recommendation systems have grown in importance and popularity in many different areas. This thesis focuses on recommendation systems for movies. Recurrent neural networks using LSTM blocks have shown some success for movie recommendation systems. Research has indicated that changing the activation functions in LSTM blocks can improve performance, measured as prediction accuracy. In this study we compare four activation functions (the hyperbolic tangent, sigmoid, ELU and SELU functions) used in LSTM blocks and how they impact the prediction accuracy of the neural network. Specifically, they are applied to the block input and the block output of the LSTM blocks. Our results indicate that the hyperbolic tangent, which is the default, and the sigmoid function perform about the same, whereas the ELU and SELU functions perform worse. Further research is needed to identify other activation functions that could improve the prediction accuracy and to improve certain aspects of our methodology.
Sammanfattning
Recommendation systems have grown in importance and popularity in many different areas. This thesis focuses on recommendation systems for movies. Recurrent neural networks with LSTM blocks have shown some success for movie recommendation systems. Previous research has indicated that changing the activation functions results in improved predictions. In this study we compare four different activation functions (hyperbolic tangent, sigmoid, ELU and SELU) applied in LSTM blocks and how they affect the predictions of the neural network. They are applied specifically to the block input and block output of the LSTM blocks. Our results indicate that the hyperbolic tangent function, which is the default choice, and the sigmoid function perform equally well, while ELU and SELU both perform worse. Further research is needed to identify other activation functions and to improve several parts of the methodology.
Contents
1 Introduction
1.1 Problem Statement
1.2 Scope
2 Background
2.1 Artificial Neural Networks
2.2 Multilayer Perceptron ANN
2.3 Recurrent Neural Network
2.4 Long Short-Term Memory
2.4.1 LSTM Architecture
2.4.2 Activation Functions
2.5 Metrics
2.6 Related work
3 Methods
3.1 Dataset
3.2 Implementation
3.3 Evaluation
4 Results
5 Discussion
5.1 Result
5.2 Improvements
6 Conclusions
Bibliography
Chapter 1
Introduction
With more online movie platforms becoming available, people have a lot of
movie content to choose from. According to a study from Ericsson, people
spend up to one hour per day searching for movie content [1]. Seeking to min-
imize this time, movie recommendation systems have been developed using
Artificial Intelligence [2].
Recommendation systems aim to solve the problem of information overload, which prevents users from finding interesting items, by filtering information [3]. One such approach is collaborative filtering (CF), where the interests of similar users are considered [3]. Popular approaches to CF include the use of neural networks, and in [4] it is demonstrated that CF can be reformulated as a sequence prediction problem and solved with recurrent neural networks (RNN).
Long Short-Term Memory (LSTM), an RNN built from LSTM blocks, was designed to address the vanishing gradient problem of standard RNNs and has shown improved performance [5]. LSTM has been applied in several recommendation systems [6] targeted at both entertainment (movies, music, videos) and e-commerce settings and has outperformed state-of-the-art models in many cases.
In [4] an LSTM neural network was applied to the top-N recommendation problem, using the default choice of activation functions and recommending the 10 movies the user would be most interested in seeing next. The rating of a movie was ignored; only the sequence of watched movies was considered. It was observed that extra features such as age, rating or sex did not lead to an increase in accuracy. Both the Movielens and Netflix datasets were used, and LSTM outperformed all baseline models in nearly all metrics.
This study will use the same framework as in [4]. Since there has been success in switching activation functions [7], the study will compare different choices of activation functions in LSTM blocks and their impact on prediction accuracy.
1.2 Scope
The implementation of LSTM is the same as in [4], with small modifications. This study therefore only considers this type of LSTM applied to the top-N recommendation problem. In [4] the number of features is limited to three (user id, movie id and timestamp), and it is further concluded that additional features such as sex or age do not improve the accuracy of the models unless they are all combined. We limit the features identically.
Only the Movielens 1M dataset will be used in this study because of limited computational resources. Additionally, only the hyperbolic tangent, sigmoid, ELU and SELU activation functions will be tested, as they have shown promising results in previous work.
Chapter 2
Background
The forget, input and output gates of each LSTM block are defined by equations 2.1-2.3 respectively. The block input $\tilde{C}_t$, defined in equation 2.4, is a tanh layer which together with the input gate decides what information will be stored in the cell state. The cell state $C_t$ is updated from the old cell state $C_{t-1}$ at time $t$. $W$ and $U$ are weight matrices and $b$ is a bias vector. Finally, the hidden state $h_t$ is the block output at time $t$.
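Equations 2.1-2.4 are not reproduced here; as a reference sketch, a standard LSTM formulation consistent with the notation above is:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \qquad \text{(forget gate)}$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \qquad \text{(input gate)}$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \qquad \text{(output gate)}$$
$$\tilde{C}_t = \tanh(W_C x_t + U_C h_{t-1} + b_C) \qquad \text{(block input)}$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad \text{(cell state update)}$$
$$h_t = o_t \odot \tanh(C_t) \qquad \text{(block output)}$$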
Sigmoid function
The sigmoid function has a range of [0, 1] and is illustrated in figure 2.3. The formula is given by:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
The ELU function is given by $\mathrm{ELU}(x) = x$ for $x > 0$ and $\alpha(e^x - 1)$ for $x \le 0$. In figure 2.5 the $\alpha$ parameter is set to 1, giving it a range of $(-1, \infty)$. The SELU function is defined by:
$$\mathrm{SELU}(x) = \begin{cases} \lambda x & x > 0 \\ \lambda\alpha(e^x - 1) & x \le 0 \end{cases}$$

where
$$\lambda = 1.0507009873554804934193349852946, \qquad \alpha = 1.6732632423543772848170429916717$$
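As an illustration (not taken from the thesis code), the four activation functions compared in this study can be sketched in NumPy as follows:

```python
import numpy as np

# Illustrative sketch of the four activation functions compared in this study.
SELU_LAMBDA = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def selu(x):
    # SELU is an ELU with a fixed alpha, scaled by lambda
    return SELU_LAMBDA * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1.0))
```

Note that SELU is simply the ELU with a particular choice of $\alpha$, scaled by $\lambda$.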
2.5 Metrics
These are the same metrics used in [4] and are thus identically defined. They are used to evaluate different qualities of a recommendation system.
• Sps. The Short-term Prediction Success captures the ability of the method to predict the next item. It is 1 if the next item is present in the recommendations and 0 otherwise.
• Recall. The usual metric for top-N recommendation; it captures the ability of the method to make long-term predictions.
• User coverage. The fraction of users who received at least one correct recommendation. Average recall (and precision) hides the distribution of success among users: a high recall could still mean that many users received no correct recommendations at all.
• Item coverage. The number of distinct items that were correctly recommended. It captures the capacity of the method to make diverse, successful recommendations.
Observe that these metrics are all computed for a recommendation system that always produces ten recommendations for each user; a small sketch of how they could be computed is given below.
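The following is a minimal sketch of how these four metrics could be computed for a system that returns ten recommendations per user. The data structures and names are hypothetical, not taken from the framework of [4].

```python
# Hedged sketch of the four metrics for a recommender that returns
# ten recommendations per user.
def evaluate_metrics(recommendations, next_item, held_out):
    """recommendations: {user: list of 10 recommended item ids}
    next_item:          {user: the single next item in the user's test sequence}
    held_out:           {user: set of all items in the user's test sequence}"""
    n_users = len(recommendations)
    sps_hits = 0
    recall_sum = 0.0
    covered_users = 0
    correctly_recommended = set()

    for user, recs in recommendations.items():
        recs = set(recs)
        hits = recs & held_out[user]
        sps_hits += int(next_item[user] in recs)        # short-term prediction success
        recall_sum += len(hits) / len(held_out[user])   # recall for this user
        covered_users += int(len(hits) > 0)             # user got >= 1 correct item
        correctly_recommended |= hits                   # distinct correct items

    return {
        "sps": sps_hits / n_users,
        "recall": recall_sum / n_users,
        "user_coverage": covered_users / n_users,
        "item_coverage": len(correctly_recommended),
    }
```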
Chapter 3
Methods
3.1 Dataset
The dataset used is Movielens 1M. The dataset contains many possible features that are not considered in the model; only the user id, movie id and timestamp are treated as features. Preprocessing is included in the LSTM implementation by [4].
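As an illustration of the kind of preprocessing involved (the actual preprocessing is performed by the framework of [4]; the file path below is an assumption), the Movielens 1M ratings could be reduced to these three features and turned into per-user movie sequences with pandas:

```python
import pandas as pd

# Hedged sketch: load Movielens 1M ratings (UserID::MovieID::Rating::Timestamp).
ratings = pd.read_csv(
    "ml-1m/ratings.dat", sep="::", engine="python",
    names=["user_id", "movie_id", "rating", "timestamp"],
)

# Keep only the three features used in this study and order each user's
# movies by time to form a watch sequence.
ratings = ratings[["user_id", "movie_id", "timestamp"]]
sequences = (
    ratings.sort_values("timestamp")
           .groupby("user_id")["movie_id"]
           .apply(list)
)
```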
3.2 Implementation
The modifications to the original code by [4] can be found in the authors' fork of the original repository on GitHub: github.com/andrebrogard/sequence-based-recommendations. The only modification made is the option to specify which activation functions to apply to the individual gates of the LSTM blocks when training and testing the model.
The framework's default is the sigmoid function for the input, output and forget gates. In our tests, we compare four different activation functions applied identically to the block input and block output, namely the hyperbolic tangent, sigmoid, ELU and SELU functions.
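As an illustration of the distinction between the block input/output activation and the gate activations (the thesis uses the Theano-based framework of [4], not Keras, so the snippet below is only a hedged sketch of the same idea):

```python
import tensorflow as tf

# In a Keras LSTM layer, `activation` corresponds to the block input and
# block output activation varied in this study, while `recurrent_activation`
# is the function applied to the input, forget and output gates, left here
# at its sigmoid default.
def build_model(n_items, embedding_dim=100, units=20, block_activation="tanh"):
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(n_items, embedding_dim),
        tf.keras.layers.LSTM(units,
                             activation=block_activation,      # tanh, sigmoid, elu or selu
                             recurrent_activation="sigmoid"),  # gates kept at the default
        tf.keras.layers.Dense(n_items, activation="softmax"),  # score over all items
    ])

# Example: a model whose block input/output use SELU (item count is a placeholder).
model = build_model(n_items=4000, block_activation="selu")
```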
3.3 Evaluation
Metrics
The metrics used are identical to those of [4] and capture the same properties, in order to make the results comparable. They are all calculated in the context where the recommendation system makes ten recommendations. See section 2.5 for their definitions.
Number of tests
Training will be conducted 15 times on the dataset for each activation function, in order to capture the variance and obtain a fair comparison. The models are then evaluated according to the metrics above.
Chapter 4
Results
Figures 4.1-4.4 show the mean sps, recall, user coverage and item coverage respectively across intermediate epochs from 1 to 102. All results are evaluated on the test data using models saved at each intermediate epoch. Each activation function was, as described, used to train a model 15 times, and the mean of all metrics has been computed over these runs. Table 4.1 shows the mean and the standard deviation of the results over the 15 models.
Both ELU and SELU perform worse than the sigmoid and hyperbolic tangent functions across all metrics. Additionally, ELU always performs worse than SELU. The hyperbolic tangent and sigmoid functions are similar in their performance, with a slight advantage for the hyperbolic tangent only in the recall metric.
An observation shared between most activation functions and metrics is that the models do not seem to improve significantly beyond around 20 epochs. In the recall and sps metrics all activation functions instead decrease. The SELU function decreases in all metrics after around 50 epochs, whereas the ELU function decreases after around 20 epochs.
Figure 4.1: The mean sps across intermediate epochs. Evaluated on the test
data.
Figure 4.2: The mean recall across intermediate epochs. Evaluated on the test
data.
Figure 4.3: The mean user coverage across intermediate epochs. Evaluated on
the test data.
Figure 4.4: The mean item coverage across intermediate epochs. Evaluated
on the test data.
Chapter 5
Discussion
5.1 Result
The ELU and SELU functions seem to have had a negative impact on the models, as they did not achieve the same accuracy as the hyperbolic tangent and sigmoid functions. Both functions were less accurate for short-term as well as long-term recommendations, fewer users received a correct recommendation, and fewer items were ever recommended. Interestingly, the sigmoid and hyperbolic tangent functions displayed no significant difference in any metric, and the SELU function achieved the highest mean sps value of all activation functions at around 50 epochs, before it started decreasing.
The ELU function displayed the lowest mean and the highest standard deviation in nearly all metrics. This further indicates that ELU was not a good choice of activation function. Moreover, SELU had a lower mean but a similar standard deviation to the sigmoid and hyperbolic tangent functions. We believe this is a promising property of the SELU function, as it appears to be as stable as the sigmoid and hyperbolic tangent functions.
The sigmoid function yields better results in sps and item coverage than the hyperbolic tangent. Additionally, the standard deviation is slightly lower for the sigmoid function in those two metrics. Thus, according to our results, the sigmoid function could be a substitute for the default function.
The metrics associated with the hyperbolic tangent function should be comparable with the results of [4], because the same framework is used and similar tests were performed. For a layer size of 20 neurons, as used here, they reported better results; their mean sps for the hyperbolic tangent function on the same dataset was well over 30% at around 100 epochs. Furthermore, it was not until around 100 epochs that their model stopped improving. Our results show that most activation functions had already attained their maximum sps at around 20 epochs. Had we observed a smoother learning curve, we would have had more convincing results for the SELU and ELU functions.
5.2 Improvements
The choice of neural network parameters may explain the difference in re-
sults compared to [4], especially the learning rate could affect the models. It
could contribute to the fact that our models reach maximum value quicker and
hinders it from achieving similar results. We use the default learning rate pa-
rameters of the framework for RNN, which uses Adam, which might explain
the difference compared to [4]. Furthermore, the layer size, which was 20 neu-
rons in this study, should have been varied as in [4] to better observe possible
differences in learning rate. What neural network parameters to use should be
considered more carefully in future work.
In each LSTM block, the block input and block output activation functions were the only ones changed, while the activation functions in the three gates (input, forget and output gates) were kept as the sigmoid function (the default). In [7], by contrast, 23 activation functions were applied to the three gates. The activation functions that showed the best performance in that study were not tested here because of time constraints. We did not observe a significant advantage for any activation function compared to the default. For future work, more comprehensive experiments evaluating more activation functions should be performed.
The study in [4] uses two datasets: Movielens 1M and Netflix. In this study, only Movielens 1M is used because of time constraints. Therefore, our results could be closely tied to the structure of this specific dataset. In future work, more datasets need to be considered.
The performance of each activation function is evaluated strictly on accuracy using each metric; the temporal aspect was overlooked. Because our tests did not record how long the networks were trained, whether an activation function achieves better accuracy in a shorter time was not evaluated. To better evaluate an activation function, future work should not overlook the temporal aspect.
Chapter 6
Conclusions
Bibliography