
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2020

Performance Analysis of Various Activation Functions Using LSTM Neural Network For Movie Recommendation Systems

ANDRÉ BROGÄRD

PHILIP SONG

KTH ROYAL INSTITUTE OF TECHNOLOGY


SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Degree Project in Computer Science, DD142X


Date: June 8, 2020
Supervisor: Erik Fransén
Examiner: Pawel Herman
School of Electrical Engineering and Computer Science
Swedish title: Prestandaanalys av olika aktiveringsfunktioner i
LSTM neurala nätverk applicerat på rekommendationssystem för
filmer

Abstract
Recommendation systems have grown in importance and popularity in many different areas. This thesis focuses on recommendation systems for movies. Recurrent neural networks using LSTM blocks have shown some success for movie recommendation systems. Research has indicated that changing the activation functions in LSTM blocks can improve performance, measured as prediction accuracy. In this study we compare four different activation functions (hyperbolic tangent, sigmoid, ELU and SELU) used in LSTM blocks and how they affect the prediction accuracy of the neural network. Specifically, they are applied to the block input and the block output of the LSTM blocks. Our results indicate that the hyperbolic tangent, which is the default, and the sigmoid function perform about the same, whereas the ELU and SELU functions perform worse. Further research is needed to identify other activation functions that could improve the prediction accuracy and to improve certain aspects of our methodology.

Sammanfattning
(Swedish abstract) Recommendation systems have grown in importance and popularity in many different areas. This thesis focuses on recommendation systems for movies. Recurrent neural networks with LSTM blocks have shown some success for movie recommendation systems. Previous research has indicated that changing activation functions has resulted in improved prediction. In this study we compare four different activation functions (hyperbolic tangent, sigmoid, ELU and SELU) applied in LSTM blocks and how they affect prediction in the neural network. They are applied specifically to the block input and block output of the LSTM blocks. Our results indicate that the hyperbolic tangent function, which is the default choice, and the sigmoid function perform equally, but ELU and SELU both perform worse. Further research is needed to identify other activation functions and to improve several parts of the methodology.
Contents

1 Introduction
  1.1 Problem Statement
  1.2 Scope

2 Background
  2.1 Artificial Neural Networks
  2.2 Multilayer Perceptron ANN
  2.3 Recurrent Neural Network
  2.4 Long Short-Term Memory
    2.4.1 LSTM Architecture
    2.4.2 Activation Functions
  2.5 Metrics
  2.6 Related work

3 Methods
  3.1 Dataset
  3.2 Implementation
  3.3 Evaluation

4 Results

5 Discussion
  5.1 Result
  5.2 Improvements

6 Conclusions

Bibliography

Chapter 1

Introduction

With more online movie platforms becoming available, people have a lot of
movie content to choose from. According to a study from Ericsson, people
spend up to one hour per day searching for movie content [1]. Seeking to min-
imize this time, movie recommendation systems have been developed using
Artificial Intelligence [2].
Recommendation systems aim to solve the problem of information overload, which prevents users from finding interesting items, by filtering information [3]. One such approach is collaborative filtering (CF), where the interests of similar users are considered [3]. Popular approaches to CF include the use of neural networks, and in [4] it is demonstrated that CF can be converted into a sequence prediction problem with the use of recurrent neural networks (RNN).
Long Short-Term Memory (LSTM), an RNN with LSTM blocks, was designed to solve the long-term dependency problem of RNNs and has shown an improvement in performance [5]. LSTM has been applied in several recommendation systems [6], targeted at both entertainment (movies, music, videos) and e-commerce settings, and has outperformed state-of-the-art models in many cases.
In [4] an LSTM neural network was applied to the top-N recommendation problem, using the default choice of activation functions, recommending 10 movies the user would be interested in seeing next. The rating of a movie was ignored; only the sequence of watched movies was considered. It was observed that extra features such as age, rating or sex did not lead to an increase in accuracy. Both the Movielens and Netflix datasets were used, and LSTM outperformed all baseline models in nearly all metrics.
This study will use the same framework as in [4]. Since there has been success in switching activation functions [7], the study will compare different choices of activation functions in LSTM blocks and their impact on prediction accuracy in the context of movie recommendations.

1.1 Problem Statement


The most important functionality of a movie recommendation system is the ability to predict a user's movie preferences. Therefore, in our project we investigate the performance, measured as accuracy in movie predictions, of LSTM networks with various activation functions applied to the top-N recommendation problem in movie recommendation. To this end we pose the question: How does applying different activation functions to LSTM blocks affect the accuracy of predicting movies for users?

1.2 Scope
The implementation of LSTM is the same as in [4], with small modifications. This study therefore only considers this type of LSTM applied to the top-N recommendation problem. In [4] the number of features is limited to only three (user id, movie id and timestamp), and it is further concluded that additional features such as sex or age do not improve the accuracy of the models unless they are all put together. We limit the features identically.
Only the Movielens 1M dataset will be used in this study because of limited computational resources. Additionally, only the hyperbolic, sigmoid, ELU and SELU activation functions will be tested, because they have shown promising results in previous work.
Chapter 2

Background

2.1 Artificial Neural Networks


Artificial Neural Networks (ANN) are a type of computing system inspired by the biological neural networks in human brains [8]. There are many different types of networks, all characterized by the following components: a set of nodes, in our case artificial neurons, and connections between these nodes called weights. Like the synapses in a biological brain, each connection between nodes can transmit a signal to other nodes. A neuron receives inputs, performs some processing, and produces an output which can be signaled to the neurons connected to it. The weight of each connection determines the strength of one node's influence on another [9]. Figure 2.1 shows how an artificial neuron receives inputs, which are multiplied by weights and passed through a mathematical function, the activation function, which determines the activation of the neuron. Activation functions are discussed more thoroughly in section 2.4.2.

Figure 2.1: An artificial Neuron
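To make the computation in figure 2.1 concrete, the following is a minimal sketch of a single artificial neuron; the weights, bias and input values are illustrative assumptions:

```python
import numpy as np

def neuron(inputs, weights, bias, activation=np.tanh):
    """A single artificial neuron: the inputs are multiplied by their weights,
    summed together with a bias, and passed through an activation function."""
    return activation(np.dot(weights, inputs) + bias)

# Illustrative values: three inputs and their connection weights.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3])
print(neuron(x, w, bias=0.2))  # the neuron's output signal
```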


2.2 Multilayer Perceptron ANN


A multilayer perceptron (MLP) is composed of one or more layers of neurons. The number of neurons in the input and output layers depends on the problem, whereas the number of neurons in the hidden layers is arbitrary. The goal of an MLP is to approximate a function f*. For example, a classifier y = f*(x) maps an input x to a category y. MLPs are also called feedforward neural networks because information flows through the function being evaluated from x, through the intermediate computations used to define f, and finally to the output y. There are no feedback connections in which outputs of the model are fed back into it. If an MLP were to include feedback connections, it would be a recurrent neural network (RNN) [10].
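As an illustration of this feedforward flow, here is a minimal sketch of a two-layer MLP classifier; the layer sizes and parameter names are assumptions made for the example:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Feedforward pass: input x -> hidden layer -> output category y = f*(x).
    Information only flows forward; nothing is fed back into the model."""
    hidden = np.tanh(W1 @ x + b1)   # intermediate computation defining f
    logits = W2 @ hidden + b2       # output layer
    return int(np.argmax(logits))   # predicted category y

# Illustrative shapes: 4 input features, 8 hidden neurons, 3 output categories.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)
print(mlp_forward(x, W1, b1, W2, b2))
```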

2.3 Recurrent Neural Network


A weakness of the MLP is that it lacks the ability to learn and efficiently store temporal dependencies [10]. A recurrent neural network is specialized for processing a sequence of values, and it can scale to much longer sequences than networks without sequence-based specialization. Another advantage of an RNN over an MLP is the ability to share parameters across different parts of a model. For example, consider the two sentences "I went to Nepal in 2009" and "In 2009, I went to Nepal". When extracting the year the narrator went to Nepal with a machine learning model, an MLP, which processes sentences of fixed length, would have separate parameters for each input feature, meaning it would need to learn all the rules of the language separately at each position in the sentence. An RNN, in contrast, shares the same weights across several time steps, as sketched below. However, RNNs have a problem with long-term memory, meaning they lack the ability to connect present information to old information in order to establish the correct context [10]. For example, consider trying to predict the last word in the sentence "I grew up in France... I speak fluent French". The most recent information suggests that the word is a language, but to tell which specific language, context from further back in the text about France is needed. The gap between the recent information and the information further back can become very large, and as this gap grows RNNs become unable to use the past information as context for the recent information. Fortunately, the Long Short-Term Memory neural network is explicitly designed to solve the long-term dependency problem [11].
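The parameter sharing mentioned above can be made concrete with a minimal sketch of a plain recurrent network; the shapes and names are illustrative assumptions:

```python
import numpy as np

def rnn_forward(sequence, W_x, W_h, b):
    """A plain RNN applied along a sequence. The same weight matrices W_x and
    W_h are reused at every time step, which is how an RNN shares parameters
    across positions in the sequence."""
    h = np.zeros(W_h.shape[0])
    for x_t in sequence:
        h = np.tanh(W_x @ x_t + W_h @ h + b)  # new hidden state from input and old state
    return h  # final hidden state summarizing the whole sequence

# Illustrative example: a sequence of five 3-dimensional inputs, 6 hidden units.
rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 3))
W_x, W_h, b = rng.normal(size=(6, 3)), rng.normal(size=(6, 6)), np.zeros(6)
print(rnn_forward(seq, W_x, W_h, b))
```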

2.4 Long Short-Term Memory


As discussed in the previous section, RNNs have a problem with long-term memory. Long Short-Term Memory (LSTM), a special kind of RNN, is capable of learning long-term dependencies using LSTM blocks [11]. The network was designed to solve this problem and has shown an improvement in performance [5]. Each LSTM block consists of one or more self-connected memory cells along with input, forget, and output gates. The memory cells are able to store and access information over longer periods of time, which improves performance.

2.4.1 LSTM Architecture


The main concept of the LSTM is the cell state, the round circle labeled "Cell" in figure 2.2. The cell state holds information which flows in and out between each LSTM block. More explicitly, the output of a cell is called the hidden state. In figure 2.2 the hidden state is the output of the cell together with the pointwise operation from the output gate [11]. Through regulating structures called gates, the LSTM has the ability to remove or add information to the cell state and hidden state. The gates consist of a sigmoid neural net layer and a pointwise multiplication operation. The sigmoid layer, the round circle with σ in the figure, outputs numbers between zero and one. These numbers represent how much information will flow through the gate: a zero means nothing flows through, whereas a one means all information flows through. The function determining the output value between zero and one is called the activation function and can be switched in the neural network [11]. The three gates (input, forget and output gates) and the block input and block output activation functions are displayed in the figure. The multiplication sign denotes pointwise multiplication of two vectors. The activation functions are σ and tanh [7].

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{2.1}$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{2.2}$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{2.3}$$
$$\tilde{C}_t = \tanh(W_C x_t + U_C h_{t-1} + b_C) \tag{2.4}$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{2.5}$$
$$h_t = o_t \odot \tanh(C_t) \tag{2.6}$$

The forget, input and output gates of each LSTM block are defined by equations 2.1–2.3 respectively. The block input at time t, defined in equation 2.4, consists of a tanh layer; together with the input gate it decides what information will be stored in the cell state, which is updated from the old cell state at time t (equation 2.5). W and U are weight matrices and b is a bias vector. Finally, the hidden state, equation 2.6, is the block output at time t.

Figure 2.2: Architecture of a single LSTM block, where σ denotes the sigmoidal gates. From [12]
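To connect equations 2.1–2.6 to code, the following is a minimal NumPy sketch of one LSTM block step. The dictionary layout of the parameters is an assumption made for illustration, and the two tanh activations are exposed as arguments because the block input and block output functions are the ones swapped later in this study:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W, U, b,
              block_input_act=np.tanh, block_output_act=np.tanh):
    """One LSTM block step following equations 2.1-2.6. W, U and b are dicts
    holding the weight matrices and bias vectors for the gates 'f', 'i', 'o'
    and the block input 'C'."""
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])              # forget gate (2.1)
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])              # input gate (2.2)
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])              # output gate (2.3)
    C_tilde = block_input_act(W['C'] @ x_t + U['C'] @ h_prev + b['C'])  # block input (2.4)
    C_t = f_t * C_prev + i_t * C_tilde                                  # new cell state (2.5)
    h_t = o_t * block_output_act(C_t)                                   # block output (2.6)
    return h_t, C_t
```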

2.4.2 Activation Functions


Each node of a neural network takes a number of inputs, which are passed through a nonlinearity to produce an output. These nonlinearities are called activation functions and are illustrated in figure 2.1. A bad choice of activation function can lead to loss of input data or to vanishing/exploding gradients in the neural network [13].

Sigmoid function

The sigmoid function has a range of (0, 1) and is illustrated in figure 2.3. The formula is given by:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
CHAPTER 2. BACKGROUND 7

Figure 2.3: Sigmoid activation function

Hyperbolic tangent function

The hyperbolic tangent function, hereafter referred to as the hyperbolic function, is defined by:

$$\tanh(x) = \frac{\sinh(x)}{\cosh(x)}$$

It has a range of (−1, 1) and is illustrated in figure 2.4.

Figure 2.4: Hyperbolic activation function

Exponential linear unit

The ELU was introduced in [14] and made the deep neural networks of that study learn faster and more accurately. Its formula is given by:

$$\mathrm{ELU}(x) = \begin{cases} x & x > 0 \\ \alpha(e^{x} - 1) & x \le 0 \end{cases}$$

In figure 2.5 the α parameter is set to 1, giving a range of (−1, ∞).

Figure 2.5: ELU activation function

Self-normalizing exponential linear unit

The SELU was introduced in [15]. It is similar to the ELU but has additional, specific parameters. It has properties that should eliminate the possibility of vanishing/exploding gradients. The function is illustrated in figure 2.6 and is defined by:

$$\mathrm{SELU}(x) = \lambda \begin{cases} x & x > 0 \\ \alpha(e^{x} - 1) & x \le 0 \end{cases}$$

$$\lambda = 1.0507009873554804934193349852946$$
$$\alpha = 1.6732632423543772848170429916717$$

Figure 2.6: SELU activation function
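The four activation functions compared in this study can be written down directly from the definitions above; a minimal NumPy sketch (clipping inside the exponential only avoids overflow warnings and does not change the selected branch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))                       # range (0, 1)

def tanh(x):
    return np.tanh(x)                                      # range (-1, 1)

def elu(x, alpha=1.0):
    neg = alpha * (np.exp(np.minimum(x, 0.0)) - 1.0)       # alpha * (e^x - 1) for x <= 0
    return np.where(x > 0, x, neg)

def selu(x):
    lam = 1.0507009873554804934193349852946
    alpha = 1.6732632423543772848170429916717
    return lam * np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))
```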

2.5 Metrics
These are the same metrics used in [4] and are thus identically defined. They
are used to evaluate qualities in various recommendation systems.

• Sps. The Short-term Prediction Success captures the ability of the method to predict the next item. It is 1 if the next item is present in the recommendations, and 0 otherwise.

• Recall. The usual metric for top-N recommendation; it captures the ability of the method to make long-term predictions.

• User coverage. The fraction of users who received at least one correct recommendation. Average recall (and precision) hides the distribution of success among users: a high recall could still mean that many users do not receive any good recommendation. This metric captures the generality of the method.

• Item coverage. The number of distinct items that were correctly recommended. It captures the capacity of the method to make diverse, successful recommendations.

Observe that these metrics are all computed for a recommendation system that always produces ten recommendations for each user.
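A sketch of how these metrics could be computed for a system that always returns ten recommendations per user follows; the data layout (dictionaries keyed by user id) is an assumption made for the example:

```python
def evaluate(recommendations, future_items):
    """recommendations: {user: list of 10 recommended item ids, in order}
    future_items:       {user: list of the items the user actually consumed next}
    Returns mean sps, mean recall, user coverage and item coverage."""
    sps_sum = recall_sum = users_covered = 0
    items_covered = set()
    for user, recs in recommendations.items():
        future = future_items[user]
        hits = set(recs) & set(future)
        sps_sum += 1 if future and future[0] in recs else 0    # next item predicted?
        recall_sum += len(hits) / len(future) if future else 0
        users_covered += 1 if hits else 0                      # at least one correct item
        items_covered |= hits                                  # distinct correct items
    n = len(recommendations)
    return sps_sum / n, recall_sum / n, users_covered / n, len(items_covered)
```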

2.6 Related work


Applying different activation functions

In previous works [7] and [12], comparative studies were conducted in which the performance of an LSTM network was analysed when switching between different activation functions. Both papers concluded that switching activation functions impacts the performance of the network. Although the standard activation function in the sigmoidal gates, the sigmoid function, gives high performance, some of the less-recognized activation functions that were tested could result in more accurate performance. Furthermore, [7] compared exactly 23 different activation functions, where the three gates (the input, output and forget gates) change activation functions while the block input and block output activation functions are held constant at the hyperbolic tangent (tanh). Additionally, the authors encourage further research on other parts of an LSTM network, such as the effect of changing the hyperbolic tangent function on the block input and block output instead of changing the activation functions in the three gates.
Different activation functions have also been applied in more complex LSTM-based neural networks in areas other than recommendation systems [16]. Several activation functions have been tested in LSTM blocks [16], in the context of a spatiotemporal convolutional LSTM (convLSTM) network introduced by [17], applied to the MNIST dataset. The study showed great performance for the ELU and SELU activation functions, which outperformed traditional and popular choices such as the hyperbolic and sigmoid activation functions.

Applying LSTM in movie recommender systems


The authors' experiments in [4], where they tested LSTM in movie recommendation systems, showed that "...the LSTM produces very good results on the Movielens and Netflix datasets, and is especially good in terms of short term prediction and item coverage". Furthermore, the authors mention that it is possible to achieve better performance by adjusting the RNNs to specifically handle collaborative filtering problems.
Chapter 3

Methods

3.1 Dataset
The dataset used is Movielens 1M. The dataset contains many possible features that are not considered in the model; only the user id, movie id and timestamp are treated as features. Preprocessing is included in the LSTM implementation by [4].

3.2 Implementation
The modifications to the original code by [4] can be found in the authors' fork of the original repository on GitHub: github.com/andrebrogard/sequence-based-recommendations. The only modification made is the option to specify which activation functions to apply to the individual gates of the LSTM blocks when training and testing the model.

Neural network parameters


The authors of [4] observed comparable performance and the fastest learning using a layer size of 20 neurons. Common to all layer sizes tested is that they did not seem to improve beyond 100 epochs. Therefore all our tests use a layer size of 20 neurons and run for just over 100 epochs. One epoch is a unit of measurement indicating that the model has been trained on the entire dataset once.

Switching activation functions


The hyperbolic activation function is the default for the cell and hidden state, which are referred to as the block input and block output. The sigmoid function is the default for the input, output and forget gates. In our tests, we compare four different activation functions applied identically to the block input and block output, namely the hyperbolic, sigmoid, ELU and SELU functions.
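A small sketch of this configuration follows: the three gates keep their sigmoid default while the chosen function is applied identically to the block input and block output. The helper name and dictionary layout are hypothetical and not taken from the framework of [4]:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def selu(x):
    return 1.0507009873554805 * elu(x, alpha=1.6732632423543772)

BLOCK_FUNCTIONS = {"tanh": np.tanh, "sigmoid": sigmoid, "elu": elu, "selu": selu}

def lstm_activation_config(name):
    """Gate activations stay at the sigmoid default; only the block input and
    block output use the activation function selected by `name`."""
    act = BLOCK_FUNCTIONS[name]
    return {"input_gate": sigmoid, "forget_gate": sigmoid, "output_gate": sigmoid,
            "block_input": act, "block_output": act}
```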

3.3 Evaluation
Metrics
The metrics used are identical to those of [4] and capture the same properties in order to make the results comparable. They are all calculated in the context where the recommendation system makes ten recommendations. See section 2.5 for their definitions.

Test Data Set and Validation Data Set


The validation set is used during training to assess the accuracy of each model produced. The test set is used afterwards and has never before been seen by the model. All results observed in the study come from the test data set. The test data size and validation data size have both been chosen as 500 to maintain comparability with [4].

Number of tests
Training is conducted for each activation function on the dataset 15 times in order to capture variance and obtain a fair result. The models are then evaluated according to the metrics above.
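The numbers reported in chapter 4 are the mean and standard deviation over these 15 runs per activation function; a minimal sketch of that aggregation (the variable name is illustrative):

```python
import numpy as np

def summarize(metric_per_run):
    """Mean and sample standard deviation of one metric over repeated
    training runs (15 runs per activation function in this study)."""
    values = np.asarray(metric_per_run, dtype=float)
    return values.mean(), values.std(ddof=1)
```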
Chapter 4

Results

Figures 4.1–4.4 show the mean sps, recall, user coverage and item coverage, respectively, across intermediate epochs from 1 to 102. All results are evaluated on the test data using models saved at each intermediate epoch. Each activation function was, as described, used to train a model 15 times, from which the mean of each metric has been computed. Table 4.1 shows the mean and the standard deviation of the results over the 15 models.
Both ELU and SELU perform worse than the sigmoid and hyperbolic functions across all metrics. Additionally, ELU always performs worse than SELU. The hyperbolic and sigmoid functions are similar in their performance, with a slight advantage to the hyperbolic only in the recall metric.
An observation shared between most activation functions and metrics is that the models do not seem to improve significantly beyond around 20 epochs. In the recall and sps metrics all activation functions instead decrease. The SELU function always decreases in all metrics after around 50 epochs, whereas the ELU function always decreases after around 20 epochs.

Activation function   SPS (%)      Recall (%)    User coverage (%)   Item coverage
Hyperbolic            26.0 ± 1.4   7.05 ± 0.16   85.2 ± 1.0          595 ± 11
Sigmoid               26.6 ± 1.1   6.91 ± 0.17   84.7 ± 1.5          610 ± 14
SELU                  22.9 ± 1.5   6.08 ± 0.19   74.4 ± 1.5          507 ± 15
ELU                   16.1 ± 2.5   4.94 ± 0.43   78.9 ± 3.2          413 ± 27

Table 4.1: Comparison of activation functions and their metrics (mean ± standard deviation over 15 models).


Figure 4.1: The mean sps across intermediate epochs. Evaluated on the test
data.

Figure 4.2: The mean recall across intermediate epochs. Evaluated on the test
data.

Figure 4.3: The mean user coverage across intermediate epochs. Evaluated on
the test data.

Figure 4.4: The mean item coverage across intermediate epochs. Evaluated
on the test data.
Chapter 5

Discussion

5.1 Result
The ELU and SELU seem to have had a negative impact on the models, as they did not achieve the same accuracy as the hyperbolic and sigmoid functions. Both functions were less accurate for short-term and long-term recommendations, fewer users received a correct recommendation, and fewer items were ever recommended. Interestingly, the sigmoid and hyperbolic functions displayed no significant difference in the metrics, and the SELU function did not reach its highest mean sps value until around 50 epochs, later than all other activation functions, before it started decreasing.
The ELU displayed the lowest mean and the highest standard deviation in almost all metrics. This further indicates that ELU was not a good choice of activation function. Moreover, SELU had a lower mean but a standard deviation similar to the sigmoid and hyperbolic functions. We believe this is a promising property of the SELU function, as it appears to be as stable as the sigmoid and hyperbolic functions.
The sigmoid function yields better results in sps and item coverage than the hyperbolic. Additionally, the standard deviation is slightly lower for the sigmoid function in those two metrics. Thus, according to our results, the sigmoid function could be a substitute for the default function.
The metrics associated with the hyperbolic function should be comparable with the results of [4], because the same framework is used and similar tests were performed. For a layer size of 20 neurons, as in our tests, they presented better results; their mean sps for the hyperbolic function, on the same dataset, was well over 30% at around 100 epochs. Furthermore, their model did not stop improving until around 100 epochs, whereas our results show that most activation functions had already attained their maximum sps at around 20 epochs. Had we observed a smoother learning curve, we would have had more convincing results for the SELU and ELU functions.

5.2 Improvements
The choice of neural network parameters may explain the difference in results compared to [4]; in particular, the learning rate could affect the models. It could contribute to our models reaching their maximum value more quickly and hinder them from achieving similar results. We use the framework's default learning rate parameters for the RNN, which uses Adam, which might explain the difference compared to [4]. Furthermore, the layer size, which was 20 neurons in this study, should have been varied as in [4] to better observe possible differences in learning. Which neural network parameters to use should be considered more carefully in future work.
In each LSTM block, only the block input and block output activation functions were changed, while the three gates (input, forget and output gates) kept the same activation function, the sigmoid function (as is the default). In [7], by contrast, 23 activation functions were applied to the three gates. The activation functions that showed great performance in that study were not tested here because of time constraints. We did not observe a significant advantage for any activation function compared to the default. For future work, more comprehensive experiments evaluating more activation functions should be performed.
The study in [4] uses two datasets: Movielens 1M and Netflix. In this study, only Movielens 1M is used because of time constraints. Therefore, our results could be strongly tied to the structure of this specific dataset. In future work, more datasets need to be considered.
The performance of each activation function is evaluated strictly on accuracy using each metric. The temporal aspect was overlooked: because our tests did not record how long the network was trained, whether an activation function achieves better accuracy in a shorter time was not evaluated. To better evaluate an activation function, future work should not overlook the temporal aspect.
Chapter 6

Conclusions

In this study, we have demonstrated that changing the activation functions in LSTM neural networks can alter the prediction accuracy of movie recommendation systems. Moreover, we have compared the performance of four different activation functions in LSTM neural networks (the hyperbolic tangent, sigmoid, ELU and SELU activation functions). Our results show that the hyperbolic tangent and sigmoid functions yielded higher prediction accuracy for movie recommendation systems than the ELU and SELU.
We have only compared four different activation functions and trained the neural network on a single dataset. More research is needed to search for other activation functions that might perform better than the default hyperbolic tangent function. Furthermore, only one dataset was used and the temporal aspect was not considered. More and larger datasets should be employed in the search for higher-performing activation functions.

Bibliography

[1] Ericsson Consumer Lab. TV and Media - a consumer driven future of media. 2017.
[2] Song Tang, Zhiyong Wu, and Kang Chen. "Movie Recommendation via BLSTM". In: MultiMedia Modeling. Ed. by Laurent Amsaleg et al. Cham: Springer International Publishing, 2017, pp. 269-279. isbn: 978-3-319-51814-5.
[3] F.O. Isinkaye, Y.O. Folajimi, and B.A. Ojokoh. "Recommendation systems: Principles, methods and evaluation". In: Egyptian Informatics Journal 16.3 (2015), pp. 261-273. issn: 1110-8665. doi: 10.1016/j.eij.2015.06.005. url: http://www.sciencedirect.com/science/article/pii/S1110866515000341.
[4] Robin Devooght and Hugues Bersini. Collaborative Filtering with Recurrent Neural Networks. 2016. arXiv: 1608.07400 [cs.IR].
[5] Sepp Hochreiter and Jürgen Schmidhuber. "Long short-term memory". In: Neural Computation 9.8 (1997), pp. 1735-1780.
[6] Ayush Singhal, Pradeep Sinha, and Rakesh Pant. "Use of Deep Learning in Modern Recommendation System: A Summary of Recent Works". In: International Journal of Computer Applications 180.7 (Dec. 2017), pp. 17-22. issn: 0975-8887. doi: 10.5120/ijca2017916055. url: http://dx.doi.org/10.5120/ijca2017916055.
[7] Amir Farzad, Hoda Mashayekhi, and Hamid Hassanpour. "A comparative performance analysis of different activation functions in LSTM networks for classification". In: Neural Computing and Applications 31.7 (2019), pp. 2507-2521. issn: 1433-3058. doi: 10.1007/s00521-017-3210-6. url: https://doi.org/10.1007/s00521-017-3210-6.
[8] Yung-Yao Chen et al. "Design and Implementation of Cloud Analytics-Assisted Smart Power Meters Considering Advanced Artificial Intelligence as Edge Analytics in Demand-Side Management for Smart Homes". In: Sensors (Basel, Switzerland) 19.9 (May 2019), p. 2047. issn: 1424-8220. doi: 10.3390/s19092047. url: https://pubmed.ncbi.nlm.nih.gov/31052502.
[9] Patrick Henry Winston. Artificial Intelligence (3rd Ed.). USA: Addison-Wesley Longman Publishing Co., Inc., 1992. isbn: 0201533774.
[10] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. http://www.deeplearningbook.org. MIT Press, 2016.
[11] Christopher Olah. Understanding LSTM Networks. Aug. 2017. url: http://colah.github.io/posts/2015-08-Understanding-LSTMs/#fn1.
[12] Gecynalda S. da S. Gomes, Teresa B. Ludermir, and Leyla M. M. R. Lima. "Comparison of new activation functions in neural network for forecasting financial time series". In: Neural Computing and Applications 20.3 (2011), pp. 417-439. issn: 1433-3058. doi: 10.1007/s00521-010-0407-3. url: https://doi.org/10.1007/s00521-010-0407-3.
[13] Soufiane Hayou, Arnaud Doucet, and Judith Rousseau. "On the Impact of the Activation function on Deep Neural Networks Training". In: Proceedings of the 36th International Conference on Machine Learning. Ed. by Kamalika Chaudhuri and Ruslan Salakhutdinov. Vol. 97. Proceedings of Machine Learning Research. PMLR, 2019, pp. 2672-2680. url: http://proceedings.mlr.press/v97/hayou19a.html.
[14] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). 2015. arXiv: 1511.07289 [cs.LG].
[15] Günter Klambauer et al. "Self-Normalizing Neural Networks". In: CoRR abs/1706.02515 (2017). arXiv: 1706.02515. url: http://arxiv.org/abs/1706.02515.
[16] Nelly Elsayed, Anthony Maida, and Magdy Bayoumi. "Effects of Different Activation Functions for Unsupervised Convolutional LSTM Spatiotemporal Learning". In: Advances in Science, Technology and Engineering Systems Journal 4 (Apr. 2019). doi: 10.25046/aj040234.
[17] Xingjian Shi et al. "Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting". In: Advances in Neural Information Processing Systems 28. Ed. by C. Cortes et al. Curran Associates, Inc., 2015, pp. 802-810. url: http://papers.nips.cc/paper/5955-convolutional-lstm-network-a-machine-learning-approach-for-precipitation-nowcasting.pdf.
TRITA-EECS-EX-2020:414

www.kth.se
