Long Short-Term Memory Neural Network For Financial Time Series
Carmina Fjellström
Abstract
Recent developments in machine learning and neural networks have given rise to non-linear time series
models that provide modern and promising alternatives to traditional methods of analysis.
In this paper, we present an ensemble of independent and parallel long short-term memory
(LSTM) neural networks for the prediction of stock price movement. LSTMs have been shown
to be especially suited for time series data due to their ability to incorporate past information,
while neural network ensembles have been found to reduce variability in results and improve
generalization. A binary classification problem based on the median of returns is used, and the
ensemble’s forecast depends on a threshold value, which is the minimum number of LSTMs
required to agree upon the result. The model is applied to the constituents of the smaller,
less efficient Stockholm OMX30 instead of other major market indices such as the DJIA and
S&P500 commonly found in literature. With a straightforward trading strategy, comparisons
with a randomly chosen portfolio and a portfolio containing all the stocks in the index show
that the portfolio resulting from the LSTM ensemble provides better average daily returns
and higher cumulative returns over time. Moreover, the LSTM portfolio also exhibits less
volatility, leading to higher risk-return ratios.
1 Introduction
Prediction of asset prices has long been a central endeavor in mathematical finance and economet-
rics. Financial time series, however, are notoriously challenging to analyze because of their
nonstationarity, nonlinearity, and noise, stemming from the often irrational human behavior that drives the data.
Traditionally, the methods used have been based on models such as the Autoregressive
Integrated Moving Average (ARIMA), Generalized Autoregressive Conditional Heteroskedasticity
(GARCH), as well as other stochastic volatility models (see, for example, [5, 6, 28, 34]). The use
of these models often entails making assumptions about the data, its underlying distribution, and
the different processes affecting it. Because of these assumptions, these methods often generalize
poorly for new, out-of-sample data, even though they fit the current data well and do provide
valuable insights into the time series [32]. Recently, developments in machine learning and neural
networks have given rise to non-linear time series models that are increasingly being adapted for
financial applications. Support vector machines (SVM), restricted Boltzmann machines (RBM),
random forests, gradient boosted trees (GBM), and multilayer perceptrons (MLP) are just some
examples of the machine learning models that are being used [26, 19, 25, 31, 11]. Amongst these
models, one particular type of machine learning architecture, a recurrent neural network (RNN),
has been shown, compared to others, to be better suited for sequential data such as time series.
The suitability is due to the feedback loops in RNNs that allow them to use information not just
from the current input, but also from past inputs. This is unlike other neural networks that, in
general, process inputs as separate, independent data points. There is, however, one major problem
with RNNs: their inability to learn long-term dependencies due to the infamous vanishing gradient
problem [2, 4, 20]. To address this, the long short-term memory (LSTM) network was introduced.
In this paper, an LSTM model is used. A type of RNN, an LSTM also has feedback loops,
but moreover, it can also regulate its memory by using a gating mechanism that learns which
information to keep, to pass on, and to forget. It is widely used and has been shown to have
excellent predictive capabilities in natural language processing, handwriting recognition, image
recognition, and image captioning. See, for example, [7, 14, 16, 30, 35]. In finance, LSTMs have
been increasingly used for time series analysis. For example, applications for price predictions on
major stock market indices all over the world such as the S&P500, Shanghai’s SSE Index, India’s
NIFTY 50, and Brazil’s Ibovespa are studied in [2, 8, 17, 22, 27]. In addition, Tsantekidis et al.
[33] used LSTMs on Finnish companies to predict price movements through high frequency trading
data on a limit order book. Apart from predicting prices, Yeung et al. [37] employed LSTMs to
detect jumps in the values of different stock market indices, and Xiong et al. [36] applied LSTMs
on the S&P500 and Google domestic trends data to forecast price volatility. These are just some
examples of LSTM implementations on financial time series showing the neural network to produce
promising results. Comparisons with other methods have also been made. Siami-Namini et al. [29],
for example, compared LSTM with ARIMA for time series forecasting. They not only used data
from major exchanges such as the Dow Jones Industrial Average (DJIA) and Nasdaq Composite,
but also other economic time series such as the M1 money supply, currency exchange indices, and
transportation data. Results from their study show that LSTM forecasts have significantly less
root mean square error (RMSE) than those from ARIMA. Fischer and Krauss [12] applied LSTM
to S&P500 data for price prediction and compared the results with random forest, a standard deep
neural network (DNN), and logistic regression. Their findings indicate that LSTM does indeed
have higher accuracy than the other approaches, and that LSTM-based portfolios offer higher
returns and lower volatilities. Di Persio and Honchar [11], on the other hand, compared LSTM
and MLP with their own method, which is an ensemble of wavelets and a convolutional neural
network (CNN). Although they reported that their method appears to be superior, the results are
very close to those from LSTM [22].
Most of the literature applying LSTMs to financial time series concerns major market indices
such as the DJIA and S&P500. In this paper, an LSTM is applied
to Stockholm’s OMX30 to explore what advantages an LSTM-based approach can provide for a
smaller, less efficient market. The method used is inspired by both Fischer and Krauss [12] and Barra
et al. [3]. LSTMs are applied to sequences of daily returns; however, as opposed to applying the
network to the index itself, the model is applied to the individual stock constituents. Rather than
the usual regression problem that is commonly seen in literature, a binary classification problem
is used based on the daily median across the different stocks. The target is thus whether the
stock's next-day return will be above or below the median. Furthermore, instead of
just one LSTM, an ensemble of independent and parallel LSTMs is used, where the ensemble’s
prediction depends on the majority of the individual results. This is in line with [3] who argue that
such ensembles can eliminate much of the randomness in the model and increase the reliability
of outcomes. The combination of a median-based binary classification and an LSTM ensemble
that this paper implements on the relatively less efficient Swedish market is a unique approach.
Results show that, when compared to a randomly chosen portfolio and a portfolio containing all
the stocks considered, the LSTM-based method gives rise to portfolios that yield higher returns,
lower volatility, and higher risk-return ratios.
The rest of the paper is organized as follows: Section 2 presents LSTMs and their mechanisms
in detail. Section 3 describes the method, where the data is introduced and the neural network
architecture, ensemble, and trading strategy are explained. Section 4 is the presentation and
discussion of results. Finally, Section 5 provides a summary and conclusions.
2 Long Short-Term Memory
As mentioned above, RNNs are especially suited for sequential data due to their feedback loops
that enable them to use both current and past inputs, therefore allowing information to persist.
This feature of RNNs means that they are able to learn and take into account trends and context
when training and making predictions. There is, however, one major limitation: RNNs lose
memory in the long run because of the vanishing gradient problem. To tackle this, Hochreiter and
Schmidhuber [18] introduced the LSTM network in 1997. Since then, it has been modified and
improved upon over the years by, for example, [13, 14, 15, 16].
Figure 1 shows an example of an LSTM network with one input feature x, one hidden layer with
several units, and one output y. An LSTM unit, also called a memory cell, is magnified to show
its inner components. A memory cell contains three gates, each controlling how much information
should be kept in memory, forgotten, and passed on as cell output. A sigmoid activation function
is used for all three gates as its value ranges from 0, corresponding to no information, to 1,
corresponding to all information:

σ(x) = 1 / (1 + e^(−x))    (2.0.1)
The notation is as follows:
• xt is the input at time t
• st is the cell state
• s̃t is the candidate cell state
• ht is the output of the cell, also called hidden state
• ft , it , and ot are the values for the forget, input, and output gates, respectively
• Wf , Wi , Wo , and Ws̃ are the weight matrices associated with the input x
• Uf , Ui , Uo , and Us̃ are the recurrent weight matrices associated with the previous output ht−1
• bf , bi , bo , and bs̃ are the bias vectors
When the model is created, the cell states and outputs are initialized, say to s0 and h0 , re-
spectively. For a forward pass, an input x = (x1 , x2 , . . . , xn ) in the form of a sequence is fed into
the model, where the memory cells take the data points xt consecutively to calculate the new cell
states and outputs. Figure 1’s magnified memory cell shows its evolution over time as it processes
one data point after another: xt−1 , xt , xt+1 , . . . .
The values for the gates at time t are calculated based on the input xt and the previous output
ht−1 as follows:
ft = σ(Wf xt + Uf ht−1 + bf ) (2.0.2)
it = σ(Wi xt + Ui ht−1 + bi ) (2.0.3)
ot = σ(Wo xt + Uo ht−1 + bo ) (2.0.4)
Similarly, a candidate cell state s̃t is computed, still based on the input xt and the previous output
ht−1 , but with a tanh activation function. This represents the new information that the memory
cell has received.
s̃t = tanh(Ws̃ xt + Us̃ ht−1 + bs̃ ) (2.0.5)
The actual cell state st is then calculated based on the previous cell state st−1 and candidate cell
state s̃t from (2.0.5), where the values of the forget and input gates, ft and it , determine how much
should be forgotten from st−1 and retained from s̃t :
st = ft st−1 + it s̃t (2.0.6)
Finally, the output ht is computed based on the value of the output gate ot and current cell state
st calculated above:
ht = ot tanh(st ) (2.0.7)
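The cell update in equations (2.0.2)–(2.0.7) can be sketched in a few lines of NumPy. The dimensions, the random weights, and the toy return sequence below are illustrative assumptions, not the paper's trained parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, s_prev, W, U, b):
    """One LSTM cell update following equations (2.0.2)-(2.0.7).
    W, U, b are dicts keyed by 'f', 'i', 'o', 's' for the forget,
    input, output gates and the candidate state."""
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])      # (2.0.2)
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])      # (2.0.3)
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])      # (2.0.4)
    s_tilde = np.tanh(W['s'] @ x_t + U['s'] @ h_prev + b['s'])  # (2.0.5)
    s_t = f_t * s_prev + i_t * s_tilde                          # (2.0.6)
    h_t = o_t * np.tanh(s_t)                                    # (2.0.7)
    return h_t, s_t

# Toy dimensions: n = 1 input feature, m = 4 hidden units
rng = np.random.default_rng(0)
n, m = 1, 4
W = {k: rng.standard_normal((m, n)) * 0.05 for k in 'fios'}
U = {k: rng.standard_normal((m, m)) * 0.05 for k in 'fios'}
b = {k: np.zeros(m) for k in 'fios'}
h, s = np.zeros(m), np.zeros(m)
for x_t in [np.array([0.01]), np.array([-0.02])]:  # a short return sequence
    h, s = lstm_step(x_t, h, s, W, U, b)
```

In practice the paper relies on Keras, which performs this recursion internally; the sketch only makes the gating explicit.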
Note that the outputs ht are not just looped into the same memory cell, but are also passed to
the other memory cells in the network as shown in Figure 2.
With m being the number of hidden units and n the number of input neurons, the total number
of trainable parameters in the neural network is
4mn + 4m2 + 4m (2.0.8)
where 4mn is the number of weights associated with the input, 4m2 is the number of weights
associated with the outputs ht , and 4m is the number of biases.
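As a quick sanity check, formula (2.0.8) can be evaluated for a few example sizes; the unit counts below are illustrative, since the paper's hidden size is chosen by Bayesian optimization:

```python
def lstm_params(m, n):
    """Trainable parameters of an LSTM layer with m hidden units and
    n input features: the four gate/candidate computations each use an
    m x n input weight matrix, an m x m recurrent matrix, and a bias."""
    return 4 * m * n + 4 * m * m + 4 * m

# With the paper's single input feature (n = 1):
print(lstm_params(4, 1))   # 16 + 64 + 16 = 96
print(lstm_params(50, 1))  # 200 + 10000 + 200 = 10400
```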
3 Method
3.1 Data
Data for the empirical investigation were taken from the constituents of Stockholm's OMX30 (http:
//www.nasdaqomxnordic.com). Table 1 lists the top ten securities in the index by weight, while
Figure 2: Hidden states ht from an LSTM unit do not only get looped within the unit, but are
also passed on to other LSTM units
Table 2 presents the different industries. Both are from December 2019 as that was the latest
available information at the time the data were extracted.
Daily closing prices for the constituent stocks1 from May 2002 to January 2020 were down-
loaded. In order to avoid keeping track of the changes in constituents over time, the stocks used
were kept to be those comprising the index as of February 2020. With the closing prices pt , the
daily returns Rt were calculated as follows:
Rt = pt / pt−1 − 1    (3.1.1)
In line with [12] and [31], the daily medians of the returns of the stocks were calculated. Each
stock was then classified as either 0 if its daily return is below the daily median, or 1 if its daily
return is above. To create the inputs to the LSTM, sequences of returns were created for each
stock, where the target for each sequence is the one day ahead prediction of whether the return
will be above or below the median. Figure 3 illustrates this.
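The labeling and sequence construction can be sketched as follows; the ticker names, toy prices, and sequence length of 2 are hypothetical stand-ins (the paper uses sequences of length 240):

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: one column of daily closing prices per stock
prices = pd.DataFrame(
    {"AAA": [100, 101, 99, 102, 103], "BBB": [50, 50.5, 51, 50, 52]},
    index=pd.date_range("2020-01-01", periods=5, freq="B"),
)

returns = prices / prices.shift(1) - 1           # equation (3.1.1)
median = returns.median(axis=1)                  # daily cross-sectional median
labels = returns.gt(median, axis=0).astype(int)  # 1 if above the median, else 0

def make_sequences(series, targets, seq_len):
    """Sliding windows of past returns, each paired with the label of
    the day immediately after the window (the next-day target)."""
    X, y = [], []
    for t in range(seq_len, len(series)):
        X.append(series[t - seq_len:t])
        y.append(targets[t])
    return np.array(X), np.array(y)

# Drop the first (NaN) return row before building sequences
X, y = make_sequences(returns["AAA"].values[1:], labels["AAA"].values[1:], seq_len=2)
```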
1 Essity B excluded due to lack of data for the dates of interest.
Industry             Weight (%)   No. of Securities
Oil & Gas                 0.00                   0
Basic Materials           3.27                   3
Industrials              37.73                  10
Consumer Goods            9.21                   4
Health Care               4.07                   2
Consumer Services         6.56                   1
Telecommunications        6.13                   2
Utilities                 0.00                   0
Financials               22.82                   6
Technology               10.21                   2

Table 2: Industry breakdown of the index constituents as of December 2019
3.2 Neural Network Architecture

The LSTM model consists of one input neuron, one hidden layer, and one output neuron. A sigmoid
activation function is used for the output, which can be interpreted as a measure of confidence.
Closer to 1 means that the model is more confident that the return will be above the median,
while closer to 0 means it is more confident that it will be below. The Adam optimizer was used
together with a learning rate of 0.0075, which was chosen with the help of Bayesian optimization
[23]. The same Bayesian optimization algorithm was also used to determine the values of the other
hyperparameters.
For training and testing, the data for each stock were divided into blocks of length 750 days
for training (approximately three years of trading), 270 days for validation (more than a year of
trading), and 270 days for testing, as illustrated in Figure 4. Sequences of length 240 were created
for each of these data sets2. A rolling window of 30 days was used, which results in 30
non-overlapping days of prediction for each block. In practice, it also means that the model is retrained
approximately every six weeks. The 30-day rolling window was chosen by trial and error: a shorter
rolling window resulted in overfitting, while a longer one resulted in lower accuracy for predictions
towards the end of testing, as the data move further away from the training dates.
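One plausible reading of this scheme, with the block sizes given above and a 30-day shift between consecutive blocks, can be sketched as index arithmetic (the 1500-day total is an arbitrary example):

```python
def rolling_blocks(n_days, train=750, val=270, test=270, step=30):
    """Index ranges for the rolling train/validation/test blocks.
    Each block is shifted forward by `step` days, so consecutive
    blocks yield `step` non-overlapping prediction days."""
    blocks = []
    start = 0
    while start + train + val + test <= n_days:
        blocks.append({
            "train": (start, start + train),
            "val": (start + train, start + train + val),
            "test": (start + train + val, start + train + val + test),
        })
        start += step
    return blocks

blocks = rolling_blocks(n_days=1500)
```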
3.3 Ensemble

Using ensembles of neural network models has been argued to reduce variation and improve
generalization. Therefore, as with [3], instead of just one LSTM network, several LSTMs were used, all
independent of each other and trained in parallel. A diagram of the ensemble is shown in Figure 5,
where each LSTM has the same architecture described in Section 3.2, but with a different weight
initialization.
Figure 5: LSTM ensemble. The LSTMs are independent of each other and are trained in parallel.
The different initializations used were those readily available in Keras [10]; the details
are provided in Table 3. In total, 11 LSTMs were used.
As the LSTMs are independent, so are their outputs. The final ensemble prediction as to
whether the next day return will be above or below the median is decided based upon a minimum
number, referred to as a threshold, of LSTMs with that result. In other words, the threshold is
the minimum number of LSTMs that must agree on a prediction. Since there are 11 LSTMs, any
threshold of six or above constitutes a majority.
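The threshold rule can be sketched as a simple vote over the 11 sigmoid outputs; the probabilities below are made up, and the 0.5 confidence cut-off is an assumption (the effect of raising it is examined in Section 4):

```python
import numpy as np

def ensemble_vote(probs, threshold, min_confidence=0.5):
    """Predict 'above the median' (1) only if at least `threshold` of
    the LSTMs output a confidence above `min_confidence`."""
    votes = int(np.sum(np.asarray(probs) > min_confidence))
    return int(votes >= threshold)

# Eleven made-up sigmoid outputs, one per LSTM in the ensemble
probs = [0.7, 0.6, 0.8, 0.55, 0.9, 0.65, 0.7, 0.52, 0.4, 0.3, 0.45]
print(ensemble_vote(probs, threshold=6))  # 8 of 11 networks agree -> 1
```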
2 Note that a 270-day testing data with sequences of length 240 will result in 30 days of predictions.
Initialization    Description                                          Parameters
RandomNormal      Normal distribution                                  µ = 0.0, σ = 0.05
RandomUniform     Uniform distribution                                 min = −0.05, max = 0.05
TruncatedNormal   Normal distribution; values more than two            µ = 0.0, σ = 0.05
                  standard deviations from the mean are redrawn
Zeros             Initialized to 0
Ones              Initialized to 1
GlorotNormal      Normal distribution                                  µ = 0.0, σ = √(2/(fan_in + fan_out))
GlorotUniform     Uniform distribution [−limit, limit]                 limit = √(6/(fan_in + fan_out))
Identity          Identity matrix
Orthogonal        Orthogonal matrix from the QR decomposition of a
                  random matrix drawn from a normal distribution
Constant          Initialized to a constant                            constant = 0.05
VarianceScaling   Truncated normal distribution                        σ = √(1/fan_in)

Table 3: Different weight initializations with chosen parameters for the individual LSTMs
3.4 Trading Strategy

For the trading strategy, stocks predicted by the LSTM ensemble to perform better than the
median are bought and added to an equally weighted portfolio. A stock is held until the model
no longer predicts it to be above the median, in which case, the position is closed. The resulting
portfolio is rather dynamic and is adjusted daily based on the LSTM forecasts. There is no fixed
number of stocks in the portfolio; it contains however many stocks the model predicts to perform
well.
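The resulting daily portfolio return can be sketched with pandas; the tickers, return values, and buy signals below are fabricated for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical realized daily returns and ensemble predictions (1 = buy/hold)
returns = pd.DataFrame({"AAA": [0.01, -0.02, 0.03],
                        "BBB": [0.00, 0.01, -0.01],
                        "CCC": [0.02, 0.02, 0.00]})
signals = pd.DataFrame({"AAA": [1, 0, 1],
                        "BBB": [1, 1, 0],
                        "CCC": [0, 1, 1]})

def portfolio_returns(returns, signals):
    """Equally weighted daily return over the stocks the ensemble
    predicts to beat the median; days with no picks return 0."""
    held = returns.where(signals == 1)   # NaN where a stock is not held
    return held.mean(axis=1).fillna(0.0)

daily = portfolio_returns(returns, signals)
```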
4 Results
4.1 Accuracy
Figure 6 below displays the test accuracy for three of the LSTMs, with Random Normal, Random
Uniform, and Glorot Uniform3 weight initializations. Other LSTMs showed similar results so these
were chosen as examples. The accuracies hover just around 50%, which is consistent with
findings in literature (see, for example, [3, 11, 22]). Because of the complexity of financial time
series, accuracy achieved with machine learning methods is often around 50%, unlike the much
higher accuracies achieved in other areas such as image recognition.
4.2 Threshold and Minimum Required Accuracy

To examine the effects of different thresholds and minimum required accuracy on the average daily
returns, these variables of the LSTM ensemble were varied. Figure 7a shows the result for changing
the threshold. Excluding a threshold of three, which appears to be an anomaly, one can see that the daily average
portfolio return grows as the threshold increases, until it reaches eight, at which point the return
3 Glorot Uniform is the default Keras initialization.
Figure 6: Samples of test accuracy for the LSTMs with different weight initializations
starts to decrease. Having too low of a threshold means that even stocks predicted by many of
the LSTMs to perform worse than the median may be added to the portfolio. On the other hand,
after the optimal number of eight, requiring more LSTMs to have the same prediction becomes
too strong a condition and results in few or even no stocks being included in the portfolio.
Figure 7: Average daily returns as threshold and minimum accuracy are varied
Similarly, Figure 7b shows the result for varying the minimum required accuracy, where it is shown
that the average daily return decreases as the minimum required accuracy increases. An increase
in minimum accuracy means requiring the model to be more confident of its prediction. However,
as shown above, the predictive accuracy stays very close to 50%, which means that, as with the
threshold, having a higher accuracy requirement may be too strict of a condition, leading to fewer
or no stocks being included in the portfolio.
4.3 Portfolio Returns

Apart from looking at the effects of the threshold and required accuracy, the daily returns from the
LSTM-based portfolio were compared with a portfolio containing all 29 stocks, to be referred to as
an all-stock portfolio4 , and a randomly chosen portfolio. The random portfolio contains stocks
drawn at random each day from the 29 studied, with the number of stocks included also chosen at
random. Table 4 displays the overall results. As seen in the table, both the LSTM
Table 4: Comparison of daily returns (%) for the LSTM, all-stock, and randomly chosen portfolios
and the all-stock portfolios have higher average daily returns than the random portfolio, with the
LSTM portfolio having the highest. The LSTM portfolio also has a lower standard deviation than
the all-stock portfolio, indicating that the portfolio chosen by the LSTM method may incur less
risk. It is also worth noting that although the all-stock portfolio appears to have a slightly higher
maximum daily return, the minimum return for the LSTM is less negative, which may suggest
that the LSTM portfolio has considerably less downside than the all-stock portfolio.
For more details, Table 5 compares the average daily returns by year, where the best results
are highlighted in bold. During the 2007 - 2008 financial crisis, the random portfolio provided
better returns than both the LSTM and all-stock portfolios. Afterwards, however, as in the overall
results in Table 4, the random portfolio is outperformed by the LSTM and all-stock portfolios.
The results of these two portfolios are very close to each other, where in some years, the LSTM
portfolio appears to have a higher average daily return, while in others, the all-stock portfolio does.
Similar to Table 4, however, one can see that for periods of negative returns, such as for the years
2007 - 2008, 2011, and 2018, the LSTM appears to do better than the all-stock portfolio by having
less negative results. Again, this suggests that the LSTM portfolio is a more attractive
choice when losses are to be expected.
To see the development of profits over time, Figure 8 graphs the cumulative returns for all three
portfolios from 2007 - 2020. It can be seen that the LSTM portfolio clearly has the highest
accumulated returns for the whole period. Once again, during intervals of decline, as is evident in
2008 - 2009 for example, the graph shows that the LSTM portfolio returns do not decrease as
much as the all-stock portfolio.
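Cumulative returns of the kind plotted in Figure 8 can be obtained by compounding the daily series; compounding (rather than summing) is an assumption here, as the paper does not state which convention Figure 8 uses:

```python
import numpy as np

def cumulative_returns(daily_returns):
    """Compound a series of daily returns into a cumulative return path."""
    return np.cumprod(1.0 + np.asarray(daily_returns)) - 1.0

path = cumulative_returns([0.01, -0.02, 0.03])  # three days of returns
```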
4 A comparison with the OMX30 index cannot be made directly for two reasons. First, ESSITY B was excluded
due to lack of data for the period of interest. Second, the stocks examined here are fixed to be those making up the
index as of February 2020, while the index constituents in practice are regularly adjusted.
Year    LSTM      All-Stock Portfolio   Random Portfolio
2007   -0.0531   -0.0755               -0.0407
2008   -0.0872   -0.1743               -0.0641
2009    0.1911    0.2322                0.1184
2010    0.1052    0.1046                0.0621
2011   -0.0085   -0.0462               -0.0246
2012    0.0805    0.0667                0.0404
2013    0.0639    0.0699                0.0275
2014    0.0453    0.0519                0.0253
2015    0.0181    0.0204                0.0126
2016    0.0142    0.0441                0.0141
2017    0.0344    0.0344                0.0131
2018   -0.0403   -0.0497               -0.0447
2019    0.1095    0.1088                0.0434

Table 5: Comparison of average daily returns (%) by year. Best results are highlighted in bold.
4.4 Risk Measures

For a more rigorous assessment of the risk, the annualized volatility, Sharpe ratio, and Sortino
ratio were examined. The Sharpe ratio is a commonly used measurement of return per unit of risk
and is given by the following formula:
Sharpe ratio = (Rp − rf) / σp    (4.4.1)
where Rp is the portfolio return, rf the risk-free return, and σp the standard deviation of the
portfolio returns. On the other hand, the Sortino ratio is a variant of the Sharpe ratio that only
takes into account the downside standard deviation. It is given by:
Sortino ratio = (Rp − rf) / σd    (4.4.2)
where σd is the standard deviation of the negative portfolio returns. By only considering the
negative returns, the Sortino ratio distinguishes between good and bad volatility. Regardless of
which is used, higher values are preferred for both ratios. In the calculations, the risk-free rate
was taken to be the Swedish 1-month Treasury bill (accessed from https://www.riksbank.se/
en-gb/statistics/search-interest--exchange-rates/). Results are presented in Table 6.
Table 6: Comparisons of risk measures and risk-reward ratios between the LSTM and all-stock
portfolios. Better results are highlighted in bold.
Looking at the annualized volatility columns, it is evident that the LSTM has lower volatility
throughout, denoting a less risky portfolio. Comparing the Sharpe and Sortino ratios, one can
see that apart from 2013 - 2016, the LSTM portfolio appears to have a higher return per unit of
risk than the all-stock portfolio. Negative Sharpe and Sortino ratios in 2007 - 2008, 2011, and
2018 indicate the portfolios’ returns during these periods are less than the risk-free rate (see Table
5). Care must always be taken when comparing negative ratios as they can be misleading. For
example, for the same return, a larger standard deviation, i.e., higher risk, will result in a less
negative ratio, which may lead some to conclude better performance. In this case, however, the
less negative ratios of the LSTM portfolio do indeed come from the combination of higher returns
and lower volatility.
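Both ratios, as defined in (4.4.1) and (4.4.2), can be computed from the daily return series. The annualization with 252 trading days and the sample returns below are assumptions for illustration; the downside deviation follows the paper's stated definition (standard deviation of the negative returns only):

```python
import numpy as np

def sharpe_sortino(daily_returns, rf_daily=0.0, trading_days=252):
    """Annualized Sharpe ratio (4.4.1) and Sortino ratio (4.4.2)
    from a series of daily portfolio returns."""
    excess = np.asarray(daily_returns) - rf_daily
    mean = excess.mean() * trading_days
    vol = excess.std(ddof=1) * np.sqrt(trading_days)     # total volatility
    downside = excess[excess < 0]
    dvol = downside.std(ddof=1) * np.sqrt(trading_days)  # downside volatility
    return mean / vol, mean / dvol

r = np.array([0.001, -0.002, 0.003, 0.0005, -0.001, 0.002])
sharpe, sortino = sharpe_sortino(r)
```

Because the downside deviation ignores the positive returns, the Sortino ratio exceeds the Sharpe ratio whenever gains are more volatile than losses.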
5 Conclusion

In this paper, we present an approach for the prediction of stock price movement by using an
ensemble of independent and parallel LSTM neural networks. A binary classification problem
based on the median of returns is used and the ensemble’s forecast depends on how many of the
LSTMs agree on the same output. The model is applied to the constituents of the relatively smaller
and less efficient OMX30 index as opposed to the commonly used major stock market indices such
as the S&P500 and DJIA. A straightforward trading strategy is then implemented based on the
LSTM forecasts. Compared to a randomly chosen portfolio and a portfolio containing all the stocks
in the index, the LSTM-based portfolio appears to have better average daily returns and a higher
cumulative return over time. Even more remarkably, the LSTM portfolio exhibits less volatility
throughout the whole period than the portfolio containing all the stocks. The combination of this
lower volatility with the higher returns results in higher risk-return ratios.
With such encouraging outcomes, we identify several ways to improve the approach even further.
For example, using a learning rate decay might increase speed of convergence and accuracy. Having
a more systematic way of deciding when the model should be retrained may also lead to better
results and prevent overfitting. For example, statistics from the data may be taken regularly,
where retraining is done whenever a significant shift in distribution or parameters is observed.
Using several input neurons, one per stock, instead of a single input for the whole network may
force the model to process the stocks simultaneously and learn the correlations between them.
Finally, a trading strategy that weights the stocks according to the model's confidence, rather
than equally, may also lead to higher returns. These are just some examples of how to improve the
performance of a method that already gives promising results.
References
[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis,
J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Joze-
fowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah,
M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Va-
sudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng.
TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available
from tensorflow.org.
[2] W. Bao, J. Yue, and Y. Rao. A deep learning framework for financial time series using stacked
autoencoders and long-short term memory. PloS one, 12(7):e0180944, 2017.
[3] S. Barra, S. M. Carta, A. Corriga, A. S. Podda, and D. R. Recupero. Deep learning and time
series-to-image encoding for financial forecasting. IEEE/CAA Journal of Automatica Sinica,
7(3):683–692, 2020.
[4] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent
is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994.
[7] W. Byeon, T. M. Breuel, F. Raue, and M. Liwicki. Scene labeling with lstm recurrent neural
networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recogni-
tion, pages 3547–3555, 2015.
[8] J. Cao, Z. Li, and J. Li. Financial time series forecasting model based on ceemdan and lstm.
Physica A: Statistical Mechanics and its Applications, 519:127–139, 2019.
[9] K. Cho, B. van Merrienboer, Ç. Gülçehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning
phrase representations using RNN encoder-decoder for statistical machine translation. CoRR,
abs/1406.1078, 2014.
[10] F. Chollet et al. Keras. https://keras.io, 2015.
[11] L. Di Persio and O. Honchar. Artificial neural networks architectures for stock price prediction:
Comparisons and applications. International journal of circuits, systems and signal processing,
10(2016):403–413, 2016.
[12] T. Fischer and C. Krauss. Deep learning with long short-term memory networks for financial
market predictions. European Journal of Operational Research, 270(2):654–669, 2018.
[13] F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to Forget: Continual Prediction with
LSTM. Neural Computation, 12(10):2451–2471, 10 2000.
[14] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber. A novel
connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pat-
tern Analysis and Machine Intelligence, 31(5):855–868, 2009.
[15] A. Graves and J. Schmidhuber. Framewise phoneme classification with bidirectional lstm and
other neural network architectures. Neural Networks, 18(5):602–610, 2005. IJCNN 2005.
[16] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber. Lstm: A search
space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10):2222–
2232, 2017.
[17] J. B. Heaton, N. G. Polson, and J. H. Witte. Deep learning in finance. CoRR, abs/1602.06561,
2016.
[18] S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–
1780, 11 1997.
[19] C. Krauss, X. A. Do, and N. Huck. Deep neural networks, gradient-boosted trees, random
forests: Statistical arbitrage on the s&p 500. European Journal of Operational Research,
259(2):689–702, 2017.
[20] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[21] Nasdaq. Omx stockholm 30 index, Fact Sheet, 2020. https://indexes.nasdaq.com/docs/
FS_OMXS30.pdf, Last accessed on 2020-02-11.
[22] D. M. Nelson, A. C. Pereira, and R. A. de Oliveira. Stock market’s price movement predic-
tion with lstm neural networks. In 2017 International Joint Conference on Neural Networks
(IJCNN), pages 1419–1426, 2017.
[23] F. Nogueira. Bayesian Optimization: Open source constrained global optimization tool for
Python, 2014–.
[24] C. Olah. Understanding lstm networks, 2015.
[25] J. Patel, S. Shah, P. Thakkar, and K. Kotecha. Predicting stock and stock price index move-
ment using trend deterministic data preparation and machine learning techniques. Expert
Systems with Applications, 42(1):259–268, 2015.
[26] N. I. Sapankevych and R. Sankar. Time series prediction using support vector machines: A
survey. IEEE Computational Intelligence Magazine, 4(2):24–38, 2009.
[27] S. Selvin, R. Vinayakumar, E. Gopalakrishnan, V. K. Menon, and K. Soman. Stock price
prediction using lstm, rnn and cnn-sliding window model. In 2017 International Conference
on Advances in Computing, Communications and Informatics (ICACCI), pages 1643–1647,
2017.
[28] R. H. Shumway and D. S. Stoffer. ARIMA Models, pages 75–163. Springer International
Publishing, Cham, 2017.
[29] S. Siami-Namini, N. Tavakoli, and A. Siami Namin. A comparison of arima and lstm in
forecasting time series. In 2018 17th IEEE International Conference on Machine Learning
and Applications (ICMLA), pages 1394–1401, 2018.
[30] M. Sundermeyer, R. Schlüter, and H. Ney. Lstm neural networks for language modeling. In
Thirteenth Annual Conference of the International Speech Communication Association, 2012.
[31] L. Takeuchi and Y.-Y. A. Lee. Applying deep learning to enhance momentum trading strate-
gies in stocks. In Technical Report. Stanford University, 2013.
[32] D. T. Tran, A. Iosifidis, J. Kanniainen, and M. Gabbouj. Temporal attention-augmented
bilinear network for financial time-series data analysis. IEEE Transactions on Neural Networks
and Learning Systems, 30(5):1407–1418, 2018.
[33] A. Tsantekidis, N. Passalis, A. Tefas, J. Kanniainen, M. Gabbouj, and A. Iosifidis. Using deep
learning to detect price change indications in financial markets. In 2017 25th European Signal
Processing Conference (EUSIPCO), pages 2511–2515, 2017.
[34] R. S. Tsay. Analysis of Financial Time Series, volume 543. John Wiley & Sons, 2005.
[35] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption
generator. In Proceedings of the IEEE conference on computer vision and pattern recognition,
pages 3156–3164, 2015.
[36] R. Xiong, E. P. Nichols, and Y. Shen. Deep learning stock volatility with google domestic
trends. arXiv preprint arXiv:1512.04916, 2015.
[37] J. F. A. Yeung, Z.-k. Wei, K. Y. Chan, H. Y. Lau, and K.-F. C. Yiu. Jump detection in
financial time series using machine learning algorithms. Soft Computing, 24(3):1789–1801,
2020.