Predictive Modeling of Stock Prices Using Transformer Model

Predictive Modeling of Stock Prices Using Transformer Model
Leila Mozaffari Jianhua Zhang

[email protected] [email protected]
Oslo Metropolitan University Oslo Metropolitan University
Oslo, Norway Oslo, Norway
ABSTRACT There is great significance to stock price prediction, since it can

Financial market prediction utilizing deep learning has attracted yield substantial returns while minimizing losses. Trading decisions
the attention of both investors and researchers. Deep learning meth- can be informed by successful forecasting, guiding traders on when
ods, such as convolutional neural networks and recurrent neural to buy, sell, or hold stocks. Investors can also use it to protect their
networks work well at predicting stock indices based on the non- portfolios and hedge their positions, which can be a key component
linear characteristics of stock markets. The goal of this work is to of risk management.
predict the stock index using the latest deep learning framework, Stock price prediction presents a number of challenges. There are
Transformer. This paper presents a comprehensive analysis of stock many factors that influence stock prices, including macroeconomic
closing price prediction using three distinct machine learning mod- events, geopolitical shifts, market sentiment, and unforeseeable
els: Long Short-Term Memory (LSTM), Prophet, and Transformer. shocks. As a result of these inherent complexities, researchers have
Using the encoder-decoder architecture and the multi-head atten- developed advanced prediction models designed to take into ac-
tion mechanism, Transformer is able to better characterize stock count this complex interaction of variables [1]. In recent years,
market dynamics. The present study uses data from Yahoo Finance. neural networks, specifically deep learning models, have proven
The Transformer model demonstrated superior performance in to be one of the most effective methods for stock price prediction.
comparison with LSTM and Prophet. In this work, we handle the Since neural networks are capable of learning and adapting to
complexities of market dynamics to improve stock price predictions. non-linear patterns in data, they are well-suited to modeling stock
markets, which are dynamic and unpredictable.
CCS CONCEPTS By optimizing neural network architectures, this research aims
to improve the predictive accuracy of stock price models. It has
• Mathematics of computing → Time series analysis; • Com-
been shown that neural networks are capable of forecasting finan-
puting methodologies → Neural networks.
cial time series, but the selection of appropriate hyperparameters
KEYWORDS (e.g., learning rate, hidden layer sizes, and training epochs) greatly
influences their performance [13].
Stock Market, Time Series, Transformer, Prophet, LSTM, Stock Price There is no doubt that the world of finance, particularly the stock
Prediction market, is one of the most dynamic and consequential domains in
ACM Reference Format: the global economy. Companies and individuals can use it as a
Leila Mozaffari and Jianhua Zhang. 2024. Predictive Modeling of Stock Prices source of capital, as a measure of the health of the economy, and as
Using Transformer Model. In 2024 9th International Conference on Machine an investment vehicle. Hence, stock market analysis is of paramount
Learning Technologies (ICMLT) (ICMLT 2024), May 24–26, 2024, Oslo, Norway. importance in this context. The purpose of in-depth market analysis
ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3674029.3674037
is to help investors and financial analysts make informed decisions,
optimize investment strategies, and manage risks.
1 INTRODUCTION Traditionally, two primary methods are used to analyze the stock
It has long been a challenge for investors, researchers, and data market: fundamental analysis and technical analysis [10]. Funda-
scientists to predict stock prices in the financial markets. Stock mental analysis involves analyzing a company’s financial state-
price forecasts can be immensely valuable to investors, helping ments, earnings, and overall health to determine its actual value. In
them optimize trading decisions, manage portfolios, and mitigate contrast, technical analysis looks at past price and volume data to
risk [24]. In an attempt to uncover the complex patterns associated predict future stock price movements. There are, however, limita-
with stock price movements, robust models are continually being tions to these methods, especially when it comes to dealing with
developed as a result of the attraction of potential financial gain vast and ever-changing financial data sets. This is where machine
from accurate predictions. The stock market, however, is a highly learning has found a valuable role in financial forecasting.
complex and dynamic system influenced by a multitude of variables Stock market analysis can be revolutionized by machine learning,
and external factors, making predictions difficult [19]. a subset of artificial intelligence. This method allows us to handle
huge amounts of data in a way traditional methods cannot, allow-
ing us to unlock new financial forecasting opportunities. Adaptive
This work is licensed under a Creative Commons Attribution International machine learning models learn from data and identify patterns that
4.0 License. are difficult for humans to detect. The rest of this paper is orga-
ICMLT 2024, May 24–26, 2024, Oslo, Norway nized as follows: A brief literature review is presented in Section
© 2024 Copyright held by the owner/author(s). 2. Our proposed method is described in detail in Section 3. Section
ACM ISBN 979-8-4007-1637-9/24/05 4 presents the results of a series of experiments conducted to test
https://doi.org/10.1145/3674029.3674037
41
ICMLT 2024, May 24–26, 2024, Oslo, Norway Leila Mozaffari and Jianhua Zhang
our approach. The final section concludes this paper and discusses architecture. Stock prices are perfectly aligned between highs and
future directions of research. lows. These models have been used to predict stock prices with
remarkable success.
This domain, however, faces challenges. In order to predict stock
2 LITERATURE REVIEW prices, a variety of factors must be considered, including economic
A significant revolution has taken place in the field of stock price indicators, geopolitical events, and investor sentiment.
prediction with the advent of machine learning techniques. Liter- Therefore, machine learning models have been successfully used
ature review highlighting relevant machine learning models and to predict stock prices. A combination of neural networks, hyper-
their applications in financial forecasting provides a comprehensive parameter optimization, and sentiment analysis offers a compelling
overview of the current landscape of stock price predictions. framework for financial forecasting. Clearly, the field continues to
In 1988, An important study of neural network models for stock evolve, and machine learning’s application to stock price predic-
price prediction was done by White [23]. IBM’s daily common stock tion holds great promise for investors, financial institutions, and
was used in his predictive model, and his training predictions were researchers equally. In the following section, we will discuss the
very optimistic. Later, many studies were conducted to test the methodology and results of our study.
neural network’s accuracy of stock market forecasting.
Predictive models are complicated to build based on time series 3 METHOD
analysis of daily stock data. MR. Islam et al. [14] Compared three
3.1 Data Preprocessing
different methods for predicting stock prices, namely Autoregres-
sive Integrated Moving Average (ARIMA), artificial neural network Financial data is rarely ready for machine learning models to be used
(ANN), and stochastic process-geometric Brownian motion (GBM). immediately. To prepare the raw data for training and evaluation, a
These methods are used to build predictive models using historical series of preprocessing steps were applied.
stock data collected from Yahoo Finance. A comparison is made • Missing Data Handling
between the outputs of each model and the actual stock price. For In financial datasets, missing data is a common problem,
next-day stock price prediction, Using the S & P 500 index for anal- which can be caused by a variety of factors. In order to pre-
ysis, the conventional statistical model ARIMA and the stochastic vent skewed or biased predictions, it is essential to address
model-geometric Brownian motion model perform better than the this issue [2].
artificial neural network models. • Feature Selection
Priyank Sonkiya et al. [18] proposed an ensemble of state-of-the- In financial datasets, many features are present, but not all of
art stock price prediction methods. A version of BERT, a pre-trained them are relevant to predicting stock prices [11]. Only those
transformer model by Google for Natural Language Processing features that are most likely to contribute to the predictive
(NLP), is used to perform sentiment analysis of news and headlines power of the model were identified and retained.
for Apple Inc., listed on the NASDAQ. The stock price for Apple • Time Series Data Formatting
Inc is then predicted using technical indicators, stock indexes of By nature, stock price data is a time series, and models must
various countries, some commodities, and historical prices along take this into account. Therefore, the data was organized
with sentiment scores. A comparison is made with baseline mod- into time series formats with features and target variables,
els such as Long Short Term Memory (LSTM), Gated Recurrent allowing the model to learn from historical data and predict
Units (GRU), vanilla GAN, and Auto-Regressive Integrated Moving the future.
Average (ARIMA). The proposed S-GAN model outperformed the • Outlier Detection
traditional time series forecasting models GAN, GRU, LSTM and Model performance can be significantly impacted by out-
ARIMA. RMSE for training was observed at 0.5606 when learning liers in the training data. Outliers were detected and handled
rate was 0.00016, batch size was 128, and epochs were 165. appropriately using robust statistical methods and visualiza-
A number of back-testing experiments were performed by Chao- tions, such as box plots.
jie Wang et al. [22] on the main stock market indices around the • Z-Score Normalization In data preprocessing, Z-Score nor-
world, including the CSI 300, S&P 500, Hang Seng Index, and Nikkei malization, also known as standardization, converts numeri-
225. Several experiments have shown that Transformer outper- cal data into a standard scale [15]. By rescaling the data, it
forms other classic methods and can generate excess earnings for has an average of zero and a standard deviation of one. It
investors. prevents one feature from dominating others during model
Muhammad Rizki Nur Majiid et al. [12] are collected datasets training by creating a consistent scale across features or vari-
from Yahoo Finance and investing.com, as well as other aspects of ables. The Z-Score normalization of a data point x is given
the stock market. An extended dataset derived from Bank Central by [4]:
Asia’s stock price is used to train these models. It includes open 𝑋 −𝜇
𝑍= (1)
price, close price, low price, high price, and volume. Ensemble 𝜎
Transformer LSTM (ET-LSTM) and Ensemble Transformer GRU where:
(ET-GRU) architectures forecast stock prices for the following day. z is the normalized value, x is the original value, 𝜇 is the
In order to obtain more accurate findings, a variety of deep learning mean of the data, 𝜎 is the standard deviation of the data.
approaches are applied to the data set. A total of 9% of MAPE As a fundamental step in the predictive modeling process,
delivered by both of the suggested approaches using ensemble data preprocessing is essential. By ensuring the data is of
42
Predictive Modeling of Stock Prices Using Transformer Model ICMLT 2024, May 24–26, 2024, Oslo, Norway
high quality and possesses the necessary attributes, machine 3.3 Model 2: Prophet Model
learning models can capture patterns and relationships in Prophet offers a robust methodology for forecasting that integrates
stock prices accurately. Moreover, it facilitates more accurate seamlessly into research papers. Prophet [20], designed by Face-
predictions and insightful analysis in subsequent phases of book, fits non-linear trends with yearly, weekly, and daily sea-
research by mitigating common challenges like missing data sonality components using an additive model. In the architecture,
and outliers. holidays, trends, and seasonality are incorporated into a decom-
posable time series model. One of its key strengths is its ability
3.2 Model 1: LSTM to handle outliers and missing data effectively. The Prophet soft-
A Long Short-Term Memory (LSTM) [8] is a type of Recurrent ware contributes significantly to forecasting tasks by providing
Neural Network that is designed to address challenges in learn- accurate predictions and flexibility in handling diverse datasets.
ing long-term relationships among sequential data. In an effort to Many studies combine Prophet with other techniques for enhanced
overcome the limitations of traditional RNNs, LSTMs incorporate forecasting performance, such as Long Short-Term Memory (LSTM)
memory cells and gate mechanisms to facilitate information reten- networks [17].
tion and selective processing. LSTM networks have a number of
3.3.1 Model Architecture. [3]
key components. With its recurrent nature, the LSTM layer allows
the model to capture dependencies over extended sequences from • Additive Time Series Decomposition:
the input layer. A model’s ability to retain information over time Time series are broken down into three components using
can be attributed to the cell state, along with the forget, input, and Prophet’s additive decomposition model: trend, seasonality,
output gates. and holidays. Decomposing the data in this way helps us to
LSTM’s flexibility in handling sequential data, as well as its wide gain a deeper understanding of the fundamental patterns.
adoption in a variety of domains, including time series analysis and • Trend Modeling:
forecasting, is revealed in a comprehensive review of its formula- Overarching trends are represented by the trend compo-
tion, training, and applications. Further research explores LSTM cell nent. To accommodate diverse data patterns, Prophet uses a
architecture variations and optimizations, highlighting its impor- piecewise linear model to capture both abrupt and gradual
tance for tasks such as forecasting time series. To understand the changes in the trend.
LSTM architecture, it is necessary to recognize its evolution from • Seasonality Modeling:
basic neural network structures to the sophisticated and powerful Time series data are heavily influenced by seasonality. A
models of today. With their ability to learn intricate long-term de- Fourier series expansion is incorporated into Prophet in
pendencies, these models have found applications in diverse fields, order to identify and predict repeating patterns over time.
demonstrating their effectiveness in handling complex sequential • Holiday Effects:
data. It is common for time series data to be influenced by holidays.
The Prophet model can incorporate holiday effects, allowing
3.2.1 Model Architecture. [8] it to account for the impact of holidays on observed data.
• Parameter Tuning:
• Input Layer:
There are several parameters that must be carefully tuned
Sequential data is received by the input layer of LSTMs. In
in the Prophet model, such as changepoints, seasonalities,
addition to these features, time-dependent information is
and holidays. To enhance model performance, the method-
crucial to the analysis.
ology should detail how parameters were selected and any
• Cell State:
adjustments made.
LSTMs store and carry information across time steps using
• Forecasting Uncertainty:
a cell state. RNNs with long-term dependencies have a hard
Prophet provides a unique feature by estimating the uncer-
time retaining them.
tainty intervals around forecasts. Decision-makers need this
• Forget, Input, Output Gates:
information, and the methodology should detail how Prophet
In LSTMs, information is controlled by three gates: Forget
quantifies and presents forecast uncertainty.
Gate, Input Gate, and Output Gate. In order to improve the
model’s ability to capture relevant patterns, these gates con-
trol information that is discarded, stored, or outputted. 3.4 Model 3: Transformer Model
• Memory Cells: The Transformer model is a revolutionary architecture in deep
In LSTMs, memory cells are crucial for storing and updating learning, initially introduced in the paper "Attention Is All You
information. As a result, the model is able to remember past Need" by Vaswani et al [21]. Transformers have achieved remark-
events, making it effective for predicting time series. able achievements across diverse domains. In the context of Natural
• Hybrid Architectures: Language Processing (NLP), they have proven their abilities in lan-
Hybrid models can be created by combining LSTMs with guage translation, sentiment analysis, and text summarization. Ex-
other architectures, such as convolutional neural networks panding their applicability to Image Processing, transformers have
(CNNs). Combinations like this enhance the model’s ability been adeptly tailored for vision tasks, demonstrating success in
to understand spatial and temporal dependencies at the same image classification and object detection. Furthermore, their effec-
time. tiveness extends to Time Series Analysis, where their unique ability
43
to capture long-range dependencies renders them suitable for pre- MAPE calculates the percentage difference between pre-
dicting sequential data, demonstrated in tasks such as forecasting dicted and actual values on average [7].
stock prices or predicting weather patterns. 𝑛
1 ∑︁ 𝑦𝑖 − 𝑦ˆ𝑖
𝑀𝐴𝑃𝐸 = × 100 (5)
3.4.1 Model Architecture. [6] 𝑛 𝑖=1 𝑦𝑖
• Self-Attention Mechanism: These metrics aid in assessing the accuracy and reliability of ma-
Unlike traditional recurrent neural networks (RNNs) and chine learning models, providing valuable insights for model selec-
long short-term memory networks (LSTMs), Transformers tion and improvement.
rely on self-attention mechanisms. This enables the model to
weigh the importance of different parts of the input sequence 4 EXPERIMENTAL RESULTS AND DISCUSSION
when making predictions.
4.1 Dataset
• Parallelization:
The Transformer’s architecture allows for highly parallelized This study focuses on historical records between 2013 and 2023
computation, making it more efficient than sequential mod- and fetching the stock data from Yahoo Finance [5], which includes
els like RNNs. This results in faster training times. essential attributes such as Low, Open, Volume, High, Close, and
• Encoder-Decoder Structure: Adjusted Close prices. AAL (American Airlines Group Inc.) and
The model is composed of an encoder and a decoder. The AAME (Atlantic American -Life Insurance) stock market indices
encoder processes the input sequence, capturing contex- were selected from Yahoo Finance in order to provide insight and
tual information, while the decoder generates the output analysis. The dataset contains historical stock prices and other
sequence. important financial data that can be used to analyze the market
• Multi-Head Attention: dynamics of these companies. Market movements can be examined
The self-attention mechanism is extended with multiple in a comprehensive way, enabling the examination of historical
heads, allowing the model to focus on different aspects of patterns, correlations, and predictive indicators. Financial data from
the input sequence simultaneously. This enhances its ability Yahoo Finance can be accessed with Yfinance, a popular Python
to capture complex relationships. library. This API tool provides a comprehensive set of features for
• Positional Encoding: retrieving stock-related information, historical data, and fundamen-
Transformers do not inherently understand the order of the tal company information. The dataset is split into three subsets for
input sequence. To address this, positional encodings are training, validation, and testing. Approximately 70% of the data
added to the input embeddings, providing information about is allocated for training. 15% for validation, and another 15% for
the positions of tokens in the sequence. testing.
3.5 Model Performance Metrics 4.2 Software and Hardware Requirements

The experiments were conducted utilizing the Jupyter Notebook.
In the domain of model evaluation, several metrics offer insights
The system specifications included an Nvidia Tesla T4 GPU, 12GB
into the performance of machine learning models. Here are expla-
of CPU memory, an Intel(R) Xeon(R) CPU running at 2.30GHz,
nations and formulas for key evaluation metrics [7]:
and 12GB of RAM. Python served as the primary programming
• Mean Absolute Error (MAE): language for running the experiments, and the PyTorch library was
MAE measures the average absolute difference between pre- employed for the computational tasks.
dicted and actual values [7].
4.3 Data Preprocessing
𝑛
1 ∑︁ Different preprocessing steps are performed on a dataset. It be-
𝑀𝐴𝐸 = |𝑦𝑖 − 𝑦ˆ𝑖 | (2)
𝑛 𝑖=1 gins by counting the number of missing values in each column
of the DataFrame, where there were no null values. Feature selec-
• Mean Squared Error (MSE):
tion [11] aims to enhance model performance by focusing on the
MSE quantifies the average squared difference between pre-
most relevant and informative features while discarding potentially
dicted and actual values [7].
redundant or less significant ones. In this case, ’Low,’ ’Open,’ and
𝑛
1 ∑︁ ’High,’ are chosen to serve as input variables for training a predictive
𝑀𝑆𝐸 = (𝑦𝑖 − 𝑦ˆ𝑖 ) 2 (3) model and predicting the target variable, ’Close,’ which represents
𝑛 𝑖=1
the closing stock prices. Next, the "Date" column is converted to a
• Root Mean Squared Error (RMSE): DateTime format. Outlier analysis [16] is performed on the ’Open’
RMSE is the square root of MSE, providing a measure in the and ’Close’ columns. By determining the first quartile (Q1) and third
original unit of the target variable [7]. quartile (Q3), the Interquartile Range (IQR) is calculated. Outliers
v
t 𝑛 are defined as data points that fall outside the lower and upper
1 ∑︁ bounds of the IQR. Based on this, there were no outliers in the
𝑅𝑀𝑆𝐸 = (𝑦𝑖 − 𝑦ˆ𝑖 ) 2 (4) ’Open’ and ’Close’ columns. The data is normalized using z-score
𝑛 𝑖=1
normalization, where each feature is standardized by subtracting
• Mean Absolute Percentage Error (MAPE): its mean and dividing by its standard deviation. By normalizing, all
44
features are brought to a similar scale, preventing one feature from derived from the length of the feature_columns. A hidden layer of
dominating another. 64 neurons is incorporated into the network with a hidden_size
of 64, which allows for intricate patterns to be captured. The out-
4.4 Architectural Settings put_size is set to 1 when predicting a single time series value. In
4.4.1 LSTM:. LSTM Model inherits from the nn.Module class and hierarchical feature learning, 2 num_layers implies the use of two
the constructor method initialize the LSTM model. It takes param- layers. The addition of four num_attention_heads enhances the
eters such as input size, hidden size, and number of layers. The model’s ability to focus simultaneously on multiple input aspects.
batch_first=True argument indicates that the input data has the With a learning_rate of 0.001, the optimization step size influences
batch size as the first dimension. A fully connected (linear) layer is convergence stability. A 100-epoch setting balances learning with-
added that transforms the output of the LSTM layer to the desired out the risk of overfitting the entire dataset. Adam optimizer, which
output size. The forward pass of the model takes an input x and adjusts learning rates dynamically, is chosen for efficient training.
computes the forward pass through the LSTM layer. The hyperpa- By measuring the squared difference between predicted and actual
rameters set hidden_size_lstm = 64 and num_layers_lstm = 2 for values, the MSELoss function quantifies prediction accuracy.
the LSTM model and the Mean Squared Error (MSE) loss function
and the Adam optimizer defined to update the model parameters Table 1: The models Hyperparameters
during training, with the learning rate 0.001.
Hyperparameters Values
4.4.2 Prophet: The historical stock prices are stored with the
’Date’ column renamed to ’ds’ and the ’Close’ column to ’y’. Addi- input_size len(feature_columns)
tionally, the timezone information is removed to ensure compat- hidden_size 64
ibility with Prophet. An instance of the Prophet class is created output_size 1
with the name m. The parameter daily_seasonality=True is set to num_layers 2
include daily seasonality patterns in the model. The historical stock num_attention_heads 4
price data (hist) is used to train the Prophet model using the fit learning_rate 0.001
method. The make_future_dataframe method is employed to create num_epochs 100
a DataFrame (future) that extends beyond the historical data, pro- Optimizer Adam
jecting into the future for a specified period (365 days in this case). loss function MSELoss
The prediction method is then applied to the future data frame,
generating a forecast for the stock prices. The code includes a line
(pd.options.display.max_columns = None) that ensures all columns
4.6 Data Analysis Experimental Results
are displayed when outputting the forecast.
The full code for the models can be found on GitHub:
4.4.3 Transformer: The model starts with an embedding layer https://github.com/layali64/Transformer-Model-Yahoo-Finance
(self.embedding) that transforms input data of size input_size to a
higher-dimensional space represented by hidden_size. The core of 4.6.1 LSTM Model: The first model was trained on AAL stock
the model is formed by the Transformer layers (self.transformer), Market indices of Yahoo Finance. Figure 1 illustrates the training,
following the architecture introduced in the paper "Attention is validation, and test losses over 1000 epochs for the LSTM model.
All You Need". It consists of multiple encoder and decoder layers, The blue line represents the training loss with a value of 0.0002.
each containing multi-head self-attention mechanisms. Parameters The orange line represents the validation loss with 0.0651 and the
such as num_layers and num_attention_heads allow fine-tuning green line corresponds to the test loss with a loss of 0.1972.
of the model’s capacity to capture different levels of contextual Figure 2 depicts real and predicted values for assessing the per-
information. The final layer (self.fc_output) maps the output from formance of the LSTM model. The blue line in the plot represents
the Transformer layers back to the original output dimensional- the true or actual values of the target variable across different time
ity (output_size), facilitating the prediction of the target variable. steps. The orange dashed line illustrates the values predicted by
The input sequence x is passed through the embedding layer to the LSTM model for the same time steps. An alignment in the mid-
obtain a higher-level representation. The input sequence is per- dle of the plot between the true and predicted values indicates a
muted to match the input requirements of the Transformer. The well-performing model.
Transformer processes the sequence, considering both encoder and 4.6.2 Prophet Model: Figure 3 shows the result of the prophet
decoder aspects, with self-attention mechanisms capturing con- model. Based on the specified date range from 2013 to 2023, the
textual relationships. The output from the Transformer layers is black dots represent data points used to train the model. A blue line
permuted back to the original shape. The last layer’s output is used shows the trend predicted by the Prophet model. The stock price
for prediction through the fully connected output layer. trend is captured by considering historical patterns and seasonality.
The light blue area around the forecasted trend represents the in-
4.5 Hyperparameters tervals and range of uncertainty. Based on the model’s uncertainty,
Table 1 depicts the hyperparameters for the models. Time series it shows the upper and lower bounds for actual prices. As can be
prediction is defined by the hyperparameters provided. A measure seen from the plot, the prophet model predicted the close stock
of the dimensionality of input features is the input_size, which is price from 2023 to 2024 for one year. It is possible to make informed
45
Figure 3: Comparison between Prophet model predictions

Figure 1: Values of the loss function for LSTM model over and the ground truth over the AAL dataset.
the AAL dataset.
Figure 4: Values of the loss function for Transformer model

Figure 2: Comparison between LSTM model predictions and over the AAL dataset.
the ground truth over the AAL dataset.
blue line, the training loss is 0.0185. Validation losses are 0.0501,
decisions about future stock price movements for 2024 by compar- while test losses are 0.0118.
ing forecasted trends with actual stock prices and examining the In Figure 7, real and predicted values are shown for assessing
uncertainty range. Transformer model performance. According to the plot, the blue line
represents the true values of the target variable. The transformer
4.6.3 Transformer Model: With 100 epochs, the final model was
model predicted values for the same time using an orange dashed
trained using AAL indexes from Yahoo Finance. Losses for training,
line. The true and predicted values are aligned in the plot, which
validation, and testing of the Transformer model are shown in
indicates a well-performing model.
Figure 4. With a loss of 0.0100, the blue line represents the training
loss. A loss of 0.0103 represents the validation loss, while a loss of
0.0085 corresponds to the test loss. 4.7 Comparative Results
Transformer model performance is shown in Figure 5. Based on The evaluation metrics for the validation and test sets provide quan-
the plot, the blue line represents the actual target variable value. titative insights into the model’s performance, including Mean Ab-
Using an orange dashed line, the transformer model predicted val- solute Error (MAE), Mean Squared Error (MSE), Root Mean Squared
ues at the same time. A robust model has true and predicted values Error (RMSE), and Mean Absolute Percentage Error (MAPE). LSTM
aligned in the plot. model achieves values in MAE (0.1855), MSE (0.0651), and RMSE
In the third model, AAME indexes of Yahoo Finance were used (0.2551), indicating superior performance in accuracy and error
for training with 100 epochs. The Transformer model’s training, metrics. Table 2 shows the comparative results of the models on
validation, and test losses are shown in Figure 6. As shown by the Validation data. Prophet model has higher error metrics with MAE
46
Figure 5: Comparison between Transformer model predic- Figure 7: Comparison between Transformer model predic-
tions and the ground truth over the AAL dataset. tions and the ground truth over the AAME dataset.
Table 3 shows the comparative results of the models on test data.

In the LSTM model, MAE, MSE, and RMSE are 0.2890, 0.1972, and
0.4440, respectively.
Although the prophet model improves compared to the vali-
dation set, MAE (3.5499), MSE (16.9321), and RMSE (4.1149) still
exhibit higher error metrics.
The AAL Transformer model consistently outperforms the AAME
Transformer model.
Table 3: Performance of the six models on the Test set.
Model Stock MAE MSE RMSE MAPE

indices
LSTM AAL 0.2890 0.1972 0.4440 24.3381
LSTM AAME 0.6890 0.0095 0.0976 14.7863
Prophet AAL 3.5499 16.9321 4.1149 -
Prophet AAME 4.1472 17.8601 4.2261 -
Figure 6: Values of the loss function for Transformer over
Transformer AAL 0.0793 0.0085 0.0923 8.0455
the AAME dataset.
Transformer AAME 0.0826 0.0118 0.1085 16.8889
(6.8927), MSE (73.3724), and RMSE (8.5658), suggesting lower ac- 4.8 Discussion
curacy compared to LSTM with market indices AAL. Both Trans-
Table 4 provides a comparative analysis of various models used
former models demonstrate competitive results, with AAL outper-
for stock index prediction, with metrics such as RMSE, MAE, and
forming AAME in all metrics.
Symmetric Mean Absolute Percentage Error (SMAPE). Different
approaches and their performance on different market indices are
Table 2: Performance of the six models on the Validation set. highlighted in the results.
Majiid et al. [12] utilized ET-LSTM and ET-GRU models for BBCA
stock, achieving RMSE values of 490.3815 and 493.7659, respec-
Model Stock MAE MSE RMSE MAPE
tively. In contrast, Xiaokang Hu [9] employed Google Temporal
indices
Fusion Transformer (TFT) on Google stock, reporting a lower MAE
LSTM AAL 0.1855 0.0651 0.2551 16.9622
of 275.67 and SMAPE of 0.2642. Chaojie Wang [22] implemented
LSTM AAME 0.7320 0.0194 0.1393 5.9126
Transformer models on Hang Seng indices, demonstrating superior
Prophet AAL 6.8927 73.3724 8.5658 -
performance with MAE values of 0.0881. The proposed LSTM and
Prophet AAME 3.0534 10.7997 3.2863 -
Prophet model for AAL stock yielded an MAE of 0.2890 and 3.5499,
Transformer AAL 0.0827 0.0103 0.1016 8.2274 respectively, while Transformer AAME and AAL achieved MAE
Transformer AAME 0.1421 0.0501 0.2238 10.4670 values of 0.0826, and 0.0793, respectively.
47
Table 4: Comparison of our modeling results with those in by combining temporal sequence learning and attention mecha-
literature. nism. Furthermore, comparing emerging algorithms and models in
machine learning across different financial markets, datasets, and
Author Stock Model Metric Value timeframes will contribute to a deeper understanding of their per-
Index formance. Hopefully, these future efforts would lead to enhanced
MRN BBCA ET-LSTM RMSE 490.3815 accuracy, robustness, and interpretability of stock price predictive
Majiid 2023 ET-GRU 493.7659 model.
[12] CNN- 2224.1882
BiLSTM-AM REFERENCES
Hu. Google Temporal MAE 275.67 [1] C Anand. 2021. Comparison of stock price prediction models using pre-trained
neural networks. Journal of Ubiquitous Computing and Communication Technolo-
Xiaokang Google Fusion SMAPE 0.2642 gies (UCCT) 3, 02 (2021), 122–134.
2022 [9] S&P 500 Transformer MAE 52.77 [2] Svetlana Bryzgalova, Sven Lerner, Martin Lettau, and Markus Pelger. 2022. Miss-
S&P 500 (TFT) SMAPE 0.0655 ing financial data. Available at SSRN 4106794 (2022).
[3] Zheng Chen, Yin-Liang Zhao, Xiao-Yu Pan, Zhao-Yu Dong, Bing Gao, and Zhi-
Chaojie CSI 300 CNN MAE 0.0948 Wen Zhong. 2009. An overview of Prophet. In Algorithms and Architectures for
Wang Hang Transformer 0.0881 Parallel Processing: 9th International Conference, ICA3PP 2009, Taipei, Taiwan, June
Seng 8-11, 2009. Proceedings 9. Springer, 396–407.
[4] Wikipedia contributors. 2024. Standard Score. https://en.wikipedia.org/wiki/
2022 [22] S&P 500 RNN 0.1359 Standard_score Accessed on February 4, 2024.
The Proposed AAL LSTM MAE 0.2890 [5] Yahoo Finance. 2024. Yahoo Finance. https://finance.yahoo.com/ Accessed on
February 4, 2024.
Model AAME LSTM 0.6890 [6] Anthony Gillioz, Jacky Casas, Elena Mugellini, and Omar Abou Khaled. 2020.
AAL Prophet 3.5499 Overview of the Transformer-based Models for NLP Tasks. In 2020 15th Conference
AAME Prophet 4.1472 on Computer Science and Information Systems (FedCSIS). IEEE, 179–183.
[7] JM González-Sopeña, V Pakrashi, and B Ghosh. 2021. An overview of performance
AAL Transformer 0.0793 evaluation metrics for short-term statistical wind power forecasting. Renewable
AAME Transformer 0.0826 and Sustainable Energy Reviews 138 (2021), 110515.
[8] Alex Graves and Alex Graves. 2012. Long short-term memory. Supervised sequence
labelling with recurrent neural networks (2012), 37–45.
[9] Xiaokang Hu. 2021. Stock price prediction based on temporal fusion transformer.
Based on these results, it is important to select models that are In 2021 3rd International Conference on Machine Learning, Big Data and Business
appropriate to specific market conditions. In spite of the fact that Intelligence (MLBDBI). IEEE, 60–66.
[10] Nti Isaac Kofi, Adebayo Felix Adekoya, and Benjamin Asubam Weyori. 2020.
some models exhibit remarkable accuracy, the choice depends on a A systematic review of fundamental and technical analysis of stock market
variety of factors, including the dataset characteristics, the model ar- predictions. The Artificial Intelligence Review 53, 4 (2020), 3007–3057.
chitecture, and the targeted market index. For stock index prediction [11] Salim Lahmiri. 2016. Features selection, data mining and finacial risk classification:
a comparative study. Intelligent Systems in Accounting, Finance and Management
endeavors, researchers and practitioners can use the comparative 23, 4 (2016), 265–275.
values to determine which models are most effective. [12] Muhammad Rizki Nur Majiid, Renaldy Fredyan, and Gede Putra Kusuma. 2023.
Application of Ensemble Transformer-RNNs on Stock Price Prediction of Bank
Central Asia. International Journal of Intelligent Systems and Applications in
5 CONCLUSION AND FUTURE WORK Engineering 11, 2 (2023), 471–477.
[13] Félix Morales, Carlos Sauer, Hans Mersch, Diego Stalder, and Miguel García
The purpose of this study was to explore a comprehensive approach Torres. 2022. Hyperparameter optimization of deep learning model for short-
to predicting stock prices by using three different models: LSTM, term electricity demand forecasting. In Proceedings of the 3rd South American
Prophet, and Transformer-based models. International Conference on Industrial Engineering and Operations Management.
[14] Nguyet Nguyen and Mohammad Islam. 2021. Comparison of Financial Models
LSTMs are known to capture long-term dependencies in sequen- for Stock Price Prediction. In 2021 Joint Mathematics Meetings (JMM). AMS.
tial data. Facebook’s Prophet model demonstrated its ability to [15] SGOPAL Patro and Kishore Kumar Sahu. 2015. Normalization: A preprocessing
handle non-linear trends, seasonality, and holidays in time series stage. arXiv preprint arXiv:1503.06462 (2015).
[16] Karanjit Singh and Shuchita Upadhyaya. 2012. Outlier detection: applications
data. Improved performance was achieved with the Transformer- and techniques. International Journal of Computer Science Issues (IJCSI) 9, 1 (2012),
based model. 307.
[17] S Sivaramakrishnan, Terrance Frederick Fernandez, RG Babukarthik, and S Pre-
In terms of accuracy metrics, the Transformer model consistently malatha. 2022. Forecasting time series data using arima and facebook prophet
outperformed other two models on the AAL stock index with MAE, models. In Big data management in Sensing. River Publishers, 47–59.
MSE, RMSE and MAPE value of 0.0793, 0.0085, 0.0923, and 8.0455, [18] Priyank Sonkiya, Vikas Bajpai, and Anukriti Bansal. 2021. Stock price prediction
using BERT and GAN. arXiv preprint arXiv:2107.09055 (2021).
respectively. [19] Zihe Tang, Yanqi Cheng, Ziyao Wang, et al. 2021. Quantified Investment Strate-
A comparison with previous studies revealed competitive perfor- gies and Excess Returns: Stock Price Forecasting Based on Machine Learning.
mance of Transformer model, indicating the importance of model Academic Journal of Computing & Information Science 4, 6 (2021), 10–14.
[20] Sean J Taylor and Benjamin Letham. 2018. Forecasting at scale. The American
selection based on market conditions and dataset characteristics. Statistician 72, 1 (2018), 37–45.
The results obtained in this investigation demonstrate the feasi- [21] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,
Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all
bility of different models in predicting stock prices. It is imperative you need. Advances in neural information processing systems 30 (2017).
that future research in this domain takes into account the peculiar- [22] Chaojie Wang, Yuanyuan Chen, Shuqi Zhang, and Qiuhui Zhang. 2022. Stock
ities of financial datasets and explores innovative approaches to market index prediction using deep Transformer model. Expert Systems with
Applications 208 (2022), 118128.
improving the accuracy and reliability of stock price predictions. [23] Halbert White. 1988. Economic prediction using neural networks: The case of
In the future, along this line of research we need to develop IBM daily stock returns. In ICNN, Vol. 2. 451–458.
hybrid model architectures. Using LSTM model and Transformer [24] Yan Yu. 2022. A Study of Stock Market Predictability Based on Financial Time
Series Models. Mobile Information Systems 2022 (2022).
model in joint force might result in superior forecasting accuracy
48

Predictive Modeling of Stock Prices Using Transformer Model

Uploaded by

Copyright:

Available Formats

Predictive Modeling of Stock Prices Using Transformer Model

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Predictive Modeling of Stock Prices Using Transformer Model

Uploaded by

Copyright:

Available Formats

Predictive Modeling of Stock Prices Using Transformer Model

Leila Mozaffari Jianhua Zhang

ABSTRACT There is great significance to stock price prediction, since it can

3.5 Model Performance Metrics 4.2 Software and Hardware Requirements

Figure 3: Comparison between Prophet model predictions

Figure 4: Values of the loss function for Transformer model

Table 3 shows the comparative results of the models on test data.

Table 3: Performance of the six models on the Test set.

Model Stock MAE MSE RMSE MAPE

You might also like