College of Computing and IT, Arab Academy for Science, Technology and Maritime Transport,
Alexandria 5517220, Egypt; [email protected] (S.M.E.); [email protected] (M.W.F.)
* Correspondence: [email protected]
Abstract: Stock value prediction and trading, a captivating and complex research domain, continues
to draw heightened attention. Ensuring profitable returns in stock market investments demands
precise and timely decision-making. The evolution of technology has introduced advanced predictive
algorithms, reshaping investment strategies. Essential to this transformation is the profound reliance
on historical data analysis, driving the automation of decisions, particularly in individual stock
contexts. Recent strides in deep reinforcement learning algorithms have emerged as a focal point for
researchers, offering promising avenues in stock market predictions. In contrast to prevailing models
rooted in artificial neural network (ANN) and long short-term memory (LSTM) algorithms, this study
introduces a pioneering approach. By integrating ANN, LSTM, and natural language processing
(NLP) techniques with the deep Q network (DQN), this research crafts a novel architecture tailored
specifically for stock market prediction. At its core, this innovative framework harnesses the wealth of
historical stock data, with a keen focus on gold stocks. Augmented by sentiment analysis of news and
social media data, alongside market data from sources such as S&P, Yahoo, NASDAQ, and various
gold market-related channels, this study gains depth and comprehensiveness. The predictive prowess of the developed
model is exemplified in its ability to forecast the opening stock value for the subsequent day, a feat
validated across exhaustive datasets. Through rigorous comparative analysis against benchmark
algorithms, the research spotlights the unparalleled accuracy and efficacy of the proposed combined
algorithmic architecture. This study not only presents a compelling demonstration of predictive
analytics but also engages in critical analysis, illuminating the intricate dynamics of the stock market.
Ultimately, this research contributes valuable insights and sets new horizons in the realm of stock
market predictions.

Keywords: stock trading markets; deep reinforcement learning; DRL; neural networks; stock prediction; variational mode decomposition; BERT

Citation: Awad, A.L.; Elkaffas, S.M.; Fakhr, M.W. Stock Market Prediction Using Deep Reinforcement Learning. Appl. Syst. Innov. 2023, 6, 106. https://doi.org/10.3390/asi6060106
2. Related Work
Stock price prediction efforts have centered on supervised learning techniques, such
as neural networks, random forests, and regression methods [11]. A detailed analysis
by authors [12] underscored the dependency of supervised models on historical data,
revealing constraints that often lead to inaccurate predictions. In a separate study [13],
speech and deep learning (DL) techniques were applied to stock prediction using Google
stock datasets from NASDAQ. The research demonstrated that employing 2D principal
component analysis (PCA) with deep neural networks (DNN) outperformed the results
obtained with two-directional PCA combined with a radial basis function neural network
(RBFNN), highlighting the efficacy of specific methodologies in enhancing accuracy. An-
other comprehensive survey [14] explored various DL methods, including CNN, LSTM,
DNN, RNN, RL, and others, in conjunction with natural language processing (NLP) and
WaveNet. Utilizing datasets sourced from foreign exchange stocks in Forex markets, the
study employed metrics like mean absolute percentage error (MAPE), root mean square
error (RMSE), mean square error (MSE), and the Sharpe ratio to evaluate performance.
The findings highlighted the prominence of RL and DNN in stock prediction research,
indicating the increasing popularity of these methods in financial modeling. While this
study covered a wide array of prediction techniques, it notably emphasized the absence
of results related to combining multiple DL methods for stock prediction. In other
studies [15,16], four DL models utilizing data from NYSE and NSE markets were examined:
MLP, RNN, CNN, and LSTM. These models, when trained separately, identified trend
patterns in stock markets, providing insights into shared dynamics between the two stock
markets. Notably, the CNN-based model exhibited superior results in predicting stock
prices for specific businesses. However, this study did not explore hybrid networks, leav-
ing unexplored potential in creating combined models for stock prediction.
3. Background
This section provides essential context for understanding the research presented in
this paper.
Figure 1. The architecture of an artificial neural network.
3.2. Recurrent Neural Network
Recurrent neural networks (RNNs) excel at processing sequential data. They possess a memory feature, retaining information from previous steps in a sequence, as shown in Figure 2. RNNs incorporate inputs ("x"), outputs ("h"), and hidden neurons ("A"). A self-loop on the hidden neurons signifies input from the previous time step ("t − 1"). For instance, if the input sequence comprises six days of stock opening price data, the network unfolds into six layers, each corresponding to the opening stock price of a single day. However, a significant challenge confronting RNNs is the vanishing gradient problem, which has been effectively addressed through various techniques, including the incorporation of long short-term memory (LSTM) units into the network.
Figure 2. Unfolded recurrent neural network.
3.3. LSTM
LSTM enhances RNNs' memory, crucial for handling sequential financial data. LSTM units, integrated into RNNs, have three gates: input gate (i), forget gate (f), and output gate (o). These gates use sigmoid functions to write, delete, and read information, addressing long-term dependencies and preserving data patterns. In the LSTM architecture illustrated in Figure 3, three gates play pivotal roles:
1. Input Gate (i): This gate facilitates the addition of new information to the cell state.
2. Forget Gate (f): The forget gate selectively discards information that is no longer relevant or required by the model.
3. Output Gate (o): Responsible for choosing the information to be presented as the output.
Figure 3. LSTM architecture.
Each of these gates operates utilizing sigmoid functions, transforming values into a range from zero to one. This mechanism empowers LSTMs to adeptly write, delete, and read information from their memory, rendering them exceptionally skilled at handling long-term dependencies and preserving crucial patterns in data. Crucially, LSTMs address the challenge of the vanishing gradient, ensuring that gradient values remain steep enough during training. This characteristic significantly reduces training times and markedly enhances accuracy, establishing LSTMs as a foundational technology in the domain of sequence prediction, especially for intricate datasets prevalent in financial markets.
3.4. Reinforcement Learning
Reinforcement learning involves an agent making decisions in different scenarios. It comprises the agent, environment, actions, rewards, and observations. Reinforcement learning faces challenges such as excessive reinforcements and high computational costs, especially for complex problems. The dynamics of reinforcement learning are encapsulated in Figure 4, illustrating the interaction between the agent and its environment. Notably, states in this framework are stochastic, meaning the agent remains unaware of the subsequent state, even when repeating the same action.
Figure 4. The reinforcement learning process.
Within the realm of reinforcement learning, several crucial quantities are determined:
• Reward: A scalar value from the environment that evaluates the preceding action. Rewards can be positive or negative, contingent upon the nature of the environment and the agent's action.
• Policy: This guides the agent in deciding the subsequent action based on the current state, helping the agent navigate its actions effectively.
• Value (V): Represents the long-term return, factoring in discount rates, rather than focusing solely on short-term rewards (R).
• Action Value: Like the reward value, but incorporates additional parameters from the current action. This metric guides the agent in optimizing its actions within the given environment.
Despite the advantages of reinforcement learning over supervised learning models, it
does come with certain drawbacks. These challenges include issues related to excessive
reinforcements, which can lead to erroneous outcomes. Additionally, reinforcement learn-
ing methods are primarily employed for solving intricate problems, requiring substantial
volumes of data and significant computational resources. The maintenance costs associated
with this approach are also notably high.
This study focuses on predicting gold prices based on next-day tweets sourced from
news and media datasets. Gold prices exhibit rapid fluctuations daily, necessitating a robust
prediction strategy. To achieve accurate predictions, this research employs a comprehensive
approach integrating deep reinforcement learning (DRL), long short-term memory (LSTM),
variational mode decomposition (VMD), and natural language processing (NLP). The
prediction time spans from 2012 to 2019, utilizing tweets related to gold prices. DRL
is enhanced by incorporating sentiment analysis of media news feeds and Twitter data,
elevating prediction accuracy. The dataset used for this analysis was retrieved from the
link https://www.kaggle.com/datasets/ankurzing/sentiment-analysis-in-commodity-market-gold (accessed on 1 February 2023). This dataset, spanning from 2000 to 2021,
Figure 5. The DRL process.
Another objective of reinforcement learning is to maximize the cumulative reward instead of the immediate reward [29]. Suppose the cumulative reward is represented by G_t and the immediate reward by R_t:

E[G_t] = E[R_{t+1} + R_{t+2} + \cdots + R_T]    (1)

In Equation (1), the reward is received at a terminal state T. This implies that Equation (1) holds when the problem ends in a terminal state T, also known as an episodic task [30]. In problems involving continuous data, the terminal state is not available, i.e., T = ∞. A discount factor γ (0 ≤ γ ≤ 1) is therefore introduced in Equation (2), which represents the cumulative reward:

G_t = \gamma^0 R_{t+1} + \gamma^1 R_{t+2} + \gamma^2 R_{t+3} + \cdots + \gamma^{k-1} R_{t+k} + \cdots    (2)

G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}    (3)

To perform an action in a given state, the agent relies on value functions, which in RL methods estimate the value of actions. The agent determines the value functions based on what future actions will be taken [31]. Bellman's equations are essential in RL, as they provide the fundamental property for value functions and solve MDPs. Bellman's equations support the value function by calculating the sum of all possibilities of expected returns and weighing each return by its probability of occurrence in a policy [32].
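As a concrete illustration of Equations (2) and (3), the short Python sketch below (not part of the paper's implementation) computes a discounted cumulative reward from a list of rewards.

```python
# Illustrative sketch: discounted cumulative reward G_t from Equations (2)-(3).
def discounted_return(rewards, gamma=0.95):
    """Compute G_t = sum_k gamma^k * R_{t+k+1}, given rewards R_{t+1}..R_T."""
    g = 0.0
    for k, r in enumerate(rewards):          # rewards[k] corresponds to R_{t+k+1}
        g += (gamma ** k) * r
    return g

# Example: three daily rewards with a discount factor of 0.9.
print(discounted_return([1.0, 0.5, -0.2], gamma=0.9))  # 1.0 + 0.9*0.5 - 0.81*0.2
```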
In the actor–critic approach, the actor selects actions and the critic evaluates the chosen actions, thereby forming the policy. Hence, in this approach, the policy parameters θ of the actor are adjusted to maximize the reward predicted by the critic. Here, the value function estimate for the current state is used as a baseline to accelerate learning. The policy parameter θ of the actor is adjusted to maximize the total future reward. Policy learning is done by maximizing the value function [36].
DRL here uses an actor–critic-based value learning function that trades off current and future rewards [37]. The stock prediction problem can be formulated by describing the state space, action space, and reward function. Here, the state space is the environment designed to support single or multiple stock trading by considering the number of assets to trade in the market. The state space grows linearly with the number of assets. The state space has two components: the position state and the market signals. The position state provides the cash balance and the shares owned in each asset, and the market signals contain all necessary market features for the asset as tuples [38]. This information is provided to the agent to make predictions of market movement. Here, the information is a hypothesis, based on technical analysis, about the future behavior of the financial market given its past trends. The information is also shaped by economic and industry conditions, media, and news releases.
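To make the state description above concrete, the sketch below encodes a position state plus per-asset market signals as a simple structure; the field names and features are hypothetical and serve only to mirror the description.

```python
from dataclasses import dataclass, field

# Hypothetical state layout mirroring the description above: a position state
# (cash and shares held) plus market-signal features for each traded asset.
@dataclass
class TradingState:
    cash: float                                   # current cash balance
    shares: dict                                  # asset symbol -> shares owned
    signals: dict = field(default_factory=dict)   # asset symbol -> feature tuple

    def as_vector(self, symbols):
        """Flatten the state into a numeric vector for the agent's network."""
        vec = [self.cash]
        for s in symbols:
            vec.append(self.shares.get(s, 0.0))
            vec.extend(self.signals.get(s, ()))
        return vec

# Example: one gold asset with (open, close, RSI, sentiment) as its market signals.
state = TradingState(cash=10_000.0, shares={"GOLD": 5.0},
                     signals={"GOLD": (1850.2, 1861.7, 54.3, 0.8)})
print(state.as_vector(["GOLD"]))
```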
3.9. TFIDF
TF-IDF stands for term frequency–inverse document frequency. It is used for document search by taking a query as input and returning the relevant documents as output. It is a statistical technique used to measure the importance of a word within a document. It calculates the frequency of a word inside a document and compares it with the frequency of the word across all documents. The assumption is that if a word is repeated many times in a document and rarely appears in other documents, the word is vital for that document.
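As a minimal illustration of TF-IDF weighting (not the paper's code), scikit-learn's TfidfVectorizer can be used as follows.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus: TF-IDF gives high weight to words frequent in one document
# but rare across the rest of the corpus.
docs = [
    "gold price rises as demand for gold increases",
    "stock market closes lower on weak earnings",
    "central bank keeps interest rates unchanged",
]
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(docs)          # shape: (3 documents, vocabulary size)

# Show the highest-weighted term in the first document ("gold" dominates).
terms = vectorizer.get_feature_names_out()
row = matrix[0].toarray().ravel()
print(terms[row.argmax()], round(row.max(), 3))
```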
3.10. BERT
Bidirectional encoder representations from transformers (BERT) is based on deep learning transformers for natural language processing. BERT is trained bidirectionally, which means it analyzes a word together with the surrounding words in both directions. Reading in both directions allows the model to understand context deeply. BERT models are pretrained, so they already encode word representations and the relationships between them. BERT is a generic model that can be fine-tuned for specific tasks such as sentiment analysis. BERT consists of a stack of transformer encoder layers. It has two versions, the base version and the large version, with the large version generally giving the better results.
4. Problem Statement
In the complex landscape of stock markets, the central objective of trading resides in
the precise forecasting of stock prices. This accuracy is paramount, as it directly influences
investors’ confidence, shaping their decisions on whether to buy, hold, or sell stocks amid
the inherent risks of the market. Extensive scholarly research emphasizes the critical
necessity for efficiency in addressing the challenges associated with stock price prediction.
Efficient predictions are not just advantageous but pivotal, empowering investors with the
knowledge needed for astute decision-making. Market efficiency, a foundational concept
in this domain, refers to the phenomenon where stock prices authentically mirror the
information available in the current trading markets. It is essential to recognize that
these price adjustments might not solely stem from new information; rather, they can be
influenced by existing data, leading to outcomes that are inherently unpredictable. In
this context, our research endeavors to enhance the precision of stock price predictions,
addressing the need for informed and confident decision-making among investors.
Figure 6. The architecture with components of the proposed stock prediction model.
In order to facilitate the implementation of the proposed framework, the code is divided into three major modules that can be summarized in Algorithm 1.
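As a rough sketch of how the three modules could be orchestrated (the helper functions below are simplified stand-ins, not the actual Algorithm 1):

```python
# Hypothetical skeleton of the three modules described above; the helpers are
# simplified stand-ins, not the paper's actual implementation (Algorithm 1).

def classify_sentiment(text: str) -> float:
    """Module 1 stub: return +1 for 'positive-looking' news, -1 otherwise."""
    return 1.0 if "rise" in text.lower() or "gain" in text.lower() else -1.0

def predict_prices(history: list[float], horizon: int = 5) -> list[float]:
    """Module 2 stub: naive forecast repeating the last observed price."""
    return [history[-1]] * horizon

def decide(state: list[float]) -> str:
    """Module 3 stub: threshold rule standing in for the trained DQN policy."""
    return "buy" if sum(state) > 0 else "sell"

def run_framework(news_texts: list[str], price_history: list[float]) -> str:
    sentiment = sum(classify_sentiment(t) for t in news_texts) / len(news_texts)
    predicted = predict_prices(price_history)
    # State: sentiment score plus expected price change over the horizon.
    state = [sentiment, predicted[-1] - price_history[-1]]
    return decide(state)

print(run_framework(["Gold prices rise on safe-haven demand"], [1850.0, 1856.5, 1861.2]))
```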
BERT is used in NLP tasks such as predicting the next sentence [41]. In NLP, mixed models that combine BERT with other techniques tend to provide the best results. For instance, TFIDF, SVM, and BERT together provide better sentiment output from the dataset. The sentiments are further classified into four categories: extremely positive, positive, negative, and extremely negative. NLP supports investors in classifying whether the news is positive or negative to decide whether to sell, buy, or hold stock.
In this phase, news data are fed to the natural language processing module to decide
whether the news is positive or negative. The BERT model is used along with TFIDF in
this task to achieve the most accurate results. Fine-tuning BERT is achieved by applying
a binary classifier on top of BERT. This NLP phase involves the stages of preprocessing,
modeling, and prediction.
• Preprocessing: In this phase, the news dataset obtained from media or tweets is pre-
processed. The preprocessing involves reading the dataset, tokenizing the sentences,
converting words to lowercase, removing stop words, stemming the sentences, and finally
grouping (lemmatizing) words with the same meaning.
• Modeling: This step involves feature extraction for the model and sentiment analysis.
Sentiment analysis will first convert the tokens to the dictionary, and the dataset will
be split for training and testing the model. The model is built using an artificial neural
network classifier.
• Prediction: This step will receive the testing news data and predict if the sentiment is
positive or negative. This result is concatenated with the historical dataset.
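A compressed, illustrative sketch of such a preprocessing-plus-classification pipeline is shown below using scikit-learn; it substitutes a TF-IDF front end and a small neural network classifier for brevity and omits the BERT fine-tuning step.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Tiny labeled toy corpus (1 = positive news, 0 = negative news).
texts = [
    "gold prices surge on strong demand",
    "gold rallies as investors seek safety",
    "gold falls sharply after rate hike",
    "weak outlook drags gold prices lower",
]
labels = [1, 1, 0, 0]

# TF-IDF features (lowercasing and stop-word removal handled by the vectorizer)
# feeding a small artificial-neural-network classifier.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
model.fit(texts, labels)

# Predict the sentiment of an unseen headline.
print(model.predict(["gold prices surge as demand strengthens"]))
```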
Figure 7. The architecture of VMD plus LSTM.
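A highly simplified sketch of the VMD-plus-LSTM idea follows; the decomposition step here is a stand-in (a moving-average split into trend and residual) rather than a true VMD implementation, and the network sizes and training settings are illustrative only.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Stand-in "decomposition": split the price signal into a smooth trend and a
# residual component (a true VMD would produce several band-limited modes).
def decompose(prices, window=5):
    trend = np.convolve(prices, np.ones(window) / window, mode="same")
    return trend, prices - trend

def make_windows(signal, lookback=6):
    X, y = [], []
    for i in range(len(signal) - lookback):
        X.append(signal[i:i + lookback])
        y.append(signal[i + lookback])
    return np.array(X)[..., None], np.array(y)   # shape: (samples, lookback, 1)

prices = np.cumsum(np.random.randn(200)) + 1800.0   # synthetic gold-like series
forecast = 0.0
for component in decompose(prices):                  # one LSTM per sub-signal
    X, y = make_windows(component)
    model = Sequential([LSTM(32, input_shape=(X.shape[1], 1)), Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=5, verbose=0)
    forecast += float(model.predict(X[-1:], verbose=0)[0, 0])

print("next-day price estimate:", forecast)         # sum of component forecasts
```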
5.3. The Deep Reinforcement Learning Phase
The last phase is the DRL model, from which the final decision is generated. The input to this phase is the output from the sentiment analysis module, the predicted prices from the LSTM, and some technical indicators. The DRL used in this phase is deep Q learning with a replay buffer. The neural network is trained to generate the Q values for all the possible actions based on the current environment state, which is fed to the neural network as input.
Therefore, the proposed architecture and algorithm depend on historical and media or news datasets. The architecture consists of three phases: NLP, prediction, and DRL. The combined algorithm of sentiment analysis and DRL is used to obtain predictions for stock.
6. Implementation and Discussion of Results
The implementation of our framework is carried out utilizing cloud GPUs, leveraging the advantages of cloud computing for enhanced processing capabilities. Rigorous evaluation and fine-tuning of each code module are conducted to ensure optimal accuracy at every phase. The efficiency of the proposed framework is comprehensively evaluated and compared with benchmark trading strategies to validate its effectiveness.

6.1. Sentiment Analysis Phase
In the sentiment analysis phase, various classification algorithms coupled with different preprocessing models are tested to determine the most accurate algorithm. The results, as shown in Table 1, underscore the superiority of the combination of TFIDF and BERT, which yielded a remarkable accuracy of 96.8%. Extensive analytics, including classification techniques and model overfitting identification, were performed. Visualization, especially using artificial neural networks (ANN) with BERT and TFIDF, played a crucial role in comprehending the training-prediction dynamics. The ANN model exhibited exceptional performance, boasting an accuracy rate of 97%, as depicted in Figure 8.

Table 1. Findings on gold data using sentiment analysis.
It is important to note that the accurate prediction from this phase leads to accurate decisions from the DRL phase. The efficiency of this prediction is evaluated, and the results are shown in Figure 9, comparing the actual and predicted prices. The figure shows that our prediction module works very well, as there is a significant correlation between the actual and the predicted prices.
Figure 9. Actual prices vs. predicted prices.
6.3. Final Decision Phase
The next phase is the deep reinforcement learning phase, which will make the final decision. The implementation relies on the famous architecture of deep Q learning, which belongs to the value-based category of DRL algorithms. Table 4 shows the configuration for the implemented network. The DQN relies on a replay buffer with two deep neural networks: one is the main network, and the other is the target network. Both networks have the same architecture with three layers.

Table 4. Hyper-parameters adopted in the implemented DRL algorithm.
The final decision phase employs deep reinforcement learning (DRL), specifically the deep Q learning architecture, a value-based DRL algorithm. The implementation details are provided in Table 4. The state representation includes factors like historical and predicted prices, sentiment analysis outputs, and technical indicators like the relative strength index (RSI) and momentum (MOM). The action space consists of four actions: buy, buy more, sell, and sell more.
The efficiency of the entire framework is deeply rooted in the accurate predictions from the stock price prediction phase. The DRL model's capability to make informed decisions based on these predictions is crucial for successful trading strategies.
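For illustration, a generic DQN with a replay buffer and a target network over the four trading actions might look like the sketch below; the state dimension, layer sizes, and hyper-parameters are placeholders and do not reproduce the configuration in Table 4.

```python
import random
from collections import deque

import numpy as np
from tensorflow.keras.models import Sequential, clone_model
from tensorflow.keras.layers import Dense

ACTIONS = ["buy", "buy more", "sell", "sell more"]  # action space described above
STATE_DIM = 6  # illustrative: e.g. price, predicted price, sentiment, RSI, MOM, cash

def build_q_network():
    """Three-layer network mapping a state to one Q value per action."""
    net = Sequential([
        Dense(64, activation="relu", input_shape=(STATE_DIM,)),
        Dense(64, activation="relu"),
        Dense(len(ACTIONS), activation="linear"),
    ])
    net.compile(optimizer="adam", loss="mse")
    return net

main_net = build_q_network()                 # updated every training step
target_net = clone_model(main_net)           # frozen copy used for stable targets
target_net.set_weights(main_net.get_weights())
replay = deque(maxlen=10_000)                # replay buffer of stored transitions
gamma, epsilon = 0.95, 0.1

def act(state):
    """Epsilon-greedy selection over the four trading actions."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    return int(np.argmax(main_net.predict(state[None, :], verbose=0)[0]))

def train_step(batch_size=32):
    """One DQN update from a sampled mini-batch of stored transitions."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    states = np.array([t[0] for t in batch])
    actions = np.array([t[1] for t in batch])
    rewards = np.array([t[2] for t in batch])
    next_states = np.array([t[3] for t in batch])
    targets = main_net.predict(states, verbose=0)
    next_q = target_net.predict(next_states, verbose=0).max(axis=1)
    targets[np.arange(batch_size), actions] = rewards + gamma * next_q
    main_net.fit(states, targets, verbose=0)

# Example transition: (state, action index, reward, next state).
s, s_next = np.random.rand(STATE_DIM), np.random.rand(STATE_DIM)
replay.append((s, act(s), 1.0, s_next))
train_step()
```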
6.4. Algorithms in Comparison
The gold dataset was processed using the algorithms, namely, best stock benchmark,
buy-and-hold benchmark, and “constantly rebalanced portfolios” (CRPs). The algorithms
provided results that were compared with the proposed architecture. The metrics and
values determined using these algorithms are provided in Table 5. The values obtained
are rounded to the nearest whole number. The classical buy-and-hold benchmark is quite
simple, where the user buys gold with all his money at the beginning of the period and
waits till the end of the period, then sells all his gold, and the total profit is the difference
between his wealth at the start and end of the period.
AWR_T = \frac{\sum_{t=1}^{T} (P_t \text{ or } L_t)}{\text{Cash}_{t=0}}    (5)

AMDD_T = \frac{\sum_{t=1}^{T} \text{MDD}_t}{T}    (7)

• Calmar ratio
This calculates the mean value of the accumulated wealth rate with respect to the maximum of the maximum drawdown values. It can be calculated with the following equation:

\text{Calmar}_T = \frac{\text{mean}(AWR_T)}{\max_t(\text{MDD}_t)}    (8)

SR_T = \frac{\text{mean}(AWR_T)}{\text{Std}(AWR_T)}    (11)

ASR_T = \frac{\sum_{t=1}^{T} SR_t}{T}    (12)

• Annualized Return Rate and Annualized Sharpe Ratio
The annualized terms mean calculating the values with respect to a full year. They are calculated with the same equations, but with a trading period of 365 days.

RSI = 100 - \frac{100}{1 + RS}    (13)
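The benchmark metrics above can be computed from a price or wealth series roughly as follows; this is an illustrative sketch following Equations (5)–(13), not the evaluation code used in the study.

```python
import numpy as np

def accumulated_wealth_rate(profits, initial_cash):
    """Equation (5): cumulative profit/loss relative to the starting cash."""
    return np.cumsum(profits) / initial_cash

def max_drawdown(wealth):
    """Largest relative drop from a running peak of the wealth curve."""
    peaks = np.maximum.accumulate(wealth)
    return np.max((peaks - wealth) / peaks)

def sharpe_ratio(returns):
    """Equation (11): mean return over its standard deviation."""
    return np.mean(returns) / np.std(returns)

def rsi(prices, period=14):
    """Equation (13): RSI = 100 - 100 / (1 + RS), with RS = avg gain / avg loss."""
    deltas = np.diff(prices[-(period + 1):])
    gains, losses = deltas[deltas > 0].sum(), -deltas[deltas < 0].sum()
    rs = gains / losses if losses > 0 else np.inf
    return 100.0 - 100.0 / (1.0 + rs)

# Buy-and-hold benchmark: buy at the start, sell at the end, profit = difference.
prices = 1800 + np.cumsum(np.random.randn(252))
buy_and_hold_profit = prices[-1] - prices[0]

daily_profit = np.diff(prices)
awr = accumulated_wealth_rate(daily_profit, initial_cash=prices[0])
print(buy_and_hold_profit, awr[-1], max_drawdown(prices),
      sharpe_ratio(daily_profit), rsi(prices))
```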
fifth, the prices for the upcoming five days are predicted by the VMD plus LSTM; and sixth, the sentiment analysis module generates the sentiment for the current day.
6.9. Proposed Framework Results Comparison
The results from the proposed framework are compared with the benchmark trading strategies mentioned above. The results showed that the proposed framework outperformed the other algorithms in different evaluation criteria, as shown in Table 5. The values for the performance metrics are obtained from the same gold dataset as earlier. The DQN results were compared with the other algorithms. Graphs were obtained to show the performance of the algorithms. The annualized wealth rate metrics are shown in Figure 10.
Figure 10. Graph showing metrics for annualized wealth rate.
The graph for the average maximum drawdown algorithm for the performance metrics is provided in Figure 11.
Figure 11. Graph showing the comparison of metrics for the average maximum drawdown algorithm.
In Figure 10, the peaks indicate the amount of profit possible at a certain point in time. The graphs show that regarding the annualized wealth rate, the proposed algorithm outperforms the other algorithms, and hence is effective in predicting stock value. Likewise, in Figure 11, the peaks of the proposed algorithm indicate that its results outperform the other baseline algorithms. In addition, the NLP processing and the combined RNN, DQN, and VMD architecture provide better prediction results.
6.10.2. Effect of Using Sentiment Analysis Module on the Framework Performance
In the same context, other experiments are conducted to emphasize the efficiency of using sentiment analysis in our proposed algorithm. Figure 12 shows the performance improvement achieved by adding the sentiment analysis module to our algorithm. The experiments are done for different numbers of episodes. Each number of episodes is done at least 10 times, and the average is taken. In these experiments, the performance is measured as follows. The current day's closing price is compared with the previous day's closing price. If there is a price increase and the algorithm decides to sell, this is considered the correct action. On the other hand, if the algorithm decides to buy, this is considered a wrong action. The performance here is calculated as the percentage of the correct actions relative to the algorithm's total number of actions.
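The evaluation rule described above can be restated compactly as follows (an illustrative sketch, not the authors' code):

```python
def decision_accuracy(closing_prices, decisions):
    """Share of decisions counted as correct under the rule described above:
    after a day-over-day price increase, 'sell' is correct and 'buy' is wrong."""
    correct = 0
    for prev, curr, action in zip(closing_prices, closing_prices[1:], decisions):
        if curr > prev and action == "sell":
            correct += 1
    return 100.0 * correct / len(decisions)

# Example: four closing prices and the decision taken on each of the three
# following days; two of the three decisions count as correct here.
print(decision_accuracy([1850.0, 1856.0, 1852.0, 1860.0], ["sell", "buy", "sell"]))
```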
Figure 12. Effect of using the sentiment analysis module.
7. Conclusions
This research introduces a novel architecture that combines various prediction al-
gorithms to tackle the challenges of stock value prediction with exceptional accuracy.
Specifically focusing on gold datasets, the study aimed to forecast gold prices for investors.
The input data encompassed gold datasets from reputable sources such as S&P, Yahoo, and
NASDAQ, representing standard stock market data. The predictive framework employed
natural language processing (NLP) to process sentiments extracted from social media
feeds, long short-term memory (LSTM) networks to analyze historical data, variational mode
decomposition (VMD) for feature selection, and artificial neural networks (ANNs) to make
predictions. Additionally, the research integrated deep reinforcement learning (DRL) algo-
rithms and deep Q networks (DQNs) to blend sentiments with other algorithms, enabling
the prediction of the opening stock value for the next day based on the previous day’s
data. The processes developed for training and testing data were meticulously presented,
forming the foundation of the prediction model. Comparative analysis was conducted
with benchmark performance metrics, including the best stock benchmark, buy-and-hold
benchmark, constant rebalanced portfolios, and DQN. Through rigorous evaluation, the
proposed architecture demonstrated superior accuracy in performance metrics. Graphi-
cal representations were employed to showcase peaks indicating high values at specific
times or on specific days, aligning with benchmark standards. The comparison clearly
highlighted that the DQN outperformed existing algorithms, underscoring the potential of
the proposed architecture to predict stocks with unparalleled precision.
Future research, which could extend this work into real-time applications within dy-
namic environments, such as live stock markets, holds immense promise. Such applications
could provide invaluable insights into the model’s effectiveness and adaptability across
different market scenarios. Moreover, the framework’s generic nature, as demonstrated in
this study, suggests its versatility for application across diverse products beyond gold. This
versatility transforms the model into a powerful tool for traders and investors in various
sectors. Subsequent studies focusing on real-time stock market data not only stand
to validate the framework’s effectiveness but also pave the way for tailored adaptations
customized to specific industries and the unique intricacies of each market.
The proposed framework contains three main modules. Each module can be enhanced
with different techniques. In the sentiment analysis module, the proposed framework used
classification techniques to judge whether the sentence is positive or negative. However,
another primary technique that can be used is the lexicon-based technique in which the
language dictionary is used to make the sentiment analysis.
In the price prediction module, the proposed framework considered the stock historical
prices as a signal and used VMD as a signal-processing technique to decompose the signal
into sub-signals and remove the signal noise. Several other signal-processing techniques
can be used for noise removal. This area is open to research, and other signal-processing
techniques may easily enhance this module if they exist.
Finally, the decision-making is undertaken by the deep reinforcement network. Several
DRL techniques can be utilized in this module, giving better or worse results than the
implemented one.
Author Contributions: Methodology, A.L.A., S.M.E. and M.W.F.; Software, A.L.A.; Supervision,
S.M.E. and M.W.F. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: Publicly available datasets were analyzed in this study. This data can
be found here: https://www.kaggle.com/datasets/ankurzing/sentiment-analysis-in-commodity-
market-gold (accessed on 1 February 2023).
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Idrees, S.M.; Alam, M.A.; Agarwal, P. A Prediction Approach for Stock Market Volatility Based on Time Series Data. IEEE Access
2019, 7, 17287–17298. [CrossRef]
2. Bouteska, A.; Regaieg, B. Loss aversion, the overconfidence of investors and their impact on market performance evidence from
the US stock markets. J. Econ. Financ. Adm. Sci. 2020, 25, 451–478. [CrossRef]
3. Feng, F.; He, X.; Wang, X.; Luo, C.; Liu, Y.; Chua, T.S. Temporal Relational Ranking for Stock Prediction. ACM Trans. Inf. Syst. (TOIS) 2019, 37, 1–30. [CrossRef]
4. Dirman, A. Financial distress: The impacts of profitability, liquidity, leverage, firm size, and free cash flow. Int. J. Bus. Econ. Law
2020, 22, 17–25.
5. Ghimire, A.; Thapa, S.; Jha, A.K.; Adhikari, S.; Kumar, A. Accelerating Business Growth with Big Data and Artificial Intelligence.
In Proceedings of the 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC),
Palladam, India, 7–9 October 2020. [CrossRef]
6. Kurani, A.; Doshi, P.; Vakharia, A.; Shah, M. A Comprehensive Comparative Study of Artificial Neural Networks (ANN) and
Support Vector Machines (SVM) on Stock Forecasting. Ann. Data Sci. 2021, 10, 183–208. [CrossRef]
7. Beg, M.O.; Awan, M.N.; Ali, S.S. Algorithmic Machine Learning for Prediction of Stock Prices. In FinTech as a Disruptive Technology
for Financial Institutions; IGI Global: Hershey, PA, USA, 2019; pp. 142–169. [CrossRef]
8. Shah, D.; Isah, H.; Zulkernine, F. Stock Market Analysis: A Review and Taxonomy of Prediction Techniques. Int. J. Financ. Stud.
2019, 7, 26. [CrossRef]
9. Yadav, A.; Chakraborty, A. Investor Sentiment and Stock Market Returns Evidence from the Indian Market. Purushartha-J. Manag.
Ethics Spiritual. 2022, 15, 79–93. [CrossRef]
10. Chauhan, L.; Alberg, J.; Lipton, Z. Uncertainty-Aware Lookahead Factor Models for Quantitative Investing. In Proceedings of the
37th International Conference on Machine Learning (PMLR), Virtual, 13–18 July 2020; Volume 119, pp. 1489–1499.
11. Nti, I.K.; Adekoya, A.F.; Weyori, B.A. A novel multi-source information-fusion predictive framework based on deep neural
networks for accuracy enhancement in stock market prediction. J. Big Data 2021, 8, 17. [CrossRef]
12. Sakhare, N.N.; Imambi, S.S. Performance analysis of regression-based machine learning techniques for prediction of stock market
movement. Int. J. Recent Technol. Eng. 2019, 7, 655–662.
13. Singh, R.; Srivastava, S. Stock prediction using deep learning. Multimed. Tools Appl. 2016, 76, 18569–18584. [CrossRef]
14. Hu, Z.; Zhao, Y.; Khushi, M. A Survey of Forex and Stock Price Prediction Using Deep Learning. Appl. Syst. Innov. 2021, 4, 9.
[CrossRef]
15. Hiransha, M.; Gopalakrishnan, E.A.; Menon, V.K.; Soman, K.P. NSE Stock Market Prediction Using Deep-Learning Models.
Procedia Comput. Sci. 2018, 132, 1351–1362. [CrossRef]
16. Patel, R.; Choudhary, V.; Saxena, D.; Singh, A.K. Review of Stock Prediction using machine learning techniques. In Proceedings of
the 5th International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 3–5 June 2021; pp. 840–847.
17. Kamath, U.; Liu, J.; Whitaker, J. Deep Learning for NLP and Speech Recognition; Springer: Cham, Switzerland, 2019; pp. 575–613.
18. Manolakis, D.; Bosowski, N.; Ingle, V.K. Count Time-Series Analysis: A Signal Processing Perspective. IEEE Signal Process. Mag.
2019, 36, 64–81. [CrossRef]
19. Kabbani, T.; Duman, E. Deep Reinforcement Learning Approach for Trading Automation in the Stock Market. IEEE Access 2022,
10, 93564–93574. [CrossRef]
20. Moghar, A.; Hamiche, M. Stock Market Prediction Using LSTM Recurrent Neural Network. Procedia Comput. Sci. 2020, 170,
1168–1173. [CrossRef]
21. Ren, Y.; Liao, F.; Gong, Y. Impact of News on the Trend of Stock Price Change: An Analysis based on the Deep Bidirectional LSTM
Model. Procedia Comput. Sci. 2020, 174, 128–140. [CrossRef]
22. Jin, Z.; Yang, Y.; Liu, Y. Stock closing price prediction based on sentiment analysis and LSTM. Neural Comput. Appl. 2019, 32,
9713–9729. [CrossRef]
23. Parray, I.R.; Khurana, S.S.; Kumar, M.; Altalbe, A.A. Time series data analysis of stock price movement using machine learning
techniques. Soft Comput. 2020, 24, 16509–16517. [CrossRef]
24. Duan, G.; Lin, M.; Wang, H.; Xu, Z. Deep Neural Networks for Stock Price Prediction. In Proceedings of the 14th International
Conference on Computer Research and Development (ICCRD), Shenzhen, China, 7–9 January 2022. [CrossRef]
25. Huang, J.; Liu, J. Using social media mining technology to improve stock price forecast accuracy. J. Forecast. 2019, 39, 104–116.
[CrossRef]
26. Iqbal, S.; Sha, F. Actor-Attention-Critic for Multi-Agent Reinforcement Learning. In Proceedings of the 36th International
Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; pp. 2961–2970.
27. Singh, V.; Chen, S.-S.; Singhania, M.; Nanavati, B.; Kar, A.K.; Gupta, A. How are reinforcement learning and deep learning
algorithms used for big data-based decision making in financial industries—A review and research agenda. Int. J. Inf. Manag.
Data Insights 2022, 2, 100094. [CrossRef]
28. Padakandla, S. A survey of reinforcement learning algorithms for dynamically varying environments. ACM Comput. Surv.
(CSUR) 2021, 54, 1–25. [CrossRef]
29. Silver, D.; Singh, S.; Precup, D.; Sutton, R.S. A reward is enough. Artif. Intell. 2021, 299, 103535. [CrossRef]
30. Kartal, B.; Hernandez-Leal, P.; Taylor, M.E. Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning. Proc.
AAAI Conf. Artif. Intell. Interact. Digit. Entertain. 2019, 15, 38–44. [CrossRef]
31. Zhang, Z.; Zohren, S.; Roberts, S. Deep Reinforcement Learning for Trading. J. Financ. Data Sci. 2020, 2, 25–40. [CrossRef]
32. Sewak, M. Mathematical and Algorithmic Understanding of Reinforcement Learning. In Deep Reinforcement Learning; Springer:
Cham, Switzerland, 2019; pp. 19–27.
33. Xiao, Y.; Lyu, X.; Amato, C. Local Advantage Actor-Critic for Robust Multi-Agent Deep Reinforcement Learning. In Proceedings
of the International Symposium on Multi-Robot and Multi-Agent Systems (MRS), Cambridge, UK, 4–5 November 2021. [CrossRef]
34. Ren, Y.; Duan, J.; Li, S.E.; Guan, Y.; Sun, Q. Improving Generalization of Reinforcement Learning with Minimax Distributional
Soft Actor-Critic. In Proceedings of the IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes,
Greece, 20–23 September 2020. [CrossRef]
35. Yang, H.; Liu, X.Y.; Zhong, S.; Walid, A. Deep reinforcement learning for automated stock trading: An ensemble strategy. In
Proceedings of the First ACM International Conference on AI in Finance (ICAIF), New York, NY, USA, 6 October 2020; pp. 1–8.
36. Zanette, A.; Wainwright, M.J.; Brunskill, E. Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning. Adv.
Neural Inf. Process. Syst. 2021, 34, 13626–13640.
37. Nguyen, N.D.; Nguyen, T.T.; Vamplew, P.; Dazeley, R.; Nahavandi, S. A Prioritized objective actor-critic method for deep
reinforcement learning. Neural Comput. Appl. 2021, 33, 10335–10349. [CrossRef]
38. Wang, C.; Sandas, P.; Beling, P. Improving Pairs Trading Strategies via Reinforcement Learning. In Proceedings of the 2021
International Conference on Applied Artificial Intelligence (ICAPAI), Halden, Norway, 19–21 May 2021. [CrossRef]
39. Huang, H.; Zhao, T. Stock Market Prediction by Daily News via Natural Language Processing and Machine Learning. In
Proceedings of the 2021 International Conference on Computer, Blockchain and Financial Development (CBFD), Nanjing, China,
23–25 April 2021. [CrossRef]
40. Gupta, R.; Chen, M. Sentiment Analysis for Stock Price Prediction. In Proceedings of the 2020 IEEE Conference on Multimedia
Information Processing and Retrieval (MIPR), Shenzhen, China, 6–8 August 2020. [CrossRef]
41. Huo, H.; Iwaihara, M. Utilizing BERT Pretrained Models with Various Fine-Tune Methods for Subjectivity Detection. Web Big
Data 2020, 4, 270–284. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.