Research Paper SMP

This document proposes using machine learning algorithms to predict stock market prices and determine whether to buy or sell stocks on a given day. Specifically, it uses an ensemble model combining Random Forest, K-Nearest Neighbors, and Gradient Boosting classifiers to make buy/sell predictions and a separate Long Short-Term Memory model to predict closing stock prices into the future. The models analyze historical stock data and various technical indicators to identify patterns and trends that can help investors make more informed investment decisions.

Uploaded by

DevashishGupta
Copyright
© © All Rights Reserved

Stock Market Prediction Using ML Algorithms

Shivam Singhal
SRM Institute of Science and Technology

Manthan Solanki
SRM Institute of Science and Technology

S. Sharanya
SRM Institute of Science and Technology

Abstract - The stock market has been a topic of great deliberation due to its diverse and convoluted nature. Today’s financial investors are plagued by sudden and notable fluctuations in the market. They cannot easily determine which stocks they should buy or sell in order to get profitable outcomes. However, with rapid advancements in machine learning, stock market prediction has become plausible. This paper proposes a stock price prediction system that utilizes an ensemble model coupled with a separate LSTM model to make predictions. The ensemble model makes use of Random Forest (RF), K-Nearest Neighbors (KNN), and Gradient Boosting (GB) classifiers to determine whether an investor should buy or sell stocks on a particular day. A separate LSTM model analyzes the historical stock data to predict the closing stock prices in the future. The combined model assists investors in making the buy/sell call on a particular day with an approximation of the closing prices for better and safer investments.

Keywords: stock market prediction, Random Forest, K-Nearest Neighbors, Gradient Boosting, ensemble model, Long Short Term Memory.

I. INTRODUCTION

1.1. Background

Investing in the stock market has been a lucrative temptation for both novice and expert investors alike for the past few decades. However, its dynamic and complex nature makes it difficult for investors to make the right choice for remunerative trading. Such a predicament divides the market experts on the possibility of making calculated predictions for the right investments at the right times. Some believe that, as per the efficient-market hypothesis, the stock market reacts by assimilating newly available information. Therefore, it is not possible to make accurate predictions without possessing prior information about the future of the stocks.

However, other analysts argue that even though movements might seem random, they are actually correlated, and several statistical indicators can help establish a pattern. Based on historical stock market data, some trends can be discerned about the behaviour of stocks. This can be used to make close to precise predictions. Accurate predictions using technology provide investors with an opportunity to make steady financial gains. It also assists researchers in determining how different statistical indicators together can be used to improve accuracy.

Machine Learning algorithms find application in almost all fields, from failure prediction in machines to forecasting economic growth [8]. This work deploys a hybrid model that integrates a Long Short Term Memory (LSTM) model and an ensemble model to forecast trends in the stock market.

1.2 Definitions

Relative Strength Index (RSI): It is a momentum oscillator that measures the magnitude of recent price movements. It oscillates between 0 and 100 and examines the overvalued (above 70) or undervalued (below 30) conditions of the stock prices.

Moving Average Convergence Divergence (MACD): MACD [5] is a momentum oscillator that evaluates a trend in stock prices. It determines the relation between two trend-following indicators, the moving averages (MA), by subtracting the longer-period MA from the shorter-period MA.

Stochastic Oscillator (STOCH): It [9] is a momentum oscillator that depicts the relative location of the closing price of a stock to its range of prices over a specified period. It is used to identify overbought and oversold trading signals.

Accumulation/Distribution Line (ADL): It is a cumulative volume-based indicator that assesses the money flow into and out of a stock. It determines whether the market trend is inclined towards accumulation or distribution and measures the strength of a trend.

Average True Range (ATR): It is an indicator used for measuring the price volatility of commodities. It also accounts for any gaps in the price movement.

Market Momentum (MOM): It is a market indicator that reflects the comparison between the current market price and the price ‘n’ periods ago.

Money Flow Index (MFI): It is a market indicator equivalent to a volume-weighted RSI. It examines overbought and oversold trade signals on the basis of both magnitude and volume of prices.

Rate of Change (ROC): It [5] is a simple momentum oscillator that computes the percentage change in price from
the current price to a price ‘n’ periods ago. The oscillator forms a graph that oscillates above (positive change) and below (negative change) the zero-line. Overbought and oversold zones can be adjusted as per the market conditions.

On-Balance Volume (OBV): It is a momentum indicator that predicts stock price changes based on the flow of volume. It measures the buying and selling pressure by summing volume on up days and subtracting it on down days.

Commodity Channel Index (CCI): It is a momentum oscillator that evaluates price trends and overvalued/undervalued conditions. It computes the current price level relative to the historical average price level.

Ease of Movement (EMV): It [10] is an indicator that quantifies the price-volume relationship to determine the ease with which prices move upwards or downwards.

Vortex Indicator (VI): It is an indicator comprising two oscillator lines: an uptrend (VI+) line to capture positive trends and a downtrend (VI-) line to capture negative trends. It is used to examine continuations and changes in trends.

Random Forest (RF): It [6] is an ensemble machine learning technique based on the bagging method. RF combines multiple decision trees to provide the final output. The aggregated result of multiple uncorrelated decision trees delivers more accurate results than the individual constituent trees.

K-Nearest Neighbours (KNN): It is a supervised ML algorithm that classifies new data into different categories based on its similarity to the available data. Thus, whenever new data arrives, it is assigned to the category of the existing data it most closely resembles.

Gradient Boosting (GBM): It is an ML boosting algorithm [7] that derives the result by ensembling multiple weak learners to form a strong learner. With regression trees as the weak learners, each subsequent tree in the series is built on the residual errors of the predecessor trees, thereby minimizing the loss function.

Voting classifier: It [3] [8] is a classification technique that utilizes an ensemble of multiple classifiers and outputs the class that the constituent classifiers collectively favour. The first type of voting is hard voting, where the output is simply the mode of the individual predictions of the constituent classifiers of an ensemble. The other type of voting is soft voting, where the class with the greatest sum of weighted probabilities is delivered as the output.

Long Short-Term Memory (LSTM): It [6] is a Recurrent Neural Network (RNN) that is able to distinguish between recent and relatively older examples. LSTM assigns the former higher weights and the latter lower weights while forgetting data that seem irrelevant for the next set of outputs. Hence, it can handle long sequences of data better than other RNNs, which can store only a short series of data in memory. Thus, it is better suited to the prediction of time-series data than other neural networks.

II. LITERATURE SURVEY

Title | Merits | Demerits
Deep learning with long short-term memory networks for financial market predictions, 2018 [1] | Provides efficient predictions for large-scale financial markets. | Requires more subtle patterns of LSTM neural networks.
Global stock market investment strategies based on financial network indicators using machine learning techniques, 2019 [2] | Network indicators provide better results for global markets. | Insufficient for measuring latent factors in complex financial markets.
Stock market index prediction using deep neural network ensemble, 2017 [3] | For index predictions, the relative errors of high and low are less than a percent. | The relative errors of high, low, and close predictions are higher when the market index fluctuates fiercely.
Predicting and Beating the Stock Market with Machine Learning and Technical Analysis, 2018 [4] | At 85% confidence level, ML outperformed technical analysis during up-market. | At 90% confidence level, technical analysis outperformed ML during down-market.

III. PROPOSED WORK

The system proposed in this paper emphasises using an ensemble model to make accurate predictions. While an individual algorithm-based model might have higher accuracy, an ensemble model boosts the overall confidence and reliability of the system. The proposed system has the following characteristics:

A. Buy/sell decision for stocks

A combination of a Random Forest classifier, a K-Nearest Neighbours classifier, and a Gradient Boosting classifier forms an ensemble model [3] which predicts whether an investor should buy/sell stocks on a particular day.
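
To make the ensemble's decision rule concrete, the following is a minimal, dependency-free sketch of hard and soft voting over three classifiers. It is an illustration only, not the paper's implementation: the function names `hard_vote` and `soft_vote` and the probability values are hypothetical, and equal classifier weights are assumed.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the most common label among the classifiers' predictions."""
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities, weights=None):
    """Return the label with the largest weighted sum of class probabilities.

    probabilities: one dict per classifier, mapping label -> probability.
    weights: optional per-classifier weights (assumed equal if omitted).
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    totals = {}
    for weight, probs in zip(weights, probabilities):
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + weight * p
    return max(totals, key=totals.get)


# Hypothetical P(sell = 0) and P(buy = 1) for one trading day.
day_probs = [
    {0: 0.45, 1: 0.55},  # stand-in for the RF output
    {0: 0.40, 1: 0.60},  # stand-in for the KNN output
    {0: 0.90, 1: 0.10},  # stand-in for the GB output
]

# Each classifier's own call is its most probable label.
day_labels = [max(p, key=p.get) for p in day_probs]

hard_call = hard_vote(day_labels)  # two of three classifiers say buy
soft_call = soft_vote(day_probs)   # summed probabilities favour sell
```

In this made-up example the two rules disagree: a single very confident classifier outweighs two marginal ones under soft voting, which is the behaviour that distinguishes it from a simple majority.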
B. Closing price prediction

An LSTM model [12] predicts the closing prices of the stock
using historical datasets. The general stock direction trends
can also be extracted from the dataset to predict the future
behaviour of the stocks.
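
As a rough illustration of how historical closing prices can be prepared for a sequence model, the sketch below applies min-max scaling (the normalization named in Section 4.1.3) and then cuts the series into fixed-length windows. The windowing step, the `lookback` value, the function names, and the prices are all assumptions made for this example; the paper does not state how its LSTM inputs are framed.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def make_windows(series, lookback):
    """Pair each run of `lookback` values with the value that follows it."""
    samples, targets = [], []
    for i in range(len(series) - lookback):
        samples.append(series[i:i + lookback])
        targets.append(series[i + lookback])
    return samples, targets


# Hypothetical closing prices; real data would come from the 'close' column.
closes = [101.0, 102.5, 101.8, 103.2, 104.0, 103.5]

scaled = min_max_scale(closes)

# lookback=3 is an arbitrary choice for illustration.
X, y = make_windows(scaled, lookback=3)
```

Each row of `X` holds three consecutive scaled closes and the matching entry of `y` is the next close, which is the quantity a closing-price model would be trained to predict.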

IV. IMPLEMENTATION

Fig 1. Flow of the proposed system

The complete flow of processes used to build and deploy the system is depicted in the flowchart (Fig. 1). The utilization of three different algorithms to build the ensemble model, amalgamated with a separate LSTM model, raises the overall accuracy and confidence of the system.

The proposed system is deployed as a series of modules, namely data pre-processing, feature engineering, ensemble model building to predict the final call, and finally, LSTM model building to predict the closing prices of stocks. The ensemble model outputs the final buy/sell call to be made for a stock by the investor. A sample testing dataset’s results are represented in a confusion matrix that delineates the model’s accuracy in terms of the buy/sell calls made. The final output of the LSTM model is a graphical representation of the predicted and actual closing prices, with a root mean squared error (RMSE) [12] analysis that depicts the accuracy of the model. The model also delivers an approximation of the future stock behaviour to further aid the investors in making the right decisions.

Fig 2. Data from ‘close’ column before smoothing

Fig 3. Data from ‘close’ column after smoothing

4.1 Methodology
4.1.1. Data Processing and Feature Engineering

In the preliminary step, the dataset obtained is plotted (Fig. 2), and a list of technical indicators derived from the FINTA library is determined. Since stock prices are found to be affected by a myriad of different factors, the proposed model takes into account multiple factors to yield better predictions. The various technical indicators taken as input include RSI, MACD, STOCH, ADL, ATR, MOM, MFI, ROC, OBV, CCI, EMV, and VI.

Before deriving features from the indicators, the dataset is exponentially smoothed (Fig. 3) to remove the noise, which can become problematic for the model while predicting trends. The processed stock data is utilized for computing the various technical indicators. These technical indicators ensure that different aspects such as price changes, volume variations, price volatility, stock trends, and gaps in price movements, to name a few, are incorporated as essential parameters of the model. In parallel, the exponential moving averages (EMA) [11] at different average lengths, along with a normalized volume value, are calculated. The EMA is calculated using the following formula:

$$EMA_{today} = \alpha \cdot Price_{today} + (1 - \alpha) \cdot EMA_{today-1}$$

$$\alpha = \frac{2}{n + 1}$$

Where,
- EMA_{today} = EMA of today
- Price_{today} = Price of today
- EMA_{today-1} = EMA of yesterday
- α = Smoothing factor
- n = Number of days

For example, for n = 21 days the smoothing factor is α = 2 / (21 + 1) ≈ 0.0909, i.e. about 9.09%.

The next step is to generate the truth values by observing the prices a fixed number of rows (the window) ahead and examining whether the prices increased or decreased. An increase in prices yields the truth value buy (1), whereas a decrease in prices yields the truth value sell (0). This step completes the data pre-processing and feature engineering modules.
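
The EMA recurrence and the truth-value step can be sketched in a few lines of plain Python. This is a hedged illustration rather than the paper's code: the function names, the choice to seed the EMA with the first price, the handling of an unchanged price as sell (0), and the sample prices are all assumptions introduced here.

```python
def smoothing_factor(n):
    """Smoothing factor alpha = 2 / (n + 1) for an n-day EMA."""
    return 2.0 / (n + 1)


def ema(prices, n):
    """Exponential moving average using the recurrence from the text.

    Assumption: the series is seeded with the first price, since the
    text does not say how the first EMA value is initialised.
    """
    alpha = smoothing_factor(n)
    out = [prices[0]]
    for price in prices[1:]:
        out.append(alpha * price + (1 - alpha) * out[-1])
    return out


def truth_values(prices, window):
    """Label each row 1 (buy) if the price `window` rows ahead is higher.

    Assumption: an unchanged price is labelled 0 (sell); the text only
    covers increases and decreases. The last `window` rows get no label
    because their future price is unknown.
    """
    return [
        1 if prices[i + window] > prices[i] else 0
        for i in range(len(prices) - window)
    ]


alpha_21 = smoothing_factor(21)  # 2 / 22, the 21-day example in the text

prices = [10.0, 11.0, 12.0, 11.0, 10.0, 13.0]  # hypothetical prices
ema_3 = ema(prices, 3)                          # alpha = 0.5 for n = 3
labels = truth_values(prices, window=2)         # hypothetical window of 2 rows
```

Note that the labels look into the future by construction, which is why the later train/test split has to respect time order.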
4.1.2. Ensemble Model Building

The Random Forest, K-Nearest Neighbours, and Gradient Boosting algorithms are combined to form an ensemble model. A voting classifier is created which utilizes soft voting, i.e., it predicts the buy/sell decisions based on the average of the class probabilities predicted by all the constituent classifiers used in the ensemble model. To avoid look-ahead bias, cross-validation must be performed with care. The data is partitioned by iterating over it in multiple evenly-sized chunks, and each partition is split into training and testing data. The look-ahead bias is avoided by not shuffling or randomizing the data in the train and test split function, so that the test rows always follow the training rows in time. For the last step, the models are run through the cross-validation, and the final results are produced.

4.1.3. LSTM Model Building

A simple buy/sell call must not be the only basis of judgement available to investors. An investor also needs a rough estimate of the closing prices in order to make the right call. Accordingly, the system additionally utilizes an LSTM model to predict the closing prices and the general stock direction. Utilizing the same technical indicators, the date and close columns of the dataset are first filtered and subsequently normalized using a min-max scaler. The LSTM model is built and trained on the dataset. The Adam optimization algorithm [3] is used along with the mean squared error as the loss function while compiling the model. Once the model is built, a testing dataset is used to test the model's closing price predictions. Lastly, an RMSE analysis is done to check the accuracy of the predicted prices, and the results are plotted graphically along with the actual prices.

V. RESULTS AND DISCUSSION

5.1 Ensemble Model

RF Accuracy = 71.54%
KNN Accuracy = 67.33%
GBM Accuracy = 70.26%
ENSEMBLE Accuracy = 71.59%

A sample test set of 40 days is utilized to measure the accuracy of the ensemble model. The results are shown in the confusion matrix (Fig. 4). The accuracy obtained for the sample testing dataset is 95%. For two days, the predicted results contradicted the actual results, giving a misclassification rate of 5%.

5.2 LSTM Model

Fig 5. Closing Price Prediction using LSTM

The graph (Fig. 5) depicts the actual closing prices versus the predicted closing prices. It is observed that the predicted and actual values overlap at various points and intervals of time. Although there are intervals where the decline in actual prices is not precisely depicted by the predicted prices, the stock direction clearly indicates the eventual decline. To evaluate the difference, RMSE analysis is done, which yields a value of 4.12. The low value obtained indicates that the model is a good fit for making predictions.

The separation of the ensemble model and the LSTM model ensures that any inaccuracies in the predicted closing prices do not affect the buy/sell decision predicted by the ensemble model. Another major advantage of the system lies in the reinforced stock price trend prediction. Both the ensemble and LSTM models predict the price trend of stocks, thereby lessening the risks associated with sudden fluctuations in the market. By observing the results of both the models, the investor can easily discern the future behaviour of the stocks and make safer investments.
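
The evaluation steps reported above (an unshuffled train/test split, classification accuracy, and RMSE) can be reproduced in outline with the standard library alone. The sketch below is illustrative only: the helper names and all sample values are hypothetical, and the 40-day example merely mirrors the arithmetic of two wrong calls out of 40, not the paper's actual predictions.

```python
import math


def chronological_split(rows, test_fraction):
    """Split without shuffling so every test row comes after every training row."""
    cut = len(rows) - round(len(rows) * test_fraction)
    return rows[:cut], rows[cut:]


def accuracy(actual, predicted):
    """Fraction of positions where the predicted label equals the actual label."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Time-ordered rows: the last 20% become the test set, in original order.
train_rows, test_rows = chronological_split(list(range(10)), test_fraction=0.2)

# 40 hypothetical buy/sell calls with exactly two wrong predictions.
actual_calls = [1, 0] * 20
predicted_calls = list(actual_calls)
predicted_calls[3] = 1 - predicted_calls[3]
predicted_calls[17] = 1 - predicted_calls[17]

test_accuracy = accuracy(actual_calls, predicted_calls)  # 38 correct of 40
misclassification_rate = 1 - test_accuracy

# Hypothetical actual vs predicted closing prices.
price_rmse = rmse([100.0, 102.0, 104.0], [101.0, 101.0, 107.0])
```

RMSE is expressed in the same units as the prices, so whether a given value counts as low depends on the price level of the stock being predicted.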

Fig 4. Confusion matrix

VI. CONCLUSION

The ensemble model predicts the final buy/sell call with an accuracy of 71.59%. Coupled with an LSTM model that predicts the closing prices of the stock, the entire system ensures reliability, confidence, and robustness. As a future prospect, sentiment analysis of investors can be taken into account to bolster the model’s accuracy further.
REFERENCES
[1] Fischer, Thomas & Krauss, Christopher. (2017). Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research. 270. 10.1016/j.ejor.2017.11.054.

[2] Lee, Tae & Cho, Joon & Kwon, Deuk & Sohn, So. (2018). Global stock market investment strategies based on financial network indicators using machine learning techniques. Expert Systems with Applications. 117. 10.1016/j.eswa.2018.09.005.

[3] Yang, Bing & Gong, Zi-Jia & Yang, Wenqi. (2017). Stock market index prediction using deep neural network ensemble. 3882-3887. 10.23919/ChiCC.2017.8027964.

[4] Macchiarulo, A. (2018). Predicting and Beating the Stock Market with Machine Learning and Technical Analysis. The Journal of Internet Banking and Commerce, 23, 1-22.

[5] Aguirre, Alberto & Medina, Ricardo & Méndez, Néstor. (2020). Machine learning applied in the stock market through the Moving Average Convergence Divergence (MACD) indicator. Investment Management and Financial Innovations. 17. 44-60. 10.21511/imfi.17(4).2020.05.

[6] Pawar, Kriti & Jalem, Raj & Tiwari, Vivek. (2019). Stock Market Price Prediction Using LSTM RNN: Proceedings of ICETEAS 2018. 10.1007/978-981-13-2285-3_58.

[7] Momin, Faisal & Patel, Sunny & Shinde, Kuldeep & Sahane, Prof & Syed, Habeebullah Hussaini. (2020). Stock Market Prediction System Using Machine Learning Approach. SSRN Electronic Journal. 7. 190-194.

[8] S Sharanya, Revathi Venkataraman, G Murali (2020). Analysis Of Machine Learning Based Fault Diagnosis Approaches In Mechanical And Electrical Components. International Journal of Advanced Research in Engineering and Technology (IJARET). Volume 11, Issue 10, October 2020, pp. 80-94.

[9] Abraham, Cerene & Elayidom, M.Sudheep & Santhanakrishnan, T. (2019). Analysis and Design of an Efficient Temporal Data Mining Model for the Indian Stock Market: Proceedings of IEMIS 2018, Volume 2. 10.1007/978-981-13-1498-8_54.

[10] Hu, Hongping & Tang, Li & Zhang, Shuhua & Wang, Haiyan. (2018). Predicting the Direction of Stock Markets Using Optimized Neural Networks with Google Trends. Neurocomputing. 285. 10.1016/j.neucom.2018.01.038.

[11] Shastri, Malav & Roy, Sudipta & Mittal, Mamta. (2018). Stock Price Prediction using Artificial Neural Model: An Application of Big Data. ICST Transactions on Scalable Information Systems. 6. 156085. 10.4108/eai.19-12-2018.156085.

[12] Yu, Pengfei & Yan, Xuesong. (2020). Stock price prediction based on deep neural networks. Neural Computing and Applications. 32. 10.1007/s00521-019-04212-x.
