International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p-ISSN: 2395-0072

Stock Price Prediction using Machine Learning Algorithms: ARIMA, LSTM & Linear Regression

Krushali Sohil Patel¹, Udit Rajesh Prahladka¹, Jaykumar Babulal Patel¹, Yogita Shelar²

¹Dept. of Information Technology, Atharva College of Engineering, Maharashtra, India
²Assistant Professor, Dept. of Information Technology, Atharva College of Engineering, Maharashtra, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - This paper aims to promote the use of the ARIMA, LSTM and Linear Regression algorithms to predict stock prices on NASDAQ (American) and NSE (Indian) and to compare their accuracy. These machine learning algorithms are applied to historical stock data for the past 2 years and to real-time stock prices. The NASDAQ stock data was downloaded from the Yahoo Finance API and the NSE stock data was downloaded from the Alpha Vantage API. The full source code of the project was written in Python. The ARIMA and LSTM models are found to be more accurate than the Linear Regression model for the NASDAQ (American Company) stock. For the NSE (Indian Company) stock, however, LSTM and Linear Regression appear to be more accurate than ARIMA.

Key Words: Machine Learning, ARIMA, LSTM, Linear Regression, Stock Market, Prediction, Stock Exchange, Trading, Time Series, Historical Data, Python

1. INTRODUCTION

Stock prices are naturally highly volatile. They vary with factors such as previous prices, the current market scenario, financial matters, competing companies etc. An accurate forecast of future trends in stock prices is important for making wise investment decisions [1] [2]. However, the volatile nature of stock prices makes them difficult to predict accurately. Stock market prediction is an attempt to determine the future value of a company's stock [3]. NASDAQ stock prices were downloaded from the Yahoo Finance API and NSE stock prices were downloaded from the Alpha Vantage API. This data was pre-processed and fed to the machine learning models. Finally, the results for each model are shown.

2. LITERATURE REVIEW

A. Machine Learning Techniques and Use of Event Information for Stock Market Prediction

Paul D. Yoo, Maria H. Kim and Tony Jan compared and examined some of the existing ML techniques used to predict the stock market. After comparing simple regression, multivariate regression, Neural Networks, Support Vector Machines and Case Based Reasoning models, they concluded that Neural Networks predict market directions more accurately than the other techniques. Support Vector Machines and Case Based Reasoning are also popular in stock market prediction. In addition, they found that incorporating event information into a predictive model plays a very important role in more accurate forecasting. The web provides the up-to-date information about the stock market that is needed to achieve the highest prediction accuracy for short-term forecasts [1].

B. Stock Price Prediction Using Financial News Articles

M. İ. Y. Kaya and M. E. Karsligil analyzed the relationship between the content of financial news articles and stock prices. News articles were labeled positive or negative depending on their impact on the stock market. Instead of using single words as features, they used word pairs as features. Each word pair consisted of a combination of a noun and a verb. An SVM classifier was trained with the labeled articles to predict stock prices [2].

C. Predicting Stock Market Indices Using a Neural Network

Dr. Jay Joshi and Nisarg A Joshi, in their work, used an artificial neural network (ANN) to predict stock prices on the reputed Bombay Stock Exchange (BSE) Sensitive Index (Sensex). They performed experiments and case studies to compare the performance of the neural network with random walk and linear autoregressive models. They reported that the neural network outperforms the linear autoregressive and random walk models on all performance measures in both in-sample and out-of-sample predictions of the BSE Sensex daily return [3].

D. Stock Price Prediction Using the ARIMA Model

Ayodele A. Adebiyi, Aderemi O. Adewumi and Charles K. Ayo used the ARIMA model to predict stock prices on data obtained from the New York Stock Exchange (NYSE) and the Nigeria Stock Exchange (NSE). They used a data set with four components: open, low, close and high price. In their work, they took the closing price as the value to predict, because the closing price is the most relevant price at the end of the day. They showed, using Q-statistics and correlation plots, that no significant correlation remained in the autocorrelation functions (ACFs) and partial autocorrelation functions (PACFs). In addition, non-stationary data was made stationary with the help of various techniques. The study concluded that the ARIMA model is very useful for short-term prediction [4].

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 2152

3. METHODOLOGY

Fig-1: System Design

A. Auto Regressive Integrated Moving Average (ARIMA)

ARIMA stands for Auto Regressive Integrated Moving Average. There are two types of ARIMA models that can be used in prediction: seasonal ARIMA and non-seasonal ARIMA. In our case, the non-seasonal ARIMA model was used due to the nature of the stock data. ARIMA is a class of models that explains a time series based on its own past values, so that this relationship can be used to predict future values. The ARIMA model takes three main parameters, described as follows:

p = Number of lag observations. For example, if p = 4, we use the last four periods of our data in the autoregressive calculation. p lets us adjust the line fitted to the time series.

d = In ARIMA, we convert a non-stationary time series into a stationary one by differencing. d is the number of differencing operations applied.

q = q denotes the lag of the error component. The error component is the part of the historical data that is not explained by the general trend of the prices.

Autoregressive component: A purely AR model relies on a combination of historical values. This dependence resembles linear regression: the number of autoregressive terms corresponds directly to the number of previous periods taken into account.

We use the Auto Regressive component if:

1) The ACF plot decays gradually toward zero

2) Lag-1 of the ACF of the time series is positive

3) The PACF plot cuts off sharply to zero

Moving Averages: Moving averages are random jumps in the data whose effect is felt in two or more consecutive or non-consecutive periods. These jumps represent the error calculated by the model and are what the MA component accounts for. A purely MA model can smooth out these jumps, similar to the exponential smoothing method. We use the Moving Averages component if:

a) The ACF drops sharply after just a few lags

b) Lag-1 of the ACF is negative

c) The PACF decays gradually

Integrated component: The integrated component comes into play only when the real-time or historical time series data is non-stationary or seasonal. The number of times the series needs to be differenced to make it stationary is the order of the integrated component.

Fig-2: ARIMA model [5]
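
To make the roles of p and d concrete, the following is a minimal sketch of a toy ARIMA(1, 1, 0) forecast in plain Python, not the implementation used in this work (the paper does not state which library was used). It differences the series once (d = 1), fits a single autoregressive coefficient by least squares without an intercept (p = 1) and omits the moving-average part (q = 0). The function names and the sample prices are hypothetical.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (the 'I' part of ARIMA)."""
    values = list(series)
    for _ in range(d):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


def fit_ar1(values):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] (no intercept)."""
    previous, current = values[:-1], values[1:]
    denominator = sum(x * x for x in previous)
    if denominator == 0:
        raise ValueError("cannot fit AR(1): all lagged values are zero")
    return sum(c * p for c, p in zip(current, previous)) / denominator


def forecast_arima_110(prices):
    """One-step forecast: difference once, fit AR(1), then undo the differencing."""
    if len(prices) < 3:
        raise ValueError("need at least three prices")
    diffs = difference(prices, d=1)
    phi = fit_ar1(diffs)
    next_diff = phi * diffs[-1]      # predicted change for the next period
    return prices[-1] + next_diff    # integrate back to the price level


# Hypothetical closing prices whose daily change halves each day.
closing_prices = [100.0, 108.0, 112.0, 114.0, 115.0]
print(difference(closing_prices, d=1))
print(forecast_arima_110(closing_prices))
```

A full ARIMA model would also estimate an intercept and the q moving-average terms, which is usually left to a statistics library rather than written by hand.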


B. Long Short Term Memory (LSTM)

The Long Short-Term Memory network is an RNN that is trained using backpropagation. It addresses the vanishing gradient problem encountered in earlier RNNs. LSTM networks have their own memory, so they prove efficient for building large RNNs and handling time-dependent sequence problems. Instead of neurons, an LSTM network has memory blocks that are connected through recurrent layers.

A block has several components that make it smarter than a standard neuron. It contains gates that regulate the flow from input to output. Whenever a block receives an input, a gate is triggered that decides whether or not to pass the information forward for further processing.

The standard LSTM block, in its simplest form, consists of an input gate, an output gate, a cell and a forget gate.

1) Cell: It is used to remember values over arbitrary time intervals.

2) Input Gate: It decides which information to keep in the cell.

3) Output Gate: It is used to decide which part of the cell state should be given as output.

4) Forget Gate: It is used to decide which information to throw away from the cell.

Fig-3: LSTM model [6]

C. Linear Regression

In the Linear Regression model, a linear equation is used to combine a set of input data values (x) into a predicted output value (y). Both the input and output values are numeric. The scale factor that the linear regression equation assigns to each input is represented by the Greek capital letter Beta (B) and is commonly known as a coefficient. In addition, one more coefficient is added to give the line an additional degree of freedom. This additional term is often referred to as the bias coefficient. Typically, the coefficients are estimated by measuring the distance of the data points from the best-fitting line. This distance can be drawn as a straight segment from each point to the line, and it measures how close the point lies to the fitted regression line.

A simple Linear Regression problem is modeled as follows:

y = B0 + Bt * xt + Et

Here B0 is the bias coefficient, Bt is the coefficient of the input xt and Et is the error term.

When we are dealing with more than one input, the line is called a plane or hyperplane. This is often the case with high-dimensional data. The Linear Regression model is therefore represented by the form of the equation and the specific values of the coefficients. However, before using this linear equation, we face a number of issues. These issues often increase the complexity of the model, which makes accurate estimates difficult. This complexity is often discussed in terms of the number of dependent and independent variables.

When a coefficient becomes zero, the effect of the corresponding input variable on the model is effectively removed, and it no longer contributes to the estimates made from the model (0 * x = 0). This becomes relevant when we look at regularization techniques, which change the learning algorithm to reduce the complexity of models by penalizing the absolute size of the coefficients, driving some of them to zero.

Fig-4: Linear Regression [7]

Twitter Sentiment Analysis

Social media data has a higher impact today than ever, and it can aid in predicting the trend of the stock market. The method involves collecting news and social media data and extracting the sentiments expressed by individuals. Then the correlation between the sentiments and the stock values is analyzed. The learned model can then be used to make future predictions about stock values.

Fig-5: General plot that illustrates how social media and financial news affect the stock market trend

4. DATA-SET AND RESOURCES

Dataset:

Yahoo Finance provides an easy way to fetch the historical stock values of a company programmatically, given its ticker name, using built-in APIs. It also provides a feature to get prices between a given initial date and final date.

The first step in building a machine learning model is to obtain an optimal dataset.

The open-source data available on the internet has many discrepancies, such as missing data, repeated rows of the same data, unstructured data etc.

Before feeding the data to the machine learning model, the data needs to be modified or preprocessed so that the model is able to deliver results that are as accurate as possible.

The main attributes found in financial datasets (historical data about the stock prices of a particular company) are as follows:

1. Date of that particular stock price
2. Opening stock price
3. High stock price (highest value of that stock during that day)
4. Low stock price (lowest value of that stock during that day)
5. Closing stock price
6. Volume of stocks traded

Of all the above attributes, the closing price is predominantly used as the feature fed to the model. Using this single value, the future stock price of a company can be predicted using the various regression models available in machine learning.

In regression, according to the input given, a curve is plotted on a graph. The curve represents the variations in the stock prices over the years. Here, the X-axis contains the date and the Y-axis contains the closing price of the stock.

Training and Testing Stage

1. The values of the label attribute are shifted by the percentage you want to predict.
2. The Dataframe format is converted to Numpy array format.
3. All NaN data values are removed before feeding the data to the classifier.
4. The data is scaled such that for any value X.
5. The data is split into test data and train data with respect to its type, i.e. label and feature.

Data Analysis Stage:
Results

5. RESULTS AND ANALYSIS

A. Downloading and Viewing NASDAQ Data

NASDAQ (American Company) stock data for the past 2 years and real-time prices are downloaded from the Yahoo Finance API and displayed using Python.

Fig 5.1 Historic Stock Data for NASDAQ (AAPL) stock
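
The preparation steps listed under "Training and Testing Stage" in Section 4 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code used in this work: the scaling rule is not fully specified in the text, so min-max scaling to [0, 1] is assumed here, and the split is assumed to keep time order with the last 20% of rows as the test set. All function names and the sample prices are hypothetical.

```python
import math


def make_supervised(closes, horizon):
    """Pair each close with the close `horizon` steps later (the shifted label)."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return list(zip(closes[:-horizon], closes[horizon:]))


def drop_missing(pairs):
    """Remove rows where the feature or the label is None or NaN."""
    def is_valid(value):
        return value is not None and not math.isnan(value)
    return [(x, y) for x, y in pairs if is_valid(x) and is_valid(y)]


def min_max_scale(values):
    """Scale values to [0, 1]; this particular scaling rule is an assumption."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def train_test_split_ordered(rows, test_fraction=0.2):
    """Keep time order: the last `test_fraction` of the rows is the test set."""
    split_index = len(rows) - int(round(len(rows) * test_fraction))
    return rows[:split_index], rows[split_index:]


# Hypothetical closing prices with one missing value.
closes = [10.0, 11.0, float("nan"), 13.0, 14.0, 15.0,
          16.0, 17.0, 18.0, 19.0, 20.0, 21.0]
rows = drop_missing(make_supervised(closes, horizon=1))
scaled_features = min_max_scale([x for x, _ in rows])
dataset = list(zip(scaled_features, [y for _, y in rows]))
train, test = train_test_split_ordered(dataset, test_fraction=0.2)
print(len(train), len(test))
```

Splitting in time order, rather than shuffling, keeps future prices out of the training set, which is the usual precaution for time series data.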


Fig 5.2 Real Time Stock Data for NASDAQ (AAPL) stock

B. ARIMA stock forecast for NASDAQ

The ARIMA model was evaluated on the test set (20% of all data). The predicted values are compared with the real values and the results are displayed using Python.

Fig 5.3: ARIMA forecast for NASDAQ (AAPL) stock

Fig 5.4: ARIMA prediction and Root Mean Squared Error (RMSE) for NASDAQ (AAPL) stock

C. LSTM stock forecast for NASDAQ

The LSTM model was evaluated on the test set (20% of all data). The predicted values are compared with the real values and the results are displayed using Python.

Fig 5.5: LSTM forecast for NASDAQ (AAPL) stock

Fig 5.6: LSTM prediction and Root Mean Squared Error (RMSE) for NASDAQ (AAPL) stock

D. Linear Regression stock forecast for NASDAQ

The Linear Regression model was evaluated on the test set (20% of all data). The predicted values are compared with the real values and the results are displayed using Python.

Fig 5.7: Linear Regression forecast for NASDAQ (AAPL) stock

Fig 5.8: Linear Regression prediction and Root Mean Squared Error (RMSE) for NASDAQ (AAPL) stock

E. Downloading and Viewing NSE Data

NSE (Indian Company) stock data for the past 2 years and real-time prices are downloaded from the Alpha Vantage API and displayed using Python.

Fig 5.9 Historic Stock Data for NSE (HDFCBANK) stock


Fig 5.10 Real Time Stock Data for NSE (HDFCBANK) stock

F. ARIMA stock forecast for NSE

The ARIMA model was evaluated on the test set (20% of all data). The predicted values are compared with the real values and the results are displayed using Python.

Fig 5.11: ARIMA forecast for NSE (HDFCBANK) stock

Fig 5.12: ARIMA prediction and Root Mean Squared Error (RMSE) for NSE (HDFCBANK) stock

G. LSTM stock forecast for NSE

The LSTM model was evaluated on the test set (20% of all data). The predicted values are compared with the real values and the results are displayed using Python.

Fig 5.13: LSTM forecast for NSE (HDFCBANK) stock

Fig 5.14: LSTM prediction and Root Mean Squared Error (RMSE) for NSE (HDFCBANK) stock

H. Linear Regression stock forecast for NSE

The Linear Regression model was evaluated on the test set (20% of all data). The predicted values are compared with the real values and the results are displayed using Python.

Fig 5.15: Linear Regression forecast for NSE (HDFCBANK) stock

Fig 5.16: Linear Regression prediction and Root Mean Squared Error (RMSE) for NSE (HDFCBANK) stock

I. Performance Comparison of the Models used

The Root Mean Squared Error (RMSE) values of the ARIMA, LSTM and Linear Regression models for the NASDAQ and NSE stocks are tabulated and compared below. It is evident that the ARIMA and LSTM models have a lower error than the Linear Regression model for the NASDAQ (American Company) stock. For the NSE (Indian Company) stock, however, LSTM and Linear Regression have a lower error than ARIMA.

RMSE                        ARIMA     LSTM      Linear Regression
NASDAQ (American) stocks    4.65      10.04     12.01
NSE (Indian) stocks         257.16    141.62    185.09

Table 5.17: Comparison of Model Performance
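
For reference, RMSE is the square root of the mean of the squared differences between actual and predicted values: RMSE = sqrt((1/n) * sum((actual_i - predicted_i)^2)). The short sketch below computes it in plain Python and ranks the models in each market using the values from Table 5.17. The helper names and the three sample prices passed to rmse() are hypothetical; only the table values come from this paper.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rank_models(scores):
    """Return model names ordered from lowest to highest RMSE."""
    return sorted(scores, key=scores.get)


# RMSE values copied from Table 5.17.
table_5_17 = {
    "NASDAQ": {"ARIMA": 4.65, "LSTM": 10.04, "Linear Regression": 12.01},
    "NSE": {"ARIMA": 257.16, "LSTM": 141.62, "Linear Regression": 185.09},
}

# Hypothetical prices, used only to exercise rmse().
print(rmse([10.0, 12.0, 14.0], [11.0, 12.0, 12.0]))
for market, scores in table_5_17.items():
    print(market, rank_models(scores))
```

Because RMSE is expressed in the units of the price itself, the values are comparable between models within one market but not between the NASDAQ and NSE rows.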


C. CLASSIFICATION MODELS COMPARISON

The three models under consideration, namely ARIMA, LSTM and Linear Regression, were then compared on several criteria, such as the number of parameters and a comparison of trainable and non-trainable parameters (Table-3). The RMSE value and loss graphs of each model were investigated, taking the ease of learning into account. Finally, the classification results were examined individually for all evaluation parameters (Table-1).

Table-1: Classification Results of All Models

Algorithms           RMSE    Predicted value    Present value
ARIMA                3.06    175.57             177.77
LSTM                 6.66    166.61             177.77
LINEAR REGRESSION    9.11    173.71             177.77

8. CONCLUSION

The proposed algorithms work well with both NASDAQ and NSE stock market data. From the figures and tables presented above, it appears that although the models' predictions deviate slightly from the real prices, they offer a good estimate of future trends in stock prices. This estimate helps to obtain important information about stocks, thus facilitating wise investment decisions. It is noted that the ARIMA and LSTM models are more accurate than the Linear Regression model for the NASDAQ (US Company) forecast. For the NSE (Indian Company) stock, however, LSTM and Linear Regression appear to be more accurate than ARIMA. This also supports the argument that different models and algorithms react differently to stocks of different indices. Therefore, one should choose models and algorithms depending on the scale and index of the stock.

9. REFERENCES

[1] P. D. Yoo, M. H. Kim and T. Jan, "Machine Learning Techniques and Use of Event Information for Stock Market Prediction: A Survey and Evaluation," International Conference on Computational Intelligence for Modeling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTC'06), Vienna, 2005, pp. 835-841.
[2] M. İ. Y. Kaya and M. E. Karsligil, "Stock price prediction using financial news articles," 2010 2nd IEEE International Conference on Information and Financial Engineering, Chongqing, 2010, pp. 478-482.
[3] A. Hedayati, M. Moghaddam and M. Esfandyari, "Stock market index prediction using artificial neural network," Journal of Economics, Finance and Administrative Science, 2016. doi: 10.1016/j.jefas.2016.07.002.
[4] A. A. Adebiyi and A. O. Adewumi, "Stock Price Prediction Using the ARIMA Model," IJSST, Volume-15, Issue-4. [Online]. Available: https://ijssst.info/Vol-15/No-4/data/4923a105.pdf
[5] P. Chen, "Stochastic Modeling and Analysis of Power System with Renewable Generation," ResearchGate Publication, 2020.
[6] Angle Qian, "Structure of LSTM RNNs," Stack Exchange, 2018. [Online]. Available: https://ai.stackexchange.com/questions/6961/structure-of-lstm-rnns
[7] R. J. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, OTexts, Kindle Edition. [Online]. Available: https://otexts.com/fpp2/
