Resarch Paper Final

Uploaded by

rosyfin77
A Survey of Machine Learning Stock Market Prediction Studies

Sarthak Chaturvedi, Shikhar Pandav, Sanath Mittal
B.Tech Scholar
Department of CSE
KIET Group of Institutions
Ghaziabad, India

Vipin Deval
Assistant Professor
Department of CSE
KIET Group of Institutions
Ghaziabad, India

Abstract:

Stock market prediction has long been a topic of intense interest and research due to its potential for significant financial gain and economic impact. In this study, we present a stock market prediction model developed using Artificial Neural Networks (ANN), leveraging the Scikit-learn library and financial data from the yfinance API. The primary objective of this research is to evaluate the effectiveness of ANNs in forecasting stock prices and to assess the model's predictive accuracy.

Introduction

Stock market prediction involves forecasting the future prices of stocks based on historical data and various analytical techniques. The inherent volatility and complexity of financial markets pose significant challenges to accurate prediction. Traditional statistical methods often fall short in capturing the nonlinear patterns present in stock market data, leading researchers to explore advanced machine learning techniques such as ANNs.

Methodology

In this study, we employed an ANN due to its ability to model complex relationships and its robustness in handling large datasets. We sourced historical stock price data from the yfinance API, which provides comprehensive and up-to-date financial information. The dataset included daily closing prices, trading volumes, and other relevant financial indicators for a selection of stocks over a specified period.

Data Preprocessing

Data preprocessing is a critical step in the development of a reliable prediction model. We first cleaned the dataset by handling missing values and removing outliers. Feature scaling was applied to normalize the data, ensuring that the ANN could process the inputs efficiently. Additionally, we created lagged features to capture temporal dependencies, which are crucial for time-series forecasting.

Model Architecture

The ANN model was implemented using the Scikit-learn library, a popular Python toolkit for machine learning. Our neural network comprised an input layer, multiple hidden layers, and an output layer. The architecture was designed to balance complexity and computational efficiency. We experimented with various configurations of hidden layers and neurons to identify the optimal structure for our prediction task.

Training and Evaluation

The model was trained using a backpropagation algorithm, with the dataset split into training and testing sets to evaluate performance. We used mean squared error (MSE) as the loss function and applied early stopping to prevent overfitting. After extensive training, the model's performance was assessed based on its accuracy in predicting stock prices.

Results

The ANN model achieved an accuracy rate of 58% on the test dataset. While this accuracy is modest, it underscores the difficulties inherent in stock market prediction. Financial markets are influenced by a myriad of factors, many of which are unpredictable and not captured in historical data alone. The 58% accuracy indicates that while the model can capture some patterns, there is substantial room for improvement.

Discussion

The results of our study highlight both the potential and limitations of using ANNs for stock market prediction. The modest accuracy achieved suggests that ANNs can identify certain trends but are not sufficient on their own to make highly reliable predictions. This outcome aligns with existing literature, which often reports challenges in achieving high predictive accuracy in financial forecasting.

Future Work

To enhance the predictive capability of our model, future research will focus on several areas. First, integrating additional data sources, such as macroeconomic indicators, sentiment analysis from news articles, and social media trends, could provide a more comprehensive view of the factors influencing stock prices. Second, experimenting with alternative machine learning techniques, such as ensemble methods and recurrent neural networks (RNNs), may yield better performance. Lastly, incorporating advanced feature engineering and optimization techniques could further refine the model's accuracy and robustness.

Conclusion

This research demonstrates the feasibility of using ANNs for stock market prediction but also underscores the complexities involved in achieving high accuracy. While our model achieved a 58% accuracy rate, indicating some predictive capability, there is significant potential for improvement. By exploring additional data sources and advanced machine learning techniques, future work aims to develop more accurate and reliable stock market prediction models. This ongoing research has the potential to contribute valuable insights to the field of financial forecasting and investment strategy development.

1. Introduction

Stock market prediction has long been a focal point for researchers, financial analysts, and investors due to its profound impact on financial decision-making and economic strategy. The ability to predict future stock prices can lead to significant financial gains and provide a strategic edge in the highly competitive financial markets. However, predicting stock prices is inherently challenging due to the complex, dynamic, and often chaotic nature of financial markets.

Historically, various methods have been employed to forecast stock prices, ranging from traditional statistical techniques to more recent advances in machine learning and artificial intelligence (AI). Traditional methods, such as linear regression, autoregressive integrated moving average (ARIMA) models, and other time-series analysis techniques, rely heavily on the assumption that past price movements and patterns can be used to predict future prices. While these methods can capture linear relationships and trends, they often fall short when it comes to modeling the nonlinear and intricate patterns that characterize financial market data.

In contrast, machine learning techniques, particularly Artificial Neural Networks (ANNs), offer a powerful alternative due to their ability to learn and model complex, nonlinear relationships within large datasets. ANNs are computational models inspired by the human brain, consisting of interconnected processing nodes (neurons) organized in layers. These networks can automatically
adjust their parameters based on the input data, enabling them to capture intricate patterns that are not easily discernible through traditional methods [1].

This study explores the application of ANNs for stock market prediction, leveraging the capabilities of the Scikit-learn library and financial data sourced from the yfinance API. The Scikit-learn library is a widely used machine learning toolkit in Python, offering a range of algorithms and tools for data analysis and model building. The yfinance API provides a convenient and comprehensive source of financial data, including historical stock prices, trading volumes, and various financial indicators [15].

The primary objective of this research is to evaluate the effectiveness of ANNs in predicting stock prices and to assess the model's accuracy. Stock market prediction is a particularly challenging task due to the numerous factors that influence stock prices, including economic indicators, market sentiment, political events, and company-specific news. These factors often interact in complex ways, creating a high level of volatility and unpredictability in the markets [8].

Data preprocessing is a crucial step in developing an effective prediction model. Financial data often contain noise, missing values, and outliers, which can adversely affect model performance. In this study, we undertake comprehensive data cleaning and preprocessing steps, including handling missing values, removing outliers, and normalizing the data to ensure efficient processing by the ANN. Additionally, we create lagged features to capture temporal dependencies, which are essential for time-series forecasting.

The ANN model is implemented with an architecture designed to balance complexity and computational efficiency. The network consists of an input layer, multiple hidden layers, and an output layer. The hidden layers enable the model to learn hierarchical representations of the input data, capturing both simple and complex patterns. We experiment with various configurations of hidden layers and neurons to identify the optimal structure for our prediction task.

The model is trained using a backpropagation algorithm, which adjusts the network's parameters to minimize the prediction error. We split the dataset into training and testing sets to evaluate the model's performance. The use of mean squared error (MSE) as the loss function and the application of early stopping help prevent overfitting and ensure that the model generalizes well to unseen data.

The results of our study indicate that the ANN model achieves an accuracy rate of 58% in predicting stock prices. While this accuracy is modest, it underscores the inherent challenges in stock market prediction. Financial markets are influenced by a multitude of unpredictable factors, many of which are not captured in historical data alone. The 58% accuracy suggests that while the ANN can identify certain patterns, there is substantial room for improvement.

The findings of this research highlight both the potential and limitations of using ANNs for stock market prediction. While the model demonstrates some predictive capability, achieving higher accuracy requires integrating additional data sources and exploring alternative machine learning techniques. Future research will focus on incorporating macroeconomic indicators, sentiment analysis from news and social media, and advanced feature engineering to enhance the model's performance.

In conclusion, this study contributes to the ongoing exploration of machine learning applications in financial forecasting. By demonstrating the feasibility of using ANNs for stock market prediction and identifying areas for improvement, we provide valuable insights for future research and the development of more accurate and reliable prediction models. The ultimate goal is to advance the field of financial forecasting and support more informed investment decisions.

2. Literature Review

Two important approaches to stock market price forecasting appear in the literature: fundamental analysis and technical analysis. Both are used for studying the stock market.

2.1 Methods of Prediction

This section reviews recent methods for stock market prediction and gives a comparative analysis of these techniques. Major prediction techniques, such as data mining, machine learning, and deep learning, are used to estimate future stock prices. The following techniques are discussed together with their advantages and disadvantages [7]:

2.1.1 Hidden Markov Model
2.1.2 ARIMA Model
2.1.3 Holt-Winters
2.1.4 Artificial Neural Network (ANN)
2.1.5 Recurrent Neural Networks (RNN)
2.1.6 Time Series Linear Model (TSLM)

The source literature groups Holt-Winters, ANN, and the Hidden Markov Model as machine learning approaches, ARIMA as a time series approach, and the Time Series Linear Model (TSLM) and Recurrent Neural Networks (RNN) as deep learning approaches [4].

2.1.1 Hidden Markov Model

The Hidden Markov Model (HMM) was first developed for speech recognition but has since been widely used to predict stock market data. Stock market trend analysis based on the Hidden Markov Model considers the one-day difference in closing value over a given timeline. The hidden sequence of states and their corresponding probability values are found for a particular observation sequence. The probability value p gives the stock price trend percentage. Under uncertainty, decision-makers use this information to make decisions. An HMM is a stochastic model in which the system is assumed to be a Markov process with hidden states. It is reported to achieve higher accuracy than other models. The parameters of the HMM, denoted by A, B, and p, are learned from the data.

Fig. 1. Graphical illustration of the artificial neuron [2]

Advantages

• Strong statistical foundation.
• Can handle inputs of variable length.

Disadvantages
• They often have large numbers of unstructured parameters.
• They cannot express dependencies between hidden states.

2.1.2 ARIMA Model

The ARIMA model was introduced by Box and Jenkins in 1970. The Box-Jenkins method refers to a set of activities for identifying, estimating, and diagnosing ARIMA models with time series data. The model is among the most important financial forecasting approaches [6]. ARIMA models have been shown to be effective in generating short-term forecasts. The future value of a variable in the ARIMA model is a linear combination of past values and past errors.

Advantages

• Gives a better understanding of the time series pattern.
• The data can be simulated to verify the model accuracy.
• Results indicate whether diagnostic tests are significant, so the user can quickly diagnose the model.

Disadvantages

• Not used for long-term predictions.

2.1.3 Holt-Winters

Holt-Winters is the appropriate model when the time series has trend and seasonal components. The series is divided into three components: trend, base (level), and seasonality. Holt-Winters finds three smoothing parameters, one each for trend, level, and seasonality. It has two variations: the additive Holt-Winters smoothing model and the multiplicative Holt-Winters model. The former is used for prediction, and the latter is preferred when the seasonal variations in the series are not constant. It is popular mainly for its accuracy, and in the area of prediction it is reported to have outperformed many other models. The Holt-Winters exponential smoothing approach with trend and seasonal fluctuations is typically used for short-term forecasts of economic development trends. After the seasonal trends are removed from the data, the following function takes the data as input and, in return, Holt-Winters performs the pre-calculations needed for forecasting. All parameters required for forecasting are automatically initialized based on the function data.

• Multiplicative method: $\hat{y}_{t+m} = (L_t + m T_t) \, S_{t+m-p}$
• Additive method: $\hat{y}_{t+m} = L_t + m T_t + S_{t+m-p}$

Here $L_t$ is the level, $T_t$ the trend, $S$ the seasonal component, $m$ the forecast horizon, and $p$ the seasonal period.

2.1.4 Artificial Neural Network (ANN)

An artificial neural network (ANN) is a technique inspired by biological nervous systems, such as the human brain [3, 8]. It has an excellent ability to make predictions from large databases [12]. ANNs based on the backpropagation algorithm are generally used to forecast the stock market. The backpropagation algorithm uses a multilayer perceptron (MLP) neural network. It consists of an input layer with a set of sensor nodes as input nodes, one or more hidden layers of computation nodes, and an output layer of computation nodes. These networks often use raw data and data derived from the previously mentioned technical and fundamental analysis [12, 15]. A multilayer feed-forward neural network is a neural network with an input layer, one or more hidden layers, and an output layer. The inputs correspond to the attributes measured for each training sample. Inputs are passed to the input layer simultaneously. The weighted outputs of these units are fed simultaneously to the next layer of units, which makes up the hidden layer. The weighted outputs of one hidden layer act as input to the next hidden layer, and so on. The number of hidden layers is an arbitrary design choice. The weighted output of the last hidden layer acts as input to the output layer, which emits the network's prediction for the given samples. Important parameters of a neural network are the learning rate, momentum, and number of epochs (Fig. 1). Backpropagation is a neural network learning algorithm [10]. The backpropagation network learns by processing the sample set repeatedly and comparing the network prediction with the actual output. If the residual value exceeds the threshold value, the weights of the connections are modified to reduce the MSE between the predicted value and the actual value. The weights are modified in the backward direction, from the output layer to the first hidden layer. Because the changes to the connection weights are made in the backward direction, the algorithm is called backpropagation [14]. The backpropagation algorithm carries out the calculations and compares the predicted output with the target output. When the predicted value is not close to the actual value, the weights are modified.

Advantages

• An ANN can perform tasks that a linear model cannot.
• Can be executed in any application.
• It does not need to be reprogrammed.

Disadvantages

• It requires training to operate.
• It needs high processing time for large networks.
• It depends on the hardware on which the computation takes place.

2.1.5 Recurrent Neural Network (RNN)

Recurrent neural networks (RNNs) [5] use backpropagation to learn, but their nodes have a feedback mechanism. Because of this recurrence, RNN models can predict a stock price based on recent history. Experiments found that RNN prediction accuracy on the past ten years of Apple stock data is over 95%. Because an RNN is able to process time series data, it is suitable for forecasting.

Advantages

• An RNN remembers each piece of information over time, which is useful in time series prediction.
• They can be used with convolutional layers.

Disadvantages

• Exploding gradients make it difficult to train the network effectively.
• It is hard to train an RNN.

2.1.6 Time Series Linear Model (TSLM)

One of the stochastic approaches to implementing a predictive model is the time series linear model (TSLM). In a time series linear model, an ideal linear model is typically created, and data are then incorporated into it so that the linear model reflects the properties of the real data. The main advantage of this model is that the actual data are incorporated into the ideal linear model. This includes both conventional trends and seasonal trends in the data. The function that can be used to create the appropriate linear model in R is tslm(), which is applied to StlStock data stripped of
seasonal trends. The value h gives the number of months predicted or to be predicted. The tslm() function performs all pre-calculations required for the prediction, and its result is used as input to the prediction function [2, 11].

3. Difference between Prediction Methods

| Serial No. | Approach | Advantages | Disadvantages | Parameters Required |
|---|---|---|---|---|
| 1 | Artificial neural network (ANN) | Better performance than regression. Less error prone. | As noise increases, the prediction accuracy decreases. | Stock price |
| 2 | Support vector machine | When an out-of-training sample is applied, the effect on accuracy is minimal. | Sensitive to small irregularities in the training data, which can decrease the prediction accuracy. | Investment form consumer, net income, net revenue, price on every stock earning |
| 3 | Hidden Markov model | For enhancement purposes | Learning, decoding, and assessment of result | Technical indicators |
| 4 | ARIMA model | Robust and structured | Not used for long-term predictions | Open, close, high, low, price |
| 5 | Time series linear model (TSLM) | Unites real data with the ideal linear prediction model | Previous patterns are present in the data | Months and data |
| 6 | Recurrent Neural Network (RNN) | Able to model time-dependent and sequential data problems | Exploding gradients can make it difficult to train the network effectively. | Data of input layer, hidden layers, output layers |

4. Conclusion and Results

This research explored the application of Artificial Neural Networks (ANNs) for stock market prediction, utilizing the Scikit-learn library and financial data sourced from the yfinance API. Our primary objective was to evaluate the predictive accuracy of ANNs in forecasting stock prices, a task inherently complex due to the volatile and multifaceted nature of financial markets.

The study involved comprehensive data preprocessing, including handling missing values, removing outliers, and normalizing the data. We also created lagged features to capture temporal dependencies, which are critical for time-series forecasting. The ANN model was designed with an architecture that balanced complexity and computational efficiency, and various configurations of hidden layers and neurons were tested to determine the optimal structure.

Our findings indicate that the ANN model achieved an accuracy rate of 58% in predicting stock prices. While this accuracy is modest, it underscores the significant challenges in stock market prediction. The myriad factors influencing stock prices, many of which are unpredictable and not captured in historical data alone, contribute to the inherent difficulty of this task. The 58% accuracy suggests that while ANNs can identify certain patterns, there remains substantial room for improvement.

The results of this study highlight both the potential and limitations of using ANNs for stock market prediction. While the model demonstrates some predictive capability, achieving higher accuracy will likely require integrating additional data sources and exploring alternative machine learning techniques. Future research will focus on incorporating macroeconomic indicators, sentiment analysis from news and social media, and advanced feature engineering to enhance the model's performance.

In conclusion, this research contributes to the ongoing exploration of machine learning applications in financial forecasting. By demonstrating the feasibility of using ANNs for stock market prediction and identifying areas for improvement, we provide valuable insights for future research and the development of more accurate and reliable prediction models. Our ultimate goal is to advance the field of financial forecasting and support more informed investment decisions.

5. Future Scope

The future scope of using Artificial Neural Networks (ANNs) for stock market prediction is vast and promising, with numerous avenues for enhancing predictive accuracy and robustness. One significant area of future research involves integrating additional data sources. Incorporating macroeconomic indicators, such as interest rates, inflation rates, and GDP growth, can provide a more comprehensive understanding of the factors influencing stock prices. Additionally, sentiment analysis of news articles, financial reports, and social media posts can offer insights into market sentiment and investor behavior, which are crucial for making more informed predictions.

Another promising direction is the exploration of advanced machine learning techniques beyond ANNs. Ensemble methods, such as Random Forests or Gradient Boosting Machines, can combine the strengths of multiple models to improve prediction performance. Recurrent Neural Networks (RNNs), particularly Long Short-Term Memory (LSTM) networks, are well-suited for time-series forecasting due to their ability to capture temporal dependencies more effectively than traditional ANNs.

Furthermore, advancements in feature engineering and selection can enhance model performance. Techniques such as Principal Component Analysis (PCA) or feature importance analysis can help identify the most relevant features, reducing noise and improving the model's predictive capability. Hyperparameter optimization methods, like grid search or Bayesian optimization, can also be employed to fine-tune the model for better accuracy.

Finally, the application of deep learning models and hybrid approaches that combine different machine learning techniques may offer significant improvements. Continuous learning models that adapt to new data in real-time could provide more accurate and timely predictions, making them highly valuable in the fast-paced stock market environment.

Overall, these advancements hold the potential to significantly improve the accuracy and reliability of stock market prediction models, contributing to more informed investment strategies and better financial decision-making.
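
The grid search idea mentioned in this section can be sketched in a few lines. The example below is a minimal, dependency-free illustration, not the study's implementation: it swaps the ANN for a simple exponential smoothing forecaster (a simplified relative of the Holt-Winters method in Section 2.1.3) so that it runs with the Python standard library alone. The function names, the candidate alpha values, the 60-point validation window, and the synthetic random-walk series are all assumptions made for illustration.

```python
import random


def ses_one_step_forecasts(series, alpha):
    """One-step-ahead forecasts from simple exponential smoothing.

    The i-th forecast predicts series[i + 1] using only series[:i + 1].
    The first observation seeds the level.
    """
    level = series[0]
    forecasts = []
    for value in series[1:]:
        forecasts.append(level)                      # predict before seeing `value`
        level = alpha * value + (1 - alpha) * level  # then update the level
    return forecasts


def validation_mse(series, alpha, n_validation):
    """Mean squared error of the forecasts over the last `n_validation` points."""
    forecasts = ses_one_step_forecasts(series, alpha)
    actual = series[1:]
    pairs = zip(actual[-n_validation:], forecasts[-n_validation:])
    errors = [(a - f) ** 2 for a, f in pairs]
    return sum(errors) / len(errors)


def grid_search(series, alphas, n_validation):
    """Score every candidate alpha and return (best_alpha, scores)."""
    scores = {a: validation_mse(series, a, n_validation) for a in alphas}
    best_alpha = min(scores, key=scores.get)
    return best_alpha, scores


# Synthetic random-walk "prices" (illustrative only, not market data).
rng = random.Random(0)
prices = [100.0]
for _ in range(299):
    prices.append(prices[-1] + rng.gauss(0, 1))

best_alpha, scores = grid_search(
    prices, alphas=[0.1, 0.3, 0.5, 0.7, 0.9], n_validation=60
)
for alpha, mse in scores.items():
    print(f"alpha={alpha:.1f}  validation MSE={mse:.4f}")
print("best alpha:", best_alpha)
```

Each candidate is scored only on forecasts made from earlier observations, which keeps the validation time-ordered. With scikit-learn, a comparable search over ANN settings such as hidden layer sizes could be expressed with GridSearchCV and TimeSeriesSplit.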

References

1. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.

2. Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time Series Analysis: Forecasting and Control. Wiley.

3. Fama, E. F. (1970). Efficient Capital Markets: A Review of Theory and Empirical Work. The Journal of Finance, 25(2), 383-417.

4. Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18(7), 1527-1554.

5. Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735-1780.

6. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep Learning. Nature, 521(7553), 436-444.

7. Makridakis, S., Wheelwright, S. C., & Hyndman, R. J. (1998). Forecasting: Methods and Applications. Wiley.

8. McNelis, P. D. (2005). Neural Networks in Finance: Gaining Predictive Edge in the Market. Academic Press.

9. Ng, A. Y. (2011). Sparse Autoencoder. In CS294A Lecture Notes. Stanford University.

10. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning Representations by Back-Propagating Errors. Nature, 323(6088), 533-536.

11. Shumway, R. H., & Stoffer, D. S. (2017). Time Series Analysis and Its Applications: With R Examples. Springer.

12. Zhang, G. P. (2003). Time Series Forecasting Using a Hybrid ARIMA and Neural Network Model. Neurocomputing, 50, 159-175.
Yfinance API Documentation. (n.d.).

13. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830.

14. Patel, J., Shah, S., Thakkar, P., & Kotecha, K. (2015). Predicting Stock and Stock Price Index Movement Using Trend Deterministic

15. Data Preparation and Machine Learning Techniques. Expert Systems with Applications, 42(1), 259-268.
