Paper132024 6 2AlgorithmicTradingStrategiesJBMS-1
4 authors, including:
Md Rokibul Hasan
Gannon University
All content following this page was uploaded by Md Rokibul Hasan on 20 April 2024.
| RESEARCH ARTICLE
| ABSTRACT
In recent years, algorithmic trading has become increasingly prevalent in the American stock market. The principal
objective of this research was to explore the use of machine learning frameworks in formulating algorithmic trading
strategies tailored for the US stock market. For this investigation, an array of software tools was employed, comprising the
Python programming language, the Pandas library for data manipulation and analysis, the Scikit-learn library for machine
learning algorithms and evaluation metrics, and the LIME library for explainable AI. In this study, the researcher gathered an
extensive dataset of Amazon's historical stock prices, spanning from October 19, 2018, to October 16, 2022. The dataset
comprised a wide range of parameters related to Amazon's stock, facilitating a rigorous analysis of its market performance.
Six models were evaluated in the experiment, namely Ridge Regression, Ada-Boost, Light-GBM, XG-Boost, Linear Regression,
and Cat-Boost. The experimental results showed that XG-Boost attained the highest R-squared (99.24%) and
accuracy (99.23%) among all the algorithms. From these results, the analyst inferred that XG-Boost was able to learn a
more complex and accurate model of the stock data than the other algorithms. The XG-Boost algorithm can be
used to back-test distinct trading strategies on historical data, enabling investors to evaluate their effectiveness before risking
real capital. By assessing a wide array of factors, the XG-Boost algorithm can assist investors in selecting stocks with a higher
probability of outperforming the market.
| KEYWORDS
Algorithmic Strategies; Stock Market; Machine Learning; Python; Ridge Regression; Ada-Boost; Linear Regression; Cat-Boost;
Light GBM.
| ARTICLE INFORMATION
ACCEPTED: 01 April 2024 PUBLISHED: 20 April 2024 DOI: 10.32996/jbms.2024.6.2.13
1. Introduction
According to IJSRCSEIT (2022), the advent of algorithmic trading has transformed financial markets, enabling traders
to execute large volumes of transactions with exceptional efficiency and speed. Within this spectrum, machine learning algorithms
have proven to be instrumental tools for devising trading strategies that can exploit market inefficiencies and generate alpha.
By drawing on large volumes of historical data and modern computational methods, machine learning makes it possible to
uncover complex relationships and patterns that traditional techniques may overlook. The prime focus of this research is to
explore the use of machine learning frameworks in formulating algorithmic trading strategies tailored for the US stock
market.
IRJET Journal (2022) indicates that in recent years, algorithmic trading has become increasingly prevalent in the American
stock market. Also known as black-box trading or automated trading, algorithmic trading entails using computers to execute
trades based on machine learning models or predefined quantitative rules. The objective of algorithmic trading is to use
advanced data analytics and computing power to pinpoint profitable trading opportunities and trends that may be too complicated
or too rapid for human traders to recognize and act upon. While algorithmic trading has existed for a few decades, recent
advances in machine learning and the large volume of financial data now available have opened new opportunities to build even
more sophisticated automated trading strategies.
Copyright: © 2024 the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons
Attribution (CC-BY) 4.0 license (https://creativecommons.org/licenses/by/4.0/). Published by Al-Kindi Centre for Research and Development,
London, United Kingdom.
JBMS 6(2): 132-143
2. Literature Review
The scholarly literature covers a wide range of stock trading techniques. Algorithmic trading, however, is a relatively
recent development, and the use of machine learning in algorithmic trading has emerged
as a prime focus of modern academic research. Previous research has extensively examined the use of machine learning in financial
markets, with a particular focus on algorithmic trading strategies. Various studies have demonstrated the effectiveness of machine
learning models in forecasting stock prices, identifying trading signals, and optimizing portfolio allocations. For instance, Salim
(2021) employed deep learning techniques to predict stock returns, achieving promising results compared to traditional methods.
Similarly, Ghania et al. (2019) used reinforcement learning algorithms to build adaptive trading strategies that outperformed
conventional methods. These findings underscore the capacity of machine learning to strengthen
algorithmic trading systems and generate alpha in highly competitive markets.
Nabipour et al. (2020) investigated stock market forecasting using regression methods and recommended a promising
regression technique for predicting stock market prices based on market data. The authors indicated that the multiple
regression technique could be improved in future by integrating a greater number of factors. The prime goal of their research
was to help stock traders and investors make strategic decisions when investing in the stock market. Given the complex
and dynamic nature of the stock market, accurate forecasting plays a significant role in this intricate and challenging process.
Research undertaken by Rashid (2022) used the Random Forest algorithm to design a predictive model for the
5-day-ahead and 10-day-ahead movements of the CROBEX index and selected stocks. The outcomes of that study illustrate the
successful use of random forests in designing predictive models for expected trends in the stock market.
Awais (2019), on the other hand, developed and compared three models for
forecasting the direction of movement of the daily Tehran Stock Exchange (TSE) index. The models were based on three
classification algorithms, namely Random Forest, Decision Tree, and the Naïve Bayes classifier. The investigators inferred that
technical analysis played a more important role than fundamental analysis in the decision-making process of stakeholders and
brokers.
Reddy & Sai (2018) designed a forecasting model based on the Backpropagation Neural Network (BPNN) and the K-Nearest
Neighbors (KNN) algorithms. The model was used to forecast the prices of Chinese stocks, and the test
results showed that the average errors of the combined KNN-ANN model were smaller than those of the KNN model alone.
Their findings suggested that the forecasting model based on KNN-ANN outperforms the KNN algorithm in stock
forecasting.
3. Methodology
For this investigation, an array of software tools was employed, comprising the Python programming language, the Pandas library
for data manipulation and analysis, the Scikit-learn library for machine learning algorithms and evaluation metrics, and the LIME
library for explainable AI. Python was selected because of its versatility, simplicity, rich variety of machine learning libraries, and
comprehensive capabilities for data analysis (proAIrokibul, 2024). In addition, LIME was integrated to improve the interpretability
of the machine learning models, giving the analyst insight into how individual predictions were generated.
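The idea behind a LIME-style explanation can be sketched without the LIME library itself: perturb one instance, weight the perturbed points by their proximity to it, and fit a weighted linear surrogate whose coefficients act as local feature effects. The function name, the Gaussian perturbation, the kernel width, and the toy black-box model below are all assumptions made for illustration; this is not the LIME API and not the authors' code.

```python
import numpy as np


def local_linear_explanation(predict_fn, instance, n_samples=500, scale=0.1,
                             kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance."""
    rng = np.random.default_rng(seed)
    # Perturb the instance with Gaussian noise to probe the model locally.
    samples = instance + rng.normal(0.0, scale, size=(n_samples, instance.size))
    preds = predict_fn(samples)
    # Weight each perturbed point by its closeness to the original instance.
    dists = np.linalg.norm(samples - instance, axis=1)
    weights = np.exp(-(dists ** 2) / kernel_width ** 2)
    # Weighted least squares with an intercept column.
    design = np.hstack([np.ones((n_samples, 1)), samples])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], preds * sw, rcond=None)
    return coef[1:]  # one local effect per feature


def black_box(X):
    """Hypothetical model to explain: a known linear function, so the result is checkable."""
    return 3.0 * X[:, 0] - 2.0 * X[:, 1]


effects = local_linear_explanation(black_box, np.array([1.0, 2.0]))
print(effects)
```

Because the toy model is exactly linear, the recovered effects are close to 3 and -2; for a non-linear model the coefficients would describe only the neighbourhood of the chosen instance.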
3.1 Dataset
In this study, the researcher gathered an extensive dataset of Amazon's historical stock prices, spanning from October 19, 2018, to
October 16, 2022. The dataset comprised a wide range of parameters related to Amazon's stock, facilitating a rigorous analysis
of its market performance (proAIrokibul, 2024). The Amazon stock dataset comprised the following parameters:
1. Date: The trading date of each record, facilitating chronological analysis.
2. Open Price: The opening price of Amazon's stock at the start of a trading session.
3. Close Price: The closing price of Amazon's stock at the end of a trading session.
4. High Price: The highest price attained by Amazon's stock during a trading session.
5. Low Price: The lowest price reached by Amazon's stock during a trading session.
6. Volume: The total number of shares traded during a particular period, providing insight into investor interest and
market liquidity.
Algorithmic Trading Strategies: Leveraging Machine Learning Models for Enhanced Performance in the US Stock Market.
7. Adjusted Close Price: The closing price adjusted for factors such as stock splits, dividends, and other corporate
actions, providing a more accurate representation of stock performance.
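A minimal sketch of loading data with these seven fields into Pandas is given here. The column names and the three sample rows are hypothetical stand-ins (the values are made up), since the exact file layout is not shown in the text.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the layout described above; values are made up.
raw_csv = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2018-10-23,88.0,89.5,86.9,88.4,88.4,1200000\n"
    "2018-10-19,89.2,90.5,88.0,88.2,88.2,1000000\n"
    "2018-10-22,88.5,89.9,87.8,89.5,89.5,900000\n"
)

# Parse the dates and sort chronologically so later steps see an ordered time series.
df = pd.read_csv(raw_csv, parse_dates=["Date"])
df = df.sort_values("Date").set_index("Date")

# Basic sanity check: the daily high can never be below the daily low.
assert (df["High"] >= df["Low"]).all()
print(df)
```

Sorting by date before any modelling step matters for time series, because a chronological train and test split depends on row order.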
3.2 Pre-Processing
Data preprocessing entailed applying different techniques to cleanse the collected data. In this research, the analyst employed the
MinMaxScaler technique, which is imported from the scikit-learn library. The MinMaxScaler transforms
features by scaling each one to a given range (proAIrokibul, 2024). This estimator scales and
translates each feature independently so that it falls within the target range, typically between zero and one, on the training set.
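As an illustration of this scaling step, the sketch below fits scikit-learn's MinMaxScaler on a training slice and reuses the learned minimum and maximum on later data. The prices and the chronological split are hypothetical; only the scaler itself comes from the text.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical closing prices, split chronologically into training and test parts.
prices = np.array([[10.0], [12.0], [14.0], [20.0], [22.0]])
train, test = prices[:4], prices[4:]

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)  # minimum and maximum learned from train only
test_scaled = scaler.transform(test)        # reuses the training minimum and maximum

print(train_scaled.ravel())
print(test_scaled.ravel())
```

Since the range is learned from the training set alone, a later price above the training maximum maps to a value above one, which is expected behaviour rather than an error.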
Ada-Boost Model
AdaBoost is an ensemble method that combines weak learners into a strong one through a specialized boosting
procedure. Ada-Boost aims to improve the accuracy of weak learners by sequentially correcting
their earlier prediction errors (Saifan, 2020). This meta-estimator fits a model to the original dataset and then
fits additional copies of that model to the same dataset. Because the sample weights are adjusted according to the current
prediction error, the training process makes the model concentrate on the most challenging data points.
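A small sketch of this behaviour, assuming scikit-learn's AdaBoostRegressor with its default tree base learner and a synthetic noisy sine curve in place of the stock data (the data, seed, and number of estimators are illustrative choices, not the authors' settings):

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: a noisy sine curve.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 6.0, size=200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0.0, 0.1, size=200)

# One shallow tree on its own, for reference.
single_tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# AdaBoost fits a sequence of such trees, re-weighting the samples after each
# round according to the current prediction error.
boosted = AdaBoostRegressor(n_estimators=100, random_state=0).fit(X, y)

print("single tree R^2:", single_tree.score(X, y))
print("AdaBoost R^2:   ", boosted.score(X, y))
print("rounds actually fitted:", len(boosted.estimators_))
```

Both scores are computed on the training data, so they show how well each model fits the sample, not how well it would generalize to unseen prices.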
Ridge Regression Model
Ridge regression is a method of estimating the coefficients of multiple regression models in scenarios where the
independent variables are highly correlated. Ridge regression is particularly useful for mitigating the problem of
multicollinearity in linear regression, which commonly occurs in models with large numbers of parameters. The technique offers
improved efficiency in parameter estimation problems in exchange for a tolerable amount of bias (Sanyal, 2022). The ridge
estimator is the solution to the least squares problem subject to the constraint that the sum of the squares of the
coefficients is less than a constant. The ridge parameter, which regulates the strength of the penalty term, is normally selected
by a heuristic criterion.
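The constrained problem described above is equivalent to a penalized least squares problem, whose solution has the closed form w = (XᵀX + αI)⁻¹Xᵀy. The sketch below applies that formula to two nearly identical synthetic predictors; the data and the choice α = 1 are assumptions for illustration, and no intercept is fitted.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge solution w = (X^T X + alpha * I)^-1 X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)  # nearly a copy of x1: multicollinearity
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.1, size=100)

w_ols = ridge_fit(X, y, alpha=0.0)    # ordinary least squares
w_ridge = ridge_fit(X, y, alpha=1.0)  # penalized fit

print("OLS coefficients:  ", w_ols)
print("ridge coefficients:", w_ridge)
```

With correlated predictors the unpenalized coefficients can swing far from the true values of one each, while the penalty pulls the ridge coefficients toward a smaller, more stable solution at the cost of a little bias.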
Linear Regression Model
Linear regression is a statistical technique used in machine learning and data science for predictive analysis. It
models a linear relationship between an independent variable (explanatory or predictor variable), which is not affected by
changes in the other variables, and a dependent variable (outcome or response variable), which changes with the independent
variable (Umer, 2019). The regression model predicts the value of the dependent variable, which is the outcome or response being analyzed.
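As a worked example of such a fit, the sketch below recovers the slope and intercept of a straight line by least squares and then predicts a new value. The data are hypothetical and generated from y = 2x + 1 so the result is easy to verify.

```python
import numpy as np

# Hypothetical data generated from y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix: an intercept column plus the single predictor.
design = np.column_stack([np.ones_like(x), x])
(intercept, slope), *_ = np.linalg.lstsq(design, y, rcond=None)

# Predict the response for a new value of the predictor.
prediction = intercept + slope * 5.0
print(intercept, slope, prediction)
```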
XG-Boost Model
XG-Boost is a scalable and efficient implementation of gradient boosting for regression predictive modeling. It is an
open-source library that provides an effective and efficient implementation of the gradient boosting algorithm, a form of
ensemble machine learning that can be used for classification or regression predictive modeling (Sanyal, 2022).
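The core loop of gradient boosting for squared error can be sketched in a few lines: each new tree is fitted to the residuals of the current ensemble. The sketch uses scikit-learn's DecisionTreeRegressor as a stand-in base learner on synthetic data; it is plain gradient boosting, not the XG-Boost library, which adds a regularized objective and system-level optimizations on top of this idea.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def boost(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    """Gradient boosting for squared error: each tree fits the current residuals."""
    prediction = np.full(len(y), y.mean())  # start from the mean of the target
    trees, history = [], []
    for _ in range(n_rounds):
        residuals = y - prediction          # negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        prediction = prediction + learning_rate * tree.predict(X)
        trees.append(tree)
        history.append(np.mean((y - prediction) ** 2))
    return trees, history


# Synthetic stand-in data: a noisy parabola.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(0.0, 0.2, size=300)

trees, mse_history = boost(X, y)
print("training MSE after first round:", mse_history[0])
print("training MSE after last round: ", mse_history[-1])
```

The training error shrinks round by round because every tree corrects part of what the previous ones missed; the learning rate controls how much of each correction is applied.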
Light-GBM Model
Light-GBM is a distributed, high-performance gradient-boosting framework used for a variety of machine-learning tasks. It is
known for its speed and its ability to handle large datasets efficiently (Saifan, 2020). Light-GBM employs
histogram-based learning to accelerate training while preserving accuracy, making it a popular choice among data
analysts for tasks such as regression, classification, and ranking.
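Histogram-based learning can be illustrated with a simplified split search: feature values are grouped into a small number of bins, and candidate thresholds are scanned only at bin boundaries instead of at every distinct value. The function name, the quantile binning, and the synthetic step-shaped target are assumptions for illustration; this is not Light-GBM's actual implementation.

```python
import numpy as np


def best_histogram_split(feature, target, n_bins=16):
    """Pick a split threshold by scanning bin boundaries instead of every value."""
    # Interior bin edges from quantiles, so bins hold roughly equal counts.
    edges = np.quantile(feature, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bins = np.digitize(feature, edges)  # bin index (0 .. n_bins - 1) per sample
    counts = np.bincount(bins, minlength=n_bins)
    sums = np.bincount(bins, weights=target, minlength=n_bins)

    total_count, total_sum = counts.sum(), sums.sum()
    best_gain, best_edge = -np.inf, None
    left_count = left_sum = 0.0
    for b in range(n_bins - 1):  # candidate split: bins 0..b go left
        left_count += counts[b]
        left_sum += sums[b]
        right_count = total_count - left_count
        if left_count == 0 or right_count == 0:
            continue
        right_sum = total_sum - left_sum
        # Maximizing this quantity minimizes the squared error of the two halves.
        gain = left_sum ** 2 / left_count + right_sum ** 2 / right_count
        if gain > best_gain:
            best_gain, best_edge = gain, edges[b]
    return best_edge


rng = np.random.default_rng(0)
feature = rng.uniform(0.0, 10.0, size=1000)
target = (feature > 5.0).astype(float)  # step at 5, so the ideal split is near 5
split = best_histogram_split(feature, target)
print(split)
```

With 16 bins the search evaluates 15 thresholds rather than up to 999, which is the source of the speed-up; the price is that the chosen threshold is only as precise as the bin edges.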
Cat-Boost Model
Cat-Boost is a well-known gradient-boosting model that handles both numerical and categorical features efficiently, copes with
missing values, and combats overfitting through various methods (Salim, 2021). Its GPU-accelerated version and dynamic classifier
class make it a prominent choice for regression and classification tasks with large datasets.
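One way categorical features can be turned into numbers without leaking a row's own target is an ordered target statistic: each row is encoded from the targets of earlier rows in the same category, smoothed by a prior. The sketch below shows that general idea on a tiny made-up example; the function name, the prior weight, and the use of the given row order are assumptions, and it is not Cat-Boost's own code.

```python
import numpy as np


def ordered_target_encode(categories, target, prior, prior_weight=1.0):
    """Encode each row using only the targets of earlier rows in the same category."""
    sums, counts = {}, {}
    encoded = np.empty(len(categories))
    for i, (cat, y) in enumerate(zip(categories, target)):
        s, c = sums.get(cat, 0.0), counts.get(cat, 0)
        # Smoothed mean of the earlier targets; falls back to the prior when none exist.
        encoded[i] = (s + prior_weight * prior) / (c + prior_weight)
        # Update the running statistics only after the row has been encoded.
        sums[cat], counts[cat] = s + y, c + 1
    return encoded


# Hypothetical categorical feature and numeric target.
cats = ["a", "b", "a", "a", "b"]
ys = np.array([1.0, 0.0, 3.0, 2.0, 4.0])
encoded = ordered_target_encode(cats, ys, prior=ys.mean())
print(encoded)
```

The Amazon price data used in this study is purely numerical, so this step would matter only if categorical inputs such as a sector label or weekday were added.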
3.5 Experimentation
Importing Libraries
In the resulting data frame, every row was arranged to portray the distribution of the stock price for a
specific month. In particular, for every month, the row displayed the price range of the stock, starting from the
lowest price recorded in the Amazon stock data. By organizing the data in that format, the analyst obtained insights into the
behavior of stock price distributions within each month.
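A per-month summary of this kind could be produced with a Pandas group-by, as in the sketch below. The original code is not reproduced in the text, so the column name, the chosen statistics, and the five sample prices are assumptions for illustration.

```python
import pandas as pd

# Hypothetical daily closing prices spanning two months; values are made up.
df = pd.DataFrame(
    {"Close": [100.0, 104.0, 98.0, 110.0, 107.0]},
    index=pd.to_datetime(
        ["2018-10-19", "2018-10-22", "2018-10-31", "2018-11-01", "2018-11-15"]
    ),
)

# One row per calendar month with the lowest, highest, and average closing price.
monthly = df["Close"].groupby(df.index.to_period("M")).agg(["min", "max", "mean"])
monthly["range"] = monthly["max"] - monthly["min"]
print(monthly)
```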
To visualize the distribution of the target variable, a code snippet was applied to generate a histogram of the target
variable.
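The original plotting code and chart are not reproduced here. As a minimal text-only sketch of the same idea, the snippet below bins a handful of hypothetical closing prices with NumPy and prints one bar per bin; the prices and the choice of four bins are assumptions.

```python
import numpy as np

# Hypothetical closing prices standing in for the target variable.
close = np.array([88.2, 89.5, 88.4, 91.0, 95.3, 94.8, 97.1, 99.9])

# Count how many prices fall into each of four equal-width bins.
counts, edges = np.histogram(close, bins=4)
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"{left:7.2f} to {right:7.2f}: {'#' * int(count)}")
```

The same counts and edges can be passed to a plotting library to draw the chart.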
Apart from the distribution of the target variable, the analyst also generated the distribution of the price ratio.
3.7 Interpretation
From the above table and chart, it was evident that XG-Boost attained the highest R-squared (99.24%) and accuracy (99.23%)
among all the algorithms. From these results, the analyst inferred that XG-Boost was able to learn a more complex and
accurate model of the stock data than the other algorithms. Linear regression came second,
achieving an R-squared of 98.02% and an accuracy of 98.04%. Overall, the results indicated that XG-Boost was the best-performing
algorithm on this particular dataset.
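For reference, R-squared is the share of the target's variance that the predictions explain, computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below implements that standard formula on four hypothetical prices; how the reported accuracy figure was computed is not stated in the text, so it is not reproduced here.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted closing prices.
actual = [100.0, 102.0, 104.0, 106.0]
predicted = [101.0, 102.0, 104.0, 105.0]
score = r_squared(actual, predicted)
print(score)
```

Here the residual sum of squares is 2 and the total sum of squares is 20, giving an R-squared of 0.9.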
4. Business Impact
4.1 Benefits for the Stock Investors
1. Enhanced Stock Price Signal Identification: XG-Boost's accuracy in identifying complex patterns within large volumes of
data can help uncover subtle signals in financial indicators, historical stock price data, and news sentiment. Consequently, this
can assist investors in spotting potentially profitable opportunities that might be missed by simpler analysis methods.
2. Enhanced Stock Selection: By assessing a wide array of factors, the XG-Boost algorithm can assist investors in selecting stocks
with a higher probability of outperforming the market. This can entail considering mainstream financial ratios, company
news, social media sentiment, and other data points to develop a more robust picture of a stock's potential.
3. Back-testing and Scenario Planning: The XG-Boost algorithm can be used to back-test distinct trading strategies on
historical data, enabling investors to evaluate their effectiveness before risking real capital. Furthermore, the algorithm can be
used for scenario planning, simulating possible market movements under various economic conditions.
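The back-testing idea in the list above can be sketched with a simple long-only rule: hold the stock for the next day only when the model's forecast for that day is above today's price, then compare the result with buying and holding. The prices and forecasts are hypothetical, the function name is illustrative, and transaction costs and slippage are ignored.

```python
import numpy as np


def backtest_long_only(prices, predicted_next):
    """Hold the stock for a day only when the model predicts a higher next price."""
    prices = np.asarray(prices, dtype=float)
    predicted_next = np.asarray(predicted_next, dtype=float)
    daily_returns = prices[1:] / prices[:-1] - 1.0
    # The signal for day t uses only the forecast and price available at day t.
    in_market = predicted_next[:-1] > prices[:-1]
    strategy_returns = np.where(in_market, daily_returns, 0.0)
    strategy_total = np.prod(1.0 + strategy_returns) - 1.0
    buy_hold_total = prices[-1] / prices[0] - 1.0
    return strategy_total, buy_hold_total


# Hypothetical prices and next-day forecasts; the last forecast is never used.
prices = [100.0, 110.0, 99.0, 108.9]
predicted_next = [105.0, 100.0, 104.0, 0.0]
strategy_total, buy_hold_total = backtest_long_only(prices, predicted_next)
print(f"strategy: {strategy_total:.3f}, buy and hold: {buy_hold_total:.3f}")
```

In this toy series the rule sits out the one losing day and earns 21% against 8.9% for buy and hold; on real data the forecasts would come from a model trained only on earlier dates, otherwise the back-test would be overly optimistic.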
investors can be more informed on investment decisions, which, in turn, can result in a more accurate allocation of capital for
valuable organizations.
2. Diminished Market Volatility: XG-Boost's capability to identify hidden associations and forecast future patterns can assist
investors in anticipating and responding more strategically to market fluctuations. Consequently, this could lead to more stable
overall market movements and possibly reduce the effect of flash crashes or irrational exuberance.
3. Increased Investor Confidence: As XG-Boost equips investors with better decision-making tools, foreign
investors might be encouraged to invest in the market. This could lead to more long-term investments and potentially fuel
US economic growth.
5. Conclusion
The prime focus of this research was to explore the use of machine learning frameworks in formulating algorithmic trading
strategies tailored for the US stock market. For this investigation, an array of software tools was employed, comprising the Python
programming language, the Pandas library for data manipulation and analysis, the Scikit-learn library for machine learning
algorithms and evaluation metrics, and the LIME library for explainable AI. In this study, the researcher gathered an extensive dataset
of Amazon's historical stock prices, spanning from October 19, 2018, to October 16, 2022. The dataset comprised a wide range of
parameters related to Amazon's stock, facilitating a rigorous analysis of its market performance. Six models were evaluated
in the experiment, namely Ridge Regression, Ada-Boost, Light-GBM, XG-Boost, Linear Regression, and Cat-Boost. The experimental results
showed that XG-Boost attained the highest R-squared (99.24%) and accuracy (99.23%) among all the algorithms. From
these results, the analyst inferred that XG-Boost was able to learn a more complex and accurate model of the stock
data than the other algorithms. The XG-Boost algorithm can be used to back-test distinct trading strategies on
historical data, enabling investors to evaluate their effectiveness before risking real capital. By assessing a wide array of factors, the
XG-Boost algorithm can assist investors in selecting stocks with a higher probability of outperforming the market.
References
[1] Awais, M. (2019). Stock market prediction using Machine Learning (ML) Algorithms. www.academia.edu.
https://www.academia.edu/84023752/Stock_Market_Prediction_Using_Machine_Learning_ML_Algorithms?sm=b
[2] Ghania, M. U., Awaisa, M., & Muzammula, M. (2019). Stock market prediction using machine learning (ML) algorithms. ADCAIJ: Adv Distrib
Comput Artif Intell, 8(4), 97-116.
[3] IJSRCSEIT. (2022). Stock prediction using machine learning algorithms. International Journal of Scientific Research in Computer Science,
Engineering and Information Technology. Technoscienceacademy.
https://www.academia.edu/91042041/Stock_Prediction_Using_Machine_Learning_Algorithms?sm=b
[4] Nabipour, M., Nayyeri, P., Jabani, H., Shahab, S., & Mosavi, A. (2020). Predicting stock market trends using machine learning and deep
learning algorithms via continuous and binary data; a comparative analysis. IEEE Access, 8, 150199-150212.
[5] Rashid, M. (2022). Predicting stock market using machine learning algorithms. International Journal of Scientific Research in Science,
Engineering and Technology. Technoscienceacademy.
https://www.academia.edu/91050428/Predicting_Stock_Market_Using_Machine_Learning_Algorithms?sm=b
[6] Reddy, V. K. S., & Sai, K. (2018). Stock market prediction using machine learning. International Research Journal of Engineering and
Technology (IRJET), 5(10), 1033-1035.
[7] Salim, S. (2021). PREDICTING STOCK MARKET USING MACHINE LEARNING ALGORITHMS. www.academia.edu.
https://www.academia.edu/44896464/PREDICTING_STOCK_MARKET_USING_MACHINE_LEARNING_ALGORITHMS?sm=b
[8] IRJET Journal. (2019). Prediction of Stock Market using Machine Learning Algorithms. Irjet.
https://www.academia.edu/40043469/IRJET_Prediction_of_Stock_Market_using_Machine_Learning_Algorithms?sm=b
[9] IRJET Journal. (2022). STOCK MARKET PREDICTION AND ANALYSIS USING MACHINE LEARNING ALGORITHMS. Irjet.
https://www.academia.edu/86783381/STOCK_MARKET_PREDICTION_AND_ANALYSIS_USING_MACHINE_LEARNING_ALGORITHMS?sm=b
[10] proAIrokibul. (n.d.). Stock-Price-Analysis-And-Prediction/Model/main.ipynb at main · proAIrokibul/Stock-Price-Analysis-And-Prediction.
GitHub. https://github.com/proAIrokibul/Stock-Price-Analysis-And-Prediction/blob/main/Model/main.ipynb
[11] Saifan, R. (2020). Investigating Algorithmic Stock Market Trading using Ensemble Machine Learning Methods. www.academia.edu.
https://www.academia.edu/81513651/Investigating_Algorithmic_Stock_Market_Trading_using_Ensemble_Machine_Learning_Methods?sm=b
[12] Sanyal, S. (2022). Stock Market Prediction using Machine Learning Algorithms. www.academia.edu.
https://www.academia.edu/68861133/Stock_Market_Prediction_using_Machine_Learning_Algorithms?sm=b
[13] Umer, M. (2019). Stock market prediction using Machine Learning (ML) Algorithms. www.academia.edu.
https://www.academia.edu/77318461/Stock_Market_Prediction_Using_Machine_Learning_ML_Algorithms?sm=b