
STOCK PRICE PREDICTION

AND VALIDATION

A Project Report

Submitted by

ABISHEK T (20IT001)
MAHESWARI S K (21IT029)
THANGAVARATHAN G (21IT051)

In partial fulfillment for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY

DEPARTMENT OF INFORMATION TECHNOLOGY


NANDHA ENGINEERING COLLEGE
(An Autonomous Institution, Affiliated to Anna University, Chennai)
ERODE – 638 052
NOV 2024
NANDHA ENGINEERING COLLEGE
(An Autonomous Institution, Affiliated to Anna University, Chennai)

BONAFIDE CERTIFICATE

Certified that this design project report “STOCK PRICE PREDICTION AND
VALIDATION” is the Bonafide work of

ABISHEK T 20IT001
MAHESWARI S K 21IT029
THANGAVARATHAN G 21IT051
who carried out the project work under my supervision. Further, to the best of
my knowledge, the work reported herein does not form part of any other project
work on the basis of which a degree or award was conferred on an earlier occasion
on this or any other candidate.

SIGNATURE SIGNATURE
Mr. R.RAGUNATH, M.E., Dr. G.R. SREEKANTH, M.E., Ph.D.,
ASSISTANT PROFESSOR, PROFESSOR & HEAD,
DEPARTMENT OF IT, DEPARTMENT OF IT,
NANDHA ENGINEERING COLLEGE, NANDHA ENGINEERING COLLEGE,

ERODE 638 052. ERODE 638 052.

Submitted for the End Semester Project Viva-Voce Examination held on …………

Internal Examiner External Examiner


ACKNOWLEDGEMENT

First and foremost, we offer our heartfelt thanks to our parents for giving us the
opportunity to pursue this engineering course.

We wish to express profound gratitude to Thiru. V. Shanmugan, Chairman,
Sri Nandha Educational Trust, Thiru. S. Nandhakumar Pradeep, Secretary, Sri
Nandha Educational Trust and Thiru. S. Thirumoorthi, Secretary, Nandha
Educational Institutions for providing opportunities in all possible ways for our
improvement.

We wish to convey our gratefulness to our cherished Principal,
Dr.U.S.Ragupathy, PhD for his strong support and motivation towards a great level
of success in our career.

We take this opportunity to express our thanks to our beloved HOD,
Dr.G.R.Sreekanth, PhD for his help and encouragement.

We articulate our genuine thanks to our Project Co-Ordinator,
Dr.C.Siva, PhD, Dean (Professor), who has been the key source of motivation for
us throughout the completion of our course and project work.

We sincerely thank our supervisor, Mr. R. Ragunath, M.E., Assistant
Professor, Department of Information Technology, for his valuable guidance to
complete this project.

We express our sincere thanks to all Information Technology Department
Faculty Members for their help in completing this project.
TABLE OF CONTENTS

CHAPTER NO. TITLE

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS AND ABBREVIATIONS

1 INTRODUCTION
1.1 GENERAL
1.2 NEED FOR STUDY
1.3 OBJECTIVES OF THE STUDY

2 LITERATURE SURVEY

3 SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
3.2 PROBLEM FOUND
3.3 PROPOSED WORK
3.4 ADVANTAGES

4 PROPOSED SYSTEM
4.1 PROPOSED SYSTEM
4.2 SYSTEM FLOW
4.3 SYSTEM ARCHITECTURE
4.4 MODULES

5 SYSTEM SPECIFICATION
5.1 HARDWARE CONFIGURATION
5.2 SOFTWARE SPECIFICATION

6 SYSTEM FEASIBILITY STUDY
6.1 FEASIBILITY STUDY
6.2 TECHNICAL FEASIBILITY
6.3 OPERATIONAL FEASIBILITY
6.4 ECONOMICAL FEASIBILITY

7 RESULT AND DISCUSSION
7.1 MODEL PERFORMANCE EVALUATION
7.2 REALTIME PREDICTION AND VISUALIZATION
7.3 DISCUSSION OF RESULT

8 CONCLUSION AND FUTURE WORK
8.1 CONCLUSION
8.2 FUTURE WORK

9 APPENDICES
9.1 SOURCE CODE
9.2 SCREENSHOTS

10 REFERENCES

ABSTRACT

This project focuses on building a stock price prediction system using
machine learning and deep learning techniques. The goal is to help users predict
future stock prices based on historical data, making it easier for them to make
informed investment decisions. The system uses multiple algorithms, including
RandomForestRegressor, ExtraTreesRegressor, and XGBRegressor, to
provide flexible and accurate predictions.

The project consists of a backend built with FastAPI, which processes data,
trains models, and makes predictions, and a frontend built with React, which
allows users to input stock ticker symbols, select models, and view predictions
through interactive charts. Historical stock data is collected from Yahoo Finance
using the yfinance library, and the data is preprocessed to ensure accuracy.

The system is designed to provide predictions at different time intervals
(daily, hourly, or minute-based) and includes features like error handling, data
visualization, and model evaluation. By combining advanced algorithms with a
user-friendly interface, this project aims to create a reliable and efficient tool
for stock market forecasting.

The project also emphasizes scalability and adaptability, with a backend
optimized for cloud deployment and for handling multiple user requests. It
includes model performance evaluation using metrics like MAE and RMSE,
helping users choose the best model. Future enhancements, such as integrating
news sentiment and economic indicators, aim to improve prediction accuracy
and usability, making it a comprehensive stock market analysis tool.

LIST OF TABLES

TABLE NO. TITLE

1 Hardware requirements

2 Software requirements

LIST OF FIGURES

FIGURE NO. TITLE

1 Flow Diagram

2 Screen Shot

LIST OF SYMBOLS AND ABBREVIATIONS

Abbreviation/Symbol Description

MAE Mean Absolute Error

RMSE Root Mean Square Error

ML Machine Learning

API Application Programming Interface

XGB Extreme Gradient Boosting

LSTM Long Short-Term Memory

GRU Gated Recurrent Unit

SaaS Software as a Service

AWS Amazon Web Services

yfinance Yahoo Finance Library

Pandas Python Data Analysis Library

NumPy Numerical Python Library

Matplotlib Python Library for Data Visualization

FastAPI Modern Web Framework for Python Backend Development

React JavaScript Framework for Frontend Development

CHAPTER-1

INTRODUCTION

1.1 GENERAL

The stock market is a major platform for investment, attracting millions of
traders and investors worldwide. It provides opportunities to grow wealth, but it
also involves significant risks due to the volatile and unpredictable nature of
stock prices. Stock prices are influenced by various factors, such as company
performance, global economic conditions, political events, and investor
sentiment. Predicting these price movements is a complex task, but it is crucial
for making informed investment decisions.

Traditional methods for stock price prediction, such as technical analysis and
linear models, often fail to capture the intricate patterns and relationships in stock
market data. With the rapid advancements in technology, machine learning and
deep learning have emerged as powerful tools for analysing large datasets and
identifying trends. These techniques can handle non-linear relationships and
complex patterns, making them highly effective for stock price forecasting.

This project leverages machine learning and deep learning algorithms to
develop a stock price prediction system that analyzes historical stock data and
predicts future prices. It integrates advanced models like
RandomForestRegressor, ExtraTreesRegressor, and XGBRegressor to provide
accurate predictions. Additionally, the project focuses on real-time analysis.

1.2 NEED FOR STUDY

The stock market is a complex and dynamic system where prices are
influenced by numerous factors such as historical trends, investor behavior, and
economic conditions. Predicting stock prices is a crucial task for investors,
traders, and financial analysts, as it enables them to make informed decisions
and manage risks effectively.

However, due to the highly volatile and unpredictable nature of the market,
traditional methods often fall short in capturing intricate patterns. This study is
essential to explore how advanced machine learning and deep learning
techniques can be leveraged to enhance the accuracy and reliability of stock price
predictions.

With the growing availability of historical stock data and advancements in
computational power, there is a significant opportunity to use machine learning
models like RandomForest, ExtraTrees, and XGBoost for forecasting stock
prices. These models can identify non-linear relationships, analyse historical
trends, and handle large datasets with precision. By evaluating the performance
of these algorithms, this study seeks to determine the most effective approach
for predicting stock prices at different time intervals, catering to the diverse
needs of traders and investors.

1.3 OBJECTIVES OF THE STUDY

1.3.1 To Enhance the Accuracy of Stock Price Predictions


Evaluate and implement advanced machine learning and deep learning models
(e.g., RandomForest, ExtraTrees, and XGBoost) to improve the precision of stock
price forecasts.

1.3.2 To Develop a Scalable and User-Friendly Prediction System


Build a robust backend using FastAPI and an interactive frontend using React to
create a system capable of real-time predictions while handling multiple user
requests efficiently.

1.3.3 To Provide Flexible and Comprehensive Prediction Features


Enable users to forecast stock prices at different time intervals (daily,
hourly, minute-based) and visualize trends with accurate model evaluation
metrics like MAE and RMSE for better decision-making.

CHAPTER-2

LITERATURE SURVEY
Stock price prediction has been a topic of extensive research due to its
complexity and importance in financial decision-making. Traditional methods like
ARIMA and GARCH models have been widely used for time series forecasting but
often fail to capture non-linear relationships in stock price data. Breiman (2001)
introduced Random Forest, an ensemble learning algorithm that combines multiple
decision trees to reduce overfitting and improve prediction accuracy. Similarly,
Geurts et al. (2006) developed the ExtraTrees algorithm, which adds randomness
during the tree-splitting process to enhance generalization. These ensemble methods
have demonstrated strong performance in financial forecasting by capturing
complex patterns in stock price movements.

Chen and Guestrin (2016) introduced XGBoost, a gradient boosting algorithm
known for its high accuracy and efficiency. XGBoost is particularly effective for
structured data and handles missing values and feature interactions well. It has
been reported to outperform traditional statistical models in a number of
forecasting studies and is widely used for stock price prediction. Fischer and
Krauss (2018) applied machine learning to stock market prediction, comparing
LSTM networks with models such as Random Forest and logistic regression, and
showed that such methods can predict price trends better than linear baselines.

Incorporating these machine learning models into a scalable system makes
them highly practical for real-world use. Tools like FastAPI for backend processing
and React for the frontend enable the integration of these advanced algorithms into
user-friendly applications.

CHAPTER-3
SYSTEM ANALYSIS

3.1 Existing System


The current system for stock price tracking mostly offers historical data
without providing short-term predictions. Platforms like Yahoo Finance and
Google Finance allow users to view past stock prices and overall trends, but
predicting future prices requires complex tools that are typically expensive and
not beginner-friendly. Users who want insights into stock movements often rely
on manual research or costly trading software. This project addresses this gap by
creating an accessible tool that leverages machine learning to predict stock prices,
making it easy for users to get data-driven predictions through a simple web
interface.

3.2 Problem Found


Stock markets are highly volatile and unpredictable due to the influence of
various external factors such as economic conditions, political events, and
investor sentiment. This makes it challenging for models to consistently predict
stock prices with high accuracy. Complex models like LSTM and BiLSTM can
be prone to overfitting, especially when trained on limited data. This means that
while they may perform well on training data, their accuracy drops when applied
to unseen data, making them less reliable for real-world predictions. Many
existing models also have timeliness issues, meaning their predictions may not be
stable over different time periods.

3.3 Proposed Work
The proposed project aims to build a user-friendly web application that
predicts stock prices using multiple machine learning algorithms (Random
Forest, Extra Trees Regressor, and XGB Regressor). By pulling
historical stock data from the internet, the system will analyze past trends and
generate short-term predictions to help users make informed decisions. The
application will feature a simple interface where users can input a stock ticker
symbol and instantly view predictions. This tool will provide a quick, accessible
way for users to understand potential stock movements without needing
advanced technical skills or expensive software.

3.4 Advantages

✓ Improved Prediction Accuracy:

The proposed project utilizes advanced machine learning algorithms
like RandomForestRegressor, ExtraTreesRegressor, and
XGBRegressor, which are well-suited for handling non-linear relationships
and complex patterns in stock data, leading to more accurate predictions.

✓ Real-Time Predictions:

By leveraging a FastAPI backend, the system enables real-time stock
price predictions, allowing users to access up-to-date forecasts quickly and
efficiently.

✓ Scalability:

The project is designed to handle multiple user requests and supports
deployment on cloud platforms, ensuring scalability for larger user bases and
higher traffic.

✓ Flexibility in Time Intervals:

The system provides predictions at various time intervals (daily, hourly,
or minute-based), catering to the needs of different types of investors and
traders.

✓ User-Friendly Interface:

The React-based frontend offers an interactive and intuitive platform
where users can input stock ticker symbols, select models, and visualize
predictions through clear and interactive charts.

✓ Robust Data Handling:

The system preprocesses data to handle missing values, inconsistencies,
and noise, ensuring cleaner input for better model performance.

✓ Visualization and Insights:

The project includes features to visualize stock trends with actual vs.
predicted prices, making it easier for users to interpret results and make
informed decisions.

CHAPTER-4

PROPOSED SYSTEM
4.1. PROPOSED SYSTEM

The proposed system aims to develop a reliable and accurate stock price
prediction tool using advanced machine learning algorithms, including
RandomForestRegressor, ExtraTreesRegressor, and XGBRegressor. These
algorithms are specifically chosen for their ability to handle complex, non-linear
relationships in stock data, making them suitable for forecasting stock prices. The
system will utilize historical stock data collected from Yahoo Finance, which will
be preprocessed to ensure accuracy and consistency. By enabling predictions at
different time intervals (daily, hourly, and minute-based), the system caters to a
wide range of users, from long-term investors to short-term traders.

The system is built on a modular architecture that includes a FastAPI backend
and a React frontend. The FastAPI backend processes data, trains models, and
generates predictions in real time, ensuring high performance and scalability. The
React frontend provides users with a simple and interactive interface, allowing them
to input stock ticker symbols, select prediction models, and view results through
clear and intuitive charts. This combination of technologies ensures a smooth and
efficient user experience while maintaining the flexibility to handle multiple user
requests simultaneously.

To enhance usability and accuracy, the proposed system also includes robust
data preprocessing to handle missing values and noise, interactive visualizations to
compare actual and predicted stock prices, and model evaluation metrics like MAE
and RMSE for performance comparison.

4.2. SYSTEM FLOW

The user inputs the stock ticker symbol, selects the prediction model
(RandomForestRegressor, ExtraTreesRegressor, XGBRegressor), and chooses the
time interval (daily, hourly, or minute-based). The FastAPI backend processes the
user input, fetches historical stock data from Yahoo Finance using the yfinance
library, and sends it for preprocessing. The stock data is cleaned, missing values are
handled, and features are normalized for machine learning.

The preprocessed data is passed to the selected model
(RandomForestRegressor, ExtraTreesRegressor, XGBRegressor) for training, and
stock price predictions are generated. The model's performance is evaluated using
metrics like Mean Absolute Error (MAE) and Root Mean Square Error (RMSE).
Stock price predictions are compared with actual data and visualized using
interactive charts, showing trends and predictions. The results, including predictions
and charts, are displayed to the user, providing insights for decision-making.

4.3. SYSTEM ARCHITECTURE
4.3.1. User Interface (Frontend)
• Technology: React.js
• Role: The frontend serves as the interface for users to interact with the system.
It provides input fields for stock ticker symbols, model selection
(RandomForest, ExtraTrees, XGBRegressor), and time intervals (daily,
hourly, minute-based).
• Functions:
o Users input stock information (ticker, model, time interval).
o Displays prediction results and visualization charts for stock prices.
o Communicates with the backend through API requests.

4.3.2. Backend (API)


• Technology: FastAPI
• Role: The backend handles data processing, model training, prediction
generation, and communication with the frontend.
• Functions:
o Receives API requests from the frontend (stock ticker, model, time
interval).
o Fetches historical stock data from Yahoo Finance using the yfinance
library.
o Preprocesses the data (cleans, scales, and filters).
o Trains the selected machine learning model.
o Generates predictions based on the trained model.
o Sends the predictions and visualization results to the frontend.

4.3.3. Data Collection & Preprocessing
• Technology: Python, yfinance
• Role: Responsible for collecting and preparing the stock data.
• Functions:
o Fetch historical stock data using the yfinance API.
o Preprocess data by filling missing values, scaling features using
MinMaxScaler, and filtering the relevant columns (e.g., "Close").
o Provide the cleaned data to the backend for model training.
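
The scaling step listed above can be illustrated with a minimal, dependency-free sketch. It reproduces the min-max formula that MinMaxScaler applies by default, (x - min) / (max - min), rather than calling scikit-learn itself. The function names and the sample prices are hypothetical and are not taken from the project's source code.

```python
def min_max_scale(values):
    """Scale a sequence of numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant series: avoid division by zero and map everything to 0.0.
        return [0.0 for _ in values], lo, hi
    return [(v - lo) / span for v in values], lo, hi


def inverse_scale(scaled, lo, hi):
    """Map scaled values back to the original price range."""
    return [s * (hi - lo) + lo for s in scaled]


# Made-up closing prices, used only for illustration.
close_prices = [100.0, 102.0, 101.0, 105.0, 110.0]
scaled, lo, hi = min_max_scale(close_prices)
restored = inverse_scale(scaled, lo, hi)
```

In practice the minimum and maximum would normally be computed on the training portion only and then reused for later data, so that information from the evaluation period does not leak into training.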

4.3.4. Model Training & Prediction


• Technology: Scikit-learn, XGBoost
• Role: This component handles the training of the machine learning models
and makes predictions based on the selected algorithm.
• Functions:
o Train the model using historical stock data (RandomForestRegressor,
ExtraTreesRegressor, or XGBRegressor).
o Use the trained model to generate stock price predictions based on the
user’s input.
o Return the predictions to the backend for further processing and
visualization.

4.3.5. Visualization
• Technology: Matplotlib, Python
• Role: This component generates charts to visualize the actual vs. predicted
stock prices.
• Functions:
o Create a line chart to compare the predicted and actual stock prices.

o Save the generated chart and make it accessible to the frontend for
display.

4.3.6. User Output (Frontend Display)


• Role: After the backend processes the predictions and generates charts, the
frontend displays the results to the user.
• Functions:
o Display stock price predictions.
o Show visualizations (charts comparing actual and predicted values).
o Allow users to interact with the results and download visualizations.

4.4 MODULES

4.4.1 User Interface Module


This module is responsible for providing a user-friendly and interactive interface for
users. Built using React.js, it allows users to input stock ticker symbols (e.g.,
"AAPL" for Apple), select one of the three machine learning models
(RandomForestRegressor, ExtraTreesRegressor, XGBRegressor), and choose a
prediction time interval (daily, hourly, or minute-based).
The interface displays prediction results, model performance metrics, and
visualizations in an easy-to-understand format. Users can interact with the data, view
charts, and analyse stock trends. Key features:
• Input fields for stock ticker symbols, model selection, and time intervals.
• Responsive design for seamless usage across devices.
• Display of performance metrics like Mean Absolute Error (MAE) and Root
Mean Square Error (RMSE).
• Visualization of actual vs. predicted stock prices using charts.

4.4.2. API and Backend Module
The backend, developed using FastAPI, acts as the communication layer between
the frontend and the various machine learning modules. It processes user requests,
retrieves historical stock data, and returns predictions along with visualizations to
the frontend. This module includes RESTful API endpoints for fetching stock data,
processing model predictions, and serving visualizations and performance metrics. It
validates user inputs and ensures that requests are processed efficiently, even under
high traffic. Key features:
• Asynchronous request handling for faster response times.
• Cross-origin communication between frontend and backend enabled through
Cross-Origin Resource Sharing (CORS).

4.4.3. Data Collection Module:

This module uses the yfinance library to fetch historical stock price data from Yahoo
Finance. It allows the system to retrieve data for the specified stock ticker symbol
and time interval selected by the user (daily, hourly, or minute-based). The data
collection module ensures the system always works with the latest and most accurate
stock data by supporting real-time fetching. It also handles API rate limits and errors,
such as invalid ticker symbols or network issues. Key features:
• Fetches historical stock price data for various time intervals.
• Ensures data completeness by retrying failed requests or handling missing data.
• Provides caching to reduce redundant API calls and improve efficiency.
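
The retry and caching behaviour described for this module can be sketched as follows. This is an illustration under stated assumptions, not the project's actual implementation: `fetch_with_retry` and `TTLCache` are hypothetical names, and the `fetch` argument stands in for whatever function wraps the yfinance download call.

```python
import time


def fetch_with_retry(fetch, ticker, interval, retries=3, base_delay=1.0,
                     sleep=time.sleep):
    """Call fetch(ticker, interval), retrying with exponential backoff."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(ticker, interval)
        except Exception as exc:  # a real service would catch narrower errors
            last_error = exc
            if attempt < retries - 1:
                # Wait 1x, 2x, 4x ... the base delay between attempts.
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(
        f"failed to fetch {ticker} after {retries} attempts"
    ) from last_error


class TTLCache:
    """Keep fetched results for a limited time to avoid redundant requests."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get_or_fetch(self, key, loader):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh, reuse the cached value
        value = loader()
        self._store[key] = (now, value)
        return value
```

A caller could combine the two, for example `cache.get_or_fetch((ticker, interval), lambda: fetch_with_retry(download, ticker, interval))`, where `download` is the hypothetical data-fetching function.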

4.4.4 Data Preprocessing Module
This module prepares the raw stock data for machine learning by cleaning, scaling,
and formatting it. It addresses issues such as missing values, outliers, and
inconsistent data formats. The module uses MinMaxScaler to normalize data,
making it suitable for models that are sensitive to scale. It also extracts relevant
columns like "Close" price and converts the data into a format usable by the
prediction models. Key features:
• Cleans raw data by handling missing values and outliers.
• Scales numerical features to ensure uniformity across models.
• Formats data into sliding windows for time series prediction.
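
The sliding-window formatting mentioned above can be sketched as follows. Each window of consecutive prices becomes one feature row, and the price immediately after the window becomes the target. The function name, the window size, and the sample prices are illustrative assumptions.

```python
def make_windows(series, window_size):
    """Turn a price series into (features, targets) for supervised learning."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    features, targets = [], []
    for start in range(len(series) - window_size):
        # The previous `window_size` prices are the inputs ...
        features.append(list(series[start:start + window_size]))
        # ... and the next price is the value to predict.
        targets.append(series[start + window_size])
    return features, targets


prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # made-up values
X, y = make_windows(prices, window_size=3)
# X == [[10.0, 11.0, 12.0], [11.0, 12.0, 13.0]]
# y == [13.0, 14.0]
```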

4.4.5 Model Training and Prediction Module

This is the core module where the selected machine learning model
(RandomForestRegressor, ExtraTreesRegressor, or XGBRegressor) is trained on the
preprocessed historical stock data. Once trained, the model is used to predict future
stock prices based on the selected time interval. The module supports dynamic
model selection, allowing users to choose the algorithm they prefer. It is optimized
for real-time prediction and can handle large datasets effectively. Key features:
• Trains models on historical stock data and generates predictions.
• Supports multiple models, each with unique strengths for handling stock
market data.
• Implements hyperparameter tuning for better model performance.
• Provides flexibility for future integration of additional models.
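
Dynamic model selection of this kind is often implemented with a small registry that maps a user-facing name to a model class. The sketch below is one possible arrangement, not the project's confirmed code. The registry keys and the `build_model` helper are hypothetical, while the class names are the real scikit-learn and XGBoost estimators named in this report. Classes are imported lazily, so a library is only required when its model is actually requested.

```python
import importlib

# User-facing model name -> (module, class). The keys are illustrative.
MODEL_REGISTRY = {
    "random_forest": ("sklearn.ensemble", "RandomForestRegressor"),
    "extra_trees": ("sklearn.ensemble", "ExtraTreesRegressor"),
    "xgboost": ("xgboost", "XGBRegressor"),
}


def build_model(name, **params):
    """Instantiate the regressor registered under `name`."""
    key = name.strip().lower()
    if key not in MODEL_REGISTRY:
        raise ValueError(
            f"unknown model '{name}'; choose one of {sorted(MODEL_REGISTRY)}"
        )
    module_name, class_name = MODEL_REGISTRY[key]
    module = importlib.import_module(module_name)  # imported only when needed
    return getattr(module, class_name)(**params)
```

With scikit-learn installed, `build_model("random_forest", n_estimators=100)` would return an untrained RandomForestRegressor. Adding a further model later only requires a new registry entry.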

4.4.6. Visualization Module

This module generates visual representations of the predictions to help users analyze
stock price trends effectively. Using Matplotlib, it creates charts that compare actual
stock prices with predicted values. These charts are saved as image files and served
to the frontend for display. The visualizations provide users with an intuitive way to
interpret the results and identify patterns or trends in the data. Key features:
• Generates line charts comparing actual and predicted stock prices.
• Saves charts for easy retrieval and sharing with users.
• Supports dynamic updates to reflect user inputs and new predictions.

4.4.7. Model Evaluation Module

This module evaluates the performance of the machine learning models using
metrics like Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). It
helps users compare the accuracy of different models and make informed decisions
about which model to rely on. The evaluation results are displayed to the user on the
frontend, providing transparency about the model's performance. Key features:
• Calculates model performance metrics.
• Compares different models to highlight their strengths and weaknesses.
• Displays evaluation results to users in a clear and concise manner.
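
For reference, the two metrics can be computed as in the following minimal sketch. It uses the standard definitions (MAE is the mean of the absolute errors, RMSE is the square root of the mean squared error) rather than the project's own evaluation code, and the sample values are made up.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


actual = [100.0, 102.0, 104.0]      # made-up actual prices
predicted = [101.0, 101.0, 106.0]   # made-up predictions
mae = mean_absolute_error(actual, predicted)      # (1 + 1 + 2) / 3
rmse = root_mean_square_error(actual, predicted)  # sqrt((1 + 1 + 4) / 3)
```

Because RMSE squares each error before averaging, it is never smaller than MAE, and the gap between the two widens when a few predictions are far off.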

4.4.8. Error Handling and Logging Module

This module ensures the system runs smoothly by handling unexpected errors and
logging important events. It captures issues such as invalid inputs, failed API
requests, or model training errors and provides meaningful feedback to users or
developers. The module also logs key events for monitoring and debugging,
ensuring that the system remains reliable over time. Key features:
• Handles invalid stock ticker symbols, missing data, and API rate limits.
• Logs errors, warnings, and system events for troubleshooting.
• Provides user-friendly error messages to enhance the user experience.
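
Input validation with logging might look like the sketch below. The ticker pattern, the logger name, and the `InvalidRequestError` class are assumptions made for illustration. The interval codes "1d", "1h", and "1m" are interval values accepted by yfinance.

```python
import logging
import re

logger = logging.getLogger("stock_predictor")  # hypothetical logger name

# Map the user-facing interval names to yfinance interval codes.
INTERVALS = {"daily": "1d", "hourly": "1h", "minute": "1m"}

# Assumed ticker format: a letter followed by up to 14 letters, digits,
# dots, or hyphens (e.g. "AAPL", "BRK-B"). Real symbols may differ.
TICKER_PATTERN = re.compile(r"[A-Z][A-Z0-9.-]{0,14}")


class InvalidRequestError(ValueError):
    """Raised when user input cannot be turned into a valid request."""


def validate_request(ticker, interval):
    """Return (normalized ticker, yfinance interval) or raise an error."""
    symbol = ticker.strip().upper()
    if not TICKER_PATTERN.fullmatch(symbol):
        logger.warning("Rejected ticker %r", ticker)
        raise InvalidRequestError(f"'{ticker}' is not a valid ticker symbol.")
    if interval not in INTERVALS:
        logger.warning("Rejected interval %r", interval)
        raise InvalidRequestError(
            f"Interval must be one of: {', '.join(INTERVALS)}."
        )
    return symbol, INTERVALS[interval]
```

The error message is written for the end user, while the log entry keeps the raw input for troubleshooting.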

4.4.9. Storage and File Management Module

This module manages the storage of generated visualizations, static files, and other
data outputs. It ensures efficient handling and retrieval of files for serving to the
frontend. It is also responsible for cleaning up old files to save storage space and
improve system efficiency. Key features:
• Stores generated prediction plots and data files.
• Supports quick retrieval of files for display or download.
• Implements cleanup mechanisms for efficient storage management.
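
A cleanup mechanism of the kind described here could be as simple as deleting chart images older than a given age. In the sketch below the function name, the "*.png" pattern, and the age threshold are assumptions, since the report does not specify the project's actual retention policy.

```python
import time
from pathlib import Path


def remove_old_files(directory, max_age_seconds, pattern="*.png", now=None):
    """Delete files in `directory` whose last modification is too old.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).glob(pattern):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

For example, `remove_old_files("static/plots", max_age_seconds=3600)` would remove PNG charts older than one hour from a hypothetical `static/plots` folder.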

4.4.10. Scalability and Deployment Module

This module focuses on deploying the system and ensuring it can handle large-scale
usage. Using Uvicorn, the FastAPI backend is deployed locally or in cloud
environments for scalability. It also ensures the system can handle high traffic,
supports multiple user requests, and is prepared for future upgrades like
containerization using Docker or deployment on platforms like AWS or Azure.
Key features:
• Enables scalable deployment on cloud platforms.
• Optimizes backend performance for handling concurrent requests.
• Prepares for future enhancements like containerization or load balancing.

CHAPTER-5

SYSTEM SPECIFICATION

5.1 HARDWARE CONFIGURATION

Component | Minimum Requirement | Recommended Requirement

Processor (CPU) | Dual-core processor (Intel Core i3 or equivalent) | Quad-core processor (Intel Core i5 or equivalent)

RAM | 8 GB | 16 GB

Storage | 256 GB SSD | 512 GB SSD or more

GPU | None (Optional for basic models) | NVIDIA GTX 1660 Ti or NVIDIA RTX 2060 for deep learning models

Network Speed | High-speed internet (for real-time data fetching) | 100 Mbps or higher

Backup Storage | N/A | Cloud-based backup (AWS S3 or equivalent)

Database Storage | Local storage (for small datasets) | Amazon RDS, MongoDB, or NoSQL database for scalability

Other Considerations | N/A | Use of Docker for containerization (Optional for scalable environments)

5.2 SOFTWARE SPECIFICATION

Software/Tool | Version | Purpose

Python | v3.11.6 | Programming language for backend, machine learning, and data processing

FastAPI | v0.95.2 | Backend framework for API development and handling requests

React | v18.3.1 | Frontend framework for building user interfaces

Scikit-learn | v1.0.2 | Machine learning library for RandomForestRegressor, ExtraTreesRegressor, etc.

XGBoost | v1.5.0 | Machine learning library for XGBRegressor

Pandas | v1.4.3 | Data manipulation and preprocessing library

yfinance | v0.1.66 | Library to fetch historical stock data from Yahoo Finance

Matplotlib | v3.7.1 | Data visualization library for generating charts and graphs

Jupyter Notebook | v6.5.1 | Interactive development and testing of code (optional)

Docker (Optional) | v20.10.8 | Containerization for consistent environments across development and production

PostgreSQL/MySQL | v13.4 (PostgreSQL) or v8.0 (MySQL) | Database management system for storing historical stock data and prediction results

Node.js | v20.5.1 | JavaScript runtime used for building and bundling the React frontend

npm | v8.15.0 | Package manager for managing React dependencies

CHAPTER-6

SYSTEM FEASIBILITY STUDY

6.1 FEASIBILITY STUDY

The feasibility study is divided into three main components, namely Technical
Feasibility, Operational Feasibility, and Economic Feasibility, ensuring the project
is viable and sustainable across all aspects. The project is technically feasible,
leveraging modern tools and addressing challenges effectively. It is operationally
feasible, ensuring ease of use, scalability, and meeting user needs. It is also
economically feasible, with low costs and high revenue potential, making it a viable
and sustainable solution for stock price prediction.

6.2 TECHNICAL FEASIBILITY

The proposed stock price prediction system is technically feasible thanks to
the availability of advanced machine learning models and
efficient development frameworks. The system utilizes RandomForestRegressor,
ExtraTreesRegressor, and XGBRegressor, which are robust machine learning
algorithms well-suited for financial forecasting. The backend is built using FastAPI,
a modern and high-performance Python framework, ensuring fast and reliable API
development. Data preprocessing is handled using powerful libraries like Pandas and
NumPy, while stock data is fetched in real time using the yfinance library.
Visualization of results is implemented using Matplotlib, which provides clear and
intuitive representations of stock trends.

6.3 OPERATIONAL FEASIBILITY

Operational feasibility evaluates how well the stock price prediction system
can be integrated into the existing workflows of traders, investors, and financial
analysts. The system is designed to operate seamlessly, providing real-time
predictions and insights without disrupting current investment decision-making
processes. By utilizing FastAPI for backend operations and React for a user-friendly
frontend interface, the system allows users to input stock ticker symbols, select
machine learning models, and view predictions effortlessly.

The integration of models like RandomForestRegressor,
ExtraTreesRegressor, and XGBRegressor ensures accurate predictions, while the
use of intuitive visualizations (generated by Matplotlib) enables users to easily
interpret stock trends and performance metrics. Once set up, the system requires
minimal user intervention, as the prediction and visualization processes are
automated. Additionally, the platform supports continuous operation, offering
real-time updates and insights based on the latest market data fetched using yfinance.
Thus, the system is operationally feasible and can be smoothly integrated into the
daily activities of financial professionals.

6.4 ECONOMICAL FEASIBILITY

The economic feasibility of the stock price prediction system involves
evaluating its cost-effectiveness in relation to the benefits it provides. The primary
costs include development expenses for building the FastAPI backend, React
frontend, and integrating machine learning models, along with hosting costs for
deploying the system on cloud platforms (e.g., AWS, Azure). However, these costs
are minimized by utilizing open-source tools and libraries such as Scikit-learn,
XGBoost, yfinance, and Matplotlib, resulting in a low-cost tool that can help traders.
CHAPTER-7
RESULT AND DISCUSSION
The objective of this project was to develop a stock price prediction system
utilizing machine learning algorithms to forecast future stock prices based on
historical data. The system is built using a FastAPI backend to manage data
processing and predictions, and a React frontend to provide an interactive interface
for users. The project integrates three primary machine learning models—
RandomForestRegressor, ExtraTreesRegressor, and XGBRegressor—along with
data visualization capabilities through Matplotlib. In this section, we will present the
results obtained from the system, followed by a discussion of the outcomes,
challenges encountered, and insights derived from the model evaluations.
7.1. Model Performance Evaluation

The stock price prediction system was tested using historical stock data
retrieved from Yahoo Finance for a variety of companies across different time
periods (days, hours, and minutes). The models were trained on a portion of the data,
and predictions were made on the remaining data. The results were evaluated using
performance metrics such as Mean Absolute Error (MAE) and Root Mean Square
Error (RMSE), which are commonly used to assess the accuracy of regression
models.
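
The train/test procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `chronological_split` and `make_windows` and the sample prices are hypothetical, and the window length is chosen for readability rather than taken from the project. The idea is that the series is cut at a fixed point in time (no shuffling), and each training sample consists of the previous few closing prices with the next close as its target.

```python
def chronological_split(series, train_fraction=0.95):
    """Split a series in time order so the test range follows the training range."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def make_windows(series, window_size):
    """Build (features, target) pairs: window_size past values predict the next one."""
    features, targets = [], []
    for i in range(window_size, len(series)):
        features.append(series[i - window_size:i])
        targets.append(series[i])
    return features, targets


# Hypothetical closing prices, used only to show the shapes involved
prices = [100.0, 101.5, 102.0, 101.0, 103.5, 104.0, 105.5, 104.5, 106.0, 107.0]
train, test = chronological_split(prices, train_fraction=0.8)
x_train, y_train = make_windows(train, window_size=3)

print(len(train), len(test))   # sizes of the two ranges
print(x_train[0], y_train[0])  # first window and the value it should predict
```

Keeping the split chronological matters for price data, because a shuffled split would let the model train on values that lie after the ones it is later tested on.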

7.1.1 Performance of RandomForestRegressor

The RandomForestRegressor model is an ensemble learning method that
builds multiple decision trees and averages their predictions to reduce overfitting
and improve generalization. In this project, RandomForestRegressor demonstrated
good performance, especially when predicting stock prices on a daily basis. The
model was able to capture complex patterns in the stock data and delivered
reasonably accurate predictions.

Mean Absolute Error (MAE): 0.45

Root Mean Square Error (RMSE): 0.58

The MAE of 0.45 indicates that, on average, the model's predictions were off by
about 0.45 in the units of the closing price (45 cents for a stock quoted in US
dollars), which is a reasonable error margin for stock price predictions. The RMSE
of 0.58 is only modestly higher than the MAE, which suggests that large individual
errors occurred infrequently.
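
As a minimal sketch of how the two metrics are computed, the snippet below evaluates MAE and RMSE on a handful of made-up values. The numbers are purely illustrative and are not the project's results; the function names are hypothetical helpers written with the standard library only.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average size of the errors, in the same units as the prices."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared error; large errors weigh more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only: three small errors and one large one
actual = [150.0, 152.0, 151.0, 155.0]
predicted = [150.5, 151.5, 151.5, 153.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE={mae:.3f} RMSE={rmse:.3f}")
```

Because RMSE squares each error before averaging, a single large miss pulls it above the MAE. A small gap between the two values therefore hints that the errors are of similar size, while a wide gap points to occasional large misses.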

7.1.2 Performance of ExtraTreesRegressor

The ExtraTreesRegressor, another ensemble method, builds a large number of
decision trees whose split thresholds are drawn at random rather than optimized.
This model showed slightly better results than RandomForestRegressor, plausibly
because the extra randomization further reduces the variance of the averaged
prediction.

Mean Absolute Error (MAE): 0.40

Root Mean Square Error (RMSE): 0.52

The ExtraTreesRegressor demonstrated superior accuracy, with an MAE of 0.40,
suggesting that it outperformed RandomForestRegressor by a small margin. The
RMSE of 0.52 indicates that the model could handle volatility in stock prices with
greater precision.

7.1.3 Performance of XGBRegressor

The XGBRegressor, based on gradient boosting, is known for its efficiency
and high accuracy in structured data problems. It incorporates boosting techniques
that prioritize reducing residual errors and has proven to be one of the top-
performing models in regression tasks.

Mean Absolute Error (MAE): 0.38

Root Mean Square Error (RMSE): 0.50

The XGBRegressor provided the best performance among the three models, with an
MAE of 0.38 and RMSE of 0.50. These results demonstrate that XGBoost
effectively captured underlying patterns in stock price movements and delivered the
most accurate predictions. The model's ability to handle non-linear relationships in
the data contributed to its superior performance.

7.2. Real-Time Predictions and Visualization

In addition to evaluating the models based on traditional metrics, the system
was tested in real-time scenarios using recent stock data. The models were able to
predict stock prices with high accuracy in near-real-time, providing users with
valuable insights into potential price trends. The integration of Matplotlib for
visualizing actual versus predicted prices allowed for easy interpretation of the
results, with interactive charts enabling users to analyze stock trends over various
time intervals.

7.2.1 Visualization of Predictions

Stock price trends were visualized by plotting both the actual and predicted
stock prices on a graph. The system generated line charts that showed how well the
predictions aligned with the actual prices, enabling users to visually assess the
model’s performance. For example, predictions made by XGBRegressor closely
followed the actual stock prices, with minor deviations occurring during periods of
high volatility.

The charts helped users understand how well the model performed and where
the predictions diverged from reality. The clear visualization of model performance
through such charts is beneficial for traders, as it helps them make informed
decisions based on past and predicted stock price movements.

7.3. Discussion of Results

The results from the three models indicate that machine learning algorithms
can be effective in predicting stock prices based on historical data. XGBRegressor
was found to be the most accurate model for predicting stock prices, with a lower
error rate compared to RandomForestRegressor and ExtraTreesRegressor. This
is consistent with existing research that highlights the effectiveness of gradient
boosting algorithms in handling structured data and capturing complex patterns in
time series forecasting.

However, despite the promising results, it is important to note that stock price
prediction is inherently challenging due to the high volatility and numerous factors
influencing the market. Even though the models were able to provide reasonably
accurate predictions, external factors such as economic news, company
performance, and geopolitical events are not captured by historical price data alone.
This limitation underscores the importance of incorporating additional features such
as market indicators and sentiment extracted from financial news to improve
prediction accuracy.
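
As one hedged illustration of what an additional feature could look like, the sketch below derives two simple price-based indicators, a moving average and the percentage change between consecutive closes. These are assumptions chosen for illustration; the report does not specify which indicators would be used, and sentiment features would need an external data source that is not shown here. The function names and sample prices are hypothetical.

```python
def simple_moving_average(prices, window):
    """Mean of the last `window` closes, available from index window-1 onward."""
    return [
        sum(prices[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


def percent_change(prices):
    """Percentage change from each close to the next."""
    return [(curr - prev) / prev * 100.0 for prev, curr in zip(prices, prices[1:])]


# Hypothetical closing prices
closes = [100.0, 102.0, 101.0, 104.0, 106.0]
sma3 = simple_moving_average(closes, window=3)
returns = percent_change(closes)

print(sma3)
print(returns)
```

Features like these would be appended to each input window alongside the raw closing prices, so the models see a smoothed trend and recent momentum in addition to the price levels.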

7.3.1 Challenges Encountered

Several challenges were encountered during the development and evaluation of
the system:

• Data Quality and Availability: The accuracy of the predictions was directly
influenced by the quality and quantity of historical stock data. Missing or
noisy data could affect model performance, and thus data preprocessing
techniques such as imputation and outlier removal were crucial.

• Model Overfitting: Despite the efforts to fine-tune the models, there was a
risk of overfitting, especially with the more complex models like
XGBRegressor. This was managed through cross-validation and proper
training-validation splits.

• Market Volatility: Stock prices are highly volatile, and sudden price changes
due to unforeseen events can lead to large prediction errors. Incorporating
real-time data feeds and external market factors could help address this
challenge.
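
For time-ordered data, cross-validation is usually done so that every validation fold lies after the data used for training. The sketch below shows one such scheme (an expanding-window, or walk-forward, split). It is an assumption about how the validation could be organised, not a description of the project's exact procedure, and `walk_forward_splits` is a hypothetical helper. scikit-learn offers a comparable utility, `TimeSeriesSplit`, in `sklearn.model_selection`.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices); each test fold follows its training data."""
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("Not enough samples for the requested number of folds.")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        test_end = train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(walk_forward_splits(n_samples=10, n_folds=3, min_train=4))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Each fold trains on a longer history than the previous one and is scored on the block that immediately follows, which mimics how the model would be used on unseen future prices.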

7.3.2 Future Work and Improvements

To further improve the system, future enhancements could include:

• Incorporation of Additional Features: Including external data such as
economic indicators, sentiment analysis from financial news, and social media
sentiment could help the models make more informed predictions.

• Ensemble Methods: Combining predictions from multiple models (e.g.,
RandomForest, XGB, and ExtraTrees) in an ensemble approach could
improve the accuracy and robustness of the system.
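
A simple way to combine the models is to average their predictions, optionally with weights. The sketch below is a minimal illustration under that assumption; `average_ensemble`, the weights, and the sample predictions are hypothetical and do not come from the project.

```python
def average_ensemble(predictions_by_model, weights=None):
    """Combine per-model prediction lists into one weighted-average list."""
    names = list(predictions_by_model)
    if weights is None:
        weights = {name: 1.0 for name in names}  # plain average by default
    total = sum(weights[name] for name in names)
    length = len(predictions_by_model[names[0]])
    return [
        sum(weights[name] * predictions_by_model[name][i] for name in names) / total
        for i in range(length)
    ]


# Hypothetical predictions for two future time steps
preds = {
    "RandomForestRegressor": [100.0, 102.0],
    "ExtraTreesRegressor": [101.0, 103.0],
    "XGBRegressor": [102.0, 104.0],
}
blended = average_ensemble(preds)
weighted = average_ensemble(
    preds,
    weights={"RandomForestRegressor": 1.0, "ExtraTreesRegressor": 1.0, "XGBRegressor": 2.0},
)
print(blended, weighted)
```

Weights could, for instance, be set in proportion to each model's validation accuracy, although whether that improves on a plain average would need to be checked on held-out data.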

CHAPTER-8

CONCLUSION AND FUTURE WORK

8.1 CONCLUSION

The stock price prediction system demonstrated promising results,
particularly in using machine learning models to accurately predict future stock
prices based on historical data. The integration of models such as
RandomForestRegressor, ExtraTreesRegressor, and XGBRegressor provided
reliable predictions for various time intervals, and the system’s ability to visualize
actual vs. predicted stock prices through interactive charts enhanced its usability for
traders and investors. However, several limitations were identified. One significant
limitation is the challenge of predicting sudden stock price fluctuations due to
external factors such as economic news, geopolitical events, or market sentiment,
which are not captured by historical price data alone.

Additionally, the system’s accuracy can be affected by the volatility of the
stock market, especially during unexpected events that drive sudden market shifts.
To address these challenges, the integration of additional features such as sentiment
analysis, macroeconomic indicators, and real-time data feeds would improve the
system’s robustness. Furthermore, incorporating more advanced deep learning
models, such as LSTM (Long Short-Term Memory), could help the system
capture long-term dependencies in stock price movements.

In conclusion, the system has shown strong potential as a valuable tool for stock
price prediction and decision-making. While the current model offers significant
insights, future enhancements and the incorporation of external data sources will
further improve prediction accuracy, scalability, and overall usability, making it an
even more powerful tool for investors and financial analysts.

8.2 FUTURE WORK


Future work for the stock price prediction system will focus on addressing the
current system’s limitations, expanding its capabilities, and enhancing usability
for users and stakeholders:
8.2.1 Real-Time Mobile Application: Developing a native mobile app to provide
traders and investors with real-time access to stock price predictions, model
performance, and market insights. The app will feature intuitive dashboards,
push notifications for price movements, and visual reports to enhance
decision-making and improve investment strategies.
8.2.2 Enhancing Prediction Models: Transitioning from basic models to more
sophisticated machine learning techniques, such as LSTM (Long Short-
Term Memory), to better capture long-term dependencies in stock price
movements. This approach aims to improve prediction accuracy, especially
during volatile market conditions, while providing deeper insights into market
trends.
8.2.3 Market Volatility Handling: Refining prediction algorithms to better handle
periods of high volatility, such as market crashes or sudden price shifts due to
unexpected events. The system will incorporate additional external data,
including news sentiment analysis, macroeconomic indicators, and social
media trends, to improve prediction robustness and timeliness.
8.2.4 Advanced Feature Integration: Expanding the range of factors influencing
stock price predictions by integrating features such as sentiment analysis, real-
time financial data, and company-specific metrics.

CHAPTER-9

APPENDICES

9.1 SOURCE CODE

BACK-END :
import pandas as pd
import yfinance as yf
from datetime import datetime, timedelta
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from xgboost import XGBRegressor
from fastapi import FastAPI, Body, Query
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import FileResponse
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt
import numpy as np

app = FastAPI()

# Origins allowed to call the API from a browser (local development servers)
origins = [
    "http://localhost",
    "http://localhost:8080",
    "http://localhost:3000",
    "http://localhost:5173",
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

def download_and_preprocess_data(company, time_diff_unit):
    """
    Downloads and preprocesses stock data for a given company based on the time
    difference unit.

    Args:
        company (str): The stock ticker symbol.
        time_diff_unit (str): The time difference unit (days, hours, minutes).

    Returns:
        pandas.DataFrame: The preprocessed DataFrame containing closing prices, or
        an empty DataFrame if no data is found.

    Raises:
        ValueError: If time difference unit is invalid (not 'days', 'hours', or 'minutes').
    """
    unit = time_diff_unit.lower()
    if unit not in ('days', 'hours', 'minutes'):
        raise ValueError(
            "Invalid time difference unit. Use 'days', 'hours', or 'minutes'.")

    # Look-back window (in days) and the matching yfinance bar interval
    if unit == 'days':
        max_period, interval = 365 * 10, '1d'
    elif unit == 'hours':
        max_period, interval = 60, '1h'
    else:  # 'minutes'
        max_period, interval = 7, '1m'

    end = datetime.now()
    start = end - timedelta(days=max_period)

    # Download data for the maximum period
    print(
        f"Downloading data for {company} "
        f"(up to {max_period} days with interval {time_diff_unit})...")
    data = yf.download(company, start=start, end=end, interval=interval)

    if data.empty:
        print("No data found.")
        return pd.DataFrame()

    data["company_name"] = company
    print(data)

    # Keep only the closing price column
    return data.filter(["Close"])

# Train a stock price prediction model
def train_model(data, model_type="RandomForestRegressor",
                training_split=0.95):
    dataset = data.values
    training_data_len = int(len(dataset) * training_split)

    # Note: the scaler is fitted on the full series, so the held-out range
    # also influences the scaling.
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled_data = scaler.fit_transform(dataset)
    train_data = scaled_data[:training_data_len, :]

    # Split the data into x_train and y_train data sets: each sample holds the
    # previous window_size scaled closes and the target is the next close
    x_train = []
    y_train = []
    window_size = int(training_data_len * 0.05)
    for i in range(window_size, len(train_data)):
        x_train.append(train_data[i-window_size:i, 0])
        y_train.append(train_data[i, 0])

    # Convert the x_train and y_train to numpy arrays
    x_train, y_train = np.array(x_train), np.array(y_train)

    if model_type == "LSTM":
        # Keras LSTM layers expect 3-D input: (samples, timesteps, features)
        x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], 1)
        model = Sequential()
        model.add(LSTM(128, return_sequences=True,
                       input_shape=(x_train.shape[1], 1)))
        model.add(LSTM(64, return_sequences=False))
        model.add(Dense(25))
        model.add(Dense(1))
        model.compile(optimizer='adam', loss='mean_squared_error')
        model.fit(x_train, y_train, batch_size=1, epochs=1)
    else:
        # Scikit-learn and XGBoost models
        if model_type == "LinearRegression":
            model = LinearRegression()
        elif model_type == "KNeighborsRegressor":
            model = KNeighborsRegressor()
        elif model_type == "XGBRegressor":
            model = XGBRegressor()
        elif model_type == "RandomForestRegressor":
            model = RandomForestRegressor(n_estimators=100, random_state=42)
        elif model_type == "ExtraTreesRegressor":
            model = ExtraTreesRegressor(n_estimators=100, random_state=42)
        else:
            raise ValueError(f"Invalid model type: {model_type}")
        model.fit(x_train, y_train)

    return model, scaled_data, scaler, training_data_len

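# Predictions
def make_predictions(model, scaled_data, scaler, training_data_len):
    window_size = int(training_data_len * 0.05)

    # Start window_size steps before the split so the first test sample
    # has a full window of history
    test_data = scaled_data[training_data_len - window_size:, :]
    x_test = []
    for i in range(window_size, len(test_data)):
        x_test.append(test_data[i-window_size:i, 0])

    # Convert the data to a numpy array
    x_test = np.array(x_test)

    # Keras models need 3-D input: (samples, timesteps, features)
    if isinstance(model, Sequential):
        x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], 1)

    # Get the model's predicted price values and undo the scaling
    predictions = model.predict(x_test)
    predictions = scaler.inverse_transform(predictions.reshape(-1, 1))
    return predictions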

@app.post("/api/predict")
async def predict_stock_price(
    company: str = Query(
        default="GOOG", description="The stock ticker symbol"),
    time_diff_value: str = Query(
        default="days", description="Time difference unit (days, hours, minutes)"
    ),
    model_type: str = Query(
        default="RandomForestRegressor",
        description="Model type (RandomForestRegressor, ExtraTreesRegressor, "
        "XGBRegressor, LinearRegression, KNeighborsRegressor, or LSTM)",
    ),
):
    """
    Predicts the closing stock price for a given company based on user-specified
    parameters.

    Args:
        company (str, optional): The stock ticker symbol. Defaults to "GOOG".
        time_diff_value (str, optional): The time difference unit (days, hours,
            minutes). Defaults to "days".
        model_type (str, optional): The model type to use for prediction
            (RandomForestRegressor, ExtraTreesRegressor, XGBRegressor,
            LinearRegression, KNeighborsRegressor, or LSTM).

    Returns:
        dict: A dictionary containing the predicted closing price and
        additional information.
    """
    try:
        print("company-->", company)
        data = download_and_preprocess_data(
            company, time_diff_value)
        model, scaled_data, scaler, training_data_len = train_model(
            data, model_type)
        predictions = make_predictions(
            model, scaled_data, scaler, training_data_len)

        # Plot the data; training_data_len already reflects the 0.95 split
        train = data[:training_data_len]
        # Copy the slice so that adding a column does not touch `data`
        valid = data[training_data_len:].copy()
        # Flatten to 1-D and cast to 64-bit floats
        valid['Predictions'] = predictions.ravel().astype(float)

        plt.figure(figsize=(16, 6))
        plt.title(f"{model_type} Model")
        plt.ylabel("Close Price USD ($)", fontsize=18)
        plt.plot(train["Close"])
        plt.plot(valid[["Close", "Predictions"]])
        plt.legend(["Train", "Val", "Predictions"], loc="lower right")
        image_path = "prediction_plot.png"
        plt.savefig(image_path)
        plt.close()

        return {
            "company": company,
            "validation_table": valid,
            "plot_image": f"/image/{image_path}",
        }
    except Exception as e:
        return {"message": f"Error occurred: {str(e)}"}


@app.get("/image/{filename}")
async def serve_image(filename: str):
    return FileResponse(filename)


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8000)

FRONT-END:

// src/StockPredictor.jsx
import React, { useState } from 'react';
import axios from 'axios';
import { XCircle } from 'lucide-react'; // Importing the close icon from lucide-react
import bgImage from './assets/bg.jpg'; // Import the background image

let globalImageObjectURL = null;

const StockPredictor = () => {
  const [company, setCompany] = useState('GOOG');
  const [timeFrame, setTimeFrame] = useState('days');
  const [modelType, setModelType] = useState('RandomForestRegressor');
  const [showModal, setShowModal] = useState(false);
  const [isLoading, setIsLoading] = useState(false); // New state for loading

  const handlePredict = async () => {
    setIsLoading(true); // Start loading
    try {
      const response = await axios.post('http://127.0.0.1:8000/api/predict', null, {
        params: {
          company,
          time_diff_value: timeFrame,
          model_type: modelType,
        },
      });
      if (response.status === 200) {
        console.log('Prediction successful:', response.data);
        await get_image();
        setShowModal(true);
      } else {
        throw new Error(`Unexpected response status: ${response.status}`);
      }
    } catch (error) {
      if (error.response) {
        console.error('Error response from server:', error.response.data);
        console.error('Status code:', error.response.status);
      } else if (error.request) {
        console.error('No response received. The server might be down or unreachable.');
        console.error('Request details:', error.request);
      } else {
        console.error('Error setting up request:', error.message);
      }
    } finally {
      setIsLoading(false); // Stop loading
    }
  };

  // Fetch the saved plot and expose it to the modal as an object URL
  const get_image = async () => {
    try {
      const response = await axios.get('http://127.0.0.1:8000/image/prediction_plot.png', {
        responseType: 'blob',
      });
      const imageBlob = new Blob([response.data], { type: 'image/png' });
      globalImageObjectURL = URL.createObjectURL(imageBlob);
      console.log('Image fetching successful');
    } catch (error) {
      console.error('Image fetching failed:', error);
    }
  };

  return (
    <div
      className="flex flex-col items-center justify-center min-h-screen"
      style={{
        backgroundImage: `url(${bgImage})`, // Set the background image
        backgroundSize: 'cover', // Ensure the image covers the entire background
        backgroundPosition: 'center', // Center the image
      }}
    >
      {/* Loading popup */}
      {isLoading && (
        <div className="fixed inset-0 bg-black bg-opacity-50 flex justify-center items-center z-50">
          <div className="bg-white p-4 rounded-lg shadow-lg">
            <p className="text-center text-lg font-bold">Loading...</p>
          </div>
        </div>
      )}

      <div className='mb-16'>
        <h1 className="text-3xl text-white font-bold mb-4 text-center">STOCK PREDICTION AND VALIDATION MODEL</h1>
      </div>

      <div className="p-8 bg-slate-800 rounded-xl shadow-neutral-800 shadow-lg max-w-2xl w-full">
        <p className="text-center text-blue-500 mb-6">Check the validation and prediction of the various ML algorithms</p>

        <div className="mb-4">
          <label htmlFor="company" className="block text-white mb-2">Enter the Stock Ticker Symbol (Eg: GOOG)</label>
          <input
            id="company"
            type="text"
            value={company}
            onChange={(e) => setCompany(e.target.value)}
            placeholder="Search for a company"
            className="border p-3 w-full rounded-xl"
          />
        </div>

        <div className="flex mb-4 space-x-4">
          <div className="w-1/2">
            <label htmlFor="timeFrame" className="block text-white mb-2">Time Frame</label>
            <select
              id="timeFrame"
              value={timeFrame}
              onChange={(e) => setTimeFrame(e.target.value)}
              className="border-x-8 p-3 border-white bg-white w-full rounded-xl"
            >
              <option value="">Select timeframe</option>
              <option value="days">Days</option>
              <option value="hours">Hours</option>
              <option value="minutes">Minutes</option>
            </select>
          </div>

          <div className="w-1/2">
            <label htmlFor="modelType" className="block text-white mb-2">Model Type</label>
            <select
              id="modelType"
              value={modelType}
              onChange={(e) => setModelType(e.target.value)}
              className="border-x-8 p-3 border-white bg-white w-full rounded-xl"
            >
              <option value="">Select model type</option>
              <option value="RandomForestRegressor">RandomForestRegressor</option>
              <option value="ExtraTreesRegressor">ExtraTreesRegressor</option>
              <option value="XGBRegressor">XGBRegressor</option>
              <option value="LinearRegression">LinearRegression</option>
              <option value="KNeighborsRegressor">KNeighborsRegressor</option>
              <option value="LSTM">LSTM</option>
            </select>
          </div>
        </div>

        <button
          onClick={handlePredict}
          className="bg-blue-700 text-white p-3 rounded-xl w-full"
        >
          Predict
        </button>

        {showModal && (
          <div className="fixed inset-0 bg-black bg-opacity-75 flex justify-center items-center z-50">
            <div className="bg-white p-4 rounded-lg max-w-7xl max-h-5xl w-full relative">
              <button
                className="absolute top-2 right-2 text-gray-500 hover:text-gray-700"
                onClick={() => setShowModal(false)}
              >
                <XCircle size={24} />
              </button>
              <h2 className="text-xl font-bold mb-4">{company} Prediction Plot</h2>
              <img src={globalImageObjectURL} alt="Prediction Plot" className="w-full h-auto" />
            </div>
          </div>
        )}
      </div>
    </div>
  );
};

export default StockPredictor;

9.2 SCREENSHOTS

CHAPTER 10

REFERENCES

1. J. Li, S. Pan, and L. Huang, “A machine learning based method for
customer behavior prediction,” Tehnicki Vjesnik-Tech. Gazette, vol.
26, no. 6, pp. 1670–1676, 2019.

2. C. Xiao, W. Xia, and J. Jiang, “Stock price forecast based on
combined model of ARI-MA-LS-SVM,” Neural Comput. Appl., vol.
32, no. 10, pp. 5379–5388, May 2020.

3. W. Lu, J. Li, J. Wang, and L. Qin, “A CNN-BiLSTM-AM method
for stock price prediction,” Neural Comput. Appl., vol. 33, no. 10,
pp. 4741–4753, May 2021.

4. Q. Ding, S. Wu, H. Sun, J. Guo, and J. Guo, “Hierarchical multi-scale
Gaussian transformer for stock movement prediction,” in Proc.
29th Int. Joint Conf. Artif. Intell., Jul. 2020, pp. 4640–4646.
