STOCK PRICE PREDICTION AND VALIDATION
A Project Report
Submitted by
ABISHEK T (20IT001)
MAHESWARI S K (21IT029)
THANGAVARATHAN G (21IT051)
BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY
BONAFIDE CERTIFICATE
Certified that this design project report “STOCK PRICE PREDICTION AND
VALIDATION” is the bonafide work of
ABISHEK T 20IT001
MAHESWARI S K 21IT029
THANGAVARATHAN G 21IT051
who carried out the project work under my supervision. Further, to the best of
my knowledge, the work reported herein does not form part of any other project
work on the basis of which a degree or award was conferred on an earlier occasion
on this or any other candidate.
SIGNATURE
Mr. R. RAGUNATH, M.E.,
ASSISTANT PROFESSOR,
DEPARTMENT OF IT,
NANDHA ENGINEERING COLLEGE,

SIGNATURE
Dr. G.R. SREEKANTH, M.E., Ph.D.,
PROFESSOR & HEAD,
DEPARTMENT OF IT,
NANDHA ENGINEERING COLLEGE,
Submitted for the End Semester Project Viva-Voce Examination held on …………
First and foremost, we extend our heartfelt thanks to our parents for giving us
the opportunity to pursue this engineering course successfully.
TABLE OF CONTENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
1.1 GENERAL
2 LITERATURE SURVEY
3 SYSTEM ANALYSIS
3.4 ADVANTAGES
4 PROPOSED SYSTEM
4.4 MODULES
5 SYSTEM SPECIFICATION
8.1 CONCLUSION
9 APPENDICES
9.2 SCREENSHOTS
10 REFERENCES
ABSTRACT
LIST OF TABLES
1 Hardware requirements
2 Software requirements
LIST OF FIGURES
1 Flow Diagram
2 Screen Shot
LIST OF SYMBOLS AND ABBREVIATIONS
Abbreviation/Symbol Description
ML Machine Learning
SaaS Software as a Service
CHAPTER-1
INTRODUCTION
1.1 GENERAL
Traditional methods for stock price prediction, such as technical analysis and
linear models, often fail to capture the intricate patterns and relationships in stock
market data. With the rapid advancements in technology, machine learning and
deep learning have emerged as powerful tools for analysing large datasets and
identifying trends. These techniques can handle non-linear relationships and
complex patterns, making them highly effective for stock price forecasting.
1.2 NEED FOR STUDY
The stock market is a complex and dynamic system where prices are
influenced by numerous factors such as historical trends, investor behavior, and
economic conditions. Predicting stock prices is a crucial task for investors,
traders, and financial analysts, as it enables them to make informed decisions
and manage risks effectively.
However, due to the highly volatile and unpredictable nature of the market,
traditional methods often fall short in capturing intricate patterns. This study is
essential to explore how advanced machine learning and deep learning
techniques can be leveraged to enhance the accuracy and reliability of stock price
predictions.
1.3 OBJECTIVES OF THE STUDY
CHAPTER-2
LITERATURE SURVEY
Stock price prediction has been a topic of extensive research due to its
complexity and importance in financial decision-making. Traditional methods like
ARIMA and GARCH models have been widely used for time series forecasting but
often fail to capture non-linear relationships in stock price data. Breiman (2001)
introduced Random Forest, an ensemble learning algorithm that combines multiple
decision trees to reduce overfitting and improve prediction accuracy. Similarly,
Geurts et al. (2006) developed the ExtraTrees algorithm, which adds randomness
during the tree-splitting process to enhance generalization. These ensemble methods
have demonstrated strong performance in financial forecasting by capturing
complex patterns in stock price movements.
CHAPTER-3
SYSTEM ANALYSIS
3.3 PROPOSED WORK
The proposed project aims to build a user-friendly web application that
predicts stock prices using multiple machine learning algorithms (Random
Forest, Extra Trees, and XGBoost regressors). By pulling historical stock data
from the internet, the system analyzes past trends and generates short-term
predictions to help users make informed decisions. The application features a
simple interface where users enter a stock ticker symbol and view the
predictions. This gives users a quick, accessible way to understand potential
stock movements without needing advanced technical skills or expensive
software.
3.4 ADVANTAGES
✓ Real-Time Predictions:
✓ Scalability:
✓ User-Friendly Interface:
The project includes features to visualize stock trends with actual vs.
predicted prices, making it easier for users to interpret results and make
informed decisions.
CHAPTER-4
PROPOSED SYSTEM
4.1. PROPOSED SYSTEM
The proposed system aims to develop a reliable and accurate stock price
prediction tool using advanced machine learning algorithms, including
RandomForestRegressor, ExtraTreesRegressor, and XGBRegressor. These
algorithms are specifically chosen for their ability to handle complex, non-linear
relationships in stock data, making them suitable for forecasting stock prices. The
system will utilize historical stock data collected from Yahoo Finance, which will
be preprocessed to ensure accuracy and consistency. By enabling predictions at
different time intervals (daily, hourly, and minute-based), the system caters to a
wide range of users, from long-term investors to short-term traders.
To enhance usability and accuracy, the proposed system also includes robust
data preprocessing to handle missing values and noise, interactive visualizations to
compare actual and predicted stock prices, and model evaluation metrics like MAE
and RMSE for performance comparison.
4.2. SYSTEM FLOW
The user enters the stock ticker symbol, selects the prediction model
(RandomForestRegressor, ExtraTreesRegressor, XGBRegressor), and chooses the
time interval (daily, hourly, or minute-based). The FastAPI backend processes the
user input, fetches historical stock data from Yahoo Finance using the yfinance
library, and passes it on for preprocessing. The stock data is cleaned, missing
values are handled, and features are normalized for machine learning.
4.3. SYSTEM ARCHITECTURE
4.3.1. User Interface (Frontend)
• Technology: React.js
• Role: The frontend serves as the interface for users to interact with the system.
It provides input fields for stock ticker symbols, model selection
(RandomForest, ExtraTrees, XGBRegressor), and time intervals (daily,
hourly, minute-based).
• Functions:
o Users input stock information (ticker, model, time interval).
o Displays prediction results and visualization charts for stock prices.
o Communicates with the backend through API requests.
4.3.3. Data Collection & Preprocessing
• Technology: Python, yfinance
• Role: Responsible for collecting and preparing the stock data.
• Functions:
o Fetch historical stock data using the yfinance API.
o Preprocess data by filling missing values, scaling features using
MinMaxScaler, and filtering the relevant columns (e.g., "Close").
o Provide the cleaned data to the backend for model training.
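The preprocessing steps listed above can be illustrated with a minimal, dependency-free sketch. It assumes that missing values are filled by carrying the last known price forward (the report does not state which fill strategy is used) and it scales values into [0, 1], which is what MinMaxScaler does with its default settings. The function names `forward_fill` and `min_max_scale` are hypothetical and chosen only for this illustration.

```python
from typing import List, Optional


def forward_fill(values: List[Optional[float]]) -> List[float]:
    """Replace each missing entry (None) with the most recent known value.

    Leading gaps have no earlier value, so they take the first known value.
    The fill strategy is an assumption made for this sketch.
    """
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("series contains no valid prices")
    filled: List[float] = []
    last = known[0]
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def min_max_scale(values: List[float]) -> List[float]:
    """Scale values linearly into [0, 1]; a constant series maps to zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical closing prices with two gaps
closes = [None, 10.0, None, 12.0, 14.0]
filled = forward_fill(closes)
scaled = min_max_scale(filled)
```

In the actual system the same two steps would more likely be done with pandas and scikit-learn; the sketch only makes the arithmetic explicit.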
4.3.5. Visualization
• Technology: Matplotlib, Python
• Role: This component generates charts to visualize the actual vs. predicted
stock prices.
• Functions:
o Create a line chart to compare the predicted and actual stock prices.
o Save the generated chart and make it accessible to the frontend for
display.
4.4 MODULES
4.4.2. API and Backend Module
The backend, developed using FastAPI, acts as the communication layer between
the frontend and the machine learning modules. It processes user requests,
retrieves historical stock data, and returns predictions along with visualizations to
the frontend. This module exposes RESTful API endpoints for fetching stock data,
processing model predictions, and serving visualizations and performance metrics.
It validates user inputs before a request is processed. Key features:
• Asynchronous request handling for faster response times under high traffic.
• Controlled cross-origin access from the frontend through Cross-Origin Resource
Sharing (CORS), restricted to an allow-list of origins.
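The input validation performed by the backend could look like the following minimal sketch. It is written as a plain function rather than as FastAPI code so that it stays self-contained. The parameter names follow the request fields used in the appendix, while `validate_request` and the ticker pattern are assumptions made for illustration, not the project's actual rules.

```python
import re

ALLOWED_MODELS = {"RandomForestRegressor", "ExtraTreesRegressor", "XGBRegressor"}
ALLOWED_INTERVALS = {"days", "hours", "minutes"}
# Assumed ticker shape: 1 to 10 letters, digits, dots, or hyphens.
TICKER_PATTERN = re.compile(r"[A-Z0-9.\-]{1,10}")


def validate_request(company: str, model_type: str, time_diff_unit: str) -> dict:
    """Normalize and check the three user inputs; raise ValueError if invalid."""
    ticker = company.strip().upper()
    if not TICKER_PATTERN.fullmatch(ticker):
        raise ValueError(f"Invalid ticker symbol: {company!r}")
    if model_type not in ALLOWED_MODELS:
        raise ValueError(f"Unsupported model type: {model_type!r}")
    unit = time_diff_unit.strip().lower()
    if unit not in ALLOWED_INTERVALS:
        raise ValueError(f"Unsupported time interval: {time_diff_unit!r}")
    return {"company": ticker, "model_type": model_type, "time_diff_unit": unit}


request = validate_request(" aapl ", "XGBRegressor", "Days")
```

Rejecting bad input before any data is downloaded keeps invalid requests from consuming calls to the data provider.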
The data collection module uses the yfinance library to fetch historical stock price
data from Yahoo Finance for the stock ticker symbol and time interval selected by
the user (daily, hourly, or minute-based). Because data is fetched at request time,
the system works with the latest data available from Yahoo Finance. The module
also handles API rate limits and errors such as invalid ticker symbols or network
issues. Key features:
• Fetches historical stock price data for various time intervals.
• Ensures data completeness by retrying failed requests and handling missing data.
• Caches results to reduce redundant API calls and improve efficiency.
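The retry and caching behavior described for the data collection module can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `CachedFetcher` is a hypothetical wrapper, the retry count, backoff, and cache lifetime are arbitrary defaults, and the real data source is replaced by a fake function so that the example runs without network access.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Prices = List[float]


class CachedFetcher:
    """Wrap a fetch function with retries and a small time-based cache."""

    def __init__(
        self,
        fetch: Callable[[str, str], Prices],
        max_attempts: int = 3,
        backoff_seconds: float = 0.5,
        ttl_seconds: float = 60.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._max_attempts = max_attempts
        self._backoff = backoff_seconds
        self._ttl = ttl_seconds
        self._sleep = sleep
        self._cache: Dict[Tuple[str, str], Tuple[float, Prices]] = {}

    def get(self, ticker: str, interval: str) -> Prices:
        key = (ticker, interval)
        cached = self._cache.get(key)
        if cached is not None and time.monotonic() - cached[0] < self._ttl:
            return cached[1]  # fresh enough: skip the network call

        last_error: Optional[Exception] = None
        for attempt in range(self._max_attempts):
            try:
                data = self._fetch(ticker, interval)
            except ConnectionError as exc:
                last_error = exc
                if attempt < self._max_attempts - 1:
                    # Exponential backoff between attempts
                    self._sleep(self._backoff * 2 ** attempt)
            else:
                self._cache[key] = (time.monotonic(), data)
                return data
        raise RuntimeError(
            f"Could not fetch {ticker} after {self._max_attempts} attempts"
        ) from last_error


# Fake data source that fails twice before succeeding
calls = {"n": 0}


def flaky_fetch(ticker: str, interval: str) -> Prices:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary network issue")
    return [101.0, 102.5]


fetcher = CachedFetcher(flaky_fetch, sleep=lambda _seconds: None)
first = fetcher.get("AAPL", "1d")   # succeeds on the third attempt
second = fetcher.get("AAPL", "1d")  # served from the cache
```

Passing the sleep function as a parameter is a design choice that keeps the retry logic testable without real delays.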
4.4.4. Data Preprocessing Module
This module prepares the raw stock data for machine learning by cleaning, scaling,
and formatting it. It addresses issues such as missing values, outliers, and
inconsistent data formats. The module uses MinMaxScaler to normalize data,
making it suitable for models that are sensitive to scale. It also extracts relevant
columns such as the "Close" price and converts the data into a format usable by the
prediction models. Key features:
• Cleans raw data by handling missing values and outliers.
• Scales numerical features to ensure uniformity across models.
• Formats data into sliding windows for time series prediction.
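The sliding-window formatting can be sketched as below. Each training example consists of the previous `window_size` prices and the target is the price that follows, which mirrors the indexing used in the appendix code. The function name `make_windows` and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window_size: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a price series into (features, targets) for supervised learning."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if len(series) <= window_size:
        raise ValueError("series must be longer than window_size")
    features: List[List[float]] = []
    targets: List[float] = []
    for i in range(window_size, len(series)):
        features.append(list(series[i - window_size:i]))  # past window
        targets.append(series[i])                         # next value
    return features, targets


x, y = make_windows([1.0, 2.0, 3.0, 4.0, 5.0], window_size=3)
```

A series of length n therefore yields n minus `window_size` training examples.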
The model training and prediction module is the core of the system. Here the
selected machine learning model (RandomForestRegressor, ExtraTreesRegressor, or
XGBRegressor) is trained on the preprocessed historical stock data. Once trained,
the model is used to predict future stock prices for the selected time interval. The
module supports dynamic model selection, allowing users to choose the algorithm
they prefer. It is intended to produce predictions at request time and to cope with
large datasets. Key features:
• Trains models on historical stock data and generates predictions.
• Supports multiple models, each with its own strengths for stock market data.
• Implements hyperparameter tuning for better model performance.
• Leaves room for integrating additional models in the future.
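Dynamic model selection of this kind is often implemented as a lookup table from model name to constructor. The sketch below assumes scikit-learn and xgboost are installed when a model is actually built. The names `MODEL_FACTORIES` and `build_model` are hypothetical, and the imports are deferred so that only the selected library is loaded.

```python
from typing import Any, Callable, Dict


def _random_forest(**params: Any) -> Any:
    from sklearn.ensemble import RandomForestRegressor
    return RandomForestRegressor(**params)


def _extra_trees(**params: Any) -> Any:
    from sklearn.ensemble import ExtraTreesRegressor
    return ExtraTreesRegressor(**params)


def _xgboost(**params: Any) -> Any:
    from xgboost import XGBRegressor
    return XGBRegressor(**params)


# Registry: adding a new model means adding one entry here.
MODEL_FACTORIES: Dict[str, Callable[..., Any]] = {
    "RandomForestRegressor": _random_forest,
    "ExtraTreesRegressor": _extra_trees,
    "XGBRegressor": _xgboost,
}


def build_model(model_type: str, **params: Any) -> Any:
    """Create an untrained model for the given name, or raise ValueError."""
    try:
        factory = MODEL_FACTORIES[model_type]
    except KeyError:
        supported = ", ".join(sorted(MODEL_FACTORIES))
        raise ValueError(
            f"Unsupported model type {model_type!r}; choose one of: {supported}"
        ) from None
    return factory(**params)
```

A call such as `build_model("RandomForestRegressor", n_estimators=200)` would return an untrained estimator. Keeping the registry in one place is one way to support the future integration of additional models.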
The visualization module generates visual representations of the predictions to help
users analyze stock price trends. Using Matplotlib, it creates charts that compare
actual stock prices with predicted values. These charts are saved as image files and
served to the frontend for display, giving users an intuitive way to interpret the
results and identify patterns or trends in the data. Key features:
• Generates line charts comparing actual and predicted stock prices.
• Saves charts for easy retrieval and sharing with users.
• Supports dynamic updates to reflect user inputs and new predictions.
The model evaluation module measures the performance of the machine learning
models using Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). It
helps users compare the accuracy of different models and decide which model to
rely on. The evaluation results are displayed to the user on the frontend, providing
transparency about each model's performance. Key features:
• Calculates model performance metrics.
• Compares different models to highlight their strengths and weaknesses.
• Displays evaluation results to users in a clear and concise manner.
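The two metrics can be computed as in the following minimal sketch. MAE is the mean of the absolute errors, and RMSE is the square root of the mean of the squared errors, so RMSE penalizes large errors more heavily and is never smaller than MAE. The sample prices are hypothetical and are not results from this project.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    _check(actual, predicted)
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Hypothetical actual and predicted closing prices
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 103.0, 105.0]
mae = mean_absolute_error(actual, predicted)        # errors 1, 1, 2, 0 -> 1.0
rmse = root_mean_squared_error(actual, predicted)   # sqrt(6 / 4)
```

The gap between the two values is informative: the closer RMSE is to MAE, the more evenly the errors are spread across predictions.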
The error handling and logging module keeps the system running smoothly by
handling unexpected errors and logging important events. It captures issues such as
invalid inputs, failed API requests, or model training errors and provides meaningful
feedback to users or developers. The module also logs key events for monitoring and
debugging, so that the system remains reliable over time. Key features:
• Handles invalid stock ticker symbols, missing data, and API rate limits.
• Logs errors, warnings, and system events for troubleshooting.
• Provides user-friendly error messages to enhance the user experience.
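One way to combine logging with user-friendly messages is sketched below using Python's standard logging module. The exception classes, the `run_safely` helper, and the message texts are hypothetical and serve only to illustrate the pattern of logging technical detail while returning a short message to the user.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("stock_predictor")


class InvalidTickerError(ValueError):
    """Raised when the requested ticker symbol is not recognized."""


class UpstreamDataError(RuntimeError):
    """Raised when the data provider cannot be reached or rate-limits us."""


def run_safely(action: Callable[..., Any], *args: Any) -> Dict[str, Any]:
    """Run an action, log any failure, and return a user-facing result."""
    try:
        return {"ok": True, "result": action(*args)}
    except InvalidTickerError as exc:
        logger.warning("Invalid ticker: %s", exc)
        return {"ok": False,
                "message": "Unknown stock ticker symbol. Please check it and try again."}
    except UpstreamDataError as exc:
        logger.error("Data provider failure: %s", exc)
        return {"ok": False,
                "message": "Stock data is temporarily unavailable. Please try again later."}
    except Exception:
        # Full traceback goes to the log, not to the user.
        logger.exception("Unexpected error")
        return {"ok": False,
                "message": "Something went wrong while processing the request."}


def lookup(ticker: str) -> str:
    if ticker != "AAPL":
        raise InvalidTickerError(ticker)
    return "found"


good = run_safely(lookup, "AAPL")
bad = run_safely(lookup, "XXXX")
```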
4.4.9. Storage and File Management Module
This module manages the storage of generated visualizations, static files, and other
data outputs. It ensures efficient handling and retrieval of files for serving to the
frontend. It is also responsible for cleaning up old files to save storage space and
improve system efficiency. Key features:
• Stores generated prediction plots and data files.
• Supports quick retrieval of files for display or download.
• Implements cleanup mechanisms for efficient storage management.
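A cleanup mechanism of the kind described can be sketched with the standard library alone. The function below deletes files older than a given age. The name `remove_old_files`, the default `*.png` pattern, and the age threshold used in the usage note are assumptions made for illustration.

```python
import time
from pathlib import Path
from typing import List, Optional


def remove_old_files(
    directory: Path,
    max_age_seconds: float,
    pattern: str = "*.png",
    now: Optional[float] = None,
) -> List[Path]:
    """Delete files matching `pattern` whose last modification is too old.

    Returns the list of paths that were removed.
    """
    current = time.time() if now is None else now
    removed: List[Path] = []
    for path in directory.glob(pattern):
        if path.is_file() and current - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed
```

For example, `remove_old_files(Path("plots"), max_age_seconds=3600)` would delete plot images older than one hour from a hypothetical `plots` directory. Restricting deletion to a single directory and file pattern limits the risk of removing unrelated files.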
The deployment module focuses on deploying the system and preparing it for
large-scale usage. Using Uvicorn, the FastAPI backend is deployed locally or in cloud
environments. The module is meant to let the system serve multiple concurrent user
requests under high traffic and to prepare it for future upgrades such as
containerization using Docker or deployment on platforms like AWS or Azure.
Key features:
• Enables scalable deployment on cloud platforms.
• Optimizes backend performance for handling concurrent requests.
• Prepares for future enhancements like containerization or load balancing.
CHAPTER-5
SYSTEM SPECIFICATION
RAM 8 GB 16 GB
5.2 SOFTWARE SPECIFICATION
Software/Tool        Version     Purpose
Docker (Optional)    v20.10.8    Containerization for consistent environments across development and production
CHAPTER-6
The feasibility study is divided into three main components: Technical
Feasibility, Operational Feasibility, and Economic Feasibility. Together they
establish whether the project is viable and sustainable. The project is technically
feasible, leveraging modern tools and addressing challenges effectively. It is
operationally feasible, ensuring ease of use, scalability, and meeting user needs. It is
also economically feasible, with low costs and high revenue potential, making it a
viable and sustainable solution for stock price prediction.
6.3 OPERATIONAL FEASIBILITY
Operational feasibility evaluates how well the stock price prediction system
can be integrated into the existing workflows of traders, investors, and financial
analysts. The system is designed to operate seamlessly, providing real-time
predictions and insights without disrupting current investment decision-making
processes. By utilizing FastAPI for backend operations and React for a user-friendly
frontend interface, the system allows users to input stock ticker symbols, select
machine learning models, and view predictions effortlessly.
The stock price prediction system was tested using historical stock data
retrieved from Yahoo Finance for a variety of companies across different time
periods (days, hours, and minutes). The models were trained on a portion of the data,
and predictions were made on the remaining data. The results were evaluated using
performance metrics such as Mean Absolute Error (MAE) and Root Mean Square
Error (RMSE), which are commonly used to assess the accuracy of regression
models.
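Training on one portion of the data and predicting on the remainder is usually done chronologically for time series, so that the model never sees prices from after the period it is tested on. The sketch below illustrates such a split. The 80/20 ratio is an assumption, since the report does not state the fraction used, and `chronological_split` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    series: Sequence[float], train_fraction: float = 0.8
) -> Tuple[List[float], List[float]]:
    """Split a time-ordered series into an earlier train part and a later test part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    split_index = int(len(series) * train_fraction)
    if split_index == 0 or split_index == len(series):
        raise ValueError("series is too short to split")
    # No shuffling: order in time is preserved on both sides of the split.
    return list(series[:split_index]), list(series[split_index:])


prices = [float(p) for p in range(100, 110)]  # ten hypothetical closing prices
train, test = chronological_split(prices)
```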
model was able to capture complex patterns in the stock data and delivered
reasonably accurate predictions.
The MAE of 0.45 indicates that, on average, the model's predictions were off by
0.45 price units (45 cents for a stock quoted in dollars), which is a reasonable error
margin for stock price predictions. The RMSE of 0.58 is only moderately higher than
the MAE, which suggests that large individual errors were infrequent.
7.1.3 Performance of XGBRegressor
The XGBRegressor provided the best performance among the three models, with an
MAE of 0.38 and RMSE of 0.50. These results demonstrate that XGBoost
effectively captured underlying patterns in stock price movements and delivered the
most accurate predictions. The model's ability to handle non-linear relationships in
the data contributed to its superior performance.
Stock price trends were visualized by plotting both the actual and predicted
stock prices on a graph. The system generated line charts that showed how well the
predictions aligned with the actual prices, enabling users to visually assess the
model’s performance. For example, predictions made by XGBRegressor closely
followed the actual stock prices, with minor deviations occurring during periods of
high volatility.
The charts helped users understand how well the model performed and where
the predictions diverged from reality. This clear visualization of model performance
is beneficial for traders, as it helps them make informed decisions based on past
and predicted stock price movements.
The results from the three models indicate that machine learning algorithms
can be effective in predicting stock prices based on historical data. XGBRegressor
was found to be the most accurate model for predicting stock prices, with a lower
error rate compared to RandomForestRegressor and ExtraTreesRegressor. This
is consistent with existing research that highlights the effectiveness of gradient
boosting algorithms in handling structured data and capturing complex patterns in
time series forecasting.
However, despite the promising results, it is important to note that stock price
prediction is inherently challenging due to the high volatility and numerous factors
influencing the market. Even though the models were able to provide reasonably
accurate predictions, external factors such as economic news, company
performance, and geopolitical events are not captured by historical price data alone.
This limitation underscores the importance of incorporating additional features,
such as technical market indicators or news sentiment, to improve prediction
accuracy.
7.3.1 Challenges Encountered
• Data Quality and Availability: The accuracy of the predictions was directly
influenced by the quality and quantity of historical stock data. Missing or
noisy data could affect model performance, and thus data preprocessing
techniques such as imputation and outlier removal were crucial.
• Model Overfitting: Despite the efforts to fine-tune the models, there was a
risk of overfitting, especially with the more complex models like
XGBRegressor. This was managed through cross-validation and proper
training-validation splits.
• Market Volatility: Stock prices are highly volatile, and sudden price changes
due to unforeseen events can lead to large prediction errors. Incorporating
real-time data feeds and external market factors could help address this
challenge.
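For the overfitting challenge above, cross-validation on time series is commonly done in a walk-forward manner, where each fold trains on all data up to a point and validates on the block that follows. The report does not specify which cross-validation scheme was used, so the sketch below is an assumption. `walk_forward_splits` is a hypothetical helper, and scikit-learn's `TimeSeriesSplit` provides a comparable expanding-window scheme.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int, min_train: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) with an expanding train window."""
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        train_end = min_train + fold * fold_size
        val_end = train_end + fold_size
        # Validation indices always come after every training index.
        yield list(range(train_end)), list(range(train_end, val_end))


splits = list(walk_forward_splits(n_samples=10, n_folds=3, min_train=4))
```

Because validation data always lies after the training data in time, this scheme avoids the look-ahead leakage that shuffled k-fold cross-validation would introduce.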
CHAPTER-8
8.1 CONCLUSION
In conclusion, the system has shown strong potential as a valuable tool for stock
price prediction and decision-making. While the current model offers significant
insights, future enhancements and the incorporation of external data sources will
further improve prediction accuracy, scalability, and overall usability, making it an
even more powerful tool for investors and financial analysts.
CHAPTER-9
APPENDICES
BACK-END :
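import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime, timedelta
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import FileResponse
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from xgboost import XGBRegressor
from keras.models import Sequential
from keras.layers import LSTM, Dense

app = FastAPI()

# Frontend origins allowed to call the API during local development
origins = [
    "http://localhost",
    "http://localhost:8080",
    "http://localhost:3000",
    "http://localhost:5173",
]

# Enable CORS so the React frontend can reach the backend
app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)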
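def download_and_preprocess_data(company, time_diff_unit="days"):
    """
    Downloads and preprocesses stock data for a given company based on the time
    difference unit.
    Args:
        company (str): Stock ticker symbol.
        time_diff_unit (str): One of 'days', 'hours', or 'minutes'.
    Returns:
        pd.DataFrame: The "Close" column, or an empty frame if no data was found.
    Raises:
        ValueError: If time_diff_unit is not supported.
    """
    unit = time_diff_unit.lower()
    if unit not in ('days', 'hours', 'minutes'):
        raise ValueError(
            "time_diff_unit must be 'days', 'hours', or 'minutes'")

    end = datetime.now()
    # Lookback window (in days) for each granularity
    if unit == 'days':
        max_period = 365*10
    elif unit == 'hours':
        max_period = 60
    else:
        max_period = 7
    start = end - timedelta(days=max_period)
    print(f"Downloading {company} data from {start} to {end}")

    # Sampling interval passed to yfinance
    if unit == 'minutes':
        interval = '1m'
    elif unit == 'hours':
        interval = '1h'
    else:
        interval = '1d'

    data = yf.download(company, start=start, end=end, interval=interval)
    if data.empty:
        return pd.DataFrame()
    data["company_name"] = company
    print(data)
    return data.filter(["Close"])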
dataset = data.values
scaled_data = scaler.fit_transform(dataset)
train_data = scaled_data[:training_data_len, :]
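# Split the data into x_train and y_train data sets
x_train = []
y_train = []
for i in range(window_size, len(train_data)):
    # Each sample is the previous window_size values; the target is the next one
    x_train.append(train_data[i-window_size:i, 0])
    y_train.append(train_data[i, 0])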
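if model_type == "LSTM":
    model = Sequential()
    model.add(LSTM(128, return_sequences=True,
                   input_shape=(x_train.shape[1], 1)))
    model.add(LSTM(64, return_sequences=False))
    model.add(Dense(25))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mean_squared_error')
else:
    # Scikit-learn models
    if model_type == "LinearRegression":
        model = LinearRegression()
    elif model_type == "KNeighborsRegressor":
        model = KNeighborsRegressor()
    elif model_type == "XGBRegressor":
        model = XGBRegressor()
    elif model_type == "RandomForestRegressor":
        model = RandomForestRegressor()
    elif model_type == "ExtraTreesRegressor":
        model = ExtraTreesRegressor()
    else:
        raise ValueError(f"Unsupported model type: {model_type}")
model.fit(x_train, y_train)
return model, scaled_data, scaler, training_data_len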
# predictions
x_test = []
for i in range(window_size, len(test_data)):
    x_test.append(test_data[i-window_size:i, 0])
x_test = np.array(x_test)
predictions = model.predict(x_test)
return predictions
@app.post("/api/predict")
),
default="RandomForestRegressor",
),
):
"""
Predicts the closing stock price for a given company based on user-specified
parameters.
Args:
time_diff_unit (str, optional): The time difference unit (days, hours, minutes).
Defaults to "days".
Returns:
additional information.
"""
try:
print("company-->", company)
data = download_and_preprocess_data(
company, time_diff_value)
data, model_type)
predictions = make_predictions(
train = data[:training_data_len]
valid = data[training_data_len:]
valid['Predictions'] = predictions
valid['Predictions'] = valid['Predictions'].apply(
plt.figure(figsize=(16, 6))
plt.title(f"{model_type} Model")
plt.plot(train["Close"])
plt.plot(valid[["Close", "Predictions"]])
image_path = "prediction_plot.png"
plt.savefig(image_path)
plt.close()
return {
"company": company,
"validation_table": valid,
"plot_image": f"/image/{image_path}",
except Exception as e:
@app.get("/image/{filename}")
return FileResponse(f'{filename}')
if __name__ == "__main__":
import uvicorn
FRONT-END:
// src/StockPredictor.jsx
import { XCircle } from 'lucide-react'; // Importing the close icon from lucide-react
setIsLoading(true); // Start loading
try {
params: {
company,
time_diff_value: timeFrame,
model_type: modelType,
},
});
await get_image();
setShowModal(true);
} else {
} catch (error) {
if (error.response) {
console.error('Error response from server:', error.response.data);
} else if (error.request) {
} else {
} finally {
};
try {
responseType: 'blob',
});
globalImageObjectURL = URL.createObjectURL(imageBlob);
} catch (error) {
};
return (
<div
style={{
}}
>
{/* Loading popup */}
{isLoading && (
</div>
</div>
)}
<div className='mb-16'>
</div>
<div className="mb-4">
<input
type="text"
value={company}
/>
</div>
<div className="w-1/2">
<select
id="timeFrame"
value={timeFrame}
className="border-x-8 p-3 border-white bg-white w-full rounded-xl"
>
<option value="days">Days</option>
<option value="hours">Hours</option>
<option value="minutes">Minutes</option>
</select>
</div>
<div className="w-1/2">
<select
id="modelType"
value={modelType}
>
<option
value="RandomForestRegressor">RandomForestRegressor</option>
<option value="ExtraTreesRegressor">ExtraTreesRegressor</option>
<option value="XGBRegressor">XGBRegressor</option>
<option value="LinearRegression">LinearRegression</option>
<option value="KNeighborsRegressor">KNeighborsRegressor</option>
<option value="LSTM">LSTM</option>
</select>
</div>
</div>
<button
onClick={handlePredict}
>
Predict
</button>
{showModal && (
<div className="fixed inset-0 bg-black bg-opacity-75 flex justify-center
items-center z-50">
<button
>
</button>
</div>
</div>
)}
</div> </div>
9.2 SCREENSHOTS
CHAPTER 10
REFERENCES
1. J. Li, S. Pan, and L. Huang, “A machine learning based method for
customer behavior prediction,” Tehnicki Vjesnik-Tech. Gazette, vol.
26, no. 6, pp. 1670–1676, 2019.