PT Report
at
Sathyabama Institute of Science and Technology
(Deemed to be University)
By
OCT 2024
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC
(Established under Section 3 of UGC Act, 1956)
JEPPIAAR NAGAR, RAJIV GANDHI SALAI, CHENNAI– 600119
www.sathyabamauniversity.ac.in
BONAFIDE CERTIFICATE
Internal Guide
Dr.N.Srinivasan
_________________________________________________________________
DECLARATION
I, Paramita Saha, hereby declare that the Project Report entitled “Stock Market Price Prediction”,
done by me under the guidance of Ms.P.Preethy Jemima and Dr.M.D.Anto Praveena, is submitted
in partial fulfillment of the requirements for the award of the Bachelor of Engineering degree in Computer
Science and Engineering.
DATE:
ACKNOWLEDGEMENT
I would like to express my sincere and deep sense of gratitude to my Project Guide,
Dr.N.Srinivasan, for his valuable guidance, suggestions, and constant encouragement, which
paved the way for the successful completion of my project work.
I wish to express my thanks to all teaching and non-teaching staff members of the
Department of Computer Science and Engineering who were helpful in many
ways for the completion of the project.
TRAINING CERTIFICATE
ABSTRACT
The project will focus on building a predictive model utilizing algorithms such as
Linear Regression, Decision Trees, and advanced techniques like Long Short-
Term Memory (LSTM) networks, which are particularly effective in handling time-
series data. The model will analyze historical stock data, including open, high,
low, close prices, and trading volume. Additionally, technical indicators like
moving averages, RSI (Relative Strength Index), and MACD (Moving Average
Convergence Divergence) will be incorporated as features.
The training and evaluation of the model will involve the use of publicly available
datasets of historical Tesla stock prices. The model's performance will be
measured using metrics like Mean Squared Error (MSE) and accuracy, and
results will be compared to traditional statistical methods to assess
improvements in prediction accuracy.
Table of Contents
Abstract
List of Figures
List of Tables
List of Abbreviations
1. INTRODUCTION
1.1 Overview
1.2 Project Definition
2. AIM AND SCOPE OF PRESENT INVESTIGATION
2.1 Literature Survey
3. EXPERIMENTAL METHODS AND MATERIAL USED IN ALGORITHM
3.1 Existing System
3.2 Proposed System
3.3 Requirements Analysis and Specification
3.3.1 Input Requirements
3.3.2 Output Requirements
3.3.3 Functional Requirements
3.4 Software Environment
3.5 Software Description
4. RESULT, DISCUSSION AND PERFORMANCE ANALYSIS
4.1 Use Case Diagram
4.2 Class Diagram
4.3 Flowchart
5. SUMMARY AND CONCLUSION
5.1 Architecture Overview
5.2 Module Description
6. SYSTEM IMPLEMENTATION
6.1 Sample Coding
6.2 Sample Screen
7. RESULTS AND DISCUSSION
7.1 Performance Analysis Report
8. CONCLUSION AND FUTURE ENHANCEMENT
8.1 Conclusion
8.2 Future Enhancement
9. REFERENCES
TABLE OF FIGURES
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
This project focuses on predicting Tesla's stock prices using machine learning
techniques. It involves importing historical Tesla stock data, performing
exploratory data analysis, feature engineering, and developing predictive models.
Models like Logistic Regression, Support Vector Classifier (SVC), and XGBoost
are trained to predict if the stock’s closing price will increase the next day. The
models are evaluated based on their accuracy and AUC scores, using key
features like price differences and quarter-end markers to drive predictions.
Key Steps:
1. Data Preprocessing: Clean and prepare the Tesla dataset, engineer relevant
features.
2. Modeling: Implement and train Logistic Regression, SVC, and XGBoost
models.
3. Evaluation: Assess model performance using metrics like AUC and confusion
matrices.
The project aims to provide a reliable stock price movement prediction tool for
informed decision-making.
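The feature engineering and next-day target described in the overview can be sketched as follows. This is a minimal, dependency-free illustration and not the report's actual code: the function name `build_features`, the feature names, the rule that a quarter-end marker is set for months divisible by three, and the sample prices are all assumptions made for the example.

```python
from datetime import date


def build_features(rows):
    """Turn daily OHLC rows into feature dicts with a next-day target.

    Each input row is a dict with keys 'date', 'open', 'high', 'low', 'close'.
    The last row is dropped because it has no following day to label it.
    """
    examples = []
    for today, tomorrow in zip(rows, rows[1:]):
        examples.append({
            # Price-difference features
            "open_close": today["open"] - today["close"],
            "low_high": today["low"] - today["high"],
            # Assumed quarter-end marker: 1 for March, June, September, December
            "is_quarter_end": int(today["date"].month % 3 == 0),
            # Target: 1 if the next day's close is higher than today's close
            "target": int(tomorrow["close"] > today["close"]),
        })
    return examples


# Made-up prices, for illustration only
sample = [
    {"date": date(2024, 3, 27), "open": 10.0, "high": 11.0, "low": 9.5, "close": 10.5},
    {"date": date(2024, 3, 28), "open": 10.5, "high": 12.0, "low": 10.0, "close": 11.5},
    {"date": date(2024, 4, 1), "open": 11.5, "high": 11.8, "low": 10.2, "close": 10.4},
]
features = build_features(sample)
```

Each resulting dict is one training example; the classifiers named above would be fitted on the feature columns with `target` as the label.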
CHAPTER 2
LITERATURE SURVEY
b. Data Preprocessing: The dataset is cleaned and prepared for analysis,
including handling missing values and normalizing the data.
c. Feature Engineering: Technical indicators are created, and features like the
difference between the opening and closing prices, as well as moving
averages, are computed to improve model performance.
d. Model Building: Various machine learning algorithms (e.g., Logistic
Regression, SVC, and XGBoost) are trained on the data, and advanced
models like LSTM are utilized to capture temporal patterns in stock price
movements.
e. Model Evaluation: The models are evaluated using metrics such as AUC,
accuracy, and confusion matrices. K-fold cross-validation is performed to assess
model generalization, and a 70%:30% train-test split is used for performance
evaluation.
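The evaluation step above names several metrics and two validation schemes. The sketch below shows one way they could be computed for binary up/down labels using only the standard library. All function names are hypothetical, the labels and scores are toy values, and the 70%:30% split is assumed to be chronological (no shuffling), which the text does not specify.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels 0/1."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tp + tn) / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def chronological_split(rows, train_percent=70):
    """Keep the first train_percent of rows for training, the rest for testing."""
    cut = len(rows) * train_percent // 100
    return rows[:cut], rows[cut:]


# Toy labels and model scores, for illustration only
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]
```

Accuracy depends on the 0.5 threshold used to turn scores into labels, while AUC is computed from the scores directly and is therefore threshold-independent.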
Year: 2021
Description:
The stock market's complexity, coupled with its highly volatile nature, has long
posed a challenge to both investors and analysts. The rise of artificial intelligence (AI)
and machine learning (ML) has opened up new opportunities for predicting stock
market movements more accurately than ever before. Mokhtari et al. (2021)
investigated the effectiveness of various machine learning algorithms in stock price
prediction and compared their performance against traditional statistical models. The
research emphasized the use of AI-based predictive models to analyze vast amounts
of historical data, identify patterns, and forecast future stock prices.
The study found that machine learning algorithms such as Decision Trees,
Random Forests, and Artificial Neural Networks (ANN) significantly improved stock
price prediction accuracy over conventional models. This is attributed to the ability of
these algorithms to handle large, complex datasets and detect non-linear
relationships that are often missed by traditional models. Additionally, the integration
of AI into stock prediction systems has led to advancements in real-time data
analysis, further improving the timeliness and relevance of stock market forecasts.
Methodology:
Year: 2021
Description:
The research focused on using Gaussian Naïve Bayes (GNB) to classify stock price movements based
on historical stock data and key market indicators. While Naïve Bayes is not
commonly used for regression problems like stock price prediction, the authors
demonstrated its ability to predict the direction of price movement (up or down),
which can still be highly useful for making investment decisions. The simplicity of
GNB allows for quick training and fast predictions, making it practical for real-time
stock market applications.
The authors concluded that GNB performs well when applied to stock price
movement classification tasks, especially when integrated with other
predictive models to form an ensemble. The study further suggested that the use of
probabilistic models like GNB, alongside other machine learning techniques, can
significantly enhance the robustness and accuracy of stock market predictions.
Methodology:
The methodology followed in this study is summarized as follows:
a. Data Collection: The authors used historical stock price data and various
technical indicators from multiple markets to develop the prediction model.
b. Data Preprocessing: Raw data was cleaned and transformed for model input.
Missing data was handled appropriately, and normalization techniques were
applied to scale the features.
c. Feature Selection: Key features such as daily price fluctuations, trading
volume, and technical indicators like moving averages and Bollinger Bands
were selected for the model.
d. Model Building: The Gaussian Naïve Bayes algorithm was employed to
classify the stock price movement (increase or decrease). The model
assumes a normal distribution for the input features and calculates the
likelihood for each class.
e. Model Evaluation: The GNB model was evaluated using accuracy metrics,
confusion matrices, and precision-recall analysis. The results were compared
to other machine learning models such as Decision Trees and SVM to assess
its relative performance. K-fold cross-validation was used to ensure that the
model generalizes well to unseen data.
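The model-building step above (a normal distribution per feature and a likelihood per class) can be illustrated with a small from-scratch classifier. This is a sketch under stated assumptions, not the surveyed authors' implementation: the class name `GaussianNaiveBayes`, the variance floor of 1e-9, and the two-feature toy data are all hypothetical.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes for numeric features."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for features, label in zip(X, y):
            by_class[label].append(features)
        self.params = {}
        for label, rows in by_class.items():
            stats = []
            for column in zip(*rows):
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                stats.append((mean, var + 1e-9))  # floor avoids division by zero
            log_prior = math.log(len(rows) / len(X))
            self.params[label] = (log_prior, stats)
        return self

    def _log_posterior(self, features, label):
        """Log prior plus the sum of per-feature Gaussian log-likelihoods."""
        log_prior, stats = self.params[label]
        total = log_prior
        for value, (mean, var) in zip(features, stats):
            total += -0.5 * math.log(2 * math.pi * var)
            total -= (value - mean) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [
            max(self.params, key=lambda label: self._log_posterior(features, label))
            for features in X
        ]


# Toy data: two made-up features per day, label 1 = price went up
X_train = [[-1.0, 0.5], [-0.8, 0.7], [-1.2, 0.4],
           [1.0, -0.5], [0.9, -0.6], [1.1, -0.3]]
y_train = [0, 0, 0, 1, 1, 1]
model = GaussianNaiveBayes().fit(X_train, y_train)
predictions = model.predict([[-0.9, 0.6], [1.05, -0.4]])
```

Working in log space avoids numerical underflow when many feature likelihoods are multiplied together.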
Year: 1997
Description:
Fama and French’s analysis of cross-sectional data from various industries
showed significant differences in equity costs based on the characteristics of each
industry. Riskier industries, such as technology and startups, were found to have
higher costs of equity, while more stable sectors like utilities exhibited lower equity
costs. Their findings suggested that industry-specific risk factors, beyond market-
wide factors, must be accounted for when estimating the cost of equity.
This study is a foundational work in finance and has influenced a wide range
of applications, including stock market prediction models. Industry-specific
risk factors are often incorporated into machine learning models to predict stock
prices more accurately. In modern stock price prediction algorithms, these factors
help improve predictions by providing context-specific adjustments to standard risk
factors.
Methodology:
i. Data Collection: The study used historical data from a wide range of
industries, focusing on firm characteristics such as market capitalization, book-
to-market ratios, and stock returns.
ii. Three-Factor Model: The authors applied the Fama-French three-factor
model, which includes market risk, size, and value risk (book-to-market ratio),
to estimate the cost of equity for each industry.
iii. Risk Factor Estimation: Industry-specific equity costs were calculated based
on the sensitivity of each industry’s stock returns to the three factors. This
analysis revealed which industries have higher or lower equity costs, based on
their risk exposure.
iv. Model Evaluation: The effectiveness of the three-factor model was tested by
comparing its estimates of equity costs to traditional models like the Capital
Asset Pricing Model (CAPM). The results demonstrated the superiority of the
three-factor model in capturing industry-level variations in equity costs.
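For reference, the two models compared in the methodology above are usually written as follows. The notation is the standard textbook one and is not taken from this report: R_i is the return of industry i, R_f the risk-free rate, R_M the market return, SMB the size factor (small minus big), HML the value factor (high minus low book-to-market), and beta_i, b_i, s_i, h_i the estimated sensitivities.

```latex
% CAPM: a single market factor
E[R_i] = R_f + \beta_i \,\bigl(E[R_M] - R_f\bigr)

% Fama-French three-factor model: market, size, and value factors
E[R_i] - R_f = b_i \,\bigl(E[R_M] - R_f\bigr) + s_i \, E[\mathit{SMB}] + h_i \, E[\mathit{HML}]
```

The cost of equity for an industry is the expected return implied by its estimated sensitivities, so the three-factor model reduces to CAPM when s_i and h_i are zero.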
CHAPTER 3
SYSTEM ANALYSIS
Stock market prediction has traditionally been approached using time series analysis
and various machine learning algorithms, such as Logistic Regression, LDA
(Linear Discriminant Analysis), SVM (Support Vector Machine), and K-Nearest
Neighbors (KNN). These models aim to predict stock price trends (whether the stock
price will go up or down) based on historical data. Most existing systems rely on raw
price data such as open, high, low, and close values, along with the volume of shares
traded.
Many prediction models utilize basic features without the implementation of more
advanced techniques like deep learning or enhanced feature engineering, limiting
their ability to capture complex market trends and non-linear patterns. Additionally,
these models often face challenges related to the highly volatile and noisy nature of
stock market data, which can reduce their prediction accuracy.
● Predicted Stock Trends: The system will output the predicted stock price
movement (up or down) for Tesla stock, based on the historical data used for
training the model.
● Data Visualization: Visual representations (e.g., graphs, charts) showing the
predicted stock trends, model performance (e.g., accuracy, confusion matrix),
and feature importance.
● System for Interaction: A computer system for interacting with the data and
models, ideally with pre-installed tools like Jupyter Notebook, Python, and
necessary machine learning libraries (e.g., Scikit-learn, XGBoost).
during model training and evaluation processes.
● Operating System: Windows | Linux | Mac | any other stable operating system
● Tools: Colab.
● Database: Excel sheet.
Python:
Colab:
Colaboratory is a data analysis tool that combines code, output, and descriptive text
into one document (interactive notebook).
Colab is free to use and provides access to GPUs. By using Google Colab, you can:
● Build your analytics products quickly in a standardized environment.
● Use popular deep learning libraries such as PyTorch and TensorFlow on the go.
● Share code and results within your Google Drive.
● Save copies and create playground modes for knowledge sharing.
CHAPTER 4
SYSTEM DESIGN
4.1 Use case Diagram:
A Use Case Diagram visually represents the interactions between users
(actors) and a system to achieve specific goals. It highlights the system's
functionality through use cases (actions) and the external entities interacting
with those functions. This diagram is useful for understanding system
requirements and how users engage with the system, focusing on what the
system does rather than how it performs those actions.
4.2 Class Diagram:
A Class Diagram shows the static structure of a system by depicting its
classes, attributes, methods, and the relationships between classes. Each
class represents an object or entity within the system, detailing its properties
and behaviors. This diagram helps in modeling the system's object-oriented
structure, showing the key components and their interactions at a high level.
4.3 Flowchart:
CHAPTER 5
SYSTEM ARCHITECTURE
We divide the total dataset into two subsets: the training dataset and the test
dataset. If any record contains missing, undefined, or irrelevant values in any
feature, that record is placed into the training dataset. This dataset is used to train
machine learning models by handling noise and learning patterns from imperfect
data. On the other hand, if the record contains complete and clean data across all
selected features, it is placed into the test dataset. The test dataset contains high-
quality data and is used to evaluate the accuracy and performance of the trained
model in predicting stock price movements.
The training dataset helps in creating a robust model capable of dealing with noisy or
incomplete data, while the test dataset ensures reliable model validation on clean,
real-world data. This architecture ensures that the system can handle data
imperfections while producing accurate stock predictions.
Test and Training Dataset for Stock Market Prediction:
Separating the data into Test Dataset and Training Dataset is crucial to properly evaluate
and understand the performance of machine learning models. By doing so, you minimize the
effects of data inconsistency and better comprehend the characteristics of the prediction
model. In this case:
● Training Dataset: The training dataset consists of irrelevant or noisy data, which
helps the model learn and adjust during the training process.
● Test Dataset: The test dataset contains all the relevant data required for accurate
stock price predictions, helping to evaluate the performance of the trained model.
5.2 Module Description
1. Importing libraries that will be used throughout this program.
i. Numpy
ii. Pandas
iii. Seaborn
2. Loading the data.
3. Storing the data into a data frame.
4. Get the number of rows and columns in the data.
#The shape gives the number of rows (trading days) in the data set and the number of
columns (data points recorded for each day).
5. Get the column data types.
Here there is no missing data since all of the columns are returning a value of 0. Let's
double check the data set for any missing values.
False
8. Viewing some basic statistics about the data like the percentile, maximum,
minimum etc.
9. Get a count of high and low data & visualize it:
11. Splitting the dataset into 75% training set and 25% testing set.
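The module steps above (loading the data, checking its size, looking for missing values, summarizing it, and splitting it 75%/25%) can be sketched as follows. The report lists NumPy, pandas, and Seaborn as its libraries; this illustration deliberately uses only the Python standard library so that it runs without extra installs. The inline CSV, its column names, and its values are made up, and the split is assumed to keep rows in date order.

```python
import csv
import io
import statistics

# Tiny inline stand-in for the Tesla CSV file; values are made up for illustration.
RAW_CSV = """Date,Open,High,Low,Close,Volume
2024-01-02,10.0,11.0,9.5,10.5,1000
2024-01-03,10.5,12.0,10.0,11.5,1200
2024-01-04,11.5,11.8,10.2,10.4,900
2024-01-05,10.4,10.9,10.1,10.8,1100
"""

# Load the data and store it as a list of row dictionaries
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Number of rows and columns
n_rows, n_cols = len(rows), len(rows[0])

# Missing-value check: True if any field is empty
has_missing = any(value in ("", None) for row in rows for value in row.values())

# Basic statistics for the closing price
closes = [float(row["Close"]) for row in rows]
summary = {
    "min": min(closes),
    "max": max(closes),
    "mean": statistics.mean(closes),
}

# 75% training set and 25% testing set, keeping date order (an assumption)
cut = len(rows) * 75 // 100
train_rows, test_rows = rows[:cut], rows[cut:]
```

With pandas, the same steps would typically be expressed through a data frame's shape, missing-value check, and summary-statistics methods.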
CHAPTER 6
SYSTEM IMPLEMENTATION
6.1 Sample coding
6.2 Sample Screen:
1. Importing CSV:
2. Describing data:
3. Data info:
4. Data analysis:
5. Data head:
7. Data visualizing:
8.
9. Bar graph of dataset:
CHAPTER 7
7.1 PERFORMANCE ANALYSIS REPORT:
The Tesla stock market dataset used in this analysis serves as a good
representative for understanding stock price fluctuations in a real-world environment.
The Random Forest algorithm was chosen for its robustness in handling complex,
nonlinear relationships, and its ability to reduce overfitting in predictive models.
- Feature Selection: The features chosen for this prediction, such as stock prices,
trading volumes, and market trends, are highly influential in determining stock price
movements. These features are important in accurately predicting future stock prices,
justifying the selection of the dataset for training the model.
- Random Forest Classifier: The Random Forest algorithm was employed due to its
capability to handle large datasets and complex relationships. The reason for this
choice stems from its ability to create multiple decision trees and take a majority vote,
which results in better generalization for unseen data.
- Training Data: Used to build the model, the training dataset plays a crucial role in
teaching the model to detect patterns in stock price fluctuations.
- Test Data: Used to evaluate the performance, the test dataset gives insights into
the model’s ability to predict stock price movements on data it has not seen during training.
- Accuracy Evaluation:
- When tested using the original dataset, the model showed a prediction accuracy
of around 91%. This indicates that the selected features and the machine learning
approach used were effective in predicting stock prices with a relatively high degree
of confidence.
- In scenarios where a synthetic dataset was introduced (to balance and enrich the
dataset), the prediction accuracy increased to about 94%. The synthetic data helped
in stabilizing the model by addressing data imbalances and preventing overfitting.
- Ensemble Classification: Rather than relying on a single decision tree, the
random forest trains many decision trees on random samples of the training data and
classifies each instance through majority voting from the decision trees. This results
in more stable and reliable predictions, particularly in volatile stock markets where
trends fluctuate frequently.
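The majority-voting step mentioned above can be sketched in a few lines. This is an illustration only: the three "trees" are hypothetical hand-written rules standing in for trained decision trees, and the feature names and sample values are made up.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Combine the class labels predicted by each tree into a single label."""
    return Counter(tree_predictions).most_common(1)[0][0]


def forest_predict(trees, sample):
    """Ask every tree for a label, then return the most common answer."""
    return majority_vote([tree(sample) for tree in trees])


# Hypothetical stand-ins for trained trees: each maps a sample to 0 (down) or 1 (up)
trees = [
    lambda s: int(s["open_close"] < 0),
    lambda s: int(s["volume_change"] > 0),
    lambda s: int(s["is_quarter_end"] == 0),
]

sample = {"open_close": -0.5, "volume_change": -100, "is_quarter_end": 0}
prediction = forest_predict(trees, sample)  # the trees vote 1, 0, 1
```

Because the final label needs agreement from most trees, a single tree that has overfitted to noise is outvoted, which is the intuition behind the better generalization claimed above.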
The performance of the Random Forest classifier in predicting Tesla stock prices
demonstrates its effectiveness in capturing trends and making accurate predictions.
The model achieves an accuracy of 91% using the original dataset and 94% with
synthetic data, reflecting its robustness and reliability in stock market forecasting.
CHAPTER 8
8.1 Conclusion:
The stock market plays a critical role in the global economy, and predicting stock
price fluctuations, particularly for high-profile companies like Tesla, can provide
substantial financial advantages. Through this project, we analyzed historical stock
data and implemented machine learning techniques, such as the Random Forest
algorithm, to predict stock prices. The results demonstrate that data mining
techniques can construct reliable and accurate predictive models for stock market
behavior. The analysis shows that the selection of key features from stock data plays
a pivotal role in increasing the prediction accuracy. By utilizing the test and training
datasets effectively, we achieved a high prediction accuracy of 91% with the original
dataset, and an improved 94% accuracy with the synthetic dataset. These findings
indicate that machine learning models can assist investors in making informed
decisions.
8.2 Future Enhancement:
While the current project focuses on predicting stock prices using historical data and
traditional market indicators, future work can include integrating psychological
factors that influence investor behavior, such as market sentiment analysis through
social media trends or news reports. Understanding how psychological factors affect
stock price fluctuations can help provide more dynamic and early predictions,
enhancing the predictive model further. Additionally, incorporating real-time data
updates for market fluctuations could significantly improve model accuracy and
provide timely predictions for investors.
CHAPTER 9
REFERENCES
9. REFERENCES:
vi) S. Chopra, D. Yadav, and A. N. Chopra, “Artificial Neural Networks based Indian
stock market price prediction: Before and after demonetization,” International Journal
of Swarm Intelligence and Evolutionary Computation, vol. 8, no. 1, pp. 1–7, Jan.
2019, [Online].