My File

Report on stocks

Uploaded by

Harapriya Swain


SKILL LAB AND PROJECT - I

Title :
Stock Market Price Prediction Using Machine Learning

Group Members :
Dibyajyoti Satpathy [SIC: 22BCTE32, GROUP: CST(A3) ROLL NO: 20]

Bhabani Charan Panda [SIC: 22BCTC48, GROUP: CST(A3) ROLL NO: 18]

Suraj Kumar Sikhar [SIC: 22BCTG89, GROUP: CST(A3) ROLL NO: 15]

Ankush Patel [SIC: 22BCTE07, GROUP: CST(A3) ROLL NO: 17]

Swadesh Panda [SIC: 22BCTJ04, GROUP: CST(A2) ROLL NO: 30]

Guided By :
MR ASIT KUMAR DAS
[Assistant Professor, Silicon University, Bhubaneswar]
Contents
1. Objective
2. Introduction
I. Role of Machine Learning in the Stock Market
II. Overview of ITC Stock Market
3. Literature Review
4. About the Model
5. Workflow
6. Tools and Techniques
7. Dataset
8. Implementation
9. Evaluation
10. Code Snippet
11. Result Analysis
12. Future Work
13. Conclusion
14. References
Objective
The main objective of this project is to develop a machine-learning-based
predictive model that can accurately predict stock prices
based on historical data. By analyzing patterns and trends in
stock prices, the goal is to provide an efficient and reliable tool
for investors and traders. The specific objectives of the project
include:

1. Data Collection and Analysis: The first objective was to
collect and analyze a dataset containing historical stock
price data, including attributes such as opening price,
closing price, volume, and adjusted close prices. This data
serves as the foundation for training and evaluating the
predictive models.

2. Model Development: Using the dataset, the project
aimed to build machine learning models, specifically
Linear Regression and Support Vector Regression (SVR), to
predict stock prices. These models were selected for their
capability to handle both linear and non-linear
relationships in the data.

3. Model Evaluation: The models were trained on a subset
of the data and tested on another subset to evaluate their
performance. Key evaluation metrics such as Mean
Squared Error (MSE), Mean Absolute Error (MAE), Root
Mean Squared Error (RMSE), and R-squared value were
used to measure the effectiveness of the models.

4. Deployment of Predictive System: The final objective
was to develop a predictive system that could take new
stock market data as input and provide accurate price
predictions. This system aims to assist investors in making
informed decisions and navigating the complexities of
financial markets.
Introduction
The stock market is one of the most dynamic and unpredictable
domains, influenced by a myriad of factors such as economic policies,
global events, and investor sentiment. Predicting stock prices has long
been a focus of research and innovation, with traditional methods
often falling short due to their inability to adapt to rapidly changing
market conditions. Machine learning (ML) has emerged as a
transformative tool, offering advanced algorithms capable of
analyzing complex datasets to uncover patterns and make accurate
predictions.
Role of Machine Learning in the Stock Market
Machine learning has revolutionized stock market analysis by
automating the process of identifying patterns in historical data,
thereby providing actionable insights. ML models excel at detecting
non-linear relationships, enabling the prediction of stock price
movements that traditional statistical methods might miss. By
leveraging algorithms like Linear Regression and Support Vector
Regression (SVR), investors and analysts can improve their
forecasting accuracy, manage risks, and optimize investment
strategies.
Example: ITC Stock Market
This project focuses on ITC Limited, a prominent company listed in
the Indian stock market. ITC has a diverse business portfolio,
including FMCG, hotels, paperboards, and agribusiness, making its
stock performance a reflection of multiple industry trends. Analyzing
ITC’s historical stock data provides a challenging yet rewarding
opportunity to develop predictive models that can capture its price
trends and volatility.
Overview of ITC Stock Market
ITC’s stock has exhibited considerable variation over the years,
influenced by market conditions, company performance, and
external factors like government policies and global economic trends.
The dataset used in this project includes over 6,800 records of ITC’s
stock prices, covering attributes such as opening price, closing price,
adjusted close price, and trading volume. The analysis and
predictions derived from this dataset aim to provide valuable insights
into ITC’s stock performance, demonstrating the effectiveness of
machine learning models in financial forecasting.
Literature Review
Stock price prediction has been a focal point in both academia
and industry, leading to the development of numerous predictive
models. Classical methods such as ARIMA and Moving Averages
rely heavily on historical price data but often lack adaptability to
changing market dynamics. Machine learning approaches,
including Linear Regression, Decision Trees, Support Vector
Machines, and Neural Networks, have shown promise in
addressing these limitations. Studies have demonstrated that
combining historical data with external factors, such as news
sentiment and macroeconomic indicators, can enhance
prediction accuracy. Moreover, research emphasizes the
importance of preprocessing techniques like normalization,
feature selection, and handling missing values to improve model
performance. Despite advancements, challenges remain in
achieving high accuracy due to market volatility and unforeseen
events.

Recent research in financial prediction has emphasized the
integration of advanced machine learning techniques with
traditional statistical models to improve forecasting accuracy.
Techniques such as ensemble learning (e.g., Random Forests
and Gradient Boosting) have gained traction due to their ability
to reduce overfitting and enhance prediction robustness. Studies
have also explored the use of deep learning models, including
Long Short-Term Memory (LSTM) networks and Convolutional
Neural Networks (CNNs), for time-series forecasting,
demonstrating improved performance in capturing long-term
dependencies and intricate patterns in stock price data.
Additionally, the use of sentiment analysis from news articles
and social media platforms has been shown to complement
historical data, providing contextual insights that influence
market trends. Despite these advancements, challenges such as
data noise, overfitting, and the need for extensive computational
resources continue to pose barriers, underscoring the
importance of selecting appropriate models and preprocessing
techniques for each specific application.
About the Model
The stock price prediction models implemented in this project
include Linear Regression and Support Vector Regression (SVR),
which are widely used in machine learning for regression tasks.
These models were selected for their ability to analyze historical
stock price data and predict future trends, each offering unique
advantages in handling different types of relationships within the
data.
Linear Regression: A supervised learning algorithm that
establishes a relationship between dependent and independent
variables by fitting a linear equation. The model predicts stock
prices by analyzing historical data, including features like open
price, high price, low price, closing price, and trading volume.
The simplicity and interpretability of Linear Regression make it a
suitable choice for initial experimentation.
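As a minimal sketch of what "fitting a linear equation" means, the following dependency-free example fits a single-feature least-squares line. It is an illustration only: the function names and the toy data are assumptions, and the project itself uses scikit-learn's LinearRegression on several features.

```python
from statistics import mean


def fit_simple_linear(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope is covariance(x, y) divided by variance(x).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Toy data lying exactly on y = 2x + 1 (illustrative, not ITC prices).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear(xs, ys)
print(slope, intercept, predict(slope, intercept, 5.0))
```

With several features the same idea applies, with one coefficient per feature instead of a single slope.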

Support Vector Regression (SVR): A machine learning model
capable of capturing non-linear relationships in the data. SVR
uses kernel functions to transform input features into higher
dimensions, enabling the model to find optimal hyperplanes for
regression tasks. This capability makes it effective in handling
the complexities of stock market data.
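One way to see how SVR differs from ordinary least squares is its epsilon-insensitive loss: errors smaller than epsilon cost nothing, and larger errors are penalized linearly. The sketch below is illustrative; the function names are assumptions, and the default epsilon of 0.2 simply mirrors the value passed to SVR in the notebook listing of this report.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.2):
    """Zero inside the epsilon tube around the target, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def total_loss(actual, predicted, epsilon=0.2):
    """Sum of epsilon-insensitive losses over paired values."""
    return sum(
        epsilon_insensitive_loss(a, p, epsilon)
        for a, p in zip(actual, predicted)
    )


# A prediction within 0.2 of the target is not penalized at all.
inside_tube = epsilon_insensitive_loss(10.0, 10.1)
# A prediction 0.5 away is penalized only for the part beyond the tube.
outside_tube = epsilon_insensitive_loss(10.0, 10.5)
print(inside_tube, outside_tube)
```

Because small errors are ignored, the fitted function is determined only by the points on or outside the tube (the support vectors).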

The models’ performance is evaluated using metrics like
Mean Squared Error (MSE), R-squared value, Mean Absolute
Error (MAE), and Root Mean Squared Error (RMSE). Linear
Regression provides a baseline, while SVR, when paired with a
non-linear kernel, can capture patterns that a straight-line fit
misses, offering a more comprehensive approach to stock price
prediction.
Workflow

The workflow for this project consists of the following steps:

Data Collection and Preprocessing: The dataset, containing historical
stock price data for ITC Limited, was sourced in CSV format. It includes
attributes such as Open, High, Low, Close, Adjusted Close, and Volume.
Missing values were filled with the column mean, and exploratory
data analysis (EDA) was conducted to ensure data integrity and gain
insights into patterns and trends.
Feature Selection and Separation: Relevant features (e.g., Open, High,
Low, Volume) were selected to form the input dataset (X). The target
variable (y) was chosen as the "Close" price, representing the stock price
prediction objective.
Data Splitting: The dataset was split into training and testing sets using
an 80-20 ratio to evaluate the models' generalization capabilities.
Data Standardization: Feature scaling was performed using
StandardScaler to normalize the data, ensuring improved model
convergence and performance.
Model Training: Two machine learning models, Linear Regression and
Support Vector Regression (SVR) with a linear kernel, were trained on the
standardized training dataset. These models were selected to compare
their performance in predicting stock prices.
Model Evaluation: The trained models were evaluated on the test dataset
using performance
metrics, including Mean Squared Error (MSE), Mean Absolute Error (MAE),
Root Mean Squared Error (RMSE), and R-squared value.
Prediction System Development: A predictive system was developed
to forecast stock prices for new data. Input features are standardized
before passing them to the trained models for prediction.
Tools and Techniques
This project leverages various tools and techniques to
implement and evaluate the stock price prediction models:

• Programming Language: Python, chosen for its
versatility and extensive library support.

• Libraries:
o Pandas: Used for data manipulation and
preprocessing, enabling efficient handling of large
datasets.
o NumPy: Provides numerical computation tools for
efficient array operations and mathematical functions.
o Matplotlib: Enables the creation of static,
animated, and interactive visualizations, aiding
in data exploration.
o Scikit-learn: Includes machine learning tools for
implementing Linear Regression, SVR models, and
evaluation metrics.
o Seaborn: Used for advanced visualization,
providing better aesthetics and additional insight
extraction from data.
o Statsmodels: Facilitates statistical tests and
regression analysis for more in-depth data
understanding.

• Environment:
o Jupyter Notebook: An interactive platform for
coding, analysis, and visualization.
o Google Colab: Employed for experimentation with
larger datasets and computationally intensive tasks,
leveraging cloud resources.
These tools enable efficient data handling, model training, and
performance evaluation, forming the foundation for this machine
learning project.
Dataset
The dataset used in this project comprises historical stock price
data for ITC Limited, a leading company in the Indian stock
market. The dataset was carefully curated to analyze trends and
predict future stock prices.
1. Source:
The dataset was sourced from a CSV file named ITC.NS.csv,
containing historical stock price data for ITC Limited. This
dataset was obtained from Yahoo Finance, a widely used
platform for financial data.
2. Data Set Information:
This dataset contains daily trading data for ITC Limited, including
crucial attributes that influence stock price movements. The
dataset provides comprehensive information about the stock’s
performance over time, enabling the development of predictive
models.
3. Size and Structure:
• Total Samples: 6,890 rows, of which 6,879 have complete price
and volume values
• Features: 7 columns:
o Date: The trading day.
o Open: Opening price of the stock.
o High: Highest price during the trading session.
o Low: Lowest price during the trading session.
o Close: Closing price of the stock.
o Adj Close: Adjusted closing price (accounting for splits,
dividends, etc.).
o Volume: Total number of shares traded.
4. Data Preprocessing:
• Handling Missing Values:
Checked for missing values and found 11 null entries in each
price and volume column; these were filled with the column mean.
• Feature Scaling:
Scaled numerical features (e.g., prices and volume) to
standardize the data and improve model performance.
• Statistical Insights:
Analyzed the mean, standard deviation, and other metrics to
understand the data distribution and patterns.
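The structure described above can be checked with a short, dependency-free sketch that reads CSV text and counts empty fields per column. The column names and the first two data rows follow the notebook output in this report; the third row is a made-up blank record added only to show a missing value, and the function name is an assumption.

```python
import csv
import io

# First two rows as shown by df.head(); the last row is hypothetical.
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
01-01-1996,5.550000,5.600000,5.533333,5.583333,3.323907,985500.0
02-01-1996,5.466666,5.566666,5.288888,5.372222,3.198226,7470000.0
31-12-1999,,,,,,
"""


def count_missing(csv_text):
    """Return (row_count, {column: number of empty fields})."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name, value in row.items():
            if value is None or value.strip() == "":
                counts[name] += 1
    return n_rows, counts


n_rows, missing = count_missing(SAMPLE_CSV)
print(n_rows, missing)
```

On the real file, pandas reports the same information through df.info() and df.isnull().sum().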
Implementation
Data Loading and Preprocessing:
Data is imported using Pandas, enabling efficient handling of large
datasets. Missing values are imputed with the column mean, so no
rows are dropped.
Exploratory Data Analysis (EDA):
Line plots and histograms are used to visualize stock price trends
and distributions. Correlation matrices identify relationships
between features, aiding in feature selection.
Model Training:
The dataset is split into training and test sets, ensuring unbiased
evaluation. A Linear Regression model is trained using Scikit-learn,
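mapping feature values to stock prices. An SVR model is trained
with a linear kernel (C=50, epsilon=0.2). A linear kernel fits a
linear function of the features; capturing non-linear dependencies
would require a non-linear kernel such as RBF.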
Model Testing:
Predictions are generated on the test dataset for both models.
Performance is evaluated using metrics such as Mean Squared Error
(MSE), Mean Absolute Error (MAE), Root Mean Squared Error
(RMSE), and R-squared value.

Evaluation
The performance of the Linear Regression and SVR models is
assessed using the following metrics:
Mean Squared Error (MSE): Quantifies the average squared
difference between predicted and actual values, reflecting
prediction accuracy.
Mean Absolute Error (MAE): Represents the average absolute
difference between predicted and actual values, providing a
straightforward measure of prediction error.
Root Mean Squared Error (RMSE): Provides the square root of
the MSE, offering a more interpretable metric for the average
prediction error.
R-squared Value: Measures the proportion of variance in the
dependent variable explained by the independent variables,
indicating model fit.
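The four metrics can be written out directly from their definitions. The sketch below is a plain-Python illustration, not the scikit-learn implementation; the function name is an assumption, and the sample values are the five Actual/Predicted pairs shown in the Linear Regression output of this report.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MSE, MAE, RMSE, and R-squared from paired values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "mae": mae, "rmse": rmse, "r2": r2}


# First five test rows from the report (Actual, Predicted).
actual = [15.46, 181.20, 16.50, 60.30, 258.60]
predicted = [15.798305, 182.280632, 16.489304, 60.552873, 259.193598]
scores = regression_metrics(actual, predicted)
print(scores)
```

MSE and RMSE weight large errors more heavily than MAE does, which is why two models can rank differently depending on the metric.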
Importing the Dependencies
In [1]: import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.metrics import r2_score,mean_squared_error,mean_absolute_error,root_mean_squared_error

Data Collection and Analysis

In [3]: df = pd.read_csv('/content/ITC.NS.csv')
df.head()
Out[3]:
         Date      Open      High       Low     Close  Adj Close      Volume
0  01-01-1996  5.550000  5.600000  5.533333  5.583333   3.323907    985500.0
1  02-01-1996  5.466666  5.566666  5.288888  5.372222   3.198226   7470000.0
2  03-01-1996  5.133333  5.254444  5.101111  5.200000   3.095698  15160500.0
3  04-01-1996  5.200000  5.332222  5.144444  5.297777   3.153908  12397500.0
4  05-01-1996  5.297777  5.277777  5.188888  5.202222   3.097020   5008500.0

In [4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6890 entries, 0 to 6889
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype
 0   Date       6890 non-null   object
 1   Open       6879 non-null   float64
 2   High       6879 non-null   float64
 3   Low        6879 non-null   float64
 4   Close      6879 non-null   float64
 5   Adj Close  6879 non-null   float64
 6   Volume     6879 non-null   float64
dtypes: float64(6), object(1)
memory usage: 376.9+ KB

In [5]: df.shape

Out[5]: (6890, 7)

In [6]: df.describe()

Out[6]:
              Open         High          Low        Close    Adj Close        Volume
count  6879.000000  6879.000000  6879.000000  6879.000000  6879.000000  6.879000e+03
mean    121.511181   122.859461   120.011619   121.426830    99.025659  4.402422e+07
std     106.444634   107.405065   105.373130   106.383687    94.769531  9.927486e+07
min       4.182222     4.182222     4.144444     4.182222     2.489788  0.000000e+00
25%      18.222221    18.541666    17.889999    18.169999    11.208004  8.776656e+06
50%      72.599998    74.416664    71.000000    72.583336    51.009666  1.371964e+07
75%     219.666672   222.266663   217.333328   219.933334   181.465515  2.581413e+07
max     432.799988   433.450012   429.350006   431.450012   431.450012  1.294168e+09
Transform Data
In [8]: columns_to_drop = ['Adj Close']
df.drop(columns=columns_to_drop, inplace=True)  # inplace=True modifies df directly instead of returning a new DataFrame

columns_order = ['Date', 'Open', 'High', 'Low', 'Volume', 'Close']
df = df[columns_order]
df.head(2)
Out[8]:
         Date  Open  High   Low     Volume  Close
0  01-01-1996  5.55  5.60  5.53   985500.0   5.58
1  02-01-1996  5.47  5.57  5.29  7470000.0   5.37

In [9]: df['Date'] = pd.to_datetime(df['Date'], format="%d-%m-%Y", errors='coerce')
df['Date'] = df['Date'].dt.strftime('%d-%m-%Y')
df.head(2)

Out[9]:
         Date  Open  High   Low     Volume  Close
0  01-01-1996  5.55  5.60  5.53   985500.0   5.58
1  02-01-1996  5.47  5.57  5.29  7470000.0   5.37
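Because the Date column is converted back to day-first strings, it is worth noting that such strings do not sort chronologically. The following sketch, using only the standard library, shows the difference; the helper name is an assumption, and the three dates are taken from rows shown elsewhere in this report.

```python
from datetime import datetime

DATE_FORMAT = "%d-%m-%Y"  # day-month-year, as in the Date column


def parse_date(text):
    """Parse a day-first date string into a datetime object."""
    return datetime.strptime(text, DATE_FORMAT)


raw_dates = ["17-06-2020", "02-01-1996", "01-06-2000"]

# Plain string sorting compares the day digits first.
as_strings = sorted(raw_dates)
# Sorting on the parsed value gives true chronological order.
chronological = sorted(raw_dates, key=parse_date)

print(as_strings)
print(chronological)
```

Keeping the column as a datetime type (rather than formatting it back to text) would let plots and sorts follow calendar order directly.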

Handling Missing and Duplicate Values

In [10]: df.isnull().sum()

Out[10]:
Date       0
Open      11
High      11
Low       11
Volume    11
Close     11
dtype: int64

In [11]: df['Open'] = df['Open'].fillna(df['Open'].mean()) # Replace NaN with the column mean

df['High'] = df['High'].fillna(df['High'].mean())
df['Low'] = df['Low'].fillna(df['Low'].mean())
df['Volume'] = df['Volume'].fillna(df['Volume'].mean())
df['Close'] = df['Close'].fillna(df['Close'].mean())
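What fillna with the column mean does can be sketched in plain Python as follows. This is an illustration under the assumption that missing entries are represented as None; the function name is hypothetical.

```python
def fill_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    column_mean = sum(observed) / len(observed)
    return [column_mean if v is None else v for v in values]


# Toy column with one gap (illustrative values).
filled = fill_with_mean([1.0, None, 3.0])
print(filled)
```

One design consideration: for a price series spanning many years, the overall column mean (about 121 for Close according to df.describe()) can be far from the prices on neighbouring days, so alternatives such as carrying the previous day's value forward may be worth comparing.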

In [13]: # Finding the duplicate values

print(df.duplicated().sum()) # gives the duplicate values in the dataset

Data Visualization
In [14]: df.set_index('Date', inplace=True)
# Plot the 'Open' prices with the Date as the x-axis
df['Open'].plot(figsize=(16, 6), title='Open Prices Over Time', legend=True)
plt.xlabel('Date') # Label the x-axis
plt.ylabel('Open Price') # Label the y-axis
plt.grid(True)
plt.show()
In [16]: # Plot the heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(df.corr()[['Close']].sort_values(by='Close', ascending=False),
            annot=True, cmap='coolwarm', fmt='.6f')
plt.title("Correlation of Features with Target Variable")
plt.show()

In [17]: plt.figure(figsize=(8, 6))
# Note: `colors` is not defined in any cell reproduced in this report;
# it is presumably set in a notebook cell that is not shown.
plt.scatter(df['Close'], df['Open'], c=colors, alpha=0.7, s=100)
plt.title("Correlation between Close and Open Prices")
plt.xlabel("Close Prices")
plt.ylabel("Open Prices")

In [18]: plt.figure(figsize=(8, 6))
plt.scatter(df['Close'], df['Low'], c=colors, alpha=0.7, s=100)
plt.title("Correlation between Close and Low Prices")
plt.xlabel("Close Prices")
plt.ylabel("Low Prices")

In [19]: plt.figure(figsize=(8, 6))
plt.scatter(df['Close'], df['High'], c=colors, alpha=0.7, s=100)
plt.title("Correlation between Close and High Prices")
plt.xlabel("Close Prices")
plt.ylabel("High Prices")

In [20]: plt.figure(figsize=(8, 6))
plt.scatter(df['Close'], df['Volume'], c=colors, alpha=0.7, s=100)
plt.title("Correlation between Close and Volume")
plt.xlabel("Close Prices")
plt.ylabel("Volume")
Splitting the data into training data and test data
In [21]: x = df.drop(columns='Close')
y = df['Close']
In [22]: x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)

In [23]: x_train.shape,x_test.shape,y_train.shape,y_test.shape

Out[23]: ((5512, 4), (1378, 4), (5512,), (1378,))
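The shapes above follow from an 80-20 split of 6,890 rows. A minimal sketch of a seeded random split is shown below; it uses Python's random module, so it does not reproduce the exact rows that scikit-learn's train_test_split selects with random_state=42, and scikit-learn's rounding rule may differ in edge cases. The function name is an assumption.

```python
import random


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle row indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for repeatability
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(6890)
print(len(train_idx), len(test_idx))
```

Note that a random split mixes dates from all years into both sets, which matches the notebook's approach; a split by date would instead hold out the most recent period.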

Data Standardization
In [24]: scaler = StandardScaler()
scaler.fit(x_train)

Out[24]: StandardScaler()

In [25]: x_train = scaler.transform(x_train)

x_test = scaler.transform(x_test)
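The fit-then-transform pattern above can be sketched for a single column in plain Python. The mean and standard deviation are learned from the training values only and then reused for the test values, which is why the scaler is fitted on x_train alone. Function names and the toy numbers are assumptions; unlike this sketch, StandardScaler does not raise on a constant column.

```python
from statistics import mean, pstdev


def fit_scaler(train_column):
    """Learn the mean and population standard deviation from training data."""
    mu = mean(train_column)
    sigma = pstdev(train_column)  # population std (ddof=0), as StandardScaler uses
    if sigma == 0:
        raise ValueError("constant column cannot be standardized in this sketch")
    return mu, sigma


def transform(column, mu, sigma):
    """Apply z = (x - mean) / std using previously learned statistics."""
    return [(v - mu) / sigma for v in column]


train_col = [10.0, 20.0, 30.0, 40.0, 50.0]  # toy training values
test_col = [60.0]                           # toy unseen value

mu, sigma = fit_scaler(train_col)
scaled_train = transform(train_col, mu, sigma)
scaled_test = transform(test_col, mu, sigma)
print(scaled_train, scaled_test)
```

The same learned statistics must also be applied to any new input passed to the trained models, as the prediction cells later in the notebook do with scaler.transform.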

Model Training using Linear Regression

In [28]: LRmodel = LinearRegression()
LRmodel.fit(x_train,y_train)
y_pred = LRmodel.predict(x_test)
print(y_pred)

[ 15.79830509 182.28063245  16.48930378 ... 114.17185558 272.4732109
  63.99756266]

In [29]: y_pred.shape

Out[29]: (1378,)

In [30]: dframe = pd.DataFrame({'Actual':y_test,'Predicted':y_pred})
dframe.head(5)
Out[30]:
            Actual   Predicted
Date
01-06-2000   15.46   15.798305
17-06-2020  181.20  182.280632
23-01-2002   16.50   16.489304
16-08-2006   60.30   60.552873
13-05-2022  258.60  259.193598

In [31]: graph = dframe.head(25)
graph.plot(kind='bar',figsize=(16,5))
plt.title("ITC STOCK : Actual Price vs Predicted Price", fontsize=15)
plt.xlabel("Date", fontsize=12)
plt.ylabel("Price", fontsize=12)
plt.show()

Model Evaluation for Linear Regression Model

In [32]: r2_score(y_test, y_pred)

Out[32]: 0.999884066238908

In [33]: print('Mean Absolute Error',metrics.mean_absolute_error(y_test,y_pred))
print('Mean Squared Error',metrics.mean_squared_error(y_test,y_pred))
import math
print ('Root Mean Squared Error',math.sqrt(metrics.mean_squared_error(y_test,y_pred)))
Mean Absolute Error 0.564774466912428
Mean Squared Error 1.3041262317038838
Root Mean Squared Error 1.1419834638487039

Predictive Model using Linear Regression

In [34]: input_data = (120,130,123,980000)
input_data_as_numpy_array = np.asarray(input_data)
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)
std_data = scaler.transform(input_data_reshaped)
prediction = LRmodel.predict(std_data)
print(prediction) # predicting the close price
[130.12984327]

Model Training using Support Vector Regressor

In [36]: svr_model = SVR(kernel='linear', C=50, epsilon=0.2)
svr_model.fit(x_train, y_train)

Out[36]: SVR(C=50, epsilon=0.2, kernel='linear')

In [37]: prediction = svr_model.predict(x_test)
print(prediction)
[ 15.78699691 182.18550435  16.50436631 ... 114.10188452 272.44207497
  64.00406944]

In [38]: dframeS = pd.DataFrame({'Actual':y_test,'Predicted':prediction})
dframeS.head(5)
Out[38]:
            Actual   Predicted
Date
01-06-2000   15.46   15.786997
17-06-2020  181.20  182.185504
23-01-2002   16.50   16.504366
16-08-2006   60.30   60.593271
13-05-2022  258.60  259.317527

In [39]: graphS = dframeS.head(25)
graphS.plot(kind='bar',figsize=(16,5))
plt.title("ITC STOCK : Actual Price vs Predicted Price", fontsize=15)
plt.xlabel("Date", fontsize=12)
plt.ylabel("Price", fontsize=12)
plt.show()

Model Evaluation for Support Vector Regressor Model

In [40]: r2_score(y_test, prediction)

Out[40]: 0.9998759043005752

In [41]: print('Mean Absolute Error',metrics.mean_absolute_error(y_test,prediction))
print('Mean Squared Error',metrics.mean_squared_error(y_test,prediction))
import math
print ('Root Mean Squared Error',math.sqrt(metrics.mean_squared_error(y_test,prediction)))
Mean Absolute Error 0.5604446370532994
Mean Squared Error 1.3959389856524873
Root Mean Squared Error 1.181498618557164

Predictive Model for Support Vector Regressor

In [42]: input_data = (120,130,123,980000)
input_data_as_numpy_array = np.asarray(input_data)
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)
std_data = scaler.transform(input_data_reshaped)
pred = svr_model.predict(std_data)
print(pred) # predicting the close price
[130.41050458]
Results
Correlation Between Features
The correlation values between features and the target variable "Close" are
as follows:
• Open: 0.999768
• High: 0.999901
• Low: 0.999885
• Volume: -0.306688
• Close: 1.000000
This indicates a very strong positive correlation between the "Close" price
and the "Open," "High," and "Low" prices, while "Volume" shows a weak
negative correlation.
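The values above are Pearson correlation coefficients, as computed by df.corr() by default. A plain-Python sketch of the formula is shown below; the function name and the toy columns are assumptions chosen so that the two extreme cases, +1 and -1, are easy to verify by hand.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy columns (illustrative, not ITC data).
close = [10.0, 11.0, 12.0, 13.0]
low = [9.5, 10.5, 11.5, 12.5]          # moves exactly with close
volume = [400.0, 300.0, 200.0, 100.0]  # moves exactly against close

print(pearson(close, low), pearson(close, volume))
```

A coefficient near +1, as reported for Open, High, and Low, means the feature is close to a linear function of the Close price.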
Linear Regression Results
• R² Score: 0.999884066238908
• Mean Absolute Error (MAE): 0.564774466912428
• Mean Squared Error (MSE): 1.3041262317038838
• Root Mean Squared Error (RMSE): 1.1419834638487039
Support Vector Regression (SVR) Results
• R² Score: 0.9998759043005752
• Mean Absolute Error (MAE): 0.5604446370532994
• Mean Squared Error (MSE): 1.3959389856524873
• Root Mean Squared Error (RMSE): 1.181498618557164
Comparative Analysis
Both models fit the test set closely, with the following
observations:
• Linear Regression:
o Slightly higher R² score than SVR (0.999884 versus 0.999876).
o Lower RMSE (1.142 versus 1.181), indicating fewer large errors.
• Support Vector Regression (SVR):
o Marginally lower MAE than Linear Regression (0.5604 versus 0.5648).
o Otherwise comparable performance and accurate predictions.
The differences between the two models are minor. This is expected: both
fit a linear function of the inputs, and the Open, High, and Low features
are almost perfectly correlated with the Close price of the same trading day.
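To put the size of these differences in perspective, the short sketch below recomputes relative changes from the metrics reported in the Results section and checks that each RMSE is the square root of its MSE. The dictionary layout and function names are assumptions; the numbers are copied from this report.

```python
import math

# Metrics copied from the Results section of this report.
reported = {
    "Linear Regression": {"mae": 0.564774466912428,
                          "mse": 1.3041262317038838,
                          "rmse": 1.1419834638487039},
    "SVR": {"mae": 0.5604446370532994,
            "mse": 1.3959389856524873,
            "rmse": 1.181498618557164},
}


def relative_change(baseline, other):
    """Fractional change of `other` relative to `baseline`."""
    return (other - baseline) / baseline


def rmse_is_consistent(metrics):
    """Check that the reported RMSE equals sqrt(MSE)."""
    return math.isclose(math.sqrt(metrics["mse"]), metrics["rmse"], rel_tol=1e-9)


lr, svr = reported["Linear Regression"], reported["SVR"]
rmse_change = relative_change(lr["rmse"], svr["rmse"])  # positive: SVR is higher
mae_change = relative_change(lr["mae"], svr["mae"])     # negative: SVR is lower
print(f"RMSE change: {rmse_change:.2%}, MAE change: {mae_change:.2%}")
```

Both changes are a few percent or less, which supports reading the two models as practically equivalent on this test set.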
Conclusion
This project successfully demonstrates the application of machine learning
models for stock price prediction. By leveraging historical data and
implementing Linear Regression and SVR, we achieved high prediction
accuracy and identified patterns in stock price movements. The evaluation
metrics underscore the effectiveness of these models in handling real-
world financial data, with Linear Regression performing slightly better in
this context.
However, stock market prediction remains a challenging domain due to
inherent volatility and external influences. Future work aims to address
these challenges by incorporating external data, enhancing model
capabilities, and deploying a scalable predictive system for practical use.
This project provides a solid foundation for further exploration and
development in financial forecasting.
In addition, this project highlights the importance of data
preprocessing and feature selection in improving model
performance. The integration of multiple machine learning
techniques, such as Linear Regression and SVR, allowed for a
comprehensive approach to stock price forecasting.

Future Work
While the models implemented in this project provide promising results,
there is substantial room for improvement and further exploration:
1. Incorporation of External Data: Integrating additional datasets
such as news sentiment, macroeconomic indicators, and
financial reports could enhance prediction accuracy by
considering external factors influencing stock prices.
2. Feature Engineering: Advanced techniques like feature selection
algorithms and principal component analysis (PCA) could improve
model efficiency by reducing dimensionality and noise.
3. Model Enhancement: Experimenting with more sophisticated
machine learning algorithms, such as Random Forests, Gradient
Boosting (e.g., XGBoost, LightGBM), LSTM and Deep Learning
models, to capture complex patterns in stock price data.
4. Visualization Dashboards: Creating interactive dashboards to
provide users with clear insights into model predictions, stock
trends, and analysis results.
5. Robust Evaluation: Conducting cross-validation and robustness
checks under varying market conditions to assess model reliability
in different scenarios.
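One possible form of the robustness check in item 5 is a walk-forward (expanding-window) split, in which each fold trains on earlier rows and tests on the rows that follow. The sketch below is an assumption about how such a check could be set up, not part of the implemented project; the function name and fold layout are hypothetical, and it presumes the rows are sorted by date.

```python
def walk_forward_splits(n_samples, n_folds=5):
    """Yield (train_indices, test_indices) with training always before testing."""
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # The last fold absorbs any leftover rows.
        test_end = n_samples if k == n_folds else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


folds = list(walk_forward_splits(12, n_folds=3))
for train, test in folds:
    print(train, test)
```

Evaluating the metrics on each fold separately would show whether accuracy holds across different market periods rather than only on a single random split.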
References
ITC Stock Data Set
• The dataset used for this project is publicly available and can be
accessed from various machine learning repositories such as UCI
Machine Learning Repository or Kaggle.
• https://www.kaggle.com/datasets/tejasurya/itc-stock-price-prediction
Scikit-learn Documentation
• Scikit-learn, the Python machine learning library used for model
implementation, provides comprehensive resources for algorithms
and tools for data preprocessing, model evaluation, and more.
• https://scikit-learn.org/
Pandas Documentation
• For data manipulation and analysis, the Pandas library was used.
• https://pandas.pydata.org/
NumPy Documentation
• NumPy is a core library for numerical computing in Python,
used for array manipulation and mathematical operations in
this project.
• https://numpy.org/
Support Vector Machines - A Practical Guide
• This guide provides a detailed explanation of Support Vector
Machines, the model used for regression in this project.
• https://www.svm-tutorial.com/
Matplotlib Documentation
• Matplotlib was used for data visualization in the project. The
official site contains extensive guides for plotting and
customizing graphs.
• https://matplotlib.org/
"Machine Learning Yearning" by Andrew Ng
• This book by Andrew Ng is a key reference for understanding
machine learning workflows and practical approaches for model
development.
• https://www.deeplearning.ai/machine-learning-yearning/

4 pages
CS3491 Artificial Intelligence and Machine Learning Two Mark Questions 1
No ratings yet
CS3491 Artificial Intelligence and Machine Learning Two Mark Questions 1
23 pages
Employability Prediction Model Using Academic Performance in Higher Education Through Deep Learning Techniques
No ratings yet
Employability Prediction Model Using Academic Performance in Higher Education Through Deep Learning Techniques
16 pages
Stock Market Prediction Using Machine Learning
No ratings yet
Stock Market Prediction Using Machine Learning
5 pages
ML Practical Format
No ratings yet
ML Practical Format
82 pages
2935 5578 1 PB
No ratings yet
2935 5578 1 PB
5 pages
ESE Lab File
No ratings yet
ESE Lab File
105 pages
21 SVR
No ratings yet
21 SVR
22 pages
CBDA Research Paper
No ratings yet
CBDA Research Paper
19 pages
UNIT 3 - Part - 2
No ratings yet
UNIT 3 - Part - 2
43 pages
Project Report 5
No ratings yet
Project Report 5
51 pages
Unit 4
No ratings yet
Unit 4
23 pages
A Detailed Analysis of New Intrusion Detection
No ratings yet
A Detailed Analysis of New Intrusion Detection
19 pages
Predicting Music Popularity Using Spotify and YouT
No ratings yet
Predicting Music Popularity Using Spotify and YouT
14 pages
ISYE 7406 Fall 2023 Syllabus
No ratings yet
ISYE 7406 Fall 2023 Syllabus
10 pages
Synopsis
No ratings yet
Synopsis
10 pages
3 Machine Learning Techniques For The Detection of Erotic Content
No ratings yet
3 Machine Learning Techniques For The Detection of Erotic Content
13 pages
Predictive Analytics in Healthcare: An Engineering Project in Community Service
No ratings yet
Predictive Analytics in Healthcare: An Engineering Project in Community Service
23 pages
Eskandari Et Al RS 2020
No ratings yet
Eskandari Et Al RS 2020
32 pages
1 s2.0 S258972172300017X Main
No ratings yet
1 s2.0 S258972172300017X Main
11 pages
Application of Machine Learning Techniques in Mineral Classification
No ratings yet
Application of Machine Learning Techniques in Mineral Classification
13 pages
Factors Affecting Students Performance I
No ratings yet
Factors Affecting Students Performance I
32 pages
Heart Disease Prediction
No ratings yet
Heart Disease Prediction
16 pages
Machine Learning-Based Prediction of Hospital Prolonged Length of Stay Admission at Emergency Department A Gradient Boosting Algor
No ratings yet
Machine Learning-Based Prediction of Hospital Prolonged Length of Stay Admission at Emergency Department A Gradient Boosting Algor
19 pages
BTP Sixth Sem Report
No ratings yet
BTP Sixth Sem Report
33 pages
BERT-based Models For Classifying Multi-Dialect Arabic Texts
No ratings yet
BERT-based Models For Classifying Multi-Dialect Arabic Texts
10 pages
Machine Learning Classification Model For Identifying Wildlife Species in East Africa
No ratings yet
Machine Learning Classification Model For Identifying Wildlife Species in East Africa
9 pages
Unit 2 Question Papers by Pushpa
No ratings yet
Unit 2 Question Papers by Pushpa
4 pages
Using Deep Learning To Detect Price Change Indications in Financial Markets
No ratings yet
Using Deep Learning To Detect Price Change Indications in Financial Markets
5 pages
Virtual Screening
No ratings yet
Virtual Screening
11 pages
Stock Movement Prediction Based On Technical Indicators Applying Hybrid Machine Learning Models
No ratings yet
Stock Movement Prediction Based On Technical Indicators Applying Hybrid Machine Learning Models
4 pages
Comparison Between SVM Other Classifiers For Ser IJERTV2IS1457
No ratings yet
Comparison Between SVM Other Classifiers For Ser IJERTV2IS1457
6 pages
AI in Quantitative Analysis
From Everand
AI in Quantitative Analysis
Anand Vemula
No ratings yet
