ML A 6 Project

The document is a project report for a Machine Learning with Python course, detailing the analysis of a dataset related to travel itineraries. It includes sections on data import, visualization, correlation analysis, and the application of various machine learning models such as linear regression and K-means clustering. The report acknowledges the support of faculty and peers and is submitted for the Master's degree at Vivekananda Global University.

Uploaded by

gauravsharma895502

Dataset Itineraries

A6 PROJECT REPORT
of
MACHINE LEARNING WITH PYTHON
PGCSA113
Submitted in partial fulfilment of the requirement for the award of Degree of

MASTER OF COMPUTER APPLICATIONS

Submitted to:
Mr. Katib Showkat Zarger
Computer Science & Application

Submitted by:
Hemant Maurya
Enroll No. 24CSA3BC006

Department of Computer Science & Application

Vivekananda Global University, Jaipur
Year- 2025

ACKNOWLEDGEMENT
I take this opportunity to express my gratitude and humble regards to
Vivekananda Global University for giving me the opportunity to present a project
titled "On my Dataset Itineraries", which is a "Machine Learning with Python"
based project.
I am also thankful to my project guide, Mr. Katib Showkat Zarger, for helping me
complete the project and its documentation. I have put effort into this project,
but its success would not have been possible without their support and
encouragement.
I would like to thank our Dean, Dr. R.C. Tripathi, for providing all the
necessary books and other materials as and when required. I am grateful to the
authors whose books guided me in completing the project. I am also thankful to
my classmates and friends, who encouraged me throughout the project.

Thanks

Hemant Maurya
Enrollment No.: 24CSA3BC006

Place: Jaipur
Date: 01-05-2025

DECLARATION
We hereby declare that this Project Report titled "On my Dataset Itineraries",
submitted by us and approved by our project guide to the Vivekananda Global
University, Jaipur, is a bonafide work undertaken by us, and that it has not been
submitted to any other University or Institution for the award of any degree,
diploma, or certificate, or published at any time before.

Student Name: Hemant Maurya(24CSA3BC006)

ERP: - 2430815

Project Guide: Mr. Katib Showkat Zarger

1. Import Dataset
# Mount Google Drive so the CSV stored there is readable from Colab.
from google.colab import drive
drive.mount('/content/drive')

import numpy as np
import pandas as pd
import plotly.express as px

# Load the itineraries dataset and preview the first five rows.
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/itineraries.csv")
df.head(5)

2. Average Total Fare by Starting Airport (seaborn)
import seaborn as sns
import matplotlib.pyplot as plt

# Mean fare per departure airport.
airport_fare = df.groupby('startingAirport')['totalFare'].mean().reset_index()
sns.set(style="whitegrid")
plt.figure(figsize=(7, 4))
sns.barplot(x='startingAirport', y='totalFare', data=airport_fare, palette="Blues_d")
plt.title('Average Total Fare by Starting Airport', fontsize=14)
plt.xlabel('Starting Airport', fontsize=12)
plt.ylabel('Average Total Fare ($)', fontsize=12)
plt.xticks(rotation=45)
plt.show()

3. Average Total Fare by Starting Airport (Bokeh)
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource

output_notebook()

# Mean fare per departure airport, wrapped in a Bokeh data source.
airport_fare = df.groupby('startingAirport')['totalFare'].mean().reset_index()
source = ColumnDataSource(airport_fare)

# A categorical x-axis takes its factors as a list of strings.
p = figure(x_range=list(airport_fare['startingAirport']), height=600, width=800,
           title="Average Total Fare by Starting Airport", toolbar_location=None,
           tools="")
p.vbar(x='startingAirport', top='totalFare', width=0.9, source=source,
       color="skyblue", line_color="black")
p.xaxis.axis_label = 'Starting Airport'
p.yaxis.axis_label = 'Average Total Fare ($)'
p.xaxis.major_label_orientation = "vertical"
p.grid.grid_line_color = None
show(p)

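4. Drop Columns and Compute the Correlation Matrix
df1 = pd.DataFrame(df)
# Drop the identifier and per-segment columns that are not used further.
new_df1 = df1.drop(['legId', 'segmentsArrivalTimeEpochSeconds',
                    'segmentsDistance'], axis=1)
print(new_df1)

# Correlation matrix over the numeric columns only.
corr2 = df.select_dtypes(include='number')
correlation_matrix = corr2.corr()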

5. Box Plot of Features
plt.figure(figsize=(5, 4))
sns.boxplot(data=df[["totalFare", "travelDuration"]])
plt.title("Box Plot of Features")
plt.xlabel("Features")
plt.ylabel("Value")
plt.show()

6. Line Graph: Mean Values across Samples
plt.figure(figsize=(5, 4))
plt.plot(df.index, [df['totalFare'].mean()] * len(df), marker="o", linestyle="-",
         color="blue", alpha=0.6)
plt.title("Line Graph: Mean values across Samples")
plt.xlabel("Sample Index")
plt.ylabel("Mean")
plt.show()

7. Scatter Plot: Mean vs Variance
# The mean and variance are single values broadcast to every row,
# so every point in this plot lands on the same coordinates.
df['Mean'] = df['totalFare'].mean()
df['Variance'] = df['totalFare'].var()
df['Target'] = 15
plt.figure(figsize=(8, 5))
sns.scatterplot(x=df["Mean"], y=df["Variance"], hue=df["Target"],
                palette="coolwarm")
plt.title("Scatter Plot: Mean vs Variance")
plt.xlabel("Mean")
plt.ylabel("Variance")
plt.legend(title="Target")
plt.show()

8. Visualization of the dataset using different libraries.

correlation_matrix = df.select_dtypes(include=['number']).corr()

# Histogram of 'totalFare'
plt.figure(figsize=(8,6))
plt.hist(df['totalFare'], bins=20, color='skyblue', edgecolor='black')
plt.xlabel('Total Fare')
plt.ylabel('Frequency')
plt.title('Distribution of Total Fare')
plt.show()

# Scatter plot of 'travelDuration' vs. 'totalFare'
plt.figure(figsize=(8,6))
plt.scatter(df['travelDuration'], df['totalFare'], alpha=0.5)
plt.xlabel('Travel Duration')
plt.ylabel('Total Fare')
plt.title('Travel Duration vs. Total Fare')
plt.show()

# Box plot of 'totalFare' for each starting airport
plt.figure(figsize=(10,6))
sns.boxplot(x='startingAirport', y='totalFare', data=df)
plt.xticks(rotation=45, ha='right')
plt.ylabel('Total Fare')
plt.xlabel('Starting Airport')
plt.title('Total Fare Distribution by Starting Airport')
plt.show()

# Pairplot of numerical features
sns.pairplot(df.select_dtypes(include=['number']))
plt.show()

# Heatmap of the correlation matrix
plt.figure(figsize=(10,8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix Heatmap')
plt.show()

9. Calculating Correlation between different variables in a separate data frame.

numerical_df = df.select_dtypes(include=np.number)
correlation_matrix = numerical_df.corr()
print(correlation_matrix)
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Matrix Heatmap')
plt.show()
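
The coefficient that `DataFrame.corr()` reports by default is Pearson's r. As a rough illustration of what each cell of the heatmap means, the sketch below recomputes r by hand and compares it with pandas. The numbers are made up for the example; only the column names are taken from the report.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values; only the column names come from the report.
toy = pd.DataFrame({
    "totalFare": [120.0, 250.0, 310.0, 180.0, 400.0],
    "travelDuration": [5400.0, 9000.0, 12600.0, 7200.0, 15000.0],
})

def pearson(x: pd.Series, y: pd.Series) -> float:
    """Pearson r: covariance of x and y divided by the product of their spreads."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

r_manual = pearson(toy["totalFare"], toy["travelDuration"])
r_pandas = toy.corr().loc["totalFare", "travelDuration"]
print(f"manual r = {r_manual:.4f}, pandas r = {r_pandas:.4f}")
```

Both values should agree to floating-point precision, and r always lies between -1 and 1.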

10. Train and test a Linear Regression Model on your Dataset.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Convert duration strings to seconds once; skip if the column is already numeric.
if not pd.api.types.is_numeric_dtype(df['travelDuration']):
    df['travelDuration'] = pd.to_timedelta(df['travelDuration']).dt.total_seconds()

X = df[['travelDuration']]  # single feature: travel duration in seconds
y = df['totalFare']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate on the held-out 20%.
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")

plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.xlabel('Travel Duration (seconds)')
plt.ylabel('Total Fare')
plt.title('Linear Regression Model')
plt.legend()
plt.show()
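
To make the two printed metrics less opaque, here is a minimal sketch that recomputes MSE and R-squared from their definitions and checks them against scikit-learn. The data is synthetic (a stand-in for duration and fare, not the real dataset), and the names `X_demo`, `y_demo`, and `demo_model` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
# Synthetic stand-in for (travelDuration in seconds, totalFare); not the real data.
X_demo = rng.uniform(3600, 36000, size=(200, 1))
y_demo = 50 + 0.01 * X_demo[:, 0] + rng.normal(0, 40, size=200)

demo_model = LinearRegression().fit(X_demo, y_demo)
pred = demo_model.predict(X_demo)

# MSE: the mean of the squared residuals.
mse_manual = np.mean((y_demo - pred) ** 2)

# R-squared: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y_demo - pred) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(np.isclose(mse_manual, mean_squared_error(y_demo, pred)))
print(np.isclose(r2_manual, r2_score(y_demo, pred)))
```

An R-squared near 0 means the single feature explains little of the fare's variance, which is worth checking before reading much into the fitted line.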

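11. Polynomial Regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data with Gaussian noise (not taken from the dataset).
X = np.random.rand(100, 1) * 10
y = 2.5 * X**2 - 1.5 * X + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Expand X into [1, x, x^2], then fit an ordinary linear model on the expansion.
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
model = LinearRegression()
model.fit(X_train_poly, y_train)
y_pred = model.predict(X_test_poly)

plt.scatter(X_test, y_test, color='blue')  # actual values
plt.scatter(X_test, y_pred, color='red')   # predicted values
plt.title('Polynomial Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.show()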
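
For reference, this small sketch shows what `PolynomialFeatures(degree=2)` produces for a single input column: a bias column, the original value, and its square. The input values are arbitrary.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x_small = np.array([[2.0], [3.0]])
expanded = PolynomialFeatures(degree=2).fit_transform(x_small)
print(expanded)
# [[1. 2. 4.]
#  [1. 3. 9.]]
```

The linear model then learns one weight per column, which is why "polynomial regression" is still linear in its parameters.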

12. Logistic Regression: Travel Duration vs Total Fare
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Convert duration strings to seconds once; skip if the column is already numeric.
if not pd.api.types.is_numeric_dtype(df['travelDuration']):
    df['travelDuration'] = pd.to_timedelta(df['travelDuration']).dt.total_seconds()

X = df[['travelDuration']]
# Binary target: 1 if the fare is above the median, otherwise 0.
threshold = df['totalFare'].median()
y = (df['totalFare'] > threshold).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Fill missing durations with the training-set mean.
imputer = SimpleImputer(strategy='mean')
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)

model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

plt.figure(figsize=(6, 4))
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.scatter(X_test, y_pred, color='red', label='Predicted')
plt.xlabel('Travel Duration (seconds)')
plt.ylabel('Total Fare (Binary)')
plt.title('Logistic Regression: Travel Duration vs Total Fare')
plt.legend()
plt.show()
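
As a sketch of what the fitted classifier computes, the example below rebuilds the predicted probability from the learned weight and intercept using the sigmoid function and compares it with `predict_proba`. The data is synthetic and expressed in hours to keep the numbers small; the names `hours`, `label`, and `clf` are illustrative, not from the report.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Synthetic durations in hours and a noisy "expensive fare" label (made up).
hours = rng.uniform(1, 12, size=(300, 1))
label = (hours[:, 0] + rng.normal(0, 2, size=300) > 6).astype(int)

clf = LogisticRegression().fit(hours, label)

# P(y = 1 | x) = sigmoid(w * x + b)
z = hours[:, 0] * clf.coef_[0, 0] + clf.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))
p_sklearn = clf.predict_proba(hours)[:, 1]

# predict() assigns class 1 where that probability is above 0.5.
pred_manual = (p_manual > 0.5).astype(int)
print(np.allclose(p_manual, p_sklearn))
```

With a single feature, the decision boundary is just one duration value: the point where `w * x + b` crosses zero.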

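13. Linear Regression: Travel Duration vs Total Fare
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Convert duration strings to seconds once; skip if the column is already numeric.
if not pd.api.types.is_numeric_dtype(df['travelDuration']):
    df['travelDuration'] = pd.to_timedelta(df['travelDuration']).dt.total_seconds()

X = df[['travelDuration']]
y = df['totalFare']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")

plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', label='Predicted')
plt.xlabel('Travel Duration (seconds)')  # total_seconds() yields seconds, not minutes
plt.ylabel('Total Fare')
plt.title('Linear Regression: Travel Duration vs Total Fare')
plt.legend()
plt.show()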
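
Several steps in this report call `pd.to_timedelta(...).dt.total_seconds()` on `travelDuration`. The sketch below illustrates why that call should run only once per column: applied to values that are already numeric, pandas reads the floats as nanoseconds and the durations shrink to almost zero. The duration strings here are hypothetical examples in full ISO 8601 form; the exact format of the real column is not shown in the report.

```python
import pandas as pd

# Hypothetical duration strings; the real column's format is an assumption.
raw = pd.Series(["P0DT2H29M0S", "P0DT5H0M0S"])

# First conversion: strings -> seconds.
once = pd.to_timedelta(raw).dt.total_seconds()

# Second conversion: the floats are interpreted as nanoseconds.
twice = pd.to_timedelta(once).dt.total_seconds()

print(once.tolist())
print(twice.tolist())
```

Checking the column with `pd.api.types.is_numeric_dtype` before converting is one simple way to make a notebook cell safe to re-run.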

14. Gaussian Distribution in Machine Learning
# Draw synthetic samples from a normal distribution that shares
# the mean and standard deviation of totalFare.
mu = df['totalFare'].mean()
sigma = df['totalFare'].std()
df['Gaussian_TotalFare'] = np.random.normal(mu, sigma, len(df))

plt.figure(figsize=(5, 4))
bgcolor = 'pink'
plt.gca().set_facecolor(bgcolor)  # axes background colour
plt.hist(df['Gaussian_TotalFare'], bins=30, density=True, alpha=0.6,
         color='blue', label='Gaussian Distribution of totalFare')

# Overlay the theoretical density over the histogram's x-range.
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu)**2 / (2 * sigma**2))
plt.plot(x, p, 'k', linewidth=2, label='Theoretical Gaussian Distribution')

plt.xlabel('Total Fare')
plt.ylabel('Probability Density')
plt.title('Gaussian Distribution of Total Fare')
plt.legend()
plt.show()
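
The curve stored in `p` is the normal probability density, with `mu` and `sigma` set to the sample mean and standard deviation of `totalFare`:

```latex
p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

Because the histogram is built from `np.random.normal` draws rather than from the fares themselves, it follows this curve by construction. It does not show whether the real fares are normally distributed.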

15. Precision, Recall, and F1 Score

from sklearn.metrics import precision_score, recall_score, f1_score

threshold = df['totalFare'].median()
y_test_binary = (y_test > threshold).astype(int)
y_pred_binary = (y_pred > threshold).astype(int)
precision = precision_score(y_test_binary, y_pred_binary)
recall = recall_score(y_test_binary, y_pred_binary)
f1 = f1_score(y_test_binary, y_pred_binary)
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1-score: {f1}")
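
The three scores can be derived directly from counts of true positives, false positives, and false negatives. The following sketch does that on a small hand-made label vector (not from the dataset) and checks the result against scikit-learn.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Hand-made labels for illustration only.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_hat = np.array([1, 0, 0, 1, 1, 0, 1, 0])

tp = int(np.sum((y_true == 1) & (y_hat == 1)))  # predicted 1, actually 1
fp = int(np.sum((y_true == 0) & (y_hat == 1)))  # predicted 1, actually 0
fn = int(np.sum((y_true == 1) & (y_hat == 0)))  # predicted 0, actually 1

precision_manual = tp / (tp + fp)
recall_manual = tp / (tp + fn)
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)

print(precision_manual, recall_manual, f1_manual)
# 0.75 0.75 0.75
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so all three scores come out to 0.75.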

16. K Means Clustering Practical

from sklearn.cluster import KMeans

# Work on a copy so the conversion does not write into a slice of df.
X = df[['totalFare', 'travelDuration']].copy()
# Convert duration strings to seconds; skip if the column is already numeric.
if not pd.api.types.is_numeric_dtype(X['travelDuration']):
    X['travelDuration'] = pd.to_timedelta(X['travelDuration']).dt.total_seconds()

kmeans = KMeans(n_clusters=5, random_state=0)
kmeans.fit(X)
labels = kmeans.labels_
df['cluster'] = labels
print(df)
fig = px.scatter(df, x='totalFare', y='travelDuration', color='cluster',
                 title='K-Means Clustering')
fig.show()

17. K Means Clustering Practical
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Convert duration strings to seconds once; skip if the column is already numeric.
if not pd.api.types.is_numeric_dtype(df['travelDuration']):
    df['travelDuration'] = pd.to_timedelta(df['travelDuration']).dt.total_seconds()

X = df[['travelDuration', 'totalFare']]
# Standardise both features so neither dominates the distance metric.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
kmeans = KMeans(n_clusters=15, random_state=42)
df['cluster'] = kmeans.fit_predict(X_scaled)

# Centroids are in scaled space; map them back to the original units for plotting.
centroids = scaler.inverse_transform(kmeans.cluster_centers_)

plt.figure(figsize=(5, 3))
plt.scatter(df['travelDuration'], df['totalFare'], c=df['cluster'], cmap='viridis')
plt.scatter(centroids[:, 0], centroids[:, 1], s=200, c='red',
            label='Centroids', marker='*')
plt.xlabel('Travel Duration')
plt.ylabel('Total Fare')
plt.title('KMeans Clustering of Travel Data')
plt.legend()
plt.show()
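
The report uses 5 and 15 clusters without stating how those values were chosen. One common heuristic, sketched here on synthetic data as an assumption rather than the author's method, is the elbow method: fit K-Means for a range of k and look for the point where the inertia (within-cluster sum of squares) stops dropping sharply.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Three synthetic blobs standing in for (travelDuration, totalFare); made-up values.
centers = np.array([[7200.0, 150.0], [18000.0, 300.0], [36000.0, 600.0]])
points = np.vstack([c + rng.normal(0, [900.0, 25.0], size=(100, 2)) for c in centers])

scaled = StandardScaler().fit_transform(points)

# Inertia for k = 1..7; the "elbow" is where the curve flattens.
inertias = {}
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(scaled)
    inertias[k] = km.inertia_

for k, value in inertias.items():
    print(f"k={k}: inertia={value:.2f}")
```

On this toy data the inertia falls steeply up to k=3 and only marginally afterwards, which matches the three blobs it was generated from. Real fare data will rarely show such a clean elbow.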
