SPPUML1

Machine learning lab assignment no. 1

Uploaded by kanaseaditya800

Predict the price of an Uber ride from a given pickup point to the agreed drop-off location.

Perform the following tasks:

1. Pre-process the dataset.
2. Identify outliers.
3. Check the correlation.
4. Implement linear regression and random forest regression models.
5. Evaluate the models and compare their respective scores, such as R2 and RMSE.

Dataset link: https://www.kaggle.com/datasets/yasserh/uber-fares-dataset

In [ ]:
Name: Kanase Aditya Madhukar
Roll no: 2441059
Batch: D
Assignment no: 1

In [45]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

In [46]:
df = pd.read_csv('uber.csv')

In [47]:
df

Out[47]:
        Unnamed: 0  key                            fare_amount  pickup_datetime          pickup_longitude  ...
0       24238194    2015-05-07 19:52:06.0000003    7.5          2015-05-07 19:52:06 UTC  -73.999817        ...
1       27835199    2009-07-17 20:04:56.0000002    7.7          2009-07-17 20:04:56 UTC  -73.994355        ...
2       44984355    2009-08-24 21:45:00.00000061   12.9         2009-08-24 21:45:00 UTC  -74.005043        ...
3       25894730    2009-06-26 08:22:21.0000001    5.3          2009-06-26 08:22:21 UTC  -73.976124        ...
4       17610152    2014-08-28 17:47:00.000000188  16.0         2014-08-28 17:47:00 UTC  -73.925023        ...
...     ...         ...                            ...          ...                      ...               ...
199995  42598914    2012-10-28 10:49:00.00000053   3.0          2012-10-28 10:49:00 UTC  -73.987042        ...
199996  16382965    2014-03-14 01:09:00.0000008    7.5          2014-03-14 01:09:00 UTC  -73.984722        ...
199997  27804658    2009-06-29 00:42:00.00000078   30.9         2009-06-29 00:42:00 UTC  -73.986017        ...
199998  20259894    2015-05-20 14:56:25.0000004    14.5         2015-05-20 14:56:25 UTC  -73.997124        ...
199999  11951496    2010-05-15 04:08:00.00000076   14.1         2010-05-15 04:08:00 UTC  -73.984395        ...

200000 rows × 9 columns

In [48]:
df.head()

Out[48]:
        Unnamed: 0  key                            fare_amount  pickup_datetime          pickup_longitude  pickup_latitude  ...
0       24238194    2015-05-07 19:52:06.0000003    7.5          2015-05-07 19:52:06 UTC  -73.999817        40.73...         ...
1       27835199    2009-07-17 20:04:56.0000002    7.7          2009-07-17 20:04:56 UTC  -73.994355        40.72...         ...
2       44984355    2009-08-24 21:45:00.00000061   12.9         2009-08-24 21:45:00 UTC  -74.005043        40.74...         ...
3       25894730    2009-06-26 08:22:21.0000001    5.3          2009-06-26 08:22:21 UTC  -73.976124        40.79...         ...
4       17610152    2014-08-28 17:47:00.000000188  16.0         2014-08-28 17:47:00 UTC  -73.925023        40.74...         ...

In [49]:
df.shape

Out[49]: (200000, 9)
In [50]:
df.tail()

Out[50]:
        Unnamed: 0  key                            fare_amount  pickup_datetime          pickup_longitude  ...
199995  42598914    2012-10-28 10:49:00.00000053   3.0          2012-10-28 10:49:00 UTC  -73.987042        ...
199996  16382965    2014-03-14 01:09:00.0000008    7.5          2014-03-14 01:09:00 UTC  -73.984722        ...
199997  27804658    2009-06-29 00:42:00.00000078   30.9         2009-06-29 00:42:00 UTC  -73.986017        ...
199998  20259894    2015-05-20 14:56:25.0000004    14.5         2015-05-20 14:56:25 UTC  -73.997124        ...
199999  11951496    2010-05-15 04:08:00.00000076   14.1         2010-05-15 04:08:00 UTC  -73.984395        ...

In [51]:
df.describe()

Out[51]:
       Unnamed: 0    fare_amount    pickup_longitude  pickup_latitude  dropoff_longitude  ...
count  2.000000e+05  200000.000000  200000.000000     200000.000000    199999.000000      ...
mean   2.771250e+07  11.359955      -72.527638        39.935885        -72.525292         ...
std    1.601382e+07  9.901776       11.437787         7.720539         13.117408          ...
min    1.000000e+00  -52.000000     -1340.648410      -74.015515       -3356.666300       ...
25%    1.382535e+07  6.000000       -73.992065        40.734796        -73.991407         ...
50%    2.774550e+07  8.500000       -73.981823        40.752592        -73.980093         ...
75%    4.155530e+07  12.500000      -73.967154        40.767158        -73.963658         ...
max    5.542357e+07  499.000000     57.418457         1644.421482      1153.572603        ...
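The describe() output contains values that cannot belong to real rides: a minimum fare of -52, a pickup longitude of -1340.65, a pickup latitude of 1644.42 and a dropoff longitude of -3356.67. A minimal sketch of a plausibility filter is shown below. The bounding box around New York City (longitude -75 to -72, latitude 40 to 42), the 1 to 6 passenger range and the helper name `drop_implausible_rows` are assumptions for illustration, not values from the dataset documentation, and the sample rows are hand-made. Note that the 499 fare passes this filter; extreme but possible fares are left to the IQR step.

```python
import pandas as pd

# Assumed bounding box around New York City; illustrative values, not from the dataset documentation
LON_MIN, LON_MAX = -75.0, -72.0
LAT_MIN, LAT_MAX = 40.0, 42.0


def drop_implausible_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows with a positive fare, 1 to 6 passengers (assumed range) and both endpoints inside the box."""
    mask = (
        (df["fare_amount"] > 0)
        & df["passenger_count"].between(1, 6)
        & df["pickup_longitude"].between(LON_MIN, LON_MAX)
        & df["dropoff_longitude"].between(LON_MIN, LON_MAX)
        & df["pickup_latitude"].between(LAT_MIN, LAT_MAX)
        & df["dropoff_latitude"].between(LAT_MIN, LAT_MAX)
    )
    return df[mask].copy()


# Hand-made rows mimicking the problems visible in describe(): a negative fare and an impossible longitude
sample = pd.DataFrame({
    "fare_amount":       [7.5, -52.0, 12.9, 499.0],
    "pickup_longitude":  [-73.999817, -73.99, -1340.648410, -73.98],
    "pickup_latitude":   [40.73, 40.75, 40.74, 40.76],
    "dropoff_longitude": [-73.99, -73.98, -73.97, -73.96],
    "dropoff_latitude":  [40.72, 40.76, 40.75, 40.77],
    "passenger_count":   [1, 1, 2, 1],
})

clean = drop_implausible_rows(sample)
print(clean.index.tolist())  # rows 1 (negative fare) and 2 (bad longitude) are dropped
```

On the notebook's data this would be `df = drop_implausible_rows(df)` before the outlier and correlation steps.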

In [52]:
df.isna().sum()

Out[52]:
Unnamed: 0          0
key                 0
fare_amount         0
pickup_datetime     0
pickup_longitude    0
pickup_latitude     0
dropoff_longitude   1
dropoff_latitude    1
passenger_count     0
dtype: int64

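In [53]:
df.fillna(0, inplace=True)  # fills the one missing dropoff_longitude/dropoff_latitude pair with 0, i.e. a point at (0, 0) far outside New York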
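Filling with 0 keeps the single incomplete row but gives it a dropoff coordinate of (0, 0), a point in the Gulf of Guinea that then behaves like an extreme outlier. The sketch below, on a hypothetical three-row frame, contrasts `fillna(0)` with `dropna(subset=...)`, which discards the row instead; with one incomplete row out of 200,000, dropping it is arguably the less distorting choice.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with one missing dropoff pair, like the single row reported by df.isna().sum()
rides = pd.DataFrame({
    "fare_amount": [7.5, 7.7, 12.9],
    "dropoff_longitude": [-73.99, np.nan, -73.97],
    "dropoff_latitude": [40.72, np.nan, 40.75],
})

# What fillna(0) does: the row survives, but its dropoff moves to (0, 0)
filled = rides.fillna(0)

# Alternative: discard rows whose dropoff coordinates are missing
dropped = rides.dropna(subset=["dropoff_longitude", "dropoff_latitude"])

print(filled.loc[1, ["dropoff_longitude", "dropoff_latitude"]].tolist())  # [0.0, 0.0]
print(dropped.index.tolist())  # [0, 2]
```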
In [54]:
df["pickup_datetime"] = pd.to_datetime(df["pickup_datetime"])  # parse the "YYYY-MM-DD HH:MM:SS UTC" strings into timestamps

missing_values = df.isnull().sum()
print("Missing values in the dataset:")
print(missing_values)
df.dropna(inplace=True)  # no effect here: In [53] already filled the missing values with 0
missing_values = df.isnull().sum()
print("Missing values after handling:")
print(missing_values)
sns.boxplot(x=df["fare_amount"])  # box plot of the raw fares to inspect outliers
plt.show()

Missing values in the dataset:
Unnamed: 0          0
key                 0
fare_amount         0
pickup_datetime     0
pickup_longitude    0
pickup_latitude     0
dropoff_longitude   0
dropoff_latitude    0
passenger_count     0
dtype: int64
Missing values after handling:
Unnamed: 0          0
key                 0
fare_amount         0
pickup_datetime     0
pickup_longitude    0
pickup_latitude     0
dropoff_longitude   0
dropoff_latitude    0
passenger_count     0
dtype: int64
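Once `pickup_datetime` holds timestamps, numeric features such as hour of day, day of week, month and year can be derived for the regression models; the notebook does not do this. A minimal sketch follows, assuming the column keeps the `YYYY-MM-DD HH:MM:SS UTC` format seen in the output above. The function name `add_time_features` and the new column names are hypothetical.

```python
import pandas as pd


def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with hour, weekday, month and year columns derived from pickup_datetime."""
    ts = pd.to_datetime(df["pickup_datetime"])  # also fine if the column is already datetime
    out = df.copy()
    out["hour"] = ts.dt.hour
    out["weekday"] = ts.dt.dayofweek  # Monday = 0, Sunday = 6
    out["month"] = ts.dt.month
    out["year"] = ts.dt.year
    return out


# Two timestamps copied from the pickup_datetime column shown earlier
sample = pd.DataFrame({"pickup_datetime": ["2015-05-07 19:52:06 UTC", "2009-07-17 20:04:56 UTC"]})
feats = add_time_features(sample)
print(feats[["hour", "weekday", "month", "year"]])
```

Hour and weekday capture rush-hour and weekend effects, while year can reflect fare changes over the 2009 to 2015 span of the data.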
In [55]:
# Interquartile-range (IQR) rule: fares outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are treated as outliers
Q1 = df["fare_amount"].quantile(0.25)
Q3 = df["fare_amount"].quantile(0.75)
IQR = Q3 - Q1
threshold = 1.5
lower_bound = Q1 - threshold * IQR
upper_bound = Q3 + threshold * IQR
# data_no_outliers is only plotted here; the later cells keep working on df
data_no_outliers = df[(df["fare_amount"] >= lower_bound) & (df["fare_am
sns.boxplot(x=data_no_outliers["fare_amount"])
plt.show()
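A self-contained version of the interquartile-range rule used in In [55], keeping values within [Q1 - 1.5·IQR, Q3 + 1.5·IQR], could look like the sketch below. The helper names `iqr_bounds` and `remove_iqr_outliers` are hypothetical, the sample fares are made up, and `Series.between` includes both bounds by default.

```python
import pandas as pd


def iqr_bounds(s: pd.Series, k: float = 1.5):
    """Return (lower, upper) fences of the IQR rule for a numeric Series."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep only rows whose value in `column` lies inside the IQR fences (inclusive)."""
    lower, upper = iqr_bounds(df[column], k)
    return df[df[column].between(lower, upper)]


# Made-up fares with one extreme value, similar to the 499.0 maximum in describe()
fares = pd.DataFrame({"fare_amount": [5.0, 6.0, 7.5, 8.5, 10.0, 12.5, 499.0]})
lo, hi = iqr_bounds(fares["fare_amount"])
kept = remove_iqr_outliers(fares, "fare_amount")
print(lo, hi)     # 0.0 18.0
print(len(kept))  # 6
```

For the models to benefit, the filtered frame would have to replace `df` (for example `df = remove_iqr_outliers(df, "fare_amount")`) before `X` and `y` are built.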
In [56]:
df.plot(kind="box", subplots=True, layout=(7, 2), figsize=(15, 20))

Out[56]:
Unnamed: 0          AxesSubplot(0.125,0.787927;0.352273x0.0920732)
fare_amount         AxesSubplot(0.547727,0.787927;0.352273x0.0920732)
pickup_longitude    AxesSubplot(0.125,0.677439;0.352273x0.0920732)
pickup_latitude     AxesSubplot(0.547727,0.677439;0.352273x0.0920732)
dropoff_longitude   AxesSubplot(0.125,0.566951;0.352273x0.0920732)
dropoff_latitude    AxesSubplot(0.547727,0.566951;0.352273x0.0920732)
passenger_count     AxesSubplot(0.125,0.456463;0.352273x0.0920732)
dtype: object
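In [57]:
plt.figure(figsize=(8, 8))  # create the figure before drawing; calling it after the heatmap only opens an empty figure
correlation_matrix = df.corr(numeric_only=True)  # skip the string 'key' and datetime columns (required in pandas >= 2.0)
sns.heatmap(correlation_matrix, annot=True, fmt='.1f')
plt.show()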

<Figure size 576x576 with 0 Axes>
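The fare depends mainly on how far the ride goes, which the four raw coordinate columns express only indirectly. One common way to expose it, not used in the notebook, is to add the great-circle (haversine) distance between pickup and dropoff as a feature before checking the correlation. The sketch assumes a mean Earth radius of 6371 km; the function name `haversine_km` is hypothetical, and its arguments follow the dataset's longitude-then-latitude column order.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between (lon1, lat1) and (lon2, lat2), given in degrees.

    Works element-wise on scalars, NumPy arrays or pandas Series.
    """
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# One degree of latitude along a meridian is roughly 111.19 km
print(round(float(haversine_km(0.0, 0.0, 0.0, 1.0)), 2))
```

With the notebook's columns this would be `df["distance_km"] = haversine_km(df["pickup_longitude"], df["pickup_latitude"], df["dropoff_longitude"], df["dropoff_latitude"])`, after which `df[["distance_km", "fare_amount"]].corr()` shows how strongly trip length tracks the fare. Rows with impossible coordinates should be removed first, or they produce huge distances.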

In [58]:
X = df[['pickup_longitude', 'pickup_latitude', 'dropoff_longitude', 'dr
y = df['fare_amount']
y

Out[58]:
0         7.5
1         7.7
2        12.9
3         5.3
4        16.0
         ...
199995    3.0
199996    7.5
199997   30.9
199998   14.5
199999   14.1
Name: fare_amount, Length: 200000, dtype: float64

In [59]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2

In [60]:
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

Out[60]: LinearRegression()
In [61]:
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

Out[61]: RandomForestRegressor(random_state=42)

In [62]:
y_pred_lr = lr_model.predict(X_test)
print("Linear Model:", y_pred_lr)
y_pred_rf = rf_model.predict(X_test)
print("Random Forest Model:", y_pred_rf)

Linear Model: [11.29485003 12.16430284 11.55413146 ... 11.35862539 11.29405912 11.29346213]
Random Forest Model: [ 6.939 11.33572265 7.485 ... 4.773 6.357 7.639 ]
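Task 5 asks for the models to be evaluated and compared with R2, RMSE and similar scores, but the notebook stops after printing predictions. A minimal sketch using `sklearn.metrics` follows; the helper name `regression_report` is hypothetical, the toy arrays are made up, and RMSE is taken as the square root of `mean_squared_error` rather than through its `squared` argument, which recent scikit-learn releases have deprecated.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred) -> dict:
    """Return R2, RMSE and MAE for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "R2": r2_score(y_true, y_pred),
        "RMSE": float(np.sqrt(mse)),
        "MAE": mean_absolute_error(y_true, y_pred),
    }


# Toy values: errors of -1, 0, +1, 0 give MSE 0.5, MAE 0.5 and R2 = 1 - 2/20 = 0.9
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])
report = regression_report(y_true, y_pred)
print(report)
```

In the notebook, `regression_report(y_test, y_pred_lr)` and `regression_report(y_test, y_pred_rf)` would give the two sets of scores to compare; a higher R2 and lower RMSE indicate the better model on the held-out 20%.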
