
Predict The Price of The Uber Ride From A Given Pickup Point To The Agreed Drop-Off Location

The document outlines an assignment to predict Uber ride prices using a dataset. It includes tasks such as data pre-processing, outlier identification, correlation checking, and the implementation of linear and random forest regression models. The assignment also emphasizes evaluating and comparing the performance of these models using metrics like R2 and RMSE.

Uploaded by

jshruti6896


07/08/2024, 15:10 Assignment 1_ML - Jupyter Notebook

Name : Jadhav Shruti

Roll No : 2441027

Batch : C

Predict the price of the Uber ride from a given pickup point to the agreed drop-off location.
Perform the following tasks:

Pre-process the dataset. Identify outliers. Check the correlation. Implement linear regression and random forest regression models. Evaluate the models and compare their respective scores, such as R2 and RMSE.

In [1]: import pandas as pd

import numpy as np

In [2]: df=pd.read_csv("Downloads/uber.csv")
df

Out[2]:
        Unnamed: 0                            key  fare_amount          pickup_datetime  pickup_longitude  pickup_latitude  dropoff_longitu...
0         24238194    2015-05-07 19:52:06.0000003          7.5  2015-05-07 19:52:06 UTC        -73.999817        40.738354         -73.9995...
1         27835199    2009-07-17 20:04:56.0000002          7.7  2009-07-17 20:04:56 UTC        -73.994355        40.728225         -73.9947...
2         44984355   2009-08-24 21:45:00.00000061         12.9  2009-08-24 21:45:00 UTC        -74.005043        40.740770         -73.9625...
3         25894730    2009-06-26 08:22:21.0000001          5.3  2009-06-26 08:22:21 UTC        -73.976124        40.790844         -73.9653...
4         17610152  2014-08-28 17:47:00.000000188         16.0  2014-08-28 17:47:00 UTC        -73.925023        40.744085         -73.9730...
...            ...                            ...          ...                      ...               ...              ...                 ...
199995    42598914   2012-10-28 10:49:00.00000053          3.0  2012-10-28 10:49:00 UTC        -73.987042        40.739367         -73.9865...
199996    16382965    2014-03-14 01:09:00.0000008          7.5  2014-03-14 01:09:00 UTC        -73.984722        40.736837         -74.0066...
199997    27804658   2009-06-29 00:42:00.00000078         30.9  2009-06-29 00:42:00 UTC        -73.986017        40.756487         -73.8589...
199998    20259894    2015-05-20 14:56:25.0000004         14.5  2015-05-20 14:56:25 UTC        -73.997124        40.725452         -73.9832...
199999    11951496   2010-05-15 04:08:00.00000076         14.1  2010-05-15 04:08:00 UTC        -73.984395        40.720077         -73.9855...

200000 rows × 9 columns

In [3]: df.shape

Out[3]: (200000, 9)

localhost:8888/notebooks/Assignment 1_ML.ipynb 1/9


In [4]: df.dtypes

Out[4]:
Unnamed: 0             int64
key                   object
fare_amount          float64
pickup_datetime       object
pickup_longitude     float64
pickup_latitude      float64
dropoff_longitude    float64
dropoff_latitude     float64
passenger_count        int64
dtype: object

In [5]: df.head()

Out[5]:
   Unnamed: 0                            key  fare_amount          pickup_datetime  pickup_longitude  pickup_latitude  dropoff_longitude  dr...
0    24238194    2015-05-07 19:52:06.0000003          7.5  2015-05-07 19:52:06 UTC        -73.999817        40.738354         -73.999512    ...
1    27835199    2009-07-17 20:04:56.0000002          7.7  2009-07-17 20:04:56 UTC        -73.994355        40.728225         -73.994710    ...
2    44984355   2009-08-24 21:45:00.00000061         12.9  2009-08-24 21:45:00 UTC        -74.005043        40.740770         -73.962565    ...
3    25894730    2009-06-26 08:22:21.0000001          5.3  2009-06-26 08:22:21 UTC        -73.976124        40.790844         -73.965316    ...
4    17610152  2014-08-28 17:47:00.000000188         16.0  2014-08-28 17:47:00 UTC        -73.925023        40.744085         -73.973082    ...

In [6]: df.tail()

Out[6]:
        Unnamed: 0                            key  fare_amount          pickup_datetime  pickup_longitude  pickup_latitude  dropoff_longitud...
199995    42598914   2012-10-28 10:49:00.00000053          3.0  2012-10-28 10:49:00 UTC        -73.987042        40.739367         -73.98652...
199996    16382965    2014-03-14 01:09:00.0000008          7.5  2014-03-14 01:09:00 UTC        -73.984722        40.736837         -74.00667...
199997    27804658   2009-06-29 00:42:00.00000078         30.9  2009-06-29 00:42:00 UTC        -73.986017        40.756487         -73.85895...
199998    20259894    2015-05-20 14:56:25.0000004         14.5  2015-05-20 14:56:25 UTC        -73.997124        40.725452         -73.98321...
199999    11951496   2010-05-15 04:08:00.00000076         14.1  2010-05-15 04:08:00 UTC        -73.984395        40.720077         -73.98550...

In [7]: df=df.drop("Unnamed: 0",axis=1)


In [8]: df

Out[8]:
                                  key  fare_amount          pickup_datetime  pickup_longitude  pickup_latitude  dropoff_longitude  dropoff_...
0         2015-05-07 19:52:06.0000003          7.5  2015-05-07 19:52:06 UTC        -73.999817        40.738354         -73.999512        40...
1         2009-07-17 20:04:56.0000002          7.7  2009-07-17 20:04:56 UTC        -73.994355        40.728225         -73.994710        40...
2        2009-08-24 21:45:00.00000061         12.9  2009-08-24 21:45:00 UTC        -74.005043        40.740770         -73.962565        40...
3         2009-06-26 08:22:21.0000001          5.3  2009-06-26 08:22:21 UTC        -73.976124        40.790844         -73.965316        40...
4       2014-08-28 17:47:00.000000188         16.0  2014-08-28 17:47:00 UTC        -73.925023        40.744085         -73.973082        40...
...                               ...          ...                      ...               ...              ...                ...          ...
199995   2012-10-28 10:49:00.00000053          3.0  2012-10-28 10:49:00 UTC        -73.987042        40.739367         -73.986525        40...
199996    2014-03-14 01:09:00.0000008          7.5  2014-03-14 01:09:00 UTC        -73.984722        40.736837         -74.006672        40...
199997   2009-06-29 00:42:00.00000078         30.9  2009-06-29 00:42:00 UTC        -73.986017        40.756487         -73.858957        40...
199998    2015-05-20 14:56:25.0000004         14.5  2015-05-20 14:56:25 UTC        -73.997124        40.725452         -73.983215        40...
199999   2010-05-15 04:08:00.00000076         14.1  2010-05-15 04:08:00 UTC        -73.984395        40.720077         -73.985508        40...

200000 rows × 8 columns

In [9]: df=df.drop("key",axis=1)
df

Out[9]:
        fare_amount          pickup_datetime  pickup_longitude  pickup_latitude  dropoff_longitude  dropoff_latitude  passenger_...
0               7.5  2015-05-07 19:52:06 UTC        -73.999817        40.738354         -73.999512         40.723217            ...
1               7.7  2009-07-17 20:04:56 UTC        -73.994355        40.728225         -73.994710         40.750325            ...
2              12.9  2009-08-24 21:45:00 UTC        -74.005043        40.740770         -73.962565         40.772647            ...
3               5.3  2009-06-26 08:22:21 UTC        -73.976124        40.790844         -73.965316         40.803349            ...
4              16.0  2014-08-28 17:47:00 UTC        -73.925023        40.744085         -73.973082         40.761247            ...
...             ...                      ...               ...              ...                ...               ...            ...
199995          3.0  2012-10-28 10:49:00 UTC        -73.987042        40.739367         -73.986525         40.740297            ...
199996          7.5  2014-03-14 01:09:00 UTC        -73.984722        40.736837         -74.006672         40.739620            ...
199997         30.9  2009-06-29 00:42:00 UTC        -73.986017        40.756487         -73.858957         40.692588            ...
199998         14.5  2015-05-20 14:56:25 UTC        -73.997124        40.725452         -73.983215         40.695415            ...
199999         14.1  2010-05-15 04:08:00 UTC        -73.984395        40.720077         -73.985508         40.768793            ...

200000 rows × 7 columns


In [10]: df.dtypes

Out[10]:
fare_amount          float64
pickup_datetime       object
pickup_longitude     float64
pickup_latitude      float64
dropoff_longitude    float64
dropoff_latitude     float64
passenger_count        int64
dtype: object

In [11]: df["pickup_datetime"]=pd.to_datetime(df["pickup_datetime"])# used to change from object

df.dtypes

Out[11]:
fare_amount                      float64
pickup_datetime      datetime64[ns, UTC]
pickup_longitude                 float64
pickup_latitude                  float64
dropoff_longitude                float64
dropoff_latitude                 float64
passenger_count                    int64
dtype: object

In [12]: df.isna().sum()

Out[12]: fare_amount 0
pickup_datetime 0
pickup_longitude 0
pickup_latitude 0
dropoff_longitude 1
dropoff_latitude 1
passenger_count 0
dtype: int64

In [13]: df.fillna(0,inplace=True)

In [14]: df.isnull().sum()

Out[14]: fare_amount 0
pickup_datetime 0
pickup_longitude 0
pickup_latitude 0
dropoff_longitude 0
dropoff_latitude 0
passenger_count 0
dtype: int64
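
Filling the single missing drop-off coordinate pair with 0 places that ride at longitude 0, latitude 0, far away from every other point in the data. One possible alternative, sketched below on a small made-up frame rather than the real dataset, is to drop the incomplete row instead. The frame name `toy` and its values are assumptions used only for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the ride data (illustrative values only).
toy = pd.DataFrame({
    "fare_amount": [7.5, 7.7, 12.9],
    "dropoff_longitude": [-73.999512, np.nan, -73.962565],
    "dropoff_latitude": [40.723217, np.nan, 40.772647],
})

# Count missing values per column, as df.isna().sum() does in the notebook.
missing_before = toy.isna().sum()

# Drop rows with any missing value instead of filling them with 0.
toy_clean = toy.dropna().reset_index(drop=True)
missing_after = toy_clean.isna().sum()

print(missing_before)
print(len(toy_clean), "rows remain")
```

With one incomplete row out of 200000, dropping it costs almost nothing and avoids introducing an artificial extreme coordinate.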

In [15]: df=df.assign(hour=df.pickup_datetime.dt.hour,day=df.pickup_datetime.dt.day,month=df.pick
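
The cell above is cut off at the page edge. The next output has ten columns and the later plots list hour, day and month, so the call presumably adds those three columns. A minimal sketch of that step on a toy frame follows, assuming the third keyword is `month=...dt.month` (the truncated part is not visible in the source, and the frame name `toy` is hypothetical).

```python
import pandas as pd

# Toy frame with timestamps in the same format as the dataset
# (the two values are copied from the first rows shown earlier).
toy = pd.DataFrame({
    "pickup_datetime": ["2015-05-07 19:52:06 UTC", "2009-07-17 20:04:56 UTC"],
})
toy["pickup_datetime"] = pd.to_datetime(toy["pickup_datetime"])

# Assumed completion of the truncated cell: derive hour, day of month
# and month from the parsed timestamp.
toy = toy.assign(
    hour=toy.pickup_datetime.dt.hour,
    day=toy.pickup_datetime.dt.day,
    month=toy.pickup_datetime.dt.month,
)
print(toy[["hour", "day", "month"]])
```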


In [16]: df

Out[16]:
        fare_amount            pickup_datetime  pickup_longitude  pickup_latitude  dropoff_longitude  dropoff_latitude  passenger_...
0               7.5  2015-05-07 19:52:06+00:00        -73.999817        40.738354         -73.999512         40.723217            ...
1               7.7  2009-07-17 20:04:56+00:00        -73.994355        40.728225         -73.994710         40.750325            ...
2              12.9  2009-08-24 21:45:00+00:00        -74.005043        40.740770         -73.962565         40.772647            ...
3               5.3  2009-06-26 08:22:21+00:00        -73.976124        40.790844         -73.965316         40.803349            ...
4              16.0  2014-08-28 17:47:00+00:00        -73.925023        40.744085         -73.973082         40.761247            ...
...             ...                        ...               ...              ...                ...               ...            ...
199995          3.0  2012-10-28 10:49:00+00:00        -73.987042        40.739367         -73.986525         40.740297            ...
199996          7.5  2014-03-14 01:09:00+00:00        -73.984722        40.736837         -74.006672         40.739620            ...
199997         30.9  2009-06-29 00:42:00+00:00        -73.986017        40.756487         -73.858957         40.692588            ...
199998         14.5  2015-05-20 14:56:25+00:00        -73.997124        40.725452         -73.983215         40.695415            ...
199999         14.1  2010-05-15 04:08:00+00:00        -73.984395        40.720077         -73.985508         40.768793            ...

200000 rows × 10 columns

In [17]: df=df.drop("pickup_datetime",axis=1)

In [18]: df

Out[18]:
        fare_amount  pickup_longitude  pickup_latitude  dropoff_longitude  dropoff_latitude  passenger_count  hour  day  ...
0               7.5        -73.999817        40.738354         -73.999512         40.723217                1    19    7  ...
1               7.7        -73.994355        40.728225         -73.994710         40.750325                1    20   17  ...
2              12.9        -74.005043        40.740770         -73.962565         40.772647                1    21   24  ...
3               5.3        -73.976124        40.790844         -73.965316         40.803349                3     8   26  ...
4              16.0        -73.925023        40.744085         -73.973082         40.761247                5    17   28  ...
...             ...               ...              ...                ...               ...              ...   ...  ...  ...
199995          3.0        -73.987042        40.739367         -73.986525         40.740297                1    10   28  ...
199996          7.5        -73.984722        40.736837         -74.006672         40.739620                1     1   14  ...
199997         30.9        -73.986017        40.756487         -73.858957         40.692588                2     0   29  ...
199998         14.5        -73.997124        40.725452         -73.983215         40.695415                1    14   20  ...
199999         14.1        -73.984395        40.720077         -73.985508         40.768793                1     4   15  ...

200000 rows × 9 columns


In [19]: df.plot(kind="box",subplots=True,layout=(7,2),figsize=(15,20))

Out[19]: [Figure: box plots before outlier treatment, one panel each for fare_amount, pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude, passenger_count, hour, day, month]

In [20]: def find_outliers_IQR(df,col):
    # Clip one column to the IQR whiskers: [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    q1=df[col].quantile(0.25)
    q3=df[col].quantile(0.75)
    IQR=q3-q1
    lower_whisker = q1-1.5*IQR
    upper_whisker = q3+1.5*IQR
    df[col]=np.clip(df[col],lower_whisker,upper_whisker)
    return df

def all_outliers(df,col_list):
    # Apply the clipping to every column named in col_list
    for i in col_list:
        df=find_outliers_IQR(df,i)
    return df

In [21]: df=all_outliers(df,df.columns)
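
IQR-based clipping limits every value to the range from Q1 - 1.5*IQR to Q3 + 1.5*IQR, so extreme values are pulled to the nearest whisker rather than removed. A minimal standalone sketch of that rule on made-up fares is shown below; the helper name `clip_to_iqr_whiskers` and the sample values are assumptions for illustration, not part of the notebook.

```python
import pandas as pd

def clip_to_iqr_whiskers(series):
    """Clip values to the range [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return series.clip(lower, upper)

# Made-up fares with one obvious outlier (100.0).
fares = pd.Series([5.0, 6.0, 7.0, 8.0, 9.0, 100.0])
clipped = clip_to_iqr_whiskers(fares)

# Q1 = 6.25, Q3 = 8.75, IQR = 2.5, so the whiskers are 2.5 and 12.5.
print(clipped.tolist())
```

The row count is unchanged by clipping, which is consistent with the notebook still reporting 200000 rows later on, with the fare of 30.9 in row 199997 appearing as 22.25.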


In [22]: df.plot(kind="box",subplots=True,layout=(7,2),figsize=(15,20))

Out[22]: [Figure: box plots of each column after IQR clipping]

In [23]: df.corr()

Out[23]:
                   fare_amount  pickup_longitude  pickup_latitude  dropoff_longitude  dropoff_latitude  passenger_count  ...
fare_amount           1.000000          0.154069        -0.110842           0.218704         -0.125898         0.015778  ...
pickup_longitude      0.154069          1.000000         0.259497           0.425631          0.073290        -0.013213  ...
pickup_latitude      -0.110842          0.259497         1.000000           0.048898          0.515714        -0.012889  ...
dropoff_longitude     0.218704          0.425631         0.048898           1.000000          0.245627        -0.009325  ...
dropoff_latitude     -0.125898          0.073290         0.515714           0.245627          1.000000        -0.006308  ...
passenger_count       0.015778         -0.013213        -0.012889          -0.009325         -0.006308         1.000000  ...
hour                 -0.023623          0.011579         0.029681          -0.046578          0.019783         0.020274  ...
day                   0.004534         -0.003204        -0.001553          -0.004027         -0.003479         0.002712  ...
month                 0.030817          0.001169         0.001562           0.002394         -0.001193         0.010351  ...
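
To read the matrix with the target in mind, the fare_amount column can be ranked by absolute correlation. The sketch below copies the fare_amount column from Out[23] into a Series by hand (the variable names are hypothetical) and sorts it. No feature exceeds about 0.22 in absolute value, which suggests that a purely linear model on these raw columns has little to work with.

```python
import pandas as pd

# Correlation of each feature with fare_amount, copied from Out[23].
corr_with_fare = pd.Series({
    "pickup_longitude": 0.154069,
    "pickup_latitude": -0.110842,
    "dropoff_longitude": 0.218704,
    "dropoff_latitude": -0.125898,
    "passenger_count": 0.015778,
    "hour": -0.023623,
    "day": 0.004534,
    "month": 0.030817,
})

# Rank features by the strength of the linear relationship, ignoring sign.
ranked = corr_with_fare.abs().sort_values(ascending=False)
print(ranked)
```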


In [24]: import seaborn as sns

sns.heatmap(df.corr())

Out[24]: [Figure: heatmap of the correlation matrix]

In [25]: X = df[['pickup_longitude', 'pickup_latitude', 'dropoff_longitude', 'dropoff_latitude',

y = df['fare_amount'] #Target

Out[25]: 0 7.50
1 7.70
2 12.90
3 5.30
4 16.00
...
199995 3.00
199996 7.50
199997 22.25
199998 14.50
199999 14.10
Name: fare_amount, Length: 200000, dtype: float64

In [27]: from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_squared_error

In [28]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42

In [29]: lr_model = LinearRegression()

lr_model.fit(X_train, y_train)

Out[29]: LinearRegression()

In [31]: rf_model = RandomForestRegressor(n_estimators=100, random_state=42)

rf_model.fit(X_train, y_train)

Out[31]: RandomForestRegressor(random_state=42)


In [32]: y_pred_lr = lr_model.predict(X_test)

y_pred_lr
print("Linear Model:",y_pred_lr)
y_pred_rf = rf_model.predict(X_test)
print("Random Forest Model:", y_pred_rf)

Linear Model: [ 9.8745977 17.13685119 10.30134461 ... 8.92996545 9.28083902 9.30188948]
Random Forest Model: [ 5.858 10.53971971 7.422 ... 5.3515 6.296 7.872 ]

In [33]: r2_lr = r2_score(y_test, y_pred_lr)

rmse_lr = np.sqrt(mean_squared_error(y_test, y_pred_lr))

In [34]: print("Linear Regression - R2:", r2_lr)

print("Linear Regression - RMSE:", rmse_lr)

Linear Regression - R2: 0.09111542765407288

Linear Regression - RMSE: 5.200086615056714

In [35]: r2_rf = r2_score(y_test, y_pred_rf)

rmse_rf = np.sqrt(mean_squared_error(y_test, y_pred_rf))

print("Random Forest Regression R2:", r2_rf)

print("Random Forest Regression RMSE:",rmse_rf)

Random Forest Regression R2: 0.7600801674798523

Random Forest Regression RMSE: 2.67170981840233
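
For reference, both scores can be computed from their definitions: RMSE is the square root of the mean squared error, and R2 is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below uses plain Python and made-up values. The function names `rmse` and `r2` are hypothetical helpers, not the scikit-learn functions used above.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up targets and predictions for illustration.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [3.0, 4.0, 5.0, 8.0]

score_rmse = rmse(y_true, y_pred)   # sqrt(2 / 4)
score_r2 = r2(y_true, y_pred)       # 1 - 2 / 20

# A model that always predicts the mean of y_true scores R2 = 0.
baseline_r2 = r2(y_true, [5.0] * 4)

print(score_rmse, score_r2, baseline_r2)
```

Read this way, the linear model's R2 of about 0.09 is only slightly better than always predicting the mean fare, while the random forest's R2 of about 0.76 and lower RMSE (2.67 against 5.20) indicate a much closer fit on the test split.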
