ML 1 16

Machine learning practical for uber ride price prediction using logistic and random

Uploaded by

Sakshi Anil Ugale


In [1]: # Name: Vedika Santosh Jadhav
        # Roll_no: 2441011

        # Assignment No.: 1
        # Title: Predict the price of the Uber ride from a given pickup
        # point to the agreed drop-off location.
        # Perform the following tasks:
        # 1. Pre-process the dataset.
        # 2. Identify outliers.
        # 3. Check the correlation.
        # 4. Implement linear regression and random forest regression models.
        # 5. Evaluate the models and compare their respective scores like R2, RMSE, etc.

        import pandas as pd
        import numpy as np
        import seaborn as sns
        import matplotlib.pyplot as plt

In [2]: df = pd.read_csv("uber - uber.csv")

In [3]: df

Out[3]:
        Unnamed: 0  key                  fare_amount  pickup_datetime          pickup_longitude  pickup_latitude  dropoff_lon
0       24238194    2015-05-07 19:52:06  7.5          2015-05-07 19:52:06 UTC  -73.999817        40.738354        -73.9
1       27835199    2009-07-17 20:04:56  7.7          2009-07-17 20:04:56 UTC  -73.994355        40.728225        -73.9
2       44984355    2009-08-24 21:45:00  12.9         2009-08-24 21:45:00 UTC  -74.005043        40.740770        -73.9
3       25894730    2009-06-26 8:22:21   5.3          2009-06-26 08:22:21 UTC  -73.976124        40.790844        -73.9
4       17610152    2014-08-28 17:47:00  16.0         2014-08-28 17:47:00 UTC  -73.925023        40.744085        -73.9
...     ...         ...                  ...          ...                      ...               ...              ...
199995  42598914    2012-10-28 10:49:00  3.0          2012-10-28 10:49:00 UTC  -73.987042        40.739367        -73.9
199996  16382965    2014-03-14 1:09:00   7.5          2014-03-14 01:09:00 UTC  -73.984722        40.736837        -74.0
199997  27804658    2009-06-29 0:42:00   30.9         2009-06-29 00:42:00 UTC  -73.986017        40.756487        -73.8
199998  20259894    2015-05-20 14:56:25  14.5         2015-05-20 14:56:25 UTC  -73.997124        40.725452        -73.9
199999  11951496    2010-05-15 4:08:00   14.1         2010-05-15 04:08:00 UTC  -73.984395        40.720077        -73.9

200000 rows × 9 columns

 
In [4]: df.head()

Out[4]:
   Unnamed: 0  key                  fare_amount  pickup_datetime          pickup_longitude  pickup_latitude  dropoff_longitude
0  24238194    2015-05-07 19:52:06  7.5          2015-05-07 19:52:06 UTC  -73.999817        40.738354        -73.999512
1  27835199    2009-07-17 20:04:56  7.7          2009-07-17 20:04:56 UTC  -73.994355        40.728225        -73.994710
2  44984355    2009-08-24 21:45:00  12.9         2009-08-24 21:45:00 UTC  -74.005043        40.740770        -73.962565
3  25894730    2009-06-26 8:22:21   5.3          2009-06-26 08:22:21 UTC  -73.976124        40.790844        -73.965316
4  17610152    2014-08-28 17:47:00  16.0         2014-08-28 17:47:00 UTC  -73.925023        40.744085        -73.973082

 

In [5]: df.tail()

Out[5]:
        Unnamed: 0  key                  fare_amount  pickup_datetime          pickup_longitude  pickup_latitude  dropoff_lon
199995  42598914    2012-10-28 10:49:00  3.0          2012-10-28 10:49:00 UTC  -73.987042        40.739367        -73.9
199996  16382965    2014-03-14 1:09:00   7.5          2014-03-14 01:09:00 UTC  -73.984722        40.736837        -74.0
199997  27804658    2009-06-29 0:42:00   30.9         2009-06-29 00:42:00 UTC  -73.986017        40.756487        -73.8
199998  20259894    2015-05-20 14:56:25  14.5         2015-05-20 14:56:25 UTC  -73.997124        40.725452        -73.9
199999  11951496    2010-05-15 4:08:00   14.1         2010-05-15 04:08:00 UTC  -73.984395        40.720077        -73.9

 

In [6]: df.isna().sum()

Out[6]: Unnamed: 0           0
        key                  0
        fare_amount          0
        pickup_datetime      0
        pickup_longitude     0
        pickup_latitude      0
        dropoff_longitude    1
        dropoff_latitude     1
        passenger_count      0
        dtype: int64

In [7]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200000 entries, 0 to 199999
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 200000 non-null int64
1 key 200000 non-null object
2 fare_amount 200000 non-null float64
3 pickup_datetime 200000 non-null object
4 pickup_longitude 200000 non-null float64
5 pickup_latitude 200000 non-null float64
6 dropoff_longitude 199999 non-null float64
7 dropoff_latitude 199999 non-null float64
8 passenger_count 200000 non-null int64
dtypes: float64(5), int64(2), object(2)
memory usage: 13.7+ MB

In [8]: df.dtypes

Out[8]: Unnamed: 0             int64
        key                   object
        fare_amount          float64
        pickup_datetime       object
        pickup_longitude     float64
        pickup_latitude      float64
        dropoff_longitude    float64
        dropoff_latitude     float64
        passenger_count        int64
        dtype: object

In [9]: df.shape

Out[9]: (200000, 9)

In [10]: df['pickup_datetime'] = pd.to_datetime(df['pickup_datetime'])

In [11]: df.dtypes

Out[11]: Unnamed: 0                         int64
         key                               object
         fare_amount                      float64
         pickup_datetime      datetime64[ns, UTC]
         pickup_longitude                 float64
         pickup_latitude                  float64
         dropoff_longitude                float64
         dropoff_latitude                 float64
         passenger_count                    int64
         dtype: object

In [12]: df = df.drop("Unnamed: 0", axis=1)
         df = df.drop("key", axis=1)

In [13]: df

Out[13]:
        fare_amount  pickup_datetime            pickup_longitude  pickup_latitude  dropoff_longitude  dropoff_latitud
0       7.5          2015-05-07 19:52:06+00:00  -73.999817        40.738354        -73.999512         40.7232
1       7.7          2009-07-17 20:04:56+00:00  -73.994355        40.728225        -73.994710         40.75032
2       12.9         2009-08-24 21:45:00+00:00  -74.005043        40.740770        -73.962565         40.77264
3       5.3          2009-06-26 08:22:21+00:00  -73.976124        40.790844        -73.965316         40.80334
4       16.0         2014-08-28 17:47:00+00:00  -73.925023        40.744085        -73.973082         40.76124
...     ...          ...                        ...               ...              ...                ...
199995  3.0          2012-10-28 10:49:00+00:00  -73.987042        40.739367        -73.986525         40.74029
199996  7.5          2014-03-14 01:09:00+00:00  -73.984722        40.736837        -74.006672         40.73962
199997  30.9         2009-06-29 00:42:00+00:00  -73.986017        40.756487        -73.858957         40.69258
199998  14.5         2015-05-20 14:56:25+00:00  -73.997124        40.725452        -73.983215         40.6954
199999  14.1         2010-05-15 04:08:00+00:00  -73.984395        40.720077        -73.985508         40.76879

200000 rows × 7 columns

 

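In [14]: # One dropoff_longitude and one dropoff_latitude value are missing.
         # fillna(0) sets them to 0 instead of dropping the row,
         # so the frame keeps all 200000 rows.
         df.fillna(0, inplace=True)

In [15]: df.isnull().sum()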

Out[15]: fare_amount 0
pickup_datetime 0
pickup_longitude 0
pickup_latitude 0
dropoff_longitude 0
dropoff_latitude 0
passenger_count 0
dtype: int64
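
Filling a missing coordinate with 0 keeps the row but places the drop-off at longitude 0, latitude 0, far from the pickup points near longitude -74 and latitude 40.7. A minimal sketch of the alternative, dropping incomplete rows, is shown below on a made-up three-row frame (the frame `toy` and its values are illustrative assumptions, not the real dataset):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the ride data (values are illustrative only).
toy = pd.DataFrame({
    "fare_amount": [7.5, 7.7, 12.9],
    "dropoff_longitude": [-73.999512, np.nan, -73.962565],
    "dropoff_latitude": [40.723217, np.nan, 40.772647],
})

# Zero-filling keeps the row but moves its drop-off to (0, 0).
filled = toy.fillna(0)

# Dropping keeps only rides that have a usable drop-off location.
dropped = toy.dropna(
    subset=["dropoff_longitude", "dropoff_latitude"]
).reset_index(drop=True)

print(filled.loc[1, ["dropoff_longitude", "dropoff_latitude"]].tolist())
print(len(toy), len(dropped))
```

With only one incomplete row out of 200000, either choice should have a small effect on the fitted models; dropping simply avoids creating an artificial long-distance ride.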

In [16]: df = df.assign(hour=df.pickup_datetime.dt.hour,
                        day=df.pickup_datetime.dt.day,
                        month=df.pickup_datetime.dt.month,
                        year=df.pickup_datetime.dt.year,
                        daysofweek=df.pickup_datetime.dt.dayofweek)
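
To make the `.dt` accessors concrete, the sketch below extracts the same five parts from a single timestamp, the pickup time of row 0. The names `stamps` and `parts` are illustrative; `dayofweek` counts from Monday = 0.

```python
import pandas as pd

# One timestamp, copied from the first row of the dataset.
stamps = pd.to_datetime(pd.Series(["2015-05-07 19:52:06 UTC"]), utc=True)

parts = pd.DataFrame({
    "hour": stamps.dt.hour,
    "day": stamps.dt.day,
    "month": stamps.dt.month,
    "year": stamps.dt.year,
    "dayofweek": stamps.dt.dayofweek,  # Monday=0 ... Sunday=6
})
print(parts)
```

7 May 2015 was a Thursday, so `dayofweek` is 3 for this row.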

In [17]: df

Out[17]:
        fare_amount  pickup_datetime            pickup_longitude  pickup_latitude  dropoff_longitude  dropoff_latitud
0       7.5          2015-05-07 19:52:06+00:00  -73.999817        40.738354        -73.999512         40.7232
1       7.7          2009-07-17 20:04:56+00:00  -73.994355        40.728225        -73.994710         40.75032
2       12.9         2009-08-24 21:45:00+00:00  -74.005043        40.740770        -73.962565         40.77264
3       5.3          2009-06-26 08:22:21+00:00  -73.976124        40.790844        -73.965316         40.80334
4       16.0         2014-08-28 17:47:00+00:00  -73.925023        40.744085        -73.973082         40.76124
...     ...          ...                        ...               ...              ...                ...
199995  3.0          2012-10-28 10:49:00+00:00  -73.987042        40.739367        -73.986525         40.74029
199996  7.5          2014-03-14 01:09:00+00:00  -73.984722        40.736837        -74.006672         40.73962
199997  30.9         2009-06-29 00:42:00+00:00  -73.986017        40.756487        -73.858957         40.69258
199998  14.5         2015-05-20 14:56:25+00:00  -73.997124        40.725452        -73.983215         40.6954
199999  14.1         2010-05-15 04:08:00+00:00  -73.984395        40.720077        -73.985508         40.76879

200000 rows × 12 columns

 

In [18]: df = df.drop("pickup_datetime", axis=1)

In [19]: df.plot()

Out[19]: <Axes: >

In [20]: df.plot(kind="box")

Out[20]: <Axes: >

In [21]: df.plot(kind="box", subplots=True, layout=(7,2), figsize=(15,20))

Out[21]: fare_amount Axes(0.125,0.786098;0.352273x0.0939024)

pickup_longitude Axes(0.547727,0.786098;0.352273x0.0939024)
pickup_latitude Axes(0.125,0.673415;0.352273x0.0939024)
dropoff_longitude Axes(0.547727,0.673415;0.352273x0.0939024)
dropoff_latitude Axes(0.125,0.560732;0.352273x0.0939024)
passenger_count Axes(0.547727,0.560732;0.352273x0.0939024)
hour Axes(0.125,0.448049;0.352273x0.0939024)
day Axes(0.547727,0.448049;0.352273x0.0939024)
month Axes(0.125,0.335366;0.352273x0.0939024)
year Axes(0.547727,0.335366;0.352273x0.0939024)
daysofweek Axes(0.125,0.222683;0.352273x0.0939024)
dtype: object

In [22]: def remove_outlier(df1, col):
             # Despite the name, values are capped at the IQR fences; no rows are removed.
             Q1 = df1[col].quantile(0.25)
             Q3 = df1[col].quantile(0.75)
             IQR = Q3 - Q1
             lower = Q1 - 1.5*IQR
             upper = Q3 + 1.5*IQR
             # Write to df1, the frame passed in, not to the global df.
             df1[col] = np.clip(df1[col], lower, upper)
             return df1

         def treat_outliers(df, col_list):
             for c in col_list:
                 df = remove_outlier(df, c)
             return df

         # Every column is clipped, including the target fare_amount.
         df = treat_outliers(df, df.columns)
         df.plot(kind="box", subplots=True, layout=(7,2), figsize=(15,20))

Out[22]: fare_amount Axes(0.125,0.786098;0.352273x0.0939024)

pickup_longitude Axes(0.547727,0.786098;0.352273x0.0939024)
pickup_latitude Axes(0.125,0.673415;0.352273x0.0939024)
dropoff_longitude Axes(0.547727,0.673415;0.352273x0.0939024)
dropoff_latitude Axes(0.125,0.560732;0.352273x0.0939024)
passenger_count Axes(0.547727,0.560732;0.352273x0.0939024)
hour Axes(0.125,0.448049;0.352273x0.0939024)
day Axes(0.547727,0.448049;0.352273x0.0939024)
month Axes(0.125,0.335366;0.352273x0.0939024)
year Axes(0.547727,0.335366;0.352273x0.0939024)
daysofweek Axes(0.125,0.222683;0.352273x0.0939024)
dtype: object
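
The outlier step above caps values at the 1.5 × IQR fences rather than deleting rows. The sketch below contrasts capping with filtering on a made-up series of six fares; the helper `iqr_bounds` and the sample values are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Five ordinary fares and one extreme value (illustrative numbers).
fares = pd.Series([5.0, 6.0, 7.0, 8.0, 9.0, 100.0])
lower, upper = iqr_bounds(fares)

clipped = fares.clip(lower, upper)             # capping: same length, 100.0 becomes the upper fence
filtered = fares[fares.between(lower, upper)]  # filtering: the extreme row is dropped

print(lower, upper)
print(len(clipped), clipped.max())
print(len(filtered))
```

Capping keeps all 200000 rows, which matches the row counts shown in the notebook, but it also produces values that never occurred in the data, such as a passenger count of 3.5.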

In [23]: corr = df.corr()
         print(corr.shape)
         plt.figure(figsize=(6,6))
         sns.heatmap(corr, cbar=True, square=True, fmt='.1f',
                     annot=True, annot_kws={'size':15}, cmap="Oranges")

(11, 11)

Out[23]: <Axes: >
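
The heatmap is drawn before the distance feature exists, so it covers only the 11 columns present at that point. One way to read a correlation matrix numerically is to rank features by the absolute value of their correlation with the target. The sketch below does this on synthetic data; the fare formula, the noise level and the frame `toy` are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic rides: fare depends on distance, not on passenger count (assumed).
dist = rng.uniform(0.5, 10.0, n)
passengers = rng.integers(1, 5, n)
fare = 2.5 + 2.0 * dist + rng.normal(0.0, 1.0, n)

toy = pd.DataFrame({
    "fare_amount": fare,
    "dist_travel_km": dist,
    "passenger_count": passengers,
})

# Absolute correlation of every feature with the target, strongest first.
ranking = (
    toy.corr()["fare_amount"]
    .drop("fare_amount")
    .abs()
    .sort_values(ascending=False)
)
print(ranking)
```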

In [24]: pip install numpy

Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: numpy in c:\programdata\anaconda3\lib\site-packages (1.24.3)
Note: you may need to restart the kernel to use updated packages.

In [26]: pip install --force-reinstall haversine

Defaulting to user installation because normal site-packages is not writeable
Collecting haversine
  Obtaining dependency information for haversine from https://files.pythonhosted.org/packages/5b/f1/b7274966f0b5b665d9114e86d09c6bc87d241781d63d8817323dcfa940c6/haversine-2.8.1-py2.py3-none-any.whl.metadata
  Using cached haversine-2.8.1-py2.py3-none-any.whl.metadata (5.9 kB)
Using cached haversine-2.8.1-py2.py3-none-any.whl (7.7 kB)
Installing collected packages: haversine
  Attempting uninstall: haversine
    Found existing installation: haversine 2.8.1
    Uninstalling haversine-2.8.1:
      Successfully uninstalled haversine-2.8.1
Successfully installed haversine-2.8.1
Note: you may need to restart the kernel to use updated packages.

In [27]: import haversine as hs

In [28]: # Haversine (great-circle) distance in km between pickup and drop-off.
         travel_dist = []
         for lati1, long1, lati2, long2 in zip(df['pickup_latitude'], df['pickup_longitude'],
                                               df['dropoff_latitude'], df['dropoff_longitude']):
             loc1 = (lati1, long1)
             loc2 = (lati2, long2)
             travel_dist.append(hs.haversine(loc1, loc2))

         # print(travel_dist) is omitted: printing all 200000 values triggered
         # Jupyter's "IOPub data rate exceeded" warning.
         df['dist_travel_km'] = travel_dist
         df.head()

Out[28]:
   fare_amount  pickup_longitude  pickup_latitude  dropoff_longitude  dropoff_latitude  passenger_count  h
0  7.5          -73.999817        40.738354        -73.999512         40.723217         1.0
1  7.7          -73.994355        40.728225        -73.994710         40.750325         1.0
2  12.9         -74.005043        40.740770        -73.962565         40.772647         1.0
3  5.3          -73.976124        40.790844        -73.965316         40.803349         3.0
4  16.0         -73.929786        40.744085        -73.973082         40.761247         3.5
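
The row-by-row loop calls the distance function 200000 times. As a hedged alternative, the haversine formula can be written directly with NumPy so that whole columns are processed at once. In the sketch below the function name `haversine_km` and the mean Earth radius of 6371.0088 km are assumptions; results may differ slightly from the `haversine` package if it uses a different radius.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or NumPy arrays of degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Pickup and drop-off coordinates of rows 0 and 1 from the table above.
pickup_lat = np.array([40.738354, 40.728225])
pickup_lon = np.array([-73.999817, -73.994355])
dropoff_lat = np.array([40.723217, 40.750325])
dropoff_lon = np.array([-73.999512, -73.994710])

dist_km = haversine_km(pickup_lat, pickup_lon, dropoff_lat, dropoff_lon)
print(dist_km)
```

Applied to the notebook's frame, the same call would take the four coordinate columns as arguments and return one distance per row.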

 
In [29]: x = df[['pickup_longitude', 'pickup_latitude',
                 'dropoff_longitude', 'dropoff_latitude', 'passenger_count',
                 'hour', 'day', 'month', 'year', 'daysofweek', 'dist_travel_km']]
         y = df['fare_amount']

In [30]: from sklearn.model_selection import train_test_split

         x_train, x_test, y_train, y_test = train_test_split(
             x, y, test_size=0.2, random_state=42)
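
With `test_size=0.2`, 40000 of the 200000 rows are held out for testing and 160000 are used for training. The sketch below shows the idea behind a shuffled hold-out split using NumPy only. The helper `holdout_split` is hypothetical and will not reproduce the exact rows that scikit-learn selects for `random_state=42`.

```python
import numpy as np

def holdout_split(n_rows, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) for a shuffled hold-out split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)          # random order of row positions
    n_test = int(round(n_rows * test_size))  # number of held-out rows
    return order[n_test:], order[:n_test]

train_idx, test_idx = holdout_split(200_000)
print(len(train_idx), len(test_idx))
```

Fixing the seed makes the split repeatable, which is what allows the two models to be compared on the same test rows.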

In [31]: from sklearn.linear_model import LinearRegression

         model = LinearRegression()
         model.fit(x_train, y_train)
         y_pred = model.predict(x_test)
         y_pred

Out[31]: array([ 7.68008794, 10.60894655,  8.47854262, ...,  6.36062388,
                 6.57836563,  8.154517  ])

In [32]: from sklearn.metrics import (mean_absolute_error,
                                      mean_squared_error, r2_score)

In [33]: r2 = r2_score(y_test, y_pred)
         mae = mean_absolute_error(y_test, y_pred)
         mse = mean_squared_error(y_test, y_pred)
         rmse = np.sqrt(mse)

In [34]: print(f'R² Score: {r2}')
         print(f'Mean Absolute Error (MAE): {mae}')
         print(f'Mean Squared Error (MSE): {mse}')
         print(f'Root Mean Squared Error (RMSE): {rmse}')

R² Score: 0.6589516503634483
Mean Absolute Error (MAE): 2.157564875844668
Mean Squared Error (MSE): 10.146783070723341
Root Mean Squared Error (RMSE): 3.185401555647787
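
For reference, the four scores can be computed by hand from the residuals. The sketch below uses five made-up true fares and predictions (the arrays `y_true` and `y_hat` are illustrative, not taken from the test set) and follows the standard definitions: MAE is the mean absolute residual, MSE the mean squared residual, RMSE its square root, and R² is one minus the ratio of residual to total sum of squares.

```python
import numpy as np

# Illustrative true fares and predictions (not from the notebook's test set).
y_true = np.array([7.5, 7.7, 12.9, 5.3, 16.0])
y_hat = np.array([8.0, 7.0, 12.0, 6.0, 15.0])

residuals = y_true - y_hat

mae = np.mean(np.abs(residuals))                 # mean absolute error
mse = np.mean(residuals ** 2)                    # mean squared error
rmse = np.sqrt(mse)                              # root mean squared error

ss_res = np.sum(residuals ** 2)                  # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot                         # coefficient of determination

print(mae, mse, rmse, r2)
```

RMSE and MAE are in the same units as the fare, so an RMSE of about 3.19 means the linear model's typical error is around three fare units on the clipped target.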

In [35]: def custom_accuracy(y_true, y_pred, tolerance=0.1):
             # Share of predictions within `tolerance` (in fare units) of the true fare.
             return np.mean(np.abs(y_true - y_pred) <= tolerance)

         accuracy = custom_accuracy(y_test, y_pred, tolerance=0.5)
         print(f'Custom Accuracy: {accuracy}')

Custom Accuracy: 0.170375

In [37]: from sklearn.ensemble import RandomForestRegressor

In [38]: rf_model = RandomForestRegressor(n_estimators=100,
                                          random_state=42)

In [40]: rf_model.fit(x_train, y_train)

Out[40]: RandomForestRegressor(random_state=42)

In [43]: y_pred_rf = rf_model.predict(x_test)
         print("Random Forest Model:", y_pred_rf)

Random Forest Model: [ 5.855 12.729 7.338 ... 4.816 5.828 8.495]

In [46]: r2_lr = r2_score(y_test, y_pred)
         rmse_lr = np.sqrt(mean_squared_error(y_test, y_pred))

In [47]: r2_rf = r2_score(y_test, y_pred_rf)
         rmse_rf = np.sqrt(mean_squared_error(y_test, y_pred_rf))
         print("Random Forest Regression R2:", r2_rf)
         print("Random Forest Regression RMSE:", rmse_rf)

Random Forest Regression R2: 0.7944021741272649

Random Forest Regression RMSE: 2.473235494329611
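
The assignment asks for a comparison of the two models. A small sketch that puts the scores printed above side by side is given below; the numbers are copied from the notebook outputs, while the table layout and the names `scores`, `best_model` and `rmse_drop` are illustrative.

```python
import pandas as pd

# Scores copied from the notebook outputs (both models use the same test split).
scores = pd.DataFrame(
    {
        "R2": [0.6589516503634483, 0.7944021741272649],
        "RMSE": [3.185401555647787, 2.473235494329611],
    },
    index=["Linear regression", "Random forest"],
)

best_model = scores["RMSE"].idxmin()
rmse_drop = 1 - scores.loc["Random forest", "RMSE"] / scores.loc["Linear regression", "RMSE"]

print(scores)
print(f"Lowest RMSE: {best_model}; relative RMSE reduction: {rmse_drop:.1%}")
```

On this split the random forest explains more of the variance (R² of about 0.79 against 0.66) and its RMSE is roughly 22% lower than that of linear regression.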
