Uber ml1 - Jupyter Notebook

Uploaded by

Arbaz Shaikh


9/17/24, 8:10 PM Uber ml1 - Jupyter Notebook

In [1]:  import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt

In [2]:  df = pd.read_csv("uber.csv")
df.head()

Out[2]:
   Unnamed: 0                            key  fare_amount          pickup_datetime  pickup_longitude  ...
0    24238194    2015-05-07 19:52:06.0000003          7.5  2015-05-07 19:52:06 UTC        -73.999817  ...
1    27835199    2009-07-17 20:04:56.0000002          7.7  2009-07-17 20:04:56 UTC        -73.994355  ...
2    44984355   2009-08-24 21:45:00.00000061         12.9  2009-08-24 21:45:00 UTC        -74.005043  ...
3    25894730    2009-06-26 08:22:21.0000001          5.3  2009-06-26 08:22:21 UTC        -73.976124  ...
4    17610152  2014-08-28 17:47:00.000000188         16.0  2014-08-28 17:47:00 UTC        -73.925023  ...

In [3]:  df.drop(columns=['Unnamed: 0','key'],inplace=True)

In [4]:  df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200000 entries, 0 to 199999
Data columns (total 7 columns):
 #   Column             Non-Null Count   Dtype
---  ------             --------------   -----
 0   fare_amount        200000 non-null  float64
 1   pickup_datetime    200000 non-null  object
 2   pickup_longitude   200000 non-null  float64
 3   pickup_latitude    200000 non-null  float64
 4   dropoff_longitude  199999 non-null  float64
 5   dropoff_latitude   199999 non-null  float64
 6   passenger_count    200000 non-null  int64
dtypes: float64(5), int64(1), object(1)
memory usage: 10.7+ MB

In [5]:  df.dropna(how='any',inplace=True)

localhost:8888/notebooks/Uber ml1.ipynb 1/10


In [6]:  df.isnull().sum()

Out[6]: fare_amount 0
pickup_datetime 0
pickup_longitude 0
pickup_latitude 0
dropoff_longitude 0
dropoff_latitude 0
passenger_count 0
dtype: int64

Boxplots

In [7]:  for col in df.select_dtypes(exclude=['object']):
    plt.figure()
    sns.boxplot(data=df,x=col)

In [8]:  df = df[
(df.pickup_latitude > -90) & (df.pickup_latitude < 90) &
(df.dropoff_latitude > -90) & (df.dropoff_latitude < 90) &
(df.pickup_longitude > -180) & (df.pickup_longitude < 180) &
(df.dropoff_longitude > -180) & (df.dropoff_longitude < 180) &
(df.fare_amount > 0) & (df.passenger_count > 0) & (df.passenger_co
]
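The range checks in In [8] can be tried on a small frame first. The sketch below is an illustration, not the notebook's code: the `trips` frame and its values are made up, only the pickup columns are checked, and no upper bound on `passenger_count` is applied because that part of the original condition is cut off.

```python
import pandas as pd

# Toy data (hypothetical): row 1 has an invalid latitude, row 2 an invalid longitude.
trips = pd.DataFrame({
    "pickup_latitude": [40.74, 95.0, 40.75],
    "pickup_longitude": [-73.99, -73.98, -200.0],
    "fare_amount": [7.5, 12.0, 5.0],
    "passenger_count": [1, 2, 1],
})

# Strict bounds, matching the > and < comparisons used in In [8].
valid = (
    trips["pickup_latitude"].between(-90, 90, inclusive="neither")
    & trips["pickup_longitude"].between(-180, 180, inclusive="neither")
    & (trips["fare_amount"] > 0)
    & (trips["passenger_count"] > 0)
)

# .copy() gives an independent frame, so later column assignments are unambiguous.
clean = trips[valid].copy()
print(clean)
```

Only the first row survives, since it is the only one with coordinates inside the valid ranges.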


In [9]:  from math import cos, asin, sqrt, pi

import numpy as np

def distance(lat_1,lon_1,lat_2,lon_2):
    # Haversine great-circle distance in km; inputs are latitudes/longitudes in degrees
    lon_1, lon_2, lat_1, lat_2 = map(np.radians, [lon_1, lon_2, lat_1, lat_2])

    diff_lon = lon_2 - lon_1

    diff_lat = lat_2 - lat_1

    # 6371 km is the mean radius of the Earth
    km = 2 * 6371 * np.arcsin(np.sqrt(np.sin(diff_lat/2.0)**2 + np.cos(lat_1) * np.cos(lat_2) * np.sin(diff_lon/2.0)**2))

    return km
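A quick sanity check for a haversine distance is one degree of longitude along the equator, which should come out to 6371 × π/180 km. The sketch below uses the standard haversine formula; the name `haversine_km` and the `radius_km` parameter are introduced here for illustration and are not part of the notebook.

```python
import numpy as np

def haversine_km(lat_1, lon_1, lat_2, lon_2, radius_km=6371.0):
    """Great-circle distance in km between points given in degrees."""
    lat_1, lon_1, lat_2, lon_2 = map(np.radians, [lat_1, lon_1, lat_2, lon_2])
    diff_lat = lat_2 - lat_1
    diff_lon = lon_2 - lon_1
    # Haversine term: sin^2(dlat/2) + cos(lat1) * cos(lat2) * sin^2(dlon/2)
    a = np.sin(diff_lat / 2.0) ** 2 + np.cos(lat_1) * np.cos(lat_2) * np.sin(diff_lon / 2.0) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# One degree of longitude on the equator.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
expected = 6371.0 * np.pi / 180.0
print(one_degree, expected)

# Identical points are zero km apart.
same_point = haversine_km(40.738354, -73.999817, 40.738354, -73.999817)
```

Because every operation is a NumPy ufunc, the same function accepts whole pandas columns, which is how `distance` is called in the next cell.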

In [10]:  temp = distance(df['pickup_latitude'],df['pickup_longitude'],df['dropoff_latitude'],df['dropoff_longitude'])

temp.head()

Out[10]: 0 1.683323
1 2.457590
2 5.036377
3 1.661683
4 4.475450
dtype: float64

In [11]:  df_new = df.copy()

df_new['Distance'] = temp
df = df_new
df.head()

Out[11]:
   fare_amount          pickup_datetime  pickup_longitude  pickup_latitude  dropoff_longitude  ...
0          7.5  2015-05-07 19:52:06 UTC        -73.999817        40.738354         -73.999512  ...
1          7.7  2009-07-17 20:04:56 UTC        -73.994355        40.728225         -73.994710  ...
2         12.9  2009-08-24 21:45:00 UTC        -74.005043        40.740770         -73.962565  ...
3          5.3  2009-06-26 08:22:21 UTC        -73.976124        40.790844         -73.965316  ...
4         16.0  2014-08-28 17:47:00 UTC        -73.925023        40.744085         -73.973082  ...


In [12]:  sns.boxplot(data=df,x='Distance')

Out[12]: <Axes: xlabel='Distance'>

In [13]:  df = df[(df['Distance'] < 200) & (df['Distance'] > 0)]

In [14]:  df['pickup_datetime'] = pd.to_datetime(df['pickup_datetime'])

C:\Users\HP\AppData\Local\Temp\ipykernel_16404\1295461447.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
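The warning above is raised because In [13] rebinds `df` to a boolean-mask selection, and the following cell then assigns a column on that selection. One common way to avoid it is to take an explicit copy when filtering. The sketch below shows the pattern on made-up rows; `trips` and `subset` are illustrative names, not the notebook's variables.

```python
import pandas as pd

# Toy rows (hypothetical distances; timestamps in the dataset's format).
trips = pd.DataFrame({
    "Distance": [0.0, 1.7, 250.0],
    "pickup_datetime": [
        "2015-05-07 19:52:06 UTC",
        "2009-07-17 20:04:56 UTC",
        "2009-08-24 21:45:00 UTC",
    ],
})

# .copy() makes the filtered frame independent of `trips`,
# so assigning a column afterwards is unambiguous.
subset = trips[(trips["Distance"] < 200) & (trips["Distance"] > 0)].copy()
subset["pickup_datetime"] = pd.to_datetime(subset["pickup_datetime"])
print(subset.dtypes)
```

The same `.copy()` applied at the filtering step in In [13] would be expected to silence the repeated warnings in the later cells as well.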


In [15]:  df['week_day'] = df['pickup_datetime'].dt.day_name()

df['Year'] = df['pickup_datetime'].dt.year
df['Month'] = df['pickup_datetime'].dt.month
df['Hour'] = df['pickup_datetime'].dt.hour

C:\Users\HP\AppData\Local\Temp\ipykernel_16404\2592915223.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['week_day'] = df['pickup_datetime'].dt.day_name()
C:\Users\HP\AppData\Local\Temp\ipykernel_16404\2592915223.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Year'] = df['pickup_datetime'].dt.year
C:\Users\HP\AppData\Local\Temp\ipykernel_16404\2592915223.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Month'] = df['pickup_datetime'].dt.month
C:\Users\HP\AppData\Local\Temp\ipykernel_16404\2592915223.py:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
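The `.dt` accessors used in In [15] can be checked on two timestamps taken from the dataset preview (rows 0 and 3). This is a small standalone sketch; the `pickups` and `features` names are introduced here for illustration.

```python
import pandas as pd

# Two pickup timestamps as they appear in the raw data.
pickups = pd.to_datetime(pd.Series([
    "2015-05-07 19:52:06 UTC",
    "2009-06-26 08:22:21 UTC",
]))

# Same four calendar features the notebook derives.
features = pd.DataFrame({
    "week_day": pickups.dt.day_name(),
    "Year": pickups.dt.year,
    "Month": pickups.dt.month,
    "Hour": pickups.dt.hour,
})
print(features)
```

The values agree with the corresponding rows of the `df.head()` output that follows the column drop: Thursday, 2015, 5, 19 and Friday, 2009, 6, 8.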

In [16]:  df.drop(columns=['pickup_datetime','pickup_latitude','pickup_longitude','dropoff_latitude','dropoff_longitude'],inplace=True)

C:\Users\HP\AppData\Local\Temp\ipykernel_16404\3782303944.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.drop(columns=['pickup_datetime','pickup_latitude','pickup_longitude','dropoff_latitude','dropoff_longitude'],inplace=True)


In [17]:  df.head()

Out[17]:
   fare_amount  passenger_count  Distance  week_day  Year  Month  Hour
0          7.5                1  1.683323  Thursday  2015      5    19
1          7.7                1  2.457590    Friday  2009      7    20
2         12.9                1  5.036377    Monday  2009      8    21
3          5.3                3  1.661683    Friday  2009      6     8
4         16.0                5  4.475450  Thursday  2014      8    17


In [18]:  temp = df.copy()

def convert_week_day(day):
    # Monday to Thursday are coded 0; Friday, Saturday and Sunday are coded 1
    if day in ['Monday','Tuesday','Wednesday','Thursday']:
        return 0 # Weekday
    return 1 # Weekend (Friday is grouped with the weekend here)

def convert_hour(hour):
    # Time-of-day buckets: 1 = 05-12, 2 = 13-17, 3 = 18-23, 0 = 00-04
    if 5 <= hour <= 12:
        return 1
    elif 12 < hour <= 17:
        return 2
    elif 17 < hour < 24:
        return 3
    return 0

df['week_day'] = temp['week_day'].apply(convert_week_day)
df['Hour'] = temp['Hour'].apply(convert_hour)
df.head()

C:\Users\HP\AppData\Local\Temp\ipykernel_16404\3260682206.py:17: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['week_day'] = temp['week_day'].apply(convert_week_day)
C:\Users\HP\AppData\Local\Temp\ipykernel_16404\3260682206.py:18: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Out[18]:
   fare_amount  passenger_count  Distance  week_day  Year  Month  Hour
0          7.5                1  1.683323         0  2015      5     3
1          7.7                1  2.457590         1  2009      7     3
2         12.9                1  5.036377         0  2009      8     3
3          5.3                3  1.661683         1  2009      6     1
4         16.0                5  4.475450         0  2014      8     2
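The two `apply` calls in In [18] can also be written with vectorized pandas operations. The sketch below is an alternative, not the notebook's method: it reproduces the same hour buckets with `pd.cut` and the same weekday coding with `isin`, and checks the hour result against a copy of `convert_hour`.

```python
import pandas as pd

def convert_hour(hour):
    # Copy of the notebook's bucketing, kept here only for comparison.
    if 5 <= hour <= 12:
        return 1
    elif 12 < hour <= 17:
        return 2
    elif 17 < hour < 24:
        return 3
    return 0

hours = pd.Series(range(24))

# Right-closed bins (-1, 4], (4, 12], (12, 17], (17, 23] map to 0, 1, 2, 3.
hour_bucket = pd.cut(hours, bins=[-1, 4, 12, 17, 23], labels=[0, 1, 2, 3]).astype(int)

days = pd.Series(["Thursday", "Friday", "Monday", "Sunday"])
# 1 for Friday, Saturday and Sunday, 0 otherwise, as in convert_week_day.
week_day_code = days.isin(["Friday", "Saturday", "Sunday"]).astype(int)

print(hour_bucket.tolist())
print(week_day_code.tolist())
```

On a frame of roughly 200,000 rows the vectorized form avoids one Python function call per row, which is the usual reason to prefer it.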


In [19]:  df.corr()

Out[19]:
                 fare_amount  passenger_count  Distance  week_day      Year   Month  ...
fare_amount         1.000000         0.011884  0.778667  0.002305  0.120430   0.024  ...
passenger_count     0.011884         1.000000  0.005112  0.035882  0.005339   0.008  ...
Distance            0.778667         0.005112  1.000000  0.014518  0.018617   0.007  ...
week_day            0.002305         0.035882  0.014518  1.000000  0.006910  -0.007  ...
Year                0.120430         0.005339  0.018617  0.006910  1.000000  -0.115  ...
Month               0.024120         0.008818  0.007373 -0.007328 -0.115182   1.000  ...
Hour               -0.021078         0.013572 -0.022691 -0.078129  0.001131  -0.005  ...
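Ranking the features by the absolute value of their correlation with `fare_amount` makes the choice of predictor easier to see. The numbers below are copied from the `fare_amount` column of Out[19]; the `fare_corr` and `ranked` names are introduced here for illustration.

```python
import pandas as pd

# Pearson correlations with fare_amount, taken from Out[19].
fare_corr = pd.Series({
    "passenger_count": 0.011884,
    "Distance": 0.778667,
    "week_day": 0.002305,
    "Year": 0.120430,
    "Month": 0.024120,
    "Hour": -0.021078,
})

# Sort by strength of the linear relationship, ignoring sign.
ranked = fare_corr.abs().sort_values(ascending=False)
print(ranked)
```

`Distance` is by far the strongest linear correlate, with `Year` a distant second, which is consistent with the later cells using `Distance` as the only input feature.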

In [20]:  sns.scatterplot(y=df['fare_amount'],x=df['Distance'])

Out[20]: <Axes: xlabel='Distance', ylabel='fare_amount'>

In [21]:  from sklearn.preprocessing import StandardScaler

x = df[['Distance']].values
y = df['fare_amount'].values.reshape(-1,1)

In [22]:  from sklearn.model_selection import train_test_split

x_train, x_test, y_train,y_test = train_test_split(x,y,random_state=10)


In [23]:  std_x = StandardScaler()

x_train = std_x.fit_transform(x_train)

In [24]:  x_test = std_x.transform(x_test)

In [25]:  std_y = StandardScaler()

y_train = std_y.fit_transform(y_train)

In [26]:  y_test = std_y.transform(y_test)

In [27]:  from sklearn.metrics import mean_squared_error,r2_score, mean_absolute_error

def fit_predict(model):
    # Fit on the scaled training split, then report metrics on the scaled test split
    model.fit(x_train,y_train.ravel())
    y_pred = model.predict(x_test)
    r_squared = r2_score(y_test,y_pred)
    RMSE = mean_squared_error(y_test, y_pred,squared=False)
    MAE = mean_absolute_error(y_test,y_pred)
    print('R-squared: ', r_squared)
    print('RMSE: ', RMSE)
    print("MAE: ",MAE)

In [28]:  from sklearn.linear_model import LinearRegression

In [29]:  fit_predict(LinearRegression())

R-squared: 0.6041167920841171
RMSE: 0.6290054895695945
MAE: 0.27552329590959806

C:\Users\HP\AppData\Roaming\Python\Python311\site-packages\sklearn\metrics\_regression.py:483: FutureWarning: 'squared' is deprecated in version 1.4 and will be removed in 1.6. To calculate the root mean squared error, use the function'root_mean_squared_error'.
  warnings.warn(
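The FutureWarning above concerns the `squared=False` argument and points to `root_mean_squared_error` as its replacement. Where the installed scikit-learn version is uncertain, the metric can also be computed directly with NumPy. The helper below is a sketch; `rmse` is a hypothetical name, not a scikit-learn function.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred)^2))."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Errors of 3 and 4 give sqrt((9 + 16) / 2) = sqrt(12.5).
example = rmse([0.0, 0.0], [3.0, 4.0])
print(example)
```

The `ravel()` calls let the helper accept both the column-shaped `y_test` and the flat `y_pred` that `fit_predict` works with.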

In [30]:  from sklearn.ensemble import RandomForestRegressor

fit_predict(RandomForestRegressor())

R-squared: 0.6522221648884474
RMSE: 0.5895516309915084
MAE: 0.2918258149086775
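Because `y_train` and `y_test` were standardized, the RMSE and MAE printed above are in units of the training fare's standard deviation, not in currency. R-squared is unaffected by this rescaling. The sketch below, on synthetic data with assumed coefficients, shows how predictions can be mapped back with `inverse_transform` and that the original-unit RMSE equals the scaled RMSE times `std_y.scale_[0]`. For brevity it evaluates on the training data, unlike the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the dataset (the slope, intercept and noise are assumptions).
rng = np.random.default_rng(0)
distance_km = rng.uniform(0.5, 20.0, size=(200, 1))
fare = 2.5 + 1.8 * distance_km + rng.normal(0.0, 1.0, size=(200, 1))

std_x, std_y = StandardScaler(), StandardScaler()
x_scaled = std_x.fit_transform(distance_km)
y_scaled = std_y.fit_transform(fare)

model = LinearRegression().fit(x_scaled, y_scaled.ravel())
pred_scaled = model.predict(x_scaled).reshape(-1, 1)

# RMSE in standardized units (what the notebook reports).
rmse_scaled = float(np.sqrt(np.mean((y_scaled - pred_scaled) ** 2)))

# Map predictions back to fare units before computing the error.
pred_fare = std_y.inverse_transform(pred_scaled)
rmse_fare = float(np.sqrt(np.mean((fare - pred_fare) ** 2)))

print(rmse_scaled, rmse_fare, std_y.scale_[0])
```

Applied to the notebook, multiplying the reported RMSE and MAE by `std_y.scale_[0]` would express them in the units of `fare_amount`.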

