P1) Code Uber

Uploaded by

riteshakhade1234

10/15/24, 5:27 PM Uber

In [26]: #import libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
#We do not want to see warnings
warnings.filterwarnings("ignore")

In [2]: #import data

data = pd.read_csv("uber.csv")

In [3]: #Create a data copy

df = data.copy()

In [4]: #Print data (first and last rows)

print(df)

localhost:8888/nbconvert/html/OneDrive/Desktop/BE/Final Yr/SPPU-CSE-SEM7-Codes-main/ML/1. Uber Price Detection/Uber.ipynb?download=false 1/6

        Unnamed: 0                            key  fare_amount  \
0         24238194    2015-05-07 19:52:06.0000003          7.5
1         27835199    2009-07-17 20:04:56.0000002          7.7
2         44984355   2009-08-24 21:45:00.00000061         12.9
3         25894730    2009-06-26 08:22:21.0000001          5.3
4         17610152  2014-08-28 17:47:00.000000188         16.0
...            ...                            ...          ...
199995    42598914   2012-10-28 10:49:00.00000053          3.0
199996    16382965    2014-03-14 01:09:00.0000008          7.5
199997    27804658   2009-06-29 00:42:00.00000078         30.9
199998    20259894    2015-05-20 14:56:25.0000004         14.5
199999    11951496   2010-05-15 04:08:00.00000076         14.1

                pickup_datetime  pickup_longitude  pickup_latitude  \
0       2015-05-07 19:52:06 UTC        -73.999817        40.738354
1       2009-07-17 20:04:56 UTC        -73.994355        40.728225
2       2009-08-24 21:45:00 UTC        -74.005043        40.740770
3       2009-06-26 08:22:21 UTC        -73.976124        40.790844
4       2014-08-28 17:47:00 UTC        -73.925023        40.744085
...                         ...               ...              ...
199995  2012-10-28 10:49:00 UTC        -73.987042        40.739367
199996  2014-03-14 01:09:00 UTC        -73.984722        40.736837
199997  2009-06-29 00:42:00 UTC        -73.986017        40.756487
199998  2015-05-20 14:56:25 UTC        -73.997124        40.725452
199999  2010-05-15 04:08:00 UTC        -73.984395        40.720077

        dropoff_longitude  dropoff_latitude  passenger_count
0              -73.999512         40.723217                1
1              -73.994710         40.750325                1
2              -73.962565         40.772647                1
3              -73.965316         40.803349                3
4              -73.973082         40.761247                5
...                   ...               ...              ...
199995         -73.986525         40.740297                1
199996         -74.006672         40.739620                1
199997         -73.858957         40.692588                2
199998         -73.983215         40.695415                1
199999         -73.985508         40.768793                1

[200000 rows x 9 columns]

In [8]: #Get Info

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200000 entries, 0 to 199999
Data columns (total 9 columns):
 #   Column             Non-Null Count   Dtype
---  ------             --------------   -----
 0   Unnamed: 0         200000 non-null  int64
 1   key                200000 non-null  object
 2   fare_amount        200000 non-null  float64
 3   pickup_datetime    200000 non-null  object
 4   pickup_longitude   200000 non-null  float64
 5   pickup_latitude    200000 non-null  float64
 6   dropoff_longitude  199999 non-null  float64
 7   dropoff_latitude   199999 non-null  float64
 8   passenger_count    200000 non-null  int64
dtypes: float64(5), int64(2), object(2)
memory usage: 13.7+ MB

In [5]: #pickup_datetime is stored as object (string); convert it to datetime

df["pickup_datetime"] = pd.to_datetime(df["pickup_datetime"])
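A minimal sketch of what this conversion does, using two sample strings copied from the pickup_datetime column shown earlier. The variable names raw and parsed are illustrative only and do not appear in the notebook.

```python
import pandas as pd

# Two strings in the same format as the pickup_datetime column
raw = pd.Series(["2015-05-07 19:52:06 UTC", "2009-07-17 20:04:56 UTC"])

# Parsing produces a timezone-aware datetime column (UTC)
parsed = pd.to_datetime(raw)

# Once parsed, calendar fields are available through the .dt accessor
hours = parsed.dt.hour.tolist()
years = parsed.dt.year.tolist()
print(parsed.dtype, hours, years)
```

Fields such as the hour or year extracted this way could serve as model inputs, although the notebook itself does not use them.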

In [6]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200000 entries, 0 to 199999
Data columns (total 9 columns):
 #   Column             Non-Null Count   Dtype
---  ------             --------------   -----
 0   Unnamed: 0         200000 non-null  int64
 1   key                200000 non-null  object
 2   fare_amount        200000 non-null  float64
 3   pickup_datetime    200000 non-null  datetime64[ns, UTC]
 4   pickup_longitude   200000 non-null  float64
 5   pickup_latitude    200000 non-null  float64
 6   dropoff_longitude  199999 non-null  float64
 7   dropoff_latitude   199999 non-null  float64
 8   passenger_count    200000 non-null  int64
dtypes: datetime64[ns, UTC](1), float64(5), int64(2), object(1)
memory usage: 13.7+ MB

In [11]: #Statistics of data

df.describe()

Out[11]:
         Unnamed: 0    fare_amount  pickup_longitude  pickup_latitude  dropoff_longitude  dropoff_lati
count  2.000000e+05  200000.000000     200000.000000    200000.000000      199999.000000     199999.00
mean   2.771250e+07      11.359955        -72.527638        39.935885         -72.525292         39.92
std    1.601382e+07       9.901776         11.437787         7.720539          13.117408          6.79
min    1.000000e+00     -52.000000      -1340.648410       -74.015515       -3356.666300       -881.98
25%    1.382535e+07       6.000000        -73.992065        40.734796         -73.991407         40.73
50%    2.774550e+07       8.500000        -73.981823        40.752592         -73.980093         40.75
75%    4.155530e+07      12.500000        -73.967154        40.767158         -73.963658         40.76
max    5.542357e+07     499.000000         57.418457      1644.421482        1153.572603        872.69
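The summary statistics include values that cannot be real rides: a minimum fare of -52 and coordinates such as -1340.648410 and 1644.421482, which lie outside the valid longitude (-180 to 180) and latitude (-90 to 90) ranges. The notebook does not filter these rows. The sketch below shows one possible sanity filter on a small made-up sample; the frame sample and the chosen rules are assumptions, not the author's method.

```python
import pandas as pd

# Made-up sample: one plausible ride, one negative fare,
# one out-of-range longitude and one out-of-range latitude
sample = pd.DataFrame({
    "fare_amount": [7.5, -52.0, 12.9, 5.3],
    "pickup_longitude": [-73.999817, -73.99, -1340.648410, -73.976124],
    "pickup_latitude": [40.738354, 40.72, 40.74, 1644.421482],
})

# Keep rows with a positive fare and coordinates inside the valid ranges
valid = (
    (sample["fare_amount"] > 0)
    & sample["pickup_longitude"].between(-180, 180)
    & sample["pickup_latitude"].between(-90, 90)
)
cleaned = sample[valid]
print(cleaned)
```

The same mask could be extended to the drop-off columns if this kind of check were applied to the full data.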


In [7]: #Number of missing values

df.isnull().sum()

Out[7]:
Unnamed: 0           0
key                  0
fare_amount          0
pickup_datetime      0
pickup_longitude     0
pickup_latitude      0
dropoff_longitude    1
dropoff_latitude     1
passenger_count      0
dtype: int64

In [22]: #Correlation between the numeric columns
#(numeric_only=True is required on pandas 2.0+, where non-numeric columns such as key raise an error)
df.corr(numeric_only=True)

Out[22]:
                   Unnamed: 0  fare_amount  pickup_longitude  pickup_latitude  dropoff_longitude  drop
Unnamed: 0           1.000000    -0.000223         -0.000266         0.000061          -0.000310
fare_amount         -0.000223     1.000000          0.004654        -0.003154           0.003021
pickup_longitude    -0.000266     0.004654          1.000000        -0.806902           0.830658
pickup_latitude      0.000061    -0.003154         -0.806902         1.000000          -0.770049
dropoff_longitude   -0.000310     0.003021          0.830658        -0.770049           1.000000
dropoff_latitude     0.000938    -0.004621         -0.844705         0.691893          -0.912750
passenger_count      0.002311     0.010705         -0.000644        -0.001441           0.000105
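Correlation is only computed over numeric columns, so string columns such as key have to be left out. A version-independent way to do that, sketched here on a made-up frame, is to select the numeric columns first. The values below are invented for illustration.

```python
import pandas as pd

# Made-up frame with one string column and two numeric columns
sample = pd.DataFrame({
    "key": ["a", "b", "c", "d"],
    "fare_amount": [5.0, 10.0, 15.0, 20.0],
    "passenger_count": [1, 2, 3, 4],
})

# Keep only numeric columns, then compute pairwise Pearson correlation
numeric = sample.select_dtypes(include="number")
corr = numeric.corr()
print(corr)
```

Because fare_amount rises in lockstep with passenger_count in this toy data, their correlation is 1; in the real data the table above shows it is close to zero.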

In [8]: #Drop the rows with missing values

df.dropna(inplace=True)
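A small sketch of the effect of dropna, on a made-up frame with one incomplete row (mirroring the single row with missing drop-off coordinates reported earlier). The names sample, rows_before and rows_after are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up frame: the second row has no drop-off coordinates
sample = pd.DataFrame({
    "dropoff_longitude": [-73.99, np.nan, -73.96],
    "dropoff_latitude": [40.72, np.nan, 40.77],
    "passenger_count": [1, 1, 3],
})

rows_before = len(sample)
# Drop every row that contains at least one missing value
sample = sample.dropna()
rows_after = len(sample)
print(rows_before, rows_after)
```

Comparing the row count before and after is a quick way to confirm how many rows were discarded.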

In [27]: plt.boxplot(df['fare_amount'])

Out[27]: [Figure: box plot of fare_amount]


In [10]: #Remove outliers: keep fares strictly between the 1st and 99th percentiles

q_low = df["fare_amount"].quantile(0.01)
q_hi = df["fare_amount"].quantile(0.99)

df = df[(df["fare_amount"] < q_hi) & (df["fare_amount"] > q_low)]
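To make the percentile filter concrete, here is a sketch on a made-up series of 100 fares: the values 1 to 98 plus two extremes taken from the summary statistics (-52 and 499). The series fares and the name kept are illustrative.

```python
import pandas as pd

# 100 made-up fares: two extreme values around an ordinary range of 1..98
fares = pd.Series(
    [-52.0] + [float(v) for v in range(1, 99)] + [499.0],
    name="fare_amount",
)

# 1st and 99th percentiles (pandas interpolates linearly by default)
q_low = fares.quantile(0.01)
q_hi = fares.quantile(0.99)

# Strict inequalities: values equal to a bound are dropped as well
kept = fares[(fares < q_hi) & (fares > q_low)]
print(q_low, q_hi, len(kept))
```

Only the two extreme values fall outside the bounds here, so 98 of the 100 fares remain.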

In [11]: #Check the missing values now

df.isnull().sum()

Out[11]:
Unnamed: 0           0
key                  0
fare_amount          0
pickup_datetime      0
pickup_longitude     0
pickup_latitude      0
dropoff_longitude    0
dropoff_latitude     0
passenger_count      0
dtype: int64

In [12]: #Time to apply learning models

from sklearn.model_selection import train_test_split

In [13]: #Take x as predictor variable

x = df.drop("fare_amount", axis = 1)
#And y as target variable
y = df['fare_amount']

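In [14]: #Necessary to apply model: convert pickup_datetime to a number (nanoseconds since the epoch)

x['pickup_datetime'] = pd.to_numeric(pd.to_datetime(x['pickup_datetime']))
#Note: this mask keeps only the columns whose names start with "Unnamed"
#(here just "Unnamed: 0"), so the models below are fit on that column alone
x = x.loc[:, x.columns.str.contains('^Unnamed')]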
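The column filter selects the columns that match '^Unnamed' rather than removing them. If the intent was to discard the leftover index column and train on the remaining features, which is an assumption and not something the notebook states, the mask would need to be inverted and the string column key dropped too. A sketch on a made-up two-row frame (x_demo and features are hypothetical names):

```python
import pandas as pd

# Made-up frame with the same kinds of columns as x
x_demo = pd.DataFrame({
    "Unnamed: 0": [24238194, 27835199],
    "key": ["2015-05-07 19:52:06.0000003", "2009-07-17 20:04:56.0000002"],
    "pickup_longitude": [-73.999817, -73.994355],
    "passenger_count": [1, 1],
})

# Boolean mask: True for columns whose name starts with "Unnamed"
is_index_col = x_demo.columns.str.contains("^Unnamed")

# The mask as used in the notebook keeps only the index-like column
kept_only_unnamed = x_demo.loc[:, is_index_col]

# Inverting the mask with ~ drops it instead; key is a string identifier
# that a regressor cannot consume, so it is dropped as well
features = x_demo.loc[:, ~is_index_col].drop(columns=["key"])
print(list(kept_only_unnamed.columns), list(features.columns))
```

Training on features built this way would produce different error values from the ones reported in this notebook.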

In [15]: x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_stat

In [16]: from sklearn.linear_model import LinearRegression

In [17]: lrmodel = LinearRegression()

lrmodel.fit(x_train, y_train)


Out[17]: LinearRegression()

In [18]: #Prediction
predict = lrmodel.predict(x_test)

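In [19]: #Check error: root mean squared error (RMSE) on the test set

from sklearn.metrics import mean_squared_error
#mean_squared_error expects (y_true, y_pred)
lrmodelrmse = np.sqrt(mean_squared_error(y_test, predict))
print("RMSE for the linear regression model is", lrmodelrmse)

RMSE for the linear regression model is 7.083585521002763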
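For reference, RMSE is the square root of the mean of the squared differences between actual and predicted values. The sketch below computes it directly with NumPy on four made-up fares; the helper rmse and the arrays are illustrative and not part of the notebook.

```python
import numpy as np

# Made-up actual and predicted fares
y_true = np.array([7.5, 7.7, 12.9, 5.3])
y_pred = np.array([8.5, 6.7, 10.9, 5.3])

def rmse(actual, predicted):
    """Root mean squared error between two equal-length arrays."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Errors are -1, 1, 2, 0, so the mean squared error is 1.5
print(rmse(y_true, y_pred))
```

Because the differences are squared, swapping the two arguments gives the same value. RMSE is expressed in the same unit as the target, here the fare amount.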

In [20]: #Let's Apply Random Forest Regressor

from sklearn.ensemble import RandomForestRegressor
rfrmodel = RandomForestRegressor(n_estimators = 100, random_state = 101)

In [21]: #Fit the Forest

rfrmodel.fit(x_train, y_train)
rfrmodel_pred = rfrmodel.predict(x_test)

In [23]: #Errors for the forest

rfrmodel_rmse = np.sqrt(mean_squared_error(y_test, rfrmodel_pred))
print("RMSE value for Random Forest is:",rfrmodel_rmse)

RMSE value for Random Forest is: 8.565996490346976

