Uber - Rides - Analysis - Jupyter Notebook

This document analyzes Uber ride data from a dataset containing over 1,000 rides. The analysis includes data cleaning, exploring categorical variables through count plots, and visualizing correlations between features. Key insights are that most rides are for business purposes, that meetings and meals are common purposes, and that rides are most frequent in the afternoon. The data is preprocessed and encoded before the relationships between variables are analyzed further.

Uploaded by

bhavesh.sutrakar.02

9/12/23, 12:13 AM uber_rides_analysis - Jupyter Notebook

In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:

dataset = pd.read_csv("UberDataset.csv")
dataset.head()

Out[2]:

   START_DATE        END_DATE          CATEGORY  START        STOP             MILES  PURPOSE
0  01-01-2016 21:11  01-01-2016 21:17  Business  Fort Pierce  Fort Pierce        5.1  Meal/Entertain
1  01-02-2016 01:25  01-02-2016 01:37  Business  Fort Pierce  Fort Pierce        5.0  NaN
2  01-02-2016 20:25  01-02-2016 20:38  Business  Fort Pierce  Fort Pierce        4.8  Errand/Supplies
3  01-05-2016 17:31  01-05-2016 17:45  Business  Fort Pierce  Fort Pierce        4.7  Meeting
4  01-06-2016 14:42  01-06-2016 15:49  Business  Fort Pierce  West Palm Beach   63.7  Customer Visit

In [4]:

dataset.shape

Out[4]:

(1156, 7)

In [5]:

dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1156 entries, 0 to 1155
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 START_DATE 1156 non-null object
1 END_DATE 1155 non-null object
2 CATEGORY 1155 non-null object
3 START 1155 non-null object
4 STOP 1155 non-null object
5 MILES 1156 non-null float64
6 PURPOSE 653 non-null object
dtypes: float64(1), object(6)
memory usage: 63.3+ KB
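
The `info()` output above shows that PURPOSE has only 653 non-null values out of 1156. As a minimal sketch, missing values can be counted per column as shown below. The `sample` frame is made up for illustration and is not taken from UberDataset.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; values are illustrative only.
sample = pd.DataFrame({
    "MILES": [5.1, 5.0, 4.8, 4.7],
    "PURPOSE": ["Meal/Entertain", np.nan, "Errand/Supplies", np.nan],
})

# isna() marks missing cells; summing the booleans counts them per column.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

On the real data, the same call on `dataset` would list the gaps that `info()` reports as non-null counts.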

localhost:8888/notebooks/uber_rides_analysis.ipynb 1/12

In [6]:

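# Assign the result back instead of calling fillna(inplace=True) on a column
# selection, which newer pandas versions warn about.
dataset['PURPOSE'] = dataset['PURPOSE'].fillna("NOT")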

In [7]:

dataset.head()

Out[7]:

   START_DATE        END_DATE          CATEGORY  START        STOP             MILES  PURPOSE
0  01-01-2016 21:11  01-01-2016 21:17  Business  Fort Pierce  Fort Pierce        5.1  Meal/Entertain
1  01-02-2016 01:25  01-02-2016 01:37  Business  Fort Pierce  Fort Pierce        5.0  NOT
2  01-02-2016 20:25  01-02-2016 20:38  Business  Fort Pierce  Fort Pierce        4.8  Errand/Supplies
3  01-05-2016 17:31  01-05-2016 17:45  Business  Fort Pierce  Fort Pierce        4.7  Meeting
4  01-06-2016 14:42  01-06-2016 15:49  Business  Fort Pierce  West Palm Beach   63.7  Customer Visit

In [8]:

dataset['START_DATE'] = pd.to_datetime(dataset['START_DATE'],
                                       errors='coerce')
dataset['END_DATE'] = pd.to_datetime(dataset['END_DATE'],
                                     errors='coerce')
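
With `errors='coerce'`, any value that cannot be parsed becomes NaT instead of raising an error, so it is worth checking how many rows were affected. The sketch below uses two made-up strings, and the explicit `format` string is an assumption about the column layout (month first), not something the notebook states.

```python
import pandas as pd

# Hypothetical inputs: one well-formed timestamp and one unparseable string.
raw = pd.Series(["01-01-2016 21:11", "not a date"])

# An explicit format avoids relying on format inference (assumed layout);
# errors="coerce" converts anything that does not match into NaT.
parsed = pd.to_datetime(raw, format="%m-%d-%Y %H:%M", errors="coerce")
n_failed = int(parsed.isna().sum())

print(parsed)
print("unparseable rows:", n_failed)
```

Rows that become NaT here are later removed by `dropna`, so a large `n_failed` would mean a large silent loss of data.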

In [9]:

# Extract the calendar date and the start hour (0-23)
dataset['date'] = pd.DatetimeIndex(dataset['START_DATE']).date
dataset['time'] = pd.DatetimeIndex(dataset['START_DATE']).hour

# Bin the start hour into parts of the day.
# With the default right=True the bins are (0, 10], (10, 15], (15, 19], (19, 24].
dataset['day-night'] = pd.cut(x=dataset['time'],
                              bins=[0, 10, 15, 19, 24],
                              labels=['Morning', 'Afternoon', 'Evening', 'Night'])
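
To see exactly which hours land in which label, the same bins can be applied to a handful of hand-picked hours. This is only a sketch on made-up input; the `hours` series is hypothetical.

```python
import pandas as pd

# Hypothetical start hours chosen to sit on and around the bin edges.
hours = pd.Series([0, 1, 10, 11, 15, 16, 19, 20, 23])

# Same bins and labels as the cell above. With the default right=True each
# bin is open on the left and closed on the right.
labels = ["Morning", "Afternoon", "Evening", "Night"]
binned = pd.cut(x=hours, bins=[0, 10, 15, 19, 24], labels=labels)

result = pd.DataFrame({"hour": hours, "day-night": binned})
print(result)
```

Hour 0 falls outside the first bin and comes back as NaN, so rides starting in the midnight hour are removed by a later `dropna`. Passing `include_lowest=True` to `pd.cut` is one way to keep them.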

In [10]:

dataset.dropna(inplace=True)

In [11]:

dataset.drop_duplicates(inplace=True)


In [13]:

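# Boolean mask of the columns stored as Python objects (strings, dates)
obj = (dataset.dtypes == 'object')

object_cols = list(obj[obj].index)

# Number of distinct values in each object column
unique_values = {}
for col in object_cols:
    unique_values[col] = dataset[col].unique().size
unique_values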

Out[13]:

{'CATEGORY': 2, 'START': 175, 'STOP': 186, 'PURPOSE': 11, 'date': 291}
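
The loop above can also be written with `DataFrame.nunique`. The sketch below is one possible equivalent on a made-up frame; the column list and values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; values are illustrative only.
sample = pd.DataFrame({
    "CATEGORY": ["Business", "Business", "Personal"],
    "PURPOSE": ["Meeting", np.nan, "Meeting"],
    "MILES": [5.1, 5.0, 4.8],
})

cols = ["CATEGORY", "PURPOSE"]

# nunique() skips NaN by default; dropna=False counts it as a value,
# which matches what unique().size reports.
unique_values = sample[cols].nunique(dropna=False).to_dict()
print(unique_values)
```

In the notebook itself the difference does not matter at this point, because `dropna` has already removed rows with missing values.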


Data Visualization

In [18]:

plt.figure(figsize=(10,5))

plt.subplot(1,2,1)
sns.countplot(data=dataset, x='CATEGORY')
plt.xticks(rotation=90)

plt.subplot(1,2,2)
sns.countplot(data=dataset, x='PURPOSE')
plt.xticks(rotation=90)

Out[18]:

(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]),
[Text(0, 0, 'Meal/Entertain'),
Text(1, 0, 'NOT'),
Text(2, 0, 'Errand/Supplies'),
Text(3, 0, 'Meeting'),
Text(4, 0, 'Customer Visit'),
Text(5, 0, 'Temporary Site'),
Text(6, 0, 'Between Offices'),
Text(7, 0, 'Charity ($)'),
Text(8, 0, 'Commute'),
Text(9, 0, 'Moving'),
Text(10, 0, 'Airport/Travel')])


In [20]:

sns.countplot(data=dataset,x='day-night')
plt.xticks(rotation=90)

Out[20]:

(array([0, 1, 2, 3]),
[Text(0, 0, 'Morning'),
Text(1, 0, 'Afternoon'),
Text(2, 0, 'Evening'),
Text(3, 0, 'Night')])


In [21]:

plt.figure(figsize=(15, 5))
sns.countplot(data=dataset, x='PURPOSE', hue='CATEGORY')
plt.xticks(rotation=90)
plt.show()

Insights from the above count plots:

Most rides are booked for business purposes.

Most people book cabs for Meeting and Meal/Entertain purposes.

Most cabs are booked in the Afternoon, which the bins defined earlier map to start hours 11 through 15 (roughly 11am to 4pm).
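
The count plots can be backed by numbers with `value_counts`. A minimal sketch on made-up rows follows; it assumes the "NOT" placeholder should be left out when ranking purposes, which the notebook does not state.

```python
import pandas as pd

# Hypothetical sample; values are illustrative only.
sample = pd.DataFrame({
    "CATEGORY": ["Business", "Business", "Business", "Personal"],
    "PURPOSE": ["Meeting", "Meal/Entertain", "Meeting", "NOT"],
})

# normalize=True returns shares instead of raw counts.
category_share = sample["CATEGORY"].value_counts(normalize=True)

# Exclude the "NOT" placeholder so it does not compete with real purposes.
known = sample.loc[sample["PURPOSE"] != "NOT", "PURPOSE"]
top_purpose = known.value_counts().idxmax()

print(category_share)
print("most common recorded purpose:", top_purpose)
```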

In [23]:

from sklearn.preprocessing import OneHotEncoder

object_cols = ['CATEGORY', 'PURPOSE']
# `sparse` was renamed to `sparse_output` in scikit-learn 1.2
# (the old name was slated for removal in 1.4).
OH_encoder = OneHotEncoder(sparse_output=False)
OH_cols = pd.DataFrame(OH_encoder.fit_transform(dataset[object_cols]))
OH_cols.index = dataset.index
OH_cols.columns = OH_encoder.get_feature_names_out()
# Replace the original categorical columns with their one-hot encodings
df_final = dataset.drop(object_cols, axis=1)
dataset = pd.concat([df_final, OH_cols], axis=1)


In [24]:

plt.figure(figsize=(12, 6))
# numeric_only=True restricts the correlation to numeric columns; pandas
# warns that relying on the default value is deprecated.
sns.heatmap(dataset.corr(numeric_only=True),
            cmap='BrBG',
            fmt='.2f',
            linewidths=2,
            annot=True)

Out[24]:

<Axes: >

Insights from the heatmap:

The Business and Personal category columns are strongly negatively correlated. This is expected: they are one-hot encodings of the same two-valued column, so every ride belongs to exactly one of them. The plot is consistent with the earlier conclusions.

There is little correlation among the other features.
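
The negative correlation between the two category columns follows directly from the encoding. The sketch below shows this on made-up labels, using `pd.get_dummies` as a lightweight stand-in for the `OneHotEncoder` step (a substitution made here for brevity, not the notebook's method).

```python
import pandas as pd

# Hypothetical sample; values are illustrative only.
category = pd.Series(["Business", "Personal", "Business", "Business", "Personal"])

# get_dummies is used here as a stand-in for OneHotEncoder.
encoded = pd.get_dummies(category, dtype=float)

# With exactly two categories, each column equals 1 minus the other,
# so their Pearson correlation is -1.
r = encoded["Business"].corr(encoded["Personal"])
print(round(r, 6))
```

Because one column fully determines the other, this cell of the heatmap carries no information beyond the CATEGORY counts themselves.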


In [25]:

dataset['MONTH'] = pd.DatetimeIndex(dataset['START_DATE']).month
month_label = {1.0: 'Jan', 2.0: 'Feb', 3.0: 'Mar', 4.0: 'April',
               5.0: 'May', 6.0: 'June', 7.0: 'July', 8.0: 'Aug',
               9.0: 'Sep', 10.0: 'Oct', 11.0: 'Nov', 12.0: 'Dec'}
dataset["MONTH"] = dataset.MONTH.map(month_label)

# Number of rides per month (unsorted)
mon = dataset.MONTH.value_counts(sort=False)

# Monthly ride count vs. longest ride per month.
# Note: the "MONTHS" column holds the ride counts, and "VALUE COUNT" holds
# the per-month maximum of MILES.
df = pd.DataFrame({"MONTHS": mon.values,
                   "VALUE COUNT": dataset.groupby('MONTH',
                                                  sort=False)['MILES'].max()})

p = sns.lineplot(data=df)
p.set(xlabel="MONTHS", ylabel="VALUE COUNT")

Out[25]:

[Text(0.5, 0, 'MONTHS'), Text(0, 0.5, 'VALUE COUNT')]

Insights from the above plot:

The counts are very irregular.

Still, it is clear that the counts are much lower during Nov, Dec, and Jan, which coincides with winter in Florida, US.
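
The monthly plot combines two different statistics: the number of rides and the longest ride in each month. As a sketch of one way to keep them clearly labeled, both can be computed in a single grouped aggregation. The rows and the column names `ride_count` and `max_miles` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample; values are illustrative only.
sample = pd.DataFrame({
    "START_DATE": pd.to_datetime(
        ["2016-01-01 21:11", "2016-01-02 01:25", "2016-02-05 17:31"]
    ),
    "MILES": [5.1, 5.0, 63.7],
})

# Named aggregation keeps each statistic under an explicit column name,
# and grouping once guarantees both columns share the same month index.
monthly = (
    sample.groupby(sample["START_DATE"].dt.month)["MILES"]
    .agg(ride_count="count", max_miles="max")
)
monthly.index.name = "month"
print(monthly)
```

Grouping once avoids relying on two separately computed results happening to list the months in the same order.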

In [26]:

# dt.weekday numbers the days from 0 (Monday) to 6 (Sunday)
dataset['DAY'] = dataset.START_DATE.dt.weekday
day_label = {
    0: 'Mon', 1: 'Tues', 2: 'Wed', 3: 'Thurs', 4: 'Fri', 5: 'Sat', 6: 'Sun'
}
dataset['DAY'] = dataset['DAY'].map(day_label)


In [27]:

day_label = dataset.DAY.value_counts()
sns.barplot(x=day_label.index, y=day_label);
plt.xlabel('DAY')
plt.ylabel('COUNT')

Out[27]:

Text(0, 0.5, 'COUNT')

In [28]:

sns.boxplot(dataset['MILES'])

Out[28]:

<Axes: >


In [29]:

sns.boxplot(dataset[dataset['MILES']<100]['MILES'])

Out[29]:

<Axes: >


In [30]:

# `distplot` is deprecated and will be removed in seaborn v0.14; `histplot`
# (axes-level) or `displot` (figure-level) are its replacements.
# Migration guide: https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
# stat='density' with kde=True approximates the old distplot output.
sns.histplot(dataset[dataset['MILES']<40]['MILES'], kde=True, stat='density')

Out[30]:

<Axes: xlabel='MILES', ylabel='Density'>

Insights from the above plots:

Most cabs are booked for distances of 4-5 miles.

Most people choose cabs for distances of 0-20 miles.

For distances of more than 20 miles, the ride count is nearly negligible.
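
These distance observations can also be checked numerically rather than read off the plots. A minimal sketch follows, using made-up distances and 5-mile bands chosen here for illustration.

```python
import pandas as pd

# Hypothetical distances in miles; values are illustrative only.
miles = pd.Series([4.7, 4.8, 5.0, 5.1, 8.2, 12.5, 18.0, 63.7])

# Share of rides at or below 20 miles.
share_short = (miles <= 20).mean()

# Count rides per 5-mile band up to 20 miles, with a final open-ended band.
bands = pd.cut(miles, bins=[0, 5, 10, 15, 20, float("inf")])
band_counts = bands.value_counts(sort=False)

print(f"share of rides up to 20 miles: {share_short:.3f}")
print(band_counts)
```

Run against `dataset['MILES']`, the same two statistics would quantify how dominant short rides are.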
