Project Template Notebook Ipynb 1

Uploaded by

KABILAN S


project-template-notebook-ipynb-1

March 16, 2024

1 Project Statistical Methods for Decision Making: FoodHub Data Analysis

Marks: 60 points

1.0.1 Context
The number of restaurants in New York is increasing day by day. Lots of students and busy
professionals rely on those restaurants due to their hectic lifestyles. Online food delivery service is
a great option for them. It provides them with good food from their favorite restaurants. A food
aggregator company FoodHub offers access to multiple restaurants through a single smartphone
app.
The app allows the restaurants to receive a direct online order from a customer. The app assigns a
delivery person from the company to pick up the order after it is confirmed by the restaurant. The
delivery person then uses the map to reach the restaurant and waits for the food package. Once
the food package is handed over to the delivery person, he/she confirms the pick-up in the app and
travels to the customer’s location to deliver the food. The delivery person confirms the drop-off in
the app after delivering the food package to the customer. The customer can rate the order in the
app. The food aggregator earns money by collecting a fixed margin of the delivery order from the
restaurants.

1.0.2 Objective
The food aggregator company has stored the data of the different orders made by the registered
customers in their online portal. They want to analyze the data to get a fair idea about the demand
for different restaurants, which will help them enhance their customer experience. Suppose you
are a Data Scientist at FoodHub and the Data Science team has shared some of the key questions
that need to be answered. Perform the data analysis to find answers to these questions that will
help the company improve its business.

1.0.3 Data Description

The data contains information about individual food orders. The detailed data dictionary is given
below.

1.0.4 Data Dictionary

• order_id: Unique ID of the order
• customer_id: ID of the customer who ordered the food

• restaurant_name: Name of the restaurant
• cuisine_type: Cuisine ordered by the customer
• cost_of_the_order: Cost of the order
• day_of_the_week: Indicates whether the order is placed on a weekday or weekend (The
weekday is from Monday to Friday and the weekend is Saturday and Sunday)
• rating: Rating given by the customer out of 5
• food_preparation_time: Time (in minutes) taken by the restaurant to prepare the food.
This is calculated by taking the difference between the timestamps of the restaurant’s order
confirmation and the delivery person’s pick-up confirmation.
• delivery_time: Time (in minutes) taken by the delivery person to deliver the food package.
This is calculated by taking the difference between the timestamps of the delivery person’s
pick-up confirmation and drop-off information
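As a rough illustration of how the two duration columns are derived, the sketch below subtracts hypothetical timestamps for a single order. The timestamp values, variable names, and the helper `minutes_between` are assumptions made for illustration; the dataset itself only stores the resulting minutes.

```python
from datetime import datetime

# Hypothetical timestamps for one order (not part of the dataset)
order_confirmed = datetime(2024, 3, 16, 12, 0)   # restaurant confirms the order
picked_up = datetime(2024, 3, 16, 12, 25)        # delivery person confirms pick-up
dropped_off = datetime(2024, 3, 16, 12, 45)      # delivery person confirms drop-off


def minutes_between(start, end):
    """Return the elapsed time between two timestamps in whole minutes."""
    return int((end - start).total_seconds() // 60)


# Preparation time: order confirmation to pick-up confirmation
food_preparation_time = minutes_between(order_confirmed, picked_up)

# Delivery time: pick-up confirmation to drop-off
delivery_time = minutes_between(picked_up, dropped_off)

print(food_preparation_time, delivery_time)
```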

1.0.5 Please read the instructions carefully before starting the project.
This is a commented Jupyter IPython Notebook file in which all the instructions and tasks to be
performed are mentioned. Read along carefully to complete the project.
• Blanks ‘_______’ are provided in the notebook that need to be filled with appropriate code to
get the correct result. Please replace each blank with the right code snippet. With every
‘_______’ blank, there is a comment that briefly describes what needs to be filled in.
• Identify the task to be performed correctly, and only then proceed to write the required code.
• Fill in the code wherever asked by commented lines like “# write your code here” or
“# complete the code”. Running incomplete code may throw an error.
• Please run the code cells sequentially from the beginning to avoid unnecessary errors.
• You can add the results/observations derived from the analysis here and use them to create
your final presentation.

1.0.6 Let us start by importing the required libraries

[41]: # Import libraries for data manipulation

import numpy as np
import pandas as pd

# Import libraries for data visualization

import matplotlib.pyplot as plt
import seaborn as sns

1.0.7 Understanding the structure of the data

[ ]: # uncomment and run the following lines for Google Colab

# from google.colab import drive
# drive.mount('/content/drive')

[42]: # Read the data

df = pd.read_csv('/content/foodhub_order.csv') ## Fill the blank to read the data

# Returns the first 5 rows

df.head()

[42]:    order_id  customer_id            restaurant_name  cuisine_type  \
      0   1477147       337525                    Hangawi        Korean
      1   1477685       358141  Blue Ribbon Sushi Izakaya      Japanese
      2   1477070        66393                Cafe Habana       Mexican
      3   1477334       106968  Blue Ribbon Fried Chicken      American
      4   1478249        76942           Dirty Bird to Go      American

         cost_of_the_order  day_of_the_week     rating  food_preparation_time  \
      0              30.75          Weekend  Not given                     25
      1              12.08          Weekend  Not given                     25
      2              12.23          Weekday          5                     23
      3              29.20          Weekend          3                     25
      4              11.59          Weekday          4                     25

         delivery_time
      0             20
      1             23
      2             28
      3             15
      4             24

1.0.8 Question 1: How many rows and columns are present in the data? [0.5 mark]

[3]: # Check the shape of the dataset

df.shape ## Fill in the blank

[3]: (1898, 9)

1.0.9 Question 2: What are the datatypes of the different columns in the dataset?
[0.5 mark]

[4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 order_id 1898 non-null int64
1 customer_id 1898 non-null int64
2 restaurant_name 1898 non-null object
3 cuisine_type 1898 non-null object
4 cost_of_the_order 1898 non-null float64
5 day_of_the_week 1898 non-null object
6 rating 1898 non-null object
7 food_preparation_time 1898 non-null int64
8 delivery_time 1898 non-null int64

dtypes: float64(1), int64(4), object(4)
memory usage: 133.6+ KB

1.0.10 Question 3: Are there any missing values in the data? If yes, treat them using
an appropriate method. [1 Mark]

[6]: # Checking for missing values in the data

df.isnull().sum() # Write the appropriate function to print the sum of null values for each column

[6]: order_id 0
customer_id 0
restaurant_name 0
cuisine_type 0
cost_of_the_order 0
day_of_the_week 0
rating 0
food_preparation_time 0
delivery_time 0
dtype: int64

1.0.11 Question 4: Check the statistical summary of the data. What is the minimum,
average, and maximum time it takes for food to be prepared once an order is
placed? [2 marks]

[7]: # Get the summary statistics of the numerical data

df.describe() ## Write the appropriate function to print the statistical summary of the data (Hint - you have seen this in the case studies before)

[7]:            order_id    customer_id  cost_of_the_order  food_preparation_time  \
     count  1.898000e+03    1898.000000        1898.000000            1898.000000
     mean   1.477496e+06  171168.478398          16.498851              27.371970
     std    5.480497e+02  113698.139743           7.483812               4.632481
     min    1.476547e+06    1311.000000           4.470000              20.000000
     25%    1.477021e+06   77787.750000          12.080000              23.000000
     50%    1.477496e+06  128600.000000          14.140000              27.000000
     75%    1.477970e+06  270525.000000          22.297500              31.000000
     max    1.478444e+06  405334.000000          35.410000              35.000000

            delivery_time
     count    1898.000000
     mean       24.161749
     std         4.972637
     min        15.000000
     25%        20.000000
     50%        25.000000
     75%        28.000000
     max        33.000000
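The question asks specifically about food preparation time, and those three figures can be pulled out directly rather than read off the full summary table. A minimal sketch, assuming a small made-up DataFrame (`sample`) in place of `foodhub_order.csv`, so the numbers it prints are not the project's results:

```python
import pandas as pd

# Made-up sample standing in for the real dataset
sample = pd.DataFrame({"food_preparation_time": [20, 25, 27, 35]})

# Minimum, mean, and maximum of a single numeric column
prep_stats = sample["food_preparation_time"].agg(["min", "mean", "max"])
print(prep_stats)
```

In the summary output for the full dataset, the corresponding values are 20 minutes (minimum), about 27.37 minutes (mean), and 35 minutes (maximum).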

1.0.12 Question 5: How many orders are not rated? [1 mark]

[8]: df['rating'].value_counts() ## Complete the code

[8]: Not given 736

5 588
4 386
3 188
Name: rating, dtype: int64

1.0.13 Exploratory Data Analysis (EDA)

1.0.14 Univariate Analysis
1.0.15 Question 6: Explore all the variables and provide observations on their
distributions. (Generally, histograms, boxplots, countplots, etc. are used for
univariate exploration.) [8 marks]
Order ID
[9]: # check unique order ID
df['order_id'].nunique()

[9]: 1898

Customer ID
[10]: # check unique customer ID
df['customer_id'].nunique() ## Complete the code to find out number of unique Customer ID

[10]: 1200

Restaurant name
[11]: # check unique Restaurant Name
df['restaurant_name'].nunique() ## Complete the code to find out number of unique Restaurant Name

[11]: 178

Cuisine type
[12]: # Check unique cuisine type
df['cuisine_type'].nunique() ## Complete the code to find out number of unique cuisine type

[12]: 14

[13]: plt.figure(figsize = (15,5))

sns.countplot(data = df, x = 'cuisine_type') ## Create a countplot for cuisine type.

[13]: <Axes: xlabel='cuisine_type', ylabel='count'>

Cost of the order

[14]: sns.histplot(data=df,x='cost_of_the_order') ## Histogram for the cost of order
plt.show()
sns.boxplot(data=df,x='cost_of_the_order') ## Boxplot for the cost of order
plt.show()

Day of the week
[15]: # Check the unique values
df['day_of_the_week'].nunique() ## Complete the code to check unique values for the 'day_of_the_week' column

[15]: 2

[16]: sns.countplot(data = df, x = 'day_of_the_week') ## Complete the code to plot a bar graph for 'day_of_the_week' column

[16]: <Axes: xlabel='day_of_the_week', ylabel='count'>

Rating
[17]: # Check the unique values
df['rating'].nunique() ## Complete the code to check unique values for the 'rating' column

[17]: 4

[18]: sns.countplot(data = df, x = 'rating') ## Complete the code to plot bar graph for 'rating' column

[18]: <Axes: xlabel='rating', ylabel='count'>

Food Preparation time
[19]: sns.histplot(data=df,x='food_preparation_time') ## Complete the code to plot the histogram for the food preparation time
plt.show()
sns.boxplot(data=df,x='food_preparation_time') ## Complete the code to plot the boxplot for the food preparation time
plt.show()

Delivery time
[20]: sns.histplot(data=df,x='delivery_time') ## Complete the code to plot the histogram for the delivery time
plt.show()
sns.boxplot(data=df,x='delivery_time') ## Complete the code to plot the boxplot for the delivery time
plt.show()

1.0.16 Question 7: Which are the top 5 restaurants in terms of the number of orders
received? [1 mark]

[21]: # Get top 5 restaurants with highest number of orders

df['restaurant_name'].value_counts().head(5) ## Complete the code

[21]: Shake Shack 219

The Meatball Shop 132
Blue Ribbon Sushi 119
Blue Ribbon Fried Chicken 96
Parm 68
Name: restaurant_name, dtype: int64

1.0.17 Question 8: Which is the most popular cuisine on weekends? [1 mark]

[23]: # Get most popular cuisine on weekends

df_weekend = df[df['day_of_the_week'] == 'Weekend']
df_weekend['cuisine_type'].value_counts().head(3) ## Complete the code to check unique values for the cuisine type on weekend

[23]: American 415
Japanese 335
Italian 207
Name: cuisine_type, dtype: int64

1.0.18 Question 9: What percentage of the orders cost more than 20 dollars? [2
marks]

[24]: # Get orders that cost above 20 dollars

df_greater_than_20 = df[df['cost_of_the_order']>20] ## Write the appropriate column name to get the orders having cost above $20

# Calculate the number of total orders where the cost is above 20 dollars
print('The number of total orders that cost above 20 dollars is:', df_greater_than_20.shape[0])

# Calculate percentage of such orders in the dataset

percentage = (df_greater_than_20.shape[0] / df.shape[0]) * 100

print("Percentage of orders above 20 dollars:", round(percentage, 2), '%')

The number of total orders that cost above 20 dollars is: 555
Percentage of orders above 20 dollars: 29.24 %

1.0.19 Question 10: What is the mean order delivery time? [1 mark]

[25]: # Get the mean delivery time

mean_del_time = df['delivery_time'].mean() ## Write the appropriate function to obtain the mean delivery time

print('The mean delivery time for this dataset is', round(mean_del_time, 2), 'minutes')

The mean delivery time for this dataset is 24.16 minutes

1.0.20 Question 11: The company has decided to give 20% discount vouchers to the
top 3 most frequent customers. Find the IDs of these customers and the
number of orders they placed. [1 mark]

[26]: # Get the counts of each customer_id

df['customer_id'].value_counts().head(3) ## Write the appropriate column name to get the top 3 most frequent customers

[26]: 52832 13
47440 10
83287 9
Name: customer_id, dtype: int64

1.0.21 Multivariate Analysis
1.0.22 Question 12: Perform a multivariate analysis to explore relationships between
the important variables in the dataset. (It is a good idea to explore relations
between numerical variables as well as relations between numerical and
categorical variables) [9 marks]
Cuisine vs Cost of the order
[27]: # Relationship between cost of the order and cuisine type
plt.figure(figsize=(15,7))
# Assign the x variable to hue as well so that palette is applied without the seaborn deprecation warning
sns.boxplot(x = "cuisine_type", y = "cost_of_the_order", hue = "cuisine_type", data = df, palette = 'PuBu', legend = False)
plt.xticks(rotation = 60)
plt.show()

Cuisine vs Food Preparation time

[29]: # Relationship between food preparation time and cuisine type
plt.figure(figsize=(15,7))
sns.boxplot(data=df,x='food_preparation_time',y='cuisine_type') ## Complete the code to visualize the relationship between food preparation time and cuisine type using boxplot
plt.xticks(rotation = 60)
plt.show()

Day of the Week vs Delivery time

[30]: # Relationship between day of the week and delivery time
plt.figure(figsize=(15,7))
sns.boxplot(data=df,x='day_of_the_week',y='delivery_time') ## Complete the code to visualize the relationship between day of the week and delivery time using boxplot
plt.show()

Run the below code and write your observations on the revenue generated by the
restaurants.
[31]: df.groupby(['restaurant_name'])['cost_of_the_order'].sum().sort_values(ascending = False).head(14)

[31]: restaurant_name
Shake Shack 3579.53
The Meatball Shop 2145.21
Blue Ribbon Sushi 1903.95
Blue Ribbon Fried Chicken 1662.29
Parm 1112.76
RedFarm Broadway 965.13
RedFarm Hudson 921.21
TAO 834.50
Han Dynasty 755.29
Blue Ribbon Sushi Bar & Grill 666.62
Rubirosa 660.45
Sushi of Gari 46 640.87
Nobu Next Door 623.67
Five Guys Burgers and Fries 506.47
Name: cost_of_the_order, dtype: float64

Rating vs Delivery time

[ ]: # Relationship between rating and delivery time
plt.figure(figsize=(15, 7))
sns.pointplot(x = 'rating', y = 'delivery_time', data = df)
plt.show()

Rating vs Food preparation time

[32]: # Relationship between rating and food preparation time
plt.figure(figsize=(15, 7))
sns.pointplot(data=df,x='rating',y='food_preparation_time') ## Complete the code to visualize the relationship between rating and food preparation time using pointplot
plt.show()

Rating vs Cost of the order
[33]: # Relationship between rating and cost of the order
plt.figure(figsize=(15, 7))
sns.pointplot(data=df,x='rating',y='cost_of_the_order') ## Complete the code to visualize the relationship between rating and cost of the order using pointplot
plt.show()

Correlation among variables

[34]: # Plot the heatmap
col_list = ['cost_of_the_order', 'food_preparation_time', 'delivery_time']
plt.figure(figsize=(15, 7))
sns.heatmap(df[col_list].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")
plt.show()

1.0.23 Question 13: The company wants to provide a promotional offer in the
advertisement of the restaurants. The condition to get the offer is that the
restaurants must have a rating count of more than 50 and the average rating
should be greater than 4. Find the restaurants fulfilling the criteria to get the
promotional offer. [3 marks]

[35]: # Filter the rated restaurants

df_rated = df[df['rating'] != 'Not given'].copy()

# Convert rating column from object to integer

df_rated['rating'] = df_rated['rating'].astype('int')

# Create a dataframe that contains the restaurant names with their rating counts
df_rating_count = df_rated.groupby(['restaurant_name'])['rating'].count().sort_values(ascending = False).reset_index()

df_rating_count.head()

[35]:              restaurant_name  rating
      0                Shake Shack     133
      1          The Meatball Shop      84
      2          Blue Ribbon Sushi      73
      3  Blue Ribbon Fried Chicken      64
      4           RedFarm Broadway      41

[ ]: # Get the restaurant names that have rating count more than 50
rest_names = df_rating_count[df_rating_count['rating'] > 50]['restaurant_name'] ## Complete the code to get the restaurant names having rating count more than 50

# Filter to get the data of restaurants that have rating count more than 50
df_mean_4 = df_rated[df_rated['restaurant_name'].isin(rest_names)].copy()

# Group the restaurant names with their ratings and find the mean rating of each restaurant
df_mean_4.groupby(['restaurant_name'])['rating'].mean().sort_values(ascending = False).reset_index().dropna() ## Complete the code to find the mean rating
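The question has two conditions: a rating count above 50 and a mean rating above 4. One possible way to apply both at once is sketched below on made-up data. The restaurant names `A`, `B`, `C`, the variable names, and the lowered count threshold of 2 are assumptions chosen so the example stays small; the real analysis would start from `df_rated` and use 50.

```python
import pandas as pd

# Made-up rated orders; the real analysis would use df_rated
rated = pd.DataFrame({
    "restaurant_name": ["A", "A", "A", "B", "B", "B", "C"],
    "rating": [5, 4, 5, 3, 4, 4, 5],
})

min_count = 2   # the project uses 50; lowered here for the toy data
min_mean = 4    # average rating must be strictly greater than this

# Rating count and mean rating per restaurant in one pass
summary = rated.groupby("restaurant_name")["rating"].agg(["count", "mean"])

# Keep only restaurants that satisfy both conditions
eligible = summary[(summary["count"] > min_count) & (summary["mean"] > min_mean)]
print(eligible)
```

Computing both aggregates in one `groupby` keeps the count and the mean side by side, which makes it easy to see why a restaurant passes or fails each condition.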

1.0.24 Question 14: The company charges the restaurant 25% on the orders having
cost greater than 20 dollars and 15% on the orders having cost greater than 5
dollars. Find the net revenue generated by the company across all orders. [3
marks]

[ ]: #function to determine the revenue

def compute_rev(x):
    if x > 20:
        return x*0.25
    elif x > 5:
        return x*0.15
    else:
        return x*0

df['Revenue'] = df['cost_of_the_order'].apply(compute_rev) ## Write the appropriate column name to compute the revenue

df.head()

[ ]: # get the total revenue and print it

total_rev = df['Revenue'].sum() ## Write the appropriate function to get the total revenue

print('The net revenue is around', round(total_rev, 2), 'dollars')
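To make the commission tiers concrete, here is a small check of the tier logic at and around the boundaries. The order costs are made-up values, not taken from the dataset. Note that with strict `>` comparisons, an order of exactly 20 dollars falls in the 15% tier and an order of exactly 5 dollars earns nothing.

```python
def compute_rev(x):
    """Commission earned on an order costing x dollars."""
    if x > 20:
        return x * 0.25   # 25% on orders above 20 dollars
    elif x > 5:
        return x * 0.15   # 15% on orders above 5 dollars
    else:
        return x * 0      # no commission on orders of 5 dollars or less


# Made-up order costs around the tier boundaries
costs = [4.0, 5.0, 10.0, 20.0, 30.0]
revenues = [compute_rev(c) for c in costs]
total = sum(revenues)

print(revenues, round(total, 2))
```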

1.0.25 Question 15: The company wants to analyze the total time required to deliver
the food. What percentage of orders take more than 60 minutes to get
delivered from the time the order is placed? (The food has to be prepared and
then delivered.) [2 marks]

[49]: # Calculate total delivery time and add a new column to the dataframe df to store the total delivery time

df['total_time'] = df['food_preparation_time'] + df['delivery_time']

df['total_time']

## Write the code below to find the percentage of orders that have more than 60 minutes of total delivery time (see Question 9 for reference)

delivery_greater_than_60 = df.loc[(df['total_time'] > 60)]['total_time']

delivery_longer_than_60 = delivery_greater_than_60.count()
print("Total no of orders have more than 60 mins of total delivery time is",delivery_longer_than_60)

total_order = df['total_time'].count()
print("Total no of orders is",total_order)

Total no of orders have more than 60 mins of total delivery time is 200
Total no of orders is 1898
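The cell prints the two counts but not the percentage the question asks for. Using the counts shown in the output above (200 of 1898 orders), the remaining step is a single division. The variable names here are illustrative.

```python
# Counts taken from the output above
orders_over_60 = 200
total_orders = 1898

# Share of orders whose preparation plus delivery time exceeds 60 minutes
percentage_over_60 = orders_over_60 / total_orders * 100

print("Percentage of orders taking more than 60 minutes:", round(percentage_over_60, 2), "%")
```

This works out to roughly 10.54% of orders. On the DataFrame itself, `(df['total_time'] > 60).mean() * 100` would give the same figure, since the mean of a boolean Series is the fraction of `True` values.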

1.0.26 Question 16: The company wants to analyze the delivery time of the orders
on weekdays and weekends. How does the mean delivery time vary during
weekdays and weekends? [2 marks]

[36]: # Get the mean delivery time on weekdays and print it

print('The mean delivery time on weekdays is around',
round(df[df['day_of_the_week'] == 'Weekday']['delivery_time'].mean()),
'minutes')

## Write the code below to get the mean delivery time on weekends and print it

The mean delivery time on weekdays is around 28 minutes
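The weekend counterpart is left blank in the cell above. One way to get both means in a single step is a `groupby`, sketched here on made-up rows rather than the real dataset, so the numbers it prints are not the project's results.

```python
import pandas as pd

# Made-up sample with the same two columns as the real data
sample = pd.DataFrame({
    "day_of_the_week": ["Weekday", "Weekday", "Weekend", "Weekend"],
    "delivery_time": [28, 30, 20, 24],
})

# Mean delivery time for each category of day_of_the_week
mean_by_day = sample.groupby("day_of_the_week")["delivery_time"].mean()
print(mean_by_day)
```

Applied to `df`, the same call would return the weekday and weekend means together, which makes the comparison the question asks for easier to read than two separate filters.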

1.0.27 Conclusion and Recommendations

1.0.28 Question 17: What are your conclusions from the analysis? What recommen-
dations would you like to share to help improve the business? (You can use
cuisine type and feedback ratings to drive your business recommendations.)
[6 marks]
1.0.29 Conclusions:
Data information:
1.) The data contains 1898 rows and 9 columns.
2.) Three data types are present in the dataset: float64 (1 column), int64 (4 columns), and object (4 columns).
3.) There are no missing values in the dataset.
4.) 736 orders were not rated.
5.) The most popular cuisine is American, followed by Japanese and Italian.
6.) The most popular restaurant is Shake Shack.
7.) The number of orders that cost above 20 dollars is 555.
8.) The percentage of orders above 20 dollars is 29.24%.
9.) The mean delivery time for this dataset is 24.16 minutes.
10.) Shake Shack generated the most revenue and is also the most-rated restaurant.
11.) 200 of the 1898 orders have a total delivery time of more than 60 minutes.
12.) The mean delivery time on weekdays is around 28 minutes.

1.0.30 Recommendations:
1.) 736 out of 1898 orders were not rated; hence, the restaurants should run promotional offers to encourage customers to rate their orders.
2.) Discounts should be given to customers for weekday orders to increase revenue.
3.) The analysis suggests that ratings drop to 3 for longer delivery times, so delivery time needs to improve; however, weekday traffic delays deliveries and is not controllable.
4.) Most of the ratings were received on weekends.
5.) Promotional offers should be launched to attract weekday customers.
