Pgpdsba Feb 24 Batch Mod2 Project

Uploaded by

R Sathish Kumar
Copyright
© All Rights Reserved

PGP PROGRAM IN DATA SCIENCE

AND
BUSINESS ANALYTICS (PGPDSBA)
FEBRUARY 2024 BATCH
SMDM GUIDED PROJECT

NAME: R. SATHISH KUMAR

Project Statistical Methods for Decision Making: Food Hub


Data Analysis

Problem

Context

The number of restaurants in New York is increasing day by day. Many
students and busy professionals rely on those restaurants because of their
hectic lifestyles, and online food delivery is a convenient option for them:
it brings good food from their favourite restaurants. Food Hub, a food
aggregator company, offers access to multiple restaurants through a single
smartphone app.

The app allows the restaurants to receive a direct online order from a
customer. The app assigns a delivery person from the company to pick up
the order after it is confirmed by the restaurant. The delivery person then
uses the map to reach the restaurant and waits for the food package.
Once the food package is handed over to the delivery person, he/she
confirms the pick-up in the app and travels to the customer's location to
deliver the food. The delivery person confirms the drop-off in the app after
delivering the food package to the customer. The customer can rate the
order in the app. The food aggregator earns money by collecting a fixed
margin of the delivery order from the restaurants.

Objective

The food aggregator company has stored the data of the different orders
made by the registered customers in their online portal. They want to
analyze the data to get a fair idea about the demand of different
restaurants which will help them in enhancing their customer experience.

Suppose you are a Data Scientist at Food Hub and the Data Science team
has shared some of the key questions that need to be answered. Perform
the data analysis to find answers to these questions that will help the
company to improve the business.

Data Description

The data contains the different data related to a food order. The detailed
data dictionary is given below.

Data Dictionary

● order_id: Unique ID of the order
● customer_id: ID of the customer who ordered the food
● restaurant_name: Name of the restaurant
● cuisine_type: Cuisine ordered by the customer
● cost_of_the_order: Cost of the order
● day_of_the_week: Indicates whether the order is placed on a weekday
or weekend (The weekday is from Monday to Friday and the weekend
is Saturday and Sunday)
● rating: Rating given by the customer out of 5
● food_preparation_time: Time (in minutes) taken by the restaurant to
prepare the food. This is calculated by taking the difference between
the timestamps of the restaurant's order confirmation and the
delivery person's pick-up confirmation.
● delivery_time: Time (in minutes) taken by the delivery person to
deliver the food package. This is calculated by taking the difference
between the timestamps of the delivery person's pick-up
confirmation and drop-off information

Answers: Analysis & Findings

Question 1: How many rows and columns are present in the data?

The dataset comprises 1898 rows and 9 columns. Each row corresponds to one
order, and the columns are those listed in the data dictionary. Knowing
this structure is the starting point for the analysis that follows.

Question 2: What are the datatypes of the different columns in the
dataset?

The data types of the columns have been identified: each column is stored
as an integer, a float, or an object (text). Knowing these types is
necessary for choosing a suitable analysis for each column. The same
summary also shows that no column has missing values and reports the
memory used by the dataset.
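
The structure checks in Questions 1 and 2 can be sketched with pandas as follows. This is a minimal illustration rather than the project's own notebook: the two rows are made up, and only the column names are taken from the data dictionary.

```python
import pandas as pd

# Made-up rows for illustration; only the column names come from the data dictionary.
df = pd.DataFrame({
    "order_id": [1, 2],
    "restaurant_name": ["Shake Shack", "The Meatball Shop"],
    "cost_of_the_order": [30.75, 12.08],
    "rating": ["Not given", "5"],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# One dtype per column: integer, float, or a text type (object in older pandas).
print(df.dtypes)

# info() prints dtypes, non-null counts and memory usage in a single report.
df.info()
```

On the real file, the same calls would be expected to report the 1898 rows and 9 columns quoted in Question 1.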

Question 3: Are there any missing values in the data? If yes, treat
them using an appropriate method.

The data was examined for missing values, and none were found in any
column, so no treatment is needed. This completeness makes the subsequent
analysis more reliable. Unrated orders carry the text label "Not given" in
the rating column (see Question 5), so they are not counted as missing
values.
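
A minimal sketch of the missing-value check, assuming pandas. The rows are invented; the point is that isnull().sum() counts truly missing cells per column, while a text label such as "Not given" is an ordinary value.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "cost_of_the_order": [30.75, 12.08, 12.23],
    "rating": ["Not given", "5", "4"],
})

# Number of missing (NaN/None) cells in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# "Not given" is an ordinary string, so it does not show up as missing.
total_missing = int(missing_per_column.sum())
print(total_missing)
```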

Question 4: Check the statistical summary of the data. What is the
minimum, average, and maximum time it takes for food to be
prepared once an order is placed?

From the statistical summary of the data, the minimum food preparation time
is 20 minutes, the average is approximately 27.37 minutes, and the maximum
is 35 minutes. The average gives a benchmark for the expected turnaround
time in the kitchen. These metrics are useful for optimizing operational
efficiency and meeting customer expectations in the food service industry.
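
The three figures can be read from a statistical summary along these lines. This is a sketch with five invented preparation times, so its output differs from the project's 20 / 27.37 / 35 minutes.

```python
import pandas as pd

# Made-up preparation times in minutes, for illustration only.
df = pd.DataFrame({"food_preparation_time": [20, 25, 27, 30, 35]})

# describe() gives the full statistical summary (count, mean, std, quartiles, ...).
print(df["food_preparation_time"].describe())

# The three figures asked for in the question.
summary = df["food_preparation_time"].agg(["min", "mean", "max"])
print(summary)
```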

Question 5: How many orders are not rated?

In the rating column, 736 orders are marked as "Not given", so 736 orders
(about 39% of the 1898 orders) are not rated. Either these customers have
not rated their orders yet, or the rating was not captured in the dataset.
This is worth tracking when evaluating customer satisfaction, because it
points to a possible gap in the feedback collection process.
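
A minimal sketch of the count, assuming unrated orders are stored with the label "Not given" as described above. The five ratings are invented.

```python
import pandas as pd

# Made-up ratings for illustration only.
df = pd.DataFrame({"rating": ["Not given", "5", "Not given", "4", "3"]})

# Frequency of every rating label, including "Not given".
print(df["rating"].value_counts())

# Comparing against the label gives a boolean mask; summing it counts the matches.
not_rated = int((df["rating"] == "Not given").sum())
print(not_rated)
```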

Section - 1

Exploratory Data Analysis (EDA)

Univariate Analysis

Question 6: Explore all the variables and provide observations on their
distributions. (Generally, histograms, boxplots, countplots, etc. are
used for univariate exploration.)

Exploring the variables in the dataset gives the following observations on
their distributions:

Order ID: The dataset comprises 1898 unique order IDs.

Customer ID: There are 1200 unique customer IDs recorded, highlighting
the variety of customers served.

Restaurant Name: Across the dataset, 178 unique restaurant names are
identified, showcasing a diverse range of dining establishments.

Cuisine Type: A count plot visualizes the distribution of cuisine types,
showing how often each cuisine appears in the dataset. This makes the
relative popularity of the cuisines among customers easy to compare.

The above observations describe the dataset's composition and the
distribution patterns of these variables, which inform the business
questions that follow.
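
The unique counts and the numbers behind a cuisine count plot can be sketched as below. The four rows, including the cuisine assigned to each restaurant, are made up for illustration.

```python
import pandas as pd

# Made-up rows; the restaurant-to-cuisine pairing is illustrative only.
df = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [101, 102, 101, 103],
    "restaurant_name": ["Shake Shack", "Shake Shack",
                        "The Meatball Shop", "Blue Ribbon Sushi"],
    "cuisine_type": ["American", "American", "Italian", "Japanese"],
})

# Number of distinct values in each identifier-like column.
unique_counts = df[["order_id", "customer_id", "restaurant_name"]].nunique()
print(unique_counts)

# Orders per cuisine: the heights a count plot of cuisine_type would draw.
cuisine_counts = df["cuisine_type"].value_counts()
print(cuisine_counts)
```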

Cost of the Order:

Visualizations for the 'cost_of_the_order' variable have been generated to
understand its distribution.

Histogram: Represents the frequency distribution of order costs, with each
bar indicating the count of orders falling within a specific cost range.

Boxplot: Illustrates the central tendency, variability, and potential
outliers in the distribution of order costs through the quartiles and
median.

Day of the Week:

Unique values in the 'day_of_the_week' column have been extracted to
understand whether orders were placed on weekdays or weekends.

Rating:

Unique values in the 'rating' column have been identified, including 'Not
given', '5', '3', and '4'. The count plot visualization showcases the
distribution of ratings, highlighting the frequency of each rating value.

Food Preparation Time:

Visualizations for the 'food_preparation_time' variable are described
below.

Histogram: Displays the distribution of food preparation times, offering
insights into their variability and frequency.

Boxplot: Provides information on the central tendency, quartiles, and
potential outliers in food preparation times.

Delivery Time:

Two visualizations have been created for the 'delivery_time' variable:

Histogram: Depicts the distribution of delivery times, aiding in
understanding their frequency and variability.

Boxplot: Offers insights into the central tendency, quartiles, and potential
outliers in delivery times.

These visualizations aid in comprehending the characteristics and
distributions of key variables, providing valuable insights for business
decision-making and process optimization.

Question 7: Which are the top 5 restaurants in terms of the number of
orders received?

Ranking the restaurants by the number of orders received, Shake Shack, The
Meatball Shop, and Blue Ribbon Sushi stand out as the most frequently
ordered from.
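
Ranking by number of orders amounts to counting rows per restaurant. A minimal sketch with six invented orders; the restaurant names are borrowed from the text and the counts are not the real ones.

```python
import pandas as pd

# Made-up orders for illustration only.
df = pd.DataFrame({"restaurant_name": [
    "Shake Shack", "Shake Shack", "Shake Shack",
    "The Meatball Shop", "The Meatball Shop",
    "Blue Ribbon Sushi",
]})

# value_counts() sorts by frequency (descending); head(5) keeps at most five.
top_restaurants = df["restaurant_name"].value_counts().head(5)
print(top_restaurants)
```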

Question 8: Which is the most popular cuisine on weekends?

American cuisine is the most popular choice on weekends. Menu selection and
promotions can be planned around this finding: businesses can align their
offerings and marketing efforts to capitalize on weekend demand for American
cuisine, enhancing customer satisfaction and driving revenue growth.
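
A sketch of the weekend filter followed by a frequency count. The exact labels "Weekend" and "Weekday" are an assumption about how 'day_of_the_week' is coded, and the rows are invented.

```python
import pandas as pd

# Made-up rows; the "Weekend"/"Weekday" labels are assumed, not confirmed.
df = pd.DataFrame({
    "cuisine_type": ["American", "Japanese", "American", "Italian", "Japanese"],
    "day_of_the_week": ["Weekend", "Weekend", "Weekend", "Weekday", "Weekday"],
})

# Keep weekend orders only, then count orders per cuisine.
weekend_orders = df[df["day_of_the_week"] == "Weekend"]
weekend_counts = weekend_orders["cuisine_type"].value_counts()
print(weekend_counts)

# idxmax() returns the label with the highest count.
most_popular = weekend_counts.idxmax()
print(most_popular)
```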

Question 9: What percentage of the orders cost more than 20 dollars?

Determining the proportion of orders exceeding $20 provides insight into
customer spending behaviour and pricing effectiveness. A total of 555 orders
exceed this threshold, which is approximately 29.24% of the 1898 orders.
This knowledge allows businesses to evaluate pricing strategies, optimize
revenue generation, and tailor offerings to meet customer preferences.
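
The percentage is the share of rows passing the cost filter. The sketch below uses four invented costs and then checks the arithmetic of the figures quoted in the text (555 of 1898 orders).

```python
import pandas as pd

# Made-up order costs in dollars, for illustration only.
df = pd.DataFrame({"cost_of_the_order": [25.0, 12.5, 30.0, 8.0]})

# A boolean mask averages to the fraction of True values.
share_over_20 = (df["cost_of_the_order"] > 20).mean() * 100
print(share_over_20)

# Arithmetic check of the quoted figures: 555 of 1898 orders.
reported_share = round(555 / 1898 * 100, 2)
print(reported_share)
```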

Question 10: What is the mean order delivery time?

Understanding the mean order delivery time is essential for evaluating
service efficiency and customer satisfaction. With an average delivery time
of 24.16 minutes, businesses can assess performance against industry
benchmarks and customer expectations.

Question 11: The company has decided to give 20% discount vouchers to the
top 3 most frequent customers. Find the IDs of these customers and the
number of orders they placed.

The customers with IDs 52832, 47440, and 83287 are the most frequent
purchasers, with 13, 10, and 9 orders respectively. Offering these top
customers 20% discount vouchers can incentivize repeat purchases and
strengthen their loyalty to the business.
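
Finding the most frequent customers is again a frequency count, this time on 'customer_id'. A minimal sketch; the customer IDs below are placeholders, not the IDs reported above.

```python
import pandas as pd

# Placeholder customer IDs, for illustration only.
df = pd.DataFrame({"customer_id": [11, 11, 11, 11, 22, 22, 22, 33, 33, 44]})

# Orders per customer, most frequent first; head(3) keeps the top three.
top_customers = df["customer_id"].value_counts().head(3)
print(top_customers)
```

If several customers tie for third place, head(3) keeps only one of them, so ties are worth checking before vouchers are assigned.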

Section - 2

Multivariate Analysis

Question 12: Perform a multivariate analysis to explore relationships
between the important variables in the dataset. (It is a good idea to
explore relations between numerical variables as well as relations between
numerical and categorical variables.)

Cuisine vs Cost of the order

The boxplot of order cost by cuisine type shows how order costs vary across
cuisines, which gives insight into pricing strategies and customer
preferences. Each box represents the distribution of order costs for a
specific cuisine type; the x-axis denotes the cuisine types, while the
y-axis represents order costs. By comparing the median, quartiles, and
potential outliers, businesses can identify trends in pricing and
understand variations in order costs across different cuisines. This
analysis aids in optimizing menu pricing, identifying opportunities for
upselling or cost adjustments, and tailoring marketing strategies to
specific cuisine preferences.
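
The quartiles that such a boxplot draws can also be tabulated directly. A sketch with six invented orders; a plotting call (for example seaborn's boxplot) would visualize the same per-cuisine quartiles.

```python
import pandas as pd

# Made-up orders for illustration only.
df = pd.DataFrame({
    "cuisine_type": ["American"] * 3 + ["Japanese"] * 3,
    "cost_of_the_order": [10.0, 20.0, 30.0, 12.0, 14.0, 16.0],
})

# describe() per group returns count, mean, std, min, quartiles and max.
cost_by_cuisine = df.groupby("cuisine_type")["cost_of_the_order"].describe()

# 25%, 50% (median) and 75% are the box edges and centre line of a boxplot.
print(cost_by_cuisine[["25%", "50%", "75%"]])
```

The same grouping works for 'food_preparation_time' by cuisine and for 'delivery_time' by 'day_of_the_week'.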

Cuisine vs Food Preparation time

The boxplot of food preparation time by cuisine type offers insight into
operational efficiency and customer service. Each box represents the distribution
of food preparation times for a specific cuisine type, allowing businesses
to identify trends and outliers. The x-axis displays different cuisine types,
while the y-axis indicates food preparation time. By examining the
boxplot, businesses can understand variations in preparation time across
different cuisines, enabling them to optimize kitchen workflows, allocate
resources effectively, and manage customer expectations. This analysis
facilitates strategic decision-making to streamline operations, reduce wait
times, and enhance overall customer satisfaction, thereby driving
profitability and competitiveness in the food service industry.

Day of the Week vs Delivery time

The boxplot of delivery time by day of the week gives insight into service
performance and operational efficiency. Each box depicts the distribution
of delivery times for one category of 'day_of_the_week' (weekday or
weekend); the x-axis shows the category, while the y-axis denotes delivery
time. By comparing the boxes, businesses can see how delivery times differ
between weekdays and weekends and pinpoint potential inefficiencies or
bottlenecks in the delivery process. This analysis aids in optimizing
delivery scheduling, resource allocation, and logistics management to
ensure timely and reliable service.

Observations on Restaurant Revenue:

Each restaurant's revenue is calculated as the total cost of the orders
placed with it. The main observations are given below.

Highest Revenue: Restaurants like Shake Shack, The Meatball Shop, and Blue
Ribbon Sushi are among the highest revenue generators, with Shake Shack
leading the pack.

Difference in Revenue: There is a notable difference in revenue among
restaurants. For instance, popular spots like Shake Shack bring in
thousands of dollars, while others like Five Guys Burgers and Fries bring
in only a few hundred dollars.

Factors Affecting Revenue: Revenue generation is likely influenced by
factors such as a restaurant's popularity, cuisine type, location, and
pricing strategy. Restaurants with a strong reputation or unique offerings
tend to attract more customers, resulting in higher revenue.

Scope for Improvement: Restaurants with lower revenue may want to consider
strategies to attract more customers, improve satisfaction, or introduce
new menu items. Analysing customer feedback and market trends can help
identify areas for improvement and potential for growth.

Overall, understanding revenue patterns is essential for restaurants to
make informed decisions and maximize profitability in a competitive
market.
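
Revenue per restaurant, as described above, is the sum of order costs grouped by restaurant. A minimal sketch with invented costs; the restaurant names are borrowed from the text.

```python
import pandas as pd

# Made-up order costs for illustration only.
df = pd.DataFrame({
    "restaurant_name": ["Shake Shack", "Shake Shack",
                        "The Meatball Shop", "Five Guys Burgers and Fries"],
    "cost_of_the_order": [30.0, 25.0, 20.0, 10.0],
})

# Total order cost per restaurant, highest first.
revenue_by_restaurant = (
    df.groupby("restaurant_name")["cost_of_the_order"]
      .sum()
      .sort_values(ascending=False)
)
print(revenue_by_restaurant)
```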

Rating vs Delivery time

The point plot presented here shows how customer ratings are linked to
delivery time. It indicates that as delivery time decreases, customer
ratings tend to go up, which suggests that customers are happier and rate
more highly when deliveries are faster. Keeping delivery times short
therefore matters for customer satisfaction, and businesses should focus on
making their delivery operations smoother to meet customer needs and
improve service quality.
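
A point plot of this kind shows the mean of one variable for each category of another, so the underlying numbers are group means. The sketch below uses five invented rows and drops the "Not given" label before grouping; whether the project's plot kept that label as its own category is not stated, so this is an assumption. The toy values neither confirm nor refute the trend described above.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "rating": ["5", "4", "Not given", "3", "5"],
    "delivery_time": [20, 24, 30, 28, 22],
})

# Assumption: unrated orders are excluded and ratings are converted to numbers.
rated = df[df["rating"] != "Not given"].copy()
rated["rating"] = rated["rating"].astype(int)

# Mean delivery time for each rating value.
mean_delivery_by_rating = rated.groupby("rating")["delivery_time"].mean()
print(mean_delivery_by_rating)
```

Replacing 'delivery_time' with 'food_preparation_time' or 'cost_of_the_order' gives the numbers behind the other two point plots.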

Rating vs Food preparation time

The point plot given below shows how customer ratings relate to food
preparation time. The findings are as follows.

The plot shows the average food preparation time for each rating given by
customers. Where higher ratings are given, the food tends to be prepared
faster, meaning that customers who rate their experience highly usually get
their food sooner. Where there is no clear pattern, or where higher ratings
are linked to longer preparation times, it might mean that there are areas
in the kitchen that need improving, or that customers aren't as satisfied.

This analysis tells us how important food preparation time is for keeping
customers happy. It helps restaurants figure out where they can make
improvements to make sure customers have a great dining experience
overall.

Rating vs Cost of the order

This point plot shows how customer ratings are related to the cost of their
orders. The findings from the plot are as follows.

The plot shows the average amount customers spend on their orders for each
rating they give. There is a clear trend: customers who give higher ratings
tend to spend more on their orders. This suggests that happier customers
are willing to spend more money.

This analysis gives an idea of how order cost relates to customer
satisfaction and of how pricing affects customers. This information is
useful for setting prices and making sure customers are happy with what
they're paying.

Correlation among variables

The heatmap shows how the variables 'cost_of_the_order',
'food_preparation_time', and 'delivery_time' are related to each other. The
observations are as follows.

The heatmap uses colors to show how strongly each pair of variables is
correlated.

When the correlation coefficient is close to 1, there is a strong positive
relationship between the variables. When it is close to -1, there is a
strong negative relationship. When it is close to 0, there is not much of a
linear relationship.

By looking at the heatmap, we can quickly see which variables are linked.
For example, if 'food_preparation_time' and 'delivery_time' had a strong
positive relationship, it would suggest that when food takes longer to
prepare, it also takes longer to deliver.

This analysis helps us make sense of how different parts of the data are
connected. It gives us a starting point for making decisions based on this
information and helps us figure out what to explore next.
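
The values a correlation heatmap colors come from a correlation matrix. A minimal sketch, assuming the default Pearson correlation of pandas' corr(); the four rows are invented and deliberately perfectly linear so that the coefficients come out at exactly +1 and -1.

```python
import pandas as pd

# Made-up, perfectly linear values chosen to produce coefficients of +1 and -1.
df = pd.DataFrame({
    "cost_of_the_order": [10.0, 20.0, 30.0, 40.0],
    "food_preparation_time": [20, 25, 30, 35],
    "delivery_time": [30, 25, 20, 15],
})

# Pairwise Pearson correlation between the three numerical columns.
corr = df.corr()
print(corr)
```

Real data would normally give values strictly between -1 and 1; the matrix can then be passed to a heatmap function for display.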

Question 13: The company wants to provide a promotional offer in
the advertisement of the restaurants. The condition to get the offer
is that the restaurants must have a rating count of more than 50 and
the average rating should be greater than 4. Find the restaurants
fulfilling the criteria to get the promotional offer.

The company aims to promote restaurants that meet two criteria: a rating
count of more than 50 and an average rating above 4. The findings are as
follows.

The output identifies top-rated restaurants such as Shake Shack, The
Meatball Shop, Blue Ribbon Sushi, Blue Ribbon Fried Chicken, and RedFarm
Broadway. These restaurants enjoy strong customer satisfaction and
popularity.

Among the restaurants meeting both criteria, The Meatball Shop leads with an
average rating of 4.51, followed by Blue Ribbon Fried Chicken, Shake Shack,
and Blue Ribbon Sushi. These high ratings indicate significant customer
satisfaction and positive feedback, making these restaurants ideal
candidates for promotional offers.

By focusing promotional efforts on these top-performing restaurants, the
company can leverage their positive reputation to attract more customers
and increase sales. Offering promotions at these establishments can
further enhance customer loyalty and drive business growth.
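
The two criteria can be applied by aggregating rated orders per restaurant and filtering the result. The sketch below is an illustration under stated assumptions: the restaurant names are placeholders, the ratings are invented, unrated orders are dropped before counting, and the count threshold is lowered to 2 so that the toy data can pass it (the question uses 50).

```python
import pandas as pd

# Placeholder restaurants and made-up ratings, for illustration only.
df = pd.DataFrame({
    "restaurant_name": ["Restaurant A"] * 3 + ["Restaurant B"] * 3 + ["Restaurant C"] * 2,
    "rating": ["5", "4", "5", "3", "4", "Not given", "5", "5"],
})

# Keep rated orders only and convert the text ratings to numbers.
rated = df[df["rating"] != "Not given"].copy()
rated["rating"] = rated["rating"].astype(int)

# Rating count and average rating per restaurant.
stats = rated.groupby("restaurant_name")["rating"].agg(["count", "mean"])

MIN_COUNT = 2  # toy threshold; the question uses 50
MIN_MEAN = 4   # same threshold as the question
eligible = stats[(stats["count"] > MIN_COUNT) & (stats["mean"] > MIN_MEAN)]
print(eligible)
```

Here only Restaurant A qualifies: Restaurant B has too few ratings and too low an average, and Restaurant C has a high average but not enough ratings.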

Question 14: The company charges the restaurant 25% on the orders
having cost greater than 20 dollars and 15% on the orders having
cost greater than 5 dollars. Find the net revenue generated by the
company across all orders.

The analysis focuses on how the company generates revenue from orders under
the stated charging conditions. The findings are as follows.

Orders costing more than $20 incur a charge of 25%, while those costing
more than $5 but not more than $20 incur a charge of 15%. Orders costing $5
or less do not incur any charge.

A new column called 'Revenue' is created to store the revenue computed for
each order under these conditions; the net revenue is the sum of this
column across all orders.

This revenue calculation ensures that higher-priced orders contribute
proportionally more to the company's revenue, reflecting the greater value
of these orders. At the same time, lower-priced orders are either not
charged or charged a lower percentage.

Overall, this approach strikes a balance between maximizing revenue from
higher-priced orders and maintaining affordability for customers across a
range of order values.
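
The charging rule can be sketched as a small function applied to every order. The helper name commission is hypothetical and the three order costs are invented; only the 25% / 15% / 0% tiers and the 'Revenue' column name come from the text.

```python
import pandas as pd


def commission(cost: float) -> float:
    """Company's charge for one order (hypothetical helper name)."""
    if cost > 20:
        return cost * 0.25   # 25% on orders above $20
    if cost > 5:
        return cost * 0.15   # 15% on orders above $5 and up to $20
    return 0.0               # no charge on orders of $5 or less


# Made-up order costs for illustration only.
df = pd.DataFrame({"cost_of_the_order": [30.0, 10.0, 4.0]})

# Per-order revenue, then the net revenue across all orders.
df["Revenue"] = df["cost_of_the_order"].apply(commission)
net_revenue = df["Revenue"].sum()
print(net_revenue)
```

For these toy orders the charges are 7.5, 1.5 and 0.0, so the net revenue is 9.0 dollars.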

Question 15: The company wants to analyze the total time required
to deliver the food. What percentage of orders take more than 60
minutes to get delivered from the time the order is placed? (The food
has to be prepared and then delivered.)

Out of all the orders analysed, 10.54% take more than 60 minutes to be
delivered from the time the order is placed.

Understanding the percentage of orders exceeding the 60-minute delivery
threshold is crucial for evaluating service efficiency and customer
satisfaction. Orders taking longer to deliver may lead to customer
dissatisfaction and impact repeat business. By monitoring this metric, the
company can identify areas for improvement in delivery operations, such
as optimizing kitchen workflows or enhancing delivery logistics. Ensuring
timely deliveries is essential for meeting customer expectations and
maintaining a competitive edge in the market.
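
Total time is the sum of preparation and delivery time, as the question notes. A minimal sketch with four invented orders; the column name total_time is introduced here for illustration.

```python
import pandas as pd

# Made-up times in minutes, for illustration only.
df = pd.DataFrame({
    "food_preparation_time": [25, 35, 30, 20],
    "delivery_time": [20, 30, 33, 25],
})

# Total time from order placement to drop-off (hypothetical column name).
df["total_time"] = df["food_preparation_time"] + df["delivery_time"]

# Share of orders whose total time exceeds 60 minutes.
share_over_60 = (df["total_time"] > 60).mean() * 100
print(share_over_60)
```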

Question 16: The company wants to analyze the delivery time of the
orders on weekdays and weekends. How does the mean delivery time
vary during weekdays and weekends?

The company is looking into delivery times on both weekdays and weekends.
The findings are given below.

The average delivery time on weekdays is approximately 28 minutes.

The average delivery time on weekends is approximately 22 minutes.

Comparing delivery times between weekdays and weekends helps the company
understand whether service efficiency differs by day of the week. These
findings suggest that deliveries are faster on weekends than on weekdays, by
about 6 minutes on average. Understanding these patterns can help the
company optimize delivery operations and ensure timely service for
customers, regardless of the day.
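
The comparison is a group mean over 'day_of_the_week'. A sketch with four invented orders; the labels "Weekday" and "Weekend" are assumed, and the toy means (30 and 20 minutes) are not the project's figures.

```python
import pandas as pd

# Made-up delivery times in minutes; the day labels are an assumption.
df = pd.DataFrame({
    "day_of_the_week": ["Weekday", "Weekday", "Weekend", "Weekend"],
    "delivery_time": [28, 32, 19, 21],
})

# Mean delivery time for each day category.
mean_by_day = df.groupby("day_of_the_week")["delivery_time"].mean()
print(mean_by_day)
```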

Conclusion and Recommendations

Question 17: What are your conclusions from the analysis? What
recommendations would you like to share to help improve the
business? (You can use cuisine type and feedback ratings to drive your
business recommendations.)

Conclusions: Based on the analysis and interpretations, the conclusions are
given below.

1. The analysis reveals restaurants with high ratings and frequent customer
orders, indicating strong customer satisfaction. These restaurants, such
as Shake Shack, The Meatball Shop, and Blue Ribbon Sushi, should be
prioritized for promotional offers to further enhance customer loyalty.

2. The mean delivery time differs between weekdays and weekends: about 28
minutes on weekdays against about 22 minutes on weekends. Ensuring prompt
delivery, particularly on weekdays when deliveries are slower, can improve
customer experience. Strategies like optimizing delivery routes and
increasing staffing levels during busy periods can help reduce delivery
times.

3. Implementing differential pricing strategies based on order values can
optimize revenue generation. Charging higher percentage fees for larger
orders ensures that higher-priced orders contribute proportionally more to
the company's revenue.

4. Analyzing cuisine types has provided insights into customer
preferences. Restaurants offering popular cuisines like American and
Italian can further capitalize on their offerings by expanding their
menu or introducing promotions targeting specific cuisine
preferences.

5. Regularly monitoring customer feedback and ratings is essential for
identifying areas for improvement. Restaurants with lower ratings
should focus on addressing customer concerns, such as food quality,
delivery times, or menu variety, to enhance overall satisfaction and
loyalty.

Recommendations:

The following recommendations are based on the analysis of the dataset.

Prioritize Customer Satisfaction: Given the significance of customer
satisfaction in driving loyalty and repeat business, it's essential to
prioritize initiatives aimed at enhancing the overall dining experience.
This could involve offering personalized promotions and discounts to
frequent customers, ensuring consistent quality across all orders, and
promptly addressing any customer concerns or complaints. By
consistently exceeding customer expectations, the company can build a
loyal customer base and differentiate itself in the competitive market.

Optimize Delivery Efficiency: Delivery times play a crucial role in
customer satisfaction and retention. To improve delivery efficiency, the
company should focus on streamlining its delivery operations. This could
involve implementing advanced routing algorithms to optimize delivery
routes, investing in technology to track and manage delivery personnel,
and maintaining clear communication channels with customers regarding
their order status. By reducing delivery times and ensuring timely service,
the company can enhance the overall customer experience and gain a
competitive edge in the market.

Leverage Cuisine Preferences: Understanding and catering to customer
preferences in terms of cuisine types can help the company attract and
retain a diverse customer base. By analysing data on popular cuisines
and consumer trends, the company can tailor its menu offerings and
promotional campaigns to align with customer preferences. This could
involve introducing new dishes or meal options based on trending
cuisines, partnering with popular restaurants to offer exclusive menu
items, or running targeted marketing campaigns to promote specific
cuisine preferences. By leveraging cuisine preferences effectively, the
company can attract more customers and increase order volumes,
ultimately driving business growth and success in the food delivery
market.

*********
