
Capstone CLA1

The document outlines a food delivery project using Python for data analysis, detailing steps from data importation to cleaning and visualization. Key observations include the dataset's structure, missing values, and insights from visualizations regarding gender distribution and food delivery app preferences. The project culminates in a cleaned dataset ready for further analysis, highlighting issues with data formatting and representation.

Uploaded by srimegha.sarvani


CAPSTONE PROJECT

CLA-1
FOOD DELIVERY PROJECT
STEP 1:
Defining the objective

STEP 2:

We imported essential Python libraries for data manipulation, analysis, and visualization:
• pandas for handling datasets
• numpy for numerical operations
• matplotlib.pyplot and seaborn for data visualization
• IPython.display for enhanced output display

STEP 3:

We uploaded the dataset (Food_data.xlsx) into Google Colab using the files.upload() function from the google.colab module. This allows us to load the customer order history data for further analysis.
STEP 4:

We loaded the dataset (Food_data.xlsx) into a pandas DataFrame (df) and displayed the first few rows using df.head().
Observations:
1. The dataset contains 90 columns with various attributes related to food ordering habits.
2. Several columns contain missing values (NaN), especially those towards the right side of the dataset.
3. The column headers are misaligned because the header row was read as data: the first row of the DataFrame contains the column names instead of actual values, which indicates that this row should be set as the header.

STEP 5:

The dataset's column names have been replaced with numerical indices (0 to 89) instead of the actual column names. This likely happened because the first row of the Excel file was not set as the header when the file was read.
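One way to repair this in pandas, sketched under the assumption that the sheet was read with `header=None` so the real names sit in row 0: promote that row to the header and drop it from the data. The column names and values below are invented placeholders. Re-reading the file with `pd.read_excel("Food_data.xlsx", header=0)` is the other common fix.

```python
import pandas as pd

def promote_first_row_to_header(raw: pd.DataFrame) -> pd.DataFrame:
    """Use the first data row as column names and drop it from the body."""
    fixed = raw.iloc[1:].copy()
    # Row 0 holds the real column labels; strip stray spaces around them.
    fixed.columns = [str(name).strip() for name in raw.iloc[0]]
    # Re-number the remaining rows from 0 so positional indexing stays predictable.
    return fixed.reset_index(drop=True)

# Hypothetical miniature of the problem: integer column labels 0..2,
# with the real header text stored in the first row.
raw = pd.DataFrame([
    ["Year of Birth", "Gender", "Ordering Frequency"],
    [1999, "Female", "Weekly"],
    [2001, "Male", "Monthly"],
])

df = promote_first_row_to_header(raw)
print(df.columns.tolist())  # ['Year of Birth', 'Gender', 'Ordering Frequency']
print(len(df))              # 2
```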
STEP 6:
Here’s what we have done so far:

1. Checked the Number of Columns:

o We printed the total number of columns in our dataset, which is 90.

o We also printed the column names (which seem to be indexed numerically).

2. Renamed Specific Columns:

o We created a dictionary (rename_mapping) to rename selected columns, making them more meaningful.

o The renamed columns cover:

▪ General Information: Year of Birth, Gender, Ordering Frequency, Impact of Food Delivery Apps.

▪ First, Second, and Third Orders: details such as time of order, delivery app used, cost, type of food, cuisine, rating, and likelihood of ordering again.

3. Applied the Renaming:

o We used df.rename(columns=rename_mapping, inplace=True) to rename the selected columns.

4. Created a New DataFrame with Only Selected Columns:

o We extracted only the renamed columns into a new DataFrame (new_df).

o We displayed the first few rows of this new DataFrame using print(new_df.head()).
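The renaming and column selection above can be sketched as follows. Because the columns are still positional integers at this point, the keys of rename_mapping are integers; the specific indices and the three target names are placeholders, since the real mapping covers many more columns. `df.rename` returns a new DataFrame unless `inplace=True` is passed, so this sketch reassigns the result instead.

```python
import pandas as pd

# Hypothetical stand-in for the 90-column sheet: integer column labels.
df = pd.DataFrame({
    0: [1999, 2001],
    1: ["Female", "Male"],
    2: ["Weekly", "Monthly"],
    3: ["x", "y"],  # a column we do not need
})

# Map positional labels to readable names (indices here are illustrative only).
rename_mapping = {
    0: "Year of Birth",
    1: "Gender",
    2: "Ordering Frequency",
}

# Same effect as df.rename(columns=rename_mapping, inplace=True).
df = df.rename(columns=rename_mapping)

# Keep only the columns that were given meaningful names.
new_df = df[list(rename_mapping.values())]
print(new_df.head())
print(new_df.columns.tolist())
```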
STEP 7:

We have displayed the column names of the dataset using new_df.columns.

STEP 8:

We have:

• Defined common columns that remain unchanged across orders.

• Created lists of order numbers (1st, 2nd, 3rd) and corresponding column prefixes.

• Iterated through each order, selecting relevant columns dynamically.

• Checked for missing columns and handled them.

• Renamed columns to a consistent format.

• Added an "Order Number" column.

• Concatenated transformed DataFrames into a final DataFrame.

• Displayed and saved the transformed DataFrame.
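The reshaping steps above, which turn one wide row per respondent into one row per order, can be sketched like this. The prefixes ("1st Order ", and so on) and field names are hypothetical; the real sheet uses its own column names. Missing order columns are handled by skipping an order entirely when none of its columns exist, and by filling any partially missing field with NaN.

```python
import pandas as pd

# Hypothetical wide layout: one row per respondent, order details repeated per order.
new_df = pd.DataFrame({
    "Year of Birth": [1999, 2001],
    "Gender": ["Female", "Male"],
    "1st Order App": ["Zomato", "Swiggy"],
    "1st Order Cost": ["200-400", "400-600"],
    "2nd Order App": ["Swiggy", "Uber Eats"],
    "2nd Order Cost": ["0-200", "200-400"],
    # No "3rd Order ..." columns: exercises the missing-column handling below.
})

common_cols = ["Year of Birth", "Gender"]  # unchanged across orders
order_numbers = [1, 2, 3]
prefixes = ["1st Order ", "2nd Order ", "3rd Order "]
fields = ["App", "Cost"]  # order-specific attributes

frames = []
for number, prefix in zip(order_numbers, prefixes):
    wanted = [prefix + field for field in fields]
    present = [col for col in wanted if col in new_df.columns]
    if not present:
        # Skip orders whose columns do not exist in this sheet.
        continue
    part = new_df[common_cols + present].copy()
    # "1st Order App" -> "App", so every order shares the same column names.
    part = part.rename(columns={col: col[len(prefix):] for col in present})
    # Any field missing for this order becomes an all-NaN column.
    part = part.reindex(columns=common_cols + fields)
    part["Order Number"] = number
    frames.append(part)

final_df = pd.concat(frames, ignore_index=True)
print(final_df)
# final_df.to_csv("transformed_data.csv", index=False)  # hypothetical file name
```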

STEP 9:

The output shows that final_df now has standardized column names. Order-specific columns have been mapped to generic names, and an "Order Number" column has been added to differentiate between orders.

STEP 10:

The output of final_df.info() shows that the dataset has 1,568 entries and 19 columns. Some columns have missing values, especially the order-related fields. The data type of every column is object, so numeric fields such as Year of Birth must be converted before any numerical analysis.

STEP 11:
• Checked missing values to identify gaps.
• Dropped columns with >50% missing data.
• Filled missing values (mode for categorical, median for numerical).
• Removed duplicates to avoid redundancy.
• Converted data types (Year of Birth to numeric).
• Standardized text (lowercased, stripped spaces).
• Encoded categorical data (Gender: male → 1, female → 0).
• Final check & saved cleaned data as cleaned_data.csv.
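A compact sketch of this cleaning pipeline in pandas, run on a tiny invented frame rather than the real final_df. Two steps are deliberately reordered relative to the list: Year of Birth is converted to numeric before filling (with every column stored as object, no column would otherwise qualify for the median fill), and text is standardized before duplicates are removed so that "Male " and "male" count as the same value. The 50% threshold, the fill rules, the gender mapping, and the output file name come from the list above; the sample data is an assumption.

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps to a copy of df (sketch)."""
    # Check missing values per column to identify gaps.
    print(df.isnull().sum())

    # Drop columns where more than 50% of the values are missing.
    df = df.loc[:, df.isnull().mean() <= 0.5].copy()

    # Convert Year of Birth to numeric first, so the median fill applies to it.
    if "Year of Birth" in df.columns:
        df["Year of Birth"] = pd.to_numeric(df["Year of Birth"], errors="coerce")

    # Fill gaps: median for numeric columns, mode for categorical ones.
    for col in df.columns:
        if df[col].isnull().any():
            if pd.api.types.is_numeric_dtype(df[col]):
                df[col] = df[col].fillna(df[col].median())
            else:
                df[col] = df[col].fillna(df[col].mode().iloc[0])

    # Standardize text: lowercase and strip surrounding spaces.
    for col in df.columns:
        if not pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].astype(str).str.strip().str.lower()

    # Remove duplicate rows (after standardizing, so "Male " and "male" match).
    df = df.drop_duplicates().reset_index(drop=True)

    # Encode Gender: male -> 1, female -> 0 (any other value becomes NaN).
    df["Gender"] = df["Gender"].map({"male": 1, "female": 0})
    return df

# Hypothetical miniature of final_df.
raw = pd.DataFrame({
    "Year of Birth": ["1999", "2001", None, "2001"],
    "Gender": ["Female", "Male ", "male", "Male"],
    "App": ["Zomato", "Swiggy", None, "Swiggy"],
    "Notes": [None, None, None, "late"],  # 75% missing, so it is dropped
})

cleaned = clean(raw)
print(cleaned.isnull().sum().sum())  # final check: remaining missing values
# cleaned.to_csv("cleaned_data.csv", index=False)
```

The final to_csv call is left commented out so the sketch has no side effects; uncommenting it reproduces the "saved as cleaned_data.csv" step.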
STEP 12:

Observation & Explanation:

• The bar chart represents the distribution of gender in the dataset.

• The x-axis (Gender) has two categories, 0 and 1, produced by the encoding in Step 11 (female → 0, male → 1).

• The y-axis (count) shows the number of entries for each gender.

• The dataset contains more individuals in category 1 (male) than in category 0 (female), but the difference is not very large.

Explanation:

• Since gender was originally a categorical variable, we encoded it numerically for easier processing.

• This visualization helps understand gender representation in the dataset, which may be useful for further analysis, such as comparing food ordering behavior across genders.
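Because the chart's x-axis shows only the codes 0 and 1, one option is to map the codes back to labels before counting, so the plot no longer depends on the reader remembering the encoding. A minimal sketch, assuming the Step 11 mapping (male → 1, female → 0) and an invented column; with matplotlib installed, `counts.plot(kind="bar")` would draw the bar chart from the same counts.

```python
import pandas as pd

# Hypothetical encoded column, using the Step 11 mapping (male -> 1, female -> 0).
gender = pd.Series([1, 0, 1, 1, 0], name="Gender")

labels = {0: "female", 1: "male"}
# Decode, count, and fix the category order so both bars always appear.
counts = gender.map(labels).value_counts().reindex(["female", "male"], fill_value=0)
print(counts)
```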
STEP 13:

Observation & Explanation:

• The bar chart represents food delivery app preferences among users in the dataset.

• The x-axis (Food Delivery Apps) includes Swiggy, Zomato, and Uber Eats, while
Dunzo is missing, likely because it had no recorded orders.

• The y-axis (count) indicates the number of times users placed orders using these
apps.

Key Insights:

1. Zomato is the most preferred app, followed closely by Swiggy.

2. Uber Eats has far fewer orders, which may indicate lower user preference or limited availability.

3. Dunzo has no recorded orders, which suggests that respondents in this dataset did not use it for food delivery.

Explanation:

• This visualization helps understand customer preferences for different food delivery services.

• Businesses can use this insight for targeted promotions, partnerships, and marketing strategies with high-preference apps.
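A count built from the raw values only lists apps that actually occur, which is consistent with Dunzo being absent from the chart. Reindexing against the full list of options makes a zero count explicit. The sample orders and the list of offered apps below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical order-level column; no row mentions Dunzo.
apps = pd.Series(["Zomato", "Swiggy", "Zomato", "Uber Eats", "Swiggy", "Zomato"])

# Apps offered as answer options (assumed), including ones nobody chose.
all_apps = ["Swiggy", "Zomato", "Uber Eats", "Dunzo"]

# value_counts() alone would omit Dunzo; reindex adds it with a count of 0.
counts = apps.value_counts().reindex(all_apps, fill_value=0)
print(counts.sort_values(ascending=False))
```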
Observation & Explanation:

• The histogram visualizes the distribution of "Year of Birth" in the dataset.

• The x-axis (Year of Birth) represents different birth years, while the y-axis
(Frequency) shows the count of occurrences.

Key Insights:

1. The histogram is strongly right-skewed: almost all values are concentrated at the far left, with very few values spread across the rest of the x-axis.

2. The presence of unrealistically large values (e.g., close to 10⁸) suggests incorrect or misformatted data; these few extreme values stretch the x-axis and produce the skew.

3. This issue likely arises from a formatting problem, for example birth years stored as timestamps or other large numbers instead of four-digit years.
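A hedged sketch for isolating the suspicious entries: coerce the column to numbers, then keep only years inside a plausibility window and set the rest to NaN for inspection. The window (1940 to 2015) and the sample values are assumptions chosen for illustration, not values taken from the dataset.

```python
import pandas as pd

# Hypothetical raw values: two plausible years and two malformed entries.
yob = pd.Series(["1998", "2002", "20020115", "1.2e8"], name="Year of Birth")

years = pd.to_numeric(yob, errors="coerce")

# Assumed plausibility window for respondents' birth years (inclusive).
valid = years.between(1940, 2015)

print(yob[~valid])                # entries that need manual inspection
clean_years = years.where(valid)  # implausible values become NaN
print(clean_years)
```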
The box plot attempts to show the distribution of the cost of the first order, but there is an
issue with the data.

• The scatter plot appears to have mixed categorical and numerical data, leading to
misalignment in axis labels.
• The x-axis (Cost of Order) includes both price ranges and cuisine types, suggesting
incorrect data mapping.
• The y-axis (Delivery Time) contains time intervals but also includes unrelated categories like "yes", "no", and "maybe".
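Before plotting, a check like the one below can locate rows whose values fall outside the categories each column should contain, which helps trace a wrong mapping back to the reshaping step. The column names, category labels, and sample rows are placeholders, not the survey's real answer options.

```python
import pandas as pd

# Hypothetical slice of final_df showing the kind of mix-up described above.
final_df = pd.DataFrame({
    "Cost of Order": ["200-400", "North Indian", "400-600"],
    "Delivery Time": ["30-45 mins", "yes", "15-30 mins"],
})

# Assumed sets of legitimate values for each column.
allowed = {
    "Cost of Order": {"0-200", "200-400", "400-600", "600+"},
    "Delivery Time": {"0-15 mins", "15-30 mins", "30-45 mins", "45+ mins"},
}

def misaligned_rows(df: pd.DataFrame, allowed: dict) -> pd.DataFrame:
    """Return rows where any checked column holds a value outside its allowed set."""
    bad = pd.Series(False, index=df.index)
    for col, values in allowed.items():
        bad |= ~df[col].isin(values)
    return df[bad]

print(misaligned_rows(final_df, allowed))
```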
