Capstone CLA1

The document outlines a food delivery project using Python for data analysis, detailing steps from data importation to cleaning and visualization. Key observations include the dataset's structure, missing values, and insights from visualizations regarding gender distribution and food delivery app preferences. The project culminates in a cleaned dataset ready for further analysis, highlighting issues with data formatting and representation.

Uploaded by

srimegha.sarvani

CAPSTONE PROJECT

CLA-1
FOOD DELIVERY PROJECT
STEP 1:
Defining the objective

STEP 2:

We imported essential Python libraries for data manipulation, analysis, and
visualization:
• pandas for handling datasets
• numpy for numerical operations
• matplotlib.pyplot and seaborn for data visualization
• IPython.display for enhanced output display

STEP 3:

We uploaded the dataset (Food_data.xlsx) into Google Colab using the
files.upload() function from the google.colab module. This allows us to load the
customer order history data for further analysis.
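
A minimal sketch of this upload step, assuming the notebook runs in Google Colab. The helper name upload_food_data is hypothetical and not taken from the original notebook; outside Colab the google.colab module cannot be imported, so the helper returns None instead of failing.

```python
import io

import pandas as pd


def upload_food_data(filename="Food_data.xlsx"):
    """Prompt for an upload in Colab and return the file's bytes, or None outside Colab."""
    try:
        from google.colab import files  # only importable inside Google Colab
    except ImportError:
        return None
    uploaded = files.upload()  # dict mapping each uploaded filename to its bytes
    return uploaded.get(filename)


raw_bytes = upload_food_data()
if raw_bytes is not None:
    # Reading .xlsx files requires an Excel engine such as openpyxl.
    df = pd.read_excel(io.BytesIO(raw_bytes))
    print(df.shape)
```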
STEP 4:

The dataset (Food_data.xlsx) was successfully loaded into a Pandas
DataFrame (df), and we displayed the first few rows using df.head().
Observations:
1. The dataset contains 90 columns with various attributes related to food
ordering habits.
2. Some columns seem to have missing values (NaN), especially towards the
right side of the dataset.
3. The column headers appear misaligned due to incorrect row indexing. The
first row of the DataFrame contains column names instead of actual data,
indicating that the first row may need to be set as the header.

STEP 5:

The dataset's column names have been replaced with numerical indices (0 to 89)
instead of actual column names. This likely happened because the first row of the
Excel file was not correctly set as the header.
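
One common way to handle this situation is to promote the first data row to the header. The sketch below uses a small made-up frame rather than the real dataset, and it is not necessarily what the notebook did (Step 6 renames the numeric columns instead). When reading the file directly, passing header=0 to pd.read_excel has the same intent.

```python
import pandas as pd

# Toy frame mimicking the problem: integer column labels, real names in row 0.
raw = pd.DataFrame([
    ["Year of Birth", "Gender"],
    [1998, "Female"],
    [2001, "Male"],
])

# Drop the header row from the data, then use its values as column names.
fixed = raw.iloc[1:].reset_index(drop=True)
fixed.columns = raw.iloc[0].tolist()

print(fixed.columns.tolist())
```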
STEP 6:
Here’s what we have done so far:

1. Checked the Number of Columns:

o We printed the total number of columns in our dataset, which is 90.

o We also printed the column names (which seem to be indexed numerically).

2. Renamed Specific Columns:

o We created a dictionary (rename_mapping) to rename selected columns,
making them more meaningful.

o The renamed columns cover:

▪ General Information: Year of Birth, Gender, Ordering Frequency,
Impact of Food Delivery Apps.

▪ First, Second, and Third Orders: Details like time of order, delivery
app used, cost, type of food, cuisine, rating, and likelihood of
ordering again.

3. Applied the Renaming:

o We used df.rename(columns=rename_mapping, inplace=True) to rename
the selected columns.

4. Created a New DataFrame with Only Selected Columns:

o We extracted only the renamed columns into a new DataFrame (new_df).

o We displayed the first few rows of this new DataFrame using
print(new_df.head()).
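
The renaming and selection described above can be sketched as follows. The column positions in rename_mapping are hypothetical, and the toy frame has 4 columns instead of 90; only the names df, rename_mapping, and new_df come from the report.

```python
import pandas as pd

# Toy frame with numeric column labels, as described in Step 5.
df = pd.DataFrame([[1998, "Female", "Weekly", "x"]], columns=[0, 1, 2, 3])

# Hypothetical positions; the real mapping covers many more columns.
rename_mapping = {
    0: "Year of Birth",
    1: "Gender",
    2: "Ordering Frequency",
}

df.rename(columns=rename_mapping, inplace=True)

# Keep only the renamed columns; .copy() makes new_df independent of df.
new_df = df[list(rename_mapping.values())].copy()
print(new_df.head())
```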
STEP 7:

We have displayed the column names of the dataset using new_df.columns.


STEP 8:

We have:

• Defined common columns that remain unchanged across orders.

• Created lists of order numbers (1st, 2nd, 3rd) and corresponding column prefixes.

• Iterated through each order, selecting relevant columns dynamically.

• Checked for missing columns and handled them.

• Renamed columns to a consistent format.

• Added an "Order Number" column.

• Concatenated transformed DataFrames into a final DataFrame.

• Displayed and saved the transformed DataFrame.
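
A minimal sketch of this wide-to-long reshaping, under stated assumptions: the column names ("First Order App" and so on), the two fields, and the toy values are invented for illustration, and the real notebook handles three orders and more fields. Saving is omitted here; final_df.to_csv(...) would cover it.

```python
import pandas as pd

# Toy wide frame: one row per respondent, one column group per order.
new_df = pd.DataFrame({
    "Year of Birth": [1998, 2001],
    "Gender": ["Female", "Male"],
    "First Order App": ["Zomato", "Swiggy"],
    "First Order Cost": [250, 400],
    "Second Order App": ["Swiggy", "Zomato"],
    # "Second Order Cost" is deliberately absent to exercise the missing-column path.
})

common_cols = ["Year of Birth", "Gender"]
order_prefixes = {1: "First Order", 2: "Second Order"}
fields = ["App", "Cost"]

frames = []
for order_number, prefix in order_prefixes.items():
    part = new_df[common_cols].copy()
    for field in fields:
        source = f"{prefix} {field}"
        # Use missing values when this order lacks the column.
        part[field] = new_df[source] if source in new_df.columns else pd.NA
    part["Order Number"] = order_number
    frames.append(part)

# Stack the per-order frames into one long frame.
final_df = pd.concat(frames, ignore_index=True)
print(final_df)
```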


STEP 9:

The output shows that the final_df now has standardized column names. Order-specific
columns have been merged with generic names, and an "Order Number" column has been
added to differentiate between orders.

STEP 10:

The output of final_df.info() shows that the dataset has 1,568 entries and 19 columns.
Some columns have missing values, especially in order-related fields. The data type of all
columns is object.

STEP 11:
• Checked missing values to identify gaps.
• Dropped columns with >50% missing data.
• Filled missing values (mode for categorical, median for numerical).
• Removed duplicates to avoid redundancy.
• Converted data types (Year of Birth to numeric).
• Standardized text (lowercased, stripped spaces).
• Encoded categorical data (Gender: male → 1, female → 0).
• Final check & saved cleaned data as cleaned_data.csv.
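
The cleaning steps above can be sketched on a small made-up frame. This is an illustration under assumptions, not the notebook's code: the "Mostly Empty" column and all values are invented, and the steps are slightly reordered so that the median is computed on numeric values and the mode on standardized text.

```python
import pandas as pd

df = pd.DataFrame({
    "Year of Birth": ["1998", "2001", None, "1998", "1995"],
    "Gender": [" Male", "female ", None, " Male", "Male"],
    "Mostly Empty": [None, None, None, None, "x"],
})

# 1. Drop columns with more than 50% missing values.
df = df.loc[:, df.isna().mean() <= 0.5].copy()

# 2. Convert types first so the median is computed on numbers.
df["Year of Birth"] = pd.to_numeric(df["Year of Birth"], errors="coerce")

# 3. Standardize text: strip spaces and lowercase.
df["Gender"] = df["Gender"].str.strip().str.lower()

# 4. Fill missing values: median for numerical, mode for categorical.
df["Year of Birth"] = df["Year of Birth"].fillna(df["Year of Birth"].median())
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])

# 5. Remove duplicate rows.
df = df.drop_duplicates().reset_index(drop=True)

# 6. Encode gender with the mapping listed above (male -> 1, female -> 0).
df["Gender"] = df["Gender"].map({"male": 1, "female": 0})

print(df)
```

Saving the result would then be a single call such as df.to_csv("cleaned_data.csv", index=False).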
STEP 12:

Observation & Explanation:

• The bar chart represents the distribution of gender in the dataset.

• The x-axis (Gender) has two categories, 0 and 1, which were encoded from 'female'
and 'male' respectively (the mapping applied in Step 11).

• The y-axis (count) shows the number of entries for each gender.

• The dataset contains more individuals in category 1 (male) than in category
0 (female), but the difference is not very large.

Explanation:

• Since gender was originally a categorical variable, we encoded it for better
processing.

• This visualization helps understand gender representation in the dataset, which
may be useful for further analysis, such as comparing food ordering behavior
across genders.
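
A bar chart of this kind can be sketched with pandas and matplotlib. The values below are made up, and plain matplotlib is used here in place of seaborn, so this is an approximation of the plot rather than the notebook's exact call.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy encoded gender column with the two codes 0 and 1.
gender = pd.Series([1, 0, 1, 1, 0], name="Gender")

# Count entries per code and keep the codes in ascending order.
counts = gender.value_counts().sort_index()

fig, ax = plt.subplots()
ax.bar(counts.index.astype(str), counts.values)
ax.set_xlabel("Gender")
ax.set_ylabel("count")
ax.set_title("Gender distribution")
# In a notebook the figure is rendered at the end of the cell.
```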
STEP 13:

Observation & Explanation:

• The bar chart represents food delivery app preferences among users in the dataset.

• The x-axis (Food Delivery Apps) includes Swiggy, Zomato, and Uber Eats, while
Dunzo is missing, likely because it had no recorded orders.

• The y-axis (count) indicates the number of times users placed orders using these
apps.

Key Insights:

1. Zomato is the most preferred app, followed closely by Swiggy.

2. Uber Eats has significantly fewer orders, indicating lower user preference or
availability issues.

3. Dunzo has no representation, which may suggest that it was rarely used for food
delivery in this dataset.

Explanation:

• This visualization helps understand customer preferences for different food
delivery services.

• Businesses can leverage this insight for targeted promotions, partnerships, and
marketing strategies with high-preference apps.
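
An app with no recorded orders drops out of a plain count, which would explain why Dunzo is absent from the chart. The sketch below, on made-up data, shows one way to keep such categories visible with a zero count; the expected_apps list is an assumption based on the apps named above.

```python
import pandas as pd

# Toy column of delivery apps used per order.
apps = pd.Series(["Zomato", "Swiggy", "Zomato", "Uber Eats", "Swiggy", "Zomato"])

# Apps we expect to see, including one that never appears in the data.
expected_apps = ["Swiggy", "Zomato", "Uber Eats", "Dunzo"]

# reindex adds the missing app with a count of 0 instead of omitting it.
counts = apps.value_counts().reindex(expected_apps, fill_value=0)
print(counts)
```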
Observation & Explanation:

• The histogram visualizes the distribution of "Year of Birth" in the dataset.

• The x-axis (Year of Birth) represents different birth years, while the y-axis
(Frequency) shows the count of occurrences.

Key Insights:

1. The histogram appears skewed, with most data concentrated on the left side and
very few values spread across the right.

2. The presence of unrealistically large values (e.g., close to 10⁸) suggests incorrect or
misformatted data.

3. This issue likely arises from misformatted entries (e.g., birth years stored as numbers
but entered as timestamps or other large values rather than four-digit years).
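
One possible way to isolate such values before plotting is to coerce the column to numbers and keep only a plausible range. The bounds 1900 to 2025 and the sample values are assumptions chosen for illustration.

```python
import pandas as pd

# Toy column with two valid-looking years, one oversized value, and one non-number.
years = pd.Series(["1998", "2001", "19980512", "abc", "1975"], name="Year of Birth")

# Non-numeric entries become NaN instead of raising an error.
numeric = pd.to_numeric(years, errors="coerce")

# Assumed plausible range for a birth year; NaN compares as False.
plausible = numeric.between(1900, 2025)

# Keep plausible years and mark everything else as missing.
clean_years = numeric.where(plausible)
print(clean_years)
```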
The box plot attempts to show the distribution of the cost of the first order, but there is an
issue with the data.

• The scatter plot appears to mix categorical and numerical data, leading to
misaligned axis labels.
• The x-axis (Cost of Order) includes both price ranges and cuisine types, suggesting
incorrect data mapping.
• The y-axis (Delivery Time) contains time intervals but also includes unrelated
categories like "yes", "no", and "maybe".
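
A quick check for this kind of column mix-up is to test each value against the format the column is supposed to have. The formats assumed below (price ranges like "100-200" and time intervals like "30-45 min") and the sample rows are hypothetical; the real dataset may use different labels.

```python
import pandas as pd

# Toy rows reproducing the problem: values from other columns leak in.
df = pd.DataFrame({
    "Cost of Order": ["100-200", "200-300", "Chinese", "300-400"],
    "Delivery Time": ["30-45 min", "yes", "15-30 min", "maybe"],
})

# Assumed formats: "<low>-<high>" for cost, "<low>-<high> min" for time.
cost_ok = df["Cost of Order"].str.fullmatch(r"\d+-\d+")
time_ok = df["Delivery Time"].str.fullmatch(r"\d+-\d+ min")

# Values that do not belong in each column.
bad_costs = df.loc[~cost_ok, "Cost of Order"].tolist()
bad_times = df.loc[~time_ok, "Delivery Time"].tolist()
print(bad_costs, bad_times)

# Rows where both columns look valid and are safe to plot.
valid = df[cost_ok & time_ok]
```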
