Capstone CLA1
CLA-1
FOOD DELIVERY PROJECT
STEP 1:
Defining the objective
STEP 2:
STEP 3:
STEP 5:
The dataset's column names have been replaced with numerical indices (0 to 89)
instead of the actual column names. This likely happened because the first row of the
Excel file was not set as the header when the file was read.
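A minimal sketch of one way to repair this, assuming the real header sits in the first data row. The sample frame and the helper name promote_first_row are hypothetical, not taken from the project code.

```python
import pandas as pd

# Hypothetical miniature of the mis-read sheet: integer column labels,
# with the real header stored as the first data row.
raw = pd.DataFrame([
    ["Gender", "Year of Birth", "App"],
    ["Male", "1998", "Swiggy"],
    ["Female", "2001", "Zomato"],
])

def promote_first_row(df: pd.DataFrame) -> pd.DataFrame:
    """Use the first row as column names and drop it from the data."""
    out = df.iloc[1:].reset_index(drop=True)
    out.columns = [str(c).strip() for c in df.iloc[0]]
    return out

fixed = promote_first_row(raw)
print(list(fixed.columns))
```

If the file can simply be read again, passing header=0 to pd.read_excel should have the same effect, since it tells pandas to treat the first row as the header.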
STEP 6:
Here’s what we have done so far:
▪ First, Second, and Third Orders: Details like time of order, delivery
app used, cost, type of food, cuisine, rating, and likelihood of
ordering again.
We have:
• Created lists of order numbers (1st, 2nd, 3rd) and corresponding column prefixes.
The output shows that the final_df now has standardized column names. Order-specific
columns have been merged with generic names, and an "Order Number" column has been
added to differentiate between orders.
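The reshaping described above could look roughly like the sketch below. The column names ("1st Order App" and so on) and the "Respondent" identifier are assumptions for illustration; the report does not list the actual names.

```python
import pandas as pd

# Hypothetical wide-format sample: one row per respondent,
# one group of columns per order.
wide = pd.DataFrame({
    "Respondent": [1, 2],
    "1st Order App": ["Swiggy", "Zomato"],
    "1st Order Cost": ["100-200", "200-300"],
    "2nd Order App": ["Zomato", "Swiggy"],
    "2nd Order Cost": ["200-300", "100-200"],
    "3rd Order App": ["Swiggy", "Swiggy"],
    "3rd Order Cost": ["100-200", "300-400"],
})

order_numbers = ["1st", "2nd", "3rd"]
prefixes = [f"{n} Order " for n in order_numbers]
id_cols = ["Respondent"]

frames = []
for number, prefix in zip(order_numbers, prefixes):
    # Pick this order's columns and strip the prefix to get generic names.
    cols = [c for c in wide.columns if c.startswith(prefix)]
    part = wide[id_cols + cols].rename(columns={c: c[len(prefix):] for c in cols})
    part = part.assign(**{"Order Number": number})
    frames.append(part)

# Stack the three per-order frames into one long table.
final_df = pd.concat(frames, ignore_index=True)
print(final_df)
```

Each respondent then contributes one row per order, which is why the row count grows while the column count shrinks.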
STEP 10:
The output of final_df.info() shows that the dataset has 1,568 entries and 19 columns.
Some columns have missing values, especially in order-related fields. The data type of
every column is object, so numeric fields still need to be converted before analysis.
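As a complement to final_df.info(), the same two facts (all-object dtypes and the share of missing values) can be checked directly. This is a sketch on a hypothetical three-row sample, not the project data.

```python
import pandas as pd

# Hypothetical sample in which every value was read as text.
sample = pd.DataFrame({
    "Year of Birth": ["1998", "2001", None],
    "Cost": ["100-200", None, "200-300"],
}, dtype=object)

# True when every column has the generic object dtype.
all_object = bool((sample.dtypes == object).all())

# Fraction of missing values per column, largest first.
missing_share = sample.isna().mean().sort_values(ascending=False)

print(all_object)
print(missing_share)
```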
STEP 11:
• Checked missing values to identify gaps.
• Dropped columns with >50% missing data.
• Filled missing values (mode for categorical, median for numerical).
• Removed duplicates to avoid redundancy.
• Converted data types (Year of Birth to numeric).
• Standardized text (lowercased, stripped spaces).
• Encoded categorical data (Gender: male → 1, female → 0).
• Final check & saved cleaned data as cleaned_data.csv.
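The cleaning steps listed above can be sketched as a single function. The sample data and the function name clean are hypothetical, and the step order is rearranged slightly (type conversion happens before filling, so that the median is computed on numbers); the project code may differ.

```python
import pandas as pd

def clean(df: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    out = df.copy()
    # Drop columns with more than 50% missing values.
    out = out.loc[:, out.isna().mean() <= 0.5]
    # Convert the named columns to numbers; unparseable entries become NaN.
    for col in numeric_cols:
        if col in out.columns:
            out[col] = pd.to_numeric(out[col], errors="coerce")
    # Standardize text (all remaining columns are assumed to hold text).
    for col in out.columns:
        if col not in numeric_cols:
            out[col] = out[col].str.strip().str.lower()
    # Fill gaps: median for numeric columns, mode for the rest.
    for col in out.columns:
        if col in numeric_cols:
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode()
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    # Remove duplicate rows, then encode gender (male -> 1, female -> 0).
    out = out.drop_duplicates().reset_index(drop=True)
    if "Gender" in out.columns:
        out["Gender"] = out["Gender"].map({"male": 1, "female": 0})
    return out

# Hypothetical raw sample.
raw = pd.DataFrame({
    "Gender": [" Male", "female ", "Male", None, " Male"],
    "Year of Birth": ["1998", "2001", "n/a", "1995", "1998"],
    "Mostly Empty": [None, None, None, "x", None],
})
cleaned = clean(raw, numeric_cols=["Year of Birth"])
print(cleaned)
# cleaned.to_csv("cleaned_data.csv", index=False)  # final save step
```

Filling before removing duplicates can make previously distinct rows identical, so the number of rows dropped depends on the order of these two steps.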
STEP 12:
• The x-axis (Gender) has two categories, 0 and 1, which were encoded from 'Female'
and 'Male' respectively.
• The y-axis (count) shows the number of entries for each gender.
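Because the axis shows the codes 0 and 1, it can help to map them back to labels before counting. A small sketch with hypothetical values, using the male → 1, female → 0 encoding from the cleaning step:

```python
import pandas as pd

# Hypothetical encoded gender values.
gender = pd.Series([1, 0, 1, 1, 0], name="Gender")

# Reverse of the encoding used during cleaning.
labels = {1: "Male", 0: "Female"}

# Count entries per readable label; these are the bar heights.
counts = gender.map(labels).value_counts()
print(counts)
# counts.plot(kind="bar") would draw the chart if matplotlib is installed.
```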
Explanation:
• The bar chart represents food delivery app preferences among users in the dataset.
• The x-axis (Food Delivery Apps) includes Swiggy, Zomato, and Uber Eats, while
Dunzo is missing, likely because it had no recorded orders.
• The y-axis (count) indicates the number of times users placed orders using these
apps.
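An app with no orders drops out of a plain count, which is one way Dunzo could disappear from the chart. The sketch below, on hypothetical order data, keeps zero-count apps visible by reindexing against the full list of apps.

```python
import pandas as pd

# Full list of apps covered by the survey.
apps = ["Swiggy", "Zomato", "Uber Eats", "Dunzo"]

# Hypothetical order records.
orders = pd.Series(
    ["Swiggy", "Zomato", "Swiggy", "Uber Eats", "Zomato", "Swiggy"],
    name="App",
)

# value_counts() omits apps that never occur; reindex adds them back with 0.
counts = orders.value_counts().reindex(apps, fill_value=0)
print(counts)
```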
Key Insights:
2. Uber Eats has significantly fewer orders, indicating lower user preference or
availability issues.
3. Dunzo has no representation, which may suggest that it was rarely used for food
delivery in this dataset.
Explanation:
• Businesses can leverage this insight for targeted promotions, partnerships, and
marketing strategies with high-preference apps.
Observation & Explanation:
• The x-axis (Year of Birth) represents different birth years, while the y-axis
(Frequency) shows the count of occurrences.
Key Insights:
1. The histogram is heavily right-skewed: most values are concentrated on the left side,
with very few spread across the right.
2. The presence of unrealistically large values (e.g., close to 10⁸) suggests incorrect or
misformatted data.
3. This issue likely arises from incorrect data formats (e.g., birth years stored as
timestamps or other large numbers instead of four-digit years).
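One way to isolate the suspicious entries before re-plotting is to convert the column to numbers and keep only a plausible range. The bounds 1920 to 2010 and the sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical raw values, including a date-like number and a text entry.
years = pd.Series(["1998", "2001", "19980512", "abc", "1975"], name="Year of Birth")

# Unparseable entries become NaN instead of raising an error.
numeric = pd.to_numeric(years, errors="coerce")

# Assumed plausible range for respondents' birth years; NaN counts as not plausible.
plausible = numeric.between(1920, 2010)

suspect = years[~plausible]                    # entries to inspect or fix by hand
clean_years = numeric[plausible].astype(int)   # values safe to plot

print(suspect.tolist())
print(clean_years.tolist())
```

Plotting clean_years instead of the raw column should remove the extreme values that stretch the x-axis.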
The box plot attempts to show the distribution of the cost of the first order, but there is an
issue with the data.
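The report does not state what the issue is. One possibility, consistent with the mixed values noted for the cost axis in this section, is that the cost column holds price ranges as text along with unrelated entries. A hedged sketch, assuming ranges are written like "100-200":

```python
import pandas as pd

# Hypothetical cost column mixing price ranges with unrelated values.
cost = pd.Series(["100-200", "200-300", "north indian", "300-400", "yes"], name="Cost")

# Extract lower and upper bounds; rows that do not match become NaN.
bounds = cost.str.extract(r"^\s*(\d+)\s*-\s*(\d+)\s*$").astype(float)

# Use the midpoint of each range as a numeric stand-in for plotting.
midpoint = bounds.mean(axis=1)

# Entries that are not price ranges and need to be traced back to their source.
invalid = cost[midpoint.isna()]

print(midpoint.dropna().tolist())
print(invalid.tolist())
```

A box plot of midpoint.dropna() is only an approximation, since the exact amount within each range is unknown.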
• The scatter plot appears to have mixed categorical and numerical data, leading to
misalignment in axis labels.
• The x-axis (Cost of Order) includes both price ranges and cuisine types, suggesting
incorrect data mapping.
• The y-axis (Delivery Time) contains time intervals but also includes unrelated
categories like "yes", "no", and "maybe",