Tasks For Students

The document outlines a data wrangling and preprocessing scenario for customer segmentation in an e-commerce business. It details the dataset structure, including customer demographics and purchasing behavior, and provides tasks for handling missing values, data transformation, feature engineering, and data visualization. The goal is to identify high-value customers and optimize marketing strategies through various analytical techniques.

Uploaded by

raguammu38


Data Wrangling and Preprocessing

Scenario: Customer Segmentation for an E-Commerce Business

Business Challenge:

An e-commerce company wants to segment its customers based on purchasing
behavior, demographics, and engagement. The goal is to identify high-value
customers, understand different buyer personas, and optimize marketing strategies.

Dataset Overview:

The dataset contains the following columns:

● Customer_ID – Unique identifier for each customer
● Age – Customer's age
● Gender – Male/Female/Other
● Annual_Income – Yearly income of the customer
● Spending_Score – Score based on purchasing behavior (0-100)
● Purchase_Frequency – Number of orders placed per month
● Last_Transaction_Days – Days since last purchase
● Preferred_Category – Customer’s most frequently purchased product
category

Tasks for Students

1️⃣ Handling Missing Values & Duplicates

● Identify and count missing values in the dataset.
● Remove rows with missing values OR fill missing numerical values using
the mean/median and categorical values with the mode.
● Drop duplicate entries if any exist.

2️⃣ Data Transformation: Scaling & Encoding

● Normalize Annual_Income and Spending_Score using Min-Max Scaling.
● Standardize Purchase_Frequency using Z-score Normalization.
● Convert Gender into numerical values using One-Hot Encoding.
● Label encode the Preferred_Category column.

3️⃣ Feature Engineering

● Create a new feature: Define a Customer Loyalty Score based on
Spending_Score and Purchase_Frequency (e.g., High, Medium, Low).
● Binning: Group customers into different income levels (e.g., Low, Medium,
High).
● Create an engagement metric: Combine Last_Transaction_Days and
Purchase_Frequency to categorize customers as Active, Dormant, or
Churned.
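
The three feature-engineering tasks above could be approached as in the sketch below. The sample rows are made up for illustration, and the equal weighting in the loyalty score, the 33/66 cut points, the income thresholds (40,000 and 80,000), and the engagement rules (more than 180 days means Churned; at most 30 days with at least one order per month means Active) are all assumptions, not values given in the assignment. The new column names Loyalty_Score, Income_Level, and Engagement_Status are likewise hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows for illustration; column names follow the dataset overview.
df = pd.DataFrame({
    "Customer_ID": [1, 2, 3, 4, 5, 6],
    "Annual_Income": [25000, 48000, 61000, 90000, 120000, 33000],
    "Spending_Score": [15, 40, 55, 80, 95, 60],
    "Purchase_Frequency": [1, 2, 4, 6, 8, 0],
    "Last_Transaction_Days": [200, 45, 20, 5, 10, 400],
})

# Customer Loyalty Score: average of Spending_Score (0-100) and
# Purchase_Frequency rescaled to 0-100. Equal weights and the 33/66
# cut points are assumptions.
freq_scaled = 100 * df["Purchase_Frequency"] / df["Purchase_Frequency"].max()
loyalty_value = (df["Spending_Score"] + freq_scaled) / 2
df["Loyalty_Score"] = pd.cut(
    loyalty_value,
    bins=[0, 33, 66, 100],
    labels=["Low", "Medium", "High"],
    include_lowest=True,
)

# Income binning with assumed thresholds at 40,000 and 80,000.
df["Income_Level"] = pd.cut(
    df["Annual_Income"],
    bins=[0, 40000, 80000, float("inf")],
    labels=["Low", "Medium", "High"],
)

# Engagement status from recency and frequency (assumed rules).
# np.select picks the first matching condition, otherwise the default.
conditions = [
    df["Last_Transaction_Days"] > 180,
    (df["Last_Transaction_Days"] <= 30) & (df["Purchase_Frequency"] >= 1),
]
df["Engagement_Status"] = np.select(conditions, ["Churned", "Active"], default="Dormant")

print(df[["Customer_ID", "Loyalty_Score", "Income_Level", "Engagement_Status"]])
```

Fixed thresholds keep the labels easy to explain; pd.qcut could be used instead if equally sized groups are preferred.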

4️⃣ Data Visualization

1. Univariate Analysis (Distribution of Individual Features)

● Age Distribution – Use a histogram or KDE plot to show the distribution
of customer ages.
● Annual Income & Spending Score – Use box plots to detect
income/spending score outliers.
● Preferred Category Count – Use a bar plot to visualize the frequency of
product categories.
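
A minimal sketch of the univariate plots listed above, using only pandas and Matplotlib. The sample rows and the helper name plot_univariate are hypothetical; a histogram is used for Age, and a KDE plot would be an equally valid choice.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows for illustration; column names follow the dataset overview.
df = pd.DataFrame({
    "Age": [22, 35, 41, 29, 56, 63, 38, 47],
    "Annual_Income": [25000, 48000, 61000, 90000, 120000, 33000, 52000, 75000],
    "Spending_Score": [15, 40, 55, 80, 95, 60, 45, 70],
    "Preferred_Category": ["Electronics", "Fashion", "Fashion", "Home",
                           "Electronics", "Fashion", "Home", "Electronics"],
})


def plot_univariate(data):
    """Draw the four univariate plots on a 2x2 grid and return the figure."""
    fig, axes = plt.subplots(2, 2, figsize=(10, 8))

    # Histogram of customer ages
    axes[0, 0].hist(data["Age"].dropna(), bins=5, edgecolor="black")
    axes[0, 0].set_title("Age Distribution")

    # Box plots expose outliers in income and spending score
    axes[0, 1].boxplot(data["Annual_Income"].dropna().to_numpy())
    axes[0, 1].set_title("Annual Income")
    axes[1, 0].boxplot(data["Spending_Score"].dropna().to_numpy())
    axes[1, 0].set_title("Spending Score")

    # Bar plot of how often each product category appears
    counts = data["Preferred_Category"].value_counts()
    axes[1, 1].bar(counts.index.tolist(), counts.to_numpy())
    axes[1, 1].set_title("Preferred Category Count")

    fig.tight_layout()
    return fig


fig = plot_univariate(df)
```

Call plt.show() afterwards to display the figure in a script, or let the notebook render it.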

2. Bivariate Analysis (Relationship Between Two Features)

● Income vs Spending Score – Use a scatter plot to see spending trends
across income levels.
● Gender vs Spending Score – Use a box plot to compare spending habits
across genders.
● Correlation Heatmap – Show relationships between numerical features
using a heatmap.
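
One possible way to draw the three bivariate plots above with pandas and Matplotlib alone. The sample rows and the helper name plot_bivariate are hypothetical; the heatmap is drawn with imshow here, and Seaborn's heatmap function would serve the same purpose.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows for illustration; column names follow the dataset overview.
df = pd.DataFrame({
    "Customer_ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Age": [22, 35, 41, 29, 56, 63, 38, 47],
    "Gender": ["Male", "Female", "Female", "Other", "Male", "Female", "Other", "Male"],
    "Annual_Income": [25000, 48000, 61000, 90000, 120000, 33000, 52000, 75000],
    "Spending_Score": [15, 40, 55, 80, 95, 60, 45, 70],
    "Purchase_Frequency": [1, 2, 4, 6, 8, 0, 3, 5],
})


def plot_bivariate(data):
    """Draw scatter, grouped box plot, and correlation heatmap; return (fig, corr)."""
    fig, axes = plt.subplots(1, 3, figsize=(16, 4))

    # Scatter plot: spending trends across income levels
    axes[0].scatter(data["Annual_Income"], data["Spending_Score"])
    axes[0].set_xlabel("Annual_Income")
    axes[0].set_ylabel("Spending_Score")

    # Box plot: one box of Spending_Score per gender
    groups = {g: grp["Spending_Score"].to_numpy() for g, grp in data.groupby("Gender")}
    axes[1].boxplot(list(groups.values()))
    axes[1].set_xticklabels(list(groups.keys()))
    axes[1].set_ylabel("Spending_Score")

    # Correlation heatmap over numerical features (the ID column carries no meaning)
    corr = data.drop(columns=["Customer_ID"]).select_dtypes("number").corr()
    image = axes[2].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    axes[2].set_xticks(range(len(corr.columns)))
    axes[2].set_xticklabels(corr.columns, rotation=45, ha="right")
    axes[2].set_yticks(range(len(corr.columns)))
    axes[2].set_yticklabels(corr.columns)
    fig.colorbar(image, ax=axes[2])

    fig.tight_layout()
    return fig, corr


fig, corr = plot_bivariate(df)
print(corr.round(2))
```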

3. Multivariate Analysis

● Pair Plot – Use Seaborn’s pairplot to visualize multiple relationships in the
dataset.

4. Customer Segmentation Insights

● Loyalty Score Distribution – Use a bar plot to show the count of High,
Medium, and Low loyalty customers.
● Engagement Status – Use a pie chart to show the percentage of Active,
Dormant, and Churned customers.
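
The two segmentation charts above might be drawn as follows. This sketch assumes the labels from the feature-engineering step are stored in columns named Loyalty_Score and Engagement_Status (hypothetical names), and the sample labels are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical labels standing in for the output of the feature-engineering step.
df = pd.DataFrame({
    "Loyalty_Score": ["High", "Low", "Medium", "High", "Low", "Low", "Medium", "High"],
    "Engagement_Status": ["Active", "Churned", "Dormant", "Active",
                          "Churned", "Dormant", "Active", "Active"],
})

# Count customers per loyalty level in a fixed High/Medium/Low order
loyalty_counts = (
    df["Loyalty_Score"].value_counts().reindex(["High", "Medium", "Low"], fill_value=0)
)

# Percentage of customers in each engagement status
engagement_share = df["Engagement_Status"].value_counts(normalize=True) * 100

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

ax_bar.bar(loyalty_counts.index.tolist(), loyalty_counts.to_numpy())
ax_bar.set_title("Loyalty Score Distribution")
ax_bar.set_ylabel("Number of customers")

ax_pie.pie(
    engagement_share.to_numpy(),
    labels=engagement_share.index.tolist(),
    autopct="%1.1f%%",
)
ax_pie.set_title("Engagement Status")

fig.tight_layout()
```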

5. Interactive Customer Segmentation Analysis with Plotly

● Create an Interactive Scatter Plot of Annual Income vs Spending Score

● Color customers based on their Loyalty Score (High, Medium, Low)
● Use hover effects to display customer details
● Enhance visualization interactivity using Plotly Express

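Answers:

import pandas as pd

# Assumes the dataset has already been loaded into a DataFrame named df.

# Identify and count missing values
print("Missing Values Count:\n", df.isnull().sum())

# Option 1: Remove rows with missing values
df_dropped = df.dropna()
print("\nData after dropping missing values:\n", df_dropped)

# Option 2: Fill missing values
df_filled = df.copy()

# Fill numerical columns with the mean (use .median() if outliers are a concern)
num_cols = ['Age', 'Annual_Income', 'Spending_Score',
            'Purchase_Frequency', 'Last_Transaction_Days']
df_filled[num_cols] = df_filled[num_cols].fillna(df_filled[num_cols].mean())

# Fill categorical columns with the mode
for col in ['Gender', 'Preferred_Category']:
    df_filled[col] = df_filled[col].fillna(df_filled[col].mode()[0])

print("\nData after filling missing values:\n", df_filled)

# Drop duplicate entries
df_no_duplicates = df_filled.drop_duplicates()
print("\nData after dropping duplicates:\n", df_no_duplicates)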

from sklearn.preprocessing import MinMaxScaler, StandardScaler, LabelEncoder

# Continue from the cleaned data (missing values filled, duplicates dropped)
df = df_no_duplicates.copy()

# Normalize Annual_Income & Spending_Score using Min-Max Scaling
scaler = MinMaxScaler()
df[['Annual_Income', 'Spending_Score']] = scaler.fit_transform(
    df[['Annual_Income', 'Spending_Score']])

# Standardize Purchase_Frequency using Z-score Normalization
scaler = StandardScaler()
df[['Purchase_Frequency']] = scaler.fit_transform(df[['Purchase_Frequency']])

# Convert Gender into numerical values using One-Hot Encoding
# drop_first=True avoids the dummy variable trap; dtype=int yields 0/1 columns
df = pd.get_dummies(df, columns=['Gender'], drop_first=True, dtype=int)

# Label encode the Preferred_Category column
label_encoder = LabelEncoder()
df['Preferred_Category'] = label_encoder.fit_transform(df['Preferred_Category'])

# Display the processed DataFrame
print(df)
