ML Assignment 1

Uploaded by

Aadrika

Customer Segmentation Using K-Means & DBSCAN (E-Commerce)
Efrem Joseph Charls 221210037
Gautham Binoy Dev 221210039
Immanuel Thomas Francis 221210050
Table of contents
I. Methodology: Statistical methods used to explore & model data.
II. Data Overview: Brief view of subject matter - the dataset.
III. Preprocessing: Data cleaning, normalization, etc.
IV. Feature Engineering: Creating new ones & refining those that exist.
V. Clustering Analysis: Identifying distinct groups & data patterns.
VI. Conclusions: Key insights, trends & patterns derived.
I

Methodology
Statistical methods used to explore & model data.
Methodology
Abstract
Data Collection + EDA: Loading the dataset & performing Exploratory Data Analysis to understand data characteristics.
Preprocessing & Feature Engineering: Preparing data for clustering and creating new features to better capture underlying patterns within data.
Clustering + Model Evaluation: Segmenting data via K-Means & DBSCAN while assessing cluster quality using various metrics.
Insight Derivation & Solutioning: Developing actionable insights from the identified key segments, trends & anomalies in data.
II

Data Overview
Brief view of subject matter - the dataset.
Data Overview
CustomerID: A unique identifier assigned to each customer. This field is critical for grouping transactions by customer and performing customer-level analysis.
Country: The country where the customer resides, which can give insight into geographical purchasing patterns.
InvoiceNo: A unique number assigned to each transaction. This field helps to identify individual purchase events and track customer orders.
StockCode: A product identifier, representing the items purchased in each transaction.
Description: A brief description of each product or stock item.
Quantity: The number of units of each product sold in a particular transaction.
UnitPrice: The price of a single unit of the product sold.
InvoiceDate: The date and time at which the transaction occurred.

Records & Transactions within the Dataset: 4,06,829 rows (406,829)
III

Preprocessing
Data cleaning, normalization & data transformation.
Dataset Challenges & Considerations
Missing CustomerIDs: Rows without CustomerID are incomplete and cannot be used for customer segmentation.
Anomalies & Outliers: Unusual transactions, such as refunds recorded as negative values, should not distort the analysis.
Skewed Data: Data may be highly skewed, with a small number of customers contributing to a large % of revenue.
Preprocessing Steps

Handling Missing Data

Rows with missing values were removed as they could not contribute to
customer-level analysis. Retaining such rows would only introduce further ambiguity.

Data Type Conversion

Correct data types are required for efficient processing & analysis. As such,
features from the dataset were mapped to their appropriate data types.

Data Cleaning & Integrity Checks

Ensuring that the data is free from inconsistencies & errors by removing
duplicates, addressing anomalies & detecting outliers.
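The steps above could be sketched roughly as follows. This is a minimal illustration using pandas on a tiny made-up table, not the code behind these slides: the helper name clean_transactions, the sample rows, and the rule that non-positive Quantity or UnitPrice marks a refund or bad record are all assumptions.

```python
import pandas as pd

def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop missing IDs, fix dtypes, remove duplicates and refunds."""
    # Handling missing data: rows without a CustomerID cannot be tied to a customer.
    out = df.dropna(subset=["CustomerID"]).copy()
    # Data type conversion: integer IDs and real timestamps.
    out["CustomerID"] = out["CustomerID"].astype(int)
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"])
    # Integrity checks: exact duplicate rows are removed.
    out = out.drop_duplicates()
    # Assumption: non-positive quantities or prices are refunds / bad records.
    out = out[(out["Quantity"] > 0) & (out["UnitPrice"] > 0)]
    return out.reset_index(drop=True)

# Made-up sample: one duplicate row, one refund, one row without CustomerID.
raw = pd.DataFrame({
    "InvoiceNo": ["1001", "1001", "1002", "1003", "1004"],
    "StockCode": ["A", "A", "B", "A", "C"],
    "Quantity": [2, 2, 1, -2, 5],
    "UnitPrice": [3.0, 3.0, 10.0, 3.0, 1.5],
    "InvoiceDate": ["2011-01-05 10:00", "2011-01-05 10:00",
                    "2011-02-01 09:30", "2011-02-03 12:00", "2011-03-01 08:00"],
    "CustomerID": [17850.0, 17850.0, 13047.0, 17850.0, None],
})
clean = clean_transactions(raw)
print(clean)
```

Whether refunds should be dropped or netted against the original purchase is a design choice; dropping them is simply the shorter option for a sketch.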
Preprocessing Steps - Contd

Feature Scaling
Clustering Algorithms are sensitive to the scale / range of data. Larger range
features can dominate smaller range features. (We used Standardization)
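As a small illustration of standardization, the sketch below rescales each column to zero mean and unit variance. It assumes scikit-learn's StandardScaler; the slides do not say which library was used, and the numbers are made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: columns with very different ranges.
X = np.array([
    [10.0, 1.0, 50.0],
    [200.0, 3.0, 400.0],
    [35.0, 12.0, 9000.0],
    [5.0, 30.0, 25000.0],
])

# z = (x - column mean) / column standard deviation
X_scaled = StandardScaler().fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately 0 for every column
print(X_scaled.std(axis=0))   # approximately 1 for every column
```

After scaling, no single column dominates a Euclidean distance purely because of its units.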

Final Prepared Dataset

After preprocessing, the final dataset has one row per unique customer and is
ready for further statistical analysis.
IV

Feature Engineering
Creating new ones & refining those that exist.
Key Features Engineered
Recency - R
Number of days since the customer's last order; represents how recently they've
interacted with the platform.
Recency = Current Date - Last Order Date

Frequency - F
Number of orders a customer placed within a specific time period; indicates
how frequent their purchases are.
Frequency = Count <Order ID>

Monetary - M
The total monetary value of the purchases made by the customer;
reflects their contribution to overall revenue.
Monetary = ∑ Order Value
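The three formulas above can be computed per customer with a single group-by. The following is a sketch on made-up transactions using pandas; the OrderValue column (Quantity x UnitPrice) and the choice of "current date" as the day after the last transaction are assumptions, not details taken from the slides.

```python
import pandas as pd

# Made-up transactions using the column names from the Data Overview.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2, 3],
    "InvoiceNo": ["A1", "A1", "A2", "B1", "B2", "C1"],
    "Quantity": [2, 1, 3, 1, 4, 10],
    "UnitPrice": [5.0, 10.0, 2.0, 20.0, 2.5, 1.0],
    "InvoiceDate": pd.to_datetime([
        "2011-11-01", "2011-11-01", "2011-12-01",
        "2011-10-15", "2011-11-20", "2011-09-01",
    ]),
})

# Assumption: order value of a line item = Quantity * UnitPrice.
tx["OrderValue"] = tx["Quantity"] * tx["UnitPrice"]
# Assumption: "current date" = one day after the latest transaction in the data.
snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),  # days since last order
    Frequency=("InvoiceNo", "nunique"),                            # distinct orders
    Monetary=("OrderValue", "sum"),                                # total spend
)
print(rfm)
```

Counting distinct InvoiceNo values rather than rows matters here, because one order can span several line items.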
V

Clustering Analysis
Identifying distinct groups & data patterns.
Clustering
Clustering in ML is an unsupervised learning technique used to group similar data points into clusters. Unlike supervised learning, clustering does not require labelled data. It automatically discovers inherent groupings based on the characteristics of the data.

Common clustering algorithms include K-Means, DBSCAN, Hierarchical, Spectral, etc., each with its own set of unique strengths and weaknesses.
K-Means & DBSCAN
Theoretical Framework - Similarities & Differences

K-Means
● Best with spherical / circular clusters.
● Struggles with irregular shapes.
● Sensitive to noise & outliers. Every point is assigned to a cluster.
● Requires number of clusters to be specified.

Similarities
● Unsupervised Learning: Both are unsupervised, i.e. they don't rely on labelled data to group data points.
● Distance Based: Both rely on distance metrics (usually Euclidean distance) for grouping data.

DBSCAN
● Can handle arbitrary-shaped clusters.
● Does not struggle with irregular shapes.
● Identifies outliers and labels them. Does not force data points into clusters.
● Automatically determines number of clusters based on data density.
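The difference in how the two algorithms treat outliers can be seen on a toy example. This sketch assumes scikit-learn and synthetic data (two tight blobs plus one far-away point); the eps and min_samples values are illustrative choices, not the settings used in this project.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Synthetic data: two tight groups and a single distant outlier (last row).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(30, 2))
blob_b = rng.normal(loc=[3.0, 3.0], scale=0.1, size=(30, 2))
outlier = np.array([[10.0, -10.0]])
X = np.vstack([blob_a, blob_b, outlier])

# K-Means: the number of clusters must be given, and every point gets a cluster.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN: clusters emerge from density; sparse points are labelled -1 (noise).
db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

print("K-Means label of the outlier:", km_labels[-1])
print("DBSCAN label of the outlier: ", db_labels[-1])
print("DBSCAN clusters found:", len(set(db_labels.tolist()) - {-1}))
```

K-Means has to place the distant point in one of its two clusters, while DBSCAN marks it as noise and still recovers both dense groups without being told how many there are.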
Optimal Number of Clusters (K-Means)
[Figures: Elbow Method plot; Silhouette Method plot]
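Both methods amount to fitting K-Means for a range of k and comparing a score. The sketch below assumes scikit-learn and synthetic blob data rather than the project's dataset; the range of k values is an arbitrary choice.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [6, 6], [0, 6]],
    cluster_std=0.5,
    random_state=0,
)

inertias, silhouettes = {}, {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_                         # elbow method: plot this against k
    silhouettes[k] = silhouette_score(X, model.labels_)  # silhouette method: pick the maximum

best_k = max(silhouettes, key=silhouettes.get)
print("inertia by k:", {k: round(v, 1) for k, v in inertias.items()})
print("best k by silhouette:", best_k)
```

For the elbow method, the k to choose is where the inertia curve stops dropping sharply; the silhouette method simply takes the k with the highest average score.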
Clusters Formed / Customer Segments
[Figure: Results]
Evaluation Metrics

01 Silhouette Score: Evaluates the quality of clusters by measuring how similar a data point is to its own cluster compared to other clusters.
02 Inertia: Measures the compactness of clusters via the sum of squared distances (SSD) between data points & cluster centroids.
03 Davies-Bouldin Index: Quantifies the average similarity ratio of each cluster with its most similar cluster. Lower value -> better; higher value -> worse.

[Figure: Metrics]
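For reference, the three metrics are commonly defined as below. These are the standard textbook forms; the notation (a, b, C_k, mu_k, sigma_k) is chosen here for illustration and does not come from the slides.

```latex
% Silhouette of point i
% a(i): mean distance from i to the other points in its own cluster
% b(i): smallest mean distance from i to the points of any other cluster
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1

% Inertia (SSD): C_k is cluster k and \mu_k its centroid
\text{Inertia} = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2

% Davies-Bouldin index: \sigma_k is the mean distance of points in C_k to \mu_k
\text{DB} = \frac{1}{K} \sum_{i=1}^{K} \max_{j \ne i}
            \frac{\sigma_i + \sigma_j}{\lVert \mu_i - \mu_j \rVert}
```

A higher silhouette score is better, while lower inertia and a lower Davies-Bouldin index are better. Inertia shrinks whenever K grows, so it is read through the elbow of its curve rather than minimized outright.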
VI

Conclusions
Key insights, trends & patterns derived.
Cluster & Result Interpretations
K-Means: The K-Means algorithm produced clusters that were relatively well-defined, as indicated by an average silhouette score of 0.61. The clusters formed showed clear separation based on the frequency and monetary value of customer purchases.
DBSCAN: The DBSCAN algorithm, with a higher silhouette score of 0.66, indicated that the clusters formed were comparatively better defined. The density-based nature of DBSCAN allowed it to identify clusters that may not have been as apparent with K-Means.
Noise: Customers not classified into any cluster, which could represent outliers or less engaged customers.

Clusters Formed: Low, Medium, High Value

THANK YOU
Feel free to ask Questions.
