Workshop Project Report
Submitted by:
Divyanshu Khandelwal_2115500055_3S_Class roll no. :- 22
Suryansh Agrawal_2115500147_3S_Class roll no. :- 42
Sonal Mittal_2115500140_3S_Class roll no. :- 40
Anshika Singh_2115500024_3S_Class roll no. :- 10
Introduction:
Customer segmentation is a crucial aspect of marketing
strategies. Clustering algorithms aid in identifying patterns
within data to categorize customers into groups with similar
traits. This project uses two clustering algorithms, DBSCAN
and K-Means, to segment customers based on their
purchasing behavior.
Dataset:
The dataset used in this project contains transactional records
from a retail store. It includes attributes such as customer ID,
purchase history, frequency of purchases, and total amount
spent.
Methodology:
Data Preprocessing
2. K-Means Clustering
- K-Means partitions the data into K clusters by assigning each
point to its nearest centroid.
- Parameters: Number of clusters (K).
- Advantages: Simple, scalable, and efficient for large
datasets.
- Implementation: scikit-learn's KMeans class.
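
The K-Means step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the two-column feature matrix (purchase frequency, total amount spent) is synthetic stand-in data, and the choice of K = 2 and of StandardScaler for scaling are assumptions made only for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the retail data (assumption, not the real dataset):
# columns are [purchase frequency, total amount spent].
rng = np.random.default_rng(0)
occasional = rng.normal(loc=[5, 200], scale=[1, 30], size=(50, 2))
frequent = rng.normal(loc=[20, 1500], scale=[2, 100], size=(50, 2))
X = np.vstack([occasional, frequent])

# Scale the features so the spend column does not dominate the distances.
X_scaled = StandardScaler().fit_transform(X)

# K (n_clusters) is the main parameter; 2 is chosen here for illustration.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

print("Cluster sizes:", np.bincount(labels))
print("Centroids (scaled units):")
print(kmeans.cluster_centers_)
```

Each customer receives one label, and the centroids summarize the typical member of each segment in scaled units.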
Model Building and Evaluation
DBSCAN Model
- Identified clusters while varying the epsilon (neighborhood
radius) and minimum-points parameters.
- Evaluated silhouette scores and visualized clusters using
scatter plots.
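
A parameter sweep of this kind might look like the sketch below. The data is synthetic, and the specific epsilon and minimum-points values are illustrative assumptions rather than the values used in the project. Noise points (label -1) are excluded before scoring, which is one common convention, not the only one.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): [purchase frequency, total spent].
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[5, 200], scale=[1, 30], size=(50, 2)),
    rng.normal(loc=[20, 1500], scale=[2, 100], size=(50, 2)),
])
X_scaled = StandardScaler().fit_transform(X)

results = []
for eps in (0.2, 0.3, 0.5):            # illustrative epsilon values
    for min_samples in (3, 5, 10):     # illustrative minimum-points values
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled)
        mask = labels != -1            # drop noise points before scoring
        n_clusters = len(set(labels[mask].tolist()))
        # The silhouette score is only defined for at least 2 clusters
        # and fewer clusters than scored points.
        if 2 <= n_clusters < mask.sum():
            score = silhouette_score(X_scaled[mask], labels[mask])
        else:
            score = float("nan")
        results.append((eps, min_samples, n_clusters, score))

for eps, min_samples, n_clusters, score in results:
    print(f"eps={eps} min_samples={min_samples} "
          f"clusters={n_clusters} silhouette={score:.3f}")
```

Settings that yield fewer than two clusters are reported as NaN rather than skipped, so every combination stays visible in the comparison.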
K-Means Model
- Explored different K values to find optimal clusters.
- Assessed the inertia scores and visualized clusters using
scatter plots.
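
The search over K can be sketched as a simple loop that records the inertia for each candidate value (often called the elbow method). The data and the range of K values below are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): [purchase frequency, total spent].
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[5, 200], scale=[1, 30], size=(50, 2)),
    rng.normal(loc=[20, 1500], scale=[2, 100], size=(50, 2)),
])
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means for each candidate K and record the inertia.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"K={k}: inertia={value:.2f}")
```

The K at which the inertia stops dropping sharply is a reasonable candidate for the number of clusters; plotting inertia against K makes that bend easier to see.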
Comparative Study
Performance Metrics
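- Silhouette Score: Measures how compact each cluster is and
how well separated it is from the other clusters. Scores range
from -1 to 1; higher scores indicate better-defined clusters.
- Inertia: The sum of squared distances from each point to its
assigned centroid. Lower values indicate tighter clusters, but
inertia always decreases as K grows, so it is only meaningful
when comparing K-Means runs against each other.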
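
To make the two metrics concrete, the sketch below computes the inertia by hand and checks it against scikit-learn's value, then computes the silhouette score for the same clustering. The data is synthetic and K = 2 is an assumption for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): [purchase frequency, total spent].
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[5, 200], scale=[1, 30], size=(50, 2)),
    rng.normal(loc=[20, 1500], scale=[2, 100], size=(50, 2)),
])
X_scaled = StandardScaler().fit_transform(X)

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

# Inertia: sum of squared distances from each point to its assigned centroid.
assigned_centroids = model.cluster_centers_[model.labels_]
manual_inertia = float(((X_scaled - assigned_centroids) ** 2).sum())

# Silhouette score: mean over all points, always between -1 and 1.
silhouette = silhouette_score(X_scaled, model.labels_)

print(f"manual inertia:  {manual_inertia:.4f}")
print(f"sklearn inertia: {model.inertia_:.4f}")
print(f"silhouette:      {silhouette:.4f}")
```

Because the silhouette score needs only the points and their labels, it can be applied to both DBSCAN and K-Means results, whereas inertia is defined in terms of centroids and so applies to K-Means only.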
Results and Observations
---