
Workshop Project Report

This project report describes the use of the DBSCAN and K-Means clustering algorithms to segment customers by purchasing behavior, using transactional record data. The methodology covers data preprocessing, implementation of both algorithms in Python, and a comparison of model performance using metrics such as the silhouette score and inertia. The report concludes that K-Means was simple to apply and identified well-defined customer clusters, while DBSCAN required careful parameter tuning. Key results and recommendations are provided.

Uploaded by Rajveer Singh


Year of Submission: 2023-24

Submitted by:
Divyanshu Khandelwal, 2115500055, 3S, Class Roll No. 22
Suryansh Agrawal, 2115500147, 3S, Class Roll No. 42
Sonal Mittal, 2115500140, 3S, Class Roll No. 40
Anshika Singh, 2115500024, 3S, Class Roll No. 10

Department of Computer Engineering and Applications
GLA University, Mathura

Project Report: Customer Segmentation through Clustering Analysis

Introduction:
Customer segmentation is a crucial part of marketing strategy. Clustering algorithms identify patterns in data and group customers with similar traits together. This project applies two clustering algorithms, DBSCAN and K-Means, to segment customers based on their purchasing behavior.

Dataset:
The dataset used in this project contains transactional records
from a retail store. It includes attributes such as customer ID,
purchase history, frequency of purchases, and total amount
spent.
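The dataset description mixes transaction-level attributes (purchase history) with customer-level ones (purchase frequency, total amount spent). A minimal pandas sketch of how per-customer features could be derived from transaction rows follows; the column names customer_id, invoice_id and amount, and the toy values, are hypothetical and not taken from the project's dataset.

```python
import pandas as pd

# Hypothetical transaction-level records; column names and values are
# illustrative assumptions, not the project's actual data.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "invoice_id":  [10, 11, 12, 13, 14, 15],
    "amount":      [20.0, 35.0, 250.0, 5.0, 12.5, 7.5],
})

# One row per customer: number of distinct purchases and total amount spent.
customers = (
    transactions.groupby("customer_id")
    .agg(purchase_frequency=("invoice_id", "nunique"),
         total_spent=("amount", "sum"))
    .reset_index()
)
print(customers)
```

Each row of `customers` then describes one customer, which is the unit the clustering algorithms segment.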

Methodology:

Data Preprocessing

1. Data Cleaning: Removing duplicates, handling missing values, and ensuring data consistency.

2. Feature Selection: Choosing the attributes relevant to clustering, such as purchase frequency and total spending.

3. Feature Scaling: Normalizing numerical features to a common scale so that no single feature (for example, total spending, which has much larger values than purchase frequency) dominates the distance calculations.
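The three preprocessing steps can be sketched with pandas and scikit-learn's StandardScaler. This is an illustration under stated assumptions, not the project's code: the table and its column names are hypothetical, missing values are handled by dropping rows (imputation is an equally valid option the report does not specify), and "normalizing" is interpreted as standardization to zero mean and unit variance.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical per-customer table; column names are illustrative assumptions.
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4, 5],
    "purchase_frequency": [2, 1, 1, 3, np.nan, 8],
    "total_spent": [55.0, 250.0, 250.0, 25.0, 40.0, 900.0],
})

# 1. Data cleaning: drop exact duplicate rows, then rows with missing values.
clean = raw.drop_duplicates().dropna()

# 2. Feature selection: keep only the behavioral attributes used for clustering.
features = clean[["purchase_frequency", "total_spent"]]

# 3. Feature scaling: zero mean and unit variance per column, so that
#    total_spent (large values) does not dominate distance computations.
X = StandardScaler().fit_transform(features)
print(X.shape)
```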
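
Clustering Algorithms

1. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- DBSCAN identifies clusters based on density. A point with at least MinPts points (counting itself) within distance ε is a core point; clusters are formed by linking core points that lie within ε of one another, together with their neighboring border points. Points that are not reachable from any core point are labeled as noise.
- Parameters: Epsilon (ε) and Minimum Points (MinPts).
- Advantages: Robust to outliers, which are labeled as noise rather than forced into a cluster, and does not require the number of clusters to be specified in advance.
- Implementation: Using scikit-learn's DBSCAN class, whose eps and min_samples parameters correspond to ε and MinPts.

2. K-Means Clustering
- K-Means partitions data into K clusters based on proximity to centroids: each point is assigned to its nearest centroid, each centroid is recomputed as the mean of its assigned points, and the two steps repeat until the centroids stabilize.
- Parameters: Number of clusters (K).
- Advantages: Simple, scalable, and efficient for large datasets.
- Implementation: Using scikit-learn's KMeans class, whose n_clusters parameter corresponds to K.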
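A minimal sketch of both scikit-learn estimators side by side, on synthetic blob data from make_blobs as a stand-in for scaled customer features. The data and the parameter values (eps=0.5, min_samples=5, n_clusters=3) are assumptions chosen for illustration. It shows the practical difference described above: DBSCAN is configured by density parameters and reports noise with the label -1, while K-Means must be told K and returns one centroid per cluster.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for scaled customer features (NOT the project's data).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=42)

# DBSCAN: eps is the neighborhood radius (epsilon) and min_samples is MinPts.
# No cluster count is given; points in low-density regions are labeled -1 (noise).
db = DBSCAN(eps=0.5, min_samples=5).fit(X)
db_labels = db.labels_
n_db_clusters = len(set(db_labels)) - (1 if -1 in db_labels else 0)
n_noise = int(np.sum(db_labels == -1))

# K-Means: the number of clusters K is passed up front as n_clusters.
# n_init is set explicitly because its default changed across scikit-learn versions.
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
km_labels = km.labels_

print("DBSCAN clusters found:", n_db_clusters, "| noise points:", n_noise)
print("K-Means centroids:\n", km.cluster_centers_)
```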

Model Building and Evaluation

DBSCAN Model
- Identified clusters across a range of epsilon and minimum-points values.
- Evaluated silhouette scores and visualized the clusters using scatter plots.
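The search over epsilon and minimum points could look like the following grid sketch, on synthetic, standardized data. The grid values are illustrative assumptions. Noise points (label -1) are excluded before computing the silhouette score, which is one common convention rather than something the report specifies; configurations yielding fewer than two clusters are skipped because the silhouette score is undefined for them.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic, standardized stand-in data; the eps/min_samples grid is illustrative.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
X = StandardScaler().fit_transform(X)

results = []
for eps in (0.1, 0.2, 0.3, 0.5):
    for min_samples in (3, 5, 10):
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        mask = labels != -1                      # drop noise before scoring
        n_clusters = len(set(labels[mask]))
        # silhouette_score needs 2 <= n_clusters <= n_samples - 1
        if 2 <= n_clusters < mask.sum():
            score = silhouette_score(X[mask], labels[mask])
            results.append((eps, min_samples, n_clusters, score))

# Show the three best-scoring configurations.
for eps, ms, k, s in sorted(results, key=lambda r: r[3], reverse=True)[:3]:
    print(f"eps={eps}, min_samples={ms}: {k} clusters, silhouette={s:.3f}")
```

A high silhouette obtained by discarding many points as noise can be misleading, so the number of noise points is worth reporting alongside the score.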

K-Means Model
- Explored different values of K to find the optimal number of clusters.
- Assessed inertia scores and visualized the clusters using scatter plots.
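For K-Means, a typical way to explore K is to fit one model per candidate value and record both inertia (the elbow method) and the silhouette score. The sketch below assumes synthetic data and a candidate range of 2 to 8. Because the best achievable inertia never increases as K grows, one looks for the "elbow" where the decrease levels off rather than for the minimum, while the silhouette score usually peaks near a well-separated K.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic, standardized stand-in data (not the project's dataset).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=1)
X = StandardScaler().fit_transform(X)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_                  # within-cluster sum of squares
    silhouettes[k] = silhouette_score(X, km.labels_)

for k in inertias:
    print(f"K={k}: inertia={inertias[k]:.1f}, silhouette={silhouettes[k]:.3f}")

best_k = max(silhouettes, key=silhouettes.get)
print("K with highest silhouette:", best_k)
```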
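
Comparative Study

Performance Metrics
- Silhouette Score: Compares how close each point is to its own cluster (compactness) with how close it is to the nearest other cluster (separation). Scores range from -1 to 1; higher scores indicate better-defined clusters.
- Inertia: The sum of squared distances from each point to its cluster centroid, which measures how internally coherent the clusters are. Lower values indicate tighter clusters, but inertia tends to decrease as K increases, so it is only directly comparable between models with the same number of clusters.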
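Both metrics can be computed directly from their definitions. The sketch below uses a hand-made five-point dataset (an assumption for illustration). It computes inertia as the sum of squared point-to-centroid distances, and the mean silhouette as the average of s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other members of its cluster and b(i) is the lowest mean distance to the members of any other cluster. It then compares both with scikit-learn's values. Inertia is only meaningful for centroid-based models such as K-Means, whereas the silhouette score applies to any labeling, including DBSCAN's once noise is handled.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Tiny hand-made dataset with two obvious groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels, centers = km.labels_, km.cluster_centers_

# Inertia: sum of squared distances from each point to its assigned centroid.
inertia = float(np.sum((X - centers[labels]) ** 2))

# Silhouette of point i: s(i) = (b - a) / max(a, b), where
#   a = mean distance to the other points in its own cluster,
#   b = smallest mean distance to the points of any other cluster.
# Assumes every cluster has at least two members.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
s = []
for i in range(len(X)):
    same = labels == labels[i]
    same[i] = False
    a = D[i, same].mean()
    b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
    s.append((b - a) / max(a, b))
silhouette = float(np.mean(s))

print(inertia, km.inertia_)
print(silhouette, silhouette_score(X, labels))
```

Both printed pairs should agree up to floating-point rounding. The simplified loop skips the singleton case, for which scikit-learn defines the silhouette of the point as 0.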

Results and Observations

- DBSCAN: Performance varied with the parameter settings; it achieved a silhouette score of X.
- K-Means: Found an optimal number of clusters (K), with a silhouette score of Y and an inertia value of Z.

Conclusion

- Both algorithms effectively segmented customers based on purchasing behavior.
- DBSCAN proved robust to outliers but required careful parameter tuning.
- K-Means demonstrated simplicity and scalability, providing well-defined clusters once an optimal K was chosen.

Recommendations

- For datasets in which clusters appear as clearly dense regions separated by sparser areas, DBSCAN can be a suitable choice.
- In scenarios where scalability and simplicity are vital, K-Means is preferable.
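The first recommendation can be illustrated with scikit-learn's make_moons generator, which produces two dense, crescent-shaped (non-convex) clusters. In this sketch the data are synthetic and eps=0.2, min_samples=5 are assumptions tuned for this toy set; the adjusted Rand index against the known grouping is used to compare the algorithms. DBSCAN follows the density of each crescent, whereas K-Means, which favors roughly spherical clusters around centroids, cuts across them.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-circles: dense, non-convex clusters (synthetic data).
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true grouping.
print("DBSCAN ARI: ", adjusted_rand_score(y_true, db_labels))
print("K-Means ARI:", adjusted_rand_score(y_true, km_labels))
```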

Future Work

- Experiment with other clustering algorithms, such as Hierarchical Clustering or Gaussian Mixture Models.
- Incorporate additional features or external data sources for more robust segmentation.
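A possible starting point for this future work, sketched with scikit-learn's AgglomerativeClustering and GaussianMixture on synthetic data (the data, the cluster count of 3 and Ward linkage are assumptions). The silhouette score from the comparative study can be reused to compare them, and a Gaussian Mixture Model additionally provides soft, per-customer membership probabilities and a BIC value that can help choose the number of components.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic, standardized stand-in data (not the project's dataset).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.7, random_state=7)
X = StandardScaler().fit_transform(X)

# Hierarchical (agglomerative) clustering with Ward linkage.
agg_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# Gaussian Mixture Model: predict() returns the most likely component,
# predict_proba() the soft membership probabilities for each point.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
gmm_labels = gmm.predict(X)
probs = gmm.predict_proba(X)

print("Agglomerative silhouette:", silhouette_score(X, agg_labels))
print("GMM silhouette:          ", silhouette_score(X, gmm_labels))
print("GMM BIC (lower is better):", gmm.bic(X))
```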

---

This report provides an overview of customer segmentation using the DBSCAN and K-Means algorithms, highlighting their strengths, weaknesses, and comparative performance.
