Clustering Algorithms for Customer Segmentation
Submitted by:
Jagriti Lakher
Date: 31st March 2023
Abstract
Customer segmentation is the process of dividing a customer base into groups of
individuals with similar characteristics to create targeted marketing strategies and
improve customer experience. Clustering algorithm is a type of unsupervised machine
learning algorithm that groups a set of data into clusters based on similarities. Clustering
algorithms play a crucial role in customer segmentation by identifying groups of
customers who share similar characteristics or behavior. It helps businesses to better
understand their customer base and make data-driven decisions about product
development, pricing, and customer service. Through Customer segmentation, a business
can identify areas for improvement, track trends and optimize their operations to better
serve their customers. Out of many clustering algorithms, three representative algorithms
namely k-means, k-means++, and mini-batch k-means are used. The performance of the
three algorithms will be compared on the basis of silhouette score, computational
efficiency, and adjusted rand index (ARI) parameters. Overall, this project has the
potential to provide valuable insights into customer behavior and preferences leading to
more effective business strategies.
Table of Contents
Abstract
Table of Contents
List of Abbreviations
List of Figures
CHAPTER-1
1. Introduction
1.3 Objectives
CHAPTER-2
2. Literature Review
CHAPTER-3
3. Methodology
CHAPTER-4
4. Implementation
4.1 Algorithms
4.1.1 K-Means
4.1.2 K-Means++
CHAPTER-5
CHAPTER-6
6.1 Conclusion
7. References
8. Annex
List of Abbreviations
Abbreviation Definition
ARI Adjusted Rand Index
B2B Business to business
B2C Business to consumer
CLR Cyclical Learning Rate
HDD Hard Disk Drive
IDE Integrated Development Environment
RAM Random Access Memory
RI Rand Index
SGD Stochastic Gradient Descent
WCSS Within-cluster sum of squares
List of Figures
Figure 1: Conceptual framework for classification
Figure 2: Number of males and females (500 data sets)
Figure 3: Distplot of age, annual income and spending score (500 data sets)
Figure 4: Clustering using k-means (500 data sets)
Figure 5: Clustering using k-means++ (500 data sets)
Figure 6: Clustering using mini batch k-means (500 data set with 250 batch size)
Figure 7: Clustering using mini batch k-means (500 data set with 500 batch size)
Figure 8: Number of males and females (200 data sets)
Figure 9: Distplot of age, annual income and spending score (200 data sets)
Figure 10: Clustering using k-means (200 data sets)
Figure 11: Clustering using k-means++ (200 data sets)
Figure 12: Clustering using mini batch k-means (200 data set with 100 batch size)
Figure 13: Clustering using mini batch k-means (200 data set with 200 batch size)
Figure 14: Number of males and females (100 data sets)
Figure 15: Distplot of age, annual income and spending score (100 data sets)
Figure 16: Clustering using k-means (100 data sets)
Figure 17: Clustering using k-means++ (100 data sets)
Figure 18: Clustering using mini batch k-means (100 data set with 50 batch size)
Figure 19: Clustering using mini batch k-means (100 data set with 100 batch size)
CHAPTER-1
1. Introduction
1.2 Problem Statement
When customer segmentation is not performed, the data may remain unexplored, and the relationships between data points may go unnoticed. Customer segmentation can reveal patterns and insights that are not immediately apparent from looking at individual data points. Without clustering, it can be difficult to identify groups or segments within the data set that share similar characteristics or behaviors, which makes it harder to draw meaningful conclusions and make informed decisions about the data. For example, clustering can be used to group customers based on their purchasing behavior or demographics; without it, it may be challenging to identify which customers are more likely to purchase a particular product or service.
Customer segmentation is a widely used marketing strategy that involves dividing
customers into groups based on their common characteristics, needs, and preferences. The
goal is to create targeted marketing campaigns that resonate with each customer group,
resulting in higher engagement, customer loyalty, and revenue. K-means, k-means++, and
mini-batch k-means are three popular clustering algorithms used for customer
segmentation. Each algorithm has its own advantages and disadvantages in terms of
accuracy, efficiency, and scalability. The problem is to evaluate the performance of these
three algorithms on a real-world customer dataset and determine which one produces the
most accurate and meaningful customer segments. This will help businesses make
informed decisions about which algorithm to use for customer segmentation and optimize
their marketing efforts to achieve the best possible results. This problem can be addressed
by comparing the performance of each algorithm on the same dataset and evaluating the
resulting clusters based on criteria such as within-cluster sum of squares, silhouette score,
and cluster purity.
1.3 Objectives
The main objectives of this project are as follows:
To implement the K-Means, K-Means++ and Mini Batch K-Means algorithms for customer segmentation.
To perform a comparative analysis of those algorithms on a customer dataset based on the following parameters: silhouette score, adjusted Rand index (ARI), and within-cluster sum of squares (WCSS).
CHAPTER-2
2. Literature Review
Historically, market segments were mostly formed on two foundations: business to business (B2B) or business to consumer (B2C). Nowadays, segmentation is based on many more variables. In practice, it is common for companies and other organizations to segment their market using any foundation that is identifiable, measurable, actionable and stable.
According to Sandström (2003), segmentation has its basis in the concept that the consumers who buy a company's products and services are not all equally valuable [1]. Customers are of different significance to a company, and to stay in the market, companies need to distribute their attention unevenly, shifting it from less profitable consumers to those with higher profit. For a company to continue gaining profit, it needs to direct most of its attention to the customers who consume its products or services frequently or in greater volumes, so that the resulting groups are fruitful.
A company provides the market with either a service or a product. Because of this, it is vital, according to Fang, Palmatier and Steenkamp, for a company to get the customer service elements right in order to please its customers [3]. Well-established service companies have the right skill set and the right knowledge to fulfill the demands, expectations and needs of their customers (Mattson, 2004) [4].
The concept of customer service can be defined as what a company does to include the purchasers, sellers and other groups that can boost its product or service. Successful customer segmentation within services helps the company enhance its relationships with purchasers and sellers, which also contributes to enhanced competitiveness (Pauline, 2009) [5].
CHAPTER-3
3. Methodology
In this section, partitioning-based clustering algorithms are applied to the data sets. The three methods are explained in the chart below:
Figure 1: Conceptual Framework for classification
Data cleaning: This involves removing missing or incorrect data, duplicates and
irrelevant data. In our case we removed the data that are unsuitable for cluster
analysis due to our parametric range restrictions.
Data transformation: This involves converting data from one format to another,
such as converting categorical data into numerical data.
Data integration: This involves combining data from different sources and formats
into a single dataset.
Data reduction: This involves reducing the amount of data by sampling or
aggregating.
Data discretization: This involves dividing continuous data into discrete categories. In our case we divided the 500-record dataset into subsets of 200 and 100 records respectively and performed the clustering algorithms on each.
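As an illustration, the preprocessing steps above could be sketched with pandas. The records and exact column names below are invented for the example; they are only assumed from the features (gender, age, annual income, spending score) described in this report.

```python
import pandas as pd

# Illustrative records only; the real dataset's columns are assumed
# from the features described in this report.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", None, "Male"],
    "Age": [19, 21, 20, 23, 23],
    "Annual Income": [15, 15, 16, 16, 16],
    "Spending Score": [39, 81, 6, 77, 40],
})

# Data cleaning: drop rows with missing values and exact duplicates.
df = df.dropna().drop_duplicates()

# Data transformation: encode the categorical Gender column as numbers.
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})

# Data reduction: draw a smaller random sample, analogous to reducing
# the 500-record dataset to 200 or 100 records.
subset = df.sample(n=3, random_state=42)
```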
The following hardware will be used for the implementation of the system
The following software will be used for the implementation of the system
Python Libraries: NumPy, Pandas, Matplotlib, and Scikit-learn libraries must be
installed.
Integrated Development Environment (IDE): A Python IDE like PyCharm or
Jupyter Notebook must be installed to write and execute the Python code.
The silhouette score is calculated for each object in the dataset and then the average score
across all objects is used as the final score for the clustering. The final score is between -1
and 1, where a score closer to 1 indicates a better clustering and a score closer to -1
indicates a poor clustering.
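A minimal sketch of computing the silhouette score with scikit-learn; the two-blob data here is invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two well-separated blobs: clustering them with k=2 should give a
# silhouette score close to 1.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.1, (50, 2)),
               rng.normal(5, 0.1, (50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
score = silhouette_score(X, labels)  # in [-1, 1]; close to 1 here
```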
The adjusted Rand index (ARI) measures the similarity between the true class labels and the predicted cluster labels. A higher ARI means there is a better match between the true and predicted values. The ARI is a corrected-for-chance version of the Rand index (RI), which measures the similarity between two clusterings by counting the number of pairs of objects that are in the same cluster in both clusterings, and the number of pairs of objects that are in different clusters in both clusterings.
The formula for the ARI is:
ARI = (RI - Expected_RI) / (max(RI) - Expected_RI)
It is commonly used to evaluate the performance of clustering algorithms and to compare
different clusterings of the same data.
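A small sketch with scikit-learn's `adjusted_rand_score`; the label vectors are invented for illustration:

```python
from sklearn.metrics import adjusted_rand_score

# Identical groupings score 1.0 even when the cluster ids are renamed,
# because the ARI only looks at which pairs of objects end up together.
true_labels = [0, 0, 1, 1, 2, 2]
pred_same = [1, 1, 0, 0, 2, 2]    # same grouping, relabelled
pred_other = [0, 1, 0, 1, 0, 1]   # unrelated grouping

print(adjusted_rand_score(true_labels, pred_same))   # 1.0
print(adjusted_rand_score(true_labels, pred_other))  # negative: worse than chance
```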
The WCSS is often used as a measure of the quality of clustering algorithms, particularly
in k-means clustering. A lower WCSS value indicates that the clusters are more compact,
and thus the cluster assignment is better.
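In scikit-learn, the WCSS of a fitted k-means model is available as the `inertia_` attribute; a brief sketch on invented data:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.rand(100, 2)  # invented data

# WCSS always decreases as k grows, so it is usually inspected across
# several k values (the "elbow" method) rather than minimized outright.
wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        for k in (1, 2, 3)]
```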
CHAPTER-4
4. Implementation
4.1 Algorithms
4.1.1 K-Means
K-means is a popular clustering algorithm used to group similar data points together.
The goal of the algorithm is to partition a set of n data points into k clusters, where
each data point belongs to the cluster with the nearest mean (Sharma, 2021).
Step 1: Initialize k centroids, which will serve as the centers of the clusters. This can be done randomly or by choosing k data points from the dataset.
Step 2: For each data point, calculate the distance to each centroid and assign the
data point to the cluster with the closest centroid.
Step 3: Recalculate the centroid for each cluster as the mean of all the data points
in that cluster.
Step 4: Repeat steps 2 and 3 until the centroids no longer change, or a maximum
number of iterations is reached.
Step 5: The final result is k clusters, each represented by its centroid, and a set of
data points that belong to each cluster.
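The five steps above can be sketched directly in NumPy; a minimal illustration on invented data, not a production implementation:

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.RandomState(seed)
    # Step 1: initialize k centroids by picking k random data points.
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(max_iters):
        # Step 2: assign each point to the cluster with the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop once the centroids no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Step 5: the result is k centroids plus the cluster of each point.
    return centroids, labels

rng = np.random.RandomState(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(10, 0.5, (20, 2))])
centroids, labels = kmeans(X, k=2)
```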
4.1.2 K-Means++
K-means++ is a variation of the K-means algorithm that addresses the problem of poor
initialization of centroids. The basic idea of the algorithm is to choose the initial centroids
in a more intelligent way, to avoid getting stuck in poor local optima.
The algorithm works as follows:
Step 1: Select one data point randomly from the dataset as the first centroid.
Step 2: For each data point, calculate the distance to the closest centroid already
chosen.
Step 3: Select the next centroid randomly from the remaining data points, with probability proportional to the squared distance calculated in step 2.
Step 4: Repeat steps 2 and 3 for k-1 times, until all k centroids have been chosen.
Step 5: Run the standard k-means algorithm using the chosen centroids as the
initial centroids.
The k-means++ seeding tends to choose initial centroids that are far apart from each other, which makes the algorithm less likely to converge to poor local optima. It also tends to create more spherical clusters and to avoid empty clusters.
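In scikit-learn, this seeding is selected with `init="k-means++"` (and is the default); a brief sketch on invented data:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc, 0.3, (30, 2)) for loc in (0, 5, 10)])

# init="k-means++" performs the distance-proportional seeding of
# steps 1-4, then runs standard k-means (step 5).
model = KMeans(n_clusters=3, init="k-means++", n_init=10,
               random_state=0).fit(X)
centers = sorted(np.round(model.cluster_centers_[:, 0]).astype(int).tolist())
```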
4.1.3 Mini Batch K-Means
Mini batch k-means is a variant of k-means designed for large datasets. Instead of using the full dataset at every iteration, it draws small random batches of samples, assigns each batch point to its nearest centroid, and then moves each centroid towards the batch points assigned to it using a per-centroid learning rate that decreases as more points are absorbed. Because each update touches only a batch of points rather than the whole dataset, the algorithm converges with far less computation than standard k-means, usually at the cost of clusters of slightly lower quality. The batch size is a key parameter: larger batches give results closer to full k-means, while smaller batches make each iteration cheaper.
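Scikit-learn's `MiniBatchKMeans` exposes the batch size through its `batch_size` parameter; a brief sketch on invented data, using one of the batch sizes compared in this report:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.5, (250, 2)), rng.normal(10, 0.5, (250, 2))])

# batch_size controls how many samples each incremental centroid update
# sees; 250 mirrors the "500 data set with 250 batch size" configuration.
model = MiniBatchKMeans(n_clusters=2, batch_size=250,
                        n_init=10, random_state=0).fit(X)
labels = model.predict(X)
```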
CHAPTER-5
The number of males and females in the dataset is shown in the bar diagram below.
The distribution plot of age, annual income and spending score is shown below.
Figure 3: Distplot of age, annual income and spending score (500 data sets)
The clustering of datasets using K-Means, K-Means++ and Mini Batch K-Means
algorithm is shown in the diagrams below:
Figure 6: Clustering using mini batch k means (500 data set with 250 batch size)
Silhouette score: 0.5461
Adjusted Rand index: 0.0040
WCSS value: 158629.0
Convergence time: 1.4028 seconds
Figure 7: Clustering using mini batch k means (500 data set with 500 batch size)
The number of males and females in the dataset is shown in the bar diagram below.
The distribution plot of age, annual income and spending score is shown below.
Figure 9: Distplot of age, annual income and spending score (200 data sets)
The clustering of datasets using K-Means, K-Means++ and Mini Batch K-Means
algorithm is shown in the diagrams below:
Figure 11: Clustering using k means++ (200 data sets)
Figure 12: Clustering using mini batch k means (200 data set with 100 batch size)
Figure 13: Clustering using mini batch k means (200 data set with 200 batch size)
Figure 14: Number of males and Females (100 data sets)
The distribution plot of age, annual income and spending score is shown below.
Figure 15: Distplot of age, annual income and spending score (100 data sets)
The clustering of datasets using K-Means, K-Means++ and Mini Batch K-Means
algorithm is shown in the diagrams below:
Figure 16: Clustering using k means (100 data sets)
Figure 18: Clustering using mini batch k means (100 data set with 50 batch size)
Figure 19: Clustering using mini batch k means (100 data set with 100 batch size)
CHAPTER-6
6.1 Conclusion
In this project three clustering algorithms, k-means, k-means++ and mini batch k-means, are comparatively analyzed for customer segmentation. The performance parameters silhouette score, adjusted Rand index and within-cluster sum of squares are calculated for each of the datasets, and performance is assessed on the basis of these scores. The means of the silhouette score, adjusted Rand index and within-cluster sum of squares for the mini batch algorithm are 0.56, 0.003945 and 96128 respectively. From these scores we can conclude that the mini batch algorithm is the more effective and accurate clustering algorithm. However, mini batch takes longer to converge than k-means and k-means++. In summary, the mini batch algorithm performs better on silhouette score, adjusted Rand index and within-cluster sum of squares, but it is slower in terms of convergence time.
7. References
Available: https://www.3tl.com/blog/importance-of-customer-segmentation. [Accessed 29 Jan 2023].
[10] E. L. Melnic, "How to strengthen customer loyalty, using customer segmentation?," Bulletin of the Transilvania University of Braşov, Series V: Economic Sciences, 2016, pp. 52-60.
8. Annex
8.1.2 Source code for k-means++ algorithm
8.1.3 Source code for mini batch algorithm