Customer Categorization by Data Analysis Using Clustering Algorithms of Machine Learning
CLUSTERING TECHNIQUES:
*K-means Clustering:
K-means is the simplest clustering algorithm and is based on the partitioning principle. The
algorithm is sensitive to the initial positions of the centroids. The number of centroids, K,
is determined by the elbow method (discussed in a later section). After the K centroids are
initialized, each data point is assigned to its closest centroid in terms of Euclidean
distance, forming the clusters. Once the clusters are formed, the barycentres are recalculated
as the means of the clusters, and this process is repeated until there is no change in the
centroid positions.
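To make the assignment and update steps concrete, the following is a minimal NumPy sketch of a
single K-means iteration; the function and variable names are illustrative, not the
implementation used in this work.

    import numpy as np

    def kmeans_iteration(points, centroids):
        # Assignment step: label each point with the index of its closest
        # centroid in terms of Euclidean distance.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # Update step: recompute each centroid (barycentre) as the mean of
        # the points assigned to it.
        new_centroids = np.array([points[labels == k].mean(axis=0)
                                  for k in range(len(centroids))])
        return labels, new_centroids

In practice this iteration is repeated until new_centroids equals centroids (with a guard for
clusters that end up empty).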
*Agglomerative Clustering:
Agglomerative clustering builds a hierarchy represented by a dendrogram (discussed in a later
section). The dendrogram acts as a memory for the algorithm, recording how the clusters are
formed. Clustering starts by treating the N data points as N clusters and then merges the two
closest clusters at each step, so that the current step contains one cluster fewer than the
previous one.
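The merge history that the dendrogram records can be inspected directly with SciPy; the toy
data below is hypothetical and only illustrates how N points start as N singleton clusters and
are merged pairwise.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    # Hypothetical toy data: 5 points, so the algorithm starts from 5 clusters.
    points = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6]])

    # Each row of Z records one merge of the two closest clusters
    # ('ward' linkage merges the pair giving the smallest variance increase).
    Z = linkage(points, method='ward')
    print(Z)        # the merge history, i.e. the dendrogram in tabular form
    dendrogram(Z)
    plt.show()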
*Mean Shift Clustering:
Mean shift is a non-parametric, iterative clustering algorithm that treats the data points in
the feature space as samples from an empirical probability density function. The algorithm
clusters the data by letting each data point converge to a region of local maxima: a window is
fixed around each data point, the mean of the points inside the window is computed, the window
is shifted to that mean, and these steps are repeated until all the data points converge,
forming the clusters.
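The shift of a single point towards its local density maximum can be sketched as follows; this
assumes a flat (uniform) kernel, and the function name and tolerance are illustrative only.

    import numpy as np

    def mean_shift_point(x, points, bandwidth, tol=1e-3, max_iter=300):
        # Repeatedly replace x with the mean of all points lying inside the
        # window of radius `bandwidth` centred on x, until x stops moving.
        for _ in range(max_iter):
            in_window = points[np.linalg.norm(points - x, axis=1) < bandwidth]
            new_x = in_window.mean(axis=0)
            if np.linalg.norm(new_x - x) < tol:   # converged to a local maximum
                break
            x = new_x
        return x

Data points that converge to the same local maximum are assigned to the same cluster.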
*Elbow Method:
The elbow method is used to find the optimal value of K for the K-means clustering algorithm.
It works by computing the sum of squared errors (SSE) between each data point and its nearest
centroid for different values of K. As K increases the SSE decreases; the value of K at which
the decline in SSE slows down sharply forms the elbow of the curve, the point at which we
should stop dividing the data further.
METHODOLOGY:
Data Collection:
The dataset has been taken from a local retail shop and consists of two features: the average
number of visits to the shop and the average amount of shopping done, both on a yearly basis.
Feature Scaling:
The data has been scaled using StandardScaler [9]; applying the standard scaler centres the
data around 0 with a standard deviation of 1.
x_scaled = (x - mean(X)) / stdev(X)
x = an entry in the feature set, x ∈ X
mean(X) = mean of the feature set X
stdev(X) = standard deviation of X
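As a minimal sketch of this step using scikit-learn (the numbers below are hypothetical, not
the shop's actual data):

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    # Hypothetical rows of [average yearly visits, average yearly spend]
    X = np.array([[12, 450.0], [3, 90.0], [25, 1200.0], [8, 300.0]])

    scaler = StandardScaler()            # applies (x - mean(X)) / stdev(X) per feature
    X_scaled = scaler.fit_transform(X)
    print(X_scaled.mean(axis=0))         # approximately 0 for each feature
    print(X_scaled.std(axis=0))          # approximately 1 for each feature

X_scaled, the scaled customer dataset, is reused in the sketches that follow.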
K-means Clustering:
Choosing the optimal number of clusters:
The elbow method is applied to calculate the value of K for the dataset.
Step-1: Run the algorithm for various values of K, i.e. varying K from 1 to 10.
Step-2: Calculate the within-cluster sum of squared errors for each K.
Step-3: Plot the calculated error; the bend where an elbow-like structure forms gives the
optimal number of clusters (a sketch of this follows the steps).
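A minimal scikit-learn sketch of these three steps is given below; it assumes X_scaled holds
the full scaled customer dataset, and the random_state value is an arbitrary choice.

    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans

    # X_scaled: the standardised customer dataset (scaled as in the previous sketch)
    sse = []
    k_values = range(1, 11)                                 # Step 1: vary K from 1 to 10
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
        sse.append(km.inertia_)                             # Step 2: within-cluster SSE
    plt.plot(k_values, sse, marker='o')                     # Step 3: look for the bend
    plt.xlabel('K')
    plt.ylabel('SSE')
    plt.show()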
Algorithm:
Step-1: Initialize the K (= 5) centroids.
Step-2: Assign each data point to its closest centroid.
Step-3: Recalculate each centroid position as the mean of the cluster formed around it.
Step-4: Repeat steps 2 and 3 until the centroid positions remain unchanged between the
previous and current iterations (a scikit-learn sketch follows these steps).
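A minimal sketch of running these steps with scikit-learn, again assuming X_scaled from the
feature-scaling step; the random_state value is an arbitrary choice.

    from sklearn.cluster import KMeans

    # K = 5 as chosen by the elbow method; steps 1-4 run internally
    # until the centroids stop moving.
    kmeans = KMeans(n_clusters=5, init='k-means++', n_init=10, random_state=42)
    labels = kmeans.fit_predict(X_scaled)   # cluster label for each customer
    print(kmeans.cluster_centers_)          # final centroid positions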
Agglomerative Clustering:
Choosing the optimal number of clusters:
The number of clusters for this algorithm has been determined from the dendrogram.
Algorithm:
Step-1: Each data point is taken to be a cluster.
Step-2: Merge the two closest clusters.
Step-3: Step 2 is repeated until all the data points are merged into a single cluster.
However, as we have defined the value of K as 5, the algorithm stops when every data point
belongs to one of the 5 clusters (see the sketch after these steps).
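A minimal sketch of this step with SciPy and scikit-learn, assuming X_scaled from the
feature-scaling step:

    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram
    from sklearn.cluster import AgglomerativeClustering

    # X_scaled: the standardised customer dataset
    # Dendrogram used to read off a suitable number of clusters.
    dendrogram(linkage(X_scaled, method='ward'))
    plt.show()

    # Agglomerative clustering stopped once 5 clusters remain.
    agg = AgglomerativeClustering(n_clusters=5, linkage='ward')
    labels = agg.fit_predict(X_scaled)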
Mean Shift Clustering:
This non-parametric clustering method is applied to look for a different pattern in the
dataset, since K-means and agglomerative clustering gave almost the same result. There is no
need to choose the number of clusters; however, the method needs one input parameter, the
bandwidth (radius), which is calculated using the K-nearest-neighbour algorithm. The algorithm
follows an iterative approach in which a point of local maxima of the probability density
function is found around each data point, and it iterates until all the data points have
converged up the hill (created by the PDF); it is therefore also known as a 'hill-climbing'
algorithm.
Algorithm:
Step-1: A window, defined over the PDF, is placed around each data point.
Step-2: The mean of the points within the window is calculated.
Step-3: The window is moved towards the newly calculated mean.
Step-4: Steps 2 and 3 are repeated until all the data points converge to local maxima,
resulting in the clusters (a scikit-learn sketch follows these steps).
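A minimal scikit-learn sketch of this procedure, assuming X_scaled from the feature-scaling
step; the quantile value passed to the bandwidth estimator is a hypothetical choice.

    from sklearn.cluster import MeanShift, estimate_bandwidth

    # X_scaled: the standardised customer dataset
    # Bandwidth (window radius) estimated from nearest-neighbour distances.
    bandwidth = estimate_bandwidth(X_scaled, quantile=0.2)
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    labels = ms.fit_predict(X_scaled)
    print(len(ms.cluster_centers_))   # number of clusters found automatically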
CONCLUSION:
In this data science project, we built a customer segmentation model using a class of machine
learning known as unsupervised learning. Specifically, we made use of a clustering algorithm
called K-means clustering. We analyzed and visualized the data and then proceeded to implement
our algorithm. We opted for internal clustering validation rather than external clustering
validation, which depends on external data such as labels. Internal cluster validation can be
used to choose the clustering algorithm that best suits the dataset and correctly assigns each
data point to its appropriate cluster.