
Customer Spend Analysis Using K-Means Clustering

The document discusses segmenting customer spending data into clusters using k-means clustering. It describes the steps of the k-means clustering algorithm: initializing centroids, assigning data points to the closest centroid, recalculating centroids, and repeating until the centroids are stable. It also discusses using the elbow method to determine the optimal number of clusters by analyzing the within-cluster sum of squares (WCSS) at different values of k.

Uploaded by Vicky Nagar
Copyright © All Rights Reserved

1 Application
    Finding the problem: categorizing the information into groups based on the amount spent
    Input: amount-spent data
    Goal: analyze the amount spent and segregate the data into different categories

2 Collecting Dataset
    Pandas: load the dataset in CSV format
    import pandas
    dataset = pandas.read_csv('dataset.csv')

3 Load, Summarize
    Load the dataset from the directory and summarize details such as the number of rows and columns and the content.
    Number of rows & columns:
    dataset.shape
    Display the first 5 rows of the dataset:
    dataset.head(5)

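4 Segregating & Zipping Dataset (NumPy)
    Pair each row's income with its spend to build the two-column feature matrix X:
    import numpy as np
    Income = dataset['INCOME'].values
    Spend = dataset['SPEND'].values
    X = np.array(list(zip(Income, Spend)))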
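Steps 2 to 4 can be tried end to end with a minimal sketch. It assumes the CSV has numeric INCOME and SPEND columns. Because dataset.csv is not available here, the sketch builds a small hypothetical DataFrame in memory in place of pandas.read_csv. The values are made up for illustration.

```python
import numpy as np
import pandas

# Hypothetical stand-in for pandas.read_csv('dataset.csv'); values are made up.
dataset = pandas.DataFrame({
    "INCOME": [15, 16, 17, 70, 72, 75],
    "SPEND": [39, 81, 6, 77, 35, 90],
})

# Step 3: summarize.
print(dataset.shape)    # (6, 2): 6 rows, 2 columns
print(dataset.head(5))  # first 5 rows

# Step 4: take each column as a NumPy array and pair them row by row.
Income = dataset["INCOME"].values
Spend = dataset["SPEND"].values
X = np.array(list(zip(Income, Spend)))
print(X.shape)          # (6, 2): one (income, spend) row per customer

# The same matrix can be built without the intermediate list of tuples.
assert np.array_equal(X, np.column_stack((Income, Spend)))
assert np.array_equal(X, dataset[["INCOME", "SPEND"]].to_numpy())
```

X has shape (n_samples, n_features), which is the layout K-Means implementations expect. np.column_stack or dataset[['INCOME', 'SPEND']].to_numpy() are more direct alternatives to zipping.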

5 Algorithm: K-Means Clustering
    Definition: K-Means is an iterative algorithm that divides an unlabeled dataset into K different clusters in such a way that each data point belongs to only one group, made up of points with similar properties.
    Steps:
    1. Select the number K to decide the number of clusters.
    2. Select K random points as the initial centroids.
    3. Calculate the distance between each data point and each centroid, and assign every point to its closest centroid. With two centroids, this amounts to drawing the median line (the perpendicular bisector) between them: the points on each side belong to the centroid on that side.
    4. Compute a new centroid for each cluster as the mean of the points currently assigned to it.
    5. Reassign each data point to its new closest centroid.
    6. If any reassignment has taken place, go back to step 4 and find the new centroids (K-points) again.
    7. With the new centroids, draw the median line again and reassign the data points.
    8. When no data point lies on the wrong side of the line (no point changes cluster), the model is formed.
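The steps above can be sketched in plain NumPy. This is a minimal illustration of the standard K-Means loop (Lloyd's algorithm), not the author's code. The function name kmeans, its parameters, and the toy points are hypothetical. The initial centroids are K distinct data points chosen at random, and the loop stops once no point changes cluster.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Hypothetical helper: plain K-Means on an (n_samples, n_features) array.

    Returns (centroids, labels).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Steps 1-2: K is given; pick K distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Steps 3 and 5: Euclidean distance from every point to every centroid
        # (shape n_samples x k), then assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 8: stop once no point changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Steps 4 and 6: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return centroids, labels


# Toy (income, spend) points forming two obvious groups (made up for illustration).
points = np.array([[1, 1], [1.5, 2], [1, 1.5], [8, 8], [9, 8.5], [8.5, 9]])
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

The result depends on the random starting centroids. Library implementations usually run several initializations and keep the one with the lowest WCSS.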

6 Finding the Best K Value: Elbow Method
    This method uses the WCSS (Within-Cluster Sum of Squares) value.
    WCSS is calculated by measuring the distance between each data point and its centroid, squaring this distance, and summing these squares within each cluster. The sums for all K clusters are then added together.
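The same definition can be written as a formula. The notation is introduced here for illustration: C_j is the set of points in cluster j, and μ_j is that cluster's centroid. The total WCSS for a given K is:

```latex
\mathrm{WCSS}(K) = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```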

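    Inertia: the total WCSS is also called inertia. scikit-learn exposes it as the inertia_ attribute of a fitted KMeans model.
    A good model is one with low inertia AND a low number of clusters (K). WCSS generally keeps falling as K grows, and it reaches 0 when every point is its own cluster. So plot WCSS against K and pick the "elbow": the K after which adding clusters no longer reduces WCSS much.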
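One common way to run the elbow method is to fit KMeans for a range of K values and record inertia_. This sketch assumes scikit-learn; the source does not name a library for this step. It also recomputes WCSS by hand to show that it is the same quantity. The helper name wcss and the synthetic three-group data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def wcss(X, labels, centroids):
    """Hypothetical helper: squared distance of each point to its own centroid,
    summed within each cluster and then over all clusters."""
    return float(sum(((X[labels == j] - c) ** 2).sum()
                     for j, c in enumerate(centroids)))


# Synthetic (income, spend) data: three made-up groups of 30 customers each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=center, scale=3.0, size=(30, 2))
               for center in ([20, 20], [60, 80], [90, 20])])

scores = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    scores[k] = model.inertia_
    # inertia_ is scikit-learn's name for the total WCSS.
    assert np.isclose(scores[k], wcss(X, model.labels_, model.cluster_centers_))

for k, s in scores.items():
    print(f"K={k}: WCSS={s:.1f}")
```

For this toy data, WCSS should drop steeply up to K = 3 and flatten afterwards, and that bend is the elbow. On real data, plotting scores against K makes the bend easier to judge.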

7 Fitting the Model with the Optimized K Value
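Once K is chosen, fitting the final model might look like this sketch, again assuming scikit-learn. The value best_k = 3 and the toy income/spend points are hypothetical placeholders for the elbow result and the X built in step 4.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for X from step 4: one (income, spend) row per customer.
X = np.array([[15, 80], [17, 85], [16, 78],    # low income, high spend
              [70, 82], [75, 88], [72, 79],    # high income, high spend
              [45, 20], [50, 15], [48, 25]])   # mid income, low spend

best_k = 3  # placeholder for the K read off the elbow curve
model = KMeans(n_clusters=best_k, n_init=10, random_state=0)
y_kmeans = model.fit_predict(X)  # cluster index 0..best_k-1 for each customer

print(y_kmeans)
print(model.cluster_centers_)    # one (income, spend) centroid per cluster

# A new customer is assigned to the nearest learned centroid.
new_label = model.predict(np.array([[71, 85]]))[0]
print(new_label)
```

Setting n_init explicitly runs several random initializations and keeps the best one. random_state makes the labels reproducible between runs.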

8 Visualizing Clustered Result
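One possible way to draw the clustered result is to color each customer by cluster label and mark the centroids. This sketch assumes matplotlib and scikit-learn. The toy points, K = 3, the axis labels, and the output file name clusters.png are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (income, spend) data with three made-up groups.
X = np.array([[15, 80], [17, 85], [16, 78], [70, 82], [75, 88], [72, 79],
              [45, 20], [50, 15], [48, 25]])
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X)

fig, ax = plt.subplots(figsize=(6, 4))
# One color per cluster label.
ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=50)
# Centroids drawn on top as large red crosses.
centers = model.cluster_centers_
ax.scatter(centers[:, 0], centers[:, 1], c="red", marker="X", s=200,
           label="Centroids")
ax.set_xlabel("Income")
ax.set_ylabel("Spend")
ax.set_title("Customer segments found by K-Means")
ax.legend()
fig.savefig("clusters.png", dpi=100)
```

To open an interactive window instead of writing a file, drop the matplotlib.use("Agg") line and call plt.show() in place of fig.savefig.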
