0% found this document useful (0 votes)
8 views11 pages

K Means Clustering

The document explains K-Means Clustering, an unsupervised learning algorithm used to partition data into clusters based on similarity. It describes the process of customer segmentation in a retail store using features like annual income and spending habits, detailing the steps involved in the K-Means algorithm. Additionally, it includes a Python code example for implementing K-Means clustering and visualizing the results.

Uploaded by

jeyaboopathi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views11 pages

K Means Clustering

The document explains K-Means Clustering, an unsupervised learning algorithm used to partition data into clusters based on similarity. It describes the process of customer segmentation in a retail store using features like annual income and spending habits, detailing the steps involved in the K-Means algorithm. Additionally, it includes a Python code example for implementing K-Means clustering and visualizing the results.

Uploaded by

jeyaboopathi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 11

K – Means Clustering

Dr.J.Jeyaboopathiraja
Assistant Professor
Department of Computer Science
Sri Ramakrishna College of Arts & Science
Coimbatore
Machine Learning is broadly classified into Supervised and Unsupervised
Learning based on how the model learns from data.

Supervised Learning

🔹 Definition: The model is trained on labeled data, meaning each input has a
corresponding output.
🔹 Goal: Learn from the past data and make predictions on new data.
🔹 Types:
Classification (Predict categories) → e.g., Spam or Not Spam

Unsupervised Learning

🔹 Definition: The model is trained on unlabeled data (no predefined outputs).


🔹 Goal: Identify patterns, relationships, and hidden structures in data.
🔹 Types:
Clustering (Grouping similar data) → e.g., Customer Segmentation
K – Means Clustering

K-Means Clustering is a popular unsupervised learning algorithm


that partitions data into distinct clusters based on similarity.

Unsupervised learning is a type of machine learning that works with data


that has no labels or categories. The main goal is to find patterns and
relationships in the data without any guidance.

In this approach, the machine analyzes unorganized information and


groups it based on similarities, patterns, or differences.
Customer Segmentation in a Retail Store
A retail store wants to classify its customers into different groups based
on their annual income and spending habits.

Collect Data
Suppose we have a dataset with two key features for each customer:
Annual Income (in $1000s)
Spending Score (1–100) (how much they spend compared to their
income)
Initialize Centroids
Randomly select 3 points as initial centroids.
Let's assume:
Centroid 1: (15, 80)
Centroid 2: (40, 50)
Centroid 3: (80, 20)
Step 1: Choose the Number of Clusters (K)
The value of K (number of clusters) is chosen manually or using the
Elbow Method.
Here, we assume K = 3, meaning we divide customers into 3 groups.

Step 2: Initialize K Cluster Centroids


K-Means starts by selecting K random points from the dataset as initial
cluster centers.

Step 3: Assign Each Customer to the Nearest Centroid


Each customer is assigned to the nearest cluster using Euclidean distance

Step 4: Compute New Cluster Centroids


The centroid of each cluster is updated as the average of all customers in
that group.

Step 5: Repeat Steps 3-4 Until Clusters Stabilize


The algorithm stops when cluster centers no longer change.
Cluster 1: Budget Shoppers
Low income, high spending score
These customers spend a lot despite having a low income.
Example: Frequently shop for fashion but have limited earnings.

Cluster 2: Average Shoppers


Moderate income, moderate spending score
These customers spend proportionally to their earnings.
Example: Middle-class customers who buy regularly but not
excessively.

Cluster 3: Luxury Shoppers


High income, low spending score
These customers earn a lot but spend cautiously.
Example: Wealthy individuals who only buy high-end products
occasionally.
import numpy as np
import matplotlib.pyplot as plt from sklearn.cluster import KMeans

# Sample data: [Annual Income ($1000s), Spending Score (1-100)]


data = np.array([
[15, 80], [16, 75], [40, 50],
[42, 45], [80, 20], [85, 15]
])
# Apply K-Means clustering with 3 clusters
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(data)
# Get cluster centers and labels
centroids = kmeans.cluster_centers_
labels = kmeans.labels_

# Plot the clusters


plt.scatter(data[:, 0], data[:, 1], c=labels, cmap='viridis', marker='o', edgecolors='k')
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='X', s=200, label='Centroids')
plt.xlabel("Annual Income ($1000s)")
plt.ylabel("Spending Score (1-100)")
plt.title("Customer Segmentation using K-Means Clustering")
plt.legend()
plt.show()
THANK YOU

You might also like