
Assignment 4 28855

The document discusses using k-means clustering to group customers based on annual income and spending score. It provides an example of applying the k-means algorithm over multiple iterations to assign customers to clusters. Evaluation metrics like within-cluster sum of squares and silhouette score are used to analyze the quality of the resulting clusters. The goal is to identify stable, well-defined clusters after several iterations as the clustering process converges.

Name Abbiha Mustafa

SAP 28855
Subject Artificial Intelligence
Assignment 4

Unsupervised Learning
1. Use k-means clustering algorithm to group/cluster items of your choice.
I. K-Means Clustering Example:
Consider a simple example of clustering customers based on their purchase behavior. We'll use
two features: "Annual Income" and "Spending Score."

II. Distance Formula:


The Euclidean distance formula is commonly used for k-means clustering:

$$\text{Distance} = \sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}$$
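
As a minimal sketch of the formula above, the following Python snippet computes the distance between two customers. The function name `euclidean_distance` and the tuple layout `(annual income, spending score)` are assumptions made for illustration; the two points are customers A and B from the initial data.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two 2-D points given as (x, y) tuples."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

# Customers A and B: (annual income in $, spending score)
customer_a = (45_000, 25)
customer_b = (60_000, 50)

distance_ab = euclidean_distance(customer_a, customer_b)
print(round(distance_ab, 2))  # 15000.02
```

Because income is measured in dollars and the spending score only ranges from 1 to 100, the result is almost entirely the income gap of 15,000; the score difference of 25 barely changes it.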

2. Iterations and Table:


Initial Data:

Customer   Annual Income ($)   Spending Score (1-100)
A          45,000              25
B          60,000              50
C          30,000              15
D          80,000              75
E          50,000              40
F          150,000             85
G          120,000             90

Iteration 1:
• Assume initial centroids (cluster centers).
• Assign each point to the nearest centroid.
• Recalculate centroids based on the assigned points.
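
The three steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the assignment's own calculation: the choice of customers C and F as initial centroids is hypothetical, and the helper names `nearest` and `mean_point` are invented for this example, so the resulting distances depend on that starting choice.

```python
import math

# Initial data: customer -> (annual income in $, spending score)
customers = {
    "A": (45_000, 25), "B": (60_000, 50), "C": (30_000, 15),
    "D": (80_000, 75), "E": (50_000, 40), "F": (150_000, 85),
    "G": (120_000, 90),
}

# Step 1 (assumption): use customers C and F as the two initial centroids.
centroids = {1: customers["C"], 2: customers["F"]}

def nearest(point, centroids):
    """Return the label of the centroid closest to the point."""
    return min(centroids, key=lambda c: math.dist(point, centroids[c]))

# Step 2: assign each customer to the nearest centroid.
assignment = {name: nearest(p, centroids) for name, p in customers.items()}

def mean_point(points):
    """Component-wise mean of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

# Step 3: recalculate each centroid as the mean of its assigned points.
new_centroids = {
    c: mean_point([customers[n] for n, a in assignment.items() if a == c])
    for c in centroids
}

print(assignment)
print(new_centroids)
```

With this starting choice, customers A to E fall into cluster 1 and F and G into cluster 2, and the updated centroids are (53,000, 41) and (135,000, 87.5).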

Customer   Distance to Cluster 1   Distance to Cluster 2   Assigned Cluster
A          10,000                  50,000                  1
B          5,000                   45,000                  1
C          20,000                  30,000                  2
D          70,000                  20,000                  2
E          20,000                  30,000                  2
F          105,000                 50,000                  2
G          75,000                  5,000                   1

Iteration 2:
• Use updated centroids from Iteration 1.
• Reassign points based on new centroids.
• Recalculate centroids.

Customer   Distance to Cluster 1   Distance to Cluster 2   Assigned Cluster
A          5,000                   70,000                  1
B          0                       65,000                  1
C          25,000                  10,000                  2
D          75,000                  30,000                  2
E          25,000                  10,000                  2
F          120,000                 55,000                  2
G          90,000                  0                       1
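
Repeating the assign-and-update cycle until no customer changes cluster could look like the sketch below. The function name `kmeans`, the `max_iter` cap, and the starting centroids (customers C and F) are assumptions for illustration; a different starting choice can lead to different distances and a different number of iterations.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until assignments stop changing.

    Returns (assignment, centroids, iterations), where assignment[i] is the
    index of the centroid that points[i] belongs to.
    """
    centroids = list(initial_centroids)  # copy so the caller's list is untouched
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            return assignment, centroids, iteration  # converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids, max_iter

# Customers A to G: (annual income in $, spending score)
points = [(45_000, 25), (60_000, 50), (30_000, 15), (80_000, 75),
          (50_000, 40), (150_000, 85), (120_000, 90)]

# Assumption: start from customers C and F.
labels, final_centroids, iterations = kmeans(points, [points[2], points[5]])
print(labels, final_centroids, iterations)
```

From this starting point the second pass produces the same assignments as the first, so the loop stops at iteration 2.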
3. Evaluation:
To evaluate the resulting clusters, we can use metrics like the within-cluster sum of squares
(WCSS) or silhouette score:
• WCSS: Measures the sum of squared distances from each point to the centroid of its
cluster. A lower WCSS indicates denser, more compact clusters.
• Silhouette Score: Measures how close a point is to the other points in its own cluster
compared with the points in the nearest other cluster. The score ranges from -1 to 1,
and higher values indicate better-defined clusters.
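
Both metrics can be computed directly from their definitions, as in the sketch below. The grouping in `clusters` is a hypothetical split of the seven customers chosen only to have something to evaluate, and the function names are assumptions for this example. Points in a single-member cluster are given a silhouette of 0, a common convention.

```python
import math

# Hypothetical clustering of the seven customers, for illustration only.
clusters = {
    1: [(45_000, 25), (60_000, 50), (30_000, 15), (80_000, 75), (50_000, 40)],
    2: [(150_000, 85), (120_000, 90)],
}

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(v) / len(points) for v in zip(*points))

def wcss(clusters):
    """Sum of squared distances from every point to its cluster centroid."""
    total = 0.0
    for points in clusters.values():
        c = centroid(points)
        total += sum(math.dist(p, c) ** 2 for p in points)
    return total

def silhouette_score(clusters):
    """Mean silhouette value over all points (requires at least two clusters)."""
    scores = []
    for label, points in clusters.items():
        for i, p in enumerate(points):
            if len(points) == 1:
                scores.append(0.0)  # convention for single-member clusters
                continue
            others = points[:i] + points[i + 1:]
            # a: mean distance to the other members of the same cluster
            a = sum(math.dist(p, q) for q in others) / len(others)
            # b: smallest mean distance to the members of any other cluster
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for other_label, other in clusters.items()
                if other_label != label
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

total_wcss = wcss(clusters)
mean_silhouette = silhouette_score(clusters)
print(total_wcss, mean_silhouette)
```

Comparing these two numbers across different cluster counts or starting centroids is one way to judge which grouping is more compact and better separated.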

4. Conclusion:
Once the WCSS stops decreasing significantly from one iteration to the next and the
silhouette score stabilizes or increases, the cluster assignments have largely stopped
changing, which suggests the algorithm has converged to stable, well-defined clusters.
