Kmean PGM

The document outlines a process for applying k-Means clustering to the Iris dataset, utilizing the 'cluster' and 'factoextra' libraries for analysis and visualization. It calculates the Silhouette Score to evaluate clustering quality, indicating reasonable performance with some overlap between clusters. The results are visualized through various plots, and a comparison with original species labels is provided to understand cluster assignments.

Uploaded by Triveni Jayaram

# Load necessary libraries

#library(datasets) # For the iris dataset

library(cluster) # For silhouette(); kmeans() itself comes from the base stats package

library(factoextra) # For visualization (optional, but recommended)

# Load the Iris dataset

data(iris)

# Apply k-Means clustering (assuming 3 clusters)

set.seed(42) # For reproducibility

kmeans_result <- kmeans(iris[, 1:4], 3) # Cluster based on the 4 features
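The result of k-means depends on the randomly chosen starting centers, so a single run can settle in a poorer solution. As a minimal sketch (the `nstart = 25` value and the variable names are illustrative assumptions, not part of the original script), the run above can be compared with one that tries several random starts and keeps the best:

```r
# Sketch: compare one random start with 25 random starts on the same data.
data(iris)
features <- iris[, 1:4]

set.seed(42)
single_start <- kmeans(features, centers = 3)

set.seed(42)
multi_start <- kmeans(features, centers = 3, nstart = 25) # nstart is an assumed choice

# Total within-cluster sum of squares: lower means tighter clusters.
cat("Single start:", single_start$tot.withinss, "\n")
cat("25 starts   :", multi_start$tot.withinss, "\n")
```

If the two values differ, the single run ended in a local optimum, and the multi-start result is the safer one to interpret.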

# Evaluate clustering quality (Silhouette score)

silhouette_avg <- silhouette(kmeans_result$cluster, dist(iris[, 1:4]))

print("Silhouette Score:")

print(summary(silhouette_avg))

# Comment on clustering quality

cat("\nComment on Clustering Quality:\n")

cat("The Silhouette Score measures how similar a data point is to its own cluster compared",
    "to other clusters. A score close to +1 indicates a well-separated point, a score close to -1",
    "indicates a point that is probably in the wrong cluster, and a score near 0 indicates",
    "a point lying between overlapping clusters.\n")
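The description above can be checked by hand for a single observation. The sketch below assumes the standard silhouette definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other members of the point's own cluster and b(i) is the smallest mean distance to any other cluster. The variable names and the choice of observation 1 are illustrative.

```r
library(cluster) # silhouette()

data(iris)
features <- iris[, 1:4]
set.seed(42)
fit <- kmeans(features, centers = 3, nstart = 25)
d <- as.matrix(dist(features))

i <- 1 # observation to inspect (arbitrary choice)
own <- fit$cluster[i]
same <- setdiff(which(fit$cluster == own), i)

# a(i): mean distance to the other members of the same cluster
a_i <- mean(d[i, same])

# b(i): smallest mean distance to the members of any other cluster
other <- setdiff(unique(fit$cluster), own)
b_i <- min(sapply(other, function(k) mean(d[i, fit$cluster == k])))

s_manual <- (b_i - a_i) / max(a_i, b_i)
s_package <- silhouette(fit$cluster, dist(features))[i, "sil_width"]
cat("manual:", s_manual, " package:", s_package, "\n")
```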

cat("The average Silhouette Score is ", round(mean(silhouette_avg[, 3]), 3),
    ". This suggests a reasonable clustering performance, although it is not exceptionally high. ",
    "There is some overlap between the clusters, which is expected with the Iris dataset.\n",
    sep = "")
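The script assumes 3 clusters. One way to sanity-check that choice is to compare the average silhouette width across several values of k. This is a rough sketch only: the range 2 to 6, `nstart = 25`, and the helper name `avg_silhouette` are assumptions made for illustration.

```r
library(cluster) # silhouette()

data(iris)
features <- iris[, 1:4]
distances <- dist(features)

# Average silhouette width for a given number of clusters k.
avg_silhouette <- function(k) {
  set.seed(42)
  fit <- kmeans(features, centers = k, nstart = 25)
  mean(silhouette(fit$cluster, distances)[, "sil_width"])
}

k_values <- 2:6
avg_widths <- sapply(k_values, avg_silhouette)
print(data.frame(k = k_values, avg_silhouette = round(avg_widths, 3)))
```

A higher average width indicates better-separated clusters, but the value of k with the highest width need not match the number of species, so the table is best read alongside the species comparison.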

# Plot the clusters (using factoextra - highly recommended for k-means visualization)
# With more than two features, fviz_cluster() plots the first two principal components

fviz_cluster(kmeans_result, data = iris[, 1:4],
             palette = c("#E41A1C", "#377EB8", "#4DAF4A"), # Color palette
             geom = "point", # Show points
             ellipse = TRUE, # Show ellipses around clusters
             ggtheme = theme_bw()) # Use a white background

# Plot clusters (base R - Sepal features)

plot(iris[, 1], iris[, 2], col = kmeans_result$cluster,
     pch = 19, xlab = "Sepal Length", ylab = "Sepal Width",
     main = "K-Means Clustering of Iris (Sepal Features)")

points(kmeans_result$centers[, 1], kmeans_result$centers[, 2],
       col = 1:3, pch = 8, cex = 2) # Plot cluster centers

# Plot clusters (base R - Petal features)

plot(iris[, 3], iris[, 4], col = kmeans_result$cluster,
     pch = 19, xlab = "Petal Length", ylab = "Petal Width",
     main = "K-Means Clustering of Iris (Petal Features)")

points(kmeans_result$centers[, 3], kmeans_result$centers[, 4],
       col = 1:3, pch = 8, cex = 2) # Plot cluster centers

# Compare with original labels (for understanding cluster assignments)

cat("\nComparison with Original Labels:\n") # cat() prints the newline; print() would show a literal \n

print(table(iris$Species, kmeans_result$cluster))
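The cross-tabulation can also be reduced to a single number. As a hedged sketch (the majority-vote mapping, `nstart = 25`, and the names `confusion`, `majority_species`, and `agreement` are assumptions added here), each cluster is labelled with its most frequent species, and the share of flowers matching that label is reported:

```r
data(iris)
set.seed(42)
fit <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

confusion <- table(Species = iris$Species, Cluster = fit$cluster)

# For each cluster (column), the species with the largest count.
majority_species <- rownames(confusion)[apply(confusion, 2, which.max)]
names(majority_species) <- colnames(confusion)
print(majority_species)

# Share of flowers whose species matches their cluster's majority species.
agreement <- sum(apply(confusion, 2, max)) / sum(confusion)
cat("Agreement with species labels:", round(agreement, 3), "\n")
```

This is a purity-style measure. It only describes how well the clusters line up with the known species and does not replace the silhouette analysis, which needs no labels.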

# Investigate which species are in which cluster

for (i in 1:3) { # Loop through the 3 clusters
  cluster_data <- iris[kmeans_result$cluster == i, ]

  cat("\nCluster", i, ":\n")

  print(table(cluster_data$Species)) # Show the counts of each species in this cluster
}
