AIM: To cluster the observations of the Iris dataset using the K-means clustering algorithm
ALGORITHM:
1. Load the Iris Dataset:
Load the built-in Iris dataset with data(iris). This dataset contains
measurements (in centimetres) of Sepal Length, Sepal Width, Petal Length, and Petal
Width for 150 iris flowers, 50 from each of three species.
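Before clustering, it can help to confirm what the dataset contains. The following is a short inspection sketch that uses only base R functions. The Species column is not used by K-means, but keeping it around is useful for comparing clusters with the known species later.

```r
# Load the built-in Iris data and inspect its structure
data(iris)
str(iris)                     # reports the number of observations and variables
print(dim(iris))              # rows and columns
print(table(iris$Species))    # number of flowers per species
print(summary(iris[, 1:4]))   # numeric summaries of the four measurements (cm)
```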
2. Specify the Number of Clusters:
Define the number of clusters (num_clusters) you want to create. In this example,
num_clusters is set to 3.
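Setting num_clusters to 3 matches the three known species. When the number of groups is not known in advance, a common heuristic is the "elbow" method: run K-means for several values of k and look for the point where the total within-cluster sum of squares stops dropping sharply. The following is a minimal sketch. The range 1:6, nstart = 10 and the seed value are arbitrary choices made for illustration.

```r
data(iris)
selected_vars <- iris[, c("Sepal.Length", "Sepal.Width")]

# Fit K-means for k = 1..6 and record the total within-cluster sum of squares
set.seed(42)  # kmeans() uses random starting centers; fix the seed for repeatability
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  # nstart = 10 keeps the best of 10 random starts for each k
  kmeans(selected_vars, centers = k, nstart = 10)$tot.withinss
})
print(data.frame(k = k_values, tot_withinss = wss))

# The "elbow" is where adding another cluster gives only a small reduction
plot(k_values, wss, type = "b", pch = 19,
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares",
     main = "Elbow plot (Sepal Length vs. Sepal Width)")
```

With k = 1 the within-cluster sum of squares equals the total sum of squares of the data, which gives a convenient upper reference point for the curve.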
3. Select Variables for Clustering:
Choose the variables for clustering. In this code, we select "Sepal.Length" and
"Sepal.Width" as the variables for clustering and store them in the selected_vars
variable.
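K-means groups points by Euclidean distance, so a variable with a larger spread has more influence on the result. Both sepal measurements are in centimetres with broadly similar ranges, so the program uses them as they are. If variables on very different scales were added, standardizing them with scale() is a common option. This optional step is shown below as a sketch and is not part of the original program.

```r
data(iris)
# Select the two sepal measurements by column name
selected_vars <- iris[, c("Sepal.Length", "Sepal.Width")]
print(head(selected_vars))
print(sapply(selected_vars, sd))   # standard deviation of each column (cm)

# Optional: standardize each column to mean 0 and standard deviation 1
scaled_vars <- scale(selected_vars)
print(round(colMeans(scaled_vars), 10))
print(apply(scaled_vars, 2, sd))
```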
4. Perform K-means Clustering:
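Use the kmeans() function to perform K-means clustering on the selected variables.
Pass the selected variables (selected_vars) and the number of clusters
(num_clusters) as arguments to the kmeans() function.
The result is stored in the kmeans_result object.
Because kmeans() picks its starting centers at random, the cluster numbering (and
occasionally the clusters themselves) can change between runs; calling set.seed()
beforehand makes the result reproducible.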
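To show what kmeans() does internally, the following is a minimal sketch of Lloyd's iteration. It alternates between assigning each point to its nearest center and moving each center to the mean of its points until the assignments stop changing. simple_kmeans is a hypothetical helper written only for illustration. Note that kmeans() uses the Hartigan-Wong algorithm by default, and Lloyd's method is available through algorithm = "Lloyd".

```r
# Hypothetical helper: a bare-bones Lloyd's K-means, for illustration only
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Start from k distinct data points (Iris has duplicate rows, so sample unique ones)
  ux <- unique(x)
  centers <- ux[sample(nrow(ux), k), , drop = FALSE]
  cluster <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance from every point to each center
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")  # index of the nearest center
    if (all(new_cluster == cluster)) break             # converged: no point moved
    cluster <- new_cluster
    # Update step: move each non-empty cluster's center to the mean of its points
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centers)
}

# Usage on the same two sepal variables
set.seed(123)
fit <- simple_kmeans(iris[, c("Sepal.Length", "Sepal.Width")], k = 3)
print(table(fit$cluster))
print(fit$centers)
```

This sketch omits features that kmeans() provides, such as multiple random starts (nstart) and the sum-of-squares summaries. It is meant to clarify the algorithm, not to replace the built-in function.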
5. Print Cluster Assignments:
Retrieve the cluster assignments for each data point using
kmeans_result$cluster.
Print the cluster assignments to the console.
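The cluster labels are arbitrary integers, so cluster 1 does not necessarily correspond to any particular species. A cross-tabulation against the known species is a quick way to see how well the unsupervised clusters line up with the true groups. The following is a sketch, assuming the same seed and nstart = 10. Neither setting appears in the original program.

```r
data(iris)
selected_vars <- iris[, c("Sepal.Length", "Sepal.Width")]
set.seed(42)
kmeans_result <- kmeans(selected_vars, centers = 3, nstart = 10)
cluster_assignments <- kmeans_result$cluster

# One integer label (1..3) per row of iris
print(head(cluster_assignments, 10))
print(kmeans_result$size)   # number of points in each cluster

# Compare clusters with the true species (Species was not used for fitting)
print(table(Cluster = cluster_assignments, Species = iris$Species))
```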
6. Print Cluster Centers:
Retrieve the coordinates of the cluster centers using kmeans_result$centers.
Print the cluster centers to the console.
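Each row of kmeans_result$centers is the mean of the points assigned to that cluster. The following sketch recomputes those means with aggregate() to make this concrete. It also prints the within-cluster sums of squares that kmeans() returns alongside the centers. The seed and nstart = 10 are illustrative assumptions.

```r
data(iris)
selected_vars <- iris[, c("Sepal.Length", "Sepal.Width")]
set.seed(42)
kmeans_result <- kmeans(selected_vars, centers = 3, nstart = 10)

# Recompute each center as the mean of the points in that cluster
centers_by_hand <- aggregate(selected_vars,
                             by = list(cluster = kmeans_result$cluster),
                             FUN = mean)
print(kmeans_result$centers)
print(centers_by_hand)

# Spread within each cluster, and the share of total variation between clusters
print(kmeans_result$withinss)
print(kmeans_result$betweenss / kmeans_result$totss)
```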
7. Visualize the Clusters:
Create a scatterplot to visualize the clusters based on the selected variables (Sepal
Length vs. Sepal Width).
Color data points according to their cluster assignments (col =
cluster_assignments).
Mark the cluster centers with a larger star symbol in the matching cluster color
(points(cluster_centers, col = 1:num_clusters, pch = 8, cex = 2)).
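A legend and a side-by-side comparison with the true species make the plot easier to read. The following is a sketch that uses base graphics only. The two-panel layout, legend placement, seed and nstart = 10 are presentation choices made here and do not appear in the original program.

```r
data(iris)
num_clusters <- 3
selected_vars <- iris[, c("Sepal.Length", "Sepal.Width")]
set.seed(42)
kmeans_result <- kmeans(selected_vars, centers = num_clusters, nstart = 10)

# Two panels: K-means clusters (left) vs. true species (right)
old_par <- par(mfrow = c(1, 2))
plot(selected_vars, col = kmeans_result$cluster, pch = 19,
     main = "K-means clusters")
points(kmeans_result$centers, col = 1:num_clusters, pch = 8, cex = 2)
legend("topright", legend = paste("Cluster", 1:num_clusters),
       col = 1:num_clusters, pch = 19, cex = 0.8)

plot(selected_vars, col = as.integer(iris$Species), pch = 19,
     main = "True species")
legend("topright", legend = levels(iris$Species),
       col = 1:3, pch = 19, cex = 0.8)
par(old_par)  # restore the previous single-panel layout
```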
8. End of Program:
The program execution completes after displaying the cluster assignments, cluster
centers, and the cluster visualization.
PROGRAM:
# Load the Iris dataset (built-in dataset)
data(iris)
# Specify the number of clusters (you can change this)
num_clusters <- 3
# Select the variables for clustering (Sepal Length and Sepal Width)
selected_vars <- iris[, c("Sepal.Length", "Sepal.Width")]
# Perform K-means clustering (fix the seed so the random starting centers are reproducible)
set.seed(42)
kmeans_result <- kmeans(selected_vars, centers = num_clusters)
# Print cluster assignments for each data point
cluster_assignments <- kmeans_result$cluster
print(cluster_assignments)
# Print the coordinates of cluster centers
cluster_centers <- kmeans_result$centers
print(cluster_centers)
# Visualize the clusters (scatterplot for Sepal Length vs. Sepal Width)
plot(selected_vars, col = cluster_assignments, pch = 19,
     main = "K-means Clustering (Iris Dataset)")
points(cluster_centers, col = 1:num_clusters, pch = 8, cex = 2)
OUTPUT: