DWM Exp4
Introduction:
Clustering is an unsupervised learning technique that groups data points so that points within a cluster are more similar to each other than to points in other clusters. This experiment implements and compares two common algorithms using scikit-learn: K-Means, a partitional method, and Agglomerative clustering, a hierarchical method.
Procedure:
1. Import the required libraries (NumPy, Matplotlib, scikit-learn).
2. Prepare sample data points.
3. Apply the elbow method to estimate a suitable number of clusters.
4. Fit K-Means and Agglomerative clustering models to the data.
5. Plot and compare the resulting cluster assignments.
Program Codes:
# Import the libraries used throughout this experiment
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Example 2-D data points for the elbow method (placeholder values)
x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
data = list(zip(x, y))

# Elbow method: record the within-cluster sum of squares (inertia) for k = 1..10
inertias = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, n_init=10)
    kmeans.fit(data)
    inertias.append(kmeans.inertia_)
plt.plot(range(1, 11), inertias, marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()

# K-Means with the cluster count suggested by the elbow (here k = 2)
kmeans = KMeans(n_clusters=2, n_init=10)
kmeans.fit(data)
plt.scatter(x, y, c=kmeans.labels_)
plt.show()

# K-Means and Agglomerative clustering on random data
X = np.random.rand(100, 2)
kmeans = KMeans(n_clusters=3, n_init=10)
kmeans.fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_

# Agglomerative Clustering
agg_clustering = AgglomerativeClustering(n_clusters=3)
agg_labels = agg_clustering.fit_predict(X)

# Side-by-side comparison on synthetic blob data
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
X = StandardScaler().fit_transform(X)

# K-Means Clustering
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
kmeans_labels = kmeans.fit_predict(X)

# Agglomerative Clustering
agglo = AgglomerativeClustering(n_clusters=4)
agglo_labels = agglo.fit_predict(X)

# Plot results
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
ax1.scatter(X[:, 0], X[:, 1], c=kmeans_labels, cmap='viridis', marker='o')
ax1.set_title("K-Means Clustering")
ax2.scatter(X[:, 0], X[:, 1], c=agglo_labels, cmap='plasma', marker='o')
ax2.set_title("Agglomerative Clustering")
plt.show()
Implementation/Output snapshot:
Conclusion: Both K-Means and Agglomerative clustering group the data points into coherent clusters. K-Means is computationally efficient but requires the number of clusters to be specified in advance (for example, via the elbow method). Agglomerative clustering builds a hierarchy of merges (a dendrogram) that can be cut at any level, so the cluster count need not be fixed beforehand, but computing pairwise distances makes it expensive on large datasets.
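To illustrate the hierarchical structure, the sketch below plots a dendrogram using SciPy's scipy.cluster.hierarchy module. The random data here is a placeholder; Ward linkage is used because it matches scikit-learn's AgglomerativeClustering default.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Placeholder data: 20 random 2-D points
rng = np.random.default_rng(42)
X = rng.random((20, 2))

# Build the merge hierarchy with Ward linkage
Z = linkage(X, method='ward')

# Cutting the dendrogram at any height yields a flat clustering
dendrogram(Z)
plt.title('Agglomerative clustering dendrogram')
plt.xlabel('Sample index')
plt.ylabel('Merge distance')
plt.show()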
Review Questions:
Q1. What is K-Means clustering and how does it work?
Answer: K-Means is an unsupervised machine learning algorithm used for clustering data into K groups. It operates through the following steps (a from-scratch sketch follows the list):
1. Choose the number of clusters K and initialize K centroids (randomly or with k-means++).
2. Assign each data point to the nearest centroid.
3. Recompute each centroid as the mean of the points assigned to it.
4. Repeat steps 2 and 3 until the assignments or centroids stop changing.
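A minimal from-scratch sketch of these steps in NumPy (the helper name kmeans_sketch and its parameters are illustrative, not scikit-learn's API):

import numpy as np

def kmeans_sketch(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize K centroids by sampling K distinct points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on placeholder data
X = np.random.default_rng(1).random((50, 2))
labels, centroids = kmeans_sketch(X, k=3)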
Q2. How can the ideal number of clusters be determined?
Answer: The ideal number of clusters can be identified using the following methods:
Elbow Method: Plot the within-cluster sum of squares (WCSS) against the number of clusters and find the "elbow point" where the rate of decrease slows.
Silhouette Score: Evaluates how well separated the clusters are, with higher scores indicating better-defined clusters (see the sketch after this list).
Gap Statistic: Compares clustering performance against a random reference dataset to determine the most suitable number of clusters.
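A minimal sketch of the silhouette approach, assuming scikit-learn's silhouette_score and synthetic blob data as a placeholder:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Placeholder data: 300 points drawn from 4 blobs
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Score candidate cluster counts; the highest silhouette wins
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    print(f'k={k}: silhouette={silhouette_score(X, labels):.3f}')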