
DWM Exp. 4

Name – Mayank Vora


Class - TE-03 / 63
Batch - C

Aim: Implementation of Clustering Algorithm (K-means / Agglomerative) using Python

Introduction:

Clustering is an unsupervised machine learning technique used to group similar data points together. K-means clustering partitions the dataset into K clusters by minimizing intra-cluster variance, whereas Agglomerative clustering follows a hierarchical, bottom-up approach: each point starts in its own cluster, and the closest clusters are iteratively merged according to a distance metric.
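
To make "minimizing intra-cluster variance" concrete, the quantity K-means minimizes (the within-cluster sum of squares, which scikit-learn reports as inertia_) can be computed directly. A small sketch; the points and labels below are illustrative assumptions, not data from this experiment:

import numpy as np

# Illustrative points and a hypothetical assignment to two clusters.
points = np.array([[4, 21], [5, 19], [10, 24], [11, 25]])
labels = np.array([0, 0, 1, 1])

# WCSS: squared distance of each point to its cluster's centroid,
# summed over all clusters.
wcss = 0.0
for k in np.unique(labels):
    members = points[labels == k]
    centroid = members.mean(axis=0)
    wcss += ((members - centroid) ** 2).sum()
print(wcss)  # KMeans reports this same quantity as inertia_.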

Procedure:

1. Load the dataset.
2. Preprocess the data (if necessary).
3. Apply the K-means and Agglomerative clustering algorithms.
4. Evaluate the clusters formed.
5. Visualize the results.

Program Codes:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, AgglomerativeClustering
import pandas as pd
import numpy as np

# K-means algorithm on predefined data values.
x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
plt.scatter(x, y)
plt.show()

data = list(zip(x, y))
inertias = []

# Run K-means for k = 1..10 and record the inertia (WCSS) for the elbow plot.
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i)
    kmeans.fit(data)
    inertias.append(kmeans.inertia_)

plt.plot(range(1, 11), inertias, marker='o')
plt.title('Elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()

kmeans = KMeans(n_clusters=2)
kmeans.fit(data)

plt.scatter(x, y, c=kmeans.labels_)
plt.show()

X = np.random.rand(100, 2)

kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_

# Visualize the clusters and their centroids
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.scatter(centroids[:, 0], centroids[:, 1], marker='x', s=200,
            linewidths=3, color='r')
plt.title('K-means Clustering')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

# Agglomerative Clustering
agg_clustering = AgglomerativeClustering(n_clusters=3)
agg_labels = agg_clustering.fit_predict(X)

# Visualize the agglomerative clustering results
plt.scatter(X[:, 0], X[:, 1], c=agg_labels)
plt.title('Agglomerative Clustering')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
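
Because Agglomerative clustering is hierarchical, the full sequence of merges can also be inspected as a dendrogram. A minimal sketch using SciPy (an added dependency, not part of the original program) on the same X:

from scipy.cluster.hierarchy import dendrogram, linkage

# Build the full merge hierarchy with Ward linkage and draw the dendrogram.
linked = linkage(X, method='ward')
dendrogram(linked)
plt.title('Agglomerative Clustering Dendrogram')
plt.xlabel('Sample index')
plt.ylabel('Merge distance')
plt.show()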

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=42)
X = StandardScaler().fit_transform(X)

# K-Means Clustering
kmeans = KMeans(n_clusters=4, random_state=42)
kmeans_labels = kmeans.fit_predict(X)

# Agglomerative Clustering
agglo = AgglomerativeClustering(n_clusters=4)
agglo_labels = agglo.fit_predict(X)

# Plot results
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
ax1.scatter(X[:, 0], X[:, 1], c=kmeans_labels, cmap='viridis', marker='o')
ax1.set_title("K-Means Clustering")
ax2.scatter(X[:, 0], X[:, 1], c=agglo_labels, cmap='plasma', marker='o')
ax2.set_title("Agglomerative Clustering")
plt.show()
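
Step 4 of the procedure (evaluating the clusters) is not shown above; one common option is the silhouette score. A brief sketch, reusing X, kmeans_labels, and agglo_labels from the program above:

from sklearn.metrics import silhouette_score

# Silhouette ranges from -1 to 1; higher means better-separated clusters.
print("K-Means silhouette:", silhouette_score(X, kmeans_labels))
print("Agglomerative silhouette:", silhouette_score(X, agglo_labels))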
Implementation/Output snapshot:
Conclusion: Both K-means and Agglomerative clustering effectively group data points into clusters. K-means is computationally efficient but requires specifying the number of clusters in advance, while Agglomerative clustering builds a full hierarchy from which the number of clusters can be chosen afterwards (for example, by cutting the dendrogram at a chosen distance), though it is more computationally expensive on large datasets.

Review Questions:

1. What is the K-means clustering algorithm, and how does it work?

Answer: K-means is an unsupervised machine learning algorithm used for clustering data into K groups. It operates through the following steps (a minimal sketch of these steps follows the list):

• Randomly initialize K cluster centroids.
• Assign each data point to the closest centroid.
• Update each centroid by computing the mean of all points assigned to it.
• Repeat the assignment and centroid update process until the centroids remain unchanged or a stopping condition is met.
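
A minimal from-scratch sketch of these steps using NumPy (the function name kmeans and its parameters are illustrative, not from the original document; empty clusters are not handled here):

import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    # 1. Randomly initialize K centroids by sampling K distinct points.
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # 2. Assign each point to its closest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update each centroid to the mean of its assigned points.
        new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        # 4. Stop once the centroids no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids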

2. How do you determine the optimal number of clusters in K-means?

Answer: The ideal number of clusters can be identified using the following methods (a short example follows the list):

• Elbow Method: Plot the within-cluster sum of squares (WCSS) against the number of clusters and find the "elbow point" where the rate of decrease slows.
• Silhouette Score: Evaluates how well-separated clusters are, with higher scores indicating better-defined clusters.
• Gap Statistic: Compares clustering performance against a random reference dataset to determine the most suitable number of clusters.
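
A short example of the Silhouette Score approach with scikit-learn (the data array X here is a stand-in for the dataset under study):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.rand(100, 2)  # stand-in data; substitute the real dataset

# Silhouette needs at least two clusters, so the search starts at k = 2;
# the k with the highest score is preferred.
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=42).fit_predict(X)
    print(k, silhouette_score(X, labels))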

3. What are the common distance metrics used in Agglomerative Clustering?

Answer: Some widely used distance metrics include (a usage sketch follows the list):

• Euclidean Distance (default): Measures the straight-line distance between points.
• Manhattan Distance: Computes distance based on grid-like paths, summing absolute differences between coordinates.
• Cosine Similarity: Evaluates the cosine of the angle between vectors to measure similarity rather than direct distance.
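
A usage sketch showing how these metrics are selected in scikit-learn's AgglomerativeClustering (note that non-Euclidean metrics require a linkage other than the default 'ward', and that this parameter was named affinity before scikit-learn 1.2; X is again stand-in data):

import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.random.rand(100, 2)  # stand-in data

# Ward linkage (the default) supports only Euclidean distance.
euclidean_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Other metrics require a compatible linkage such as 'average' or 'complete'.
manhattan_labels = AgglomerativeClustering(
    n_clusters=3, metric='manhattan', linkage='average').fit_predict(X)
cosine_labels = AgglomerativeClustering(
    n_clusters=3, metric='cosine', linkage='average').fit_predict(X)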
