
Experiment 3.1 K-Mean

The document discusses implementing K-Means clustering. It shows how to identify clusters in 1D and 2D data using scikit-learn KMeans. It generates scatter plots to visualize clustering for different numbers of clusters on randomly generated data.

Uploaded by

Arslan Mansoori

EXPERIMENT 9

Aim: Implementation of K-Means Clustering

COURSE OUTCOMES

CO4 Evaluate a machine learning model's performance and apply a learning strategy to
improve the performance of supervised and unsupervised learning models.

CO5 Develop a suitable model for a supervised or unsupervised learning algorithm and
optimize the model for the expected accuracy.

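K-Means Clustering
In this model, the data is divided into k clusters: each point is assigned to the cluster whose mean (centroid) is nearest to it, and each centroid is then recomputed as the mean of the points assigned to it. These two steps repeat until the assignments stop changing.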
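The nearest-mean rule described above can be sketched in a few lines of NumPy. This is only an illustrative toy, not the scikit-learn implementation used in the rest of the experiment; the function name kmeans_sketch, its parameters, and the random initialisation are assumptions made for this example.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Toy k-means loop (hypothetical helper, for illustration only)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Assumption: start from k distinct data points picked at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving, so the assignments are stable
        centers = new_centers
    return labels, centers

# Two well-separated groups of values, stored as one column.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]).reshape(-1, 1)
labels, centers = kmeans_sketch(data, k=2)
print(labels)
print(centers.ravel())
```

On less cleanly separated data the result of such a loop can depend on the starting centers, which is one reason library implementations usually try several initialisations.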
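1. Identify 2 groups in a 1D array
from sklearn.cluster import KMeans
import numpy as np

data = np.array([1,2,3,4,5,6,7,8,9,10,91,92,93,94,95,96,97,98,99,100])

# KMeans expects a 2D array of shape (n_samples, n_features), so reshape to one column.
# n_init and random_state are set explicitly so that repeated runs give the same result.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data.reshape(-1,1))
# One cluster label (0 or 1) per value; the label numbers themselves are arbitrary.
print(kmeans.predict(data.reshape(-1,1)))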

1. Identify 5 groups in a 1D array

from sklearn.cluster import KMeans
import numpy as np
data = np.array([101, 107, 106, 199, 204, 205, 207, 306, 310, 312, 312, 314, 317, 318, 380, 377,
                 379, 382, 466, 469, 471, 472, 557, 559, 562, 566, 569])

# Reshape to one column, then ask for 5 clusters.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(data.reshape(-1,1))
# One cluster label (0 to 4) per value.
print(kmeans.predict(data.reshape(-1,1)))
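The raw label array is hard to read on its own. As a rough sketch, the fitted model's labels_ and cluster_centers_ attributes can be used to list which values ended up together; the variable names groups and order are made up for this example, and n_init=10 and random_state=0 are assumptions added for repeatability. Note that the values fall into six visibly separate runs (around 100, 200, 310, 380, 470 and 560), so with n_clusters=5 at least two runs have to share a label.

```python
import numpy as np
from sklearn.cluster import KMeans

data = np.array([101, 107, 106, 199, 204, 205, 207, 306, 310, 312, 312, 314, 317, 318,
                 380, 377, 379, 382, 466, 469, 471, 472, 557, 559, 562, 566, 569])

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(data.reshape(-1, 1))

# Group the original values by the label k-means assigned to them.
groups = {int(label): data[kmeans.labels_ == label].tolist()
          for label in np.unique(kmeans.labels_)}

# Print the clusters in order of their centre so the output is easy to scan.
order = np.argsort(kmeans.cluster_centers_.ravel())
for label in order:
    center = kmeans.cluster_centers_[label, 0]
    print(f"cluster {int(label)}: centre {center:.1f}, values {groups.get(int(label), [])}")
```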

2. Identify 2 groups in a 2D array

from sklearn.cluster import KMeans
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
# predict() assigns each new point to the nearest fitted centroid.
print(kmeans.predict([[0, 0], [12, 3]]))
print(kmeans.predict([[11,11], [8, 9]]))
print(kmeans.predict([[2,20], [4, 4]]))
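Explanation:
The training points are:
1 2
1 4
1 0
10 2
10 4
10 0
The first three rows lie around x = 1 and the last three around x = 10, so the fitted centroids are (1, 2) and (10, 2).
For the first call the answer is [1,0]:
[0,0] is nearest to the centroid (1, 2), so it is predicted in cluster 1
[12,3] is nearest to the centroid (10, 2), so it is predicted in cluster 0
The cluster numbers are arbitrary labels and may be swapped on another run or scikit-learn version; what matters is which points share a label.

Similarly, [11,11] and [8,9] are both nearer to (10, 2), so the second call must return [0,0].

[2,20] and [4,4] are both nearer to (1, 2), so the third call must return [1,1].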
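The assignments in the explanation can be checked by hand: compute the distance from each query point to each fitted centroid and take the nearest. The snippet below is a minimal sketch of that check; the names queries, dists and manual are invented for the example, and n_init=10 is an assumption added so the fit is repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

queries = np.array([[0, 0], [12, 3], [11, 11], [8, 9], [2, 20], [4, 4]])

# Euclidean distance from every query point to every fitted centroid.
dists = np.linalg.norm(
    queries[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2
)
# The nearest centroid's index is the predicted cluster label.
manual = dists.argmin(axis=1)

print(kmeans.cluster_centers_)
print(manual)
print(kmeans.predict(queries))
```

The last two printed arrays should agree, since predict() applies the same nearest-centroid rule.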

3. Plotting the K-Means clusters for the 2D data with 2 clusters

from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as mtp

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
# fit_predict() fits the model and returns the label of each row in one call,
# so a separate fit() beforehand is not needed.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
y_predict = kmeans.fit_predict(X)

# First cluster
mtp.scatter(X[y_predict == 0, 0], X[y_predict == 0, 1], s = 100, c = 'blue', label = 'Cluster 1')
# Second cluster
mtp.scatter(X[y_predict == 1, 0], X[y_predict == 1, 1], s = 100, c = 'green', label = 'Cluster 2')
# Leave a margin so the points at x = 10 and y = 0 are not clipped at the axes.
mtp.xlim(-1, 12)
mtp.ylim(-1, 10)
mtp.legend()  # show the 'Cluster 1' / 'Cluster 2' labels
mtp.show()

4. Plot a scatter chart of 300 randomly generated points

%matplotlib inline
# (the line above is a Jupyter notebook magic; omit it when running as a plain script)
import matplotlib.pyplot as plt
import seaborn as sns; sns.set() # for plot styling
import numpy as np
from sklearn.datasets import make_blobs
# 300 two-dimensional points drawn around 4 centres.
X, y_true = make_blobs(n_samples=300, centers=4,
                       cluster_std=0.60, random_state=0)
plt.scatter(X[:, 0], X[:, 1], s=50);
# scatter() plots one dot per observation. It takes two arrays of the same
# length: one for the x-axis values and one for the y-axis values.
# ":" takes every element along that array dimension, so X[:, 0] is all rows of column 0.
# s sets the marker size.

Looking at this chart, we can pick out 4 distinct clusters by eye.
The k-means algorithm finds them automatically, and in Scikit-Learn it follows the usual estimator
API (construct the model, fit it, then predict):

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
# Colour each point by its predicted cluster.
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
# Overlay the fitted cluster centres as large, semi-transparent black dots.
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5);

5. Plot the scatter chart for the same 300 points with the number of clusters increased to 5

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
# Colour each point by its predicted cluster.
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
# Overlay the 5 fitted cluster centres.
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5);

Figure 30: 5 Clusters


6. Plot the scatter chart for the same 300 points with the number of clusters increased to 6

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=6)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
# Colour each point by its predicted cluster.
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
# Overlay the 6 fitted cluster centres.
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5);

Figure 31: 6 Clusters


Similarly, repeat the same steps for 7 clusters and 8 clusters.

Figure 32: 7 Clusters

Figure 33: 12 Clusters
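Besides comparing the plots by eye, the runs for different cluster counts can be compared numerically. One hedged sketch is to loop over several values of k and record the fitted model's inertia_ attribute (the within-cluster sum of squared distances to the nearest centre), which is the quantity usually plotted when applying the Elbow Method. The range of k values and the name inertias are choices made for this example, and the data is regenerated with the same make_blobs call so the snippet runs on its own.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data as in the scatter-chart exercise.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

inertias = {}
for k in range(2, 9):
    # n_init and random_state are fixed only to make the run repeatable.
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    print(k, round(model.inertia_, 1))
```

Inertia keeps shrinking as k grows, so the point of interest is where the decrease levels off rather than where the value is smallest.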

Viva Questions
1. What is the main difference between k-Means and k-Nearest Neighbours?
2. How is Entropy used as a Clustering Validation Measure?
3. How do you determine k using the Elbow Method?
4. What is the difference between Classical k-Means and Spherical k-Means?
5. What is the difference between k-Means and k-Medians, and when would you use one over the other?
