3/12/24, 2:43 PM Tutorial for K Means Clustering in Python Sklearn - MLK - Machine Learning Know ledge
   principal component 1  principal component 2
0              -0.192221               0.319683
1              -0.458175              -0.018152
2               0.052562               0.551854
3              -0.402357              -0.014239
4              -0.031648               0.155578
Finding Optimum Value of K
i) Elbow Method with Within-Cluster-Sum of Squared Error (WCSS)
Let us again use the elbow method with Within-Cluster-Sum of Squared Error (WCSS, named wss in the code) to determine the optimum value of K. In the graph, the curve bends between K=5 and K=6.
In [16]:
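import sklearn.cluster as cluster

K = range(2, 12)
wss = []
for k in K:
    # Fit K-Means for this k and record its inertia (the WCSS)
    kmeans = cluster.KMeans(n_clusters=k)
    kmeans = kmeans.fit(pca_df)
    wss_iter = kmeans.inertia_
    wss.append(wss_iter)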
In [17]:
plt.xlabel('K')
plt.ylabel('Within-Cluster-Sum of Squared Errors (WSS)')
plt.plot(K,wss)
Out[17]:
[Figure: line plot of Within-Cluster-Sum of Squared Errors (WSS) against K]
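To make the quantity on the y-axis concrete, the sketch below recomputes WCSS by hand and compares it with the `inertia_` attribute used in the loop. It is an illustration only: `pca_df` is not reproduced here, so a synthetic dataset from `make_blobs` stands in for it, and the names `X`, `km` and `manual_wcss` are assumptions introduced for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for pca_df (assumption: the tutorial's data is not available here).
X, _ = make_blobs(n_samples=300, centers=4, n_features=2, random_state=0)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# WCSS: squared Euclidean distance from every point to the centroid
# of the cluster it was assigned to, summed over all points.
assigned_centroids = km.cluster_centers_[km.labels_]
manual_wcss = float(np.sum((X - assigned_centroids) ** 2))

print("manual WCSS:", manual_wcss)
print("inertia_   :", km.inertia_)
```

The two printed values should agree up to floating-point error, which is why `inertia_` can be read directly as the WCSS for each K in the elbow plot.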
ii) The Silhouette Method
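Using the Silhouette method, the score is highest overall for K=2 (0.4736), but among the larger values of K it peaks at K=5 (0.4513), with K=6 almost identical (0.4508). Combined with the elbow plot, which bends between 5 and 6, it can be concluded that the dataset can be segmented properly with 5 clusters.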
In [18]:
import sklearn.cluster as cluster
import sklearn.metrics as metrics
for i in range(2,12):
    labels=cluster.KMeans(n_clusters=i,random_state=200).fit(pca_df).labels_
    print ("Silhouette score for k(clusters) = "+str(i)+" is "
           +str(metrics.silhouette_score(pca_df,labels,metric="euclidean",sample_siz
Out[18]:
Silhouette score for k(clusters) = 2 is 0.4736269407502857
Silhouette score for k(clusters) = 3 is 0.44839082753844756
Silhouette score for k(clusters) = 4 is 0.43785291876777566
Silhouette score for k(clusters) = 5 is 0.45130680489606634
Silhouette score for k(clusters) = 6 is 0.4507847568968469
Silhouette score for k(clusters) = 7 is 0.4458795480456887
Silhouette score for k(clusters) = 8 is 0.4132957148795121
Silhouette score for k(clusters) = 9 is 0.4170428610065107
Silhouette score for k(clusters) = 10 is 0.4309783655094101
Silhouette score for k(clusters) = 11 is 0.42535265774570674
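Rather than reading the best K off the printed list, the scores can be collected and compared in code. The following is a minimal sketch under stated assumptions: a synthetic `make_blobs` dataset stands in for `pca_df`, the range of K is shortened, and the names `scores` and `best_k` are hypothetical. It also computes the score on all samples instead of a subsample.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for pca_df (assumption: the tutorial's data is not available here).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

# Map each candidate K to its mean silhouette score.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=200).fit_predict(X)
    scores[k] = silhouette_score(X, labels, metric="euclidean")

# Pick the K with the highest score.
best_k = max(scores, key=scores.get)

for k, s in scores.items():
    print(f"Silhouette score for k(clusters) = {k} is {s:.4f}")
print("K with the highest silhouette score:", best_k)
```

A plain maximum can favour a very small K, as with K=2 in the output above, so it is worth checking the result against the elbow plot before settling on a value.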