K Means Clustering

The document outlines a data analysis process using the Iris dataset with KMeans clustering. It includes steps for data scaling, determining the optimal number of clusters using the Elbow method, and visualizing the clusters. The final Silhouette Score of 0.457 indicates moderate clustering quality for k=3.

Uploaded by

regularuse0001

Name : Aachal Patil

PRN : 2223000817

Roll No: B44

# Load the Iris dataset into a DataFrame for inspection
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import pandas as pd

iris = datasets.load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['target'] = iris.target  # true species labels (0, 1, 2); not used for clustering
iris_df.head()

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.1               3.5                1.4               0.2       0
1                4.9               3.0                1.4               0.2       0
2                4.7               3.2                1.3               0.2       0
3                4.6               3.1                1.5               0.2       0
4                5.0               3.6                1.4               0.2       0


scaler = StandardScaler()
scaled_data = scaler.fit_transform(iris.data)
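
As a quick sanity check on the scaling step, the sketch below confirms that each standardized feature has mean 0 and standard deviation 1. It rebuilds `scaled_data` so it can run on its own; the names `col_means` and `col_stds` are illustrative and do not appear in the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Rebuild the scaled data exactly as in the notebook
iris = datasets.load_iris()
scaled_data = StandardScaler().fit_transform(iris.data)

# Per-feature mean and standard deviation after scaling.
# np.std defaults to the population formula (ddof=0), which matches StandardScaler.
col_means = scaled_data.mean(axis=0)
col_stds = scaled_data.std(axis=0)

print("means:", np.round(col_means, 6))
print("stds: ", np.round(col_stds, 6))
```

Scaling matters here because KMeans relies on Euclidean distance, so a feature with a larger numeric range would otherwise dominate the cluster assignment.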

# Elbow method: record the within-cluster sum of squared errors (inertia) for k = 1..10
sse = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, random_state=2)
    km.fit(scaled_data)
    sse.append(km.inertia_)

import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style("whitegrid")
g = sns.lineplot(x=list(range(1, 11)), y=sse)
g.set(xlabel="Number of clusters (k)",
      ylabel="Sum of Squared Errors (inertia)",
      title="Elbow Method")

plt.show()
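
Reading the elbow by eye can be subjective. One possible aid, sketched below, is to print the percentage drop in SSE each time k increases by one; the elbow is roughly where the drops become small. This is an illustrative addition, not part of the original notebook: the names `ks` and `drops` are hypothetical, and `n_init=10` is set explicitly as an assumption so the result does not depend on the scikit-learn version's default.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

scaled_data = StandardScaler().fit_transform(datasets.load_iris().data)

ks = list(range(1, 11))
# Inertia (within-cluster SSE) for each k; n_init=10 is an assumption, not from the notebook
sse = [
    KMeans(n_clusters=k, n_init=10, random_state=2).fit(scaled_data).inertia_
    for k in ks
]

# Percentage reduction in SSE when moving from k-1 to k clusters
drops = [100 * (sse[i - 1] - sse[i]) / sse[i - 1] for i in range(1, len(sse))]

for k, drop in zip(ks[1:], drops):
    print(f"k={k}: SSE={sse[k - 1]:.1f}, drop from k={k - 1}: {drop:.1f}%")
```

For k=1 the SSE on standardized data equals the number of samples times the number of features (150 × 4 = 600), which is a handy reference point for the rest of the curve.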
# Fit the final model with k = 3
kmeans = KMeans(n_clusters=3, random_state=2)
kmeans.fit(scaled_data)

KMeans(n_clusters=3, random_state=2)
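
Once the model is fitted, the cluster centers live in standardized units, which are hard to interpret. A minimal sketch, assuming the same `scaler` and `kmeans` setup as the notebook, maps them back to centimeters with `inverse_transform` and adds the cluster sizes. The names `centers_cm` and `centers_df` are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
scaler = StandardScaler()
scaled_data = scaler.fit_transform(iris.data)
kmeans = KMeans(n_clusters=3, random_state=2).fit(scaled_data)

# Convert the centers from standardized units back to the original centimeters
centers_cm = scaler.inverse_transform(kmeans.cluster_centers_)
centers_df = pd.DataFrame(centers_cm, columns=iris.feature_names)

# Number of samples assigned to each cluster (row i corresponds to cluster label i)
centers_df["size"] = np.bincount(kmeans.labels_, minlength=3)

print(centers_df.round(2))
```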

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Label the standardized columns with the original feature names
scaled_data_df = pd.DataFrame(scaled_data, columns=iris.feature_names)
scaled_data_df['Cluster'] = kmeans.labels_

plt.figure(figsize=(8, 6))
sns.scatterplot(data=scaled_data_df, x='sepal length (cm)', y='sepal width (cm)',
                hue='Cluster', palette='deep', s=100, edgecolor='k', marker='o')

plt.title('Clusters Visualized (First Two Features)', fontsize=16)
plt.xlabel('Sepal length (standardized)', fontsize=12)
plt.ylabel('Sepal width (standardized)', fontsize=12)
plt.legend(title='Cluster')
plt.show()

from sklearn.metrics import silhouette_score

# Mean silhouette coefficient over all samples (ranges from -1 to 1; higher is better)
score = silhouette_score(scaled_data, kmeans.labels_)
print(f"Silhouette Score: {score:.3f}")

Silhouette Score: 0.457
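
A single silhouette value is easier to judge next to the values for other choices of k. The sketch below, which is an illustrative extension rather than part of the original notebook, repeats the same scoring for k = 2 to 6. The `scores` dictionary is a hypothetical name, and `n_init=10` is an explicit assumption, so the value printed for k=3 may differ slightly from the notebook's 0.457.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

scaled_data = StandardScaler().fit_transform(datasets.load_iris().data)

# Silhouette score for each candidate k (the score is undefined for k=1)
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=2).fit_predict(scaled_data)
    scores[k] = silhouette_score(scaled_data, labels)

for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
```

The silhouette score and the elbow plot do not always point to the same k, so it can help to look at both before settling on a value.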

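REPORT:

The Silhouette Score is 0.457 for k=3.

The score lies between 0.2 and 0.5, which by a common rule of thumb indicates moderate clustering quality. Silhouette values range from -1 to 1, so the clusters are reasonably separated but not sharply distinct.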
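
Because Iris ships with true species labels (the `target` column loaded earlier), the clusters can also be checked against them. The sketch below is one possible way to do that, using a cross-tabulation and the adjusted Rand index; it is an illustrative addition, and the names `species`, `clusters`, `table`, and `ari` are not from the original notebook.

```python
import pandas as pd
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
scaled_data = StandardScaler().fit_transform(iris.data)
kmeans = KMeans(n_clusters=3, random_state=2).fit(scaled_data)

# Rows: true species, columns: cluster labels assigned by KMeans
species = pd.Series(iris.target_names[iris.target], name="species")
clusters = pd.Series(kmeans.labels_, name="cluster")
table = pd.crosstab(species, clusters)

# Adjusted Rand index: 1.0 means perfect agreement, values near 0 mean chance-level agreement
ari = adjusted_rand_score(iris.target, kmeans.labels_)

print(table)
print(f"Adjusted Rand index: {ari:.3f}")
```

Cluster numbers are arbitrary, so a species does not need to line up with the cluster of the same index; what matters is whether each row concentrates in one column.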
