085

The document demonstrates the use of KMeans clustering on the Iris dataset using Python libraries such as sklearn and pandas. It includes data preprocessing, fitting the model, and visualizing the clusters along with the Elbow Method to determine the optimal number of clusters. The output includes cluster assignments and a plot of the sum of squared errors for different values of k.


7.

Input:-

from sklearn.cluster import KMeans
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt
from sklearn.datasets import load_iris
%matplotlib inline

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df.head()

Output:-

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2
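MinMaxScaler is imported above but never applied. Scaling the features to [0, 1] before clustering stops the wide-range features (petal length here) from dominating the Euclidean distances KMeans uses. A minimal sketch of how it could be used (the rest of this exercise runs on the unscaled data):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# fit_transform rescales each column independently to the [0, 1] range
scaler = MinMaxScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print(df_scaled.min().round(2).tolist())  # each column's minimum becomes 0.0
print(df_scaled.max().round(2).tolist())  # each column's maximum becomes 1.0
```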

km = KMeans(n_clusters=3, n_init=10)  # n_init set explicitly to silence the scikit-learn FutureWarning about its default changing to 'auto' in 1.4
yp = km.fit_predict(df)
yp

Output:-

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2,
       2, 2, 2, 0, 0, 2, 2, 2, 2, 0, 2, 0, 2, 0, 2, 2, 0, 0, 2, 2, 2, 2,
       2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 0, 2, 2, 2, 0, 2, 2, 0], dtype=int32)
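The cluster ids (0, 1, 2) are arbitrary, so they cannot be compared to the true iris species labels by direct equality. A permutation-invariant score such as adjusted_rand_score from sklearn.metrics handles this; a sketch (random_state is an added assumption, used only for reproducibility):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

iris = load_iris()
km = KMeans(n_clusters=3, n_init=10, random_state=0)
yp = km.fit_predict(iris.data)

# ARI is 1.0 for a perfect match and near 0.0 for random labelling,
# regardless of how the cluster ids happen to be numbered
score = adjusted_rand_score(iris.target, yp)
print(round(score, 3))
```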

df['cluster'] = yp
df

Output:-

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  cluster
0                  5.1               3.5                1.4               0.2        1
1                  4.9               3.0                1.4               0.2        1
2                  4.7               3.2                1.3               0.2        1
3                  4.6               3.1                1.5               0.2        1
4                  5.0               3.6                1.4               0.2        1
..                 ...               ...                ...               ...      ...
145                6.7               3.0                5.2               2.3        2
146                6.3               2.5                5.0               1.9        0
147                6.5               3.0                5.2               2.0        2
148                6.2               3.4                5.4               2.3        2
149                5.9               3.0                5.1               1.8        0

df1 = df[df.cluster==0]
df2 = df[df.cluster==1]
df3 = df[df.cluster==2]
plt.scatter(df1['petal length (cm)'], df1['petal width (cm)'], color='blue')
plt.scatter(df2['petal length (cm)'], df2['petal width (cm)'], color='green')
plt.scatter(df3['petal length (cm)'], df3['petal width (cm)'], color='yellow')
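The three scatter calls above color each cluster; the fitted model also exposes the centroids through its cluster_centers_ attribute, which can be overlaid on the same axes. A self-contained sketch (it refits the model, so the exact label-to-color assignment may differ from the run above; the non-interactive backend is set only so the snippet runs headless):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this inside a notebook
from matplotlib import pyplot as plt

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
km = KMeans(n_clusters=3, n_init=10, random_state=0)
df['cluster'] = km.fit_predict(df[iris.feature_names])

for label, color in zip(range(3), ('blue', 'green', 'yellow')):
    part = df[df.cluster == label]
    plt.scatter(part['petal length (cm)'], part['petal width (cm)'], color=color)

# cluster_centers_ has one row per cluster, one column per feature;
# columns 2 and 3 are petal length and petal width
centers = km.cluster_centers_
plt.scatter(centers[:, 2], centers[:, 3], color='black', marker='*', s=200)
plt.xlabel('petal length (cm)')
plt.ylabel('petal width (cm)')
```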


sse = []
k_rng = range(1, 10)
for k in k_rng:
    km = KMeans(n_clusters=k, n_init=10)
    # note: at this point df still contains the 'cluster' column added above,
    # which is treated as a fifth feature and inflates every SSE value;
    # fitting on df[iris.feature_names] would give a clean elbow curve
    km.fit(df)
    sse.append(km.inertia_)
sse

Output:-

[777.5306,
 247.6317486870259,
 78.851441426146,
 59.348765576102416,
 46.985061122661136,
 40.014722118528,
 35.193121336329575,
 31.15838077200577,
 29.096493888464483]
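The loop above calls km.fit(df) while df still carries the 'cluster' column added earlier, so each inertia value includes the squared deviations of the cluster ids themselves (for k = 1 the true total sum of squares of the four iris features is about 681.4, not 777.5). A sketch of the same loop restricted to the feature columns (random_state added only for reproducibility):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

sse = []
k_rng = range(1, 10)
for k in k_rng:
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(df[iris.feature_names])  # feature columns only, no label column
    sse.append(km.inertia_)         # inertia_ = within-cluster sum of squares

print([round(s, 2) for s in sse])
```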

plt.xlabel('K')
plt.ylabel('Sum of squared error')
plt.plot(k_rng, sse)
plt.title('The Elbow Method showing the optimal value of k')

Output:-

(line plot of SSE versus k; the sharp bend, or "elbow", appears at k = 3)