45B AIML Practical07 Clustering
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
0    Carlo Abate        Italy      [1962, 1963]                    0.0   3.0   0.0
3    Andrea de Adamich  Italy      [1968, 1970, 1971, 1972, 1973]  0.0  36.0  30.0
4    Philippe Adams     Belgium    [1994]                          0.0   2.0   2.0
863  Emilio Zapico      Spain      [1976]                          0.0   1.0   0.0
864  Zhou Guanyu        China      [2022]                          0.0  23.0  23.0
865  Ricardo Zonta      Brazil     [1999, 2000, 2001, 2004, 2005]  0.0  37.0  36.0
866  Renzo Zorzi        Italy      [1975, 1976, 1977]              0.0   7.0   7.0
867  Ricardo Zunino     Argentina  [1979, 1980, 1981]              0.0  11.0  10.0
x = dataset[['Race_Entries', 'Race_Starts']]
#finding optimal number of clusters using the elbow method
from sklearn.cluster import KMeans
wcss_list = []  # Initializing the list for the values of WCSS
# Using a for loop for iterations from 1 to 10.
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)
mtp.plot(range(1, 11), wcss_list)
mtp.title('The Elbow Method Graph')
mtp.xlabel('Number of clusters(k)')
mtp.ylabel('wcss_list')
mtp.show()
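The value appended to wcss_list is KMeans.inertia_, the within-cluster sum of squares (WCSS). As a minimal sketch, assuming synthetic data from make_blobs rather than the dataset used above (X_demo, wcss_demo and manual_wcss are illustrative names), the quantity can be recomputed by hand to confirm what the elbow curve plots:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (an assumption; the notebook uses its own dataset).
X_demo, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

wcss_demo = []
for k in range(1, 11):
    model = KMeans(n_clusters=k, init='k-means++', max_iter=300, n_init=10, random_state=0)
    model.fit(X_demo)
    wcss_demo.append(model.inertia_)

# Recompute WCSS for the last fitted model (k = 10):
# squared distance from every point to the centroid of its assigned cluster.
assigned_centroids = model.cluster_centers_[model.labels_]
manual_wcss = float(((X_demo - assigned_centroids) ** 2).sum())

print(manual_wcss, model.inertia_)
```

The two printed numbers should agree up to floating-point error. The elbow is the value of k after which wcss_demo stops dropping sharply.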
https://colab.research.google.com/drive/1lMbyWJ57Y8yEjDX02i2XrF_nKrKep-Fk#scrollTo=Z7Y3hDiCyg3e&printMode=true 1/7
3/8/24, 9:38 PM 45B_AIML_Practical07_Clustering.ipynb - Colaboratory
# Plotting centroids
mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='yellow', label='Centroid')
/usr/local/lib/python3.10/dist-packages/sklearn/cluster/_kmeans.py:870: FutureWarning
warnings.warn(
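The warning text above is cut off. If it is the scikit-learn 1.2 notice that the default value of n_init is going to change (an assumption, since the message itself is not shown), passing n_init explicitly when constructing KMeans avoids it. A small sketch on random data (X_demo and km are illustrative names):

```python
import warnings

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 2))  # random stand-in data, not the notebook's dataset

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # n_init is given explicitly, so the library does not need to warn about its default.
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_demo)

# Keep only warnings whose message mentions n_init.
n_init_warnings = [w for w in caught if "n_init" in str(w.message)]
print(len(n_init_warnings))

# One row per centroid, one column per feature; columns 0 and 1 are what the scatter call plots.
print(km.cluster_centers_.shape)
```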
2. K-Medoids:
I. Importing Packages & Loading Dataset:
SOURCE CODE:
Collecting scikit-learn-extra
Downloading scikit_learn_extra-0.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 8.9 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib/python3.10/dist-packages (from scikit-learn-extra) (1.25.2)
Requirement already satisfied: scipy>=0.19.1 in /usr/local/lib/python3.10/dist-packages (from scikit-learn-extra) (1.11.4)
Requirement already satisfied: scikit-learn>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn-extra) (1.2.2)
Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.23.0->scikit-learn-
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.23.0->scikit
Installing collected packages: scikit-learn-extra
Successfully installed scikit-learn-extra-0.3.0
{'data': array([[1.423e+01, 1.710e+00, 2.430e+00, ..., 1.040e+00, 3.920e+00,
1.065e+03],
[1.320e+01, 1.780e+00, 2.140e+00, ..., 1.050e+00, 3.400e+00,
1.050e+03],
[1.316e+01, 2.360e+00, 2.670e+00, ..., 1.030e+00, 3.170e+00,
1.185e+03],
...,
[1.327e+01, 4.280e+00, 2.260e+00, ..., 5.900e-01, 1.560e+00,
8.350e+02],
[1.317e+01, 2.590e+00, 2.370e+00, ..., 6.000e-01, 1.620e+00,
8.400e+02],
[1.413e+01, 4.100e+00, 2.740e+00, ..., 6.100e-01, 1.600e+00,
5.600e+02]]),
'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2]),
'frame': None,
'target_names': array(['class_0', 'class_1', 'class_2'], dtype='<U7'),
'DESCR': '.. _wine_dataset:\n\nWine recognition dataset\n------------------------\n\n**Data Set Characteristics:**\n\n
:Number of Instances: 178\n :Number of Attributes: 13 numeric, predictive attributes and the class\n :Attribute
Information:\n \t\t- Alcohol\n \t\t- Malic acid\n \t\t- Ash\n\t\t- Alcalinity of ash \n \t\t- Magnesium\n\t\t- Total phenols\n
\t\t- Flavanoids\n \t\t- Nonflavanoid phenols\n \t\t- Proanthocyanins\n\t\t- Color intensity\n \t\t- Hue\n \t\t- OD280/OD315 of
diluted wines\n \t\t- Proline\n\n - class:\n - class_0\n - class_1\n - class_2\n\t\t\n
:Summary Statistics:\n \n ============================= ==== ===== ======= =====\n Min
Max Mean SD\n ============================= ==== ===== ======= =====\n Alcohol: 11.0 14.8
13.0 0.8\n Malic Acid: 0.74 5.80 2.34 1.12\n Ash: 1.36 3.23 2.36
0.27\n Alcalinity of Ash: 10.6 30.0 19.5 3.3\n Magnesium: 70.0 162.0 99.7 14.3\n
Total Phenols: 0.98 3.88 2.29 0.63\n Flavanoids: 0.34 5.08 2.03 1.00\n
Nonflavanoid Phenols: 0.13 0.66 0.36 0.12\n Proanthocyanins: 0.41 3.58 1.59 0.57\n Colour
Intensity: 1.3 13.0 5.1 2.3\n Hue: 0.48 1.71 0.96 0.23\n OD280/OD315 of
diluted wines: 1.27 4.00 2.61 0.71\n Proline: 278 1680 746 315\n
============================= ==== ===== ======= =====\n\n :Missing Attribute Values: None\n :Class Distribution: class_0
(59), class_1 (71), class_2 (48)\n :Creator: R.A. Fisher\n :Donor: Michael Marshall (MARSHALL%[email protected])\n
:Date: July, 1988\n\nThis is a copy of UCI ML Wine recognition datasets.\nhttps://archive.ics.uci.edu/ml/machine-learning-
databases/wine/wine.data\n\nThe data is the results of a chemical analysis of wines grown in the same\nregion in Italy by three
different cultivators. There are thirteen different\nmeasurements taken for different constituents found in the three types
of\nwine.\n\nOriginal Owners: \n\nForina, M. et al, PARVUS - \nAn Extendible Package for Data Exploration, Classification and
Correlation. \nInstitute of Pharmaceutical and Food Analysis and Technologies,\nVia Brigata Salerno, 16147 Genoa,
Italy.\n\nCitation:\n\nLichman, M. (2013). UCI Machine Learning Repository\n[https://archive.ics.uci.edu/ml]. Irvine, CA:
University of California,\nSchool of Information and Computer Science. \n\n.. topic:: References\n\n (1) S. Aeberhard, D.
Coomans and O. de Vel, \n Comparison of Classifiers in High Dimensional Settings, \n Tech. Rep. no. 92-02, (1992), Dept. of
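The dictionary-like dump above looks like the Bunch object returned by sklearn.datasets.load_wine (an assumption, since the loading cell itself is not shown). A minimal sketch of reading its main fields, with X_wine and y_wine as illustrative names:

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()      # Bunch: supports both wine['data'] and wine.data
X_wine = wine.data      # feature matrix
y_wine = wine.target    # class labels 0, 1, 2

print(X_wine.shape)          # number of instances and attributes
print(np.bincount(y_wine))   # samples per class
print(wine.target_names)
```

The shape and per-class counts can be compared with the "Number of Instances" and "Class Distribution" entries in the DESCR text.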
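from sklearn.metrics import silhouette_score
from sklearn_extra.cluster import KMedoids

sw = []  # average silhouette score for each number of clusters
for i in range(2, 11):
    kMedoids = KMedoids(n_clusters=i, random_state=0)
    kMedoids.fit(x_scaled)
    y_kmed = kMedoids.predict(x_scaled)
    silhouette_avg = silhouette_score(x_scaled, y_kmed)
    sw.append(silhouette_avg)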
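Once the silhouette scores are collected, the number of clusters with the highest average score is the usual choice. The sketch below shows that selection step on synthetic, well-separated blobs. It swaps in KMeans for KMedoids so that it runs without scikit-learn-extra; the data and the names X_demo_scaled, scores and best_k are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Three well-separated synthetic clusters (stand-in data).
centers = [[0, 0], [10, 10], [-10, 10]]
X_demo, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=0)
X_demo_scaled = StandardScaler().fit_transform(X_demo)

k_values = list(range(2, 11))
scores = []
for k in k_values:
    # KMeans is used here as a stand-in for KMedoids.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_demo_scaled)
    scores.append(silhouette_score(X_demo_scaled, labels))

# Index 0 of scores corresponds to k = 2, so map the argmax back through k_values.
best_k = k_values[int(np.argmax(scores))]
print(best_k)
```

Because the loop starts at 2, the position of the maximum in the score list is offset from the cluster count, which is why the lookup goes through k_values.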
import numpy as np
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]
Three observations have been added here, so the data now has shape (181, 13).
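The cell that appends the three observations is not shown. As a hedged sketch of how three extra rows could be stacked onto the 178-row wine data to reach shape (181, 13), assuming hypothetical extreme values and hypothetical labels (the label array above appears to end in class 2, but the actual rows are unknown):

```python
import numpy as np
from sklearn.datasets import load_wine

X_wine, y_wine = load_wine(return_X_y=True)   # (178, 13) features, (178,) labels

# Three hypothetical extreme rows: column maxima scaled by 5, 10 and 20.
extreme_rows = X_wine.max(axis=0) * np.array([[5.0], [10.0], [20.0]])
extreme_labels = np.array([2, 2, 2])          # hypothetical labels for the new rows

m_demo = np.vstack([X_wine, extreme_rows])    # append the rows below the original data
y_demo = np.concatenate([y_wine, extreme_labels])

print(m_demo.shape, y_demo.shape)
```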
SOURCE CODE:
scaler = StandardScaler().fit(m)
x_scaled_extreme = scaler.transform(m)
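StandardScaler subtracts each column's mean and divides by its standard deviation, and both statistics are sensitive to extreme rows. A small sketch with made-up numbers (base, with_outlier and scaler_demo are illustrative names, not from the notebook) shows how a single extreme observation squeezes the ordinary rows together after scaling:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

base = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [4.0, 16.0]])
with_outlier = np.vstack([base, [[100.0, 400.0]]])   # one hypothetical extreme row

scaler_demo = StandardScaler().fit(with_outlier)
scaled = scaler_demo.transform(with_outlier)

print(scaler_demo.mean_)                         # column means, pulled toward the extreme row
print(scaled.mean(axis=0), scaled.std(axis=0))   # about 0 and 1 per column after scaling
print(scaled[:4])                                # the four ordinary rows end up close together
```

This is one reason distance-based clustering can behave differently once extreme observations are added, even when the data is standardized first.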
0.7016574585635359
output
SOURCE CODE: