
Name - Dikshant Gupta
Roll No. - B21CI014
Prof. - Yashashwi Verma
Subject Name - Introduction to Machine Learning

LAB REPORT 6

K-means introduction -
K-means clustering is one of the simplest and most popular unsupervised
machine learning algorithms. The target number k indicates the number of
centroids required for the dataset. A centroid represents the center of a
cluster and can be either an actual data point or an imaginary location.
The word "means" in K-means refers to averaging the data, i.e., finding
the centroid. The K-means technique begins with an initial set of randomly
chosen centroids, which serve as the starting points for each cluster, and
then uses iterative (repeated) calculations to optimize the positions of
the centroids.
"It is an iterative algorithm that divides the unlabeled dataset into k
different clusters in such a way that each data point belongs to only one
group of points with similar properties."

The k-means clustering algorithm mainly performs two tasks (a minimal
sketch of both steps is given after this list):

1. Determines the best values for the K center points (centroids) by an
iterative process.

2. Assigns each data point to its closest k-center; the data points near a
particular k-center form a cluster.
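Below is a minimal sketch of one iteration of these two steps in plain
NumPy, for illustration only; the lab itself uses scikit-learn's KMeans,
and the array names x (n points, d features) and centroids (k centroids)
are assumptions for this sketch.

import numpy as np

def kmeans_step(x, centroids):
    # Task 2: assign each point to its nearest centroid (Euclidean distance)
    distances = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Task 1: move each centroid to the mean of the points assigned to it
    # (a cluster left empty would give NaN here; scikit-learn handles such
    # edge cases internally)
    new_centroids = np.array([x[labels == j].mean(axis=0)
                              for j in range(len(centroids))])
    return labels, new_centroids

Repeating kmeans_step until the centroids stop moving is the whole
algorithm.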

Q.1) (a) To begin answering this question, we must perform some
preprocessing on our dataset, such as checking for null values, scaling or
normalizing the data, and visualizing the data points as the question asks
by plotting all pairs of the dataset's features. After this, we display
the scatter plots using the Seaborn library and estimate the value of k to
be 5: as we can see in all the plots, the data points fall into roughly 5
clusters, so we can assume k = 5.

The code used to plot this pair plot is given below; the height parameter
(called size in older Seaborn versions) is an int giving the height, in
inches, of each facet:

import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style("whitegrid")
sns.pairplot(df, height=3)
plt.show()
The next code plots the same grid with the additional parameter hue, which
selects the column in the data frame to use for color encoding:

sns.set_style("whitegrid")
sns.pairplot(df, hue=1, height=3)  # hue=1: color by the column labeled 1
plt.show()
(b) Using the scikit-learn library and the value k = 5, we depict the data
points in this part of the question by plotting a scatter plot with the
centroid points using the KMeans class, with a different color for each of
the five labels.

The code used to create the scatter plot is given below.

Explaining the code step by step: the point at which the elbow shape is
created is 5, that is, our K value or optimal number of clusters is 5.
Now let's train the model on the dataset with 5 clusters. The init
argument is the method for initializing the centroids; 'k-means++' spreads
the initial centroids far apart, which usually leads to faster and more
reliable convergence.

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=5, init='k-means++', max_iter=300,
                n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(x)

Now y_kmeans gives us the cluster corresponding to each row of x. Let's
plot all the clusters using matplotlib. The scatter() function plots one
dot for each observation, and legend() places a legend (an area describing
the elements of the graph) on the axes.

plt.figure(figsize=(10, 5))
plt.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 1], s=100, c='orange', label=11.2)
plt.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 1], s=100, c='red', label=12.0)
plt.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 1], s=100, c='green', label=12.8)
plt.scatter(x[y_kmeans == 3, 0], x[y_kmeans == 3, 1], s=100, c='black', label=13.6)
plt.scatter(x[y_kmeans == 4, 0], x[y_kmeans == 4, 1], s=100, c='yellow', label=14.4)

# Plotting the centroids of the clusters
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=100, c='grey', label='Centroids')
plt.legend()

(c) In this part, we use the elbow method to find the optimal K value. In
the elbow method, we vary the number of clusters K from 1 to 10. For each
value of K, we calculate the WCSS (Within-Cluster Sum of Squares), which
is the sum of the squared distances between each point and the centroid of
its cluster. When we plot the WCSS against the K value, the plot looks
like an elbow: as the number of clusters increases, the WCSS decreases,
and it is largest when K = 1. Analyzing the graph, we can see that it
changes rapidly up to a point, creating the elbow shape; from this point
on, the graph moves almost parallel to the X-axis. The K value
corresponding to this point is the optimal K value, i.e., the optimal
number of clusters.

Having imported the dataset above, we now slice out the important features:

df                                  # display the dataframe
x = df.iloc[:, [1, 5, 13]].values   # select columns 1, 5 and 13 as a NumPy array
x                                   # display the sliced array

Next, we have to find the optimal K value for clustering the data, using
the elbow method:

from sklearn.cluster import KMeans

wcss_list = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=45)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)  # inertia_ is the WCSS for this K

The init argument is the method for initializing the centroids. Having
calculated the WCSS value for each K value, we now plot the WCSS against K:

plt.plot(range(1, 11), wcss_list)
plt.title('The Elbow Method Graph')
plt.xlabel('Number of clusters (k)')
plt.ylabel('WCSS')
plt.show()
The point at which the elbow shape is created is 2; that is, our K value
or optimal number of clusters is 2. Now let's train the model on the
dataset with 2 clusters.

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=2, max_iter=300, n_init=10, random_state=40)
y_predict = kmeans.fit_predict(x)

We use the fit_predict method, which returns, for each observation, the
cluster it belongs to. These cluster assignments come back as a single
vector, here called y_predict, giving us the cluster corresponding to each
row of x. We can quickly sanity-check these assignments (see the snippet
below) and then plot all the clusters using matplotlib.
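As a quick check (a small illustrative addition, assuming NumPy is
available as np), np.unique can count how many points fell into each
cluster:

import numpy as np

labels, counts = np.unique(y_predict, return_counts=True)  # cluster ids and sizes
print(dict(zip(labels, counts)))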

plt.figure(figsize=(10, 5))
plt.scatter(x[y_predict == 0, 0], x[y_predict == 0, 1], s=100, c='green', label='cluster1')
plt.scatter(x[y_predict == 1, 0], x[y_predict == 1, 1], s=100, c='red', label='cluster2')

plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=100, c='black', label='Centroids')

plt.legend()
As you can see, there are 2 clusters in total, visualized in different
colors, with the centroid of each cluster shown in black.

Google Colab file link -

https://colab.research.google.com/drive/1TZRxGVUfWGiYiirMPIBFByDcT7j-Rxgw#scrollTo=Di0u81sUHJpB
