KNN, Kmeans
MACHINE LEARNING
Approved by AICTE, New Delhi | Affiliated to JNTUH, Hyderabad | Accredited by NAAC “A” Grade |
Departments : CSE,ECE & Mech are Accredited by NBA | Hyderabad | PIN: 500068
CERTIFICATE
BRANCH : CSE
REGULATION : R18
VISION
To be a centre of excellence in technical education to empower the
young talent through quality education and innovative engineering for well-
being of the society.
MISSION
1. Provide quality education with innovative methodology and intellectual
human capital.
2. Provide conducive environment for research and developmental
activities.
3. Inculcate holistic approach towards nature, society and human ethics with
lifelong learning attitude.
Computer Science & Engineering (CSE) is one of the most prominent technical
fields in Engineering. The curriculum offers courses with various areas of
emphasis on theory, design and experimental work. Subject matter ranges from
basics of Computers & Programming Languages to Compiler Design and Cloud
Computing. It maintains strong tie-ups with industry and is dedicated to preparing
students for a career in Web Technologies, Object Oriented Analysis and Design,
Networking & Security, Databases, Data Mining & Data Warehousing and
Software Testing.
13. Proficiency on the contemporary skills towards development of innovative apps and
firmware products.
14. Capabilities to participate in the construction of software systems of varying complexity.
Things to Do:
1) Students should not bring any electronic gadgets into the lab.
2) They should not come late.
3) They should not create any disturbance to others.
Course Outcomes: After the completion of the course the student will be able to:
understand complexity of Machine Learning algorithms and their limitations;
understand modern notions in data analysis-oriented computing;
be capable of confidently applying common Machine Learning algorithms in practice and
implementing their own;
be capable of performing experiments in Machine Learning using real-world data.
List of Experiments
1. The probability that it is Friday and that a student is absent is 3 %. Since there are 5 school
days in a week, the probability that it is Friday is 20 %. What is the probability that a student is
absent given that today is Friday? Apply Bayes' rule in Python to get the result. (Ans: 15%)
4. Given the following data, which specify classifications for nine combinations of VAR1 and
VAR2, predict a classification for a case where VAR1=0.906 and VAR2=0.606, using the result of
k-means clustering with 3 means (i.e., 3 centroids).
VAR1 VAR2 CLASS
1.713 1.586 0
0.180 1.786 1
0.353 1.240 1
0.940 1.566 0
1.486 0.759 1
1.266 1.106 0
1.540 0.419 1
0.459 1.799 1
0.773 0.186 1
5. The following training examples map descriptions of individuals onto high, medium and low
credit-worthiness.
medium skiing design single twenties no ->highRisk
high golf trading married forties yes ->lowRisk
low speedway transport married thirties yes ->medRisk
medium football banking single thirties yes ->lowRisk
high flying media married fifties yes ->highRisk
low football security single twenties no ->medRisk
medium golf media single thirties yes ->medRisk
high skiing banking single thirties yes ->highRisk
low golf unemployed married forties yes ->highRisk
Input attributes are (from left to right) income, recreation, job, status, age-group, home-owner. Find the
unconditional probability of `golf' and the conditional probability of `single' given `medRisk' in the
dataset?
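The two probabilities asked for in experiment 5 can be obtained by simple counting. The following is a minimal sketch, assuming the nine training examples are typed in as tuples; the names rows, RECREATION, STATUS and RISK are illustrative and do not come from the manual.

```python
# Training examples from experiment 5, in the order:
# (income, recreation, job, status, age_group, home_owner, risk)
rows = [
    ("medium", "skiing", "design", "single", "twenties", "no", "highRisk"),
    ("high", "golf", "trading", "married", "forties", "yes", "lowRisk"),
    ("low", "speedway", "transport", "married", "thirties", "yes", "medRisk"),
    ("medium", "football", "banking", "single", "thirties", "yes", "lowRisk"),
    ("high", "flying", "media", "married", "fifties", "yes", "highRisk"),
    ("low", "football", "security", "single", "twenties", "no", "medRisk"),
    ("medium", "golf", "media", "single", "thirties", "yes", "medRisk"),
    ("high", "skiing", "banking", "single", "thirties", "yes", "highRisk"),
    ("low", "golf", "unemployed", "married", "forties", "yes", "highRisk"),
]
# Column positions (illustrative constants)
RECREATION, STATUS, RISK = 1, 3, 6

# Unconditional probability: fraction of all rows whose recreation is golf
p_golf = sum(r[RECREATION] == "golf" for r in rows) / len(rows)

# Conditional probability: restrict to medRisk rows, then count single
med_rows = [r for r in rows if r[RISK] == "medRisk"]
p_single_given_med = sum(r[STATUS] == "single" for r in med_rows) / len(med_rows)

print("P(golf) =", round(p_golf, 3))
print("P(single | medRisk) =", round(p_single_given_med, 3))
```

Counting by hand gives the same values: 3 of the 9 rows have golf, and 2 of the 3 medRisk rows are single.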
1 2 3 4 5 6 7 8 9 10 11 12 i ii
CO1 3 3 2 3 3 2 1 2 2 0 1 3 3 3
CO2 2 3 2 3 3 2 1 2 2 0 1 3 2 2
CO3 3 2 3 3 2 1 2 2 0 1 3 2 2 3
CO4 3 2 3 3 2 1 2 2 0 1 3 2 2 2
CO5 3 2 3 3 2 1 2 2 0 1 3 2 3 2
CO6 3 2 3 3 2 1 2 2 0 1 3 2 2 3
Avg 2.83 2.3 2.6 3 2.3 1.3 1.6 2 0.6 0.6 2.3 2.3 2.3 2.5
MACHINE LEARNING:
Introduction:
Machine learning is a subfield of artificial intelligence that enables the systems to learn
and improve from experience without being explicitly programmed. Machine learning
algorithms detect patterns in data and learn from them, in order to make their own predictions. In
traditional programming, a computer engineer writes a series of directions that instruct a
computer how to transform input data into a desired output. Machine learning, on the other hand,
is an automated process that enables machines to solve problems with little or no human input,
and take actions based on past observations.
Machine learning can be put to work on massive amounts of data and, on many well-defined
tasks, can perform more accurately than humans. It helps save time and money on tasks and
analyses such as resolving customer pain points to improve customer satisfaction, automating
support tickets, and mining data from internal sources and from across the internet.
The four most common types of machine learning are:
I. Supervised Learning:
Data is labeled to tell the machine what patterns (similar words and images, data categories,
etc.) it should be looking for and recognize connections with.
Here we have a dataset of different types of shapes, which includes squares, rectangles,
triangles, and polygons. The first step is to train the model for each shape.
o If the given shape has four sides, and all the sides are equal, then it will be labelled as
a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.
After training, we test our model using the test set, and the task of the model is to
identify the shape. The machine is already trained on all types of shapes, so when it finds a new
shape, it classifies the shape on the basis of the number of sides and predicts the output.
Supervised machine learning algorithms can be broadly classified into Regression and
Classification algorithms.
1. Regression:
Regression algorithms are used if there is a relationship between the input variable and
the output variable. It is used for the prediction of continuous variables, such as Weather
forecasting, Market Trends, etc. Below are some popular Regression algorithms which come
under supervised learning:
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Polynomial Regression
2. Classification:
Classification algorithms are used when the output variable is categorical, which means
there are two or more classes such as Yes-No, Male-Female, True-False, etc.
o Random Forest
o Decision Trees
o Logistic Regression
II. Unsupervised Learning:
In unsupervised learning, we have the input data but no corresponding output data. The goal of
unsupervised learning is to find the underlying structure of the dataset, group the data according
to similarities, and represent the dataset in a compressed format.
Here, unlabeled input data is considered, which means it is not categorized and
corresponding outputs are also not given. This unlabeled input data is fed to the machine
learning model in order to train it. First, the model interprets the raw data to find the hidden
patterns, and then a suitable algorithm such as k-means clustering or hierarchical clustering
is applied.
Once it applies the suitable algorithm, the algorithm divides the data objects into groups
according to the similarities and difference between the objects.
Unsupervised machine learning algorithms can be broadly classified into Clustering and
Association algorithms.
1. Clustering: Clustering is a method of grouping objects into clusters such that objects
with the most similarities remain in one group and have few or no similarities with the objects
of another group. Cluster analysis finds the commonalities between the data objects and
categorizes them according to the presence or absence of those commonalities.
o K-means clustering
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Apriori algorithm
Text classification: In text classification, the goal is to classify a given text into one or more
predefined categories. Semi-supervised learning can be used to train a text classification model
using a small amount of labeled data and a large amount of unlabeled text data.
Image classification: In image classification, the goal is to classify a given image into one or
more predefined categories. Semi-supervised learning can be used to train an image
classification model using a small amount of labeled data and a large amount of unlabeled
image data.
Anomaly detection: In anomaly detection, the goal is to detect patterns or observations that
are unusual or different from the norm.
Reinforcement Learning (RL) is the science of decision making. It is about learning the
optimal behavior in an environment to obtain maximum reward. In RL, the data is accumulated
from machine learning systems that use a trial-and-error method. Data is not part of the input
that we would find in supervised or unsupervised machine learning.
Reinforcement learning uses algorithms that learn from outcomes and decide which
action to take next. After each action, the algorithm receives feedback that helps it determine
whether the choice it made was correct, neutral or incorrect. It is a good technique to use for
automated systems that have to make a lot of small decisions without human guidance.
PROGRAM 1
1. a. The probability that it is Friday and that a student is absent is 3%. The probability
that it is Friday is 20%. What is the probability that a student is absent given that today is
Friday? Apply Bayes' rule in Python to get the result.
AIM: To find the probability that a student is absent given that today is Friday
THEORY:
Bayes' theorem gives the formula for determining conditional probability. Conditional
probability is the likelihood of an outcome occurring, given that another outcome has already
occurred. Bayes' theorem provides a way to revise existing predictions or theories, i.e., to
update the probabilities given new or additional evidence.
Bayes' theorem relies on incorporating prior probability distributions in order to generate
posterior probabilities. Prior probability, in Bayesian statistical inference, is the probability of an
event occurring before new data is collected. Posterior probability is the revised probability of an
event occurring after taking into consideration the new information.
P(A|B) = P(A ∩ B) / P(B) = P(B|A) P(A) / P(B)
where, P(A) = The probability of A occurring
P(B) = The probability of B occurring
P(A|B) = The probability of A given B
P(B|A) = The probability of B given A
P(A ∩ B) = The probability of both A and B occurring
FLOW CHART:
SOURCE CODE:
# pf is P(F): the probability that it is Friday
pf = 0.20
# paif is P(A and F): the probability that it is Friday and a student is absent
paif = 0.03
# paf is P(A|F) = P(A and F) / P(F)
paf = paif / pf
# convert to percentage
r = paf * 100
# print the result (round avoids truncation caused by floating-point error)
print("The probability that a student is absent given that the day is Friday =", round(r), "%")
Output: The probability that a student is absent given that the day is Friday = 15 %
Result: Thus the program to find probability of student absent given that day is Friday is
executed and the output is verified.
1.b. 10% of the patients entering a clinic have liver disease. 5% of the patients are
alcoholic. The probability that a patient is alcoholic given that they have liver disease is
7%. Find the probability that a patient has liver disease given that the patient is alcoholic.
AIM: To write a program to find the probability that a patient has liver disease given that the
patient is alcoholic.
SOURCE CODE:
Result: Thus the program to find the probability of liver disease patients to that of alcoholic is
executed and the output is verified.
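No listing for 1.b is given above. A minimal sketch of one possible solution, assuming Bayes' theorem is applied directly to the three values from the problem statement (the variable names are illustrative):

```python
# Values taken from the problem statement of 1.b
p_liver = 0.10                  # P(L): patient has liver disease
p_alcoholic = 0.05              # P(A): patient is alcoholic
p_alcoholic_given_liver = 0.07  # P(A|L)

# Bayes' theorem: P(L|A) = P(A|L) * P(L) / P(A)
p_liver_given_alcoholic = p_alcoholic_given_liver * p_liver / p_alcoholic

print("Probability of liver disease given alcoholic =",
      round(p_liver_given_alcoholic * 100, 2), "%")
```

By hand: 0.07 × 0.10 / 0.05 = 0.14, i.e. 14%.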
PROGRAM 2
2.a. AIM: To write a python program to fetch and display records from the product table.
Flow Chart:
SOURCE CODE:
BACKEND:
$ sudo mysql
mysql> create user 'USERNAME'@'localhost' identified by 'PASSWORD';
mysql> grant all on *.* to 'USERNAME'@'localhost';
mysql> flush privileges;
mysql> exit
$ mysql -u USERNAME -p
Enter password: PASSWORD
mysql> create database DATABASENAME;
mysql> use DATABASENAME;
mysql> create table product(pcode varchar(20), pname varchar(30));
mysql> insert into product values("p101","A");
mysql> insert into product values("p102","B");
mysql> insert into product values("p103","C");
mysql> exit
FRONT END:
import mysql.connector
# connect to the database created in the back end
d = mysql.connector.connect(host="localhost", user="USERNAME", password="PASSWORD",
                            database="DATABASENAME")
print(d)
k = d.cursor()
# fetch every row of the product table
k.execute("select * from product")
r = k.fetchall()
print("The records from product table are")
for i in r:
    print(i)
d.close()
Output:
The records from product table are
('p101', 'A')
('p102', 'B')
('p103', 'C')
Result: Thus the program to fetch and display records from the product table is executed and verified.
2.b. AIM: To write a Python program to fetch and display records from the customer table in
descending order by name.
SOURCE CODE:
BackEnd:
$ mysql -u USERNAME -p
Enter password: PASSWORD
mysql> use DATABASENAME;
mysql> create table customer(cname char(30), caddress char(100), cmobile_no real);
FrontEnd:
import mysql.connector
# connect to the database that holds the customer table
d = mysql.connector.connect(host="localhost", user="USERNAME", password="PASSWORD",
                            database="DATABASENAME")
print(d)
k = d.cursor()
# sort by customer name in descending order
k.execute("select * from customer ORDER BY cname DESC")
r = k.fetchall()
print("The records from customer table are")
for i in r:
    print(i)
d.close()
Output:
The records from customer table are
(Z1,Mumbai,965432198)
(H1,Hyderabad,7123456789)
(A1,Delhi,0876543210)
Result: Thus the program to fetch and display records from the customer table is executed and verified.
2.c. AIM: To write a program to design a GUI using tkinter to read data and store it into a database.
SOURCE CODE:
BackEnd:
$ mysql -u USERNAME -p
Enter password: PASSWORD
mysql> use DATABASENAME;
mysql> create table cplayer(cname char(100), cruns int);
mysql> exit
FrontEnd:
import tkinter as t
import mysql.connector

def f1():
    # read the values typed into the form and insert them into cplayer
    x = v1.get()
    y = v2.get()
    d = mysql.connector.connect(host='localhost', user='USERNAME', password='PASSWORD',
                                database='DATABASENAME')
    c = d.cursor()
    s = "insert into cplayer(cname,cruns) values(%s,%s)"
    a = (x, int(y))
    c.execute(s, a)
    d.commit()
    d.close()

w = t.Tk()
l1 = t.Label(text="player name")
l1.place(x=100, y=50)
v1 = t.StringVar()
t1 = t.Entry(textvariable=v1)
t1.place(x=200, y=50)
l2 = t.Label(text="player runs")
l2.place(x=100, y=100)
v2 = t.StringVar()
t2 = t.Entry(textvariable=v2)
t2.place(x=200, y=100)
b = t.Button(text="submit", command=f1)
b.place(x=300, y=200)
w.mainloop()
OUTPUT:
$ mysql -u USERNAME -p
Enter password: PASSWORD
mysql> use DATABASENAME;
mysql> select * from cplayer;
RESULT: Thus the program to design a GUI, read values from it and store them in the database is
executed and the output is verified.
2.d. AIM: To write a program to design a student registration form using tkinter and store the
entered data into a database.
SOURCE CODE:
BackEnd:
$ mysql -u USERNAME -p
Enter password: PASSWORD
mysql> use DATABASENAME;
mysql> create table student(sname char(100), rollno varchar(20), gender char(20), year int,
branch char(20));
mysql> exit
FrontEnd:
import tkinter as t
import mysql.connector

def f1():
    # read the values entered in the form and insert them into student
    x = v1.get()
    y = v2.get()
    p = v3.get()
    q = v4.get()
    r = v5.get()
    d = mysql.connector.connect(host="localhost", user="USERNAME", password="PASSWORD",
                                database="DATABASENAME")
    c = d.cursor()
    z = "insert into student(sname,rollno,gender,year,branch) values(%s,%s,%s,%s,%s)"
    a = (x, y, p, int(q), r)
    c.execute(z, a)
    d.commit()
    d.close()

w = t.Tk()
w.title("Student Registration Form")
l1 = t.Label(text="Name")
l1.place(x=100, y=50)
v1 = t.StringVar()
t1 = t.Entry(textvariable=v1)
t1.place(x=200, y=50)
l2 = t.Label(text="Roll No.")
l2.place(x=100, y=100)
v2 = t.StringVar()
t2 = t.Entry(textvariable=v2)
t2.place(x=200, y=100)
v3 = t.StringVar()
l3 = t.Label(text="Gender")
l3.place(x=100, y=150)
# the radio buttons only set v3; the record is inserted when Submit is pressed
r1 = t.Radiobutton(w, text="male", variable=v3, value="male")
r1.place(x=200, y=150)
r2 = t.Radiobutton(w, text="female", variable=v3, value="female")
r2.place(x=300, y=150)
l4 = t.Label(text="year")
l4.place(x=100, y=200)
# StringVar with a numeric default so that int(q) in f1 always succeeds
v4 = t.StringVar()
v4.set("1")
drop = t.OptionMenu(w, v4, "1", "2", "3", "4")
drop.place(x=200, y=200)
l5 = t.Label(w, text="Branch")
l5.place(x=100, y=300)
v5 = t.StringVar()
v5.set("Select your Branch")
drop = t.OptionMenu(w, v5, "CSE", "AIML", "DS", "ECE")
drop.place(x=200, y=300)
b = t.Button(text="Submit", command=f1)
b.place(x=300, y=400)
w.mainloop()
OUTPUT:
$ mysql -u USERNAME -p
Enter password: PASSWORD
mysql> use DATABASENAME;
mysql> select * from student;
RESULT: Thus the program to design a student registration form, read values from it and store
them in the database is executed and the output is verified.
AIM: To write a program to classify a person as overweight, underweight or normal weight using
Python.
THEORY:
o The K-NN algorithm assumes similarity between the new case/data and the available cases
and puts the new case into the category that is most similar to the available categories.
o The K-NN algorithm can be used for Regression as well as for Classification, but it is mostly
used for Classification problems.
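The idea behind K-NN can be illustrated without any library. The following is a minimal sketch, assuming Euclidean distance and a simple majority vote among the k nearest neighbours; the function name knn_predict is illustrative, and the sample rows are a subset of the weight.csv data used in this program.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of ((height, weight), label) pairs.
    """
    # sort the training points by Euclidean distance to the query point
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # majority vote among the k nearest labels
    return Counter(nearest_labels).most_common(1)[0][0]

# A few (height, weight) -> target rows taken from weight.csv
train = [
    ((145, 59), 1), ((145, 110), 2), ((145, 35), 0),
    ((146, 62), 1), ((146, 47), 0), ((146, 88), 2),
    ((148, 65), 1), ((148, 100), 2), ((148, 25), 0),
    ((150, 68), 1), ((150, 46), 0), ((150, 86), 2),
]

print("predicted target =", knn_predict(train, (147, 63), k=3))
```

The scikit-learn KNeighborsClassifier used in the program applies the same principle, with more efficient neighbour search.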
FLOWCHART:
SOURCE CODE:
Department of Computer Science and Engineering, SREYAS Page 25
Machine Learning Lab Manual
Dataset:
$ gedit weight.csv
height,weight,target
137,35,0
137,48,1
137,80,2
140,20,0
140,50,1
140,100,2
141,15,0
141,52,1
141,200,2
143,56,1
143,30,0
143,99,2
145,59,1
145,110,2
145,35,0
146,62,1
146,47,0
146,47,0
146,88,2
148,65,1
148,100,2
148,25,0
150,68,1
150,46,0
150,86,2
155,72,1
155,56,0
155,88,2
161,78,1
161,565,2
161,100,2
170,85,1
170,50,0
170,100,2
174,88,1
174,66,0
174,120,2
182,93,1
182,60,0
182,110,2
187,96,1
187,70,0
187,125,2
193,98,1
193,65,0
193,100,2
Python Code:
import pandas as pd
import matplotlib.pyplot as pt
import seaborn as sb
from sklearn import model_selection
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
# Data visualization
df = pd.read_csv("weight.csv")
print(df)
pt.xlabel("height")
pt.ylabel("weight")
# split the rows by class: 0, 1 and 2
df1 = df[df.target == 0]
df2 = df[df.target == 1]
df3 = df[df.target == 2]
# Scatter diagram
pt.scatter(df1["height"], df1["weight"], color="red", marker="+")
pt.scatter(df2["height"], df2["weight"], color="green", marker="*")
pt.scatter(df3["height"], df3["weight"], color="black", marker=".")
pt.show()
# Experience (training data)
x = df.drop(["target"], axis="columns")
y = df["target"]
xtrain, xtest, ytrain, ytest = model_selection.train_test_split(x, y, test_size=0.2, random_state=1)
print(xtrain)
print(ytrain)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(xtrain, ytrain)
# Task
ypredict = knn.predict(xtest)
cm = confusion_matrix(ytest, ypredict)
print("confusion matrix = ", cm)
pt.figure(figsize=(10, 5))
sb.heatmap(cm, annot=True)
pt.xlabel("Predicted Value")
pt.ylabel("Actual value from Dataset")
pt.show()
# End user input
print("Enter Height and Weight")
h = int(input())
w = int(input())
data = {"height": [h], "weight": [w]}
k = pd.DataFrame(data)
# a separate name is used so that the pyplot alias pt is not overwritten
pred = knn.predict(k[["height", "weight"]])
print("predicted target = ", pred)
# Performance
acc = knn.score(xtest, ytest)
acc = round(acc * 100)
print("accuracy = ", acc, "%")
Output:
weight.csv
43 records
height weight
33 174 66
43 193 100
32 174 88
23 155 72
17 146 88
31 170 100
29 170 85
36 182 60
40 187 125
4 140 50
24 155 56
14 145 35
10 143 30
39 187 70
26 161 78
27 161 565
38 187 96
20 148 25
18 148 65
25 155 88
6 141 15
28 161 100
13 145 110
7 141 52
42 193 65
1 137 48
16 146 47
0 140 35
15 143 62
5 140 100
11 143 99
9 143 56
8 141 200
12 145 59
37 182 110
33 0
43 2
32 1
23 1
17 2
31 2
29 1
36 0
40 2
4 1
24 0
14 0
10 0
39 0
26 1
27 2
38 1
20 0
18 1
25 2
6 0
28 2
13 2
7 1
42 0
1 1
16 0
0 0
15 1
5 2
11 2
9 1
8 2
12 1
37 2
Result: Thus the program to classify a person as overweight, underweight, normal weight has
been executed and output is verified.
Source Code:
$ gedit house.csv
area,price
500,1000000
525,1210000
540,1400000
600,1700000
635,1870000
670,1900000
800,2900000
900,3200000
1000,3600000
1050,3800000
1100,nan
1200,4500000
1250,4600000
1300,5000000
1325,5100000
1350,5200000
1400,5300000
1500,5450000
1550,5600000
1600,5700000
1700,6000000
1725,nan
1800,6500000
1825,6700000
1877,7200000
1900,7000000
1960,7700000
2000,8500000
2100,8600000
2200,8500000
2500,9000000
2600,8700000
2700,8300000
Python Code:
# import modules
import pandas as pd
import matplotlib.pyplot as pt
from sklearn import model_selection
from sklearn.neighbors import KNeighborsRegressor
# Data visualization
df = pd.read_csv("house.csv")
print(df)
pt.xlabel("area")
pt.ylabel("price")
pt.plot(df.area, df.price)
pt.show()
# Data preprocessing
print("Missing value information")
print(df.isna().sum())
# replace every missing price with the mean price
df["price"] = df["price"].fillna(df["price"].mean())
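The listing above stops after preprocessing. The remaining steps (split, fit, predict) might look like the sketch below. It is not the manual's own code: it uses a handful of rows from house.csv typed inline instead of reading the file, and the choices n_neighbors=3, test_size=0.2 and the query area of 1300 are assumptions made for illustration.

```python
import pandas as pd
from sklearn import model_selection
from sklearn.neighbors import KNeighborsRegressor

# A few rows from house.csv, including one missing price (assumed subset)
df = pd.DataFrame({
    "area": [500, 600, 800, 1000, 1100, 1200, 1500, 1800, 2000, 2500],
    "price": [1000000, 1700000, 2900000, 3600000, float("nan"), 4500000,
              5450000, 6500000, 8500000, 9000000],
})

# Same preprocessing as in the program: fill missing prices with the mean
df["price"] = df["price"].fillna(df["price"].mean())

# Experience: area is the only feature, price is the value to predict
x = df[["area"]]
y = df["price"]
xtrain, xtest, ytrain, ytest = model_selection.train_test_split(
    x, y, test_size=0.2, random_state=1)

# KNN as a regressor: the prediction is the mean price of the k nearest areas
knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(xtrain, ytrain)

# Predict the price for a new area (1300 is an arbitrary example value)
new_house = pd.DataFrame({"area": [1300]})
predicted_price = knn.predict(new_house)[0]
print("predicted price =", predicted_price)
```

Because the prediction is an average of neighbouring training prices, it always lies between the lowest and highest price in the training set.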
Output:
The missing value information
area 0
price 2
dtype : int64
area price
14 1325 5100000.0
19 1600 5700000.0
3 600 1700000.0
27 2000 8500000.0
31 1600 7700000.0
26 1960 8000000.0
20 1700 70000000.0
Result: Thus the program to find the price for a given area of a house using KNN as a regressor
has been executed and the output is verified.
PROGRAM 6
6a.) K Means Clustering
Theory:
K-Means Clustering is an unsupervised learning algorithm. It is an iterative algorithm
that divides the unlabeled dataset into k different clusters in such a way that each data point
belongs to only one group, whose members have similar properties.
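The iteration described above alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The following is a minimal NumPy sketch of that loop, not the scikit-learn implementation used in the program. It assumes the first k points serve as the initial centroids, which is a simplification; the sample points are made up for illustration.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Plain k-means: returns (labels, centroids)."""
    # Assumption: the first k points are used as the initial centroids
    centroids = points[:k].copy()
    for _ in range(n_iter):
        # Assignment step: distance of every point to every centroid
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop when the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated groups of illustrative points
points = np.array([[1.0, 1.0], [5.0, 5.0], [1.2, 0.8],
                   [5.2, 4.8], [0.9, 1.1], [4.9, 5.1]])
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The result of k-means depends on the initial centroids, which is why library implementations typically run several initialisations and keep the best one.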
Source Code:
Data Set:
$ gedit customer1.csv
id,gender,age,income,spendingscore
1,male,19,1245000,39
2,male,21,1245000,81
3,female,20,1328000,6
4,female,23,1328000,77
5,female,31,1411000,40
…….
…….
…….
196,female,35,9960000,79
197,female,45,1e+07,28
198,male,32,1e+07,74
199,male,32,1e+07,18
200,male,30,1e+07,83
Python Code:
# import modules
import matplotlib.pyplot as pt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import pandas as pd
# data preprocessing
df = pd.read_csv("customer1.csv")
print("data set before scaling")
print(df)
s = StandardScaler()
t = df.iloc[:, [3, 4]]
x = s.fit_transform(t)
df["income"] = x[:, 0]
df["spendingscore"] = x[:, 1]
# data visualization
inc = df.iloc[:, 3]
ss = df.iloc[:, 4]
pt.title("income vs spending score scatter diagram")
pt.xlabel("income")
pt.ylabel("spending score")
pt.scatter(inc, ss)
pt.show()
# fit k-means (this step is missing from the listing; five clusters are assumed
# from the "five clusters" message printed below)
y = df.iloc[:, [3, 4]]
km = KMeans(n_clusters=5)
km.fit(y)
# to predict clusters
dc = km.predict(y)
print(dc)
df["cluster"] = dc
print("dataset after cluster assignment for each example")
print(df)
print("centroids of five clusters")
cen = km.cluster_centers_
print(cen)
c1 = df[df.cluster == 0]
c2 = df[df.cluster == 1]
c3 = df[df.cluster == 2]
c4 = df[df.cluster == 3]
c5 = df[df.cluster == 4]
pt.xlabel("income")
pt.ylabel("spending score")
pt.scatter(c1.income, c1.spendingscore, color="red", label="cluster1")
pt.scatter(c2.income, c2.spendingscore, color="blue", label="cluster2")
pt.scatter(c3.income, c3.spendingscore, color="green", label="cluster3")
# the listing is cut off here; the remaining lines are assumed (colours are arbitrary)
pt.scatter(c4.income, c4.spendingscore, color="orange", label="cluster4")
pt.scatter(c5.income, c5.spendingscore, color="purple", label="cluster5")
pt.legend()
pt.show()
Output:
Dataset before scaling:
Result: Thus the program to implement KMeans Clustering algorithm using python has been
executed and the output is verified.
6.b) Given the following data, which specify classifications for nine combinations of VAR1 and
VAR2, predict a classification for a case where VAR1=0.906 and VAR2=0.606, using the result
of k-means clustering with 3 means (i.e., 3 centroids).
SOURCE CODE:
import matplotlib.pyplot as pt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# data preprocessing
df = pd.read_csv("var12.csv")
print(df)
s = StandardScaler()
t = df.iloc[:, [0, 1]]
x = s.fit_transform(t)
df["VAR1"] = x[:, 0]
df["VAR2"] = x[:, 1]
print("dataset after scaling")
print(df)
# data visualization
inc = df.iloc[:, 0]
ss = df.iloc[:, 1]
pt.title("VAR1 vs VAR2 scatter diagram")
pt.xlabel("VAR1")
pt.ylabel("VAR2")
pt.scatter(inc, ss)
pt.show()
# find the number of clusters using the elbow method
# (the loop is garbled in the listing; the range of k tried here is assumed)
wcss = []
k = []
y = df.iloc[:, [0, 1]]
for i in range(1, 7):
    km = KMeans(n_clusters=i)
    km.fit(y)
    wcss.append(km.inertia_)
    k.append(i)
pt.title("number of clusters vs WCSS line graph")
pt.xlabel("k")
pt.ylabel("wcss")
pt.plot(k, wcss)
pt.show()
# fit the data using the k-means algorithm
km = KMeans(n_clusters=3)
km.fit(y)
# to predict the clusters
dc = km.predict(y)
print(dc)
df["cluster"] = dc
print("the dataset after the cluster assignment for each example")
print(df)
print("the centroids of the three clusters")
cen = km.cluster_centers_
print(cen)
c1 = df[df.cluster == 0]
c2 = df[df.cluster == 1]
c3 = df[df.cluster == 2]
pt.xlabel("VAR1")
pt.ylabel("VAR2")
pt.scatter(c1.VAR1, c1.VAR2, color="red", label="cluster 1")
pt.scatter(c2.VAR1, c2.VAR2, color="blue", label="cluster 2")
pt.scatter(c3.VAR1, c3.VAR2, color="green", label="cluster 3")
pt.scatter(cen[:, 0], cen[:, 1], color="black", label="centroid")
pt.legend()
pt.show()
print("enter VAR1 and VAR2")
inc = float(input())
ss = float(input())
data = {"VAR1": [inc], "VAR2": [ss]}
new = pd.DataFrame(data)
# scale the new case with the same scaler, because the model was fitted on scaled data
new[["VAR1", "VAR2"]] = s.transform(new[["VAR1", "VAR2"]])
pc = km.predict(new[["VAR1", "VAR2"]])
print("The cluster for given input = ", pc)
OUTPUT:
Result: Thus the program to implement KMeans Clustering algorithm using python has been
executed and the output is verified.
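The program above reports only the cluster of the new case, while the problem statement asks for a classification. One possible way to bridge the two, shown here as a sketch and not as the manual's method, is to give the new case the majority CLASS of the training points in its cluster. The data are the nine rows from the experiment list; scaling is skipped for brevity, and n_init=10 and random_state=0 are assumptions made for reproducibility.

```python
import pandas as pd
from sklearn.cluster import KMeans

# The nine training rows from the experiment statement
df = pd.DataFrame({
    "VAR1": [1.713, 0.180, 0.353, 0.940, 1.486, 1.266, 1.540, 0.459, 0.773],
    "VAR2": [1.586, 1.786, 1.240, 1.566, 0.759, 1.106, 0.419, 1.799, 0.186],
    "CLASS": [0, 1, 1, 0, 1, 0, 1, 1, 1],
})

# Cluster on VAR1 and VAR2 only; CLASS is not shown to k-means
features = df[["VAR1", "VAR2"]]
km = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = km.fit_predict(features)

# Majority CLASS label inside each cluster
majority_class = df.groupby("cluster")["CLASS"].agg(lambda s: s.mode().iloc[0])

# Assign the new case to a cluster, then read off that cluster's majority class
new_case = pd.DataFrame({"VAR1": [0.906], "VAR2": [0.606]})
cluster = km.predict(new_case)[0]
predicted_class = int(majority_class.loc[cluster])

print("cluster =", cluster)
print("predicted class =", predicted_class)
```

If a cluster contains an equal number of both classes, mode() returns the smaller label first, so ties resolve to class 0 in this sketch.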